mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-07 05:56:41 +02:00
b805aa0113
* feat: add Confusion Protocol to preamble resolver Injects a high-stakes ambiguity gate at preamble tier >= 2 so all workflow skills get it. Fires when Claude encounters architectural decisions, data model changes, destructive operations, or contradictory requirements. Does NOT fire on routine coding. Addresses Karpathy failure mode #1 (wrong assumptions) with an inline STOP gate instead of relying on workflow skill invocation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add Hermes and GBrain host configs Hermes: tool rewrites for terminal/read_file/patch/delegate_task, paths to ~/.hermes/skills/gstack, AGENTS.md config file. GBrain: coding skills become brain-aware when GBrain mod is installed. Same tool rewrites as OpenClaw (agents spawn Claude Code via ACP). GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS NOT suppressed on gbrain host, enabling brain-first lookup and save-to-brain behavior. Both registered in hosts/index.ts with setup script redirect messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: GBrain resolver — brain-first lookup and save-to-brain New scripts/resolvers/gbrain.ts with two resolver functions: - GBRAIN_CONTEXT_LOAD: search brain for context before skill starts - GBRAIN_SAVE_RESULTS: save skill output to brain after completion Placeholders added to 4 thinking skill templates (office-hours, investigate, plan-ceo-review, retro). Resolves to empty string on all hosts except gbrain via suppressedResolvers. GBRAIN suppression added to all 9 non-gbrain host configs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: wire slop:diff into /review as advisory diagnostic Adds Step 3.5 to the review template: runs bun run slop:diff against the base branch to catch AI code quality issues (empty catches, redundant return await, overcomplicated abstractions). Advisory only, never blocking. Skips silently if slop-scan is not installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Karpathy compatibility note to README Positions gstack as the workflow enforcement layer for Karpathy-style CLAUDE.md rules (17K stars). Links to forrestchang/andrej-karpathy-skills. Maps each Karpathy failure mode to the gstack skill that addresses it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: improve native OpenClaw thinking skills office-hours: add design doc path visibility message after writing ceo-review: add HARD GATE reminder at review section transitions retro: add non-git context support (check memory for meeting notes) Mirrors template improvements to hand-crafted native skills. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: update tests and golden fixtures for new hosts - Host count: 8 → 10 (hermes, gbrain) - OpenClaw adapter test: expects undefined (dead code removed) - Golden ship fixtures: updated with Confusion Protocol + vendoring Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate all SKILL.md files Regenerated from templates after Confusion Protocol, GBrain resolver placeholders, slop:diff in review, HARD GATE reminders, investigation learnings, design doc visibility, and retro non-git context changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.18.0.0 - CHANGELOG: add v0.18.0.0 entry (Confusion Protocol, Hermes, GBrain, slop in review, Karpathy note, skill improvements) - CLAUDE.md: add hermes.ts and gbrain.ts to hosts listing - README.md: update agent count 8→10, add Hermes + GBrain to table - VERSION: bump to 0.18.0.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: sync package.json version to 0.18.0.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: extract Step 0 from review SKILL.md in E2E test The review-base-branch E2E test was copying the full 1493-line review/SKILL.md into the test fixture. The agent spent 8+ turns reading it in chunks, leaving only 7 turns for actual work, causing error_max_turns on every attempt. Now extracts only Step 0 (base branch detection, ~50 lines) which is all the test actually needs. Follows the CLAUDE.md rule: "NEVER copy a full SKILL.md file into an E2E test fixture." Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: update GBrain and Hermes host configs for v0.10.0 integration GBrain: add 'triggers' to keepFields so generated skills pass checkResolvable() validation. Add version compat comment. Hermes: un-suppress GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS. The resolvers handle GBrain-not-installed gracefully, so Hermes agents with GBrain as a mod get brain features automatically. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: GBrain resolver DX improvements and preamble health check Resolver changes: - gbrain query → gbrain search (fast keyword search, not expensive hybrid) - Add keyword extraction guidance for agents - Show explicit gbrain put_page syntax with --title, --tags, heredoc - Add entity enrichment with false-positive filter - Name throttle error patterns (exit code 1, stderr keywords) - Add data-research routing for investigate skill - Expand skillSaveMap from 4 to 8 entries - Add brain operation telemetry summary Preamble changes: - Add gbrain doctor --fast --json health check for gbrain/hermes hosts - Parse check failures/warnings count - Show failing check details when score < 50 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: preserve keepFields in allowlist frontmatter mode The allowlist mode hard-coded name + description reconstruction but never iterated keepFields for additional fields. Adding 'triggers' to keepFields was a no-op because the field was silently stripped. Now iterates keepFields and preserves any field beyond name/description from the source template frontmatter, including YAML arrays. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add triggers to all 38 skill templates Multi-word, skill-specific trigger keywords for GBrain's RESOLVER.md router. Each skill gets 3-6 triggers derived from its "Use when asked to..." description text. Avoids single generic words that would collide across skills (e.g., "debug this" not "debug"). These are distinct from voice-triggers (speech-to-text aliases) and serve GBrain's checkResolvable() validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate all SKILL.md files and update golden fixtures Regenerated from updated templates (triggers, brain placeholders, resolver DX improvements, preamble health check). Golden fixtures updated to match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: settings-hook remove exits 1 when nothing to remove gstack-settings-hook remove was exiting 0 when settings.json didn't exist, causing gstack-uninstall to report "SessionStart hook" as removed on clean systems where nothing was installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for GBrain v0.10.0 integration ARCHITECTURE.md: added GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS to resolver table. CHANGELOG.md: expanded v0.18.0.0 entry with GBrain v0.10.0 integration details (triggers, expanded brain-awareness, DX improvements, Hermes brain support), updated date. CLAUDE.md: added gbrain to resolvers/ directory comment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: routing E2E stops writing to user's ~/.claude/skills/ installSkills() was copying SKILL.md files to both project-level (.claude/skills/ in tmpDir) and user-level (~/.claude/skills/). Writing to the user's real install fails when symlinks point to different worktrees or dangling targets (ENOENT on copyFileSync). Now installs to project-level only. The test already sets cwd to the tmpDir, so project-level discovery works. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: scale Gemini E2E back to smoke test Gemini CLI gets lost in worktrees on complex tasks (review times out at 600s, discover-skill hits exit 124). Nobody uses Gemini for gstack skill execution. Replace the two failing tests (gemini-discover-skill and gemini-review-findings) with a single smoke test that verifies Gemini can start and read the README. 90s timeout, no skill invocation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
171 lines
5.0 KiB
Cheetah
171 lines
5.0 KiB
Cheetah
---
|
|
name: browse
|
|
preamble-tier: 1
|
|
version: 1.1.0
|
|
description: |
|
|
Fast headless browser for QA testing and site dogfooding. Navigate any URL, interact with
|
|
elements, verify page state, diff before/after actions, take annotated screenshots, check
|
|
responsive layouts, test forms and uploads, handle dialogs, and assert element states.
|
|
~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
|
|
user flow, or file a bug with evidence. Use when asked to "open in browser", "test the
|
|
site", "take a screenshot", or "dogfood this". (gstack)
|
|
triggers:
|
|
- browse a page
|
|
- headless browser
|
|
- take page screenshot
|
|
allowed-tools:
|
|
- Bash
|
|
- Read
|
|
- AskUserQuestion
|
|
|
|
---
|
|
|
|
{{PREAMBLE}}
|
|
|
|
# browse: QA Testing & Dogfooding
|
|
|
|
Persistent headless Chromium. First call auto-starts (~3s), then ~100ms per command.
|
|
State persists between calls (cookies, tabs, login sessions).
|
|
|
|
{{BROWSE_SETUP}}
|
|
|
|
## Core QA Patterns
|
|
|
|
### 1. Verify a page loads correctly
|
|
```bash
|
|
$B goto https://yourapp.com
|
|
$B text # content loads?
|
|
$B console # JS errors?
|
|
$B network # failed requests?
|
|
$B is visible ".main-content" # key elements present?
|
|
```
|
|
|
|
### 2. Test a user flow
|
|
```bash
|
|
$B goto https://app.com/login
|
|
$B snapshot -i # see all interactive elements
|
|
$B fill @e3 "user@test.com"
|
|
$B fill @e4 "password"
|
|
$B click @e5 # submit
|
|
$B snapshot -D # diff: what changed after submit?
|
|
$B is visible ".dashboard" # success state present?
|
|
```
|
|
|
|
### 3. Verify an action worked
|
|
```bash
|
|
$B snapshot # baseline
|
|
$B click @e3 # do something
|
|
$B snapshot -D # unified diff shows exactly what changed
|
|
```
|
|
|
|
### 4. Visual evidence for bug reports
|
|
```bash
|
|
$B snapshot -i -a -o /tmp/annotated.png # labeled screenshot
|
|
$B screenshot /tmp/bug.png # plain screenshot
|
|
$B console # error log
|
|
```
|
|
|
|
### 5. Find all clickable elements (including non-ARIA)
|
|
```bash
|
|
$B snapshot -C # finds divs with cursor:pointer, onclick, tabindex
|
|
$B click @c1 # interact with them
|
|
```
|
|
|
|
### 6. Assert element states
|
|
```bash
|
|
$B is visible ".modal"
|
|
$B is enabled "#submit-btn"
|
|
$B is disabled "#submit-btn"
|
|
$B is checked "#agree-checkbox"
|
|
$B is editable "#name-field"
|
|
$B is focused "#search-input"
|
|
$B js "document.body.textContent.includes('Success')"
|
|
```
|
|
|
|
### 7. Test responsive layouts
|
|
```bash
|
|
$B responsive /tmp/layout # mobile + tablet + desktop screenshots
|
|
$B viewport 375x812 # or set specific viewport
|
|
$B screenshot /tmp/mobile.png
|
|
```
|
|
|
|
### 8. Test file uploads
|
|
```bash
|
|
$B upload "#file-input" /path/to/file.pdf
|
|
$B is visible ".upload-success"
|
|
```
|
|
|
|
### 9. Test dialogs
|
|
```bash
|
|
$B dialog-accept "yes" # set up handler
|
|
$B click "#delete-button" # trigger dialog
|
|
$B dialog # see what appeared
|
|
$B snapshot -D # verify deletion happened
|
|
```
|
|
|
|
### 10. Compare environments
|
|
```bash
|
|
$B diff https://staging.app.com https://prod.app.com
|
|
```
|
|
|
|
### 11. Show screenshots to the user
|
|
After `$B screenshot`, `$B snapshot -a -o`, or `$B responsive`, always use the Read tool on the output PNG(s) so the user can see them. Without this, screenshots are invisible.
|
|
|
|
## User Handoff
|
|
|
|
When you hit something you can't handle in headless mode (CAPTCHA, complex auth, multi-factor
|
|
login), hand off to the user:
|
|
|
|
```bash
|
|
# 1. Open a visible Chrome at the current page
|
|
$B handoff "Stuck on CAPTCHA at login page"
|
|
|
|
# 2. Tell the user what happened (via AskUserQuestion)
|
|
# "I've opened Chrome at the login page. Please solve the CAPTCHA
|
|
# and let me know when you're done."
|
|
|
|
# 3. When user says "done", re-snapshot and continue
|
|
$B resume
|
|
```
|
|
|
|
**When to use handoff:**
|
|
- CAPTCHAs or bot detection
|
|
- Multi-factor authentication (SMS, authenticator app)
|
|
- OAuth flows that require user interaction
|
|
- Complex interactions the AI can't handle after 3 attempts
|
|
|
|
The browser preserves all state (cookies, localStorage, tabs) across the handoff.
|
|
After `resume`, you get a fresh snapshot of wherever the user left off.
|
|
|
|
## Snapshot Flags
|
|
|
|
{{SNAPSHOT_FLAGS}}
|
|
|
|
## CSS Inspector & Style Modification
|
|
|
|
### Inspect element CSS
|
|
```bash
|
|
$B inspect .header # full CSS cascade for selector
|
|
$B inspect # latest picked element from sidebar
|
|
$B inspect --all # include user-agent stylesheet rules
|
|
$B inspect --history # show modification history
|
|
```
|
|
|
|
### Modify styles live
|
|
```bash
|
|
$B style .header background-color #1a1a1a # modify CSS property
|
|
$B style --undo # revert last change
|
|
$B style --undo 2 # revert specific change
|
|
```
|
|
|
|
### Clean screenshots
|
|
```bash
|
|
$B cleanup --all # remove ads, cookies, sticky, social
|
|
$B cleanup --ads --cookies # selective cleanup
|
|
$B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero.png
|
|
```
|
|
|
|
## Full Command List
|
|
|
|
{{COMMAND_REFERENCE}}
|