diff --git a/BROWSER.md b/BROWSER.md
index 570f1ddb..624aa5f3 100644
--- a/BROWSER.md
+++ b/BROWSER.md
@@ -8,15 +8,16 @@ This document covers the command reference and internals of gstack's headless br
 |----------|----------|----------|
 | Navigate | `goto`, `back`, `forward`, `reload`, `url` | Get to a page |
 | Read | `text`, `html`, `links`, `forms`, `accessibility` | Extract content |
-| Snapshot | `snapshot [-i] [-c] [-d N] [-s sel]` | Get refs for interaction |
-| Interact | `click`, `fill`, `select`, `hover`, `type`, `press`, `scroll`, `wait`, `viewport` | Use the page |
-| Inspect | `js`, `eval`, `css`, `attrs`, `console`, `network`, `cookies`, `storage`, `perf` | Debug and verify |
+| Snapshot | `snapshot [-i] [-c] [-d N] [-s sel] [-D] [-a] [-o] [-C]` | Get refs, diff, annotate |
+| Interact | `click`, `fill`, `select`, `hover`, `type`, `press`, `scroll`, `wait`, `viewport`, `upload` | Use the page |
+| Inspect | `js`, `eval`, `css`, `attrs`, `is`, `console`, `network`, `dialog`, `cookies`, `storage`, `perf` | Debug and verify |
 | Visual | `screenshot`, `pdf`, `responsive` | See what Claude sees |
 | Compare | `diff ` | Spot differences between environments |
+| Dialogs | `dialog-accept [text]`, `dialog-dismiss` | Control alert/confirm/prompt handling |
 | Tabs | `tabs`, `tab`, `newtab`, `closetab` | Multi-page workflows |
 | Multi-step | `chain` (JSON from stdin) | Batch commands in one call |
 
-All selector arguments accept CSS selectors or `@ref` after `snapshot`. 40+ commands total.
+All selector arguments accept CSS selectors, `@e` refs after `snapshot`, or `@c` refs after `snapshot -C`. 50+ commands total.
 
 ## How it works
 
@@ -60,11 +61,11 @@ browse/
 │ ├── cli.ts # Thin client — reads state file, sends HTTP, prints response
 │ ├── server.ts # Bun.serve HTTP server — routes commands to Playwright
 │ ├── browser-manager.ts # Chromium lifecycle — launch, tabs, ref map, crash handling
-│ ├── snapshot.ts # Accessibility tree → @ref assignment → Locator map
-│ ├── read-commands.ts # Non-mutating commands (text, html, links, js, css, etc.)
-│ ├── write-commands.ts # Mutating commands (click, fill, select, navigate, etc.)
-│ ├── meta-commands.ts # Server management (status, stop, restart)
-│ └── buffers.ts # Console + network log capture (in-memory + disk flush)
+│ ├── snapshot.ts # Accessibility tree → @ref assignment → Locator map + diff/annotate/-C
+│ ├── read-commands.ts # Non-mutating commands (text, html, links, js, css, is, dialog, etc.)
+│ ├── write-commands.ts # Mutating commands (click, fill, select, upload, dialog-accept, etc.)
+│ ├── meta-commands.ts # Server management, chain, diff, snapshot routing
+│ └── buffers.ts # CircularBuffer + console/network/dialog capture
 ├── test/ # Integration tests + HTML fixtures
 └── dist/
     └── browse # Compiled binary (~58MB, Bun --compile)
@@ -82,18 +83,28 @@ The browser's key innovation is ref-based element selection, built on Playwright
 
 No DOM mutation. No injected scripts. Just Playwright's native accessibility API.
 
+**Extended snapshot features:**
+- `--diff` (`-D`): Stores each snapshot as a baseline. On the next `-D` call, returns a unified diff showing what changed. Use this to verify that an action (click, fill, etc.) actually worked.
+- `--annotate` (`-a`): Injects temporary overlay divs at each ref's bounding box, takes a screenshot with ref labels visible, then removes the overlays. Use `-o ` to control the output path.
+- `--cursor-interactive` (`-C`): Scans for non-ARIA interactive elements (divs with `cursor:pointer`, `onclick`, `tabindex>=0`) using `page.evaluate`. Assigns `@c1`, `@c2`... refs with deterministic `nth-child` CSS selectors. These are elements the ARIA tree misses but users can still click.
+
 ### Authentication
 
 Each server session generates a random UUID as a bearer token. The token is written to the state file (`/tmp/browse-server.json`) with chmod 600. Every HTTP request must include `Authorization: Bearer `. This prevents other processes on the machine from controlling the browser.
 
-### Console and network capture
+### Console, network, and dialog capture
 
-The server hooks into Playwright's `page.on('console')` and `page.on('response')` events. All entries are kept in memory and flushed to disk every second:
+The server hooks into Playwright's `page.on('console')`, `page.on('response')`, and `page.on('dialog')` events. All entries are kept in O(1) circular buffers (50,000 capacity each) and flushed to disk asynchronously via `Bun.write()`:
 
 - Console: `/tmp/browse-console.log`
 - Network: `/tmp/browse-network.log`
+- Dialog: `/tmp/browse-dialog.log`
 
-The `console` and `network` commands read from the in-memory buffers, not disk.
+The `console`, `network`, and `dialog` commands read from the in-memory buffers, not disk.
+
+### Dialog handling
+
+Dialogs (alert, confirm, prompt) are auto-accepted by default to prevent browser lockup. The `dialog-accept` and `dialog-dismiss` commands control this behavior. For prompts, `dialog-accept ` provides the response text. All dialogs are logged to the dialog buffer with type, message, and action taken.
 
 ### Multi-workspace support
 
@@ -184,7 +195,7 @@ bun test browse/test/commands # run command integration tests only
 bun test browse/test/snapshot # run snapshot tests only
 ```
 
-Tests spin up a local HTTP server (`browse/test/test-server.ts`) serving HTML fixtures from `browse/test/fixtures/`, then exercise the CLI commands against those pages. Tests take ~3 seconds.
+Tests spin up a local HTTP server (`browse/test/test-server.ts`) serving HTML fixtures from `browse/test/fixtures/`, then exercise the CLI commands against those pages. 148 tests across 2 files, ~15 seconds total.
 
 ### Source map
 
@@ -193,11 +204,11 @@ Tests spin up a local HTTP server (`browse/test/test-server.ts`) serving HTML fi
 | `browse/src/cli.ts` | Entry point. Reads `/tmp/browse-server.json`, sends HTTP to the server, prints response. |
 | `browse/src/server.ts` | Bun HTTP server. Routes commands to the right handler. Manages idle timeout. |
 | `browse/src/browser-manager.ts` | Chromium lifecycle — launch, tab management, ref map, crash detection. |
-| `browse/src/snapshot.ts` | Parses Playwright's accessibility tree, assigns `@ref` labels, builds Locator map. |
-| `browse/src/read-commands.ts` | Non-mutating commands: `text`, `html`, `links`, `js`, `css`, `forms`, etc. |
-| `browse/src/write-commands.ts` | Mutating commands: `goto`, `click`, `fill`, `select`, `scroll`, etc. |
-| `browse/src/meta-commands.ts` | Server management: `status`, `stop`, `restart`. |
-| `browse/src/buffers.ts` | In-memory + disk capture for console and network logs. |
+| `browse/src/snapshot.ts` | Parses accessibility tree, assigns `@e`/`@c` refs, builds Locator map. Handles `--diff`, `--annotate`, `-C`. |
+| `browse/src/read-commands.ts` | Non-mutating commands: `text`, `html`, `links`, `js`, `css`, `is`, `dialog`, `forms`, etc. Exports `getCleanText()`. |
+| `browse/src/write-commands.ts` | Mutating commands: `goto`, `click`, `fill`, `upload`, `dialog-accept`, `useragent` (with context recreation), etc. |
+| `browse/src/meta-commands.ts` | Server management, chain routing, diff (DRY via `getCleanText`), snapshot delegation. |
+| `browse/src/buffers.ts` | `CircularBuffer` (O(1) ring buffer) + console/network/dialog capture with async disk flush. |
 
 ### Deploying to the active skill
 
diff --git a/SKILL.md b/SKILL.md
index 08ad3e93..08f5216a 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -1,254 +1,324 @@
 ---
 name: gstack
-version: 1.0.0
+version: 1.1.0
 description: |
-  Fast web browsing for Claude Code via persistent headless Chromium daemon. Navigate to any URL,
-  read page content, click elements, fill forms, run JavaScript, take screenshots,
-  inspect CSS/DOM, capture console/network logs, and more. ~100ms per command after
-  first call. Use when you need to check a website, verify a deployment, read docs,
-  or interact with any web page. No MCP, no Chrome extension — just fast CLI.
+  Fast headless browser for QA testing and site dogfooding. Navigate any URL, interact with
+  elements, verify page state, diff before/after actions, take annotated screenshots, check
+  responsive layouts, test forms and uploads, handle dialogs, and assert element states.
+  ~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
+  user flow, or file a bug with evidence.
 allowed-tools:
   - Bash
   - Read
 ---
 
-# gstack: Persistent Browser for Claude Code
+# gstack browse: QA Testing & Dogfooding
 
-Persistent headless Chromium daemon. First call auto-starts the server (~3s).
-Every subsequent call: ~100-200ms. Auto-shuts down after 30 min idle.
+Persistent headless Chromium. First call auto-starts (~3s), then ~100-200ms per command.
+Auto-shuts down after 30 min idle. State persists between calls (cookies, tabs, sessions).
 
 ## SETUP (run this check BEFORE any browse command)
 
-Before using any browse command, find the skill and check if the binary exists:
-
 ```bash
-# Check project-level first, then user-level
-if test -x .claude/skills/gstack/browse/dist/browse; then
-  echo "READY_PROJECT"
-elif test -x ~/.claude/skills/gstack/browse/dist/browse; then
-  echo "READY_USER"
+B=$(browse/bin/find-browse 2>/dev/null || ~/.claude/skills/gstack/browse/bin/find-browse 2>/dev/null)
+if [ -n "$B" ]; then
+  echo "READY: $B"
 else
   echo "NEEDS_SETUP"
 fi
 ```
 
-Set `B` to whichever path is READY and use it for all commands. Prefer project-level if both exist.
-
 If `NEEDS_SETUP`:
-1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait for their response.
-2. If they approve, determine the skill directory (project-level `.claude/skills/gstack` or user-level `~/.claude/skills/gstack`) and run:
-```bash
-cd && ./setup
-```
-3. If `bun` is not installed, tell the user to install it: `curl -fsSL https://bun.sh/install | bash`
-4. Verify the `.gitignore` in the skill directory contains `browse/dist/` and `node_modules/`. If either line is missing, add it.
-
-Once setup is done, it never needs to run again (the compiled binary persists).
+1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
+2. Run: `cd && ./setup`
+3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`
 
 ## IMPORTANT
 
-- Use the compiled binary via Bash: `.claude/skills/gstack/browse/dist/browse` (project) or `~/.claude/skills/gstack/browse/dist/browse` (user).
+- Use the compiled binary via Bash: `$B `
 - NEVER use `mcp__claude-in-chrome__*` tools. They are slow and unreliable.
-- The browser persists between calls — cookies, tabs, and state carry over.
-- The server auto-starts on first command. No setup needed.
+- Browser persists between calls — cookies, login sessions, and tabs carry over.
+- Dialogs (alert/confirm/prompt) are auto-accepted by default — no browser lockup.
 
-## Quick Reference
+## QA Workflows
+
+### Test a user flow (login, signup, checkout, etc.)
 
 ```bash
 B=~/.claude/skills/gstack/browse/dist/browse
 
-# Navigate to a page
-$B goto https://example.com
+# 1. Go to the page
+$B goto https://app.example.com/login
 
-# Read cleaned page text
-$B text
-
-# Take a screenshot (then Read the image)
-$B screenshot /tmp/page.png
-
-# Snapshot: accessibility tree with refs
+# 2. See what's interactive
 $B snapshot -i
 
-# Click by ref (after snapshot)
-$B click @e3
+# 3. Fill the form using refs
+$B fill @e3 "test@example.com"
+$B fill @e4 "password123"
+$B click @e5
 
-# Fill by ref
-$B fill @e4 "test@test.com"
-
-# Run JavaScript
-$B js "document.title"
-
-# Get all links
-$B links
-
-# Click by CSS selector
-$B click "button.submit"
-
-# Fill a form by CSS selector
-$B fill "#email" "test@test.com"
-$B fill "#password" "abc123"
-$B click "button[type=submit]"
-
-# Get HTML of an element
-$B html "main"
-
-# Get computed CSS
-$B css "body" "font-family"
-
-# Get element attributes
-$B attrs "nav"
-
-# Wait for element to appear
-$B wait ".loaded"
-
-# Accessibility tree
-$B accessibility
-
-# Set viewport
-$B viewport 375x812
-
-# Set cookies / headers
-$B cookie "session=abc123"
-$B header "Authorization:Bearer token123"
+# 4. Verify it worked
+$B snapshot -D # diff shows what changed after clicking
+$B is visible ".dashboard" # assert the dashboard appeared
+$B screenshot /tmp/after-login.png
 ```
 
-## Command Reference
+### Verify a deployment / check prod
 
-### Navigation
-```
-browse goto Navigate current tab
-browse back Go back
-browse forward Go forward
-browse reload Reload page
-browse url Print current URL
+```bash
+$B goto https://yourapp.com
+$B text # read the page — does it load?
+$B console # any JS errors?
+$B network # any failed requests?
+$B js "document.title" # correct title?
+$B is visible ".hero-section" # key elements present?
+$B screenshot /tmp/prod-check.png
 ```
 
-### Content extraction
-```
-browse text Cleaned page text (no scripts/styles)
-browse html [selector] innerHTML of element, or full page HTML
-browse links All links as "text → href"
-browse forms All forms + fields as JSON
-browse accessibility Accessibility tree snapshot (ARIA)
+### Dogfood a feature end-to-end
+
+```bash
+# Navigate to the feature
+$B goto https://app.example.com/new-feature
+
+# Take annotated screenshot — shows every interactive element with labels
+$B snapshot -i -a -o /tmp/feature-annotated.png
+
+# Find ALL clickable things (including divs with cursor:pointer)
+$B snapshot -C
+
+# Walk through the flow
+$B snapshot -i # baseline
+$B click @e3 # interact
+$B snapshot -D # what changed? (unified diff)
+
+# Check element states
+$B is visible ".success-toast"
+$B is enabled "#next-step-btn"
+$B is checked "#agree-checkbox"
+
+# Check console for errors after interactions
+$B console
 ```
 
-### Snapshot (ref-based element selection)
-```
-browse snapshot Full accessibility tree with @refs
-browse snapshot -i Interactive elements only (buttons, links, inputs)
-browse snapshot -c Compact (no empty structural elements)
-browse snapshot -d Limit depth to N levels
-browse snapshot -s Scope to CSS selector
+### Test responsive layouts
+
+```bash
+# Quick: 3 screenshots at mobile/tablet/desktop
+$B goto https://yourapp.com
+$B responsive /tmp/layout
+
+# Manual: specific viewport
+$B viewport 375x812 # iPhone
+$B screenshot /tmp/mobile.png
+$B viewport 1440x900 # Desktop
+$B screenshot /tmp/desktop.png
 ```
 
-After snapshot, use @refs as selectors in any command:
+### Test file upload
+
+```bash
+$B goto https://app.example.com/upload
+$B snapshot -i
+$B upload @e3 /path/to/test-file.pdf
+$B is visible ".upload-success"
+$B screenshot /tmp/upload-result.png
 ```
-browse click @e3 Click the element assigned ref @e3
-browse fill @e4 "value" Fill the input assigned ref @e4
-browse hover @e1 Hover the element assigned ref @e1
-browse html @e2 Get innerHTML of ref @e2
-browse css @e5 "color" Get computed CSS of ref @e5
-browse attrs @e6 Get attributes of ref @e6
+
+### Test forms with validation
+
+```bash
+$B goto https://app.example.com/form
+$B snapshot -i
+
+# Submit empty — check validation errors appear
+$B click @e10 # submit button
+$B snapshot -D # diff shows error messages appeared
+$B is visible ".error-message"
+
+# Fill and resubmit
+$B fill @e3 "valid input"
+$B click @e10
+$B snapshot -D # diff shows errors gone, success state
+```
+
+### Test dialogs (delete confirmations, prompts)
+
+```bash
+# Set up dialog handling BEFORE triggering
+$B dialog-accept # will auto-accept next alert/confirm
+$B click "#delete-button" # triggers confirmation dialog
+$B dialog # see what dialog appeared
+$B snapshot -D # verify the item was deleted
+
+# For prompts that need input
+$B dialog-accept "my answer" # accept with text
+$B click "#rename-button" # triggers prompt
+```
+
+### Compare two pages / environments
+
+```bash
+$B diff https://staging.app.com https://prod.app.com
+```
+
+### Multi-step chain (efficient for long flows)
+
+```bash
+echo '[
+  ["goto","https://app.example.com"],
+  ["snapshot","-i"],
+  ["fill","@e3","test@test.com"],
+  ["fill","@e4","password"],
+  ["click","@e5"],
+  ["snapshot","-D"],
+  ["screenshot","/tmp/result.png"]
+]' | $B chain
+```
+
+## Quick Assertion Patterns
+
+```bash
+# Element exists and is visible
+$B is visible ".modal"
+
+# Button is enabled/disabled
+$B is enabled "#submit-btn"
+$B is disabled "#submit-btn"
+
+# Checkbox state
+$B is checked "#agree"
+
+# Input is editable
+$B is editable "#name-field"
+
+# Element has focus
+$B is focused "#search-input"
+
+# Page contains text
+$B js "document.body.textContent.includes('Success')"
+
+# Element count
+$B js "document.querySelectorAll('.list-item').length"
+
+# Specific attribute value
+$B attrs "#logo" # returns all attributes as JSON
+
+# CSS property
+$B css ".button" "background-color"
+```
+
+## Snapshot System
+
+The snapshot is your primary tool for understanding and interacting with pages.
+
+```bash
+$B snapshot -i # Interactive elements only (buttons, links, inputs) with @e refs
+$B snapshot -c # Compact (no empty structural elements)
+$B snapshot -d 3 # Limit depth to 3 levels
+$B snapshot -s "main" # Scope to CSS selector
+$B snapshot -D # Diff against previous snapshot (what changed?)
+$B snapshot -a # Annotated screenshot with ref labels
+$B snapshot -o /tmp/x.png # Output path for annotated screenshot
+$B snapshot -C # Cursor-interactive elements (@c refs — divs with pointer, onclick)
+```
+
+Combine flags: `$B snapshot -i -a -C -o /tmp/annotated.png`
+
+After snapshot, use @refs everywhere:
+```bash
+$B click @e3 $B fill @e4 "value" $B hover @e1
+$B html @e2 $B css @e5 "color" $B attrs @e6
+$B click @c1 # cursor-interactive ref (from -C)
 ```
 
 Refs are invalidated on navigation — run `snapshot` again after `goto`.
 
+## Command Reference
+
+### Navigation
+| Command | Description |
+|---------|-------------|
+| `goto ` | Navigate to URL |
+| `back` / `forward` | History navigation |
+| `reload` | Reload page |
+| `url` | Print current URL |
+
+### Reading
+| Command | Description |
+|---------|-------------|
+| `text` | Cleaned page text |
+| `html [selector]` | innerHTML |
+| `links` | All links as "text -> href" |
+| `forms` | Forms + fields as JSON |
+| `accessibility` | Full ARIA tree |
+
 ### Interaction
-```
-browse click Click element (CSS selector or @ref)
-browse fill Fill input field
-browse select Select dropdown value
-browse hover Hover over element
-browse type Type into focused element
-browse press Press key (Enter, Tab, Escape, etc.)
-browse scroll [selector] Scroll element into view, or page bottom
-browse wait Wait for element to appear (max 10s)
-browse viewport Set viewport size (e.g. 375x812)
-```
+| Command | Description |
+|---------|-------------|
+| `click ` | Click element |
+| `fill ` | Fill input |
+| `select ` | Select dropdown |
+| `hover ` | Hover element |
+| `type ` | Type into focused element |
+| `press ` | Press key (Enter, Tab, Escape) |
+| `scroll [sel]` | Scroll element into view |
+| `wait ` | Wait for element (max 10s) |
+| `wait --networkidle` | Wait for network to be idle |
+| `wait --load` | Wait for page load event |
+| `upload ` | Upload file(s) |
+| `cookie-import ` | Import cookies from JSON file |
+| `dialog-accept [text]` | Auto-accept dialogs |
+| `dialog-dismiss` | Auto-dismiss dialogs |
+| `viewport ` | Set viewport size |
 
 ### Inspection
-```
-browse js Run JS, print result
-browse eval Run JS file against page
-browse css Get computed CSS property
-browse attrs Get element attributes as JSON
-browse console Dump captured console messages
-browse console --clear Clear console buffer
-browse network Dump captured network requests
-browse network --clear Clear network buffer
-browse cookies Dump all cookies as JSON
-browse storage localStorage + sessionStorage as JSON
-browse storage set Set localStorage value
-browse perf Page load performance timings
-```
+| Command | Description |
+|---------|-------------|
+| `js ` | Run JavaScript |
+| `eval ` | Run JS file |
+| `css ` | Computed CSS |
+| `attrs ` | Element attributes |
+| `is ` | State check (visible/hidden/enabled/disabled/checked/editable/focused) |
+| `console [--clear\|--errors]` | Console messages (--errors filters to error/warning) |
+| `network [--clear]` | Network requests |
+| `dialog [--clear]` | Dialog messages |
+| `cookies` | All cookies |
+| `storage` | localStorage + sessionStorage |
+| `perf` | Page load timings |
 
 ### Visual
-```
-browse screenshot [path] Screenshot (default: /tmp/browse-screenshot.png)
-browse pdf [path] Save as PDF
-browse responsive [prefix] Screenshots at mobile/tablet/desktop
-```
-
-### Compare
-```
-browse diff Text diff between two pages
-```
-
-### Multi-step (chain)
-```
-echo '[["goto","https://example.com"],["snapshot","-i"],["click","@e1"],["screenshot","/tmp/result.png"]]' | browse chain
-```
+| Command | Description |
+|---------|-------------|
+| `screenshot [path]` | Screenshot |
+| `pdf [path]` | Save as PDF |
+| `responsive [prefix]` | Mobile/tablet/desktop screenshots |
+| `diff ` | Text diff between pages |
 
 ### Tabs
-```
-browse tabs List tabs (id, url, title)
-browse tab Switch to tab
-browse newtab [url] Open new tab
-browse closetab [id] Close tab
-```
+| Command | Description |
+|---------|-------------|
+| `tabs` | List tabs |
+| `tab ` | Switch tab |
+| `newtab [url]` | Open tab |
+| `closetab [id]` | Close tab |
 
-### Server management
-```
-browse status Server health, uptime, tab count
-browse stop Shutdown server
-browse restart Kill + restart server
-```
+### Server
+| Command | Description |
+|---------|-------------|
+| `status` | Health check |
+| `stop` | Shutdown |
+| `restart` | Restart |
 
-## Speed Rules
+## Tips
 
-1. **Navigate once, query many times.** `goto` loads the page; then `text`, `js`, `css`, `screenshot` all run against the loaded page instantly.
-2. **Use `snapshot -i` for interaction.** Get refs for all interactive elements, then click/fill by ref. No need to guess CSS selectors.
-3. **Use `js` for precision.** `js "document.querySelector('.price').textContent"` is faster than parsing full page text.
-4. **Use `links` to survey.** Faster than `text` when you just need navigation structure.
-5. **Use `chain` for multi-step flows.** Avoids CLI overhead per step.
-6. **Use `responsive` for layout checks.** One command = 3 viewport screenshots.
-
-## When to Use What
-
-| Task | Commands |
-|------|----------|
-| Read a page | `goto ` then `text` |
-| Interact with elements | `snapshot -i` then `click @e3` |
-| Check if element exists | `js "!!document.querySelector('.thing')"` |
-| Extract specific data | `js "document.querySelector('.price').textContent"` |
-| Visual check | `screenshot /tmp/x.png` then Read the image |
-| Fill and submit form | `snapshot -i` → `fill @e4 "val"` → `click @e5` → `screenshot` |
-| Check CSS | `css "selector" "property"` or `css @e3 "property"` |
-| Inspect DOM | `html "selector"` or `attrs @e3` |
-| Debug console errors | `console` |
-| Check network requests | `network` |
-| Check local dev | `goto http://127.0.0.1:3000` |
-| Compare two pages | `diff ` |
-| Mobile layout check | `responsive /tmp/prefix` |
-| Multi-step flow | `echo '[...]' \| browse chain` |
-
-## Architecture
-
-- Persistent Chromium daemon on localhost (port 9400-9410)
-- Bearer token auth per session
-- State file: `/tmp/browse-server.json`
-- Console log: `/tmp/browse-console.log`
-- Network log: `/tmp/browse-network.log`
-- Auto-shutdown after 30 min idle
-- Chromium crash → server exits → auto-restarts on next command
+1. **Navigate once, query many times.** `goto` loads the page; then `text`, `js`, `screenshot` all hit the loaded page instantly.
+2. **Use `snapshot -i` first.** See all interactive elements, then click/fill by ref. No CSS selector guessing.
+3. **Use `snapshot -D` to verify.** Baseline → action → diff. See exactly what changed.
+4. **Use `is` for assertions.** `is visible .modal` is faster and more reliable than parsing page text.
+5. **Use `snapshot -a` for evidence.** Annotated screenshots are great for bug reports.
+6. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses.
+7. **Check `console` after actions.** Catch JS errors that don't surface visually.
+8. **Use `chain` for long flows.** Single command, no per-step CLI overhead.
diff --git a/browse/SKILL.md b/browse/SKILL.md
index b752aec6..99c979c5 100644
--- a/browse/SKILL.md
+++ b/browse/SKILL.md
@@ -1,254 +1,128 @@
 ---
 name: browse
-version: 1.0.0
+version: 1.1.0
 description: |
-  Fast web browsing for Claude Code via persistent headless Chromium daemon. Navigate to any URL,
-  read page content, click elements, fill forms, run JavaScript, take screenshots,
-  inspect CSS/DOM, capture console/network logs, and more. ~100ms per command after
-  first call. Use when you need to check a website, verify a deployment, read docs,
-  or interact with any web page. No MCP, no Chrome extension — just fast CLI.
+  Fast headless browser for QA testing and site dogfooding. Navigate any URL, interact with
+  elements, verify page state, diff before/after actions, take annotated screenshots, check
+  responsive layouts, test forms and uploads, handle dialogs, and assert element states.
+  ~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
+  user flow, or file a bug with evidence.
 allowed-tools:
   - Bash
   - Read
 ---
 
-# gstack: Persistent Browser for Claude Code
+# browse: QA Testing & Dogfooding
 
-Persistent headless Chromium daemon. First call auto-starts the server (~3s).
-Every subsequent call: ~100-200ms. Auto-shuts down after 30 min idle.
+Persistent headless Chromium. First call auto-starts (~3s), then ~100ms per command.
+State persists between calls (cookies, tabs, login sessions).
 
-## SETUP (run this check BEFORE any browse command)
-
-Before using any browse command, find the skill and check if the binary exists:
+## Core QA Patterns
 
+### 1. Verify a page loads correctly
 ```bash
-# Check project-level first, then user-level
-if test -x .claude/skills/gstack/browse/dist/browse; then
-  echo "READY_PROJECT"
-elif test -x ~/.claude/skills/gstack/browse/dist/browse; then
-  echo "READY_USER"
-else
-  echo "NEEDS_SETUP"
-fi
+$B goto https://yourapp.com
+$B text # content loads?
+$B console # JS errors?
+$B network # failed requests?
+$B is visible ".main-content" # key elements present?
 ```
 
-Set `B` to whichever path is READY and use it for all commands. Prefer project-level if both exist.
-
-If `NEEDS_SETUP`:
-1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait for their response.
-2. If they approve, determine the skill directory (project-level `.claude/skills/gstack` or user-level `~/.claude/skills/gstack`) and run:
+### 2. Test a user flow
 ```bash
-cd && ./setup
+$B goto https://app.com/login
+$B snapshot -i # see all interactive elements
+$B fill @e3 "user@test.com"
+$B fill @e4 "password"
+$B click @e5 # submit
+$B snapshot -D # diff: what changed after submit?
+$B is visible ".dashboard" # success state present?
 ```
-3. If `bun` is not installed, tell the user to install it: `curl -fsSL https://bun.sh/install | bash`
-4. Verify the `.gitignore` in the skill directory contains `browse/dist/` and `node_modules/`. If either line is missing, add it.
-
-Once setup is done, it never needs to run again (the compiled binary persists).
-
-## IMPORTANT
-
-- Use the compiled binary via Bash: `.claude/skills/gstack/browse/dist/browse` (project) or `~/.claude/skills/gstack/browse/dist/browse` (user).
-- NEVER use `mcp__claude-in-chrome__*` tools. They are slow and unreliable.
-- The browser persists between calls — cookies, tabs, and state carry over.
-- The server auto-starts on first command. No setup needed.
-
-## Quick Reference
 
+### 3. Verify an action worked
 ```bash
-B=~/.claude/skills/gstack/browse/dist/browse
-
-# Navigate to a page
-$B goto https://example.com
-
-# Read cleaned page text
-$B text
-
-# Take a screenshot (then Read the image)
-$B screenshot /tmp/page.png
-
-# Snapshot: accessibility tree with refs
-$B snapshot -i
-
-# Click by ref (after snapshot)
-$B click @e3
-
-# Fill by ref
-$B fill @e4 "test@test.com"
-
-# Run JavaScript
-$B js "document.title"
-
-# Get all links
-$B links
-
-# Click by CSS selector
-$B click "button.submit"
-
-# Fill a form by CSS selector
-$B fill "#email" "test@test.com"
-$B fill "#password" "abc123"
-$B click "button[type=submit]"
-
-# Get HTML of an element
-$B html "main"
-
-# Get computed CSS
-$B css "body" "font-family"
-
-# Get element attributes
-$B attrs "nav"
-
-# Wait for element to appear
-$B wait ".loaded"
-
-# Accessibility tree
-$B accessibility
-
-# Set viewport
-$B viewport 375x812
-
-# Set cookies / headers
-$B cookie "session=abc123"
-$B header "Authorization:Bearer token123"
+$B snapshot # baseline
+$B click @e3 # do something
+$B snapshot -D # unified diff shows exactly what changed
 ```
 
-## Command Reference
-
-### Navigation
-```
-browse goto Navigate current tab
-browse back Go back
-browse forward Go forward
-browse reload Reload page
-browse url Print current URL
+### 4. Visual evidence for bug reports
+```bash
+$B snapshot -i -a -o /tmp/annotated.png # labeled screenshot
+$B screenshot /tmp/bug.png # plain screenshot
+$B console # error log
 ```
 
-### Content extraction
-```
-browse text Cleaned page text (no scripts/styles)
-browse html [selector] innerHTML of element, or full page HTML
-browse links All links as "text → href"
-browse forms All forms + fields as JSON
-browse accessibility Accessibility tree snapshot (ARIA)
+### 5. Find all clickable elements (including non-ARIA)
+```bash
+$B snapshot -C # finds divs with cursor:pointer, onclick, tabindex
+$B click @c1 # interact with them
 ```
 
-### Snapshot (ref-based element selection)
-```
-browse snapshot Full accessibility tree with @refs
-browse snapshot -i Interactive elements only (buttons, links, inputs)
-browse snapshot -c Compact (no empty structural elements)
-browse snapshot -d Limit depth to N levels
-browse snapshot -s Scope to CSS selector
+### 6. Assert element states
+```bash
+$B is visible ".modal"
+$B is enabled "#submit-btn"
+$B is disabled "#submit-btn"
+$B is checked "#agree-checkbox"
+$B is editable "#name-field"
+$B is focused "#search-input"
+$B js "document.body.textContent.includes('Success')"
 ```
 
-After snapshot, use @refs as selectors in any command:
-```
-browse click @e3 Click the element assigned ref @e3
-browse fill @e4 "value" Fill the input assigned ref @e4
-browse hover @e1 Hover the element assigned ref @e1
-browse html @e2 Get innerHTML of ref @e2
-browse css @e5 "color" Get computed CSS of ref @e5
-browse attrs @e6 Get attributes of ref @e6
+### 7. Test responsive layouts
+```bash
+$B responsive /tmp/layout # mobile + tablet + desktop screenshots
+$B viewport 375x812 # or set specific viewport
+$B screenshot /tmp/mobile.png
 ```
 
-Refs are invalidated on navigation — run `snapshot` again after `goto`.
-
-### Interaction
-```
-browse click Click element (CSS selector or @ref)
-browse fill Fill input field
-browse select Select dropdown value
-browse hover Hover over element
-browse type Type into focused element
-browse press Press key (Enter, Tab, Escape, etc.)
-browse scroll [selector] Scroll element into view, or page bottom
-browse wait Wait for element to appear (max 10s)
-browse viewport Set viewport size (e.g. 375x812)
+### 8. Test file uploads
+```bash
+$B upload "#file-input" /path/to/file.pdf
+$B is visible ".upload-success"
 ```
 
-### Inspection
-```
-browse js Run JS, print result
-browse eval Run JS file against page
-browse css Get computed CSS property
-browse attrs Get element attributes as JSON
-browse console Dump captured console messages
-browse console --clear Clear console buffer
-browse network Dump captured network requests
-browse network --clear Clear network buffer
-browse cookies Dump all cookies as JSON
-browse storage localStorage + sessionStorage as JSON
-browse storage set Set localStorage value
-browse perf Page load performance timings
+### 9. Test dialogs
+```bash
+$B dialog-accept "yes" # set up handler
+$B click "#delete-button" # trigger dialog
+$B dialog # see what appeared
+$B snapshot -D # verify deletion happened
 ```
 
-### Visual
-```
-browse screenshot [path] Screenshot (default: /tmp/browse-screenshot.png)
-browse pdf [path] Save as PDF
-browse responsive [prefix] Screenshots at mobile/tablet/desktop
+### 10. Compare environments
+```bash
+$B diff https://staging.app.com https://prod.app.com
 ```
 
-### Compare
+## Snapshot Flags
+
 ```
-browse diff Text diff between two pages
+-i Interactive elements only (buttons, links, inputs)
+-c Compact (no empty structural nodes)
+-d Limit depth
+-s Scope to CSS selector
+-D Diff against previous snapshot
+-a Annotated screenshot with ref labels
+-o Output path for screenshot
+-C Cursor-interactive elements (@c refs)
 ```
 
-### Multi-step (chain)
-```
-echo '[["goto","https://example.com"],["snapshot","-i"],["click","@e1"],["screenshot","/tmp/result.png"]]' | browse chain
-```
+Combine: `$B snapshot -i -a -C -o /tmp/annotated.png`
 
-### Tabs
-```
-browse tabs List tabs (id, url, title)
-browse tab Switch to tab
-browse newtab [url] Open new tab
-browse closetab [id] Close tab
-```
+Use @refs after snapshot: `$B click @e3`, `$B fill @e4 "value"`, `$B click @c1`
 
-### Server management
-```
-browse status Server health, uptime, tab count
-browse stop Shutdown server
-browse restart Kill + restart server
-```
+## Full Command List
 
-## Speed Rules
-
-1. **Navigate once, query many times.** `goto` loads the page; then `text`, `js`, `css`, `screenshot` all run against the loaded page instantly.
-2. **Use `snapshot -i` for interaction.** Get refs for all interactive elements, then click/fill by ref. No need to guess CSS selectors.
-3. **Use `js` for precision.** `js "document.querySelector('.price').textContent"` is faster than parsing full page text.
-4. **Use `links` to survey.** Faster than `text` when you just need navigation structure.
-5. **Use `chain` for multi-step flows.** Avoids CLI overhead per step.
-6. **Use `responsive` for layout checks.** One command = 3 viewport screenshots.
-
-## When to Use What
-
-| Task | Commands |
-|------|----------|
-| Read a page | `goto ` then `text` |
-| Interact with elements | `snapshot -i` then `click @e3` |
-| Check if element exists | `js "!!document.querySelector('.thing')"` |
-| Extract specific data | `js "document.querySelector('.price').textContent"` |
-| Visual check | `screenshot /tmp/x.png` then Read the image |
-| Fill and submit form | `snapshot -i` → `fill @e4 "val"` → `click @e5` → `screenshot` |
-| Check CSS | `css "selector" "property"` or `css @e3 "property"` |
-| Inspect DOM | `html "selector"` or `attrs @e3` |
-| Debug console errors | `console` |
-| Check network requests | `network` |
-| Check local dev | `goto http://127.0.0.1:3000` |
-| Compare two pages | `diff ` |
-| Mobile layout check | `responsive /tmp/prefix` |
-| Multi-step flow | `echo '[...]' \| browse chain` |
-
-## Architecture
-
-- Persistent Chromium daemon on localhost (port 9400-9410)
-- Bearer token auth per session
-- State file: `/tmp/browse-server.json`
-- Console log: `/tmp/browse-console.log`
-- Network log: `/tmp/browse-network.log`
-- Auto-shutdown after 30 min idle
-- Chromium crash → server exits → auto-restarts on next command
+**Navigate:** goto, back, forward, reload, url
+**Read:** text, html, links, forms, accessibility
+**Snapshot:** snapshot (with flags above)
+**Interact:** click, fill, select, hover, type, press, scroll, wait, wait --networkidle, wait --load, viewport, upload, cookie-import, dialog-accept, dialog-dismiss
+**Inspect:** js, eval, css, attrs, is, console, console --errors, network, dialog, cookies, storage, perf
+**Visual:** screenshot, pdf, responsive
+**Compare:** diff
+**Multi-step:** chain (pipe JSON array)
+**Tabs:** tabs, tab, newtab, closetab
+**Server:** status, stop, restart
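The `chain` entry in the command list reads a JSON array of steps from stdin. In that array, each step is itself an array holding the command name followed by its arguments, which is the shape of the removed `browse chain` example. The Bash function below is a minimal sketch, and `chain_json` is a hypothetical helper that is not part of gstack. It builds that array from one quoted string per step. It splits each step on whitespace and performs no JSON escaping, so it assumes that no argument contains spaces, double quotes, or backslashes.

```bash
# Hypothetical helper, not part of gstack: build the JSON array that
# `$B chain` reads from stdin, one inner array per step: ["cmd","arg",...].
# Assumption: no word contains spaces, double quotes, or backslashes,
# because nothing here escapes JSON special characters.
chain_json() {
  local out="[" sep="" step word inner isep
  local -a words
  for step in "$@"; do
    # Split the step string on whitespace into command + arguments.
    read -r -a words <<< "$step"
    inner="[" isep=""
    for word in "${words[@]}"; do
      # Wrap each word in double quotes as a JSON string.
      inner+="${isep}\"${word}\""
      isep=","
    done
    out+="${sep}${inner}]"
    sep=","
  done
  printf '%s\n' "${out}]"
}
```

For example, `chain_json 'goto https://example.com' 'snapshot -i' 'click @e1' 'screenshot /tmp/result.png' | $B chain` sends the same four-step array as the removed example. Values that contain spaces, such as a multi-word `fill` string, still need to be written as JSON by hand.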