Mirror of https://github.com/garrytan/gstack.git, synced 2026-05-02 11:45:20 +02:00
Phase 2: Rewrite SKILL.md as QA playbook + command reference
Reorient SKILL.md files from raw command reference to QA-first playbook with 10 workflow patterns (test user flows, verify deployments, dogfood features, responsive layouts, file upload, forms, dialogs, compare pages). Compact command reference tables at the bottom.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
+29
-18
@@ -8,15 +8,16 @@ This document covers the command reference and internals of gstack's headless br
|----------|----------|----------|
| Navigate | `goto`, `back`, `forward`, `reload`, `url` | Get to a page |
| Read | `text`, `html`, `links`, `forms`, `accessibility` | Extract content |
| Snapshot | `snapshot [-i] [-c] [-d N] [-s sel]` | Get refs for interaction |
| Interact | `click`, `fill`, `select`, `hover`, `type`, `press`, `scroll`, `wait`, `viewport` | Use the page |
| Inspect | `js`, `eval`, `css`, `attrs`, `console`, `network`, `cookies`, `storage`, `perf` | Debug and verify |
| Snapshot | `snapshot [-i] [-c] [-d N] [-s sel] [-D] [-a] [-o] [-C]` | Get refs, diff, annotate |
| Interact | `click`, `fill`, `select`, `hover`, `type`, `press`, `scroll`, `wait`, `viewport`, `upload` | Use the page |
| Inspect | `js`, `eval`, `css`, `attrs`, `is`, `console`, `network`, `dialog`, `cookies`, `storage`, `perf` | Debug and verify |
| Visual | `screenshot`, `pdf`, `responsive` | See what Claude sees |
| Compare | `diff <url1> <url2>` | Spot differences between environments |
| Dialogs | `dialog-accept [text]`, `dialog-dismiss` | Control alert/confirm/prompt handling |
| Tabs | `tabs`, `tab`, `newtab`, `closetab` | Multi-page workflows |
| Multi-step | `chain` (JSON from stdin) | Batch commands in one call |

All selector arguments accept CSS selectors or `@ref` after `snapshot`. 40+ commands total.
All selector arguments accept CSS selectors, `@e` refs after `snapshot`, or `@c` refs after `snapshot -C`. 50+ commands total.

## How it works

@@ -60,11 +61,11 @@ browse/
│   ├── cli.ts               # Thin client — reads state file, sends HTTP, prints response
│   ├── server.ts            # Bun.serve HTTP server — routes commands to Playwright
│   ├── browser-manager.ts   # Chromium lifecycle — launch, tabs, ref map, crash handling
│   ├── snapshot.ts          # Accessibility tree → @ref assignment → Locator map
│   ├── read-commands.ts     # Non-mutating commands (text, html, links, js, css, etc.)
│   ├── write-commands.ts    # Mutating commands (click, fill, select, navigate, etc.)
│   ├── meta-commands.ts     # Server management (status, stop, restart)
│   └── buffers.ts           # Console + network log capture (in-memory + disk flush)
│   ├── snapshot.ts          # Accessibility tree → @ref assignment → Locator map + diff/annotate/-C
│   ├── read-commands.ts     # Non-mutating commands (text, html, links, js, css, is, dialog, etc.)
│   ├── write-commands.ts    # Mutating commands (click, fill, select, upload, dialog-accept, etc.)
│   ├── meta-commands.ts     # Server management, chain, diff, snapshot routing
│   └── buffers.ts           # CircularBuffer<T> + console/network/dialog capture
├── test/                    # Integration tests + HTML fixtures
└── dist/
    └── browse               # Compiled binary (~58MB, Bun --compile)
@@ -82,18 +83,28 @@ The browser's key innovation is ref-based element selection, built on Playwright

No DOM mutation. No injected scripts. Just Playwright's native accessibility API.

**Extended snapshot features:**
- `--diff` (`-D`): Stores each snapshot as a baseline. On the next `-D` call, returns a unified diff showing what changed. Use this to verify that an action (click, fill, etc.) actually worked.
- `--annotate` (`-a`): Injects temporary overlay divs at each ref's bounding box, takes a screenshot with ref labels visible, then removes the overlays. Use `-o <path>` to control the output path.
- `--cursor-interactive` (`-C`): Scans for non-ARIA interactive elements (divs with `cursor:pointer`, `onclick`, `tabindex>=0`) using `page.evaluate`. Assigns `@c1`, `@c2`... refs with deterministic `nth-child` CSS selectors. These are elements the ARIA tree misses but users can still click.
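
A minimal sketch of the baseline-then-diff behavior that `-D` describes, assuming snapshots are plain text with one element per line. The `SnapshotDiffer` class and its `diff` method are hypothetical names, not taken from `snapshot.ts`, and the comparison is a simple set-based line diff rather than a true unified diff.

```typescript
// Hypothetical sketch of baseline-and-diff behavior; not gstack's actual implementation.
class SnapshotDiffer {
  private baseline: string[] | null = null;

  // Store the new snapshot as the baseline and return what changed since the last call.
  diff(snapshot: string): string[] {
    const next = snapshot.split("\n");
    const prev = this.baseline;
    this.baseline = next;
    if (prev === null) return []; // first call: nothing to compare against

    const prevSet = new Set(prev);
    const nextSet = new Set(next);
    const removed = prev.filter((line) => !nextSet.has(line)).map((line) => `- ${line}`);
    const added = next.filter((line) => !prevSet.has(line)).map((line) => `+ ${line}`);
    return [...removed, ...added];
  }
}
```

Calling `diff` once before an action and once after it mirrors the baseline, action, diff pattern used in the workflows.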

### Authentication

Each server session generates a random UUID as a bearer token. The token is written to the state file (`/tmp/browse-server.json`) with chmod 600. Every HTTP request must include `Authorization: Bearer <token>`. This prevents other processes on the machine from controlling the browser.
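
The per-request check described here could look roughly like the sketch below. It assumes the session token has already been generated and read from the state file; `extractBearerToken` and `isAuthorized` are illustrative names, not functions from `server.ts`.

```typescript
// Hypothetical sketch of a bearer-token check; names are illustrative only.

// Pull the token out of an "Authorization: Bearer <token>" header value.
function extractBearerToken(header: string | null): string | null {
  if (header === null) return null;
  const match = /^Bearer (\S+)$/.exec(header);
  return match?.[1] ?? null;
}

// A request is authorized only when it presents exactly the session token.
function isAuthorized(header: string | null, sessionToken: string): boolean {
  const presented = extractBearerToken(header);
  return presented !== null && presented === sessionToken;
}
```

A server following this scheme would reject any request for which `isAuthorized` returns `false` before routing the command.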

### Console and network capture
### Console, network, and dialog capture

The server hooks into Playwright's `page.on('console')` and `page.on('response')` events. All entries are kept in memory and flushed to disk every second:
The server hooks into Playwright's `page.on('console')`, `page.on('response')`, and `page.on('dialog')` events. All entries are kept in O(1) circular buffers (50,000 capacity each) and flushed to disk asynchronously via `Bun.write()`:

- Console: `/tmp/browse-console.log`
- Network: `/tmp/browse-network.log`
- Dialog: `/tmp/browse-dialog.log`

The `console` and `network` commands read from the in-memory buffers, not disk.
The `console`, `network`, and `dialog` commands read from the in-memory buffers, not disk.
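
To illustrate what a fixed-capacity buffer with O(1) inserts means here, the following is a small ring-buffer sketch. It is an assumption about the general technique, not the `CircularBuffer<T>` in `buffers.ts`; the method names `push` and `toArray` are hypothetical.

```typescript
// Hypothetical ring buffer sketch; the real CircularBuffer<T> in buffers.ts may differ.
class CircularBuffer<T> {
  private readonly capacity: number;
  private readonly items: (T | undefined)[];
  private head = 0; // index of the oldest entry
  private count = 0; // number of stored entries

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.items = new Array<T | undefined>(capacity).fill(undefined);
  }

  // O(1): once the buffer is full, the newest entry overwrites the oldest.
  push(item: T): void {
    const tail = (this.head + this.count) % this.capacity;
    this.items[tail] = item;
    if (this.count < this.capacity) {
      this.count++;
    } else {
      this.head = (this.head + 1) % this.capacity;
    }
  }

  // Entries in order from oldest to newest.
  toArray(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.items[(this.head + i) % this.capacity] as T);
    }
    return out;
  }
}
```

With a capacity of 50,000, memory use stays bounded however many console, network, or dialog events a long session produces.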

### Dialog handling

Dialogs (alert, confirm, prompt) are auto-accepted by default to prevent browser lockup. The `dialog-accept` and `dialog-dismiss` commands control this behavior. For prompts, `dialog-accept <text>` provides the response text. All dialogs are logged to the dialog buffer with type, message, and action taken.
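
One way to picture this policy is as a pure function from the current setting and an incoming dialog to a log entry. The types and the `resolveDialog` function below are hypothetical and only model the behavior described above; they are not the handler the server registers with Playwright.

```typescript
// Hypothetical model of the dialog policy; not gstack's actual handler.
type DialogKind = "alert" | "confirm" | "prompt";

type DialogPolicy =
  | { mode: "accept"; promptText?: string } // set by dialog-accept [text]
  | { mode: "dismiss" }; // set by dialog-dismiss

interface DialogLogEntry {
  type: DialogKind;
  message: string;
  action: "accepted" | "dismissed";
  response?: string; // text supplied to a prompt, if any
}

// Auto-accept is the default, which keeps the browser from locking up.
const defaultPolicy: DialogPolicy = { mode: "accept" };

// Decide what happens to a dialog and build the entry that would be logged.
function resolveDialog(policy: DialogPolicy, type: DialogKind, message: string): DialogLogEntry {
  if (policy.mode === "dismiss") {
    return { type, message, action: "dismissed" };
  }
  if (type === "prompt" && policy.promptText !== undefined) {
    return { type, message, action: "accepted", response: policy.promptText };
  }
  return { type, message, action: "accepted" };
}
```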

### Multi-workspace support

@@ -184,7 +195,7 @@ bun test browse/test/commands # run command integration tests only
bun test browse/test/snapshot # run snapshot tests only
```

Tests spin up a local HTTP server (`browse/test/test-server.ts`) serving HTML fixtures from `browse/test/fixtures/`, then exercise the CLI commands against those pages. Tests take ~3 seconds.
Tests spin up a local HTTP server (`browse/test/test-server.ts`) serving HTML fixtures from `browse/test/fixtures/`, then exercise the CLI commands against those pages. 148 tests across 2 files, ~15 seconds total.

### Source map

@@ -193,11 +204,11 @@ Tests spin up a local HTTP server (`browse/test/test-server.ts`) serving HTML fi
| `browse/src/cli.ts` | Entry point. Reads `/tmp/browse-server.json`, sends HTTP to the server, prints response. |
| `browse/src/server.ts` | Bun HTTP server. Routes commands to the right handler. Manages idle timeout. |
| `browse/src/browser-manager.ts` | Chromium lifecycle — launch, tab management, ref map, crash detection. |
| `browse/src/snapshot.ts` | Parses Playwright's accessibility tree, assigns `@ref` labels, builds Locator map. |
| `browse/src/read-commands.ts` | Non-mutating commands: `text`, `html`, `links`, `js`, `css`, `forms`, etc. |
| `browse/src/write-commands.ts` | Mutating commands: `goto`, `click`, `fill`, `select`, `scroll`, etc. |
| `browse/src/meta-commands.ts` | Server management: `status`, `stop`, `restart`. |
| `browse/src/buffers.ts` | In-memory + disk capture for console and network logs. |
| `browse/src/snapshot.ts` | Parses accessibility tree, assigns `@e`/`@c` refs, builds Locator map. Handles `--diff`, `--annotate`, `-C`. |
| `browse/src/read-commands.ts` | Non-mutating commands: `text`, `html`, `links`, `js`, `css`, `is`, `dialog`, `forms`, etc. Exports `getCleanText()`. |
| `browse/src/write-commands.ts` | Mutating commands: `goto`, `click`, `fill`, `upload`, `dialog-accept`, `useragent` (with context recreation), etc. |
| `browse/src/meta-commands.ts` | Server management, chain routing, diff (DRY via `getCleanText`), snapshot delegation. |
| `browse/src/buffers.ts` | `CircularBuffer<T>` (O(1) ring buffer) + console/network/dialog capture with async disk flush. |

### Deploying to the active skill

@@ -1,254 +1,324 @@
---
name: gstack
version: 1.0.0
version: 1.1.0
description: |
  Fast web browsing for Claude Code via persistent headless Chromium daemon. Navigate to any URL,
  read page content, click elements, fill forms, run JavaScript, take screenshots,
  inspect CSS/DOM, capture console/network logs, and more. ~100ms per command after
  first call. Use when you need to check a website, verify a deployment, read docs,
  or interact with any web page. No MCP, no Chrome extension — just fast CLI.
  Fast headless browser for QA testing and site dogfooding. Navigate any URL, interact with
  elements, verify page state, diff before/after actions, take annotated screenshots, check
  responsive layouts, test forms and uploads, handle dialogs, and assert element states.
  ~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
  user flow, or file a bug with evidence.
allowed-tools:
  - Bash
  - Read

---

# gstack: Persistent Browser for Claude Code
# gstack browse: QA Testing & Dogfooding

Persistent headless Chromium daemon. First call auto-starts the server (~3s).
Every subsequent call: ~100-200ms. Auto-shuts down after 30 min idle.
Persistent headless Chromium. First call auto-starts (~3s), then ~100-200ms per command.
Auto-shuts down after 30 min idle. State persists between calls (cookies, tabs, sessions).

## SETUP (run this check BEFORE any browse command)

Before using any browse command, find the skill and check if the binary exists:

```bash
# Check project-level first, then user-level
if test -x .claude/skills/gstack/browse/dist/browse; then
  echo "READY_PROJECT"
elif test -x ~/.claude/skills/gstack/browse/dist/browse; then
  echo "READY_USER"
B=$(browse/bin/find-browse 2>/dev/null || ~/.claude/skills/gstack/browse/bin/find-browse 2>/dev/null)
if [ -n "$B" ]; then
  echo "READY: $B"
else
  echo "NEEDS_SETUP"
fi
```

Set `B` to whichever path is READY and use it for all commands. Prefer project-level if both exist.

If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait for their response.
2. If they approve, determine the skill directory (project-level `.claude/skills/gstack` or user-level `~/.claude/skills/gstack`) and run:
```bash
cd <SKILL_DIR> && ./setup
```
3. If `bun` is not installed, tell the user to install it: `curl -fsSL https://bun.sh/install | bash`
4. Verify the `.gitignore` in the skill directory contains `browse/dist/` and `node_modules/`. If either line is missing, add it.

Once setup is done, it never needs to run again (the compiled binary persists).
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`

## IMPORTANT

- Use the compiled binary via Bash: `.claude/skills/gstack/browse/dist/browse` (project) or `~/.claude/skills/gstack/browse/dist/browse` (user).
- Use the compiled binary via Bash: `$B <command>`
- NEVER use `mcp__claude-in-chrome__*` tools. They are slow and unreliable.
- The browser persists between calls — cookies, tabs, and state carry over.
- The server auto-starts on first command. No setup needed.
- Browser persists between calls — cookies, login sessions, and tabs carry over.
- Dialogs (alert/confirm/prompt) are auto-accepted by default — no browser lockup.

## Quick Reference
## QA Workflows

### Test a user flow (login, signup, checkout, etc.)

```bash
B=~/.claude/skills/gstack/browse/dist/browse

# Navigate to a page
$B goto https://example.com
# 1. Go to the page
$B goto https://app.example.com/login

# Read cleaned page text
$B text

# Take a screenshot (then Read the image)
$B screenshot /tmp/page.png

# Snapshot: accessibility tree with refs
# 2. See what's interactive
$B snapshot -i

# Click by ref (after snapshot)
$B click @e3
# 3. Fill the form using refs
$B fill @e3 "test@example.com"
$B fill @e4 "password123"
$B click @e5

# Fill by ref
$B fill @e4 "test@test.com"

# Run JavaScript
$B js "document.title"

# Get all links
$B links

# Click by CSS selector
$B click "button.submit"

# Fill a form by CSS selector
$B fill "#email" "test@test.com"
$B fill "#password" "abc123"
$B click "button[type=submit]"

# Get HTML of an element
$B html "main"

# Get computed CSS
$B css "body" "font-family"

# Get element attributes
$B attrs "nav"

# Wait for element to appear
$B wait ".loaded"

# Accessibility tree
$B accessibility

# Set viewport
$B viewport 375x812

# Set cookies / headers
$B cookie "session=abc123"
$B header "Authorization:Bearer token123"
# 4. Verify it worked
$B snapshot -D  # diff shows what changed after clicking
$B is visible ".dashboard"  # assert the dashboard appeared
$B screenshot /tmp/after-login.png
```

## Command Reference
### Verify a deployment / check prod

### Navigation
```
browse goto <url>               Navigate current tab
browse back                     Go back
browse forward                  Go forward
browse reload                   Reload page
browse url                      Print current URL
```bash
$B goto https://yourapp.com
$B text  # read the page — does it load?
$B console  # any JS errors?
$B network  # any failed requests?
$B js "document.title"  # correct title?
$B is visible ".hero-section"  # key elements present?
$B screenshot /tmp/prod-check.png
```

### Content extraction
```
browse text                     Cleaned page text (no scripts/styles)
browse html [selector]          innerHTML of element, or full page HTML
browse links                    All links as "text → href"
browse forms                    All forms + fields as JSON
browse accessibility            Accessibility tree snapshot (ARIA)
### Dogfood a feature end-to-end

```bash
# Navigate to the feature
$B goto https://app.example.com/new-feature

# Take annotated screenshot — shows every interactive element with labels
$B snapshot -i -a -o /tmp/feature-annotated.png

# Find ALL clickable things (including divs with cursor:pointer)
$B snapshot -C

# Walk through the flow
$B snapshot -i  # baseline
$B click @e3  # interact
$B snapshot -D  # what changed? (unified diff)

# Check element states
$B is visible ".success-toast"
$B is enabled "#next-step-btn"
$B is checked "#agree-checkbox"

# Check console for errors after interactions
$B console
```

### Snapshot (ref-based element selection)
```
browse snapshot                 Full accessibility tree with @refs
browse snapshot -i              Interactive elements only (buttons, links, inputs)
browse snapshot -c              Compact (no empty structural elements)
browse snapshot -d <N>          Limit depth to N levels
browse snapshot -s <sel>        Scope to CSS selector
### Test responsive layouts

```bash
# Quick: 3 screenshots at mobile/tablet/desktop
$B goto https://yourapp.com
$B responsive /tmp/layout

# Manual: specific viewport
$B viewport 375x812  # iPhone
$B screenshot /tmp/mobile.png
$B viewport 1440x900  # Desktop
$B screenshot /tmp/desktop.png
```

After snapshot, use @refs as selectors in any command:
### Test file upload

```bash
$B goto https://app.example.com/upload
$B snapshot -i
$B upload @e3 /path/to/test-file.pdf
$B is visible ".upload-success"
$B screenshot /tmp/upload-result.png
```
browse click @e3                Click the element assigned ref @e3
browse fill @e4 "value"         Fill the input assigned ref @e4
browse hover @e1                Hover the element assigned ref @e1
browse html @e2                 Get innerHTML of ref @e2
browse css @e5 "color"          Get computed CSS of ref @e5
browse attrs @e6                Get attributes of ref @e6

### Test forms with validation

```bash
$B goto https://app.example.com/form
$B snapshot -i

# Submit empty — check validation errors appear
$B click @e10  # submit button
$B snapshot -D  # diff shows error messages appeared
$B is visible ".error-message"

# Fill and resubmit
$B fill @e3 "valid input"
$B click @e10
$B snapshot -D  # diff shows errors gone, success state
```

### Test dialogs (delete confirmations, prompts)

```bash
# Set up dialog handling BEFORE triggering
$B dialog-accept  # will auto-accept next alert/confirm
$B click "#delete-button"  # triggers confirmation dialog
$B dialog  # see what dialog appeared
$B snapshot -D  # verify the item was deleted

# For prompts that need input
$B dialog-accept "my answer"  # accept with text
$B click "#rename-button"  # triggers prompt
```

### Compare two pages / environments

```bash
$B diff https://staging.app.com https://prod.app.com
```

### Multi-step chain (efficient for long flows)

```bash
echo '[
  ["goto","https://app.example.com"],
  ["snapshot","-i"],
  ["fill","@e3","test@test.com"],
  ["fill","@e4","password"],
  ["click","@e5"],
  ["snapshot","-D"],
  ["screenshot","/tmp/result.png"]
]' | $B chain
```
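
The chain payload shown above is a JSON array of `[command, ...args]` arrays. A receiver might validate it along these lines before running anything; `parseChain` is a hypothetical helper written for illustration, not the routing code in `meta-commands.ts`.

```typescript
// Hypothetical validator for a chain payload: an array of [command, ...args] string arrays.
function parseChain(input: string): string[][] {
  const parsed: unknown = JSON.parse(input);
  if (!Array.isArray(parsed)) {
    throw new Error("chain input must be a JSON array");
  }
  return parsed.map((step: unknown, index: number) => {
    const valid =
      Array.isArray(step) &&
      step.length > 0 &&
      step.every((part: unknown) => typeof part === "string");
    if (!valid) {
      throw new Error(`step ${index} must be a non-empty array of strings`);
    }
    return step as string[];
  });
}
```

Validating the whole payload up front means a malformed step is reported before the first command touches the page.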

## Quick Assertion Patterns

```bash
# Element exists and is visible
$B is visible ".modal"

# Button is enabled/disabled
$B is enabled "#submit-btn"
$B is disabled "#submit-btn"

# Checkbox state
$B is checked "#agree"

# Input is editable
$B is editable "#name-field"

# Element has focus
$B is focused "#search-input"

# Page contains text
$B js "document.body.textContent.includes('Success')"

# Element count
$B js "document.querySelectorAll('.list-item').length"

# Specific attribute value
$B attrs "#logo"  # returns all attributes as JSON

# CSS property
$B css ".button" "background-color"
```

## Snapshot System

The snapshot is your primary tool for understanding and interacting with pages.

```bash
$B snapshot -i  # Interactive elements only (buttons, links, inputs) with @e refs
$B snapshot -c  # Compact (no empty structural elements)
$B snapshot -d 3  # Limit depth to 3 levels
$B snapshot -s "main"  # Scope to CSS selector
$B snapshot -D  # Diff against previous snapshot (what changed?)
$B snapshot -a  # Annotated screenshot with ref labels
$B snapshot -o /tmp/x.png  # Output path for annotated screenshot
$B snapshot -C  # Cursor-interactive elements (@c refs — divs with pointer, onclick)
```

Combine flags: `$B snapshot -i -a -C -o /tmp/annotated.png`

After snapshot, use @refs everywhere:
```bash
$B click @e3       $B fill @e4 "value"     $B hover @e1
$B html @e2        $B css @e5 "color"      $B attrs @e6
$B click @c1  # cursor-interactive ref (from -C)
```

Refs are invalidated on navigation — run `snapshot` again after `goto`.
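
The invalidation rule can be pictured as a map that is cleared whenever the page changes. The `RefMap` class below is a hypothetical sketch under that assumption; the actual ref map lives in the browser manager and stores Playwright Locators rather than a generic `Target`.

```typescript
// Hypothetical ref map: refs such as "@e3" point at whatever the last snapshot found.
class RefMap<Target> {
  private refs = new Map<string, Target>();

  // Replace all refs with those from a fresh snapshot.
  load(entries: Iterable<[string, Target]>): void {
    this.refs = new Map(entries);
  }

  // Called on navigation: old refs no longer describe the page.
  invalidate(): void {
    this.refs.clear();
  }

  // Look up a ref, failing loudly when it is stale or unknown.
  resolve(ref: string): Target {
    const target = this.refs.get(ref);
    if (target === undefined) {
      throw new Error(`Unknown ref ${ref}: run snapshot again`);
    }
    return target;
  }
}
```

Failing with an explicit error, rather than acting on a stale element, is what makes re-running `snapshot` after `goto` the safe habit.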

## Command Reference

### Navigation
| Command | Description |
|---------|-------------|
| `goto <url>` | Navigate to URL |
| `back` / `forward` | History navigation |
| `reload` | Reload page |
| `url` | Print current URL |

### Reading
| Command | Description |
|---------|-------------|
| `text` | Cleaned page text |
| `html [selector]` | innerHTML |
| `links` | All links as "text -> href" |
| `forms` | Forms + fields as JSON |
| `accessibility` | Full ARIA tree |

### Interaction
```
browse click <selector>         Click element (CSS selector or @ref)
browse fill <selector> <value>  Fill input field
browse select <selector> <val>  Select dropdown value
browse hover <selector>         Hover over element
browse type <text>              Type into focused element
browse press <key>              Press key (Enter, Tab, Escape, etc.)
browse scroll [selector]        Scroll element into view, or page bottom
browse wait <selector>          Wait for element to appear (max 10s)
browse viewport <WxH>           Set viewport size (e.g. 375x812)
```
| Command | Description |
|---------|-------------|
| `click <sel>` | Click element |
| `fill <sel> <val>` | Fill input |
| `select <sel> <val>` | Select dropdown |
| `hover <sel>` | Hover element |
| `type <text>` | Type into focused element |
| `press <key>` | Press key (Enter, Tab, Escape) |
| `scroll [sel]` | Scroll element into view |
| `wait <sel>` | Wait for element (max 10s) |
| `wait --networkidle` | Wait for network to be idle |
| `wait --load` | Wait for page load event |
| `upload <sel> <file...>` | Upload file(s) |
| `cookie-import <json>` | Import cookies from JSON file |
| `dialog-accept [text]` | Auto-accept dialogs |
| `dialog-dismiss` | Auto-dismiss dialogs |
| `viewport <WxH>` | Set viewport size |

### Inspection
```
browse js <expression>          Run JS, print result
browse eval <js-file>           Run JS file against page
browse css <selector> <prop>    Get computed CSS property
browse attrs <selector>         Get element attributes as JSON
browse console                  Dump captured console messages
browse console --clear          Clear console buffer
browse network                  Dump captured network requests
browse network --clear          Clear network buffer
browse cookies                  Dump all cookies as JSON
browse storage                  localStorage + sessionStorage as JSON
browse storage set <key> <val>  Set localStorage value
browse perf                     Page load performance timings
```
| Command | Description |
|---------|-------------|
| `js <expr>` | Run JavaScript |
| `eval <file>` | Run JS file |
| `css <sel> <prop>` | Computed CSS |
| `attrs <sel>` | Element attributes |
| `is <prop> <sel>` | State check (visible/hidden/enabled/disabled/checked/editable/focused) |
| `console [--clear\|--errors]` | Console messages (--errors filters to error/warning) |
| `network [--clear]` | Network requests |
| `dialog [--clear]` | Dialog messages |
| `cookies` | All cookies |
| `storage` | localStorage + sessionStorage |
| `perf` | Page load timings |

### Visual
```
browse screenshot [path]        Screenshot (default: /tmp/browse-screenshot.png)
browse pdf [path]               Save as PDF
browse responsive [prefix]      Screenshots at mobile/tablet/desktop
```

### Compare
```
browse diff <url1> <url2>       Text diff between two pages
```

### Multi-step (chain)
```
echo '[["goto","https://example.com"],["snapshot","-i"],["click","@e1"],["screenshot","/tmp/result.png"]]' | browse chain
```
| Command | Description |
|---------|-------------|
| `screenshot [path]` | Screenshot |
| `pdf [path]` | Save as PDF |
| `responsive [prefix]` | Mobile/tablet/desktop screenshots |
| `diff <url1> <url2>` | Text diff between pages |

### Tabs
```
browse tabs                     List tabs (id, url, title)
browse tab <id>                 Switch to tab
browse newtab [url]             Open new tab
browse closetab [id]            Close tab
```
| Command | Description |
|---------|-------------|
| `tabs` | List tabs |
| `tab <id>` | Switch tab |
| `newtab [url]` | Open tab |
| `closetab [id]` | Close tab |

### Server management
```
browse status                   Server health, uptime, tab count
browse stop                     Shutdown server
browse restart                  Kill + restart server
```
### Server
| Command | Description |
|---------|-------------|
| `status` | Health check |
| `stop` | Shutdown |
| `restart` | Restart |

## Speed Rules
## Tips

1. **Navigate once, query many times.** `goto` loads the page; then `text`, `js`, `css`, `screenshot` all run against the loaded page instantly.
2. **Use `snapshot -i` for interaction.** Get refs for all interactive elements, then click/fill by ref. No need to guess CSS selectors.
3. **Use `js` for precision.** `js "document.querySelector('.price').textContent"` is faster than parsing full page text.
4. **Use `links` to survey.** Faster than `text` when you just need navigation structure.
5. **Use `chain` for multi-step flows.** Avoids CLI overhead per step.
6. **Use `responsive` for layout checks.** One command = 3 viewport screenshots.

## When to Use What

| Task | Commands |
|------|----------|
| Read a page | `goto <url>` then `text` |
| Interact with elements | `snapshot -i` then `click @e3` |
| Check if element exists | `js "!!document.querySelector('.thing')"` |
| Extract specific data | `js "document.querySelector('.price').textContent"` |
| Visual check | `screenshot /tmp/x.png` then Read the image |
| Fill and submit form | `snapshot -i` → `fill @e4 "val"` → `click @e5` → `screenshot` |
| Check CSS | `css "selector" "property"` or `css @e3 "property"` |
| Inspect DOM | `html "selector"` or `attrs @e3` |
| Debug console errors | `console` |
| Check network requests | `network` |
| Check local dev | `goto http://127.0.0.1:3000` |
| Compare two pages | `diff <url1> <url2>` |
| Mobile layout check | `responsive /tmp/prefix` |
| Multi-step flow | `echo '[...]' \| browse chain` |

## Architecture

- Persistent Chromium daemon on localhost (port 9400-9410)
- Bearer token auth per session
- State file: `/tmp/browse-server.json`
- Console log: `/tmp/browse-console.log`
- Network log: `/tmp/browse-network.log`
- Auto-shutdown after 30 min idle
- Chromium crash → server exits → auto-restarts on next command
1. **Navigate once, query many times.** `goto` loads the page; then `text`, `js`, `screenshot` all hit the loaded page instantly.
2. **Use `snapshot -i` first.** See all interactive elements, then click/fill by ref. No CSS selector guessing.
3. **Use `snapshot -D` to verify.** Baseline → action → diff. See exactly what changed.
4. **Use `is` for assertions.** `is visible .modal` is faster and more reliable than parsing page text.
5. **Use `snapshot -a` for evidence.** Annotated screenshots are great for bug reports.
6. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses.
7. **Check `console` after actions.** Catch JS errors that don't surface visually.
8. **Use `chain` for long flows.** Single command, no per-step CLI overhead.

+87
-213
@@ -1,254 +1,128 @@
---
name: browse
version: 1.0.0
version: 1.1.0
description: |
  Fast web browsing for Claude Code via persistent headless Chromium daemon. Navigate to any URL,
  read page content, click elements, fill forms, run JavaScript, take screenshots,
  inspect CSS/DOM, capture console/network logs, and more. ~100ms per command after
  first call. Use when you need to check a website, verify a deployment, read docs,
  or interact with any web page. No MCP, no Chrome extension — just fast CLI.
  Fast headless browser for QA testing and site dogfooding. Navigate any URL, interact with
  elements, verify page state, diff before/after actions, take annotated screenshots, check
  responsive layouts, test forms and uploads, handle dialogs, and assert element states.
  ~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
  user flow, or file a bug with evidence.
allowed-tools:
  - Bash
  - Read

---

# gstack: Persistent Browser for Claude Code
# browse: QA Testing & Dogfooding

Persistent headless Chromium daemon. First call auto-starts the server (~3s).
Every subsequent call: ~100-200ms. Auto-shuts down after 30 min idle.
Persistent headless Chromium. First call auto-starts (~3s), then ~100ms per command.
State persists between calls (cookies, tabs, login sessions).

## SETUP (run this check BEFORE any browse command)

Before using any browse command, find the skill and check if the binary exists:
## Core QA Patterns

### 1. Verify a page loads correctly
```bash
# Check project-level first, then user-level
if test -x .claude/skills/gstack/browse/dist/browse; then
  echo "READY_PROJECT"
elif test -x ~/.claude/skills/gstack/browse/dist/browse; then
  echo "READY_USER"
else
  echo "NEEDS_SETUP"
fi
$B goto https://yourapp.com
$B text  # content loads?
$B console  # JS errors?
$B network  # failed requests?
$B is visible ".main-content"  # key elements present?
```

Set `B` to whichever path is READY and use it for all commands. Prefer project-level if both exist.

If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait for their response.
2. If they approve, determine the skill directory (project-level `.claude/skills/gstack` or user-level `~/.claude/skills/gstack`) and run:
### 2. Test a user flow
```bash
cd <SKILL_DIR> && ./setup
$B goto https://app.com/login
$B snapshot -i  # see all interactive elements
$B fill @e3 "user@test.com"
$B fill @e4 "password"
$B click @e5  # submit
$B snapshot -D  # diff: what changed after submit?
$B is visible ".dashboard"  # success state present?
```
3. If `bun` is not installed, tell the user to install it: `curl -fsSL https://bun.sh/install | bash`
4. Verify the `.gitignore` in the skill directory contains `browse/dist/` and `node_modules/`. If either line is missing, add it.

Once setup is done, it never needs to run again (the compiled binary persists).

## IMPORTANT

- Use the compiled binary via Bash: `.claude/skills/gstack/browse/dist/browse` (project) or `~/.claude/skills/gstack/browse/dist/browse` (user).
- NEVER use `mcp__claude-in-chrome__*` tools. They are slow and unreliable.
- The browser persists between calls — cookies, tabs, and state carry over.
- The server auto-starts on first command. No setup needed.

## Quick Reference

### 3. Verify an action worked
```bash
|
||||
B=~/.claude/skills/gstack/browse/dist/browse
|
||||
|
||||
# Navigate to a page
|
||||
$B goto https://example.com
|
||||
|
||||
# Read cleaned page text
|
||||
$B text
|
||||
|
||||
# Take a screenshot (then Read the image)
|
||||
$B screenshot /tmp/page.png
|
||||
|
||||
# Snapshot: accessibility tree with refs
|
||||
$B snapshot -i
|
||||
|
||||
# Click by ref (after snapshot)
|
||||
$B click @e3
|
||||
|
||||
# Fill by ref
|
||||
$B fill @e4 "test@test.com"
|
||||
|
||||
# Run JavaScript
|
||||
$B js "document.title"
|
||||
|
||||
# Get all links
|
||||
$B links
|
||||
|
||||
# Click by CSS selector
|
||||
$B click "button.submit"
|
||||
|
||||
# Fill a form by CSS selector
|
||||
$B fill "#email" "test@test.com"
|
||||
$B fill "#password" "abc123"
|
||||
$B click "button[type=submit]"
|
||||
|
||||
# Get HTML of an element
|
||||
$B html "main"
|
||||
|
||||
# Get computed CSS
|
||||
$B css "body" "font-family"
|
||||
|
||||
# Get element attributes
|
||||
$B attrs "nav"
|
||||
|
||||
# Wait for element to appear
|
||||
$B wait ".loaded"
|
||||
|
||||
# Accessibility tree
|
||||
$B accessibility
|
||||
|
||||
# Set viewport
|
||||
$B viewport 375x812
|
||||
|
||||
# Set cookies / headers
|
||||
$B cookie "session=abc123"
|
||||
$B header "Authorization:Bearer token123"
|
||||
$B snapshot # baseline
|
||||
$B click @e3 # do something
|
||||
$B snapshot -D # unified diff shows exactly what changed
|
||||
```
|
||||
|
||||
## Command Reference
|
||||
|
||||
### Navigation
|
||||
```
|
||||
browse goto <url> Navigate current tab
|
||||
browse back Go back
|
||||
browse forward Go forward
|
||||
browse reload Reload page
|
||||
browse url Print current URL
|
||||
### 4. Visual evidence for bug reports
|
||||
```bash
|
||||
$B snapshot -i -a -o /tmp/annotated.png # labeled screenshot
|
||||
$B screenshot /tmp/bug.png # plain screenshot
|
||||
$B console # error log
|
||||
```
|
||||
|
||||
### Content extraction
|
||||
```
|
||||
browse text Cleaned page text (no scripts/styles)
|
||||
browse html [selector] innerHTML of element, or full page HTML
|
||||
browse links All links as "text → href"
|
||||
browse forms All forms + fields as JSON
|
||||
browse accessibility Accessibility tree snapshot (ARIA)
|
||||
### 5. Find all clickable elements (including non-ARIA)
|
||||
```bash
|
||||
$B snapshot -C # finds divs with cursor:pointer, onclick, tabindex
|
||||
$B click @c1 # interact with them
|
||||
```
|
||||
|
||||
### Snapshot (ref-based element selection)
|
||||
```
|
||||
browse snapshot Full accessibility tree with @refs
|
||||
browse snapshot -i Interactive elements only (buttons, links, inputs)
|
||||
browse snapshot -c Compact (no empty structural elements)
|
||||
browse snapshot -d <N> Limit depth to N levels
|
||||
browse snapshot -s <sel> Scope to CSS selector
|
||||
### 6. Assert element states
|
||||
```bash
|
||||
$B is visible ".modal"
|
||||
$B is enabled "#submit-btn"
|
||||
$B is disabled "#submit-btn"
|
||||
$B is checked "#agree-checkbox"
|
||||
$B is editable "#name-field"
|
||||
$B is focused "#search-input"
|
||||
$B js "document.body.textContent.includes('Success')"
|
||||
```
|
||||
|
||||
After snapshot, use @refs as selectors in any command:
|
||||
```
|
||||
browse click @e3 Click the element assigned ref @e3
|
||||
browse fill @e4 "value" Fill the input assigned ref @e4
|
||||
browse hover @e1 Hover the element assigned ref @e1
|
||||
browse html @e2 Get innerHTML of ref @e2
|
||||
browse css @e5 "color" Get computed CSS of ref @e5
|
||||
browse attrs @e6 Get attributes of ref @e6
|
||||
### 7. Test responsive layouts
|
||||
```bash
|
||||
$B responsive /tmp/layout # mobile + tablet + desktop screenshots
|
||||
$B viewport 375x812 # or set specific viewport
|
||||
$B screenshot /tmp/mobile.png
|
||||
```
|
||||
|
||||
Refs are invalidated on navigation — run `snapshot` again after `goto`.
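
Because refs do not survive navigation, it can help to pair every `goto` with a fresh `snapshot -i`. The wrapper below is a sketch under assumptions: `goto_and_snapshot` is a hypothetical name, and the `&&` assumes the browse binary exits non-zero when `goto` fails, which the text here does not confirm.

```bash
# Hypothetical helper: navigate, then re-snapshot so @refs match the new page.
# $1 is the browse command to run (e.g. "$B"), $2 is the URL to open.
# Assumption: the browse command exits non-zero on a failed goto.
goto_and_snapshot() {
  "$1" goto "$2" && "$1" snapshot -i
}
```

Usage would look like `goto_and_snapshot "$B" https://app.com/login`, followed by `$B click @e3` using a ref taken from the snapshot output just printed.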
|
||||
|
||||
### Interaction
|
||||
```
|
||||
browse click <selector> Click element (CSS selector or @ref)
|
||||
browse fill <selector> <value> Fill input field
|
||||
browse select <selector> <val> Select dropdown value
|
||||
browse hover <selector> Hover over element
|
||||
browse type <text> Type into focused element
|
||||
browse press <key> Press key (Enter, Tab, Escape, etc.)
|
||||
browse scroll [selector] Scroll element into view, or page bottom
|
||||
browse wait <selector> Wait for element to appear (max 10s)
|
||||
browse viewport <WxH> Set viewport size (e.g. 375x812)
|
||||
### 8. Test file uploads
|
||||
```bash
|
||||
$B upload "#file-input" /path/to/file.pdf
|
||||
$B is visible ".upload-success"
|
||||
```
|
||||
|
||||
### Inspection
|
||||
```
|
||||
browse js <expression> Run JS, print result
|
||||
browse eval <js-file> Run JS file against page
|
||||
browse css <selector> <prop> Get computed CSS property
|
||||
browse attrs <selector> Get element attributes as JSON
|
||||
browse console Dump captured console messages
|
||||
browse console --clear Clear console buffer
|
||||
browse network Dump captured network requests
|
||||
browse network --clear Clear network buffer
|
||||
browse cookies Dump all cookies as JSON
|
||||
browse storage localStorage + sessionStorage as JSON
|
||||
browse storage set <key> <val> Set localStorage value
|
||||
browse perf Page load performance timings
|
||||
### 9. Test dialogs
|
||||
```bash
|
||||
$B dialog-accept "yes" # set up handler
|
||||
$B click "#delete-button" # trigger dialog
|
||||
$B dialog # see what appeared
|
||||
$B snapshot -D # verify deletion happened
|
||||
```
|
||||
|
||||
### Visual
|
||||
```
|
||||
browse screenshot [path] Screenshot (default: /tmp/browse-screenshot.png)
|
||||
browse pdf [path] Save as PDF
|
||||
browse responsive [prefix] Screenshots at mobile/tablet/desktop
|
||||
### 10. Compare environments
|
||||
```bash
|
||||
$B diff https://staging.app.com https://prod.app.com
|
||||
```
|
||||
|
||||
### Compare
|
||||
## Snapshot Flags
|
||||
|
||||
```
|
||||
browse diff <url1> <url2> Text diff between two pages
|
||||
-i Interactive elements only (buttons, links, inputs)
|
||||
-c Compact (no empty structural nodes)
|
||||
-d <N> Limit depth
|
||||
-s <sel> Scope to CSS selector
|
||||
-D Diff against previous snapshot
|
||||
-a Annotated screenshot with ref labels
|
||||
-o <path> Output path for screenshot
|
||||
-C Cursor-interactive elements (@c refs)
|
||||
```
|
||||
|
||||
### Multi-step (chain)
|
||||
```
|
||||
echo '[["goto","https://example.com"],["snapshot","-i"],["click","@e1"],["screenshot","/tmp/result.png"]]' | browse chain
|
||||
```
|
||||
Flags can be combined in a single call: `$B snapshot -i -a -C -o /tmp/annotated.png`
|
||||
|
||||
### Tabs
|
||||
```
|
||||
browse tabs List tabs (id, url, title)
|
||||
browse tab <id> Switch to tab
|
||||
browse newtab [url] Open new tab
|
||||
browse closetab [id] Close tab
|
||||
```
|
||||
Use @refs after snapshot: `$B click @e3`, `$B fill @e4 "value"`, `$B click @c1`
|
||||
|
||||
### Server management
|
||||
```
|
||||
browse status Server health, uptime, tab count
|
||||
browse stop Shutdown server
|
||||
browse restart Kill + restart server
|
||||
```
|
||||
## Full Command List
|
||||
|
||||
## Speed Rules
|
||||
|
||||
1. **Navigate once, query many times.** `goto` loads the page; then `text`, `js`, `css`, `screenshot` all run against the loaded page instantly.
|
||||
2. **Use `snapshot -i` for interaction.** Get refs for all interactive elements, then click/fill by ref. No need to guess CSS selectors.
|
||||
3. **Use `js` for precision.** `js "document.querySelector('.price').textContent"` is faster than parsing full page text.
|
||||
4. **Use `links` to survey.** Faster than `text` when you just need navigation structure.
|
||||
5. **Use `chain` for multi-step flows.** Avoids CLI overhead per step.
|
||||
6. **Use `responsive` for layout checks.** One command = 3 viewport screenshots.
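
Rule 5 can be sketched as a payload kept in a shell function and piped to `chain` in one call. The function name `login_flow`, the URL, and the ref numbers are illustrative assumptions; real refs come from the `snapshot -i` output. The sketch also assumes `chain` parses stdin as standard JSON, so the multi-line layout is equivalent to the one-line form.

```bash
# Emit a chain payload: a JSON array of [command, arg, ...] arrays.
# Refs (@e3, @e5) are placeholders; real ones come from the snapshot output.
login_flow() {
  cat <<'JSON'
[
  ["goto", "https://app.com/login"],
  ["snapshot", "-i"],
  ["fill", "@e3", "user@test.com"],
  ["click", "@e5"],
  ["screenshot", "/tmp/result.png"]
]
JSON
}

# Send the whole flow in one call, but only if B points at an executable.
if [ -n "${B:-}" ] && [ -x "$B" ]; then
  login_flow | "$B" chain
fi
```

Keeping the payload in a function makes a flow easy to review and rerun, and the guard leaves the script harmless on a machine where setup has not been done.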
|
||||
|
||||
## When to Use What
|
||||
|
||||
| Task | Commands |
|
||||
|------|----------|
|
||||
| Read a page | `goto <url>` then `text` |
|
||||
| Interact with elements | `snapshot -i` then `click @e3` |
|
||||
| Check if element exists | `js "!!document.querySelector('.thing')"` |
|
||||
| Extract specific data | `js "document.querySelector('.price').textContent"` |
|
||||
| Visual check | `screenshot /tmp/x.png` then Read the image |
|
||||
| Fill and submit form | `snapshot -i` → `fill @e4 "val"` → `click @e5` → `screenshot` |
|
||||
| Check CSS | `css "selector" "property"` or `css @e3 "property"` |
|
||||
| Inspect DOM | `html "selector"` or `attrs @e3` |
|
||||
| Debug console errors | `console` |
|
||||
| Check network requests | `network` |
|
||||
| Check local dev | `goto http://127.0.0.1:3000` |
|
||||
| Compare two pages | `diff <url1> <url2>` |
|
||||
| Mobile layout check | `responsive /tmp/prefix` |
|
||||
| Multi-step flow | `echo '[...]' \| browse chain` |
|
||||
|
||||
## Architecture
|
||||
|
||||
- Persistent Chromium daemon on localhost (port 9400-9410)
|
||||
- Bearer token auth per session
|
||||
- State file: `/tmp/browse-server.json`
|
||||
- Console log: `/tmp/browse-console.log`
|
||||
- Network log: `/tmp/browse-network.log`
|
||||
- Auto-shutdown after 30 min idle
|
||||
- Chromium crash → server exits → auto-restarts on next command
|
||||
**Navigate:** goto, back, forward, reload, url
|
||||
**Read:** text, html, links, forms, accessibility
|
||||
**Snapshot:** snapshot (with flags above)
|
||||
**Interact:** click, fill, select, hover, type, press, scroll, wait, wait --networkidle, wait --load, viewport, upload, cookie-import, dialog-accept, dialog-dismiss
|
||||
**Inspect:** js, eval, css, attrs, is, console, console --errors, network, dialog, cookies, storage, perf
|
||||
**Visual:** screenshot, pdf, responsive
|
||||
**Compare:** diff
|
||||
**Multi-step:** chain (pipe JSON array)
|
||||
**Tabs:** tabs, tab, newtab, closetab
|
||||
**Server:** status, stop, restart
|
||||
|
||||