mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-01 19:25:10 +02:00
07b4e15b34
230 lines
14 KiB
Markdown
# Browser — technical details

This document covers the command reference and internals of gstack's headless browser.

## Command reference

| Category | Commands | What for |
|----------|----------|----------|
| Navigate | `goto`, `back`, `forward`, `reload`, `url` | Get to a page |
| Read | `text`, `html`, `links`, `forms`, `accessibility` | Extract content |
| Snapshot | `snapshot [-i] [-c] [-d N] [-s sel] [-D] [-a] [-o] [-C]` | Get refs, diff, annotate |
| Interact | `click`, `fill`, `select`, `hover`, `type`, `press`, `scroll`, `wait`, `viewport`, `upload` | Use the page |
| Inspect | `js`, `eval`, `css`, `attrs`, `is`, `console`, `network`, `dialog`, `cookies`, `storage`, `perf` | Debug and verify |
| Visual | `screenshot`, `pdf`, `responsive` | See what Claude sees |
| Compare | `diff <url1> <url2>` | Spot differences between environments |
| Dialogs | `dialog-accept [text]`, `dialog-dismiss` | Control alert/confirm/prompt handling |
| Tabs | `tabs`, `tab`, `newtab`, `closetab` | Multi-page workflows |
| Cookies | `cookie-import`, `cookie-import-browser` | Import cookies from file or real browser |
| Multi-step | `chain` (JSON from stdin) | Batch commands in one call |

All selector arguments accept CSS selectors, `@e` refs (available after `snapshot`), or `@c` refs (available after `snapshot -C`). There are 50+ commands in total, plus cookie import.

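
The three selector forms can be told apart with a small amount of parsing. The following is a minimal sketch, not gstack's actual code: `classifySelector` and the `SelectorKind` type are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper: classify a selector argument into one of the
// three forms described above. Names and types are illustrative only.
type SelectorKind =
  | { kind: "ref"; id: string }        // e.g. @e3, produced by `snapshot`
  | { kind: "cursor-ref"; id: string } // e.g. @c2, produced by `snapshot -C`
  | { kind: "css"; selector: string }; // anything else is treated as CSS

function classifySelector(arg: string): SelectorKind {
  const m = /^@([ec])(\d+)$/.exec(arg.trim());
  if (m === null) return { kind: "css", selector: arg };
  return m[1] === "e"
    ? { kind: "ref", id: `e${m[2]}` }
    : { kind: "cursor-ref", id: `c${m[2]}` };
}
```

Anything that does not match the `@e<N>` or `@c<N>` pattern falls through to CSS, so ordinary selectors keep working unchanged.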
## How it works

gstack's browser is a compiled CLI binary that talks to a persistent local Chromium daemon over HTTP. The CLI is a thin client: it reads a state file, sends a command, and prints the response to stdout. The server does the real work via [Playwright](https://playwright.dev/).

```
┌─────────────────────────────────────────────────────────────────┐
│  Claude Code                                                    │
│                                                                 │
│  "browse goto https://staging.myapp.com"                        │
│       │                                                         │
│       ▼                                                         │
│  ┌──────────┐    HTTP POST     ┌──────────────┐                 │
│  │ browse   │ ──────────────── │ Bun HTTP     │                 │
│  │ CLI      │  localhost:rand  │ server       │                 │
│  │          │  Bearer token    │              │                 │
│  │ compiled │ ◄──────────────  │ Playwright   │──── Chromium    │
│  │ binary   │  plain text      │ API calls    │    (headless)   │
│  └──────────┘                  └──────────────┘                 │
│   ~1ms startup                  persistent daemon               │
│                                 auto-starts on first call       │
│                                 auto-stops after 30 min idle    │
└─────────────────────────────────────────────────────────────────┘
```

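
To make the thin-client idea concrete, here is a rough sketch of how a CLI might turn a command into an HTTP request. The state-file field names (`port`, `token`), the `/command` path, and the JSON body layout are all assumptions made for illustration; the real wire format may differ.

```typescript
// Assumed shape of the state file; the real field names may differ.
interface ServerState {
  port: number;
  token: string;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the HTTP request for one command. The /command path and the JSON
// body layout are illustrative assumptions, not the documented protocol.
function prepareRequest(state: ServerState, command: string, args: string[]): PreparedRequest {
  return {
    url: `http://localhost:${state.port}/command`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${state.token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ command, args }),
  };
}

const example = prepareRequest({ port: 41234, token: "example-token" }, "goto", ["https://example.com"]);
console.log(example.url); // http://localhost:41234/command
```

A client along these lines would pass the result to `fetch` and print the response text to stdout, which is all the CLI side needs to do.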
### Lifecycle

1. **First call**: The CLI checks `.gstack/browse.json` (in the project root) for a running server. If none is found, it spawns `bun run browse/src/server.ts` in the background. The server launches headless Chromium via Playwright, picks a random port (10000-60000), generates a bearer token, writes the state file, and starts accepting HTTP requests. This takes ~3 seconds.

2. **Subsequent calls**: The CLI reads the state file, sends an HTTP POST with the bearer token, and prints the response. The round trip takes ~100-200ms.

3. **Idle shutdown**: After 30 minutes with no commands, the server shuts down and cleans up the state file. The next call restarts it automatically.

4. **Crash recovery**: If Chromium crashes, the server exits immediately (no self-healing, so failures are never hidden). The CLI detects the dead server on the next call and starts a fresh one.

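
The lifecycle above boils down to two small decisions: whether the CLI should reuse or start a server, and whether the server has been idle long enough to shut down. The sketch below models both as pure functions. The `Observed` shape and the idea of a quick liveness probe are assumptions; how the real CLI detects a dead server is not specified here.

```typescript
// Hypothetical summary of what the CLI can observe before sending a command.
interface Observed {
  stateFileExists: boolean;
  serverResponds: boolean; // assumed to come from some liveness probe
}

type Action = "reuse" | "start";

function nextAction(o: Observed): Action {
  // No state file: first call, or the server cleaned up after idle shutdown.
  if (!o.stateFileExists) return "start";
  // State file present but nothing answers: the server crashed or was killed.
  if (!o.serverResponds) return "start";
  return "reuse";
}

// Idle check: true once no command has arrived for `timeoutMs` (30 min default).
function isIdle(lastCommandAt: number, now: number, timeoutMs: number = 30 * 60 * 1000): boolean {
  return now - lastCommandAt >= timeoutMs;
}
```

Keeping the decisions free of I/O like this makes them easy to unit test without launching Chromium.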
### Key components

```
browse/
├── src/
│   ├── cli.ts                    # Thin client: reads state file, sends HTTP, prints response
│   ├── server.ts                 # Bun.serve HTTP server: routes commands to Playwright
│   ├── browser-manager.ts        # Chromium lifecycle: launch, tabs, ref map, crash handling
│   ├── snapshot.ts               # Accessibility tree → @ref assignment → Locator map + diff/annotate/-C
│   ├── read-commands.ts          # Non-mutating commands (text, html, links, js, css, is, dialog, etc.)
│   ├── write-commands.ts         # Mutating commands (click, fill, select, upload, dialog-accept, etc.)
│   ├── meta-commands.ts          # Server management, chain, diff, snapshot routing
│   ├── cookie-import-browser.ts  # Decrypt + import cookies from real Chromium browsers
│   ├── cookie-picker-routes.ts   # HTTP routes for interactive cookie picker UI
│   ├── cookie-picker-ui.ts       # Self-contained HTML/CSS/JS for cookie picker
│   └── buffers.ts                # CircularBuffer<T> + console/network/dialog capture
├── test/                         # Integration tests + HTML fixtures
└── dist/
    └── browse                    # Compiled binary (~58MB, Bun --compile)
```

### The snapshot system

The browser's key innovation is ref-based element selection, built on Playwright's accessibility tree API:

1. `page.locator(scope).ariaSnapshot()` returns a YAML-like accessibility tree
2. The snapshot parser assigns refs (`@e1`, `@e2`, ...) to each element
3. For each ref, it builds a Playwright `Locator` (using `getByRole` + nth-child)
4. The ref-to-Locator map is stored on `BrowserManager`
5. Later commands like `click @e3` look up the Locator and call `locator.click()`

No DOM mutation. No injected scripts. Just Playwright's native accessibility API.

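
Steps 2 and 3 can be sketched without a browser. The parser below is a simplified stand-in for the real one: it assumes each element line looks like `- role "name"`, ignores nesting and attributes, and records an `nth` index so that duplicate role/name pairs stay distinguishable. `assignRefs` and `RefEntry` are hypothetical names.

```typescript
interface RefEntry {
  ref: string;  // "@e1", "@e2", ...
  role: string;
  name: string; // accessible name, "" when absent
  nth: number;  // zero-based index among nodes sharing the same role + name
}

// Simplified ref assignment over an ariaSnapshot-style string.
function assignRefs(ariaSnapshot: string): RefEntry[] {
  const entries: RefEntry[] = [];
  const seen = new Map<string, number>();
  for (const line of ariaSnapshot.split("\n")) {
    // Matches lines such as `- button "Submit"` or `  - link "Home":`.
    const m = /^\s*- ([a-z]+)(?: "([^"]*)")?/.exec(line);
    // Skip blanks, `/url:` style properties, and plain text nodes.
    if (m === null || m[1] === "text") continue;
    const role = m[1];
    const name = m[2] ?? "";
    const key = `${role}\u0000${name}`;
    const nth = seen.get(key) ?? 0;
    seen.set(key, nth + 1);
    entries.push({ ref: `@e${entries.length + 1}`, role, name, nth });
  }
  return entries;
}
```

With entries like these, one plausible way to build each Locator is `page.getByRole(role, { name }).nth(nth)`, and a `Map` from ref to Locator then serves later commands such as `click @e3`.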
**Extended snapshot features:**
- `--diff` (`-D`): Stores each snapshot as a baseline. On the next `-D` call, returns a unified diff showing what changed. Use this to verify that an action (click, fill, etc.) actually worked.
- `--annotate` (`-a`): Injects temporary overlay divs at each ref's bounding box, takes a screenshot with ref labels visible, then removes the overlays. Use `-o <path>` to control the output path.
- `--cursor-interactive` (`-C`): Scans for non-ARIA interactive elements (divs with `cursor:pointer`, `onclick`, `tabindex>=0`) using `page.evaluate`. Assigns `@c1`, `@c2`... refs with deterministic `nth-child` CSS selectors. These are elements the ARIA tree misses but users can still click.

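
The baseline-then-diff behaviour of `-D` can be illustrated with a small line diff. This sketch uses a longest-common-subsequence table and emits ` `, `-`, and `+` prefixed lines; it is a simplification and does not produce full unified-diff hunk headers. `lineDiff` and `snapshotDiff` are hypothetical names, and the module-level `baseline` variable stands in for whatever the server actually stores.

```typescript
// Simplified line diff based on a longest-common-subsequence table.
function lineDiff(before: string, after: string): string[] {
  const a = before.split("\n");
  const b = after.split("\n");
  // lcs[i][j] = LCS length of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const out: string[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(` ${a[i]}`); // unchanged line
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push(`-${a[i]}`); // removed from the baseline
      i++;
    } else {
      out.push(`+${b[j]}`); // added in the new snapshot
      j++;
    }
  }
  while (i < a.length) out.push(`-${a[i++]}`);
  while (j < b.length) out.push(`+${b[j++]}`);
  return out;
}

// Each call diffs against the previous snapshot, then becomes the new baseline.
let baseline: string | null = null;
function snapshotDiff(current: string): string[] | null {
  const previous = baseline;
  baseline = current;
  return previous === null ? null : lineDiff(previous, current);
}
```

The first call returns `null` because there is nothing to compare against yet; every later call reports what changed since the call before it.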
### Authentication

Each server session generates a random UUID as a bearer token. The token is written to the state file (`.gstack/browse.json`), which is created with mode 600 (owner read/write only). Every HTTP request must include `Authorization: Bearer <token>`. This keeps processes that cannot read the state file from controlling the browser.

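
A minimal sketch of the token scheme follows, assuming the token is generated with `randomUUID` from `node:crypto` (which Bun also provides) and checked with a plain string comparison. `newToken` and `isAuthorized` are hypothetical names; the real server may structure this differently.

```typescript
import { randomUUID } from "node:crypto";

// Generate a fresh per-session token.
function newToken(): string {
  return randomUUID();
}

// Accept a request only if its Authorization header carries the session token.
function isAuthorized(authorizationHeader: string | null, token: string): boolean {
  return authorizationHeader === `Bearer ${token}`;
}
```

In a request handler this would typically be called with `request.headers.get("authorization")`, returning a 401 response when the check fails.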
### Console, network, and dialog capture

The server hooks into Playwright's `page.on('console')`, `page.on('response')`, and `page.on('dialog')` events. Entries are kept in circular buffers (50,000 entries each, O(1) insertion) and flushed to disk asynchronously via `Bun.write()`:

- Console: `.gstack/browse-console.log`
- Network: `.gstack/browse-network.log`
- Dialog: `.gstack/browse-dialog.log`

The `console`, `network`, and `dialog` commands read from the in-memory buffers, not disk.

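
For readers unfamiliar with the data structure, here is a minimal ring buffer sketch. It shares the `CircularBuffer<T>` name with the class in `buffers.ts`, but the methods shown (`push`, `toArray`, `length`) are assumptions and may not match the real API.

```typescript
// Fixed-capacity ring buffer: push is O(1) and overwrites the oldest entry when full.
class CircularBuffer<T> {
  private readonly slots: (T | undefined)[];
  private readonly capacity: number;
  private start = 0; // index of the oldest entry
  private size = 0;  // number of entries currently stored

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.slots = new Array<T | undefined>(capacity);
  }

  push(item: T): void {
    const end = (this.start + this.size) % this.capacity;
    this.slots[end] = item;
    if (this.size < this.capacity) {
      this.size++;
    } else {
      // Buffer was full, so the write replaced the oldest entry.
      this.start = (this.start + 1) % this.capacity;
    }
  }

  // Entries in insertion order, oldest first.
  toArray(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.size; i++) {
      out.push(this.slots[(this.start + i) % this.capacity] as T);
    }
    return out;
  }

  get length(): number {
    return this.size;
  }
}
```

Because memory use is bounded by the capacity, a noisy page cannot grow the log buffers without limit; the oldest entries simply fall off.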
### Dialog handling

Dialogs (alert, confirm, prompt) are auto-accepted by default to prevent browser lockup. The `dialog-accept` and `dialog-dismiss` commands control this behavior. For prompts, `dialog-accept <text>` provides the response text. All dialogs are logged to the dialog buffer with type, message, and action taken.

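
The policy described above can be modelled as a small state holder. This is a sketch under assumptions: `DialogPolicy`, `setAccept`, `setDismiss`, and `handle` are hypothetical names, and the log entry shape only mirrors the three fields the text mentions plus the prompt response.

```typescript
type DialogType = "alert" | "confirm" | "prompt";

interface DialogLogEntry {
  type: DialogType;
  message: string;
  action: "accepted" | "dismissed";
  response: string | null; // text sent back to a prompt, if any
}

class DialogPolicy {
  private mode: "accept" | "dismiss" = "accept"; // auto-accept by default
  private promptText: string | null = null;
  readonly log: DialogLogEntry[] = [];

  // Corresponds to `dialog-accept [text]`.
  setAccept(text?: string): void {
    this.mode = "accept";
    this.promptText = text ?? null;
  }

  // Corresponds to `dialog-dismiss`.
  setDismiss(): void {
    this.mode = "dismiss";
    this.promptText = null;
  }

  // Decide what to do with one dialog and record the outcome.
  handle(type: DialogType, message: string): DialogLogEntry {
    const accepted = this.mode === "accept";
    const entry: DialogLogEntry = {
      type,
      message,
      action: accepted ? "accepted" : "dismissed",
      response: accepted && type === "prompt" ? this.promptText : null,
    };
    this.log.push(entry);
    return entry;
  }
}
```

Inside a `page.on('dialog')` handler, the returned entry would map onto Playwright's `dialog.accept(promptText)` or `dialog.dismiss()`.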
### Multi-workspace support

Each workspace gets its own isolated browser instance with its own Chromium process, tabs, cookies, and logs. State is stored in `.gstack/` inside the project root (detected via `git rev-parse --show-toplevel`).

| Workspace | State file | Port |
|-----------|------------|------|
| `/code/project-a` | `/code/project-a/.gstack/browse.json` | random (10000-60000) |
| `/code/project-b` | `/code/project-b/.gstack/browse.json` | random (10000-60000) |

No port collisions. No shared state. Each project is fully isolated.

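
The per-project layout can be sketched with two helpers. Both names are hypothetical, the project root is passed in rather than detected (the text says detection uses `git rev-parse --show-toplevel`), and treating the port range as inclusive on both ends is an assumption.

```typescript
import { posix } from "node:path";

// State file location for a given project root (POSIX-style paths).
function stateFileFor(projectRoot: string): string {
  return posix.join(projectRoot, ".gstack", "browse.json");
}

// Pick a port in 10000..60000. The random source is injectable for testing.
function pickRandomPort(random: () => number = Math.random): number {
  const min = 10000;
  const max = 60000;
  return min + Math.floor(random() * (max - min + 1));
}
```

Since each project root maps to its own state file, two workspaces never read each other's port or token.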
### Environment variables

| Variable | Default | Description |
|----------|---------|-------------|
| `BROWSE_PORT` | 0 (random 10000-60000) | Fixed port for the HTTP server (debug override) |
| `BROWSE_IDLE_TIMEOUT` | 1800000 (30 min) | Idle shutdown timeout in ms |
| `BROWSE_STATE_FILE` | `.gstack/browse.json` | Path to state file (CLI passes to server) |
| `BROWSE_SERVER_SCRIPT` | auto-detected | Path to server.ts |

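
As an illustration of how these overrides might be read, the sketch below resolves three of the variables against the defaults from the table. `resolveConfig` and `BrowseConfig` are hypothetical names, and falling back to the default on unparsable input is an assumption rather than documented behaviour.

```typescript
interface BrowseConfig {
  port: number;          // 0 means "pick a random port"
  idleTimeoutMs: number;
  stateFile: string;
}

// Resolve settings from an environment-like record, using the table's defaults.
function resolveConfig(env: Record<string, string | undefined>): BrowseConfig {
  const port = Number.parseInt(env["BROWSE_PORT"] ?? "0", 10);
  const idle = Number.parseInt(env["BROWSE_IDLE_TIMEOUT"] ?? "1800000", 10);
  return {
    port: Number.isNaN(port) ? 0 : port,
    idleTimeoutMs: Number.isNaN(idle) ? 1800000 : idle,
    stateFile: env["BROWSE_STATE_FILE"] ?? ".gstack/browse.json",
  };
}
```

Calling `resolveConfig(process.env)` at startup would give the server a single typed object instead of scattered environment lookups.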
### Performance

| Tool | First call | Subsequent calls | Context overhead per call |
|------|-----------|-----------------|--------------------------|
| Chrome MCP | ~5s | ~2-5s | ~2000 tokens (schema + protocol) |
| Playwright MCP | ~3s | ~1-3s | ~1500 tokens (schema + protocol) |
| **gstack browse** | **~3s** | **~100-200ms** | **0 tokens** (plain text stdout) |

The context overhead difference compounds fast. In a 20-command browser session, MCP tools burn 30,000-40,000 tokens on protocol framing alone. gstack burns zero.

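
The 30,000-40,000 figure follows directly from the per-call estimates in the table. A quick check, using only those approximate numbers:

```typescript
// Per-call overhead estimates taken from the performance table (approximate).
const commandsPerSession = 20;
const playwrightMcpTokens = 1500;
const chromeMcpTokens = 2000;

const lowEstimate = commandsPerSession * playwrightMcpTokens;
const highEstimate = commandsPerSession * chromeMcpTokens;

console.log(`${lowEstimate}-${highEstimate} tokens`); // 30000-40000 tokens
```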
### Why CLI over MCP?

MCP (Model Context Protocol) works well for remote services, but for local browser automation it adds pure overhead:

- **Context bloat**: every MCP call includes full JSON schemas and protocol framing. A simple "get the page text" costs 10x more context tokens than it should.
- **Connection fragility**: persistent WebSocket/stdio connections drop and fail to reconnect.
- **Unnecessary abstraction**: Claude Code already has a Bash tool. A CLI that prints to stdout is the simplest possible interface.

gstack skips all of this. Compiled binary. Plain text in, plain text out. No protocol. No schema. No connection management.

## Acknowledgments

The browser automation layer is built on [Playwright](https://playwright.dev/) by Microsoft. Playwright's accessibility tree API, locator system, and headless Chromium management are what make ref-based interaction possible. The snapshot system (assigning `@ref` labels to accessibility tree nodes and mapping them back to Playwright Locators) is built entirely on top of Playwright's primitives. Thank you to the Playwright team for building such a solid foundation.

## Development

### Prerequisites

- [Bun](https://bun.sh/) v1.0+
- Playwright's Chromium (installed automatically by `bun install`)

### Quick start

```bash
bun install              # install dependencies + Playwright Chromium
bun test                 # run integration tests (~3s)
bun run dev <cmd>        # run CLI from source (no compile)
bun run build            # compile to browse/dist/browse
```

### Dev mode vs compiled binary

During development, use `bun run dev` instead of the compiled binary. It runs `browse/src/cli.ts` directly with Bun, so you get instant feedback without a compile step:

```bash
bun run dev goto https://example.com
bun run dev text
bun run dev snapshot -i
bun run dev click @e3
```

The compiled binary (`bun run build`) is only needed for distribution. It produces a single ~58MB executable at `browse/dist/browse` using Bun's `--compile` flag.

### Running tests

```bash
bun test                                    # run all tests
bun test browse/test/commands               # run command integration tests only
bun test browse/test/snapshot               # run snapshot tests only
bun test browse/test/cookie-import-browser  # run cookie import unit tests only
```

Tests spin up a local HTTP server (`browse/test/test-server.ts`) serving HTML fixtures from `browse/test/fixtures/`, then exercise the CLI commands against those pages. 203 tests across 3 files, ~15 seconds total.

### Source map

| File | Role |
|------|------|
| `browse/src/cli.ts` | Entry point. Reads `.gstack/browse.json`, sends HTTP to the server, prints response. |
| `browse/src/server.ts` | Bun HTTP server. Routes commands to the right handler. Manages idle timeout. |
| `browse/src/browser-manager.ts` | Chromium lifecycle: launch, tab management, ref map, crash detection. |
| `browse/src/snapshot.ts` | Parses accessibility tree, assigns `@e`/`@c` refs, builds Locator map. Handles `--diff`, `--annotate`, `-C`. |
| `browse/src/read-commands.ts` | Non-mutating commands: `text`, `html`, `links`, `js`, `css`, `is`, `dialog`, `forms`, etc. Exports `getCleanText()`. |
| `browse/src/write-commands.ts` | Mutating commands: `goto`, `click`, `fill`, `upload`, `dialog-accept`, `useragent` (with context recreation), etc. |
| `browse/src/meta-commands.ts` | Server management, chain routing, diff (DRY via `getCleanText`), snapshot delegation. |
| `browse/src/cookie-import-browser.ts` | Decrypt Chromium cookies via macOS Keychain + PBKDF2/AES-128-CBC. Auto-detects installed browsers. |
| `browse/src/cookie-picker-routes.ts` | HTTP routes for `/cookie-picker/*`: browser list, domain search, import, remove. |
| `browse/src/cookie-picker-ui.ts` | Self-contained HTML generator for the interactive cookie picker (dark theme, no frameworks). |
| `browse/src/buffers.ts` | `CircularBuffer<T>` (O(1) ring buffer) + console/network/dialog capture with async disk flush. |

### Deploying to the active skill

The active skill lives at `~/.claude/skills/gstack/`. After making changes:

1. Push your branch
2. Pull in the skill directory: `cd ~/.claude/skills/gstack && git pull`
3. Rebuild: `cd ~/.claude/skills/gstack && bun run build`

Or copy the binary directly: `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse`

### Adding a new command

1. Add the handler in `read-commands.ts` (non-mutating) or `write-commands.ts` (mutating)
2. Register the route in `server.ts`
3. Add a test case in `browse/test/commands.test.ts` with an HTML fixture if needed
4. Run `bun test` to verify
5. Run `bun run build` to compile

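
The shape of steps 1 and 2 can be sketched with a toy dispatch table. Everything here is hypothetical: the `Handler` signature, the two maps, the `dispatch` function, and the `echo` command are invented for illustration and do not reflect how `server.ts` actually routes commands (real handlers would also need access to the Playwright page and would likely be async).

```typescript
// Hypothetical handler signature: takes CLI args, returns plain text.
type Handler = (args: string[]) => string;

const readCommands = new Map<string, Handler>();  // non-mutating
const writeCommands = new Map<string, Handler>(); // mutating

// Step 1: add a handler. `echo` is an invented example command.
readCommands.set("echo", (args) => args.join(" "));

// Step 2: route a command name to its handler.
function dispatch(command: string, args: string[]): string {
  const handler = readCommands.get(command) ?? writeCommands.get(command);
  if (handler === undefined) {
    throw new Error(`Unknown command: ${command}`);
  }
  return handler(args);
}
```

Keeping read and write handlers in separate tables mirrors the file split between `read-commands.ts` and `write-commands.ts`, which makes it obvious at registration time whether a command mutates page state.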