mirror of https://github.com/garrytan/gstack.git synced 2026-05-01 19:25:10 +02:00

Garry Tan 07b4e15b34 feat: v0.3.2 — project-local state, diff-aware QA, Greptile integration (#36)

* fix: cookie import picker returns JSON instead of HTML

jsonResponse() was defined at module scope but referenced `url` which
only existed as a parameter of handleCookiePickerRoute(). Every API call
crashed, the catch block also crashed, and Bun returned a default HTML
page that the frontend couldn't parse as JSON.

Thread port via corsOrigin() helper and options objects. Add route-level
tests to prevent this class of bug from shipping again.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add help command to browse server

Agents that don't have SKILL.md loaded (or misread flags) had no way to
self-discover the CLI. The help command returns a formatted reference of
all commands and snapshot flags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: version-aware find-browse with META signal protocol

Agents in other workspaces found stale browse binaries that were missing
newer flags. find-browse now compares the local binary's git SHA against
origin/main via git ls-remote (4hr cache), and emits META:UPDATE_AVAILABLE
when behind. SKILL.md setup checks parse META signals and prompt the user
to update.

- New compiled binary: browse/dist/find-browse (TypeScript, testable)
- Bash shim at browse/bin/find-browse delegates to compiled binary
- .version file written at build time with git commit SHA
- Build script compiles both browse and find-browse binaries
- Graceful degradation: offline, missing .version, corrupt cache all skip check

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: clean up .bun-build temp files after compile

bun build --compile leaves ~58MB temp files in the working directory.
Add rm -f .*.bun-build to the build script to clean up after each build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make help command reachable by removing it from META_COMMANDS

help was in META_COMMANDS, so it dispatched to handleMetaCommand() which
threw "Unknown meta command: help". Removing it from the set lets the
dedicated else-if handler in handleCommand() execute correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: bump version and changelog (v0.3.2)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add shared Greptile comment triage reference doc

Shared reference for fetching, filtering, and classifying Greptile
review comments on GitHub PRs. Used by both /review and /ship skills.
Includes parallel API fetching, suppressions check, classification
logic, reply APIs, and history file writes.

* feat: make /review and /ship Greptile-aware

/review: Step 2.5 fetches and classifies Greptile comments, Step 5
resolves them with AskUserQuestion for valid issues and false positives.

/ship: Step 3.75 triages Greptile comments between pre-landing review
and version bump. Adds Greptile Review section to PR body in Step 8.
Re-runs tests if any Greptile fixes are applied.

* feat: add Greptile batting average to /retro

Reads ~/.gstack/greptile-history.md, computes signal ratio
(valid catches vs false positives), includes in metrics table,
JSON snapshot, and Code Quality Signals narrative.

* docs: add Greptile integration section to README

Personal endorsement, two-layer review narrative, full UX walkthrough
transcript, skills table updates. Add Greptile training feedback loop
to TODO.md future ideas.

* feat: add local dev mode for testing skills from within the repo

bin/dev-setup creates .claude/skills/gstack symlink to the working tree
so Claude Code discovers skills locally. bin/dev-teardown cleans up.
DEVELOPING_GSTACK.md documents the workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: narrow gitignore to .claude/skills/ instead of all .claude/

Avoids ignoring legitimate Claude Code config like settings.json or CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: rename DEVELOPING_GSTACK.md to CONTRIBUTING.md

Rewritten as a contributor-friendly guide instead of a dry plan doc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: explain why dev-setup is needed in CONTRIBUTING.md quick start

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add browser interaction guidance to CLAUDE.md

Prevents Claude from using mcp__claude-in-chrome__* tools instead of /browse.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add shared config module for project-local browse state

Centralizes path resolution (git root detection, state dir, log paths) into
config.ts. Both cli.ts and server.ts import from it, eliminating duplicated
PORT_OFFSET/BROWSE_PORT/STATE_FILE logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: rewrite port selection to use random ports

Replace CONDUCTOR_PORT magic offset and 9400-9409 scan with random port
10000-60000. Atomic state file writes, log paths from config module,
binaryVersion field for auto-restart on update.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: move browse state from /tmp to project-local .gstack/

CLI now uses config module for state paths, passes BROWSE_STATE_FILE to
spawned server. Adds version mismatch auto-restart, legacy /tmp cleanup
with PID verification, and removes stale global install fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update crash log path reference to .gstack/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add config tests and update CLI lifecycle test

14 new tests for config resolution, ensureStateDir, readVersionHash,
resolveServerScript, and version mismatch detection. Remove obsolete
CONDUCTOR_PORT/BROWSE_PORT filtering from commands.test.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update BROWSER.md and TODO.md for project-local state

Replace /tmp paths with .gstack/, remove CONDUCTOR_PORT docs, document
random port selection and per-project isolation. Add server bundling TODO.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update README, CHANGELOG, and CONTRIBUTING for v0.3.2

- README: replace Conductor-aware language with project-local isolation,
  add Greptile setup note
- CHANGELOG: comprehensive v0.3.2 entry with all state management changes
- CONTRIBUTING: add instructions for testing branches in other repos

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add diff-aware mode to /qa — auto-tests affected pages from branch diff

When on a feature branch, /qa now reads git diff main, identifies affected
pages/routes from changed files, and tests them automatically. No URL required.
The most natural flow: write code, /ship, /qa.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: update CHANGELOG for complete v0.3.2 coverage

Add missing entries: diff-aware QA mode, Greptile integration,
local dev mode, crash log path fix, README/SKILL.md updates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-13 18:10:56 -07:00

14 KiB

Browser — technical details

This document covers the command reference and internals of gstack's headless browser.

Command reference

Category	Commands	What for
Navigate	`goto`, `back`, `forward`, `reload`, `url`	Get to a page
Read	`text`, `html`, `links`, `forms`, `accessibility`	Extract content
Snapshot	`snapshot [-i] [-c] [-d N] [-s sel] [-D] [-a] [-o] [-C]`	Get refs, diff, annotate
Interact	`click`, `fill`, `select`, `hover`, `type`, `press`, `scroll`, `wait`, `viewport`, `upload`	Use the page
Inspect	`js`, `eval`, `css`, `attrs`, `is`, `console`, `network`, `dialog`, `cookies`, `storage`, `perf`	Debug and verify
Visual	`screenshot`, `pdf`, `responsive`	See what Claude sees
Compare	`diff <url1> <url2>`	Spot differences between environments
Dialogs	`dialog-accept [text]`, `dialog-dismiss`	Control alert/confirm/prompt handling
Tabs	`tabs`, `tab`, `newtab`, `closetab`	Multi-page workflows
Cookies	`cookie-import`, `cookie-import-browser`	Import cookies from file or real browser
Multi-step	`chain` (JSON from stdin)	Batch commands in one call

All selector arguments accept CSS selectors, @e refs after snapshot, or @c refs after snapshot -C. 50+ commands total plus cookie import.
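The three selector forms can be told apart by their prefix. The sketch below is one way to do that, assuming refs always look like `@e3` or `@c2`. `classifySelector` and `SelectorKind` are hypothetical names used for illustration, not identifiers from gstack's source.

```typescript
// Hypothetical helper, not part of gstack's source.
type SelectorKind =
  | { kind: "aria-ref"; index: number }   // @e refs produced by `snapshot`
  | { kind: "cursor-ref"; index: number } // @c refs produced by `snapshot -C`
  | { kind: "css"; selector: string };    // anything else is treated as CSS

function classifySelector(arg: string): SelectorKind {
  const match = /^@([ec])(\d+)$/.exec(arg);
  if (match === null) {
    return { kind: "css", selector: arg };
  }
  const index = Number(match[2]);
  return match[1] === "e"
    ? { kind: "aria-ref", index }
    : { kind: "cursor-ref", index };
}
```

Under this sketch, `classifySelector("@e3")` yields an `aria-ref` with index 3, while `classifySelector("button.submit")` falls through to CSS.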

How it works

gstack's browser is a compiled CLI binary that talks to a persistent local Chromium daemon over HTTP. The CLI is a thin client — it reads a state file, sends a command, and prints the response to stdout. The server does the real work via Playwright.

┌─────────────────────────────────────────────────────────────────┐
│  Claude Code                                                    │
│                                                                 │
│  "browse goto https://staging.myapp.com"                        │
│       │                                                         │
│       ▼                                                         │
│  ┌──────────┐    HTTP POST     ┌──────────────┐                 │
│  │ browse   │ ──────────────── │ Bun HTTP     │                 │
│  │ CLI      │  localhost:rand  │ server       │                 │
│  │          │  Bearer token    │              │                 │
│  │ compiled │ ◄──────────────  │  Playwright  │──── Chromium    │
│  │ binary   │  plain text      │  API calls   │    (headless)   │
│  └──────────┘                  └──────────────┘                 │
│   ~1ms startup                  persistent daemon               │
│                                 auto-starts on first call       │
│                                 auto-stops after 30 min idle    │
└─────────────────────────────────────────────────────────────────┘
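The CLI side of the diagram can be sketched as a function that turns the state file and a command into one HTTP POST. This is an illustration only: the `port` and `token` field names, the `/command` path, and the JSON body layout are assumptions, not gstack's actual wire format.

```typescript
// Assumed shape of .gstack/browse.json; the real field names may differ.
interface BrowseState {
  port: number;
  token: string;
}

interface PlannedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request a thin client would send. The "/command" path and the
// JSON body are illustrative assumptions.
function planRequest(state: BrowseState, command: string, args: string[]): PlannedRequest {
  return {
    url: `http://localhost:${state.port}/command`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${state.token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ command, args }),
  };
}
```

A client could pass the result to `fetch(url, { method, headers, body })` and print `await response.text()` to stdout, which matches the plain-text reply shown in the diagram.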

Lifecycle

First call: CLI checks .gstack/browse.json (in the project root) for a running server. None found — it spawns bun run browse/src/server.ts in the background. The server launches headless Chromium via Playwright, picks a random port (10000-60000), generates a bearer token, writes the state file, and starts accepting HTTP requests. This takes ~3 seconds.
Subsequent calls: CLI reads the state file, sends an HTTP POST with the bearer token, prints the response. ~100-200ms round trip.
Idle shutdown: After 30 minutes with no commands, the server shuts down and cleans up the state file. Next call restarts it automatically.
Crash recovery: If Chromium crashes, the server exits immediately (no self-healing — don't hide failure). The CLI detects the dead server on the next call and starts a fresh one.
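The lifecycle rules above reduce to three small decisions: which port to pick, whether to reuse or spawn a server, and when to shut down. The sketch below models them as pure functions. The state-file fields, the inclusive port bounds, and the PID-based liveness check are assumptions made for illustration.

```typescript
// Assumed state-file fields; the real .gstack/browse.json may differ.
interface ServerState {
  pid: number;
  port: number;
  token: string;
}

const MIN_PORT = 10000;
const MAX_PORT = 60000;
const IDLE_TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes

// Picks a port in [MIN_PORT, MAX_PORT]; inclusive bounds are an assumption.
function pickRandomPort(random: () => number = Math.random): number {
  return MIN_PORT + Math.floor(random() * (MAX_PORT - MIN_PORT + 1));
}

// No state file or a dead server means spawn; otherwise reuse the daemon.
// How liveness is actually detected is not stated in this document.
function decideAction(
  state: ServerState | null,
  isAlive: (pid: number) => boolean,
): "spawn" | "reuse" {
  if (state === null) return "spawn";
  return isAlive(state.pid) ? "reuse" : "spawn";
}

// True once no command has arrived for the idle timeout.
function shouldShutDown(
  lastCommandAt: number,
  now: number,
  timeoutMs: number = IDLE_TIMEOUT_MS,
): boolean {
  return now - lastCommandAt >= timeoutMs;
}
```

Injecting `random` and `isAlive` keeps the decisions testable without starting Chromium or touching the file system.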

Key components

browse/
├── src/
│   ├── cli.ts              # Thin client — reads state file, sends HTTP, prints response
│   ├── server.ts           # Bun.serve HTTP server — routes commands to Playwright
│   ├── browser-manager.ts  # Chromium lifecycle — launch, tabs, ref map, crash handling
│   ├── snapshot.ts         # Accessibility tree → @ref assignment → Locator map + diff/annotate/-C
│   ├── read-commands.ts    # Non-mutating commands (text, html, links, js, css, is, dialog, etc.)
│   ├── write-commands.ts   # Mutating commands (click, fill, select, upload, dialog-accept, etc.)
│   ├── meta-commands.ts    # Server management, chain, diff, snapshot routing
│   ├── cookie-import-browser.ts  # Decrypt + import cookies from real Chromium browsers
│   ├── cookie-picker-routes.ts   # HTTP routes for interactive cookie picker UI
│   ├── cookie-picker-ui.ts       # Self-contained HTML/CSS/JS for cookie picker
│   └── buffers.ts          # CircularBuffer<T> + console/network/dialog capture
├── test/                   # Integration tests + HTML fixtures
└── dist/
    └── browse              # Compiled binary (~58MB, Bun --compile)

The snapshot system

The browser's key innovation is ref-based element selection, built on Playwright's accessibility tree API:

1. page.locator(scope).ariaSnapshot() returns a YAML-like accessibility tree
2. The snapshot parser assigns refs (@e1, @e2, ...) to each element
3. For each ref, it builds a Playwright Locator (using getByRole + nth-child)
4. The ref-to-Locator map is stored on BrowserManager
5. Later commands like click @e3 look up the Locator and call locator.click()

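The core ref system mutates no DOM and injects no scripts. It relies only on Playwright's native accessibility API. The optional --annotate and -C flags described below are the exceptions.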
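The ref assignment step can be sketched as a small parser over the snapshot text. This is a simplified illustration, not gstack's parser: it assumes each node appears on a line of the form `- role "name"`, gives every matching line a ref, and tracks a 0-based `nth` index among nodes sharing the same role and name. `assignRefs` and `RefEntry` are hypothetical names.

```typescript
// One parsed node from an ariaSnapshot()-style listing.
interface RefEntry {
  ref: string;  // "@e1", "@e2", ...
  role: string; // e.g. "button"
  name: string; // accessible name, "" when absent
  nth: number;  // 0-based index among nodes with the same role and name
}

function assignRefs(snapshot: string): RefEntry[] {
  const entries: RefEntry[] = [];
  const seen = new Map<string, number>();
  for (const line of snapshot.split("\n")) {
    // Matches lines such as `  - button "Submit"`; other lines are skipped.
    const match = /^\s*- (\w+)(?: "([^"]*)")?/.exec(line);
    if (match === null) continue;
    const role = match[1] ?? "";
    const label = match[2] ?? "";
    const key = `${role}\u0000${label}`;
    const nth = seen.get(key) ?? 0;
    seen.set(key, nth + 1);
    entries.push({ ref: `@e${entries.length + 1}`, role, name: label, nth });
  }
  return entries;
}
```

Each entry carries enough information to rebuild a locator later, for example with Playwright's `page.getByRole(role, { name }).nth(nth)`. Whether gstack resolves refs exactly this way is an assumption.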

Extended snapshot features:

--diff (-D): Stores each snapshot as a baseline. On the next -D call, returns a unified diff showing what changed. Use this to verify that an action (click, fill, etc.) actually worked.
--annotate (-a): Injects temporary overlay divs at each ref's bounding box, takes a screenshot with ref labels visible, then removes the overlays. Use -o <path> to control the output path.
--cursor-interactive (-C): Scans for non-ARIA interactive elements (divs with cursor:pointer, onclick, tabindex>=0) using page.evaluate. Assigns @c1, @c2... refs with deterministic nth-child CSS selectors. These are elements the ARIA tree misses but users can still click.
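To illustrate the idea behind --diff, the sketch below compares a baseline snapshot with a new one line by line, using a longest-common-subsequence table. It is a simplified stand-in: the output marks lines with `+`, `-`, or two spaces and omits the hunk headers of a real unified diff. `diffLines` is a hypothetical name.

```typescript
// Line diff via a longest-common-subsequence (LCS) table.
function diffLines(before: string[], after: string[]): string[] {
  const rows = before.length;
  const cols = after.length;
  const width = cols + 1;
  // table[i * width + j] = LCS length of before[i..] and after[j..]
  const table = new Array<number>((rows + 1) * width).fill(0);
  const at = (i: number, j: number): number => table[i * width + j] ?? 0;

  for (let i = rows - 1; i >= 0; i--) {
    for (let j = cols - 1; j >= 0; j--) {
      table[i * width + j] =
        before[i] === after[j]
          ? at(i + 1, j + 1) + 1
          : Math.max(at(i + 1, j), at(i, j + 1));
    }
  }

  // Walk the table from the top-left, emitting kept, removed, and added lines.
  const out: string[] = [];
  let i = 0;
  let j = 0;
  while (i < rows && j < cols) {
    if (before[i] === after[j]) {
      out.push(`  ${before[i]}`);
      i++;
      j++;
    } else if (at(i + 1, j) >= at(i, j + 1)) {
      out.push(`- ${before[i]}`);
      i++;
    } else {
      out.push(`+ ${after[j]}`);
      j++;
    }
  }
  while (i < rows) out.push(`- ${before[i++]}`);
  while (j < cols) out.push(`+ ${after[j++]}`);
  return out;
}
```

A -D style workflow would keep the previous snapshot's lines as the baseline, call `diffLines(baseline, current)`, and then store `current` as the new baseline.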

Authentication

Each server session generates a random UUID as a bearer token. The token is written to the state file (.gstack/browse.json) with mode 600 (owner read/write only). Every HTTP request must include Authorization: Bearer <token>. This keeps processes that cannot read the state file, such as those run by other users, from controlling the browser.
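The server-side check amounts to comparing the Authorization header against the session token. A minimal sketch, assuming the header value is available as a string or null; `isAuthorized` is a hypothetical name:

```typescript
// Returns true only for a header of the exact form "Bearer <token>".
function isAuthorized(authorizationHeader: string | null, expectedToken: string): boolean {
  if (authorizationHeader === null) return false;
  const prefix = "Bearer ";
  if (!authorizationHeader.startsWith(prefix)) return false;
  return authorizationHeader.slice(prefix.length) === expectedToken;
}
```

In a fetch-style handler the header would come from `request.headers.get("authorization")`, and a failed check would typically produce a 401 response. A hardened server might also prefer a constant-time comparison.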

Console, network, and dialog capture

The server hooks into Playwright's page.on('console'), page.on('response'), and page.on('dialog') events. All entries are kept in circular buffers with O(1) insertion (50,000 capacity each) and flushed to disk asynchronously via Bun.write():

Console: .gstack/browse-console.log
Network: .gstack/browse-network.log
Dialog: .gstack/browse-dialog.log

The console, network, and dialog commands read from the in-memory buffers, not disk.
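A ring buffer gives constant-time appends by overwriting the oldest entry once it is full. The class below is a minimal sketch of that technique. It shares its name with gstack's `CircularBuffer<T>` for readability, but the methods shown here are assumptions, not the real API.

```typescript
// Fixed-capacity ring buffer: push is O(1) and evicts the oldest entry when full.
class CircularBuffer<T> {
  private readonly capacity: number;
  private readonly items: (T | undefined)[];
  private head = 0;  // index of the oldest entry
  private count = 0; // number of stored entries

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.items = new Array<T | undefined>(capacity);
  }

  push(item: T): void {
    const tail = (this.head + this.count) % this.capacity;
    this.items[tail] = item;
    if (this.count < this.capacity) {
      this.count++;
    } else {
      // The slot at head was just overwritten, so the next slot is now oldest.
      this.head = (this.head + 1) % this.capacity;
    }
  }

  // Returns entries from oldest to newest.
  toArray(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.items[(this.head + i) % this.capacity] as T);
    }
    return out;
  }

  get length(): number {
    return this.count;
  }
}
```

With a capacity of 3, pushing 1 through 5 leaves 3, 4, and 5 in the buffer. Memory stays bounded no matter how chatty the page is.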

Dialog handling

Dialogs (alert, confirm, prompt) are auto-accepted by default to prevent browser lockup. The dialog-accept and dialog-dismiss commands control this behavior. For prompts, dialog-accept <text> provides the response text. All dialogs are logged to the dialog buffer with type, message, and action taken.
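The dialog policy can be modeled as a small pure function that maps the current policy and an incoming dialog to a log entry. This is a sketch under assumptions: the `DialogPolicy` and `DialogLogEntry` shapes are hypothetical, and only the default auto-accept behavior comes from the text above.

```typescript
interface DialogPolicy {
  mode: "accept" | "dismiss";
  promptText: string | null; // used only when accepting a prompt dialog
}

interface DialogLogEntry {
  type: string;
  message: string;
  action: "accepted" | "dismissed";
  response: string | null;
}

// Dialogs are auto-accepted unless the policy has been changed.
const DEFAULT_POLICY: DialogPolicy = { mode: "accept", promptText: null };

function resolveDialog(policy: DialogPolicy, type: string, message: string): DialogLogEntry {
  if (policy.mode === "dismiss") {
    return { type, message, action: "dismissed", response: null };
  }
  // Only prompt dialogs carry response text.
  const response = type === "prompt" ? policy.promptText : null;
  return { type, message, action: "accepted", response };
}
```

A handler attached with `page.on('dialog', ...)` could use the returned entry to call Playwright's `dialog.accept(text)` or `dialog.dismiss()` and then append the entry to the dialog buffer.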

Multi-workspace support

Each workspace gets its own isolated browser instance with its own Chromium process, tabs, cookies, and logs. State is stored in .gstack/ inside the project root (detected via git rev-parse --show-toplevel).

Workspace	State file	Port
`/code/project-a`	`/code/project-a/.gstack/browse.json`	random (10000-60000)
`/code/project-b`	`/code/project-b/.gstack/browse.json`	random (10000-60000)

No port collisions. No shared state. Each project is fully isolated.
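Per-project isolation follows from deriving every path from the project root. A minimal sketch, assuming the root has already been resolved (for example from git rev-parse --show-toplevel); `workspacePaths` is a hypothetical helper, while the file names are the ones listed in this document.

```typescript
interface WorkspacePaths {
  stateDir: string;
  stateFile: string;
  consoleLog: string;
  networkLog: string;
  dialogLog: string;
}

// Derives all browse state paths from a project root.
function workspacePaths(projectRoot: string): WorkspacePaths {
  const root = projectRoot.replace(/\/+$/, ""); // drop trailing slashes
  const stateDir = `${root}/.gstack`;
  return {
    stateDir,
    stateFile: `${stateDir}/browse.json`,
    consoleLog: `${stateDir}/browse-console.log`,
    networkLog: `${stateDir}/browse-network.log`,
    dialogLog: `${stateDir}/browse-dialog.log`,
  };
}
```

Two different roots can never produce the same state file, so two workspaces never share a server.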

Environment variables

Variable	Default	Description
`BROWSE_PORT`	0 (random 10000-60000)	Fixed port for the HTTP server (debug override)
`BROWSE_IDLE_TIMEOUT`	1800000 (30 min)	Idle shutdown timeout in ms
`BROWSE_STATE_FILE`	`.gstack/browse.json`	Path to state file (CLI passes to server)
`BROWSE_SERVER_SCRIPT`	auto-detected	Path to server.ts
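The defaults in the table can be expressed as one resolution function over an environment map. This is a sketch, not gstack's config.ts: `resolveConfig`, `parseNonNegativeInt`, and the `BrowseConfig` shape are hypothetical, and falling back to the default on an invalid value is an assumed behavior.

```typescript
interface BrowseConfig {
  port: number;               // 0 means "pick a random port"
  idleTimeoutMs: number;
  stateFile: string;
  serverScript: string | null; // null means "auto-detect"
}

// Parses a non-negative integer, falling back when unset or invalid.
function parseNonNegativeInt(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

function resolveConfig(env: Record<string, string | undefined>): BrowseConfig {
  return {
    port: parseNonNegativeInt(env["BROWSE_PORT"], 0),
    idleTimeoutMs: parseNonNegativeInt(env["BROWSE_IDLE_TIMEOUT"], 1800000),
    stateFile: env["BROWSE_STATE_FILE"] ?? ".gstack/browse.json",
    serverScript: env["BROWSE_SERVER_SCRIPT"] ?? null,
  };
}
```

In Bun or Node the function could be called as `resolveConfig(process.env)`.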

Performance

Tool	First call	Subsequent calls	Context overhead per call
Chrome MCP	~5s	~2-5s	~2000 tokens (schema + protocol)
Playwright MCP	~3s	~1-3s	~1500 tokens (schema + protocol)
gstack browse	~3s	~100-200ms	0 tokens (plain text stdout)

The context overhead difference compounds fast. In a 20-command browser session, MCP tools burn 30,000-40,000 tokens on protocol framing alone. gstack burns zero.
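The 30,000-40,000 figure follows directly from the per-call estimates in the table. A short worked calculation, using only those numbers:

```typescript
// Per-call context overhead in tokens, taken from the table above.
const TOKENS_PER_CALL = {
  chromeMcp: 2000,
  playwrightMcp: 1500,
  gstackBrowse: 0,
};

function sessionOverhead(tokensPerCall: number, commandCount: number): number {
  return tokensPerCall * commandCount;
}

const commandCount = 20;
const playwrightTotal = sessionOverhead(TOKENS_PER_CALL.playwrightMcp, commandCount); // 20 * 1500 = 30000
const chromeTotal = sessionOverhead(TOKENS_PER_CALL.chromeMcp, commandCount);         // 20 * 2000 = 40000
const gstackTotal = sessionOverhead(TOKENS_PER_CALL.gstackBrowse, commandCount);      // 20 * 0 = 0

console.log({ playwrightTotal, chromeTotal, gstackTotal });
```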

Why CLI over MCP?

MCP (Model Context Protocol) works well for remote services, but for local browser automation it adds pure overhead:

Context bloat: every MCP call includes full JSON schemas and protocol framing. A simple "get the page text" costs 10x more context tokens than it should.
Connection fragility: persistent WebSocket/stdio connections drop and fail to reconnect.
Unnecessary abstraction: Claude Code already has a Bash tool. A CLI that prints to stdout is the simplest possible interface.

gstack skips all of this. Compiled binary. Plain text in, plain text out. No protocol. No schema. No connection management.

Acknowledgments

The browser automation layer is built on Playwright by Microsoft. Playwright's accessibility tree API, locator system, and headless Chromium management are what make ref-based interaction possible. The snapshot system — assigning @ref labels to accessibility tree nodes and mapping them back to Playwright Locators — is built entirely on top of Playwright's primitives. Thank you to the Playwright team for building such a solid foundation.

Development

Prerequisites

Bun v1.0+
Playwright's Chromium (installed automatically by bun install)

Quick start

bun install              # install dependencies + Playwright Chromium
bun test                 # run integration tests (~15s)
bun run dev <cmd>        # run CLI from source (no compile)
bun run build            # compile to browse/dist/browse

Dev mode vs compiled binary

During development, use bun run dev instead of the compiled binary. It runs browse/src/cli.ts directly with Bun, so you get instant feedback without a compile step:

bun run dev goto https://example.com
bun run dev text
bun run dev snapshot -i
bun run dev click @e3

The compiled binary (bun run build) is only needed for distribution. It produces a single ~58MB executable at browse/dist/browse using Bun's --compile flag.

Running tests

bun test                         # run all tests
bun test browse/test/commands              # run command integration tests only
bun test browse/test/snapshot              # run snapshot tests only
bun test browse/test/cookie-import-browser # run cookie import unit tests only

Tests spin up a local HTTP server (browse/test/test-server.ts) serving HTML fixtures from browse/test/fixtures/, then exercise the CLI commands against those pages. 203 tests across 3 files, ~15 seconds total.

Source map

File	Role
`browse/src/cli.ts`	Entry point. Reads `.gstack/browse.json`, sends HTTP to the server, prints response.
`browse/src/server.ts`	Bun HTTP server. Routes commands to the right handler. Manages idle timeout.
`browse/src/browser-manager.ts`	Chromium lifecycle — launch, tab management, ref map, crash detection.
`browse/src/snapshot.ts`	Parses accessibility tree, assigns `@e`/`@c` refs, builds Locator map. Handles `--diff`, `--annotate`, `-C`.
`browse/src/read-commands.ts`	Non-mutating commands: `text`, `html`, `links`, `js`, `css`, `is`, `dialog`, `forms`, etc. Exports `getCleanText()`.
`browse/src/write-commands.ts`	Mutating commands: `goto`, `click`, `fill`, `upload`, `dialog-accept`, `useragent` (with context recreation), etc.
`browse/src/meta-commands.ts`	Server management, chain routing, diff (DRY via `getCleanText`), snapshot delegation.
`browse/src/cookie-import-browser.ts`	Decrypt Chromium cookies via macOS Keychain + PBKDF2/AES-128-CBC. Auto-detects installed browsers.
`browse/src/cookie-picker-routes.ts`	HTTP routes for `/cookie-picker/*` — browser list, domain search, import, remove.
`browse/src/cookie-picker-ui.ts`	Self-contained HTML generator for the interactive cookie picker (dark theme, no frameworks).
`browse/src/buffers.ts`	`CircularBuffer<T>` (O(1) ring buffer) + console/network/dialog capture with async disk flush.

Deploying to the active skill

The active skill lives at ~/.claude/skills/gstack/. After making changes:

1. Push your branch
2. Pull in the skill directory: cd ~/.claude/skills/gstack && git pull
3. Rebuild: cd ~/.claude/skills/gstack && bun run build

Or copy the binary directly: cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse

Adding a new command

1. Add the handler in read-commands.ts (non-mutating) or write-commands.ts (mutating)
2. Register the route in server.ts
3. Add a test case in browse/test/commands.test.ts with an HTML fixture if needed
4. Run bun test to verify
5. Run bun run build to compile
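The first two steps, adding a handler and registering it, can be pictured with a small dispatch table. Everything in this sketch is hypothetical: `PageLike` stands in for a Playwright page, `title` is an invented example command, and the real handler signatures in read-commands.ts and server.ts may look different.

```typescript
// Stand-in for the page object a real handler would receive.
interface PageLike {
  title: string;
}

type Handler = (page: PageLike, args: string[]) => string;

// Adding the handler: a new non-mutating command named "title" (invented).
const readHandlers = new Map<string, Handler>([
  ["title", (page) => page.title],
]);

// Registering the route: the dispatcher looks the command up by name.
function handleCommand(page: PageLike, command: string, args: string[]): string {
  const handler = readHandlers.get(command);
  if (handler === undefined) {
    throw new Error(`Unknown command: ${command}`);
  }
  return handler(page, args);
}
```

A test for the new command then needs only a fixture page and one assertion on the returned text.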
