mirror of https://github.com/garrytan/gstack.git synced 2026-05-02 11:45:20 +02:00

Files

T

Garry Tan 1717ed2891 fix: browse binary discovery broken for agents (v0.3.5) (#44 )

* fix: replace find-browse with direct path in SKILL.md setup blocks

Agents were skipping the find-browse binary and guessing bin/browse
(wrong path). Now the setup block explicitly checks browse/dist/browse
with workspace-local priority, global fallback.

Also adds || true to update check to prevent misleading exit code 1.

Adds {{UPDATE_CHECK}} and {{BROWSE_SETUP}} template placeholders to
gen-skill-docs.ts so all skills share a single source of truth.

* refactor: convert qa/ and setup-browser-cookies/ to .tmpl templates

Replaces hardcoded update check and find-browse blocks with
{{UPDATE_CHECK}} and {{BROWSE_SETUP}} placeholders. Both skills
are now generated from templates via gen-skill-docs.

* test: add e2e and LLM eval tests for SKILL.md setup block

- 3 Agent SDK e2e tests: happy path, NEEDS_SETUP, non-git-repo
- LLM eval: setup block clarity + actionability >= 4
- New error pattern: 'no such file or directory.*browse'

These tests catch the exact failure mode where agents can't discover
the browse binary via SKILL.md instructions.

* chore: bump version and changelog (v0.3.5)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-14 00:24:06 -07:00

14 KiB

Raw Blame History

Changelog

0.3.4 — 2026-03-13

Added

Daily update check — all 9 skills now check for new versions once per day via bin/gstack-update-check (pure bash, <5ms cached). Prompts user via AskUserQuestion with option to upgrade or defer 24h.
/gstack-upgrade skill — standalone upgrade command that detects install type (global-git, local-git, vendored), upgrades, and shows a "What's New" summary from CHANGELOG
"Just upgraded" confirmation — after upgrading, the next skill invocation shows "Running gstack v{new} (just updated!)" via ~/.gstack/just-upgraded-from marker
AskUserQuestion added to 5 skills — gstack (root), browse, qa, retro, setup-browser-cookies now have AskUserQuestion in allowed-tools for upgrade prompts
Bash added to plan-eng-review — enables the update check preamble to run in plan review sessions
browse/test/gstack-update-check.test.ts — 10 test cases covering all script branch paths with GSTACK_REMOTE_URL env var for test isolation
TODOS.md for tracking deferred work

Changed

Version check is now one system — removed SHA-based checkVersion() from browse/src/find-browse.ts (~120 lines deleted) and browse/test/find-browse.test.ts (~100 lines deleted). Replaced by bin/gstack-update-check bash script using semver VERSION comparison with 24h cache.
Simplified qa/SKILL.md and setup-browser-cookies/SKILL.md setup blocks — removed old BROWSE_OUTPUT/META parsing, now use simple find-browse call
Updated browse/bin/find-browse shim comments to reflect simplified role (binary locator only)

Removed

checkVersion(), readCache(), writeCache(), fetchRemoteSHA(), resolveSkillDir(), CacheEntry interface from browse/src/find-browse.ts
META:UPDATE_AVAILABLE protocol from find-browse output
Old META-based upgrade instructions from qa and setup-browser-cookies SKILL.md files
Legacy /tmp/gstack-latest-version cache file (cleaned up by setup script)

0.3.5 — 2026-03-14

Fixed

Browse binary discovery broken for agents — replaced find-browse indirection with explicit browse/dist/browse path in SKILL.md setup blocks. Agents were guessing bin/browse (wrong) instead of running find-browse to discover browse/dist/browse (correct).
Update check exit code 1 misleading agents — [ -n "$_UPD" ] && echo "$_UPD" returned exit code 1 when no update available, causing agents to think gstack was broken. Added || true.
browse/SKILL.md missing setup block — /browse used $B in every example but never defined it. Added {{BROWSE_SETUP}} placeholder.

Changed

Enriched 14 command descriptions with specific arg formats, valid values, error behavior, and return types
Fixed header usage from <name> <value> to <name>:<value> (matching actual implementation)
Added cookie usage syntax: cookie <name>=<value>
Template system expanded — added {{UPDATE_CHECK}} and {{BROWSE_SETUP}} placeholders to gen-skill-docs.ts. Converted qa/SKILL.md and setup-browser-cookies/SKILL.md to .tmpl templates. All 4 browse-using skills now generate from a single source of truth.
Setup block now checks workspace-local path first (for development), then falls back to global ~/.claude/skills/gstack/browse/dist/browse

Added

3 new e2e test cases for SKILL.md setup flow: happy path, NEEDS_SETUP, non-git-repo
LLM eval for setup block clarity (actionability + clarity >= 4)
no such file or directory.*browse error pattern in session-runner
TODO: convert remaining 5 non-browse skills to .tmpl files
Enriched 4 snapshot flag descriptions with defaults, output paths, and behavior details
Snapshot flags section now shows long flag names (-i / --interactive) alongside short
Added ref numbering explanation and output format example to snapshot docs
Replaced hand-maintained server.ts help text with auto-generated generateHelpText() from COMMAND_DESCRIPTIONS
Upgraded LLM eval judge from Haiku to Sonnet 4.6 for more stable scoring

Added

Usage string consistency test: cross-checks Usage: patterns in implementation against COMMAND_DESCRIPTIONS
Pipe guard test: ensures no command description contains | (would break markdown tables)

0.3.3 — 2026-03-13

Added

SKILL.md template system — .tmpl files with {{COMMAND_REFERENCE}} and {{SNAPSHOT_FLAGS}} placeholders, auto-generated from source code at build time. Structurally prevents command drift between docs and code.
Command registry (browse/src/commands.ts) — single source of truth for all browse commands with categories and enriched descriptions. Zero side effects, safe to import from build scripts and tests.
Snapshot flags metadata (SNAPSHOT_FLAGS array in browse/src/snapshot.ts) — metadata-driven parser replaces hand-coded switch/case. Adding a flag in one place updates the parser, docs, and tests.
Tier 1 static validation — 43 tests: parses $B commands from SKILL.md code blocks, validates against command registry and snapshot flag metadata
Tier 2 E2E tests via Agent SDK — spawns real Claude sessions, runs skills, scans for browse errors. Gated by SKILL_E2E=1 env var (~$0.50/run)
Tier 3 LLM-as-judge evals — Haiku scores generated docs on clarity/completeness/actionability (threshold ≥4/5), plus regression test vs hand-maintained baseline. Gated by ANTHROPIC_API_KEY
bun run skill:check — health dashboard showing all skills, command counts, validation status, template freshness
bun run dev:skill — watch mode that regenerates and validates SKILL.md on every template or source file change
CI workflow (.github/workflows/skill-docs.yml) — runs gen:skill-docs on push/PR, fails if generated output differs from committed files
bun run gen:skill-docs script for manual regeneration
bun run test:eval for LLM-as-judge evals
test/helpers/skill-parser.ts — extracts and validates $B commands from Markdown
test/helpers/session-runner.ts — Agent SDK wrapper with error pattern scanning and transcript saving
ARCHITECTURE.md — design decisions document covering daemon model, security, ref system, logging, crash recovery
Conductor integration (conductor.json) — lifecycle hooks for workspace setup/teardown
.env propagation — bin/dev-setup copies .env from main worktree into Conductor workspaces automatically
.env.example template for API key configuration

Changed

Build now runs gen:skill-docs before compiling binaries
parseSnapshotArgs is metadata-driven (iterates SNAPSHOT_FLAGS instead of switch/case)
server.ts imports command sets from commands.ts instead of declaring inline
SKILL.md and browse/SKILL.md are now generated files (edit the .tmpl instead)

0.3.2 — 2026-03-13

Fixed

Cookie import picker now returns JSON instead of HTML — jsonResponse() referenced url out of scope, crashing every API call
help command routed correctly (was unreachable due to META_COMMANDS dispatch ordering)
Stale servers from global install no longer shadow local changes — removed legacy ~/.claude/skills/gstack fallback from resolveServerScript()
Crash log path references updated from /tmp/ to .gstack/

Added

Diff-aware QA mode — /qa on a feature branch auto-analyzes git diff, identifies affected pages/routes, detects the running app on localhost, and tests only what changed. No URL needed.
Project-local browse state — state file, logs, and all server state now live in .gstack/ inside the project root (detected via git rev-parse --show-toplevel). No more /tmp state files.
Shared config module (browse/src/config.ts) — centralizes path resolution for CLI and server, eliminates duplicated port/state logic
Random port selection — server picks a random port 10000-60000 instead of scanning 9400-9409. No more CONDUCTOR_PORT magic offset. No more port collisions across workspaces.
Binary version tracking — state file includes binaryVersion SHA; CLI auto-restarts the server when the binary is rebuilt
Legacy /tmp cleanup — CLI scans for and removes old /tmp/browse-server*.json files, verifying PID ownership before sending signals
Greptile integration — /review and /ship fetch and triage Greptile bot comments; /retro tracks Greptile batting average across weeks
Local dev mode — bin/dev-setup symlinks skills from the repo for in-place development; bin/dev-teardown restores global install
help command — agents can self-discover all commands and snapshot flags
Version-aware find-browse with META signal protocol — detects stale binaries and prompts agents to update
browse/dist/find-browse compiled binary with git SHA comparison against origin/main (4hr cached)
.version file written at build time for binary version tracking
Route-level tests for cookie picker (13 tests) and find-browse version check (10 tests)
Config resolution tests (14 tests) covering git root detection, BROWSE_STATE_FILE override, ensureStateDir, readVersionHash, resolveServerScript, and version mismatch detection
Browser interaction guidance in CLAUDE.md — prevents Claude from using mcp__claude-in-chrome__* tools
CONTRIBUTING.md with quick start, dev mode explanation, and instructions for testing branches in other repos

Changed

State file location: .gstack/browse.json (was /tmp/browse-server.json)
Log files location: .gstack/browse-{console,network,dialog}.log (was /tmp/browse-*.log)
Atomic state file writes: .json.tmp → rename (prevents partial reads)
CLI passes BROWSE_STATE_FILE to spawned server (server derives all paths from it)
SKILL.md setup checks parse META signals and handle META:UPDATE_AVAILABLE
/qa SKILL.md now describes four modes (diff-aware, full, quick, regression) with diff-aware as the default on feature branches
jsonResponse/errorResponse use options objects to prevent positional parameter confusion
Build script compiles both browse and find-browse binaries, cleans up .bun-build temp files
README updated with Greptile setup instructions, diff-aware QA examples, and revised demo transcript

Removed

CONDUCTOR_PORT magic offset (browse_port = CONDUCTOR_PORT - 45600)
Port scan range 9400-9409
Legacy fallback to ~/.claude/skills/gstack/browse/src/server.ts
DEVELOPING_GSTACK.md (renamed to CONTRIBUTING.md)

0.3.1 — 2026-03-12

cookie-import-browser command — decrypt and import cookies from real Chromium browsers (Comet, Chrome, Arc, Brave, Edge)
Interactive cookie picker web UI served from the browse server (dark theme, two-panel layout, domain search, import/remove)
Direct CLI import with --domain flag for non-interactive use
/setup-browser-cookies skill for Claude Code integration
macOS Keychain access with async 10s timeout (no event loop blocking)
Per-browser AES key caching (one Keychain prompt per browser per session)
DB lock fallback: copies locked cookie DB to /tmp for safe reads
18 unit tests with encrypted cookie fixtures

0.3.0 — 2026-03-12

Phase 3: /qa skill — systematic QA testing

New /qa skill with 6-phase workflow (Initialize, Authenticate, Orient, Explore, Document, Wrap up)
Three modes: full (systematic, 5-10 issues), quick (30-second smoke test), regression (compare against baseline)
Issue taxonomy: 7 categories, 4 severity levels, per-page exploration checklist
Structured report template with health score (0-100, weighted across 7 categories)
Framework detection guidance for Next.js, Rails, WordPress, and SPAs
browse/bin/find-browse — DRY binary discovery using git rev-parse --show-toplevel

Phase 2: Enhanced browser

Dialog handling: auto-accept/dismiss, dialog buffer, prompt text support
File upload: upload <sel> <file1> [file2...]
Element state checks: is visible|hidden|enabled|disabled|checked|editable|focused <sel>
Annotated screenshots with ref labels overlaid (snapshot -a)
Snapshot diffing against previous snapshot (snapshot -D)
Cursor-interactive element scan for non-ARIA clickables (snapshot -C)
wait --networkidle / --load / --domcontentloaded flags
console --errors filter (error + warning only)
cookie-import <json-file> with auto-fill domain from page URL
CircularBuffer O(1) ring buffer for console/network/dialog buffers
Async buffer flush with Bun.write()
Health check with page.evaluate + 2s timeout
Playwright error wrapping — actionable messages for AI agents
Context recreation preserves cookies/storage/URLs (useragent fix)
SKILL.md rewritten as QA-oriented playbook with 10 workflow patterns
166 integration tests (was ~63)

0.0.2 — 2026-03-12

Fix project-local /browse installs — compiled binary now resolves server.ts from its own directory instead of assuming a global install exists
setup rebuilds stale binaries (not just missing ones) and exits non-zero if the build fails
Fix chain command swallowing real errors from write commands (e.g. navigation timeout reported as "Unknown meta command")
Fix unbounded restart loop in CLI when server crashes repeatedly on the same command
Cap console/network buffers at 50k entries (ring buffer) instead of growing without bound
Fix disk flush stopping silently after buffer hits the 50k cap
Fix ln -snf in setup to avoid creating nested symlinks on upgrade
Use git fetch && git reset --hard instead of git pull for upgrades (handles force-pushes)
Simplify install: global-first with optional project copy (replaces submodule approach)
Restructured README: hero, before/after, demo transcript, troubleshooting section
Six skills (added /retro)

0.0.1 — 2026-03-11

Initial release.

Five skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse
Headless browser CLI with 40+ commands, ref-based interaction, persistent Chromium daemon
One-command install as Claude Code skills (submodule or global clone)
setup script for binary compilation and skill symlinking

14 KiB Raw Blame History

Changelog

0.3.4 — 2026-03-13

Added

Changed

Removed

0.3.5 — 2026-03-14

Fixed

Changed

Added

Added

0.3.3 — 2026-03-13

Added

Changed

0.3.2 — 2026-03-13

Fixed

Added

Changed

Removed

0.3.1 — 2026-03-12

Phase 3.5: Browser cookie import

0.3.0 — 2026-03-12

Phase 3: /qa skill — systematic QA testing

Phase 2: Enhanced browser

0.0.2 — 2026-03-12

0.0.1 — 2026-03-11

14 KiB

Raw Blame History