mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 11:45:20 +02:00
1717ed2891
* fix: replace find-browse with direct path in SKILL.md setup blocks
Agents were skipping the find-browse binary and guessing bin/browse
(wrong path). Now the setup block explicitly checks browse/dist/browse
with workspace-local priority, global fallback.
Also adds || true to update check to prevent misleading exit code 1.
Adds {{UPDATE_CHECK}} and {{BROWSE_SETUP}} template placeholders to
gen-skill-docs.ts so all skills share a single source of truth.
* refactor: convert qa/ and setup-browser-cookies/ to .tmpl templates
Replaces hardcoded update check and find-browse blocks with
{{UPDATE_CHECK}} and {{BROWSE_SETUP}} placeholders. Both skills
are now generated from templates via gen-skill-docs.
* test: add e2e and LLM eval tests for SKILL.md setup block
- 3 Agent SDK e2e tests: happy path, NEEDS_SETUP, non-git-repo
- LLM eval: setup block clarity + actionability >= 4
- New error pattern: 'no such file or directory.*browse'
These tests catch the exact failure mode where agents can't discover
the browse binary via SKILL.md instructions.
* chore: bump version and changelog (v0.3.5)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
14 KiB
14 KiB
Changelog
0.3.4 — 2026-03-13
Added
- Daily update check — all 9 skills now check for new versions once per day via
bin/gstack-update-check(pure bash, <5ms cached). Prompts user via AskUserQuestion with option to upgrade or defer 24h. /gstack-upgradeskill — standalone upgrade command that detects install type (global-git, local-git, vendored), upgrades, and shows a "What's New" summary from CHANGELOG- "Just upgraded" confirmation — after upgrading, the next skill invocation shows "Running gstack v{new} (just updated!)" via
~/.gstack/just-upgraded-frommarker AskUserQuestionadded to 5 skills — gstack (root), browse, qa, retro, setup-browser-cookies now have AskUserQuestion in allowed-tools for upgrade promptsBashadded to plan-eng-review — enables the update check preamble to run in plan review sessionsbrowse/test/gstack-update-check.test.ts— 10 test cases covering all script branch paths withGSTACK_REMOTE_URLenv var for test isolationTODOS.mdfor tracking deferred work
Changed
- Version check is now one system — removed SHA-based
checkVersion()frombrowse/src/find-browse.ts(~120 lines deleted) andbrowse/test/find-browse.test.ts(~100 lines deleted). Replaced bybin/gstack-update-checkbash script using semver VERSION comparison with 24h cache. - Simplified
qa/SKILL.mdandsetup-browser-cookies/SKILL.mdsetup blocks — removed oldBROWSE_OUTPUT/METAparsing, now use simplefind-browsecall - Updated
browse/bin/find-browseshim comments to reflect simplified role (binary locator only)
Removed
checkVersion(),readCache(),writeCache(),fetchRemoteSHA(),resolveSkillDir(),CacheEntryinterface frombrowse/src/find-browse.tsMETA:UPDATE_AVAILABLEprotocol from find-browse output- Old META-based upgrade instructions from qa and setup-browser-cookies SKILL.md files
- Legacy
/tmp/gstack-latest-versioncache file (cleaned up bysetupscript)
0.3.5 — 2026-03-14
Fixed
- Browse binary discovery broken for agents — replaced
find-browseindirection with explicitbrowse/dist/browsepath in SKILL.md setup blocks. Agents were guessingbin/browse(wrong) instead of runningfind-browseto discoverbrowse/dist/browse(correct). - Update check exit code 1 misleading agents —
[ -n "$_UPD" ] && echo "$_UPD"returned exit code 1 when no update available, causing agents to think gstack was broken. Added|| true. - browse/SKILL.md missing setup block —
/browseused$Bin every example but never defined it. Added{{BROWSE_SETUP}}placeholder.
Changed
- Enriched 14 command descriptions with specific arg formats, valid values, error behavior, and return types
- Fixed
headerusage from<name> <value>to<name>:<value>(matching actual implementation) - Added
cookieusage syntax:cookie <name>=<value> - Template system expanded — added
{{UPDATE_CHECK}}and{{BROWSE_SETUP}}placeholders togen-skill-docs.ts. Convertedqa/SKILL.mdandsetup-browser-cookies/SKILL.mdto.tmpltemplates. All 4 browse-using skills now generate from a single source of truth. - Setup block now checks workspace-local path first (for development), then falls back to global
~/.claude/skills/gstack/browse/dist/browse
Added
- 3 new e2e test cases for SKILL.md setup flow: happy path, NEEDS_SETUP, non-git-repo
- LLM eval for setup block clarity (actionability + clarity >= 4)
no such file or directory.*browseerror pattern in session-runner- TODO: convert remaining 5 non-browse skills to .tmpl files
- Enriched 4 snapshot flag descriptions with defaults, output paths, and behavior details
- Snapshot flags section now shows long flag names (
-i / --interactive) alongside short - Added ref numbering explanation and output format example to snapshot docs
- Replaced hand-maintained server.ts help text with auto-generated
generateHelpText()from COMMAND_DESCRIPTIONS - Upgraded LLM eval judge from Haiku to Sonnet 4.6 for more stable scoring
Added
- Usage string consistency test: cross-checks
Usage:patterns in implementation against COMMAND_DESCRIPTIONS - Pipe guard test: ensures no command description contains
|(would break markdown tables)
0.3.3 — 2026-03-13
Added
- SKILL.md template system —
.tmplfiles with{{COMMAND_REFERENCE}}and{{SNAPSHOT_FLAGS}}placeholders, auto-generated from source code at build time. Structurally prevents command drift between docs and code. - Command registry (
browse/src/commands.ts) — single source of truth for all browse commands with categories and enriched descriptions. Zero side effects, safe to import from build scripts and tests. - Snapshot flags metadata (
SNAPSHOT_FLAGSarray inbrowse/src/snapshot.ts) — metadata-driven parser replaces hand-coded switch/case. Adding a flag in one place updates the parser, docs, and tests. - Tier 1 static validation — 43 tests: parses
$Bcommands from SKILL.md code blocks, validates against command registry and snapshot flag metadata - Tier 2 E2E tests via Agent SDK — spawns real Claude sessions, runs skills, scans for browse errors. Gated by
SKILL_E2E=1env var (~$0.50/run) - Tier 3 LLM-as-judge evals — Haiku scores generated docs on clarity/completeness/actionability (threshold ≥4/5), plus regression test vs hand-maintained baseline. Gated by
ANTHROPIC_API_KEY bun run skill:check— health dashboard showing all skills, command counts, validation status, template freshnessbun run dev:skill— watch mode that regenerates and validates SKILL.md on every template or source file change- CI workflow (
.github/workflows/skill-docs.yml) — runsgen:skill-docson push/PR, fails if generated output differs from committed files bun run gen:skill-docsscript for manual regenerationbun run test:evalfor LLM-as-judge evalstest/helpers/skill-parser.ts— extracts and validates$Bcommands from Markdowntest/helpers/session-runner.ts— Agent SDK wrapper with error pattern scanning and transcript saving- ARCHITECTURE.md — design decisions document covering daemon model, security, ref system, logging, crash recovery
- Conductor integration (
conductor.json) — lifecycle hooks for workspace setup/teardown .envpropagation —bin/dev-setupcopies.envfrom main worktree into Conductor workspaces automatically.env.exampletemplate for API key configuration
Changed
- Build now runs
gen:skill-docsbefore compiling binaries parseSnapshotArgsis metadata-driven (iteratesSNAPSHOT_FLAGSinstead of switch/case)server.tsimports command sets fromcommands.tsinstead of declaring inline- SKILL.md and browse/SKILL.md are now generated files (edit the
.tmplinstead)
0.3.2 — 2026-03-13
Fixed
- Cookie import picker now returns JSON instead of HTML —
jsonResponse()referencedurlout of scope, crashing every API call helpcommand routed correctly (was unreachable due to META_COMMANDS dispatch ordering)- Stale servers from global install no longer shadow local changes — removed legacy
~/.claude/skills/gstackfallback fromresolveServerScript() - Crash log path references updated from
/tmp/to.gstack/
Added
- Diff-aware QA mode —
/qaon a feature branch auto-analyzesgit diff, identifies affected pages/routes, detects the running app on localhost, and tests only what changed. No URL needed. - Project-local browse state — state file, logs, and all server state now live in
.gstack/inside the project root (detected viagit rev-parse --show-toplevel). No more/tmpstate files. - Shared config module (
browse/src/config.ts) — centralizes path resolution for CLI and server, eliminates duplicated port/state logic - Random port selection — server picks a random port 10000-60000 instead of scanning 9400-9409. No more CONDUCTOR_PORT magic offset. No more port collisions across workspaces.
- Binary version tracking — state file includes
binaryVersionSHA; CLI auto-restarts the server when the binary is rebuilt - Legacy /tmp cleanup — CLI scans for and removes old
/tmp/browse-server*.jsonfiles, verifying PID ownership before sending signals - Greptile integration —
/reviewand/shipfetch and triage Greptile bot comments;/retrotracks Greptile batting average across weeks - Local dev mode —
bin/dev-setupsymlinks skills from the repo for in-place development;bin/dev-teardownrestores global install helpcommand — agents can self-discover all commands and snapshot flags- Version-aware
find-browsewith META signal protocol — detects stale binaries and prompts agents to update browse/dist/find-browsecompiled binary with git SHA comparison against origin/main (4hr cached).versionfile written at build time for binary version tracking- Route-level tests for cookie picker (13 tests) and find-browse version check (10 tests)
- Config resolution tests (14 tests) covering git root detection, BROWSE_STATE_FILE override, ensureStateDir, readVersionHash, resolveServerScript, and version mismatch detection
- Browser interaction guidance in CLAUDE.md — prevents Claude from using mcp__claude-in-chrome__* tools
- CONTRIBUTING.md with quick start, dev mode explanation, and instructions for testing branches in other repos
Changed
- State file location:
.gstack/browse.json(was/tmp/browse-server.json) - Log files location:
.gstack/browse-{console,network,dialog}.log(was/tmp/browse-*.log) - Atomic state file writes:
.json.tmp→ rename (prevents partial reads) - CLI passes
BROWSE_STATE_FILEto spawned server (server derives all paths from it) - SKILL.md setup checks parse META signals and handle
META:UPDATE_AVAILABLE /qaSKILL.md now describes four modes (diff-aware, full, quick, regression) with diff-aware as the default on feature branchesjsonResponse/errorResponseuse options objects to prevent positional parameter confusion- Build script compiles both
browseandfind-browsebinaries, cleans up.bun-buildtemp files - README updated with Greptile setup instructions, diff-aware QA examples, and revised demo transcript
Removed
CONDUCTOR_PORTmagic offset (browse_port = CONDUCTOR_PORT - 45600)- Port scan range 9400-9409
- Legacy fallback to
~/.claude/skills/gstack/browse/src/server.ts DEVELOPING_GSTACK.md(renamed to CONTRIBUTING.md)
0.3.1 — 2026-03-12
Phase 3.5: Browser cookie import
cookie-import-browsercommand — decrypt and import cookies from real Chromium browsers (Comet, Chrome, Arc, Brave, Edge)- Interactive cookie picker web UI served from the browse server (dark theme, two-panel layout, domain search, import/remove)
- Direct CLI import with
--domainflag for non-interactive use /setup-browser-cookiesskill for Claude Code integration- macOS Keychain access with async 10s timeout (no event loop blocking)
- Per-browser AES key caching (one Keychain prompt per browser per session)
- DB lock fallback: copies locked cookie DB to /tmp for safe reads
- 18 unit tests with encrypted cookie fixtures
0.3.0 — 2026-03-12
Phase 3: /qa skill — systematic QA testing
- New
/qaskill with 6-phase workflow (Initialize, Authenticate, Orient, Explore, Document, Wrap up) - Three modes: full (systematic, 5-10 issues), quick (30-second smoke test), regression (compare against baseline)
- Issue taxonomy: 7 categories, 4 severity levels, per-page exploration checklist
- Structured report template with health score (0-100, weighted across 7 categories)
- Framework detection guidance for Next.js, Rails, WordPress, and SPAs
browse/bin/find-browse— DRY binary discovery usinggit rev-parse --show-toplevel
Phase 2: Enhanced browser
- Dialog handling: auto-accept/dismiss, dialog buffer, prompt text support
- File upload:
upload <sel> <file1> [file2...] - Element state checks:
is visible|hidden|enabled|disabled|checked|editable|focused <sel> - Annotated screenshots with ref labels overlaid (
snapshot -a) - Snapshot diffing against previous snapshot (
snapshot -D) - Cursor-interactive element scan for non-ARIA clickables (
snapshot -C) wait --networkidle/--load/--domcontentloadedflagsconsole --errorsfilter (error + warning only)cookie-import <json-file>with auto-fill domain from page URL- CircularBuffer O(1) ring buffer for console/network/dialog buffers
- Async buffer flush with Bun.write()
- Health check with page.evaluate + 2s timeout
- Playwright error wrapping — actionable messages for AI agents
- Context recreation preserves cookies/storage/URLs (useragent fix)
- SKILL.md rewritten as QA-oriented playbook with 10 workflow patterns
- 166 integration tests (was ~63)
0.0.2 — 2026-03-12
- Fix project-local
/browseinstalls — compiled binary now resolvesserver.tsfrom its own directory instead of assuming a global install exists setuprebuilds stale binaries (not just missing ones) and exits non-zero if the build fails- Fix
chaincommand swallowing real errors from write commands (e.g. navigation timeout reported as "Unknown meta command") - Fix unbounded restart loop in CLI when server crashes repeatedly on the same command
- Cap console/network buffers at 50k entries (ring buffer) instead of growing without bound
- Fix disk flush stopping silently after buffer hits the 50k cap
- Fix
ln -snfin setup to avoid creating nested symlinks on upgrade - Use
git fetch && git reset --hardinstead ofgit pullfor upgrades (handles force-pushes) - Simplify install: global-first with optional project copy (replaces submodule approach)
- Restructured README: hero, before/after, demo transcript, troubleshooting section
- Six skills (added
/retro)
0.0.1 — 2026-03-11
Initial release.
- Five skills:
/plan-ceo-review,/plan-eng-review,/review,/ship,/browse - Headless browser CLI with 40+ commands, ref-based interaction, persistent Chromium daemon
- One-command install as Claude Code skills (submodule or global clone)
setupscript for binary compilation and skill symlinking