feat: v0.3.2 — project-local state, diff-aware QA, Greptile integration (#36)

* fix: cookie import picker returns JSON instead of HTML

jsonResponse() was defined at module scope but referenced `url` which
only existed as a parameter of handleCookiePickerRoute(). Every API call
crashed, the catch block also crashed, and Bun returned a default HTML
page that the frontend couldn't parse as JSON.

Thread port via corsOrigin() helper and options objects. Add route-level
tests to prevent this class of bug from shipping again.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add help command to browse server

Agents that don't have SKILL.md loaded (or misread flags) had no way to
self-discover the CLI. The help command returns a formatted reference of
all commands and snapshot flags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: version-aware find-browse with META signal protocol

Agents in other workspaces found stale browse binaries that were missing
newer flags. find-browse now compares the local binary's git SHA against
origin/main via git ls-remote (4hr cache), and emits META:UPDATE_AVAILABLE
when behind. SKILL.md setup checks parse META signals and prompt the user
to update.

- New compiled binary: browse/dist/find-browse (TypeScript, testable)
- Bash shim at browse/bin/find-browse delegates to compiled binary
- .version file written at build time with git commit SHA
- Build script compiles both browse and find-browse binaries
- Graceful degradation: offline, missing .version, corrupt cache all skip check

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: clean up .bun-build temp files after compile

bun build --compile leaves ~58MB temp files in the working directory.
Add rm -f .*.bun-build to the build script to clean up after each build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make help command reachable by removing it from META_COMMANDS

help was in META_COMMANDS, so it dispatched to handleMetaCommand() which
threw "Unknown meta command: help". Removing it from the set lets the
dedicated else-if handler in handleCommand() execute correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: bump version and changelog (v0.3.2)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add shared Greptile comment triage reference doc

Shared reference for fetching, filtering, and classifying Greptile
review comments on GitHub PRs. Used by both /review and /ship skills.
Includes parallel API fetching, suppressions check, classification
logic, reply APIs, and history file writes.

* feat: make /review and /ship Greptile-aware

/review: Step 2.5 fetches and classifies Greptile comments, Step 5
resolves them with AskUserQuestion for valid issues and false positives.

/ship: Step 3.75 triages Greptile comments between pre-landing review
and version bump. Adds Greptile Review section to PR body in Step 8.
Re-runs tests if any Greptile fixes are applied.

* feat: add Greptile batting average to /retro

Reads ~/.gstack/greptile-history.md, computes signal ratio
(valid catches vs false positives), includes in metrics table,
JSON snapshot, and Code Quality Signals narrative.

* docs: add Greptile integration section to README

Personal endorsement, two-layer review narrative, full UX walkthrough
transcript, skills table updates. Add Greptile training feedback loop
to TODO.md future ideas.

* feat: add local dev mode for testing skills from within the repo

bin/dev-setup creates .claude/skills/gstack symlink to the working tree
so Claude Code discovers skills locally. bin/dev-teardown cleans up.
DEVELOPING_GSTACK.md documents the workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: narrow gitignore to .claude/skills/ instead of all .claude/

Avoids ignoring legitimate Claude Code config like settings.json or CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: rename DEVELOPING_GSTACK.md to CONTRIBUTING.md

Rewritten as a contributor-friendly guide instead of a dry plan doc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: explain why dev-setup is needed in CONTRIBUTING.md quick start

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add browser interaction guidance to CLAUDE.md

Prevents Claude from using mcp__claude-in-chrome__* tools instead of /browse.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add shared config module for project-local browse state

Centralizes path resolution (git root detection, state dir, log paths) into
config.ts. Both cli.ts and server.ts import from it, eliminating duplicated
PORT_OFFSET/BROWSE_PORT/STATE_FILE logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: rewrite port selection to use random ports

Replace CONDUCTOR_PORT magic offset and 9400-9409 scan with random port
10000-60000. Atomic state file writes, log paths from config module,
binaryVersion field for auto-restart on update.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: move browse state from /tmp to project-local .gstack/

CLI now uses config module for state paths, passes BROWSE_STATE_FILE to
spawned server. Adds version mismatch auto-restart, legacy /tmp cleanup
with PID verification, and removes stale global install fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update crash log path reference to .gstack/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add config tests and update CLI lifecycle test

14 new tests for config resolution, ensureStateDir, readVersionHash,
resolveServerScript, and version mismatch detection. Remove obsolete
CONDUCTOR_PORT/BROWSE_PORT filtering from commands.test.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update BROWSER.md and TODO.md for project-local state

Replace /tmp paths with .gstack/, remove CONDUCTOR_PORT docs, document
random port selection and per-project isolation. Add server bundling TODO.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update README, CHANGELOG, and CONTRIBUTING for v0.3.2

- README: replace Conductor-aware language with project-local isolation,
  add Greptile setup note
- CHANGELOG: comprehensive v0.3.2 entry with all state management changes
- CONTRIBUTING: add instructions for testing branches in other repos

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add diff-aware mode to /qa — auto-tests affected pages from branch diff

When on a feature branch, /qa now reads git diff main, identifies affected
pages/routes from changed files, and tests them automatically. No URL required.
The most natural flow: write code, /ship, /qa.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: update CHANGELOG for complete v0.3.2 coverage

Add missing entries: diff-aware QA mode, Greptile integration,
local dev mode, crash log path fix, README/SKILL.md updates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-13 18:10:56 -07:00
committed by GitHub
parent f7b95329c1
commit 07b4e15b34
30 changed files with 1629 additions and 110 deletions
+99 -13
View File
@@ -19,10 +19,10 @@ Eight opinionated workflow skills for [Claude Code](https://docs.anthropic.com/e
|-------|------|--------------|
| `/plan-ceo-review` | Founder / CEO | Rethink the problem. Find the 10-star product hiding inside the request. |
| `/plan-eng-review` | Eng manager / tech lead | Lock in architecture, data flow, diagrams, edge cases, and tests. |
| `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Not a style nitpick pass. |
| `/ship` | Release engineer | Sync main, run tests, push, open PR. For a ready branch, not for deciding what to build. |
| `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Triages Greptile review comments. |
| `/ship` | Release engineer | Sync main, run tests, resolve Greptile reviews, push, open PR. For a ready branch, not for deciding what to build. |
| `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. |
| `/qa` | QA lead | Systematic QA testing with structured reports, health scores, screenshots, and regression tracking. Three modes: full, quick, regression. |
| `/qa` | QA lead | Systematic QA testing. On a feature branch, auto-analyzes your diff, identifies affected pages, and tests them. Also: full exploration, quick smoke test, regression mode. |
| `/setup-browser-cookies` | Session manager | Import cookies from your real browser (Comet, Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages without logging in manually. |
| `/retro` | Engineering manager | Team-aware retro: your deep-dive + per-person praise and growth opportunities for every contributor. |
@@ -63,6 +63,12 @@ You: /ship
Claude: [Syncs main, runs tests, pushes branch, opens PR — 6 tool calls, done]
You: /qa
Claude: Analyzing branch diff... 8 files changed, 3 routes affected.
[Tests /listings/new, /listings/:id, /api/listings against localhost:3000]
All 3 routes working. Upload + enrichment flow passes end to end.
You: /setup-browser-cookies staging.myapp.com
Claude: Imported 8 cookies for staging.myapp.com from Chrome.
@@ -71,12 +77,6 @@ You: /qa https://staging.myapp.com --quick
Claude: [Smoke test: homepage + 5 pages, 30 seconds]
Health Score: 91/100. No critical issues. 1 medium: mobile nav overlap.
You: /browse staging.myapp.com/listings/new — test the upload flow specifically
Claude: [22 tool calls — navigates routes, fills the upload form, verifies
enrichment renders, checks console for errors, screenshots each step]
Listing flow works end to end on staging.
```
## Who this is for
@@ -91,7 +91,7 @@ gstack is powerful with one Claude Code session. It is transformative with ten.
[Conductor](https://conductor.build) runs multiple Claude Code sessions in parallel — each in its own isolated workspace. That means you can have one session running `/qa` on staging, another doing `/review` on a PR, a third implementing a feature, and seven more working on other branches. All at the same time.
gstack is Conductor-aware out of the box. Each workspace gets its own isolated browser instance (separate Chromium process, cookies, tabs, and logs) so `/browse` and `/qa` sessions never collide. No configuration needed — it just works.
Each workspace gets its own isolated browser instance automatically — separate Chromium process, cookies, tabs, and logs stored in `.gstack/` inside each project root. No port collisions, no shared state, no configuration needed. `/browse` and `/qa` sessions never interfere with each other, even across ten parallel workspaces.
This is the setup I use. One person, ten parallel agents, each with the right cognitive mode for its task. That is not incremental improvement. That is a different way of building software.
@@ -338,6 +338,72 @@ I want the plane landed.
---
## Greptile integration
[Greptile](https://greptile.com) is a YC company that reviews your PRs automatically. It catches real bugs — race conditions, security issues, things that pass CI and blow up in production. It has genuinely saved my ass more than once. I love these guys.
**Setup:** Install Greptile on your GitHub repo at [greptile.com](https://greptile.com) — it takes about 30 seconds. Once it's reviewing your PRs, gstack picks up its comments automatically. No additional configuration.
The problem with any automated reviewer is triage. Greptile is good, but not every comment is a real issue. Some are false positives. Some flag things you already fixed three commits ago. Without a triage layer, the comments pile up and you start ignoring them — which defeats the purpose.
gstack solves this. `/review` and `/ship` are now Greptile-aware. They read Greptile's comments, classify each one, and take action:
- **Valid issues** get added to the critical findings and fixed before shipping
- **Already-fixed issues** get an auto-reply acknowledging the catch
- **False positives** get pushed back — you confirm, and a reply goes out explaining why it's wrong
The result is a two-layer review: Greptile catches things asynchronously on the PR, then `/review` and `/ship` triage those findings as part of the normal workflow. Nothing falls through the cracks.
It also learns. Every false positive you confirm gets saved to `~/.gstack/greptile-history.md`. Future runs auto-skip known FP patterns for your codebase. And `/retro` tracks Greptile's batting average over time — so you can see whether the signal-to-noise ratio is improving.
### Example
```
You: /ship
Claude: [syncs main, runs tests, pre-landing review...]
Greptile found 3 comments on this PR:
[VALID] app/services/payment_service.rb:47 — Race condition:
concurrent charges can double-debit. Recommend DB-level advisory
lock around the charge block.
→ https://github.com/you/app/pull/42#discussion_r123
[ALREADY FIXED] app/models/user.rb:12 — Missing null check
on email before downcase.
→ Already fixed in commit abc1234. Auto-replying.
[FALSE POSITIVE] lib/auth.rb:88 — "Token comparison should
use constant-time comparison." We already use
ActiveSupport::SecurityUtils.secure_compare here.
→ https://github.com/you/app/pull/42#discussion_r125
Fixing the race condition in payment_service.rb...
Applied advisory lock. Committed: fix: address Greptile
review — advisory lock on concurrent charges
The auth token comment is a false positive — we're already
using secure_compare.
Options for the false positive:
A) Reply to Greptile explaining (recommended)
B) Fix it anyway
C) Ignore
You: A
Claude: Replied to Greptile. Re-running tests after the fix...
All tests pass. Continuing to version bump.
[creates PR with Greptile Review section in body]
PR: https://github.com/you/app/pull/42
```
Three Greptile comments. One real fix. One auto-acknowledged. One false positive pushed back with a reply. Total extra time: about 30 seconds. And the PR body has the full audit trail.
---
## `/browse`
This is my **QA engineer mode**.
@@ -405,11 +471,31 @@ This is my **QA lead mode**.
`/browse` gives the agent eyes. `/qa` gives it a testing methodology.
Where `/browse` is a single command — go here, click this, screenshot that — `/qa` is a full systematic test pass. It explores every reachable page, fills forms, clicks buttons, checks console errors, tests responsive layouts, and produces a structured report with a health score, screenshots as evidence, and ranked issues with repro steps.
The most common use case: you're on a feature branch, you just finished coding, and you want to verify everything works. Just say `/qa` — it reads your git diff, identifies which pages and routes your changes affect, spins up the browser, and tests each one. No URL required. No manual test plan. It figures out what to test from the code you changed.
Three modes:
```
You: /qa
- **Full** (default) — systematic exploration of the entire app. 5-15 minutes depending on app size. Documents 5-10 well-evidenced issues.
Claude: Analyzing branch diff against main...
12 files changed: 3 controllers, 2 views, 4 services, 3 tests
Affected routes: /listings/new, /listings/:id, /api/listings
Detected app running on localhost:3000.
[Tests each affected page — navigates, fills forms, clicks buttons,
screenshots, checks console errors]
QA Report: 3 routes tested, all working.
- /listings/new: upload + enrichment flow works end to end
- /listings/:id: detail page renders correctly
- /api/listings: returns 200 with expected shape
No console errors. No regressions on adjacent pages.
```
Four modes:
- **Diff-aware** (automatic on feature branches) — reads `git diff main`, identifies affected pages, tests them specifically. The fastest path from "I just wrote code" to "it works."
- **Full** — systematic exploration of the entire app. 5-15 minutes depending on app size. Documents 5-10 well-evidenced issues.
- **Quick** (`--quick`) — 30-second smoke test. Homepage + top 5 nav targets. Loads? Console errors? Broken links?
- **Regression** (`--regression baseline.json`) — run full mode, then diff against a previous baseline. Which issues are fixed? Which are new? What's the score delta?