feat: v0.3.2 — project-local state, diff-aware QA, Greptile integration (#36)

* fix: cookie import picker returns JSON instead of HTML jsonResponse() was defined at module scope but referenced `url` which only existed as a parameter of handleCookiePickerRoute(). Every API call crashed, the catch block also crashed, and Bun returned a default HTML page that the frontend couldn't parse as JSON. Thread port via corsOrigin() helper and options objects. Add route-level tests to prevent this class of bug from shipping again. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add help command to browse server Agents that don't have SKILL.md loaded (or misread flags) had no way to self-discover the CLI. The help command returns a formatted reference of all commands and snapshot flags. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: version-aware find-browse with META signal protocol Agents in other workspaces found stale browse binaries that were missing newer flags. find-browse now compares the local binary's git SHA against origin/main via git ls-remote (4hr cache), and emits META:UPDATE_AVAILABLE when behind. SKILL.md setup checks parse META signals and prompt the user to update. - New compiled binary: browse/dist/find-browse (TypeScript, testable) - Bash shim at browse/bin/find-browse delegates to compiled binary - .version file written at build time with git commit SHA - Build script compiles both browse and find-browse binaries - Graceful degradation: offline, missing .version, corrupt cache all skip check Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: clean up .bun-build temp files after compile bun build --compile leaves ~58MB temp files in the working directory. Add rm -f .*.bun-build to the build script to clean up after each build. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: make help command reachable by removing it from META_COMMANDS help was in META_COMMANDS, so it dispatched to handleMetaCommand() which threw "Unknown meta command: help". Removing it from the set lets the dedicated else-if handler in handleCommand() execute correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version and changelog (v0.3.2) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add shared Greptile comment triage reference doc Shared reference for fetching, filtering, and classifying Greptile review comments on GitHub PRs. Used by both /review and /ship skills. Includes parallel API fetching, suppressions check, classification logic, reply APIs, and history file writes. * feat: make /review and /ship Greptile-aware /review: Step 2.5 fetches and classifies Greptile comments, Step 5 resolves them with AskUserQuestion for valid issues and false positives. /ship: Step 3.75 triages Greptile comments between pre-landing review and version bump. Adds Greptile Review section to PR body in Step 8. Re-runs tests if any Greptile fixes are applied. * feat: add Greptile batting average to /retro Reads ~/.gstack/greptile-history.md, computes signal ratio (valid catches vs false positives), includes in metrics table, JSON snapshot, and Code Quality Signals narrative. * docs: add Greptile integration section to README Personal endorsement, two-layer review narrative, full UX walkthrough transcript, skills table updates. Add Greptile training feedback loop to TODO.md future ideas. * feat: add local dev mode for testing skills from within the repo bin/dev-setup creates .claude/skills/gstack symlink to the working tree so Claude Code discovers skills locally. bin/dev-teardown cleans up. DEVELOPING_GSTACK.md documents the workflow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: narrow gitignore to .claude/skills/ instead of all .claude/ Avoids ignoring legitimate Claude Code config like settings.json or CLAUDE.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: rename DEVELOPING_GSTACK.md to CONTRIBUTING.md Rewritten as a contributor-friendly guide instead of a dry plan doc. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: explain why dev-setup is needed in CONTRIBUTING.md quick start Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add browser interaction guidance to CLAUDE.md Prevents Claude from using mcp__claude-in-chrome__* tools instead of /browse. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add shared config module for project-local browse state Centralizes path resolution (git root detection, state dir, log paths) into config.ts. Both cli.ts and server.ts import from it, eliminating duplicated PORT_OFFSET/BROWSE_PORT/STATE_FILE logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: rewrite port selection to use random ports Replace CONDUCTOR_PORT magic offset and 9400-9409 scan with random port 10000-60000. Atomic state file writes, log paths from config module, binaryVersion field for auto-restart on update. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: move browse state from /tmp to project-local .gstack/ CLI now uses config module for state paths, passes BROWSE_STATE_FILE to spawned server. Adds version mismatch auto-restart, legacy /tmp cleanup with PID verification, and removes stale global install fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update crash log path reference to .gstack/ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: add config tests and update CLI lifecycle test 14 new tests for config resolution, ensureStateDir, readVersionHash, resolveServerScript, and version mismatch detection. Remove obsolete CONDUCTOR_PORT/BROWSE_PORT filtering from commands.test.ts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update BROWSER.md and TODO.md for project-local state Replace /tmp paths with .gstack/, remove CONDUCTOR_PORT docs, document random port selection and per-project isolation. Add server bundling TODO. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update README, CHANGELOG, and CONTRIBUTING for v0.3.2 - README: replace Conductor-aware language with project-local isolation, add Greptile setup note - CHANGELOG: comprehensive v0.3.2 entry with all state management changes - CONTRIBUTING: add instructions for testing branches in other repos Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add diff-aware mode to /qa — auto-tests affected pages from branch diff When on a feature branch, /qa now reads git diff main, identifies affected pages/routes from changed files, and tests them automatically. No URL required. The most natural flow: write code, /ship, /qa. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: update CHANGELOG for complete v0.3.2 coverage Add missing entries: diff-aware QA mode, Greptile integration, local dev mode, crash log path fix, README/SKILL.md updates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-01 19:25:10 +02:00 · 2026-03-13 18:10:56 -07:00
parent f7b95329c1
commit 07b4e15b34
30 changed files with 1629 additions and 110 deletions
@@ -19,10 +19,10 @@ Eight opinionated workflow skills for [Claude Code](https://docs.anthropic.com/e
 |-------|------|--------------|
 | `/plan-ceo-review` | Founder / CEO | Rethink the problem. Find the 10-star product hiding inside the request. |
 | `/plan-eng-review` | Eng manager / tech lead | Lock in architecture, data flow, diagrams, edge cases, and tests. |
-| `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Not a style nitpick pass. |
-| `/ship` | Release engineer | Sync main, run tests, push, open PR. For a ready branch, not for deciding what to build. |
+| `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Triages Greptile review comments. |
+| `/ship` | Release engineer | Sync main, run tests, resolve Greptile reviews, push, open PR. For a ready branch, not for deciding what to build. |
 | `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. |
-| `/qa` | QA lead | Systematic QA testing with structured reports, health scores, screenshots, and regression tracking. Three modes: full, quick, regression. |
+| `/qa` | QA lead | Systematic QA testing. On a feature branch, auto-analyzes your diff, identifies affected pages, and tests them. Also: full exploration, quick smoke test, regression mode. |
 | `/setup-browser-cookies` | Session manager | Import cookies from your real browser (Comet, Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages without logging in manually. |
 | `/retro` | Engineering manager | Team-aware retro: your deep-dive + per-person praise and growth opportunities for every contributor. |

@@ -63,6 +63,12 @@ You:   /ship

 Claude: [Syncs main, runs tests, pushes branch, opens PR — 6 tool calls, done]

+You:   /qa
+
+Claude: Analyzing branch diff... 8 files changed, 3 routes affected.
+        [Tests /listings/new, /listings/:id, /api/listings against localhost:3000]
+        All 3 routes working. Upload + enrichment flow passes end to end.
+
 You:   /setup-browser-cookies staging.myapp.com

 Claude: Imported 8 cookies for staging.myapp.com from Chrome.
@@ -71,12 +77,6 @@ You:   /qa https://staging.myapp.com --quick

 Claude: [Smoke test: homepage + 5 pages, 30 seconds]
        Health Score: 91/100. No critical issues. 1 medium: mobile nav overlap.
-
-You:   /browse staging.myapp.com/listings/new — test the upload flow specifically
-
-Claude: [22 tool calls — navigates routes, fills the upload form, verifies
-        enrichment renders, checks console for errors, screenshots each step]
-        Listing flow works end to end on staging.
 ```

 ## Who this is for
@@ -91,7 +91,7 @@ gstack is powerful with one Claude Code session. It is transformative with ten.

 [Conductor](https://conductor.build) runs multiple Claude Code sessions in parallel — each in its own isolated workspace. That means you can have one session running `/qa` on staging, another doing `/review` on a PR, a third implementing a feature, and seven more working on other branches. All at the same time.

-gstack is Conductor-aware out of the box. Each workspace gets its own isolated browser instance (separate Chromium process, cookies, tabs, and logs) so `/browse` and `/qa` sessions never collide. No configuration needed — it just works.
+Each workspace gets its own isolated browser instance automatically — separate Chromium process, cookies, tabs, and logs stored in `.gstack/` inside each project root. No port collisions, no shared state, no configuration needed. `/browse` and `/qa` sessions never interfere with each other, even across ten parallel workspaces.

 This is the setup I use. One person, ten parallel agents, each with the right cognitive mode for its task. That is not incremental improvement. That is a different way of building software.

@@ -338,6 +338,72 @@ I want the plane landed.

 ---

+## Greptile integration
+
+[Greptile](https://greptile.com) is a YC company that reviews your PRs automatically. It catches real bugs — race conditions, security issues, things that pass CI and blow up in production. It has genuinely saved my ass more than once. I love these guys.
+
+**Setup:** Install Greptile on your GitHub repo at [greptile.com](https://greptile.com) — it takes about 30 seconds. Once it's reviewing your PRs, gstack picks up its comments automatically. No additional configuration.
+
+The problem with any automated reviewer is triage. Greptile is good, but not every comment is a real issue. Some are false positives. Some flag things you already fixed three commits ago. Without a triage layer, the comments pile up and you start ignoring them — which defeats the purpose.
+
+gstack solves this. `/review` and `/ship` are now Greptile-aware. They read Greptile's comments, classify each one, and take action:
+
+- **Valid issues** get added to the critical findings and fixed before shipping
+- **Already-fixed issues** get an auto-reply acknowledging the catch
+- **False positives** get pushed back — you confirm, and a reply goes out explaining why it's wrong
+
+The result is a two-layer review: Greptile catches things asynchronously on the PR, then `/review` and `/ship` triage those findings as part of the normal workflow. Nothing falls through the cracks.
+
+It also learns. Every false positive you confirm gets saved to `~/.gstack/greptile-history.md`. Future runs auto-skip known FP patterns for your codebase. And `/retro` tracks Greptile's batting average over time — so you can see whether the signal-to-noise ratio is improving.
+
+### Example
+
+```
+You:   /ship
+
+Claude: [syncs main, runs tests, pre-landing review...]
+
+        Greptile found 3 comments on this PR:
+
+        [VALID] app/services/payment_service.rb:47 — Race condition:
+        concurrent charges can double-debit. Recommend DB-level advisory
+        lock around the charge block.
+        → https://github.com/you/app/pull/42#discussion_r123
+
+        [ALREADY FIXED] app/models/user.rb:12 — Missing null check
+        on email before downcase.
+        → Already fixed in commit abc1234. Auto-replying.
+
+        [FALSE POSITIVE] lib/auth.rb:88 — "Token comparison should
+        use constant-time comparison." We already use
+        ActiveSupport::SecurityUtils.secure_compare here.
+        → https://github.com/you/app/pull/42#discussion_r125
+
+        Fixing the race condition in payment_service.rb...
+        Applied advisory lock. Committed: fix: address Greptile
+        review — advisory lock on concurrent charges
+
+        The auth token comment is a false positive — we're already
+        using secure_compare.
+
+        Options for the false positive:
+          A) Reply to Greptile explaining (recommended)
+          B) Fix it anyway
+          C) Ignore
+
+You:    A
+
+Claude: Replied to Greptile. Re-running tests after the fix...
+        All tests pass. Continuing to version bump.
+
+        [creates PR with Greptile Review section in body]
+        PR: https://github.com/you/app/pull/42
+```
+
+Three Greptile comments. One real fix. One auto-acknowledged. One false positive pushed back with a reply. Total extra time: about 30 seconds. And the PR body has the full audit trail.
+
+---
+
 ## `/browse`

 This is my **QA engineer mode**.
@@ -405,11 +471,31 @@ This is my **QA lead mode**.

 `/browse` gives the agent eyes. `/qa` gives it a testing methodology.

-Where `/browse` is a single command — go here, click this, screenshot that — `/qa` is a full systematic test pass. It explores every reachable page, fills forms, clicks buttons, checks console errors, tests responsive layouts, and produces a structured report with a health score, screenshots as evidence, and ranked issues with repro steps.
+The most common use case: you're on a feature branch, you just finished coding, and you want to verify everything works. Just say `/qa` — it reads your git diff, identifies which pages and routes your changes affect, spins up the browser, and tests each one. No URL required. No manual test plan. It figures out what to test from the code you changed.

-Three modes:
+```
+You:   /qa

- **Full** (default) — systematic exploration of the entire app. 5-15 minutes depending on app size. Documents 5-10 well-evidenced issues.
+Claude: Analyzing branch diff against main...
+        12 files changed: 3 controllers, 2 views, 4 services, 3 tests
+
+        Affected routes: /listings/new, /listings/:id, /api/listings
+        Detected app running on localhost:3000.
+
+        [Tests each affected page — navigates, fills forms, clicks buttons,
+        screenshots, checks console errors]
+
+        QA Report: 3 routes tested, all working.
+        - /listings/new: upload + enrichment flow works end to end
+        - /listings/:id: detail page renders correctly
+        - /api/listings: returns 200 with expected shape
+        No console errors. No regressions on adjacent pages.
+```
+
+Four modes:
+
+- **Diff-aware** (automatic on feature branches) — reads `git diff main`, identifies affected pages, tests them specifically. The fastest path from "I just wrote code" to "it works."
+- **Full** — systematic exploration of the entire app. 5-15 minutes depending on app size. Documents 5-10 well-evidenced issues.
 - **Quick** (`--quick`) — 30-second smoke test. Homepage + top 5 nav targets. Loads? Console errors? Broken links?
 - **Regression** (`--regression baseline.json`) — run full mode, then diff against a previous baseline. Which issues are fixed? Which are new? What's the score delta?