From 210e1b1f25779bdc2d856c8b637319545a5eade1 Mon Sep 17 00:00:00 2001
From: Garry Tan
Date: Sun, 15 Mar 2026 21:17:15 -0500
Subject: [PATCH] chore: bump version and changelog (v0.4.0)

Co-Authored-By: Claude Opus 4.6
---
 CHANGELOG.md | 25 +++++++++++++++++++++++++
 CLAUDE.md    |  1 +
 README.md    |  9 +++++----
 TODOS.md     | 24 ++++++++++++++++++++++++
 VERSION      |  2 +-
 5 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4c571e6e..013e5978 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,30 @@
 # Changelog

+## 0.4.0 — 2026-03-16
+
+### Added
+- **QA-only skill** (`/qa-only`) — report-only QA mode that finds and documents bugs without making fixes. Uses `allowedTools` to block `Edit` tool entirely.
+- **QA fix loop** — `/qa` now runs a find-fix-verify cycle: discover bugs, fix them, commit, re-navigate to confirm the fix took.
+- **Plan-to-QA artifact flow** — `/plan-eng-review` writes test-plan artifacts to `~/.gstack/projects//` that `/qa` picks up for targeted testing.
+- **`{{QA_METHODOLOGY}}` DRY placeholder** — shared QA methodology block injected into both `/qa` and `/qa-only` SKILL.md templates via gen-skill-docs.
+- **Eval efficiency metrics** — turns, duration, and cost now displayed across all eval surfaces (summary, comparison, list, watch). Comparison output includes natural-language **Takeaway** commentary interpreting deltas.
+- **`generateCommentary()` engine** — pure function that interprets comparison deltas: flags regressions, notes improvements, reports per-test efficiency changes, and produces overall summary.
+- **Eval list columns** — `bun run eval:list` now shows Turns and Duration per run.
+- **Eval summary per-test efficiency** — `bun run eval:summary` shows average turns/duration/cost per test across runs.
+- **`judgePassed()` unit tests** — extracted and tested the pass/fail judgment logic.
+- **3 new E2E tests** — qa-only no-fix guardrail, qa fix loop with commit verification, plan-eng-review test-plan artifact.
+- **Browser ref staleness detection** — `resolveRef()` now checks element count to detect stale refs after page mutations.
+- 3 new snapshot tests for ref staleness.
+
+### Changed
+- QA skill prompt restructured with explicit two-cycle workflow (find → fix → verify).
+- `formatComparison()` now shows per-test turns and duration deltas alongside cost.
+- `printSummary()` shows turns and duration columns.
+- `eval-store.test.ts` fixed pre-existing `_partial` file assertion bug.
+
+### Fixed
+- Browser ref staleness — refs collected before page mutation (e.g. SPA navigation) are now detected and re-collected.
+
 ## 0.3.9 — 2026-03-15

 ### Added
diff --git a/CLAUDE.md b/CLAUDE.md
index c6909357..65462335 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -42,6 +42,7 @@ gstack/
 │ ├── gen-skill-docs.test.ts # Tier 1: generator quality (free, <1s)
 │ ├── skill-llm-eval.test.ts # Tier 3: LLM-as-judge (~$0.15/run)
 │ └── skill-e2e.test.ts # Tier 2: E2E via claude -p (~$3.85/run)
+├── qa-only/ # /qa-only skill (report-only QA, no fixes)
 ├── ship/ # Ship workflow skill
 ├── review/ # PR review skill
 ├── plan-ceo-review/ # /plan-ceo-review skill
diff --git a/README.md b/README.md
index 27548066..ca2ddb77 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@

 **gstack turns Claude Code from one generic assistant into a team of specialists you can summon on demand.**

-Eight opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, code review, one-command shipping, browser automation, QA testing, and engineering retrospectives — all as slash commands.
+Nine opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, code review, one-command shipping, browser automation, QA testing, and engineering retrospectives — all as slash commands.

 ### Without gstack

@@ -22,7 +22,8 @@ Eight opinionated workflow skills for [Claude Code](https://docs.anthropic.com/e
 | `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Triages Greptile review comments. |
 | `/ship` | Release engineer | Sync main, run tests, resolve Greptile reviews, push, open PR. For a ready branch, not for deciding what to build. |
 | `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. |
-| `/qa` | QA lead | Systematic QA testing. On a feature branch, auto-analyzes your diff, identifies affected pages, and tests them. Also: full exploration, quick smoke test, regression mode. |
+| `/qa` | QA + fix engineer | Test app, find bugs, fix them with atomic commits, re-verify. Before/after health scores and ship-readiness summary. Three tiers: Quick, Standard, Exhaustive. |
+| `/qa-only` | QA reporter | Report-only QA testing. Same methodology as /qa but never fixes anything. Use when you want a pure bug report without code changes. |
 | `/setup-browser-cookies` | Session manager | Import cookies from your real browser (Comet, Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages without logging in manually. |
 | `/retro` | Engineering manager | Team-aware retro: your deep-dive + per-person praise and growth opportunities for every contributor. |

@@ -103,7 +104,7 @@ This is the setup I use. One person, ten parallel agents, each with the right co

 Open Claude Code and paste this. Claude will do the rest.

-> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /qa, /setup-browser-cookies, /retro. Then ask the user if they also want to add gstack to the current project so teammates get it.
+> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /qa, /qa-only, /setup-browser-cookies, /retro. Then ask the user if they also want to add gstack to the current project so teammates get it.

 ### Step 2: Add to your repo so teammates get it (optional)

@@ -613,7 +614,7 @@ Or set `auto_upgrade: true` in `~/.gstack/config.yaml` to upgrade automatically

 Paste this into Claude Code:

-> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa setup-browser-cookies; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa setup-browser-cookies; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.
+> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa qa-only setup-browser-cookies; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa qa-only setup-browser-cookies; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.

 ## Development

diff --git a/TODOS.md b/TODOS.md
index 4916c236..7bd1176a 100644
--- a/TODOS.md
+++ b/TODOS.md
@@ -350,6 +350,30 @@
 **Priority:** P3
 **Depends on:** Eval persistence (shipped in v0.3.6)

+### CI/CD QA quality gate
+
+**What:** Run `/qa` as a GitHub Action step, fail PR if health score drops below threshold.
+
+**Why:** Automated quality gate catches regressions before merge. Currently QA is manual — CI integration makes it part of the standard workflow.
+
+**Context:** Requires headless browse binary available in CI. The `/qa` skill already produces `baseline.json` with health scores — CI step would compare against the main branch baseline and fail if score drops. Would need `ANTHROPIC_API_KEY` in CI secrets since `/qa` uses Claude.
+
+**Effort:** M
+**Priority:** P2
+**Depends on:** None
+
+### CDP-based DOM mutation detection for ref staleness
+
+**What:** Use Chrome DevTools Protocol `DOM.documentUpdated` / MutationObserver events to proactively invalidate stale refs when the DOM changes, without requiring an explicit `snapshot` call.
+
+**Why:** Current ref staleness detection (async count() check) only catches stale refs at action time. CDP mutation detection would proactively warn when refs become stale, preventing the 5-second timeout entirely for SPA re-renders.
+
+**Context:** Parts 1+2 of ref staleness fix (RefEntry metadata + eager validation via count()) are shipped. This is Part 3 — the most ambitious piece. Requires CDP session alongside Playwright, MutationObserver bridge, and careful performance tuning to avoid overhead on every DOM change.
+
+**Effort:** L
+**Priority:** P3
+**Depends on:** Ref staleness Parts 1+2 (shipped)
+
 ## Completed

 ### Phase 1: Foundations (v0.2.0)
diff --git a/VERSION b/VERSION
index 940ac09a..1d0ba9ea 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-0.3.9
+0.4.0
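
Note on the "CI/CD QA quality gate" TODO added above: the comparison step it describes could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from this patch. The patch says `baseline.json` carries health scores but does not show its schema, so the `healthScore` field, the `checkQaGate` function, and the `maxDrop` tolerance are all hypothetical names.

```typescript
// Hypothetical shape: the patch only states that baseline.json contains
// health scores. The field name `healthScore` is an assumption.
interface Baseline {
  healthScore: number;
}

interface GateResult {
  passed: boolean;
  delta: number; // PR score minus main-branch score
  message: string;
}

// Compare the PR's health score against the main-branch baseline.
// `maxDrop` is an assumed tolerance: the number of points the score may
// fall before the gate fails. The default of 0 fails on any drop.
function checkQaGate(main: Baseline, pr: Baseline, maxDrop = 0): GateResult {
  const delta = pr.healthScore - main.healthScore;
  const passed = delta >= -maxDrop;
  const sign = delta >= 0 ? "+" : "";
  const message = passed
    ? `QA gate passed: health score ${pr.healthScore} (delta ${sign}${delta})`
    : `QA gate failed: health score dropped ${-delta} points (limit ${maxDrop})`;
  return { passed, delta, message };
}

// Example: with a 5-point tolerance, a 3-point drop passes and a
// 10-point drop fails.
const ok = checkQaGate({ healthScore: 90 }, { healthScore: 87 }, 5);
const bad = checkQaGate({ healthScore: 90 }, { healthScore: 80 }, 5);
console.log(ok.message);
console.log(bad.message);
```

In a real GitHub Action step, the two baselines would presumably be read from the `baseline.json` files produced on main and on the PR branch, and the step would exit non-zero when `passed` is false. Keeping the comparison a pure function makes it easy to unit test without a browser or an API key.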