Mirror of https://github.com/garrytan/gstack.git (synced 2026-05-05 05:05:08 +02:00)
chore: bump version and changelog (v0.4.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -1,5 +1,30 @@
# Changelog

## 0.4.0 — 2026-03-16

### Added

- **QA-only skill** (`/qa-only`) — report-only QA mode that finds and documents bugs without making fixes. Uses `allowedTools` to block the `Edit` tool entirely.
- **QA fix loop** — `/qa` now runs a find-fix-verify cycle: discover bugs, fix them, commit, then re-navigate to confirm each fix took.
- **Plan-to-QA artifact flow** — `/plan-eng-review` writes test-plan artifacts to `~/.gstack/projects/<slug>/` that `/qa` picks up for targeted testing.
- **`{{QA_METHODOLOGY}}` DRY placeholder** — shared QA methodology block injected into both the `/qa` and `/qa-only` SKILL.md templates via gen-skill-docs.
- **Eval efficiency metrics** — turns, duration, and cost are now displayed across all eval surfaces (summary, comparison, list, watch). Comparison output includes natural-language **Takeaway** commentary interpreting the deltas.
- **`generateCommentary()` engine** — pure function that interprets comparison deltas: flags regressions, notes improvements, reports per-test efficiency changes, and produces an overall summary.
- **Eval list columns** — `bun run eval:list` now shows Turns and Duration per run.
- **Eval summary per-test efficiency** — `bun run eval:summary` shows average turns/duration/cost per test across runs.
- **`judgePassed()` unit tests** — extracted and tested the pass/fail judgment logic.
- **3 new E2E tests** — qa-only no-fix guardrail, qa fix loop with commit verification, plan-eng-review test-plan artifact.
- **Browser ref staleness detection** — `resolveRef()` now checks element count to detect stale refs after page mutations.
- 3 new snapshot tests for ref staleness.
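
The `generateCommentary()` behavior listed above (flag regressions, note improvements, report per-test efficiency changes, summarize) can be illustrated with a small pure function. This is a hypothetical sketch, not gstack's code: the `TestDelta` shape, the `generateCommentarySketch` name, and the wording of each commentary line are all assumptions.

```typescript
// Hypothetical before/after record for one eval test; gstack's real
// eval-store types may differ.
interface TestDelta {
  name: string;
  passedBefore: boolean;
  passedAfter: boolean;
  turnsBefore: number;
  turnsAfter: number;
  costBefore: number; // USD
  costAfter: number; // USD
}

// Pure function: the same deltas always produce the same commentary lines.
function generateCommentarySketch(deltas: TestDelta[]): string[] {
  const lines: string[] = [];
  let regressions = 0;
  let improvements = 0;

  for (const d of deltas) {
    // Pass/fail flips are the most important signal, so report them first.
    if (d.passedBefore && !d.passedAfter) {
      regressions++;
      lines.push(`Regression: ${d.name} passed before but fails now.`);
    } else if (!d.passedBefore && d.passedAfter) {
      improvements++;
      lines.push(`Improvement: ${d.name} now passes.`);
    }
    // Per-test efficiency change, measured in agent turns.
    const turnDelta = d.turnsAfter - d.turnsBefore;
    if (turnDelta !== 0) {
      const direction = turnDelta < 0 ? "fewer" : "more";
      lines.push(`${d.name}: ${Math.abs(turnDelta)} ${direction} turns.`);
    }
  }

  // Overall summary across every compared test.
  const costBefore = deltas.reduce((sum, d) => sum + d.costBefore, 0);
  const costAfter = deltas.reduce((sum, d) => sum + d.costAfter, 0);
  lines.push(
    `Overall: ${regressions} regression(s), ${improvements} improvement(s), ` +
      `cost $${costBefore.toFixed(2)} -> $${costAfter.toFixed(2)}.`,
  );
  return lines;
}
```

Keeping the function pure (no I/O, no clock) is what makes this kind of commentary easy to unit-test alongside the formatting code that prints it.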
### Changed

- QA skill prompt restructured with an explicit two-cycle workflow (find → fix → verify).
- `formatComparison()` now shows per-test turns and duration deltas alongside cost.
- `printSummary()` now shows turns and duration columns.
- Fixed a pre-existing `_partial` file assertion bug in `eval-store.test.ts`.

### Fixed
- Browser ref staleness — refs collected before page mutation (e.g. SPA navigation) are now detected and re-collected.
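
A minimal sketch of count-based staleness detection, assuming each ref maps to a `RefEntry` holding a Playwright-style locator. `resolveRefSketch`, the `role`/`name` fields, and the rule "zero matches means stale" are illustrative assumptions; only `Locator.count()` is a real Playwright method. Re-collecting refs after a stale hit is left to the caller (for example, by taking a new snapshot).

```typescript
// Stand-in for the single Playwright Locator method this sketch relies on;
// real Playwright locators expose count(): Promise<number>.
interface CountableLocator {
  count(): Promise<number>;
}

// Hypothetical RefEntry: the locator plus metadata captured at snapshot time.
interface RefEntry {
  locator: CountableLocator;
  role: string;
  name: string;
}

// Resolve a ref such as "@e3", failing fast if it no longer matches anything.
async function resolveRefSketch(
  refs: Map<string, RefEntry>,
  ref: string,
): Promise<CountableLocator> {
  const entry = refs.get(ref);
  if (!entry) {
    throw new Error(`Unknown ref ${ref}; take a new snapshot first.`);
  }
  // count() reports matches right away instead of waiting out an action
  // timeout, so a ref invalidated by an SPA re-render is caught immediately.
  const matches = await entry.locator.count();
  if (matches === 0) {
    refs.delete(ref); // drop the stale entry so it cannot be reused
    const err = new Error(
      `Ref ${ref} (${entry.role} "${entry.name}") is stale; take a new snapshot.`,
    );
    err.name = "StaleRefError";
    throw err;
  }
  return entry.locator;
}
```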
## 0.3.9 — 2026-03-15

### Added
@@ -42,6 +42,7 @@ gstack/
│ ├── gen-skill-docs.test.ts # Tier 1: generator quality (free, <1s)
│ ├── skill-llm-eval.test.ts # Tier 3: LLM-as-judge (~$0.15/run)
│ └── skill-e2e.test.ts # Tier 2: E2E via claude -p (~$3.85/run)
├── qa-only/ # /qa-only skill (report-only QA, no fixes)
├── ship/ # Ship workflow skill
├── review/ # PR review skill
├── plan-ceo-review/ # /plan-ceo-review skill
@@ -2,7 +2,7 @@
**gstack turns Claude Code from one generic assistant into a team of specialists you can summon on demand.**

Eight opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, code review, one-command shipping, browser automation, QA testing, and engineering retrospectives — all as slash commands.
Nine opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, code review, one-command shipping, browser automation, QA testing, and engineering retrospectives — all as slash commands.

### Without gstack
@@ -22,7 +22,8 @@ Eight opinionated workflow skills for [Claude Code](https://docs.anthropic.com/e
| `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Triages Greptile review comments. |
| `/ship` | Release engineer | Sync main, run tests, resolve Greptile reviews, push, open PR. For a ready branch, not for deciding what to build. |
| `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. |
| `/qa` | QA lead | Systematic QA testing. On a feature branch, auto-analyzes your diff, identifies affected pages, and tests them. Also: full exploration, quick smoke test, regression mode. |
| `/qa` | QA + fix engineer | Test app, find bugs, fix them with atomic commits, re-verify. Before/after health scores and ship-readiness summary. Three tiers: Quick, Standard, Exhaustive. |
| `/qa-only` | QA reporter | Report-only QA testing. Same methodology as /qa but never fixes anything. Use when you want a pure bug report without code changes. |
| `/setup-browser-cookies` | Session manager | Import cookies from your real browser (Comet, Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages without logging in manually. |
| `/retro` | Engineering manager | Team-aware retro: your deep-dive + per-person praise and growth opportunities for every contributor. |
@@ -103,7 +104,7 @@ This is the setup I use. One person, ten parallel agents, each with the right co
Open Claude Code and paste this. Claude will do the rest.

> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /qa, /setup-browser-cookies, /retro. Then ask the user if they also want to add gstack to the current project so teammates get it.
> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /qa, /qa-only, /setup-browser-cookies, /retro. Then ask the user if they also want to add gstack to the current project so teammates get it.

### Step 2: Add to your repo so teammates get it (optional)
@@ -613,7 +614,7 @@ Or set `auto_upgrade: true` in `~/.gstack/config.yaml` to upgrade automatically
Paste this into Claude Code:

> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa setup-browser-cookies; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa setup-browser-cookies; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.
> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa qa-only setup-browser-cookies; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa setup-browser-cookies; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.

## Development
@@ -350,6 +350,30 @@
**Priority:** P3
**Depends on:** Eval persistence (shipped in v0.3.6)

### CI/CD QA quality gate

**What:** Run `/qa` as a GitHub Actions step and fail the PR if the health score drops below a threshold.

**Why:** An automated quality gate catches regressions before merge. Currently QA is manual — CI integration makes it part of the standard workflow.

**Context:** Requires the headless browse binary to be available in CI. The `/qa` skill already produces `baseline.json` with health scores — the CI step would compare against the main branch baseline and fail if the score drops. It would also need `ANTHROPIC_API_KEY` in CI secrets, since `/qa` uses Claude.

**Effort:** M
**Priority:** P2
**Depends on:** None
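
The gate logic could look roughly like this. It is a sketch under assumptions: the `healthScore` field name and 0-100 scale in `baseline.json`, the `qaQualityGate` function, and the `minScore`/`maxDrop` knobs are all hypothetical, since the TODO does not define the file format or the exact threshold rule.

```typescript
// Hypothetical subset of /qa's baseline.json; only the health score is read.
interface Baseline {
  healthScore: number; // assumed 0-100 scale
}

interface GateResult {
  pass: boolean;
  reason: string;
}

// Fails when the PR's score is below an absolute floor, or when it drops
// more than maxDrop points below main's baseline. Both knobs are assumptions.
function qaQualityGate(
  mainJson: string,
  prJson: string,
  minScore = 70,
  maxDrop = 5,
): GateResult {
  const main = JSON.parse(mainJson) as Baseline;
  const pr = JSON.parse(prJson) as Baseline;
  // Treat a malformed baseline as a failure rather than silently passing.
  if (typeof main.healthScore !== "number" || typeof pr.healthScore !== "number") {
    return { pass: false, reason: "baseline.json is missing a numeric healthScore" };
  }
  if (pr.healthScore < minScore) {
    return { pass: false, reason: `score ${pr.healthScore} is below the floor of ${minScore}` };
  }
  const drop = main.healthScore - pr.healthScore;
  if (drop > maxDrop) {
    return { pass: false, reason: `score dropped ${drop} points vs main (limit ${maxDrop})` };
  }
  return { pass: true, reason: `score ${pr.healthScore} (main: ${main.healthScore})` };
}
```

In a CI step, a small wrapper script would read both baseline files, call the function, print `reason`, and exit non-zero when `pass` is false so the job fails the PR.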
### CDP-based DOM mutation detection for ref staleness

**What:** Use Chrome DevTools Protocol `DOM.documentUpdated` / MutationObserver events to proactively invalidate stale refs when the DOM changes, without requiring an explicit `snapshot` call.

**Why:** Current ref staleness detection (an async `count()` check) only catches stale refs at action time. CDP mutation detection would proactively warn when refs become stale, preventing the 5-second timeout entirely for SPA re-renders.

**Context:** Parts 1+2 of the ref staleness fix (RefEntry metadata + eager validation via `count()`) are shipped. This is Part 3 — the most ambitious piece. It requires a CDP session alongside Playwright, a MutationObserver bridge, and careful performance tuning to avoid overhead on every DOM change.

**Effort:** L
**Priority:** P3
**Depends on:** Ref staleness Parts 1+2 (shipped)
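
One way to make staleness proactive is a generation counter: every ref records the DOM generation it was collected in, and each mutation notification bumps the counter. The `RefRegistry` class below is a hypothetical, deliberately coarse sketch (any mutation invalidates every ref). The bridge that would call `onDomMutation()`, whether a MutationObserver exposed through Playwright's `page.exposeBinding` or a CDP `DOM.documentUpdated` listener, is not shown.

```typescript
// Hypothetical tracked ref: where it points and which DOM generation it saw.
interface TrackedRef {
  selector: string;
  generation: number;
}

class RefRegistry {
  private generation = 0;
  private refs = new Map<string, TrackedRef>();

  // Called when a snapshot hands out a fresh ref.
  record(ref: string, selector: string): void {
    this.refs.set(ref, { selector, generation: this.generation });
  }

  // Would be invoked by the (not shown) mutation bridge on each DOM change.
  onDomMutation(): void {
    this.generation++;
  }

  // A ref is stale if it is unknown or was collected before the last mutation.
  isStale(ref: string): boolean {
    const entry = this.refs.get(ref);
    return entry === undefined || entry.generation !== this.generation;
  }
}
```

Invalidating everything on each mutation is the simplest conservative rule; the performance tuning the TODO mentions would likely mean debouncing notifications or scoping them to subtrees that contain tracked elements.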
## Completed

### Phase 1: Foundations (v0.2.0)