From 9dcbf484ad7ec4a1e052a5074fe86faaa13daee1 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Sat, 21 Mar 2026 14:39:23 -0700 Subject: [PATCH] =?UTF-8?q?docs:=20v0.9.8.0=20=E2=80=94=20deploy=20pipelin?= =?UTF-8?q?e=20+=20E2E=20performance=20+=20pre-merge=20gate?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CHANGELOG: added v0.9.8.0 entry covering /land-and-deploy, /canary, /benchmark, /setup-deploy, /review perf pass, E2E model pinning, and 3 test fixes. README: added 4 new skills to tables and install instructions, updated specialist/tool counts (18+7), added deploy pipeline to "What's new" section. /land-and-deploy: added Step 3.5 pre-merge readiness gate that checks review dashboard, E2E results, free tests, and doc-release status before merging. Uses AskUserQuestion for explicit confirmation. VERSION: 0.9.7.0 → 0.9.8.0 TODOS: updated deploy pipeline to Completed. Co-Authored-By: Claude Opus 4.6 (1M context) --- .../skills/gstack-land-and-deploy/SKILL.md | 89 ++++++++++++++++++- CHANGELOG.md | 23 +++++ README.md | 16 ++-- TODOS.md | 15 ++-- VERSION | 2 +- land-and-deploy/SKILL.md | 89 ++++++++++++++++++- land-and-deploy/SKILL.md.tmpl | 89 ++++++++++++++++++- package.json | 2 +- 8 files changed, 301 insertions(+), 24 deletions(-) diff --git a/.agents/skills/gstack-land-and-deploy/SKILL.md b/.agents/skills/gstack-land-and-deploy/SKILL.md index 1d709339..68f81f68 100644 --- a/.agents/skills/gstack-land-and-deploy/SKILL.md +++ b/.agents/skills/gstack-land-and-deploy/SKILL.md @@ -286,11 +286,14 @@ When the user types `/land-and-deploy`, run this skill. - `/land-and-deploy #123` — specific PR number - `/land-and-deploy #123 ` — specific PR + verification URL -## Non-interactive philosophy (like /ship) +## Non-interactive philosophy (like /ship) — with one critical gate -This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step except the ones listed below. The user said `/land-and-deploy` which means DO IT. +This is a **mostly automated** workflow. Do NOT ask for confirmation at any step except +the ones listed below. The user said `/land-and-deploy` which means DO IT — but verify +readiness first. -**Only stop for:** +**Always stop for:** +- **Pre-merge readiness gate (Step 3.5)** — this is the ONE confirmation before merge - GitHub CLI not authenticated - No PR found for this branch - CI failures or merge conflicts @@ -300,7 +303,6 @@ This is a **non-interactive, fully automated** workflow. Do NOT ask for confirma **Never stop for:** - Choosing merge method (auto-detect from repo settings) -- Confirming the merge - Timeout warnings (warn and continue gracefully) --- @@ -365,6 +367,85 @@ If timeout (15 min): **STOP.** "CI has been running for 15 minutes. Investigate --- +## Step 3.5: Pre-merge readiness gate + +**This is the critical safety check before an irreversible merge.** Gather ALL evidence, +present it to the user, and get explicit confirmation before merging. + +### 3.5a: Review Readiness Dashboard + +```bash +~/.codex/skills/gstack/bin/gstack-review-read 2>/dev/null +``` + +Parse the output. Find the most recent entry for each review skill within the last 7 days. +Check staleness: compare each review's commit hash against the current HEAD. + +Display the dashboard: + +``` +PRE-MERGE READINESS CHECK +══════════════════════════ +Review | Status | Last Run | Stale? +----------------|-----------|---------------------|-------- +Eng Review | CLEAR/— | 2026-03-21 15:00 | N commits behind +CEO Review | CLEAR/— | — | — +Design Review | CLEAR/— | — | — +Codex Review | CLEAR/— | — | — +``` + +### 3.5b: Test results + +Check for recent E2E eval results: + +```bash +ls -t ~/.gstack-dev/evals/*-e2e-*-$(date +%Y-%m-%d)*.json 2>/dev/null | head -20 +``` + +For each eval file found from today, parse pass/fail counts and show a summary: + +``` +E2E Tests | 52/52 pass | 25 min ago | $8.50 +Free Tests | (run bun test to check) +``` + +If no E2E results from today exist, note: "No E2E tests run today." + +Also run free tests inline: + +```bash +bun test 2>&1 | tail -5 +``` + +### 3.5c: Document-release check + +Check if `/document-release` has been run on this branch: + +```bash +git log --oneline --grep="docs: update project documentation" HEAD~5..HEAD +``` + +If no doc update commit found: flag "WARNING: /document-release has not been run." + +### 3.5d: Merge confirmation + +Present all evidence in a single AskUserQuestion: + +- **Re-ground:** "About to merge PR #NNN on branch X to base branch Y." +- Show the review dashboard, test results, and doc-release status from above. +- If ANY of these are red flags (eng review missing/stale, E2E failures, no doc update), + list them explicitly as warnings. +- **RECOMMENDATION:** Choose A if everything is green. Choose B if there are warnings + to address first. Choose C to skip checks and merge anyway. +- A) Merge — all checks look good +- B) Don't merge yet — fix the flagged issues first +- C) Merge anyway — I accept the risks + +If the user chooses B: **STOP** with a summary of what needs to be done. +If the user chooses A or C: continue to Step 4. + +--- + ## Step 4: Merge the PR Record the start timestamp for timing data. diff --git a/CHANGELOG.md b/CHANGELOG.md index dbfe286a..31c1fbf1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,28 @@ # Changelog +## [0.9.8.0] - 2026-03-21 — Deploy Pipeline + E2E Performance + +### Added + +- **`/land-and-deploy` — merge, deploy, and verify in one command.** Takes over where `/ship` left off. Merges the PR, waits for CI and deploy workflows, then runs canary verification on your production URL. Auto-detects your deploy platform (Fly.io, Render, Vercel, Netlify, Heroku, GitHub Actions). Offers revert at every failure point. One command from "PR approved" to "verified in production." +- **`/canary` — post-deploy monitoring loop.** Watches your live app for console errors, performance regressions, and page failures using the browse daemon. Takes periodic screenshots, compares against pre-deploy baselines, and alerts on anomalies. Run `/canary https://myapp.com --duration 10m` after any deploy. +- **`/benchmark` — performance regression detection.** Establishes baselines for page load times, Core Web Vitals, and resource sizes. Compares before/after on every PR. Tracks performance trends over time. Catches the bundle size regressions that code review misses. +- **`/setup-deploy` — one-time deploy configuration.** Detects your deploy platform, production URL, health check endpoints, and deploy status commands. Writes the config to CLAUDE.md so all future `/land-and-deploy` runs are fully automatic. +- **`/review` now includes Performance & Bundle Impact analysis.** The informational review pass checks for heavy dependencies, missing lazy loading, synchronous script tags, and bundle size regressions. Catches moment.js-instead-of-date-fns before it ships. + +### Changed + +- **E2E tests now run 3-5x faster.** Structure tests default to Sonnet (5x faster, 5x cheaper). Quality tests (planted-bug detection, design quality, strategic review) stay on Opus. Full suite dropped from 50-80 minutes to ~15-25 minutes. +- **`--retry 2` on all E2E tests.** Flaky tests get a second chance without masking real failures. +- **`test:e2e:fast` tier.** Excludes the 8 slowest Opus quality tests for quick feedback (~5-7 minutes). Run `bun run test:e2e:fast` for rapid iteration. +- **E2E timing telemetry.** Every test now records `first_response_ms`, `max_inter_turn_ms`, and `model` used. Wall-clock timing shows whether parallelism is actually working. + +### Fixed + +- **`plan-design-review-plan-mode` no longer races.** Each test gets its own isolated tmpdir — no more concurrent tests polluting each other's working directory. +- **`ship-local-workflow` no longer wastes 6 of 15 turns.** Ship workflow steps are inlined in the test prompt instead of having the agent read the 700+ line SKILL.md at runtime. +- **`design-consultation-core` no longer fails on synonym sections.** "Colors" matches "Color", "Type System" matches "Typography" — fuzzy synonym-based matching with all 7 sections still required. + ## [0.9.7.0] - 2026-03-21 — Plan File Review Report ### Added diff --git a/README.md b/README.md index dcb76dd8..5a032b3e 100644 --- a/README.md +++ b/README.md @@ -16,7 +16,7 @@ In the last 60 days I have written **over 600,000 lines of production code** — Same person. Different era. The difference is the tooling. -**gstack is how I do it.** It is my open source software factory. It turns Claude Code into a virtual engineering team you actually manage — a CEO who rethinks the product, an eng manager who locks the architecture, a designer who catches AI slop, a paranoid reviewer who finds production bugs, a QA lead who opens a real browser and clicks through your app, and a release engineer who ships the PR. Fifteen specialists and six power tools, all as slash commands, all Markdown, **all free, MIT license, available right now.** +**gstack is how I do it.** It is my open source software factory. It turns Claude Code into a virtual engineering team you actually manage — a CEO who rethinks the product, an eng manager who locks the architecture, a designer who catches AI slop, a paranoid reviewer who finds production bugs, a QA lead who opens a real browser and clicks through your app, and a release engineer who ships the PR. Eighteen specialists and seven power tools, all as slash commands, all Markdown, **all free, MIT license, available right now.** I am learning how to get to the edge of what agentic systems can do as of March 2026, and this is my live experiment. I am sharing it because I want the whole world on this journey with me. @@ -48,11 +48,11 @@ Expect first useful run in under 5 minutes on any repo with tests already set up Open Claude Code and paste this. Claude does the rest. -> Install gstack: run **`git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup`** then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /retro, /investigate, /document-release, /codex, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade. Then ask the user if they also want to add gstack to the current project so teammates get it. +> Install gstack: run **`git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup`** then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codex, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade. Then ask the user if they also want to add gstack to the current project so teammates get it. ### Step 2: Add to your repo so teammates get it (optional) -> Add gstack to this project: run **`cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`** then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /retro, /investigate, /document-release, /codex, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills. +> Add gstack to this project: run **`cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`** then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codex, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills. Real files get committed to your repo (not a submodule), so `git clone` just works. Everything lives inside `.claude/`. Nothing touches your PATH or runs in the background. @@ -72,7 +72,7 @@ git clone https://github.com/garrytan/gstack.git ~/gstack cd ~/gstack && ./setup --host auto ``` -This installs to `~/.claude/skills/gstack` and/or `~/.codex/skills/gstack` depending on what's available. All 21 skills work across all supported agents. Hook-based safety skills (careful, freeze, guard) use inline safety advisory prose on non-Claude hosts. +This installs to `~/.claude/skills/gstack` and/or `~/.codex/skills/gstack` depending on what's available. All 25 skills work across all supported agents. Hook-based safety skills (careful, freeze, guard) use inline safety advisory prose on non-Claude hosts. ## See it work @@ -140,6 +140,9 @@ One sprint, one person, one feature — that takes about 30 minutes with gstack. | `/qa` | **QA Lead** | Test your app, find bugs, fix them with atomic commits, re-verify. Auto-generates regression tests for every fix. | | `/qa-only` | **QA Reporter** | Same methodology as /qa but report only. Use when you want a pure bug report without code changes. | | `/ship` | **Release Engineer** | Sync main, run tests, audit coverage, push, open PR. Bootstraps test frameworks if you don't have one. One command. | +| `/land-and-deploy` | **Release Engineer** | Merge the PR, wait for CI and deploy, verify production health. Takes over after `/ship`. One command from "approved" to "verified in production." | +| `/canary` | **SRE** | Post-deploy monitoring loop. Watches for console errors, performance regressions, and page failures. Periodic screenshots and anomaly detection. | +| `/benchmark` | **Performance Engineer** | Baseline page load times, Core Web Vitals, and resource sizes. Compare before/after on every PR. Catch bundle size regressions before they ship. | | `/document-release` | **Technical Writer** | Update all project docs to match what you just shipped. Catches stale READMEs automatically. | | `/retro` | **Eng Manager** | Team-aware weekly retro. Per-person breakdowns, shipping streaks, test health trends, growth opportunities. | | `/browse` | **QA Engineer** | Give the agent eyes. Real Chromium browser, real clicks, real screenshots. ~100ms per command. | @@ -154,6 +157,7 @@ One sprint, one person, one feature — that takes about 30 minutes with gstack. | `/freeze` | **Edit Lock** — restrict file edits to one directory. Prevents accidental changes outside scope while debugging. | | `/guard` | **Full Safety** — `/careful` + `/freeze` in one command. Maximum safety for prod work. | | `/unfreeze` | **Unlock** — remove the `/freeze` boundary. | +| `/setup-deploy` | **Deploy Configurator** — one-time setup for `/land-and-deploy`. Detects your platform, production URL, and deploy commands. | | `/gstack-upgrade` | **Self-Updater** — upgrade gstack to latest. Detects global vs vendored install, syncs both, shows what changed. | **[Deep dives with examples and philosophy for every skill →](docs/skills.md)** @@ -170,6 +174,8 @@ One sprint, one person, one feature — that takes about 30 minutes with gstack. **Test everything.** `/ship` bootstraps test frameworks from scratch if your project doesn't have one. Every `/ship` run produces a coverage audit. Every `/qa` bug fix generates a regression test. 100% test coverage is the goal — tests make vibe coding safe instead of yolo coding. +**Ship to production in one command.** `/land-and-deploy` picks up where `/ship` left off — merges your PR, waits for CI and deploy, then runs canary verification on your production URL. Auto-detects Fly.io, Render, Vercel, Netlify, Heroku, or GitHub Actions. If something breaks, it offers a revert. Pair with `/canary` for extended post-deploy monitoring and `/benchmark` to catch performance regressions before they ship. + **`/document-release` is the engineer you never had.** It reads every doc file in your project, cross-references the diff, and updates everything that drifted. README, ARCHITECTURE, CONTRIBUTING, CLAUDE.md, TODOS — all kept current automatically. And now `/ship` auto-invokes it — docs stay current without an extra command. **Browser handoff when the AI gets stuck.** Hit a CAPTCHA, auth wall, or MFA prompt? `$B handoff` opens a visible Chrome at the exact same page with all your cookies and tabs intact. Solve the problem, tell Claude you're done, `$B resume` picks up right where it left off. The agent even suggests it automatically after 3 consecutive failures. @@ -200,7 +206,7 @@ Same tools, different outcome — because gstack gives you structured roles and The models are getting better fast. The people who figure out how to work with them now — really work with them, not just dabble — are going to have a massive advantage. This is that window. Let's go. -Fifteen specialists and six power tools. All slash commands. All Markdown. All free. **[github.com/garrytan/gstack](https://github.com/garrytan/gstack)** — MIT License +Eighteen specialists and seven power tools. All slash commands. All Markdown. All free. **[github.com/garrytan/gstack](https://github.com/garrytan/gstack)** — MIT License > **We're hiring.** Want to ship 10K+ LOC/day and help harden gstack? > Come work at YC — [ycombinator.com/software](https://ycombinator.com/software) diff --git a/TODOS.md b/TODOS.md index d8adb4e4..388792d6 100644 --- a/TODOS.md +++ b/TODOS.md @@ -515,11 +515,16 @@ Shipped in v0.6.5. TemplateContext in gen-skill-docs.ts bakes skill name into pr ## Completed -### Deploy pipeline (v0.7.0) -- /merge skill — review-gated PR merge → superseded by /land-and-deploy -- Deploy-verify skill → superseded by /land-and-deploy canary verification -- Post-deploy verification (ship + browse) → superseded by /land-and-deploy -**Completed:** v0.7.0 +### Deploy pipeline (v0.9.8.0) +- /land-and-deploy — merge PR, wait for CI/deploy, canary verification +- /canary — post-deploy monitoring loop with anomaly detection +- /benchmark — performance regression detection with Core Web Vitals +- /setup-deploy — one-time deploy platform configuration +- /review Performance & Bundle Impact pass +- E2E model pinning (Sonnet default, Opus for quality tests) +- E2E timing telemetry (first_response_ms, max_inter_turn_ms, wall_clock_ms) +- test:e2e:fast tier, --retry 2 on all E2E scripts +**Completed:** v0.9.8.0 ### Phase 1: Foundations (v0.2.0) - Rename to gstack diff --git a/VERSION b/VERSION index d9439040..68f4aad3 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.9.7.0 +0.9.8.0 diff --git a/land-and-deploy/SKILL.md b/land-and-deploy/SKILL.md index 230bc927..6b4271e4 100644 --- a/land-and-deploy/SKILL.md +++ b/land-and-deploy/SKILL.md @@ -293,11 +293,14 @@ When the user types `/land-and-deploy`, run this skill. - `/land-and-deploy #123` — specific PR number - `/land-and-deploy #123 ` — specific PR + verification URL -## Non-interactive philosophy (like /ship) +## Non-interactive philosophy (like /ship) — with one critical gate -This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step except the ones listed below. The user said `/land-and-deploy` which means DO IT. +This is a **mostly automated** workflow. Do NOT ask for confirmation at any step except +the ones listed below. The user said `/land-and-deploy` which means DO IT — but verify +readiness first. -**Only stop for:** +**Always stop for:** +- **Pre-merge readiness gate (Step 3.5)** — this is the ONE confirmation before merge - GitHub CLI not authenticated - No PR found for this branch - CI failures or merge conflicts @@ -307,7 +310,6 @@ This is a **non-interactive, fully automated** workflow. Do NOT ask for confirma **Never stop for:** - Choosing merge method (auto-detect from repo settings) -- Confirming the merge - Timeout warnings (warn and continue gracefully) --- @@ -372,6 +374,85 @@ If timeout (15 min): **STOP.** "CI has been running for 15 minutes. Investigate --- +## Step 3.5: Pre-merge readiness gate + +**This is the critical safety check before an irreversible merge.** Gather ALL evidence, +present it to the user, and get explicit confirmation before merging. + +### 3.5a: Review Readiness Dashboard + +```bash +~/.claude/skills/gstack/bin/gstack-review-read 2>/dev/null +``` + +Parse the output. Find the most recent entry for each review skill within the last 7 days. +Check staleness: compare each review's commit hash against the current HEAD. + +Display the dashboard: + +``` +PRE-MERGE READINESS CHECK +══════════════════════════ +Review | Status | Last Run | Stale? +----------------|-----------|---------------------|-------- +Eng Review | CLEAR/— | 2026-03-21 15:00 | N commits behind +CEO Review | CLEAR/— | — | — +Design Review | CLEAR/— | — | — +Codex Review | CLEAR/— | — | — +``` + +### 3.5b: Test results + +Check for recent E2E eval results: + +```bash +ls -t ~/.gstack-dev/evals/*-e2e-*-$(date +%Y-%m-%d)*.json 2>/dev/null | head -20 +``` + +For each eval file found from today, parse pass/fail counts and show a summary: + +``` +E2E Tests | 52/52 pass | 25 min ago | $8.50 +Free Tests | (run bun test to check) +``` + +If no E2E results from today exist, note: "No E2E tests run today." + +Also run free tests inline: + +```bash +bun test 2>&1 | tail -5 +``` + +### 3.5c: Document-release check + +Check if `/document-release` has been run on this branch: + +```bash +git log --oneline --grep="docs: update project documentation" HEAD~5..HEAD +``` + +If no doc update commit found: flag "WARNING: /document-release has not been run." + +### 3.5d: Merge confirmation + +Present all evidence in a single AskUserQuestion: + +- **Re-ground:** "About to merge PR #NNN on branch X to base branch Y." +- Show the review dashboard, test results, and doc-release status from above. +- If ANY of these are red flags (eng review missing/stale, E2E failures, no doc update), + list them explicitly as warnings. +- **RECOMMENDATION:** Choose A if everything is green. Choose B if there are warnings + to address first. Choose C to skip checks and merge anyway. +- A) Merge — all checks look good +- B) Don't merge yet — fix the flagged issues first +- C) Merge anyway — I accept the risks + +If the user chooses B: **STOP** with a summary of what needs to be done. +If the user chooses A or C: continue to Step 4. + +--- + ## Step 4: Merge the PR Record the start timestamp for timing data. diff --git a/land-and-deploy/SKILL.md.tmpl b/land-and-deploy/SKILL.md.tmpl index b860e106..a026eca2 100644 --- a/land-and-deploy/SKILL.md.tmpl +++ b/land-and-deploy/SKILL.md.tmpl @@ -35,11 +35,14 @@ When the user types `/land-and-deploy`, run this skill. - `/land-and-deploy #123` — specific PR number - `/land-and-deploy #123 ` — specific PR + verification URL -## Non-interactive philosophy (like /ship) +## Non-interactive philosophy (like /ship) — with one critical gate -This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step except the ones listed below. The user said `/land-and-deploy` which means DO IT. +This is a **mostly automated** workflow. Do NOT ask for confirmation at any step except +the ones listed below. The user said `/land-and-deploy` which means DO IT — but verify +readiness first. -**Only stop for:** +**Always stop for:** +- **Pre-merge readiness gate (Step 3.5)** — this is the ONE confirmation before merge - GitHub CLI not authenticated - No PR found for this branch - CI failures or merge conflicts @@ -49,7 +52,6 @@ This is a **non-interactive, fully automated** workflow. Do NOT ask for confirma **Never stop for:** - Choosing merge method (auto-detect from repo settings) -- Confirming the merge - Timeout warnings (warn and continue gracefully) --- @@ -114,6 +116,85 @@ If timeout (15 min): **STOP.** "CI has been running for 15 minutes. Investigate --- +## Step 3.5: Pre-merge readiness gate + +**This is the critical safety check before an irreversible merge.** Gather ALL evidence, +present it to the user, and get explicit confirmation before merging. + +### 3.5a: Review Readiness Dashboard + +```bash +~/.claude/skills/gstack/bin/gstack-review-read 2>/dev/null +``` + +Parse the output. Find the most recent entry for each review skill within the last 7 days. +Check staleness: compare each review's commit hash against the current HEAD. + +Display the dashboard: + +``` +PRE-MERGE READINESS CHECK +══════════════════════════ +Review | Status | Last Run | Stale? +----------------|-----------|---------------------|-------- +Eng Review | CLEAR/— | 2026-03-21 15:00 | N commits behind +CEO Review | CLEAR/— | — | — +Design Review | CLEAR/— | — | — +Codex Review | CLEAR/— | — | — +``` + +### 3.5b: Test results + +Check for recent E2E eval results: + +```bash +ls -t ~/.gstack-dev/evals/*-e2e-*-$(date +%Y-%m-%d)*.json 2>/dev/null | head -20 +``` + +For each eval file found from today, parse pass/fail counts and show a summary: + +``` +E2E Tests | 52/52 pass | 25 min ago | $8.50 +Free Tests | (run bun test to check) +``` + +If no E2E results from today exist, note: "No E2E tests run today." + +Also run free tests inline: + +```bash +bun test 2>&1 | tail -5 +``` + +### 3.5c: Document-release check + +Check if `/document-release` has been run on this branch: + +```bash +git log --oneline --grep="docs: update project documentation" HEAD~5..HEAD +``` + +If no doc update commit found: flag "WARNING: /document-release has not been run." + +### 3.5d: Merge confirmation + +Present all evidence in a single AskUserQuestion: + +- **Re-ground:** "About to merge PR #NNN on branch X to base branch Y." +- Show the review dashboard, test results, and doc-release status from above. +- If ANY of these are red flags (eng review missing/stale, E2E failures, no doc update), + list them explicitly as warnings. +- **RECOMMENDATION:** Choose A if everything is green. Choose B if there are warnings + to address first. Choose C to skip checks and merge anyway. +- A) Merge — all checks look good +- B) Don't merge yet — fix the flagged issues first +- C) Merge anyway — I accept the risks + +If the user chooses B: **STOP** with a summary of what needs to be done. +If the user chooses A or C: continue to Step 4. + +--- + ## Step 4: Merge the PR Record the start timestamp for timing data. diff --git a/package.json b/package.json index 5964fd44..0f6d846b 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "gstack", - "version": "0.3.3", + "version": "0.9.8.0", "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.", "license": "MIT", "type": "module",