Merge remote-tracking branch 'origin/main' into garrytan/improve-test-coverage

This commit is contained in:
Garry Tan
2026-03-17 10:41:43 -07:00
15 changed files with 700 additions and 215 deletions
+24 -1
View File
@@ -1,7 +1,30 @@
# Changelog
## 0.5.1 — 2026-03-17
## 0.5.3 — 2026-03-17
- **You're always in control — even when dreaming big.** `/plan-ceo-review` now presents every scope expansion as an individual decision you opt into. EXPANSION mode recommends enthusiastically, but you say yes or no to each idea. No more "the agent went wild and added 5 features I didn't ask for."
- **New mode: SELECTIVE EXPANSION.** Hold your current scope as the baseline, but see what else is possible. The agent surfaces expansion opportunities one by one with neutral recommendations — you cherry-pick the ones worth doing. Perfect for iterating on existing features where you want rigor but also want to be tempted by adjacent improvements.
- **Your CEO review visions are saved, not lost.** Expansion ideas, cherry-pick decisions, and 10x visions are now persisted to `~/.gstack/projects/{repo}/ceo-plans/` as structured design documents. Stale plans get archived automatically. If a vision is exceptional, you can promote it to `docs/designs/` in your repo for the team.
- **Smarter ship gates.** `/ship` no longer nags you about CEO and Design reviews when they're not relevant. Eng Review is the only required gate (and you can disable even that with `gstack-config set skip_eng_review true`). CEO Review is recommended for big product changes; Design Review for UI work. The dashboard still shows all three — it just won't block you for the optional ones.
### For contributors
- Added SELECTIVE EXPANSION mode to `plan-ceo-review/SKILL.md.tmpl` with cherry-pick ceremony, neutral recommendation posture, and HOLD SCOPE baseline.
- Rewrote EXPANSION mode's Step 0D to include opt-in ceremony — distill vision into discrete proposals, present each as AskUserQuestion.
- Added CEO plan persistence (0D-POST step): structured markdown with YAML frontmatter (`status: ACTIVE/ARCHIVED/PROMOTED`), scope decisions table, archival flow.
- Added `docs/designs` promotion step after Review Log.
- Mode Quick Reference table expanded to 4 columns.
- Review Readiness Dashboard: Eng Review required (overridable via `skip_eng_review` config), CEO/Design optional with agent judgment.
- New tests: CEO review mode validation (4 modes, persistence, promotion), SELECTIVE EXPANSION E2E test.
## 0.5.2 — 2026-03-17
- **Your design consultant now takes creative risks.** `/design-consultation` doesn't just propose a safe, coherent system — it explicitly breaks down SAFE CHOICES (category baseline) vs. RISKS (where your product stands out). You pick which rules to break. Every risk comes with a rationale for why it works and what it costs.
- **See the competition before you choose.** When you opt into research, the agent browses competitor sites with screenshots and accessibility tree analysis — not just web search results. You see what the landscape looks like before making design decisions.
- **Preview pages that look like your product.** The preview page now renders realistic product mockups — dashboards with sidebar nav and data tables, marketing pages with hero sections, settings pages with forms — not just font swatches and color palettes.
## 0.5.1 — 2026-03-17
- **Know where you stand before you ship.** Every `/plan-ceo-review`, `/plan-eng-review`, and `/plan-design-review` now logs its result to a review tracker. At the end of each review, you see a **Review Readiness Dashboard** showing which reviews are done, when they ran, and whether they're clean — with a clear CLEARED TO SHIP or NOT READY verdict.
- **`/ship` checks your reviews before creating the PR.** Pre-flight now reads the dashboard and asks if you want to continue when reviews are missing. Informational only — it won't block you, but you'll know what you skipped.
- **One less thing to copy-paste.** The SLUG computation (that opaque sed pipeline for computing `owner-repo` from git remote) is now a shared `bin/gstack-slug` helper. All 14 inline copies across templates replaced with `eval $(gstack-slug)`. If the format ever changes, fix it once.
+43 -24
View File
@@ -20,7 +20,7 @@ Thirteen opinionated workflow skills for [Claude Code](https://docs.anthropic.co
| `/plan-ceo-review` | Founder / CEO | Rethink the problem. Find the 10-star product hiding inside the request. |
| `/plan-eng-review` | Eng manager / tech lead | Lock in architecture, data flow, diagrams, edge cases, and tests. |
| `/plan-design-review` | Senior product designer | Designer's eye audit. 80-item checklist, letter grades, AI Slop detection, DESIGN.md inference. Report only — never touches code. |
| `/design-consultation` | Design consultant | Build a complete design system from scratch. Researches competitors, proposes aesthetic + typography + color + spacing + motion, generates a preview page, and writes DESIGN.md. |
| `/design-consultation` | Design consultant | Build a complete design system from scratch. Browses competitors to get in the ballpark, proposes safe choices AND creative risks, generates realistic product mockups, and writes DESIGN.md. |
| `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Triages Greptile review comments. |
| `/ship` | Release engineer | Sync main, run tests, resolve Greptile reviews, push, open PR. For a ready branch, not for deciding what to build. |
| `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. |
@@ -344,11 +344,15 @@ This is my **design partner mode**.
`/plan-design-review` audits a site that already exists. `/design-consultation` is for when you have nothing yet — no design system, no font choices, no color palette. You are starting from zero and you want a senior designer to sit down with you and build the whole visual identity together.
It is a conversation, not a form. The agent asks about your product, your users, and your space. If you want, it researches what top competitors in your category are doing — fonts, colors, layout patterns, aesthetic direction. Then it proposes a complete, coherent design system: aesthetic direction, typography (3+ fonts with specific roles), color palette with hex values, spacing scale, layout approach, and motion strategy. Every recommendation comes with a rationale. Every choice reinforces every other choice.
It is a conversation, not a form. The agent asks about your product, your users, and your audience. It thinks about what your product needs to communicate — trust, speed, craft, warmth, whatever fits — and works backward from that to concrete choices. Then it proposes a complete, coherent design system: aesthetic direction, typography (3+ fonts with specific roles), color palette with hex values, spacing scale, layout approach, and motion strategy. Every recommendation comes with a rationale. Every choice reinforces every other choice.
The key insight: individual design decisions are easy to make but hard to make coherently. Picking a font is simple. Picking a font that works with your color palette, your spacing density, your aesthetic direction, and your product's personality — that is the actual job. `/design-consultation` solves the coherence problem by proposing everything as one package, then letting you adjust any piece while it checks that the rest still holds together.
But coherence is table stakes. Every dev tool dashboard looks the same — clean sans-serif, muted grays, a blue accent. They are all coherent. They are all forgettable. The difference between a product that looks "nice" and one that people actually recognize is the **deliberate creative risks**: the unexpected serif for headings, the bold accent nobody else in your category uses, the tighter spacing that makes your data feel authoritative instead of airy.
After you agree on the system, it generates an interactive HTML preview page — your fonts rendered at real sizes, your color palette as actual UI components (buttons, cards, form inputs, alerts), light/dark mode toggle, all using your product name instead of lorem ipsum. You open the page in your browser and see the design system rendered before a single line of product code is written.
That is what `/design-consultation` is really about. It does not just propose a safe system. It proposes safe choices AND risks — and tells you which is which. "Here are the choices that keep you literate in your category. And here is where I think you should break from convention, and why." You pick which risks to take. The agent checks that the whole system still coheres either way.
If you want, the agent will browse your competitors' actual sites — take screenshots, analyze their fonts and colors and spacing — so you can see the landscape before you make choices. This is not about copying. It is about getting in the ballpark so you know what the conventions are, and then deciding which ones are worth breaking.
After you agree on the system, it generates an interactive HTML preview page — not just swatches and font samples, but realistic product pages. If you are building a dashboard, you see a dashboard with a sidebar, data tables, and stat cards. If you are building a marketing site, you see a hero section with real copy and a CTA. Everything rendered in your design system, with your product name, in light and dark mode. You see what your product could feel like before a single line of production code is written.
Then it writes `DESIGN.md` to your repo root — your project's design source of truth — and updates `CLAUDE.md` so every future Claude Code session respects the system. From that point on, `/qa-design-review` can audit against it, and any agent working on your frontend knows the rules.
@@ -356,44 +360,59 @@ Then it writes `DESIGN.md` to your repo root — your project's design source of
You: /design-consultation
Claude: From the README, this looks like a developer analytics dashboard
for engineering teams. Sound right? And would you like me to
research what other dev tools are doing for design, or should
I work from what I know?
for engineering teams. Sound right? Tell me about the feeling
you want — and would you like me to browse some competitors
first so we can see the landscape?
You: Yes, it's a dashboard for eng managers. Research would be great.
You: Yes, it's for eng managers. Show me what's out there.
Claude: I looked at Linear, Vercel, Railway, Planetscale, and Posthog.
They converge on clean sans-serifs and muted palettes, but
Linear stands out with its dark-first approach and tight spacing.
The gap: most of these feel interchangeable. Here's my proposal:
Claude: [browses Linear, Vercel, Datadog, Posthog — takes screenshots]
AESTHETIC: Industrial/Utilitarian — data-first, no decoration tax
Here's the landscape. They all converge on clean sans-serifs
and muted blue-gray palettes. Linear stands out with dark-first
and tight spacing. The rest feel interchangeable.
Here's my proposal — and where I'd play it safe vs. take risks:
AESTHETIC: Industrial/Utilitarian — data-first, every pixel earns its place
TYPOGRAPHY: Geist (body + UI), Instrument Serif (display), JetBrains Mono (data)
COLOR: Restrained — slate neutrals, single teal accent (#0D9488)
SPACING: 4px base, compact density
LAYOUT: Grid-disciplined, 12-column
MOTION: Minimal-functional — only transitions that aid comprehension
This coheres because the utilitarian aesthetic demands a tight
spacing grid and restrained color. The serif display font creates
hierarchy contrast without adding visual noise. The monospace for
data tables feels native to the engineering audience.
SAFE CHOICES (match category expectations):
Geist for body — your users already know this font from Vercel
Slate neutrals — warm grays feel right for data-heavy UIs
Grid-disciplined layout — dashboards need predictable alignment
Want to adjust anything, or should I generate the preview page?
RISKS (where you'd stand out):
Instrument Serif for display headings — nobody in dev tools
uses a serif. It creates instant hierarchy contrast and
signals "we care about craft" in a space full of sans-serifs.
Single teal accent instead of blue — every competitor uses blue.
Teal is close enough to feel trustworthy, different enough
to be recognizable.
4px compact density — tighter than Linear, says "we respect
your screen real estate."
You: Generate it.
The safe choices keep you literate. The risks are where your
product gets its own face. Which risks appeal to you?
You: Love the serif risk. Let's see it.
Claude: Preview opened in your browser.
[/tmp/design-consultation-preview-1710000000.html]
The page shows Instrument Serif at hero scale, Geist in body
paragraphs, JetBrains Mono in a sample data table. Color swatches
with your teal accent against slate grays. Buttons, cards, and
alerts rendered in both light and dark mode.
The page shows a realistic dashboard: sidebar nav, header with
avatar, stat cards, data table — all in your design system.
Instrument Serif at hero scale, Geist in body text, JetBrains
Mono in the data table. Teal accent on buttons and active states.
Light and dark mode toggle in the corner.
How does it feel? Want to adjust anything before I write DESIGN.md?
You: Looks great. Ship it.
You: Ship it.
Claude: Wrote DESIGN.md (typography, color, spacing, layout, motion).
Updated CLAUDE.md with design system guidelines.
+1 -1
View File
@@ -1 +1 @@
0.5.1
0.5.3
+75 -14
View File
@@ -123,6 +123,29 @@ If brainstorm output exists, read it — the product context is pre-filled.
If the codebase is empty and purpose is unclear, say: *"I don't have a clear picture of what you're building yet. Want to brainstorm first with `/brainstorm`? Once we know the product direction, we can set up the design system."*
**Find the browse binary (optional — enables visual competitive research):**
## SETUP (run this check BEFORE any browse command)
```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
if [ -x "$B" ]; then
echo "READY: $B"
else
echo "NEEDS_SETUP"
fi
```
If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`
If browse is not available, that's fine — visual research is optional. The skill works without it using WebSearch and your built-in design knowledge.
---
## Phase 1: Product Context
@@ -143,17 +166,40 @@ If the README or brainstorm gives you enough context, pre-fill and confirm: *"Fr
If the user wants competitive research:
**Step 1: Identify competitors via WebSearch**
Use WebSearch to find 5-10 products in their space. Search for:
- "[product category] website design"
- "[product category] best websites 2025"
- "best [industry] web apps"
For each competitor found, note: fonts used, color palette, layout approach, aesthetic direction.
**Step 2: Visual research via browse (if available)**
Summarize your findings conversationally:
> "I looked at [competitors]. They tend toward [patterns] — lots of [common choices]. The opportunity to be distinctive is [gap]. Here's what I'd recommend based on this..."
If the browse binary is available (`$B` is set), visit the top 3-5 competitor sites and capture visual evidence:
If WebSearch is unavailable or returns poor results, fall back gracefully: *"Couldn't get good research results, so I'll work from my design knowledge of the [industry] space."*
```bash
$B goto "https://competitor-site.com"
$B screenshot "/tmp/design-research-competitor-name.png"
$B snapshot
```
For each competitor, analyze: fonts actually used, color palette, layout approach, spacing density, aesthetic direction. The screenshot gives you the feel; the snapshot gives you structural data.
If a competitor site blocks the headless browser or requires login, skip it and note why.
If browse is not available, rely on WebSearch results and your built-in design knowledge — this is fine.
**Step 3: Synthesize findings**
The goal of research is NOT to copy. It is to get in the ballpark — to understand the visual language users in this category already expect. This gives you the baseline. The interesting design work starts after you have the baseline: deciding where to follow conventions (so the product feels literate) and where to break from them (so the product is memorable).
Summarize conversationally:
> "I looked at [competitors]. Here's the landscape: they converge on [patterns]. Most of them feel [observation — e.g., interchangeable, polished but generic, etc.]. The opportunity to stand out is [gap]. Here's where I'd play it safe and where I'd take a risk..."
**Graceful degradation:**
- Browse available → screenshots + snapshots + WebSearch (richest research)
- Browse unavailable → WebSearch only (still good)
- WebSearch also unavailable → agent's built-in design knowledge (always works)
If the user said no research, skip entirely and proceed to Phase 3 using your built-in design knowledge.
@@ -163,7 +209,7 @@ If the user said no research, skip entirely and proceed to Phase 3 using your bu
This is the soul of the skill. Propose EVERYTHING as one coherent package.
**AskUserQuestion Q2 — present the full proposal:**
**AskUserQuestion Q2 — present the full proposal with SAFE/RISK breakdown:**
```
Based on [product context] and [research findings / my design knowledge]:
@@ -178,12 +224,21 @@ MOTION: [approach] — [rationale]
This system is coherent because [explain how choices reinforce each other].
Want to adjust anything? You can drill into any section, or just tell me
what feels off and I'll rework it. Or if this looks right, I'll generate
a preview page so you can see the fonts and colors rendered.
SAFE CHOICES (category baseline — your users expect these):
- [2-3 decisions that match category conventions, with rationale for playing safe]
RISKS (where your product gets its own face):
- [2-3 deliberate departures from convention]
- For each risk: what it is, why it works, what you gain, what it costs
The safe choices keep you literate in your category. The risks are where
your product becomes memorable. Which risks appeal to you? Want to see
different ones? Or adjust anything else?
```
**Options:** A) Looks great — generate the preview page. B) I want to adjust [section]. C) Start over with a different direction. D) Skip the preview, just write DESIGN.md.
The SAFE/RISK breakdown is critical. Design coherence is table stakes — every product in a category can be coherent and still look identical. The real question is: where do you take creative risks? The agent should always propose at least 2 risks, each with a clear rationale for why the risk is worth taking and what the user gives up. Risks might include: an unexpected typeface for the category, a bold accent color nobody else uses, tighter or looser spacing than the norm, a layout approach that breaks from convention, motion choices that add personality.
**Options:** A) Looks great — generate the preview page. B) I want to adjust [section]. C) I want different risks — show me wilder options. D) Start over with a different direction. E) Skip the preview, just write DESIGN.md.
### Your Design Knowledge (use to inform proposals — do NOT display as tables)
@@ -273,7 +328,7 @@ The agent writes a **single, self-contained HTML file** (no framework dependenci
1. **Loads proposed fonts** from Google Fonts (or Bunny Fonts) via `<link>` tags
2. **Uses the proposed color palette** throughout — dogfood the design system
3. **Shows the product name** (not "Lorem Ipsum") as the hero heading
4. **Font comparison section:**
4. **Font specimen section:**
- Each font candidate shown in its proposed role (hero heading, body paragraph, button label, data table row)
- Side-by-side comparison if multiple candidates for one role
- Real content that matches the product (e.g., civic tech → government data examples)
@@ -281,11 +336,17 @@ The agent writes a **single, self-contained HTML file** (no framework dependenci
- Swatches with hex values and names
- Sample UI components rendered in the palette: buttons (primary, secondary, ghost), cards, form inputs, alerts (success, warning, error, info)
- Background/text color combinations showing contrast
6. **Light/dark mode toggle** using CSS custom properties and a JS toggle button
7. **Clean, professional layout** — the preview page IS a taste signal for the skill
8. **Responsive** — looks good on any screen width
6. **Realistic product mockups** — this is what makes the preview page powerful. Based on the project type from Phase 1, render 2-3 realistic page layouts using the full design system:
- **Dashboard / web app:** sample data table with metrics, sidebar nav, header with user avatar, stat cards
- **Marketing site:** hero section with real copy, feature highlights, testimonial block, CTA
- **Settings / admin:** form with labeled inputs, toggle switches, dropdowns, save button
- **Auth / onboarding:** login form with social buttons, branding, input validation states
- Use the product name, realistic content for the domain, and the proposed spacing/layout/border-radius. The user should see their product (roughly) before writing any code.
7. **Light/dark mode toggle** using CSS custom properties and a JS toggle button
8. **Clean, professional layout** — the preview page IS a taste signal for the skill
9. **Responsive** — looks good on any screen width
The page should make the user think "oh nice, they thought of this." It's selling the design system visually, not just listing hex codes.
The page should make the user think "oh nice, they thought of this." It's selling the design system by showing what the product could feel like, not just listing hex codes and font names.
If `open` fails (headless environment), tell the user: *"I wrote the preview to [path] — open it in your browser to see the fonts and colors rendered."*
+58 -14
View File
@@ -58,6 +58,12 @@ If brainstorm output exists, read it — the product context is pre-filled.
If the codebase is empty and purpose is unclear, say: *"I don't have a clear picture of what you're building yet. Want to brainstorm first with `/brainstorm`? Once we know the product direction, we can set up the design system."*
**Find the browse binary (optional — enables visual competitive research):**
{{BROWSE_SETUP}}
If browse is not available, that's fine — visual research is optional. The skill works without it using WebSearch and your built-in design knowledge.
---
## Phase 1: Product Context
@@ -78,17 +84,40 @@ If the README or brainstorm gives you enough context, pre-fill and confirm: *"Fr
If the user wants competitive research:
**Step 1: Identify competitors via WebSearch**
Use WebSearch to find 5-10 products in their space. Search for:
- "[product category] website design"
- "[product category] best websites 2025"
- "best [industry] web apps"
For each competitor found, note: fonts used, color palette, layout approach, aesthetic direction.
**Step 2: Visual research via browse (if available)**
Summarize your findings conversationally:
> "I looked at [competitors]. They tend toward [patterns] — lots of [common choices]. The opportunity to be distinctive is [gap]. Here's what I'd recommend based on this..."
If the browse binary is available (`$B` is set), visit the top 3-5 competitor sites and capture visual evidence:
If WebSearch is unavailable or returns poor results, fall back gracefully: *"Couldn't get good research results, so I'll work from my design knowledge of the [industry] space."*
```bash
$B goto "https://competitor-site.com"
$B screenshot "/tmp/design-research-competitor-name.png"
$B snapshot
```
For each competitor, analyze: fonts actually used, color palette, layout approach, spacing density, aesthetic direction. The screenshot gives you the feel; the snapshot gives you structural data.
If a competitor site blocks the headless browser or requires login, skip it and note why.
If browse is not available, rely on WebSearch results and your built-in design knowledge — this is fine.
**Step 3: Synthesize findings**
The goal of research is NOT to copy. It is to get in the ballpark — to understand the visual language users in this category already expect. This gives you the baseline. The interesting design work starts after you have the baseline: deciding where to follow conventions (so the product feels literate) and where to break from them (so the product is memorable).
Summarize conversationally:
> "I looked at [competitors]. Here's the landscape: they converge on [patterns]. Most of them feel [observation — e.g., interchangeable, polished but generic, etc.]. The opportunity to stand out is [gap]. Here's where I'd play it safe and where I'd take a risk..."
**Graceful degradation:**
- Browse available → screenshots + snapshots + WebSearch (richest research)
- Browse unavailable → WebSearch only (still good)
- WebSearch also unavailable → agent's built-in design knowledge (always works)
If the user said no research, skip entirely and proceed to Phase 3 using your built-in design knowledge.
@@ -98,7 +127,7 @@ If the user said no research, skip entirely and proceed to Phase 3 using your bu
This is the soul of the skill. Propose EVERYTHING as one coherent package.
**AskUserQuestion Q2 — present the full proposal:**
**AskUserQuestion Q2 — present the full proposal with SAFE/RISK breakdown:**
```
Based on [product context] and [research findings / my design knowledge]:
@@ -113,12 +142,21 @@ MOTION: [approach] — [rationale]
This system is coherent because [explain how choices reinforce each other].
Want to adjust anything? You can drill into any section, or just tell me
what feels off and I'll rework it. Or if this looks right, I'll generate
a preview page so you can see the fonts and colors rendered.
SAFE CHOICES (category baseline — your users expect these):
- [2-3 decisions that match category conventions, with rationale for playing safe]
RISKS (where your product gets its own face):
- [2-3 deliberate departures from convention]
- For each risk: what it is, why it works, what you gain, what it costs
The safe choices keep you literate in your category. The risks are where
your product becomes memorable. Which risks appeal to you? Want to see
different ones? Or adjust anything else?
```
**Options:** A) Looks great — generate the preview page. B) I want to adjust [section]. C) Start over with a different direction. D) Skip the preview, just write DESIGN.md.
The SAFE/RISK breakdown is critical. Design coherence is table stakes — every product in a category can be coherent and still look identical. The real question is: where do you take creative risks? The agent should always propose at least 2 risks, each with a clear rationale for why the risk is worth taking and what the user gives up. Risks might include: an unexpected typeface for the category, a bold accent color nobody else uses, tighter or looser spacing than the norm, a layout approach that breaks from convention, motion choices that add personality.
**Options:** A) Looks great — generate the preview page. B) I want to adjust [section]. C) I want different risks — show me wilder options. D) Start over with a different direction. E) Skip the preview, just write DESIGN.md.
### Your Design Knowledge (use to inform proposals — do NOT display as tables)
@@ -208,7 +246,7 @@ The agent writes a **single, self-contained HTML file** (no framework dependenci
1. **Loads proposed fonts** from Google Fonts (or Bunny Fonts) via `<link>` tags
2. **Uses the proposed color palette** throughout — dogfood the design system
3. **Shows the product name** (not "Lorem Ipsum") as the hero heading
4. **Font comparison section:**
4. **Font specimen section:**
- Each font candidate shown in its proposed role (hero heading, body paragraph, button label, data table row)
- Side-by-side comparison if multiple candidates for one role
- Real content that matches the product (e.g., civic tech → government data examples)
@@ -216,11 +254,17 @@ The agent writes a **single, self-contained HTML file** (no framework dependenci
- Swatches with hex values and names
- Sample UI components rendered in the palette: buttons (primary, secondary, ghost), cards, form inputs, alerts (success, warning, error, info)
- Background/text color combinations showing contrast
6. **Light/dark mode toggle** using CSS custom properties and a JS toggle button
7. **Clean, professional layout** — the preview page IS a taste signal for the skill
8. **Responsive** — looks good on any screen width
6. **Realistic product mockups** — this is what makes the preview page powerful. Based on the project type from Phase 1, render 2-3 realistic page layouts using the full design system:
- **Dashboard / web app:** sample data table with metrics, sidebar nav, header with user avatar, stat cards
- **Marketing site:** hero section with real copy, feature highlights, testimonial block, CTA
- **Settings / admin:** form with labeled inputs, toggle switches, dropdowns, save button
- **Auth / onboarding:** login form with social buttons, branding, input validation states
- Use the product name, realistic content for the domain, and the proposed spacing/layout/border-radius. The user should see their product (roughly) before writing any code.
7. **Light/dark mode toggle** using CSS custom properties and a JS toggle button
8. **Clean, professional layout** — the preview page IS a taste signal for the skill
9. **Responsive** — looks good on any screen width
The page should make the user think "oh nice, they thought of this." It's selling the design system visually, not just listing hex codes.
The page should make the user think "oh nice, they thought of this." It's selling the design system by showing what the product could feel like, not just listing hex codes and font names.
If `open` fails (headless environment), tell the user: *"I wrote the preview to [path] — open it in your browser to see the fonts and colors rendered."*
+159 -60
View File
@@ -3,9 +3,9 @@ name: plan-ceo-review
version: 1.0.0
description: |
CEO/founder-mode plan review. Rethink the problem, find the 10-star product,
challenge premises, expand scope when it creates a better product. Three modes:
SCOPE EXPANSION (dream big), HOLD SCOPE (maximum rigor), SCOPE REDUCTION
(strip to essentials).
challenge premises, expand scope when it creates a better product. Four modes:
SCOPE EXPANSION (dream big), SELECTIVE EXPANSION (hold scope + cherry-pick
expansions), HOLD SCOPE (maximum rigor), SCOPE REDUCTION (strip to essentials).
allowed-tools:
- Read
- Grep
@@ -105,10 +105,11 @@ branch name wherever the instructions say "the base branch."
## Philosophy
You are not here to rubber-stamp this plan. You are here to make it extraordinary, catch every landmine before it explodes, and ensure that when this ships, it ships at the highest possible standard.
But your posture depends on what the user needs:
* SCOPE EXPANSION: You are building a cathedral. Envision the platonic ideal. Push scope UP. Ask "what would make this 10x better for 2x the effort?" The answer to "should we also build X?" is "yes, if it serves the vision." You have permission to dream.
* SCOPE EXPANSION: You are building a cathedral. Envision the platonic ideal. Push scope UP. Ask "what would make this 10x better for 2x the effort?" You have permission to dream — and to recommend enthusiastically. But every expansion is the user's decision. Present each scope-expanding idea as an AskUserQuestion. The user opts in or out.
* SELECTIVE EXPANSION: You are a rigorous reviewer who also has taste. Hold the current scope as your baseline — make it bulletproof. But separately, surface every expansion opportunity you see and present each one individually as an AskUserQuestion so the user can cherry-pick. Neutral recommendation posture — present the opportunity, state effort and risk, let the user decide. Accepted expansions become part of the plan's scope for the remaining sections. Rejected ones go to "NOT in scope."
* HOLD SCOPE: You are a rigorous reviewer. The plan's scope is accepted. Your job is to make it bulletproof — catch every failure mode, test every edge case, ensure observability, map every error path. Do not silently reduce OR expand.
* SCOPE REDUCTION: You are a surgeon. Find the minimum viable version that achieves the core outcome. Cut everything else. Be ruthless.
Critical rule: Once the user selects a mode, COMMIT to it. Do not silently drift toward a different mode. If EXPANSION is selected, do not argue for less work during later sections. If REDUCTION is selected, do not sneak scope back in. Raise concerns once in Step 0 — after that, execute the chosen mode faithfully.
Critical rule: In ALL modes, the user is 100% in control. Every scope change is an explicit opt-in via AskUserQuestion — never silently add or remove scope. Once the user selects a mode, COMMIT to it. Do not silently drift toward a different mode. If EXPANSION is selected, do not argue for less work during later sections. If SELECTIVE EXPANSION is selected, surface expansions as individual decisions — do not silently include or exclude them. If REDUCTION is selected, do not sneak scope back in. Raise concerns once in Step 0 — after that, execute the chosen mode faithfully.
Do NOT make any code changes. Do NOT start implementation. Your only job right now is to review the plan with maximum rigor and the appropriate level of ambition.
## Prime Directives
@@ -164,7 +165,7 @@ Map:
### Retrospective Check
Check the git log for this branch. If there are prior commits suggesting a previous review cycle (review-driven refactors, reverted changes), note what was changed and whether the current plan re-touches those areas. Be MORE aggressive reviewing areas that were previously problematic. Recurring problem areas are architectural smells — surface them as architectural concerns.
### Taste Calibration (EXPANSION mode only)
### Taste Calibration (EXPANSION and SELECTIVE EXPANSION modes)
Identify 2-3 files or patterns in the existing codebase that are particularly well-designed. Note them as style references for the review. Also note 1-2 patterns that are frustrating or poorly designed — these are anti-patterns to avoid repeating.
Report findings before proceeding to Step 0.
@@ -187,10 +188,20 @@ Describe the ideal end state of this system 12 months from now. Does this plan m
```
### 0D. Mode-Specific Analysis
**For SCOPE EXPANSION** — run all three:
**For SCOPE EXPANSION** — run all three, then the opt-in ceremony:
1. 10x check: What's the version that's 10x more ambitious and delivers 10x more value for 2x the effort? Describe it concretely.
2. Platonic ideal: If the best engineer in the world had unlimited time and perfect taste, what would this system look like? What would the user feel when using it? Start from experience, not architecture.
3. Delight opportunities: What adjacent 30-minute improvements would make this feature sing? Things where a user would think "oh nice, they thought of that." List at least 3.
3. Delight opportunities: What adjacent 30-minute improvements would make this feature sing? Things where a user would think "oh nice, they thought of that." List at least 5.
4. **Expansion opt-in ceremony:** Describe the vision first (10x check, platonic ideal). Then distill concrete scope proposals from those visions — individual features, components, or improvements. Present each proposal as its own AskUserQuestion. Recommend enthusiastically — explain why it's worth doing. But the user decides. Options: **A)** Add to this plan's scope **B)** Defer to TODOS.md **C)** Skip. Accepted items become plan scope for all remaining review sections. Rejected items go to "NOT in scope."
**For SELECTIVE EXPANSION** — run the HOLD SCOPE analysis first, then surface expansions:
1. Complexity check: If the plan touches more than 8 files or introduces more than 2 new classes/services, treat that as a smell and challenge whether the same goal can be achieved with fewer moving parts.
2. What is the minimum set of changes that achieves the stated goal? Flag any work that could be deferred without blocking the core objective.
3. Then run the expansion scan (do NOT add these to scope yet — they are candidates):
- 10x check: What's the version that's 10x more ambitious? Describe it concretely.
- Delight opportunities: What adjacent 30-minute improvements would make this feature sing? List at least 5.
- Platform potential: Would any expansion turn this feature into infrastructure other features can build on?
4. **Cherry-pick ceremony:** Present each expansion opportunity as its own individual AskUserQuestion. Neutral recommendation posture — present the opportunity, state effort (S/M/L) and risk, let the user decide without bias. Options: **A)** Add to this plan's scope **B)** Defer to TODOS.md **C)** Skip. If you have more than 8 candidates, present the top 5-6 and note the remainder as lower-priority options the user can request. Accepted items become plan scope for all remaining review sections. Rejected items go to "NOT in scope."
**For HOLD SCOPE** — run this:
1. Complexity check: If the plan touches more than 8 files or introduces more than 2 new classes/services, treat that as a smell and challenge whether the same goal can be achieved with fewer moving parts.
@@ -200,7 +211,57 @@ Describe the ideal end state of this system 12 months from now. Does this plan m
1. Ruthless cut: What is the absolute minimum that ships value to a user? Everything else is deferred. No exceptions.
2. What can be a follow-up PR? Separate "must ship together" from "nice to ship together."
### 0E. Temporal Interrogation (EXPANSION and HOLD modes)
### 0D-POST. Persist CEO Plan (EXPANSION and SELECTIVE EXPANSION only)
After the opt-in/cherry-pick ceremony, write the plan to disk so the vision and decisions survive beyond this conversation. Only run this step for EXPANSION and SELECTIVE EXPANSION modes.
```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG/ceo-plans
```
Before writing, check for existing CEO plans in the ceo-plans/ directory. If any are >30 days old or their branch has been merged/deleted, offer to archive them:
```bash
mkdir -p ~/.gstack/projects/$SLUG/ceo-plans/archive
# For each stale plan: mv ~/.gstack/projects/$SLUG/ceo-plans/{old-plan}.md ~/.gstack/projects/$SLUG/ceo-plans/archive/
```
Write to `~/.gstack/projects/$SLUG/ceo-plans/{date}-{feature-slug}.md` using this format:
```markdown
---
status: ACTIVE
---
# CEO Plan: {Feature Name}
Generated by /plan-ceo-review on {date}
Branch: {branch} | Mode: {EXPANSION / SELECTIVE EXPANSION}
Repo: {owner/repo}
## Vision
### 10x Check
{10x vision description}
### Platonic Ideal
{platonic ideal description — EXPANSION mode only}
## Scope Decisions
| # | Proposal | Effort | Decision | Reasoning |
|---|----------|--------|----------|-----------|
| 1 | {proposal} | S/M/L | ACCEPTED / DEFERRED / SKIPPED | {why} |
## Accepted Scope (added to this plan)
- {bullet list of what's now in scope}
## Deferred to TODOS.md
- {items with context}
```
Derive the feature slug from the plan being reviewed (e.g., "user-dashboard", "auth-refactor"). Use the date in YYYY-MM-DD format.
### 0E. Temporal Interrogation (EXPANSION, SELECTIVE EXPANSION, and HOLD modes)
Think ahead to implementation: What decisions will need to be made during implementation that should be resolved NOW in the plan?
```
HOUR 1 (foundations): What does the implementer need to know?
@@ -211,17 +272,22 @@ Think ahead to implementation: What decisions will need to be made during implem
Surface these as questions for the user NOW, not as "figure it out later."
### 0F. Mode Selection
Present three options:
1. **SCOPE EXPANSION:** The plan is good but could be great. Propose the ambitious version, then review that. Push scope up. Build the cathedral.
2. **HOLD SCOPE:** The plan's scope is right. Review it with maximum rigor — architecture, security, edge cases, observability, deployment. Make it bulletproof.
3. **SCOPE REDUCTION:** The plan is overbuilt or wrong-headed. Propose a minimal version that achieves the core goal, then review that.
In every mode, you are 100% in control. No scope is added without your explicit approval.
Present four options:
1. **SCOPE EXPANSION:** The plan is good but could be great. Dream big — propose the ambitious version. Every expansion is presented individually for your approval. You opt in to each one.
2. **SELECTIVE EXPANSION:** The plan's scope is the baseline, but you want to see what else is possible. Every expansion opportunity presented individually — you cherry-pick the ones worth doing. Neutral recommendations.
3. **HOLD SCOPE:** The plan's scope is right. Review it with maximum rigor — architecture, security, edge cases, observability, deployment. Make it bulletproof. No expansions surfaced.
4. **SCOPE REDUCTION:** The plan is overbuilt or wrong-headed. Propose a minimal version that achieves the core goal, then review that.
Context-dependent defaults:
* Greenfield feature → default EXPANSION
* Feature enhancement or iteration on existing system → default SELECTIVE EXPANSION
* Bug fix or hotfix → default HOLD SCOPE
* Refactor → default HOLD SCOPE
* Plan touching >15 files → suggest REDUCTION unless user pushes back
* User says "go big" / "ambitious" / "cathedral" → EXPANSION, no question
* User says "hold scope but tempt me" / "show me options" / "cherry-pick" → SELECTIVE EXPANSION, no question
Once selected, commit fully. Do not silently drift.
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
@@ -244,10 +310,12 @@ Evaluate and diagram:
* Production failure scenarios. For each new integration point, describe one realistic production failure (timeout, cascade, data corruption, auth failure) and whether the plan accounts for it.
* Rollback posture. If this ships and immediately breaks, what's the rollback procedure? Git revert? Feature flag? DB migration rollback? How long?
**EXPANSION mode additions:**
**EXPANSION and SELECTIVE EXPANSION additions:**
* What would make this architecture beautiful? Not just correct — elegant. Is there a design that would make a new engineer joining in 6 months say "oh, that's clever and obvious at the same time"?
* What infrastructure would make this feature a platform that other features can build on?
**SELECTIVE EXPANSION:** If any accepted cherry-picks from Step 0D affect the architecture, evaluate their architectural fit here. Flag any that create coupling concerns or don't integrate cleanly — this is a chance to revisit the decision with new information.
Required ASCII diagram: full system architecture showing new components and their relationships to existing ones.
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
@@ -406,8 +474,8 @@ Evaluate:
* Admin tooling. New operational tasks that need admin UI or rake tasks?
* Runbooks. For each new failure mode: what's the operational response?
**EXPANSION mode addition:**
* What observability would make this feature a joy to operate?
**EXPANSION and SELECTIVE EXPANSION addition:**
* What observability would make this feature a joy to operate? (For SELECTIVE EXPANSION, include observability for any accepted cherry-picks.)
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
### Section 9: Deployment & Rollout Review
@@ -421,8 +489,8 @@ Evaluate:
* Post-deploy verification checklist. First 5 minutes? First hour?
* Smoke tests. What automated checks should run immediately post-deploy?
**EXPANSION mode addition:**
* What deploy infrastructure would make shipping this feature routine?
**EXPANSION and SELECTIVE EXPANSION addition:**
* What deploy infrastructure would make shipping this feature routine? (For SELECTIVE EXPANSION, assess whether accepted cherry-picks change the deployment risk profile.)
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
### Section 10: Long-Term Trajectory Review
@@ -434,9 +502,10 @@ Evaluate:
* Ecosystem fit. Aligns with Rails/JS ecosystem direction?
* The 1-year question. Read this plan as a new engineer in 12 months — obvious?
**EXPANSION mode additions:**
**EXPANSION and SELECTIVE EXPANSION additions:**
* What comes after this ships? Phase 2? Phase 3? Does the architecture support that trajectory?
* Platform potential. Does this create capabilities other features can leverage?
* (SELECTIVE EXPANSION only) Retrospective: Were the right cherry-picks accepted? Did any rejected expansions turn out to be load-bearing for the accepted ones?
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
## CRITICAL RULE — How to ask questions
@@ -485,8 +554,11 @@ For each TODO, describe:
Then present options: **A)** Add to TODOS.md **B)** Skip — not valuable enough **C)** Build it now in this PR instead of deferring.
### Delight Opportunities (EXPANSION mode only)
Identify at least 5 "bonus chunk" opportunities (<30 min each) that would make users think "oh nice, they thought of that." Present each delight opportunity as its own individual AskUserQuestion. Never batch them. For each one, describe what it is, why it would delight users, and effort estimate. Then present options: **A)** Add to TODOS.md as a vision item **B)** Skip **C)** Build it now in this PR.
### Scope Expansion Decisions (EXPANSION and SELECTIVE EXPANSION only)
For EXPANSION and SELECTIVE EXPANSION modes: expansion opportunities and delight items were surfaced and decided in Step 0D (opt-in/cherry-pick ceremony). The decisions are persisted in the CEO plan document. Reference the CEO plan for the full record. Do not re-surface them here — list the accepted expansions for completeness:
* Accepted: {list items added to scope}
* Deferred: {list items sent to TODOS.md}
* Skipped: {list items rejected}
### Diagrams (mandatory, produce all that apply)
1. System architecture
@@ -504,7 +576,7 @@ List every ASCII diagram in files this plan touches. Still accurate?
+====================================================================+
| MEGA PLAN REVIEW — COMPLETION SUMMARY |
+====================================================================+
| Mode selected | EXPANSION / HOLD / REDUCTION |
| Mode selected | EXPANSION / SELECTIVE / HOLD / REDUCTION |
| System Audit | [key findings] |
| Step 0 | [mode + key decisions] |
| Section 1 (Arch) | ___ issues found |
@@ -524,7 +596,8 @@ List every ASCII diagram in files this plan touches. Still accurate?
| Error/rescue registry| ___ methods, ___ CRITICAL GAPS |
| Failure modes | ___ total, ___ CRITICAL GAPS |
| TODOS.md updates | ___ items proposed |
| Delight opportunities| ___ identified (EXPANSION only) |
| Scope proposals | ___ proposed, ___ accepted (EXP + SEL) |
| CEO plan | written / skipped (HOLD/REDUCTION) |
| Diagrams produced | ___ (list types) |
| Stale diagrams found | ___ |
| Unresolved decisions | ___ (listed below) |
@@ -549,15 +622,17 @@ Before running this command, substitute the placeholder values from the Completi
- **STATUS**: "clean" if 0 unresolved decisions AND 0 critical gaps; otherwise "issues_open"
- **unresolved**: number from "Unresolved decisions" in the summary
- **critical_gaps**: number from "Failure modes: ___ CRITICAL GAPS" in the summary
- **MODE**: the mode the user selected (SCOPE_EXPANSION / HOLD_SCOPE / SCOPE_REDUCTION)
- **MODE**: the mode the user selected (SCOPE_EXPANSION / SELECTIVE_EXPANSION / HOLD_SCOPE / SCOPE_REDUCTION)
## Review Readiness Dashboard
After completing the review, read the review log to display the dashboard.
After completing the review, read the review log and config to display the dashboard.
```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
cat ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_REVIEWS"
echo "---CONFIG---"
~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false"
```
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. Display:
@@ -566,20 +641,37 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
+====================================================================+
| REVIEW READINESS DASHBOARD |
+====================================================================+
| Review | Runs | Last Run | Status |
|-----------------|------|---------------------|----------------------|
| CEO Review | 1 | 2026-03-16 14:30 | CLEAR |
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR |
| Design Review | 0 | — | NOT YET RUN |
| Review | Runs | Last Run | Status | Required |
|-----------------|------|---------------------|-----------|----------|
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES |
| CEO Review | 0 | | | no |
| Design Review | 0 | — | — | no |
+--------------------------------------------------------------------+
| VERDICT: 2/3 CLEAR — Design Review not yet run |
| VERDICT: CLEAREDEng Review passed |
+====================================================================+
```
**Review tiers:**
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
**Verdict logic:**
- **CLEARED TO SHIP (3/3)**: All three have >= 1 entry within 7 days AND most recent status is "clean"
- **N/3 CLEAR**: Show count and list which are missing, have open issues, or are stale (>7 days)
- Informational only — does NOT block.
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
- **NOT CLEARED**: Eng Review missing, stale (>7 days), or has open issues
- CEO and Design reviews are shown for context but never block shipping
- If \`skip_eng_review\` config is \`true\`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED
## docs/designs Promotion (EXPANSION and SELECTIVE EXPANSION only)
At the end of the review, if the vision produced a compelling feature direction, offer to promote the CEO plan to the project repo. AskUserQuestion:
"The vision from this review produced {N} accepted scope expansions. Want to promote it to a design doc in the repo?"
- **A)** Promote to `docs/designs/{FEATURE}.md` (committed to repo, visible to the team)
- **B)** Keep in `~/.gstack/projects/` only (local, personal reference)
- **C)** Skip
If promoted, copy the CEO plan content to `docs/designs/{FEATURE}.md` (create the directory if needed) and update the `status` field in the original CEO plan from `ACTIVE` to `PROMOTED`.
## Formatting Rules
* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
@@ -590,30 +682,37 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
## Mode Quick Reference
```
┌─────────────────────────────────────────────────────────────────┐
│ MODE COMPARISON │
├─────────────┬──────────────┬──────────────┬────────────────────┤
│ │ EXPANSION │ HOLD SCOPE │ REDUCTION │
├─────────────┼──────────────┼──────────────┼────────────────────┤
│ Scope │ Push UP │ Maintain │ Push DOWN │
10x check │ Mandatory │ OptionalSkip
Platonic │ Yes │ No │ No
ideal │ │ │ │
Delight │ 5+ items │ Note if seen │ Skip │
opps │ │ │ │
Complexity │ "Is it big │ "Is it too │ "Is it the bare
question │ enough?" │ complex?"minimum?"
Taste Yes │ No │ No
calibration │ │ │ │
Temporal │ Full (hr 1-6)│ Key decisions│ Skip
interrogate │ │ only │
Observ. │ "Joy to │ "Can we │ "Can we see if
standard │ operate" │ debug it?" │ it's broken?"
Deploy │ Infra as │ Safe deploySimplest possible
standard │ feature scope│ + rollback │ deploy
Error map │ Full + chaos │ Full │ Critical paths
│ scenarios │ only
Phase 2/3 │ Map it │ Note it │ Skip
planning │ │
└─────────────┴──────────────┴──────────────┴────────────────────┘
┌────────────────────────────────────────────────────────────────────────────────
MODE COMPARISON
├─────────────┬──────────────┬──────────────┬──────────────┬────────────────────┤
│ │ EXPANSION │ SELECTIVE │ HOLD SCOPE │ REDUCTION │
├─────────────┼──────────────┼──────────────┼──────────────┼────────────────────┤
│ Scope │ Push UP │ Hold + offer │ Maintain │ Push DOWN │
│ (opt-in) │ │
Recommend │ Enthusiastic │ Neutral │ N/A │ N/A
posture │ │ │ │ │
10x check │ Mandatory │ Surface as │ Optional │ Skip │
│ │ cherry-pick │ │ │
Platonic │ Yes │ No │ No │ No
ideal │ │ │
DelightOpt-in │ Cherry-pick │ Note if seen │ Skip
opps │ ceremony │ ceremony │ │ │
Complexity │ "Is it big │ "Is it right │ "Is it too │ "Is it the bare
question enough?" + what else │ complex?" │ minimum?"
│ │ is tempting"│ │
Taste │ Yes │ Yes │ No │ No
calibration │ │ │
Temporal │ Full (hr 1-6)│ Full (hr 1-6)│ Key decisions│ Skip
interrogate │ │ │ only │
Observ. │ "Joy to │ "Joy to │ "Can we"Can we see if
standard │ operate" operate" │ debug it?" it's broken?"
Deploy │ Infra as │ Safe deploy │ Safe deploy │ Simplest possible
│ standard │ feature scope│ + cherry-pick│ + rollback │ deploy │
│ │ │ risk check │ │ │
│ Error map │ Full + chaos │ Full + chaos │ Full │ Critical paths │
│ │ scenarios │ for accepted │ │ only │
│ CEO plan │ Written │ Written │ Skipped │ Skipped │
│ Phase 2/3 │ Map accepted │ Map accepted │ Note it │ Skip │
│ planning │ │ cherry-picks │ │ │
└─────────────┴──────────────┴──────────────┴──────────────┴────────────────────┘
```
+141 -50
View File
@@ -3,9 +3,9 @@ name: plan-ceo-review
version: 1.0.0
description: |
CEO/founder-mode plan review. Rethink the problem, find the 10-star product,
challenge premises, expand scope when it creates a better product. Three modes:
SCOPE EXPANSION (dream big), HOLD SCOPE (maximum rigor), SCOPE REDUCTION
(strip to essentials).
challenge premises, expand scope when it creates a better product. Four modes:
SCOPE EXPANSION (dream big), SELECTIVE EXPANSION (hold scope + cherry-pick
expansions), HOLD SCOPE (maximum rigor), SCOPE REDUCTION (strip to essentials).
allowed-tools:
- Read
- Grep
@@ -23,10 +23,11 @@ allowed-tools:
## Philosophy
You are not here to rubber-stamp this plan. You are here to make it extraordinary, catch every landmine before it explodes, and ensure that when this ships, it ships at the highest possible standard.
But your posture depends on what the user needs:
* SCOPE EXPANSION: You are building a cathedral. Envision the platonic ideal. Push scope UP. Ask "what would make this 10x better for 2x the effort?" The answer to "should we also build X?" is "yes, if it serves the vision." You have permission to dream.
* SCOPE EXPANSION: You are building a cathedral. Envision the platonic ideal. Push scope UP. Ask "what would make this 10x better for 2x the effort?" You have permission to dream — and to recommend enthusiastically. But every expansion is the user's decision. Present each scope-expanding idea as an AskUserQuestion. The user opts in or out.
* SELECTIVE EXPANSION: You are a rigorous reviewer who also has taste. Hold the current scope as your baseline — make it bulletproof. But separately, surface every expansion opportunity you see and present each one individually as an AskUserQuestion so the user can cherry-pick. Neutral recommendation posture — present the opportunity, state effort and risk, let the user decide. Accepted expansions become part of the plan's scope for the remaining sections. Rejected ones go to "NOT in scope."
* HOLD SCOPE: You are a rigorous reviewer. The plan's scope is accepted. Your job is to make it bulletproof — catch every failure mode, test every edge case, ensure observability, map every error path. Do not silently reduce OR expand.
* SCOPE REDUCTION: You are a surgeon. Find the minimum viable version that achieves the core outcome. Cut everything else. Be ruthless.
Critical rule: Once the user selects a mode, COMMIT to it. Do not silently drift toward a different mode. If EXPANSION is selected, do not argue for less work during later sections. If REDUCTION is selected, do not sneak scope back in. Raise concerns once in Step 0 — after that, execute the chosen mode faithfully.
Critical rule: In ALL modes, the user is 100% in control. Every scope change is an explicit opt-in via AskUserQuestion — never silently add or remove scope. Once the user selects a mode, COMMIT to it. Do not silently drift toward a different mode. If EXPANSION is selected, do not argue for less work during later sections. If SELECTIVE EXPANSION is selected, surface expansions as individual decisions — do not silently include or exclude them. If REDUCTION is selected, do not sneak scope back in. Raise concerns once in Step 0 — after that, execute the chosen mode faithfully.
Do NOT make any code changes. Do NOT start implementation. Your only job right now is to review the plan with maximum rigor and the appropriate level of ambition.
## Prime Directives
@@ -82,7 +83,7 @@ Map:
### Retrospective Check
Check the git log for this branch. If there are prior commits suggesting a previous review cycle (review-driven refactors, reverted changes), note what was changed and whether the current plan re-touches those areas. Be MORE aggressive reviewing areas that were previously problematic. Recurring problem areas are architectural smells — surface them as architectural concerns.
### Taste Calibration (EXPANSION mode only)
### Taste Calibration (EXPANSION and SELECTIVE EXPANSION modes)
Identify 2-3 files or patterns in the existing codebase that are particularly well-designed. Note them as style references for the review. Also note 1-2 patterns that are frustrating or poorly designed — these are anti-patterns to avoid repeating.
Report findings before proceeding to Step 0.
@@ -105,10 +106,20 @@ Describe the ideal end state of this system 12 months from now. Does this plan m
```
### 0D. Mode-Specific Analysis
**For SCOPE EXPANSION** — run all three:
**For SCOPE EXPANSION** — run all three, then the opt-in ceremony:
1. 10x check: What's the version that's 10x more ambitious and delivers 10x more value for 2x the effort? Describe it concretely.
2. Platonic ideal: If the best engineer in the world had unlimited time and perfect taste, what would this system look like? What would the user feel when using it? Start from experience, not architecture.
3. Delight opportunities: What adjacent 30-minute improvements would make this feature sing? Things where a user would think "oh nice, they thought of that." List at least 3.
3. Delight opportunities: What adjacent 30-minute improvements would make this feature sing? Things where a user would think "oh nice, they thought of that." List at least 5.
4. **Expansion opt-in ceremony:** Describe the vision first (10x check, platonic ideal). Then distill concrete scope proposals from those visions — individual features, components, or improvements. Present each proposal as its own AskUserQuestion. Recommend enthusiastically — explain why it's worth doing. But the user decides. Options: **A)** Add to this plan's scope **B)** Defer to TODOS.md **C)** Skip. Accepted items become plan scope for all remaining review sections. Rejected items go to "NOT in scope."
**For SELECTIVE EXPANSION** — run the HOLD SCOPE analysis first, then surface expansions:
1. Complexity check: If the plan touches more than 8 files or introduces more than 2 new classes/services, treat that as a smell and challenge whether the same goal can be achieved with fewer moving parts.
2. What is the minimum set of changes that achieves the stated goal? Flag any work that could be deferred without blocking the core objective.
3. Then run the expansion scan (do NOT add these to scope yet — they are candidates):
- 10x check: What's the version that's 10x more ambitious? Describe it concretely.
- Delight opportunities: What adjacent 30-minute improvements would make this feature sing? List at least 5.
- Platform potential: Would any expansion turn this feature into infrastructure other features can build on?
4. **Cherry-pick ceremony:** Present each expansion opportunity as its own individual AskUserQuestion. Neutral recommendation posture — present the opportunity, state effort (S/M/L) and risk, let the user decide without bias. Options: **A)** Add to this plan's scope **B)** Defer to TODOS.md **C)** Skip. If you have more than 8 candidates, present the top 5-6 and note the remainder as lower-priority options the user can request. Accepted items become plan scope for all remaining review sections. Rejected items go to "NOT in scope."
**For HOLD SCOPE** — run this:
1. Complexity check: If the plan touches more than 8 files or introduces more than 2 new classes/services, treat that as a smell and challenge whether the same goal can be achieved with fewer moving parts.
@@ -118,7 +129,57 @@ Describe the ideal end state of this system 12 months from now. Does this plan m
1. Ruthless cut: What is the absolute minimum that ships value to a user? Everything else is deferred. No exceptions.
2. What can be a follow-up PR? Separate "must ship together" from "nice to ship together."
### 0E. Temporal Interrogation (EXPANSION and HOLD modes)
### 0D-POST. Persist CEO Plan (EXPANSION and SELECTIVE EXPANSION only)
After the opt-in/cherry-pick ceremony, write the plan to disk so the vision and decisions survive beyond this conversation. Only run this step for EXPANSION and SELECTIVE EXPANSION modes.
```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG/ceo-plans
```
Before writing, check for existing CEO plans in the ceo-plans/ directory. If any are >30 days old or their branch has been merged/deleted, offer to archive them:
```bash
mkdir -p ~/.gstack/projects/$SLUG/ceo-plans/archive
# For each stale plan: mv ~/.gstack/projects/$SLUG/ceo-plans/{old-plan}.md ~/.gstack/projects/$SLUG/ceo-plans/archive/
```
Write to `~/.gstack/projects/$SLUG/ceo-plans/{date}-{feature-slug}.md` using this format:
```markdown
---
status: ACTIVE
---
# CEO Plan: {Feature Name}
Generated by /plan-ceo-review on {date}
Branch: {branch} | Mode: {EXPANSION / SELECTIVE EXPANSION}
Repo: {owner/repo}
## Vision
### 10x Check
{10x vision description}
### Platonic Ideal
{platonic ideal description — EXPANSION mode only}
## Scope Decisions
| # | Proposal | Effort | Decision | Reasoning |
|---|----------|--------|----------|-----------|
| 1 | {proposal} | S/M/L | ACCEPTED / DEFERRED / SKIPPED | {why} |
## Accepted Scope (added to this plan)
- {bullet list of what's now in scope}
## Deferred to TODOS.md
- {items with context}
```
Derive the feature slug from the plan being reviewed (e.g., "user-dashboard", "auth-refactor"). Use the date in YYYY-MM-DD format.
### 0E. Temporal Interrogation (EXPANSION, SELECTIVE EXPANSION, and HOLD modes)
Think ahead to implementation: What decisions will need to be made during implementation that should be resolved NOW in the plan?
```
HOUR 1 (foundations): What does the implementer need to know?
@@ -129,17 +190,22 @@ Think ahead to implementation: What decisions will need to be made during implem
Surface these as questions for the user NOW, not as "figure it out later."
### 0F. Mode Selection
Present three options:
1. **SCOPE EXPANSION:** The plan is good but could be great. Propose the ambitious version, then review that. Push scope up. Build the cathedral.
2. **HOLD SCOPE:** The plan's scope is right. Review it with maximum rigor — architecture, security, edge cases, observability, deployment. Make it bulletproof.
3. **SCOPE REDUCTION:** The plan is overbuilt or wrong-headed. Propose a minimal version that achieves the core goal, then review that.
In every mode, you are 100% in control. No scope is added without your explicit approval.
Present four options:
1. **SCOPE EXPANSION:** The plan is good but could be great. Dream big — propose the ambitious version. Every expansion is presented individually for your approval. You opt in to each one.
2. **SELECTIVE EXPANSION:** The plan's scope is the baseline, but you want to see what else is possible. Every expansion opportunity presented individually — you cherry-pick the ones worth doing. Neutral recommendations.
3. **HOLD SCOPE:** The plan's scope is right. Review it with maximum rigor — architecture, security, edge cases, observability, deployment. Make it bulletproof. No expansions surfaced.
4. **SCOPE REDUCTION:** The plan is overbuilt or wrong-headed. Propose a minimal version that achieves the core goal, then review that.
Context-dependent defaults:
* Greenfield feature → default EXPANSION
* Feature enhancement or iteration on existing system → default SELECTIVE EXPANSION
* Bug fix or hotfix → default HOLD SCOPE
* Refactor → default HOLD SCOPE
* Plan touching >15 files → suggest REDUCTION unless user pushes back
* User says "go big" / "ambitious" / "cathedral" → EXPANSION, no question
* User says "hold scope but tempt me" / "show me options" / "cherry-pick" → SELECTIVE EXPANSION, no question
Once selected, commit fully. Do not silently drift.
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
@@ -162,10 +228,12 @@ Evaluate and diagram:
* Production failure scenarios. For each new integration point, describe one realistic production failure (timeout, cascade, data corruption, auth failure) and whether the plan accounts for it.
* Rollback posture. If this ships and immediately breaks, what's the rollback procedure? Git revert? Feature flag? DB migration rollback? How long?
**EXPANSION mode additions:**
**EXPANSION and SELECTIVE EXPANSION additions:**
* What would make this architecture beautiful? Not just correct — elegant. Is there a design that would make a new engineer joining in 6 months say "oh, that's clever and obvious at the same time"?
* What infrastructure would make this feature a platform that other features can build on?
**SELECTIVE EXPANSION:** If any accepted cherry-picks from Step 0D affect the architecture, evaluate their architectural fit here. Flag any that create coupling concerns or don't integrate cleanly — this is a chance to revisit the decision with new information.
Required ASCII diagram: full system architecture showing new components and their relationships to existing ones.
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
@@ -324,8 +392,8 @@ Evaluate:
* Admin tooling. New operational tasks that need admin UI or rake tasks?
* Runbooks. For each new failure mode: what's the operational response?
**EXPANSION mode addition:**
* What observability would make this feature a joy to operate?
**EXPANSION and SELECTIVE EXPANSION addition:**
* What observability would make this feature a joy to operate? (For SELECTIVE EXPANSION, include observability for any accepted cherry-picks.)
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
### Section 9: Deployment & Rollout Review
@@ -339,8 +407,8 @@ Evaluate:
* Post-deploy verification checklist. First 5 minutes? First hour?
* Smoke tests. What automated checks should run immediately post-deploy?
**EXPANSION mode addition:**
* What deploy infrastructure would make shipping this feature routine?
**EXPANSION and SELECTIVE EXPANSION addition:**
* What deploy infrastructure would make shipping this feature routine? (For SELECTIVE EXPANSION, assess whether accepted cherry-picks change the deployment risk profile.)
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
### Section 10: Long-Term Trajectory Review
@@ -352,9 +420,10 @@ Evaluate:
* Ecosystem fit. Aligns with Rails/JS ecosystem direction?
* The 1-year question. Read this plan as a new engineer in 12 months — obvious?
**EXPANSION mode additions:**
**EXPANSION and SELECTIVE EXPANSION additions:**
* What comes after this ships? Phase 2? Phase 3? Does the architecture support that trajectory?
* Platform potential. Does this create capabilities other features can leverage?
* (SELECTIVE EXPANSION only) Retrospective: Were the right cherry-picks accepted? Did any rejected expansions turn out to be load-bearing for the accepted ones?
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
## CRITICAL RULE — How to ask questions
@@ -403,8 +472,11 @@ For each TODO, describe:
Then present options: **A)** Add to TODOS.md **B)** Skip — not valuable enough **C)** Build it now in this PR instead of deferring.
### Delight Opportunities (EXPANSION mode only)
Identify at least 5 "bonus chunk" opportunities (<30 min each) that would make users think "oh nice, they thought of that." Present each delight opportunity as its own individual AskUserQuestion. Never batch them. For each one, describe what it is, why it would delight users, and effort estimate. Then present options: **A)** Add to TODOS.md as a vision item **B)** Skip **C)** Build it now in this PR.
### Scope Expansion Decisions (EXPANSION and SELECTIVE EXPANSION only)
For EXPANSION and SELECTIVE EXPANSION modes: expansion opportunities and delight items were surfaced and decided in Step 0D (opt-in/cherry-pick ceremony). The decisions are persisted in the CEO plan document. Reference the CEO plan for the full record. Do not re-surface them here — list the accepted expansions for completeness:
* Accepted: {list items added to scope}
* Deferred: {list items sent to TODOS.md}
* Skipped: {list items rejected}
### Diagrams (mandatory, produce all that apply)
1. System architecture
@@ -422,7 +494,7 @@ List every ASCII diagram in files this plan touches. Still accurate?
+====================================================================+
| MEGA PLAN REVIEW — COMPLETION SUMMARY |
+====================================================================+
| Mode selected | EXPANSION / HOLD / REDUCTION |
| Mode selected | EXPANSION / SELECTIVE / HOLD / REDUCTION |
| System Audit | [key findings] |
| Step 0 | [mode + key decisions] |
| Section 1 (Arch) | ___ issues found |
@@ -442,7 +514,8 @@ List every ASCII diagram in files this plan touches. Still accurate?
| Error/rescue registry| ___ methods, ___ CRITICAL GAPS |
| Failure modes | ___ total, ___ CRITICAL GAPS |
| TODOS.md updates | ___ items proposed |
| Delight opportunities| ___ identified (EXPANSION only) |
| Scope proposals | ___ proposed, ___ accepted (EXP + SEL) |
| CEO plan | written / skipped (HOLD/REDUCTION) |
| Diagrams produced | ___ (list types) |
| Stale diagrams found | ___ |
| Unresolved decisions | ___ (listed below) |
@@ -467,10 +540,21 @@ Before running this command, substitute the placeholder values from the Completi
- **STATUS**: "clean" if 0 unresolved decisions AND 0 critical gaps; otherwise "issues_open"
- **unresolved**: number from "Unresolved decisions" in the summary
- **critical_gaps**: number from "Failure modes: ___ CRITICAL GAPS" in the summary
- **MODE**: the mode the user selected (SCOPE_EXPANSION / HOLD_SCOPE / SCOPE_REDUCTION)
- **MODE**: the mode the user selected (SCOPE_EXPANSION / SELECTIVE_EXPANSION / HOLD_SCOPE / SCOPE_REDUCTION)
{{REVIEW_DASHBOARD}}
## docs/designs Promotion (EXPANSION and SELECTIVE EXPANSION only)
At the end of the review, if the vision produced a compelling feature direction, offer to promote the CEO plan to the project repo. AskUserQuestion:
"The vision from this review produced {N} accepted scope expansions. Want to promote it to a design doc in the repo?"
- **A)** Promote to `docs/designs/{FEATURE}.md` (committed to repo, visible to the team)
- **B)** Keep in `~/.gstack/projects/` only (local, personal reference)
- **C)** Skip
If promoted, copy the CEO plan content to `docs/designs/{FEATURE}.md` (create the directory if needed) and update the `status` field in the original CEO plan from `ACTIVE` to `PROMOTED`.
## Formatting Rules
* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
* Label with NUMBER + LETTER (e.g., "3A", "3B").
@@ -480,30 +564,37 @@ Before running this command, substitute the placeholder values from the Completi
## Mode Quick Reference
```
┌─────────────────────────────────────────────────────────────────┐
│ MODE COMPARISON │
├─────────────┬──────────────┬──────────────┬────────────────────┤
│ │ EXPANSION │ HOLD SCOPE │ REDUCTION │
├─────────────┼──────────────┼──────────────┼────────────────────┤
│ Scope │ Push UP │ Maintain │ Push DOWN │
10x check │ Mandatory │ OptionalSkip
Platonic │ Yes │ No │ No
ideal │ │ │ │
Delight │ 5+ items │ Note if seen │ Skip │
opps │ │ │ │
Complexity │ "Is it big │ "Is it too │ "Is it the bare
question │ enough?" │ complex?"minimum?"
Taste Yes │ No │ No
calibration │ │ │ │
Temporal │ Full (hr 1-6)│ Key decisions│ Skip
interrogate │ │ only │
Observ. │ "Joy to │ "Can we │ "Can we see if
standard │ operate" │ debug it?" │ it's broken?"
Deploy │ Infra as │ Safe deploySimplest possible
standard │ feature scope│ + rollback │ deploy
Error map │ Full + chaos │ Full │ Critical paths
│ scenarios │ only
Phase 2/3 │ Map it │ Note it │ Skip
planning │ │
└─────────────┴──────────────┴──────────────┴────────────────────┘
┌────────────────────────────────────────────────────────────────────────────────
MODE COMPARISON
├─────────────┬──────────────┬──────────────┬──────────────┬────────────────────┤
│ │ EXPANSION │ SELECTIVE │ HOLD SCOPE │ REDUCTION │
├─────────────┼──────────────┼──────────────┼──────────────┼────────────────────┤
│ Scope │ Push UP │ Hold + offer │ Maintain │ Push DOWN │
│ (opt-in) │ │
Recommend │ Enthusiastic │ Neutral │ N/A │ N/A
posture │ │ │ │ │
10x check │ Mandatory │ Surface as │ Optional │ Skip │
│ │ cherry-pick │ │ │
Platonic │ Yes │ No │ No │ No
ideal │ │ │
DelightOpt-in │ Cherry-pick │ Note if seen │ Skip
opps │ ceremony │ ceremony │ │ │
Complexity │ "Is it big │ "Is it right │ "Is it too │ "Is it the bare
question enough?" + what else │ complex?" │ minimum?"
│ │ is tempting"│ │
Taste │ Yes │ Yes │ No │ No
calibration │ │ │
Temporal │ Full (hr 1-6)│ Full (hr 1-6)│ Key decisions│ Skip
interrogate │ │ │ only │
Observ. │ "Joy to │ "Joy to │ "Can we"Can we see if
standard │ operate" operate" │ debug it?" it's broken?"
Deploy │ Infra as │ Safe deploy │ Safe deploy │ Simplest possible
│ standard │ feature scope│ + cherry-pick│ + rollback │ deploy │
│ │ │ risk check │ │ │
│ Error map │ Full + chaos │ Full + chaos │ Full │ Critical paths │
│ │ scenarios │ for accepted │ │ only │
│ CEO plan │ Written │ Written │ Skipped │ Skipped │
│ Phase 2/3 │ Map accepted │ Map accepted │ Note it │ Skip │
│ planning │ │ cherry-picks │ │ │
└─────────────┴──────────────┴──────────────┴──────────────┴────────────────────┘
```
+18 -10
View File
@@ -576,11 +576,13 @@ Substitute values from the report:
## Review Readiness Dashboard
After completing the review, read the review log to display the dashboard.
After completing the review, read the review log and config to display the dashboard.
```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
cat ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_REVIEWS"
echo "---CONFIG---"
~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false"
```
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. Display:
@@ -589,17 +591,23 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
+====================================================================+
| REVIEW READINESS DASHBOARD |
+====================================================================+
| Review | Runs | Last Run | Status |
|-----------------|------|---------------------|----------------------|
| CEO Review | 1 | 2026-03-16 14:30 | CLEAR |
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR |
| Design Review | 0 | — | NOT YET RUN |
| Review | Runs | Last Run | Status | Required |
|-----------------|------|---------------------|-----------|----------|
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES |
| CEO Review | 0 | | | no |
| Design Review | 0 | — | — | no |
+--------------------------------------------------------------------+
| VERDICT: 2/3 CLEAR — Design Review not yet run |
| VERDICT: CLEAREDEng Review passed |
+====================================================================+
```
**Review tiers:**
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
**Verdict logic:**
- **CLEARED TO SHIP (3/3)**: All three have >= 1 entry within 7 days AND most recent status is "clean"
- **N/3 CLEAR**: Show count and list which are missing, have open issues, or are stale (>7 days)
- Informational only — does NOT block.
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
- **NOT CLEARED**: Eng Review missing, stale (>7 days), or has open issues
- CEO and Design reviews are shown for context but never block shipping
- If \`skip_eng_review\` config is \`true\`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED
+18 -10
View File
@@ -277,11 +277,13 @@ Substitute values from the Completion Summary:
## Review Readiness Dashboard
After completing the review, read the review log to display the dashboard.
After completing the review, read the review log and config to display the dashboard.
```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
cat ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_REVIEWS"
echo "---CONFIG---"
~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false"
```
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. Display:
@@ -290,20 +292,26 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
+====================================================================+
| REVIEW READINESS DASHBOARD |
+====================================================================+
| Review | Runs | Last Run | Status |
|-----------------|------|---------------------|----------------------|
| CEO Review | 1 | 2026-03-16 14:30 | CLEAR |
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR |
| Design Review | 0 | — | NOT YET RUN |
| Review | Runs | Last Run | Status | Required |
|-----------------|------|---------------------|-----------|----------|
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES |
| CEO Review | 0 | | | no |
| Design Review | 0 | — | — | no |
+--------------------------------------------------------------------+
| VERDICT: 2/3 CLEAR — Design Review not yet run |
| VERDICT: CLEAREDEng Review passed |
+====================================================================+
```
**Review tiers:**
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
**Verdict logic:**
- **CLEARED TO SHIP (3/3)**: All three have >= 1 entry within 7 days AND most recent status is "clean"
- **N/3 CLEAR**: Show count and list which are missing, have open issues, or are stale (>7 days)
- Informational only — does NOT block.
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
- **NOT CLEARED**: Eng Review missing, stale (>7 days), or has open issues
- CEO and Design reviews are shown for context but never block shipping
- If \`skip_eng_review\` config is \`true\`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED
## Unresolved decisions
If the user does not respond to an AskUserQuestion or interrupts to move on, note which decisions were left unresolved. At the end of the review, list these as "Unresolved decisions that may bite you later" — never silently default to an option.
+18 -10
View File
@@ -817,11 +817,13 @@ Tie everything to user goals and product objectives. Always suggest specific imp
function generateReviewDashboard(): string {
return `## Review Readiness Dashboard
After completing the review, read the review log to display the dashboard.
After completing the review, read the review log and config to display the dashboard.
\`\`\`bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
cat ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_REVIEWS"
echo "---CONFIG---"
~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false"
\`\`\`
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. Display:
@@ -830,20 +832,26 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
+====================================================================+
| REVIEW READINESS DASHBOARD |
+====================================================================+
| Review | Runs | Last Run | Status |
|-----------------|------|---------------------|----------------------|
| CEO Review | 1 | 2026-03-16 14:30 | CLEAR |
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR |
| Design Review | 0 | | NOT YET RUN |
| Review | Runs | Last Run | Status | Required |
|-----------------|------|---------------------|-----------|----------|
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES |
| CEO Review | 0 | | | no |
| Design Review | 0 | | | no |
+--------------------------------------------------------------------+
| VERDICT: 2/3 CLEAR Design Review not yet run |
| VERDICT: CLEARED Eng Review passed |
+====================================================================+
\`\`\`
**Review tiers:**
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \\\`gstack-config set skip_eng_review true\\\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
**Verdict logic:**
- **CLEARED TO SHIP (3/3)**: All three have >= 1 entry within 7 days AND most recent status is "clean"
- **N/3 CLEAR**: Show count and list which are missing, have open issues, or are stale (>7 days)
- Informational only does NOT block.`;
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \\\`skip_eng_review\\\` is \\\`true\\\`)
- **NOT CLEARED**: Eng Review missing, stale (>7 days), or has open issues
- CEO and Design reviews are shown for context but never block shipping
- If \\\`skip_eng_review\\\` config is \\\`true\\\`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED`;
}
function generateTestBootstrap(): string {
+24 -15
View File
@@ -138,11 +138,13 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
## Review Readiness Dashboard
After completing the review, read the review log to display the dashboard.
After completing the review, read the review log and config to display the dashboard.
```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
cat ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_REVIEWS"
echo "---CONFIG---"
~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false"
```
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. Display:
@@ -151,25 +153,32 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
+====================================================================+
| REVIEW READINESS DASHBOARD |
+====================================================================+
| Review | Runs | Last Run | Status |
|-----------------|------|---------------------|----------------------|
| CEO Review | 1 | 2026-03-16 14:30 | CLEAR |
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR |
| Design Review | 0 | — | NOT YET RUN |
| Review | Runs | Last Run | Status | Required |
|-----------------|------|---------------------|-----------|----------|
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES |
| CEO Review | 0 | | | no |
| Design Review | 0 | — | — | no |
+--------------------------------------------------------------------+
| VERDICT: 2/3 CLEAR — Design Review not yet run |
| VERDICT: CLEAREDEng Review passed |
+====================================================================+
```
**Verdict logic:**
- **CLEARED TO SHIP (3/3)**: All three have >= 1 entry within 7 days AND most recent status is "clean"
- **N/3 CLEAR**: Show count and list which are missing, have open issues, or are stale (>7 days)
- Informational only — does NOT block.
**Review tiers:**
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
If the verdict is NOT "CLEARED TO SHIP (3/3)", use AskUserQuestion:
- Show which reviews are missing or have open issues
- RECOMMENDATION: Choose B (run missing reviews first) unless the change is trivial
- Options: A) Ship anyway B) Abort — run missing review(s) first C) Reviews not relevant for this change
**Verdict logic:**
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
- **NOT CLEARED**: Eng Review missing, stale (>7 days), or has open issues
- CEO and Design reviews are shown for context but never block shipping
- If \`skip_eng_review\` config is \`true\`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED
If the verdict is NOT "CLEARED", use AskUserQuestion:
- Show that Eng Review is missing or has open issues
- RECOMMENDATION: Choose B (run eng review first) unless the change is obviously trivial (<20 lines, typo fix, config-only)
- Options: A) Ship anyway B) Abort — run /plan-eng-review first C) Change is too small to need eng review
- If CEO/Design reviews are missing, mention them as informational ("CEO Review not run — recommended for product changes") but do NOT block or recommend aborting for them
---
+5 -4
View File
@@ -56,10 +56,11 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
{{REVIEW_DASHBOARD}}
If the verdict is NOT "CLEARED TO SHIP (3/3)", use AskUserQuestion:
- Show which reviews are missing or have open issues
- RECOMMENDATION: Choose B (run missing reviews first) unless the change is trivial
- Options: A) Ship anyway B) Abort — run missing review(s) first C) Reviews not relevant for this change
If the verdict is NOT "CLEARED", use AskUserQuestion:
- Show that Eng Review is missing or has open issues
- RECOMMENDATION: Choose B (run eng review first) unless the change is obviously trivial (<20 lines, typo fix, config-only)
- Options: A) Ship anyway B) Abort — run /plan-eng-review first C) Change is too small to need eng review
- If CEO/Design reviews are missing, mention them as informational ("CEO Review not run — recommended for product changes") but do NOT block or recommend aborting for them
---
+3 -2
View File
@@ -343,9 +343,10 @@ describe('REVIEW_DASHBOARD resolver', () => {
test('resolver output contains key dashboard elements', () => {
const content = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
expect(content).toContain('VERDICT');
expect(content).toContain('CLEARED TO SHIP');
expect(content).toContain('NOT YET RUN');
expect(content).toContain('CLEARED');
expect(content).toContain('Eng Review');
expect(content).toContain('7 days');
expect(content).toContain('Design Review');
expect(content).toContain('skip_eng_review');
});
});
+83
View File
@@ -852,6 +852,89 @@ Focus on reviewing the plan content: architecture, error handling, security, and
}, 420_000);
});
// --- Plan CEO Review (SELECTIVE EXPANSION) E2E ---
describeE2E('Plan CEO Review SELECTIVE EXPANSION E2E', () => {
let planDir: string;
beforeAll(() => {
planDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-plan-ceo-sel-'));
const { spawnSync } = require('child_process');
const run = (cmd: string, args: string[]) =>
spawnSync(cmd, args, { cwd: planDir, stdio: 'pipe', timeout: 5000 });
run('git', ['init']);
run('git', ['config', 'user.email', 'test@test.com']);
run('git', ['config', 'user.name', 'Test']);
fs.writeFileSync(path.join(planDir, 'plan.md'), `# Plan: Add User Dashboard
## Context
We're building a new user dashboard that shows recent activity, notifications, and quick actions.
## Changes
1. New React component \`UserDashboard\` in \`src/components/\`
2. REST API endpoint \`GET /api/dashboard\` returning user stats
3. PostgreSQL query for activity aggregation
4. Redis cache layer for dashboard data (5min TTL)
## Architecture
- Frontend: React + TailwindCSS
- Backend: Express.js REST API
- Database: PostgreSQL with existing user/activity tables
- Cache: Redis for dashboard aggregates
## Open questions
- Should we use WebSocket for real-time updates?
- How do we handle users with 100k+ activity records?
`);
run('git', ['add', '.']);
run('git', ['commit', '-m', 'add plan']);
fs.mkdirSync(path.join(planDir, 'plan-ceo-review'), { recursive: true });
fs.copyFileSync(
path.join(ROOT, 'plan-ceo-review', 'SKILL.md'),
path.join(planDir, 'plan-ceo-review', 'SKILL.md'),
);
});
afterAll(() => {
try { fs.rmSync(planDir, { recursive: true, force: true }); } catch {}
});
test('/plan-ceo-review SELECTIVE EXPANSION produces structured review output', async () => {
const result = await runSkillTest({
prompt: `Read plan-ceo-review/SKILL.md for the review workflow.
Read plan.md that's the plan to review. This is a standalone plan document, not a codebase skip any codebase exploration or system audit steps.
Choose SELECTIVE EXPANSION mode. Skip any AskUserQuestion calls this is non-interactive.
For the cherry-pick ceremony, accept all expansion proposals automatically.
Write your complete review directly to ${planDir}/review-output-selective.md
Focus on reviewing the plan content: architecture, error handling, security, and performance.`,
workingDirectory: planDir,
maxTurns: 15,
timeout: 360_000,
testName: 'plan-ceo-review-selective',
runId,
});
logCost('/plan-ceo-review (SELECTIVE)', result);
recordE2E('/plan-ceo-review-selective', 'Plan CEO Review SELECTIVE EXPANSION E2E', result, {
passed: ['success', 'error_max_turns'].includes(result.exitReason),
});
expect(['success', 'error_max_turns']).toContain(result.exitReason);
const reviewPath = path.join(planDir, 'review-output-selective.md');
if (fs.existsSync(reviewPath)) {
const review = fs.readFileSync(reviewPath, 'utf-8');
expect(review.length).toBeGreaterThan(200);
}
}, 420_000);
});
// --- Plan Eng Review E2E ---
describeE2E('Plan Eng Review E2E', () => {
+30
View File
@@ -666,6 +666,36 @@ describe('Planted-bug fixture validation', () => {
});
});
// --- CEO review mode validation ---
describe('CEO review mode validation', () => {
const content = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
test('has all four CEO review modes defined', () => {
const modes = ['SCOPE EXPANSION', 'SELECTIVE EXPANSION', 'HOLD SCOPE', 'SCOPE REDUCTION'];
for (const mode of modes) {
expect(content).toContain(mode);
}
});
test('has CEO plan persistence step', () => {
expect(content).toContain('ceo-plans');
expect(content).toContain('status: ACTIVE');
});
test('has docs/designs promotion section', () => {
expect(content).toContain('docs/designs');
expect(content).toContain('PROMOTED');
});
test('mode quick reference has four columns', () => {
expect(content).toContain('EXPANSION');
expect(content).toContain('SELECTIVE');
expect(content).toContain('HOLD SCOPE');
expect(content).toContain('REDUCTION');
});
});
// --- gstack-slug helper ---
describe('gstack-slug', () => {