feat: /design-consultation — risk-taking thesis, visual research, ambitious preview

Add SAFE/RISK breakdown to design proposals so users see which choices
match category conventions vs. which are deliberate creative departures.

Wire browse binary for visual competitive research — agent browses
competitor sites, takes screenshots, and analyzes fonts/colors/spacing
with graceful degradation to WebSearch-only or built-in knowledge.

Upgrade preview page instructions to render realistic product mockups
(dashboards, marketing pages, settings forms) instead of just swatches.

Rewrite README section with the thesis: "coherence is table stakes —
the real question is where you take risks."
This commit is contained in:
Garry Tan
2026-03-17 08:40:02 -07:00
parent 5f41cd9ad7
commit d6a324fba5
17 changed files with 398 additions and 66 deletions
+10 -3
View File
@@ -1,11 +1,18 @@
# Changelog
## 0.5.0.1 — 2026-03-17
### Fixed
## 0.5.1 — 2026-03-17
- **Know where you stand before you ship.** Every `/plan-ceo-review`, `/plan-eng-review`, and `/plan-design-review` now logs its result to a review tracker. At the end of each review, you see a **Review Readiness Dashboard** showing which reviews are done, when they ran, and whether they're clean — with a clear CLEARED TO SHIP or NOT READY verdict.
- **`/ship` checks your reviews before creating the PR.** Pre-flight now reads the dashboard and asks if you want to continue when reviews are missing. Informational only — it won't block you, but you'll know what you skipped.
- **One less thing to copy-paste.** The SLUG computation (that opaque sed pipeline for computing `owner-repo` from git remote) is now a shared `bin/gstack-slug` helper. All 14 inline copies across templates replaced with `eval $(gstack-slug)`. If the format ever changes, fix it once.
- **Screenshots are now visible during QA and browse sessions.** When gstack takes screenshots, they now show up as clickable image elements in your output — no more invisible `/tmp/browse-screenshot.png` paths you can't see. Works in `/qa`, `/qa-only`, `/plan-design-review`, `/qa-design-review`, `/browse`, and `/gstack`.
### For contributors
- Added `{{REVIEW_DASHBOARD}}` resolver to `gen-skill-docs.ts` — shared dashboard reader injected into 4 templates (3 review skills + ship).
- Added `bin/gstack-slug` helper (5-line bash) with unit tests. Outputs `SLUG=` and `BRANCH=` lines, sanitizes `/` to `-`.
- New TODOs: smart review relevance detection (P3), `/merge` skill for review-gated PR merge (P2).
## 0.5.0 — 2026-03-16
- **Your site just got a design review.** `/plan-design-review` opens your site and reviews it like a senior product designer — typography, spacing, hierarchy, color, responsive, interactions, and AI slop detection. Get letter grades (A-F) per category, a dual headline "Design Score" + "AI Slop Score", and a structured first impression that doesn't pull punches.
+43 -24
View File
@@ -20,7 +20,7 @@ Thirteen opinionated workflow skills for [Claude Code](https://docs.anthropic.co
| `/plan-ceo-review` | Founder / CEO | Rethink the problem. Find the 10-star product hiding inside the request. |
| `/plan-eng-review` | Eng manager / tech lead | Lock in architecture, data flow, diagrams, edge cases, and tests. |
| `/plan-design-review` | Senior product designer | Designer's eye audit. 80-item checklist, letter grades, AI Slop detection, DESIGN.md inference. Report only — never touches code. |
| `/design-consultation` | Design consultant | Build a complete design system from scratch. Researches competitors, proposes aesthetic + typography + color + spacing + motion, generates a preview page, and writes DESIGN.md. |
| `/design-consultation` | Design consultant | Build a complete design system from scratch. Browses competitors to get in the ballpark, proposes safe choices AND creative risks, generates realistic product mockups, and writes DESIGN.md. |
| `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Triages Greptile review comments. |
| `/ship` | Release engineer | Sync main, run tests, resolve Greptile reviews, push, open PR. For a ready branch, not for deciding what to build. |
| `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. |
@@ -344,11 +344,15 @@ This is my **design partner mode**.
`/plan-design-review` audits a site that already exists. `/design-consultation` is for when you have nothing yet — no design system, no font choices, no color palette. You are starting from zero and you want a senior designer to sit down with you and build the whole visual identity together.
It is a conversation, not a form. The agent asks about your product, your users, and your space. If you want, it researches what top competitors in your category are doing — fonts, colors, layout patterns, aesthetic direction. Then it proposes a complete, coherent design system: aesthetic direction, typography (3+ fonts with specific roles), color palette with hex values, spacing scale, layout approach, and motion strategy. Every recommendation comes with a rationale. Every choice reinforces every other choice.
It is a conversation, not a form. The agent asks about your product, your users, and your audience. It thinks about what your product needs to communicate — trust, speed, craft, warmth, whatever fits — and works backward from that to concrete choices. Then it proposes a complete, coherent design system: aesthetic direction, typography (3+ fonts with specific roles), color palette with hex values, spacing scale, layout approach, and motion strategy. Every recommendation comes with a rationale. Every choice reinforces every other choice.
The key insight: individual design decisions are easy to make but hard to make coherently. Picking a font is simple. Picking a font that works with your color palette, your spacing density, your aesthetic direction, and your product's personality — that is the actual job. `/design-consultation` solves the coherence problem by proposing everything as one package, then letting you adjust any piece while it checks that the rest still holds together.
But coherence is table stakes. Every dev tool dashboard looks the same — clean sans-serif, muted grays, a blue accent. They are all coherent. They are all forgettable. The difference between a product that looks "nice" and one that people actually recognize is the **deliberate creative risks**: the unexpected serif for headings, the bold accent nobody else in your category uses, the tighter spacing that makes your data feel authoritative instead of airy.
After you agree on the system, it generates an interactive HTML preview page — your fonts rendered at real sizes, your color palette as actual UI components (buttons, cards, form inputs, alerts), light/dark mode toggle, all using your product name instead of lorem ipsum. You open the page in your browser and see the design system rendered before a single line of product code is written.
That is what `/design-consultation` is really about. It does not just propose a safe system. It proposes safe choices AND risks — and tells you which is which. "Here are the choices that keep you literate in your category. And here is where I think you should break from convention, and why." You pick which risks to take. The agent checks that the whole system still coheres either way.
If you want, the agent will browse your competitors' actual sites — take screenshots, analyze their fonts and colors and spacing — so you can see the landscape before you make choices. This is not about copying. It is about getting in the ballpark so you know what the conventions are, and then deciding which ones are worth breaking.
After you agree on the system, it generates an interactive HTML preview page — not just swatches and font samples, but realistic product pages. If you are building a dashboard, you see a dashboard with a sidebar, data tables, and stat cards. If you are building a marketing site, you see a hero section with real copy and a CTA. Everything rendered in your design system, with your product name, in light and dark mode. You see what your product could feel like before a single line of production code is written.
Then it writes `DESIGN.md` to your repo root — your project's design source of truth — and updates `CLAUDE.md` so every future Claude Code session respects the system. From that point on, `/qa-design-review` can audit against it, and any agent working on your frontend knows the rules.
@@ -356,44 +360,59 @@ Then it writes `DESIGN.md` to your repo root — your project's design source of
You: /design-consultation
Claude: From the README, this looks like a developer analytics dashboard
for engineering teams. Sound right? And would you like me to
research what other dev tools are doing for design, or should
I work from what I know?
for engineering teams. Sound right? Tell me about the feeling
you want — and would you like me to browse some competitors
first so we can see the landscape?
You: Yes, it's a dashboard for eng managers. Research would be great.
You: Yes, it's for eng managers. Show me what's out there.
Claude: I looked at Linear, Vercel, Railway, Planetscale, and Posthog.
They converge on clean sans-serifs and muted palettes, but
Linear stands out with its dark-first approach and tight spacing.
The gap: most of these feel interchangeable. Here's my proposal:
Claude: [browses Linear, Vercel, Datadog, Posthog — takes screenshots]
AESTHETIC: Industrial/Utilitarian — data-first, no decoration tax
Here's the landscape. They all converge on clean sans-serifs
and muted blue-gray palettes. Linear stands out with dark-first
and tight spacing. The rest feel interchangeable.
Here's my proposal — and where I'd play it safe vs. take risks:
AESTHETIC: Industrial/Utilitarian — data-first, every pixel earns its place
TYPOGRAPHY: Geist (body + UI), Instrument Serif (display), JetBrains Mono (data)
COLOR: Restrained — slate neutrals, single teal accent (#0D9488)
SPACING: 4px base, compact density
LAYOUT: Grid-disciplined, 12-column
MOTION: Minimal-functional — only transitions that aid comprehension
This coheres because the utilitarian aesthetic demands a tight
spacing grid and restrained color. The serif display font creates
hierarchy contrast without adding visual noise. The monospace for
data tables feels native to the engineering audience.
SAFE CHOICES (match category expectations):
Geist for body — your users already know this font from Vercel
Slate neutrals — warm grays feel right for data-heavy UIs
Grid-disciplined layout — dashboards need predictable alignment
Want to adjust anything, or should I generate the preview page?
RISKS (where you'd stand out):
Instrument Serif for display headings — nobody in dev tools
uses a serif. It creates instant hierarchy contrast and
signals "we care about craft" in a space full of sans-serifs.
Single teal accent instead of blue — every competitor uses blue.
Teal is close enough to feel trustworthy, different enough
to be recognizable.
4px compact density — tighter than Linear, says "we respect
your screen real estate."
You: Generate it.
The safe choices keep you literate. The risks are where your
product gets its own face. Which risks appeal to you?
You: Love the serif risk. Let's see it.
Claude: Preview opened in your browser.
[/tmp/design-consultation-preview-1710000000.html]
The page shows Instrument Serif at hero scale, Geist in body
paragraphs, JetBrains Mono in a sample data table. Color swatches
with your teal accent against slate grays. Buttons, cards, and
alerts rendered in both light and dark mode.
The page shows a realistic dashboard: sidebar nav, header with
avatar, stat cards, data table — all in your design system.
Instrument Serif at hero scale, Geist in body text, JetBrains
Mono in the data table. Teal accent on buttons and active states.
Light and dark mode toggle in the corner.
How does it feel? Want to adjust anything before I write DESIGN.md?
You: Looks great. Ship it.
You: Ship it.
Claude: Wrote DESIGN.md (typography, color, spacing, layout, motion).
Updated CLAUDE.md with design system guidelines.
+26
View File
@@ -408,6 +408,32 @@ Shipped as `/design-consultation` on garrytan/design branch. Renamed from `/setu
**Priority:** P2
**Depends on:** None
## Ship Confidence Dashboard
### Smart review relevance detection
**What:** Auto-detect which of the 4 reviews are relevant based on branch changes (skip Design Review if no CSS/view changes, skip Code Review if plan-only).
**Why:** Currently dashboard always shows 4 rows. On docs-only changes, "Design Review: NOT YET RUN" is noise.
**Context:** /plan-design-review and /qa already do file-type detection in diff-aware mode. Could reuse that heuristic. Would require a `gstack-diff-scope` helper or enriching `gstack-slug` to also output change categories.
**Effort:** M
**Priority:** P3
**Depends on:** Ship Confidence Dashboard (shipped)
### /merge skill — review-gated PR merge
**What:** Create a `/merge` skill that merges an approved PR, but first checks the Review Readiness Dashboard and runs `/review` (Fix-First) if code review hasn't been done. Separates "ship" (create PR) from "merge" (land it).
**Why:** Currently `/review` runs inside `/ship` Step 3.5 but isn't tracked as a gate. A `/merge` skill ensures code review always happens before landing, and enables workflows where someone else reviews the PR first.
**Context:** `/ship` creates the PR. `/merge` would: check dashboard → run `/review` if needed → `gh pr merge`. This is where code review tracking belongs — at merge time, not at plan time.
**Effort:** M
**Priority:** P2
**Depends on:** Ship Confidence Dashboard (shipped)
## Completed
### Phase 1: Foundations (v0.2.0)
+1 -1
View File
@@ -1 +1 @@
0.5.0.1
0.5.1
+9
View File
@@ -0,0 +1,9 @@
#!/usr/bin/env bash
# gstack-slug — output project slug and sanitized branch name
# Usage: eval $(gstack-slug) → sets SLUG and BRANCH variables
# Or: gstack-slug → prints SLUG=... and BRANCH=... lines
set -euo pipefail
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-')
echo "SLUG=$SLUG"
echo "BRANCH=$BRANCH"
+76 -15
View File
@@ -114,7 +114,7 @@ ls src/ app/ pages/ components/ 2>/dev/null | head -30
Look for brainstorm output:
```bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
ls ~/.gstack/projects/$SLUG/*brainstorm* 2>/dev/null | head -5
ls .context/*brainstorm* .context/attachments/*brainstorm* 2>/dev/null | head -5
```
@@ -123,6 +123,29 @@ If brainstorm output exists, read it — the product context is pre-filled.
If the codebase is empty and purpose is unclear, say: *"I don't have a clear picture of what you're building yet. Want to brainstorm first with `/brainstorm`? Once we know the product direction, we can set up the design system."*
**Find the browse binary (optional — enables visual competitive research):**
## SETUP (run this check BEFORE any browse command)
```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
if [ -x "$B" ]; then
echo "READY: $B"
else
echo "NEEDS_SETUP"
fi
```
If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`
If browse is not available, that's fine — visual research is optional. The skill works without it using WebSearch and your built-in design knowledge.
---
## Phase 1: Product Context
@@ -143,17 +166,40 @@ If the README or brainstorm gives you enough context, pre-fill and confirm: *"Fr
If the user wants competitive research:
**Step 1: Identify competitors via WebSearch**
Use WebSearch to find 5-10 products in their space. Search for:
- "[product category] website design"
- "[product category] best websites 2025"
- "best [industry] web apps"
For each competitor found, note: fonts used, color palette, layout approach, aesthetic direction.
**Step 2: Visual research via browse (if available)**
Summarize your findings conversationally:
> "I looked at [competitors]. They tend toward [patterns] — lots of [common choices]. The opportunity to be distinctive is [gap]. Here's what I'd recommend based on this..."
If the browse binary is available (`$B` is set), visit the top 3-5 competitor sites and capture visual evidence:
If WebSearch is unavailable or returns poor results, fall back gracefully: *"Couldn't get good research results, so I'll work from my design knowledge of the [industry] space."*
```bash
$B goto "https://competitor-site.com"
$B screenshot "/tmp/design-research-competitor-name.png"
$B snapshot
```
For each competitor, analyze: fonts actually used, color palette, layout approach, spacing density, aesthetic direction. The screenshot gives you the feel; the snapshot gives you structural data.
If a competitor site blocks the headless browser or requires login, skip it and note why.
If browse is not available, rely on WebSearch results and your built-in design knowledge — this is fine.
**Step 3: Synthesize findings**
The goal of research is NOT to copy. It is to get in the ballpark — to understand the visual language users in this category already expect. This gives you the baseline. The interesting design work starts after you have the baseline: deciding where to follow conventions (so the product feels literate) and where to break from them (so the product is memorable).
Summarize conversationally:
> "I looked at [competitors]. Here's the landscape: they converge on [patterns]. Most of them feel [observation — e.g., interchangeable, polished but generic, etc.]. The opportunity to stand out is [gap]. Here's where I'd play it safe and where I'd take a risk..."
**Graceful degradation:**
- Browse available → screenshots + snapshots + WebSearch (richest research)
- Browse unavailable → WebSearch only (still good)
- WebSearch also unavailable → agent's built-in design knowledge (always works)
If the user said no research, skip entirely and proceed to Phase 3 using your built-in design knowledge.
@@ -163,7 +209,7 @@ If the user said no research, skip entirely and proceed to Phase 3 using your bu
This is the soul of the skill. Propose EVERYTHING as one coherent package.
**AskUserQuestion Q2 — present the full proposal:**
**AskUserQuestion Q2 — present the full proposal with SAFE/RISK breakdown:**
```
Based on [product context] and [research findings / my design knowledge]:
@@ -178,12 +224,21 @@ MOTION: [approach] — [rationale]
This system is coherent because [explain how choices reinforce each other].
Want to adjust anything? You can drill into any section, or just tell me
what feels off and I'll rework it. Or if this looks right, I'll generate
a preview page so you can see the fonts and colors rendered.
SAFE CHOICES (category baseline — your users expect these):
- [2-3 decisions that match category conventions, with rationale for playing safe]
RISKS (where your product gets its own face):
- [2-3 deliberate departures from convention]
- For each risk: what it is, why it works, what you gain, what it costs
The safe choices keep you literate in your category. The risks are where
your product becomes memorable. Which risks appeal to you? Want to see
different ones? Or adjust anything else?
```
**Options:** A) Looks great — generate the preview page. B) I want to adjust [section]. C) Start over with a different direction. D) Skip the preview, just write DESIGN.md.
The SAFE/RISK breakdown is critical. Design coherence is table stakes — every product in a category can be coherent and still look identical. The real question is: where do you take creative risks? The agent should always propose at least 2 risks, each with a clear rationale for why the risk is worth taking and what the user gives up. Risks might include: an unexpected typeface for the category, a bold accent color nobody else uses, tighter or looser spacing than the norm, a layout approach that breaks from convention, motion choices that add personality.
**Options:** A) Looks great — generate the preview page. B) I want to adjust [section]. C) I want different risks — show me wilder options. D) Start over with a different direction. E) Skip the preview, just write DESIGN.md.
### Your Design Knowledge (use to inform proposals — do NOT display as tables)
@@ -273,7 +328,7 @@ The agent writes a **single, self-contained HTML file** (no framework dependenci
1. **Loads proposed fonts** from Google Fonts (or Bunny Fonts) via `<link>` tags
2. **Uses the proposed color palette** throughout — dogfood the design system
3. **Shows the product name** (not "Lorem Ipsum") as the hero heading
4. **Font comparison section:**
4. **Font specimen section:**
- Each font candidate shown in its proposed role (hero heading, body paragraph, button label, data table row)
- Side-by-side comparison if multiple candidates for one role
- Real content that matches the product (e.g., civic tech → government data examples)
@@ -281,11 +336,17 @@ The agent writes a **single, self-contained HTML file** (no framework dependenci
- Swatches with hex values and names
- Sample UI components rendered in the palette: buttons (primary, secondary, ghost), cards, form inputs, alerts (success, warning, error, info)
- Background/text color combinations showing contrast
6. **Light/dark mode toggle** using CSS custom properties and a JS toggle button
7. **Clean, professional layout** — the preview page IS a taste signal for the skill
8. **Responsive** — looks good on any screen width
6. **Realistic product mockups** — this is what makes the preview page powerful. Based on the project type from Phase 1, render 2-3 realistic page layouts using the full design system:
- **Dashboard / web app:** sample data table with metrics, sidebar nav, header with user avatar, stat cards
- **Marketing site:** hero section with real copy, feature highlights, testimonial block, CTA
- **Settings / admin:** form with labeled inputs, toggle switches, dropdowns, save button
- **Auth / onboarding:** login form with social buttons, branding, input validation states
- Use the product name, realistic content for the domain, and the proposed spacing/layout/border-radius. The user should see their product (roughly) before writing any code.
7. **Light/dark mode toggle** using CSS custom properties and a JS toggle button
8. **Clean, professional layout** — the preview page IS a taste signal for the skill
9. **Responsive** — looks good on any screen width
The page should make the user think "oh nice, they thought of this." It's selling the design system visually, not just listing hex codes.
The page should make the user think "oh nice, they thought of this." It's selling the design system by showing what the product could feel like, not just listing hex codes and font names.
If `open` fails (headless environment), tell the user: *"I wrote the preview to [path] — open it in your browser to see the fonts and colors rendered."*
+59 -15
View File
@@ -49,7 +49,7 @@ ls src/ app/ pages/ components/ 2>/dev/null | head -30
Look for brainstorm output:
```bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
ls ~/.gstack/projects/$SLUG/*brainstorm* 2>/dev/null | head -5
ls .context/*brainstorm* .context/attachments/*brainstorm* 2>/dev/null | head -5
```
@@ -58,6 +58,12 @@ If brainstorm output exists, read it — the product context is pre-filled.
If the codebase is empty and purpose is unclear, say: *"I don't have a clear picture of what you're building yet. Want to brainstorm first with `/brainstorm`? Once we know the product direction, we can set up the design system."*
**Find the browse binary (optional — enables visual competitive research):**
{{BROWSE_SETUP}}
If browse is not available, that's fine — visual research is optional. The skill works without it using WebSearch and your built-in design knowledge.
---
## Phase 1: Product Context
@@ -78,17 +84,40 @@ If the README or brainstorm gives you enough context, pre-fill and confirm: *"Fr
If the user wants competitive research:
**Step 1: Identify competitors via WebSearch**
Use WebSearch to find 5-10 products in their space. Search for:
- "[product category] website design"
- "[product category] best websites 2025"
- "best [industry] web apps"
For each competitor found, note: fonts used, color palette, layout approach, aesthetic direction.
**Step 2: Visual research via browse (if available)**
Summarize your findings conversationally:
> "I looked at [competitors]. They tend toward [patterns] — lots of [common choices]. The opportunity to be distinctive is [gap]. Here's what I'd recommend based on this..."
If the browse binary is available (`$B` is set), visit the top 3-5 competitor sites and capture visual evidence:
If WebSearch is unavailable or returns poor results, fall back gracefully: *"Couldn't get good research results, so I'll work from my design knowledge of the [industry] space."*
```bash
$B goto "https://competitor-site.com"
$B screenshot "/tmp/design-research-competitor-name.png"
$B snapshot
```
For each competitor, analyze: fonts actually used, color palette, layout approach, spacing density, aesthetic direction. The screenshot gives you the feel; the snapshot gives you structural data.
If a competitor site blocks the headless browser or requires login, skip it and note why.
If browse is not available, rely on WebSearch results and your built-in design knowledge — this is fine.
**Step 3: Synthesize findings**
The goal of research is NOT to copy. It is to get in the ballpark — to understand the visual language users in this category already expect. This gives you the baseline. The interesting design work starts after you have the baseline: deciding where to follow conventions (so the product feels literate) and where to break from them (so the product is memorable).
Summarize conversationally:
> "I looked at [competitors]. Here's the landscape: they converge on [patterns]. Most of them feel [observation — e.g., interchangeable, polished but generic, etc.]. The opportunity to stand out is [gap]. Here's where I'd play it safe and where I'd take a risk..."
**Graceful degradation:**
- Browse available → screenshots + snapshots + WebSearch (richest research)
- Browse unavailable → WebSearch only (still good)
- WebSearch also unavailable → agent's built-in design knowledge (always works)
If the user said no research, skip entirely and proceed to Phase 3 using your built-in design knowledge.
@@ -98,7 +127,7 @@ If the user said no research, skip entirely and proceed to Phase 3 using your bu
This is the soul of the skill. Propose EVERYTHING as one coherent package.
**AskUserQuestion Q2 — present the full proposal:**
**AskUserQuestion Q2 — present the full proposal with SAFE/RISK breakdown:**
```
Based on [product context] and [research findings / my design knowledge]:
@@ -113,12 +142,21 @@ MOTION: [approach] — [rationale]
This system is coherent because [explain how choices reinforce each other].
Want to adjust anything? You can drill into any section, or just tell me
what feels off and I'll rework it. Or if this looks right, I'll generate
a preview page so you can see the fonts and colors rendered.
SAFE CHOICES (category baseline — your users expect these):
- [2-3 decisions that match category conventions, with rationale for playing safe]
RISKS (where your product gets its own face):
- [2-3 deliberate departures from convention]
- For each risk: what it is, why it works, what you gain, what it costs
The safe choices keep you literate in your category. The risks are where
your product becomes memorable. Which risks appeal to you? Want to see
different ones? Or adjust anything else?
```
**Options:** A) Looks great — generate the preview page. B) I want to adjust [section]. C) Start over with a different direction. D) Skip the preview, just write DESIGN.md.
The SAFE/RISK breakdown is critical. Design coherence is table stakes — every product in a category can be coherent and still look identical. The real question is: where do you take creative risks? The agent should always propose at least 2 risks, each with a clear rationale for why the risk is worth taking and what the user gives up. Risks might include: an unexpected typeface for the category, a bold accent color nobody else uses, tighter or looser spacing than the norm, a layout approach that breaks from convention, motion choices that add personality.
**Options:** A) Looks great — generate the preview page. B) I want to adjust [section]. C) I want different risks — show me wilder options. D) Start over with a different direction. E) Skip the preview, just write DESIGN.md.
### Your Design Knowledge (use to inform proposals — do NOT display as tables)
@@ -208,7 +246,7 @@ The agent writes a **single, self-contained HTML file** (no framework dependenci
1. **Loads proposed fonts** from Google Fonts (or Bunny Fonts) via `<link>` tags
2. **Uses the proposed color palette** throughout — dogfood the design system
3. **Shows the product name** (not "Lorem Ipsum") as the hero heading
4. **Font comparison section:**
4. **Font specimen section:**
- Each font candidate shown in its proposed role (hero heading, body paragraph, button label, data table row)
- Side-by-side comparison if multiple candidates for one role
- Real content that matches the product (e.g., civic tech → government data examples)
@@ -216,11 +254,17 @@ The agent writes a **single, self-contained HTML file** (no framework dependenci
- Swatches with hex values and names
- Sample UI components rendered in the palette: buttons (primary, secondary, ghost), cards, form inputs, alerts (success, warning, error, info)
- Background/text color combinations showing contrast
6. **Light/dark mode toggle** using CSS custom properties and a JS toggle button
7. **Clean, professional layout** — the preview page IS a taste signal for the skill
8. **Responsive** — looks good on any screen width
6. **Realistic product mockups** — this is what makes the preview page powerful. Based on the project type from Phase 1, render 2-3 realistic page layouts using the full design system:
- **Dashboard / web app:** sample data table with metrics, sidebar nav, header with user avatar, stat cards
- **Marketing site:** hero section with real copy, feature highlights, testimonial block, CTA
- **Settings / admin:** form with labeled inputs, toggle switches, dropdowns, save button
- **Auth / onboarding:** login form with social buttons, branding, input validation states
- Use the product name, realistic content for the domain, and the proposed spacing/layout/border-radius. The user should see their product (roughly) before writing any code.
7. **Light/dark mode toggle** using CSS custom properties and a JS toggle button
8. **Clean, professional layout** — the preview page IS a taste signal for the skill
9. **Responsive** — looks good on any screen width
The page should make the user think "oh nice, they thought of this." It's selling the design system visually, not just listing hex codes.
The page should make the user think "oh nice, they thought of this." It's selling the design system by showing what the product could feel like, not just listing hex codes and font names.
If `open` fails (headless environment), tell the user: *"I wrote the preview to [path] — open it in your browser to see the fonts and colors rendered."*
+19
View File
@@ -452,6 +452,25 @@ List every ASCII diagram in files this plan touches. Still accurate?
### Unresolved Decisions
If any AskUserQuestion goes unanswered, note it here. Never silently default.
## Review Log
After producing the Completion Summary above, persist the review result:
```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG
echo '{"skill":"plan-ceo-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"MODE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
```
Before running this command, substitute the placeholder values from the Completion Summary you just produced:
- **TIMESTAMP**: current ISO 8601 datetime (e.g., 2026-03-16T14:30:00)
- **STATUS**: "clean" if 0 unresolved decisions AND 0 critical gaps; otherwise "issues_open"
- **unresolved**: number from "Unresolved decisions" in the summary
- **critical_gaps**: number from "Failure modes: ___ CRITICAL GAPS" in the summary
- **MODE**: the mode the user selected (SCOPE_EXPANSION / HOLD_SCOPE / SCOPE_REDUCTION)
{{REVIEW_DASHBOARD}}
## Formatting Rules
* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
* Label with NUMBER + LETTER (e.g., "3A", "3B").
+18
View File
@@ -145,3 +145,21 @@ Project type: {web app / dashboard / marketing site / etc.}
11. **Never fix anything.** Find and document only. Do not read source code, edit files, or suggest code fixes. Your job is to report what could be better and suggest design improvements. Use `/qa-design-review` for the fix loop.
12. **The exception:** You MAY write a DESIGN.md file if the user accepts the offer. This is the only file you create.
## Review Log
After compiling the report, persist the review result:
```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG
echo '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","design_score":"GRADE","ai_slop_score":"GRADE","mode":"MODE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
```
Substitute values from the report:
- **TIMESTAMP**: current ISO 8601 datetime
- **STATUS**: "clean" if Design Score is A or B; "issues_open" if C, D, or F
- **GRADE**: the letter grade from the report (Design Score and AI Slop Score respectively)
- **MODE**: Full / Quick / Deep / Diff-aware / Regression
{{REVIEW_DASHBOARD}}
+20 -2
View File
@@ -89,8 +89,7 @@ For LLM/prompt changes: check the "Prompt/LLM changes" file patterns listed in C
After producing the test diagram, write a test plan artifact to the project directory so `/qa` and `/qa-only` can consume it as primary test input (replacing the lossy git-diff heuristic):
```bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
BRANCH=$(git rev-parse --abbrev-ref HEAD)
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
USER=$(whoami)
DATETIME=$(date +%Y%m%d-%H%M%S)
mkdir -p ~/.gstack/projects/$SLUG
@@ -194,5 +193,24 @@ Check the git log for this branch. If there are prior commits suggesting a previ
* One sentence max per option. Pick in under 5 seconds.
* After each review section, pause and ask for feedback before moving on.
## Review Log
After producing the Completion Summary above, persist the review result:
```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG
echo '{"skill":"plan-eng-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"MODE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
```
Substitute values from the Completion Summary:
- **TIMESTAMP**: current ISO 8601 datetime
- **STATUS**: "clean" if 0 unresolved decisions AND 0 critical gaps; otherwise "issues_open"
- **unresolved**: number from "Unresolved decisions" count
- **critical_gaps**: number from "Failure modes: ___ critical gaps flagged"
- **MODE**: SCOPE_REDUCTION / BIG_CHANGE / SMALL_CHANGE
{{REVIEW_DASHBOARD}}
## Unresolved decisions
If the user does not respond to an AskUserQuestion or interrupts to move on, note which decisions were left unresolved. At the end of the review, list these as "Unresolved decisions that may bite you later" — never silently default to an option.
+1 -1
View File
@@ -191,7 +191,7 @@ Write the report to both local and project-scoped locations:
**Project-scoped:**
```bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG
```
Write to `~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md`
+2 -2
View File
@@ -52,7 +52,7 @@ Before falling back to git diff heuristics, check for richer test plan sources:
1. **Project-scoped test plans:** Check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo
```bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1
```
2. **Conversation context:** Check if a prior `/plan-eng-review` or `/plan-ceo-review` produced test plan output in this conversation
@@ -72,7 +72,7 @@ Write the report to both local and project-scoped locations:
**Project-scoped:** Write test outcome artifact for cross-session context:
```bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG
```
Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md`
+2 -2
View File
@@ -72,7 +72,7 @@ Before falling back to git diff heuristics, check for richer test plan sources:
1. **Project-scoped test plans:** Check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo
```bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1
```
2. **Conversation context:** Check if a prior `/plan-eng-review` or `/plan-ceo-review` produced test plan output in this conversation
@@ -207,7 +207,7 @@ Write the report to both local and project-scoped locations:
**Project-scoped:** Write test outcome artifact for cross-session context:
```bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG
```
Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md`
+34 -1
View File
@@ -730,7 +730,7 @@ Compare screenshots and observations across pages for:
**Project-scoped:**
\`\`\`bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\\([^/]*/[^/]*\\)\\.git$|\\1|;s|.*[:/]\\([^/]*/[^/]*\\)$|\\1|' | tr '/' '-')
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG
\`\`\`
Write to: \`~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md\`
@@ -814,6 +814,38 @@ Tie everything to user goals and product objectives. Always suggest specific imp
11. **Show screenshots to the user.** After every \`$B screenshot\`, \`$B snapshot -a -o\`, or \`$B responsive\` command, use the Read tool on the output file(s) so the user can see them inline. For \`responsive\` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.`;
}
function generateReviewDashboard(): string {
return `## Review Readiness Dashboard
After completing the review, read the review log to display the dashboard.
\`\`\`bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
cat ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_REVIEWS"
\`\`\`
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. Display:
\`\`\`
+====================================================================+
| REVIEW READINESS DASHBOARD |
+====================================================================+
| Review | Runs | Last Run | Status |
|-----------------|------|---------------------|----------------------|
| CEO Review | 1 | 2026-03-16 14:30 | CLEAR |
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR |
| Design Review | 0 | | NOT YET RUN |
+--------------------------------------------------------------------+
| VERDICT: 2/3 CLEAR Design Review not yet run |
+====================================================================+
\`\`\`
**Verdict logic:**
- **CLEARED TO SHIP (3/3)**: All three have >= 1 entry within 7 days AND most recent status is "clean"
- **N/3 CLEAR**: Show count and list which are missing, have open issues, or are stale (>7 days)
- Informational only does NOT block.`;
}
const RESOLVERS: Record<string, () => string> = {
COMMAND_REFERENCE: generateCommandReference,
SNAPSHOT_FLAGS: generateSnapshotFlags,
@@ -822,6 +854,7 @@ const RESOLVERS: Record<string, () => string> = {
BASE_BRANCH_DETECT: generateBaseBranchDetect,
QA_METHODOLOGY: generateQAMethodology,
DESIGN_METHODOLOGY: generateDesignMethodology,
REVIEW_DASHBOARD: generateReviewDashboard,
};
// ─── Template Processing ────────────────────────────────────
+9
View File
@@ -50,6 +50,15 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
3. Run `git diff <base>...HEAD --stat` and `git log <base>..HEAD --oneline` to understand what's being shipped.
4. Check review readiness:
{{REVIEW_DASHBOARD}}
If the verdict is NOT "CLEARED TO SHIP (3/3)", use AskUserQuestion:
- Show which reviews are missing or have open issues
- RECOMMENDATION: Choose B (run missing reviews first) unless the change is trivial
- Options: A) Ship anyway B) Abort — run missing review(s) first C) Reviews not relevant for this change
---
## Step 2: Merge the base branch (BEFORE tests)
+27
View File
@@ -322,3 +322,30 @@ describe('description quality evals', () => {
expect(tipsSection).not.toContain('->');
});
});
describe('REVIEW_DASHBOARD resolver', () => {
const REVIEW_SKILLS = ['plan-ceo-review', 'plan-eng-review', 'plan-design-review'];
for (const skill of REVIEW_SKILLS) {
test(`review dashboard appears in ${skill} generated file`, () => {
const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
expect(content).toContain('reviews.jsonl');
expect(content).toContain('REVIEW READINESS DASHBOARD');
});
}
test('review dashboard appears in ship generated file', () => {
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
expect(content).toContain('reviews.jsonl');
expect(content).toContain('REVIEW READINESS DASHBOARD');
});
test('resolver output contains key dashboard elements', () => {
const content = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
expect(content).toContain('VERDICT');
expect(content).toContain('CLEARED TO SHIP');
expect(content).toContain('NOT YET RUN');
expect(content).toContain('7 days');
expect(content).toContain('Design Review');
});
});
+42
View File
@@ -665,3 +665,45 @@ describe('Planted-bug fixture validation', () => {
expect(content).toContain('update_column');
});
});
// --- gstack-slug helper ---
describe('gstack-slug', () => {
const SLUG_BIN = path.join(ROOT, 'bin', 'gstack-slug');
test('binary exists and is executable', () => {
expect(fs.existsSync(SLUG_BIN)).toBe(true);
const stat = fs.statSync(SLUG_BIN);
expect(stat.mode & 0o111).toBeGreaterThan(0);
});
test('outputs SLUG and BRANCH lines in a git repo', () => {
const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' });
expect(result.exitCode).toBe(0);
const output = result.stdout.toString();
expect(output).toContain('SLUG=');
expect(output).toContain('BRANCH=');
});
test('SLUG does not contain forward slashes', () => {
const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' });
const slug = result.stdout.toString().match(/SLUG=(.*)/)?.[1] ?? '';
expect(slug).not.toContain('/');
expect(slug.length).toBeGreaterThan(0);
});
test('BRANCH does not contain forward slashes', () => {
const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' });
const branch = result.stdout.toString().match(/BRANCH=(.*)/)?.[1] ?? '';
expect(branch).not.toContain('/');
expect(branch.length).toBeGreaterThan(0);
});
test('output is eval-compatible (KEY=VALUE format)', () => {
const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' });
const lines = result.stdout.toString().trim().split('\n');
expect(lines.length).toBe(2);
expect(lines[0]).toMatch(/^SLUG=.+/);
expect(lines[1]).toMatch(/^BRANCH=.+/);
});
});