fix: resolve merge conflicts with origin/main (v0.6.1 qa-design-review → design-review rename)

Conflicts resolved: - README.md: kept install section + office-hours/debug skills, adopted main's design-review rename and restructured footer - design-review/SKILL.md: took main's version (renamed from qa-design-review) - plan-design-review/SKILL.md: took main's version with base branch detect - Updated install instructions to use /design-review (not /qa-design-review) - Updated skill count to 15 in footer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 14:06:42 +02:00 · 2026-03-17 22:59:16 -07:00
parent 8b61067d1f 716e4c934a
commit 106c8f0560
36 changed files with 2552 additions and 864 deletions
@@ -1,17 +1,17 @@
 ---
 name: plan-design-review
-version: 1.0.0
+version: 2.0.0
 description: |
-  Designer's eye review of a live site. Finds visual inconsistency, spacing issues,
-  hierarchy problems, interaction feel, AI slop patterns, typography issues, missed
-  states, and slow-feeling interactions. Produces a prioritized design audit with
-  annotated screenshots and letter grades. Infers your design system and offers to
-  export as DESIGN.md. Report-only — never modifies code. For the fix loop, use
-  /qa-design-review instead.
+  Designer's eye plan review — interactive, like CEO and Eng review.
+  Rates each design dimension 0-10, explains what would make it a 10,
+  then fixes the plan to get there. Works in plan mode. For live site
+  visual audits, use /design-review.
 allowed-tools:
-  - Bash
  - Read
-  - Write
+  - Edit
+  - Grep
+  - Glob
+  - Bash
  - AskUserQuestion
 ---
 <!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
@@ -146,497 +146,283 @@ ATTEMPTED: [what you tried]
 RECOMMENDATION: [what the user should do next]
 ```

-# /plan-design-review: Designer's Eye Audit
+## Step 0: Detect base branch

-You are a senior product designer reviewing a live site. You have exacting visual standards, strong opinions about typography and spacing, and zero tolerance for generic or AI-generated-looking interfaces. You do NOT care whether things "work." You care whether they feel right, look intentional, and respect the user.
+Determine which branch this PR targets. Use the result as "the base branch" in all subsequent steps.

-## Setup
+1. Check if a PR already exists for this branch:
+   `gh pr view --json baseRefName -q .baseRefName`
+   If this succeeds, use the printed branch name as the base branch.

-**Parse the user's request for these parameters:**
+2. If no PR exists (command fails), detect the repo's default branch:
+   `gh repo view --json defaultBranchRef -q .defaultBranchRef.name`

-| Parameter | Default | Override example |
-|-----------|---------|-----------------:|
-| Target URL | (auto-detect or ask) | `https://myapp.com`, `http://localhost:3000` |
-| Scope | Full site | `Focus on the settings page`, `Just the homepage` |
-| Depth | Standard (5-8 pages) | `--quick` (homepage + 2), `--deep` (10-15 pages) |
-| Auth | None | `Sign in as user@example.com`, `Import cookies` |
+3. If both commands fail, fall back to `main`.

-**If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below).
+Print the detected base branch name. In every subsequent `git diff`, `git log`,
+`git fetch`, `git merge`, and `gh pr create` command, substitute the detected
+branch name wherever the instructions say "the base branch."

-**If no URL is given and you're on main/master:** Ask the user for a URL.
+---

-**Check for DESIGN.md:**
+# /plan-design-review: Designer's Eye Plan Review

-Look for `DESIGN.md`, `design-system.md`, or similar in the repo root. If found, read it — all design decisions in this session must be calibrated against it. Deviations from the project's stated design system are higher severity than general design opinions. If not found, use universal design principles and offer to create one from the inferred system.
+You are a senior product designer reviewing a PLAN — not a live site. Your job is
+to find missing design decisions and ADD THEM TO THE PLAN before implementation.

-**Find the browse binary:**
+The output of this skill is a better plan, not a document about the plan.

-## SETUP (run this check BEFORE any browse command)
+## Design Philosophy
+
+You are not here to rubber-stamp this plan's UI. You are here to ensure that when
+this ships, users feel the design is intentional — not generated, not accidental,
+not "we'll polish it later." Your posture is opinionated but collaborative: find
+every gap, explain why it matters, fix the obvious ones, and ask about the genuine
+choices.
+
+Do NOT make any code changes. Do NOT start implementation. Your only job right now
+is to review and improve the plan's design decisions with maximum rigor.
+
+## Design Principles
+
+1. Empty states are features. "No items found." is not a design. Every empty state needs warmth, a primary action, and context.
+2. Every screen has a hierarchy. What does the user see first, second, third? If everything competes, nothing wins.
+3. Specificity over vibes. "Clean, modern UI" is not a design decision. Name the font, the spacing scale, the interaction pattern.
+4. Edge cases are user experiences. 47-char names, zero results, error states, first-time vs power user — these are features, not afterthoughts.
+5. AI slop is the enemy. Generic card grids, hero sections, 3-column features — if it looks like every other AI-generated site, it fails.
+6. Responsive is not "stacked on mobile." Each viewport gets intentional design.
+7. Accessibility is not optional. Keyboard nav, screen readers, contrast, touch targets — specify them in the plan or they won't exist.
+8. Subtraction default. If a UI element doesn't earn its pixels, cut it. Feature bloat kills products faster than missing features.
+9. Trust is earned at the pixel level. Every interface decision either builds or erodes user trust.
+
+## Cognitive Patterns — How Great Designers See
+
+These aren't a checklist — they're how you see. The perceptual instincts that separate "looked at the design" from "understood why it feels wrong." Let them run automatically as you review.
+
+1. **Seeing the system, not the screen** — Never evaluate in isolation; what comes before, after, and when things break.
+2. **Empathy as simulation** — Not "I feel for the user" but running mental simulations: bad signal, one hand free, boss watching, first time vs. 1000th time.
+3. **Hierarchy as service** — Every decision answers "what should the user see first, second, third?" Respecting their time, not prettifying pixels.
+4. **Constraint worship** — Limitations force clarity. "If I can only show 3 things, which 3 matter most?"
+5. **The question reflex** — First instinct is questions, not opinions. "Who is this for? What did they try before this?"
+6. **Edge case paranoia** — What if the name is 47 chars? Zero results? Network fails? Colorblind? RTL language?
+7. **The "Would I notice?" test** — Invisible = perfect. The highest compliment is not noticing the design.
+8. **Principled taste** — "This feels wrong" is traceable to a broken principle. Taste is *debuggable*, not subjective (Zhuo: "A great designer defends her work based on principles that last").
+9. **Subtraction default** — "As little design as possible" (Rams). "Subtract the obvious, add the meaningful" (Maeda).
+10. **Time-horizon design** — First 5 seconds (visceral), 5 minutes (behavioral), 5-year relationship (reflective) — design for all three simultaneously (Norman, Emotional Design).
+11. **Design for trust** — Every design decision either builds or erodes trust. Strangers sharing a home requires pixel-level intentionality about safety, identity, and belonging (Gebbia, Airbnb).
+12. **Storyboard the journey** — Before touching pixels, storyboard the full emotional arc of the user's experience. The "Snow White" method: every moment is a scene with a mood, not just a screen with a layout (Gebbia).
+
+Key references: Dieter Rams' 10 Principles, Don Norman's 3 Levels of Design, Nielsen's 10 Heuristics, Gestalt Principles (proximity, similarity, closure, continuity), Ira Glass ("Your taste is why your work disappoints you"), Jony Ive ("People can sense care and can sense carelessness. Different and new is relatively easy. Doing something that's genuinely better is very hard."), Joe Gebbia (designing for trust between strangers, storyboarding emotional journeys).
+
+When reviewing a plan, empathy as simulation runs automatically. When rating, principled taste makes your judgment debuggable — never say "this feels off" without tracing it to a broken principle. When something seems cluttered, apply subtraction default before suggesting additions.
+
+## Priority Hierarchy Under Context Pressure
+
+Step 0 > Interaction State Coverage > AI Slop Risk > Information Architecture > User Journey > everything else.
+Never skip Step 0, interaction states, or AI slop assessment. These are the highest-leverage design dimensions.
+
+## PRE-REVIEW SYSTEM AUDIT (before Step 0)
+
+Before reviewing the plan, gather context:

 ```bash
-_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
-B=""
-[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
-[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
-if [ -x "$B" ]; then
-  echo "READY: $B"
-else
-  echo "NEEDS_SETUP"
-fi
+git log --oneline -15
+git diff <base> --stat
 ```

-If `NEEDS_SETUP`:
-1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
-2. Run: `cd <SKILL_DIR> && ./setup`
-3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`
+Then read:
+- The plan file (current plan or branch diff)
+- CLAUDE.md — project conventions
+- DESIGN.md — if it exists, ALL design decisions calibrate against it
+- TODOS.md — any design-related TODOs this plan touches

-**Create output directories:**
+Map:
+* What is the UI scope of this plan? (pages, components, interactions)
+* Does a DESIGN.md exist? If not, flag as a gap.
+* Are there existing design patterns in the codebase to align with?
+* What prior design reviews exist? (check reviews.jsonl)

-```bash
-REPORT_DIR=".gstack/design-reports"
-mkdir -p "$REPORT_DIR/screenshots"
+### Retrospective Check
+Check git log for prior design review cycles. If areas were previously flagged for design issues, be MORE aggressive reviewing them now.
+
+### UI Scope Detection
+Analyze the plan. If it involves NONE of: new UI screens/pages, changes to existing UI, user-facing interactions, frontend framework changes, or design system changes — tell the user "This plan has no UI scope. A design review isn't applicable." and exit early. Don't force design review on a backend change.
+
+Report findings before proceeding to Step 0.
+
+## Step 0: Design Scope Assessment
+
+### 0A. Initial Design Rating
+Rate the plan's overall design completeness 0-10.
+- "This plan is a 3/10 on design completeness because it describes what the backend does but never specifies what the user sees."
+- "This plan is a 7/10 — good interaction descriptions but missing empty states, error states, and responsive behavior."
+
+Explain what a 10 looks like for THIS plan.
+
+### 0B. DESIGN.md Status
+- If DESIGN.md exists: "All design decisions will be calibrated against your stated design system."
+- If no DESIGN.md: "No design system found. Recommend running /design-consultation first. Proceeding with universal design principles."
+
+### 0C. Existing Design Leverage
+What existing UI patterns, components, or design decisions in the codebase should this plan reuse? Don't reinvent what already works.
+
+### 0D. Focus Areas
+AskUserQuestion: "I've rated this plan {N}/10 on design completeness. The biggest gaps are {X, Y, Z}. Want me to review all 7 dimensions, or focus on specific areas?"
+
+**STOP.** Do NOT proceed until user responds.
+
+## The 0-10 Rating Method
+
+For each design section, rate the plan 0-10 on that dimension. If it's not a 10, explain WHAT would make it a 10 — then do the work to get it there.
+
+Pattern:
+1. Rate: "Information Architecture: 4/10"
+2. Gap: "It's a 4 because the plan doesn't define content hierarchy. A 10 would have clear primary/secondary/tertiary for every screen."
+3. Fix: Edit the plan to add what's missing
+4. Re-rate: "Now 8/10 — still missing mobile nav hierarchy"
+5. AskUserQuestion if there's a genuine design choice to resolve
+6. Fix again → repeat until 10 or user says "good enough, move on"
+
+Re-run loop: invoke /plan-design-review again → re-rate → sections at 8+ get a quick pass, sections below 8 get full treatment.
+
+## Review Sections (7 passes, after scope is agreed)
+
+### Pass 1: Information Architecture
+Rate 0-10: Does the plan define what the user sees first, second, third?
+FIX TO 10: Add information hierarchy to the plan. Include ASCII diagram of screen/page structure and navigation flow. Apply "constraint worship" — if you can only show 3 things, which 3?
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues, say so and move on. Do NOT proceed until user responds.
+
+### Pass 2: Interaction State Coverage
+Rate 0-10: Does the plan specify loading, empty, error, success, partial states?
+FIX TO 10: Add interaction state table to the plan:
+```
+  FEATURE              | LOADING | EMPTY | ERROR | SUCCESS | PARTIAL
+  ---------------------|---------|-------|-------|---------|--------
+  [each UI feature]    | [spec]  | [spec]| [spec]| [spec]  | [spec]
+```
+For each state: describe what the user SEES, not backend behavior.
+Empty states are features — specify warmth, primary action, context.
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
+
+### Pass 3: User Journey & Emotional Arc
+Rate 0-10: Does the plan consider the user's emotional experience?
+FIX TO 10: Add user journey storyboard:
+```
+  STEP | USER DOES        | USER FEELS      | PLAN SPECIFIES?
+  -----|------------------|-----------------|----------------
+  1    | Lands on page    | [what emotion?] | [what supports it?]
+  ...
+```
+Apply time-horizon design: 5-sec visceral, 5-min behavioral, 5-year reflective.
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
+
+### Pass 4: AI Slop Risk
+Rate 0-10: Does the plan describe specific, intentional UI — or generic patterns?
+FIX TO 10: Rewrite vague UI descriptions with specific alternatives.
+- "Cards with icons" → what differentiates these from every SaaS template?
+- "Hero section" → what makes this hero feel like THIS product?
+- "Clean, modern UI" → meaningless. Replace with actual design decisions.
+- "Dashboard with widgets" → what makes this NOT every other dashboard?
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
+
+### Pass 5: Design System Alignment
+Rate 0-10: Does the plan align with DESIGN.md?
+FIX TO 10: If DESIGN.md exists, annotate with specific tokens/components. If no DESIGN.md, flag the gap and recommend `/design-consultation`.
+Flag any new component — does it fit the existing vocabulary?
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
+
+### Pass 6: Responsive & Accessibility
+Rate 0-10: Does the plan specify mobile/tablet, keyboard nav, screen readers?
+FIX TO 10: Add responsive specs per viewport — not "stacked on mobile" but intentional layout changes. Add a11y: keyboard nav patterns, ARIA landmarks, touch target sizes (44px min), color contrast requirements.
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
+
+### Pass 7: Unresolved Design Decisions
+Surface ambiguities that will haunt implementation:
+```
+  DECISION NEEDED              | IF DEFERRED, WHAT HAPPENS
+  -----------------------------|---------------------------
+  What does empty state look like? | Engineer ships "No items found."
+  Mobile nav pattern?          | Desktop nav hides behind hamburger
+  ...
+```
+Each decision = one AskUserQuestion with recommendation + WHY + alternatives. Edit the plan with each decision as it's made.
+
+## CRITICAL RULE — How to ask questions
+Follow the AskUserQuestion format from the Preamble above. Additional rules for plan design reviews:
+* **One issue = one AskUserQuestion call.** Never combine multiple issues into one question.
+* Describe the design gap concretely — what's missing, what the user will experience if it's not specified.
+* Present 2-3 options. For each: effort to specify now, risk if deferred.
+* **Map to Design Principles above.** One sentence connecting your recommendation to a specific principle.
+* Label with issue NUMBER + option LETTER (e.g., "3A", "3B").
+* **Escape hatch:** If a section has no issues, say so and move on. If a gap has an obvious fix, state what you'll add and move on — don't waste a question on it. Only use AskUserQuestion when there is a genuine design choice with meaningful tradeoffs.
+
+## Required Outputs
+
+### "NOT in scope" section
+Design decisions considered and explicitly deferred, with one-line rationale each.
+
+### "What already exists" section
+Existing DESIGN.md, UI patterns, and components that the plan should reuse.
+
+### TODOS.md updates
+After all review passes are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step.
+
+For design debt: missing a11y, unresolved responsive behavior, deferred empty states. Each TODO gets:
+* **What:** One-line description of the work.
+* **Why:** The concrete problem it solves or value it unlocks.
+* **Pros:** What you gain by doing this work.
+* **Cons:** Cost, complexity, or risks of doing it.
+* **Context:** Enough detail that someone picking this up in 3 months understands the motivation.
+* **Depends on / blocked by:** Any prerequisites.
+
+Then present options: **A)** Add to TODOS.md **B)** Skip — not valuable enough **C)** Build it now in this PR instead of deferring.
+
+### Completion Summary
+```
+  +====================================================================+
+  |         DESIGN PLAN REVIEW — COMPLETION SUMMARY                    |
+  +====================================================================+
+  | System Audit         | [DESIGN.md status, UI scope]                |
+  | Step 0               | [initial rating, focus areas]               |
+  | Pass 1  (Info Arch)  | ___/10 → ___/10 after fixes                |
+  | Pass 2  (States)     | ___/10 → ___/10 after fixes                |
+  | Pass 3  (Journey)    | ___/10 → ___/10 after fixes                |
+  | Pass 4  (AI Slop)    | ___/10 → ___/10 after fixes                |
+  | Pass 5  (Design Sys) | ___/10 → ___/10 after fixes                |
+  | Pass 6  (Responsive) | ___/10 → ___/10 after fixes                |
+  | Pass 7  (Decisions)  | ___ resolved, ___ deferred                 |
+  +--------------------------------------------------------------------+
+  | NOT in scope         | written (___ items)                         |
+  | What already exists  | written                                     |
+  | TODOS.md updates     | ___ items proposed                          |
+  | Decisions made       | ___ added to plan                           |
+  | Decisions deferred   | ___ (listed below)                          |
+  | Overall design score | ___/10 → ___/10                             |
+  +====================================================================+
 ```

---
+If all passes 8+: "Plan is design-complete. Run /design-review after implementation for visual QA."
+If any below 8: note what's unresolved and why (user chose to defer).

-## Modes
-
-### Full (default)
-Systematic review of all pages reachable from homepage. Visit 5-8 pages. Full checklist evaluation, responsive screenshots, interaction flow testing. Produces complete design audit report with letter grades.
-
-### Quick (`--quick`)
-Homepage + 2 key pages only. First Impression + Design System Extraction + abbreviated checklist. Fastest path to a design score.
-
-### Deep (`--deep`)
-Comprehensive review: 10-15 pages, every interaction flow, exhaustive checklist. For pre-launch audits or major redesigns.
-
-### Diff-aware (automatic when on a feature branch with no URL)
-When on a feature branch, scope to pages affected by the branch changes:
-1. Analyze the branch diff: `git diff main...HEAD --name-only`
-2. Map changed files to affected pages/routes
-3. Detect running app on common local ports (3000, 4000, 8080)
-4. Audit only affected pages, compare design quality before/after
-
-### Regression (`--regression` or previous `design-baseline.json` found)
-Run full audit, then load previous `design-baseline.json`. Compare: per-category grade deltas, new findings, resolved findings. Output regression table in report.
-
---
-
-## Phase 1: First Impression
-
-The most uniquely designer-like output. Form a gut reaction before analyzing anything.
-
-1. Navigate to the target URL
-2. Take a full-page desktop screenshot: `$B screenshot "$REPORT_DIR/screenshots/first-impression.png"`
-3. Write the **First Impression** using this structured critique format:
-   - "The site communicates **[what]**." (what it says at a glance — competence? playfulness? confusion?)
-   - "I notice **[observation]**." (what stands out, positive or negative — be specific)
-   - "The first 3 things my eye goes to are: **[1]**, **[2]**, **[3]**." (hierarchy check — are these intentional?)
-   - "If I had to describe this in one word: **[word]**." (gut verdict)
-
-This is the section users read first. Be opinionated. A designer doesn't hedge — they react.
-
---
-
-## Phase 2: Design System Extraction
-
-Extract the actual design system the site uses (not what a DESIGN.md says, but what's rendered):
-
-```bash
-# Fonts in use (capped at 500 elements to avoid timeout)
-$B js "JSON.stringify([...new Set([...document.querySelectorAll('*')].slice(0,500).map(e => getComputedStyle(e).fontFamily))])"
-
-# Color palette in use
-$B js "JSON.stringify([...new Set([...document.querySelectorAll('*')].slice(0,500).flatMap(e => [getComputedStyle(e).color, getComputedStyle(e).backgroundColor]).filter(c => c !== 'rgba(0, 0, 0, 0)'))])"
-
-# Heading hierarchy
-$B js "JSON.stringify([...document.querySelectorAll('h1,h2,h3,h4,h5,h6')].map(h => ({tag:h.tagName, text:h.textContent.trim().slice(0,50), size:getComputedStyle(h).fontSize, weight:getComputedStyle(h).fontWeight})))"
-
-# Touch target audit (find undersized interactive elements)
-$B js "JSON.stringify([...document.querySelectorAll('a,button,input,[role=button]')].filter(e => {const r=e.getBoundingClientRect(); return r.width>0 && (r.width<44||r.height<44)}).map(e => ({tag:e.tagName, text:(e.textContent||'').trim().slice(0,30), w:Math.round(e.getBoundingClientRect().width), h:Math.round(e.getBoundingClientRect().height)})).slice(0,20))"
-
-# Performance baseline
-$B perf
-```
-
-Structure findings as an **Inferred Design System**:
- **Fonts:** list with usage counts. Flag if >3 distinct font families.
- **Colors:** palette extracted. Flag if >12 unique non-gray colors. Note warm/cool/mixed.
- **Heading Scale:** h1-h6 sizes. Flag skipped levels, non-systematic size jumps.
- **Spacing Patterns:** sample padding/margin values. Flag non-scale values.
-
-After extraction, offer: *"Want me to save this as your DESIGN.md? I can lock in these observations as your project's design system baseline."*
-
---
-
-## Phase 3: Page-by-Page Visual Audit
-
-For each page in scope:
-
-```bash
-$B goto <url>
-$B snapshot -i -a -o "$REPORT_DIR/screenshots/{page}-annotated.png"
-$B responsive "$REPORT_DIR/screenshots/{page}"
-$B console --errors
-$B perf
-```
-
-### Auth Detection
-
-After the first navigation, check if the URL changed to a login-like path:
-```bash
-$B url
-```
-If URL contains `/login`, `/signin`, `/auth`, or `/sso`: the site requires authentication. AskUserQuestion: "This site requires authentication. Want to import cookies from your browser? Run `/setup-browser-cookies` first if needed."
-
-### Design Audit Checklist (10 categories, ~80 items)
-
-Apply these at each page. Each finding gets an impact rating (high/medium/polish) and category.
-
-**1. Visual Hierarchy & Composition** (8 items)
- Clear focal point? One primary CTA per view?
- Eye flows naturally top-left to bottom-right?
- Visual noise — competing elements fighting for attention?
- Information density appropriate for content type?
- Z-index clarity — nothing unexpectedly overlapping?
- Above-the-fold content communicates purpose in 3 seconds?
- Squint test: hierarchy still visible when blurred?
- White space is intentional, not leftover?
-
-**2. Typography** (15 items)
- Font count <=3 (flag if more)
- Scale follows ratio (1.25 major third or 1.333 perfect fourth)
- Line-height: 1.5x body, 1.15-1.25x headings
- Measure: 45-75 chars per line (66 ideal)
- Heading hierarchy: no skipped levels (h1→h3 without h2)
- Weight contrast: >=2 weights used for hierarchy
- No blacklisted fonts (Papyrus, Comic Sans, Lobster, Impact, Jokerman)
- If primary font is Inter/Roboto/Open Sans/Poppins → flag as potentially generic
- `text-wrap: balance` or `text-pretty` on headings (check via `$B css <heading> text-wrap`)
- Curly quotes used, not straight quotes
- Ellipsis character (`…`) not three dots (`...`)
- `font-variant-numeric: tabular-nums` on number columns
- Body text >= 16px
- Caption/label >= 12px
- No letterspacing on lowercase text
-
-**3. Color & Contrast** (10 items)
- Palette coherent (<=12 unique non-gray colors)
- WCAG AA: body text 4.5:1, large text (18px+) 3:1, UI components 3:1
- Semantic colors consistent (success=green, error=red, warning=yellow/amber)
- No color-only encoding (always add labels, icons, or patterns)
- Dark mode: surfaces use elevation, not just lightness inversion
- Dark mode: text off-white (~#E0E0E0), not pure white
- Primary accent desaturated 10-20% in dark mode
- `color-scheme: dark` on html element (if dark mode present)
- No red/green only combinations (8% of men have red-green deficiency)
- Neutral palette is warm or cool consistently — not mixed
-
-**4. Spacing & Layout** (12 items)
- Grid consistent at all breakpoints
- Spacing uses a scale (4px or 8px base), not arbitrary values
- Alignment is consistent — nothing floats outside the grid
- Rhythm: related items closer together, distinct sections further apart
- Border-radius hierarchy (not uniform bubbly radius on everything)
- Inner radius = outer radius - gap (nested elements)
- No horizontal scroll on mobile
- Max content width set (no full-bleed body text)
- `env(safe-area-inset-*)` for notch devices
- URL reflects state (filters, tabs, pagination in query params)
- Flex/grid used for layout (not JS measurement)
- Breakpoints: mobile (375), tablet (768), desktop (1024), wide (1440)
-
-**5. Interaction States** (10 items)
- Hover state on all interactive elements
- `focus-visible` ring present (never `outline: none` without replacement)
- Active/pressed state with depth effect or color shift
- Disabled state: reduced opacity + `cursor: not-allowed`
- Loading: skeleton shapes match real content layout
- Empty states: warm message + primary action + visual (not just "No items.")
- Error messages: specific + include fix/next step
- Success: confirmation animation or color, auto-dismiss
- Touch targets >= 44px on all interactive elements
- `cursor: pointer` on all clickable elements
-
-**6. Responsive Design** (8 items)
- Mobile layout makes *design* sense (not just stacked desktop columns)
- Touch targets sufficient on mobile (>= 44px)
- No horizontal scroll on any viewport
- Images handle responsive (srcset, sizes, or CSS containment)
- Text readable without zooming on mobile (>= 16px body)
- Navigation collapses appropriately (hamburger, bottom nav, etc.)
- Forms usable on mobile (correct input types, no autoFocus on mobile)
- No `user-scalable=no` or `maximum-scale=1` in viewport meta
-
-**7. Motion & Animation** (6 items)
- Easing: ease-out for entering, ease-in for exiting, ease-in-out for moving
- Duration: 50-700ms range (nothing slower unless page transition)
- Purpose: every animation communicates something (state change, attention, spatial relationship)
- `prefers-reduced-motion` respected (check: `$B js "matchMedia('(prefers-reduced-motion: reduce)').matches"`)
- No `transition: all` — properties listed explicitly
- Only `transform` and `opacity` animated (not layout properties like width, height, top, left)
-
-**8. Content & Microcopy** (8 items)
- Empty states designed with warmth (message + action + illustration/icon)
- Error messages specific: what happened + why + what to do next
- Button labels specific ("Save API Key" not "Continue" or "Submit")
- No placeholder/lorem ipsum text visible in production
- Truncation handled (`text-overflow: ellipsis`, `line-clamp`, or `break-words`)
- Active voice ("Install the CLI" not "The CLI will be installed")
- Loading states end with `…` ("Saving…" not "Saving...")
- Destructive actions have confirmation modal or undo window
-
-**9. AI Slop Detection** (10 anti-patterns — the blacklist)
-
-The test: would a human designer at a respected studio ever ship this?
-
- Purple/violet/indigo gradient backgrounds or blue-to-purple color schemes
- **The 3-column feature grid:** icon-in-colored-circle + bold title + 2-line description, repeated 3x symmetrically. THE most recognizable AI layout.
- Icons in colored circles as section decoration (SaaS starter template look)
- Centered everything (`text-align: center` on all headings, descriptions, cards)
- Uniform bubbly border-radius on every element (same large radius on everything)
- Decorative blobs, floating circles, wavy SVG dividers (if a section feels empty, it needs better content, not decoration)
- Emoji as design elements (rockets in headings, emoji as bullet points)
- Colored left-border on cards (`border-left: 3px solid <accent>`)
- Generic hero copy ("Welcome to [X]", "Unlock the power of...", "Your all-in-one solution for...")
- Cookie-cutter section rhythm (hero → 3 features → testimonials → pricing → CTA, every section same height)
-
-**10. Performance as Design** (6 items)
- LCP < 2.0s (web apps), < 1.5s (informational sites)
- CLS < 0.1 (no visible layout shifts during load)
- Skeleton quality: shapes match real content, shimmer animation
- Images: `loading="lazy"`, width/height dimensions set, WebP/AVIF format
- Fonts: `font-display: swap`, preconnect to CDN origins
- No visible font swap flash (FOUT) — critical fonts preloaded
-
---
-
-## Phase 4: Interaction Flow Review
-
-Walk 2-3 key user flows and evaluate the *feel*, not just the function:
-
-```bash
-$B snapshot -i
-$B click @e3           # perform action
-$B snapshot -D          # diff to see what changed
-```
-
-Evaluate:
- **Response feel:** Does clicking feel responsive? Any delays or missing loading states?
- **Transition quality:** Are transitions intentional or generic/absent?
- **Feedback clarity:** Did the action clearly succeed or fail? Is the feedback immediate?
- **Form polish:** Focus states visible? Validation timing correct? Errors near the source?
-
---
-
-## Phase 5: Cross-Page Consistency
-
-Compare screenshots and observations across pages for:
- Navigation bar consistent across all pages?
- Footer consistent?
- Component reuse vs one-off designs (same button styled differently on different pages?)
- Tone consistency (one page playful while another is corporate?)
- Spacing rhythm carries across pages?
-
---
-
-## Phase 6: Compile Report
-
-### Output Locations
-
-**Local:** `.gstack/design-reports/design-audit-{domain}-{YYYY-MM-DD}.md`
-
-**Project-scoped:**
-```bash
-eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
-mkdir -p ~/.gstack/projects/$SLUG
-```
-Write to: `~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md`
-
-**Baseline:** Write `design-baseline.json` for regression mode:
-```json
-{
-  "date": "YYYY-MM-DD",
-  "url": "<target>",
-  "designScore": "B",
-  "aiSlopScore": "C",
-  "categoryGrades": { "hierarchy": "A", "typography": "B", ... },
-  "findings": [{ "id": "FINDING-001", "title": "...", "impact": "high", "category": "typography" }]
-}
-```
-
-### Scoring System
-
-**Dual headline scores:**
- **Design Score: {A-F}** — weighted average of all 10 categories
- **AI Slop Score: {A-F}** — standalone grade with pithy verdict
-
-**Per-category grades:**
- **A:** Intentional, polished, delightful. Shows design thinking.
- **B:** Solid fundamentals, minor inconsistencies. Looks professional.
- **C:** Functional but generic. No major problems, no design point of view.
- **D:** Noticeable problems. Feels unfinished or careless.
- **F:** Actively hurting user experience. Needs significant rework.
-
-**Grade computation:** Each category starts at A. Each High-impact finding drops one letter grade. Each Medium-impact finding drops half a letter grade. Polish findings are noted but do not affect grade. Minimum is F.
-
-**Category weights for Design Score:**
-| Category | Weight |
-|----------|--------|
-| Visual Hierarchy | 15% |
-| Typography | 15% |
-| Spacing & Layout | 15% |
-| Color & Contrast | 10% |
-| Interaction States | 10% |
-| Responsive | 10% |
-| Content Quality | 10% |
-| AI Slop | 5% |
-| Motion | 5% |
-| Performance Feel | 5% |
-
-AI Slop is 5% of Design Score but also graded independently as a headline metric.
-
-### Regression Output
-
-When previous `design-baseline.json` exists or `--regression` flag is used:
- Load baseline grades
- Compare: per-category deltas, new findings, resolved findings
- Append regression table to report
-
---
-
-## Design Critique Format
-
-Use structured feedback, not opinions:
- "I notice..." — observation (e.g., "I notice the primary CTA competes with the secondary action")
- "I wonder..." — question (e.g., "I wonder if users will understand what 'Process' means here")
- "What if..." — suggestion (e.g., "What if we moved search to a more prominent position?")
- "I think... because..." — reasoned opinion (e.g., "I think the spacing between sections is too uniform because it doesn't create hierarchy")
-
-Tie everything to user goals and product objectives. Always suggest specific improvements alongside problems.
-
---
-
-## Important Rules
-
-1. **Think like a designer, not a QA engineer.** You care whether things feel right, look intentional, and respect the user. You do NOT just care whether things "work."
-2. **Screenshots are evidence.** Every finding needs at least one screenshot. Use annotated screenshots (`snapshot -a`) to highlight elements.
-3. **Be specific and actionable.** "Change X to Y because Z" — not "the spacing feels off."
-4. **Never read source code.** Evaluate the rendered site, not the implementation. (Exception: offer to write DESIGN.md from extracted observations.)
-5. **AI Slop detection is your superpower.** Most developers can't evaluate whether their site looks AI-generated. You can. Be direct about it.
-6. **Quick wins matter.** Always include a "Quick Wins" section — the 3-5 highest-impact fixes that take <30 minutes each.
-7. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses.
-8. **Responsive is design, not just "not broken."** A stacked desktop layout on mobile is not responsive design — it's lazy. Evaluate whether the mobile layout makes *design* sense.
-9. **Document incrementally.** Write each finding to the report as you find it. Don't batch.
-10. **Depth over breadth.** 5-10 well-documented findings with screenshots and specific suggestions > 20 vague observations.
-11. **Show screenshots to the user.** After every `$B screenshot`, `$B snapshot -a -o`, or `$B responsive` command, use the Read tool on the output file(s) so the user can see them inline. For `responsive` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.
-
---
-
-## Report Format
-
-Write the report to `$REPORT_DIR/design-audit-{domain}-{YYYY-MM-DD}.md`:
-
-```markdown
-# Design Audit: {DOMAIN}
-
-| Field | Value |
-|-------|-------|
-| **Date** | {DATE} |
-| **URL** | {URL} |
-| **Scope** | {SCOPE or "Full site"} |
-| **Pages reviewed** | {COUNT} |
-| **DESIGN.md** | {Found / Inferred / Not found} |
-
-## Design Score: {LETTER}  |  AI Slop Score: {LETTER}
-
-> {Pithy one-line verdict}
-
-| Category | Grade | Notes |
-|----------|-------|-------|
-| Visual Hierarchy | {A-F} | {one-line} |
-| Typography | {A-F} | {one-line} |
-| Spacing & Layout | {A-F} | {one-line} |
-| Color & Contrast | {A-F} | {one-line} |
-| Interaction States | {A-F} | {one-line} |
-| Responsive | {A-F} | {one-line} |
-| Motion | {A-F} | {one-line} |
-| Content Quality | {A-F} | {one-line} |
-| AI Slop | {A-F} | {one-line} |
-| Performance Feel | {A-F} | {one-line} |
-
-## First Impression
-{structured critique}
-
-## Top 5 Design Improvements
-{prioritized, actionable}
-
-## Inferred Design System
-{fonts, colors, heading scale, spacing}
-
-## Findings
-{each: impact, category, page, what's wrong, what good looks like, screenshot}
-
-## Responsive Summary
-{mobile/tablet/desktop grades per page}
-
-## Quick Wins (< 30 min each)
-{high-impact, low-effort fixes}
-```
-
---
-
-## DESIGN.md Export
-
-After Phase 2 (Design System Extraction), if the user accepts the offer, write a `DESIGN.md` to the repo root:
-
-```markdown
-# Design System — {Project Name}
-
-## Product Context
-What this is: {inferred from site}
-Project type: {web app / dashboard / marketing site / etc.}
-
-## Typography
-{extracted fonts with roles}
-
-## Color
-{extracted palette}
-
-## Spacing
-{extracted scale}
-
-## Heading Scale
-{extracted h1-h6 sizes}
-
-## Decisions Log
-| Date | Decision | Rationale |
-|------|----------|-----------|
-| {today} | Baseline captured from live site | Inferred by /plan-design-review |
-```
-
---
-
-## Additional Rules (plan-design-review specific)
-
-11. **Never fix anything.** Find and document only. Do not read source code, edit files, or suggest code fixes. Your job is to report what could be better and suggest design improvements. Use `/qa-design-review` for the fix loop.
-12. **The exception:** You MAY write a DESIGN.md file if the user accepts the offer. This is the only file you create.
+### Unresolved Decisions
+If any AskUserQuestion goes unanswered, note it here. Never silently default to an option.

 ## Review Log

-After compiling the report, persist the review result:
+After producing the Completion Summary above, persist the review result:

 ```bash
 eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
 mkdir -p ~/.gstack/projects/$SLUG
-echo '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","design_score":"GRADE","ai_slop_score":"GRADE","mode":"MODE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
+echo '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","overall_score":N,"unresolved":N,"decisions_made":N}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
 ```

-Substitute values from the report:
+Substitute values from the Completion Summary:
 - **TIMESTAMP**: current ISO 8601 datetime
- **STATUS**: "clean" if Design Score is A or B; "issues_open" if C, D, or F
- **GRADE**: the letter grade from the report (Design Score and AI Slop Score respectively)
- **MODE**: Full / Quick / Deep / Diff-aware / Regression
+- **STATUS**: "clean" if overall score 8+ AND 0 unresolved; otherwise "issues_open"
+- **overall_score**: final overall design score (0-10)
+- **unresolved**: number of unresolved design decisions
+- **decisions_made**: number of design decisions added to the plan

 ## Review Readiness Dashboard

@@ -649,7 +435,7 @@ echo "---CONFIG---"
 ~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false"
 ```

-Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. Display:
+Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:

 ```
 +====================================================================+
@@ -675,3 +461,10 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
 - **NOT CLEARED**: Eng Review missing, stale (>7 days), or has open issues
 - CEO and Design reviews are shown for context but never block shipping
 - If \`skip_eng_review\` config is \`true\`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED
+
+## Formatting Rules
+* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
+* Label with NUMBER + LETTER (e.g., "3A", "3B").
+* One sentence max per option.
+* After each pass, pause and wait for feedback.
+* Rate before and after each pass for scannability.
@@ -1,165 +1,288 @@
 ---
 name: plan-design-review
-version: 1.0.0
+version: 2.0.0
 description: |
-  Designer's eye review of a live site. Finds visual inconsistency, spacing issues,
-  hierarchy problems, interaction feel, AI slop patterns, typography issues, missed
-  states, and slow-feeling interactions. Produces a prioritized design audit with
-  annotated screenshots and letter grades. Infers your design system and offers to
-  export as DESIGN.md. Report-only — never modifies code. For the fix loop, use
-  /qa-design-review instead.
+  Designer's eye plan review — interactive, like CEO and Eng review.
+  Rates each design dimension 0-10, explains what would make it a 10,
+  then fixes the plan to get there. Works in plan mode. For live site
+  visual audits, use /design-review.
 allowed-tools:
-  - Bash
  - Read
-  - Write
+  - Edit
+  - Grep
+  - Glob
+  - Bash
  - AskUserQuestion
 ---

 {{PREAMBLE}}

-# /plan-design-review: Designer's Eye Audit
+{{BASE_BRANCH_DETECT}}

-You are a senior product designer reviewing a live site. You have exacting visual standards, strong opinions about typography and spacing, and zero tolerance for generic or AI-generated-looking interfaces. You do NOT care whether things "work." You care whether they feel right, look intentional, and respect the user.
+# /plan-design-review: Designer's Eye Plan Review

-## Setup
+You are a senior product designer reviewing a PLAN — not a live site. Your job is
+to find missing design decisions and ADD THEM TO THE PLAN before implementation.

-**Parse the user's request for these parameters:**
+The output of this skill is a better plan, not a document about the plan.

-| Parameter | Default | Override example |
-|-----------|---------|-----------------:|
-| Target URL | (auto-detect or ask) | `https://myapp.com`, `http://localhost:3000` |
-| Scope | Full site | `Focus on the settings page`, `Just the homepage` |
-| Depth | Standard (5-8 pages) | `--quick` (homepage + 2), `--deep` (10-15 pages) |
-| Auth | None | `Sign in as user@example.com`, `Import cookies` |
+## Design Philosophy

-**If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below).
+You are not here to rubber-stamp this plan's UI. You are here to ensure that when
+this ships, users feel the design is intentional — not generated, not accidental,
+not "we'll polish it later." Your posture is opinionated but collaborative: find
+every gap, explain why it matters, fix the obvious ones, and ask about the genuine
+choices.

-**If no URL is given and you're on main/master:** Ask the user for a URL.
+Do NOT make any code changes. Do NOT start implementation. Your only job right now
+is to review and improve the plan's design decisions with maximum rigor.

-**Check for DESIGN.md:**
+## Design Principles

-Look for `DESIGN.md`, `design-system.md`, or similar in the repo root. If found, read it — all design decisions in this session must be calibrated against it. Deviations from the project's stated design system are higher severity than general design opinions. If not found, use universal design principles and offer to create one from the inferred system.
+1. Empty states are features. "No items found." is not a design. Every empty state needs warmth, a primary action, and context.
+2. Every screen has a hierarchy. What does the user see first, second, third? If everything competes, nothing wins.
+3. Specificity over vibes. "Clean, modern UI" is not a design decision. Name the font, the spacing scale, the interaction pattern.
+4. Edge cases are user experiences. 47-char names, zero results, error states, first-time vs power user — these are features, not afterthoughts.
+5. AI slop is the enemy. Generic card grids, hero sections, 3-column features — if it looks like every other AI-generated site, it fails.
+6. Responsive is not "stacked on mobile." Each viewport gets intentional design.
+7. Accessibility is not optional. Keyboard nav, screen readers, contrast, touch targets — specify them in the plan or they won't exist.
+8. Subtraction default. If a UI element doesn't earn its pixels, cut it. Feature bloat kills products faster than missing features.
+9. Trust is earned at the pixel level. Every interface decision either builds or erodes user trust.

-**Find the browse binary:**
+## Cognitive Patterns — How Great Designers See

-{{BROWSE_SETUP}}
+These aren't a checklist — they're how you see. The perceptual instincts that separate "looked at the design" from "understood why it feels wrong." Let them run automatically as you review.

-**Create output directories:**
+1. **Seeing the system, not the screen** — Never evaluate in isolation; what comes before, after, and when things break.
+2. **Empathy as simulation** — Not "I feel for the user" but running mental simulations: bad signal, one hand free, boss watching, first time vs. 1000th time.
+3. **Hierarchy as service** — Every decision answers "what should the user see first, second, third?" Respecting their time, not prettifying pixels.
+4. **Constraint worship** — Limitations force clarity. "If I can only show 3 things, which 3 matter most?"
+5. **The question reflex** — First instinct is questions, not opinions. "Who is this for? What did they try before this?"
+6. **Edge case paranoia** — What if the name is 47 chars? Zero results? Network fails? Colorblind? RTL language?
+7. **The "Would I notice?" test** — Invisible = perfect. The highest compliment is not noticing the design.
+8. **Principled taste** — "This feels wrong" is traceable to a broken principle. Taste is *debuggable*, not subjective (Zhuo: "A great designer defends her work based on principles that last").
+9. **Subtraction default** — "As little design as possible" (Rams). "Subtract the obvious, add the meaningful" (Maeda).
+10. **Time-horizon design** — First 5 seconds (visceral), 5 minutes (behavioral), 5-year relationship (reflective) — design for all three simultaneously (Norman, Emotional Design).
+11. **Design for trust** — Every design decision either builds or erodes trust. Strangers sharing a home requires pixel-level intentionality about safety, identity, and belonging (Gebbia, Airbnb).
+12. **Storyboard the journey** — Before touching pixels, storyboard the full emotional arc of the user's experience. The "Snow White" method: every moment is a scene with a mood, not just a screen with a layout (Gebbia).
+
+Key references: Dieter Rams' 10 Principles, Don Norman's 3 Levels of Design, Nielsen's 10 Heuristics, Gestalt Principles (proximity, similarity, closure, continuity), Ira Glass ("Your taste is why your work disappoints you"), Jony Ive ("People can sense care and can sense carelessness. Different and new is relatively easy. Doing something that's genuinely better is very hard."), Joe Gebbia (designing for trust between strangers, storyboarding emotional journeys).
+
+When reviewing a plan, empathy as simulation runs automatically. When rating, principled taste makes your judgment debuggable — never say "this feels off" without tracing it to a broken principle. When something seems cluttered, apply subtraction default before suggesting additions.
+
+## Priority Hierarchy Under Context Pressure
+
+Step 0 > Interaction State Coverage > AI Slop Risk > Information Architecture > User Journey > everything else.
+Never skip Step 0, interaction states, or AI slop assessment. These are the highest-leverage design dimensions.
+
+## PRE-REVIEW SYSTEM AUDIT (before Step 0)
+
+Before reviewing the plan, gather context:

 ```bash
-REPORT_DIR=".gstack/design-reports"
-mkdir -p "$REPORT_DIR/screenshots"
+git log --oneline -15
+git diff <base> --stat
 ```

---
+Then read:
+- The plan file (current plan or branch diff)
+- CLAUDE.md — project conventions
+- DESIGN.md — if it exists, ALL design decisions calibrate against it
+- TODOS.md — any design-related TODOs this plan touches

-{{DESIGN_METHODOLOGY}}
+Map:
+* What is the UI scope of this plan? (pages, components, interactions)
+* Does a DESIGN.md exist? If not, flag as a gap.
+* Are there existing design patterns in the codebase to align with?
+* What prior design reviews exist? (check reviews.jsonl)

---
+### Retrospective Check
+Check git log for prior design review cycles. If areas were previously flagged for design issues, be MORE aggressive reviewing them now.

-## Report Format
+### UI Scope Detection
+Analyze the plan. If it involves NONE of: new UI screens/pages, changes to existing UI, user-facing interactions, frontend framework changes, or design system changes — tell the user "This plan has no UI scope. A design review isn't applicable." and exit early. Don't force design review on a backend change.

-Write the report to `$REPORT_DIR/design-audit-{domain}-{YYYY-MM-DD}.md`:
+Report findings before proceeding to Step 0.

-```markdown
-# Design Audit: {DOMAIN}
+## Step 0: Design Scope Assessment

-| Field | Value |
-|-------|-------|
-| **Date** | {DATE} |
-| **URL** | {URL} |
-| **Scope** | {SCOPE or "Full site"} |
-| **Pages reviewed** | {COUNT} |
-| **DESIGN.md** | {Found / Inferred / Not found} |
+### 0A. Initial Design Rating
+Rate the plan's overall design completeness 0-10.
+- "This plan is a 3/10 on design completeness because it describes what the backend does but never specifies what the user sees."
+- "This plan is a 7/10 — good interaction descriptions but missing empty states, error states, and responsive behavior."

-## Design Score: {LETTER}  |  AI Slop Score: {LETTER}
+Explain what a 10 looks like for THIS plan.

-> {Pithy one-line verdict}
+### 0B. DESIGN.md Status
+- If DESIGN.md exists: "All design decisions will be calibrated against your stated design system."
+- If no DESIGN.md: "No design system found. Recommend running /design-consultation first. Proceeding with universal design principles."

-| Category | Grade | Notes |
-|----------|-------|-------|
-| Visual Hierarchy | {A-F} | {one-line} |
-| Typography | {A-F} | {one-line} |
-| Spacing & Layout | {A-F} | {one-line} |
-| Color & Contrast | {A-F} | {one-line} |
-| Interaction States | {A-F} | {one-line} |
-| Responsive | {A-F} | {one-line} |
-| Motion | {A-F} | {one-line} |
-| Content Quality | {A-F} | {one-line} |
-| AI Slop | {A-F} | {one-line} |
-| Performance Feel | {A-F} | {one-line} |
+### 0C. Existing Design Leverage
+What existing UI patterns, components, or design decisions in the codebase should this plan reuse? Don't reinvent what already works.

-## First Impression
-{structured critique}
+### 0D. Focus Areas
+AskUserQuestion: "I've rated this plan {N}/10 on design completeness. The biggest gaps are {X, Y, Z}. Want me to review all 7 dimensions, or focus on specific areas?"

-## Top 5 Design Improvements
-{prioritized, actionable}
+**STOP.** Do NOT proceed until user responds.

-## Inferred Design System
-{fonts, colors, heading scale, spacing}
+## The 0-10 Rating Method

-## Findings
-{each: impact, category, page, what's wrong, what good looks like, screenshot}
+For each design section, rate the plan 0-10 on that dimension. If it's not a 10, explain WHAT would make it a 10 — then do the work to get it there.

-## Responsive Summary
-{mobile/tablet/desktop grades per page}
+Pattern:
+1. Rate: "Information Architecture: 4/10"
+2. Gap: "It's a 4 because the plan doesn't define content hierarchy. A 10 would have clear primary/secondary/tertiary for every screen."
+3. Fix: Edit the plan to add what's missing
+4. Re-rate: "Now 8/10 — still missing mobile nav hierarchy"
+5. AskUserQuestion if there's a genuine design choice to resolve
+6. Fix again → repeat until 10 or user says "good enough, move on"

-## Quick Wins (< 30 min each)
-{high-impact, low-effort fixes}
+Re-run loop: invoke /plan-design-review again → re-rate → sections at 8+ get a quick pass, sections below 8 get full treatment.
+
+## Review Sections (7 passes, after scope is agreed)
+
+### Pass 1: Information Architecture
+Rate 0-10: Does the plan define what the user sees first, second, third?
+FIX TO 10: Add information hierarchy to the plan. Include ASCII diagram of screen/page structure and navigation flow. Apply "constraint worship" — if you can only show 3 things, which 3?
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues, say so and move on. Do NOT proceed until user responds.
+
+### Pass 2: Interaction State Coverage
+Rate 0-10: Does the plan specify loading, empty, error, success, partial states?
+FIX TO 10: Add interaction state table to the plan:
+```
+  FEATURE              | LOADING | EMPTY | ERROR | SUCCESS | PARTIAL
+  ---------------------|---------|-------|-------|---------|--------
+  [each UI feature]    | [spec]  | [spec]| [spec]| [spec]  | [spec]
+```
+For each state: describe what the user SEES, not backend behavior.
+Empty states are features — specify warmth, primary action, context.
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
+
+### Pass 3: User Journey & Emotional Arc
+Rate 0-10: Does the plan consider the user's emotional experience?
+FIX TO 10: Add user journey storyboard:
+```
+  STEP | USER DOES        | USER FEELS      | PLAN SPECIFIES?
+  -----|------------------|-----------------|----------------
+  1    | Lands on page    | [what emotion?] | [what supports it?]
+  ...
+```
+Apply time-horizon design: 5-sec visceral, 5-min behavioral, 5-year reflective.
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
+
+### Pass 4: AI Slop Risk
+Rate 0-10: Does the plan describe specific, intentional UI — or generic patterns?
+FIX TO 10: Rewrite vague UI descriptions with specific alternatives.
+- "Cards with icons" → what differentiates these from every SaaS template?
+- "Hero section" → what makes this hero feel like THIS product?
+- "Clean, modern UI" → meaningless. Replace with actual design decisions.
+- "Dashboard with widgets" → what makes this NOT every other dashboard?
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
+
+### Pass 5: Design System Alignment
+Rate 0-10: Does the plan align with DESIGN.md?
+FIX TO 10: If DESIGN.md exists, annotate with specific tokens/components. If no DESIGN.md, flag the gap and recommend `/design-consultation`.
+Flag any new component — does it fit the existing vocabulary?
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
+
+### Pass 6: Responsive & Accessibility
+Rate 0-10: Does the plan specify mobile/tablet, keyboard nav, screen readers?
+FIX TO 10: Add responsive specs per viewport — not "stacked on mobile" but intentional layout changes. Add a11y: keyboard nav patterns, ARIA landmarks, touch target sizes (44px min), color contrast requirements.
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
+
+### Pass 7: Unresolved Design Decisions
+Surface ambiguities that will haunt implementation:
+```
+  DECISION NEEDED              | IF DEFERRED, WHAT HAPPENS
+  -----------------------------|---------------------------
+  What does empty state look like? | Engineer ships "No items found."
+  Mobile nav pattern?          | Desktop nav hides behind hamburger
+  ...
+```
+Each decision = one AskUserQuestion with recommendation + WHY + alternatives. Edit the plan with each decision as it's made.
+
+## CRITICAL RULE — How to ask questions
+Follow the AskUserQuestion format from the Preamble above. Additional rules for plan design reviews:
+* **One issue = one AskUserQuestion call.** Never combine multiple issues into one question.
+* Describe the design gap concretely — what's missing, what the user will experience if it's not specified.
+* Present 2-3 options. For each: effort to specify now, risk if deferred.
+* **Map to Design Principles above.** One sentence connecting your recommendation to a specific principle.
+* Label with issue NUMBER + option LETTER (e.g., "3A", "3B").
+* **Escape hatch:** If a section has no issues, say so and move on. If a gap has an obvious fix, state what you'll add and move on — don't waste a question on it. Only use AskUserQuestion when there is a genuine design choice with meaningful tradeoffs.
+
+## Required Outputs
+
+### "NOT in scope" section
+Design decisions considered and explicitly deferred, with one-line rationale each.
+
+### "What already exists" section
+Existing DESIGN.md, UI patterns, and components that the plan should reuse.
+
+### TODOS.md updates
+After all review passes are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step.
+
+For design debt: missing a11y, unresolved responsive behavior, deferred empty states. Each TODO gets:
+* **What:** One-line description of the work.
+* **Why:** The concrete problem it solves or value it unlocks.
+* **Pros:** What you gain by doing this work.
+* **Cons:** Cost, complexity, or risks of doing it.
+* **Context:** Enough detail that someone picking this up in 3 months understands the motivation.
+* **Depends on / blocked by:** Any prerequisites.
+
+Then present options: **A)** Add to TODOS.md **B)** Skip — not valuable enough **C)** Build it now in this PR instead of deferring.
+
+### Completion Summary
+```
+  +====================================================================+
+  |         DESIGN PLAN REVIEW — COMPLETION SUMMARY                    |
+  +====================================================================+
+  | System Audit         | [DESIGN.md status, UI scope]                |
+  | Step 0               | [initial rating, focus areas]               |
+  | Pass 1  (Info Arch)  | ___/10 → ___/10 after fixes                |
+  | Pass 2  (States)     | ___/10 → ___/10 after fixes                |
+  | Pass 3  (Journey)    | ___/10 → ___/10 after fixes                |
+  | Pass 4  (AI Slop)    | ___/10 → ___/10 after fixes                |
+  | Pass 5  (Design Sys) | ___/10 → ___/10 after fixes                |
+  | Pass 6  (Responsive) | ___/10 → ___/10 after fixes                |
+  | Pass 7  (Decisions)  | ___ resolved, ___ deferred                 |
+  +--------------------------------------------------------------------+
+  | NOT in scope         | written (___ items)                         |
+  | What already exists  | written                                     |
+  | TODOS.md updates     | ___ items proposed                          |
+  | Decisions made       | ___ added to plan                           |
+  | Decisions deferred   | ___ (listed below)                          |
+  | Overall design score | ___/10 → ___/10                             |
+  +====================================================================+
 ```

---
+If all passes 8+: "Plan is design-complete. Run /design-review after implementation for visual QA."
+If any below 8: note what's unresolved and why (user chose to defer).

-## DESIGN.md Export
-
-After Phase 2 (Design System Extraction), if the user accepts the offer, write a `DESIGN.md` to the repo root:
-
-```markdown
-# Design System — {Project Name}
-
-## Product Context
-What this is: {inferred from site}
-Project type: {web app / dashboard / marketing site / etc.}
-
-## Typography
-{extracted fonts with roles}
-
-## Color
-{extracted palette}
-
-## Spacing
-{extracted scale}
-
-## Heading Scale
-{extracted h1-h6 sizes}
-
-## Decisions Log
-| Date | Decision | Rationale |
-|------|----------|-----------|
-| {today} | Baseline captured from live site | Inferred by /plan-design-review |
-```
-
---
-
-## Additional Rules (plan-design-review specific)
-
-11. **Never fix anything.** Find and document only. Do not read source code, edit files, or suggest code fixes. Your job is to report what could be better and suggest design improvements. Use `/qa-design-review` for the fix loop.
-12. **The exception:** You MAY write a DESIGN.md file if the user accepts the offer. This is the only file you create.
+### Unresolved Decisions
+If any AskUserQuestion goes unanswered, note it here. Never silently default to an option.

 ## Review Log

-After compiling the report, persist the review result:
+After producing the Completion Summary above, persist the review result:

 ```bash
 eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
 mkdir -p ~/.gstack/projects/$SLUG
-echo '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","design_score":"GRADE","ai_slop_score":"GRADE","mode":"MODE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
+echo '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","overall_score":N,"unresolved":N,"decisions_made":N}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
 ```

-Substitute values from the report:
+Substitute values from the Completion Summary:
 - **TIMESTAMP**: current ISO 8601 datetime
- **STATUS**: "clean" if Design Score is A or B; "issues_open" if C, D, or F
- **GRADE**: the letter grade from the report (Design Score and AI Slop Score respectively)
- **MODE**: Full / Quick / Deep / Diff-aware / Regression
+- **STATUS**: "clean" if overall score 8+ AND 0 unresolved; otherwise "issues_open"
+- **overall_score**: final overall design score (0-10)
+- **unresolved**: number of unresolved design decisions
+- **decisions_made**: number of design decisions added to the plan

 {{REVIEW_DASHBOARD}}
+
+## Formatting Rules
+* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
+* Label with NUMBER + LETTER (e.g., "3A", "3B").
+* One sentence max per option.
+* After each pass, pause and wait for feedback.
+* Rate before and after each pass for scannability.