diff --git a/CHANGELOG.md b/CHANGELOG.md index 5df40ae8..04f690a3 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,14 @@ # Changelog +## [0.6.3.0] - 2026-03-17 + +### Added + +- **Every PR touching frontend code now gets a design review automatically.** `/review` and `/ship` apply a 20-item design checklist against changed CSS, HTML, JSX, and view files. Catches AI slop patterns (purple gradients, 3-column icon grids, generic hero copy), typography issues (body text < 16px, blacklisted fonts), accessibility gaps (`outline: none`), and `!important` abuse. Mechanical CSS fixes are auto-applied; design judgment calls ask you first. +- **`gstack-diff-scope` categorizes what changed in your branch.** Run `eval $(gstack-diff-scope main)` and get `SCOPE_FRONTEND=true/false`, `SCOPE_BACKEND`, `SCOPE_PROMPTS`, `SCOPE_TESTS`, `SCOPE_DOCS`, `SCOPE_CONFIG`. Design review uses it to skip silently on backend-only PRs. Ship pre-flight uses it to recommend design review when frontend files are touched. +- **Design review shows up in the Review Readiness Dashboard.** The dashboard now distinguishes between "LITE" (code-level, runs automatically in /review and /ship) and "FULL" (visual audit via /plan-design-review with browse binary). Both show up as Design Review entries. +- **E2E eval for design review detection.** Planted CSS/HTML fixtures with 7 known anti-patterns (Papyrus font, 14px body text, `outline: none`, `!important`, purple gradient, generic hero copy, 3-column feature grid). The eval verifies `/review` catches at least 4 of 7. + ## [0.6.2.0] - 2026-03-17 ### Added diff --git a/TODOS.md b/TODOS.md index 8616f906..5f771de7 100644 --- a/TODOS.md +++ b/TODOS.md @@ -444,17 +444,17 @@ Shipped as `/design-consultation` on garrytan/design branch. Renamed from `/setu ## Ship Confidence Dashboard -### Smart review relevance detection +### Smart review relevance detection — PARTIALLY SHIPPED -**What:** Auto-detect which of the 4 reviews are relevant based on branch changes (skip Design Review if no CSS/view changes, skip Code Review if plan-only). +~~**What:** Auto-detect which of the 4 reviews are relevant based on branch changes (skip Design Review if no CSS/view changes, skip Code Review if plan-only).~~ -**Why:** Currently dashboard always shows 4 rows. On docs-only changes, "Design Review: NOT YET RUN" is noise. +`bin/gstack-diff-scope` shipped — categorizes diff into SCOPE_FRONTEND, SCOPE_BACKEND, SCOPE_PROMPTS, SCOPE_TESTS, SCOPE_DOCS, SCOPE_CONFIG. Used by design-review-lite to skip when no frontend files changed. Dashboard integration for conditional row display is a follow-up. -**Context:** /plan-design-review and /qa already do file-type detection in diff-aware mode. Could reuse that heuristic. Would require a `gstack-diff-scope` helper or enriching `gstack-slug` to also output change categories. +**Remaining:** Dashboard conditional row display (hide "Design Review: NOT YET RUN" when SCOPE_FRONTEND=false). Extend to Eng Review (skip for docs-only) and CEO Review (skip for config-only). -**Effort:** M +**Effort:** S **Priority:** P3 -**Depends on:** Ship Confidence Dashboard (shipped) +**Depends on:** gstack-diff-scope (shipped) ### /merge skill — review-gated PR merge diff --git a/VERSION b/VERSION index e1e48733..1e40a508 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.6.2.0 +0.6.3.0 diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md index b44dff6d..55238e3f 100644 --- a/plan-ceo-review/SKILL.md +++ b/plan-ceo-review/SKILL.md @@ -702,7 +702,7 @@ echo "---CONFIG---" ~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false" ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md index a8f3498e..1d821bf2 100644 --- a/plan-design-review/SKILL.md +++ b/plan-design-review/SKILL.md @@ -645,7 +645,7 @@ echo "---CONFIG---" ~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false" ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md index 05c74ba7..48fe7230 100644 --- a/plan-eng-review/SKILL.md +++ b/plan-eng-review/SKILL.md @@ -348,7 +348,7 @@ echo "---CONFIG---" ~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false" ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ diff --git a/review/SKILL.md b/review/SKILL.md index 186978ef..5d734594 100644 --- a/review/SKILL.md +++ b/review/SKILL.md @@ -196,6 +196,47 @@ Follow the output format specified in the checklist. Respect the suppressions --- +## Step 4.5: Design Review (conditional) + +## Design Review (conditional, diff-scoped) + +Check if the diff touches frontend files using `gstack-diff-scope`: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-diff-scope 2>/dev/null) +``` + +**If `SCOPE_FRONTEND=false`:** Skip design review silently. No output. + +**If `SCOPE_FRONTEND=true`:** + +1. **Check for DESIGN.md.** If `DESIGN.md` or `design-system.md` exists in the repo root, read it. All design findings are calibrated against it — patterns blessed in DESIGN.md are not flagged. If not found, use universal design principles. + +2. **Read `.claude/skills/review/design-checklist.md`.** If the file cannot be read, skip design review with a note: "Design checklist not found — skipping design review." + +3. **Read each changed frontend file** (full file, not just diff hunks). Frontend files are identified by the patterns listed in the checklist. + +4. **Apply the design checklist** against the changed files. For each item: + - **[HIGH] mechanical CSS fix** (`outline: none`, `!important`, `font-size < 16px`): classify as AUTO-FIX + - **[HIGH/MEDIUM] design judgment needed**: classify as ASK + - **[LOW] intent-based detection**: present as "Possible — verify visually or run /qa-design-review" + +5. **Include findings** in the review output under a "Design Review" header, following the output format in the checklist. Design findings merge with code review findings into the same Fix-First flow. + +6. **Log the result** for the Review Readiness Dashboard: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +mkdir -p ~/.gstack/projects/$SLUG +echo '{"skill":"design-review-lite","timestamp":"TIMESTAMP","status":"STATUS","findings":N,"auto_fixed":M}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl +``` + +Substitute: TIMESTAMP = ISO 8601 datetime, STATUS = "clean" if 0 findings or "issues_found", N = total findings, M = auto-fixed count. + +Include any design findings alongside the findings from Step 4. They follow the same Fix-First flow in Step 5 — AUTO-FIX for mechanical CSS fixes, ASK for everything else. + +--- + ## Step 5: Fix-First Review **Every finding gets action — not just critical ones.** diff --git a/ship/SKILL.md b/ship/SKILL.md index e2b524d9..486f32bf 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -186,7 +186,7 @@ echo "---CONFIG---" ~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false" ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ @@ -226,7 +226,8 @@ If the Eng Review is NOT "CLEAR": - Show that Eng Review is missing or has open issues - RECOMMENDATION: Choose C if the change is obviously trivial (< 20 lines, typo fix, config-only); Choose B for larger changes - Options: A) Ship anyway B) Abort — run /plan-eng-review first C) Change is too small to need eng review - - If CEO/Design reviews are missing, mention them as informational ("CEO Review not run — recommended for product changes") but do NOT block or recommend aborting for them + - If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block + - For Design Review: run `eval $(~/.claude/skills/gstack/bin/gstack-diff-scope 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /plan-design-review for a full visual audit." Still never block. 3. **If the user chooses A or C,** persist the decision so future `/ship` runs on this branch skip the gate: ```bash @@ -642,6 +643,43 @@ Review the diff for structural issues that tests don't catch. - **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary - **Pass 2 (INFORMATIONAL):** All remaining categories +## Design Review (conditional, diff-scoped) + +Check if the diff touches frontend files using `gstack-diff-scope`: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-diff-scope 2>/dev/null) +``` + +**If `SCOPE_FRONTEND=false`:** Skip design review silently. No output. + +**If `SCOPE_FRONTEND=true`:** + +1. **Check for DESIGN.md.** If `DESIGN.md` or `design-system.md` exists in the repo root, read it. All design findings are calibrated against it — patterns blessed in DESIGN.md are not flagged. If not found, use universal design principles. + +2. **Read `.claude/skills/review/design-checklist.md`.** If the file cannot be read, skip design review with a note: "Design checklist not found — skipping design review." + +3. **Read each changed frontend file** (full file, not just diff hunks). Frontend files are identified by the patterns listed in the checklist. + +4. **Apply the design checklist** against the changed files. For each item: + - **[HIGH] mechanical CSS fix** (`outline: none`, `!important`, `font-size < 16px`): classify as AUTO-FIX + - **[HIGH/MEDIUM] design judgment needed**: classify as ASK + - **[LOW] intent-based detection**: present as "Possible — verify visually or run /qa-design-review" + +5. **Include findings** in the review output under a "Design Review" header, following the output format in the checklist. Design findings merge with code review findings into the same Fix-First flow. + +6. **Log the result** for the Review Readiness Dashboard: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +mkdir -p ~/.gstack/projects/$SLUG +echo '{"skill":"design-review-lite","timestamp":"TIMESTAMP","status":"STATUS","findings":N,"auto_fixed":M}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl +``` + +Substitute: TIMESTAMP = ISO 8601 datetime, STATUS = "clean" if 0 findings or "issues_found", N = total findings, M = auto-fixed count. + + Include any design findings alongside the code review findings. They follow the same Fix-First flow below. + 4. **Classify each finding as AUTO-FIX or ASK** per the Fix-First Heuristic in checklist.md. Critical findings lean toward ASK; informational lean toward AUTO-FIX. @@ -863,7 +901,11 @@ gh pr create --base --title ": " --body "$(cat <<'EOF' ## Pre-Landing Review - + + +## Design Review + + ## Eval Results