feat: interactive /plan-design-review + CEO invokes designer + 100% coverage (v0.6.4) (#149)

* refactor: rename qa-design-review → design-review

The "qa-" prefix was confusing — this is the live-site design audit with
fix loop, not a QA-only report. Rename directory and update all references
across docs, tests, scripts, and skill templates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: interactive /plan-design-review + CEO invokes designer

Rewrite /plan-design-review from report-only grading to an interactive
plan-fixer that rates each design dimension 0-10, explains what a 10
looks like, and edits the plan to get there. Parallel structure with
/plan-ceo-review and /plan-eng-review — one issue = one AskUserQuestion.

CEO review now detects UI scope and invokes the designer perspective
when the plan has frontend/UX work, so you get design review
automatically when it matters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: validation + touchfile entries for 100% coverage

Add design-consultation to command/snapshot flag validation. Add 4
skills to contributor mode validation (plan-design-review,
design-review, design-consultation, document-release). Add 2 templates
to hardcoded branch check. Register touchfile entries for 10 new
LLM-judge tests and 1 new E2E test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: LLM-judge for 10 skills + gstack-upgrade E2E

Add LLM-judge quality evals for all uncovered skills using a DRY
runWorkflowJudge helper with section marker guards. Add real E2E
test for gstack-upgrade using mock git remote (replaces test.todo).
Add plan-edit assertion to plan-design-review E2E.

14/15 skills now at full coverage. setup-browser-cookies remains
deferred (needs real browser).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add bisect commit style to CLAUDE.md

All commits should be single logical changes, split before pushing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.6.4.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-17 22:48:48 -05:00
committed by GitHub
parent f91222f5bd
commit 78c207efb4
24 changed files with 1120 additions and 765 deletions
+1 -1
View File
@@ -219,7 +219,7 @@ eval $(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)
4. **Apply the design checklist** against the changed files. For each item:
- **[HIGH] mechanical CSS fix** (`outline: none`, `!important`, `font-size < 16px`): classify as AUTO-FIX
- **[HIGH/MEDIUM] design judgment needed**: classify as ASK
- **[LOW] intent-based detection**: present as "Possible — verify visually or run /qa-design-review"
- **[LOW] intent-based detection**: present as "Possible — verify visually or run /design-review"
5. **Include findings** in the review output under a "Design Review" header, following the output format in the checklist. Design findings merge with code review findings into the same Fix-First flow.
+3 -3
View File
@@ -24,7 +24,7 @@ Each item is tagged with a detection confidence level:
- **[HIGH]** — Reliably detectable via grep/pattern match. Definitive findings.
- **[MEDIUM]** — Detectable via pattern aggregation or heuristic. Flag as findings but expect some noise.
- **[LOW]** — Requires understanding visual intent. Present as: "Possible issue — verify visually or run /qa-design-review."
- **[LOW]** — Requires understanding visual intent. Present as: "Possible issue — verify visually or run /design-review."
---
@@ -38,7 +38,7 @@ Each item is tagged with a detection confidence level:
**ASK** (everything else — requires design judgment):
- All AI slop findings, typography structure, spacing choices, interaction state gaps, DESIGN.md violations
**LOW confidence items** → present as "Possible: [description]. Verify visually or run /qa-design-review." Never AUTO-FIX.
**LOW confidence items** → present as "Possible: [description]. Verify visually or run /design-review." Never AUTO-FIX.
---
@@ -55,7 +55,7 @@ Design Review: N issues (X auto-fixable, Y need input, Z possible)
Recommended fix: suggested fix
**POSSIBLE (verify visually):**
- [file:line] Possible issue — verify with /qa-design-review
- [file:line] Possible issue — verify with /design-review
```
If no issues found: `Design Review: No issues found.`