feat: interactive /plan-design-review + CEO invokes designer + 100% coverage (v0.6.4) (#149)

* refactor: rename qa-design-review → design-review

The "qa-" prefix was confusing — this is the live-site design audit with
fix loop, not a QA-only report. Rename directory and update all references
across docs, tests, scripts, and skill templates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: interactive /plan-design-review + CEO invokes designer

Rewrite /plan-design-review from report-only grading to an interactive
plan-fixer that rates each design dimension 0-10, explains what a 10
looks like, and edits the plan to get there. Parallel structure with
/plan-ceo-review and /plan-eng-review — one issue = one AskUserQuestion.

CEO review now detects UI scope and invokes the designer perspective
when the plan has frontend/UX work, so you get design review
automatically when it matters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: validation + touchfile entries for 100% coverage

Add design-consultation to command/snapshot flag validation. Add 4
skills to contributor mode validation (plan-design-review,
design-review, design-consultation, document-release). Add 2 templates
to hardcoded branch check. Register touchfile entries for 10 new
LLM-judge tests and 1 new E2E test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: LLM-judge for 10 skills + gstack-upgrade E2E

Add LLM-judge quality evals for all uncovered skills using a DRY
runWorkflowJudge helper with section marker guards. Add real E2E
test for gstack-upgrade using mock git remote (replaces test.todo).
Add plan-edit assertion to plan-design-review E2E.

14/15 skills now at full coverage. setup-browser-cookies remains
deferred (needs real browser).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add bisect commit style to CLAUDE.md

All commits should be single logical changes, split before pushing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.6.4.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-17 22:48:48 -05:00
committed by GitHub
parent f91222f5bd
commit 78c207efb4
24 changed files with 1120 additions and 765 deletions
+2 -2
View File
@@ -227,7 +227,7 @@ If the Eng Review is NOT "CLEAR":
- RECOMMENDATION: Choose C if the change is obviously trivial (< 20 lines, typo fix, config-only); Choose B for larger changes
- Options: A) Ship anyway B) Abort — run /plan-eng-review first C) Change is too small to need eng review
- If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block
- For Design Review: run `eval $(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /plan-design-review for a full visual audit." Still never block.
- For Design Review: run `eval $(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block.
3. **If the user chooses A or C,** persist the decision so future `/ship` runs on this branch skip the gate:
```bash
@@ -664,7 +664,7 @@ eval $(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)
4. **Apply the design checklist** against the changed files. For each item:
- **[HIGH] mechanical CSS fix** (`outline: none`, `!important`, `font-size < 16px`): classify as AUTO-FIX
- **[HIGH/MEDIUM] design judgment needed**: classify as ASK
- **[LOW] intent-based detection**: present as "Possible — verify visually or run /qa-design-review"
- **[LOW] intent-based detection**: present as "Possible — verify visually or run /design-review"
5. **Include findings** in the review output under a "Design Review" header, following the output format in the checklist. Design findings merge with code review findings into the same Fix-First flow.
+1 -1
View File
@@ -70,7 +70,7 @@ If the Eng Review is NOT "CLEAR":
- RECOMMENDATION: Choose C if the change is obviously trivial (< 20 lines, typo fix, config-only); Choose B for larger changes
- Options: A) Ship anyway B) Abort — run /plan-eng-review first C) Change is too small to need eng review
- If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block
- For Design Review: run `eval $(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /plan-design-review for a full visual audit." Still never block.
- For Design Review: run `eval $(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block.
3. **If the user chooses A or C,** persist the decision so future `/ship` runs on this branch skip the gate:
```bash