mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-08 19:13:56 +02:00
cab774cced
* refactor(plan-ceo-review): carve review body into on-demand section
Carve the largest skill (138,838 B) into a skeleton + one on-demand
section, the documented next Phase B target after /ship (v2_PLAN.md:216).
- sections/review-sections.md(.tmpl): the 11-section deep review, codex/
outside-voice rules, how-to-ask, Required Outputs, registries, Completion
Summary, Review Log, REVIEW_DASHBOARD, PLAN_FILE_REVIEW_REPORT, Next Steps,
docs/designs promotion, Formatting Rules, and the Mode Quick Reference.
- sections/manifest.json: passive registry (CM2), one entry.
- SKILL.md.tmpl: {{SECTION_INDEX}} after the system audit, a single
{{SECTION:review-sections}} STOP-Read after Step 0 mode selection, and a
Section self-check. All of Step 0 (the scope/mode conversation) stays in
the always-loaded skeleton; only EXIT_PLAN_MODE_GATE follows the section.
Measured: always-loaded skeleton 138,838 -> 80,731 B (-42%, ~14.4K tokens
off every invocation). Union (skeleton + section) 139,110 B, behavior held.
Boundary honors Codex P1: nothing review-governing (formatting rules, mode
reference, how-to-ask, required outputs) sits in the skeleton below the
STOP. Housekeeping resolvers ride in the section, matching the ship
precedent (adversarial.md carries LEARNINGS_LOG + GBRAIN_SAVE_RESULTS).
Tests (atomic with the carve — skill-docs.yml gates gen:skill-docs
freshness on every push, so source + regen + tests must land together):
- parity-harness: plan-ceo flipped to sectioned, maxSkeletonBytes 90_000
(measured 80,731 + headroom); content/minBytes run against the union.
- skill-size-budget: plan-ceo-review added to SECTIONS_EXTRACTED.
- section-manifest-consistency: generalized to discover every carved skill,
vars computed per-skill-case (Codex P2).
- skill-ceo-section-ordering (new, gate): per-PR static guard — STOP after
Step 0, review body absent from skeleton, report writer in the section,
nothing review-governing below the STOP.
- skill-e2e-plan-ceo-review-section-loading (new, periodic): refreshes the
installed skill first (Codex P1), drives full Step 0, asserts the section
is Read before the report.
- gen-skill-docs + skill-validation: read the skeleton+sections union for
carved skills so relocated prose still counts.
- touchfiles: plan-ceo-section-loading registered (periodic).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* chore: bump VERSION + CHANGELOG for plan-ceo-review carve (v1.56.0.0)
MINOR: carves the largest skill into skeleton + on-demand section,
dropping plan-ceo-review's always-loaded cost 42% (138,838 -> 80,731 B,
~14.4K tokens off every invocation). User-facing release notes lead with
the measured token win.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(todos): file P3 follow-up — carve the shared {{PREAMBLE}} reference blocks
Surfaced by /plan-eng-review on the plan-ceo-review carve: per-skill section
carves stay modest because the ~40-50KB shared preamble dominates the
always-loaded surface. A single preamble-reference carve would help every
tier->=2 skill at once. Records the why, the cold-vs-hot split to measure,
and the guards it needs. Not implemented this PR.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(auq): Layer 0 — guarantee AUQ format spec is always-loaded
Deterministic, free, per-PR keystone for the token-reduction era. For every
interactive (tier>=2) skill, asserts the full AskUserQuestion decision-brief
format (ELI10/Recommendation/Pros-cons/checks/Net/(recommended)/Stakes/
self-check) lives in the always-loaded SKILL.md skeleton, NOT only in an
on-demand section. Plus a roster guard (a carve can't silently drop the block)
and per-skill rule survival in the skeleton+sections union. 51 cases + a
negative control. Fails the instant a future carve strands AUQ-governing text
where it won't be loaded when a question fires.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(auq): SDK capture engine + verbose-vs-carved no-degradation A/B
Adds the reusable SDK $OUT_FILE capture engine (auq-sdk-capture.ts): drives a
skill to its AUQ and captures the verbatim text the model GENERATES, cleanly
(real-PTY mangles plan-mode AUQs via cursor escapes). Pins the skill to an
absolute path with Read/Write-only tools so the agent can't wander to the
global install. gradeAuqRecommendation normalizes a non-"because" connective
before grading so substantive reasons aren't false-flagged (without touching
the pinned shared judge).
The A/B drives the same prompt through the carved 80KB skeleton and the
pre-carve 137KB monolith and fails if carved scores worse. Result: both 7/7
format, substance 5 — proven no degradation, transcript-verified each side read
its own planted SKILL.md. Periodic tier.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(auq): consistency — same trigger N runs, stable format + substance
Drives the carved /plan-ceo-review AUQ N=3 times and fails if any format
element appears in one run but not another, or substance craters. Targets the
"fine one run, broken the next" failure class a single snapshot can't see.
Result: 3/3 stable, 7/7 + substance 5 every run. Periodic tier.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(auq): behavioral matrix across AUQ-heavy skills
Data-driven test that drives each AUQ-heavy skill (plan-eng/design/devex,
office-hours, cso, spec, design-consultation) to its first AskUserQuestion and
grades it to the plan-ceo bar: 7/7 decision-brief format + recommendation
substance >=4. One case per skill (isolated failures), env-subsettable via
AUQ_MATRIX_ONLY. Browser/design-binary skills are intentionally excluded
(comparison boards, not format-AUQs; Layer 0 covers their spec). All targeted
skills pass 7/7 with substance 4-5. Periodic tier.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(codex): live recommendation-substance grade for /codex
Closes the gap where /codex's synthesis recommendation was only checked
statically (template grep) and via fixtures. Drives the real /codex skill over
a flawed diff and grades the emitted "Recommendation: ... because ..." line
with judgeRecommendation (present/commits/has_because/substance>=4). The named
weak spot holds up: substance 5. Periodic tier.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(auq): deterministic trigger for format-compliance gate
A bare /plan-ceo-review against a repo whose work is already implemented makes
the model improvise an off-script "what should I review?" scope question that
skips the decision-brief format, which the gate test then times out waiting for.
Hand it a concrete plan to review (FORCING_FLOOR_CEO) so it reaches the real
Step 0 mode-selection AUQ that is the intended format check.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(office-hours): carve Phase 5+6 into on-demand section
Third Phase B carve (v2_PLAN.md:216, after ship and plan-ceo-review). Moves
Phase 5 (Design Doc templates) + Phase 6 (tiered relationship handoff) — the
session's output + closing tail, only reached after the conversation and
alternatives are done — into sections/design-and-handoff.md, behind a single
STOP-Read after Phase 4.5. The live conversation (Phases 1-4.5) and the
always-run Important Rules stay in the always-loaded skeleton.
Measured: always-loaded skeleton 118,280 -> 88,975 B (-24.8%). Union preserved.
The carved AUQ is identical to pre-carve (matrix: 7/7 format, substance 5),
and Layer 0 confirms the AUQ format spec stays in the skeleton — the AUQ
paranoid suite de-risked this carve end to end.
Atomic with tests + regen (skill-docs.yml gates gen:skill-docs freshness on
every push, so source + regen + tests land together; --host all regenerates
the inlined non-Claude variants):
- sections/manifest.json: passive registry, one entry.
- parity-harness: office-hours flipped to sectioned, maxSkeletonBytes 96_000
(measured 88,975 + headroom); content/minBytes run against the union.
- skill-size-budget: office-hours added to SECTIONS_EXTRACTED.
- gen-skill-docs + skill-validation: read the skeleton+sections union for
office-hours so relocated Phase 5/6 prose still counts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* chore: bump VERSION + CHANGELOG for office-hours carve + AUQ suite (v1.57.0.0)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(preamble): carve CJK-escaping manual to on-demand doc
The AskUserQuestion format block is inlined into every interactive skill (~33).
It carried the full multi-paragraph non-ASCII/CJK escaping manual inline, but
that rationale only matters when a question contains CJK text and the operative
rule already lives in the always-loaded self-check. Moved the justification to
docs/askuserquestion-cjk.md (read on demand); kept the rule + a pointer.
Corpus: Claude-host SKILL.md total 3,087,499 -> 3,057,975 B (-29,524 B, ~900 B
x ~33 skills). Layer 0 still passes — the core decision-brief format stays
always-loaded; only the rare CJK rationale moved. Atomic with the all-host
regen (skill-docs.yml freshness gate). VERSION + package.json -> 1.58.0.0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(plan-eng-review): carve review body into on-demand section
Fourth Phase B carve (v2_PLAN.md:220). Moves the 4-section review (Architecture,
Code Quality, Tests, Performance), outside voice, required outputs, and review
report — everything after Step 0 scope — into sections/review-sections.md behind
a single STOP-Read. Step 0 (scope challenge) and EXIT_PLAN_MODE_GATE stay in the
always-loaded skeleton.
Measured: skeleton 106,984 -> 54,892 B (-48.7%). Union preserved. Atomic with
tests + all-host regen (freshness gate): parity flipped to sectioned
(maxSkeletonBytes 62K), plan-eng-review added to SECTIONS_EXTRACTED, gen-skill-docs
reads the union for relocated review/TEST_COVERAGE/dashboard prose. Layer 0 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(plan-design-review): carve review body into on-demand section
Fifth Phase B carve (v2_PLAN.md:220, bundled with plan-eng). Moves the 7 design
passes, required outputs, and review report — everything after Step 0 scope and
the mockup/rating phase — into sections/review-sections.md behind a STOP-Read.
Step 0, Step 0.5 mockups, the rating method, and EXIT_PLAN_MODE_GATE stay in the
always-loaded skeleton.
Measured: skeleton 112,057 -> 76,024 B (-32.2%). Union preserved. Atomic with
tests + all-host regen: parity sectioned (maxSkeletonBytes 82K), added to
SECTIONS_EXTRACTED, gen-skill-docs reads the union. Layer 0 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(plan-devex-review): carve review body into on-demand section
Sixth Phase B carve. Moves the 8 DX passes, required outputs, and review report
— everything after the Step 0 DX investigation — into sections/review-sections.md
behind a STOP-Read. All of Step 0 (persona, empathy, benchmark, journey trace,
roleplay) + the rating method + EXIT_PLAN_MODE_GATE stay always-loaded.
Measured: skeleton 110,621 -> 69,658 B (-37%). Union preserved. Atomic with
tests + all-host regen: added to SECTIONS_EXTRACTED, gen-skill-docs reads the
union. Layer 0 green. (No parity invariant entry for plan-devex-review.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* chore: bump VERSION + CHANGELOG for plan-* family carves (v1.59.0.0)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test: refresh ship golden baselines + gbrain-detection union after carves
Two follow-ups the carve commits should have carried (caught by the full suite,
missed by targeted subsets):
- ship golden baselines (claude/codex/factory) regenerated: the preamble CJK
trim (v1.58) changed ship's always-loaded AskUserQuestion block.
- gbrain-detection-override probes the office-hours skeleton+section union:
GBRAIN_SAVE_RESULTS moved into sections/design-and-handoff.md when office-hours
was carved, so the detection assertions now check both files.
Full `bun test` green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(auq): grade format-compliance gate from SDK capture, not the TUI
The real-PTY version grepped the stripAnsi'd interactive AUQ picker. Verified
directly that this cannot work: plan-mode AUQs render as a cursor picker whose
cursor-positioning escapes stripAnsi can't flatten — the picker renders fine for
a human (cursorSeen=45) but the flattened text drops ELI10:/(recommended) and
parseNumberedOptions returns 0. The test was grading a lossy projection and
failed by construction.
Rewritten to drive /plan-ceo-review via the SDK $OUT_FILE capture (the agent
writes the verbatim question it would have shown — clean text, no rendering
loss) and grade 7/7 format + kind-note + recommendation substance >=4. Same
property, reliable, environment-independent; shares the engine with the periodic
A/B and matrix evals. Result: 7/7 format, substance 5. Touchfiles key renamed
ask-user-question-format-pty -> auq-format-gate (no longer a PTY test).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test: fix carve-broken CI evals (union reads + section fixtures)
Two CI eval jobs failed on the carved plan-* skills because they read content
that moved into sections/:
- llm-judge (skill-llm-eval): runWorkflowJudge sliced SKILL.md between markers
like "## Review Sections" / "## CRITICAL RULE" that now live in
sections/review-sections.md. The markers vanished from the skeleton, so the
judge scored empty/wrong content. Fix: read the skeleton+sections union.
Verified: plan-ceo modes / plan-eng sections / plan-design passes all PASS
(25/25).
- e2e-plan (skill-e2e-plan): setupPlanDir copied only <skill>/SKILL.md into the
fixture, not sections/. The carved skill's STOP pointed at a section file that
was absent, so the model improvised a compressed report table instead of the
canonical "| Review | Trigger | Why | Runs | Status | Findings |". Fix: copy
sections/ alongside SKILL.md in all 6 setup sites. Verified: report test PASS,
canonical table emitted.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test: copy carved sections into all e2e fixtures (prevent more carve-blind CI fails)
Proactive sweep beyond the two CI logs: every e2e test that copies a carved
skill's SKILL.md into a temp fixture must also copy its sections/, or the
model hits a STOP pointing at a missing section file and improvises/degrades.
- skill-e2e.test.ts: plan-ceo/plan-eng/plan-design/office-hours copies across
planDir/reviewDir/ohDir/benefitsDir dests now copy sections/.
- skill-e2e-plan.test.ts: the office-hours copy + the 4-skill codex-offering
loop now copy sections/.
- skill-e2e-design.test.ts: plan-design-review copy now copies sections/.
- skill-e2e-office-hours.test.ts: both office-hours copies now copy sections/.
- skill-e2e-office-hours-brain-writeback.test.ts: GBRAIN_SAVE_RESULTS moved into
the section, so check the regenerated skeleton+section UNION for the gbrain put
block, ship both into the workdir, and restore both (the section regen was also
leaking into the working tree — finally now restores it).
ship copies (single-file Step-0 slices) and review/retro (not carved) untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test: migrate section-loading E2E to lossless SDK tool-stream detection
The /ship and /plan-ceo-review section-loading tests drove a real PTY and
scraped the ANSI screen buffer for sections/<file>.md paths. That silently
saw nothing in a Conductor PTY (cursor-positioned tool renders and an
unanswered Step 0 question loop both defeat the regex), so both reported
read: [] even when the agent did the work.
They now run the skill through claude -p (the same SDK path the AUQ matrix
uses) and detect section reads from the tool-use stream — Read calls whose
file_path contains sections/<file>.md — with no rendering layer to mangle.
The run is also hermetic: the freshly-generated worktree skeleton + sections
are copied into a throwaway fixture with the absolute path pinned, so the
test validates this branch's carve without mutating the user's ~/.claude
install.
Validated EVALS_TIER=periodic: both pass (plan-ceo Reads review-sections.md;
ship Reads review-army.md + changelog.md), ~6.5 min for both vs ~23 min
combined on the old PTY path where both were failing.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* chore: consolidate branch to v1.56.0.0 (single MINOR above main)
The branch bumped VERSION several times during development (1.56 → 1.57 →
1.58 → 1.59), but none of those landed on main (main is at 1.55.1.0). Per
the "never orphan branch-internal versions" discipline, collapse all four
into a single 1.56.0.0 entry — one MINOR release covering the whole branch:
five skills carved (plan-ceo, office-hours, plan-eng, plan-design,
plan-devex), the shared AskUserQuestion preamble CJK trim, and the paranoid
AUQ no-degradation test suite + lossless section-loading tests.
VERSION and package.json set to 1.56.0.0; main's 1.55.1.0 entry preserved
below the consolidated entry. No SKILL.md drift (VERSION is not embedded in
generated bodies).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
224 lines
14 KiB
Cheetah
224 lines
14 KiB
Cheetah
## Review Sections (7 passes, after scope is agreed)
|
|
|
|
**Anti-skip rule:** Never condense, abbreviate, or skip any review pass (1-7) regardless of plan type (strategy, spec, code, infra). Every pass in this skill exists for a reason. "This is a strategy doc so design passes don't apply" is always wrong — design gaps are where implementation breaks down. If a pass genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
|
|
|
|
{{ANTI_SHORTCUT_CLAUSE}}
|
|
|
|
{{LEARNINGS_SEARCH}}
|
|
|
|
### Pass 1: Information Architecture
|
|
Rate 0-10: Does the plan define what the user sees first, second, third?
|
|
FIX TO 10: Add information hierarchy to the plan. Include ASCII diagram of screen/page structure and navigation flow. Apply "constraint worship" — if you can only show 3 things, which 3?
|
|
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues, say so and move on. Do NOT proceed until user responds.
|
|
|
|
### Pass 2: Interaction State Coverage
|
|
Rate 0-10: Does the plan specify loading, empty, error, success, partial states?
|
|
FIX TO 10: Add interaction state table to the plan:
|
|
```
|
|
FEATURE | LOADING | EMPTY | ERROR | SUCCESS | PARTIAL
|
|
---------------------|---------|-------|-------|---------|--------
|
|
[each UI feature] | [spec] | [spec]| [spec]| [spec] | [spec]
|
|
```
|
|
For each state: describe what the user SEES, not backend behavior.
|
|
Empty states are features — specify warmth, primary action, context.
|
|
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
|
|
|
|
### Pass 3: User Journey & Emotional Arc
|
|
Rate 0-10: Does the plan consider the user's emotional experience?
|
|
FIX TO 10: Add user journey storyboard:
|
|
```
|
|
STEP | USER DOES | USER FEELS | PLAN SPECIFIES?
|
|
-----|------------------|-----------------|----------------
|
|
1 | Lands on page | [what emotion?] | [what supports it?]
|
|
...
|
|
```
|
|
Apply time-horizon design: 5-sec visceral, 5-min behavioral, 5-year reflective.
|
|
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
|
|
|
|
### Pass 4: AI Slop Risk
|
|
Rate 0-10: Does the plan describe specific, intentional UI — or generic patterns?
|
|
FIX TO 10: Rewrite vague UI descriptions with specific alternatives.
|
|
|
|
{{DESIGN_HARD_RULES}}
|
|
- "Cards with icons" → what differentiates these from every SaaS template?
|
|
- "Hero section" → what makes this hero feel like THIS product?
|
|
- "Clean, modern UI" → meaningless. Replace with actual design decisions.
|
|
- "Dashboard with widgets" → what makes this NOT every other dashboard?
|
|
If visual mockups were generated in Step 0.5, evaluate them against the AI slop blacklist above. Read each mockup image using the Read tool. Does the mockup fall into generic patterns (3-column grid, centered hero, stock-photo feel)? If so, flag it and offer to regenerate with more specific direction via `$D iterate --feedback "..."`.
|
|
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
|
|
|
|
### Pass 5: Design System Alignment
|
|
Rate 0-10: Does the plan align with DESIGN.md?
|
|
FIX TO 10: If DESIGN.md exists, annotate with specific tokens/components. If no DESIGN.md, flag the gap and recommend `/design-consultation`.
|
|
Flag any new component — does it fit the existing vocabulary?
|
|
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
|
|
|
|
### Pass 6: Responsive & Accessibility
|
|
Rate 0-10: Does the plan specify mobile/tablet, keyboard nav, screen readers?
|
|
FIX TO 10: Add responsive specs per viewport — not "stacked on mobile" but intentional layout changes. Add a11y: keyboard nav patterns, ARIA landmarks, touch target sizes (44px min), color contrast requirements.
|
|
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
|
|
|
|
### Pass 7: Unresolved Design Decisions
|
|
Surface ambiguities that will haunt implementation:
|
|
```
|
|
DECISION NEEDED | IF DEFERRED, WHAT HAPPENS
|
|
-----------------------------|---------------------------
|
|
What does empty state look like? | Engineer ships "No items found."
|
|
Mobile nav pattern? | Desktop nav hides behind hamburger
|
|
...
|
|
```
|
|
If visual mockups were generated in Step 0.5, reference them as evidence when surfacing unresolved decisions. A mockup makes decisions concrete — e.g., "Your approved mockup shows a sidebar nav, but the plan doesn't specify mobile behavior. What happens to this sidebar on 375px?"
|
|
Each decision = one AskUserQuestion with recommendation + WHY + alternatives. Edit the plan with each decision as it's made.
|
|
|
|
### Post-Pass: Update Mockups (if generated)
|
|
|
|
If mockups were generated in Step 0.5 and review passes changed significant design decisions (information architecture restructure, new states, layout changes), offer to regenerate (one-shot, not a loop):
|
|
|
|
AskUserQuestion: "The review passes changed [list major design changes]. Want me to regenerate mockups to reflect the updated plan? This ensures the visual reference matches what we're actually building."
|
|
|
|
If yes, use `$D iterate` with feedback summarizing the changes, or `$D variants` with an updated brief. Save to the same `$_DESIGN_DIR` directory.
|
|
|
|
## CRITICAL RULE — How to ask questions
|
|
Follow the AskUserQuestion format from the Preamble above. Additional rules for plan design reviews:
|
|
* **One issue = one AskUserQuestion call.** Never combine multiple issues into one question.
|
|
* Describe the design gap concretely — what's missing, what the user will experience if it's not specified.
|
|
* Present 2-3 options. For each: effort to specify now, risk if deferred.
|
|
* **Map to Design Principles above.** One sentence connecting your recommendation to a specific principle.
|
|
* Label with issue NUMBER + option LETTER (e.g., "3A", "3B").
|
|
* **Zero findings:** if a section has zero findings, state "No issues, moving on" and proceed. Otherwise, use AskUserQuestion for each gap — a gap with an "obvious fix" is still a gap and still needs user approval before any change lands in the plan.
|
|
* **NEVER use AskUserQuestion to ask which variant the user prefers.** Always create a comparison board first (`$D compare --serve`) and open it in the browser. The board has rating controls, comments, remix/regenerate buttons, and structured feedback output. Use AskUserQuestion ONLY to notify the user the board is open and wait for them to finish — not to present variants inline and ask "which do you prefer?" That is a degraded experience.
|
|
|
|
## Required Outputs
|
|
|
|
### "NOT in scope" section
|
|
Design decisions considered and explicitly deferred, with one-line rationale each.
|
|
|
|
### "What already exists" section
|
|
Existing DESIGN.md, UI patterns, and components that the plan should reuse.
|
|
|
|
### TODOS.md updates
|
|
After all review passes are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step.
|
|
|
|
For design debt: missing a11y, unresolved responsive behavior, deferred empty states. Each TODO gets:
|
|
* **What:** One-line description of the work.
|
|
* **Why:** The concrete problem it solves or value it unlocks.
|
|
* **Pros:** What you gain by doing this work.
|
|
* **Cons:** Cost, complexity, or risks of doing it.
|
|
* **Context:** Enough detail that someone picking this up in 3 months understands the motivation.
|
|
* **Depends on / blocked by:** Any prerequisites.
|
|
|
|
Then present options: **A)** Add to TODOS.md **B)** Skip — not valuable enough **C)** Build it now in this PR instead of deferring.
|
|
|
|
{{TASKS_SECTION_EMIT:design-review}}
|
|
|
|
### Completion Summary
|
|
```
|
|
+====================================================================+
|
|
| DESIGN PLAN REVIEW — COMPLETION SUMMARY |
|
|
+====================================================================+
|
|
| System Audit | [DESIGN.md status, UI scope] |
|
|
| Step 0 | [initial rating, focus areas] |
|
|
| Pass 1 (Info Arch) | ___/10 → ___/10 after fixes |
|
|
| Pass 2 (States) | ___/10 → ___/10 after fixes |
|
|
| Pass 3 (Journey) | ___/10 → ___/10 after fixes |
|
|
| Pass 4 (AI Slop) | ___/10 → ___/10 after fixes |
|
|
| Pass 5 (Design Sys) | ___/10 → ___/10 after fixes |
|
|
| Pass 6 (Responsive) | ___/10 → ___/10 after fixes |
|
|
| Pass 7 (Decisions) | ___ resolved, ___ deferred |
|
|
+--------------------------------------------------------------------+
|
|
| NOT in scope | written (___ items) |
|
|
| What already exists | written |
|
|
| TODOS.md updates | ___ items proposed |
|
|
| Approved Mockups | ___ generated, ___ approved |
|
|
| Decisions made | ___ added to plan |
|
|
| Decisions deferred | ___ (listed below) |
|
|
| Overall design score | ___/10 → ___/10 |
|
|
+====================================================================+
|
|
```
|
|
|
|
If all passes 8+: "Plan is design-complete. Run /design-review after implementation for visual QA."
|
|
If any below 8: note what's unresolved and why (user chose to defer).
|
|
|
|
### Unresolved Decisions
|
|
If any AskUserQuestion goes unanswered, note it here. Never silently default to an option.
|
|
|
|
### Approved Mockups
|
|
|
|
If visual mockups were generated during this review, add to the plan file:
|
|
|
|
```
|
|
## Approved Mockups
|
|
|
|
| Screen/Section | Mockup Path | Direction | Notes |
|
|
|----------------|-------------|-----------|-------|
|
|
| [screen name] | ~/.gstack/projects/$SLUG/designs/[folder]/[filename].png | [brief description] | [constraints from review] |
|
|
```
|
|
|
|
Include the full path to each approved mockup (the variant the user chose), a one-line description of the direction, and any constraints. The implementer reads this to know exactly which visual to build from. These persist across conversations and workspaces. If no mockups were generated, omit this section.
|
|
|
|
## Review Log
|
|
|
|
After producing the Completion Summary above, persist the review result.
|
|
|
|
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes review metadata to
|
|
`~/.gstack/` (user config directory, not project files). The skill preamble
|
|
already writes to `~/.gstack/sessions/` and `~/.gstack/analytics/` — this is
|
|
the same pattern. The review dashboard depends on this data. Skipping this
|
|
command breaks the review readiness dashboard in /ship.
|
|
|
|
```bash
|
|
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","initial_score":N,"overall_score":N,"unresolved":N,"decisions_made":N,"commit":"COMMIT"}'
|
|
```
|
|
|
|
Substitute values from the Completion Summary:
|
|
- **TIMESTAMP**: current ISO 8601 datetime
|
|
- **STATUS**: "clean" if overall score 8+ AND 0 unresolved; otherwise "issues_open"
|
|
- **initial_score**: initial overall design score before fixes (0-10)
|
|
- **overall_score**: final overall design score after fixes (0-10)
|
|
- **unresolved**: number of unresolved design decisions
|
|
- **decisions_made**: number of design decisions added to the plan
|
|
- **COMMIT**: output of `git rev-parse --short HEAD`
|
|
|
|
{{REVIEW_DASHBOARD}}
|
|
|
|
{{PLAN_FILE_REVIEW_REPORT}}
|
|
|
|
{{LEARNINGS_LOG}}
|
|
|
|
{{GBRAIN_SAVE_RESULTS}}
|
|
|
|
{{BRAIN_WRITE_BACK}}
|
|
|
|
{{BRAIN_CACHE_REFRESH}}
|
|
|
|
## Next Steps — Review Chaining
|
|
|
|
After displaying the Review Readiness Dashboard, recommend the next review(s) based on what this design review discovered. Read the dashboard output to see which reviews have already been run and whether they are stale.
|
|
|
|
**Recommend /plan-eng-review if eng review is not skipped globally** — check the dashboard output for `skip_eng_review`. If it is `true`, eng review is opted out — do not recommend it. Otherwise, eng review is the required shipping gate. If this design review added significant interaction specifications, new user flows, or changed the information architecture, emphasize that eng review needs to validate the architectural implications. If an eng review already exists but the commit hash shows it predates this design review, note that it may be stale and should be re-run.
|
|
|
|
**Consider recommending /plan-ceo-review** — but only if this design review revealed fundamental product direction gaps. Specifically: if the overall design score started below 4/10, if the information architecture had major structural problems, or if the review surfaced questions about whether the right problem is being solved. AND no CEO review exists in the dashboard. This is a selective recommendation — most design reviews should NOT trigger a CEO review.
|
|
|
|
**If both are needed, recommend eng review first** (required gate).
|
|
|
|
**Recommend design exploration skills when appropriate** — /design-shotgun and /design-html
|
|
produce design artifacts (mockups, HTML previews), not application code. They belong in
|
|
plan mode alongside reviews. If this design review found visual issues that would benefit
|
|
from exploring new directions, recommend /design-shotgun. If approved mockups exist and
|
|
need to be turned into working HTML, recommend /design-html.
|
|
|
|
Use AskUserQuestion to present the next step. Include only applicable options:
|
|
- **A)** Run /plan-eng-review next (required gate)
|
|
- **B)** Run /plan-ceo-review (only if fundamental product gaps found)
|
|
- **C)** Run /design-shotgun — explore visual design variants for issues found
|
|
- **D)** Run /design-html — generate Pretext-native HTML from approved mockups
|
|
- **E)** Skip — I'll handle next steps manually
|
|
|
|
## Formatting Rules
|
|
* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
|
|
* Label with NUMBER + LETTER (e.g., "3A", "3B").
|
|
* One sentence max per option.
|
|
* After each pass, pause and wait for feedback.
|
|
* Rate before and after each pass for scannability.
|
|
|