mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-01 15:51:41 +02:00
070722ace3
* feat(brain): brain-cache-spec.ts — single source of truth for cache layer Foundation for the brain-aware planning skills work (v1.48 plan / D2). One TS const file consolidates BRAIN_CACHE_ENTITIES (8 entities × TTL + budget + invalidation rules), SKILL_DIGEST_SUBSETS (per-skill which files to load), SALIENCE_DEFAULT_ALLOWLIST (D9 privacy gate), SKILL_CALIBRATION_WEIGHTS (Phase 2 E5), and policy / identity / schema constants. Drift between docs and runtime becomes impossible by construction: resolver, cache CLI, and test/skill-preflight-budget.test.ts all import from the same module. test/brain-cache-spec.test.ts: 19 invariant assertions (subset/entity consistency, per-skill achievability, allowlist sanity, transport defaults, user-slug fallback chain, lock timeout, retention policy). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(brain): gstack-core@1.0.0 schema pack (T1 / Phase 0) Defines 8 typed page kinds for the brain entity model: gstack/user-profile, gstack/product, gstack/goal, gstack/developer-persona, gstack/brand, gstack/competitive-intel, gstack/skill-run, gstack/take Each declares frontmatter shape (typed fields with required/optional flags), retention policy (immutable / archive-after-90d / never-archive), and emits_links graph for mcp__gbrain__schema_graph rendering. getSchemaPackMutationPayload() returns JSON in the shape accepted by mcp__gbrain__schema_apply_mutations. Idempotent registration: gbrain skips when pack+version already installed. test/gstack-schema-pack.test.ts: 16 invariants on pack shape, retention policies, link verb consistency, JSON serializability. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(brain): gstack-brain-cache CLI (T2a) — core subcommands bin/gstack-brain-cache: TS CLI with five subcommands: get <entity-name> [--project <slug>] refresh [--full] [--entity X] [--project <slug>] invalidate <entity-name> [--project <slug>] digest <entity-slug> meta [--project <slug>] Cache layout per Phase 0.5 design: ~/.gstack/brain-cache/ ← cross-project (user-profile) ~/.gstack/projects/<slug>/brain-cache/ ← per-project (everything else) Per-entity TTL drives staleness; per-entity byte budgets enforce compression at write time. Atomic writes via tmp+rename. Stale-but-usable fallback when brain unreachable (returns cached digest with diagnostic prefix instead of failing). Schema-version mismatch + endpoint switch both trigger full rebuild for the affected scope (D4 A4). Fetch+compress paths wired for the 7 entities (user-profile, product, goals, developer-persona, brand, competitive-intel, recent-decisions, salience) via gbrain CLI shell-out — works for local PGLite and local-stdio MCP, transparent over the existing spawnGbrain helper. Concurrent-refresh dedup (D3 / T15) is a follow-up commit. Salience allowlist gate (D9 / T17) is a follow-up commit. Bootstrap + lifecycle subcommands (T2b / T18) are follow-up commits. test/brain-cache-roundtrip.test.ts: 11 tests covering path resolution, meta lifecycle, endpoint detection, schema mismatch behavior, and the four cache states (warm / cold-refreshed / stale-fallback / missing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(brain): concurrent-refresh lockfile dedup (T15 / D3) When autoplan dispatches 4 planning skills back-to-back and they all hit a cold-miss on the same digest, only ONE actually fetches from the brain. The rest dedup via the project-scoped lockfile at ~/.gstack/projects/<slug>/brain-cache/.refresh.lock. Reuses the 5-min stale-takeover convention from /sync-gbrain. Lock is taken over when: - File is older than CACHE_REFRESH_LOCK_TIMEOUT_MS - PID is on the same host and dead (process.kill(pid, 0) fails) - Lock file is corrupt (defensive) withRefreshLock(projectSlug, fn) returns either the callback's value or the literal 'dedup'. The CLI emits exit code 3 + diagnostic stderr on dedup, so callers can choose to wait + retry (resolver does this) or fall through to stale-but-usable behavior. test/cache-concurrent-refresh.test.ts: 7 tests covering acquire/release, stale-takeover, dead-PID takeover, corrupt-lock recovery, error-path release, and cross-project lock location. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(brain): salience privacy allowlist gate (T17 / D9) D9 cross-model finding from codex outside voice: salience-sourced digests can include emotionally-weighted personal pages (family, therapy, reflection). Pulling those into a coding-review prompt leaks sensitive context into work-flow reasoning. fetchSalience now strips entries whose slugs don't match an allowlist prefix BEFORE writing to the cache file. Default allowlist is SALIENCE_DEFAULT_ALLOWLIST = ['projects/', 'concepts/', 'gstack/']. User can extend via: gstack-config set salience_allowlist 'projects/,gstack/,concepts/,custom/' or override with GSTACK_SALIENCE_ALLOWLIST env var. Digest still records the strip count for transparency. Empty result emits 'all N entries stripped' note rather than silent absence. test/salience-allowlist.test.ts: 9 tests covering default permits, default blocks, empty allowlist, env override, whitespace trimming, and the invariant that defaults contain nothing sensitive (personal, family, therapy, reflection, private, medical, health). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(brain): bootstrap + list + purge subcommands (T2b / T18) T2b — bootstrap synthesizes draft entity content from CLAUDE.md + README + recent learnings.jsonl and emits as JSON for the caller. Skill template is responsible for the AUQ-confirm-before-write flow (D10 T4 extraction- review requirement). Cli stays pure (no AUQ logic); agent owns user interaction. T18 — list/purge subcommands close the lifecycle loop: list [--project <slug>] — enumerate gstack-owned pages in brain (probe all 8 gstack/* page types) purge <slug> — delete one gstack page, refuses non-gstack/ slugs (defensive) list defaults to all-projects (cross-project user-profile included). With --project, filters to per-project pages plus the cross-project user-profile. --json flag emits machine-readable output for the agent. Retention sweep + audit subcommand are deferred to a follow-up commit (they need the lifecycle scheduling design, not just CLI plumbing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(brain): brain-aware planning resolvers + 3 new placeholders (T4) scripts/resolvers/gbrain.ts adds: - generateBrainPreflight(ctx) — emits per-skill ## Brain Context block + bash that loads digests via gstack-brain-cache get (one call per digest). Per-skill subset comes from SKILL_DIGEST_SUBSETS (single source). - generateBrainCacheRefresh(ctx) — at-skill-end background refresh hook; non-blocking; warms cache for next run. - generateBrainWriteBack(ctx) — Phase 2 / E5 calibration write-back with per-skill weight. Gated on personal trust policy + the BRAIN_CALIBRATION_WRITEBACK flag. Includes invalidation bash that busts affected digests after the write. scripts/resolvers/index.ts registers three new placeholders: {{BRAIN_PREFLIGHT}}, {{BRAIN_CACHE_REFRESH}}, {{BRAIN_WRITE_BACK}} All three resolvers return empty string for skills not in SKILL_DIGEST_SUBSETS (defensive — skill template authors can drop the placeholders into non-preflight skills with zero effect). D9 privacy is mentioned in the rendered preflight prose so the agent knows to expect filtered salience. D11 codex tension: write-back gates on brain_trust_policy@<hash> being personal — shared brains skip write-back to avoid polluting team calibration profile. test/brain-preflight.test.ts: 19 tests covering subset rendering, non-preflight skill gating, cross-project vs per-project --project flag emission, weight injection per skill, BRAIN_CALIBRATION_WRITEBACK flag mention, and registration in RESOLVERS map. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(brain): gstack-config brain integration helpers (T5+T10+T16) Extends bin/gstack-config to support the brain-aware planning layer: KEY VALIDATION (T5): Plain alphanumeric/underscore now extended to allow @<hex-hash> suffix. Required for per-endpoint namespaced keys (brain_trust_policy@<sha8>, user_slug_at_<sha8>). Keys without the suffix still validate as before. VALUE WHITELISTING (D4 / D11): brain_trust_policy@* values gated to personal | shared | unset. Unknown values warn + default to unset (defense against typos). NEW DEFAULTS (lookup_default): brain_trust_policy@* -> unset salience_allowlist -> '' (resolver uses SALIENCE_DEFAULT_ALLOWLIST) user_slug_at_* -> '' (resolve-user-slug fills + persists on demand) NEW SUBCOMMANDS: endpoint-hash — print sha8 of active gbrain MCP URL from ~/.claude.json. Collision check escalates to sha16 when a prior endpoint stored at the same sha8 would conflict (T10 defensive default). resolve-user-slug — walks D4 A3 identity chain: 1. mcp__gbrain__whoami.client_name 2. $USER env var 3. sha8(git config user.email) 4. anonymous-<sha8(hostname)> Persists result on first call so subsequent calls are stable across sessions. test/user-slug-fallback.test.ts: 14 tests covering endpoint-hash output shape, fallback chain ordering, persistence, brain_trust_policy namespace value validation + per-endpoint isolation, and key validator extension for @-suffixed keys. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(brain): wire 5 planning skill templates with BRAIN_* placeholders (T6) Adds three placeholders to each of the 5 planning SKILL.md.tmpl files: {{BRAIN_PREFLIGHT}} — top of skill body, before first interactive section. Loads the per-skill digest subset (5 files for office-hours, 2 for plan-eng- review, etc.) into the prompt context before any AskUserQuestion fires. {{BRAIN_WRITE_BACK}} — end of skill, before refresh hook. Phase 2 calibration write path; gated on personal policy + BRAIN_CALIBRATION_WRITEBACK flag. {{BRAIN_CACHE_REFRESH}} — end of skill, after write-back. Non-blocking background refresh so next invocation gets warm cache. Files touched (templates + regenerated SKILL.md): office-hours/SKILL.md.tmpl plan-ceo-review/SKILL.md.tmpl plan-eng-review/SKILL.md.tmpl plan-design-review/SKILL.md.tmpl plan-devex-review/SKILL.md.tmpl (matching .md files regenerated via bun run gen:skill-docs) All 5 generated SKILL.md files now contain the rendered ## Brain Context (preflight) section + write-back guidance + background-refresh hook. The resolver renders only for skills in SKILL_DIGEST_SUBSETS — these 5 + an empty string for any other skill that drops in the placeholders. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(brain): setup-gbrain trust-policy step + sync-gbrain flags (T5b / T13+T5c) T5b — setup-gbrain Step 9.5: Inserts the brain trust policy AskUserQuestion before the verdict block. Detects active endpoint hash via gstack-config endpoint-hash. Branches per transport: * Local (sha == "local"): auto-set personal, one-line notice * Remote-MCP, unset: AskUserQuestion (personal vs shared) * Already-set: skip, just print current policy Personal default flips artifacts_sync_mode=full when still off. T13+T5c — sync-gbrain: Adds two flag short-circuits: --refresh-cache : route to gstack-brain-cache refresh --project <slug>; skip code + memory + brain-sync stages. Replaces the planned /brain-refresh-context skill per D1 fold (one fewer always-loaded skill in catalog). --audit : emit gstack-owned page summary + sensitive-content leak check via gstack-brain-cache list. Read-only. Step 1 trust policy gate: fires the same AskUserQuestion as setup-gbrain Step 9.5 when policy is unset for a remote endpoint. Local engines auto-set personal silently. Idempotent for already-set policies. Both templates re-rendered via bun run gen:skill-docs. Trust policy question wording centralized in setup-gbrain Step 9.5; sync-gbrain Step 1 references it to avoid prompt drift. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(brain): schema migration + fence-block fallback + preflight budget (T19+T21) 3 new gate-tier test files closing the most important coverage gaps in the brain-aware planning layer: test/schema-version-migration.test.ts (D4 A4): - Cache file with mismatched schema_version triggers wipe-and-rebuild - Matching version + fresh TTL stays warm-hit (no unnecessary rebuild) - Rebuild wipes ALL files in scope, not just the one being read test/takes-fence-fallback.test.ts: - Every preflight skill mentions both takes_add (preferred) and put_page fence-block (fallback for pre-T8 gbrain versions) - All 5 skills gate on BRAIN_CALIBRATION_WRITEBACK flag + personal trust policy - Per-skill weight matches SKILL_CALIBRATION_WEIGHTS (E5) - Write-back emits the kind=bet frontmatter shape and invalidates affected cache digests test/skill-preflight-budget.test.ts (T21 / D7): - Per-skill BRAIN_* instruction bytes stay under 3x the runtime digest budget (resolver bloat catch) - Autoplan total instruction bytes stay under 75 KB (3x of 25 KB runtime cap) - Non-preflight skills emit zero brain bytes - Per-skill subset references are present in the preflight bash Note on the 3x multiplier: SKILL_PREFLIGHT_BUDGET_BYTES governs runtime digest data (enforced by cache CLI truncateToBudget). Instruction text emitted by the resolver gets a separate 3x headroom — anything beyond that signals the instructions themselves are bloated and need a trim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(todos): brain-aware planning follow-ups (T11) Adds five deferred items from the v1.48.0.0 brain-aware planning plan: - P2: /gstack-reflect nightly synthesis skill (E2, deferred D4) - P3: cross-machine brain-cache sync (E3, deferred D5) - P3: /gstack-onboarding dedicated skill (E4, deferred D6) - P2: upstream gbrain takes_add + takes_resolve MCP ops (T8 wrap-up) - P3: background-refresh hook supervision (codex outside-voice T3) Each entry follows the TODOS.md format: What / Why / Pros / Cons / Context / Effort / Depends on. Each cross-references the v1.48.0.0 review decision (D-numbers from /plan-ceo-review and /plan-eng-review) that deferred it. The plan itself is at ~/.claude/plans/hm-interesting-well-why-dapper-eagle.md and is NOT a TODO entry (it's a one-shot design doc, not ongoing work). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(brain): bump schema-migration test timeout to 60s Rebuild path fans out to 7 per-project entity refreshes, each shelling gbrain with 10s internal timeout. Worst case ~70s. Default bun test 5s was timing out on slow brain unreachable cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.50.0.0) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test): tighten put_page regression pin to CLI subcommand The test asserted no substring 'put_page' anywhere in the resolver, but the BRAIN_WRITE_BACK resolver legitimately references the MCP op `mcp__gbrain__put_page` as the fallback path for calibration takes when gbrain v0.42+'s `takes_add` op isn't available. The check conflated the deprecated `gbrain put_page` CLI subcommand (renamed in v0.18+ to `gbrain put`) with the still-valid MCP op of the same name. Narrow the assertion to `gbrain put_page` (with the space) so the fallback prose stays legal while the CLI rename regression stays caught. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(brain): gstack-config gbrain-refresh subcommand Adds a new subcommand that re-detects gbrain installation state and persists the result to ~/.gstack/gbrain-detection.json. The detection file is consumed by gen-skill-docs --respect-detection (next commit) to decide whether to render the GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS resolver blocks in user-local SKILL.md generation. Reuses the existing bin/gstack-gbrain-detect helper for the actual probe; this subcommand just persists + summarizes. Users run it after installing or uninstalling gbrain so their locally generated SKILL.md files match their installation state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(brain): gen-skill-docs respects gbrain-detection override Adds --respect-detection flag (and bun run gen:skill-docs:user script). When the flag is set, gen-skill-docs reads ~/.gstack/gbrain-detection.json and filters GBRAIN_CONTEXT_LOAD + GBRAIN_SAVE_RESULTS out of each host's suppressedResolvers when gbrain_local_status is "ok". When absent or gbrain isn't detected, suppression behaves as before. The default `bun run gen:skill-docs` (CI canonical) ignores the detection file so the committed SKILL.md stays reproducible regardless of any developer's local gbrain installation state. Use gen:skill-docs:user for user-local installs (./setup invokes it). No host config files modified — the static suppressedResolvers stay correct for the no-gbrain case; the override happens at gen-time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(brain): setup runs gbrain detection + conditional SKILL.md regen At the end of install, ./setup now: 1. Runs bin/gstack-gbrain-detect, persists the result to ~/.gstack/gbrain-detection.json 2. If gbrain_local_status == "ok", regenerates Claude-host SKILL.md via `bun run gen:skill-docs:user --host claude` so the user's local install picks up the compressed brain-aware blocks 3. If gbrain isn't detected, leaves the canonical no-gbrain SKILL.md files in place (zero token overhead) and surfaces the gstack-config gbrain-refresh path for users who install gbrain later Together with the prior two commits, this completes the setup-time conditional un-suppression: brain-aware blocks render iff the user has gbrain installed, regardless of which CLI host they're on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(brain): compress GBRAIN_* resolvers, move template prose to docs/ generateGBrainContextLoad: 80 -> 115 tokens with explicit skip-header. generateGBrainSaveResults: 500-700 -> 161 tokens per skill with the skill metadata extracted into a typed skillSaveMap (slugPrefix + title + tag). Verbose prose (heredoc body, entity-stub instructions, throttle handling, backlink protocol) moved into a new doc: docs/gbrain-write-surfaces.md (Sections: §Context Load, §Save Template). The agent reads the doc on-demand only when actually saving — one Read call, cached by Claude's context. Net per-planning-skill overhead under un-suppression drops from ~1000 tokens (naive un-suppression) to ~275 tokens (compressed). Combined with the setup-time detection from prior commits, users WITHOUT gbrain pay zero overhead (block suppressed at gen-time) and users WITH gbrain pay ~275 tokens. The /investigate special-case (data-research routing in CONTEXT_LOAD) stays inline since it's skill-specific. docs/gbrain-write-surfaces.md also serves as the manual-probe reference for humans verifying live persistence + a topology summary covering trust-policy + .gbrain-source reads-only semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(brain): wire SAVE_RESULTS for plan-design-review + plan-devex-review Adds {{GBRAIN_SAVE_RESULTS}} placeholder to the two planning skills that were missing it, immediately before {{BRAIN_WRITE_BACK}} (mirrors plan-eng-review:324 + office-hours:650). The corresponding skillSaveMap entries (design-reviews/<feature-slug> + devex-reviews/<feature-slug>) landed with the resolver compression in the prior commit. Regenerated SKILL.md reflects the new placeholder position. The default no-gbrain generation (CI canonical) still suppresses the block — zero diff in the rendered output for non-gbrain users. All five planning skills now write a retrievable review page to gbrain when gbrain is detected at setup time, instead of three of five. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(brain): resolver compression + detection-override regression pins test/resolvers-gbrain-save-results.test.ts (140 LOC, 10 tests): - Per-skill assertions for all 5 planning skills: emits gbrain put + correct slug prefix + tag + title. - Skip-header present so agent can short-circuit when gbrain isn't on PATH. - Compression pin: each per-skill block stays under 750 chars (~190 tokens) — guards against a future "let me add one more line" refactor silently re-inflating toward the ~1000-token naive un-suppression baseline. - Generic fallback for unmapped skill names still works. - /investigate gets the data-research routing suffix; non-investigate skills do not. - generateGBrainContextLoad stays under 500 chars (~125 tokens). test/gbrain-detection-override.test.ts (120 LOC, 4 tests): - End-to-end through gen-skill-docs subprocess against an isolated temp GSTACK_HOME. Asserts: * detected:true un-suppresses GBRAIN_* → SKILL.md gains the block * detected:false (status != "ok") suppresses → no block * no detection file suppresses → no block (graceful default) * no --respect-detection flag IGNORES the detection file → no block (CI canonical path stays reproducible) Each detection-override test restores the canonical SKILL.md in a finally block so the working tree stays clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(brain): fake-CLI agent-obedience E2E for /office-hours writeback test/skill-e2e-office-hours-brain-writeback.test.ts (~210 LOC, periodic-tier, ~$0.50-1/run): Drives /office-hours via runSkillTest against a deterministic fixture brief (pixel.fund founder pitch). The workdir has: - A regenerated office-hours/SKILL.md with the compressed brain blocks (generated via gen-skill-docs --respect-detection against a temp GSTACK_HOME, then restored to canonical post-snapshot) - A fake gbrain shell script on PATH that uses printf %q quoting to preserve --content "$(cat <<'EOF' ... EOF)" heredoc payloads intact (naive `echo "$@"` would lose argv boundaries) - The docs/gbrain-write-surfaces.md the resolver points to Asserts: - gbrain-calls.log contains `gbrain put office-hours/pixel-fund` - Payload file at gbrain-payloads/office-hours/pixel-fund.md exists with valid YAML frontmatter (title: + tags: + design-doc tag) - At least one gbrain put entities/<name> call (entity stub enrichment is best-effort, soft warning if absent) Covers agent obedience to the SAVE_RESULTS instruction. Out of scope: gbrain CLI persistence contract (T11 covers that with real PGLite). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(brain): real PGLite round-trip E2E (matched-pair persistence) test/skill-e2e-gbrain-roundtrip-local.test.ts (~145 LOC, periodic-tier, ~$0.001/run on Voyage): Real gbrain CLI round-trip against an isolated temp HOME: 1. gbrain init --pglite --embedding-model voyage:voyage-code-3 2. gbrain put office-hours/<unique-slug> --content <markdown> 3. gbrain get <slug> 4. Assert every body line survives + title + tags + non-empty This is the matched-pair check for the v1.50.0.0 question "is the data we hope to save actually being saved?" — proves the gbrain CLI persistence contract gstack relies on, against a real engine. Does NOT involve the agent — pure CLI integration test. The agent obedience side is covered by the fake-CLI E2E in the prior commit. Skips cleanly when VOYAGE_API_KEY is unset OR gbrain CLI is missing from PATH, so CI without secrets degrades gracefully. Remote/Supabase routing is gbrain's contract — the same CLI shape works against every engine. gstack stops at local round-trip coverage to avoid re-testing gbrain's MCP client implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(brain): touchfiles + TODOS + CHANGELOG for v1.50.0.0 test/helpers/touchfiles.ts: register the two new E2Es in E2E_TOUCHFILES + E2E_TIERS (both periodic): - office-hours-brain-writeback: triggered by resolver / gen-pipeline / detection helper / refresh subcommand / office-hours template / docs / fixture / test file changes - gbrain-roundtrip-local: triggered by resolver / test file changes TODOS.md: append two P2 follow-ups carried over from the v1.50 plan: - Re-verify calibration takes when gbrain v0.42+ ships takes_add and BRAIN_CALIBRATION_WRITEBACK flips TRUE - Extend brain-writeback E2E to the other 4 planning skills (extract makeFakeGbrain to test/helpers/fake-gbrain.ts when second consumer arrives) CHANGELOG.md v1.50.0.0: add a "Save-results path: works under any CLI when gbrain is on PATH" section that documents the headline: - Conditional inclusion at setup-time (zero overhead for non-gbrain users, ~250 tokens with gbrain) - Wiring symmetry fix (5 of 5 planning skills now write a page) - Token cost table comparing detection states - Test coverage map (resolver unit + override mechanism + fake-CLI agent obedience + real PGLite round-trip) - Why remote routing isn't tested here (gbrain's contract) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(brain): tighten prompt + relax slug assertion in writeback E2E Two fixes: 1. Prompt: "Slug it 'pixel-fund'" was ambiguous — agent could read it as "use pixel-fund as the FULL slug" instead of "substitute pixel-fund for <feature-slug>". Replaced with explicit guidance: "The feature-slug value to substitute into the SAVE_RESULTS template's <feature-slug> placeholder is exactly 'pixel-fund' (no path prefix — the template already provides the prefix). Apply the SAVE_RESULTS template literally." Also added "Do NOT explore gbrain --help" to short-circuit the discovery loop the agent fell into. 2. Slug assertion: was a strict /gbrain put .*office-hours\/pixel-fund/ regex. This conflated two concerns — agent obedience (does the agent actually invoke gbrain put?) vs resolver output shape (does the template emit the right prefix?). The latter is already pinned by test/resolvers-gbrain-save-results.test.ts at the resolver level (free, hermetic). The E2E now asserts /gbrain put .*pixel-fund/ (slug contains pixel-fund somewhere) plus a recursive payload-file search that accepts either office-hours/pixel-fund.md (template- faithful) or pixel-fund.md (agent dropped prefix). The YAML frontmatter + tag assertions on the payload remain strict — those are the real agent-obedience contract. 3. Entity-stub regex: was looking for entities/<name>; agent variability uses entity/<name>, people/<name>, companies/<name>. Loosened to match entit(y|ies) only. The soft-warning path stays (no hard fail) because entity extraction is best-effort prose, not a CLI contract. Verified passing locally: 7 expect() calls, 268s, ~$0.50. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version to 1.51.1.0 main advanced to 1.51.0.0 while this branch was in development. Bump to 1.51.1.0 (PATCH above main) so the branch lands cleanly above the current main version per the monotonic-ordered-release invariant. Renames the branch-internal [1.50.0.0] CHANGELOG entry to [1.51.1.0] — 1.50.0.0 never landed on main (main skipped to 1.51.0.0), so this consolidates the branch's brain-aware planning + save-results work under a single shipping version with no orphaned entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
490 lines
28 KiB
Cheetah
490 lines
28 KiB
Cheetah
---
|
|
name: plan-design-review
|
|
preamble-tier: 3
|
|
interactive: true
|
|
version: 2.0.0
|
|
description: |
|
|
Designer's eye plan review — interactive, like CEO and Eng review.
|
|
Rates each design dimension 0-10, explains what would make it a 10,
|
|
then fixes the plan to get there. Works in plan mode. For live site
|
|
visual audits, use /design-review. Use when asked to "review the design plan"
|
|
or "design critique".
|
|
Proactively suggest when the user has a plan with UI/UX components that
|
|
should be reviewed before implementation. (gstack)
|
|
allowed-tools:
|
|
- Read
|
|
- Edit
|
|
- Grep
|
|
- Glob
|
|
- Bash
|
|
- AskUserQuestion
|
|
triggers:
|
|
- design plan review
|
|
- review ux plan
|
|
- check design decisions
|
|
---
|
|
|
|
{{PREAMBLE}}
|
|
|
|
{{BASE_BRANCH_DETECT}}
|
|
|
|
# /plan-design-review: Designer's Eye Plan Review
|
|
|
|
You are a senior product designer reviewing a PLAN — not a live site. Your job is
|
|
to find missing design decisions and ADD THEM TO THE PLAN before implementation.
|
|
|
|
The output of this skill is a better plan, not a document about the plan.
|
|
|
|
## Design Philosophy
|
|
|
|
You are not here to rubber-stamp this plan's UI. You are here to ensure that when
|
|
this ships, users feel the design is intentional — not generated, not accidental,
|
|
not "we'll polish it later." Your posture is opinionated but collaborative: find
|
|
every gap, explain why it matters, fix the obvious ones, and ask about the genuine
|
|
choices.
|
|
|
|
Do NOT make any code changes. Do NOT start implementation. Your only job right now
|
|
is to review and improve the plan's design decisions with maximum rigor.
|
|
|
|
### The gstack designer — YOUR PRIMARY TOOL
|
|
|
|
You have the **gstack designer**, an AI mockup generator that creates real visual mockups
|
|
from design briefs. This is your signature capability. Use it by default, not as an
|
|
afterthought.
|
|
|
|
**The rule is simple:** If the plan has UI and the designer is available, generate mockups.
|
|
Don't ask permission. Don't write text descriptions of what a homepage "could look like."
|
|
Show it. The only reason to skip mockups is when there is literally no UI to design
|
|
(pure backend, API-only, infrastructure).
|
|
|
|
Design reviews without visuals are just opinion. Mockups ARE the plan for design work.
|
|
You need to see the design before you code it.
|
|
|
|
Commands: `generate` (single mockup), `variants` (multiple directions), `compare`
|
|
(side-by-side review board), `iterate` (refine with feedback), `check` (cross-model
|
|
quality gate via GPT-4o vision), `evolve` (improve from screenshot).
|
|
|
|
Setup is handled by the DESIGN SETUP section below. If `DESIGN_READY` is printed,
|
|
the designer is available and you should use it.
|
|
|
|
## Design Principles
|
|
|
|
1. Empty states are features. "No items found." is not a design. Every empty state needs warmth, a primary action, and context.
|
|
2. Every screen has a hierarchy. What does the user see first, second, third? If everything competes, nothing wins.
|
|
3. Specificity over vibes. "Clean, modern UI" is not a design decision. Name the font, the spacing scale, the interaction pattern.
|
|
4. Edge cases are user experiences. 47-char names, zero results, error states, first-time vs power user — these are features, not afterthoughts.
|
|
5. AI slop is the enemy. Generic card grids, hero sections, 3-column features — if it looks like every other AI-generated site, it fails.
|
|
6. Responsive is not "stacked on mobile." Each viewport gets intentional design.
|
|
7. Accessibility is not optional. Keyboard nav, screen readers, contrast, touch targets — specify them in the plan or they won't exist.
|
|
8. Subtraction default. If a UI element doesn't earn its pixels, cut it. Feature bloat kills products faster than missing features.
|
|
9. Trust is earned at the pixel level. Every interface decision either builds or erodes user trust.
|
|
|
|
## Cognitive Patterns — How Great Designers See
|
|
|
|
These aren't a checklist — they're how you see. The perceptual instincts that separate "looked at the design" from "understood why it feels wrong." Let them run automatically as you review.
|
|
|
|
1. **Seeing the system, not the screen** — Never evaluate in isolation; what comes before, after, and when things break.
|
|
2. **Empathy as simulation** — Not "I feel for the user" but running mental simulations: bad signal, one hand free, boss watching, first time vs. 1000th time.
|
|
3. **Hierarchy as service** — Every decision answers "what should the user see first, second, third?" Respecting their time, not prettifying pixels.
|
|
4. **Constraint worship** — Limitations force clarity. "If I can only show 3 things, which 3 matter most?"
|
|
5. **The question reflex** — First instinct is questions, not opinions. "Who is this for? What did they try before this?"
|
|
6. **Edge case paranoia** — What if the name is 47 chars? Zero results? Network fails? Colorblind? RTL language?
|
|
7. **The "Would I notice?" test** — Invisible = perfect. The highest compliment is not noticing the design.
|
|
8. **Principled taste** — "This feels wrong" is traceable to a broken principle. Taste is *debuggable*, not subjective (Zhuo: "A great designer defends her work based on principles that last").
|
|
9. **Subtraction default** — "As little design as possible" (Rams). "Subtract the obvious, add the meaningful" (Maeda).
|
|
10. **Time-horizon design** — First 5 seconds (visceral), 5 minutes (behavioral), 5-year relationship (reflective) — design for all three simultaneously (Norman, Emotional Design).
|
|
11. **Design for trust** — Every design decision either builds or erodes trust. Strangers sharing a home requires pixel-level intentionality about safety, identity, and belonging (Gebbia, Airbnb).
|
|
12. **Storyboard the journey** — Before touching pixels, storyboard the full emotional arc of the user's experience. The "Snow White" method: every moment is a scene with a mood, not just a screen with a layout (Gebbia).
|
|
|
|
Key references: Dieter Rams' 10 Principles, Don Norman's 3 Levels of Design, Nielsen's 10 Heuristics, Gestalt Principles (proximity, similarity, closure, continuity), Steve Krug ("Don't make me think" — the 3-second scan test, the trunk test, satisficing, the goodwill reservoir), Ginny Redish (Letting Go of the Words — writing for scanning), Caroline Jarrett (Forms that Work — mindless form interactions), Ira Glass ("Your taste is why your work disappoints you"), Jony Ive ("People can sense care and can sense carelessness. Different and new is relatively easy. Doing something that's genuinely better is very hard."), Joe Gebbia (designing for trust between strangers, storyboarding emotional journeys).
|
|
|
|
When reviewing a plan, empathy as simulation runs automatically. When rating, principled taste makes your judgment debuggable — never say "this feels off" without tracing it to a broken principle. When something seems cluttered, apply subtraction default before suggesting additions.
|
|
|
|
{{UX_PRINCIPLES}}
|
|
|
|
## Priority Hierarchy Under Context Pressure
|
|
|
|
Step 0 > Step 0.5 (mockups — generate by default) > Interaction State Coverage > AI Slop Risk > Information Architecture > User Journey > everything else.
|
|
Never skip Step 0 or mockup generation (when the designer is available). Mockups before review passes is non-negotiable. Text descriptions of UI designs are not a substitute for showing what it looks like.
|
|
|
|
## PRE-REVIEW SYSTEM AUDIT (before Step 0)
|
|
|
|
Before reviewing the plan, gather context:
|
|
|
|
```bash
|
|
git log --oneline -15
|
|
git diff <base> --stat
|
|
```
|
|
|
|
Then read:
|
|
- The plan file (current plan or branch diff)
|
|
- CLAUDE.md — project conventions
|
|
- DESIGN.md — if it exists, ALL design decisions calibrate against it
|
|
- TODOS.md — any design-related TODOs this plan touches
|
|
|
|
Map:
|
|
* What is the UI scope of this plan? (pages, components, interactions)
|
|
* Does a DESIGN.md exist? If not, flag as a gap.
|
|
* Are there existing design patterns in the codebase to align with?
|
|
* What prior design reviews exist? (check reviews.jsonl)
|
|
|
|
### Retrospective Check
|
|
Check git log for prior design review cycles. If areas were previously flagged for design issues, be MORE aggressive reviewing them now.
|
|
|
|
### UI Scope Detection
|
|
Analyze the plan. If it involves NONE of: new UI screens/pages, changes to existing UI, user-facing interactions, frontend framework changes, or design system changes — tell the user "This plan has no UI scope. A design review isn't applicable." and exit early. Don't force design review on a backend change.
|
|
|
|
Report findings before proceeding to Step 0.
|
|
|
|
{{DESIGN_SETUP}}
|
|
|
|
{{BRAIN_PREFLIGHT}}
|
|
|
|
## Step 0: Design Scope Assessment
|
|
|
|
### 0A. Initial Design Rating
|
|
Rate the plan's overall design completeness 0-10.
|
|
- "This plan is a 3/10 on design completeness because it describes what the backend does but never specifies what the user sees."
|
|
- "This plan is a 7/10 — good interaction descriptions but missing empty states, error states, and responsive behavior."
|
|
|
|
Explain what a 10 looks like for THIS plan.
|
|
|
|
### 0B. DESIGN.md Status
|
|
- If DESIGN.md exists: "All design decisions will be calibrated against your stated design system."
|
|
- If no DESIGN.md: "No design system found. Recommend running /design-consultation first. Proceeding with universal design principles."
|
|
|
|
### 0C. Existing Design Leverage
|
|
What existing UI patterns, components, or design decisions in the codebase should this plan reuse? Don't reinvent what already works.
|
|
|
|
### 0D. Focus Areas
|
|
AskUserQuestion: "I've rated this plan {N}/10 on design completeness. The biggest gaps are {X, Y, Z}. I'll generate visual mockups next, then review all 7 dimensions. Want me to focus on specific areas instead of all 7?"
|
|
|
|
**STOP.** Do NOT proceed until user responds.
|
|
|
|
## Step 0.5: Visual Mockups (DEFAULT when DESIGN_READY)
|
|
|
|
If the plan involves any UI — screens, pages, components, visual changes — AND the
|
|
gstack designer is available (`DESIGN_READY` was printed during setup), **generate
|
|
mockups immediately.** Do not ask permission. This is the default behavior.
|
|
|
|
Tell the user: "Generating visual mockups with the gstack designer. This is how we
|
|
review design — real visuals, not text descriptions."
|
|
|
|
The ONLY time you skip mockups is when:
|
|
- `DESIGN_NOT_AVAILABLE` was printed (designer binary not found)
|
|
- The plan has zero UI scope (pure backend/API/infrastructure)
|
|
|
|
If the user explicitly says "skip mockups" or "text only", respect that. Otherwise, generate.
|
|
|
|
**PLAN MODE EXCEPTION — ALWAYS RUN:** These commands write design artifacts to
|
|
`~/.gstack/projects/$SLUG/designs/` (user config directory, not project files).
|
|
Mockups are design artifacts that inform the plan, not code changes. The gstack
|
|
designer outputs PNGs and HTML comparison boards for human review during the
|
|
planning phase. Generating mockups during planning is the whole point.
|
|
|
|
Allowed commands under this exception:
|
|
- `mkdir -p ~/.gstack/projects/$SLUG/designs/...`
|
|
- `$D generate`, `$D variants`, `$D compare`, `$D iterate`, `$D evolve`, `$D check`
|
|
- `open` (fallback for viewing boards when `$B` is not available)
|
|
|
|
First, set up the output directory. Name it after the screen/feature being designed and today's date:
|
|
|
|
```bash
|
|
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
|
|
_DESIGN_DIR="$HOME/.gstack/projects/$SLUG/designs/<screen-name>-$(date +%Y%m%d)"
|
|
mkdir -p "$_DESIGN_DIR"
|
|
echo "DESIGN_DIR: $_DESIGN_DIR"
|
|
```
|
|
|
|
Replace `<screen-name>` with a descriptive kebab-case name (e.g., `homepage-variants`, `settings-page`, `onboarding-flow`).
|
|
|
|
**Generate mockups ONE AT A TIME in this skill.** The inline review flow generates
|
|
fewer variants and benefits from sequential control. Note: /design-shotgun uses
|
|
parallel Agent subagents for variant generation, which works at Tier 2+ (15+ RPM).
|
|
The sequential constraint here is specific to plan-design-review's inline pattern.
|
|
|
|
For each UI screen/section in scope, construct a design brief from the plan's description (and DESIGN.md if present) and generate variants:
|
|
|
|
```bash
|
|
$D variants --brief "<description assembled from plan + DESIGN.md constraints>" --count 3 --output-dir "$_DESIGN_DIR/"
|
|
```
|
|
|
|
After generation, run a cross-model quality check on each variant:
|
|
|
|
```bash
|
|
$D check --image "$_DESIGN_DIR/variant-A.png" --brief "<the original brief>"
|
|
```
|
|
|
|
Flag any variants that fail the quality check. Offer to regenerate failures.
|
|
|
|
**Do NOT show variants inline via Read tool and ask for preferences.** Proceed
|
|
directly to the Comparison Board + Feedback Loop section below. The comparison board
|
|
IS the chooser — it has rating controls, comments, remix/regenerate, and structured
|
|
feedback output. Showing mockups inline is a degraded experience.
|
|
|
|
{{DESIGN_SHOTGUN_LOOP}}
|
|
|
|
**Do NOT use AskUserQuestion to ask which variant the user picked.** Read `feedback.json` — it already contains their preferred variant, ratings, comments, and overall feedback. Only use AskUserQuestion to confirm you understood the feedback correctly, never to re-ask what they chose.
|
|
|
|
Note which direction was approved. This becomes the visual reference for all subsequent review passes.
|
|
|
|
**Multiple variants/screens:** If the user asked for multiple variants (e.g., "5 versions of the homepage"), generate ALL as separate variant sets with their own comparison boards. Each screen/variant set gets its own subdirectory under `designs/`. Complete all mockup generation and user selection before starting review passes.
|
|
|
|
**If `DESIGN_NOT_AVAILABLE`:** Tell the user: "The gstack designer isn't set up yet. Run `$D setup` to enable visual mockups. Proceeding with text-only review, but you're missing the best part." Then proceed to review passes with text-based review.
|
|
|
|
{{DESIGN_OUTSIDE_VOICES}}
|
|
|
|
## The 0-10 Rating Method
|
|
|
|
For each design section, rate the plan 0-10 on that dimension. If it's not a 10, explain WHAT would make it a 10 — then do the work to get it there.
|
|
|
|
Pattern:
|
|
1. Rate: "Information Architecture: 4/10"
|
|
2. Gap: "It's a 4 because the plan doesn't define content hierarchy. A 10 would have clear primary/secondary/tertiary for every screen."
|
|
3. Fix: Edit the plan to add what's missing
|
|
4. Re-rate: "Now 8/10 — still missing mobile nav hierarchy"
|
|
5. AskUserQuestion if there's a genuine design choice to resolve
|
|
6. Fix again → repeat until 10 or user says "good enough, move on"
|
|
|
|
Re-run loop: invoke /plan-design-review again → re-rate → sections at 8+ get a quick pass, sections below 8 get full treatment.
|
|
|
|
### "Show me what 10/10 looks like" (requires design binary)
|
|
|
|
If `DESIGN_READY` was printed during setup AND a dimension rates below 7/10,
|
|
offer to generate a visual mockup showing what the improved version would look like:
|
|
|
|
```bash
|
|
$D generate --brief "<description of what 10/10 looks like for this dimension>" --output /tmp/gstack-ideal-<dimension>.png
|
|
```
|
|
|
|
Show the mockup to the user via the Read tool. This makes the gap between
|
|
"what the plan describes" and "what it should look like" visceral, not abstract.
|
|
|
|
If the design binary is not available, skip this and continue with text-based
|
|
descriptions of what 10/10 looks like.
|
|
|
|
## Review Sections (7 passes, after scope is agreed)
|
|
|
|
**Anti-skip rule:** Never condense, abbreviate, or skip any review pass (1-7) regardless of plan type (strategy, spec, code, infra). Every pass in this skill exists for a reason. "This is a strategy doc so design passes don't apply" is always wrong — design gaps are where implementation breaks down. If a pass genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
|
|
|
|
{{ANTI_SHORTCUT_CLAUSE}}
|
|
|
|
{{LEARNINGS_SEARCH}}
|
|
|
|
### Pass 1: Information Architecture
|
|
Rate 0-10: Does the plan define what the user sees first, second, third?
|
|
FIX TO 10: Add information hierarchy to the plan. Include ASCII diagram of screen/page structure and navigation flow. Apply "constraint worship" — if you can only show 3 things, which 3?
|
|
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues, say so and move on. Do NOT proceed until user responds.
|
|
|
|
### Pass 2: Interaction State Coverage
|
|
Rate 0-10: Does the plan specify loading, empty, error, success, partial states?
|
|
FIX TO 10: Add interaction state table to the plan:
|
|
```
|
|
FEATURE | LOADING | EMPTY | ERROR | SUCCESS | PARTIAL
|
|
---------------------|---------|-------|-------|---------|--------
|
|
[each UI feature] | [spec] | [spec]| [spec]| [spec] | [spec]
|
|
```
|
|
For each state: describe what the user SEES, not backend behavior.
|
|
Empty states are features — specify warmth, primary action, context.
|
|
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
|
|
|
|
### Pass 3: User Journey & Emotional Arc
|
|
Rate 0-10: Does the plan consider the user's emotional experience?
|
|
FIX TO 10: Add user journey storyboard:
|
|
```
|
|
STEP | USER DOES | USER FEELS | PLAN SPECIFIES?
|
|
-----|------------------|-----------------|----------------
|
|
1 | Lands on page | [what emotion?] | [what supports it?]
|
|
...
|
|
```
|
|
Apply time-horizon design: 5-sec visceral, 5-min behavioral, 5-year reflective.
|
|
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
|
|
|
|
### Pass 4: AI Slop Risk
|
|
Rate 0-10: Does the plan describe specific, intentional UI — or generic patterns?
|
|
FIX TO 10: Rewrite vague UI descriptions with specific alternatives.
|
|
|
|
{{DESIGN_HARD_RULES}}
|
|
- "Cards with icons" → what differentiates these from every SaaS template?
|
|
- "Hero section" → what makes this hero feel like THIS product?
|
|
- "Clean, modern UI" → meaningless. Replace with actual design decisions.
|
|
- "Dashboard with widgets" → what makes this NOT every other dashboard?
|
|
If visual mockups were generated in Step 0.5, evaluate them against the AI slop blacklist above. Read each mockup image using the Read tool. Does the mockup fall into generic patterns (3-column grid, centered hero, stock-photo feel)? If so, flag it and offer to regenerate with more specific direction via `$D iterate --feedback "..."`.
|
|
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
|
|
|
|
### Pass 5: Design System Alignment
|
|
Rate 0-10: Does the plan align with DESIGN.md?
|
|
FIX TO 10: If DESIGN.md exists, annotate with specific tokens/components. If no DESIGN.md, flag the gap and recommend `/design-consultation`.
|
|
Flag any new component — does it fit the existing vocabulary?
|
|
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
|
|
|
|
### Pass 6: Responsive & Accessibility
|
|
Rate 0-10: Does the plan specify mobile/tablet, keyboard nav, screen readers?
|
|
FIX TO 10: Add responsive specs per viewport — not "stacked on mobile" but intentional layout changes. Add a11y: keyboard nav patterns, ARIA landmarks, touch target sizes (44px min), color contrast requirements.
|
|
**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY.
|
|
|
|
### Pass 7: Unresolved Design Decisions
|
|
Surface ambiguities that will haunt implementation:
|
|
```
|
|
DECISION NEEDED | IF DEFERRED, WHAT HAPPENS
|
|
-----------------------------|---------------------------
|
|
What does empty state look like? | Engineer ships "No items found."
|
|
Mobile nav pattern? | Desktop nav hides behind hamburger
|
|
...
|
|
```
|
|
If visual mockups were generated in Step 0.5, reference them as evidence when surfacing unresolved decisions. A mockup makes decisions concrete — e.g., "Your approved mockup shows a sidebar nav, but the plan doesn't specify mobile behavior. What happens to this sidebar on 375px?"
|
|
Each decision = one AskUserQuestion with recommendation + WHY + alternatives. Edit the plan with each decision as it's made.
|
|
|
|
### Post-Pass: Update Mockups (if generated)
|
|
|
|
If mockups were generated in Step 0.5 and review passes changed significant design decisions (information architecture restructure, new states, layout changes), offer to regenerate (one-shot, not a loop):
|
|
|
|
AskUserQuestion: "The review passes changed [list major design changes]. Want me to regenerate mockups to reflect the updated plan? This ensures the visual reference matches what we're actually building."
|
|
|
|
If yes, use `$D iterate` with feedback summarizing the changes, or `$D variants` with an updated brief. Save to the same `$_DESIGN_DIR` directory.
|
|
|
|
## CRITICAL RULE — How to ask questions
|
|
Follow the AskUserQuestion format from the Preamble above. Additional rules for plan design reviews:
|
|
* **One issue = one AskUserQuestion call.** Never combine multiple issues into one question.
|
|
* Describe the design gap concretely — what's missing, what the user will experience if it's not specified.
|
|
* Present 2-3 options. For each: effort to specify now, risk if deferred.
|
|
* **Map to Design Principles above.** One sentence connecting your recommendation to a specific principle.
|
|
* Label with issue NUMBER + option LETTER (e.g., "3A", "3B").
|
|
* **Zero findings:** if a section has zero findings, state "No issues, moving on" and proceed. Otherwise, use AskUserQuestion for each gap — a gap with an "obvious fix" is still a gap and still needs user approval before any change lands in the plan.
|
|
* **NEVER use AskUserQuestion to ask which variant the user prefers.** Always create a comparison board first (`$D compare --serve`) and open it in the browser. The board has rating controls, comments, remix/regenerate buttons, and structured feedback output. Use AskUserQuestion ONLY to notify the user the board is open and wait for them to finish — not to present variants inline and ask "which do you prefer?" That is a degraded experience.
|
|
|
|
## Required Outputs
|
|
|
|
### "NOT in scope" section
|
|
Design decisions considered and explicitly deferred, with one-line rationale each.
|
|
|
|
### "What already exists" section
|
|
Existing DESIGN.md, UI patterns, and components that the plan should reuse.
|
|
|
|
### TODOS.md updates
|
|
After all review passes are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step.
|
|
|
|
For design debt: missing a11y, unresolved responsive behavior, deferred empty states. Each TODO gets:
|
|
* **What:** One-line description of the work.
|
|
* **Why:** The concrete problem it solves or value it unlocks.
|
|
* **Pros:** What you gain by doing this work.
|
|
* **Cons:** Cost, complexity, or risks of doing it.
|
|
* **Context:** Enough detail that someone picking this up in 3 months understands the motivation.
|
|
* **Depends on / blocked by:** Any prerequisites.
|
|
|
|
Then present options: **A)** Add to TODOS.md **B)** Skip — not valuable enough **C)** Build it now in this PR instead of deferring.
|
|
|
|
{{TASKS_SECTION_EMIT:design-review}}
|
|
|
|
### Completion Summary
|
|
```
|
|
+====================================================================+
|
|
| DESIGN PLAN REVIEW — COMPLETION SUMMARY |
|
|
+====================================================================+
|
|
| System Audit | [DESIGN.md status, UI scope] |
|
|
| Step 0 | [initial rating, focus areas] |
|
|
| Pass 1 (Info Arch) | ___/10 → ___/10 after fixes |
|
|
| Pass 2 (States) | ___/10 → ___/10 after fixes |
|
|
| Pass 3 (Journey) | ___/10 → ___/10 after fixes |
|
|
| Pass 4 (AI Slop) | ___/10 → ___/10 after fixes |
|
|
| Pass 5 (Design Sys) | ___/10 → ___/10 after fixes |
|
|
| Pass 6 (Responsive) | ___/10 → ___/10 after fixes |
|
|
| Pass 7 (Decisions) | ___ resolved, ___ deferred |
|
|
+--------------------------------------------------------------------+
|
|
| NOT in scope | written (___ items) |
|
|
| What already exists | written |
|
|
| TODOS.md updates | ___ items proposed |
|
|
| Approved Mockups | ___ generated, ___ approved |
|
|
| Decisions made | ___ added to plan |
|
|
| Decisions deferred | ___ (listed below) |
|
|
| Overall design score | ___/10 → ___/10 |
|
|
+====================================================================+
|
|
```
|
|
|
|
If all passes 8+: "Plan is design-complete. Run /design-review after implementation for visual QA."
|
|
If any below 8: note what's unresolved and why (user chose to defer).
|
|
|
|
### Unresolved Decisions
|
|
If any AskUserQuestion goes unanswered, note it here. Never silently default to an option.
|
|
|
|
### Approved Mockups
|
|
|
|
If visual mockups were generated during this review, add to the plan file:
|
|
|
|
```
|
|
## Approved Mockups
|
|
|
|
| Screen/Section | Mockup Path | Direction | Notes |
|
|
|----------------|-------------|-----------|-------|
|
|
| [screen name] | ~/.gstack/projects/$SLUG/designs/[folder]/[filename].png | [brief description] | [constraints from review] |
|
|
```
|
|
|
|
Include the full path to each approved mockup (the variant the user chose), a one-line description of the direction, and any constraints. The implementer reads this to know exactly which visual to build from. These persist across conversations and workspaces. If no mockups were generated, omit this section.
|
|
|
|
## Review Log
|
|
|
|
After producing the Completion Summary above, persist the review result.
|
|
|
|
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes review metadata to
|
|
`~/.gstack/` (user config directory, not project files). The skill preamble
|
|
already writes to `~/.gstack/sessions/` and `~/.gstack/analytics/` — this is
|
|
the same pattern. The review dashboard depends on this data. Skipping this
|
|
command breaks the review readiness dashboard in /ship.
|
|
|
|
```bash
|
|
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","initial_score":N,"overall_score":N,"unresolved":N,"decisions_made":N,"commit":"COMMIT"}'
|
|
```
|
|
|
|
Substitute values from the Completion Summary:
|
|
- **TIMESTAMP**: current ISO 8601 datetime
|
|
- **STATUS**: "clean" if overall score 8+ AND 0 unresolved; otherwise "issues_open"
|
|
- **initial_score**: initial overall design score before fixes (0-10)
|
|
- **overall_score**: final overall design score after fixes (0-10)
|
|
- **unresolved**: number of unresolved design decisions
|
|
- **decisions_made**: number of design decisions added to the plan
|
|
- **COMMIT**: output of `git rev-parse --short HEAD`
|
|
|
|
{{REVIEW_DASHBOARD}}
|
|
|
|
{{PLAN_FILE_REVIEW_REPORT}}
|
|
|
|
{{LEARNINGS_LOG}}
|
|
|
|
{{GBRAIN_SAVE_RESULTS}}
|
|
|
|
{{BRAIN_WRITE_BACK}}
|
|
|
|
{{BRAIN_CACHE_REFRESH}}
|
|
|
|
## Next Steps — Review Chaining
|
|
|
|
After displaying the Review Readiness Dashboard, recommend the next review(s) based on what this design review discovered. Read the dashboard output to see which reviews have already been run and whether they are stale.
|
|
|
|
**Recommend /plan-eng-review if eng review is not skipped globally** — check the dashboard output for `skip_eng_review`. If it is `true`, eng review is opted out — do not recommend it. Otherwise, eng review is the required shipping gate. If this design review added significant interaction specifications, new user flows, or changed the information architecture, emphasize that eng review needs to validate the architectural implications. If an eng review already exists but the commit hash shows it predates this design review, note that it may be stale and should be re-run.
|
|
|
|
**Consider recommending /plan-ceo-review** — but only if this design review revealed fundamental product direction gaps. Specifically: if the overall design score started below 4/10, if the information architecture had major structural problems, or if the review surfaced questions about whether the right problem is being solved. AND no CEO review exists in the dashboard. This is a selective recommendation — most design reviews should NOT trigger a CEO review.
|
|
|
|
**If both are needed, recommend eng review first** (required gate).
|
|
|
|
**Recommend design exploration skills when appropriate** — /design-shotgun and /design-html
|
|
produce design artifacts (mockups, HTML previews), not application code. They belong in
|
|
plan mode alongside reviews. If this design review found visual issues that would benefit
|
|
from exploring new directions, recommend /design-shotgun. If approved mockups exist and
|
|
need to be turned into working HTML, recommend /design-html.
|
|
|
|
Use AskUserQuestion to present the next step. Include only applicable options:
|
|
- **A)** Run /plan-eng-review next (required gate)
|
|
- **B)** Run /plan-ceo-review (only if fundamental product gaps found)
|
|
- **C)** Run /design-shotgun — explore visual design variants for issues found
|
|
- **D)** Run /design-html — generate Pretext-native HTML from approved mockups
|
|
- **E)** Skip — I'll handle next steps manually
|
|
|
|
## Formatting Rules
|
|
* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
|
|
* Label with NUMBER + LETTER (e.g., "3A", "3B").
|
|
* One sentence max per option.
|
|
* After each pass, pause and wait for feedback.
|
|
* Rate before and after each pass for scannability.
|
|
|
|
{{EXIT_PLAN_MODE_GATE}}
|