Files
gstack/plan-eng-review/SKILL.md.tmpl
T
Garry Tan 070722ace3 v1.52.1.0 feat: brain-aware planning — 5 skills read structured gbrain context before asking (#1742)
* feat(brain): brain-cache-spec.ts — single source of truth for cache layer

Foundation for the brain-aware planning skills work (v1.48 plan / D2).
One TS const file consolidates BRAIN_CACHE_ENTITIES (8 entities × TTL +
budget + invalidation rules), SKILL_DIGEST_SUBSETS (per-skill which
files to load), SALIENCE_DEFAULT_ALLOWLIST (D9 privacy gate),
SKILL_CALIBRATION_WEIGHTS (Phase 2 E5), and policy / identity / schema
constants.

Drift between docs and runtime becomes impossible by construction:
resolver, cache CLI, and test/skill-preflight-budget.test.ts all import
from the same module.

test/brain-cache-spec.test.ts: 19 invariant assertions (subset/entity
consistency, per-skill achievability, allowlist sanity, transport
defaults, user-slug fallback chain, lock timeout, retention policy).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): gstack-core@1.0.0 schema pack (T1 / Phase 0)

Defines 8 typed page kinds for the brain entity model:
  gstack/user-profile, gstack/product, gstack/goal,
  gstack/developer-persona, gstack/brand, gstack/competitive-intel,
  gstack/skill-run, gstack/take

Each declares frontmatter shape (typed fields with required/optional flags),
retention policy (immutable / archive-after-90d / never-archive), and
emits_links graph for mcp__gbrain__schema_graph rendering.

getSchemaPackMutationPayload() returns JSON in the shape accepted by
mcp__gbrain__schema_apply_mutations. Idempotent registration: gbrain
skips when pack+version already installed.

test/gstack-schema-pack.test.ts: 16 invariants on pack shape, retention
policies, link verb consistency, JSON serializability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): gstack-brain-cache CLI (T2a) — core subcommands

bin/gstack-brain-cache: TS CLI with five subcommands:
  get <entity-name> [--project <slug>]
  refresh [--full] [--entity X] [--project <slug>]
  invalidate <entity-name> [--project <slug>]
  digest <entity-slug>
  meta [--project <slug>]

Cache layout per Phase 0.5 design:
  ~/.gstack/brain-cache/                 ← cross-project (user-profile)
  ~/.gstack/projects/<slug>/brain-cache/ ← per-project (everything else)

Per-entity TTL drives staleness; per-entity byte budgets enforce
compression at write time. Atomic writes via tmp+rename. Stale-but-usable
fallback when brain unreachable (returns cached digest with diagnostic
prefix instead of failing). Schema-version mismatch + endpoint switch
both trigger full rebuild for the affected scope (D4 A4).

Fetch+compress paths wired for the 7 entities (user-profile, product,
goals, developer-persona, brand, competitive-intel, recent-decisions,
salience) via gbrain CLI shell-out — works for local PGLite and
local-stdio MCP, transparent over the existing spawnGbrain helper.

Concurrent-refresh dedup (D3 / T15) is a follow-up commit. Salience
allowlist gate (D9 / T17) is a follow-up commit. Bootstrap + lifecycle
subcommands (T2b / T18) are follow-up commits.

test/brain-cache-roundtrip.test.ts: 11 tests covering path resolution,
meta lifecycle, endpoint detection, schema mismatch behavior, and the
four cache states (warm / cold-refreshed / stale-fallback / missing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): concurrent-refresh lockfile dedup (T15 / D3)

When autoplan dispatches 4 planning skills back-to-back and they all hit
a cold-miss on the same digest, only ONE actually fetches from the brain.
The rest dedup via the project-scoped lockfile at
~/.gstack/projects/<slug>/brain-cache/.refresh.lock.

Reuses the 5-min stale-takeover convention from /sync-gbrain. Lock is
taken over when:
  - File is older than CACHE_REFRESH_LOCK_TIMEOUT_MS
  - PID is on the same host and dead (process.kill(pid, 0) fails)
  - Lock file is corrupt (defensive)

withRefreshLock(projectSlug, fn) returns either the callback's value or
the literal 'dedup'. The CLI emits exit code 3 + diagnostic stderr on
dedup, so callers can choose to wait + retry (resolver does this) or
fall through to stale-but-usable behavior.

test/cache-concurrent-refresh.test.ts: 7 tests covering acquire/release,
stale-takeover, dead-PID takeover, corrupt-lock recovery, error-path
release, and cross-project lock location.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): salience privacy allowlist gate (T17 / D9)

D9 cross-model finding from codex outside voice: salience-sourced digests
can include emotionally-weighted personal pages (family, therapy,
reflection). Pulling those into a coding-review prompt leaks sensitive
context into work-flow reasoning.

fetchSalience now strips entries whose slugs don't match an allowlist
prefix BEFORE writing to the cache file. Default allowlist is
SALIENCE_DEFAULT_ALLOWLIST = ['projects/', 'concepts/', 'gstack/'].
User can extend via:
  gstack-config set salience_allowlist 'projects/,gstack/,concepts/,custom/'
or override with GSTACK_SALIENCE_ALLOWLIST env var.

Digest still records the strip count for transparency. Empty result
emits 'all N entries stripped' note rather than silent absence.

test/salience-allowlist.test.ts: 9 tests covering default permits,
default blocks, empty allowlist, env override, whitespace trimming,
and the invariant that defaults contain nothing sensitive (personal,
family, therapy, reflection, private, medical, health).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): bootstrap + list + purge subcommands (T2b / T18)

T2b — bootstrap synthesizes draft entity content from CLAUDE.md + README
+ recent learnings.jsonl and emits as JSON for the caller. Skill template
is responsible for the AUQ-confirm-before-write flow (D10 T4 extraction-
review requirement). Cli stays pure (no AUQ logic); agent owns user
interaction.

T18 — list/purge subcommands close the lifecycle loop:
  list [--project <slug>] — enumerate gstack-owned pages in brain
                            (probe all 8 gstack/* page types)
  purge <slug>           — delete one gstack page, refuses non-gstack/
                            slugs (defensive)

list defaults to all-projects (cross-project user-profile included).
With --project, filters to per-project pages plus the cross-project
user-profile. --json flag emits machine-readable output for the agent.

Retention sweep + audit subcommand are deferred to a follow-up commit
(they need the lifecycle scheduling design, not just CLI plumbing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): brain-aware planning resolvers + 3 new placeholders (T4)

scripts/resolvers/gbrain.ts adds:
  - generateBrainPreflight(ctx)       — emits per-skill ## Brain Context
                                        block + bash that loads digests via
                                        gstack-brain-cache get (one call per
                                        digest). Per-skill subset comes from
                                        SKILL_DIGEST_SUBSETS (single source).
  - generateBrainCacheRefresh(ctx)    — at-skill-end background refresh hook;
                                        non-blocking; warms cache for next run.
  - generateBrainWriteBack(ctx)       — Phase 2 / E5 calibration write-back
                                        with per-skill weight. Gated on
                                        personal trust policy + the
                                        BRAIN_CALIBRATION_WRITEBACK flag.
                                        Includes invalidation bash that busts
                                        affected digests after the write.

scripts/resolvers/index.ts registers three new placeholders:
  {{BRAIN_PREFLIGHT}}, {{BRAIN_CACHE_REFRESH}}, {{BRAIN_WRITE_BACK}}

All three resolvers return empty string for skills not in
SKILL_DIGEST_SUBSETS (defensive — skill template authors can drop the
placeholders into non-preflight skills with zero effect).

D9 privacy is mentioned in the rendered preflight prose so the agent
knows to expect filtered salience.
D11 codex tension: write-back gates on brain_trust_policy@<hash> being
personal — shared brains skip write-back to avoid polluting team
calibration profile.

test/brain-preflight.test.ts: 19 tests covering subset rendering,
non-preflight skill gating, cross-project vs per-project --project flag
emission, weight injection per skill, BRAIN_CALIBRATION_WRITEBACK flag
mention, and registration in RESOLVERS map.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): gstack-config brain integration helpers (T5+T10+T16)

Extends bin/gstack-config to support the brain-aware planning layer:

KEY VALIDATION (T5):
  Plain alphanumeric/underscore now extended to allow @<hex-hash> suffix.
  Required for per-endpoint namespaced keys (brain_trust_policy@<sha8>,
  user_slug_at_<sha8>). Keys without the suffix still validate as before.

VALUE WHITELISTING (D4 / D11):
  brain_trust_policy@* values gated to personal | shared | unset.
  Unknown values warn + default to unset (defense against typos).

NEW DEFAULTS (lookup_default):
  brain_trust_policy@*  -> unset
  salience_allowlist    -> '' (resolver uses SALIENCE_DEFAULT_ALLOWLIST)
  user_slug_at_*        -> '' (resolve-user-slug fills + persists on demand)

NEW SUBCOMMANDS:
  endpoint-hash      — print sha8 of active gbrain MCP URL from
                       ~/.claude.json. Collision check escalates to sha16
                       when a prior endpoint stored at the same sha8
                       would conflict (T10 defensive default).
  resolve-user-slug  — walks D4 A3 identity chain:
                         1. mcp__gbrain__whoami.client_name
                         2. $USER env var
                         3. sha8(git config user.email)
                         4. anonymous-<sha8(hostname)>
                       Persists result on first call so subsequent
                       calls are stable across sessions.

test/user-slug-fallback.test.ts: 14 tests covering endpoint-hash output
shape, fallback chain ordering, persistence, brain_trust_policy
namespace value validation + per-endpoint isolation, and key validator
extension for @-suffixed keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): wire 5 planning skill templates with BRAIN_* placeholders (T6)

Adds three placeholders to each of the 5 planning SKILL.md.tmpl files:
  {{BRAIN_PREFLIGHT}}     — top of skill body, before first interactive
                            section. Loads the per-skill digest subset
                            (5 files for office-hours, 2 for plan-eng-
                            review, etc.) into the prompt context before
                            any AskUserQuestion fires.
  {{BRAIN_WRITE_BACK}}    — end of skill, before refresh hook. Phase 2
                            calibration write path; gated on personal
                            policy + BRAIN_CALIBRATION_WRITEBACK flag.
  {{BRAIN_CACHE_REFRESH}} — end of skill, after write-back. Non-blocking
                            background refresh so next invocation gets
                            warm cache.

Files touched (templates + regenerated SKILL.md):
  office-hours/SKILL.md.tmpl
  plan-ceo-review/SKILL.md.tmpl
  plan-eng-review/SKILL.md.tmpl
  plan-design-review/SKILL.md.tmpl
  plan-devex-review/SKILL.md.tmpl
  (matching .md files regenerated via bun run gen:skill-docs)

All 5 generated SKILL.md files now contain the rendered ## Brain Context
(preflight) section + write-back guidance + background-refresh hook. The
resolver renders only for skills in SKILL_DIGEST_SUBSETS — these 5 + an
empty string for any other skill that drops in the placeholders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): setup-gbrain trust-policy step + sync-gbrain flags (T5b / T13+T5c)

T5b — setup-gbrain Step 9.5:
  Inserts the brain trust policy AskUserQuestion before the verdict block.
  Detects active endpoint hash via gstack-config endpoint-hash. Branches
  per transport:
    * Local (sha == "local"): auto-set personal, one-line notice
    * Remote-MCP, unset: AskUserQuestion (personal vs shared)
    * Already-set: skip, just print current policy
  Personal default flips artifacts_sync_mode=full when still off.

T13+T5c — sync-gbrain:
  Adds two flag short-circuits:
    --refresh-cache : route to gstack-brain-cache refresh --project <slug>;
                       skip code + memory + brain-sync stages. Replaces
                       the planned /brain-refresh-context skill per D1
                       fold (one fewer always-loaded skill in catalog).
    --audit          : emit gstack-owned page summary + sensitive-content
                       leak check via gstack-brain-cache list. Read-only.
  Step 1 trust policy gate: fires the same AskUserQuestion as setup-gbrain
  Step 9.5 when policy is unset for a remote endpoint. Local engines
  auto-set personal silently. Idempotent for already-set policies.

Both templates re-rendered via bun run gen:skill-docs. Trust policy
question wording centralized in setup-gbrain Step 9.5; sync-gbrain
Step 1 references it to avoid prompt drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(brain): schema migration + fence-block fallback + preflight budget (T19+T21)

3 new gate-tier test files closing the most important coverage gaps in
the brain-aware planning layer:

test/schema-version-migration.test.ts (D4 A4):
  - Cache file with mismatched schema_version triggers wipe-and-rebuild
  - Matching version + fresh TTL stays warm-hit (no unnecessary rebuild)
  - Rebuild wipes ALL files in scope, not just the one being read

test/takes-fence-fallback.test.ts:
  - Every preflight skill mentions both takes_add (preferred) and
    put_page fence-block (fallback for pre-T8 gbrain versions)
  - All 5 skills gate on BRAIN_CALIBRATION_WRITEBACK flag + personal
    trust policy
  - Per-skill weight matches SKILL_CALIBRATION_WEIGHTS (E5)
  - Write-back emits the kind=bet frontmatter shape and invalidates
    affected cache digests

test/skill-preflight-budget.test.ts (T21 / D7):
  - Per-skill BRAIN_* instruction bytes stay under 3x the runtime
    digest budget (resolver bloat catch)
  - Autoplan total instruction bytes stay under 75 KB (3x of 25 KB
    runtime cap)
  - Non-preflight skills emit zero brain bytes
  - Per-skill subset references are present in the preflight bash

Note on the 3x multiplier: SKILL_PREFLIGHT_BUDGET_BYTES governs runtime
digest data (enforced by cache CLI truncateToBudget). Instruction text
emitted by the resolver gets a separate 3x headroom — anything beyond
that signals the instructions themselves are bloated and need a trim.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(todos): brain-aware planning follow-ups (T11)

Adds five deferred items from the v1.48.0.0 brain-aware planning plan:

  - P2: /gstack-reflect nightly synthesis skill (E2, deferred D4)
  - P3: cross-machine brain-cache sync (E3, deferred D5)
  - P3: /gstack-onboarding dedicated skill (E4, deferred D6)
  - P2: upstream gbrain takes_add + takes_resolve MCP ops (T8 wrap-up)
  - P3: background-refresh hook supervision (codex outside-voice T3)

Each entry follows the TODOS.md format: What / Why / Pros / Cons /
Context / Effort / Depends on. Each cross-references the v1.48.0.0
review decision (D-numbers from /plan-ceo-review and /plan-eng-review)
that deferred it.

The plan itself is at ~/.claude/plans/hm-interesting-well-why-dapper-eagle.md
and is NOT a TODO entry (it's a one-shot design doc, not ongoing work).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(brain): bump schema-migration test timeout to 60s

Rebuild path fans out to 7 per-project entity refreshes, each shelling
gbrain with 10s internal timeout. Worst case ~70s. Default bun test
5s was timing out on slow brain unreachable cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.50.0.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): tighten put_page regression pin to CLI subcommand

The test asserted no substring 'put_page' anywhere in the resolver,
but the BRAIN_WRITE_BACK resolver legitimately references the MCP op
`mcp__gbrain__put_page` as the fallback path for calibration takes
when gbrain v0.42+'s `takes_add` op isn't available. The check
conflated the deprecated `gbrain put_page` CLI subcommand (renamed in
v0.18+ to `gbrain put`) with the still-valid MCP op of the same name.

Narrow the assertion to `gbrain put_page` (with the space) so the
fallback prose stays legal while the CLI rename regression stays caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): gstack-config gbrain-refresh subcommand

Adds a new subcommand that re-detects gbrain installation state and
persists the result to ~/.gstack/gbrain-detection.json. The detection
file is consumed by gen-skill-docs --respect-detection (next commit)
to decide whether to render the GBRAIN_CONTEXT_LOAD and
GBRAIN_SAVE_RESULTS resolver blocks in user-local SKILL.md generation.

Reuses the existing bin/gstack-gbrain-detect helper for the actual
probe; this subcommand just persists + summarizes. Users run it after
installing or uninstalling gbrain so their locally generated SKILL.md
files match their installation state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): gen-skill-docs respects gbrain-detection override

Adds --respect-detection flag (and bun run gen:skill-docs:user script).
When the flag is set, gen-skill-docs reads ~/.gstack/gbrain-detection.json
and filters GBRAIN_CONTEXT_LOAD + GBRAIN_SAVE_RESULTS out of each host's
suppressedResolvers when gbrain_local_status is "ok". When absent or
gbrain isn't detected, suppression behaves as before.

The default `bun run gen:skill-docs` (CI canonical) ignores the
detection file so the committed SKILL.md stays reproducible regardless
of any developer's local gbrain installation state. Use
gen:skill-docs:user for user-local installs (./setup invokes it).

No host config files modified — the static suppressedResolvers stay
correct for the no-gbrain case; the override happens at gen-time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): setup runs gbrain detection + conditional SKILL.md regen

At the end of install, ./setup now:
  1. Runs bin/gstack-gbrain-detect, persists the result to
     ~/.gstack/gbrain-detection.json
  2. If gbrain_local_status == "ok", regenerates Claude-host SKILL.md
     via `bun run gen:skill-docs:user --host claude` so the user's
     local install picks up the compressed brain-aware blocks
  3. If gbrain isn't detected, leaves the canonical no-gbrain SKILL.md
     files in place (zero token overhead) and surfaces the
     gstack-config gbrain-refresh path for users who install gbrain
     later

Together with the prior two commits, this completes the setup-time
conditional un-suppression: brain-aware blocks render iff the user
has gbrain installed, regardless of which CLI host they're on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(brain): compress GBRAIN_* resolvers, move template prose to docs/

generateGBrainContextLoad: 80 -> 115 tokens with explicit skip-header.
generateGBrainSaveResults: 500-700 -> 161 tokens per skill with the
skill metadata extracted into a typed skillSaveMap (slugPrefix + title
+ tag). Verbose prose (heredoc body, entity-stub instructions, throttle
handling, backlink protocol) moved into a new doc:
docs/gbrain-write-surfaces.md (Sections: §Context Load, §Save Template).
The agent reads the doc on-demand only when actually saving — one Read
call, cached by Claude's context.

Net per-planning-skill overhead under un-suppression drops from ~1000
tokens (naive un-suppression) to ~275 tokens (compressed). Combined
with the setup-time detection from prior commits, users WITHOUT gbrain
pay zero overhead (block suppressed at gen-time) and users WITH gbrain
pay ~275 tokens.

The /investigate special-case (data-research routing in CONTEXT_LOAD)
stays inline since it's skill-specific.

docs/gbrain-write-surfaces.md also serves as the manual-probe reference
for humans verifying live persistence + a topology summary covering
trust-policy + .gbrain-source reads-only semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): wire SAVE_RESULTS for plan-design-review + plan-devex-review

Adds {{GBRAIN_SAVE_RESULTS}} placeholder to the two planning skills
that were missing it, immediately before {{BRAIN_WRITE_BACK}} (mirrors
plan-eng-review:324 + office-hours:650). The corresponding skillSaveMap
entries (design-reviews/<feature-slug> + devex-reviews/<feature-slug>)
landed with the resolver compression in the prior commit.

Regenerated SKILL.md reflects the new placeholder position. The
default no-gbrain generation (CI canonical) still suppresses the
block — zero diff in the rendered output for non-gbrain users.

All five planning skills now write a retrievable review page to gbrain
when gbrain is detected at setup time, instead of three of five.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(brain): resolver compression + detection-override regression pins

test/resolvers-gbrain-save-results.test.ts (140 LOC, 10 tests):
  - Per-skill assertions for all 5 planning skills: emits gbrain put +
    correct slug prefix + tag + title.
  - Skip-header present so agent can short-circuit when gbrain isn't
    on PATH.
  - Compression pin: each per-skill block stays under 750 chars
    (~190 tokens) — guards against a future "let me add one more
    line" refactor silently re-inflating toward the ~1000-token naive
    un-suppression baseline.
  - Generic fallback for unmapped skill names still works.
  - /investigate gets the data-research routing suffix; non-investigate
    skills do not.
  - generateGBrainContextLoad stays under 500 chars (~125 tokens).

test/gbrain-detection-override.test.ts (120 LOC, 4 tests):
  - End-to-end through gen-skill-docs subprocess against an isolated
    temp GSTACK_HOME. Asserts:
    * detected:true un-suppresses GBRAIN_* → SKILL.md gains the block
    * detected:false (status != "ok") suppresses → no block
    * no detection file suppresses → no block (graceful default)
    * no --respect-detection flag IGNORES the detection file → no
      block (CI canonical path stays reproducible)

Each detection-override test restores the canonical SKILL.md in a
finally block so the working tree stays clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(brain): fake-CLI agent-obedience E2E for /office-hours writeback

test/skill-e2e-office-hours-brain-writeback.test.ts (~210 LOC,
periodic-tier, ~$0.50-1/run):

Drives /office-hours via runSkillTest against a deterministic fixture
brief (pixel.fund founder pitch). The workdir has:
  - A regenerated office-hours/SKILL.md with the compressed brain blocks
    (generated via gen-skill-docs --respect-detection against a temp
    GSTACK_HOME, then restored to canonical post-snapshot)
  - A fake gbrain shell script on PATH that uses printf %q quoting to
    preserve --content "$(cat <<'EOF' ... EOF)" heredoc payloads
    intact (naive `echo "$@"` would lose argv boundaries)
  - The docs/gbrain-write-surfaces.md the resolver points to

Asserts:
  - gbrain-calls.log contains `gbrain put office-hours/pixel-fund`
  - Payload file at gbrain-payloads/office-hours/pixel-fund.md exists
    with valid YAML frontmatter (title: + tags: + design-doc tag)
  - At least one gbrain put entities/<name> call (entity stub
    enrichment is best-effort, soft warning if absent)

Covers agent obedience to the SAVE_RESULTS instruction. Out of scope:
gbrain CLI persistence contract (T11 covers that with real PGLite).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(brain): real PGLite round-trip E2E (matched-pair persistence)

test/skill-e2e-gbrain-roundtrip-local.test.ts (~145 LOC, periodic-tier,
~$0.001/run on Voyage):

Real gbrain CLI round-trip against an isolated temp HOME:
  1. gbrain init --pglite --embedding-model voyage:voyage-code-3
  2. gbrain put office-hours/<unique-slug> --content <markdown>
  3. gbrain get <slug>
  4. Assert every body line survives + title + tags + non-empty

This is the matched-pair check for the v1.50.0.0 question "is the data
we hope to save actually being saved?" — proves the gbrain CLI
persistence contract gstack relies on, against a real engine.

Does NOT involve the agent — pure CLI integration test. The agent
obedience side is covered by the fake-CLI E2E in the prior commit.

Skips cleanly when VOYAGE_API_KEY is unset OR gbrain CLI is missing
from PATH, so CI without secrets degrades gracefully.

Remote/Supabase routing is gbrain's contract — the same CLI shape
works against every engine. gstack stops at local round-trip coverage
to avoid re-testing gbrain's MCP client implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(brain): touchfiles + TODOS + CHANGELOG for v1.50.0.0

test/helpers/touchfiles.ts: register the two new E2Es in
E2E_TOUCHFILES + E2E_TIERS (both periodic):
  - office-hours-brain-writeback: triggered by resolver / gen-pipeline /
    detection helper / refresh subcommand / office-hours template /
    docs / fixture / test file changes
  - gbrain-roundtrip-local: triggered by resolver / test file changes

TODOS.md: append two P2 follow-ups carried over from the v1.50 plan:
  - Re-verify calibration takes when gbrain v0.42+ ships takes_add and
    BRAIN_CALIBRATION_WRITEBACK flips TRUE
  - Extend brain-writeback E2E to the other 4 planning skills (extract
    makeFakeGbrain to test/helpers/fake-gbrain.ts when second consumer
    arrives)

CHANGELOG.md v1.50.0.0: add a "Save-results path: works under any CLI
when gbrain is on PATH" section that documents the headline:
  - Conditional inclusion at setup-time (zero overhead for non-gbrain
    users, ~250 tokens with gbrain)
  - Wiring symmetry fix (5 of 5 planning skills now write a page)
  - Token cost table comparing detection states
  - Test coverage map (resolver unit + override mechanism + fake-CLI
    agent obedience + real PGLite round-trip)
  - Why remote routing isn't tested here (gbrain's contract)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(brain): tighten prompt + relax slug assertion in writeback E2E

Two fixes:

1. Prompt: "Slug it 'pixel-fund'" was ambiguous — agent could read it
   as "use pixel-fund as the FULL slug" instead of "substitute
   pixel-fund for <feature-slug>". Replaced with explicit guidance:
   "The feature-slug value to substitute into the SAVE_RESULTS
   template's <feature-slug> placeholder is exactly 'pixel-fund' (no
   path prefix — the template already provides the prefix). Apply the
   SAVE_RESULTS template literally." Also added "Do NOT explore gbrain
   --help" to short-circuit the discovery loop the agent fell into.

2. Slug assertion: was a strict /gbrain put .*office-hours\/pixel-fund/
   regex. This conflated two concerns — agent obedience (does the
   agent actually invoke gbrain put?) vs resolver output shape (does
   the template emit the right prefix?). The latter is already pinned
   by test/resolvers-gbrain-save-results.test.ts at the resolver level
   (free, hermetic). The E2E now asserts /gbrain put .*pixel-fund/
   (slug contains pixel-fund somewhere) plus a recursive payload-file
   search that accepts either office-hours/pixel-fund.md (template-
   faithful) or pixel-fund.md (agent dropped prefix). The YAML
   frontmatter + tag assertions on the payload remain strict — those
   are the real agent-obedience contract.

3. Entity-stub regex: was looking for entities/<name>; agent
   variability uses entity/<name>, people/<name>, companies/<name>.
   Loosened to match entit(y|ies) only. The soft-warning path stays
   (no hard fail) because entity extraction is best-effort prose, not
   a CLI contract.

Verified passing locally: 7 expect() calls, 268s, ~$0.50.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version to 1.51.1.0

main advanced to 1.51.0.0 while this branch was in development. Bump
to 1.51.1.0 (PATCH above main) so the branch lands cleanly above the
current main version per the monotonic-ordered-release invariant.

Renames the branch-internal [1.50.0.0] CHANGELOG entry to [1.51.1.0] —
1.50.0.0 never landed on main (main skipped to 1.51.0.0), so this
consolidates the branch's brain-aware planning + save-results work
under a single shipping version with no orphaned entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 08:35:00 -07:00

351 lines
26 KiB
Cheetah

---
name: plan-eng-review
preamble-tier: 3
interactive: true
version: 1.0.0
description: |
Eng manager-mode plan review. Lock in the execution plan — architecture,
data flow, diagrams, edge cases, test coverage, performance. Walks through
issues interactively with opinionated recommendations. Use when asked to
"review the architecture", "engineering review", or "lock in the plan".
Proactively suggest when the user has a plan or design doc and is about to
start coding — to catch architecture issues before implementation. (gstack)
voice-triggers:
- "tech review"
- "technical review"
- "plan engineering review"
benefits-from: [office-hours]
allowed-tools:
- Read
- Write
- Grep
- Glob
- AskUserQuestion
- Bash
- WebSearch
triggers:
- review architecture
- eng plan review
- check the implementation plan
---
{{PREAMBLE}}
{{GBRAIN_CONTEXT_LOAD}}
# Plan Review Mode
Review this plan thoroughly before making any code changes. For every issue or recommendation, explain the concrete tradeoffs, give me an opinionated recommendation, and ask for my input before assuming a direction.
## Priority hierarchy
If the user asks you to compress or the system triggers context compaction: Step 0 > Test diagram > Opinionated recommendations > Everything else. Never skip Step 0 or the test diagram. Do not preemptively warn about context limits -- the system handles compaction automatically.
## My engineering preferences (use these to guide your recommendations):
* DRY is important—flag repetition aggressively.
* Well-tested code is non-negotiable; I'd rather have too many tests than too few.
* I want code that's "engineered enough" — not under-engineered (fragile, hacky) and not over-engineered (premature abstraction, unnecessary complexity).
* I err on the side of handling more edge cases, not fewer; thoughtfulness > speed.
* Bias toward explicit over clever.
* Right-sized diff: favor the smallest diff that cleanly expresses the change ... but don't compress a necessary rewrite into a minimal patch. If the existing foundation is broken, say "scrap it and do this instead."
## Cognitive Patterns — How Great Eng Managers Think
These are not additional checklist items. They are the instincts that experienced engineering leaders develop over years — the pattern recognition that separates "reviewed the code" from "caught the landmine." Apply them throughout your review.
1. **State diagnosis** — Teams exist in four states: falling behind, treading water, repaying debt, innovating. Each demands a different intervention (Larson, An Elegant Puzzle).
2. **Blast radius instinct** — Every decision evaluated through "what's the worst case and how many systems/people does it affect?"
3. **Boring by default** — "Every company gets about three innovation tokens." Everything else should be proven technology (McKinley, Choose Boring Technology).
4. **Incremental over revolutionary** — Strangler fig, not big bang. Canary, not global rollout. Refactor, not rewrite (Fowler).
5. **Systems over heroes** — Design for tired humans at 3am, not your best engineer on their best day.
6. **Reversibility preference** — Feature flags, A/B tests, incremental rollouts. Make the cost of being wrong low.
7. **Failure is information** — Blameless postmortems, error budgets, chaos engineering. Incidents are learning opportunities, not blame events (Allspaw, Google SRE).
8. **Org structure IS architecture** — Conway's Law in practice. Design both intentionally (Skelton/Pais, Team Topologies).
9. **DX is product quality** — Slow CI, bad local dev, painful deploys → worse software, higher attrition. Developer experience is a leading indicator.
10. **Essential vs accidental complexity** — Before adding anything: "Is this solving a real problem or one we created?" (Brooks, No Silver Bullet).
11. **Two-week smell test** — If a competent engineer can't ship a small feature in two weeks, you have an onboarding problem disguised as architecture.
12. **Glue work awareness** — Recognize invisible coordination work. Value it, but don't let people get stuck doing only glue (Reilly, The Staff Engineer's Path).
13. **Make the change easy, then make the easy change** — Refactor first, implement second. Never structural + behavioral changes simultaneously (Beck).
14. **Own your code in production** — No wall between dev and ops. "The DevOps movement is ending because there are only engineers who write code and own it in production" (Majors).
15. **Error budgets over uptime targets** — SLO of 99.9% = 0.1% downtime *budget to spend on shipping*. Reliability is resource allocation (Google SRE).
When evaluating architecture, think "boring by default." When reviewing tests, think "systems over heroes." When assessing complexity, ask Brooks's question. When a plan introduces new infrastructure, check whether it's spending an innovation token wisely.
## Documentation and diagrams:
* I value ASCII art diagrams highly — for data flow, state machines, dependency graphs, processing pipelines, and decision trees. Use them liberally in plans and design docs.
* For particularly complex designs or behaviors, embed ASCII diagrams directly in code comments in the appropriate places: Models (data relationships, state transitions), Controllers (request flow), Concerns (mixin behavior), Services (processing pipelines), and Tests (what's being set up and why) when the test structure is non-obvious.
* **Diagram maintenance is part of the change.** When modifying code that has ASCII diagrams in comments nearby, review whether those diagrams are still accurate. Update them as part of the same commit. Stale diagrams are worse than no diagrams — they actively mislead. Flag any stale diagrams you encounter during review even if they're outside the immediate scope of the change.
{{BRAIN_PREFLIGHT}}
## BEFORE YOU START:
### Design Doc Check
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
SLUG=$(~/.claude/skills/gstack/browse/bin/remote-slug 2>/dev/null || basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-' || echo 'no-branch')
DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head -1)
[ -z "$DESIGN" ] && DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-design-*.md 2>/dev/null | head -1)
[ -n "$DESIGN" ] && echo "Design doc found: $DESIGN" || echo "No design doc found"
```
If a design doc exists, read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design — check the prior version for context on what changed and why.
{{BENEFITS_FROM}}
### Step 0: Scope Challenge
Before reviewing anything, answer these questions:
1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?
2. **What is the minimum set of changes that achieves the stated goal?** Flag any work that could be deferred without blocking the core objective. Be ruthless about scope creep.
3. **Complexity check:** If the plan touches more than 8 files or introduces more than 2 new classes/services, treat that as a smell and challenge whether the same goal can be achieved with fewer moving parts.
4. **Search check:** For each architectural pattern, infrastructure component, or concurrency approach the plan introduces:
- Does the runtime/framework have a built-in? Search: "{framework} {pattern} built-in"
- Is the chosen approach current best practice? Search: "{pattern} best practice {current year}"
- Are there known footguns? Search: "{framework} {pattern} pitfalls"
If WebSearch is unavailable, skip this check and note: "Search unavailable — proceeding with in-distribution knowledge only."
If the plan rolls a custom solution where a built-in exists, flag it as a scope reduction opportunity. Annotate recommendations with **[Layer 1]**, **[Layer 2]**, **[Layer 3]**, or **[EUREKA]** (see preamble's Search Before Building section). If you find a eureka moment — a reason the standard approach is wrong for this case — present it as an architectural insight.
5. **TODOS cross-reference:** Read `TODOS.md` if it exists. Are any deferred items blocking this plan? Can any deferred items be bundled into this PR without expanding scope? Does this plan create new work that should be captured as a TODO?
5. **Completeness check:** Is the plan doing the complete version or a shortcut? With AI-assisted coding, the cost of completeness (100% test coverage, full edge case handling, complete error paths) is 10-100x cheaper than with a human team. If the plan proposes a shortcut that saves human-hours but only saves minutes with CC+gstack, recommend the complete version. Boil the lake.
6. **Distribution check:** If the plan introduces a new artifact type (CLI binary, library package, container image, mobile app), does it include the build/publish pipeline? Code without distribution is code nobody can use. Check:
- Is there a CI/CD workflow for building and publishing the artifact?
- Are target platforms defined (linux/darwin/windows, amd64/arm64)?
- How will users download or install it (GitHub Releases, package manager, container registry)?
If the plan defers distribution, flag it explicitly in the "NOT in scope" section — don't let it silently drop.
If the complexity check triggers (8+ files or 2+ new classes/services), STOP before any review-section work. Call AskUserQuestion: name what's overbuilt, propose a minimal version that achieves the core goal, ask whether to reduce or proceed as-is. The AskUserQuestion call is a tool_use, not prose — call the tool directly.
**STOP.** Do NOT proceed to Section 1 (Architecture review), edit the plan file with a proposed scope reduction, or call ExitPlanMode until the user responds. Naming the 80% solution in chat prose and continuing — or loading the AskUserQuestion schema via ToolSearch and then never invoking it — is the failure mode this gate exists to prevent.
If the complexity check does not trigger, present your Step 0 findings and proceed directly to Section 1.
Always work through the full interactive review: one section at a time (Architecture → Code Quality → Tests → Performance) with at most 8 top issues per section.
**Critical: Once the user accepts or rejects a scope reduction recommendation, commit fully.** Do not re-argue for smaller scope during later review sections. Do not silently reduce scope or skip planned components.
## Review Sections (after scope is agreed)
**Anti-skip rule:** Never condense, abbreviate, or skip any review section (1-4) regardless of plan type (strategy, spec, code, infra). Every section in this skill exists for a reason. "This is a strategy doc so implementation sections don't apply" is always wrong — implementation details are where strategy breaks down. If a section genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
{{ANTI_SHORTCUT_CLAUSE}}
{{LEARNINGS_SEARCH}}
### 1. Architecture review
Evaluate:
* Overall system design and component boundaries.
* Dependency graph and coupling concerns.
* Data flow patterns and potential bottlenecks.
* Scaling characteristics and single points of failure.
* Security architecture (auth, data access, API boundaries).
* Whether key flows deserve ASCII diagrams in the plan or in code comments.
* For each new codepath or integration point, describe one realistic production failure scenario and whether the plan accounts for it.
* **Distribution architecture:** If this introduces a new artifact (binary, package, container), how does it get built, published, and updated? Is the CI/CD pipeline part of the plan or deferred?
For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Use the preamble's AskUserQuestion Format section. The AskUserQuestion call is a tool_use, not prose — call the tool directly.
**STOP.** Do NOT proceed to the next review section, edit the plan file with the proposed fix, or call ExitPlanMode until the user responds. An issue with an "obvious fix" is still an issue and still needs explicit user approval before it lands in the plan. Loading the AskUserQuestion schema via ToolSearch and then writing the recommendation as chat prose is the failure mode this gate exists to prevent.
{{CONFIDENCE_CALIBRATION}}
### 2. Code quality review
Evaluate:
* Code organization and module structure.
* DRY violations—be aggressive here.
* Error handling patterns and missing edge cases (call these out explicitly).
* Technical debt hotspots.
* Areas that are over-engineered or under-engineered relative to my preferences.
* Existing ASCII diagrams in touched files — are they still accurate after this change?
For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Use the preamble's AskUserQuestion Format section. The AskUserQuestion call is a tool_use, not prose — call the tool directly.
**STOP.** Do NOT proceed to the next review section, edit the plan file with the proposed fix, or call ExitPlanMode until the user responds. An issue with an "obvious fix" is still an issue and still needs explicit user approval before it lands in the plan. Loading the AskUserQuestion schema via ToolSearch and then writing the recommendation as chat prose is the failure mode this gate exists to prevent.
### 3. Test review
{{TEST_COVERAGE_AUDIT_PLAN}}
For LLM/prompt changes: check the "Prompt/LLM changes" file patterns listed in CLAUDE.md. If this plan touches ANY of those patterns, state which eval suites must be run, which cases should be added, and what baselines to compare against. Then use AskUserQuestion to confirm the eval scope with the user.
For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Use the preamble's AskUserQuestion Format section. The AskUserQuestion call is a tool_use, not prose — call the tool directly.
**STOP.** Do NOT proceed to the next review section, edit the plan file with the proposed fix, or call ExitPlanMode until the user responds. An issue with an "obvious fix" is still an issue and still needs explicit user approval before it lands in the plan. Loading the AskUserQuestion schema via ToolSearch and then writing the recommendation as chat prose is the failure mode this gate exists to prevent.
### 4. Performance review
Evaluate:
* N+1 queries and database access patterns.
* Memory-usage concerns.
* Caching opportunities.
* Slow or high-complexity code paths.
For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Use the preamble's AskUserQuestion Format section. The AskUserQuestion call is a tool_use, not prose — call the tool directly.
**STOP.** Do NOT proceed to the next review section, edit the plan file with the proposed fix, or call ExitPlanMode until the user responds. An issue with an "obvious fix" is still an issue and still needs explicit user approval before it lands in the plan. Loading the AskUserQuestion schema via ToolSearch and then writing the recommendation as chat prose is the failure mode this gate exists to prevent.
{{CODEX_PLAN_REVIEW}}
### Outside Voice Integration Rule
Outside voice findings are INFORMATIONAL until the user explicitly approves each one.
Do NOT incorporate outside voice recommendations into the plan without presenting each
finding via AskUserQuestion and getting explicit approval. This applies even when you
agree with the outside voice. Cross-model consensus is a strong signal — present it as
such — but the user makes the decision.
## CRITICAL RULE — How to ask questions
Follow the AskUserQuestion format from the Preamble above. Additional rules for plan reviews:
* **One issue = one AskUserQuestion call.** Never combine multiple issues into one question.
* Describe the problem concretely, with file and line references.
* Present 2-3 options, including "do nothing" where that's reasonable.
* For each option, specify in one line: effort (human: ~X / CC: ~Y), risk, and maintenance burden. If the complete option is only marginally more effort than the shortcut with CC, recommend the complete option.
* **Map the reasoning to my engineering preferences above.** One sentence connecting your recommendation to a specific preference (DRY, explicit > clever, minimal diff, etc.).
* Label with issue NUMBER + option LETTER (e.g., "3A", "3B").
* **Coverage vs kind:** for every per-issue AskUserQuestion you raise in this review, decide whether the options differ in coverage or in kind. If coverage (e.g., more tests vs fewer, complete error handling vs happy-path-only, full edge-case coverage vs shortcut), include `Completeness: N/10` on each option. If kind (e.g., architectural choice between two different systems, posture-over-posture, A/B/C where each is a different kind of thing), skip the score and add one line: `Note: options differ in kind, not coverage — no completeness score.` Do NOT fabricate scores on kind-differentiated questions — filler scores are worse than no score.
* **Zero findings:** if a section has zero findings, state "No issues, moving on" and proceed. Otherwise, use AskUserQuestion for each finding — a finding with an "obvious fix" is still a finding and still needs user approval before any change lands in the plan.
## Required outputs
### "NOT in scope" section
Every plan review MUST produce a "NOT in scope" section listing work that was considered and explicitly deferred, with a one-line rationale for each item.
### "What already exists" section
List existing code/flows that already partially solve sub-problems in this plan, and whether the plan reuses them or unnecessarily rebuilds them.
### TODOS.md updates
After all review sections are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step. Follow the format in `.claude/skills/review/TODOS-format.md`.
For each TODO, describe:
* **What:** One-line description of the work.
* **Why:** The concrete problem it solves or value it unlocks.
* **Pros:** What you gain by doing this work.
* **Cons:** Cost, complexity, or risks of doing it.
* **Context:** Enough detail that someone picking this up in 3 months understands the motivation, the current state, and where to start.
* **Depends on / blocked by:** Any prerequisites or ordering constraints.
Then present options: **A)** Add to TODOS.md **B)** Skip — not valuable enough **C)** Build it now in this PR instead of deferring.
Do NOT just append vague bullet points. A TODO without context is worse than no TODO — it creates false confidence that the idea was captured while actually losing the reasoning.
### Diagrams
The plan itself should use ASCII diagrams for any non-trivial data flow, state machine, or processing pipeline. Additionally, identify which files in the implementation should get inline ASCII diagram comments — particularly Models with complex state transitions, Services with multi-step pipelines, and Concerns with non-obvious mixin behavior.
### Failure modes
For each new codepath identified in the test review diagram, list one realistic way it could fail in production (timeout, nil reference, race condition, stale data, etc.) and whether:
1. A test covers that failure
2. Error handling exists for it
3. The user would see a clear error or a silent failure
If any failure mode has no test AND no error handling AND would be silent, flag it as a **critical gap**.
### Worktree parallelization strategy
Analyze the plan's implementation steps for parallel execution opportunities. This helps the user split work across git worktrees (via Claude Code's Agent tool with `isolation: "worktree"` or parallel workspaces).
**Skip if:** all steps touch the same primary module, or the plan has fewer than 2 independent workstreams. In that case, write: "Sequential implementation, no parallelization opportunity."
**Otherwise, produce:**
1. **Dependency table** — for each implementation step/workstream:
| Step | Modules touched | Depends on |
|------|----------------|------------|
| (step name) | (directories/modules, NOT specific files) | (other steps, or —) |
Work at the module/directory level, not file level. Plans describe intent ("add API endpoints"), not specific files. Module-level ("controllers/, models/") is reliable; file-level is guesswork.
2. **Parallel lanes** — group steps into lanes:
- Steps with no shared modules and no dependency go in separate lanes (parallel)
- Steps sharing a module directory go in the same lane (sequential)
- Steps depending on other steps go in later lanes
Format: `Lane A: step1 → step2 (sequential, shared models/)` / `Lane B: step3 (independent)`
3. **Execution order** — which lanes launch in parallel, which wait. Example: "Launch A + B in parallel worktrees. Merge both. Then C."
4. **Conflict flags** — if two parallel lanes touch the same module directory, flag it: "Lanes X and Y both touch module/ — potential merge conflict. Consider sequential execution or careful coordination."
{{TASKS_SECTION_EMIT:eng-review}}
### Completion summary
At the end of the review, fill in and display this summary so the user can see all findings at a glance:
- Step 0: Scope Challenge — ___ (scope accepted as-is / scope reduced per recommendation)
- Architecture Review: ___ issues found
- Code Quality Review: ___ issues found
- Test Review: diagram produced, ___ gaps identified
- Performance Review: ___ issues found
- NOT in scope: written
- What already exists: written
- TODOS.md updates: ___ items proposed to user
- Failure modes: ___ critical gaps flagged
- Outside voice: ran (codex/claude) / skipped
- Parallelization: ___ lanes, ___ parallel / ___ sequential
- Lake Score: X/Y recommendations chose complete option
## Retrospective learning
Check the git log for this branch. If there are prior commits suggesting a previous review cycle (e.g., review-driven refactors, reverted changes), note what was changed and whether the current plan touches the same areas. Be more aggressive reviewing areas that were previously problematic.
## Formatting rules
* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
* Label with NUMBER + LETTER (e.g., "3A", "3B").
* One sentence max per option. Pick in under 5 seconds.
* After each review section, pause and ask for feedback before moving on.
## Review Log
After producing the Completion Summary above, persist the review result.
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes review metadata to
`~/.gstack/` (user config directory, not project files). The skill preamble
already writes to `~/.gstack/sessions/` and `~/.gstack/analytics/` — this is
the same pattern. The review dashboard depends on this data. Skipping this
command breaks the review readiness dashboard in /ship.
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"issues_found":N,"mode":"MODE","commit":"COMMIT"}'
```
Substitute values from the Completion Summary:
- **TIMESTAMP**: current ISO 8601 datetime
- **STATUS**: "clean" if 0 unresolved decisions AND 0 critical gaps; otherwise "issues_open"
- **unresolved**: number from "Unresolved decisions" count
- **critical_gaps**: number from "Failure modes: ___ critical gaps flagged"
- **issues_found**: total issues found across all review sections (Architecture + Code Quality + Performance + Test gaps)
- **MODE**: FULL_REVIEW / SCOPE_REDUCED
- **COMMIT**: output of `git rev-parse --short HEAD`
{{REVIEW_DASHBOARD}}
{{PLAN_FILE_REVIEW_REPORT}}
{{LEARNINGS_LOG}}
{{GBRAIN_SAVE_RESULTS}}
{{BRAIN_WRITE_BACK}}
{{BRAIN_CACHE_REFRESH}}
## Next Steps — Review Chaining
After displaying the Review Readiness Dashboard, check if additional reviews would be valuable. Read the dashboard output to see which reviews have already been run and whether they are stale.
**Suggest /plan-design-review if UI changes exist and no design review has been run** — detect from the test diagram, architecture review, or any section that touched frontend components, CSS, views, or user-facing interaction flows. If an existing design review's commit hash shows it predates significant changes found in this eng review, note that it may be stale.
**Mention /plan-ceo-review if this is a significant product change and no CEO review exists** — this is a soft suggestion, not a push. CEO review is optional. Only mention it if the plan introduces new user-facing features, changes product direction, or expands scope substantially.
**Note staleness** of existing CEO or design reviews if this eng review found assumptions that contradict them, or if the commit hash shows significant drift.
**If no additional reviews are needed** (or `skip_eng_review` is `true` in the dashboard config, meaning this eng review was optional): state "All relevant reviews complete. Run /ship when ready."
Use AskUserQuestion with only the applicable options:
- **A)** Run /plan-design-review (only if UI scope detected and no design review exists)
- **B)** Run /plan-ceo-review (only if significant product change and no CEO review exists)
- **C)** Ready to implement — run /ship when done
## Unresolved decisions
If the user does not respond to an AskUserQuestion or interrupts to move on, note which decisions were left unresolved. At the end of the review, list these as "Unresolved decisions that may bite you later" — never silently default to an option.
{{EXIT_PLAN_MODE_GATE}}