mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-18 07:40:09 +02:00
45cc95d5f4
* feat(gbrain-sync): add cycleCompleted() cycle-state probe Reads `gbrain doctor` cycle_freshness to classify whether a source has completed a full cycle (completed/never/unknown). A fail naming this source -> never; a fail naming only other sources -> completed; an absent or unparseable check -> unknown, so an unrelated doctor failure never masks a real state. Gates the automatic call-graph build on --full. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(gbrain-sync): --dream call-graph stage with lock-free gate + honest outcome guard Adds a source-scoped `gbrain dream --source <id>` stage that builds this worktree's call graph (code-callers/code-callees). Runs lock-free after the sync lock releases so it never blocks sibling worktrees; a .dream-in-progress marker dedupes concurrent dreams. --full auto-runs it only when the cycle was never built; explicit --dream always forces; --no-dream opts out. The stage parses the cycle's own output and reports the truth, not a flat "built": a WARN when the schema pack can't extract code symbols, when the embed phase failed for a missing key, or when 0 edges resolved; OK with the resolved-edge count otherwise. gbrain exits 0 even when it skips on a held cycle lock (e.g. autopilot), so that case reports SKIP, not success. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore: ignore gbrain .sources/ local staging dir gbrain writes per-source staging and capability-check artifacts under .sources/ in the repo root. It's machine-local runtime state, not source. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(gbrain): honest call-graph guidance in /sync-gbrain + pin works on gbrain>=0.41.38 sync-gbrain frames the --dream offer honestly: building a call graph requires a code-aware schema pack, and the dream stage reports a WARN when it can't. The verdict's Call graph row mirrors the dream stage's real outcome instead of assuming a completed cycle means edges exist. The ## GBrain Search Guidance block written into CLAUDE.md drops the old code-callers --source caveat: gbrain >=0.41.38.0 honors the .gbrain-source pin for code-callers/code-callees. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(jsonl-store): shared audited JSONL plumbing (injection-reject + atomic append + tolerant read) Single source of truth extracted for D2A: gstack-learnings-* and the upcoming gstack-decision-* bins share one injection-pattern list, one atomic single-line appender, and one tolerant reader. No more drift between stores. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(learnings-log): use shared hasInjection from lib/jsonl-store (D2A) Replace the inline injection-pattern copy with the shared list. One audited write-path rejection across learnings + the upcoming decision store. Behavior unchanged (35/35 learnings tests green); learnings-search keeps its inline copy because a structural test pins its bash/bun shape. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(decision): event-sourced decision-memory model (lib/gstack-decision) decide/supersede/redact events on lib/jsonl-store; active set is computed (no mutable status), dangling refs tolerated. Free-text is injection-checked and redact-scanned on write (HIGH secret -> reject). Scope filter (repo/branch/issue) for relevant resurfacing. File-only + reliable; gbrain not required. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(decision): bounded active snapshot + compaction (redact expunges, supersede archives) writeSnapshot/readSnapshot/rebuildSnapshot give an O(active) bounded read for the session-start hot path (D1A). compact() rewrites the log to active, archives superseded decisions for history, and EXPUNGES redacted ones (dropped, never archived) so an accidentally-captured secret leaves the store for good. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(decision): gstack-decision-log + gstack-decision-search bins (non-interactive) Two bins mirroring gstack-learnings-* (D3A). log writes decide/--supersede/--redact/ --compact events + refreshes the bounded snapshot + enqueues for cross-machine sync; search reads the O(active) snapshot, scope-filtered to current branch, newest-first, --all to include superseded, --json for machines. Empty store returns silently (no snapshot write on an empty read). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(memory): surface active decisions at session start + capture nudge (Context Recovery) Context Recovery now shows recent scope-relevant active decisions (bounded read of decisions.active.json via gstack-decision-search) and instructs the agent to treat them as settled calls and to log durable decisions/reversals. Closes the Phase-1 capture->curate->resurface loop, reliable + file-only. Regen across all hosts folded in (squash-with-regen); parity 10/10, freshness green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: refresh ship golden baselines for the memory-loop preamble change Context Recovery now emits the cross-session-decisions block, so ship's preamble (all hosts) changed. Golden baselines are hand-maintained copies (gen does not write them); refresh them from the fresh gen so golden-file regression passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(memory): document the cross-session decision-memory loop in CLAUDE.md Adds a '## Cross-session decision memory' section: how to resurface (gstack-decision-search) and capture (gstack-decision-log) durable decisions, the supersede/redact/compact verbs, and a crisp durable-vs-trivial definition so the store stays signal. Reliable file-only path; gbrain not required. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(memory): emit durable decisions from ship/ceo/eng/spec at structured points Wires the four skills that finalize real decisions to capture them in the cross-session decision store, from their STRUCTURED outputs (never free-text scraping): - ship: the version bump (level + why) at write time - plan-ceo-review: accepted scope + verdict (branch-scoped) - plan-eng-review: the architecture verdict + key call (branch-scoped) - spec: the filed issue's core approach (issue-scoped) All emits are non-interactive, schema-correct (content in decision/rationale, source=skill, confidence 1-10), and best-effort (|| true) so a decision-log failure never blocks the workflow. Includes regen across hosts + refreshed ship golden baselines. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(memory): optional gbrain --semantic recall for decision search Adds gstack-decision-search --semantic (with --query): appends a 'Related from memory' block from gbrain semantic search, scoped to the curated-memory source. Pure enhancement, reliability-first: a new lib/gstack-decision-semantic.ts is the ONLY decision module that touches gbrain and is imported lazily only on --semantic, so the reliable file path never loads gbrain code. Every path degrades to the reliable file results when gbrain is off, unconfigured, empty, or errors (never throws, 10s timeout). Built against the verified gbrain 0.42.x surface (text output [score] slug -- snippet, NOT JSON; curated-memory source resolved by worktree path, not a gstack-brain-<user> id). Deterministic-contract tests only: parser units, degrade-to-null when gbrain absent, and a fake-gbrain shim proving scope+search end-to-end. find-contradictions deferred (no verifiable CLI surface yet + curated memory not indexed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(gbrain-sync): self-heal stale autopilot lock (dead-pid) detectAutopilot treated a lock FILE as proof of life, so a crashed gbrain daemon left a stale lock that wedged every sync forever (observed: a dead pid refused --full indefinitely). Now read the holder pid (bare or JSON body) and check liveness via signal-0: ESRCH=dead → ignore the stale signal and keep checking; EPERM=alive (other user) → active. A stale lock never masks a live autopilot process. Pure decision function — does not delete the file; the caller may clean it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(review): drop stray trailing code fence in TODOS-format Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(test): align section-loading E2E testNames with their TOUCHFILES keys Pre-existing on main (v1.56.x): the two section-loading E2E tests used human-label testNames ('/ship section-loading') that don't match their slug keys ('ship-section-loading') in E2E_TOUCHFILES/E2E_TIERS. Every other E2E test uses the slug as its testName, and the TOUCHFILES completeness gate requires testName to be a registered key — so the gate was red. Align both testNames to their slug keys (also fixes tier lookup for these two periodic tests). Verified failing on a clean origin/main checkout before the fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix: pre-landing review fixes (datamark, DRY, compact, coverage) Addresses the pre-landing review findings (all INFORMATIONAL, no criticals): - security: datamark resurfaced decision text at the render boundary (lib/gstack-decision.ts datamark() — neutralizes code fences, --- banners, <|role|>/</system> markers, control chars, newlines). Applied in gstack-decision-search human output so stored text can't masquerade as instructions in Context Recovery (codex hardening #3 / AC #7). --json stays raw. - DRY: extract resolveSlug/gitBranch/flagValue to lib/bin-context.ts; both decision bins use it instead of duplicating the helpers. - compact(): batch the archive append (one write, not N) and shrink the mid-compact crash window; simplify the opaque branch/issue ternary. - coverage: learnings-log injection rejection (D2A wiring), search --recent/ --scope + NaN-safe --recent, datamark-applied, unparseable lock body, compact-empty, corrupt-snapshot degrade. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(security): close adversarial-review findings in decision memory Adversarial review (Claude subagent) found a CRITICAL the specialist pass missed: - F1 (CRITICAL): 'Human:'/'Assistant:' turn-prefixes bypassed BOTH the write-time denylist AND datamark(), landing verbatim in agent context inside the trusted ACTIVE DECISIONS fence. Add 'human:' (+ 'disregard previous', 'from now on') to the shared denylist, and have datamark() neutralize Human:/Assistant:/System:/User: turn-prefixes (ZWSP) at the render boundary. - F2: datamark() only stripped ASCII C0; extend to Unicode line terminators (U+0085/2028/2029) and U+007F so 'strip newlines' actually holds. - F3: validateDecide blocked only HIGH secrets; MEDIUM-tier PII (e.g. SSN) persisted silently and synced cross-machine. The store is non-interactive (no confirm path), so fail closed on MEDIUM too. - F4: compact() was a lock-free read-modify-rewrite that could clobber a concurrent append (lost decision). Add an O_EXCL compact lock + a pre-rename size recheck that aborts untouched (skipped=true) if an append landed; caller re-runs. - F7: filterByScope unknown/garbage scope fell through to 'return true' (leaked into every context); fail conservative (false). F5 (pid reuse) and F6 (pgrep over-match) are intentionally left as-is: both fail SAFE (over-refuse sync); making them precise would introduce a fail-DANGEROUS path (allowing sync during a real autopilot). True disambiguation needs gbrain to stamp the lock with a start-time, which gstack doesn't own. F8 (compact moves history to archive) is by design. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(security): close cross-model (Codex) adversarial findings Codex adversarial review found a HIGH the Claude pass missed plus 3 mediums: - C1 (HIGH): gstack-decision-search --all returned every decide and IGNORED redact events, so a redacted secret still resurfaced via --all until compact ran. --all now excludes redacted (redact = expunge from every read path), still showing superseded history. - C-med: semantic (external gbrain) slug/snippet were printed raw — datamark them too so a gbrain hit can't spoof role markers / fences into agent context. - C4: semanticRecall fell back to an UNSCOPED gbrain search when no curated-memory source resolved, pulling code/doc corpora mislabeled as 'related decisions'. Now returns null (degrade) when there's no worktree-backed memory source. - C5: validateDecide scanned only decision/rationale/alternatives; branch and issue are stored + surfaced (raw via --json), so include them in the injection+secret scan. C2 (snapshot staleness) / C3 (compact TOCTOU residual): accepted for a single-user store — atomic appends never lose the event, rebuilds self-heal, and the compact size-recheck leaves only a sub-ms window; full append-locking would break the lock-free append design. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.57.5.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
226 lines
16 KiB
Cheetah
226 lines
16 KiB
Cheetah
## Review Sections (after scope is agreed)
|
|
|
|
**Anti-skip rule:** Never condense, abbreviate, or skip any review section (1-4) regardless of plan type (strategy, spec, code, infra). Every section in this skill exists for a reason. "This is a strategy doc so implementation sections don't apply" is always wrong — implementation details are where strategy breaks down. If a section genuinely has zero findings, say "No issues found" and move on — but you must evaluate it.
|
|
|
|
{{ANTI_SHORTCUT_CLAUSE}}
|
|
|
|
{{LEARNINGS_SEARCH}}
|
|
|
|
### 1. Architecture review
|
|
Evaluate:
|
|
* Overall system design and component boundaries.
|
|
* Dependency graph and coupling concerns.
|
|
* Data flow patterns and potential bottlenecks.
|
|
* Scaling characteristics and single points of failure.
|
|
* Security architecture (auth, data access, API boundaries).
|
|
* Whether key flows deserve ASCII diagrams in the plan or in code comments.
|
|
* For each new codepath or integration point, describe one realistic production failure scenario and whether the plan accounts for it.
|
|
* **Distribution architecture:** If this introduces a new artifact (binary, package, container), how does it get built, published, and updated? Is the CI/CD pipeline part of the plan or deferred?
|
|
|
|
For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Use the preamble's AskUserQuestion Format section. The AskUserQuestion call is a tool_use, not prose — call the tool directly.
|
|
|
|
**STOP.** Do NOT proceed to the next review section, edit the plan file with the proposed fix, or call ExitPlanMode until the user responds. An issue with an "obvious fix" is still an issue and still needs explicit user approval before it lands in the plan. Loading the AskUserQuestion schema via ToolSearch and then writing the recommendation as chat prose is the failure mode this gate exists to prevent.
|
|
|
|
{{CONFIDENCE_CALIBRATION}}
|
|
|
|
### 2. Code quality review
|
|
Evaluate:
|
|
* Code organization and module structure.
|
|
* DRY violations—be aggressive here.
|
|
* Error handling patterns and missing edge cases (call these out explicitly).
|
|
* Technical debt hotspots.
|
|
* Areas that are over-engineered or under-engineered relative to my preferences.
|
|
* Existing ASCII diagrams in touched files — are they still accurate after this change?
|
|
|
|
For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Use the preamble's AskUserQuestion Format section. The AskUserQuestion call is a tool_use, not prose — call the tool directly.
|
|
|
|
**STOP.** Do NOT proceed to the next review section, edit the plan file with the proposed fix, or call ExitPlanMode until the user responds. An issue with an "obvious fix" is still an issue and still needs explicit user approval before it lands in the plan. Loading the AskUserQuestion schema via ToolSearch and then writing the recommendation as chat prose is the failure mode this gate exists to prevent.
|
|
|
|
### 3. Test review
|
|
|
|
{{TEST_COVERAGE_AUDIT_PLAN}}
|
|
|
|
For LLM/prompt changes: check the "Prompt/LLM changes" file patterns listed in CLAUDE.md. If this plan touches ANY of those patterns, state which eval suites must be run, which cases should be added, and what baselines to compare against. Then use AskUserQuestion to confirm the eval scope with the user.
|
|
|
|
For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Use the preamble's AskUserQuestion Format section. The AskUserQuestion call is a tool_use, not prose — call the tool directly.
|
|
|
|
**STOP.** Do NOT proceed to the next review section, edit the plan file with the proposed fix, or call ExitPlanMode until the user responds. An issue with an "obvious fix" is still an issue and still needs explicit user approval before it lands in the plan. Loading the AskUserQuestion schema via ToolSearch and then writing the recommendation as chat prose is the failure mode this gate exists to prevent.
|
|
|
|
### 4. Performance review
|
|
Evaluate:
|
|
* N+1 queries and database access patterns.
|
|
* Memory-usage concerns.
|
|
* Caching opportunities.
|
|
* Slow or high-complexity code paths.
|
|
|
|
For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Use the preamble's AskUserQuestion Format section. The AskUserQuestion call is a tool_use, not prose — call the tool directly.
|
|
|
|
**STOP.** Do NOT proceed to the next review section, edit the plan file with the proposed fix, or call ExitPlanMode until the user responds. An issue with an "obvious fix" is still an issue and still needs explicit user approval before it lands in the plan. Loading the AskUserQuestion schema via ToolSearch and then writing the recommendation as chat prose is the failure mode this gate exists to prevent.
|
|
|
|
{{CODEX_PLAN_REVIEW}}
|
|
|
|
### Outside Voice Integration Rule
|
|
|
|
Outside voice findings are INFORMATIONAL until the user explicitly approves each one.
|
|
Do NOT incorporate outside voice recommendations into the plan without presenting each
|
|
finding via AskUserQuestion and getting explicit approval. This applies even when you
|
|
agree with the outside voice. Cross-model consensus is a strong signal — present it as
|
|
such — but the user makes the decision.
|
|
|
|
## CRITICAL RULE — How to ask questions
|
|
Follow the AskUserQuestion format from the Preamble above. Additional rules for plan reviews:
|
|
* **One issue = one AskUserQuestion call.** Never combine multiple issues into one question.
|
|
* Describe the problem concretely, with file and line references.
|
|
* Present 2-3 options, including "do nothing" where that's reasonable.
|
|
* For each option, specify in one line: effort (human: ~X / CC: ~Y), risk, and maintenance burden. If the complete option is only marginally more effort than the shortcut with CC, recommend the complete option.
|
|
* **Map the reasoning to my engineering preferences above.** One sentence connecting your recommendation to a specific preference (DRY, explicit > clever, minimal diff, etc.).
|
|
* Label with issue NUMBER + option LETTER (e.g., "3A", "3B").
|
|
* **Coverage vs kind:** for every per-issue AskUserQuestion you raise in this review, decide whether the options differ in coverage or in kind. If coverage (e.g., more tests vs fewer, complete error handling vs happy-path-only, full edge-case coverage vs shortcut), include `Completeness: N/10` on each option. If kind (e.g., architectural choice between two different systems, posture-over-posture, A/B/C where each is a different kind of thing), skip the score and add one line: `Note: options differ in kind, not coverage — no completeness score.` Do NOT fabricate scores on kind-differentiated questions — filler scores are worse than no score.
|
|
* **Zero findings:** if a section has zero findings, state "No issues, moving on" and proceed. Otherwise, use AskUserQuestion for each finding — a finding with an "obvious fix" is still a finding and still needs user approval before any change lands in the plan.
|
|
|
|
## Required outputs
|
|
|
|
### "NOT in scope" section
|
|
Every plan review MUST produce a "NOT in scope" section listing work that was considered and explicitly deferred, with a one-line rationale for each item.
|
|
|
|
### "What already exists" section
|
|
List existing code/flows that already partially solve sub-problems in this plan, and whether the plan reuses them or unnecessarily rebuilds them.
|
|
|
|
### TODOS.md updates
|
|
After all review sections are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step. Follow the format in `.claude/skills/review/TODOS-format.md`.
|
|
|
|
For each TODO, describe:
|
|
* **What:** One-line description of the work.
|
|
* **Why:** The concrete problem it solves or value it unlocks.
|
|
* **Pros:** What you gain by doing this work.
|
|
* **Cons:** Cost, complexity, or risks of doing it.
|
|
* **Context:** Enough detail that someone picking this up in 3 months understands the motivation, the current state, and where to start.
|
|
* **Depends on / blocked by:** Any prerequisites or ordering constraints.
|
|
|
|
Then present options: **A)** Add to TODOS.md **B)** Skip — not valuable enough **C)** Build it now in this PR instead of deferring.
|
|
|
|
Do NOT just append vague bullet points. A TODO without context is worse than no TODO — it creates false confidence that the idea was captured while actually losing the reasoning.
|
|
|
|
### Diagrams
|
|
The plan itself should use ASCII diagrams for any non-trivial data flow, state machine, or processing pipeline. Additionally, identify which files in the implementation should get inline ASCII diagram comments — particularly Models with complex state transitions, Services with multi-step pipelines, and Concerns with non-obvious mixin behavior.
|
|
|
|
### Failure modes
|
|
For each new codepath identified in the test review diagram, list one realistic way it could fail in production (timeout, nil reference, race condition, stale data, etc.) and whether:
|
|
1. A test covers that failure
|
|
2. Error handling exists for it
|
|
3. The user would see a clear error or a silent failure
|
|
|
|
If any failure mode has no test AND no error handling AND would be silent, flag it as a **critical gap**.
|
|
|
|
### Worktree parallelization strategy
|
|
|
|
Analyze the plan's implementation steps for parallel execution opportunities. This helps the user split work across git worktrees (via Claude Code's Agent tool with `isolation: "worktree"` or parallel workspaces).
|
|
|
|
**Skip if:** all steps touch the same primary module, or the plan has fewer than 2 independent workstreams. In that case, write: "Sequential implementation, no parallelization opportunity."
|
|
|
|
**Otherwise, produce:**
|
|
|
|
1. **Dependency table** — for each implementation step/workstream:
|
|
|
|
| Step | Modules touched | Depends on |
|
|
|------|----------------|------------|
|
|
| (step name) | (directories/modules, NOT specific files) | (other steps, or —) |
|
|
|
|
Work at the module/directory level, not file level. Plans describe intent ("add API endpoints"), not specific files. Module-level ("controllers/, models/") is reliable; file-level is guesswork.
|
|
|
|
2. **Parallel lanes** — group steps into lanes:
|
|
- Steps with no shared modules and no dependency go in separate lanes (parallel)
|
|
- Steps sharing a module directory go in the same lane (sequential)
|
|
- Steps depending on other steps go in later lanes
|
|
|
|
Format: `Lane A: step1 → step2 (sequential, shared models/)` / `Lane B: step3 (independent)`
|
|
|
|
3. **Execution order** — which lanes launch in parallel, which wait. Example: "Launch A + B in parallel worktrees. Merge both. Then C."
|
|
|
|
4. **Conflict flags** — if two parallel lanes touch the same module directory, flag it: "Lanes X and Y both touch module/ — potential merge conflict. Consider sequential execution or careful coordination."
|
|
|
|
{{TASKS_SECTION_EMIT:eng-review}}
|
|
|
|
### Completion summary
|
|
At the end of the review, fill in and display this summary so the user can see all findings at a glance:
|
|
- Step 0: Scope Challenge — ___ (scope accepted as-is / scope reduced per recommendation)
|
|
- Architecture Review: ___ issues found
|
|
- Code Quality Review: ___ issues found
|
|
- Test Review: diagram produced, ___ gaps identified
|
|
- Performance Review: ___ issues found
|
|
- NOT in scope: written
|
|
- What already exists: written
|
|
- TODOS.md updates: ___ items proposed to user
|
|
- Failure modes: ___ critical gaps flagged
|
|
- Outside voice: ran (codex/claude) / skipped
|
|
- Parallelization: ___ lanes, ___ parallel / ___ sequential
|
|
- Lake Score: X/Y recommendations chose complete option
|
|
|
|
## Retrospective learning
|
|
Check the git log for this branch. If there are prior commits suggesting a previous review cycle (e.g., review-driven refactors, reverted changes), note what was changed and whether the current plan touches the same areas. Be more aggressive reviewing areas that were previously problematic.
|
|
|
|
## Formatting rules
|
|
* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
|
|
* Label with NUMBER + LETTER (e.g., "3A", "3B").
|
|
* One sentence max per option. Pick in under 5 seconds.
|
|
* After each review section, pause and ask for feedback before moving on.
|
|
|
|
## Review Log
|
|
|
|
After producing the Completion Summary above, persist the review result.
|
|
|
|
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes review metadata to
|
|
`~/.gstack/` (user config directory, not project files). The skill preamble
|
|
already writes to `~/.gstack/sessions/` and `~/.gstack/analytics/` — this is
|
|
the same pattern. The review dashboard depends on this data. Skipping this
|
|
command breaks the review readiness dashboard in /ship.
|
|
|
|
```bash
|
|
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"issues_found":N,"mode":"MODE","commit":"COMMIT"}'
|
|
~/.claude/skills/gstack/bin/gstack-decision-log '{"decision":"Eng review (MODE): ARCH_SUMMARY","rationale":"KEY_DECISION","scope":"branch","source":"skill","confidence":8}' 2>/dev/null || true
|
|
```
|
|
|
|
The second command records the architecture verdict as a durable cross-session decision (so a future session inherits the chosen approach and what was hardened, not just the count). Same `~/.gstack/` write pattern as review-log, non-interactive, best-effort (`|| true`). Substitute `ARCH_SUMMARY` (e.g. "N findings, all folded" or "M unresolved") and `KEY_DECISION` (the load-bearing architecture call from the report, one line — omit if the review found nothing durable).
|
|
|
|
Substitute values from the Completion Summary:
|
|
- **TIMESTAMP**: current ISO 8601 datetime
|
|
- **STATUS**: "clean" if 0 unresolved decisions AND 0 critical gaps; otherwise "issues_open"
|
|
- **unresolved**: number from "Unresolved decisions" count
|
|
- **critical_gaps**: number from "Failure modes: ___ critical gaps flagged"
|
|
- **issues_found**: total issues found across all review sections (Architecture + Code Quality + Performance + Test gaps)
|
|
- **MODE**: FULL_REVIEW / SCOPE_REDUCED
|
|
- **COMMIT**: output of `git rev-parse --short HEAD`
|
|
|
|
{{REVIEW_DASHBOARD}}
|
|
|
|
{{PLAN_FILE_REVIEW_REPORT}}
|
|
|
|
{{LEARNINGS_LOG}}
|
|
|
|
{{GBRAIN_SAVE_RESULTS}}
|
|
|
|
{{BRAIN_WRITE_BACK}}
|
|
|
|
{{BRAIN_CACHE_REFRESH}}
|
|
|
|
## Next Steps — Review Chaining
|
|
|
|
After displaying the Review Readiness Dashboard, check if additional reviews would be valuable. Read the dashboard output to see which reviews have already been run and whether they are stale.
|
|
|
|
**Suggest /plan-design-review if UI changes exist and no design review has been run** — detect from the test diagram, architecture review, or any section that touched frontend components, CSS, views, or user-facing interaction flows. If an existing design review's commit hash shows it predates significant changes found in this eng review, note that it may be stale.
|
|
|
|
**Mention /plan-ceo-review if this is a significant product change and no CEO review exists** — this is a soft suggestion, not a push. CEO review is optional. Only mention it if the plan introduces new user-facing features, changes product direction, or expands scope substantially.
|
|
|
|
**Note staleness** of existing CEO or design reviews if this eng review found assumptions that contradict them, or if the commit hash shows significant drift.
|
|
|
|
**If no additional reviews are needed** (or `skip_eng_review` is `true` in the dashboard config, meaning this eng review was optional): state "All relevant reviews complete. Run /ship when ready."
|
|
|
|
Use AskUserQuestion with only the applicable options:
|
|
- **A)** Run /plan-design-review (only if UI scope detected and no design review exists)
|
|
- **B)** Run /plan-ceo-review (only if significant product change and no CEO review exists)
|
|
- **C)** Ready to implement — run /ship when done
|
|
|
|
## Unresolved decisions
|
|
If the user does not respond to an AskUserQuestion or interrupts to move on, note which decisions were left unresolved. At the end of the review, list these as "Unresolved decisions that may bite you later" — never silently default to an option.
|
|
|