Step 8: Plan Completion Audit
Dispatch this step as a subagent using the Agent tool with `subagent_type: "general-purpose"`. The subagent reads the plan file and every referenced code file in its own fresh context; the parent receives only the conclusion.
Subagent prompt: Pass these instructions to the subagent:
You are running a ship-workflow plan completion audit. The base branch is `<base>`. Use `git diff <base>...HEAD` to see what shipped. Do not commit or push — report only.

Plan File Discovery
- Conversation context (primary): Check if there is an active plan file in this conversation. The host agent's system messages include plan file paths when in plan mode. If found, use it directly — this is the most reliable signal.
- Content-based search (fallback): If no plan file is referenced in conversation context, search by content:
```bash
setopt +o nomatch 2>/dev/null || true  # zsh compat
BRANCH=$(git branch --show-current 2>/dev/null | tr '/' '-')
REPO=$(basename "$(git rev-parse --show-toplevel 2>/dev/null)")
# Compute project slug for ~/.gstack/projects/ lookup
_PLAN_SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-' | tr -cd 'a-zA-Z0-9._-') || true
_PLAN_SLUG="${_PLAN_SLUG:-$(basename "$PWD" | tr -cd 'a-zA-Z0-9._-')}"
# Search common plan file locations (project designs first, then personal/local)
for PLAN_DIR in "$HOME/.gstack/projects/$_PLAN_SLUG" "$HOME/.claude/plans" "$HOME/.codex/plans" ".gstack/plans"; do
  [ -d "$PLAN_DIR" ] || continue
  PLAN=$(ls -t "$PLAN_DIR"/*.md 2>/dev/null | xargs -r grep -l "$BRANCH" 2>/dev/null | head -1)
  [ -z "$PLAN" ] && PLAN=$(ls -t "$PLAN_DIR"/*.md 2>/dev/null | xargs -r grep -l "$REPO" 2>/dev/null | head -1)
  # xargs -r: on empty input, GNU xargs would otherwise run `ls -t` on the cwd
  [ -z "$PLAN" ] && PLAN=$(find "$PLAN_DIR" -maxdepth 1 -name '*.md' -mmin -1440 2>/dev/null | xargs -r ls -t 2>/dev/null | head -1)
  [ -n "$PLAN" ] && break
done
[ -n "$PLAN" ] && echo "PLAN_FILE: $PLAN" || echo "NO_PLAN_FILE"
```
- Validation: If a plan file was found via content-based search (not conversation context), read the first 20 lines and verify it is relevant to the current branch's work. If it appears to be from a different project or feature, treat as "no plan file found."
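A cheap first pass at this relevance check is a literal, case-insensitive search of the plan's first 20 lines for the branch or repo name that the discovery script computes. The sketch below assumes those two strings are useful signals. `plan_looks_relevant` is a hypothetical helper, and a miss should prompt a closer read rather than an automatic rejection.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of gstack): succeeds if any non-empty needle
# appears, case-insensitively and as a literal string, in the first 20 lines.
plan_looks_relevant() {
  local plan="$1"; shift
  [ -r "$plan" ] || return 1
  local header needle
  header=$(head -n 20 "$plan")
  for needle in "$@"; do
    [ -n "$needle" ] || continue          # skip empty names (e.g. detached HEAD)
    # -F: literal match, -i: ignore case, -q: exit status only
    if printf '%s\n' "$header" | grep -qiF -- "$needle"; then
      return 0
    fi
  done
  return 1
}
```

Call it as `plan_looks_relevant "$PLAN" "$BRANCH" "$REPO"`. A non-zero exit means neither name appears near the top of the file.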
Error handling:
- No plan file found → skip with "No plan file detected — skipping."
- Plan file found but unreadable (permissions, encoding) → skip with "Plan file found but unreadable — skipping."
Actionable Item Extraction
Read the plan file. Extract every actionable item — anything that describes work to be done. Look for:
- Checkbox items: `- [ ] ...` or `- [x] ...`
- Numbered steps under implementation headings: "1. Create ...", "2. Add ...", "3. Modify ..."
- Imperative statements: "Add X to Y", "Create a Z service", "Modify the W controller"
- File-level specifications: "New file: path/to/file.ts", "Modify path/to/existing.rb"
- Test requirements: "Test that X", "Add test for Y", "Verify Z"
- Data model changes: "Add column X to table Y", "Create migration for Z"
Ignore:
- Context/Background sections (`## Context`, `## Background`, `## Problem`)
- Questions and open items (marked with `?`, "TBD", "TODO: decide")
- Review report sections (`## GSTACK REVIEW REPORT`)
- Explicitly deferred items ("Future:", "Out of scope:", "NOT in scope:", "P2:", "P3:", "P4:")
- CEO Review Decisions sections (these record choices, not work items)
Cap: Extract at most 50 items. If the plan has more, note: "Showing top 50 of N plan items — full list in plan file."
No items found: If the plan contains no extractable actionable items, skip with: "Plan file contains no actionable items — skipping completion audit."
For each item, note:
- The item text (verbatim or concise summary)
- Its category: CODE | TEST | MIGRATION | CONFIG | DOCS
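The checkbox and numbered-step rules, the ignore list, and the 50-item cap are mechanical enough to sketch in awk. This is a minimal sketch. It assumes Markdown headings mark section boundaries and that an ignored section runs until the next heading of the same or higher level. It does not catch free-form imperative sentences or file-level specifications, so the agent still has to read the rest of the plan. `extract_plan_items` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of gstack): print checkbox items and numbered
# steps from a plan, skipping ignored sections, deferred items and open questions.
extract_plan_items() {
  awk '
    # Headings decide whether we are entering or leaving an ignored section.
    /^[#]+[ \t]/ {
      level = match($0, /[^#]/) - 1               # number of leading hash marks
      if (skipping && level > skip_level) next    # nested under an ignored section
      skipping = 0
      title = substr($0, level + 1)
      if (title ~ /^[ \t]*(Context|Background|Problem|GSTACK REVIEW REPORT|CEO Review Decisions)/) {
        skipping = 1; skip_level = level
      }
      next
    }
    skipping { next }
    # Checkbox items and numbered steps.
    /^[ \t]*(- \[[ xX]\]|[0-9]+\.)[ \t]/ {
      item = $0
      sub(/^[ \t]*(- \[[ xX]\]|[0-9]+\.)[ \t]+/, "", item)
      if (item ~ /(Future:|Out of scope:|NOT in scope:|P[234]:|TBD|TODO: decide)/) next
      if (item ~ /\?[ \t]*$/) next                # open question
      total++
      if (total <= 50) print item
    }
    END {
      if (total > 50) printf "Showing top 50 of %d plan items; full list in plan file.\n", total
    }
  ' "$1"
}
```

Headings inside an ignored section, such as a `###` under `## Context`, keep it ignored. A new `##` heading ends it.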
Verification Mode
Before judging completion, classify HOW each item can be verified. The diff alone cannot prove every kind of work. Items outside the current repo or system are structurally invisible to git diff.
- DIFF-VERIFIABLE — A code change in this repo would manifest in `git diff <base>...HEAD`. Examples: "add UserService" (file appears), "validate input X" (validation logic appears), "create users table" (migration file appears).
- CROSS-REPO — Item names a file or change in a sibling repo (e.g., `domain-hq/docs/dashboard.md`, `~/Development/<other-repo>/...`). The current diff CANNOT prove this.
- EXTERNAL-STATE — Item names state in an external system: Supabase config/RLS, Cloudflare DNS, Vercel env vars, OAuth provider allowlists, third-party SaaS, DNS records. The current diff CANNOT prove this.
- CONTENT-SHAPE — Item requires a file to follow a specific convention. If the file is in this repo: diff-verifiable. If in another repo or system: see CROSS-REPO / EXTERNAL-STATE.
Verification dispatch:
- DIFF-VERIFIABLE → cross-reference against diff (next section).
- CROSS-REPO → if the sibling repo is reachable on disk (try `~/Development/<repo>/`, `~/code/<repo>/`, and `<repo>/` next to the current repo under its parent directory), run `[ -f <path> ]` to check file existence. File exists → DONE (cite path). File missing → NOT DONE (cite path). Path unreachable → UNVERIFIABLE (cite what needs manual check).
- EXTERNAL-STATE → UNVERIFIABLE. Cite the system and the specific check the user must perform.
- CONTENT-SHAPE in another repo → if the file exists, run any project-detected validator (see "Validator detection" below) before falling back to UNVERIFIABLE. With a validator: pass → DONE; fail → NOT DONE (cite validator output). No validator available: classify UNVERIFIABLE and cite both the file path and the convention to confirm.
Path concreteness rule. If a plan item names a concrete filesystem path (absolute, ~/..., or <sibling-repo>/<file>), it MUST be classified DONE or NOT DONE based on [ -f <path> ]. UNVERIFIABLE is only valid when the path is genuinely abstract ("Cloudflare DNS", "Supabase allowlist") or the sibling root is unreachable on this machine. "I don't want to check" is not unreachable.
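The CROSS-REPO dispatch and the path concreteness rule reduce to a file-existence check against a short list of candidate roots. This sketch makes three assumptions: the sibling repo is identified by its directory name, the path is relative to that repo's root, and the first candidate root that exists is treated as the sibling. `check_cross_repo_file` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of gstack): classify a sibling-repo file as
# DONE / NOT DONE / UNVERIFIABLE using the candidate roots from the dispatch rules.
check_cross_repo_file() {
  local repo="$1" rel="$2" root parent
  # Parent directory of the current repo (falls back to the cwd outside git).
  parent=$(dirname "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
  for root in "$HOME/Development/$repo" "$HOME/code/$repo" "$parent/$repo"; do
    [ -d "$root" ] || continue             # this candidate root is not on disk
    if [ -f "$root/$rel" ]; then
      echo "DONE $root/$rel"
    else
      echo "NOT DONE $root/$rel"
    fi
    return 0
  done
  echo "UNVERIFIABLE $repo/$rel (sibling repo not found on this machine)"
}
```

Absolute or `~/...` paths need no root search. After expanding `~` to `$HOME`, `[ -f "$path" ]` alone decides DONE or NOT DONE.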
Validator detection. Before falling back to UNVERIFIABLE on a CONTENT-SHAPE item, scan the target repo's package.json for any script matching validate-*, lint-wiki, check-docs, or similar. If found, invoke it with the relevant path argument (e.g., npm run validate-wiki -- <path>). For multi-target validators (e.g., validate-wiki --all), run once and reconcile per-item from the output. A passing validator promotes the item from UNVERIFIABLE to DONE; a failing one demotes to NOT DONE.
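Validator detection can be approximated without extra tools by scanning the `scripts` block of `package.json`. This sketch assumes conventional formatting: one script per line, with the block closed by a line starting with `}`. It matches only `validate-*`, `lint-wiki`, and `check-docs`, so other "similar" validators still need a human or agent eye. Restricting the search to `scripts` matters because dependency names such as `validate-npm-package-name` would otherwise look like validators. `find_doc_validators` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of gstack): list validator-like script names
# from the "scripts" block of a package.json. Line-oriented heuristic, not a JSON parser.
find_doc_validators() {
  local pkg="${1:-package.json}"
  [ -f "$pkg" ] || return 0
  awk '
    # Enter the scripts block unless it opens and closes on the same line.
    /"scripts"[ \t]*:[ \t]*[{]/ { in_scripts = ($0 !~ /[}]/); next }
    in_scripts && /^[ \t]*[}]/ { in_scripts = 0; next }
    in_scripts && match($0, /"(validate-[A-Za-z0-9:_-]+|lint-wiki|check-docs)"[ \t]*:/) {
      name = substr($0, RSTART + 1, RLENGTH - 1)   # drop the opening quote
      sub(/".*/, "", name)                          # drop the closing quote and colon
      print name
    }
  ' "$pkg"
}
```

Each printed name can then be invoked as `npm run <name> -- <path>`, the form the text uses.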
Honesty rule. Do NOT classify an item as DONE just because related code shipped. Code that handles a deliverable is not the deliverable. Shipping a markdown-extraction library is not the same as shipping the markdown file. When in doubt between DONE and UNVERIFIABLE, prefer UNVERIFIABLE — better to surface a confirmation prompt than silently miss a deliverable.
Cross-Reference Against Diff
Run git diff origin/<base>...HEAD and git log origin/<base>..HEAD --oneline to understand what was implemented.
For each extracted plan item, run the verification dispatch from the previous section, then classify:
- DONE — Clear evidence the item shipped. Cite the specific file(s) changed in the diff for DIFF-VERIFIABLE items, or the verified path that exists for CROSS-REPO items with a reachable sibling repo.
- PARTIAL — Some work toward this item exists but is incomplete (e.g., model created but controller missing, function exists but edge cases not handled).
- NOT DONE — Verification ran and produced negative evidence (file missing, code absent in diff, sibling-repo file confirmed absent).
- CHANGED — The item was implemented using a different approach than the plan described, but the same goal is achieved. Note the difference.
- UNVERIFIABLE — The diff and any reachable sibling-repo checks cannot prove or disprove this. Always applies to EXTERNAL-STATE items and to CROSS-REPO items where the sibling repo isn't reachable. Cite the specific manual verification the user must perform (e.g., "check Cloudflare DNS shows DNS-only mode for dashboard.example.com", "confirm /docs/dashboard.md exists in domain-hq repo").
Be conservative with DONE — require clear evidence. A file being touched is not enough; the specific functionality described must be present. Be generous with CHANGED — if the goal is met by different means, that counts as addressed. Be honest with UNVERIFIABLE — better to surface 5 items the user must manually confirm than silently classify them DONE.
Output Format
```
PLAN COMPLETION AUDIT
═══════════════════════════════
Plan: {plan file path}

## Implementation Items
[DONE] Create UserService — src/services/user_service.rb (+142 lines)
[PARTIAL] Add validation — model validates but missing controller checks
[NOT DONE] Add caching layer — no cache-related changes in diff
[CHANGED] "Redis queue" → implemented with Sidekiq instead

## Test Items
[DONE] Unit tests for UserService — test/services/user_service_test.rb
[NOT DONE] E2E test for signup flow

## Migration Items
[DONE] Create users table — db/migrate/20240315_create_users.rb

## Cross-Repo / External Items
[DONE] sibling-repo has /docs/dashboard.md — verified at ~/Development/sibling-repo/docs/dashboard.md
[UNVERIFIABLE] Cloudflare DNS-only on api.example.com — external system, manual check required
[UNVERIFIABLE] Supabase auth allowlist contains user email — external system, confirm in Supabase dashboard

─────────────────────────────────
COMPLETION: 4/10 DONE, 1 PARTIAL, 2 NOT DONE, 1 CHANGED, 2 UNVERIFIABLE
─────────────────────────────────
```
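The COMPLETION line is easy to miscount by hand. A minimal sketch derives it from the checklist rows instead. It assumes each row starts (after optional indentation) with one of the five bracketed tags exactly as in the example. `completion_tally` is a hypothetical helper, not part of gstack.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read checklist rows on stdin and print the COMPLETION line.
completion_tally() {
  awk '
    match($0, /^[ \t]*\[(DONE|PARTIAL|NOT DONE|CHANGED|UNVERIFIABLE)\]/) {
      tag = substr($0, RSTART, RLENGTH)
      sub(/^[ \t]*\[/, "", tag); sub(/\]$/, "", tag)   # strip brackets
      count[tag]++; total++
    }
    END {
      printf "COMPLETION: %d/%d DONE, %d PARTIAL, %d NOT DONE, %d CHANGED, %d UNVERIFIABLE\n",
        count["DONE"], total, count["PARTIAL"], count["NOT DONE"],
        count["CHANGED"], count["UNVERIFIABLE"]
    }
  '
}
```

Because the denominator is the number of tagged rows, the five counts always sum to it.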
Gate Logic
After producing the completion checklist, evaluate in priority order:
- Any NOT DONE items (highest priority — known missing work). Use AskUserQuestion:
  - Show the completion checklist above
  - "{N} items from the plan are NOT DONE. These were part of the original plan but are missing from the implementation."
  - RECOMMENDATION: depends on item count and severity. If 1-2 minor items (docs, config), recommend B. If core functionality is missing, recommend A.
  - Options: A) Stop — implement the missing items before shipping B) Ship anyway — defer these to a follow-up (will create P1 TODOs in Step 5.5) C) These items were intentionally dropped — remove from scope
  - If A: STOP. List the missing items for the user to implement.
  - If B: Continue. For each NOT DONE item, create a P1 TODO in Step 5.5 with "Deferred from plan: {plan file path}".
  - If C: Continue. Note in PR body: "Plan items intentionally dropped: {list}."
- Any UNVERIFIABLE items (silent gaps — the diff cannot prove them either way). Only fires after NOT DONE is resolved or absent.
  Per-item confirmation is mandatory. Do NOT use a single AskUserQuestion to blanket-confirm all UNVERIFIABLE items. Blanket confirmation is the failure mode that surfaced in VAS-449 (user clicks A without opening any file). Instead:
  - Loop through UNVERIFIABLE items one at a time.
  - For each item, use AskUserQuestion with the item's specific manual check (e.g., "Confirm: does `~/Development/domain-hq/docs/dashboard.md` exist?", not "Have you checked all items?").
  - Options per item: Y) Confirmed done — cite what you verified (free-text, embedded in PR body) N) Not done — block ship; treat as NOT DONE and re-enter the priority-1 gate D) Intentionally dropped — note in PR body: "Plan item intentionally dropped: {item}"
  - RECOMMENDATION per item: Y if the item is concrete and easily verified; N if it's critical-path (auth, DNS, deliverables to other repos) and the user shows hesitation.
  Exit conditions:
  - Any N: STOP. Surface the missing items, suggest re-running /ship after they're addressed.
  - All Y or D: Continue. Embed a `## Plan Completion — Manual Verifications` section in the PR body listing each Y'd item with the user's free-text evidence and each D'd item with "intentionally dropped".
  Cap. If there are more than 5 UNVERIFIABLE items, present them as a numbered list first and ask whether the user wants to (1) confirm each individually, (2) stop and reduce scope, or (3) explicitly accept blanket-confirmation with the warning that this is the VAS-449 failure shape. Default and recommended option is (1).
- Only PARTIAL items (no NOT DONE, no UNVERIFIABLE): Continue with a note in the PR body. Not blocking.
- All DONE or CHANGED: Pass. "Plan completion: PASS — all items addressed." Continue.
No plan file found: Skip entirely. "No plan file detected — skipping plan completion audit."
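The gate order can be summarized as a small decision function over the three counts that matter. This is a sketch only. `select_plan_gate` and its output tokens are hypothetical labels for the branches described above, and it assumes the no-plan-file case has already been skipped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose which gate fires, in the documented priority order.
# Args: number of NOT DONE, UNVERIFIABLE and PARTIAL items.
select_plan_gate() {
  local not_done="$1" unverifiable="$2" partial="$3"
  if [ "$not_done" -gt 0 ]; then
    echo "ASK_NOT_DONE"            # priority 1: known missing work
  elif [ "$unverifiable" -gt 5 ]; then
    echo "ASK_UNVERIFIABLE_CAP"    # list first; offer per-item, reduce scope, or blanket
  elif [ "$unverifiable" -gt 0 ]; then
    echo "ASK_UNVERIFIABLE_EACH"   # one AskUserQuestion per item
  elif [ "$partial" -gt 0 ]; then
    echo "CONTINUE_WITH_NOTE"      # non-blocking; note in PR body
  else
    echo "PASS"
  fi
}
```

Checking NOT DONE first means the UNVERIFIABLE prompts never run while known-missing work is unresolved. That is the ordering the gate text requires.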
Include in PR body (Step 19): Add a `## Plan Completion` section with the checklist summary.
After your analysis, output a single JSON object on the LAST LINE of your response (no other text after it):
{"total_items":N,"done":N,"changed":N,"deferred":N,"unverifiable":N,"summary":"<markdown checklist for PR body>"}
Parent processing:
- Parse the LAST line of the subagent's output as JSON.
- Store `done`, `deferred`, and `unverifiable` for Step 20 metrics; use `summary` in the PR body.
- If `deferred > 0` or `unverifiable > 0` and there is no user override, present the items via the appropriate AskUserQuestion (see Gate Logic priority order above) before continuing.
- Embed `summary` in the PR body's `## Plan Completion` section (Step 19). If `unverifiable > 0` and the user confirmed any items (Y) in the UNVERIFIABLE gate, also embed `## Plan Completion — Manual Verifications` listing each user-confirmed item.
If the subagent fails or returns invalid JSON: Fall back to running the audit inline (parent processes the same plan-extraction + classification logic). If the inline fallback also fails (e.g., plan file unreadable, parser error), do NOT silently pass — surface the failure as an explicit AskUserQuestion: "Plan Completion audit could not run ({reason}). Options: (A) Skip audit and ship anyway — record that the audit was skipped in PR body and Step 20 metrics; (B) Stop and fix the audit." Default and recommended option is (B). Silent fail-open is the failure shape that VAS-449 surfaced.
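The parent-side contract can be sketched with `jq`: take the last line, require every field with the right type, and treat anything else as a failure that triggers the inline fallback. This assumes `jq` is installed. Without it, the helper reports failure, which routes to the fallback rather than a silent pass. It also tolerates trailing blank lines by taking the last non-blank line, which is a small assumption beyond the text. `parse_audit_result` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate the subagent's final JSON line and print the
# metrics the parent stores. A non-zero exit means "run the inline fallback".
parse_audit_result() {
  local last
  # Last non-blank line of the subagent output.
  last=$(printf '%s\n' "$1" | awk 'NF { line = $0 } END { print line }')
  command -v jq >/dev/null 2>&1 || return 1
  # -e: exit non-zero if the check is false/null or the line is not valid JSON.
  printf '%s\n' "$last" | jq -e '
    (.total_items|type) == "number" and (.done|type) == "number" and
    (.changed|type) == "number" and (.deferred|type) == "number" and
    (.unverifiable|type) == "number" and (.summary|type) == "string"
  ' >/dev/null 2>&1 || return 1
  printf '%s\n' "$last" | jq -r '"DONE=\(.done) DEFERRED=\(.deferred) UNVERIFIABLE=\(.unverifiable)"'
}
```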
Step 8.1: Plan Verification
Automatically verify the plan's testing/verification steps using the /qa-only skill.
1. Check for verification section
Using the plan file already discovered in Step 8, look for a verification section. Match any of these headings: ## Verification, ## Test plan, ## Testing, ## How to test, ## Manual testing, or any section with verification-flavored items (URLs to visit, things to check visually, interactions to test).
If no verification section found: Skip with "No verification steps found in plan — skipping auto-verification." If no plan file was found in Step 8: Skip (already handled).
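Matching the named headings is the mechanical half of this check. Spotting verification-flavored items under other headings still needs judgment. This minimal sketch assumes the headings are level-2 Markdown headings on their own line. `find_verification_heading` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first verification-style "## " heading in a
# plan file (case-insensitive), or nothing if none is present.
find_verification_heading() {
  grep -iE '^##[[:space:]]+(verification|test plan|testing|how to test|manual testing)[[:space:]]*$' "$1" 2>/dev/null | head -n 1
}
```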
2. Check for running dev server
Before invoking browse-based verification, check if a dev server is reachable:
```bash
DEV_URL=""
for PORT in 3000 8080 5173 4000; do
  # curl prints 000 when nothing answers; any real HTTP status means a server is up
  CODE=$(curl -s -o /dev/null -m 2 -w '%{http_code}' "http://localhost:$PORT" 2>/dev/null)
  [ -n "$CODE" ] && [ "$CODE" != "000" ] && { DEV_URL="http://localhost:$PORT"; break; }
done
[ -n "$DEV_URL" ] && echo "DEV_SERVER: $DEV_URL" || echo "NO_SERVER"
```
If NO_SERVER: Skip with "No dev server detected — skipping plan verification. Run /qa separately after deploying."
3. Invoke /qa-only inline
Read the /qa-only skill from disk:
```bash
cat "${CLAUDE_SKILL_DIR}/../qa-only/SKILL.md"
```
If unreadable: Skip with "Could not load /qa-only — skipping plan verification."
Follow the /qa-only workflow with these modifications:
- Skip the preamble (already handled by /ship)
- Use the plan's verification section as the primary test input — treat each verification item as a test case
- Use the detected dev server URL as the base URL
- Skip the fix loop — this is report-only verification during /ship
- Cap at the verification items from the plan — do not expand into general site QA
4. Gate logic
- All verification items PASS: Continue silently. "Plan verification: PASS."
- Any FAIL: Use AskUserQuestion:
  - Show the failures with screenshot evidence
  - RECOMMENDATION: Choose A if failures indicate broken functionality. Choose B if cosmetic only.
  - Options: A) Fix the failures before shipping (recommended for functional issues) B) Ship anyway — known issues (acceptable for cosmetic issues)
- No verification section / no server / unreadable skill: Skip (non-blocking).
5. Include in PR body
Add a ## Verification Results section to the PR body (Step 19):
- If verification ran: summary of results (N PASS, M FAIL, K SKIPPED)
- If skipped: reason for skipping (no plan, no server, no verification section)
Prior Learnings
Search for relevant learnings from previous sessions:
```bash
_CROSS_PROJ=$(~/.claude/skills/gstack/bin/gstack-config get cross_project_learnings 2>/dev/null || echo "unset")
echo "CROSS_PROJECT: $_CROSS_PROJ"
if [ "$_CROSS_PROJ" = "true" ]; then
  ~/.claude/skills/gstack/bin/gstack-learnings-search --limit 10 --query "release ship version changelog merge pr" --cross-project 2>/dev/null || true
else
  ~/.claude/skills/gstack/bin/gstack-learnings-search --limit 10 --query "release ship version changelog merge pr" 2>/dev/null || true
fi
```
If CROSS_PROJECT is unset (first time): Use AskUserQuestion:
gstack can search learnings from your other projects on this machine to find patterns that might apply here. This stays local (no data leaves your machine). Recommended for solo developers. Skip if you work on multiple client codebases where cross-contamination would be a concern.
Options:
- A) Enable cross-project learnings (recommended)
- B) Keep learnings project-scoped only
If A: run ~/.claude/skills/gstack/bin/gstack-config set cross_project_learnings true
If B: run ~/.claude/skills/gstack/bin/gstack-config set cross_project_learnings false
Then re-run the search with the appropriate flag.
If learnings are found, incorporate them into your analysis. When a review finding matches a past learning, display:
"Prior learning applied: [key] (confidence N/10, from [date])"
This makes the compounding visible. The user should see that gstack is getting smarter on their codebase over time.
Step 8.2: Scope Drift Detection
Before reviewing code quality, check: did they build what was requested — nothing more, nothing less?
- Read `TODOS.md` (if it exists). Read the PR description (`gh pr view --json body --jq .body 2>/dev/null || true`). Read commit messages (`git log origin/<base>..HEAD --oneline`). If no PR exists, rely on commit messages and `TODOS.md` for stated intent — this is the common case, since /review runs before /ship creates the PR.
- Identify the stated intent — what was this branch supposed to accomplish?
- Run `DIFF_BASE=$(git merge-base origin/<base> HEAD) && git diff "$DIFF_BASE" --stat` and compare the files changed against the stated intent.
- Evaluate with skepticism (incorporating plan completion results if available from an earlier step or adjacent section):
  SCOPE CREEP detection:
  - Files changed that are unrelated to the stated intent
  - New features or refactors not mentioned in the plan
  - "While I was in there..." changes that expand blast radius
  MISSING REQUIREMENTS detection:
  - Requirements from TODOS.md/PR description not addressed in the diff
  - Test coverage gaps for stated requirements
  - Partial implementations (started but not finished)
- Output (before the main review begins):
  ```
  Scope Check: [CLEAN / DRIFT DETECTED / REQUIREMENTS MISSING]
  Intent: <1-line summary of what was requested>
  Delivered: <1-line summary of what the diff actually does>
  [If drift: list each out-of-scope change]
  [If missing: list each unaddressed requirement]
  ```
- This is INFORMATIONAL — it does not block the review. Proceed to the next step.