mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
feat: context rot defense for /ship — subagent isolation + clean step numbering (v0.18.1.0) (#1030)
* refactor: renumber /ship steps to clean integers (1-20)
Replaces fractional step numbers (1.5, 2.5, 3.25, 3.4, 3.45, 3.47, 3.48,
3.5, 3.55, 3.56, 3.57, 3.75, 3.8, 5.5, 6.5, 8.5, 8.75) with clean
integers 1 through 20, plus allowed resolver sub-steps 8.1, 8.2,
9.1, 9.2, 9.3. Fractional numbering signaled "optional appendix" and
contributed to /ship's habit of skipping late-stage steps.
Affects:
- ship/SKILL.md.tmpl (all headings + ~30 cross-references)
- scripts/resolvers/review.ts (ship-side 3.47/3.48/3.57/3.8 conditionals)
- scripts/resolvers/review-army.ts (ship-side 3.55/3.56 conditionals)
- scripts/resolvers/testing.ts (ship-side 2.5/3.4 references, 5 sites)
- scripts/resolvers/utility.ts (CHANGELOG heading gets Step 13 prefix)
- test/gen-skill-docs.test.ts (5 step-number assertions updated)
- test/skill-validation.test.ts (3 step-number assertions updated)
/review step numbering (1.5, 2.5, 4.5, 5.5-5.8) intentionally unchanged —
only the ship-side of each isShip conditional was updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: subagent isolation for /ship's 4 context-heaviest sub-workflows
Fights context rot. By late /ship, the parent context is bloated with
500-1,750 lines of intermediate tool output from tests, coverage audits,
reviews, adversarial checks, and PR body construction. The model is
at its least intelligent when it reaches doc-sync — which is why
/document-release was being skipped ~80% of the time.
Applies subagent dispatch (proven pattern from Review Army at Step 9.1
and Adversarial at Step 11) to four sub-workflows where the parent
only needs the conclusion, not the intermediate output:
- Step 7 (Test Coverage Audit) — subagent returns coverage_pct, gaps,
  diagram, tests_added
- Step 8 (Plan Completion Audit) — subagent returns total_items, done,
  changed, deferred, summary
- Step 10 (Greptile Triage) — subagent fetches + classifies, parent
  handles user interaction and commits fixes (AskUserQuestion + Edit
  can't run in subagents)
- Step 18 (Documentation Sync) — subagent invokes full /document-release
  skill in fresh context; parent embeds documentation_section in PR body
Sequencing fix for Step 18: runs AFTER Step 17 (Push) and BEFORE Step 19
(Create PR). The PR is created once from final HEAD with the
## Documentation section baked into the initial body — no create-then-
re-edit dance, no race conditions with document-release's own PR body
editor.
Adds "You are NOT done" guardrail after Step 17 (Push) to break the
natural stopping point that currently causes doc-release skips.
Each subagent falls back to inline execution if it fails or returns
invalid JSON. /ship never blocks on subagent failure.
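To make the fallback rule concrete, here is a minimal TypeScript sketch for the Step 7 result. It assumes the subagent hands back a JSON string. The field names come from the list above; the field types and the names `CoverageResult`, `parseCoverageResult`, and `runWithFallback` are hypothetical. In /ship this behavior appears to be expressed as skill instructions rather than code, so read this as an illustration of the contract only.

```typescript
// Hypothetical result shape for the Step 7 subagent. Field names are taken
// from the commit message; the types are assumptions.
interface CoverageResult {
  coverage_pct: number;
  gaps: string[];
  diagram: string;
  tests_added: number;
}

// Returns the parsed result, or null when the text is not valid JSON or
// does not have the expected fields.
function parseCoverageResult(raw: string): CoverageResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== 'object' || data === null) return null;
  const r = data as Record<string, unknown>;
  const ok =
    typeof r.coverage_pct === 'number' &&
    Array.isArray(r.gaps) &&
    r.gaps.every((g) => typeof g === 'string') &&
    typeof r.diagram === 'string' &&
    typeof r.tests_added === 'number';
  return ok ? (r as unknown as CoverageResult) : null;
}

// Tries the subagent first; falls back to inline execution when the dispatch
// throws or its output fails validation, so the caller is never blocked.
function runWithFallback(
  dispatch: () => string,
  inline: () => CoverageResult,
): CoverageResult {
  try {
    const parsed = parseCoverageResult(dispatch());
    if (parsed !== null) return parsed;
  } catch {
    // Dispatch failed; fall through to the inline path.
  }
  return inline();
}
```

The same pattern would apply to the other three sub-workflows with their own field lists.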
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: regression guard for /ship step numbering
Three regression guards in skill-validation.test.ts to prevent future
drift back to fractional step numbering:
1. ship/SKILL.md.tmpl contains no fractional step numbers except the
   allowed resolver sub-steps (8.1, 8.2, 9.1, 9.2, 9.3). A contributor
   adding "Step 3.75" next month will fail this test with a clear error.
2. ship/SKILL.md main headings use clean integer step numbers. If a
   renumber accidentally leaves a decimal heading, this catches it.
3. review/SKILL.md step numbers unchanged — regression guard for the
   resolver conditionals in review.ts/review-army.ts. If a future edit
   accidentally touches the review-side of an isShip ternary, /review's
   fractional numbering (1.5, 4.5, 5.7) would vanish. This test catches
   that cross-contamination.
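A check along the lines of guard 1 might look like the sketch below. The regular expression, the constant `ALLOWED_SUBSTEPS`, and the function `findFractionalSteps` are assumptions for illustration, not the code in skill-validation.test.ts.

```typescript
// Resolver sub-steps that the commit message lists as allowed.
const ALLOWED_SUBSTEPS: string[] = ['8.1', '8.2', '9.1', '9.2', '9.3'];

// Returns every "Step N.M" reference in the text that is not an allowed
// sub-step. An empty array means the template is clean.
function findFractionalSteps(text: string): string[] {
  const hits: string[] = [];
  const re = /\bStep (\d+\.\d+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    if (ALLOWED_SUBSTEPS.indexOf(m[1]) === -1) hits.push(m[1]);
  }
  return hits;
}
```

A test could read ship/SKILL.md.tmpl, call `findFractionalSteps` on its contents, and assert that the result is empty. A sentence that ends in "Step 13." is not flagged, because the pattern requires a digit after the dot.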
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: sync ship step references after renumber
CLAUDE.md: "At /ship time (Step 5)" → "(Step 13)" — CHANGELOG is now
explicitly Step 13 after the renumber (was implicit between old
Step 4 and Step 5.5).
TODOS.md: "Step 3.4 coverage audit" → "Step 7" — references the open
TODO for auto-upgrading ★-rated tests, which hooks into the coverage
audit step.
Both are historical references to ship's step numbering that became
stale when clean integer renumbering landed in 566d42c2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: update golden ship skill baselines after renumber + subagent refactor
The golden fixtures at test/fixtures/golden/{claude,codex,factory}-ship-SKILL.md
regression-test that generated ship/SKILL.md output matches a committed baseline.
After renumbering steps to clean integers and converting 4 sub-workflows to
subagent dispatches, the generated output changed substantially — refresh the
baselines to reflect the new expected output.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.18.1.0)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: gitignore Claude Code harness runtime artifacts
.claude/scheduled_tasks.lock appears when ScheduleWakeup fires. It's a
runtime lock file owned by the Claude Code harness, not project source.
Add .claude/*.lock too so future harness artifacts in that directory
don't need their own gitignore entries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -13,8 +13,8 @@ import type { TemplateContext } from './types';

 function generateSpecialistSelection(ctx: TemplateContext): string {
 const isShip = ctx.skillName === 'ship';
-const stepSel = isShip ? '3.55' : '4.5';
-const stepMerge = isShip ? '3.56' : '4.6';
+const stepSel = isShip ? '9.1' : '4.5';
+const stepMerge = isShip ? '9.2' : '4.6';
 const nextStep = isShip ? 'the Fix-First flow (item 4)' : 'Step 5';
 return `## Step ${stepSel}: Review Army — Specialist Dispatch

@@ -134,10 +134,10 @@ CHECKLIST:

 function generateFindingsMerge(ctx: TemplateContext): string {
 const isShip = ctx.skillName === 'ship';
-const stepMerge = isShip ? '3.56' : '4.6';
-const stepSel = isShip ? '3.55' : '4.5';
+const stepMerge = isShip ? '9.2' : '4.6';
+const stepSel = isShip ? '9.1' : '4.5';
 const fixFirstRef = isShip ? 'the Fix-First flow (item 4)' : 'Step 5 Fix-First';
-const critPassRef = isShip ? 'the checklist pass (Step 3.5)' : 'the CRITICAL pass findings from Step 4';
+const critPassRef = isShip ? 'the checklist pass (Step 9)' : 'the CRITICAL pass findings from Step 4';
 const persistRef = isShip ? 'the review-log persist' : 'the review-log entry in Step 5.8';
 return `### Step ${stepMerge}: Collect and merge findings

@@ -202,7 +202,7 @@ Remember these stats — you will need them for the review-log entry in Step 5.8

 function generateRedTeam(ctx: TemplateContext): string {
 const isShip = ctx.skillName === 'ship';
-const stepMerge = isShip ? '3.56' : '4.6';
+const stepMerge = isShip ? '9.2' : '4.6';
 const fixFirstRef = isShip ? 'the Fix-First flow (item 4)' : 'Step 5 Fix-First';
 return `### Red Team dispatch (conditional)

@@ -368,7 +368,7 @@ If A: revise the premise and note the revision. If B: proceed (and note that the

 export function generateScopeDrift(ctx: TemplateContext): string {
 const isShip = ctx.skillName === 'ship';
-const stepNum = isShip ? '3.48' : '1.5';
+const stepNum = isShip ? '8.2' : '1.5';

 return `## Step ${stepNum}: Scope Drift Detection

@@ -413,7 +413,7 @@ export function generateAdversarialStep(ctx: TemplateContext): string {
 if (ctx.host === 'codex') return '';

 const isShip = ctx.skillName === 'ship';
-const stepNum = isShip ? '3.8' : '5.7';
+const stepNum = isShip ? '11' : '5.7';

 return `## Step ${stepNum}: Adversarial review (always-on)

@@ -501,7 +501,7 @@ A) Investigate and fix now (recommended)
 B) Continue — review will still complete
 \`\`\`

-If A: address the findings${isShip ? '. After fixing, re-run tests (Step 3) since code has changed' : ''}. Re-run \`codex review\` to verify.
+If A: address the findings${isShip ? '. After fixing, re-run tests (Step 5) since code has changed' : ''}. Re-run \`codex review\` to verify.

 Read stderr for errors (same error handling as Codex adversarial above).

@@ -917,16 +917,16 @@ export function generatePlanCompletionAuditReview(_ctx: TemplateContext): string
 // ─── Plan Verification Execution ──────────────────────────────────────

 export function generatePlanVerificationExec(_ctx: TemplateContext): string {
-return `## Step 3.47: Plan Verification
+return `## Step 8.1: Plan Verification

 Automatically verify the plan's testing/verification steps using the \`/qa-only\` skill.

 ### 1. Check for verification section

-Using the plan file already discovered in Step 3.45, look for a verification section. Match any of these headings: \`## Verification\`, \`## Test plan\`, \`## Testing\`, \`## How to test\`, \`## Manual testing\`, or any section with verification-flavored items (URLs to visit, things to check visually, interactions to test).
+Using the plan file already discovered in Step 8, look for a verification section. Match any of these headings: \`## Verification\`, \`## Test plan\`, \`## Testing\`, \`## How to test\`, \`## Manual testing\`, or any section with verification-flavored items (URLs to visit, things to check visually, interactions to test).

 **If no verification section found:** Skip with "No verification steps found in plan — skipping auto-verification."
-**If no plan file was found in Step 3.45:** Skip (already handled).
+**If no plan file was found in Step 8:** Skip (already handled).

 ### 2. Check for running dev server

@@ -971,7 +971,7 @@ Follow the /qa-only workflow with these modifications:

 ### 5. Include in PR body

-Add a \`## Verification Results\` section to the PR body (Step 8):
+Add a \`## Verification Results\` section to the PR body (Step 19):
 - If verification ran: summary of results (N PASS, M FAIL, K SKIPPED)
 - If skipped: reason for skipping (no plan, no server, no verification section)`;
 }
@@ -980,9 +980,9 @@ Add a \`## Verification Results\` section to the PR body (Step 8):

 export function generateCrossReviewDedup(ctx: TemplateContext): string {
 const isShip = ctx.skillName === 'ship';
-const stepNum = isShip ? '3.57' : '5.0';
+const stepNum = isShip ? '9.3' : '5.0';
 const findingsRef = isShip
-? 'the checklist pass (Step 3.5) and specialist review (Step 3.55-3.56)'
+? 'the checklist pass (Step 9) and specialist review (Step 9.1-9.2)'
 : 'Step 4 critical pass and Step 4.5-4.6 specialists';

 return `### Step ${stepNum}: Cross-review finding dedup
@@ -28,7 +28,7 @@ ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
 **If test framework detected** (config files or test directories found):
 Print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap."
 Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns).
-Store conventions as prose context for use in Phase 8e.5 or Step 3.4. **Skip the rest of bootstrap.**
+Store conventions as prose context for use in Phase 8e.5 or Step 7. **Skip the rest of bootstrap.**

 **If BOOTSTRAP_DECLINED** appears: Print "Test bootstrap previously declined — skipping." **Skip the rest of bootstrap.**

@@ -213,7 +213,7 @@ ls jest.config.* vitest.config.* playwright.config.* cypress.config.* .rspec pyt
 ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
 \`\`\`

-3. **If no framework detected:**${mode === 'ship' ? ' falls through to the Test Framework Bootstrap step (Step 2.5) which handles full setup.' : ' still produce the coverage diagram, but skip test generation.'}`);
+3. **If no framework detected:**${mode === 'ship' ? ' falls through to the Test Framework Bootstrap step (Step 4) which handles full setup.' : ' still produce the coverage diagram, but skip test generation.'}`);

 // ── Before/after count (ship only) ──
 if (mode === 'ship') {
@@ -379,7 +379,7 @@ GAPS: 8 paths need tests (2 need E2E, 1 needs eval)
 ─────────────────────────────────
 \`\`\`

-**Fast path:** All paths covered → "${mode === 'ship' ? 'Step 3.4' : mode === 'review' ? 'Step 4.75' : 'Test review'}: All new code paths have test coverage ✓" Continue.`);
+**Fast path:** All paths covered → "${mode === 'ship' ? 'Step 7' : mode === 'review' ? 'Step 4.75' : 'Test review'}: All new code paths have test coverage ✓" Continue.`);

 // ── Mode-specific action section ──
 if (mode === 'plan') {
@@ -432,7 +432,7 @@ This file is consumed by \`/qa\` and \`/qa-only\` as primary test input. Include
 sections.push(`
 **5. Generate tests for uncovered paths:**

-If test framework detected (or bootstrapped in Step 2.5):
+If test framework detected (or bootstrapped in Step 4):
 - Prioritize error handlers and edge cases first (happy paths are more likely already tested)
 - Read 2-3 existing test files to match conventions exactly
 - Generate unit tests. Mock all external dependencies (DB, API, Redis).
@@ -446,7 +446,7 @@ Caps: 30 code paths max, 20 tests generated max (code + user flow combined), 2-m

 If no test framework AND user declined bootstrap → diagram only, no generation. Note: "Test generation skipped — no test framework configured."

-**Diff is test-only changes:** Skip Step 3.4 entirely: "No new application code paths to audit."
+**Diff is test-only changes:** Skip Step 7 entirely: "No new application code paths to audit."

 **6. After-count and coverage summary:**

@@ -373,7 +373,7 @@ export function generateCoAuthorTrailer(ctx: TemplateContext): string {
 }

 export function generateChangelogWorkflow(_ctx: TemplateContext): string {
-return `## CHANGELOG (auto-generate)
+return `## Step 13: CHANGELOG (auto-generate)

 1. Read \`CHANGELOG.md\` header to know the format.
