v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806)

* feat(test): transcript-section-logger + ship-action fingerprint (T10)

Pure-analysis module over a SkillTestResult/NDJSON transcript:
- extractSectionReads(): which sections/*.md a run opened (post-carve check)
- extractShipActions(): observable action fingerprint (merge/test/bump/changelog/commit/push/pr) that works on the MONOLITH too, so a baseline captured before the carve can detect a sectioned-ship regression
- baseline read/write + compareShipActions() for baseline-first dogf(T10)

Baseline-first answers the Codex outside-voice critique that a logger in the same PR as the carve is post-failure telemetry without a pre-carve reference.

11 unit tests, all green. Paid monolith baseline capture runs separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(pipeline): section discovery + generation machinery (T9)

- discover-skills.ts: discoverSectionTemplates() scans <skill>/sections/*.md.tmpl
- gen-skill-docs.ts: extract resolvePlaceholders + applyHostRewrites + buildContext as shared helpers (processTemplate and the new processSectionTemplate both call them, so a sanitization/rewrite fix can't miss sections) [C1]
- processSectionTemplate: body-fragment generation (no frontmatter/catalog/voice), parent-skill TemplateContext (skillName pinned to parent, not 'sections', so appliesTo gating + tier behave identically), per-host output routing
- --host all now fails the build on ANY host failure, not just claude, so a stale external-host output can't slip the freshness gate [Codex outside-voice #9]

Inert until a skill is carved (no sections/ dirs exist yet). Refactor is output-neutral: gen:skill-docs --dry-run --host all reports 0 STALE. 5 discovery unit tests + 389 gen-skill-docs tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(setup): install sections/ for cherry-pick targets (claude + kiro) (T9)

Two install targets cherry-pick SKILL.md and would leave a carved skill's sections/ behind, 404ing a runtime 'Read sections/<name>.md':
- link_claude_skill_dirs: link the sections/ subdir via _link_or_copy (windows gets a fresh copy on every ./setup)
- kiro per-skill loop: sed-rewrite + copy each sections/* so paths resolve under ~/.kiro, not ~/.codex/~/.claude

codex/factory/opencode link the whole generated dir, so sections ride free. Addresses Codex outside-voice #4/#6 (runtime pathing landmine). Inert until a skill is carved. Static-tripwire test + windows-fallback invariant green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ship): gstack-version-bump CLI — tested idempotency classify + write (T9)

Hybrid CLI extraction (CM1): the deterministic core of ship Step 12 becomes a tested CLI instead of bash prose the agent re-derives each run.
- classify: FRESH/ALREADY_BUMPED/DRIFT_STALE_PKG/DRIFT_UNEXPECTED from VERSION vs origin/<base>:VERSION vs package.json.version (pure reader)
- write: validated dual-write to VERSION + package.json (FRESH bump)
- repair: DRIFT_STALE_PKG sync, no re-bump

Bump-LEVEL choice + queue collision stay agent judgment; slot pick stays bin/gstack-next-version. This removes the re-bump-a-shipped-branch footgun from skippable prose into code that can't be skipped or misread. 15 tests (exhaustive state matrix + write/repair fs + real-git classify).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(parity): sectioned-skill parity capability — guards the carve (T9)

Carved skills (skeleton + sections/*.md) need parity checks that see relocated content, or moving a phrase into a section reads as 'lost':
- readSkillForParity(): union skeleton + all sections/*.md
- checkSkillParity sectioned mode: content checks against the union; minBytes/maxSizeRatio against union bytes (total behavior preserved); maxSkeletonBytes asserts the always-loaded skeleton actually shrank. Lowering minBytes to fit a small skeleton would otherwise make the size floor toothless [Codex #12].

Built + tested BEFORE the carve so ship's invariant can flip to sectioned in the same commit it lands. Monolith path byte-identical (verified: pre-existing investigate 1.053 ratio drift fails the same with this change stashed). 7 sectioned-parity tests + existing parity tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(ship): carve into skeleton + on-demand sections (Claude) (T9)

ship/SKILL.md drops 167KB → 68.7KB (~59% of the always-loaded skill) by moving 8 prose-heavy steps into ship/sections/*.md, read on demand: tests, test-coverage, plan-completion, review-army, greptile, adversarial, changelog, pr-body. Step 12's version logic now calls the tested gstack-version-bump CLI instead of inline bash.

Claude-first (S2): {{SECTION:id}} emits a STOP-Read pointer on Claude (skeleton + generated section files) and INLINES the content on every other host, so external hosts keep the full monolith — verified factory at 162KB with no sections dir. {{SECTION_INDEX:ship}} renders the situation→section table from the PASSIVE manifest (CM2 / v2_PLAN.md:663); required-reads live only in test fixtures. Multi-pass resolve expands inlined sections' own resolvers.

Parity: ship invariant flipped to sectioned (union content checks + maxSkeletonBytes asserts the shrink). Carve-fallout fixed across gen-skill-docs/skill-validation/golden/plan-completion/#1539/size-budget tests via skeleton+sections union reads. Free suite green except the pre-existing investigate parity drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): manifest-consistency + context-parity + requiredReads helper (T9)

Free deterministic guards for the carve:
- required-reads.ts + unit test: assertRequiredReads(run, requiredFiles) — the mechanical layer-5 check that the agent Read the sections its situation needs (required set comes from the fixture, not the passive manifest)
- section-manifest-consistency: 3-tier orphan classification (generated orphan + hand-edited generated file → FAIL; manifest orphan → WARN per v2_PLAN.md) and pins the PASSIVE-manifest contract (no applies_when/required_for)
- template-context-parity: generated sections have zero unresolved placeholders and gated resolvers (ADVERSARIAL_STEP/CONFIDENCE_CALIBRATION/CHANGELOG_WORKFLOW) rendered — proving sections resolve with the parent skillName, not 'sections'

16 tests, all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): section-loading E2E + idempotency CLI detection (T9)

- skill-e2e-ship-section-loading.test.ts (new, periodic): runs real /ship in plan mode against a fresh version-changing fixture and asserts the agent Read the required sections (review-army + changelog). Runs against the INSTALLED skill (~/.claude/skills/gstack/ship), not repo paths, so install-layout 404s surface [Codex outside-voice #5]. Layer-5 mechanical guard against silent section-skip.
- skill-e2e-ship-idempotency.test.ts: detection updated for the carve — Step 12 now runs gstack-version-bump classify (JSON "state":"ALREADY_BUMPED") instead of the inline bash echo (STATE: ALREADY_BUMPED). Accept both; add a gstack-version-bump-write re-bump regression signal.
- touchfiles: register ship-section-loading (periodic) + extend idempotency deps with bin/gstack-version-bump + scripts/resolvers/sections.ts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): union-read redaction wiring test for the carve (T9)

main's PR-body redaction-at-sink lives in sections/pr-body.md.tmpl after the carve, not the skeleton template. Read skeleton + section templates union so the redaction-wiring assertions follow the relocated content. 9/9 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
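The gstack-version-bump commit above names four classify states derived from VERSION, origin/<base>:VERSION, and package.json.version, but this page does not show the CLI source. The following is only a sketch of how such a pure classifier might look. The function name `classifyBump`, the `VersionInputs` shape, the decision order, and the sample version strings are all assumptions for illustration; the real bin/gstack-version-bump may define the states differently.

```typescript
// Hypothetical sketch. State names come from the commit message; the rules
// mapping inputs to states are ASSUMED, not taken from bin/gstack-version-bump.
type BumpState = 'FRESH' | 'ALREADY_BUMPED' | 'DRIFT_STALE_PKG' | 'DRIFT_UNEXPECTED';

interface VersionInputs {
  version: string;     // contents of VERSION on the branch
  baseVersion: string; // contents of origin/<base>:VERSION
  pkgVersion: string;  // "version" field of package.json on the branch
}

// Pure reader: no filesystem or git access, so the whole state matrix is unit-testable.
function classifyBump({ version, baseVersion, pkgVersion }: VersionInputs): BumpState {
  if (version === baseVersion) {
    // Branch has not bumped yet; package.json should still agree with VERSION.
    return pkgVersion === version ? 'FRESH' : 'DRIFT_UNEXPECTED';
  }
  // VERSION already differs from the base branch.
  if (pkgVersion === version) return 'ALREADY_BUMPED';      // both files bumped: do not re-bump
  if (pkgVersion === baseVersion) return 'DRIFT_STALE_PKG'; // only VERSION bumped: repair, no re-bump
  return 'DRIFT_UNEXPECTED';
}

// Illustrative inputs only.
console.log(
  classifyBump({ version: '1.54.0.0', baseVersion: '1.53.0.0', pkgVersion: '1.53.0.0' })
); // DRIFT_STALE_PKG
```

Keeping the classifier free of I/O is what would allow an exhaustive state-matrix test, with the git and filesystem reads done by a thin caller.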
2026-07-17 13:07:22 +02:00 · 2026-05-30 12:09:10 -07:00
parent 9562ad4e70
commit 46c1fae7f1
51 changed files with 4445 additions and 4891 deletions
@@ -7,6 +7,22 @@ import * as path from 'path';

 const ROOT = path.resolve(import.meta.dir, '..');

+// Carved-skill aware (v2 plan T9): ship is a skeleton SKILL.md + sections/*.md.
+// Read the union so validations of content that moved into a section still hold.
+// `_SHIP_MD` is a distinct path expression so a mechanical read-replace can't
+// recurse into this helper.
+const _SHIP_MD = path.join(ROOT, 'ship', 'SKILL.md');
+function readShipUnion(): string {
+  let t = fs.readFileSync(_SHIP_MD, 'utf-8');
+  const secDir = path.join(ROOT, 'ship', 'sections');
+  if (fs.existsSync(secDir)) {
+    for (const f of fs.readdirSync(secDir).sort()) {
+      if (f.endsWith('.md')) t += '\n' + fs.readFileSync(path.join(secDir, f), 'utf-8');
+    }
+  }
+  return t;
+}
+
 describe('SKILL.md command validation', () => {
  test('all $B commands in SKILL.md are valid browse commands', () => {
    const result = validateSkill(path.join(ROOT, 'SKILL.md'));
@@ -315,7 +331,8 @@ describe('Cross-skill path consistency', () => {
    for (const file of filesToCheck) {
      const filePath = path.join(ROOT, file);
      if (!fs.existsSync(filePath)) continue;
-      const content = fs.readFileSync(filePath, 'utf-8');
+      // ship's greptile handling moved into sections/greptile.md (T9 carve).
+      const content = file === 'ship/SKILL.md' ? readShipUnion() : fs.readFileSync(filePath, 'utf-8');

      const hasBoth = (content.includes('per-project') && content.includes('global')) ||
        (content.includes('$REMOTE_SLUG/greptile-history') && content.includes('~/.gstack/greptile-history'));
@@ -437,7 +454,7 @@ describe('Greptile history format consistency', () => {

  test('review/SKILL.md and ship/SKILL.md both reference greptile-triage.md for write details', () => {
    const reviewContent = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
-    const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const shipContent = readShipUnion();

    expect(reviewContent.toLowerCase()).toContain('greptile-triage.md');
    expect(shipContent.toLowerCase()).toContain('greptile-triage.md');
@@ -530,7 +547,7 @@ describe('TODOS-format.md reference consistency', () => {
  });

  test('skills that write TODOs reference TODOS-format.md', () => {
-    const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const shipContent = readShipUnion();
    const ceoPlanContent = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
    const engPlanContent = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');

@@ -788,7 +805,7 @@ describe('Enum & Value Completeness in review checklist', () => {
    expect(checklist).toContain('ASK');

    const reviewSkill = fs.readFileSync(path.join(ROOT, 'review/SKILL.md'), 'utf-8');
-    const shipSkill = fs.readFileSync(path.join(ROOT, 'ship/SKILL.md'), 'utf-8');
+    const shipSkill = readShipUnion();
    expect(reviewSkill).toContain('AUTO-FIX');
    expect(reviewSkill).toContain('[AUTO-FIXED]');
    expect(shipSkill).toContain('AUTO-FIX');
@@ -1014,7 +1031,7 @@ describe('Test Bootstrap ({{TEST_BOOTSTRAP}}) integration', () => {
  });

  test('TEST_BOOTSTRAP appears in ship/SKILL.md', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('Test Framework Bootstrap');
    expect(content).toContain('Step 4');
  });
@@ -1063,7 +1080,7 @@ describe('Test Bootstrap ({{TEST_BOOTSTRAP}}) integration', () => {

  test('WebSearch is in allowed-tools for qa, ship, design-review', () => {
    const qa = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
-    const ship = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const ship = readShipUnion();
    const qaDesign = fs.readFileSync(path.join(ROOT, 'design-review', 'SKILL.md'), 'utf-8');
    expect(qa).toContain('WebSearch');
    expect(ship).toContain('WebSearch');
@@ -1112,7 +1129,7 @@ describe('Phase 8e.5 regression test generation', () => {

 describe('Step 3.4 test coverage audit', () => {
  test('ship/SKILL.md contains Step 7', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('Step 7: Test Coverage Audit');
    // The coverage diagram collapses code-path and user-flow counts onto one
    // summary line. Verify that summary is present (labels are stable).
@@ -1120,7 +1137,7 @@ describe('Step 3.4 test coverage audit', () => {
  });

  test('Step 3.4 includes quality scoring rubric', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('★★★');
    expect(content).toContain('★★');
    expect(content).toContain('edge cases AND error paths');
@@ -1128,36 +1145,36 @@ describe('Step 3.4 test coverage audit', () => {
  });

  test('Step 3.4 includes before/after test count', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('Count test files before');
    expect(content).toContain('Count test files after');
  });

  test('ship PR body includes Test Coverage section', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('## Test Coverage');
  });

  test('ship rules include test generation rule', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('Step 7 generates coverage tests');
    expect(content).toContain('Never commit failing tests');
  });

  test('Step 3.4 includes vibe coding philosophy', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('vibe coding becomes yolo coding');
  });

  test('Step 3.4 traces actual codepaths, not just syntax', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('Trace every codepath');
    expect(content).toContain('Trace data flow');
    expect(content).toContain('Diagram the execution');
  });

  test('Step 3.4 maps user flows and interaction edge cases', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('Map user flows');
    expect(content).toContain('Interaction edge cases');
    expect(content).toContain('Double-click');
@@ -1167,7 +1184,7 @@ describe('Step 3.4 test coverage audit', () => {
  });

  test('Step 3.4 diagram includes user-flow coverage summary', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    // The diagram was compressed from separate CODE PATH COVERAGE / USER FLOW
    // COVERAGE section headers into a single summary line. Assert on the
    // labels that still appear on that summary line.
@@ -1203,7 +1220,7 @@ describe('ship step numbering', () => {
  });

  test('ship/SKILL.md main headings use clean integer step numbers', () => {
-    const skill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const skill = readShipUnion();
    // Headings like "## Step 7: Test Coverage Audit" — NOT sub-steps like "## Step 8.1:"
    const headings = Array.from(skill.matchAll(/^## Step (\d+(?:\.\d+)?):/gm)).map(
      (m) => m[1]
@@ -1381,7 +1398,7 @@ describe('Codex skill', () => {
  });

  test('adversarial review in /ship always runs both passes', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('Adversarial review (always-on)');
    expect(content).toContain('adversarial-review');
    expect(content).toContain('reasoning_effort="high"');
@@ -1391,7 +1408,7 @@ describe('Codex skill', () => {

  test('scope drift detection in /review and /ship', () => {
    const reviewContent = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
-    const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const shipContent = readShipUnion();
    // Both should contain scope drift from the shared resolver
    for (const content of [reviewContent, shipContent]) {
      expect(content).toContain('Scope Check:');
@@ -1427,7 +1444,8 @@ describe('Codex skill', () => {

  test('codex review invocations avoid the prompt plus --base argument shape', () => {
    for (const rel of ['codex/SKILL.md', 'review/SKILL.md', 'ship/SKILL.md']) {
-      const content = fs.readFileSync(path.join(ROOT, rel), 'utf-8');
+      // ship's codex command moved into sections/adversarial.md (T9 carve).
+      const content = rel === 'ship/SKILL.md' ? readShipUnion() : fs.readFileSync(path.join(ROOT, rel), 'utf-8');
      expect(content).not.toContain('--base <base> -c \'model_reasoning_effort="high"\'');
      expect(content).toContain('Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD');
    }
@@ -1443,7 +1461,8 @@ describe('Codex skill', () => {
    const boundaryLine =
      'Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/';
    for (const rel of ['codex/SKILL.md', 'review/SKILL.md', 'ship/SKILL.md']) {
-      const content = fs.readFileSync(path.join(ROOT, rel), 'utf-8');
+      // ship's codex/adversarial boundary line moved into sections/adversarial.md.
+      const content = rel === 'ship/SKILL.md' ? readShipUnion() : fs.readFileSync(path.join(ROOT, rel), 'utf-8');
      expect(content).toContain(boundaryLine);
    }
  });
@@ -1456,7 +1475,7 @@ describe('Codex skill', () => {
  });

  test('Review Readiness Dashboard includes Adversarial Review row', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('Adversarial');
    expect(content).toContain('codex-review');
  });
@@ -1711,17 +1730,17 @@ describe('Repo mode preamble validation', () => {

 describe('Test failure triage in ship skill', () => {
  test('ship/SKILL.md contains Test Failure Ownership Triage', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('Test Failure Ownership Triage');
  });

  test('ship/SKILL.md triage uses git diff for classification', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('git diff origin/<base>...HEAD --name-only');
  });

  test('ship/SKILL.md triage has solo and collaborative paths', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('REPO_MODE');
    expect(content).toContain('solo');
    expect(content).toContain('collaborative');
@@ -1730,18 +1749,18 @@ describe('Test failure triage in ship skill', () => {
  });

  test('ship/SKILL.md triage has GitHub issue assignment for collaborative mode', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('gh issue create');
    expect(content).toContain('--assignee');
  });

  test('{{TEST_FAILURE_TRIAGE}} placeholder is fully resolved in ship/SKILL.md', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).not.toContain('{{TEST_FAILURE_TRIAGE}}');
  });

  test('ship/SKILL.md uses in-branch language for stop condition', () => {
-    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    const content = readShipUnion();
    expect(content).toContain('In-branch test failures');
  });
 });
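The readShipUnion helper in this diff is pinned to the ship skill. A minimal sketch of the same skeleton-plus-sections read for an arbitrary skill directory follows, exercised against a throwaway fixture. The name `readSkillUnion` and the fixture contents are hypothetical; the repository's own readSkillForParity() is not shown on this page and may behave differently.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Hypothetical generalization of readShipUnion: concatenate SKILL.md with every
// sections/*.md file, in sorted order so the result is deterministic.
function readSkillUnion(skillDir: string): string {
  let text = fs.readFileSync(path.join(skillDir, 'SKILL.md'), 'utf-8');
  const secDir = path.join(skillDir, 'sections');
  if (fs.existsSync(secDir)) {
    for (const f of fs.readdirSync(secDir).sort()) {
      // Non-.md files (for example templates) are skipped.
      if (f.endsWith('.md')) text += '\n' + fs.readFileSync(path.join(secDir, f), 'utf-8');
    }
  }
  return text;
}

// Throwaway fixture: a skeleton plus one section and one file that must be ignored.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-union-'));
fs.writeFileSync(path.join(dir, 'SKILL.md'), '## Step 1: skeleton');
fs.mkdirSync(path.join(dir, 'sections'));
fs.writeFileSync(path.join(dir, 'sections', 'greptile.md'), 'see greptile-triage.md');
fs.writeFileSync(path.join(dir, 'sections', 'notes.txt'), 'ignored');

const union = readSkillUnion(dir);
console.log(union.includes('greptile-triage.md')); // true
```

A content assertion written against the union keeps passing when a phrase moves from the skeleton into a section, which is the property the updated tests in this diff rely on.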