feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259)

* refactor: extract {{TEST_COVERAGE_AUDIT}} shared resolver

DRY extraction of the test coverage audit methodology into a shared generator function with three explicit placeholders:
- TEST_COVERAGE_AUDIT_PLAN (plan-eng-review)
- TEST_COVERAGE_AUDIT_SHIP (ship)
- TEST_COVERAGE_AUDIT_REVIEW (review)

Shared across all modes: codepath tracing, ASCII diagram format, quality scoring rubric, E2E test decision matrix, regression rule, and test framework detection via CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: plan-eng-review uses shared test coverage audit

Replace the thin 6-line Section 3 test review with the full shared methodology via {{TEST_COVERAGE_AUDIT_PLAN}}. Plan mode now:
- Traces every codepath with full ASCII diagrams
- Adds missing tests to the plan (not just "check for tests")
- Writes test plan artifact for /qa consumption
- Includes E2E/eval recommendations and regression detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: ship uses shared test coverage audit

Replace 135 lines of inline Step 3.4 methodology with {{TEST_COVERAGE_AUDIT_SHIP}}. Functionally identical output plus:
- E2E test decision matrix (marks paths needing E2E vs unit)
- Eval recommendations for LLM prompt changes
- Regression detection iron rule
- Test framework detection via CLAUDE.md first
- Test plan artifact for /qa consumption

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /review Step 4.75 test coverage diagram

Add codepath tracing to the pre-landing review via {{TEST_COVERAGE_AUDIT_REVIEW}}. Review mode:
- Produces ASCII coverage diagram (same methodology as plan/ship)
- Generates tests for gaps via Fix-First (ASK user)
- Subsumes Pass 2 "Test Gaps" checklist category
- Gaps are INFORMATIONAL findings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: mode differentiation + regression guard for coverage audit

10 new tests verifying the three TEST_COVERAGE_AUDIT placeholders:
- All modes share: codepath tracing, E2E matrix, regression rule
- Plan mode: adds to plan + artifact, no ship-specific content
- Ship mode: auto-generates + before/after count + coverage summary
- Review mode: Fix-First ASK + INFORMATIONAL, no artifact
- Regression guard: ship SKILL.md preserves all key phrases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: extract shared coverage audit fixture + review E2E

- Extract billing.ts fixture into coverage-audit-fixture.ts (DRY)
- Refactor ship-coverage-audit E2E to use shared fixture
- Add review-coverage-audit E2E for Step 4.75
- Update touchfiles: both E2Es depend on shared fixture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: strengthen E2E assertions for coverage audit tests

The coverage audit E2E tests (ship + review) were only asserting exitReason === 'success' and readCalls > 0 — they passed even if the agent produced no coverage diagram. Add assertion that the output contains either GAP or TESTED markers.

Found during /review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: plan mode traces the plan, not the git diff

Codex adversarial review caught that plan-eng-review was inheriting "git diff origin/<base>...HEAD" from the shared resolver, but plan mode reviews a plan document, not a code diff. Plan mode now says: "Trace every codepath in the plan" and "Read the plan document." Ship and review modes keep the git diff instruction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.9.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: test coverage catalog + failure triage (merged branches) (#285)

* feat: add bin/gstack-repo-mode — solo vs collaborative detection with caching

Detects whether a repo is solo-dev (one person does 80%+ of recent commits) or collaborative. Uses 90-day git shortlog window with 7-day cache in ~/.gstack/projects/{SLUG}/repo-mode.json. Config override via `gstack-config set repo_mode solo|collaborative` takes precedence over the heuristic. Minimum 5 commits required to classify (otherwise unknown).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: test failure ownership triage — see something say something

Adds two new preamble sections to all gstack skills:
- Repo Ownership Mode: explains solo vs collaborative behavior
- See Something, Say Something: proactive issue flagging principle

Adds {{TEST_FAILURE_TRIAGE}} template variable (opt-in, used by /ship):
- Classifies test failures as in-branch vs pre-existing
- Solo mode defaults to "investigate and fix now"
- Collaborative mode offers "blame + assign GitHub issue" option
- Also offers P0 TODO and skip options

/ship Step 3 now triages test failures instead of hard-stopping on all failures. In-branch failures still block shipping. Pre-existing failures get user-directed triage based on repo mode.

Adds P2 TODO for gstack notes system (deferred lightweight reminder).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files for Claude and Codex hosts

All 22 Claude skills and 21 Codex skills regenerated with new preamble sections (Repo Ownership Mode, See Something Say Something) and {{TEST_FAILURE_TRIAGE}} resolved in ship/SKILL.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: validate repo mode values to prevent shell injection

Codex adversarial review found that unvalidated config/cache values could be injected into shell via source <(gstack-repo-mode). Added validate_mode() that only allows solo|collaborative|unknown — anything else becomes "unknown". Prevents persistent code execution through malicious config.yaml or tampered cache JSON.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: shell injection via branch names + feature-branch sampling bias

Codex code review found two issues:

P1: eval $(gstack-slug) in gstack-repo-mode executes branch names as shell. Branch names like foo$(touch${IFS}pwned) are valid git refs and would execute arbitrary commands. Fix: compute SLUG directly with sed instead of eval'ing gstack-slug output.

P2: git shortlog HEAD only sees current branch history. On feature branches that haven't merged main recently, other contributors disappear from the sample. Fix: use git shortlog on the default branch (origin/main) instead of HEAD.

Also improved blame lookup in collaborative triage to check both the test file and the production code it covers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: broaden codex-host stripping test to accommodate triage section

"Investigate and fix" now appears in TEST_FAILURE_TRIAGE (not just the Codex review step). Use CODEX_REVIEWS config string as a more specific marker for detecting the Codex review step in Codex-hosted skills.

* fix: replace template placeholder in TODOS.md with readable text

{{TEST_FAILURE_TRIAGE}} is template syntax but TODOS.md is not processed by gen-skill-docs — replaced with human-readable reference.

* chore: bump version and changelog (v0.9.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add bin/ directory to project structure in CLAUDE.md

* test: add triage resolver unit tests, plan-eng coverage audit E2E, and triage E2E

- TEST_FAILURE_TRIAGE resolver: 6 unit tests verifying all triage steps (T1-T4), REPO_MODE branching, and safety default for ambiguous failures
- plan-eng-coverage-audit E2E: tests /plan-eng-review coverage audit codepath (gap identified during eng review — existed on neither branch)
- ship-triage E2E: planted-bug fixture with in-branch (truncate null) and pre-existing (divide-by-zero) failures; verifies correct classification
- Touchfile entries for diff-based test selection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate stale Codex SKILL.md for retro

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gstack-repo-mode handles repos without origin remote

Split `git remote get-url origin` into a separate variable with `|| true` so the script doesn't crash under `set -euo pipefail` in local-only repos. Falls back to REPO_MODE=unknown gracefully.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: REPO_MODE defaults to unknown when helper emits nothing

Changed preamble from `source <(...) || REPO_MODE=unknown` (which doesn't catch empty output) to `source <(...) || true` followed by `REPO_MODE=${REPO_MODE:-unknown}`. Regenerated all SKILL.md files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: triage E2E runs both test files in subprocesses

math.test.js called process.exit(1) which killed the runner before string.test.js could execute. Changed test runner to use child_process so each test runs independently and both failure classes are exercised.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gstack-repo-mode handles repos without origin remote

Fall back through origin/main → origin/master → HEAD when git symbolic-ref refs/remotes/origin/HEAD is not set. Prevents shortlog crash in repos where origin/HEAD isn't configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: triage E2E runs both test files in subprocesses

Add assertions verifying both math.test.js (pre-existing failure) and string.test.js (in-branch failure) actually executed during triage. Prevents false passes where only one failure class is exercised.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: REPO_MODE defaults to unknown when helper emits nothing

- Remove head -20 truncation that biased solo classification by dropping low-volume contributors from the denominator
- Use atomic write (mktemp + mv) for cache to prevent concurrent preamble reads from seeing partial JSON

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add test coverage catalog to CHANGELOG + update project structure

- CHANGELOG: add 6 entries for coverage audit, review Step 4.75, E2E recommendations, regression iron rule, failure triage, repo-mode fix
- CLAUDE.md: add missing skill directories (autoplan, benchmark, canary, codex, land-and-deploy, setup-deploy) to project structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.10.1.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: CHANGELOG rules — branch-scoped versions, never fold into old entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
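
The repo-mode heuristic described in the message (80%+ share for solo, a 5-commit minimum, and an allow-list for mode values) could be sketched roughly as below. This is an illustration in TypeScript only: the real helper is the shell script bin/gstack-repo-mode, and the names `classifyRepoMode`, `validateMode`, `SOLO_SHARE`, and `MIN_COMMITS` are hypothetical.

```typescript
// Hypothetical sketch: the actual logic lives in the bin/gstack-repo-mode shell script.
type RepoMode = 'solo' | 'collaborative' | 'unknown';

const SOLO_SHARE = 0.8; // "one person does 80%+ of recent commits"
const MIN_COMMITS = 5;  // fewer commits than this cannot be classified

// Allow-list in the spirit of validate_mode(): anything unexpected becomes "unknown".
function validateMode(value: string | undefined): RepoMode {
  return value === 'solo' || value === 'collaborative' || value === 'unknown'
    ? value
    : 'unknown';
}

// commitsByAuthor is assumed to hold per-author commit counts for the sampling window.
function classifyRepoMode(commitsByAuthor: Record<string, number>): RepoMode {
  const counts = Object.values(commitsByAuthor);
  const total = counts.reduce((sum, n) => sum + n, 0);
  if (total < MIN_COMMITS) return 'unknown';
  const top = Math.max(...counts);
  return top / total >= SOLO_SHARE ? 'solo' : 'collaborative';
}
```

Counting every contributor in `total` matters here: the later fix that removed the `head -20` truncation addresses exactly the case where dropped low-volume authors shrink the denominator and bias the result toward solo.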
2026-05-07 05:56:41 +02:00 · 2026-03-22 11:28:16 -07:00
parent 407b156920
commit 7ff0f84b1e
54 changed files with 5949 additions and 205 deletions
@@ -0,0 +1,76 @@
+/**
+ * Shared fixture for test coverage audit E2E tests.
+ *
+ * Creates a Node.js project with billing source code that has intentional
+ * test coverage gaps: processPayment has happy-path-only tests,
+ * refundPayment has no tests at all.
+ *
+ * Used by: ship-coverage-audit E2E, review-coverage-audit E2E
+ */
+
+import * as fs from 'fs';
+import * as path from 'path';
+import { spawnSync } from 'child_process';
+
+export function createCoverageAuditFixture(dir: string): void {
+  // Create a Node.js project WITH test framework but coverage gaps
+  fs.writeFileSync(path.join(dir, 'package.json'), JSON.stringify({
+    name: 'test-coverage-app',
+    version: '1.0.0',
+    type: 'module',
+    scripts: { test: 'echo "no tests yet"' },
+    devDependencies: { vitest: '^1.0.0' },
+  }, null, 2));
+
+  // Create vitest config
+  fs.writeFileSync(path.join(dir, 'vitest.config.ts'),
+    `import { defineConfig } from 'vitest/config';\nexport default defineConfig({ test: {} });\n`);
+
+  fs.writeFileSync(path.join(dir, 'VERSION'), '0.1.0.0\n');
+  fs.writeFileSync(path.join(dir, 'CHANGELOG.md'), '# Changelog\n');
+
+  // Create source file with multiple code paths
+  fs.mkdirSync(path.join(dir, 'src'), { recursive: true });
+  fs.writeFileSync(path.join(dir, 'src', 'billing.ts'), `
+export function processPayment(amount: number, currency: string) {
+  if (amount <= 0) throw new Error('Invalid amount');
+  if (currency !== 'USD' && currency !== 'EUR') throw new Error('Unsupported currency');
+  return { status: 'success', amount, currency };
+}
+
+export function refundPayment(paymentId: string, reason: string) {
+  if (!paymentId) throw new Error('Payment ID required');
+  if (!reason) throw new Error('Reason required');
+  return { status: 'refunded', paymentId, reason };
+}
+`);
+
+  // Create a test directory with ONE test (partial coverage)
+  fs.mkdirSync(path.join(dir, 'test'), { recursive: true });
+  fs.writeFileSync(path.join(dir, 'test', 'billing.test.ts'), `
+import { describe, test, expect } from 'vitest';
+import { processPayment } from '../src/billing';
+
+describe('processPayment', () => {
+  test('processes valid payment', () => {
+    const result = processPayment(100, 'USD');
+    expect(result.status).toBe('success');
+  });
+  // GAP: no test for invalid amount
+  // GAP: no test for unsupported currency
+  // GAP: refundPayment not tested at all
+});
+`);
+
+  // Init git repo with main branch
+  const run = (cmd: string, args: string[]) =>
+    spawnSync(cmd, args, { cwd: dir, stdio: 'pipe', timeout: 5000 });
+  run('git', ['init', '-b', 'main']);
+  run('git', ['config', 'user.email', 'test@test.com']);
+  run('git', ['config', 'user.name', 'Test']);
+  run('git', ['add', '.']);
+  run('git', ['commit', '-m', 'initial commit']);
+
+  // Create feature branch
+  run('git', ['checkout', '-b', 'feature/billing']);
+}
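
The fixture's `run` helper discards the result of each `spawnSync` call, so a failed `git init` or `git commit` would go unnoticed until an E2E assertion fails later. A minimal sketch of a stricter variant follows; `runChecked` is a hypothetical name and is not part of this commit.

```typescript
import { spawnSync } from 'child_process';

// Hypothetical helper (not in the commit): same options as the fixture's `run`,
// but it throws if the command cannot be spawned, times out, or exits non-zero.
function runChecked(cmd: string, args: string[], cwd: string): string {
  const result = spawnSync(cmd, args, {
    cwd,
    stdio: 'pipe',
    timeout: 5000,
    encoding: 'utf-8',
  });
  // spawnSync reports spawn failures and timeouts through `error`.
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`${cmd} ${args.join(' ')} exited with ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}
```

With a helper like this, the git setup steps would fail fast at fixture creation instead of surfacing as a confusing agent failure.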
@@ -416,6 +416,150 @@ describe('REVIEW_DASHBOARD resolver', () => {
  });
 });

+// ─── Test Coverage Audit Resolver Tests ─────────────────────
+
+describe('TEST_COVERAGE_AUDIT placeholders', () => {
+  const planSkill = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');
+  const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+  const reviewSkill = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
+
+  test('all three modes share codepath tracing methodology', () => {
+    const sharedPhrases = [
+      'Trace data flow',
+      'Diagram the execution',
+      'Quality scoring rubric',
+      '★★★',
+      '★★',
+      'GAP',
+    ];
+    for (const phrase of sharedPhrases) {
+      expect(planSkill).toContain(phrase);
+      expect(shipSkill).toContain(phrase);
+      expect(reviewSkill).toContain(phrase);
+    }
+    // Plan mode traces the plan, not a git diff
+    expect(planSkill).toContain('Trace every codepath in the plan');
+    expect(planSkill).not.toContain('git diff origin');
+    // Ship and review modes trace the diff
+    expect(shipSkill).toContain('Trace every codepath changed');
+    expect(reviewSkill).toContain('Trace every codepath changed');
+  });
+
+  test('all three modes include E2E decision matrix', () => {
+    for (const skill of [planSkill, shipSkill, reviewSkill]) {
+      expect(skill).toContain('E2E Test Decision Matrix');
+      expect(skill).toContain('→E2E');
+      expect(skill).toContain('→EVAL');
+    }
+  });
+
+  test('all three modes include regression rule', () => {
+    for (const skill of [planSkill, shipSkill, reviewSkill]) {
+      expect(skill).toContain('REGRESSION RULE');
+      expect(skill).toContain('IRON RULE');
+    }
+  });
+
+  test('all three modes include test framework detection', () => {
+    for (const skill of [planSkill, shipSkill, reviewSkill]) {
+      expect(skill).toContain('Test Framework Detection');
+      expect(skill).toContain('CLAUDE.md');
+    }
+  });
+
+  test('plan mode adds tests to plan + includes test plan artifact', () => {
+    expect(planSkill).toContain('Add missing tests to the plan');
+    expect(planSkill).toContain('eng-review-test-plan');
+    expect(planSkill).toContain('Test Plan Artifact');
+  });
+
+  test('ship mode auto-generates tests + includes before/after count', () => {
+    expect(shipSkill).toContain('Generate tests for uncovered paths');
+    expect(shipSkill).toContain('Before/after test count');
+    expect(shipSkill).toContain('30 code paths max');
+    expect(shipSkill).toContain('ship-test-plan');
+  });
+
+  test('review mode generates via Fix-First + gaps are INFORMATIONAL', () => {
+    expect(reviewSkill).toContain('Fix-First');
+    expect(reviewSkill).toContain('INFORMATIONAL');
+    expect(reviewSkill).toContain('Step 4.75');
+    expect(reviewSkill).toContain('subsumes the "Test Gaps" category');
+  });
+
+  test('plan mode does NOT include ship-specific content', () => {
+    expect(planSkill).not.toContain('Before/after test count');
+    expect(planSkill).not.toContain('30 code paths max');
+    expect(planSkill).not.toContain('ship-test-plan');
+  });
+
+  test('review mode does NOT include test plan artifact', () => {
+    expect(reviewSkill).not.toContain('Test Plan Artifact');
+    expect(reviewSkill).not.toContain('eng-review-test-plan');
+    expect(reviewSkill).not.toContain('ship-test-plan');
+  });
+
+  // Regression guard: ship output contains key phrases from before the refactor
+  test('ship SKILL.md regression guard — key phrases preserved', () => {
+    const regressionPhrases = [
+      '100% coverage is the goal',
+      'ASCII coverage diagram',
+      'processPayment',
+      'refundPayment',
+      'billing.test.ts',
+      'checkout.e2e.ts',
+      'COVERAGE:',
+      'QUALITY:',
+      'GAPS:',
+      'Code paths:',
+      'User flows:',
+    ];
+    for (const phrase of regressionPhrases) {
+      expect(shipSkill).toContain(phrase);
+    }
+  });
+});
+
+// --- {{TEST_FAILURE_TRIAGE}} resolver tests ---
+
+describe('TEST_FAILURE_TRIAGE resolver', () => {
+  const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+
+  test('contains all 4 triage steps', () => {
+    expect(shipSkill).toContain('Step T1: Classify each failure');
+    expect(shipSkill).toContain('Step T2: Handle in-branch failures');
+    expect(shipSkill).toContain('Step T3: Handle pre-existing failures');
+    expect(shipSkill).toContain('Step T4: Execute the chosen action');
+  });
+
+  test('T1 includes classification criteria (in-branch vs pre-existing)', () => {
+    expect(shipSkill).toContain('In-branch');
+    expect(shipSkill).toContain('Likely pre-existing');
+    expect(shipSkill).toContain('git diff origin/');
+  });
+
+  test('T3 branches on REPO_MODE (solo vs collaborative)', () => {
+    expect(shipSkill).toContain('REPO_MODE');
+    expect(shipSkill).toContain('solo');
+    expect(shipSkill).toContain('collaborative');
+  });
+
+  test('solo mode offers fix-now, TODO, and skip options', () => {
+    expect(shipSkill).toContain('Investigate and fix now');
+    expect(shipSkill).toContain('Add as P0 TODO');
+    expect(shipSkill).toContain('Skip');
+  });
+
+  test('collaborative mode offers blame + assign option', () => {
+    expect(shipSkill).toContain('Blame + assign GitHub issue');
+    expect(shipSkill).toContain('gh issue create');
+  });
+
+  test('defaults ambiguous failures to in-branch (safety)', () => {
+    expect(shipSkill).toContain('When ambiguous, default to in-branch');
+  });
+});
+
 // --- {{PLAN_FILE_REVIEW_REPORT}} resolver tests ---

 describe('PLAN_FILE_REVIEW_REPORT resolver', () => {
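
The mode differentiation that the tests above assert can be pictured as one generator with a mode parameter. The sketch below is an assumption about the shape only: the function name `generateTestCoverageAudit`, its signature, and the exact section wording are hypothetical, since the resolver itself is not shown in this diff.

```typescript
type AuditMode = 'plan' | 'ship' | 'review';

// Hypothetical sketch: shared sections plus a small mode-specific tail.
function generateTestCoverageAudit(mode: AuditMode): string {
  // Plan mode reads the plan document; ship and review trace the git diff.
  const traceTarget = mode === 'plan'
    ? 'Trace every codepath in the plan. Read the plan document.'
    : 'Trace every codepath changed. Run git diff origin/<base>...HEAD.';

  const shared = [
    'Test Framework Detection (check CLAUDE.md first)',
    traceTarget,
    'E2E Test Decision Matrix',
    'REGRESSION RULE (IRON RULE)',
  ];

  const modeSpecific: Record<AuditMode, string[]> = {
    plan: ['Add missing tests to the plan', 'Test Plan Artifact'],
    ship: ['Generate tests for uncovered paths', 'Before/after test count'],
    review: ['Fix-First (ASK user)', 'Gaps are INFORMATIONAL findings'],
  };

  return [...shared, ...modeSpecific[mode]].join('\n');
}
```

Keeping the trace instruction mode-dependent inside the shared part is what lets the plan-mode test assert both the presence of 'Trace every codepath in the plan' and the absence of 'git diff origin'.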
@@ -611,11 +755,11 @@ describe('Codex generation (--host codex)', () => {
  test('Codex review step stripped from Codex-host ship and review', () => {
    const shipContent = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-ship', 'SKILL.md'), 'utf-8');
    expect(shipContent).not.toContain('codex review --base');
-    expect(shipContent).not.toContain('Investigate and fix');
+    expect(shipContent).not.toContain('CODEX_REVIEWS');

    const reviewContent = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-review', 'SKILL.md'), 'utf-8');
    expect(reviewContent).not.toContain('codex review --base');
-    expect(reviewContent).not.toContain('Investigate and fix');
+    expect(reviewContent).not.toContain('CODEX_REVIEWS');
  });

  test('--host codex --dry-run freshness', () => {
@@ -70,7 +70,7 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
  'plan-eng-review-artifact':  ['plan-eng-review/**'],

  // Ship
-  'ship-base-branch':    ['ship/**'],
+  'ship-base-branch': ['ship/**', 'bin/gstack-repo-mode'],
  'ship-local-workflow': ['ship/**', 'scripts/gen-skill-docs.ts'],

  // Setup browser cookies
@@ -95,8 +95,11 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
  'gemini-review-findings': ['review/**', '.agents/skills/gstack-review/**', 'test/helpers/gemini-session-runner.ts'],


-  // Ship coverage audit
-  'ship-coverage-audit': ['ship/**'],
+  // Coverage audit (shared fixture) + triage
+  'ship-coverage-audit': ['ship/**', 'test/fixtures/coverage-audit-fixture.ts', 'bin/gstack-repo-mode'],
+  'review-coverage-audit': ['review/**', 'test/fixtures/coverage-audit-fixture.ts'],
+  'plan-eng-coverage-audit': ['plan-eng-review/**', 'test/fixtures/coverage-audit-fixture.ts'],
+  'ship-triage': ['ship/**', 'bin/gstack-repo-mode'],

  // Design
  'design-consultation-core':       ['design-consultation/**'],
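
The touchfile map drives diff-based test selection: an E2E test is selected when any changed file matches one of its patterns. A minimal sketch of that idea follows, assuming only exact paths and `dir/**` prefixes; `selectTests` and `matches` are hypothetical names, and the repo's real matcher is not shown in this diff.

```typescript
// Subset of the entries from the hunk above, copied for illustration.
const TOUCHFILES: Record<string, string[]> = {
  'ship-coverage-audit': ['ship/**', 'test/fixtures/coverage-audit-fixture.ts', 'bin/gstack-repo-mode'],
  'review-coverage-audit': ['review/**', 'test/fixtures/coverage-audit-fixture.ts'],
  'ship-triage': ['ship/**', 'bin/gstack-repo-mode'],
};

// Hypothetical matcher: handles exact paths and "dir/**" prefixes only.
function matches(pattern: string, file: string): boolean {
  if (pattern.endsWith('/**')) return file.startsWith(pattern.slice(0, -2));
  return pattern === file;
}

// Returns the names of tests whose touchfiles intersect the changed files.
function selectTests(changedFiles: string[], touchfiles: Record<string, string[]>): string[] {
  return Object.entries(touchfiles)
    .filter(([, patterns]) =>
      patterns.some((pattern) => changedFiles.some((file) => matches(pattern, file))))
    .map(([name]) => name);
}
```

Under this sketch, a change to the shared fixture selects both the ship and review coverage audits, which is the dependency the commit message calls out.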
@@ -1319,10 +1319,12 @@ describe('Codex skill', () => {
  test('codex-host ship/review do NOT contain adversarial review step', () => {
    const shipContent = fs.readFileSync(path.join(ROOT, '.agents', 'skills', 'gstack-ship', 'SKILL.md'), 'utf-8');
    expect(shipContent).not.toContain('codex review --base');
-    expect(shipContent).not.toContain('Investigate and fix');
+    expect(shipContent).not.toContain('CODEX_REVIEWS');

    const reviewContent = fs.readFileSync(path.join(ROOT, '.agents', 'skills', 'gstack-review', 'SKILL.md'), 'utf-8');
    expect(reviewContent).not.toContain('codex review --base');
+    expect(reviewContent).not.toContain('codex_reviews');
+    expect(reviewContent).not.toContain('CODEX_REVIEWS');
    expect(reviewContent).not.toContain('adversarial-review');
    expect(reviewContent).not.toContain('Investigate and fix');
  });
@@ -1450,3 +1452,58 @@ describe('Codex skill validation', () => {
    }
  });
 });
+
+// --- Repo mode and test failure triage validation ---
+
+describe('Repo mode preamble validation', () => {
+  test('generated SKILL.md preamble contains REPO_MODE output', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
+    expect(content).toContain('REPO_MODE:');
+    expect(content).toContain('gstack-repo-mode');
+  });
+
+  test('generated SKILL.md contains See Something Say Something section', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
+    expect(content).toContain('See Something, Say Something');
+    expect(content).toContain('REPO_MODE');
+    expect(content).toContain('solo');
+    expect(content).toContain('collaborative');
+  });
+});
+
+describe('Test failure triage in ship skill', () => {
+  test('ship/SKILL.md contains Test Failure Ownership Triage', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    expect(content).toContain('Test Failure Ownership Triage');
+  });
+
+  test('ship/SKILL.md triage uses git diff for classification', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    expect(content).toContain('git diff origin/<base>...HEAD --name-only');
+  });
+
+  test('ship/SKILL.md triage has solo and collaborative paths', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    expect(content).toContain('REPO_MODE');
+    expect(content).toContain('solo');
+    expect(content).toContain('collaborative');
+    expect(content).toContain('Investigate and fix now');
+    expect(content).toContain('Add as P0 TODO');
+  });
+
+  test('ship/SKILL.md triage has GitHub issue assignment for collaborative mode', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    expect(content).toContain('gh issue create');
+    expect(content).toContain('--assignee');
+  });
+
+  test('{{TEST_FAILURE_TRIAGE}} placeholder is fully resolved in ship/SKILL.md', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    expect(content).not.toContain('{{TEST_FAILURE_TRIAGE}}');
+  });
+
+  test('ship/SKILL.md uses in-branch language for stop condition', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    expect(content).toContain('In-branch test failures');
+  });
+});
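
The triage rule these tests check (classify against the branch diff, and default ambiguous cases to in-branch) can be sketched as a small pure function. In the skill this step is carried out by the agent following prose instructions, so the code below is purely illustrative; `classifyFailure`, `TestFailure`, and the `coveredFiles` field are hypothetical.

```typescript
type FailureClass = 'in-branch' | 'pre-existing';

interface TestFailure {
  testFile: string;        // path of the failing test file
  coveredFiles?: string[]; // production files it exercises, when known
}

// changedFiles is assumed to come from `git diff origin/<base>...HEAD --name-only`.
function classifyFailure(failure: TestFailure, changedFiles: Set<string>): FailureClass {
  if (changedFiles.has(failure.testFile)) return 'in-branch';
  // Unknown coverage is ambiguous, so default to in-branch (the safe choice).
  if (failure.coveredFiles === undefined) return 'in-branch';
  return failure.coveredFiles.some((file) => changedFiles.has(file))
    ? 'in-branch'
    : 'pre-existing';
}
```

Defaulting to in-branch keeps the stop condition conservative: a failure only avoids blocking the ship when there is positive evidence that the branch did not touch it.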