feat: test coverage catalog + failure triage (merged branches) (#285)

* feat: add bin/gstack-repo-mode — solo vs collaborative detection with caching Detects whether a repo is solo-dev (one person does 80%+ of recent commits) or collaborative. Uses 90-day git shortlog window with 7-day cache in ~/.gstack/projects/{SLUG}/repo-mode.json. Config override via `gstack-config set repo_mode solo|collaborative` takes precedence over the heuristic. Minimum 5 commits required to classify (otherwise unknown). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: test failure ownership triage — see something say something Adds two new preamble sections to all gstack skills: - Repo Ownership Mode: explains solo vs collaborative behavior - See Something, Say Something: proactive issue flagging principle Adds {{TEST_FAILURE_TRIAGE}} template variable (opt-in, used by /ship): - Classifies test failures as in-branch vs pre-existing - Solo mode defaults to "investigate and fix now" - Collaborative mode offers "blame + assign GitHub issue" option - Also offers P0 TODO and skip options /ship Step 3 now triages test failures instead of hard-stopping on all failures. In-branch failures still block shipping. Pre-existing failures get user-directed triage based on repo mode. Adds P2 TODO for gstack notes system (deferred lightweight reminder). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files for Claude and Codex hosts All 22 Claude skills and 21 Codex skills regenerated with new preamble sections (Repo Ownership Mode, See Something Say Something) and {{TEST_FAILURE_TRIAGE}} resolved in ship/SKILL.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: validate repo mode values to prevent shell injection Codex adversarial review found that unvalidated config/cache values could be injected into shell via source <(gstack-repo-mode). Added validate_mode() that only allows solo|collaborative|unknown — anything else becomes "unknown". Prevents persistent code execution through malicious config.yaml or tampered cache JSON. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: shell injection via branch names + feature-branch sampling bias Codex code review found two issues: P1: eval $(gstack-slug) in gstack-repo-mode executes branch names as shell. Branch names like foo$(touch${IFS}pwned) are valid git refs and would execute arbitrary commands. Fix: compute SLUG directly with sed instead of eval'ing gstack-slug output. P2: git shortlog HEAD only sees current branch history. On feature branches that haven't merged main recently, other contributors disappear from the sample. Fix: use git shortlog on the default branch (origin/main) instead of HEAD. Also improved blame lookup in collaborative triage to check both the test file and the production code it covers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: broaden codex-host stripping test to accommodate triage section "Investigate and fix" now appears in TEST_FAILURE_TRIAGE (not just the Codex review step). Use CODEX_REVIEWS config string as a more specific marker for detecting the Codex review step in Codex-hosted skills. * fix: replace template placeholder in TODOS.md with readable text {{TEST_FAILURE_TRIAGE}} is template syntax but TODOS.md is not processed by gen-skill-docs — replaced with human-readable reference. * chore: bump version and changelog (v0.9.5.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add bin/ directory to project structure in CLAUDE.md * test: add triage resolver unit tests, plan-eng coverage audit E2E, and triage E2E - TEST_FAILURE_TRIAGE resolver: 6 unit tests verifying all triage steps (T1-T4), REPO_MODE branching, and safety default for ambiguous failures - plan-eng-coverage-audit E2E: tests /plan-eng-review coverage audit codepath (gap identified during eng review — existed on neither branch) - ship-triage E2E: planted-bug fixture with in-branch (truncate null) and pre-existing (divide-by-zero) failures; verifies correct classification - Touchfile entries for diff-based test selection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate stale Codex SKILL.md for retro Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 13:15:24 +02:00 · 2026-03-21 09:36:17 -07:00
parent 624c7793de
commit f954f3449a
43 changed files with 1232 additions and 18 deletions
@@ -520,6 +520,46 @@ describe('TEST_COVERAGE_AUDIT placeholders', () => {
  });
 });

+// --- {{TEST_FAILURE_TRIAGE}} resolver tests ---
+
+describe('TEST_FAILURE_TRIAGE resolver', () => {
+  const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+
+  test('contains all 4 triage steps', () => {
+    expect(shipSkill).toContain('Step T1: Classify each failure');
+    expect(shipSkill).toContain('Step T2: Handle in-branch failures');
+    expect(shipSkill).toContain('Step T3: Handle pre-existing failures');
+    expect(shipSkill).toContain('Step T4: Execute the chosen action');
+  });
+
+  test('T1 includes classification criteria (in-branch vs pre-existing)', () => {
+    expect(shipSkill).toContain('In-branch');
+    expect(shipSkill).toContain('Likely pre-existing');
+    expect(shipSkill).toContain('git diff origin/');
+  });
+
+  test('T3 branches on REPO_MODE (solo vs collaborative)', () => {
+    expect(shipSkill).toContain('REPO_MODE');
+    expect(shipSkill).toContain('solo');
+    expect(shipSkill).toContain('collaborative');
+  });
+
+  test('solo mode offers fix-now, TODO, and skip options', () => {
+    expect(shipSkill).toContain('Investigate and fix now');
+    expect(shipSkill).toContain('Add as P0 TODO');
+    expect(shipSkill).toContain('Skip');
+  });
+
+  test('collaborative mode offers blame + assign option', () => {
+    expect(shipSkill).toContain('Blame + assign GitHub issue');
+    expect(shipSkill).toContain('gh issue create');
+  });
+
+  test('defaults ambiguous failures to in-branch (safety)', () => {
+    expect(shipSkill).toContain('When ambiguous, default to in-branch');
+  });
+});
+
 // --- {{SPEC_REVIEW_LOOP}} resolver tests ---

 describe('SPEC_REVIEW_LOOP resolver', () => {
@@ -691,11 +731,11 @@ describe('Codex generation (--host codex)', () => {
  test('Codex review step stripped from Codex-host ship and review', () => {
    const shipContent = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-ship', 'SKILL.md'), 'utf-8');
    expect(shipContent).not.toContain('codex review --base');
-    expect(shipContent).not.toContain('Investigate and fix');
+    expect(shipContent).not.toContain('CODEX_REVIEWS');

    const reviewContent = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-review', 'SKILL.md'), 'utf-8');
    expect(reviewContent).not.toContain('codex review --base');
-    expect(reviewContent).not.toContain('Investigate and fix');
+    expect(reviewContent).not.toContain('CODEX_REVIEWS');
  });

  test('--host codex --dry-run freshness', () => {
@@ -68,7 +68,7 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
  'plan-eng-review-artifact':  ['plan-eng-review/**'],

  // Ship
-  'ship-base-branch': ['ship/**'],
+  'ship-base-branch': ['ship/**', 'bin/gstack-repo-mode'],

  // Retro
  'retro':             ['retro/**'],
@@ -91,9 +91,11 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
  // QA bootstrap
  'qa-bootstrap': ['qa/**', 'browse/src/**', 'ship/**'],

-  // Coverage audit (shared fixture)
-  'ship-coverage-audit': ['ship/**', 'test/fixtures/coverage-audit-fixture.ts'],
+  // Coverage audit (shared fixture) + triage
+  'ship-coverage-audit': ['ship/**', 'test/fixtures/coverage-audit-fixture.ts', 'bin/gstack-repo-mode'],
  'review-coverage-audit': ['review/**', 'test/fixtures/coverage-audit-fixture.ts'],
+  'plan-eng-coverage-audit': ['plan-eng-review/**', 'test/fixtures/coverage-audit-fixture.ts'],
+  'ship-triage': ['ship/**', 'bin/gstack-repo-mode'],

  // Design
  'design-consultation-core':     ['design-consultation/**'],
@@ -2858,6 +2858,249 @@ Output the diagram directly.`,
  }, 180_000);
 });

+// --- Plan Eng Review Coverage Audit E2E ---
+
+describeIfSelected('Plan Eng Review Coverage Audit E2E', ['plan-eng-coverage-audit'], () => {
+  let planCoverageDir: string;
+
+  beforeAll(() => {
+    planCoverageDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-plan-coverage-'));
+
+    // Copy plan-eng-review skill files
+    copyDirSync(path.join(ROOT, 'plan-eng-review'), path.join(planCoverageDir, 'plan-eng-review'));
+
+    // Use shared fixture for billing project with coverage gaps
+    const { createCoverageAuditFixture } = require('./fixtures/coverage-audit-fixture');
+    createCoverageAuditFixture(planCoverageDir);
+  });
+
+  afterAll(() => {
+    try { fs.rmSync(planCoverageDir, { recursive: true, force: true }); } catch {}
+  });
+
+  test('/plan-eng-review coverage audit traces plan codepaths', async () => {
+    const result = await runSkillTest({
+      prompt: `Read the file plan-eng-review/SKILL.md for the plan review workflow instructions.
+
+You are on the feature/billing branch. The base branch is main.
+This is a test project — there is no remote, no PR to create.
+
+ONLY run the Test Coverage Audit section from the plan review workflow.
+Skip all other steps (architecture, code quality, performance, etc.).
+
+The source code is in ${planCoverageDir}/src/billing.ts.
+Existing tests are in ${planCoverageDir}/test/billing.test.ts.
+
+Produce the ASCII coverage diagram showing which code paths are tested and which have gaps.
+Output the diagram directly.`,
+      workingDirectory: planCoverageDir,
+      maxTurns: 15,
+      allowedTools: ['Bash', 'Read', 'Write', 'Edit', 'Glob', 'Grep'],
+      timeout: 120_000,
+      testName: 'plan-eng-coverage-audit',
+      runId,
+    });
+
+    logCost('/plan-eng-review coverage audit', result);
+    recordE2E('/plan-eng-review coverage audit', 'Plan Eng Review Coverage Audit E2E', result, {
+      passed: result.exitReason === 'success',
+    });
+
+    expect(result.exitReason).toBe('success');
+
+    // Check output contains coverage diagram elements
+    const output = result.output || '';
+    const outputLower = output.toLowerCase();
+    const hasGap = outputLower.includes('gap') || outputLower.includes('no test');
+    const hasTested = outputLower.includes('tested') || output.includes('✓') || output.includes('★');
+    const hasCoverage = outputLower.includes('coverage') || outputLower.includes('paths tested');
+
+    console.log(`Output has GAP markers: ${hasGap}`);
+    console.log(`Output has TESTED markers: ${hasTested}`);
+    console.log(`Output has coverage summary: ${hasCoverage}`);
+
+    // The agent MUST produce a coverage diagram with gap and tested markers
+    expect(hasGap || hasTested).toBe(true);
+
+    // At minimum, the agent should have read the source and test files
+    const readCalls = result.toolCalls.filter(tc => tc.tool === 'Read');
+    expect(readCalls.length).toBeGreaterThan(0);
+  }, 180_000);
+});
+
+// --- Triage E2E ---
+
+describeIfSelected('Test Failure Triage E2E', ['ship-triage'], () => {
+  let triageDir: string;
+
+  beforeAll(() => {
+    triageDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-triage-'));
+
+    // Copy ship skill files
+    copyDirSync(path.join(ROOT, 'ship'), path.join(triageDir, 'ship'));
+
+    const run = (cmd: string, args: string[]) =>
+      spawnSync(cmd, args, { cwd: triageDir, stdio: 'pipe', timeout: 5000 });
+
+    // Init git repo
+    run('git', ['init', '-b', 'main']);
+    run('git', ['config', 'user.email', 'test@test.com']);
+    run('git', ['config', 'user.name', 'Test']);
+
+    // Create a project with a pre-existing test failure on main
+    fs.writeFileSync(path.join(triageDir, 'package.json'), JSON.stringify({
+      name: 'triage-test-app',
+      version: '1.0.0',
+      scripts: { test: 'node test/run.js' },
+    }, null, 2));
+
+    fs.mkdirSync(path.join(triageDir, 'src'), { recursive: true });
+    fs.mkdirSync(path.join(triageDir, 'test'), { recursive: true });
+
+    // Source with a bug that exists on main (pre-existing)
+    fs.writeFileSync(path.join(triageDir, 'src', 'math.js'), `
+module.exports = {
+  add: (a, b) => a + b,
+  divide: (a, b) => a / b,  // BUG: no zero-division check (pre-existing)
+};
+`);
+
+    // Test file that catches the pre-existing bug
+    fs.writeFileSync(path.join(triageDir, 'test', 'math.test.js'), `
+const { add, divide } = require('../src/math');
+
+// This test passes
+if (add(2, 3) !== 5) { console.error('FAIL: add(2,3) should be 5'); process.exit(1); }
+console.log('PASS: add');
+
+// This test FAILS — pre-existing bug (divide by zero returns Infinity, not an error)
+try {
+  const result = divide(10, 0);
+  if (result === Infinity) { console.error('FAIL: divide(10,0) should throw, got Infinity'); process.exit(1); }
+} catch(e) {
+  console.log('PASS: divide zero check');
+}
+`);
+
+    // Test runner
+    fs.writeFileSync(path.join(triageDir, 'test', 'run.js'), `
+require('./math.test.js');
+require('./string.test.js');
+`);
+
+    // Commit on main with the pre-existing bug
+    run('git', ['add', '.']);
+    run('git', ['commit', '-m', 'initial: math utils with tests']);
+
+    // Create feature branch
+    run('git', ['checkout', '-b', 'feature/string-utils']);
+
+    // Add new code with a new bug (in-branch)
+    fs.writeFileSync(path.join(triageDir, 'src', 'string.js'), `
+module.exports = {
+  capitalize: (s) => s.charAt(0).toUpperCase() + s.slice(1),
+  reverse: (s) => s.split('').reverse().join(''),
+  truncate: (s, len) => s.substring(0, len),  // BUG: no null check (in-branch)
+};
+`);
+
+    // Add test that catches the in-branch bug
+    fs.writeFileSync(path.join(triageDir, 'test', 'string.test.js'), `
+const { capitalize, reverse, truncate } = require('../src/string');
+
+if (capitalize('hello') !== 'Hello') { console.error('FAIL: capitalize'); process.exit(1); }
+console.log('PASS: capitalize');
+
+if (reverse('abc') !== 'cba') { console.error('FAIL: reverse'); process.exit(1); }
+console.log('PASS: reverse');
+
+// This test FAILS — in-branch bug (null input causes TypeError)
+try {
+  truncate(null, 5);
+  console.log('PASS: truncate null');
+} catch(e) {
+  console.error('FAIL: truncate(null, 5) threw: ' + e.message);
+  process.exit(1);
+}
+`);
+
+    run('git', ['add', '.']);
+    run('git', ['commit', '-m', 'feat: add string utilities']);
+  });
+
+  afterAll(() => {
+    try { fs.rmSync(triageDir, { recursive: true, force: true }); } catch {}
+  });
+
+  test('/ship triage correctly classifies in-branch vs pre-existing failures', async () => {
+    const result = await runSkillTest({
+      prompt: `Read the file ship/SKILL.md for the ship workflow instructions.
+
+You are on the feature/string-utils branch. The base branch is main.
+This is a test project — there is no remote, no PR to create.
+
+Run the tests first:
+\`\`\`bash
+cd ${triageDir} && node test/run.js
+\`\`\`
+
+The tests will fail. Now run ONLY the Test Failure Ownership Triage (Steps T1-T4) from the ship workflow.
+
+For each failing test, classify it as:
+- **In-branch**: caused by changes on this branch (feature/string-utils)
+- **Pre-existing**: existed before this branch (present on main)
+
+Use git diff origin/main...HEAD (or git diff main...HEAD since there's no remote) to determine which files changed on this branch.
+
+Output your classification for each failure clearly, labeling each as "IN-BRANCH" or "PRE-EXISTING" with your reasoning.
+
+This is a solo repo (REPO_MODE=solo). For pre-existing failures, recommend fixing now.`,
+      workingDirectory: triageDir,
+      maxTurns: 20,
+      allowedTools: ['Bash', 'Read', 'Write', 'Edit', 'Glob', 'Grep'],
+      timeout: 180_000,
+      testName: 'ship-triage',
+      runId,
+    });
+
+    logCost('/ship triage', result);
+
+    const output = result.output || '';
+    const outputLower = output.toLowerCase();
+
+    // The triage should identify the string/truncate failure as in-branch
+    const hasInBranch = outputLower.includes('in-branch') || outputLower.includes('in branch') || outputLower.includes('introduced');
+    // The triage should identify the math/divide failure as pre-existing
+    const hasPreExisting = outputLower.includes('pre-existing') || outputLower.includes('pre existing') || outputLower.includes('existed before');
+
+    console.log(`Output identifies IN-BRANCH failures: ${hasInBranch}`);
+    console.log(`Output identifies PRE-EXISTING failures: ${hasPreExisting}`);
+
+    // Check that the string/truncate bug is classified as in-branch
+    const mentionsTruncate = outputLower.includes('truncate') || outputLower.includes('string');
+    const mentionsDivide = outputLower.includes('divide') || outputLower.includes('math');
+
+    console.log(`Mentions truncate/string (in-branch bug): ${mentionsTruncate}`);
+    console.log(`Mentions divide/math (pre-existing bug): ${mentionsDivide}`);
+
+    recordE2E('/ship triage', 'Test Failure Triage E2E', result, {
+      passed: result.exitReason === 'success' && hasInBranch && hasPreExisting,
+      has_in_branch_classification: hasInBranch,
+      has_pre_existing_classification: hasPreExisting,
+      mentions_truncate: mentionsTruncate,
+      mentions_divide: mentionsDivide,
+    });
+
+    expect(result.exitReason).toBe('success');
+    // Must classify at least one failure as in-branch AND one as pre-existing
+    expect(hasInBranch).toBe(true);
+    expect(hasPreExisting).toBe(true);
+    // Must mention the specific bugs
+    expect(mentionsTruncate).toBe(true);
+    expect(mentionsDivide).toBe(true);
+  }, 240_000);
+});
+
 // --- Codex skill E2E ---

 describeIfSelected('Codex skill E2E', ['codex-review'], () => {
@@ -1280,12 +1280,12 @@ describe('Codex skill', () => {
  test('codex-host ship/review do NOT contain codex review step', () => {
    const shipContent = fs.readFileSync(path.join(ROOT, '.agents', 'skills', 'gstack-ship', 'SKILL.md'), 'utf-8');
    expect(shipContent).not.toContain('codex review --base');
-    expect(shipContent).not.toContain('Investigate and fix');
+    expect(shipContent).not.toContain('CODEX_REVIEWS');

    const reviewContent = fs.readFileSync(path.join(ROOT, '.agents', 'skills', 'gstack-review', 'SKILL.md'), 'utf-8');
    expect(reviewContent).not.toContain('codex review --base');
    expect(reviewContent).not.toContain('codex_reviews');
-    expect(reviewContent).not.toContain('Investigate and fix');
+    expect(reviewContent).not.toContain('CODEX_REVIEWS');
  });

  test('codex integration in /plan-eng-review offers plan critique', () => {
@@ -1411,3 +1411,58 @@ describe('Codex skill validation', () => {
    }
  });
 });
+
+// --- Repo mode and test failure triage validation ---
+
+describe('Repo mode preamble validation', () => {
+  test('generated SKILL.md preamble contains REPO_MODE output', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
+    expect(content).toContain('REPO_MODE:');
+    expect(content).toContain('gstack-repo-mode');
+  });
+
+  test('generated SKILL.md contains See Something Say Something section', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
+    expect(content).toContain('See Something, Say Something');
+    expect(content).toContain('REPO_MODE');
+    expect(content).toContain('solo');
+    expect(content).toContain('collaborative');
+  });
+});
+
+describe('Test failure triage in ship skill', () => {
+  test('ship/SKILL.md contains Test Failure Ownership Triage', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    expect(content).toContain('Test Failure Ownership Triage');
+  });
+
+  test('ship/SKILL.md triage uses git diff for classification', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    expect(content).toContain('git diff origin/<base>...HEAD --name-only');
+  });
+
+  test('ship/SKILL.md triage has solo and collaborative paths', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    expect(content).toContain('REPO_MODE');
+    expect(content).toContain('solo');
+    expect(content).toContain('collaborative');
+    expect(content).toContain('Investigate and fix now');
+    expect(content).toContain('Add as P0 TODO');
+  });
+
+  test('ship/SKILL.md triage has GitHub issue assignment for collaborative mode', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    expect(content).toContain('gh issue create');
+    expect(content).toContain('--assignee');
+  });
+
+  test('{{TEST_FAILURE_TRIAGE}} placeholder is fully resolved in ship/SKILL.md', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    expect(content).not.toContain('{{TEST_FAILURE_TRIAGE}}');
+  });
+
+  test('ship/SKILL.md uses in-branch language for stop condition', () => {
+    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+    expect(content).toContain('In-branch test failures');
+  });
+});