## Step 7: Test Coverage Audit

**Dispatch this step as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent runs the coverage audit in a fresh context window — the parent only sees the conclusion, not intermediate file reads. This is context-rot defense.

**Subagent prompt:** Pass the following instructions to the subagent, with `` substituted with the base branch:

> You are running a ship-workflow test coverage audit. Run `git diff ...HEAD` as needed. Do not commit or push — report only.
>
> 100% coverage is the goal — every untested path is a path where bugs hide and vibe coding becomes yolo coding. Evaluate what was ACTUALLY coded (from the diff), not what was planned.

### Test Framework Detection

Before analyzing coverage, detect the project's test framework:

1. **Read CLAUDE.md** — look for a `## Testing` section with test command and framework name. If found, use that as the authoritative source.

2. **If CLAUDE.md has no testing section, auto-detect:**

```bash
setopt +o nomatch 2>/dev/null || true  # zsh compat
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
[ -f requirements.txt ] || [ -f pyproject.toml ] && echo "RUNTIME:python"
[ -f go.mod ] && echo "RUNTIME:go"
[ -f Cargo.toml ] && echo "RUNTIME:rust"
# Check for existing test infrastructure
ls jest.config.* vitest.config.* playwright.config.* cypress.config.* .rspec pytest.ini phpunit.xml 2>/dev/null
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
```

3. **If no framework detected:** falls through to the Test Framework Bootstrap step (Step 4) which handles full setup.

**0. Before/after test count:**

```bash
# Count test files before any generation
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l
```

Store this number for the PR body.

**1. Trace every codepath changed** using `git diff origin/...HEAD`:

Read every changed file.
For each one, trace how data flows through the code — don't just list functions, actually follow the execution:

1. **Read the diff.** For each changed file, read the full file (not just the diff hunk) to understand context.

2. **Trace data flow.** Starting from each entry point (route handler, exported function, event listener, component render), follow the data through every branch:
   - Where does input come from? (request params, props, database, API call)
   - What transforms it? (validation, mapping, computation)
   - Where does it go? (database write, API response, rendered output, side effect)
   - What can go wrong at each step? (null/undefined, invalid input, network failure, empty collection)

3. **Diagram the execution.** For each changed file, draw an ASCII diagram showing:
   - Every function/method that was added or modified
   - Every conditional branch (if/else, switch, ternary, guard clause, early return)
   - Every error path (try/catch, rescue, error boundary, fallback)
   - Every call to another function (trace into it — does IT have untested branches?)
   - Every edge: what happens with null input? Empty array? Invalid type?

This is the critical step — you're building a map of every line of code that can execute differently based on input. Every branch in this diagram needs a test.

**2. Map user flows, interactions, and error states:**

Code coverage isn't enough — you need to cover how real users interact with the changed code. For each changed feature, think through:

- **User flows:** What sequence of actions does a user take that touches this code? Map the full journey (e.g., "user clicks 'Pay' → form validates → API call → success/failure screen"). Each step in the journey needs a test.
- **Interaction edge cases:** What happens when the user does something unexpected?
  - Double-click/rapid resubmit
  - Navigate away mid-operation (back button, close tab, click another link)
  - Submit with stale data (page sat open for 30 minutes, session expired)
  - Slow connection (API takes 10 seconds — what does the user see?)
  - Concurrent actions (two tabs, same form)
- **Error states the user can see:** For every error the code handles, what does the user actually experience?
  - Is there a clear error message or a silent failure?
  - Can the user recover (retry, go back, fix input) or are they stuck?
  - What happens with no network? With a 500 from the API? With invalid data from the server?
- **Empty/zero/boundary states:** What does the UI show with zero results? With 10,000 results? With a single character input? With maximum-length input?

Add these to your diagram alongside the code branches. A user flow with no test is just as much a gap as an untested if/else.

**3. Check each branch against existing tests:**

Go through your diagram branch by branch — both code paths AND user flows.
For each one, search for a test that exercises it:

- Function `processPayment()` → look for `billing.test.ts`, `billing.spec.ts`, `test/billing_test.rb`
- An if/else → look for tests covering BOTH the true AND false path
- An error handler → look for a test that triggers that specific error condition
- A call to `helperFn()` that has its own branches → those branches need tests too
- A user flow → look for an integration or E2E test that walks through the journey
- An interaction edge case → look for a test that simulates the unexpected action

Quality scoring rubric:

- ★★★ Tests behavior with edge cases AND error paths
- ★★ Tests correct behavior, happy path only
- ★ Smoke test / existence check / trivial assertion (e.g., "it renders", "it doesn't throw")

### E2E Test Decision Matrix

When checking each branch, also determine whether a unit test or E2E/integration test is the right tool:

**RECOMMEND E2E (mark as [→E2E] in the diagram):**

- Common user flow spanning 3+ components/services (e.g., signup → verify email → first login)
- Integration point where mocking hides real failures (e.g., API → queue → worker → DB)
- Auth/payment/data-destruction flows — too important to trust unit tests alone

**RECOMMEND EVAL (mark as [→EVAL] in the diagram):**

- Critical LLM call that needs a quality eval (e.g., prompt change → test output still meets quality bar)
- Changes to prompt templates, system instructions, or tool definitions

**STICK WITH UNIT TESTS:**

- Pure function with clear inputs/outputs
- Internal helper with no side effects
- Edge case of a single function (null input, empty array)
- Obscure/rare flow that isn't customer-facing

### REGRESSION RULE (mandatory)

**IRON RULE:** When the coverage audit identifies a REGRESSION — code that previously worked but the diff broke — a regression test is written immediately. No AskUserQuestion. No skipping. Regressions are the highest-priority test because they prove something broke.

A regression is when:

- The diff modifies existing behavior (not new code)
- The existing test suite (if any) doesn't cover the changed path
- The change introduces a new failure mode for existing callers

When uncertain whether a change is a regression, err on the side of writing the test.

Format: commit as `test: regression test for {what broke}`

**4. Output ASCII coverage diagram:**

Include BOTH code paths and user flows in the same diagram. Mark E2E-worthy and eval-worthy paths:

```
CODE PATHS                                              USER FLOWS
[+] src/services/billing.ts                             [+] Payment checkout
  ├── processPayment()                                    ├── [★★★ TESTED] Complete purchase — checkout.e2e.ts:15
  │   ├── [★★★ TESTED] happy + declined + timeout         ├── [GAP] [→E2E] Double-click submit
  │   ├── [GAP]        Network timeout                    └── [GAP]        Navigate away mid-payment
  │   └── [GAP]        Invalid currency
  └── refundPayment()                                   [+] Error states
      ├── [★★  TESTED] Full refund — :89                  ├── [★★  TESTED] Card declined message
      └── [★   TESTED] Partial (non-throw only) — :101    └── [GAP]        Network timeout UX

LLM integration: [GAP] [→EVAL] Prompt template change — needs eval test

COVERAGE: 5/13 paths tested (38%)  |  Code paths: 3/5 (60%)  |  User flows: 2/8 (25%)
QUALITY: ★★★:2 ★★:2 ★:1  |  GAPS: 8 (2 E2E, 1 eval)
```

Legend: ★★★ behavior + edge + error | ★★ happy path | ★ smoke check
[→E2E] = needs integration test | [→EVAL] = needs LLM eval

**Fast path:** All paths covered → "Step 7: All new code paths have test coverage ✓" Continue.

**5. Generate tests for uncovered paths:**

If test framework detected (or bootstrapped in Step 4):

- Prioritize error handlers and edge cases first (happy paths are more likely already tested)
- Read 2-3 existing test files to match conventions exactly
- Generate unit tests. Mock all external dependencies (DB, API, Redis).
- For paths marked [→E2E]: generate integration/E2E tests using the project's E2E framework (Playwright, Cypress, Capybara, etc.)
- For paths marked [→EVAL]: generate eval tests using the project's eval framework, or flag for manual eval if none exists
- Write tests that exercise the specific uncovered path with real assertions
- Run each test. Passes → commit as `test: coverage for {feature}`
- Fails → fix once. Still fails → revert, note gap in diagram.

Caps: 30 code paths max, 20 tests generated max (code + user flow combined), 2-min per-test exploration cap.

If no test framework AND user declined bootstrap → diagram only, no generation. Note: "Test generation skipped — no test framework configured."

**Diff is test-only changes:** Skip Step 7 entirely: "No new application code paths to audit."

**6. After-count and coverage summary:**

```bash
# Count test files after generation
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l
```

For PR body: `Tests: {before} → {after} (+{delta} new)`
Coverage line: `Test Coverage Audit: N new code paths. M covered (X%). K tests generated, J committed.`

**7. Coverage gate:**

Before proceeding, check CLAUDE.md for a `## Test Coverage` section with `Minimum:` and `Target:` fields. If found, use those percentages. Otherwise use defaults: Minimum = 60%, Target = 80%.

Using the coverage percentage from the diagram in substep 4 (the `COVERAGE: X/Y (Z%)` line):

- **>= target:** Pass. "Coverage gate: PASS ({X}%)." Continue.
- **>= minimum, < target:** Use AskUserQuestion:
  - "AI-assessed coverage is {X}%. {N} code paths are untested. Target is {target}%."
  - RECOMMENDATION: Choose A because untested code paths are where production bugs hide.
  - Options:
    A) Generate more tests for remaining gaps (recommended)
    B) Ship anyway — I accept the coverage risk
    C) These paths don't need tests — mark as intentionally uncovered
  - If A: Loop back to substep 5 (generate tests) targeting the remaining gaps. After second pass, if still below target, present AskUserQuestion again with updated numbers.
    Maximum 2 generation passes total.
  - If B: Continue. Include in PR body: "Coverage gate: {X}% — user accepted risk."
  - If C: Continue. Include in PR body: "Coverage gate: {X}% — {N} paths intentionally uncovered."
- **< minimum:** Use AskUserQuestion:
  - "AI-assessed coverage is critically low ({X}%). {N} of {M} code paths have no tests. Minimum threshold is {minimum}%."
  - RECOMMENDATION: Choose A because coverage below {minimum}% leaves too many code paths untested to ship with confidence.
  - Options:
    A) Generate tests for remaining gaps (recommended)
    B) Override — ship with low coverage (I understand the risk)
  - If A: Loop back to substep 5. Maximum 2 passes. If still below minimum after 2 passes, present the override choice again.
  - If B: Continue. Include in PR body: "Coverage gate: OVERRIDDEN at {X}%."

**Coverage percentage undetermined:** If the coverage diagram doesn't produce a clear numeric percentage (ambiguous output, parse error), **skip the gate** with: "Coverage gate: could not determine percentage — skipping." Do not default to 0% or block.

**Test-only diffs:** Skip the gate (same as the existing fast-path).

**100% coverage:** "Coverage gate: PASS (100%)." Continue.

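The gate decision above can be sketched in shell. This is a minimal illustration, not part of the workflow itself: the function names `coverage_pct` and `coverage_gate` and the labels they print are hypothetical, the percentage is assumed to be an integer, and the defaults mirror the 60%/80% thresholds stated above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the coverage gate; names and labels are assumptions.

# Integer percentage from the diagram's "COVERAGE: X/Y" counts (rounds down).
# Prints an empty string when the total is zero, so the gate can be skipped.
coverage_pct() {
  local covered="$1" total="$2"
  [ "$total" -gt 0 ] || { echo ""; return 0; }
  echo $(( covered * 100 / total ))
}

# Classify a percentage against the thresholds.
# Usage: coverage_gate PCT [MINIMUM] [TARGET]
coverage_gate() {
  local pct="$1" minimum="${2:-60}" target="${3:-80}"

  # Empty or non-numeric input: skip the gate, never default to 0% or block.
  case "$pct" in
    ''|*[!0-9]*) echo "SKIP"; return 0 ;;
  esac

  if [ "$pct" -ge "$target" ]; then
    echo "PASS"                # at or above target: continue
  elif [ "$pct" -ge "$minimum" ]; then
    echo "ASK_BELOW_TARGET"    # offer options A/B/C
  else
    echo "ASK_BELOW_MINIMUM"   # offer options A/B (override)
  fi
}
```

For the example diagram, `coverage_pct 5 13` yields 38, and `coverage_gate 38` prints `ASK_BELOW_MINIMUM` under the default thresholds.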
### Test Plan Artifact

After producing the coverage diagram, write a test plan artifact so `/qa` and `/qa-only` can consume it:

```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)" && mkdir -p ~/.gstack/projects/$SLUG
USER=$(whoami)
DATETIME=$(date +%Y%m%d-%H%M%S)
```

Write to `~/.gstack/projects/{slug}/{user}-{branch}-ship-test-plan-{datetime}.md`:

```markdown
# Test Plan
Generated by /ship on {date}
Branch: {branch}
Repo: {owner/repo}

## Affected Pages/Routes
- {URL path} — {what to test and why}

## Key Interactions to Verify
- {interaction description} on {page}

## Edge Cases
- {edge case} on {page}

## Critical Paths
- {end-to-end flow that must work}
```

>
> After your analysis, output a single JSON object on the LAST LINE of your response (no other text after it):
> `{"coverage_pct":N,"gaps":N,"diagram":"","tests_added":["path",...]}`

**Parent processing:**

1. Read the subagent's final output. Parse the LAST line as JSON.
2. Store `coverage_pct` (for Step 20 metrics), `gaps` (user summary), `tests_added` (for the commit).
3. Embed `diagram` verbatim in the PR body's `## Test Coverage` section (Step 19).
4. Print a one-line summary: `Coverage: {coverage_pct}%, {gaps} gaps. {tests_added.length} tests added.`

**If the subagent fails, times out, or returns invalid JSON:** Fall back to running the audit inline in the parent. Do not block /ship on subagent failure — partial results are better than none.

---