From f24cd778e2cf5877c0df7a0d4f1c6333f2ddb869 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Tue, 17 Mar 2026 10:57:52 -0700 Subject: [PATCH] feat: coverage audit now maps user flows, interactions, and error states Step 3.4 now covers the full picture: code branches AND user-facing behavior. Maps user flows (complete journey through the feature), interaction edge cases (double-click, back button, stale state, slow connection), error states (what does the user actually see?), and boundary states (zero results, 10k results, max-length input). Coverage diagram splits into Code Path Coverage and User Flow Coverage sections with separate percentages. --- ship/SKILL.md | 58 +++++++++++++++++++++++++++++------ ship/SKILL.md.tmpl | 58 +++++++++++++++++++++++++++++------ test/skill-validation.test.ts | 2 +- 3 files changed, 99 insertions(+), 19 deletions(-) diff --git a/ship/SKILL.md b/ship/SKILL.md index 6f1057c4..9c48b942 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -469,23 +469,46 @@ Read every changed file. For each one, trace how data flows through the code — This is the critical step — you're building a map of every line of code that can execute differently based on input. Every branch in this diagram needs a test. -**2. Check each branch against existing tests:** +**2. Map user flows, interactions, and error states:** -Go through your diagram branch by branch. For each one, search for a test that exercises it: +Code coverage isn't enough — you need to cover how real users interact with the changed code. For each changed feature, think through: + +- **User flows:** What sequence of actions does a user take that touches this code? Map the full journey (e.g., "user clicks 'Pay' → form validates → API call → success/failure screen"). Each step in the journey needs a test. +- **Interaction edge cases:** What happens when the user does something unexpected? 
+ - Double-click/rapid resubmit + - Navigate away mid-operation (back button, close tab, click another link) + - Submit with stale data (page sat open for 30 minutes, session expired) + - Slow connection (API takes 10 seconds — what does the user see?) + - Concurrent actions (two tabs, same form) +- **Error states the user can see:** For every error the code handles, what does the user actually experience? + - Is there a clear error message or a silent failure? + - Can the user recover (retry, go back, fix input) or are they stuck? + - What happens with no network? With a 500 from the API? With invalid data from the server? +- **Empty/zero/boundary states:** What does the UI show with zero results? With 10,000 results? With a single character input? With maximum-length input? + +Add these to your diagram alongside the code branches. A user flow with no test is just as much a gap as an untested if/else. + +**3. Check each branch against existing tests:** + +Go through your diagram branch by branch — both code paths AND user flows. For each one, search for a test that exercises it: - Function `processPayment()` → look for `billing.test.ts`, `billing.spec.ts`, `test/billing_test.rb` - An if/else → look for tests covering BOTH the true AND false path - An error handler → look for a test that triggers that specific error condition - A call to `helperFn()` that has its own branches → those branches need tests too +- A user flow → look for an integration or E2E test that walks through the journey +- An interaction edge case → look for a test that simulates the unexpected action Quality scoring rubric: - ★★★ Tests behavior with edge cases AND error paths - ★★ Tests correct behavior, happy path only - ★ Smoke test / existence check / trivial assertion (e.g., "it renders", "it doesn't throw") -**3. Output ASCII coverage diagram:** +**4. 
Output ASCII coverage diagram:**
+
+Include BOTH code paths and user flows in the same diagram:
 
 ```
-NEW CODE PATH COVERAGE MAP
+CODE PATH COVERAGE
 ===========================
 [+] src/services/billing.ts
 │
@@ -498,16 +521,33 @@ NEW CODE PATH COVERAGE MAP
   ├── [★★ TESTED] Full refund — billing.test.ts:89
   └── [★ TESTED] Partial refund (checks non-throw only) — billing.test.ts:101
 
+USER FLOW COVERAGE
+===========================
+[+] Payment checkout flow
+  │
+  ├── [★★★ TESTED] Complete purchase — checkout.e2e.ts:15
+  ├── [GAP] Double-click submit — NO TEST
+  ├── [GAP] Navigate away during payment — NO TEST
+  └── [★ TESTED] Form validation errors (checks render only) — checkout.test.ts:40
+
+[+] Error states
+  │
+  ├── [★★ TESTED] Card declined message — billing.test.ts:58
+  ├── [GAP] Network timeout UX (what does user see?) — NO TEST
+  └── [GAP] Empty cart submission — NO TEST
+
 ─────────────────────────────────
-COVERAGE: 3/5 new paths tested (60%)
-QUALITY: ★★★: 1 ★★: 1 ★: 1 (avg: ★★)
-GAPS: 2 paths need tests
+COVERAGE: 6/12 paths tested (50%)
+  Code paths: 3/5 (60%)
+  User flows: 3/7 (43%)
+QUALITY: ★★★: 2 ★★: 2 ★: 2
+GAPS: 6 paths need tests
 ─────────────────────────────────
 ```
 
 **Fast path:** All paths covered → "Step 3.4: All new code paths have test coverage ✓" Continue.
 
-**4. Generate tests for uncovered paths:**
+**5. Generate tests for uncovered paths:**
 
 If test framework detected (or bootstrapped in Step 2.5):
 - Prioritize error handlers and edge cases first (happy paths are more likely already tested)
@@ -523,7 +563,7 @@ If no test framework AND user declined bootstrap → diagram only, no generation
 
 **Diff is test-only changes:** Skip Step 3.4 entirely: "No new application code paths to audit."
 
-**5. After-count and coverage summary:**
+**6. 
After-count and coverage summary:** ```bash # Count test files after generation diff --git a/ship/SKILL.md.tmpl b/ship/SKILL.md.tmpl index 9b2fcbc0..cbf487ce 100644 --- a/ship/SKILL.md.tmpl +++ b/ship/SKILL.md.tmpl @@ -200,23 +200,46 @@ Read every changed file. For each one, trace how data flows through the code — This is the critical step — you're building a map of every line of code that can execute differently based on input. Every branch in this diagram needs a test. -**2. Check each branch against existing tests:** +**2. Map user flows, interactions, and error states:** -Go through your diagram branch by branch. For each one, search for a test that exercises it: +Code coverage isn't enough — you need to cover how real users interact with the changed code. For each changed feature, think through: + +- **User flows:** What sequence of actions does a user take that touches this code? Map the full journey (e.g., "user clicks 'Pay' → form validates → API call → success/failure screen"). Each step in the journey needs a test. +- **Interaction edge cases:** What happens when the user does something unexpected? + - Double-click/rapid resubmit + - Navigate away mid-operation (back button, close tab, click another link) + - Submit with stale data (page sat open for 30 minutes, session expired) + - Slow connection (API takes 10 seconds — what does the user see?) + - Concurrent actions (two tabs, same form) +- **Error states the user can see:** For every error the code handles, what does the user actually experience? + - Is there a clear error message or a silent failure? + - Can the user recover (retry, go back, fix input) or are they stuck? + - What happens with no network? With a 500 from the API? With invalid data from the server? +- **Empty/zero/boundary states:** What does the UI show with zero results? With 10,000 results? With a single character input? With maximum-length input? + +Add these to your diagram alongside the code branches. 
A user flow with no test is just as much a gap as an untested if/else. + +**3. Check each branch against existing tests:** + +Go through your diagram branch by branch — both code paths AND user flows. For each one, search for a test that exercises it: - Function `processPayment()` → look for `billing.test.ts`, `billing.spec.ts`, `test/billing_test.rb` - An if/else → look for tests covering BOTH the true AND false path - An error handler → look for a test that triggers that specific error condition - A call to `helperFn()` that has its own branches → those branches need tests too +- A user flow → look for an integration or E2E test that walks through the journey +- An interaction edge case → look for a test that simulates the unexpected action Quality scoring rubric: - ★★★ Tests behavior with edge cases AND error paths - ★★ Tests correct behavior, happy path only - ★ Smoke test / existence check / trivial assertion (e.g., "it renders", "it doesn't throw") -**3. Output ASCII coverage diagram:** +**4. Output ASCII coverage diagram:** + +Include BOTH code paths and user flows in the same diagram: ``` -NEW CODE PATH COVERAGE MAP +CODE PATH COVERAGE =========================== [+] src/services/billing.ts │ @@ -229,16 +252,33 @@ NEW CODE PATH COVERAGE MAP ├── [★★ TESTED] Full refund — billing.test.ts:89 └── [★ TESTED] Partial refund (checks non-throw only) — billing.test.ts:101 +USER FLOW COVERAGE +=========================== +[+] Payment checkout flow + │ + ├── [★★★ TESTED] Complete purchase — checkout.e2e.ts:15 + ├── [GAP] Double-click submit — NO TEST + ├── [GAP] Navigate away during payment — NO TEST + └── [★ TESTED] Form validation errors (checks render only) — checkout.test.ts:40 + +[+] Error states + │ + ├── [★★ TESTED] Card declined message — billing.test.ts:58 + ├── [GAP] Network timeout UX (what does user see?) 
— NO TEST
+  └── [GAP] Empty cart submission — NO TEST
+
 ─────────────────────────────────
-COVERAGE: 3/5 new paths tested (60%)
-QUALITY: ★★★: 1 ★★: 1 ★: 1 (avg: ★★)
-GAPS: 2 paths need tests
+COVERAGE: 6/12 paths tested (50%)
+  Code paths: 3/5 (60%)
+  User flows: 3/7 (43%)
+QUALITY: ★★★: 2 ★★: 2 ★: 2
+GAPS: 6 paths need tests
 ─────────────────────────────────
 ```
 
 **Fast path:** All paths covered → "Step 3.4: All new code paths have test coverage ✓" Continue.
 
-**4. Generate tests for uncovered paths:**
+**5. Generate tests for uncovered paths:**
 
 If test framework detected (or bootstrapped in Step 2.5):
 - Prioritize error handlers and edge cases first (happy paths are more likely already tested)
@@ -254,7 +294,7 @@ If no test framework AND user declined bootstrap → diagram only, no generation
 
 **Diff is test-only changes:** Skip Step 3.4 entirely: "No new application code paths to audit."
 
-**5. After-count and coverage summary:**
+**6. After-count and coverage summary:**
 
 ```bash
 # Count test files after generation
diff --git a/test/skill-validation.test.ts b/test/skill-validation.test.ts
index 212f09e2..62ec3fdc 100644
--- a/test/skill-validation.test.ts
+++ b/test/skill-validation.test.ts
@@ -859,7 +859,7 @@ describe('Step 3.4 test coverage audit', () => {
   test('ship/SKILL.md contains Step 3.4', () => {
     const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
     expect(content).toContain('Step 3.4: Test Coverage Audit');
-    expect(content).toContain('CODE PATH COVERAGE MAP');
+    expect(content).toContain('CODE PATH COVERAGE');
   });
 
   test('Step 3.4 includes quality scoring rubric', () => {