feat: test bootstrap, regression tests, coverage audit, retro test health

- Add {{TEST_BOOTSTRAP}} resolver to gen-skill-docs.ts
- Add Phase 8e.5 regression test generation to /qa and /qa-design-review
- Add Step 3.4 test coverage audit with quality scoring to /ship
- Add test health tracking to /retro
- Add 2 E2E evals (bootstrap + coverage audit)
- Add 26 validation tests
- Update ARCHITECTURE.md placeholder table
- Add 2 P3 TODOs (CI/CD non-GitHub, auto-upgrade weak tests)
2026-06-08 11:03:54 +02:00 · 2026-03-17 10:41:40 -07:00
parent 73b00b4e29
commit 04eb7824bd
16 changed files with 1512 additions and 6 deletions
@@ -16,6 +16,7 @@ allowed-tools:
  - Glob
  - Grep
  - AskUserQuestion
+  - WebSearch
 ---

 {{PREAMBLE}}
@@ -58,6 +59,10 @@ fi

 {{BROWSE_SETUP}}

+**Check test framework (bootstrap if needed):**
+
+{{TEST_BOOTSTRAP}}
+
 **Create output directories:**

 ```bash
@@ -169,6 +174,52 @@ $B snapshot -D
 - **best-effort**: fix applied but couldn't fully verify (e.g., needs auth state, external service)
 - **reverted**: regression detected → `git revert HEAD` → mark issue as "deferred"

+### 8e.5. Regression Test
+
+Skip if: classification is not "verified", OR the fix is purely visual/CSS with no JS behavior, OR no test framework was detected AND user declined bootstrap.
+
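A minimal TypeScript sketch of the skip rule above, mainly to pin down how its OR/AND conditions group. The `FixContext` shape and its field names are hypothetical, not part of gstack:

```typescript
// Hypothetical input shape; field names are illustrative only.
interface FixContext {
  classification: string;         // e.g. "verified", "best-effort", "reverted"
  purelyVisualCss: boolean;       // CSS-only fix with no JS behavior change
  testFrameworkDetected: boolean;
  userDeclinedBootstrap: boolean;
}

// Returns true when Phase 8e.5 should be skipped for this fix.
function shouldSkipRegressionTest(ctx: FixContext): boolean {
  return (
    ctx.classification !== "verified" ||
    ctx.purelyVisualCss ||
    // A missing framework only causes a skip if bootstrap was also declined.
    (!ctx.testFrameworkDetected && ctx.userDeclinedBootstrap)
  );
}
```

Note that a project with no framework still gets a regression test when the user accepts bootstrap.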
+**1. Study the project's existing test patterns:**
+
+Read 2-3 test files closest to the fix (same directory, same code type). Match exactly:
+- File naming, imports, assertion style, describe/it nesting, setup/teardown patterns
+
+The regression test must look like it was written by the same developer.
+
+**2. Write a regression test encoding the exact bug condition:**
+
+The test MUST:
+- Set up the precondition that triggered the bug (the exact state that made it break)
+- Perform the action that exposed the bug
+- Assert the correct behavior (NOT "it renders" or "it doesn't throw")
+- Include full attribution comment:
+  ```
+  // Regression: ISSUE-NNN — {what broke}
+  // Found by /qa on {YYYY-MM-DD}
+  // Report: .gstack/qa-reports/qa-report-{domain}-{date}.md
+  ```
+
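A minimal sketch of a test that meets these requirements, assuming a TypeScript project and a hypothetical bug (ISSUE-042) in which a hypothetical `cartTotal` helper threw a `TypeError` on an empty cart because it called `reduce()` without an initial value. It uses plain `throw` checks so it runs without a test runner; a real regression test would follow the project's own framework and style (step 1) and carry the attribution comment shown above.

```typescript
// Hypothetical code under test, shown after the fix (not from gstack).
interface LineItem {
  price: number;
  quantity: number;
}

function cartTotal(items: LineItem[]): number {
  // The fix: pass an initial value so an empty cart sums to 0
  // instead of throwing "Reduce of empty array with no initial value".
  return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

function regressionIssue042(): void {
  // Precondition: the exact state that triggered the bug (an empty cart).
  const emptyCart: LineItem[] = [];
  // Action: the call that exposed the bug.
  const total = cartTotal(emptyCart);
  // Assert the correct value, not merely that nothing was thrown.
  if (total !== 0) {
    throw new Error(`expected 0 for an empty cart, got ${total}`);
  }
}

regressionIssue042();
console.log("ISSUE-042 regression test passed");
```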
+Test type decision:
+- Console error / JS exception / logic bug → unit or integration test
+- Broken form / API failure / data flow bug → integration test with request/response
+- Visual bug with JS behavior (broken dropdown, animation) → component test
+- Pure CSS → skip (caught by QA reruns)
+
+Prefer unit tests when the bug can be reproduced at that level. Whatever the test type, mock all external dependencies (DB, API, Redis, file system).
+
+Use auto-incrementing names to avoid collisions: check existing `{name}.regression-*.test.{ext}` files, take max number + 1.
+
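The auto-increment rule can be sketched as a pure function. This is a hypothetical helper, not part of gstack; it assumes numbering starts at 1 when no regression file exists yet, and that `existingFiles` comes from a directory listing such as Node's `fs.readdirSync(dir)`:

```typescript
// Escape regex metacharacters so names like "a.b" match literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: next free `{name}.regression-N.test.{ext}` filename.
function nextRegressionTestName(name: string, ext: string, existingFiles: string[]): string {
  const pattern = new RegExp(
    `^${escapeRegExp(name)}\\.regression-(\\d+)\\.test\\.${escapeRegExp(ext)}$`,
  );
  let max = 0; // assumption: with no existing files, numbering starts at 1
  for (const file of existingFiles) {
    const match = pattern.exec(file);
    if (match) {
      max = Math.max(max, Number(match[1]));
    }
  }
  return `${name}.regression-${max + 1}.test.${ext}`;
}

const existing = ["checkout.test.ts", "checkout.regression-1.test.ts", "checkout.regression-3.test.ts"];
console.log(nextRegressionTestName("checkout", "ts", existing)); // checkout.regression-4.test.ts
```

Taking the maximum rather than counting files means gaps (here, no `-2`) never cause a collision.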
+**3. Run only the new test file:**
+
+```bash
+{detected test command} {new-test-file}
+```
+
+**4. Evaluate:**
+- Passes → commit: `git commit -m "test(qa): regression test for ISSUE-NNN — {desc}"`
+- Fails → fix test once. Still failing → delete test, defer.
+- Exploration takes >2 min → skip and defer.
+
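The evaluation rules above amount to a small decision procedure. A sketch with hypothetical callbacks standing in for the real steps (running only the new test file, attempting one repair); deleting the failed test file before deferring is left to the caller:

```typescript
type Outcome = "commit" | "defer";

// Hypothetical encoding of step 4; names are illustrative, not from gstack.
function evaluateRegressionTest(
  runTest: () => boolean,   // runs only the new test file; true if it passes
  fixTestOnce: () => void,  // the single allowed repair of the test
  explorationMinutes: number,
): Outcome {
  if (explorationMinutes > 2) return "defer"; // too costly: skip and defer
  if (runTest()) return "commit";
  fixTestOnce();
  if (runTest()) return "commit";
  return "defer"; // still failing: caller deletes the test, then defers
}

let runs = 0;
const outcome = evaluateRegressionTest(() => ++runs >= 2, () => {}, 1);
console.log(outcome); // prints commit (first run fails, passes after the one repair)
```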
+**5. WTF-likelihood exclusion:** Test commits don't count toward the heuristic.
+
 ### 8f. Self-Regulation (STOP AND EVALUATE)

 Every 5 fixes (or after any revert), compute the WTF-likelihood:
@@ -242,6 +293,6 @@ If the repo has a `TODOS.md`:

 11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
 12. **One commit per fix.** Never bundle multiple fixes into one commit.
-13. **Never modify tests or CI configuration.** Only fix application source code.
+13. **Never modify existing tests or CI configuration.** The only test changes allowed are new regression test files created in Phase 8e.5.
 14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
 15. **Self-regulate.** Follow the WTF-likelihood heuristic. When in doubt, stop and ask.