diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index bccb13ff..79bfda75 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -203,6 +203,8 @@ Templates contain the workflows, tips, and examples that require human judgment.
 | `{{BASE_BRANCH_DETECT}}` | `gen-skill-docs.ts` | Dynamic base branch detection for PR-targeting skills (ship, review, qa, plan-ceo-review) |
 | `{{QA_METHODOLOGY}}` | `gen-skill-docs.ts` | Shared QA methodology block for /qa and /qa-only |
 | `{{DESIGN_METHODOLOGY}}` | `gen-skill-docs.ts` | Shared design audit methodology for /plan-design-review and /qa-design-review |
+| `{{REVIEW_DASHBOARD}}` | `gen-skill-docs.ts` | Review Readiness Dashboard for /ship pre-flight |
+| `{{TEST_BOOTSTRAP}}` | `gen-skill-docs.ts` | Test framework detection, bootstrap, CI/CD setup for /qa, /ship, /qa-design-review |

 This is structurally sound — if a command exists in code, it appears in docs. If it doesn't exist, it can't appear.

diff --git a/TODOS.md b/TODOS.md
index f52bb693..a0801d85 100644
--- a/TODOS.md
+++ b/TODOS.md
@@ -263,6 +263,30 @@
 **Effort:** S
 **Priority:** P3

+### CI/CD generation for non-GitHub providers
+
+**What:** Extend CI/CD bootstrap to generate GitLab CI (`.gitlab-ci.yml`), CircleCI (`.circleci/config.yml`), and Bitrise pipelines.
+
+**Why:** Not all projects use GitHub Actions. Universal CI/CD bootstrap would make test bootstrap work for everyone.
+
+**Context:** v1 ships with GitHub Actions only. Detection logic already checks for `.gitlab-ci.yml`, `.circleci/`, `bitrise.yml` and skips with an informational note. Each provider needs ~20 lines of template text in `generateTestBootstrap()`.
+
+**Effort:** M
+**Priority:** P3
+**Depends on:** Test bootstrap (shipped)
+
+### Auto-upgrade weak tests (★) to strong tests (★★★)
+
+**What:** When Step 3.4 coverage audit identifies existing ★-rated tests (smoke/trivial assertions), generate improved versions testing edge cases and error paths.
+
+**Why:** Many codebases have tests that technically exist but don't catch real bugs — `expect(component).toBeDefined()` isn't testing behavior. Upgrading these closes the gap between "has tests" and "has good tests."
+
+**Context:** Requires the quality scoring rubric from the test coverage audit. Modifying existing test files is riskier than creating new ones — needs careful diffing to ensure the upgraded test still passes. Consider creating a companion test file rather than modifying the original.
+
+**Effort:** M
+**Priority:** P3
+**Depends on:** Test quality scoring (shipped)
+
 ## Retro

 ### Deployment health tracking (retro + browse)
diff --git a/qa-design-review/SKILL.md b/qa-design-review/SKILL.md
index 0d8d0771..7044c560 100644
--- a/qa-design-review/SKILL.md
+++ b/qa-design-review/SKILL.md
@@ -14,6 +14,7 @@ allowed-tools:
   - Glob
   - Grep
   - AskUserQuestion
+  - WebSearch
 ---

@@ -136,6 +137,161 @@ If `NEEDS_SETUP`:
 2. Run: `cd && ./setup`
 3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`

+**Check test framework (bootstrap if needed):**
+
+## Test Framework Bootstrap
+
+**Detect existing test framework and project runtime:**
+
+```bash
+# Detect project runtime
+[ -f Gemfile ] && echo "RUNTIME:ruby"
+[ -f package.json ] && echo "RUNTIME:node"
+[ -f requirements.txt ] || [ -f pyproject.toml ] && echo "RUNTIME:python"
+[ -f go.mod ] && echo "RUNTIME:go"
+[ -f Cargo.toml ] && echo "RUNTIME:rust"
+[ -f composer.json ] && echo "RUNTIME:php"
+[ -f mix.exs ] && echo "RUNTIME:elixir"
+# Detect sub-frameworks
+[ -f Gemfile ] && grep -q "rails" Gemfile 2>/dev/null && echo "FRAMEWORK:rails"
+[ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && echo "FRAMEWORK:nextjs"
+# Check for existing test infrastructure
+ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null
+ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
+# Check opt-out marker
+[ -f .gstack/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED"
+```
+
+**If test framework detected** (config files or test directories found):
+Print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap."
+Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns).
+Store conventions as prose context for use in Phase 8e.5 or Step 3.4. **Skip the rest of bootstrap.**
+
+**If BOOTSTRAP_DECLINED** appears: Print "Test bootstrap previously declined — skipping." **Skip the rest of bootstrap.**
+
+**If NO runtime detected** (no config files found): Use AskUserQuestion:
+"I couldn't detect your project's language. What runtime are you using?"
+Options: A) Node.js/TypeScript B) Ruby/Rails C) Python D) Go E) Rust F) PHP G) Elixir H) This project doesn't need tests.
+If user picks H → write `.gstack/no-test-bootstrap` and continue without tests.
+
+**If runtime detected but no test framework — bootstrap:**
+
+### B2. Research best practices
+
+Use WebSearch to find current best practices for the detected runtime:
+- `"[runtime] best test framework 2025 2026"`
+- `"[framework A] vs [framework B] comparison"`
+
+If WebSearch is unavailable, use this built-in knowledge table:
+
+| Runtime | Primary recommendation | Alternative |
+|---------|----------------------|-------------|
+| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot + shoulda-matchers |
+| Node.js | vitest + @testing-library | jest + @testing-library |
+| Next.js | vitest + @testing-library/react + playwright | jest + cypress |
+| Python | pytest + pytest-cov | unittest |
+| Go | stdlib testing + testify | stdlib only |
+| Rust | cargo test (built-in) + mockall | — |
+| PHP | phpunit + mockery | pest |
+| Elixir | ExUnit (built-in) + ex_machina | — |
+
+### B3. Framework selection
+
+Use AskUserQuestion:
+"I detected this is a [Runtime/Framework] project with no test framework. I researched current best practices. Here are the options:
+A) [Primary] — [rationale]. Includes: [packages]. Supports: unit, integration, smoke, e2e
+B) [Alternative] — [rationale]. Includes: [packages]
+C) Skip — don't set up testing right now
+RECOMMENDATION: Choose A because [reason based on project context]"
+
+If user picks C → write `.gstack/no-test-bootstrap`. Tell user: "If you change your mind later, delete `.gstack/no-test-bootstrap` and re-run." Continue without tests.
+
+If multiple runtimes detected (monorepo) → ask which runtime to set up first, with option to do both sequentially.
+
+### B4. Install and configure
+
+1. Install the chosen packages (npm/bun/gem/pip/etc.)
+2. Create minimal config file
+3. Create directory structure (test/, spec/, etc.)
+4. Create one example test matching the project's code to verify setup works
+
+If package installation fails → debug once. If still failing → revert with `git checkout -- package.json package-lock.json` (or equivalent for the runtime). Warn user and continue without tests.
+
+### B4.5. First real tests
+
+Generate 3-5 real tests for existing code:
+
+1. **Find recently changed files:** `git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -10`
+2. **Prioritize by risk:** Error handlers > business logic with conditionals > API endpoints > pure functions
+3. **For each file:** Write one test that tests real behavior with meaningful assertions. Never `expect(x).toBeDefined()` — test what the code DOES.
+4. Run each test. Passes → keep. Fails → fix once. Still fails → delete silently.
+5. Generate at least 1 test, cap at 5.
+
+Never import secrets, API keys, or credentials in test files. Use environment variables or test fixtures.
+
+### B5. Verify
+
+```bash
+# Run the full test suite to confirm everything works
+{detected test command}
+```
+
+If tests fail → debug once. If still failing → revert all bootstrap changes and warn user.
+
+### B5.5. CI/CD pipeline
+
+```bash
+# Check CI provider
+ls -d .github/ 2>/dev/null && echo "CI:github"
+ls .gitlab-ci.yml .circleci/ bitrise.yml 2>/dev/null
+```
+
+If `.github/` exists (or no CI detected — default to GitHub Actions):
+Create `.github/workflows/test.yml` with:
+- `runs-on: ubuntu-latest`
+- Appropriate setup action for the runtime (setup-node, setup-ruby, setup-python, etc.)
+- The same test command verified in B5
+- Trigger: push + pull_request
+
+If non-GitHub CI detected → skip CI generation with note: "Detected {provider} — CI pipeline generation supports GitHub Actions only. Add test step to your existing pipeline manually."
+
+### B6. Create TESTING.md
+
+First check: If TESTING.md already exists → read it and update/append rather than overwriting. Never destroy existing content.
+
+Write TESTING.md with:
+- Philosophy: "100% test coverage is the key to great vibe coding. Tests let you move fast, trust your instincts, and ship with confidence — without them, vibe coding is just yolo coding. With tests, it's a superpower."
+- Framework name and version
+- How to run tests (the verified command from B5)
+- Test layers: Unit tests (what, where, when), Integration tests, Smoke tests, E2E tests
+- Conventions: file naming, assertion style, setup/teardown patterns
+
+### B7. Update CLAUDE.md
+
+First check: If CLAUDE.md already has a `## Testing` section → skip. Don't duplicate.
+
+Append a `## Testing` section:
+- Run command and test directory
+- Reference to TESTING.md
+- Test expectations:
+  - 100% test coverage is the goal — tests make vibe coding safe
+  - When writing new functions, write a corresponding test
+  - When fixing a bug, write a regression test
+  - When adding error handling, write a test that triggers the error
+  - When adding a conditional (if/else, switch), write tests for BOTH paths
+  - Never commit code that makes existing tests fail
+
+### B8. Commit
+
+```bash
+git status --porcelain
+```
+
+Only commit if there are changes. Stage all bootstrap files (config, test directory, TESTING.md, CLAUDE.md, .github/workflows/test.yml if created):
+`git commit -m "chore: bootstrap test framework ({framework name})"`
+
+---
+
 **Create output directories:**

 ```bash
@@ -565,6 +721,18 @@ Take **before/after screenshot pair** for every fix.
 - **best-effort**: fix applied but couldn't fully verify (e.g., needs specific browser state)
 - **reverted**: regression detected → `git revert HEAD` → mark finding as "deferred"

+### 8e.5. Regression Test (design-review variant)
+
+Design fixes are typically CSS-only. Only generate regression tests for fixes involving
+JavaScript behavior changes — broken dropdowns, animation failures, conditional rendering,
+interactive state issues.
+
+For CSS-only fixes: skip entirely. CSS regressions are caught by re-running /qa-design-review.
+
+If the fix involved JS behavior: follow the same procedure as /qa Phase 8e.5 (study existing
+test patterns, write a regression test encoding the exact bug condition, run it, commit if
+passes or defer if fails). Commit format: `test(design): regression test for FINDING-NNN`.
+
 ### 8f. Self-Regulation (STOP AND EVALUATE)

 Every 5 fixes (or after any revert), compute the design-fix risk level:
@@ -639,7 +807,7 @@ If the repo has a `TODOS.md`:

 11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
 12. **One commit per fix.** Never bundle multiple design fixes into one commit.
-13. **Never modify tests or CI configuration.** Only fix application source code and styles.
+13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files.
 14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
 15. **Self-regulate.** Follow the design-fix risk heuristic. When in doubt, stop and ask.
 16. **CSS-first.** Prefer CSS/styling changes over structural component changes. CSS-only changes are safer and more reversible.
diff --git a/qa-design-review/SKILL.md.tmpl b/qa-design-review/SKILL.md.tmpl
index 0053a494..5969fb52 100644
--- a/qa-design-review/SKILL.md.tmpl
+++ b/qa-design-review/SKILL.md.tmpl
@@ -14,6 +14,7 @@ allowed-tools:
   - Glob
   - Grep
   - AskUserQuestion
+  - WebSearch
 ---

 {{PREAMBLE}}
@@ -54,6 +55,10 @@ fi

 {{BROWSE_SETUP}}

+**Check test framework (bootstrap if needed):**
+
+{{TEST_BOOTSTRAP}}
+
 **Create output directories:**

 ```bash
@@ -153,6 +158,18 @@ Take **before/after screenshot pair** for every fix.
 - **best-effort**: fix applied but couldn't fully verify (e.g., needs specific browser state)
 - **reverted**: regression detected → `git revert HEAD` → mark finding as "deferred"

+### 8e.5. Regression Test (design-review variant)
+
+Design fixes are typically CSS-only. Only generate regression tests for fixes involving
+JavaScript behavior changes — broken dropdowns, animation failures, conditional rendering,
+interactive state issues.
+
+For CSS-only fixes: skip entirely. CSS regressions are caught by re-running /qa-design-review.
+
+If the fix involved JS behavior: follow the same procedure as /qa Phase 8e.5 (study existing
+test patterns, write a regression test encoding the exact bug condition, run it, commit if
+passes or defer if fails). Commit format: `test(design): regression test for FINDING-NNN`.
+
 ### 8f. Self-Regulation (STOP AND EVALUATE)

 Every 5 fixes (or after any revert), compute the design-fix risk level:
@@ -227,7 +244,7 @@ If the repo has a `TODOS.md`:

 11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
 12. **One commit per fix.** Never bundle multiple design fixes into one commit.
-13. **Never modify tests or CI configuration.** Only fix application source code and styles.
+13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files.
 14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
 15. **Self-regulate.** Follow the design-fix risk heuristic. When in doubt, stop and ask.
 16. **CSS-first.** Prefer CSS/styling changes over structural component changes. CSS-only changes are safer and more reversible.
diff --git a/qa-only/SKILL.md b/qa-only/SKILL.md
index 36f5fead..4fa0cf04 100644
--- a/qa-only/SKILL.md
+++ b/qa-only/SKILL.md
@@ -452,3 +452,4 @@ Report filenames use the domain and date: `qa-report-myapp-com-2026-03-12.md`
 ## Additional Rules (qa-only specific)

 11. **Never fix bugs.** Find and document only. Do not read source code, edit files, or suggest fixes in the report. Your job is to report what's broken, not to fix it. Use `/qa` for the test-fix-verify loop.
+12. **No test framework detected?** If the project has no test infrastructure (no test config files, no test directories), include in the report summary: "No test framework detected. Run `/qa` to bootstrap one and enable regression test generation."
diff --git a/qa-only/SKILL.md.tmpl b/qa-only/SKILL.md.tmpl
index 101cd71c..831e71ed 100644
--- a/qa-only/SKILL.md.tmpl
+++ b/qa-only/SKILL.md.tmpl
@@ -97,3 +97,4 @@ Report filenames use the domain and date: `qa-report-myapp-com-2026-03-12.md`
 ## Additional Rules (qa-only specific)

 11. **Never fix bugs.** Find and document only. Do not read source code, edit files, or suggest fixes in the report. Your job is to report what's broken, not to fix it. Use `/qa` for the test-fix-verify loop.
+12. **No test framework detected?** If the project has no test infrastructure (no test config files, no test directories), include in the report summary: "No test framework detected. Run `/qa` to bootstrap one and enable regression test generation."
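The bootstrap detection in these skills is a shell script whose output the agent interprets by rule: an existing test framework wins, then the opt-out marker, then a missing runtime triggers a question, and only otherwise does bootstrap run. The TypeScript sketch below restates that decision table so the precedence is easy to check. It is illustrative, not gstack code: `detect`, `decide`, `Detection`, `Action`, and the injected `readFile` callback are hypothetical names, and the sketch takes a set of top-level paths (directories written with a trailing slash) instead of touching the file system.

```typescript
// Hypothetical sketch of the bootstrap detection rules. gstack itself runs the
// shell script and lets the agent read its output; none of these names exist there.

type Detection = {
  runtimes: string[];
  frameworks: string[];
  hasTestInfra: boolean;
  declined: boolean;
};

type Action = "use-existing" | "skip-declined" | "ask-runtime" | "bootstrap";

// Runtime marker files, in the same order as the script's checks.
const RUNTIME_MARKERS: Array<[string, string[]]> = [
  ["ruby", ["Gemfile"]],
  ["node", ["package.json"]],
  ["python", ["requirements.txt", "pyproject.toml"]],
  ["go", ["go.mod"]],
  ["rust", ["Cargo.toml"]],
  ["php", ["composer.json"]],
  ["elixir", ["mix.exs"]],
];

// Mirrors the two `ls` lines: config globs, exact config files, test directories.
const TEST_CONFIG_PREFIXES = ["jest.config.", "vitest.config.", "playwright.config."];
const TEST_CONFIG_FILES = [".rspec", "pytest.ini", "pyproject.toml", "phpunit.xml"];
const TEST_DIRS = ["test/", "tests/", "spec/", "__tests__/", "cypress/", "e2e/"];

function detect(paths: Set<string>, readFile: (path: string) => string = () => ""): Detection {
  const runtimes = RUNTIME_MARKERS
    .filter(([, markers]) => markers.some((m) => paths.has(m)))
    .map(([name]) => name);

  // Same substring checks as the two grep lines.
  const frameworks: string[] = [];
  if (paths.has("Gemfile") && readFile("Gemfile").includes("rails")) frameworks.push("rails");
  if (paths.has("package.json") && readFile("package.json").includes('"next"')) frameworks.push("nextjs");

  const hasTestInfra = Array.from(paths).some(
    (p) =>
      TEST_CONFIG_PREFIXES.some((prefix) => p.startsWith(prefix)) ||
      TEST_CONFIG_FILES.includes(p) ||
      TEST_DIRS.includes(p),
  );

  return { runtimes, frameworks, hasTestInfra, declined: paths.has(".gstack/no-test-bootstrap") };
}

// Precedence follows the order of the "If ..." paragraphs in the skill text.
function decide(d: Detection): Action {
  if (d.hasTestInfra) return "use-existing";
  if (d.declined) return "skip-declined";
  if (d.runtimes.length === 0) return "ask-runtime";
  return "bootstrap";
}
```

Running `detect(new Set(["pyproject.toml", "src/"]))` reports the Python runtime and also `hasTestInfra: true`, because `pyproject.toml` appears both among the runtime markers and in the test-infrastructure `ls` list of the script. Under the rules as written, a Python project whose only marker is `pyproject.toml` is therefore treated as already having a test framework and skips bootstrap.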
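The design-review variant of Phase 8e.5 defers to the /qa procedure, which names each new file by checking existing `{name}.regression-*.test.{ext}` files and taking the max number + 1. A minimal TypeScript sketch of that rule follows. It assumes the check runs over bare filenames in a single directory and that numbering starts at 1 when no regression file exists yet; the skill text specifies neither. `nextRegressionTestName` is a hypothetical helper.

```typescript
// Hypothetical helper: picks the next regression-test filename for a module,
// following the "max existing number + 1" rule from /qa Phase 8e.5.
function nextRegressionTestName(name: string, ext: string, existing: string[]): string {
  // Escape regex metacharacters so names like "a.b" match literally.
  const escape = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`^${escape(name)}\\.regression-(\\d+)\\.test\\.${escape(ext)}$`);

  let max = 0; // assumption: start at 1 when no regression file exists
  for (const file of existing) {
    const match = pattern.exec(file);
    if (match) max = Math.max(max, Number(match[1]));
  }
  return `${name}.regression-${max + 1}.test.${ext}`;
}
```

With `cart.regression-1.test.ts` and `cart.regression-3.test.ts` already present, the next name is `cart.regression-4.test.ts`. Gaps left by deleted tests are not refilled, because only the maximum matters.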
diff --git a/qa/SKILL.md b/qa/SKILL.md
index 9bd8fc9b..44167be7 100644
--- a/qa/SKILL.md
+++ b/qa/SKILL.md
@@ -16,6 +16,7 @@ allowed-tools:
   - Glob
   - Grep
   - AskUserQuestion
+  - WebSearch
 ---

@@ -157,6 +158,161 @@ If `NEEDS_SETUP`:
 2. Run: `cd && ./setup`
 3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`

+**Check test framework (bootstrap if needed):**
+
+## Test Framework Bootstrap
+
+**Detect existing test framework and project runtime:**
+
+```bash
+# Detect project runtime
+[ -f Gemfile ] && echo "RUNTIME:ruby"
+[ -f package.json ] && echo "RUNTIME:node"
+[ -f requirements.txt ] || [ -f pyproject.toml ] && echo "RUNTIME:python"
+[ -f go.mod ] && echo "RUNTIME:go"
+[ -f Cargo.toml ] && echo "RUNTIME:rust"
+[ -f composer.json ] && echo "RUNTIME:php"
+[ -f mix.exs ] && echo "RUNTIME:elixir"
+# Detect sub-frameworks
+[ -f Gemfile ] && grep -q "rails" Gemfile 2>/dev/null && echo "FRAMEWORK:rails"
+[ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && echo "FRAMEWORK:nextjs"
+# Check for existing test infrastructure
+ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null
+ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
+# Check opt-out marker
+[ -f .gstack/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED"
+```
+
+**If test framework detected** (config files or test directories found):
+Print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap."
+Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns).
+Store conventions as prose context for use in Phase 8e.5 or Step 3.4. **Skip the rest of bootstrap.**
+
+**If BOOTSTRAP_DECLINED** appears: Print "Test bootstrap previously declined — skipping." **Skip the rest of bootstrap.**
+
+**If NO runtime detected** (no config files found): Use AskUserQuestion:
+"I couldn't detect your project's language. What runtime are you using?"
+Options: A) Node.js/TypeScript B) Ruby/Rails C) Python D) Go E) Rust F) PHP G) Elixir H) This project doesn't need tests.
+If user picks H → write `.gstack/no-test-bootstrap` and continue without tests.
+
+**If runtime detected but no test framework — bootstrap:**
+
+### B2. Research best practices
+
+Use WebSearch to find current best practices for the detected runtime:
+- `"[runtime] best test framework 2025 2026"`
+- `"[framework A] vs [framework B] comparison"`
+
+If WebSearch is unavailable, use this built-in knowledge table:
+
+| Runtime | Primary recommendation | Alternative |
+|---------|----------------------|-------------|
+| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot + shoulda-matchers |
+| Node.js | vitest + @testing-library | jest + @testing-library |
+| Next.js | vitest + @testing-library/react + playwright | jest + cypress |
+| Python | pytest + pytest-cov | unittest |
+| Go | stdlib testing + testify | stdlib only |
+| Rust | cargo test (built-in) + mockall | — |
+| PHP | phpunit + mockery | pest |
+| Elixir | ExUnit (built-in) + ex_machina | — |
+
+### B3. Framework selection
+
+Use AskUserQuestion:
+"I detected this is a [Runtime/Framework] project with no test framework. I researched current best practices. Here are the options:
+A) [Primary] — [rationale]. Includes: [packages]. Supports: unit, integration, smoke, e2e
+B) [Alternative] — [rationale]. Includes: [packages]
+C) Skip — don't set up testing right now
+RECOMMENDATION: Choose A because [reason based on project context]"
+
+If user picks C → write `.gstack/no-test-bootstrap`. Tell user: "If you change your mind later, delete `.gstack/no-test-bootstrap` and re-run." Continue without tests.
+
+If multiple runtimes detected (monorepo) → ask which runtime to set up first, with option to do both sequentially.
+
+### B4. Install and configure
+
+1. Install the chosen packages (npm/bun/gem/pip/etc.)
+2. Create minimal config file
+3. Create directory structure (test/, spec/, etc.)
+4. Create one example test matching the project's code to verify setup works
+
+If package installation fails → debug once. If still failing → revert with `git checkout -- package.json package-lock.json` (or equivalent for the runtime). Warn user and continue without tests.
+
+### B4.5. First real tests
+
+Generate 3-5 real tests for existing code:
+
+1. **Find recently changed files:** `git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -10`
+2. **Prioritize by risk:** Error handlers > business logic with conditionals > API endpoints > pure functions
+3. **For each file:** Write one test that tests real behavior with meaningful assertions. Never `expect(x).toBeDefined()` — test what the code DOES.
+4. Run each test. Passes → keep. Fails → fix once. Still fails → delete silently.
+5. Generate at least 1 test, cap at 5.
+
+Never import secrets, API keys, or credentials in test files. Use environment variables or test fixtures.
+
+### B5. Verify
+
+```bash
+# Run the full test suite to confirm everything works
+{detected test command}
+```
+
+If tests fail → debug once. If still failing → revert all bootstrap changes and warn user.
+
+### B5.5. CI/CD pipeline
+
+```bash
+# Check CI provider
+ls -d .github/ 2>/dev/null && echo "CI:github"
+ls .gitlab-ci.yml .circleci/ bitrise.yml 2>/dev/null
+```
+
+If `.github/` exists (or no CI detected — default to GitHub Actions):
+Create `.github/workflows/test.yml` with:
+- `runs-on: ubuntu-latest`
+- Appropriate setup action for the runtime (setup-node, setup-ruby, setup-python, etc.)
+- The same test command verified in B5
+- Trigger: push + pull_request
+
+If non-GitHub CI detected → skip CI generation with note: "Detected {provider} — CI pipeline generation supports GitHub Actions only. Add test step to your existing pipeline manually."
+
+### B6. Create TESTING.md
+
+First check: If TESTING.md already exists → read it and update/append rather than overwriting. Never destroy existing content.
+
+Write TESTING.md with:
+- Philosophy: "100% test coverage is the key to great vibe coding. Tests let you move fast, trust your instincts, and ship with confidence — without them, vibe coding is just yolo coding. With tests, it's a superpower."
+- Framework name and version
+- How to run tests (the verified command from B5)
+- Test layers: Unit tests (what, where, when), Integration tests, Smoke tests, E2E tests
+- Conventions: file naming, assertion style, setup/teardown patterns
+
+### B7. Update CLAUDE.md
+
+First check: If CLAUDE.md already has a `## Testing` section → skip. Don't duplicate.
+
+Append a `## Testing` section:
+- Run command and test directory
+- Reference to TESTING.md
+- Test expectations:
+  - 100% test coverage is the goal — tests make vibe coding safe
+  - When writing new functions, write a corresponding test
+  - When fixing a bug, write a regression test
+  - When adding error handling, write a test that triggers the error
+  - When adding a conditional (if/else, switch), write tests for BOTH paths
+  - Never commit code that makes existing tests fail
+
+### B8. Commit
+
+```bash
+git status --porcelain
+```
+
+Only commit if there are changes. Stage all bootstrap files (config, test directory, TESTING.md, CLAUDE.md, .github/workflows/test.yml if created):
+`git commit -m "chore: bootstrap test framework ({framework name})"`
+
+---
+
 **Create output directories:**

 ```bash
@@ -541,6 +697,52 @@ $B snapshot -D
 - **best-effort**: fix applied but couldn't fully verify (e.g., needs auth state, external service)
 - **reverted**: regression detected → `git revert HEAD` → mark issue as "deferred"

+### 8e.5. Regression Test
+
+Skip if: classification is not "verified", OR the fix is purely visual/CSS with no JS behavior, OR no test framework was detected AND user declined bootstrap.
+
+**1. Study the project's existing test patterns:**
+
+Read 2-3 test files closest to the fix (same directory, same code type). Match exactly:
+- File naming, imports, assertion style, describe/it nesting, setup/teardown patterns
+The regression test must look like it was written by the same developer.
+
+**2. Write a regression test encoding the exact bug condition:**
+
+The test MUST:
+- Set up the precondition that triggered the bug (the exact state that made it break)
+- Perform the action that exposed the bug
+- Assert the correct behavior (NOT "it renders" or "it doesn't throw")
+- Include full attribution comment:
+  ```
+  // Regression: ISSUE-NNN — {what broke}
+  // Found by /qa on {YYYY-MM-DD}
+  // Report: .gstack/qa-reports/qa-report-{domain}-{date}.md
+  ```
+
+Test type decision:
+- Console error / JS exception / logic bug → unit or integration test
+- Broken form / API failure / data flow bug → integration test with request/response
+- Visual bug with JS behavior (broken dropdown, animation) → component test
+- Pure CSS → skip (caught by QA reruns)
+
+Generate unit tests. Mock all external dependencies (DB, API, Redis, file system).
+
+Use auto-incrementing names to avoid collisions: check existing `{name}.regression-*.test.{ext}` files, take max number + 1.
+
+**3. Run only the new test file:**
+
+```bash
+{detected test command} {new-test-file}
+```
+
+**4. Evaluate:**
+- Passes → commit: `git commit -m "test(qa): regression test for ISSUE-NNN — {desc}"`
+- Fails → fix test once. Still failing → delete test, defer.
+- Taking >2 min exploration → skip and defer.
+
+**5. WTF-likelihood exclusion:** Test commits don't count toward the heuristic.
+
 ### 8f. Self-Regulation (STOP AND EVALUATE)

 Every 5 fixes (or after any revert), compute the WTF-likelihood:
@@ -614,6 +816,6 @@ If the repo has a `TODOS.md`:

 11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
 12. **One commit per fix.** Never bundle multiple fixes into one commit.
-13. **Never modify tests or CI configuration.** Only fix application source code.
+13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files.
 14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
 15. **Self-regulate.** Follow the WTF-likelihood heuristic. When in doubt, stop and ask.
diff --git a/qa/SKILL.md.tmpl b/qa/SKILL.md.tmpl
index 45dfbea6..6a7513c3 100644
--- a/qa/SKILL.md.tmpl
+++ b/qa/SKILL.md.tmpl
@@ -16,6 +16,7 @@ allowed-tools:
   - Glob
   - Grep
   - AskUserQuestion
+  - WebSearch
 ---

 {{PREAMBLE}}
@@ -58,6 +59,10 @@ fi

 {{BROWSE_SETUP}}

+**Check test framework (bootstrap if needed):**
+
+{{TEST_BOOTSTRAP}}
+
 **Create output directories:**

 ```bash
@@ -169,6 +174,52 @@ $B snapshot -D
 - **best-effort**: fix applied but couldn't fully verify (e.g., needs auth state, external service)
 - **reverted**: regression detected → `git revert HEAD` → mark issue as "deferred"

+### 8e.5. Regression Test
+
+Skip if: classification is not "verified", OR the fix is purely visual/CSS with no JS behavior, OR no test framework was detected AND user declined bootstrap.
+
+**1. Study the project's existing test patterns:**
+
+Read 2-3 test files closest to the fix (same directory, same code type). Match exactly:
+- File naming, imports, assertion style, describe/it nesting, setup/teardown patterns
+The regression test must look like it was written by the same developer.
+
+**2. Write a regression test encoding the exact bug condition:**
+
+The test MUST:
+- Set up the precondition that triggered the bug (the exact state that made it break)
+- Perform the action that exposed the bug
+- Assert the correct behavior (NOT "it renders" or "it doesn't throw")
+- Include full attribution comment:
+  ```
+  // Regression: ISSUE-NNN — {what broke}
+  // Found by /qa on {YYYY-MM-DD}
+  // Report: .gstack/qa-reports/qa-report-{domain}-{date}.md
+  ```
+
+Test type decision:
+- Console error / JS exception / logic bug → unit or integration test
+- Broken form / API failure / data flow bug → integration test with request/response
+- Visual bug with JS behavior (broken dropdown, animation) → component test
+- Pure CSS → skip (caught by QA reruns)
+
+Generate unit tests. Mock all external dependencies (DB, API, Redis, file system).
+
+Use auto-incrementing names to avoid collisions: check existing `{name}.regression-*.test.{ext}` files, take max number + 1.
+
+**3. Run only the new test file:**
+
+```bash
+{detected test command} {new-test-file}
+```
+
+**4. Evaluate:**
+- Passes → commit: `git commit -m "test(qa): regression test for ISSUE-NNN — {desc}"`
+- Fails → fix test once. Still failing → delete test, defer.
+- Taking >2 min exploration → skip and defer.
+
+**5. WTF-likelihood exclusion:** Test commits don't count toward the heuristic.
+
 ### 8f. Self-Regulation (STOP AND EVALUATE)

 Every 5 fixes (or after any revert), compute the WTF-likelihood:
@@ -242,6 +293,6 @@ If the repo has a `TODOS.md`:

 11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
 12. **One commit per fix.** Never bundle multiple fixes into one commit.
-13. **Never modify tests or CI configuration.** Only fix application source code.
+13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files.
 14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
 15. **Self-regulate.** Follow the WTF-likelihood heuristic. When in doubt, stop and ask.
diff --git a/qa/templates/qa-report-template.md b/qa/templates/qa-report-template.md
index 5466bda4..6aa30943 100644
--- a/qa/templates/qa-report-template.md
+++ b/qa/templates/qa-report-template.md
@@ -86,6 +86,22 @@

 ---

+## Regression Tests
+
+| Issue | Test File | Status | Description |
+|-------|-----------|--------|-------------|
+| ISSUE-NNN | path/to/test | committed / deferred / skipped | description |
+
+### Deferred Tests
+
+#### ISSUE-NNN: {title}
+**Precondition:** {setup state that triggers the bug}
+**Action:** {what the user does}
+**Expected:** {correct behavior}
+**Why deferred:** {reason}
+
+---
+
 ## Ship Readiness

 | Metric | Value |
diff --git a/retro/SKILL.md b/retro/SKILL.md
index c7781525..e7cd3d2c 100644
--- a/retro/SKILL.md
+++ b/retro/SKILL.md
@@ -164,6 +164,15 @@ cat ~/.gstack/greptile-history.md 2>/dev/null || true

 # 9. TODOS.md backlog (if available)
 cat TODOS.md 2>/dev/null || true
+
+# 10. Test file count
+find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' 2>/dev/null | grep -v node_modules | wc -l
+
+# 11. Regression test commits in window
+git log origin/ --since="" --oneline --grep="test(qa):" --grep="test(design):" --grep="test: coverage"
+
+# 12. Test files changed in window
+git log origin/ --since="" --format="" --name-only | grep -E '\.(test|spec)\.' | sort -u | wc -l
 ```

 ### Step 2: Compute Metrics
@@ -185,6 +194,7 @@ Calculate and present these metrics in a summary table:
 | Detected sessions | N |
 | Avg LOC/session-hour | N |
 | Greptile signal | N% (Y catches, Z FPs) |
+| Test Health | N total tests · M added this period · K regression tests |

 Then show a **per-author leaderboard** immediately below:

@@ -408,7 +418,17 @@ Use the Write tool to save the JSON file with this schema:
 }
 ```

-**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. If either has no data, omit the field entirely.
+**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. Only include the `test_health` field if test files were found (command 10 returns > 0). If any has no data, omit the field entirely.
+
+Include test health data in the JSON when test files exist:
+```json
+  "test_health": {
+    "total_test_files": 47,
+    "tests_added_this_period": 5,
+    "regression_test_commits": 3,
+    "test_files_changed": 8
+  }
+```

 Include backlog data in the JSON when TODOS.md exists:
 ```json
@@ -464,6 +484,13 @@ Narrative covering:
 - Any XL PRs that should have been split
 - Greptile signal ratio and trend (if history exists): "Greptile: X% signal (Y valid catches, Z false positives)"

+### Test Health
+- Total test files: N (from command 10)
+- Tests added this period: M (from command 12 — test files changed)
+- Regression test commits: list `test(qa):` and `test(design):` and `test: coverage` commits from command 11
+- If prior retro exists and has `test_health`: show delta "Test count: {last} → {now} (+{delta})"
+- If test ratio < 20%: flag as growth area — "100% test coverage is the goal. Tests make vibe coding safe."
+ ### Focus & Highlights (from Step 8) - Focus score with interpretation diff --git a/retro/SKILL.md.tmpl b/retro/SKILL.md.tmpl index 2f39fb5c..bfbc2003 100644 --- a/retro/SKILL.md.tmpl +++ b/retro/SKILL.md.tmpl @@ -99,6 +99,15 @@ cat ~/.gstack/greptile-history.md 2>/dev/null || true # 9. TODOS.md backlog (if available) cat TODOS.md 2>/dev/null || true + +# 10. Test file count +find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' 2>/dev/null | grep -v node_modules | wc -l + +# 11. Regression test commits in window +git log origin/ --since="" --oneline --grep="test(qa):" --grep="test(design):" --grep="test: coverage" + +# 12. Test files changed in window +git log origin/ --since="" --format="" --name-only | grep -E '\.(test|spec)\.' | sort -u | wc -l ``` ### Step 2: Compute Metrics @@ -120,6 +129,7 @@ Calculate and present these metrics in a summary table: | Detected sessions | N | | Avg LOC/session-hour | N | | Greptile signal | N% (Y catches, Z FPs) | +| Test Health | N total tests · M added this period · K regression tests | Then show a **per-author leaderboard** immediately below: @@ -343,7 +353,17 @@ Use the Write tool to save the JSON file with this schema: } ``` -**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. If either has no data, omit the field entirely. +**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. Only include the `test_health` field if test files were found (command 10 returns > 0). If any has no data, omit the field entirely. 
+ +Include test health data in the JSON when test files exist: +```json + "test_health": { + "total_test_files": 47, + "tests_added_this_period": 5, + "regression_test_commits": 3, + "test_files_changed": 8 + } +``` Include backlog data in the JSON when TODOS.md exists: ```json @@ -399,6 +419,13 @@ Narrative covering: - Any XL PRs that should have been split - Greptile signal ratio and trend (if history exists): "Greptile: X% signal (Y valid catches, Z false positives)" +### Test Health +- Total test files: N (from command 10) +- Tests added this period: M (from command 12 — test files changed) +- Regression test commits: list `test(qa):` and `test(design):` and `test: coverage` commits from command 11 +- If prior retro exists and has `test_health`: show delta "Test count: {last} → {now} (+{delta})" +- If test ratio < 20%: flag as growth area — "100% test coverage is the goal. Tests make vibe coding safe." + ### Focus & Highlights (from Step 8) - Focus score with interpretation diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts index a9d3bce6..88df03e3 100644 --- a/scripts/gen-skill-docs.ts +++ b/scripts/gen-skill-docs.ts @@ -846,6 +846,161 @@ Parse the output. 
Find the most recent entry for each skill (plan-ceo-review, pl - Informational only — does NOT block.`; } +function generateTestBootstrap(): string { + return `## Test Framework Bootstrap + +**Detect existing test framework and project runtime:** + +\`\`\`bash +# Detect project runtime +[ -f Gemfile ] && echo "RUNTIME:ruby" +[ -f package.json ] && echo "RUNTIME:node" +[ -f requirements.txt ] || [ -f pyproject.toml ] && echo "RUNTIME:python" +[ -f go.mod ] && echo "RUNTIME:go" +[ -f Cargo.toml ] && echo "RUNTIME:rust" +[ -f composer.json ] && echo "RUNTIME:php" +[ -f mix.exs ] && echo "RUNTIME:elixir" +# Detect sub-frameworks +[ -f Gemfile ] && grep -q "rails" Gemfile 2>/dev/null && echo "FRAMEWORK:rails" +[ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && echo "FRAMEWORK:nextjs" +# Check for existing test infrastructure +ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null +ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null +# Check opt-out marker +[ -f .gstack/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED" +\`\`\` + +**If test framework detected** (config files or test directories found): +Print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap." +Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns). +Store conventions as prose context for use in Phase 8e.5 or Step 3.4. **Skip the rest of bootstrap.** + +**If BOOTSTRAP_DECLINED** appears: Print "Test bootstrap previously declined — skipping." **Skip the rest of bootstrap.** + +**If NO runtime detected** (no config files found): Use AskUserQuestion: +"I couldn't detect your project's language. What runtime are you using?" +Options: A) Node.js/TypeScript B) Ruby/Rails C) Python D) Go E) Rust F) PHP G) Elixir H) This project doesn't need tests. +If user picks H → write \`.gstack/no-test-bootstrap\` and continue without tests. 
+ +**If runtime detected but no test framework — bootstrap:** + +### B2. Research best practices + +Use WebSearch to find current best practices for the detected runtime: +- \`"[runtime] best test framework 2025 2026"\` +- \`"[framework A] vs [framework B] comparison"\` + +If WebSearch is unavailable, use this built-in knowledge table: + +| Runtime | Primary recommendation | Alternative | +|---------|----------------------|-------------| +| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot + shoulda-matchers | +| Node.js | vitest + @testing-library | jest + @testing-library | +| Next.js | vitest + @testing-library/react + playwright | jest + cypress | +| Python | pytest + pytest-cov | unittest | +| Go | stdlib testing + testify | stdlib only | +| Rust | cargo test (built-in) + mockall | — | +| PHP | phpunit + mockery | pest | +| Elixir | ExUnit (built-in) + ex_machina | — | + +### B3. Framework selection + +Use AskUserQuestion: +"I detected this is a [Runtime/Framework] project with no test framework. I researched current best practices. Here are the options: +A) [Primary] — [rationale]. Includes: [packages]. Supports: unit, integration, smoke, e2e +B) [Alternative] — [rationale]. Includes: [packages] +C) Skip — don't set up testing right now +RECOMMENDATION: Choose A because [reason based on project context]" + +If user picks C → write \`.gstack/no-test-bootstrap\`. Tell user: "If you change your mind later, delete \`.gstack/no-test-bootstrap\` and re-run." Continue without tests. + +If multiple runtimes detected (monorepo) → ask which runtime to set up first, with option to do both sequentially. + +### B4. Install and configure + +1. Install the chosen packages (npm/bun/gem/pip/etc.) +2. Create minimal config file +3. Create directory structure (test/, spec/, etc.) +4. Create one example test matching the project's code to verify setup works + +If package installation fails → debug once. 
If still failing → revert with \`git checkout -- package.json package-lock.json\` (or equivalent for the runtime). Warn user and continue without tests. + +### B4.5. First real tests + +Generate 3-5 real tests for existing code: + +1. **Find recently changed files:** \`git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -10\` +2. **Prioritize by risk:** Error handlers > business logic with conditionals > API endpoints > pure functions +3. **For each file:** Write one test that tests real behavior with meaningful assertions. Never \`expect(x).toBeDefined()\` — test what the code DOES. +4. Run each test. Passes → keep. Fails → fix once. Still fails → delete silently. +5. Generate at least 1 test, cap at 5. + +Never import secrets, API keys, or credentials in test files. Use environment variables or test fixtures. + +### B5. Verify + +\`\`\`bash +# Run the full test suite to confirm everything works +{detected test command} +\`\`\` + +If tests fail → debug once. If still failing → revert all bootstrap changes and warn user. + +### B5.5. CI/CD pipeline + +\`\`\`bash +# Check CI provider +ls -d .github/ 2>/dev/null && echo "CI:github" +ls .gitlab-ci.yml .circleci/ bitrise.yml 2>/dev/null +\`\`\` + +If \`.github/\` exists (or no CI detected — default to GitHub Actions): +Create \`.github/workflows/test.yml\` with: +- \`runs-on: ubuntu-latest\` +- Appropriate setup action for the runtime (setup-node, setup-ruby, setup-python, etc.) +- The same test command verified in B5 +- Trigger: push + pull_request + +If non-GitHub CI detected → skip CI generation with note: "Detected {provider} — CI pipeline generation supports GitHub Actions only. Add test step to your existing pipeline manually." + +### B6. Create TESTING.md + +First check: If TESTING.md already exists → read it and update/append rather than overwriting. Never destroy existing content. + +Write TESTING.md with: +- Philosophy: "100% test coverage is the key to great vibe coding. 
Tests let you move fast, trust your instincts, and ship with confidence — without them, vibe coding is just yolo coding. With tests, it's a superpower." +- Framework name and version +- How to run tests (the verified command from B5) +- Test layers: Unit tests (what, where, when), Integration tests, Smoke tests, E2E tests +- Conventions: file naming, assertion style, setup/teardown patterns + +### B7. Update CLAUDE.md + +First check: If CLAUDE.md already has a \`## Testing\` section → skip. Don't duplicate. + +Append a \`## Testing\` section: +- Run command and test directory +- Reference to TESTING.md +- Test expectations: + - 100% test coverage is the goal — tests make vibe coding safe + - When writing new functions, write a corresponding test + - When fixing a bug, write a regression test + - When adding error handling, write a test that triggers the error + - When adding a conditional (if/else, switch), write tests for BOTH paths + - Never commit code that makes existing tests fail + +### B8. Commit + +\`\`\`bash +git status --porcelain +\`\`\` + +Only commit if there are changes. Stage all bootstrap files (config, test directory, TESTING.md, CLAUDE.md, .github/workflows/test.yml if created): +\`git commit -m "chore: bootstrap test framework ({framework name})"\` + +---`; +} + const RESOLVERS: Record string> = { COMMAND_REFERENCE: generateCommandReference, SNAPSHOT_FLAGS: generateSnapshotFlags, @@ -855,6 +1010,7 @@ const RESOLVERS: Record string> = { QA_METHODOLOGY: generateQAMethodology, DESIGN_METHODOLOGY: generateDesignMethodology, REVIEW_DASHBOARD: generateReviewDashboard, + TEST_BOOTSTRAP: generateTestBootstrap, }; // ─── Template Processing ──────────────────────────────────── diff --git a/ship/SKILL.md b/ship/SKILL.md index e7b8b753..e72e2604 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -11,6 +11,7 @@ allowed-tools: - Grep - Glob - AskUserQuestion + - WebSearch --- @@ -121,6 +122,7 @@ You are running the `/ship` workflow. 
This is a **non-interactive, fully automat - Multi-file changesets (auto-split into bisectable commits) - TODOS.md completed-item detection (auto-mark) - Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically) +- Test coverage gaps (auto-generate and commit, or flag in PR body) --- @@ -185,6 +187,163 @@ git fetch origin && git merge origin/ --no-edit --- +## Step 2.5: Test Framework Bootstrap + +## Test Framework Bootstrap + +**Detect existing test framework and project runtime:** + +```bash +# Detect project runtime +[ -f Gemfile ] && echo "RUNTIME:ruby" +[ -f package.json ] && echo "RUNTIME:node" +[ -f requirements.txt ] || [ -f pyproject.toml ] && echo "RUNTIME:python" +[ -f go.mod ] && echo "RUNTIME:go" +[ -f Cargo.toml ] && echo "RUNTIME:rust" +[ -f composer.json ] && echo "RUNTIME:php" +[ -f mix.exs ] && echo "RUNTIME:elixir" +# Detect sub-frameworks +[ -f Gemfile ] && grep -q "rails" Gemfile 2>/dev/null && echo "FRAMEWORK:rails" +[ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && echo "FRAMEWORK:nextjs" +# Check for existing test infrastructure +ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null +ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null +# Check opt-out marker +[ -f .gstack/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED" +``` + +**If test framework detected** (config files or test directories found): +Print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap." +Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns). +Store conventions as prose context for use in Phase 8e.5 or Step 3.4. **Skip the rest of bootstrap.** + +**If BOOTSTRAP_DECLINED** appears: Print "Test bootstrap previously declined — skipping." **Skip the rest of bootstrap.** + +**If NO runtime detected** (no config files found): Use AskUserQuestion: +"I couldn't detect your project's language. 
What runtime are you using?" +Options: A) Node.js/TypeScript B) Ruby/Rails C) Python D) Go E) Rust F) PHP G) Elixir H) This project doesn't need tests. +If user picks H → write `.gstack/no-test-bootstrap` and continue without tests. + +**If runtime detected but no test framework — bootstrap:** + +### B2. Research best practices + +Use WebSearch to find current best practices for the detected runtime: +- `"[runtime] best test framework 2025 2026"` +- `"[framework A] vs [framework B] comparison"` + +If WebSearch is unavailable, use this built-in knowledge table: + +| Runtime | Primary recommendation | Alternative | +|---------|----------------------|-------------| +| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot + shoulda-matchers | +| Node.js | vitest + @testing-library | jest + @testing-library | +| Next.js | vitest + @testing-library/react + playwright | jest + cypress | +| Python | pytest + pytest-cov | unittest | +| Go | stdlib testing + testify | stdlib only | +| Rust | cargo test (built-in) + mockall | — | +| PHP | phpunit + mockery | pest | +| Elixir | ExUnit (built-in) + ex_machina | — | + +### B3. Framework selection + +Use AskUserQuestion: +"I detected this is a [Runtime/Framework] project with no test framework. I researched current best practices. Here are the options: +A) [Primary] — [rationale]. Includes: [packages]. Supports: unit, integration, smoke, e2e +B) [Alternative] — [rationale]. Includes: [packages] +C) Skip — don't set up testing right now +RECOMMENDATION: Choose A because [reason based on project context]" + +If user picks C → write `.gstack/no-test-bootstrap`. Tell user: "If you change your mind later, delete `.gstack/no-test-bootstrap` and re-run." Continue without tests. + +If multiple runtimes detected (monorepo) → ask which runtime to set up first, with option to do both sequentially. + +### B4. Install and configure + +1. Install the chosen packages (npm/bun/gem/pip/etc.) +2. Create minimal config file +3. 
Create directory structure (test/, spec/, etc.) +4. Create one example test matching the project's code to verify setup works + +If package installation fails → debug once. If still failing → revert with `git checkout -- package.json package-lock.json` (or equivalent for the runtime). Warn user and continue without tests. + +### B4.5. First real tests + +Generate 3-5 real tests for existing code: + +1. **Find recently changed files:** `git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -10` +2. **Prioritize by risk:** Error handlers > business logic with conditionals > API endpoints > pure functions +3. **For each file:** Write one test that tests real behavior with meaningful assertions. Never `expect(x).toBeDefined()` — test what the code DOES. +4. Run each test. Passes → keep. Fails → fix once. Still fails → delete silently. +5. Generate at least 1 test, cap at 5. + +Never import secrets, API keys, or credentials in test files. Use environment variables or test fixtures. + +### B5. Verify + +```bash +# Run the full test suite to confirm everything works +{detected test command} +``` + +If tests fail → debug once. If still failing → revert all bootstrap changes and warn user. + +### B5.5. CI/CD pipeline + +```bash +# Check CI provider +ls -d .github/ 2>/dev/null && echo "CI:github" +ls .gitlab-ci.yml .circleci/ bitrise.yml 2>/dev/null +``` + +If `.github/` exists (or no CI detected — default to GitHub Actions): +Create `.github/workflows/test.yml` with: +- `runs-on: ubuntu-latest` +- Appropriate setup action for the runtime (setup-node, setup-ruby, setup-python, etc.) +- The same test command verified in B5 +- Trigger: push + pull_request + +If non-GitHub CI detected → skip CI generation with note: "Detected {provider} — CI pipeline generation supports GitHub Actions only. Add test step to your existing pipeline manually." + +### B6. 
Create TESTING.md + +First check: If TESTING.md already exists → read it and update/append rather than overwriting. Never destroy existing content. + +Write TESTING.md with: +- Philosophy: "100% test coverage is the key to great vibe coding. Tests let you move fast, trust your instincts, and ship with confidence — without them, vibe coding is just yolo coding. With tests, it's a superpower." +- Framework name and version +- How to run tests (the verified command from B5) +- Test layers: Unit tests (what, where, when), Integration tests, Smoke tests, E2E tests +- Conventions: file naming, assertion style, setup/teardown patterns + +### B7. Update CLAUDE.md + +First check: If CLAUDE.md already has a `## Testing` section → skip. Don't duplicate. + +Append a `## Testing` section: +- Run command and test directory +- Reference to TESTING.md +- Test expectations: + - 100% test coverage is the goal — tests make vibe coding safe + - When writing new functions, write a corresponding test + - When fixing a bug, write a regression test + - When adding error handling, write a test that triggers the error + - When adding a conditional (if/else, switch), write tests for BOTH paths + - Never commit code that makes existing tests fail + +### B8. Commit + +```bash +git status --porcelain +``` + +Only commit if there are changes. Stage all bootstrap files (config, test directory, TESTING.md, CLAUDE.md, .github/workflows/test.yml if created): +`git commit -m "chore: bootstrap test framework ({framework name})"` + +--- + +--- + ## Step 3: Run tests (on merged code) **Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls @@ -269,6 +428,94 @@ If multiple suites need to run, run them sequentially (each needs a test lane). --- +## Step 3.4: Test Coverage Audit + +100% coverage is the goal — every untested path is a path where bugs hide and vibe coding becomes yolo coding. Evaluate what was ACTUALLY coded (from the diff), not what was planned. + +**0. 
Before/after test count:** + +```bash +# Count test files before any generation +find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l +``` + +Store this number for the PR body. + +**1. Build the code path map** from `git diff origin/...HEAD`: + +Extract all new or modified: +- Functions/methods (def, function, const, class methods) +- Conditional branches (if/else, switch/case, ternary, guard clauses, early returns) +- API routes/endpoints (route definitions, controller actions) +- Components (new files or new exports) +- Error handlers (try/catch, rescue, error boundaries, fallback paths) + +**2. Search for corresponding tests and score quality:** + +For each code path, search for a test exercising it: +- `src/services/billing.ts:processPayment` → `billing.test.ts`, `billing.spec.ts` +- `app/controllers/payments_controller.rb#create` → `test/controllers/payments_controller_test.rb` +- New if/else → tests for BOTH paths (not just happy path) +- New error handler → test triggering the error condition + +Quality scoring rubric: +- ★★★ Tests behavior with edge cases AND error paths +- ★★ Tests correct behavior, happy path only +- ★ Smoke test / existence check / trivial assertion (e.g., "it renders", "it doesn't throw") + +**3. 
Output ASCII coverage diagram:** + +``` +NEW CODE PATH COVERAGE MAP +=========================== +[+] src/services/billing.ts + │ + ├── processPayment() + │ ├── [★★★ TESTED] Happy path + card declined + timeout — billing.test.ts:42 + │ ├── [GAP] Network timeout — NO TEST + │ └── [GAP] Invalid currency — NO TEST + │ + └── refundPayment() + ├── [★★ TESTED] Full refund — billing.test.ts:89 + └── [★ TESTED] Partial refund (checks non-throw only) — billing.test.ts:101 + +───────────────────────────────── +COVERAGE: 3/5 new paths tested (60%) +QUALITY: ★★★: 1 ★★: 1 ★: 1 (avg: ★★) +GAPS: 2 paths need tests +───────────────────────────────── +``` + +**Fast path:** All paths covered → "Step 3.4: All new code paths have test coverage ✓" Continue. + +**4. Generate tests for uncovered paths:** + +If test framework detected (or bootstrapped in Step 2.5): +- Prioritize error handlers and edge cases first (happy paths are more likely already tested) +- Read 2-3 existing test files to match conventions exactly +- Generate unit tests. Mock all external dependencies (DB, API, Redis). +- Write tests that exercise the specific uncovered path with real assertions +- Run each test. Passes → commit as `test: coverage for {feature}` +- Fails → fix once. Still fails → revert, note gap in diagram. + +Caps: 30 code paths max, 10 tests generated max, 2-min per-test exploration cap. + +If no test framework AND user declined bootstrap → diagram only, no generation. Note: "Test generation skipped — no test framework configured." + +**Diff is test-only changes:** Skip Step 3.4 entirely: "No new application code paths to audit." + +**5. After-count and coverage summary:** + +```bash +# Count test files after generation +find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l +``` + +For PR body: `Tests: {before} → {after} (+{delta} new)` +Coverage line: `Test Coverage Audit: N new code paths. M covered (X%). 
K tests generated, J committed.` + +--- + ## Step 3.5: Pre-Landing Review Review the diff for structural issues that tests don't catch. @@ -497,6 +744,10 @@ gh pr create --base --title ": " --body "$(cat <<'EOF' ## Summary +## Test Coverage + + + ## Pre-Landing Review @@ -538,4 +789,5 @@ EOF - **Split commits for bisectability** — each commit = one logical change. - **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done. - **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies. +- **Step 3.4 generates coverage tests.** They must pass before committing. Never commit failing tests. - **The goal is: user says `/ship`, next thing they see is the review + PR URL.** diff --git a/ship/SKILL.md.tmpl b/ship/SKILL.md.tmpl index 2a24bea3..52e42b62 100644 --- a/ship/SKILL.md.tmpl +++ b/ship/SKILL.md.tmpl @@ -11,6 +11,7 @@ allowed-tools: - Grep - Glob - AskUserQuestion + - WebSearch --- {{PREAMBLE}} @@ -39,6 +40,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat - Multi-file changesets (auto-split into bisectable commits) - TODOS.md completed-item detection (auto-mark) - Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically) +- Test coverage gaps (auto-generate and commit, or flag in PR body) --- @@ -75,6 +77,12 @@ git fetch origin && git merge origin/ --no-edit --- +## Step 2.5: Test Framework Bootstrap + +{{TEST_BOOTSTRAP}} + +--- + ## Step 3: Run tests (on merged code) **Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls @@ -159,6 +167,94 @@ If multiple suites need to run, run them sequentially (each needs a test lane). --- +## Step 3.4: Test Coverage Audit + +100% coverage is the goal — every untested path is a path where bugs hide and vibe coding becomes yolo coding. 
Evaluate what was ACTUALLY coded (from the diff), not what was planned. + +**0. Before/after test count:** + +```bash +# Count test files before any generation +find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l +``` + +Store this number for the PR body. + +**1. Build the code path map** from `git diff origin/...HEAD`: + +Extract all new or modified: +- Functions/methods (def, function, const, class methods) +- Conditional branches (if/else, switch/case, ternary, guard clauses, early returns) +- API routes/endpoints (route definitions, controller actions) +- Components (new files or new exports) +- Error handlers (try/catch, rescue, error boundaries, fallback paths) + +**2. Search for corresponding tests and score quality:** + +For each code path, search for a test exercising it: +- `src/services/billing.ts:processPayment` → `billing.test.ts`, `billing.spec.ts` +- `app/controllers/payments_controller.rb#create` → `test/controllers/payments_controller_test.rb` +- New if/else → tests for BOTH paths (not just happy path) +- New error handler → test triggering the error condition + +Quality scoring rubric: +- ★★★ Tests behavior with edge cases AND error paths +- ★★ Tests correct behavior, happy path only +- ★ Smoke test / existence check / trivial assertion (e.g., "it renders", "it doesn't throw") + +**3. 
Output ASCII coverage diagram:** + +``` +NEW CODE PATH COVERAGE MAP +=========================== +[+] src/services/billing.ts + │ + ├── processPayment() + │ ├── [★★★ TESTED] Happy path + card declined + timeout — billing.test.ts:42 + │ ├── [GAP] Network timeout — NO TEST + │ └── [GAP] Invalid currency — NO TEST + │ + └── refundPayment() + ├── [★★ TESTED] Full refund — billing.test.ts:89 + └── [★ TESTED] Partial refund (checks non-throw only) — billing.test.ts:101 + +───────────────────────────────── +COVERAGE: 3/5 new paths tested (60%) +QUALITY: ★★★: 1 ★★: 1 ★: 1 (avg: ★★) +GAPS: 2 paths need tests +───────────────────────────────── +``` + +**Fast path:** All paths covered → "Step 3.4: All new code paths have test coverage ✓" Continue. + +**4. Generate tests for uncovered paths:** + +If test framework detected (or bootstrapped in Step 2.5): +- Prioritize error handlers and edge cases first (happy paths are more likely already tested) +- Read 2-3 existing test files to match conventions exactly +- Generate unit tests. Mock all external dependencies (DB, API, Redis). +- Write tests that exercise the specific uncovered path with real assertions +- Run each test. Passes → commit as `test: coverage for {feature}` +- Fails → fix once. Still fails → revert, note gap in diagram. + +Caps: 30 code paths max, 10 tests generated max, 2-min per-test exploration cap. + +If no test framework AND user declined bootstrap → diagram only, no generation. Note: "Test generation skipped — no test framework configured." + +**Diff is test-only changes:** Skip Step 3.4 entirely: "No new application code paths to audit." + +**5. After-count and coverage summary:** + +```bash +# Count test files after generation +find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l +``` + +For PR body: `Tests: {before} → {after} (+{delta} new)` +Coverage line: `Test Coverage Audit: N new code paths. M covered (X%). 
K tests generated, J committed.` + +--- + ## Step 3.5: Pre-Landing Review Review the diff for structural issues that tests don't catch. @@ -387,6 +483,10 @@ gh pr create --base --title ": " --body "$(cat <<'EOF' ## Summary +## Test Coverage + + + ## Pre-Landing Review @@ -428,4 +528,5 @@ EOF - **Split commits for bisectability** — each commit = one logical change. - **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done. - **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies. +- **Step 3.4 generates coverage tests.** They must pass before committing. Never commit failing tests. - **The goal is: user says `/ship`, next thing they see is the review + PR URL.** diff --git a/test/skill-e2e.test.ts b/test/skill-e2e.test.ts index 4378c322..bb7ed897 100644 --- a/test/skill-e2e.test.ts +++ b/test/skill-e2e.test.ts @@ -2215,6 +2215,269 @@ Review the site at ${serverUrl}. Use --quick mode. 
Skip any AskUserQuestion call }, 420_000); }); +// --- Test Bootstrap E2E --- + +describeE2E('Test Bootstrap E2E', () => { + let bootstrapDir: string; + let bootstrapServer: ReturnType; + + beforeAll(() => { + bootstrapDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-bootstrap-')); + setupBrowseShims(bootstrapDir); + + // Copy qa skill files + copyDirSync(path.join(ROOT, 'qa'), path.join(bootstrapDir, 'qa')); + + // Create a minimal Node.js project with NO test framework + fs.writeFileSync(path.join(bootstrapDir, 'package.json'), JSON.stringify({ + name: 'test-bootstrap-app', + version: '1.0.0', + type: 'module', + }, null, 2)); + + // Create a simple app file with a bug + fs.writeFileSync(path.join(bootstrapDir, 'app.js'), ` +export function add(a, b) { return a + b; } +export function subtract(a, b) { return a - b; } +export function divide(a, b) { return a / b; } // BUG: no zero check +`); + + // Create a simple HTML page with a bug + fs.writeFileSync(path.join(bootstrapDir, 'index.html'), ` + +Bootstrap Test + +

Test App

+ Broken Link + + + +`); + + // Init git repo + const run = (cmd: string, args: string[]) => + spawnSync(cmd, args, { cwd: bootstrapDir, stdio: 'pipe', timeout: 5000 }); + run('git', ['init']); + run('git', ['config', 'user.email', 'test@test.com']); + run('git', ['config', 'user.name', 'Test']); + run('git', ['add', '.']); + run('git', ['commit', '-m', 'initial commit']); + + // Serve from working directory + bootstrapServer = Bun.serve({ + port: 0, + hostname: '127.0.0.1', + fetch(req) { + const url = new URL(req.url); + let filePath = url.pathname === '/' ? '/index.html' : url.pathname; + filePath = filePath.replace(/^\//, ''); + const fullPath = path.join(bootstrapDir, filePath); + if (!fs.existsSync(fullPath)) { + return new Response('Not Found', { status: 404 }); + } + const content = fs.readFileSync(fullPath, 'utf-8'); + return new Response(content, { + headers: { 'Content-Type': 'text/html' }, + }); + }, + }); + }); + + afterAll(() => { + bootstrapServer?.stop(); + try { fs.rmSync(bootstrapDir, { recursive: true, force: true }); } catch {} + }); + + test('/qa bootstrap + regression test on zero-test project', async () => { + const serverUrl = `http://127.0.0.1:${bootstrapServer!.port}`; + + const result = await runSkillTest({ + prompt: `You have a browse binary at ${browseBin}. Assign it to B variable like: B="${browseBin}" + +Read the file qa/SKILL.md for the QA workflow instructions. + +Run a Quick-tier QA test on ${serverUrl} +The source code for this page is at ${bootstrapDir}/index.html — you can fix bugs there. +Do NOT use AskUserQuestion — for any AskUserQuestion prompts, choose the RECOMMENDED option automatically. +Write your report to ${bootstrapDir}/qa-reports/qa-report.md + +This project has NO test framework. When the bootstrap asks, pick vitest (option A). 
+This is a test+fix loop: find bugs, fix them, write regression tests, commit each fix.`, + workingDirectory: bootstrapDir, + maxTurns: 50, + allowedTools: ['Bash', 'Read', 'Write', 'Edit', 'Glob', 'Grep'], + timeout: 420_000, + testName: 'qa-bootstrap', + runId, + }); + + logCost('/qa bootstrap', result); + recordE2E('/qa bootstrap + regression test', 'Test Bootstrap E2E', result, { + passed: ['success', 'error_max_turns'].includes(result.exitReason), + }); + + expect(['success', 'error_max_turns']).toContain(result.exitReason); + + // Verify bootstrap created test infrastructure + const hasTestConfig = fs.existsSync(path.join(bootstrapDir, 'vitest.config.ts')) + || fs.existsSync(path.join(bootstrapDir, 'vitest.config.js')) + || fs.existsSync(path.join(bootstrapDir, 'jest.config.js')) + || fs.existsSync(path.join(bootstrapDir, 'jest.config.ts')); + console.log(`Test config created: ${hasTestConfig}`); + + const hasTestingMd = fs.existsSync(path.join(bootstrapDir, 'TESTING.md')); + console.log(`TESTING.md created: ${hasTestingMd}`); + + // Check for bootstrap commit + const gitLog = spawnSync('git', ['log', '--oneline', '--grep=bootstrap'], { + cwd: bootstrapDir, stdio: 'pipe', + }); + const bootstrapCommits = gitLog.stdout.toString().trim(); + console.log(`Bootstrap commits: ${bootstrapCommits || 'none'}`); + + // Check for regression test commits + const regressionLog = spawnSync('git', ['log', '--oneline', '--grep=test(qa)'], { + cwd: bootstrapDir, stdio: 'pipe', + }); + const regressionCommits = regressionLog.stdout.toString().trim(); + console.log(`Regression test commits: ${regressionCommits || 'none'}`); + + // Verify at least the bootstrap happened (fix commits are bonus) + const allCommits = spawnSync('git', ['log', '--oneline'], { + cwd: bootstrapDir, stdio: 'pipe', + }); + const totalCommits = allCommits.stdout.toString().trim().split('\n').length; + console.log(`Total commits: ${totalCommits}`); + expect(totalCommits).toBeGreaterThan(1); // At least 
initial + bootstrap + }, 420_000); +}); + +// --- Test Coverage Audit E2E --- + +describeE2E('Test Coverage Audit E2E', () => { + let coverageDir: string; + + beforeAll(() => { + coverageDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-coverage-')); + + // Copy ship skill files + copyDirSync(path.join(ROOT, 'ship'), path.join(coverageDir, 'ship')); + copyDirSync(path.join(ROOT, 'review'), path.join(coverageDir, 'review')); + + // Create a Node.js project WITH test framework but coverage gaps + fs.writeFileSync(path.join(coverageDir, 'package.json'), JSON.stringify({ + name: 'test-coverage-app', + version: '1.0.0', + type: 'module', + scripts: { test: 'echo "no tests yet"' }, + devDependencies: { vitest: '^1.0.0' }, + }, null, 2)); + + // Create vitest config + fs.writeFileSync(path.join(coverageDir, 'vitest.config.ts'), + `import { defineConfig } from 'vitest/config';\nexport default defineConfig({ test: {} });\n`); + + fs.writeFileSync(path.join(coverageDir, 'VERSION'), '0.1.0.0\n'); + fs.writeFileSync(path.join(coverageDir, 'CHANGELOG.md'), '# Changelog\n'); + + // Create source file with multiple code paths + fs.mkdirSync(path.join(coverageDir, 'src'), { recursive: true }); + fs.writeFileSync(path.join(coverageDir, 'src', 'billing.ts'), ` +export function processPayment(amount: number, currency: string) { + if (amount <= 0) throw new Error('Invalid amount'); + if (currency !== 'USD' && currency !== 'EUR') throw new Error('Unsupported currency'); + return { status: 'success', amount, currency }; +} + +export function refundPayment(paymentId: string, reason: string) { + if (!paymentId) throw new Error('Payment ID required'); + if (!reason) throw new Error('Reason required'); + return { status: 'refunded', paymentId, reason }; +} +`); + + // Create a test directory with ONE test (partial coverage) + fs.mkdirSync(path.join(coverageDir, 'test'), { recursive: true }); + fs.writeFileSync(path.join(coverageDir, 'test', 'billing.test.ts'), ` +import { describe, 
test, expect } from 'vitest'; +import { processPayment } from '../src/billing'; + +describe('processPayment', () => { + test('processes valid payment', () => { + const result = processPayment(100, 'USD'); + expect(result.status).toBe('success'); + }); + // GAP: no test for invalid amount + // GAP: no test for unsupported currency + // GAP: refundPayment not tested at all +}); +`); + + // Init git repo with main branch + const run = (cmd: string, args: string[]) => + spawnSync(cmd, args, { cwd: coverageDir, stdio: 'pipe', timeout: 5000 }); + run('git', ['init', '-b', 'main']); + run('git', ['config', 'user.email', 'test@test.com']); + run('git', ['config', 'user.name', 'Test']); + run('git', ['add', '.']); + run('git', ['commit', '-m', 'initial commit']); + + // Create feature branch + run('git', ['checkout', '-b', 'feature/billing']); + }); + + afterAll(() => { + try { fs.rmSync(coverageDir, { recursive: true, force: true }); } catch {} + }); + + test('/ship Step 3.4 produces coverage diagram', async () => { + const result = await runSkillTest({ + prompt: `Read the file ship/SKILL.md for the ship workflow instructions. + +You are on the feature/billing branch. The base branch is main. +This is a test project — there is no remote, no PR to create. + +ONLY run Step 3.4 (Test Coverage Audit) from the ship workflow. +Skip all other steps (tests, evals, review, version, changelog, commit, push, PR). + +The source code is in ${coverageDir}/src/billing.ts. +Existing tests are in ${coverageDir}/test/billing.test.ts. +The test command is: echo "tests pass" (mocked — just pretend tests pass). + +Produce the ASCII coverage diagram showing which code paths are tested and which have gaps. +Do NOT generate new tests — just produce the diagram and coverage summary. 
+Output the diagram directly.`, + workingDirectory: coverageDir, + maxTurns: 15, + allowedTools: ['Bash', 'Read', 'Write', 'Edit', 'Glob', 'Grep'], + timeout: 120_000, + testName: 'ship-coverage-audit', + runId, + }); + + logCost('/ship coverage audit', result); + recordE2E('/ship Step 3.4 coverage audit', 'Test Coverage Audit E2E', result, { + passed: result.exitReason === 'success', + }); + + expect(result.exitReason).toBe('success'); + + // Check output contains coverage diagram elements + const output = result.output || ''; + const hasGap = output.includes('GAP') || output.includes('gap') || output.includes('NO TEST'); + const hasTested = output.includes('TESTED') || output.includes('tested') || output.includes('✓'); + const hasCoverage = output.includes('COVERAGE') || output.includes('coverage') || output.includes('paths tested'); + + console.log(`Output has GAP markers: ${hasGap}`); + console.log(`Output has TESTED markers: ${hasTested}`); + console.log(`Output has coverage summary: ${hasCoverage}`); + + // At minimum, the agent should have read the source and test files + const readCalls = result.toolCalls.filter(tc => tc.tool === 'Read'); + expect(readCalls.length).toBeGreaterThan(0); + }, 180_000); +}); + // Module-level afterAll — finalize eval collector after all tests complete afterAll(async () => { if (evalCollector) { diff --git a/test/skill-validation.test.ts b/test/skill-validation.test.ts index 78a9bef7..d03cdb39 100644 --- a/test/skill-validation.test.ts +++ b/test/skill-validation.test.ts @@ -707,3 +707,201 @@ describe('gstack-slug', () => { expect(lines[1]).toMatch(/^BRANCH=.+/); }); }); + +// --- Test Bootstrap validation --- + +describe('Test Bootstrap ({{TEST_BOOTSTRAP}}) integration', () => { + test('TEST_BOOTSTRAP resolver produces valid content', () => { + const qaContent = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8'); + expect(qaContent).toContain('Test Framework Bootstrap'); + expect(qaContent).toContain('RUNTIME:ruby'); + 
expect(qaContent).toContain('RUNTIME:node'); + expect(qaContent).toContain('RUNTIME:python'); + expect(qaContent).toContain('no-test-bootstrap'); + expect(qaContent).toContain('BOOTSTRAP_DECLINED'); + }); + + test('TEST_BOOTSTRAP appears in qa/SKILL.md', () => { + const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8'); + expect(content).toContain('Test Framework Bootstrap'); + expect(content).toContain('TESTING.md'); + expect(content).toContain('CLAUDE.md'); + }); + + test('TEST_BOOTSTRAP appears in ship/SKILL.md', () => { + const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8'); + expect(content).toContain('Test Framework Bootstrap'); + expect(content).toContain('Step 2.5'); + }); + + test('TEST_BOOTSTRAP appears in qa-design-review/SKILL.md', () => { + const content = fs.readFileSync(path.join(ROOT, 'qa-design-review', 'SKILL.md'), 'utf-8'); + expect(content).toContain('Test Framework Bootstrap'); + }); + + test('TEST_BOOTSTRAP does NOT appear in qa-only/SKILL.md', () => { + const content = fs.readFileSync(path.join(ROOT, 'qa-only', 'SKILL.md'), 'utf-8'); + expect(content).not.toContain('Test Framework Bootstrap'); + // But should have the recommendation note + expect(content).toContain('No test framework detected'); + expect(content).toContain('Run `/qa` to bootstrap'); + }); + + test('bootstrap includes framework knowledge table', () => { + const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8'); + expect(content).toContain('vitest'); + expect(content).toContain('minitest'); + expect(content).toContain('pytest'); + expect(content).toContain('cargo test'); + expect(content).toContain('phpunit'); + expect(content).toContain('ExUnit'); + }); + + test('bootstrap includes CI/CD pipeline generation', () => { + const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8'); + expect(content).toContain('.github/workflows/test.yml'); + expect(content).toContain('GitHub Actions'); + }); + + 
test('bootstrap includes first real tests step', () => { + const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8'); + expect(content).toContain('First real tests'); + expect(content).toContain('git log --since=30.days'); + expect(content).toContain('Prioritize by risk'); + }); + + test('bootstrap includes vibe coding philosophy', () => { + const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8'); + expect(content).toContain('vibe coding'); + expect(content).toContain('100% test coverage'); + }); + + test('WebSearch is in allowed-tools for qa, ship, qa-design-review', () => { + const qa = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8'); + const ship = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8'); + const qaDesign = fs.readFileSync(path.join(ROOT, 'qa-design-review', 'SKILL.md'), 'utf-8'); + expect(qa).toContain('WebSearch'); + expect(ship).toContain('WebSearch'); + expect(qaDesign).toContain('WebSearch'); + }); +}); + +// --- Phase 8e.5 regression test validation --- + +describe('Phase 8e.5 regression test generation', () => { + test('qa/SKILL.md contains Phase 8e.5', () => { + const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8'); + expect(content).toContain('8e.5. Regression Test'); + expect(content).toContain('test(qa): regression test'); + expect(content).toContain('WTF-likelihood exclusion'); + }); + + test('qa/SKILL.md Rule 13 is amended for regression tests', () => { + const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8'); + expect(content).toContain('Only modify tests when generating regression tests in Phase 8e.5'); + expect(content).not.toContain('Never modify tests or CI configuration'); + }); + + test('qa-design-review has CSS-aware Phase 8e.5 variant', () => { + const content = fs.readFileSync(path.join(ROOT, 'qa-design-review', 'SKILL.md'), 'utf-8'); + expect(content).toContain('8e.5. 
Regression Test (design-review variant)'); + expect(content).toContain('CSS-only'); + expect(content).toContain('test(design): regression test'); + }); + + test('regression test includes full attribution comment format', () => { + const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8'); + expect(content).toContain('// Regression: ISSUE-NNN'); + expect(content).toContain('// Found by /qa on'); + expect(content).toContain('// Report: .gstack/qa-reports/'); + }); + + test('regression test uses auto-incrementing names', () => { + const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8'); + expect(content).toContain('auto-incrementing'); + expect(content).toContain('max number + 1'); + }); +}); + +// --- Step 3.4 coverage audit validation --- + +describe('Step 3.4 test coverage audit', () => { + test('ship/SKILL.md contains Step 3.4', () => { + const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8'); + expect(content).toContain('Step 3.4: Test Coverage Audit'); + expect(content).toContain('CODE PATH COVERAGE MAP'); + }); + + test('Step 3.4 includes quality scoring rubric', () => { + const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8'); + expect(content).toContain('★★★'); + expect(content).toContain('★★'); + expect(content).toContain('edge cases AND error paths'); + expect(content).toContain('happy path only'); + }); + + test('Step 3.4 includes before/after test count', () => { + const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8'); + expect(content).toContain('Count test files before'); + expect(content).toContain('Count test files after'); + }); + + test('ship PR body includes Test Coverage section', () => { + const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8'); + expect(content).toContain('## Test Coverage'); + }); + + test('ship rules include test generation rule', () => { + const content = fs.readFileSync(path.join(ROOT, 'ship', 
'SKILL.md'), 'utf-8'); + expect(content).toContain('Step 3.4 generates coverage tests'); + expect(content).toContain('Never commit failing tests'); + }); + + test('Step 3.4 includes vibe coding philosophy', () => { + const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8'); + expect(content).toContain('vibe coding becomes yolo coding'); + }); +}); + +// --- Retro test health validation --- + +describe('Retro test health tracking', () => { + test('retro/SKILL.md has test health data gathering commands', () => { + const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8'); + expect(content).toContain('# 10. Test file count'); + expect(content).toContain('# 11. Regression test commits'); + expect(content).toContain('# 12. Test files changed'); + }); + + test('retro/SKILL.md has Test Health metrics row', () => { + const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8'); + expect(content).toContain('Test Health'); + expect(content).toContain('regression tests'); + }); + + test('retro/SKILL.md has Test Health narrative section', () => { + const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8'); + expect(content).toContain('### Test Health'); + expect(content).toContain('Total test files'); + expect(content).toContain('vibe coding safe'); + }); + + test('retro JSON schema includes test_health field', () => { + const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8'); + expect(content).toContain('test_health'); + expect(content).toContain('total_test_files'); + expect(content).toContain('regression_test_commits'); + }); +}); + +// --- QA report template regression tests section --- + +describe('QA report template', () => { + test('qa-report-template.md has Regression Tests section', () => { + const content = fs.readFileSync(path.join(ROOT, 'qa', 'templates', 'qa-report-template.md'), 'utf-8'); + expect(content).toContain('## Regression Tests'); + 
expect(content).toContain('committed / deferred / skipped'); + expect(content).toContain('### Deferred Tests'); + expect(content).toContain('**Precondition:**'); + }); +});
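The `/qa bootstrap` E2E test checks for four possible test config files with a chain of `fs.existsSync` calls. If the bootstrap later learns more frameworks, a small table-driven helper would keep that list in one place. Below is a minimal sketch, assuming Node's `fs`/`path`/`os` APIs as already used in the test file. `findTestConfig` and `TEST_CONFIG_CANDIDATES` are hypothetical names and are not part of this diff.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical list: mirrors the four filenames the 'qa-bootstrap' test
// checks inline. Order matters, because the first match wins.
const TEST_CONFIG_CANDIDATES = [
  'vitest.config.ts',
  'vitest.config.js',
  'jest.config.ts',
  'jest.config.js',
] as const;

/** Returns the first candidate config present in `dir`, or null if none exists. */
function findTestConfig(dir: string): string | null {
  for (const name of TEST_CONFIG_CANDIDATES) {
    if (fs.existsSync(path.join(dir, name))) return name;
  }
  return null;
}

// Usage: a throwaway directory containing only a vitest config.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'find-config-'));
fs.writeFileSync(path.join(demoDir, 'vitest.config.ts'), 'export default {};\n');
console.log(findTestConfig(demoDir)); // prints vitest.config.ts
fs.rmSync(demoDir, { recursive: true, force: true });
```

Returning the matched filename instead of a boolean would let the test log which framework the agent actually set up. The prompt asks for vitest, but the existing check also accepts jest.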
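The `/ship` coverage-audit E2E test detects diagram markers with chained `String.prototype.includes` calls (`hasGap`, `hasTested`, `hasCoverage`). The same heuristic can be expressed as a marker table. This is a sketch only: `scanCoverageMarkers` and `COVERAGE_MARKERS` are hypothetical names, and the marker strings are copied verbatim from the test.

```typescript
type MarkerKind = 'gap' | 'tested' | 'coverage';

// Marker strings copied from the 'ship-coverage-audit' test. Matching is
// case-sensitive, exactly like the original chained includes() calls.
const COVERAGE_MARKERS: Record<MarkerKind, readonly string[]> = {
  gap: ['GAP', 'gap', 'NO TEST'],
  tested: ['TESTED', 'tested', '✓'],
  coverage: ['COVERAGE', 'coverage', 'paths tested'],
};

/** Reports which marker groups appear anywhere in the agent's output. */
function scanCoverageMarkers(output: string): Record<MarkerKind, boolean> {
  const result = { gap: false, tested: false, coverage: false };
  for (const kind of Object.keys(COVERAGE_MARKERS) as MarkerKind[]) {
    result[kind] = COVERAGE_MARKERS[kind].some((marker) => output.includes(marker));
  }
  return result;
}

// Usage: gap and tested come back true, coverage comes back false.
const sample = 'processPayment: [TESTED] happy path\nrefundPayment: GAP (no tests)';
console.log(scanCoverageMarkers(sample));
```

In the test itself these flags are only logged. The hard assertion is that the agent made at least one `Read` call, so a helper like this stays diagnostic unless the assertions are tightened. Because matching is case-sensitive, output that only says `Gap` would not register as a gap marker.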