mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-05 05:05:08 +02:00
feat: test bootstrap, regression tests, coverage audit, retro test health
- Add {{TEST_BOOTSTRAP}} resolver to gen-skill-docs.ts
- Add Phase 8e.5 regression test generation to /qa and /qa-design-review
- Add Step 3.4 test coverage audit with quality scoring to /ship
- Add test health tracking to /retro
- Add 2 E2E evals (bootstrap + coverage audit)
- Add 26 validation tests
- Update ARCHITECTURE.md placeholder table
- Add 2 P3 TODOs (CI/CD non-GitHub, auto-upgrade weak tests)
@@ -203,6 +203,8 @@ Templates contain the workflows, tips, and examples that require human judgment.
| `{{BASE_BRANCH_DETECT}}` | `gen-skill-docs.ts` | Dynamic base branch detection for PR-targeting skills (ship, review, qa, plan-ceo-review) |
| `{{QA_METHODOLOGY}}` | `gen-skill-docs.ts` | Shared QA methodology block for /qa and /qa-only |
| `{{DESIGN_METHODOLOGY}}` | `gen-skill-docs.ts` | Shared design audit methodology for /plan-design-review and /qa-design-review |
| `{{REVIEW_DASHBOARD}}` | `gen-skill-docs.ts` | Review Readiness Dashboard for /ship pre-flight |
| `{{TEST_BOOTSTRAP}}` | `gen-skill-docs.ts` | Test framework detection, bootstrap, CI/CD setup for /qa, /ship, /qa-design-review |

This is structurally sound — if a command exists in code, it appears in docs. If it doesn't exist, it can't appear.

@@ -263,6 +263,30 @@
**Effort:** S
**Priority:** P3

### CI/CD generation for non-GitHub providers

**What:** Extend CI/CD bootstrap to generate GitLab CI (`.gitlab-ci.yml`), CircleCI (`.circleci/config.yml`), and Bitrise pipelines.

**Why:** Not all projects use GitHub Actions. Universal CI/CD bootstrap would make test bootstrap work for everyone.

**Context:** v1 ships with GitHub Actions only. Detection logic already checks for `.gitlab-ci.yml`, `.circleci/`, `bitrise.yml` and skips with an informational note. Each provider needs ~20 lines of template text in `generateTestBootstrap()`.

**Effort:** M
**Priority:** P3
**Depends on:** Test bootstrap (shipped)

### Auto-upgrade weak tests (★) to strong tests (★★★)

**What:** When Step 3.4 coverage audit identifies existing ★-rated tests (smoke/trivial assertions), generate improved versions testing edge cases and error paths.

**Why:** Many codebases have tests that technically exist but don't catch real bugs — `expect(component).toBeDefined()` isn't testing behavior. Upgrading these closes the gap between "has tests" and "has good tests."

**Context:** Requires the quality scoring rubric from the test coverage audit. Modifying existing test files is riskier than creating new ones — needs careful diffing to ensure the upgraded test still passes. Consider creating a companion test file rather than modifying the original.

**Effort:** M
**Priority:** P3
**Depends on:** Test quality scoring (shipped)

## Retro

### Deployment health tracking (retro + browse)
+169
-1
@@ -14,6 +14,7 @@ allowed-tools:
- Glob
- Grep
- AskUserQuestion
- WebSearch
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
@@ -136,6 +137,161 @@ If `NEEDS_SETUP`:
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`

**Check test framework (bootstrap if needed):**

## Test Framework Bootstrap

**Detect existing test framework and project runtime:**

```bash
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
[ -f requirements.txt ] || [ -f pyproject.toml ] && echo "RUNTIME:python"
[ -f go.mod ] && echo "RUNTIME:go"
[ -f Cargo.toml ] && echo "RUNTIME:rust"
[ -f composer.json ] && echo "RUNTIME:php"
[ -f mix.exs ] && echo "RUNTIME:elixir"
# Detect sub-frameworks
[ -f Gemfile ] && grep -q "rails" Gemfile 2>/dev/null && echo "FRAMEWORK:rails"
[ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && echo "FRAMEWORK:nextjs"
# Check for existing test infrastructure
ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
# Check opt-out marker
[ -f .gstack/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED"
```

**If test framework detected** (config files or test directories found):
Print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap."
Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns).
Store conventions as prose context for use in Phase 8e.5 or Step 3.4. **Skip the rest of bootstrap.**

**If BOOTSTRAP_DECLINED** appears: Print "Test bootstrap previously declined — skipping." **Skip the rest of bootstrap.**

**If NO runtime detected** (no config files found): Use AskUserQuestion:
"I couldn't detect your project's language. What runtime are you using?"
Options: A) Node.js/TypeScript B) Ruby/Rails C) Python D) Go E) Rust F) PHP G) Elixir H) This project doesn't need tests.
If user picks H → write `.gstack/no-test-bootstrap` and continue without tests.
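
The branch order described above (existing framework first, then the opt-out marker, then the runtime) can be sketched as a single shell function. This is a minimal sketch under stated assumptions, not gstack's implementation: the name `bootstrap_decision` and its output labels are hypothetical, and it assumes `pyproject.toml` on its own should not count as test infrastructure, because that file also exists in Python projects that have no tests.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of gstack): map a project directory to one
# bootstrap outcome. Output labels are illustrative.
bootstrap_decision() {
  local dir="${1:-.}" runtimes="" path

  # 1. Existing test infrastructure: dedicated config files or test directories.
  #    Assumption: pyproject.toml is skipped here because it is not test-specific.
  for path in "$dir"/jest.config.* "$dir"/vitest.config.* "$dir"/playwright.config.* \
              "$dir/.rspec" "$dir/pytest.ini" "$dir/phpunit.xml"; do
    if [ -f "$path" ]; then echo "framework-detected"; return 0; fi
  done
  for path in test tests spec __tests__ cypress e2e; do
    if [ -d "$dir/$path" ]; then echo "framework-detected"; return 0; fi
  done

  # 2. Opt-out marker written when the user declined bootstrap earlier.
  if [ -f "$dir/.gstack/no-test-bootstrap" ]; then echo "declined"; return 0; fi

  # 3. Runtime detection; a monorepo may match more than one.
  if [ -f "$dir/Gemfile" ]; then runtimes="$runtimes ruby"; fi
  if [ -f "$dir/package.json" ]; then runtimes="$runtimes node"; fi
  if [ -f "$dir/requirements.txt" ] || [ -f "$dir/pyproject.toml" ]; then
    runtimes="$runtimes python"
  fi
  if [ -f "$dir/go.mod" ]; then runtimes="$runtimes go"; fi
  if [ -f "$dir/Cargo.toml" ]; then runtimes="$runtimes rust"; fi
  if [ -f "$dir/composer.json" ]; then runtimes="$runtimes php"; fi
  if [ -f "$dir/mix.exs" ]; then runtimes="$runtimes elixir"; fi

  # No runtime marker at all: the skill would fall back to asking the user.
  if [ -z "$runtimes" ]; then echo "ask-user"; return 0; fi
  echo "bootstrap:${runtimes# }"
}
```

Calling `bootstrap_decision .` in a repository root prints exactly one label, so a caller can branch on it with a `case` statement. When several runtimes match, the label lists all of them, which corresponds to the monorepo question in B3.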

**If runtime detected but no test framework — bootstrap:**

### B2. Research best practices

Use WebSearch to find current best practices for the detected runtime:
- `"[runtime] best test framework 2025 2026"`
- `"[framework A] vs [framework B] comparison"`

If WebSearch is unavailable, use this built-in knowledge table:

| Runtime | Primary recommendation | Alternative |
|---------|----------------------|-------------|
| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot + shoulda-matchers |
| Node.js | vitest + @testing-library | jest + @testing-library |
| Next.js | vitest + @testing-library/react + playwright | jest + cypress |
| Python | pytest + pytest-cov | unittest |
| Go | stdlib testing + testify | stdlib only |
| Rust | cargo test (built-in) + mockall | — |
| PHP | phpunit + mockery | pest |
| Elixir | ExUnit (built-in) + ex_machina | — |

### B3. Framework selection

Use AskUserQuestion:
"I detected this is a [Runtime/Framework] project with no test framework. I researched current best practices. Here are the options:
A) [Primary] — [rationale]. Includes: [packages]. Supports: unit, integration, smoke, e2e
B) [Alternative] — [rationale]. Includes: [packages]
C) Skip — don't set up testing right now
RECOMMENDATION: Choose A because [reason based on project context]"

If user picks C → write `.gstack/no-test-bootstrap`. Tell user: "If you change your mind later, delete `.gstack/no-test-bootstrap` and re-run." Continue without tests.

If multiple runtimes detected (monorepo) → ask which runtime to set up first, with option to do both sequentially.

### B4. Install and configure

1. Install the chosen packages (npm/bun/gem/pip/etc.)
2. Create minimal config file
3. Create directory structure (test/, spec/, etc.)
4. Create one example test matching the project's code to verify setup works

If package installation fails → debug once. If still failing → revert with `git checkout -- package.json package-lock.json` (or equivalent for the runtime). Warn user and continue without tests.

### B4.5. First real tests

Generate 3-5 real tests for existing code:

1. **Find recently changed files:** `git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -10`
2. **Prioritize by risk:** Error handlers > business logic with conditionals > API endpoints > pure functions
3. **For each file:** Write one test that tests real behavior with meaningful assertions. Never `expect(x).toBeDefined()` — test what the code DOES.
4. Run each test. Passes → keep. Fails → fix once. Still fails → delete silently.
5. Generate at least 1 test, cap at 5.

Never import secrets, API keys, or credentials in test files. Use environment variables or test fixtures.
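
The ranking pipeline in step 1 can be tried without a repository by feeding it file names on stdin. The sketch below is illustrative only: `rank_changed_files` is a hypothetical name, and the blank-line filter is an added precaution in case the log output contains empty separator lines, which would otherwise be counted as a path.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one file path per line on stdin and print the
# most frequently listed paths, most frequent first.
rank_changed_files() {
  local limit="${1:-10}"
  grep -v '^[[:space:]]*$' |      # drop blank lines
    sort |                        # group identical paths together
    uniq -c |                     # prefix each path with its count
    sort -rn |                    # highest count first
    head -n "$limit" |            # keep the top N
    sed -E 's/^ *[0-9]+ //'       # strip the count, leaving only the path
}

# Intended use inside a git repository (same git flags as step 1):
#   git log --since=30.days --name-only --format="" | rank_changed_files 10
```

Keeping the function free of git calls means the candidate list from step 1 can be checked against a hand-written sample before it drives test generation.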

### B5. Verify

```bash
# Run the full test suite to confirm everything works
{detected test command}
```

If tests fail → debug once. If still failing → revert all bootstrap changes and warn user.

### B5.5. CI/CD pipeline

```bash
# Check CI provider
ls -d .github/ 2>/dev/null && echo "CI:github"
ls .gitlab-ci.yml .circleci/ bitrise.yml 2>/dev/null
```

If `.github/` exists (or no CI detected — default to GitHub Actions):
Create `.github/workflows/test.yml` with:
- `runs-on: ubuntu-latest`
- Appropriate setup action for the runtime (setup-node, setup-ruby, setup-python, etc.)
- The same test command verified in B5
- Trigger: push + pull_request

If non-GitHub CI detected → skip CI generation with note: "Detected {provider} — CI pipeline generation supports GitHub Actions only. Add test step to your existing pipeline manually."
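
As a rough illustration of the workflow this step asks for, the sketch below detects the provider and writes a Node.js `test.yml`. Everything in it is an assumption made for the example: the function names, the choice of Node 20 and `npm ci`, the pinned action versions, and the rule that `.github/` wins when several providers are present. The skill itself leaves those details to the agent.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which CI provider a project directory uses.
# Assumption: .github/ takes precedence, and "github" is also the default.
detect_ci_provider() {
  local dir="${1:-.}"
  if [ -d "$dir/.github" ]; then echo "github"
  elif [ -f "$dir/.gitlab-ci.yml" ]; then echo "gitlab"
  elif [ -d "$dir/.circleci" ]; then echo "circleci"
  elif [ -f "$dir/bitrise.yml" ]; then echo "bitrise"
  else echo "github"
  fi
}

# Hypothetical helper: write a minimal GitHub Actions workflow for a Node.js
# project. $2 is the test command that was verified in B5.
write_node_test_workflow() {
  local dir="$1" test_cmd="$2"
  mkdir -p "$dir/.github/workflows"
  cat > "$dir/.github/workflows/test.yml" <<EOF
name: test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: $test_cmd
EOF
}

# Example: generate the workflow only when GitHub Actions is the target.
#   [ "$(detect_ci_provider .)" = "github" ] && write_node_test_workflow . "npx vitest run"
```

Passing the test command as an argument keeps the workflow in step with whatever command B5 actually verified, which is the requirement stated in the list above.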

### B6. Create TESTING.md

First check: If TESTING.md already exists → read it and update/append rather than overwriting. Never destroy existing content.

Write TESTING.md with:
- Philosophy: "100% test coverage is the key to great vibe coding. Tests let you move fast, trust your instincts, and ship with confidence — without them, vibe coding is just yolo coding. With tests, it's a superpower."
- Framework name and version
- How to run tests (the verified command from B5)
- Test layers: Unit tests (what, where, when), Integration tests, Smoke tests, E2E tests
- Conventions: file naming, assertion style, setup/teardown patterns

### B7. Update CLAUDE.md

First check: If CLAUDE.md already has a `## Testing` section → skip. Don't duplicate.

Append a `## Testing` section:
- Run command and test directory
- Reference to TESTING.md
- Test expectations:
  - 100% test coverage is the goal — tests make vibe coding safe
  - When writing new functions, write a corresponding test
  - When fixing a bug, write a regression test
  - When adding error handling, write a test that triggers the error
  - When adding a conditional (if/else, switch), write tests for BOTH paths
  - Never commit code that makes existing tests fail

### B8. Commit

```bash
git status --porcelain
```

Only commit if there are changes. Stage all bootstrap files (config, test directory, TESTING.md, CLAUDE.md, .github/workflows/test.yml if created):
`git commit -m "chore: bootstrap test framework ({framework name})"`
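
One way to express "only commit if there are changes" is to stage the bootstrap paths that exist and then ask git whether anything is staged. The sketch below is an assumption-laden illustration, not the skill's own logic: `commit_bootstrap` is a hypothetical name, the caller supplies the path list, and it checks the staged diff rather than parsing `git status --porcelain`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: stage the given bootstrap paths and commit only when
# something is actually staged. $1 is the framework name; the rest are paths.
commit_bootstrap() {
  local framework="$1" path
  shift
  for path in "$@"; do
    # Skip paths that were never created (for example, no CI workflow).
    if [ -e "$path" ]; then git add -- "$path"; fi
  done
  # `git diff --cached --quiet` exits 0 when the index matches HEAD.
  if git diff --cached --quiet; then
    echo "nothing to commit"
    return 0
  fi
  git commit -q -m "chore: bootstrap test framework ($framework)"
}

# Example call with paths named in the step above:
#   commit_bootstrap vitest vitest.config.ts test TESTING.md CLAUDE.md .github/workflows/test.yml
```

Staging explicit paths instead of running `git add -A` keeps unrelated working-tree changes out of the bootstrap commit.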

---

**Create output directories:**

```bash
@@ -565,6 +721,18 @@ Take **before/after screenshot pair** for every fix.
- **best-effort**: fix applied but couldn't fully verify (e.g., needs specific browser state)
- **reverted**: regression detected → `git revert HEAD` → mark finding as "deferred"

### 8e.5. Regression Test (design-review variant)

Design fixes are typically CSS-only. Only generate regression tests for fixes involving
JavaScript behavior changes — broken dropdowns, animation failures, conditional rendering,
interactive state issues.

For CSS-only fixes: skip entirely. CSS regressions are caught by re-running /qa-design-review.

If the fix involved JS behavior: follow the same procedure as /qa Phase 8e.5 (study existing
test patterns, write a regression test encoding the exact bug condition, run it, commit if
passes or defer if fails). Commit format: `test(design): regression test for FINDING-NNN`.

### 8f. Self-Regulation (STOP AND EVALUATE)

Every 5 fixes (or after any revert), compute the design-fix risk level:
@@ -639,7 +807,7 @@ If the repo has a `TODOS.md`:

11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
12. **One commit per fix.** Never bundle multiple design fixes into one commit.
-13. **Never modify tests or CI configuration.** Only fix application source code and styles.
+13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files.
14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
15. **Self-regulate.** Follow the design-fix risk heuristic. When in doubt, stop and ask.
16. **CSS-first.** Prefer CSS/styling changes over structural component changes. CSS-only changes are safer and more reversible.

@@ -14,6 +14,7 @@ allowed-tools:
- Glob
- Grep
- AskUserQuestion
- WebSearch
---

{{PREAMBLE}}
@@ -54,6 +55,10 @@ fi

{{BROWSE_SETUP}}

**Check test framework (bootstrap if needed):**

{{TEST_BOOTSTRAP}}

**Create output directories:**

```bash
@@ -153,6 +158,18 @@ Take **before/after screenshot pair** for every fix.
- **best-effort**: fix applied but couldn't fully verify (e.g., needs specific browser state)
- **reverted**: regression detected → `git revert HEAD` → mark finding as "deferred"

### 8e.5. Regression Test (design-review variant)

Design fixes are typically CSS-only. Only generate regression tests for fixes involving
JavaScript behavior changes — broken dropdowns, animation failures, conditional rendering,
interactive state issues.

For CSS-only fixes: skip entirely. CSS regressions are caught by re-running /qa-design-review.

If the fix involved JS behavior: follow the same procedure as /qa Phase 8e.5 (study existing
test patterns, write a regression test encoding the exact bug condition, run it, commit if
passes or defer if fails). Commit format: `test(design): regression test for FINDING-NNN`.

### 8f. Self-Regulation (STOP AND EVALUATE)

Every 5 fixes (or after any revert), compute the design-fix risk level:
@@ -227,7 +244,7 @@ If the repo has a `TODOS.md`:

11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
12. **One commit per fix.** Never bundle multiple design fixes into one commit.
-13. **Never modify tests or CI configuration.** Only fix application source code and styles.
+13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files.
14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
15. **Self-regulate.** Follow the design-fix risk heuristic. When in doubt, stop and ask.
16. **CSS-first.** Prefer CSS/styling changes over structural component changes. CSS-only changes are safer and more reversible.

@@ -452,3 +452,4 @@ Report filenames use the domain and date: `qa-report-myapp-com-2026-03-12.md`
## Additional Rules (qa-only specific)

11. **Never fix bugs.** Find and document only. Do not read source code, edit files, or suggest fixes in the report. Your job is to report what's broken, not to fix it. Use `/qa` for the test-fix-verify loop.
12. **No test framework detected?** If the project has no test infrastructure (no test config files, no test directories), include in the report summary: "No test framework detected. Run `/qa` to bootstrap one and enable regression test generation."

@@ -97,3 +97,4 @@ Report filenames use the domain and date: `qa-report-myapp-com-2026-03-12.md`
## Additional Rules (qa-only specific)

11. **Never fix bugs.** Find and document only. Do not read source code, edit files, or suggest fixes in the report. Your job is to report what's broken, not to fix it. Use `/qa` for the test-fix-verify loop.
12. **No test framework detected?** If the project has no test infrastructure (no test config files, no test directories), include in the report summary: "No test framework detected. Run `/qa` to bootstrap one and enable regression test generation."

+203
-1
@@ -16,6 +16,7 @@ allowed-tools:
- Glob
- Grep
- AskUserQuestion
- WebSearch
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
@@ -157,6 +158,161 @@ If `NEEDS_SETUP`:
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`

**Check test framework (bootstrap if needed):**

## Test Framework Bootstrap

**Detect existing test framework and project runtime:**

```bash
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
[ -f requirements.txt ] || [ -f pyproject.toml ] && echo "RUNTIME:python"
[ -f go.mod ] && echo "RUNTIME:go"
[ -f Cargo.toml ] && echo "RUNTIME:rust"
[ -f composer.json ] && echo "RUNTIME:php"
[ -f mix.exs ] && echo "RUNTIME:elixir"
# Detect sub-frameworks
[ -f Gemfile ] && grep -q "rails" Gemfile 2>/dev/null && echo "FRAMEWORK:rails"
[ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && echo "FRAMEWORK:nextjs"
# Check for existing test infrastructure
ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
# Check opt-out marker
[ -f .gstack/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED"
```

**If test framework detected** (config files or test directories found):
Print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap."
Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns).
Store conventions as prose context for use in Phase 8e.5 or Step 3.4. **Skip the rest of bootstrap.**

**If BOOTSTRAP_DECLINED** appears: Print "Test bootstrap previously declined — skipping." **Skip the rest of bootstrap.**

**If NO runtime detected** (no config files found): Use AskUserQuestion:
"I couldn't detect your project's language. What runtime are you using?"
Options: A) Node.js/TypeScript B) Ruby/Rails C) Python D) Go E) Rust F) PHP G) Elixir H) This project doesn't need tests.
If user picks H → write `.gstack/no-test-bootstrap` and continue without tests.

**If runtime detected but no test framework — bootstrap:**

### B2. Research best practices

Use WebSearch to find current best practices for the detected runtime:
- `"[runtime] best test framework 2025 2026"`
- `"[framework A] vs [framework B] comparison"`

If WebSearch is unavailable, use this built-in knowledge table:

| Runtime | Primary recommendation | Alternative |
|---------|----------------------|-------------|
| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot + shoulda-matchers |
| Node.js | vitest + @testing-library | jest + @testing-library |
| Next.js | vitest + @testing-library/react + playwright | jest + cypress |
| Python | pytest + pytest-cov | unittest |
| Go | stdlib testing + testify | stdlib only |
| Rust | cargo test (built-in) + mockall | — |
| PHP | phpunit + mockery | pest |
| Elixir | ExUnit (built-in) + ex_machina | — |

### B3. Framework selection

Use AskUserQuestion:
"I detected this is a [Runtime/Framework] project with no test framework. I researched current best practices. Here are the options:
A) [Primary] — [rationale]. Includes: [packages]. Supports: unit, integration, smoke, e2e
B) [Alternative] — [rationale]. Includes: [packages]
C) Skip — don't set up testing right now
RECOMMENDATION: Choose A because [reason based on project context]"

If user picks C → write `.gstack/no-test-bootstrap`. Tell user: "If you change your mind later, delete `.gstack/no-test-bootstrap` and re-run." Continue without tests.

If multiple runtimes detected (monorepo) → ask which runtime to set up first, with option to do both sequentially.

### B4. Install and configure

1. Install the chosen packages (npm/bun/gem/pip/etc.)
2. Create minimal config file
3. Create directory structure (test/, spec/, etc.)
4. Create one example test matching the project's code to verify setup works

If package installation fails → debug once. If still failing → revert with `git checkout -- package.json package-lock.json` (or equivalent for the runtime). Warn user and continue without tests.

### B4.5. First real tests

Generate 3-5 real tests for existing code:

1. **Find recently changed files:** `git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -10`
2. **Prioritize by risk:** Error handlers > business logic with conditionals > API endpoints > pure functions
3. **For each file:** Write one test that tests real behavior with meaningful assertions. Never `expect(x).toBeDefined()` — test what the code DOES.
4. Run each test. Passes → keep. Fails → fix once. Still fails → delete silently.
5. Generate at least 1 test, cap at 5.

Never import secrets, API keys, or credentials in test files. Use environment variables or test fixtures.

### B5. Verify

```bash
# Run the full test suite to confirm everything works
{detected test command}
```

If tests fail → debug once. If still failing → revert all bootstrap changes and warn user.

### B5.5. CI/CD pipeline

```bash
# Check CI provider
ls -d .github/ 2>/dev/null && echo "CI:github"
ls .gitlab-ci.yml .circleci/ bitrise.yml 2>/dev/null
```

If `.github/` exists (or no CI detected — default to GitHub Actions):
Create `.github/workflows/test.yml` with:
- `runs-on: ubuntu-latest`
- Appropriate setup action for the runtime (setup-node, setup-ruby, setup-python, etc.)
- The same test command verified in B5
- Trigger: push + pull_request

If non-GitHub CI detected → skip CI generation with note: "Detected {provider} — CI pipeline generation supports GitHub Actions only. Add test step to your existing pipeline manually."

### B6. Create TESTING.md

First check: If TESTING.md already exists → read it and update/append rather than overwriting. Never destroy existing content.

Write TESTING.md with:
- Philosophy: "100% test coverage is the key to great vibe coding. Tests let you move fast, trust your instincts, and ship with confidence — without them, vibe coding is just yolo coding. With tests, it's a superpower."
- Framework name and version
- How to run tests (the verified command from B5)
- Test layers: Unit tests (what, where, when), Integration tests, Smoke tests, E2E tests
- Conventions: file naming, assertion style, setup/teardown patterns

### B7. Update CLAUDE.md

First check: If CLAUDE.md already has a `## Testing` section → skip. Don't duplicate.

Append a `## Testing` section:
- Run command and test directory
- Reference to TESTING.md
- Test expectations:
  - 100% test coverage is the goal — tests make vibe coding safe
  - When writing new functions, write a corresponding test
  - When fixing a bug, write a regression test
  - When adding error handling, write a test that triggers the error
  - When adding a conditional (if/else, switch), write tests for BOTH paths
  - Never commit code that makes existing tests fail

### B8. Commit

```bash
git status --porcelain
```

Only commit if there are changes. Stage all bootstrap files (config, test directory, TESTING.md, CLAUDE.md, .github/workflows/test.yml if created):
`git commit -m "chore: bootstrap test framework ({framework name})"`

---

**Create output directories:**

```bash
@@ -541,6 +697,52 @@ $B snapshot -D
- **best-effort**: fix applied but couldn't fully verify (e.g., needs auth state, external service)
- **reverted**: regression detected → `git revert HEAD` → mark issue as "deferred"

### 8e.5. Regression Test

Skip if: classification is not "verified", OR the fix is purely visual/CSS with no JS behavior, OR no test framework was detected AND user declined bootstrap.

**1. Study the project's existing test patterns:**

Read 2-3 test files closest to the fix (same directory, same code type). Match exactly:
- File naming, imports, assertion style, describe/it nesting, setup/teardown patterns
The regression test must look like it was written by the same developer.

**2. Write a regression test encoding the exact bug condition:**

The test MUST:
- Set up the precondition that triggered the bug (the exact state that made it break)
- Perform the action that exposed the bug
- Assert the correct behavior (NOT "it renders" or "it doesn't throw")
- Include full attribution comment:
  ```
  // Regression: ISSUE-NNN — {what broke}
  // Found by /qa on {YYYY-MM-DD}
  // Report: .gstack/qa-reports/qa-report-{domain}-{date}.md
  ```

Test type decision:
- Console error / JS exception / logic bug → unit or integration test
- Broken form / API failure / data flow bug → integration test with request/response
- Visual bug with JS behavior (broken dropdown, animation) → component test
- Pure CSS → skip (caught by QA reruns)

Generate unit tests. Mock all external dependencies (DB, API, Redis, file system).

Use auto-incrementing names to avoid collisions: check existing `{name}.regression-*.test.{ext}` files, take max number + 1.
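
The auto-incrementing rule can be illustrated with a short shell function. This is a sketch under assumptions rather than the skill's actual logic: `next_regression_path` is a hypothetical name, the suffix is assumed to be an unpadded decimal number, and the directory is passed in explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the next free path of the form
# {dir}/{name}.regression-N.test.{ext}, where N is the current maximum plus 1.
next_regression_path() {
  local dir="$1" name="$2" ext="$3" max=0 file num
  for file in "$dir/$name".regression-*.test."$ext"; do
    [ -e "$file" ] || continue          # glob matched nothing
    num="${file##*.regression-}"        # e.g. "10.test.ts"
    num="${num%%.*}"                    # e.g. "10"
    case "$num" in
      ''|*[!0-9]*) continue ;;          # ignore non-numeric suffixes
    esac
    # Compare numerically so that 10 ranks above 9.
    if [ "$num" -gt "$max" ]; then max="$num"; fi
  done
  echo "$dir/$name.regression-$((max + 1)).test.$ext"
}

# Example: next_regression_path src/cart cart ts
```

Taking the numeric maximum instead of counting files means gaps left by deleted tests never cause a name to be reused.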

**3. Run only the new test file:**

```bash
{detected test command} {new-test-file}
```

**4. Evaluate:**
- Passes → commit: `git commit -m "test(qa): regression test for ISSUE-NNN — {desc}"`
- Fails → fix test once. Still failing → delete test, defer.
- Taking >2 min exploration → skip and defer.

**5. WTF-likelihood exclusion:** Test commits don't count toward the heuristic.

### 8f. Self-Regulation (STOP AND EVALUATE)

Every 5 fixes (or after any revert), compute the WTF-likelihood:
@@ -614,6 +816,6 @@ If the repo has a `TODOS.md`:

11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
12. **One commit per fix.** Never bundle multiple fixes into one commit.
-13. **Never modify tests or CI configuration.** Only fix application source code.
+13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files.
14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
15. **Self-regulate.** Follow the WTF-likelihood heuristic. When in doubt, stop and ask.

+52
-1
@@ -16,6 +16,7 @@ allowed-tools:
- Glob
- Grep
- AskUserQuestion
- WebSearch
---

{{PREAMBLE}}
@@ -58,6 +59,10 @@ fi

{{BROWSE_SETUP}}

**Check test framework (bootstrap if needed):**

{{TEST_BOOTSTRAP}}

**Create output directories:**

```bash
@@ -169,6 +174,52 @@ $B snapshot -D
- **best-effort**: fix applied but couldn't fully verify (e.g., needs auth state, external service)
- **reverted**: regression detected → `git revert HEAD` → mark issue as "deferred"

### 8e.5. Regression Test

Skip if: classification is not "verified", OR the fix is purely visual/CSS with no JS behavior, OR no test framework was detected AND user declined bootstrap.

**1. Study the project's existing test patterns:**

Read 2-3 test files closest to the fix (same directory, same code type). Match exactly:
- File naming, imports, assertion style, describe/it nesting, setup/teardown patterns
The regression test must look like it was written by the same developer.

**2. Write a regression test encoding the exact bug condition:**

The test MUST:
- Set up the precondition that triggered the bug (the exact state that made it break)
- Perform the action that exposed the bug
- Assert the correct behavior (NOT "it renders" or "it doesn't throw")
- Include full attribution comment:
  ```
  // Regression: ISSUE-NNN — {what broke}
  // Found by /qa on {YYYY-MM-DD}
  // Report: .gstack/qa-reports/qa-report-{domain}-{date}.md
  ```

Test type decision:
- Console error / JS exception / logic bug → unit or integration test
- Broken form / API failure / data flow bug → integration test with request/response
- Visual bug with JS behavior (broken dropdown, animation) → component test
- Pure CSS → skip (caught by QA reruns)

Generate unit tests. Mock all external dependencies (DB, API, Redis, file system).

Use auto-incrementing names to avoid collisions: check existing `{name}.regression-*.test.{ext}` files, take max number + 1.

**3. Run only the new test file:**

```bash
{detected test command} {new-test-file}
```

**4. Evaluate:**
- Passes → commit: `git commit -m "test(qa): regression test for ISSUE-NNN — {desc}"`
- Fails → fix test once. Still failing → delete test, defer.
- Taking >2 min exploration → skip and defer.

**5. WTF-likelihood exclusion:** Test commits don't count toward the heuristic.

### 8f. Self-Regulation (STOP AND EVALUATE)

Every 5 fixes (or after any revert), compute the WTF-likelihood:
@@ -242,6 +293,6 @@ If the repo has a `TODOS.md`:

11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
12. **One commit per fix.** Never bundle multiple fixes into one commit.
-13. **Never modify tests or CI configuration.** Only fix application source code.
+13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files.
14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
15. **Self-regulate.** Follow the WTF-likelihood heuristic. When in doubt, stop and ask.

@@ -86,6 +86,22 @@
|
||||
|
||||
---
|
||||
|
||||
## Regression Tests
|
||||
|
||||
| Issue | Test File | Status | Description |
|-------|-----------|--------|-------------|
| ISSUE-NNN | path/to/test | committed / deferred / skipped | description |
|
||||
|
||||
### Deferred Tests
|
||||
|
||||
#### ISSUE-NNN: {title}
|
||||
**Precondition:** {setup state that triggers the bug}
|
||||
**Action:** {what the user does}
|
||||
**Expected:** {correct behavior}
|
||||
**Why deferred:** {reason}
|
||||
|
||||
---
|
||||
|
||||
## Ship Readiness
|
||||
|
||||
| Metric | Value |
|
||||
|
||||
+28
-1
@@ -164,6 +164,15 @@ cat ~/.gstack/greptile-history.md 2>/dev/null || true
|
||||
|
||||
# 9. TODOS.md backlog (if available)
|
||||
cat TODOS.md 2>/dev/null || true
|
||||
|
||||
# 10. Test file count
|
||||
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' 2>/dev/null | grep -v node_modules | wc -l
|
||||
|
||||
# 11. Regression test commits in window
|
||||
git log origin/<default> --since="<window>" --oneline --grep="test(qa):" --grep="test(design):" --grep="test: coverage"
|
||||
|
||||
# 12. Test files changed in window
|
||||
git log origin/<default> --since="<window>" --format="" --name-only | grep -E '\.(test|spec)\.' | sort -u | wc -l
|
||||
```
|
||||
|
||||
### Step 2: Compute Metrics
|
||||
@@ -185,6 +194,7 @@ Calculate and present these metrics in a summary table:
|
||||
| Detected sessions | N |
|
||||
| Avg LOC/session-hour | N |
|
||||
| Greptile signal | N% (Y catches, Z FPs) |
|
||||
| Test Health | N total tests · M added this period · K regression tests |
|
||||
|
||||
Then show a **per-author leaderboard** immediately below:
|
||||
|
||||
@@ -408,7 +418,17 @@ Use the Write tool to save the JSON file with this schema:
|
||||
}
|
||||
```
|
||||
|
||||
**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. If either has no data, omit the field entirely.
|
||||
**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. Only include the `test_health` field if test files were found (command 10 returns > 0). If any has no data, omit the field entirely.
|
||||
|
||||
Include test health data in the JSON when test files exist:
|
||||
```json
"test_health": {
  "total_test_files": 47,
  "tests_added_this_period": 5,
  "regression_test_commits": 3,
  "test_files_changed": 8
}
```
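A minimal sketch of the "omit the field entirely" rule, assuming the snapshot is built as a plain object. The `TestHealth` field names come from the JSON above; `RetroSnapshot` and `withTestHealth` are hypothetical names:

```typescript
interface TestHealth {
  total_test_files: number;
  tests_added_this_period: number;
  regression_test_commits: number;
  test_files_changed: number;
}

interface RetroSnapshot {
  // Other retro fields are left out of this sketch.
  test_health?: TestHealth;
}

// Attach test_health only when command 10 found at least one test file.
function withTestHealth(snapshot: RetroSnapshot, health: TestHealth): RetroSnapshot {
  if (health.total_test_files <= 0) {
    return snapshot; // no data: leave the field out rather than writing zeros
  }
  return { ...snapshot, test_health: health };
}
```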
|
||||
|
||||
Include backlog data in the JSON when TODOS.md exists:
|
||||
```json
|
||||
@@ -464,6 +484,13 @@ Narrative covering:
|
||||
- Any XL PRs that should have been split
|
||||
- Greptile signal ratio and trend (if history exists): "Greptile: X% signal (Y valid catches, Z false positives)"
|
||||
|
||||
### Test Health
|
||||
- Total test files: N (from command 10)
|
||||
- Tests added this period: M (from command 12 — test files changed)
|
||||
- Regression test commits: list `test(qa):` and `test(design):` and `test: coverage` commits from command 11
|
||||
- If prior retro exists and has `test_health`: show delta "Test count: {last} → {now} (+{delta})"
|
||||
- If test ratio < 20%: flag as growth area — "100% test coverage is the goal. Tests make vibe coding safe."
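The delta line can be produced with a small formatter. A sketch under one assumption: the source only shows the `(+{delta})` form, so the minus sign for a shrinking count is a guess. `testCountDelta` is a hypothetical name.

```typescript
// Format the retro delta line from the prior and current test-file counts.
function testCountDelta(last: number, now: number): string {
  const delta = now - last;
  const sign = delta >= 0 ? "+" : "-";
  return "Test count: " + last + " → " + now + " (" + sign + Math.abs(delta) + ")";
}
```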
|
||||
|
||||
### Focus & Highlights
|
||||
(from Step 8)
|
||||
- Focus score with interpretation
|
||||
|
||||
+28
-1
@@ -99,6 +99,15 @@ cat ~/.gstack/greptile-history.md 2>/dev/null || true
|
||||
|
||||
# 9. TODOS.md backlog (if available)
|
||||
cat TODOS.md 2>/dev/null || true
|
||||
|
||||
# 10. Test file count
|
||||
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' 2>/dev/null | grep -v node_modules | wc -l
|
||||
|
||||
# 11. Regression test commits in window
|
||||
git log origin/<default> --since="<window>" --oneline --grep="test(qa):" --grep="test(design):" --grep="test: coverage"
|
||||
|
||||
# 12. Test files changed in window
|
||||
git log origin/<default> --since="<window>" --format="" --name-only | grep -E '\.(test|spec)\.' | sort -u | wc -l
|
||||
```
|
||||
|
||||
### Step 2: Compute Metrics
|
||||
@@ -120,6 +129,7 @@ Calculate and present these metrics in a summary table:
|
||||
| Detected sessions | N |
|
||||
| Avg LOC/session-hour | N |
|
||||
| Greptile signal | N% (Y catches, Z FPs) |
|
||||
| Test Health | N total tests · M added this period · K regression tests |
|
||||
|
||||
Then show a **per-author leaderboard** immediately below:
|
||||
|
||||
@@ -343,7 +353,17 @@ Use the Write tool to save the JSON file with this schema:
|
||||
}
|
||||
```
|
||||
|
||||
**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. If either has no data, omit the field entirely.
|
||||
**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. Only include the `test_health` field if test files were found (command 10 returns > 0). If any has no data, omit the field entirely.
|
||||
|
||||
Include test health data in the JSON when test files exist:
|
||||
```json
"test_health": {
  "total_test_files": 47,
  "tests_added_this_period": 5,
  "regression_test_commits": 3,
  "test_files_changed": 8
}
```
|
||||
|
||||
Include backlog data in the JSON when TODOS.md exists:
|
||||
```json
|
||||
@@ -399,6 +419,13 @@ Narrative covering:
|
||||
- Any XL PRs that should have been split
|
||||
- Greptile signal ratio and trend (if history exists): "Greptile: X% signal (Y valid catches, Z false positives)"
|
||||
|
||||
### Test Health
|
||||
- Total test files: N (from command 10)
|
||||
- Tests added this period: M (from command 12 — test files changed)
|
||||
- Regression test commits: list `test(qa):` and `test(design):` and `test: coverage` commits from command 11
|
||||
- If prior retro exists and has `test_health`: show delta "Test count: {last} → {now} (+{delta})"
|
||||
- If test ratio < 20%: flag as growth area — "100% test coverage is the goal. Tests make vibe coding safe."
|
||||
|
||||
### Focus & Highlights
|
||||
(from Step 8)
|
||||
- Focus score with interpretation
|
||||
|
||||
@@ -846,6 +846,161 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
|
||||
- Informational only — does NOT block.`;
|
||||
}
|
||||
|
||||
function generateTestBootstrap(): string {
|
||||
return `## Test Framework Bootstrap
|
||||
|
||||
**Detect existing test framework and project runtime:**
|
||||
|
||||
\`\`\`bash
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
[ -f requirements.txt ] || [ -f pyproject.toml ] && echo "RUNTIME:python"
[ -f go.mod ] && echo "RUNTIME:go"
[ -f Cargo.toml ] && echo "RUNTIME:rust"
[ -f composer.json ] && echo "RUNTIME:php"
[ -f mix.exs ] && echo "RUNTIME:elixir"
# Detect sub-frameworks
[ -f Gemfile ] && grep -q "rails" Gemfile 2>/dev/null && echo "FRAMEWORK:rails"
[ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && echo "FRAMEWORK:nextjs"
# Check for existing test infrastructure
ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
# Check opt-out marker
[ -f .gstack/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED"
\`\`\`
|
||||
|
||||
**If test framework detected** (config files or test directories found):
|
||||
Print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap."
|
||||
Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns).
|
||||
Store conventions as prose context for use in Phase 8e.5 or Step 3.4. **Skip the rest of bootstrap.**
|
||||
|
||||
**If BOOTSTRAP_DECLINED** appears: Print "Test bootstrap previously declined — skipping." **Skip the rest of bootstrap.**
|
||||
|
||||
**If NO runtime detected** (no config files found): Use AskUserQuestion:
|
||||
"I couldn't detect your project's language. What runtime are you using?"
|
||||
Options: A) Node.js/TypeScript B) Ruby/Rails C) Python D) Go E) Rust F) PHP G) Elixir H) This project doesn't need tests.
|
||||
If user picks H → write \`.gstack/no-test-bootstrap\` and continue without tests.
|
||||
|
||||
**If runtime detected but no test framework — bootstrap:**
|
||||
|
||||
### B2. Research best practices
|
||||
|
||||
Use WebSearch to find current best practices for the detected runtime:
|
||||
- \`"[runtime] best test framework 2025 2026"\`
|
||||
- \`"[framework A] vs [framework B] comparison"\`
|
||||
|
||||
If WebSearch is unavailable, use this built-in knowledge table:
|
||||
|
||||
| Runtime | Primary recommendation | Alternative |
|---------|----------------------|-------------|
| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot + shoulda-matchers |
| Node.js | vitest + @testing-library | jest + @testing-library |
| Next.js | vitest + @testing-library/react + playwright | jest + cypress |
| Python | pytest + pytest-cov | unittest |
| Go | stdlib testing + testify | stdlib only |
| Rust | cargo test (built-in) + mockall | — |
| PHP | phpunit + mockery | pest |
| Elixir | ExUnit (built-in) + ex_machina | — |
|
||||
|
||||
### B3. Framework selection
|
||||
|
||||
Use AskUserQuestion:
|
||||
"I detected this is a [Runtime/Framework] project with no test framework. I researched current best practices. Here are the options:
|
||||
A) [Primary] — [rationale]. Includes: [packages]. Supports: unit, integration, smoke, e2e
|
||||
B) [Alternative] — [rationale]. Includes: [packages]
|
||||
C) Skip — don't set up testing right now
|
||||
RECOMMENDATION: Choose A because [reason based on project context]"
|
||||
|
||||
If user picks C → write \`.gstack/no-test-bootstrap\`. Tell user: "If you change your mind later, delete \`.gstack/no-test-bootstrap\` and re-run." Continue without tests.
|
||||
|
||||
If multiple runtimes detected (monorepo) → ask which runtime to set up first, with option to do both sequentially.
|
||||
|
||||
### B4. Install and configure
|
||||
|
||||
1. Install the chosen packages (npm/bun/gem/pip/etc.)
|
||||
2. Create minimal config file
|
||||
3. Create directory structure (test/, spec/, etc.)
|
||||
4. Create one example test matching the project's code to verify setup works
|
||||
|
||||
If package installation fails → debug once. If still failing → revert with \`git checkout -- package.json package-lock.json\` (or equivalent for the runtime). Warn user and continue without tests.
|
||||
|
||||
### B4.5. First real tests
|
||||
|
||||
Generate 3-5 real tests for existing code:
|
||||
|
||||
1. **Find recently changed files:** \`git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -10\`
|
||||
2. **Prioritize by risk:** Error handlers > business logic with conditionals > API endpoints > pure functions
|
||||
3. **For each file:** Write one test that tests real behavior with meaningful assertions. Never \`expect(x).toBeDefined()\` — test what the code DOES.
|
||||
4. Run each test. Passes → keep. Fails → fix once. Still fails → delete silently.
|
||||
5. Generate at least 1 test, cap at 5.
|
||||
|
||||
Never import secrets, API keys, or credentials in test files. Use environment variables or test fixtures.
|
||||
|
||||
### B5. Verify
|
||||
|
||||
\`\`\`bash
|
||||
# Run the full test suite to confirm everything works
|
||||
{detected test command}
|
||||
\`\`\`
|
||||
|
||||
If tests fail → debug once. If still failing → revert all bootstrap changes and warn user.
|
||||
|
||||
### B5.5. CI/CD pipeline
|
||||
|
||||
\`\`\`bash
|
||||
# Check CI provider
|
||||
ls -d .github/ 2>/dev/null && echo "CI:github"
|
||||
ls .gitlab-ci.yml .circleci/ bitrise.yml 2>/dev/null
|
||||
\`\`\`
|
||||
|
||||
If \`.github/\` exists (or no CI detected — default to GitHub Actions):
|
||||
Create \`.github/workflows/test.yml\` with:
|
||||
- \`runs-on: ubuntu-latest\`
|
||||
- Appropriate setup action for the runtime (setup-node, setup-ruby, setup-python, etc.)
|
||||
- The same test command verified in B5
|
||||
- Trigger: push + pull_request
|
||||
|
||||
If non-GitHub CI detected → skip CI generation with note: "Detected {provider} — CI pipeline generation supports GitHub Actions only. Add test step to your existing pipeline manually."
|
||||
|
||||
### B6. Create TESTING.md
|
||||
|
||||
First check: If TESTING.md already exists → read it and update/append rather than overwriting. Never destroy existing content.
|
||||
|
||||
Write TESTING.md with:
|
||||
- Philosophy: "100% test coverage is the key to great vibe coding. Tests let you move fast, trust your instincts, and ship with confidence — without them, vibe coding is just yolo coding. With tests, it's a superpower."
|
||||
- Framework name and version
|
||||
- How to run tests (the verified command from B5)
|
||||
- Test layers: Unit tests (what, where, when), Integration tests, Smoke tests, E2E tests
|
||||
- Conventions: file naming, assertion style, setup/teardown patterns
|
||||
|
||||
### B7. Update CLAUDE.md
|
||||
|
||||
First check: If CLAUDE.md already has a \`## Testing\` section → skip. Don't duplicate.
|
||||
|
||||
Append a \`## Testing\` section:
|
||||
- Run command and test directory
|
||||
- Reference to TESTING.md
|
||||
- Test expectations:
  - 100% test coverage is the goal — tests make vibe coding safe
  - When writing new functions, write a corresponding test
  - When fixing a bug, write a regression test
  - When adding error handling, write a test that triggers the error
  - When adding a conditional (if/else, switch), write tests for BOTH paths
  - Never commit code that makes existing tests fail
|
||||
|
||||
### B8. Commit
|
||||
|
||||
\`\`\`bash
|
||||
git status --porcelain
|
||||
\`\`\`
|
||||
|
||||
Only commit if there are changes. Stage all bootstrap files (config, test directory, TESTING.md, CLAUDE.md, .github/workflows/test.yml if created):
|
||||
\`git commit -m "chore: bootstrap test framework ({framework name})"\`
|
||||
|
||||
---`;
|
||||
}
|
||||
|
||||
const RESOLVERS: Record<string, () => string> = {
|
||||
COMMAND_REFERENCE: generateCommandReference,
|
||||
SNAPSHOT_FLAGS: generateSnapshotFlags,
|
||||
@@ -855,6 +1010,7 @@ const RESOLVERS: Record<string, () => string> = {
|
||||
QA_METHODOLOGY: generateQAMethodology,
|
||||
DESIGN_METHODOLOGY: generateDesignMethodology,
|
||||
REVIEW_DASHBOARD: generateReviewDashboard,
|
||||
TEST_BOOTSTRAP: generateTestBootstrap,
|
||||
};
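The `RESOLVERS` map pairs each placeholder name with a generator function. A minimal sketch of how a `{{NAME}}` placeholder might be substituted, assuming a single regex pass. `renderTemplate` and its fail-on-unknown behavior are illustrative and are not taken from `gen-skill-docs.ts`:

```typescript
// Replace every {{NAME}} placeholder with the output of its resolver.
function renderTemplate(template: string, resolvers: Record<string, () => string>): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (whole: string, name: string): string => {
    const resolver = resolvers[name];
    if (!resolver) {
      // Assumption: fail loudly so a typo in a template cannot ship silently.
      throw new Error("Unknown placeholder: " + whole);
    }
    return resolver();
  });
}
```

Under this sketch, adding a resolver such as `TEST_BOOTSTRAP` is enough for every template that contains `{{TEST_BOOTSTRAP}}` to pick up the generated section on the next regeneration.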
|
||||
|
||||
// ─── Template Processing ────────────────────────────────────
|
||||
|
||||
+252
@@ -11,6 +11,7 @@ allowed-tools:
|
||||
- Grep
|
||||
- Glob
|
||||
- AskUserQuestion
|
||||
- WebSearch
|
||||
---
|
||||
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
|
||||
<!-- Regenerate: bun run gen:skill-docs -->
|
||||
@@ -121,6 +122,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
|
||||
- Multi-file changesets (auto-split into bisectable commits)
|
||||
- TODOS.md completed-item detection (auto-mark)
|
||||
- Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically)
|
||||
- Test coverage gaps (auto-generate and commit, or flag in PR body)
|
||||
|
||||
---
|
||||
|
||||
@@ -185,6 +187,163 @@ git fetch origin <base> && git merge origin/<base> --no-edit
|
||||
|
||||
---
|
||||
|
||||
## Step 2.5: Test Framework Bootstrap
|
||||
|
||||
## Test Framework Bootstrap
|
||||
|
||||
**Detect existing test framework and project runtime:**
|
||||
|
||||
```bash
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
[ -f requirements.txt ] || [ -f pyproject.toml ] && echo "RUNTIME:python"
[ -f go.mod ] && echo "RUNTIME:go"
[ -f Cargo.toml ] && echo "RUNTIME:rust"
[ -f composer.json ] && echo "RUNTIME:php"
[ -f mix.exs ] && echo "RUNTIME:elixir"
# Detect sub-frameworks
[ -f Gemfile ] && grep -q "rails" Gemfile 2>/dev/null && echo "FRAMEWORK:rails"
[ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && echo "FRAMEWORK:nextjs"
# Check for existing test infrastructure
ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
# Check opt-out marker
[ -f .gstack/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED"
```
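The runtime checks in the script above amount to a marker-file lookup. A minimal sketch of the same mapping, assuming the caller supplies the list of file names in the project root. `RUNTIME_MARKERS` and `detectRuntimes` are hypothetical names:

```typescript
// Marker files and runtimes copied from the detection script above.
const RUNTIME_MARKERS: Array<[string, string]> = [
  ["Gemfile", "ruby"],
  ["package.json", "node"],
  ["requirements.txt", "python"],
  ["pyproject.toml", "python"],
  ["go.mod", "go"],
  ["Cargo.toml", "rust"],
  ["composer.json", "php"],
  ["mix.exs", "elixir"],
];

// Return each detected runtime once, in marker order.
function detectRuntimes(rootFiles: string[]): string[] {
  const found: string[] = [];
  RUNTIME_MARKERS.forEach(([file, runtime]) => {
    if (rootFiles.indexOf(file) !== -1 && found.indexOf(runtime) === -1) {
      found.push(runtime);
    }
  });
  return found;
}
```

An empty result corresponds to the "NO runtime detected" branch, and more than one entry corresponds to the monorepo case handled in B3.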
|
||||
|
||||
**If test framework detected** (config files or test directories found):
|
||||
Print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap."
|
||||
Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns).
|
||||
Store conventions as prose context for use in Phase 8e.5 or Step 3.4. **Skip the rest of bootstrap.**
|
||||
|
||||
**If BOOTSTRAP_DECLINED** appears: Print "Test bootstrap previously declined — skipping." **Skip the rest of bootstrap.**
|
||||
|
||||
**If NO runtime detected** (no config files found): Use AskUserQuestion:
|
||||
"I couldn't detect your project's language. What runtime are you using?"
|
||||
Options: A) Node.js/TypeScript B) Ruby/Rails C) Python D) Go E) Rust F) PHP G) Elixir H) This project doesn't need tests.
|
||||
If user picks H → write `.gstack/no-test-bootstrap` and continue without tests.
|
||||
|
||||
**If runtime detected but no test framework — bootstrap:**
|
||||
|
||||
### B2. Research best practices
|
||||
|
||||
Use WebSearch to find current best practices for the detected runtime:
|
||||
- `"[runtime] best test framework 2025 2026"`
|
||||
- `"[framework A] vs [framework B] comparison"`
|
||||
|
||||
If WebSearch is unavailable, use this built-in knowledge table:
|
||||
|
||||
| Runtime | Primary recommendation | Alternative |
|---------|----------------------|-------------|
| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot + shoulda-matchers |
| Node.js | vitest + @testing-library | jest + @testing-library |
| Next.js | vitest + @testing-library/react + playwright | jest + cypress |
| Python | pytest + pytest-cov | unittest |
| Go | stdlib testing + testify | stdlib only |
| Rust | cargo test (built-in) + mockall | — |
| PHP | phpunit + mockery | pest |
| Elixir | ExUnit (built-in) + ex_machina | — |
|
||||
|
||||
### B3. Framework selection
|
||||
|
||||
Use AskUserQuestion:
|
||||
"I detected this is a [Runtime/Framework] project with no test framework. I researched current best practices. Here are the options:
|
||||
A) [Primary] — [rationale]. Includes: [packages]. Supports: unit, integration, smoke, e2e
|
||||
B) [Alternative] — [rationale]. Includes: [packages]
|
||||
C) Skip — don't set up testing right now
|
||||
RECOMMENDATION: Choose A because [reason based on project context]"
|
||||
|
||||
If user picks C → write `.gstack/no-test-bootstrap`. Tell user: "If you change your mind later, delete `.gstack/no-test-bootstrap` and re-run." Continue without tests.
|
||||
|
||||
If multiple runtimes detected (monorepo) → ask which runtime to set up first, with option to do both sequentially.
|
||||
|
||||
### B4. Install and configure
|
||||
|
||||
1. Install the chosen packages (npm/bun/gem/pip/etc.)
|
||||
2. Create minimal config file
|
||||
3. Create directory structure (test/, spec/, etc.)
|
||||
4. Create one example test matching the project's code to verify setup works
|
||||
|
||||
If package installation fails → debug once. If still failing → revert with `git checkout -- package.json package-lock.json` (or equivalent for the runtime). Warn user and continue without tests.
|
||||
|
||||
### B4.5. First real tests
|
||||
|
||||
Generate 3-5 real tests for existing code:
|
||||
|
||||
1. **Find recently changed files:** `git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -10`
|
||||
2. **Prioritize by risk:** Error handlers > business logic with conditionals > API endpoints > pure functions
|
||||
3. **For each file:** Write one test that tests real behavior with meaningful assertions. Never `expect(x).toBeDefined()` — test what the code DOES.
|
||||
4. Run each test. Passes → keep. Fails → fix once. Still fails → delete silently.
|
||||
5. Generate at least 1 test, cap at 5.
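The risk ordering in step 2 and the cap in step 5 can be sketched as a sort followed by a slice. The kind labels and `pickFilesToTest` are hypothetical, and classifying a file into a kind is assumed to happen elsewhere:

```typescript
type RiskKind = "error-handler" | "conditional-logic" | "api-endpoint" | "pure-function";

// Highest risk first, matching the priority order in step 2.
const RISK_ORDER: RiskKind[] = ["error-handler", "conditional-logic", "api-endpoint", "pure-function"];

interface Candidate {
  file: string;
  kind: RiskKind;
}

// Rank candidates by risk and keep at most `cap` of them.
function pickFilesToTest(candidates: Candidate[], cap: number = 5): Candidate[] {
  const ranked = candidates
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => RISK_ORDER.indexOf(a.kind) - RISK_ORDER.indexOf(b.kind));
  return ranked.slice(0, cap);
}
```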
|
||||
|
||||
Never import secrets, API keys, or credentials in test files. Use environment variables or test fixtures.
|
||||
|
||||
### B5. Verify
|
||||
|
||||
```bash
|
||||
# Run the full test suite to confirm everything works
|
||||
{detected test command}
|
||||
```
|
||||
|
||||
If tests fail → debug once. If still failing → revert all bootstrap changes and warn user.
|
||||
|
||||
### B5.5. CI/CD pipeline
|
||||
|
||||
```bash
|
||||
# Check CI provider
|
||||
ls -d .github/ 2>/dev/null && echo "CI:github"
|
||||
ls .gitlab-ci.yml .circleci/ bitrise.yml 2>/dev/null
|
||||
```
|
||||
|
||||
If `.github/` exists (or no CI detected — default to GitHub Actions):
|
||||
Create `.github/workflows/test.yml` with:
|
||||
- `runs-on: ubuntu-latest`
|
||||
- Appropriate setup action for the runtime (setup-node, setup-ruby, setup-python, etc.)
|
||||
- The same test command verified in B5
|
||||
- Trigger: push + pull_request
|
||||
|
||||
If non-GitHub CI detected → skip CI generation with note: "Detected {provider} — CI pipeline generation supports GitHub Actions only. Add test step to your existing pipeline manually."
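A minimal sketch of what generating `.github/workflows/test.yml` could look like for simple test commands. The action versions and the runtime-to-action mapping are assumptions, not taken from gstack, and a real workflow would usually also pin a runtime version and install dependencies before running tests:

```typescript
// Assumed mapping from runtime to a setup action (illustrative only).
const SETUP_ACTIONS: Record<string, string> = {
  node: "actions/setup-node@v4",
  ruby: "ruby/setup-ruby@v1",
  python: "actions/setup-python@v5",
  go: "actions/setup-go@v5",
};

// Build workflow YAML that runs the test command verified in B5.
function buildTestWorkflow(runtime: string, testCommand: string): string {
  const lines: string[] = [
    "name: test",
    "on: [push, pull_request]",
    "jobs:",
    "  test:",
    "    runs-on: ubuntu-latest",
    "    steps:",
    "      - uses: actions/checkout@v4",
  ];
  const setup = SETUP_ACTIONS[runtime];
  if (setup) {
    lines.push("      - uses: " + setup);
  }
  lines.push("      - run: " + testCommand);
  return lines.join("\n") + "\n";
}
```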
|
||||
|
||||
### B6. Create TESTING.md
|
||||
|
||||
First check: If TESTING.md already exists → read it and update/append rather than overwriting. Never destroy existing content.
|
||||
|
||||
Write TESTING.md with:
|
||||
- Philosophy: "100% test coverage is the key to great vibe coding. Tests let you move fast, trust your instincts, and ship with confidence — without them, vibe coding is just yolo coding. With tests, it's a superpower."
|
||||
- Framework name and version
|
||||
- How to run tests (the verified command from B5)
|
||||
- Test layers: Unit tests (what, where, when), Integration tests, Smoke tests, E2E tests
|
||||
- Conventions: file naming, assertion style, setup/teardown patterns
|
||||
|
||||
### B7. Update CLAUDE.md
|
||||
|
||||
First check: If CLAUDE.md already has a `## Testing` section → skip. Don't duplicate.
|
||||
|
||||
Append a `## Testing` section:
|
||||
- Run command and test directory
|
||||
- Reference to TESTING.md
|
||||
- Test expectations:
  - 100% test coverage is the goal — tests make vibe coding safe
  - When writing new functions, write a corresponding test
  - When fixing a bug, write a regression test
  - When adding error handling, write a test that triggers the error
  - When adding a conditional (if/else, switch), write tests for BOTH paths
  - Never commit code that makes existing tests fail
|
||||
|
||||
### B8. Commit
|
||||
|
||||
```bash
|
||||
git status --porcelain
|
||||
```
|
||||
|
||||
Only commit if there are changes. Stage all bootstrap files (config, test directory, TESTING.md, CLAUDE.md, .github/workflows/test.yml if created):
|
||||
`git commit -m "chore: bootstrap test framework ({framework name})"`
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Run tests (on merged code)
|
||||
|
||||
**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
|
||||
@@ -269,6 +428,94 @@ If multiple suites need to run, run them sequentially (each needs a test lane).
|
||||
|
||||
---
|
||||
|
||||
## Step 3.4: Test Coverage Audit
|
||||
|
||||
100% coverage is the goal — every untested path is a path where bugs hide and vibe coding becomes yolo coding. Evaluate what was ACTUALLY coded (from the diff), not what was planned.
|
||||
|
||||
**0. Before/after test count:**
|
||||
|
||||
```bash
|
||||
# Count test files before any generation
|
||||
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l
|
||||
```
|
||||
|
||||
Store this number for the PR body.
|
||||
|
||||
**1. Build the code path map** from `git diff origin/<base>...HEAD`:
|
||||
|
||||
Extract all new or modified:
|
||||
- Functions/methods (def, function, const, class methods)
|
||||
- Conditional branches (if/else, switch/case, ternary, guard clauses, early returns)
|
||||
- API routes/endpoints (route definitions, controller actions)
|
||||
- Components (new files or new exports)
|
||||
- Error handlers (try/catch, rescue, error boundaries, fallback paths)
|
||||
|
||||
**2. Search for corresponding tests and score quality:**
|
||||
|
||||
For each code path, search for a test exercising it:
|
||||
- `src/services/billing.ts:processPayment` → `billing.test.ts`, `billing.spec.ts`
|
||||
- `app/controllers/payments_controller.rb#create` → `test/controllers/payments_controller_test.rb`
|
||||
- New if/else → tests for BOTH paths (not just happy path)
|
||||
- New error handler → test triggering the error condition
|
||||
|
||||
Quality scoring rubric:
|
||||
- ★★★ Tests behavior with edge cases AND error paths
|
||||
- ★★ Tests correct behavior, happy path only
|
||||
- ★ Smoke test / existence check / trivial assertion (e.g., "it renders", "it doesn't throw")
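The rubric can be sketched as a three-way classification. The trait names are hypothetical, and scoring a test that covers edge cases but not error paths as ★★ is an assumption, since the rubric does not name that middle case:

```typescript
interface TestTraits {
  assertsBehavior: boolean; // checks actual results, not just "it renders"
  coversEdgeCases: boolean;
  coversErrorPaths: boolean;
}

// Return the star count (1 to 3) for an existing test.
function scoreTest(t: TestTraits): 1 | 2 | 3 {
  if (!t.assertsBehavior) return 1; // smoke test or trivial assertion
  return t.coversEdgeCases && t.coversErrorPaths ? 3 : 2;
}
```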
|
||||
|
||||
**3. Output ASCII coverage diagram:**
|
||||
|
||||
```
NEW CODE PATH COVERAGE MAP
===========================
[+] src/services/billing.ts
    │
    ├── processPayment()
    │   ├── [★★★ TESTED] Happy path + card declined + timeout — billing.test.ts:42
    │   ├── [GAP] Network timeout — NO TEST
    │   └── [GAP] Invalid currency — NO TEST
    │
    └── refundPayment()
        ├── [★★ TESTED] Full refund — billing.test.ts:89
        └── [★ TESTED] Partial refund (checks non-throw only) — billing.test.ts:101

─────────────────────────────────
COVERAGE: 3/5 new paths tested (60%)
QUALITY: ★★★: 1 ★★: 1 ★: 1 (avg: ★★)
GAPS: 2 paths need tests
─────────────────────────────────
```
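The three summary lines at the bottom of the diagram follow from the per-path scores. A minimal sketch, assuming a gap is recorded as zero stars and the average is rounded to a whole star. `PathResult` and `summarizeCoverage` are hypothetical names:

```typescript
interface PathResult {
  name: string;
  stars: 0 | 1 | 2 | 3; // 0 marks a gap (no test found)
}

// Render n stars; an empty average is shown as "n/a".
function renderStars(n: number): string {
  return n > 0 ? new Array(n + 1).join("★") : "n/a";
}

// Build the COVERAGE / QUALITY / GAPS lines for the diagram footer.
function summarizeCoverage(paths: PathResult[]): string[] {
  const tested = paths.filter((p) => p.stars > 0);
  const gaps = paths.length - tested.length;
  const percent = paths.length === 0 ? 100 : Math.round((tested.length / paths.length) * 100);
  const count = (n: number): number => tested.filter((p) => p.stars === n).length;
  const total = tested.reduce((sum: number, p: PathResult) => sum + p.stars, 0);
  const avg = tested.length === 0 ? 0 : Math.round(total / tested.length);
  return [
    "COVERAGE: " + tested.length + "/" + paths.length + " new paths tested (" + percent + "%)",
    "QUALITY: ★★★: " + count(3) + " ★★: " + count(2) + " ★: " + count(1) + " (avg: " + renderStars(avg) + ")",
    "GAPS: " + gaps + " paths need tests",
  ];
}
```

Fed the five paths from the example diagram (scores 3, gap, gap, 2, 1), this reproduces the footer shown above.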
|
||||
|
||||
**Fast path:** All paths covered → "Step 3.4: All new code paths have test coverage ✓" Continue.
|
||||
|
||||
**4. Generate tests for uncovered paths:**
|
||||
|
||||
If test framework detected (or bootstrapped in Step 2.5):
|
||||
- Prioritize error handlers and edge cases first (happy paths are more likely already tested)
|
||||
- Read 2-3 existing test files to match conventions exactly
|
||||
- Generate unit tests. Mock all external dependencies (DB, API, Redis).
|
||||
- Write tests that exercise the specific uncovered path with real assertions
|
||||
- Run each test. Passes → commit as `test: coverage for {feature}`
|
||||
- Fails → fix once. Still fails → revert, note gap in diagram.
|
||||
|
||||
Caps: 30 code paths max, 10 tests generated max, 2-min per-test exploration cap.
|
||||
|
||||
If no test framework AND user declined bootstrap → diagram only, no generation. Note: "Test generation skipped — no test framework configured."
|
||||
|
||||
**Diff is test-only changes:** Skip Step 3.4 entirely: "No new application code paths to audit."
|
||||
|
||||
**5. After-count and coverage summary:**
|
||||
|
||||
```bash
|
||||
# Count test files after generation
|
||||
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l
|
||||
```
|
||||
|
||||
For PR body: `Tests: {before} → {after} (+{delta} new)`
|
||||
Coverage line: `Test Coverage Audit: N new code paths. M covered (X%). K tests generated, J committed.`
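Both PR-body lines are simple templates over the counts gathered in steps 0 to 5. A sketch with hypothetical function names, assuming the percentage is rounded to a whole number:

```typescript
// "Tests: {before} → {after} (+{delta} new)"
function testCountLine(before: number, after: number): string {
  return "Tests: " + before + " → " + after + " (+" + (after - before) + " new)";
}

// "Test Coverage Audit: N new code paths. M covered (X%). K tests generated, J committed."
function coverageLine(paths: number, covered: number, generated: number, committed: number): string {
  const percent = paths === 0 ? 100 : Math.round((covered / paths) * 100);
  return (
    "Test Coverage Audit: " + paths + " new code paths. " +
    covered + " covered (" + percent + "%). " +
    generated + " tests generated, " + committed + " committed."
  );
}
```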
|
||||
|
||||
---
|
||||
|
||||
## Step 3.5: Pre-Landing Review
|
||||
|
||||
Review the diff for structural issues that tests don't catch.
|
||||
@@ -497,6 +744,10 @@ gh pr create --base <base> --title "<type>: <summary>" --body "$(cat <<'EOF'
|
||||
## Summary
|
||||
<bullet points from CHANGELOG>
|
||||
|
||||
## Test Coverage
|
||||
<coverage diagram from Step 3.4, or "All new code paths have test coverage.">
|
||||
<If Step 3.4 ran: "Tests: {before} → {after} (+{delta} new)">
|
||||
|
||||
## Pre-Landing Review
|
||||
<findings from Step 3.5, or "No issues found.">
|
||||
|
||||
@@ -538,4 +789,5 @@ EOF
|
||||
- **Split commits for bisectability** — each commit = one logical change.
|
||||
- **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done.
|
||||
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies.
|
||||
- **Step 3.4 generates coverage tests.** They must pass before committing. Never commit failing tests.
|
||||
- **The goal is: user says `/ship`, next thing they see is the review + PR URL.**
|
||||
|
||||
@@ -11,6 +11,7 @@ allowed-tools:
|
||||
- Grep
|
||||
- Glob
|
||||
- AskUserQuestion
|
||||
- WebSearch
|
||||
---
|
||||
|
||||
{{PREAMBLE}}
|
||||
@@ -39,6 +40,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
|
||||
- Multi-file changesets (auto-split into bisectable commits)
|
||||
- TODOS.md completed-item detection (auto-mark)
|
||||
- Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically)
|
||||
- Test coverage gaps (auto-generate and commit, or flag in PR body)
|
||||
|
||||
---
|
||||
|
||||
@@ -75,6 +77,12 @@ git fetch origin <base> && git merge origin/<base> --no-edit
|
||||
|
||||
---
|
||||
|
||||
## Step 2.5: Test Framework Bootstrap
|
||||
|
||||
{{TEST_BOOTSTRAP}}
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Run tests (on merged code)
|
||||
|
||||
**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
|
||||
@@ -159,6 +167,94 @@ If multiple suites need to run, run them sequentially (each needs a test lane).
|
||||
|
||||
---
|
||||
|
||||
## Step 3.4: Test Coverage Audit
|
||||
|
||||
100% coverage is the goal — every untested path is a path where bugs hide and vibe coding becomes yolo coding. Evaluate what was ACTUALLY coded (from the diff), not what was planned.
|
||||
|
||||
**0. Before/after test count:**
|
||||
|
||||
```bash
|
||||
# Count test files before any generation
|
||||
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l
|
||||
```
|
||||
|
||||
Store this number for the PR body.
|
||||
|
||||
**1. Build the code path map** from `git diff origin/<base>...HEAD`:
|
||||
|
||||
Extract all new or modified:
|
||||
- Functions/methods (def, function, const, class methods)
|
||||
- Conditional branches (if/else, switch/case, ternary, guard clauses, early returns)
|
||||
- API routes/endpoints (route definitions, controller actions)
|
||||
- Components (new files or new exports)
|
||||
- Error handlers (try/catch, rescue, error boundaries, fallback paths)
|
||||
|
||||
**2. Search for corresponding tests and score quality:**
|
||||
|
||||
For each code path, search for a test exercising it:
|
||||
- `src/services/billing.ts:processPayment` → `billing.test.ts`, `billing.spec.ts`
|
||||
- `app/controllers/payments_controller.rb#create` → `test/controllers/payments_controller_test.rb`
|
||||
- New if/else → tests for BOTH paths (not just happy path)
|
||||
- New error handler → test triggering the error condition
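
The file-name mappings above can be sketched as a small lookup. `candidateTestFiles` is a hypothetical helper written for this illustration; it assumes the `.test`/`.spec` and `_test`/`_spec` naming conventions already used in the test-count command and returns bare file names to search for, not full paths.

```typescript
// Hypothetical helper: lists test file names worth searching for a given source file.
function candidateTestFiles(sourcePath: string): string[] {
  const fileName = sourcePath.split('/').pop() ?? sourcePath;
  const dot = fileName.lastIndexOf('.');
  const base = dot > 0 ? fileName.slice(0, dot) : fileName;
  const ext = dot > 0 ? fileName.slice(dot + 1) : '';
  if (ext === 'rb') {
    // Ruby conventions: minitest uses _test.rb, RSpec uses _spec.rb.
    return [`${base}_test.rb`, `${base}_spec.rb`];
  }
  const suffix = ext ? `.${ext}` : '';
  return [`${base}.test${suffix}`, `${base}.spec${suffix}`];
}

console.log(candidateTestFiles('src/services/billing.ts'));
console.log(candidateTestFiles('app/controllers/payments_controller.rb'));
```

A matching file name only shows that a test file exists. Whether it exercises the specific branch or error handler still has to be checked by reading the test.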
|
||||
|
||||
Quality scoring rubric:
|
||||
- ★★★ Tests behavior with edge cases AND error paths
|
||||
- ★★ Tests correct behavior, happy path only
|
||||
- ★ Smoke test / existence check / trivial assertion (e.g., "it renders", "it doesn't throw")
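
One way to read the rubric as a decision rule is sketched below. The `TestTraits` shape and `scoreTest` function are assumptions made for illustration. The sketch also assumes that a test covering edge cases but no error paths (or the reverse) scores two stars, which the rubric does not state explicitly.

```typescript
// Hypothetical shape describing what a test actually asserts.
interface TestTraits {
  assertsBehavior: boolean;  // checks real outputs, not just "it renders"
  coversEdgeCases: boolean;
  coversErrorPaths: boolean;
}

// Returns 1, 2, or 3 stars following the rubric above.
function scoreTest(t: TestTraits): 1 | 2 | 3 {
  if (!t.assertsBehavior) return 1; // smoke test or existence check
  if (t.coversEdgeCases && t.coversErrorPaths) return 3;
  return 2; // correct behavior, but not both edge cases and error paths
}

function stars(score: number): string {
  return '★'.repeat(score);
}
```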
|
||||
|
||||
**3. Output ASCII coverage diagram:**
|
||||
|
||||
```
|
||||
NEW CODE PATH COVERAGE MAP
|
||||
===========================
|
||||
[+] src/services/billing.ts
|
||||
│
|
||||
├── processPayment()
|
||||
│ ├── [★★★ TESTED] Happy path + card declined + timeout — billing.test.ts:42
|
||||
│ ├── [GAP] Network timeout — NO TEST
|
||||
│ └── [GAP] Invalid currency — NO TEST
|
||||
│
|
||||
└── refundPayment()
|
||||
├── [★★ TESTED] Full refund — billing.test.ts:89
|
||||
└── [★ TESTED] Partial refund (checks non-throw only) — billing.test.ts:101
|
||||
|
||||
─────────────────────────────────
|
||||
COVERAGE: 3/5 new paths tested (60%)
|
||||
QUALITY: ★★★: 1 ★★: 1 ★: 1 (avg: ★★)
|
||||
GAPS: 2 paths need tests
|
||||
─────────────────────────────────
|
||||
```
|
||||
|
||||
**Fast path:** All paths covered → "Step 3.4: All new code paths have test coverage ✓" Continue.
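
The three summary lines at the bottom of the diagram follow from simple arithmetic over the per-path results. A minimal sketch, assuming a hypothetical `PathResult` record in which `stars: 0` marks a gap and the average is rounded to the nearest whole star:

```typescript
interface PathResult {
  name: string;
  stars: 0 | 1 | 2 | 3; // 0 means no test was found (a gap)
}

function summarize(results: PathResult[]): string[] {
  const tested = results.filter(r => r.stars > 0);
  const gaps = results.length - tested.length;
  const pct = results.length === 0
    ? 100
    : Math.round((tested.length / results.length) * 100);
  const count = (n: number): number => tested.filter(r => r.stars === n).length;
  let total = 0;
  for (const r of tested) total += r.stars;
  const avg = tested.length === 0 ? 0 : Math.round(total / tested.length);
  return [
    `COVERAGE: ${tested.length}/${results.length} new paths tested (${pct}%)`,
    `QUALITY: ★★★: ${count(3)} ★★: ${count(2)} ★: ${count(1)} (avg: ${'★'.repeat(avg)})`,
    `GAPS: ${gaps} paths need tests`,
  ];
}
```

Fed the five paths from the example diagram (ratings 3, gap, gap, 2, 1), this reproduces the 3/5 (60%) coverage line, the two-star average, and the two-gap count shown there.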
|
||||
|
||||
**4. Generate tests for uncovered paths:**
|
||||
|
||||
If test framework detected (or bootstrapped in Step 2.5):
|
||||
- Prioritize error handlers and edge cases first (happy paths are more likely already tested)
|
||||
- Read 2-3 existing test files to match conventions exactly
|
||||
- Generate unit tests. Mock all external dependencies (DB, API, Redis).
|
||||
- Write tests that exercise the specific uncovered path with real assertions
|
||||
- Run each test. Passes → commit as `test: coverage for {feature}`
|
||||
- Fails → fix once. Still fails → revert, note gap in diagram.
|
||||
|
||||
Caps: 30 code paths max, 10 tests generated max, 2-min per-test exploration cap.
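
The prioritization rule and the 10-test cap can be pictured as a sort followed by a truncation. This is only a sketch: `Gap`, `GapKind`, and `planGeneration` are hypothetical names, and the 30-path and 2-minute caps are not modeled here.

```typescript
type GapKind = 'error-handler' | 'edge-case' | 'happy-path';

interface Gap {
  name: string;
  kind: GapKind;
}

const MAX_TESTS = 10; // cap on tests generated, taken from the caps listed above

// Lower rank is generated first: error handlers, then edge cases, then happy paths.
const RANK: Record<GapKind, number> = {
  'error-handler': 0,
  'edge-case': 1,
  'happy-path': 2,
};

function planGeneration(gaps: Gap[]): Gap[] {
  return gaps
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => RANK[a.kind] - RANK[b.kind])
    .slice(0, MAX_TESTS);
}
```

Gaps that fall outside the cap are not lost; they would still appear as `[GAP]` entries in the diagram.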
|
||||
|
||||
If no test framework AND user declined bootstrap → diagram only, no generation. Note: "Test generation skipped — no test framework configured."
|
||||
|
||||
**Diff contains only test changes:** Skip Step 3.4 entirely: "No new application code paths to audit."
|
||||
|
||||
**5. After-count and coverage summary:**
|
||||
|
||||
```bash
|
||||
# Count test files after generation
|
||||
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l
|
||||
```
|
||||
|
||||
For PR body: `Tests: {before} → {after} (+{delta} new)`
|
||||
Coverage line: `Test Coverage Audit: N new code paths. M covered (X%). K tests generated, J committed.`
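
Both PR-body lines are plain string templates over the counts gathered during the audit. A minimal sketch, assuming a hypothetical `AuditCounts` record whose field names are invented for this example:

```typescript
interface AuditCounts {
  before: number;    // test files counted before generation
  after: number;     // test files counted after generation
  paths: number;     // new code paths found in the diff
  covered: number;   // paths with at least one test
  generated: number; // tests the audit attempted to write
  committed: number; // generated tests that passed and were committed
}

function prBodyLines(c: AuditCounts): string[] {
  const delta = c.after - c.before;
  const pct = c.paths === 0 ? 100 : Math.round((c.covered / c.paths) * 100);
  return [
    `Tests: ${c.before} → ${c.after} (+${delta} new)`,
    `Test Coverage Audit: ${c.paths} new code paths. ${c.covered} covered (${pct}%). ` +
      `${c.generated} tests generated, ${c.committed} committed.`,
  ];
}
```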
|
||||
|
||||
---
|
||||
|
||||
## Step 3.5: Pre-Landing Review
|
||||
|
||||
Review the diff for structural issues that tests don't catch.
|
||||
@@ -387,6 +483,10 @@ gh pr create --base <base> --title "<type>: <summary>" --body "$(cat <<'EOF'
|
||||
## Summary
|
||||
<bullet points from CHANGELOG>
|
||||
|
||||
## Test Coverage
|
||||
<coverage diagram from Step 3.4, or "All new code paths have test coverage.">
|
||||
<If Step 3.4 ran: "Tests: {before} → {after} (+{delta} new)">
|
||||
|
||||
## Pre-Landing Review
|
||||
<findings from Step 3.5, or "No issues found.">
|
||||
|
||||
@@ -428,4 +528,5 @@ EOF
|
||||
- **Split commits for bisectability** — each commit = one logical change.
|
||||
- **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done.
|
||||
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies.
|
||||
- **Step 3.4 generates coverage tests.** They must pass before committing. Never commit failing tests.
|
||||
- **The goal is: user says `/ship`, next thing they see is the review + PR URL.**
|
||||
|
||||
@@ -2215,6 +2215,269 @@ Review the site at ${serverUrl}. Use --quick mode. Skip any AskUserQuestion call
|
||||
}, 420_000);
|
||||
});
|
||||
|
||||
// --- Test Bootstrap E2E ---
|
||||
|
||||
describeE2E('Test Bootstrap E2E', () => {
|
||||
let bootstrapDir: string;
|
||||
let bootstrapServer: ReturnType<typeof Bun.serve>;
|
||||
|
||||
beforeAll(() => {
|
||||
bootstrapDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-bootstrap-'));
|
||||
setupBrowseShims(bootstrapDir);
|
||||
|
||||
// Copy qa skill files
|
||||
copyDirSync(path.join(ROOT, 'qa'), path.join(bootstrapDir, 'qa'));
|
||||
|
||||
// Create a minimal Node.js project with NO test framework
|
||||
fs.writeFileSync(path.join(bootstrapDir, 'package.json'), JSON.stringify({
|
||||
name: 'test-bootstrap-app',
|
||||
version: '1.0.0',
|
||||
type: 'module',
|
||||
}, null, 2));
|
||||
|
||||
// Create a simple app file with a bug
|
||||
fs.writeFileSync(path.join(bootstrapDir, 'app.js'), `
|
||||
export function add(a, b) { return a + b; }
|
||||
export function subtract(a, b) { return a - b; }
|
||||
export function divide(a, b) { return a / b; } // BUG: no zero check
|
||||
`);
|
||||
|
||||
// Create a simple HTML page with a bug
|
||||
fs.writeFileSync(path.join(bootstrapDir, 'index.html'), `<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head><meta charset="utf-8"><title>Bootstrap Test</title></head>
|
||||
<body>
|
||||
<h1>Test App</h1>
|
||||
<a href="/nonexistent-page">Broken Link</a>
|
||||
<script>console.error("ReferenceError: undefinedVar is not defined");</script>
|
||||
</body>
|
||||
</html>
|
||||
`);
|
||||
|
||||
// Init git repo
|
||||
const run = (cmd: string, args: string[]) =>
|
||||
spawnSync(cmd, args, { cwd: bootstrapDir, stdio: 'pipe', timeout: 5000 });
|
||||
run('git', ['init']);
|
||||
run('git', ['config', 'user.email', 'test@test.com']);
|
||||
run('git', ['config', 'user.name', 'Test']);
|
||||
run('git', ['add', '.']);
|
||||
run('git', ['commit', '-m', 'initial commit']);
|
||||
|
||||
// Serve from working directory
|
||||
bootstrapServer = Bun.serve({
|
||||
port: 0,
|
||||
hostname: '127.0.0.1',
|
||||
fetch(req) {
|
||||
const url = new URL(req.url);
|
||||
let filePath = url.pathname === '/' ? '/index.html' : url.pathname;
|
||||
filePath = filePath.replace(/^\//, '');
|
||||
const fullPath = path.join(bootstrapDir, filePath);
|
||||
if (!fs.existsSync(fullPath)) {
|
||||
return new Response('Not Found', { status: 404 });
|
||||
}
|
||||
const content = fs.readFileSync(fullPath, 'utf-8');
|
||||
return new Response(content, {
|
||||
headers: { 'Content-Type': 'text/html' },
|
||||
});
|
||||
},
|
||||
});
|
||||
});
|
||||
|
||||
afterAll(() => {
|
||||
bootstrapServer?.stop();
|
||||
try { fs.rmSync(bootstrapDir, { recursive: true, force: true }); } catch {}
|
||||
});
|
||||
|
||||
test('/qa bootstrap + regression test on zero-test project', async () => {
|
||||
const serverUrl = `http://127.0.0.1:${bootstrapServer!.port}`;
|
||||
|
||||
const result = await runSkillTest({
|
||||
prompt: `You have a browse binary at ${browseBin}. Assign it to B variable like: B="${browseBin}"
|
||||
|
||||
Read the file qa/SKILL.md for the QA workflow instructions.
|
||||
|
||||
Run a Quick-tier QA test on ${serverUrl}
|
||||
The source code for this page is at ${bootstrapDir}/index.html — you can fix bugs there.
|
||||
Do NOT use AskUserQuestion — for any AskUserQuestion prompts, choose the RECOMMENDED option automatically.
|
||||
Write your report to ${bootstrapDir}/qa-reports/qa-report.md
|
||||
|
||||
This project has NO test framework. When the bootstrap asks, pick vitest (option A).
|
||||
This is a test+fix loop: find bugs, fix them, write regression tests, commit each fix.`,
|
||||
workingDirectory: bootstrapDir,
|
||||
maxTurns: 50,
|
||||
allowedTools: ['Bash', 'Read', 'Write', 'Edit', 'Glob', 'Grep'],
|
||||
timeout: 420_000,
|
||||
testName: 'qa-bootstrap',
|
||||
runId,
|
||||
});
|
||||
|
||||
logCost('/qa bootstrap', result);
|
||||
recordE2E('/qa bootstrap + regression test', 'Test Bootstrap E2E', result, {
|
||||
passed: ['success', 'error_max_turns'].includes(result.exitReason),
|
||||
});
|
||||
|
||||
expect(['success', 'error_max_turns']).toContain(result.exitReason);
|
||||
|
||||
// Verify bootstrap created test infrastructure
|
||||
const hasTestConfig = fs.existsSync(path.join(bootstrapDir, 'vitest.config.ts'))
|
||||
|| fs.existsSync(path.join(bootstrapDir, 'vitest.config.js'))
|
||||
|| fs.existsSync(path.join(bootstrapDir, 'jest.config.js'))
|
||||
|| fs.existsSync(path.join(bootstrapDir, 'jest.config.ts'));
|
||||
console.log(`Test config created: ${hasTestConfig}`);
|
||||
|
||||
const hasTestingMd = fs.existsSync(path.join(bootstrapDir, 'TESTING.md'));
|
||||
console.log(`TESTING.md created: ${hasTestingMd}`);
|
||||
|
||||
// Check for bootstrap commit
|
||||
const gitLog = spawnSync('git', ['log', '--oneline', '--grep=bootstrap'], {
|
||||
cwd: bootstrapDir, stdio: 'pipe',
|
||||
});
|
||||
const bootstrapCommits = gitLog.stdout.toString().trim();
|
||||
console.log(`Bootstrap commits: ${bootstrapCommits || 'none'}`);
|
||||
|
||||
// Check for regression test commits
|
||||
const regressionLog = spawnSync('git', ['log', '--oneline', '--grep=test(qa)'], {
|
||||
cwd: bootstrapDir, stdio: 'pipe',
|
||||
});
|
||||
const regressionCommits = regressionLog.stdout.toString().trim();
|
||||
console.log(`Regression test commits: ${regressionCommits || 'none'}`);
|
||||
|
||||
// Verify at least the bootstrap happened (fix commits are bonus)
|
||||
const allCommits = spawnSync('git', ['log', '--oneline'], {
|
||||
cwd: bootstrapDir, stdio: 'pipe',
|
||||
});
|
||||
const totalCommits = allCommits.stdout.toString().trim().split('\n').length;
|
||||
console.log(`Total commits: ${totalCommits}`);
|
||||
expect(totalCommits).toBeGreaterThan(1); // At least initial + bootstrap
|
||||
}, 420_000);
|
||||
});
|
||||
|
||||
// --- Test Coverage Audit E2E ---
|
||||
|
||||
describeE2E('Test Coverage Audit E2E', () => {
|
||||
let coverageDir: string;
|
||||
|
||||
beforeAll(() => {
|
||||
coverageDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-coverage-'));
|
||||
|
||||
// Copy ship skill files
|
||||
copyDirSync(path.join(ROOT, 'ship'), path.join(coverageDir, 'ship'));
|
||||
copyDirSync(path.join(ROOT, 'review'), path.join(coverageDir, 'review'));
|
||||
|
||||
// Create a Node.js project WITH test framework but coverage gaps
|
||||
fs.writeFileSync(path.join(coverageDir, 'package.json'), JSON.stringify({
|
||||
name: 'test-coverage-app',
|
||||
version: '1.0.0',
|
||||
type: 'module',
|
||||
scripts: { test: 'echo "no tests yet"' },
|
||||
devDependencies: { vitest: '^1.0.0' },
|
||||
}, null, 2));
|
||||
|
||||
// Create vitest config
|
||||
fs.writeFileSync(path.join(coverageDir, 'vitest.config.ts'),
|
||||
`import { defineConfig } from 'vitest/config';\nexport default defineConfig({ test: {} });\n`);
|
||||
|
||||
fs.writeFileSync(path.join(coverageDir, 'VERSION'), '0.1.0.0\n');
|
||||
fs.writeFileSync(path.join(coverageDir, 'CHANGELOG.md'), '# Changelog\n');
|
||||
|
||||
// Create source file with multiple code paths
|
||||
fs.mkdirSync(path.join(coverageDir, 'src'), { recursive: true });
|
||||
fs.writeFileSync(path.join(coverageDir, 'src', 'billing.ts'), `
|
||||
export function processPayment(amount: number, currency: string) {
|
||||
if (amount <= 0) throw new Error('Invalid amount');
|
||||
if (currency !== 'USD' && currency !== 'EUR') throw new Error('Unsupported currency');
|
||||
return { status: 'success', amount, currency };
|
||||
}
|
||||
|
||||
export function refundPayment(paymentId: string, reason: string) {
|
||||
if (!paymentId) throw new Error('Payment ID required');
|
||||
if (!reason) throw new Error('Reason required');
|
||||
return { status: 'refunded', paymentId, reason };
|
||||
}
|
||||
`);
|
||||
|
||||
// Create a test directory with ONE test (partial coverage)
|
||||
fs.mkdirSync(path.join(coverageDir, 'test'), { recursive: true });
|
||||
fs.writeFileSync(path.join(coverageDir, 'test', 'billing.test.ts'), `
|
||||
import { describe, test, expect } from 'vitest';
|
||||
import { processPayment } from '../src/billing';
|
||||
|
||||
describe('processPayment', () => {
|
||||
test('processes valid payment', () => {
|
||||
const result = processPayment(100, 'USD');
|
||||
expect(result.status).toBe('success');
|
||||
});
|
||||
// GAP: no test for invalid amount
|
||||
// GAP: no test for unsupported currency
|
||||
// GAP: refundPayment not tested at all
|
||||
});
|
||||
`);
|
||||
|
||||
// Init git repo with main branch
|
||||
const run = (cmd: string, args: string[]) =>
|
||||
spawnSync(cmd, args, { cwd: coverageDir, stdio: 'pipe', timeout: 5000 });
|
||||
run('git', ['init', '-b', 'main']);
|
||||
run('git', ['config', 'user.email', 'test@test.com']);
|
||||
run('git', ['config', 'user.name', 'Test']);
|
||||
run('git', ['add', '.']);
|
||||
run('git', ['commit', '-m', 'initial commit']);
|
||||
|
||||
// Create feature branch
|
||||
run('git', ['checkout', '-b', 'feature/billing']);
|
||||
});
|
||||
|
||||
afterAll(() => {
|
||||
try { fs.rmSync(coverageDir, { recursive: true, force: true }); } catch {}
|
||||
});
|
||||
|
||||
test('/ship Step 3.4 produces coverage diagram', async () => {
|
||||
const result = await runSkillTest({
|
||||
prompt: `Read the file ship/SKILL.md for the ship workflow instructions.
|
||||
|
||||
You are on the feature/billing branch. The base branch is main.
|
||||
This is a test project — there is no remote, no PR to create.
|
||||
|
||||
ONLY run Step 3.4 (Test Coverage Audit) from the ship workflow.
|
||||
Skip all other steps (tests, evals, review, version, changelog, commit, push, PR).
|
||||
|
||||
The source code is in ${coverageDir}/src/billing.ts.
|
||||
Existing tests are in ${coverageDir}/test/billing.test.ts.
|
||||
The test command is: echo "tests pass" (mocked — just pretend tests pass).
|
||||
|
||||
Produce the ASCII coverage diagram showing which code paths are tested and which have gaps.
|
||||
Do NOT generate new tests — just produce the diagram and coverage summary.
|
||||
Output the diagram directly.`,
|
||||
workingDirectory: coverageDir,
|
||||
maxTurns: 15,
|
||||
allowedTools: ['Bash', 'Read', 'Write', 'Edit', 'Glob', 'Grep'],
|
||||
timeout: 120_000,
|
||||
testName: 'ship-coverage-audit',
|
||||
runId,
|
||||
});
|
||||
|
||||
logCost('/ship coverage audit', result);
|
||||
recordE2E('/ship Step 3.4 coverage audit', 'Test Coverage Audit E2E', result, {
|
||||
passed: result.exitReason === 'success',
|
||||
});
|
||||
|
||||
expect(result.exitReason).toBe('success');
|
||||
|
||||
// Check output contains coverage diagram elements
|
||||
const output = result.output || '';
|
||||
const hasGap = output.includes('GAP') || output.includes('gap') || output.includes('NO TEST');
|
||||
const hasTested = output.includes('TESTED') || output.includes('tested') || output.includes('✓');
|
||||
const hasCoverage = output.includes('COVERAGE') || output.includes('coverage') || output.includes('paths tested');
|
||||
|
||||
console.log(`Output has GAP markers: ${hasGap}`);
|
||||
console.log(`Output has TESTED markers: ${hasTested}`);
|
||||
console.log(`Output has coverage summary: ${hasCoverage}`);
|
||||
|
||||
// At minimum, the agent should have read the source and test files
|
||||
const readCalls = result.toolCalls.filter(tc => tc.tool === 'Read');
|
||||
expect(readCalls.length).toBeGreaterThan(0);
|
||||
}, 180_000);
|
||||
});
|
||||
|
||||
// Module-level afterAll — finalize eval collector after all tests complete
|
||||
afterAll(async () => {
|
||||
if (evalCollector) {
|
||||
|
||||
@@ -707,3 +707,201 @@ describe('gstack-slug', () => {
|
||||
expect(lines[1]).toMatch(/^BRANCH=.+/);
|
||||
});
|
||||
});
|
||||
|
||||
// --- Test Bootstrap validation ---
|
||||
|
||||
describe('Test Bootstrap ({{TEST_BOOTSTRAP}}) integration', () => {
|
||||
test('TEST_BOOTSTRAP resolver produces valid content', () => {
|
||||
const qaContent = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||||
expect(qaContent).toContain('Test Framework Bootstrap');
|
||||
expect(qaContent).toContain('RUNTIME:ruby');
|
||||
expect(qaContent).toContain('RUNTIME:node');
|
||||
expect(qaContent).toContain('RUNTIME:python');
|
||||
expect(qaContent).toContain('no-test-bootstrap');
|
||||
expect(qaContent).toContain('BOOTSTRAP_DECLINED');
|
||||
});
|
||||
|
||||
test('TEST_BOOTSTRAP appears in qa/SKILL.md', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('Test Framework Bootstrap');
|
||||
expect(content).toContain('TESTING.md');
|
||||
expect(content).toContain('CLAUDE.md');
|
||||
});
|
||||
|
||||
test('TEST_BOOTSTRAP appears in ship/SKILL.md', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('Test Framework Bootstrap');
|
||||
expect(content).toContain('Step 2.5');
|
||||
});
|
||||
|
||||
test('TEST_BOOTSTRAP appears in qa-design-review/SKILL.md', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'qa-design-review', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('Test Framework Bootstrap');
|
||||
});
|
||||
|
||||
test('TEST_BOOTSTRAP does NOT appear in qa-only/SKILL.md', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'qa-only', 'SKILL.md'), 'utf-8');
|
||||
expect(content).not.toContain('Test Framework Bootstrap');
|
||||
// But should have the recommendation note
|
||||
expect(content).toContain('No test framework detected');
|
||||
expect(content).toContain('Run `/qa` to bootstrap');
|
||||
});
|
||||
|
||||
test('bootstrap includes framework knowledge table', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('vitest');
|
||||
expect(content).toContain('minitest');
|
||||
expect(content).toContain('pytest');
|
||||
expect(content).toContain('cargo test');
|
||||
expect(content).toContain('phpunit');
|
||||
expect(content).toContain('ExUnit');
|
||||
});
|
||||
|
||||
test('bootstrap includes CI/CD pipeline generation', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('.github/workflows/test.yml');
|
||||
expect(content).toContain('GitHub Actions');
|
||||
});
|
||||
|
||||
test('bootstrap includes first real tests step', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('First real tests');
|
||||
expect(content).toContain('git log --since=30.days');
|
||||
expect(content).toContain('Prioritize by risk');
|
||||
});
|
||||
|
||||
test('bootstrap includes vibe coding philosophy', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('vibe coding');
|
||||
expect(content).toContain('100% test coverage');
|
||||
});
|
||||
|
||||
test('WebSearch is in allowed-tools for qa, ship, qa-design-review', () => {
|
||||
const qa = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||||
const ship = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const qaDesign = fs.readFileSync(path.join(ROOT, 'qa-design-review', 'SKILL.md'), 'utf-8');
|
||||
expect(qa).toContain('WebSearch');
|
||||
expect(ship).toContain('WebSearch');
|
||||
expect(qaDesign).toContain('WebSearch');
|
||||
});
|
||||
});
|
||||
|
||||
// --- Phase 8e.5 regression test validation ---
|
||||
|
||||
describe('Phase 8e.5 regression test generation', () => {
|
||||
test('qa/SKILL.md contains Phase 8e.5', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('8e.5. Regression Test');
|
||||
expect(content).toContain('test(qa): regression test');
|
||||
expect(content).toContain('WTF-likelihood exclusion');
|
||||
});
|
||||
|
||||
test('qa/SKILL.md Rule 13 is amended for regression tests', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('Only modify tests when generating regression tests in Phase 8e.5');
|
||||
expect(content).not.toContain('Never modify tests or CI configuration');
|
||||
});
|
||||
|
||||
test('qa-design-review has CSS-aware Phase 8e.5 variant', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'qa-design-review', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('8e.5. Regression Test (design-review variant)');
|
||||
expect(content).toContain('CSS-only');
|
||||
expect(content).toContain('test(design): regression test');
|
||||
});
|
||||
|
||||
test('regression test includes full attribution comment format', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('// Regression: ISSUE-NNN');
|
||||
expect(content).toContain('// Found by /qa on');
|
||||
expect(content).toContain('// Report: .gstack/qa-reports/');
|
||||
});
|
||||
|
||||
test('regression test uses auto-incrementing names', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('auto-incrementing');
|
||||
expect(content).toContain('max number + 1');
|
||||
});
|
||||
});
|
||||
|
||||
// --- Step 3.4 coverage audit validation ---
|
||||
|
||||
describe('Step 3.4 test coverage audit', () => {
|
||||
test('ship/SKILL.md contains Step 3.4', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('Step 3.4: Test Coverage Audit');
|
||||
expect(content).toContain('CODE PATH COVERAGE MAP');
|
||||
});
|
||||
|
||||
test('Step 3.4 includes quality scoring rubric', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('★★★');
|
||||
// '★★★' already contains '★★', so match the two-star rubric line itself
expect(content).toContain('★★ Tests correct behavior');
|
||||
expect(content).toContain('edge cases AND error paths');
|
||||
expect(content).toContain('happy path only');
|
||||
});
|
||||
|
||||
test('Step 3.4 includes before/after test count', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('Count test files before');
|
||||
expect(content).toContain('Count test files after');
|
||||
});
|
||||
|
||||
test('ship PR body includes Test Coverage section', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('## Test Coverage');
|
||||
});
|
||||
|
||||
test('ship rules include test generation rule', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('Step 3.4 generates coverage tests');
|
||||
expect(content).toContain('Never commit failing tests');
|
||||
});
|
||||
|
||||
test('Step 3.4 includes vibe coding philosophy', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('vibe coding becomes yolo coding');
|
||||
});
|
||||
});
|
||||
|
||||
// --- Retro test health validation ---
|
||||
|
||||
describe('Retro test health tracking', () => {
|
||||
test('retro/SKILL.md has test health data gathering commands', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('# 10. Test file count');
|
||||
expect(content).toContain('# 11. Regression test commits');
|
||||
expect(content).toContain('# 12. Test files changed');
|
||||
});
|
||||
|
||||
test('retro/SKILL.md has Test Health metrics row', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('Test Health');
|
||||
expect(content).toContain('regression tests');
|
||||
});
|
||||
|
||||
test('retro/SKILL.md has Test Health narrative section', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('### Test Health');
|
||||
expect(content).toContain('Total test files');
|
||||
expect(content).toContain('vibe coding safe');
|
||||
});
|
||||
|
||||
test('retro JSON schema includes test_health field', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('test_health');
|
||||
expect(content).toContain('total_test_files');
|
||||
expect(content).toContain('regression_test_commits');
|
||||
});
|
||||
});
|
||||
|
||||
// --- QA report template regression tests section ---
|
||||
|
||||
describe('QA report template', () => {
|
||||
test('qa-report-template.md has Regression Tests section', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'templates', 'qa-report-template.md'), 'utf-8');
|
||||
expect(content).toContain('## Regression Tests');
|
||||
expect(content).toContain('committed / deferred / skipped');
|
||||
expect(content).toContain('### Deferred Tests');
|
||||
expect(content).toContain('**Precondition:**');
|
||||
});
|
||||
});
|
||||
|
||||