mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-09 06:45:46 +02:00
merge: resolve conflicts with origin/main (v0.6.0.1)
Merge main (v0.6.0.1) into boil-the-lakes branch. Resolved conflicts: - VERSION: bumped to 0.6.1 (next after 0.6.0.1) - CHANGELOG: added v0.6.1 entry above existing entries - plan-ceo-review/SKILL.md.tmpl: kept COMPLETENESS IS CHEAP bullet + upstream's expanded critical rule, SELECTIVE EXPANSION, CEO plan persistence - plan-eng-review/SKILL.md.tmpl: kept completeness check + upstream's always-full review flow (removed old BIG/SMALL/SCOPE menu) - Regenerated all 15 SKILL.md files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
+342
-15
@@ -11,6 +11,7 @@ allowed-tools:
|
||||
- Grep
|
||||
- Glob
|
||||
- AskUserQuestion
|
||||
- WebSearch
|
||||
---
|
||||
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
|
||||
<!-- Regenerate: bun run gen:skill-docs -->
|
||||
@@ -160,6 +161,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
|
||||
- Multi-file changesets (auto-split into bisectable commits)
|
||||
- TODOS.md completed-item detection (auto-mark)
|
||||
- Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically)
|
||||
- Test coverage gaps (auto-generate and commit, or flag in PR body)
|
||||
|
||||
---
|
||||
|
||||
@@ -175,11 +177,13 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
|
||||
|
||||
## Review Readiness Dashboard
|
||||
|
||||
After completing the review, read the review log to display the dashboard.
|
||||
After completing the review, read the review log and config to display the dashboard.
|
||||
|
||||
```bash
|
||||
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
|
||||
cat ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_REVIEWS"
|
||||
echo "---CONFIG---"
|
||||
~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false"
|
||||
```
|
||||
|
||||
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. Display:
|
||||
@@ -188,25 +192,48 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
|
||||
+====================================================================+
|
||||
| REVIEW READINESS DASHBOARD |
|
||||
+====================================================================+
|
||||
| Review | Runs | Last Run | Status |
|
||||
|-----------------|------|---------------------|----------------------|
|
||||
| CEO Review | 1 | 2026-03-16 14:30 | CLEAR |
|
||||
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR |
|
||||
| Design Review | 0 | — | NOT YET RUN |
|
||||
| Review | Runs | Last Run | Status | Required |
|
||||
|-----------------|------|---------------------|-----------|----------|
|
||||
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES |
|
||||
| CEO Review | 0 | — | — | no |
|
||||
| Design Review | 0 | — | — | no |
|
||||
+--------------------------------------------------------------------+
|
||||
| VERDICT: 2/3 CLEAR — Design Review not yet run |
|
||||
| VERDICT: CLEARED — Eng Review passed |
|
||||
+====================================================================+
|
||||
```
|
||||
|
||||
**Verdict logic:**
|
||||
- **CLEARED TO SHIP (3/3)**: All three have >= 1 entry within 7 days AND most recent status is "clean"
|
||||
- **N/3 CLEAR**: Show count and list which are missing, have open issues, or are stale (>7 days)
|
||||
- Informational only — does NOT block.
|
||||
**Review tiers:**
|
||||
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
|
||||
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
|
||||
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
|
||||
|
||||
If the verdict is NOT "CLEARED TO SHIP (3/3)", use AskUserQuestion:
|
||||
- Show which reviews are missing or have open issues
|
||||
- RECOMMENDATION: Choose B (run missing reviews first) unless the change is trivial
|
||||
- Options: A) Ship anyway B) Abort — run missing review(s) first C) Reviews not relevant for this change
|
||||
**Verdict logic:**
|
||||
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
|
||||
- **NOT CLEARED**: Eng Review missing, stale (>7 days), or has open issues
|
||||
- CEO and Design reviews are shown for context but never block shipping
|
||||
- If \`skip_eng_review\` config is \`true\`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED
|
||||
|
||||
If the Eng Review is NOT "CLEAR":
|
||||
|
||||
1. **Check for a prior override on this branch:**
|
||||
```bash
|
||||
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
|
||||
grep '"skill":"ship-review-override"' ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_OVERRIDE"
|
||||
```
|
||||
If an override exists, display the dashboard and note "Review gate previously accepted — continuing." Do NOT ask again.
|
||||
|
||||
2. **If no override exists,** use AskUserQuestion:
|
||||
- Show that Eng Review is missing or has open issues
|
||||
- RECOMMENDATION: Choose C if the change is obviously trivial (< 20 lines, typo fix, config-only); Choose B for larger changes
|
||||
- Options: A) Ship anyway B) Abort — run /plan-eng-review first C) Change is too small to need eng review
|
||||
- If CEO/Design reviews are missing, mention them as informational ("CEO Review not run — recommended for product changes") but do NOT block or recommend aborting for them
|
||||
|
||||
3. **If the user chooses A or C,** persist the decision so future `/ship` runs on this branch skip the gate:
|
||||
```bash
|
||||
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
|
||||
echo '{"skill":"ship-review-override","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","decision":"USER_CHOICE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
|
||||
```
|
||||
Substitute USER_CHOICE with "ship_anyway" or "not_relevant".
|
||||
|
||||
---
|
||||
|
||||
@@ -224,6 +251,163 @@ git fetch origin <base> && git merge origin/<base> --no-edit
|
||||
|
||||
---
|
||||
|
||||
## Step 2.5: Test Framework Bootstrap
|
||||
|
||||
## Test Framework Bootstrap
|
||||
|
||||
**Detect existing test framework and project runtime:**
|
||||
|
||||
```bash
|
||||
# Detect project runtime
|
||||
[ -f Gemfile ] && echo "RUNTIME:ruby"
|
||||
[ -f package.json ] && echo "RUNTIME:node"
|
||||
[ -f requirements.txt ] || [ -f pyproject.toml ] && echo "RUNTIME:python"
|
||||
[ -f go.mod ] && echo "RUNTIME:go"
|
||||
[ -f Cargo.toml ] && echo "RUNTIME:rust"
|
||||
[ -f composer.json ] && echo "RUNTIME:php"
|
||||
[ -f mix.exs ] && echo "RUNTIME:elixir"
|
||||
# Detect sub-frameworks
|
||||
[ -f Gemfile ] && grep -q "rails" Gemfile 2>/dev/null && echo "FRAMEWORK:rails"
|
||||
[ -f package.json ] && grep -q '"next"' package.json 2>/dev/null && echo "FRAMEWORK:nextjs"
|
||||
# Check for existing test infrastructure
|
||||
ls jest.config.* vitest.config.* playwright.config.* .rspec pytest.ini pyproject.toml phpunit.xml 2>/dev/null
|
||||
ls -d test/ tests/ spec/ __tests__/ cypress/ e2e/ 2>/dev/null
|
||||
# Check opt-out marker
|
||||
[ -f .gstack/no-test-bootstrap ] && echo "BOOTSTRAP_DECLINED"
|
||||
```
|
||||
|
||||
**If test framework detected** (config files or test directories found):
|
||||
Print "Test framework detected: {name} ({N} existing tests). Skipping bootstrap."
|
||||
Read 2-3 existing test files to learn conventions (naming, imports, assertion style, setup patterns).
|
||||
Store conventions as prose context for use in Phase 8e.5 or Step 3.4. **Skip the rest of bootstrap.**
|
||||
|
||||
**If BOOTSTRAP_DECLINED** appears: Print "Test bootstrap previously declined — skipping." **Skip the rest of bootstrap.**
|
||||
|
||||
**If NO runtime detected** (no config files found): Use AskUserQuestion:
|
||||
"I couldn't detect your project's language. What runtime are you using?"
|
||||
Options: A) Node.js/TypeScript B) Ruby/Rails C) Python D) Go E) Rust F) PHP G) Elixir H) This project doesn't need tests.
|
||||
If user picks H → write `.gstack/no-test-bootstrap` and continue without tests.
|
||||
|
||||
**If runtime detected but no test framework — bootstrap:**
|
||||
|
||||
### B2. Research best practices
|
||||
|
||||
Use WebSearch to find current best practices for the detected runtime:
|
||||
- `"[runtime] best test framework 2025 2026"`
|
||||
- `"[framework A] vs [framework B] comparison"`
|
||||
|
||||
If WebSearch is unavailable, use this built-in knowledge table:
|
||||
|
||||
| Runtime | Primary recommendation | Alternative |
|
||||
|---------|----------------------|-------------|
|
||||
| Ruby/Rails | minitest + fixtures + capybara | rspec + factory_bot + shoulda-matchers |
|
||||
| Node.js | vitest + @testing-library | jest + @testing-library |
|
||||
| Next.js | vitest + @testing-library/react + playwright | jest + cypress |
|
||||
| Python | pytest + pytest-cov | unittest |
|
||||
| Go | stdlib testing + testify | stdlib only |
|
||||
| Rust | cargo test (built-in) + mockall | — |
|
||||
| PHP | phpunit + mockery | pest |
|
||||
| Elixir | ExUnit (built-in) + ex_machina | — |
|
||||
|
||||
### B3. Framework selection
|
||||
|
||||
Use AskUserQuestion:
|
||||
"I detected this is a [Runtime/Framework] project with no test framework. I researched current best practices. Here are the options:
|
||||
A) [Primary] — [rationale]. Includes: [packages]. Supports: unit, integration, smoke, e2e
|
||||
B) [Alternative] — [rationale]. Includes: [packages]
|
||||
C) Skip — don't set up testing right now
|
||||
RECOMMENDATION: Choose A because [reason based on project context]"
|
||||
|
||||
If user picks C → write `.gstack/no-test-bootstrap`. Tell user: "If you change your mind later, delete `.gstack/no-test-bootstrap` and re-run." Continue without tests.
|
||||
|
||||
If multiple runtimes detected (monorepo) → ask which runtime to set up first, with option to do both sequentially.
|
||||
|
||||
### B4. Install and configure
|
||||
|
||||
1. Install the chosen packages (npm/bun/gem/pip/etc.)
|
||||
2. Create minimal config file
|
||||
3. Create directory structure (test/, spec/, etc.)
|
||||
4. Create one example test matching the project's code to verify setup works
|
||||
|
||||
If package installation fails → debug once. If still failing → revert with `git checkout -- package.json package-lock.json` (or equivalent for the runtime). Warn user and continue without tests.
|
||||
|
||||
### B4.5. First real tests
|
||||
|
||||
Generate 3-5 real tests for existing code:
|
||||
|
||||
1. **Find recently changed files:** `git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -10`
|
||||
2. **Prioritize by risk:** Error handlers > business logic with conditionals > API endpoints > pure functions
|
||||
3. **For each file:** Write one test that tests real behavior with meaningful assertions. Never `expect(x).toBeDefined()` — test what the code DOES.
|
||||
4. Run each test. Passes → keep. Fails → fix once. Still fails → delete silently.
|
||||
5. Generate at least 1 test, cap at 5.
|
||||
|
||||
Never import secrets, API keys, or credentials in test files. Use environment variables or test fixtures.
|
||||
|
||||
### B5. Verify
|
||||
|
||||
```bash
|
||||
# Run the full test suite to confirm everything works
|
||||
{detected test command}
|
||||
```
|
||||
|
||||
If tests fail → debug once. If still failing → revert all bootstrap changes and warn user.
|
||||
|
||||
### B5.5. CI/CD pipeline
|
||||
|
||||
```bash
|
||||
# Check CI provider
|
||||
ls -d .github/ 2>/dev/null && echo "CI:github"
|
||||
ls .gitlab-ci.yml .circleci/ bitrise.yml 2>/dev/null
|
||||
```
|
||||
|
||||
If `.github/` exists (or no CI detected — default to GitHub Actions):
|
||||
Create `.github/workflows/test.yml` with:
|
||||
- `runs-on: ubuntu-latest`
|
||||
- Appropriate setup action for the runtime (setup-node, setup-ruby, setup-python, etc.)
|
||||
- The same test command verified in B5
|
||||
- Trigger: push + pull_request
|
||||
|
||||
If non-GitHub CI detected → skip CI generation with note: "Detected {provider} — CI pipeline generation supports GitHub Actions only. Add test step to your existing pipeline manually."
|
||||
|
||||
### B6. Create TESTING.md
|
||||
|
||||
First check: If TESTING.md already exists → read it and update/append rather than overwriting. Never destroy existing content.
|
||||
|
||||
Write TESTING.md with:
|
||||
- Philosophy: "100% test coverage is the key to great vibe coding. Tests let you move fast, trust your instincts, and ship with confidence — without them, vibe coding is just yolo coding. With tests, it's a superpower."
|
||||
- Framework name and version
|
||||
- How to run tests (the verified command from B5)
|
||||
- Test layers: Unit tests (what, where, when), Integration tests, Smoke tests, E2E tests
|
||||
- Conventions: file naming, assertion style, setup/teardown patterns
|
||||
|
||||
### B7. Update CLAUDE.md
|
||||
|
||||
First check: If CLAUDE.md already has a `## Testing` section → skip. Don't duplicate.
|
||||
|
||||
Append a `## Testing` section:
|
||||
- Run command and test directory
|
||||
- Reference to TESTING.md
|
||||
- Test expectations:
|
||||
- 100% test coverage is the goal — tests make vibe coding safe
|
||||
- When writing new functions, write a corresponding test
|
||||
- When fixing a bug, write a regression test
|
||||
- When adding error handling, write a test that triggers the error
|
||||
- When adding a conditional (if/else, switch), write tests for BOTH paths
|
||||
- Never commit code that makes existing tests fail
|
||||
|
||||
### B8. Commit
|
||||
|
||||
```bash
|
||||
git status --porcelain
|
||||
```
|
||||
|
||||
Only commit if there are changes. Stage all bootstrap files (config, test directory, TESTING.md, CLAUDE.md, .github/workflows/test.yml if created):
|
||||
`git commit -m "chore: bootstrap test framework ({framework name})"`
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Run tests (on merged code)
|
||||
|
||||
**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
|
||||
@@ -308,6 +492,144 @@ If multiple suites need to run, run them sequentially (each needs a test lane).
|
||||
|
||||
---
|
||||
|
||||
## Step 3.4: Test Coverage Audit
|
||||
|
||||
100% coverage is the goal — every untested path is a path where bugs hide and vibe coding becomes yolo coding. Evaluate what was ACTUALLY coded (from the diff), not what was planned.
|
||||
|
||||
**0. Before/after test count:**
|
||||
|
||||
```bash
|
||||
# Count test files before any generation
|
||||
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l
|
||||
```
|
||||
|
||||
Store this number for the PR body.
|
||||
|
||||
**1. Trace every codepath changed** using `git diff origin/<base>...HEAD`:
|
||||
|
||||
Read every changed file. For each one, trace how data flows through the code — don't just list functions, actually follow the execution:
|
||||
|
||||
1. **Read the diff.** For each changed file, read the full file (not just the diff hunk) to understand context.
|
||||
2. **Trace data flow.** Starting from each entry point (route handler, exported function, event listener, component render), follow the data through every branch:
|
||||
- Where does input come from? (request params, props, database, API call)
|
||||
- What transforms it? (validation, mapping, computation)
|
||||
- Where does it go? (database write, API response, rendered output, side effect)
|
||||
- What can go wrong at each step? (null/undefined, invalid input, network failure, empty collection)
|
||||
3. **Diagram the execution.** For each changed file, draw an ASCII diagram showing:
|
||||
- Every function/method that was added or modified
|
||||
- Every conditional branch (if/else, switch, ternary, guard clause, early return)
|
||||
- Every error path (try/catch, rescue, error boundary, fallback)
|
||||
- Every call to another function (trace into it — does IT have untested branches?)
|
||||
- Every edge: what happens with null input? Empty array? Invalid type?
|
||||
|
||||
This is the critical step — you're building a map of every line of code that can execute differently based on input. Every branch in this diagram needs a test.
|
||||
|
||||
**2. Map user flows, interactions, and error states:**
|
||||
|
||||
Code coverage isn't enough — you need to cover how real users interact with the changed code. For each changed feature, think through:
|
||||
|
||||
- **User flows:** What sequence of actions does a user take that touches this code? Map the full journey (e.g., "user clicks 'Pay' → form validates → API call → success/failure screen"). Each step in the journey needs a test.
|
||||
- **Interaction edge cases:** What happens when the user does something unexpected?
|
||||
- Double-click/rapid resubmit
|
||||
- Navigate away mid-operation (back button, close tab, click another link)
|
||||
- Submit with stale data (page sat open for 30 minutes, session expired)
|
||||
- Slow connection (API takes 10 seconds — what does the user see?)
|
||||
- Concurrent actions (two tabs, same form)
|
||||
- **Error states the user can see:** For every error the code handles, what does the user actually experience?
|
||||
- Is there a clear error message or a silent failure?
|
||||
- Can the user recover (retry, go back, fix input) or are they stuck?
|
||||
- What happens with no network? With a 500 from the API? With invalid data from the server?
|
||||
- **Empty/zero/boundary states:** What does the UI show with zero results? With 10,000 results? With a single character input? With maximum-length input?
|
||||
|
||||
Add these to your diagram alongside the code branches. A user flow with no test is just as much a gap as an untested if/else.
|
||||
|
||||
**3. Check each branch against existing tests:**
|
||||
|
||||
Go through your diagram branch by branch — both code paths AND user flows. For each one, search for a test that exercises it:
|
||||
- Function `processPayment()` → look for `billing.test.ts`, `billing.spec.ts`, `test/billing_test.rb`
|
||||
- An if/else → look for tests covering BOTH the true AND false path
|
||||
- An error handler → look for a test that triggers that specific error condition
|
||||
- A call to `helperFn()` that has its own branches → those branches need tests too
|
||||
- A user flow → look for an integration or E2E test that walks through the journey
|
||||
- An interaction edge case → look for a test that simulates the unexpected action
|
||||
|
||||
Quality scoring rubric:
|
||||
- ★★★ Tests behavior with edge cases AND error paths
|
||||
- ★★ Tests correct behavior, happy path only
|
||||
- ★ Smoke test / existence check / trivial assertion (e.g., "it renders", "it doesn't throw")
|
||||
|
||||
**4. Output ASCII coverage diagram:**
|
||||
|
||||
Include BOTH code paths and user flows in the same diagram:
|
||||
|
||||
```
|
||||
CODE PATH COVERAGE
|
||||
===========================
|
||||
[+] src/services/billing.ts
|
||||
│
|
||||
├── processPayment()
|
||||
│ ├── [★★★ TESTED] Happy path + card declined + timeout — billing.test.ts:42
|
||||
│ ├── [GAP] Network timeout — NO TEST
|
||||
│ └── [GAP] Invalid currency — NO TEST
|
||||
│
|
||||
└── refundPayment()
|
||||
├── [★★ TESTED] Full refund — billing.test.ts:89
|
||||
└── [★ TESTED] Partial refund (checks non-throw only) — billing.test.ts:101
|
||||
|
||||
USER FLOW COVERAGE
|
||||
===========================
|
||||
[+] Payment checkout flow
|
||||
│
|
||||
├── [★★★ TESTED] Complete purchase — checkout.e2e.ts:15
|
||||
├── [GAP] Double-click submit — NO TEST
|
||||
├── [GAP] Navigate away during payment — NO TEST
|
||||
└── [★ TESTED] Form validation errors (checks render only) — checkout.test.ts:40
|
||||
|
||||
[+] Error states
|
||||
│
|
||||
├── [★★ TESTED] Card declined message — billing.test.ts:58
|
||||
├── [GAP] Network timeout UX (what does user see?) — NO TEST
|
||||
└── [GAP] Empty cart submission — NO TEST
|
||||
|
||||
─────────────────────────────────
|
||||
COVERAGE: 5/12 paths tested (42%)
|
||||
Code paths: 3/5 (60%)
|
||||
User flows: 2/7 (29%)
|
||||
QUALITY: ★★★: 2 ★★: 2 ★: 1
|
||||
GAPS: 7 paths need tests
|
||||
─────────────────────────────────
|
||||
```
|
||||
|
||||
**Fast path:** All paths covered → "Step 3.4: All new code paths have test coverage ✓" Continue.
|
||||
|
||||
**5. Generate tests for uncovered paths:**
|
||||
|
||||
If test framework detected (or bootstrapped in Step 2.5):
|
||||
- Prioritize error handlers and edge cases first (happy paths are more likely already tested)
|
||||
- Read 2-3 existing test files to match conventions exactly
|
||||
- Generate unit tests. Mock all external dependencies (DB, API, Redis).
|
||||
- Write tests that exercise the specific uncovered path with real assertions
|
||||
- Run each test. Passes → commit as `test: coverage for {feature}`
|
||||
- Fails → fix once. Still fails → revert, note gap in diagram.
|
||||
|
||||
Caps: 30 code paths max, 20 tests generated max (code + user flow combined), 2-min per-test exploration cap.
|
||||
|
||||
If no test framework AND user declined bootstrap → diagram only, no generation. Note: "Test generation skipped — no test framework configured."
|
||||
|
||||
**Diff is test-only changes:** Skip Step 3.4 entirely: "No new application code paths to audit."
|
||||
|
||||
**6. After-count and coverage summary:**
|
||||
|
||||
```bash
|
||||
# Count test files after generation
|
||||
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l
|
||||
```
|
||||
|
||||
For PR body: `Tests: {before} → {after} (+{delta} new)`
|
||||
Coverage line: `Test Coverage Audit: N new code paths. M covered (X%). K tests generated, J committed.`
|
||||
|
||||
---
|
||||
|
||||
## Step 3.5: Pre-Landing Review
|
||||
|
||||
Review the diff for structural issues that tests don't catch.
|
||||
@@ -536,6 +858,10 @@ gh pr create --base <base> --title "<type>: <summary>" --body "$(cat <<'EOF'
|
||||
## Summary
|
||||
<bullet points from CHANGELOG>
|
||||
|
||||
## Test Coverage
|
||||
<coverage diagram from Step 3.4, or "All new code paths have test coverage.">
|
||||
<If Step 3.4 ran: "Tests: {before} → {after} (+{delta} new)">
|
||||
|
||||
## Pre-Landing Review
|
||||
<findings from Step 3.5, or "No issues found.">
|
||||
|
||||
@@ -577,4 +903,5 @@ EOF
|
||||
- **Split commits for bisectability** — each commit = one logical change.
|
||||
- **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done.
|
||||
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies.
|
||||
- **Step 3.4 generates coverage tests.** They must pass before committing. Never commit failing tests.
|
||||
- **The goal is: user says `/ship`, next thing they see is the review + PR URL.**
|
||||
|
||||
+172
-4
@@ -11,6 +11,7 @@ allowed-tools:
|
||||
- Grep
|
||||
- Glob
|
||||
- AskUserQuestion
|
||||
- WebSearch
|
||||
---
|
||||
|
||||
{{PREAMBLE}}
|
||||
@@ -39,6 +40,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
|
||||
- Multi-file changesets (auto-split into bisectable commits)
|
||||
- TODOS.md completed-item detection (auto-mark)
|
||||
- Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically)
|
||||
- Test coverage gaps (auto-generate and commit, or flag in PR body)
|
||||
|
||||
---
|
||||
|
||||
@@ -54,10 +56,27 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
|
||||
|
||||
{{REVIEW_DASHBOARD}}
|
||||
|
||||
If the verdict is NOT "CLEARED TO SHIP (3/3)", use AskUserQuestion:
|
||||
- Show which reviews are missing or have open issues
|
||||
- RECOMMENDATION: Choose B (run missing reviews first) unless the change is trivial
|
||||
- Options: A) Ship anyway B) Abort — run missing review(s) first C) Reviews not relevant for this change
|
||||
If the Eng Review is NOT "CLEAR":
|
||||
|
||||
1. **Check for a prior override on this branch:**
|
||||
```bash
|
||||
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
|
||||
grep '"skill":"ship-review-override"' ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_OVERRIDE"
|
||||
```
|
||||
If an override exists, display the dashboard and note "Review gate previously accepted — continuing." Do NOT ask again.
|
||||
|
||||
2. **If no override exists,** use AskUserQuestion:
|
||||
- Show that Eng Review is missing or has open issues
|
||||
- RECOMMENDATION: Choose C if the change is obviously trivial (< 20 lines, typo fix, config-only); Choose B for larger changes
|
||||
- Options: A) Ship anyway B) Abort — run /plan-eng-review first C) Change is too small to need eng review
|
||||
- If CEO/Design reviews are missing, mention them as informational ("CEO Review not run — recommended for product changes") but do NOT block or recommend aborting for them
|
||||
|
||||
3. **If the user chooses A or C,** persist the decision so future `/ship` runs on this branch skip the gate:
|
||||
```bash
|
||||
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
|
||||
echo '{"skill":"ship-review-override","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","decision":"USER_CHOICE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
|
||||
```
|
||||
Substitute USER_CHOICE with "ship_anyway" or "not_relevant".
|
||||
|
||||
---
|
||||
|
||||
@@ -75,6 +94,12 @@ git fetch origin <base> && git merge origin/<base> --no-edit
|
||||
|
||||
---
|
||||
|
||||
## Step 2.5: Test Framework Bootstrap
|
||||
|
||||
{{TEST_BOOTSTRAP}}
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Run tests (on merged code)
|
||||
|
||||
**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
|
||||
@@ -159,6 +184,144 @@ If multiple suites need to run, run them sequentially (each needs a test lane).
|
||||
|
||||
---
|
||||
|
||||
## Step 3.4: Test Coverage Audit
|
||||
|
||||
100% coverage is the goal — every untested path is a path where bugs hide and vibe coding becomes yolo coding. Evaluate what was ACTUALLY coded (from the diff), not what was planned.
|
||||
|
||||
**0. Before/after test count:**
|
||||
|
||||
```bash
|
||||
# Count test files before any generation
|
||||
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l
|
||||
```
|
||||
|
||||
Store this number for the PR body.
|
||||
|
||||
**1. Trace every codepath changed** using `git diff origin/<base>...HEAD`:
|
||||
|
||||
Read every changed file. For each one, trace how data flows through the code — don't just list functions, actually follow the execution:
|
||||
|
||||
1. **Read the diff.** For each changed file, read the full file (not just the diff hunk) to understand context.
|
||||
2. **Trace data flow.** Starting from each entry point (route handler, exported function, event listener, component render), follow the data through every branch:
|
||||
- Where does input come from? (request params, props, database, API call)
|
||||
- What transforms it? (validation, mapping, computation)
|
||||
- Where does it go? (database write, API response, rendered output, side effect)
|
||||
- What can go wrong at each step? (null/undefined, invalid input, network failure, empty collection)
|
||||
3. **Diagram the execution.** For each changed file, draw an ASCII diagram showing:
|
||||
- Every function/method that was added or modified
|
||||
- Every conditional branch (if/else, switch, ternary, guard clause, early return)
|
||||
- Every error path (try/catch, rescue, error boundary, fallback)
|
||||
- Every call to another function (trace into it — does IT have untested branches?)
|
||||
- Every edge: what happens with null input? Empty array? Invalid type?
|
||||
|
||||
This is the critical step — you're building a map of every line of code that can execute differently based on input. Every branch in this diagram needs a test.
|
||||
|
||||
**2. Map user flows, interactions, and error states:**
|
||||
|
||||
Code coverage isn't enough — you need to cover how real users interact with the changed code. For each changed feature, think through:
|
||||
|
||||
- **User flows:** What sequence of actions does a user take that touches this code? Map the full journey (e.g., "user clicks 'Pay' → form validates → API call → success/failure screen"). Each step in the journey needs a test.
|
||||
- **Interaction edge cases:** What happens when the user does something unexpected?
|
||||
- Double-click/rapid resubmit
|
||||
- Navigate away mid-operation (back button, close tab, click another link)
|
||||
- Submit with stale data (page sat open for 30 minutes, session expired)
|
||||
- Slow connection (API takes 10 seconds — what does the user see?)
|
||||
- Concurrent actions (two tabs, same form)
|
||||
- **Error states the user can see:** For every error the code handles, what does the user actually experience?
|
||||
- Is there a clear error message or a silent failure?
|
||||
- Can the user recover (retry, go back, fix input) or are they stuck?
|
||||
- What happens with no network? With a 500 from the API? With invalid data from the server?
|
||||
- **Empty/zero/boundary states:** What does the UI show with zero results? With 10,000 results? With a single character input? With maximum-length input?
|
||||
|
||||
Add these to your diagram alongside the code branches. A user flow with no test is just as much a gap as an untested if/else.
|
||||
|
||||
**3. Check each branch against existing tests:**
|
||||
|
||||
Go through your diagram branch by branch — both code paths AND user flows. For each one, search for a test that exercises it:
|
||||
- Function `processPayment()` → look for `billing.test.ts`, `billing.spec.ts`, `test/billing_test.rb`
|
||||
- An if/else → look for tests covering BOTH the true AND false path
|
||||
- An error handler → look for a test that triggers that specific error condition
|
||||
- A call to `helperFn()` that has its own branches → those branches need tests too
|
||||
- A user flow → look for an integration or E2E test that walks through the journey
|
||||
- An interaction edge case → look for a test that simulates the unexpected action
|
||||
|
||||
Quality scoring rubric:
|
||||
- ★★★ Tests behavior with edge cases AND error paths
|
||||
- ★★ Tests correct behavior, happy path only
|
||||
- ★ Smoke test / existence check / trivial assertion (e.g., "it renders", "it doesn't throw")
|
||||
|
||||
**4. Output ASCII coverage diagram:**
|
||||
|
||||
Include BOTH code paths and user flows in the same diagram:
|
||||
|
||||
```
|
||||
CODE PATH COVERAGE
|
||||
===========================
|
||||
[+] src/services/billing.ts
|
||||
│
|
||||
├── processPayment()
|
||||
│ ├── [★★★ TESTED] Happy path + card declined + timeout — billing.test.ts:42
|
||||
│ ├── [GAP] Network timeout — NO TEST
|
||||
│ └── [GAP] Invalid currency — NO TEST
|
||||
│
|
||||
└── refundPayment()
|
||||
├── [★★ TESTED] Full refund — billing.test.ts:89
|
||||
└── [★ TESTED] Partial refund (checks non-throw only) — billing.test.ts:101
|
||||
|
||||
USER FLOW COVERAGE
|
||||
===========================
|
||||
[+] Payment checkout flow
|
||||
│
|
||||
├── [★★★ TESTED] Complete purchase — checkout.e2e.ts:15
|
||||
├── [GAP] Double-click submit — NO TEST
|
||||
├── [GAP] Navigate away during payment — NO TEST
|
||||
└── [★ TESTED] Form validation errors (checks render only) — checkout.test.ts:40
|
||||
|
||||
[+] Error states
|
||||
│
|
||||
├── [★★ TESTED] Card declined message — billing.test.ts:58
|
||||
├── [GAP] Network timeout UX (what does user see?) — NO TEST
|
||||
└── [GAP] Empty cart submission — NO TEST
|
||||
|
||||
─────────────────────────────────
|
||||
COVERAGE: 5/12 paths tested (42%)
|
||||
Code paths: 3/5 (60%)
|
||||
User flows: 2/7 (29%)
|
||||
QUALITY: ★★★: 2 ★★: 2 ★: 1
|
||||
GAPS: 7 paths need tests
|
||||
─────────────────────────────────
|
||||
```
|
||||
|
||||
**Fast path:** All paths covered → "Step 3.4: All new code paths have test coverage ✓" Continue.
|
||||
|
||||
**5. Generate tests for uncovered paths:**
|
||||
|
||||
If test framework detected (or bootstrapped in Step 2.5):
|
||||
- Prioritize error handlers and edge cases first (happy paths are more likely already tested)
|
||||
- Read 2-3 existing test files to match conventions exactly
|
||||
- Generate unit tests. Mock all external dependencies (DB, API, Redis).
|
||||
- Write tests that exercise the specific uncovered path with real assertions
|
||||
- Run each test. Passes → commit as `test: coverage for {feature}`
|
||||
- Fails → fix once. Still fails → revert, note gap in diagram.
|
||||
|
||||
Caps: 30 code paths max, 20 tests generated max (code + user flow combined), 2-min per-test exploration cap.
|
||||
|
||||
If no test framework AND user declined bootstrap → diagram only, no generation. Note: "Test generation skipped — no test framework configured."
|
||||
|
||||
**Diff is test-only changes:** Skip Step 3.4 entirely: "No new application code paths to audit."
|
||||
|
||||
**6. After-count and coverage summary:**
|
||||
|
||||
```bash
|
||||
# Count test files after generation
|
||||
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l
|
||||
```
|
||||
|
||||
For PR body: `Tests: {before} → {after} (+{delta} new)`
|
||||
Coverage line: `Test Coverage Audit: N new code paths. M covered (X%). K tests generated, J committed.`
|
||||
|
||||
---
|
||||
|
||||
## Step 3.5: Pre-Landing Review
|
||||
|
||||
Review the diff for structural issues that tests don't catch.
|
||||
@@ -387,6 +550,10 @@ gh pr create --base <base> --title "<type>: <summary>" --body "$(cat <<'EOF'
|
||||
## Summary
|
||||
<bullet points from CHANGELOG>
|
||||
|
||||
## Test Coverage
|
||||
<coverage diagram from Step 3.4, or "All new code paths have test coverage.">
|
||||
<If Step 3.4 ran: "Tests: {before} → {after} (+{delta} new)">
|
||||
|
||||
## Pre-Landing Review
|
||||
<findings from Step 3.5, or "No issues found.">
|
||||
|
||||
@@ -428,4 +595,5 @@ EOF
|
||||
- **Split commits for bisectability** — each commit = one logical change.
|
||||
- **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done.
|
||||
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies.
|
||||
- **Step 3.4 generates coverage tests.** They must pass before committing. Never commit failing tests.
|
||||
- **The goal is: user says `/ship`, next thing they see is the review + PR URL.**
|
||||
|
||||
Reference in New Issue
Block a user