feat: test coverage gate + plan completion audit + auto-verification

Three new gates in /ship and /review:
1. Test coverage gate: configurable thresholds (60%/80% default), hard stop
   below minimum with user override
2. Plan completion audit: discovers plan file, extracts actionable items,
   cross-references against diff, gates on NOT DONE items
3. Auto-verification: invokes /qa-only inline with plan's verification
   section, conditional on localhost reachability

Also: coverage warning in /review, plan completion data in /retro,
shared plan file discovery helper (DRY), ship metrics logging.
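
A minimal sketch of how the coverage gate in item 1 might classify a coverage percentage, assuming the 60%/80% defaults act as a minimum and a target. The function name `coverage_gate`, its arguments, and the `warn` and `unknown` outcomes are illustrative assumptions, not taken from this commit:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the commit: classify a coverage percentage.
# $1 = coverage (integer, -1 if undetermined), $2 = minimum, $3 = target.
coverage_gate() {
  local pct="$1" min="${2:-60}" target="${3:-80}"
  if [ "$pct" -lt 0 ]; then
    echo "unknown"   # coverage could not be determined
  elif [ "$pct" -lt "$min" ]; then
    echo "block"     # hard stop unless the user overrides
  elif [ "$pct" -lt "$target" ]; then
    echo "warn"      # assumed behavior between minimum and target
  else
    echo "pass"
  fi
}

coverage_gate 72   # prints "warn" with the default thresholds
```

Passing the thresholds as optional arguments keeps them configurable while leaving 60 and 80 as the fallback values.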
Garry Tan
2026-03-23 22:58:57 -07:00
parent f4bbfaa5bd
commit 0112476a5d
6 changed files with 533 additions and 8 deletions
+50 -1
@@ -31,6 +31,9 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- Pre-landing review finds ASK items that need user judgment
- MINOR or MAJOR version bump needed (ask — see Step 4)
- Greptile review comments that need user decision (complex fixes, false positives)
- AI-assessed coverage below minimum threshold (hard gate with user override — see Step 3.4)
- Plan items NOT DONE with no user override (see Step 3.45)
- Plan verification failures (see Step 3.47)
- TODOS.md missing and user wants to create one (ask — see Step 5.5)
- TODOS.md disorganized and user wants to reorganize (ask — see Step 5.5)
@@ -42,7 +45,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- Multi-file changesets (auto-split into bisectable commits)
- TODOS.md completed-item detection (auto-mark)
- Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically)
-- Test coverage gaps (auto-generate and commit, or flag in PR body)
+- Test coverage gaps within target threshold (auto-generate and commit, or flag in PR body)
---
@@ -197,6 +200,16 @@ If multiple suites need to run, run them sequentially (each needs a test lane).
---
## Step 3.45: Plan Completion Audit
{{PLAN_COMPLETION_AUDIT_SHIP}}
---
{{PLAN_VERIFICATION_EXEC}}
---
## Step 3.5: Pre-Landing Review
Review the diff for structural issues that tests don't catch.
@@ -472,6 +485,16 @@ gh pr create --base <base> --title "<type>: <summary>" --body "$(cat <<'EOF'
<If no Greptile comments found: "No Greptile comments.">
<If no PR existed during Step 3.75: omit this section entirely>
## Plan Completion
<If plan file found: completion checklist summary from Step 3.45>
<If no plan file: "No plan file detected.">
<If plan items deferred: list deferred items>
## Verification Results
<If verification ran: summary from Step 3.47 (N PASS, M FAIL, K SKIPPED)>
<If skipped: reason (no plan, no server, no verification section)>
<If not applicable: omit this section>
## TODOS
<If items marked complete: bullet list of completed items with version>
<If no items completed: "No TODO items completed in this PR.">
@@ -512,6 +535,32 @@ doc updates — the user runs `/ship` and documentation stays current without a
---
## Step 8.75: Persist ship metrics
Log coverage and plan completion data so `/retro` can track trends:
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)" && mkdir -p ~/.gstack/projects/$SLUG
```
Append to `~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl`:
```bash
echo '{"skill":"ship","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","coverage_pct":COVERAGE_PCT,"plan_items_total":PLAN_TOTAL,"plan_items_done":PLAN_DONE,"verification_result":"VERIFY_RESULT","version":"VERSION","branch":"BRANCH"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
```
Substitute from earlier steps:
- **COVERAGE_PCT**: coverage percentage from Step 3.4 diagram (integer, or -1 if undetermined)
- **PLAN_TOTAL**: total plan items extracted in Step 3.45 (0 if no plan file)
- **PLAN_DONE**: count of DONE + CHANGED items from Step 3.45 (0 if no plan file)
- **VERIFY_RESULT**: "pass", "fail", or "skipped" from Step 3.47
- **VERSION**: from the VERSION file
- **BRANCH**: current branch name
This step is automatic — never skip it, never ask for confirmation.
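
As an illustration of the substitution described in Step 8.75, the record could be assembled with `printf` instead of hand-editing placeholders. This is a sketch only: the function name `ship_metrics_line` and the sample values are hypothetical, and it assumes the string fields contain no characters that need JSON escaping:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the commit: build one JSONL record with the
# same keys as the echo command in Step 8.75.
# Args: coverage_pct plan_total plan_done verify_result version branch
ship_metrics_line() {
  local ts
  ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  # %d keeps the three counters as bare JSON numbers; %s fields are quoted.
  printf '{"skill":"ship","timestamp":"%s","coverage_pct":%d,"plan_items_total":%d,"plan_items_done":%d,"verification_result":"%s","version":"%s","branch":"%s"}\n' \
    "$ts" "$1" "$2" "$3" "$4" "$5" "$6"
}

# Sample values only; real ones come from Steps 3.4, 3.45, and 3.47.
ship_metrics_line 72 10 9 pass 1.2.3 my-branch
```

The printed line could then be appended with `>>` to `~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl`, as the step describes.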
---
## Important Rules
- **Never skip tests.** If tests fail, stop.