mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
feat: comprehensive pre-merge readiness gate in /land-and-deploy
Step 3.5 now checks 5 dimensions before allowing merge: 1. Review staleness — compares review commit hash against HEAD, flags if significant code changes happened after last review 2. Tests — runs free tests inline, checks today's E2E and LLM eval results from ~/.gstack-dev/evals/ 3. PR body accuracy — compares PR description against actual commits, flags missing features or stale descriptions 4. Document-release — checks if CHANGELOG/VERSION were updated when new features are present in the diff 5. Full readiness report — ASCII dashboard with warnings/blockers, explicit AskUserQuestion confirmation required before merge Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -369,79 +369,171 @@ If timeout (15 min): **STOP.** "CI has been running for 15 minutes. Investigate
|
||||
|
||||
## Step 3.5: Pre-merge readiness gate
|
||||
|
||||
**This is the critical safety check before an irreversible merge.** Gather ALL evidence,
|
||||
present it to the user, and get explicit confirmation before merging.
|
||||
**This is the critical safety check before an irreversible merge.** The merge cannot
|
||||
be undone without a revert commit. Gather ALL evidence, build a readiness report,
|
||||
and get explicit user confirmation before proceeding.
|
||||
|
||||
### 3.5a: Review Readiness Dashboard
|
||||
Collect evidence for each check below. Track warnings (yellow) and blockers (red).
|
||||
|
||||
### 3.5a: Review staleness check
|
||||
|
||||
```bash
|
||||
~/.codex/skills/gstack/bin/gstack-review-read 2>/dev/null
|
||||
```
|
||||
|
||||
Parse the output. Find the most recent entry for each review skill within the last 7 days.
|
||||
Check staleness: compare each review's commit hash against the current HEAD.
|
||||
Parse the output. For each review skill (plan-eng-review, plan-ceo-review,
|
||||
plan-design-review, design-review-lite, codex-review):
|
||||
|
||||
Display the dashboard:
|
||||
1. Find the most recent entry within the last 7 days.
|
||||
2. Extract its `commit` field.
|
||||
3. Compare against current HEAD: `git rev-list --count STORED_COMMIT..HEAD`
|
||||
|
||||
**Staleness rules:**
|
||||
- 0 commits since review → CURRENT
|
||||
- 1-3 commits since review → RECENT (yellow if those commits touch code, not just docs)
|
||||
- 4+ commits since review → STALE (red — review may not reflect current code)
|
||||
- No review found → NOT RUN
|
||||
|
||||
**Critical check:** Look at what changed AFTER the last review. Run:
|
||||
```bash
|
||||
git log --oneline STORED_COMMIT..HEAD
|
||||
```
|
||||
PRE-MERGE READINESS CHECK
|
||||
══════════════════════════
|
||||
Review | Status | Last Run | Stale?
|
||||
----------------|-----------|---------------------|--------
|
||||
Eng Review | CLEAR/— | 2026-03-21 15:00 | N commits behind
|
||||
CEO Review | CLEAR/— | — | —
|
||||
Design Review | CLEAR/— | — | —
|
||||
Codex Review | CLEAR/— | — | —
|
||||
```
|
||||
If any commits after the review contain words like "fix", "refactor", "rewrite",
|
||||
"overhaul", or touch more than 5 files — flag as **STALE (significant changes
|
||||
since review)**. The review was done on different code than what's about to merge.
|
||||
|
||||
### 3.5b: Test results
|
||||
|
||||
Check for recent E2E eval results:
|
||||
**Free tests — run them now:**
|
||||
|
||||
Read CLAUDE.md to find the project's test command. If not specified, use `bun test`.
|
||||
Run the test command and capture the exit code and output.
|
||||
|
||||
```bash
|
||||
bun test 2>&1 | tail -10
|
||||
```
|
||||
|
||||
If tests fail: **BLOCKER.** Cannot merge with failing tests.
|
||||
|
||||
**E2E tests — check recent results:**
|
||||
|
||||
```bash
|
||||
ls -t ~/.gstack-dev/evals/*-e2e-*-$(date +%Y-%m-%d)*.json 2>/dev/null | head -20
|
||||
```
|
||||
|
||||
For each eval file found from today, parse pass/fail counts and show a summary:
|
||||
For each eval file from today, parse pass/fail counts. Show:
|
||||
- Total tests, pass count, fail count
|
||||
- How long ago the run finished (from file timestamp)
|
||||
- Total cost
|
||||
- Names of any failing tests
|
||||
|
||||
```
|
||||
E2E Tests | 52/52 pass | 25 min ago | $8.50
|
||||
Free Tests | (run bun test to check)
|
||||
```
|
||||
If no E2E results from today: **WARNING — no E2E tests run today.**
|
||||
If E2E results exist but have failures: **WARNING — N tests failed.** List them.
|
||||
|
||||
If no E2E results from today exist, note: "No E2E tests run today."
|
||||
|
||||
Also run free tests inline:
|
||||
**LLM judge evals — check recent results:**
|
||||
|
||||
```bash
|
||||
bun test 2>&1 | tail -5
|
||||
ls -t ~/.gstack-dev/evals/*-llm-judge-*-$(date +%Y-%m-%d)*.json 2>/dev/null | head -5
|
||||
```
|
||||
|
||||
### 3.5c: Document-release check
|
||||
If found, parse and show pass/fail. If not found, note "No LLM evals run today."
|
||||
|
||||
Check if `/document-release` has been run on this branch:
|
||||
### 3.5c: PR body accuracy check
|
||||
|
||||
Read the current PR body:
|
||||
```bash
|
||||
gh pr view --json body -q .body
|
||||
```
|
||||
|
||||
Read the current diff summary:
|
||||
```bash
|
||||
git log --oneline $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)..HEAD | head -20
|
||||
```
|
||||
|
||||
Compare the PR body against the actual commits. Check for:
|
||||
1. **Missing features** — commits that add significant functionality not mentioned in the PR
|
||||
2. **Stale descriptions** — PR body mentions things that were later changed or reverted
|
||||
3. **Wrong version** — PR title or body references a version that doesn't match VERSION file
|
||||
|
||||
If the PR body looks stale or incomplete: **WARNING — PR body may not reflect current
|
||||
changes.** List what's missing or stale.
|
||||
|
||||
### 3.5d: Document-release check
|
||||
|
||||
Check if documentation was updated on this branch:
|
||||
|
||||
```bash
|
||||
git log --oneline --grep="docs: update project documentation" HEAD~5..HEAD
|
||||
git log --oneline --all-match --grep="docs:" $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)..HEAD | head -5
|
||||
```
|
||||
|
||||
If no doc update commit found: flag "WARNING: /document-release has not been run."
|
||||
Also check if key doc files were modified:
|
||||
```bash
|
||||
git diff --name-only $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)...HEAD -- README.md CHANGELOG.md ARCHITECTURE.md CONTRIBUTING.md CLAUDE.md VERSION
|
||||
```
|
||||
|
||||
### 3.5d: Merge confirmation
|
||||
If CHANGELOG.md and VERSION were NOT modified on this branch and the diff includes
|
||||
new features (new files, new commands, new skills): **WARNING — /document-release
|
||||
likely not run. CHANGELOG and VERSION not updated despite new features.**
|
||||
|
||||
Present all evidence in a single AskUserQuestion:
|
||||
If only docs changed (no code): skip this check.
|
||||
|
||||
- **Re-ground:** "About to merge PR #NNN on branch X to base branch Y."
|
||||
- Show the review dashboard, test results, and doc-release status from above.
|
||||
- If ANY of these are red flags (eng review missing/stale, E2E failures, no doc update),
|
||||
list them explicitly as warnings.
|
||||
- **RECOMMENDATION:** Choose A if everything is green. Choose B if there are warnings
|
||||
to address first. Choose C to skip checks and merge anyway.
|
||||
- A) Merge — all checks look good
|
||||
- B) Don't merge yet — fix the flagged issues first
|
||||
- C) Merge anyway — I accept the risks
|
||||
### 3.5e: Readiness report and confirmation
|
||||
|
||||
Build the full readiness report:
|
||||
|
||||
```
|
||||
╔══════════════════════════════════════════════════════════╗
|
||||
║ PRE-MERGE READINESS REPORT ║
|
||||
╠══════════════════════════════════════════════════════════╣
|
||||
║ ║
|
||||
║ PR: #NNN — title ║
|
||||
║ Branch: feature → main ║
|
||||
║ ║
|
||||
║ REVIEWS ║
|
||||
║ ├─ Eng Review: CURRENT / STALE (N commits) / — ║
|
||||
║ ├─ CEO Review: CURRENT / — (optional) ║
|
||||
║ ├─ Design Review: CURRENT / — (optional) ║
|
||||
║ └─ Codex Review: CURRENT / — (optional) ║
|
||||
║ ║
|
||||
║ TESTS ║
|
||||
║ ├─ Free tests: PASS / FAIL (blocker) ║
|
||||
║ ├─ E2E tests: 52/52 pass (25 min ago) / NOT RUN ║
|
||||
║ └─ LLM evals: PASS / NOT RUN ║
|
||||
║ ║
|
||||
║ DOCUMENTATION ║
|
||||
║ ├─ CHANGELOG: Updated / NOT UPDATED (warning) ║
|
||||
║ ├─ VERSION: 0.9.8.0 / NOT BUMPED (warning) ║
|
||||
║ └─ Doc release: Run / NOT RUN (warning) ║
|
||||
║ ║
|
||||
║ PR BODY ║
|
||||
║ └─ Accuracy: Current / STALE (warning) ║
|
||||
║ ║
|
||||
║ WARNINGS: N | BLOCKERS: N ║
|
||||
╚══════════════════════════════════════════════════════════╝
|
||||
```
|
||||
|
||||
If there are BLOCKERS (failing free tests): list them and recommend B.
|
||||
If there are WARNINGS but no blockers: list each warning and recommend A if
|
||||
warnings are minor, or B if warnings are significant.
|
||||
If everything is green: recommend A.
|
||||
|
||||
Use AskUserQuestion:
|
||||
|
||||
- **Re-ground:** "About to merge PR #NNN (title) from branch X to Y. Here's the
|
||||
readiness report." Show the report above.
|
||||
- List each warning and blocker explicitly.
|
||||
- **RECOMMENDATION:** Choose A if green. Choose B if there are significant warnings.
|
||||
Choose C only if the user understands the risks.
|
||||
- A) Merge — readiness checks passed (Completeness: 10/10)
|
||||
- B) Don't merge yet — address the warnings first (Completeness: 10/10)
|
||||
- C) Merge anyway — I understand the risks (Completeness: 3/10)
|
||||
|
||||
If the user chooses B: **STOP.** List exactly what needs to be done:
|
||||
- If reviews are stale: "Re-run /plan-eng-review (or /review) to review current code."
|
||||
- If E2E not run: "Run `bun run test:e2e` to verify."
|
||||
- If docs not updated: "Run /document-release to update documentation."
|
||||
- If PR body stale: "Update the PR body to reflect current changes."
|
||||
|
||||
If the user chooses B: **STOP** with a summary of what needs to be done.
|
||||
If the user chooses A or C: continue to Step 4.
|
||||
|
||||
---
|
||||
|
||||
+133
-41
@@ -376,79 +376,171 @@ If timeout (15 min): **STOP.** "CI has been running for 15 minutes. Investigate
|
||||
|
||||
## Step 3.5: Pre-merge readiness gate
|
||||
|
||||
**This is the critical safety check before an irreversible merge.** Gather ALL evidence,
|
||||
present it to the user, and get explicit confirmation before merging.
|
||||
**This is the critical safety check before an irreversible merge.** The merge cannot
|
||||
be undone without a revert commit. Gather ALL evidence, build a readiness report,
|
||||
and get explicit user confirmation before proceeding.
|
||||
|
||||
### 3.5a: Review Readiness Dashboard
|
||||
Collect evidence for each check below. Track warnings (yellow) and blockers (red).
|
||||
|
||||
### 3.5a: Review staleness check
|
||||
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-read 2>/dev/null
|
||||
```
|
||||
|
||||
Parse the output. Find the most recent entry for each review skill within the last 7 days.
|
||||
Check staleness: compare each review's commit hash against the current HEAD.
|
||||
Parse the output. For each review skill (plan-eng-review, plan-ceo-review,
|
||||
plan-design-review, design-review-lite, codex-review):
|
||||
|
||||
Display the dashboard:
|
||||
1. Find the most recent entry within the last 7 days.
|
||||
2. Extract its `commit` field.
|
||||
3. Compare against current HEAD: `git rev-list --count STORED_COMMIT..HEAD`
|
||||
|
||||
**Staleness rules:**
|
||||
- 0 commits since review → CURRENT
|
||||
- 1-3 commits since review → RECENT (yellow if those commits touch code, not just docs)
|
||||
- 4+ commits since review → STALE (red — review may not reflect current code)
|
||||
- No review found → NOT RUN
|
||||
|
||||
**Critical check:** Look at what changed AFTER the last review. Run:
|
||||
```bash
|
||||
git log --oneline STORED_COMMIT..HEAD
|
||||
```
|
||||
PRE-MERGE READINESS CHECK
|
||||
══════════════════════════
|
||||
Review | Status | Last Run | Stale?
|
||||
----------------|-----------|---------------------|--------
|
||||
Eng Review | CLEAR/— | 2026-03-21 15:00 | N commits behind
|
||||
CEO Review | CLEAR/— | — | —
|
||||
Design Review | CLEAR/— | — | —
|
||||
Codex Review | CLEAR/— | — | —
|
||||
```
|
||||
If any commits after the review contain words like "fix", "refactor", "rewrite",
|
||||
"overhaul", or touch more than 5 files — flag as **STALE (significant changes
|
||||
since review)**. The review was done on different code than what's about to merge.
|
||||
|
||||
### 3.5b: Test results
|
||||
|
||||
Check for recent E2E eval results:
|
||||
**Free tests — run them now:**
|
||||
|
||||
Read CLAUDE.md to find the project's test command. If not specified, use `bun test`.
|
||||
Run the test command and capture the exit code and output.
|
||||
|
||||
```bash
|
||||
bun test 2>&1 | tail -10
|
||||
```
|
||||
|
||||
If tests fail: **BLOCKER.** Cannot merge with failing tests.
|
||||
|
||||
**E2E tests — check recent results:**
|
||||
|
||||
```bash
|
||||
ls -t ~/.gstack-dev/evals/*-e2e-*-$(date +%Y-%m-%d)*.json 2>/dev/null | head -20
|
||||
```
|
||||
|
||||
For each eval file found from today, parse pass/fail counts and show a summary:
|
||||
For each eval file from today, parse pass/fail counts. Show:
|
||||
- Total tests, pass count, fail count
|
||||
- How long ago the run finished (from file timestamp)
|
||||
- Total cost
|
||||
- Names of any failing tests
|
||||
|
||||
```
|
||||
E2E Tests | 52/52 pass | 25 min ago | $8.50
|
||||
Free Tests | (run bun test to check)
|
||||
```
|
||||
If no E2E results from today: **WARNING — no E2E tests run today.**
|
||||
If E2E results exist but have failures: **WARNING — N tests failed.** List them.
|
||||
|
||||
If no E2E results from today exist, note: "No E2E tests run today."
|
||||
|
||||
Also run free tests inline:
|
||||
**LLM judge evals — check recent results:**
|
||||
|
||||
```bash
|
||||
bun test 2>&1 | tail -5
|
||||
ls -t ~/.gstack-dev/evals/*-llm-judge-*-$(date +%Y-%m-%d)*.json 2>/dev/null | head -5
|
||||
```
|
||||
|
||||
### 3.5c: Document-release check
|
||||
If found, parse and show pass/fail. If not found, note "No LLM evals run today."
|
||||
|
||||
Check if `/document-release` has been run on this branch:
|
||||
### 3.5c: PR body accuracy check
|
||||
|
||||
Read the current PR body:
|
||||
```bash
|
||||
gh pr view --json body -q .body
|
||||
```
|
||||
|
||||
Read the current diff summary:
|
||||
```bash
|
||||
git log --oneline $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)..HEAD | head -20
|
||||
```
|
||||
|
||||
Compare the PR body against the actual commits. Check for:
|
||||
1. **Missing features** — commits that add significant functionality not mentioned in the PR
|
||||
2. **Stale descriptions** — PR body mentions things that were later changed or reverted
|
||||
3. **Wrong version** — PR title or body references a version that doesn't match VERSION file
|
||||
|
||||
If the PR body looks stale or incomplete: **WARNING — PR body may not reflect current
|
||||
changes.** List what's missing or stale.
|
||||
|
||||
### 3.5d: Document-release check
|
||||
|
||||
Check if documentation was updated on this branch:
|
||||
|
||||
```bash
|
||||
git log --oneline --grep="docs: update project documentation" HEAD~5..HEAD
|
||||
git log --oneline --all-match --grep="docs:" $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)..HEAD | head -5
|
||||
```
|
||||
|
||||
If no doc update commit found: flag "WARNING: /document-release has not been run."
|
||||
Also check if key doc files were modified:
|
||||
```bash
|
||||
git diff --name-only $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)...HEAD -- README.md CHANGELOG.md ARCHITECTURE.md CONTRIBUTING.md CLAUDE.md VERSION
|
||||
```
|
||||
|
||||
### 3.5d: Merge confirmation
|
||||
If CHANGELOG.md and VERSION were NOT modified on this branch and the diff includes
|
||||
new features (new files, new commands, new skills): **WARNING — /document-release
|
||||
likely not run. CHANGELOG and VERSION not updated despite new features.**
|
||||
|
||||
Present all evidence in a single AskUserQuestion:
|
||||
If only docs changed (no code): skip this check.
|
||||
|
||||
- **Re-ground:** "About to merge PR #NNN on branch X to base branch Y."
|
||||
- Show the review dashboard, test results, and doc-release status from above.
|
||||
- If ANY of these are red flags (eng review missing/stale, E2E failures, no doc update),
|
||||
list them explicitly as warnings.
|
||||
- **RECOMMENDATION:** Choose A if everything is green. Choose B if there are warnings
|
||||
to address first. Choose C to skip checks and merge anyway.
|
||||
- A) Merge — all checks look good
|
||||
- B) Don't merge yet — fix the flagged issues first
|
||||
- C) Merge anyway — I accept the risks
|
||||
### 3.5e: Readiness report and confirmation
|
||||
|
||||
Build the full readiness report:
|
||||
|
||||
```
|
||||
╔══════════════════════════════════════════════════════════╗
|
||||
║ PRE-MERGE READINESS REPORT ║
|
||||
╠══════════════════════════════════════════════════════════╣
|
||||
║ ║
|
||||
║ PR: #NNN — title ║
|
||||
║ Branch: feature → main ║
|
||||
║ ║
|
||||
║ REVIEWS ║
|
||||
║ ├─ Eng Review: CURRENT / STALE (N commits) / — ║
|
||||
║ ├─ CEO Review: CURRENT / — (optional) ║
|
||||
║ ├─ Design Review: CURRENT / — (optional) ║
|
||||
║ └─ Codex Review: CURRENT / — (optional) ║
|
||||
║ ║
|
||||
║ TESTS ║
|
||||
║ ├─ Free tests: PASS / FAIL (blocker) ║
|
||||
║ ├─ E2E tests: 52/52 pass (25 min ago) / NOT RUN ║
|
||||
║ └─ LLM evals: PASS / NOT RUN ║
|
||||
║ ║
|
||||
║ DOCUMENTATION ║
|
||||
║ ├─ CHANGELOG: Updated / NOT UPDATED (warning) ║
|
||||
║ ├─ VERSION: 0.9.8.0 / NOT BUMPED (warning) ║
|
||||
║ └─ Doc release: Run / NOT RUN (warning) ║
|
||||
║ ║
|
||||
║ PR BODY ║
|
||||
║ └─ Accuracy: Current / STALE (warning) ║
|
||||
║ ║
|
||||
║ WARNINGS: N | BLOCKERS: N ║
|
||||
╚══════════════════════════════════════════════════════════╝
|
||||
```
|
||||
|
||||
If there are BLOCKERS (failing free tests): list them and recommend B.
|
||||
If there are WARNINGS but no blockers: list each warning and recommend A if
|
||||
warnings are minor, or B if warnings are significant.
|
||||
If everything is green: recommend A.
|
||||
|
||||
Use AskUserQuestion:
|
||||
|
||||
- **Re-ground:** "About to merge PR #NNN (title) from branch X to Y. Here's the
|
||||
readiness report." Show the report above.
|
||||
- List each warning and blocker explicitly.
|
||||
- **RECOMMENDATION:** Choose A if green. Choose B if there are significant warnings.
|
||||
Choose C only if the user understands the risks.
|
||||
- A) Merge — readiness checks passed (Completeness: 10/10)
|
||||
- B) Don't merge yet — address the warnings first (Completeness: 10/10)
|
||||
- C) Merge anyway — I understand the risks (Completeness: 3/10)
|
||||
|
||||
If the user chooses B: **STOP.** List exactly what needs to be done:
|
||||
- If reviews are stale: "Re-run /plan-eng-review (or /review) to review current code."
|
||||
- If E2E not run: "Run `bun run test:e2e` to verify."
|
||||
- If docs not updated: "Run /document-release to update documentation."
|
||||
- If PR body stale: "Update the PR body to reflect current changes."
|
||||
|
||||
If the user chooses B: **STOP** with a summary of what needs to be done.
|
||||
If the user chooses A or C: continue to Step 4.
|
||||
|
||||
---
|
||||
|
||||
+133
-41
@@ -118,79 +118,171 @@ If timeout (15 min): **STOP.** "CI has been running for 15 minutes. Investigate
|
||||
|
||||
## Step 3.5: Pre-merge readiness gate
|
||||
|
||||
**This is the critical safety check before an irreversible merge.** Gather ALL evidence,
|
||||
present it to the user, and get explicit confirmation before merging.
|
||||
**This is the critical safety check before an irreversible merge.** The merge cannot
|
||||
be undone without a revert commit. Gather ALL evidence, build a readiness report,
|
||||
and get explicit user confirmation before proceeding.
|
||||
|
||||
### 3.5a: Review Readiness Dashboard
|
||||
Collect evidence for each check below. Track warnings (yellow) and blockers (red).
|
||||
|
||||
### 3.5a: Review staleness check
|
||||
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-read 2>/dev/null
|
||||
```
|
||||
|
||||
Parse the output. Find the most recent entry for each review skill within the last 7 days.
|
||||
Check staleness: compare each review's commit hash against the current HEAD.
|
||||
Parse the output. For each review skill (plan-eng-review, plan-ceo-review,
|
||||
plan-design-review, design-review-lite, codex-review):
|
||||
|
||||
Display the dashboard:
|
||||
1. Find the most recent entry within the last 7 days.
|
||||
2. Extract its `commit` field.
|
||||
3. Compare against current HEAD: `git rev-list --count STORED_COMMIT..HEAD`
|
||||
|
||||
**Staleness rules:**
|
||||
- 0 commits since review → CURRENT
|
||||
- 1-3 commits since review → RECENT (yellow if those commits touch code, not just docs)
|
||||
- 4+ commits since review → STALE (red — review may not reflect current code)
|
||||
- No review found → NOT RUN
|
||||
|
||||
**Critical check:** Look at what changed AFTER the last review. Run:
|
||||
```bash
|
||||
git log --oneline STORED_COMMIT..HEAD
|
||||
```
|
||||
PRE-MERGE READINESS CHECK
|
||||
══════════════════════════
|
||||
Review | Status | Last Run | Stale?
|
||||
----------------|-----------|---------------------|--------
|
||||
Eng Review | CLEAR/— | 2026-03-21 15:00 | N commits behind
|
||||
CEO Review | CLEAR/— | — | —
|
||||
Design Review | CLEAR/— | — | —
|
||||
Codex Review | CLEAR/— | — | —
|
||||
```
|
||||
If any commits after the review contain words like "fix", "refactor", "rewrite",
|
||||
"overhaul", or touch more than 5 files — flag as **STALE (significant changes
|
||||
since review)**. The review was done on different code than what's about to merge.
|
||||
|
||||
### 3.5b: Test results
|
||||
|
||||
Check for recent E2E eval results:
|
||||
**Free tests — run them now:**
|
||||
|
||||
Read CLAUDE.md to find the project's test command. If not specified, use `bun test`.
|
||||
Run the test command and capture the exit code and output.
|
||||
|
||||
```bash
|
||||
bun test 2>&1 | tail -10
|
||||
```
|
||||
|
||||
If tests fail: **BLOCKER.** Cannot merge with failing tests.
|
||||
|
||||
**E2E tests — check recent results:**
|
||||
|
||||
```bash
|
||||
ls -t ~/.gstack-dev/evals/*-e2e-*-$(date +%Y-%m-%d)*.json 2>/dev/null | head -20
|
||||
```
|
||||
|
||||
For each eval file found from today, parse pass/fail counts and show a summary:
|
||||
For each eval file from today, parse pass/fail counts. Show:
|
||||
- Total tests, pass count, fail count
|
||||
- How long ago the run finished (from file timestamp)
|
||||
- Total cost
|
||||
- Names of any failing tests
|
||||
|
||||
```
|
||||
E2E Tests | 52/52 pass | 25 min ago | $8.50
|
||||
Free Tests | (run bun test to check)
|
||||
```
|
||||
If no E2E results from today: **WARNING — no E2E tests run today.**
|
||||
If E2E results exist but have failures: **WARNING — N tests failed.** List them.
|
||||
|
||||
If no E2E results from today exist, note: "No E2E tests run today."
|
||||
|
||||
Also run free tests inline:
|
||||
**LLM judge evals — check recent results:**
|
||||
|
||||
```bash
|
||||
bun test 2>&1 | tail -5
|
||||
ls -t ~/.gstack-dev/evals/*-llm-judge-*-$(date +%Y-%m-%d)*.json 2>/dev/null | head -5
|
||||
```
|
||||
|
||||
### 3.5c: Document-release check
|
||||
If found, parse and show pass/fail. If not found, note "No LLM evals run today."
|
||||
|
||||
Check if `/document-release` has been run on this branch:
|
||||
### 3.5c: PR body accuracy check
|
||||
|
||||
Read the current PR body:
|
||||
```bash
|
||||
gh pr view --json body -q .body
|
||||
```
|
||||
|
||||
Read the current diff summary:
|
||||
```bash
|
||||
git log --oneline $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)..HEAD | head -20
|
||||
```
|
||||
|
||||
Compare the PR body against the actual commits. Check for:
|
||||
1. **Missing features** — commits that add significant functionality not mentioned in the PR
|
||||
2. **Stale descriptions** — PR body mentions things that were later changed or reverted
|
||||
3. **Wrong version** — PR title or body references a version that doesn't match VERSION file
|
||||
|
||||
If the PR body looks stale or incomplete: **WARNING — PR body may not reflect current
|
||||
changes.** List what's missing or stale.
|
||||
|
||||
### 3.5d: Document-release check
|
||||
|
||||
Check if documentation was updated on this branch:
|
||||
|
||||
```bash
|
||||
git log --oneline --grep="docs: update project documentation" HEAD~5..HEAD
|
||||
git log --oneline --all-match --grep="docs:" $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)..HEAD | head -5
|
||||
```
|
||||
|
||||
If no doc update commit found: flag "WARNING: /document-release has not been run."
|
||||
Also check if key doc files were modified:
|
||||
```bash
|
||||
git diff --name-only $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)...HEAD -- README.md CHANGELOG.md ARCHITECTURE.md CONTRIBUTING.md CLAUDE.md VERSION
|
||||
```
|
||||
|
||||
### 3.5d: Merge confirmation
|
||||
If CHANGELOG.md and VERSION were NOT modified on this branch and the diff includes
|
||||
new features (new files, new commands, new skills): **WARNING — /document-release
|
||||
likely not run. CHANGELOG and VERSION not updated despite new features.**
|
||||
|
||||
Present all evidence in a single AskUserQuestion:
|
||||
If only docs changed (no code): skip this check.
|
||||
|
||||
- **Re-ground:** "About to merge PR #NNN on branch X to base branch Y."
|
||||
- Show the review dashboard, test results, and doc-release status from above.
|
||||
- If ANY of these are red flags (eng review missing/stale, E2E failures, no doc update),
|
||||
list them explicitly as warnings.
|
||||
- **RECOMMENDATION:** Choose A if everything is green. Choose B if there are warnings
|
||||
to address first. Choose C to skip checks and merge anyway.
|
||||
- A) Merge — all checks look good
|
||||
- B) Don't merge yet — fix the flagged issues first
|
||||
- C) Merge anyway — I accept the risks
|
||||
### 3.5e: Readiness report and confirmation
|
||||
|
||||
Build the full readiness report:
|
||||
|
||||
```
|
||||
╔══════════════════════════════════════════════════════════╗
|
||||
║ PRE-MERGE READINESS REPORT ║
|
||||
╠══════════════════════════════════════════════════════════╣
|
||||
║ ║
|
||||
║ PR: #NNN — title ║
|
||||
║ Branch: feature → main ║
|
||||
║ ║
|
||||
║ REVIEWS ║
|
||||
║ ├─ Eng Review: CURRENT / STALE (N commits) / — ║
|
||||
║ ├─ CEO Review: CURRENT / — (optional) ║
|
||||
║ ├─ Design Review: CURRENT / — (optional) ║
|
||||
║ └─ Codex Review: CURRENT / — (optional) ║
|
||||
║ ║
|
||||
║ TESTS ║
|
||||
║ ├─ Free tests: PASS / FAIL (blocker) ║
|
||||
║ ├─ E2E tests: 52/52 pass (25 min ago) / NOT RUN ║
|
||||
║ └─ LLM evals: PASS / NOT RUN ║
|
||||
║ ║
|
||||
║ DOCUMENTATION ║
|
||||
║ ├─ CHANGELOG: Updated / NOT UPDATED (warning) ║
|
||||
║ ├─ VERSION: 0.9.8.0 / NOT BUMPED (warning) ║
|
||||
║ └─ Doc release: Run / NOT RUN (warning) ║
|
||||
║ ║
|
||||
║ PR BODY ║
|
||||
║ └─ Accuracy: Current / STALE (warning) ║
|
||||
║ ║
|
||||
║ WARNINGS: N | BLOCKERS: N ║
|
||||
╚══════════════════════════════════════════════════════════╝
|
||||
```
|
||||
|
||||
If there are BLOCKERS (failing free tests): list them and recommend B.
|
||||
If there are WARNINGS but no blockers: list each warning and recommend A if
|
||||
warnings are minor, or B if warnings are significant.
|
||||
If everything is green: recommend A.
|
||||
|
||||
Use AskUserQuestion:
|
||||
|
||||
- **Re-ground:** "About to merge PR #NNN (title) from branch X to Y. Here's the
|
||||
readiness report." Show the report above.
|
||||
- List each warning and blocker explicitly.
|
||||
- **RECOMMENDATION:** Choose A if green. Choose B if there are significant warnings.
|
||||
Choose C only if the user understands the risks.
|
||||
- A) Merge — readiness checks passed (Completeness: 10/10)
|
||||
- B) Don't merge yet — address the warnings first (Completeness: 10/10)
|
||||
- C) Merge anyway — I understand the risks (Completeness: 3/10)
|
||||
|
||||
If the user chooses B: **STOP.** List exactly what needs to be done:
|
||||
- If reviews are stale: "Re-run /plan-eng-review (or /review) to review current code."
|
||||
- If E2E not run: "Run `bun run test:e2e` to verify."
|
||||
- If docs not updated: "Run /document-release to update documentation."
|
||||
- If PR body stale: "Update the PR body to reflect current changes."
|
||||
|
||||
If the user chooses B: **STOP** with a summary of what needs to be done.
|
||||
If the user chooses A or C: continue to Step 4.
|
||||
|
||||
---
|
||||
|
||||
Reference in New Issue
Block a user