mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-01 19:25:10 +02:00
fix: review log architecture — close gaps, add attribution (v0.11.21.0) (#512)
* fix: review log architecture — close gaps, fix orphans, add attribution - Ship Step 3.5 now logs its code review to the review log (via:"ship") - Remove eng review gate — ship runs its own review in Step 3.5 - Dashboard Outside Voice row mapped to codex-plan-review - Dashboard shows via source attribution (e.g., "via /autoplan") - land-and-deploy checks all 8 review skill types (was 5) - codex-review log gets commit field for staleness detection - autoplan uses placeholder tokens instead of hardcoded "clean" - Document autoplan-voices as audit-trail-only in review.ts - E2E test for dashboard via attribution * chore: bump version and changelog (v0.11.21.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -1,5 +1,16 @@
|
||||
# Changelog
|
||||
|
||||
## [0.11.21.0] - 2026-03-26
|
||||
|
||||
### Fixed
|
||||
|
||||
- **`/autoplan` reviews now count toward the ship readiness gate.** When `/autoplan` ran full CEO + Design + Eng reviews, `/ship` still showed "0 runs" for Eng Review because autoplan-logged entries weren't being read correctly. Now the dashboard shows source attribution (e.g., "CLEAR (PLAN via /autoplan)") so you can see exactly which tool satisfied each review.
|
||||
- **`/ship` no longer tells you to "run /review first."** Ship runs its own pre-landing review in Step 3.5 — asking you to run the same review separately was redundant. The gate is removed; ship just does it.
|
||||
- **`/land-and-deploy` now checks all 8 review types.** Previously missed `review`, `adversarial-review`, and `codex-plan-review` — if you only ran `/review` (not `/plan-eng-review`), land-and-deploy wouldn't see it.
|
||||
- **Dashboard Outside Voice row now works.** Was showing "0 runs" even after outside voices ran in `/plan-ceo-review` or `/plan-eng-review`. Now correctly maps to `codex-plan-review` entries.
|
||||
- **`/codex review` now tracks staleness.** Added the `commit` field to codex review log entries so the dashboard can detect when a codex review is outdated.
|
||||
- **`/autoplan` no longer hardcodes "clean" status.** Review log entries from autoplan used to always record `status:"clean"` even when issues were found. Now uses proper placeholder tokens that Claude substitutes with real values.
|
||||
|
||||
## [0.11.20.0] - 2026-03-26
|
||||
|
||||
### Added
|
||||
|
||||
+6
-6
@@ -929,24 +929,24 @@ AskUserQuestion options:
|
||||
|
||||
## Completion: Write Review Logs
|
||||
|
||||
On approval, write 3 separate review log entries so /ship's dashboard recognizes them:
|
||||
On approval, write 3 separate review log entries so /ship's dashboard recognizes them.
|
||||
Replace TIMESTAMP, STATUS, and N with actual values from each review phase.
|
||||
STATUS is "clean" if no unresolved issues, "issues_open" otherwise.
|
||||
|
||||
```bash
|
||||
COMMIT=$(git rev-parse --short HEAD 2>/dev/null)
|
||||
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
|
||||
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}'
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}'
|
||||
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"issues_found":0,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}'
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"issues_found":N,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}'
|
||||
```
|
||||
|
||||
If Phase 2 ran (UI scope):
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"via":"autoplan","commit":"'"$COMMIT"'"}'
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"via":"autoplan","commit":"'"$COMMIT"'"}'
|
||||
```
|
||||
|
||||
Replace field values with actual counts from the review.
|
||||
|
||||
Dual voice logs (one per phase that ran):
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"autoplan-voices","timestamp":"'"$TIMESTAMP"'","status":"STATUS","source":"SOURCE","phase":"ceo","via":"autoplan","consensus_confirmed":N,"consensus_disagree":N,"commit":"'"$COMMIT"'"}'
|
||||
|
||||
@@ -584,24 +584,24 @@ AskUserQuestion options:
|
||||
|
||||
## Completion: Write Review Logs
|
||||
|
||||
On approval, write 3 separate review log entries so /ship's dashboard recognizes them:
|
||||
On approval, write 3 separate review log entries so /ship's dashboard recognizes them.
|
||||
Replace TIMESTAMP, STATUS, and N with actual values from each review phase.
|
||||
STATUS is "clean" if no unresolved issues, "issues_open" otherwise.
|
||||
|
||||
```bash
|
||||
COMMIT=$(git rev-parse --short HEAD 2>/dev/null)
|
||||
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
|
||||
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}'
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}'
|
||||
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"issues_found":0,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}'
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"issues_found":N,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}'
|
||||
```
|
||||
|
||||
If Phase 2 ran (UI scope):
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"via":"autoplan","commit":"'"$COMMIT"'"}'
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"via":"autoplan","commit":"'"$COMMIT"'"}'
|
||||
```
|
||||
|
||||
Replace field values with actual counts from the review.
|
||||
|
||||
Dual voice logs (one per phase that ran):
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"autoplan-voices","timestamp":"'"$TIMESTAMP"'","status":"STATUS","source":"SOURCE","phase":"ceo","via":"autoplan","consensus_confirmed":N,"consensus_disagree":N,"commit":"'"$COMMIT"'"}'
|
||||
|
||||
+1
-1
@@ -423,7 +423,7 @@ CROSS-MODEL ANALYSIS:
|
||||
|
||||
7. Persist the review result:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N}'
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N,"commit":"'"$(git rev-parse --short HEAD)"'"}'
|
||||
```
|
||||
|
||||
Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL),
|
||||
|
||||
+1
-1
@@ -127,7 +127,7 @@ CROSS-MODEL ANALYSIS:
|
||||
|
||||
7. Persist the review result:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N}'
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N,"commit":"'"$(git rev-parse --short HEAD)"'"}'
|
||||
```
|
||||
|
||||
Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL),
|
||||
|
||||
@@ -447,7 +447,8 @@ Collect evidence for each check below. Track warnings (yellow) and blockers (red
|
||||
```
|
||||
|
||||
Parse the output. For each review skill (plan-eng-review, plan-ceo-review,
|
||||
plan-design-review, design-review-lite, codex-review):
|
||||
plan-design-review, design-review-lite, codex-review, review, adversarial-review,
|
||||
codex-plan-review):
|
||||
|
||||
1. Find the most recent entry within the last 7 days.
|
||||
2. Extract its `commit` field.
|
||||
@@ -594,7 +595,7 @@ Use AskUserQuestion:
|
||||
- C) Merge anyway — I understand the risks (Completeness: 3/10)
|
||||
|
||||
If the user chooses B: **STOP.** List exactly what needs to be done:
|
||||
- If reviews are stale: "Re-run /plan-eng-review (or /review) to review current code."
|
||||
- If reviews are stale: "Re-run `/plan-eng-review`, `/review`, or `/autoplan` to review current code."
|
||||
- If E2E not run: "Run `bun run test:e2e` to verify."
|
||||
- If docs not updated: "Run /document-release to update documentation."
|
||||
- If PR body stale: "Update the PR body to reflect current changes."
|
||||
|
||||
@@ -134,7 +134,8 @@ Collect evidence for each check below. Track warnings (yellow) and blockers (red
|
||||
```
|
||||
|
||||
Parse the output. For each review skill (plan-eng-review, plan-ceo-review,
|
||||
plan-design-review, design-review-lite, codex-review):
|
||||
plan-design-review, design-review-lite, codex-review, review, adversarial-review,
|
||||
codex-plan-review):
|
||||
|
||||
1. Find the most recent entry within the last 7 days.
|
||||
2. Extract its `commit` field.
|
||||
@@ -281,7 +282,7 @@ Use AskUserQuestion:
|
||||
- C) Merge anyway — I understand the risks (Completeness: 3/10)
|
||||
|
||||
If the user chooses B: **STOP.** List exactly what needs to be done:
|
||||
- If reviews are stale: "Re-run /plan-eng-review (or /review) to review current code."
|
||||
- If reviews are stale: "Re-run `/plan-eng-review`, `/review`, or `/autoplan` to review current code."
|
||||
- If E2E not run: "Run `bun run test:e2e` to verify."
|
||||
- If docs not updated: "Run /document-release to update documentation."
|
||||
- If PR body stale: "Update the PR body to reflect current changes."
|
||||
|
||||
@@ -1262,7 +1262,13 @@ After completing the review, read the review log and config to display the dashb
|
||||
~/.claude/skills/gstack/bin/gstack-review-read
|
||||
```
|
||||
|
||||
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
|
||||
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.
|
||||
|
||||
**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.
|
||||
|
||||
Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.
|
||||
|
||||
Display:
|
||||
|
||||
```
|
||||
+====================================================================+
|
||||
|
||||
@@ -768,7 +768,13 @@ After completing the review, read the review log and config to display the dashb
|
||||
~/.claude/skills/gstack/bin/gstack-review-read
|
||||
```
|
||||
|
||||
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
|
||||
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.
|
||||
|
||||
**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.
|
||||
|
||||
Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.
|
||||
|
||||
Display:
|
||||
|
||||
```
|
||||
+====================================================================+
|
||||
|
||||
@@ -870,7 +870,13 @@ After completing the review, read the review log and config to display the dashb
|
||||
~/.claude/skills/gstack/bin/gstack-review-read
|
||||
```
|
||||
|
||||
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
|
||||
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.
|
||||
|
||||
**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.
|
||||
|
||||
Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.
|
||||
|
||||
Display:
|
||||
|
||||
```
|
||||
+====================================================================+
|
||||
|
||||
@@ -9,7 +9,13 @@ After completing the review, read the review log and config to display the dashb
|
||||
~/.claude/skills/gstack/bin/gstack-review-read
|
||||
\`\`\`
|
||||
|
||||
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between \`review\` (diff-scoped pre-landing review) and \`plan-eng-review\` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between \`adversarial-review\` (new auto-scaled) and \`codex-review\` (legacy). For Design Review, show whichever is more recent between \`plan-design-review\` (full visual audit) and \`design-review-lite\` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
|
||||
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between \`review\` (diff-scoped pre-landing review) and \`plan-eng-review\` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between \`adversarial-review\` (new auto-scaled) and \`codex-review\` (legacy). For Design Review, show whichever is more recent between \`plan-design-review\` (full visual audit) and \`design-review-lite\` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent \`codex-plan-review\` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.
|
||||
|
||||
**Source attribution:** If the most recent entry for a skill has a \\\`"via"\\\` field, append it to the status label in parentheses. Examples: \`plan-eng-review\` with \`via:"autoplan"\` shows as "CLEAR (PLAN via /autoplan)". \`review\` with \`via:"ship"\` shows as "CLEAR (DIFF via /ship)". Entries without a \`via\` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.
|
||||
|
||||
Note: \`autoplan-voices\` and \`design-outside-voices\` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.
|
||||
|
||||
Display:
|
||||
|
||||
\`\`\`
|
||||
+====================================================================+
|
||||
|
||||
+21
-19
@@ -364,7 +364,13 @@ After completing the review, read the review log and config to display the dashb
|
||||
~/.claude/skills/gstack/bin/gstack-review-read
|
||||
```
|
||||
|
||||
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
|
||||
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.
|
||||
|
||||
**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.
|
||||
|
||||
Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.
|
||||
|
||||
Display:
|
||||
|
||||
```
|
||||
+====================================================================+
|
||||
@@ -403,26 +409,15 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
|
||||
|
||||
If the Eng Review is NOT "CLEAR":
|
||||
|
||||
1. **Check for a prior override on this branch:**
|
||||
```bash
|
||||
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
|
||||
grep '"skill":"ship-review-override"' ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_OVERRIDE"
|
||||
```
|
||||
If an override exists, display the dashboard and note "Review gate previously accepted — continuing." Do NOT ask again.
|
||||
Print: "No prior eng review found — ship will run its own pre-landing review in Step 3.5."
|
||||
|
||||
2. **If no override exists,** use AskUserQuestion:
|
||||
- Show that Eng Review is missing or has open issues
|
||||
- RECOMMENDATION: Choose C if the change is obviously trivial (< 20 lines, typo fix, config-only); Choose B for larger changes
|
||||
- Options: A) Ship anyway B) Abort — run /review or /plan-eng-review first C) Change is too small to need eng review
|
||||
- If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block
|
||||
- For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block.
|
||||
Check diff size: `git diff <base>...HEAD --stat | tail -1`. If the diff is >200 lines, add: "Note: This is a large diff. Consider running `/plan-eng-review` or `/autoplan` for architecture-level review before shipping."
|
||||
|
||||
3. **If the user chooses A or C,** persist the decision so future `/ship` runs on this branch skip the gate:
|
||||
```bash
|
||||
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
|
||||
echo '{"skill":"ship-review-override","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","decision":"USER_CHOICE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
|
||||
```
|
||||
Substitute USER_CHOICE with "ship_anyway" or "not_relevant".
|
||||
If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block.
|
||||
|
||||
For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block.
|
||||
|
||||
Continue to Step 1.5 — do NOT block or ask. Ship runs its own review in Step 3.5.
|
||||
|
||||
---
|
||||
|
||||
@@ -1340,6 +1335,13 @@ Present Codex output under a `CODEX (design):` header, merged with the checklist
|
||||
|
||||
If no issues found: `Pre-Landing Review: No issues found.`
|
||||
|
||||
9. Persist the review result to the review log:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"commit":"'"$(git rev-parse --short HEAD)"'","via":"ship"}'
|
||||
```
|
||||
Substitute TIMESTAMP (ISO 8601), STATUS ("clean" if no issues, "issues_found" otherwise),
|
||||
and N values from the summary counts above. The `via:"ship"` distinguishes from standalone `/review` runs.
|
||||
|
||||
Save the review output — it goes into the PR body in Step 8.
|
||||
|
||||
---
|
||||
|
||||
+14
-18
@@ -64,26 +64,15 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
|
||||
|
||||
If the Eng Review is NOT "CLEAR":
|
||||
|
||||
1. **Check for a prior override on this branch:**
|
||||
```bash
|
||||
{{SLUG_EVAL}}
|
||||
grep '"skill":"ship-review-override"' ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_OVERRIDE"
|
||||
```
|
||||
If an override exists, display the dashboard and note "Review gate previously accepted — continuing." Do NOT ask again.
|
||||
Print: "No prior eng review found — ship will run its own pre-landing review in Step 3.5."
|
||||
|
||||
2. **If no override exists,** use AskUserQuestion:
|
||||
- Show that Eng Review is missing or has open issues
|
||||
- RECOMMENDATION: Choose C if the change is obviously trivial (< 20 lines, typo fix, config-only); Choose B for larger changes
|
||||
- Options: A) Ship anyway B) Abort — run /review or /plan-eng-review first C) Change is too small to need eng review
|
||||
- If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block
|
||||
- For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block.
|
||||
Check diff size: `git diff <base>...HEAD --stat | tail -1`. If the diff is >200 lines, add: "Note: This is a large diff. Consider running `/plan-eng-review` or `/autoplan` for architecture-level review before shipping."
|
||||
|
||||
3. **If the user chooses A or C,** persist the decision so future `/ship` runs on this branch skip the gate:
|
||||
```bash
|
||||
{{SLUG_EVAL}}
|
||||
echo '{"skill":"ship-review-override","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","decision":"USER_CHOICE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
|
||||
```
|
||||
Substitute USER_CHOICE with "ship_anyway" or "not_relevant".
|
||||
If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block.
|
||||
|
||||
For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block.
|
||||
|
||||
Continue to Step 1.5 — do NOT block or ask. Ship runs its own review in Step 3.5.
|
||||
|
||||
---
|
||||
|
||||
@@ -275,6 +264,13 @@ Review the diff for structural issues that tests don't catch.
|
||||
|
||||
If no issues found: `Pre-Landing Review: No issues found.`
|
||||
|
||||
9. Persist the review result to the review log:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"commit":"'"$(git rev-parse --short HEAD)"'","via":"ship"}'
|
||||
```
|
||||
Substitute TIMESTAMP (ISO 8601), STATUS ("clean" if no issues, "issues_found" otherwise),
|
||||
and N values from the summary counts above. The `via:"ship"` distinguishes from standalone `/review` runs.
|
||||
|
||||
Save the review output — it goes into the PR body in Step 8.
|
||||
|
||||
---
|
||||
|
||||
@@ -79,6 +79,7 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
|
||||
// Ship
|
||||
'ship-base-branch': ['ship/**', 'bin/gstack-repo-mode'],
|
||||
'ship-local-workflow': ['ship/**', 'scripts/gen-skill-docs.ts'],
|
||||
'review-dashboard-via': ['ship/**', 'scripts/resolvers/review.ts', 'codex/**', 'autoplan/**', 'land-and-deploy/**'],
|
||||
'ship-plan-completion': ['ship/**', 'scripts/gen-skill-docs.ts'],
|
||||
'ship-plan-verification': ['ship/**', 'scripts/gen-skill-docs.ts'],
|
||||
|
||||
|
||||
@@ -529,6 +529,119 @@ Analyze the git history and produce the narrative report as described in the SKI
|
||||
}, 420_000);
|
||||
});
|
||||
|
||||
// --- Review Dashboard Via Attribution E2E ---
|
||||
|
||||
describeIfSelected('Review Dashboard Via Attribution', ['review-dashboard-via'], () => {
|
||||
let dashDir: string;
|
||||
|
||||
beforeAll(() => {
|
||||
dashDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-dashboard-via-'));
|
||||
const run = (cmd: string, args: string[], cwd = dashDir) =>
|
||||
spawnSync(cmd, args, { cwd, stdio: 'pipe', timeout: 5000 });
|
||||
|
||||
// Create git repo with feature branch
|
||||
run('git', ['init', '-b', 'main']);
|
||||
run('git', ['config', 'user.email', 'test@test.com']);
|
||||
run('git', ['config', 'user.name', 'Test']);
|
||||
|
||||
fs.writeFileSync(path.join(dashDir, 'app.ts'), 'console.log("v1");\n');
|
||||
run('git', ['add', 'app.ts']);
|
||||
run('git', ['commit', '-m', 'initial']);
|
||||
|
||||
run('git', ['checkout', '-b', 'feature/dashboard-test']);
|
||||
fs.writeFileSync(path.join(dashDir, 'app.ts'), 'console.log("v2");\n');
|
||||
run('git', ['add', 'app.ts']);
|
||||
run('git', ['commit', '-m', 'feat: update']);
|
||||
|
||||
// Get HEAD commit for review entries
|
||||
const headResult = spawnSync('git', ['rev-parse', '--short', 'HEAD'], { cwd: dashDir, stdio: 'pipe' });
|
||||
const commit = headResult.stdout.toString().trim();
|
||||
|
||||
// Pre-populate review log with autoplan-sourced entries
|
||||
// gstack-review-read reads from ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
|
||||
// For the test, we'll write a mock gstack-review-read script that returns our test data
|
||||
const timestamp = new Date().toISOString().replace(/\.\d{3}Z$/, 'Z');
|
||||
const reviewData = [
|
||||
`{"skill":"plan-eng-review","timestamp":"${timestamp}","status":"clean","unresolved":0,"critical_gaps":0,"issues_found":0,"mode":"FULL_REVIEW","via":"autoplan","commit":"${commit}"}`,
|
||||
`{"skill":"plan-ceo-review","timestamp":"${timestamp}","status":"clean","unresolved":0,"critical_gaps":0,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"${commit}"}`,
|
||||
`{"skill":"codex-plan-review","timestamp":"${timestamp}","status":"clean","source":"codex","commit":"${commit}"}`,
|
||||
].join('\n');
|
||||
|
||||
// Write a mock gstack-review-read that returns our test data
|
||||
const mockBinDir = path.join(dashDir, '.mock-bin');
|
||||
fs.mkdirSync(mockBinDir, { recursive: true });
|
||||
fs.writeFileSync(path.join(mockBinDir, 'gstack-review-read'), [
|
||||
'#!/usr/bin/env bash',
|
||||
`echo '${reviewData.split('\n').join("'\necho '")}'`,
|
||||
'echo "---CONFIG---"',
|
||||
'echo "false"',
|
||||
'echo "---HEAD---"',
|
||||
`echo "${commit}"`,
|
||||
].join('\n'));
|
||||
fs.chmodSync(path.join(mockBinDir, 'gstack-review-read'), 0o755);
|
||||
|
||||
// Copy ship skill
|
||||
fs.copyFileSync(path.join(ROOT, 'ship', 'SKILL.md'), path.join(dashDir, 'ship-SKILL.md'));
|
||||
});
|
||||
|
||||
afterAll(() => {
|
||||
try { fs.rmSync(dashDir, { recursive: true, force: true }); } catch {}
|
||||
});
|
||||
|
||||
testConcurrentIfSelected('review-dashboard-via', async () => {
|
||||
const mockBinDir = path.join(dashDir, '.mock-bin');
|
||||
|
||||
const result = await runSkillTest({
|
||||
prompt: `Read ship-SKILL.md. You only need to run the Review Readiness Dashboard section.
|
||||
|
||||
Instead of running ~/.claude/skills/gstack/bin/gstack-review-read, run this mock: ${mockBinDir}/gstack-review-read
|
||||
|
||||
Parse the output and display the dashboard table. Pay attention to:
|
||||
1. The "via" field in entries — show source attribution (e.g., "via /autoplan")
|
||||
2. The codex-plan-review entry — it should populate the Outside Voice row
|
||||
3. Since Eng Review IS clear, there should be NO gate blocking — just display the dashboard
|
||||
|
||||
Skip the preamble, lake intro, telemetry, and all other ship steps.
|
||||
Write the dashboard output to ${dashDir}/dashboard-output.md`,
|
||||
workingDirectory: dashDir,
|
||||
maxTurns: 12,
|
||||
timeout: 90_000,
|
||||
testName: 'review-dashboard-via',
|
||||
runId,
|
||||
});
|
||||
|
||||
logCost('/ship dashboard-via', result);
|
||||
recordE2E(evalCollector, '/ship review dashboard via attribution', 'Dashboard via field', result);
|
||||
expect(result.exitReason).toBe('success');
|
||||
|
||||
// Check dashboard output for via attribution
|
||||
const dashPath = path.join(dashDir, 'dashboard-output.md');
|
||||
const allOutput = [
|
||||
result.output || '',
|
||||
...result.toolCalls.map(tc => tc.output || ''),
|
||||
].join('\n').toLowerCase();
|
||||
|
||||
// Verify via attribution appears somewhere (conversation or file)
|
||||
let dashContent = '';
|
||||
if (fs.existsSync(dashPath)) {
|
||||
dashContent = fs.readFileSync(dashPath, 'utf-8').toLowerCase();
|
||||
}
|
||||
const combined = allOutput + dashContent;
|
||||
|
||||
// Should mention autoplan attribution
|
||||
expect(combined).toMatch(/autoplan/);
|
||||
// Should show eng review as CLEAR (it has a clean entry)
|
||||
expect(combined).toMatch(/clear/i);
|
||||
// Should NOT contain AskUserQuestion gate (no blocking)
|
||||
const gateQuestions = result.toolCalls.filter(tc =>
|
||||
tc.tool === 'mcp__conductor__AskUserQuestion' ||
|
||||
(tc.tool === 'AskUserQuestion')
|
||||
);
|
||||
// Ship dashboard should not gate when eng review is clear
|
||||
expect(gateQuestions).toHaveLength(0);
|
||||
}, 120_000);
|
||||
});
|
||||
|
||||
// Module-level afterAll — finalize eval collector after all tests complete
|
||||
afterAll(async () => {
|
||||
await finalizeEvalCollector(evalCollector);
|
||||
|
||||
Reference in New Issue
Block a user