diff --git a/.agents/skills/gstack-plan-ceo-review/SKILL.md b/.agents/skills/gstack-plan-ceo-review/SKILL.md index f253d18d..3a1fecc4 100644 --- a/.agents/skills/gstack-plan-ceo-review/SKILL.md +++ b/.agents/skills/gstack-plan-ceo-review/SKILL.md @@ -931,6 +931,8 @@ Required ASCII diagram: user flow showing screens/states and transitions. If this plan has significant UI scope, recommend: "Consider running /plan-design-review for a deep design review of this plan before implementation." **STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds. + + ## Post-Implementation Design Audit (if UI scope detected) After implementation, run `/design-review` on the live site to catch visual issues that can only be evaluated with rendered output. @@ -1025,6 +1027,7 @@ List every ASCII diagram in files this plan touches. Still accurate? | TODOS.md updates | ___ items proposed | | Scope proposals | ___ proposed, ___ accepted (EXP + SEL) | | CEO plan | written / skipped (HOLD/REDUCTION) | + | Outside voice | ran (codex/claude) / skipped | | Lake Score | X/Y recommendations chose complete option | | Diagrams produced | ___ (list types) | | Stale diagrams found | ___ | @@ -1078,7 +1081,7 @@ After completing the review, read the review log and config to display the dashb ~/.codex/skills/gstack/bin/gstack-review-read ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ @@ -1090,6 +1093,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | | Adversarial | 0 | — | — | no | +| Outside Voice | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -1100,6 +1104,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. - **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed. +- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping. **Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`) diff --git a/.agents/skills/gstack-plan-design-review/SKILL.md b/.agents/skills/gstack-plan-design-review/SKILL.md index af092247..2665744d 100644 --- a/.agents/skills/gstack-plan-design-review/SKILL.md +++ b/.agents/skills/gstack-plan-design-review/SKILL.md @@ -528,7 +528,7 @@ After completing the review, read the review log and config to display the dashb ~/.codex/skills/gstack/bin/gstack-review-read ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ @@ -540,6 +540,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | | Adversarial | 0 | — | — | no | +| Outside Voice | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -550,6 +551,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. - **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed. +- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping. **Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`) diff --git a/.agents/skills/gstack-plan-eng-review/SKILL.md b/.agents/skills/gstack-plan-eng-review/SKILL.md index f2be53a3..ecfa85b8 100644 --- a/.agents/skills/gstack-plan-eng-review/SKILL.md +++ b/.agents/skills/gstack-plan-eng-review/SKILL.md @@ -327,28 +327,7 @@ Before reviewing anything, answer these questions: If the complexity check triggers (8+ files or 2+ new classes/services), proactively recommend scope reduction via AskUserQuestion — explain what's overbuilt, propose a minimal version that achieves the core goal, and ask whether to reduce or proceed as-is. If the complexity check does not trigger, present your Step 0 findings and proceed directly to Section 1. -### Step 0.5: Codex plan review (optional) -Check if the Codex CLI is available: `which codex 2>/dev/null` - -If available, after presenting Step 0 findings, use AskUserQuestion: -``` -Want an independent Codex (OpenAI) review of this plan before the detailed review? -A) Yes — let Codex critique the plan independently -B) No — proceed with the Claude review only -``` - -If the user chooses A: tell Codex to read the plan file itself (avoids ARG_MAX limits for large plans): -```bash -codex exec "You are a brutally honest technical reviewer. Read the plan file at and review it for: logical gaps and unstated assumptions, missing error handling or edge cases, overcomplexity (is there a simpler approach?), feasibility risks (what could go wrong?), and missing dependencies or sequencing issues. Be direct. Be terse. No compliments. Just the problems." -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached -``` - -Replace `` with the actual path to the plan file detected earlier. Codex has filesystem access in read-only mode and will read the file itself. - -Present the full output under a `CODEX SAYS (plan review):` header. Note any concerns -that should inform the subsequent engineering review sections. - -If Codex is not available, skip silently. Always work through the full interactive review: one section at a time (Architecture → Code Quality → Tests → Performance) with at most 8 top issues per section. @@ -483,6 +462,7 @@ At the end of the review, fill in and display this summary so the user can see a - What already exists: written - TODOS.md updates: ___ items proposed to user - Failure modes: ___ critical gaps flagged +- Outside voice: ran (codex/claude) / skipped - Lake Score: X/Y recommendations chose complete option ## Retrospective learning @@ -525,7 +505,7 @@ After completing the review, read the review log and config to display the dashb ~/.codex/skills/gstack/bin/gstack-review-read ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ @@ -537,6 +517,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | | Adversarial | 0 | — | — | no | +| Outside Voice | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -547,6 +528,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. - **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed. +- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping. **Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`) diff --git a/.agents/skills/gstack-ship/SKILL.md b/.agents/skills/gstack-ship/SKILL.md index 442c4a72..1b01c249 100644 --- a/.agents/skills/gstack-ship/SKILL.md +++ b/.agents/skills/gstack-ship/SKILL.md @@ -294,7 +294,7 @@ After completing the review, read the review log and config to display the dashb ~/.codex/skills/gstack/bin/gstack-review-read ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ @@ -306,6 +306,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | | Adversarial | 0 | — | — | no | +| Outside Voice | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -316,6 +317,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. - **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed. +- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping. **Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`) diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md index 28ba5910..252823f6 100644 --- a/plan-ceo-review/SKILL.md +++ b/plan-ceo-review/SKILL.md @@ -940,6 +940,125 @@ Required ASCII diagram: user flow showing screens/states and transitions. If this plan has significant UI scope, recommend: "Consider running /plan-design-review for a deep design review of this plan before implementation." **STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds. +## Outside Voice — Independent Plan Challenge (optional, recommended) + +After all review sections are complete, offer an independent second opinion from a +different AI system. Two models agreeing on a plan is stronger signal than one model's +thorough review. + +**Check tool availability:** + +```bash +which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE" +``` + +Use AskUserQuestion: + +> "All review sections are complete. Want an outside voice? A different AI system can +> give a brutally honest, independent challenge of this plan — logical gaps, feasibility +> risks, and blind spots that are hard to catch from inside the review. Takes about 2 +> minutes." +> +> RECOMMENDATION: Choose A — an independent second opinion catches structural blind +> spots. Two different AI models agreeing on a plan is stronger signal than one model's +> thorough review. Completeness: A=9/10, B=7/10. + +Options: +- A) Get the outside voice (recommended) +- B) Skip — proceed to outputs + +**If B:** Print "Skipping outside voice." and continue to the next section. + +**If A:** Construct the plan review prompt. Read the plan file being reviewed (the file +the user pointed this review at, or the branch diff scope). If a CEO plan document +was written in Step 0D-POST, read that too — it contains the scope decisions and vision. + +Construct this prompt (substitute the actual plan content — if plan content exceeds 30KB, +truncate to the first 30KB and note "Plan truncated for size"): + +"You are a brutally honest technical reviewer examining a development plan that has +already been through a multi-section review. Your job is NOT to repeat that review. +Instead, find what it missed. Look for: logical gaps and unstated assumptions that +survived the review scrutiny, overcomplexity (is there a fundamentally simpler +approach the review was too deep in the weeds to see?), feasibility risks the review +took for granted, missing dependencies or sequencing issues, and strategic +miscalibration (is this the right thing to build at all?). Be direct. Be terse. No +compliments. Just the problems. + +THE PLAN: +" + +**If CODEX_AVAILABLE:** + +```bash +TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX) +codex exec "" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_PV" +``` + +Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr: +```bash +cat "$TMPERR_PV" +``` + +Present the full output verbatim: + +``` +CODEX SAYS (plan review — outside voice): +════════════════════════════════════════════════════════════ + +════════════════════════════════════════════════════════════ +``` + +**Error handling:** All errors are non-blocking — the outside voice is informational. +- Auth failure (stderr contains "auth", "login", "unauthorized"): "Codex auth failed. Run \`codex login\` to authenticate." +- Timeout: "Codex timed out after 5 minutes." +- Empty response: "Codex returned no response." + +On any Codex error, fall back to the Claude adversarial subagent. + +**If CODEX_NOT_AVAILABLE (or Codex errored):** + +Dispatch via the Agent tool. The subagent has fresh context — genuine independence. + +Subagent prompt: same plan review prompt as above. + +Present findings under an `OUTSIDE VOICE (Claude subagent):` header. + +If the subagent fails or times out: "Outside voice unavailable. Continuing to outputs." + +**Cross-model tension:** + +After presenting the outside voice findings, note any points where the outside voice +disagrees with the review findings from earlier sections. Flag these as: + +``` +CROSS-MODEL TENSION: + [Topic]: Review said X. Outside voice says Y. [Your assessment of who's right.] +``` + +For each substantive tension point, auto-propose as a TODO via AskUserQuestion: + +> "Cross-model disagreement on [topic]. The review found [X] but the outside voice +> argues [Y]. Worth investigating further?" + +Options: +- A) Add to TODOS.md +- B) Skip — not substantive + +If no tension points exist, note: "No cross-model tension — both reviewers agree." + +**Persist the result:** +```bash +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-plan-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","commit":"'"$(git rev-parse --short HEAD)"'"}' +``` + +Substitute: STATUS = "clean" if no findings, "issues_found" if findings exist. +SOURCE = "codex" if Codex ran, "claude" if subagent ran. + +**Cleanup:** Run `rm -f "$TMPERR_PV"` after processing (if Codex was used). + +--- + ## Post-Implementation Design Audit (if UI scope detected) After implementation, run `/design-review` on the live site to catch visual issues that can only be evaluated with rendered output. @@ -1034,6 +1153,7 @@ List every ASCII diagram in files this plan touches. Still accurate? | TODOS.md updates | ___ items proposed | | Scope proposals | ___ proposed, ___ accepted (EXP + SEL) | | CEO plan | written / skipped (HOLD/REDUCTION) | + | Outside voice | ran (codex/claude) / skipped | | Lake Score | X/Y recommendations chose complete option | | Diagrams produced | ___ (list types) | | Stale diagrams found | ___ | @@ -1087,7 +1207,7 @@ After completing the review, read the review log and config to display the dashb ~/.claude/skills/gstack/bin/gstack-review-read ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ @@ -1099,6 +1219,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | | Adversarial | 0 | — | — | no | +| Outside Voice | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -1109,6 +1230,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. - **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed. +- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping. **Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`) diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md index d7aaa3e8..f8e2fbdb 100644 --- a/plan-design-review/SKILL.md +++ b/plan-design-review/SKILL.md @@ -536,7 +536,7 @@ After completing the review, read the review log and config to display the dashb ~/.claude/skills/gstack/bin/gstack-review-read ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ @@ -548,6 +548,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | | Adversarial | 0 | — | — | no | +| Outside Voice | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -558,6 +559,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. - **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed. +- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping. **Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`) diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md index b3f099a0..dcc17f46 100644 --- a/plan-eng-review/SKILL.md +++ b/plan-eng-review/SKILL.md @@ -337,28 +337,124 @@ Before reviewing anything, answer these questions: If the complexity check triggers (8+ files or 2+ new classes/services), proactively recommend scope reduction via AskUserQuestion — explain what's overbuilt, propose a minimal version that achieves the core goal, and ask whether to reduce or proceed as-is. If the complexity check does not trigger, present your Step 0 findings and proceed directly to Section 1. -### Step 0.5: Codex plan review (optional) +## Outside Voice — Independent Plan Challenge (optional, recommended) -Check if the Codex CLI is available: `which codex 2>/dev/null` +After all review sections are complete, offer an independent second opinion from a +different AI system. Two models agreeing on a plan is stronger signal than one model's +thorough review. -If available, after presenting Step 0 findings, use AskUserQuestion: -``` -Want an independent Codex (OpenAI) review of this plan before the detailed review? -A) Yes — let Codex critique the plan independently -B) No — proceed with the Claude review only -``` +**Check tool availability:** -If the user chooses A: tell Codex to read the plan file itself (avoids ARG_MAX limits for large plans): ```bash -codex exec "You are a brutally honest technical reviewer. Read the plan file at and review it for: logical gaps and unstated assumptions, missing error handling or edge cases, overcomplexity (is there a simpler approach?), feasibility risks (what could go wrong?), and missing dependencies or sequencing issues. Be direct. Be terse. No compliments. Just the problems." -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached +which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE" ``` -Replace `` with the actual path to the plan file detected earlier. Codex has filesystem access in read-only mode and will read the file itself. +Use AskUserQuestion: -Present the full output under a `CODEX SAYS (plan review):` header. Note any concerns -that should inform the subsequent engineering review sections. +> "All review sections are complete. Want an outside voice? A different AI system can +> give a brutally honest, independent challenge of this plan — logical gaps, feasibility +> risks, and blind spots that are hard to catch from inside the review. Takes about 2 +> minutes." +> +> RECOMMENDATION: Choose A — an independent second opinion catches structural blind +> spots. Two different AI models agreeing on a plan is stronger signal than one model's +> thorough review. Completeness: A=9/10, B=7/10. -If Codex is not available, skip silently. +Options: +- A) Get the outside voice (recommended) +- B) Skip — proceed to outputs + +**If B:** Print "Skipping outside voice." and continue to the next section. + +**If A:** Construct the plan review prompt. Read the plan file being reviewed (the file +the user pointed this review at, or the branch diff scope). If a CEO plan document +was written in Step 0D-POST, read that too — it contains the scope decisions and vision. + +Construct this prompt (substitute the actual plan content — if plan content exceeds 30KB, +truncate to the first 30KB and note "Plan truncated for size"): + +"You are a brutally honest technical reviewer examining a development plan that has +already been through a multi-section review. Your job is NOT to repeat that review. +Instead, find what it missed. Look for: logical gaps and unstated assumptions that +survived the review scrutiny, overcomplexity (is there a fundamentally simpler +approach the review was too deep in the weeds to see?), feasibility risks the review +took for granted, missing dependencies or sequencing issues, and strategic +miscalibration (is this the right thing to build at all?). Be direct. Be terse. No +compliments. Just the problems. + +THE PLAN: +" + +**If CODEX_AVAILABLE:** + +```bash +TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX) +codex exec "" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_PV" +``` + +Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr: +```bash +cat "$TMPERR_PV" +``` + +Present the full output verbatim: + +``` +CODEX SAYS (plan review — outside voice): +════════════════════════════════════════════════════════════ + +════════════════════════════════════════════════════════════ +``` + +**Error handling:** All errors are non-blocking — the outside voice is informational. +- Auth failure (stderr contains "auth", "login", "unauthorized"): "Codex auth failed. Run \`codex login\` to authenticate." +- Timeout: "Codex timed out after 5 minutes." +- Empty response: "Codex returned no response." + +On any Codex error, fall back to the Claude adversarial subagent. + +**If CODEX_NOT_AVAILABLE (or Codex errored):** + +Dispatch via the Agent tool. The subagent has fresh context — genuine independence. + +Subagent prompt: same plan review prompt as above. + +Present findings under an `OUTSIDE VOICE (Claude subagent):` header. + +If the subagent fails or times out: "Outside voice unavailable. Continuing to outputs." + +**Cross-model tension:** + +After presenting the outside voice findings, note any points where the outside voice +disagrees with the review findings from earlier sections. Flag these as: + +``` +CROSS-MODEL TENSION: + [Topic]: Review said X. Outside voice says Y. [Your assessment of who's right.] +``` + +For each substantive tension point, auto-propose as a TODO via AskUserQuestion: + +> "Cross-model disagreement on [topic]. The review found [X] but the outside voice +> argues [Y]. Worth investigating further?" + +Options: +- A) Add to TODOS.md +- B) Skip — not substantive + +If no tension points exist, note: "No cross-model tension — both reviewers agree." + +**Persist the result:** +```bash +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-plan-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","commit":"'"$(git rev-parse --short HEAD)"'"}' +``` + +Substitute: STATUS = "clean" if no findings, "issues_found" if findings exist. +SOURCE = "codex" if Codex ran, "claude" if subagent ran. + +**Cleanup:** Run `rm -f "$TMPERR_PV"` after processing (if Codex was used). + +--- Always work through the full interactive review: one section at a time (Architecture → Code Quality → Tests → Performance) with at most 8 top issues per section. @@ -493,6 +589,7 @@ At the end of the review, fill in and display this summary so the user can see a - What already exists: written - TODOS.md updates: ___ items proposed to user - Failure modes: ___ critical gaps flagged +- Outside voice: ran (codex/claude) / skipped - Lake Score: X/Y recommendations chose complete option ## Retrospective learning @@ -535,7 +632,7 @@ After completing the review, read the review log and config to display the dashb ~/.claude/skills/gstack/bin/gstack-review-read ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ @@ -547,6 +644,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | | Adversarial | 0 | — | — | no | +| Outside Voice | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -557,6 +655,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. - **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed. +- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping. **Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`) diff --git a/ship/SKILL.md b/ship/SKILL.md index 6ad69ba7..569fa76e 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -305,7 +305,7 @@ After completing the review, read the review log and config to display the dashb ~/.claude/skills/gstack/bin/gstack-review-read ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ @@ -317,6 +317,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | | Adversarial | 0 | — | — | no | +| Outside Voice | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -327,6 +328,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. - **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed. +- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping. **Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)