diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md index 24a18674..cc56b66d 100644 --- a/plan-ceo-review/SKILL.md +++ b/plan-ceo-review/SKILL.md @@ -806,7 +806,7 @@ echo "---CONFIG---" ~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false" ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, codex-review). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ @@ -817,6 +817,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES | | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | +| Codex Review | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -826,11 +827,12 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting). 
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. +- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed. **Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`) - **NOT CLEARED**: Eng Review missing, stale (>7 days), or has open issues -- CEO and Design reviews are shown for context but never block shipping +- CEO, Design, and Codex reviews are shown for context but never block shipping - If \`skip_eng_review\` config is \`true\`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED ## docs/designs Promotion (EXPANSION and SELECTIVE EXPANSION only) diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md index 21e37c95..fce76143 100644 --- a/plan-design-review/SKILL.md +++ b/plan-design-review/SKILL.md @@ -436,7 +436,7 @@ echo "---CONFIG---" ~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false" ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, codex-review). Ignore entries with timestamps older than 7 days. 
For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ @@ -447,6 +447,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES | | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | +| Codex Review | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -456,11 +457,12 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting). - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. +- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed. 
**Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`) - **NOT CLEARED**: Eng Review missing, stale (>7 days), or has open issues -- CEO and Design reviews are shown for context but never block shipping +- CEO, Design, and Codex reviews are shown for context but never block shipping - If \`skip_eng_review\` config is \`true\`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED ## Formatting Rules diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md index caafb792..ac127e9e 100644 --- a/plan-eng-review/SKILL.md +++ b/plan-eng-review/SKILL.md @@ -211,6 +211,27 @@ Before reviewing anything, answer these questions: If the complexity check triggers (8+ files or 2+ new classes/services), proactively recommend scope reduction via AskUserQuestion — explain what's overbuilt, propose a minimal version that achieves the core goal, and ask whether to reduce or proceed as-is. If the complexity check does not trigger, present your Step 0 findings and proceed directly to Section 1. +### Step 0.5: Codex plan review (optional) + +Check if the Codex CLI is available: `which codex 2>/dev/null` + +If available, after presenting Step 0 findings, use AskUserQuestion: +``` +Want an independent Codex (OpenAI) review of this plan before the detailed review? +A) Yes — let Codex critique the plan independently +B) No — proceed with the Claude review only +``` + +If the user chooses A: read the plan file and run Codex with the plan review persona: +```bash +codex exec "You are a brutally honest technical reviewer. Review this plan for: logical gaps and unstated assumptions, missing error handling or edge cases, overcomplexity (is there a simpler approach?), feasibility risks (what could go wrong?), and missing dependencies or sequencing issues. Be direct. Be terse. No compliments. Just the problems." -s read-only +``` + +Present the full output under a `CODEX SAYS (plan review):` header. 
Note any concerns +that should inform the subsequent engineering review sections. + +If Codex is not available, skip silently. + Always work through the full interactive review: one section at a time (Architecture → Code Quality → Tests → Performance) with at most 8 top issues per section. **Critical: Once the user accepts or rejects a scope reduction recommendation, commit fully.** Do not re-argue for smaller scope during later review sections. Do not silently reduce scope or skip planned components. @@ -384,7 +405,7 @@ echo "---CONFIG---" ~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false" ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, codex-review). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ @@ -395,6 +416,7 @@ Parse the output. 
Find the most recent entry for each skill (plan-ceo-review, pl | Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES | | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | +| Codex Review | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -404,11 +426,12 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting). - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. +- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed. 
**Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`) - **NOT CLEARED**: Eng Review missing, stale (>7 days), or has open issues -- CEO and Design reviews are shown for context but never block shipping +- CEO, Design, and Codex reviews are shown for context but never block shipping - If \`skip_eng_review\` config is \`true\`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED ## Unresolved decisions diff --git a/plan-eng-review/SKILL.md.tmpl b/plan-eng-review/SKILL.md.tmpl index 1ca2b298..3e206507 100644 --- a/plan-eng-review/SKILL.md.tmpl +++ b/plan-eng-review/SKILL.md.tmpl @@ -82,6 +82,27 @@ Before reviewing anything, answer these questions: If the complexity check triggers (8+ files or 2+ new classes/services), proactively recommend scope reduction via AskUserQuestion — explain what's overbuilt, propose a minimal version that achieves the core goal, and ask whether to reduce or proceed as-is. If the complexity check does not trigger, present your Step 0 findings and proceed directly to Section 1. +### Step 0.5: Codex plan review (optional) + +Check if the Codex CLI is available: `which codex 2>/dev/null` + +If available, after presenting Step 0 findings, use AskUserQuestion: +``` +Want an independent Codex (OpenAI) review of this plan before the detailed review? +A) Yes — let Codex critique the plan independently +B) No — proceed with the Claude review only +``` + +If the user chooses A: read the plan file and run Codex with the plan review persona: +```bash +codex exec "You are a brutally honest technical reviewer. Review this plan for: logical gaps and unstated assumptions, missing error handling or edge cases, overcomplexity (is there a simpler approach?), feasibility risks (what could go wrong?), and missing dependencies or sequencing issues. Be direct. Be terse. No compliments. Just the problems." 
-s read-only +``` + +Present the full output under a `CODEX SAYS (plan review):` header. Note any concerns +that should inform the subsequent engineering review sections. + +If Codex is not available, skip silently. + Always work through the full interactive review: one section at a time (Architecture → Code Quality → Tests → Performance) with at most 8 top issues per section. **Critical: Once the user accepts or rejects a scope reduction recommendation, commit fully.** Do not re-argue for smaller scope during later review sections. Do not silently reduce scope or skip planned components. diff --git a/review/SKILL.md b/review/SKILL.md index 354e715b..2a2e69e0 100644 --- a/review/SKILL.md +++ b/review/SKILL.md @@ -409,6 +409,51 @@ If no documentation files exist, skip this step silently. --- +## Step 5.7: Codex second opinion (optional) + +After completing the review, check if the Codex CLI is available: + +```bash +which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE" +``` + +If Codex is available, use AskUserQuestion: + +``` +Review complete. Want an independent second opinion from Codex (OpenAI)? + +A) Run Codex code review — independent diff review with pass/fail gate +B) Run Codex adversarial challenge — try to find ways this code will fail in production +C) Both — review first, then adversarial challenge +D) Skip — no Codex review needed +``` + +If the user chooses A, B, or C: + +**For code review (A or C):** Run `codex review --base ` with a 5-minute timeout. +Present the full output verbatim under a `CODEX SAYS (code review):` header. +Check the output for `[P1]` markers — if found, note `GATE: FAIL`, otherwise `GATE: PASS`. +After presenting, compare Codex's findings with your own review findings from Steps 4-5 +and output a CROSS-MODEL ANALYSIS showing what both found, what only Codex found, +and what only Claude found. 
+ +**For adversarial challenge (B or C):** Run: +```bash +codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, failure modes. Be adversarial." -s read-only +``` +Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header. + +Persist the Codex review result to the review log: +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +BRANCH_SLUG=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-') +mkdir -p ~/.gstack/projects/$SLUG +``` + +If Codex is not available, skip this step silently. + +--- + ## Important Rules - **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff. diff --git a/review/SKILL.md.tmpl b/review/SKILL.md.tmpl index 7094a156..ad435ecf 100644 --- a/review/SKILL.md.tmpl +++ b/review/SKILL.md.tmpl @@ -230,6 +230,51 @@ If no documentation files exist, skip this step silently. --- +## Step 5.7: Codex second opinion (optional) + +After completing the review, check if the Codex CLI is available: + +```bash +which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE" +``` + +If Codex is available, use AskUserQuestion: + +``` +Review complete. Want an independent second opinion from Codex (OpenAI)? + +A) Run Codex code review — independent diff review with pass/fail gate +B) Run Codex adversarial challenge — try to find ways this code will fail in production +C) Both — review first, then adversarial challenge +D) Skip — no Codex review needed +``` + +If the user chooses A, B, or C: + +**For code review (A or C):** Run `codex review --base ` with a 5-minute timeout. +Present the full output verbatim under a `CODEX SAYS (code review):` header. +Check the output for `[P1]` markers — if found, note `GATE: FAIL`, otherwise `GATE: PASS`. 
+After presenting, compare Codex's findings with your own review findings from Steps 4-5 +and output a CROSS-MODEL ANALYSIS showing what both found, what only Codex found, +and what only Claude found. + +**For adversarial challenge (B or C):** Run: +```bash +codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, failure modes. Be adversarial." -s read-only +``` +Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header. + +Persist the Codex review result to the review log: +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +BRANCH_SLUG=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-') +mkdir -p ~/.gstack/projects/$SLUG +``` + +If Codex is not available, skip this step silently. + +--- + ## Important Rules - **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff. diff --git a/ship/SKILL.md b/ship/SKILL.md index 3f0f0067..1315be7d 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -211,7 +211,7 @@ echo "---CONFIG---" ~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false" ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, codex-review). Ignore entries with timestamps older than 7 days. 
For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: ``` +====================================================================+ @@ -222,6 +222,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES | | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | +| Codex Review | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -231,11 +232,12 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting). - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. +- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed. 
**Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`) - **NOT CLEARED**: Eng Review missing, stale (>7 days), or has open issues -- CEO and Design reviews are shown for context but never block shipping +- CEO, Design, and Codex reviews are shown for context but never block shipping - If \`skip_eng_review\` config is \`true\`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED If the Eng Review is NOT "CLEAR": @@ -768,6 +770,47 @@ For each classified comment: --- +## Step 3.8: Codex second opinion (optional) + +Check if the Codex CLI is available: + +```bash +which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE" +``` + +If Codex is available, use AskUserQuestion: + +``` +Pre-landing review complete. Want an independent Codex (OpenAI) review before shipping? + +A) Run Codex code review — independent diff review with pass/fail gate +B) Run Codex adversarial challenge — try to break this code +C) Skip — ship without Codex review +``` + +If the user chooses A or B: + +**For code review (A):** Run `codex review --base ` with a 5-minute timeout. +Present the full output verbatim under a `CODEX SAYS:` header. Check for `[P1]` markers +to determine pass/fail gate. Persist the result: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +BRANCH_SLUG=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-') +mkdir -p ~/.gstack/projects/$SLUG +echo '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE"}' >> ~/.gstack/projects/$SLUG/$BRANCH_SLUG-reviews.jsonl +``` + +If GATE is FAIL, use AskUserQuestion: "Codex found critical issues. Ship anyway?" +If the user says no, stop. If yes, continue to Step 4. + +**For adversarial (B):** Run codex exec with the adversarial prompt (see /codex skill). +Present findings. This is informational — does not block shipping. + +If Codex is not available, skip silently. Continue to Step 4. 
+ +--- + ## Step 4: Version bump (auto-decide) 1. Read the current `VERSION` file (4-digit format: `MAJOR.MINOR.PATCH.MICRO`) diff --git a/ship/SKILL.md.tmpl b/ship/SKILL.md.tmpl index aef5c9d3..e96a9f84 100644 --- a/ship/SKILL.md.tmpl +++ b/ship/SKILL.md.tmpl @@ -402,6 +402,47 @@ For each classified comment: --- +## Step 3.8: Codex second opinion (optional) + +Check if the Codex CLI is available: + +```bash +which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE" +``` + +If Codex is available, use AskUserQuestion: + +``` +Pre-landing review complete. Want an independent Codex (OpenAI) review before shipping? + +A) Run Codex code review — independent diff review with pass/fail gate +B) Run Codex adversarial challenge — try to break this code +C) Skip — ship without Codex review +``` + +If the user chooses A or B: + +**For code review (A):** Run `codex review --base ` with a 5-minute timeout. +Present the full output verbatim under a `CODEX SAYS:` header. Check for `[P1]` markers +to determine pass/fail gate. Persist the result: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +BRANCH_SLUG=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-') +mkdir -p ~/.gstack/projects/$SLUG +echo '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE"}' >> ~/.gstack/projects/$SLUG/$BRANCH_SLUG-reviews.jsonl +``` + +If GATE is FAIL, use AskUserQuestion: "Codex found critical issues. Ship anyway?" +If the user says no, stop. If yes, continue to Step 4. + +**For adversarial (B):** Run codex exec with the adversarial prompt (see /codex skill). +Present findings. This is informational — does not block shipping. + +If Codex is not available, skip silently. Continue to Step 4. + +--- + ## Step 4: Version bump (auto-decide) 1. Read the current `VERSION` file (4-digit format: `MAJOR.MINOR.PATCH.MICRO`)
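The `[P1]` gate check and the `codex-review` log entry added in Step 3.8 can be sketched as two small shell helpers. This is a minimal illustration, not code from the skill files: the function names `codex_gate` and `codex_log_entry`, the status strings `clean` and `issues_found`, and the ISO 8601 UTC timestamp format are all assumptions, since the diff only shows the placeholders `TIMESTAMP`, `STATUS`, and `GATE`.

```bash
# Sketch only: helper names, status strings, and timestamp format are
# hypothetical; the diff leaves TIMESTAMP, STATUS, and GATE as placeholders.

# Read Codex review output on stdin.
# Print FAIL if any literal "[P1]" marker is present, PASS otherwise.
codex_gate() {
  if grep -qF '[P1]'; then
    echo "FAIL"
  else
    echo "PASS"
  fi
}

# Print one JSONL entry in the shape the diff appends to the reviews log.
# $1 is the gate value (PASS or FAIL).
codex_log_entry() {
  entry_gate="$1"
  # Assumed mapping from gate to status; the diff does not define it.
  if [ "$entry_gate" = "PASS" ]; then
    entry_status="clean"
  else
    entry_status="issues_found"
  fi
  # Assumed timestamp format: ISO 8601 in UTC.
  entry_timestamp="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  printf '{"skill":"codex-review","timestamp":"%s","status":"%s","gate":"%s"}\n' \
    "$entry_timestamp" "$entry_status" "$entry_gate"
}
```

In use, the captured Codex output would be piped into `codex_gate`, and the line printed by `codex_log_entry` would be appended to `~/.gstack/projects/$SLUG/$BRANCH_SLUG-reviews.jsonl`, the path used in the diff. `grep -F` treats `[P1]` as a fixed string, so the brackets are not read as a character class.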