feat: default codex reviews in /ship and /review with xhigh reasoning

Codex code reviews are now opt-in-once-then-always-on via a one-time
adoption prompt. When enabled, both review + adversarial run automatically
on every /ship and /review — no more choosing between them.

Key changes:
- New {{CODEX_REVIEW_STEP}} resolver centralizes Codex review logic (DRY)
- Three-state config: enabled/not-set/disabled via gstack-config
- P1 findings default to "Investigate and fix" instead of "Ship anyway"
- All reasoning bumped to xhigh (review, adversarial, consult)
- Codex review step stripped from codex-host variants (no self-invocation)
- Ship "Never ask" rule updated to accurately list quality-gate stops
- Error handling for auth, timeout, empty response (all non-blocking)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-20 08:32:45 -07:00
parent ae2d841012
commit f32759a18a
17 changed files with 333 additions and 244 deletions
+117 -1
View File
@@ -1092,7 +1092,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \\\`gstack-config set skip_eng_review true\\\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed.
- **Codex Review (enabled by default when Codex CLI is installed):** Independent review + adversarial challenge from OpenAI Codex CLI. Shows pass/fail gate. Runs automatically when enabled — configure with \\\`gstack-config set codex_reviews enabled|disabled\\\`.
**Verdict logic:**
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \\\`skip_eng_review\\\` is \\\`true\\\`)
@@ -1412,6 +1412,121 @@ The screenshot file at \`/tmp/gstack-sketch.png\` can be referenced by downstrea
(\`/plan-design-review\`, \`/design-review\`) to see what was originally envisioned.`;
}
function generateCodexReviewStep(ctx: TemplateContext): string {
// Codex host: strip entirely — Codex should never invoke itself
if (ctx.host === 'codex') return '';
const isShip = ctx.skillName === 'ship';
const stepNum = isShip ? '3.8' : '5.7';
return `## Step ${stepNum}: Codex review
Check if the Codex CLI is available and read the user's Codex review preference:
\`\`\`bash
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
CODEX_REVIEWS_CFG=$(~/.claude/skills/gstack/bin/gstack-config get codex_reviews 2>/dev/null || true)
echo "CODEX_REVIEWS: \${CODEX_REVIEWS_CFG:-not_set}"
\`\`\`
If \`CODEX_NOT_AVAILABLE\`: skip this step silently. Continue to the next step.
If \`CODEX_REVIEWS\` is \`disabled\`: skip this step silently. Continue to the next step.
If \`CODEX_REVIEWS\` is \`enabled\`: run both code review and adversarial challenge automatically (no prompt). Jump to the "Run Codex" section below.
If \`CODEX_REVIEWS\` is \`not_set\`: use AskUserQuestion to offer the one-time adoption prompt:
\`\`\`
GStack recommends enabling Codex code reviews — Codex is the super smart quiet engineer friend who will save your butt.
A) Enable for all future runs (recommended, default)
B) Try it for now, ask me again later
C) No thanks, don't ask me again
\`\`\`
If the user chooses A: persist the setting and run both:
\`\`\`bash
~/.claude/skills/gstack/bin/gstack-config set codex_reviews enabled
\`\`\`
If the user chooses B: run both this time but do not persist any setting.
If the user chooses C: persist the opt-out and skip:
\`\`\`bash
~/.claude/skills/gstack/bin/gstack-config set codex_reviews disabled
\`\`\`
Then skip this step. Continue to the next step.
### Run Codex
Always run **both** code review and adversarial challenge. Use a 5-minute timeout (\`timeout: 300000\`) on each Bash call.
**Code review:** Run:
\`\`\`bash
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
\`\`\`
Present the full output verbatim under a \`CODEX SAYS (code review):\` header:
\`\`\`
CODEX SAYS (code review):
════════════════════════════════════════════════════════════
<full codex output, verbatim — do not truncate or summarize>
════════════════════════════════════════════════════════════
GATE: PASS Tokens: N | Est. cost: ~$X.XX
\`\`\`
Check the output for \`[P1]\` markers. If found: \`GATE: FAIL\`. If no \`[P1]\`: \`GATE: PASS\`.
**If GATE is FAIL:** use AskUserQuestion:
\`\`\`
Codex found N critical issues in the diff.
A) Investigate and fix now (recommended)
B) Ship anyway — these issues may cause production problems
\`\`\`
If the user chooses A: read the Codex findings carefully and work to address them${isShip ? '. After fixing, re-run tests (Step 3) since code has changed' : ''}. Then re-run \`codex review\` to verify the gate is now PASS.
If the user chooses B: continue to the next step.
**Adversarial challenge:** Run:
\`\`\`bash
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached
\`\`\`
Present the full output verbatim under a \`CODEX SAYS (adversarial challenge):\` header. This is informational — it never blocks shipping.
${!isShip ? `
**Cross-model analysis:** After both Codex outputs are presented, compare Codex's findings with your own review findings from the earlier review steps and output:
\`\`\`
CROSS-MODEL ANALYSIS:
Both found: [findings that overlap between Claude and Codex]
Only Codex found: [findings unique to Codex]
Only Claude found: [findings unique to Claude's review]
Agreement rate: X% (N/M total unique findings overlap)
\`\`\`
` : ''}
**Persist the code review result:**
\`\`\`bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}'
\`\`\`
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
### Error handling
All errors are non-blocking — Codex is a quality enhancement, not a prerequisite.
- **Auth failure:** If codex prints an auth error to stderr, tell the user: "Codex authentication failed. Run \\\`codex login\\\` in your terminal to authenticate via ChatGPT." Continue to the next step.
- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Continue to the next step.
- **Empty response:** If codex returns no output, tell the user: "Codex returned no response. Check stderr for errors." Continue to the next step.
---`;
}
const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
COMMAND_REFERENCE: generateCommandReference,
SNAPSHOT_FLAGS: generateSnapshotFlags,
@@ -1426,6 +1541,7 @@ const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
SPEC_REVIEW_LOOP: generateSpecReviewLoop,
DESIGN_SKETCH: generateDesignSketch,
BENEFITS_FROM: generateBenefitsFrom,
CODEX_REVIEW_STEP: generateCodexReviewStep,
};
// ─── Codex Helpers ───────────────────────────────────────────