From 430d67473eac2b1fd499ddc109860fb1a7a85cdf Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Fri, 20 Mar 2026 13:07:23 -0700 Subject: [PATCH] feat: default codex reviews in /ship and /review (v0.9.4.0) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Codex code reviews now run automatically — both review + adversarial challenge — with a one-time opt-in prompt for new users. All modes use xhigh reasoning. Codex-host builds strip the step to prevent recursion. Fixes from Codex review: TMPERR properly defined, stderr captured for both review and adversarial, error handling before log persist, commit hash included in review log for staleness tracking. Co-Authored-By: Claude Opus 4.6 --- CHANGELOG.md | 13 ++++++++++ VERSION | 2 +- review/SKILL.md | 52 ++++++++++++++++++++++++++------------- scripts/gen-skill-docs.ts | 52 ++++++++++++++++++++++++++------------- ship/SKILL.md | 42 ++++++++++++++++++++++--------- 5 files changed, 114 insertions(+), 47 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index b4e8261c..2fa0a844 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,18 @@ # Changelog +## [0.9.4.0] - 2026-03-20 — Codex Reviews On By Default + +### Changed + +- **Codex code reviews now run automatically in `/ship` and `/review`.** No more "want a second opinion?" prompt every time — Codex reviews both your code (with a pass/fail gate) and runs an adversarial challenge by default. First-time users get a one-time opt-in prompt; after that, it's hands-free. Configure with `gstack-config set codex_reviews enabled|disabled`. +- **All Codex operations use maximum reasoning power.** Review, adversarial, and consult modes all use `xhigh` reasoning effort — when an AI is reviewing your code, you want it thinking as hard as possible. +- **Codex review errors can't corrupt the dashboard.** Auth failures, timeouts, and empty responses are now detected before logging results, so the Review Readiness Dashboard never shows a false "passed" entry. Adversarial stderr is captured separately. +- **Codex review log includes commit hash.** Staleness detection now works correctly for Codex reviews, matching the same commit-tracking behavior as eng/CEO/design reviews. + +### Fixed + +- **Codex-for-Codex recursion prevented.** When gstack runs inside Codex CLI (`.agents/skills/`), the Codex review step is completely stripped — no accidental infinite loops. + ## [0.9.3.0] - 2026-03-20 — Windows Support ### Fixed diff --git a/VERSION b/VERSION index 947d2886..3544d2f0 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.9.3.0 +0.9.4.0 diff --git a/review/SKILL.md b/review/SKILL.md index 868abe6b..4a646f6e 100644 --- a/review/SKILL.md +++ b/review/SKILL.md @@ -526,11 +526,21 @@ Then skip this step. Continue to the next step. Always run **both** code review and adversarial challenge. Use a 5-minute timeout (`timeout: 300000`) on each Bash call. +First, create a temp file for stderr capture: +```bash +TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX) +``` + **Code review:** Run: ```bash codex review --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" ``` +After the command completes, read stderr for cost/error info: +```bash +cat "$TMPERR" +``` + Present the full output verbatim under a `CODEX SAYS (code review):` header: ``` @@ -556,12 +566,33 @@ If the user chooses A: read the Codex findings carefully and work to address the If the user chooses B: continue to the next step. -**Adversarial challenge:** Run: +### Error handling (code review) + +Before persisting the gate result, check for errors. All errors are non-blocking — Codex is a quality enhancement, not a prerequisite. Check `$TMPERR` output (already read above) for error indicators: + +- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key", tell the user: "Codex authentication failed. Run \`codex login\` in your terminal to authenticate via ChatGPT." Do NOT persist a review log entry. Continue to the adversarial step (it will likely fail too, but try anyway). +- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Do NOT persist a review log entry. Skip to cleanup. +- **Empty response:** If codex returned no stdout output, tell the user: "Codex returned no response. Stderr: ." Do NOT persist a review log entry. Skip to cleanup. + +**Only if codex produced a real review (non-empty stdout):** Persist the code review result: ```bash -codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE","commit":"'"$(git rev-parse --short HEAD)"'"}' ``` -Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header. This is informational — it never blocks shipping. +Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail"). + +**Adversarial challenge:** Run: +```bash +TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX) +codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_ADV" +``` + +After the command completes, read adversarial stderr: +```bash +cat "$TMPERR_ADV" +``` + +Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header. This is informational — it never blocks shipping. If the adversarial command timed out or returned no output, note this to the user and continue. **Cross-model analysis:** After both Codex outputs are presented, compare Codex's findings with your own review findings from the earlier review steps and output: @@ -573,20 +604,7 @@ CROSS-MODEL ANALYSIS: Agreement rate: X% (N/M total unique findings overlap) ``` -**Persist the code review result:** -```bash -~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}' -``` - -Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail"). - -### Error handling - -All errors are non-blocking — Codex is a quality enhancement, not a prerequisite. - -- **Auth failure:** If codex prints an auth error to stderr, tell the user: "Codex authentication failed. Run \`codex login\` in your terminal to authenticate via ChatGPT." Continue to the next step. -- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Continue to the next step. -- **Empty response:** If codex returns no output, tell the user: "Codex returned no response. Check stderr for errors." Continue to the next step. +**Cleanup:** Run `rm -f "$TMPERR" "$TMPERR_ADV"` after processing. --- diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts index 76848f2f..8bb16bf9 100644 --- a/scripts/gen-skill-docs.ts +++ b/scripts/gen-skill-docs.ts @@ -1462,11 +1462,21 @@ Then skip this step. Continue to the next step. Always run **both** code review and adversarial challenge. Use a 5-minute timeout (\`timeout: 300000\`) on each Bash call. +First, create a temp file for stderr capture: +\`\`\`bash +TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX) +\`\`\` + **Code review:** Run: \`\`\`bash codex review --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" \`\`\` +After the command completes, read stderr for cost/error info: +\`\`\`bash +cat "$TMPERR" +\`\`\` + Present the full output verbatim under a \`CODEX SAYS (code review):\` header: \`\`\` @@ -1492,12 +1502,33 @@ If the user chooses A: read the Codex findings carefully and work to address the If the user chooses B: continue to the next step. -**Adversarial challenge:** Run: +### Error handling (code review) + +Before persisting the gate result, check for errors. All errors are non-blocking — Codex is a quality enhancement, not a prerequisite. Check \`$TMPERR\` output (already read above) for error indicators: + +- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key", tell the user: "Codex authentication failed. Run \\\`codex login\\\` in your terminal to authenticate via ChatGPT." Do NOT persist a review log entry. Continue to the adversarial step (it will likely fail too, but try anyway). +- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Do NOT persist a review log entry. Skip to cleanup. +- **Empty response:** If codex returned no stdout output, tell the user: "Codex returned no response. Stderr: ." Do NOT persist a review log entry. Skip to cleanup. + +**Only if codex produced a real review (non-empty stdout):** Persist the code review result: \`\`\`bash -codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE","commit":"'"$(git rev-parse --short HEAD)"'"}' \`\`\` -Present the full output verbatim under a \`CODEX SAYS (adversarial challenge):\` header. This is informational — it never blocks shipping. +Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail"). + +**Adversarial challenge:** Run: +\`\`\`bash +TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX) +codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_ADV" +\`\`\` + +After the command completes, read adversarial stderr: +\`\`\`bash +cat "$TMPERR_ADV" +\`\`\` + +Present the full output verbatim under a \`CODEX SAYS (adversarial challenge):\` header. This is informational — it never blocks shipping. If the adversarial command timed out or returned no output, note this to the user and continue. ${!isShip ? ` **Cross-model analysis:** After both Codex outputs are presented, compare Codex's findings with your own review findings from the earlier review steps and output: @@ -1509,20 +1540,7 @@ CROSS-MODEL ANALYSIS: Agreement rate: X% (N/M total unique findings overlap) \`\`\` ` : ''} -**Persist the code review result:** -\`\`\`bash -~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}' -\`\`\` - -Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail"). - -### Error handling - -All errors are non-blocking — Codex is a quality enhancement, not a prerequisite. - -- **Auth failure:** If codex prints an auth error to stderr, tell the user: "Codex authentication failed. Run \\\`codex login\\\` in your terminal to authenticate via ChatGPT." Continue to the next step. -- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Continue to the next step. -- **Empty response:** If codex returns no output, tell the user: "Codex returned no response. Check stderr for errors." Continue to the next step. +**Cleanup:** Run \`rm -f "$TMPERR" "$TMPERR_ADV"\` after processing. ---`; } diff --git a/ship/SKILL.md b/ship/SKILL.md index b86bc9db..232a23e0 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -890,11 +890,21 @@ Then skip this step. Continue to the next step. Always run **both** code review and adversarial challenge. Use a 5-minute timeout (`timeout: 300000`) on each Bash call. +First, create a temp file for stderr capture: +```bash +TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX) +``` + **Code review:** Run: ```bash codex review --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" ``` +After the command completes, read stderr for cost/error info: +```bash +cat "$TMPERR" +``` + Present the full output verbatim under a `CODEX SAYS (code review):` header: ``` @@ -920,27 +930,35 @@ If the user chooses A: read the Codex findings carefully and work to address the If the user chooses B: continue to the next step. -**Adversarial challenge:** Run: -```bash -codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached -``` +### Error handling (code review) -Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header. This is informational — it never blocks shipping. +Before persisting the gate result, check for errors. All errors are non-blocking — Codex is a quality enhancement, not a prerequisite. Check `$TMPERR` output (already read above) for error indicators: -**Persist the code review result:** +- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key", tell the user: "Codex authentication failed. Run \`codex login\` in your terminal to authenticate via ChatGPT." Do NOT persist a review log entry. Continue to the adversarial step (it will likely fail too, but try anyway). +- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Do NOT persist a review log entry. Skip to cleanup. +- **Empty response:** If codex returned no stdout output, tell the user: "Codex returned no response. Stderr: ." Do NOT persist a review log entry. Skip to cleanup. + +**Only if codex produced a real review (non-empty stdout):** Persist the code review result: ```bash -~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}' +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE","commit":"'"$(git rev-parse --short HEAD)"'"}' ``` Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail"). -### Error handling +**Adversarial challenge:** Run: +```bash +TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX) +codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_ADV" +``` -All errors are non-blocking — Codex is a quality enhancement, not a prerequisite. +After the command completes, read adversarial stderr: +```bash +cat "$TMPERR_ADV" +``` -- **Auth failure:** If codex prints an auth error to stderr, tell the user: "Codex authentication failed. Run \`codex login\` in your terminal to authenticate via ChatGPT." Continue to the next step. -- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Continue to the next step. -- **Empty response:** If codex returns no output, tell the user: "Codex returned no response. Check stderr for errors." Continue to the next step. +Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header. This is informational — it never blocks shipping. If the adversarial command timed out or returned no output, note this to the user and continue. + +**Cleanup:** Run `rm -f "$TMPERR" "$TMPERR_ADV"` after processing. ---