mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
feat: default codex reviews in /ship and /review (v0.9.4.0)
Codex code reviews now run automatically — both review + adversarial challenge — with a one-time opt-in prompt for new users. All modes use xhigh reasoning. Codex-host builds strip the step to prevent recursion. Fixes from Codex review: TMPERR properly defined, stderr captured for both review and adversarial, error handling before log persist, commit hash included in review log for staleness tracking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -1,5 +1,18 @@
|
||||
# Changelog
|
||||
|
||||
## [0.9.4.0] - 2026-03-20 — Codex Reviews On By Default
|
||||
|
||||
### Changed
|
||||
|
||||
- **Codex code reviews now run automatically in `/ship` and `/review`.** No more "want a second opinion?" prompt every time — Codex reviews both your code (with a pass/fail gate) and runs an adversarial challenge by default. First-time users get a one-time opt-in prompt; after that, it's hands-free. Configure with `gstack-config set codex_reviews enabled|disabled`.
|
||||
- **All Codex operations use maximum reasoning power.** Review, adversarial, and consult modes all use `xhigh` reasoning effort — when an AI is reviewing your code, you want it thinking as hard as possible.
|
||||
- **Codex review errors can't corrupt the dashboard.** Auth failures, timeouts, and empty responses are now detected before logging results, so the Review Readiness Dashboard never shows a false "passed" entry. Adversarial stderr is captured separately.
|
||||
- **Codex review log includes commit hash.** Staleness detection now works correctly for Codex reviews, matching the same commit-tracking behavior as eng/CEO/design reviews.
|
||||
|
||||
### Fixed
|
||||
|
||||
- **Codex-for-Codex recursion prevented.** When gstack runs inside Codex CLI (`.agents/skills/`), the Codex review step is completely stripped — no accidental infinite loops.
|
||||
|
||||
## [0.9.3.0] - 2026-03-20 — Windows Support
|
||||
|
||||
### Fixed
|
||||
|
||||
+35
-17
@@ -526,11 +526,21 @@ Then skip this step. Continue to the next step.
|
||||
|
||||
Always run **both** code review and adversarial challenge. Use a 5-minute timeout (`timeout: 300000`) on each Bash call.
|
||||
|
||||
First, create a temp file for stderr capture:
|
||||
```bash
|
||||
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
|
||||
```
|
||||
|
||||
**Code review:** Run:
|
||||
```bash
|
||||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
After the command completes, read stderr for cost/error info:
|
||||
```bash
|
||||
cat "$TMPERR"
|
||||
```
|
||||
|
||||
Present the full output verbatim under a `CODEX SAYS (code review):` header:
|
||||
|
||||
```
|
||||
@@ -556,12 +566,33 @@ If the user chooses A: read the Codex findings carefully and work to address the
|
||||
|
||||
If the user chooses B: continue to the next step.
|
||||
|
||||
**Adversarial challenge:** Run:
|
||||
### Error handling (code review)
|
||||
|
||||
Before persisting the gate result, check for errors. All errors are non-blocking — Codex is a quality enhancement, not a prerequisite. Check `$TMPERR` output (already read above) for error indicators:
|
||||
|
||||
- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key", tell the user: "Codex authentication failed. Run \`codex login\` in your terminal to authenticate via ChatGPT." Do NOT persist a review log entry. Continue to the adversarial step (it will likely fail too, but try anyway).
|
||||
- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Do NOT persist a review log entry. Skip to cleanup.
|
||||
- **Empty response:** If codex returned no stdout output, tell the user: "Codex returned no response. Stderr: <paste relevant error>." Do NOT persist a review log entry. Skip to cleanup.
|
||||
|
||||
**Only if codex produced a real review (non-empty stdout):** Persist the code review result:
|
||||
```bash
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE","commit":"'"$(git rev-parse --short HEAD)"'"}'
|
||||
```
|
||||
|
||||
Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header. This is informational — it never blocks shipping.
|
||||
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
|
||||
|
||||
**Adversarial challenge:** Run:
|
||||
```bash
|
||||
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_ADV"
|
||||
```
|
||||
|
||||
After the command completes, read adversarial stderr:
|
||||
```bash
|
||||
cat "$TMPERR_ADV"
|
||||
```
|
||||
|
||||
Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header. This is informational — it never blocks shipping. If the adversarial command timed out or returned no output, note this to the user and continue.
|
||||
|
||||
**Cross-model analysis:** After both Codex outputs are presented, compare Codex's findings with your own review findings from the earlier review steps and output:
|
||||
|
||||
@@ -573,20 +604,7 @@ CROSS-MODEL ANALYSIS:
|
||||
Agreement rate: X% (N/M total unique findings overlap)
|
||||
```
|
||||
|
||||
**Persist the code review result:**
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}'
|
||||
```
|
||||
|
||||
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
|
||||
|
||||
### Error handling
|
||||
|
||||
All errors are non-blocking — Codex is a quality enhancement, not a prerequisite.
|
||||
|
||||
- **Auth failure:** If codex prints an auth error to stderr, tell the user: "Codex authentication failed. Run \`codex login\` in your terminal to authenticate via ChatGPT." Continue to the next step.
|
||||
- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Continue to the next step.
|
||||
- **Empty response:** If codex returns no output, tell the user: "Codex returned no response. Check stderr for errors." Continue to the next step.
|
||||
**Cleanup:** Run `rm -f "$TMPERR" "$TMPERR_ADV"` after processing.
|
||||
|
||||
---
|
||||
|
||||
|
||||
+35
-17
@@ -1462,11 +1462,21 @@ Then skip this step. Continue to the next step.
|
||||
|
||||
Always run **both** code review and adversarial challenge. Use a 5-minute timeout (\`timeout: 300000\`) on each Bash call.
|
||||
|
||||
First, create a temp file for stderr capture:
|
||||
\`\`\`bash
|
||||
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
|
||||
\`\`\`
|
||||
|
||||
**Code review:** Run:
|
||||
\`\`\`bash
|
||||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
\`\`\`
|
||||
|
||||
After the command completes, read stderr for cost/error info:
|
||||
\`\`\`bash
|
||||
cat "$TMPERR"
|
||||
\`\`\`
|
||||
|
||||
Present the full output verbatim under a \`CODEX SAYS (code review):\` header:
|
||||
|
||||
\`\`\`
|
||||
@@ -1492,12 +1502,33 @@ If the user chooses A: read the Codex findings carefully and work to address the
|
||||
|
||||
If the user chooses B: continue to the next step.
|
||||
|
||||
**Adversarial challenge:** Run:
|
||||
### Error handling (code review)
|
||||
|
||||
Before persisting the gate result, check for errors. All errors are non-blocking — Codex is a quality enhancement, not a prerequisite. Check \`$TMPERR\` output (already read above) for error indicators:
|
||||
|
||||
- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key", tell the user: "Codex authentication failed. Run \\\`codex login\\\` in your terminal to authenticate via ChatGPT." Do NOT persist a review log entry. Continue to the adversarial step (it will likely fail too, but try anyway).
|
||||
- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Do NOT persist a review log entry. Skip to cleanup.
|
||||
- **Empty response:** If codex returned no stdout output, tell the user: "Codex returned no response. Stderr: <paste relevant error>." Do NOT persist a review log entry. Skip to cleanup.
|
||||
|
||||
**Only if codex produced a real review (non-empty stdout):** Persist the code review result:
|
||||
\`\`\`bash
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE","commit":"'"$(git rev-parse --short HEAD)"'"}'
|
||||
\`\`\`
|
||||
|
||||
Present the full output verbatim under a \`CODEX SAYS (adversarial challenge):\` header. This is informational — it never blocks shipping.
|
||||
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
|
||||
|
||||
**Adversarial challenge:** Run:
|
||||
\`\`\`bash
|
||||
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_ADV"
|
||||
\`\`\`
|
||||
|
||||
After the command completes, read adversarial stderr:
|
||||
\`\`\`bash
|
||||
cat "$TMPERR_ADV"
|
||||
\`\`\`
|
||||
|
||||
Present the full output verbatim under a \`CODEX SAYS (adversarial challenge):\` header. This is informational — it never blocks shipping. If the adversarial command timed out or returned no output, note this to the user and continue.
|
||||
${!isShip ? `
|
||||
**Cross-model analysis:** After both Codex outputs are presented, compare Codex's findings with your own review findings from the earlier review steps and output:
|
||||
|
||||
@@ -1509,20 +1540,7 @@ CROSS-MODEL ANALYSIS:
|
||||
Agreement rate: X% (N/M total unique findings overlap)
|
||||
\`\`\`
|
||||
` : ''}
|
||||
**Persist the code review result:**
|
||||
\`\`\`bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}'
|
||||
\`\`\`
|
||||
|
||||
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
|
||||
|
||||
### Error handling
|
||||
|
||||
All errors are non-blocking — Codex is a quality enhancement, not a prerequisite.
|
||||
|
||||
- **Auth failure:** If codex prints an auth error to stderr, tell the user: "Codex authentication failed. Run \\\`codex login\\\` in your terminal to authenticate via ChatGPT." Continue to the next step.
|
||||
- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Continue to the next step.
|
||||
- **Empty response:** If codex returns no output, tell the user: "Codex returned no response. Check stderr for errors." Continue to the next step.
|
||||
**Cleanup:** Run \`rm -f "$TMPERR" "$TMPERR_ADV"\` after processing.
|
||||
|
||||
---`;
|
||||
}
|
||||
|
||||
+30
-12
@@ -890,11 +890,21 @@ Then skip this step. Continue to the next step.
|
||||
|
||||
Always run **both** code review and adversarial challenge. Use a 5-minute timeout (`timeout: 300000`) on each Bash call.
|
||||
|
||||
First, create a temp file for stderr capture:
|
||||
```bash
|
||||
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
|
||||
```
|
||||
|
||||
**Code review:** Run:
|
||||
```bash
|
||||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
After the command completes, read stderr for cost/error info:
|
||||
```bash
|
||||
cat "$TMPERR"
|
||||
```
|
||||
|
||||
Present the full output verbatim under a `CODEX SAYS (code review):` header:
|
||||
|
||||
```
|
||||
@@ -920,27 +930,35 @@ If the user chooses A: read the Codex findings carefully and work to address the
|
||||
|
||||
If the user chooses B: continue to the next step.
|
||||
|
||||
**Adversarial challenge:** Run:
|
||||
```bash
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached
|
||||
```
|
||||
### Error handling (code review)
|
||||
|
||||
Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header. This is informational — it never blocks shipping.
|
||||
Before persisting the gate result, check for errors. All errors are non-blocking — Codex is a quality enhancement, not a prerequisite. Check `$TMPERR` output (already read above) for error indicators:
|
||||
|
||||
**Persist the code review result:**
|
||||
- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key", tell the user: "Codex authentication failed. Run \`codex login\` in your terminal to authenticate via ChatGPT." Do NOT persist a review log entry. Continue to the adversarial step (it will likely fail too, but try anyway).
|
||||
- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Do NOT persist a review log entry. Skip to cleanup.
|
||||
- **Empty response:** If codex returned no stdout output, tell the user: "Codex returned no response. Stderr: <paste relevant error>." Do NOT persist a review log entry. Skip to cleanup.
|
||||
|
||||
**Only if codex produced a real review (non-empty stdout):** Persist the code review result:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}'
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE","commit":"'"$(git rev-parse --short HEAD)"'"}'
|
||||
```
|
||||
|
||||
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
|
||||
|
||||
### Error handling
|
||||
**Adversarial challenge:** Run:
|
||||
```bash
|
||||
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_ADV"
|
||||
```
|
||||
|
||||
All errors are non-blocking — Codex is a quality enhancement, not a prerequisite.
|
||||
After the command completes, read adversarial stderr:
|
||||
```bash
|
||||
cat "$TMPERR_ADV"
|
||||
```
|
||||
|
||||
- **Auth failure:** If codex prints an auth error to stderr, tell the user: "Codex authentication failed. Run \`codex login\` in your terminal to authenticate via ChatGPT." Continue to the next step.
|
||||
- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Continue to the next step.
|
||||
- **Empty response:** If codex returns no output, tell the user: "Codex returned no response. Check stderr for errors." Continue to the next step.
|
||||
Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header. This is informational — it never blocks shipping. If the adversarial command timed out or returned no output, note this to the user and continue.
|
||||
|
||||
**Cleanup:** Run `rm -f "$TMPERR" "$TMPERR_ADV"` after processing.
|
||||
|
||||
---
|
||||
|
||||
|
||||
Reference in New Issue
Block a user