mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-07 05:56:41 +02:00
fix: per-mode reasoning effort defaults, add --xhigh override
xhigh reasoning uses ~23x more tokens and causes 50+ minute hangs on large context tasks (OpenAI issues #8545, #8402, #6931). Per-mode defaults for /codex skill: - Review: high (bounded diff, needs thoroughness) - Challenge: high (adversarial but bounded by diff) - Consult: medium (large context, interactive, needs speed) Also changes all Outside Voice / adversarial codex invocations across gstack (resolvers, gen-skill-docs) from xhigh to high. Users can override with --xhigh flag when they want max reasoning. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
+23
-6
@@ -363,6 +363,14 @@ Parse the user's input to determine which mode to run:
|
||||
- Otherwise, ask: "What would you like to ask Codex?"
|
||||
4. `/codex <anything else>` — **Consult mode** (Step 2C), where the remaining text is the prompt
|
||||
|
||||
**Reasoning effort override:** If the user's input contains `--xhigh` anywhere,
|
||||
note it and remove it from the prompt text before passing to Codex. When `--xhigh`
|
||||
is present, use `model_reasoning_effort="xhigh"` for all modes regardless of the
|
||||
per-mode default below. Otherwise, use the per-mode defaults:
|
||||
- Review (2A): `high` — bounded diff input, needs thoroughness
|
||||
- Challenge (2B): `high` — adversarial but bounded by diff
|
||||
- Consult (2C): `medium` — large context, interactive, needs speed
|
||||
|
||||
---
|
||||
|
||||
## Step 2A: Review Mode
|
||||
@@ -376,13 +384,15 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)
|
||||
|
||||
2. Run the review (5-minute timeout):
|
||||
```bash
|
||||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
If the user passed `--xhigh`, use `"xhigh"` instead of `"high"`.
|
||||
|
||||
Use `timeout: 300000` on the Bash call. If the user provided custom instructions
|
||||
(e.g., `/codex review focus on security`), pass them as the prompt argument:
|
||||
```bash
|
||||
codex review "focus on security" --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
codex review "focus on security" --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
3. Capture the output. Then parse cost from stderr:
|
||||
@@ -520,7 +530,7 @@ With focus (e.g., "security"):
|
||||
|
||||
2. Run codex exec with **JSONL output** to capture reasoning traces and tool calls (5-minute timeout):
|
||||
```bash
|
||||
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>/dev/null | PYTHONUNBUFFERED=1 python3 -u -c "
|
||||
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>/dev/null | PYTHONUNBUFFERED=1 python3 -u -c "
|
||||
import sys, json
|
||||
for line in sys.stdin:
|
||||
line = line.strip()
|
||||
@@ -605,7 +615,7 @@ THE PLAN:
|
||||
|
||||
For a **new session:**
|
||||
```bash
|
||||
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
|
||||
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
|
||||
import sys, json
|
||||
for line in sys.stdin:
|
||||
line = line.strip()
|
||||
@@ -638,7 +648,7 @@ for line in sys.stdin:
|
||||
|
||||
For a **resumed session** (user chose "Continue"):
|
||||
```bash
|
||||
codex exec resume <session-id> "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
|
||||
codex exec resume <session-id> "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
|
||||
<same python streaming parser as above, with flush=True on all print() calls>
|
||||
"
|
||||
```
|
||||
@@ -674,7 +684,14 @@ Session saved — run /codex again to continue this conversation.
|
||||
agentic coding model). This means as OpenAI ships newer models, /codex automatically
|
||||
uses them. If the user wants a specific model, pass `-m` through to codex.
|
||||
|
||||
**Reasoning effort:** All modes use `xhigh` — maximum reasoning power. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible.
|
||||
**Reasoning effort (per-mode defaults):**
|
||||
- **Review (2A):** `high` — bounded diff input, needs thoroughness but not max tokens
|
||||
- **Challenge (2B):** `high` — adversarial but bounded by diff size
|
||||
- **Consult (2C):** `medium` — large context (plans, codebase), interactive, needs speed
|
||||
|
||||
`xhigh` uses ~23x more tokens than `high` and causes 50+ minute hangs on large context
|
||||
tasks (OpenAI issues #8545, #8402, #6931). Users can override with `--xhigh` flag
|
||||
(e.g., `/codex review --xhigh`) when they want maximum reasoning and are willing to wait.
|
||||
|
||||
**Web search:** All codex commands use `--enable web_search_cached` so Codex can look up
|
||||
docs and APIs during review. This is OpenAI's cached index — fast, no extra cost.
|
||||
|
||||
+23
-6
@@ -67,6 +67,14 @@ Parse the user's input to determine which mode to run:
|
||||
- Otherwise, ask: "What would you like to ask Codex?"
|
||||
4. `/codex <anything else>` — **Consult mode** (Step 2C), where the remaining text is the prompt
|
||||
|
||||
**Reasoning effort override:** If the user's input contains `--xhigh` anywhere,
|
||||
note it and remove it from the prompt text before passing to Codex. When `--xhigh`
|
||||
is present, use `model_reasoning_effort="xhigh"` for all modes regardless of the
|
||||
per-mode default below. Otherwise, use the per-mode defaults:
|
||||
- Review (2A): `high` — bounded diff input, needs thoroughness
|
||||
- Challenge (2B): `high` — adversarial but bounded by diff
|
||||
- Consult (2C): `medium` — large context, interactive, needs speed
|
||||
|
||||
---
|
||||
|
||||
## Step 2A: Review Mode
|
||||
@@ -80,13 +88,15 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)
|
||||
|
||||
2. Run the review (5-minute timeout):
|
||||
```bash
|
||||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
If the user passed `--xhigh`, use `"xhigh"` instead of `"high"`.
|
||||
|
||||
Use `timeout: 300000` on the Bash call. If the user provided custom instructions
|
||||
(e.g., `/codex review focus on security`), pass them as the prompt argument:
|
||||
```bash
|
||||
codex review "focus on security" --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
codex review "focus on security" --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
3. Capture the output. Then parse cost from stderr:
|
||||
@@ -159,7 +169,7 @@ With focus (e.g., "security"):
|
||||
|
||||
2. Run codex exec with **JSONL output** to capture reasoning traces and tool calls (5-minute timeout):
|
||||
```bash
|
||||
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>/dev/null | PYTHONUNBUFFERED=1 python3 -u -c "
|
||||
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>/dev/null | PYTHONUNBUFFERED=1 python3 -u -c "
|
||||
import sys, json
|
||||
for line in sys.stdin:
|
||||
line = line.strip()
|
||||
@@ -244,7 +254,7 @@ THE PLAN:
|
||||
|
||||
For a **new session:**
|
||||
```bash
|
||||
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
|
||||
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
|
||||
import sys, json
|
||||
for line in sys.stdin:
|
||||
line = line.strip()
|
||||
@@ -277,7 +287,7 @@ for line in sys.stdin:
|
||||
|
||||
For a **resumed session** (user chose "Continue"):
|
||||
```bash
|
||||
codex exec resume <session-id> "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
|
||||
codex exec resume <session-id> "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
|
||||
<same python streaming parser as above, with flush=True on all print() calls>
|
||||
"
|
||||
```
|
||||
@@ -313,7 +323,14 @@ Session saved — run /codex again to continue this conversation.
|
||||
agentic coding model). This means as OpenAI ships newer models, /codex automatically
|
||||
uses them. If the user wants a specific model, pass `-m` through to codex.
|
||||
|
||||
**Reasoning effort:** All modes use `xhigh` — maximum reasoning power. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible.
|
||||
**Reasoning effort (per-mode defaults):**
|
||||
- **Review (2A):** `high` — bounded diff input, needs thoroughness but not max tokens
|
||||
- **Challenge (2B):** `high` — adversarial but bounded by diff size
|
||||
- **Consult (2C):** `medium` — large context (plans, codebase), interactive, needs speed
|
||||
|
||||
`xhigh` uses ~23x more tokens than `high` and causes 50+ minute hangs on large context
|
||||
tasks (OpenAI issues #8545, #8402, #6931). Users can override with `--xhigh` flag
|
||||
(e.g., `/codex review --xhigh`) when they want maximum reasoning and are willing to wait.
|
||||
|
||||
**Web search:** All codex commands use `--enable web_search_cached` so Codex can look up
|
||||
docs and APIs during review. This is OpenAI's cached index — fast, no extra cost.
|
||||
|
||||
@@ -670,7 +670,7 @@ Write the full prompt (context block + instructions) to this file. Use the mode-
|
||||
|
||||
```bash
|
||||
TMPERR_OH=$(mktemp /tmp/codex-oh-err-XXXXXXXX)
|
||||
codex exec "$(cat "$CODEX_PROMPT_FILE")" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_OH"
|
||||
codex exec "$(cat "$CODEX_PROMPT_FILE")" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_OH"
|
||||
```
|
||||
|
||||
Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
|
||||
|
||||
@@ -1047,7 +1047,7 @@ THE PLAN:
|
||||
|
||||
```bash
|
||||
TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX)
|
||||
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_PV"
|
||||
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_PV"
|
||||
```
|
||||
|
||||
Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
|
||||
|
||||
@@ -705,7 +705,7 @@ THE PLAN:
|
||||
|
||||
```bash
|
||||
TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX)
|
||||
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_PV"
|
||||
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_PV"
|
||||
```
|
||||
|
||||
Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
|
||||
|
||||
+2
-2
@@ -933,7 +933,7 @@ Claude's structured review already ran. Now add a **cross-model adversarial chal
|
||||
|
||||
```bash
|
||||
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_ADV"
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
|
||||
```
|
||||
|
||||
Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. After the command completes, read stderr:
|
||||
@@ -978,7 +978,7 @@ Claude's structured review already ran. Now run **all three remaining passes** f
|
||||
**1. Codex structured review (if available):**
|
||||
```bash
|
||||
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
|
||||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. Present output under `CODEX SAYS (code review):` header.
|
||||
|
||||
@@ -2197,7 +2197,7 @@ Write the full prompt (context block + instructions) to this file. Use the mode-
|
||||
|
||||
\`\`\`bash
|
||||
TMPERR_OH=$(mktemp /tmp/codex-oh-err-XXXXXXXX)
|
||||
codex exec "$(cat "$CODEX_PROMPT_FILE")" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_OH"
|
||||
codex exec "$(cat "$CODEX_PROMPT_FILE")" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_OH"
|
||||
\`\`\`
|
||||
|
||||
Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
|
||||
@@ -2281,7 +2281,7 @@ Claude's structured review already ran. Now add a **cross-model adversarial chal
|
||||
|
||||
\`\`\`bash
|
||||
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_ADV"
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
|
||||
\`\`\`
|
||||
|
||||
Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. After the command completes, read stderr:
|
||||
@@ -2326,7 +2326,7 @@ Claude's structured review already ran. Now run **all three remaining passes** f
|
||||
**1. Codex structured review (if available):**
|
||||
\`\`\`bash
|
||||
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
|
||||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
|
||||
\`\`\`
|
||||
|
||||
Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. Present output under \`CODEX SAYS (code review):\` header.
|
||||
@@ -2436,7 +2436,7 @@ THE PLAN:
|
||||
|
||||
\`\`\`bash
|
||||
TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX)
|
||||
codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_PV"
|
||||
codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_PV"
|
||||
\`\`\`
|
||||
|
||||
Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
|
||||
|
||||
@@ -292,7 +292,7 @@ Write the full prompt (context block + instructions) to this file. Use the mode-
|
||||
|
||||
\`\`\`bash
|
||||
TMPERR_OH=$(mktemp /tmp/codex-oh-err-XXXXXXXX)
|
||||
codex exec "$(cat "$CODEX_PROMPT_FILE")" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_OH"
|
||||
codex exec "$(cat "$CODEX_PROMPT_FILE")" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_OH"
|
||||
\`\`\`
|
||||
|
||||
Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
|
||||
@@ -376,7 +376,7 @@ Claude's structured review already ran. Now add a **cross-model adversarial chal
|
||||
|
||||
\`\`\`bash
|
||||
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_ADV"
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
|
||||
\`\`\`
|
||||
|
||||
Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. After the command completes, read stderr:
|
||||
@@ -421,7 +421,7 @@ Claude's structured review already ran. Now run **all three remaining passes** f
|
||||
**1. Codex structured review (if available):**
|
||||
\`\`\`bash
|
||||
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
|
||||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
|
||||
\`\`\`
|
||||
|
||||
Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. Present output under \`CODEX SAYS (code review):\` header.
|
||||
@@ -531,7 +531,7 @@ THE PLAN:
|
||||
|
||||
\`\`\`bash
|
||||
TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX)
|
||||
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_PV"
|
||||
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_PV"
|
||||
\`\`\`
|
||||
|
||||
Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
|
||||
|
||||
+2
-2
@@ -1423,7 +1423,7 @@ Claude's structured review already ran. Now add a **cross-model adversarial chal
|
||||
|
||||
```bash
|
||||
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_ADV"
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
|
||||
```
|
||||
|
||||
Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. After the command completes, read stderr:
|
||||
@@ -1468,7 +1468,7 @@ Claude's structured review already ran. Now run **all three remaining passes** f
|
||||
**1. Codex structured review (if available):**
|
||||
```bash
|
||||
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
|
||||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. Present output under `CODEX SAYS (code review):` header.
|
||||
|
||||
@@ -1325,7 +1325,7 @@ describe('Codex skill', () => {
|
||||
expect(content).toContain('fall back to the Claude adversarial subagent');
|
||||
// Review log uses new skill name
|
||||
expect(content).toContain('adversarial-review');
|
||||
expect(content).toContain('xhigh');
|
||||
expect(content).toContain('reasoning_effort="high"');
|
||||
expect(content).toContain('ADVERSARIAL REVIEW SYNTHESIS');
|
||||
});
|
||||
|
||||
@@ -1335,7 +1335,7 @@ describe('Codex skill', () => {
|
||||
expect(content).toContain('< 50');
|
||||
expect(content).toContain('200+');
|
||||
expect(content).toContain('adversarial-review');
|
||||
expect(content).toContain('xhigh');
|
||||
expect(content).toContain('reasoning_effort="high"');
|
||||
expect(content).toContain('Investigate and fix');
|
||||
});
|
||||
|
||||
|
||||
Reference in New Issue
Block a user