From 11977d46ae7c6e07281465e8ba4e465c621adfdc Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Thu, 26 Mar 2026 12:18:25 -0600 Subject: [PATCH] fix: per-mode reasoning effort defaults, add --xhigh override xhigh reasoning uses ~23x more tokens and causes 50+ minute hangs on large context tasks (OpenAI issues #8545, #8402, #6931). Per-mode defaults for /codex skill: - Review: high (bounded diff, needs thoroughness) - Challenge: high (adversarial but bounded by diff) - Consult: medium (large context, interactive, needs speed) Also changes all Outside Voice / adversarial codex invocations across gstack (resolvers, gen-skill-docs) from xhigh to high. Users can override with --xhigh flag when they want max reasoning. Co-Authored-By: Claude Opus 4.6 (1M context) --- codex/SKILL.md | 29 +++++++++++++++++++++++------ codex/SKILL.md.tmpl | 29 +++++++++++++++++++++++------ office-hours/SKILL.md | 2 +- plan-ceo-review/SKILL.md | 2 +- plan-eng-review/SKILL.md | 2 +- review/SKILL.md | 4 ++-- scripts/gen-skill-docs.ts | 8 ++++---- scripts/resolvers/review.ts | 8 ++++---- ship/SKILL.md | 4 ++-- test/skill-validation.test.ts | 4 ++-- 10 files changed, 63 insertions(+), 29 deletions(-) diff --git a/codex/SKILL.md b/codex/SKILL.md index 5739604e..a737bc4b 100644 --- a/codex/SKILL.md +++ b/codex/SKILL.md @@ -363,6 +363,14 @@ Parse the user's input to determine which mode to run: - Otherwise, ask: "What would you like to ask Codex?" 4. `/codex ` — **Consult mode** (Step 2C), where the remaining text is the prompt +**Reasoning effort override:** If the user's input contains `--xhigh` anywhere, +note it and remove it from the prompt text before passing to Codex. When `--xhigh` +is present, use `model_reasoning_effort="xhigh"` for all modes regardless of the +per-mode default below. Otherwise, use the per-mode defaults: +- Review (2A): `high` — bounded diff input, needs thoroughness +- Challenge (2B): `high` — adversarial but bounded by diff +- Consult (2C): `medium` — large context, interactive, needs speed + --- ## Step 2A: Review Mode @@ -376,13 +384,15 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt) 2. Run the review (5-minute timeout): ```bash -codex review --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" +codex review --base -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR" ``` +If the user passed `--xhigh`, use `"xhigh"` instead of `"high"`. + Use `timeout: 300000` on the Bash call. If the user provided custom instructions (e.g., `/codex review focus on security`), pass them as the prompt argument: ```bash -codex review "focus on security" --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" +codex review "focus on security" --base -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR" ``` 3. Capture the output. Then parse cost from stderr: @@ -520,7 +530,7 @@ With focus (e.g., "security"): 2. Run codex exec with **JSONL output** to capture reasoning traces and tool calls (5-minute timeout): ```bash -codex exec "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>/dev/null | PYTHONUNBUFFERED=1 python3 -u -c " +codex exec "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>/dev/null | PYTHONUNBUFFERED=1 python3 -u -c " import sys, json for line in sys.stdin: line = line.strip() @@ -605,7 +615,7 @@ THE PLAN: For a **new session:** ```bash -codex exec "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c " +codex exec "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c " import sys, json for line in sys.stdin: line = line.strip() @@ -638,7 +648,7 @@ for line in sys.stdin: For a **resumed session** (user chose "Continue"): ```bash -codex exec resume "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c " +codex exec resume "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c " " ``` @@ -674,7 +684,14 @@ Session saved — run /codex again to continue this conversation. agentic coding model). This means as OpenAI ships newer models, /codex automatically uses them. If the user wants a specific model, pass `-m` through to codex. -**Reasoning effort:** All modes use `xhigh` — maximum reasoning power. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible. +**Reasoning effort (per-mode defaults):** +- **Review (2A):** `high` — bounded diff input, needs thoroughness but not max tokens +- **Challenge (2B):** `high` — adversarial but bounded by diff size +- **Consult (2C):** `medium` — large context (plans, codebase), interactive, needs speed + +`xhigh` uses ~23x more tokens than `high` and causes 50+ minute hangs on large context +tasks (OpenAI issues #8545, #8402, #6931). Users can override with `--xhigh` flag +(e.g., `/codex review --xhigh`) when they want maximum reasoning and are willing to wait. **Web search:** All codex commands use `--enable web_search_cached` so Codex can look up docs and APIs during review. This is OpenAI's cached index — fast, no extra cost. diff --git a/codex/SKILL.md.tmpl b/codex/SKILL.md.tmpl index 41e64fe7..97c99a5a 100644 --- a/codex/SKILL.md.tmpl +++ b/codex/SKILL.md.tmpl @@ -67,6 +67,14 @@ Parse the user's input to determine which mode to run: - Otherwise, ask: "What would you like to ask Codex?" 4. `/codex ` — **Consult mode** (Step 2C), where the remaining text is the prompt +**Reasoning effort override:** If the user's input contains `--xhigh` anywhere, +note it and remove it from the prompt text before passing to Codex. When `--xhigh` +is present, use `model_reasoning_effort="xhigh"` for all modes regardless of the +per-mode default below. Otherwise, use the per-mode defaults: +- Review (2A): `high` — bounded diff input, needs thoroughness +- Challenge (2B): `high` — adversarial but bounded by diff +- Consult (2C): `medium` — large context, interactive, needs speed + --- ## Step 2A: Review Mode @@ -80,13 +88,15 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt) 2. Run the review (5-minute timeout): ```bash -codex review --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" +codex review --base -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR" ``` +If the user passed `--xhigh`, use `"xhigh"` instead of `"high"`. + Use `timeout: 300000` on the Bash call. If the user provided custom instructions (e.g., `/codex review focus on security`), pass them as the prompt argument: ```bash -codex review "focus on security" --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" +codex review "focus on security" --base -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR" ``` 3. Capture the output. Then parse cost from stderr: @@ -159,7 +169,7 @@ With focus (e.g., "security"): 2. Run codex exec with **JSONL output** to capture reasoning traces and tool calls (5-minute timeout): ```bash -codex exec "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>/dev/null | PYTHONUNBUFFERED=1 python3 -u -c " +codex exec "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>/dev/null | PYTHONUNBUFFERED=1 python3 -u -c " import sys, json for line in sys.stdin: line = line.strip() @@ -244,7 +254,7 @@ THE PLAN: For a **new session:** ```bash -codex exec "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c " +codex exec "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c " import sys, json for line in sys.stdin: line = line.strip() @@ -277,7 +287,7 @@ for line in sys.stdin: For a **resumed session** (user chose "Continue"): ```bash -codex exec resume "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c " +codex exec resume "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c " " ``` @@ -313,7 +323,14 @@ Session saved — run /codex again to continue this conversation. agentic coding model). This means as OpenAI ships newer models, /codex automatically uses them. If the user wants a specific model, pass `-m` through to codex. -**Reasoning effort:** All modes use `xhigh` — maximum reasoning power. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible. +**Reasoning effort (per-mode defaults):** +- **Review (2A):** `high` — bounded diff input, needs thoroughness but not max tokens +- **Challenge (2B):** `high` — adversarial but bounded by diff size +- **Consult (2C):** `medium` — large context (plans, codebase), interactive, needs speed + +`xhigh` uses ~23x more tokens than `high` and causes 50+ minute hangs on large context +tasks (OpenAI issues #8545, #8402, #6931). Users can override with `--xhigh` flag +(e.g., `/codex review --xhigh`) when they want maximum reasoning and are willing to wait. **Web search:** All codex commands use `--enable web_search_cached` so Codex can look up docs and APIs during review. This is OpenAI's cached index — fast, no extra cost. diff --git a/office-hours/SKILL.md b/office-hours/SKILL.md index 9e2debd4..278bc49f 100644 --- a/office-hours/SKILL.md +++ b/office-hours/SKILL.md @@ -670,7 +670,7 @@ Write the full prompt (context block + instructions) to this file. Use the mode- ```bash TMPERR_OH=$(mktemp /tmp/codex-oh-err-XXXXXXXX) -codex exec "$(cat "$CODEX_PROMPT_FILE")" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_OH" +codex exec "$(cat "$CODEX_PROMPT_FILE")" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_OH" ``` Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr: diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md index c092ebc1..a6b52834 100644 --- a/plan-ceo-review/SKILL.md +++ b/plan-ceo-review/SKILL.md @@ -1047,7 +1047,7 @@ THE PLAN: ```bash TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX) -codex exec "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_PV" +codex exec "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_PV" ``` Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr: diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md index 5b57c16f..38a2c580 100644 --- a/plan-eng-review/SKILL.md +++ b/plan-eng-review/SKILL.md @@ -705,7 +705,7 @@ THE PLAN: ```bash TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX) -codex exec "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_PV" +codex exec "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_PV" ``` Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr: diff --git a/review/SKILL.md b/review/SKILL.md index 3c28ed6c..4dd3cddd 100644 --- a/review/SKILL.md +++ b/review/SKILL.md @@ -933,7 +933,7 @@ Claude's structured review already ran. Now add a **cross-model adversarial chal ```bash TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX) -codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_ADV" +codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV" ``` Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. After the command completes, read stderr: @@ -978,7 +978,7 @@ Claude's structured review already ran. Now run **all three remaining passes** f **1. Codex structured review (if available):** ```bash TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX) -codex review --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" +codex review --base -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR" ``` Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. Present output under `CODEX SAYS (code review):` header. diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts index 81cd7476..5586febb 100644 --- a/scripts/gen-skill-docs.ts +++ b/scripts/gen-skill-docs.ts @@ -2197,7 +2197,7 @@ Write the full prompt (context block + instructions) to this file. Use the mode- \`\`\`bash TMPERR_OH=$(mktemp /tmp/codex-oh-err-XXXXXXXX) -codex exec "$(cat "$CODEX_PROMPT_FILE")" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_OH" +codex exec "$(cat "$CODEX_PROMPT_FILE")" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_OH" \`\`\` Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr: @@ -2281,7 +2281,7 @@ Claude's structured review already ran. Now add a **cross-model adversarial chal \`\`\`bash TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX) -codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_ADV" +codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV" \`\`\` Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. After the command completes, read stderr: @@ -2326,7 +2326,7 @@ Claude's structured review already ran. Now run **all three remaining passes** f **1. Codex structured review (if available):** \`\`\`bash TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX) -codex review --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" +codex review --base -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR" \`\`\` Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. Present output under \`CODEX SAYS (code review):\` header. @@ -2436,7 +2436,7 @@ THE PLAN: \`\`\`bash TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX) -codex exec "" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_PV" +codex exec "" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_PV" \`\`\` Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr: diff --git a/scripts/resolvers/review.ts b/scripts/resolvers/review.ts index 86da3b86..116f15bc 100644 --- a/scripts/resolvers/review.ts +++ b/scripts/resolvers/review.ts @@ -292,7 +292,7 @@ Write the full prompt (context block + instructions) to this file. Use the mode- \`\`\`bash TMPERR_OH=$(mktemp /tmp/codex-oh-err-XXXXXXXX) -codex exec "$(cat "$CODEX_PROMPT_FILE")" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_OH" +codex exec "$(cat "$CODEX_PROMPT_FILE")" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_OH" \`\`\` Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr: @@ -376,7 +376,7 @@ Claude's structured review already ran. Now add a **cross-model adversarial chal \`\`\`bash TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX) -codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_ADV" +codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV" \`\`\` Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. After the command completes, read stderr: @@ -421,7 +421,7 @@ Claude's structured review already ran. Now run **all three remaining passes** f **1. Codex structured review (if available):** \`\`\`bash TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX) -codex review --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" +codex review --base -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR" \`\`\` Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. Present output under \`CODEX SAYS (code review):\` header. @@ -531,7 +531,7 @@ THE PLAN: \`\`\`bash TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX) -codex exec "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_PV" +codex exec "" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_PV" \`\`\` Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr: diff --git a/ship/SKILL.md b/ship/SKILL.md index 0fbc474f..36baab54 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -1423,7 +1423,7 @@ Claude's structured review already ran. Now add a **cross-model adversarial chal ```bash TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX) -codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_ADV" +codex exec "Review the changes on this branch against the base branch. Run git diff origin/ to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV" ``` Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. After the command completes, read stderr: @@ -1468,7 +1468,7 @@ Claude's structured review already ran. Now run **all three remaining passes** f **1. Codex structured review (if available):** ```bash TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX) -codex review --base -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR" +codex review --base -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR" ``` Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. Present output under `CODEX SAYS (code review):` header. diff --git a/test/skill-validation.test.ts b/test/skill-validation.test.ts index 535ce73f..450a1bbe 100644 --- a/test/skill-validation.test.ts +++ b/test/skill-validation.test.ts @@ -1325,7 +1325,7 @@ describe('Codex skill', () => { expect(content).toContain('fall back to the Claude adversarial subagent'); // Review log uses new skill name expect(content).toContain('adversarial-review'); - expect(content).toContain('xhigh'); + expect(content).toContain('reasoning_effort="high"'); expect(content).toContain('ADVERSARIAL REVIEW SYNTHESIS'); }); @@ -1335,7 +1335,7 @@ describe('Codex skill', () => { expect(content).toContain('< 50'); expect(content).toContain('200+'); expect(content).toContain('adversarial-review'); - expect(content).toContain('xhigh'); + expect(content).toContain('reasoning_effort="high"'); expect(content).toContain('Investigate and fix'); });