From 6294c5a74a4c929cce6e6ae3c3aebcf460a21adb Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Wed, 18 Mar 2026 21:29:07 -0700 Subject: [PATCH] feat: per-mode reasoning (high for review/consult, xhigh for challenge) + web search MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Review and consult use high reasoning — thorough but not slow. Challenge (adversarial) uses xhigh — maximum depth for breaking code. All modes enable web_search_cached so Codex can look up docs/APIs. --- codex/SKILL.md | 31 +++++++++++++++++++------------ codex/SKILL.md.tmpl | 31 +++++++++++++++++++------------ 2 files changed, 38 insertions(+), 24 deletions(-) diff --git a/codex/SKILL.md b/codex/SKILL.md index 83e6d6a9..04e85664 100644 --- a/codex/SKILL.md +++ b/codex/SKILL.md @@ -221,13 +221,13 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt) 2. Run the review (5-minute timeout): ```bash -codex review --base -c 'model_reasoning_effort="xhigh"' 2>"$TMPERR" +codex review --base -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR" ``` Use `timeout: 300000` on the Bash call. If the user provided custom instructions (e.g., `/codex review focus on security`), pass them as the prompt argument: ```bash -codex review "focus on security" --base -c 'model_reasoning_effort="xhigh"' 2>"$TMPERR" +codex review "focus on security" --base -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR" ``` 3. Capture the output. Then parse cost from stderr: @@ -306,7 +306,7 @@ With focus (e.g., "security"): 3. Run codex exec (5-minute timeout): ```bash -codex exec "" -s read-only -c 'model_reasoning_effort="xhigh"' -o "$TMPRESP" 2>"$TMPERR" +codex exec "" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached -o "$TMPRESP" 2>"$TMPERR" ``` 4. 
Read the response and parse cost: @@ -369,12 +369,12 @@ THE PLAN: For a **new session:** ```bash -codex exec "" -s read-only -c 'model_reasoning_effort="xhigh"' -o "$TMPRESP" 2>"$TMPERR" +codex exec "" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached -o "$TMPRESP" 2>"$TMPERR" ``` For a **resumed session** (user chose "Continue"): ```bash -codex exec resume "" -s read-only -c 'model_reasoning_effort="xhigh"' -o "$TMPRESP" 2>"$TMPERR" +codex exec resume "" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached -o "$TMPRESP" 2>"$TMPERR" ``` 5. Capture and save session ID: @@ -410,15 +410,22 @@ Session saved — run /codex again to continue this conversation. ## Model & Reasoning -Codex with ChatGPT login only supports `gpt-5.2-codex` (the default). Other models -(o3, o4-mini, gpt-4o) require API key auth and are not available with ChatGPT login. +**Models** (with ChatGPT login): +- `gpt-5.2-codex` (default) — frontier agentic coding model, best overall +- `gpt-5.2` — deeper reasoning for non-coding analysis +- `gpt-5.1-codex-max` — deep and fast reasoning, 30% cheaper +- `gpt-5.1-codex-mini` — faster and cheaper, less capable -All codex commands use `-c 'model_reasoning_effort="xhigh"'` because the whole point of -this skill is deep, thorough analysis. We want maximum reasoning power — this is your -"200 IQ autistic developer" second opinion, not a quick sanity check. +**Reasoning effort** varies by mode — use the right level for each task: +- **Review mode:** `high` — thorough but not slow. Diff review benefits from depth but doesn't need maximum compute. +- **Challenge (adversarial) mode:** `xhigh` — maximum reasoning power. When trying to break code, you want the model thinking as hard as possible. +- **Consult mode:** `high` — good balance of depth and speed for conversations. 
-If the user has API key auth and wants a different model, they can say -`/codex review -m o3` and the `-m` flag should be passed through to codex. +**Web search:** All codex commands use `--enable web_search_cached` so Codex can look up +docs and APIs during review. This is OpenAI's cached index — fast, no extra cost. + +If the user specifies a model (e.g., `/codex review -m gpt-5.1-codex-max`), +pass the `-m` flag through to codex. --- diff --git a/codex/SKILL.md.tmpl b/codex/SKILL.md.tmpl index 1b30a236..bfd1fa20 100644 --- a/codex/SKILL.md.tmpl +++ b/codex/SKILL.md.tmpl @@ -75,13 +75,13 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt) 2. Run the review (5-minute timeout): ```bash -codex review --base -c 'model_reasoning_effort="xhigh"' 2>"$TMPERR" +codex review --base -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR" ``` Use `timeout: 300000` on the Bash call. If the user provided custom instructions (e.g., `/codex review focus on security`), pass them as the prompt argument: ```bash -codex review "focus on security" --base -c 'model_reasoning_effort="xhigh"' 2>"$TMPERR" +codex review "focus on security" --base -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR" ``` 3. Capture the output. Then parse cost from stderr: @@ -160,7 +160,7 @@ With focus (e.g., "security"): 3. Run codex exec (5-minute timeout): ```bash -codex exec "" -s read-only -c 'model_reasoning_effort="xhigh"' -o "$TMPRESP" 2>"$TMPERR" +codex exec "" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached -o "$TMPRESP" 2>"$TMPERR" ``` 4. 
Read the response and parse cost: @@ -223,12 +223,12 @@ THE PLAN: For a **new session:** ```bash -codex exec "" -s read-only -c 'model_reasoning_effort="xhigh"' -o "$TMPRESP" 2>"$TMPERR" +codex exec "" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached -o "$TMPRESP" 2>"$TMPERR" ``` For a **resumed session** (user chose "Continue"): ```bash -codex exec resume "" -s read-only -c 'model_reasoning_effort="xhigh"' -o "$TMPRESP" 2>"$TMPERR" +codex exec resume "" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached -o "$TMPRESP" 2>"$TMPERR" ``` 5. Capture and save session ID: @@ -264,15 +264,22 @@ Session saved — run /codex again to continue this conversation. ## Model & Reasoning -Codex with ChatGPT login only supports `gpt-5.2-codex` (the default). Other models -(o3, o4-mini, gpt-4o) require API key auth and are not available with ChatGPT login. +**Models** (with ChatGPT login): +- `gpt-5.2-codex` (default) — frontier agentic coding model, best overall +- `gpt-5.2` — deeper reasoning for non-coding analysis +- `gpt-5.1-codex-max` — deep and fast reasoning, 30% cheaper +- `gpt-5.1-codex-mini` — faster and cheaper, less capable -All codex commands use `-c 'model_reasoning_effort="xhigh"'` because the whole point of -this skill is deep, thorough analysis. We want maximum reasoning power — this is your -"200 IQ autistic developer" second opinion, not a quick sanity check. +**Reasoning effort** varies by mode — use the right level for each task: +- **Review mode:** `high` — thorough but not slow. Diff review benefits from depth but doesn't need maximum compute. +- **Challenge (adversarial) mode:** `xhigh` — maximum reasoning power. When trying to break code, you want the model thinking as hard as possible. +- **Consult mode:** `high` — good balance of depth and speed for conversations. 
-If the user has API key auth and wants a different model, they can say -`/codex review -m o3` and the `-m` flag should be passed through to codex. +**Web search:** All codex commands use `--enable web_search_cached` so Codex can look up +docs and APIs during review. This is OpenAI's cached index — fast, no extra cost. + +If the user specifies a model (e.g., `/codex review -m gpt-5.1-codex-max`), +pass the `-m` flag through to codex. ---
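
The mode-to-effort mapping this patch introduces (review/consult use `high`, challenge uses `xhigh`, all modes share `--enable web_search_cached`) can be sketched as a small shell helper. This is an illustrative sketch only: the helper names `effort_for_mode` and `codex_flags` are hypothetical and do not appear in the skill, which passes the flags inline as shown in the hunks above.

```shell
#!/bin/sh
# Map a skill mode to its reasoning-effort level, per the patch:
# review and consult use "high", challenge uses "xhigh".
effort_for_mode() {
  case "$1" in
    review|consult) echo "high" ;;
    challenge)      echo "xhigh" ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Build the shared flag suffix for a given mode: the reasoning-effort
# config override plus the web-search toggle every mode enables.
codex_flags() {
  printf -- "-c 'model_reasoning_effort=\"%s\"' --enable web_search_cached" \
    "$(effort_for_mode "$1")"
}

# Example: print the flags for each mode.
for mode in review challenge consult; do
  printf '%s: %s\n' "$mode" "$(codex_flags "$mode")"
done
```

A helper like this would keep the per-mode effort levels in one place if the skill ever grows more modes; as written, the patch simply repeats the flags in each command, which is also fine for three modes.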