Merge remote-tracking branch 'origin/main' into garrytan/test-fail-ownership

# Conflicts:
#	scripts/gen-skill-docs.ts
Garry Tan
2026-03-20 16:16:20 -07:00
48 changed files with 2351 additions and 285 deletions
+5 -8
@@ -314,13 +314,13 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)
2. Run the review (5-minute timeout):
```bash
-codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
+codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
```
Use `timeout: 300000` on the Bash call. If the user provided custom instructions
(e.g., `/codex review focus on security`), pass them as the prompt argument:
```bash
-codex review "focus on security" --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
+codex review "focus on security" --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
```
3. Capture the output. Then parse cost from stderr:
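(The parsing command for step 3 sits outside this hunk. Purely as a hedged sketch of the idea, assuming the stderr captured in `$TMPERR` contains a token-usage or dollar-cost summary line, a check like the one below could pull it out; the regex and field format are assumptions, not the actual Codex output.)
```python
# Illustrative only: scan the captured stderr file for something that looks
# like a usage/cost summary. The pattern is an assumption about the format.
import re
import sys

def extract_cost(path):
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = re.search(r"(tokens used:\s*[\d,]+|\$\d+\.\d+)", line, re.IGNORECASE)
            if m:
                return m.group(1)
    return None

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/dev/stdin"
    print(extract_cost(path) or "no cost line found")
```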
@@ -475,7 +475,7 @@ THE PLAN:
For a **new session:**
```bash
-codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
+codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
import sys, json
for line in sys.stdin:
line = line.strip()
@@ -508,7 +508,7 @@ for line in sys.stdin:
For a **resumed session** (user chose "Continue"):
```bash
-codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
+codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
<same python streaming parser as above>
"
```
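(Only the first few lines of the inline parser are visible across these hunks. As a self-contained illustration of the same pattern, reading JSON Lines from stdin, skipping non-JSON noise, and printing any readable text, a sketch might look like the block below; the `text` and `message` field names are assumptions, not the documented `codex --json` event schema.)
```python
# Illustrative only: a generic JSON Lines reader in the spirit of the inline
# parser above. Field names ("text", "message") are guesses, not the real schema.
import json
import sys

for raw in sys.stdin:
    raw = raw.strip()
    if not raw:
        continue
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        # Forward anything that isn't JSON (warnings, stray output) unchanged.
        print(raw)
        continue
    if not isinstance(event, dict):
        continue
    # Surface whatever human-readable text the event carries, if any.
    text = event.get("text") or event.get("message")
    if text:
        print(text)
```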
@@ -544,10 +544,7 @@ Session saved — run /codex again to continue this conversation.
agentic coding model). This means as OpenAI ships newer models, /codex automatically
uses them. If the user wants a specific model, pass `-m` through to codex.
-**Reasoning effort** varies by mode; use the right level for each task:
-- **Review mode:** `high` — thorough but not slow. Diff review benefits from depth but doesn't need maximum compute.
-- **Challenge (adversarial) mode:** `xhigh` — maximum reasoning power. When trying to break code, you want the model thinking as hard as possible.
-- **Consult mode:** `high` — good balance of depth and speed for conversations.
+**Reasoning effort:** All modes use `xhigh` — maximum reasoning power. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible.
**Web search:** All codex commands use `--enable web_search_cached` so Codex can look up
docs and APIs during review. This is OpenAI's cached index — fast, no extra cost.
+5 -8
@@ -79,13 +79,13 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)
2. Run the review (5-minute timeout):
```bash
-codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
+codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
```
Use `timeout: 300000` on the Bash call. If the user provided custom instructions
(e.g., `/codex review focus on security`), pass them as the prompt argument:
```bash
-codex review "focus on security" --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
+codex review "focus on security" --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
```
3. Capture the output. Then parse cost from stderr:
@@ -240,7 +240,7 @@ THE PLAN:
For a **new session:**
```bash
-codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
+codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
import sys, json
for line in sys.stdin:
line = line.strip()
@@ -273,7 +273,7 @@ for line in sys.stdin:
For a **resumed session** (user chose "Continue"):
```bash
-codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
+codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
<same python streaming parser as above>
"
```
@@ -309,10 +309,7 @@ Session saved — run /codex again to continue this conversation.
agentic coding model). This means as OpenAI ships newer models, /codex automatically
uses them. If the user wants a specific model, pass `-m` through to codex.
-**Reasoning effort** varies by mode; use the right level for each task:
-- **Review mode:** `high` — thorough but not slow. Diff review benefits from depth but doesn't need maximum compute.
-- **Challenge (adversarial) mode:** `xhigh` — maximum reasoning power. When trying to break code, you want the model thinking as hard as possible.
-- **Consult mode:** `high` — good balance of depth and speed for conversations.
+**Reasoning effort:** All modes use `xhigh` — maximum reasoning power. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible.
**Web search:** All codex commands use `--enable web_search_cached` so Codex can look up
docs and APIs during review. This is OpenAI's cached index — fast, no extra cost.