fix: per-mode reasoning effort defaults, add --xhigh override

xhigh reasoning uses ~23x more tokens and causes 50+ minute hangs
on large context tasks (OpenAI issues #8545, #8402, #6931).

Per-mode defaults for /codex skill:
- Review: high (bounded diff, needs thoroughness)
- Challenge: high (adversarial but bounded by diff)
- Consult: medium (large context, interactive, needs speed)

Also changes all Outside Voice / adversarial codex invocations
across gstack (resolvers, gen-skill-docs) from xhigh to high.
Users can override with --xhigh flag when they want max reasoning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-26 12:18:25 -06:00
parent ec069b9d07
commit 11977d46ae
10 changed files with 63 additions and 29 deletions
+2 -2
View File
@@ -1325,7 +1325,7 @@ describe('Codex skill', () => {
expect(content).toContain('fall back to the Claude adversarial subagent');
// Review log uses new skill name
expect(content).toContain('adversarial-review');
expect(content).toContain('xhigh');
expect(content).toContain('reasoning_effort="high"');
expect(content).toContain('ADVERSARIAL REVIEW SYNTHESIS');
});
@@ -1335,7 +1335,7 @@ describe('Codex skill', () => {
expect(content).toContain('< 50');
expect(content).toContain('200+');
expect(content).toContain('adversarial-review');
expect(content).toContain('xhigh');
expect(content).toContain('reasoning_effort="high"');
expect(content).toContain('Investigate and fix');
});