From cb5f074d7ca7cc5a207824c7d80f8290ea4f54f0 Mon Sep 17 00:00:00 2001
From: Garry Tan
Date: Thu, 23 Apr 2026 09:14:11 -0700
Subject: [PATCH] fix(opus-4.7): remove "Fan out explicitly" overlay nudge
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Measured counterproductive under the new SDK harness. Baseline Opus 4.7
emits first-turn parallel tool_use blocks 70% of the time on a 3-file
read prompt. With the custom nudge: 10%. With Anthropic's own canonical
`` block from their parallel-tool-use docs: 0%. Both overlays suppress
fanout; neither improves it.

On realistic multi-tool prompts (audit a project: read files + glob +
summarize), Opus 4.7 never fans out in first turn regardless of overlay.
Zero of 20 trials. Not a prompt problem.

Keeping the other three nudges (effort-match, batch questions, literal
interpretation) pending their own measurement. Harness is ready for
follow-up fixtures — add one entry to `test/fixtures/overlay-nudges.ts`
to measure any overlay bullet.

Cost of investigation: ~$7 total across 3 eval runs.
---
 model-overlays/opus-4-7.md | 22 ----------------------
 1 file changed, 22 deletions(-)

diff --git a/model-overlays/opus-4-7.md b/model-overlays/opus-4-7.md
index e27a86ed..705c2378 100644
--- a/model-overlays/opus-4-7.md
+++ b/model-overlays/opus-4-7.md
@@ -1,27 +1,5 @@
 {{INHERIT:claude}}
 
-**Fan out explicitly.** Opus 4.7 serializes by default. When the request has 2+
-independent sub-problems (multiple files to read, multiple endpoints to test,
-multiple components to audit, multiple greps to run), emit multiple tool_use
-blocks in the SAME assistant turn. That is how you parallelize. One turn with
-N tool calls, not N turns with 1 tool call each.
-
-Concrete example. If the user says "read foo.ts, bar.ts, and baz.ts":
-
-Wrong (3 turns):
-  Turn 1: Read(foo.ts), then you wait for output
-  Turn 2: Read(bar.ts), then you wait for output
-  Turn 3: Read(baz.ts)
-
-Right (1 turn, 3 parallel tool calls):
-  Turn 1: [Read(foo.ts), Read(bar.ts), Read(baz.ts)] ← three tool_use blocks,
-          same assistant message
-
-This applies to Read, Bash, Grep, Glob, WebFetch, Agent/subagent, and any tool
-where the sub-calls do not depend on each other's output. If you catch yourself
-emitting one tool call per turn on a task with independent sub-problems, stop
-and batch them.
-
 **Effort-match the step.** Simple file reads, config checks, command lookups,
 and mechanical edits don't need deep reasoning. Complete them quickly and move
 on. Reserve extended thinking for genuinely hard subproblems: architectural tradeoffs, subtle bugs,