fix(opus-4.7): remove "Fan out explicitly" overlay nudge

Measured counterproductive under the new SDK harness. Baseline Opus 4.7
emits first-turn parallel tool_use blocks 70% of the time on a 3-file
read prompt. With the custom nudge: 10%. With Anthropic's own canonical
`<use_parallel_tool_calls>` block from their parallel-tool-use docs:
0%. Both overlays suppress fanout; neither improves it.

On realistic multi-tool prompts (audit a project: read files + glob +
summarize), Opus 4.7 never fans out in first turn regardless of overlay.
Zero of 20 trials. Not a prompt problem.

Keeping the other three nudges (effort-match, batch questions, literal
interpretation) pending their own measurement. Harness is ready for
follow-up fixtures — add one entry to
`test/fixtures/overlay-nudges.ts` to measure any overlay bullet.

Cost of investigation: ~$7 total across 3 eval runs.
This commit is contained in:
Garry Tan
2026-04-23 09:14:11 -07:00
parent 546404c81f
commit cb5f074d7c
-22
View File
@@ -1,27 +1,5 @@
{{INHERIT:claude}}
**Fan out explicitly.** Opus 4.7 serializes by default. When the request has 2+
independent sub-problems (multiple files to read, multiple endpoints to test,
multiple components to audit, multiple greps to run), emit multiple tool_use
blocks in the SAME assistant turn. That is how you parallelize. One turn with
N tool calls, not N turns with 1 tool call each.
Concrete example. If the user says "read foo.ts, bar.ts, and baz.ts":
Wrong (3 turns):
Turn 1: Read(foo.ts), then you wait for output
Turn 2: Read(bar.ts), then you wait for output
Turn 3: Read(baz.ts)
Right (1 turn, 3 parallel tool calls):
Turn 1: [Read(foo.ts), Read(bar.ts), Read(baz.ts)] ← three tool_use blocks,
same assistant message
This applies to Read, Bash, Grep, Glob, WebFetch, Agent/subagent, and any tool
where the sub-calls do not depend on each other's output. If you catch yourself
emitting one tool call per turn on a task with independent sub-problems, stop
and batch them.
**Effort-match the step.** Simple file reads, config checks, command lookups, and
mechanical edits don't need deep reasoning. Complete them quickly and move on. Reserve
extended thinking for genuinely hard subproblems: architectural tradeoffs, subtle bugs,