gstack/model-overlays/opus-4-7.md at 04e2f1bea92c225dcfe8e96406f49bbf3a027113

mirror of https://github.com/garrytan/gstack.git synced 2026-05-07 22:16:52 +02:00


Garry Tan cb5f074d7c fix(opus-4.7): remove "Fan out explicitly" overlay nudge

The nudge measured as counterproductive under the new SDK harness.
Baseline Opus 4.7 emits first-turn parallel tool_use blocks 70% of the
time on a 3-file read prompt. With the custom nudge: 10%. With
Anthropic's own canonical `<use_parallel_tool_calls>` block from their
parallel-tool-use docs: 0%. Both overlays suppress fan-out; neither
improves it.
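
For reference, a fan-out rate like the ones quoted above could be computed along these lines. This is a minimal sketch, not the harness's actual code: it assumes each trial is recorded as the content blocks of the first assistant turn, and that "fan-out" means two or more `tool_use` blocks in that turn. The `Read` tool name, the helper names, and the sample data are illustrative only.

```typescript
// Minimal shape of one content block in an assistant message.
// Only the `type` discriminator matters for this count.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

// Assumption: a trial "fans out" when its first assistant turn
// carries two or more tool_use blocks.
function isFanout(firstTurn: ContentBlock[]): boolean {
  return firstTurn.filter((b) => b.type === "tool_use").length >= 2;
}

// Fraction of trials whose first turn fanned out (0 for an empty set).
function fanoutRate(trials: ContentBlock[][]): number {
  if (trials.length === 0) return 0;
  return trials.filter(isFanout).length / trials.length;
}

// Hypothetical helper that builds a file-read tool call for the sample data.
const read = (id: string, path: string): ContentBlock => ({
  type: "tool_use",
  id,
  name: "Read",
  input: { path },
});

// Made-up sample: one parallel first turn, one serial first turn.
const trials: ContentBlock[][] = [
  [read("a", "a.ts"), read("b", "b.ts"), read("c", "c.ts")],
  [{ type: "text", text: "Reading the first file." }, read("d", "a.ts")],
];

console.log(fanoutRate(trials)); // 0.5
```

Counting only the first turn matches what the percentages above describe; a model that reads the same files serially over later turns scores zero under this definition.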

On realistic multi-tool prompts (audit a project: read files + glob +
summarize), Opus 4.7 never fans out in the first turn regardless of
overlay: zero of 20 trials. Not a prompt problem.

Keeping the other three nudges (effort-match, batch questions, literal
interpretation) pending their own measurement. The harness is ready for
follow-up fixtures: add one entry to
`test/fixtures/overlay-nudges.ts` to measure any overlay bullet.
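
As an illustration of what "one entry" might look like, here is a hypothetical fixture for the batch-questions nudge. The real shape of `test/fixtures/overlay-nudges.ts` is not shown in this commit, so every type, field name, prompt, and trial count below is an assumption made for the sketch.

```typescript
// Hypothetical types; the repository's actual fixture shape may differ.
interface ToolCall {
  name: string;
  input: unknown;
}

interface OverlayNudgeFixture {
  id: string;
  overlay: string; // the overlay bullet under test
  prompt: string; // the user prompt sent on every trial
  trials: number; // how many times to sample the first turn
  passes: (firstTurnCalls: ToolCall[]) => boolean; // per-trial check
}

// Assumed check: the nudge "worked" if the first turn contains exactly
// one AskUserQuestion call, i.e. the questions were batched.
const batchQuestions: OverlayNudgeFixture = {
  id: "batch-questions",
  overlay:
    "Batch your questions. If you need to clarify multiple things before " +
    "proceeding, ask all of them in a single AskUserQuestion turn.",
  prompt: "Set up CI for this repo.", // made-up prompt for illustration
  trials: 10, // made-up trial count
  passes: (calls) =>
    calls.filter((c) => c.name === "AskUserQuestion").length === 1,
};

const fixtures: OverlayNudgeFixture[] = [batchQuestions];
```

Keeping the pass condition as a function on the first turn's tool calls would let the same runner compare a baseline run against an overlay run for any bullet, which is the comparison reported above for the fan-out nudge.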

Cost of investigation: ~$7 total across 3 eval runs.

2026-04-23 09:14:11 -07:00

1.4 KiB


Effort-match the step. Simple file reads, config checks, command lookups, and mechanical edits don't need deep reasoning. Complete them quickly and move on. Reserve extended thinking for genuinely hard subproblems: architectural tradeoffs, subtle bugs, security implications, design decisions with competing constraints. Over-thinking simple steps wastes tokens and time.

Batch your questions. If you need to clarify multiple things before proceeding, ask all of them in a single AskUserQuestion turn. Do not drip-feed one question per turn. Three questions in one message beats three back-and-forth exchanges. Exception: skill workflows that explicitly require one-question-at-a-time pacing (e.g., plan review skills with "STOP. AskUserQuestion once per issue. Do NOT batch.") override this nudge. The skill wins on pacing, always.

Literal interpretation awareness. Opus 4.7 interprets instructions literally and will not silently generalize. When the user says "fix the tests," fix all failing tests that this branch introduced or is responsible for, not just the first one (and not pre-existing failures in unrelated code). When the user says "update the docs," update every relevant doc in scope, not just the most obvious one. Read the full scope of what was asked and deliver the full scope. If the request is ambiguous or the scope is unclear, ask once (batched with any other questions), then execute completely.
