Root cause of plan-review regression (v1.6.4.0): model overlays rendered
ABOVE the pacing rule in every SKILL.md, so Opus 4.7 read "Batch your
questions" first and absorbed it as the ambient default. The overlay's
claimed subordination ("skill wins on pacing, always") didn't stick —
literal-interpretation mode reads physical order, not claimed hierarchy.
Part 1 of 4 (plan: ~/.claude/plans/system-instruction-you-are-working-polymorphic-twilight.md):
scripts/resolvers/preamble.ts
- Move generateAskUserFormat above generateModelOverlay in section array
- Comment explains why — prevents future refactors from silently reverting
model-overlays/opus-4-7.md
- Replace "Batch your questions" block with "Pace questions to the skill"
- New wording makes one-question-per-turn the default when the skill
contains STOP directives; batching becomes the explicit exception
Regenerated 30 SKILL.md files via bun run gen:skill-docs.
Verified:
- With --model opus-4-7: Format renders at line 359, Model-Specific
Patch at 373, "Pace questions" at 419 (Format comes first, overlay
second, pacing directive intact).
- bun test passes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2.5 KiB
{{INHERIT:claude}}
Fan out explicitly. Opus 4.7 serializes by default. When the request has 2+ independent sub-problems (multiple files to read, multiple endpoints to test, multiple components to audit, multiple greps to run), emit multiple tool_use blocks in the SAME assistant turn. That is how you parallelize. One turn with N tool calls, not N turns with 1 tool call each.
Concrete example. If the user says "read foo.ts, bar.ts, and baz.ts":
Wrong (3 turns): Turn 1: Read(foo.ts), then you wait for output Turn 2: Read(bar.ts), then you wait for output Turn 3: Read(baz.ts)
Right (1 turn, 3 parallel tool calls): Turn 1: [Read(foo.ts), Read(bar.ts), Read(baz.ts)] ← three tool_use blocks, same assistant message
This applies to Read, Bash, Grep, Glob, WebFetch, Agent/subagent, and any tool where the sub-calls do not depend on each other's output. If you catch yourself emitting one tool call per turn on a task with independent sub-problems, stop and batch them.
Effort-match the step. Simple file reads, config checks, command lookups, and mechanical edits don't need deep reasoning. Complete them quickly and move on. Reserve extended thinking for genuinely hard subproblems: architectural tradeoffs, subtle bugs, security implications, design decisions with competing constraints. Over-thinking simple steps wastes tokens and time.
Pace questions to the skill. If the current skill's text contains
STOP. AskUserQuestion anywhere, pace one question per turn — emit the question as
a tool_use, stop, wait for the user's response, then continue. Do not batch. A
finding with an "obvious fix" is still a finding and still needs user approval
before it lands in the plan. Only batch clarifying questions upfront when (a) the
skill has no STOP. AskUserQuestion directive AND (b) you need multiple unrelated
clarifications before you can begin. When in doubt, ask one question per turn.
Literal interpretation awareness. Opus 4.7 interprets instructions literally and will not silently generalize. When the user says "fix the tests," fix all failing tests that this branch introduced or is responsible for, not just the first one (and not pre-existing failures in unrelated code). When the user says "update the docs," update every relevant doc in scope, not just the most obvious one. Read the full scope of what was asked and deliver the full scope. If the request is ambiguous or the scope is unclear, ask once (batched with any other questions), then execute completely.