refactor(opus-4.7): split overlay, align routing, fix trailer fallback

Follow-up to wintermute's initial Opus 4.7 migration commit (addresses
ship-quality review findings before v1.6.1.0 release).

Overlay split (model-overlays/):
  - Move 4 Opus-4.7-specific nudges (Fan out, Effort-match, Batch your
    questions, Literal interpretation) from claude.md into new
    opus-4-7.md with {{INHERIT:claude}}
  - claude.md now holds only model-agnostic nudges (Todo discipline,
    Think before heavy, Dedicated tools over Bash)
  - Prevents Opus-4.7-specific guidance leaking onto Sonnet/Haiku
  - Uses existing {{INHERIT:claude}} mechanism at
    scripts/resolvers/model-overlay.ts:28-43

scripts/models.ts:
  - Add opus-4-7 to ALL_MODEL_NAMES
  - resolveModel: claude-opus-4-7-* variants route to opus-4-7,
    all other claude-* variants continue to route to claude

scripts/resolvers/utility.ts:
  - Update coAuthor trailer fallback: Opus 4.6 -> Opus 4.7
    (fallback was missed in the initial migration commit)

scripts/resolvers/preamble/generate-routing-injection.ts:
  - Align policy with new SKILL.md.tmpl: soft "when in doubt, invoke"
    instead of hard "ALWAYS invoke... Do NOT answer directly"
  - Replace stale /checkpoint reference with /context-save +
    /context-restore (skills were renamed in v1.0.1.0)
  - Expand route coverage to match full skill inventory:
    /plan-devex-review, /qa-only, /devex-review, /land-and-deploy,
    /setup-deploy, /canary, /open-gstack-browser,
    /setup-browser-cookies, /benchmark, /learn, /plan-tune, /health

scripts/resolvers/preamble/generate-voice-directive.ts:
  - Voice example closing: "Want me to ship it?" -> "Want me to fix it?"
  - Preserves directness while routing through review gates

SKILL.md.tmpl:
  - Add routing triggers for skills that were missing from the list:
    /plan-devex-review, /qa-only, /devex-review, /land-and-deploy,
    /setup-deploy, /canary, /open-gstack-browser,
    /setup-browser-cookies, /benchmark, /learn, /plan-tune, /health
  - Within Opus 4.7 overlay, added scope boundary to
    "Literal interpretation" nudge ("fix tests that this branch
    introduced or is responsible for")
  - Added pacing exception to "Batch your questions" nudge so skills
    that require one-question-at-a-time pacing still win

Follow-up commit will regenerate SKILL.md files + update goldens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-04-21 23:39:42 -07:00
parent 4d260db78a
commit da75ebaaa0
7 changed files with 81 additions and 42 deletions
+12
View File
@@ -36,12 +36,18 @@ quality gates that produce better results than answering inline.
- User asks to review architecture, lock in the plan, "does this design make sense" → invoke `/plan-eng-review`
- User asks about design system, brand, visual identity, "how should this look" → invoke `/design-consultation`
- User asks to review design of a plan → invoke `/plan-design-review`
- User asks about developer experience of a plan, API/CLI/SDK design → invoke `/plan-devex-review`
- User wants all reviews done automatically, "review everything" → invoke `/autoplan`
- User reports a bug, error, broken behavior, "why is this broken", "this doesn't work", "wtf", "something's wrong" → invoke `/investigate`
- User asks to test the site, find bugs, QA, "does this work", "check the deploy" → invoke `/qa`
- User asks to just report bugs without fixing → invoke `/qa-only`
- User asks to review code, check the diff, pre-landing review, "look at my changes" → invoke `/review`
- User asks about visual polish, design audit of a live site, "this looks off" → invoke `/design-review`
- User asks to audit the live developer experience, time-to-hello-world → invoke `/devex-review`
- User asks to ship, deploy, push, create a PR, "let's land this", "send it" → invoke `/ship`
- User asks to merge + deploy + verify as one flow → invoke `/land-and-deploy`
- User asks to configure deployment for the project → invoke `/setup-deploy`
- User asks to monitor prod after shipping, post-deploy checks → invoke `/canary`
- User asks to update docs after shipping → invoke `/document-release`
- User asks for a weekly retro, what did we ship, "how'd we do" → invoke `/retro`
- User asks for a second opinion, codex review → invoke `/codex`
@@ -52,6 +58,12 @@ quality gates that produce better results than answering inline.
- User asks to resume, restore, "where was I" → invoke `/context-restore`
- User asks about security, OWASP, vulnerabilities, "is this secure" → invoke `/cso`
- User asks to make a PDF, document, publication → invoke `/make-pdf`
- User asks to launch a real browser for QA, "open the browser" → invoke `/open-gstack-browser`
- User asks to import cookies for authenticated testing → invoke `/setup-browser-cookies`
- User asks about page speed, performance regression, benchmarks → invoke `/benchmark`
- User asks what gstack has learned, "show learnings" → invoke `/learn`
- User asks to tune question sensitivity, "stop asking me that" → invoke `/plan-tune`
- User asks for code quality dashboard, "health check" → invoke `/health`
**When in doubt, invoke the skill.** A false positive (invoking a skill that wasn't
needed) is cheaper than a false negative (answering ad-hoc when a structured workflow
-24
View File
@@ -8,27 +8,3 @@ the user course-correct cheaply instead of mid-flight.
**Dedicated tools over Bash.** Prefer Read, Edit, Write, Glob, Grep over shell
equivalents (cat, sed, find, grep). The dedicated tools are cheaper and clearer.
**Fan out explicitly.** Opus 4.7 defaults to sequential work and spawns fewer
subagents than 4.6. When a task has independent sub-problems (investigating multiple
files, testing multiple endpoints, auditing multiple components), explicitly parallelize:
spawn subagents in the same turn, run independent checks concurrently, don't serialize
work that has no dependencies. If you catch yourself doing A then B then C where none
depend on each other, stop and do all three at once.
**Effort-match the step.** Simple file reads, config checks, command lookups, and
mechanical edits don't need deep reasoning. Complete them quickly and move on. Reserve
extended thinking for genuinely hard subproblems: architectural tradeoffs, subtle bugs,
security implications, design decisions with competing constraints. Over-thinking
simple steps wastes tokens and time.
**Batch your questions.** If you need to clarify multiple things before proceeding,
ask all of them in a single AskUserQuestion turn. Do not drip-feed one question per
turn. Three questions in one message beats three back-and-forth exchanges.
**Literal interpretation awareness.** Opus 4.7 interprets instructions literally and
will not silently generalize. When the user says "fix the tests," fix ALL failing tests,
not just the first one. When the user says "update the docs," update every relevant doc,
not just the most obvious one. Read the full scope of what was asked and deliver the
full scope. If the request is ambiguous, ask once (batched with any other questions),
then execute completely.
+29
View File
@@ -0,0 +1,29 @@
{{INHERIT:claude}}
**Fan out explicitly.** Opus 4.7 defaults to sequential work and spawns fewer
subagents than 4.6. When a task has independent sub-problems (investigating multiple
files, testing multiple endpoints, auditing multiple components), explicitly parallelize:
spawn subagents in the same turn, run independent checks concurrently, don't serialize
work that has no dependencies. If you catch yourself doing A then B then C where none
depend on each other, stop and do all three at once.
**Effort-match the step.** Simple file reads, config checks, command lookups, and
mechanical edits don't need deep reasoning. Complete them quickly and move on. Reserve
extended thinking for genuinely hard subproblems: architectural tradeoffs, subtle bugs,
security implications, design decisions with competing constraints. Over-thinking
simple steps wastes tokens and time.
**Batch your questions.** If you need to clarify multiple things before proceeding,
ask all of them in a single AskUserQuestion turn. Do not drip-feed one question per
turn. Three questions in one message beats three back-and-forth exchanges. Exception:
skill workflows that explicitly require one-question-at-a-time pacing (e.g., plan
review skills with "STOP. AskUserQuestion once per issue. Do NOT batch.") override this
nudge. The skill wins on pacing, always.
**Literal interpretation awareness.** Opus 4.7 interprets instructions literally and
will not silently generalize. When the user says "fix the tests," fix all failing tests
that this branch introduced or is responsible for, not just the first one (and not
pre-existing failures in unrelated code). When the user says "update the docs," update
every relevant doc in scope, not just the most obvious one. Read the full scope of what
was asked and deliver the full scope. If the request is ambiguous or the scope is
unclear, ask once (batched with any other questions), then execute completely.
+2
View File
@@ -13,6 +13,7 @@
export const ALL_MODEL_NAMES = [
'claude',
'opus-4-7',
'gpt',
'gpt-5.4',
'gemini',
@@ -51,6 +52,7 @@ export function resolveModel(input: string): Model | null {
if (/^gpt-5\.4(-|$)/.test(s)) return 'gpt-5.4';
if (/^gpt(-|$)/.test(s)) return 'gpt';
if (/^o[0-9]+(-|$)/.test(s)) return 'o-series';
if (/^claude-opus-4-7(-|$)/.test(s)) return 'opus-4-7';
if (/^claude(-|$)/.test(s)) return 'claude';
if (/^gemini(-|$)/.test(s)) return 'gemini';
@@ -20,23 +20,44 @@ If A: Append this section to the end of CLAUDE.md:
## Skill routing
When the user's request matches an available skill, ALWAYS invoke it using the Skill
tool as your FIRST action. Do NOT answer directly, do NOT use other tools first.
The skill has specialized workflows that produce better results than ad-hoc answers.
When the user's request matches an available skill, invoke it via the Skill tool. The
skill has multi-step workflows, checklists, and quality gates that produce better
results than an ad-hoc answer. When in doubt, invoke the skill. A false positive is
cheaper than a false negative.
Key routing rules:
- Product ideas, "is this worth building", brainstorming → invoke office-hours
- Bugs, errors, "why is this broken", 500 errors → invoke investigate
- Ship, deploy, push, create PR → invoke ship
- QA, test the site, find bugs → invoke qa
- Code review, check my diff → invoke review
- Update docs after shipping → invoke document-release
- Weekly retro → invoke retro
- Design system, brand → invoke design-consultation
- Visual audit, design polish → invoke design-review
- Architecture review → invoke plan-eng-review
- Save progress, checkpoint, resume → invoke checkpoint
- Code quality, health check → invoke health
- Product ideas, "is this worth building", brainstorming → invoke /office-hours
- Strategy, scope, "think bigger", "what should we build" → invoke /plan-ceo-review
- Architecture, "does this design make sense" → invoke /plan-eng-review
- Design system, brand, "how should this look" → invoke /design-consultation
- Design review of a plan → invoke /plan-design-review
- Developer experience of a plan → invoke /plan-devex-review
- "Review everything", full review pipeline → invoke /autoplan
- Bugs, errors, "why is this broken", "wtf", "this doesn't work" → invoke /investigate
- Test the site, find bugs, "does this work" → invoke /qa (or /qa-only for report only)
- Code review, check the diff, "look at my changes" → invoke /review
- Visual polish, design audit, "this looks off" → invoke /design-review
- Developer experience audit, try onboarding → invoke /devex-review
- Ship, deploy, create a PR, "send it" → invoke /ship
- Merge + deploy + verify → invoke /land-and-deploy
- Configure deployment → invoke /setup-deploy
- Post-deploy monitoring → invoke /canary
- Update docs after shipping → invoke /document-release
- Weekly retro, "how'd we do" → invoke /retro
- Second opinion, codex review → invoke /codex
- Safety mode, careful mode, lock it down → invoke /careful or /guard
- Restrict edits to a directory → invoke /freeze or /unfreeze
- Upgrade gstack → invoke /gstack-upgrade
- Save progress, "save my work" → invoke /context-save
- Resume, restore, "where was I" → invoke /context-restore
- Security audit, OWASP, "is this secure" → invoke /cso
- Make a PDF, document, publication → invoke /make-pdf
- Launch real browser for QA → invoke /open-gstack-browser
- Import cookies for authenticated testing → invoke /setup-browser-cookies
- Performance regression, page speed, benchmarks → invoke /benchmark
- Review what gstack has learned → invoke /learn
- Tune question sensitivity → invoke /plan-tune
- Code quality dashboard → invoke /health
\`\`\`
Then commit the change: \`git add CLAUDE.md && git commit -m "chore: add gstack skill routing rules to CLAUDE.md"\`
@@ -46,4 +67,3 @@ Say "No problem. You can add routing rules later by running \`gstack-config set
This only happens once per project. If \`HAS_ROUTING\` is \`yes\` or \`ROUTING_DECLINED\` is \`true\`, skip this entirely.`;
}
@@ -56,7 +56,7 @@ Avoid filler, throat-clearing, generic optimism, founder cosplay, and unsupporte
- End with what to do. Give the action.
**Example of the right voice:**
"auth.ts:47 returns undefined when the session cookie expires. Your users hit a white screen. Fix: add a null check and redirect to /login. Two lines. Want me to ship it?"
"auth.ts:47 returns undefined when the session cookie expires. Your users hit a white screen. Fix: add a null check and redirect to /login. Two lines. Want me to fix it?"
Not: "I've identified a potential issue in the authentication flow that may cause problems for some users under certain conditions. Let me explain the approach I'd recommend..."
**Final test:** does this sound like a real cross-functional builder who wants to help someone make something people want, ship it, and make it actually work?`;
+1 -1
View File
@@ -369,7 +369,7 @@ Minimum 0 per category.
export function generateCoAuthorTrailer(ctx: TemplateContext): string {
const { getHostConfig } = require('../../hosts/index');
const hostConfig = getHostConfig(ctx.host);
return hostConfig.coAuthorTrailer || 'Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>';
return hostConfig.coAuthorTrailer || 'Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>';
}
export function generateChangelogWorkflow(_ctx: TemplateContext): string {