From da75ebaaa02c247aa035db6b860043a8650c13c4 Mon Sep 17 00:00:00 2001
From: Garry Tan <garrytan@gmail.com>
Date: Tue, 21 Apr 2026 23:39:42 -0700
Subject: [PATCH] refactor(opus-4.7): split overlay, align routing, fix trailer
 fallback

Follow-up to wintermute's initial Opus 4.7 migration commit (addresses
ship-quality review findings before v1.6.1.0 release).

Overlay split (model-overlays/):
  - Move 4 Opus-4.7-specific nudges (Fan out, Effort-match, Batch your
    questions, Literal interpretation) from claude.md into new
    opus-4-7.md with {{INHERIT:claude}}
  - claude.md now holds only model-agnostic nudges (Todo discipline,
    Think before heavy, Dedicated tools over Bash)
  - Prevents Opus-4.7-specific guidance from leaking onto Sonnet/Haiku
  - Uses existing {{INHERIT:claude}} mechanism at
    scripts/resolvers/model-overlay.ts:28-43

scripts/models.ts:
  - Add opus-4-7 to ALL_MODEL_NAMES
  - resolveModel: claude-opus-4-7 and claude-opus-4-7-* variants route
    to opus-4-7; all other claude-* variants continue to route to claude
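
The routing depends on check order: the opus-4-7 pattern has to run before
the generic claude pattern, otherwise the generic one matches first. A
minimal sketch of that ordering, assuming lower-cased, trimmed input;
resolveClaudeFamily and the dated model ID below are hypothetical names used
only for illustration, not the real resolveModel:

```typescript
// Illustrative sketch only. It mirrors the order of the two claude checks
// added to resolveModel; the real function covers more model families.
type ClaudeFamily = 'claude' | 'opus-4-7';

function resolveClaudeFamily(input: string): ClaudeFamily | null {
  // Assumption: input is normalized like this before matching.
  const s = input.trim().toLowerCase();

  // Specific pattern first: matches "claude-opus-4-7" exactly, or followed
  // by "-<suffix>". The (-|$) guard rejects "claude-opus-4-70".
  if (/^claude-opus-4-7(-|$)/.test(s)) return 'opus-4-7';

  // Generic fallback: every other claude or claude-* identifier.
  if (/^claude(-|$)/.test(s)) return 'claude';

  return null;
}

// Hypothetical identifiers, for illustration only.
const samples = [
  'claude-opus-4-7',
  'claude-opus-4-7-20260101',
  'claude-opus-4-70',
  'claude-sonnet-4-5',
  'gemini-pro',
];
for (const id of samples) {
  console.log(id, '->', resolveClaudeFamily(id));
}
```

Swapping the two checks would send every claude-opus-4-7 variant to the
generic claude overlay, which is the leak the overlay split is meant to
prevent.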

scripts/resolvers/utility.ts:
  - Update coAuthor trailer fallback: Opus 4.6 -> Opus 4.7
    (fallback was missed in the initial migration commit)

scripts/resolvers/preamble/generate-routing-injection.ts:
  - Align policy with new SKILL.md.tmpl: soft "when in doubt, invoke"
    instead of hard "ALWAYS invoke... Do NOT answer directly"
  - Replace stale /checkpoint reference with /context-save +
    /context-restore (skills were renamed in v1.0.1.0)
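  - Expand route coverage to match full skill inventory. New routes:
    /plan-ceo-review, /plan-design-review, /plan-devex-review,
    /autoplan, /qa-only, /devex-review, /land-and-deploy,
    /setup-deploy, /canary, /codex, /careful, /guard, /freeze,
    /unfreeze, /gstack-upgrade, /cso, /make-pdf, /open-gstack-browser,
    /setup-browser-cookies, /benchmark, /learn, /plan-tune
    (/health was already routed)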

scripts/resolvers/preamble/generate-voice-directive.ts:
  - Voice example closing: "Want me to ship it?" -> "Want me to fix it?"
  - Preserves directness while routing through review gates

SKILL.md.tmpl:
  - Add routing triggers for skills that were missing from the list:
    /plan-devex-review, /qa-only, /devex-review, /land-and-deploy,
    /setup-deploy, /canary, /open-gstack-browser,
    /setup-browser-cookies, /benchmark, /learn, /plan-tune, /health

model-overlays/opus-4-7.md:
  - Add scope boundary to "Literal interpretation" nudge ("fix all
    failing tests that this branch introduced or is responsible for")
  - Add pacing exception to "Batch your questions" nudge so skills
    that require one-question-at-a-time pacing still win

Follow-up commit will regenerate SKILL.md files + update goldens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 SKILL.md.tmpl                                 | 12 +++++
 model-overlays/claude.md                      | 24 ---------
 model-overlays/opus-4-7.md                    | 29 +++++++++++
 scripts/models.ts                             |  2 +
 .../preamble/generate-routing-injection.ts    | 52 +++++++++++++------
 .../preamble/generate-voice-directive.ts      |  2 +-
 scripts/resolvers/utility.ts                  |  2 +-
 7 files changed, 81 insertions(+), 42 deletions(-)
 create mode 100644 model-overlays/opus-4-7.md

diff --git a/SKILL.md.tmpl b/SKILL.md.tmpl
index 6936089a..a248cbfa 100644
--- a/SKILL.md.tmpl
+++ b/SKILL.md.tmpl
@@ -36,12 +36,18 @@ quality gates that produce better results than answering inline.
 - User asks to review architecture, lock in the plan, "does this design make sense" → invoke `/plan-eng-review`
 - User asks about design system, brand, visual identity, "how should this look" → invoke `/design-consultation`
 - User asks to review design of a plan → invoke `/plan-design-review`
+- User asks about developer experience of a plan, API/CLI/SDK design → invoke `/plan-devex-review`
 - User wants all reviews done automatically, "review everything" → invoke `/autoplan`
 - User reports a bug, error, broken behavior, "why is this broken", "this doesn't work", "wtf", "something's wrong" → invoke `/investigate`
 - User asks to test the site, find bugs, QA, "does this work", "check the deploy" → invoke `/qa`
+- User asks to just report bugs without fixing → invoke `/qa-only`
 - User asks to review code, check the diff, pre-landing review, "look at my changes" → invoke `/review`
 - User asks about visual polish, design audit of a live site, "this looks off" → invoke `/design-review`
+- User asks to audit the live developer experience, time-to-hello-world → invoke `/devex-review`
 - User asks to ship, deploy, push, create a PR, "let's land this", "send it" → invoke `/ship`
+- User asks to merge + deploy + verify as one flow → invoke `/land-and-deploy`
+- User asks to configure deployment for the project → invoke `/setup-deploy`
+- User asks to monitor prod after shipping, post-deploy checks → invoke `/canary`
 - User asks to update docs after shipping → invoke `/document-release`
 - User asks for a weekly retro, what did we ship, "how'd we do" → invoke `/retro`
 - User asks for a second opinion, codex review → invoke `/codex`
@@ -52,6 +58,12 @@ quality gates that produce better results than answering inline.
 - User asks to resume, restore, "where was I" → invoke `/context-restore`
 - User asks about security, OWASP, vulnerabilities, "is this secure" → invoke `/cso`
 - User asks to make a PDF, document, publication → invoke `/make-pdf`
+- User asks to launch a real browser for QA, "open the browser" → invoke `/open-gstack-browser`
+- User asks to import cookies for authenticated testing → invoke `/setup-browser-cookies`
+- User asks about page speed, performance regression, benchmarks → invoke `/benchmark`
+- User asks what gstack has learned, "show learnings" → invoke `/learn`
+- User asks to tune question sensitivity, "stop asking me that" → invoke `/plan-tune`
+- User asks for code quality dashboard, "health check" → invoke `/health`
 
 **When in doubt, invoke the skill.** A false positive (invoking a skill that wasn't
 needed) is cheaper than a false negative (answering ad-hoc when a structured workflow
diff --git a/model-overlays/claude.md b/model-overlays/claude.md
index 7264f8b8..95943af5 100644
--- a/model-overlays/claude.md
+++ b/model-overlays/claude.md
@@ -8,27 +8,3 @@ the user course-correct cheaply instead of mid-flight.
 
 **Dedicated tools over Bash.** Prefer Read, Edit, Write, Glob, Grep over shell
 equivalents (cat, sed, find, grep). The dedicated tools are cheaper and clearer.
-
-**Fan out explicitly.** Opus 4.7 defaults to sequential work and spawns fewer
-subagents than 4.6. When a task has independent sub-problems (investigating multiple
-files, testing multiple endpoints, auditing multiple components), explicitly parallelize:
-spawn subagents in the same turn, run independent checks concurrently, don't serialize
-work that has no dependencies. If you catch yourself doing A then B then C where none
-depend on each other, stop and do all three at once.
-
-**Effort-match the step.** Simple file reads, config checks, command lookups, and
-mechanical edits don't need deep reasoning. Complete them quickly and move on. Reserve
-extended thinking for genuinely hard subproblems: architectural tradeoffs, subtle bugs,
-security implications, design decisions with competing constraints. Over-thinking
-simple steps wastes tokens and time.
-
-**Batch your questions.** If you need to clarify multiple things before proceeding,
-ask all of them in a single AskUserQuestion turn. Do not drip-feed one question per
-turn. Three questions in one message beats three back-and-forth exchanges.
-
-**Literal interpretation awareness.** Opus 4.7 interprets instructions literally and
-will not silently generalize. When the user says "fix the tests," fix ALL failing tests,
-not just the first one. When the user says "update the docs," update every relevant doc,
-not just the most obvious one. Read the full scope of what was asked and deliver the
-full scope. If the request is ambiguous, ask once (batched with any other questions),
-then execute completely.
diff --git a/model-overlays/opus-4-7.md b/model-overlays/opus-4-7.md
new file mode 100644
index 00000000..164ed6a3
--- /dev/null
+++ b/model-overlays/opus-4-7.md
@@ -0,0 +1,29 @@
+{{INHERIT:claude}}
+
+**Fan out explicitly.** Opus 4.7 defaults to sequential work and spawns fewer
+subagents than 4.6. When a task has independent sub-problems (investigating multiple
+files, testing multiple endpoints, auditing multiple components), explicitly parallelize:
+spawn subagents in the same turn, run independent checks concurrently, don't serialize
+work that has no dependencies. If you catch yourself doing A then B then C where none
+depend on each other, stop and do all three at once.
+
+**Effort-match the step.** Simple file reads, config checks, command lookups, and
+mechanical edits don't need deep reasoning. Complete them quickly and move on. Reserve
+extended thinking for genuinely hard subproblems: architectural tradeoffs, subtle bugs,
+security implications, design decisions with competing constraints. Over-thinking
+simple steps wastes tokens and time.
+
+**Batch your questions.** If you need to clarify multiple things before proceeding,
+ask all of them in a single AskUserQuestion turn. Do not drip-feed one question per
+turn. Three questions in one message beats three back-and-forth exchanges. Exception:
+skill workflows that explicitly require one-question-at-a-time pacing (e.g., plan
+review skills with "STOP. AskUserQuestion once per issue. Do NOT batch.") override this
+nudge. The skill wins on pacing, always.
+
+**Literal interpretation awareness.** Opus 4.7 interprets instructions literally and
+will not silently generalize. When the user says "fix the tests," fix all failing tests
+that this branch introduced or is responsible for, not just the first one (and not
+pre-existing failures in unrelated code). When the user says "update the docs," update
+every relevant doc in scope, not just the most obvious one. Read the full scope of what
+was asked and deliver the full scope. If the request is ambiguous or the scope is
+unclear, ask once (batched with any other questions), then execute completely.
diff --git a/scripts/models.ts b/scripts/models.ts
index b84608f6..b6d1d368 100644
--- a/scripts/models.ts
+++ b/scripts/models.ts
@@ -13,6 +13,7 @@
 
 export const ALL_MODEL_NAMES = [
   'claude',
+  'opus-4-7',
   'gpt',
   'gpt-5.4',
   'gemini',
@@ -51,6 +52,7 @@ export function resolveModel(input: string): Model | null {
   if (/^gpt-5\.4(-|$)/.test(s)) return 'gpt-5.4';
   if (/^gpt(-|$)/.test(s)) return 'gpt';
   if (/^o[0-9]+(-|$)/.test(s)) return 'o-series';
+  if (/^claude-opus-4-7(-|$)/.test(s)) return 'opus-4-7';
   if (/^claude(-|$)/.test(s)) return 'claude';
   if (/^gemini(-|$)/.test(s)) return 'gemini';
 
diff --git a/scripts/resolvers/preamble/generate-routing-injection.ts b/scripts/resolvers/preamble/generate-routing-injection.ts
index 1c05c284..0768a307 100644
--- a/scripts/resolvers/preamble/generate-routing-injection.ts
+++ b/scripts/resolvers/preamble/generate-routing-injection.ts
@@ -20,23 +20,44 @@ If A: Append this section to the end of CLAUDE.md:
 
 ## Skill routing
 
-When the user's request matches an available skill, ALWAYS invoke it using the Skill
-tool as your FIRST action. Do NOT answer directly, do NOT use other tools first.
-The skill has specialized workflows that produce better results than ad-hoc answers.
+When the user's request matches an available skill, invoke it via the Skill tool. The
+skill has multi-step workflows, checklists, and quality gates that produce better
+results than an ad-hoc answer. When in doubt, invoke the skill. A false positive is
+cheaper than a false negative.
 
 Key routing rules:
-- Product ideas, "is this worth building", brainstorming → invoke office-hours
-- Bugs, errors, "why is this broken", 500 errors → invoke investigate
-- Ship, deploy, push, create PR → invoke ship
-- QA, test the site, find bugs → invoke qa
-- Code review, check my diff → invoke review
-- Update docs after shipping → invoke document-release
-- Weekly retro → invoke retro
-- Design system, brand → invoke design-consultation
-- Visual audit, design polish → invoke design-review
-- Architecture review → invoke plan-eng-review
-- Save progress, checkpoint, resume → invoke checkpoint
-- Code quality, health check → invoke health
+- Product ideas, "is this worth building", brainstorming → invoke /office-hours
+- Strategy, scope, "think bigger", "what should we build" → invoke /plan-ceo-review
+- Architecture, "does this design make sense" → invoke /plan-eng-review
+- Design system, brand, "how should this look" → invoke /design-consultation
+- Design review of a plan → invoke /plan-design-review
+- Developer experience of a plan → invoke /plan-devex-review
+- "Review everything", full review pipeline → invoke /autoplan
+- Bugs, errors, "why is this broken", "wtf", "this doesn't work" → invoke /investigate
+- Test the site, find bugs, "does this work" → invoke /qa (or /qa-only for report only)
+- Code review, check the diff, "look at my changes" → invoke /review
+- Visual polish, design audit, "this looks off" → invoke /design-review
+- Developer experience audit, try onboarding → invoke /devex-review
+- Ship, deploy, create a PR, "send it" → invoke /ship
+- Merge + deploy + verify → invoke /land-and-deploy
+- Configure deployment → invoke /setup-deploy
+- Post-deploy monitoring → invoke /canary
+- Update docs after shipping → invoke /document-release
+- Weekly retro, "how'd we do" → invoke /retro
+- Second opinion, codex review → invoke /codex
+- Safety mode, careful mode, lock it down → invoke /careful or /guard
+- Restrict edits to a directory → invoke /freeze or /unfreeze
+- Upgrade gstack → invoke /gstack-upgrade
+- Save progress, "save my work" → invoke /context-save
+- Resume, restore, "where was I" → invoke /context-restore
+- Security audit, OWASP, "is this secure" → invoke /cso
+- Make a PDF, document, publication → invoke /make-pdf
+- Launch real browser for QA → invoke /open-gstack-browser
+- Import cookies for authenticated testing → invoke /setup-browser-cookies
+- Performance regression, page speed, benchmarks → invoke /benchmark
+- Review what gstack has learned → invoke /learn
+- Tune question sensitivity → invoke /plan-tune
+- Code quality dashboard → invoke /health
 \`\`\`
 
 Then commit the change: \`git add CLAUDE.md && git commit -m "chore: add gstack skill routing rules to CLAUDE.md"\`
@@ -46,4 +67,3 @@ Say "No problem. You can add routing rules later by running \`gstack-config set
 
 This only happens once per project. If \`HAS_ROUTING\` is \`yes\` or \`ROUTING_DECLINED\` is \`true\`, skip this entirely.`;
 }
-
diff --git a/scripts/resolvers/preamble/generate-voice-directive.ts b/scripts/resolvers/preamble/generate-voice-directive.ts
index 539c8d3d..a175c08f 100644
--- a/scripts/resolvers/preamble/generate-voice-directive.ts
+++ b/scripts/resolvers/preamble/generate-voice-directive.ts
@@ -56,7 +56,7 @@ Avoid filler, throat-clearing, generic optimism, founder cosplay, and unsupporte
 - End with what to do. Give the action.
 
 **Example of the right voice:**
-"auth.ts:47 returns undefined when the session cookie expires. Your users hit a white screen. Fix: add a null check and redirect to /login. Two lines. Want me to ship it?"
+"auth.ts:47 returns undefined when the session cookie expires. Your users hit a white screen. Fix: add a null check and redirect to /login. Two lines. Want me to fix it?"
 Not: "I've identified a potential issue in the authentication flow that may cause problems for some users under certain conditions. Let me explain the approach I'd recommend..."
 
 **Final test:** does this sound like a real cross-functional builder who wants to help someone make something people want, ship it, and make it actually work?`;
diff --git a/scripts/resolvers/utility.ts b/scripts/resolvers/utility.ts
index 83934b07..3d2e368a 100644
--- a/scripts/resolvers/utility.ts
+++ b/scripts/resolvers/utility.ts
@@ -369,7 +369,7 @@ Minimum 0 per category.
 export function generateCoAuthorTrailer(ctx: TemplateContext): string {
   const { getHostConfig } = require('../../hosts/index');
   const hostConfig = getHostConfig(ctx.host);
-  return hostConfig.coAuthorTrailer || 'Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>';
+  return hostConfig.coAuthorTrailer || 'Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>';
 }
 
 export function generateChangelogWorkflow(_ctx: TemplateContext): string {