From 6b99df9df7127de38ec676f8cf8cec8a359837f4 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Thu, 23 Apr 2026 16:41:36 -0700 Subject: [PATCH] feat(preamble): upgrade AskUserQuestion format to Pros/Cons decision brief MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Part 4 of 4 (plan: ~/.claude/plans/system-instruction-you-are-working-polymorphic-twilight.md). Every AskUserQuestion now renders as a decision brief, not a bullet list: D-numbered header, ELI10, Stakes-if-we-pick-wrong, Recommendation, Pros/Cons with ✅/❌ markers per option, closing Net: tradeoff synthesis. scripts/resolvers/preamble/generate-ask-user-format.ts - Full rewrite. Preserves prior rules (Re-ground, ELI10, Recommend, Completeness, Options) and adds: - D-numbering per skill invocation (model-level, not runtime state) - Stakes line (pain avoided / capability unlocked / consequence named) - Pros/Cons block with min 2 ✅ + 1 ❌ per option, min 40 chars/bullet - Hard-stop escape: "✅ No cons — this is a hard-stop choice" for genuine one-sided choices (destructive-action confirmations) - Neutral-posture handling (CT1-compliant): (recommended) label STAYS on default option to preserve AUTO_DECIDE contract; neutrality expressed as prose in Recommendation line only - Net line closes the decision with a one-sentence tradeoff frame - Rule 11: tool_use mandate (prose "Question:" blocks don't count) - Self-check list before emitting test/skill-validation.test.ts - Update format assertions to check for new Pros/Cons tokens (Pros / cons:, Recommendation: , Net:, ELI10, Stakes if we pick wrong:, ✅, ❌) across all tier-2+ skills - Old "RECOMMENDATION: Choose" expectation removed (the new format uses mixed-case "Recommendation:" with no literal "Choose") test/skill-e2e-plan-format.test.ts - Add v1.7.0.0 format token regexes (PROS_CONS_HEADER_RE, PRO_BULLET_RE, CON_BULLET_RE, NET_LINE_RE, D_NUMBER_RE, STAKES_RE) - Existing RECOMMENDATION_RE loosened to accept mixed-case "Recommendation:" (canonical v1.7.0.0 form) alongside all-caps (legacy). Tests are additive — the strict new-format gate is the upcoming cadence eval. Regenerated 30 SKILL.md files via bun run gen:skill-docs. Verified: - bun test: 319 pass (1 pre-existing security-bench fixture oversize failure on main, unrelated — confirmed via git stash test on main HEAD) - New format tokens render in all tier-2+ skills (plan-ceo-review, plan-eng-review, ship, office-hours verified) Co-Authored-By: Claude Opus 4.7 (1M context) --- autoplan/SKILL.md | 131 +++++++++++++++-- canary/SKILL.md | 131 +++++++++++++++-- codex/SKILL.md | 131 +++++++++++++++-- context-restore/SKILL.md | 131 +++++++++++++++-- context-save/SKILL.md | 131 +++++++++++++++-- cso/SKILL.md | 131 +++++++++++++++-- design-consultation/SKILL.md | 131 +++++++++++++++-- design-html/SKILL.md | 131 +++++++++++++++-- design-review/SKILL.md | 131 +++++++++++++++-- design-shotgun/SKILL.md | 131 +++++++++++++++-- devex-review/SKILL.md | 131 +++++++++++++++-- document-release/SKILL.md | 131 +++++++++++++++-- health/SKILL.md | 131 +++++++++++++++-- investigate/SKILL.md | 131 +++++++++++++++-- land-and-deploy/SKILL.md | 131 +++++++++++++++-- learn/SKILL.md | 131 +++++++++++++++-- office-hours/SKILL.md | 131 +++++++++++++++-- open-gstack-browser/SKILL.md | 131 +++++++++++++++-- pair-agent/SKILL.md | 131 +++++++++++++++-- plan-ceo-review/SKILL.md | 131 +++++++++++++++-- plan-design-review/SKILL.md | 131 +++++++++++++++-- plan-devex-review/SKILL.md | 131 +++++++++++++++-- plan-eng-review/SKILL.md | 131 +++++++++++++++-- plan-tune/SKILL.md | 131 +++++++++++++++-- qa-only/SKILL.md | 131 +++++++++++++++-- qa/SKILL.md | 131 +++++++++++++++-- retro/SKILL.md | 131 +++++++++++++++-- review/SKILL.md | 131 +++++++++++++++-- .../preamble/generate-ask-user-format.ts | 132 ++++++++++++++++-- setup-deploy/SKILL.md | 131 +++++++++++++++-- ship/SKILL.md | 131 +++++++++++++++-- test/skill-e2e-plan-format.test.ts | 17 ++- test/skill-validation.test.ts | 15 +- 33 files changed, 3842 insertions(+), 252 deletions(-) diff --git a/autoplan/SKILL.md b/autoplan/SKILL.md index f8fe68c8..64ce55b8 100644 --- a/autoplan/SKILL.md +++ b/autoplan/SKILL.md @@ -360,17 +360,132 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions: ## AskUserQuestion Format -**ALWAYS follow this structure for every AskUserQuestion call. All four elements are non-skippable. If you find yourself about to skip any of them, stop and back up.** +**ALWAYS follow this structure for every AskUserQuestion call. Every element is non-skippable. If you find yourself about to skip any of them, stop and back up.** -1. **Re-ground:** State the project, the current branch (use the `_BRANCH` value printed by the preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences) -2. **Simplify (ELI10, ALWAYS):** Explain what's happening in plain English a smart 16-year-old could follow. Concrete examples and analogies, not function names or internal jargon. Say what it DOES, not what it's called. State the stakes: what breaks if we pick wrong. This is NOT optional verbosity and it is NOT preamble — the user is about to make a decision and needs context. Even if you'd normally stay terse, emit the ELI10 paragraph. The user will ask for it anyway; do it the first time. -3. **Recommend (ALWAYS):** Every question ends with `RECOMMENDATION: Choose [X] because [one-line reason]` on its own line. Never omit it. Never collapse it into the options list. Required for every AskUserQuestion, regardless of whether the options are coverage-differentiated or different-in-kind. -4. **Score completeness (when meaningful):** When options differ in coverage (e.g. full test coverage vs happy path vs shortcut, complete error handling vs partial), score each with `Completeness: N/10` on its own line. Calibration: 10 = complete (all edge cases, full coverage), 7 = happy path only, 3 = shortcut. Flag any option ≤5 where a higher-completeness option exists. When options differ in kind (picking a review posture, picking an architectural approach, cherry-pick Add/Defer/Skip, choosing between two different kinds of systems), the completeness axis doesn't apply — skip `Completeness: N/10` entirely and write one line: `Note: options differ in kind, not coverage — no completeness score.` Do not fabricate filler scores. -5. **Options:** Lettered options: `A) ... B) ... C) ...` — when an option involves effort, show both scales: `(human: ~X / CC: ~Y)` +### Required shape -Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex. +Every AskUserQuestion reads like a decision brief, not a bullet list: -Per-skill instructions may add additional formatting rules on top of this baseline. +``` +D + +ELI10: + +Stakes if we pick wrong: + +Recommendation: because + +Completeness: A=X/10, B=Y/10 (or: Note: options differ in kind, not coverage — no completeness score) + +Pros / cons: + +A)