* feat(preamble): add "Handling 5+ options — split, never drop" rule Agents repeatedly hit Conductor's 4-option AskUserQuestion cap and silently drop one option to fit, shrinking the user's decision space. This rule names the bug and gives two compliant shapes: batch into ≤4-groups (for coherent alternatives) or split into N sequential per-option calls (for independent scope items, default). Inline preamble subsection is ~15 lines (rule + buckets + pointer). Full reference with worked examples, Hold/dependency semantics, and final-summary validation lives in docs/askuserquestion-split.md. The agent loads the docs file on demand when N>4. Per-option call shape: D<N>.k header, ELI10, Recommendation, kind-note (no completeness score — decision actions, not coverage), Include / Defer / Cut / Hold buckets. Hold stops the chain immediately; the final D<N>.final call validates dependencies and confirms the assembled scope. question_ids: <skill>-split-<option-slug> (kebab-case ASCII, ≤64 chars). Also fixes orphan "12. " prefix on the existing CJK rule. Tier-2+ skills inherit via the existing resolver. SKILL.md regenerated for all 41 affected skills + 3 golden fixtures. Net diff per SKILL.md: ~34 lines (vs ~110 for the full inline version). 6 tests pin the inline contract (4-option cap, buckets, D-numbering, docs pointer, runtime AUTO_DECIDE gate reference, orphan 12 regression). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(question-pref): runtime AUTO_DECIDE carve-out for *-split-* ids Split chains (per-option AskUserQuestion calls emitted by the new "Handling 5+ options" rule) must never be silently auto-approved via /plan-tune preferences. The user's option set is sacred. Layer 1 (mechanism): unique <skill>-split-<option-slug> ids prevent cross-option preference leakage. Layer 2 (this commit): the runtime checker `gstack-question-preference --check` detects any id matching *-split-* and forces ASK_NORMALLY even when never-ask or ask-only-for-one-way preferences exist for that exact id. An explanatory note tells the user their preference was bypassed and why. 7 tests pin the carve-out: no-pref baseline, never-ask override, explanatory note text, ask-only-for-one-way override, always-ask (no note), non-split id containing "split" word (negative case for regex specificity), multi-skill split id formats. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(e2e): split-overflow regression for /plan-ceo-review Periodic-tier E2E test that catches the original failure mode the user complained about: 5+ options for ONE decision must split into N sequential AskUserQuestion calls, not drop one to fit Conductor's 4-option cap. Fixture: 5 independent chat-platform integration candidates (Slack/Discord/Teams/Telegram/Mattermost), each carrying its own include/defer/cut decision. Floor = 4 review-phase AUQs (standard [N-1] tolerance band). Pre-fix "drop to 4 + 1 dropped" fails this floor. Wired into test/helpers/touchfiles.ts: tier periodic, depends on plan-ceo-review/**, the new preamble subsection, the question-pref binary (for the carve-out), and the runner helper. touchfiles.test.ts expected count bumped 21 → 22 to account for the new entry. Cost: ~$0.30/run when EVALS_TIER=periodic. Skips silently otherwise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: post-merge regen + rebase size-budget baseline to v1.47.0.0 After merging origin/main (v1.45 → v1.47), three things needed cleanup: 1. spec/SKILL.md (main's new skill) regenerated to include our split-vs-drop preamble subsection — same mechanical regen as the other 41 tier-2+ skills. 2. Three golden ship fixtures refreshed to capture main's GSTACK_PLAN_MODE block + /spec routing entry + jargon-list.json refactor. 3. docs/skills.md — added /spec table row that main's PR (#1698/#1733) shipped without. Pre-existing failure on main; this PR catches and fixes. Also rebased test/skill-size-budget.test.ts from v1.44.1 → v1.47.0.0 baseline. Main's v1.46 (catalog tokens trim) + v1.47 (/spec skill) pushed the v1.44.1 anchor past the 5% ratchet to ×1.059 — pre-existing failure on main. This PR captures a fresh parity-baseline-v1.47.0.0.json and re-anchors the test there. Historical v1.44.1.json and v1.46.0.0.json retained in test/fixtures/ for reference. Our subsection contributes ~0.1% of the post-rebase corpus. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.48.0.0) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8.5 KiB
AskUserQuestion split rule — full reference
Inline summary lives in the canonical preamble (scripts/resolvers/preamble/generate-ask-user-format.ts).
That subsection is intentionally compressed because it injects into every
tier-2+ skill's SKILL.md. This file is the deep reference the inline
guidance points to — load it when N>4 options come up and you need
worked examples or the full Hold / dependency / final-summary semantics.
The bug this prevents
Pre-rule failure mode (transcript verbatim from the user complaint that motivated this):
"I'm hitting Conductor's limit of 4 options in the AUQ, so I need to cut one. E4 (the detect-mappings codegen) is the biggest lift and probably beyond scope for v0.42 anyway — users can hand-author their mapping rules for the 9 clusters. I'll drop that and keep E1, E2, E3, and E5..."
"Conductor caps at 4 options. Trimming: E4 (detect-mappings codegen) is the largest-effort item and a natural v0.43+ follow-up — moving it to TODOS.md without asking. Re-firing with 4."
The agent unilaterally cut a real option without user input. The option set is the user's decision space; shrinking it silently is the bug.
Which shape: batched vs. split
Two compliant shapes. Pick by reading the options:
-
Batched into ≤4-groups — the options are coherent alternatives, one will be picked. Examples: "major / minor / patch / micro" for a version bump, "5 layout variants where the user picks one", "which framework: rspec / minitest / cucumber / none". Batch the top 4 into one AskUserQuestion; surface the 5th as a follow-up if none of the first 4 fit. This is the lower-friction path when applicable.
-
Split per-option — the options are independent scope items, each carrying its own include/defer/cut decision. Examples: "E1..E6, which do we ship?", "5 candidate integrations for Q3", "8 TODOs surfaced by the audit — which do we land?". Fire N sequential AskUserQuestion calls, one per option.
Default to split per-option when unsure. Batching wrong options together — shoehorning orthogonal scope items into one question — is the same failure mode as dropping.
Split per-option mechanics
Before the chain
Check for dependencies between options. If E3 requires E1, or E5 conflicts with E2, surface that in the per-option ELI10:
"Cutting this orphans E3 — they're linked."
Without dependency surfacing, the chain produces incoherent picked sets (user picks Include for E3 + Cut for E1, ships an unbuildable scope).
D-numbering
- Parent decision:
D<N>where N is the global question counter. - Each per-option call:
D<N>.kfor k=1..K children. - Final summary:
D<N>.final. - Single-option revise:
D<N>.revise-<k>.
Example chain for 5 options at parent D3:
D3.1 → D3.2 → D3.3 → D3.4 → D3.5 → D3.final
Per-option call shape
For each option Eₖ, fire an AskUserQuestion with:
D<N>.kheader (e.g. D3.1, D3.2 ... D3.5)- ELI10 of just this option's scope, cost, and any dependency it carries
- Recommendation: Include / Defer / Cut, with concrete reason
- 4 buckets per option:
- A) Include in this scope (recommended/not)
- B) Defer to follow-up (TODOs / next version)
- C) Cut entirely
- D) Hold — stop the chain, discuss before deciding
- Note: options differ in kind, not coverage — no completeness score.
(Include/Defer/Cut/Hold are decision actions, so the existing format
rule applies: omit
Completeness: N/10and use the kind-note instead.)
Hold means stop, not queue
When the user picks Hold on any per-option call, stop the chain immediately. Do not continue asking later options behind the Hold — the user wants to discuss the picked option first. After discussion, the user resumes by saying "continue" or naming the next option to ask about.
Wrong behavior: queue E4 and E5 behind a Hold on E3, then fire them later with stale context. Right behavior: stop, let the user reset the parent decision, resume from where they left off.
Final summary
After the chain resolves (without Hold), fire D<N>.final to confirm
and validate the assembled set.
Step 1 — validate dependencies. If the picked set is incoherent (e.g. E3 picked Include but its required E1 was Cut), do NOT silently accept. Re-prompt the conflict as a single AskUserQuestion:
"E3 needs E1 but you cut E1. Revise: A) keep E1 B) cut E3 too C) leave as-is and accept the broken state"
Step 2 — confirm the assembled set. If coherent:
"Here's the assembled set: E1, E2, E5. Ship this scope? A) Ship this scope (recommended) B) Revise one option (you pick which) C) Cut more"
Step 3 — targeted revise. If the user picks B, ask which option to
revise, then fire ONE per-option AskUserQuestion at D<N>.revise-<k>
to update just that option. Do not re-run the whole chain.
Sizing rules
-
N ≤ 4: use the normal single AskUserQuestion form. Don't split.
-
N = 5 or 6: split (or batch if a clean grouping exists).
-
N > 6: BEFORE the chain, fire a meta-AskUserQuestion at
D<N>.0:"About to ask N per-option questions. Options: A) Proceed with the full split (recommended only if every option is independent) B) Narrow scope first — I'll propose a smaller set C) Batch into groups of 4 instead"
This is itself an AskUserQuestion tool call, not prose — it counts as the first prompt in the chain, not a violation of the "tool not prose" rule.
question_id rules for split chains
Each per-option AskUserQuestion emits a unique question_id of the
form <skill>-split-<option-slug> where <option-slug> is the option's
key kebab-cased (lowercase, hyphens, ASCII only).
Examples:
plan-ceo-review-split-e4-detect-mappingsship-split-rspecplan-eng-review-split-add-coverage-test
Collision handling. If two options would produce the same slug,
suffix with -2, -3, etc.
Length. Total length must be ≤64 chars (validated by
bin/gstack-question-preference --write). Truncate the option slug if
needed, preserving the <skill>-split- prefix.
AUTO_DECIDE behavior with split chains
Two-layer defense.
Layer 1 — mechanism. Each per-option question_id is unique to its
option, so preferences set on one option's id cannot leak across the
chain. A never-ask on ship-split-rspec does not silently approve
ship-split-minitest.
Layer 2 — runtime enforcement. bin/gstack-question-preference --check detects any id matching *-split-* (the canonical slug pattern
emitted by split chains) and forces ASK_NORMALLY even when a
never-ask or ask-only-for-one-way preference exists for that exact
id. The check emits an explanatory note when this override fires:
"split-chain per-option calls always ASK_NORMALLY; your never-ask preference does not apply to options inside a sequential split."
Result. Split-chain per-option calls are NEVER AUTO_DECIDE-eligible. This is a runtime contract, not just collision-resistance by id uniqueness. The user's option set is sacred — restoring user sovereignty over the decision space is the entire point of splitting.
Interaction with per-skill rules
This rule overrides any per-skill "batch decisions" guidance.
Per-skill templates that explicitly require one-issue-per-call (e.g.
plan-eng-review) are already compatible — they're a stricter special
case of this rule.
Worked example: 5 platform integrations
Fixture used by test/skill-e2e-plan-ceo-split-overflow.test.ts. A plan
has 5 independent chat-platform candidates:
- E1) Slack DM bot (~2 weeks, ~40% of asks)
- E2) Discord guild bot (~3 weeks, ~15%)
- E3) Microsoft Teams (~4 weeks, ~5%)
- E4) Telegram (~1 week, ~8%)
- E5) Mattermost (~2 weeks, ~3%)
User wants individual decisions per candidate, not a bundled pick. The agent should:
- Recognize this is a 5-option independent-scope decision → split.
- Check dependencies (none here — each platform is standalone).
- Fire
D3.1throughD3.5, one per platform, with Include / Defer / Cut / Hold buckets and an effort+demand-grounded recommendation per option. - After the chain, fire
D3.finalsummarizing the assembled scope (e.g. "Ship E1 + E4 — Slack and Telegram pull most demand for least build cost. Defer the rest. A) Ship / B) Revise / C) Cut more").
Pre-fix failure shape (the bug): agent constructs a single AskUserQuestion with E1..E4 as four options, drops E5 with prose like "E5 is the smallest revenue segment, moving to TODOs". The user never got to weigh in on E5. Floor-of-4 in the E2E test catches this.