Files
gstack/docs/askuserquestion-split.md
T
Garry Tan a6fb31726c v1.48.0.0 feat: AskUserQuestion split rule + runtime AUTO_DECIDE carve-out (#1740)
* feat(preamble): add "Handling 5+ options — split, never drop" rule

Agents repeatedly hit Conductor's 4-option AskUserQuestion cap and
silently drop one option to fit, shrinking the user's decision space.
This rule names the bug and gives two compliant shapes: batch into
≤4-groups (for coherent alternatives) or split into N sequential
per-option calls (for independent scope items, default).

Inline preamble subsection is ~15 lines (rule + buckets + pointer).
Full reference with worked examples, Hold/dependency semantics, and
final-summary validation lives in docs/askuserquestion-split.md.
The agent loads the docs file on demand when N>4.

Per-option call shape: D<N>.k header, ELI10, Recommendation, kind-note
(no completeness score — decision actions, not coverage), Include /
Defer / Cut / Hold buckets. Hold stops the chain immediately; the
final D<N>.final call validates dependencies and confirms the
assembled scope.

question_ids: <skill>-split-<option-slug> (kebab-case ASCII, ≤64
chars). Also fixes orphan "12. " prefix on the existing CJK rule.

Tier-2+ skills inherit via the existing resolver. SKILL.md regenerated
for all 41 affected skills + 3 golden fixtures. Net diff per SKILL.md:
~34 lines (vs ~110 for the full inline version).

6 tests pin the inline contract (4-option cap, buckets, D-numbering,
docs pointer, runtime AUTO_DECIDE gate reference, orphan 12 regression).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(question-pref): runtime AUTO_DECIDE carve-out for *-split-* ids

Split chains (per-option AskUserQuestion calls emitted by the new
"Handling 5+ options" rule) must never be silently auto-approved
via /plan-tune preferences. The user's option set is sacred.

Layer 1 (mechanism): unique <skill>-split-<option-slug> ids prevent
cross-option preference leakage. Layer 2 (this commit): the runtime
checker `gstack-question-preference --check` detects any id matching
*-split-* and forces ASK_NORMALLY even when never-ask or
ask-only-for-one-way preferences exist for that exact id. An
explanatory note tells the user their preference was bypassed and why.

7 tests pin the carve-out: no-pref baseline, never-ask override,
explanatory note text, ask-only-for-one-way override, always-ask
(no note), non-split id containing "split" word (negative case for
regex specificity), multi-skill split id formats.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): split-overflow regression for /plan-ceo-review

Periodic-tier E2E test that catches the original failure mode the
user complained about: 5+ options for ONE decision must split into
N sequential AskUserQuestion calls, not drop one to fit Conductor's
4-option cap.

Fixture: 5 independent chat-platform integration candidates
(Slack/Discord/Teams/Telegram/Mattermost), each carrying its own
include/defer/cut decision. Floor = 4 review-phase AUQs (standard
[N-1] tolerance band). Pre-fix "drop to 4 + 1 dropped" fails this
floor.

Wired into test/helpers/touchfiles.ts: tier periodic, depends on
plan-ceo-review/**, the new preamble subsection, the question-pref
binary (for the carve-out), and the runner helper. touchfiles.test.ts
expected count bumped 21 → 22 to account for the new entry.

Cost: ~$0.30/run when EVALS_TIER=periodic. Skips silently otherwise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: post-merge regen + rebase size-budget baseline to v1.47.0.0

After merging origin/main (v1.45 → v1.47), three things needed cleanup:

1. spec/SKILL.md (main's new skill) regenerated to include our split-vs-drop
   preamble subsection — same mechanical regen as the other 41 tier-2+ skills.
2. Three golden ship fixtures refreshed to capture main's GSTACK_PLAN_MODE
   block + /spec routing entry + jargon-list.json refactor.
3. docs/skills.md — added /spec table row that main's PR (#1698/#1733) shipped
   without. Pre-existing failure on main; this PR catches and fixes.

Also rebased test/skill-size-budget.test.ts from v1.44.1 → v1.47.0.0 baseline.
Main's v1.46 (catalog tokens trim) + v1.47 (/spec skill) pushed the v1.44.1
anchor past the 5% ratchet to ×1.059 — pre-existing failure on main. This
PR captures a fresh parity-baseline-v1.47.0.0.json and re-anchors the test
there. Historical v1.44.1.json and v1.46.0.0.json retained in test/fixtures/
for reference. Our subsection contributes ~0.1% of the post-rebase corpus.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.48.0.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 23:43:07 -07:00

217 lines
8.5 KiB
Markdown

# AskUserQuestion split rule — full reference
Inline summary lives in the canonical preamble (`scripts/resolvers/preamble/generate-ask-user-format.ts`).
That subsection is intentionally compressed because it injects into every
tier-2+ skill's `SKILL.md`. This file is the deep reference the inline
guidance points to — load it when N>4 options come up and you need
worked examples or the full Hold / dependency / final-summary semantics.
## The bug this prevents
Pre-rule failure mode (transcript verbatim from the user complaint that
motivated this):
> "I'm hitting Conductor's limit of 4 options in the AUQ, so I need to
> cut one. E4 (the detect-mappings codegen) is the biggest lift and
> probably beyond scope for v0.42 anyway — users can hand-author their
> mapping rules for the 9 clusters. I'll drop that and keep E1, E2, E3,
> and E5..."
>
> "Conductor caps at 4 options. Trimming: E4 (detect-mappings codegen)
> is the largest-effort item and a natural v0.43+ follow-up — moving it
> to TODOS.md without asking. Re-firing with 4."
The agent unilaterally cut a real option without user input. The option
set is the user's decision space; shrinking it silently is the bug.
## Which shape: batched vs. split
Two compliant shapes. Pick by reading the options:
1. **Batched into ≤4-groups** — the options are coherent alternatives,
one will be picked. Examples: "major / minor / patch / micro" for a
version bump, "5 layout variants where the user picks one", "which
framework: rspec / minitest / cucumber / none". Batch the top 4 into
one AskUserQuestion; surface the 5th as a follow-up if none of the
first 4 fit. This is the lower-friction path when applicable.
2. **Split per-option** — the options are independent scope items, each
carrying its own include/defer/cut decision. Examples: "E1..E6, which
do we ship?", "5 candidate integrations for Q3", "8 TODOs surfaced by
the audit — which do we land?". Fire N sequential AskUserQuestion
calls, one per option.
**Default to split per-option when unsure.** Batching wrong options
together — shoehorning orthogonal scope items into one question — is
the same failure mode as dropping.
## Split per-option mechanics
### Before the chain
Check for dependencies between options. If E3 requires E1, or E5
conflicts with E2, surface that in the per-option ELI10:
> "Cutting this orphans E3 — they're linked."
Without dependency surfacing, the chain produces incoherent picked sets
(user picks Include for E3 + Cut for E1, ships an unbuildable scope).
### D-numbering
- Parent decision: `D<N>` where N is the global question counter.
- Each per-option call: `D<N>.k` for k=1..K children.
- Final summary: `D<N>.final`.
- Single-option revise: `D<N>.revise-<k>`.
Example chain for 5 options at parent D3:
```
D3.1 → D3.2 → D3.3 → D3.4 → D3.5 → D3.final
```
### Per-option call shape
For each option Eₖ, fire an AskUserQuestion with:
- `D<N>.k` header (e.g. D3.1, D3.2 ... D3.5)
- ELI10 of just this option's scope, cost, and any dependency it carries
- Recommendation: Include / Defer / Cut, with concrete reason
- 4 buckets per option:
- **A) Include** in this scope (recommended/not)
- **B) Defer** to follow-up (TODOs / next version)
- **C) Cut** entirely
- **D) Hold** — stop the chain, discuss before deciding
- Note: options differ in kind, not coverage — no completeness score.
(Include/Defer/Cut/Hold are decision actions, so the existing format
rule applies: omit `Completeness: N/10` and use the kind-note instead.)
### Hold means stop, not queue
When the user picks Hold on any per-option call, **stop the chain
immediately**. Do not continue asking later options behind the Hold —
the user wants to discuss the picked option first. After discussion,
the user resumes by saying "continue" or naming the next option to ask
about.
Wrong behavior: queue E4 and E5 behind a Hold on E3, then fire them
later with stale context. Right behavior: stop, let the user reset the
parent decision, resume from where they left off.
### Final summary
After the chain resolves (without Hold), fire `D<N>.final` to confirm
and validate the assembled set.
**Step 1 — validate dependencies.** If the picked set is incoherent
(e.g. E3 picked Include but its required E1 was Cut), do NOT silently
accept. Re-prompt the conflict as a single AskUserQuestion:
> "E3 needs E1 but you cut E1. Revise:
> A) keep E1
> B) cut E3 too
> C) leave as-is and accept the broken state"
**Step 2 — confirm the assembled set.** If coherent:
> "Here's the assembled set: E1, E2, E5. Ship this scope?
> A) Ship this scope (recommended)
> B) Revise one option (you pick which)
> C) Cut more"
**Step 3 — targeted revise.** If the user picks B, ask which option to
revise, then fire ONE per-option AskUserQuestion at `D<N>.revise-<k>`
to update just that option. Do **not** re-run the whole chain.
## Sizing rules
- **N ≤ 4**: use the normal single AskUserQuestion form. Don't split.
- **N = 5 or 6**: split (or batch if a clean grouping exists).
- **N > 6**: BEFORE the chain, fire a meta-AskUserQuestion at `D<N>.0`:
> "About to ask N per-option questions. Options:
> A) Proceed with the full split (recommended only if every option is
> independent)
> B) Narrow scope first — I'll propose a smaller set
> C) Batch into groups of 4 instead"
This is itself an AskUserQuestion tool call, not prose — it counts as
the first prompt in the chain, not a violation of the "tool not prose"
rule.
## question_id rules for split chains
Each per-option AskUserQuestion emits a unique `question_id` of the
form `<skill>-split-<option-slug>` where `<option-slug>` is the option's
key kebab-cased (lowercase, hyphens, ASCII only).
Examples:
- `plan-ceo-review-split-e4-detect-mappings`
- `ship-split-rspec`
- `plan-eng-review-split-add-coverage-test`
**Collision handling.** If two options would produce the same slug,
suffix with `-2`, `-3`, etc.
**Length.** Total length must be ≤64 chars (validated by
`bin/gstack-question-preference --write`). Truncate the option slug if
needed, preserving the `<skill>-split-` prefix.
## AUTO_DECIDE behavior with split chains
Two-layer defense.
**Layer 1 — mechanism.** Each per-option `question_id` is unique to its
option, so preferences set on one option's id cannot leak across the
chain. A `never-ask` on `ship-split-rspec` does not silently approve
`ship-split-minitest`.
**Layer 2 — runtime enforcement.** `bin/gstack-question-preference
--check` detects any id matching `*-split-*` (the canonical slug pattern
emitted by split chains) and forces `ASK_NORMALLY` even when a
`never-ask` or `ask-only-for-one-way` preference exists for that exact
id. The check emits an explanatory note when this override fires:
> "split-chain per-option calls always ASK_NORMALLY; your never-ask
> preference does not apply to options inside a sequential split."
**Result.** Split-chain per-option calls are NEVER AUTO_DECIDE-eligible.
This is a runtime contract, not just collision-resistance by id
uniqueness. The user's option set is sacred — restoring user
sovereignty over the decision space is the entire point of splitting.
## Interaction with per-skill rules
This rule **overrides any per-skill "batch decisions" guidance**.
Per-skill templates that explicitly require one-issue-per-call (e.g.
`plan-eng-review`) are already compatible — they're a stricter special
case of this rule.
## Worked example: 5 platform integrations
Fixture used by `test/skill-e2e-plan-ceo-split-overflow.test.ts`. A plan
has 5 independent chat-platform candidates:
- E1) Slack DM bot (~2 weeks, ~40% of asks)
- E2) Discord guild bot (~3 weeks, ~15%)
- E3) Microsoft Teams (~4 weeks, ~5%)
- E4) Telegram (~1 week, ~8%)
- E5) Mattermost (~2 weeks, ~3%)
User wants individual decisions per candidate, not a bundled pick. The
agent should:
1. Recognize this is a 5-option independent-scope decision → split.
2. Check dependencies (none here — each platform is standalone).
3. Fire `D3.1` through `D3.5`, one per platform, with Include / Defer /
Cut / Hold buckets and an effort+demand-grounded recommendation per
option.
4. After the chain, fire `D3.final` summarizing the assembled scope
(e.g. "Ship E1 + E4 — Slack and Telegram pull most demand for least
build cost. Defer the rest. A) Ship / B) Revise / C) Cut more").
Pre-fix failure shape (the bug): agent constructs a single
AskUserQuestion with E1..E4 as four options, drops E5 with prose like
"E5 is the smallest revenue segment, moving to TODOs". The user never
got to weigh in on E5. Floor-of-4 in the E2E test catches this.