mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-02 00:01:37 +02:00
a6fb31726c
* feat(preamble): add "Handling 5+ options — split, never drop" rule Agents repeatedly hit Conductor's 4-option AskUserQuestion cap and silently drop one option to fit, shrinking the user's decision space. This rule names the bug and gives two compliant shapes: batch into ≤4-groups (for coherent alternatives) or split into N sequential per-option calls (for independent scope items, default). Inline preamble subsection is ~15 lines (rule + buckets + pointer). Full reference with worked examples, Hold/dependency semantics, and final-summary validation lives in docs/askuserquestion-split.md. The agent loads the docs file on demand when N>4. Per-option call shape: D<N>.k header, ELI10, Recommendation, kind-note (no completeness score — decision actions, not coverage), Include / Defer / Cut / Hold buckets. Hold stops the chain immediately; the final D<N>.final call validates dependencies and confirms the assembled scope. question_ids: <skill>-split-<option-slug> (kebab-case ASCII, ≤64 chars). Also fixes orphan "12. " prefix on the existing CJK rule. Tier-2+ skills inherit via the existing resolver. SKILL.md regenerated for all 41 affected skills + 3 golden fixtures. Net diff per SKILL.md: ~34 lines (vs ~110 for the full inline version). 6 tests pin the inline contract (4-option cap, buckets, D-numbering, docs pointer, runtime AUTO_DECIDE gate reference, orphan 12 regression). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(question-pref): runtime AUTO_DECIDE carve-out for *-split-* ids Split chains (per-option AskUserQuestion calls emitted by the new "Handling 5+ options" rule) must never be silently auto-approved via /plan-tune preferences. The user's option set is sacred. Layer 1 (mechanism): unique <skill>-split-<option-slug> ids prevent cross-option preference leakage. Layer 2 (this commit): the runtime checker `gstack-question-preference --check` detects any id matching *-split-* and forces ASK_NORMALLY even when never-ask or ask-only-for-one-way preferences exist for that exact id. An explanatory note tells the user their preference was bypassed and why. 7 tests pin the carve-out: no-pref baseline, never-ask override, explanatory note text, ask-only-for-one-way override, always-ask (no note), non-split id containing "split" word (negative case for regex specificity), multi-skill split id formats. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(e2e): split-overflow regression for /plan-ceo-review Periodic-tier E2E test that catches the original failure mode the user complained about: 5+ options for ONE decision must split into N sequential AskUserQuestion calls, not drop one to fit Conductor's 4-option cap. Fixture: 5 independent chat-platform integration candidates (Slack/Discord/Teams/Telegram/Mattermost), each carrying its own include/defer/cut decision. Floor = 4 review-phase AUQs (standard [N-1] tolerance band). Pre-fix "drop to 4 + 1 dropped" fails this floor. Wired into test/helpers/touchfiles.ts: tier periodic, depends on plan-ceo-review/**, the new preamble subsection, the question-pref binary (for the carve-out), and the runner helper. touchfiles.test.ts expected count bumped 21 → 22 to account for the new entry. Cost: ~$0.30/run when EVALS_TIER=periodic. Skips silently otherwise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: post-merge regen + rebase size-budget baseline to v1.47.0.0 After merging origin/main (v1.45 → v1.47), three things needed cleanup: 1. spec/SKILL.md (main's new skill) regenerated to include our split-vs-drop preamble subsection — same mechanical regen as the other 41 tier-2+ skills. 2. Three golden ship fixtures refreshed to capture main's GSTACK_PLAN_MODE block + /spec routing entry + jargon-list.json refactor. 3. docs/skills.md — added /spec table row that main's PR (#1698/#1733) shipped without. Pre-existing failure on main; this PR catches and fixes. Also rebased test/skill-size-budget.test.ts from v1.44.1 → v1.47.0.0 baseline. Main's v1.46 (catalog tokens trim) + v1.47 (/spec skill) pushed the v1.44.1 anchor past the 5% ratchet to ×1.059 — pre-existing failure on main. This PR captures a fresh parity-baseline-v1.47.0.0.json and re-anchors the test there. Historical v1.44.1.json and v1.46.0.0.json retained in test/fixtures/ for reference. Our subsection contributes ~0.1% of the post-rebase corpus. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.48.0.0) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
217 lines
8.5 KiB
Markdown
217 lines
8.5 KiB
Markdown
# AskUserQuestion split rule — full reference
|
|
|
|
Inline summary lives in the canonical preamble (`scripts/resolvers/preamble/generate-ask-user-format.ts`).
|
|
That subsection is intentionally compressed because it injects into every
|
|
tier-2+ skill's `SKILL.md`. This file is the deep reference the inline
|
|
guidance points to — load it when N>4 options come up and you need
|
|
worked examples or the full Hold / dependency / final-summary semantics.
|
|
|
|
## The bug this prevents
|
|
|
|
Pre-rule failure mode (transcript verbatim from the user complaint that
|
|
motivated this):
|
|
|
|
> "I'm hitting Conductor's limit of 4 options in the AUQ, so I need to
|
|
> cut one. E4 (the detect-mappings codegen) is the biggest lift and
|
|
> probably beyond scope for v0.42 anyway — users can hand-author their
|
|
> mapping rules for the 9 clusters. I'll drop that and keep E1, E2, E3,
|
|
> and E5..."
|
|
>
|
|
> "Conductor caps at 4 options. Trimming: E4 (detect-mappings codegen)
|
|
> is the largest-effort item and a natural v0.43+ follow-up — moving it
|
|
> to TODOS.md without asking. Re-firing with 4."
|
|
|
|
The agent unilaterally cut a real option without user input. The option
|
|
set is the user's decision space; shrinking it silently is the bug.
|
|
|
|
## Which shape: batched vs. split
|
|
|
|
Two compliant shapes. Pick by reading the options:
|
|
|
|
1. **Batched into ≤4-groups** — the options are coherent alternatives,
|
|
one will be picked. Examples: "major / minor / patch / micro" for a
|
|
version bump, "5 layout variants where the user picks one", "which
|
|
framework: rspec / minitest / cucumber / none". Batch the top 4 into
|
|
one AskUserQuestion; surface the 5th as a follow-up if none of the
|
|
first 4 fit. This is the lower-friction path when applicable.
|
|
|
|
2. **Split per-option** — the options are independent scope items, each
|
|
carrying its own include/defer/cut decision. Examples: "E1..E6, which
|
|
do we ship?", "5 candidate integrations for Q3", "8 TODOs surfaced by
|
|
the audit — which do we land?". Fire N sequential AskUserQuestion
|
|
calls, one per option.
|
|
|
|
**Default to split per-option when unsure.** Batching wrong options
|
|
together — shoehorning orthogonal scope items into one question — is
|
|
the same failure mode as dropping.
|
|
|
|
## Split per-option mechanics
|
|
|
|
### Before the chain
|
|
|
|
Check for dependencies between options. If E3 requires E1, or E5
|
|
conflicts with E2, surface that in the per-option ELI10:
|
|
|
|
> "Cutting this orphans E3 — they're linked."
|
|
|
|
Without dependency surfacing, the chain produces incoherent picked sets
|
|
(user picks Include for E3 + Cut for E1, ships an unbuildable scope).
|
|
|
|
### D-numbering
|
|
|
|
- Parent decision: `D<N>` where N is the global question counter.
|
|
- Each per-option call: `D<N>.k` for k=1..K children.
|
|
- Final summary: `D<N>.final`.
|
|
- Single-option revise: `D<N>.revise-<k>`.
|
|
|
|
Example chain for 5 options at parent D3:
|
|
|
|
```
|
|
D3.1 → D3.2 → D3.3 → D3.4 → D3.5 → D3.final
|
|
```
|
|
|
|
### Per-option call shape
|
|
|
|
For each option Eₖ, fire an AskUserQuestion with:
|
|
|
|
- `D<N>.k` header (e.g. D3.1, D3.2 ... D3.5)
|
|
- ELI10 of just this option's scope, cost, and any dependency it carries
|
|
- Recommendation: Include / Defer / Cut, with concrete reason
|
|
- 4 buckets per option:
|
|
- **A) Include** in this scope (recommended/not)
|
|
- **B) Defer** to follow-up (TODOs / next version)
|
|
- **C) Cut** entirely
|
|
- **D) Hold** — stop the chain, discuss before deciding
|
|
- Note: options differ in kind, not coverage — no completeness score.
|
|
(Include/Defer/Cut/Hold are decision actions, so the existing format
|
|
rule applies: omit `Completeness: N/10` and use the kind-note instead.)
|
|
|
|
### Hold means stop, not queue
|
|
|
|
When the user picks Hold on any per-option call, **stop the chain
|
|
immediately**. Do not continue asking later options behind the Hold —
|
|
the user wants to discuss the picked option first. After discussion,
|
|
the user resumes by saying "continue" or naming the next option to ask
|
|
about.
|
|
|
|
Wrong behavior: queue E4 and E5 behind a Hold on E3, then fire them
|
|
later with stale context. Right behavior: stop, let the user reset the
|
|
parent decision, resume from where they left off.
|
|
|
|
### Final summary
|
|
|
|
After the chain resolves (without Hold), fire `D<N>.final` to confirm
|
|
and validate the assembled set.
|
|
|
|
**Step 1 — validate dependencies.** If the picked set is incoherent
|
|
(e.g. E3 picked Include but its required E1 was Cut), do NOT silently
|
|
accept. Re-prompt the conflict as a single AskUserQuestion:
|
|
|
|
> "E3 needs E1 but you cut E1. Revise:
|
|
> A) keep E1
|
|
> B) cut E3 too
|
|
> C) leave as-is and accept the broken state"
|
|
|
|
**Step 2 — confirm the assembled set.** If coherent:
|
|
|
|
> "Here's the assembled set: E1, E2, E5. Ship this scope?
|
|
> A) Ship this scope (recommended)
|
|
> B) Revise one option (you pick which)
|
|
> C) Cut more"
|
|
|
|
**Step 3 — targeted revise.** If the user picks B, ask which option to
|
|
revise, then fire ONE per-option AskUserQuestion at `D<N>.revise-<k>`
|
|
to update just that option. Do **not** re-run the whole chain.
|
|
|
|
## Sizing rules
|
|
|
|
- **N ≤ 4**: use the normal single AskUserQuestion form. Don't split.
|
|
- **N = 5 or 6**: split (or batch if a clean grouping exists).
|
|
- **N > 6**: BEFORE the chain, fire a meta-AskUserQuestion at `D<N>.0`:
|
|
|
|
> "About to ask N per-option questions. Options:
|
|
> A) Proceed with the full split (recommended only if every option is
|
|
> independent)
|
|
> B) Narrow scope first — I'll propose a smaller set
|
|
> C) Batch into groups of 4 instead"
|
|
|
|
This is itself an AskUserQuestion tool call, not prose — it counts as
|
|
the first prompt in the chain, not a violation of the "tool not prose"
|
|
rule.
|
|
|
|
## question_id rules for split chains
|
|
|
|
Each per-option AskUserQuestion emits a unique `question_id` of the
|
|
form `<skill>-split-<option-slug>` where `<option-slug>` is the option's
|
|
key kebab-cased (lowercase, hyphens, ASCII only).
|
|
|
|
Examples:
|
|
- `plan-ceo-review-split-e4-detect-mappings`
|
|
- `ship-split-rspec`
|
|
- `plan-eng-review-split-add-coverage-test`
|
|
|
|
**Collision handling.** If two options would produce the same slug,
|
|
suffix with `-2`, `-3`, etc.
|
|
|
|
**Length.** Total length must be ≤64 chars (validated by
|
|
`bin/gstack-question-preference --write`). Truncate the option slug if
|
|
needed, preserving the `<skill>-split-` prefix.
|
|
|
|
## AUTO_DECIDE behavior with split chains
|
|
|
|
Two-layer defense.
|
|
|
|
**Layer 1 — mechanism.** Each per-option `question_id` is unique to its
|
|
option, so preferences set on one option's id cannot leak across the
|
|
chain. A `never-ask` on `ship-split-rspec` does not silently approve
|
|
`ship-split-minitest`.
|
|
|
|
**Layer 2 — runtime enforcement.** `bin/gstack-question-preference
|
|
--check` detects any id matching `*-split-*` (the canonical slug pattern
|
|
emitted by split chains) and forces `ASK_NORMALLY` even when a
|
|
`never-ask` or `ask-only-for-one-way` preference exists for that exact
|
|
id. The check emits an explanatory note when this override fires:
|
|
|
|
> "split-chain per-option calls always ASK_NORMALLY; your never-ask
|
|
> preference does not apply to options inside a sequential split."
|
|
|
|
**Result.** Split-chain per-option calls are NEVER AUTO_DECIDE-eligible.
|
|
This is a runtime contract, not just collision-resistance by id
|
|
uniqueness. The user's option set is sacred — restoring user
|
|
sovereignty over the decision space is the entire point of splitting.
|
|
|
|
## Interaction with per-skill rules
|
|
|
|
This rule **overrides any per-skill "batch decisions" guidance**.
|
|
Per-skill templates that explicitly require one-issue-per-call (e.g.
|
|
`plan-eng-review`) are already compatible — they're a stricter special
|
|
case of this rule.
|
|
|
|
## Worked example: 5 platform integrations
|
|
|
|
Fixture used by `test/skill-e2e-plan-ceo-split-overflow.test.ts`. A plan
|
|
has 5 independent chat-platform candidates:
|
|
|
|
- E1) Slack DM bot (~2 weeks, ~40% of asks)
|
|
- E2) Discord guild bot (~3 weeks, ~15%)
|
|
- E3) Microsoft Teams (~4 weeks, ~5%)
|
|
- E4) Telegram (~1 week, ~8%)
|
|
- E5) Mattermost (~2 weeks, ~3%)
|
|
|
|
User wants individual decisions per candidate, not a bundled pick. The
|
|
agent should:
|
|
|
|
1. Recognize this is a 5-option independent-scope decision → split.
|
|
2. Check dependencies (none here — each platform is standalone).
|
|
3. Fire `D3.1` through `D3.5`, one per platform, with Include / Defer /
|
|
Cut / Hold buckets and an effort+demand-grounded recommendation per
|
|
option.
|
|
4. After the chain, fire `D3.final` summarizing the assembled scope
|
|
(e.g. "Ship E1 + E4 — Slack and Telegram pull most demand for least
|
|
build cost. Defer the rest. A) Ship / B) Revise / C) Cut more").
|
|
|
|
Pre-fix failure shape (the bug): agent constructs a single
|
|
AskUserQuestion with E1..E4 as four options, drops E5 with prose like
|
|
"E5 is the smallest revenue segment, moving to TODOs". The user never
|
|
got to weigh in on E5. Floor-of-4 in the E2E test catches this.
|