mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-17 07:10:12 +02:00
feat(auq): prose fallback when AskUserQuestion fails (interactive sessions)
On a genuine AUQ failure (tool absent, or present-but-erroring like Conductor's flaky MCP returning '[Tool result missing due to internal error]'): retry once, then branch on SESSION_KIND — spawned auto-chooses, headless BLOCKs, interactive renders a prose decision brief the user answers by typing a letter. The prose fallback MUST surface the triad: a clear ELI10 of the issue, a per-choice Completeness score, and a recommendation+why (one paragraph per choice). Carves out the [plan-tune auto-decide] denial as NOT a failure, and qualifies the former 'tool_use, not prose' assertions so the rule isn't self-contradicting. Tests pin the triad, the SESSION_KIND branch, the OV2 collision guard, the always-loaded guarantee, and a cross-file invariant on the auto-decide prefix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -9,11 +9,31 @@ export function generateAskUserFormat(_ctx: TemplateContext): string {
|
||||
|
||||
**Rule:** if any \`mcp__*__AskUserQuestion\` variant is in your tool list, prefer it. Hosts may disable native AUQ via \`--disallowedTools AskUserQuestion\` (Conductor does, by default) and route through their MCP variant; calling native there silently fails. Same questions/options shape; same decision-brief format applies.
|
||||
|
||||
**If no AskUserQuestion variant appears in your tool list, this skill is BLOCKED.** Stop, report \`BLOCKED — AskUserQuestion unavailable\`, and wait for the user. Do not write decisions to the plan file as a substitute, do not emit them as prose and stop, and do not silently auto-decide (only \`/plan-tune\` AUTO_DECIDE opt-ins authorize auto-picking).
|
||||
If AskUserQuestion is unavailable (no variant in your tool list) OR a call to it fails, do NOT silently auto-decide or write the decision to the plan file as a substitute. Follow the **failure fallback** below.
|
||||
|
||||
### When AskUserQuestion is unavailable or a call fails
|
||||
|
||||
Tell three outcomes apart:
|
||||
|
||||
1. **Auto-decide denial (NOT a failure).** The result contains \`[plan-tune auto-decide] <id> → <option>\` — the preference hook working as designed. Proceed with that option. Do NOT retry, do NOT fall back to prose.
|
||||
2. **Genuine failure** — no variant in your tool list, OR the variant is present but the call returns an error / missing result (MCP transport error, empty result, host bug — e.g. Conductor's MCP AskUserQuestion is flaky and returns \`[Tool result missing due to internal error]\`).
|
||||
- If it was present and **errored** (not absent), retry the SAME call **once** — but only if no answer could have surfaced (a missing-result error can arrive after the user already saw the question; retrying would double-prompt, so if it may have reached them, treat as pending, don't retry).
|
||||
- Then branch on \`SESSION_KIND\` (echoed by the preamble; empty/absent ⇒ \`interactive\`):
|
||||
- \`spawned\` → defer to the **Spawned session** block: auto-choose the recommended option. Never prose, never BLOCKED.
|
||||
- \`headless\` → \`BLOCKED — AskUserQuestion unavailable\`; stop and wait (no human can answer).
|
||||
- \`interactive\` → **prose fallback** (below).
|
||||
|
||||
**Prose fallback — render the decision brief as a markdown message, not a tool call.** Same information as the tool format below, different structure (paragraphs, not ✅/❌ bullets). It MUST surface this triad:
|
||||
|
||||
1. **A clear ELI10 of the issue itself** — plain English on what's being decided and why it matters (the question, not per-choice), naming the stakes. Lead with it.
|
||||
2. **Completeness scores per choice** — explicit \`Completeness: X/10\` on EACH choice (10 complete, 7 happy-path, 3 shortcut); use the kind-note when options differ in kind not coverage, but never silently drop the score.
|
||||
3. **The recommendation and why** — a \`Recommendation: <choice> because <reason>\` line plus the \`(recommended)\` marker on that choice.
|
||||
|
||||
Layout: a \`D<N>\` title + a one-line note that AskUserQuestion failed and to reply with a letter; the issue ELI10; the Recommendation line; then ONE paragraph per choice carrying its \`(recommended)\` marker, its \`Completeness: X/10\`, and 2-4 sentences of reasoning — never a bare bullet list; a closing \`Net:\` line. Split chains / 5+ options: one prose block per per-option call, in sequence. Then STOP and wait — the user's typed answer is the decision. In plan mode this satisfies end-of-turn like a tool call.
|
||||
|
||||
### Format
|
||||
|
||||
Every AskUserQuestion is a decision brief and must be sent as tool_use, not prose.
|
||||
Every AskUserQuestion is a decision brief and must be sent as tool_use, not prose — unless the documented failure fallback above applies (interactive session + the call is unavailable/erroring), in which case the prose fallback is the correct output.
|
||||
|
||||
\`\`\`
|
||||
D<N> — <one-line question title>
|
||||
@@ -93,7 +113,7 @@ Before calling AskUserQuestion, verify:
|
||||
- [ ] (recommended) label on one option (even for neutral-posture)
|
||||
- [ ] Dual-scale effort labels on effort-bearing options (human / CC)
|
||||
- [ ] Net line closes the decision
|
||||
- [ ] You are calling the tool, not writing prose
|
||||
- [ ] You are calling the tool, not writing prose — unless the documented failure fallback applies (then: prose with the mandatory triad — issue ELI10, per-choice Completeness, Recommendation + \`(recommended)\` — and a "reply with a letter" instruction, then STOP)
|
||||
- [ ] Non-ASCII characters (CJK / accents) written directly, NOT \\u-escaped
|
||||
- [ ] If you had 5+ options, you split (or batched into ≤4-groups) — did NOT drop any
|
||||
- [ ] If you split, you checked dependencies between options before firing the chain
|
||||
|
||||
@@ -26,7 +26,7 @@ In plan mode, allowed because they inform the plan: \`$B\`, \`$D\`, \`codex exec
|
||||
|
||||
## Skill Invocation During Plan Mode
|
||||
|
||||
If the user invokes a skill in plan mode, the skill takes precedence over generic plan mode behavior. **Treat the skill file as executable instructions, not reference.** Follow it step by step starting from Step 0; the first AskUserQuestion is the workflow entering plan mode, not a violation of it. AskUserQuestion (any variant — \`mcp__*__AskUserQuestion\` or native; see "AskUserQuestion Format → Tool resolution") satisfies plan mode's end-of-turn requirement. If no variant is callable, the skill is BLOCKED — stop and report \`BLOCKED — AskUserQuestion unavailable\` per the AskUserQuestion Format rule. At a STOP point, stop immediately. Do not continue the workflow or call ExitPlanMode there. Commands marked "PLAN MODE EXCEPTION — ALWAYS RUN" execute. Call ExitPlanMode only after the skill workflow completes, or if the user tells you to cancel the skill or leave plan mode.`;
|
||||
If the user invokes a skill in plan mode, the skill takes precedence over generic plan mode behavior. **Treat the skill file as executable instructions, not reference.** Follow it step by step starting from Step 0; the first AskUserQuestion is the workflow entering plan mode, not a violation of it. AskUserQuestion (any variant — \`mcp__*__AskUserQuestion\` or native; see "AskUserQuestion Format → Tool resolution") satisfies plan mode's end-of-turn requirement. If AskUserQuestion is unavailable or a call fails, follow the AskUserQuestion Format failure fallback: \`headless\` → BLOCKED; \`interactive\` → the prose fallback (also satisfies end-of-turn). At a STOP point, stop immediately. Do not continue the workflow or call ExitPlanMode there. Commands marked "PLAN MODE EXCEPTION — ALWAYS RUN" execute. Call ExitPlanMode only after the skill workflow completes, or if the user tells you to cancel the skill or leave plan mode.`;
|
||||
}
|
||||
|
||||
export function generateCompletionStatus(ctx: TemplateContext): string {
|
||||
|
||||
@@ -47,6 +47,11 @@ const MANDATORY: Array<{ name: string; re: RegExp }> = [
|
||||
{ name: 'Completeness coverage rule', re: /Completeness\s*:/i },
|
||||
{ name: 'kind-vs-coverage rule', re: /options differ in kind/i },
|
||||
{ name: 'Self-check checklist', re: /Self-check before emitting/i },
|
||||
// The runtime-failure fallback must be ALWAYS-LOADED too: when an AUQ call errors
|
||||
// mid-skill, the model needs the prose-fallback rule in context that instant, not
|
||||
// stranded in an on-demand section. Same guarantee as the format spec above.
|
||||
{ name: 'AUQ-failure fallback subsection', re: /When AskUserQuestion is unavailable or a call fails/i },
|
||||
{ name: 'fallback SESSION_KIND branch', re: /SESSION_KIND/ },
|
||||
];
|
||||
|
||||
/** Per-skill AUQ rules that govern review-finding cadence. A carve may move
|
||||
|
||||
@@ -16,6 +16,8 @@
|
||||
* for the weekly periodic eval to notice.
|
||||
*/
|
||||
import { describe, test, expect } from 'bun:test';
|
||||
import * as fs from 'fs';
|
||||
import * as path from 'path';
|
||||
import type { TemplateContext } from '../scripts/resolvers/types';
|
||||
import { HOST_PATHS } from '../scripts/resolvers/types';
|
||||
import { generateAskUserFormat } from '../scripts/resolvers/preamble/generate-ask-user-format';
|
||||
@@ -161,3 +163,88 @@ describe('generateAskUserFormat — 5+ option split rule (slim inline + docs poi
|
||||
expect(out).toContain('**Non-ASCII characters');
|
||||
});
|
||||
});
|
||||
|
||||
describe('generateAskUserFormat — runtime-failure prose fallback', () => {
|
||||
const out = generateAskUserFormat(makeCtx());
|
||||
|
||||
test('documents the unavailable/failed subsection', () => {
|
||||
expect(out).toMatch(/When AskUserQuestion is unavailable or a call fails/i);
|
||||
});
|
||||
|
||||
test('carves out the auto-decide denial as NOT a failure', () => {
|
||||
expect(out).toContain('[plan-tune auto-decide]');
|
||||
expect(out).toMatch(/NOT a failure/i);
|
||||
// and explicitly: do not fall back to prose on an auto-decide denial
|
||||
expect(out).toMatch(/Do NOT[\s\S]{0,40}fall back to prose|never prose/i);
|
||||
});
|
||||
|
||||
test('retries the errored call exactly once before degrading', () => {
|
||||
expect(out).toMatch(/retry the SAME call \*\*once\*\*|retry the same call.*once/i);
|
||||
// idempotency guard against double-prompting
|
||||
expect(out).toMatch(/double-prompt|no answer could have surfaced/i);
|
||||
});
|
||||
|
||||
test('branches on SESSION_KIND: spawned / headless / interactive', () => {
|
||||
expect(out).toContain('SESSION_KIND');
|
||||
expect(out).toMatch(/`spawned`[\s\S]*auto-choose/);
|
||||
expect(out).toMatch(/`headless`[\s\S]*BLOCKED/);
|
||||
expect(out).toMatch(/`interactive`[\s\S]*prose fallback/);
|
||||
// empty/absent SESSION_KIND degrades to interactive
|
||||
expect(out).toMatch(/empty\/absent[\s\S]{0,40}interactive/i);
|
||||
});
|
||||
|
||||
// The mandatory triad the user explicitly required for the plain-text output.
|
||||
test('prose fallback mandates the triad: issue ELI10', () => {
|
||||
expect(out).toMatch(/ELI10 of the issue itself/i);
|
||||
});
|
||||
|
||||
test('prose fallback mandates the triad: per-choice Completeness score', () => {
|
||||
expect(out).toMatch(/Completeness scores per choice/i);
|
||||
expect(out).toMatch(/Completeness: X\/10.*EACH choice|on EACH choice/i);
|
||||
});
|
||||
|
||||
test('prose fallback mandates the triad: recommendation + (recommended) marker', () => {
|
||||
expect(out).toMatch(/Recommendation: <choice> because/);
|
||||
expect(out).toMatch(/\(recommended\)`? marker on that choice/);
|
||||
});
|
||||
|
||||
test('prose fallback is one paragraph per choice, not a bare bullet list', () => {
|
||||
expect(out).toMatch(/ONE paragraph per choice/i);
|
||||
expect(out).toMatch(/never a bare bullet list/i);
|
||||
});
|
||||
|
||||
test('prose fallback tells the user to reply with a letter, then STOP', () => {
|
||||
expect(out).toMatch(/reply with a letter/i);
|
||||
expect(out).toMatch(/STOP and wait/i);
|
||||
});
|
||||
|
||||
// OV2: the former "tool_use, not prose" assertions must carry the qualifier so the
|
||||
// fallback is not self-contradicting. Guards against the instruction collision
|
||||
// silently returning on a future edit.
|
||||
test('OV2: the Format line qualifies "not prose" with the fallback exception', () => {
|
||||
expect(out).toMatch(/must be sent as tool_use, not prose — unless the documented failure fallback/);
|
||||
});
|
||||
|
||||
test('OV2: the self-check "not writing prose" line carries the fallback qualifier', () => {
|
||||
expect(out).toMatch(/not writing prose — unless the documented failure fallback applies/);
|
||||
});
|
||||
});
|
||||
|
||||
describe('CQ2 — cross-file invariant: auto-decide prefix matches the hook', () => {
|
||||
const out = generateAskUserFormat(makeCtx());
|
||||
const hookSrc = fs.readFileSync(
|
||||
path.resolve(__dirname, '..', 'hosts', 'claude', 'hooks', 'question-preference-hook.ts'),
|
||||
'utf-8',
|
||||
);
|
||||
|
||||
test('the hook actually emits the [plan-tune auto-decide] prefix', () => {
|
||||
expect(hookSrc).toContain('[plan-tune auto-decide]');
|
||||
});
|
||||
|
||||
test('the resolver references the exact same prefix the hook emits', () => {
|
||||
// If a future edit reworded the hook reason, this catches the drift: the prose
|
||||
// fallback would stop recognizing the auto-decide denial as not-a-failure.
|
||||
const PREFIX = '[plan-tune auto-decide]';
|
||||
expect(hookSrc.includes(PREFIX) && out.includes(PREFIX)).toBe(true);
|
||||
});
|
||||
});
|
||||
|
||||
Reference in New Issue
Block a user