feat(auq): prose fallback when AskUserQuestion fails (interactive sessions)

On a genuine AUQ failure (tool absent, or present-but-erroring like Conductor's flaky MCP returning '[Tool result missing due to internal error]'): retry once, then branch on SESSION_KIND — spawned auto-chooses, headless BLOCKs, interactive renders a prose decision brief the user answers by typing a letter. The prose fallback MUST surface the triad: a clear ELI10 of the issue, a per-choice Completeness score, and a recommendation+why (one paragraph per choice). Carves out the [plan-tune auto-decide] denial as NOT a failure, and qualifies the former 'tool_use, not prose' assertions so the rule isn't self-contradicting. Tests pin the triad, the SESSION_KIND branch, the OV2 collision guard, the always-loaded guarantee, and a cross-file invariant on the auto-decide prefix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 07:10:12 +02:00 · 2026-06-07 17:51:22 -07:00
parent 28d75fe9f2
commit c475d73b34
4 changed files with 116 additions and 4 deletions
@@ -9,11 +9,31 @@ export function generateAskUserFormat(_ctx: TemplateContext): string {

 **Rule:** if any \`mcp__*__AskUserQuestion\` variant is in your tool list, prefer it. Hosts may disable native AUQ via \`--disallowedTools AskUserQuestion\` (Conductor does, by default) and route through their MCP variant; calling native there silently fails. Same questions/options shape; same decision-brief format applies.

-**If no AskUserQuestion variant appears in your tool list, this skill is BLOCKED.** Stop, report \`BLOCKED — AskUserQuestion unavailable\`, and wait for the user. Do not write decisions to the plan file as a substitute, do not emit them as prose and stop, and do not silently auto-decide (only \`/plan-tune\` AUTO_DECIDE opt-ins authorize auto-picking).
+If AskUserQuestion is unavailable (no variant in your tool list) OR a call to it fails, do NOT silently auto-decide or write the decision to the plan file as a substitute. Follow the **failure fallback** below.
+
+### When AskUserQuestion is unavailable or a call fails
+
+Tell three outcomes apart:
+
+1. **Auto-decide denial (NOT a failure).** The result contains \`[plan-tune auto-decide] <id> → <option>\` — the preference hook working as designed. Proceed with that option. Do NOT retry, do NOT fall back to prose.
+2. **Genuine failure** — no variant in your tool list, OR the variant is present but the call returns an error / missing result (MCP transport error, empty result, host bug — e.g. Conductor's MCP AskUserQuestion is flaky and returns \`[Tool result missing due to internal error]\`).
+   - If it was present and **errored** (not absent), retry the SAME call **once** — but only if no answer could have surfaced (a missing-result error can arrive after the user already saw the question; retrying would double-prompt, so if it may have reached them, treat as pending, don't retry).
+   - Then branch on \`SESSION_KIND\` (echoed by the preamble; empty/absent ⇒ \`interactive\`):
+     - \`spawned\` → defer to the **Spawned session** block: auto-choose the recommended option. Never prose, never BLOCKED.
+     - \`headless\` → \`BLOCKED — AskUserQuestion unavailable\`; stop and wait (no human can answer).
+     - \`interactive\` → **prose fallback** (below).
+
+**Prose fallback — render the decision brief as a markdown message, not a tool call.** Same information as the tool format below, different structure (paragraphs, not ✅/❌ bullets). It MUST surface this triad:
+
+1. **A clear ELI10 of the issue itself** — plain English on what's being decided and why it matters (the question, not per-choice), naming the stakes. Lead with it.
+2. **Completeness scores per choice** — explicit \`Completeness: X/10\` on EACH choice (10 complete, 7 happy-path, 3 shortcut); use the kind-note when options differ in kind not coverage, but never silently drop the score.
+3. **The recommendation and why** — a \`Recommendation: <choice> because <reason>\` line plus the \`(recommended)\` marker on that choice.
+
+Layout: a \`D<N>\` title + a one-line note that AskUserQuestion failed and to reply with a letter; the issue ELI10; the Recommendation line; then ONE paragraph per choice carrying its \`(recommended)\` marker, its \`Completeness: X/10\`, and 2-4 sentences of reasoning — never a bare bullet list; a closing \`Net:\` line. Split chains / 5+ options: one prose block per per-option call, in sequence. Then STOP and wait — the user's typed answer is the decision. In plan mode this satisfies end-of-turn like a tool call.

 ### Format

-Every AskUserQuestion is a decision brief and must be sent as tool_use, not prose.
+Every AskUserQuestion is a decision brief and must be sent as tool_use, not prose — unless the documented failure fallback above applies (interactive session + the call is unavailable/erroring), in which case the prose fallback is the correct output.

 \`\`\`
 D<N> — <one-line question title>
@@ -93,7 +113,7 @@ Before calling AskUserQuestion, verify:
 - [ ] (recommended) label on one option (even for neutral-posture)
 - [ ] Dual-scale effort labels on effort-bearing options (human / CC)
 - [ ] Net line closes the decision
- [ ] You are calling the tool, not writing prose
+- [ ] You are calling the tool, not writing prose — unless the documented failure fallback applies (then: prose with the mandatory triad — issue ELI10, per-choice Completeness, Recommendation + \`(recommended)\` — and a "reply with a letter" instruction, then STOP)
 - [ ] Non-ASCII characters (CJK / accents) written directly, NOT \\u-escaped
 - [ ] If you had 5+ options, you split (or batched into ≤4-groups) — did NOT drop any
 - [ ] If you split, you checked dependencies between options before firing the chain
@@ -26,7 +26,7 @@ In plan mode, allowed because they inform the plan: \`$B\`, \`$D\`, \`codex exec

 ## Skill Invocation During Plan Mode

-If the user invokes a skill in plan mode, the skill takes precedence over generic plan mode behavior. **Treat the skill file as executable instructions, not reference.** Follow it step by step starting from Step 0; the first AskUserQuestion is the workflow entering plan mode, not a violation of it. AskUserQuestion (any variant — \`mcp__*__AskUserQuestion\` or native; see "AskUserQuestion Format → Tool resolution") satisfies plan mode's end-of-turn requirement. If no variant is callable, the skill is BLOCKED — stop and report \`BLOCKED — AskUserQuestion unavailable\` per the AskUserQuestion Format rule. At a STOP point, stop immediately. Do not continue the workflow or call ExitPlanMode there. Commands marked "PLAN MODE EXCEPTION — ALWAYS RUN" execute. Call ExitPlanMode only after the skill workflow completes, or if the user tells you to cancel the skill or leave plan mode.`;
+If the user invokes a skill in plan mode, the skill takes precedence over generic plan mode behavior. **Treat the skill file as executable instructions, not reference.** Follow it step by step starting from Step 0; the first AskUserQuestion is the workflow entering plan mode, not a violation of it. AskUserQuestion (any variant — \`mcp__*__AskUserQuestion\` or native; see "AskUserQuestion Format → Tool resolution") satisfies plan mode's end-of-turn requirement. If AskUserQuestion is unavailable or a call fails, follow the AskUserQuestion Format failure fallback: \`headless\` → BLOCKED; \`interactive\` → the prose fallback (also satisfies end-of-turn). At a STOP point, stop immediately. Do not continue the workflow or call ExitPlanMode there. Commands marked "PLAN MODE EXCEPTION — ALWAYS RUN" execute. Call ExitPlanMode only after the skill workflow completes, or if the user tells you to cancel the skill or leave plan mode.`;
 }

 export function generateCompletionStatus(ctx: TemplateContext): string {
@@ -47,6 +47,11 @@ const MANDATORY: Array<{ name: string; re: RegExp }> = [
  { name: 'Completeness coverage rule', re: /Completeness\s*:/i },
  { name: 'kind-vs-coverage rule', re: /options differ in kind/i },
  { name: 'Self-check checklist', re: /Self-check before emitting/i },
+  // The runtime-failure fallback must be ALWAYS-LOADED too: when an AUQ call errors
+  // mid-skill, the model needs the prose-fallback rule in context that instant, not
+  // stranded in an on-demand section. Same guarantee as the format spec above.
+  { name: 'AUQ-failure fallback subsection', re: /When AskUserQuestion is unavailable or a call fails/i },
+  { name: 'fallback SESSION_KIND branch', re: /SESSION_KIND/ },
 ];

 /** Per-skill AUQ rules that govern review-finding cadence. A carve may move
@@ -16,6 +16,8 @@
 * for the weekly periodic eval to notice.
 */
 import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
 import type { TemplateContext } from '../scripts/resolvers/types';
 import { HOST_PATHS } from '../scripts/resolvers/types';
 import { generateAskUserFormat } from '../scripts/resolvers/preamble/generate-ask-user-format';
@@ -161,3 +163,88 @@ describe('generateAskUserFormat — 5+ option split rule (slim inline + docs poi
    expect(out).toContain('**Non-ASCII characters');
  });
 });
+
+describe('generateAskUserFormat — runtime-failure prose fallback', () => {
+  const out = generateAskUserFormat(makeCtx());
+
+  test('documents the unavailable/failed subsection', () => {
+    expect(out).toMatch(/When AskUserQuestion is unavailable or a call fails/i);
+  });
+
+  test('carves out the auto-decide denial as NOT a failure', () => {
+    expect(out).toContain('[plan-tune auto-decide]');
+    expect(out).toMatch(/NOT a failure/i);
+    // and explicitly: do not fall back to prose on an auto-decide denial
+    expect(out).toMatch(/Do NOT[\s\S]{0,40}fall back to prose|never prose/i);
+  });
+
+  test('retries the errored call exactly once before degrading', () => {
+    expect(out).toMatch(/retry the SAME call \*\*once\*\*|retry the same call.*once/i);
+    // idempotency guard against double-prompting
+    expect(out).toMatch(/double-prompt|no answer could have surfaced/i);
+  });
+
+  test('branches on SESSION_KIND: spawned / headless / interactive', () => {
+    expect(out).toContain('SESSION_KIND');
+    expect(out).toMatch(/`spawned`[\s\S]*auto-choose/);
+    expect(out).toMatch(/`headless`[\s\S]*BLOCKED/);
+    expect(out).toMatch(/`interactive`[\s\S]*prose fallback/);
+    // empty/absent SESSION_KIND degrades to interactive
+    expect(out).toMatch(/empty\/absent[\s\S]{0,40}interactive/i);
+  });
+
+  // The mandatory triad the user explicitly required for the plain-text output.
+  test('prose fallback mandates the triad: issue ELI10', () => {
+    expect(out).toMatch(/ELI10 of the issue itself/i);
+  });
+
+  test('prose fallback mandates the triad: per-choice Completeness score', () => {
+    expect(out).toMatch(/Completeness scores per choice/i);
+    expect(out).toMatch(/Completeness: X\/10.*EACH choice|on EACH choice/i);
+  });
+
+  test('prose fallback mandates the triad: recommendation + (recommended) marker', () => {
+    expect(out).toMatch(/Recommendation: <choice> because/);
+    expect(out).toMatch(/\(recommended\)`? marker on that choice/);
+  });
+
+  test('prose fallback is one paragraph per choice, not a bare bullet list', () => {
+    expect(out).toMatch(/ONE paragraph per choice/i);
+    expect(out).toMatch(/never a bare bullet list/i);
+  });
+
+  test('prose fallback tells the user to reply with a letter, then STOP', () => {
+    expect(out).toMatch(/reply with a letter/i);
+    expect(out).toMatch(/STOP and wait/i);
+  });
+
+  // OV2: the former "tool_use, not prose" assertions must carry the qualifier so the
+  // fallback is not self-contradicting. Guards against the instruction collision
+  // silently returning on a future edit.
+  test('OV2: the Format line qualifies "not prose" with the fallback exception', () => {
+    expect(out).toMatch(/must be sent as tool_use, not prose — unless the documented failure fallback/);
+  });
+
+  test('OV2: the self-check "not writing prose" line carries the fallback qualifier', () => {
+    expect(out).toMatch(/not writing prose — unless the documented failure fallback applies/);
+  });
+});
+
+describe('CQ2 — cross-file invariant: auto-decide prefix matches the hook', () => {
+  const out = generateAskUserFormat(makeCtx());
+  const hookSrc = fs.readFileSync(
+    path.resolve(__dirname, '..', 'hosts', 'claude', 'hooks', 'question-preference-hook.ts'),
+    'utf-8',
+  );
+
+  test('the hook actually emits the [plan-tune auto-decide] prefix', () => {
+    expect(hookSrc).toContain('[plan-tune auto-decide]');
+  });
+
+  test('the resolver references the exact same prefix the hook emits', () => {
+    // If a future edit reworded the hook reason, this catches the drift: the prose
+    // fallback would stop recognizing the auto-decide denial as not-a-failure.
+    const PREFIX = '[plan-tune auto-decide]';
+    expect(hookSrc.includes(PREFIX) && out.includes(PREFIX)).toBe(true);
+  });
+});