test: expand plan-mode pass envelopes to accept BLOCKED path

Three existing plan-mode regression tests previously codified the
preamble fallback as a valid PASS path under --disallowedTools
AskUserQuestion: outcome=plan_ready was accepted only when the model
wrote a "## Decisions to confirm" section. The forever-war fix deletes
that fallback, so these assertions would fail post-deletion.

Expanded envelope accepts EITHER:
- 'plan_ready' WITH (## Decisions section [legacy] OR BLOCKED string
  visible in TTY [post-fix])
- 'exited' WITH BLOCKED string visible in TTY [post-fix]

The legacy ## Decisions branch stays in the envelope so these tests
keep passing on today's code (where the fallback still exists) and
on tomorrow's code (where the model reports BLOCKED instead). Once
the deletion has been on main long enough that the cache flushes,
the legacy branch can be removed in a follow-up.

Failure signals (regressions we DO want to catch) are unchanged:
auto_decided / silent_write / timeout / exited-without-BLOCKED /
plan_ready-without-(decisions OR BLOCKED).
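
A minimal sketch of the envelope and failure signals described above,
written as a single predicate. The names `Observation`, `passesEnvelope`,
and `DECISIONS_RE` are hypothetical and are not the repo's helpers. The
BLOCKED pattern mirrors the one used in the updated tests; matching the
decisions heading against the TTY evidence is an assumption. The looser
plan-design-review envelope, which also accepts 'asked', is not modeled.

```typescript
// Hypothetical sketch: types and names are illustrative only.
type Outcome =
  | 'plan_ready'
  | 'exited'
  | 'auto_decided'
  | 'silent_write'
  | 'timeout';

interface Observation {
  outcome: Outcome;
  evidence: string; // text visible in the TTY
}

// Same pattern the updated tests use to detect the BLOCKED report.
const BLOCKED_RE = /BLOCKED\s*[—-]\s*AskUserQuestion/i;
// Assumed matcher for the legacy "## Decisions to confirm" heading.
const DECISIONS_RE = /^##\s+Decisions to confirm/im;

function passesEnvelope(obs: Observation): boolean {
  const blocked = BLOCKED_RE.test(obs.evidence);
  const decisions = DECISIONS_RE.test(obs.evidence);
  switch (obs.outcome) {
    case 'plan_ready':
      // Legacy branch (decisions section) OR post-fix branch (BLOCKED).
      return decisions || blocked;
    case 'exited':
      // Post-fix only: the model must have surfaced BLOCKED before exiting.
      return blocked;
    default:
      // auto_decided, silent_write, timeout: always a regression.
      return false;
  }
}
```

Under these assumptions, dropping the legacy branch in the follow-up
would reduce the 'plan_ready' case to `return blocked`.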

- test/skill-e2e-plan-ceo-plan-mode.test.ts (test 2 only)
- test/skill-e2e-autoplan-auto-mode.test.ts
- test/skill-e2e-plan-design-plan-mode.test.ts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Garry Tan
2026-05-08 15:25:42 -07:00
parent 4c2bcf5c17
commit 33aab2ac77
3 changed files with 86 additions and 47 deletions
+20 -12
@@ -40,10 +40,17 @@ describeE2E('plan-design-review plan-mode smoke (gate)', () => {
   // v1.21+ regression: see skill-e2e-plan-ceo-plan-mode.test.ts for the
   // contract. plan-design-review legitimately short-circuits on no-UI-scope
-  // branches, so this case keeps the same ['asked', 'plan_ready'] envelope
-  // as the baseline. The discriminating regression signals are
-  // 'auto_decided' (AUTO_DECIDE preamble fired upstream) or any failure
-  // outcome — both mean the user never saw a question they should have.
+  // branches, so this case has historically used a looser envelope.
+  //
+  // Post-v1.28 (forever-war fix), 'exited' is acceptable when BLOCKED is
+  // visible in the TTY (model correctly recognized the AUQ-unavailable
+  // failure mode and stopped). The legacy 'plan_ready' (with or without
+  // decisions section) and 'asked' paths remain valid pass outcomes.
+  //
+  // The discriminating regression signals are 'auto_decided' (AUTO_DECIDE
+  // preamble fired upstream), 'silent_write', 'timeout', or 'exited'
+  // without BLOCKED visible — all mean the user never saw a question they
+  // should have.
   test('does not silently auto-decide when --disallowedTools AskUserQuestion is set', async () => {
     const obs = await runPlanSkillObservation({
       skillName: 'plan-design-review',
@@ -52,10 +59,11 @@ describeE2E('plan-design-review plan-mode smoke (gate)', () => {
       timeoutMs: 300_000,
     });
+    const blockedVisible = /BLOCKED\s*[—-]\s*AskUserQuestion/i.test(obs.evidence);
     if (
       obs.outcome === 'auto_decided' ||
       obs.outcome === 'silent_write' ||
-      obs.outcome === 'exited' ||
       obs.outcome === 'timeout'
     ) {
       throw new Error(
@@ -65,13 +73,13 @@ describeE2E('plan-design-review plan-mode smoke (gate)', () => {
           `--- evidence (last 2KB visible) ---\n${obs.evidence}`,
       );
     }
-    // plan-design-review legitimately short-circuits to plan_ready on no-UI
-    // branches. Allow plan_ready WITHOUT a decisions section ONLY if the
-    // plan file genuinely has no UI scope (we don't have a deterministic way
-    // to check this from the test, so this skill keeps the looser envelope).
-    // Other plan-mode skills require the decisions section under
-    // --disallowedTools; design is the special case.
-    expect(['asked', 'plan_ready']).toContain(obs.outcome);
+    if (obs.outcome === 'exited' && !blockedVisible) {
+      throw new Error(
+        `plan-design-review AskUserQuestion-blocked regression: outcome=exited without BLOCKED — AskUserQuestion string in TTY. Model quit silently instead of surfacing the failure mode.\n` +
+          `--- evidence (last 2KB visible) ---\n${obs.evidence}`,
+      );
+    }
+    expect(['asked', 'plan_ready', 'exited']).toContain(obs.outcome);
     assertReportAtBottomIfPlanWritten(obs);
   }, 360_000);
 });