test: expose high-water-mark flags through PlanSkillObservation

The 2KB obs.evidence window often misses the prose-AUQ moment because
the ExitPlanMode UI ("Ready to execute" plus the numbered approve/reject
prompt) pushes the model's earlier option list out of the tail by the
time outcome=plan_ready fires. Tests that check "did the user see a
question" need to consult historical state, not just the truncated
final tail.

Adds two optional fields to PlanSkillObservation:
  - proseAUQEverObserved: true if isProseAUQVisible was true at any tick
  - waitingEverObserved: true if the LLM judge ever returned 'waiting'
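
A minimal sketch of how such latching flags could be maintained, for
illustration only. Only the two field names come from this commit; the
HighWaterMarks class, recordTick, annotate, the JudgeVerdict values other
than 'waiting', and the trimmed-down observation shape are assumptions,
not the harness's actual code.

```typescript
// Trimmed-down, assumed shape; the real PlanSkillObservation has more fields.
interface PlanSkillObservation {
  outcome: string;
  evidence: string;
  proseAUQEverObserved?: boolean; // field added by this commit
  waitingEverObserved?: boolean; // field added by this commit
}

// Only 'waiting' is named in the commit; the other verdicts are placeholders.
type JudgeVerdict = 'waiting' | 'working' | 'done';

// Hypothetical helper: flags latch to true and are never reset.
class HighWaterMarks {
  private proseAUQ = false;
  private waiting = false;

  // Call once per polling tick with that tick's readings.
  recordTick(proseAUQVisibleNow: boolean, verdict: JudgeVerdict): void {
    this.proseAUQ = this.proseAUQ || proseAUQVisibleNow;
    this.waiting = this.waiting || verdict === 'waiting';
  }

  // Copy the latched flags onto the final observation.
  annotate(obs: PlanSkillObservation): PlanSkillObservation {
    return {
      ...obs,
      proseAUQEverObserved: this.proseAUQ,
      waitingEverObserved: this.waiting,
    };
  }
}
```

Because the flags only ever move from false to true, a later tick where
the options have scrolled out of the window cannot erase the earlier
sighting.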

The 4 plan-mode --disallowedTools tests now check these flags as part
of the surfaceVisible computation:
    isProseAUQVisible(obs.evidence) || obs.proseAUQEverObserved === true
    blockedVisible || proseAUQVisible || obs.waitingEverObserved === true

This catches the autoplan / plan-ceo / plan-eng case where the model
surfaces options briefly, fails to get a response, then keeps thinking
and eventually emits ExitPlanMode, pushing the options out of evidence.
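
The failure mode can be reproduced in a toy simulation. This is a sketch
under stated assumptions: looksLikeOptionList is a hypothetical stand-in
for isProseAUQVisible (whose real logic is not shown here), WINDOW
assumes the 2KB limit is 2048 characters, and simulate is an invented
helper.

```typescript
// Hypothetical stand-in for isProseAUQVisible: looks for "A)" and "B)" lines.
function looksLikeOptionList(text: string): boolean {
  return /^\s*A\)/m.test(text) && /^\s*B\)/m.test(text);
}

// Assumed size of the evidence tail, in characters.
const WINDOW = 2048;

function tail(s: string, n: number): string {
  return s.slice(-n);
}

// Feed transcript chunks one tick at a time; compare the final-tail check
// against a latched "ever visible" flag.
function simulate(ticks: string[]): { finalVisible: boolean; everVisible: boolean } {
  let transcript = '';
  let everVisible = false;
  for (const chunk of ticks) {
    transcript += chunk;
    everVisible = everVisible || looksLikeOptionList(tail(transcript, WINDOW));
  }
  return {
    finalVisible: looksLikeOptionList(tail(transcript, WINDOW)),
    everVisible,
  };
}

const result = simulate([
  'Pick one:\nA) ship\nB) hold\n',
  // Long follow-up output plus the approve/reject prompt evicts the options.
  'x'.repeat(3000) + '\nReady to execute\n1. approve\n2. reject\n',
]);
```

Here the final tail no longer contains the option list, while the
latched flag still records that it was shown at an earlier tick.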

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Garry Tan
2026-05-08 23:19:34 -07:00
parent 6c13b5e657
commit 7757233d53
5 changed files with 35 additions and 8 deletions
+2 -2
@@ -63,8 +63,8 @@ describeE2E('plan-design-review plan-mode smoke (gate)', () => {
 // Surface visibility check (same as ceo / autoplan migrations): user
 // must SEE the question via BLOCKED string OR prose-rendered AUQ options.
 const blockedVisible = /BLOCKED\s*[—-]\s*AskUserQuestion/i.test(obs.evidence);
-const proseAUQVisible = isProseAUQVisible(obs.evidence);
-const surfaceVisible = blockedVisible || proseAUQVisible;
+const proseAUQVisible = isProseAUQVisible(obs.evidence) || obs.proseAUQEverObserved === true;
+const surfaceVisible = blockedVisible || proseAUQVisible || obs.waitingEverObserved === true;
 if (
 obs.outcome === 'auto_decided' ||