mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-28 20:50:05 +02:00
test: expose high-water-mark flags through PlanSkillObservation
The 2KB obs.evidence window often misses the prose-AUQ moment because
the ExitPlanMode UI ("Ready to execute" + numbered approve/reject prompt)
pushes the model's earlier option list out of the tail by the time
outcome=plan_ready fires. Tests checking "did the user see a question"
need to consult historical state, not just the truncated final tail.
Adds two optional fields to PlanSkillObservation:
- proseAUQEverObserved: true if isProseAUQVisible was true at any tick
- waitingEverObserved: true if the LLM judge ever returned 'waiting'
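A minimal sketch of how such high-water-mark flags could be latched while polling. Only the two field names and the 'waiting' verdict come from this commit message; the rest of the interface, the `Tick` type, and `latchFlags` are hypothetical names used for illustration, not the harness's actual code.

```typescript
// Hypothetical shapes for illustration. Only proseAUQEverObserved and
// waitingEverObserved are names taken from the commit message.
interface PlanSkillObservation {
  evidence: string;
  outcome: string;
  proseAUQEverObserved?: boolean;
  waitingEverObserved?: boolean;
}

// One polling tick: what the harness saw at that moment (assumed shape).
interface Tick {
  proseAUQVisible: boolean; // result of the prose-AUQ check at this tick
  judgeVerdict: string;     // LLM judge verdict at this tick, e.g. 'waiting'
}

type EverObservedFlags = Pick<
  PlanSkillObservation,
  'proseAUQEverObserved' | 'waitingEverObserved'
>;

// Latch each flag: once it has been true at any tick, it stays true.
function latchFlags(ticks: Tick[]): EverObservedFlags {
  let proseAUQEverObserved = false;
  let waitingEverObserved = false;
  for (const tick of ticks) {
    proseAUQEverObserved = proseAUQEverObserved || tick.proseAUQVisible;
    waitingEverObserved = waitingEverObserved || tick.judgeVerdict === 'waiting';
  }
  return { proseAUQEverObserved, waitingEverObserved };
}
```

Because the flags only ever move from false to true, a later tick that no longer shows the options cannot erase the earlier observation.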
The 4 plan-mode --disallowedTools tests now check these flags as part
of the surfaceVisible computation:

  isProseAUQVisible(obs.evidence) || obs.proseAUQEverObserved === true
  blockedVisible || proseAUQVisible || obs.waitingEverObserved === true
This catches the autoplan / plan-ceo / plan-eng case where the model
surfaces options briefly, fails to get a response, then keeps thinking,
eventually emitting ExitPlanMode and pushing the options out of evidence.
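The failure mode described above can be reproduced with a small simulation. This is a sketch under stated assumptions: `WINDOW` stands in for the 2KB evidence window (measured here in UTF-16 code units rather than bytes), `looksLikeProseAUQ` is a hypothetical stand-in for the real isProseAUQVisible (whose heuristics are not shown in this commit), and the transcript chunks are made up.

```typescript
const WINDOW = 2048; // assumed stand-in for the 2KB evidence tail

// Hypothetical detector: treats consecutive "A) ..." / "B) ..." lines as a
// prose-rendered option list. Not the real isProseAUQVisible.
function looksLikeProseAUQ(text: string): boolean {
  return /^A\) .+\n^B\) .+/m.test(text);
}

interface SimulationResult {
  finalTailVisible: boolean; // detector result on the final truncated tail
  everObserved: boolean;     // latched: detector was true at some tick
}

// Append output chunk by chunk, checking only the truncated tail each tick.
function simulate(chunks: string[]): SimulationResult {
  let transcript = '';
  let everObserved = false;
  for (const chunk of chunks) {
    transcript += chunk;
    const tail = transcript.slice(-WINDOW);
    everObserved = everObserved || looksLikeProseAUQ(tail);
  }
  const finalTailVisible = looksLikeProseAUQ(transcript.slice(-WINDOW));
  return { finalTailVisible, everObserved };
}

const result = simulate([
  'Reviewing the plan...\n',
  'A) Keep scope\nB) Expand scope\n',          // options surface briefly
  'thinking '.repeat(400),                      // ~3.6KB of later output
  '\nReady to execute\n1. Approve\n2. Reject\n', // ExitPlanMode-style prompt
]);

// Mirrors the shape of the updated check: current tail OR historical flag.
const surfaceVisible = result.finalTailVisible || result.everObserved;
```

In this run the final tail no longer contains the option list, so a check on the tail alone fails, while the latched flag still records that the options were shown.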
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -83,8 +83,8 @@ describeE2E('plan-eng-review plan-mode smoke (gate)', () => {
 // section in the plan file (legacy) OR a BLOCKED string in TTY OR
 // prose-rendered AUQ options in TTY.
 const blockedVisible = /BLOCKED\s*[—-]\s*AskUserQuestion/i.test(obs.evidence);
-const proseAUQVisible = isProseAUQVisible(obs.evidence);
-const surfaceVisible = blockedVisible || proseAUQVisible;
+const proseAUQVisible = isProseAUQVisible(obs.evidence) || obs.proseAUQEverObserved === true;
+const surfaceVisible = blockedVisible || proseAUQVisible || obs.waitingEverObserved === true;
 
 if (
 obs.outcome === 'auto_decided' ||