mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-17 23:30:09 +02:00
836f86ab5c
The autoplan E2E surfaces a brief prose-AUQ window (the model emits
options, waits ~30s for a non-existent test responder, then resumes
thinking) that the existing polling loop misses: by judge-tick time the
buffer has moved into spinner state, so the LLM judge correctly reports
'working' and the loop times out at 5 min.
Adds two flags tracked across polling iterations:
- proseAUQEverObserved: set true on the first tick where
isProseAUQVisible returns true on the recent buffer
- waitingEverObserved: set true on the first LLM judge 'waiting' verdict
At timeout, if either flag is set, return outcome='asked' with a
summary explaining the historical signal. The model DID surface the
question; we just missed the live-state window.
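A minimal sketch of the timeout decision described above, not the code
in this commit. Only the two flag names, isProseAUQVisible, and
outcome='asked' come from this message; the Tick and Result types, the
resolveAtTimeout helper, the 'timeout' outcome, the summary wording, and
the regex detector are hypothetical stand-ins.

```typescript
// Hypothetical shapes for illustration; the real loop's types are not
// shown in this commit message.
type Verdict = "working" | "waiting";
type Outcome = "asked" | "timeout";

interface Tick {
  buffer: string;   // recent PTY buffer captured at this tick
  verdict: Verdict; // LLM judge verdict for this tick
}

interface Result {
  outcome: Outcome;
  summary: string;
}

// Stand-in detector (assumption): treats lines such as "A) ..." as a
// prose question. The real isProseAUQVisible may work differently.
function isProseAUQVisible(buffer: string): boolean {
  return /^\s*[A-D]\)\s/m.test(buffer);
}

// Models only the timeout path: every tick has elapsed without the loop
// catching a live 'asked' state, so the sticky flags decide the outcome.
function resolveAtTimeout(ticks: Tick[]): Result {
  let proseAUQEverObserved = false;
  let waitingEverObserved = false;

  for (const tick of ticks) {
    // Flags are only ever set, never cleared, so a short-lived signal
    // survives later ticks that show spinner state.
    if (isProseAUQVisible(tick.buffer)) proseAUQEverObserved = true;
    if (tick.verdict === "waiting") waitingEverObserved = true;
  }

  if (proseAUQEverObserved || waitingEverObserved) {
    const signals: string[] = [];
    if (proseAUQEverObserved) signals.push("prose AUQ seen in buffer");
    if (waitingEverObserved) signals.push("judge reported 'waiting'");
    return {
      outcome: "asked",
      summary: `timed out, but earlier ticks showed: ${signals.join(", ")}`,
    };
  }
  return { outcome: "timeout", summary: "no question observed before timeout" };
}
```

Keeping the flags sticky means one positive tick is enough, which
matches the case above where the question is gone by judge-tick time.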
A snapshot is logged with tag='prose-auq-surfaced' when
GSTACK_PTY_LOG=1, for postmortem tracing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>