gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-06-17 23:30:09 +02:00

Files

Garry Tan 7757233d53 test: expose high-water-mark flags through PlanSkillObservation

The 2KB obs.evidence window often misses the prose-AUQ moment because the
ExitPlanMode UI ("Ready to execute" plus a numbered approve/reject prompt)
pushes the model's earlier option list out of the tail by the time
outcome=plan_ready fires. Tests that check "did the user see a question"
must consult historical state, not just the truncated final tail.

Adds two optional fields to PlanSkillObservation:
  - proseAUQEverObserved: true if isProseAUQVisible was true at any tick
  - waitingEverObserved: true if the LLM judge ever returned 'waiting'
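The high-water-mark idea can be sketched as a tracker that re-checks the evidence tail on every tick and latches each flag once it turns true. This is a minimal sketch, not the gstack implementation: ObservationTracker, tick, snapshot, JudgeVerdict, and the numbered-list heuristic inside isProseAUQVisible are hypothetical. Only the two field names, isProseAUQVisible's name, the 'waiting' verdict, and the ~2KB window come from this commit.

```typescript
// Hypothetical sketch. Field names and the 'waiting' verdict come from the commit;
// everything else (tracker, verdict values, heuristic) is assumed for illustration.

type JudgeVerdict = "waiting" | "working" | "done"; // only 'waiting' is from the source

interface PlanSkillObservation {
  evidence: string;               // truncated tail of the session output
  proseAUQEverObserved?: boolean; // latched: prose AUQ was visible at some tick
  waitingEverObserved?: boolean;  // latched: judge returned 'waiting' at some tick
}

const EVIDENCE_WINDOW = 2048; // the ~2KB tail window

// Stand-in heuristic; the real isProseAUQVisible is not shown in this commit.
// Treats "at least a line starting with 1. and a line starting with 2." as an option list.
function isProseAUQVisible(text: string): boolean {
  return /^\s*1[.)]\s/m.test(text) && /^\s*2[.)]\s/m.test(text);
}

class ObservationTracker {
  private transcript = "";
  private proseAUQSeen = false;
  private waitingSeen = false;

  // Called once per polling tick with newly emitted output and the judge's verdict.
  tick(newOutput: string, verdict: JudgeVerdict): void {
    this.transcript += newOutput;
    const tail = this.transcript.slice(-EVIDENCE_WINDOW);
    // High-water marks only ever flip from false to true, never back.
    if (isProseAUQVisible(tail)) this.proseAUQSeen = true;
    if (verdict === "waiting") this.waitingSeen = true;
  }

  // Final observation: truncated evidence plus the latched historical flags.
  snapshot(): PlanSkillObservation {
    return {
      evidence: this.transcript.slice(-EVIDENCE_WINDOW),
      proseAUQEverObserved: this.proseAUQSeen,
      waitingEverObserved: this.waitingSeen,
    };
  }
}
```

Feeding the tracker an option list and then 3,000 characters of later output leaves evidence without the list, while proseAUQEverObserved stays true. That is the gap the commit describes.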

The 4 plan-mode --disallowedTools tests now check these flags as part
of the surfaceVisible computation:
    isProseAUQVisible(obs.evidence) || obs.proseAUQEverObserved === true
    blockedVisible || proseAUQVisible || obs.waitingEverObserved === true
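The two surfaceVisible expressions above reduce to a small predicate. In this sketch blockedVisible is a parameter because the commit does not say how the tests derive it, isProseAUQVisible is a hypothetical stand-in (a numbered-list regex) rather than the real helper, and computeSurfaceVisible is an illustrative name.

```typescript
// Hypothetical sketch of the surfaceVisible check; not the actual test code.

interface PlanSkillObservation {
  evidence: string;
  proseAUQEverObserved?: boolean;
  waitingEverObserved?: boolean;
}

// Stand-in heuristic; the real helper's logic is not shown in the commit.
function isProseAUQVisible(evidence: string): boolean {
  return /^\s*1[.)]\s/m.test(evidence) && /^\s*2[.)]\s/m.test(evidence);
}

function computeSurfaceVisible(obs: PlanSkillObservation, blockedVisible: boolean): boolean {
  // Prose AUQ counts if it is in the final tail OR was seen at any earlier tick.
  const proseAUQVisible =
    isProseAUQVisible(obs.evidence) || obs.proseAUQEverObserved === true;
  // A 'waiting' verdict at any tick also means a question was surfaced.
  return blockedVisible || proseAUQVisible || obs.waitingEverObserved === true;
}
```

The `=== true` comparisons keep the result strictly boolean and treat a missing optional field as "not observed", so observations produced without the new fields behave exactly as before.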

This catches the autoplan / plan-ceo / plan-eng case where the model
surfaces options briefly, gets no response, and keeps thinking until it
eventually emits ExitPlanMode, which pushes the options out of evidence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-08 23:19:34 -07:00

providers

v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252)

2026-05-01 07:21:28 -07:00

agent-sdk-runner.ts

v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252)

2026-05-01 07:21:28 -07:00

benchmark-judge.ts

feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040)

2026-04-19 17:50:31 +08:00

benchmark-runner.ts

feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040)