Files
gstack/test
Garry Tan 4a56b882ab fix: make planted-bug evals resilient to max_turns and browse error flakes
- Accept error_max_turns as valid exit for planted-bug evals (agent may
  have written partial report before running out of turns)
- Browse snapshot: log browseErrors as warnings instead of hard assertions
  (agent sometimes hallucinates paths like "baltimore" vs "bangalore")
- Fall back to result.output when no report file exists
- What matters is detection rate (outcome judge), not turn completion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 05:29:40 -05:00
..