Files
Garry Tan bc8cab2b5b feat: add E2E evals for /review pre-existing bug + /ship reverted QA detection
Two new E2E tests:
- review-pre-existing-bug: plants SQL injection in base branch, verifies
  Step 5.7 classifies as INFORMATIONAL and recommends /debug
- ship-reverted-qa-commits: creates branch with reverted fix(qa): commits,
  verifies /ship detects them and recommends /debug

Also fixes qa-debug-prompt-logic to use correct workingDirectory, and
ensures test repo init uses -b main for portability.

All 4 debug-related evals pass: $0.34 total, 94s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 14:49:57 -07:00
..