feat: add E2E evals for /review pre-existing bug + /ship reverted QA detection

Two new E2E tests:
- review-pre-existing-bug: plants a SQL injection in the base branch and
  verifies that Step 5.7 classifies it as INFORMATIONAL and recommends /debug
- ship-reverted-qa-commits: creates a branch with reverted fix(qa): commits
  and verifies that /ship detects them and recommends /debug

Also fixes qa-debug-prompt-logic to use the correct workingDirectory, and
makes test repo init use -b main for portability.

All 4 debug-related evals pass: $0.34 total, 94s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
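The diff below registers the two new evals in `E2E_TOUCHFILES`, which maps each eval name to the file globs it depends on. The code that consumes this map is not part of this commit, so the following is only a minimal TypeScript sketch, assuming the map is used to pick which evals to run for a given set of changed files. `globToRegExp` and `selectTests` are hypothetical helpers, and the glob support is deliberately limited to `*` and `**`.

```typescript
// Subset of the map from the diff; the real E2E_TOUCHFILES has more entries.
const E2E_TOUCHFILES: Record<string, string[]> = {
  'qa-debug-prompt-logic': ['qa/**', 'debug/**'],
  'qa-debug-escalation': ['qa/**', 'debug/**', 'browse/src/**'],
  'review-pre-existing-bug': ['review/**', 'debug/**'],
  'ship-reverted-qa-commits': ['ship/**', 'debug/**'],
};

// Hypothetical minimal glob compiler: '**' matches any characters
// (including '/'), '*' matches any characters except '/', and every
// other character is matched literally.
function globToRegExp(glob: string): RegExp {
  let re = '';
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === '*') {
      if (glob.charAt(i + 1) === '*') {
        re += '.*';
        i++; // consume the second '*'
      } else {
        re += '[^/]*';
      }
    } else {
      // Escape regex metacharacters so they match literally.
      re += c.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    }
  }
  return new RegExp(`^${re}$`);
}

// Hypothetical selector: returns the (sorted) names of evals where at
// least one changed file matches at least one of the eval's globs.
function selectTests(
  touchfiles: Record<string, string[]>,
  changedFiles: string[],
): string[] {
  return Object.entries(touchfiles)
    .filter(([, globs]) => {
      const patterns = globs.map(globToRegExp);
      return changedFiles.some((file) => patterns.some((p) => p.test(file)));
    })
    .map(([name]) => name)
    .sort();
}
```

Under these assumptions, a change anywhere under `debug/` selects all four debug evals (every entry lists `debug/**`), while a change under `ship/` selects only `ship-reverted-qa-commits`.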
Garry Tan
2026-03-18 14:49:57 -07:00
parent 49519f6130
commit bc8cab2b5b
2 changed files with 209 additions and 2 deletions
+4 -2
@@ -92,8 +92,10 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
'gstack-upgrade-happy-path': ['gstack-upgrade/**'],
// Debug escalation
-'qa-debug-prompt-logic': ['qa/**', 'debug/**'],
-'qa-debug-escalation': ['qa/**', 'debug/**', 'browse/src/**'],
+'qa-debug-prompt-logic': ['qa/**', 'debug/**'],
+'qa-debug-escalation': ['qa/**', 'debug/**', 'browse/src/**'],
+'review-pre-existing-bug': ['review/**', 'debug/**'],
+'ship-reverted-qa-commits': ['ship/**', 'debug/**'],
};
/**