test: add e2e and LLM eval tests for SKILL.md setup block

- 3 Agent SDK e2e tests: happy path, NEEDS_SETUP, non-git-repo
- LLM eval: setup block must score >= 4 on both clarity and actionability
- New error pattern: 'no such file or directory.*browse'

These tests catch the exact failure mode where agents can't discover
the browse binary via SKILL.md instructions.
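
A minimal sketch of how the new pattern might be applied to captured agent output. The four regexes are copied from the hunk shown in this commit (the real array may hold more entries above the visible context); the helper `findBrowseErrors` is a hypothetical name used only for illustration and is not part of the commit.

```typescript
// The four patterns visible in the diff hunk. The actual array in the
// repository may contain additional entries before these.
const BROWSE_ERROR_PATTERNS: RegExp[] = [
  /Exit code 1/,
  /ERROR: browse binary not found/,
  /Server failed to start/,
  /no such file or directory.*browse/i,
];

// Hypothetical helper: returns the source text of every pattern that
// matches the captured output. None of the patterns use the `g` flag,
// so RegExp.test() carries no state between calls.
function findBrowseErrors(output: string): string[] {
  return BROWSE_ERROR_PATTERNS
    .filter((pattern) => pattern.test(output))
    .map((pattern) => pattern.source);
}

// Node-style error: "browse" follows the message on the same line, so it matches.
const nodeStyle = findBrowseErrors(
  "Error: ENOENT: no such file or directory, open '/tmp/browse'"
);

// Shell-style error: "browse" comes before the message, so it does not match.
const shellStyle = findBrowseErrors("bash: ./browse: No such file or directory");
```

Two properties of the new regex follow from its shape: the `i` flag makes it case-insensitive, and because `.` does not match a newline, `browse` has to appear after the message on the same line. Output where the path precedes the message, as in the shell-style example, would need a separate pattern.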
Garry Tan
2026-03-14 02:02:28 -05:00
parent 8e1feb7fa2
commit 60fce976cb
3 changed files with 103 additions and 0 deletions
@@ -23,6 +23,7 @@ const BROWSE_ERROR_PATTERNS = [
/Exit code 1/,
/ERROR: browse binary not found/,
/Server failed to start/,
/no such file or directory.*browse/i,
];
export async function runSkillTest(options: {