fix: reduce E2E test flakiness — pre-warm browse, simplify ship, accept multi-skill routing

Browse E2E: pre-warm Chromium in beforeAll so agent doesn't waste turns on cold
startup. Reduce maxTurns 10→3. Add CI-aware MAX_START_WAIT (8s→30s when CI=true).
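The CI-aware wait could be sketched roughly as below. This is a minimal illustration, not the code from the commit: the helper name `resolveMaxStartWait` and the two constant names are hypothetical. Only the 8s and 30s values and the `CI=true` condition come from the message above.

```typescript
// Hypothetical helper: picks the browse server start-up wait in milliseconds.
// Only the 8s / 30s values and the CI=true check are taken from the commit message.
const LOCAL_START_WAIT_MS = 8_000;
const CI_START_WAIT_MS = 30_000;

function resolveMaxStartWait(env: Record<string, string | undefined>): number {
  // CI runners tend to launch Chromium more slowly, so allow a longer wait there.
  return env.CI === 'true' ? CI_START_WAIT_MS : LOCAL_START_WAIT_MS;
}

// Example calls with explicit environments.
const ciWait = resolveMaxStartWait({ CI: 'true' });
const localWait = resolveMaxStartWait({});
```

In a Node process the call would presumably be `resolveMaxStartWait(process.env)`. Passing the environment in as an argument keeps the helper easy to test.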

Ship E2E: simplify prompt from full /ship workflow to focused VERSION bump +
CHANGELOG + commit + push. Reduce maxTurns 15→8.

Routing E2E: accept multiple valid skills for ambiguous prompts.
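One way to express "accept multiple valid skills" is a set-membership check in place of a single equality check. The sketch below is an assumption about the shape of such a check. The function name `isAcceptedRoute` and the skill names in the example are hypothetical and do not come from the test suite.

```typescript
// Hypothetical check: a routing result passes if the chosen skill is any one
// of the skills considered valid for that prompt.
function isAcceptedRoute(
  actualSkill: string | undefined,
  validSkills: readonly string[],
): boolean {
  // No skill selected at all is always a failure.
  if (actualSkill === undefined) return false;
  return validSkills.indexOf(actualSkill) !== -1;
}

// Example: an ambiguous prompt for which two skills (names invented here)
// would both be reasonable choices.
const acceptedForAmbiguousPrompt = ['review', 'ship'];
const routedOk = isAcceptedRoute('ship', acceptedForAmbiguousPrompt);
const routedBad = isAcceptedRoute('browse', acceptedForAmbiguousPrompt);
```

Listing the accepted skills per prompt keeps the test strict for unambiguous prompts (a one-element list) while tolerating either answer where the prompt genuinely fits more than one skill.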

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Garry Tan
2026-03-23 20:24:05 -07:00
parent 9dc04020a4
commit b0c7739c93
4 changed files with 18 additions and 35 deletions
+5 -2
@@ -25,6 +25,9 @@ describeIfSelected('Skill E2E tests', [
testServer = startTestServer();
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-'));
setupBrowseShims(tmpDir);
+// Pre-warm the browse server so Chromium is already launched for tests
+spawnSync(browseBin, ['goto', testServer.url], { cwd: tmpDir, timeout: 30000, stdio: 'pipe' });
});
afterAll(() => {
@@ -41,7 +44,7 @@ describeIfSelected('Skill E2E tests', [
4. $B screenshot /tmp/skill-e2e-test.png
Report the results of each command.`,
workingDirectory: tmpDir,
-maxTurns: 10,
+maxTurns: 3,
timeout: 60_000,
testName: 'browse-basic',
runId,
@@ -63,7 +66,7 @@ Report the results of each command.`,
5. $B snapshot -i -a -o /tmp/skill-e2e-annotated.png
Report what each command returned.`,
workingDirectory: tmpDir,
-maxTurns: 10,
+maxTurns: 3,
timeout: 60_000,
testName: 'browse-snapshot',
runId,