From 64bbbb21984e1e399afe81fd427c10fdcb9740d0 Mon Sep 17 00:00:00 2001
From: Garry Tan
Date: Tue, 17 Mar 2026 14:41:13 -0700
Subject: [PATCH] =?UTF-8?q?fix:=20plan-design-review-audit=20eval=20?=
 =?UTF-8?q?=E2=80=94=20bump=20turns=20to=2030,=20add=20efficiency=20hints?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The test was flaky at 20 turns because the agent reads a 300-line SKILL.md,
navigates, extracts design data, and writes a report. Added hints to skip
preamble/batch commands/write early while still testing the real SKILL.md.
Now completes in ~13 turns consistently.

Co-Authored-By: Claude Opus 4.6
---
 test/skill-e2e.test.ts | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/test/skill-e2e.test.ts b/test/skill-e2e.test.ts
index 1d311bdf..34baafbc 100644
--- a/test/skill-e2e.test.ts
+++ b/test/skill-e2e.test.ts
@@ -2128,9 +2128,11 @@ B="${browseBin}"
 
 Read plan-design-review/SKILL.md for the design review workflow.
 
-Review the site at ${testServer.url}. Use --quick mode (homepage + 2 pages). Skip any AskUserQuestion calls — this is non-interactive. Write your audit report to ./design-audit.md. Do not offer to create DESIGN.md.`,
+Review the site at ${testServer.url}. Use --quick mode (homepage + 2 pages). Skip any AskUserQuestion calls — this is non-interactive. Write your audit report to ./design-audit.md. Do not offer to create DESIGN.md.
+
+EFFICIENCY: Skip the preamble bash block. Combine multiple browse commands into single bash blocks (e.g. run all Phase 2 JS extractions in one block). Write the report as soon as you have enough data — do not over-explore.`,
       workingDirectory: reviewDir,
-      maxTurns: 20,
+      maxTurns: 30,
       timeout: 360_000,
       testName: 'plan-design-review-audit',
       runId,
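
Reviewer note: the two changes in this patch (an appended EFFICIENCY paragraph and a higher maxTurns cap) could be expressed as a small reusable helper if other evals need the same treatment. The sketch below is only an illustration under assumptions: `EFFICIENCY_HINT`, `withEfficiencyHint`, `EvalBudget`, and `turnHeadroom` are hypothetical names that do not exist in test/skill-e2e.test.ts, and the hint text is abbreviated from the one in the diff. The 30 and 13 figures come from the commit message.

```typescript
// Hypothetical helpers; none of these names exist in test/skill-e2e.test.ts.

// Abbreviated version of the hint paragraph added by this patch.
const EFFICIENCY_HINT =
  "EFFICIENCY: Skip the preamble bash block. Combine multiple browse commands " +
  "into single bash blocks. Write the report as soon as you have enough data.";

interface EvalBudget {
  maxTurns: number;      // hard cap handed to the eval runner
  observedTurns: number; // turns a typical passing run actually used
}

// Append the hint to a base prompt, separated by exactly one blank line.
function withEfficiencyHint(basePrompt: string, hint: string = EFFICIENCY_HINT): string {
  return `${basePrompt.trimEnd()}\n\n${hint}`;
}

// Ratio of the cap to typical usage; a value close to 1 suggests the
// budget is tight enough for the test to be flaky.
function turnHeadroom(budget: EvalBudget): number {
  if (budget.observedTurns <= 0) {
    throw new RangeError("observedTurns must be positive");
  }
  return budget.maxTurns / budget.observedTurns;
}

const prompt = withEfficiencyHint("Write your audit report to ./design-audit.md.");
const headroom = turnHeadroom({ maxTurns: 30, observedTurns: 13 });
console.log(prompt.endsWith(EFFICIENCY_HINT), headroom.toFixed(2)); // prints: true 2.31
```

With the numbers reported in the commit message, the new cap leaves a bit more than 2x headroom over the typical run, which is the margin the bump from 20 to 30 is buying.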