test(plan-ceo): bump --disallowedTools test timeout to 10 min

Last 5 runs showed the model under --disallowedTools spending the full
5-min budget in 'high effort thinking' before surfacing options. The LLM
judge correctly reports state=working at every 30s tick, so the
high-water-mark fallback never fires.

10-min budget gives the model 20 judge windows to eventually surface
the question. Outer bun timeout bumped accordingly to 660s (inner +60s).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Garry Tan
2026-05-08 23:30:56 -07:00
parent 7757233d53
commit c18d9fa308
+6 -2
@@ -107,7 +107,11 @@ describeE2E('plan-ceo-review plan-mode smoke (gate)', () => {
skillName: 'plan-ceo-review',
inPlanMode: true,
extraArgs: ['--disallowedTools', 'AskUserQuestion'],
-timeoutMs: 300_000,
+// 10-min budget: post-v1.28 the model under --disallowedTools sometimes
+// spends 5+ min in "high effort thinking" before surfacing options. The
+// judge fires every 30s and high-water-marks the first prose-AUQ tick;
+// 10 min gives the model 20 surfacing windows.
+timeoutMs: 600_000,
});
// The user must SEE the question one way or another. Three valid surfaces:
@@ -159,5 +163,5 @@ describeE2E('plan-ceo-review plan-mode smoke (gate)', () => {
// to enforce the at-bottom contract against. The contract is
// exercised by the periodic finding-count tests, which DO run the
// full review.
-}, 360_000);
+}, 660_000); // outer = inner timeoutMs (600_000) + 60s grace
});
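
The timeout arithmetic in this change (20 judge windows at a 30s tick, outer timeout = inner + 60s) can be sketched as a small helper. This is only an illustrative sketch: `deriveTimeoutBudget`, `TimeoutBudget`, and the two constants are hypothetical names that do not exist in the repo. The 30s tick and 60s grace values are taken from the commit message and the diff comments.

```typescript
// Hypothetical helper, not part of the repo: derives the outer test timeout
// and the number of judge windows from the inner timeoutMs.
const JUDGE_TICK_MS = 30_000; // judge fires every 30s (per the commit message)
const OUTER_GRACE_MS = 60_000; // outer bun timeout = inner + 60s grace

interface TimeoutBudget {
  innerMs: number; // value passed as timeoutMs to the harness
  outerMs: number; // value passed as the test-level timeout
  judgeWindows: number; // how many full judge ticks fit in the inner budget
}

function deriveTimeoutBudget(
  innerMs: number,
  tickMs: number = JUDGE_TICK_MS,
  graceMs: number = OUTER_GRACE_MS,
): TimeoutBudget {
  if (innerMs <= 0 || tickMs <= 0 || graceMs < 0) {
    throw new RangeError("innerMs and tickMs must be positive; graceMs must be non-negative");
  }
  return {
    innerMs,
    outerMs: innerMs + graceMs,
    judgeWindows: Math.floor(innerMs / tickMs),
  };
}

// New budget from this commit: 600_000 ms inner.
const budget = deriveTimeoutBudget(600_000);
console.log(budget);
```

Deriving the outer value from the inner one would keep the two numbers from drifting apart the next time the budget is bumped. Whether the test file should adopt such a helper is a design choice this commit does not make.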