mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-17 15:20:11 +02:00
test(plan-ceo): bump --disallowedTools test timeout to 10 min
Last 5 runs showed the model under --disallowedTools spending the full 5-min budget in 'high effort thinking' before surfacing options. The LLM judge correctly reports state=working at every 30s tick, so the high-water-mark fallback never fires. 10-min budget gives the model 20 judge windows to eventually surface the question.

Outer bun timeout bumped accordingly to 660s (inner +60s).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
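The budget arithmetic in the commit message can be checked with a short sketch. The helper names `judgeWindows` and `outerTimeoutMs` are hypothetical and do not come from the gstack test harness; only the 30s judge tick, the 60s grace, and the 300_000 / 600_000 ms budgets are taken from the commit.

```typescript
// Hypothetical helpers for illustration; not part of the gstack test harness.
const JUDGE_INTERVAL_MS = 30_000; // judge tick period stated in the commit message
const OUTER_GRACE_MS = 60_000;    // grace added on top of the inner timeout

// Number of complete judge ticks that fit inside the inner budget.
function judgeWindows(timeoutMs: number, intervalMs: number = JUDGE_INTERVAL_MS): number {
  return Math.floor(timeoutMs / intervalMs);
}

// Outer (test-runner) timeout: inner budget plus a fixed grace period.
function outerTimeoutMs(innerTimeoutMs: number, graceMs: number = OUTER_GRACE_MS): number {
  return innerTimeoutMs + graceMs;
}

console.log(judgeWindows(300_000));   // 10 windows under the old 5-min budget
console.log(judgeWindows(600_000));   // 20 windows under the new 10-min budget
console.log(outerTimeoutMs(600_000)); // 660000, the new outer timeout
```

The old values follow the same rule: a 300_000 ms inner budget plus 60s of grace gives the previous 360_000 ms outer timeout.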
@@ -107,7 +107,11 @@ describeE2E('plan-ceo-review plan-mode smoke (gate)', () => {
 skillName: 'plan-ceo-review',
 inPlanMode: true,
 extraArgs: ['--disallowedTools', 'AskUserQuestion'],
-timeoutMs: 300_000,
+// 10-min budget: post-v1.28 the model under --disallowedTools sometimes
+// spends 5+ min in "high effort thinking" before surfacing options. The
+// judge fires every 30s and high-water-marks the first prose-AUQ tick;
+// 10 min gives the model 20 surfacing windows.
+timeoutMs: 600_000,
 });

 // The user must SEE the question one way or another. Three valid surfaces:
@@ -159,5 +163,5 @@ describeE2E('plan-ceo-review plan-mode smoke (gate)', () => {
 // to enforce the at-bottom contract against. The contract is
 // exercised by the periodic finding-count tests, which DO run the
 // full review.
-}, 360_000);
+}, 660_000); // outer = inner timeoutMs (600_000) + 60s grace
 });
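Why the longer budget matters can be sketched with a simplified model of the judge loop. This is an assumption-laden illustration, not the harness's actual code: the `JudgeState` labels and the `firstSurfacedTick` function are hypothetical, and only the `working` state, the 30s tick, and the 10 vs. 20 window counts come from the commit.

```typescript
// Hypothetical state labels; only "working" appears in the commit message.
type JudgeState = "working" | "question-surfaced";

// Returns the index of the first tick at which the judge saw the question,
// or null if every tick inside the budget reported "working".
function firstSurfacedTick(ticks: JudgeState[], maxTicks: number): number | null {
  const limit = Math.min(ticks.length, maxTicks);
  for (let i = 0; i < limit; i++) {
    if (ticks[i] === "question-surfaced") return i;
  }
  return null;
}

// A model that "thinks" for 12 ticks (6 min at 30s per tick), then surfaces the question.
const run: JudgeState[] = [
  ...Array<JudgeState>(12).fill("working"),
  "question-surfaced",
];

console.log(firstSurfacedTick(run, 10)); // null: the 5-min budget ends before the question appears
console.log(firstSurfacedTick(run, 20)); // 12: caught within the 10-min budget
```

Under this model, a run that stays in `working` for the whole budget never records a mark, which matches the failure described in the commit message.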