test(auq-compliance): stretch budgets to fit /plan-ceo-review Step 0F

/plan-ceo-review's Step 0F mode-selection AskUserQuestion fires after the
preamble drains: gbrain sync probe, telemetry log, learnings search,
review-readiness dashboard read, recent-artifacts recovery. On a fresh
PTY boot under concurrent test contention (max-concurrency 15), those
bash blocks sometimes consume 200-300 seconds before the first AUQ
renders. The previous 300s budget was tight enough that markersSeen=0
on both main and the contributor wave branch — the model was still
working through preamble when the harness gave up.

Composed budgets:
  - poll budget: 300s → 540s
  - PTY session timeout: 360s → 600s
  - bun test wrapper timeout: 420s → 660s

Each layer outlasts the one inside it. The harness still polls every
2s and breaks as soon as ELI10 + Recommendation + cursor are all
visible, so a fast Step 0F still finishes in seconds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-05-11 09:48:46 -07:00
parent 561594f54a
commit 1919265b49
@@ -73,7 +73,7 @@ describeE2E('AskUserQuestion format compliance (gate)', () => {
async () => {
const session = await launchClaudePty({
permissionMode: 'plan',
timeoutMs: 360_000,
timeoutMs: 600_000,
});
try {
@@ -91,7 +91,16 @@ describeE2E('AskUserQuestion format compliance (gate)', () => {
// While polling, auto-grant any permission dialogs we see in the
// recent tail (preamble side-effects: touch on a sensitive file,
// etc) so the agent isn't blocked.
const budgetMs = 300_000;
//
// Budget bumped 300s → 540s in v1.32: /plan-ceo-review's preamble runs
// multiple bash blocks (gbrain sync probe, telemetry, learnings search,
// dashboard read) before reaching its mode-selection AskUserQuestion in
// Step 0F. On substantive branches (or under contention from concurrent
// tests running at max-concurrency 15), 300s sometimes wasn't enough
// for the model to drain Step 0 work before emitting the first AUQ.
// 540s sits below the suite-level 360s/9min timeout headroom and
// tracks the same magnitude the plan-design-with-ui test uses.
const budgetMs = 540_000;
const start = Date.now();
let captured = '';
let askUserQuestionVisible = false;
@@ -191,6 +200,6 @@ describeE2E('AskUserQuestion format compliance (gate)', () => {
await session.close();
}
},
420_000,
660_000,
);
});