test(e2e): split-overflow regression for /plan-ceo-review

Periodic-tier E2E test that catches the original failure mode the
user complained about: 5+ options for ONE decision must split into
N sequential AskUserQuestion calls, not drop one to fit Conductor's
4-option cap.

Fixture: 5 independent chat-platform integration candidates
(Slack/Discord/Teams/Telegram/Mattermost), each carrying its own
include/defer/cut decision. Floor = 4 review-phase AskUserQuestion
calls (AUQs; standard [N-1] tolerance band). The pre-fix behavior,
which drops one option to fit the 4-option cap, fails this floor.
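
The floor arithmetic can be sketched as follows. This is a minimal
illustration, not the test's actual code: the helper names are
hypothetical, and it assumes the pre-fix behavior shows up as a single
AskUserQuestion call.

```typescript
// Hypothetical sketch: helper names are illustrative, not from the repo.
const candidates = ['Slack', 'Discord', 'Teams', 'Telegram', 'Mattermost'];

// [N-1] tolerance band: the floor is one fewer than the number of decisions.
function auqFloor(decisionCount: number): number {
  return Math.max(decisionCount - 1, 0);
}

// True only when the run asked at least `floor` review-phase questions.
function meetsFloor(reviewPhaseAuqCount: number, decisionCount: number): boolean {
  return reviewPhaseAuqCount >= auqFloor(decisionCount);
}

const floor = auqFloor(candidates.length);

// A run that splits into one call per candidate clears the floor.
const splitRun = meetsFloor(5, candidates.length);

// Assumption: the pre-fix run collapses everything into a single call.
const droppedRun = meetsFloor(1, candidates.length);
```

With 5 candidates the floor is 4, so a run may miss one question and
still pass, while a collapsed single-call run does not.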

Wired into test/helpers/touchfiles.ts: tier periodic, depends on
plan-ceo-review/**, the new preamble subsection, the question-pref
binary (for the carve-out), and the runner helper. touchfiles.test.ts
expected count bumped 21 → 22 to account for the new entry.
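
Why the expected count moves from 21 to 22 can be shown with a
simplified model of touchfile selection. Everything here is an
assumption for illustration: the map shape, the prefix matching, the
`selectTestsSketch` helper, and the example paths are hypothetical and
may differ from the real E2E_TOUCHFILES and selectTests.

```typescript
// Hypothetical shape: test name -> path prefixes it depends on.
const TOUCHFILES_SKETCH: Record<string, string[]> = {
  'plan-ceo-finding-floor': ['plan-ceo-review/'],
  'plan-ceo-split-overflow': ['plan-ceo-review/'], // the new entry
  'unrelated-test': ['some-other-skill/'],
};

// Simplified stand-in for selectTests: a test is selected when any
// changed file starts with one of its prefixes, otherwise it is skipped.
function selectTestsSketch(
  changedFiles: string[],
  touchfiles: Record<string, string[]>,
): { selected: string[]; skipped: string[] } {
  const selected: string[] = [];
  const skipped: string[] = [];
  for (const name of Object.keys(touchfiles)) {
    const prefixes = touchfiles[name];
    const hit = changedFiles.some((file) =>
      prefixes.some((prefix) => file.startsWith(prefix)),
    );
    (hit ? selected : skipped).push(name);
  }
  return { selected, skipped };
}

// Illustrative changed path under plan-ceo-review/.
const result = selectTestsSketch(['plan-ceo-review/example.md'], TOUCHFILES_SKETCH);
```

Adding one entry that matches plan-ceo-review/** raises the selected
count by one for any change under that directory, which is why the
existing assertion needed the bump.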

Cost: ~$0.30/run when EVALS_TIER=periodic. Skips silently otherwise.
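
The tier gate might look like the sketch below. The helper name is
hypothetical and the environment is passed in as a plain object so the
example stays self-contained; only the EVALS_TIER variable and the
`periodic` value come from this message.

```typescript
// Hypothetical helper: run the periodic test only when the tier matches.
function shouldRunPeriodic(env: Record<string, string | undefined>): boolean {
  return env.EVALS_TIER === 'periodic';
}

const runsWhenPeriodic = shouldRunPeriodic({ EVALS_TIER: 'periodic' });
const skipsWhenUnset = shouldRunPeriodic({});
const skipsOnOtherTier = shouldRunPeriodic({ EVALS_TIER: 'gate' });
```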

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Garry Tan
2026-05-26 22:27:51 -07:00
parent 975312ef3f
commit d0d8cb2db6
4 changed files with 172 additions and 2 deletions
+6 -2
@@ -105,8 +105,12 @@ describe('selectTests', () => {
     expect(result.selected).toContain('auto-decide-preserved');
     // v1.27+ gate-tier reviewCount-floor regression for transcript bug
     expect(result.selected).toContain('plan-ceo-finding-floor');
-    expect(result.selected.length).toBe(21);
-    expect(result.skipped.length).toBe(Object.keys(E2E_TOUCHFILES).length - 21);
+    // garrytan/askuserquestion-split-on-overflow: split-overflow periodic
+    // E2E test also depends on plan-ceo-review/** (5-option scope decision
+    // regression for the "drop to fit 4 options" failure mode).
+    expect(result.selected).toContain('plan-ceo-split-overflow');
+    expect(result.selected.length).toBe(22);
+    expect(result.skipped.length).toBe(Object.keys(E2E_TOUCHFILES).length - 22);
   });
   test('global touchfile triggers ALL tests', () => {