test(periodic): AUTO_DECIDE opt-in preserved under Conductor flags

Periodic-tier eval that exercises the legitimate /plan-tune AUTO_DECIDE
path under the same flags Conductor uses (--disallowedTools
AskUserQuestion). Confirms the new Tool resolution preamble doesn't trip
opt-in users: when the user has set a never-ask preference for a
question, the model should auto-pick (outcome 'auto_decided' or
'plan_ready') rather than surface the prompt.

Setup runs in an isolated GSTACK_HOME tmpdir — never touches the user's
real ~/.gstack state. Writes question_tuning=true + a never-ask
preference for plan-ceo-review-mode (source: 'plan-tune', which bypasses
the inline-user origin gate). Spawns claude with
--disallowedTools AskUserQuestion in plan mode, runs /plan-ceo-review,
asserts outcome is NOT 'asked' (i.e., the model honored the preference).
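
A minimal sketch of the isolation and assertion steps described above, not
the actual test code. The state file name (state.json), its JSON shape, and
the helper names makeIsolatedHome, isolatedEnv, and preferenceHonored are
all assumptions made for illustration; only GSTACK_HOME, question_tuning,
plan-ceo-review-mode, the 'plan-tune' source, and the outcome labels come
from this commit.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Outcome labels named in the commit message.
type Outcome = 'asked' | 'auto_decided' | 'plan_ready';

// Create a throwaway state directory so the real ~/.gstack is never touched.
function makeIsolatedHome(): string {
  const home = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-home-'));
  // Hypothetical file name and JSON shape, for illustration only.
  const state = {
    question_tuning: true,
    preferences: {
      'plan-ceo-review-mode': { preference: 'never-ask', source: 'plan-tune' },
    },
  };
  fs.writeFileSync(path.join(home, 'state.json'), JSON.stringify(state, null, 2));
  return home;
}

// Environment for a spawned process: the parent's env with GSTACK_HOME redirected.
function isolatedEnv(home: string): Record<string, string | undefined> {
  return { ...process.env, GSTACK_HOME: home };
}

// The eval passes when the model did not surface the prompt.
function preferenceHonored(outcome: Outcome): boolean {
  return outcome !== 'asked';
}
```

The assertion is deliberately a negative one (anything except 'asked'), so
either of the two accepted outcomes passes without the test having to
predict which one the model picks.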

Periodic tier because AUTO_DECIDE behavior depends on the model adhering
to the QUESTION_TUNING preamble injection, which is non-deterministic, so
a weekly cron is a better cadence than CI gating.

Touchfiles cover the AUTO_DECIDE-bearing resolvers + the question-tuning
binaries the test setup invokes. touchfiles.test.ts count updates 19 ->
20 because auto-decide-preserved also depends on plan-ceo-review/**.
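
Why the count moves can be sketched with a toy version of touchfile
selection. This is an assumption about the general shape of the mechanism,
not the real selectTests: EXAMPLE_TOUCHFILES, matchesPattern, and
selectByTouchfiles are hypothetical names, the patterns other than
plan-ceo-review/** are made up, and the matcher only understands exact
paths and a trailing "/**".

```typescript
// Hypothetical touchfile map; the real E2E_TOUCHFILES has more entries.
const EXAMPLE_TOUCHFILES: Record<string, string[]> = {
  'autoplan-auto-mode': ['autoplan/**', 'plan-ceo-review/**'],
  'auto-decide-preserved': ['plan-ceo-review/**', 'bin/question-tuning/**'],
  'unrelated-test': ['docs/**'],
};

// Minimal matcher: an exact path, or a trailing "/**" directory prefix.
function matchesPattern(pattern: string, file: string): boolean {
  if (pattern.endsWith('/**')) {
    // 'plan-ceo-review/**' becomes the prefix 'plan-ceo-review/'.
    return file.startsWith(pattern.slice(0, -2));
  }
  return pattern === file;
}

// A test is selected when any changed file matches any of its patterns.
function selectByTouchfiles(
  changed: string[],
  touchfiles: Record<string, string[]>,
): { selected: string[]; skipped: string[] } {
  const selected: string[] = [];
  const skipped: string[] = [];
  for (const [name, patterns] of Object.entries(touchfiles)) {
    const hit = changed.some((f) => patterns.some((p) => matchesPattern(p, f)));
    (hit ? selected : skipped).push(name);
  }
  return { selected, skipped };
}
```

Under this sketch, a change to any file below plan-ceo-review/ selects every
test whose patterns include plan-ceo-review/**, so registering one more such
test raises the selected count by one and lowers the skipped count by one.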

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Garry Tan
2026-04-30 21:28:58 -07:00
parent bec54c2b40
commit 916b6ff50f
3 changed files with 143 additions and 3 deletions
+5 -3
@@ -97,10 +97,12 @@ describe('selectTests', () => {
     expect(result.selected).toContain('ask-user-question-format-pty');
     expect(result.selected).toContain('plan-ceo-mode-routing');
     expect(result.selected).toContain('autoplan-chain-pty');
-    // v1.21+ auto-mode regression: autoplan-auto-mode also depends on plan-ceo-review/**
+    // v1.21+ regression: autoplan-auto-mode + auto-decide-preserved also
+    // depend on plan-ceo-review/**
     expect(result.selected).toContain('autoplan-auto-mode');
-    expect(result.selected.length).toBe(19);
-    expect(result.skipped.length).toBe(Object.keys(E2E_TOUCHFILES).length - 19);
+    expect(result.selected).toContain('auto-decide-preserved');
+    expect(result.selected.length).toBe(20);
+    expect(result.skipped.length).toBe(Object.keys(E2E_TOUCHFILES).length - 20);
   });
 
   test('global touchfile triggers ALL tests', () => {