Calibration run 1 surfaced a second issue beyond the parser bug: the
default pick of 1 on /plan-ceo-review's scope-selection AUQ routes
the agent to "branch diff vs main" — so it reviews the gstack PR
itself (recursive!) instead of the seeded fixture plan we sent.
Added firstAUQPick callback to runPlanSkillCounting. Override applies
only to the FIRST AUQ; subsequent presses keep using defaultPick.
ceoStep0Boundary now fires on either the mode-pick AUQ (existing path)
or any AUQ containing "Skip interview and plan immediately" — which
is the scope-selection AUQ. Picking that option bypasses Step 0 and
routes straight to review-phase using the chat-paste plan as context.
Plan-ceo test wires firstAUQPick = pickSkipInterview which finds the
"Skip interview" option by label. Falls back to "describe inline" if
the option labels change.
Two new unit tests: ceoStep0Boundary fires on the scope-selection
fixture; existing mode-pick fixture still fires.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each test drives its plan-* skill through Step 0 then asserts the
review-phase AskUserQuestion count falls in [N-1, N+2] for an N=5
seeded plan, plus D19: produced plan file ends with
"## GSTACK REVIEW REPORT" as its last "## " heading.
plan-ceo also runs a paired-finding positive control: 2 deliberately
related findings should still produce 2 distinct AUQs, not 1 batched.
Periodic-tier (gate-skipped without EVALS=1, EVALS_TIER=periodic).
Sequential execution by plan §D15. Each fixture is inline TypeScript
content delivered as a follow-up message after the slash command, per
the proven pattern at skill-e2e-plan-design-with-ui.test.ts.
Calibration loop (5 runs per skill) and the manual pre-merge negative
check (D7 + D12) are required before merge per plan §Verification.
NOT yet run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>