diff --git a/CHANGELOG.md b/CHANGELOG.md
index f44af1d9..6d72f954 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,9 +2,9 @@
 
 ## [1.15.0.0] - 2026-04-26
 
-## **Skill prompts get a 25% haircut. Plan-mode E2E coverage doubles, and AUQ rendering is now testable.**
+## **Skill prompts get a 25% haircut. Plan-mode E2E coverage doubles, and AskUserQuestion rendering is now testable.**
 
-Three pieces of work in one release. First, every preamble resolver got compressed: 18 resolvers (Voice, Writing Style, AskUserQuestion Format, Completeness Principle, Plan Mode Info, Brain Sync, Routing Injection, and 11 more) lost a third of their prose without losing a single semantic rule. The full corpus of generated `SKILL.md` files dropped from 3.08 MB to 2.30 MB across 47 outputs. Second, the 5 plan-mode E2E tests added in v1.11.1.0 and rewritten in v1.12.1.0 turned out to have never actually passed — the SDK harness they used couldn't observe Claude's plan-mode confirmation UI. This release ships a real-PTY harness that drives the actual `claude` binary, watches the rendered terminal, and gets all 5 to green. Third, on top of that harness, 6 new E2E tests cover behaviors no test could reach before: AUQ format compliance, plan-design UI-scope detection (positive path), tool-budget regression, /ship idempotency end-to-end, /plan-ceo answer-routing, and /autoplan phase ordering.
+Three pieces of work in one release. First, every preamble resolver got compressed: 18 resolvers (Voice, Writing Style, AskUserQuestion Format, Completeness Principle, Plan Mode Info, Brain Sync, Routing Injection, and 11 more) lost a third of their prose without losing a single semantic rule. The full corpus of generated `SKILL.md` files dropped from 3.08 MB to 2.30 MB across 47 outputs. Second, the 5 plan-mode E2E tests added in v1.11.1.0 and rewritten in v1.12.1.0 turned out to have never actually passed — the SDK harness they used couldn't observe Claude's plan-mode confirmation UI.
This release ships a real-PTY harness that drives the actual `claude` binary, watches the rendered terminal, and gets all 5 to green. Third, on top of that harness, 6 new E2E tests cover behaviors no test could reach before: AskUserQuestion format compliance, plan-design UI-scope detection (positive path), tool-budget regression, /ship idempotency end-to-end, /plan-ceo answer-routing, and /autoplan phase ordering.
 
 ### The numbers that matter
 
@@ -18,7 +18,7 @@ Token-level reduction comes from regenerating every `SKILL.md` against the slim
 | Plan-mode E2E tests passing | 0/5 | 5/5 | +5 |
 | Plan-mode E2E wall time | ∞ (never green) | 790 s (sequential) | proven |
 | Real-PTY E2E test count | 5 | 11 | +6 |
-| Gate-tier paid E2E added | 0 | 3 | auq-format, design-with-ui, budget-regression |
+| Gate-tier paid E2E added | 0 | 3 | ask-user-question-format, design-with-ui, budget-regression |
 | Periodic-tier paid E2E added | 0 | 3 | mode-routing, ship-idempotency, autoplan-chain |
 | New helper unit tests | 0 | 23 | parser + budget regression coverage |
 
@@ -31,7 +31,7 @@ The biggest wins are the tier-≥3 plan reviews that load full preamble surface
 
 ### What this means for builders
 
-Faster every-skill startup, cheaper prompt-cache pricing on cold reads, more headroom inside the 200K context window for actual work. The plan-mode E2E tests now actually verify the skill doesn't silently write a plan file when `/plan-ceo-review` runs in plan mode. And the 3 new gate-tier tests catch a class of regression that was previously invisible: AUQ format drift (`Recommendation:` line missing), UI-scope misdetection (positive path), and tool-call budget bloat (a skill burning 3× the tools it used to). Run `bun run gen:skill-docs --host all` after pulling. The 11 plan-mode tests will run in CI on the next gate-tier eval pass.
+Faster every-skill startup, cheaper prompt-cache pricing on cold reads, more headroom inside the 200K context window for actual work.
The plan-mode E2E tests now actually verify the skill doesn't silently write a plan file when `/plan-ceo-review` runs in plan mode. And the 3 new gate-tier tests catch a class of regression that was previously invisible: AskUserQuestion format drift (`Recommendation:` line missing), UI-scope misdetection (positive path), and tool-call budget bloat (a skill burning 3× the tools it used to). Run `bun run gen:skill-docs --host all` after pulling. The 11 plan-mode tests will run in CI on the next gate-tier eval pass.
 
 ### Itemized changes
 
@@ -41,10 +41,10 @@ Faster every-skill startup, cheaper prompt-cache pricing on cold reads, more hea
 - `parseNumberedOptions(visible)` and `isPermissionDialogVisible(visible)` helpers in `claude-pty-runner.ts`. Tests can now look up an option index by its label without hard-coding positions, and auto-grant Claude Code's file-edit / workspace-trust / bash-permission dialogs that fire during preamble side-effects.
 - `findBudgetRegressions()` and `assertNoBudgetRegression()` in `test/helpers/eval-store.ts`. Pure functions returning tests that grew >2× in tools or turns vs the prior eval run, with floors at 5 prior tools / 3 prior turns to avoid noise. Env override `GSTACK_BUDGET_RATIO`.
 - 6 new real-PTY E2E tests on the harness:
-  - `skill-e2e-auq-format-compliance.test.ts` (gate, ~$0.50/run): asserts every gstack `AskUserQuestion` rendering contains the 7 mandated format elements (ELI10, Recommendation, Pros/Cons with ✅/❌, Net, `(recommended)` label).
+  - `skill-e2e-ask-user-question-format-compliance.test.ts` (gate, ~$0.50/run): asserts every gstack `AskUserQuestion` rendering contains the 7 mandated format elements (ELI10, Recommendation, Pros/Cons with ✅/❌, Net, `(recommended)` label).
   - `skill-e2e-plan-design-with-ui.test.ts` (gate, ~$0.80/run): positive coverage for `/plan-design-review` UI-scope detection.
Counterpart to the existing no-UI early-exit test — without it, a regression that flips the detector to "early-exit always" would ship undetected.
   - `skill-budget-regression.test.ts` (gate, free): branch-scoped library-only assertion that no skill burns >2× tools or turns vs its prior recorded run.
-  - `skill-e2e-plan-ceo-mode-routing.test.ts` (periodic, ~$3/run): verifies AUQ answer routing — HOLD SCOPE picks routes to rigor language, SCOPE EXPANSION picks route to expansion language.
+  - `skill-e2e-plan-ceo-mode-routing.test.ts` (periodic, ~$3/run): verifies AskUserQuestion answer routing — HOLD SCOPE picks routes to rigor language, SCOPE EXPANSION picks route to expansion language.
   - `skill-e2e-ship-idempotency.test.ts` (periodic, ~$3/run): runs `/ship` end-to-end against a real git fixture with `STATE: ALREADY_BUMPED` baked in; asserts no double-bump, no double-commit, no fixture mutation.
   - `skill-e2e-autoplan-chain.test.ts` (periodic, ~$8/run): asserts `/autoplan` phase ordering by tee'ing timestamps as each `**Phase N complete.**` marker appears.
 - `test/helpers-unit.test.ts`: 23 unit tests covering `parseNumberedOptions` edge cases (empty, partial paint, >9 options, stale-vs-fresh anchoring) and `findBudgetRegressions` (noise floor, env override, missing tool data).
@@ -72,7 +72,7 @@ Faster every-skill startup, cheaper prompt-cache pricing on cold reads, more hea
 
 #### For contributors
 
-- `test/helpers/touchfiles.ts`: 5 plan-mode test selections + e2e-harness-audit selection now point at `claude-pty-runner.ts` instead of the deleted helper. 6 new entries (`auq-format-pty`, `plan-ceo-mode-routing`, `plan-design-with-ui-scope`, `budget-regression-pty`, `ship-idempotency-pty`, `autoplan-chain-pty`) with tier classifications: 3 gate, 3 periodic.
+- `test/helpers/touchfiles.ts`: 5 plan-mode test selections + e2e-harness-audit selection now point at `claude-pty-runner.ts` instead of the deleted helper.
6 new entries (`ask-user-question-format-pty`, `plan-ceo-mode-routing`, `plan-design-with-ui-scope`, `budget-regression-pty`, `ship-idempotency-pty`, `autoplan-chain-pty`) with tier classifications: 3 gate, 3 periodic.
 - `test/e2e-harness-audit.test.ts`: recognizes `runPlanSkillObservation` as a valid coverage path alongside the legacy `canUseTool` / `runPlanModeSkillTest` patterns.
 - New unit test: `test/gen-skill-docs.test.ts` asserts plan-review preambles stay under 33 KB and the slim Voice section preserves its load-bearing semantic contract (lead-with-the-point, name-the-file, user-outcome framing, no-corporate, no-AI-vocab, user-sovereignty).
 - `test/touchfiles.test.ts`: skill-specific change selection count updated 15 → 18 to match the 6 new touchfile entries that depend on `plan-ceo-review/**`.
diff --git a/test/helpers-unit.test.ts b/test/helpers-unit.test.ts
index c57ff0d2..8585675f 100644
--- a/test/helpers-unit.test.ts
+++ b/test/helpers-unit.test.ts
@@ -3,7 +3,7 @@
  *
  * - parseNumberedOptions(visible)
  *     Parses `❯ 1.` / ` 2.` numbered-option lines out of TTY text.
- *     Used by the AUQ format-compliance and mode-routing tests to look
+ *     Used by the AskUserQuestion format-compliance and mode-routing tests to look
  *     up an option index by its label without hard-coding positions.
  *
  * - findBudgetRegressions / assertNoBudgetRegression(comparison)
@@ -117,7 +117,7 @@ describe('parseNumberedOptions', () => {
 
   test('anchors on LAST cursor when both stale and fresh fit in the tail', () => {
     // Both lists fit in the same 4KB tail (small buffer). The granted
-    // permission dialog options come first, the real AUQ comes second.
+    // permission dialog options come first, the real AskUserQuestion comes second.
     // We must return the FRESH options, not the STALE ones.
     const visible = [
       '❯ 1.
STALE_grant',
diff --git a/test/helpers/claude-pty-runner.ts b/test/helpers/claude-pty-runner.ts
index 3a5cbda2..9025448d 100644
--- a/test/helpers/claude-pty-runner.ts
+++ b/test/helpers/claude-pty-runner.ts
@@ -143,7 +143,7 @@ export function isPlanReadyVisible(visible: string): boolean {
  * option list (so isNumberedOptionListVisible matches them) but they
  * are NOT a skill's AskUserQuestion — they're claude asking the user
  * whether to grant a tool/file permission. Tests that look for skill
- * AUQs must explicitly skip these.
+ * AskUserQuestions must explicitly skip these.
  *
  * Both English phrases below are stable across recent Claude Code
  * versions. The check is permissive on whitespace because TTY rendering
@@ -206,13 +206,13 @@ export function parseNumberedOptions(
   // visually reads "1. Option" can come through as "1.Option".
   const optionRe = /^[\s❯]*([1-9])\.\s*(\S.*?)\s*$/;
   // We anchor on the LATEST `❯ 1.` line in the buffer — the cursor marker
-  // for the active AUQ. Older numbered lists (e.g., a granted permission
+  // for the active AskUserQuestion. Older numbered lists (e.g., a granted permission
   // dialog still in scrollback) sit above it and must be ignored. Without
   // this, parseNumberedOptions returns stale options after the dialog is
   // dismissed.
   const lines = tail.split('\n');
   // Anchor on the LAST `❯ 1.` line (cursor is on option 1 of the active
-  // AUQ). Greedy character classes don't help here — we need a literal
+  // AskUserQuestion). Greedy character classes don't help here — we need a literal
   // `❯` after optional leading whitespace.
   let cursorLineIdx = -1;
   for (let i = lines.length - 1; i >= 0; i--) {
diff --git a/test/helpers/touchfiles.ts b/test/helpers/touchfiles.ts
index 75b681c4..8e57e8e5 100644
--- a/test/helpers/touchfiles.ts
+++ b/test/helpers/touchfiles.ts
@@ -96,7 +96,7 @@ export const E2E_TOUCHFILES: Record = {
   // Real-PTY E2E batch (#6 new tests on the harness).
   // Each one tests behavior the SDK harness can't observe (rendered TTY,
   // numbered-option lists, multi-phase ordering, idempotency state echo).
-  'auq-format-pty': ['plan-ceo-review/**', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble/generate-completeness-section.ts', 'scripts/resolvers/preamble.ts', 'test/helpers/claude-pty-runner.ts'],
+  'ask-user-question-format-pty': ['plan-ceo-review/**', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble/generate-completeness-section.ts', 'scripts/resolvers/preamble.ts', 'test/helpers/claude-pty-runner.ts'],
   'plan-ceo-mode-routing': ['plan-ceo-review/**', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble.ts', 'test/helpers/claude-pty-runner.ts'],
   'plan-design-with-ui-scope': ['plan-design-review/**', 'test/fixtures/plans/ui-heavy-feature.md', 'test/helpers/claude-pty-runner.ts'],
   'budget-regression-pty': ['test/helpers/eval-store.ts', 'test/skill-budget-regression.test.ts'],
@@ -351,8 +351,8 @@ export const E2E_TIERS: Record = {
   // Real-PTY E2E batch — tier classification:
   //   gate: cheap, deterministic, run on every PR
   //   periodic: long-running or expensive (>$3/run), run weekly
-  'auq-format-pty': 'gate', // ~$0.50/run, single skill probe
-  'plan-ceo-mode-routing': 'periodic', // ~$3/run, deep navigation through 8-12 prior AUQs
+  'ask-user-question-format-pty': 'gate', // ~$0.50/run, single skill probe
+  'plan-ceo-mode-routing': 'periodic', // ~$3/run, deep navigation through 8-12 prior AskUserQuestions
   'plan-design-with-ui-scope': 'gate', // ~$0.80/run
   'budget-regression-pty': 'gate', // free, library-only assertion
   'ship-idempotency-pty': 'periodic', // ~$3/run, real /ship in plan mode
diff --git a/test/skill-e2e-auq-format-compliance.test.ts b/test/skill-e2e-ask-user-question-format-compliance.test.ts
similarity index 87%
rename from test/skill-e2e-auq-format-compliance.test.ts
rename to
test/skill-e2e-ask-user-question-format-compliance.test.ts
index 233246a0..f0485d85 100644
--- a/test/skill-e2e-auq-format-compliance.test.ts
+++ b/test/skill-e2e-ask-user-question-format-compliance.test.ts
@@ -16,12 +16,12 @@
  * Why real-PTY: the existing skill-e2e-plan-format tests cover what the
  * AGENT writes via the SDK (capture-to-file harness). This test covers
  * what the USER actually sees in the terminal — different bug class
- * (e.g., AUQ tool truncates long prose, conductor renderer mangles
+ * (e.g., AskUserQuestion tool truncates long prose, conductor renderer mangles
  * bullets, model collapses sections under token pressure). Two layers
  * of defense for a format-discipline regression that previously ate ~6
  * weeks of compliance drift before it was noticed.
  *
- * Trigger choice: /plan-ceo-review fires its mode-selection AUQ
+ * Trigger choice: /plan-ceo-review fires its mode-selection AskUserQuestion
  * deterministically and early (Step 0F), so we don't need to drive
  * through any prior questions to reach a format check.
  *
@@ -69,7 +69,7 @@ function findFormatGaps(visible: string): FormatGap[] {
 
 describeE2E('AskUserQuestion format compliance (gate)', () => {
   test(
-    'first AUQ from /plan-ceo-review contains all 7 mandated format elements',
+    'first AskUserQuestion from /plan-ceo-review contains all 7 mandated format elements',
     async () => {
       const session = await launchClaudePty({
         permissionMode: 'plan',
@@ -82,10 +82,10 @@ describeE2E('AskUserQuestion format compliance (gate)', () => {
       const since = session.mark();
       session.send('/plan-ceo-review\r');
 
-      // Wait for a SKILL AUQ. Strategy: poll the visible buffer until it
+      // Wait for a SKILL AskUserQuestion. Strategy: poll the visible buffer until it
       // contains both a numbered-option list AND the format markers we
       // expect (ELI10 + Recommendation).
When both are present, it IS a
-      // real format-compliant AUQ — not a permission dialog or trust
+      // real format-compliant AskUserQuestion — not a permission dialog or trust
       // prompt.
       //
       // While polling, auto-grant any permission dialogs we see in the
@@ -94,7 +94,7 @@ describeE2E('AskUserQuestion format compliance (gate)', () => {
       const budgetMs = 300_000;
       const start = Date.now();
       let captured = '';
-      let auqVisible = false;
+      let askUserQuestionVisible = false;
       let lastPermSig = '';
       // Snapshot debug counters every poll so the timeout error shows
       // WHY we never matched (cursor-found vs markers-found discrepancy).
@@ -106,20 +106,20 @@ describeE2E('AskUserQuestion format compliance (gate)', () => {
         await Bun.sleep(2000);
         if (session.exited()) {
           throw new Error(
-            `claude exited (code=${session.exitCode()}) before AUQ rendered.\n` +
+            `claude exited (code=${session.exitCode()}) before AskUserQuestion rendered.\n` +
               `Last visible:\n${session.visibleSince(since).slice(-2000)}`,
           );
         }
         const visible = session.visibleSince(since);
         // Marker check: anywhere in the post-slash region. Since `since`
         // is set right after sending /plan-ceo-review, there's no stale
-        // AUQ above this line — the only AUQ that can produce these
+        // AskUserQuestion above this line — the only AskUserQuestion that can produce these
         // markers is the current one.
         const hasEli10 = /ELI10\s*:/i.test(visible);
         const hasRecommend = /Recommendation\s*:/i.test(visible);
 
         // Cursor check: a numbered option list near the bottom of the
-        // buffer means the AUQ is currently rendered (not scrolled away).
+        // buffer means the AskUserQuestion is currently rendered (not scrolled away).
         const cursorTail = visible.slice(-4000);
         const hasCursor = isNumberedOptionListVisible(cursorTail) &&
           parseNumberedOptions(cursorTail).length >= 2;
@@ -129,7 +129,7 @@ describeE2E('AskUserQuestion format compliance (gate)', () => {
 
         // Permission dialog branch: grant once per unique rendering, but
         // only when we don't already have format markers visible (so we
-        // don't accidentally grant a permission inside a real AUQ).
+        // don't accidentally grant a permission inside a real AskUserQuestion).
         if (
           hasCursor &&
           !(hasEli10 && hasRecommend) &&
@@ -144,18 +144,18 @@ describeE2E('AskUserQuestion format compliance (gate)', () => {
           }
         }
 
-        // Real AUQ check: cursor visible AND markers present anywhere in
+        // Real AskUserQuestion check: cursor visible AND markers present anywhere in
         // the post-slash region.
         if (hasCursor && hasEli10 && hasRecommend) {
           debugBothSeen++;
           captured = visible;
-          auqVisible = true;
+          askUserQuestionVisible = true;
           break;
         }
       }
-      if (!auqVisible) {
+      if (!askUserQuestionVisible) {
         throw new Error(
-          `AUQ not rendered within ${budgetMs}ms.\n` +
+          `AskUserQuestion not rendered within ${budgetMs}ms.\n` +
           `Debug counts: cursorSeen=${debugCursorSeen} markersSeen=${debugMarkersSeen} bothSeen=${debugBothSeen}\n` +
           `Last visible (4KB):\n${session.visibleSince(since).slice(-4000)}`,
         );
@@ -165,7 +165,7 @@ describeE2E('AskUserQuestion format compliance (gate)', () => {
         // Surface the captured text last 3KB on failure for debugging.
         const tail = captured.slice(-3000);
         throw new Error(
-          `AUQ format compliance FAILED — missing ${gaps.length} mandated field(s):\n` +
+          `AskUserQuestion format compliance FAILED — missing ${gaps.length} mandated field(s):\n` +
             gaps.map(g => ` - ${g.field} (regex: ${g.re.source})`).join('\n') +
             `\n--- captured (last 3KB) ---\n${tail}`,
         );
diff --git a/test/skill-e2e-autoplan-chain.test.ts b/test/skill-e2e-autoplan-chain.test.ts
index adf85803..b5e3ce74 100644
--- a/test/skill-e2e-autoplan-chain.test.ts
+++ b/test/skill-e2e-autoplan-chain.test.ts
@@ -99,7 +99,7 @@ describeE2E('/autoplan chain ordering (periodic)', () => {
         const visible = session.visibleSince(since);
 
         // Auto-grant any permission dialog so autoplan can keep moving
-        // through its phases. The autoplan template auto-decides AUQs
+        // through its phases. The autoplan template auto-decides AskUserQuestions
         // it owns; only permission prompts (file/tool grants) need our
         // hand-pressing. Classify on tail to avoid stale matches.
         const recentTail = visible.slice(-1500);
diff --git a/test/skill-e2e-plan-ceo-mode-routing.test.ts b/test/skill-e2e-plan-ceo-mode-routing.test.ts
index adb75449..4e85ed64 100644
--- a/test/skill-e2e-plan-ceo-mode-routing.test.ts
+++ b/test/skill-e2e-plan-ceo-mode-routing.test.ts
@@ -11,9 +11,9 @@
  * the question but the agent ignores the choice (e.g. always defaults
  * to EXPANSION) would not be caught by any prior test.
  *
- * Tier: periodic (not gate). Each run navigates 8-12 prior AUQs (telemetry,
+ * Tier: periodic (not gate). Each run navigates 8-12 prior AskUserQuestions (telemetry,
  * proactive, routing, vendoring, brain, office-hours, premise×3, approach)
- * before reaching Step 0F. At ~30s per AUQ that's a 4-6 min navigation
+ * before reaching Step 0F. At ~30s per AskUserQuestion that's a 4-6 min navigation
  * phase per case. The full 2-case suite runs ~12-15 min, $3-4. Too slow
  * for gate-tier; weekly is fine.
  *
@@ -57,20 +57,20 @@ const CASES: ModeCase[] = [
 ];
 
 /**
- * Navigate prior AUQs by picking option 1 until we hit an AUQ whose
+ * Navigate prior AskUserQuestions by picking option 1 until we hit an AskUserQuestion whose
  * options match one of the 4 mode names. Returns the option index
- * matching `targetMode`, with the buffer marker pointing AT that AUQ.
+ * matching `targetMode`, with the buffer marker pointing AT that AskUserQuestion.
  *
- * Throws if we don't reach the mode AUQ within `maxNav` prior AUQs or
+ * Throws if we don't reach the mode AskUserQuestion within `maxNav` prior AskUserQuestions or
  * the overall budget.
  */
-async function navigateToModeAuq(
+async function navigateToModeAskUserQuestion(
   session: ClaudePtySession,
   since: number,
   targetMode: ModeCase['mode'],
   opts: { maxNav?: number; budgetMs?: number } = {},
 ): Promise<{ modeIndex: number; visibleAtMode: string }> {
-  // /plan-ceo-review's mode AUQ (Step 0F) sits behind several preamble
+  // /plan-ceo-review's mode AskUserQuestion (Step 0F) sits behind several preamble
   // and Step 0A-0C-bis gates: telemetry, proactive, routing, vendoring,
   // brain privacy, office-hours offer, premise challenge (3 questions),
   // approach selection. 12 hops is the conservative ceiling.
@@ -100,12 +100,12 @@ async function navigateToModeAuq(
     if (sig === lastSig) continue;
     lastSeenList = opts;
 
-    // Is THIS the mode AUQ?
+    // Is THIS the mode AskUserQuestion?
     if (opts.some(o => MODE_RE.test(o.label))) {
       const target = opts.find(o => o.label.toUpperCase().includes(targetMode));
       if (!target) {
         throw new Error(
-          `Mode AUQ rendered but target "${targetMode}" not in option labels:\n` +
+          `Mode AskUserQuestion rendered but target "${targetMode}" not in option labels:\n` +
             opts.map(o => ` ${o.index}. ${o.label}`).join('\n'),
         );
       }
@@ -121,10 +121,10 @@ async function navigateToModeAuq(
       continue;
     }
 
-    // Not the mode AUQ — answer with option 1 (recommended) and continue.
+    // Not the mode AskUserQuestion — answer with option 1 (recommended) and continue.
     if (priorAnswered >= maxNav) {
       throw new Error(
-        `Navigated ${maxNav} prior AUQs without reaching the mode AUQ. ` +
+        `Navigated ${maxNav} prior AskUserQuestions without reaching the mode AskUserQuestion. ` +
           `Last list:\n${opts.map(o => ` ${o.index}. ${o.label}`).join('\n')}`,
       );
     }
@@ -133,7 +133,7 @@ async function navigateToModeAuq(
     // Give the agent a beat to advance before re-polling.
     await Bun.sleep(2000);
   }
-  throw new Error(`Mode AUQ not reached within ${budgetMs}ms`);
+  throw new Error(`Mode AskUserQuestion not reached within ${budgetMs}ms`);
 }
 
 describeE2E('/plan-ceo-review mode routing (gate)', () => {
@@ -150,13 +150,13 @@ describeE2E('/plan-ceo-review mode routing (gate)', () => {
         const since = session.mark();
         session.send('/plan-ceo-review\r');
 
-        const { modeIndex } = await navigateToModeAuq(session, since, c.mode);
+        const { modeIndex } = await navigateToModeAskUserQuestion(session, since, c.mode);
 
         // Snapshot the visible buffer at mode-pick time, then send the index.
         const sincePick = session.rawOutput().length;
         session.send(`${modeIndex}\r`);
 
-        // Wait for downstream evidence: either next AUQ or plan_ready or
+        // Wait for downstream evidence: either next AskUserQuestion or plan_ready or
         // a posture-distinctive substring shows up.
         const budgetMs = 240_000;
         const start = Date.now();
@@ -183,7 +183,7 @@ describeE2E('/plan-ceo-review mode routing (gate)', () => {
             isNumberedOptionListVisible(downstreamSnapshot) &&
             !c.postureRe.test(downstreamSnapshot)
           ) {
-            // Plan-ready AND a follow-up AUQ are both visible but
+            // Plan-ready AND a follow-up AskUserQuestion are both visible but
             // posture text has not appeared yet. Keep polling for a bit.
           }
         }
diff --git a/test/skill-e2e-plan-design-with-ui.test.ts b/test/skill-e2e-plan-design-with-ui.test.ts
index bb56b143..8d6c87c5 100644
--- a/test/skill-e2e-plan-design-with-ui.test.ts
+++ b/test/skill-e2e-plan-design-with-ui.test.ts
@@ -3,7 +3,7 @@
  *
  * Counterpart to the existing no-UI early-exit test. When the input plan
  * DOES describe UI changes, /plan-design-review must NOT early-exit and
- * must reach a real skill numbered-option AUQ (its first design-rating
+ * must reach a real skill numbered-option AskUserQuestion (its first design-rating
  * question), with the captured evidence NOT echoing the early-exit phrase.
  *
  * Why: today we only test the negative path (no-UI → early-exit). A
@@ -37,7 +37,7 @@ const FIXTURE = path.join(ROOT, 'test', 'fixtures', 'plans', 'ui-heavy-feature.m
 
 describeE2E('/plan-design-review with UI scope (gate)', () => {
   test(
-    'reaches a real skill AUQ (or plan_ready) without echoing the no-UI early-exit phrase',
+    'reaches a real skill AskUserQuestion (or plan_ready) without echoing the no-UI early-exit phrase',
     async () => {
       const fixtureRelPath = path.relative(ROOT, FIXTURE);
 
@@ -47,7 +47,7 @@ describeE2E('/plan-design-review with UI scope (gate)', () => {
         timeoutMs: 480_000,
       });
 
-      let outcome: 'real_auq' | 'plan_ready' | 'timeout' | 'exited' = 'timeout';
+      let outcome: 'real_question' | 'plan_ready' | 'timeout' | 'exited' = 'timeout';
       let evidence = '';
       let debugBuffer = ''; // captured at end so timeout error has data
 
@@ -86,13 +86,13 @@ describeE2E('/plan-design-review with UI scope (gate)', () => {
           // in visibleSince(since) and would otherwise re-trigger forever.
           const recentTail = visible.slice(-2500);
 
-          // Real skill AUQ visible (not a permission dialog)?
+          // Real skill AskUserQuestion visible (not a permission dialog)?
           if (
             isNumberedOptionListVisible(recentTail) &&
             parseNumberedOptions(recentTail).length >= 2 &&
             !isPermissionDialogVisible(recentTail)
           ) {
-            outcome = 'real_auq';
+            outcome = 'real_question';
             evidence = visible.slice(-3000);
             break;
           }
@@ -122,7 +122,7 @@ describeE2E('/plan-design-review with UI scope (gate)', () => {
         await session.close();
       }
 
-      // PASS: real_auq or plan_ready, AND evidence does NOT echo the
+      // PASS: real_question or plan_ready, AND evidence does NOT echo the
       // early-exit phrase.
       if (outcome === 'exited' || outcome === 'timeout') {
         throw new Error(
diff --git a/test/touchfiles.test.ts b/test/touchfiles.test.ts
index 871f0c82..0d9ada4b 100644
--- a/test/touchfiles.test.ts
+++ b/test/touchfiles.test.ts
@@ -94,7 +94,7 @@ describe('selectTests', () => {
     expect(result.selected).toContain('plan-review-prosons-hardstop-neg');
     expect(result.selected).toContain('plan-review-prosons-neutral-neg');
     // v1.13.x real-PTY E2E batch entries that also depend on plan-ceo-review/**
-    expect(result.selected).toContain('auq-format-pty');
+    expect(result.selected).toContain('ask-user-question-format-pty');
     expect(result.selected).toContain('plan-ceo-mode-routing');
     expect(result.selected).toContain('autoplan-chain-pty');
     expect(result.selected.length).toBe(18);
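
Reviewer note on the stale-vs-fresh anchoring that the `claude-pty-runner.ts` comments in this diff describe. The sketch below is a guess at the general shape of that logic, not the code in the repository: `parseOptionsSketch`, `NumberedOptionSketch`, `tailChars`, and the rule of stopping when numbering falls out of sequence are all assumptions made for illustration. Only the option regex and the "anchor on the last `❯ 1.` line" idea come from the hunks above.

```typescript
// Hypothetical sketch; names and stop conditions are illustrative assumptions,
// not the actual parseNumberedOptions implementation.
interface NumberedOptionSketch {
  index: number;
  label: string;
}

function parseOptionsSketch(visible: string, tailChars = 4096): NumberedOptionSketch[] {
  // Only look at the end of the buffer (counted in characters, not bytes).
  const tail = visible.slice(-tailChars);
  const lines = tail.split('\n');

  // Option regex as it appears in the diff context lines.
  const optionRe = /^[\s❯]*([1-9])\.\s*(\S.*?)\s*$/;
  // Cursor line: a literal cursor glyph after optional whitespace, then "1.".
  const cursorRe = /^\s*❯\s*1\./;

  // Scan backwards so the LAST cursor line wins over stale lists in scrollback.
  let cursorLineIdx = -1;
  for (let i = lines.length - 1; i >= 0; i--) {
    if (cursorRe.test(lines[i] ?? '')) {
      cursorLineIdx = i;
      break;
    }
  }
  if (cursorLineIdx === -1) return [];

  // Collect options from the anchor downwards while numbering stays in sequence.
  const options: NumberedOptionSketch[] = [];
  for (let i = cursorLineIdx; i < lines.length; i++) {
    const m = optionRe.exec(lines[i] ?? '');
    if (!m) continue; // assumption: skip wrapped description lines between options
    const index = Number(m[1]);
    if (index !== options.length + 1) break; // assumption: out-of-sequence ends the list
    options.push({ index, label: m[2] ?? '' });
  }
  return options;
}

// Usage: a dismissed permission dialog sits above the active question.
const demoBuffer = [
  '❯ 1. STALE_grant',
  '  2. STALE_deny',
  'some intermediate output',
  '❯ 1. FRESH_hold',
  '  2. FRESH_expand',
  '  3. FRESH_reduce',
].join('\n');
const demoOptions = parseOptionsSketch(demoBuffer);
console.log(demoOptions.map(o => o.label).join(', ')); // FRESH_hold, FRESH_expand, FRESH_reduce
```

Scanning from the end of the buffer is what keeps a granted permission dialog that is still in scrollback from shadowing the active question. A forward scan would return the stale list first.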