Files
gstack/test/touchfiles.test.ts
Garry Tan 6e1625c0d7 v1.25.0.0 fix: AskUserQuestion resolves to host MCP variant when native is disallowed (#1287)
* test(harness): plumb extraArgs and auto_decided outcome through PTY runner

runPlanSkillObservation now accepts extraArgs that pass through to
launchClaudePty (which already supported them at the lower level), and
exposes a new 'auto_decided' outcome detected via isAutoDecidedVisible
when the AUTO_DECIDE preamble template fires (Auto-decided ... (your
preference)).

Both pieces are needed for the v1.21+ AskUserQuestion-blocked regression
tests in the next commit. Detection order is deliberate: 'asked' (rendered
numbered list) wins over 'auto_decided' (text only, no list), which wins
over 'plan_ready' so the auto-decide evidence isn't masked by a downstream
plan-mode confirmation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): add AskUserQuestion-blocked regression cases for 6 plan-mode skills

Conductor launches Claude Code with --disallowedTools AskUserQuestion
--permission-mode default --permission-prompt-tool stdio (verified by
inspecting the live conductor claude process via ps -p ... -o args=).
Native AskUserQuestion is removed from the model's tool registry; without
fallback guidance the plan-mode skills (plan-ceo-review, plan-eng-review,
plan-design-review, plan-devex-review, autoplan, office-hours) silently
proceed and never surface decisions to the user.

Adds 6 gate-tier real-PTY regression cases:

  - 4 inline test cases inside the existing plan-X-review-plan-mode.test
    files, each exercising the same skill with extraArgs ['--disallowedTools',
    'AskUserQuestion'] and asserting outcome === 'asked'. plan-design-review
    keeps the ['asked', 'plan_ready'] envelope (legitimate short-circuit on
    no-UI-scope) but explicitly fails on 'auto_decided'.
  - 2 standalone test files for autoplan + office-hours (which had no prior
    plan-mode test). autoplan asserts the FIRST non-auto-decided gate fires
    (Phase 1 premise confirmation) — autoplan auto-decides intermediate
    questions BY DESIGN.

Touchfile entries:
  - autoplan-auto-mode + office-hours-auto-mode added to E2E_TOUCHFILES +
    E2E_TIERS (gate)
  - existing plan-X-review-plan-mode entries gain question-tuning.ts and
    generate-ask-user-format.ts touchfile deps so AUTO_DECIDE-related
    resolver changes correctly invalidate the regression tests
  - touchfiles.test.ts count updated 18 -> 19 to cover the autoplan
    touchfile dependency on plan-ceo-review/**

Filenames retain `auto-mode` for branch-history continuity. Auto-mode (the
AUTO_DECIDE preamble path when QUESTION_TUNING=true) is a related but
distinct silencing mechanism; both share the same fix surface in the
preamble.

These tests are expected to FAIL on this branch until the fix lands. The
failure is the receipt for the regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(preamble): teach the model to prefer mcp__*__AskUserQuestion when registered

When a host launches Claude Code with --disallowedTools AskUserQuestion
(Conductor does this by default — verified via ps on the live conductor
claude process), the native AskUserQuestion tool is removed from the
model's tool registry. Skill templates that say "call AskUserQuestion"
silently fail in that environment: the model can't ask, the user never
sees the question, the skill auto-proceeds without input.

The fix is preamble guidance, not a skill-template change:

  generate-ask-user-format.ts: new "Tool resolution" section at the top
  of the AskUserQuestion Format block. Tells the model that
  "AskUserQuestion" can resolve to two tools at runtime — the host MCP
  variant (e.g. mcp__conductor__AskUserQuestion, registered when the
  host injects it) and the native tool — and to PREFER any
  mcp__*__AskUserQuestion variant. Same questions/options shape; same
  decision-brief format. If neither variant is callable, fall back to
  writing a "## Decisions to confirm" section into the plan file plus
  ExitPlanMode (the native plan-mode confirmation surfaces it). Never
  silently auto-decide.

  generate-completion-status.ts: the plan-mode-info block (preamble
  position 1) now explicitly notes that AskUserQuestion satisfies plan
  mode's end-of-turn requirement for "any variant" and points at the
  Tool resolution section for the fallback path.

This puts the resolution rule in front of every tier-≥2 skill via the
preamble, so plan-mode review skills (plan-ceo-review, plan-eng-review,
plan-design-review, plan-devex-review, autoplan, office-hours) all gain
the fix without per-template surgery.

Includes regenerated SKILL.md files for all 41 skills + the 3 host-ship
golden fixtures used by test/host-config.test.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(periodic): AUTO_DECIDE opt-in preserved under Conductor flags

Periodic-tier eval that exercises the legitimate /plan-tune AUTO_DECIDE
path under the same flags Conductor uses (--disallowedTools
AskUserQuestion). Confirms the new Tool resolution preamble doesn't trip
opt-in users: when the user has set a never-ask preference for a
question, the model should auto-pick (outcome 'auto_decided' or
'plan_ready') rather than surface the prompt.

Setup runs in an isolated GSTACK_HOME tmpdir — never touches the user's
real ~/.gstack state. Writes question_tuning=true + a never-ask
preference for plan-ceo-review-mode (source: 'plan-tune', which bypasses
the inline-user origin gate). Spawns claude with
--disallowedTools AskUserQuestion in plan mode, runs /plan-ceo-review,
asserts outcome is NOT 'asked' (i.e., the model honored the preference).

Periodic tier because AUTO_DECIDE behavior depends on the model adhering
to the QUESTION_TUNING preamble injection — non-deterministic, weekly
cron is the right cadence rather than CI gating.

Touchfiles cover the AUTO_DECIDE-bearing resolvers + the question-tuning
binaries the test setup invokes. touchfiles.test.ts count updates 19 ->
20 because auto-decide-preserved also depends on plan-ceo-review/**.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v1.21.0.0: AskUserQuestion resolves to host MCP variant when native is disallowed

MINOR scale per scale-aware bumps in CLAUDE.md: substantial coordinated
multi-file change (preamble fix + new test infrastructure + 6 gate-tier
regression cases + 1 periodic eval) and a user-visible regression fix
that affects every plan-mode review skill running under Conductor's
default flag set.

User originally targeted v1.21.2.0; landing as v1.21.0.0 since this is
the first 1.21.x release on main and there's no prior 1.21.0.0/1.21.1.0
to skip past. Adjust at /ship time if a different number is preferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(harness): fix detection order + whitespace-tolerant pattern matching

Two bugs surfaced when validating the v1.21 fix end-to-end:

1. PlanSkillObservation outcome detection ran 'asked' (any numbered
   options list) BEFORE 'plan_ready'. Plan-mode's "Ready to execute?"
   confirmation IS a numbered options list (1=auto, 2=manual, ...), so
   any skill that successfully reached the native confirmation got
   misclassified as 'asked'. Reorder: 'auto_decided' (most specific,
   requires AUTO_DECIDE annotation) > 'plan_ready' (next, requires the
   "ready to execute" stem) > 'asked' (any remaining numbered list).

2. isPlanReadyVisible and isAutoDecidedVisible regexes only matched
   spaced forms ("ready to execute", "(your preference)"). stripAnsi
   removes cursor-positioning escapes (`\x1b[40C`) entirely instead of
   replacing them with spaces, so the same text can render as
   "readytoexecute" or "(yourpreference)". Both detectors now test the
   spaced form first, fall through to a whitespace-collapsed comparison.
   Inline unit smoke confirms both forms match.

Updates to the 5 strict 'asked' regression test cases (plan-ceo,
plan-eng, plan-devex, autoplan, office-hours): with the detection order
corrected, the model's plan-file fallback flow legitimately lands at
'plan_ready' instead of 'asked'. Pass envelope expanded to ['asked',
'plan_ready'] (matching plan-design-review's existing pattern). Failure
signals tightened to include 'auto_decided' (catches AUTO_DECIDE without
opt-in) plus the standard silent_write/exited/timeout. plan-design was
already on this contract from v1.21's first commit, no change needed.

The expanded envelope is correct: under --disallowedTools AskUserQuestion
the Tool resolution preamble routes the question through plan-mode's
native "Ready to execute?" surface — the user still sees the decision,
just via the plan-file flow rather than a numbered prompt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(harness): require ## Decisions section under --disallowedTools plan_ready

Adversarial review (during /ship Step 11) found that the previous gate-test
envelope ['asked', 'plan_ready'] for the AskUserQuestion-blocked regression
cases accepted the bug they exist to catch: a model that silently skips
Step 0 entirely (writes a plan with no questions, no `## Decisions to
confirm` section, just ExitPlanModes) reaches plan_ready and passes.

The fix tightens the contract in two layers:

1. Harness: PlanSkillObservation gains a `planFile?: string` field
   populated when outcome is plan_ready. extractPlanFilePath() walks the
   visible TTY buffer for "Plan saved to:", "Plan file:", or
   ".claude/plans/<name>.md" patterns and resolves tilde to absolute.
   planFileHasDecisionsSection() reads the resolved file and returns true
   if it contains a `## Decisions` heading (any form: "to confirm",
   "needed", etc.).

2. Tests: 5 of 6 regression cases now require, when outcome is plan_ready,
   that obs.planFile is set AND planFileHasDecisionsSection returns true.
   Otherwise the test fails with a "Step 0 was silently skipped" diagnosis.
   plan-design-review remains the sole exception — it legitimately
   short-circuits to plan_ready on no-UI-scope branches and we have no
   deterministic way to distinguish that from a silent skip.

This closes the loophole the adversarial review identified. The fix
preamble flow already tells the model to write `## Decisions to confirm`
when neither AUQ variant is callable — now the test verifies the model
actually did it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(harness): anchor extractPlanFilePath path captures on /Users|~|/home|/var|/tmp

Adversarial-tightened gate sweep surfaced a real bug in the path
extraction: stripAnsi collapses whitespace via cursor-positioning escape
removal, so "yet at /Users/..." in the visible buffer becomes
"yetat/Users/..." with no space between. The previous fallback pattern
`(~?\/?\S*\.claude\/plans\/[\w-]+\.md)` greedily matched non-whitespace
characters BEFORE the path, producing `yetat/Users/garrytan/.claude/...`
which then fails fs.readFileSync.

Fix: every regex now requires the path to START at a known path-anchor:
`~/`, `/Users/`, `/home/`, `/var/`, `/tmp/`, or `./`. Earlier
non-whitespace runs can't be glommed in.

Verified against the failing fixture (`yetat/Users/...`) plus the four
canonical render forms ("Plan saved to:", "Plan file:", `·`-decorated
ctrl-g hint, and the bare fallback).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:45:36 -07:00

325 lines
13 KiB
TypeScript

/**
* Unit tests for diff-based test selection.
* Free (no API calls), runs with `bun test`.
*/
import { describe, test, expect } from 'bun:test';
import { spawnSync } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
import {
matchGlob,
selectTests,
detectBaseBranch,
E2E_TOUCHFILES,
E2E_TIERS,
LLM_JUDGE_TOUCHFILES,
GLOBAL_TOUCHFILES,
} from './helpers/touchfiles';
const ROOT = path.resolve(import.meta.dir, '..');
// --- matchGlob ---
describe('matchGlob', () => {
test('** matches any depth of path segments', () => {
expect(matchGlob('browse/src/commands.ts', 'browse/src/**')).toBe(true);
expect(matchGlob('browse/src/deep/nested/file.ts', 'browse/src/**')).toBe(true);
expect(matchGlob('browse/src/cli.ts', 'browse/src/**')).toBe(true);
});
test('** does not match unrelated paths', () => {
expect(matchGlob('browse/src/commands.ts', 'qa/**')).toBe(false);
expect(matchGlob('review/SKILL.md', 'qa/**')).toBe(false);
});
test('exact match works', () => {
expect(matchGlob('SKILL.md', 'SKILL.md')).toBe(true);
expect(matchGlob('SKILL.md.tmpl', 'SKILL.md')).toBe(false);
expect(matchGlob('qa/SKILL.md', 'SKILL.md')).toBe(false);
});
test('* matches within a single segment', () => {
expect(matchGlob('test/fixtures/review-eval-enum.rb', 'test/fixtures/review-eval-enum*.rb')).toBe(true);
expect(matchGlob('test/fixtures/review-eval-enum-diff.rb', 'test/fixtures/review-eval-enum*.rb')).toBe(true);
expect(matchGlob('test/fixtures/review-eval-vuln.rb', 'test/fixtures/review-eval-enum*.rb')).toBe(false);
});
test('dots in patterns are escaped correctly', () => {
expect(matchGlob('SKILL.md', 'SKILL.md')).toBe(true);
expect(matchGlob('SKILLxmd', 'SKILL.md')).toBe(false);
});
test('** at end matches files in the directory', () => {
expect(matchGlob('qa/SKILL.md', 'qa/**')).toBe(true);
expect(matchGlob('qa/SKILL.md.tmpl', 'qa/**')).toBe(true);
expect(matchGlob('qa/templates/report.md', 'qa/**')).toBe(true);
});
});
// --- selectTests ---
describe('selectTests', () => {
test('browse/src change selects browse and qa tests', () => {
const result = selectTests(['browse/src/commands.ts'], E2E_TOUCHFILES);
expect(result.selected).toContain('browse-basic');
expect(result.selected).toContain('browse-snapshot');
expect(result.selected).toContain('qa-quick');
expect(result.selected).toContain('qa-fix-loop');
expect(result.selected).toContain('design-review-fix');
expect(result.reason).toBe('diff');
// Should NOT include unrelated tests
expect(result.selected).not.toContain('plan-ceo-review');
expect(result.selected).not.toContain('retro');
expect(result.selected).not.toContain('document-release');
});
test('skill-specific change selects only that skill and related tests', () => {
const result = selectTests(['plan-ceo-review/SKILL.md'], E2E_TOUCHFILES);
expect(result.selected).toContain('plan-ceo-review');
expect(result.selected).toContain('plan-ceo-review-selective');
expect(result.selected).toContain('plan-ceo-review-benefits');
expect(result.selected).toContain('plan-ceo-review-expansion-energy');
expect(result.selected).toContain('autoplan-core');
expect(result.selected).toContain('codex-offered-ceo-review');
expect(result.selected).toContain('plan-ceo-review-format-mode');
expect(result.selected).toContain('plan-ceo-review-format-approach');
// v1.10.2.0 plan-mode handshake entries also depend on plan-ceo-review/**
expect(result.selected).toContain('plan-ceo-review-plan-mode');
expect(result.selected).toContain('plan-mode-no-op');
expect(result.selected).toContain('e2e-harness-audit');
expect(result.selected).toContain('plan-ceo-review-prosons-cadence');
expect(result.selected).toContain('plan-review-prosons-format');
expect(result.selected).toContain('plan-review-prosons-hardstop-neg');
expect(result.selected).toContain('plan-review-prosons-neutral-neg');
// v1.13.x real-PTY E2E batch entries that also depend on plan-ceo-review/**
expect(result.selected).toContain('ask-user-question-format-pty');
expect(result.selected).toContain('plan-ceo-mode-routing');
expect(result.selected).toContain('autoplan-chain-pty');
// Per-finding count + review-report-at-bottom (v1.21.x)
expect(result.selected).toContain('plan-ceo-finding-count');
// v1.22+ AskUserQuestion-blocked regression: autoplan-auto-mode +
// auto-decide-preserved also depend on plan-ceo-review/**
expect(result.selected).toContain('autoplan-auto-mode');
expect(result.selected).toContain('auto-decide-preserved');
expect(result.selected.length).toBe(21);
expect(result.skipped.length).toBe(Object.keys(E2E_TOUCHFILES).length - 21);
});
test('global touchfile triggers ALL tests', () => {
const result = selectTests(['test/helpers/session-runner.ts'], E2E_TOUCHFILES);
expect(result.selected.length).toBe(Object.keys(E2E_TOUCHFILES).length);
expect(result.skipped.length).toBe(0);
expect(result.reason).toContain('global');
});
test('gen-skill-docs.ts is a scoped touchfile, not global', () => {
const result = selectTests(['scripts/gen-skill-docs.ts'], E2E_TOUCHFILES);
// Should select tests that list gen-skill-docs.ts in their touchfiles, not ALL tests
expect(result.selected.length).toBeGreaterThan(0);
expect(result.selected.length).toBeLessThan(Object.keys(E2E_TOUCHFILES).length);
expect(result.reason).toBe('diff');
// Should include tests that depend on gen-skill-docs.ts
expect(result.selected).toContain('skillmd-setup-discovery');
expect(result.selected).toContain('session-awareness');
expect(result.selected).toContain('journey-ideation');
// Should NOT include tests that don't depend on it
expect(result.selected).not.toContain('retro');
expect(result.selected).not.toContain('cso-full-audit');
});
test('unrelated file selects nothing', () => {
const result = selectTests(['README.md'], E2E_TOUCHFILES);
expect(result.selected).toEqual([]);
expect(result.skipped.length).toBe(Object.keys(E2E_TOUCHFILES).length);
});
test('empty changed files selects nothing', () => {
const result = selectTests([], E2E_TOUCHFILES);
expect(result.selected).toEqual([]);
});
test('multiple changed files union their selections', () => {
const result = selectTests(
['plan-ceo-review/SKILL.md', 'retro/SKILL.md.tmpl'],
E2E_TOUCHFILES,
);
expect(result.selected).toContain('plan-ceo-review');
expect(result.selected).toContain('plan-ceo-review-selective');
expect(result.selected).toContain('retro');
expect(result.selected).toContain('retro-base-branch');
// Also selects journey routing tests (*/SKILL.md.tmpl matches retro/SKILL.md.tmpl)
expect(result.selected.length).toBeGreaterThanOrEqual(4);
});
test('works with LLM_JUDGE_TOUCHFILES', () => {
const result = selectTests(['qa/SKILL.md'], LLM_JUDGE_TOUCHFILES);
expect(result.selected).toContain('qa/SKILL.md workflow');
expect(result.selected).toContain('qa/SKILL.md health rubric');
expect(result.selected).toContain('qa/SKILL.md anti-refusal');
expect(result.selected.length).toBe(3);
});
test('SKILL.md.tmpl root template selects root-dependent tests and routing tests', () => {
const result = selectTests(['SKILL.md.tmpl'], E2E_TOUCHFILES);
// Should select the 7 tests that depend on root SKILL.md
expect(result.selected).toContain('skillmd-setup-discovery');
expect(result.selected).toContain('session-awareness');
expect(result.selected).toContain('session-awareness');
// Also selects journey routing tests (SKILL.md.tmpl in their touchfiles)
expect(result.selected).toContain('journey-ideation');
// Should NOT select unrelated non-routing tests
expect(result.selected).not.toContain('plan-ceo-review');
expect(result.selected).not.toContain('retro');
});
test('global touchfiles work for LLM-judge tests too', () => {
const result = selectTests(['test/helpers/session-runner.ts'], LLM_JUDGE_TOUCHFILES);
expect(result.selected.length).toBe(Object.keys(LLM_JUDGE_TOUCHFILES).length);
});
});
// --- detectBaseBranch ---
describe('detectBaseBranch', () => {
test('detects local main branch', () => {
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'touchfiles-test-'));
const run = (cmd: string, args: string[]) =>
spawnSync(cmd, args, { cwd: dir, stdio: 'pipe', timeout: 5000 });
run('git', ['init']);
run('git', ['config', 'user.email', 'test@test.com']);
run('git', ['config', 'user.name', 'Test']);
fs.writeFileSync(path.join(dir, 'test.txt'), 'hello\n');
run('git', ['add', '.']);
run('git', ['commit', '-m', 'init']);
const result = detectBaseBranch(dir);
// Should find 'main' (or 'master' depending on git default)
expect(result).toMatch(/^(main|master)$/);
try { fs.rmSync(dir, { recursive: true, force: true }); } catch {}
});
test('returns null for empty repo with no branches', () => {
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'touchfiles-test-'));
const run = (cmd: string, args: string[]) =>
spawnSync(cmd, args, { cwd: dir, stdio: 'pipe', timeout: 5000 });
run('git', ['init']);
// No commits = no branches
const result = detectBaseBranch(dir);
expect(result).toBeNull();
try { fs.rmSync(dir, { recursive: true, force: true }); } catch {}
});
test('returns null for non-git directory', () => {
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'touchfiles-test-'));
const result = detectBaseBranch(dir);
expect(result).toBeNull();
try { fs.rmSync(dir, { recursive: true, force: true }); } catch {}
});
});
// --- Completeness: every testName in skill-e2e-*.test.ts has a TOUCHFILES entry ---
describe('TOUCHFILES completeness', () => {
test('every E2E testName has a TOUCHFILES entry', () => {
// Read all split E2E test files
const testDir = path.join(ROOT, 'test');
const e2eFiles = fs.readdirSync(testDir).filter(f => f.startsWith('skill-e2e-') && f.endsWith('.test.ts'));
let e2eContent = '';
for (const f of e2eFiles) {
e2eContent += fs.readFileSync(path.join(testDir, f), 'utf-8') + '\n';
}
// Extract all testName: 'value' entries
const testNameRegex = /testName:\s*['"`]([^'"`]+)['"`]/g;
const testNames: string[] = [];
let match;
while ((match = testNameRegex.exec(e2eContent)) !== null) {
let name = match[1];
// Handle template literals like `qa-${label}` — these expand to
// qa-b6-static, qa-b7-spa, qa-b8-checkout
if (name.includes('${')) continue; // skip template literals, check expanded forms below
testNames.push(name);
}
// Add the template-expanded testNames from runPlantedBugEval calls
const plantedBugRegex = /runPlantedBugEval\([^,]+,\s*[^,]+,\s*['"`]([^'"`]+)['"`]\)/g;
while ((match = plantedBugRegex.exec(e2eContent)) !== null) {
testNames.push(`qa-${match[1]}`);
}
expect(testNames.length).toBeGreaterThan(0);
const missing = testNames.filter(name => !(name in E2E_TOUCHFILES));
if (missing.length > 0) {
throw new Error(
`E2E tests missing TOUCHFILES entries: ${missing.join(', ')}\n` +
`Add these to E2E_TOUCHFILES in test/helpers/touchfiles.ts`,
);
}
});
test('E2E_TIERS covers exactly the same tests as E2E_TOUCHFILES', () => {
const touchfileKeys = new Set(Object.keys(E2E_TOUCHFILES));
const tierKeys = new Set(Object.keys(E2E_TIERS));
const missingFromTiers = [...touchfileKeys].filter(k => !tierKeys.has(k));
const extraInTiers = [...tierKeys].filter(k => !touchfileKeys.has(k));
if (missingFromTiers.length > 0) {
throw new Error(
`E2E tests missing TIER entries: ${missingFromTiers.join(', ')}\n` +
`Add these to E2E_TIERS in test/helpers/touchfiles.ts`,
);
}
if (extraInTiers.length > 0) {
throw new Error(
`E2E_TIERS has extra entries not in E2E_TOUCHFILES: ${extraInTiers.join(', ')}\n` +
`Remove these from E2E_TIERS or add to E2E_TOUCHFILES`,
);
}
});
test('E2E_TIERS only contains valid tier values', () => {
const validTiers = ['gate', 'periodic'];
for (const [name, tier] of Object.entries(E2E_TIERS)) {
if (!validTiers.includes(tier)) {
throw new Error(`E2E_TIERS['${name}'] has invalid tier '${tier}'. Valid: ${validTiers.join(', ')}`);
}
}
});
test('every LLM-judge test has a TOUCHFILES entry', () => {
const llmContent = fs.readFileSync(
path.join(ROOT, 'test', 'skill-llm-eval.test.ts'),
'utf-8',
);
// Extract test names from addTest({ name: '...' }) calls
const nameRegex = /name:\s*['"`]([^'"`]+)['"`]/g;
const testNames: string[] = [];
let match;
while ((match = nameRegex.exec(llmContent)) !== null) {
testNames.push(match[1]);
}
// Deduplicate (some tests call addTest with the same name)
const unique = [...new Set(testNames)];
expect(unique.length).toBeGreaterThan(0);
const missing = unique.filter(name => !(name in LLM_JUDGE_TOUCHFILES));
if (missing.length > 0) {
throw new Error(
`LLM-judge tests missing TOUCHFILES entries: ${missing.join(', ')}\n` +
`Add these to LLM_JUDGE_TOUCHFILES in test/helpers/touchfiles.ts`,
);
}
});
});