mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-01 19:25:10 +02:00
69733e2622
* test: add AskUserQuestion format regression eval for plan reviews

  Four-case periodic-tier eval that captures the verbatim AskUserQuestion text
  /plan-ceo-review and /plan-eng-review produce, then asserts the format rule
  is honored: RECOMMENDATION always, Completeness: N/10 only on
  coverage-differentiated options, and an explicit "options differ in kind"
  note on kind-differentiated options.

  Cases:
  - plan-ceo-review mode selection (kind-differentiated)
  - plan-ceo-review approach menu (coverage-differentiated)
  - plan-eng-review per-issue coverage decision
  - plan-eng-review per-issue architectural choice (kind-differentiated)

  Classified periodic because behavior depends on Opus non-determinism —
  gate-tier would flake and block merges.

  Test harness instructs the agent to write its would-be AskUserQuestion text
  to $OUT_FILE rather than invoke a real tool (MCP AskUserQuestion isn't wired
  in the test subprocess). Regex predicates then validate the captured content.

  Cost: ~$2 per full run.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(plan-reviews): restore RECOMMENDATION + split Completeness by question type

  Opus 4.7 users reported /plan-ceo-review and /plan-eng-review stopped
  emitting the RECOMMENDATION line and per-option Completeness: X/10 scores.
  E2E capture showed the real failure mode: on kind-differentiated questions
  (mode selection, architectural A-vs-B, cherry-pick), Opus 4.7 either
  fabricated filler scores (10/10 on every option — conveys nothing) or
  dropped the format entirely when the metric didn't fit.

  Fix is at two layers:
  1. scripts/resolvers/preamble/generate-ask-user-format.ts splits the old
     run-on step 3 into:
     - Step 3 "Recommend (ALWAYS)": RECOMMENDATION is required on every
       question, coverage- or kind-differentiated.
     - Step 4 "Score completeness (when meaningful)": emit Completeness: N/10
       only when options differ in coverage. When options differ in kind, skip
       the score and include a one-line explanatory note. Do not fabricate
       scores.
  2. scripts/resolvers/preamble/generate-completeness-section.ts updates the
     Completeness Principle tail to match. Without this, the preamble
     contained two rules (one conditional, one unconditional) and the model
     hedged.

  Template anchors reinforce the distinction where agent judgment is most
  likely to drift:
  - plan-ceo-review Section 0C-bis (approach menu) gets the
    coverage-differentiated anchor.
  - plan-ceo-review Section 0F (mode selection) gets the kind-differentiated
    anchor.
  - plan-eng-review CRITICAL RULE section gets the coverage-vs-kind rule for
    every per-issue AskUserQuestion raised during the review.

  Regenerated SKILL.md for all T2 skills + golden fixtures refreshed. Every
  skill using the T2 preamble now has the same conditional scoring rule.

  Verified via new periodic-tier eval (test/skill-e2e-plan-format.test.ts):
  all 4 cases fail on prior behavior, all 4 pass with this fix.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.6.2.0)

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: add Codex eval for AskUserQuestion format compliance

  Four-case periodic-tier eval mirrors test/skill-e2e-plan-format.test.ts but
  drives the plan review skills via codex exec instead of claude -p.

  Context: Codex under the gpt.md "No preamble / Prefer doing over listing"
  overlay tends to skip the Simplify/ELI10 paragraph and the RECOMMENDATION
  line on AskUserQuestion calls. Users have to manually re-prompt "ELI10 and
  don't forget to recommend" almost every time. This test pins the behavior so
  regressions surface.

  Cases:
  - plan-ceo-review mode selection (kind-differentiated)
  - plan-ceo-review approach menu (coverage-differentiated)
  - plan-eng-review per-issue coverage decision
  - plan-eng-review per-issue architectural choice (kind-differentiated)

  Assertions on captured AskUserQuestion text:
  - RECOMMENDATION: Choose present (all cases)
  - Completeness: N/10 present on coverage, absent on kind
  - "options differ in kind" note present on kind
  - ELI10 length floor (>400 chars) — catches bare options-only output

  Cost: ~$2-4 per full run.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(preamble): harden AskUserQuestion Format + Codex ELI10 carve-out

  Follow-up to v1.6.2.0. Codex (GPT-5.4) under the gpt.md overlay treated "No
  preamble / Prefer doing over listing" as license to skip the Simplify
  paragraph and the RECOMMENDATION line on AskUserQuestion calls. Users had to
  manually re-prompt "ELI10 and don't forget to recommend" almost every time.

  Two layers:
  1. model-overlays/gpt.md — adds an explicit "AskUserQuestion is NOT
     preamble" carve-out. The "No preamble" rule applies to direct answers;
     AskUserQuestion content must emit the full format (Re-ground,
     Simplify/ELI10, Recommend, Options). Tells the model: if you find
     yourself about to skip any of these, back up and emit them — the user
     will ask anyway, so do it the first time.
  2. scripts/resolvers/preamble/generate-ask-user-format.ts — step 2 renamed
     to "Simplify (ELI10, ALWAYS)" with explicit "not optional verbosity, not
     preamble" framing. Step 3 "Recommend (ALWAYS)" hardened: "Never omit,
     never collapse into the options list."

  All T2 skills regenerated across all hosts. Golden fixtures refreshed
  (claude-ship, codex-ship, factory-ship). Updated the ELI10 assertion in
  test/gen-skill-docs.test.ts to match the new wording. Codex compliance to be
  verified empirically via test/codex-e2e-plan-format.test.ts.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: fix Codex eval sandbox + collector API

  Two test infrastructure bugs in the initial Codex eval landed in the prior
  commit:
  1. sandbox: 'read-only' (the default) blocked Codex from writing $OUT_FILE.
     The test reported "STATUS: BLOCKED" and exited 0 without a capture file.
     Fixed: sandbox: 'workspace-write' for all 4 cases, allowing writes inside
     the tempdir.
  2. recordCodexResult called a non-existent evalCollector.record() API (I
     invented it). The real surface is addTest() with a different field
     schema. Aligned with the test/codex-e2e.test.ts pattern.

  With both fixed, the eval now actually measures Codex AskUserQuestion format
  compliance. All 4 cases pass on v1.6.2.0 with the gpt.md carve-out:
  RECOMMENDATION always, Completeness: N/10 only on coverage, "options differ
  in kind" note on kind, ELI10 explanation present.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: bump version and changelog (v1.6.3.0)

  Adds the Codex ELI10 + RECOMMENDATION carve-out scope landed after
  v1.6.2.0's Claude-verified fix.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
321 lines
14 KiB
TypeScript
/**
 * AskUserQuestion format regression test for /plan-ceo-review and /plan-eng-review
 * running under Codex CLI (GPT-5.4).
 *
 * Context: GPT-class models under the "No preamble / Prefer doing over listing"
 * gpt.md overlay tend to skip the Simplify (ELI10) paragraph and the RECOMMENDATION
 * line on AskUserQuestion calls. The user has to manually re-prompt "ELI10 and don't
 * forget to recommend" almost every time. This test pins that behavior so future
 * regressions surface automatically.
 *
 * Mirrors test/skill-e2e-plan-format.test.ts (the Claude version) but uses
 * test/helpers/codex-session-runner.ts to drive `codex exec` instead of `claude -p`.
 *
 * Four cases:
 * 1. plan-ceo-review mode selection (kind-differentiated)
 * 2. plan-ceo-review approach menu (coverage-differentiated)
 * 3. plan-eng-review per-issue coverage decision
 * 4. plan-eng-review per-issue architectural choice (kind-differentiated)
 *
 * Assertions on captured AskUserQuestion text:
 * - RECOMMENDATION: Choose present (all cases)
 * - Completeness: N/10 present on coverage cases, absent on kind cases
 * - "options differ in kind" note present on kind cases
 * - ELI10-style plain-English explanation present (length floor as a weak proxy)
 *
 * Periodic tier (Codex non-determinism). Cost: ~$2-3 per full run.
 */

import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import { runCodexSkill, installSkillToTempHome } from './helpers/codex-session-runner';
import type { CodexResult } from './helpers/codex-session-runner';
import { EvalCollector } from './helpers/eval-store';
import type { EvalTestEntry } from './helpers/eval-store';
import { selectTests, detectBaseBranch, getChangedFiles, GLOBAL_TOUCHFILES } from './helpers/touchfiles';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
import { spawnSync } from 'child_process';

const ROOT = path.resolve(import.meta.dir, '..');

// --- Prerequisites ---

const CODEX_AVAILABLE = (() => {
  try {
    const result = Bun.spawnSync(['which', 'codex']);
    return result.exitCode === 0;
  } catch {
    return false;
  }
})();

const evalsEnabled = !!process.env.EVALS;
const SKIP = !CODEX_AVAILABLE || !evalsEnabled;
const describeCodex = SKIP ? describe.skip : describe;

// --- Touchfiles ---

// The preamble generators and model overlays affect every format case, so they
// are shared across all four entries.
const PREAMBLE_TOUCHFILES = [
  'scripts/resolvers/preamble/generate-ask-user-format.ts',
  'scripts/resolvers/preamble/generate-completeness-section.ts',
  'model-overlays/gpt.md',
  'model-overlays/gpt-5.4.md',
];

const CODEX_FORMAT_TOUCHFILES: Record<string, string[]> = {
  'codex-plan-ceo-format-mode': ['.agents/skills/gstack-plan-ceo-review/**', ...PREAMBLE_TOUCHFILES],
  'codex-plan-ceo-format-approach': ['.agents/skills/gstack-plan-ceo-review/**', ...PREAMBLE_TOUCHFILES],
  'codex-plan-eng-format-coverage': ['.agents/skills/gstack-plan-eng-review/**', ...PREAMBLE_TOUCHFILES],
  'codex-plan-eng-format-kind': ['.agents/skills/gstack-plan-eng-review/**', ...PREAMBLE_TOUCHFILES],
};

let selectedTests: string[] | null = null;
if (evalsEnabled && !process.env.EVALS_ALL) {
  const baseBranch = process.env.EVALS_BASE || detectBaseBranch(ROOT) || 'main';
  const changedFiles = getChangedFiles(baseBranch, ROOT);
  if (changedFiles.length > 0) {
    const selection = selectTests(changedFiles, CODEX_FORMAT_TOUCHFILES, GLOBAL_TOUCHFILES);
    selectedTests = selection.selected;
  }
}

function testIfSelected(name: string, fn: () => Promise<void>, timeout?: number) {
  if (selectedTests !== null && !selectedTests.includes(name)) {
    test.skip(name, fn, timeout);
  } else {
    test(name, fn, timeout);
  }
}

// --- Eval collector ---

let evalCollector: EvalCollector | null = null;
if (!SKIP) {
  evalCollector = new EvalCollector('codex-e2e-plan-format');
}

function recordCodexResult(testName: string, result: CodexResult, passed: boolean) {
  evalCollector?.addTest({
    name: testName,
    suite: 'codex-e2e-plan-format',
    tier: 'e2e',
    passed,
    duration_ms: result.durationMs,
    cost_usd: 0, // Codex doesn't report cost in the same way; tokens tracked separately
    output: result.output?.slice(0, 2000),
    turns_used: result.toolCalls.length,
    exit_reason: result.exitCode === 0 ? 'success' : `exit_code_${result.exitCode}`,
  });
}

afterAll(async () => {
  if (evalCollector) {
    await evalCollector.finalize();
  }
});

// --- Fixtures ---

const SAMPLE_PLAN = `# Plan: Add User Dashboard

## Context
We're building a new user dashboard that shows recent activity, notifications, and quick actions.

## Changes
1. New React component \`UserDashboard\` in \`src/components/\`
2. REST API endpoint \`GET /api/dashboard\` returning user stats
3. PostgreSQL query for activity aggregation
4. Redis cache layer for dashboard data (5min TTL)

## Architecture
- Frontend: React + TailwindCSS
- Backend: Express.js REST API
- Database: PostgreSQL with existing user/activity tables
- Cache: Redis for dashboard aggregates
`;

function setupCodexSkillDir(
  tmpPrefix: string,
  skillName: 'plan-ceo-review' | 'plan-eng-review',
): { skillDir: string; planDir: string; outFile: string } {
  const planDir = fs.mkdtempSync(path.join(os.tmpdir(), tmpPrefix));
  const run = (cmd: string, args: string[]) =>
    spawnSync(cmd, args, { cwd: planDir, stdio: 'pipe', timeout: 5000 });

  run('git', ['init', '-b', 'main']);
  run('git', ['config', 'user.email', 'test@test.com']);
  run('git', ['config', 'user.name', 'Test']);

  fs.writeFileSync(path.join(planDir, 'plan.md'), SAMPLE_PLAN);
  run('git', ['add', '.']);
  run('git', ['commit', '-m', 'add plan']);

  // Codex skills live in .agents/skills/gstack-{name}/ per the gstack host convention.
  const codexSkillSource = path.join(ROOT, '.agents', 'skills', `gstack-${skillName}`);
  const skillDir = path.join(planDir, '.agents', 'skills', `gstack-${skillName}`);
  fs.mkdirSync(skillDir, { recursive: true });
  fs.cpSync(codexSkillSource, skillDir, { recursive: true });

  const outFile = path.join(planDir, 'ask-capture.md');
  return { skillDir, planDir, outFile };
}

// Capture instruction — same shape as the Claude version. Codex may ignore tool calls,
// so we tell it to write prose to the file directly.
function captureInstruction(outFile: string): string {
  return `Write the verbatim text of every AskUserQuestion you would have presented to the user to the file ${outFile} (one question per session, full text including the re-ground, ELI10 paragraph, RECOMMENDATION line, and options). Do NOT ask the user interactively. Do NOT paraphrase. This is a format-capture test, not an interactive session.`;
}

// --- Regex predicates ---

// Match RECOMMENDATION leniently with respect to markdown bolding around it.
const RECOMMENDATION_RE = /RECOMMENDATION:[*\s]*Choose/;
const COMPLETENESS_RE = /Completeness:\s*\d{1,2}\/10/;
const KIND_NOTE_RE = /options differ in kind/i;

// ELI10 signal: some plain-English explanation must exist. Ideally we would check
// for narrative prose between the re-ground and the options plus one of the
// plain-English hints ("plain English", "16-year-old", or "what this means"), but
// Codex phrasing varies too much. In practice we assert a length floor on the
// full captured content, which catches bare options-list-only output.
const ELI10_LENGTH_FLOOR = 400; // full AskUserQuestion content should be at least this long

// --- Tests ---

describeCodex('Codex Plan Format — CEO Mode Selection', () => {
  let skillDir: string, planDir: string, outFile: string;

  beforeAll(() => {
    ({ skillDir, planDir, outFile } = setupCodexSkillDir('codex-e2e-plan-format-ceo-mode-', 'plan-ceo-review'));
  });

  afterAll(() => {
    try { fs.rmSync(planDir, { recursive: true, force: true }); } catch {}
  });

  testIfSelected('codex-plan-ceo-format-mode', async () => {
    const result = await runCodexSkill({
      skillDir,
      prompt: `Read the plan-ceo-review skill. Read plan.md (the plan to review). Proceed to Step 0F (Mode Selection) where the skill presents 4 mode options (SCOPE EXPANSION, SELECTIVE EXPANSION, HOLD SCOPE, SCOPE REDUCTION) via AskUserQuestion. These options differ in kind (review posture), not coverage. ${captureInstruction(outFile)}`,
      timeoutMs: 300_000,
      cwd: planDir,
      skillName: 'gstack-plan-ceo-review',
      sandbox: 'workspace-write',
    });

    recordCodexResult('codex-plan-ceo-format-mode', result, result.exitCode === 0);
    console.log(`codex-plan-ceo-format-mode: ${result.tokens}t, ${Math.round(result.durationMs / 1000)}s, exit=${result.exitCode}`);

    // Codex may time out — accept as non-fatal (same pattern as the existing codex-e2e tests).
    if (result.exitCode === 124 || result.exitCode === 137) {
      console.warn(`codex timed out (exit ${result.exitCode}) — skipping assertions`);
      return;
    }

    expect(fs.existsSync(outFile)).toBe(true);
    const captured = fs.readFileSync(outFile, 'utf-8');
    expect(captured.length).toBeGreaterThan(ELI10_LENGTH_FLOOR);
    expect(captured).toMatch(RECOMMENDATION_RE);
    // kind-differentiated: no fabricated score, must have the note
    expect(captured).not.toMatch(COMPLETENESS_RE);
    expect(captured).toMatch(KIND_NOTE_RE);
  }, 360_000);
});

describeCodex('Codex Plan Format — CEO Approach Menu', () => {
  let skillDir: string, planDir: string, outFile: string;

  beforeAll(() => {
    ({ skillDir, planDir, outFile } = setupCodexSkillDir('codex-e2e-plan-format-ceo-approach-', 'plan-ceo-review'));
  });

  afterAll(() => {
    try { fs.rmSync(planDir, { recursive: true, force: true }); } catch {}
  });

  testIfSelected('codex-plan-ceo-format-approach', async () => {
    const result = await runCodexSkill({
      skillDir,
      prompt: `Read the plan-ceo-review skill. Read plan.md. Proceed to Step 0C-bis (Implementation Alternatives / Approach Menu) where the skill generates 2-3 approaches (minimal viable vs ideal architecture) and presents them via AskUserQuestion. These options differ in coverage so Completeness: N/10 applies. ${captureInstruction(outFile)}`,
      timeoutMs: 300_000,
      cwd: planDir,
      skillName: 'gstack-plan-ceo-review',
      sandbox: 'workspace-write',
    });

    recordCodexResult('codex-plan-ceo-format-approach', result, result.exitCode === 0);
    console.log(`codex-plan-ceo-format-approach: ${result.tokens}t, ${Math.round(result.durationMs / 1000)}s, exit=${result.exitCode}`);

    if (result.exitCode === 124 || result.exitCode === 137) {
      console.warn(`codex timed out (exit ${result.exitCode}) — skipping assertions`);
      return;
    }

    expect(fs.existsSync(outFile)).toBe(true);
    const captured = fs.readFileSync(outFile, 'utf-8');
    expect(captured.length).toBeGreaterThan(ELI10_LENGTH_FLOOR);
    expect(captured).toMatch(RECOMMENDATION_RE);
    expect(captured).toMatch(COMPLETENESS_RE);
  }, 360_000);
});

describeCodex('Codex Plan Format — Eng Coverage Issue', () => {
  let skillDir: string, planDir: string, outFile: string;

  beforeAll(() => {
    ({ skillDir, planDir, outFile } = setupCodexSkillDir('codex-e2e-plan-format-eng-cov-', 'plan-eng-review'));
  });

  afterAll(() => {
    try { fs.rmSync(planDir, { recursive: true, force: true }); } catch {}
  });

  testIfSelected('codex-plan-eng-format-coverage', async () => {
    const result = await runCodexSkill({
      skillDir,
      prompt: `Read the plan-eng-review skill. Read plan.md. In your Section 3 Test Review, generate ONE AskUserQuestion about test coverage depth where options are clearly coverage-differentiated: A) full coverage incl. edge + error paths (Completeness 10/10), B) happy path only (7/10), C) smoke test (3/10). ${captureInstruction(outFile)}`,
      timeoutMs: 300_000,
      cwd: planDir,
      skillName: 'gstack-plan-eng-review',
      sandbox: 'workspace-write',
    });

    recordCodexResult('codex-plan-eng-format-coverage', result, result.exitCode === 0);
    console.log(`codex-plan-eng-format-coverage: ${result.tokens}t, ${Math.round(result.durationMs / 1000)}s, exit=${result.exitCode}`);

    if (result.exitCode === 124 || result.exitCode === 137) {
      console.warn(`codex timed out (exit ${result.exitCode}) — skipping assertions`);
      return;
    }

    expect(fs.existsSync(outFile)).toBe(true);
    const captured = fs.readFileSync(outFile, 'utf-8');
    expect(captured.length).toBeGreaterThan(ELI10_LENGTH_FLOOR);
    expect(captured).toMatch(RECOMMENDATION_RE);
    expect(captured).toMatch(COMPLETENESS_RE);
  }, 360_000);
});

describeCodex('Codex Plan Format — Eng Kind Issue', () => {
  let skillDir: string, planDir: string, outFile: string;

  beforeAll(() => {
    ({ skillDir, planDir, outFile } = setupCodexSkillDir('codex-e2e-plan-format-eng-kind-', 'plan-eng-review'));
  });

  afterAll(() => {
    try { fs.rmSync(planDir, { recursive: true, force: true }); } catch {}
  });

  testIfSelected('codex-plan-eng-format-kind', async () => {
    const result = await runCodexSkill({
      skillDir,
      prompt: `Read the plan-eng-review skill. Read plan.md. In your Section 1 Architecture review, generate ONE AskUserQuestion about an architectural choice where the options differ in kind (e.g. Redis vs Postgres materialized view vs in-process cache — different kinds of systems with different tradeoffs, NOT more-or-less-complete versions of the same thing). ${captureInstruction(outFile)}`,
      timeoutMs: 300_000,
      cwd: planDir,
      skillName: 'gstack-plan-eng-review',
      sandbox: 'workspace-write',
    });

    recordCodexResult('codex-plan-eng-format-kind', result, result.exitCode === 0);
    console.log(`codex-plan-eng-format-kind: ${result.tokens}t, ${Math.round(result.durationMs / 1000)}s, exit=${result.exitCode}`);

    if (result.exitCode === 124 || result.exitCode === 137) {
      console.warn(`codex timed out (exit ${result.exitCode}) — skipping assertions`);
      return;
    }

    expect(fs.existsSync(outFile)).toBe(true);
    const captured = fs.readFileSync(outFile, 'utf-8');
    expect(captured.length).toBeGreaterThan(ELI10_LENGTH_FLOOR);
    expect(captured).toMatch(RECOMMENDATION_RE);
    // kind-differentiated: no fabricated score
    expect(captured).not.toMatch(COMPLETENESS_RE);
    expect(captured).toMatch(KIND_NOTE_RE);
  }, 360_000);
});