gstack/test/e2e-harness-audit.test.ts
Garry Tan aeea57f96a v1.12.1.0 fix: remove vestigial plan-mode handshake (#1185)
* refactor: remove vestigial plan-mode handshake resolver

Delete scripts/resolvers/preamble/generate-plan-mode-handshake.ts and
its four question-registry entries. Split the authoritative
"Plan Mode Safe Operations" and "Skill Invocation During Plan Mode"
sections out of generate-completion-status.ts into a sibling
generatePlanModeInfo() export in the same module, wired at preamble
position 1 where the handshake used to live. Same text, new position.

The vestigial handshake told interactive review skills to emit an
A=exit-and-rerun / C=cancel AskUserQuestion before running their
interactive STOP-Ask workflow. That contradicted the authoritative
rule at the tail of completion-status.ts saying AskUserQuestion
satisfies plan mode's end-of-turn requirement. Skills now run
directly when invoked in plan mode, with each finding gated by
AskUserQuestion just like outside plan mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: rename plan-mode-handshake-helpers to plan-mode-helpers, strengthen smokes

Rename test/helpers/plan-mode-handshake-helpers.ts to
test/helpers/plan-mode-helpers.ts. Keep the write-guard helper that
asserts no Write/Edit tool call before the first AskUserQuestion
(this is what catches silent-bypass regressions the textual smoke
can't see). Rename the API: runPlanModeHandshakeTest to
runPlanModeSkillTest, assertHandshakeShape to assertNotHandshakeShape.
Extend the capture struct with exitPlanModeBeforeAsk.
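
A minimal sketch of what the write-guard check amounts to, under stated assumptions: the `ToolCall` and `GuardResult` shapes and the `checkWriteGuard` name are hypothetical and are not the helper's real API, and `ExitPlanMode` is assumed to be the tool name behind the `exitPlanModeBeforeAsk` field.

```typescript
// Hypothetical shape of one recorded tool call; the real capture struct may differ.
interface ToolCall {
  name: string;
}

// Hypothetical result shape, loosely mirroring the fields named in this commit.
interface GuardResult {
  writeBeforeAsk: boolean;
  exitPlanModeBeforeAsk: boolean;
}

const WRITE_TOOLS = new Set(['Write', 'Edit']);

function checkWriteGuard(calls: ToolCall[]): GuardResult {
  const firstAsk = calls.findIndex((c) => c.name === 'AskUserQuestion');
  // If no question was ever asked, every call counts as "before the ask".
  const beforeAsk = firstAsk === -1 ? calls : calls.slice(0, firstAsk);
  return {
    writeBeforeAsk: beforeAsk.some((c) => WRITE_TOOLS.has(c.name)),
    // Assumption: the plan-mode exit tool is recorded as 'ExitPlanMode'.
    exitPlanModeBeforeAsk: beforeAsk.some((c) => c.name === 'ExitPlanMode'),
  };
}
```

A run that reads files, asks, and only then writes yields both flags false. An Edit ahead of the first AskUserQuestion sets writeBeforeAsk, which is the silent-bypass case the textual smoke cannot see.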

Rewrite the four per-skill E2E tests (plan-ceo, plan-eng, plan-design,
plan-devex) as smoke tests that assert the skill's Step 0 question
fires first, not an A/C handshake. Each test picks a cheap first
answer (HOLD, TRIAGE, numeric score) so the run terminates quickly.

Keep test/skill-e2e-plan-mode-no-op.test.ts as the outside-plan-mode
non-interference regression, per codex outside-voice review: deleting
it would lose coverage for "the hoisted section stays quiet when plan
mode is absent."

Replace the gen-skill-docs.test.ts handshake describe block (lines
2778+) with a plan-mode-info describe block that:
- scans every generated SKILL.md under the repo root + every host
  subdir (.agents, .openclaw, .opencode, .factory, .hermes, .kiro,
  .cursor, .slate) and asserts "## Plan Mode Handshake" is absent
- asserts "## Skill Invocation During Plan Mode" lands in the first
  15KB of each of the four review skills' generated SKILL.md

Both assertions run on every bun test. A PR that re-introduces the
handshake resolver fails CI immediately.
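
The two assertions can be sketched as a pure function over a generated SKILL.md's contents. This is an illustration only: `auditSkillDoc` and its parameters are hypothetical names, not code from gen-skill-docs.test.ts, and the 15KB window is measured here in string offsets, which equals bytes only for ASCII content.

```typescript
const BANNED_HEADING = '## Plan Mode Handshake';
const REQUIRED_HEADING = '## Skill Invocation During Plan Mode';
// Assumption: string offsets stand in for bytes (exact only for ASCII docs).
const EARLY_WINDOW = 15 * 1024;

// Returns a list of problems; an empty list means the doc passes.
function auditSkillDoc(content: string, isReviewSkill: boolean): string[] {
  const problems: string[] = [];
  if (content.includes(BANNED_HEADING)) {
    problems.push('vestigial handshake section present');
  }
  if (isReviewSkill) {
    // The heading must start inside the early window.
    const idx = content.indexOf(REQUIRED_HEADING);
    if (idx < 0 || idx >= EARLY_WINDOW) {
      problems.push('plan-mode section missing from the first 15KB');
    }
  }
  return problems;
}
```

In a real test the function would be applied to each generated file read from disk, with `isReviewSkill` set only for the four review skills.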

Update test/e2e-harness-audit.test.ts to reference the renamed
runPlanModeSkillTest. Update test/helpers/touchfiles.ts entries to
point at the new resolver owner (generate-completion-status.ts) and
the renamed helper, and align per-skill touchfile keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md across all hosts + refresh golden fixtures

Run bun run gen:skill-docs for every host to flush the vestigial
"## Plan Mode Handshake" section from every generated SKILL.md and
emit the hoisted "## Skill Invocation During Plan Mode" section at
preamble position 1 instead. Refresh the three golden-fixture
snapshots (claude, codex, factory) to match the new position.

No behavior change beyond the resolver swap in the prior commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.12.1.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 02:11:24 -07:00

114 lines
3.4 KiB
TypeScript

/**
 * E2E harness audit: every skill with `interactive: true` in its frontmatter
 * must have at least one test file that uses `canUseTool` via the extended
 * agent-sdk-runner. This prevents future drift where a skill opts into
 * interactive mode without adding real coverage.
 *
 * Runs as a free unit test (no API calls). Pure filesystem scan.
 */
import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
const ROOT = path.resolve(import.meta.dir, '..');
const SKILL_GLOBS = [
  'plan-ceo-review',
  'plan-eng-review',
  'plan-design-review',
  'plan-devex-review',
  'office-hours',
  'codex',
  'investigate',
  'qa',
  'retro',
  'cso',
  'review',
  'ship',
  'design-review',
  'devex-review',
  'qa-only',
  'design-consultation',
  'design-shotgun',
  'autoplan',
  'land-and-deploy',
  'plan-tune',
  'document-release',
  'context-save',
  'context-restore',
  'health',
  'setup-deploy',
  'setup-browser-cookies',
  'canary',
  'learn',
  'benchmark',
  'benchmark-models',
  'make-pdf',
  'open-gstack-browser',
  'gstack-upgrade',
  'pair-agent',
  'design-html',
  'freeze',
  'unfreeze',
  'careful',
  'guard',
];
/**
 * Load .tmpl files for each skill and return the names of those that have
 * `interactive: true` in frontmatter.
 */
function findInteractiveSkills(): string[] {
  const interactive: string[] = [];
  for (const skill of SKILL_GLOBS) {
    const tmplPath = path.join(ROOT, skill, 'SKILL.md.tmpl');
    if (!fs.existsSync(tmplPath)) continue;
    const content = fs.readFileSync(tmplPath, 'utf-8');
    // Frontmatter lives between the first '---' and the next '---'.
    const fmEnd = content.indexOf('\n---', 4);
    if (fmEnd < 0) continue;
    const frontmatter = content.slice(0, fmEnd);
    if (/^interactive:\s*true\s*$/m.test(frontmatter)) {
      interactive.push(skill);
    }
  }
  return interactive;
}
/**
 * Scan a test file's contents for the canUseTool-via-harness pattern.
 * Either: direct canUseTool usage in runAgentSdkTest, or usage of the
 * shared plan-mode-helpers that wrap it.
 */
function hasCanUseToolCoverage(testFile: string): boolean {
  const content = fs.readFileSync(testFile, 'utf-8');
  if (content.includes('canUseTool')) return true;
  if (content.includes('runPlanModeSkillTest')) return true;
  return false;
}
describe('E2E harness audit — interactive skills must have canUseTool coverage', () => {
  test('every interactive: true skill has at least one canUseTool test', () => {
    const interactive = findInteractiveSkills();
    expect(interactive.length).toBeGreaterThan(0);
    const testFiles = fs
      .readdirSync(path.join(ROOT, 'test'))
      .filter((f) => f.startsWith('skill-e2e-') && f.endsWith('.test.ts'))
      .map((f) => path.join(ROOT, 'test', f));
    const filesWithCoverage = testFiles.filter(hasCanUseToolCoverage);
    for (const skill of interactive) {
      // Match the skill name in any test file that uses canUseTool. File
      // naming convention is `skill-e2e-<skill>-*.test.ts`, using either the
      // full name (plan-ceo-review) or a subset token.
      const hasDedicatedTest = filesWithCoverage.some((f) => {
        const base = path.basename(f, '.test.ts');
        return base.includes(skill) || base.includes(skill.replace(/-review$/, ''));
      });
      expect(hasDedicatedTest, `skill "${skill}" has interactive:true but no canUseTool-based E2E test`).toBe(true);
    }
  });
});