test(plan-tune): 5 cathedral E2E scenarios + touchfile registration

Plan-tune cathedral T16 (per D12 — all 5 in gate tier). One consolidated
file with five describeIfSelected scenarios, each selectable by its own
touchfile entry so they only run when the relevant code changes (or
EVALS_ALL=1 forces all):

  plan-tune-hook-capture     — PostToolUse hook fires → question-log fills
  plan-tune-enforcement      — never-ask + marker + 2-way → deny+reason
                               + auto-decided event logged
  plan-tune-annotation       — declared profile + memory nugget
                               → additionalContext surfaced on defer
  plan-tune-codex-import     — synthetic JSONL → import bin → log with
                               source=codex-import-marker
  plan-tune-dream-cycle      — apply proposal → re-fire question
                               → memory injected via additionalContext

Each scenario fixtures an isolated git repo + bins + scripts + hooks
under tmp, then exercises the cathedral chain end-to-end against real
on-disk binaries (no mocks at the bin layer). GSTACK_STATE_ROOT keeps
the user's real ~/.gstack untouched.

These five complement the existing unit tests by proving the full
sub-process chain works (not just individual functions in isolation).
They DON'T spawn claude -p because the cathedral's substrate behavior is
deterministic — agent compliance is no longer the variable. The existing
test/skill-e2e-plan-tune.test.ts (plan-tune-inspect) still covers the
LLM-driven intent-routing behavior.

Cost: each scenario runs in ~1s with $0 because no claude -p invocations.
Touchfile-gated, so they only run on PRs that touch cathedral code.

Also fixes a bug found by the E2E: question-log-hook didn't pass the
incoming tool call's cwd to spawnSync when invoking gstack-question-log,
so the bin used the hook process's cwd (the repo root) instead of the
session's cwd. Result: log writes landed in the wrong project bucket.
Fix mirrors the same cwd-passing pattern from question-preference-hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-05-27 07:58:58 -07:00
parent 3eab8939e3
commit d50d4741b7
3 changed files with 477 additions and 2 deletions
+14
View File
@@ -191,6 +191,13 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
// /plan-tune (v1 observational)
'plan-tune-inspect': ['plan-tune/**', 'scripts/question-registry.ts', 'scripts/psychographic-signals.ts', 'scripts/one-way-doors.ts', 'bin/gstack-question-log', 'bin/gstack-question-preference', 'bin/gstack-developer-profile'],
// /plan-tune cathedral (T16 — 5 E2E scenarios, all gate per D12)
'plan-tune-hook-capture': ['hosts/claude/hooks/**', 'bin/gstack-question-log', 'bin/gstack-developer-profile', 'plan-tune/**'],
'plan-tune-enforcement': ['hosts/claude/hooks/**', 'bin/gstack-question-preference', 'scripts/question-registry.ts'],
'plan-tune-annotation': ['hosts/claude/hooks/**', 'scripts/declared-annotation.ts', 'scripts/psychographic-signals.ts', 'scripts/question-registry.ts'],
'plan-tune-codex-import': ['bin/gstack-codex-session-import', 'bin/gstack-question-log', 'docs/spikes/codex-session-format.md'],
'plan-tune-dream-cycle': ['bin/gstack-distill-free-text', 'bin/gstack-distill-apply', 'hosts/claude/hooks/**', 'plan-tune/**'],
// Codex offering verification
'codex-offered-office-hours': ['office-hours/**', 'scripts/gen-skill-docs.ts'],
'codex-offered-ceo-review': ['plan-ceo-review/**', 'scripts/gen-skill-docs.ts'],
@@ -528,6 +535,13 @@ export const E2E_TIERS: Record<string, 'gate' | 'periodic'> = {
// /plan-tune — gate (core v1 DX promise: plain-English intent routing)
'plan-tune-inspect': 'gate',
// /plan-tune cathedral (T16 per D12 — all gate)
'plan-tune-hook-capture': 'gate',
'plan-tune-enforcement': 'gate',
'plan-tune-annotation': 'gate',
'plan-tune-codex-import': 'gate',
'plan-tune-dream-cycle': 'gate',
// Codex offering verification
'codex-offered-office-hours': 'gate',
'codex-offered-ceo-review': 'gate',