docs: add E2E fixture extraction rule to CLAUDE.md

Never copy full SKILL.md files into E2E test fixtures. Extract only the section the test needs. Also: run targeted evals in foreground, never pkill and restart mid-run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-07-10 09:58:38 +02:00 · 2026-03-26 22:11:37 -06:00
parent 5d023d269f
commit 2200ad65e5
1 changed files with 24 additions and 0 deletions
@@ -298,6 +298,30 @@ them. Report progress at each check (which tests passed, which are running, any
 failures so far). The user wants to see the run complete, not a promise that
 you'll check later.

+## E2E test fixtures: extract, don't copy
+
+**NEVER copy a full SKILL.md file into an E2E test fixture.** SKILL.md files are
+1500-2000 lines. When `claude -p` reads a file that large, context bloat causes
+timeouts, flaky turn limits, and tests that take 5-10x longer than necessary.
+
+Instead, extract only the section the test actually needs:
+
+```typescript
+// BAD — agent reads 1900 lines, burns tokens on irrelevant sections
+fs.copyFileSync(path.join(ROOT, 'ship', 'SKILL.md'), path.join(dir, 'ship-SKILL.md'));
+
+// GOOD — agent reads ~60 lines, finishes in 38s instead of timing out
+const full = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+const start = full.indexOf('## Review Readiness Dashboard');
+const end = full.indexOf('\n---\n', start);
+fs.writeFileSync(path.join(dir, 'ship-SKILL.md'), full.slice(start, end > start ? end : undefined));
+```
+
+Also when running targeted E2E tests to debug failures:
+- Run in **foreground** (`bun test ...`), not background with `&` and `tee`
+- Never `pkill` running eval processes and restart — you lose results and waste money
+- One clean run beats three killed-and-restarted runs
+
 ## Deploying to the active skill

 The active skill lives at `~/.claude/skills/gstack/`. After making changes: