mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-05 13:15:24 +02:00
docs: add E2E eval failure blame protocol
"Not related to our changes" is an extraordinary claim that requires extraordinary proof. When evals fail during /ship: 1. Run the same eval on main — prove it fails there too 2. If it passes on main, it IS your change — trace the blame 3. If you can't verify, say "unverified" not "pre-existing" Added to CLAUDE.md and as a comment in skill-e2e.test.ts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -13,6 +13,11 @@ import * as os from 'os';
|
||||
const ROOT = path.resolve(import.meta.dir, '..');
|
||||
|
||||
// Skip unless EVALS=1. Session runner strips CLAUDE* env vars to avoid nested session issues.
|
||||
//
|
||||
// BLAME PROTOCOL: When an eval fails, do NOT claim "pre-existing" or "not related
|
||||
// to our changes" without proof. Run the same eval on main to verify. These tests
|
||||
// have invisible couplings — preamble text, SKILL.md content, and timing all affect
|
||||
// agent behavior. See CLAUDE.md "E2E eval failure blame protocol" for details.
|
||||
const evalsEnabled = !!process.env.EVALS;
|
||||
const describeE2E = evalsEnabled ? describe : describe.skip;
|
||||
|
||||
|
||||
Reference in New Issue
Block a user