mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
docs: add E2E eval failure blame protocol
"Not related to our changes" is an extraordinary claim that requires extraordinary proof. When evals fail during /ship:

1. Run the same eval on main — prove it fails there too
2. If it passes on main, it IS your change — trace the blame
3. If you can't verify, say "unverified" not "pre-existing"

Added to CLAUDE.md and as a comment in skill-e2e.test.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -118,6 +118,21 @@ CHANGELOG.md is **for users**, not contributors. Write it like product release n
 - No jargon: say "every question now tells you which project and branch you're in" not
 "AskUserQuestion format standardized across skill templates via preamble resolver."
 
+## E2E eval failure blame protocol
+
+When an E2E eval fails during `/ship` or any other workflow, **never claim "not
+related to our changes" without proving it.** These systems have invisible couplings —
+a preamble text change affects agent behavior, a new helper changes timing, a
+regenerated SKILL.md shifts prompt context.
+
+**Required before attributing a failure to "pre-existing":**
+1. Run the same eval on main (or base branch) and show it fails there too
+2. If it passes on main but fails on the branch — it IS your change. Trace the blame.
+3. If you can't run on main, say "unverified — may or may not be related" and flag it
+as a risk in the PR body
+
+"Pre-existing" without receipts is a lazy claim. Prove it or don't say it.
+
 ## Deploying to the active skill
 
 The active skill lives at `~/.claude/skills/gstack/`. After making changes:
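The three protocol steps in the CLAUDE.md hunk above amount to a small decision table. A minimal sketch of that table follows, assuming the eval has already been run on the branch and, where possible, on main. `attributeFailure`, `EvalOutcome`, `Attribution`, and `Verdict` are hypothetical names used for illustration only; none of them exist in gstack.

```typescript
// Hypothetical helper, not part of gstack: encodes the three protocol steps.
type EvalOutcome = "pass" | "fail";
type Attribution = "no-failure" | "pre-existing" | "caused-by-branch" | "unverified";

interface Verdict {
  attribution: Attribution;
  note: string;
}

// `onMain` is undefined when the eval could not be run on main (or the base branch).
function attributeFailure(onBranch: EvalOutcome, onMain?: EvalOutcome): Verdict {
  if (onBranch === "pass") {
    return { attribution: "no-failure", note: "Eval passes on the branch; nothing to attribute." };
  }
  if (onMain === undefined) {
    // Step 3: no proof either way, so say so and flag it.
    return {
      attribution: "unverified",
      note: "Unverified: may or may not be related. Flag as a risk in the PR body.",
    };
  }
  if (onMain === "fail") {
    // Step 1: the same eval fails on main, so the claim has receipts.
    return { attribution: "pre-existing", note: "Fails on main too; link the main run as proof." };
  }
  // Step 2: passes on main, fails on the branch.
  return { attribution: "caused-by-branch", note: "Passes on main, fails on the branch. Trace the blame." };
}

console.log(attributeFailure("fail", "pass").attribution); // caused-by-branch
console.log(attributeFailure("fail").attribution);         // unverified
```

In this sketch "pre-existing" can only be returned when a failing run on main was actually observed, which is the point of the protocol: a missing main run can never be silently read as proof.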
@@ -13,6 +13,11 @@ import * as os from 'os';
 const ROOT = path.resolve(import.meta.dir, '..');
 
 // Skip unless EVALS=1. Session runner strips CLAUDE* env vars to avoid nested session issues.
+//
+// BLAME PROTOCOL: When an eval fails, do NOT claim "pre-existing" or "not related
+// to our changes" without proof. Run the same eval on main to verify. These tests
+// have invisible couplings — preamble text, SKILL.md content, and timing all affect
+// agent behavior. See CLAUDE.md "E2E eval failure blame protocol" for details.
 const evalsEnabled = !!process.env.EVALS;
 const describeE2E = evalsEnabled ? describe : describe.skip;
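One detail of the gate in the hunk above: `!!process.env.EVALS` is true for any non-empty string, so `EVALS=0` also enables the suite, although the comment reads "Skip unless EVALS=1". If an exact match is the intent, a stricter check could look like the sketch below. This is an assumption about intent, not a change the commit makes; `truthyGate`, `strictGate`, and `Env` are hypothetical names, and the environment is passed in as a plain object so the snippet runs without `process`.

```typescript
// Hypothetical names; the original gate is `!!process.env.EVALS`.
type Env = Record<string, string | undefined>;

// Mirrors the original check: any non-empty string enables the suite.
function truthyGate(env: Env): boolean {
  return !!env.EVALS;
}

// Stricter variant: only the exact value "1" enables the suite.
function strictGate(env: Env): boolean {
  return env.EVALS === "1";
}

console.log(truthyGate({ EVALS: "0" })); // true
console.log(strictGate({ EVALS: "0" })); // false
console.log(strictGate({ EVALS: "1" })); // true
```

Either function would slot into the existing `evalsEnabled ? describe : describe.skip` selection unchanged.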