From e3d36b645f921714a51f495f1308340606b2bf16 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Mon, 16 Mar 2026 10:09:48 -0500 Subject: [PATCH] =?UTF-8?q?feat:=20add=20verification=20gate=20to=20/ship?= =?UTF-8?q?=20(Step=206.5)=20=E2=80=94=20no=20push=20without=20fresh=20evi?= =?UTF-8?q?dence?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Before pushing, re-verify tests if code changed during review fixes. Rationalization prevention: "Should work now" → RUN IT. "I'm confident" → Confidence is not evidence. Co-Authored-By: Claude Opus 4.6 (1M context) --- ship/SKILL.md | 48 ++++++++++++++++++++++++++++++++++++++++++++++ ship/SKILL.md.tmpl | 23 ++++++++++++++++++++++ 2 files changed, 71 insertions(+) diff --git a/ship/SKILL.md b/ship/SKILL.md index e023816d..14d3aa1b 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -72,6 +72,31 @@ Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-log Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" +## Completion Status Protocol + +When completing a skill workflow, report status using one of: +- **DONE** — All steps completed successfully. Evidence provided for each claim. +- **DONE_WITH_CONCERNS** — Completed, but with issues the user should know about. List each concern. +- **BLOCKED** — Cannot proceed. State what is blocking and what was tried. +- **NEEDS_CONTEXT** — Missing information required to continue. State exactly what you need. + +### Escalation + +It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result." + +Bad work is worse than no work. You will not be penalized for escalating. +- If you have attempted a task 3 times without success, STOP and escalate. +- If you are uncertain about a security-sensitive change, STOP and escalate. +- If the scope of work exceeds what you can verify, STOP and escalate. + +Escalation format: +``` +STATUS: BLOCKED | NEEDS_CONTEXT +REASON: [1-2 sentences] +ATTEMPTED: [what you tried] +RECOMMENDATION: [what the user should do next] +``` + # Ship: Fully Automated Ship Workflow You are running the `/ship` workflow. This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step. The user said `/ship` which means DO IT. Run straight through and output the PR URL at the end. @@ -407,6 +432,28 @@ EOF --- +## Step 6.5: Verification Gate + +**IRON LAW: NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.** + +Before pushing, re-verify if code changed during Steps 4-6: + +1. **Test verification:** If ANY code changed after Step 3's test run (fixes from review findings, CHANGELOG edits don't count), re-run the test suite. Paste fresh output. Stale output from Step 3 is NOT acceptable. + +2. **Build verification:** If the project has a build step, run it. Paste output. + +3. **Rationalization prevention:** + - "Should work now" → RUN IT. + - "I'm confident" → Confidence is not evidence. + - "I already tested earlier" → Code changed since then. Test again. + - "It's a trivial change" → Trivial changes break production. + +**If tests fail here:** STOP. Do not push. Fix the issue and return to Step 3. + +Claiming work is complete without verification is dishonesty, not efficiency. + +--- + ## Step 7: Push Push to the remote with upstream tracking: @@ -467,4 +514,5 @@ EOF - **Split commits for bisectability** — each commit = one logical change. - **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done. - **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies. +- **Never push without fresh verification evidence.** If code changed after Step 3 tests, re-run before pushing. - **The goal is: user says `/ship`, next thing they see is the review + PR URL.** diff --git a/ship/SKILL.md.tmpl b/ship/SKILL.md.tmpl index 06ff5a07..9c1b1649 100644 --- a/ship/SKILL.md.tmpl +++ b/ship/SKILL.md.tmpl @@ -350,6 +350,28 @@ EOF --- +## Step 6.5: Verification Gate + +**IRON LAW: NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.** + +Before pushing, re-verify if code changed during Steps 4-6: + +1. **Test verification:** If ANY code changed after Step 3's test run (fixes from review findings, CHANGELOG edits don't count), re-run the test suite. Paste fresh output. Stale output from Step 3 is NOT acceptable. + +2. **Build verification:** If the project has a build step, run it. Paste output. + +3. **Rationalization prevention:** + - "Should work now" → RUN IT. + - "I'm confident" → Confidence is not evidence. + - "I already tested earlier" → Code changed since then. Test again. + - "It's a trivial change" → Trivial changes break production. + +**If tests fail here:** STOP. Do not push. Fix the issue and return to Step 3. + +Claiming work is complete without verification is dishonesty, not efficiency. + +--- + ## Step 7: Push Push to the remote with upstream tracking: @@ -410,4 +432,5 @@ EOF - **Split commits for bisectability** — each commit = one logical change. - **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done. - **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies. +- **Never push without fresh verification evidence.** If code changed after Step 3 tests, re-run before pushing. - **The goal is: user says `/ship`, next thing they see is the review + PR URL.**