Files
gstack/ship/SKILL.md.tmpl
Garry Tan e3c961d00f fix(ship): detect + repair VERSION/package.json drift in Step 12 (v1.1.1.0) (#1063)
* fix(ship): detect + repair VERSION/package.json drift in Step 12

/ship Step 12's idempotency check read only VERSION and its bump
action wrote only VERSION. package.json's version field was never
updated, so the first bump silently drifted and re-runs couldn't
see it (they keyed on VERSION alone). Any consumer reading
package.json (bun pm, npm publish, registry UIs) saw a stale semver.

Rewrites Step 12 as a four-state dispatch:

  FRESH            → normal bump, writes VERSION + package.json in sync
  ALREADY_BUMPED   → skip, reuse current VERSION
  DRIFT_STALE_PKG  → sync-only repair path, no re-bump (prevents
                     double-bump on re-run)
  DRIFT_UNEXPECTED → halt and ask user (pkg edited manually,
                     ambiguous which value is authoritative)

Hardening: NEW_VERSION validated against MAJOR.MINOR.PATCH.MICRO
pattern before any write; node-or-bun required for JSON parsing
(no sed fallback — unsafe on nested "version" fields); invalid
JSON fails hard instead of silently corrupting.

Adds test/ship-version-sync.test.ts with 12 cases covering every
state transition, including the critical drift-repair regression
that verifies sync does not double-bump (the bug Codex caught in
the plan review of my own original fix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ship): regenerate SKILL.md + refresh golden fixtures

Mechanical follow-on from the Step 12 template edit. `bun run
gen:skill-docs --host all` regenerates ship/SKILL.md; host-config
golden-file regression tests then need fresh baselines copied
from the regenerated claude/codex/factory host variants.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ship): harden Step 12 against whitespace + invalid REPAIR_VERSION

Claude adversarial subagent surfaced three correctness risks in the
Step 12 state machine:

- CURRENT_VERSION and BASE_VERSION were not stripped of CR/whitespace
  on read. A CRLF VERSION file would mismatch the clean package.json
  version, falsely classify as DRIFT_STALE_PKG, then propagate the
  carriage return into package.json via the repair path.

- REPAIR_VERSION was unvalidated. The bump path validates NEW_VERSION
  against the 4-digit semver pattern, but the drift-repair path wrote
  whatever cat VERSION returned directly into package.json. A
  manually-corrupted VERSION file would silently poison the repair.

- Empty-string CURRENT_VERSION (0-byte VERSION, directory-at-VERSION)
  fell through to "not equal to base" and misclassified as
  ALREADY_BUMPED.

Template fix strips \r/newlines/whitespace on every VERSION read,
guards against empty-string results, and applies the same semver
regex gate in the repair path that already protects the bump path.

Adds two regression tests (trailing-CR idempotency + invalid-semver
repair rejection). Total Step 12 coverage: 14 tests, 14/14 pass.

Opens two follow-up TODOs flagged but not fixed in this branch:
test/template drift risk (the tests still reimplement template bash)
and BASE_VERSION silent fallback on git-show failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ship): regenerate SKILL.md + refresh goldens after hardening

Mechanical follow-on from the whitespace + REPAIR_VERSION validation
edits to ship/SKILL.md.tmpl. bun run gen:skill-docs --host all
regenerates ship/SKILL.md; host-config golden-file regression tests
need fresh baselines copied from the regenerated claude/codex/factory
host variants.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.0.1.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 23:58:59 +08:00

834 lines
39 KiB
Cheetah
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
name: ship
preamble-tier: 4
version: 1.0.0
description: |
Ship workflow: detect + merge base branch, run tests, review diff, bump VERSION,
update CHANGELOG, commit, push, create PR. Use when asked to "ship", "deploy",
"push to main", "create a PR", "merge and push", or "get it deployed".
Proactively invoke this skill (do NOT push/PR directly) when the user says code
is ready, asks about deploying, wants to push code up, or asks to create a PR. (gstack)
allowed-tools:
- Bash
- Read
- Write
- Edit
- Grep
- Glob
- Agent
- AskUserQuestion
- WebSearch
sensitive: true
triggers:
- ship it
- create a pr
- push to main
- deploy this
---
{{PREAMBLE}}
{{BASE_BRANCH_DETECT}}
{{GBRAIN_CONTEXT_LOAD}}
# Ship: Fully Automated Ship Workflow
You are running the `/ship` workflow. This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step. The user said `/ship` which means DO IT. Run straight through and output the PR URL at the end.
**Only stop for:**
- On the base branch (abort)
- Merge conflicts that can't be auto-resolved (stop, show conflicts)
- In-branch test failures (pre-existing failures are triaged, not auto-blocking)
- Pre-landing review finds ASK items that need user judgment
- MINOR or MAJOR version bump needed (ask — see Step 12)
- Greptile review comments that need user decision (complex fixes, false positives)
- AI-assessed coverage below minimum threshold (hard gate with user override — see Step 7)
- Plan items NOT DONE with no user override (see Step 8)
- Plan verification failures (see Step 8.1)
- TODOS.md missing and user wants to create one (ask — see Step 14)
- TODOS.md disorganized and user wants to reorganize (ask — see Step 14)
**Never stop for:**
- Uncommitted changes (always include them)
- Version bump choice (auto-pick MICRO or PATCH — see Step 12)
- CHANGELOG content (auto-generate from diff)
- Commit message approval (auto-commit)
- Multi-file changesets (auto-split into bisectable commits)
- TODOS.md completed-item detection (auto-mark)
- Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically)
- Test coverage gaps within target threshold (auto-generate and commit, or flag in PR body)
**Re-run behavior (idempotency):**
Re-running `/ship` means "run the whole checklist again." Every verification step
(tests, coverage audit, plan completion, pre-landing review, adversarial review,
VERSION/CHANGELOG check, TODOS, document-release) runs on every invocation.
Only *actions* are idempotent:
- Step 12: If VERSION already bumped, skip the bump but still read the version
- Step 17: If already pushed, skip the push command
- Step 19: If PR exists, update the body instead of creating a new PR
Never skip a verification step because a prior `/ship` run already performed it.
---
## Step 1: Pre-flight
1. Check the current branch. If on the base branch or the repo's default branch, **abort**: "You're on the base branch. Ship from a feature branch."
2. Run `git status` (never use `-uall`). Uncommitted changes are always included — no need to ask.
3. Run `git diff <base>...HEAD --stat` and `git log <base>..HEAD --oneline` to understand what's being shipped.
4. Check review readiness:
{{REVIEW_DASHBOARD}}
If the Eng Review is NOT "CLEAR":
Print: "No prior eng review found — ship will run its own pre-landing review in Step 9."
Check diff size: `git diff <base>...HEAD --stat | tail -1`. If the diff is >200 lines, add: "Note: This is a large diff. Consider running `/plan-eng-review` or `/autoplan` for architecture-level review before shipping."
If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block.
For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 9, but consider running /design-review for a full visual audit post-implementation." Still never block.
Continue to Step 2 — do NOT block or ask. Ship runs its own review in Step 9.
---
## Step 2: Distribution Pipeline Check
If the diff introduces a new standalone artifact (CLI binary, library package, tool) — not a web
service with existing deployment — verify that a distribution pipeline exists.
1. Check if the diff adds a new `cmd/` directory, `main.go`, or `bin/` entry point:
```bash
git diff origin/<base> --name-only | grep -E '(cmd/.*/main\.go|bin/|Cargo\.toml|setup\.py|package\.json)' | head -5
```
2. If new artifact detected, check for a release workflow:
```bash
ls .github/workflows/ 2>/dev/null | grep -iE 'release|publish|dist'
grep -qE 'release|publish|deploy' .gitlab-ci.yml 2>/dev/null && echo "GITLAB_CI_RELEASE"
```
3. **If no release pipeline exists and a new artifact was added:** Use AskUserQuestion:
- "This PR adds a new binary/tool but there's no CI/CD pipeline to build and publish it.
Users won't be able to download the artifact after merge."
- A) Add a release workflow now (CI/CD release pipeline — GitHub Actions or GitLab CI depending on platform)
- B) Defer — add to TODOS.md
- C) Not needed — this is internal/web-only, existing deployment covers it
4. **If release pipeline exists:** Continue silently.
5. **If no new artifact detected:** Skip silently.
---
## Step 3: Merge the base branch (BEFORE tests)
Fetch and merge the base branch into the feature branch so tests run against the merged state:
```bash
git fetch origin <base> && git merge origin/<base> --no-edit
```
**If there are merge conflicts:** Try to auto-resolve if they are simple (VERSION, schema.rb, CHANGELOG ordering). If conflicts are complex or ambiguous, **STOP** and show them.
**If already up to date:** Continue silently.
---
## Step 4: Test Framework Bootstrap
{{TEST_BOOTSTRAP}}
---
## Step 5: Run tests (on merged code)
**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
`db:test:prepare` internally, which loads the schema into the correct lane database.
Running bare test migrations without INSTANCE hits an orphan DB and corrupts structure.sql.
Run both test suites in parallel:
```bash
bin/test-lane 2>&1 | tee /tmp/ship_tests.txt &
npm run test 2>&1 | tee /tmp/ship_vitest.txt &
wait
```
After both complete, read the output files and check pass/fail.
**If any test fails:** Do NOT immediately stop. Apply the Test Failure Ownership Triage:
{{TEST_FAILURE_TRIAGE}}
**After triage:** If any in-branch failures remain unfixed, **STOP**. Do not proceed. If all failures were pre-existing and handled (fixed, TODOed, assigned, or skipped), continue to Step 6.
**If all pass:** Continue silently — just note the counts briefly.
---
## Step 6: Eval Suites (conditional)
Evals are mandatory when prompt-related files change. Skip this step entirely if no prompt files are in the diff.
**1. Check if the diff touches prompt-related files:**
```bash
git diff origin/<base> --name-only
```
Match against these patterns (from CLAUDE.md):
- `app/services/*_prompt_builder.rb`
- `app/services/*_generation_service.rb`, `*_writer_service.rb`, `*_designer_service.rb`
- `app/services/*_evaluator.rb`, `*_scorer.rb`, `*_classifier_service.rb`, `*_analyzer.rb`
- `app/services/concerns/*voice*.rb`, `*writing*.rb`, `*prompt*.rb`, `*token*.rb`
- `app/services/chat_tools/*.rb`, `app/services/x_thread_tools/*.rb`
- `config/system_prompts/*.txt`
- `test/evals/**/*` (eval infrastructure changes affect all suites)
**If no matches:** Print "No prompt-related files changed — skipping evals." and continue to Step 9.
**2. Identify affected eval suites:**
Each eval runner (`test/evals/*_eval_runner.rb`) declares `PROMPT_SOURCE_FILES` listing which source files affect it. Grep these to find which suites match the changed files:
```bash
grep -l "changed_file_basename" test/evals/*_eval_runner.rb
```
Map runner → test file: `post_generation_eval_runner.rb` → `post_generation_eval_test.rb`.
**Special cases:**
- Changes to `test/evals/judges/*.rb`, `test/evals/support/*.rb`, or `test/evals/fixtures/` affect ALL suites that use those judges/support files. Check imports in the eval test files to determine which.
- Changes to `config/system_prompts/*.txt` — grep eval runners for the prompt filename to find affected suites.
- If unsure which suites are affected, run ALL suites that could plausibly be impacted. Over-testing is better than missing a regression.
**3. Run affected suites at `EVAL_JUDGE_TIER=full`:**
`/ship` is a pre-merge gate, so always use full tier (Sonnet structural + Opus persona judges).
```bash
EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/<suite>_eval_test.rb 2>&1 | tee /tmp/ship_evals.txt
```
If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.
**4. Check results:**
- **If any eval fails:** Show the failures, the cost dashboard, and **STOP**. Do not proceed.
- **If all pass:** Note pass counts and cost. Continue to Step 9.
**5. Save eval output** — include eval results and cost dashboard in the PR body (Step 19).
**Tier reference (for context — /ship always uses `full`):**
| Tier | When | Speed (cached) | Cost |
|------|------|----------------|------|
| `fast` (Haiku) | Dev iteration, smoke tests | ~5s (14x faster) | ~$0.07/run |
| `standard` (Sonnet) | Default dev, `bin/test-lane --eval` | ~17s (4x faster) | ~$0.37/run |
| `full` (Opus persona) | **`/ship` and pre-merge** | ~72s (baseline) | ~$1.27/run |
---
## Step 7: Test Coverage Audit
**Dispatch this step as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent runs the coverage audit in a fresh context window — the parent only sees the conclusion, not intermediate file reads. This is context-rot defense.
**Subagent prompt:** Pass the following instructions to the subagent, with `<base>` substituted with the base branch:
> You are running a ship-workflow test coverage audit. Run `git diff <base>...HEAD` as needed. Do not commit or push — report only.
>
> {{TEST_COVERAGE_AUDIT_SHIP}}
>
> After your analysis, output a single JSON object on the LAST LINE of your response (no other text after it):
> `{"coverage_pct":N,"gaps":N,"diagram":"<full markdown coverage diagram for PR body>","tests_added":["path",...]}`
**Parent processing:**
1. Read the subagent's final output. Parse the LAST line as JSON.
2. Store `coverage_pct` (for Step 20 metrics), `gaps` (user summary), `tests_added` (for the commit).
3. Embed `diagram` verbatim in the PR body's `## Test Coverage` section (Step 19).
4. Print a one-line summary: `Coverage: {coverage_pct}%, {gaps} gaps. {tests_added.length} tests added.`
**If the subagent fails, times out, or returns invalid JSON:** Fall back to running the audit inline in the parent. Do not block /ship on subagent failure — partial results are better than none.
---
## Step 8: Plan Completion Audit
**Dispatch this step as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent reads the plan file and every referenced code file in its own fresh context. Parent gets only the conclusion.
**Subagent prompt:** Pass these instructions to the subagent:
> You are running a ship-workflow plan completion audit. The base branch is `<base>`. Use `git diff <base>...HEAD` to see what shipped. Do not commit or push — report only.
>
> {{PLAN_COMPLETION_AUDIT_SHIP}}
>
> After your analysis, output a single JSON object on the LAST LINE of your response (no other text after it):
> `{"total_items":N,"done":N,"changed":N,"deferred":N,"summary":"<markdown checklist for PR body>"}`
**Parent processing:**
1. Parse the LAST line of the subagent's output as JSON.
2. Store `done`, `deferred` for Step 20 metrics; use `summary` in PR body.
3. If `deferred > 0` and no user override, present the deferred items via AskUserQuestion before continuing.
4. Embed `summary` in PR body's `## Plan Completion` section (Step 19).
**If the subagent fails or returns invalid JSON:** Fall back to running the audit inline. Never block /ship on subagent failure.
---
{{PLAN_VERIFICATION_EXEC}}
{{LEARNINGS_SEARCH}}
{{SCOPE_DRIFT}}
---
## Step 9: Pre-Landing Review
Review the diff for structural issues that tests don't catch.
1. Read `.claude/skills/review/checklist.md`. If the file cannot be read, **STOP** and report the error.
2. Run `git diff origin/<base>` to get the full diff (scoped to feature changes against the freshly-fetched base branch).
3. Apply the review checklist in two passes:
- **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
- **Pass 2 (INFORMATIONAL):** All remaining categories
{{CONFIDENCE_CALIBRATION}}
{{DESIGN_REVIEW_LITE}}
Include any design findings alongside the code review findings. They follow the same Fix-First flow below.
{{REVIEW_ARMY}}
{{CROSS_REVIEW_DEDUP}}
4. **Classify each finding from both the checklist pass and specialist review (Step 9.1-Step 9.2) as AUTO-FIX or ASK** per the Fix-First Heuristic in
checklist.md. Critical findings lean toward ASK; informational lean toward AUTO-FIX.
5. **Auto-fix all AUTO-FIX items.** Apply each fix. Output one line per fix:
`[AUTO-FIXED] [file:line] Problem → what you did`
6. **If ASK items remain,** present them in ONE AskUserQuestion:
- List each with number, severity, problem, recommended fix
- Per-item options: A) Fix B) Skip
- Overall RECOMMENDATION
- If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead
7. **After all fixes (auto + user-approved):**
- If ANY fixes were applied: commit fixed files by name (`git add <fixed-files> && git commit -m "fix: pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test.
- If no fixes applied (all ASK items skipped, or no issues found): continue to Step 12.
8. Output summary: `Pre-Landing Review: N issues — M auto-fixed, K asked (J fixed, L skipped)`
If no issues found: `Pre-Landing Review: No issues found.`
9. Persist the review result to the review log:
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"quality_score":SCORE,"specialists":SPECIALISTS_JSON,"findings":FINDINGS_JSON,"commit":"'"$(git rev-parse --short HEAD)"'","via":"ship"}'
```
Substitute TIMESTAMP (ISO 8601), STATUS ("clean" if no issues, "issues_found" otherwise),
and N values from the summary counts above. The `via:"ship"` distinguishes from standalone `/review` runs.
- `quality_score` = the PR Quality Score computed in Step 9.2 (e.g., 7.5). If specialists were skipped (small diff), use `10.0`
- `specialists` = the per-specialist stats object compiled in Step 9.2. Each specialist that was considered gets an entry: `{"dispatched":true/false,"findings":N,"critical":N,"informational":N}` if dispatched, or `{"dispatched":false,"reason":"scope|gated"}` if skipped. Example: `{"testing":{"dispatched":true,"findings":2,"critical":0,"informational":2},"security":{"dispatched":false,"reason":"scope"}}`
- `findings` = array of per-finding records. For each finding (from checklist pass and specialists), include: `{"fingerprint":"path:line:category","severity":"CRITICAL|INFORMATIONAL","action":"ACTION"}`. ACTION is `"auto-fixed"`, `"fixed"` (user approved), or `"skipped"` (user chose Skip).
Save the review output — it goes into the PR body in Step 19.
---
## Step 10: Address Greptile review comments (if PR exists)
**Dispatch the fetch + classification as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent pulls every Greptile comment, runs the escalation detection algorithm, and classifies each comment. Parent receives a structured list and handles user interaction + file edits.
**Subagent prompt:**
> You are classifying Greptile review comments for a /ship workflow. Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, classify, and **escalation detection** steps. Do NOT fix code, do NOT reply to comments, do NOT commit — report only.
>
> For each comment, assign: `classification` (`valid_actionable`, `already_fixed`, `false_positive`, `suppressed`), `escalation_tier` (1 or 2), the file:line or [top-level] tag, body summary, and permalink URL.
>
> If no PR exists, `gh` fails, the API errors, or there are zero comments, output: `{"total":0,"comments":[]}` and stop.
>
> Otherwise, output a single JSON object on the LAST LINE of your response:
> `{"total":N,"comments":[{"classification":"...","escalation_tier":N,"ref":"file:line","summary":"...","permalink":"url"},...]}`
**Parent processing:**
Parse the LAST line as JSON.
If `total` is 0, skip this step silently. Continue to Step 12.
Otherwise, print: `+ {total} Greptile comments ({valid_actionable} valid, {already_fixed} already fixed, {false_positive} FP)`.
For each comment in `comments`:
**VALID & ACTIONABLE:** Use AskUserQuestion with:
- The comment (file:line or [top-level] + body summary + permalink URL)
- `RECOMMENDATION: Choose A because [one-line reason]`
- Options: A) Fix now, B) Acknowledge and ship anyway, C) It's a false positive
- If user chooses A: apply the fix, commit the fixed files (`git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"`), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation), and save to both per-project and global greptile-history (type: fix).
- If user chooses C: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp).
**VALID BUT ALREADY FIXED:** Reply using the **Already Fixed reply template** from greptile-triage.md — no AskUserQuestion needed:
- Include what was done and the fixing commit SHA
- Save to both per-project and global greptile-history (type: already-fixed)
**FALSE POSITIVE:** Use AskUserQuestion:
- Show the comment and why you think it's wrong (file:line or [top-level] + body summary + permalink URL)
- Options:
- A) Reply to Greptile explaining the false positive (recommended if clearly wrong)
- B) Fix it anyway (if trivial)
- C) Ignore silently
- If user chooses A: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp)
**SUPPRESSED:** Skip silently — these are known false positives from previous triage.
**After all comments are resolved:** If any fixes were applied, the tests from Step 5 are now stale. **Re-run tests** (Step 5) before continuing to Step 12. If no fixes were applied, continue to Step 12.
---
{{ADVERSARIAL_STEP}}
{{LEARNINGS_LOG}}
{{GBRAIN_SAVE_RESULTS}}
## Step 12: Version bump (auto-decide)
**Idempotency check:** Before bumping, classify the state by comparing `VERSION` against the base branch AND against `package.json`'s `version` field. Four states: FRESH (do bump), ALREADY_BUMPED (skip bump), DRIFT_STALE_PKG (sync pkg only, no re-bump), DRIFT_UNEXPECTED (stop and ask).
```bash
BASE_VERSION=$(git show origin/<base>:VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "0.0.0.0")
CURRENT_VERSION=$(cat VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "0.0.0.0")
[ -z "$BASE_VERSION" ] && BASE_VERSION="0.0.0.0"
[ -z "$CURRENT_VERSION" ] && CURRENT_VERSION="0.0.0.0"
PKG_VERSION=""
PKG_EXISTS=0
if [ -f package.json ]; then
PKG_EXISTS=1
if command -v node >/dev/null 2>&1; then
PKG_VERSION=$(node -e 'const p=require("./package.json");process.stdout.write(p.version||"")' 2>/dev/null)
PARSE_EXIT=$?
elif command -v bun >/dev/null 2>&1; then
PKG_VERSION=$(bun -e 'const p=require("./package.json");process.stdout.write(p.version||"")' 2>/dev/null)
PARSE_EXIT=$?
else
echo "ERROR: package.json exists but neither node nor bun is available. Install one and re-run."
exit 1
fi
if [ "$PARSE_EXIT" != "0" ]; then
echo "ERROR: package.json is not valid JSON. Fix the file before re-running /ship."
exit 1
fi
fi
echo "BASE: $BASE_VERSION VERSION: $CURRENT_VERSION package.json: ${PKG_VERSION:-<none>}"
if [ "$CURRENT_VERSION" = "$BASE_VERSION" ]; then
if [ "$PKG_EXISTS" = "1" ] && [ -n "$PKG_VERSION" ] && [ "$PKG_VERSION" != "$CURRENT_VERSION" ]; then
echo "STATE: DRIFT_UNEXPECTED"
echo "package.json version ($PKG_VERSION) disagrees with VERSION ($CURRENT_VERSION) while VERSION matches base."
echo "This looks like a manual edit to package.json bypassing /ship. Reconcile manually, then re-run."
exit 1
fi
echo "STATE: FRESH"
else
if [ "$PKG_EXISTS" = "1" ] && [ -n "$PKG_VERSION" ] && [ "$PKG_VERSION" != "$CURRENT_VERSION" ]; then
echo "STATE: DRIFT_STALE_PKG"
else
echo "STATE: ALREADY_BUMPED"
fi
fi
```
Read the `STATE:` line and dispatch:
- **FRESH** → proceed with the bump action below (steps 14).
- **ALREADY_BUMPED** → skip the bump. Reuse `CURRENT_VERSION` for CHANGELOG and PR body. Continue to the next step.
- **DRIFT_STALE_PKG** → a prior `/ship` bumped `VERSION` but failed to update `package.json`. Run the sync-only repair block below (after step 4). Do NOT re-bump. Reuse `CURRENT_VERSION` for CHANGELOG and PR body.
- **DRIFT_UNEXPECTED** → `/ship` has halted (exit 1). Resolve manually; /ship cannot tell which file is authoritative.
1. Read the current `VERSION` file (4-digit format: `MAJOR.MINOR.PATCH.MICRO`)
2. **Auto-decide the bump level based on the diff:**
- Count lines changed (`git diff origin/<base>...HEAD --stat | tail -1`)
- Check for feature signals: new route/page files (e.g. `app/*/page.tsx`, `pages/*.ts`), new DB migration/schema files, new test files alongside new source files, or branch name starting with `feat/`
- **MICRO** (4th digit): < 50 lines changed, trivial tweaks, typos, config
- **PATCH** (3rd digit): 50+ lines changed, no feature signals detected
- **MINOR** (2nd digit): **ASK the user** if ANY feature signal is detected, OR 500+ lines changed, OR new modules/packages added
- **MAJOR** (1st digit): **ASK the user** — only for milestones or breaking changes
3. Compute the new version:
- Bumping a digit resets all digits to its right to 0
- Example: `0.19.1.0` + PATCH → `0.19.2.0`
4. **Validate** `NEW_VERSION` and write it to **both** `VERSION` and `package.json`. This block runs only when `STATE: FRESH`.
```bash
if ! printf '%s' "$NEW_VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
echo "ERROR: NEW_VERSION ($NEW_VERSION) does not match MAJOR.MINOR.PATCH.MICRO pattern. Aborting."
exit 1
fi
echo "$NEW_VERSION" > VERSION
if [ -f package.json ]; then
if command -v node >/dev/null 2>&1; then
node -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$NEW_VERSION" || {
echo "ERROR: failed to update package.json. VERSION was written but package.json is now stale. Fix and re-run — the new idempotency check will detect the drift."
exit 1
}
elif command -v bun >/dev/null 2>&1; then
bun -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$NEW_VERSION" || {
echo "ERROR: failed to update package.json. VERSION was written but package.json is now stale."
exit 1
}
else
echo "ERROR: package.json exists but neither node nor bun is available."
exit 1
fi
fi
```
**DRIFT_STALE_PKG repair path** — runs when idempotency reports `STATE: DRIFT_STALE_PKG`. No re-bump; sync `package.json.version` to the current `VERSION` and continue. Reuse `CURRENT_VERSION` for CHANGELOG and PR body.
```bash
REPAIR_VERSION=$(cat VERSION | tr -d '\r\n[:space:]')
if ! printf '%s' "$REPAIR_VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
echo "ERROR: VERSION file contents ($REPAIR_VERSION) do not match MAJOR.MINOR.PATCH.MICRO pattern. Refusing to propagate invalid semver into package.json. Fix VERSION manually, then re-run /ship."
exit 1
fi
if command -v node >/dev/null 2>&1; then
node -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$REPAIR_VERSION" || {
echo "ERROR: drift repair failed — could not update package.json."
exit 1
}
else
bun -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$REPAIR_VERSION" || {
echo "ERROR: drift repair failed."
exit 1
}
fi
echo "Drift repaired: package.json synced to $REPAIR_VERSION. No version bump performed."
```
---
{{CHANGELOG_WORKFLOW}}
---
## Step 14: TODOS.md (auto-update)
Cross-reference the project's TODOS.md against the changes being shipped. Mark completed items automatically; prompt only if the file is missing or disorganized.
Read `.claude/skills/review/TODOS-format.md` for the canonical format reference.
**1. Check if TODOS.md exists** in the repository root.
**If TODOS.md does not exist:** Use AskUserQuestion:
- Message: "GStack recommends maintaining a TODOS.md organized by skill/component, then priority (P0 at top through P4, then Completed at bottom). See TODOS-format.md for the full format. Would you like to create one?"
- Options: A) Create it now, B) Skip for now
- If A: Create `TODOS.md` with a skeleton (# TODOS heading + ## Completed section). Continue to step 3.
- If B: Skip the rest of Step 14. Continue to Step 15.
**2. Check structure and organization:**
Read TODOS.md and verify it follows the recommended structure:
- Items grouped under `## <Skill/Component>` headings
- Each item has `**Priority:**` field with P0-P4 value
- A `## Completed` section at the bottom
**If disorganized** (missing priority fields, no component groupings, no Completed section): Use AskUserQuestion:
- Message: "TODOS.md doesn't follow the recommended structure (skill/component groupings, P0-P4 priority, Completed section). Would you like to reorganize it?"
- Options: A) Reorganize now (recommended), B) Leave as-is
- If A: Reorganize in-place following TODOS-format.md. Preserve all content — only restructure, never delete items.
- If B: Continue to step 3 without restructuring.
**3. Detect completed TODOs:**
This step is fully automatic — no user interaction.
Use the diff and commit history already gathered in earlier steps:
- `git diff <base>...HEAD` (full diff against the base branch)
- `git log <base>..HEAD --oneline` (all commits being shipped)
For each TODO item, check if the changes in this PR complete it by:
- Matching commit messages against the TODO title and description
- Checking if files referenced in the TODO appear in the diff
- Checking if the TODO's described work matches the functional changes
**Be conservative:** Only mark a TODO as completed if there is clear evidence in the diff. If uncertain, leave it alone.
**4. Move completed items** to the `## Completed` section at the bottom. Append: `**Completed:** vX.Y.Z (YYYY-MM-DD)`
**5. Output summary:**
- `TODOS.md: N items marked complete (item1, item2, ...). M items remaining.`
- Or: `TODOS.md: No completed items detected. M items remaining.`
- Or: `TODOS.md: Created.` / `TODOS.md: Reorganized.`
**6. Defensive:** If TODOS.md cannot be written (permission error, disk full), warn the user and continue. Never stop the ship workflow for a TODOS failure.
Save this summary — it goes into the PR body in Step 19.
---
## Step 15: Commit (bisectable chunks)
**Goal:** Create small, logical commits that work well with `git bisect` and help LLMs understand what changed.
1. Analyze the diff and group changes into logical commits. Each commit should represent **one coherent change** — not one file, but one logical unit.
2. **Commit ordering** (earlier commits first):
- **Infrastructure:** migrations, config changes, route additions
- **Models & services:** new models, services, concerns (with their tests)
- **Controllers & views:** controllers, views, JS/React components (with their tests)
- **VERSION + CHANGELOG + TODOS.md:** always in the final commit
3. **Rules for splitting:**
- A model and its test file go in the same commit
- A service and its test file go in the same commit
- A controller, its views, and its test go in the same commit
- Migrations are their own commit (or grouped with the model they support)
- Config/route changes can group with the feature they enable
- If the total diff is small (< 50 lines across < 4 files), a single commit is fine
4. **Each commit must be independently valid** — no broken imports, no references to code that doesn't exist yet. Order commits so dependencies come first.
5. Compose each commit message:
- First line: `<type>: <summary>` (type = feat/fix/chore/refactor/docs)
- Body: brief description of what this commit contains
- Only the **final commit** (VERSION + CHANGELOG) gets the version tag and co-author trailer:
```bash
git commit -m "$(cat <<'EOF'
chore: bump version and changelog (vX.Y.Z.W)
{{CO_AUTHOR_TRAILER}}
EOF
)"
```
---
## Step 16: Verification Gate
**IRON LAW: NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.**
Before pushing, re-verify if code changed during Steps 4-6:
1. **Test verification:** If ANY code changed after Step 5's test run (fixes from review findings, CHANGELOG edits don't count), re-run the test suite. Paste fresh output. Stale output from Step 5 is NOT acceptable.
2. **Build verification:** If the project has a build step, run it. Paste output.
3. **Rationalization prevention:**
- "Should work now" → RUN IT.
- "I'm confident" → Confidence is not evidence.
- "I already tested earlier" → Code changed since then. Test again.
- "It's a trivial change" → Trivial changes break production.
**If tests fail here:** STOP. Do not push. Fix the issue and return to Step 5.
Claiming work is complete without verification is dishonesty, not efficiency.
---
## Step 17: Push
**Idempotency check:** Check if the branch is already pushed and up to date.
```bash
git fetch origin <branch-name> 2>/dev/null
LOCAL=$(git rev-parse HEAD)
REMOTE=$(git rev-parse origin/<branch-name> 2>/dev/null || echo "none")
echo "LOCAL: $LOCAL REMOTE: $REMOTE"
[ "$LOCAL" = "$REMOTE" ] && echo "ALREADY_PUSHED" || echo "PUSH_NEEDED"
```
If `ALREADY_PUSHED`, skip the push but continue to Step 18. Otherwise push with upstream tracking:
```bash
git push -u origin <branch-name>
```
**You are NOT done.** The code is pushed but documentation sync and PR creation are mandatory final steps. Continue to Step 18.
---
## Step 18: Documentation sync (via subagent, before PR creation)
**Dispatch /document-release as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent gets a fresh context window — zero rot from the preceding 17 steps. It also runs the **full** `/document-release` workflow (with CHANGELOG clobber protection, doc exclusions, risky-change gates, named staging, race-safe PR body editing) rather than a weaker reimplementation.
**Sequencing:** This step runs AFTER Step 17 (Push) and BEFORE Step 19 (Create PR). The PR is created once from final HEAD with the `## Documentation` section baked into the initial body. No create-then-re-edit dance.
**Subagent prompt:**
> You are executing the /document-release workflow after a code push. Read the full skill file `${HOME}/.claude/skills/gstack/document-release/SKILL.md` and execute its complete workflow end-to-end, including CHANGELOG clobber protection, doc exclusions, risky-change gates, and named staging. Do NOT attempt to edit the PR body — no PR exists yet. Branch: `<branch>`, base: `<base>`.
>
> After completing the workflow, output a single JSON object on the LAST LINE of your response (no other text after it):
> `{"files_updated":["README.md","CLAUDE.md",...],"commit_sha":"abc1234","pushed":true,"documentation_section":"<markdown block for PR body's ## Documentation section>"}`
>
> If no documentation files needed updating, output:
> `{"files_updated":[],"commit_sha":null,"pushed":false,"documentation_section":null}`
**Parent processing:**
1. Parse the LAST line of the subagent's output as JSON.
2. Store `documentation_section` — Step 19 embeds it in the PR body (or omits the section if null).
3. If `files_updated` is non-empty, print: `Documentation synced: {files_updated.length} files updated, committed as {commit_sha}`.
4. If `files_updated` is empty, print: `Documentation is current — no updates needed.`
**If the subagent fails or returns invalid JSON:** Print a warning and proceed to Step 19 without a `## Documentation` section. Do not block /ship on subagent failure. The user can run `/document-release` manually after the PR lands.
---
## Step 19: Create PR/MR
**Idempotency check:** Check if a PR/MR already exists for this branch.
**If GitHub:**
```bash
gh pr view --json url,number,state -q 'if .state == "OPEN" then "PR #\(.number): \(.url)" else "NO_PR" end' 2>/dev/null || echo "NO_PR"
```
**If GitLab:**
```bash
glab mr view -F json 2>/dev/null | jq -r 'if .state == "opened" then "MR_EXISTS" else "NO_MR" end' 2>/dev/null || echo "NO_MR"
```
If an **open** PR/MR already exists: **update** the PR body using `gh pr edit --body "..."` (GitHub) or `glab mr update -d "..."` (GitLab). Always regenerate the PR body from scratch using this run's fresh results (test output, coverage audit, review findings, adversarial review, TODOS summary, documentation_section from Step 18). Never reuse stale PR body content from a prior run. Print the existing URL and continue to Step 20.
If no PR/MR exists: create a pull request (GitHub) or merge request (GitLab) using the platform detected in Step 0.
The PR/MR body should contain these sections:
```
## Summary
<Summarize ALL changes being shipped. Run `git log <base>..HEAD --oneline` to enumerate
every commit. Exclude the VERSION/CHANGELOG metadata commit (that's this PR's bookkeeping,
not a substantive change). Group the remaining commits into logical sections (e.g.,
"**Performance**", "**Dead Code Removal**", "**Infrastructure**"). Every substantive commit
must appear in at least one section. If a commit's work isn't reflected in the summary,
you missed it.>
## Test Coverage
<coverage diagram from Step 7, or "All new code paths have test coverage.">
<If Step 7 ran: "Tests: {before} → {after} (+{delta} new)">
## Pre-Landing Review
<findings from Step 9 code review, or "No issues found.">
## Design Review
<If design review ran: "Design Review (lite): N findings — M auto-fixed, K skipped. AI Slop: clean/N issues.">
<If no frontend files changed: "No frontend files changed — design review skipped.">
## Eval Results
<If evals ran: suite names, pass/fail counts, cost dashboard summary. If skipped: "No prompt-related files changed — evals skipped.">
## Greptile Review
<If Greptile comments were found: bullet list with [FIXED] / [FALSE POSITIVE] / [ALREADY FIXED] tag + one-line summary per comment>
<If no Greptile comments found: "No Greptile comments.">
<If no PR existed during Step 10: omit this section entirely>
## Scope Drift
<If scope drift ran: "Scope Check: CLEAN" or list of drift/creep findings>
<If no scope drift: omit this section>
## Plan Completion
<If plan file found: completion checklist summary from Step 8>
<If no plan file: "No plan file detected.">
<If plan items deferred: list deferred items>
## Verification Results
<If verification ran: summary from Step 8.1 (N PASS, M FAIL, K SKIPPED)>
<If skipped: reason (no plan, no server, no verification section)>
<If not applicable: omit this section>
## TODOS
<If items marked complete: bullet list of completed items with version>
<If no items completed: "No TODO items completed in this PR.">
<If TODOS.md created or reorganized: note that>
<If TODOS.md doesn't exist and user skipped: omit this section>
## Documentation
<Embed the `documentation_section` string returned by Step 18's subagent here, verbatim.>
<If Step 18 returned `documentation_section: null` (no docs updated), omit this section entirely.>
## Test plan
- [x] All Rails tests pass (N runs, 0 failures)
- [x] All Vitest tests pass (N tests)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
```
**If GitHub:**
```bash
gh pr create --base <base> --title "<type>: <summary>" --body "$(cat <<'EOF'
<PR body from above>
EOF
)"
```
**If GitLab:**
```bash
glab mr create -b <base> -t "<type>: <summary>" -d "$(cat <<'EOF'
<MR body from above>
EOF
)"
```
**If neither CLI is available:**
Print the branch name, remote URL, and instruct the user to create the PR/MR manually via the web UI. Do not stop — the code is pushed and ready.
**Output the PR/MR URL** — then proceed to Step 20.
---
## Step 20: Persist ship metrics
Log coverage and plan completion data so `/retro` can track trends:
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)" && mkdir -p ~/.gstack/projects/$SLUG
```
Append to `~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl`:
```bash
echo '{"skill":"ship","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","coverage_pct":COVERAGE_PCT,"plan_items_total":PLAN_TOTAL,"plan_items_done":PLAN_DONE,"verification_result":"VERIFY_RESULT","version":"VERSION","branch":"BRANCH"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
```
Substitute from earlier steps:
- **COVERAGE_PCT**: coverage percentage from Step 7 diagram (integer, or -1 if undetermined)
- **PLAN_TOTAL**: total plan items extracted in Step 8 (0 if no plan file)
- **PLAN_DONE**: count of DONE + CHANGED items from Step 8 (0 if no plan file)
- **VERIFY_RESULT**: "pass", "fail", or "skipped" from Step 8.1
- **VERSION**: from the VERSION file
- **BRANCH**: current branch name
This step is automatic — never skip it, never ask for confirmation.
---
## Important Rules
- **Never skip tests.** If tests fail, stop.
- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
- **Never force push.** Use regular `git push` only.
- **Never ask for trivial confirmations** (e.g., "ready to push?", "create PR?"). DO stop for: version bumps (MINOR/MAJOR), pre-landing review findings (ASK items), and Codex structured review [P1] findings (large diffs only).
- **Always use the 4-digit version format** from the VERSION file.
- **Date format in CHANGELOG:** `YYYY-MM-DD`
- **Split commits for bisectability** — each commit = one logical change.
- **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done.
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies.
- **Never push without fresh verification evidence.** If code changed after Step 5 tests, re-run before pushing.
- **Step 7 generates coverage tests.** They must pass before committing. Never commit failing tests.
- **The goal is: user says `/ship`, next thing they see is the review + PR URL + auto-synced docs.**