mirror of https://github.com/garrytan/gstack.git
synced 2026-05-14 00:12:12 +02:00
00f966b3ec
* fix(codex): use resume-compatible flags

* fix: V-001 security vulnerability

  Automated security fix generated by Orbis Security AI

* docs: align prompt-injection thresholds to security.ts (v1.6.4.0 catch-up)

  CLAUDE.md:290 and ARCHITECTURE.md:159 were missed when WARN was bumped 0.60 → 0.75 in d75402bb (v1.6.4.0, "cut Haiku classifier FP from 44% to 23%, gate now enforced", #1135). browse/src/security.ts:37 has WARN: 0.75 and BROWSER.md:743 was updated alongside that commit; CLAUDE.md and ARCHITECTURE.md still read 0.60.

  Also adds the SOLO_CONTENT_BLOCK: 0.92 entry to CLAUDE.md (already in security.ts:50 and BROWSER.md:745, missing from CLAUDE.md's threshold table).

  No code change. No behavior change. Pure doc-vs-code alignment.

  Verification:

      $ grep -n "WARN" browse/src/security.ts CLAUDE.md ARCHITECTURE.md BROWSER.md
      browse/src/security.ts:37: WARN: 0.75,
      CLAUDE.md:290: - `WARN: 0.75` ...
      ARCHITECTURE.md:159: ...>= `WARN` (0.75)...
      BROWSER.md:743: - `WARN: 0.75` ...

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: Korean/CJK IME input and rendering in Sidebar Terminal

  Fixes #1272

  This commit addresses three separate Korean/CJK bugs in the Sidebar Terminal:

  **Bug 1 - IME Input**: Korean text typed via IME composition was not reaching the PTY correctly. Added compositionstart/compositionend event listeners to suppress partial jamo fragments and only send the final composed string.

  **Bug 2a - Font Rendering**: Added CJK monospace font fallbacks ("Noto Sans Mono CJK KR", "Malgun Gothic") to both the xterm.js fontFamily config and the CSS --font-mono variable. This ensures consistent cell-width calculations for Korean characters.

  **Bug 2b - UTF-8 Boundary Detection**: Added buffering logic to prevent multi-byte UTF-8 characters (Korean is 3 bytes) from being split across WebSocket chunks. This follows the same pattern as PR #1007 which fixed the sidebar-agent path, but extends it to the terminal-agent path.
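The Bug 2b chunk-boundary buffering can be sketched as follows. This is a hypothetical helper, not the actual terminal-agent code: it holds back a trailing incomplete UTF-8 sequence so only whole characters are decoded, and the remainder is prepended to the next chunk.

```typescript
// Split a byte chunk into a decodable prefix and an incomplete UTF-8 tail.
// A Korean syllable is 3 bytes; if a chunk ends mid-sequence, the lead byte
// and its partial continuation bytes go into `rest` for the next chunk.
function splitCompleteUtf8(buf: Buffer): { complete: Buffer; rest: Buffer } {
  let end = buf.length;
  // Walk back over at most the last 4 bytes looking for a lead byte.
  for (let i = buf.length - 1; i >= 0 && i >= buf.length - 4; i--) {
    const b = buf[i];
    if ((b & 0b1100_0000) !== 0b1000_0000) {
      // Found a lead byte: how long should this sequence be?
      const need = b >= 0xf0 ? 4 : b >= 0xe0 ? 3 : b >= 0xc0 ? 2 : 1;
      if (buf.length - i < need) end = i; // sequence incomplete: buffer it
      break;
    }
  }
  return { complete: buf.subarray(0, end), rest: buf.subarray(end) };
}
```

A caller would decode `complete` and `Buffer.concat([rest, nextChunk])` on the following WebSocket message.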
  Special thanks to @ldybob for the excellent root cause analysis and proposed solutions in issue #1272. Tested on WSL2 + Windows 11 with Korean IME.

* fix(ship): tighten Plan Completion gate (VAS-449 remediation)

  VAS-446 shipped with a PLAN.md acceptance criterion (domain-hq has /docs/dashboard.md) silently skipped. /ship's Plan Completion subagent existed at ship time (added in v1.4.1.0) but the gate let the failure through.

  Four structural fixes:

  1. Path concreteness rule: items naming a concrete filesystem path MUST be classified DONE/NOT DONE via [ -f <path> ], never UNVERIFIABLE.
  2. Validator detection: CONTENT-SHAPE items scan the target repo's package.json for validate-* scripts and run them before falling back to UNVERIFIABLE.
  3. Per-item UNVERIFIABLE confirmation: replaces the blanket "I've checked each one" with a per-item Y/N/D loop. The blanket-confirm path is the exact failure VAS-449 surfaced.
  4. Subagent fail-closed: if the Plan Completion subagent and the inline fallback both fail, surface an explicit AskUserQuestion instead of a silent pass. Replaces the prior "Never block /ship on subagent failure" fail-open.

  Locked in by test/ship-plan-completion-invariants.test.ts (5 assertions, no LLM dependency, ~60ms).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): bash.exe wrap for telemetry on Windows

  reportAttemptTelemetry() in browse/src/security.ts calls spawn(bin, args) where bin is the gstack-telemetry-log bash script. On Windows this fails silently with ENOENT — CreateProcess can't dispatch on shebang lines.

  Adopts v1.24.0.0's Bun.which + GSTACK_*_BIN override pattern (from browse/src/claude-bin.ts:resolveClaudeCommand, introduced in #1252) for resolving bash.exe. resolveBashBinary() honors a GSTACK_BASH_BIN absolute-path or PATH-resolvable override, falling back to Bun.which('bash'), which finds Git Bash on the standard Windows install. buildTelemetrySpawnCommand() wraps the script invocation on win32 only; the POSIX path is bit-identical.
  Returns null when bash can't be resolved on Windows so the caller skips the spawn — the local attempts.jsonl audit trail keeps working without surfacing a Windows-only failure.

  8 new unit tests cover resolveBashBinary (POSIX bash, absolute override, quote-stripping, BASH_BIN fallback, empty-PATH null) and buildTelemetrySpawnCommand (POSIX pass-through, win32 bash wrap, win32 null on unresolvable, arg-array immutability).

  POSIX path is bit-identical — Bun.which('bash') on Linux/macOS returns the same /bin/bash or /usr/bin/bash that the old hardcoded spawn relied on.

* fix(make-pdf): Bun.which-based binary resolution for browse + pdftotext on Windows

  Extends v1.24.0.0's Bun.which + GSTACK_*_BIN override pattern (introduced in browse/src/claude-bin.ts via #1252) to the two other binary resolvers in the codebase: make-pdf/src/browseClient.ts:resolveBrowseBin and make-pdf/src/pdftotext.ts:resolvePdftotext. Same Windows quirks (fs.accessSync(X_OK) degrades to an existence check; `which` isn't available outside Git Bash; bun --compile --outfile X emits X.exe), same Bun.which-based fix shape, same env override convention.

  Changes:

  - GSTACK_BROWSE_BIN / GSTACK_PDFTOTEXT_BIN as the v1.24-aligned overrides; BROWSE_BIN / PDFTOTEXT_BIN remain as back-compat aliases.
  - Bun.which() replaces execFileSync('which', ...) for PATH lookup. Handles Windows PATHEXT natively; no more `where`-vs-`which` branch.
  - findExecutable(base) helper exported from each module, probes .exe/.cmd/.bat after the bare-path miss on win32. Linux/macOS behavior is bit-identical (isExecutable short-circuits before the win32 branch ever runs).
  - macCandidates renamed posixCandidates (always was — /opt/homebrew, /usr/local, /usr/bin). No Windows candidates added; Poppler installs scatter across Scoop/Chocolatey/portable zips and guessing causes false positives.
  - Error messages get a Windows install hint (scoop install poppler / oschwartz10612) and a `setx` example for GSTACK_*_BIN.
  - Pre-existing test 'honors BROWSE_BIN when it points at a real executable' was hardcoded to /bin/sh — made cross-platform via a REAL_EXE constant (cmd.exe on win32, /bin/sh on POSIX). Was a Windows-CI blocker on its own.

  Coordination: PR #1094 (@BkashJEE) covered browseClient.ts independently with a narrower scope; this PR's pdftotext + cross-platform tests + GSTACK_*_BIN naming are additive. Either order of merge works.

  Test plan:

  - bun test make-pdf/test/browseClient.test.ts make-pdf/test/pdftotext.test.ts on win32 — 29 pass, 0 fail (12 new assertions: findExecutable POSIX/win32/null, resolveBrowseBin GSTACK_BROWSE_BIN + BROWSE_BIN + precedence + quote-strip, same shape for resolvePdftotext + Windows install hint in error message).
  - POSIX branch unchanged — fs.accessSync(X_OK) on Linux/macOS short-circuits before any win32 logic runs, matching the v1.24 claude-bin.ts pattern.

* fix(browse): NTFS ACL hardening for Windows state files via icacls

  gstack's ~/.gstack/ state directory holds bearer tokens, canary tokens, agent queue contents (with prompt history), session state, security-decision logs, and saved cookie bundles — all written with { mode: 0o600 } / 0o700. On Windows, those mode bits are a silent no-op: Node's fs module doesn't translate POSIX modes to NTFS ACLs, and inherited ACLs leave every "restricted" file readable by other principals on the machine (verified via icacls — six ACEs, the intended user is the LAST of six).

  Threat model is non-trivial on:

  - Self-hosted CI runners (a different service account on the same Windows box can read developer tokens, canary tokens, prompt history)
  - Shared development machines (agencies, studios, lab environments)
  - Multi-tenant servers with shared home directories

  Orthogonal to v1.24.0.0's binary-resolution work — complementary at the write side.
  v1.24's bin/gstack-paths resolves ~/.gstack/ correctly across plugin / global / local installs; this PR ensures files written into those resolved paths actually get the POSIX 0o600 semantic translated to NTFS.

  The fix:

  - New browse/src/file-permissions.ts (158 LOC, 5 public + 1 test-reset). restrictFilePermissions / restrictDirectoryPermissions wrap chmod (POSIX) or icacls /inheritance:r /grant:r <user>:(F) (Windows). writeSecureFile / appendSecureFile / mkdirSecure are drop-in wrappers for the common patterns.
  - 19 call sites converted across 9 source files: browser-manager.ts, browser-skill-write.ts, cli.ts, config.ts, meta-commands.ts, security-classifier.ts, security.ts (4 sites), server.ts (5 sites), terminal-agent.ts (8 sites), tunnel-denial-log.ts.
  - (OI)(CI) inheritance flags on directories mean files created via fs.write* *inside* an mkdirSecure-created dir inherit the owner-only ACL automatically — important for tunnel-denial-log.ts, where appends use async fsp.appendFile.

  Error handling: icacls failures (nonexistent path, missing icacls.exe, hardened environments) log a one-shot warning to stderr and proceed. Once-per-process gating prevents log spam if the condition persists. The filesystem stays functional; the file just ends up with inherited ACLs.
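A minimal sketch of the chmod-vs-icacls split described above (illustrative only; the real browse/src/file-permissions.ts adds the once-per-process warning gating and directory variants):

```typescript
import { chmodSync, statSync, writeFileSync } from "node:fs";
import { execFileSync } from "node:child_process";
import { userInfo } from "node:os";

// POSIX: mode bits work natively. Windows: translate the 0o600 intent into
// an NTFS ACL via icacls (drop inherited ACEs, grant the owner full control).
function restrictFilePermissions(path: string): void {
  if (process.platform === "win32") {
    const user = userInfo().username;
    execFileSync("icacls", [path, "/inheritance:r", "/grant:r", `${user}:(F)`]);
  } else {
    chmodSync(path, 0o600); // owner read/write only
  }
}

// Drop-in replacement for writeFileSync(..., { mode: 0o600 }) call sites.
function writeSecureFile(path: string, data: string | Buffer): void {
  writeFileSync(path, data, { mode: 0o600 });
  restrictFilePermissions(path); // re-assert for pre-existing files / umask
}
```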
  Test plan:

  - bun test browse/test/file-permissions.test.ts — 13 pass, 0 fail (POSIX mode-bit assertions, Windows no-throw, mkdir idempotence, recursive creation, Buffer payloads, append-creates-then-reapplies-once semantics)
  - bun test browse/test/security.test.ts — 38 pass, 0 fail (existing security test suite plus the bash-binary resolution tests added in fix #1119; the converted writeFileSync/appendFileSync/mkdirSync sites in security.ts integrate cleanly)
  - Empirical icacls before/after on a real file — 6 ACEs → 1 ACE
  - bun build typecheck on all modified files — clean (server.ts has a pre-existing playwright-core/electron resolution issue unrelated to this PR)

  POSIX behavior is bit-identical to the old code — fs.chmodSync(path, 0o6XX) on the helper's POSIX branch matches the inline { mode: 0o6XX } it replaces. Linux and macOS see no behavior change.

  Inviting pushback on three judgment calls (in the PR description):

  1. icacls vs an npm library
  2. ACL scope — just the user, or user + SYSTEM?
  3. Graceful degradation — once-per-process warn, not silent, not hard-fail.

* fix(browse): declare lastConsoleFlushed to restore console-log persistence

  flushBuffers() references a `lastConsoleFlushed` cursor at server.ts:337 and assigns it at :344, but the `let lastConsoleFlushed = 0;` declaration is missing — only the network and dialog siblings are declared at lines 327-328. Result: every 1-second flushBuffers tick (line 376) throws `ReferenceError: lastConsoleFlushed is not defined`, gets swallowed by the catch at line 369 ("[browse] Buffer flush failed: ..."), and the console branch's append never runs. browse-console.log has never been written in any production deployment since this regressed.

  Discovered by stress-testing the daemon with 15 concurrent CLIs against cold state — the race surfaced the buffer-flush error spam in one spawned daemon's stderr.
  Verified by running the daemon against a real file:// page with console.log events: in-memory `browse console` returns the entries, but `.gstack/browse-console.log` is never created on disk.

  Regression introduced by 1a100a2a "fix: eliminate duplicate command sets in chain, improve flush perf and type safety" — the flush refactor switched from `Bun.write` to `fs.appendFileSync` and added the `lastConsoleFlushed` cursor pattern alongside its network/dialog siblings, but missed the matching `let` declaration. Tests don't currently exercise flushBuffers, so the regression shipped silently.

  Fix:

  - Declare `let lastConsoleFlushed = 0;` next to `lastNetworkFlushed` and `lastDialogFlushed` (browse/src/server.ts:327)
  - Add a source-level guard test (browse/test/server-flush-trackers.test.ts) that fails any future refactor that adds a fourth `last*Flushed` cursor without the matching declaration. Same pattern as terminal-agent.test.ts and dual-listener.test.ts — read the source as text, assert the invariant, no daemon required.

  Test plan:

  - [x] New regression test fails on current main, passes with the fix
  - [x] `bun run build` clean
  - [x] Manual smoke: spawn daemon -> goto file:// page with console.log -> wait 4s -> .gstack/browse-console.log now exists with the expected entries (163 bytes vs zero before)

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(browse): per-process state-file temp path to fix concurrent-write ENOENT

  The daemon writes `.gstack/browse.json` via the standard atomic-rename pattern: `writeFileSync(tmp, …) → renameSync(tmp, stateFile)`. Four sites in server.ts use this pattern (initial daemon-startup state at :2002, the /tunnel/start handler at :1479, the BROWSE_TUNNEL=1 inline tunnel update at :2083, the BROWSE_TUNNEL_LOCAL_ONLY=1 update at :2113), and all four hard-code the same temp filename `${stateFile}.tmp`.
  Under concurrent writers the shared filename races on the rename:

      t0  Writer A: writeFileSync(stateFile + '.tmp', payloadA)
      t1  Writer B: writeFileSync(stateFile + '.tmp', payloadB)   // overwrites A
      t2  Writer A: renameSync(stateFile + '.tmp', stateFile)     // moves B's payload
      t3  Writer B: renameSync(stateFile + '.tmp', stateFile)     // ENOENT — file gone

  Reproduced empirically with 15 concurrent CLIs against a fresh `.gstack/`:

      [browse] Failed to start: ENOENT: no such file or directory, rename '…/.gstack/browse.json.tmp' -> '…/.gstack/browse.json'

  Pre-fix success rate: **0 / 15** under the cold-start race. Post-fix success rate: **15 / 15**, zero ENOENT.

  Fix:

  - New `tmpStatePath()` helper (server.ts:333) returns `${stateFile}.tmp.${pid}.${randomBytes(4).toString('hex')}`
  - All 4 call sites use `tmpStatePath()` instead of the shared literal
  - Atomic rename still gives last-writer-wins semantics on the final state-file content; the only behavior change is that concurrent writers no longer kill each other on the rename step

  A source-level guard test (browse/test/server-tmp-state-path.test.ts) locks two invariants: (1) no remaining `stateFile + '.tmp'` literals, (2) every state-write `writeFileSync` call uses `tmpStatePath()`. Same read-source-as-text pattern as terminal-agent.test.ts and dual-listener.test.ts — no daemon required, runs in tier-1 free.
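The per-process temp-path pattern can be sketched as follows. The helper name matches the commit; the body is illustrative and writes under a throwaway temp directory so the sketch is self-contained:

```typescript
import { randomBytes } from "node:crypto";
import { writeFileSync, renameSync, readFileSync, mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// The real helper lives in browse/src/server.ts and targets .gstack/browse.json.
const stateFile = join(mkdtempSync(join(tmpdir(), "gstack-")), "browse.json");

function tmpStatePath(): string {
  // pid + 4 random bytes give each writer its own temp name, so concurrent
  // writers can no longer rename each other's half-written file away.
  return `${stateFile}.tmp.${process.pid}.${randomBytes(4).toString("hex")}`;
}

function writeStateAtomically(payload: string): void {
  const tmp = tmpStatePath();
  writeFileSync(tmp, payload, { mode: 0o600 });
  renameSync(tmp, stateFile); // atomic replace; last writer wins
}
```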
  Test plan:

  - [x] Targeted source-level guard test passes (3 / 0)
  - [x] `bun run build` clean
  - [x] Live regression: 15 concurrent CLIs against cold state → 15 / 15 healthy, 0 ENOENT (vs 0 / 15 pre-fix)
  - [x] No `.tmp.*` orphans left behind after rename succeeds
  - [x] Related test cluster (server-auth, dual-listener, cdp-mutex, findport) — same pre-existing flakes as `main`, no new regressions introduced

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(browse): clear refs when iframe auto-detaches in getActiveFrameOrPage

  Asymmetric cleanup between two equivalent staleness conditions:

      onMainFrameNavigated()  → clearRefs() + activeFrame = null   ✓
      getActiveFrameOrPage()  → activeFrame = null (refs NOT cleared)  ✗

  Both paths see the same staleness condition — refs were captured against a frame that no longer exists. The main-frame path correctly clears both pieces of state. The iframe-detach path nulls the frame but leaves the refMap intact.

  The lazy click-time check in `resolveRef` (tab-session.ts:97) partially saves us — `entry.locator.count()` on a detached-frame locator throws or returns 0, so the click errors out as "Ref X is stale". But the user has no signal that the frame context silently changed underfoot: the next `snapshot` runs against `this.page` (main) while old iframe refs still litter `refMap` with the same role+name keys. New refs collide with stale ones, the resolver picks one at random, and the user clicks the wrong element.

  TODOS.md lines 816-820 document "Detached frame auto-recovery" as a shipped iframe-support feature in v0.12.1.0. This restores the documented intent — the recovery should leave the session in a clean state, not a half-cleared one.

  Fix: 1 line — add `this.clearRefs()` next to `this.activeFrame = null` inside the if-branch.
  Test plan:

  - [x] New regression test: 4/4 pass
    - refs cleared when getActiveFrameOrPage detects a detached iframe
    - refs preserved when the active frame is still attached (no regression)
    - refs preserved when no frame is set (page-level path untouched)
    - matches onMainFrameNavigated symmetry — both paths reach the same clean end state
  - [x] `bun run build` clean

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(codex): resolve python for JSON parser

* fix: add fail-fast probe for base branch in ship step 12

* fix(plan-devex-review): remove contradictory plan-mode handshake

* fix(design): honor Retry-After header in variants 429 handler

  Closes #1244. The 429 handler in `generateVariant` discarded the `Retry-After` response header and fell straight through to a local exponential schedule (2s/4s/8s). In image-generation batches, that burns retry attempts inside the provider's cooldown window and the request never recovers.

  Now we parse `Retry-After` per RFC 7231 — both delta-seconds (`Retry-After: 5`) and HTTP-date (`Retry-After: Fri, 31 Dec 1999 23:59:59 GMT`). Honored waits are capped at 60s to bound stalls from hostile or buggy headers. Delta-seconds are validated as digits-only (rejects `2abc`). When `Retry-After` is honored (including 0 / past-date "retry now"), the next iteration's leading exponential sleep is skipped so we don't double-wait. Invalid or missing headers fall through to the existing exponential schedule unchanged.
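The parsing rules can be sketched as follows (function and constant names are illustrative, not the actual generateVariant internals):

```typescript
const MAX_RETRY_AFTER_MS = 60_000; // cap hostile or buggy headers at 60s

// Returns the wait in ms when the header is honored, or null to signal
// "fall through to the exponential schedule".
function parseRetryAfterMs(header: string | null, now = Date.now()): number | null {
  if (header === null) return null;
  const value = header.trim();
  if (/^\d+$/.test(value)) {
    // delta-seconds form, digits-only ("2abc" fails this test)
    return Math.min(Number(value) * 1000, MAX_RETRY_AFTER_MS);
  }
  const date = Date.parse(value); // HTTP-date form per RFC 7231
  if (Number.isNaN(date)) return null; // invalid: exponential fallback
  return Math.min(Math.max(date - now, 0), MAX_RETRY_AFTER_MS); // past date → 0
}
```

A caller that gets a non-null result would sleep that long and skip the next leading exponential sleep, matching the no-double-wait rule above.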
  Behavior matrix:

  | Header                          | Behavior                                   |
  |---------------------------------|--------------------------------------------|
  | Retry-After: 5                  | wait 5s, skip leading on next attempt      |
  | Retry-After: 999999             | capped to 60s, skip leading                |
  | Retry-After: 2abc               | invalid, fall through to exponential       |
  | Retry-After: 0                  | wait 0, skip leading (retry immediately)   |
  | Retry-After: <past HTTP-date>   | wait 0, skip leading                       |
  | Retry-After: <future date>      | wait diff capped at 60s, skip leading      |
  | no header                       | fall through to existing exponential       |

  `generateVariant` now accepts an optional `fetchFn` parameter (defaults to `globalThis.fetch`) so tests can inject a stub. Production call sites are unchanged.

  Tests cover the five behavior buckets above, asserting both the 1st-to-2nd call timing gap and call counts. All five pass in ~8s.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docs): correct per-skill symlink removal snippet in README uninstall

  Closes #1130. The manual-uninstall fallback in `## Uninstall` → `### Option 2` used `find ~/.claude/skills -maxdepth 1 -type l`, which finds nothing on real installs. Each `~/.claude/skills/<name>/` is a real directory, and only `<name>/SKILL.md` inside it is a symlink into `gstack/`. The find never matched, so the snippet silently removed nothing.

  Replace with a directory walk that inspects each `<name>/SKILL.md`:

      find ~/.claude/skills -mindepth 1 -maxdepth 1 -type d ! -name gstack
      → check $dir/SKILL.md is a symlink → readlink it
      → if target is gstack/* or */gstack/*: rm -f the link, rmdir the dir
        (only if empty — preserves any user-added files)

  Excludes the top-level `gstack/` dir from the walk; that's removed by step 3 of the same uninstall block. `bin/gstack-uninstall` (the script-mode path) already handles the layout correctly via its own walk; only this manual fallback needed updating.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: reject partial browse client env integers

* fix(gemini-adapter): detect new ~/.gemini/oauth_creds.json auth path

  gemini-cli >=0.30 stores OAuth credentials at ~/.gemini/oauth_creds.json instead of the legacy ~/.config/gemini/ directory. The benchmark adapter's availability check now succeeds for users on recent gemini-cli releases who have authenticated via interactive login. Both paths are accepted so users on older versions still work.

* fix(browser): add --no-sandbox for root user on Linux/WSL2

  Chromium's sandbox can't initialize when running as root on Linux, causing an immediate exit. Extend the existing CI/CONTAINER check to also cover this case, keeping the Windows-safe `typeof getuid` guard.

* security: pass cwd to git via execFileSync, not interpolation through /bin/sh

  `bin/gstack-memory-ingest.ts:632-643` ran `execSync(\`git -C ${JSON.stringify(cwd)} remote get-url origin 2>/dev/null\`, ...)`. JSON.stringify escapes `"` and `\` but not `$` or backticks, so a `cwd` of `"$(touch /tmp/marker)"` survived JSON quoting and detonated under /bin/sh's command-substitution-inside-double-quotes.

  `cwd` originates from transcript JSONL records under `~/.claude/projects/<encoded-cwd>/<uuid>.jsonl` and `~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl`. The walker grabs the first `.cwd` it sees per session. That's an untrusted surface in the gstack threat model — the L1-L6 sidebar security stack exists exactly because agent transcripts can carry attacker-influenced text.

  Two pivots above the local same-uid bar: (a) a prompt injection appending `cwd="$(...)"` to the active session log turns the next /sync-gbrain run into RCE under the user's uid; (b) a cross-machine transcript share (a colleague's `.claude/projects` snippet untar'd into HOME, a documented gbrain dogfooding shape) → RCE on first sync.

  Fix swaps the one execSync for `execFileSync("git", ["-C", cwd, "remote", "get-url", "origin"], ...)`.
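The shape of the swap, as an illustrative standalone function (the wrapper name is hypothetical; the real call sits inline in bin/gstack-memory-ingest.ts):

```typescript
import { execFileSync } from "node:child_process";

// With execFileSync there is no /bin/sh in the loop: cwd travels as a single
// argv entry, so "$(...)" and backticks are inert bytes, not substitutions.
function originUrl(cwd: string): string | null {
  try {
    return execFileSync("git", ["-C", cwd, "remote", "get-url", "origin"], {
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"], // stands in for the old 2>/dev/null
    }).trim();
  } catch {
    return null; // not a git repo, git missing, or bogus cwd
  }
}
```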
  No shell, argv passed directly to git. The same module already uses execFileSync for `gbrainAvailable()` (line 762 pre-patch) and `gbrainPutPage()` (line 816 pre-patch) — this single execSync was the outlier.

  Test: `gstack-memory-ingest security: untrusted cwd cannot trigger shell substitution` plants a Claude-Code-shaped JSONL with cwd=`$(touch <marker>)` and asserts the marker file is not created after `--incremental --quiet`. Negative control: with the patch reverted, the test fails (marker created); with the patch applied, it passes (18/18 in test/gstack-memory-ingest.test.ts).

* security: gate domain-skill auto-promote on classifier_score > 0

  `browse/src/domain-skill-commands.ts:140` (handleSave) writes `classifier_score: 0` with the comment "L4 deferred to load-time / sidebar-agent fills this in on first prompt-injection load." But CLAUDE.md's "Sidebar architecture" documents that sidebar-agent.ts was ripped out, and a grep for recordSkillUse + classifierFlagged callers across browse/src/ returns zero hits outside the module under test.

  Net effect: every quarantined skill that survives three benign uses without a flag (`recordSkillUse(..., classifierFlagged: false)` x3) auto-promotes to `active` and lands in prompt context wrapped as UNTRUSTED on every subsequent visit to that host. The L4 score that was supposed to gate the promotion was never written — the production save path puts 0 on disk and nothing later updates it.

  Threat model: a domain-skill body authored by an agent under the influence of a poisoned page (the new `gstackInjectToTerminal` PTY path runs no L1-L3 either) would lose its auto-promote barrier after three uses. The exploit isn't single-step, but the bar is exactly N=3 prompt-injection-shaped uses on a hostile page, which is well within reach.
  Fix adds a single condition to the auto-promote gate in `recordSkillUse`:

      if (state === 'quarantined' && useCount >= PROMOTE_THRESHOLD
          && flagCount === 0 && current.classifier_score > 0) {
        state = 'active';
      }

  `classifier_score` is set once at writeSkill and never updated. Production saves it as 0 (handleSave), so the gate stays closed; existing tests that explicitly pass `classifierScore: 0.1` still auto-promote (the auto-promote path is preserved for the day L4 is rewired). Manual promotion via `domain-skill promote-to-global` is unaffected (it goes through `promoteToGlobal`, which has its own state-machine guard at line 337+).

  Test: new regression case `does NOT auto-promote when classifier_score is 0 (production handleSave shape)` plants a skill with classifierScore=0 (matches domain-skill-commands.ts:140), runs three uses without a flag, and asserts the skill stays quarantined and readSkill returns null. Negative control: revert the patch and the test fails with `Received: "active"`. With the patch: 15/15 pass.

* fix(ship): port #1302 SKILL.md edits to .tmpl + resolver source

  PR #1302 added Verification Mode + UNVERIFIABLE classification + a per-item confirmation gate to ship/SKILL.md, but only the generated SKILL.md was edited — not the .tmpl source or scripts/resolvers/review.ts. The next `bun run gen:skill-docs` run would have wiped the changes. Port the same content into the resolver and .tmpl so regeneration produces the intended output.

* ci(windows): extend free-tests lane to cover icacls + Bun.which resolvers from fix-wave PRs

  Closes the #1306/#1307/#1308 validation gap. The four newly-added test files already have process.platform guards, so they run safely on both POSIX and Windows lanes — only platform-relevant assertions execute on each.
  Tests added to the windows-latest lane:

  - browse/test/file-permissions.test.ts (#1308 icacls + writeSecureFile)
  - browse/test/security.test.ts (#1306 bash.exe wrap pure-function path)
  - make-pdf/test/browseClient.test.ts (#1307 Bun.which browse resolver)
  - make-pdf/test/pdftotext.test.ts (#1307 Bun.which pdftotext resolver)

* test(codex): live flag-semantics smoke for codex exec resume

  Closes #1270's regex-only test gap. PR #1270 asserted that codex/SKILL.md's `codex exec resume` invocation drops -C/-s and uses sandbox_mode config. That regex catches the skill template regressing, but not the codex CLI itself flipping flag semantics again. This test probes `codex exec resume --help` and asserts the surface gstack relies on: -c/sandbox_mode is accepted, top-level -C is absent. Skips silently when codex isn't on PATH, so dev machines without codex installed never see it fail.

* chore: regen SKILL.md after fix wave

  One regen commit at the end of the merge wave per the plan. plan-devex-review loses the contradictory plan-mode handshake (#1333). review/SKILL.md picks up the Verification Mode + UNVERIFIABLE classification additions that #1302 authored against ship/SKILL.md (the same resolver is shared between ship and review modes).

* fix(server.ts): keep fs.writeFileSync for state-file writes

  #1308's writeSecureFile wrapper added Windows icacls hardening for the 4 state-file write sites in server.ts, but #1310's regression test greps for fs.writeFileSync(tmpStatePath()) calls. The two changes are technically compatible only if the test relaxes — keeping the test strict (the safer choice for catching regressions on the cold-start race) means the 4 state-file sites stay on fs.writeFileSync(..., { mode: 0o600 }). POSIX 0o600 hardening is preserved on those 4 sites. Windows icacls hardening still applies to all the other writeSecureFile call sites #1308 added (auth.json, mkdirSecure, etc.).
  Also refreshes golden baselines after the #1302 port + a minor wording tweak in scripts/resolvers/review.ts to keep the gen-skill-docs.test.ts assertion 'Cite the specific file' satisfied.

* v1.30.0.0: fix wave — 21 community PRs + 2 closing fixes for Windows + codex CI gaps

  Headline release. Browse stops dropping console logs, the cold-start race is fixed, codex resume works without python3, Windows hardening lands (icacls + Bun.which + bash.exe wrap), the ship gate gets VAS-449 remediation, and two closing fixes put icacls/Bun.which/codex flag semantics under CI.

* test(domain-skills): cover #1369 classifier_score=0 quarantine + score>0 promote path

  The pre-existing T6 test seeded skills via writeSkill (which defaults classifier_score to 0 until L4 is rewired) and then expected 3 uses to auto-promote. PR #1369 added `current.classifier_score > 0` to the gate specifically to block that path — a quarantined skill written under the influence of a poisoned page would otherwise auto-promote after three benign uses.

  The updated test asserts both halves of the new contract:

  - classifier_score=0 + 3 uses → stays quarantined (the security guarantee)
  - classifier_score>0 + 3 more uses → promotes to active (the unblock path)

  Catches both regressions: the gate going away (would re-allow the bypass) and the unblock path breaking (would silently quarantine all skills forever once L4 is rewired).
  ---------

  Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com>
  Co-authored-by: orbisai0security <mediratta01.pally@gmail.com>
  Co-authored-by: Bryce Alan <brycealan.eth@gmail.com>
  Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  Co-authored-by: Terry Carson YM <cym3118288@gmail.com>
  Co-authored-by: Vasko Ckorovski <vckorovski@gmail.com>
  Co-authored-by: Samuel Carson <samuel.carson@gmail.com>
  Co-authored-by: Yashwant Kotipalli <yashwant7kotipalli@gmail.com>
  Co-authored-by: Jasper Chen <jasperchen925@gmail.com>
  Co-authored-by: Stefan Neamtu <stefan.neamtu@gmail.com>
  Co-authored-by: 陈家名 <chenjiaming@kezaihui.com>
  Co-authored-by: Abigail Atheryon <abi@atheryon.ai>
  Co-authored-by: Furkan Köykıran <furkankoykiran@gmail.com>
  Co-authored-by: gus <gustavoraularagon@gmail.com>
945 lines
47 KiB
Cheetah
---
name: ship
preamble-tier: 4
version: 1.0.0
description: |
  Ship workflow: detect + merge base branch, run tests, review diff, bump VERSION,
  update CHANGELOG, commit, push, create PR. Use when asked to "ship", "deploy",
  "push to main", "create a PR", "merge and push", or "get it deployed".
  Proactively invoke this skill (do NOT push/PR directly) when the user says code
  is ready, asks about deploying, wants to push code up, or asks to create a PR. (gstack)
allowed-tools:
  - Bash
  - Read
  - Write
  - Edit
  - Grep
  - Glob
  - Agent
  - AskUserQuestion
  - WebSearch
sensitive: true
triggers:
  - ship it
  - create a pr
  - push to main
  - deploy this
---

{{PREAMBLE}}

{{BASE_BRANCH_DETECT}}

{{GBRAIN_CONTEXT_LOAD}}

# Ship: Fully Automated Ship Workflow

You are running the `/ship` workflow. This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step. The user said `/ship` which means DO IT. Run straight through and output the PR URL at the end.

**Only stop for:**
- On the base branch (abort)
- Merge conflicts that can't be auto-resolved (stop, show conflicts)
- In-branch test failures (pre-existing failures are triaged, not auto-blocking)
- Pre-landing review finds ASK items that need user judgment
- MINOR or MAJOR version bump needed (ask — see Step 12)
- Greptile review comments that need user decision (complex fixes, false positives)
- AI-assessed coverage below minimum threshold (hard gate with user override — see Step 7)
- Plan items NOT DONE with no user override (see Step 8)
- Plan verification failures (see Step 8.1)
- TODOS.md missing and user wants to create one (ask — see Step 14)
- TODOS.md disorganized and user wants to reorganize (ask — see Step 14)

**Never stop for:**
- Uncommitted changes (always include them)
- Version bump choice (auto-pick MICRO or PATCH — see Step 12)
- CHANGELOG content (auto-generate from diff)
- Commit message approval (auto-commit)
- Multi-file changesets (auto-split into bisectable commits)
- TODOS.md completed-item detection (auto-mark)
- Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically)
- Test coverage gaps within target threshold (auto-generate and commit, or flag in PR body)

**Re-run behavior (idempotency):**
Re-running `/ship` means "run the whole checklist again." Every verification step
(tests, coverage audit, plan completion, pre-landing review, adversarial review,
VERSION/CHANGELOG check, TODOS, document-release) runs on every invocation.
Only *actions* are idempotent:
- Step 12: If VERSION already bumped, skip the bump but still read the version
- Step 17: If already pushed, skip the push command
- Step 19: If PR exists, update the body instead of creating a new PR
Never skip a verification step because a prior `/ship` run already performed it.

---
|
||
|
||
## Step 1: Pre-flight

1. Check the current branch. If on the base branch or the repo's default branch, **abort**: "You're on the base branch. Ship from a feature branch."

2. Run `git status` (never use `-uall`). Uncommitted changes are always included — no need to ask.

3. Run `git diff <base>...HEAD --stat` and `git log <base>..HEAD --oneline` to understand what's being shipped.

4. Check review readiness:

{{REVIEW_DASHBOARD}}

If the Eng Review is NOT "CLEAR":

Print: "No prior eng review found — ship will run its own pre-landing review in Step 9."

Check diff size: `git diff <base>...HEAD --stat | tail -1`. If the diff is >200 lines, add: "Note: This is a large diff. Consider running `/plan-eng-review` or `/autoplan` for architecture-level review before shipping."

If CEO Review is missing, mention it as informational ("CEO Review not run — recommended for product changes") but do NOT block.

For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 9, but consider running /design-review for a full visual audit post-implementation." Still never block.

Continue to Step 2 — do NOT block or ask. Ship runs its own review in Step 9.
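
The >200-line large-diff check above parses the summary line of `git diff --stat`; a minimal sketch, with a hypothetical helper name:

```shell
# Hypothetical helper: total lines changed from the last line of
# `git diff --stat`, e.g. "12 files changed, 340 insertions(+), 25 deletions(-)".
diff_total() { # usage: diff_total "<stat summary line>"
  printf '%s\n' "$1" |
    grep -oE '[0-9]+ (insertion|deletion)' |
    awk '{ sum += $1 } END { print sum + 0 }'
}
```

The large-diff note would then fire when `diff_total` exceeds 200.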

---

## Step 2: Distribution Pipeline Check

If the diff introduces a new standalone artifact (CLI binary, library package, tool) — not a web
service with existing deployment — verify that a distribution pipeline exists.

1. Check if the diff adds a new `cmd/` directory, `main.go`, or `bin/` entry point:
```bash
git diff origin/<base> --name-only | grep -E '(cmd/.*/main\.go|bin/|Cargo\.toml|setup\.py|package\.json)' | head -5
```

2. If a new artifact is detected, check for a release workflow:
```bash
ls .github/workflows/ 2>/dev/null | grep -iE 'release|publish|dist'
grep -qE 'release|publish|deploy' .gitlab-ci.yml 2>/dev/null && echo "GITLAB_CI_RELEASE"
```

3. **If no release pipeline exists and a new artifact was added:** Use AskUserQuestion:
   - "This PR adds a new binary/tool but there's no CI/CD pipeline to build and publish it.
     Users won't be able to download the artifact after merge."
   - A) Add a release workflow now (CI/CD release pipeline — GitHub Actions or GitLab CI depending on platform)
   - B) Defer — add to TODOS.md
   - C) Not needed — this is internal/web-only, existing deployment covers it

4. **If a release pipeline exists:** Continue silently.
5. **If no new artifact is detected:** Skip silently.

---

## Step 3: Merge the base branch (BEFORE tests)

Fetch and merge the base branch into the feature branch so tests run against the merged state:

```bash
git fetch origin <base> && git merge origin/<base> --no-edit
```

**If there are merge conflicts:** Try to auto-resolve them if they are simple (VERSION, schema.rb, CHANGELOG ordering). If conflicts are complex or ambiguous, **STOP** and show them.
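
The "simple conflict" whitelist can be sketched as a shell helper. The helper name and the exact paths (`db/schema.rb`, `CHANGELOG.md`) are assumptions; real repos may place these files elsewhere:

```shell
# Hypothetical helper: classify a conflicted path as auto-resolvable or not.
# VERSION, schema.rb, and CHANGELOG are the "simple" cases named above.
is_simple_conflict() { # usage: is_simple_conflict <path>
  case $1 in
    VERSION|schema.rb|*/schema.rb|CHANGELOG*|*/CHANGELOG*) echo yes ;;
    *) echo no ;;
  esac
}
```

Anything classified `no` triggers the STOP-and-show-conflicts branch.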

**If already up to date:** Continue silently.

---

## Step 4: Test Framework Bootstrap

{{TEST_BOOTSTRAP}}

---

## Step 5: Run tests (on merged code)

**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
`db:test:prepare` internally, which loads the schema into the correct lane database.
Running bare test migrations without INSTANCE hits an orphan DB and corrupts structure.sql.

Run both test suites in parallel:

```bash
bin/test-lane 2>&1 | tee /tmp/ship_tests.txt &
npm run test 2>&1 | tee /tmp/ship_vitest.txt &
wait
```

After both complete, read the output files and check pass/fail.

**If any test fails:** Do NOT immediately stop. Apply the Test Failure Ownership Triage:

{{TEST_FAILURE_TRIAGE}}

**After triage:** If any in-branch failures remain unfixed, **STOP**. Do not proceed. If all failures were pre-existing and handled (fixed, TODOed, assigned, or skipped), continue to Step 6.

**If all pass:** Continue silently — just note the counts briefly.

---

## Step 6: Eval Suites (conditional)

Evals are mandatory when prompt-related files change. Skip this step entirely if no prompt files are in the diff.

**1. Check if the diff touches prompt-related files:**

```bash
git diff origin/<base> --name-only
```

Match against these patterns (from CLAUDE.md):
- `app/services/*_prompt_builder.rb`
- `app/services/*_generation_service.rb`, `*_writer_service.rb`, `*_designer_service.rb`
- `app/services/*_evaluator.rb`, `*_scorer.rb`, `*_classifier_service.rb`, `*_analyzer.rb`
- `app/services/concerns/*voice*.rb`, `*writing*.rb`, `*prompt*.rb`, `*token*.rb`
- `app/services/chat_tools/*.rb`, `app/services/x_thread_tools/*.rb`
- `config/system_prompts/*.txt`
- `test/evals/**/*` (eval infrastructure changes affect all suites)
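
The pattern match above can be sketched as one shell `case` statement. The helper name is hypothetical, and the globs approximate the listed patterns; note that in a `case` pattern `*` also matches `/`, so `test/evals/*` covers nested eval-infrastructure paths:

```shell
# Hypothetical helper: "yes" if a changed path matches a prompt-file pattern.
is_prompt_file() { # usage: is_prompt_file <path>
  case $1 in
    app/services/*_prompt_builder.rb|\
    app/services/*_generation_service.rb|app/services/*_writer_service.rb|app/services/*_designer_service.rb|\
    app/services/*_evaluator.rb|app/services/*_scorer.rb|app/services/*_classifier_service.rb|app/services/*_analyzer.rb|\
    app/services/concerns/*voice*.rb|app/services/concerns/*writing*.rb|app/services/concerns/*prompt*.rb|app/services/concerns/*token*.rb|\
    app/services/chat_tools/*.rb|app/services/x_thread_tools/*.rb|\
    config/system_prompts/*.txt|\
    test/evals/*)
      echo yes ;;
    *)
      echo no ;;
  esac
}
```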

**If no matches:** Print "No prompt-related files changed — skipping evals." and continue to Step 9.

**2. Identify affected eval suites:**

Each eval runner (`test/evals/*_eval_runner.rb`) declares `PROMPT_SOURCE_FILES` listing which source files affect it. Grep these to find which suites match the changed files:

```bash
grep -l "changed_file_basename" test/evals/*_eval_runner.rb
```

Map runner → test file: `post_generation_eval_runner.rb` → `post_generation_eval_test.rb`.
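
That mapping is a pure string rewrite; a minimal sketch with a hypothetical helper name:

```shell
# Hypothetical helper: map an eval runner filename to its eval test filename.
runner_to_test() { # usage: runner_to_test <runner_filename>
  printf '%s\n' "$1" | sed 's/_eval_runner\.rb$/_eval_test.rb/'
}
```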

**Special cases:**
- Changes to `test/evals/judges/*.rb`, `test/evals/support/*.rb`, or `test/evals/fixtures/` affect ALL suites that use those judges/support files. Check imports in the eval test files to determine which.
- Changes to `config/system_prompts/*.txt` — grep eval runners for the prompt filename to find affected suites.
- If unsure which suites are affected, run ALL suites that could plausibly be impacted. Over-testing is better than missing a regression.

**3. Run affected suites at `EVAL_JUDGE_TIER=full`:**

`/ship` is a pre-merge gate, so always use full tier (Sonnet structural + Opus persona judges).

```bash
EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/<suite>_eval_test.rb 2>&1 | tee /tmp/ship_evals.txt
```

If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.

**4. Check results:**

- **If any eval fails:** Show the failures, the cost dashboard, and **STOP**. Do not proceed.
- **If all pass:** Note pass counts and cost. Continue to Step 9.

**5. Save eval output** — include eval results and cost dashboard in the PR body (Step 19).

**Tier reference (for context — /ship always uses `full`):**

| Tier | When | Speed (cached) | Cost |
|------|------|----------------|------|
| `fast` (Haiku) | Dev iteration, smoke tests | ~5s (14x faster) | ~$0.07/run |
| `standard` (Sonnet) | Default dev, `bin/test-lane --eval` | ~17s (4x faster) | ~$0.37/run |
| `full` (Opus persona) | **`/ship` and pre-merge** | ~72s (baseline) | ~$1.27/run |

---

## Step 7: Test Coverage Audit

**Dispatch this step as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent runs the coverage audit in a fresh context window — the parent only sees the conclusion, not intermediate file reads. This is context-rot defense.

**Subagent prompt:** Pass the following instructions to the subagent, with `<base>` substituted with the base branch:

> You are running a ship-workflow test coverage audit. Run `git diff <base>...HEAD` as needed. Do not commit or push — report only.
>
> {{TEST_COVERAGE_AUDIT_SHIP}}
>
> After your analysis, output a single JSON object on the LAST LINE of your response (no other text after it):
> `{"coverage_pct":N,"gaps":N,"diagram":"<full markdown coverage diagram for PR body>","tests_added":["path",...]}`

**Parent processing:**

1. Read the subagent's final output. Parse the LAST line as JSON.
2. Store `coverage_pct` (for Step 20 metrics), `gaps` (user summary), and `tests_added` (for the commit).
3. Embed `diagram` verbatim in the PR body's `## Test Coverage` section (Step 19).
4. Print a one-line summary: `Coverage: {coverage_pct}%, {gaps} gaps. {tests_added.length} tests added.`

**If the subagent fails, times out, or returns invalid JSON:** Fall back to running the audit inline in the parent. Do not block /ship on subagent failure — partial results are better than none.
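
The "parse the LAST line" contract can be sketched as follows, assuming the subagent transcript was captured to a file. The function name and the crude brace-shaped check are illustrative; real parsing would hand the line to a JSON parser:

```shell
# Hypothetical helper: emit the last line of a captured subagent transcript if
# it looks like a single JSON object; otherwise fail so the caller can fall
# back to the inline audit.
last_json_line() { # usage: last_json_line <transcript_file>
  last=$(tail -n 1 "$1")
  case $last in
    "{"*"}") printf '%s\n' "$last" ;;
    *) return 1 ;;
  esac
}
```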

---

## Step 8: Plan Completion Audit

**Dispatch this step as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent reads the plan file and every referenced code file in its own fresh context. The parent gets only the conclusion.

**Subagent prompt:** Pass these instructions to the subagent:

> You are running a ship-workflow plan completion audit. The base branch is `<base>`. Use `git diff <base>...HEAD` to see what shipped. Do not commit or push — report only.
>
> {{PLAN_COMPLETION_AUDIT_SHIP}}
>
> After your analysis, output a single JSON object on the LAST LINE of your response (no other text after it):
> `{"total_items":N,"done":N,"changed":N,"deferred":N,"unverifiable":N,"summary":"<markdown checklist for PR body>"}`

**Parent processing:**

1. Parse the LAST line of the subagent's output as JSON.
2. Store `done`, `deferred`, and `unverifiable` for Step 20 metrics; use `summary` in the PR body.
3. If `deferred > 0` or `unverifiable > 0` and there is no user override, present the items via the appropriate AskUserQuestion (see Gate Logic priority order above) before continuing.
4. Embed `summary` in the PR body's `## Plan Completion` section (Step 19). If `unverifiable > 0` and the user picked option A in the UNVERIFIABLE gate, also embed `## Plan Completion — Manual Verifications` listing each user-confirmed item.

**If the subagent fails or returns invalid JSON:** Fall back to running the audit inline (the parent runs the same plan-extraction + classification logic). If the inline fallback also fails (e.g., the plan file is unreadable, or the parser errors), do NOT silently pass — surface the failure as an explicit AskUserQuestion: "Plan Completion audit could not run ({reason}). Options: (A) Skip the audit and ship anyway — record that the audit was skipped in the PR body and Step 20 metrics; (B) Stop and fix the audit." The default and recommended option is (B). Silent fail-open is the failure shape that VAS-449 surfaced.

---

{{PLAN_VERIFICATION_EXEC}}

{{LEARNINGS_SEARCH}}

{{SCOPE_DRIFT}}

---

## Step 9: Pre-Landing Review

Review the diff for structural issues that tests don't catch.

1. Read `.claude/skills/review/checklist.md`. If the file cannot be read, **STOP** and report the error.

2. Run `git diff origin/<base>` to get the full diff (scoped to feature changes against the freshly fetched base branch).

3. Apply the review checklist in two passes:
- **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
- **Pass 2 (INFORMATIONAL):** All remaining categories

{{CONFIDENCE_CALIBRATION}}

{{DESIGN_REVIEW_LITE}}

Include any design findings alongside the code review findings. They follow the same Fix-First flow below.

{{REVIEW_ARMY}}

{{CROSS_REVIEW_DEDUP}}

4. **Classify each finding from both the checklist pass and the specialist review (Steps 9.1-9.2) as AUTO-FIX or ASK** per the Fix-First Heuristic in checklist.md. Critical findings lean toward ASK; informational findings lean toward AUTO-FIX.

5. **Auto-fix all AUTO-FIX items.** Apply each fix. Output one line per fix:
`[AUTO-FIXED] [file:line] Problem → what you did`

6. **If ASK items remain,** present them in ONE AskUserQuestion:
- List each with number, severity, problem, recommended fix
- Per-item options: A) Fix B) Skip
- Overall RECOMMENDATION
- If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead

7. **After all fixes (auto + user-approved):**
- If ANY fixes were applied: commit fixed files by name (`git add <fixed-files> && git commit -m "fix: pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test.
- If no fixes were applied (all ASK items skipped, or no issues found): continue to Step 12.

8. Output summary: `Pre-Landing Review: N issues — M auto-fixed, K asked (J fixed, L skipped)`

If no issues were found: `Pre-Landing Review: No issues found.`

9. Persist the review result to the review log:
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"quality_score":SCORE,"specialists":SPECIALISTS_JSON,"findings":FINDINGS_JSON,"commit":"'"$(git rev-parse --short HEAD)"'","via":"ship"}'
```
Substitute TIMESTAMP (ISO 8601), STATUS ("clean" if no issues, "issues_found" otherwise),
and the N values from the summary counts above. The `via:"ship"` distinguishes these entries from standalone `/review` runs.
- `quality_score` = the PR Quality Score computed in Step 9.2 (e.g., 7.5). If specialists were skipped (small diff), use `10.0`
- `specialists` = the per-specialist stats object compiled in Step 9.2. Each specialist that was considered gets an entry: `{"dispatched":true/false,"findings":N,"critical":N,"informational":N}` if dispatched, or `{"dispatched":false,"reason":"scope|gated"}` if skipped. Example: `{"testing":{"dispatched":true,"findings":2,"critical":0,"informational":2},"security":{"dispatched":false,"reason":"scope"}}`
- `findings` = an array of per-finding records. For each finding (from the checklist pass and the specialists), include: `{"fingerprint":"path:line:category","severity":"CRITICAL|INFORMATIONAL","action":"ACTION"}`. ACTION is `"auto-fixed"`, `"fixed"` (user approved), or `"skipped"` (user chose Skip).

Save the review output — it goes into the PR body in Step 19.

---

## Step 10: Address Greptile review comments (if PR exists)

**Dispatch the fetch + classification as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent pulls every Greptile comment, runs the escalation detection algorithm, and classifies each comment. The parent receives a structured list and handles user interaction + file edits.

**Subagent prompt:**

> You are classifying Greptile review comments for a /ship workflow. Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, classify, and **escalation detection** steps. Do NOT fix code, do NOT reply to comments, do NOT commit — report only.
>
> For each comment, assign: `classification` (`valid_actionable`, `already_fixed`, `false_positive`, `suppressed`), `escalation_tier` (1 or 2), the file:line or [top-level] tag, a body summary, and the permalink URL.
>
> If no PR exists, `gh` fails, the API errors, or there are zero comments, output: `{"total":0,"comments":[]}` and stop.
>
> Otherwise, output a single JSON object on the LAST LINE of your response:
> `{"total":N,"comments":[{"classification":"...","escalation_tier":N,"ref":"file:line","summary":"...","permalink":"url"},...]}`

**Parent processing:**

Parse the LAST line as JSON.

If `total` is 0, skip this step silently. Continue to Step 12.

Otherwise, print: `+ {total} Greptile comments ({valid_actionable} valid, {already_fixed} already fixed, {false_positive} FP)`.

For each comment in `comments`:

**VALID & ACTIONABLE:** Use AskUserQuestion with:
- The comment (file:line or [top-level] + body summary + permalink URL)
- `RECOMMENDATION: Choose A because [one-line reason]`
- Options: A) Fix now, B) Acknowledge and ship anyway, C) It's a false positive
- If the user chooses A: apply the fix, commit the fixed files (`git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"`), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation), and save to both per-project and global greptile-history (type: fix).
- If the user chooses C: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), and save to both per-project and global greptile-history (type: fp).

**VALID BUT ALREADY FIXED:** Reply using the **Already Fixed reply template** from greptile-triage.md — no AskUserQuestion needed:
- Include what was done and the fixing commit SHA
- Save to both per-project and global greptile-history (type: already-fixed)

**FALSE POSITIVE:** Use AskUserQuestion:
- Show the comment and why you think it's wrong (file:line or [top-level] + body summary + permalink URL)
- Options:
  - A) Reply to Greptile explaining the false positive (recommended if clearly wrong)
  - B) Fix it anyway (if trivial)
  - C) Ignore silently
- If the user chooses A: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), and save to both per-project and global greptile-history (type: fp)

**SUPPRESSED:** Skip silently — these are known false positives from previous triage.

**After all comments are resolved:** If any fixes were applied, the tests from Step 5 are now stale. **Re-run tests** (Step 5) before continuing to Step 12. If no fixes were applied, continue to Step 12.

---

{{ADVERSARIAL_STEP}}

{{LEARNINGS_LOG}}

{{GBRAIN_SAVE_RESULTS}}

## Step 12: Version bump (auto-decide)

**Idempotency check:** Before bumping, classify the state by comparing `VERSION` against the base branch AND against `package.json`'s `version` field. Four states: FRESH (do the bump), ALREADY_BUMPED (skip the bump), DRIFT_STALE_PKG (sync pkg only, no re-bump), DRIFT_UNEXPECTED (stop and ask).

```bash
if ! git rev-parse --verify origin/<base> >/dev/null 2>&1; then
  echo "ERROR: Unable to resolve origin/<base>. Run 'git fetch origin' or verify the base branch exists."
  exit 1
fi

BASE_VERSION=$(git show origin/<base>:VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "0.0.0.0")
CURRENT_VERSION=$(cat VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "0.0.0.0")
[ -z "$BASE_VERSION" ] && BASE_VERSION="0.0.0.0"
[ -z "$CURRENT_VERSION" ] && CURRENT_VERSION="0.0.0.0"
PKG_VERSION=""
PKG_EXISTS=0
if [ -f package.json ]; then
  PKG_EXISTS=1
  if command -v node >/dev/null 2>&1; then
    PKG_VERSION=$(node -e 'const p=require("./package.json");process.stdout.write(p.version||"")' 2>/dev/null)
    PARSE_EXIT=$?
  elif command -v bun >/dev/null 2>&1; then
    PKG_VERSION=$(bun -e 'const p=require("./package.json");process.stdout.write(p.version||"")' 2>/dev/null)
    PARSE_EXIT=$?
  else
    echo "ERROR: package.json exists but neither node nor bun is available. Install one and re-run."
    exit 1
  fi
  if [ "$PARSE_EXIT" != "0" ]; then
    echo "ERROR: package.json is not valid JSON. Fix the file before re-running /ship."
    exit 1
  fi
fi
echo "BASE: $BASE_VERSION VERSION: $CURRENT_VERSION package.json: ${PKG_VERSION:-<none>}"

if [ "$CURRENT_VERSION" = "$BASE_VERSION" ]; then
  if [ "$PKG_EXISTS" = "1" ] && [ -n "$PKG_VERSION" ] && [ "$PKG_VERSION" != "$CURRENT_VERSION" ]; then
    echo "STATE: DRIFT_UNEXPECTED"
    echo "package.json version ($PKG_VERSION) disagrees with VERSION ($CURRENT_VERSION) while VERSION matches base."
    echo "This looks like a manual edit to package.json bypassing /ship. Reconcile manually, then re-run."
    exit 1
  fi
  echo "STATE: FRESH"
else
  if [ "$PKG_EXISTS" = "1" ] && [ -n "$PKG_VERSION" ] && [ "$PKG_VERSION" != "$CURRENT_VERSION" ]; then
    echo "STATE: DRIFT_STALE_PKG"
  else
    echo "STATE: ALREADY_BUMPED"
  fi
fi
```

Read the `STATE:` line and dispatch:

- **FRESH** → proceed with the bump action below (steps 1–4).
- **ALREADY_BUMPED** → skip the bump by default, BUT check for queue drift first: call `bin/gstack-next-version` with the implied bump level (derived from `CURRENT_VERSION` vs `BASE_VERSION`) and compare its `.version` against `CURRENT_VERSION`. If they differ (the queue moved since the last ship), use **AskUserQuestion**: "VERSION drift detected: you claim v<CURRENT> but the next available is v<NEW> (queue moved). A) Rebump to v<NEW> and rewrite the CHANGELOG header + PR title (recommended), B) Keep v<CURRENT> — will be rejected by the CI version-gate until resolved." If A, treat this as FRESH with `NEW_VERSION=<new>` and run steps 1–4 (which will also trigger the Step 13 CHANGELOG header rewrite and the Step 19 PR title rewrite). If B, reuse `CURRENT_VERSION` and warn that CI will likely reject. If the util is offline, warn and reuse `CURRENT_VERSION`.
- **DRIFT_STALE_PKG** → a prior `/ship` bumped `VERSION` but failed to update `package.json`. Run the sync-only repair block below (after step 4). Do NOT re-bump. Reuse `CURRENT_VERSION` for the CHANGELOG and PR body. (The queue check still runs in ALREADY_BUMPED terms after the repair.)
- **DRIFT_UNEXPECTED** → `/ship` has halted (exit 1). Resolve manually; /ship cannot tell which file is authoritative.

1. Read the current `VERSION` file (4-digit format: `MAJOR.MINOR.PATCH.MICRO`)

2. **Auto-decide the bump level based on the diff:**
- Count lines changed (`git diff origin/<base>...HEAD --stat | tail -1`)
- Check for feature signals: new route/page files (e.g. `app/*/page.tsx`, `pages/*.ts`), new DB migration/schema files, new test files alongside new source files, or a branch name starting with `feat/`
- **MICRO** (4th digit): < 50 lines changed, trivial tweaks, typos, config
- **PATCH** (3rd digit): 50+ lines changed, no feature signals detected
- **MINOR** (2nd digit): **ASK the user** if ANY feature signal is detected, OR 500+ lines changed, OR new modules/packages added
- **MAJOR** (1st digit): **ASK the user** — only for milestones or breaking changes

Save the chosen level as `BUMP_LEVEL` (one of `major`, `minor`, `patch`, `micro`). This is the user-intended level. The next step decides *placement* — the level stays the same even if queue-aware allocation has to advance past a claimed slot.
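
The auto-decide table can be sketched as a function. The names and the `ask-minor` return value are hypothetical; MINOR is surfaced as an ask-the-user outcome rather than applied, and MAJOR never appears because it is never auto-picked:

```shell
# Hypothetical sketch of the bump-level decision. feature_signal is 1 if any
# feature signal was detected in the diff, 0 otherwise.
decide_bump() { # usage: decide_bump <lines_changed> <feature_signal>
  lines=$1; feature=$2
  if [ "$feature" -eq 1 ] || [ "$lines" -ge 500 ]; then
    echo "ask-minor"   # feature signal or 500+ lines: ask the user about MINOR
  elif [ "$lines" -ge 50 ]; then
    echo "patch"
  else
    echo "micro"
  fi
}
```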

3. **Queue-aware version pick (workspace-aware ship, v1.6.4.0+).** Call `bin/gstack-next-version` to see what's already claimed by open PRs + active sibling Conductor worktrees, then render the queue state to the user:

```bash
QUEUE_JSON=$(bun run bin/gstack-next-version \
  --base <base> \
  --bump "$BUMP_LEVEL" \
  --current-version "$BASE_VERSION" 2>/dev/null || echo '{"offline":true}')
NEW_VERSION=$(echo "$QUEUE_JSON" | jq -r '.version // empty')
CLAIMED_COUNT=$(echo "$QUEUE_JSON" | jq -r '.claimed | length')
ACTIVE_SIBLING_COUNT=$(echo "$QUEUE_JSON" | jq -r '.active_siblings | length')
OFFLINE=$(echo "$QUEUE_JSON" | jq -r '.offline // false')
REASON=$(echo "$QUEUE_JSON" | jq -r '.reason // ""')
```

- If `OFFLINE=true` or the util fails (auth expired, no `gh`/`glab`, network): fall back to local `BUMP_LEVEL` arithmetic (bump `BASE_VERSION` at the chosen level). Print `⚠ workspace-aware ship offline — using local bump only`. Continue.
- If `CLAIMED_COUNT > 0`: render the queue table to the user so they can see the landing order at a glance:
  ```
  Queue on <base> (vBASE_VERSION):
    #<pr> <branch> → v<version> [⚠ collision with #<other>]
  Active sibling workspaces (WIP, not yet PR'd):
    <path> → v<version> (committed Nh ago)
  Your branch will claim: vNEW_VERSION (<reason>)
  ```
- If `ACTIVE_SIBLING_COUNT > 0` and any active sibling's VERSION is `>= NEW_VERSION`, use **AskUserQuestion**: "Sibling workspace <path> has v<X> committed <N>h ago but hasn't PR'd yet. Wait for them to ship first, or advance past? A) Advance past (recommended for unrelated work), B) Abort /ship and sync up with the sibling first."
- Validate that `NEW_VERSION` matches `MAJOR.MINOR.PATCH.MICRO`. If the util returns an empty or malformed version, fall back to the local bump.
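
The local-arithmetic fallback (bump `BASE_VERSION` at the chosen level) can be sketched as follows; the function name is an assumption:

```shell
# Hypothetical sketch of the offline fallback: bump a MAJOR.MINOR.PATCH.MICRO
# version at the given level, zeroing every component to the right of it.
bump_version() { # usage: bump_version <version> <major|minor|patch|micro>
  OLD_IFS=$IFS; IFS=.
  set -- $1 "$2"   # after this: $1..$4 = version components, $5 = level
  IFS=$OLD_IFS
  case $5 in
    major) echo "$(($1 + 1)).0.0.0" ;;
    minor) echo "$1.$(($2 + 1)).0.0" ;;
    patch) echo "$1.$2.$(($3 + 1)).0" ;;
    micro) echo "$1.$2.$3.$(($4 + 1))" ;;
  esac
}
```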

4. **Validate** `NEW_VERSION` and write it to **both** `VERSION` and `package.json`. This block runs only when `STATE: FRESH`.

```bash
if ! printf '%s' "$NEW_VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
  echo "ERROR: NEW_VERSION ($NEW_VERSION) does not match MAJOR.MINOR.PATCH.MICRO pattern. Aborting."
  exit 1
fi
echo "$NEW_VERSION" > VERSION
if [ -f package.json ]; then
  if command -v node >/dev/null 2>&1; then
    node -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$NEW_VERSION" || {
      echo "ERROR: failed to update package.json. VERSION was written but package.json is now stale. Fix and re-run — the idempotency check will detect the drift."
      exit 1
    }
  elif command -v bun >/dev/null 2>&1; then
    bun -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$NEW_VERSION" || {
      echo "ERROR: failed to update package.json. VERSION was written but package.json is now stale."
      exit 1
    }
  else
    echo "ERROR: package.json exists but neither node nor bun is available."
    exit 1
  fi
fi
```

**DRIFT_STALE_PKG repair path** — runs when the idempotency check reports `STATE: DRIFT_STALE_PKG`. No re-bump; sync `package.json`'s `version` to the current `VERSION` and continue. Reuse `CURRENT_VERSION` for the CHANGELOG and PR body.

```bash
REPAIR_VERSION=$(cat VERSION | tr -d '\r\n[:space:]')
if ! printf '%s' "$REPAIR_VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
  echo "ERROR: VERSION file contents ($REPAIR_VERSION) do not match the MAJOR.MINOR.PATCH.MICRO pattern. Refusing to propagate an invalid version into package.json. Fix VERSION manually, then re-run /ship."
  exit 1
fi
if command -v node >/dev/null 2>&1; then
  node -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$REPAIR_VERSION" || {
    echo "ERROR: drift repair failed — could not update package.json."
    exit 1
  }
elif command -v bun >/dev/null 2>&1; then
  bun -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$REPAIR_VERSION" || {
    echo "ERROR: drift repair failed — could not update package.json."
    exit 1
  }
else
  echo "ERROR: package.json exists but neither node nor bun is available."
  exit 1
fi
echo "Drift repaired: package.json synced to $REPAIR_VERSION. No version bump performed."
```

---

{{CHANGELOG_WORKFLOW}}

---

## Step 14: TODOS.md (auto-update)

Cross-reference the project's TODOS.md against the changes being shipped. Mark completed items automatically; prompt only if the file is missing or disorganized.

Read `.claude/skills/review/TODOS-format.md` for the canonical format reference.

**1. Check if TODOS.md exists** in the repository root.

**If TODOS.md does not exist:** Use AskUserQuestion:
- Message: "GStack recommends maintaining a TODOS.md organized by skill/component, then priority (P0 at top through P4, then Completed at the bottom). See TODOS-format.md for the full format. Would you like to create one?"
- Options: A) Create it now, B) Skip for now
- If A: Create `TODOS.md` with a skeleton (# TODOS heading + ## Completed section). Continue to step 3.
- If B: Skip the rest of Step 14. Continue to Step 15.

**2. Check structure and organization:**

Read TODOS.md and verify it follows the recommended structure:
- Items grouped under `## <Skill/Component>` headings
- Each item has a `**Priority:**` field with a P0-P4 value
- A `## Completed` section at the bottom
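
A rough sketch of that structure check; the helper name is hypothetical, and it only tests for the three structural markers listed above, not full conformance to TODOS-format.md:

```shell
# Hypothetical helper: succeed only if TODOS.md has component headings, at
# least one **Priority:** field, and a ## Completed section.
todos_ok() { # usage: todos_ok <path-to-TODOS.md>
  grep -qE '^## ' "$1" &&
    grep -qF '**Priority:**' "$1" &&
    grep -qF '## Completed' "$1"
}
```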

**If disorganized** (missing priority fields, no component groupings, no Completed section): Use AskUserQuestion:
- Message: "TODOS.md doesn't follow the recommended structure (skill/component groupings, P0-P4 priority, Completed section). Would you like to reorganize it?"
- Options: A) Reorganize now (recommended), B) Leave as-is
- If A: Reorganize in place following TODOS-format.md. Preserve all content — only restructure, never delete items.
- If B: Continue to step 3 without restructuring.

**3. Detect completed TODOs:**

This step is fully automatic — no user interaction.

Use the diff and commit history already gathered in earlier steps:
- `git diff <base>...HEAD` (full diff against the base branch)
- `git log <base>..HEAD --oneline` (all commits being shipped)

For each TODO item, check if the changes in this PR complete it by:
- Matching commit messages against the TODO title and description
- Checking if files referenced in the TODO appear in the diff
- Checking if the TODO's described work matches the functional changes

**Be conservative:** Only mark a TODO as completed if there is clear evidence in the diff. If uncertain, leave it alone.

**4. Move completed items** to the `## Completed` section at the bottom. Append: `**Completed:** vX.Y.Z (YYYY-MM-DD)`
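
The appended annotation has a fixed shape; a tiny sketch (the helper name is hypothetical, and the date is passed in to keep it deterministic; in practice it would come from `date +%F`):

```shell
# Hypothetical helper: render the completion annotation appended to a moved item.
completed_line() { # usage: completed_line <version> <yyyy-mm-dd>
  printf '**Completed:** v%s (%s)\n' "$1" "$2"
}
```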

**5. Output summary:**
- `TODOS.md: N items marked complete (item1, item2, ...). M items remaining.`
- Or: `TODOS.md: No completed items detected. M items remaining.`
- Or: `TODOS.md: Created.` / `TODOS.md: Reorganized.`

**6. Defensive:** If TODOS.md cannot be written (permission error, disk full), warn the user and continue. Never stop the ship workflow for a TODOS failure.

Save this summary — it goes into the PR body in Step 19.

---

## Step 15: Commit (bisectable chunks)

### Step 15.0: WIP Commit Squash (continuous checkpoint mode only)

If `CHECKPOINT_MODE` is `"continuous"`, the branch likely contains `WIP:` commits
from auto-checkpointing. These must be squashed INTO the corresponding logical
commits before the bisectable-grouping logic in Step 15.1 runs. Non-WIP commits
on the branch (earlier landed work) must be preserved.

**Detection:**
```bash
WIP_COUNT=$(git log <base>..HEAD --oneline --grep="^WIP:" 2>/dev/null | wc -l | tr -d ' ')
echo "WIP_COMMITS: $WIP_COUNT"
```

If `WIP_COUNT` is 0: skip this sub-step entirely.

If `WIP_COUNT` > 0, collect the WIP context first so it survives the squash:

```bash
# Export [gstack-context] blocks from all WIP commits on this branch.
# This file becomes input to the CHANGELOG entry and may inform PR body context.
mkdir -p "$(git rev-parse --show-toplevel)/.gstack"
git log <base>..HEAD --grep="^WIP:" --format="%H%n%B%n---END---" > \
  "$(git rev-parse --show-toplevel)/.gstack/wip-context-before-squash.md" 2>/dev/null || true
```

**Non-destructive squash strategy:**

`git reset --soft <merge-base>` WOULD uncommit everything including non-WIP commits. DO NOT DO THAT. Instead, use `git rebase` scoped to filter WIP commits only.

Option 1 (preferred, if there are non-WIP commits mixed in):
```bash
# Interactive rebase with automated WIP squashing: rewrite the rebase todo so
# every WIP commit becomes 'fixup' (drop its message, fold changes into the
# prior commit). GIT_SEQUENCE_EDITOR stands in for the interactive editor.
# Note: if the FIRST commit after the merge base is a WIP commit, 'fixup' has
# no prior commit to fold into and the rebase fails; the handler below
# catches that too.
GIT_SEQUENCE_EDITOR='sed -i.bak -e "/ WIP:/s/^pick /fixup /"' \
  git rebase -i "$(git merge-base HEAD origin/<base>)" || {
  echo "Rebase conflict. Aborting: git rebase --abort"
  git rebase --abort
  echo "STATUS: BLOCKED — manual WIP squash required"
  exit 1
}
```

Option 2 (simpler, if the branch is ALL WIP commits so far — no landed work):
```bash
# Branch contains only WIP commits. Reset-soft is safe here because there's
# nothing non-WIP to preserve. Verify first.
NON_WIP=$(git log <base>..HEAD --oneline --invert-grep --grep="^WIP:" 2>/dev/null | wc -l | tr -d ' ')
if [ "$NON_WIP" -eq 0 ]; then
  git reset --soft $(git merge-base HEAD origin/<base>)
  echo "WIP-only branch, reset-soft to merge base. Step 15.1 will create clean commits."
fi
```

Decide at runtime which option applies. If unsure, prefer stopping and asking the user via AskUserQuestion rather than destroying non-WIP commits.
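
The runtime decision reduces to the two counts already computed. A minimal sketch with hypothetical counts (in the real step, `WIP_COUNT` and `NON_WIP` come from the `git log` probes above):

```shell
# Hypothetical counts; the real values come from the git log probes above.
WIP_COUNT=3
NON_WIP=1

if [ "$WIP_COUNT" -eq 0 ]; then
  echo "SKIP: no WIP commits, go straight to Step 15.1"
elif [ "$NON_WIP" -eq 0 ]; then
  echo "OPTION_2: WIP-only branch, reset --soft is safe"
else
  echo "OPTION_1: mixed branch, rebase with automated fixups"
fi
```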

**Anti-footgun rules:**

- NEVER blind `git reset --soft` if there are non-WIP commits. Codex flagged this as destructive — it would uncommit real landed work and turn the push step into a non-fast-forward push for anyone who already pushed.
- Only proceed to Step 15.1 after WIP commits are successfully squashed/absorbed or the branch has been verified to contain only WIP work.

### Step 15.1: Bisectable Commits

**Goal:** Create small, logical commits that work well with `git bisect` and help LLMs understand what changed.

1. Analyze the diff and group changes into logical commits. Each commit should represent **one coherent change** — not one file, but one logical unit.

2. **Commit ordering** (earlier commits first):
   - **Infrastructure:** migrations, config changes, route additions
   - **Models & services:** new models, services, concerns (with their tests)
   - **Controllers & views:** controllers, views, JS/React components (with their tests)
   - **VERSION + CHANGELOG + TODOS.md:** always in the final commit

3. **Rules for splitting:**
   - A model and its test file go in the same commit
   - A service and its test file go in the same commit
   - A controller, its views, and its test go in the same commit
   - Migrations are their own commit (or grouped with the model they support)
   - Config/route changes can group with the feature they enable
   - If the total diff is small (< 50 lines across < 4 files), a single commit is fine

4. **Each commit must be independently valid** — no broken imports, no references to code that doesn't exist yet. Order commits so dependencies come first.
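
The grouping rules can be demonstrated in a throwaway repo. All file names here are illustrative, not from the real project: a migration lands alone, then a model travels with its test in a single commit.

```shell
# Hypothetical grouping demo in a disposable repo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email ci@example.com
git config user.name ci

mkdir -p db/migrate app/models test/models

# Commit 1: the migration on its own (infrastructure first).
echo "-- create widgets table" > db/migrate/001_create_widgets.rb
git add db/migrate
git commit -q -m "feat: add widgets table migration"

# Commit 2: the model and its test together, never split.
echo "class Widget; end" > app/models/widget.rb
echo "# tests for Widget" > test/models/widget_test.rb
git add app/models/widget.rb test/models/widget_test.rb
git commit -q -m "feat: add Widget model with tests"

git log --format=%s   # newest first: model+tests commit, then the migration
```

Because each commit is self-contained, `git bisect` can land on either one and still build.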

5. Compose each commit message:
   - First line: `<type>: <summary>` (type = feat/fix/chore/refactor/docs)
   - Body: brief description of what this commit contains
   - Only the **final commit** (VERSION + CHANGELOG) gets the version tag and co-author trailer:

```bash
git commit -m "$(cat <<'EOF'
chore: bump version and changelog (vX.Y.Z.W)

{{CO_AUTHOR_TRAILER}}
EOF
)"
```

---

## Step 16: Verification Gate

**IRON LAW: NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.**

Before pushing, re-verify if code changed during Steps 4-6:

1. **Test verification:** If ANY code changed after Step 5's test run (fixes from review findings; CHANGELOG edits don't count), re-run the test suite. Paste fresh output. Stale output from Step 5 is NOT acceptable.

2. **Build verification:** If the project has a build step, run it. Paste output.

3. **Rationalization prevention:**
   - "Should work now" → RUN IT.
   - "I'm confident" → Confidence is not evidence.
   - "I already tested earlier" → Code changed since then. Test again.
   - "It's a trivial change" → Trivial changes break production.
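
The re-verification trigger reduces to a timestamp comparison. A hedged sketch with hypothetical epoch values; in practice the test-run time would be recorded when Step 5's suite finishes and compared against the newest code change:

```shell
# Hypothetical epochs: when Step 5's suite finished vs. the newest code change.
TEST_RUN_EPOCH=1735732800
LAST_CHANGE_EPOCH=1735736400

if [ "$LAST_CHANGE_EPOCH" -gt "$TEST_RUN_EPOCH" ]; then
  echo "STALE_EVIDENCE: code changed after the last test run; re-run the suite"
else
  echo "FRESH_EVIDENCE: last test run postdates the newest change"
fi
```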

**If tests fail here:** STOP. Do not push. Fix the issue and return to Step 5.

Claiming work is complete without verification is dishonesty, not efficiency.

---

## Step 17: Push

**Idempotency check:** Check if the branch is already pushed and up to date.

```bash
git fetch origin <branch-name> 2>/dev/null
LOCAL=$(git rev-parse HEAD)
REMOTE=$(git rev-parse origin/<branch-name> 2>/dev/null || echo "none")
echo "LOCAL: $LOCAL REMOTE: $REMOTE"
[ "$LOCAL" = "$REMOTE" ] && echo "ALREADY_PUSHED" || echo "PUSH_NEEDED"
```

If `ALREADY_PUSHED`, skip the push but continue to Step 18. Otherwise push with upstream tracking:

```bash
git push -u origin <branch-name>
```

**You are NOT done.** The code is pushed but documentation sync and PR creation are mandatory final steps. Continue to Step 18.

---

## Step 18: Documentation sync (via subagent, before PR creation)

**Dispatch /document-release as a subagent** using the Agent tool with `subagent_type: "general-purpose"`. The subagent gets a fresh context window — zero rot from the preceding 17 steps. It also runs the **full** `/document-release` workflow (with CHANGELOG clobber protection, doc exclusions, risky-change gates, named staging, race-safe PR body editing) rather than a weaker reimplementation.

**Sequencing:** This step runs AFTER Step 17 (Push) and BEFORE Step 19 (Create PR). The PR is created once from final HEAD with the `## Documentation` section baked into the initial body. No create-then-re-edit dance.

**Subagent prompt:**

> You are executing the /document-release workflow after a code push. Read the full skill file `${HOME}/.claude/skills/gstack/document-release/SKILL.md` and execute its complete workflow end-to-end, including CHANGELOG clobber protection, doc exclusions, risky-change gates, and named staging. Do NOT attempt to edit the PR body — no PR exists yet. Branch: `<branch>`, base: `<base>`.
>
> After completing the workflow, output a single JSON object on the LAST LINE of your response (no other text after it):
> `{"files_updated":["README.md","CLAUDE.md",...],"commit_sha":"abc1234","pushed":true,"documentation_section":"<markdown block for PR body's ## Documentation section>"}`
>
> If no documentation files needed updating, output:
> `{"files_updated":[],"commit_sha":null,"pushed":false,"documentation_section":null}`

**Parent processing:**

1. Parse the LAST line of the subagent's output as JSON.
2. Store `documentation_section` — Step 19 embeds it in the PR body (or omits the section if null).
3. If `files_updated` is non-empty, print: `Documentation synced: {files_updated.length} files updated, committed as {commit_sha}`.
4. If `files_updated` is empty, print: `Documentation is current — no updates needed.`
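
Parsing the trailing JSON line can be sketched as follows. The transcript below is a hypothetical subagent response shaped like the contract above, and `python3` is assumed available for the JSON validation:

```shell
# Hypothetical subagent transcript; only the last line matters.
OUTPUT='Synced two docs and pushed.
{"files_updated":["README.md","CLAUDE.md"],"commit_sha":"abc1234","pushed":true,"documentation_section":"## Documentation\n- README.md, CLAUDE.md updated"}'

LAST_LINE=$(printf '%s\n' "$OUTPUT" | tail -n 1)

# Validate before trusting any field; on failure, /ship proceeds without a
# Documentation section rather than blocking.
if printf '%s' "$LAST_LINE" | python3 -c 'import json,sys; json.load(sys.stdin)' 2>/dev/null; then
  echo "VALID_JSON"
else
  echo "INVALID_JSON"
fi
```

Validating the whole object first, instead of plucking fields out of an unchecked string, is what keeps a malformed subagent reply from corrupting the PR body.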

**If the subagent fails or returns invalid JSON:** Print a warning and proceed to Step 19 without a `## Documentation` section. Do not block /ship on subagent failure. The user can run `/document-release` manually after the PR lands.

---

## Step 19: Create PR/MR

**Idempotency check:** Check if a PR/MR already exists for this branch.

**If GitHub:**

```bash
gh pr view --json url,number,state -q 'if .state == "OPEN" then "PR #\(.number): \(.url)" else "NO_PR" end' 2>/dev/null || echo "NO_PR"
```

**If GitLab:**

```bash
glab mr view -F json 2>/dev/null | jq -r 'if .state == "opened" then "MR_EXISTS" else "NO_MR" end' 2>/dev/null || echo "NO_MR"
```

If an **open** PR/MR already exists: **update** the PR body using `gh pr edit --body "..."` (GitHub) or `glab mr update -d "..."` (GitLab). Always regenerate the PR body from scratch using this run's fresh results (test output, coverage audit, review findings, adversarial review, TODOS summary, documentation_section from Step 18). Never reuse stale PR body content from a prior run.

**Always update the PR title to start with `v$NEW_VERSION`.** PR titles use the workspace-aware format `v<NEW_VERSION> <type>: <summary>` — version ALWAYS first, no exceptions, no "custom title kept intentionally" escape hatch. The shared helper `bin/gstack-pr-title-rewrite.sh` is the single source of truth for the rule.

1. Read the current title: `CURRENT=$(gh pr view --json title -q .title)` (or `glab mr view -F json | jq -r .title`).
2. Compute the corrected title: `NEW_TITLE=$(~/.claude/skills/gstack/bin/gstack-pr-title-rewrite.sh "$NEW_VERSION" "$CURRENT")`. The helper handles three cases: title already correct (no-op), title has a different `v<X.Y.Z.W>` prefix (replace it), or title has no version prefix (prepend one).
3. If `NEW_TITLE` differs from `CURRENT`, run `gh pr edit --title "$NEW_TITLE"` (or `glab mr update -t "$NEW_TITLE"`).
4. **Self-check:** re-fetch the title and assert it starts with `v$NEW_VERSION `. If it does not, retry the edit once. If still wrong, surface the failure to the user.

This keeps the title truthful when Step 12's queue-drift detection rebumps a stale version, and forces the format on PRs that were created without it.
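
The three cases the helper handles can be sketched in plain bash. This is a hedged illustration of the rule only, not the helper's actual implementation (`bin/gstack-pr-title-rewrite.sh` remains the source of truth); the version and title values are hypothetical:

```shell
NEW_VERSION="1.6.5.0"               # hypothetical target version
CURRENT="v1.6.4.2 fix: old title"   # hypothetical stale-prefix case

prefix_re='^v[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ (.*)$'
if [[ "$CURRENT" == "v$NEW_VERSION "* ]]; then
  NEW_TITLE="$CURRENT"                           # case 1: already correct, no-op
elif [[ "$CURRENT" =~ $prefix_re ]]; then
  NEW_TITLE="v$NEW_VERSION ${BASH_REMATCH[1]}"   # case 2: replace stale prefix
else
  NEW_TITLE="v$NEW_VERSION $CURRENT"             # case 3: prepend missing prefix
fi

echo "$NEW_TITLE"   # v1.6.5.0 fix: old title
```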

Print the existing URL and continue to Step 20.

If no PR/MR exists: create a pull request (GitHub) or merge request (GitLab) using the platform detected in Step 0.

The PR/MR body should contain these sections:

```
## Summary
<Summarize ALL changes being shipped. Run `git log <base>..HEAD --oneline` to enumerate
every commit. Exclude the VERSION/CHANGELOG metadata commit (that's this PR's bookkeeping,
not a substantive change). Group the remaining commits into logical sections (e.g.,
"**Performance**", "**Dead Code Removal**", "**Infrastructure**"). Every substantive commit
must appear in at least one section. If a commit's work isn't reflected in the summary,
you missed it.>

## Test Coverage
<coverage diagram from Step 7, or "All new code paths have test coverage.">
<If Step 7 ran: "Tests: {before} → {after} (+{delta} new)">

## Pre-Landing Review
<findings from Step 9 code review, or "No issues found.">

## Design Review
<If design review ran: "Design Review (lite): N findings — M auto-fixed, K skipped. AI Slop: clean/N issues.">
<If no frontend files changed: "No frontend files changed — design review skipped.">

## Eval Results
<If evals ran: suite names, pass/fail counts, cost dashboard summary. If skipped: "No prompt-related files changed — evals skipped.">

## Greptile Review
<If Greptile comments were found: bullet list with [FIXED] / [FALSE POSITIVE] / [ALREADY FIXED] tag + one-line summary per comment>
<If no Greptile comments found: "No Greptile comments.">
<If no PR existed during Step 10: omit this section entirely>

## Scope Drift
<If scope drift ran: "Scope Check: CLEAN" or list of drift/creep findings>
<If no scope drift: omit this section>

## Plan Completion
<If plan file found: completion checklist summary from Step 8>
<If no plan file: "No plan file detected.">
<If plan items deferred: list deferred items>

## Verification Results
<If verification ran: summary from Step 8.1 (N PASS, M FAIL, K SKIPPED)>
<If skipped: reason (no plan, no server, no verification section)>
<If not applicable: omit this section>

## TODOS
<If items marked complete: bullet list of completed items with version>
<If no items completed: "No TODO items completed in this PR.">
<If TODOS.md created or reorganized: note that>
<If TODOS.md doesn't exist and user skipped: omit this section>

## Documentation
<Embed the `documentation_section` string returned by Step 18's subagent here, verbatim.>
<If Step 18 returned `documentation_section: null` (no docs updated), omit this section entirely.>

## Test plan
- [x] All Rails tests pass (N runs, 0 failures)
- [x] All Vitest tests pass (N tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
```

**If GitHub:**

```bash
# PR title MUST start with v$NEW_VERSION — enforced on every run, no exceptions.
# (See Step 19 idempotency block + bin/gstack-pr-title-rewrite.sh for the rule.)
gh pr create --base <base> --title "v$NEW_VERSION <type>: <summary>" --body "$(cat <<'EOF'
<PR body from above>
EOF
)"
```

**If GitLab:**

```bash
# MR title MUST start with v$NEW_VERSION — enforced on every run, no exceptions.
# (See Step 19 idempotency block + bin/gstack-pr-title-rewrite.sh for the rule.)
glab mr create -b <base> -t "v$NEW_VERSION <type>: <summary>" -d "$(cat <<'EOF'
<MR body from above>
EOF
)"
```

**If neither CLI is available:**
Print the branch name, remote URL, and instruct the user to create the PR/MR manually via the web UI. Do not stop — the code is pushed and ready.

**Output the PR/MR URL** — then proceed to Step 20.

---

## Step 20: Persist ship metrics

Log coverage and plan completion data so `/retro` can track trends:

```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)" && mkdir -p ~/.gstack/projects/$SLUG
```

Append to `~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl`:

```bash
echo '{"skill":"ship","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","coverage_pct":COVERAGE_PCT,"plan_items_total":PLAN_TOTAL,"plan_items_done":PLAN_DONE,"verification_result":"VERIFY_RESULT","version":"VERSION","branch":"BRANCH"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
```

Substitute from earlier steps:

- **COVERAGE_PCT**: coverage percentage from Step 7 diagram (integer, or -1 if undetermined)
- **PLAN_TOTAL**: total plan items extracted in Step 8 (0 if no plan file)
- **PLAN_DONE**: count of DONE + CHANGED items from Step 8 (0 if no plan file)
- **VERIFY_RESULT**: "pass", "fail", or "skipped" from Step 8.1
- **VERSION**: from the VERSION file
- **BRANCH**: current branch name
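
The substitution can be made concrete with `printf`, which keeps the quoting readable; all values below are hypothetical stand-ins for the Step 7/8/8.1 results:

```shell
# Hypothetical values standing in for the Step 7/8/8.1 results.
COVERAGE_PCT=87; PLAN_TOTAL=5; PLAN_DONE=4
VERIFY_RESULT="pass"; VERSION="1.6.4.0"; BRANCH="feat/csv-export"
TS="2025-01-01T12:00:00Z"   # real step: $(date -u +%Y-%m-%dT%H:%M:%SZ)

# One JSONL record; in the real step this is appended to the reviews file.
printf '{"skill":"ship","timestamp":"%s","coverage_pct":%d,"plan_items_total":%d,"plan_items_done":%d,"verification_result":"%s","version":"%s","branch":"%s"}\n' \
  "$TS" "$COVERAGE_PCT" "$PLAN_TOTAL" "$PLAN_DONE" "$VERIFY_RESULT" "$VERSION" "$BRANCH"
```

Numeric fields go through `%d` so the record stays valid JSON even if a variable is accidentally empty-quoted elsewhere.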

This step is automatic — never skip it, never ask for confirmation.

---

## Important Rules

- **Never skip tests.** If tests fail, stop.
- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
- **Never force push.** Use regular `git push` only.
- **Never ask for trivial confirmations** (e.g., "ready to push?", "create PR?"). DO stop for: version bumps (MINOR/MAJOR), pre-landing review findings (ASK items), and Codex structured review [P1] findings (large diffs only).
- **Always use the 4-digit version format** from the VERSION file.
- **Date format in CHANGELOG:** `YYYY-MM-DD`
- **Split commits for bisectability** — each commit = one logical change.
- **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done.
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies.
- **Never push without fresh verification evidence.** If code changed after Step 5 tests, re-run before pushing.
- **Step 7 generates coverage tests.** They must pass before committing. Never commit failing tests.
- **The goal is: user says `/ship`, next thing they see is the review + PR URL + auto-synced docs.**