* feat: add bin/gstack-config CLI for reading/writing ~/.gstack/config.yaml Simple get/set/list interface for persistent gstack configuration. Used by update-check and upgrade skill for auto_upgrade and update_check settings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: smart update check with 12h cache, snooze backoff, config disable - Reduce cache TTL from 24h to 12h for faster update detection - Add exponential snooze backoff: 24h → 48h → 1 week (resets on new version) - Add update_check: false config option to disable checks entirely - Clear snooze file on just-upgraded - 14 new tests covering snooze levels, expiry, corruption, and config paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: upgrade skill with auto-upgrade, 4-option prompt, vendored sync - Auto-upgrade mode via config or GSTACK_AUTO_UPGRADE=1 env var - 4-option AskUserQuestion: upgrade once, always, not now, never - Step 4.5: sync local vendored copy after upgrading primary install - Snooze write with escalating backoff on "Not now" - Update preamble text in gen-skill-docs for new upgrade flow - Regenerate all SKILL.md files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: simplify upgrade instructions, move auto-upgrade to completed README now points to /gstack-upgrade instead of long paste commands. Auto-upgrade TODO moved to Completed section (v0.3.8). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version and changelog (v0.3.9) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
19 KiB
name, version, description, allowed-tools
| name | version | description | allowed-tools | |||||||
|---|---|---|---|---|---|---|---|---|---|---|
| ship | 1.0.0 | Ship workflow: merge main, run tests, review diff, bump VERSION, update CHANGELOG, commit, push, create PR. |
|
Update Check (run first)
_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
If output shows UPGRADE_AVAILABLE <old> <new>: read ~/.claude/skills/gstack/gstack-upgrade/SKILL.md and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If JUST_UPGRADED <from> <to>: tell user "Running gstack v{to} (just updated!)" and continue.
Ship: Fully Automated Ship Workflow
You are running the /ship workflow. This is a non-interactive, fully automated workflow. Do NOT ask for confirmation at any step. The user said /ship which means DO IT. Run straight through and output the PR URL at the end.
Only stop for:
- On
mainbranch (abort) - Merge conflicts that can't be auto-resolved (stop, show conflicts)
- Test failures (stop, show failures)
- Pre-landing review finds CRITICAL issues and user chooses to fix (not acknowledge or skip)
- MINOR or MAJOR version bump needed (ask — see Step 4)
- Greptile review comments that need user decision (complex fixes, false positives)
- TODOS.md missing and user wants to create one (ask — see Step 5.5)
- TODOS.md disorganized and user wants to reorganize (ask — see Step 5.5)
Never stop for:
- Uncommitted changes (always include them)
- Version bump choice (auto-pick MICRO or PATCH — see Step 4)
- CHANGELOG content (auto-generate from diff)
- Commit message approval (auto-commit)
- Multi-file changesets (auto-split into bisectable commits)
- TODOS.md completed-item detection (auto-mark)
Step 1: Pre-flight
-
Check the current branch. If on
main, abort: "You're on main. Ship from a feature branch." -
Run
git status(never use-uall). Uncommitted changes are always included — no need to ask. -
Run
git diff main...HEAD --statandgit log main..HEAD --onelineto understand what's being shipped.
Step 2: Merge origin/main (BEFORE tests)
Fetch and merge origin/main into the feature branch so tests run against the merged state:
git fetch origin main && git merge origin/main --no-edit
If there are merge conflicts: Try to auto-resolve if they are simple (VERSION, schema.rb, CHANGELOG ordering). If conflicts are complex or ambiguous, STOP and show them.
If already up to date: Continue silently.
Step 3: Run tests (on merged code)
Do NOT run RAILS_ENV=test bin/rails db:migrate — bin/test-lane already calls
db:test:prepare internally, which loads the schema into the correct lane database.
Running bare test migrations without INSTANCE hits an orphan DB and corrupts structure.sql.
Run both test suites in parallel:
bin/test-lane 2>&1 | tee /tmp/ship_tests.txt &
npm run test 2>&1 | tee /tmp/ship_vitest.txt &
wait
After both complete, read the output files and check pass/fail.
If any test fails: Show the failures and STOP. Do not proceed.
If all pass: Continue silently — just note the counts briefly.
Step 3.25: Eval Suites (conditional)
Evals are mandatory when prompt-related files change. Skip this step entirely if no prompt files are in the diff.
1. Check if the diff touches prompt-related files:
git diff origin/main --name-only
Match against these patterns (from CLAUDE.md):
app/services/*_prompt_builder.rbapp/services/*_generation_service.rb,*_writer_service.rb,*_designer_service.rbapp/services/*_evaluator.rb,*_scorer.rb,*_classifier_service.rb,*_analyzer.rbapp/services/concerns/*voice*.rb,*writing*.rb,*prompt*.rb,*token*.rbapp/services/chat_tools/*.rb,app/services/x_thread_tools/*.rbconfig/system_prompts/*.txttest/evals/**/*(eval infrastructure changes affect all suites)
If no matches: Print "No prompt-related files changed — skipping evals." and continue to Step 3.5.
2. Identify affected eval suites:
Each eval runner (test/evals/*_eval_runner.rb) declares PROMPT_SOURCE_FILES listing which source files affect it. Grep these to find which suites match the changed files:
grep -l "changed_file_basename" test/evals/*_eval_runner.rb
Map runner → test file: post_generation_eval_runner.rb → post_generation_eval_test.rb.
Special cases:
- Changes to
test/evals/judges/*.rb,test/evals/support/*.rb, ortest/evals/fixtures/affect ALL suites that use those judges/support files. Check imports in the eval test files to determine which. - Changes to
config/system_prompts/*.txt— grep eval runners for the prompt filename to find affected suites. - If unsure which suites are affected, run ALL suites that could plausibly be impacted. Over-testing is better than missing a regression.
3. Run affected suites at EVAL_JUDGE_TIER=full:
/ship is a pre-merge gate, so always use full tier (Sonnet structural + Opus persona judges).
EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/<suite>_eval_test.rb 2>&1 | tee /tmp/ship_evals.txt
If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.
4. Check results:
- If any eval fails: Show the failures, the cost dashboard, and STOP. Do not proceed.
- If all pass: Note pass counts and cost. Continue to Step 3.5.
5. Save eval output — include eval results and cost dashboard in the PR body (Step 8).
Tier reference (for context — /ship always uses full):
| Tier | When | Speed (cached) | Cost |
|---|---|---|---|
fast (Haiku) |
Dev iteration, smoke tests | ~5s (14x faster) | ~$0.07/run |
standard (Sonnet) |
Default dev, bin/test-lane --eval |
~17s (4x faster) | ~$0.37/run |
full (Opus persona) |
/ship and pre-merge |
~72s (baseline) | ~$1.27/run |
Step 3.5: Pre-Landing Review
Review the diff for structural issues that tests don't catch.
-
Read
.claude/skills/review/checklist.md. If the file cannot be read, STOP and report the error. -
Run
git diff origin/mainto get the full diff (scoped to feature changes against the freshly-fetched remote main). -
Apply the review checklist in two passes:
- Pass 1 (CRITICAL): SQL & Data Safety, LLM Output Trust Boundary
- Pass 2 (INFORMATIONAL): All remaining categories
-
Always output ALL findings — both critical and informational. The user must see every issue found.
-
Output a summary header:
Pre-Landing Review: N issues (X critical, Y informational) -
If CRITICAL issues found: For EACH critical issue, use a separate AskUserQuestion with:
- The problem (
file:line+ description) - Your recommended fix
- Options: A) Fix it now (recommend), B) Acknowledge and ship anyway, C) It's a false positive — skip
After resolving all critical issues: if the user chose A (fix) on any issue, apply the recommended fixes, then commit only the fixed files by name (
git add <fixed-files> && git commit -m "fix: apply pre-landing review fixes"), then STOP and tell the user to run/shipagain to re-test with the fixes applied. If the user chose only B (acknowledge) or C (false positive) on all issues, continue with Step 4.
- The problem (
-
If only non-critical issues found: Output them and continue. They will be included in the PR body at Step 8.
-
If no issues found: Output
Pre-Landing Review: No issues found.and continue.
Save the review output — it goes into the PR body in Step 8.
Step 3.75: Address Greptile review comments (if PR exists)
Read .claude/skills/review/greptile-triage.md and follow the fetch, filter, classify, and escalation detection steps.
If no PR exists, gh fails, API returns an error, or there are zero Greptile comments: Skip this step silently. Continue to Step 4.
If Greptile comments are found:
Include a Greptile summary in your output: + N Greptile comments (X valid, Y fixed, Z FP)
Before replying to any comment, run the Escalation Detection algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.
For each classified comment:
VALID & ACTIONABLE: Use AskUserQuestion with:
- The comment (file:line or [top-level] + body summary + permalink URL)
- Your recommended fix
- Options: A) Fix now (recommended), B) Acknowledge and ship anyway, C) It's a false positive
- If user chooses A: apply the fix, commit the fixed files (
git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"), reply using the Fix reply template from greptile-triage.md (include inline diff + explanation), and save to both per-project and global greptile-history (type: fix). - If user chooses C: reply using the False Positive reply template from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp).
VALID BUT ALREADY FIXED: Reply using the Already Fixed reply template from greptile-triage.md — no AskUserQuestion needed:
- Include what was done and the fixing commit SHA
- Save to both per-project and global greptile-history (type: already-fixed)
FALSE POSITIVE: Use AskUserQuestion:
- Show the comment and why you think it's wrong (file:line or [top-level] + body summary + permalink URL)
- Options:
- A) Reply to Greptile explaining the false positive (recommended if clearly wrong)
- B) Fix it anyway (if trivial)
- C) Ignore silently
- If user chooses A: reply using the False Positive reply template from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp)
SUPPRESSED: Skip silently — these are known false positives from previous triage.
After all comments are resolved: If any fixes were applied, the tests from Step 3 are now stale. Re-run tests (Step 3) before continuing to Step 4. If no fixes were applied, continue to Step 4.
Step 4: Version bump (auto-decide)
-
Read the current
VERSIONfile (4-digit format:MAJOR.MINOR.PATCH.MICRO) -
Auto-decide the bump level based on the diff:
- Count lines changed (
git diff origin/main...HEAD --stat | tail -1) - MICRO (4th digit): < 50 lines changed, trivial tweaks, typos, config
- PATCH (3rd digit): 50+ lines changed, bug fixes, small-medium features
- MINOR (2nd digit): ASK the user — only for major features or significant architectural changes
- MAJOR (1st digit): ASK the user — only for milestones or breaking changes
- Count lines changed (
-
Compute the new version:
- Bumping a digit resets all digits to its right to 0
- Example:
0.19.1.0+ PATCH →0.19.2.0
-
Write the new version to the
VERSIONfile.
Step 5: CHANGELOG (auto-generate)
-
Read
CHANGELOG.mdheader to know the format. -
Auto-generate the entry from ALL commits on the branch (not just recent ones):
- Use
git log main..HEAD --onelineto see every commit being shipped - Use
git diff main...HEADto see the full diff against main - The CHANGELOG entry must be comprehensive of ALL changes going into the PR
- If existing CHANGELOG entries on the branch already cover some commits, replace them with one unified entry for the new version
- Categorize changes into applicable sections:
### Added— new features### Changed— changes to existing functionality### Fixed— bug fixes### Removed— removed features
- Write concise, descriptive bullet points
- Insert after the file header (line 5), dated today
- Format:
## [X.Y.Z.W] - YYYY-MM-DD
- Use
Do NOT ask the user to describe changes. Infer from the diff and commit history.
Step 5.5: TODOS.md (auto-update)
Cross-reference the project's TODOS.md against the changes being shipped. Mark completed items automatically; prompt only if the file is missing or disorganized.
Read .claude/skills/review/TODOS-format.md for the canonical format reference.
1. Check if TODOS.md exists in the repository root.
If TODOS.md does not exist: Use AskUserQuestion:
- Message: "GStack recommends maintaining a TODOS.md organized by skill/component, then priority (P0 at top through P4, then Completed at bottom). See TODOS-format.md for the full format. Would you like to create one?"
- Options: A) Create it now, B) Skip for now
- If A: Create
TODOS.mdwith a skeleton (# TODOS heading + ## Completed section). Continue to step 3. - If B: Skip the rest of Step 5.5. Continue to Step 6.
2. Check structure and organization:
Read TODOS.md and verify it follows the recommended structure:
- Items grouped under
## <Skill/Component>headings - Each item has
**Priority:**field with P0-P4 value - A
## Completedsection at the bottom
If disorganized (missing priority fields, no component groupings, no Completed section): Use AskUserQuestion:
- Message: "TODOS.md doesn't follow the recommended structure (skill/component groupings, P0-P4 priority, Completed section). Would you like to reorganize it?"
- Options: A) Reorganize now (recommended), B) Leave as-is
- If A: Reorganize in-place following TODOS-format.md. Preserve all content — only restructure, never delete items.
- If B: Continue to step 3 without restructuring.
3. Detect completed TODOs:
This step is fully automatic — no user interaction.
Use the diff and commit history already gathered in earlier steps:
git diff main...HEAD(full diff against main)git log main..HEAD --oneline(all commits being shipped)
For each TODO item, check if the changes in this PR complete it by:
- Matching commit messages against the TODO title and description
- Checking if files referenced in the TODO appear in the diff
- Checking if the TODO's described work matches the functional changes
Be conservative: Only mark a TODO as completed if there is clear evidence in the diff. If uncertain, leave it alone.
4. Move completed items to the ## Completed section at the bottom. Append: **Completed:** vX.Y.Z (YYYY-MM-DD)
5. Output summary:
TODOS.md: N items marked complete (item1, item2, ...). M items remaining.- Or:
TODOS.md: No completed items detected. M items remaining. - Or:
TODOS.md: Created./TODOS.md: Reorganized.
6. Defensive: If TODOS.md cannot be written (permission error, disk full), warn the user and continue. Never stop the ship workflow for a TODOS failure.
Save this summary — it goes into the PR body in Step 8.
Step 6: Commit (bisectable chunks)
Goal: Create small, logical commits that work well with git bisect and help LLMs understand what changed.
-
Analyze the diff and group changes into logical commits. Each commit should represent one coherent change — not one file, but one logical unit.
-
Commit ordering (earlier commits first):
- Infrastructure: migrations, config changes, route additions
- Models & services: new models, services, concerns (with their tests)
- Controllers & views: controllers, views, JS/React components (with their tests)
- VERSION + CHANGELOG + TODOS.md: always in the final commit
-
Rules for splitting:
- A model and its test file go in the same commit
- A service and its test file go in the same commit
- A controller, its views, and its test go in the same commit
- Migrations are their own commit (or grouped with the model they support)
- Config/route changes can group with the feature they enable
- If the total diff is small (< 50 lines across < 4 files), a single commit is fine
-
Each commit must be independently valid — no broken imports, no references to code that doesn't exist yet. Order commits so dependencies come first.
-
Compose each commit message:
- First line:
<type>: <summary>(type = feat/fix/chore/refactor/docs) - Body: brief description of what this commit contains
- Only the final commit (VERSION + CHANGELOG) gets the version tag and co-author trailer:
- First line:
git commit -m "$(cat <<'EOF'
chore: bump version and changelog (vX.Y.Z.W)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EOF
)"
Step 7: Push
Push to the remote with upstream tracking:
git push -u origin <branch-name>
Step 8: Create PR
Create a pull request using gh:
gh pr create --title "<type>: <summary>" --body "$(cat <<'EOF'
## Summary
<bullet points from CHANGELOG>
## Pre-Landing Review
<findings from Step 3.5, or "No issues found.">
## Eval Results
<If evals ran: suite names, pass/fail counts, cost dashboard summary. If skipped: "No prompt-related files changed — evals skipped.">
## Greptile Review
<If Greptile comments were found: bullet list with [FIXED] / [FALSE POSITIVE] / [ALREADY FIXED] tag + one-line summary per comment>
<If no Greptile comments found: "No Greptile comments.">
<If no PR existed during Step 3.75: omit this section entirely>
## TODOS
<If items marked complete: bullet list of completed items with version>
<If no items completed: "No TODO items completed in this PR.">
<If TODOS.md created or reorganized: note that>
<If TODOS.md doesn't exist and user skipped: omit this section>
## Test plan
- [x] All Rails tests pass (N runs, 0 failures)
- [x] All Vitest tests pass (N tests)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"
Output the PR URL — this should be the final output the user sees.
Important Rules
- Never skip tests. If tests fail, stop.
- Never skip the pre-landing review. If checklist.md is unreadable, stop.
- Never force push. Use regular
git pushonly. - Never ask for confirmation except for MINOR/MAJOR version bumps and CRITICAL review findings (one AskUserQuestion per critical issue with fix recommendation).
- Always use the 4-digit version format from the VERSION file.
- Date format in CHANGELOG:
YYYY-MM-DD - Split commits for bisectability — each commit = one logical change.
- TODOS.md completion detection must be conservative. Only mark items as completed when the diff clearly shows the work is done.
- Use Greptile reply templates from greptile-triage.md. Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies.
- The goal is: user says
/ship, next thing they see is the review + PR URL.