mirror of https://github.com/garrytan/gstack.git synced 2026-05-01 19:25:10 +02:00


Garry Tan 07b4e15b34 feat: v0.3.2 — project-local state, diff-aware QA, Greptile integration (#36 )

* fix: cookie import picker returns JSON instead of HTML

jsonResponse() was defined at module scope but referenced `url` which
only existed as a parameter of handleCookiePickerRoute(). Every API call
crashed, the catch block also crashed, and Bun returned a default HTML
page that the frontend couldn't parse as JSON.

Thread port via corsOrigin() helper and options objects. Add route-level
tests to prevent this class of bug from shipping again.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add help command to browse server

Agents that don't have SKILL.md loaded (or misread flags) had no way to
self-discover the CLI. The help command returns a formatted reference of
all commands and snapshot flags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: version-aware find-browse with META signal protocol

Agents in other workspaces found stale browse binaries that were missing
newer flags. find-browse now compares the local binary's git SHA against
origin/main via git ls-remote (4hr cache), and emits META:UPDATE_AVAILABLE
when behind. SKILL.md setup checks parse META signals and prompt the user
to update.

- New compiled binary: browse/dist/find-browse (TypeScript, testable)
- Bash shim at browse/bin/find-browse delegates to compiled binary
- .version file written at build time with git commit SHA
- Build script compiles both browse and find-browse binaries
- Graceful degradation: offline, missing .version, corrupt cache all skip check

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: clean up .bun-build temp files after compile

bun build --compile leaves ~58MB temp files in the working directory.
Add rm -f .*.bun-build to the build script to clean up after each build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make help command reachable by removing it from META_COMMANDS

help was in META_COMMANDS, so it dispatched to handleMetaCommand() which
threw "Unknown meta command: help". Removing it from the set lets the
dedicated else-if handler in handleCommand() execute correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: bump version and changelog (v0.3.2)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add shared Greptile comment triage reference doc

Shared reference for fetching, filtering, and classifying Greptile
review comments on GitHub PRs. Used by both /review and /ship skills.
Includes parallel API fetching, suppressions check, classification
logic, reply APIs, and history file writes.

* feat: make /review and /ship Greptile-aware

/review: Step 2.5 fetches and classifies Greptile comments, Step 5
resolves them with AskUserQuestion for valid issues and false positives.

/ship: Step 3.75 triages Greptile comments between pre-landing review
and version bump. Adds Greptile Review section to PR body in Step 8.
Re-runs tests if any Greptile fixes are applied.

* feat: add Greptile batting average to /retro

Reads ~/.gstack/greptile-history.md, computes signal ratio
(valid catches vs false positives), includes in metrics table,
JSON snapshot, and Code Quality Signals narrative.

* docs: add Greptile integration section to README

Personal endorsement, two-layer review narrative, full UX walkthrough
transcript, skills table updates. Add Greptile training feedback loop
to TODO.md future ideas.

* feat: add local dev mode for testing skills from within the repo

bin/dev-setup creates .claude/skills/gstack symlink to the working tree
so Claude Code discovers skills locally. bin/dev-teardown cleans up.
DEVELOPING_GSTACK.md documents the workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: narrow gitignore to .claude/skills/ instead of all .claude/

Avoids ignoring legitimate Claude Code config like settings.json or CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: rename DEVELOPING_GSTACK.md to CONTRIBUTING.md

Rewritten as a contributor-friendly guide instead of a dry plan doc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: explain why dev-setup is needed in CONTRIBUTING.md quick start

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add browser interaction guidance to CLAUDE.md

Prevents Claude from using mcp__claude-in-chrome__* tools instead of /browse.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add shared config module for project-local browse state

Centralizes path resolution (git root detection, state dir, log paths) into
config.ts. Both cli.ts and server.ts import from it, eliminating duplicated
PORT_OFFSET/BROWSE_PORT/STATE_FILE logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: rewrite port selection to use random ports

Replace CONDUCTOR_PORT magic offset and 9400-9409 scan with random port
10000-60000. Atomic state file writes, log paths from config module,
binaryVersion field for auto-restart on update.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: move browse state from /tmp to project-local .gstack/

CLI now uses config module for state paths, passes BROWSE_STATE_FILE to
spawned server. Adds version mismatch auto-restart, legacy /tmp cleanup
with PID verification, and removes stale global install fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update crash log path reference to .gstack/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add config tests and update CLI lifecycle test

14 new tests for config resolution, ensureStateDir, readVersionHash,
resolveServerScript, and version mismatch detection. Remove obsolete
CONDUCTOR_PORT/BROWSE_PORT filtering from commands.test.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update BROWSER.md and TODO.md for project-local state

Replace /tmp paths with .gstack/, remove CONDUCTOR_PORT docs, document
random port selection and per-project isolation. Add server bundling TODO.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update README, CHANGELOG, and CONTRIBUTING for v0.3.2

- README: replace Conductor-aware language with project-local isolation,
  add Greptile setup note
- CHANGELOG: comprehensive v0.3.2 entry with all state management changes
- CONTRIBUTING: add instructions for testing branches in other repos

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add diff-aware mode to /qa — auto-tests affected pages from branch diff

When on a feature branch, /qa now reads git diff main, identifies affected
pages/routes from changed files, and tests them automatically. No URL required.
The most natural flow: write code, /ship, /qa.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: update CHANGELOG for complete v0.3.2 coverage

Add missing entries: diff-aware QA mode, Greptile integration,
local dev mode, crash log path fix, README/SKILL.md updates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-13 18:10:56 -07:00

14 KiB


name: ship
version: 1.0.0
description: Ship workflow: merge main, run tests, review diff, bump VERSION, update CHANGELOG, commit, push, create PR.
allowed-tools: Bash, Read, Write, Edit, Grep, Glob, AskUserQuestion

Ship: Fully Automated Ship Workflow

You are running the /ship workflow. This is a non-interactive, fully automated workflow. Do NOT ask for confirmation at any step other than the stop conditions listed below. The user said /ship, which means DO IT. Run straight through and output the PR URL at the end.

Only stop for:

On main branch (abort)
Merge conflicts that can't be auto-resolved (stop, show conflicts)
Test failures (stop, show failures)
Pre-landing review finds CRITICAL issues and user chooses to fix (not acknowledge or skip)
MINOR or MAJOR version bump needed (ask — see Step 4)
Greptile review comments that need user decision (complex fixes, false positives)

Never stop for:

Uncommitted changes (always include them)
Version bump choice (auto-pick MICRO or PATCH — see Step 4)
CHANGELOG content (auto-generate from diff)
Commit message approval (auto-commit)
Multi-file changesets (auto-split into bisectable commits)

Step 1: Pre-flight

Check the current branch. If on main, abort: "You're on main. Ship from a feature branch."
Run git status (never use -uall). Uncommitted changes are always included — no need to ask.
Run git diff main...HEAD --stat and git log main..HEAD --oneline to understand what's being shipped.
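The branch check in this step can be sketched as a small shell function. This is a minimal illustration, not part of the skill itself: `check_ship_branch` is a hypothetical name, and the branch is passed in as an argument so the logic can be exercised outside a repository.

```shell
# Hypothetical helper: decide whether /ship may proceed from the given branch.
# Returns 0 when shipping is allowed, 1 (with the abort message) on main.
check_ship_branch() {
  branch="$1"
  if [ "$branch" = "main" ]; then
    echo "You're on main. Ship from a feature branch."
    return 1
  fi
  return 0
}

# Possible usage inside a repository:
#   check_ship_branch "$(git rev-parse --abbrev-ref HEAD)" || exit 1
```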

Step 2: Merge origin/main (BEFORE tests)

Fetch and merge origin/main into the feature branch so tests run against the merged state:

git fetch origin main && git merge origin/main --no-edit

If there are merge conflicts: Try to auto-resolve if they are simple (VERSION, schema.rb, CHANGELOG ordering). If conflicts are complex or ambiguous, STOP and show them.

If already up to date: Continue silently.

Step 3: Run tests (on merged code)

Do NOT run RAILS_ENV=test bin/rails db:migrate — bin/test-lane already calls db:test:prepare internally, which loads the schema into the correct lane database. Running bare test migrations without INSTANCE hits an orphan DB and corrupts structure.sql.

Run both test suites in parallel:

bin/test-lane 2>&1 | tee /tmp/ship_tests.txt &
npm run test 2>&1 | tee /tmp/ship_vitest.txt &
wait

After both complete, read the output files and check pass/fail.

If any test fails: Show the failures and STOP. Do not proceed.

If all pass: Continue silently — just note the counts briefly.
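Because the commands above pipe through tee and share a single `wait`, pass/fail has to be read from the output files. If exit statuses are wanted as well, one possible variant (an assumption, not what the skill prescribes) redirects each suite to its log and waits on each PID separately. `run_parallel` is a hypothetical helper name.

```shell
# Hypothetical helper: run two commands in parallel, log each to its own file,
# and succeed only if both exit with status 0.
# Arguments: <command A> <log A> <command B> <log B>
run_parallel() {
  cmd_a="$1"; log_a="$2"; cmd_b="$3"; log_b="$4"

  sh -c "$cmd_a" >"$log_a" 2>&1 &
  pid_a=$!
  sh -c "$cmd_b" >"$log_b" 2>&1 &
  pid_b=$!

  # Waiting on each PID returns that job's own exit status.
  status_a=0; status_b=0
  wait "$pid_a" || status_a=$?
  wait "$pid_b" || status_b=$?

  echo "first=$status_a second=$status_b"
  [ "$status_a" -eq 0 ] && [ "$status_b" -eq 0 ]
}

# Possible usage with the commands from this step:
#   run_parallel "bin/test-lane" /tmp/ship_tests.txt "npm run test" /tmp/ship_vitest.txt
```

The trade-off is that output is no longer echoed live to the terminal; the log files still need to be read to report failures and counts.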

Step 3.25: Eval Suites (conditional)

Evals are mandatory when prompt-related files change. Skip this step entirely if no prompt files are in the diff.

1. Check if the diff touches prompt-related files:

git diff origin/main --name-only

Match against these patterns (from CLAUDE.md):

app/services/*_prompt_builder.rb
app/services/*_generation_service.rb, *_writer_service.rb, *_designer_service.rb
app/services/*_evaluator.rb, *_scorer.rb, *_classifier_service.rb, *_analyzer.rb
app/services/concerns/*voice*.rb, *writing*.rb, *prompt*.rb, *token*.rb
app/services/chat_tools/*.rb, app/services/x_thread_tools/*.rb
config/system_prompts/*.txt
test/evals/**/* (eval infrastructure changes affect all suites)

If no matches: Print "No prompt-related files changed — skipping evals." and continue to Step 3.5.
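The pattern check could be sketched as a shell `case` filter over the changed file names. This is illustrative only: the function names are hypothetical, and it assumes that the bare patterns on each line above (for example `*_writer_service.rb`) share the directory of the first pattern on that line.

```shell
# Hypothetical helper: succeed when a path matches one of the prompt-related
# patterns. Assumption: bare patterns inherit the directory of the first
# pattern on their line. Caveat: in a case pattern, * also matches "/",
# so files in nested directories match as well.
is_prompt_file() {
  case "$1" in
    app/services/*_prompt_builder.rb) return 0 ;;
    app/services/*_generation_service.rb | app/services/*_writer_service.rb | app/services/*_designer_service.rb) return 0 ;;
    app/services/*_evaluator.rb | app/services/*_scorer.rb | app/services/*_classifier_service.rb | app/services/*_analyzer.rb) return 0 ;;
    app/services/concerns/*voice*.rb | app/services/concerns/*writing*.rb | app/services/concerns/*prompt*.rb | app/services/concerns/*token*.rb) return 0 ;;
    app/services/chat_tools/*.rb | app/services/x_thread_tools/*.rb) return 0 ;;
    config/system_prompts/*.txt) return 0 ;;
    test/evals/*) return 0 ;;
  esac
  return 1
}

# Read file names on stdin and print only the prompt-related ones.
filter_prompt_files() {
  while IFS= read -r changed; do
    if is_prompt_file "$changed"; then
      printf '%s\n' "$changed"
    fi
  done
}

# Possible usage:
#   git diff origin/main --name-only | filter_prompt_files
```

Empty output from the filter corresponds to the "No prompt-related files changed" case.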

2. Identify affected eval suites:

Each eval runner (test/evals/*_eval_runner.rb) declares PROMPT_SOURCE_FILES listing which source files affect it. Grep these to find which suites match the changed files:

grep -l "changed_file_basename" test/evals/*_eval_runner.rb

Map runner → test file: post_generation_eval_runner.rb → post_generation_eval_test.rb.
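The runner-to-test mapping is a suffix rewrite, which might be sketched as follows. `runner_to_test_file` is a hypothetical name, and the sketch assumes every runner follows the `*_eval_runner.rb` naming shown above.

```shell
# Hypothetical helper: map an eval runner path to its eval test file by
# rewriting the "_eval_runner.rb" suffix to "_eval_test.rb".
runner_to_test_file() {
  printf '%s\n' "$1" | sed 's/_eval_runner\.rb$/_eval_test.rb/'
}

# Possible usage, combined with the grep above:
#   grep -l "changed_file_basename" test/evals/*_eval_runner.rb |
#     while IFS= read -r runner; do runner_to_test_file "$runner"; done
```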

Special cases:

Changes to test/evals/judges/*.rb, test/evals/support/*.rb, or test/evals/fixtures/ affect ALL suites that use those judges/support files. Check imports in the eval test files to determine which.
Changes to config/system_prompts/*.txt — grep eval runners for the prompt filename to find affected suites.
If unsure which suites are affected, run ALL suites that could plausibly be impacted. Over-testing is better than missing a regression.

3. Run affected suites at EVAL_JUDGE_TIER=full:

/ship is a pre-merge gate, so always use full tier (Sonnet structural + Opus persona judges).

EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/<suite>_eval_test.rb 2>&1 | tee /tmp/ship_evals.txt

If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.
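The sequential, stop-on-first-failure behavior could look like the loop below. It is a sketch under assumptions: `run_suites_sequentially` is a hypothetical helper, and the runner is passed in as a command or function name so the real eval command can be substituted.

```shell
# Hypothetical helper: run each suite through the given runner, one at a
# time, and stop at the first failure so later suites are never started.
# Arguments: <runner command or function> <suite>...
run_suites_sequentially() {
  runner="$1"
  shift
  for suite in "$@"; do
    if ! "$runner" "$suite"; then
      echo "FAILED: $suite (remaining suites skipped)"
      return 1
    fi
    echo "PASSED: $suite"
  done
  return 0
}

# Possible usage, wrapping the command from this step in a function:
#   run_full_eval() {
#     EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval "$1" 2>&1 | tee -a /tmp/ship_evals.txt
#   }
#   run_suites_sequentially run_full_eval test/evals/<suite>_eval_test.rb
```

Note that a pipeline's status is the status of its last command, so a wrapper that pipes through tee would need to recover the eval's own exit status (or parse the log) for the early stop to trigger.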

4. Check results:

If any eval fails: Show the failures, the cost dashboard, and STOP. Do not proceed.
If all pass: Note pass counts and cost. Continue to Step 3.5.

5. Save eval output — include eval results and cost dashboard in the PR body (Step 8).

Tier reference (for context — /ship always uses full):

Tier	When	Speed (cached)	Cost
`fast` (Haiku)	Dev iteration, smoke tests	~5s (14x faster)	~$0.07/run
`standard` (Sonnet)	Default dev, `bin/test-lane --eval`	~17s (4x faster)	~$0.37/run
`full` (Opus persona)	`/ship` and pre-merge	~72s (baseline)	~$1.27/run

Step 3.5: Pre-Landing Review

Review the diff for structural issues that tests don't catch.

Read .claude/skills/review/checklist.md. If the file cannot be read, STOP and report the error.
Run git diff origin/main to get the full diff (scoped to feature changes against the freshly-fetched remote main).
Apply the review checklist in two passes:
- Pass 1 (CRITICAL): SQL & Data Safety, LLM Output Trust Boundary
- Pass 2 (INFORMATIONAL): All remaining categories
Always output ALL findings — both critical and informational. The user must see every issue found.
Output a summary header: Pre-Landing Review: N issues (X critical, Y informational)
If CRITICAL issues found: For EACH critical issue, use a separate AskUserQuestion with:
- The problem (file:line + description)
- Your recommended fix
- Options: A) Fix it now (recommended), B) Acknowledge and ship anyway, C) It's a false positive (skip)
After resolving all critical issues: if the user chose A (fix) on any issue, apply the recommended fixes, then commit only the fixed files by name (git add <fixed-files> && git commit -m "fix: apply pre-landing review fixes"), then STOP and tell the user to run /ship again to re-test with the fixes applied. If the user chose only B (acknowledge) or C (false positive) on all issues, continue to Step 3.75.
If only non-critical issues found: Output them and continue. They will be included in the PR body at Step 8.
If no issues found: Output Pre-Landing Review: No issues found. and continue.

Save the review output — it goes into the PR body in Step 8.

Step 3.75: Address Greptile review comments (if PR exists)

Read .claude/skills/review/greptile-triage.md and follow the fetch, filter, and classify steps.

If no PR exists, gh fails, API returns an error, or there are zero Greptile comments: Skip this step silently. Continue to Step 4.

If Greptile comments are found:

Include a Greptile summary in your output: + N Greptile comments (X valid, Y fixed, Z FP)

For each classified comment:

VALID & ACTIONABLE: Use AskUserQuestion with:

The comment (file:line or [top-level] + body summary + permalink URL)
Your recommended fix
Options: A) Fix now (recommended), B) Acknowledge and ship anyway, C) It's a false positive
If user chooses A: apply the fix, commit the fixed files (git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"), reply to the comment ("Fixed in <commit-sha>."), and save to ~/.gstack/greptile-history.md (type: fix).
If user chooses C: reply explaining the false positive, save to history (type: fp).

VALID BUT ALREADY FIXED: Reply acknowledging the catch — no AskUserQuestion needed:

Post reply: "Good catch — already fixed in <commit-sha>."
Save to ~/.gstack/greptile-history.md (type: already-fixed)

FALSE POSITIVE: Use AskUserQuestion:

Show the comment and why you think it's wrong (file:line or [top-level] + body summary + permalink URL)
Options:
- A) Reply to Greptile explaining the false positive (recommended if clearly wrong)
- B) Fix it anyway (if trivial)
- C) Ignore silently
If user chooses A: post reply using the appropriate API from the triage doc, save to history (type: fp)

SUPPRESSED: Skip silently — these are known false positives from previous triage.

After all comments are resolved: If any fixes were applied, the tests from Step 3 are now stale. Re-run tests (Step 3) before continuing to Step 4. If no fixes were applied, continue to Step 4.

Step 4: Version bump (auto-decide)

Read the current VERSION file (4-digit format: MAJOR.MINOR.PATCH.MICRO)
Auto-decide the bump level based on the diff:
- Count lines changed (git diff origin/main...HEAD --stat | tail -1)
- MICRO (4th digit): < 50 lines changed, trivial tweaks, typos, config
- PATCH (3rd digit): 50+ lines changed, bug fixes, small-medium features
- MINOR (2nd digit): ASK the user — only for major features or significant architectural changes
- MAJOR (1st digit): ASK the user — only for milestones or breaking changes
Compute the new version:
- Bumping a digit resets all digits to its right to 0
- Example: 0.19.1.0 + PATCH → 0.19.2.0
Write the new version to the VERSION file.
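The bump arithmetic can be sketched in shell as below. All three function names are hypothetical. The 50-line threshold is taken from the list above, but the automatic choice here looks only at line count, whereas the step also weighs the nature of the change, and MINOR/MAJOR always require asking the user.

```shell
# Sum insertions and deletions from a "git diff --stat" summary line on stdin,
# e.g. " 3 files changed, 40 insertions(+), 5 deletions(-)".
lines_changed() {
  awk '{
    total = 0
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^insertion/ || $i ~ /^deletion/) total += $(i - 1)
    }
    print total
  }'
}

# Simplified auto-decision: line count only (MICRO below 50, PATCH otherwise).
auto_bump_level() {
  if [ "$1" -lt 50 ]; then echo MICRO; else echo PATCH; fi
}

# Bump MAJOR.MINOR.PATCH.MICRO at the given level, resetting everything to
# the right of the bumped part to 0.
bump_version() {
  version="$1"
  level="$2"
  old_ifs="$IFS"
  IFS=.
  # Unquoted on purpose so the shell splits the version on dots.
  set -- $version
  IFS="$old_ifs"
  if [ "$#" -ne 4 ]; then
    echo "expected MAJOR.MINOR.PATCH.MICRO, got: $version" >&2
    return 1
  fi
  major="$1"; minor="$2"; patch="$3"; micro="$4"
  case "$level" in
    MAJOR) major=$((major + 1)); minor=0; patch=0; micro=0 ;;
    MINOR) minor=$((minor + 1)); patch=0; micro=0 ;;
    PATCH) patch=$((patch + 1)); micro=0 ;;
    MICRO) micro=$((micro + 1)) ;;
    *) echo "unknown bump level: $level" >&2; return 1 ;;
  esac
  echo "$major.$minor.$patch.$micro"
}

# Possible usage:
#   count=$(git diff origin/main...HEAD --stat | tail -1 | lines_changed)
#   bump_version "$(cat VERSION)" "$(auto_bump_level "$count")" > VERSION.new && mv VERSION.new VERSION
```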

Step 5: CHANGELOG (auto-generate)

Read CHANGELOG.md header to know the format.
Auto-generate the entry from ALL commits on the branch (not just recent ones):
- Use git log origin/main..HEAD --oneline to see every commit being shipped
- Use git diff origin/main...HEAD to see the full diff against main
- The CHANGELOG entry must be comprehensive of ALL changes going into the PR
- If existing CHANGELOG entries on the branch already cover some commits, replace them with one unified entry for the new version
- Categorize changes into applicable sections:
  - ### Added — new features
  - ### Changed — changes to existing functionality
  - ### Fixed — bug fixes
  - ### Removed — removed features
- Write concise, descriptive bullet points
- Insert after the file header (line 5), dated today
- Format: ## [X.Y.Z.W] - YYYY-MM-DD

Do NOT ask the user to describe changes. Infer from the diff and commit history.

Step 6: Commit (bisectable chunks)

Goal: Create small, logical commits that work well with git bisect and help LLMs understand what changed.

Analyze the diff and group changes into logical commits. Each commit should represent one coherent change — not one file, but one logical unit.
Commit ordering (earlier commits first):
- Infrastructure: migrations, config changes, route additions
- Models & services: new models, services, concerns (with their tests)
- Controllers & views: controllers, views, JS/React components (with their tests)
- VERSION + CHANGELOG: always in the final commit
Rules for splitting:
- A model and its test file go in the same commit
- A service and its test file go in the same commit
- A controller, its views, and its test go in the same commit
- Migrations are their own commit (or grouped with the model they support)
- Config/route changes can group with the feature they enable
- If the total diff is small (< 50 lines across < 4 files), a single commit is fine
Each commit must be independently valid — no broken imports, no references to code that doesn't exist yet. Order commits so dependencies come first.
Compose each commit message:
- First line: <type>: <summary> (type = feat/fix/chore/refactor/docs)
- Body: brief description of what this commit contains
- Only the final commit (VERSION + CHANGELOG) gets the version tag and co-author trailer:

git commit -m "$(cat <<'EOF'
chore: bump version and changelog (vX.Y.Z.W)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EOF
)"

Step 7: Push

Push to the remote with upstream tracking:

git push -u origin <branch-name>

Step 8: Create PR

Create a pull request using gh:

gh pr create --title "<type>: <summary>" --body "$(cat <<'EOF'
## Summary
<bullet points from CHANGELOG>

## Pre-Landing Review
<findings from Step 3.5, or "No issues found.">

## Eval Results
<If evals ran: suite names, pass/fail counts, cost dashboard summary. If skipped: "No prompt-related files changed — evals skipped.">

## Greptile Review
<If Greptile comments were found: bullet list with [FIXED] / [FALSE POSITIVE] / [ALREADY FIXED] tag + one-line summary per comment>
<If no Greptile comments found: "No Greptile comments.">
<If no PR existed during Step 3.75: omit this section entirely>

## Test plan
- [x] All Rails tests pass (N runs, 0 failures)
- [x] All Vitest tests pass (N tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"

Output the PR URL — this should be the final output the user sees.

Important Rules

Never skip tests. If tests fail, stop.
Never skip the pre-landing review. If checklist.md is unreadable, stop.
Never force push. Use regular git push only.
Never ask for confirmation except for MINOR/MAJOR version bumps, CRITICAL review findings (one AskUserQuestion per critical issue with fix recommendation), and Greptile comments that need a user decision.
Always use the 4-digit version format from the VERSION file.
Date format in CHANGELOG: YYYY-MM-DD
Split commits for bisectability — each commit = one logical change.
The goal is: user says /ship, next thing they see is the review + PR URL.
