* feat: add {{BASE_BRANCH_DETECT}} resolver to gen-skill-docs
DRY placeholder for dynamic base branch detection across PR-targeting
skills. Detects via gh pr view (existing PR base) → gh repo view
(repo default) → fallback to main.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: ship skill detects base branch instead of hardcoding main
Replaces ~14 hardcoded 'main' references with dynamic detection via
{{BASE_BRANCH_DETECT}}. Fixes stacked branches and Conductor workspaces
targeting non-main branches. Adds --base <base> to gh pr create.
* fix: review, qa, plan-ceo-review detect base branch dynamically
Same pattern as ship: replaces hardcoded 'main' with {{BASE_BRANCH_DETECT}}.
Also cleans up qa bash-isms (REPORT_DIR variable, port chaining).
* fix: retro detects default branch instead of hardcoding origin/main
Retro queries commit history (not PR targets), so uses simpler detection:
gh repo view defaultBranchRef. Replaces ~11 origin/main refs with
origin/<default>.
* docs: add explicit cross-step references in gstack-upgrade template
Bash blocks are self-contained, but cross-block variable references
(INSTALL_DIR from Step 2) were implicit. Adds prose making them explicit.
* docs+test: SKILL authoring guidance + regression tests
Adds "Writing SKILL templates" section to CLAUDE.md explaining that
templates are prompts, not scripts. Adds validation test catching
hardcoded 'main' in git commands, and resolver content test.
* docs: update ARCHITECTURE + CONTRIBUTING for new placeholders
Add {{BASE_BRANCH_DETECT}} to ARCHITECTURE.md placeholder list.
Cross-reference CLAUDE.md template authoring guidance from CONTRIBUTING.md.
* chore: bump version and changelog (v0.3.10)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add missing blank line between resolver functions
* test: add 3 E2E smoke tests for base branch detection
- /review: verifies Step 0 detection + git diff against detected base
- /ship: truncated dry-run (Steps 0-1 only, no push/PR), asserts no
destructive actions
- /retro: verifies default branch detection for git log queries
Covers the {{BASE_BRANCH_DETECT}} resolver path (review), the ship
template's dual abort check, and retro's inline detection pattern.
* chore: bump version and changelog (v0.4.2)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributing to gstack
Thanks for wanting to make gstack better. Whether you're fixing a typo in a skill prompt or building an entirely new workflow, this guide will get you up and running fast.
Quick start
gstack skills are Markdown files that Claude Code discovers from a skills/ directory. Normally they live at ~/.claude/skills/gstack/ (your global install). But when you're developing gstack itself, you want Claude Code to use the skills in your working tree — so edits take effect instantly without copying or deploying anything.
That's what dev mode does. It symlinks your repo into the local .claude/skills/ directory so Claude Code reads skills straight from your checkout.
git clone <repo> && cd gstack
bun install # install dependencies
bin/dev-setup # activate dev mode
Now edit any SKILL.md, invoke it in Claude Code (e.g. /review), and see your changes live. When you're done developing:
bin/dev-teardown # deactivate — back to your global install
Contributor mode
Contributor mode is for people who want to fix gstack when it annoys them. Enable it
and Claude Code will automatically log issues to ~/.gstack/contributor-logs/ as you
work — what you were doing, what went wrong, repro steps, raw output.
~/.claude/skills/gstack/bin/gstack-config set gstack_contributor true
The logs are for you. When something bugs you enough to fix, the report is already written. Fork gstack, symlink your fork into the project where you hit the issue, fix it, and open a PR.
The contributor workflow
- Hit friction while using gstack — contributor mode logs it automatically
- Check your logs:
  ls ~/.gstack/contributor-logs/
- Fork and clone gstack (if you haven't already)
- Symlink your fork into the project where you hit the bug:
  # In your core project (the one where gstack annoyed you)
  ln -sfn /path/to/your/gstack-fork .claude/skills/gstack
  cd .claude/skills/gstack && bun install && bun run build
- Fix the issue; your changes are live immediately in this project
- Test by actually using gstack — do the thing that annoyed you, verify it's fixed
- Open a PR from your fork
This is the best way to contribute: fix gstack while doing your real work, in the project where you actually felt the pain.
Working on gstack inside the gstack repo
When you're editing gstack skills and want to test them by actually using gstack
in the same repo, bin/dev-setup wires this up. It creates .claude/skills/
symlinks (gitignored) pointing back to your working tree, so Claude Code uses
your local edits instead of the global install.
gstack/ <- your working tree
├── .claude/skills/ <- created by dev-setup (gitignored)
│ ├── gstack -> ../../ <- symlink back to repo root
│ ├── review -> gstack/review
│ ├── ship -> gstack/ship
│ └── ... <- one symlink per skill
├── review/
│ └── SKILL.md <- edit this, test with /review
├── ship/
│ └── SKILL.md
├── browse/
│ ├── src/ <- TypeScript source
│ └── dist/ <- compiled binary (gitignored)
└── ...
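The layout above can be approximated by hand. The following is a minimal sketch, not the actual contents of bin/dev-setup (which this guide does not show). The function name link_skills and the rule "every top-level directory containing a SKILL.md is a skill" are assumptions made for illustration.

```shell
# Hypothetical helper: recreate the .claude/skills/ layout shown above.
# Assumes a skill is any top-level directory that contains a SKILL.md.
link_skills() {
  repo_root=$1
  skills_dir="$repo_root/.claude/skills"
  mkdir -p "$skills_dir"

  # gstack -> ../../ (relative, so it resolves back to the repo root)
  ln -sfn ../../ "$skills_dir/gstack"

  # One symlink per skill, routed through the gstack link.
  for skill_md in "$repo_root"/*/SKILL.md; do
    [ -e "$skill_md" ] || continue   # glob matched nothing
    skill=$(basename "$(dirname "$skill_md")")
    ln -sfn "gstack/$skill" "$skills_dir/$skill"
  done
}
```

Calling link_skills with the repo root as its only argument produces the gstack link plus one link per skill directory. Relative targets keep the links valid if the checkout is moved.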
Day-to-day workflow
# 1. Enter dev mode
bin/dev-setup
# 2. Edit a skill
vim review/SKILL.md
# 3. Test it in Claude Code — changes are live
# > /review
# 4. Editing browse source? Rebuild the binary
bun run build
# 5. Done for the day? Tear down
bin/dev-teardown
Testing & evals
Setup
# 1. Copy .env.example and add your API key
cp .env.example .env
# Edit .env → set ANTHROPIC_API_KEY=sk-ant-...
# 2. Install deps (if you haven't already)
bun install
Bun auto-loads .env — no extra config. Conductor workspaces inherit .env from the main worktree automatically (see "Conductor workspaces" below).
Test tiers
| Tier | Command | Cost | What it tests |
|---|---|---|---|
| 1 - Static | bun test | Free | Command validation, snapshot flags, SKILL.md correctness, TODOS-format.md refs, observability unit tests |
| 2 - E2E | bun run test:e2e | ~$3.85 | Full skill execution via claude -p subprocess |
| 3 - LLM eval | bun run test:evals | ~$0.15 standalone | LLM-as-judge scoring of generated SKILL.md docs |
| 2+3 | bun run test:evals | ~$4 combined | E2E + LLM-as-judge (runs both) |
bun test # Tier 1 only (runs on every commit, <5s)
bun run test:e2e # Tier 2: E2E only (needs EVALS=1, can't run inside Claude Code)
bun run test:evals # Tier 2 + 3 combined (~$4/run)
Tier 1: Static validation (free)
Runs automatically with bun test. No API keys needed.
- Skill parser tests (test/skill-parser.test.ts): Extracts every $B command from SKILL.md bash code blocks and validates it against the command registry in browse/src/commands.ts. Catches typos, removed commands, and invalid snapshot flags.
- Skill validation tests (test/skill-validation.test.ts): Validates that SKILL.md files reference only real commands and flags, and that command descriptions meet quality thresholds.
- Generator tests (test/gen-skill-docs.test.ts): Tests the template system by verifying that placeholders resolve correctly and that output includes value hints for flags (e.g. -d <N>, not just -d) and enriched descriptions for key commands (e.g. the is command lists valid states, the press command lists key examples).
Tier 2: E2E via claude -p (~$3.85/run)
Spawns claude -p as a subprocess with --output-format stream-json --verbose, streams NDJSON for real-time progress, and scans for browse errors. This is the closest thing to "does this skill actually work end-to-end?"
# Must run from a plain terminal — can't nest inside Claude Code or Conductor
EVALS=1 bun test test/skill-e2e.test.ts
- Gated by EVALS=1 env var (prevents accidental expensive runs)
- Auto-skips if running inside Claude Code (claude -p can't nest)
- API connectivity pre-check: fails fast on ConnectionRefused before burning budget
- Real-time progress to stderr: [Ns] turn T tool #C: Name(...)
- Saves full NDJSON transcripts and failure JSON for debugging
- Tests live in test/skill-e2e.test.ts, runner logic in test/helpers/session-runner.ts
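The first two guards in the list above amount to a pre-flight check. A minimal sketch follows. It is not the logic in test/skill-e2e.test.ts, the function name e2e_gate is hypothetical, and recognizing a Claude Code session through a CLAUDECODE environment variable is an assumption.

```shell
# Hypothetical pre-flight gate mirroring the first two bullets above.
# Prints the reason and returns non-zero when the E2E run should be skipped.
e2e_gate() {
  if [ "${EVALS:-}" != "1" ]; then
    echo "skip: EVALS=1 not set"
    return 1
  fi
  # Assumption: a Claude Code session exports CLAUDECODE.
  if [ -n "${CLAUDECODE:-}" ]; then
    echo "skip: claude -p cannot nest inside Claude Code"
    return 1
  fi
  echo "ok: running E2E"
}
```

With EVALS=1 exported in a plain terminal, e2e_gate succeeds and the test command can follow it with &&. In any other situation it prints why the run was skipped.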
E2E observability
When E2E tests run, they produce machine-readable artifacts in ~/.gstack-dev/:
| Artifact | Path | Purpose |
|---|---|---|
| Heartbeat | e2e-live.json | Current test status (updated per tool call) |
| Partial results | evals/_partial-e2e.json | Completed tests (survives kills) |
| Progress log | e2e-runs/{runId}/progress.log | Append-only text log |
| NDJSON transcripts | e2e-runs/{runId}/{test}.ndjson | Raw claude -p output per test |
| Failure JSON | e2e-runs/{runId}/{test}-failure.json | Diagnostic data on failure |
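Because every run writes to its own e2e-runs/{runId}/ directory, finding the most recent run is a common first step when debugging. A small sketch, assuming the layout in the table above and that directory modification time is a good enough proxy for recency. latest_run_dir is a hypothetical helper, not part of gstack.

```shell
# Hypothetical helper: print the most recently modified run directory
# under <base>/e2e-runs/, where <base> defaults to ~/.gstack-dev.
latest_run_dir() {
  base=${1:-"$HOME/.gstack-dev"}
  newest=
  for dir in "$base"/e2e-runs/*/; do
    [ -d "$dir" ] || continue        # glob matched nothing
    if [ -z "$newest" ] || [ "$dir" -nt "$newest" ]; then
      newest=$dir
    fi
  done
  # Strip the trailing slash left by the glob; fail if no run exists.
  [ -n "$newest" ] && printf '%s\n' "${newest%/}"
}
```

For example, tail -n 10 "$(latest_run_dir)/progress.log" would show the end of the newest progress log. The -nt comparison is not strictly POSIX but is available in bash, dash, and zsh.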
Live dashboard: Run bun run eval:watch in a second terminal to see a live dashboard showing completed tests, the currently running test, and cost. Use --tail to also show the last 10 lines of progress.log.
Eval history tools:
bun run eval:list # list all eval runs (turns, duration, cost per run)
bun run eval:compare # compare two runs — shows per-test deltas + Takeaway commentary
bun run eval:summary # aggregate stats + per-test efficiency averages across runs
Eval comparison commentary: eval:compare generates natural-language Takeaway sections interpreting what changed between runs — flagging regressions, noting improvements, calling out efficiency gains (fewer turns, faster, cheaper), and producing an overall summary. This is driven by generateCommentary() in eval-store.ts.
Artifacts are never cleaned up — they accumulate in ~/.gstack-dev/ for post-mortem debugging and trend analysis.
Tier 3: LLM-as-judge (~$0.15/run)
Uses Claude Sonnet to score generated SKILL.md docs on three dimensions:
- Clarity — Can an AI agent understand the instructions without ambiguity?
- Completeness — Are all commands, flags, and usage patterns documented?
- Actionability — Can the agent execute tasks using only the information in the doc?
Each dimension is scored 1-5. Threshold: every dimension must score ≥ 4. There's also a regression test that compares generated docs against the hand-maintained baseline from origin/main — generated must score equal or higher.
# Needs ANTHROPIC_API_KEY in .env — included in bun run test:evals
- Uses claude-sonnet-4-6 for scoring stability
- Tests live in test/skill-llm-eval.test.ts
- Calls the Anthropic API directly (not claude -p), so it works from anywhere including inside Claude Code
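The pass rule for this tier is simple enough to state as code. A sketch under the stated threshold (every dimension at least 4 on the 1-5 scale). passes_threshold is a hypothetical helper, and the real checks live in test/skill-llm-eval.test.ts.

```shell
# Hypothetical check: succeed only if every score passed as an argument
# is an integer >= 4 (clarity, completeness, actionability).
passes_threshold() {
  [ "$#" -gt 0 ] || return 1
  for score in "$@"; do
    # A non-numeric score makes the comparison fail, which counts as a miss.
    [ "$score" -ge 4 ] 2>/dev/null || return 1
  done
  return 0
}
```

So passes_threshold 5 4 4 succeeds, while passes_threshold 5 3 4 fails because one dimension falls below the bar even though the average does not.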
CI
A GitHub Action (.github/workflows/skill-docs.yml) runs bun run gen:skill-docs --dry-run on every push and PR. If the generated SKILL.md files differ from what's committed, CI fails. This catches stale docs before they merge.
Tests run against the browse binary directly — they don't require dev mode.
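The CI rule boils down to regenerating the docs and comparing them with what is committed. The sketch below shows that comparison for a single pair of files. It is not the workflow in .github/workflows/skill-docs.yml, and check_fresh along with its two path arguments is hypothetical.

```shell
# Hypothetical freshness check: compare a committed file with a freshly
# generated copy and fail (non-zero) when they differ.
check_fresh() {
  committed=$1
  generated=$2
  if cmp -s "$committed" "$generated"; then
    echo "fresh: $committed"
  else
    echo "stale: $committed (regenerate with bun run gen:skill-docs)"
    return 1
  fi
}
```

Locally, a similar effect can likely be had by running bun run gen:skill-docs and then git diff --exit-code, which exits non-zero when the working tree differs from what is committed.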
Editing SKILL.md files
SKILL.md files are generated from .tmpl templates. Don't edit the .md directly — your changes will be overwritten on the next build.
# 1. Edit the template
vim SKILL.md.tmpl # or browse/SKILL.md.tmpl
# 2. Regenerate
bun run gen:skill-docs
# 3. Check health
bun run skill:check
# Or use watch mode — auto-regenerates on save
bun run dev:skill
For template authoring best practices (natural language over bash-isms, dynamic branch detection, {{BASE_BRANCH_DETECT}} usage), see CLAUDE.md's "Writing SKILL templates" section.
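For orientation, the detection order behind {{BASE_BRANCH_DETECT}} (existing PR base, then the repo default branch, then main) can be written as a shell fallback chain. This is a sketch of the idea only. Templates are prompts, so the resolver's real output is not a script like this, and the GH override variable is a hypothetical hook added here so the gh binary can be stubbed.

```shell
# Sketch of the detection order: PR base -> repo default branch -> main.
# GH is a hypothetical override so the gh binary can be stubbed in tests.
detect_base_branch() {
  gh_bin=${GH:-gh}
  # 1. Base branch of an existing PR for the current branch, if any.
  branch=$("$gh_bin" pr view --json baseRefName -q .baseRefName 2>/dev/null) || branch=
  # 2. Otherwise the repository's default branch.
  if [ -z "$branch" ]; then
    branch=$("$gh_bin" repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null) || branch=
  fi
  # 3. Otherwise fall back to main.
  printf '%s\n' "${branch:-main}"
}
```

A caller could then pass the result to gh pr create --base "$(detect_base_branch)". When gh is missing or unauthenticated, both lookups fail quietly and the function prints main.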
To add a browse command, add it to browse/src/commands.ts. To add a snapshot flag, add it to SNAPSHOT_FLAGS in browse/src/snapshot.ts. Then rebuild.
Conductor workspaces
If you're using Conductor to run multiple Claude Code sessions in parallel, conductor.json wires up workspace lifecycle automatically:
| Hook | Script | What it does |
|---|---|---|
| setup | bin/dev-setup | Copies .env from main worktree, installs deps, symlinks skills |
| archive | bin/dev-teardown | Removes skill symlinks, cleans up .claude/ directory |
When Conductor creates a new workspace, bin/dev-setup runs automatically. It detects the main worktree (via git worktree list), copies your .env so API keys carry over, and sets up dev mode — no manual steps needed.
First-time setup: Put your ANTHROPIC_API_KEY in .env in the main repo (see .env.example). Every Conductor workspace inherits it automatically.
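Locating the main worktree can be done from git worktree list alone, since git lists the main worktree first. The following is a sketch of that step and the .env copy, not the actual bin/dev-setup logic. Both function names are hypothetical, and refusing to overwrite an existing .env is an assumption.

```shell
# Hypothetical helper: read `git worktree list --porcelain` on stdin and
# print the first worktree path (git lists the main worktree first).
main_worktree_path() {
  # "worktree " is 9 characters, so the path starts at column 10.
  awk '/^worktree /{ print substr($0, 10); exit }'
}

# Hypothetical helper: copy .env from the main worktree into the current
# directory. Assumption: an existing .env is never overwritten.
inherit_env() {
  main=$(git worktree list --porcelain | main_worktree_path) || return 1
  if [ -f "$main/.env" ] && [ ! -e .env ]; then
    cp "$main/.env" .env
  fi
}
```

Run from inside a secondary worktree, inherit_env copies the key file once and leaves later edits to the local .env alone.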
Things to know
- SKILL.md files are generated. Edit the .tmpl template, not the .md. Run bun run gen:skill-docs to regenerate.
- TODOS.md is the unified backlog. Organized by skill/component with P0-P4 priorities. /ship auto-detects completed items. All planning/review/retro skills read it for context.
- Browse source changes need a rebuild. If you touch browse/src/*.ts, run bun run build.
- Dev mode shadows your global install. Project-local skills take priority over ~/.claude/skills/gstack. bin/dev-teardown restores the global one.
- Conductor workspaces are independent. Each workspace is its own git worktree. bin/dev-setup runs automatically via conductor.json.
- .env propagates across worktrees. Set it once in the main repo, all Conductor workspaces get it.
- .claude/skills/ is gitignored. The symlinks never get committed.
Testing your changes in a real project
This is the recommended way to develop gstack. Symlink your gstack checkout into the project where you actually use it, so your changes are live while you do real work:
# In your core project
ln -sfn /path/to/your/gstack-checkout .claude/skills/gstack
cd .claude/skills/gstack && bun install && bun run build
Now every gstack skill invocation in this project uses your working tree. Edit a
template, run bun run gen:skill-docs, and the next /review or /qa call picks
it up immediately.
To go back to the stable global install, just remove the symlink:
rm .claude/skills/gstack
Claude Code falls back to ~/.claude/skills/gstack/ automatically.
Alternative: point your global install at a branch
If you don't want per-project symlinks, you can switch the global install:
cd ~/.claude/skills/gstack
git fetch origin
git checkout origin/<branch>
bun install && bun run build
This affects all projects. To revert: git checkout main && git pull && bun run build.
Shipping your changes
When you're happy with your skill edits:
/ship
This runs tests, reviews the diff, triages Greptile comments (with 2-tier escalation), manages TODOS.md, bumps the version, and opens a PR. See ship/SKILL.md for the full workflow.