# Contributing to gstack

Thanks for wanting to make gstack better. Whether you're fixing a typo in a skill prompt or building an entirely new workflow, this guide will get you up and running fast.

## Quick start

gstack skills are Markdown files that Claude Code discovers from a `skills/` directory. Normally they live at `~/.claude/skills/gstack/` (your global install). But when you're developing gstack itself, you want Claude Code to use the skills *in your working tree*, so edits take effect instantly without copying or deploying anything.

That's what dev mode does. It symlinks your repo into the local `.claude/skills/` directory so Claude Code reads skills straight from your checkout.

```bash
git clone <repo> && cd gstack
bun install        # install dependencies
bin/dev-setup      # activate dev mode
```

Now edit any `SKILL.md`, invoke it in Claude Code (e.g. `/review`), and see your changes live. When you're done developing:

```bash
bin/dev-teardown   # deactivate dev mode and go back to your global install
```

## How dev mode works

`bin/dev-setup` creates a `.claude/skills/` directory inside the repo (gitignored) and fills it with symlinks pointing back to your working tree. Claude Code checks the project-local `.claude/skills/` before your global `~/.claude/skills/`, so your edits take priority over the global install.

```
gstack/                  <- your working tree
├── .claude/skills/      <- created by dev-setup (gitignored)
│   ├── gstack -> ../../ <- symlink back to repo root
│   ├── review -> gstack/review
│   ├── ship -> gstack/ship
│   └── ...              <- one symlink per skill
├── review/
│   └── SKILL.md         <- edit this, test with /review
├── ship/
│   └── SKILL.md
├── browse/
│   ├── src/             <- TypeScript source
│   └── dist/            <- compiled binary (gitignored)
└── ...
```

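To make the layout above concrete, here is a minimal sketch of how such symlinks could be created. It is not the real `bin/dev-setup`, which also copies `.env` and installs dependencies. The function name `link_skills` is hypothetical, and the sketch assumes a skill is any top-level directory that contains a `SKILL.md`.

```bash
# Hypothetical sketch of the dev-mode symlink layout; not the real bin/dev-setup,
# which also copies .env and installs dependencies.
link_skills() {
  local repo_root="$1"
  local skills_dir="$repo_root/.claude/skills"
  local dir name

  mkdir -p "$skills_dir"

  # .claude/skills/gstack -> ../../ (relative link back to the repo root)
  ln -sfn ../../ "$skills_dir/gstack"

  # One link per skill, assuming a skill is a top-level directory with a SKILL.md.
  # The glob skips hidden directories, so .claude itself is never linked.
  for dir in "$repo_root"/*/; do
    name="$(basename "$dir")"
    if [ -f "$dir/SKILL.md" ]; then
      ln -sfn "gstack/$name" "$skills_dir/$name"
    fi
  done
}
```

Running `link_skills "$PWD"` from the repo root would produce entries like the ones in the tree. Because the targets (`../../` and `gstack/<skill>`) are relative, the links stay valid even if the whole checkout is moved.
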
## Day-to-day workflow

```bash
# 1. Enter dev mode
bin/dev-setup

# 2. Edit a skill
vim review/SKILL.md

# 3. Test it in Claude Code; changes are live
# > /review

# 4. Editing browse source? Rebuild the binary
bun run build

# 5. Done for the day? Tear down
bin/dev-teardown
```

## Testing & evals

### Setup

```bash
# 1. Copy .env.example and add your API key
cp .env.example .env
# Edit .env → set ANTHROPIC_API_KEY=sk-ant-...

# 2. Install deps (if you haven't already)
bun install
```

Bun loads `.env` automatically, so no extra config is needed. Conductor workspaces inherit `.env` from the main worktree automatically (see "Conductor workspaces" below).

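To confirm the key is in place without echoing it to your terminal, you can use a small check like the following. It assumes the key sits on a plain, unquoted `ANTHROPIC_API_KEY=...` line and starts with `sk-ant-`, as the placeholder above suggests. The function name `has_api_key` is hypothetical.

```bash
# Hypothetical helper: succeed if an env file defines ANTHROPIC_API_KEY with a
# value that starts with "sk-ant-", without ever printing the key.
# Assumes an unquoted KEY=value line (an optional leading "export " is allowed).
has_api_key() {
  local env_file="${1:-.env}"
  [ -f "$env_file" ] || return 1
  # -q keeps grep silent, so the key never reaches your terminal or CI logs
  grep -Eq '^(export )?ANTHROPIC_API_KEY=sk-ant-.+' "$env_file"
}
```

For example, run `has_api_key || echo "ANTHROPIC_API_KEY missing from .env"` before `bun run test:eval`.
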
### Test tiers

| Tier | Command | Cost | What it tests |
|------|---------|------|---------------|
| 1: Static | `bun test` | Free | Command validation, snapshot flags, SKILL.md correctness |
| 2: E2E | `bun run test:e2e` | ~$0.50/run | Full skill execution via Agent SDK |
| 3: LLM eval | `bun run test:eval` | ~$0.03/run | Doc quality scoring via LLM-as-judge |

```bash
bun test            # Tier 1 only (runs on every commit, <5s)
bun run test:eval   # Tier 3: LLM-as-judge (needs ANTHROPIC_API_KEY in .env)
bun run test:e2e    # Tier 2: E2E (needs SKILL_E2E=1, can't run inside Claude Code)
bun run test:all    # Tier 1 + Tier 2
```

### Tier 1: Static validation (free)

Runs automatically with `bun test`. No API keys needed.

- **Skill parser tests** (`test/skill-parser.test.ts`): Extracts every `$B` command from SKILL.md bash code blocks and validates each one against the command registry in `browse/src/commands.ts`. Catches typos, removed commands, and invalid snapshot flags.
- **Skill validation tests** (`test/skill-validation.test.ts`): Validates that SKILL.md files reference only real commands and flags, and that command descriptions meet quality thresholds.
- **Generator tests** (`test/gen-skill-docs.test.ts`): Tests the template system. Verifies that placeholders resolve correctly, that flags include value hints (e.g. `-d <N>`, not just `-d`), and that key commands keep their enriched descriptions (e.g. `is` lists valid states, `press` lists example keys).

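The idea behind the skill parser tests can be approximated in shell. This is a simplified, hypothetical sketch, not the TypeScript parser the tests actually use. It assumes commands appear as `$B <command> ...` at the start of a line inside `bash` fenced blocks, and it checks them against an allowlist you pass in rather than the real registry in `browse/src/commands.ts`. The function names are made up.

```bash
# Hypothetical sketch of the parser test's idea, not the real test helper:
# list the $B commands used in bash fences and flag any missing from an allowlist.

extract_b_commands() {
  # Print the word after "$B" on lines inside ```bash fences.
  awk '
    /^[ \t]*```bash[ \t]*$/          { in_bash = 1; next }
    /^[ \t]*```/                     { in_bash = 0; next }
    in_bash && $1 == "$B" && NF >= 2 { print $2 }
  ' "$1"
}

check_skill_file() {
  # Usage: check_skill_file FILE KNOWN_COMMAND...
  local file="$1"; shift
  local known=" $* " cmd status=0
  for cmd in $(extract_b_commands "$file"); do
    case "$known" in
      *" $cmd "*) ;;  # known command
      *) echo "$file: unknown \$B command: $cmd" >&2; status=1 ;;
    esac
  done
  return "$status"
}
```

For example, `check_skill_file browse/SKILL.md goto snapshot click` prints each unknown command to stderr and returns non-zero if it finds any. Unlike the real tests, this sketch does not validate snapshot flags.
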
### Tier 2: E2E via Agent SDK (~$0.50/run)

Spawns a real Claude Code session, invokes `/qa` or `/browse`, and scans tool results for errors. This is the closest thing to "does this skill actually work end-to-end?"

```bash
# Must run from a plain terminal; it can't nest inside Claude Code or Conductor
SKILL_E2E=1 bun test test/skill-e2e.test.ts
```

- Gated by the `SKILL_E2E=1` env var, which prevents accidental expensive runs
- Auto-skips if it detects it's running inside Claude Code (the Agent SDK can't nest sessions)
- Saves full conversation transcripts on failure for debugging
- Tests live in `test/skill-e2e.test.ts`, runner logic in `test/helpers/session-runner.ts`

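The two guards above can be pictured as a small pre-flight check. This sketch is hypothetical: the actual gating lives in the E2E test files and may work differently. Detecting Claude Code through a `CLAUDECODE` environment variable is also an assumption about what Claude Code exports to the commands it runs.

```bash
# Hypothetical sketch of the E2E gate; the real logic lives in the test files.
# Assumption: Claude Code exports CLAUDECODE to the processes it spawns.
should_run_e2e() {
  # Opt-in only, so a plain `bun test` never triggers paid runs
  if [ "${SKILL_E2E:-}" != "1" ]; then
    echo "skip: set SKILL_E2E=1 to run E2E tests" >&2
    return 1
  fi
  # The Agent SDK can't start a session from inside another Claude Code session
  if [ -n "${CLAUDECODE:-}" ]; then
    echo "skip: running inside Claude Code; use a plain terminal" >&2
    return 1
  fi
  return 0
}
```

With `SKILL_E2E=1` exported, `should_run_e2e && bun test test/skill-e2e.test.ts` would run the suite only when both checks pass.
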
### Tier 3: LLM-as-judge (~$0.03/run)

Uses Claude Haiku to score generated SKILL.md docs on three dimensions:

- **Clarity**: Can an AI agent understand the instructions without ambiguity?
- **Completeness**: Are all commands, flags, and usage patterns documented?
- **Actionability**: Can the agent execute tasks using only the information in the doc?

Each dimension is scored from 1 to 5, and every dimension must score **≥ 4** to pass. A separate regression test compares the generated docs against the hand-maintained baseline from `origin/main`: the generated version must score equal or higher.

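The pass criteria above reduce to integer comparisons. The following is a hypothetical restatement for illustration, not code from `test/skill-llm-eval.test.ts`. It assumes the judge returns whole-number scores and that the regression check compares scores one dimension at a time, which the text above does not specify.

```bash
# Hypothetical restatement of the Tier 3 pass rule; not the real eval code.

# Usage: passes_threshold CLARITY COMPLETENESS ACTIONABILITY (integers, 1-5)
passes_threshold() {
  local score
  for score in "$@"; do
    [ "$score" -ge 4 ] || return 1   # every dimension must score >= 4
  done
}

# Usage: no_regression GENERATED_SCORE BASELINE_SCORE
# Assumption: called once per dimension.
no_regression() {
  [ "$1" -ge "$2" ]   # generated must score equal to or higher than baseline
}
```
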
```bash
# Needs ANTHROPIC_API_KEY in .env
bun run test:eval
```

- Uses `claude-haiku-4-5` for cost efficiency
- Tests live in `test/skill-llm-eval.test.ts`
- Calls the Anthropic API directly (not through the Agent SDK), so it works anywhere, including inside Claude Code

### CI

A GitHub Actions workflow (`.github/workflows/skill-docs.yml`) runs `bun run gen:skill-docs --dry-run` on every push and PR. If the generated SKILL.md files differ from what's committed, CI fails, which catches stale docs before they merge.

Tests run against the browse binary directly; they don't require dev mode.

## Editing SKILL.md files

SKILL.md files that have a `.tmpl` template next to them (such as `SKILL.md.tmpl` and `browse/SKILL.md.tmpl`) are **generated**. Don't edit those `.md` files directly, because your changes will be overwritten on the next build.

```bash
# 1. Edit the template
vim SKILL.md.tmpl            # or browse/SKILL.md.tmpl

# 2. Regenerate
bun run gen:skill-docs

# 3. Check health
bun run skill:check

# Or use watch mode, which regenerates on save
bun run dev:skill
```

To add a browse command, add it to `browse/src/commands.ts`. To add a snapshot flag, add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts`. Then rebuild (`bun run build`) and regenerate the docs (`bun run gen:skill-docs`).

## Conductor workspaces

If you're using [Conductor](https://conductor.build) to run multiple Claude Code sessions in parallel, `conductor.json` wires up workspace lifecycle automatically:

| Hook | Script | What it does |
|------|--------|--------------|
| `setup` | `bin/dev-setup` | Copies `.env` from the main worktree, installs deps, symlinks skills |
| `archive` | `bin/dev-teardown` | Removes skill symlinks, cleans up the `.claude/` directory |

When Conductor creates a new workspace, `bin/dev-setup` runs automatically. It detects the main worktree (via `git worktree list`), copies your `.env` so API keys carry over, and sets up dev mode, with no manual steps needed.

**First-time setup:** Put your `ANTHROPIC_API_KEY` in `.env` in the main repo (see `.env.example`). Every Conductor workspace inherits it automatically.

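The `.env` step can be sketched in a few lines of shell. It relies on `git worktree list --porcelain`, which lists the main worktree first. The function name `copy_env_from_main` is an assumption for illustration, as is the choice to leave an existing workspace `.env` untouched. The real `bin/dev-setup` may behave differently.

```bash
# Hypothetical sketch of .env propagation; not the real bin/dev-setup.
copy_env_from_main() {
  local main here
  # `git worktree list` prints the main worktree first;
  # porcelain lines look like "worktree <path>".
  main="$(git worktree list --porcelain | sed -n 's/^worktree //p' | head -n 1)"
  here="$(git rev-parse --show-toplevel)"

  [ "$main" != "$here" ] || return 0   # already in the main worktree
  [ -f "$main/.env" ] || return 0      # nothing to copy
  [ -f "$here/.env" ] && return 0      # keep an existing workspace .env
  cp "$main/.env" "$here/.env"
}
```

Run it from inside a linked worktree (e.g. a Conductor workspace). Copying rather than symlinking means each workspace can later diverge without touching the main repo's `.env`.
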
## Things to know

- **SKILL.md files with a `.tmpl` template are generated.** Edit the template, not the `.md`. Run `bun run gen:skill-docs` to regenerate.
- **Browse source changes need a rebuild.** If you touch `browse/src/*.ts`, run `bun run build`.
- **Dev mode shadows your global install.** Project-local skills take priority over `~/.claude/skills/gstack`. `bin/dev-teardown` restores the global one.
- **Conductor workspaces are independent.** Each workspace is its own git worktree. `bin/dev-setup` runs automatically via `conductor.json`.
- **`.env` propagates across worktrees.** Set it once in the main repo, and all Conductor workspaces get it.
- **`.claude/skills/` is gitignored.** The symlinks never get committed.

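A project-local copy shadows the global install, so it can be useful to check which one a given checkout will pick up. This hypothetical helper only inspects the paths described in this guide (a dev-mode symlink, a vendored directory, or neither). It does not query Claude Code itself.

```bash
# Hypothetical helper: report which gstack copy a project would use, based only
# on the paths described in this guide.
gstack_source() {
  local project="${1:-.}"
  local local_copy="$project/.claude/skills/gstack"
  if [ -L "$local_copy" ]; then
    # Symlink created by bin/dev-setup; resolve it to show where it points
    echo "dev mode ($(cd "$local_copy" && pwd -P))"
  elif [ -d "$local_copy" ]; then
    echo "project-local copy"
  else
    echo "global install (~/.claude/skills/gstack)"
  fi
}
```
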
## Testing a branch in another repo

When you're developing gstack in one workspace and want to test your branch in a
different project (e.g. testing browse changes against your real app), there are
two cases depending on how gstack is installed in that project.

### Global install only (no `.claude/skills/gstack/` in the project)

Point your global install at the branch:

```bash
cd ~/.claude/skills/gstack
git fetch origin
git checkout origin/<branch>   # e.g. origin/v0.3.2
bun install                    # in case deps changed
bun run build                  # rebuild the binary
```

Now open Claude Code in the other project; it picks up skills from
`~/.claude/skills/` automatically. To go back to main when you're done:

```bash
cd ~/.claude/skills/gstack
git checkout main && git pull
bun run build
```

### Vendored project copy (`.claude/skills/gstack/` checked into the project)

Some projects vendor gstack by copying it into the repo (no `.git` inside the
copy). Project-local skills take priority over global, so you need to update
the vendored copy too. This is a three-step process:

1. **Update your global install to the branch** (so you have the source):
   ```bash
   cd ~/.claude/skills/gstack
   git fetch origin
   git checkout origin/<branch>   # e.g. origin/v0.3.2
   bun install && bun run build
   ```

2. **Replace the vendored copy** in the other project:
   ```bash
   cd /path/to/other-project

   # Remove old skill symlinks and vendored copy
   for s in browse plan-ceo-review plan-eng-review review ship retro qa setup-browser-cookies; do
     rm -f .claude/skills/$s
   done
   rm -rf .claude/skills/gstack

   # Copy from global install (strips .git so it stays vendored)
   cp -Rf ~/.claude/skills/gstack .claude/skills/gstack
   rm -rf .claude/skills/gstack/.git

   # Rebuild binary and re-create skill symlinks
   cd .claude/skills/gstack && ./setup
   ```

3. **Test your changes**: open Claude Code in that project and use the skills.

To revert to main later, repeat steps 1-2 with `git checkout main && git pull`
instead of `git checkout origin/<branch>`.

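The loop in step 2 hardcodes the skill names, so it can leave behind links for skills added later. Here is a hypothetical alternative that removes every such link without a list. It assumes the links follow the `<skill> -> gstack/<skill>` form shown in "How dev mode works", which is an assumption for vendored installs, whose links are created by `./setup`. It would replace only the `for` loop; the `rm -rf .claude/skills/gstack` step stays.

```bash
# Hypothetical alternative to the hardcoded loop in step 2: remove every
# symlink in the skills directory whose target starts with "gstack/".
remove_gstack_skill_links() {
  local skills_dir="${1:-.claude/skills}"
  local link
  for link in "$skills_dir"/*; do
    [ -L "$link" ] || continue          # skip real directories such as gstack/
    case "$(readlink "$link")" in
      gstack/*) rm -f "$link" ;;        # a skill link into the vendored copy
    esac
  done
}
```
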
## Shipping your changes

When you're happy with your skill edits, run `/ship` in Claude Code:

```
/ship
```

This runs tests, reviews the diff, bumps the version, and opens a PR. See `ship/SKILL.md` for the full workflow.