mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-05 21:25:27 +02:00
1e06b6a5c6
* feat: add {{BASE_BRANCH_DETECT}} resolver to gen-skill-docs
DRY placeholder for dynamic base branch detection across PR-targeting
skills. Detects via gh pr view (existing PR base) → gh repo view
(repo default) → fallback to main.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: ship skill detects base branch instead of hardcoding main
Replaces ~14 hardcoded 'main' references with dynamic detection via
{{BASE_BRANCH_DETECT}}. Fixes stacked branches and Conductor workspaces
targeting non-main branches. Adds --base <base> to gh pr create.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: review, qa, plan-ceo-review detect base branch dynamically
Same pattern as ship: replaces hardcoded 'main' with {{BASE_BRANCH_DETECT}}.
Also cleans up qa bash-isms (REPORT_DIR variable, port chaining).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: retro detects default branch instead of hardcoding origin/main
Retro queries commit history (not PR targets), so uses simpler detection:
gh repo view defaultBranchRef. Replaces ~11 origin/main refs with
origin/<default>.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add explicit cross-step references in gstack-upgrade template
Bash blocks are self-contained, but cross-block variable references
(INSTALL_DIR from Step 2) were implicit. Adds prose making them explicit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs+test: SKILL authoring guidance + regression tests
Adds "Writing SKILL templates" section to CLAUDE.md explaining that
templates are prompts, not scripts. Adds validation test catching
hardcoded 'main' in git commands, and resolver content test.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update ARCHITECTURE + CONTRIBUTING for new placeholders
Add {{BASE_BRANCH_DETECT}} to ARCHITECTURE.md placeholder list.
Cross-reference CLAUDE.md template authoring guidance from CONTRIBUTING.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.3.10)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add missing blank line between resolver functions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add 3 E2E smoke tests for base branch detection
- /review: verifies Step 0 detection + git diff against detected base
- /ship: truncated dry-run (Steps 0-1 only, no push/PR), asserts no
destructive actions
- /retro: verifies default branch detection for git log queries
Covers the {{BASE_BRANCH_DETECT}} resolver path (review), the ship
template's dual abort check, and retro's inline detection pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.4.2)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
130 lines
6.5 KiB
Markdown
# gstack development

## Commands

```bash
bun install              # install dependencies
bun test                 # run free tests (browse + snapshot + skill validation)
bun run test:evals       # run paid evals: LLM judge + E2E (~$4/run)
bun run test:e2e         # run E2E tests only (~$3.85/run)
bun run dev <cmd>        # run CLI in dev mode, e.g. bun run dev goto https://example.com
bun run build            # gen docs + compile binaries
bun run gen:skill-docs   # regenerate SKILL.md files from templates
bun run skill:check      # health dashboard for all skills
bun run dev:skill        # watch mode: auto-regen + validate on change
bun run eval:list        # list all eval runs from ~/.gstack-dev/evals/
bun run eval:compare     # compare two eval runs (auto-picks most recent)
bun run eval:summary     # aggregate stats across all eval runs
```

`test:evals` requires `ANTHROPIC_API_KEY`. E2E tests stream progress in real time
(tool-by-tool via `--output-format stream-json --verbose`). Results are persisted
to `~/.gstack-dev/evals/` with auto-comparison against the previous run.

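Because the paid evals need an API key and cost money on every run, a small guard can fail fast before anything is spent. The following is only a minimal sketch: `run_paid_evals` is a hypothetical wrapper name and is not one of gstack's scripts.

```bash
# Hypothetical wrapper (not part of gstack): refuse to start the paid
# evals when ANTHROPIC_API_KEY is missing or empty.
run_paid_evals() {
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set; not running paid evals" >&2
    return 1
  fi
  bun run test:evals
}
```
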
## Project structure

```
gstack/
├── browse/                      # Headless browser CLI (Playwright)
│   ├── src/                     # CLI + server + commands
│   │   ├── commands.ts          # Command registry (single source of truth)
│   │   └── snapshot.ts          # SNAPSHOT_FLAGS metadata array
│   ├── test/                    # Integration tests + fixtures
│   └── dist/                    # Compiled binary
├── scripts/                     # Build + DX tooling
│   ├── gen-skill-docs.ts        # Template → SKILL.md generator
│   ├── skill-check.ts           # Health dashboard
│   └── dev-skill.ts             # Watch mode
├── test/                        # Skill validation + eval tests
│   ├── helpers/                 # skill-parser.ts, session-runner.ts, llm-judge.ts, eval-store.ts
│   ├── fixtures/                # Ground truth JSON, planted-bug fixtures, eval baselines
│   ├── skill-validation.test.ts # Tier 1: static validation (free, <1s)
│   ├── gen-skill-docs.test.ts   # Tier 1: generator quality (free, <1s)
│   ├── skill-llm-eval.test.ts   # Tier 3: LLM-as-judge (~$0.15/run)
│   └── skill-e2e.test.ts        # Tier 2: E2E via claude -p (~$3.85/run)
├── qa-only/                     # /qa-only skill (report-only QA, no fixes)
├── ship/                        # Ship workflow skill
├── review/                      # PR review skill
├── plan-ceo-review/             # /plan-ceo-review skill
├── plan-eng-review/             # /plan-eng-review skill
├── retro/                       # Retrospective skill
├── setup                        # One-time setup: build binary + symlink skills
├── SKILL.md                     # Generated from SKILL.md.tmpl (don't edit directly)
├── SKILL.md.tmpl                # Template: edit this, run gen:skill-docs
└── package.json                 # Build scripts for browse
```

## SKILL.md workflow

SKILL.md files are **generated** from `.tmpl` templates. To update docs:

1. Edit the `.tmpl` file (e.g. `SKILL.md.tmpl` or `browse/SKILL.md.tmpl`)
2. Run `bun run gen:skill-docs` (or `bun run build` which does it automatically)
3. Commit both the `.tmpl` and generated `.md` files

To add a new browse command: add it to `browse/src/commands.ts` and rebuild.
To add a snapshot flag: add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts` and rebuild.

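Forgetting step 2 leaves a template and its generated file out of sync. As a rough illustration, the helper below lists templates whose `SKILL.md` is missing or older than the `.tmpl`. It is a sketch under assumptions: `stale_skill_docs` is a hypothetical name, and it compares file modification times, which is only a heuristic (a fresh checkout resets them).

```bash
# Hypothetical helper (not part of gstack): list SKILL.md.tmpl files whose
# generated SKILL.md is missing or older than the template.
stale_skill_docs() {
  local root="${1:-.}" tmpl out
  find "$root" -name 'SKILL.md.tmpl' ! -path '*/node_modules/*' | sort |
    while IFS= read -r tmpl; do
      out="${tmpl%.tmpl}"   # e.g. browse/SKILL.md.tmpl -> browse/SKILL.md
      if [ ! -f "$out" ] || [ "$tmpl" -nt "$out" ]; then
        echo "$tmpl"
      fi
    done
}
```

Any path it prints is a candidate for rerunning `bun run gen:skill-docs` before committing.
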
## Writing SKILL templates

SKILL.md.tmpl files are **prompt templates read by Claude**, not bash scripts.
Each bash code block runs in a separate shell, so variables do not persist between blocks.

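The effect can be reproduced outside Claude by running each "block" in its own bash process. This is only an emulation of the behavior described above, and `GSTACK_DEMO_BASE` is a made-up variable name used for the demo.

```bash
# Emulates two template code blocks by running each in its own bash process.
# GSTACK_DEMO_BASE is a made-up variable name used only for this demo.
first_block()  { bash -c 'GSTACK_DEMO_BASE=develop; echo "first block: $GSTACK_DEMO_BASE"'; }
second_block() { bash -c 'echo "second block: ${GSTACK_DEMO_BASE:-<unset>}"'; }

first_block    # prints "first block: develop"
second_block   # prints "second block: <unset>"
```

The value set in the first process is gone by the time the second one starts, which is why state has to be carried in prose instead.
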
Rules:
- **Use natural language for logic and state.** Don't use shell variables to pass
  state between code blocks. Instead, tell Claude what to remember and reference
  it in prose (e.g., "the base branch detected in Step 0").
- **Don't hardcode branch names.** Detect `main`/`master`/etc. dynamically via
  `gh pr view` or `gh repo view`. Use `{{BASE_BRANCH_DETECT}}` for PR-targeting
  skills. Use "the base branch" in prose, `<base>` in code block placeholders.
- **Keep bash blocks self-contained.** Each code block should work independently.
  If a block needs context from a previous step, restate it in the prose above.
- **Express conditionals as English.** Instead of nested `if/elif/else` in bash,
  write numbered decision steps: "1. If X, do Y. 2. Otherwise, do Z."

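The commit notes describe the detection order as: base of an existing PR (`gh pr view`), then the repository default (`gh repo view`), then a fallback to `main`. The text that `{{BASE_BRANCH_DETECT}}` actually expands to is not shown in this file, so the following is only a sketch of that chain. `detect_base_branch` is a hypothetical name, and the GitHub CLI (`gh`) is assumed to be installed and authenticated.

```bash
# Sketch only: the real {{BASE_BRANCH_DETECT}} output is not shown in this file.
# detect_base_branch is a hypothetical name; requires the GitHub CLI (gh).
detect_base_branch() {
  local base
  # 1. Base branch of the PR for the current branch, if one exists.
  base=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null) || base=""
  # 2. Otherwise, the repository's default branch.
  if [ -z "$base" ]; then
    base=$(gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null) || base=""
  fi
  # 3. Otherwise, fall back to main.
  echo "${base:-main}"
}
```

In a template, later steps would still refer to the result in prose ("the base branch detected in Step 0") rather than through a shell variable.
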
## Browser interaction

When you need to interact with a browser (QA, dogfooding, cookie setup), use the
`/browse` skill or run the browse binary directly via `$B <command>`. NEVER use
`mcp__claude-in-chrome__*` tools: they are slow, unreliable, and not what this
project uses.

## Vendored symlink awareness

When developing gstack, `.claude/skills/gstack` may be a symlink back to this
working directory (gitignored). This means skill changes are **live immediately**:
great for rapid iteration, but risky during big refactors, where half-written skills
could break other Claude Code sessions using gstack concurrently.

**Check once per session:** Run `ls -la .claude/skills/gstack` to see if it's a
symlink or a real copy. If it's a symlink to your working directory, be aware that:
- Template changes + `bun run gen:skill-docs` immediately affect all gstack invocations
- Breaking changes to SKILL.md.tmpl files can break concurrent gstack sessions
- During large refactors, remove the symlink (`rm .claude/skills/gstack`) so the
  global install at `~/.claude/skills/gstack/` is used instead

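The once-per-session check can also be scripted. The helper below is a minimal sketch of the same test that `ls -la` performs by eye; `skill_link_status` is a hypothetical name, and the default path is the one named above.

```bash
# Hypothetical helper: report whether the project-local gstack skill is a
# symlink, a real copy, or absent. Defaults to .claude/skills/gstack.
skill_link_status() {
  local path="${1:-.claude/skills/gstack}"
  if [ -L "$path" ]; then
    echo "symlink -> $(readlink "$path")"
  elif [ -e "$path" ]; then
    echo "real copy"
  else
    echo "not installed"
  fi
}
```

If it reports a symlink pointing at your working directory, the cautions in the list above apply.
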
**For plan reviews:** When reviewing plans that modify skill templates or the
gen-skill-docs pipeline, consider whether the changes should be tested in isolation
before going live (especially if the user is actively using gstack in other windows).

## CHANGELOG style

CHANGELOG.md is **for users**, not contributors. Write it like product release notes:

- Lead with what the user can now **do** that they couldn't before. Sell the feature.
- Use plain language, not implementation details. "You can now..." not "Refactored the..."
- Put contributor/internal changes in a separate "For contributors" section at the bottom.
- Every entry should make someone think "oh nice, I want to try that."
- No jargon: say "every question now tells you which project and branch you're in" not
  "AskUserQuestion format standardized across skill templates via preamble resolver."

## Deploying to the active skill

The active skill lives at `~/.claude/skills/gstack/`. After making changes:

1. Push your branch
2. Fetch and reset in the skill directory: `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main`
3. Rebuild: `cd ~/.claude/skills/gstack && bun run build`

Or copy the binary directly: `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse`