* docs: update project documentation for v0.4.2 - README: skill count 9→10, added /document-release to skills table, install/uninstall sections, and dedicated section with example - CHANGELOG: added /document-release bullet to v0.4.2 entry Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add /document-release skill with smart VERSION handling New skill runs after /ship but before PR merge. Reads every doc file, cross-references the diff, auto-updates factual changes, asks about risky edits. CHANGELOG clobber protection: never uses Write tool on CHANGELOG.md, only Edit with exact old_string matches. Smart VERSION logic: instead of silently skipping already-bumped versions, compares CHANGELOG entry scope against full diff and asks if significant uncovered changes exist. Also fixes gstack-upgrade/SKILL.md missing from skill-check.ts SKILL_FILES array (existing inconsistency with gen-skill-docs.ts). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /review Step 5.6 — documentation staleness check Review skill now cross-references code changes against doc files. If a doc describes a feature that changed but the doc wasn't updated, flags it as INFORMATIONAL with a pointer to /document-release. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: /document-release E2E with CHANGELOG clobber guard E2E test creates a repo with existing CHANGELOG entries, runs /document-release, and asserts original entries survive. Critical guardrail against the incident where an agent replaced CHANGELOG entries during conflict resolution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump to v0.4.3 — /document-release skill Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files after merge * chore: regenerate SKILL.md files after merge --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7.3 KiB
gstack development
Commands
bun install # install dependencies
bun test # run free tests (browse + snapshot + skill validation)
bun run test:evals # run paid evals: LLM judge + E2E (~$4/run)
bun run test:e2e # run E2E tests only (~$3.85/run)
bun run dev <cmd> # run CLI in dev mode, e.g. bun run dev goto https://example.com
bun run build # gen docs + compile binaries
bun run gen:skill-docs # regenerate SKILL.md files from templates
bun run skill:check # health dashboard for all skills
bun run dev:skill # watch mode: auto-regen + validate on change
bun run eval:list # list all eval runs from ~/.gstack-dev/evals/
bun run eval:compare # compare two eval runs (auto-picks most recent)
bun run eval:summary # aggregate stats across all eval runs
test:evals requires ANTHROPIC_API_KEY. E2E tests stream progress in real-time
(tool-by-tool via --output-format stream-json --verbose). Results are persisted
to ~/.gstack-dev/evals/ with auto-comparison against the previous run.
Project structure
gstack/
├── browse/ # Headless browser CLI (Playwright)
│ ├── src/ # CLI + server + commands
│ │ ├── commands.ts # Command registry (single source of truth)
│ │ └── snapshot.ts # SNAPSHOT_FLAGS metadata array
│ ├── test/ # Integration tests + fixtures
│ └── dist/ # Compiled binary
├── scripts/ # Build + DX tooling
│ ├── gen-skill-docs.ts # Template → SKILL.md generator
│ ├── skill-check.ts # Health dashboard
│ └── dev-skill.ts # Watch mode
├── test/ # Skill validation + eval tests
│ ├── helpers/ # skill-parser.ts, session-runner.ts, llm-judge.ts, eval-store.ts
│ ├── fixtures/ # Ground truth JSON, planted-bug fixtures, eval baselines
│ ├── skill-validation.test.ts # Tier 1: static validation (free, <1s)
│ ├── gen-skill-docs.test.ts # Tier 1: generator quality (free, <1s)
│ ├── skill-llm-eval.test.ts # Tier 3: LLM-as-judge (~$0.15/run)
│ └── skill-e2e.test.ts # Tier 2: E2E via claude -p (~$3.85/run)
├── qa-only/ # /qa-only skill (report-only QA, no fixes)
├── ship/ # Ship workflow skill
├── review/ # PR review skill
├── plan-ceo-review/ # /plan-ceo-review skill
├── plan-eng-review/ # /plan-eng-review skill
├── retro/ # Retrospective skill
├── document-release/ # /document-release skill (post-ship doc updates)
├── setup # One-time setup: build binary + symlink skills
├── SKILL.md # Generated from SKILL.md.tmpl (don't edit directly)
├── SKILL.md.tmpl # Template: edit this, run gen:skill-docs
└── package.json # Build scripts for browse
SKILL.md workflow
SKILL.md files are generated from .tmpl templates. To update docs:
- Edit the
.tmplfile (e.g.SKILL.md.tmplorbrowse/SKILL.md.tmpl) - Run
bun run gen:skill-docs(orbun run buildwhich does it automatically) - Commit both the
.tmpland generated.mdfiles
To add a new browse command: add it to browse/src/commands.ts and rebuild.
To add a snapshot flag: add it to SNAPSHOT_FLAGS in browse/src/snapshot.ts and rebuild.
Writing SKILL templates
SKILL.md.tmpl files are prompt templates read by Claude, not bash scripts. Each bash code block runs in a separate shell — variables do not persist between blocks.
Rules:
- Use natural language for logic and state. Don't use shell variables to pass state between code blocks. Instead, tell Claude what to remember and reference it in prose (e.g., "the base branch detected in Step 0").
- Don't hardcode branch names. Detect
main/master/etc dynamically viagh pr vieworgh repo view. Use{{BASE_BRANCH_DETECT}}for PR-targeting skills. Use "the base branch" in prose,<base>in code block placeholders. - Keep bash blocks self-contained. Each code block should work independently. If a block needs context from a previous step, restate it in the prose above.
- Express conditionals as English. Instead of nested
if/elif/elsein bash, write numbered decision steps: "1. If X, do Y. 2. Otherwise, do Z."
Browser interaction
When you need to interact with a browser (QA, dogfooding, cookie setup), use the
/browse skill or run the browse binary directly via $B <command>. NEVER use
mcp__claude-in-chrome__* tools — they are slow, unreliable, and not what this
project uses.
Vendored symlink awareness
When developing gstack, .claude/skills/gstack may be a symlink back to this
working directory (gitignored). This means skill changes are live immediately —
great for rapid iteration, risky during big refactors where half-written skills
could break other Claude Code sessions using gstack concurrently.
Check once per session: Run ls -la .claude/skills/gstack to see if it's a
symlink or a real copy. If it's a symlink to your working directory, be aware that:
- Template changes +
bun run gen:skill-docsimmediately affect all gstack invocations - Breaking changes to SKILL.md.tmpl files can break concurrent gstack sessions
- During large refactors, remove the symlink (
rm .claude/skills/gstack) so the global install at~/.claude/skills/gstack/is used instead
For plan reviews: When reviewing plans that modify skill templates or the gen-skill-docs pipeline, consider whether the changes should be tested in isolation before going live (especially if the user is actively using gstack in other windows).
CHANGELOG style
CHANGELOG.md is for users, not contributors. Write it like product release notes:
- Lead with what the user can now do that they couldn't before. Sell the feature.
- Use plain language, not implementation details. "You can now..." not "Refactored the..."
- Put contributor/internal changes in a separate "For contributors" section at the bottom.
- Every entry should make someone think "oh nice, I want to try that."
- No jargon: say "every question now tells you which project and branch you're in" not "AskUserQuestion format standardized across skill templates via preamble resolver."
E2E eval failure blame protocol
When an E2E eval fails during /ship or any other workflow, never claim "not
related to our changes" without proving it. These systems have invisible couplings —
a preamble text change affects agent behavior, a new helper changes timing, a
regenerated SKILL.md shifts prompt context.
Required before attributing a failure to "pre-existing":
- Run the same eval on main (or base branch) and show it fails there too
- If it passes on main but fails on the branch — it IS your change. Trace the blame.
- If you can't run on main, say "unverified — may or may not be related" and flag it as a risk in the PR body
"Pre-existing" without receipts is a lazy claim. Prove it or don't say it.
Deploying to the active skill
The active skill lives at ~/.claude/skills/gstack/. After making changes:
- Push your branch
- Fetch and reset in the skill directory:
cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main - Rebuild:
cd ~/.claude/skills/gstack && bun run build
Or copy the binary directly: cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse