Files
gstack/CLAUDE.md
T
Garry Tan c35e933c7d fix: rewrite session-runner to claude -p subprocess, lower flaky baselines
Session runner now spawns `claude -p` as a subprocess instead of using
Agent SDK query(), which fixes E2E tests hanging inside Claude Code.
Also lowers command_reference completeness baseline to 3 (flaky oscillation),
adds test:e2e script, and updates CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 02:34:10 -05:00

82 lines
3.7 KiB
Markdown

# gstack development
## Commands
```bash
bun install # install dependencies
bun test # run free tests (browse + snapshot + skill validation)
bun run test:evals # run paid evals: LLM judge + Agent SDK E2E (~$4/run)
bun run test:e2e # run Agent SDK E2E tests only (~$3.85/run)
bun run dev <cmd> # run CLI in dev mode, e.g. bun run dev goto https://example.com
bun run build # gen docs + compile binaries
bun run gen:skill-docs # regenerate SKILL.md files from templates
bun run skill:check # health dashboard for all skills
bun run dev:skill # watch mode: auto-regen + validate on change
```
`test:evals` requires `ANTHROPIC_API_KEY` and must be run from a plain terminal
(not inside Claude Code — nested Agent SDK sessions hang).
**Update (v0.3.5):** The session runner now strips CLAUDE* env vars automatically,
so `test:evals` may work inside Claude Code. If E2E tests hang, run from a plain terminal.
## Project structure
```
gstack/
├── browse/ # Headless browser CLI (Playwright)
│ ├── src/ # CLI + server + commands
│ │ ├── commands.ts # Command registry (single source of truth)
│ │ └── snapshot.ts # SNAPSHOT_FLAGS metadata array
│ ├── test/ # Integration tests + fixtures
│ └── dist/ # Compiled binary
├── scripts/ # Build + DX tooling
│ ├── gen-skill-docs.ts # Template → SKILL.md generator
│ ├── skill-check.ts # Health dashboard
│ └── dev-skill.ts # Watch mode
├── test/ # Skill validation + eval tests
│ ├── helpers/ # skill-parser.ts, session-runner.ts, llm-judge.ts
│ ├── fixtures/ # Ground truth JSON, planted-bug fixtures, eval baselines
│ ├── skill-validation.test.ts # Tier 1: static validation (free, <1s)
│ ├── gen-skill-docs.test.ts # Tier 1: generator quality (free, <1s)
│ ├── skill-llm-eval.test.ts # Tier 3: LLM-as-judge (~$0.15/run)
│ └── skill-e2e.test.ts # Tier 2: Agent SDK E2E (~$3.85/run)
├── ship/ # Ship workflow skill
├── review/ # PR review skill
├── plan-ceo-review/ # /plan-ceo-review skill
├── plan-eng-review/ # /plan-eng-review skill
├── retro/ # Retrospective skill
├── setup # One-time setup: build binary + symlink skills
├── SKILL.md # Generated from SKILL.md.tmpl (don't edit directly)
├── SKILL.md.tmpl # Template: edit this, run gen:skill-docs
└── package.json # Build scripts for browse
```
## SKILL.md workflow
SKILL.md files are **generated** from `.tmpl` templates. To update docs:
1. Edit the `.tmpl` file (e.g. `SKILL.md.tmpl` or `browse/SKILL.md.tmpl`)
2. Run `bun run gen:skill-docs` (or `bun run build` which does it automatically)
3. Commit both the `.tmpl` and generated `.md` files
To add a new browse command: add it to `browse/src/commands.ts` and rebuild.
To add a snapshot flag: add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts` and rebuild.
## Browser interaction
When you need to interact with a browser (QA, dogfooding, cookie setup), use the
`/browse` skill or run the browse binary directly via `$B <command>`. NEVER use
`mcp__claude-in-chrome__*` tools — they are slow, unreliable, and not what this
project uses.
## Deploying to the active skill
The active skill lives at `~/.claude/skills/gstack/`. After making changes:
1. Push your branch
2. Fetch and reset in the skill directory: `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main`
3. Rebuild: `cd ~/.claude/skills/gstack && bun run build`
Or copy the binary directly: `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse`