Files
gstack/CLAUDE.md
T
Garry Tan 4a77cc2c34 feat: /plan-design-review + /qa-design-review skills (v0.5.0) (#102)
* feat: add {{DESIGN_METHODOLOGY}} resolver and register design review skills

Add generateDesignMethodology() to gen-skill-docs.ts with 10-category, 80-item
design audit checklist. Register plan-design-review and qa-design-review templates
in findTemplates(). Add both skills to skill-check.ts SKILL_FILES. Add command
and snapshot flag validation tests for both skills in skill-validation.test.ts.

* feat: add /plan-design-review and /qa-design-review skills

/plan-design-review: report-only designer audit with letter grades, AI slop
scoring, structured first impression, design system extraction, DESIGN.md
inference and export offer. Never modifies code.

/qa-design-review: same audit, then iterative fix loop with style(design):
commits, CSS-safe WTF heuristic, before/after screenshots, final re-audit.

* chore: bump version and changelog (v0.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update README, ARCHITECTURE for design review skills (v0.5.0)

- Update skill count to 11, add /plan-design-review and /qa-design-review
  to skill table, install/uninstall commands, and demo walkthrough
- Add narrative sections: "senior designer mode" and "designer who codes mode"
  with compelling examples showing AI Slop detection and design system inference
- Add {{DESIGN_METHODOLOGY}} to ARCHITECTURE.md placeholder table
- Extend demo to show full plan→eng→review→ship→qa→design-review pipeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: regenerate design review SKILL.md files after merge from main

Picks up BASE_BRANCH_DETECT resolver and updated contributor mode from main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add /design-consultation skill — design consultant that creates DESIGN.md

6-phase consultant flow: product context → competitive research (WebSearch) →
complete coherent proposal → drill-downs on demand → font+color preview page →
write DESIGN.md + update CLAUDE.md. Opinionated recommendations grounded in
product context, not menu-driven forms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add E2E tests for design skill family (7 tests + LLM quality judge)

Tests 1-4: /design-consultation (core flow, research integration, existing
DESIGN.md handling, font+color preview generation).
Tests 5-6: /plan-design-review (audit report, DESIGN.md export).
Test 7: /qa-design-review (audit + fix loop).
LLM judge validates font blacklist compliance, coherence, and AI slop avoidance.
Also adds plan-design-review + qa-design-review to ALL_SKILLS test array.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: mark /design-consultation as shipped in TODOS.md

Renamed from /setup-design-md to reflect the consultant approach.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 21:55:07 -05:00

154 lines
7.7 KiB
Markdown

# gstack development
## Commands
```bash
bun install # install dependencies
bun test # run free tests (browse + snapshot + skill validation)
bun run test:evals # run paid evals: LLM judge + E2E (~$4/run)
bun run test:e2e # run E2E tests only (~$3.85/run)
bun run dev <cmd> # run CLI in dev mode, e.g. bun run dev goto https://example.com
bun run build # gen docs + compile binaries
bun run gen:skill-docs # regenerate SKILL.md files from templates
bun run skill:check # health dashboard for all skills
bun run dev:skill # watch mode: auto-regen + validate on change
bun run eval:list # list all eval runs from ~/.gstack-dev/evals/
bun run eval:compare # compare two eval runs (auto-picks most recent)
bun run eval:summary # aggregate stats across all eval runs
```
`test:evals` requires `ANTHROPIC_API_KEY`. E2E tests stream progress in real-time
(tool-by-tool via `--output-format stream-json --verbose`). Results are persisted
to `~/.gstack-dev/evals/` with auto-comparison against the previous run.
## Project structure
```
gstack/
├── browse/ # Headless browser CLI (Playwright)
│ ├── src/ # CLI + server + commands
│ │ ├── commands.ts # Command registry (single source of truth)
│ │ └── snapshot.ts # SNAPSHOT_FLAGS metadata array
│ ├── test/ # Integration tests + fixtures
│ └── dist/ # Compiled binary
├── scripts/ # Build + DX tooling
│ ├── gen-skill-docs.ts # Template → SKILL.md generator
│ ├── skill-check.ts # Health dashboard
│ └── dev-skill.ts # Watch mode
├── test/ # Skill validation + eval tests
│ ├── helpers/ # skill-parser.ts, session-runner.ts, llm-judge.ts, eval-store.ts
│ ├── fixtures/ # Ground truth JSON, planted-bug fixtures, eval baselines
│ ├── skill-validation.test.ts # Tier 1: static validation (free, <1s)
│ ├── gen-skill-docs.test.ts # Tier 1: generator quality (free, <1s)
│ ├── skill-llm-eval.test.ts # Tier 3: LLM-as-judge (~$0.15/run)
│ └── skill-e2e.test.ts # Tier 2: E2E via claude -p (~$3.85/run)
├── qa-only/ # /qa-only skill (report-only QA, no fixes)
├── plan-design-review/ # /plan-design-review skill (report-only design audit)
├── qa-design-review/ # /qa-design-review skill (design audit + fix loop)
├── ship/ # Ship workflow skill
├── review/ # PR review skill
├── plan-ceo-review/ # /plan-ceo-review skill
├── plan-eng-review/ # /plan-eng-review skill
├── retro/ # Retrospective skill
├── document-release/ # /document-release skill (post-ship doc updates)
├── setup # One-time setup: build binary + symlink skills
├── SKILL.md # Generated from SKILL.md.tmpl (don't edit directly)
├── SKILL.md.tmpl # Template: edit this, run gen:skill-docs
└── package.json # Build scripts for browse
```
## SKILL.md workflow
SKILL.md files are **generated** from `.tmpl` templates. To update docs:
1. Edit the `.tmpl` file (e.g. `SKILL.md.tmpl` or `browse/SKILL.md.tmpl`)
2. Run `bun run gen:skill-docs` (or `bun run build` which does it automatically)
3. Commit both the `.tmpl` and generated `.md` files
To add a new browse command: add it to `browse/src/commands.ts` and rebuild.
To add a snapshot flag: add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts` and rebuild.
## Writing SKILL templates
SKILL.md.tmpl files are **prompt templates read by Claude**, not bash scripts.
Each bash code block runs in a separate shell — variables do not persist between blocks.
Rules:
- **Use natural language for logic and state.** Don't use shell variables to pass
state between code blocks. Instead, tell Claude what to remember and reference
it in prose (e.g., "the base branch detected in Step 0").
- **Don't hardcode branch names.** Detect `main`/`master`/etc dynamically via
`gh pr view` or `gh repo view`. Use `{{BASE_BRANCH_DETECT}}` for PR-targeting
skills. Use "the base branch" in prose, `<base>` in code block placeholders.
- **Keep bash blocks self-contained.** Each code block should work independently.
If a block needs context from a previous step, restate it in the prose above.
- **Express conditionals as English.** Instead of nested `if/elif/else` in bash,
write numbered decision steps: "1. If X, do Y. 2. Otherwise, do Z."
## Browser interaction
When you need to interact with a browser (QA, dogfooding, cookie setup), use the
`/browse` skill or run the browse binary directly via `$B <command>`. NEVER use
`mcp__claude-in-chrome__*` tools — they are slow, unreliable, and not what this
project uses.
## Vendored symlink awareness
When developing gstack, `.claude/skills/gstack` may be a symlink back to this
working directory (gitignored). This means skill changes are **live immediately**
great for rapid iteration, risky during big refactors where half-written skills
could break other Claude Code sessions using gstack concurrently.
**Check once per session:** Run `ls -la .claude/skills/gstack` to see if it's a
symlink or a real copy. If it's a symlink to your working directory, be aware that:
- Template changes + `bun run gen:skill-docs` immediately affect all gstack invocations
- Breaking changes to SKILL.md.tmpl files can break concurrent gstack sessions
- During large refactors, remove the symlink (`rm .claude/skills/gstack`) so the
global install at `~/.claude/skills/gstack/` is used instead
**For plan reviews:** When reviewing plans that modify skill templates or the
gen-skill-docs pipeline, consider whether the changes should be tested in isolation
before going live (especially if the user is actively using gstack in other windows).
## CHANGELOG style
CHANGELOG.md is **for users**, not contributors. Write it like product release notes:
- Lead with what the user can now **do** that they couldn't before. Sell the feature.
- Use plain language, not implementation details. "You can now..." not "Refactored the..."
- Put contributor/internal changes in a separate "For contributors" section at the bottom.
- Every entry should make someone think "oh nice, I want to try that."
- No jargon: say "every question now tells you which project and branch you're in" not
"AskUserQuestion format standardized across skill templates via preamble resolver."
## Local plans
Contributors can store long-range vision docs and design documents in `~/.gstack-dev/plans/`.
These are local-only (not checked in). When reviewing TODOS.md, check `plans/` for candidates
that may be ready to promote to TODOs or implement.
## E2E eval failure blame protocol
When an E2E eval fails during `/ship` or any other workflow, **never claim "not
related to our changes" without proving it.** These systems have invisible couplings —
a preamble text change affects agent behavior, a new helper changes timing, a
regenerated SKILL.md shifts prompt context.
**Required before attributing a failure to "pre-existing":**
1. Run the same eval on main (or base branch) and show it fails there too
2. If it passes on main but fails on the branch — it IS your change. Trace the blame.
3. If you can't run on main, say "unverified — may or may not be related" and flag it
as a risk in the PR body
"Pre-existing" without receipts is a lazy claim. Prove it or don't say it.
## Deploying to the active skill
The active skill lives at `~/.claude/skills/gstack/`. After making changes:
1. Push your branch
2. Fetch and reset in the skill directory: `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main`
3. Rebuild: `cd ~/.claude/skills/gstack && bun run build`
Or copy the binary directly: `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse`