From c8c2cbba33ac7037d9196d545be8effed8bba9db Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Tue, 17 Mar 2026 08:10:01 -0500 Subject: [PATCH 1/4] docs: add /design-consultation skill to README (#127) The skill was fully implemented but completely absent from the README. Add it to the skill table, write a detailed section with usage example, and include it in install/uninstall instructions. Co-authored-by: Claude Opus 4.6 (1M context) --- README.md | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 69 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 1db8d1de..de6144ed 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ **gstack turns Claude Code from one generic assistant into a team of specialists you can summon on demand.** -Twelve opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, design review, code review, one-command shipping, browser automation, QA testing, engineering retrospectives, and post-ship documentation — all as slash commands. +Thirteen opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, design review, code review, one-command shipping, browser automation, QA testing, engineering retrospectives, and post-ship documentation — all as slash commands. ### Without gstack @@ -20,6 +20,7 @@ Twelve opinionated workflow skills for [Claude Code](https://docs.anthropic.com/ | `/plan-ceo-review` | Founder / CEO | Rethink the problem. Find the 10-star product hiding inside the request. | | `/plan-eng-review` | Eng manager / tech lead | Lock in architecture, data flow, diagrams, edge cases, and tests. | | `/plan-design-review` | Senior product designer | Designer's eye audit. 80-item checklist, letter grades, AI Slop detection, DESIGN.md inference. Report only — never touches code. | +| `/design-consultation` | Design consultant | Build a complete design system from scratch. 
Researches competitors, proposes aesthetic + typography + color + spacing + motion, generates a preview page, and writes DESIGN.md. | | `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Triages Greptile review comments. | | `/ship` | Release engineer | Sync main, run tests, resolve Greptile reviews, push, open PR. For a ready branch, not for deciding what to build. | | `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. | @@ -131,11 +132,11 @@ This is the setup I use. One person, ten parallel agents, each with the right co Open Claude Code and paste this. Claude will do the rest. -> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /plan-design-review, /review, /ship, /browse, /qa, /qa-only, /qa-design-review, /setup-browser-cookies, /retro, /document-release. Then ask the user if they also want to add gstack to the current project so teammates get it. +> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /browse, /qa, /qa-only, /qa-design-review, /setup-browser-cookies, /retro, /document-release. Then ask the user if they also want to add gstack to the current project so teammates get it. 
### Step 2: Add to your repo so teammates get it (optional) -> Add gstack to this project: run `cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /plan-ceo-review, /plan-eng-review, /plan-design-review, /review, /ship, /browse, /qa, /qa-only, /qa-design-review, /setup-browser-cookies, /retro, /document-release, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills. +> Add gstack to this project: run `cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /browse, /qa, /qa-only, /qa-design-review, /setup-browser-cookies, /retro, /document-release, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills. Real files get committed to your repo (not a submodule), so `git clone` just works. The binary and node\_modules are gitignored — teammates just need to run `cd .claude/skills/gstack && ./setup` once to build (or `/browse` handles it automatically on first use). @@ -337,6 +338,70 @@ This is report only — it never touches your code. Use `/qa-design-review` when --- +## `/design-consultation` + +This is my **design partner mode**. + +`/plan-design-review` audits a site that already exists. 
`/design-consultation` is for when you have nothing yet — no design system, no font choices, no color palette. You are starting from zero and you want a senior designer to sit down with you and build the whole visual identity together. + +It is a conversation, not a form. The agent asks about your product, your users, and your space. If you want, it researches what top competitors in your category are doing — fonts, colors, layout patterns, aesthetic direction. Then it proposes a complete, coherent design system: aesthetic direction, typography (3+ fonts with specific roles), color palette with hex values, spacing scale, layout approach, and motion strategy. Every recommendation comes with a rationale. Every choice reinforces every other choice. + +The key insight: individual design decisions are easy to make but hard to make coherently. Picking a font is simple. Picking a font that works with your color palette, your spacing density, your aesthetic direction, and your product's personality — that is the actual job. `/design-consultation` solves the coherence problem by proposing everything as one package, then letting you adjust any piece while it checks that the rest still holds together. + +After you agree on the system, it generates an interactive HTML preview page — your fonts rendered at real sizes, your color palette as actual UI components (buttons, cards, form inputs, alerts), light/dark mode toggle, all using your product name instead of lorem ipsum. You open the page in your browser and see the design system rendered before a single line of product code is written. + +Then it writes `DESIGN.md` to your repo root — your project's design source of truth — and updates `CLAUDE.md` so every future Claude Code session respects the system. From that point on, `/qa-design-review` can audit against it, and any agent working on your frontend knows the rules. 
+ +``` +You: /design-consultation + +Claude: From the README, this looks like a developer analytics dashboard + for engineering teams. Sound right? And would you like me to + research what other dev tools are doing for design, or should + I work from what I know? + +You: Yes, it's a dashboard for eng managers. Research would be great. + +Claude: I looked at Linear, Vercel, Railway, Planetscale, and Posthog. + They converge on clean sans-serifs and muted palettes, but + Linear stands out with its dark-first approach and tight spacing. + The gap: most of these feel interchangeable. Here's my proposal: + + AESTHETIC: Industrial/Utilitarian — data-first, no decoration tax + TYPOGRAPHY: Geist (body + UI), Instrument Serif (display), JetBrains Mono (data) + COLOR: Restrained — slate neutrals, single teal accent (#0D9488) + SPACING: 4px base, compact density + LAYOUT: Grid-disciplined, 12-column + MOTION: Minimal-functional — only transitions that aid comprehension + + This coheres because the utilitarian aesthetic demands a tight + spacing grid and restrained color. The serif display font creates + hierarchy contrast without adding visual noise. The monospace for + data tables feels native to the engineering audience. + + Want to adjust anything, or should I generate the preview page? + +You: Generate it. + +Claude: Preview opened in your browser. + [/tmp/design-consultation-preview-1710000000.html] + + The page shows Instrument Serif at hero scale, Geist in body + paragraphs, JetBrains Mono in a sample data table. Color swatches + with your teal accent against slate grays. Buttons, cards, and + alerts rendered in both light and dark mode. + + How does it feel? Want to adjust anything before I write DESIGN.md? + +You: Looks great. Ship it. + +Claude: Wrote DESIGN.md (typography, color, spacing, layout, motion). + Updated CLAUDE.md with design system guidelines. + All future sessions will respect these choices. 
+``` + +--- + ## `/qa-design-review` This is my **designer who codes mode**. @@ -757,7 +822,7 @@ Or set `auto_upgrade: true` in `~/.gstack/config.yaml` to upgrade automatically Paste this into Claude Code: -> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review plan-design-review review ship retro qa qa-only qa-design-review setup-browser-cookies document-release; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review plan-design-review review ship retro qa qa-only qa-design-review setup-browser-cookies document-release; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too. +> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review plan-design-review design-consultation review ship retro qa qa-only qa-design-review setup-browser-cookies document-release; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review plan-design-review design-consultation review ship retro qa qa-only qa-design-review setup-browser-cookies document-release; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too. ## Development From 5f41cd9ad76aba2d3817498273d174a33e0aadd8 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Tue, 17 Mar 2026 10:30:19 -0500 Subject: [PATCH 2/4] feat: show screenshots to user during QA and browse sessions (v0.5.0.1) (#129) Add rule 11 to QA and Design methodologies in gen-skill-docs.ts instructing Claude to Read screenshot PNGs after taking them.
This makes screenshots visible as clickable elements in Conductor and other Claude Code UIs. Also added to browse and gstack SKILL templates. Co-authored-by: Claude Opus 4.6 --- CHANGELOG.md | 6 ++++++ SKILL.md | 1 + SKILL.md.tmpl | 1 + VERSION | 2 +- browse/SKILL.md | 3 +++ browse/SKILL.md.tmpl | 3 +++ plan-design-review/SKILL.md | 1 + qa-design-review/SKILL.md | 1 + qa-only/SKILL.md | 1 + qa/SKILL.md | 1 + scripts/gen-skill-docs.ts | 6 ++++-- 11 files changed, 23 insertions(+), 3 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a86569c8..92a45858 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,11 @@ # Changelog +## 0.5.0.1 — 2026-03-17 + +### Fixed + +- **Screenshots are now visible during QA and browse sessions.** When gstack takes screenshots, they now show up as clickable image elements in your output — no more invisible `/tmp/browse-screenshot.png` paths you can't see. Works in `/qa`, `/qa-only`, `/plan-design-review`, `/qa-design-review`, `/browse`, and `/gstack`. + ## 0.5.0 — 2026-03-16 - **Your site just got a design review.** `/plan-design-review` opens your site and reviews it like a senior product designer — typography, spacing, hierarchy, color, responsive, interactions, and AI slop detection. Get letter grades (A-F) per category, a dual headline "Design Score" + "AI Slop Score", and a structured first impression that doesn't pull punches. diff --git a/SKILL.md b/SKILL.md index 76007eee..c0d6e267 100644 --- a/SKILL.md +++ b/SKILL.md @@ -111,6 +111,7 @@ If `NEEDS_SETUP`: - NEVER use `mcp__claude-in-chrome__*` tools. They are slow and unreliable. - Browser persists between calls — cookies, login sessions, and tabs carry over. - Dialogs (alert/confirm/prompt) are auto-accepted by default — no browser lockup. +- **Show screenshots:** After `$B screenshot`, `$B snapshot -a -o`, or `$B responsive`, always use the Read tool on the output PNG(s) so the user can see them. Without this, screenshots are invisible. 
## QA Workflows diff --git a/SKILL.md.tmpl b/SKILL.md.tmpl index 7f2e11db..dd2b2345 100644 --- a/SKILL.md.tmpl +++ b/SKILL.md.tmpl @@ -29,6 +29,7 @@ Auto-shuts down after 30 min idle. State persists between calls (cookies, tabs, - NEVER use `mcp__claude-in-chrome__*` tools. They are slow and unreliable. - Browser persists between calls — cookies, login sessions, and tabs carry over. - Dialogs (alert/confirm/prompt) are auto-accepted by default — no browser lockup. +- **Show screenshots:** After `$B screenshot`, `$B snapshot -a -o`, or `$B responsive`, always use the Read tool on the output PNG(s) so the user can see them. Without this, screenshots are invisible. ## QA Workflows diff --git a/VERSION b/VERSION index 8f0916f7..494bbc1a 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.5.0 +0.5.0.1 diff --git a/browse/SKILL.md b/browse/SKILL.md index dd473dc0..33accc09 100644 --- a/browse/SKILL.md +++ b/browse/SKILL.md @@ -184,6 +184,9 @@ $B snapshot -D # verify deletion happened $B diff https://staging.app.com https://prod.app.com ``` +### 11. Show screenshots to the user +After `$B screenshot`, `$B snapshot -a -o`, or `$B responsive`, always use the Read tool on the output PNG(s) so the user can see them. Without this, screenshots are invisible. + ## Snapshot Flags The snapshot is your primary tool for understanding and interacting with pages. diff --git a/browse/SKILL.md.tmpl b/browse/SKILL.md.tmpl index 6ce20634..0c42ffcb 100644 --- a/browse/SKILL.md.tmpl +++ b/browse/SKILL.md.tmpl @@ -102,6 +102,9 @@ $B snapshot -D # verify deletion happened $B diff https://staging.app.com https://prod.app.com ``` +### 11. Show screenshots to the user +After `$B screenshot`, `$B snapshot -a -o`, or `$B responsive`, always use the Read tool on the output PNG(s) so the user can see them. Without this, screenshots are invisible. 
+ ## Snapshot Flags {{SNAPSHOT_FLAGS}} diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md index c0ca9512..1231d968 100644 --- a/plan-design-review/SKILL.md +++ b/plan-design-review/SKILL.md @@ -465,6 +465,7 @@ Tie everything to user goals and product objectives. Always suggest specific imp 8. **Responsive is design, not just "not broken."** A stacked desktop layout on mobile is not responsive design — it's lazy. Evaluate whether the mobile layout makes *design* sense. 9. **Document incrementally.** Write each finding to the report as you find it. Don't batch. 10. **Depth over breadth.** 5-10 well-documented findings with screenshots and specific suggestions > 20 vague observations. +11. **Show screenshots to the user.** After every `$B screenshot`, `$B snapshot -a -o`, or `$B responsive` command, use the Read tool on the output file(s) so the user can see them inline. For `responsive` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user. --- diff --git a/qa-design-review/SKILL.md b/qa-design-review/SKILL.md index 08c9916a..7fa0b856 100644 --- a/qa-design-review/SKILL.md +++ b/qa-design-review/SKILL.md @@ -477,6 +477,7 @@ Tie everything to user goals and product objectives. Always suggest specific imp 8. **Responsive is design, not just "not broken."** A stacked desktop layout on mobile is not responsive design — it's lazy. Evaluate whether the mobile layout makes *design* sense. 9. **Document incrementally.** Write each finding to the report as you find it. Don't batch. 10. **Depth over breadth.** 5-10 well-documented findings with screenshots and specific suggestions > 20 vague observations. +11. **Show screenshots to the user.** After every `$B screenshot`, `$B snapshot -a -o`, or `$B responsive` command, use the Read tool on the output file(s) so the user can see them inline. For `responsive` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user. 
Record baseline design score and AI slop score at end of Phase 6. diff --git a/qa-only/SKILL.md b/qa-only/SKILL.md index 5e74f03e..ed732bf5 100644 --- a/qa-only/SKILL.md +++ b/qa-only/SKILL.md @@ -415,6 +415,7 @@ Minimum 0 per category. 8. **Depth over breadth.** 5-10 well-documented issues with evidence > 20 vague descriptions. 9. **Never delete output files.** Screenshots and reports accumulate — that's intentional. 10. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses. +11. **Show screenshots to the user.** After every `$B screenshot`, `$B snapshot -a -o`, or `$B responsive` command, use the Read tool on the output file(s) so the user can see them inline. For `responsive` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user. --- diff --git a/qa/SKILL.md b/qa/SKILL.md index 6b078a9d..ac12fc94 100644 --- a/qa/SKILL.md +++ b/qa/SKILL.md @@ -454,6 +454,7 @@ Minimum 0 per category. 8. **Depth over breadth.** 5-10 well-documented issues with evidence > 20 vague descriptions. 9. **Never delete output files.** Screenshots and reports accumulate — that's intentional. 10. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses. +11. **Show screenshots to the user.** After every `$B screenshot`, `$B snapshot -a -o`, or `$B responsive` command, use the Read tool on the output file(s) so the user can see them inline. For `responsive` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user. Record baseline health score at end of Phase 6. diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts index 7a0b85ab..5d5f84bb 100644 --- a/scripts/gen-skill-docs.ts +++ b/scripts/gen-skill-docs.ts @@ -476,7 +476,8 @@ Minimum 0 per category. 7. **Test like a user.** Use realistic data. Walk through complete workflows end-to-end. 8. 
**Depth over breadth.** 5-10 well-documented issues with evidence > 20 vague descriptions. 9. **Never delete output files.** Screenshots and reports accumulate — that's intentional. -10. **Use \`snapshot -C\` for tricky UIs.** Finds clickable divs that the accessibility tree misses.`; +10. **Use \`snapshot -C\` for tricky UIs.** Finds clickable divs that the accessibility tree misses. +11. **Show screenshots to the user.** After every \`$B screenshot\`, \`$B snapshot -a -o\`, or \`$B responsive\` command, use the Read tool on the output file(s) so the user can see them inline. For \`responsive\` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.`; } function generateDesignMethodology(): string { @@ -809,7 +810,8 @@ Tie everything to user goals and product objectives. Always suggest specific imp 7. **Use \`snapshot -C\` for tricky UIs.** Finds clickable divs that the accessibility tree misses. 8. **Responsive is design, not just "not broken."** A stacked desktop layout on mobile is not responsive design — it's lazy. Evaluate whether the mobile layout makes *design* sense. 9. **Document incrementally.** Write each finding to the report as you find it. Don't batch. -10. **Depth over breadth.** 5-10 well-documented findings with screenshots and specific suggestions > 20 vague observations.`; +10. **Depth over breadth.** 5-10 well-documented findings with screenshots and specific suggestions > 20 vague observations. +11. **Show screenshots to the user.** After every \`$B screenshot\`, \`$B snapshot -a -o\`, or \`$B responsive\` command, use the Read tool on the output file(s) so the user can see them inline. For \`responsive\` (3 files), Read all three. 
This is critical — without it, screenshots are invisible to the user.`; } const RESOLVERS: Record<string, () => string> = { From 73b00b4e29eef5164e166a06db2df255edd4c379 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Tue, 17 Mar 2026 10:33:46 -0500 Subject: [PATCH 3/4] feat: Review Readiness Dashboard + gstack-slug helper (v0.5.1) (#130) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * feat: add bin/gstack-slug helper + migrate all inline SLUG computation Extract the opaque SLUG sed pipeline into a shared 5-line shell script. Replace 8 inline copies across templates with eval $(gstack-slug). Sanitizes branch names (/ → -) to prevent subdirectory creation. * feat: review readiness dashboard — track CEO/Eng/Design reviews per branch Each review skill logs its result to JSONL. A shared {{REVIEW_DASHBOARD}} placeholder displays run counts, timestamps, and a CLEARED TO SHIP verdict. /ship pre-flight reads the dashboard and prompts when reviews are missing. * chore: bump version and changelog (v0.5.1) Co-Authored-By: Claude Opus 4.6 --------- Co-authored-by: Claude Opus 4.6 --- CHANGELOG.md | 13 ++++++-- TODOS.md | 26 ++++++++++++++++ VERSION | 2 +- bin/gstack-slug | 9 ++++++ design-consultation/SKILL.md | 2 +- design-consultation/SKILL.md.tmpl | 2 +- plan-ceo-review/SKILL.md | 47 +++++++++++++++++++++++++++++ plan-ceo-review/SKILL.md.tmpl | 19 ++++++++++++ plan-design-review/SKILL.md | 48 ++++++++++++++++++++++++++++- plan-design-review/SKILL.md.tmpl | 18 +++++++++++ plan-eng-review/SKILL.md | 50 +++++++++++++++++++++++++++++-- plan-eng-review/SKILL.md.tmpl | 22 ++++++++++++-- qa-design-review/SKILL.md | 4 +-- qa-design-review/SKILL.md.tmpl | 2 +- qa-only/SKILL.md | 4 +-- qa-only/SKILL.md.tmpl | 4 +-- qa/SKILL.md | 4 +-- qa/SKILL.md.tmpl | 4 +-- scripts/gen-skill-docs.ts | 35 +++++++++++++++++++++- ship/SKILL.md | 37 +++++++++++++++++++++++ ship/SKILL.md.tmpl | 9 ++++++ test/gen-skill-docs.test.ts | 27 +++++++++++++++++
test/skill-validation.test.ts | 42 ++++++++++++++++++++++++++ 23 files changed, 407 insertions(+), 23 deletions(-) create mode 100755 bin/gstack-slug diff --git a/CHANGELOG.md b/CHANGELOG.md index 92a45858..a4bd0786 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,11 +1,18 @@ # Changelog -## 0.5.0.1 — 2026-03-17 - -### Fixed +## 0.5.1 — 2026-03-17 +- **Know where you stand before you ship.** Every `/plan-ceo-review`, `/plan-eng-review`, and `/plan-design-review` now logs its result to a review tracker. At the end of each review, you see a **Review Readiness Dashboard** showing which reviews are done, when they ran, and whether they're clean — with a clear CLEARED TO SHIP or NOT READY verdict. +- **`/ship` checks your reviews before creating the PR.** Pre-flight now reads the dashboard and asks if you want to continue when reviews are missing. Informational only — it won't block you, but you'll know what you skipped. +- **One less thing to copy-paste.** The SLUG computation (that opaque sed pipeline for computing `owner-repo` from git remote) is now a shared `bin/gstack-slug` helper. All 14 inline copies across templates replaced with `eval $(gstack-slug)`. If the format ever changes, fix it once. - **Screenshots are now visible during QA and browse sessions.** When gstack takes screenshots, they now show up as clickable image elements in your output — no more invisible `/tmp/browse-screenshot.png` paths you can't see. Works in `/qa`, `/qa-only`, `/plan-design-review`, `/qa-design-review`, `/browse`, and `/gstack`. +### For contributors + +- Added `{{REVIEW_DASHBOARD}}` resolver to `gen-skill-docs.ts` — shared dashboard reader injected into 4 templates (3 review skills + ship). +- Added `bin/gstack-slug` helper (5-line bash) with unit tests. Outputs `SLUG=` and `BRANCH=` lines, sanitizes `/` to `-`. +- New TODOs: smart review relevance detection (P3), `/merge` skill for review-gated PR merge (P2). 
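As a rough illustration of what the `bin/gstack-slug` helper computes, the sketch below ports its `sed`/`tr` pipeline to TypeScript. The function names `slugFromRemote` and `sanitizeBranch` are hypothetical and not part of gstack; the real helper is the bash script added in this patch, and the regular expressions here only mirror its two `sed` substitutions.

```typescript
// Hypothetical TypeScript port of the sed/tr pipeline in bin/gstack-slug.
// The function names are illustrative only; gstack ships a bash script.

/** Derive "owner-repo" from a git remote URL (SSH or HTTPS form). */
function slugFromRemote(remoteUrl: string): string {
  return remoteUrl
    .trim()
    // Mirrors: s|.*[:/]\([^/]*/[^/]*\)\.git$|\1|
    .replace(/^.*[:\/]([^\/]*\/[^\/]*)\.git$/, "$1")
    // Mirrors: s|.*[:/]\([^/]*/[^/]*\)$|\1|
    .replace(/^.*[:\/]([^\/]*\/[^\/]*)$/, "$1")
    // Mirrors: tr '/' '-'
    .replace(/\//g, "-");
}

/** Replace "/" with "-" so a branch name cannot create subdirectories. */
function sanitizeBranch(branch: string): string {
  return branch.trim().replace(/\//g, "-");
}

// Example inputs (the branch name is made up for illustration).
console.log("SLUG=" + slugFromRemote("git@github.com:garrytan/gstack.git"));
console.log("BRANCH=" + sanitizeBranch("feature/review-dashboard"));
```

Both the SSH form and the HTTPS form of the remote URL, with or without a trailing `.git`, reduce to the same `owner-repo` slug, which is why the script applies two substitutions in sequence.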
+ ## 0.5.0 — 2026-03-16 - **Your site just got a design review.** `/plan-design-review` opens your site and reviews it like a senior product designer — typography, spacing, hierarchy, color, responsive, interactions, and AI slop detection. Get letter grades (A-F) per category, a dual headline "Design Score" + "AI Slop Score", and a structured first impression that doesn't pull punches. diff --git a/TODOS.md b/TODOS.md index d76f71c4..f52bb693 100644 --- a/TODOS.md +++ b/TODOS.md @@ -408,6 +408,32 @@ Shipped as `/design-consultation` on garrytan/design branch. Renamed from `/setu **Priority:** P2 **Depends on:** None +## Ship Confidence Dashboard + +### Smart review relevance detection + +**What:** Auto-detect which of the 4 reviews are relevant based on branch changes (skip Design Review if no CSS/view changes, skip Code Review if plan-only). + +**Why:** Currently dashboard always shows 4 rows. On docs-only changes, "Design Review: NOT YET RUN" is noise. + +**Context:** /plan-design-review and /qa already do file-type detection in diff-aware mode. Could reuse that heuristic. Would require a `gstack-diff-scope` helper or enriching `gstack-slug` to also output change categories. + +**Effort:** M +**Priority:** P3 +**Depends on:** Ship Confidence Dashboard (shipped) + +### /merge skill — review-gated PR merge + +**What:** Create a `/merge` skill that merges an approved PR, but first checks the Review Readiness Dashboard and runs `/review` (Fix-First) if code review hasn't been done. Separates "ship" (create PR) from "merge" (land it). + +**Why:** Currently `/review` runs inside `/ship` Step 3.5 but isn't tracked as a gate. A `/merge` skill ensures code review always happens before landing, and enables workflows where someone else reviews the PR first. + +**Context:** `/ship` creates the PR. `/merge` would: check dashboard → run `/review` if needed → `gh pr merge`. This is where code review tracking belongs — at merge time, not at plan time. 
+ +**Effort:** M +**Priority:** P2 +**Depends on:** Ship Confidence Dashboard (shipped) + ## Completed ### Phase 1: Foundations (v0.2.0) diff --git a/VERSION b/VERSION index 494bbc1a..4b9fcbec 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.5.0.1 +0.5.1 diff --git a/bin/gstack-slug b/bin/gstack-slug new file mode 100755 index 00000000..7336b7b4 --- /dev/null +++ b/bin/gstack-slug @@ -0,0 +1,9 @@ +#!/usr/bin/env bash +# gstack-slug — output project slug and sanitized branch name +# Usage: eval $(gstack-slug) → sets SLUG and BRANCH variables +# Or: gstack-slug → prints SLUG=... and BRANCH=... lines +set -euo pipefail +SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') +BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-') +echo "SLUG=$SLUG" +echo "BRANCH=$BRANCH" diff --git a/design-consultation/SKILL.md b/design-consultation/SKILL.md index d92d47fd..0d919950 100644 --- a/design-consultation/SKILL.md +++ b/design-consultation/SKILL.md @@ -114,7 +114,7 @@ ls src/ app/ pages/ components/ 2>/dev/null | head -30 Look for brainstorm output: ```bash -SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) ls ~/.gstack/projects/$SLUG/*brainstorm* 2>/dev/null | head -5 ls .context/*brainstorm* .context/attachments/*brainstorm* 2>/dev/null | head -5 ``` diff --git a/design-consultation/SKILL.md.tmpl b/design-consultation/SKILL.md.tmpl index 11d868fa..ef5044fb 100644 --- a/design-consultation/SKILL.md.tmpl +++ b/design-consultation/SKILL.md.tmpl @@ -49,7 +49,7 @@ ls src/ app/ pages/ components/ 2>/dev/null | head -30 Look for brainstorm output: ```bash -SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) ls 
~/.gstack/projects/$SLUG/*brainstorm* 2>/dev/null | head -5 ls .context/*brainstorm* .context/attachments/*brainstorm* 2>/dev/null | head -5 ``` diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md index b05db5e9..033f4e5f 100644 --- a/plan-ceo-review/SKILL.md +++ b/plan-ceo-review/SKILL.md @@ -534,6 +534,53 @@ List every ASCII diagram in files this plan touches. Still accurate? ### Unresolved Decisions If any AskUserQuestion goes unanswered, note it here. Never silently default. +## Review Log + +After producing the Completion Summary above, persist the review result: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +mkdir -p ~/.gstack/projects/$SLUG +echo '{"skill":"plan-ceo-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"MODE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl +``` + +Before running this command, substitute the placeholder values from the Completion Summary you just produced: +- **TIMESTAMP**: current ISO 8601 datetime (e.g., 2026-03-16T14:30:00) +- **STATUS**: "clean" if 0 unresolved decisions AND 0 critical gaps; otherwise "issues_open" +- **unresolved**: number from "Unresolved decisions" in the summary +- **critical_gaps**: number from "Failure modes: ___ CRITICAL GAPS" in the summary +- **MODE**: the mode the user selected (SCOPE_EXPANSION / HOLD_SCOPE / SCOPE_REDUCTION) + +## Review Readiness Dashboard + +After completing the review, read the review log to display the dashboard. + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +cat ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_REVIEWS" +``` + +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. 
Display: + +``` ++====================================================================+ +| REVIEW READINESS DASHBOARD | ++====================================================================+ +| Review | Runs | Last Run | Status | +|-----------------|------|---------------------|----------------------| +| CEO Review | 1 | 2026-03-16 14:30 | CLEAR | +| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | +| Design Review | 0 | — | NOT YET RUN | ++--------------------------------------------------------------------+ +| VERDICT: 2/3 CLEAR — Design Review not yet run | ++====================================================================+ +``` + +**Verdict logic:** +- **CLEARED TO SHIP (3/3)**: All three have >= 1 entry within 7 days AND most recent status is "clean" +- **N/3 CLEAR**: Show count and list which are missing, have open issues, or are stale (>7 days) +- Informational only — does NOT block. + ## Formatting Rules * NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...). * Label with NUMBER + LETTER (e.g., "3A", "3B"). diff --git a/plan-ceo-review/SKILL.md.tmpl b/plan-ceo-review/SKILL.md.tmpl index 9902fafb..8664d095 100644 --- a/plan-ceo-review/SKILL.md.tmpl +++ b/plan-ceo-review/SKILL.md.tmpl @@ -452,6 +452,25 @@ List every ASCII diagram in files this plan touches. Still accurate? ### Unresolved Decisions If any AskUserQuestion goes unanswered, note it here. Never silently default. 
+## Review Log + +After producing the Completion Summary above, persist the review result: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +mkdir -p ~/.gstack/projects/$SLUG +echo '{"skill":"plan-ceo-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"MODE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl +``` + +Before running this command, substitute the placeholder values from the Completion Summary you just produced: +- **TIMESTAMP**: current ISO 8601 datetime (e.g., 2026-03-16T14:30:00) +- **STATUS**: "clean" if 0 unresolved decisions AND 0 critical gaps; otherwise "issues_open" +- **unresolved**: number from "Unresolved decisions" in the summary +- **critical_gaps**: number from "Failure modes: ___ CRITICAL GAPS" in the summary +- **MODE**: the mode the user selected (SCOPE_EXPANSION / HOLD_SCOPE / SCOPE_REDUCTION) + +{{REVIEW_DASHBOARD}} + ## Formatting Rules * NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...). * Label with NUMBER + LETTER (e.g., "3A", "3B"). diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md index 1231d968..8fb0bfeb 100644 --- a/plan-design-review/SKILL.md +++ b/plan-design-review/SKILL.md @@ -384,7 +384,7 @@ Compare screenshots and observations across pages for: **Project-scoped:** ```bash -SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) mkdir -p ~/.gstack/projects/$SLUG ``` Write to: `~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md` @@ -557,3 +557,49 @@ Project type: {web app / dashboard / marketing site / etc.} 11. **Never fix anything.** Find and document only. Do not read source code, edit files, or suggest code fixes. Your job is to report what could be better and suggest design improvements. Use `/qa-design-review` for the fix loop. 12. 
**The exception:** You MAY write a DESIGN.md file if the user accepts the offer. This is the only file you create. + +## Review Log + +After compiling the report, persist the review result: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +mkdir -p ~/.gstack/projects/$SLUG +echo '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","design_score":"GRADE","ai_slop_score":"GRADE","mode":"MODE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl +``` + +Substitute values from the report: +- **TIMESTAMP**: current ISO 8601 datetime +- **STATUS**: "clean" if Design Score is A or B; "issues_open" if C, D, or F +- **GRADE**: the letter grade from the report (Design Score and AI Slop Score respectively) +- **MODE**: Full / Quick / Deep / Diff-aware / Regression + +## Review Readiness Dashboard + +After completing the review, read the review log to display the dashboard. + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +cat ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_REVIEWS" +``` + +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. 
Display: + +``` ++====================================================================+ +| REVIEW READINESS DASHBOARD | ++====================================================================+ +| Review | Runs | Last Run | Status | +|-----------------|------|---------------------|----------------------| +| CEO Review | 1 | 2026-03-16 14:30 | CLEAR | +| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | +| Design Review | 0 | — | NOT YET RUN | ++--------------------------------------------------------------------+ +| VERDICT: 2/3 CLEAR — Design Review not yet run | ++====================================================================+ +``` + +**Verdict logic:** +- **CLEARED TO SHIP (3/3)**: All three have >= 1 entry within 7 days AND most recent status is "clean" +- **N/3 CLEAR**: Show count and list which are missing, have open issues, or are stale (>7 days) +- Informational only — does NOT block. diff --git a/plan-design-review/SKILL.md.tmpl b/plan-design-review/SKILL.md.tmpl index b381c682..97546805 100644 --- a/plan-design-review/SKILL.md.tmpl +++ b/plan-design-review/SKILL.md.tmpl @@ -145,3 +145,21 @@ Project type: {web app / dashboard / marketing site / etc.} 11. **Never fix anything.** Find and document only. Do not read source code, edit files, or suggest code fixes. Your job is to report what could be better and suggest design improvements. Use `/qa-design-review` for the fix loop. 12. **The exception:** You MAY write a DESIGN.md file if the user accepts the offer. This is the only file you create. 
+ +## Review Log + +After compiling the report, persist the review result: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +mkdir -p ~/.gstack/projects/$SLUG +echo '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","design_score":"GRADE","ai_slop_score":"GRADE","mode":"MODE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl +``` + +Substitute values from the report: +- **TIMESTAMP**: current ISO 8601 datetime +- **STATUS**: "clean" if Design Score is A or B; "issues_open" if C, D, or F +- **GRADE**: the letter grade from the report (Design Score and AI Slop Score respectively) +- **MODE**: Full / Quick / Deep / Diff-aware / Regression + +{{REVIEW_DASHBOARD}} diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md index 4cd8f481..b5c0775b 100644 --- a/plan-eng-review/SKILL.md +++ b/plan-eng-review/SKILL.md @@ -154,8 +154,7 @@ For LLM/prompt changes: check the "Prompt/LLM changes" file patterns listed in C After producing the test diagram, write a test plan artifact to the project directory so `/qa` and `/qa-only` can consume it as primary test input (replacing the lossy git-diff heuristic): ```bash -SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') -BRANCH=$(git rev-parse --abbrev-ref HEAD) +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) USER=$(whoami) DATETIME=$(date +%Y%m%d-%H%M%S) mkdir -p ~/.gstack/projects/$SLUG @@ -259,5 +258,52 @@ Check the git log for this branch. If there are prior commits suggesting a previ * One sentence max per option. Pick in under 5 seconds. * After each review section, pause and ask for feedback before moving on. 
+## Review Log + +After producing the Completion Summary above, persist the review result: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +mkdir -p ~/.gstack/projects/$SLUG +echo '{"skill":"plan-eng-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"MODE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl +``` + +Substitute values from the Completion Summary: +- **TIMESTAMP**: current ISO 8601 datetime +- **STATUS**: "clean" if 0 unresolved decisions AND 0 critical gaps; otherwise "issues_open" +- **unresolved**: number from "Unresolved decisions" count +- **critical_gaps**: number from "Failure modes: ___ critical gaps flagged" +- **MODE**: SCOPE_REDUCTION / BIG_CHANGE / SMALL_CHANGE + +## Review Readiness Dashboard + +After completing the review, read the review log to display the dashboard. + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +cat ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_REVIEWS" +``` + +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. 
Display: + +``` ++====================================================================+ +| REVIEW READINESS DASHBOARD | ++====================================================================+ +| Review | Runs | Last Run | Status | +|-----------------|------|---------------------|----------------------| +| CEO Review | 1 | 2026-03-16 14:30 | CLEAR | +| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | +| Design Review | 0 | — | NOT YET RUN | ++--------------------------------------------------------------------+ +| VERDICT: 2/3 CLEAR — Design Review not yet run | ++====================================================================+ +``` + +**Verdict logic:** +- **CLEARED TO SHIP (3/3)**: All three have >= 1 entry within 7 days AND most recent status is "clean" +- **N/3 CLEAR**: Show count and list which are missing, have open issues, or are stale (>7 days) +- Informational only — does NOT block. + ## Unresolved decisions If the user does not respond to an AskUserQuestion or interrupts to move on, note which decisions were left unresolved. At the end of the review, list these as "Unresolved decisions that may bite you later" — never silently default to an option. 
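The verdict logic in the Review Readiness Dashboard section is given to the agent as prose. A minimal TypeScript sketch of the same rules follows, as a reading aid only. The names `ReviewEntry`, `parseReviewLog`, and `computeVerdict` are hypothetical and do not exist in gstack. The sketch assumes each JSONL line carries at least `skill`, `timestamp`, and `status`, and that `timestamp` is something `Date.parse` accepts (a value without a UTC offset is read as local time).

```typescript
// Hypothetical sketch of the dashboard verdict rules; not gstack's actual code.
type ReviewEntry = { skill: string; timestamp: string; status: string };

const REVIEW_SKILLS = ["plan-ceo-review", "plan-eng-review", "plan-design-review"];
const MAX_AGE_MS = 7 * 24 * 60 * 60 * 1000; // entries older than 7 days are ignored

// Parse the JSONL log, skipping blank lines, the NO_REVIEWS sentinel,
// and any line that is not a JSON object with the three expected string fields.
function parseReviewLog(jsonl: string): ReviewEntry[] {
  const entries: ReviewEntry[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed === "NO_REVIEWS") continue;
    try {
      const parsed = JSON.parse(trimmed);
      if (
        typeof parsed?.skill === "string" &&
        typeof parsed?.timestamp === "string" &&
        typeof parsed?.status === "string"
      ) {
        entries.push(parsed);
      }
    } catch {
      // Malformed line: ignore it rather than fail the whole dashboard.
    }
  }
  return entries;
}

// Keep the most recent non-stale entry per skill, then count how many are "clean".
function computeVerdict(jsonl: string, now: Date): string {
  const latest = new Map<string, ReviewEntry>();
  for (const entry of parseReviewLog(jsonl)) {
    const time = Date.parse(entry.timestamp);
    if (Number.isNaN(time) || now.getTime() - time > MAX_AGE_MS) continue;
    const prev = latest.get(entry.skill);
    if (!prev || Date.parse(prev.timestamp) <= time) latest.set(entry.skill, entry);
  }
  const clear = REVIEW_SKILLS.filter((s) => latest.get(s)?.status === "clean").length;
  return clear === REVIEW_SKILLS.length
    ? "CLEARED TO SHIP (3/3)"
    : `${clear}/${REVIEW_SKILLS.length} CLEAR`;
}
```

Because only the most recent entry per skill counts, a later `issues_open` run overrides an earlier `clean` one. This matches the "most recent status" wording in the verdict logic.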
diff --git a/plan-eng-review/SKILL.md.tmpl b/plan-eng-review/SKILL.md.tmpl index 410b0726..4905d8d3 100644 --- a/plan-eng-review/SKILL.md.tmpl +++ b/plan-eng-review/SKILL.md.tmpl @@ -89,8 +89,7 @@ For LLM/prompt changes: check the "Prompt/LLM changes" file patterns listed in C After producing the test diagram, write a test plan artifact to the project directory so `/qa` and `/qa-only` can consume it as primary test input (replacing the lossy git-diff heuristic): ```bash -SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') -BRANCH=$(git rev-parse --abbrev-ref HEAD) +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) USER=$(whoami) DATETIME=$(date +%Y%m%d-%H%M%S) mkdir -p ~/.gstack/projects/$SLUG @@ -194,5 +193,24 @@ Check the git log for this branch. If there are prior commits suggesting a previ * One sentence max per option. Pick in under 5 seconds. * After each review section, pause and ask for feedback before moving on. +## Review Log + +After producing the Completion Summary above, persist the review result: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +mkdir -p ~/.gstack/projects/$SLUG +echo '{"skill":"plan-eng-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"MODE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl +``` + +Substitute values from the Completion Summary: +- **TIMESTAMP**: current ISO 8601 datetime +- **STATUS**: "clean" if 0 unresolved decisions AND 0 critical gaps; otherwise "issues_open" +- **unresolved**: number from "Unresolved decisions" count +- **critical_gaps**: number from "Failure modes: ___ critical gaps flagged" +- **MODE**: SCOPE_REDUCTION / BIG_CHANGE / SMALL_CHANGE + +{{REVIEW_DASHBOARD}} + ## Unresolved decisions If the user does not respond to an AskUserQuestion or interrupts to move on, note which decisions were left unresolved. 
At the end of the review, list these as "Unresolved decisions that may bite you later" — never silently default to an option. diff --git a/qa-design-review/SKILL.md b/qa-design-review/SKILL.md index 7fa0b856..0d8d0771 100644 --- a/qa-design-review/SKILL.md +++ b/qa-design-review/SKILL.md @@ -396,7 +396,7 @@ Compare screenshots and observations across pages for: **Project-scoped:** ```bash -SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) mkdir -p ~/.gstack/projects/$SLUG ``` Write to: `~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md` @@ -603,7 +603,7 @@ Write the report to both local and project-scoped locations: **Project-scoped:** ```bash -SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) mkdir -p ~/.gstack/projects/$SLUG ``` Write to `~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md` diff --git a/qa-design-review/SKILL.md.tmpl b/qa-design-review/SKILL.md.tmpl index 86e9aab9..0053a494 100644 --- a/qa-design-review/SKILL.md.tmpl +++ b/qa-design-review/SKILL.md.tmpl @@ -191,7 +191,7 @@ Write the report to both local and project-scoped locations: **Project-scoped:** ```bash -SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) mkdir -p ~/.gstack/projects/$SLUG ``` Write to `~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md` diff --git a/qa-only/SKILL.md b/qa-only/SKILL.md index ed732bf5..36f5fead 100644 --- a/qa-only/SKILL.md +++ b/qa-only/SKILL.md @@ -134,7 +134,7 @@ Before falling back to git diff heuristics, check for richer test plan sources: 1. 
**Project-scoped test plans:** Check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo ```bash - SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') + eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1 ``` 2. **Conversation context:** Check if a prior `/plan-eng-review` or `/plan-ceo-review` produced test plan output in this conversation @@ -427,7 +427,7 @@ Write the report to both local and project-scoped locations: **Project-scoped:** Write test outcome artifact for cross-session context: ```bash -SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) mkdir -p ~/.gstack/projects/$SLUG ``` Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md` diff --git a/qa-only/SKILL.md.tmpl b/qa-only/SKILL.md.tmpl index 5d49e201..101cd71c 100644 --- a/qa-only/SKILL.md.tmpl +++ b/qa-only/SKILL.md.tmpl @@ -52,7 +52,7 @@ Before falling back to git diff heuristics, check for richer test plan sources: 1. **Project-scoped test plans:** Check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo ```bash - SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') + eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1 ``` 2. 
**Conversation context:** Check if a prior `/plan-eng-review` or `/plan-ceo-review` produced test plan output in this conversation @@ -72,7 +72,7 @@ Write the report to both local and project-scoped locations: **Project-scoped:** Write test outcome artifact for cross-session context: ```bash -SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) mkdir -p ~/.gstack/projects/$SLUG ``` Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md` diff --git a/qa/SKILL.md b/qa/SKILL.md index ac12fc94..9bd8fc9b 100644 --- a/qa/SKILL.md +++ b/qa/SKILL.md @@ -171,7 +171,7 @@ Before falling back to git diff heuristics, check for richer test plan sources: 1. **Project-scoped test plans:** Check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo ```bash - SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') + eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1 ``` 2. 
**Conversation context:** Check if a prior `/plan-eng-review` or `/plan-ceo-review` produced test plan output in this conversation @@ -579,7 +579,7 @@ Write the report to both local and project-scoped locations: **Project-scoped:** Write test outcome artifact for cross-session context: ```bash -SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) mkdir -p ~/.gstack/projects/$SLUG ``` Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md` diff --git a/qa/SKILL.md.tmpl b/qa/SKILL.md.tmpl index f491999e..45dfbea6 100644 --- a/qa/SKILL.md.tmpl +++ b/qa/SKILL.md.tmpl @@ -72,7 +72,7 @@ Before falling back to git diff heuristics, check for richer test plan sources: 1. **Project-scoped test plans:** Check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo ```bash - SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') + eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1 ``` 2. 
**Conversation context:** Check if a prior `/plan-eng-review` or `/plan-ceo-review` produced test plan output in this conversation @@ -207,7 +207,7 @@ Write the report to both local and project-scoped locations: **Project-scoped:** Write test outcome artifact for cross-session context: ```bash -SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-') +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) mkdir -p ~/.gstack/projects/$SLUG ``` Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md` diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts index 5d5f84bb..a9d3bce6 100644 --- a/scripts/gen-skill-docs.ts +++ b/scripts/gen-skill-docs.ts @@ -730,7 +730,7 @@ Compare screenshots and observations across pages for: **Project-scoped:** \`\`\`bash -SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\\([^/]*/[^/]*\\)\\.git$|\\1|;s|.*[:/]\\([^/]*/[^/]*\\)$|\\1|' | tr '/' '-') +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) mkdir -p ~/.gstack/projects/$SLUG \`\`\` Write to: \`~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md\` @@ -814,6 +814,38 @@ Tie everything to user goals and product objectives. Always suggest specific imp 11. **Show screenshots to the user.** After every \`$B screenshot\`, \`$B snapshot -a -o\`, or \`$B responsive\` command, use the Read tool on the output file(s) so the user can see them inline. For \`responsive\` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.`; } +function generateReviewDashboard(): string { + return `## Review Readiness Dashboard + +After completing the review, read the review log to display the dashboard. + +\`\`\`bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +cat ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_REVIEWS" +\`\`\` + +Parse the output. 
Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. Display: + +\`\`\` ++====================================================================+ +| REVIEW READINESS DASHBOARD | ++====================================================================+ +| Review | Runs | Last Run | Status | +|-----------------|------|---------------------|----------------------| +| CEO Review | 1 | 2026-03-16 14:30 | CLEAR | +| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | +| Design Review | 0 | — | NOT YET RUN | ++--------------------------------------------------------------------+ +| VERDICT: 2/3 CLEAR — Design Review not yet run | ++====================================================================+ +\`\`\` + +**Verdict logic:** +- **CLEARED TO SHIP (3/3)**: All three have >= 1 entry within 7 days AND most recent status is "clean" +- **N/3 CLEAR**: Show count and list which are missing, have open issues, or are stale (>7 days) +- Informational only — does NOT block.`; +} + const RESOLVERS: Record string> = { COMMAND_REFERENCE: generateCommandReference, SNAPSHOT_FLAGS: generateSnapshotFlags, @@ -822,6 +854,7 @@ const RESOLVERS: Record string> = { BASE_BRANCH_DETECT: generateBaseBranchDetect, QA_METHODOLOGY: generateQAMethodology, DESIGN_METHODOLOGY: generateDesignMethodology, + REVIEW_DASHBOARD: generateReviewDashboard, }; // ─── Template Processing ──────────────────────────────────── diff --git a/ship/SKILL.md b/ship/SKILL.md index 8aa4a181..e7b8b753 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -132,6 +132,43 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat 3. Run `git diff ...HEAD --stat` and `git log ..HEAD --oneline` to understand what's being shipped. +4. Check review readiness: + +## Review Readiness Dashboard + +After completing the review, read the review log to display the dashboard. 
+ +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +cat ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_REVIEWS" +``` + +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review). Ignore entries with timestamps older than 7 days. Display: + +``` ++====================================================================+ +| REVIEW READINESS DASHBOARD | ++====================================================================+ +| Review | Runs | Last Run | Status | +|-----------------|------|---------------------|----------------------| +| CEO Review | 1 | 2026-03-16 14:30 | CLEAR | +| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | +| Design Review | 0 | — | NOT YET RUN | ++--------------------------------------------------------------------+ +| VERDICT: 2/3 CLEAR — Design Review not yet run | ++====================================================================+ +``` + +**Verdict logic:** +- **CLEARED TO SHIP (3/3)**: All three have >= 1 entry within 7 days AND most recent status is "clean" +- **N/3 CLEAR**: Show count and list which are missing, have open issues, or are stale (>7 days) +- Informational only — does NOT block. + +If the verdict is NOT "CLEARED TO SHIP (3/3)", use AskUserQuestion: +- Show which reviews are missing or have open issues +- RECOMMENDATION: Choose B (run missing reviews first) unless the change is trivial +- Options: A) Ship anyway B) Abort — run missing review(s) first C) Reviews not relevant for this change + --- ## Step 2: Merge the base branch (BEFORE tests) diff --git a/ship/SKILL.md.tmpl b/ship/SKILL.md.tmpl index 9339e90c..2a24bea3 100644 --- a/ship/SKILL.md.tmpl +++ b/ship/SKILL.md.tmpl @@ -50,6 +50,15 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat 3. Run `git diff ...HEAD --stat` and `git log ..HEAD --oneline` to understand what's being shipped. +4. 
Check review readiness: + +{{REVIEW_DASHBOARD}} + +If the verdict is NOT "CLEARED TO SHIP (3/3)", use AskUserQuestion: +- Show which reviews are missing or have open issues +- RECOMMENDATION: Choose B (run missing reviews first) unless the change is trivial +- Options: A) Ship anyway B) Abort — run missing review(s) first C) Reviews not relevant for this change + --- ## Step 2: Merge the base branch (BEFORE tests) diff --git a/test/gen-skill-docs.test.ts b/test/gen-skill-docs.test.ts index def042c9..26de63db 100644 --- a/test/gen-skill-docs.test.ts +++ b/test/gen-skill-docs.test.ts @@ -322,3 +322,30 @@ describe('description quality evals', () => { expect(tipsSection).not.toContain('->'); }); }); + +describe('REVIEW_DASHBOARD resolver', () => { + const REVIEW_SKILLS = ['plan-ceo-review', 'plan-eng-review', 'plan-design-review']; + + for (const skill of REVIEW_SKILLS) { + test(`review dashboard appears in ${skill} generated file`, () => { + const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8'); + expect(content).toContain('reviews.jsonl'); + expect(content).toContain('REVIEW READINESS DASHBOARD'); + }); + } + + test('review dashboard appears in ship generated file', () => { + const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8'); + expect(content).toContain('reviews.jsonl'); + expect(content).toContain('REVIEW READINESS DASHBOARD'); + }); + + test('resolver output contains key dashboard elements', () => { + const content = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8'); + expect(content).toContain('VERDICT'); + expect(content).toContain('CLEARED TO SHIP'); + expect(content).toContain('NOT YET RUN'); + expect(content).toContain('7 days'); + expect(content).toContain('Design Review'); + }); +}); diff --git a/test/skill-validation.test.ts b/test/skill-validation.test.ts index 83cdca6e..78a9bef7 100644 --- a/test/skill-validation.test.ts +++ b/test/skill-validation.test.ts @@ -665,3 +665,45 @@ 
describe('Planted-bug fixture validation', () => { expect(content).toContain('update_column'); }); }); + +// --- gstack-slug helper --- + +describe('gstack-slug', () => { + const SLUG_BIN = path.join(ROOT, 'bin', 'gstack-slug'); + + test('binary exists and is executable', () => { + expect(fs.existsSync(SLUG_BIN)).toBe(true); + const stat = fs.statSync(SLUG_BIN); + expect(stat.mode & 0o111).toBeGreaterThan(0); + }); + + test('outputs SLUG and BRANCH lines in a git repo', () => { + const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' }); + expect(result.exitCode).toBe(0); + const output = result.stdout.toString(); + expect(output).toContain('SLUG='); + expect(output).toContain('BRANCH='); + }); + + test('SLUG does not contain forward slashes', () => { + const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' }); + const slug = result.stdout.toString().match(/SLUG=(.*)/)?.[1] ?? ''; + expect(slug).not.toContain('/'); + expect(slug.length).toBeGreaterThan(0); + }); + + test('BRANCH does not contain forward slashes', () => { + const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' }); + const branch = result.stdout.toString().match(/BRANCH=(.*)/)?.[1] ?? 
''; + expect(branch).not.toContain('/'); + expect(branch.length).toBeGreaterThan(0); + }); + + test('output is eval-compatible (KEY=VALUE format)', () => { + const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' }); + const lines = result.stdout.toString().trim().split('\n'); + expect(lines.length).toBe(2); + expect(lines[0]).toMatch(/^SLUG=.+/); + expect(lines[1]).toMatch(/^BRANCH=.+/); + }); +}); From c99757b522efaff4011f5210126afc26695b827c Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Tue, 17 Mar 2026 10:49:22 -0500 Subject: [PATCH 4/4] =?UTF-8?q?feat:=20/design-consultation=20=E2=80=94=20?= =?UTF-8?q?risk-taking,=20visual=20research,=20ambitious=20preview=20(v0.5?= =?UTF-8?q?.2)=20(#131)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * feat: /design-consultation — risk-taking thesis, visual research, ambitious preview Add SAFE/RISK breakdown to design proposals so users see which choices match category conventions vs. which are deliberate creative departures. Wire browse binary for visual competitive research — agent browses competitor sites, takes screenshots, and analyzes fonts/colors/spacing with graceful degradation to WebSearch-only or built-in knowledge. Upgrade preview page instructions to render realistic product mockups (dashboards, marketing pages, settings forms) instead of just swatches. Rewrite README section with the thesis: "coherence is table stakes — the real question is where you take risks." * chore: bump version and changelog (v0.5.2) Co-Authored-By: Claude Opus 4.6 * fix: restore SKILL.md files to match main Prior commit included SKILL.md files regenerated from stale templates. Restore to match origin/main content. 
--------- Co-authored-by: Claude Opus 4.6 --- CHANGELOG.md | 7 ++- README.md | 67 ++++++++++++++--------- VERSION | 2 +- design-consultation/SKILL.md | 89 ++++++++++++++++++++++++++----- design-consultation/SKILL.md.tmpl | 72 ++++++++++++++++++++----- 5 files changed, 183 insertions(+), 54 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a4bd0786..38a4aef0 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,7 +1,12 @@ # Changelog -## 0.5.1 — 2026-03-17 +## 0.5.2 — 2026-03-17 +- **Your design consultant now takes creative risks.** `/design-consultation` doesn't just propose a safe, coherent system — it explicitly breaks down SAFE CHOICES (category baseline) vs. RISKS (where your product stands out). You pick which rules to break. Every risk comes with a rationale for why it works and what it costs. +- **See the competition before you choose.** When you opt into research, the agent browses competitor sites with screenshots and accessibility tree analysis — not just web search results. You see what the landscape looks like before making design decisions. +- **Preview pages that look like your product.** The preview page now renders realistic product mockups — dashboards with sidebar nav and data tables, marketing pages with hero sections, settings pages with forms — not just font swatches and color palettes. + +## 0.5.1 — 2026-03-17 - **Know where you stand before you ship.** Every `/plan-ceo-review`, `/plan-eng-review`, and `/plan-design-review` now logs its result to a review tracker. At the end of each review, you see a **Review Readiness Dashboard** showing which reviews are done, when they ran, and whether they're clean — with a clear CLEARED TO SHIP or NOT READY verdict. - **`/ship` checks your reviews before creating the PR.** Pre-flight now reads the dashboard and asks if you want to continue when reviews are missing. Informational only — it won't block you, but you'll know what you skipped. 
- **One less thing to copy-paste.** The SLUG computation (that opaque sed pipeline for computing `owner-repo` from git remote) is now a shared `bin/gstack-slug` helper. All 14 inline copies across templates replaced with `eval $(gstack-slug)`. If the format ever changes, fix it once. diff --git a/README.md b/README.md index de6144ed..fd4b684f 100644 --- a/README.md +++ b/README.md @@ -20,7 +20,7 @@ Thirteen opinionated workflow skills for [Claude Code](https://docs.anthropic.co | `/plan-ceo-review` | Founder / CEO | Rethink the problem. Find the 10-star product hiding inside the request. | | `/plan-eng-review` | Eng manager / tech lead | Lock in architecture, data flow, diagrams, edge cases, and tests. | | `/plan-design-review` | Senior product designer | Designer's eye audit. 80-item checklist, letter grades, AI Slop detection, DESIGN.md inference. Report only — never touches code. | -| `/design-consultation` | Design consultant | Build a complete design system from scratch. Researches competitors, proposes aesthetic + typography + color + spacing + motion, generates a preview page, and writes DESIGN.md. | +| `/design-consultation` | Design consultant | Build a complete design system from scratch. Browses competitors to get in the ballpark, proposes safe choices AND creative risks, generates realistic product mockups, and writes DESIGN.md. | | `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Triages Greptile review comments. | | `/ship` | Release engineer | Sync main, run tests, resolve Greptile reviews, push, open PR. For a ready branch, not for deciding what to build. | | `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. | @@ -344,11 +344,15 @@ This is my **design partner mode**. `/plan-design-review` audits a site that already exists. 
`/design-consultation` is for when you have nothing yet — no design system, no font choices, no color palette. You are starting from zero and you want a senior designer to sit down with you and build the whole visual identity together. -It is a conversation, not a form. The agent asks about your product, your users, and your space. If you want, it researches what top competitors in your category are doing — fonts, colors, layout patterns, aesthetic direction. Then it proposes a complete, coherent design system: aesthetic direction, typography (3+ fonts with specific roles), color palette with hex values, spacing scale, layout approach, and motion strategy. Every recommendation comes with a rationale. Every choice reinforces every other choice. +It is a conversation, not a form. The agent asks about your product, your users, and your audience. It thinks about what your product needs to communicate — trust, speed, craft, warmth, whatever fits — and works backward from that to concrete choices. Then it proposes a complete, coherent design system: aesthetic direction, typography (3+ fonts with specific roles), color palette with hex values, spacing scale, layout approach, and motion strategy. Every recommendation comes with a rationale. Every choice reinforces every other choice. -The key insight: individual design decisions are easy to make but hard to make coherently. Picking a font is simple. Picking a font that works with your color palette, your spacing density, your aesthetic direction, and your product's personality — that is the actual job. `/design-consultation` solves the coherence problem by proposing everything as one package, then letting you adjust any piece while it checks that the rest still holds together. +But coherence is table stakes. Every dev tool dashboard looks the same — clean sans-serif, muted grays, a blue accent. They are all coherent. They are all forgettable. 
The difference between a product that looks "nice" and one that people actually recognize is the **deliberate creative risks**: the unexpected serif for headings, the bold accent nobody else in your category uses, the tighter spacing that makes your data feel authoritative instead of airy. -After you agree on the system, it generates an interactive HTML preview page — your fonts rendered at real sizes, your color palette as actual UI components (buttons, cards, form inputs, alerts), light/dark mode toggle, all using your product name instead of lorem ipsum. You open the page in your browser and see the design system rendered before a single line of product code is written. +That is what `/design-consultation` is really about. It does not just propose a safe system. It proposes safe choices AND risks — and tells you which is which. "Here are the choices that keep you literate in your category. And here is where I think you should break from convention, and why." You pick which risks to take. The agent checks that the whole system still coheres either way. + +If you want, the agent will browse your competitors' actual sites — take screenshots, analyze their fonts and colors and spacing — so you can see the landscape before you make choices. This is not about copying. It is about getting in the ballpark so you know what the conventions are, and then deciding which ones are worth breaking. + +After you agree on the system, it generates an interactive HTML preview page — not just swatches and font samples, but realistic product pages. If you are building a dashboard, you see a dashboard with a sidebar, data tables, and stat cards. If you are building a marketing site, you see a hero section with real copy and a CTA. Everything rendered in your design system, with your product name, in light and dark mode. You see what your product could feel like before a single line of production code is written. 
Then it writes `DESIGN.md` to your repo root — your project's design source of truth — and updates `CLAUDE.md` so every future Claude Code session respects the system. From that point on, `/qa-design-review` can audit against it, and any agent working on your frontend knows the rules. @@ -356,44 +360,59 @@ Then it writes `DESIGN.md` to your repo root — your project's design source of You: /design-consultation Claude: From the README, this looks like a developer analytics dashboard - for engineering teams. Sound right? And would you like me to - research what other dev tools are doing for design, or should - I work from what I know? + for engineering teams. Sound right? Tell me about the feeling + you want — and would you like me to browse some competitors + first so we can see the landscape? -You: Yes, it's a dashboard for eng managers. Research would be great. +You: Yes, it's for eng managers. Show me what's out there. -Claude: I looked at Linear, Vercel, Railway, Planetscale, and Posthog. - They converge on clean sans-serifs and muted palettes, but - Linear stands out with its dark-first approach and tight spacing. - The gap: most of these feel interchangeable. Here's my proposal: +Claude: [browses Linear, Vercel, Datadog, Posthog — takes screenshots] - AESTHETIC: Industrial/Utilitarian — data-first, no decoration tax + Here's the landscape. They all converge on clean sans-serifs + and muted blue-gray palettes. Linear stands out with dark-first + and tight spacing. The rest feel interchangeable. + + Here's my proposal — and where I'd play it safe vs. 
take risks: + + AESTHETIC: Industrial/Utilitarian — data-first, every pixel earns its place TYPOGRAPHY: Geist (body + UI), Instrument Serif (display), JetBrains Mono (data) COLOR: Restrained — slate neutrals, single teal accent (#0D9488) SPACING: 4px base, compact density LAYOUT: Grid-disciplined, 12-column MOTION: Minimal-functional — only transitions that aid comprehension - This coheres because the utilitarian aesthetic demands a tight - spacing grid and restrained color. The serif display font creates - hierarchy contrast without adding visual noise. The monospace for - data tables feels native to the engineering audience. + SAFE CHOICES (match category expectations): + Geist for body — your users already know this font from Vercel + Slate neutrals — warm grays feel right for data-heavy UIs + Grid-disciplined layout — dashboards need predictable alignment - Want to adjust anything, or should I generate the preview page? + RISKS (where you'd stand out): + Instrument Serif for display headings — nobody in dev tools + uses a serif. It creates instant hierarchy contrast and + signals "we care about craft" in a space full of sans-serifs. + Single teal accent instead of blue — every competitor uses blue. + Teal is close enough to feel trustworthy, different enough + to be recognizable. + 4px compact density — tighter than Linear, says "we respect + your screen real estate." -You: Generate it. + The safe choices keep you literate. The risks are where your + product gets its own face. Which risks appeal to you? + +You: Love the serif risk. Let's see it. Claude: Preview opened in your browser. [/tmp/design-consultation-preview-1710000000.html] - The page shows Instrument Serif at hero scale, Geist in body - paragraphs, JetBrains Mono in a sample data table. Color swatches - with your teal accent against slate grays. Buttons, cards, and - alerts rendered in both light and dark mode. 
+ The page shows a realistic dashboard: sidebar nav, header with + avatar, stat cards, data table — all in your design system. + Instrument Serif at hero scale, Geist in body text, JetBrains + Mono in the data table. Teal accent on buttons and active states. + Light and dark mode toggle in the corner. How does it feel? Want to adjust anything before I write DESIGN.md? -You: Looks great. Ship it. +You: Ship it. Claude: Wrote DESIGN.md (typography, color, spacing, layout, motion). Updated CLAUDE.md with design system guidelines. diff --git a/VERSION b/VERSION index 4b9fcbec..cb0c939a 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.5.1 +0.5.2 diff --git a/design-consultation/SKILL.md b/design-consultation/SKILL.md index 0d919950..f04e2213 100644 --- a/design-consultation/SKILL.md +++ b/design-consultation/SKILL.md @@ -123,6 +123,29 @@ If brainstorm output exists, read it — the product context is pre-filled. If the codebase is empty and purpose is unclear, say: *"I don't have a clear picture of what you're building yet. Want to brainstorm first with `/brainstorm`? Once we know the product direction, we can set up the design system."* +**Find the browse binary (optional — enables visual competitive research):** + +## SETUP (run this check BEFORE any browse command) + +```bash +_ROOT=$(git rev-parse --show-toplevel 2>/dev/null) +B="" +[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse" +[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse +if [ -x "$B" ]; then + echo "READY: $B" +else + echo "NEEDS_SETUP" +fi +``` + +If `NEEDS_SETUP`: +1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait. +2. Run: `cd && ./setup` +3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash` + +If browse is not available, that's fine — visual research is optional. 
The skill works without it using WebSearch and your built-in design knowledge. + --- ## Phase 1: Product Context @@ -143,17 +166,40 @@ If the README or brainstorm gives you enough context, pre-fill and confirm: *"Fr If the user wants competitive research: +**Step 1: Identify competitors via WebSearch** + Use WebSearch to find 5-10 products in their space. Search for: - "[product category] website design" - "[product category] best websites 2025" - "best [industry] web apps" -For each competitor found, note: fonts used, color palette, layout approach, aesthetic direction. +**Step 2: Visual research via browse (if available)** -Summarize your findings conversationally: -> "I looked at [competitors]. They tend toward [patterns] — lots of [common choices]. The opportunity to be distinctive is [gap]. Here's what I'd recommend based on this..." +If the browse binary is available (`$B` is set), visit the top 3-5 competitor sites and capture visual evidence: -If WebSearch is unavailable or returns poor results, fall back gracefully: *"Couldn't get good research results, so I'll work from my design knowledge of the [industry] space."* +```bash +$B goto "https://competitor-site.com" +$B screenshot "/tmp/design-research-competitor-name.png" +$B snapshot +``` + +For each competitor, analyze: fonts actually used, color palette, layout approach, spacing density, aesthetic direction. The screenshot gives you the feel; the snapshot gives you structural data. + +If a competitor site blocks the headless browser or requires login, skip it and note why. + +If browse is not available, rely on WebSearch results and your built-in design knowledge — this is fine. + +**Step 3: Synthesize findings** + +The goal of research is NOT to copy. It is to get in the ballpark — to understand the visual language users in this category already expect. This gives you the baseline. 
The interesting design work starts after you have the baseline: deciding where to follow conventions (so the product feels literate) and where to break from them (so the product is memorable). + +Summarize conversationally: +> "I looked at [competitors]. Here's the landscape: they converge on [patterns]. Most of them feel [observation — e.g., interchangeable, polished but generic, etc.]. The opportunity to stand out is [gap]. Here's where I'd play it safe and where I'd take a risk..." + +**Graceful degradation:** +- Browse available → screenshots + snapshots + WebSearch (richest research) +- Browse unavailable → WebSearch only (still good) +- WebSearch also unavailable → agent's built-in design knowledge (always works) If the user said no research, skip entirely and proceed to Phase 3 using your built-in design knowledge. @@ -163,7 +209,7 @@ If the user said no research, skip entirely and proceed to Phase 3 using your bu This is the soul of the skill. Propose EVERYTHING as one coherent package. -**AskUserQuestion Q2 — present the full proposal:** +**AskUserQuestion Q2 — present the full proposal with SAFE/RISK breakdown:** ``` Based on [product context] and [research findings / my design knowledge]: @@ -178,12 +224,21 @@ MOTION: [approach] — [rationale] This system is coherent because [explain how choices reinforce each other]. -Want to adjust anything? You can drill into any section, or just tell me -what feels off and I'll rework it. Or if this looks right, I'll generate -a preview page so you can see the fonts and colors rendered. +SAFE CHOICES (category baseline — your users expect these): + - [2-3 decisions that match category conventions, with rationale for playing safe] + +RISKS (where your product gets its own face): + - [2-3 deliberate departures from convention] + - For each risk: what it is, why it works, what you gain, what it costs + +The safe choices keep you literate in your category. The risks are where +your product becomes memorable. 
Which risks appeal to you? Want to see +different ones? Or adjust anything else? ``` -**Options:** A) Looks great — generate the preview page. B) I want to adjust [section]. C) Start over with a different direction. D) Skip the preview, just write DESIGN.md. +The SAFE/RISK breakdown is critical. Design coherence is table stakes — every product in a category can be coherent and still look identical. The real question is: where do you take creative risks? The agent should always propose at least 2 risks, each with a clear rationale for why the risk is worth taking and what the user gives up. Risks might include: an unexpected typeface for the category, a bold accent color nobody else uses, tighter or looser spacing than the norm, a layout approach that breaks from convention, motion choices that add personality. + +**Options:** A) Looks great — generate the preview page. B) I want to adjust [section]. C) I want different risks — show me wilder options. D) Start over with a different direction. E) Skip the preview, just write DESIGN.md. ### Your Design Knowledge (use to inform proposals — do NOT display as tables) @@ -273,7 +328,7 @@ The agent writes a **single, self-contained HTML file** (no framework dependenci 1. **Loads proposed fonts** from Google Fonts (or Bunny Fonts) via `<link>` tags 2. **Uses the proposed color palette** throughout — dogfood the design system 3. **Shows the product name** (not "Lorem Ipsum") as the hero heading -4. **Font comparison section:** +4.
**Font specimen section:** - Each font candidate shown in its proposed role (hero heading, body paragraph, button label, data table row) - Side-by-side comparison if multiple candidates for one role - Real content that matches the product (e.g., civic tech → government data examples) @@ -281,11 +336,17 @@ The agent writes a **single, self-contained HTML file** (no framework dependenci - Swatches with hex values and names - Sample UI components rendered in the palette: buttons (primary, secondary, ghost), cards, form inputs, alerts (success, warning, error, info) - Background/text color combinations showing contrast -6. **Light/dark mode toggle** using CSS custom properties and a JS toggle button -7. **Clean, professional layout** — the preview page IS a taste signal for the skill -8. **Responsive** — looks good on any screen width +6. **Realistic product mockups** — this is what makes the preview page powerful. Based on the project type from Phase 1, render 2-3 realistic page layouts using the full design system: + - **Dashboard / web app:** sample data table with metrics, sidebar nav, header with user avatar, stat cards + - **Marketing site:** hero section with real copy, feature highlights, testimonial block, CTA + - **Settings / admin:** form with labeled inputs, toggle switches, dropdowns, save button + - **Auth / onboarding:** login form with social buttons, branding, input validation states + - Use the product name, realistic content for the domain, and the proposed spacing/layout/border-radius. The user should see their product (roughly) before writing any code. +7. **Light/dark mode toggle** using CSS custom properties and a JS toggle button +8. **Clean, professional layout** — the preview page IS a taste signal for the skill +9. **Responsive** — looks good on any screen width -The page should make the user think "oh nice, they thought of this." It's selling the design system visually, not just listing hex codes. 
+The page should make the user think "oh nice, they thought of this." It's selling the design system by showing what the product could feel like, not just listing hex codes and font names. If `open` fails (headless environment), tell the user: *"I wrote the preview to [path] — open it in your browser to see the fonts and colors rendered."* diff --git a/design-consultation/SKILL.md.tmpl b/design-consultation/SKILL.md.tmpl index ef5044fb..0ffb121c 100644 --- a/design-consultation/SKILL.md.tmpl +++ b/design-consultation/SKILL.md.tmpl @@ -58,6 +58,12 @@ If brainstorm output exists, read it — the product context is pre-filled. If the codebase is empty and purpose is unclear, say: *"I don't have a clear picture of what you're building yet. Want to brainstorm first with `/brainstorm`? Once we know the product direction, we can set up the design system."* +**Find the browse binary (optional — enables visual competitive research):** + +{{BROWSE_SETUP}} + +If browse is not available, that's fine — visual research is optional. The skill works without it using WebSearch and your built-in design knowledge. + --- ## Phase 1: Product Context @@ -78,17 +84,40 @@ If the README or brainstorm gives you enough context, pre-fill and confirm: *"Fr If the user wants competitive research: +**Step 1: Identify competitors via WebSearch** + Use WebSearch to find 5-10 products in their space. Search for: - "[product category] website design" - "[product category] best websites 2025" - "best [industry] web apps" -For each competitor found, note: fonts used, color palette, layout approach, aesthetic direction. +**Step 2: Visual research via browse (if available)** -Summarize your findings conversationally: -> "I looked at [competitors]. They tend toward [patterns] — lots of [common choices]. The opportunity to be distinctive is [gap]. Here's what I'd recommend based on this..." 
+If the browse binary is available (`$B` is set), visit the top 3-5 competitor sites and capture visual evidence: -If WebSearch is unavailable or returns poor results, fall back gracefully: *"Couldn't get good research results, so I'll work from my design knowledge of the [industry] space."* +```bash +$B goto "https://competitor-site.com" +$B screenshot "/tmp/design-research-competitor-name.png" +$B snapshot +``` + +For each competitor, analyze: fonts actually used, color palette, layout approach, spacing density, aesthetic direction. The screenshot gives you the feel; the snapshot gives you structural data. + +If a competitor site blocks the headless browser or requires login, skip it and note why. + +If browse is not available, rely on WebSearch results and your built-in design knowledge — this is fine. + +**Step 3: Synthesize findings** + +The goal of research is NOT to copy. It is to get in the ballpark — to understand the visual language users in this category already expect. This gives you the baseline. The interesting design work starts after you have the baseline: deciding where to follow conventions (so the product feels literate) and where to break from them (so the product is memorable). + +Summarize conversationally: +> "I looked at [competitors]. Here's the landscape: they converge on [patterns]. Most of them feel [observation — e.g., interchangeable, polished but generic, etc.]. The opportunity to stand out is [gap]. Here's where I'd play it safe and where I'd take a risk..." + +**Graceful degradation:** +- Browse available → screenshots + snapshots + WebSearch (richest research) +- Browse unavailable → WebSearch only (still good) +- WebSearch also unavailable → agent's built-in design knowledge (always works) If the user said no research, skip entirely and proceed to Phase 3 using your built-in design knowledge. @@ -98,7 +127,7 @@ If the user said no research, skip entirely and proceed to Phase 3 using your bu This is the soul of the skill. 
Propose EVERYTHING as one coherent package. -**AskUserQuestion Q2 — present the full proposal:** +**AskUserQuestion Q2 — present the full proposal with SAFE/RISK breakdown:** ``` Based on [product context] and [research findings / my design knowledge]: @@ -113,12 +142,21 @@ MOTION: [approach] — [rationale] This system is coherent because [explain how choices reinforce each other]. -Want to adjust anything? You can drill into any section, or just tell me -what feels off and I'll rework it. Or if this looks right, I'll generate -a preview page so you can see the fonts and colors rendered. +SAFE CHOICES (category baseline — your users expect these): + - [2-3 decisions that match category conventions, with rationale for playing safe] + +RISKS (where your product gets its own face): + - [2-3 deliberate departures from convention] + - For each risk: what it is, why it works, what you gain, what it costs + +The safe choices keep you literate in your category. The risks are where +your product becomes memorable. Which risks appeal to you? Want to see +different ones? Or adjust anything else? ``` -**Options:** A) Looks great — generate the preview page. B) I want to adjust [section]. C) Start over with a different direction. D) Skip the preview, just write DESIGN.md. +The SAFE/RISK breakdown is critical. Design coherence is table stakes — every product in a category can be coherent and still look identical. The real question is: where do you take creative risks? The agent should always propose at least 2 risks, each with a clear rationale for why the risk is worth taking and what the user gives up. Risks might include: an unexpected typeface for the category, a bold accent color nobody else uses, tighter or looser spacing than the norm, a layout approach that breaks from convention, motion choices that add personality. + +**Options:** A) Looks great — generate the preview page. B) I want to adjust [section]. C) I want different risks — show me wilder options. 
D) Start over with a different direction. E) Skip the preview, just write DESIGN.md. ### Your Design Knowledge (use to inform proposals — do NOT display as tables) @@ -208,7 +246,7 @@ The agent writes a **single, self-contained HTML file** (no framework dependenci 1. **Loads proposed fonts** from Google Fonts (or Bunny Fonts) via `<link>` tags 2. **Uses the proposed color palette** throughout — dogfood the design system 3. **Shows the product name** (not "Lorem Ipsum") as the hero heading -4. **Font comparison section:** +4. **Font specimen section:** - Each font candidate shown in its proposed role (hero heading, body paragraph, button label, data table row) - Side-by-side comparison if multiple candidates for one role - Real content that matches the product (e.g., civic tech → government data examples) @@ -216,11 +254,17 @@ The agent writes a **single, self-contained HTML file** (no framework dependenci - Swatches with hex values and names - Sample UI components rendered in the palette: buttons (primary, secondary, ghost), cards, form inputs, alerts (success, warning, error, info) - Background/text color combinations showing contrast -6. **Light/dark mode toggle** using CSS custom properties and a JS toggle button -7. **Clean, professional layout** — the preview page IS a taste signal for the skill -8. **Responsive** — looks good on any screen width +6. **Realistic product mockups** — this is what makes the preview page powerful.
Based on the project type from Phase 1, render 2-3 realistic page layouts using the full design system: + - **Dashboard / web app:** sample data table with metrics, sidebar nav, header with user avatar, stat cards + - **Marketing site:** hero section with real copy, feature highlights, testimonial block, CTA + - **Settings / admin:** form with labeled inputs, toggle switches, dropdowns, save button + - **Auth / onboarding:** login form with social buttons, branding, input validation states + - Use the product name, realistic content for the domain, and the proposed spacing/layout/border-radius. The user should see their product (roughly) before writing any code. +7. **Light/dark mode toggle** using CSS custom properties and a JS toggle button +8. **Clean, professional layout** — the preview page IS a taste signal for the skill +9. **Responsive** — looks good on any screen width -The page should make the user think "oh nice, they thought of this." It's selling the design system visually, not just listing hex codes. +The page should make the user think "oh nice, they thought of this." It's selling the design system by showing what the product could feel like, not just listing hex codes and font names. If `open` fails (headless environment), tell the user: *"I wrote the preview to [path] — open it in your browser to see the fonts and colors rendered."*
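
The `open` fallback described above could be sketched roughly as follows. This is an illustration under assumptions, not the skill's actual implementation: the function name `open_preview` and its optional second argument are hypothetical, and the fallback message is paraphrased from the text.

```bash
# Sketch only: open_preview and its optional opener argument are hypothetical
# names used for illustration; they are not part of gstack.

open_preview() {
  local path="$1"
  local opener="${2:-open}"   # assumption: `open`, as named in the skill text

  # Try the opener. A missing command or a non-zero exit (for example in a
  # headless environment) falls through to the manual instruction.
  if command -v "$opener" >/dev/null 2>&1 && "$opener" "$path" >/dev/null 2>&1; then
    echo "Preview opened in your browser. [$path]"
  else
    echo "I wrote the preview to $path. Open it in your browser to see the fonts and colors rendered."
  fi
}
```

Calling `open_preview /tmp/design-consultation-preview-1710000000.html` would attempt to launch the page and otherwise print the path. The second argument is there so another opener can be swapped in without editing the function.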