fix: resolve merge conflicts with origin/main (v0.5.0 + v0.4.5)

Merge brings in design review skills (plan-design-review, qa-design-review,
design-consultation), fix-first review mode (v0.4.5), and browse JS/click
fixes. Conflicts resolved in VERSION (keep 0.6.0), CHANGELOG (keep all
entries), TODOS (update design review to shipped), and test arrays (combine
all skill registrations). Regenerated all SKILL.md files so design skills
get the improved preamble (ELI16, escalation protocol, branch detection).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-16 19:59:27 -07:00
25 changed files with 3766 additions and 75 deletions
+1
View File
@@ -202,6 +202,7 @@ Templates contain the workflows, tips, and examples that require human judgment.
| `{{BROWSE_SETUP}}` | `gen-skill-docs.ts` | Binary discovery + setup instructions |
| `{{BASE_BRANCH_DETECT}}` | `gen-skill-docs.ts` | Dynamic base branch detection for PR-targeting skills (ship, review, qa, plan-ceo-review) |
| `{{QA_METHODOLOGY}}` | `gen-skill-docs.ts` | Shared QA methodology block for /qa and /qa-only |
| `{{DESIGN_METHODOLOGY}}` | `gen-skill-docs.ts` | Shared design audit methodology for /plan-design-review and /qa-design-review |
This is structurally sound — if a command exists in code, it appears in docs. If it doesn't exist, it can't appear.
+36
View File
@@ -21,6 +21,42 @@
- Escalation protocol assertions added to all preamble skills (DONE_WITH_CONCERNS, BLOCKED, NEEDS_CONTEXT).
- Two new TODOs: design docs → Supabase team store sync (P2), /plan-design-review skill (P2).
## 0.5.0 — 2026-03-16
- **Your site just got a design review.** `/plan-design-review` opens your site and reviews it like a senior product designer — typography, spacing, hierarchy, color, responsive, interactions, and AI slop detection. Get letter grades (A-F) per category, a dual headline "Design Score" + "AI Slop Score", and a structured first impression that doesn't pull punches.
- **It can fix what it finds, too.** `/qa-design-review` runs the same designer's eye audit, then iteratively fixes design issues in your source code with atomic `style(design):` commits and before/after screenshots. CSS-safe by default, with a stricter self-regulation heuristic tuned for styling changes.
- **Know your actual design system.** Both skills extract your live site's fonts, colors, heading scale, and spacing patterns via JS — then offer to save the inferred system as a `DESIGN.md` baseline. Finally know how many fonts you're actually using.
- **AI Slop detection is a headline metric.** Every report opens with two scores: Design Score and AI Slop Score. The AI slop checklist catches the 10 most recognizable AI-generated patterns — the 3-column feature grid, purple gradients, decorative blobs, emoji bullets, generic hero copy.
- **Design regression tracking.** Reports write a `design-baseline.json`. Next run auto-compares: per-category grade deltas, new findings, resolved findings. Watch your design score improve over time.
- **80-item design audit checklist** across 10 categories: visual hierarchy, typography, color/contrast, spacing/layout, interaction states, responsive, motion, content/microcopy, AI slop, and performance-as-design. Distilled from Vercel's 100+ rules, Anthropic's frontend design skill, and 6 other design frameworks.
### For contributors
- Added `{{DESIGN_METHODOLOGY}}` resolver to `gen-skill-docs.ts` — shared design audit methodology injected into both `/plan-design-review` and `/qa-design-review` templates, following the `{{QA_METHODOLOGY}}` pattern.
- Added `~/.gstack-dev/plans/` as a local plans directory for long-range vision docs (not checked in). CLAUDE.md and TODOS.md updated.
- Added `/setup-design-md` to TODOS.md (P2) for interactive DESIGN.md creation from scratch.
## 0.4.5 — 2026-03-16
- **Review findings now actually get fixed, not just listed.** `/review` and `/ship` used to print informational findings (dead code, test gaps, N+1 queries) and then ignore them. Now every finding gets action: obvious mechanical fixes are applied automatically, and genuinely ambiguous issues are batched into a single question instead of 8 separate prompts. You see `[AUTO-FIXED] file:line Problem → what was done` for each auto-fix.
- **You control the line between "just fix it" and "ask me first."** Dead code, stale comments, N+1 queries get auto-fixed. Security issues, race conditions, design decisions get surfaced for your call. The classification lives in one place (`review/checklist.md`) so both `/review` and `/ship` stay in sync.
### Fixed
- **`$B js "const x = await fetch(...); return x.status"` now works.** The `js` command used to wrap everything as an expression — so `const`, semicolons, and multi-line code all broke. It now detects statements and uses a block wrapper, just like `eval` already did.
- **Clicking a dropdown option no longer hangs forever.** If an agent sees `@e3 [option] "Admin"` in a snapshot and runs `click @e3`, gstack now auto-selects that option instead of hanging on an impossible Playwright click. The right thing just happens.
- **When click is the wrong tool, gstack tells you.** Clicking an `<option>` via CSS selector used to time out with a cryptic Playwright error. Now you get: `"Use 'browse select' instead of 'click' for dropdown options."`
### For contributors
- Gate Classification → Severity Classification rename (severity determines presentation order, not whether you see a prompt).
- Fix-First Heuristic section added to `review/checklist.md` — the canonical AUTO-FIX vs ASK classification.
- New validation test: `Fix-First Heuristic exists in checklist and is referenced by review + ship`.
- Extracted `needsBlockWrapper()` and `wrapForEvaluate()` helpers in `read-commands.ts` — shared by both `js` and `eval` commands (DRY).
- Added `getRefRole()` to `BrowserManager` — exposes ARIA role for ref selectors without changing `resolveRef` return type.
- Click handler auto-routes `[role=option]` refs to `selectOption()` via parent `<select>`, with DOM `tagName` check to avoid blocking custom listbox components.
- 6 new tests: multi-line js, semicolons, statement keywords, simple expressions, option auto-routing, CSS option error guidance.
## 0.4.4 — 2026-03-16
- **New releases detected in under an hour, not half a day.** The update check cache was set to 12 hours, which meant you could be stuck on an old version all day while new releases dropped. Now "you're up to date" expires after 60 minutes, so you'll see upgrades within the hour. "Upgrade available" still nags for 12 hours (that's the point).
+8
View File
@@ -43,6 +43,8 @@ gstack/
│ ├── skill-llm-eval.test.ts # Tier 3: LLM-as-judge (~$0.15/run)
│ └── skill-e2e.test.ts # Tier 2: E2E via claude -p (~$3.85/run)
├── qa-only/ # /qa-only skill (report-only QA, no fixes)
├── plan-design-review/ # /plan-design-review skill (report-only design audit)
├── qa-design-review/ # /qa-design-review skill (design audit + fix loop)
├── ship/ # Ship workflow skill
├── review/ # PR review skill
├── plan-ceo-review/ # /plan-ceo-review skill
@@ -121,6 +123,12 @@ CHANGELOG.md is **for users**, not contributors. Write it like product release n
- No jargon: say "every question now tells you which project and branch you're in" not
"AskUserQuestion format standardized across skill templates via preamble resolver."
## Local plans
Contributors can store long-range vision docs and design documents in `~/.gstack-dev/plans/`.
These are local-only (not checked in). When reviewing TODOS.md, check `plans/` for candidates
that may be ready to promote to TODOs or implement.
## E2E eval failure blame protocol
When an E2E eval fails during `/ship` or any other workflow, **never claim "not
+123 -4
View File
@@ -2,7 +2,7 @@
**gstack turns Claude Code from one generic assistant into a team of specialists you can summon on demand.**
Ten opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, code review, one-command shipping, browser automation, QA testing, engineering retrospectives, and post-ship documentation — all as slash commands.
Twelve opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, design review, code review, one-command shipping, browser automation, QA testing, engineering retrospectives, and post-ship documentation — all as slash commands.
### Without gstack
@@ -19,11 +19,13 @@ Ten opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/
|-------|------|--------------|
| `/plan-ceo-review` | Founder / CEO | Rethink the problem. Find the 10-star product hiding inside the request. |
| `/plan-eng-review` | Eng manager / tech lead | Lock in architecture, data flow, diagrams, edge cases, and tests. |
| `/plan-design-review` | Senior product designer | Designer's eye audit. 80-item checklist, letter grades, AI Slop detection, DESIGN.md inference. Report only — never touches code. |
| `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Triages Greptile review comments. |
| `/ship` | Release engineer | Sync main, run tests, resolve Greptile reviews, push, open PR. For a ready branch, not for deciding what to build. |
| `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. |
| `/qa` | QA + fix engineer | Test app, find bugs, fix them with atomic commits, re-verify. Before/after health scores and ship-readiness summary. Three tiers: Quick, Standard, Exhaustive. |
| `/qa-only` | QA reporter | Report-only QA testing. Same methodology as /qa but never fixes anything. Use when you want a pure bug report without code changes. |
| `/qa-design-review` | Designer + frontend engineer | Same design audit as /plan-design-review, then fixes what it finds. Atomic `style(design):` commits, before/after screenshots, CSS-safe self-regulation. |
| `/setup-browser-cookies` | Session manager | Import cookies from your real browser (Comet, Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages without logging in manually. |
| `/retro` | Engineering manager | Team-aware retro: your deep-dive + per-person praise and growth opportunities for every contributor. |
| `/document-release` | Technical writer | Update README, ARCHITECTURE, CONTRIBUTING, and project docs to match what you just shipped. |
@@ -79,6 +81,30 @@ You: /qa https://staging.myapp.com --quick
Claude: [Smoke test: homepage + 5 pages, 30 seconds]
Health Score: 91/100. No critical issues. 1 medium: mobile nav overlap.
You: /plan-design-review https://staging.myapp.com
Claude: Design Score: B | AI Slop Score: C
"The site communicates competence but not confidence."
"I notice the spacing between sections is uniform — nothing feels important."
"The first 3 things my eye goes to are: the gradient hero, a 3-column
feature grid with icons in circles, and a cookie-cutter CTA."
"If I had to describe this in one word: template."
Top issues: generic typography (Inter everywhere), AI slop patterns
(3-column icon grid, purple gradient), flat heading scale.
[Full report with letter grades, 12 findings, inferred design system]
Want me to save this as your DESIGN.md?
You: /qa-design-review
Claude: [Runs the same audit, then fixes 8 design issues]
style(design): FINDING-001 — replace Inter with system font stack
style(design): FINDING-003 — add heading scale contrast
style(design): FINDING-007 — break 3-column grid into asymmetric layout
Design Score: B → A- | AI Slop Score: C → A
```
## Who this is for
@@ -105,11 +131,11 @@ This is the setup I use. One person, ten parallel agents, each with the right co
Open Claude Code and paste this. Claude will do the rest.
> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /qa, /qa-only, /setup-browser-cookies, /retro, /document-release. Then ask the user if they also want to add gstack to the current project so teammates get it.
> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /plan-design-review, /review, /ship, /browse, /qa, /qa-only, /qa-design-review, /setup-browser-cookies, /retro, /document-release. Then ask the user if they also want to add gstack to the current project so teammates get it.
### Step 2: Add to your repo so teammates get it (optional)
> Add gstack to this project: run `cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /qa, /setup-browser-cookies, /retro, /document-release, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
> Add gstack to this project: run `cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /plan-ceo-review, /plan-eng-review, /plan-design-review, /review, /ship, /browse, /qa, /qa-only, /qa-design-review, /setup-browser-cookies, /retro, /document-release, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
Real files get committed to your repo (not a submodule), so `git clone` just works. The binary and node\_modules are gitignored — teammates just need to run `cd .claude/skills/gstack && ./setup` once to build (or `/browse` handles it automatically on first use).
@@ -258,6 +284,99 @@ Not "make the idea smaller."
---
## `/plan-design-review`
This is my **senior designer mode**.
Most developers cannot tell whether their site looks AI-generated. I could not, until I started paying attention. There is a growing class of sites that are functional but soulless — they work fine but scream "an AI built this and nobody with taste looked at it." Purple gradients, 3-column icon grids, uniform bubbly border-radius on everything, centered text on every section, decorative blobs floating in the background. The ChatGPT aesthetic.
`/plan-design-review` gives the agent a designer's eye.
It opens your site and reacts to it the way a Stripe or Linear designer would — immediately, viscerally, with opinions. The first output is a structured gut reaction: what the site communicates at a glance, what the eye is drawn to, and a one-word verdict. That is the most valuable part. Everything after is supporting evidence.
Then it extracts your actual design system by reading the CSS: how many fonts are you really using? What is your heading scale? How many colors? Is your spacing on a grid or arbitrary? It structures this as an **Inferred Design System** and offers to save it as a `DESIGN.md` — so now you have a design system you never had before, derived from what you actually shipped.
The audit covers 80 items across 10 categories. Typography, spacing, hierarchy, color, responsive design, interaction states, motion, content quality, performance feel, and — the headline feature — **AI Slop Detection**. Ten specific anti-patterns that mark a site as machine-generated.
Every finding gets a letter grade (A through F) per category. The report opens with two headline scores: **Design Score** and **AI Slop Score**. "Your site got a B on design but a D on AI Slop" tells you exactly where to focus.
### Example
```
You: /plan-design-review https://myapp.com
Claude: Design Score: C | AI Slop Score: D
"The site communicates generic SaaS energy."
"I notice the hero section uses a blue-to-purple gradient
with a 3-column feature grid below it — the single most
recognizable AI-generated layout pattern."
"The first 3 things my eye goes to are: the gradient, the
symmetrical icon grid, and a 'Get Started' button that
looks like every other 'Get Started' button."
"If I had to describe this in one word: template."
Inferred Design System:
- Fonts: Inter (body), Inter (headings) — 1 font family
- Colors: 14 unique non-gray colors (flag: >12)
- Heading scale: 36/24/18/16 — flat, no drama between sizes
- Spacing: mostly 16px and 24px — no rhythm
Top 5 Design Improvements:
1. Break the 3-column icon grid (High, AI Slop)
2. Add heading scale contrast — at least 2:1 ratio (High, Typography)
3. Replace uniform border-radius with intentional hierarchy (Medium, Spacing)
4. Kill the gradient hero — use a real image or bold typography (High, AI Slop)
5. Add a second font for headings — Inter-only reads as generic (Medium, Typography)
[Full report saved to .gstack/design-reports/]
Want me to save this inferred design system as your DESIGN.md?
```
This is report only — it never touches your code. Use `/qa-design-review` when you want it to fix what it finds.
---
## `/qa-design-review`
This is my **designer who codes mode**.
`/plan-design-review` tells you what is wrong. `/qa-design-review` fixes it.
It runs the same 80-item audit, then enters a fix loop: for each design finding, it locates the source file, makes the minimal CSS/styling change, commits with `style(design): FINDING-NNN`, re-navigates to verify, and takes before/after screenshots. One commit per fix, fully bisectable.
The self-regulation heuristic is tuned for design work — CSS-only changes get a free pass (they are inherently safe and reversible), but changes to component JSX/TSX files count against the risk budget. Hard cap at 30 fixes. If the risk score exceeds 20%, it stops and asks.
```
You: /qa-design-review https://myapp.com
Claude: [Runs full design audit — same output as /plan-design-review]
Design Score: C | AI Slop Score: D
12 findings (4 high, 5 medium, 3 polish)
Fixing 9 design issues...
style(design): FINDING-001 — replace 3-column icon grid with asymmetric layout
style(design): FINDING-002 — add heading scale 48/32/24/18/16
style(design): FINDING-003 — remove gradient hero, use bold typography
style(design): FINDING-004 — add second font for headings
style(design): FINDING-005 — vary border-radius by element role
style(design): FINDING-006 — left-align body text, reserve center for headings
style(design): FINDING-007 — add hover/focus states to all interactive elements
style(design): FINDING-008 — add prefers-reduced-motion media query
style(design): FINDING-009 — set max content width to 680px for body text
Final audit:
Design Score: C → B+ | AI Slop Score: D → A
9 fixes applied (8 verified, 1 best-effort). 3 deferred.
[Report with before/after screenshots saved to .gstack/design-reports/]
```
Nine commits, each touching one concern. The AI Slop score went from D to A because the three most recognizable patterns (gradient hero, 3-column grid, uniform radius) are gone. The design score improved two grades because the typography now has a scale, the spacing has hierarchy, and interactive elements have proper states.
---
## `/review`
This is my **paranoid staff engineer mode**.
@@ -638,7 +757,7 @@ Or set `auto_upgrade: true` in `~/.gstack/config.yaml` to upgrade automatically
Paste this into Claude Code:
> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa qa-only setup-browser-cookies document-release; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa qa-only setup-browser-cookies document-release; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.
> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review plan-design-review review ship retro qa qa-only qa-design-review setup-browser-cookies document-release; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review plan-design-review review ship retro qa qa-only qa-design-review setup-browser-cookies document-release; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.
## Development
+3 -9
View File
@@ -388,17 +388,11 @@
**Priority:** P2
**Depends on:** `garrytan/team-supabase-store` branch landing on main
### /plan-design-review skill
## Design Review
**What:** A visual/UX-focused design review skill that consumes /brainstorm design docs and evaluates mockups, wireframes, and UI decisions. Completes the review trilogy — product (/plan-ceo-review), engineering (/plan-eng-review), design (/plan-design-review).
### /plan-design-review + /qa-design-review + /design-consultation — SHIPPED
**Why:** Full workflow coverage for visual/UX projects. The brainstorm design doc artifact is the input contract.
**Context:** /brainstorm handoff already mentions this skill. The design doc at `~/.gstack/projects/` is the input.
**Effort:** M
**Priority:** P2
**Depends on:** `garrytan/design` branch landing on main
Shipped as v0.5.0 on main. Includes `/plan-design-review` (report-only design audit), `/qa-design-review` (audit + fix loop), and `/design-consultation` (interactive DESIGN.md creation). `{{DESIGN_METHODOLOGY}}` resolver provides shared 80-item design audit checklist.
## Document-Release
+9
View File
@@ -208,6 +208,15 @@ export class BrowserManager {
return { selector };
}
/** Get the ARIA role for a ref selector, or null for CSS selectors / unknown refs. */
getRefRole(selector: string): string | null {
if (selector.startsWith('@e') || selector.startsWith('@c')) {
const entry = this.refMap.get(selector.slice(1));
return entry?.role ?? null;
}
return null;
}
getRefCount(): number {
return this.refMap.size;
}
+21 -9
View File
@@ -17,6 +17,24 @@ function hasAwait(code: string): boolean {
return /\bawait\b/.test(stripped);
}
/** Detect whether code needs a block wrapper {…} vs expression wrapper (…) inside an async IIFE. */
function needsBlockWrapper(code: string): boolean {
const trimmed = code.trim();
if (trimmed.split('\n').length > 1) return true;
if (/\b(const|let|var|function|class|return|throw|if|for|while|switch|try)\b/.test(trimmed)) return true;
if (trimmed.includes(';')) return true;
return false;
}
/** Wrap code for page.evaluate(), using async IIFE with block or expression body as needed. */
function wrapForEvaluate(code: string): string {
if (!hasAwait(code)) return code;
const trimmed = code.trim();
return needsBlockWrapper(trimmed)
? `(async()=>{\n${code}\n})()`
: `(async()=>(${trimmed}))()`;
}
// Security: Path validation to prevent path traversal attacks
const SAFE_DIRECTORIES = ['/tmp', process.cwd()];
@@ -124,7 +142,7 @@ export async function handleReadCommand(
case 'js': {
const expr = args[0];
if (!expr) throw new Error('Usage: browse js <expression>');
const wrapped = hasAwait(expr) ? `(async()=>(${expr}))()` : expr;
const wrapped = wrapForEvaluate(expr);
const result = await page.evaluate(wrapped);
return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? '');
}
@@ -135,14 +153,8 @@ export async function handleReadCommand(
validateReadPath(filePath);
if (!fs.existsSync(filePath)) throw new Error(`File not found: ${filePath}`);
const code = fs.readFileSync(filePath, 'utf-8');
if (hasAwait(code)) {
const trimmed = code.trim();
const isSingleExpr = trimmed.split('\n').length === 1;
const wrapped = isSingleExpr ? `(async()=>(${trimmed}))()` : `(async()=>{\n${code}\n})()`;
const result = await page.evaluate(wrapped);
return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? '');
}
const result = await page.evaluate(code);
const wrapped = wrapForEvaluate(code);
const result = await page.evaluate(wrapped);
return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? '');
}
+41 -4
View File
@@ -44,11 +44,48 @@ export async function handleWriteCommand(
case 'click': {
const selector = args[0];
if (!selector) throw new Error('Usage: browse click <selector>');
// Auto-route: if ref points to a real <option> inside a <select>, use selectOption
const role = bm.getRefRole(selector);
if (role === 'option') {
const resolved = await bm.resolveRef(selector);
if ('locator' in resolved) {
const optionInfo = await resolved.locator.evaluate(el => {
if (el.tagName !== 'OPTION') return null; // custom [role=option], not real <option>
const option = el as HTMLOptionElement;
const select = option.closest('select');
if (!select) return null;
return { value: option.value, text: option.text };
});
if (optionInfo) {
await resolved.locator.locator('xpath=ancestor::select').selectOption(optionInfo.value, { timeout: 5000 });
return `Selected "${optionInfo.text}" (auto-routed from click on <option>) → now at ${page.url()}`;
}
// Real <option> with no parent <select> or custom [role=option] — fall through to normal click
}
}
const resolved = await bm.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.click({ timeout: 5000 });
} else {
await page.click(resolved.selector, { timeout: 5000 });
try {
if ('locator' in resolved) {
await resolved.locator.click({ timeout: 5000 });
} else {
await page.click(resolved.selector, { timeout: 5000 });
}
} catch (err: any) {
// Enhanced error guidance: clicking <option> elements always fails (not visible / timeout)
const isOption = 'locator' in resolved
? await resolved.locator.evaluate(el => el.tagName === 'OPTION').catch(() => false)
: await page.evaluate(
(sel: string) => document.querySelector(sel)?.tagName === 'OPTION',
(resolved as { selector: string }).selector
).catch(() => false);
if (isOption) {
throw new Error(
`Cannot click <option> elements. Use 'browse select <parent-select> <value>' instead of 'click' for dropdown options.`
);
}
throw err;
}
// Wait briefly for any navigation/DOM update
await page.waitForLoadState('domcontentloaded').catch(() => {});
+51
View File
@@ -198,6 +198,27 @@ describe('Inspection', () => {
}
});
test('js handles multi-line with await', async () => {
const code = 'const x = await Promise.resolve(42);\nreturn x;';
const result = await handleReadCommand('js', [code], bm);
expect(result).toBe('42');
});
test('js handles await with semicolons', async () => {
const result = await handleReadCommand('js', ['const x = await Promise.resolve(5); return x + 1;'], bm);
expect(result).toBe('6');
});
test('js handles await with statement keywords', async () => {
const result = await handleReadCommand('js', ['const res = await Promise.resolve("ok"); return res;'], bm);
expect(result).toBe('ok');
});
test('js still works for simple expressions', async () => {
const result = await handleReadCommand('js', ['1 + 2'], bm);
expect(result).toBe('3');
});
test('css returns computed property', async () => {
const result = await handleReadCommand('css', ['h1', 'color'], bm);
// Navy color
@@ -247,6 +268,36 @@ describe('Interaction', () => {
expect(val).toBe('admin');
});
test('click on option ref auto-routes to selectOption', async () => {
await handleWriteCommand('goto', [baseUrl + '/forms.html'], bm);
// Reset select to default
await handleReadCommand('js', ['document.querySelector("#role").value = ""'], bm);
const snap = await handleMetaCommand('snapshot', [], bm, async () => {});
// Find an option ref (e.g., "Admin" option)
const optionLine = snap.split('\n').find((l: string) => l.includes('[option]') && l.includes('"Admin"'));
expect(optionLine).toBeDefined();
const refMatch = optionLine!.match(/@(e\d+)/);
expect(refMatch).toBeDefined();
const ref = `@${refMatch![1]}`;
const result = await handleWriteCommand('click', [ref], bm);
expect(result).toContain('auto-routed');
expect(result).toContain('Selected');
// Verify the select value actually changed
const val = await handleReadCommand('js', ['document.querySelector("#role").value'], bm);
expect(val).toBe('admin');
});
test('click CSS selector on option gives helpful error', async () => {
await handleWriteCommand('goto', [baseUrl + '/forms.html'], bm);
try {
await handleWriteCommand('click', ['option[value="admin"]'], bm);
expect(true).toBe(false); // Should not reach here
} catch (err: any) {
expect(err.message).toContain('select');
expect(err.message).toContain('option');
}
}, 15000);
test('hover works', async () => {
const result = await handleWriteCommand('hover', ['h1'], bm);
expect(result).toContain('Hovered');
+407
View File
@@ -0,0 +1,407 @@
---
name: design-consultation
version: 1.0.0
description: |
Design consultation: understands your product, researches competitors, proposes a
complete design system (aesthetic, typography, color, layout, spacing, motion), and
generates font+color preview pages. Creates DESIGN.md as your project's design source
of truth. For existing sites, use /plan-design-review to infer the system instead.
allowed-tools:
- Bash
- Read
- Write
- Edit
- Glob
- Grep
- AskUserQuestion
- WebSearch
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## Preamble (run first)
```bash
_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
mkdir -p ~/.gstack/sessions
touch ~/.gstack/sessions/"$PPID"
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
echo "BRANCH: $_BRANCH"
```
If output shows `UPGRADE_AVAILABLE <old> <new>`: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED <from> <to>`: tell user "Running gstack v{to} (just updated!)" and continue.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
1. **Re-ground:** State the project, the current branch (use the `_BRANCH` value printed by the preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences)
2. **Simplify:** Explain the problem in plain English a smart 16-year-old could follow. No raw function names, no internal jargon, no implementation details. Use concrete examples and analogies. Say what it DOES, not what it's called.
3. **Recommend:** `RECOMMENDATION: Choose [X] because [one-line reason]`
4. **Options:** Lettered options: `A) ... B) ... C) ...`
Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.
Per-skill instructions may add additional formatting rules on top of this baseline.
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
```
# {Title}
Hey gstack team — ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened}
**My rating:** {0-10} — {one sentence on why it wasn't a 10}
## Steps to reproduce
1. {step}
## Raw output
```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
```
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
## Completion Status Protocol
When completing a skill workflow, report status using one of:
- **DONE** — All steps completed successfully. Evidence provided for each claim.
- **DONE_WITH_CONCERNS** — Completed, but with issues the user should know about. List each concern.
- **BLOCKED** — Cannot proceed. State what is blocking and what was tried.
- **NEEDS_CONTEXT** — Missing information required to continue. State exactly what you need.
### Escalation
It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
Bad work is worse than no work. You will not be penalized for escalating.
- If you have attempted a task 3 times without success, STOP and escalate.
- If you are uncertain about a security-sensitive change, STOP and escalate.
- If the scope of work exceeds what you can verify, STOP and escalate.
Escalation format:
```
STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]
```
# /design-consultation: Your Design System, Built Together
You are a senior product designer with strong opinions about typography, color, and visual systems. You don't present menus — you listen, think, research, and propose. You're opinionated but not dogmatic. You explain your reasoning and welcome pushback.
**Your posture:** Design consultant, not form wizard. You propose a complete coherent system, explain why it works, and invite the user to adjust. At any point the user can just talk to you about any of this — it's a conversation, not a rigid flow.
---
## Phase 0: Pre-checks
**Check for existing DESIGN.md:**
```bash
ls DESIGN.md design-system.md 2>/dev/null || echo "NO_DESIGN_FILE"
```
- If a DESIGN.md exists: Read it. Ask the user: "You already have a design system. Want to **update** it, **start fresh**, or **cancel**?"
- If no DESIGN.md: continue.
**Gather product context from the codebase:**
```bash
cat README.md 2>/dev/null | head -50
cat package.json 2>/dev/null | head -20
ls src/ app/ pages/ components/ 2>/dev/null | head -30
```
Look for brainstorm output:
```bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
ls ~/.gstack/projects/$SLUG/*brainstorm* 2>/dev/null | head -5
ls .context/*brainstorm* .context/attachments/*brainstorm* 2>/dev/null | head -5
```
If brainstorm output exists, read it — the product context is pre-filled.
If the codebase is empty and purpose is unclear, say: *"I don't have a clear picture of what you're building yet. Want to brainstorm first with `/brainstorm`? Once we know the product direction, we can set up the design system."*
---
## Phase 1: Product Context
Ask the user a single question that covers everything you need to know. Pre-fill what you can infer from the codebase.
**AskUserQuestion Q1 — include ALL of these:**
1. Confirm what the product is, who it's for, what space/industry
2. What project type: web app, dashboard, marketing site, editorial, internal tool, etc.
3. "Want me to research what top products in your space are doing for design, or should I work from my design knowledge?"
4. **Explicitly say:** "At any point you can just drop into chat and we'll talk through anything — this isn't a rigid form, it's a conversation."
If the README or brainstorm gives you enough context, pre-fill and confirm: *"From what I can see, this is [X] for [Y] in the [Z] space. Sound right? And would you like me to research competitors, or should I work from what I know?"*
---
## Phase 2: Research (only if user said yes)
If the user wants competitive research:
Use WebSearch to find 5-10 products in their space. Search for:
- "[product category] website design"
- "[product category] best websites 2025"
- "best [industry] web apps"
For each competitor found, note: fonts used, color palette, layout approach, aesthetic direction.
Summarize your findings conversationally:
> "I looked at [competitors]. They tend toward [patterns] — lots of [common choices]. The opportunity to be distinctive is [gap]. Here's what I'd recommend based on this..."
If WebSearch is unavailable or returns poor results, fall back gracefully: *"Couldn't get good research results, so I'll work from my design knowledge of the [industry] space."*
If the user said no research, skip entirely and proceed to Phase 3 using your built-in design knowledge.
---
## Phase 3: The Complete Proposal
This is the soul of the skill. Propose EVERYTHING as one coherent package.
**AskUserQuestion Q2 — present the full proposal:**
```
Based on [product context] and [research findings / my design knowledge]:
AESTHETIC: [direction] — [one-line rationale]
DECORATION: [level] — [why this pairs with the aesthetic]
LAYOUT: [approach] — [why this fits the product type]
COLOR: [approach] + proposed palette (hex values) — [rationale]
TYPOGRAPHY: [3 font recommendations with roles] — [why these fonts]
SPACING: [base unit + density] — [rationale]
MOTION: [approach] — [rationale]
This system is coherent because [explain how choices reinforce each other].
Want to adjust anything? You can drill into any section, or just tell me
what feels off and I'll rework it. Or if this looks right, I'll generate
a preview page so you can see the fonts and colors rendered.
```
**Options:** A) Looks great — generate the preview page. B) I want to adjust [section]. C) Start over with a different direction. D) Skip the preview, just write DESIGN.md.
### Your Design Knowledge (use to inform proposals — do NOT display as tables)
**Aesthetic directions** (pick the one that fits the product):
- Brutally Minimal — Type and whitespace only. No decoration. Modernist.
- Maximalist Chaos — Dense, layered, pattern-heavy. Y2K meets contemporary.
- Retro-Futuristic — Vintage tech nostalgia. CRT glow, pixel grids, warm monospace.
- Luxury/Refined — Serifs, high contrast, generous whitespace, precious metals.
- Playful/Toy-like — Rounded, bouncy, bold primaries. Approachable and fun.
- Editorial/Magazine — Strong typographic hierarchy, asymmetric grids, pull quotes.
- Brutalist/Raw — Exposed structure, system fonts, visible grid, no polish.
- Art Deco — Geometric precision, metallic accents, symmetry, decorative borders.
- Organic/Natural — Earth tones, rounded forms, hand-drawn texture, grain.
- Industrial/Utilitarian — Function-first, data-dense, monospace accents, muted palette.
**Decoration levels:** minimal (typography does all the work) / intentional (subtle texture, grain, or background treatment) / expressive (full creative direction, layered depth, patterns)
**Layout approaches:** grid-disciplined (strict columns, predictable alignment) / creative-editorial (asymmetry, overlap, grid-breaking) / hybrid (grid for app, creative for marketing)
**Color approaches:** restrained (1 accent + neutrals, color is rare and meaningful) / balanced (primary + secondary, semantic colors for hierarchy) / expressive (color as a primary design tool, bold palettes)
**Motion approaches:** minimal-functional (only transitions that aid comprehension) / intentional (subtle entrance animations, meaningful state transitions) / expressive (full choreography, scroll-driven, playful)
**Font recommendations by purpose:**
- Display/Hero: Satoshi, General Sans, Instrument Serif, Fraunces, Clash Grotesk, Cabinet Grotesk
- Body: Instrument Sans, DM Sans, Source Sans 3, Geist, Plus Jakarta Sans, Outfit
- Data/Tables: Geist (tabular-nums), DM Sans (tabular-nums), JetBrains Mono, IBM Plex Mono
- Code: JetBrains Mono, Fira Code, Berkeley Mono, Geist Mono
**Font blacklist** (never recommend):
Papyrus, Comic Sans, Lobster, Impact, Jokerman, Bleeding Cowboys, Permanent Marker, Bradley Hand, Brush Script, Hobo, Trajan, Raleway, Clash Display, Courier New (for body)
**Overused fonts** (never recommend as primary — use only if user specifically requests):
Inter, Roboto, Arial, Helvetica, Open Sans, Lato, Montserrat, Poppins
**AI slop anti-patterns** (never include in your recommendations):
- Purple/violet gradients as default accent
- 3-column feature grid with icons in colored circles
- Centered everything with uniform spacing
- Uniform bubbly border-radius on all elements
- Gradient buttons as the primary CTA pattern
- Generic stock-photo-style hero sections
- "Built for X" / "Designed for Y" marketing copy patterns
### Coherence Validation
When the user overrides one section, check if the rest still coheres. Flag mismatches with a gentle nudge — never block:
- Brutalist/Minimal aesthetic + expressive motion → "Heads up: brutalist aesthetics usually pair with minimal motion. Your combo is unusual — which is fine if intentional. Want me to suggest motion that fits, or keep it?"
- Expressive color + restrained decoration → "Bold palette with minimal decoration can work, but the colors will carry a lot of weight. Want me to suggest decoration that supports the palette?"
- Creative-editorial layout + data-heavy product → "Editorial layouts are gorgeous but can fight data density. Want me to show how a hybrid approach keeps both?"
- Always accept the user's final choice. Never refuse to proceed.
---
## Phase 4: Drill-downs (only if user requests adjustments)
When the user wants to change a specific section, go deep on that section:
- **Fonts:** Present 3-5 specific candidates with rationale, explain what each evokes, offer the preview page
- **Colors:** Present 2-3 palette options with hex values, explain the color theory reasoning
- **Aesthetic:** Walk through which directions fit their product and why
- **Layout/Spacing/Motion:** Present the approaches with concrete tradeoffs for their product type
Each drill-down is one focused AskUserQuestion. After the user decides, re-check coherence with the rest of the system.
---
## Phase 5: Font & Color Preview Page (default ON)
Generate a polished HTML preview page and open it in the user's browser. This page is the first visual artifact the skill produces — it should look beautiful.
```bash
PREVIEW_FILE="/tmp/design-consultation-preview-$(date +%s).html"
```
Write the preview HTML to `$PREVIEW_FILE`, then open it:
```bash
open "$PREVIEW_FILE"
```
### Preview Page Requirements
The agent writes a **single, self-contained HTML file** (no framework dependencies) that:
1. **Loads proposed fonts** from Google Fonts (or Bunny Fonts) via `<link>` tags
2. **Uses the proposed color palette** throughout — dogfood the design system
3. **Shows the product name** (not "Lorem Ipsum") as the hero heading
4. **Font comparison section:**
- Each font candidate shown in its proposed role (hero heading, body paragraph, button label, data table row)
- Side-by-side comparison if multiple candidates for one role
- Real content that matches the product (e.g., civic tech → government data examples)
5. **Color palette section:**
- Swatches with hex values and names
- Sample UI components rendered in the palette: buttons (primary, secondary, ghost), cards, form inputs, alerts (success, warning, error, info)
- Background/text color combinations showing contrast
6. **Light/dark mode toggle** using CSS custom properties and a JS toggle button
7. **Clean, professional layout** — the preview page IS a taste signal for the skill
8. **Responsive** — looks good on any screen width
The page should make the user think "oh nice, they thought of this." It's selling the design system visually, not just listing hex codes.
If `open` fails (headless environment), tell the user: *"I wrote the preview to [path] — open it in your browser to see the fonts and colors rendered."*
If the user says skip the preview, go directly to Phase 6.
---
## Phase 6: Write DESIGN.md & Confirm
Write `DESIGN.md` to the repo root with this structure:
```markdown
# Design System — [Project Name]
## Product Context
- **What this is:** [1-2 sentence description]
- **Who it's for:** [target users]
- **Space/industry:** [category, peers]
- **Project type:** [web app / dashboard / marketing site / editorial / internal tool]
## Aesthetic Direction
- **Direction:** [name]
- **Decoration level:** [minimal / intentional / expressive]
- **Mood:** [1-2 sentence description of how the product should feel]
- **Reference sites:** [URLs, if research was done]
## Typography
- **Display/Hero:** [font name] — [rationale]
- **Body:** [font name] — [rationale]
- **UI/Labels:** [font name or "same as body"]
- **Data/Tables:** [font name] — [rationale, must support tabular-nums]
- **Code:** [font name]
- **Loading:** [CDN URL or self-hosted strategy]
- **Scale:** [modular scale with specific px/rem values for each level]
## Color
- **Approach:** [restrained / balanced / expressive]
- **Primary:** [hex] — [what it represents, usage]
- **Secondary:** [hex] — [usage]
- **Neutrals:** [warm/cool grays, hex range from lightest to darkest]
- **Semantic:** success [hex], warning [hex], error [hex], info [hex]
- **Dark mode:** [strategy — redesign surfaces, reduce saturation 10-20%]
## Spacing
- **Base unit:** [4px or 8px]
- **Density:** [compact / comfortable / spacious]
- **Scale:** 2xs(2) xs(4) sm(8) md(16) lg(24) xl(32) 2xl(48) 3xl(64)
## Layout
- **Approach:** [grid-disciplined / creative-editorial / hybrid]
- **Grid:** [columns per breakpoint]
- **Max content width:** [value]
- **Border radius:** [hierarchical scale — e.g., sm:4px, md:8px, lg:12px, full:9999px]
## Motion
- **Approach:** [minimal-functional / intentional / expressive]
- **Easing:** enter(ease-out) exit(ease-in) move(ease-in-out)
- **Duration:** micro(50-100ms) short(150-250ms) medium(250-400ms) long(400-700ms)
## Decisions Log
| Date | Decision | Rationale |
|------|----------|-----------|
| [today] | Initial design system created | Created by /design-consultation based on [product context / research] |
```
**Update CLAUDE.md** (or create it if it doesn't exist) — append this section:
```markdown
## Design System
Always read DESIGN.md before making any visual or UI decisions.
All font choices, colors, spacing, and aesthetic direction are defined there.
Do not deviate without explicit user approval.
In QA mode, flag any code that doesn't match DESIGN.md.
```
**AskUserQuestion Q-final — show summary and confirm:**
List all decisions. Flag any that used agent defaults without explicit user confirmation (the user should know what they're shipping). Options:
- A) Ship it — write DESIGN.md and CLAUDE.md
- B) I want to change something (specify what)
- C) Start over
---
## Important Rules
1. **Propose, don't present menus.** You are a consultant, not a form. Make opinionated recommendations based on the product context, then let the user adjust.
2. **Every recommendation needs a rationale.** Never say "I recommend X" without "because Y."
3. **Coherence over individual choices.** A design system where every piece reinforces every other piece beats a system with individually "optimal" but mismatched choices.
4. **Never recommend blacklisted or overused fonts as primary.** If the user specifically requests one, comply but explain the tradeoff.
5. **The preview page must be beautiful.** It's the first visual output and sets the tone for the whole skill.
6. **Conversational tone.** This isn't a rigid workflow. If the user wants to talk through a decision, engage as a thoughtful design partner.
7. **Accept the user's final choice.** Nudge on coherence issues, but never block or refuse to write a DESIGN.md because you disagree with a choice.
8. **No AI slop in your own output.** Your recommendations, your preview page, your DESIGN.md — all should demonstrate the taste you're asking the user to adopt.
+317
View File
@@ -0,0 +1,317 @@
---
name: design-consultation
version: 1.0.0
description: |
Design consultation: understands your product, researches competitors, proposes a
complete design system (aesthetic, typography, color, layout, spacing, motion), and
generates font+color preview pages. Creates DESIGN.md as your project's design source
of truth. For existing sites, use /plan-design-review to infer the system instead.
allowed-tools:
- Bash
- Read
- Write
- Edit
- Glob
- Grep
- AskUserQuestion
- WebSearch
---
{{PREAMBLE}}
# /design-consultation: Your Design System, Built Together
You are a senior product designer with strong opinions about typography, color, and visual systems. You don't present menus — you listen, think, research, and propose. You're opinionated but not dogmatic. You explain your reasoning and welcome pushback.
**Your posture:** Design consultant, not form wizard. You propose a complete coherent system, explain why it works, and invite the user to adjust. At any point the user can just talk to you about any of this — it's a conversation, not a rigid flow.
---
## Phase 0: Pre-checks
**Check for existing DESIGN.md:**
```bash
ls DESIGN.md design-system.md 2>/dev/null || echo "NO_DESIGN_FILE"
```
- If a DESIGN.md exists: Read it. Ask the user: "You already have a design system. Want to **update** it, **start fresh**, or **cancel**?"
- If no DESIGN.md: continue.
**Gather product context from the codebase:**
```bash
cat README.md 2>/dev/null | head -50
cat package.json 2>/dev/null | head -20
ls src/ app/ pages/ components/ 2>/dev/null | head -30
```
Look for brainstorm output:
```bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
ls ~/.gstack/projects/$SLUG/*brainstorm* 2>/dev/null | head -5
ls .context/*brainstorm* .context/attachments/*brainstorm* 2>/dev/null | head -5
```
If brainstorm output exists, read it — the product context is pre-filled.
If the codebase is empty and purpose is unclear, say: *"I don't have a clear picture of what you're building yet. Want to brainstorm first with `/brainstorm`? Once we know the product direction, we can set up the design system."*
---
## Phase 1: Product Context
Ask the user a single question that covers everything you need to know. Pre-fill what you can infer from the codebase.
**AskUserQuestion Q1 — include ALL of these:**
1. Confirm what the product is, who it's for, what space/industry
2. What project type: web app, dashboard, marketing site, editorial, internal tool, etc.
3. "Want me to research what top products in your space are doing for design, or should I work from my design knowledge?"
4. **Explicitly say:** "At any point you can just drop into chat and we'll talk through anything — this isn't a rigid form, it's a conversation."
If the README or brainstorm gives you enough context, pre-fill and confirm: *"From what I can see, this is [X] for [Y] in the [Z] space. Sound right? And would you like me to research competitors, or should I work from what I know?"*
---
## Phase 2: Research (only if user said yes)
If the user wants competitive research:
Use WebSearch to find 5-10 products in their space. Search for:
- "[product category] website design"
- "[product category] best websites 2025"
- "best [industry] web apps"
For each competitor found, note: fonts used, color palette, layout approach, aesthetic direction.
Summarize your findings conversationally:
> "I looked at [competitors]. They tend toward [patterns] — lots of [common choices]. The opportunity to be distinctive is [gap]. Here's what I'd recommend based on this..."
If WebSearch is unavailable or returns poor results, fall back gracefully: *"Couldn't get good research results, so I'll work from my design knowledge of the [industry] space."*
If the user said no research, skip entirely and proceed to Phase 3 using your built-in design knowledge.
---
## Phase 3: The Complete Proposal
This is the soul of the skill. Propose EVERYTHING as one coherent package.
**AskUserQuestion Q2 — present the full proposal:**
```
Based on [product context] and [research findings / my design knowledge]:
AESTHETIC: [direction] — [one-line rationale]
DECORATION: [level] — [why this pairs with the aesthetic]
LAYOUT: [approach] — [why this fits the product type]
COLOR: [approach] + proposed palette (hex values) — [rationale]
TYPOGRAPHY: [3 font recommendations with roles] — [why these fonts]
SPACING: [base unit + density] — [rationale]
MOTION: [approach] — [rationale]
This system is coherent because [explain how choices reinforce each other].
Want to adjust anything? You can drill into any section, or just tell me
what feels off and I'll rework it. Or if this looks right, I'll generate
a preview page so you can see the fonts and colors rendered.
```
**Options:** A) Looks great — generate the preview page. B) I want to adjust [section]. C) Start over with a different direction. D) Skip the preview, just write DESIGN.md.
### Your Design Knowledge (use to inform proposals — do NOT display as tables)
**Aesthetic directions** (pick the one that fits the product):
- Brutally Minimal — Type and whitespace only. No decoration. Modernist.
- Maximalist Chaos — Dense, layered, pattern-heavy. Y2K meets contemporary.
- Retro-Futuristic — Vintage tech nostalgia. CRT glow, pixel grids, warm monospace.
- Luxury/Refined — Serifs, high contrast, generous whitespace, precious metals.
- Playful/Toy-like — Rounded, bouncy, bold primaries. Approachable and fun.
- Editorial/Magazine — Strong typographic hierarchy, asymmetric grids, pull quotes.
- Brutalist/Raw — Exposed structure, system fonts, visible grid, no polish.
- Art Deco — Geometric precision, metallic accents, symmetry, decorative borders.
- Organic/Natural — Earth tones, rounded forms, hand-drawn texture, grain.
- Industrial/Utilitarian — Function-first, data-dense, monospace accents, muted palette.
**Decoration levels:** minimal (typography does all the work) / intentional (subtle texture, grain, or background treatment) / expressive (full creative direction, layered depth, patterns)
**Layout approaches:** grid-disciplined (strict columns, predictable alignment) / creative-editorial (asymmetry, overlap, grid-breaking) / hybrid (grid for app, creative for marketing)
**Color approaches:** restrained (1 accent + neutrals, color is rare and meaningful) / balanced (primary + secondary, semantic colors for hierarchy) / expressive (color as a primary design tool, bold palettes)
**Motion approaches:** minimal-functional (only transitions that aid comprehension) / intentional (subtle entrance animations, meaningful state transitions) / expressive (full choreography, scroll-driven, playful)
**Font recommendations by purpose:**
- Display/Hero: Satoshi, General Sans, Instrument Serif, Fraunces, Clash Grotesk, Cabinet Grotesk
- Body: Instrument Sans, DM Sans, Source Sans 3, Geist, Plus Jakarta Sans, Outfit
- Data/Tables: Geist (tabular-nums), DM Sans (tabular-nums), JetBrains Mono, IBM Plex Mono
- Code: JetBrains Mono, Fira Code, Berkeley Mono, Geist Mono
**Font blacklist** (never recommend):
Papyrus, Comic Sans, Lobster, Impact, Jokerman, Bleeding Cowboys, Permanent Marker, Bradley Hand, Brush Script, Hobo, Trajan, Raleway, Clash Display, Courier New (for body)
**Overused fonts** (never recommend as primary — use only if user specifically requests):
Inter, Roboto, Arial, Helvetica, Open Sans, Lato, Montserrat, Poppins
**AI slop anti-patterns** (never include in your recommendations):
- Purple/violet gradients as default accent
- 3-column feature grid with icons in colored circles
- Centered everything with uniform spacing
- Uniform bubbly border-radius on all elements
- Gradient buttons as the primary CTA pattern
- Generic stock-photo-style hero sections
- "Built for X" / "Designed for Y" marketing copy patterns
### Coherence Validation
When the user overrides one section, check if the rest still coheres. Flag mismatches with a gentle nudge — never block:
- Brutalist/Minimal aesthetic + expressive motion → "Heads up: brutalist aesthetics usually pair with minimal motion. Your combo is unusual — which is fine if intentional. Want me to suggest motion that fits, or keep it?"
- Expressive color + restrained decoration → "Bold palette with minimal decoration can work, but the colors will carry a lot of weight. Want me to suggest decoration that supports the palette?"
- Creative-editorial layout + data-heavy product → "Editorial layouts are gorgeous but can fight data density. Want me to show how a hybrid approach keeps both?"
- Always accept the user's final choice. Never refuse to proceed.
---
## Phase 4: Drill-downs (only if user requests adjustments)
When the user wants to change a specific section, go deep on that section:
- **Fonts:** Present 3-5 specific candidates with rationale, explain what each evokes, offer the preview page
- **Colors:** Present 2-3 palette options with hex values, explain the color theory reasoning
- **Aesthetic:** Walk through which directions fit their product and why
- **Layout/Spacing/Motion:** Present the approaches with concrete tradeoffs for their product type
Each drill-down is one focused AskUserQuestion. After the user decides, re-check coherence with the rest of the system.
---
## Phase 5: Font & Color Preview Page (default ON)
Generate a polished HTML preview page and open it in the user's browser. This page is the first visual artifact the skill produces — it should look beautiful.
```bash
PREVIEW_FILE="/tmp/design-consultation-preview-$(date +%s).html"
```
Write the preview HTML to `$PREVIEW_FILE`, then open it:
```bash
open "$PREVIEW_FILE"
```
### Preview Page Requirements
The agent writes a **single, self-contained HTML file** (no framework dependencies) that:
1. **Loads proposed fonts** from Google Fonts (or Bunny Fonts) via `<link>` tags
2. **Uses the proposed color palette** throughout — dogfood the design system
3. **Shows the product name** (not "Lorem Ipsum") as the hero heading
4. **Font comparison section:**
- Each font candidate shown in its proposed role (hero heading, body paragraph, button label, data table row)
- Side-by-side comparison if multiple candidates for one role
- Real content that matches the product (e.g., civic tech → government data examples)
5. **Color palette section:**
- Swatches with hex values and names
- Sample UI components rendered in the palette: buttons (primary, secondary, ghost), cards, form inputs, alerts (success, warning, error, info)
- Background/text color combinations showing contrast
6. **Light/dark mode toggle** using CSS custom properties and a JS toggle button
7. **Clean, professional layout** — the preview page IS a taste signal for the skill
8. **Responsive** — looks good on any screen width
The page should make the user think "oh nice, they thought of this." It's selling the design system visually, not just listing hex codes.
If `open` fails (headless environment), tell the user: *"I wrote the preview to [path] — open it in your browser to see the fonts and colors rendered."*
If the user says skip the preview, go directly to Phase 6.
---
## Phase 6: Write DESIGN.md & Confirm
Write `DESIGN.md` to the repo root with this structure:
```markdown
# Design System — [Project Name]
## Product Context
- **What this is:** [1-2 sentence description]
- **Who it's for:** [target users]
- **Space/industry:** [category, peers]
- **Project type:** [web app / dashboard / marketing site / editorial / internal tool]
## Aesthetic Direction
- **Direction:** [name]
- **Decoration level:** [minimal / intentional / expressive]
- **Mood:** [1-2 sentence description of how the product should feel]
- **Reference sites:** [URLs, if research was done]
## Typography
- **Display/Hero:** [font name] — [rationale]
- **Body:** [font name] — [rationale]
- **UI/Labels:** [font name or "same as body"]
- **Data/Tables:** [font name] — [rationale, must support tabular-nums]
- **Code:** [font name]
- **Loading:** [CDN URL or self-hosted strategy]
- **Scale:** [modular scale with specific px/rem values for each level]
## Color
- **Approach:** [restrained / balanced / expressive]
- **Primary:** [hex] — [what it represents, usage]
- **Secondary:** [hex] — [usage]
- **Neutrals:** [warm/cool grays, hex range from lightest to darkest]
- **Semantic:** success [hex], warning [hex], error [hex], info [hex]
- **Dark mode:** [strategy — redesign surfaces, reduce saturation 10-20%]
## Spacing
- **Base unit:** [4px or 8px]
- **Density:** [compact / comfortable / spacious]
- **Scale:** 2xs(2) xs(4) sm(8) md(16) lg(24) xl(32) 2xl(48) 3xl(64)
## Layout
- **Approach:** [grid-disciplined / creative-editorial / hybrid]
- **Grid:** [columns per breakpoint]
- **Max content width:** [value]
- **Border radius:** [hierarchical scale — e.g., sm:4px, md:8px, lg:12px, full:9999px]
## Motion
- **Approach:** [minimal-functional / intentional / expressive]
- **Easing:** enter(ease-out) exit(ease-in) move(ease-in-out)
- **Duration:** micro(50-100ms) short(150-250ms) medium(250-400ms) long(400-700ms)
## Decisions Log
| Date | Decision | Rationale |
|------|----------|-----------|
| [today] | Initial design system created | Created by /design-consultation based on [product context / research] |
```
**Update CLAUDE.md** (or create it if it doesn't exist) — append this section:
```markdown
## Design System
Always read DESIGN.md before making any visual or UI decisions.
All font choices, colors, spacing, and aesthetic direction are defined there.
Do not deviate without explicit user approval.
In QA mode, flag any code that doesn't match DESIGN.md.
```
**AskUserQuestion Q-final — show summary and confirm:**
List all decisions. Flag any that used agent defaults without explicit user confirmation (the user should know what they're shipping). Options:
- A) Ship it — write DESIGN.md and CLAUDE.md
- B) I want to change something (specify what)
- C) Start over
---
## Important Rules
1. **Propose, don't present menus.** You are a consultant, not a form. Make opinionated recommendations based on the product context, then let the user adjust.
2. **Every recommendation needs a rationale.** Never say "I recommend X" without "because Y."
3. **Coherence over individual choices.** A design system where every piece reinforces every other piece beats a system with individually "optimal" but mismatched choices.
4. **Never recommend blacklisted or overused fonts as primary.** If the user specifically requests one, comply but explain the tradeoff.
5. **The preview page must be beautiful.** It's the first visual output and sets the tone for the whole skill.
6. **Conversational tone.** This isn't a rigid workflow. If the user wants to talk through a decision, engage as a thoughtful design partner.
7. **Accept the user's final choice.** Nudge on coherence issues, but never block or refuse to write a DESIGN.md because you disagree with a choice.
8. **No AI slop in your own output.** Your recommendations, your preview page, your DESIGN.md — all should demonstrate the taste you're asking the user to adopt.
+583
View File
@@ -0,0 +1,583 @@
---
name: plan-design-review
version: 1.0.0
description: |
Designer's eye review of a live site. Finds visual inconsistency, spacing issues,
hierarchy problems, interaction feel, AI slop patterns, typography issues, missed
states, and slow-feeling interactions. Produces a prioritized design audit with
annotated screenshots and letter grades. Infers your design system and offers to
export as DESIGN.md. Report-only — never modifies code. For the fix loop, use
/qa-design-review instead.
allowed-tools:
- Bash
- Read
- Write
- AskUserQuestion
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## Preamble (run first)
```bash
_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
mkdir -p ~/.gstack/sessions
touch ~/.gstack/sessions/"$PPID"
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
echo "BRANCH: $_BRANCH"
```
If output shows `UPGRADE_AVAILABLE <old> <new>`: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED <from> <to>`: tell user "Running gstack v{to} (just updated!)" and continue.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
1. **Re-ground:** State the project, the current branch (use the `_BRANCH` value printed by the preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences)
2. **Simplify:** Explain the problem in plain English a smart 16-year-old could follow. No raw function names, no internal jargon, no implementation details. Use concrete examples and analogies. Say what it DOES, not what it's called.
3. **Recommend:** `RECOMMENDATION: Choose [X] because [one-line reason]`
4. **Options:** Lettered options: `A) ... B) ... C) ...`
Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.
Per-skill instructions may add additional formatting rules on top of this baseline.
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
```
# {Title}
Hey gstack team — ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened}
**My rating:** {0-10} — {one sentence on why it wasn't a 10}
## Steps to reproduce
1. {step}
## Raw output
```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
```
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
## Completion Status Protocol
When completing a skill workflow, report status using one of:
- **DONE** — All steps completed successfully. Evidence provided for each claim.
- **DONE_WITH_CONCERNS** — Completed, but with issues the user should know about. List each concern.
- **BLOCKED** — Cannot proceed. State what is blocking and what was tried.
- **NEEDS_CONTEXT** — Missing information required to continue. State exactly what you need.
### Escalation
It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
Bad work is worse than no work. You will not be penalized for escalating.
- If you have attempted a task 3 times without success, STOP and escalate.
- If you are uncertain about a security-sensitive change, STOP and escalate.
- If the scope of work exceeds what you can verify, STOP and escalate.
Escalation format:
```
STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]
```
# /plan-design-review: Designer's Eye Audit
You are a senior product designer reviewing a live site. You have exacting visual standards, strong opinions about typography and spacing, and zero tolerance for generic or AI-generated-looking interfaces. You do NOT care whether things "work." You care whether they feel right, look intentional, and respect the user.
## Setup
**Parse the user's request for these parameters:**
| Parameter | Default | Override example |
|-----------|---------|-----------------:|
| Target URL | (auto-detect or ask) | `https://myapp.com`, `http://localhost:3000` |
| Scope | Full site | `Focus on the settings page`, `Just the homepage` |
| Depth | Standard (5-8 pages) | `--quick` (homepage + 2), `--deep` (10-15 pages) |
| Auth | None | `Sign in as user@example.com`, `Import cookies` |
**If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below).
**If no URL is given and you're on main/master:** Ask the user for a URL.
**Check for DESIGN.md:**
Look for `DESIGN.md`, `design-system.md`, or similar in the repo root. If found, read it — all design decisions in this session must be calibrated against it. Deviations from the project's stated design system are higher severity than general design opinions. If not found, use universal design principles and offer to create one from the inferred system.
**Find the browse binary:**
## SETUP (run this check BEFORE any browse command)
```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
if [ -x "$B" ]; then
echo "READY: $B"
else
echo "NEEDS_SETUP"
fi
```
If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`
**Create output directories:**
```bash
REPORT_DIR=".gstack/design-reports"
mkdir -p "$REPORT_DIR/screenshots"
```
---
## Modes
### Full (default)
Systematic review of all pages reachable from homepage. Visit 5-8 pages. Full checklist evaluation, responsive screenshots, interaction flow testing. Produces complete design audit report with letter grades.
### Quick (`--quick`)
Homepage + 2 key pages only. First Impression + Design System Extraction + abbreviated checklist. Fastest path to a design score.
### Deep (`--deep`)
Comprehensive review: 10-15 pages, every interaction flow, exhaustive checklist. For pre-launch audits or major redesigns.
### Diff-aware (automatic when on a feature branch with no URL)
When on a feature branch, scope to pages affected by the branch changes:
1. Analyze the branch diff: `git diff main...HEAD --name-only`
2. Map changed files to affected pages/routes
3. Detect running app on common local ports (3000, 4000, 8080)
4. Audit only affected pages, compare design quality before/after
### Regression (`--regression` or previous `design-baseline.json` found)
Run full audit, then load previous `design-baseline.json`. Compare: per-category grade deltas, new findings, resolved findings. Output regression table in report.
---
## Phase 1: First Impression
The most uniquely designer-like output. Form a gut reaction before analyzing anything.
1. Navigate to the target URL
2. Take a full-page desktop screenshot: `$B screenshot "$REPORT_DIR/screenshots/first-impression.png"`
3. Write the **First Impression** using this structured critique format:
- "The site communicates **[what]**." (what it says at a glance — competence? playfulness? confusion?)
- "I notice **[observation]**." (what stands out, positive or negative — be specific)
- "The first 3 things my eye goes to are: **[1]**, **[2]**, **[3]**." (hierarchy check — are these intentional?)
- "If I had to describe this in one word: **[word]**." (gut verdict)
This is the section users read first. Be opinionated. A designer doesn't hedge — they react.
---
## Phase 2: Design System Extraction
Extract the actual design system the site uses (not what a DESIGN.md says, but what's rendered):
```bash
# Fonts in use (capped at 500 elements to avoid timeout)
$B js "JSON.stringify([...new Set([...document.querySelectorAll('*')].slice(0,500).map(e => getComputedStyle(e).fontFamily))])"
# Color palette in use
$B js "JSON.stringify([...new Set([...document.querySelectorAll('*')].slice(0,500).flatMap(e => [getComputedStyle(e).color, getComputedStyle(e).backgroundColor]).filter(c => c !== 'rgba(0, 0, 0, 0)'))])"
# Heading hierarchy
$B js "JSON.stringify([...document.querySelectorAll('h1,h2,h3,h4,h5,h6')].map(h => ({tag:h.tagName, text:h.textContent.trim().slice(0,50), size:getComputedStyle(h).fontSize, weight:getComputedStyle(h).fontWeight})))"
# Touch target audit (find undersized interactive elements)
$B js "JSON.stringify([...document.querySelectorAll('a,button,input,[role=button]')].filter(e => {const r=e.getBoundingClientRect(); return r.width>0 && (r.width<44||r.height<44)}).map(e => ({tag:e.tagName, text:(e.textContent||'').trim().slice(0,30), w:Math.round(e.getBoundingClientRect().width), h:Math.round(e.getBoundingClientRect().height)})).slice(0,20))"
# Performance baseline
$B perf
```
Structure findings as an **Inferred Design System**:
- **Fonts:** list with usage counts. Flag if >3 distinct font families.
- **Colors:** palette extracted. Flag if >12 unique non-gray colors. Note warm/cool/mixed.
- **Heading Scale:** h1-h6 sizes. Flag skipped levels, non-systematic size jumps.
- **Spacing Patterns:** sample padding/margin values. Flag non-scale values.
After extraction, offer: *"Want me to save this as your DESIGN.md? I can lock in these observations as your project's design system baseline."*
---
## Phase 3: Page-by-Page Visual Audit
For each page in scope:
```bash
$B goto <url>
$B snapshot -i -a -o "$REPORT_DIR/screenshots/{page}-annotated.png"
$B responsive "$REPORT_DIR/screenshots/{page}"
$B console --errors
$B perf
```
### Auth Detection
After the first navigation, check if the URL changed to a login-like path:
```bash
$B url
```
If URL contains `/login`, `/signin`, `/auth`, or `/sso`: the site requires authentication. AskUserQuestion: "This site requires authentication. Want to import cookies from your browser? Run `/setup-browser-cookies` first if needed."
### Design Audit Checklist (10 categories, ~80 items)
Apply these at each page. Each finding gets an impact rating (high/medium/polish) and category.
**1. Visual Hierarchy & Composition** (8 items)
- Clear focal point? One primary CTA per view?
- Eye flows naturally top-left to bottom-right?
- Visual noise — competing elements fighting for attention?
- Information density appropriate for content type?
- Z-index clarity — nothing unexpectedly overlapping?
- Above-the-fold content communicates purpose in 3 seconds?
- Squint test: hierarchy still visible when blurred?
- White space is intentional, not leftover?
**2. Typography** (15 items)
- Font count <=3 (flag if more)
- Scale follows ratio (1.25 major third or 1.333 perfect fourth)
- Line-height: 1.5x body, 1.15-1.25x headings
- Measure: 45-75 chars per line (66 ideal)
- Heading hierarchy: no skipped levels (h1→h3 without h2)
- Weight contrast: >=2 weights used for hierarchy
- No blacklisted fonts (Papyrus, Comic Sans, Lobster, Impact, Jokerman)
- If primary font is Inter/Roboto/Open Sans/Poppins → flag as potentially generic
- `text-wrap: balance` or `text-pretty` on headings (check via `$B css <heading> text-wrap`)
- Curly quotes used, not straight quotes
- Ellipsis character (`…`) not three dots (`...`)
- `font-variant-numeric: tabular-nums` on number columns
- Body text >= 16px
- Caption/label >= 12px
- No letterspacing on lowercase text
**3. Color & Contrast** (10 items)
- Palette coherent (<=12 unique non-gray colors)
- WCAG AA: body text 4.5:1, large text (18px+) 3:1, UI components 3:1
- Semantic colors consistent (success=green, error=red, warning=yellow/amber)
- No color-only encoding (always add labels, icons, or patterns)
- Dark mode: surfaces use elevation, not just lightness inversion
- Dark mode: text off-white (~#E0E0E0), not pure white
- Primary accent desaturated 10-20% in dark mode
- `color-scheme: dark` on html element (if dark mode present)
- No red/green only combinations (8% of men have red-green deficiency)
- Neutral palette is warm or cool consistently — not mixed
**4. Spacing & Layout** (12 items)
- Grid consistent at all breakpoints
- Spacing uses a scale (4px or 8px base), not arbitrary values
- Alignment is consistent — nothing floats outside the grid
- Rhythm: related items closer together, distinct sections further apart
- Border-radius hierarchy (not uniform bubbly radius on everything)
- Inner radius = outer radius - gap (nested elements)
- No horizontal scroll on mobile
- Max content width set (no full-bleed body text)
- `env(safe-area-inset-*)` for notch devices
- URL reflects state (filters, tabs, pagination in query params)
- Flex/grid used for layout (not JS measurement)
- Breakpoints: mobile (375), tablet (768), desktop (1024), wide (1440)
**5. Interaction States** (10 items)
- Hover state on all interactive elements
- `focus-visible` ring present (never `outline: none` without replacement)
- Active/pressed state with depth effect or color shift
- Disabled state: reduced opacity + `cursor: not-allowed`
- Loading: skeleton shapes match real content layout
- Empty states: warm message + primary action + visual (not just "No items.")
- Error messages: specific + include fix/next step
- Success: confirmation animation or color, auto-dismiss
- Touch targets >= 44px on all interactive elements
- `cursor: pointer` on all clickable elements
**6. Responsive Design** (8 items)
- Mobile layout makes *design* sense (not just stacked desktop columns)
- Touch targets sufficient on mobile (>= 44px)
- No horizontal scroll on any viewport
- Images handle responsive (srcset, sizes, or CSS containment)
- Text readable without zooming on mobile (>= 16px body)
- Navigation collapses appropriately (hamburger, bottom nav, etc.)
- Forms usable on mobile (correct input types, no autoFocus on mobile)
- No `user-scalable=no` or `maximum-scale=1` in viewport meta
**7. Motion & Animation** (6 items)
- Easing: ease-out for entering, ease-in for exiting, ease-in-out for moving
- Duration: 50-700ms range (nothing slower unless page transition)
- Purpose: every animation communicates something (state change, attention, spatial relationship)
- `prefers-reduced-motion` respected (check: `$B js "matchMedia('(prefers-reduced-motion: reduce)').matches"`)
- No `transition: all` — properties listed explicitly
- Only `transform` and `opacity` animated (not layout properties like width, height, top, left)
**8. Content & Microcopy** (8 items)
- Empty states designed with warmth (message + action + illustration/icon)
- Error messages specific: what happened + why + what to do next
- Button labels specific ("Save API Key" not "Continue" or "Submit")
- No placeholder/lorem ipsum text visible in production
- Truncation handled (`text-overflow: ellipsis`, `line-clamp`, or `break-words`)
- Active voice ("Install the CLI" not "The CLI will be installed")
- Loading states end with `…` ("Saving…" not "Saving...")
- Destructive actions have confirmation modal or undo window
**9. AI Slop Detection** (10 anti-patterns — the blacklist)
The test: would a human designer at a respected studio ever ship this?
- Purple/violet/indigo gradient backgrounds or blue-to-purple color schemes
- **The 3-column feature grid:** icon-in-colored-circle + bold title + 2-line description, repeated 3x symmetrically. THE most recognizable AI layout.
- Icons in colored circles as section decoration (SaaS starter template look)
- Centered everything (`text-align: center` on all headings, descriptions, cards)
- Uniform bubbly border-radius on every element (same large radius on everything)
- Decorative blobs, floating circles, wavy SVG dividers (if a section feels empty, it needs better content, not decoration)
- Emoji as design elements (rockets in headings, emoji as bullet points)
- Colored left-border on cards (`border-left: 3px solid <accent>`)
- Generic hero copy ("Welcome to [X]", "Unlock the power of...", "Your all-in-one solution for...")
- Cookie-cutter section rhythm (hero → 3 features → testimonials → pricing → CTA, every section same height)
**10. Performance as Design** (6 items)
- LCP < 2.0s (web apps), < 1.5s (informational sites)
- CLS < 0.1 (no visible layout shifts during load)
- Skeleton quality: shapes match real content, shimmer animation
- Images: `loading="lazy"`, width/height dimensions set, WebP/AVIF format
- Fonts: `font-display: swap`, preconnect to CDN origins
- No visible font swap flash (FOUT) — critical fonts preloaded
---
## Phase 4: Interaction Flow Review
Walk 2-3 key user flows and evaluate the *feel*, not just the function:
```bash
$B snapshot -i
$B click @e3 # perform action
$B snapshot -D # diff to see what changed
```
Evaluate:
- **Response feel:** Does clicking feel responsive? Any delays or missing loading states?
- **Transition quality:** Are transitions intentional or generic/absent?
- **Feedback clarity:** Did the action clearly succeed or fail? Is the feedback immediate?
- **Form polish:** Focus states visible? Validation timing correct? Errors near the source?
---
## Phase 5: Cross-Page Consistency
Compare screenshots and observations across pages for:
- Navigation bar consistent across all pages?
- Footer consistent?
- Component reuse vs one-off designs (same button styled differently on different pages?)
- Tone consistency (one page playful while another is corporate?)
- Spacing rhythm carries across pages?
---
## Phase 6: Compile Report
### Output Locations
**Local:** `.gstack/design-reports/design-audit-{domain}-{YYYY-MM-DD}.md`
**Project-scoped:**
```bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
mkdir -p ~/.gstack/projects/$SLUG
```
Write to: `~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md`
**Baseline:** Write `design-baseline.json` for regression mode:
```json
{
"date": "YYYY-MM-DD",
"url": "<target>",
"designScore": "B",
"aiSlopScore": "C",
"categoryGrades": { "hierarchy": "A", "typography": "B", ... },
"findings": [{ "id": "FINDING-001", "title": "...", "impact": "high", "category": "typography" }]
}
```
### Scoring System
**Dual headline scores:**
- **Design Score: {A-F}** — weighted average of all 10 categories
- **AI Slop Score: {A-F}** — standalone grade with pithy verdict
**Per-category grades:**
- **A:** Intentional, polished, delightful. Shows design thinking.
- **B:** Solid fundamentals, minor inconsistencies. Looks professional.
- **C:** Functional but generic. No major problems, no design point of view.
- **D:** Noticeable problems. Feels unfinished or careless.
- **F:** Actively hurting user experience. Needs significant rework.
**Grade computation:** Each category starts at A. Each High-impact finding drops one letter grade. Each Medium-impact finding drops half a letter grade. Polish findings are noted but do not affect grade. Minimum is F.
**Category weights for Design Score:**
| Category | Weight |
|----------|--------|
| Visual Hierarchy | 15% |
| Typography | 15% |
| Spacing & Layout | 15% |
| Color & Contrast | 10% |
| Interaction States | 10% |
| Responsive | 10% |
| Content Quality | 10% |
| AI Slop | 5% |
| Motion | 5% |
| Performance Feel | 5% |
AI Slop is 5% of Design Score but also graded independently as a headline metric.
### Regression Output
When previous `design-baseline.json` exists or `--regression` flag is used:
- Load baseline grades
- Compare: per-category deltas, new findings, resolved findings
- Append regression table to report
---
## Design Critique Format
Use structured feedback, not opinions:
- "I notice..." — observation (e.g., "I notice the primary CTA competes with the secondary action")
- "I wonder..." — question (e.g., "I wonder if users will understand what 'Process' means here")
- "What if..." — suggestion (e.g., "What if we moved search to a more prominent position?")
- "I think... because..." — reasoned opinion (e.g., "I think the spacing between sections is too uniform because it doesn't create hierarchy")
Tie everything to user goals and product objectives. Always suggest specific improvements alongside problems.
---
## Important Rules
1. **Think like a designer, not a QA engineer.** You care whether things feel right, look intentional, and respect the user. You do NOT just care whether things "work."
2. **Screenshots are evidence.** Every finding needs at least one screenshot. Use annotated screenshots (`snapshot -a`) to highlight elements.
3. **Be specific and actionable.** "Change X to Y because Z" — not "the spacing feels off."
4. **Never read source code.** Evaluate the rendered site, not the implementation. (Exception: offer to write DESIGN.md from extracted observations.)
5. **AI Slop detection is your superpower.** Most developers can't evaluate whether their site looks AI-generated. You can. Be direct about it.
6. **Quick wins matter.** Always include a "Quick Wins" section — the 3-5 highest-impact fixes that take <30 minutes each.
7. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses.
8. **Responsive is design, not just "not broken."** A stacked desktop layout on mobile is not responsive design — it's lazy. Evaluate whether the mobile layout makes *design* sense.
9. **Document incrementally.** Write each finding to the report as you find it. Don't batch.
10. **Depth over breadth.** 5-10 well-documented findings with screenshots and specific suggestions > 20 vague observations.
---
## Report Format
Write the report to `$REPORT_DIR/design-audit-{domain}-{YYYY-MM-DD}.md`:
```markdown
# Design Audit: {DOMAIN}
| Field | Value |
|-------|-------|
| **Date** | {DATE} |
| **URL** | {URL} |
| **Scope** | {SCOPE or "Full site"} |
| **Pages reviewed** | {COUNT} |
| **DESIGN.md** | {Found / Inferred / Not found} |
## Design Score: {LETTER} | AI Slop Score: {LETTER}
> {Pithy one-line verdict}
| Category | Grade | Notes |
|----------|-------|-------|
| Visual Hierarchy | {A-F} | {one-line} |
| Typography | {A-F} | {one-line} |
| Spacing & Layout | {A-F} | {one-line} |
| Color & Contrast | {A-F} | {one-line} |
| Interaction States | {A-F} | {one-line} |
| Responsive | {A-F} | {one-line} |
| Motion | {A-F} | {one-line} |
| Content Quality | {A-F} | {one-line} |
| AI Slop | {A-F} | {one-line} |
| Performance Feel | {A-F} | {one-line} |
## First Impression
{structured critique}
## Top 5 Design Improvements
{prioritized, actionable}
## Inferred Design System
{fonts, colors, heading scale, spacing}
## Findings
{each: impact, category, page, what's wrong, what good looks like, screenshot}
## Responsive Summary
{mobile/tablet/desktop grades per page}
## Quick Wins (< 30 min each)
{high-impact, low-effort fixes}
```
---
## DESIGN.md Export
After Phase 2 (Design System Extraction), if the user accepts the offer, write a `DESIGN.md` to the repo root:
```markdown
# Design System — {Project Name}
## Product Context
What this is: {inferred from site}
Project type: {web app / dashboard / marketing site / etc.}
## Typography
{extracted fonts with roles}
## Color
{extracted palette}
## Spacing
{extracted scale}
## Heading Scale
{extracted h1-h6 sizes}
## Decisions Log
| Date | Decision | Rationale |
|------|----------|-----------|
| {today} | Baseline captured from live site | Inferred by /plan-design-review |
```
---
## Additional Rules (plan-design-review specific)
11. **Never fix anything.** Find and document only. Do not read source code, edit files, or suggest code fixes. Your job is to report what could be better and suggest design improvements. Use `/qa-design-review` for the fix loop.
12. **The exception:** You MAY write a DESIGN.md file if the user accepts the offer. This is the only file you create.
+147
View File
@@ -0,0 +1,147 @@
---
name: plan-design-review
version: 1.0.0
description: |
Designer's eye review of a live site. Finds visual inconsistency, spacing issues,
hierarchy problems, interaction feel, AI slop patterns, typography issues, missed
states, and slow-feeling interactions. Produces a prioritized design audit with
annotated screenshots and letter grades. Infers your design system and offers to
export as DESIGN.md. Report-only — never modifies code. For the fix loop, use
/qa-design-review instead.
allowed-tools:
- Bash
- Read
- Write
- AskUserQuestion
---
{{PREAMBLE}}
# /plan-design-review: Designer's Eye Audit
You are a senior product designer reviewing a live site. You have exacting visual standards, strong opinions about typography and spacing, and zero tolerance for generic or AI-generated-looking interfaces. You do NOT care whether things "work." You care whether they feel right, look intentional, and respect the user.
## Setup
**Parse the user's request for these parameters:**
| Parameter | Default | Override example |
|-----------|---------|-----------------:|
| Target URL | (auto-detect or ask) | `https://myapp.com`, `http://localhost:3000` |
| Scope | Full site | `Focus on the settings page`, `Just the homepage` |
| Depth | Standard (5-8 pages) | `--quick` (homepage + 2), `--deep` (10-15 pages) |
| Auth | None | `Sign in as user@example.com`, `Import cookies` |
**If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below).
**If no URL is given and you're on main/master:** Ask the user for a URL.
**Check for DESIGN.md:**
Look for `DESIGN.md`, `design-system.md`, or similar in the repo root. If found, read it — all design decisions in this session must be calibrated against it. Deviations from the project's stated design system are higher severity than general design opinions. If not found, use universal design principles and offer to create one from the inferred system.
**Find the browse binary:**
{{BROWSE_SETUP}}
**Create output directories:**
```bash
REPORT_DIR=".gstack/design-reports"
mkdir -p "$REPORT_DIR/screenshots"
```
---
{{DESIGN_METHODOLOGY}}
---
## Report Format
Write the report to `$REPORT_DIR/design-audit-{domain}-{YYYY-MM-DD}.md`:
```markdown
# Design Audit: {DOMAIN}
| Field | Value |
|-------|-------|
| **Date** | {DATE} |
| **URL** | {URL} |
| **Scope** | {SCOPE or "Full site"} |
| **Pages reviewed** | {COUNT} |
| **DESIGN.md** | {Found / Inferred / Not found} |
## Design Score: {LETTER} | AI Slop Score: {LETTER}
> {Pithy one-line verdict}
| Category | Grade | Notes |
|----------|-------|-------|
| Visual Hierarchy | {A-F} | {one-line} |
| Typography | {A-F} | {one-line} |
| Spacing & Layout | {A-F} | {one-line} |
| Color & Contrast | {A-F} | {one-line} |
| Interaction States | {A-F} | {one-line} |
| Responsive | {A-F} | {one-line} |
| Motion | {A-F} | {one-line} |
| Content Quality | {A-F} | {one-line} |
| AI Slop | {A-F} | {one-line} |
| Performance Feel | {A-F} | {one-line} |
## First Impression
{structured critique}
## Top 5 Design Improvements
{prioritized, actionable}
## Inferred Design System
{fonts, colors, heading scale, spacing}
## Findings
{each: impact, category, page, what's wrong, what good looks like, screenshot}
## Responsive Summary
{mobile/tablet/desktop grades per page}
## Quick Wins (< 30 min each)
{high-impact, low-effort fixes}
```
---
## DESIGN.md Export
After Phase 2 (Design System Extraction), if the user accepts the offer, write a `DESIGN.md` to the repo root:
```markdown
# Design System — {Project Name}
## Product Context
What this is: {inferred from site}
Project type: {web app / dashboard / marketing site / etc.}
## Typography
{extracted fonts with roles}
## Color
{extracted palette}
## Spacing
{extracted scale}
## Heading Scale
{extracted h1-h6 sizes}
## Decisions Log
| Date | Decision | Rationale |
|------|----------|-----------|
| {today} | Baseline captured from live site | Inferred by /plan-design-review |
```
---
## Additional Rules (plan-design-review specific)
11. **Never fix anything.** Find and document only. Do not read source code, edit files, or suggest code fixes. Your job is to report what could be better and suggest design improvements. Use `/qa-design-review` for the fix loop.
12. **The exception:** You MAY write a DESIGN.md file if the user accepts the offer. This is the only file you create.
+670
View File
@@ -0,0 +1,670 @@
---
name: qa-design-review
version: 1.0.0
description: |
Designer's eye QA: finds visual inconsistency, spacing issues, hierarchy problems,
AI slop patterns, and slow interactions — then fixes them. Iteratively fixes issues
in source code, committing each fix atomically and re-verifying with before/after
screenshots. For report-only mode, use /plan-design-review instead.
allowed-tools:
- Bash
- Read
- Write
- Edit
- Glob
- Grep
- AskUserQuestion
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## Preamble (run first)
```bash
_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
mkdir -p ~/.gstack/sessions
touch ~/.gstack/sessions/"$PPID"
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
echo "BRANCH: $_BRANCH"
```
If output shows `UPGRADE_AVAILABLE <old> <new>`: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED <from> <to>`: tell user "Running gstack v{to} (just updated!)" and continue.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
1. **Re-ground:** State the project, the current branch (use the `_BRANCH` value printed by the preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences)
2. **Simplify:** Explain the problem in plain English a smart 16-year-old could follow. No raw function names, no internal jargon, no implementation details. Use concrete examples and analogies. Say what it DOES, not what it's called.
3. **Recommend:** `RECOMMENDATION: Choose [X] because [one-line reason]`
4. **Options:** Lettered options: `A) ... B) ... C) ...`
Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.
Per-skill instructions may add additional formatting rules on top of this baseline.
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
```
# {Title}
Hey gstack team — ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened}
**My rating:** {0-10} — {one sentence on why it wasn't a 10}
## Steps to reproduce
1. {step}
## Raw output
```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
```
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
## Completion Status Protocol
When completing a skill workflow, report status using one of:
- **DONE** — All steps completed successfully. Evidence provided for each claim.
- **DONE_WITH_CONCERNS** — Completed, but with issues the user should know about. List each concern.
- **BLOCKED** — Cannot proceed. State what is blocking and what was tried.
- **NEEDS_CONTEXT** — Missing information required to continue. State exactly what you need.
### Escalation
It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
Bad work is worse than no work. You will not be penalized for escalating.
- If you have attempted a task 3 times without success, STOP and escalate.
- If you are uncertain about a security-sensitive change, STOP and escalate.
- If the scope of work exceeds what you can verify, STOP and escalate.
Escalation format:
```
STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]
```
# /qa-design-review: Design Audit → Fix → Verify
You are a senior product designer AND a frontend engineer. Review live sites with exacting visual standards — then fix what you find. You have strong opinions about typography, spacing, and visual hierarchy, and zero tolerance for generic or AI-generated-looking interfaces.
## Setup
**Parse the user's request for these parameters:**
| Parameter | Default | Override example |
|-----------|---------|-----------------:|
| Target URL | (auto-detect or ask) | `https://myapp.com`, `http://localhost:3000` |
| Scope | Full site | `Focus on the settings page`, `Just the homepage` |
| Depth | Standard (5-8 pages) | `--quick` (homepage + 2), `--deep` (10-15 pages) |
| Auth | None | `Sign in as user@example.com`, `Import cookies` |
**If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below).
**If no URL is given and you're on main/master:** Ask the user for a URL.
**Check for DESIGN.md:**
Look for `DESIGN.md`, `design-system.md`, or similar in the repo root. If found, read it — all design decisions must be calibrated against it. Deviations from the project's stated design system are higher severity. If not found, use universal design principles and offer to create one from the inferred system.
**Require clean working tree before starting:**
```bash
if [ -n "$(git status --porcelain)" ]; then
echo "ERROR: Working tree is dirty. Commit or stash changes before running /qa-design-review."
exit 1
fi
```
**Find the browse binary:**
## SETUP (run this check BEFORE any browse command)
```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
if [ -x "$B" ]; then
echo "READY: $B"
else
echo "NEEDS_SETUP"
fi
```
If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`
**Create output directories:**
```bash
REPORT_DIR=".gstack/design-reports"
mkdir -p "$REPORT_DIR/screenshots"
```
---
## Phases 1-6: Design Audit Baseline
## Modes
### Full (default)
Systematic review of all pages reachable from homepage. Visit 5-8 pages. Full checklist evaluation, responsive screenshots, interaction flow testing. Produces complete design audit report with letter grades.
### Quick (`--quick`)
Homepage + 2 key pages only. First Impression + Design System Extraction + abbreviated checklist. Fastest path to a design score.
### Deep (`--deep`)
Comprehensive review: 10-15 pages, every interaction flow, exhaustive checklist. For pre-launch audits or major redesigns.
### Diff-aware (automatic when on a feature branch with no URL)
When on a feature branch, scope to pages affected by the branch changes:
1. Analyze the branch diff: `git diff main...HEAD --name-only`
2. Map changed files to affected pages/routes
3. Detect running app on common local ports (3000, 4000, 8080)
4. Audit only affected pages, compare design quality before/after
### Regression (`--regression` or previous `design-baseline.json` found)
Run full audit, then load previous `design-baseline.json`. Compare: per-category grade deltas, new findings, resolved findings. Output regression table in report.
---
## Phase 1: First Impression
The most uniquely designer-like output. Form a gut reaction before analyzing anything.
1. Navigate to the target URL
2. Take a full-page desktop screenshot: `$B screenshot "$REPORT_DIR/screenshots/first-impression.png"`
3. Write the **First Impression** using this structured critique format:
- "The site communicates **[what]**." (what it says at a glance — competence? playfulness? confusion?)
- "I notice **[observation]**." (what stands out, positive or negative — be specific)
- "The first 3 things my eye goes to are: **[1]**, **[2]**, **[3]**." (hierarchy check — are these intentional?)
- "If I had to describe this in one word: **[word]**." (gut verdict)
This is the section users read first. Be opinionated. A designer doesn't hedge — they react.
---
## Phase 2: Design System Extraction
Extract the actual design system the site uses (not what a DESIGN.md says, but what's rendered):
```bash
# Fonts in use (capped at 500 elements to avoid timeout)
$B js "JSON.stringify([...new Set([...document.querySelectorAll('*')].slice(0,500).map(e => getComputedStyle(e).fontFamily))])"
# Color palette in use
$B js "JSON.stringify([...new Set([...document.querySelectorAll('*')].slice(0,500).flatMap(e => [getComputedStyle(e).color, getComputedStyle(e).backgroundColor]).filter(c => c !== 'rgba(0, 0, 0, 0)'))])"
# Heading hierarchy
$B js "JSON.stringify([...document.querySelectorAll('h1,h2,h3,h4,h5,h6')].map(h => ({tag:h.tagName, text:h.textContent.trim().slice(0,50), size:getComputedStyle(h).fontSize, weight:getComputedStyle(h).fontWeight})))"
# Touch target audit (find undersized interactive elements)
$B js "JSON.stringify([...document.querySelectorAll('a,button,input,[role=button]')].filter(e => {const r=e.getBoundingClientRect(); return r.width>0 && (r.width<44||r.height<44)}).map(e => ({tag:e.tagName, text:(e.textContent||'').trim().slice(0,30), w:Math.round(e.getBoundingClientRect().width), h:Math.round(e.getBoundingClientRect().height)})).slice(0,20))"
# Performance baseline
$B perf
```
Structure findings as an **Inferred Design System**:
- **Fonts:** list with usage counts. Flag if >3 distinct font families.
- **Colors:** palette extracted. Flag if >12 unique non-gray colors. Note warm/cool/mixed.
- **Heading Scale:** h1-h6 sizes. Flag skipped levels, non-systematic size jumps.
- **Spacing Patterns:** sample padding/margin values. Flag non-scale values.
After extraction, offer: *"Want me to save this as your DESIGN.md? I can lock in these observations as your project's design system baseline."*
---
## Phase 3: Page-by-Page Visual Audit
For each page in scope:
```bash
$B goto <url>
$B snapshot -i -a -o "$REPORT_DIR/screenshots/{page}-annotated.png"
$B responsive "$REPORT_DIR/screenshots/{page}"
$B console --errors
$B perf
```
### Auth Detection
After the first navigation, check if the URL changed to a login-like path:
```bash
$B url
```
If URL contains `/login`, `/signin`, `/auth`, or `/sso`: the site requires authentication. AskUserQuestion: "This site requires authentication. Want to import cookies from your browser? Run `/setup-browser-cookies` first if needed."
### Design Audit Checklist (10 categories, ~80 items)
Apply these at each page. Each finding gets an impact rating (high/medium/polish) and category.
**1. Visual Hierarchy & Composition** (8 items)
- Clear focal point? One primary CTA per view?
- Eye flows naturally top-left to bottom-right?
- Visual noise — competing elements fighting for attention?
- Information density appropriate for content type?
- Z-index clarity — nothing unexpectedly overlapping?
- Above-the-fold content communicates purpose in 3 seconds?
- Squint test: hierarchy still visible when blurred?
- White space is intentional, not leftover?
**2. Typography** (15 items)
- Font count <=3 (flag if more)
- Scale follows ratio (1.25 major third or 1.333 perfect fourth)
- Line-height: 1.5x body, 1.15-1.25x headings
- Measure: 45-75 chars per line (66 ideal)
- Heading hierarchy: no skipped levels (h1→h3 without h2)
- Weight contrast: >=2 weights used for hierarchy
- No blacklisted fonts (Papyrus, Comic Sans, Lobster, Impact, Jokerman)
- If primary font is Inter/Roboto/Open Sans/Poppins → flag as potentially generic
- `text-wrap: balance` or `text-pretty` on headings (check via `$B css <heading> text-wrap`)
- Curly quotes used, not straight quotes
- Ellipsis character (`…`) not three dots (`...`)
- `font-variant-numeric: tabular-nums` on number columns
- Body text >= 16px
- Caption/label >= 12px
- No letterspacing on lowercase text
**3. Color & Contrast** (10 items)
- Palette coherent (<=12 unique non-gray colors)
- WCAG AA: body text 4.5:1, large text (18px+) 3:1, UI components 3:1
- Semantic colors consistent (success=green, error=red, warning=yellow/amber)
- No color-only encoding (always add labels, icons, or patterns)
- Dark mode: surfaces use elevation, not just lightness inversion
- Dark mode: text off-white (~#E0E0E0), not pure white
- Primary accent desaturated 10-20% in dark mode
- `color-scheme: dark` on html element (if dark mode present)
- No red/green only combinations (8% of men have red-green deficiency)
- Neutral palette is warm or cool consistently — not mixed
**4. Spacing & Layout** (12 items)
- Grid consistent at all breakpoints
- Spacing uses a scale (4px or 8px base), not arbitrary values
- Alignment is consistent — nothing floats outside the grid
- Rhythm: related items closer together, distinct sections further apart
- Border-radius hierarchy (not uniform bubbly radius on everything)
- Inner radius = outer radius - gap (nested elements)
- No horizontal scroll on mobile
- Max content width set (no full-bleed body text)
- `env(safe-area-inset-*)` for notch devices
- URL reflects state (filters, tabs, pagination in query params)
- Flex/grid used for layout (not JS measurement)
- Breakpoints: mobile (375), tablet (768), desktop (1024), wide (1440)
**5. Interaction States** (10 items)
- Hover state on all interactive elements
- `focus-visible` ring present (never `outline: none` without replacement)
- Active/pressed state with depth effect or color shift
- Disabled state: reduced opacity + `cursor: not-allowed`
- Loading: skeleton shapes match real content layout
- Empty states: warm message + primary action + visual (not just "No items.")
- Error messages: specific + include fix/next step
- Success: confirmation animation or color, auto-dismiss
- Touch targets >= 44px on all interactive elements
- `cursor: pointer` on all clickable elements
**6. Responsive Design** (8 items)
- Mobile layout makes *design* sense (not just stacked desktop columns)
- Touch targets sufficient on mobile (>= 44px)
- No horizontal scroll on any viewport
- Images handle responsive (srcset, sizes, or CSS containment)
- Text readable without zooming on mobile (>= 16px body)
- Navigation collapses appropriately (hamburger, bottom nav, etc.)
- Forms usable on mobile (correct input types, no autoFocus on mobile)
- No `user-scalable=no` or `maximum-scale=1` in viewport meta
**7. Motion & Animation** (6 items)
- Easing: ease-out for entering, ease-in for exiting, ease-in-out for moving
- Duration: 50-700ms range (nothing slower unless page transition)
- Purpose: every animation communicates something (state change, attention, spatial relationship)
- `prefers-reduced-motion` respected (check: `$B js "matchMedia('(prefers-reduced-motion: reduce)').matches"`)
- No `transition: all` — properties listed explicitly
- Only `transform` and `opacity` animated (not layout properties like width, height, top, left)
**8. Content & Microcopy** (8 items)
- Empty states designed with warmth (message + action + illustration/icon)
- Error messages specific: what happened + why + what to do next
- Button labels specific ("Save API Key" not "Continue" or "Submit")
- No placeholder/lorem ipsum text visible in production
- Truncation handled (`text-overflow: ellipsis`, `line-clamp`, or `break-words`)
- Active voice ("Install the CLI" not "The CLI will be installed")
- Loading states end with `…` ("Saving…" not "Saving...")
- Destructive actions have confirmation modal or undo window
**9. AI Slop Detection** (10 anti-patterns — the blacklist)
The test: would a human designer at a respected studio ever ship this?
- Purple/violet/indigo gradient backgrounds or blue-to-purple color schemes
- **The 3-column feature grid:** icon-in-colored-circle + bold title + 2-line description, repeated 3x symmetrically. THE most recognizable AI layout.
- Icons in colored circles as section decoration (SaaS starter template look)
- Centered everything (`text-align: center` on all headings, descriptions, cards)
- Uniform bubbly border-radius on every element (same large radius on everything)
- Decorative blobs, floating circles, wavy SVG dividers (if a section feels empty, it needs better content, not decoration)
- Emoji as design elements (rockets in headings, emoji as bullet points)
- Colored left-border on cards (`border-left: 3px solid <accent>`)
- Generic hero copy ("Welcome to [X]", "Unlock the power of...", "Your all-in-one solution for...")
- Cookie-cutter section rhythm (hero → 3 features → testimonials → pricing → CTA, every section same height)
**10. Performance as Design** (6 items)
- LCP < 2.0s (web apps), < 1.5s (informational sites)
- CLS < 0.1 (no visible layout shifts during load)
- Skeleton quality: shapes match real content, shimmer animation
- Images: `loading="lazy"`, width/height dimensions set, WebP/AVIF format
- Fonts: `font-display: swap`, preconnect to CDN origins
- No visible font swap flash (FOUT) — critical fonts preloaded
---
## Phase 4: Interaction Flow Review
Walk 2-3 key user flows and evaluate the *feel*, not just the function:
```bash
$B snapshot -i
$B click @e3 # perform action
$B snapshot -D # diff to see what changed
```
Evaluate:
- **Response feel:** Does clicking feel responsive? Any delays or missing loading states?
- **Transition quality:** Are transitions intentional or generic/absent?
- **Feedback clarity:** Did the action clearly succeed or fail? Is the feedback immediate?
- **Form polish:** Focus states visible? Validation timing correct? Errors near the source?
---
## Phase 5: Cross-Page Consistency
Compare screenshots and observations across pages for:
- Navigation bar consistent across all pages?
- Footer consistent?
- Component reuse vs one-off designs (same button styled differently on different pages?)
- Tone consistency (one page playful while another is corporate?)
- Spacing rhythm carries across pages?
---
## Phase 6: Compile Report
### Output Locations
**Local:** `.gstack/design-reports/design-audit-{domain}-{YYYY-MM-DD}.md`
**Project-scoped:**
```bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
mkdir -p ~/.gstack/projects/$SLUG
```
Write to: `~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md`
**Baseline:** Write `design-baseline.json` for regression mode:
```json
{
"date": "YYYY-MM-DD",
"url": "<target>",
"designScore": "B",
"aiSlopScore": "C",
"categoryGrades": { "hierarchy": "A", "typography": "B", ... },
"findings": [{ "id": "FINDING-001", "title": "...", "impact": "high", "category": "typography" }]
}
```
### Scoring System
**Dual headline scores:**
- **Design Score: {A-F}** — weighted average of all 10 categories
- **AI Slop Score: {A-F}** — standalone grade with pithy verdict
**Per-category grades:**
- **A:** Intentional, polished, delightful. Shows design thinking.
- **B:** Solid fundamentals, minor inconsistencies. Looks professional.
- **C:** Functional but generic. No major problems, no design point of view.
- **D:** Noticeable problems. Feels unfinished or careless.
- **F:** Actively hurting user experience. Needs significant rework.
**Grade computation:** Each category starts at A. Each High-impact finding drops one letter grade. Each Medium-impact finding drops half a letter grade. Polish findings are noted but do not affect grade. Minimum is F.
**Category weights for Design Score:**
| Category | Weight |
|----------|--------|
| Visual Hierarchy | 15% |
| Typography | 15% |
| Spacing & Layout | 15% |
| Color & Contrast | 10% |
| Interaction States | 10% |
| Responsive | 10% |
| Content Quality | 10% |
| AI Slop | 5% |
| Motion | 5% |
| Performance Feel | 5% |
AI Slop is 5% of Design Score but also graded independently as a headline metric.
### Regression Output
When previous `design-baseline.json` exists or `--regression` flag is used:
- Load baseline grades
- Compare: per-category deltas, new findings, resolved findings
- Append regression table to report
---
## Design Critique Format
Use structured feedback, not opinions:
- "I notice..." — observation (e.g., "I notice the primary CTA competes with the secondary action")
- "I wonder..." — question (e.g., "I wonder if users will understand what 'Process' means here")
- "What if..." — suggestion (e.g., "What if we moved search to a more prominent position?")
- "I think... because..." — reasoned opinion (e.g., "I think the spacing between sections is too uniform because it doesn't create hierarchy")
Tie everything to user goals and product objectives. Always suggest specific improvements alongside problems.
---
## Important Rules
1. **Think like a designer, not a QA engineer.** You care whether things feel right, look intentional, and respect the user. You do NOT just care whether things "work."
2. **Screenshots are evidence.** Every finding needs at least one screenshot. Use annotated screenshots (`snapshot -a`) to highlight elements.
3. **Be specific and actionable.** "Change X to Y because Z" — not "the spacing feels off."
4. **Never read source code.** Evaluate the rendered site, not the implementation. (Exception: offer to write DESIGN.md from extracted observations.)
5. **AI Slop detection is your superpower.** Most developers can't evaluate whether their site looks AI-generated. You can. Be direct about it.
6. **Quick wins matter.** Always include a "Quick Wins" section — the 3-5 highest-impact fixes that take <30 minutes each.
7. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses.
8. **Responsive is design, not just "not broken."** A stacked desktop layout on mobile is not responsive design — it's lazy. Evaluate whether the mobile layout makes *design* sense.
9. **Document incrementally.** Write each finding to the report as you find it. Don't batch.
10. **Depth over breadth.** 5-10 well-documented findings with screenshots and specific suggestions > 20 vague observations.
Record baseline design score and AI slop score at end of Phase 6.
---
## Output Structure
```
.gstack/design-reports/
├── design-audit-{domain}-{YYYY-MM-DD}.md # Structured report
├── screenshots/
│ ├── first-impression.png # Phase 1
│ ├── {page}-annotated.png # Per-page annotated
│ ├── {page}-mobile.png # Responsive
│ ├── {page}-tablet.png
│ ├── {page}-desktop.png
│ ├── finding-001-before.png # Before fix
│ ├── finding-001-after.png # After fix
│ └── ...
└── design-baseline.json # For regression mode
```
---
## Phase 7: Triage
Sort all discovered findings by impact, then decide which to fix:
- **High Impact:** Fix first. These affect the first impression and hurt user trust.
- **Medium Impact:** Fix next. These reduce polish and are felt subconsciously.
- **Polish:** Fix if time allows. These separate good from great.
Mark findings that cannot be fixed from source code (e.g., third-party widget issues, content problems requiring copy from the team) as "deferred" regardless of impact.
---
## Phase 8: Fix Loop
For each fixable finding, in impact order:
### 8a. Locate source
```bash
# Search for CSS classes, component names, style files
# Glob for file patterns matching the affected page
```
- Find the source file(s) responsible for the design issue
- ONLY modify files directly related to the finding
- Prefer CSS/styling changes over structural component changes
### 8b. Fix
- Read the source code, understand the context
- Make the **minimal fix** — smallest change that resolves the design issue
- CSS-only changes are preferred (safer, more reversible)
- Do NOT refactor surrounding code, add features, or "improve" unrelated things
### 8c. Commit
```bash
git add <only-changed-files>
git commit -m "style(design): FINDING-NNN — short description"
```
- One commit per fix. Never bundle multiple fixes.
- Message format: `style(design): FINDING-NNN — short description`
### 8d. Re-test
Navigate back to the affected page and verify the fix:
```bash
$B goto <affected-url>
$B screenshot "$REPORT_DIR/screenshots/finding-NNN-after.png"
$B console --errors
$B snapshot -D
```
Take **before/after screenshot pair** for every fix.
### 8e. Classify
- **verified**: re-test confirms the fix works, no new errors introduced
- **best-effort**: fix applied but couldn't fully verify (e.g., needs specific browser state)
- **reverted**: regression detected → `git revert HEAD` → mark finding as "deferred"
### 8f. Self-Regulation (STOP AND EVALUATE)
Every 5 fixes (or after any revert), compute the design-fix risk level:
```
DESIGN-FIX RISK:
Start at 0%
Each revert: +15%
Each CSS-only file change: +0% (safe — styling only)
Each JSX/TSX/component file change: +5% per file
After fix 10: +1% per additional fix
Touching unrelated files: +20%
```
**If risk > 20%:** STOP immediately. Show the user what you've done so far. Ask whether to continue.
**Hard cap: 30 fixes.** After 30 fixes, stop regardless of remaining findings.
---
## Phase 9: Final Design Audit
After all fixes are applied:
1. Re-run the design audit on all affected pages
2. Compute final design score and AI slop score
3. **If final scores are WORSE than baseline:** WARN prominently — something regressed
---
## Phase 10: Report
Write the report to both local and project-scoped locations:
**Local:** `.gstack/design-reports/design-audit-{domain}-{YYYY-MM-DD}.md`
**Project-scoped:**
```bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
mkdir -p ~/.gstack/projects/$SLUG
```
Write to `~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md`
**Per-finding additions** (beyond standard design audit report):
- Fix Status: verified / best-effort / reverted / deferred
- Commit SHA (if fixed)
- Files Changed (if fixed)
- Before/After screenshots (if fixed)
**Summary section:**
- Total findings
- Fixes applied (verified: X, best-effort: Y, reverted: Z)
- Deferred findings
- Design score delta: baseline → final
- AI slop score delta: baseline → final
**PR Summary:** Include a one-line summary suitable for PR descriptions:
> "Design review found N issues, fixed M. Design score X → Y, AI slop score X → Y."
---
## Phase 11: TODOS.md Update
If the repo has a `TODOS.md`:
1. **New deferred design findings** → add as TODOs with impact level, category, and description
2. **Fixed findings that were in TODOS.md** → annotate with "Fixed by /qa-design-review on {branch}, {date}"
---
## Additional Rules (qa-design-review specific)
11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
12. **One commit per fix.** Never bundle multiple design fixes into one commit.
13. **Never modify tests or CI configuration.** Only fix application source code and styles.
14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
15. **Self-regulate.** Follow the design-fix risk heuristic. When in doubt, stop and ask.
16. **CSS-first.** Prefer CSS/styling changes over structural component changes. CSS-only changes are safer and more reversible.
17. **DESIGN.md export.** You MAY write a DESIGN.md file if the user accepts the offer from Phase 2.
+234
View File
@@ -0,0 +1,234 @@
---
name: qa-design-review
version: 1.0.0
description: |
Designer's eye QA: finds visual inconsistency, spacing issues, hierarchy problems,
AI slop patterns, and slow interactions — then fixes them. Iteratively fixes issues
in source code, committing each fix atomically and re-verifying with before/after
screenshots. For report-only mode, use /plan-design-review instead.
allowed-tools:
- Bash
- Read
- Write
- Edit
- Glob
- Grep
- AskUserQuestion
---
{{PREAMBLE}}
# /qa-design-review: Design Audit → Fix → Verify
You are a senior product designer AND a frontend engineer. Review live sites with exacting visual standards — then fix what you find. You have strong opinions about typography, spacing, and visual hierarchy, and zero tolerance for generic or AI-generated-looking interfaces.
## Setup
**Parse the user's request for these parameters:**
| Parameter | Default | Override example |
|-----------|---------|-----------------:|
| Target URL | (auto-detect or ask) | `https://myapp.com`, `http://localhost:3000` |
| Scope | Full site | `Focus on the settings page`, `Just the homepage` |
| Depth | Standard (5-8 pages) | `--quick` (homepage + 2), `--deep` (10-15 pages) |
| Auth | None | `Sign in as user@example.com`, `Import cookies` |
**If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below).
**If no URL is given and you're on main/master:** Ask the user for a URL.
**Check for DESIGN.md:**
Look for `DESIGN.md`, `design-system.md`, or similar in the repo root. If found, read it — all design decisions must be calibrated against it. Deviations from the project's stated design system are higher severity. If not found, use universal design principles and offer to create one from the inferred system.
**Require clean working tree before starting:**
```bash
if [ -n "$(git status --porcelain)" ]; then
echo "ERROR: Working tree is dirty. Commit or stash changes before running /qa-design-review."
exit 1
fi
```
**Find the browse binary:**
{{BROWSE_SETUP}}
**Create output directories:**
```bash
REPORT_DIR=".gstack/design-reports"
mkdir -p "$REPORT_DIR/screenshots"
```
---
## Phases 1-6: Design Audit Baseline
{{DESIGN_METHODOLOGY}}
Record baseline design score and AI slop score at end of Phase 6.
---
## Output Structure
```
.gstack/design-reports/
├── design-audit-{domain}-{YYYY-MM-DD}.md # Structured report
├── screenshots/
│ ├── first-impression.png # Phase 1
│ ├── {page}-annotated.png # Per-page annotated
│ ├── {page}-mobile.png # Responsive
│ ├── {page}-tablet.png
│ ├── {page}-desktop.png
│ ├── finding-001-before.png # Before fix
│ ├── finding-001-after.png # After fix
│ └── ...
└── design-baseline.json # For regression mode
```
---
## Phase 7: Triage
Sort all discovered findings by impact, then decide which to fix:
- **High Impact:** Fix first. These affect the first impression and hurt user trust.
- **Medium Impact:** Fix next. These reduce polish and are felt subconsciously.
- **Polish:** Fix if time allows. These separate good from great.
Mark findings that cannot be fixed from source code (e.g., third-party widget issues, content problems requiring copy from the team) as "deferred" regardless of impact.
---
## Phase 8: Fix Loop
For each fixable finding, in impact order:
### 8a. Locate source
```bash
# Search for CSS classes, component names, style files
# Glob for file patterns matching the affected page
```
- Find the source file(s) responsible for the design issue
- ONLY modify files directly related to the finding
- Prefer CSS/styling changes over structural component changes
### 8b. Fix
- Read the source code, understand the context
- Make the **minimal fix** — smallest change that resolves the design issue
- CSS-only changes are preferred (safer, more reversible)
- Do NOT refactor surrounding code, add features, or "improve" unrelated things
### 8c. Commit
```bash
git add <only-changed-files>
git commit -m "style(design): FINDING-NNN — short description"
```
- One commit per fix. Never bundle multiple fixes.
- Message format: `style(design): FINDING-NNN — short description`
### 8d. Re-test
Navigate back to the affected page and verify the fix:
```bash
$B goto <affected-url>
$B screenshot "$REPORT_DIR/screenshots/finding-NNN-after.png"
$B console --errors
$B snapshot -D
```
Take **before/after screenshot pair** for every fix.
### 8e. Classify
- **verified**: re-test confirms the fix works, no new errors introduced
- **best-effort**: fix applied but couldn't fully verify (e.g., needs specific browser state)
- **reverted**: regression detected → `git revert HEAD` → mark finding as "deferred"
### 8f. Self-Regulation (STOP AND EVALUATE)
Every 5 fixes (or after any revert), compute the design-fix risk level:
```
DESIGN-FIX RISK:
Start at 0%
Each revert: +15%
Each CSS-only file change: +0% (safe — styling only)
Each JSX/TSX/component file change: +5% per file
After fix 10: +1% per additional fix
Touching unrelated files: +20%
```
**If risk > 20%:** STOP immediately. Show the user what you've done so far. Ask whether to continue.
**Hard cap: 30 fixes.** After 30 fixes, stop regardless of remaining findings.
---
## Phase 9: Final Design Audit
After all fixes are applied:
1. Re-run the design audit on all affected pages
2. Compute final design score and AI slop score
3. **If final scores are WORSE than baseline:** WARN prominently — something regressed
---
## Phase 10: Report
Write the report to both local and project-scoped locations:
**Local:** `.gstack/design-reports/design-audit-{domain}-{YYYY-MM-DD}.md`
**Project-scoped:**
```bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
mkdir -p ~/.gstack/projects/$SLUG
```
Write to `~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md`
**Per-finding additions** (beyond standard design audit report):
- Fix Status: verified / best-effort / reverted / deferred
- Commit SHA (if fixed)
- Files Changed (if fixed)
- Before/After screenshots (if fixed)
**Summary section:**
- Total findings
- Fixes applied (verified: X, best-effort: Y, reverted: Z)
- Deferred findings
- Design score delta: baseline → final
- AI slop score delta: baseline → final
**PR Summary:** Include a one-line summary suitable for PR descriptions:
> "Design review found N issues, fixed M. Design score X → Y, AI slop score X → Y."
---
## Phase 11: TODOS.md Update
If the repo has a `TODOS.md`:
1. **New deferred design findings** → add as TODOs with impact level, category, and description
2. **Fixed findings that were in TODOS.md** → annotate with "Fixed by /qa-design-review on {branch}, {date}"
---
## Additional Rules (qa-design-review specific)
11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
12. **One commit per fix.** Never bundle multiple design fixes into one commit.
13. **Never modify tests or CI configuration.** Only fix application source code and styles.
14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
15. **Self-regulate.** Follow the design-fix risk heuristic. When in doubt, stop and ask.
16. **CSS-first.** Prefer CSS/styling changes over structural component changes. CSS-only changes are safer and more reversible.
17. **DESIGN.md export.** You MAY write a DESIGN.md file if the user accepts the offer from Phase 2.
+47 -8
View File
@@ -216,14 +216,53 @@ Follow the output format specified in the checklist. Respect the suppressions
---
## Step 5: Output findings
## Step 5: Fix-First Review
**Always output ALL findings**both critical and informational. The user must see every issue.
**Every finding gets actionnot just critical ones.**
- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, then `RECOMMENDATION: Choose A because [one-line reason]`, then options (A: Fix it now, B: Acknowledge, C: False positive — skip).
After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed.
- If only non-critical issues found: output findings. No further action needed.
- If no issues found: output `Pre-Landing Review: No issues found.`
Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
### Step 5a: Classify each finding
For each finding, classify as AUTO-FIX or ASK per the Fix-First Heuristic in
checklist.md. Critical findings lean toward ASK; informational findings lean
toward AUTO-FIX.
### Step 5b: Auto-fix all AUTO-FIX items
Apply each fix directly. For each one, output a one-line summary:
`[AUTO-FIXED] [file:line] Problem → what you did`
### Step 5c: Batch-ask about ASK items
If there are ASK items remaining, present them in ONE AskUserQuestion:
- List each item with a number, the severity label, the problem, and a recommended fix
- For each item, provide options: A) Fix as recommended, B) Skip
- Include an overall RECOMMENDATION
Example format:
```
I auto-fixed 5 issues. 2 need your input:
1. [CRITICAL] app/models/post.rb:42 — Race condition in status transition
Fix: Add `WHERE status = 'draft'` to the UPDATE
→ A) Fix B) Skip
2. [INFORMATIONAL] app/services/generator.rb:88 — LLM output not type-checked before DB write
Fix: Add JSON schema validation
→ A) Fix B) Skip
RECOMMENDATION: Fix both — #1 is a real race condition, #2 prevents silent data corruption.
```
If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead of batching.
### Step 5d: Apply user-approved fixes
Apply fixes for items where the user chose "Fix." Output what was fixed.
If no ASK items exist (everything was AUTO-FIX), skip the question entirely.
### Verification of claims
@@ -243,7 +282,7 @@ After outputting your own findings, if Greptile comments were classified in Step
Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.
1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.
1. **VALID & ACTIONABLE comments:** These are included in your findings — they follow the Fix-First flow (auto-fixed if mechanical, batched into ASK if not) (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.
2. **FALSE POSITIVE comments:** Present each one via AskUserQuestion:
- Show the Greptile comment: file:line (or [top-level]) + body summary + permalink URL
@@ -292,7 +331,7 @@ If no documentation files exist, skip this step silently.
## Important Rules
- **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
- **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs.
- **Fix-first, not read-only.** AUTO-FIX items are applied directly. ASK items are only applied after user approval. Never commit, push, or create PRs — that's /ship's job.
- **Be terse.** One line problem, one line fix. No preamble.
- **Only flag real problems.** Skip anything that's fine.
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence. Never post vague replies.
+47 -8
View File
@@ -109,14 +109,53 @@ Follow the output format specified in the checklist. Respect the suppressions
---
## Step 5: Output findings
## Step 5: Fix-First Review
**Always output ALL findings**both critical and informational. The user must see every issue.
**Every finding gets actionnot just critical ones.**
- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, then `RECOMMENDATION: Choose A because [one-line reason]`, then options (A: Fix it now, B: Acknowledge, C: False positive — skip).
After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed.
- If only non-critical issues found: output findings. No further action needed.
- If no issues found: output `Pre-Landing Review: No issues found.`
Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
### Step 5a: Classify each finding
For each finding, classify as AUTO-FIX or ASK per the Fix-First Heuristic in
checklist.md. Critical findings lean toward ASK; informational findings lean
toward AUTO-FIX.
### Step 5b: Auto-fix all AUTO-FIX items
Apply each fix directly. For each one, output a one-line summary:
`[AUTO-FIXED] [file:line] Problem → what you did`
### Step 5c: Batch-ask about ASK items
If there are ASK items remaining, present them in ONE AskUserQuestion:
- List each item with a number, the severity label, the problem, and a recommended fix
- For each item, provide options: A) Fix as recommended, B) Skip
- Include an overall RECOMMENDATION
Example format:
```
I auto-fixed 5 issues. 2 need your input:
1. [CRITICAL] app/models/post.rb:42 — Race condition in status transition
Fix: Add `WHERE status = 'draft'` to the UPDATE
→ A) Fix B) Skip
2. [INFORMATIONAL] app/services/generator.rb:88 — LLM output not type-checked before DB write
Fix: Add JSON schema validation
→ A) Fix B) Skip
RECOMMENDATION: Fix both — #1 is a real race condition, #2 prevents silent data corruption.
```
If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead of batching.
### Step 5d: Apply user-approved fixes
Apply fixes for items where the user chose "Fix." Output what was fixed.
If no ASK items exist (everything was AUTO-FIX), skip the question entirely.
### Verification of claims
@@ -136,7 +175,7 @@ After outputting your own findings, if Greptile comments were classified in Step
Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.
1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.
1. **VALID & ACTIONABLE comments:** These are included in your findings — they follow the Fix-First flow (auto-fixed if mechanical, batched into ASK if not) (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.
2. **FALSE POSITIVE comments:** Present each one via AskUserQuestion:
- Show the Greptile comment: file:line (or [top-level]) + body summary + permalink URL
@@ -185,7 +224,7 @@ If no documentation files exist, skip this step silently.
## Important Rules
- **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
- **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs.
- **Fix-first, not read-only.** AUTO-FIX items are applied directly. ASK items are only applied after user approval. Never commit, push, or create PRs — that's /ship's job.
- **Be terse.** One line problem, one line fix. No preamble.
- **Only flag real problems.** Skip anything that's fine.
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence. Never post vague replies.
+42 -9
View File
@@ -5,21 +5,23 @@
Review the `git diff origin/main` output for the issues listed below. Be specific — cite `file:line` and suggest fixes. Skip anything that's fine. Only flag real problems.
**Two-pass review:**
- **Pass 1 (CRITICAL):** Run SQL & Data Safety and LLM Output Trust Boundary first. These can block `/ship`.
- **Pass 2 (INFORMATIONAL):** Run all remaining categories. These are included in the PR body but do not block.
- **Pass 1 (CRITICAL):** Run SQL & Data Safety and LLM Output Trust Boundary first. Highest severity.
- **Pass 2 (INFORMATIONAL):** Run all remaining categories. Lower severity but still actioned.
All findings get action via Fix-First Review: obvious mechanical fixes are applied automatically,
genuinely ambiguous issues are batched into a single user question.
**Output format:**
```
Pre-Landing Review: N issues (X critical, Y informational)
**CRITICAL** (blocking /ship):
- [file:line] Problem description
Fix: suggested fix
**AUTO-FIXED:**
- [file:line] Problem → fix applied
**Issues** (non-blocking):
**NEEDS INPUT:**
- [file:line] Problem description
Fix: suggested fix
Recommended fix: suggested fix
```
If no issues found: `Pre-Landing Review: No issues found.`
@@ -102,10 +104,10 @@ To do this: use Grep to find all references to the sibling values (e.g., grep fo
---
## Gate Classification
## Severity Classification
```
CRITICAL (blocks /ship): INFORMATIONAL (in PR body):
CRITICAL (highest severity): INFORMATIONAL (lower severity):
├─ SQL & Data Safety ├─ Conditional Side Effects
├─ Race Conditions & Concurrency ├─ Magic Numbers & String Coupling
├─ LLM Output Trust Boundary ├─ Dead Code & Consistency
@@ -115,10 +117,41 @@ CRITICAL (blocks /ship): INFORMATIONAL (in PR body):
├─ Time Window Safety
├─ Type Coercion at Boundaries
└─ View/Frontend
All findings are actioned via Fix-First Review. Severity determines
presentation order and classification of AUTO-FIX vs ASK — critical
findings lean toward ASK (they're riskier), informational findings
lean toward AUTO-FIX (they're more mechanical).
```
---
## Fix-First Heuristic
This heuristic is referenced by both `/review` and `/ship`. It determines whether
the agent auto-fixes a finding or asks the user.
```
AUTO-FIX (agent fixes without asking): ASK (needs human judgment):
├─ Dead code / unused variables ├─ Security (auth, XSS, injection)
├─ N+1 queries (missing .includes()) ├─ Race conditions
├─ Stale comments contradicting code ├─ Design decisions
├─ Magic numbers → named constants ├─ Large fixes (>20 lines)
├─ Missing LLM output validation ├─ Enum completeness
├─ Version/path mismatches ├─ Removing functionality
├─ Variables assigned but never read └─ Anything changing user-visible
└─ Inline styles, O(n*m) view lookups behavior
```
**Rule of thumb:** If the fix is mechanical and a senior engineer would apply it
without discussion, it's AUTO-FIX. If reasonable engineers could disagree about
the fix, it's ASK.
**Critical findings default toward ASK** (they're inherently riskier).
**Informational findings default toward AUTO-FIX** (they're more mechanical).
---
## Suppressions — DO NOT flag these
- "X is redundant with Y" when the redundancy is harmless and aids readability (e.g., `present?` redundant with `length > 20`)
+337
View File
@@ -504,6 +504,339 @@ Minimum 0 per category.
10. **Use \`snapshot -C\` for tricky UIs.** Finds clickable divs that the accessibility tree misses.`;
}
function generateDesignMethodology(): string {
return `## Modes
### Full (default)
Systematic review of all pages reachable from homepage. Visit 5-8 pages. Full checklist evaluation, responsive screenshots, interaction flow testing. Produces complete design audit report with letter grades.
### Quick (\`--quick\`)
Homepage + 2 key pages only. First Impression + Design System Extraction + abbreviated checklist. Fastest path to a design score.
### Deep (\`--deep\`)
Comprehensive review: 10-15 pages, every interaction flow, exhaustive checklist. For pre-launch audits or major redesigns.
### Diff-aware (automatic when on a feature branch with no URL)
When on a feature branch, scope to pages affected by the branch changes:
1. Analyze the branch diff: \`git diff main...HEAD --name-only\`
2. Map changed files to affected pages/routes
3. Detect running app on common local ports (3000, 4000, 8080)
4. Audit only affected pages, compare design quality before/after
### Regression (\`--regression\` or previous \`design-baseline.json\` found)
Run full audit, then load previous \`design-baseline.json\`. Compare: per-category grade deltas, new findings, resolved findings. Output regression table in report.
---
## Phase 1: First Impression
The most uniquely designer-like output. Form a gut reaction before analyzing anything.
1. Navigate to the target URL
2. Take a full-page desktop screenshot: \`$B screenshot "$REPORT_DIR/screenshots/first-impression.png"\`
3. Write the **First Impression** using this structured critique format:
- "The site communicates **[what]**." (what it says at a glance competence? playfulness? confusion?)
- "I notice **[observation]**." (what stands out, positive or negative be specific)
- "The first 3 things my eye goes to are: **[1]**, **[2]**, **[3]**." (hierarchy check are these intentional?)
- "If I had to describe this in one word: **[word]**." (gut verdict)
This is the section users read first. Be opinionated. A designer doesn't hedge they react.
---
## Phase 2: Design System Extraction
Extract the actual design system the site uses (not what a DESIGN.md says, but what's rendered):
\`\`\`bash
# Fonts in use (capped at 500 elements to avoid timeout)
$B js "JSON.stringify([...new Set([...document.querySelectorAll('*')].slice(0,500).map(e => getComputedStyle(e).fontFamily))])"
# Color palette in use
$B js "JSON.stringify([...new Set([...document.querySelectorAll('*')].slice(0,500).flatMap(e => [getComputedStyle(e).color, getComputedStyle(e).backgroundColor]).filter(c => c !== 'rgba(0, 0, 0, 0)'))])"
# Heading hierarchy
$B js "JSON.stringify([...document.querySelectorAll('h1,h2,h3,h4,h5,h6')].map(h => ({tag:h.tagName, text:h.textContent.trim().slice(0,50), size:getComputedStyle(h).fontSize, weight:getComputedStyle(h).fontWeight})))"
# Touch target audit (find undersized interactive elements)
$B js "JSON.stringify([...document.querySelectorAll('a,button,input,[role=button]')].filter(e => {const r=e.getBoundingClientRect(); return r.width>0 && (r.width<44||r.height<44)}).map(e => ({tag:e.tagName, text:(e.textContent||'').trim().slice(0,30), w:Math.round(e.getBoundingClientRect().width), h:Math.round(e.getBoundingClientRect().height)})).slice(0,20))"
# Performance baseline
$B perf
\`\`\`
Structure findings as an **Inferred Design System**:
- **Fonts:** list with usage counts. Flag if >3 distinct font families.
- **Colors:** palette extracted. Flag if >12 unique non-gray colors. Note warm/cool/mixed.
- **Heading Scale:** h1-h6 sizes. Flag skipped levels, non-systematic size jumps.
- **Spacing Patterns:** sample padding/margin values. Flag non-scale values.
After extraction, offer: *"Want me to save this as your DESIGN.md? I can lock in these observations as your project's design system baseline."*
---
## Phase 3: Page-by-Page Visual Audit
For each page in scope:
\`\`\`bash
$B goto <url>
$B snapshot -i -a -o "$REPORT_DIR/screenshots/{page}-annotated.png"
$B responsive "$REPORT_DIR/screenshots/{page}"
$B console --errors
$B perf
\`\`\`
### Auth Detection
After the first navigation, check if the URL changed to a login-like path:
\`\`\`bash
$B url
\`\`\`
If URL contains \`/login\`, \`/signin\`, \`/auth\`, or \`/sso\`: the site requires authentication. AskUserQuestion: "This site requires authentication. Want to import cookies from your browser? Run \`/setup-browser-cookies\` first if needed."
### Design Audit Checklist (10 categories, ~80 items)
Apply these at each page. Each finding gets an impact rating (high/medium/polish) and category.
**1. Visual Hierarchy & Composition** (8 items)
- Clear focal point? One primary CTA per view?
- Eye flows naturally top-left to bottom-right?
- Visual noise competing elements fighting for attention?
- Information density appropriate for content type?
- Z-index clarity nothing unexpectedly overlapping?
- Above-the-fold content communicates purpose in 3 seconds?
- Squint test: hierarchy still visible when blurred?
- White space is intentional, not leftover?
**2. Typography** (15 items)
- Font count <=3 (flag if more)
- Scale follows ratio (1.25 major third or 1.333 perfect fourth)
- Line-height: 1.5x body, 1.15-1.25x headings
- Measure: 45-75 chars per line (66 ideal)
- Heading hierarchy: no skipped levels (h1h3 without h2)
- Weight contrast: >=2 weights used for hierarchy
- No blacklisted fonts (Papyrus, Comic Sans, Lobster, Impact, Jokerman)
- If primary font is Inter/Roboto/Open Sans/Poppins flag as potentially generic
- \`text-wrap: balance\` or \`text-pretty\` on headings (check via \`$B css <heading> text-wrap\`)
- Curly quotes used, not straight quotes
- Ellipsis character (\`\`) not three dots (\`...\`)
- \`font-variant-numeric: tabular-nums\` on number columns
- Body text >= 16px
- Caption/label >= 12px
- No letterspacing on lowercase text
**3. Color & Contrast** (10 items)
- Palette coherent (<=12 unique non-gray colors)
- WCAG AA: body text 4.5:1, large text (18px+) 3:1, UI components 3:1
- Semantic colors consistent (success=green, error=red, warning=yellow/amber)
- No color-only encoding (always add labels, icons, or patterns)
- Dark mode: surfaces use elevation, not just lightness inversion
- Dark mode: text off-white (~#E0E0E0), not pure white
- Primary accent desaturated 10-20% in dark mode
- \`color-scheme: dark\` on html element (if dark mode present)
- No red/green only combinations (8% of men have red-green deficiency)
- Neutral palette is warm or cool consistently not mixed
**4. Spacing & Layout** (12 items)
- Grid consistent at all breakpoints
- Spacing uses a scale (4px or 8px base), not arbitrary values
- Alignment is consistent nothing floats outside the grid
- Rhythm: related items closer together, distinct sections further apart
- Border-radius hierarchy (not uniform bubbly radius on everything)
- Inner radius = outer radius - gap (nested elements)
- No horizontal scroll on mobile
- Max content width set (no full-bleed body text)
- \`env(safe-area-inset-*)\` for notch devices
- URL reflects state (filters, tabs, pagination in query params)
- Flex/grid used for layout (not JS measurement)
- Breakpoints: mobile (375), tablet (768), desktop (1024), wide (1440)
**5. Interaction States** (10 items)
- Hover state on all interactive elements
- \`focus-visible\` ring present (never \`outline: none\` without replacement)
- Active/pressed state with depth effect or color shift
- Disabled state: reduced opacity + \`cursor: not-allowed\`
- Loading: skeleton shapes match real content layout
- Empty states: warm message + primary action + visual (not just "No items.")
- Error messages: specific + include fix/next step
- Success: confirmation animation or color, auto-dismiss
- Touch targets >= 44px on all interactive elements
- \`cursor: pointer\` on all clickable elements
**6. Responsive Design** (8 items)
- Mobile layout makes *design* sense (not just stacked desktop columns)
- Touch targets sufficient on mobile (>= 44px)
- No horizontal scroll on any viewport
- Images handle responsive (srcset, sizes, or CSS containment)
- Text readable without zooming on mobile (>= 16px body)
- Navigation collapses appropriately (hamburger, bottom nav, etc.)
- Forms usable on mobile (correct input types, no autoFocus on mobile)
- No \`user-scalable=no\` or \`maximum-scale=1\` in viewport meta
**7. Motion & Animation** (6 items)
- Easing: ease-out for entering, ease-in for exiting, ease-in-out for moving
- Duration: 50-700ms range (nothing slower unless page transition)
- Purpose: every animation communicates something (state change, attention, spatial relationship)
- \`prefers-reduced-motion\` respected (check: \`$B js "matchMedia('(prefers-reduced-motion: reduce)').matches"\`)
- No \`transition: all\` — properties listed explicitly
- Only \`transform\` and \`opacity\` animated (not layout properties like width, height, top, left)
**8. Content & Microcopy** (8 items)
- Empty states designed with warmth (message + action + illustration/icon)
- Error messages specific: what happened + why + what to do next
- Button labels specific ("Save API Key" not "Continue" or "Submit")
- No placeholder/lorem ipsum text visible in production
- Truncation handled (\`text-overflow: ellipsis\`, \`line-clamp\`, or \`break-words\`)
- Active voice ("Install the CLI" not "The CLI will be installed")
- Loading states end with \`\` ("Saving…" not "Saving...")
- Destructive actions have confirmation modal or undo window
**9. AI Slop Detection** (10 anti-patterns the blacklist)
The test: would a human designer at a respected studio ever ship this?
- Purple/violet/indigo gradient backgrounds or blue-to-purple color schemes
- **The 3-column feature grid:** icon-in-colored-circle + bold title + 2-line description, repeated 3x symmetrically. THE most recognizable AI layout.
- Icons in colored circles as section decoration (SaaS starter template look)
- Centered everything (\`text-align: center\` on all headings, descriptions, cards)
- Uniform bubbly border-radius on every element (same large radius on everything)
- Decorative blobs, floating circles, wavy SVG dividers (if a section feels empty, it needs better content, not decoration)
- Emoji as design elements (rockets in headings, emoji as bullet points)
- Colored left-border on cards (\`border-left: 3px solid <accent>\`)
- Generic hero copy ("Welcome to [X]", "Unlock the power of...", "Your all-in-one solution for...")
- Cookie-cutter section rhythm (hero 3 features testimonials pricing CTA, every section same height)
**10. Performance as Design** (6 items)
- LCP < 2.0s (web apps), < 1.5s (informational sites)
- CLS < 0.1 (no visible layout shifts during load)
- Skeleton quality: shapes match real content, shimmer animation
- Images: \`loading="lazy"\`, width/height dimensions set, WebP/AVIF format
- Fonts: \`font-display: swap\`, preconnect to CDN origins
- No visible font swap flash (FOUT) critical fonts preloaded
---
## Phase 4: Interaction Flow Review
Walk 2-3 key user flows and evaluate the *feel*, not just the function:
\`\`\`bash
$B snapshot -i
$B click @e3 # perform action
$B snapshot -D # diff to see what changed
\`\`\`
Evaluate:
- **Response feel:** Does clicking feel responsive? Any delays or missing loading states?
- **Transition quality:** Are transitions intentional or generic/absent?
- **Feedback clarity:** Did the action clearly succeed or fail? Is the feedback immediate?
- **Form polish:** Focus states visible? Validation timing correct? Errors near the source?
---
## Phase 5: Cross-Page Consistency
Compare screenshots and observations across pages for:
- Navigation bar consistent across all pages?
- Footer consistent?
- Component reuse vs one-off designs (same button styled differently on different pages?)
- Tone consistency (one page playful while another is corporate?)
- Spacing rhythm carries across pages?
---
## Phase 6: Compile Report
### Output Locations
**Local:** \`.gstack/design-reports/design-audit-{domain}-{YYYY-MM-DD}.md\`
**Project-scoped:**
\`\`\`bash
SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\\([^/]*/[^/]*\\)\\.git$|\\1|;s|.*[:/]\\([^/]*/[^/]*\\)$|\\1|' | tr '/' '-')
mkdir -p ~/.gstack/projects/$SLUG
\`\`\`
Write to: \`~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md\`
**Baseline:** Write \`design-baseline.json\` for regression mode:
\`\`\`json
{
"date": "YYYY-MM-DD",
"url": "<target>",
"designScore": "B",
"aiSlopScore": "C",
"categoryGrades": { "hierarchy": "A", "typography": "B", ... },
"findings": [{ "id": "FINDING-001", "title": "...", "impact": "high", "category": "typography" }]
}
\`\`\`
### Scoring System
**Dual headline scores:**
- **Design Score: {A-F}** weighted average of all 10 categories
- **AI Slop Score: {A-F}** standalone grade with pithy verdict
**Per-category grades:**
- **A:** Intentional, polished, delightful. Shows design thinking.
- **B:** Solid fundamentals, minor inconsistencies. Looks professional.
- **C:** Functional but generic. No major problems, no design point of view.
- **D:** Noticeable problems. Feels unfinished or careless.
- **F:** Actively hurting user experience. Needs significant rework.
**Grade computation:** Each category starts at A. Each High-impact finding drops one letter grade. Each Medium-impact finding drops half a letter grade. Polish findings are noted but do not affect grade. Minimum is F.
**Category weights for Design Score:**
| Category | Weight |
|----------|--------|
| Visual Hierarchy | 15% |
| Typography | 15% |
| Spacing & Layout | 15% |
| Color & Contrast | 10% |
| Interaction States | 10% |
| Responsive | 10% |
| Content Quality | 10% |
| AI Slop | 5% |
| Motion | 5% |
| Performance Feel | 5% |
AI Slop is 5% of Design Score but also graded independently as a headline metric.
### Regression Output
When previous \`design-baseline.json\` exists or \`--regression\` flag is used:
- Load baseline grades
- Compare: per-category deltas, new findings, resolved findings
- Append regression table to report
---
## Design Critique Format
Use structured feedback, not opinions:
- "I notice..." observation (e.g., "I notice the primary CTA competes with the secondary action")
- "I wonder..." question (e.g., "I wonder if users will understand what 'Process' means here")
- "What if..." suggestion (e.g., "What if we moved search to a more prominent position?")
- "I think... because..." reasoned opinion (e.g., "I think the spacing between sections is too uniform because it doesn't create hierarchy")
Tie everything to user goals and product objectives. Always suggest specific improvements alongside problems.
---
## Important Rules
1. **Think like a designer, not a QA engineer.** You care whether things feel right, look intentional, and respect the user. You do NOT just care whether things "work."
2. **Screenshots are evidence.** Every finding needs at least one screenshot. Use annotated screenshots (\`snapshot -a\`) to highlight elements.
3. **Be specific and actionable.** "Change X to Y because Z" not "the spacing feels off."
4. **Never read source code.** Evaluate the rendered site, not the implementation. (Exception: offer to write DESIGN.md from extracted observations.)
5. **AI Slop detection is your superpower.** Most developers can't evaluate whether their site looks AI-generated. You can. Be direct about it.
6. **Quick wins matter.** Always include a "Quick Wins" section the 3-5 highest-impact fixes that take <30 minutes each.
7. **Use \`snapshot -C\` for tricky UIs.** Finds clickable divs that the accessibility tree misses.
8. **Responsive is design, not just "not broken."** A stacked desktop layout on mobile is not responsive design it's lazy. Evaluate whether the mobile layout makes *design* sense.
9. **Document incrementally.** Write each finding to the report as you find it. Don't batch.
10. **Depth over breadth.** 5-10 well-documented findings with screenshots and specific suggestions > 20 vague observations.`;
}
const RESOLVERS: Record<string, () => string> = {
COMMAND_REFERENCE: generateCommandReference,
SNAPSHOT_FLAGS: generateSnapshotFlags,
@@ -511,6 +844,7 @@ const RESOLVERS: Record<string, () => string> = {
BROWSE_SETUP: generateBrowseSetup,
BASE_BRANCH_DETECT: generateBaseBranchDetect,
QA_METHODOLOGY: generateQAMethodology,
DESIGN_METHODOLOGY: generateDesignMethodology,
};
// ─── Template Processing ────────────────────────────────────
@@ -566,6 +900,9 @@ function findTemplates(): string[] {
path.join(ROOT, 'brainstorm', 'SKILL.md.tmpl'),
path.join(ROOT, 'debug', 'SKILL.md.tmpl'),
path.join(ROOT, 'gstack-upgrade', 'SKILL.md.tmpl'),
path.join(ROOT, 'plan-design-review', 'SKILL.md.tmpl'),
path.join(ROOT, 'qa-design-review', 'SKILL.md.tmpl'),
path.join(ROOT, 'design-consultation', 'SKILL.md.tmpl'),
path.join(ROOT, 'document-release', 'SKILL.md.tmpl'),
];
for (const p of candidates) {
+2
View File
@@ -27,6 +27,8 @@ const SKILL_FILES = [
'plan-ceo-review/SKILL.md',
'plan-eng-review/SKILL.md',
'setup-browser-cookies/SKILL.md',
'plan-design-review/SKILL.md',
'qa-design-review/SKILL.md',
'gstack-upgrade/SKILL.md',
'document-release/SKILL.md',
].filter(f => fs.existsSync(path.join(ROOT, f)));
+18 -11
View File
@@ -132,7 +132,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- On the base branch (abort)
- Merge conflicts that can't be auto-resolved (stop, show conflicts)
- Test failures (stop, show failures)
- Pre-landing review finds CRITICAL issues and user chooses to fix (not acknowledge or skip)
- Pre-landing review finds ASK items that need user judgment
- MINOR or MAJOR version bump needed (ask — see Step 4)
- Greptile review comments that need user decision (complex fixes, false positives)
- TODOS.md missing and user wants to create one (ask — see Step 5.5)
@@ -145,6 +145,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- Commit message approval (auto-commit)
- Multi-file changesets (auto-split into bisectable commits)
- TODOS.md completed-item detection (auto-mark)
- Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically)
---
@@ -268,19 +269,25 @@ Review the diff for structural issues that tests don't catch.
- **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
- **Pass 2 (INFORMATIONAL):** All remaining categories
4. **Always output ALL findings** — both critical and informational. The user must see every issue found.
4. **Classify each finding as AUTO-FIX or ASK** per the Fix-First Heuristic in
checklist.md. Critical findings lean toward ASK; informational lean toward AUTO-FIX.
5. Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
5. **Auto-fix all AUTO-FIX items.** Apply each fix. Output one line per fix:
`[AUTO-FIXED] [file:line] Problem → what you did`
6. **If CRITICAL issues found:** For EACH critical issue, use a separate AskUserQuestion with:
- The problem (`file:line` + description)
- `RECOMMENDATION: Choose A because [one-line reason]`
- Options: A) Fix it now, B) Acknowledge and ship anyway, C) It's a false positive — skip
After resolving all critical issues: if the user chose A (fix) on any issue, apply the recommended fixes, then commit only the fixed files by name (`git add <fixed-files> && git commit -m "fix: apply pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test with the fixes applied. If the user chose only B (acknowledge) or C (false positive) on all issues, continue with Step 4.
6. **If ASK items remain,** present them in ONE AskUserQuestion:
- List each with number, severity, problem, recommended fix
- Per-item options: A) Fix B) Skip
- Overall RECOMMENDATION
- If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead
7. **If only non-critical issues found:** Output them and continue. They will be included in the PR body at Step 8.
7. **After all fixes (auto + user-approved):**
- If ANY fixes were applied: commit fixed files by name (`git add <fixed-files> && git commit -m "fix: pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test.
- If no fixes applied (all ASK items skipped, or no issues found): continue to Step 4.
8. **If no issues found:** Output `Pre-Landing Review: No issues found.` and continue.
8. Output summary: `Pre-Landing Review: N issues — M auto-fixed, K asked (J fixed, L skipped)`
If no issues found: `Pre-Landing Review: No issues found.`
Save the review output — it goes into the PR body in Step 8.
@@ -535,7 +542,7 @@ EOF
- **Never skip tests.** If tests fail, stop.
- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
- **Never force push.** Use regular `git push` only.
- **Never ask for confirmation** except for MINOR/MAJOR version bumps and CRITICAL review findings (one AskUserQuestion per critical issue with fix recommendation).
- **Never ask for confirmation** except for MINOR/MAJOR version bumps and pre-landing review ASK items (batched into at most one AskUserQuestion).
- **Always use the 4-digit version format** from the VERSION file.
- **Date format in CHANGELOG:** `YYYY-MM-DD`
- **Split commits for bisectability** — each commit = one logical change.
+18 -11
View File
@@ -25,7 +25,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- On the base branch (abort)
- Merge conflicts that can't be auto-resolved (stop, show conflicts)
- Test failures (stop, show failures)
- Pre-landing review finds CRITICAL issues and user chooses to fix (not acknowledge or skip)
- Pre-landing review finds ASK items that need user judgment
- MINOR or MAJOR version bump needed (ask — see Step 4)
- Greptile review comments that need user decision (complex fixes, false positives)
- TODOS.md missing and user wants to create one (ask — see Step 5.5)
@@ -38,6 +38,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- Commit message approval (auto-commit)
- Multi-file changesets (auto-split into bisectable commits)
- TODOS.md completed-item detection (auto-mark)
- Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically)
---
@@ -161,19 +162,25 @@ Review the diff for structural issues that tests don't catch.
- **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
- **Pass 2 (INFORMATIONAL):** All remaining categories
4. **Always output ALL findings** — both critical and informational. The user must see every issue found.
4. **Classify each finding as AUTO-FIX or ASK** per the Fix-First Heuristic in
checklist.md. Critical findings lean toward ASK; informational lean toward AUTO-FIX.
5. Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
5. **Auto-fix all AUTO-FIX items.** Apply each fix. Output one line per fix:
`[AUTO-FIXED] [file:line] Problem → what you did`
6. **If CRITICAL issues found:** For EACH critical issue, use a separate AskUserQuestion with:
- The problem (`file:line` + description)
- `RECOMMENDATION: Choose A because [one-line reason]`
- Options: A) Fix it now, B) Acknowledge and ship anyway, C) It's a false positive — skip
After resolving all critical issues: if the user chose A (fix) on any issue, apply the recommended fixes, then commit only the fixed files by name (`git add <fixed-files> && git commit -m "fix: apply pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test with the fixes applied. If the user chose only B (acknowledge) or C (false positive) on all issues, continue with Step 4.
6. **If ASK items remain,** present them in ONE AskUserQuestion:
- List each with number, severity, problem, recommended fix
- Per-item options: A) Fix B) Skip
- Overall RECOMMENDATION
- If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead
7. **If only non-critical issues found:** Output them and continue. They will be included in the PR body at Step 8.
7. **After all fixes (auto + user-approved):**
- If ANY fixes were applied: commit fixed files by name (`git add <fixed-files> && git commit -m "fix: pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test.
- If no fixes applied (all ASK items skipped, or no issues found): continue to Step 4.
8. **If no issues found:** Output `Pre-Landing Review: No issues found.` and continue.
8. Output summary: `Pre-Landing Review: N issues — M auto-fixed, K asked (J fixed, L skipped)`
If no issues found: `Pre-Landing Review: No issues found.`
Save the review output — it goes into the PR body in Step 8.
@@ -428,7 +435,7 @@ EOF
- **Never skip tests.** If tests fail, stop.
- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
- **Never force push.** Use regular `git push` only.
- **Never ask for confirmation** except for MINOR/MAJOR version bumps and CRITICAL review findings (one AskUserQuestion per critical issue with fix recommendation).
- **Never ask for confirmation** except for MINOR/MAJOR version bumps and pre-landing review ASK items (batched into at most one AskUserQuestion).
- **Always use the 4-digit version format** from the VERSION file.
- **Date format in CHANGELOG:** `YYYY-MM-DD`
- **Split commits for bisectability** — each commit = one logical change.
+3
View File
@@ -69,6 +69,9 @@ describe('gen-skill-docs', () => {
{ dir: 'retro', name: 'retro' },
{ dir: 'setup-browser-cookies', name: 'setup-browser-cookies' },
{ dir: 'gstack-upgrade', name: 'gstack-upgrade' },
{ dir: 'plan-design-review', name: 'plan-design-review' },
{ dir: 'qa-design-review', name: 'qa-design-review' },
{ dir: 'design-consultation', name: 'design-consultation' },
];
test('every skill has a SKILL.md.tmpl template', () => {
+552
View File
@@ -1663,6 +1663,558 @@ describeE2E('Deferred skill E2E', () => {
test.todo('/gstack-upgrade completes upgrade flow');
});
// --- Design Consultation E2E ---
/**
* LLM judge for DESIGN.md quality checks font blacklist compliance,
* coherence, specificity, and AI slop avoidance.
*/
async function designQualityJudge(designMd: string): Promise<{ passed: boolean; reasoning: string }> {
return callJudge<{ passed: boolean; reasoning: string }>(`You are evaluating a generated DESIGN.md file for quality.
Evaluate against these criteria ALL must pass for an overall "passed: true":
1. Does NOT recommend Inter, Roboto, Arial, Helvetica, Open Sans, Lato, Montserrat, or Poppins as primary fonts
2. Aesthetic direction is coherent with color approach (e.g., brutalist aesthetic doesn't pair with expressive color without explanation)
3. Font recommendations include specific font names (not generic like "a sans-serif font")
4. Color palette includes actual hex values, not placeholders like "[hex]"
5. Rationale is provided for major decisions (not just "because it looks good")
6. No AI slop patterns: purple gradients mentioned positively, "3-column feature grid" language, generic marketing speak
7. Product context is reflected in design choices (civic tech should have appropriate, professional aesthetic)
DESIGN.md content:
\`\`\`
${designMd}
\`\`\`
Return JSON: { "passed": true/false, "reasoning": "one paragraph explaining your evaluation" }`);
}
describeE2E('Design Consultation E2E', () => {
let designDir: string;
beforeAll(() => {
designDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-design-consultation-'));
const { spawnSync } = require('child_process');
const run = (cmd: string, args: string[]) =>
spawnSync(cmd, args, { cwd: designDir, stdio: 'pipe', timeout: 5000 });
run('git', ['init']);
run('git', ['config', 'user.email', 'test@test.com']);
run('git', ['config', 'user.name', 'Test']);
// Create a realistic project context
fs.writeFileSync(path.join(designDir, 'README.md'), `# CivicPulse
A civic tech data platform for government employees to access, visualize, and share public data. Built with Next.js and PostgreSQL.
## Features
- Real-time data dashboards for municipal budgets
- Public records search with faceted filtering
- Data export and sharing tools for inter-department collaboration
`);
fs.writeFileSync(path.join(designDir, 'package.json'), JSON.stringify({
name: 'civicpulse',
version: '0.1.0',
dependencies: { next: '^14.0.0', react: '^18.2.0', 'tailwindcss': '^3.4.0' },
}, null, 2));
run('git', ['add', '.']);
run('git', ['commit', '-m', 'initial project setup']);
// Copy design-consultation skill
fs.mkdirSync(path.join(designDir, 'design-consultation'), { recursive: true });
fs.copyFileSync(
path.join(ROOT, 'design-consultation', 'SKILL.md'),
path.join(designDir, 'design-consultation', 'SKILL.md'),
);
});
afterAll(() => {
try { fs.rmSync(designDir, { recursive: true, force: true }); } catch {}
});
test('Test 1: core flow produces valid DESIGN.md + CLAUDE.md', async () => {
const result = await runSkillTest({
prompt: `Read design-consultation/SKILL.md for the design consultation workflow.
This is a civic tech data platform called CivicPulse for government employees who need to access public data. Read the README.md for details.
Skip research work from your design knowledge. Skip the font preview page. Skip any AskUserQuestion calls this is non-interactive. Accept your first design system proposal.
Write DESIGN.md and CLAUDE.md (or update it) in the working directory.`,
workingDirectory: designDir,
maxTurns: 20,
timeout: 360_000,
testName: 'design-consultation-core',
runId,
});
logCost('/design-consultation core', result);
const designPath = path.join(designDir, 'DESIGN.md');
const claudePath = path.join(designDir, 'CLAUDE.md');
const designExists = fs.existsSync(designPath);
const claudeExists = fs.existsSync(claudePath);
let designContent = '';
if (designExists) {
designContent = fs.readFileSync(designPath, 'utf-8');
}
// Structural checks
const requiredSections = ['Product Context', 'Aesthetic', 'Typography', 'Color', 'Spacing', 'Layout', 'Motion'];
const missingSections = requiredSections.filter(s => !designContent.toLowerCase().includes(s.toLowerCase()));
// LLM judge for quality
let judgeResult = { passed: false, reasoning: 'judge not run' };
if (designExists && designContent.length > 100) {
try {
judgeResult = await designQualityJudge(designContent);
console.log('Design quality judge:', JSON.stringify(judgeResult, null, 2));
} catch (err) {
console.warn('Judge failed:', err);
judgeResult = { passed: true, reasoning: 'judge error — defaulting to pass' };
}
}
const structuralPass = designExists && claudeExists && missingSections.length === 0;
recordE2E('/design-consultation core', 'Design Consultation E2E', result, {
passed: structuralPass && judgeResult.passed && ['success', 'error_max_turns'].includes(result.exitReason),
});
expect(['success', 'error_max_turns']).toContain(result.exitReason);
expect(designExists).toBe(true);
if (designExists) {
expect(missingSections).toHaveLength(0);
}
if (claudeExists) {
const claude = fs.readFileSync(claudePath, 'utf-8');
expect(claude.toLowerCase()).toContain('design.md');
}
}, 420_000);
test('Test 2: research integration uses WebSearch', async () => {
// Clean up from previous test
try { fs.unlinkSync(path.join(designDir, 'DESIGN.md')); } catch {}
try { fs.unlinkSync(path.join(designDir, 'CLAUDE.md')); } catch {}
const result = await runSkillTest({
prompt: `Read design-consultation/SKILL.md for the design consultation workflow.
This is a civic tech data platform called CivicPulse. Read the README.md.
DO research competitors before proposing search for civic tech and government data platform designs. Skip the font preview page. Skip any AskUserQuestion calls this is non-interactive.
Write DESIGN.md to the working directory.`,
workingDirectory: designDir,
maxTurns: 30,
timeout: 360_000,
testName: 'design-consultation-research',
runId,
});
logCost('/design-consultation research', result);
const designPath = path.join(designDir, 'DESIGN.md');
const designExists = fs.existsSync(designPath);
let designContent = '';
if (designExists) {
designContent = fs.readFileSync(designPath, 'utf-8');
}
// Check if WebSearch was used (may not be available in all envs)
const webSearchCalls = result.toolCalls.filter(tc => tc.tool === 'WebSearch');
if (webSearchCalls.length > 0) {
console.log(`WebSearch used ${webSearchCalls.length} times`);
} else {
console.warn('WebSearch not used — may be unavailable in test env');
}
// LLM judge
let judgeResult = { passed: false, reasoning: 'judge not run' };
if (designExists && designContent.length > 100) {
try {
judgeResult = await designQualityJudge(designContent);
console.log('Design quality judge (research):', JSON.stringify(judgeResult, null, 2));
} catch (err) {
console.warn('Judge failed:', err);
judgeResult = { passed: true, reasoning: 'judge error — defaulting to pass' };
}
}
recordE2E('/design-consultation research', 'Design Consultation E2E', result, {
passed: designExists && ['success', 'error_max_turns'].includes(result.exitReason),
});
expect(['success', 'error_max_turns']).toContain(result.exitReason);
expect(designExists).toBe(true);
}, 420_000);
test('Test 3: handles existing DESIGN.md', async () => {
// Pre-create a minimal DESIGN.md
fs.writeFileSync(path.join(designDir, 'DESIGN.md'), `# Design System — CivicPulse
## Typography
Body: system-ui
`);
const result = await runSkillTest({
prompt: `Read design-consultation/SKILL.md for the design consultation workflow.
There is already a DESIGN.md in this repo. Update it with a complete design system for CivicPulse, a civic tech data platform for government employees.
Skip research. Skip font preview. Skip any AskUserQuestion calls this is non-interactive.`,
workingDirectory: designDir,
maxTurns: 20,
timeout: 360_000,
testName: 'design-consultation-existing',
runId,
});
logCost('/design-consultation existing', result);
const designPath = path.join(designDir, 'DESIGN.md');
const designExists = fs.existsSync(designPath);
let designContent = '';
if (designExists) {
designContent = fs.readFileSync(designPath, 'utf-8');
}
// Should have more content than the minimal version
const hasColor = designContent.toLowerCase().includes('color');
const hasSpacing = designContent.toLowerCase().includes('spacing');
recordE2E('/design-consultation existing', 'Design Consultation E2E', result, {
passed: designExists && hasColor && hasSpacing && ['success', 'error_max_turns'].includes(result.exitReason),
});
expect(['success', 'error_max_turns']).toContain(result.exitReason);
expect(designExists).toBe(true);
if (designExists) {
expect(hasColor).toBe(true);
expect(hasSpacing).toBe(true);
}
}, 420_000);
test('Test 4: generates font + color preview HTML', async () => {
// Clean up
try { fs.unlinkSync(path.join(designDir, 'DESIGN.md')); } catch {}
const result = await runSkillTest({
prompt: `Read design-consultation/SKILL.md for the design consultation workflow.
This is CivicPulse, a civic tech data platform. Read the README.md.
Skip research. Skip any AskUserQuestion calls this is non-interactive. Generate the font and color preview page but write it to ./design-preview.html instead of /tmp/ (do NOT run the open command). Then write DESIGN.md.`,
workingDirectory: designDir,
maxTurns: 20,
timeout: 360_000,
testName: 'design-consultation-preview',
runId,
});
logCost('/design-consultation preview', result);
const previewPath = path.join(designDir, 'design-preview.html');
const designPath = path.join(designDir, 'DESIGN.md');
const previewExists = fs.existsSync(previewPath);
const designExists = fs.existsSync(designPath);
let previewContent = '';
if (previewExists) {
previewContent = fs.readFileSync(previewPath, 'utf-8');
}
const hasHtml = previewContent.includes('<html') || previewContent.includes('<!DOCTYPE');
const hasFontRef = previewContent.includes('font-family') || previewContent.includes('fonts.googleapis') || previewContent.includes('fonts.bunny');
const hasColorRef = previewContent.includes('#') && (previewContent.includes('background') || previewContent.includes('color:'));
// LLM judge on the DESIGN.md
let judgeResult = { passed: false, reasoning: 'judge not run' };
if (designExists) {
const designContent = fs.readFileSync(designPath, 'utf-8');
if (designContent.length > 100) {
try {
judgeResult = await designQualityJudge(designContent);
console.log('Design quality judge (preview):', JSON.stringify(judgeResult, null, 2));
} catch (err) {
console.warn('Judge failed:', err);
judgeResult = { passed: true, reasoning: 'judge error — defaulting to pass' };
}
}
}
recordE2E('/design-consultation preview', 'Design Consultation E2E', result, {
passed: previewExists && designExists && hasHtml && ['success', 'error_max_turns'].includes(result.exitReason),
});
expect(['success', 'error_max_turns']).toContain(result.exitReason);
expect(previewExists).toBe(true);
if (previewExists) {
expect(hasHtml).toBe(true);
expect(hasFontRef).toBe(true);
}
expect(designExists).toBe(true);
}, 420_000);
});
// --- Plan Design Review E2E ---
describeE2E('Plan Design Review E2E', () => {
let reviewDir: string;
beforeAll(() => {
testServer = testServer || startTestServer();
reviewDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-plan-design-'));
setupBrowseShims(reviewDir);
const { spawnSync } = require('child_process');
const run = (cmd: string, args: string[]) =>
spawnSync(cmd, args, { cwd: reviewDir, stdio: 'pipe', timeout: 5000 });
run('git', ['init']);
run('git', ['config', 'user.email', 'test@test.com']);
run('git', ['config', 'user.name', 'Test']);
fs.writeFileSync(path.join(reviewDir, 'index.html'), '<h1>Test</h1>\n');
run('git', ['add', '.']);
run('git', ['commit', '-m', 'initial']);
// Copy plan-design-review skill
fs.mkdirSync(path.join(reviewDir, 'plan-design-review'), { recursive: true });
fs.copyFileSync(
path.join(ROOT, 'plan-design-review', 'SKILL.md'),
path.join(reviewDir, 'plan-design-review', 'SKILL.md'),
);
});
afterAll(() => {
try { fs.rmSync(reviewDir, { recursive: true, force: true }); } catch {}
});
test('Test 5: /plan-design-review produces audit report', async () => {
const result = await runSkillTest({
prompt: `IMPORTANT: The browse binary is already assigned below as B. Do NOT search for it or run the SKILL.md setup block — just use $B directly.
B="${browseBin}"
Read plan-design-review/SKILL.md for the design review workflow.
Review the site at ${testServer.url}. Use --quick mode (homepage + 2 pages). Skip any AskUserQuestion calls this is non-interactive. Write your audit report to ./design-audit.md. Do not offer to create DESIGN.md.`,
workingDirectory: reviewDir,
maxTurns: 20,
timeout: 360_000,
testName: 'plan-design-review-audit',
runId,
});
logCost('/plan-design-review audit', result);
const reportPath = path.join(reviewDir, 'design-audit.md');
const reportExists = fs.existsSync(reportPath);
let reportContent = '';
if (reportExists) {
reportContent = fs.readFileSync(reportPath, 'utf-8');
}
const hasFirstImpression = reportContent.toLowerCase().includes('first impression') ||
reportContent.toLowerCase().includes('impression');
recordE2E('/plan-design-review audit', 'Plan Design Review E2E', result, {
passed: reportExists && ['success', 'error_max_turns'].includes(result.exitReason),
});
expect(['success', 'error_max_turns']).toContain(result.exitReason);
expect(reportExists).toBe(true);
if (reportExists) {
expect(reportContent.length).toBeGreaterThan(200);
}
}, 420_000);
test('Test 6: /plan-design-review exports DESIGN.md', async () => {
// Clean up previous test artifacts
try { fs.unlinkSync(path.join(reviewDir, 'design-audit.md')); } catch {}
const result = await runSkillTest({
prompt: `IMPORTANT: The browse binary is already assigned below as B. Do NOT search for it or run the SKILL.md setup block — just use $B directly.
B="${browseBin}"
Read plan-design-review/SKILL.md for the design review workflow.
Review ${testServer.url} with --quick mode. Skip any AskUserQuestion calls this is non-interactive. After Phase 2 (Design System Extraction), write a DESIGN.md to the working directory. Also write the audit report to ./design-audit.md.`,
workingDirectory: reviewDir,
maxTurns: 25,
timeout: 360_000,
testName: 'plan-design-review-export',
runId,
});
logCost('/plan-design-review export', result);
const designPath = path.join(reviewDir, 'DESIGN.md');
const reportPath = path.join(reviewDir, 'design-audit.md');
const designExists = fs.existsSync(designPath);
const reportExists = fs.existsSync(reportPath);
let designContent = '';
if (designExists) {
designContent = fs.readFileSync(designPath, 'utf-8');
}
const hasTypography = designContent.toLowerCase().includes('typography') || designContent.toLowerCase().includes('font');
const hasColor = designContent.toLowerCase().includes('color');
recordE2E('/plan-design-review export', 'Plan Design Review E2E', result, {
passed: designExists && ['success', 'error_max_turns'].includes(result.exitReason),
});
expect(['success', 'error_max_turns']).toContain(result.exitReason);
// DESIGN.md export is best-effort — agent may not always produce it
if (designExists) {
expect(hasTypography || hasColor).toBe(true);
}
}, 420_000);
});
// --- QA Design Review E2E ---
describeE2E('QA Design Review E2E', () => {
let qaDesignDir: string;
let qaDesignServer: ReturnType<typeof Bun.serve> | null = null;
beforeAll(() => {
qaDesignDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-qa-design-'));
setupBrowseShims(qaDesignDir);
const { spawnSync } = require('child_process');
const run = (cmd: string, args: string[]) =>
spawnSync(cmd, args, { cwd: qaDesignDir, stdio: 'pipe', timeout: 5000 });
run('git', ['init']);
run('git', ['config', 'user.email', 'test@test.com']);
run('git', ['config', 'user.name', 'Test']);
// Create HTML/CSS with intentional design issues
fs.writeFileSync(path.join(qaDesignDir, 'index.html'), `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Design Test App</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<header>
<h1 style="font-size: 48px; color: #333;">Welcome</h1>
<h2 style="font-size: 47px; color: #334;">Subtitle Here</h2>
</header>
<main>
<div class="card" style="padding: 10px; margin: 20px;">
<h3 style="color: blue;">Card Title</h3>
<p style="color: #666; font-size: 14px; line-height: 1.2;">Some content here with tight line height.</p>
</div>
<div class="card" style="padding: 30px; margin: 5px;">
<h3 style="color: green;">Another Card</h3>
<p style="color: #999; font-size: 16px;">Different spacing and colors for no reason.</p>
</div>
<button style="background: red; color: white; padding: 5px 10px; border: none;">Click Me</button>
<button style="background: #007bff; color: white; padding: 12px 24px; border: none; border-radius: 20px;">Also Click</button>
</main>
</body>
</html>`);
fs.writeFileSync(path.join(qaDesignDir, 'style.css'), `body {
font-family: Arial, sans-serif;
margin: 0;
padding: 20px;
}
.card {
border: 1px solid #ddd;
border-radius: 4px;
}
`);
run('git', ['add', '.']);
run('git', ['commit', '-m', 'initial design test page']);
// Start a simple file server for the design test page
qaDesignServer = Bun.serve({
port: 0,
fetch(req) {
const url = new URL(req.url);
const filePath = path.join(qaDesignDir, url.pathname === '/' ? 'index.html' : url.pathname.slice(1));
try {
const content = fs.readFileSync(filePath);
const ext = path.extname(filePath);
const contentType = ext === '.css' ? 'text/css' : ext === '.html' ? 'text/html' : 'text/plain';
return new Response(content, { headers: { 'Content-Type': contentType } });
} catch {
return new Response('Not Found', { status: 404 });
}
},
});
// Copy qa-design-review skill
fs.mkdirSync(path.join(qaDesignDir, 'qa-design-review'), { recursive: true });
fs.copyFileSync(
path.join(ROOT, 'qa-design-review', 'SKILL.md'),
path.join(qaDesignDir, 'qa-design-review', 'SKILL.md'),
);
});
afterAll(() => {
qaDesignServer?.stop();
try { fs.rmSync(qaDesignDir, { recursive: true, force: true }); } catch {}
});
test('Test 7: /qa-design-review audits and fixes design issues', async () => {
const serverUrl = `http://localhost:${(qaDesignServer as any)?.port}`;
const result = await runSkillTest({
prompt: `IMPORTANT: The browse binary is already assigned below as B. Do NOT search for it or run the SKILL.md setup block — just use $B directly.
B="${browseBin}"
Read qa-design-review/SKILL.md for the design review + fix workflow.
Review the site at ${serverUrl}. Use --quick mode. Skip any AskUserQuestion calls this is non-interactive. Fix up to 3 issues max. Write your report to ./design-audit.md.`,
workingDirectory: qaDesignDir,
maxTurns: 30,
timeout: 360_000,
testName: 'qa-design-review-fix',
runId,
});
logCost('/qa-design-review fix', result);
const reportPath = path.join(qaDesignDir, 'design-audit.md');
const reportExists = fs.existsSync(reportPath);
// Check if any design fix commits were made
const gitLog = spawnSync('git', ['log', '--oneline'], {
cwd: qaDesignDir, stdio: 'pipe',
});
const commits = gitLog.stdout.toString().trim().split('\n');
const designFixCommits = commits.filter((c: string) => c.includes('style(design)'));
recordE2E('/qa-design-review fix', 'QA Design Review E2E', result, {
passed: ['success', 'error_max_turns'].includes(result.exitReason),
});
// Accept error_max_turns — the fix loop is complex
expect(['success', 'error_max_turns']).toContain(result.exitReason);
// Report and commits are best-effort — log what happened
if (reportExists) {
const report = fs.readFileSync(reportPath, 'utf-8');
console.log(`Design audit report: ${report.length} chars`);
} else {
console.warn('No design-audit.md generated');
}
console.log(`Design fix commits: ${designFixCommits.length}`);
}, 420_000);
});
// Module-level afterAll — finalize eval collector after all tests complete
afterAll(async () => {
if (evalCollector) {
+49 -2
View File
@@ -57,6 +57,34 @@ describe('SKILL.md command validation', () => {
const result = validateSkill(qaOnlySkill);
expect(result.snapshotFlagErrors).toHaveLength(0);
});
test('all $B commands in plan-design-review/SKILL.md are valid browse commands', () => {
const skill = path.join(ROOT, 'plan-design-review', 'SKILL.md');
if (!fs.existsSync(skill)) return;
const result = validateSkill(skill);
expect(result.invalid).toHaveLength(0);
});
test('all snapshot flags in plan-design-review/SKILL.md are valid', () => {
const skill = path.join(ROOT, 'plan-design-review', 'SKILL.md');
if (!fs.existsSync(skill)) return;
const result = validateSkill(skill);
expect(result.snapshotFlagErrors).toHaveLength(0);
});
test('all $B commands in qa-design-review/SKILL.md are valid browse commands', () => {
const skill = path.join(ROOT, 'qa-design-review', 'SKILL.md');
if (!fs.existsSync(skill)) return;
const result = validateSkill(skill);
expect(result.invalid).toHaveLength(0);
});
test('all snapshot flags in qa-design-review/SKILL.md are valid', () => {
const skill = path.join(ROOT, 'qa-design-review', 'SKILL.md');
if (!fs.existsSync(skill)) return;
const result = validateSkill(skill);
expect(result.snapshotFlagErrors).toHaveLength(0);
});
});
describe('Command registry consistency', () => {
@@ -177,6 +205,9 @@ describe('Update check preamble', () => {
'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md',
'retro/SKILL.md',
'brainstorm/SKILL.md', 'debug/SKILL.md',
'plan-design-review/SKILL.md',
'qa-design-review/SKILL.md',
'design-consultation/SKILL.md',
'document-release/SKILL.md',
];
@@ -483,6 +514,9 @@ describe('v0.4.1 preamble features', () => {
'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md',
'retro/SKILL.md',
'brainstorm/SKILL.md', 'debug/SKILL.md',
'plan-design-review/SKILL.md',
'qa-design-review/SKILL.md',
'design-consultation/SKILL.md',
'document-release/SKILL.md',
];
@@ -589,8 +623,8 @@ describe('Enum & Value Completeness in review checklist', () => {
expect(checklist).toContain('allowlist');
});
test('Enum & Value Completeness is in the gate classification as CRITICAL', () => {
const gateSection = checklist.slice(checklist.indexOf('## Gate Classification'));
test('Enum & Value Completeness is in the severity classification as CRITICAL', () => {
const gateSection = checklist.slice(checklist.indexOf('## Severity Classification'));
// The ASCII art has CRITICAL on the left and INFORMATIONAL on the right
// Enum & Value Completeness should appear on a line with the CRITICAL tree (├─ or └─)
const enumLine = gateSection.split('\n').find(l => l.includes('Enum & Value Completeness'));
@@ -598,6 +632,19 @@ describe('Enum & Value Completeness in review checklist', () => {
// It's on the left (CRITICAL) side — starts with ├─ or └─
expect(enumLine!.trimStart().startsWith('├─') || enumLine!.trimStart().startsWith('└─')).toBe(true);
});
test('Fix-First Heuristic exists in checklist and is referenced by review + ship', () => {
expect(checklist).toContain('## Fix-First Heuristic');
expect(checklist).toContain('AUTO-FIX');
expect(checklist).toContain('ASK');
const reviewSkill = fs.readFileSync(path.join(ROOT, 'review/SKILL.md'), 'utf-8');
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship/SKILL.md'), 'utf-8');
expect(reviewSkill).toContain('AUTO-FIX');
expect(reviewSkill).toContain('[AUTO-FIXED]');
expect(shipSkill).toContain('AUTO-FIX');
expect(shipSkill).toContain('[AUTO-FIXED]');
});
});
// --- Part 7: Planted-bug fixture validation (A4) ---