From 7e96fe299b085010fb2e34d9c4fbfc7e44b617e1 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Mon, 13 Apr 2026 07:49:37 -1000 Subject: [PATCH 1/2] =?UTF-8?q?fix:=20security=20wave=203=20=E2=80=94=2012?= =?UTF-8?q?=20fixes,=207=20contributors=20(v0.16.4.0)=20(#988)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * fix(security): validateOutputPath symlink bypass — check file-level symlinks validateOutputPath() previously only resolved symlinks on the parent directory. A symlink at /tmp/evil.png → /etc/crontab passed the parent check (parent is /tmp, which is safe) but the write followed the symlink outside safe dirs. Add lstatSync() check: if the target file exists and is a symlink, resolve through it and verify the real target is within SAFE_DIRECTORIES. ENOENT (file doesn't exist yet) falls through to the existing parent-dir check. Closes #921 Co-Authored-By: Yunsu * fix(security): shell injection in bin/ scripts — use env vars instead of interpolation gstack-settings-hook interpolated $SETTINGS_FILE directly into bun -e double-quoted blocks. A path containing quotes or backticks breaks the JS string context, enabling arbitrary code execution. Replace direct interpolation with environment variables (process.env). Same fix applied to gstack-team-init which had the same pattern. Systematic audit confirmed only these two scripts were vulnerable — all other bin/ scripts already use stdin piping or env vars. Closes #858 Co-Authored-By: Gus * fix(security): cookie-import path validation bypass + hardcoded /tmp Two fixes: 1. cookie-import relative path bypass (#707): path.isAbsolute() gated the entire validation, so relative paths like "sensitive-file.json" bypassed the safe-directory check entirely. Now always resolves to absolute path with realpathSync for symlink resolution, matching validateOutputPath(). 2. Hardcoded /tmp in cookie-import-browser (#708): openDbFromCopy used /tmp directly instead of os.tmpdir(), breaking Windows support. Also adds explicit imports for SAFE_DIRECTORIES and isPathWithin in write-commands.ts (previously resolved implicitly through bundler). Closes #852 Co-Authored-By: Toby Morning * fix(security): redact form fields with sensitive names, not just type=password Form redaction only applied to type="password" fields. Hidden and text fields named csrf_token, api_key, session_id, etc. were exposed unredacted in LLM context, leaking secrets. Extend redaction to check field name and id against sensitive patterns: token, secret, key, password, credential, auth, jwt, session, csrf, sid, api_key. Uses the same pattern style as SENSITIVE_COOKIE_NAME. Closes #860 Co-Authored-By: Gus * fix(security): restrict session file permissions to owner-only Design session files written to /tmp with default umask (0644) were world-readable on shared systems. Sessions contain design prompts and feedback history. Set mode 0o600 (owner read/write only) on both create and update paths. Closes #859 Co-Authored-By: Gus * fix(security): enforce frozen lockfile during setup bun install without --frozen-lockfile resolves ^semver ranges from npm on every run. If an attacker publishes a compromised compatible version of any dependency, the next ./setup pulls it silently. Add --frozen-lockfile with fallback to plain install (for fresh clones where bun.lock may not exist yet). Matches the pattern already used in the .agents/ generation block (line 237). Closes #614 Co-Authored-By: Alberto Martinez * fix: remove duplicate recursive chmod on /tmp in Dockerfile.ci chmod -R 1777 /tmp recursively sets sticky bit on files (no defined behavior), not just the directory. Deduplicate to single chmod 1777 /tmp. Closes #747 Co-Authored-By: Maksim Soltan * fix(security): learnings input validation + cross-project trust gate Three fixes to the learnings system: 1. Input validation in gstack-learnings-log: type must be from allowed list, key must be alphanumeric, confidence must be 1-10 integer, source must be from allowed list. Prevents injection via malformed fields. 2. Prompt injection defense: insight field checked against 10 instruction-like patterns (ignore previous, system:, override, etc.). Rejected with clear error message. 3. Cross-project trust gate in gstack-learnings-search: AI-generated learnings from other projects are filtered out. Only user-stated learnings cross project boundaries. Prevents silent prompt injection across codebases. Also adds trusted field (true for user-stated source, false for AI-generated) to enable the trust gate at read time. Closes #841 Co-Authored-By: Ziad Al Sharif * feat(security): track cookie-imported domains and scope cookie imports Foundation for origin-pinned JS execution (#616). Tracks which domains cookies were imported from so the JS/eval commands can verify execution stays within imported origins. Changes: - BrowserManager: new cookieImportedDomains Set with track/get/has methods - cookie-import: tracks imported cookie domains after addCookies - cookie-import-browser: tracks domains on --domain direct import - cookie-import-browser --all: new explicit opt-in for all-domain import (previously implicit behavior, now requires deliberate flag) Closes #615 Co-Authored-By: Alberto Martinez * feat(security): pin JS/eval execution to cookie-imported origins When cookies have been imported for specific domains, block JS execution on pages whose origin doesn't match. Prevents the attack chain: 1. Agent imports cookies for github.com 2. Prompt injection navigates to attacker.com 3. Agent runs js document.cookie → exfiltrates github cookies assertJsOriginAllowed() checks the current page hostname against imported cookie domains with subdomain matching (.github.com allows api.github.com). When no cookies are imported, all origins allowed (nothing to protect). about:blank and data: URIs are allowed (no cookies at risk). Depends on #615 (cookie domain tracking). Closes #616 Co-Authored-By: Alberto Martinez * feat(security): add persistent command audit log Append-only JSONL audit trail for all browse server commands. Unlike in-memory ring buffers, the audit log persists across restarts and is never truncated. Each entry records: timestamp, command, args (truncated to 200 chars), page origin, duration, status, error (truncated to 300 chars), hasCookies flag, connection mode. All writes are best-effort — audit failures never block command execution. Log stored at ~/.gstack/.browse/browse-audit.jsonl. Closes #617 Co-Authored-By: Alberto Martinez * fix(security): block hex-encoded IPv4-mapped IPv6 metadata bypass URL constructor normalizes ::ffff:169.254.169.254 to ::ffff:a9fe:a9fe (hex form), which was not in the blocklist. Similarly, ::169.254.169.254 normalizes to ::a9fe:a9fe. Add both hex-encoded forms to BLOCKED_METADATA_HOSTS so they're caught by the direct hostname check in validateNavigationUrl. Closes #739 Co-Authored-By: Osman Mehmood * chore: bump version and changelog (v0.16.4.0) Security wave 3: 12 fixes, 7 contributors. Cookie origin pinning, command audit log, domain tracking. Symlink bypass, path validation, shell injection, form redaction, learnings injection, IPv6 SSRF, session permissions, frozen lockfile. Co-Authored-By: Claude Opus 4.6 (1M context) --------- Co-authored-by: Yunsu Co-authored-by: Gus Co-authored-by: Toby Morning Co-authored-by: Alberto Martinez Co-authored-by: Maksim Soltan Co-authored-by: Ziad Al Sharif Co-authored-by: Osman Mehmood Co-authored-by: Claude Opus 4.6 (1M context) --- .github/docker/Dockerfile.ci | 3 +- CHANGELOG.md | 24 +++++++++ VERSION | 2 +- bin/gstack-learnings-log | 82 ++++++++++++++++++++++++----- bin/gstack-learnings-search | 8 ++- bin/gstack-settings-hook | 10 ++-- bin/gstack-team-init | 4 +- browse/src/audit.ts | 65 +++++++++++++++++++++++ browse/src/browser-manager.ts | 16 ++++++ browse/src/config.ts | 2 + browse/src/cookie-import-browser.ts | 3 +- browse/src/path-security.ts | 21 +++++++- browse/src/read-commands.ts | 43 ++++++++++++++- browse/src/server.ts | 33 ++++++++++-- browse/src/url-validation.ts | 2 + browse/src/write-commands.ts | 56 +++++++++++++++----- browse/test/commands.test.ts | 3 +- design/src/session.ts | 2 +- setup | 2 +- 19 files changed, 333 insertions(+), 48 deletions(-) create mode 100644 browse/src/audit.ts diff --git a/.github/docker/Dockerfile.ci b/.github/docker/Dockerfile.ci index 038b2576..1048bb47 100644 --- a/.github/docker/Dockerfile.ci +++ b/.github/docker/Dockerfile.ci @@ -59,5 +59,4 @@ RUN useradd -m -s /bin/bash runner \ && chmod -R a+rX /opt/node_modules_cache \ && mkdir -p /home/runner/.gstack && chown -R runner:runner /home/runner/.gstack \ && chmod 1777 /tmp \ - && mkdir -p /home/runner/.bun && chown -R runner:runner /home/runner/.bun \ - && chmod -R 1777 /tmp + && mkdir -p /home/runner/.bun && chown -R runner:runner /home/runner/.bun diff --git a/CHANGELOG.md b/CHANGELOG.md index 1b5965ca..061888ff 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,29 @@ # Changelog +## [0.16.4.0] - 2026-04-13 + +### Added +- **Cookie origin pinning.** When you import cookies for specific domains, JS execution is now blocked on pages that don't match those domains. This prevents the attack where a prompt injection navigates to an attacker's site and runs `document.cookie` to steal your imported cookies. Subdomain matching works automatically (importing `.github.com` allows `api.github.com`). When no cookies are imported, everything works as before. 3 PRs from @halbert04. +- **Command audit log.** Every browse command now gets a persistent forensic trail in `~/.gstack/.browse/browse-audit.jsonl`. Timestamp, command, args, page origin, duration, status, error, and whether cookies were imported. Append-only, never truncated, survives server restarts. Best-effort writes that never block command execution. From @halbert04. +- **Cookie domain tracking.** gstack now tracks which domains cookies were imported from. Foundation for origin pinning above. Direct imports via `--domain` track automatically. New `--all` flag makes full-browser cookie import an explicit opt-in instead of the default. + +### Fixed +- **Symlink bypass in file writes.** `validateOutputPath` only checked the parent directory for symlinks, not the file itself. A symlink at `/tmp/evil.png` pointing to `/etc/crontab` passed validation because the parent `/tmp` was safe. Now checks the file with `lstatSync` before writing. From @Hybirdss. +- **Cookie-import path bypass.** Two issues: relative paths bypassed all validation (the `path.isAbsolute()` gate let `sensitive-file.json` through), and symlink resolution was missing (`path.resolve` without `realpathSync`). Now resolves to absolute, resolves symlinks, and checks against safe directories. From @urbantech. +- **Shell injection in setup scripts.** `gstack-settings-hook` interpolated file paths directly into `bun -e` JavaScript blocks. A path with quotes broke the JS string context. Now uses environment variables (`process.env`). Systematic audit confirmed only this script was vulnerable. From @garagon. +- **Form field credential leak.** Snapshot redaction only applied to `type="password"` fields. Hidden and text fields named `csrf_token`, `api_key`, `session_id` were exposed unredacted in LLM context. Now checks field name and id against sensitive patterns. From @garagon. +- **Learnings prompt injection.** Three fixes: input validation (type/key/confidence allowlists), injection pattern detection in insight field (blocks "ignore previous instructions" etc.), and cross-project trust gate (only user-stated learnings cross project boundaries). From @Ziadstr. +- **IPv6 metadata bypass.** The URL constructor normalizes `::ffff:169.254.169.254` to `::ffff:a9fe:a9fe` (hex), which wasn't in the blocklist. Added both hex-encoded forms. From @mehmoodosman. +- **Session files world-readable.** Design session files in `/tmp` were created with default permissions (0644). Now 0600 (owner-only). From @garagon. +- **Frozen lockfile in setup.** `bun install` now uses `--frozen-lockfile` to prevent supply chain attacks via floating semver ranges. From @halbert04. +- **Dockerfile chmod fix.** Removed duplicate recursive `chmod -R 1777 /tmp` (recursive sticky bit on files has no defined behavior). From @Gonzih. +- **Hardcoded /tmp in cookie import.** `cookie-import-browser` used `/tmp` directly instead of `os.tmpdir()`, breaking Windows support. + +### Security +- Closed 14 security issues (#665-#675, #566, #479, #467, #545) that were fixed in prior waves but still open on GitHub. +- Closed 17 community security PRs with thank-you messages and commit references. +- Security wave 3: 12 fixes, 7 contributors. Big thanks to @Hybirdss, @urbantech, @garagon, @Ziadstr, @halbert04, @mehmoodosman, @Gonzih. + ## [0.16.3.0] - 2026-04-09 ### Changed diff --git a/VERSION b/VERSION index de939e96..d1a96684 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.16.3.0 +0.16.4.0 diff --git a/bin/gstack-learnings-log b/bin/gstack-learnings-log index e63c14cb..6c528d3a 100755 --- a/bin/gstack-learnings-log +++ b/bin/gstack-learnings-log @@ -12,19 +12,75 @@ mkdir -p "$GSTACK_HOME/projects/$SLUG" INPUT="$1" -# Validate: input must be parseable JSON -if ! printf '%s' "$INPUT" | bun -e "JSON.parse(await Bun.stdin.text())" 2>/dev/null; then - echo "gstack-learnings-log: invalid JSON, skipping" >&2 +# Validate and sanitize input +VALIDATED=$(printf '%s' "$INPUT" | bun -e " +const raw = await Bun.stdin.text(); +let j; +try { j = JSON.parse(raw); } catch { process.stderr.write('gstack-learnings-log: invalid JSON, skipping\n'); process.exit(1); } + +// Field validation: type must be from allowed list +const ALLOWED_TYPES = ['pattern', 'pitfall', 'preference', 'architecture', 'tool', 'operational']; +if (!j.type || !ALLOWED_TYPES.includes(j.type)) { + process.stderr.write('gstack-learnings-log: invalid type \"' + (j.type || '') + '\", must be one of: ' + ALLOWED_TYPES.join(', ') + '\n'); + process.exit(1); +} + +// Field validation: key must be alphanumeric, hyphens, underscores (no injection surface) +if (!j.key || !/^[a-zA-Z0-9_-]+$/.test(j.key)) { + process.stderr.write('gstack-learnings-log: invalid key, must be alphanumeric with hyphens/underscores only\n'); + process.exit(1); +} + +// Field validation: confidence must be 1-10 +const conf = Number(j.confidence); +if (!Number.isInteger(conf) || conf < 1 || conf > 10) { + process.stderr.write('gstack-learnings-log: confidence must be integer 1-10\n'); + process.exit(1); +} +j.confidence = conf; + +// Field validation: source must be from allowed list +const ALLOWED_SOURCES = ['observed', 'user-stated', 'inferred', 'cross-model']; +if (j.source && !ALLOWED_SOURCES.includes(j.source)) { + process.stderr.write('gstack-learnings-log: invalid source, must be one of: ' + ALLOWED_SOURCES.join(', ') + '\n'); + process.exit(1); +} + +// Content sanitization: strip instruction-like patterns from insight field +// These patterns could be used for prompt injection when learnings are loaded into agent context +if (j.insight) { + const INJECTION_PATTERNS = [ + /ignore\s+(all\s+)?previous\s+(instructions|context|rules)/i, + /you\s+are\s+now\s+/i, + /always\s+output\s+no\s+findings/i, + /skip\s+(all\s+)?(security|review|checks)/i, + /override[:\s]/i, + /\bsystem\s*:/i, + /\bassistant\s*:/i, + /\buser\s*:/i, + /do\s+not\s+(report|flag|mention)/i, + /approve\s+(all|every|this)/i, + ]; + for (const pat of INJECTION_PATTERNS) { + if (pat.test(j.insight)) { + process.stderr.write('gstack-learnings-log: insight contains suspicious instruction-like content, rejected\n'); + process.exit(1); + } + } +} + +// Inject timestamp if not present +if (!j.ts) j.ts = new Date().toISOString(); + +// Mark trust level based on source +// user-stated = user explicitly told the agent this. All others are AI-generated. +j.trusted = j.source === 'user-stated'; + +console.log(JSON.stringify(j)); +" 2>/dev/null) + +if [ $? -ne 0 ] || [ -z "$VALIDATED" ]; then exit 1 fi -# Inject timestamp if not present -if ! printf '%s' "$INPUT" | bun -e "const j=JSON.parse(await Bun.stdin.text()); if(!j.ts) process.exit(1)" 2>/dev/null; then - INPUT=$(printf '%s' "$INPUT" | bun -e " - const j = JSON.parse(await Bun.stdin.text()); - j.ts = new Date().toISOString(); - console.log(JSON.stringify(j)); - " 2>/dev/null) || true -fi - -echo "$INPUT" >> "$GSTACK_HOME/projects/$SLUG/learnings.jsonl" +echo "$VALIDATED" >> "$GSTACK_HOME/projects/$SLUG/learnings.jsonl" diff --git a/bin/gstack-learnings-search b/bin/gstack-learnings-search index 634342e6..3b39e462 100755 --- a/bin/gstack-learnings-search +++ b/bin/gstack-learnings-search @@ -68,7 +68,13 @@ for (const line of lines) { // Determine if this is from the current project or cross-project // Cross-project entries are tagged for display - e._crossProject = !line.includes(slug) && process.env.GSTACK_SEARCH_CROSS === 'true'; + const isCrossProject = !line.includes(slug) && process.env.GSTACK_SEARCH_CROSS === 'true'; + e._crossProject = isCrossProject; + + // Trust gate: cross-project learnings only loaded if trusted (user-stated) + // This prevents prompt injection from one project's AI-generated learnings + // silently influencing reviews in another project. + if (isCrossProject && e.trusted === false) continue; entries.push(e); } catch {} diff --git a/bin/gstack-settings-hook b/bin/gstack-settings-hook index 93a537f0..21445a14 100755 --- a/bin/gstack-settings-hook +++ b/bin/gstack-settings-hook @@ -26,10 +26,10 @@ fi case "$ACTION" in add) - bun -e " + GSTACK_SETTINGS_PATH="$SETTINGS_FILE" GSTACK_HOOK_CMD="$HOOK_CMD" bun -e " const fs = require('fs'); - const settingsPath = '$SETTINGS_FILE'; - const hookCmd = $(printf '%s' "$HOOK_CMD" | bun -e "process.stdout.write(JSON.stringify(require('fs').readFileSync('/dev/stdin','utf8')))"); + const settingsPath = process.env.GSTACK_SETTINGS_PATH; + const hookCmd = process.env.GSTACK_HOOK_CMD; let settings = {}; try { settings = JSON.parse(fs.readFileSync(settingsPath, 'utf8')); } catch {} @@ -55,9 +55,9 @@ case "$ACTION" in ;; remove) [ -f "$SETTINGS_FILE" ] || exit 0 - bun -e " + GSTACK_SETTINGS_PATH="$SETTINGS_FILE" bun -e " const fs = require('fs'); - const settingsPath = '$SETTINGS_FILE'; + const settingsPath = process.env.GSTACK_SETTINGS_PATH; let settings = {}; try { settings = JSON.parse(fs.readFileSync(settingsPath, 'utf8')); } catch { process.exit(0); } diff --git a/bin/gstack-team-init b/bin/gstack-team-init index 1fc08ea9..256735f8 100755 --- a/bin/gstack-team-init +++ b/bin/gstack-team-init @@ -139,9 +139,9 @@ HOOK_EOF # Add hook to project-level settings.json if command -v bun >/dev/null 2>&1; then - bun -e " + GSTACK_SETTINGS_PATH="$SETTINGS" bun -e " const fs = require('fs'); - const settingsPath = '$SETTINGS'; + const settingsPath = process.env.GSTACK_SETTINGS_PATH; let settings = {}; try { settings = JSON.parse(fs.readFileSync(settingsPath, 'utf8')); } catch {} diff --git a/browse/src/audit.ts b/browse/src/audit.ts new file mode 100644 index 00000000..5ac59f6d --- /dev/null +++ b/browse/src/audit.ts @@ -0,0 +1,65 @@ +/** + * Persistent command audit log — forensic trail for all browse server commands. + * + * Writes append-only JSONL to .gstack/browse-audit.jsonl. Unlike the in-memory + * ring buffers (console, network, dialog), the audit log persists across server + * restarts and is never truncated by the server. Each entry records: + * + * - timestamp, command, args (truncated), page origin + * - duration, status (ok/error), error message if any + * - whether cookies were imported (elevated security context) + * - connection mode (headless/headed) + * + * All writes are best-effort — audit failures never cause command failures. + */ + +import * as fs from 'fs'; + +export interface AuditEntry { + ts: string; + cmd: string; + args: string; + origin: string; + durationMs: number; + status: 'ok' | 'error'; + error?: string; + hasCookies: boolean; + mode: 'launched' | 'headed'; +} + +const MAX_ARGS_LENGTH = 200; +const MAX_ERROR_LENGTH = 300; + +let auditPath: string | null = null; + +export function initAuditLog(logPath: string): void { + auditPath = logPath; +} + +export function writeAuditEntry(entry: AuditEntry): void { + if (!auditPath) return; + try { + const truncatedArgs = entry.args.length > MAX_ARGS_LENGTH + ? entry.args.slice(0, MAX_ARGS_LENGTH) + '…' + : entry.args; + const truncatedError = entry.error && entry.error.length > MAX_ERROR_LENGTH + ? entry.error.slice(0, MAX_ERROR_LENGTH) + '…' + : entry.error; + + const record: Record = { + ts: entry.ts, + cmd: entry.cmd, + args: truncatedArgs, + origin: entry.origin, + durationMs: entry.durationMs, + status: entry.status, + hasCookies: entry.hasCookies, + mode: entry.mode, + }; + if (truncatedError) record.error = truncatedError; + + fs.appendFileSync(auditPath, JSON.stringify(record) + '\n'); + } catch { + // Audit write failures are silent — never block command execution + } +} diff --git a/browse/src/browser-manager.ts b/browse/src/browser-manager.ts index 3e7562bb..63d78358 100644 --- a/browse/src/browser-manager.ts +++ b/browse/src/browser-manager.ts @@ -55,6 +55,9 @@ export class BrowserManager { private dialogAutoAccept: boolean = true; private dialogPromptText: string | null = null; + // ─── Cookie Origin Tracking ──────────────────────────────── + private cookieImportedDomains: Set = new Set(); + // ─── Handoff State ───────────────────────────────────────── private isHeaded: boolean = false; private consecutiveFailures: number = 0; @@ -749,6 +752,19 @@ export class BrowserManager { return this.dialogPromptText; } + // ─── Cookie Origin Tracking ──────────────────────────────── + trackCookieImportDomains(domains: string[]): void { + for (const d of domains) this.cookieImportedDomains.add(d); + } + + getCookieImportedDomains(): ReadonlySet { + return this.cookieImportedDomains; + } + + hasCookieImports(): boolean { + return this.cookieImportedDomains.size > 0; + } + // ─── Viewport ────────────────────────────────────────────── async setViewport(width: number, height: number) { await this.getPage().setViewportSize({ width, height }); diff --git a/browse/src/config.ts b/browse/src/config.ts index 498c083b..65c18728 100644 --- a/browse/src/config.ts +++ b/browse/src/config.ts @@ -20,6 +20,7 @@ export interface BrowseConfig { consoleLog: string; networkLog: string; dialogLog: string; + auditLog: string; } /** @@ -70,6 +71,7 @@ export function resolveConfig( consoleLog: path.join(stateDir, 'browse-console.log'), networkLog: path.join(stateDir, 'browse-network.log'), dialogLog: path.join(stateDir, 'browse-dialog.log'), + auditLog: path.join(stateDir, 'browse-audit.jsonl'), }; } diff --git a/browse/src/cookie-import-browser.ts b/browse/src/cookie-import-browser.ts index 1e7f1ce4..7dc75e07 100644 --- a/browse/src/cookie-import-browser.ts +++ b/browse/src/cookie-import-browser.ts @@ -386,7 +386,8 @@ function openDb(dbPath: string, browserName: string): Database { } function openDbFromCopy(dbPath: string, browserName: string): Database { - const tmpPath = `/tmp/browse-cookies-${browserName.toLowerCase()}-${crypto.randomUUID()}.db`; + // Use os.tmpdir() instead of hardcoded /tmp for cross-platform support (#708) + const tmpPath = path.join(os.tmpdir(), `browse-cookies-${browserName.toLowerCase()}-${crypto.randomUUID()}.db`); try { fs.copyFileSync(dbPath, tmpPath); // Also copy WAL and SHM if they exist (for consistent reads) diff --git a/browse/src/path-security.ts b/browse/src/path-security.ts index 4b1961b0..cb6b1e08 100644 --- a/browse/src/path-security.ts +++ b/browse/src/path-security.ts @@ -33,7 +33,26 @@ const TEMP_ONLY = [TEMP_DIR].map(d => { export function validateOutputPath(filePath: string): void { const resolved = path.resolve(filePath); - // Resolve real path of the parent directory to catch symlinks. + // If the target already exists and is a symlink, resolve through it. + // Without this, a symlink at /tmp/evil.png → /etc/crontab passes the + // parent-directory check (parent is /tmp, which is safe) but the actual + // write follows the symlink to /etc/crontab. + try { + const stat = fs.lstatSync(resolved); + if (stat.isSymbolicLink()) { + const realTarget = fs.realpathSync(resolved); + const isSafe = SAFE_DIRECTORIES.some(dir => isPathWithin(realTarget, dir)); + if (!isSafe) { + throw new Error(`Path must be within: ${SAFE_DIRECTORIES.join(', ')}`); + } + return; // symlink target verified, no need to check parent + } + } catch (e: any) { + // ENOENT = file doesn't exist yet, fall through to parent-dir check + if (e.code !== 'ENOENT') throw e; + } + + // For new files (no existing symlink), verify the parent directory. // The file itself may not exist yet (e.g., screenshot output). // This also handles macOS /tmp → /private/tmp transparently. let dir = path.dirname(resolved); diff --git a/browse/src/read-commands.ts b/browse/src/read-commands.ts index 746b0959..367770ee 100644 --- a/browse/src/read-commands.ts +++ b/browse/src/read-commands.ts @@ -6,6 +6,7 @@ */ import type { TabSession } from './tab-session'; +import type { BrowserManager } from './browser-manager'; import { consoleBuffer, networkBuffer, dialogBuffer } from './buffers'; import type { Page, Frame } from 'playwright'; import * as fs from 'fs'; @@ -62,10 +63,43 @@ export async function getCleanText(page: Page | Frame): Promise { }); } +/** + * When cookies have been imported for specific domains, block JS execution + * on pages whose origin doesn't match any imported cookie domain. + * Prevents cross-origin cookie exfiltration via `js document.cookie` or + * similar when the agent navigates to an untrusted page. + */ +function assertJsOriginAllowed(bm: BrowserManager, pageUrl: string): void { + if (!bm.hasCookieImports()) return; + + let hostname: string; + try { + hostname = new URL(pageUrl).hostname; + } catch { + return; // about:blank, data: URIs — allow (no cookies at risk) + } + + const importedDomains = bm.getCookieImportedDomains(); + const allowed = [...importedDomains].some(domain => { + // Exact match or subdomain match (e.g., ".github.com" matches "api.github.com") + const normalized = domain.startsWith('.') ? domain : '.' + domain; + return hostname === domain.replace(/^\./, '') || hostname.endsWith(normalized); + }); + + if (!allowed) { + throw new Error( + `JS execution blocked: current page (${hostname}) does not match any cookie-imported domain. ` + + `Imported cookies for: ${[...importedDomains].join(', ')}. ` + + `This prevents cross-origin cookie exfiltration. Navigate to an imported domain or run without imported cookies.` + ); + } +} + export async function handleReadCommand( command: string, args: string[], - session: TabSession + session: TabSession, + bm?: BrowserManager, ): Promise { const page = session.getPage(); // Frame-aware target for content extraction @@ -116,7 +150,10 @@ export async function handleReadCommand( id: input.id || undefined, placeholder: input.placeholder || undefined, required: input.required || undefined, - value: input.type === 'password' ? '[redacted]' : (input.value || undefined), + value: input.type === 'password' + || (input.name && /(^|[_.-])(token|secret|key|password|credential|auth|jwt|session|csrf|sid)($|[_.-])|api.?key/i.test(input.name)) + || (input.id && /(^|[_.-])(token|secret|key|password|credential|auth|jwt|session|csrf|sid)($|[_.-])|api.?key/i.test(input.id)) + ? '[redacted]' : (input.value || undefined), options: el.tagName === 'SELECT' ? [...(el as HTMLSelectElement).options].map(o => ({ value: o.value, text: o.text })) : undefined, @@ -142,6 +179,7 @@ export async function handleReadCommand( case 'js': { const expr = args[0]; if (!expr) throw new Error('Usage: browse js '); + if (bm) assertJsOriginAllowed(bm, page.url()); const wrapped = wrapForEvaluate(expr); const result = await target.evaluate(wrapped); return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? ''); @@ -150,6 +188,7 @@ export async function handleReadCommand( case 'eval': { const filePath = args[0]; if (!filePath) throw new Error('Usage: browse eval '); + if (bm) assertJsOriginAllowed(bm, page.url()); validateReadPath(filePath); if (!fs.existsSync(filePath)) throw new Error(`File not found: ${filePath}`); const code = fs.readFileSync(filePath, 'utf-8'); diff --git a/browse/src/server.ts b/browse/src/server.ts index c370ecc7..98f43af0 100644 --- a/browse/src/server.ts +++ b/browse/src/server.ts @@ -35,6 +35,7 @@ import { import { validateTempPath } from './path-security'; import { resolveConfig, ensureStateDir, readVersionHash } from './config'; import { emitActivity, subscribe, getActivityAfter, getActivityHistory, getSubscriberCount } from './activity'; +import { initAuditLog, writeAuditEntry } from './audit'; import { inspectElement, modifyStyle, resetModifications, getModificationHistory, detachSession, type InspectorResult } from './cdp-inspector'; // Bun.spawn used instead of child_process.spawn (compiled bun binaries // fail posix_spawn on all executables including /bin/bash) @@ -47,6 +48,7 @@ import * as crypto from 'crypto'; // ─── Config ───────────────────────────────────────────────────── const config = resolveConfig(); ensureStateDir(config); +initAuditLog(config.auditLog); // ─── Auth ─────────────────────────────────────────────────────── const AUTH_TOKEN = crypto.randomUUID(); @@ -1013,7 +1015,7 @@ async function handleCommandInternal( await cleanupHiddenMarkers(page); } } else { - result = await handleReadCommand(command, args, session); + result = await handleReadCommand(command, args, session, browserManager); } } else if (WRITE_COMMANDS.has(command)) { result = await handleWriteCommand(command, args, session, browserManager); @@ -1088,13 +1090,14 @@ async function handleCommandInternal( } // Activity: emit command_end (skipped for chain subcommands) + const successDuration = Date.now() - startTime; if (!opts?.skipActivity) { emitActivity({ type: 'command_end', command, args, url: browserManager.getCurrentUrl(), - duration: Date.now() - startTime, + duration: successDuration, status: 'ok', result: result, tabs: browserManager.getTabCount(), @@ -1103,6 +1106,17 @@ async function handleCommandInternal( }); } + writeAuditEntry({ + ts: new Date().toISOString(), + cmd: command, + args: args.join(' '), + origin: browserManager.getCurrentUrl(), + durationMs: successDuration, + status: 'ok', + hasCookies: browserManager.hasCookieImports(), + mode: browserManager.getConnectionMode(), + }); + browserManager.resetFailures(); // Restore original active tab if we pinned to a specific one if (savedTabId !== null) { @@ -1120,13 +1134,14 @@ async function handleCommandInternal( } // Activity: emit command_end (error) — skipped for chain subcommands + const errorDuration = Date.now() - startTime; if (!opts?.skipActivity) { emitActivity({ type: 'command_end', command, args, url: browserManager.getCurrentUrl(), - duration: Date.now() - startTime, + duration: errorDuration, status: 'error', error: err.message, tabs: browserManager.getTabCount(), @@ -1135,6 +1150,18 @@ async function handleCommandInternal( }); } + writeAuditEntry({ + ts: new Date().toISOString(), + cmd: command, + args: args.join(' '), + origin: browserManager.getCurrentUrl(), + durationMs: errorDuration, + status: 'error', + error: err.message, + hasCookies: browserManager.hasCookieImports(), + mode: browserManager.getConnectionMode(), + }); + browserManager.incrementFailures(); let errorMsg = wrapError(err); const hint = browserManager.getFailureHint(); diff --git a/browse/src/url-validation.ts b/browse/src/url-validation.ts index 5d37cf0d..ddac0d5a 100644 --- a/browse/src/url-validation.ts +++ b/browse/src/url-validation.ts @@ -7,6 +7,8 @@ export const BLOCKED_METADATA_HOSTS = new Set([ '169.254.169.254', // AWS/GCP/Azure instance metadata 'fe80::1', // IPv6 link-local — common metadata endpoint alias '::ffff:169.254.169.254', // IPv4-mapped IPv6 form of the metadata IP + '::ffff:a9fe:a9fe', // Hex-encoded IPv4-mapped form (URL constructor normalizes to this) + '::a9fe:a9fe', // Deprecated IPv4-compatible hex form 'metadata.google.internal', // GCP metadata 'metadata.azure.internal', // Azure IMDS ]); diff --git a/browse/src/write-commands.ts b/browse/src/write-commands.ts index 432b6d58..779a858e 100644 --- a/browse/src/write-commands.ts +++ b/browse/src/write-commands.ts @@ -13,7 +13,8 @@ import { validateNavigationUrl } from './url-validation'; import { validateOutputPath } from './path-security'; import * as fs from 'fs'; import * as path from 'path'; -import { TEMP_DIR } from './platform'; +import { TEMP_DIR, isPathWithin } from './platform'; +import { SAFE_DIRECTORIES } from './path-security'; import { modifyStyle, undoModification, resetModifications, getModificationHistory } from './cdp-inspector'; /** @@ -441,16 +442,17 @@ export async function handleWriteCommand( case 'cookie-import': { const filePath = args[0]; if (!filePath) throw new Error('Usage: browse cookie-import '); - // Path validation — prevent reading arbitrary files - if (path.isAbsolute(filePath)) { - const safeDirs = [TEMP_DIR, process.cwd()]; - const resolved = path.resolve(filePath); - if (!safeDirs.some(dir => isPathWithin(resolved, dir))) { - throw new Error(`Path must be within: ${safeDirs.join(', ')}`); - } + // Path validation — resolve to absolute and check against safe dirs. + // Fixes #707: relative paths previously bypassed the safe directory check. + // Mirrors validateOutputPath() — resolves symlinks (e.g., macOS /tmp → /private/tmp). + const resolved = path.resolve(filePath); + let resolvedReal = resolved; + try { resolvedReal = fs.realpathSync(resolved); } catch { + // File may not exist yet — resolve parent dir instead + try { resolvedReal = path.join(fs.realpathSync(path.dirname(resolved)), path.basename(resolved)); } catch {} } - if (path.normalize(filePath).includes('..')) { - throw new Error('Path traversal sequences (..) are not allowed'); + if (!SAFE_DIRECTORIES.some(dir => isPathWithin(resolvedReal, dir))) { + throw new Error(`Path must be within: ${SAFE_DIRECTORIES.join(', ')}`); } if (!fs.existsSync(filePath)) throw new Error(`File not found: ${filePath}`); const raw = fs.readFileSync(filePath, 'utf-8'); @@ -476,20 +478,24 @@ export async function handleWriteCommand( } await page.context().addCookies(cookies); + const importedDomains = [...new Set(cookies.map((c: any) => c.domain).filter(Boolean))]; + if (importedDomains.length > 0) bm.trackCookieImportDomains(importedDomains); return `Loaded ${cookies.length} cookies from ${filePath}`; } case 'cookie-import-browser': { // Two modes: // 1. Direct CLI import: cookie-import-browser --domain [--profile ] - // 2. Open picker UI: cookie-import-browser [browser] + // Requires --domain (or --all to explicitly import everything). + // 2. Open picker UI: cookie-import-browser [browser] (interactive domain selection) const browserArg = args[0]; const domainIdx = args.indexOf('--domain'); const profileIdx = args.indexOf('--profile'); + const hasAll = args.includes('--all'); const profile = (profileIdx !== -1 && profileIdx + 1 < args.length) ? args[profileIdx + 1] : 'Default'; if (domainIdx !== -1 && domainIdx + 1 < args.length) { - // Direct import mode — no UI + // Direct import mode — scoped to specific domain const domain = args[domainIdx + 1]; // Validate --domain against current page hostname to prevent cross-site cookie injection const pageHostname = new URL(page.url()).hostname; @@ -501,13 +507,35 @@ export async function handleWriteCommand( const result = await importCookies(browser, [domain], profile); if (result.cookies.length > 0) { await page.context().addCookies(result.cookies); + bm.trackCookieImportDomains([domain]); } const msg = [`Imported ${result.count} cookies for ${domain} from ${browser}`]; if (result.failed > 0) msg.push(`(${result.failed} failed to decrypt)`); return msg.join(' '); } - // Picker UI mode — open in user's browser + if (hasAll) { + // Explicit all-cookies import — requires --all flag as a deliberate opt-in. + // Imports every non-expired cookie domain from the browser. + const browser = browserArg || 'comet'; + const { listDomains } = await import('./cookie-import-browser'); + const { domains } = listDomains(browser, profile); + const allDomainNames = domains.map((d: any) => d.domain); + if (allDomainNames.length === 0) { + return `No cookies found in ${browser} (profile: ${profile})`; + } + const result = await importCookies(browser, allDomainNames, profile); + if (result.cookies.length > 0) { + await page.context().addCookies(result.cookies); + bm.trackCookieImportDomains(allDomainNames); + } + const msg = [`Imported ${result.count} cookies across ${Object.keys(result.domainCounts).length} domains from ${browser}`]; + msg.push('(used --all: all browser cookies imported, consider --domain for tighter scoping)'); + if (result.failed > 0) msg.push(`(${result.failed} failed to decrypt)`); + return msg.join(' '); + } + + // Picker UI mode — open in user's browser for interactive domain selection const port = bm.serverPort; if (!port) throw new Error('Server port not available'); @@ -525,7 +553,7 @@ export async function handleWriteCommand( if (err?.code !== 'ENOENT' && !err?.message?.includes('spawn')) throw err; } - return `Cookie picker opened at http://127.0.0.1:${port}/cookie-picker\nDetected browsers: ${browsers.map(b => b.name).join(', ')}\nSelect domains to import, then close the picker when done.`; + return `Cookie picker opened at http://127.0.0.1:${port}/cookie-picker\nDetected browsers: ${browsers.map(b => b.name).join(', ')}\nSelect domains to import, then close the picker when done.\n\nTip: For scripted imports, use --domain to scope cookies to a single domain.`; } case 'style': { diff --git a/browse/test/commands.test.ts b/browse/test/commands.test.ts index 8434e2ef..2c006955 100644 --- a/browse/test/commands.test.ts +++ b/browse/test/commands.test.ts @@ -1811,7 +1811,8 @@ describe('Path traversal prevention', () => { await handleWriteCommand('cookie-import', ['../../etc/shadow'], bm); expect(true).toBe(false); } catch (err: any) { - expect(err.message).toContain('Path traversal'); + // Traversal blocked by safe-directory check (#707) or explicit .. check + expect(err.message).toMatch(/Path must be within|Path traversal/); } }); diff --git a/design/src/session.ts b/design/src/session.ts index 16d6f0ee..01986618 100644 --- a/design/src/session.ts +++ b/design/src/session.ts @@ -49,7 +49,7 @@ export function createSession( updatedAt: new Date().toISOString(), }; - fs.writeFileSync(sessionPath(id), JSON.stringify(session, null, 2)); + fs.writeFileSync(sessionPath(id), JSON.stringify(session, null, 2), { mode: 0o600 }); return session; } diff --git a/setup b/setup index f71f4552..1611a454 100755 --- a/setup +++ b/setup @@ -208,7 +208,7 @@ if [ "$NEEDS_BUILD" -eq 1 ]; then log "Building browse binary..." ( cd "$SOURCE_GSTACK_DIR" - bun install + bun install --frozen-lockfile 2>/dev/null || bun install bun run build ) # Safety net: write .version if build script didn't (e.g., git not available during build) From 23000672673224f04a5d0cb8d692356069c95f6a Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Tue, 14 Apr 2026 07:47:11 -1000 Subject: [PATCH 2/2] feat: UX behavioral foundations + ux-audit command (v0.17.0.0) (#1000) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * feat: UX behavioral foundations — Krug's usability principles as shared design infrastructure Add UX_PRINCIPLES resolver distilling Steve Krug's "Don't Make Me Think" into actionable guidance for AI agents. Injected into all 4 design skills as a shared behavioral foundation complementing the existing visual checklist (WHAT to check) and cognitive patterns (HOW designers see) with HOW USERS ACTUALLY BEHAVE. Methodology rewire: 6 Krug usability tests woven into existing design-review phases — Trunk Test, 3-Second Scan, Page Area Test, Happy Talk Detection with word count metric, Mindless Choice Audit, Goodwill Reservoir tracking with visual dashboard. First-person narration mode for design-review output with anti-slop guardrail. Hard rules: 4 Krug always/never rules in DESIGN_HARD_RULES (placeholder-as-label, floating headings, visited link distinction, minimum type size). Krug, Redish, Jarrett added to plan-design-review references. Token ceiling: gen-skill-docs.ts warns if any SKILL.md exceeds 100KB (~25K tokens). Documented in CLAUDE.md. Co-Authored-By: Claude Opus 4.6 (1M context) * feat: $B ux-audit command + snapshot --heatmap flag New browse meta-command: ux-audit extracts page structure (site ID, navigation, headings, interactive elements, text blocks) as structured JSON for agent-side UX behavioral analysis. Pure data extraction — the agent applies the 6 usability tests and makes judgment calls. Element caps: 50 headings, 100 links, 200 interactive, 50 text blocks. New snapshot flag: -H/--heatmap accepts a JSON color map mapping ref IDs to colors (green/yellow/red/blue/orange/gray). Extends existing snapshot -a annotation system with per-ref colors instead of hardcoded red. Color whitelist validation prevents CSS injection. Composable — any skill can use it. Co-Authored-By: Claude Opus 4.6 (1M context) * docs: update project documentation for v0.17.0.0 ARCHITECTURE.md: added {{UX_PRINCIPLES}} resolver to placeholder table. VERSION: bumped to 0.17.0.0 for UX behavioral foundations release. Co-Authored-By: Claude Opus 4.6 (1M context) * chore: bump version and changelog (v0.17.0.0) Co-Authored-By: Claude Opus 4.6 (1M context) * fix: adversarial review fixes for ux-audit and heatmap Security: - Remove live form value extraction from ux-audit (leaked input field values) - Add ux-audit to PAGE_CONTENT_COMMANDS (untrusted content wrapping) Correctness: - Scope youAreHere selector to nav containers (was matching animation classes) - Validate heatmap JSON is a plain object (string/array/null produced garbage) - Use textContent instead of innerText for word count (avoids layout computation) - Remove dead url variable and unused LINK_CAP constant Found by Codex + Claude adversarial review. Co-Authored-By: Claude Opus 4.6 (1M context) --------- Co-authored-by: Claude Opus 4.6 (1M context) --- ARCHITECTURE.md | 1 + CHANGELOG.md | 14 +++ CLAUDE.md | 5 + SKILL.md | 2 + VERSION | 2 +- browse/SKILL.md | 2 + browse/src/commands.ts | 4 + browse/src/meta-commands.ts | 110 ++++++++++++++++++++++ browse/src/snapshot.ts | 120 ++++++++++++++++++++++++ design-html/SKILL.md | 85 +++++++++++++++++ design-html/SKILL.md.tmpl | 2 + design-review/SKILL.md | 149 +++++++++++++++++++++++++++++- design-review/SKILL.md.tmpl | 2 + design-shotgun/SKILL.md | 85 +++++++++++++++++ design-shotgun/SKILL.md.tmpl | 2 + plan-design-review/SKILL.md | 91 +++++++++++++++++- plan-design-review/SKILL.md.tmpl | 4 +- scripts/gen-skill-docs.ts | 6 ++ scripts/resolvers/design.ts | 152 ++++++++++++++++++++++++++++++- scripts/resolvers/index.ts | 3 +- test/skill-validation.test.ts | 1 + 21 files changed, 836 insertions(+), 6 deletions(-) diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 086bb2e4..a755ff24 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -208,6 +208,7 @@ Templates contain the workflows, tips, and examples that require human judgment. | `{{CODEX_PLAN_REVIEW}}` | `gen-skill-docs.ts` | Optional cross-model plan review (Codex or Claude subagent fallback) for /plan-ceo-review and /plan-eng-review | | `{{DESIGN_SETUP}}` | `resolvers/design.ts` | Discovery pattern for `$D` design binary, mirrors `{{BROWSE_SETUP}}` | | `{{DESIGN_SHOTGUN_LOOP}}` | `resolvers/design.ts` | Shared comparison board feedback loop for /design-shotgun, /plan-design-review, /design-consultation | +| `{{UX_PRINCIPLES}}` | `resolvers/design.ts` | User behavioral foundations (scanning, satisficing, goodwill reservoir, trunk test) for /design-html, /design-shotgun, /design-review, /plan-design-review | This is structurally sound — if a command exists in code, it appears in docs. If it doesn't exist, it can't appear. diff --git a/CHANGELOG.md b/CHANGELOG.md index 061888ff..b912ba03 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,19 @@ # Changelog +## [0.17.0.0] - 2026-04-14 + +### Added +- **UX behavioral foundations.** Every design skill now thinks about how users actually behave, not just how the interface looks. A shared `{{UX_PRINCIPLES}}` resolver distills Steve Krug's "Don't Make Me Think" into actionable guidance: scanning behavior, satisficing, the goodwill reservoir, navigation wayfinding, and the trunk test. Injected into /design-html, /design-shotgun, /design-review, and /plan-design-review. Your design reviews now catch "this navigation is confusing" problems, not just "the contrast ratio is 4.3:1." +- **6 usability tests woven into design-review.** The methodology now runs the Trunk Test (can you tell what site this is, what page you're on, and how to search?), 3-Second Scan (what do users see first?), Page Area Test (can you name each section's purpose?), Happy Talk Detection with word count (how much of this page is "blah blah blah"?), Mindless Choice Audit (does every click feel obvious?), and Goodwill Reservoir tracking with a visual dashboard (what depletes the user's patience at each step?). +- **First-person narration mode.** Design review reports now read like a usability consultant watching someone use your site: "I'm looking at this page... my eye goes to the logo, then a wall of text I skip entirely. Wait, is that a button?" With anti-slop guardrail: if the agent can't name the specific element, it's generating platitudes. +- **`$B ux-audit` command.** Standalone UX structural extraction. One command extracts site ID, navigation, headings, interactive elements, text blocks, and search presence as structured JSON. The agent applies the 6 usability tests to the data. Pure data extraction with element caps (50 headings, 100 links, 200 interactive, 50 text blocks). +- **`snapshot -H` / `--heatmap` flag.** Color-coded overlay screenshots. Pass a JSON map of ref IDs to colors (`green`/`yellow`/`red`/`blue`/`orange`/`gray`) and get an annotated screenshot with per-element colored boxes. Color whitelist prevents CSS injection. Composable: any skill can use it. +- **Token ceiling enforcement.** `gen-skill-docs` now warns if any generated SKILL.md exceeds 100KB (~25K tokens). Catches prompt bloat before it degrades agent performance. + +### Changed +- **Krug's always/never rules** added to the design hard rules: never placeholder-as-label, never floating headings, always visited link distinction, never sub-16px body text. These join the existing AI slop blacklist as mechanical checks. +- **Plan-design-review references** now include Steve Krug, Ginny Redish (Letting Go of the Words), and Caroline Jarrett (Forms that Work) alongside Rams, Norman, and Nielsen. + ## [0.16.4.0] - 2026-04-13 ### Added diff --git a/CLAUDE.md b/CLAUDE.md index 7a2c6faf..8d4d2735 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -138,6 +138,11 @@ SKILL.md files are **generated** from `.tmpl` templates. To update docs: To add a new browse command: add it to `browse/src/commands.ts` and rebuild. To add a snapshot flag: add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts` and rebuild. +**Token ceiling:** Generated SKILL.md files must stay under 100KB (~25K tokens). +`gen-skill-docs` warns if any file exceeds this. If a skill template grows past the +ceiling, consider extracting optional sections into separate resolvers that only +inject when relevant, or making verbose evaluation rubrics more concise. + **Merge conflicts on SKILL.md files:** NEVER resolve conflicts on generated SKILL.md files by accepting either side. Instead: (1) resolve conflicts on the `.tmpl` templates and `scripts/gen-skill-docs.ts` (the sources of truth), (2) run `bun run gen:skill-docs` diff --git a/SKILL.md b/SKILL.md index 94ba826b..0c189814 100644 --- a/SKILL.md +++ b/SKILL.md @@ -719,6 +719,7 @@ The snapshot is your primary tool for understanding and interacting with pages. -a --annotate Annotated screenshot with red overlay boxes and ref labels -o --output Output path for annotated screenshot (default: /browse-annotated.png) -C --cursor-interactive Cursor-interactive elements (@c refs — divs with pointer, onclick). Auto-enabled when -i is used. +-H --heatmap Color-coded overlay screenshot from JSON map: '{"@e1":"green","@e3":"red"}'. Valid colors: green, yellow, red, blue, orange, gray. ``` All flags can be combined freely. `-o` only applies when `-a` is also used. @@ -825,6 +826,7 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`. | `network [--clear]` | Network requests | | `perf` | Page load timings | | `storage [set k v]` | Read all localStorage + sessionStorage as JSON, or set to write localStorage | +| `ux-audit` | Extract page structure for UX behavioral analysis — site ID, nav, headings, text blocks, interactive elements. Returns JSON for agent interpretation. | ### Visual | Command | Description | diff --git a/VERSION b/VERSION index d1a96684..ca415c68 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.16.4.0 +0.17.0.0 diff --git a/browse/SKILL.md b/browse/SKILL.md index 420e2b0b..5ac0377b 100644 --- a/browse/SKILL.md +++ b/browse/SKILL.md @@ -587,6 +587,7 @@ The snapshot is your primary tool for understanding and interacting with pages. -a --annotate Annotated screenshot with red overlay boxes and ref labels -o --output Output path for annotated screenshot (default: /browse-annotated.png) -C --cursor-interactive Cursor-interactive elements (@c refs — divs with pointer, onclick). Auto-enabled when -i is used. +-H --heatmap Color-coded overlay screenshot from JSON map: '{"@e1":"green","@e3":"red"}'. Valid colors: green, yellow, red, blue, orange, gray. ``` All flags can be combined freely. `-o` only applies when `-a` is also used. @@ -717,6 +718,7 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero | `network [--clear]` | Network requests | | `perf` | Page load timings | | `storage [set k v]` | Read all localStorage + sessionStorage as JSON, or set to write localStorage | +| `ux-audit` | Extract page structure for UX behavioral analysis — site ID, nav, headings, text blocks, interactive elements. Returns JSON for agent interpretation. | ### Visual | Command | Description | diff --git a/browse/src/commands.ts b/browse/src/commands.ts index eacdf0cd..2fd0b421 100644 --- a/browse/src/commands.ts +++ b/browse/src/commands.ts @@ -40,6 +40,7 @@ export const META_COMMANDS = new Set([ 'watch', 'state', 'frame', + 'ux-audit', ]); export const ALL_COMMANDS = new Set([...READ_COMMANDS, ...WRITE_COMMANDS, ...META_COMMANDS]); @@ -49,6 +50,7 @@ export const PAGE_CONTENT_COMMANDS = new Set([ 'text', 'html', 'links', 'forms', 'accessibility', 'attrs', 'console', 'dialog', 'media', 'data', + 'ux-audit', ]); /** Wrap output from untrusted-content commands with trust boundary markers */ @@ -146,6 +148,8 @@ export const COMMAND_DESCRIPTIONS: Record | style --undo [N]' }, 'cleanup': { category: 'Interaction', description: 'Remove page clutter (ads, cookie banners, sticky elements, social widgets)', usage: 'cleanup [--ads] [--cookies] [--sticky] [--social] [--all]' }, 'prettyscreenshot': { category: 'Visual', description: 'Clean screenshot with optional cleanup, scroll positioning, and element hiding', usage: 'prettyscreenshot [--scroll-to sel|text] [--cleanup] [--hide sel...] [--width px] [path]' }, + // UX Audit + 'ux-audit': { category: 'Inspection', description: 'Extract page structure for UX behavioral analysis — site ID, nav, headings, text blocks, interactive elements. Returns JSON for agent interpretation.', usage: 'ux-audit' }, }; // Load-time validation: descriptions must cover exactly the command sets diff --git a/browse/src/meta-commands.ts b/browse/src/meta-commands.ts index 1fa905e1..392602f0 100644 --- a/browse/src/meta-commands.ts +++ b/browse/src/meta-commands.ts @@ -653,6 +653,116 @@ export async function handleMetaCommand( return `Switched to frame: ${frame.url()}`; } + // ─── UX Audit ───────────────────────────────────── + case 'ux-audit': { + const page = bm.getPage(); + + // Extract page structure for UX behavioral analysis + // Agent interprets the data and applies Krug's 6 usability tests + // Uses textContent (not innerText) to avoid layout computation on large DOMs + const data = await page.evaluate(() => { + const HEADING_CAP = 50; + const INTERACTIVE_CAP = 200; + const TEXT_BLOCK_CAP = 50; + + // Site ID: logo or brand element + const logoEl = document.querySelector('[class*="logo"], [id*="logo"], header img, [aria-label*="home"], a[href="/"]'); + const siteId = logoEl ? { + found: true, + text: (logoEl.textContent || '').trim().slice(0, 100), + tag: logoEl.tagName, + alt: (logoEl as HTMLImageElement).alt || null, + } : { found: false, text: null, tag: null, alt: null }; + + // Page name: main heading + const h1 = document.querySelector('h1'); + const pageName = h1 ? { + found: true, + text: h1.textContent?.trim().slice(0, 200) || '', + } : { found: false, text: null }; + + // Navigation: primary nav elements + const navEls = document.querySelectorAll('nav, [role="navigation"]'); + const navItems: Array<{ text: string; links: number }> = []; + navEls.forEach((nav, i) => { + if (i >= 5) return; + const links = nav.querySelectorAll('a'); + navItems.push({ + text: (nav.getAttribute('aria-label') || `nav-${i}`).slice(0, 50), + links: links.length, + }); + }); + + // "You are here" indicator: current/active nav items + // Scoped to nav containers to avoid false positives from animation classes + const activeNavItems = document.querySelectorAll('nav [aria-current], nav .active, nav .current, [role="navigation"] [aria-current], [role="navigation"] .active, [role="navigation"] .current'); + const youAreHere = Array.from(activeNavItems).slice(0, 5).map(el => ({ + text: (el.textContent || '').trim().slice(0, 50), + tag: el.tagName, + })); + + // Search: search box presence + const searchEl = document.querySelector('input[type="search"], [role="search"], input[name*="search"], input[placeholder*="search" i], input[aria-label*="search" i]'); + const search = { found: !!searchEl }; + + // Breadcrumbs + const breadcrumbEl = document.querySelector('[aria-label*="breadcrumb" i], .breadcrumb, .breadcrumbs, [class*="breadcrumb"]'); + const breadcrumbs = breadcrumbEl ? { + found: true, + items: Array.from(breadcrumbEl.querySelectorAll('a, span, li')).slice(0, 10).map(el => (el.textContent || '').trim().slice(0, 30)), + } : { found: false, items: [] }; + + // Headings: heading hierarchy + const headings = Array.from(document.querySelectorAll('h1,h2,h3,h4,h5,h6')).slice(0, HEADING_CAP).map(h => ({ + tag: h.tagName, + text: (h.textContent || '').trim().slice(0, 80), + size: getComputedStyle(h).fontSize, + })); + + // Interactive elements: buttons, links, inputs + const interactiveEls = Array.from(document.querySelectorAll('a, button, input, select, textarea, [role="button"], [tabindex]')).slice(0, INTERACTIVE_CAP); + const interactive = interactiveEls.map(el => { + const rect = el.getBoundingClientRect(); + return { + tag: el.tagName, + text: (el.textContent || (el as HTMLInputElement).placeholder || '').trim().slice(0, 50), + type: (el as HTMLInputElement).type || null, + role: el.getAttribute('role'), + w: Math.round(rect.width), + h: Math.round(rect.height), + visible: rect.width > 0 && rect.height > 0, + }; + }).filter(el => el.visible); + + // Text blocks: paragraphs and large text areas + const textBlocks = Array.from(document.querySelectorAll('p, [class*="description"], [class*="intro"], [class*="welcome"], [class*="hero"] p, main p')).slice(0, TEXT_BLOCK_CAP).map(el => ({ + text: (el.textContent || '').trim().slice(0, 200), + wordCount: (el.textContent || '').trim().split(/\s+/).filter(Boolean).length, + })); + + // Total visible text word count (textContent avoids layout computation) + const bodyText = (document.body?.textContent || '').trim(); + const totalWords = bodyText.split(/\s+/).filter(Boolean).length; + + return { + url: window.location.href, + title: document.title, + siteId, + pageName, + navigation: navItems, + youAreHere, + search, + breadcrumbs, + headings, + interactive, + textBlocks, + totalWords, + }; + }); + + return JSON.stringify(data, null, 2); + } + default: throw new Error(`Unknown meta command: ${command}`); } diff --git a/browse/src/snapshot.ts b/browse/src/snapshot.ts index ac2761bb..8f4791f1 100644 --- a/browse/src/snapshot.ts +++ b/browse/src/snapshot.ts @@ -39,6 +39,7 @@ interface SnapshotOptions { annotate?: boolean; // -a / --annotate: annotated screenshot outputPath?: string; // -o / --output: path for annotated screenshot cursorInteractive?: boolean; // -C / --cursor-interactive: scan cursor:pointer etc. + heatmap?: string; // -H / --heatmap: JSON color map for ref overlays } /** @@ -64,6 +65,7 @@ export const SNAPSHOT_FLAGS: Array<{ { short: '-a', long: '--annotate', description: 'Annotated screenshot with red overlay boxes and ref labels', optionKey: 'annotate' }, { short: '-o', long: '--output', description: 'Output path for annotated screenshot (default: /browse-annotated.png)', takesValue: true, valueHint: '', optionKey: 'outputPath' }, { short: '-C', long: '--cursor-interactive', description: 'Cursor-interactive elements (@c refs — divs with pointer, onclick). Auto-enabled when -i is used.', optionKey: 'cursorInteractive' }, + { short: '-H', long: '--heatmap', description: 'Color-coded overlay screenshot from JSON map: \'{"@e1":"green","@e3":"red"}\'. Valid colors: green, yellow, red, blue, orange, gray.', takesValue: true, valueHint: '', optionKey: 'heatmap' }, ]; interface ParsedNode { @@ -435,6 +437,124 @@ export async function handleSnapshot( } } + // ─── Heatmap mode (-H) ────────────────────────────────────── + if (opts.heatmap) { + const heatmapPath = opts.outputPath || `${TEMP_DIR}/browse-heatmap.png`; + // Validate output path + { + const nodePath = require('path') as typeof import('path'); + const nodeFs = require('fs') as typeof import('fs'); + const absolute = nodePath.resolve(heatmapPath); + const safeDirs = [TEMP_DIR, process.cwd()].map((d: string) => { + try { return nodeFs.realpathSync(d); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; return d; } + }); + let realPath: string; + try { + realPath = nodeFs.realpathSync(absolute); + } catch (err: any) { + if (err.code === 'ENOENT') { + try { + const dir = nodeFs.realpathSync(nodePath.dirname(absolute)); + realPath = nodePath.join(dir, nodePath.basename(absolute)); + } catch (err2: any) { + if (err2?.code !== 'ENOENT') throw err2; + realPath = absolute; + } + } else { + throw new Error(`Cannot resolve real path: ${heatmapPath} (${err.code})`); + } + } + if (!safeDirs.some((dir: string) => isPathWithin(realPath, dir))) { + throw new Error(`Path must be within: ${safeDirs.join(', ')}`); + } + } + + // Parse and validate color map + const VALID_COLORS = new Set(['green', 'yellow', 'red', 'blue', 'orange', 'gray']); + const COLOR_MAP: Record = { + green: { border: '#00b400', bg: 'rgba(0,180,0,0.15)' }, + yellow: { border: '#ffb400', bg: 'rgba(255,180,0,0.15)' }, + red: { border: '#ff0000', bg: 'rgba(255,0,0,0.15)' }, + blue: { border: '#0066ff', bg: 'rgba(0,102,255,0.15)' }, + orange: { border: '#ff6600', bg: 'rgba(255,102,0,0.15)' }, + gray: { border: '#888888', bg: 'rgba(136,136,136,0.15)' }, + }; + + let colorAssignments: Record; + try { + const parsed = JSON.parse(opts.heatmap); + if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) { + throw new Error('not an object'); + } + colorAssignments = parsed; + } catch { + throw new Error('Invalid heatmap JSON. Expected object: \'{"@e1":"green","@e3":"red"}\''); + } + + // Validate colors + for (const [ref, color] of Object.entries(colorAssignments)) { + if (!VALID_COLORS.has(color)) { + throw new Error(`Invalid heatmap color "${color}" for ${ref}. Valid: ${[...VALID_COLORS].join(', ')}`); + } + } + + try { + const boxes: Array<{ ref: string; box: { x: number; y: number; width: number; height: number }; color: string }> = []; + for (const [refKey, color] of Object.entries(colorAssignments)) { + const cleanRef = refKey.startsWith('@') ? refKey.slice(1) : refKey; + const entry = refMap.get(cleanRef); + if (!entry) continue; // Skip refs not found on page + try { + const box = await entry.locator.boundingBox({ timeout: 1000 }); + if (box) { + const colors = COLOR_MAP[color] || COLOR_MAP.gray; + boxes.push({ ref: `@${cleanRef}`, box, color: JSON.stringify(colors) }); + } + } catch { + // Element may be offscreen or hidden — skip + } + } + + await page.evaluate((boxes) => { + for (const { ref, box, color } of boxes) { + const colors = JSON.parse(color); + const overlay = document.createElement('div'); + overlay.className = '__browse_heatmap__'; + overlay.style.cssText = ` + position: absolute; top: ${box.y}px; left: ${box.x}px; + width: ${box.width}px; height: ${box.height}px; + border: 2px solid ${colors.border}; background: ${colors.bg}; + pointer-events: none; z-index: 99999; + font-size: 10px; color: ${colors.border}; font-weight: bold; + `; + const label = document.createElement('span'); + label.textContent = ref; + label.style.cssText = `position: absolute; top: -14px; left: 0; background: ${colors.border}; color: white; padding: 0 3px; font-size: 10px;`; + overlay.appendChild(label); + document.body.appendChild(overlay); + } + }, boxes); + + await page.screenshot({ path: heatmapPath, fullPage: true }); + + // Remove heatmap overlays + await page.evaluate(() => { + document.querySelectorAll('.__browse_heatmap__').forEach(el => el.remove()); + }); + + output.push(''); + output.push(`[heatmap screenshot: ${heatmapPath}]`); + } catch (err: any) { + // Cleanup on failure + try { + await page.evaluate(() => { + document.querySelectorAll('.__browse_heatmap__').forEach(el => el.remove()); + }); + } catch {} + if (!err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('Execution context') && !err?.message?.includes('screenshot')) throw err; + } + } + // ─── Diff mode (-D) ─────────────────────────────────────── if (opts.diff) { const lastSnapshot = session.getLastSnapshot(); diff --git a/design-html/SKILL.md b/design-html/SKILL.md index 10aaece0..f9b87b05 100644 --- a/design-html/SKILL.md +++ b/design-html/SKILL.md @@ -589,6 +589,91 @@ MUST be saved to `~/.gstack/projects/$SLUG/designs/`, NEVER to `.context/`, `docs/designs/`, `/tmp/`, or any project-local directory. Design artifacts are USER data, not project files. They persist across branches, conversations, and workspaces. +## UX Principles: How Users Actually Behave + +These principles govern how real humans interact with interfaces. They are observed +behavior, not preferences. Apply them before, during, and after every design decision. + +### The Three Laws of Usability + +1. **Don't make me think.** Every page should be self-evident. If a user stops + to think "What do I click?" or "What does this mean?", the design has failed. + Self-evident > self-explanatory > requires explanation. + +2. **Clicks don't matter, thinking does.** Three mindless, unambiguous clicks + beat one click that requires thought. Each step should feel like an obvious + choice (animal, vegetable, or mineral), not a puzzle. + +3. **Omit, then omit again.** Get rid of half the words on each page, then get + rid of half of what's left. Happy talk (self-congratulatory text) must die. + Instructions must die. If they need reading, the design has failed. + +### How Users Actually Behave + +- **Users scan, they don't read.** Design for scanning: visual hierarchy + (prominence = importance), clearly defined areas, headings and bullet lists, + highlighted key terms. We're designing billboards going by at 60 mph, not + product brochures people will study. +- **Users satisfice.** They pick the first reasonable option, not the best. + Make the right choice the most visible choice. +- **Users muddle through.** They don't figure out how things work. They wing + it. If they accomplish their goal by accident, they won't seek the "right" way. + Once they find something that works, no matter how badly, they stick to it. +- **Users don't read instructions.** They dive in. Guidance must be brief, + timely, and unavoidable, or it won't be seen. + +### Billboard Design for Interfaces + +- **Use conventions.** Logo top-left, nav top/left, search = magnifying glass. + Don't innovate on navigation to be clever. Innovate when you KNOW you have a + better idea, otherwise use conventions. Even across languages and cultures, + web conventions let people identify the logo, nav, search, and main content. +- **Visual hierarchy is everything.** Related things are visually grouped. Nested + things are visually contained. More important = more prominent. If everything + shouts, nothing is heard. Start with the assumption everything is visual noise, + guilty until proven innocent. +- **Make clickable things obviously clickable.** No relying on hover states for + discoverability, especially on mobile where hover doesn't exist. Shape, location, + and formatting (color, underlining) must signal clickability without interaction. +- **Eliminate noise.** Three sources: too many things shouting for attention + (shouting), things not organized logically (disorganization), and too much stuff + (clutter). Fix noise by removal, not addition. +- **Clarity trumps consistency.** If making something significantly clearer + requires making it slightly inconsistent, choose clarity every time. + +### Navigation as Wayfinding + +Users on the web have no sense of scale, direction, or location. Navigation +must always answer: What site is this? What page am I on? What are the major +sections? What are my options at this level? Where am I? How can I search? + +Persistent navigation on every page. Breadcrumbs for deep hierarchies. +Current section visually indicated. The "trunk test": cover everything except +the navigation. You should still know what site this is, what page you're on, +and what the major sections are. If not, the navigation has failed. + +### The Goodwill Reservoir + +Users start with a reservoir of goodwill. Every friction point depletes it. + +**Deplete faster:** Hiding info users want (pricing, contact, shipping). Punishing +users for not doing things your way (formatting requirements on phone numbers). +Asking for unnecessary information. Putting sizzle in their way (splash screens, +forced tours, interstitials). Unprofessional or sloppy appearance. + +**Replenish:** Know what users want to do and make it obvious. Tell them what they +want to know upfront. Save them steps wherever possible. Make it easy to recover +from errors. When in doubt, apologize. + +### Mobile: Same Rules, Higher Stakes + +All the above applies on mobile, just more so. Real estate is scarce, but never +sacrifice usability for space savings. Affordances must be VISIBLE: no cursor +means no hover-to-discover. Touch targets must be big enough (44px minimum). +Flat design can strip away useful visual information that signals interactivity. +Prioritize ruthlessly: things needed in a hurry go close at hand, everything +else a few taps away with an obvious path to get there. + ## SETUP (run this check BEFORE any browse command) ```bash diff --git a/design-html/SKILL.md.tmpl b/design-html/SKILL.md.tmpl index 80527c9e..9fb422e9 100644 --- a/design-html/SKILL.md.tmpl +++ b/design-html/SKILL.md.tmpl @@ -37,6 +37,8 @@ around obstacles. {{DESIGN_SETUP}} +{{UX_PRINCIPLES}} + {{BROWSE_SETUP}} --- diff --git a/design-review/SKILL.md b/design-review/SKILL.md index b87c509d..e3f5cd77 100644 --- a/design-review/SKILL.md +++ b/design-review/SKILL.md @@ -894,6 +894,91 @@ matches a past learning, display: This makes the compounding visible. The user should see that gstack is getting smarter on their codebase over time. +## UX Principles: How Users Actually Behave + +These principles govern how real humans interact with interfaces. They are observed +behavior, not preferences. Apply them before, during, and after every design decision. + +### The Three Laws of Usability + +1. **Don't make me think.** Every page should be self-evident. If a user stops + to think "What do I click?" or "What does this mean?", the design has failed. + Self-evident > self-explanatory > requires explanation. + +2. **Clicks don't matter, thinking does.** Three mindless, unambiguous clicks + beat one click that requires thought. Each step should feel like an obvious + choice (animal, vegetable, or mineral), not a puzzle. + +3. **Omit, then omit again.** Get rid of half the words on each page, then get + rid of half of what's left. Happy talk (self-congratulatory text) must die. + Instructions must die. If they need reading, the design has failed. + +### How Users Actually Behave + +- **Users scan, they don't read.** Design for scanning: visual hierarchy + (prominence = importance), clearly defined areas, headings and bullet lists, + highlighted key terms. We're designing billboards going by at 60 mph, not + product brochures people will study. +- **Users satisfice.** They pick the first reasonable option, not the best. + Make the right choice the most visible choice. +- **Users muddle through.** They don't figure out how things work. They wing + it. If they accomplish their goal by accident, they won't seek the "right" way. + Once they find something that works, no matter how badly, they stick to it. +- **Users don't read instructions.** They dive in. Guidance must be brief, + timely, and unavoidable, or it won't be seen. + +### Billboard Design for Interfaces + +- **Use conventions.** Logo top-left, nav top/left, search = magnifying glass. + Don't innovate on navigation to be clever. Innovate when you KNOW you have a + better idea, otherwise use conventions. Even across languages and cultures, + web conventions let people identify the logo, nav, search, and main content. +- **Visual hierarchy is everything.** Related things are visually grouped. Nested + things are visually contained. More important = more prominent. If everything + shouts, nothing is heard. Start with the assumption everything is visual noise, + guilty until proven innocent. +- **Make clickable things obviously clickable.** No relying on hover states for + discoverability, especially on mobile where hover doesn't exist. Shape, location, + and formatting (color, underlining) must signal clickability without interaction. +- **Eliminate noise.** Three sources: too many things shouting for attention + (shouting), things not organized logically (disorganization), and too much stuff + (clutter). Fix noise by removal, not addition. +- **Clarity trumps consistency.** If making something significantly clearer + requires making it slightly inconsistent, choose clarity every time. + +### Navigation as Wayfinding + +Users on the web have no sense of scale, direction, or location. Navigation +must always answer: What site is this? What page am I on? What are the major +sections? What are my options at this level? Where am I? How can I search? + +Persistent navigation on every page. Breadcrumbs for deep hierarchies. +Current section visually indicated. The "trunk test": cover everything except +the navigation. You should still know what site this is, what page you're on, +and what the major sections are. If not, the navigation has failed. + +### The Goodwill Reservoir + +Users start with a reservoir of goodwill. Every friction point depletes it. + +**Deplete faster:** Hiding info users want (pricing, contact, shipping). Punishing +users for not doing things your way (formatting requirements on phone numbers). +Asking for unnecessary information. Putting sizzle in their way (splash screens, +forced tours, interstitials). Unprofessional or sloppy appearance. + +**Replenish:** Know what users want to do and make it obvious. Tell them what they +want to know upfront. Save them steps wherever possible. Make it easy to recover +from errors. When in doubt, apologize. + +### Mobile: Same Rules, Higher Stakes + +All the above applies on mobile, just more so. Real estate is scarce, but never +sacrifice usability for space savings. Affordances must be VISIBLE: no cursor +means no hover-to-discover. Touch targets must be big enough (44px minimum). +Flat design can strip away useful visual information that signals interactivity. +Prioritize ruthlessly: things needed in a hurry go close at hand, everything +else a few taps away with an obvious path to get there. + ## Phases 1-6: Design Audit Baseline ## Modes @@ -928,9 +1013,13 @@ The most uniquely designer-like output. Form a gut reaction before analyzing any 3. Write the **First Impression** using this structured critique format: - "The site communicates **[what]**." (what it says at a glance — competence? playfulness? confusion?) - "I notice **[observation]**." (what stands out, positive or negative — be specific) - - "The first 3 things my eye goes to are: **[1]**, **[2]**, **[3]**." (hierarchy check — are these intentional?) + - "The first 3 things my eye goes to are: **[1]**, **[2]**, **[3]**." (hierarchy check — are these the 3 things the designer intended? If not, the visual hierarchy is lying.) - "If I had to describe this in one word: **[word]**." (gut verdict) +**Narration mode:** Write this section in first person, as if you are a user scanning the page for the first time. "I'm looking at this page... my eye goes to the logo, then a wall of text I skip entirely, then... wait, is that a button?" Name the specific element, its position, its visual weight. If you can't name it specifically, you're not actually scanning, you're generating platitudes. + +**Page Area Test:** Point at each clearly defined area of the page. Can you instantly name its purpose? ("Things I can buy," "Today's deals," "How to search.") Areas you can't name in 2 seconds are poorly defined. List them. + This is the section users read first. Be opinionated. A designer doesn't hedge — they react. --- @@ -986,6 +1075,19 @@ $B url ``` If URL contains `/login`, `/signin`, `/auth`, or `/sso`: the site requires authentication. AskUserQuestion: "This site requires authentication. Want to import cookies from your browser? Run `/setup-browser-cookies` first if needed." +### Trunk Test (run on every page) + +Imagine being dropped on this page with no context. Can you immediately answer: +1. What site is this? (Site ID visible and identifiable) +2. What page am I on? (Page name prominent, matches what I clicked) +3. What are the major sections? (Primary nav visible and clear) +4. What are my options at this level? (Local nav or content choices obvious) +5. Where am I in the scheme of things? ("You are here" indicator, breadcrumbs) +6. How can I search? (Search box findable without hunting) + +Score: PASS (all 6 clear) / PARTIAL (4-5 clear) / FAIL (3 or fewer clear). +A FAIL on the trunk test is a HIGH-impact finding regardless of how polished the visual design is. + ### Design Audit Checklist (10 categories, ~80 items) Apply these at each page. Each finding gets an impact rating (high/medium/polish) and category. @@ -1054,6 +1156,7 @@ Apply these at each page. Each finding gets an impact rating (high/medium/polish - Success: confirmation animation or color, auto-dismiss - Touch targets >= 44px on all interactive elements - `cursor: pointer` on all clickable elements +- Mindless choice audit: every decision point (button, link, dropdown, modal choice) is a mindless click (obvious what happens). If a click requires thought about whether it's the right choice, flag as HIGH. **6. Responsive Design** (8 items) - Mobile layout makes *design* sense (not just stacked desktop columns) @@ -1082,6 +1185,9 @@ Apply these at each page. Each finding gets an impact rating (high/medium/polish - Active voice ("Install the CLI" not "The CLI will be installed") - Loading states end with `…` ("Saving…" not "Saving...") - Destructive actions have confirmation modal or undo window +- Happy talk detection: scan for introductory paragraphs that start with "Welcome to..." or tell users how great the site is. If you can hear "blah blah blah", it's happy talk. Flag for removal. +- Instructions detection: any visible instructions longer than one sentence. If users need to read instructions, the design has failed. Flag the instructions AND the interaction they're compensating for. +- Happy talk word count: count total visible words on the page. Classify each text block as "useful content" vs "happy talk" (welcome paragraphs, self-congratulatory text, instructions nobody reads). Report: "This page has X words. Y (Z%) are happy talk." **9. AI Slop Detection** (10 anti-patterns — the blacklist) @@ -1124,6 +1230,43 @@ Evaluate: - **Feedback clarity:** Did the action clearly succeed or fail? Is the feedback immediate? - **Form polish:** Focus states visible? Validation timing correct? Errors near the source? +**Narration mode:** Narrate the flow in first person. "I click 'Sign Up'... spinner appears... 3 seconds pass... still spinning... I'm getting nervous. Finally the dashboard loads, but where am I? The nav doesn't highlight anything." Name the specific element, its position, its visual weight. If you can't name it specifically, you're not actually experiencing the flow, you're generating platitudes. + +### Goodwill Reservoir (track across the flow) + +As you walk the user flow, maintain a mental goodwill meter (starts at 70/100). +These scores are heuristic, not measured. The value is in identifying specific +drains and fills, not in the final number. + +Subtract points for: +- Hidden information the user would want (pricing, contact, shipping): subtract 15 +- Format punishment (rejecting valid input like dashes in phone numbers): subtract 10 +- Unnecessary information requests: subtract 10 +- Interstitials, splash screens, forced tours blocking the task: subtract 15 +- Sloppy or unprofessional appearance: subtract 10 +- Ambiguous choices that require thinking: subtract 5 each + +Add points for: +- Top user tasks are obvious and prominent: add 10 +- Upfront about costs and limitations: add 5 +- Saves steps (direct links, smart defaults, autofill): add 5 each +- Graceful error recovery with specific fix instructions: add 10 +- Apologizes when things go wrong: add 5 + +Report the final goodwill score with a visual dashboard: + +``` +Goodwill: 70 ████████████████████░░░░░░░░░░ + Step 1: Login page 70 → 75 (+5 obvious primary action) + Step 2: Dashboard 75 → 60 (-15 interstitial tour popup) + Step 3: Settings 60 → 50 (-10 format punishment on phone) + Step 4: Billing 50 → 35 (-15 hidden pricing info) + FINAL: 35/100 ⚠️ CRITICAL UX DEBT +``` + +Below 30 = critical UX debt. 30-60 = needs work. Above 60 = healthy. +Include the biggest drains and fills as specific findings. + --- ## Phase 5: Cross-Page Consistency @@ -1281,6 +1424,10 @@ Tie everything to user goals and product objectives. Always suggest specific imp - One job per section - "If deleting 30% of the copy improves it, keep deleting" - Cards earn their existence — no decorative card grids +- NEVER use small, low-contrast type (body text < 16px or contrast ratio < 4.5:1 on body text) +- NEVER put labels inside form fields as the only label (placeholder-as-label pattern — labels must be visible when the field has content) +- ALWAYS preserve visited vs unvisited link distinction (visited links must have a different color) +- NEVER float headings between paragraphs (heading must be visually closer to the section it introduces than to the preceding section) **AI Slop blacklist** (the 10 patterns that scream "AI-generated"): 1. Purple/violet/indigo gradient backgrounds or blue-to-purple color schemes diff --git a/design-review/SKILL.md.tmpl b/design-review/SKILL.md.tmpl index adca0991..fbf59e8d 100644 --- a/design-review/SKILL.md.tmpl +++ b/design-review/SKILL.md.tmpl @@ -99,6 +99,8 @@ echo "REPORT_DIR: $REPORT_DIR" {{LEARNINGS_SEARCH}} +{{UX_PRINCIPLES}} + ## Phases 1-6: Design Audit Baseline {{DESIGN_METHODOLOGY}} diff --git a/design-shotgun/SKILL.md b/design-shotgun/SKILL.md index d254d9d2..e8726c47 100644 --- a/design-shotgun/SKILL.md +++ b/design-shotgun/SKILL.md @@ -583,6 +583,91 @@ MUST be saved to `~/.gstack/projects/$SLUG/designs/`, NEVER to `.context/`, `docs/designs/`, `/tmp/`, or any project-local directory. Design artifacts are USER data, not project files. They persist across branches, conversations, and workspaces. +## UX Principles: How Users Actually Behave + +These principles govern how real humans interact with interfaces. They are observed +behavior, not preferences. Apply them before, during, and after every design decision. + +### The Three Laws of Usability + +1. **Don't make me think.** Every page should be self-evident. If a user stops + to think "What do I click?" or "What does this mean?", the design has failed. + Self-evident > self-explanatory > requires explanation. + +2. **Clicks don't matter, thinking does.** Three mindless, unambiguous clicks + beat one click that requires thought. Each step should feel like an obvious + choice (animal, vegetable, or mineral), not a puzzle. + +3. **Omit, then omit again.** Get rid of half the words on each page, then get + rid of half of what's left. Happy talk (self-congratulatory text) must die. + Instructions must die. If they need reading, the design has failed. + +### How Users Actually Behave + +- **Users scan, they don't read.** Design for scanning: visual hierarchy + (prominence = importance), clearly defined areas, headings and bullet lists, + highlighted key terms. We're designing billboards going by at 60 mph, not + product brochures people will study. +- **Users satisfice.** They pick the first reasonable option, not the best. + Make the right choice the most visible choice. +- **Users muddle through.** They don't figure out how things work. They wing + it. If they accomplish their goal by accident, they won't seek the "right" way. + Once they find something that works, no matter how badly, they stick to it. +- **Users don't read instructions.** They dive in. Guidance must be brief, + timely, and unavoidable, or it won't be seen. + +### Billboard Design for Interfaces + +- **Use conventions.** Logo top-left, nav top/left, search = magnifying glass. + Don't innovate on navigation to be clever. Innovate when you KNOW you have a + better idea, otherwise use conventions. Even across languages and cultures, + web conventions let people identify the logo, nav, search, and main content. +- **Visual hierarchy is everything.** Related things are visually grouped. Nested + things are visually contained. More important = more prominent. If everything + shouts, nothing is heard. Start with the assumption everything is visual noise, + guilty until proven innocent. +- **Make clickable things obviously clickable.** No relying on hover states for + discoverability, especially on mobile where hover doesn't exist. Shape, location, + and formatting (color, underlining) must signal clickability without interaction. +- **Eliminate noise.** Three sources: too many things shouting for attention + (shouting), things not organized logically (disorganization), and too much stuff + (clutter). Fix noise by removal, not addition. +- **Clarity trumps consistency.** If making something significantly clearer + requires making it slightly inconsistent, choose clarity every time. + +### Navigation as Wayfinding + +Users on the web have no sense of scale, direction, or location. Navigation +must always answer: What site is this? What page am I on? What are the major +sections? What are my options at this level? Where am I? How can I search? + +Persistent navigation on every page. Breadcrumbs for deep hierarchies. +Current section visually indicated. The "trunk test": cover everything except +the navigation. You should still know what site this is, what page you're on, +and what the major sections are. If not, the navigation has failed. + +### The Goodwill Reservoir + +Users start with a reservoir of goodwill. Every friction point depletes it. + +**Deplete faster:** Hiding info users want (pricing, contact, shipping). Punishing +users for not doing things your way (formatting requirements on phone numbers). +Asking for unnecessary information. Putting sizzle in their way (splash screens, +forced tours, interstitials). Unprofessional or sloppy appearance. + +**Replenish:** Know what users want to do and make it obvious. Tell them what they +want to know upfront. Save them steps wherever possible. Make it easy to recover +from errors. When in doubt, apologize. + +### Mobile: Same Rules, Higher Stakes + +All the above applies on mobile, just more so. Real estate is scarce, but never +sacrifice usability for space savings. Affordances must be VISIBLE: no cursor +means no hover-to-discover. Touch targets must be big enough (44px minimum). +Flat design can strip away useful visual information that signals interactivity. +Prioritize ruthlessly: things needed in a hurry go close at hand, everything +else a few taps away with an obvious path to get there. + ## Step 0: Session Detection Check for prior design exploration sessions for this project: diff --git a/design-shotgun/SKILL.md.tmpl b/design-shotgun/SKILL.md.tmpl index 2542c7e8..26c33968 100644 --- a/design-shotgun/SKILL.md.tmpl +++ b/design-shotgun/SKILL.md.tmpl @@ -28,6 +28,8 @@ visual brainstorming, not a review process. {{DESIGN_SETUP}} +{{UX_PRINCIPLES}} + ## Step 0: Session Detection Check for prior design exploration sessions for this project: diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md index bc9a1d16..d7167b13 100644 --- a/plan-design-review/SKILL.md +++ b/plan-design-review/SKILL.md @@ -660,10 +660,95 @@ These aren't a checklist — they're how you see. The perceptual instincts that 11. **Design for trust** — Every design decision either builds or erodes trust. Strangers sharing a home requires pixel-level intentionality about safety, identity, and belonging (Gebbia, Airbnb). 12. **Storyboard the journey** — Before touching pixels, storyboard the full emotional arc of the user's experience. The "Snow White" method: every moment is a scene with a mood, not just a screen with a layout (Gebbia). -Key references: Dieter Rams' 10 Principles, Don Norman's 3 Levels of Design, Nielsen's 10 Heuristics, Gestalt Principles (proximity, similarity, closure, continuity), Ira Glass ("Your taste is why your work disappoints you"), Jony Ive ("People can sense care and can sense carelessness. Different and new is relatively easy. Doing something that's genuinely better is very hard."), Joe Gebbia (designing for trust between strangers, storyboarding emotional journeys). +Key references: Dieter Rams' 10 Principles, Don Norman's 3 Levels of Design, Nielsen's 10 Heuristics, Gestalt Principles (proximity, similarity, closure, continuity), Steve Krug ("Don't make me think" — the 3-second scan test, the trunk test, satisficing, the goodwill reservoir), Ginny Redish (Letting Go of the Words — writing for scanning), Caroline Jarrett (Forms that Work — mindless form interactions), Ira Glass ("Your taste is why your work disappoints you"), Jony Ive ("People can sense care and can sense carelessness. Different and new is relatively easy. Doing something that's genuinely better is very hard."), Joe Gebbia (designing for trust between strangers, storyboarding emotional journeys). When reviewing a plan, empathy as simulation runs automatically. When rating, principled taste makes your judgment debuggable — never say "this feels off" without tracing it to a broken principle. When something seems cluttered, apply subtraction default before suggesting additions. +## UX Principles: How Users Actually Behave + +These principles govern how real humans interact with interfaces. They are observed +behavior, not preferences. Apply them before, during, and after every design decision. + +### The Three Laws of Usability + +1. **Don't make me think.** Every page should be self-evident. If a user stops + to think "What do I click?" or "What does this mean?", the design has failed. + Self-evident > self-explanatory > requires explanation. + +2. **Clicks don't matter, thinking does.** Three mindless, unambiguous clicks + beat one click that requires thought. Each step should feel like an obvious + choice (animal, vegetable, or mineral), not a puzzle. + +3. **Omit, then omit again.** Get rid of half the words on each page, then get + rid of half of what's left. Happy talk (self-congratulatory text) must die. + Instructions must die. If they need reading, the design has failed. + +### How Users Actually Behave + +- **Users scan, they don't read.** Design for scanning: visual hierarchy + (prominence = importance), clearly defined areas, headings and bullet lists, + highlighted key terms. We're designing billboards going by at 60 mph, not + product brochures people will study. +- **Users satisfice.** They pick the first reasonable option, not the best. + Make the right choice the most visible choice. +- **Users muddle through.** They don't figure out how things work. They wing + it. If they accomplish their goal by accident, they won't seek the "right" way. + Once they find something that works, no matter how badly, they stick to it. +- **Users don't read instructions.** They dive in. Guidance must be brief, + timely, and unavoidable, or it won't be seen. + +### Billboard Design for Interfaces + +- **Use conventions.** Logo top-left, nav top/left, search = magnifying glass. + Don't innovate on navigation to be clever. Innovate when you KNOW you have a + better idea, otherwise use conventions. Even across languages and cultures, + web conventions let people identify the logo, nav, search, and main content. +- **Visual hierarchy is everything.** Related things are visually grouped. Nested + things are visually contained. More important = more prominent. If everything + shouts, nothing is heard. Start with the assumption everything is visual noise, + guilty until proven innocent. +- **Make clickable things obviously clickable.** No relying on hover states for + discoverability, especially on mobile where hover doesn't exist. Shape, location, + and formatting (color, underlining) must signal clickability without interaction. +- **Eliminate noise.** Three sources: too many things shouting for attention + (shouting), things not organized logically (disorganization), and too much stuff + (clutter). Fix noise by removal, not addition. +- **Clarity trumps consistency.** If making something significantly clearer + requires making it slightly inconsistent, choose clarity every time. + +### Navigation as Wayfinding + +Users on the web have no sense of scale, direction, or location. Navigation +must always answer: What site is this? What page am I on? What are the major +sections? What are my options at this level? Where am I? How can I search? + +Persistent navigation on every page. Breadcrumbs for deep hierarchies. +Current section visually indicated. The "trunk test": cover everything except +the navigation. You should still know what site this is, what page you're on, +and what the major sections are. If not, the navigation has failed. + +### The Goodwill Reservoir + +Users start with a reservoir of goodwill. Every friction point depletes it. + +**Deplete faster:** Hiding info users want (pricing, contact, shipping). Punishing +users for not doing things your way (formatting requirements on phone numbers). +Asking for unnecessary information. Putting sizzle in their way (splash screens, +forced tours, interstitials). Unprofessional or sloppy appearance. + +**Replenish:** Know what users want to do and make it obvious. Tell them what they +want to know upfront. Save them steps wherever possible. Make it easy to recover +from errors. When in doubt, apologize. + +### Mobile: Same Rules, Higher Stakes + +All the above applies on mobile, just more so. Real estate is scarce, but never +sacrifice usability for space savings. Affordances must be VISIBLE: no cursor +means no hover-to-discover. Touch targets must be big enough (44px minimum). +Flat design can strip away useful visual information that signals interactivity. +Prioritize ruthlessly: things needed in a hurry go close at hand, everything +else a few taps away with an obvious path to get there. + ## Priority Hierarchy Under Context Pressure Step 0 > Step 0.5 (mockups — generate by default) > Interaction State Coverage > AI Slop Risk > Information Architecture > User Journey > everything else. @@ -1199,6 +1284,10 @@ FIX TO 10: Rewrite vague UI descriptions with specific alternatives. - One job per section - "If deleting 30% of the copy improves it, keep deleting" - Cards earn their existence — no decorative card grids +- NEVER use small, low-contrast type (body text < 16px or contrast ratio < 4.5:1 on body text) +- NEVER put labels inside form fields as the only label (placeholder-as-label pattern — labels must be visible when the field has content) +- ALWAYS preserve visited vs unvisited link distinction (visited links must have a different color) +- NEVER float headings between paragraphs (heading must be visually closer to the section it introduces than to the preceding section) **AI Slop blacklist** (the 10 patterns that scream "AI-generated"): 1. Purple/violet/indigo gradient backgrounds or blue-to-purple color schemes diff --git a/plan-design-review/SKILL.md.tmpl b/plan-design-review/SKILL.md.tmpl index ff271191..857ff08c 100644 --- a/plan-design-review/SKILL.md.tmpl +++ b/plan-design-review/SKILL.md.tmpl @@ -91,10 +91,12 @@ These aren't a checklist — they're how you see. The perceptual instincts that 11. **Design for trust** — Every design decision either builds or erodes trust. Strangers sharing a home requires pixel-level intentionality about safety, identity, and belonging (Gebbia, Airbnb). 12. **Storyboard the journey** — Before touching pixels, storyboard the full emotional arc of the user's experience. The "Snow White" method: every moment is a scene with a mood, not just a screen with a layout (Gebbia). -Key references: Dieter Rams' 10 Principles, Don Norman's 3 Levels of Design, Nielsen's 10 Heuristics, Gestalt Principles (proximity, similarity, closure, continuity), Ira Glass ("Your taste is why your work disappoints you"), Jony Ive ("People can sense care and can sense carelessness. Different and new is relatively easy. Doing something that's genuinely better is very hard."), Joe Gebbia (designing for trust between strangers, storyboarding emotional journeys). +Key references: Dieter Rams' 10 Principles, Don Norman's 3 Levels of Design, Nielsen's 10 Heuristics, Gestalt Principles (proximity, similarity, closure, continuity), Steve Krug ("Don't make me think" — the 3-second scan test, the trunk test, satisficing, the goodwill reservoir), Ginny Redish (Letting Go of the Words — writing for scanning), Caroline Jarrett (Forms that Work — mindless form interactions), Ira Glass ("Your taste is why your work disappoints you"), Jony Ive ("People can sense care and can sense carelessness. Different and new is relatively easy. Doing something that's genuinely better is very hard."), Joe Gebbia (designing for trust between strangers, storyboarding emotional journeys). When reviewing a plan, empathy as simulation runs automatically. When rating, principled taste makes your judgment debuggable — never say "this feels off" without tracing it to a broken principle. When something seems cluttered, apply subtraction default before suggesting additions. +{{UX_PRINCIPLES}} + ## Priority Hierarchy Under Context Pressure Step 0 > Step 0.5 (mockups — generate by default) > Interaction State Coverage > AI Slop Risk > Information Architecture > User Journey > everything else. diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts index 4da9203f..7aa8e4a6 100644 --- a/scripts/gen-skill-docs.ts +++ b/scripts/gen-skill-docs.ts @@ -542,6 +542,12 @@ for (const currentHost of hostsToRun) { const lines = content.split('\n').length; const tokens = Math.round(content.length / 4); // ~4 chars per token tokenBudget.push({ skill: relOutput, lines, tokens }); + + // Token ceiling check: warn if any generated SKILL.md exceeds ~25K tokens (100KB) + const TOKEN_CEILING_BYTES = 100_000; + if (content.length > TOKEN_CEILING_BYTES) { + console.warn(`⚠️ TOKEN CEILING: ${relOutput} is ${content.length} bytes (~${tokens} tokens), exceeds ${TOKEN_CEILING_BYTES} byte ceiling (~25K tokens)`); + } } // Generate gstack-lite and gstack-full for OpenClaw host diff --git a/scripts/resolvers/design.ts b/scripts/resolvers/design.ts index 208b1db3..926e3484 100644 --- a/scripts/resolvers/design.ts +++ b/scripts/resolvers/design.ts @@ -99,9 +99,13 @@ The most uniquely designer-like output. Form a gut reaction before analyzing any 3. Write the **First Impression** using this structured critique format: - "The site communicates **[what]**." (what it says at a glance — competence? playfulness? confusion?) - "I notice **[observation]**." (what stands out, positive or negative — be specific) - - "The first 3 things my eye goes to are: **[1]**, **[2]**, **[3]**." (hierarchy check — are these intentional?) + - "The first 3 things my eye goes to are: **[1]**, **[2]**, **[3]**." (hierarchy check — are these the 3 things the designer intended? If not, the visual hierarchy is lying.) - "If I had to describe this in one word: **[word]**." (gut verdict) +**Narration mode:** Write this section in first person, as if you are a user scanning the page for the first time. "I'm looking at this page... my eye goes to the logo, then a wall of text I skip entirely, then... wait, is that a button?" Name the specific element, its position, its visual weight. If you can't name it specifically, you're not actually scanning, you're generating platitudes. + +**Page Area Test:** Point at each clearly defined area of the page. Can you instantly name its purpose? ("Things I can buy," "Today's deals," "How to search.") Areas you can't name in 2 seconds are poorly defined. List them. + This is the section users read first. Be opinionated. A designer doesn't hedge — they react. --- @@ -157,6 +161,19 @@ $B url \`\`\` If URL contains \`/login\`, \`/signin\`, \`/auth\`, or \`/sso\`: the site requires authentication. AskUserQuestion: "This site requires authentication. Want to import cookies from your browser? Run \`/setup-browser-cookies\` first if needed." +### Trunk Test (run on every page) + +Imagine being dropped on this page with no context. Can you immediately answer: +1. What site is this? (Site ID visible and identifiable) +2. What page am I on? (Page name prominent, matches what I clicked) +3. What are the major sections? (Primary nav visible and clear) +4. What are my options at this level? (Local nav or content choices obvious) +5. Where am I in the scheme of things? ("You are here" indicator, breadcrumbs) +6. How can I search? (Search box findable without hunting) + +Score: PASS (all 6 clear) / PARTIAL (4-5 clear) / FAIL (3 or fewer clear). +A FAIL on the trunk test is a HIGH-impact finding regardless of how polished the visual design is. + ### Design Audit Checklist (10 categories, ~80 items) Apply these at each page. Each finding gets an impact rating (high/medium/polish) and category. @@ -225,6 +242,7 @@ Apply these at each page. Each finding gets an impact rating (high/medium/polish - Success: confirmation animation or color, auto-dismiss - Touch targets >= 44px on all interactive elements - \`cursor: pointer\` on all clickable elements +- Mindless choice audit: every decision point (button, link, dropdown, modal choice) is a mindless click (obvious what happens). If a click requires thought about whether it's the right choice, flag as HIGH. **6. Responsive Design** (8 items) - Mobile layout makes *design* sense (not just stacked desktop columns) @@ -253,6 +271,9 @@ Apply these at each page. Each finding gets an impact rating (high/medium/polish - Active voice ("Install the CLI" not "The CLI will be installed") - Loading states end with \`…\` ("Saving…" not "Saving...") - Destructive actions have confirmation modal or undo window +- Happy talk detection: scan for introductory paragraphs that start with "Welcome to..." or tell users how great the site is. If you can hear "blah blah blah", it's happy talk. Flag for removal. +- Instructions detection: any visible instructions longer than one sentence. If users need to read instructions, the design has failed. Flag the instructions AND the interaction they're compensating for. +- Happy talk word count: count total visible words on the page. Classify each text block as "useful content" vs "happy talk" (welcome paragraphs, self-congratulatory text, instructions nobody reads). Report: "This page has X words. Y (Z%) are happy talk." **9. AI Slop Detection** (10 anti-patterns — the blacklist) @@ -286,6 +307,43 @@ Evaluate: - **Feedback clarity:** Did the action clearly succeed or fail? Is the feedback immediate? - **Form polish:** Focus states visible? Validation timing correct? Errors near the source? +**Narration mode:** Narrate the flow in first person. "I click 'Sign Up'... spinner appears... 3 seconds pass... still spinning... I'm getting nervous. Finally the dashboard loads, but where am I? The nav doesn't highlight anything." Name the specific element, its position, its visual weight. If you can't name it specifically, you're not actually experiencing the flow, you're generating platitudes. + +### Goodwill Reservoir (track across the flow) + +As you walk the user flow, maintain a mental goodwill meter (starts at 70/100). +These scores are heuristic, not measured. The value is in identifying specific +drains and fills, not in the final number. + +Subtract points for: +- Hidden information the user would want (pricing, contact, shipping): subtract 15 +- Format punishment (rejecting valid input like dashes in phone numbers): subtract 10 +- Unnecessary information requests: subtract 10 +- Interstitials, splash screens, forced tours blocking the task: subtract 15 +- Sloppy or unprofessional appearance: subtract 10 +- Ambiguous choices that require thinking: subtract 5 each + +Add points for: +- Top user tasks are obvious and prominent: add 10 +- Upfront about costs and limitations: add 5 +- Saves steps (direct links, smart defaults, autofill): add 5 each +- Graceful error recovery with specific fix instructions: add 10 +- Apologizes when things go wrong: add 5 + +Report the final goodwill score with a visual dashboard: + +\`\`\` +Goodwill: 70 ████████████████████░░░░░░░░░░ + Step 1: Login page 70 → 75 (+5 obvious primary action) + Step 2: Dashboard 75 → 60 (-15 interstitial tour popup) + Step 3: Settings 60 → 50 (-10 format punishment on phone) + Step 4: Billing 50 → 35 (-15 hidden pricing info) + FINAL: 35/100 ⚠️ CRITICAL UX DEBT +\`\`\` + +Below 30 = critical UX debt. 30-60 = needs work. Above 60 = healthy. +Include the biggest drains and fills as specific findings. + --- ## Phase 5: Cross-Page Consistency @@ -716,6 +774,10 @@ ${litmusItems} - One job per section - "If deleting 30% of the copy improves it, keep deleting" - Cards earn their existence — no decorative card grids +- NEVER use small, low-contrast type (body text < 16px or contrast ratio < 4.5:1 on body text) +- NEVER put labels inside form fields as the only label (placeholder-as-label pattern — labels must be visible when the field has content) +- ALWAYS preserve visited vs unvisited link distinction (visited links must have a different color) +- NEVER float headings between paragraphs (heading must be visually closer to the section it introduces than to the preceding section) **AI Slop blacklist** (the 10 patterns that scream "AI-generated"): ${slopItems} @@ -948,3 +1010,91 @@ echo '{"approved_variant":"","feedback":"","date":"'$(date -u +%Y-%m-%dT% \`\`\``; } +// ─── UX Behavioral Foundations (Krug + HCI research) ─── +export function generateUXPrinciples(_ctx: TemplateContext): string { + return `## UX Principles: How Users Actually Behave + +These principles govern how real humans interact with interfaces. They are observed +behavior, not preferences. Apply them before, during, and after every design decision. + +### The Three Laws of Usability + +1. **Don't make me think.** Every page should be self-evident. If a user stops + to think "What do I click?" or "What does this mean?", the design has failed. + Self-evident > self-explanatory > requires explanation. + +2. **Clicks don't matter, thinking does.** Three mindless, unambiguous clicks + beat one click that requires thought. Each step should feel like an obvious + choice (animal, vegetable, or mineral), not a puzzle. + +3. **Omit, then omit again.** Get rid of half the words on each page, then get + rid of half of what's left. Happy talk (self-congratulatory text) must die. + Instructions must die. If they need reading, the design has failed. + +### How Users Actually Behave + +- **Users scan, they don't read.** Design for scanning: visual hierarchy + (prominence = importance), clearly defined areas, headings and bullet lists, + highlighted key terms. We're designing billboards going by at 60 mph, not + product brochures people will study. +- **Users satisfice.** They pick the first reasonable option, not the best. + Make the right choice the most visible choice. +- **Users muddle through.** They don't figure out how things work. They wing + it. If they accomplish their goal by accident, they won't seek the "right" way. + Once they find something that works, no matter how badly, they stick to it. +- **Users don't read instructions.** They dive in. Guidance must be brief, + timely, and unavoidable, or it won't be seen. + +### Billboard Design for Interfaces + +- **Use conventions.** Logo top-left, nav top/left, search = magnifying glass. + Don't innovate on navigation to be clever. Innovate when you KNOW you have a + better idea, otherwise use conventions. Even across languages and cultures, + web conventions let people identify the logo, nav, search, and main content. +- **Visual hierarchy is everything.** Related things are visually grouped. Nested + things are visually contained. More important = more prominent. If everything + shouts, nothing is heard. Start with the assumption everything is visual noise, + guilty until proven innocent. +- **Make clickable things obviously clickable.** No relying on hover states for + discoverability, especially on mobile where hover doesn't exist. Shape, location, + and formatting (color, underlining) must signal clickability without interaction. +- **Eliminate noise.** Three sources: too many things shouting for attention + (shouting), things not organized logically (disorganization), and too much stuff + (clutter). Fix noise by removal, not addition. +- **Clarity trumps consistency.** If making something significantly clearer + requires making it slightly inconsistent, choose clarity every time. + +### Navigation as Wayfinding + +Users on the web have no sense of scale, direction, or location. Navigation +must always answer: What site is this? What page am I on? What are the major +sections? What are my options at this level? Where am I? How can I search? + +Persistent navigation on every page. Breadcrumbs for deep hierarchies. +Current section visually indicated. The "trunk test": cover everything except +the navigation. You should still know what site this is, what page you're on, +and what the major sections are. If not, the navigation has failed. + +### The Goodwill Reservoir + +Users start with a reservoir of goodwill. Every friction point depletes it. + +**Deplete faster:** Hiding info users want (pricing, contact, shipping). Punishing +users for not doing things your way (formatting requirements on phone numbers). +Asking for unnecessary information. Putting sizzle in their way (splash screens, +forced tours, interstitials). Unprofessional or sloppy appearance. + +**Replenish:** Know what users want to do and make it obvious. Tell them what they +want to know upfront. Save them steps wherever possible. Make it easy to recover +from errors. When in doubt, apologize. + +### Mobile: Same Rules, Higher Stakes + +All the above applies on mobile, just more so. Real estate is scarce, but never +sacrifice usability for space savings. Affordances must be VISIBLE: no cursor +means no hover-to-discover. Touch targets must be big enough (44px minimum). +Flat design can strip away useful visual information that signals interactivity. +Prioritize ruthlessly: things needed in a hurry go close at hand, everything +else a few taps away with an obvious path to get there.`; +} + diff --git a/scripts/resolvers/index.ts b/scripts/resolvers/index.ts index 072b1a3d..e765d16c 100644 --- a/scripts/resolvers/index.ts +++ b/scripts/resolvers/index.ts @@ -9,7 +9,7 @@ import type { TemplateContext, ResolverFn } from './types'; import { generatePreamble } from './preamble'; import { generateTestFailureTriage } from './preamble'; import { generateCommandReference, generateSnapshotFlags, generateBrowseSetup } from './browse'; -import { generateDesignMethodology, generateDesignHardRules, generateDesignOutsideVoices, generateDesignReviewLite, generateDesignSketch, generateDesignSetup, generateDesignMockup, generateDesignShotgunLoop } from './design'; +import { generateDesignMethodology, generateDesignHardRules, generateDesignOutsideVoices, generateDesignReviewLite, generateDesignSketch, generateDesignSetup, generateDesignMockup, generateDesignShotgunLoop, generateUXPrinciples } from './design'; import { generateTestBootstrap, generateTestCoverageAuditPlan, generateTestCoverageAuditShip, generateTestCoverageAuditReview } from './testing'; import { generateReviewDashboard, generatePlanFileReviewReport, generateSpecReviewLoop, generateBenefitsFrom, generateCodexSecondOpinion, generateAdversarialStep, generateCodexPlanReview, generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec, generateScopeDrift, generateCrossReviewDedup } from './review'; import { generateSlugEval, generateSlugSetup, generateBaseBranchDetect, generateDeployBootstrap, generateQAMethodology, generateCoAuthorTrailer, generateChangelogWorkflow } from './utility'; @@ -30,6 +30,7 @@ export const RESOLVERS: Record = { QA_METHODOLOGY: generateQAMethodology, DESIGN_METHODOLOGY: generateDesignMethodology, DESIGN_HARD_RULES: generateDesignHardRules, + UX_PRINCIPLES: generateUXPrinciples, DESIGN_OUTSIDE_VOICES: generateDesignOutsideVoices, DESIGN_REVIEW_LITE: generateDesignReviewLite, REVIEW_DASHBOARD: generateReviewDashboard, diff --git a/test/skill-validation.test.ts b/test/skill-validation.test.ts index 1da5db6d..c78c1873 100644 --- a/test/skill-validation.test.ts +++ b/test/skill-validation.test.ts @@ -143,6 +143,7 @@ describe('Command registry consistency', () => { const validKeys = new Set([ 'interactive', 'compact', 'depth', 'selector', 'diff', 'annotate', 'outputPath', 'cursorInteractive', + 'heatmap', ]); for (const flag of SNAPSHOT_FLAGS) { expect(validKeys.has(flag.optionKey)).toBe(true);