gstack

CalvinBackup/gstack

Fork 0

mirror of https://github.com/garrytan/gstack.git synced 2026-06-01 15:51:41 +02:00

Commit Graph

Author SHA1 Message Date

Author	SHA1	Message	Date
Garry Tan	cf50443b63	v1.45.0.0 feat(design): persistent board daemon — 24h boards, one tab, board history (#1710 ) * refactor(design): board JS uses relative paths; drop __GSTACK_SERVER_URL injection Board JS in design/src/compare.ts now calls ./api/feedback and ./api/progress (relative to location.pathname) and feature-detects server mode via location.protocol instead of the injected window.__GSTACK_SERVER_URL global. The injection in design/src/serve.ts is removed (dead code now that nothing reads it). Tests updated to match the new contract: serve.test.ts asserts the relative-path JS is present and the global is gone; feedback-roundtrip asserts location.protocol detects HTTP mode. Why: prep for the multi-board daemon (design/src/daemon.ts upcoming) where the same generated HTML is served at /boards/<id>/ instead of /. Relative paths resolve against location.pathname in both cases, so one HTML, two hosts. The injection was the only thing tying board JS to a specific serving path; removing it unblocks the daemon work without forking the generator. file:// fallback preserved via the location.protocol feature-detect — board opened directly as a file still falls through to the DOM-only success path. The 6 feedback-roundtrip browser tests continue to fail with session.clearLoadedHtml undefined; that failure pre-exists this branch (verified against HEAD with these edits stashed) and lives in browse/src/write-commands.ts, not in the design code path. Tracking separately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(design): reload guard rejects directory paths design/src/serve.ts:200-212 used to accept a path that resolved to the allowedDir itself (the OR branch `\|\| resolvedReload === allowedDir`), which then crashed readFileSync with EISDIR. Now: 1. startsWith(allowedDir + path.sep) must pass — rejects the dir itself and anything outside (403). 2. statSync(resolvedReload).isFile() must pass — rejects subdirectories inside allowedDir with a clear "Path must be a file" 400. The test stub in serve.test.ts mirrors prod; both updated, plus two new test cases for the previously-broken paths. Codex caught this in the plan-review pass; it's a latent bug in shipping code, not a regression from the daemon work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(design): introduce design daemon — multi-board persistent server Adds design/src/daemon.ts: a Bun.serve daemon that hosts many boards under /boards/<id>/ instead of one server per `$D compare --serve` call. Spawned by daemon-client (next commit); for now wired only via tests. Endpoint table: GET /health liveness + version + counts (unauth) GET / index of recent boards POST /api/boards publish; daemon derives sourceDir from realpath(html). body sourceDir IGNORED (Codex trust-boundary fix). POST /shutdown graceful; refuses if active boards exist (Codex data-loss fix) GET /boards/<id> 301 → /boards/<id>/ (trailing slash is load-bearing — relative URLs in board JS resolve against pathname) GET /boards/<id>/ render board HTML GET /boards/<id>/api/progress state machine status (no idle reset) POST /boards/<id>/api/feedback submit/regen; writes feedback.json or feedback-pending.json with boardId + publishedAt augmented in POST /boards/<id>/api/reload swap HTML; per-board allowedDir guard rejects traversal, directories, out-of-allowed-dir symlinks Lifecycle: - 24h idle timeout (DESIGN_DAEMON_IDLE_MS for tests). - Idle with active boards extends 1h up to 4x, then force-shuts (Codex). - LRU cap 50 boards; evicts done before non-done; 503 when 50 non-done. - Per-board async mutex serializes feedback POST vs reload POST. - SIGTERM/SIGINT/uncaughtException → graceful shutdown, state file unlink. - Stdout: DAEMON_STARTED port=<N> (the line the client parses). Shared utilities live in design/src/daemon-state.ts: atomic state-file write/read (mode 0o600), fs.openSync('wx') lock, isProcessAlive, cmdline identity verification (/proc on Linux, ps on macOS), CMDLINE_MARKER constant. Modeled on browse/src/cli.ts lock + spawn patterns. design/test/daemon.test.ts: 30 tests, all green. Covers every endpoint, both error paths and happy paths, cross-board feedback isolation, the trailing-slash redirect, the directory-not-file reload rejection, LRU preferring done over non-done, /shutdown refusal with active boards, all path-traversal guards. Uses the exported fetchHandler in-process (no spawn) so the suite runs in ~70ms. design/test/daemon-tests-fixtures.ts: shared helpers — req() builder, tmp-dir helpers, daemon reset, and a spawnDaemonForTest() helper used by the next commit's discovery tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(design): daemon-client with lock + identity-verified spawn design/src/daemon-client.ts implements the CLI side of the daemon lifecycle: ensureDaemon() (the spawn-or-attach decision), publishBoard(), and the $D daemon stop\|status helpers. Modeled on browse/src/cli.ts:317-415 — same health-check-first attach, same fs.openSync('wx') lock, same re-read-state-INSIDE-the-lock guard against two CLIs both deciding "no daemon, spawn." Two design-specific safety properties added beyond browse: 1. verifyIdentity before any SIGTERM/SIGKILL. Reads the running process's cmdline (/proc/PID/cmdline on Linux, `ps -p PID -o command=` on macOS) and only signals if it contains CMDLINE_MARKER ("gstack-design-daemon", passed as argv at spawn time). Prevents a stale state file from causing us to kill an unrelated process that inherited the PID. 2. Refuse-kill-with-active-boards on version mismatch. Browse silently restarts; here in-memory board history would vanish, so the client prints a user-actionable WARNING and exit 1 instead. Users explicitly `$D daemon stop` to override. Spawn uses Node child_process.spawn (NOT Bun.spawn().unref) because of the macOS session-detach quirks browse already discovered. Stdio is redirected to ~/.gstack/design-daemon-startup.log, which the client tails into stderr if waitForHealthOrError times out — no more silent "daemon failed for some unknowable reason." daemon-state.ts gains DESIGN_DAEMON_STATE_FILE env override so tests can point both client and spawned daemon at a per-test path without a shared cwd. design/test/daemon-discovery.test.ts: 17 tests, all green in ~8s. Covers: spawn-fresh, attach-existing, stale-state-file (pid dead), PID-reuse safety (uses the test runner's own PID as the bait — verifyIdentity catches the cmdline mismatch, daemon not signaled), version-mismatch with/without active boards (the active-boards case runs a subprocess and asserts exit 1 + WARNING in stderr), publishBoard 200 + 409, shutdownDaemon refuse/force/unresponsive paths, daemonStatus. The daemon-discovery suite is split out of daemon.test.ts because each real spawn costs ~200ms; the in-process daemon.test.ts (30 tests, 70ms) covers the same handler logic without the spawn overhead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(design): wire daemon dispatch into CLI; add daemon stop/status design/src/cli.ts now branches on --no-daemon for both `compare --serve` and standalone `serve --html`. Default path: ensureDaemon → publishBoard → openBrowser → exit. The legacy single-process serve() is preserved behind --no-daemon for tests, Windows, and explicit debugging. Adds $D daemon status (prints daemon state JSON, or {running:false}) and $D daemon stop [--force] (refuses with active boards unless --force). parseArgs gains a `positionals` field so daemon sub-commands work naturally (`$D daemon stop` instead of `$D --action stop`). Stderr lines printed by the publishToDaemon path: DAEMON_STARTED port=N (or DAEMON_ATTACHED port=N) BOARD_PUBLISHED: <url> BOARD_URL: <url> (alias for grep-friendliness) Stdout: JSON with id, url, sourceDir. design/src/commands.ts: --no-daemon, --title added to compare + serve; new daemon command entry with status\|stop sub-commands. End-to-end smoke (manual): spawning a board via $D serve, hitting the returned URL, reading /health, calling daemon status (returns the right JSON), and daemon stop refusing because of the active board — all work as designed. Force-stop tears down cleanly and removes the state file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(design): end-to-end daemon round-trip via HTTP fetch design/test/feedback-roundtrip-daemon.test.ts walks the full publish → submit / regenerate / reload cycle against a real spawned daemon, using the same HTTP calls the board JS makes. Four tests, all green in ~650ms. Covers what design-shotgun and friends actually depend on: - Submit writes feedback.json into the board's sourceDir with the augmented boardId + publishedAt fields. - GET /boards/<id> (no slash) returns a 301 to /boards/<id>/ — the load-bearing redirect that lets the board JS use relative paths. - Regenerate writes feedback-pending.json, flips state to regenerating, /api/progress reflects it; /api/reload swaps HTML in place; round-2 submit writes the final feedback.json with the round-2 selection. - Two boards published into the same daemon get independent URLs on the same port — feedback for board A doesn't contaminate board B's sourceDir, both URLs serve their own content, the index lists both. Uses HTTP fetch rather than a real browser because the existing browser round-trip (feedback-roundtrip.test.ts) is broken on a pre-existing browse harness regression (session.clearLoadedHtml undefined in browse/src/write-commands.ts:149) that's unrelated to this branch. The HTTP path proves the same daemon semantics; a browser variant can be added once the browse harness is fixed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(design): compiled binary self-execs as daemon; unified version lookup Two small but production-critical fixes once the binary actually runs: 1. Compiled binary couldn't spawn the daemon. daemon-client previously pointed at design/src/daemon.ts via import.meta.dir — fine in dev, fatal in production (the source path doesn't exist on a user's machine). Fix: design CLI now self-execs in --daemon-mode when invoked with that flag, so the spawn is `process.execPath --daemon-mode --marker gstack-design-daemon` for the compiled binary and `bun run cli.ts --daemon-mode ...` in dev. Same one binary, two modes, no separate daemon entrypoint to ship. 2. Client and daemon disagreed on VERSION in the compiled binary. Both used a source-tree-relative path that resolves to "unknown" at runtime, which silently shorted the version-mismatch refusal path (client expected "unknown" + daemon reported "unknown" → match → no refusal even when DESIGN_DAEMON_VERSION was set on one side). New readVersionString() consults DESIGN_DAEMON_VERSION env first, then design/dist/.version (sidecar baked at build time by build.sh), then VERSION at the source-tree root. Both client and daemon now go through this one helper. Manual smoke (compiled binary, all checks green): - DAEMON_STARTED + BOARD_PUBLISHED with trailing slash - GET /boards/<id> (no slash) → 301 Location /boards/<id>/ - Second `$D serve` invocation → DAEMON_ATTACHED, new board on same port - feedback.json gets boardId + publishedAt fields - DESIGN_DAEMON_VERSION=v2-different on second invocation with active board → WARNING + "Refusing to auto-kill" + exit 1, original daemon still alive - `$D daemon stop --force` removes state file All 67 design tests still green after the refactor (16 serve + 30 daemon + 17 discovery + 4 daemon round-trip). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(design): skill resolvers learn the daemon's BOARD_URL output The five skills that invoke $D compare --serve (design-shotgun, design-consultation, plan-design-review, office-hours, design-review) parsed `SERVE_STARTED: port=N` from stderr and then POSTed to `/api/reload` at that port during regenerate cycles. The new daemon hosts boards under `/boards/<id>/` so the reload endpoint moved to `<BOARD_URL>api/reload` — without this update, the regenerate phase of every skill invocation would silently fail against daemon mode. Updated scripts/resolvers/design.ts to parse `BOARD_URL:` instead of the port, and to POST reloads against the per-board URL. Regenerated the four SKILL.md files via bun run gen:skill-docs. Legacy `--no-daemon` invocations continue to emit `SERVE_STARTED:` and serve at `/api/reload` — the resolver instructions note both. Surfaced by the maintainability specialist during /ship review (the "stale comment" finding was actually a behavior bug pointing at five downstream consumers). Codex's plan-review pass flagged the migration story as incomplete but I dismissed the concern — Codex was right. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(design): emit SERVE_STARTED back-compat alias; drop dead import design/src/cli.ts publishToDaemon now emits `SERVE_STARTED: port=N html=<path>` as a third stderr line alongside DAEMON_STARTED/DAEMON_ATTACHED + BOARD_URL. Any out-of-tree script that grepped the legacy line still gets the port — they'd still fail at the reload step (the endpoint moved to /boards/<id>/ api/reload) but they no longer fail at the port-detection step. Combined with the resolver updates one commit back, this is belt-and-suspenders compat. Fixed the stale docstring at cli.ts:316 that claimed back-compat without actually emitting the alias. The maintainability specialist flagged it. Dropped a dead `DaemonState` import from daemon-client.ts. Same review pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.45.0.0) Design boards now live 24h, not 10 minutes. One daemon hosts every board, one tab survives the whole day. See CHANGELOG.md for the full release summary + metrics + itemized changes. TODOS.md gains a "design daemon: follow-ups" section capturing the P3 test gaps + maintainability nits the /ship review army flagged but that aren't blocking for this release. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(design): fill daemon test gaps surfaced by ship review army Adds 10 net new tests (and removes 1 misleading smoke) for the gaps the testing specialist flagged at /ship time. Filed as P3 TODOs at ship, filling now per boil-the-lake. design/test/daemon-discovery.test.ts (+6 tests, +1 import): - "idle daemon (no boards) shuts itself down after IDLE_MS + CHECK_MS" Spawn-based, DESIGN_DAEMON_IDLE_MS=2000, CHECK_MS=200. Waits for the daemon process to actually exit and asserts the state file is removed. Previously only "callable without throwing" was tested. - "bare GET polling does NOT prevent idle shutdown" Hammers /api/progress every 200ms in a background loop with a done board, asserts the daemon still idles out — proves the meaningful-activity-only-on-POSTs guard (Codex finding) actually works. - "idle with active (non-done) boards triggers extension instead of shutdown" Sets DESIGN_DAEMON_EXTENSION_MS=1500 + MAX_EXTENSIONS=2, publishes a non-done board, asserts the daemon survives past IDLE_MS (extends), then verifies the MAX_EXTENSIONS hard ceiling force-shuts. Both the extension counter and the hard ceiling were previously untested. - "two parallel ensureDaemon() calls converge on one daemon" Fires two ensureDaemon calls in Promise.all against an empty stateFile, asserts: both ports match, exactly one spawned=true, exactly one daemon alive, no orphaned lock file. The discovery-test file's own docstring claimed this test existed; now it actually does. - "acquireLock reclaims a lockfile owned by a dead PID" Plants a lockfile with PID 999999998, calls acquireLock, asserts the returned release fn is non-null and the lock now holds our PID. - "acquireLock refuses to reclaim a lockfile owned by an alive PID" Uses the test runner's own PID — alive but not the lock's intended owner. Asserts acquireLock returns null and leaves the lockfile untouched. The unrelated-process-PID-reuse safety guard. design/test/daemon.test.ts (-2 misleading, +5 new = +3 net): - Removed: "bare GET /api/progress does NOT reset meaningful activity" (smoke pretending to be behavioral — body comment admitted it couldn't verify). Replaced by the spawn-based version in daemon-discovery above. - Removed: "idleCheckTick is callable without throwing when there's no idle" (collapsed into a single smoke describe that's clearer about its scope). - Added: "POST /api/boards rejects invalid JSON body" - Added: "POST /api/boards rejects non-object body (e.g. JSON null)" - Added: "POST /api/boards: array body falls through to missing-html 400" (documents the typeof-array-is-object JS quirk; will surface if we ever tighten the type check) - Added: "POST /boards/<id>/api/reload rejects invalid JSON body" - Added: "POST /boards/<id>/api/reload rejects body missing html field" Per-file totals after: serve 16, daemon 34, discovery 23, round-trip 4 = 77. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: update CHANGELOG + TODOS for filled test gaps in v1.45.0.0 Bumps the design test count from 67 → 77 (and the new-test delta from +51 → +61) to reflect commit `6b037c55`, which filled the 5 P3 test gaps the /ship review army had filed to TODOS.md. Marks the "Tighten daemon test coverage" entry in TODOS.md as DONE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 20:45:12 -07:00
Garry Tan	03973c2fab	fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847 ) * fix(bin): pass search params via env vars (RCE fix) (#819) Replace shell string interpolation with process.env in gstack-learnings-search to prevent arbitrary code execution via crafted learnings entries. Also fixes the CROSS_PROJECT interpolation that the original PR missed. Adds 3 regression tests verifying no shell interpolation remains in the bun -e block. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): add path validation to upload command (#821) Add isPathWithin() and path traversal checks to the upload command, blocking file exfiltration via crafted upload paths. Uses existing SAFE_DIRECTORIES constant instead of a local copy. Adds 3 regression tests. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): symlink resolution in meta-commands validateOutputPath (#820) Add realpathSync to validateOutputPath in meta-commands.ts to catch symlink-based directory escapes in screenshot, pdf, and responsive commands. Resolves SAFE_DIRECTORIES through realpathSync to handle macOS /tmp -> /private/tmp symlinks. Existing path validation tests pass with the hardened implementation. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add uninstall instructions to README (#812) Community PR #812 by @0531Kim. Adds two uninstall paths: the gstack-uninstall script (handles everything) and manual removal steps for when the repo isn't cloned. Includes CLAUDE.md cleanup note and Playwright cache guidance. Co-Authored-By: 0531Kim <0531Kim@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): Windows launcher extraEnv + headed-mode token (#822) Community PR #822 by @pieterklue. Three fixes: 1. Windows launcher now merges extraEnv into spawned server env (was only passing BROWSE_STATE_FILE, dropping all other env vars) 2. Welcome page fallback serves inline HTML instead of about:blank redirect (avoids ERR_UNSAFE_REDIRECT on Windows) 3. /health returns auth token in headed mode even without Origin header (fixes Playwright Chromium extensions that don't send it) Also adds HOME/USERPROFILE fallback for cross-platform compatibility. Co-Authored-By: pieterklue <pieterklue@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): terminate orphan server when parent process exits (#808) Community PR #808 by @mmporong. Passes BROWSE_PARENT_PID to the spawned server process. The server polls every 15s with signal 0 and calls shutdown() if the parent is gone. Prevents orphaned chrome-headless-shell processes when Claude Code sessions exit abnormally. Co-Authored-By: mmporong <mmporong@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): IPv6 ULA blocking, cookie redaction, per-tab cancel, targeted token (#664) Community PR #664 by @mr-k-man (security audit round 1, new parts only). - IPv6 ULA prefix blocking (fc00::/7) in url-validation.ts with false-positive guard for hostnames like fd.example.com - Cookie value redaction for tokens, API keys, JWTs in browse cookies command - Per-tab cancel files in killAgent() replacing broken global kill-signal - design/serve.ts: realpathSync upgrade prevents symlink bypass in /api/reload - extension: targeted getToken handler replaces token-in-health-broadcast - Supabase migration 003: column-level GRANT restricts anon UPDATE scope - Telemetry sync: upsert error logging - 10 new tests for IPv6, cookie redaction, DNS rebinding, path traversal Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): CSS injection guard, timeout clamping, session validation, tests (#806) Community PR #806 by @mr-k-man (security audit round 2, new parts only). - CSS value validation (DANGEROUS_CSS) in cdp-inspector, write-commands, extension inspector - Queue file permissions (0o700/0o600) in cli, server, sidebar-agent - escapeRegExp for frame --url ReDoS fix - Responsive screenshot path validation with validateOutputPath - State load cookie filtering (reject localhost/.internal/metadata cookies) - Session ID format validation in loadSession - /health endpoint: remove currentUrl and currentMessage fields - QueueEntry interface + isValidQueueEntry validator for sidebar-agent - SIGTERM->SIGKILL escalation in timeout handler - Viewport dimension clamping (1-16384), wait timeout clamping (1s-300s) - Cookie domain validation in cookie-import and cookie-import-browser - DocumentFragment-based tab switching (XSS fix in sidepanel) - pollInProgress reentrancy guard for pollChat - toggleClass/injectCSS input validation in extension inspector - Snapshot annotated path validation with realpathSync - 714-line security-audit-r2.test.ts + 33-line learnings-injection.test.ts Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.13.0) Community security wave: 8 PRs from 4 contributors (@garagon, @mr-k-man, @mmporong, @0531Kim, @pieterklue). IPv6 ULA blocking, cookie redaction, per-tab cancel signaling, CSS injection guards, timeout clamping, session validation, DocumentFragment XSS fix, parent process watchdog, uninstall docs, Windows fixes, and 750+ lines of security regression tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: garagon <garagon@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: 0531Kim <0531Kim@users.noreply.github.com> Co-authored-by: pieterklue <pieterklue@users.noreply.github.com> Co-authored-by: mmporong <mmporong@users.noreply.github.com> Co-authored-by: mr-k-man <mr-k-man@users.noreply.github.com>	2026-04-06 00:47:04 -07:00
Garry Tan	78bc1d1968	feat: design binary — real UI mockup generation for gstack skills (v0.13.0.0) (#551 ) * docs: design tools v1 plan — visual mockup generation for gstack skills Full design doc covering the `design` binary that wraps OpenAI's GPT Image API to generate real UI mockups from gstack's design skills. Includes comparison board UX spec, auth model, 6 CEO expansions (design memory, mockup diffing, screenshot evolution, design intent verification, responsive variants, design-to-code prompt), and 9-commit implementation plan. Reviewed: /office-hours + /plan-eng-review (CLEARED) + /plan-ceo-review (EXPANSION, 6/6 accepted) + /plan-design-review (2/10 → 8/10). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: design tools prototype validation — GPT Image API works Prototype script sends 3 design briefs to OpenAI Responses API with image_generation tool. Results: dashboard (47s, 2.1MB), landing page (42s, 1.3MB), settings page (37s, 1.3MB) all produce real, implementable UI mockups with accurate text rendering and clean layouts. Key finding: Codex OAuth tokens lack image generation scopes. Direct API key (sk-proj-) required, stored in ~/.gstack/openai.json. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> feat: design binary core — generate, check, compare commands Stateless CLI (design/dist/design) wrapping OpenAI Responses API for UI mockup generation. Three working commands: - generate: brief -> PNG mockup via gpt-4o + image_generation tool - check: vision-based quality gate via GPT-4o (text readability, layout completeness, visual coherence) - compare: generates self-contained HTML comparison board with star ratings, radio Pick, per-variant feedback, regenerate controls, and Submit button that writes structured JSON for agent polling Auth reads from ~/.gstack/openai.json (0600), falls back to OPENAI_API_KEY env var. Compiled separately from browse binary (openai added to devDependencies, not runtime deps). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: design binary variants + iterate commands variants: generates N style variations with staggered parallel (1.5s between launches, exponential backoff on 429). 7 built-in style variations (bold, calm, warm, corporate, dark, playful + default). Tested: 3/3 variants in 41.6s. iterate: multi-turn design iteration using previous_response_id for conversational threading. Falls back to re-generation with accumulated feedback if threading doesn't retain visual context. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: DESIGN_SETUP + DESIGN_MOCKUP template resolvers Add generateDesignSetup() and generateDesignMockup() to the existing design.ts resolver file. Add designDir to HostPaths (claude + codex). Register DESIGN_SETUP and DESIGN_MOCKUP in the resolver index. DESIGN_SETUP: $D binary discovery (mirrors $B browse setup pattern). Falls back to DESIGN_SKETCH if binary not available. DESIGN_MOCKUP: full visual exploration workflow template — construct brief from DESIGN.md context, generate 3 variants, open comparison board in Chrome, poll for user feedback, save approved mockup to docs/designs/, generate HTML wireframe for implementation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sync package.json version with VERSION file (0.12.2.0) Pre-existing mismatch: VERSION was 0.12.2.0 but package.json was 0.12.0.0. Also adds design binary to build script and dev:design convenience command. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /office-hours visual design exploration integration Add {{DESIGN_MOCKUP}} to office-hours template before the existing {{DESIGN_SKETCH}}. When the design binary is available, /office-hours generates 3 visual mockup variants, opens a comparison board in Chrome, and polls for user feedback. Falls back to HTML wireframes if the design binary isn't built. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /plan-design-review visual mockup integration Add {{DESIGN_SETUP}} to pre-review audit and "show me what 10/10 looks like" mockup generation to the 0-10 rating method. When a design dimension rates below 7/10, the review can generate a mockup showing the improved version. Falls back to text descriptions if the design binary isn't available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: design memory — extract visual language from mockups into DESIGN.md New `$D extract` command: sends approved mockup to GPT-4o vision, extracts color palette, typography, spacing, and layout patterns, writes/updates DESIGN.md with an "Extracted Design Language" section. Progressive constraint: if DESIGN.md exists, future mockup briefs include it as style context. If no DESIGN.md, explorations run wide. readDesignConstraints() reads existing DESIGN.md for brief construction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: mockup diffing + design intent verification New commands: - $D diff --before old.png --after new.png: visual diff using GPT-4o vision. Returns differences by area with severity (high/medium/low) and a matchScore (0-100). - $D verify --mockup approved.png --screenshot live.png: compares live site screenshot against approved design mockup. Pass if matchScore >= 70 and no high-severity differences. Used by /design-review to close the design loop: design -> implement -> verify visually. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: screenshot-to-mockup evolution ($D evolve) New command: $D evolve --screenshot current.png --brief "make it calmer" Two-step process: first analyzes the screenshot via GPT-4o vision to produce a detailed description, then generates a new mockup that keeps the existing layout structure but applies the requested changes. Starts from reality, not blank canvas. Bridges the gap between /design-review critique ("the spacing is off") and a visual proposal of the fix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: responsive variants + design-to-code prompt Responsive variants: $D variants --viewports desktop,tablet,mobile generates mockups at 1536x1024, 1024x1024, and 1024x1536 (portrait) with viewport-appropriate layout instructions. Design-to-code prompt: $D prompt --image approved.png extracts colors, typography, layout, and components via GPT-4o vision, producing a structured implementation prompt. Reads DESIGN.md for additional constraint context. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.13.0.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: gstack designer as first-class tool in /plan-design-review Brand the gstack designer prominently, add Step 0.5 for proactive visual mockup generation before review passes, and update priority hierarchy. When a plan describes new UI, the skill now offers to generate mockups with $D variants, run $D check for quality gating, and present a comparison board via $B goto before any review passes begin. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: integrate mockups into review passes and outputs Thread Step 0.5 mockups through the review workflow: Pass 4 (AI Slop) evaluates generated mockups visually, Pass 7 uses mockups as evidence for unresolved decisions, post-pass offers one-shot regeneration after design changes, and Approved Mockups section records chosen variants with paths for the implementer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: gstack designer target mockups in /design-review fix loop Add $D generate for target mockups in Phase 8a.5 — before fixing a design finding, generate a mockup showing what it should look like. Add $D verify in Phase 9 to compare fix results against targets. Not plan mode — goes straight to implementation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: gstack designer AI mockups in /design-consultation Phase 5 Replace HTML preview with $D variants + comparison board when designer is available (Path A). Use $D extract to derive DESIGN.md tokens from the approved mockup. Handles both plan mode (write to plan) and non-plan mode (implement immediately). Falls back to HTML preview (Path B) when designer binary is unavailable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: make gstack designer the default in /plan-design-review, not optional The transcript showed the agent writing 5 text descriptions of homepage variants instead of generating visual mockups, even when the user explicitly asked for design tools. The skill treated mockups as optional ("Want me to generate?") when they should be the default behavior. Changes: - Rename "Your Visual Design Tool" to "YOUR PRIMARY TOOL" with aggressive language: "Don't ask permission. Show it." - Step 0.5 now generates mockups automatically when DESIGN_READY, no AskUserQuestion gatekeeping the default path - Priority hierarchy: mockups are "non-negotiable" not "if available" - Step 0D tells the user mockups are coming next - DESIGN_NOT_AVAILABLE fallback now tells user what they're missing The only valid reasons to skip mockups: no UI scope, or designer not installed. Everything else generates by default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: persist design mockups to ~/.gstack/projects/$SLUG/designs/ Mockups were going to .context/mockups/ (gitignored, workspace-local). This meant designs disappeared when switching workspaces or conversations, and downstream skills couldn't reference approved mockups from earlier reviews. Now all three design skills save to persistent project-scoped dirs: - /plan-design-review: ~/.gstack/projects/$SLUG/designs/<screen>-<date>/ - /design-consultation: ~/.gstack/projects/$SLUG/designs/design-system-<date>/ - /design-review: ~/.gstack/projects/$SLUG/designs/design-audit-<date>/ Each directory gets an approved.json recording the user's pick, feedback, and branch. This lets /design-review verify against mockups that /plan-design-review approved, and design history is browsable via ls ~/.gstack/projects/$SLUG/designs/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate codex ship skill with zsh glob guards Picked up setopt +o nomatch guards from main's v0.12.8.1 merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add browse binary discovery to DESIGN_SETUP resolver The design setup block now discovers $B alongside $D, so skills can open comparison boards via $B goto and poll feedback via $B eval. Falls back to `open` on macOS when browse binary is unavailable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: comparison board DOM polling in plan-design-review After opening the comparison board, the agent now polls #status via $B eval instead of asking a rigid AskUserQuestion. Handles submit (read structured JSON feedback), regenerate (new variants with updated brief), and $B-unavailable fallback (free-form text response). The user interacts with the real board UI, not a constrained option picker. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: comparison board feedback loop integration test 16 tests covering the full DOM polling cycle: structure verification, submit with pick/rating/comment, regenerate flows (totally different, more like this, custom text), and the agent polling pattern (empty → submitted → read JSON). Uses real generateCompareHtml() from design/src/compare.ts, served via HTTP. Runs in <1s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add $D serve command for HTTP-based comparison board feedback The comparison board feedback loop was fundamentally broken: browse blocks file:// URLs (url-validation.ts:71), so $B goto file://board.html always fails. The fallback open + $B eval polls a different browser instance. $D serve fixes this by serving the board over HTTP on localhost. The server is stateful: stays alive across regeneration rounds, exposes /api/progress for the board to poll, and accepts /api/reload from the agent to swap in new board HTML. Stdout carries feedback JSON only; stderr carries telemetry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: dual-mode feedback + post-submit lifecycle in comparison board When __GSTACK_SERVER_URL is set (injected by $D serve), the board POSTs feedback to the server instead of only writing to hidden DOM elements. After submit: disables all inputs, shows "Return to your coding agent." After regenerate: shows spinner, polls /api/progress, auto-refreshes on ready. On POST failure: shows copyable JSON fallback. On progress timeout (5 min): shows error with /design-shotgun prompt. DOM fallback preserved for headed browser mode and tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: HTTP serve command endpoints and regeneration lifecycle 11 tests covering: HTML serving with injected server URL, /api/progress state reporting, submit → done lifecycle, regenerate → regenerating state, remix with remixSpec, malformed JSON rejection, /api/reload HTML swapping, missing file validation, and full regenerate → reload → submit round-trip. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add DESIGN_SHOTGUN_LOOP resolver + fix design artifact paths Adds generateDesignShotgunLoop() resolver for the shared comparison board feedback loop (serve via HTTP, handle regenerate/remix, AskUserQuestion fallback, feedback confirmation). Registered as {{DESIGN_SHOTGUN_LOOP}}. Fixes generateDesignMockup() to use ~/.gstack/projects/$SLUG/designs/ instead of /tmp/ and docs/designs/. Replaces broken $B goto file:// + $B eval polling with $D compare --serve (HTTP-based, stdout feedback). Adds CRITICAL PATH RULE guardrail to DESIGN_SETUP: design artifacts must go to ~/.gstack/projects/$SLUG/designs/, never .context/ or /tmp/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add /design-shotgun standalone design exploration skill New skill for visual brainstorming: generate AI design variants, open a comparison board in the user's browser, collect structured feedback, and iterate. Features: session detection (revisit prior explorations), 5-dimension context gathering (who, job to be done, what exists, user flow, edge cases), taste memory (prior approved designs bias new generations), inline variant preview, configurable variant count, screenshot-to-variants via $D evolve. Uses {{DESIGN_SHOTGUN_LOOP}} resolver for the feedback loop. Saves all artifacts to ~/.gstack/projects/$SLUG/designs/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files for design-shotgun + resolver changes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add remix UI to comparison board Per-variant element selectors (Layout, Colors, Typography, Spacing) with radio buttons in a grid. Remix button collects selections into a remixSpec object and sends via the same HTTP POST feedback mechanism. Enabled only when at least one element is selected. Board shows regenerating spinner while agent generates the hybrid variant. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add $D gallery command for design history timeline Generates a self-contained HTML page showing all prior design explorations for a project: every variant (approved or not), feedback notes, organized by date (newest first). Images embedded as base64. Handles corrupted approved.json gracefully (skips, still shows the session). Empty state shows "No history yet" with /design-shotgun prompt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: gallery generation — sessions, dates, corruption, empty state 7 tests: empty dir, nonexistent dir, single session with approved variant, multiple sessions sorted newest-first, corrupted approved.json handled gracefully, session without approved.json, self-contained HTML (no external dependencies). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: replace broken file:// polling with {{DESIGN_SHOTGUN_LOOP}} plan-design-review and design-consultation templates previously used $B goto file:// + $B eval polling for the comparison board feedback loop. This was broken (browse blocks file:// URLs). Both templates now use {{DESIGN_SHOTGUN_LOOP}} which serves via HTTP, handles regeneration in the same browser tab, and falls back to AskUserQuestion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add design-shotgun touchfile entries and tier classifications design-shotgun-path (gate): verify artifacts go to ~/.gstack/, not .context/ design-shotgun-session (gate): verify repeat-run detection + AskUserQuestion design-shotgun-full (periodic): full round-trip with real design binary Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files for template refactor Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: comparison board UI improvements — option headers, pick confirmation, grid view Three changes to the design comparison board: 1. Pick confirmation: selecting "Pick" on Option A shows "We'll move forward with Option A" in green, plus a status line above the submit button repeating the choice. 2. Clear option headers: each variant now has "Option A" in bold with a subtitle above the image, instead of just the raw image. 3. View toggle: top-right Large/Grid buttons switch between single-column (default) and 3-across grid view. Also restructured the bottom section into a 2-column grid: submit/overall feedback on the left, regenerate controls on the right. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use 127.0.0.1 instead of localhost for serve URL Avoids DNS resolution issues on some systems where localhost may resolve to IPv6 ::1 while Bun listens on IPv4 only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: write ALL feedback to disk so agent can poll in background mode The agent backgrounds $D serve (Claude Code can't block on a subprocess and do other work simultaneously). With stdout-only feedback delivery, the agent never sees regenerate/remix feedback. Fix: write feedback-pending.json (regenerate/remix) and feedback.json (submit) to disk next to the board HTML. Agent polls the filesystem instead of reading stdout. Both channels (stdout + disk) are always active so foreground mode still works. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: DESIGN_SHOTGUN_LOOP uses file polling instead of stdout reading Update the template resolver to instruct the agent to background $D serve and poll for feedback-pending.json / feedback.json on a 5-second loop. This matches the real-world pattern where Claude Code / Conductor agents can't block on subprocess stdout. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files for file-polling feedback loop Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: null-safe DOM selectors for post-submit and regenerating states The user's layout restructure renamed .regenerate-bar → .regen-column, .submit-bar → .submit-column, and .overall-section → .bottom-section. The JS still referenced the old class names, causing querySelector to return null and showPostSubmitState() / showRegeneratingState() to silently crash. This meant Submit and Regenerate buttons appeared to work (DOM elements updated, HTTP POST succeeded) but the visual feedback (disabled inputs, spinner, success message) never appeared. Fix: use fallback selectors that check both old and new class names, with null guards so a missing element doesn't crash the function. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: end-to-end feedback roundtrip — browser click to file on disk The test that proves "changes on the website propagate to Claude Code." Opens the comparison board in a real headless browser with __GSTACK_SERVER_URL injected, simulates user clicks (Submit, Regenerate, More Like This), and verifies that feedback.json / feedback-pending.json land on disk with the correct structured data. 6 tests covering: submit → feedback.json, post-submit UI lockdown, regenerate → feedback-pending.json, more-like-this → feedback-pending.json, regenerate spinner display, and full regen → reload → submit round-trip. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: comprehensive design doc for Design Shotgun feedback loop Documents the full browser-to-agent feedback architecture: state machine, file-based polling, port discovery, post-submit lifecycle, and every known edge case (zombie forms, dead servers, stale spinners, file:// bug, double-click races, port coordination, sequential generate rule). Includes ASCII diagrams of the data flow and state transitions, complete step-by-step walkthrough of happy path and regeneration path, test coverage map with gaps, and short/medium/long-term improvement ideas. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: plan-design-review agent guardrails for feedback loop Four fixes to prevent agents from reinventing the feedback loop badly: 1. Sequential generate rule: explicit instruction that $D generate calls must run one at a time (API rate-limits concurrent image generation). 2. No-AskUserQuestion-for-feedback rule: agent reads feedback.json instead of re-asking what the user picked. 3. Remove file:// references: $B goto file:// was always rejected by url-validation.ts. The --serve flag handles everything. 4. Remove $B eval polling reference: no longer needed with HTTP POST. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: design-shotgun Step 3 progressive reveal, silent failure detection, timing estimate Three production UX bugs fixed: 1. Dead air — now shows timing estimate before generation starts 2. Silent variant drop — replaced $D variants batch with individual $D generate calls, each verified for existence and non-zero size with retry 3. No progressive reveal — each variant shown inline via Read tool immediately after generation (~60s increments instead of all at ~180s) Also: /tmp/ then cp as default output pattern (sandbox workaround), screenshot taken once for evolve path (not per-variant). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: parallel design-shotgun with concept-first confirmation Step 3 rewritten to concept-first + parallel Agent architecture: - 3a: generate text concepts (free, instant) - 3b: AskUserQuestion to confirm/modify before spending API credits - 3c: launch N Agent subagents in parallel (~60s total regardless of count) - 3d: show all results, dynamic image list for comparison board Adds Agent to allowed-tools. Softens plan-design-review sequential warning to note design-shotgun uses parallel at Tier 2+. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.13.0.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: untrack .agents/skills/ — generated at setup, already gitignored These files were committed despite .agents/ being in .gitignore. They regenerate from ./setup --host codex on any machine. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate design-shotgun SKILL.md for v0.12.12.0 preamble changes Merge from main brought updated preamble resolver (conditional telemetry, local JSONL logging) but design-shotgun/SKILL.md wasn't regenerated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 20:32:59 -06:00

Garry Tan

cf50443b63

v1.45.0.0 feat(design): persistent board daemon — 24h boards, one tab, board history (#1710 )

* refactor(design): board JS uses relative paths; drop __GSTACK_SERVER_URL injection

Board JS in design/src/compare.ts now calls ./api/feedback and ./api/progress
(relative to location.pathname) and feature-detects server mode via
location.protocol instead of the injected window.__GSTACK_SERVER_URL global.
The injection in design/src/serve.ts is removed (dead code now that nothing
reads it). Tests updated to match the new contract: serve.test.ts asserts
the relative-path JS is present and the global is gone; feedback-roundtrip
asserts location.protocol detects HTTP mode.

Why: prep for the multi-board daemon (design/src/daemon.ts upcoming) where
the same generated HTML is served at /boards/<id>/ instead of /. Relative
paths resolve against location.pathname in both cases, so one HTML, two
hosts. The injection was the only thing tying board JS to a specific
serving path; removing it unblocks the daemon work without forking the
generator.

file:// fallback preserved via the location.protocol feature-detect — board
opened directly as a file still falls through to the DOM-only success path.

The 6 feedback-roundtrip browser tests continue to fail with
session.clearLoadedHtml undefined; that failure pre-exists this branch
(verified against HEAD with these edits stashed) and lives in
browse/src/write-commands.ts, not in the design code path. Tracking
separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(design): reload guard rejects directory paths

design/src/serve.ts:200-212 used to accept a path that resolved to the
allowedDir itself (the OR branch `|| resolvedReload === allowedDir`),
which then crashed readFileSync with EISDIR. Now:

  1. startsWith(allowedDir + path.sep) must pass — rejects the dir itself
     and anything outside (403).
  2. statSync(resolvedReload).isFile() must pass — rejects subdirectories
     inside allowedDir with a clear "Path must be a file" 400.

The test stub in serve.test.ts mirrors prod; both updated, plus two new
test cases for the previously-broken paths. Codex caught this in the
plan-review pass; it's a latent bug in shipping code, not a regression
from the daemon work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(design): introduce design daemon — multi-board persistent server

Adds design/src/daemon.ts: a Bun.serve daemon that hosts many boards
under /boards/<id>/ instead of one server per `$D compare --serve` call.
Spawned by daemon-client (next commit); for now wired only via tests.

Endpoint table:
  GET  /health                       liveness + version + counts (unauth)
  GET  /                             index of recent boards
  POST /api/boards                   publish; daemon derives sourceDir
                                     from realpath(html). body sourceDir
                                     IGNORED (Codex trust-boundary fix).
  POST /shutdown                     graceful; refuses if active boards
                                     exist (Codex data-loss fix)
  GET  /boards/<id>                  301 → /boards/<id>/ (trailing slash
                                     is load-bearing — relative URLs in
                                     board JS resolve against pathname)
  GET  /boards/<id>/                 render board HTML
  GET  /boards/<id>/api/progress     state machine status (no idle reset)
  POST /boards/<id>/api/feedback     submit/regen; writes feedback.json
                                     or feedback-pending.json with
                                     boardId + publishedAt augmented in
  POST /boards/<id>/api/reload       swap HTML; per-board allowedDir
                                     guard rejects traversal, directories,
                                     out-of-allowed-dir symlinks

Lifecycle:
- 24h idle timeout (DESIGN_DAEMON_IDLE_MS for tests).
- Idle with active boards extends 1h up to 4x, then force-shuts (Codex).
- LRU cap 50 boards; evicts done before non-done; 503 when 50 non-done.
- Per-board async mutex serializes feedback POST vs reload POST.
- SIGTERM/SIGINT/uncaughtException → graceful shutdown, state file unlink.
- Stdout: DAEMON_STARTED port=<N> (the line the client parses).

Shared utilities live in design/src/daemon-state.ts: atomic state-file
write/read (mode 0o600), fs.openSync('wx') lock, isProcessAlive, cmdline
identity verification (/proc on Linux, ps on macOS), CMDLINE_MARKER
constant. Modeled on browse/src/cli.ts lock + spawn patterns.

design/test/daemon.test.ts: 30 tests, all green. Covers every endpoint,
both error paths and happy paths, cross-board feedback isolation, the
trailing-slash redirect, the directory-not-file reload rejection, LRU
preferring done over non-done, /shutdown refusal with active boards,
all path-traversal guards. Uses the exported fetchHandler in-process
(no spawn) so the suite runs in ~70ms.

design/test/daemon-tests-fixtures.ts: shared helpers — req() builder,
tmp-dir helpers, daemon reset, and a spawnDaemonForTest() helper used
by the next commit's discovery tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(design): daemon-client with lock + identity-verified spawn

design/src/daemon-client.ts implements the CLI side of the daemon lifecycle:
ensureDaemon() (the spawn-or-attach decision), publishBoard(), and the
$D daemon stop|status helpers.

Modeled on browse/src/cli.ts:317-415 — same health-check-first attach,
same fs.openSync('wx') lock, same re-read-state-INSIDE-the-lock guard
against two CLIs both deciding "no daemon, spawn." Two design-specific
safety properties added beyond browse:

1. verifyIdentity before any SIGTERM/SIGKILL. Reads the running process's
   cmdline (/proc/PID/cmdline on Linux, `ps -p PID -o command=` on macOS)
   and only signals if it contains CMDLINE_MARKER ("gstack-design-daemon",
   passed as argv at spawn time). Prevents a stale state file from
   causing us to kill an unrelated process that inherited the PID.

2. Refuse-kill-with-active-boards on version mismatch. Browse silently
   restarts; here in-memory board history would vanish, so the client
   prints a user-actionable WARNING and exit 1 instead. Users explicitly
   `$D daemon stop` to override.

Spawn uses Node child_process.spawn (NOT Bun.spawn().unref) because of
the macOS session-detach quirks browse already discovered. Stdio is
redirected to ~/.gstack/design-daemon-startup.log, which the client
tails into stderr if waitForHealthOrError times out — no more silent
"daemon failed for some unknowable reason."

daemon-state.ts gains DESIGN_DAEMON_STATE_FILE env override so tests
can point both client and spawned daemon at a per-test path without a
shared cwd.

design/test/daemon-discovery.test.ts: 17 tests, all green in ~8s. Covers:
spawn-fresh, attach-existing, stale-state-file (pid dead), PID-reuse
safety (uses the test runner's own PID as the bait — verifyIdentity
catches the cmdline mismatch, daemon not signaled), version-mismatch
with/without active boards (the active-boards case runs a subprocess
and asserts exit 1 + WARNING in stderr), publishBoard 200 + 409,
shutdownDaemon refuse/force/unresponsive paths, daemonStatus.

The daemon-discovery suite is split out of daemon.test.ts because each
real spawn costs ~200ms; the in-process daemon.test.ts (30 tests, 70ms)
covers the same handler logic without the spawn overhead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(design): wire daemon dispatch into CLI; add daemon stop/status

design/src/cli.ts now branches on --no-daemon for both `compare --serve`
and standalone `serve --html`. Default path: ensureDaemon → publishBoard
→ openBrowser → exit. The legacy single-process serve() is preserved
behind --no-daemon for tests, Windows, and explicit debugging.

Adds $D daemon status (prints daemon state JSON, or {running:false})
and $D daemon stop [--force] (refuses with active boards unless --force).

parseArgs gains a `positionals` field so daemon sub-commands work
naturally (`$D daemon stop` instead of `$D --action stop`).

Stderr lines printed by the publishToDaemon path:
  DAEMON_STARTED port=N   (or DAEMON_ATTACHED port=N)
  BOARD_PUBLISHED: <url>
  BOARD_URL: <url>        (alias for grep-friendliness)

Stdout: JSON with id, url, sourceDir.

design/src/commands.ts: --no-daemon, --title added to compare + serve;
new daemon command entry with status|stop sub-commands.

End-to-end smoke (manual): spawning a board via $D serve, hitting the
returned URL, reading /health, calling daemon status (returns the
right JSON), and daemon stop refusing because of the active board —
all work as designed. Force-stop tears down cleanly and removes the
state file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(design): end-to-end daemon round-trip via HTTP fetch

design/test/feedback-roundtrip-daemon.test.ts walks the full publish →
submit / regenerate / reload cycle against a real spawned daemon, using
the same HTTP calls the board JS makes. Four tests, all green in ~650ms.

Covers what design-shotgun and friends actually depend on:
  - Submit writes feedback.json into the board's sourceDir with the
    augmented boardId + publishedAt fields.
  - GET /boards/<id> (no slash) returns a 301 to /boards/<id>/ — the
    load-bearing redirect that lets the board JS use relative paths.
  - Regenerate writes feedback-pending.json, flips state to regenerating,
    /api/progress reflects it; /api/reload swaps HTML in place; round-2
    submit writes the final feedback.json with the round-2 selection.
  - Two boards published into the same daemon get independent URLs on
    the same port — feedback for board A doesn't contaminate board B's
    sourceDir, both URLs serve their own content, the index lists both.

Uses HTTP fetch rather than a real browser because the existing browser
round-trip (feedback-roundtrip.test.ts) is broken on a pre-existing
browse harness regression (session.clearLoadedHtml undefined in
browse/src/write-commands.ts:149) that's unrelated to this branch.
The HTTP path proves the same daemon semantics; a browser variant can
be added once the browse harness is fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(design): compiled binary self-execs as daemon; unified version lookup

Two small but production-critical fixes once the binary actually runs:

1. Compiled binary couldn't spawn the daemon. daemon-client previously
   pointed at design/src/daemon.ts via import.meta.dir — fine in dev,
   fatal in production (the source path doesn't exist on a user's
   machine). Fix: design CLI now self-execs in --daemon-mode when
   invoked with that flag, so the spawn is `process.execPath
   --daemon-mode --marker gstack-design-daemon` for the compiled binary
   and `bun run cli.ts --daemon-mode ...` in dev. Same one binary, two
   modes, no separate daemon entrypoint to ship.

2. Client and daemon disagreed on VERSION in the compiled binary.
   Both used a source-tree-relative path that resolves to "unknown"
   at runtime, which silently shorted the version-mismatch refusal
   path (client expected "unknown" + daemon reported "unknown" → match
   → no refusal even when DESIGN_DAEMON_VERSION was set on one side).
   New readVersionString() consults DESIGN_DAEMON_VERSION env first,
   then design/dist/.version (sidecar baked at build time by build.sh),
   then VERSION at the source-tree root. Both client and daemon now go
   through this one helper.

Manual smoke (compiled binary, all checks green):
  - DAEMON_STARTED + BOARD_PUBLISHED with trailing slash
  - GET /boards/<id> (no slash) → 301 Location /boards/<id>/
  - Second `$D serve` invocation → DAEMON_ATTACHED, new board on same port
  - feedback.json gets boardId + publishedAt fields
  - DESIGN_DAEMON_VERSION=v2-different on second invocation with
    active board → WARNING + "Refusing to auto-kill" + exit 1,
    original daemon still alive
  - `$D daemon stop --force` removes state file

All 67 design tests still green after the refactor (16 serve + 30
daemon + 17 discovery + 4 daemon round-trip).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(design): skill resolvers learn the daemon's BOARD_URL output

The five skills that invoke $D compare --serve (design-shotgun,
design-consultation, plan-design-review, office-hours, design-review)
parsed `SERVE_STARTED: port=N` from stderr and then POSTed to
`/api/reload` at that port during regenerate cycles. The new daemon
hosts boards under `/boards/<id>/` so the reload endpoint moved to
`<BOARD_URL>api/reload` — without this update, the regenerate phase
of every skill invocation would silently fail against daemon mode.

Updated scripts/resolvers/design.ts to parse `BOARD_URL:` instead of
the port, and to POST reloads against the per-board URL. Regenerated
the four SKILL.md files via bun run gen:skill-docs.

Legacy `--no-daemon` invocations continue to emit `SERVE_STARTED:` and
serve at `/api/reload` — the resolver instructions note both.

Surfaced by the maintainability specialist during /ship review (the
"stale comment" finding was actually a behavior bug pointing at five
downstream consumers). Codex's plan-review pass flagged the migration
story as incomplete but I dismissed the concern — Codex was right.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(design): emit SERVE_STARTED back-compat alias; drop dead import

design/src/cli.ts publishToDaemon now emits `SERVE_STARTED: port=N html=<path>`
as a third stderr line alongside DAEMON_STARTED/DAEMON_ATTACHED + BOARD_URL.
Any out-of-tree script that grepped the legacy line still gets the port —
they'd still fail at the reload step (the endpoint moved to /boards/<id>/
api/reload) but they no longer fail at the port-detection step. Combined with
the resolver updates one commit back, this is belt-and-suspenders compat.

Fixed the stale docstring at cli.ts:316 that claimed back-compat without
actually emitting the alias. The maintainability specialist flagged it.

Dropped a dead `DaemonState` import from daemon-client.ts. Same review pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.45.0.0)

Design boards now live 24h, not 10 minutes. One daemon hosts every
board, one tab survives the whole day. See CHANGELOG.md for the full
release summary + metrics + itemized changes.

TODOS.md gains a "design daemon: follow-ups" section capturing the
P3 test gaps + maintainability nits the /ship review army flagged
but that aren't blocking for this release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(design): fill daemon test gaps surfaced by ship review army

Adds 10 net new tests (and removes 1 misleading smoke) for the gaps the
testing specialist flagged at /ship time. Filed as P3 TODOs at ship,
filling now per boil-the-lake.

design/test/daemon-discovery.test.ts (+6 tests, +1 import):
  - "idle daemon (no boards) shuts itself down after IDLE_MS + CHECK_MS"
    Spawn-based, DESIGN_DAEMON_IDLE_MS=2000, CHECK_MS=200. Waits for the
    daemon process to actually exit and asserts the state file is removed.
    Previously only "callable without throwing" was tested.
  - "bare GET polling does NOT prevent idle shutdown"
    Hammers /api/progress every 200ms in a background loop with a done
    board, asserts the daemon still idles out — proves the
    meaningful-activity-only-on-POSTs guard (Codex finding) actually works.
  - "idle with active (non-done) boards triggers extension instead of shutdown"
    Sets DESIGN_DAEMON_EXTENSION_MS=1500 + MAX_EXTENSIONS=2, publishes a
    non-done board, asserts the daemon survives past IDLE_MS (extends),
    then verifies the MAX_EXTENSIONS hard ceiling force-shuts. Both the
    extension counter and the hard ceiling were previously untested.
  - "two parallel ensureDaemon() calls converge on one daemon"
    Fires two ensureDaemon calls in Promise.all against an empty stateFile,
    asserts: both ports match, exactly one spawned=true, exactly one daemon
    alive, no orphaned lock file. The discovery-test file's own docstring
    claimed this test existed; now it actually does.
  - "acquireLock reclaims a lockfile owned by a dead PID"
    Plants a lockfile with PID 999999998, calls acquireLock, asserts the
    returned release fn is non-null and the lock now holds our PID.
  - "acquireLock refuses to reclaim a lockfile owned by an alive PID"
    Uses the test runner's own PID — alive but not the lock's intended
    owner. Asserts acquireLock returns null and leaves the lockfile
    untouched. The unrelated-process-PID-reuse safety guard.

design/test/daemon.test.ts (-2 misleading, +5 new = +3 net):
  - Removed: "bare GET /api/progress does NOT reset meaningful activity"
    (smoke pretending to be behavioral — body comment admitted it couldn't
    verify). Replaced by the spawn-based version in daemon-discovery above.
  - Removed: "idleCheckTick is callable without throwing when there's no idle"
    (collapsed into a single smoke describe that's clearer about its scope).
  - Added: "POST /api/boards rejects invalid JSON body"
  - Added: "POST /api/boards rejects non-object body (e.g. JSON null)"
  - Added: "POST /api/boards: array body falls through to missing-html 400"
    (documents the typeof-array-is-object JS quirk; will surface if we
    ever tighten the type check)
  - Added: "POST /boards/<id>/api/reload rejects invalid JSON body"
  - Added: "POST /boards/<id>/api/reload rejects body missing html field"

Per-file totals after: serve 16, daemon 34, discovery 23, round-trip 4 = 77.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update CHANGELOG + TODOS for filled test gaps in v1.45.0.0

Bumps the design test count from 67 → 77 (and the new-test delta from
+51 → +61) to reflect commit 6b037c55, which filled the 5 P3 test gaps
the /ship review army had filed to TODOS.md.

Marks the "Tighten daemon test coverage" entry in TODOS.md as DONE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-25 20:45:12 -07:00

Garry Tan

03973c2fab

fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847 )

* fix(bin): pass search params via env vars (RCE fix) (#819)

Replace shell string interpolation with process.env in gstack-learnings-search
to prevent arbitrary code execution via crafted learnings entries. Also fixes
the CROSS_PROJECT interpolation that the original PR missed.

Adds 3 regression tests verifying no shell interpolation remains in the bun -e block.

Co-authored-by: garagon <garagon@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): add path validation to upload command (#821)

Add isPathWithin() and path traversal checks to the upload command,
blocking file exfiltration via crafted upload paths. Uses existing
SAFE_DIRECTORIES constant instead of a local copy. Adds 3 regression tests.

Co-authored-by: garagon <garagon@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): symlink resolution in meta-commands validateOutputPath (#820)

Add realpathSync to validateOutputPath in meta-commands.ts to catch
symlink-based directory escapes in screenshot, pdf, and responsive
commands. Resolves SAFE_DIRECTORIES through realpathSync to handle
macOS /tmp -> /private/tmp symlinks. Existing path validation tests
pass with the hardened implementation.

Co-authored-by: garagon <garagon@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add uninstall instructions to README (#812)

Community PR #812 by @0531Kim. Adds two uninstall paths: the gstack-uninstall
script (handles everything) and manual removal steps for when the repo isn't
cloned. Includes CLAUDE.md cleanup note and Playwright cache guidance.

Co-Authored-By: 0531Kim <0531Kim@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): Windows launcher extraEnv + headed-mode token (#822)

Community PR #822 by @pieterklue. Three fixes:
1. Windows launcher now merges extraEnv into spawned server env (was
   only passing BROWSE_STATE_FILE, dropping all other env vars)
2. Welcome page fallback serves inline HTML instead of about:blank
   redirect (avoids ERR_UNSAFE_REDIRECT on Windows)
3. /health returns auth token in headed mode even without Origin header
   (fixes Playwright Chromium extensions that don't send it)

Also adds HOME/USERPROFILE fallback for cross-platform compatibility.

Co-Authored-By: pieterklue <pieterklue@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): terminate orphan server when parent process exits (#808)

Community PR #808 by @mmporong. Passes BROWSE_PARENT_PID to the spawned
server process. The server polls every 15s with signal 0 and calls
shutdown() if the parent is gone. Prevents orphaned chrome-headless-shell
processes when Claude Code sessions exit abnormally.

Co-Authored-By: mmporong <mmporong@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(security): IPv6 ULA blocking, cookie redaction, per-tab cancel, targeted token (#664)

Community PR #664 by @mr-k-man (security audit round 1, new parts only).

- IPv6 ULA prefix blocking (fc00::/7) in url-validation.ts with false-positive
  guard for hostnames like fd.example.com
- Cookie value redaction for tokens, API keys, JWTs in browse cookies command
- Per-tab cancel files in killAgent() replacing broken global kill-signal
- design/serve.ts: realpathSync upgrade prevents symlink bypass in /api/reload
- extension: targeted getToken handler replaces token-in-health-broadcast
- Supabase migration 003: column-level GRANT restricts anon UPDATE scope
- Telemetry sync: upsert error logging
- 10 new tests for IPv6, cookie redaction, DNS rebinding, path traversal

Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(security): CSS injection guard, timeout clamping, session validation, tests (#806)

Community PR #806 by @mr-k-man (security audit round 2, new parts only).

- CSS value validation (DANGEROUS_CSS) in cdp-inspector, write-commands, extension inspector
- Queue file permissions (0o700/0o600) in cli, server, sidebar-agent
- escapeRegExp for frame --url ReDoS fix
- Responsive screenshot path validation with validateOutputPath
- State load cookie filtering (reject localhost/.internal/metadata cookies)
- Session ID format validation in loadSession
- /health endpoint: remove currentUrl and currentMessage fields
- QueueEntry interface + isValidQueueEntry validator for sidebar-agent
- SIGTERM->SIGKILL escalation in timeout handler
- Viewport dimension clamping (1-16384), wait timeout clamping (1s-300s)
- Cookie domain validation in cookie-import and cookie-import-browser
- DocumentFragment-based tab switching (XSS fix in sidepanel)
- pollInProgress reentrancy guard for pollChat
- toggleClass/injectCSS input validation in extension inspector
- Snapshot annotated path validation with realpathSync
- 714-line security-audit-r2.test.ts + 33-line learnings-injection.test.ts

Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.15.13.0)

Community security wave: 8 PRs from 4 contributors (@garagon, @mr-k-man,
@mmporong, @0531Kim, @pieterklue). IPv6 ULA blocking, cookie redaction,
per-tab cancel signaling, CSS injection guards, timeout clamping, session
validation, DocumentFragment XSS fix, parent process watchdog, uninstall
docs, Windows fixes, and 750+ lines of security regression tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: garagon <garagon@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: 0531Kim <0531Kim@users.noreply.github.com>
Co-authored-by: pieterklue <pieterklue@users.noreply.github.com>
Co-authored-by: mmporong <mmporong@users.noreply.github.com>
Co-authored-by: mr-k-man <mr-k-man@users.noreply.github.com>

2026-04-06 00:47:04 -07:00

Garry Tan

78bc1d1968

feat: design binary — real UI mockup generation for gstack skills (v0.13.0.0) (#551 )

* docs: design tools v1 plan — visual mockup generation for gstack skills

Full design doc covering the `design` binary that wraps OpenAI's GPT Image API
to generate real UI mockups from gstack's design skills. Includes comparison
board UX spec, auth model, 6 CEO expansions (design memory, mockup diffing,
screenshot evolution, design intent verification, responsive variants,
design-to-code prompt), and 9-commit implementation plan.

Reviewed: /office-hours + /plan-eng-review (CLEARED) + /plan-ceo-review
(EXPANSION, 6/6 accepted) + /plan-design-review (2/10 → 8/10).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: design tools prototype validation — GPT Image API works

Prototype script sends 3 design briefs to OpenAI Responses API with
image_generation tool. Results: dashboard (47s, 2.1MB), landing page
(42s, 1.3MB), settings page (37s, 1.3MB) all produce real, implementable
UI mockups with accurate text rendering and clean layouts.

Key finding: Codex OAuth tokens lack image generation scopes. Direct
API key (sk-proj-*) required, stored in ~/.gstack/openai.json.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: design binary core — generate, check, compare commands

Stateless CLI (design/dist/design) wrapping OpenAI Responses API for
UI mockup generation. Three working commands:

- generate: brief -> PNG mockup via gpt-4o + image_generation tool
- check: vision-based quality gate via GPT-4o (text readability, layout
  completeness, visual coherence)
- compare: generates self-contained HTML comparison board with star
  ratings, radio Pick, per-variant feedback, regenerate controls,
  and Submit button that writes structured JSON for agent polling

Auth reads from ~/.gstack/openai.json (0600), falls back to
OPENAI_API_KEY env var. Compiled separately from browse binary
(openai added to devDependencies, not runtime deps).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: design binary variants + iterate commands

variants: generates N style variations with staggered parallel (1.5s
between launches, exponential backoff on 429). 7 built-in style
variations (bold, calm, warm, corporate, dark, playful + default).
Tested: 3/3 variants in 41.6s.

iterate: multi-turn design iteration using previous_response_id for
conversational threading. Falls back to re-generation with accumulated
feedback if threading doesn't retain visual context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: DESIGN_SETUP + DESIGN_MOCKUP template resolvers

Add generateDesignSetup() and generateDesignMockup() to the existing
design.ts resolver file. Add designDir to HostPaths (claude + codex).
Register DESIGN_SETUP and DESIGN_MOCKUP in the resolver index.

DESIGN_SETUP: $D binary discovery (mirrors $B browse setup pattern).
Falls back to DESIGN_SKETCH if binary not available.

DESIGN_MOCKUP: full visual exploration workflow template — construct
brief from DESIGN.md context, generate 3 variants, open comparison
board in Chrome, poll for user feedback, save approved mockup to
docs/designs/, generate HTML wireframe for implementation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sync package.json version with VERSION file (0.12.2.0)

Pre-existing mismatch: VERSION was 0.12.2.0 but package.json was
0.12.0.0. Also adds design binary to build script and dev:design
convenience command.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /office-hours visual design exploration integration

Add {{DESIGN_MOCKUP}} to office-hours template before the existing
{{DESIGN_SKETCH}}. When the design binary is available, /office-hours
generates 3 visual mockup variants, opens a comparison board in Chrome,
and polls for user feedback. Falls back to HTML wireframes if the
design binary isn't built.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /plan-design-review visual mockup integration

Add {{DESIGN_SETUP}} to pre-review audit and "show me what 10/10
looks like" mockup generation to the 0-10 rating method. When a
design dimension rates below 7/10, the review can generate a mockup
showing the improved version. Falls back to text descriptions if
the design binary isn't available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: design memory — extract visual language from mockups into DESIGN.md

New `$D extract` command: sends approved mockup to GPT-4o vision,
extracts color palette, typography, spacing, and layout patterns,
writes/updates DESIGN.md with an "Extracted Design Language" section.

Progressive constraint: if DESIGN.md exists, future mockup briefs
include it as style context. If no DESIGN.md, explorations run wide.
readDesignConstraints() reads existing DESIGN.md for brief construction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: mockup diffing + design intent verification

New commands:
- $D diff --before old.png --after new.png: visual diff using GPT-4o
  vision. Returns differences by area with severity (high/medium/low)
  and a matchScore (0-100).
- $D verify --mockup approved.png --screenshot live.png: compares live
  site screenshot against approved design mockup. Pass if matchScore
  >= 70 and no high-severity differences.

Used by /design-review to close the design loop: design -> implement ->
verify visually.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: screenshot-to-mockup evolution ($D evolve)

New command: $D evolve --screenshot current.png --brief "make it calmer"

Two-step process: first analyzes the screenshot via GPT-4o vision to
produce a detailed description, then generates a new mockup that keeps
the existing layout structure but applies the requested changes. Starts
from reality, not blank canvas.

Bridges the gap between /design-review critique ("the spacing is off")
and a visual proposal of the fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: responsive variants + design-to-code prompt

Responsive variants: $D variants --viewports desktop,tablet,mobile
generates mockups at 1536x1024, 1024x1024, and 1024x1536 (portrait)
with viewport-appropriate layout instructions.

Design-to-code prompt: $D prompt --image approved.png extracts colors,
typography, layout, and components via GPT-4o vision, producing a
structured implementation prompt. Reads DESIGN.md for additional
constraint context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.0.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: gstack designer as first-class tool in /plan-design-review

Brand the gstack designer prominently, add Step 0.5 for proactive visual
mockup generation before review passes, and update priority hierarchy.
When a plan describes new UI, the skill now offers to generate mockups
with $D variants, run $D check for quality gating, and present a
comparison board via $B goto before any review passes begin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: integrate mockups into review passes and outputs

Thread Step 0.5 mockups through the review workflow: Pass 4 (AI Slop)
evaluates generated mockups visually, Pass 7 uses mockups as evidence
for unresolved decisions, post-pass offers one-shot regeneration after
design changes, and Approved Mockups section records chosen variants
with paths for the implementer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: gstack designer target mockups in /design-review fix loop

Add $D generate for target mockups in Phase 8a.5 — before fixing a
design finding, generate a mockup showing what it should look like.
Add $D verify in Phase 9 to compare fix results against targets.
Not plan mode — goes straight to implementation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: gstack designer AI mockups in /design-consultation Phase 5

Replace HTML preview with $D variants + comparison board when designer
is available (Path A). Use $D extract to derive DESIGN.md tokens from
the approved mockup. Handles both plan mode (write to plan) and
non-plan mode (implement immediately). Falls back to HTML preview
(Path B) when designer binary is unavailable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: make gstack designer the default in /plan-design-review, not optional

The transcript showed the agent writing 5 text descriptions of homepage
variants instead of generating visual mockups, even when the user explicitly
asked for design tools. The skill treated mockups as optional ("Want me to
generate?") when they should be the default behavior.

Changes:
- Rename "Your Visual Design Tool" to "YOUR PRIMARY TOOL" with aggressive
  language: "Don't ask permission. Show it."
- Step 0.5 now generates mockups automatically when DESIGN_READY, no
  AskUserQuestion gatekeeping the default path
- Priority hierarchy: mockups are "non-negotiable" not "if available"
- Step 0D tells the user mockups are coming next
- DESIGN_NOT_AVAILABLE fallback now tells user what they're missing

The only valid reasons to skip mockups: no UI scope, or designer not
installed. Everything else generates by default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: persist design mockups to ~/.gstack/projects/$SLUG/designs/

Mockups were going to .context/mockups/ (gitignored, workspace-local).
This meant designs disappeared when switching workspaces or conversations,
and downstream skills couldn't reference approved mockups from earlier
reviews.

Now all three design skills save to persistent project-scoped dirs:
- /plan-design-review: ~/.gstack/projects/$SLUG/designs/<screen>-<date>/
- /design-consultation: ~/.gstack/projects/$SLUG/designs/design-system-<date>/
- /design-review: ~/.gstack/projects/$SLUG/designs/design-audit-<date>/

Each directory gets an approved.json recording the user's pick, feedback,
and branch. This lets /design-review verify against mockups that
/plan-design-review approved, and design history is browsable via
ls ~/.gstack/projects/$SLUG/designs/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate codex ship skill with zsh glob guards

Picked up setopt +o nomatch guards from main's v0.12.8.1 merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add browse binary discovery to DESIGN_SETUP resolver

The design setup block now discovers $B alongside $D, so skills can
open comparison boards via $B goto and poll feedback via $B eval.
Falls back to `open` on macOS when browse binary is unavailable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: comparison board DOM polling in plan-design-review

After opening the comparison board, the agent now polls
#status via $B eval instead of asking a rigid AskUserQuestion.
Handles submit (read structured JSON feedback), regenerate
(new variants with updated brief), and $B-unavailable fallback
(free-form text response). The user interacts with the real
board UI, not a constrained option picker.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: comparison board feedback loop integration test

16 tests covering the full DOM polling cycle: structure verification,
submit with pick/rating/comment, regenerate flows (totally different,
more like this, custom text), and the agent polling pattern
(empty → submitted → read JSON). Uses real generateCompareHtml()
from design/src/compare.ts, served via HTTP. Runs in <1s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add $D serve command for HTTP-based comparison board feedback

The comparison board feedback loop was fundamentally broken: browse blocks
file:// URLs (url-validation.ts:71), so $B goto file://board.html always
fails. The fallback open + $B eval polls a different browser instance.

$D serve fixes this by serving the board over HTTP on localhost. The server
is stateful: stays alive across regeneration rounds, exposes /api/progress
for the board to poll, and accepts /api/reload from the agent to swap in
new board HTML. Stdout carries feedback JSON only; stderr carries telemetry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: dual-mode feedback + post-submit lifecycle in comparison board

When __GSTACK_SERVER_URL is set (injected by $D serve), the board POSTs
feedback to the server instead of only writing to hidden DOM elements.
After submit: disables all inputs, shows "Return to your coding agent."
After regenerate: shows spinner, polls /api/progress, auto-refreshes on
ready. On POST failure: shows copyable JSON fallback. On progress timeout
(5 min): shows error with /design-shotgun prompt. DOM fallback preserved
for headed browser mode and tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: HTTP serve command endpoints and regeneration lifecycle

11 tests covering: HTML serving with injected server URL, /api/progress
state reporting, submit → done lifecycle, regenerate → regenerating state,
remix with remixSpec, malformed JSON rejection, /api/reload HTML swapping,
missing file validation, and full regenerate → reload → submit round-trip.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add DESIGN_SHOTGUN_LOOP resolver + fix design artifact paths

Adds generateDesignShotgunLoop() resolver for the shared comparison board
feedback loop (serve via HTTP, handle regenerate/remix, AskUserQuestion
fallback, feedback confirmation). Registered as {{DESIGN_SHOTGUN_LOOP}}.

Fixes generateDesignMockup() to use ~/.gstack/projects/$SLUG/designs/
instead of /tmp/ and docs/designs/. Replaces broken $B goto file:// +
$B eval polling with $D compare --serve (HTTP-based, stdout feedback).

Adds CRITICAL PATH RULE guardrail to DESIGN_SETUP: design artifacts must
go to ~/.gstack/projects/$SLUG/designs/, never .context/ or /tmp/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add /design-shotgun standalone design exploration skill

New skill for visual brainstorming: generate AI design variants, open a
comparison board in the user's browser, collect structured feedback, and
iterate. Features: session detection (revisit prior explorations), 5-dimension
context gathering (who, job to be done, what exists, user flow, edge cases),
taste memory (prior approved designs bias new generations), inline variant
preview, configurable variant count, screenshot-to-variants via $D evolve.

Uses {{DESIGN_SHOTGUN_LOOP}} resolver for the feedback loop. Saves all
artifacts to ~/.gstack/projects/$SLUG/designs/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files for design-shotgun + resolver changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add remix UI to comparison board

Per-variant element selectors (Layout, Colors, Typography, Spacing) with
radio buttons in a grid. Remix button collects selections into a remixSpec
object and sends via the same HTTP POST feedback mechanism. Enabled only
when at least one element is selected. Board shows regenerating spinner
while agent generates the hybrid variant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add $D gallery command for design history timeline

Generates a self-contained HTML page showing all prior design explorations
for a project: every variant (approved or not), feedback notes, organized
by date (newest first). Images embedded as base64. Handles corrupted
approved.json gracefully (skips, still shows the session). Empty state
shows "No history yet" with /design-shotgun prompt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: gallery generation — sessions, dates, corruption, empty state

7 tests: empty dir, nonexistent dir, single session with approved variant,
multiple sessions sorted newest-first, corrupted approved.json handled
gracefully, session without approved.json, self-contained HTML (no
external dependencies).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: replace broken file:// polling with {{DESIGN_SHOTGUN_LOOP}}

plan-design-review and design-consultation templates previously used
$B goto file:// + $B eval polling for the comparison board feedback loop.
This was broken (browse blocks file:// URLs). Both templates now use
{{DESIGN_SHOTGUN_LOOP}} which serves via HTTP, handles regeneration in
the same browser tab, and falls back to AskUserQuestion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add design-shotgun touchfile entries and tier classifications

design-shotgun-path (gate): verify artifacts go to ~/.gstack/, not .context/
design-shotgun-session (gate): verify repeat-run detection + AskUserQuestion
design-shotgun-full (periodic): full round-trip with real design binary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files for template refactor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: comparison board UI improvements — option headers, pick confirmation, grid view

Three changes to the design comparison board:

1. Pick confirmation: selecting "Pick" on Option A shows "We'll move
   forward with Option A" in green, plus a status line above the submit
   button repeating the choice.

2. Clear option headers: each variant now has "Option A" in bold with a
   subtitle above the image, instead of just the raw image.

3. View toggle: top-right Large/Grid buttons switch between single-column
   (default) and 3-across grid view.

Also restructured the bottom section into a 2-column grid: submit/overall
feedback on the left, regenerate controls on the right.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use 127.0.0.1 instead of localhost for serve URL

Avoids DNS resolution issues on some systems where localhost may resolve
to IPv6 ::1 while Bun listens on IPv4 only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: write ALL feedback to disk so agent can poll in background mode

The agent backgrounds $D serve (Claude Code can't block on a subprocess
and do other work simultaneously). With stdout-only feedback delivery,
the agent never sees regenerate/remix feedback.

Fix: write feedback-pending.json (regenerate/remix) and feedback.json
(submit) to disk next to the board HTML. Agent polls the filesystem
instead of reading stdout. Both channels (stdout + disk) are always
active so foreground mode still works.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: DESIGN_SHOTGUN_LOOP uses file polling instead of stdout reading

Update the template resolver to instruct the agent to background $D serve
and poll for feedback-pending.json / feedback.json on a 5-second loop.
This matches the real-world pattern where Claude Code / Conductor agents
can't block on subprocess stdout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files for file-polling feedback loop

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: null-safe DOM selectors for post-submit and regenerating states

The user's layout restructure renamed .regenerate-bar → .regen-column,
.submit-bar → .submit-column, and .overall-section → .bottom-section.
The JS still referenced the old class names, causing querySelector to
return null and showPostSubmitState() / showRegeneratingState() to
silently crash. This meant Submit and Regenerate buttons appeared to
work (DOM elements updated, HTTP POST succeeded) but the visual
feedback (disabled inputs, spinner, success message) never appeared.

Fix: use fallback selectors that check both old and new class names,
with null guards so a missing element doesn't crash the function.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: end-to-end feedback roundtrip — browser click to file on disk

The test that proves "changes on the website propagate to Claude Code."
Opens the comparison board in a real headless browser with __GSTACK_SERVER_URL
injected, simulates user clicks (Submit, Regenerate, More Like This), and
verifies that feedback.json / feedback-pending.json land on disk with the
correct structured data.

6 tests covering: submit → feedback.json, post-submit UI lockdown,
regenerate → feedback-pending.json, more-like-this → feedback-pending.json,
regenerate spinner display, and full regen → reload → submit round-trip.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: comprehensive design doc for Design Shotgun feedback loop

Documents the full browser-to-agent feedback architecture: state machine,
file-based polling, port discovery, post-submit lifecycle, and every known
edge case (zombie forms, dead servers, stale spinners, file:// bug,
double-click races, port coordination, sequential generate rule).

Includes ASCII diagrams of the data flow and state transitions, complete
step-by-step walkthrough of happy path and regeneration path, test coverage
map with gaps, and short/medium/long-term improvement ideas.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: plan-design-review agent guardrails for feedback loop

Four fixes to prevent agents from reinventing the feedback loop badly:

1. Sequential generate rule: explicit instruction that $D generate calls
   must run one at a time (API rate-limits concurrent image generation).
2. No-AskUserQuestion-for-feedback rule: agent reads feedback.json instead
   of re-asking what the user picked.
3. Remove file:// references: $B goto file:// was always rejected by
   url-validation.ts. The --serve flag handles everything.
4. Remove $B eval polling reference: no longer needed with HTTP POST.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: design-shotgun Step 3 progressive reveal, silent failure detection, timing estimate

Three production UX bugs fixed:
1. Dead air — now shows timing estimate before generation starts
2. Silent variant drop — replaced $D variants batch with individual $D generate
   calls, each verified for existence and non-zero size with retry
3. No progressive reveal — each variant shown inline via Read tool immediately
   after generation (~60s increments instead of all at ~180s)

Also: /tmp/ then cp as default output pattern (sandbox workaround),
screenshot taken once for evolve path (not per-variant).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: parallel design-shotgun with concept-first confirmation

Step 3 rewritten to concept-first + parallel Agent architecture:
- 3a: generate text concepts (free, instant)
- 3b: AskUserQuestion to confirm/modify before spending API credits
- 3c: launch N Agent subagents in parallel (~60s total regardless of count)
- 3d: show all results, dynamic image list for comparison board

Adds Agent to allowed-tools. Softens plan-design-review sequential
warning to note design-shotgun uses parallel at Tier 2+.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.13.0.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: untrack .agents/skills/ — generated at setup, already gitignored

These files were committed despite .agents/ being in .gitignore.
They regenerate from ./setup --host codex on any machine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate design-shotgun SKILL.md for v0.12.12.0 preamble changes

Merge from main brought updated preamble resolver (conditional telemetry,
local JSONL logging) but design-shotgun/SKILL.md wasn't regenerated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-27 20:32:59 -06:00

3 Commits