merge: origin/main into garrytan/multi-checkpoint-resume

Catches up 7 commits from main:
- c6e6a21d refactor: AI slop reduction with cross-model quality review (v0.16.3.0)
- 7e96fe29 fix: security wave 3 — 12 fixes, 7 contributors (v0.16.4.0)
- 23000672 feat: UX behavioral foundations + ux-audit command (v0.17.0.0)
- b805aa01 feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0)
- 6a785c57 fix: ngrok Windows build + close CI error-swallowing gap (v0.18.0.1)
- 0cc830b6 fix: avoid tilde-in-assignment to silence Claude Code permission prompts
- cc42f14a docs: gstack compact design doc (tabled pending Anthropic API)

Conflict resolution:
- ship/SKILL.md.tmpl line 402: HEAD had "## Step 12: Version bump" from
  the renumber refactor; origin/main added {{GBRAIN_SAVE_RESULTS}} above
  "## Step 4: Version bump". Resolved by keeping origin/main's new
  placeholder AND my branch's "Step 12" heading.
- ship/SKILL.md: regenerated from resolved template (per CLAUDE.md
  policy: never resolve generated files manually).

All skill docs regenerated for all 9 hosts (claude, kiro, opencode,
slate, cursor, openclaw, factory, hermes, gbrain) to reflect the
merged template.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Garry Tan
2026-04-17 06:38:48 +08:00
147 changed files with 4225 additions and 381 deletions
+1 -2
@@ -59,5 +59,4 @@ RUN useradd -m -s /bin/bash runner \
&& chmod -R a+rX /opt/node_modules_cache \
&& mkdir -p /home/runner/.gstack && chown -R runner:runner /home/runner/.gstack \
&& chmod 1777 /tmp \
-&& mkdir -p /home/runner/.bun && chown -R runner:runner /home/runner/.bun \
-&& chmod -R 1777 /tmp
+&& mkdir -p /home/runner/.bun && chown -R runner:runner /home/runner/.bun
+2
@@ -13,6 +13,8 @@ bin/gstack-global-discover
.slate/
.cursor/
.openclaw/
.hermes/
.gbrain/
.context/
extension/.auth.json
.gstack-worktrees/
+3
@@ -208,6 +208,9 @@ Templates contain the workflows, tips, and examples that require human judgment.
| `{{CODEX_PLAN_REVIEW}}` | `gen-skill-docs.ts` | Optional cross-model plan review (Codex or Claude subagent fallback) for /plan-ceo-review and /plan-eng-review |
| `{{DESIGN_SETUP}}` | `resolvers/design.ts` | Discovery pattern for `$D` design binary, mirrors `{{BROWSE_SETUP}}` |
| `{{DESIGN_SHOTGUN_LOOP}}` | `resolvers/design.ts` | Shared comparison board feedback loop for /design-shotgun, /plan-design-review, /design-consultation |
| `{{UX_PRINCIPLES}}` | `resolvers/design.ts` | User behavioral foundations (scanning, satisficing, goodwill reservoir, trunk test) for /design-html, /design-shotgun, /design-review, /plan-design-review |
| `{{GBRAIN_CONTEXT_LOAD}}` | `resolvers/gbrain.ts` | Brain-first context search with keyword extraction, health awareness, and data-research routing. Injected into 10 brain-aware skills. Suppressed on non-brain hosts. |
| `{{GBRAIN_SAVE_RESULTS}}` | `resolvers/gbrain.ts` | Post-skill brain persistence with entity enrichment, throttle handling, and per-skill save instructions. 8 skill-specific save formats. |
This is structurally sound — if a command exists in code, it appears in docs. If it doesn't exist, it can't appear.
+79
@@ -1,5 +1,84 @@
# Changelog
## [0.18.0.1] - 2026-04-16
### Fixed
- **Windows install no longer fails with a build error.** If you installed gstack on Windows (or a fresh Linux box), `./setup` was dying with `cannot write multiple output files without an output directory`. The Windows-compat Node server bundle now builds cleanly, so `/browse`, `/canary`, `/pair-agent`, `/open-gstack-browser`, `/setup-browser-cookies`, and `/design-review` all work on Windows again. If you were stuck on gstack v0.15.11-era features without knowing it, this is why. Thanks to @tomasmontbrun-hash (#1019) and @scarson (#1013) for independently tracking this down, and to the issue reporters on #1010 and #960.
- **CI stops lying about green builds.** The `build` and `test` scripts in `package.json` had a shell precedence trap where a trailing `|| true` swallowed failures from the *entire* command chain, not just the cleanup step it was meant for. That's how the Windows build bug above shipped in the first place — CI ran the build, the build failed, and CI reported success anyway. Now build and test failures actually fail. Silent CI is the worst kind of CI.
- **`/pair-agent` on Windows surfaces install problems at install time, not tunnel time.** `./setup` now verifies Node can load `@ngrok/ngrok` on Windows, just like it already did for Playwright. If the native binary didn't install, you find out now instead of the first time you try to pair an agent.
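The precedence trap behind the second fix is easy to reproduce in any POSIX shell. `&&` and `||` have equal precedence and associate left to right, so a trailing `|| true` guards the entire chain, not just the cleanup step it was written for:

```shell
# Broken: `|| true` was meant for the cleanup step, but it guards the whole chain.
sh -c 'exit 1' && rm -f scratch.tmp || true
echo "broken chain exit: $?"    # prints 0 — the failed build is invisible to CI

# Fixed: scope `|| true` to the cleanup step only.
sh -c 'exit 1' && { rm -f scratch.tmp || true; }
echo "fixed chain exit: $?"     # prints 1 — the build failure propagates
```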
### For contributors
- New `browse/test/build.test.ts` validates `server-node.mjs` is well-formed ES module syntax and that `@ngrok/ngrok` was actually externalized (not inlined). Gracefully skips when no prior build has run.
- Added a policy comment in `browse/scripts/build-node-server.sh` explaining when and why to externalize a dependency. If you add a dep with a native addon or a dynamic `await import()`, the comment tells you where to plug it in.
## [0.18.0.0] - 2026-04-15
### Added
- **Confusion Protocol.** Every workflow skill now has an inline ambiguity gate. When Claude hits a decision that could go two ways (which architecture? which data model? destructive operation with unclear scope?), it stops and asks instead of guessing. Scoped to high-stakes decisions only, so it doesn't slow down routine coding. Addresses Karpathy's #1 AI coding failure mode.
- **Hermes host support.** gstack now generates skill docs for [Hermes Agent](https://github.com/nousresearch/hermes-agent) with proper tool rewrites (`terminal`, `read_file`, `patch`, `delegate_task`). `./setup --host hermes` prints integration instructions.
- **GBrain host + brain-first resolver.** GBrain is a "mod" for gstack. When installed, your coding skills become brain-aware: they search your brain for relevant context before starting and save results to your brain after finishing. 10 skills are now brain-aware: /office-hours, /investigate, /plan-ceo-review, /retro, /ship, /qa, /design-review, /plan-eng-review, /cso, and /design-consultation. Compatible with GBrain >= v0.10.0.
- **GBrain v0.10.0 integration.** Agent instructions now use `gbrain search` (fast keyword lookup) instead of `gbrain query` (expensive hybrid). Every command shows full CLI syntax with `--title`, `--tags`, and heredoc examples. Keyword extraction guidance helps agents search effectively. Entity enrichment auto-creates stub pages for people and companies mentioned in skill output. Throttle errors are named so agents can detect and handle them. A preamble health check runs `gbrain doctor --fast --json` at session start and names failing checks when the brain is degraded.
- **Skill triggers for GBrain router.** All 38 skill templates now include `triggers:` arrays in their frontmatter: multi-word keywords like "debug this", "ship it", "brainstorm this". These power GBrain's RESOLVER.md skill router and pass `checkResolvable()` validation. Distinct from `voice-triggers:` (speech-to-text aliases).
- **Hermes brain support.** Hermes agents with GBrain installed as a mod now get brain features automatically. The resolver fallback logic ("if GBrain is not available, proceed without") handles non-GBrain Hermes installs gracefully.
- **slop:diff in /review.** Every code review now runs `bun run slop:diff` as an advisory diagnostic, catching AI code quality issues (empty catches, redundant abstractions, overcomplicated patterns) before they land. Informational only, never blocking.
- **Karpathy compatibility.** README now positions gstack as the workflow enforcement layer for [Karpathy-style CLAUDE.md rules](https://github.com/forrestchang/andrej-karpathy-skills) (17K stars). Maps each failure mode to the gstack skill that addresses it.
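A hedged sketch of what the injected CLI guidance looks like. Only `gbrain search`, `gbrain doctor --fast --json`, `--title`, `--tags`, and the heredoc pattern are confirmed above; the `save` verb and exact argument shapes are illustrative assumptions:

```
gbrain doctor --fast --json              # preamble health check at session start
gbrain search "multi checkpoint resume"  # fast keyword lookup, not `gbrain query`
# Hypothetical save with title/tags and a heredoc body:
gbrain save --title "Ship: checkpoint resume" --tags ship,gstack <<'EOF'
What shipped, decisions made, open questions.
EOF
```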
### Changed
- **CEO review HARD GATE reinforcement.** "Do NOT make any code changes. Review only." now repeats at every STOP point (12 locations), not just the top. Prompt repetition measurably reduces the "starts implementing" failure mode.
- **Office-hours design doc visibility.** After writing the design doc, the skill now prints the full path so downstream skills (/plan-ceo-review, /plan-eng-review) can find it.
- **Investigate investigation history.** Each investigation now logs to the learnings system with `type: "investigation"` and affected file paths. Future investigations on the same files surface prior root causes automatically. Recurring bugs in the same area = architectural smell.
- **Retro non-git context.** If `~/.gstack/retro-context.md` exists, the retro now reads it for meeting notes, calendar events, and decisions that don't appear in git history.
- **Native OpenClaw skills improved.** The 4 hand-crafted ClawHub skills (office-hours, ceo-review, investigate, retro) now mirror the template improvements above.
- **Host count: 8 to 10.** Hermes and GBrain join Claude, Codex, Factory, Kiro, OpenCode, Slate, Cursor, and OpenClaw.
## [0.17.0.0] - 2026-04-14
### Added
- **UX behavioral foundations.** Every design skill now thinks about how users actually behave, not just how the interface looks. A shared `{{UX_PRINCIPLES}}` resolver distills Steve Krug's "Don't Make Me Think" into actionable guidance: scanning behavior, satisficing, the goodwill reservoir, navigation wayfinding, and the trunk test. Injected into /design-html, /design-shotgun, /design-review, and /plan-design-review. Your design reviews now catch "this navigation is confusing" problems, not just "the contrast ratio is 4.3:1."
- **6 usability tests woven into design-review.** The methodology now runs the Trunk Test (can you tell what site this is, what page you're on, and how to search?), 3-Second Scan (what do users see first?), Page Area Test (can you name each section's purpose?), Happy Talk Detection with word count (how much of this page is "blah blah blah"?), Mindless Choice Audit (does every click feel obvious?), and Goodwill Reservoir tracking with a visual dashboard (what depletes the user's patience at each step?).
- **First-person narration mode.** Design review reports now read like a usability consultant watching someone use your site: "I'm looking at this page... my eye goes to the logo, then a wall of text I skip entirely. Wait, is that a button?" With anti-slop guardrail: if the agent can't name the specific element, it's generating platitudes.
- **`$B ux-audit` command.** Standalone UX structural extraction. One command extracts site ID, navigation, headings, interactive elements, text blocks, and search presence as structured JSON. The agent applies the 6 usability tests to the data. Pure data extraction with element caps (50 headings, 100 links, 200 interactive, 50 text blocks).
- **`snapshot -H` / `--heatmap` flag.** Color-coded overlay screenshots. Pass a JSON map of ref IDs to colors (`green`/`yellow`/`red`/`blue`/`orange`/`gray`) and get an annotated screenshot with per-element colored boxes. Color whitelist prevents CSS injection. Composable: any skill can use it.
- **Token ceiling enforcement.** `gen-skill-docs` now warns if any generated SKILL.md exceeds 100KB (~25K tokens). Catches prompt bloat before it degrades agent performance.
### Changed
- **Krug's always/never rules** added to the design hard rules: never placeholder-as-label, never floating headings, always visited link distinction, never sub-16px body text. These join the existing AI slop blacklist as mechanical checks.
- **Plan-design-review references** now include Steve Krug, Ginny Redish (Letting Go of the Words), and Caroline Jarrett (Forms that Work) alongside Rams, Norman, and Nielsen.
## [0.16.4.0] - 2026-04-13
### Added
- **Cookie origin pinning.** When you import cookies for specific domains, JS execution is now blocked on pages that don't match those domains. This prevents the attack where a prompt injection navigates to an attacker's site and runs `document.cookie` to steal your imported cookies. Subdomain matching works automatically (importing `.github.com` allows `api.github.com`). When no cookies are imported, everything works as before. 3 PRs from @halbert04.
- **Command audit log.** Every browse command now gets a persistent forensic trail in `~/.gstack/.browse/browse-audit.jsonl`. Timestamp, command, args, page origin, duration, status, error, and whether cookies were imported. Append-only, never truncated, survives server restarts. Best-effort writes that never block command execution. From @halbert04.
- **Cookie domain tracking.** gstack now tracks which domains cookies were imported from. Foundation for origin pinning above. Direct imports via `--domain` track automatically. New `--all` flag makes full-browser cookie import an explicit opt-in instead of the default.
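The subdomain rule described above comes down to a small predicate. A sketch with hypothetical names (the real check lives in the browse server); note the dot boundary, which is what stops `evilgithub.com` from piggybacking on `.github.com`:

```typescript
// Hypothetical sketch of the origin-pinning match rule: an imported cookie
// domain like ".github.com" allows the apex and any subdomain, nothing else.
// Callers skip this check entirely when no cookies were imported.
function originAllowed(pageHost: string, importedDomains: string[]): boolean {
  return importedDomains.some((d) => {
    const base = d.startsWith(".") ? d.slice(1) : d; // ".github.com" -> "github.com"
    return pageHost === base || pageHost.endsWith("." + base);
  });
}

console.log(originAllowed("api.github.com", [".github.com"])); // true
console.log(originAllowed("evilgithub.com", [".github.com"])); // false — no dot boundary
```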
### Fixed
- **Symlink bypass in file writes.** `validateOutputPath` only checked the parent directory for symlinks, not the file itself. A symlink at `/tmp/evil.png` pointing to `/etc/crontab` passed validation because the parent `/tmp` was safe. Now checks the file with `lstatSync` before writing. From @Hybirdss.
- **Cookie-import path bypass.** Two issues: relative paths bypassed all validation (the `path.isAbsolute()` gate let `sensitive-file.json` through), and symlink resolution was missing (`path.resolve` without `realpathSync`). Now resolves to absolute, resolves symlinks, and checks against safe directories. From @urbantech.
- **Shell injection in setup scripts.** `gstack-settings-hook` interpolated file paths directly into `bun -e` JavaScript blocks. A path with quotes broke the JS string context. Now uses environment variables (`process.env`). Systematic audit confirmed only this script was vulnerable. From @garagon.
- **Form field credential leak.** Snapshot redaction only applied to `type="password"` fields. Hidden and text fields named `csrf_token`, `api_key`, `session_id` were exposed unredacted in LLM context. Now checks field name and id against sensitive patterns. From @garagon.
- **Learnings prompt injection.** Three fixes: input validation (type/key/confidence allowlists), injection pattern detection in insight field (blocks "ignore previous instructions" etc.), and cross-project trust gate (only user-stated learnings cross project boundaries). From @Ziadstr.
- **IPv6 metadata bypass.** The URL constructor normalizes `::ffff:169.254.169.254` to `::ffff:a9fe:a9fe` (hex), which wasn't in the blocklist. Added both hex-encoded forms. From @mehmoodosman.
- **Session files world-readable.** Design session files in `/tmp` were created with default permissions (0644). Now 0600 (owner-only). From @garagon.
- **Frozen lockfile in setup.** `bun install` now uses `--frozen-lockfile` to prevent supply chain attacks via floating semver ranges. From @halbert04.
- **Dockerfile chmod fix.** Removed duplicate recursive `chmod -R 1777 /tmp` (recursive sticky bit on files has no defined behavior). From @Gonzih.
- **Hardcoded /tmp in cookie import.** `cookie-import-browser` used `/tmp` directly instead of `os.tmpdir()`, breaking Windows support.
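The symlink fix above reduces to checking the write target itself with `lstatSync`, which stats the link rather than what it points at. A self-contained sketch with a hypothetical helper name:

```typescript
import { lstatSync, mkdtempSync, symlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: reject a write target if the file itself is a symlink.
// lstatSync does not follow links, so /tmp/evil.png -> /etc/crontab is caught
// even when the parent directory passes validation.
function writeTargetIsSafe(path: string): boolean {
  try {
    return !lstatSync(path).isSymbolicLink();
  } catch (err: any) {
    if (err?.code === "ENOENT") return true; // not created yet: safe to write fresh
    throw err;
  }
}

const dir = mkdtempSync(join(tmpdir(), "lstat-demo-"));
writeFileSync(join(dir, "real.png"), "pixels");
symlinkSync("/etc/crontab", join(dir, "evil.png")); // dangling targets are fine for lstat
console.log(writeTargetIsSafe(join(dir, "real.png"))); // true
console.log(writeTargetIsSafe(join(dir, "evil.png"))); // false
```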
### Security
- Closed 14 security issues (#665-#675, #566, #479, #467, #545) that were fixed in prior waves but still open on GitHub.
- Closed 17 community security PRs with thank-you messages and commit references.
- Security wave 3: 12 fixes, 7 contributors. Big thanks to @Hybirdss, @urbantech, @garagon, @Ziadstr, @halbert04, @mehmoodosman, @Gonzih.
## [0.16.3.0] - 2026-04-09
### Changed
- **AI slop cleanup.** Ran [slop-scan](https://github.com/benvinegar/slop-scan) and dropped from 100 findings (2.38 score/file) to 90 findings (1.96 score/file). The good part: `safeUnlink()` and `safeKill()` utilities that catch real bugs (swallowed EPERM in shutdown was a silent data loss risk). `safeUnlinkQuiet()` for cleanup paths where throwing is worse than swallowing. `isProcessAlive()` extracted to a shared module with Windows support. Redundant `return await` removed. Typed exception catches (TypeError, DOMException, ENOENT) replace empty catches in system boundary code. The part we tried and reverted: string-matching on error messages was brittle, extension catch-and-log was correct as-is, pass-through wrapper comments were linter gaming. We are AI-coded and proud of it. The goal is code quality, not hiding.
### Added
- **`bun run slop:diff`** shows only NEW slop-scan findings introduced on your branch vs main. Line-number-insensitive comparison so shifted code doesn't create false positives. Runs automatically after `bun test`.
- **Slop-scan usage guidelines** in CLAUDE.md: what to fix (genuine quality) vs what NOT to fix (linter gaming). Includes utility function reference table.
- **Design doc** for future slop-scan integration in `/review` and `/ship` skills (`docs/designs/SLOP_SCAN_FOR_REVIEW_SHIP.md`).
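The line-number-insensitive comparison in `slop:diff` amounts to keying findings by everything except the line number. A sketch with assumed field names (slop-scan's actual JSON shape may differ):

```typescript
// Assumed finding shape; slop-scan's real JSON fields may differ.
type Finding = { file: string; rule: string; line: number; snippet: string };

// Key by (file, rule, snippet) so code that merely shifted lines is not "new".
const key = (f: Finding) => `${f.file}\u0000${f.rule}\u0000${f.snippet}`;

function newFindings(branch: Finding[], main: Finding[]): Finding[] {
  const known = new Set(main.map(key));
  return branch.filter((f) => !known.has(key(f)));
}

const onMain = [{ file: "a.ts", rule: "empty-catch", line: 10, snippet: "catch {}" }];
const onBranch = [
  { file: "a.ts", rule: "empty-catch", line: 42, snippet: "catch {}" },       // shifted, not new
  { file: "b.ts", rule: "return-await", line: 5, snippet: "return await x" }, // genuinely new
];
console.log(newFindings(onBranch, onMain).map((f) => f.file)); // only "b.ts" survives
```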
## [0.16.2.0] - 2026-04-09
### Added
+66 -2
@@ -20,6 +20,8 @@ bun run dev:skill # watch mode: auto-regen + validate on change
bun run eval:list # list all eval runs from ~/.gstack-dev/evals/
bun run eval:compare # compare two eval runs (auto-picks most recent)
bun run eval:summary # aggregate stats across all eval runs
bun run slop # full slop-scan report (all files)
bun run slop:diff # slop findings in files changed on this branch only
```
`test:evals` requires `ANTHROPIC_API_KEY`. Codex E2E tests (`test/codex-e2e.test.ts`)
@@ -66,14 +68,15 @@ gstack/
├── hosts/ # Typed host configs (one per AI agent)
│ ├── claude.ts # Primary host config
│ ├── codex.ts, factory.ts, kiro.ts # Existing hosts
-│ ├── opencode.ts, slate.ts, cursor.ts, openclaw.ts # New hosts
+│ ├── opencode.ts, slate.ts, cursor.ts, openclaw.ts # IDE hosts
+│ ├── hermes.ts, gbrain.ts # Agent runtime hosts
│ └── index.ts # Registry: exports all, derives Host type
├── scripts/ # Build + DX tooling
│ ├── gen-skill-docs.ts # Template → SKILL.md generator (config-driven)
│ ├── host-config.ts # HostConfig interface + validator
│ ├── host-config-export.ts # Shell bridge for setup script
│ ├── host-adapters/ # Host-specific adapters (OpenClaw tool mapping)
-│ ├── resolvers/ # Template resolver modules (preamble, design, review, etc.)
+│ ├── resolvers/ # Template resolver modules (preamble, design, review, gbrain, etc.)
│ ├── skill-check.ts # Health dashboard
│ └── dev-skill.ts # Watch mode
├── test/ # Skill validation + eval tests
@@ -136,6 +139,11 @@ SKILL.md files are **generated** from `.tmpl` templates. To update docs:
To add a new browse command: add it to `browse/src/commands.ts` and rebuild.
To add a snapshot flag: add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts` and rebuild.
**Token ceiling:** Generated SKILL.md files must stay under 100KB (~25K tokens).
`gen-skill-docs` warns if any file exceeds this. If a skill template grows past the
ceiling, consider extracting optional sections into separate resolvers that only
inject when relevant, or making verbose evaluation rubrics more concise.
**Merge conflicts on SKILL.md files:** NEVER resolve conflicts on generated SKILL.md
files by accepting either side. Instead: (1) resolve conflicts on the `.tmpl` templates
and `scripts/gen-skill-docs.ts` (the sources of truth), (2) run `bun run gen:skill-docs`
@@ -250,6 +258,62 @@ Examples of good bisection:
When the user says "bisect commit" or "bisect and push," split staged/unstaged
changes into logical commits and push.
## Slop-scan: AI code quality, not AI code hiding
We use [slop-scan](https://github.com/benvinegar/slop-scan) to catch patterns where
AI-generated code is genuinely worse than what a human would write. We are NOT trying
to pass as human code. We are AI-coded and proud of it. The goal is code quality.
```bash
npx slop-scan scan . # human-readable report
npx slop-scan scan . --json # machine-readable for diffing
```
Config: `slop-scan.config.json` at repo root (currently excludes `**/vendor/**`).
### What to fix (genuine quality improvements)
- **Empty catches around file ops** — use `safeUnlink()` (ignores ENOENT, rethrows
EPERM/EIO). A swallowed EPERM in cleanup means silent data loss.
- **Empty catches around process kills** — use `safeKill()` (ignores ESRCH, rethrows
EPERM). A swallowed EPERM means you think you killed something you didn't.
- **Redundant `return await`** — remove when there's no enclosing try block. Saves a
microtask, signals intent.
- **Typed exception catches** — `catch (err) { if (!(err instanceof TypeError)) throw err }`
is genuinely better than `catch {}` when the try block does URL parsing or DOM work.
You know what error you expect, so say so.
### What NOT to fix (linter gaming, not quality)
- **String-matching on error messages** — `err.message.includes('closed')` is brittle.
Playwright/Chrome can change wording anytime. If a fire-and-forget operation can fail
for ANY reason and you don't care, `catch {}` is the correct pattern.
- **Adding comments to exempt pass-through wrappers** — "alias for active session" above
a method just to trip slop-scan's exemption rule is noise, not documentation.
- **Converting extension catch-and-log to selective rethrow** — Chrome extensions crash
entirely on uncaught errors. If the catch logs and continues, that IS the right pattern
for extension code. Don't make it throw.
- **Tightening best-effort cleanup paths** — shutdown, emergency cleanup, and disconnect
code should use `safeUnlinkQuiet()` (swallows ALL errors). A cleanup path that throws
on EPERM means the rest of cleanup doesn't run. That's worse.
### Utilities in `browse/src/error-handling.ts`
| Function | Use when | Behavior |
|----------|----------|----------|
| `safeUnlink(path)` | Normal file deletion | Ignores ENOENT, rethrows others |
| `safeUnlinkQuiet(path)` | Shutdown/emergency cleanup | Swallows all errors |
| `safeKill(pid, signal)` | Sending signals | Ignores ESRCH, rethrows others |
| `isProcessAlive(pid)` | Boolean process checks | Returns true/false, never throws |
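A sketch of the `safeUnlink` contract from the table above. The real implementation is in `browse/src/error-handling.ts`; this just illustrates the error-code split:

```typescript
import { existsSync, mkdtempSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Illustrative version of the table's contract: ENOENT is the one expected
// error (file already gone); anything else (EPERM, EIO) is a real failure.
function safeUnlink(path: string): void {
  try {
    unlinkSync(path);
  } catch (err: any) {
    if (err?.code === "ENOENT") return;
    throw err;
  }
}

const f = join(mkdtempSync(join(tmpdir(), "safe-unlink-")), "a.txt");
writeFileSync(f, "x");
safeUnlink(f);              // deletes the file
safeUnlink(f);              // second call: ENOENT swallowed, no throw
console.log(existsSync(f)); // false
```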
### Score tracking
Baseline (2026-04-09, before cleanup): 100 findings, 432.8 score, 2.38 score/file.
After cleanup: 90 findings, 358.1 score, 1.96 score/file.
Don't chase the number. Fix patterns that represent actual code quality problems.
Accept findings where the "sloppy" pattern is the correct engineering choice.
## Community PR guardrails
When reviewing or merging community PRs, **always AskUserQuestion** before accepting
+7 -1
@@ -110,7 +110,7 @@ These are conversational skills. Your OpenClaw agent runs them directly via chat
### Other AI Agents
-gstack works on 8 AI coding agents, not just Claude. Setup auto-detects which
+gstack works on 10 AI coding agents, not just Claude. Setup auto-detects which
agents you have installed:
```bash
@@ -128,6 +128,8 @@ Or target a specific agent with `./setup --host <name>`:
| Factory Droid | `--host factory` | `~/.factory/skills/gstack-*/` |
| Slate | `--host slate` | `~/.slate/skills/gstack-*/` |
| Kiro | `--host kiro` | `~/.kiro/skills/gstack-*/` |
| Hermes | `--host hermes` | `~/.hermes/skills/gstack-*/` |
| GBrain (mod) | `--host gbrain` | `~/.gbrain/skills/gstack-*/` |
**Want to add support for another agent?** See [docs/ADDING_A_HOST.md](docs/ADDING_A_HOST.md).
It's one TypeScript config file, zero code changes.
@@ -236,6 +238,10 @@ Each skill feeds into the next. `/office-hours` writes a design doc that `/plan-
**[Deep dives with examples and philosophy for every skill →](docs/skills.md)**
### Karpathy's four failure modes? Already covered.
Andrej Karpathy's [AI coding rules](https://github.com/forrestchang/andrej-karpathy-skills) (17K stars) nail four failure modes: wrong assumptions, overcomplexity, orthogonal edits, imperative over declarative. gstack's workflow skills enforce all four. `/office-hours` forces assumptions into the open before code is written. The Confusion Protocol stops Claude from guessing on architectural decisions. `/review` catches unnecessary complexity and drive-by edits. `/ship` transforms tasks into verifiable goals with test-first execution. If you already use Karpathy-style CLAUDE.md rules, gstack is the workflow enforcement layer that makes them stick across entire sprints, not just single prompts.
## Parallel sprints
gstack works well with one sprint. It gets interesting with ten running at once.
+10 -1
@@ -11,6 +11,11 @@ allowed-tools:
- Bash
- Read
- AskUserQuestion
triggers:
- browse this page
- take a screenshot
- navigate to url
- inspect the page
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
@@ -255,6 +260,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
**Tone:** direct, concrete, sharp, never corporate, never academic. Sound like a builder, not a consultant. Name the file, the function, the command. No filler, no throat-clearing.
@@ -466,7 +473,7 @@ Auto-shuts down after 30 min idle. State persists between calls (cookies, tabs,
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
-[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
+[ -z "$B" ] && B="$HOME/.claude/skills/gstack/browse/dist/browse"
if [ -x "$B" ]; then
echo "READY: $B"
else
@@ -719,6 +726,7 @@ The snapshot is your primary tool for understanding and interacting with pages.
-a --annotate Annotated screenshot with red overlay boxes and ref labels
-o <path> --output Output path for annotated screenshot (default: <temp>/browse-annotated.png)
-C --cursor-interactive Cursor-interactive elements (@c refs — divs with pointer, onclick). Auto-enabled when -i is used.
-H <json> --heatmap Color-coded overlay screenshot from JSON map: '{"@e1":"green","@e3":"red"}'. Valid colors: green, yellow, red, blue, orange, gray.
```
All flags can be combined freely. `-o` only applies when `-a` is also used.
@@ -825,6 +833,7 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`.
| `network [--clear]` | Network requests |
| `perf` | Page load timings |
| `storage [set k v]` | Read all localStorage + sessionStorage as JSON, or set <key> <value> to write localStorage |
| `ux-audit` | Extract page structure for UX behavioral analysis — site ID, nav, headings, text blocks, interactive elements. Returns JSON for agent interpretation. |
### Visual
| Command | Description |
+5
@@ -11,6 +11,11 @@ allowed-tools:
- Bash
- Read
- AskUserQuestion
triggers:
- browse this page
- take a screenshot
- navigate to url
- inspect the page
---
+1 -1
@@ -1 +1 @@
-0.16.2.0
+0.18.0.1
+19
@@ -13,6 +13,10 @@ description: |
gauntlet without answering 15-30 intermediate questions. (gstack)
Voice triggers (speech-to-text aliases): "auto plan", "automatic review".
benefits-from: [office-hours]
triggers:
- run all reviews
- automatic review pipeline
- auto plan review
allowed-tools:
- Bash
- Read
@@ -265,6 +269,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -383,6 +389,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Repo Ownership — See Something, Say Something
`REPO_MODE` controls how to handle issues outside your branch:
+4
@@ -15,6 +15,10 @@ voice-triggers:
- "auto plan"
- "automatic review"
benefits-from: [office-hours]
triggers:
- run all reviews
- automatic review pipeline
- auto plan review
allowed-tools:
- Bash
- Read
+7 -1
@@ -9,6 +9,10 @@ description: |
Use when: "performance", "benchmark", "page speed", "lighthouse", "web vitals",
"bundle size", "load time". (gstack)
Voice triggers (speech-to-text aliases): "speed test", "check performance".
triggers:
- performance benchmark
- check page speed
- detect performance regression
allowed-tools:
- Bash
- Read
@@ -258,6 +262,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
**Tone:** direct, concrete, sharp, never corporate, never academic. Sound like a builder, not a consultant. Name the file, the function, the command. No filler, no throat-clearing.
@@ -429,7 +435,7 @@ plan's living status.
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
-[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
+[ -z "$B" ] && B="$HOME/.claude/skills/gstack/browse/dist/browse"
if [ -x "$B" ]; then
echo "READY: $B"
else
+4
@@ -11,6 +11,10 @@ description: |
voice-triggers:
- "speed test"
- "check performance"
triggers:
- performance benchmark
- check page speed
- detect performance regression
allowed-tools:
- Bash
- Read
+20 -12
@@ -167,8 +167,11 @@ function getGitRemote(cwd: string): string | null {
stdio: ["pipe", "pipe", "pipe"],
}).trim();
return remote || null;
-} catch {
-return null;
+} catch (err: any) {
+// Expected: no remote configured, repo not found, git not installed
+if (err?.status !== undefined) return null; // non-zero exit from git
+if (err?.code === 'ENOENT') return null; // git binary not found
+throw err;
}
}
@@ -183,8 +186,9 @@ function scanClaudeCode(since: Date): Session[] {
let dirs: string[];
try {
dirs = readdirSync(projectsDir);
-} catch {
-return [];
+} catch (err: any) {
+if (err?.code === 'ENOENT' || err?.code === 'EACCES') return [];
+throw err;
}
for (const dirName of dirs) {
@@ -209,8 +213,9 @@ function scanClaudeCode(since: Date): Session[] {
const hasRecentFile = jsonlFiles.some((f) => {
try {
return statSync(join(dirPath, f)).mtime >= since;
-} catch {
-return false;
+} catch (err: any) {
+if (err?.code === 'ENOENT' || err?.code === 'EACCES') return false;
+throw err;
}
});
if (!hasRecentFile) continue;
@@ -223,8 +228,9 @@ function scanClaudeCode(since: Date): Session[] {
const recentFiles = jsonlFiles.filter((f) => {
try {
return statSync(join(dirPath, f)).mtime >= since;
-} catch {
-return false;
+} catch (err: any) {
+if (err?.code === 'ENOENT' || err?.code === 'EACCES') return false;
+throw err;
}
});
for (let i = 0; i < recentFiles.length; i++) {
@@ -251,8 +257,9 @@ function resolveClaudeCodeCwd(
.map((f) => {
try {
return { name: f, mtime: statSync(join(dirPath, f)).mtime.getTime() };
-} catch {
-return null;
+} catch (err: any) {
+if (err?.code === 'ENOENT' || err?.code === 'EACCES') return null;
+throw err;
}
})
.filter(Boolean)
@@ -381,8 +388,9 @@ function scanGemini(since: Date): Session[] {
let projectDirs: string[];
try {
projectDirs = readdirSync(tmpDir);
-} catch {
-return [];
+} catch (err: any) {
+if (err?.code === 'ENOENT' || err?.code === 'EACCES') return [];
+throw err;
}
for (const projectName of projectDirs) {
+69 -13
@@ -12,19 +12,75 @@ mkdir -p "$GSTACK_HOME/projects/$SLUG"
INPUT="$1"
-# Validate: input must be parseable JSON
-if ! printf '%s' "$INPUT" | bun -e "JSON.parse(await Bun.stdin.text())" 2>/dev/null; then
-echo "gstack-learnings-log: invalid JSON, skipping" >&2
# Validate and sanitize input
VALIDATED=$(printf '%s' "$INPUT" | bun -e "
const raw = await Bun.stdin.text();
let j;
try { j = JSON.parse(raw); } catch { process.stderr.write('gstack-learnings-log: invalid JSON, skipping\n'); process.exit(1); }
// Field validation: type must be from allowed list
const ALLOWED_TYPES = ['pattern', 'pitfall', 'preference', 'architecture', 'tool', 'operational'];
if (!j.type || !ALLOWED_TYPES.includes(j.type)) {
process.stderr.write('gstack-learnings-log: invalid type \"' + (j.type || '') + '\", must be one of: ' + ALLOWED_TYPES.join(', ') + '\n');
process.exit(1);
}
// Field validation: key must be alphanumeric, hyphens, underscores (no injection surface)
if (!j.key || !/^[a-zA-Z0-9_-]+$/.test(j.key)) {
process.stderr.write('gstack-learnings-log: invalid key, must be alphanumeric with hyphens/underscores only\n');
process.exit(1);
}
// Field validation: confidence must be 1-10
const conf = Number(j.confidence);
if (!Number.isInteger(conf) || conf < 1 || conf > 10) {
process.stderr.write('gstack-learnings-log: confidence must be integer 1-10\n');
process.exit(1);
}
j.confidence = conf;
// Field validation: source must be from allowed list
const ALLOWED_SOURCES = ['observed', 'user-stated', 'inferred', 'cross-model'];
if (j.source && !ALLOWED_SOURCES.includes(j.source)) {
process.stderr.write('gstack-learnings-log: invalid source, must be one of: ' + ALLOWED_SOURCES.join(', ') + '\n');
process.exit(1);
}
// Content sanitization: strip instruction-like patterns from insight field
// These patterns could be used for prompt injection when learnings are loaded into agent context
if (j.insight) {
const INJECTION_PATTERNS = [
/ignore\s+(all\s+)?previous\s+(instructions|context|rules)/i,
/you\s+are\s+now\s+/i,
/always\s+output\s+no\s+findings/i,
/skip\s+(all\s+)?(security|review|checks)/i,
/override[:\s]/i,
/\bsystem\s*:/i,
/\bassistant\s*:/i,
/\buser\s*:/i,
/do\s+not\s+(report|flag|mention)/i,
/approve\s+(all|every|this)/i,
];
for (const pat of INJECTION_PATTERNS) {
if (pat.test(j.insight)) {
process.stderr.write('gstack-learnings-log: insight contains suspicious instruction-like content, rejected\n');
process.exit(1);
}
}
}
// Inject timestamp if not present
if (!j.ts) j.ts = new Date().toISOString();
// Mark trust level based on source
// user-stated = user explicitly told the agent this. All others are AI-generated.
j.trusted = j.source === 'user-stated';
console.log(JSON.stringify(j));
" 2>/dev/null)
if [ $? -ne 0 ] || [ -z "$VALIDATED" ]; then
exit 1
fi
# Inject timestamp if not present
if ! printf '%s' "$INPUT" | bun -e "const j=JSON.parse(await Bun.stdin.text()); if(!j.ts) process.exit(1)" 2>/dev/null; then
INPUT=$(printf '%s' "$INPUT" | bun -e "
const j = JSON.parse(await Bun.stdin.text());
j.ts = new Date().toISOString();
console.log(JSON.stringify(j));
" 2>/dev/null) || true
fi
echo "$INPUT" >> "$GSTACK_HOME/projects/$SLUG/learnings.jsonl"
echo "$VALIDATED" >> "$GSTACK_HOME/projects/$SLUG/learnings.jsonl"
+7 -1
View File
@@ -68,7 +68,13 @@ for (const line of lines) {
// Determine if this is from the current project or cross-project
// Cross-project entries are tagged for display
e._crossProject = !line.includes(slug) && process.env.GSTACK_SEARCH_CROSS === 'true';
const isCrossProject = !line.includes(slug) && process.env.GSTACK_SEARCH_CROSS === 'true';
e._crossProject = isCrossProject;
// Trust gate: cross-project learnings only loaded if trusted (user-stated)
// This prevents prompt injection from one project's AI-generated learnings
// silently influencing reviews in another project.
if (isCrossProject && e.trusted === false) continue;
entries.push(e);
} catch {}
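The trust gate above reduces to a small predicate (`shouldLoad` is a hypothetical name; the logic mirrors the `continue` condition, so entries with no `trusted` field still load):

```typescript
// Cross-project learnings are dropped unless explicitly trusted (source === 'user-stated').
// An undefined trusted field passes: only an explicit trusted === false is filtered.
function shouldLoad(entry: { trusted?: boolean }, isCrossProject: boolean): boolean {
  return !(isCrossProject && entry.trusted === false);
}
```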
+6 -6
View File
@@ -26,10 +26,10 @@ fi
case "$ACTION" in
add)
bun -e "
GSTACK_SETTINGS_PATH="$SETTINGS_FILE" GSTACK_HOOK_CMD="$HOOK_CMD" bun -e "
const fs = require('fs');
const settingsPath = '$SETTINGS_FILE';
const hookCmd = $(printf '%s' "$HOOK_CMD" | bun -e "process.stdout.write(JSON.stringify(require('fs').readFileSync('/dev/stdin','utf8')))");
const settingsPath = process.env.GSTACK_SETTINGS_PATH;
const hookCmd = process.env.GSTACK_HOOK_CMD;
let settings = {};
try { settings = JSON.parse(fs.readFileSync(settingsPath, 'utf8')); } catch {}
@@ -54,10 +54,10 @@ case "$ACTION" in
" 2>/dev/null
;;
remove)
[ -f "$SETTINGS_FILE" ] || exit 0
bun -e "
[ -f "$SETTINGS_FILE" ] || exit 1
GSTACK_SETTINGS_PATH="$SETTINGS_FILE" bun -e "
const fs = require('fs');
const settingsPath = '$SETTINGS_FILE';
const settingsPath = process.env.GSTACK_SETTINGS_PATH;
let settings = {};
try { settings = JSON.parse(fs.readFileSync(settingsPath, 'utf8')); } catch { process.exit(0); }
+2 -2
View File
@@ -139,9 +139,9 @@ HOOK_EOF
# Add hook to project-level settings.json
if command -v bun >/dev/null 2>&1; then
bun -e "
GSTACK_SETTINGS_PATH="$SETTINGS" bun -e "
const fs = require('fs');
const settingsPath = '$SETTINGS';
const settingsPath = process.env.GSTACK_SETTINGS_PATH;
let settings = {};
try { settings = JSON.parse(fs.readFileSync(settingsPath, 'utf8')); } catch {}
+9 -1
View File
@@ -9,6 +9,10 @@ description: |
~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
user flow, or file a bug with evidence. Use when asked to "open in browser", "test the
site", "take a screenshot", or "dogfood this". (gstack)
triggers:
- browse a page
- headless browser
- take page screenshot
allowed-tools:
- Bash
- Read
@@ -257,6 +261,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
**Tone:** direct, concrete, sharp, never corporate, never academic. Sound like a builder, not a consultant. Name the file, the function, the command. No filler, no throat-clearing.
@@ -433,7 +439,7 @@ State persists between calls (cookies, tabs, login sessions).
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
[ -z "$B" ] && B="$HOME/.claude/skills/gstack/browse/dist/browse"
if [ -x "$B" ]; then
echo "READY: $B"
else
@@ -587,6 +593,7 @@ The snapshot is your primary tool for understanding and interacting with pages.
-a --annotate Annotated screenshot with red overlay boxes and ref labels
-o <path> --output Output path for annotated screenshot (default: <temp>/browse-annotated.png)
-C --cursor-interactive Cursor-interactive elements (@c refs — divs with pointer, onclick). Auto-enabled when -i is used.
-H <json> --heatmap Color-coded overlay screenshot from JSON map: '{"@e1":"green","@e3":"red"}'. Valid colors: green, yellow, red, blue, orange, gray.
```
All flags can be combined freely. `-o` only applies when `-a` is also used.
@@ -717,6 +724,7 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero
| `network [--clear]` | Network requests |
| `perf` | Page load timings |
| `storage [set k v]` | Read all localStorage + sessionStorage as JSON, or set <key> <value> to write localStorage |
| `ux-audit` | Extract page structure for UX behavioral analysis — site ID, nav, headings, text blocks, interactive elements. Returns JSON for agent interpretation. |
### Visual
| Command | Description |
+4
View File
@@ -9,6 +9,10 @@ description: |
~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
user flow, or file a bug with evidence. Use when asked to "open in browser", "test the
site", "take a screenshot", or "dogfood this". (gstack)
triggers:
- browse a page
- headless browser
- take page screenshot
allowed-tools:
- Bash
- Read
+7 -1
View File
@@ -14,13 +14,19 @@ DIST_DIR="$GSTACK_DIR/browse/dist"
echo "Building Node-compatible server bundle..."
# Step 1: Transpile server.ts to a single .mjs bundle (externalize runtime deps)
#
# Externalize packages with native addons, dynamic imports, or runtime resolution.
# If you add a new dependency that uses `await import()` or has a .node addon,
# add it here. Otherwise `bun build --outfile` will fail with
# "cannot write multiple output files without an output directory".
bun build "$SRC_DIR/server.ts" \
--target=node \
--outfile "$DIST_DIR/server-node.mjs" \
--external playwright \
--external playwright-core \
--external diff \
--external "bun:sqlite"
--external "bun:sqlite" \
--external "@ngrok/ngrok"
# Step 2: Post-process
# Replace import.meta.dir with a resolvable reference
+65
View File
@@ -0,0 +1,65 @@
/**
 * Persistent command audit log: a forensic trail for all browse server commands.
*
* Writes append-only JSONL to .gstack/browse-audit.jsonl. Unlike the in-memory
* ring buffers (console, network, dialog), the audit log persists across server
* restarts and is never truncated by the server. Each entry records:
*
* - timestamp, command, args (truncated), page origin
* - duration, status (ok/error), error message if any
* - whether cookies were imported (elevated security context)
* - connection mode (headless/headed)
*
 * All writes are best-effort; audit failures never cause command failures.
*/
import * as fs from 'fs';
export interface AuditEntry {
ts: string;
cmd: string;
args: string;
origin: string;
durationMs: number;
status: 'ok' | 'error';
error?: string;
hasCookies: boolean;
mode: 'launched' | 'headed';
}
const MAX_ARGS_LENGTH = 200;
const MAX_ERROR_LENGTH = 300;
let auditPath: string | null = null;
export function initAuditLog(logPath: string): void {
auditPath = logPath;
}
export function writeAuditEntry(entry: AuditEntry): void {
if (!auditPath) return;
try {
const truncatedArgs = entry.args.length > MAX_ARGS_LENGTH
? entry.args.slice(0, MAX_ARGS_LENGTH) + '…'
: entry.args;
const truncatedError = entry.error && entry.error.length > MAX_ERROR_LENGTH
? entry.error.slice(0, MAX_ERROR_LENGTH) + '…'
: entry.error;
const record: Record<string, unknown> = {
ts: entry.ts,
cmd: entry.cmd,
args: truncatedArgs,
origin: entry.origin,
durationMs: entry.durationMs,
status: entry.status,
hasCookies: entry.hasCookies,
mode: entry.mode,
};
if (truncatedError) record.error = truncatedError;
fs.appendFileSync(auditPath, JSON.stringify(record) + '\n');
} catch {
// Audit write failures are silent — never block command execution
}
}
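A command handler would feed the module roughly like this. The sketch is self-contained, so it inlines the JSONL append instead of importing `writeAuditEntry`; the log path and command values are illustrative:

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Illustrative path; the real module writes to .gstack/browse-audit.jsonl.
const logPath = path.join(os.tmpdir(), 'browse-audit-example.jsonl');

const start = Date.now();
// ... run a browse command here ...
const entry = {
  ts: new Date().toISOString(),
  cmd: 'snapshot',
  args: '-ic',
  origin: 'https://example.com',
  durationMs: Date.now() - start,
  status: 'ok',
  hasCookies: false,
  mode: 'launched',
};
// Append-only JSONL: one record per line, never truncated by the server.
fs.appendFileSync(logPath, JSON.stringify(entry) + '\n');
```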
+42 -9
View File
@@ -55,6 +55,9 @@ export class BrowserManager {
private dialogAutoAccept: boolean = true;
private dialogPromptText: string | null = null;
// ─── Cookie Origin Tracking ────────────────────────────────
private cookieImportedDomains: Set<string> = new Set();
// ─── Handoff State ─────────────────────────────────────────
private isHeaded: boolean = false;
private consecutiveFailures: number = 0;
@@ -127,7 +130,9 @@ export class BrowserManager {
if (fs.existsSync(path.join(candidate, 'manifest.json'))) {
return candidate;
}
} catch {}
} catch (err: any) {
if (err?.code !== 'ENOENT' && err?.code !== 'EACCES') throw err;
}
}
return null;
}
@@ -288,11 +293,16 @@ export class BrowserManager {
let origIcon = iconMatch ? iconMatch[1] : 'app';
if (!origIcon.endsWith('.icns')) origIcon += '.icns';
const destIcon = path.join(chromeResources, origIcon);
try { fs.copyFileSync(iconSrc, destIcon); } catch { /* non-fatal */ }
try {
fs.copyFileSync(iconSrc, destIcon);
} catch (err: any) {
if (err?.code !== 'ENOENT' && err?.code !== 'EACCES') throw err;
}
}
}
} catch {
// Non-fatal: app name just stays as Chrome for Testing
} catch (err: any) {
// Non-fatal: app name stays as Chrome for Testing (ENOENT/EACCES expected)
if (err?.code !== 'ENOENT' && err?.code !== 'EACCES') throw err;
}
// Build custom user agent: keep Chrome version for site compatibility,
@@ -364,7 +374,11 @@ export class BrowserManager {
const cleanup = () => {
for (const key of Object.keys(window)) {
if (key.startsWith('cdc_') || key.startsWith('__webdriver')) {
try { delete (window as any)[key]; } catch {}
try {
delete (window as any)[key];
} catch (e: any) {
if (!(e instanceof TypeError)) throw e;
}
}
}
};
@@ -446,7 +460,9 @@ export class BrowserManager {
this.activeTabId = id;
this.wirePageEvents(page);
// Inject indicator on restored page (addInitScript only fires on new navigations)
try { await page.evaluate(indicatorScript); } catch {}
try {
await page.evaluate(indicatorScript);
} catch {}
} else {
await this.newTab();
}
@@ -581,7 +597,9 @@ export class BrowserManager {
try {
const u = new URL(activeUrl);
activeOriginPath = u.origin + u.pathname;
} catch {}
} catch (err: any) {
if (!(err instanceof TypeError)) throw err;
}
for (const [id, page] of this.pages) {
try {
@@ -598,7 +616,9 @@ export class BrowserManager {
if (pu.origin + pu.pathname === activeOriginPath) {
fuzzyId = id;
}
} catch {}
} catch (err: any) {
if (!(err instanceof TypeError)) throw err;
}
}
} catch {}
}
@@ -732,6 +752,19 @@ export class BrowserManager {
return this.dialogPromptText;
}
// ─── Cookie Origin Tracking ────────────────────────────────
trackCookieImportDomains(domains: string[]): void {
for (const d of domains) this.cookieImportedDomains.add(d);
}
getCookieImportedDomains(): ReadonlySet<string> {
return this.cookieImportedDomains;
}
hasCookieImports(): boolean {
return this.cookieImportedDomains.size > 0;
}
// ─── Viewport ──────────────────────────────────────────────
async setViewport(width: number, height: number) {
await this.getPage().setViewportSize({ width, height });
@@ -1131,7 +1164,7 @@ export class BrowserManager {
await dialog.dismiss();
}
} catch {
// Dialog may have been dismissed by navigation — ignore
// Dialog may have been dismissed by navigation
}
});
+25 -34
View File
@@ -98,8 +98,9 @@ async function getOrCreateSession(page: Page): Promise<any> {
try {
await session.send('DOM.getDocument', { depth: 0 });
return session;
} catch {
// Session is stale — recreate
} catch (err: any) {
// Session is stale — recreate (CDP disconnects throw on closed/Target errors)
if (!err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('detached')) throw err;
cdpSessions.delete(page);
initializedPages.delete(page);
}
@@ -117,7 +118,9 @@ async function getOrCreateSession(page: Page): Promise<any> {
page.once('framenavigated', () => {
try {
session.detach().catch(() => {});
} catch {}
} catch (err: any) {
if (!err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('detached')) throw err;
}
cdpSessions.delete(page);
initializedPages.delete(page);
});
@@ -258,8 +261,9 @@ export async function inspectElement(
left: border[0] - margin[0],
},
};
} catch {
// Element may not have a box model (e.g., display:none)
} catch (err: any) {
// Element may not have a box model (e.g., display:none) — CDP returns "Could not compute box model"
if (!err?.message?.includes('box model') && !err?.message?.includes('Could not compute')) throw err;
}
// Get matched styles
@@ -315,10 +319,8 @@ export async function inspectElement(
if (rule.styleSheetId) {
styleSheetId = rule.styleSheetId;
try {
// Try to resolve stylesheet URL
source = rule.origin === 'regular' ? (rule.styleSheetId || 'stylesheet') : rule.origin;
} catch {}
// Resolve stylesheet source name
source = rule.origin === 'regular' ? (rule.styleSheetId || 'stylesheet') : rule.origin;
}
if (rule.style?.range) {
@@ -328,15 +330,7 @@ export async function inspectElement(
}
// Try to get a friendly source name from stylesheet
if (styleSheetId) {
try {
// Stylesheet URL might be embedded in the rule data
// CDP provides sourceURL in some cases
if (rule.style?.cssText) {
// Parse source from the styleSheetId metadata
}
} catch {}
}
// (styleSheetId metadata is available via CDP — see stylesheet URL resolution below)
// Get media query if present
let media: string | undefined;
@@ -433,15 +427,9 @@ export async function inspectElement(
}
// Resolve stylesheet URLs for better source info
for (const rule of matchedRules) {
if (rule.styleSheetId && rule.source !== 'inline') {
try {
const sheetMeta = await session.send('CSS.getStyleSheetText', { styleSheetId: rule.styleSheetId }).catch(() => null);
// Try to get the stylesheet header for URL info
// The styleSheetId itself is opaque, but we can try to get source URL
} catch {}
}
}
// Note: the per-rule CSS.getStyleSheetText call was removed; its result was never used,
// the styleSheetId is opaque, and CDP doesn't expose a direct URL lookup. Left as a
// placeholder for a future enhancement (e.g., CSS.styleSheetAdded event tracking).
return {
selector,
@@ -531,8 +519,9 @@ export async function modifyStyle(
method = 'setStyleTexts';
source = `${targetRule.source}:${targetRule.sourceLine}`;
sourceLine = targetRule.sourceLine;
} catch {
// Fall back to inline
} catch (err: any) {
// Fall back to inline — setStyleTexts fails on immutable stylesheets or stale ranges
if (!err?.message?.includes('style') && !err?.message?.includes('range') && !err?.message?.includes('closed') && !err?.message?.includes('Target')) throw err;
}
}
@@ -591,8 +580,9 @@ export async function undoModification(page: Page, index?: number): Promise<void
await modifyStyle(page, mod.selector, mod.property, mod.oldValue);
// Remove the undo modification from history (it's a restore, not a new mod)
modificationHistory.pop();
} catch {
// Fall back to inline restore
} catch (err: any) {
// Fall back to inline restore — CDP may have disconnected or stylesheet changed
if (!err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('style') && !err?.message?.includes('not found') && !err?.message?.includes('Element')) throw err;
await page.evaluate(
([sel, prop, val]) => {
const el = document.querySelector(sel);
@@ -652,8 +642,9 @@ export async function resetModifications(page: Page): Promise<void> {
},
[mod.selector, mod.property, mod.oldValue]
);
} catch {
// Best effort
} catch (err: any) {
// Best effort — page may have navigated or element may be gone
if (!err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('Execution context')) throw err;
}
}
modificationHistory.length = 0;
@@ -757,7 +748,7 @@ export function detachSession(page?: Page): void {
if (page) {
const session = cdpSessions.get(page);
if (session) {
try { session.detach().catch(() => {}); } catch {}
try { session.detach().catch(() => {}); } catch (err: any) { if (!err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('detached')) throw err; }
cdpSessions.delete(page);
initializedPages.delete(page);
}
+34 -45
View File
@@ -11,6 +11,7 @@
import * as fs from 'fs';
import * as path from 'path';
import { safeUnlink, safeUnlinkQuiet, safeKill, isProcessAlive } from './error-handling';
import { resolveConfig, ensureStateDir, readVersionHash } from './config';
const config = resolveConfig();
@@ -103,27 +104,7 @@ function readState(): ServerState | null {
}
}
function isProcessAlive(pid: number): boolean {
if (IS_WINDOWS) {
// Bun's compiled binary can't signal Windows PIDs (always throws ESRCH).
// Use tasklist as a fallback. Only for one-shot calls — too slow for polling loops.
try {
const result = Bun.spawnSync(
['tasklist', '/FI', `PID eq ${pid}`, '/NH', '/FO', 'CSV'],
{ stdout: 'pipe', stderr: 'pipe', timeout: 3000 }
);
return result.stdout.toString().includes(`"${pid}"`);
} catch {
return false;
}
}
try {
process.kill(pid, 0);
return true;
} catch {
return false;
}
}
// isProcessAlive is imported from ./error-handling
/**
 * HTTP health check: definitive proof the server is alive and responsive.
@@ -153,7 +134,9 @@ async function killServer(pid: number): Promise<void> {
['taskkill', '/PID', String(pid), '/T', '/F'],
{ stdout: 'pipe', stderr: 'pipe', timeout: 5000 }
);
} catch {}
} catch (err: any) {
if (err?.code !== 'ENOENT') throw err;
}
const deadline = Date.now() + 2000;
while (Date.now() < deadline && isProcessAlive(pid)) {
await Bun.sleep(100);
@@ -161,7 +144,7 @@ async function killServer(pid: number): Promise<void> {
return;
}
try { process.kill(pid, 'SIGTERM'); } catch { return; }
safeKill(pid, 'SIGTERM');
// Wait up to 2s for graceful shutdown
const deadline = Date.now() + 2000;
@@ -171,7 +154,7 @@ async function killServer(pid: number): Promise<void> {
// Force kill if still alive
if (isProcessAlive(pid)) {
try { process.kill(pid, 'SIGKILL'); } catch {}
safeKill(pid, 'SIGKILL');
}
}
@@ -197,10 +180,10 @@ function cleanupLegacyState(): void {
});
const cmd = check.stdout.toString().trim();
if (cmd.includes('bun') || cmd.includes('server.ts')) {
try { process.kill(data.pid, 'SIGTERM'); } catch {}
safeKill(data.pid, 'SIGTERM');
}
}
fs.unlinkSync(fullPath);
safeUnlink(fullPath);
} catch {
// Best effort — skip files we can't parse or clean up
}
@@ -210,7 +193,7 @@ function cleanupLegacyState(): void {
f.startsWith('browse-console') || f.startsWith('browse-network') || f.startsWith('browse-dialog')
);
for (const file of logFiles) {
try { fs.unlinkSync(`/tmp/${file}`); } catch {}
safeUnlink(`/tmp/${file}`);
}
} catch {
// /tmp read failed — skip legacy cleanup
@@ -222,8 +205,8 @@ async function startServer(extraEnv?: Record<string, string>): Promise<ServerSta
ensureStateDir(config);
// Clean up stale state file and error log
try { fs.unlinkSync(config.stateFile); } catch {}
try { fs.unlinkSync(path.join(config.stateDir, 'browse-startup-error.log')); } catch {}
safeUnlink(config.stateFile);
safeUnlink(path.join(config.stateDir, 'browse-startup-error.log'));
let proc: any = null;
@@ -297,7 +280,7 @@ function acquireServerLock(): (() => void) | null {
const fd = fs.openSync(lockPath, 'wx');
fs.writeSync(fd, `${process.pid}\n`);
fs.closeSync(fd);
return () => { try { fs.unlinkSync(lockPath); } catch {} };
return () => { safeUnlink(lockPath); };
} catch {
// Lock already held — check if the holder is still alive
try {
@@ -469,7 +452,9 @@ function isNgrokAvailable(): boolean {
try {
const content = fs.readFileSync(conf, 'utf-8');
if (content.includes('authtoken:')) return true;
} catch {}
} catch (err: any) {
if (err?.code !== 'ENOENT') throw err;
}
}
return false;
@@ -797,10 +782,10 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
// Kill ANY existing server (SIGTERM → wait 2s → SIGKILL)
if (existingState && isProcessAlive(existingState.pid)) {
try { process.kill(existingState.pid, 'SIGTERM'); } catch {}
safeKill(existingState.pid, 'SIGTERM');
await new Promise(resolve => setTimeout(resolve, 2000));
if (isProcessAlive(existingState.pid)) {
try { process.kill(existingState.pid, 'SIGKILL'); } catch {}
safeKill(existingState.pid, 'SIGKILL');
await new Promise(resolve => setTimeout(resolve, 1000));
}
}
@@ -814,24 +799,24 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
const lockTarget = fs.readlinkSync(singletonLock); // e.g. "hostname-12345"
const orphanPid = parseInt(lockTarget.split('-').pop() || '', 10);
if (orphanPid && isProcessAlive(orphanPid)) {
try { process.kill(orphanPid, 'SIGTERM'); } catch {}
safeKill(orphanPid, 'SIGTERM');
await new Promise(resolve => setTimeout(resolve, 1000));
if (isProcessAlive(orphanPid)) {
try { process.kill(orphanPid, 'SIGKILL'); } catch {}
safeKill(orphanPid, 'SIGKILL');
await new Promise(resolve => setTimeout(resolve, 500));
}
}
} catch {
// No lock symlink or not readable — nothing to kill
} catch (err: any) {
if (err?.code !== 'ENOENT' && err?.code !== 'EINVAL') throw err;
}
// Clean up Chromium profile locks (can persist after crashes)
for (const lockFile of ['SingletonLock', 'SingletonSocket', 'SingletonCookie']) {
try { fs.unlinkSync(path.join(profileDir, lockFile)); } catch {}
safeUnlinkQuiet(path.join(profileDir, lockFile));
}
// Delete stale state file
try { fs.unlinkSync(config.stateFile); } catch {}
safeUnlinkQuiet(config.stateFile);
console.log('Launching headed Chromium with extension + sidebar agent...');
try {
@@ -877,7 +862,9 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
try {
fs.mkdirSync(path.dirname(agentQueue), { recursive: true, mode: 0o700 });
fs.writeFileSync(agentQueue, '', { mode: 0o600 });
} catch {}
} catch (err: any) {
if (err?.code !== 'EACCES') throw err;
}
// Resolve browse binary path the same way — execPath-relative
let browseBin = path.resolve(__dirname, '..', 'dist', 'browse');
@@ -891,7 +878,9 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
try {
const { spawnSync } = require('child_process');
spawnSync('pkill', ['-f', 'sidebar-agent\\.ts'], { stdio: 'ignore', timeout: 3000 });
} catch {}
} catch (err: any) {
if (err?.code !== 'ENOENT') throw err;
}
const agentProc = Bun.spawn(['bun', 'run', agentScript], {
cwd: config.projectDir,
@@ -947,18 +936,18 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
}
// Force kill + cleanup
if (isProcessAlive(existingState.pid)) {
try { process.kill(existingState.pid, 'SIGTERM'); } catch {}
safeKill(existingState.pid, 'SIGTERM');
await new Promise(resolve => setTimeout(resolve, 2000));
if (isProcessAlive(existingState.pid)) {
try { process.kill(existingState.pid, 'SIGKILL'); } catch {}
safeKill(existingState.pid, 'SIGKILL');
}
}
// Clean profile locks and state file
const profileDir = path.join(process.env.HOME || '/tmp', '.gstack', 'chromium-profile');
for (const lockFile of ['SingletonLock', 'SingletonSocket', 'SingletonCookie']) {
try { fs.unlinkSync(path.join(profileDir, lockFile)); } catch {}
safeUnlinkQuiet(path.join(profileDir, lockFile));
}
try { fs.unlinkSync(config.stateFile); } catch {}
safeUnlinkQuiet(config.stateFile);
console.log('Disconnected (server was unresponsive — force cleaned).');
process.exit(0);
}
+4
View File
@@ -40,6 +40,7 @@ export const META_COMMANDS = new Set([
'watch',
'state',
'frame',
'ux-audit',
]);
export const ALL_COMMANDS = new Set([...READ_COMMANDS, ...WRITE_COMMANDS, ...META_COMMANDS]);
@@ -49,6 +50,7 @@ export const PAGE_CONTENT_COMMANDS = new Set([
'text', 'html', 'links', 'forms', 'accessibility', 'attrs',
'console', 'dialog',
'media', 'data',
'ux-audit',
]);
/** Wrap output from untrusted-content commands with trust boundary markers */
@@ -146,6 +148,8 @@ export const COMMAND_DESCRIPTIONS: Record<string, { category: string; descriptio
'style': { category: 'Interaction', description: 'Modify CSS property on element (with undo support)', usage: 'style <sel> <prop> <value> | style --undo [N]' },
'cleanup': { category: 'Interaction', description: 'Remove page clutter (ads, cookie banners, sticky elements, social widgets)', usage: 'cleanup [--ads] [--cookies] [--sticky] [--social] [--all]' },
'prettyscreenshot': { category: 'Visual', description: 'Clean screenshot with optional cleanup, scroll positioning, and element hiding', usage: 'prettyscreenshot [--scroll-to sel|text] [--cleanup] [--hide sel...] [--width px] [path]' },
// UX Audit
'ux-audit': { category: 'Inspection', description: 'Extract page structure for UX behavioral analysis — site ID, nav, headings, text blocks, interactive elements. Returns JSON for agent interpretation.', usage: 'ux-audit' },
};
// Load-time validation: descriptions must cover exactly the command sets
+2
View File
@@ -20,6 +20,7 @@ export interface BrowseConfig {
consoleLog: string;
networkLog: string;
dialogLog: string;
auditLog: string;
}
/**
@@ -70,6 +71,7 @@ export function resolveConfig(
consoleLog: path.join(stateDir, 'browse-console.log'),
networkLog: path.join(stateDir, 'browse-network.log'),
dialogLog: path.join(stateDir, 'browse-dialog.log'),
auditLog: path.join(stateDir, 'browse-audit.jsonl'),
};
}
+2 -2
View File
@@ -85,7 +85,7 @@ const ARIA_INJECTION_PATTERNS = [
* - ARIA labels with injection patterns
*/
export async function markHiddenElements(page: Page | Frame): Promise<string[]> {
return await page.evaluate((ariaPatterns: string[]) => {
return page.evaluate((ariaPatterns: string[]) => {
const found: string[] = [];
const elements = document.querySelectorAll('body *');
@@ -167,7 +167,7 @@ export async function markHiddenElements(page: Page | Frame): Promise<string[]>
* Uses clone + remove approach: clones body, removes marked elements, returns innerText.
*/
export async function getCleanTextWithStripping(page: Page | Frame): Promise<string> {
return await page.evaluate(() => {
return page.evaluate(() => {
const body = document.body;
if (!body) return '';
const clone = body.cloneNode(true) as HTMLElement;
+2 -1
View File
@@ -386,7 +386,8 @@ function openDb(dbPath: string, browserName: string): Database {
}
function openDbFromCopy(dbPath: string, browserName: string): Database {
const tmpPath = `/tmp/browse-cookies-${browserName.toLowerCase()}-${crypto.randomUUID()}.db`;
// Use os.tmpdir() instead of hardcoded /tmp for cross-platform support (#708)
const tmpPath = path.join(os.tmpdir(), `browse-cookies-${browserName.toLowerCase()}-${crypto.randomUUID()}.db`);
try {
fs.copyFileSync(dbPath, tmpPath);
// Also copy WAL and SHM if they exist (for consistent reads)
+58
View File
@@ -0,0 +1,58 @@
/**
* Shared error-handling utilities for browse server and CLI.
*
* Each wrapper uses selective catches (checks err.code) to avoid masking
* unexpected errors. Empty catches would be flagged by slop-scan.
*/
import * as fs from 'fs';
const IS_WINDOWS = process.platform === 'win32';
// ─── Filesystem ────────────────────────────────────────────────
/** Remove a file, ignoring ENOENT (already gone). Rethrows other errors. */
export function safeUnlink(filePath: string): void {
try {
fs.unlinkSync(filePath);
} catch (err: any) {
if (err?.code !== 'ENOENT') throw err;
}
}
/** Remove a file, ignoring ALL errors. Use only in best-effort cleanup (shutdown, emergency). */
export function safeUnlinkQuiet(filePath: string): void {
try { fs.unlinkSync(filePath); } catch {}
}
// ─── Process ───────────────────────────────────────────────────
/** Send a signal to a process, ignoring ESRCH (already dead). Rethrows other errors. */
export function safeKill(pid: number, signal: NodeJS.Signals | number): void {
try {
process.kill(pid, signal);
} catch (err: any) {
if (err?.code !== 'ESRCH') throw err;
}
}
/** Check if a PID is alive. Pure boolean probe — returns false for ALL errors. */
export function isProcessAlive(pid: number): boolean {
if (IS_WINDOWS) {
try {
const result = Bun.spawnSync(
['tasklist', '/FI', `PID eq ${pid}`, '/NH', '/FO', 'CSV'],
{ stdout: 'pipe', stderr: 'pipe', timeout: 3000 }
);
return result.stdout.toString().includes(`"${pid}"`);
} catch {
return false;
}
}
try {
process.kill(pid, 0);
return true;
} catch {
return false;
}
}
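The `safeUnlink` contract (swallow ENOENT, rethrow everything else) can be demonstrated standalone; this sketch re-declares the helper so it runs without the module:

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Same body as the module's safeUnlink: only "already gone" is ignored.
function safeUnlink(filePath: string): void {
  try {
    fs.unlinkSync(filePath);
  } catch (err: any) {
    if (err?.code !== 'ENOENT') throw err;
  }
}

const p = path.join(os.tmpdir(), `safe-unlink-demo-${process.pid}.tmp`);
fs.writeFileSync(p, 'x');
safeUnlink(p); // removes the file
safeUnlink(p); // second call is a no-op: ENOENT is swallowed
```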
+126 -10
View File
@@ -248,8 +248,9 @@ export async function handleMetaCommand(
try {
commands = JSON.parse(jsonStr);
if (!Array.isArray(commands)) throw new Error('not array');
} catch {
} catch (err: any) {
// Fallback: pipe-delimited format "goto url | click @e5 | snapshot -ic"
if (!(err instanceof SyntaxError) && err?.message !== 'not array') throw err;
commands = jsonStr.split(' | ')
.filter(seg => seg.trim().length > 0)
.map(seg => tokenizePipeSegment(seg.trim()));
@@ -291,7 +292,7 @@ export async function handleMetaCommand(
} else {
// Parse error from JSON result
let errMsg = cr.result;
try { errMsg = JSON.parse(cr.result).error || cr.result; } catch {}
try { errMsg = JSON.parse(cr.result).error || cr.result; } catch (err: any) { if (!(err instanceof SyntaxError)) throw err; }
results.push(`[${name}] ERROR: ${errMsg}`);
}
lastWasWrite = WRITE_COMMANDS.has(name);
@@ -431,8 +432,9 @@ export async function handleMetaCommand(
execSync(`osascript -e 'tell application "${appName}" to activate'`, { stdio: 'pipe', timeout: 3000 });
activated = true;
break;
} catch {
// Try next browser
} catch (err: any) {
// Try next browser — osascript fails if app not found or AppleScript errors
if (err?.status === undefined && !err?.message?.includes('Command failed')) throw err;
}
}
@@ -448,8 +450,9 @@ export async function handleMetaCommand(
await resolved.locator.scrollIntoViewIfNeeded({ timeout: 5000 });
return `Browser activated. Scrolled ${args[0]} into view.`;
}
} catch {
// Ref not found — still activated the browser
} catch (err: any) {
// Ref not found or element gone — still activated the browser
if (!err?.message?.includes('not found') && !err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('timeout')) throw err;
}
}
@@ -491,7 +494,9 @@ export async function handleMetaCommand(
let gitRoot: string;
try {
gitRoot = execSync('git rev-parse --show-toplevel', { encoding: 'utf-8', stdio: ['pipe', 'pipe', 'pipe'] }).trim();
} catch {
} catch (err: any) {
// execSync throws with exit status on non-git directories
if (err?.status === undefined && !err?.message?.includes('Command failed')) throw err;
return 'Not in a git repository — cannot locate inbox.';
}
@@ -514,8 +519,9 @@ export async function handleMetaCommand(
url: data.page?.url || 'unknown',
userMessage: data.userMessage || '',
});
} catch {
// Skip malformed files
} catch (err: any) {
// Skip malformed JSON or unreadable files
if (!(err instanceof SyntaxError) && err?.code !== 'ENOENT' && err?.code !== 'EACCES') throw err;
}
}
@@ -537,7 +543,7 @@ export async function handleMetaCommand(
// Handle --clear flag
if (args.includes('--clear')) {
for (const file of files) {
try { fs.unlinkSync(path.join(inboxDir, file)); } catch {}
try { fs.unlinkSync(path.join(inboxDir, file)); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; }
}
lines.push(`Cleared ${files.length} message${files.length === 1 ? '' : 's'}.`);
}
@@ -647,6 +653,116 @@ export async function handleMetaCommand(
return `Switched to frame: ${frame.url()}`;
}
// ─── UX Audit ─────────────────────────────────────
case 'ux-audit': {
const page = bm.getPage();
// Extract page structure for UX behavioral analysis
// Agent interprets the data and applies Krug's 6 usability tests
// Uses textContent (not innerText) to avoid layout computation on large DOMs
const data = await page.evaluate(() => {
const HEADING_CAP = 50;
const INTERACTIVE_CAP = 200;
const TEXT_BLOCK_CAP = 50;
// Site ID: logo or brand element
const logoEl = document.querySelector('[class*="logo"], [id*="logo"], header img, [aria-label*="home"], a[href="/"]');
const siteId = logoEl ? {
found: true,
text: (logoEl.textContent || '').trim().slice(0, 100),
tag: logoEl.tagName,
alt: (logoEl as HTMLImageElement).alt || null,
} : { found: false, text: null, tag: null, alt: null };
// Page name: main heading
const h1 = document.querySelector('h1');
const pageName = h1 ? {
found: true,
text: h1.textContent?.trim().slice(0, 200) || '',
} : { found: false, text: null };
// Navigation: primary nav elements
const navEls = document.querySelectorAll('nav, [role="navigation"]');
const navItems: Array<{ text: string; links: number }> = [];
navEls.forEach((nav, i) => {
if (i >= 5) return;
const links = nav.querySelectorAll('a');
navItems.push({
text: (nav.getAttribute('aria-label') || `nav-${i}`).slice(0, 50),
links: links.length,
});
});
// "You are here" indicator: current/active nav items
// Scoped to nav containers to avoid false positives from animation classes
const activeNavItems = document.querySelectorAll('nav [aria-current], nav .active, nav .current, [role="navigation"] [aria-current], [role="navigation"] .active, [role="navigation"] .current');
const youAreHere = Array.from(activeNavItems).slice(0, 5).map(el => ({
text: (el.textContent || '').trim().slice(0, 50),
tag: el.tagName,
}));
// Search: search box presence
const searchEl = document.querySelector('input[type="search"], [role="search"], input[name*="search"], input[placeholder*="search" i], input[aria-label*="search" i]');
const search = { found: !!searchEl };
// Breadcrumbs
const breadcrumbEl = document.querySelector('[aria-label*="breadcrumb" i], .breadcrumb, .breadcrumbs, [class*="breadcrumb"]');
const breadcrumbs = breadcrumbEl ? {
found: true,
items: Array.from(breadcrumbEl.querySelectorAll('a, span, li')).slice(0, 10).map(el => (el.textContent || '').trim().slice(0, 30)),
} : { found: false, items: [] };
// Headings: heading hierarchy
const headings = Array.from(document.querySelectorAll('h1,h2,h3,h4,h5,h6')).slice(0, HEADING_CAP).map(h => ({
tag: h.tagName,
text: (h.textContent || '').trim().slice(0, 80),
size: getComputedStyle(h).fontSize,
}));
// Interactive elements: buttons, links, inputs
const interactiveEls = Array.from(document.querySelectorAll('a, button, input, select, textarea, [role="button"], [tabindex]')).slice(0, INTERACTIVE_CAP);
const interactive = interactiveEls.map(el => {
const rect = el.getBoundingClientRect();
return {
tag: el.tagName,
text: (el.textContent || (el as HTMLInputElement).placeholder || '').trim().slice(0, 50),
type: (el as HTMLInputElement).type || null,
role: el.getAttribute('role'),
w: Math.round(rect.width),
h: Math.round(rect.height),
visible: rect.width > 0 && rect.height > 0,
};
}).filter(el => el.visible);
// Text blocks: paragraphs and large text areas
const textBlocks = Array.from(document.querySelectorAll('p, [class*="description"], [class*="intro"], [class*="welcome"], [class*="hero"] p, main p')).slice(0, TEXT_BLOCK_CAP).map(el => ({
text: (el.textContent || '').trim().slice(0, 200),
wordCount: (el.textContent || '').trim().split(/\s+/).filter(Boolean).length,
}));
// Total visible text word count (textContent avoids layout computation)
const bodyText = (document.body?.textContent || '').trim();
const totalWords = bodyText.split(/\s+/).filter(Boolean).length;
return {
url: window.location.href,
title: document.title,
siteId,
pageName,
navigation: navItems,
youAreHere,
search,
breadcrumbs,
headings,
interactive,
textBlocks,
totalWords,
};
});
return JSON.stringify(data, null, 2);
}
default:
throw new Error(`Unknown meta command: ${command}`);
}
+20 -1
@@ -33,7 +33,26 @@ const TEMP_ONLY = [TEMP_DIR].map(d => {
export function validateOutputPath(filePath: string): void {
const resolved = path.resolve(filePath);
// Resolve real path of the parent directory to catch symlinks.
// If the target already exists and is a symlink, resolve through it.
// Without this, a symlink at /tmp/evil.png → /etc/crontab passes the
// parent-directory check (parent is /tmp, which is safe) but the actual
// write follows the symlink to /etc/crontab.
try {
const stat = fs.lstatSync(resolved);
if (stat.isSymbolicLink()) {
const realTarget = fs.realpathSync(resolved);
const isSafe = SAFE_DIRECTORIES.some(dir => isPathWithin(realTarget, dir));
if (!isSafe) {
throw new Error(`Path must be within: ${SAFE_DIRECTORIES.join(', ')}`);
}
return; // symlink target verified, no need to check parent
}
} catch (e: any) {
// ENOENT = file doesn't exist yet, fall through to parent-dir check
if (e.code !== 'ENOENT') throw e;
}
// For new files (no existing symlink), verify the parent directory.
// The file itself may not exist yet (e.g., screenshot output).
// This also handles macOS /tmp → /private/tmp transparently.
let dir = path.dirname(resolved);
+45 -6
@@ -6,6 +6,7 @@
*/
import type { TabSession } from './tab-session';
import type { BrowserManager } from './browser-manager';
import { consoleBuffer, networkBuffer, dialogBuffer } from './buffers';
import type { Page, Frame } from 'playwright';
import * as fs from 'fs';
@@ -49,7 +50,7 @@ function wrapForEvaluate(code: string): string {
* Exported for DRY reuse in meta-commands (diff).
*/
export async function getCleanText(page: Page | Frame): Promise<string> {
return await page.evaluate(() => {
return page.evaluate(() => {
const body = document.body;
if (!body) return '';
const clone = body.cloneNode(true) as HTMLElement;
@@ -62,10 +63,43 @@ export async function getCleanText(page: Page | Frame): Promise<string> {
});
}
/**
* When cookies have been imported for specific domains, block JS execution
* on pages whose origin doesn't match any imported cookie domain.
* Prevents cross-origin cookie exfiltration via `js document.cookie` or
* similar when the agent navigates to an untrusted page.
*/
function assertJsOriginAllowed(bm: BrowserManager, pageUrl: string): void {
if (!bm.hasCookieImports()) return;
let hostname: string;
try {
hostname = new URL(pageUrl).hostname;
} catch {
return; // unparseable URL: allow (no cookies at risk)
}
if (!hostname) return; // about:blank and data: URIs parse with an empty hostname: allow (no cookies at risk)
const importedDomains = bm.getCookieImportedDomains();
const allowed = [...importedDomains].some(domain => {
// Exact match or subdomain match (e.g., ".github.com" matches "api.github.com")
const normalized = domain.startsWith('.') ? domain : '.' + domain;
return hostname === domain.replace(/^\./, '') || hostname.endsWith(normalized);
});
if (!allowed) {
throw new Error(
`JS execution blocked: current page (${hostname}) does not match any cookie-imported domain. ` +
`Imported cookies for: ${[...importedDomains].join(', ')}. ` +
`This prevents cross-origin cookie exfiltration. Navigate to an imported domain or run without imported cookies.`
);
}
}
export async function handleReadCommand(
command: string,
args: string[],
session: TabSession
session: TabSession,
bm?: BrowserManager,
): Promise<string> {
const page = session.getPage();
// Frame-aware target for content extraction
@@ -73,7 +107,7 @@ export async function handleReadCommand(
switch (command) {
case 'text': {
return await getCleanText(target);
return getCleanText(target);
}
case 'html': {
@@ -81,9 +115,9 @@ export async function handleReadCommand(
if (selector) {
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
return await resolved.locator.innerHTML({ timeout: 5000 });
return resolved.locator.innerHTML({ timeout: 5000 });
}
return await target.locator(resolved.selector).innerHTML({ timeout: 5000 });
return target.locator(resolved.selector).innerHTML({ timeout: 5000 });
}
// page.content() is page-only; use evaluate for frame compat
const doctype = await target.evaluate(() => {
@@ -116,7 +150,10 @@ export async function handleReadCommand(
id: input.id || undefined,
placeholder: input.placeholder || undefined,
required: input.required || undefined,
value: input.type === 'password' ? '[redacted]' : (input.value || undefined),
value: input.type === 'password'
|| (input.name && /(^|[_.-])(token|secret|key|password|credential|auth|jwt|session|csrf|sid)($|[_.-])|api.?key/i.test(input.name))
|| (input.id && /(^|[_.-])(token|secret|key|password|credential|auth|jwt|session|csrf|sid)($|[_.-])|api.?key/i.test(input.id))
? '[redacted]' : (input.value || undefined),
options: el.tagName === 'SELECT'
? [...(el as HTMLSelectElement).options].map(o => ({ value: o.value, text: o.text }))
: undefined,
@@ -142,6 +179,7 @@ export async function handleReadCommand(
case 'js': {
const expr = args[0];
if (!expr) throw new Error('Usage: browse js <expression>');
if (bm) assertJsOriginAllowed(bm, page.url());
const wrapped = wrapForEvaluate(expr);
const result = await target.evaluate(wrapped);
return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? '');
@@ -150,6 +188,7 @@ export async function handleReadCommand(
case 'eval': {
const filePath = args[0];
if (!filePath) throw new Error('Usage: browse eval <js-file>');
if (bm) assertJsOriginAllowed(bm, page.url());
validateReadPath(filePath);
if (!fs.existsSync(filePath)) throw new Error(`File not found: ${filePath}`);
const code = fs.readFileSync(filePath, 'utf-8');
+68 -47
@@ -35,9 +35,11 @@ import {
import { validateTempPath } from './path-security';
import { resolveConfig, ensureStateDir, readVersionHash } from './config';
import { emitActivity, subscribe, getActivityAfter, getActivityHistory, getSubscriberCount } from './activity';
import { initAuditLog, writeAuditEntry } from './audit';
import { inspectElement, modifyStyle, resetModifications, getModificationHistory, detachSession, type InspectorResult } from './cdp-inspector';
// Bun.spawn used instead of child_process.spawn (compiled bun binaries
// fail posix_spawn on all executables including /bin/bash)
import { safeUnlink, safeUnlinkQuiet, safeKill } from './error-handling';
import * as fs from 'fs';
import * as net from 'net';
import * as path from 'path';
@@ -46,6 +48,7 @@ import * as crypto from 'crypto';
// ─── Config ─────────────────────────────────────────────────────
const config = resolveConfig();
ensureStateDir(config);
initAuditLog(config.auditLog);
// ─── Auth ───────────────────────────────────────────────────────
const AUTH_TOKEN = crypto.randomUUID();
@@ -233,7 +236,9 @@ function findBrowseBin(): string {
path.join(process.env.HOME || '', '.claude', 'skills', 'gstack', 'browse', 'dist', 'browse'),
];
for (const c of candidates) {
try { if (fs.existsSync(c)) return c; } catch {}
try { if (fs.existsSync(c)) return c; } catch (err: any) {
if (err?.code !== 'ENOENT') throw err;
}
}
return 'browse'; // fallback to PATH
}
@@ -265,13 +270,17 @@ function findClaudeBin(): string | null {
const p = proc.stdout.toString().trim();
if (p) candidates.unshift(p);
}
} catch {}
} catch (err: any) {
if (err?.code !== 'ENOENT') throw err;
}
for (const c of candidates) {
try {
if (!fs.existsSync(c)) continue;
// Resolve symlinks — posix_spawn can fail on symlinks in compiled bun binaries
return fs.realpathSync(c);
} catch {}
} catch (err: any) {
if (err?.code !== 'ENOENT') throw err;
}
}
return null;
}
@@ -465,8 +474,8 @@ function listSessions(): Array<SidebarSession & { chatLines: number }> {
try {
const session = JSON.parse(fs.readFileSync(path.join(SESSIONS_DIR, d, 'session.json'), 'utf-8'));
let chatLines = 0;
try { chatLines = fs.readFileSync(path.join(SESSIONS_DIR, d, 'chat.jsonl'), 'utf-8').split('\n').filter(Boolean).length; } catch {
// Expected: no chat file yet
try { chatLines = fs.readFileSync(path.join(SESSIONS_DIR, d, 'chat.jsonl'), 'utf-8').split('\n').filter(Boolean).length; } catch (err: any) {
if (err?.code !== 'ENOENT') throw err;
}
return { ...session, chatLines };
} catch { return null; }
@@ -602,7 +611,9 @@ function spawnClaude(userMessage: string, extensionUrl?: string | null, forTabId
try {
fs.mkdirSync(gstackDir, { recursive: true, mode: 0o700 });
fs.appendFileSync(agentQueue, entry + '\n');
try { fs.chmodSync(agentQueue, 0o600); } catch {}
try { fs.chmodSync(agentQueue, 0o600); } catch (err: any) {
if (err?.code !== 'ENOENT') throw err;
}
} catch (err: any) {
addChatEntry({ ts: new Date().toISOString(), role: 'agent', type: 'agent_error', error: `Failed to queue: ${err.message}` });
agentStatus = 'idle';
@@ -617,12 +628,11 @@ function spawnClaude(userMessage: string, extensionUrl?: string | null, forTabId
function killAgent(targetTabId?: number | null): void {
if (agentProcess) {
try { agentProcess.kill('SIGTERM'); } catch (err: any) {
console.warn('[browse] Failed to SIGTERM agent:', err.message);
const pid = agentProcess.pid;
if (pid) {
safeKill(pid, 'SIGTERM');
setTimeout(() => { safeKill(pid, 'SIGKILL'); }, 3000);
}
setTimeout(() => { try { agentProcess?.kill('SIGKILL'); } catch (err: any) {
console.warn('[browse] Failed to SIGKILL agent:', err.message);
} }, 3000);
}
// Signal the sidebar-agent worker to cancel via a per-tab cancel file.
// Using per-tab files prevents race conditions where one agent's cancel
@@ -631,7 +641,12 @@ function killAgent(targetTabId?: number | null): void {
const cancelDir = path.join(process.env.HOME || '/tmp', '.gstack');
const tabId = targetTabId ?? agentTabId ?? 0;
const cancelFile = path.join(cancelDir, `sidebar-agent-cancel-${tabId}`);
try { fs.writeFileSync(cancelFile, Date.now().toString()); } catch {}
try {
fs.mkdirSync(cancelDir, { recursive: true });
fs.writeFileSync(cancelFile, Date.now().toString());
} catch (err: any) {
if (err?.code !== 'EACCES' && err?.code !== 'ENOENT') throw err;
}
agentProcess = null;
agentStartTime = null;
currentMessage = null;
@@ -1000,7 +1015,7 @@ async function handleCommandInternal(
await cleanupHiddenMarkers(page);
}
} else {
result = await handleReadCommand(command, args, session);
result = await handleReadCommand(command, args, session, browserManager);
}
} else if (WRITE_COMMANDS.has(command)) {
result = await handleWriteCommand(command, args, session, browserManager);
@@ -1075,13 +1090,14 @@ async function handleCommandInternal(
}
// Activity: emit command_end (skipped for chain subcommands)
const successDuration = Date.now() - startTime;
if (!opts?.skipActivity) {
emitActivity({
type: 'command_end',
command,
args,
url: browserManager.getCurrentUrl(),
duration: Date.now() - startTime,
duration: successDuration,
status: 'ok',
result: result,
tabs: browserManager.getTabCount(),
@@ -1090,6 +1106,17 @@ async function handleCommandInternal(
});
}
writeAuditEntry({
ts: new Date().toISOString(),
cmd: command,
args: args.join(' '),
origin: browserManager.getCurrentUrl(),
durationMs: successDuration,
status: 'ok',
hasCookies: browserManager.hasCookieImports(),
mode: browserManager.getConnectionMode(),
});
browserManager.resetFailures();
// Restore original active tab if we pinned to a specific one
if (savedTabId !== null) {
@@ -1107,13 +1134,14 @@ async function handleCommandInternal(
}
// Activity: emit command_end (error) — skipped for chain subcommands
const errorDuration = Date.now() - startTime;
if (!opts?.skipActivity) {
emitActivity({
type: 'command_end',
command,
args,
url: browserManager.getCurrentUrl(),
duration: Date.now() - startTime,
duration: errorDuration,
status: 'error',
error: err.message,
tabs: browserManager.getTabCount(),
@@ -1122,6 +1150,18 @@ async function handleCommandInternal(
});
}
writeAuditEntry({
ts: new Date().toISOString(),
cmd: command,
args: args.join(' '),
origin: browserManager.getCurrentUrl(),
durationMs: errorDuration,
status: 'error',
error: err.message,
hasCookies: browserManager.hasCookieImports(),
mode: browserManager.getConnectionMode(),
});
browserManager.incrementFailures();
let errorMsg = wrapError(err);
const hint = browserManager.getFailureHint();
@@ -1175,15 +1215,11 @@ async function shutdown() {
// Clean up Chromium profile locks (prevent SingletonLock on next launch)
const profileDir = path.join(process.env.HOME || '/tmp', '.gstack', 'chromium-profile');
for (const lockFile of ['SingletonLock', 'SingletonSocket', 'SingletonCookie']) {
try { fs.unlinkSync(path.join(profileDir, lockFile)); } catch (err: any) {
console.debug('[browse] Lock cleanup:', lockFile, err.message);
}
safeUnlinkQuiet(path.join(profileDir, lockFile));
}
// Clean up state file
try { fs.unlinkSync(config.stateFile); } catch (err: any) {
console.debug('[browse] State file cleanup:', err.message);
}
safeUnlinkQuiet(config.stateFile);
process.exit(0);
}
@@ -1195,9 +1231,7 @@ process.on('SIGINT', shutdown);
// Defense-in-depth — primary cleanup is the CLI's stale-state detection via health check.
if (process.platform === 'win32') {
process.on('exit', () => {
try { fs.unlinkSync(config.stateFile); } catch {
// Best-effort on exit
}
safeUnlinkQuiet(config.stateFile);
});
}
@@ -1216,13 +1250,9 @@ function emergencyCleanup() {
// Clean Chromium profile locks
const profileDir = path.join(process.env.HOME || '/tmp', '.gstack', 'chromium-profile');
for (const lockFile of ['SingletonLock', 'SingletonSocket', 'SingletonCookie']) {
try { fs.unlinkSync(path.join(profileDir, lockFile)); } catch (err: any) {
console.debug('[browse] Emergency lock cleanup:', lockFile, err.message);
}
}
try { fs.unlinkSync(config.stateFile); } catch (err: any) {
console.debug('[browse] Emergency state cleanup:', err.message);
safeUnlinkQuiet(path.join(profileDir, lockFile));
}
safeUnlinkQuiet(config.stateFile);
}
process.on('uncaughtException', (err) => {
console.error('[browse] FATAL uncaught exception:', err.message);
@@ -1238,15 +1268,9 @@ process.on('unhandledRejection', (err: any) => {
// ─── Start ─────────────────────────────────────────────────────
async function start() {
// Clear old log files
try { fs.unlinkSync(CONSOLE_LOG_PATH); } catch (err: any) {
if (err.code !== 'ENOENT') console.debug('[browse] Log cleanup console:', err.message);
}
try { fs.unlinkSync(NETWORK_LOG_PATH); } catch (err: any) {
if (err.code !== 'ENOENT') console.debug('[browse] Log cleanup network:', err.message);
}
try { fs.unlinkSync(DIALOG_LOG_PATH); } catch (err: any) {
if (err.code !== 'ENOENT') console.debug('[browse] Log cleanup dialog:', err.message);
}
safeUnlink(CONSOLE_LOG_PATH);
safeUnlink(NETWORK_LOG_PATH);
safeUnlink(DIALOG_LOG_PATH);
const port = await findPort();
@@ -1282,15 +1306,11 @@ async function start() {
const slug = process.env.GSTACK_SLUG || 'unknown';
const homeDir = process.env.HOME || process.env.USERPROFILE || '/tmp';
const projectWelcome = `${homeDir}/.gstack/projects/${slug}/designs/welcome-page-20260331/finalized.html`;
try { if (require('fs').existsSync(projectWelcome)) return projectWelcome; } catch (err: any) {
console.warn('[browse] Error checking project welcome page:', err.message);
}
if (fs.existsSync(projectWelcome)) return projectWelcome;
// Fallback: built-in welcome page from gstack install
const skillRoot = process.env.GSTACK_SKILL_ROOT || `${homeDir}/.claude/skills/gstack`;
const builtinWelcome = `${skillRoot}/browse/src/welcome.html`;
try { if (require('fs').existsSync(builtinWelcome)) return builtinWelcome; } catch (err: any) {
console.warn('[browse] Error checking builtin welcome page:', err.message);
}
if (fs.existsSync(builtinWelcome)) return builtinWelcome;
return null;
})();
if (welcomePath) {
@@ -1814,8 +1834,9 @@ async function start() {
chatBuffer = [];
chatNextId = 0;
if (sidebarSession) {
try { fs.writeFileSync(path.join(SESSIONS_DIR, sidebarSession.id, 'chat.jsonl'), '', { mode: 0o600 }); } catch (err: any) {
console.error('[browse] Failed to clear chat file:', err.message);
const chatFile = path.join(SESSIONS_DIR, sidebarSession.id, 'chat.jsonl');
try { fs.writeFileSync(chatFile, '', { mode: 0o600 }); } catch (err: any) {
if (err?.code !== 'ENOENT') console.error('[browse] Failed to clear chat file:', err.message);
}
}
return new Response(JSON.stringify({ ok: true }), { status: 200, headers: { 'Content-Type': 'application/json' } });
+9 -8
@@ -12,6 +12,7 @@
import { spawn } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';
import { safeUnlink } from './error-handling';
const QUEUE = process.env.SIDEBAR_QUEUE_PATH || path.join(process.env.HOME || '/tmp', '.gstack', 'sidebar-agent-queue.jsonl');
const KILL_FILE = path.join(path.dirname(QUEUE), 'sidebar-agent-kill');
@@ -290,7 +291,7 @@ async function askClaude(queueEntry: QueueEntry): Promise<void> {
// Clear any stale cancel signal for this tab before starting
const cancelFile = cancelFileForTab(tid);
try { fs.unlinkSync(cancelFile); } catch {}
safeUnlink(cancelFile);
const proc = spawn('claude', claudeArgs, {
stdio: ['pipe', 'pipe', 'pipe'],
@@ -321,12 +322,12 @@ async function askClaude(queueEntry: QueueEntry): Promise<void> {
try {
if (fs.existsSync(cancelFile)) {
console.log(`[sidebar-agent] Cancel signal received for tab ${tid} — killing claude subprocess`);
try { proc.kill('SIGTERM'); } catch {}
setTimeout(() => { try { proc.kill('SIGKILL'); } catch {} }, 3000);
try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
setTimeout(() => { try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; } }, 3000);
fs.unlinkSync(cancelFile);
clearInterval(cancelCheck);
}
} catch {}
} catch (err: any) { if (err?.code !== 'ENOENT') throw err; }
}, 500);
let buffer = '';
@@ -385,7 +386,7 @@ async function askClaude(queueEntry: QueueEntry): Promise<void> {
try { proc.kill('SIGTERM'); } catch (killErr: any) {
console.warn(`[sidebar-agent] Tab ${tid}: Failed to kill timed-out process:`, killErr.message);
}
setTimeout(() => { try { proc.kill('SIGKILL'); } catch {} }, 3000);
setTimeout(() => { try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; } }, 3000);
const timeoutMsg = stderrBuffer.trim()
? `Timed out after ${timeoutMs / 1000}s\nstderr: ${stderrBuffer.trim().slice(-500)}`
: `Timed out after ${timeoutMs / 1000}s`;
@@ -464,8 +465,8 @@ function pollKillFile(): void {
if (activeProcs.size > 0) {
console.log(`[sidebar-agent] Kill signal received — terminating ${activeProcs.size} active agent(s)`);
for (const [tid, proc] of activeProcs) {
try { proc.kill('SIGTERM'); } catch {}
setTimeout(() => { try { proc.kill('SIGKILL'); } catch {} }, 2000);
try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
setTimeout(() => { try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; } }, 2000);
processingTabs.delete(tid);
}
activeProcs.clear();
@@ -480,7 +481,7 @@ async function main() {
const dir = path.dirname(QUEUE);
fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
if (!fs.existsSync(QUEUE)) fs.writeFileSync(QUEUE, '', { mode: 0o600 });
try { fs.chmodSync(QUEUE, 0o600); } catch {}
try { fs.chmodSync(QUEUE, 0o600); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; }
lastLine = countLines();
await refreshToken();
+134 -7
@@ -39,6 +39,7 @@ interface SnapshotOptions {
annotate?: boolean; // -a / --annotate: annotated screenshot
outputPath?: string; // -o / --output: path for annotated screenshot
cursorInteractive?: boolean; // -C / --cursor-interactive: scan cursor:pointer etc.
heatmap?: string; // -H / --heatmap: JSON color map for ref overlays
}
/**
@@ -64,6 +65,7 @@ export const SNAPSHOT_FLAGS: Array<{
{ short: '-a', long: '--annotate', description: 'Annotated screenshot with red overlay boxes and ref labels', optionKey: 'annotate' },
{ short: '-o', long: '--output', description: 'Output path for annotated screenshot (default: <temp>/browse-annotated.png)', takesValue: true, valueHint: '<path>', optionKey: 'outputPath' },
{ short: '-C', long: '--cursor-interactive', description: 'Cursor-interactive elements (@c refs — divs with pointer, onclick). Auto-enabled when -i is used.', optionKey: 'cursorInteractive' },
{ short: '-H', long: '--heatmap', description: 'Color-coded overlay screenshot from JSON map: \'{"@e1":"green","@e3":"red"}\'. Valid colors: green, yellow, red, blue, orange, gray.', takesValue: true, valueHint: '<json>', optionKey: 'heatmap' },
];
interface ParsedNode {
@@ -331,7 +333,9 @@ export async function handleSnapshot(
output.push(`@${ref} [${elem.reason}] "${elem.text}"`);
}
}
} catch {
} catch (err: any) {
// Cursor scan fails on pages with strict CSP or when page has navigated
if (!err?.message?.includes('Execution context') && !err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('Content Security')) throw err;
output.push('');
output.push('(cursor scan failed — CSP restriction or page context lost)');
}
@@ -355,7 +359,7 @@ export async function handleSnapshot(
const nodeFs = require('fs') as typeof import('fs');
const absolute = nodePath.resolve(screenshotPath);
const safeDirs = [TEMP_DIR, process.cwd()].map((d: string) => {
try { return nodeFs.realpathSync(d); } catch { return d; }
try { return nodeFs.realpathSync(d); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; return d; }
});
let realPath: string;
try {
@@ -365,7 +369,8 @@ export async function handleSnapshot(
try {
const dir = nodeFs.realpathSync(nodePath.dirname(absolute));
realPath = nodePath.join(dir, nodePath.basename(absolute));
} catch {
} catch (err2: any) {
if (err2?.code !== 'ENOENT') throw err2;
realPath = absolute;
}
} else {
@@ -385,8 +390,9 @@ export async function handleSnapshot(
if (box) {
boxes.push({ ref: `@${ref}`, box });
}
} catch {
// Element may be offscreen or hidden — skip
} catch (err: any) {
// Element may be offscreen, hidden, or page navigated — skip
if (!err?.message?.includes('Timeout') && !err?.message?.includes('timeout') && !err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('Execution context')) throw err;
}
}
@@ -418,13 +424,134 @@ export async function handleSnapshot(
output.push('');
output.push(`[annotated screenshot: ${screenshotPath}]`);
} catch {
// Remove overlays even on screenshot failure
} catch (err: any) {
// Remove overlays even on screenshot failure — but only swallow page/browser errors
if (!err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('Execution context') && !err?.message?.includes('screenshot')) throw err;
try {
await page.evaluate(() => {
document.querySelectorAll('.__browse_annotation__').forEach(el => el.remove());
});
} catch (err2: any) {
if (!err2?.message?.includes('closed') && !err2?.message?.includes('Target') && !err2?.message?.includes('Execution context')) throw err2;
}
}
}
// ─── Heatmap mode (-H) ──────────────────────────────────────
if (opts.heatmap) {
const heatmapPath = opts.outputPath || `${TEMP_DIR}/browse-heatmap.png`;
// Validate output path
{
const nodePath = require('path') as typeof import('path');
const nodeFs = require('fs') as typeof import('fs');
const absolute = nodePath.resolve(heatmapPath);
const safeDirs = [TEMP_DIR, process.cwd()].map((d: string) => {
try { return nodeFs.realpathSync(d); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; return d; }
});
let realPath: string;
try {
realPath = nodeFs.realpathSync(absolute);
} catch (err: any) {
if (err.code === 'ENOENT') {
try {
const dir = nodeFs.realpathSync(nodePath.dirname(absolute));
realPath = nodePath.join(dir, nodePath.basename(absolute));
} catch (err2: any) {
if (err2?.code !== 'ENOENT') throw err2;
realPath = absolute;
}
} else {
throw new Error(`Cannot resolve real path: ${heatmapPath} (${err.code})`);
}
}
if (!safeDirs.some((dir: string) => isPathWithin(realPath, dir))) {
throw new Error(`Path must be within: ${safeDirs.join(', ')}`);
}
}
// Parse and validate color map
const VALID_COLORS = new Set(['green', 'yellow', 'red', 'blue', 'orange', 'gray']);
const COLOR_MAP: Record<string, { border: string; bg: string }> = {
green: { border: '#00b400', bg: 'rgba(0,180,0,0.15)' },
yellow: { border: '#ffb400', bg: 'rgba(255,180,0,0.15)' },
red: { border: '#ff0000', bg: 'rgba(255,0,0,0.15)' },
blue: { border: '#0066ff', bg: 'rgba(0,102,255,0.15)' },
orange: { border: '#ff6600', bg: 'rgba(255,102,0,0.15)' },
gray: { border: '#888888', bg: 'rgba(136,136,136,0.15)' },
};
let colorAssignments: Record<string, string>;
try {
const parsed = JSON.parse(opts.heatmap);
if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
throw new Error('not an object');
}
colorAssignments = parsed;
} catch {
throw new Error('Invalid heatmap JSON. Expected object: \'{"@e1":"green","@e3":"red"}\'');
}
// Validate colors
for (const [ref, color] of Object.entries(colorAssignments)) {
if (!VALID_COLORS.has(color)) {
throw new Error(`Invalid heatmap color "${color}" for ${ref}. Valid: ${[...VALID_COLORS].join(', ')}`);
}
}
try {
const boxes: Array<{ ref: string; box: { x: number; y: number; width: number; height: number }; color: string }> = [];
for (const [refKey, color] of Object.entries(colorAssignments)) {
const cleanRef = refKey.startsWith('@') ? refKey.slice(1) : refKey;
const entry = refMap.get(cleanRef);
if (!entry) continue; // Skip refs not found on page
try {
const box = await entry.locator.boundingBox({ timeout: 1000 });
if (box) {
const colors = COLOR_MAP[color] || COLOR_MAP.gray;
boxes.push({ ref: `@${cleanRef}`, box, color: JSON.stringify(colors) });
}
} catch (err: any) {
// Element may be offscreen, hidden, or page navigated — skip
if (!err?.message?.includes('Timeout') && !err?.message?.includes('timeout') && !err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('Execution context')) throw err;
}
}
await page.evaluate((boxes) => {
for (const { ref, box, color } of boxes) {
const colors = JSON.parse(color);
const overlay = document.createElement('div');
overlay.className = '__browse_heatmap__';
overlay.style.cssText = `
position: absolute; top: ${box.y}px; left: ${box.x}px;
width: ${box.width}px; height: ${box.height}px;
border: 2px solid ${colors.border}; background: ${colors.bg};
pointer-events: none; z-index: 99999;
font-size: 10px; color: ${colors.border}; font-weight: bold;
`;
const label = document.createElement('span');
label.textContent = ref;
label.style.cssText = `position: absolute; top: -14px; left: 0; background: ${colors.border}; color: white; padding: 0 3px; font-size: 10px;`;
overlay.appendChild(label);
document.body.appendChild(overlay);
}
}, boxes);
await page.screenshot({ path: heatmapPath, fullPage: true });
// Remove heatmap overlays
await page.evaluate(() => {
document.querySelectorAll('.__browse_heatmap__').forEach(el => el.remove());
});
output.push('');
output.push(`[heatmap screenshot: ${heatmapPath}]`);
} catch (err: any) {
// Cleanup on failure
try {
await page.evaluate(() => {
document.querySelectorAll('.__browse_heatmap__').forEach(el => el.remove());
});
} catch {}
if (!err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('Execution context') && !err?.message?.includes('screenshot')) throw err;
}
}
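The parse-and-validate flow at the top of this handler can be isolated into a small helper. This is a hedged sketch: the function name is hypothetical, and the exact membership of `VALID_COLORS` is an assumption (the diff only shows the `gray` entry of `COLOR_MAP`).

```typescript
// Hypothetical standalone version of the heatmap option validation.
// VALID_COLORS membership is assumed; only 'gray' is visible in the diff.
const VALID_COLORS = new Set(['green', 'yellow', 'red', 'gray']);

function parseHeatmapOption(raw: string): Record<string, string> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error('Invalid heatmap JSON. Expected object: \'{"@e1":"green","@e3":"red"}\'');
  }
  // Reject non-object JSON values (arrays, numbers, null) up front
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error('Invalid heatmap JSON. Expected object: \'{"@e1":"green","@e3":"red"}\'');
  }
  const assignments = parsed as Record<string, string>;
  for (const [ref, color] of Object.entries(assignments)) {
    if (!VALID_COLORS.has(color)) {
      throw new Error(`Invalid heatmap color "${color}" for ${ref}. Valid: ${[...VALID_COLORS].join(', ')}`);
    }
  }
  return assignments;
}
```

Separating the parse step from the color check, as the real code does, keeps the two failure messages distinct: malformed JSON versus a well-formed object with an unknown color.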
+2
@@ -7,6 +7,8 @@ export const BLOCKED_METADATA_HOSTS = new Set([
'169.254.169.254', // AWS/GCP/Azure instance metadata
'fe80::1', // IPv6 link-local — common metadata endpoint alias
'::ffff:169.254.169.254', // IPv4-mapped IPv6 form of the metadata IP
'::ffff:a9fe:a9fe', // Hex-encoded IPv4-mapped form (URL constructor normalizes to this)
'::a9fe:a9fe', // Deprecated IPv4-compatible hex form
'metadata.google.internal', // GCP metadata
'metadata.azure.internal', // Azure IMDS
]);
+60 -24
@@ -13,7 +13,8 @@ import { validateNavigationUrl } from './url-validation';
import { validateOutputPath } from './path-security';
import * as fs from 'fs';
import * as path from 'path';
import { TEMP_DIR } from './platform';
import { TEMP_DIR, isPathWithin } from './platform';
import { SAFE_DIRECTORIES } from './path-security';
import { modifyStyle, undoModification, resetModifications, getModificationHistory } from './cdp-inspector';
/**
@@ -399,7 +400,7 @@ export async function handleWriteCommand(
if (!fs.existsSync(fp)) throw new Error(`File not found: ${fp}`);
if (path.isAbsolute(fp)) {
let resolvedFp: string;
try { resolvedFp = fs.realpathSync(path.resolve(fp)); } catch { resolvedFp = path.resolve(fp); }
try { resolvedFp = fs.realpathSync(path.resolve(fp)); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; resolvedFp = path.resolve(fp); }
if (!SAFE_DIRECTORIES.some(dir => isPathWithin(resolvedFp, dir))) {
throw new Error(`Path must be within: ${SAFE_DIRECTORIES.join(', ')}`);
}
@@ -441,21 +442,22 @@ export async function handleWriteCommand(
case 'cookie-import': {
const filePath = args[0];
if (!filePath) throw new Error('Usage: browse cookie-import <json-file>');
// Path validation — prevent reading arbitrary files
if (path.isAbsolute(filePath)) {
const safeDirs = [TEMP_DIR, process.cwd()];
const resolved = path.resolve(filePath);
if (!safeDirs.some(dir => isPathWithin(resolved, dir))) {
throw new Error(`Path must be within: ${safeDirs.join(', ')}`);
}
// Path validation — resolve to absolute and check against safe dirs.
// Fixes #707: relative paths previously bypassed the safe directory check.
// Mirrors validateOutputPath() — resolves symlinks (e.g., macOS /tmp → /private/tmp).
const resolved = path.resolve(filePath);
let resolvedReal = resolved;
try { resolvedReal = fs.realpathSync(resolved); } catch {
// File may not exist yet — resolve parent dir instead
try { resolvedReal = path.join(fs.realpathSync(path.dirname(resolved)), path.basename(resolved)); } catch {}
}
if (path.normalize(filePath).includes('..')) {
throw new Error('Path traversal sequences (..) are not allowed');
if (!SAFE_DIRECTORIES.some(dir => isPathWithin(resolvedReal, dir))) {
throw new Error(`Path must be within: ${SAFE_DIRECTORIES.join(', ')}`);
}
if (!fs.existsSync(filePath)) throw new Error(`File not found: ${filePath}`);
const raw = fs.readFileSync(filePath, 'utf-8');
let cookies: any[];
try { cookies = JSON.parse(raw); } catch { throw new Error(`Invalid JSON in ${filePath}`); }
try { cookies = JSON.parse(raw); } catch (err: any) { throw new Error(`Invalid JSON in ${filePath}: ${err?.message || err}`); }
if (!Array.isArray(cookies)) throw new Error('Cookie file must contain a JSON array');
// Auto-fill domain from current page URL when missing (consistent with cookie command)
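The resolve-then-realpath pattern in this hunk can be sketched standalone. Hedged: `isPathWithin` below is a plausible prefix-containment reimplementation for illustration, not the real helper from `platform.ts`, and the function names are hypothetical.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Plausible containment check (the real one lives in platform.ts and may differ):
// a child is within a parent iff the relative path never escapes upward.
function isPathWithin(child: string, parent: string): boolean {
  const rel = path.relative(parent, child);
  return rel === '' || (!rel.startsWith('..') && !path.isAbsolute(rel));
}

// Resolve a user-supplied path for a safe-directory check, following symlinks
// (e.g. macOS /tmp -> /private/tmp) so the comparison uses canonical paths.
function resolveForCheck(filePath: string): string {
  const resolved = path.resolve(filePath);
  try {
    return fs.realpathSync(resolved);
  } catch {
    // File may not exist yet; canonicalize the parent directory instead
    try {
      return path.join(fs.realpathSync(path.dirname(resolved)), path.basename(resolved));
    } catch {
      return resolved;
    }
  }
}
```

Resolving before checking is what closes the relative-path bypass referenced by #707: `../../etc/shadow` becomes an absolute canonical path first, and only then is it tested against the safe directories.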
@@ -476,20 +478,24 @@ export async function handleWriteCommand(
}
await page.context().addCookies(cookies);
const importedDomains = [...new Set(cookies.map((c: any) => c.domain).filter(Boolean))];
if (importedDomains.length > 0) bm.trackCookieImportDomains(importedDomains);
return `Loaded ${cookies.length} cookies from ${filePath}`;
}
case 'cookie-import-browser': {
// Two modes:
// 1. Direct CLI import: cookie-import-browser <browser> --domain <domain> [--profile <profile>]
// 2. Open picker UI: cookie-import-browser [browser]
// Requires --domain (or --all to explicitly import everything).
// 2. Open picker UI: cookie-import-browser [browser] (interactive domain selection)
const browserArg = args[0];
const domainIdx = args.indexOf('--domain');
const profileIdx = args.indexOf('--profile');
const hasAll = args.includes('--all');
const profile = (profileIdx !== -1 && profileIdx + 1 < args.length) ? args[profileIdx + 1] : 'Default';
if (domainIdx !== -1 && domainIdx + 1 < args.length) {
// Direct import mode — no UI
// Direct import mode — scoped to specific domain
const domain = args[domainIdx + 1];
// Validate --domain against current page hostname to prevent cross-site cookie injection
const pageHostname = new URL(page.url()).hostname;
@@ -501,13 +507,35 @@ export async function handleWriteCommand(
const result = await importCookies(browser, [domain], profile);
if (result.cookies.length > 0) {
await page.context().addCookies(result.cookies);
bm.trackCookieImportDomains([domain]);
}
const msg = [`Imported ${result.count} cookies for ${domain} from ${browser}`];
if (result.failed > 0) msg.push(`(${result.failed} failed to decrypt)`);
return msg.join(' ');
}
// Picker UI mode — open in user's browser
if (hasAll) {
// Explicit all-cookies import — requires --all flag as a deliberate opt-in.
// Imports every non-expired cookie domain from the browser.
const browser = browserArg || 'comet';
const { listDomains } = await import('./cookie-import-browser');
const { domains } = listDomains(browser, profile);
const allDomainNames = domains.map((d: any) => d.domain);
if (allDomainNames.length === 0) {
return `No cookies found in ${browser} (profile: ${profile})`;
}
const result = await importCookies(browser, allDomainNames, profile);
if (result.cookies.length > 0) {
await page.context().addCookies(result.cookies);
bm.trackCookieImportDomains(allDomainNames);
}
const msg = [`Imported ${result.count} cookies across ${Object.keys(result.domainCounts).length} domains from ${browser}`];
msg.push('(used --all: all browser cookies imported, consider --domain for tighter scoping)');
if (result.failed > 0) msg.push(`(${result.failed} failed to decrypt)`);
return msg.join(' ');
}
// Picker UI mode — open in user's browser for interactive domain selection
const port = bm.serverPort;
if (!port) throw new Error('Server port not available');
@@ -520,11 +548,12 @@ export async function handleWriteCommand(
const pickerUrl = `http://127.0.0.1:${port}/cookie-picker?code=${code}`;
try {
Bun.spawn(['open', pickerUrl], { stdout: 'ignore', stderr: 'ignore' });
} catch {
// open may fail silently — URL is in the message below
} catch (err: any) {
// open may fail on non-macOS or if 'open' binary is missing — URL is in the message below
if (err?.code !== 'ENOENT' && !err?.message?.includes('spawn')) throw err;
}
return `Cookie picker opened at http://127.0.0.1:${port}/cookie-picker\nDetected browsers: ${browsers.map(b => b.name).join(', ')}\nSelect domains to import, then close the picker when done.`;
return `Cookie picker opened at http://127.0.0.1:${port}/cookie-picker\nDetected browsers: ${browsers.map(b => b.name).join(', ')}\nSelect domains to import, then close the picker when done.\n\nTip: For scripted imports, use --domain <domain> to scope cookies to a single domain.`;
}
case 'style': {
@@ -606,7 +635,10 @@ export async function handleWriteCommand(
(el as HTMLElement).style.setProperty('display', 'none', 'important');
removed++;
});
} catch {}
} catch (err: any) {
// querySelectorAll throws DOMException on invalid CSS selectors — skip those
if (!(err instanceof DOMException)) throw err;
}
}
return removed;
}, selectors);
@@ -815,7 +847,9 @@ export async function handleWriteCommand(
document.querySelectorAll(sel).forEach(el => {
(el as HTMLElement).style.display = 'none';
});
} catch {}
} catch (err: any) {
if (!(err instanceof DOMException)) throw err;
}
}
// Also hide fixed/sticky (except nav)
for (const el of document.querySelectorAll('*')) {
@@ -838,7 +872,9 @@ export async function handleWriteCommand(
document.querySelectorAll(sel).forEach(el => {
(el as HTMLElement).style.display = 'none';
});
} catch {}
} catch (err: any) {
if (!(err instanceof DOMException)) throw err;
}
}
}, hideSelectors);
}
@@ -950,13 +986,13 @@ export async function handleWriteCommand(
reader.onerror = () => reject('Failed to read blob');
reader.readAsDataURL(blob);
});
} catch {
return 'ERROR:EXPIRED';
} catch (err: any) {
return `ERROR:EXPIRED:${err?.message || 'unknown'}`;
}
}, url);
if (dataUrl === 'ERROR:TOO_LARGE') throw new Error('Blob too large (>100MB). Use a different approach.');
if (dataUrl === 'ERROR:EXPIRED') throw new Error('Blob URL expired or inaccessible.');
if (dataUrl.startsWith('ERROR:EXPIRED')) throw new Error(`Blob URL expired or inaccessible: ${dataUrl.slice('ERROR:EXPIRED:'.length)}`);
const match = dataUrl.match(/^data:([^;]+);base64,(.+)$/);
if (!match) throw new Error('Failed to decode blob data');
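The data-URL match at the end of this hunk works on any base64 data URL. In isolation (function name hypothetical):

```typescript
// Split a base64 data URL into its MIME type and decoded bytes,
// mirroring the regex used after the in-page blob read above.
function decodeDataUrl(dataUrl: string): { mime: string; bytes: Buffer } {
  const match = dataUrl.match(/^data:([^;]+);base64,(.+)$/);
  if (!match) throw new Error('Failed to decode blob data');
  return { mime: match[1], bytes: Buffer.from(match[2], 'base64') };
}
```

Note the pattern requires the `;base64` marker, so percent-encoded text data URLs (`data:text/plain,hello`) intentionally fall through to the error path.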
+28
@@ -0,0 +1,28 @@
import { describe, test, expect } from 'bun:test';
import { execSync } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';
const DIST_DIR = path.resolve(__dirname, '..', 'dist');
const SERVER_NODE = path.join(DIST_DIR, 'server-node.mjs');
describe('build: server-node.mjs', () => {
test('passes node --check if present', () => {
if (!fs.existsSync(SERVER_NODE)) {
// browse/dist is gitignored; no build has run in this checkout.
// Skip rather than fail so plain `bun test` without a prior build passes.
return;
}
expect(() => execSync(`node --check "${SERVER_NODE}"`, { stdio: 'pipe' })).not.toThrow();
});
test('does not inline @ngrok/ngrok (must be external)', () => {
if (!fs.existsSync(SERVER_NODE)) return;
const bundle = fs.readFileSync(SERVER_NODE, 'utf-8');
// Dynamic imports of externalized packages show up as string literals in the bundle,
// not as inlined module code. The heuristic: ngrok's native binding loader would
// reference its own internals. If any ngrok internal identifier appears, the module
// got inlined despite the --external flag.
expect(bundle).not.toMatch(/ngrok_napi|ngrokNapi|@ngrok\/ngrok-darwin|@ngrok\/ngrok-linux|@ngrok\/ngrok-win32/);
});
});
+2 -1
@@ -1811,7 +1811,8 @@ describe('Path traversal prevention', () => {
await handleWriteCommand('cookie-import', ['../../etc/shadow'], bm);
expect(true).toBe(false);
} catch (err: any) {
expect(err.message).toContain('Path traversal');
// Traversal blocked by safe-directory check (#707) or explicit .. check
expect(err.message).toMatch(/Path must be within|Path traversal/);
}
});
+47
@@ -0,0 +1,47 @@
import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';
import { safeUnlink, safeKill, isProcessAlive } from '../src/error-handling';
describe('safeUnlink', () => {
test('removes an existing file', () => {
const tmp = path.join(os.tmpdir(), `test-safeUnlink-${Date.now()}`);
fs.writeFileSync(tmp, 'hello');
safeUnlink(tmp);
expect(fs.existsSync(tmp)).toBe(false);
});
test('ignores ENOENT (file does not exist)', () => {
expect(() => safeUnlink('/tmp/nonexistent-file-' + Date.now())).not.toThrow();
});
test('rethrows non-ENOENT errors', () => {
// Attempt to unlink a directory — throws EPERM/EISDIR
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'test-safeUnlink-'));
expect(() => safeUnlink(dir)).toThrow();
fs.rmdirSync(dir);
});
});
describe('safeKill', () => {
test('sends signal to a running process', () => {
// signal 0 is a no-op existence check — safe to send to self
expect(() => safeKill(process.pid, 0)).not.toThrow();
});
test('ignores ESRCH (process does not exist)', () => {
// PID 99999999 is extremely unlikely to exist
expect(() => safeKill(99999999, 0)).not.toThrow();
});
});
describe('isProcessAlive', () => {
test('returns true for current process', () => {
expect(isProcessAlive(process.pid)).toBe(true);
});
test('returns false for non-existent process', () => {
expect(isProcessAlive(99999999)).toBe(false);
});
});
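The contract these tests pin down (swallow only the one expected errno, rethrow everything else) suggests implementations like the following. A hedged sketch; the real helpers in `src/error-handling.ts` may differ.

```typescript
import * as fs from 'fs';

// Delete a file, ignoring only "already gone"; EPERM/EISDIR still surface.
function safeUnlink(filePath: string): void {
  try {
    fs.unlinkSync(filePath);
  } catch (err: any) {
    if (err?.code !== 'ENOENT') throw err;
  }
}

// Signal a process, ignoring only "no such process".
function safeKill(pid: number, signal: NodeJS.Signals | number = 'SIGTERM'): void {
  try {
    process.kill(pid, signal);
  } catch (err: any) {
    if (err?.code !== 'ESRCH') throw err;
  }
}

// Signal 0 performs an existence check without delivering anything.
// EPERM means the process exists but belongs to another user.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err: any) {
    return err?.code === 'EPERM';
  }
}
```

The asymmetry is the point: each helper whitelists exactly one benign errno, so genuine failures (permissions, unlinking a directory) still propagate, which is what the "rethrows non-ENOENT errors" test asserts.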
+20 -1
@@ -14,6 +14,10 @@ allowed-tools:
- Write
- Glob
- AskUserQuestion
triggers:
- monitor after deploy
- canary check
- watch for errors post-deploy
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
@@ -257,6 +261,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -375,6 +381,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Completion Status Protocol
When completing a skill workflow, report status using one of:
@@ -538,7 +557,7 @@ plan's living status.
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
[ -z "$B" ] && B="$HOME/.claude/skills/gstack/browse/dist/browse"
if [ -x "$B" ]; then
echo "READY: $B"
else
+4
@@ -14,6 +14,10 @@ allowed-tools:
- Write
- Glob
- AskUserQuestion
triggers:
- monitor after deploy
- canary check
- watch for errors post-deploy
---
{{PREAMBLE}}
+4
@@ -7,6 +7,10 @@ description: |
User can override each warning. Use when touching prod, debugging live systems,
or working in a shared environment. Use when asked to "be careful", "safety mode",
"prod mode", or "careful mode". (gstack)
triggers:
- be careful
- warn before destructive
- safety mode
allowed-tools:
- Bash
- Read
+4
@@ -7,6 +7,10 @@ description: |
User can override each warning. Use when touching prod, debugging live systems,
or working in a shared environment. Use when asked to "be careful", "safety mode",
"prod mode", or "careful mode". (gstack)
triggers:
- be careful
- warn before destructive
- safety mode
allowed-tools:
- Bash
- Read
+19
@@ -17,6 +17,10 @@ allowed-tools:
- Glob
- Grep
- AskUserQuestion
triggers:
- save progress
- checkpoint this
- resume where i left off
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
@@ -260,6 +264,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -378,6 +384,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Completion Status Protocol
When completing a skill workflow, report status using one of:
+4
@@ -17,6 +17,10 @@ allowed-tools:
- Glob
- Grep
- AskUserQuestion
triggers:
- save progress
- checkpoint this
- resume where i left off
---
{{PREAMBLE}}
+19
@@ -9,6 +9,10 @@ description: |
The "200 IQ autistic developer" second opinion. Use when asked to "codex review",
"codex challenge", "ask codex", "second opinion", or "consult codex". (gstack)
Voice triggers (speech-to-text aliases): "code x", "code ex", "get another opinion".
triggers:
- codex review
- second opinion
- outside voice challenge
allowed-tools:
- Bash
- Read
@@ -259,6 +263,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -377,6 +383,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Repo Ownership — See Something, Say Something
`REPO_MODE` controls how to handle issues outside your branch:
+4
@@ -12,6 +12,10 @@ voice-triggers:
- "code x"
- "code ex"
- "get another opinion"
triggers:
- codex review
- second opinion
- outside voice challenge
allowed-tools:
- Bash
- Read
+4
@@ -3,6 +3,10 @@ name: gstack-contrib-add-host
description: |
Contributor-only skill: create a new host config for gstack's multi-host system.
NOT installed for end users. Only usable from the gstack source repo.
triggers:
- add new host
- create host config
- contribute new agent host
---
# /gstack-contrib-add-host — Add a New Host
+23
@@ -19,6 +19,10 @@ allowed-tools:
- Agent
- WebSearch
- AskUserQuestion
triggers:
- security audit
- check for vulnerabilities
- owasp review
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
@@ -262,6 +266,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -380,6 +386,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Completion Status Protocol
When completing a skill workflow, report status using one of:
@@ -537,6 +556,8 @@ Then write a `## GSTACK REVIEW REPORT` section to the end of the plan file:
file you are allowed to edit in plan mode. The plan file review report is part of the
plan's living status.
# /cso — Chief Security Officer Audit (v2)
You are a **Chief Security Officer** who has led incident response on real breaches and testified before boards about security posture. You think like an attacker but report like a defender. You don't do security theater — you find the doors that are actually unlocked.
@@ -1199,6 +1220,8 @@ staleness detection: if those files are later deleted, the learning can be flagg
**Only log genuine discoveries.** Don't log obvious things. Don't log things the user
already knows. A good test: would this insight save time in a future session? If yes, log it.
## Important Rules
- **Think like an attacker, report like a defender.** Show the exploit path, then the fix.
+8
@@ -25,10 +25,16 @@ allowed-tools:
- Agent
- WebSearch
- AskUserQuestion
triggers:
- security audit
- check for vulnerabilities
- owasp review
---
{{PREAMBLE}}
{{GBRAIN_CONTEXT_LOAD}}
# /cso — Chief Security Officer Audit (v2)
You are a **Chief Security Officer** who has led incident response on real breaches and testified before boards about security posture. You think like an attacker but report like a defender. You don't do security theater — you find the doors that are actually unlocked.
@@ -609,6 +615,8 @@ If `.gstack/` is not in `.gitignore`, note it in findings — security reports s
{{LEARNINGS_LOG}}
{{GBRAIN_SAVE_RESULTS}}
## Important Rules
- **Think like an attacker, report like a defender.** Show the exploit path, then the fix.
+24 -1
@@ -19,6 +19,10 @@ allowed-tools:
- Grep
- AskUserQuestion
- WebSearch
triggers:
- design system
- create a brand
- design from scratch
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
@@ -262,6 +266,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -380,6 +386,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Repo Ownership — See Something, Say Something
`REPO_MODE` controls how to handle issues outside your branch:
@@ -603,7 +622,7 @@ If the codebase is empty and purpose is unclear, say: *"I don't have a clear pic
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
[ -z "$B" ] && B="$HOME/.claude/skills/gstack/browse/dist/browse"
if [ -x "$B" ]; then
echo "READY: $B"
else
@@ -686,6 +705,8 @@ If `DESIGN_NOT_AVAILABLE`: Phase 5 falls back to the HTML preview page (still go
---
## Prior Learnings
Search for relevant learnings from previous sessions:
@@ -1253,6 +1274,8 @@ staleness detection: if those files are later deleted, the learning can be flagg
**Only log genuine discoveries.** Don't log obvious things. Don't log things the user
already knows. A good test: would this insight save time in a future session? If yes, log it.
## Important Rules
1. **Propose, don't present menus.** You are a consultant, not a form. Make opinionated recommendations based on the product context, then let the user adjust.
+8
@@ -19,6 +19,10 @@ allowed-tools:
- Grep
- AskUserQuestion
- WebSearch
triggers:
- design system
- create a brand
- design from scratch
---
{{PREAMBLE}}
@@ -79,6 +83,8 @@ If `DESIGN_NOT_AVAILABLE`: Phase 5 falls back to the HTML preview page (still go
---
{{GBRAIN_CONTEXT_LOAD}}
{{LEARNINGS_SEARCH}}
## Phase 1: Product Context
@@ -423,6 +429,8 @@ After shipping DESIGN.md, if the session produced screen-level mockups or page l
{{LEARNINGS_LOG}}
{{GBRAIN_SAVE_RESULTS}}
## Important Rules
1. **Propose, don't present menus.** You are a consultant, not a form. Make opinionated recommendations based on the product context, then let the user adjust.
+105 -1
@@ -12,6 +12,10 @@ description: |
"build me a page", "implement this design", or after any planning skill.
Proactively suggest when user has approved a design or has a plan ready. (gstack)
Voice triggers (speech-to-text aliases): "build the design", "code the mockup", "make it real".
triggers:
- build the design
- code the mockup
- make design real
allowed-tools:
- Bash
- Read
@@ -264,6 +268,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -382,6 +388,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Completion Status Protocol
When completing a skill workflow, report status using one of:
@@ -589,13 +608,98 @@ MUST be saved to `~/.gstack/projects/$SLUG/designs/`, NEVER to `.context/`,
`docs/designs/`, `/tmp/`, or any project-local directory. Design artifacts are USER
data, not project files. They persist across branches, conversations, and workspaces.
## UX Principles: How Users Actually Behave
These principles govern how real humans interact with interfaces. They are observed
behavior, not preferences. Apply them before, during, and after every design decision.
### The Three Laws of Usability
1. **Don't make me think.** Every page should be self-evident. If a user stops
to think "What do I click?" or "What does this mean?", the design has failed.
Self-evident > self-explanatory > requires explanation.
2. **Clicks don't matter, thinking does.** Three mindless, unambiguous clicks
beat one click that requires thought. Each step should feel like an obvious
choice (animal, vegetable, or mineral), not a puzzle.
3. **Omit, then omit again.** Get rid of half the words on each page, then get
rid of half of what's left. Happy talk (self-congratulatory text) must die.
Instructions must die. If they need reading, the design has failed.
### How Users Actually Behave
- **Users scan, they don't read.** Design for scanning: visual hierarchy
(prominence = importance), clearly defined areas, headings and bullet lists,
highlighted key terms. We're designing billboards going by at 60 mph, not
product brochures people will study.
- **Users satisfice.** They pick the first reasonable option, not the best.
Make the right choice the most visible choice.
- **Users muddle through.** They don't figure out how things work. They wing
it. If they accomplish their goal by accident, they won't seek the "right" way.
Once they find something that works, no matter how badly, they stick to it.
- **Users don't read instructions.** They dive in. Guidance must be brief,
timely, and unavoidable, or it won't be seen.
### Billboard Design for Interfaces
- **Use conventions.** Logo top-left, nav top/left, search = magnifying glass.
Don't innovate on navigation to be clever. Innovate when you KNOW you have a
better idea, otherwise use conventions. Even across languages and cultures,
web conventions let people identify the logo, nav, search, and main content.
- **Visual hierarchy is everything.** Related things are visually grouped. Nested
things are visually contained. More important = more prominent. If everything
shouts, nothing is heard. Start with the assumption everything is visual noise,
guilty until proven innocent.
- **Make clickable things obviously clickable.** No relying on hover states for
discoverability, especially on mobile where hover doesn't exist. Shape, location,
and formatting (color, underlining) must signal clickability without interaction.
- **Eliminate noise.** Three sources: too many things shouting for attention
(shouting), things not organized logically (disorganization), and too much stuff
(clutter). Fix noise by removal, not addition.
- **Clarity trumps consistency.** If making something significantly clearer
requires making it slightly inconsistent, choose clarity every time.
### Navigation as Wayfinding
Users on the web have no sense of scale, direction, or location. Navigation
must always answer: What site is this? What page am I on? What are the major
sections? What are my options at this level? Where am I? How can I search?
Persistent navigation on every page. Breadcrumbs for deep hierarchies.
Current section visually indicated. The "trunk test": cover everything except
the navigation. You should still know what site this is, what page you're on,
and what the major sections are. If not, the navigation has failed.
### The Goodwill Reservoir
Users start with a reservoir of goodwill. Every friction point depletes it.
**Deplete faster:** Hiding info users want (pricing, contact, shipping). Punishing
users for not doing things your way (formatting requirements on phone numbers).
Asking for unnecessary information. Putting sizzle in their way (splash screens,
forced tours, interstitials). Unprofessional or sloppy appearance.
**Replenish:** Know what users want to do and make it obvious. Tell them what they
want to know upfront. Save them steps wherever possible. Make it easy to recover
from errors. When in doubt, apologize.
### Mobile: Same Rules, Higher Stakes
All the above applies on mobile, just more so. Real estate is scarce, but never
sacrifice usability for space savings. Affordances must be VISIBLE: no cursor
means no hover-to-discover. Touch targets must be big enough (44px minimum).
Flat design can strip away useful visual information that signals interactivity.
Prioritize ruthlessly: things needed in a hurry go close at hand, everything
else a few taps away with an obvious path to get there.
## SETUP (run this check BEFORE any browse command)
```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
[ -z "$B" ] && B="$HOME/.claude/skills/gstack/browse/dist/browse"
if [ -x "$B" ]; then
echo "READY: $B"
else
+6
@@ -15,6 +15,10 @@ voice-triggers:
- "build the design"
- "code the mockup"
- "make it real"
triggers:
- build the design
- code the mockup
- make design real
allowed-tools:
- Bash
- Read
@@ -37,6 +41,8 @@ around obstacles.
{{DESIGN_SETUP}}
{{UX_PRINCIPLES}}
{{BROWSE_SETUP}}
---
+172 -2
@@ -19,6 +19,10 @@ allowed-tools:
- Grep
- AskUserQuestion
- WebSearch
triggers:
- visual design audit
- design qa
- fix design issues
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
@@ -262,6 +266,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -380,6 +386,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Repo Ownership — See Something, Say Something
`REPO_MODE` controls how to handle issues outside your branch:
@@ -555,6 +574,8 @@ Then write a `## GSTACK REVIEW REPORT` section to the end of the plan file:
file you are allowed to edit in plan mode. The plan file review report is part of the
plan's living status.
# /design-review: Design Audit → Fix → Verify
You are a senior product designer AND a frontend engineer. Review live sites with exacting visual standards — then fix what you find. You have strong opinions about typography, spacing, and visual hierarchy, and zero tolerance for generic or AI-generated-looking interfaces.
@@ -610,7 +631,7 @@ After the user chooses, execute their choice (commit or stash), then continue wi
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
[ -z "$B" ] && B="$HOME/.claude/skills/gstack/browse/dist/browse"
if [ -x "$B" ]; then
echo "READY: $B"
else
@@ -894,6 +915,91 @@ matches a past learning, display:
This makes the compounding visible. The user should see that gstack is getting
smarter on their codebase over time.
## UX Principles: How Users Actually Behave
These principles govern how real humans interact with interfaces. They are observed
behavior, not preferences. Apply them before, during, and after every design decision.
### The Three Laws of Usability
1. **Don't make me think.** Every page should be self-evident. If a user stops
to think "What do I click?" or "What does this mean?", the design has failed.
Self-evident > self-explanatory > requires explanation.
2. **Clicks don't matter, thinking does.** Three mindless, unambiguous clicks
beat one click that requires thought. Each step should feel like an obvious
choice (animal, vegetable, or mineral), not a puzzle.
3. **Omit, then omit again.** Get rid of half the words on each page, then get
rid of half of what's left. Happy talk (self-congratulatory text) must die.
Instructions must die. If they need reading, the design has failed.
### How Users Actually Behave
- **Users scan, they don't read.** Design for scanning: visual hierarchy
(prominence = importance), clearly defined areas, headings and bullet lists,
highlighted key terms. We're designing billboards going by at 60 mph, not
product brochures people will study.
- **Users satisfice.** They pick the first reasonable option, not the best.
Make the right choice the most visible choice.
- **Users muddle through.** They don't figure out how things work. They wing
it. If they accomplish their goal by accident, they won't seek the "right" way.
Once they find something that works, no matter how badly, they stick to it.
- **Users don't read instructions.** They dive in. Guidance must be brief,
timely, and unavoidable, or it won't be seen.
### Billboard Design for Interfaces
- **Use conventions.** Logo top-left, nav top/left, search = magnifying glass.
Don't innovate on navigation to be clever. Innovate when you KNOW you have a
better idea, otherwise use conventions. Even across languages and cultures,
web conventions let people identify the logo, nav, search, and main content.
- **Visual hierarchy is everything.** Related things are visually grouped. Nested
things are visually contained. More important = more prominent. If everything
shouts, nothing is heard. Start with the assumption everything is visual noise,
guilty until proven innocent.
- **Make clickable things obviously clickable.** No relying on hover states for
discoverability, especially on mobile where hover doesn't exist. Shape, location,
and formatting (color, underlining) must signal clickability without interaction.
- **Eliminate noise.** Three sources: shouting (too many things competing for
attention), disorganization (content not grouped logically), and clutter (too
much stuff). Fix noise by removal, not addition.
- **Clarity trumps consistency.** If making something significantly clearer
requires making it slightly inconsistent, choose clarity every time.
### Navigation as Wayfinding
Users on the web have no sense of scale, direction, or location. Navigation
must always answer: What site is this? What page am I on? What are the major
sections? What are my options at this level? Where am I? How can I search?
Persistent navigation on every page. Breadcrumbs for deep hierarchies.
Current section visually indicated. The "trunk test": cover everything except
the navigation. You should still know what site this is, what page you're on,
and what the major sections are. If not, the navigation has failed.
### The Goodwill Reservoir
Users start with a reservoir of goodwill. Every friction point depletes it.
**Deplete faster:** Hiding info users want (pricing, contact, shipping). Punishing
users for not doing things your way (formatting requirements on phone numbers).
Asking for unnecessary information. Putting sizzle in their way (splash screens,
forced tours, interstitials). Unprofessional or sloppy appearance.
**Replenish:** Know what users want to do and make it obvious. Tell them what they
want to know upfront. Save them steps wherever possible. Make it easy to recover
from errors. When in doubt, apologize.
### Mobile: Same Rules, Higher Stakes
All the above applies on mobile, just more so. Real estate is scarce, but never
sacrifice usability for space savings. Affordances must be VISIBLE: no cursor
means no hover-to-discover. Touch targets must be big enough (44px minimum).
Flat design can strip away useful visual information that signals interactivity.
Prioritize ruthlessly: things needed in a hurry go close at hand, everything
else a few taps away with an obvious path to get there.
## Phases 1-6: Design Audit Baseline
## Modes
@@ -928,9 +1034,13 @@ The most uniquely designer-like output. Form a gut reaction before analyzing any
3. Write the **First Impression** using this structured critique format:
- "The site communicates **[what]**." (what it says at a glance — competence? playfulness? confusion?)
- "I notice **[observation]**." (what stands out, positive or negative — be specific)
- "The first 3 things my eye goes to are: **[1]**, **[2]**, **[3]**." (hierarchy check — are these intentional?)
- "The first 3 things my eye goes to are: **[1]**, **[2]**, **[3]**." (hierarchy check — are these the 3 things the designer intended? If not, the visual hierarchy is lying.)
- "If I had to describe this in one word: **[word]**." (gut verdict)
**Narration mode:** Write this section in first person, as if you are a user scanning the page for the first time. "I'm looking at this page... my eye goes to the logo, then a wall of text I skip entirely, then... wait, is that a button?" Name the specific element, its position, its visual weight. If you can't name it specifically, you're not actually scanning, you're generating platitudes.
**Page Area Test:** Point at each clearly defined area of the page. Can you instantly name its purpose? ("Things I can buy," "Today's deals," "How to search.") Areas you can't name in 2 seconds are poorly defined. List them.
This is the section users read first. Be opinionated. A designer doesn't hedge — they react.
---
@@ -986,6 +1096,19 @@ $B url
```
If URL contains `/login`, `/signin`, `/auth`, or `/sso`: the site requires authentication. AskUserQuestion: "This site requires authentication. Want to import cookies from your browser? Run `/setup-browser-cookies` first if needed."
### Trunk Test (run on every page)
Imagine being dropped on this page with no context. Can you immediately answer:
1. What site is this? (Site ID visible and identifiable)
2. What page am I on? (Page name prominent, matches what I clicked)
3. What are the major sections? (Primary nav visible and clear)
4. What are my options at this level? (Local nav or content choices obvious)
5. Where am I in the scheme of things? ("You are here" indicator, breadcrumbs)
6. How can I search? (Search box findable without hunting)
Score: PASS (all 6 clear) / PARTIAL (4-5 clear) / FAIL (3 or fewer clear).
A FAIL on the trunk test is a HIGH-impact finding regardless of how polished the visual design is.
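The PASS/PARTIAL/FAIL banding above is mechanical once each of the six questions has been answered. A minimal sketch (the function and type names are illustrative, not part of gstack's API):

```typescript
// Sketch: map the six trunk-test answers to the bands defined above.
// Thresholds come from the checklist; names here are illustrative only.
type TrunkVerdict = "PASS" | "PARTIAL" | "FAIL";

function trunkTestVerdict(answers: boolean[]): TrunkVerdict {
  const clear = answers.filter(Boolean).length; // how many of the 6 are clear
  if (clear === 6) return "PASS";
  if (clear >= 4) return "PARTIAL";
  return "FAIL"; // 3 or fewer clear = HIGH-impact finding
}
```

The judgment lives entirely in answering the six questions; the banding is just bookkeeping.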
### Design Audit Checklist (10 categories, ~80 items)
Apply these at each page. Each finding gets an impact rating (high/medium/polish) and category.
@@ -1054,6 +1177,7 @@ Apply these at each page. Each finding gets an impact rating (high/medium/polish
- Success: confirmation animation or color, auto-dismiss
- Touch targets >= 44px on all interactive elements
- `cursor: pointer` on all clickable elements
- Mindless choice audit: every decision point (button, link, dropdown, modal choice) is a mindless click (obvious what happens). If a click requires thought about whether it's the right choice, flag as HIGH.
**6. Responsive Design** (8 items)
- Mobile layout makes *design* sense (not just stacked desktop columns)
@@ -1082,6 +1206,9 @@ Apply these at each page. Each finding gets an impact rating (high/medium/polish
- Active voice ("Install the CLI" not "The CLI will be installed")
- Loading states end with `…` ("Saving…" not "Saving...")
- Destructive actions have confirmation modal or undo window
- Happy talk detection: scan for introductory paragraphs that start with "Welcome to..." or tell users how great the site is. If you can hear "blah blah blah", it's happy talk. Flag for removal.
- Instructions detection: any visible instructions longer than one sentence. If users need to read instructions, the design has failed. Flag the instructions AND the interaction they're compensating for.
- Happy talk word count: count total visible words on the page. Classify each text block as "useful content" vs "happy talk" (welcome paragraphs, self-congratulatory text, instructions nobody reads). Report: "This page has X words. Y (Z%) are happy talk."
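The word-count report in the last item is mechanical once each text block has been classified. A sketch, assuming classification (which stays a judgment call) has already happened; the interface and function names are illustrative:

```typescript
// Sketch: compute the "X words, Y (Z%) happy talk" report from blocks
// the reviewer has already classified. Names are illustrative.
interface TextBlock {
  text: string;
  happyTalk: boolean; // classified during the audit, not detected here
}

function happyTalkReport(blocks: TextBlock[]): string {
  const words = (s: string) => s.split(/\s+/).filter(Boolean).length;
  const total = blocks.reduce((n, b) => n + words(b.text), 0);
  const happy = blocks
    .filter((b) => b.happyTalk)
    .reduce((n, b) => n + words(b.text), 0);
  const pct = total === 0 ? 0 : Math.round((happy / total) * 100);
  return `This page has ${total} words. ${happy} (${pct}%) are happy talk.`;
}
```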
**9. AI Slop Detection** (10 anti-patterns — the blacklist)
@@ -1124,6 +1251,43 @@ Evaluate:
- **Feedback clarity:** Did the action clearly succeed or fail? Is the feedback immediate?
- **Form polish:** Focus states visible? Validation timing correct? Errors near the source?
**Narration mode:** Narrate the flow in first person. "I click 'Sign Up'... spinner appears... 3 seconds pass... still spinning... I'm getting nervous. Finally the dashboard loads, but where am I? The nav doesn't highlight anything." Name the specific element, its position, its visual weight. If you can't name it specifically, you're not actually experiencing the flow, you're generating platitudes.
### Goodwill Reservoir (track across the flow)
As you walk the user flow, maintain a mental goodwill meter (starts at 70/100).
These scores are heuristic, not measured. The value is in identifying specific
drains and fills, not in the final number.
Subtract points for:
- Hidden information the user would want (pricing, contact, shipping): subtract 15
- Format punishment (rejecting valid input like dashes in phone numbers): subtract 10
- Unnecessary information requests: subtract 10
- Interstitials, splash screens, forced tours blocking the task: subtract 15
- Sloppy or unprofessional appearance: subtract 10
- Ambiguous choices that require thinking: subtract 5 each
Add points for:
- Top user tasks are obvious and prominent: add 10
- Upfront about costs and limitations: add 5
- Saves steps (direct links, smart defaults, autofill): add 5 each
- Graceful error recovery with specific fix instructions: add 10
- Apologizes when things go wrong: add 5
Report the final goodwill score with a visual dashboard:
```
Goodwill: 70 ████████████████████░░░░░░░░░░
Step 1: Login page 70 → 75 (+5 obvious primary action)
Step 2: Dashboard 75 → 60 (-15 interstitial tour popup)
Step 3: Settings 60 → 50 (-10 format punishment on phone)
Step 4: Billing 50 → 35 (-15 hidden pricing info)
FINAL: 35/100 ⚠️ CRITICAL UX DEBT
```
Below 30 = critical UX debt. 30-60 = needs work. Above 60 = healthy.
Include the biggest drains and fills as specific findings.
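The meter above is a running ledger: start at 70, clamp to 0-100, band the final score. A sketch, with the caveat already stated (scores are heuristic, not measured); the event shape and names are illustrative:

```typescript
// Sketch: the goodwill meter as a running ledger. Deltas are the
// heuristic values listed above; types and names are illustrative.
interface GoodwillEvent {
  step: string;
  delta: number; // e.g. -15 for hidden pricing, +10 for obvious top tasks
  reason: string;
}

function goodwillLedger(events: GoodwillEvent[], start = 70) {
  let score = start;
  const rows = events.map((e) => {
    const before = score;
    score = Math.max(0, Math.min(100, score + e.delta));
    const sign = e.delta >= 0 ? "+" : "";
    return `${e.step} ${before} → ${score} (${sign}${e.delta} ${e.reason})`;
  });
  const verdict =
    score < 30 ? "CRITICAL UX DEBT" : score <= 60 ? "NEEDS WORK" : "HEALTHY";
  return { score, verdict, rows };
}
```

The rows map directly onto the dashboard format shown above; the verdict bands match the thresholds (below 30 / 30-60 / above 60).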
---
## Phase 5: Cross-Page Consistency
@@ -1281,6 +1445,10 @@ Tie everything to user goals and product objectives. Always suggest specific imp
- One job per section
- "If deleting 30% of the copy improves it, keep deleting"
- Cards earn their existence — no decorative card grids
- NEVER use small, low-contrast type (body text < 16px or contrast ratio < 4.5:1 on body text)
- NEVER put labels inside form fields as the only label (placeholder-as-label pattern — labels must be visible when the field has content)
- ALWAYS preserve visited vs unvisited link distinction (visited links must have a different color)
- NEVER float headings between paragraphs (heading must be visually closer to the section it introduces than to the preceding section)
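The 4.5:1 body-text threshold above is the WCAG AA contrast ratio, computable from two sRGB colors. A sketch using the standard WCAG relative-luminance formula (the formula is WCAG's; the helper names are ours):

```typescript
// WCAG 2.x relative luminance + contrast ratio, for checking the
// 4.5:1 body-text rule above. Formula per WCAG; helper names are ours.
function relativeLuminance([r, g, b]: [number, number, number]): number {
  const lin = (c: number) => {
    const s = c / 255;
    return s <= 0.04045 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
}

function contrastRatio(
  fg: [number, number, number],
  bg: [number, number, number],
): number {
  const [hi, lo] = [relativeLuminance(fg), relativeLuminance(bg)].sort(
    (a, b) => b - a,
  );
  return (hi + 0.05) / (lo + 0.05);
}
```

Black on white yields 21:1 (the maximum); mid-gray `#777` on white lands just under 4.5:1, which is why it is a classic body-text failure.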
**AI Slop blacklist** (the 10 patterns that scream "AI-generated"):
1. Purple/violet/indigo gradient backgrounds or blue-to-purple color schemes
@@ -1585,6 +1753,8 @@ staleness detection: if those files are later deleted, the learning can be flagg
**Only log genuine discoveries.** Don't log obvious things. Don't log things the user
already knows. A good test: would this insight save time in a future session? If yes, log it.
## Additional Rules (design-review specific)
11. **Clean working tree required.** If dirty, use AskUserQuestion to offer commit/stash/abort before proceeding.
+10
@@ -19,10 +19,16 @@ allowed-tools:
- Grep
- AskUserQuestion
- WebSearch
triggers:
- visual design audit
- design qa
- fix design issues
---
{{PREAMBLE}}
{{GBRAIN_CONTEXT_LOAD}}
# /design-review: Design Audit → Fix → Verify
You are a senior product designer AND a frontend engineer. Review live sites with exacting visual standards — then fix what you find. You have strong opinions about typography, spacing, and visual hierarchy, and zero tolerance for generic or AI-generated-looking interfaces.
@@ -99,6 +105,8 @@ echo "REPORT_DIR: $REPORT_DIR"
{{LEARNINGS_SEARCH}}
{{UX_PRINCIPLES}}
## Phases 1-6: Design Audit Baseline
{{DESIGN_METHODOLOGY}}
@@ -291,6 +299,8 @@ If the repo has a `TODOS.md`:
{{LEARNINGS_LOG}}
{{GBRAIN_SAVE_RESULTS}}
## Additional Rules (design-review specific)
11. **Clean working tree required.** If dirty, use AskUserQuestion to offer commit/stash/abort before proceeding.
+104
@@ -9,6 +9,10 @@ description: |
"visual brainstorm", or "I don't like how this looks".
Proactively suggest when the user describes a UI feature but hasn't seen
what it could look like. (gstack)
triggers:
- explore design variants
- show me design options
- visual design brainstorm
allowed-tools:
- Bash
- Read
@@ -259,6 +263,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -377,6 +383,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Completion Status Protocol
When completing a skill workflow, report status using one of:
@@ -583,6 +602,91 @@ MUST be saved to `~/.gstack/projects/$SLUG/designs/`, NEVER to `.context/`,
`docs/designs/`, `/tmp/`, or any project-local directory. Design artifacts are USER
data, not project files. They persist across branches, conversations, and workspaces.
## UX Principles: How Users Actually Behave
These principles govern how real humans interact with interfaces. They are observed
behavior, not preferences. Apply them before, during, and after every design decision.
### The Three Laws of Usability
1. **Don't make me think.** Every page should be self-evident. If a user stops
to think "What do I click?" or "What does this mean?", the design has failed.
Self-evident > self-explanatory > requires explanation.
2. **Clicks don't matter, thinking does.** Three mindless, unambiguous clicks
beat one click that requires thought. Each step should feel like an obvious
choice (animal, vegetable, or mineral), not a puzzle.
3. **Omit, then omit again.** Get rid of half the words on each page, then get
rid of half of what's left. Happy talk (self-congratulatory text) must die.
Instructions must die. If they need reading, the design has failed.
### How Users Actually Behave
- **Users scan, they don't read.** Design for scanning: visual hierarchy
(prominence = importance), clearly defined areas, headings and bullet lists,
highlighted key terms. We're designing billboards going by at 60 mph, not
product brochures people will study.
- **Users satisfice.** They pick the first reasonable option, not the best.
Make the right choice the most visible choice.
- **Users muddle through.** They don't figure out how things work. They wing
it. If they accomplish their goal by accident, they won't seek the "right" way.
Once they find something that works, no matter how badly, they stick to it.
- **Users don't read instructions.** They dive in. Guidance must be brief,
timely, and unavoidable, or it won't be seen.
### Billboard Design for Interfaces
- **Use conventions.** Logo top-left, nav top/left, search = magnifying glass.
Don't innovate on navigation to be clever. Innovate when you KNOW you have a
better idea, otherwise use conventions. Even across languages and cultures,
web conventions let people identify the logo, nav, search, and main content.
- **Visual hierarchy is everything.** Related things are visually grouped. Nested
things are visually contained. More important = more prominent. If everything
shouts, nothing is heard. Start with the assumption everything is visual noise,
guilty until proven innocent.
- **Make clickable things obviously clickable.** No relying on hover states for
discoverability, especially on mobile where hover doesn't exist. Shape, location,
and formatting (color, underlining) must signal clickability without interaction.
- **Eliminate noise.** Three sources: shouting (too many things competing for
attention), disorganization (content not grouped logically), and clutter (too
much stuff). Fix noise by removal, not addition.
- **Clarity trumps consistency.** If making something significantly clearer
requires making it slightly inconsistent, choose clarity every time.
### Navigation as Wayfinding
Users on the web have no sense of scale, direction, or location. Navigation
must always answer: What site is this? What page am I on? What are the major
sections? What are my options at this level? Where am I? How can I search?
Persistent navigation on every page. Breadcrumbs for deep hierarchies.
Current section visually indicated. The "trunk test": cover everything except
the navigation. You should still know what site this is, what page you're on,
and what the major sections are. If not, the navigation has failed.
### The Goodwill Reservoir
Users start with a reservoir of goodwill. Every friction point depletes it.
**Deplete faster:** Hiding info users want (pricing, contact, shipping). Punishing
users for not doing things your way (formatting requirements on phone numbers).
Asking for unnecessary information. Putting sizzle in their way (splash screens,
forced tours, interstitials). Unprofessional or sloppy appearance.
**Replenish:** Know what users want to do and make it obvious. Tell them what they
want to know upfront. Save them steps wherever possible. Make it easy to recover
from errors. When in doubt, apologize.
### Mobile: Same Rules, Higher Stakes
All the above applies on mobile, just more so. Real estate is scarce, but never
sacrifice usability for space savings. Affordances must be VISIBLE: no cursor
means no hover-to-discover. Touch targets must be big enough (44px minimum).
Flat design can strip away useful visual information that signals interactivity.
Prioritize ruthlessly: things needed in a hurry go close at hand, everything
else a few taps away with an obvious path to get there.
## Step 0: Session Detection
Check for prior design exploration sessions for this project:
+6
@@ -9,6 +9,10 @@ description: |
"visual brainstorm", or "I don't like how this looks".
Proactively suggest when the user describes a UI feature but hasn't seen
what it could look like. (gstack)
triggers:
- explore design variants
- show me design options
- visual design brainstorm
allowed-tools:
- Bash
- Read
@@ -28,6 +32,8 @@ visual brainstorming, not a review process.
{{DESIGN_SETUP}}
{{UX_PRINCIPLES}}
## Step 0: Session Detection
Check for prior design exploration sessions for this project:
+1 -1
@@ -49,7 +49,7 @@ export function createSession(
updatedAt: new Date().toISOString(),
};
fs.writeFileSync(sessionPath(id), JSON.stringify(session, null, 2));
fs.writeFileSync(sessionPath(id), JSON.stringify(session, null, 2), { mode: 0o600 });
return session;
}
+20 -1
@@ -11,6 +11,10 @@ description: |
"test the DX", "DX audit", "developer experience test", or "try the
onboarding". Proactively suggest after shipping a developer-facing feature. (gstack)
Voice triggers (speech-to-text aliases): "dx audit", "test the developer experience", "try the onboarding", "developer experience test".
triggers:
- live dx audit
- test developer experience
- measure onboarding time
allowed-tools:
- Read
- Edit
@@ -262,6 +266,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -380,6 +386,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Repo Ownership — See Something, Say Something
`REPO_MODE` controls how to handle issues outside your branch:
@@ -600,7 +619,7 @@ branch name wherever the instructions say "the base branch" or `<default>`.
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
[ -z "$B" ] && B="$HOME/.claude/skills/gstack/browse/dist/browse"
if [ -x "$B" ]; then
echo "READY: $B"
else
+4
@@ -15,6 +15,10 @@ voice-triggers:
- "test the developer experience"
- "try the onboarding"
- "developer experience test"
triggers:
- live dx audit
- test developer experience
- measure onboarding time
allowed-tools:
- Read
- Edit
+831
@@ -0,0 +1,831 @@
# GCOMPACTION.md — Design & Architecture (TABLED)
**Target path on approval:** `docs/designs/GCOMPACTION.md`
This is the preserved design artifact for `gstack compact`. Everything above the first `---` divider below gets extracted verbatim to `docs/designs/GCOMPACTION.md` on plan approval. Everything after that divider is archived research (office hours + competitive deep-dive + eng-review notes + codex review + research findings) that informed the design.
---
## Status: TABLED (2026-04-17) — pending Anthropic `updatedBuiltinToolOutput` API
**Why tabled.** The v1 architecture assumed a Claude Code `PostToolUse` hook could REPLACE the tool output that enters the model's context for built-in tools (Bash, Read, Grep, Glob, WebFetch). Research on 2026-04-17 confirmed this is not possible today.
**Evidence:**
1. **Official docs** (https://code.claude.com/docs/en/hooks): The only output-replace field documented for `PostToolUse` is `hookSpecificOutput.updatedMCPToolOutput`, and the docs explicitly state: *"For MCP tools only: replaces the tool's output with the provided value."* No equivalent field exists for built-in tools.
2. **Anthropic issue [#36843](https://github.com/anthropics/claude-code/issues/36843)** (OPEN): Anthropic themselves acknowledge the gap. *"PostToolUse hooks can replace MCP tool output via `updatedMCPToolOutput`, but there is no equivalent for built-in tools (WebFetch, WebSearch, Bash, Read, etc.)... They can only add warnings via `decision: block` (which injects a reason string) or `additionalContext`. The original malicious content still reaches the model."*
3. **RTK mechanism** (source-reviewed at `src/hooks/init.rs:906-912` and `hooks/claude/rtk-rewrite.sh:83-100`): RTK is NOT a PostToolUse compactor. It's a **PreToolUse** Bash matcher that rewrites `tool_input.command` (e.g., `git status` → `rtk git status`). The wrapped command produces compact stdout itself. RTK README confirms: *"the hook only runs on Bash tool calls. Claude Code built-in tools like Read, Grep, and Glob do not pass through the Bash hook, so they are not auto-rewritten."* RTK is Bash-only by architectural constraint, not by choice.
4. **tokenjuice mechanism** (source-reviewed at `src/core/claude-code.ts:160, 491, 540-549`): tokenjuice DOES register `PostToolUse` with `matcher: "Bash"` but has no real output-replace API available — it hijacks `decision: "block"` + `reason` to inject compacted text. Whether this actually reduces model-context tokens or just overlays UI output is disputed. tokenjuice is also Bash-only.
5. **Read/Grep/Glob execute in-process inside Claude Code** and bypass hooks entirely. Wedge (ii) "native-tool coverage" was architecturally impossible from day one regardless of replacement API.
**Consequence.** Both wedges are dead in their original form:
- Wedge (i) "Conditional LLM verifier" — still technically possible, but only for Bash output, via PreToolUse command wrapping (RTK's mechanism). The verifier stops being a differentiator once we're also Bash-only.
- Wedge (ii) "Native-tool coverage" — impossible today. Read/Grep/Glob don't fire hooks. Even if they did, no output-replace field exists.
**Decision.** Shelve `gstack compact` entirely. Track Anthropic issue #36843 for the arrival of `updatedBuiltinToolOutput` (or equivalent). When that API ships, this design doc + the 15 locked decisions below + the research archive at the bottom become the unblocking artifacts for a fresh implementation sprint.
**If un-tabling:** Start from the "Decisions locked during plan-eng-review" block below — most remain valid. Then re-verify the hooks reference against the newly-shipped API, update the Architecture data-flow diagram to use whatever real output-replacement field exists, and re-run `/codex review` against the revised plan before coding.
**What we're NOT doing:**
- Not shipping a Bash-only PreToolUse wrapper. That's RTK's product; they're at 28K stars and 3 years of rule scars. No wedge.
- Not shipping the `decision: block` + `reason` hack. Undocumented behavior, Anthropic could break it, and the model may still see the raw output alongside the compacted overlay — context savings are disputed.
- Not shipping B-series benchmark in isolation. Without a working compactor, there's nothing to benchmark.
**Cost of tabling:** ~0. No code was written. The design doc + research + decisions remain as a ready-to-unblock artifact.
---
## Decisions locked during plan-eng-review (2026-04-17)
Preserved for the un-tabling sprint if/when Anthropic ships the built-in-tool output-replace API.
Summary of every decision made during the engineering review. Full rationale is preserved throughout the sections below; this block is the single source of truth if anything else drifts.
**Scope (Section 0):**
1. **Claude-first v1.** Ship compact + rules + verifier on Claude Code only. Codex + OpenClaw land at v1.1 after the wedge is proven on the primary host. Cuts ~2 days of host integration and derisks launch. The original "wedge (ii) native-tool coverage" claim applies to Claude Code at v1; we make no cross-host claim until v1.1.
2. **13-rule launch library.** v1 ships tests (jest/vitest/pytest/cargo-test/go-test/rspec) + git (diff/log/status) + install (npm/pnpm/pip/cargo). Build/lint/log families defer to v1.1, driven by `gstack compact discover` telemetry from real users.
3. **Verifier default ON at v1.0.** `failureCompaction` trigger (exit≠0 AND >50% reduction) is enabled out of the box. The verifier IS the wedge — defaulting it off hides the differentiating feature. Trigger bounds already keep expected fire rate ≤10% of tool calls.
**Architecture (Section 1):**
4. **Exact line-match sanitization for Haiku output.** Split raw output by `\n`, put lines in a set, only append lines from Haiku that appear verbatim in that set. Tightest adversarial contract; prompt-injection attempts cannot slip in novel text.
5. **Layered failureCompaction signal.** Prefer `exitCode` from the envelope; if the host omits it, fall back to `/FAIL|Error|Traceback|panic/` regex on the output. Log which signal fired in `meta.failureSignal` ("exit" | "pattern" | "none"). Pre-implementation task #1 still verifies Claude Code's envelope empirically, but the system no longer breaks if it doesn't.
6. **Deep-merge rule resolution.** User/project rules inherit built-in fields they don't override. Escape hatch: `"extends": null` in a rule file triggers full replacement semantics. Matches the mental model of eslint/tsconfig/.gitignore — override a piece without losing the rest.
**Code quality (Section 2):**
7. **Per-rule regex timeout, no RE2 dep.** Run each rule's regex via a 50ms AbortSignal budget; on timeout, skip the rule and record `meta.regexTimedOut: [ruleId]`. Avoids a WASM dependency and keeps rule-author syntax unconstrained.
8. **Pre-compiled rule bundle.** `gstack compact install` and `gstack compact reload` produce `~/.gstack/compact/rules.bundle.json` (deep-merged, regex-compiled metadata cached). Hook reads that single file instead of parsing N source files.
9. **Auto-reload on mtime drift.** Hook stats rule source files on startup; if any source file is newer than the bundle, rebuild in-line before applying. Adds ~0.5ms/invocation but eliminates the "I edited a rule and nothing changed" footgun.
10. **Expanded v1 redaction set.** Tee files redact: AWS keys, GitHub tokens (`ghp_/gho_/ghs_/ghu_`), GitLab tokens (`glpat-`), Slack webhooks, generic JWT (three base64 segments), generic bearer tokens, SSH private-key headers (`-----BEGIN * PRIVATE KEY-----`). Credit cards / SSNs / per-key env-pairs deferred to a full DLP layer in v2.
**Testing (Section 3):**
11. **P-series gate subset.** v1 gate-tier P-tests: P1 (binary garbage), P3 (empty output), P6 (RTK-killer critical stack frame), P8 (secrets to tee), P15 (hook timeout), P18 (prompt injection), P26 (malformed user rule JSON), P28 (regex DoS), P30 (Haiku hallucination). Remaining 21 P-cases grow R-series as real bugs hit.
12. **Fixture version-stamping.** Every golden fixture has a `toolVersion:` frontmatter. CI warns when fixture toolVersion ≠ currently installed. No more calendar-based rotation.
13. **B-series real-world benchmark testbench (hard v1 gate).** New component `compact/benchmark/` scans `~/.claude/projects/**/*.jsonl`, ranks the noisiest tool calls, clusters them into named scenarios, replays the compactor against them, and reports reduction-by-rule-family. v1 cannot ship until B-series on the author's own 30-day corpus shows ≥15% reduction AND zero critical-line loss on planted bugs. Local-only; never uploads. Community-shared corpus is v2.
**Performance (Section 4):**
14. **Revised latency budgets.** Bun cold-start on macOS ARM is 15-25ms; the original 10ms p50 target was unrealistic. New budgets: <30ms p50 / <80ms p99 on macOS ARM, <20ms p50 / <60ms p99 on Linux (verifier off). Verifier-fires budget stays <600ms p50 / <2s p99. Daemon mode is a v2 option gated on B-series showing cold-start hurts session savings.
15. **Line-oriented streaming pipeline.** Readline over stdin → filter → group → dedupe → ring-buffered tail truncation → stdout. Any single line >1MB hits P9 (truncate to 1KB with `[... truncated ...]` marker). Caps memory at 64MB regardless of total output size.
Every row above is a `MUST` in the implementation. Drift requires a new eng-review.
---
## Summary
`gstack compact` was designed as a `PostToolUse` hook that reduces tool-output noise before it reaches an AI coding agent's context window. Deterministic JSON rules would shrink noisy test runners, build logs, git diffs, and package installs. A conditional Claude Haiku verifier would act as a safety net when over-compaction risk was high.
**Current status: TABLED.** See "Status" section above. The architecture depends on a Claude Code API (`updatedBuiltinToolOutput` or equivalent for built-in tools) that does not exist as of 2026-04-17. Anthropic issue #36843 tracks the gap.
**Intended goal (preserved for the un-tabling sprint):** 15-30% tool-output token reduction per long session, with zero increase in task-failure rate.
**Original wedge (vs RTK, the 28K-star incumbent) — both invalidated by research:**
1. ~~**Conditional LLM verifier.**~~ Still technically viable via PreToolUse command wrapping, but only for Bash. Stops being a differentiator once we're Bash-only. Reconsider if the built-in-tool API arrives.
2. ~~**Native-tool coverage.**~~ Architecturally impossible today. Read/Grep/Glob execute in-process inside Claude Code and do not fire hooks. Even for tools that do fire `PostToolUse`, no output-replacement field exists for non-MCP tools.
**Original positioning (now moot):** *"RTK is fast. gstack compact is fast AND safe, and it covers every tool in your toolbox, not just Bash."*
## Non-goals
- Summarizing user messages or prior agent turns (Claude's own Compaction API owns that).
- Compressing agent response output (caveman's layer).
- Caching tool calls to avoid re-execution (token-optimizer-mcp's layer).
- Acting as a general-purpose log analyzer.
- Replacing the agent's own judgement about when to re-run a command with `GSTACK_RAW=1`.
## Why this is worth building
**Problem is measured, not hypothetical.**
- [Chroma research (2025)](https://research.trychroma.com/context-rot) tested 18 frontier models. Every model degrades as context grows. Rot starts well before the window limit — a 200K model rots at 50K.
- Coding agents are the worst case: accumulative context + high distractor density + long task horizon. Tool output is explicitly named as a primary noise source.
- The market has voted: Anthropic shipped Opus 4.6 Compaction API; OpenAI shipped a compaction guide; Google ADK shipped context compression; LangChain shipped autonomous compression; sst/opencode has built-in compaction. The hybrid deterministic + LLM pattern is industry consensus.
**Existing field (what gstack compact joins and differentiates from):**
| Project | Stars | License | Layer | Threat | Note |
|---------|-------|---------|-------|--------|------|
| **RTK (rtk-ai/rtk)** | **28K** | Apache-2.0 | Tool output | Primary benchmark | Pure Rust, Bash-only, zero LLM |
| caveman | 34.8K | MIT | Output tokens | Different axis | Terse system prompt; pairs WITH us |
| claude-token-efficient | 4.3K | MIT | Response verbosity | Different axis | Single CLAUDE.md |
| token-optimizer-mcp | 49 | MIT | MCP caching | Different axis | Prevents calls rather than compresses output |
| tokenjuice | ~12 | MIT | Tool output | Too new | 2 days old; inspired our JSON envelope |
| 6-Layer Token Savings Stack | — | Public gist | Recipe | Zero | Documentation; validates stacked compaction thesis |
RTK is the only direct competitor. Everything else compresses a different token source.
**License compatibility:** Every referenced project is permissive-licensed (MIT or Apache-2.0) and compatible with gstack's MIT license. No AGPL, GPL, or other copyleft dependencies. See the "License & attribution" section below for the clean-room policy.
## Architecture
### Data flow
```
┌─────────────────────────────────────────────────────────────────┐
│ Host (Claude Code / Codex / OpenClaw) │
│ ───────────────────────────────────────── │
│ 1. Agent requests tool call: Bash|Read|Grep|Glob|MCP │
│ 2. Host executes tool │
│ 3. Host invokes PostToolUse hook with: {tool, input, output} │
└────────────────────┬────────────────────────────────────────────┘
│ stdin (JSON envelope)
┌─────────────────────────────────────────────────────────────────┐
│ gstack-compact hook binary │
│ ─────────────────────────── │
│ a. Parse envelope │
│ b. Match rule by (tool, command, pattern) │
│ c. Apply rule primitives: filter / group / truncate / dedupe │
│ d. Record reduction metadata │
│ e. Evaluate verifier triggers │
│ f. If trigger met: call Haiku, append preserved lines │
│ g. On failure exit code: tee raw to ~/.gstack/compact/tee/... │
│ h. Emit JSON envelope to stdout │
└────────────────────┬────────────────────────────────────────────┘
│ stdout (JSON envelope)
Host substitutes compacted output into agent context
```
### Rule resolution
Three-tier hierarchy (highest precedence wins), same pattern as tokenjuice and gstack's existing host-config-export model:
1. Built-in rules: `compact/rules/` shipped with gstack
2. User rules: `~/.config/gstack/compact-rules/`
3. Project rules: `.gstack/compact-rules/`
Rules match tool calls by rule ID. Tiers deep-merge (decision #6 above): a project rule with ID `tests/jest` inherits every built-in `tests/jest` field it doesn't override, so you can tweak a single primitive without restating the rest. A rule that sets `"extends": null` opts out of merging and replaces the lower-tier rule wholesale.
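A minimal sketch of that resolution, assuming plain-JSON rule objects; the `resolveRule` / `deepMerge` names are illustrative, not the actual `merge.ts` exports:

```typescript
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

function isObj(v: Json): v is { [k: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Later tier wins per field; scalars and arrays replace rather than merge.
function deepMerge(base: Json, override: Json): Json {
  if (!isObj(base) || !isObj(override)) return override;
  const out: { [k: string]: Json } = { ...base };
  for (const [k, v] of Object.entries(override)) out[k] = deepMerge(out[k] ?? null, v);
  return out;
}

// Tiers ordered lowest → highest precedence: built-in, user, project.
// A tier with `"extends": null` discards everything below it (full replacement).
function resolveRule(tiers: Json[]): Json {
  return tiers.reduce((acc, rule) =>
    isObj(rule) && rule["extends"] === null ? rule : deepMerge(acc, rule));
}
```

A project rule that only sets `primitives.truncate.tailLines` keeps the built-in's `headLines`, match block, and counters untouched.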
### JSON envelope contract (adopted from tokenjuice)
Input:
```json
{
"tool": "Bash",
"command": "bun test test/billing.test.ts",
"argv": ["bun", "test", "test/billing.test.ts"],
"combinedText": "...",
"exitCode": 1,
"cwd": "/Users/garry/proj",
"host": "claude-code"
}
```
Output:
```json
{
"reduced": "compacted output with [gstack-compact: N → M lines, rule: X] header",
"meta": {
"rule": "tests/jest",
"linesBefore": 247,
"linesAfter": 18,
"bytesBefore": 18234,
"bytesAfter": 892,
"verifierFired": false,
"teeFile": null,
"durationMs": 8
}
}
```
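A defensive parse of that input contract can be sketched as follows; the `parseEnvelope` name and `CompactInput` shape are illustrative, and the real `envelope.ts` would also enforce size limits:

```typescript
// Field names follow the envelope contract in this doc.
interface CompactInput {
  tool: string;
  command?: string;
  argv?: string[];
  combinedText: string;
  exitCode?: number;
  cwd?: string;
  host?: string;
}

// Returns null on any malformed input, which the hook treats as
// "pass raw output through unchanged" rather than crashing the session.
function parseEnvelope(stdin: string): CompactInput | null {
  let raw: unknown;
  try { raw = JSON.parse(stdin); } catch { return null; }
  if (typeof raw !== "object" || raw === null) return null;
  const e = raw as Record<string, unknown>;
  if (typeof e.tool !== "string" || typeof e.combinedText !== "string") return null;
  return e as unknown as CompactInput;
}
```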
### Rule schema
Compact, minimal. Total rules-payload must stay <5KB on disk (lesson from claude-token-efficient: rule files themselves consume tokens on every session).
```json
{
"id": "tests/jest",
"family": "test-results",
"description": "Jest/Vitest output — preserve failures and summary counts",
"match": {
"tools": ["Bash"],
"commands": ["jest", "vitest", "bun test"],
"patterns": ["jest", "vitest", "PASS", "FAIL"]
},
"primitives": {
"filter": {
"strip": ["\\x1b\\[[0-9;]*m", "^\\s*at .+node_modules"],
"keep": ["FAIL", "PASS", "Error:", "Expected:", "Received:", "✓", "✗", "Tests:"]
},
"group": {
"by": "error-kind",
"header": "Errors grouped by type:"
},
"truncate": {
"headLines": 5,
"tailLines": 15,
"onFailure": { "headLines": 20, "tailLines": 30 }
},
"dedupe": {
"pattern": "^\\s*$",
"format": "[... {count} blank lines ...]"
}
},
"tee": {
"onExit": "nonzero",
"maxBytes": 1048576
},
"counters": [
{ "name": "failed", "pattern": "^FAIL\\s", "flags": "m" },
{ "name": "passed", "pattern": "^PASS\\s", "flags": "m" }
]
}
```
The four primitives — `filter`, `group`, `truncate`, `dedupe` — are lifted directly from RTK's technique taxonomy; they are the operations every serious compactor ends up needing. Any rule can combine any subset of the four; omitted primitives are no-ops.
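As one example of a primitive, here is a hedged sketch of `dedupe` matching the rule schema above (collapse runs of matching lines into a `format`-templated marker); the function signature is illustrative, not the actual `primitives/dedupe.ts` export:

```typescript
// Collapses runs of >=2 consecutive lines matching `pattern` into one
// marker line; a single match passes through unchanged.
function dedupe(lines: string[], pattern: RegExp, format: string): string[] {
  const out: string[] = [];
  let run: string[] = [];
  const flush = () => {
    if (run.length >= 2) out.push(format.replace("{count}", String(run.length)));
    else out.push(...run);
    run = [];
  };
  for (const line of lines) {
    if (pattern.test(line)) run.push(line);
    else { flush(); out.push(line); }
  }
  flush();
  return out;
}
```

With the example rule's config, three consecutive blank lines become `[... 3 blank lines ...]`.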
### Verifier layer (tiered, opt-in)
The verifier is a cheap Haiku call that fires only under specific triggers. Never on every tool call.
**Trigger matrix (user-configurable):**
| Trigger | Default | Condition |
|---------|---------|-----------|
| `failureCompaction` | **ON** | exit code ≠ 0 AND reduction >50% (diagnosis at risk) |
| `aggressiveReduction` | off | reduction >80% AND original >200 lines |
| `largeNoMatch` | off | no rule matched AND output >500 lines |
| `userOptIn` | on (env-gated) | `GSTACK_COMPACT_VERIFY=1` forces verifier for that call |
Default config ships with `failureCompaction` only — the highest-leverage case (agent is debugging; rule may have filtered the critical stack frame).
**Haiku's job (bounded):**
```
Here is raw output (truncated to first 2000 lines) and a compacted version.
Return any important lines from the raw that are missing from the compacted,
or `NONE` if nothing critical is missing.
```
The verifier never rewrites the compacted output. It only appends missing lines under a header:
```
[gstack-compact: 247 → 18 lines, rule: tests/jest]
[gstack-verify: 2 additional lines preserved by Haiku]
TypeError: Cannot read property 'foo' of undefined
at parseConfig (src/config.ts:42:18)
```
**Why Haiku, not Sonnet:** ~1/12th the cost, ~500ms vs ~2s, and the task is simple substring classification, not reasoning.
**Verifier config (`compact/rules/_verifier.json`):**
```json
{
"verifier": {
"enabled": true,
"model": "claude-haiku-4-5-20251001",
"maxInputLines": 2000,
"triggers": {
"aggressiveReduction": { "enabled": false, "thresholdPct": 80, "minLines": 200 },
"failureCompaction": { "enabled": true, "minReductionPct": 50 },
"largeNoMatch": { "enabled": false, "minLines": 500 },
"userOptIn": { "enabled": true, "envVar": "GSTACK_COMPACT_VERIFY" }
},
"fallback": "passthrough"
}
}
```
**Failure modes (verifier is strictly additive — never breaks the baseline):**
- No `ANTHROPIC_API_KEY` → skip verifier, use pure rule output.
- Haiku call times out (>5s) → skip verifier, use pure rule output.
- Haiku returns malformed JSON → skip, use pure rule output.
- Haiku returns prompt-injection attempt → sanitize: only append lines that appear verbatim in the original raw output (exact line match, per decision #4).
- Haiku returns hallucinated lines (not present in raw) → the same exact-line-match filter drops them by construction.
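The exact-line-match sanitizer from decision #4 is small enough to sketch in full; the function name is illustrative, not the actual `sanitize.ts` export:

```typescript
// Only lines present verbatim in the raw output may be appended.
// Prompt-injection text and hallucinated lines cannot survive this
// filter, because novel text is never a member of the raw-line set.
function sanitizeVerifierLines(raw: string, verifierOutput: string): string[] {
  const rawLines = new Set(raw.split("\n"));
  return verifierOutput
    .split("\n")
    .filter((line) => line.trim() !== "" && rawLines.has(line));
}
```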
### Tee mode (adopted from RTK)
On any command with exit code ≠ 0, the full unfiltered output is written to `~/.gstack/compact/tee/{timestamp}_{cmd-slug}.log`. The compacted output includes a tee-file pointer:
```
[gstack-compact: 247 → 18 lines, rule: tests/jest, tee: ~/.gstack/compact/tee/20260416-143022_bun-test.log]
```
The agent can read the tee file directly if it needs the full stack trace. This replaces the earlier `onFailure.preserveFull` mechanic with a cleaner design: compacted output always stays small; raw output is always one `cat` away.
**Tee safety:**
- File mode `0600` — not world-readable.
- Built-in secret-regex set redacts AWS keys, bearer tokens, and common credential patterns before write.
- Failed writes (read-only filesystem, permission denied) degrade gracefully: still emit compacted output, record `meta.teeFailed: true`.
- Tee files auto-expire after 7 days (cleanup on hook startup).
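The redaction pass can be sketched as a small ordered list of pattern/label pairs; the regexes below are simplified illustrative stand-ins, not the full v1 set, and `redact` is a hypothetical name for what `redact.ts` would export:

```typescript
// Simplified subset of the tee redaction patterns (illustrative only).
const REDACTIONS: Array<[RegExp, string]> = [
  [/AKIA[0-9A-Z]{16}/g, "[REDACTED:aws-key]"],
  [/gh[pous]_[A-Za-z0-9]{36,}/g, "[REDACTED:github-token]"],
  [/-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g,
    "[REDACTED:ssh-private-key]"],
];

// Applied to raw output before the tee file is written with mode 0600.
function redact(text: string): string {
  return REDACTIONS.reduce((t, [re, label]) => t.replace(re, label), text);
}
```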
### Host integration matrix
| Host | Hook type | Supported matchers | Config path |
|------|-----------|-------------------|-------------|
| Claude Code | `PostToolUse` | Bash, Read, Grep, Glob, Edit, Write, WebFetch, WebSearch, mcp__* | `~/.claude/settings.json` |
| Codex (v1.1) | `PostToolUse` equivalent | Bash (primary); tool subset TBD — empirical verification is a v1.1 prereq | `~/.codex/hooks.json` |
| OpenClaw (v1.1) | Native hook API | Bash + MCP | OpenClaw config |
**v1 is Claude-first.** Wedge (ii) — native-tool coverage — is confirmed on Claude Code via [the hooks reference](https://code.claude.com/docs/en/hooks). Codex and OpenClaw integration ships at v1.1 only after the wedge is proven on the primary host via B-series benchmark data. CHANGELOG for v1 makes the Claude-only scope explicit.
### Config surface
User config (`~/.config/gstack/compact.toml`):
```toml
[compact]
enabled = true
level = "normal" # minimal | normal | aggressive (caveman pattern)
exclude_commands = ["curl", "playwright"] # RTK pattern
[compact.bundle]
auto_reload_on_mtime_drift = true # hook rebuilds bundle if source rule files are newer
bundle_path = "~/.gstack/compact/rules.bundle.json"
[compact.regex]
per_rule_timeout_ms = 50 # AbortSignal budget per regex; timeout → skip rule
[compact.verifier]
enabled = true
trigger_failure_compaction = true
trigger_aggressive_reduction = false
trigger_large_no_match = false
failure_signal_fallback = true # use /FAIL|Error|Traceback|panic/ when exitCode missing
sanitization = "exact-line-match" # only append lines present verbatim in raw output
[compact.tee]
on_exit = "nonzero"
max_bytes = 1048576
redact_patterns = ["aws", "github", "gitlab", "slack", "jwt", "bearer", "ssh-private-key"]
cleanup_days = 7
[compact.benchmark]
local_only = true # hard-coded; config is documentary, cannot be changed
transcript_root = "~/.claude/projects"
output_dir = "~/.gstack/compact/benchmark"
scenario_cap = 20 # top-N clusters by aggregate output volume
```
**Intensity levels (caveman pattern):**
- **minimal:** only `filter` + `dedupe`; no truncation. Safest.
- **normal:** `filter` + `dedupe` + `truncate`. Default.
- **aggressive:** adds `group`; more savings, more edge-case risk.
### CLI surface
| Command | Purpose | Source |
|---------|---------|--------|
| `gstack compact install <host>` | Register PostToolUse hook in host config; builds `rules.bundle.json` | new |
| `gstack compact uninstall <host>` | Idempotent removal | new |
| `gstack compact reload` | Rebuild `rules.bundle.json` after editing user/project rules | new |
| `gstack compact doctor` | Detect drift / broken hook config, offer to repair | tokenjuice |
| `gstack compact gain` | Show token/dollar savings over time (per-rule breakdown) | RTK |
| `gstack compact discover` | Find commands with no matching rule, ranked by noise volume | RTK |
| `gstack compact verify <rule-id>` | Dry-run verifier on a fixture | new |
| `gstack compact list-rules` | Show effective rule set after deep-merge (built-in + user + project) | new |
| `gstack compact test <rule-id> <fixture>` | Apply a rule to a fixture and show the diff | new |
| `gstack compact benchmark` | Run B-series testbench against local transcript corpus (see Benchmark section) | new |
Escape hatch: `GSTACK_RAW=1` env var bypasses the hook entirely for the duration of a command (same pattern as tokenjuice's `--raw` flag). Hook also auto-reloads the bundle if any source rule file's mtime is newer than the bundle file.
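The mtime-drift check from decision #9 can be sketched in a few lines; `bundleIsStale` is a hypothetical name, and the real `hook.ts` would trigger an in-line rebuild when it returns true:

```typescript
import { statSync } from "node:fs";

// True when any rule source file is newer than the compiled bundle,
// or when the bundle does not exist yet.
function bundleIsStale(bundlePath: string, sourcePaths: string[]): boolean {
  let bundleMtime: number;
  try { bundleMtime = statSync(bundlePath).mtimeMs; }
  catch { return true; } // no bundle yet: always (re)build
  return sourcePaths.some((p) => {
    try { return statSync(p).mtimeMs > bundleMtime; }
    catch { return false; } // deleted source files don't force a rebuild
  });
}
```

This is the ~0.5ms/invocation stat pass that eliminates the "I edited a rule and nothing changed" footgun.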
## File layout
```
compact/
├── SKILL.md.tmpl # template; regen via `bun run gen:skill-docs`
├── src/
│ ├── hook.ts # entry point; reads stdin, writes stdout; mtime-checks bundle
│ ├── engine.ts # rule matching + reduction metadata
│ ├── apply.ts # primitive application (line-oriented streaming pipeline)
│ ├── merge.ts # deep-merge of built-in/user/project rules; honors `extends: null`
│ ├── bundle.ts # compile source rules → rules.bundle.json (install/reload)
│ ├── primitives/
│ │ ├── filter.ts
│ │ ├── group.ts
│ │ ├── truncate.ts # ring-buffered tail; safe for arbitrary input size
│ │ └── dedupe.ts
│ ├── regex-sandbox.ts # AbortSignal-bounded regex execution (50ms budget per rule)
│ ├── verifier.ts # Haiku integration (triggers + failure-signal fallback + sanitization)
│ ├── sanitize.ts # exact-line-match filter for verifier output
│ ├── tee.ts # raw-output archival with secret redaction + 7-day cleanup
│ ├── redact.ts # secret-pattern set (AWS/GitHub/GitLab/Slack/JWT/bearer/SSH)
│ ├── envelope.ts # JSON I/O contract parsing + validation
│ ├── doctor.ts # hook drift detection + repair
│ ├── analytics.ts # gain + discover queries against local metadata
│   └── cli.ts             # argv dispatch; one thin handler per subcommand
├── benchmark/ # B-series testbench (hard v1 gate)
│ └── src/
│ ├── scanner.ts # walk ~/.claude/projects/**/*.jsonl; pair tool_use × tool_result
│ ├── sizer.ts # tokens per call (ceil(len/4) heuristic); rank heavy tail
│ ├── cluster.ts # group high-leverage calls by (tool, command pattern)
│ ├── scenarios.ts # emit B1-Bn real-world scenario fixtures
│ ├── replay.ts # run compactor against scenarios; measure reduction
│ ├── pathology.ts # layer planted-bug P-cases on top of real scenarios
│ └── report.ts # dashboard: per-scenario before/after + overall reduction
├── rules/ # v1 built-in JSON rule library (13 rules)
│ ├── tests/
│ │ ├── jest.json
│ │ ├── vitest.json
│ │ ├── pytest.json
│ │ ├── cargo-test.json
│ │ ├── go-test.json
│ │ └── rspec.json
│ ├── install/
│ │ ├── npm.json
│ │ ├── pnpm.json
│ │ ├── pip.json
│ │ └── cargo.json
│ ├── git/
│ │ ├── diff.json
│ │ ├── log.json
│ │ └── status.json
│ ├── _verifier.json # verifier config (not a rule per se)
│ └── _HOLD/ # v1.1 rule families (not shipped at v1; kept for reference)
│ ├── build/
│ ├── lint/
│ └── log/
└── test/
├── unit/
├── golden/
├── fuzz/ # P-series — v1 gate subset only (P1/P3/P6/P8/P15/P18/P26/P28/P30)
├── cross-host/ # v1: claude-code.test.ts only; codex/openclaw stub files
├── adversarial/ # R-series — grows with shipped bugs
├── benchmark/ # B-series scenario fixtures + expected reduction ranges
├── fixtures/ # version-stamped golden inputs (toolVersion: frontmatter)
└── evals/
```
## Testing Strategy
The test plan is comprehensive by design. Shipping into a space where the 28K-star incumbent has three years of regex battle-scars, with our wedges (Haiku verifier + native-tool coverage) introducing new failure surfaces, means we get ONE shot at "the compactor made my agent dumb" going viral. Zero appetite for that.
### Test tiers
| Tier | Cost | Frequency | Blocks merge |
|------|------|-----------|--------------|
| Unit | free, <1s | every PR | yes |
| Golden file (with `toolVersion:` frontmatter) | free, <1s | every PR | yes |
| Rule schema validation | free, <1s | every PR | yes |
| Fuzz (P-series gate subset: P1/P3/P6/P8/P15/P18/P26/P28/P30) | free, <10s | every PR | yes |
| Cross-host E2E — Claude Code only at v1 | free, ~1min | every PR (gate tier) | yes |
| E2E with verifier (mocked Haiku) | free, ~15s | every PR | yes |
| E2E with verifier (real Haiku) | paid, ~$0.10/run | PR touching verifier files | yes |
| **B-series benchmark (real-world scenarios)** | **free, ~2min** | **pre-release gate** | **yes (hard gate for v1)** |
| Token-savings eval (E1-E4 synthetic) | paid, ~$4/run | periodic weekly | no (informational) |
| Adversarial regression (R-series) | free, <5s | every PR | yes |
| Tool-version drift warning | free, <1s | every PR | warning only |
Test file layout:
```
compact/test/
├── unit/
│ ├── engine.test.ts # rule matching + primitive application
│ ├── primitives.test.ts # filter / group / truncate / dedupe
│ ├── envelope.test.ts # JSON input/output contract
│ ├── triggers.test.ts # verifier trigger evaluation
│ └── verifier.test.ts # Haiku call (mocked)
├── golden/
│ ├── tests/ # one fixture per test runner
│ │ ├── jest-success.input.txt
│ │ ├── jest-success.expected.txt
│ │ ├── jest-fail.input.txt
│ │ ├── jest-fail.expected.txt
│ │ └── ... (vitest, pytest, cargo-test, go-test, rspec)
│ ├── install/
│ ├── git/
│ ├── build/
│ ├── lint/
│ └── log/
├── fuzz/
│ └── pathological.test.ts # P-series
├── cross-host/
│ ├── claude-code.test.ts
│ ├── codex.test.ts
│ └── openclaw.test.ts
├── adversarial/
│ └── regression.test.ts # R-series; past bugs that must never recur
├── fixtures/
│ └── {tool}/ # shared raw output fixtures
└── evals/
└── token-savings.eval.ts # periodic-tier; measures real reduction
```
### G-series: good cases (must produce expected reduction)
| ID | Scenario | Expected reduction |
|----|----------|-------------------|
| G1 | `jest` 47 passing tests, clean run | 150+ lines → ≤10 lines |
| G2 | `jest` 47 tests with 2 failures | 200+ lines → keep both failures + summary |
| G3 | `vitest` run with `--reporter=verbose` | 300+ lines → ≤15 lines |
| G4 | `pytest` collection then run | preserve failure tracebacks |
| G5 | `cargo test` with one panic | panic location preserved verbatim |
| G6 | `go test -v` with 200 subtests passing | collapse to `PASS: 200 subtests` |
| G7 | `git diff` on a file with 2 hunks in 500 lines of context | keep hunks, drop context |
| G8 | `git log -50` | preserve SHA + subject + author, drop body |
| G9 | `git status` with 30 modified files | group by directory |
| G10 | `pnpm install` fresh | final count + warnings; drop resolved packages |
| G11 | `pip install -r requirements.txt` | drop download progress; keep final install list + errors |
| G12 | `cargo build` success | drop compilation progress; keep final target |
| G13 | `docker build` success | drop layer pulls; keep final image digest |
| G14 | `tsc --noEmit` clean | compact to `tsc: 0 errors` |
| G15 | `tsc --noEmit` with 3 errors | keep all 3 errors with location |
| G16 | `eslint .` clean | compact to `eslint: 0 problems` |
| G17 | `eslint .` with violations | group by rule; preserve location + fix suggestion |
| G18 | `docker logs -f` with 1000 repeating lines | dedupe with count: `[last message repeated 973 times]` |
| G19 | `kubectl get pods -A` | group by namespace |
| G20 | `ls -la` deep tree | directory grouping (RTK pattern) |
| G21 | `find . -type f` 10K files | group by extension with counts |
| G22 | `grep -r "foo" .` with 500 hits | cap at 50; suffix `[... 450 more matches; use --ripgrep for full]` |
| G23 | `curl -v https://api.example.com` | strip verbose headers; keep response body |
| G24 | `aws ec2 describe-instances` 50 instances | columnar summary |
### P-series: pathological cases (must NOT break the agent)
These turn "nice feature" into "catastrophic regression" if we get any of them wrong.
| ID | Scenario | Required behavior |
|----|----------|-------------------|
| P1 | Binary garbage in output (non-UTF8 bytes) | Pass through unchanged; don't crash |
| P2 | ANSI escape explosion (10K+ codes) | Strip cleanly, don't choke regex |
| P3 | Empty output (`""`) | Pass through empty; do NOT inject header |
| P4 | Stdout+stderr interleaved | Rule matches across both streams |
| P5 | Truncated output (SIGPIPE mid-stream) | Don't mis-compact partial output |
| P6 | **Failed test, critical stack frame at line 4 of 200** | Must NOT filter the frame (the RTK-killer case) |
| P7 | Exit 0 but `ERROR:` in output | Rule must not trust exit code alone |
| P8 | Output contains AWS key / bearer token / password | Tee file must NOT be world-readable; redact in compacted output |
| P9 | Single-line minified JS error (40KB one line) | Truncate to first 1KB; append `[... truncated ...]` |
| P10 | Unicode (emoji, RTL, combining chars, CJK) | Byte-safe truncation; don't split codepoints |
| P11 | Two rules match same command | Deterministic priority: longest `match.commands` prefix wins; tie → rule ID alphabetical |
| P12 | Rule's compacted output matches another rule's pattern | No recursive application; hook runs once per tool call |
| P13 | Command contains embedded newlines in quoted arg | Rule doesn't misparse args |
| P14 | Concurrent tool calls (parallel Bash invocations) | No shared mutable state in hook; each call is isolated |
| P15 | Hook execution >5s | Pass through raw; emit `meta.timedOut: true` |
| P16 | Haiku API offline/rate-limited | Skip verifier silently; use pure rule output |
| P17 | Haiku returns malformed JSON | Skip verifier; do NOT feed raw response to agent |
| P18 | Haiku response contains prompt-injection (`"Ignore all prior instructions..."`) | Sanitize: only append lines that appear verbatim in the original raw output (exact line match) |
| P19 | 1M-line output | Stream-process, cap memory at 64MB; truncate with clear marker |
| P20 | Rapid-fire: 50 tool calls / sec | Hook latency stays within the revised p99 budgets (<80ms macOS ARM, <60ms Linux) |
| P21 | Command with shell redirects (`cmd >file 2>&1`) | Match on the underlying command name, not the redirect wrapper |
| P22 | Deeply nested quotes/escapes in command string | Robust arg parser; no shell injection possible |
| P23 | NUL bytes in output | Strip safely; a NUL must not terminate/truncate the rest of the output |
| P24 | Command that exits then writes more to stderr after | Hook receives final combined output; handles gracefully |
| P25 | Read-only filesystem / no tee write permission | Degrade gracefully; still emit compacted output; record `meta.teeFailed: true` |
| P26 | User's rule JSON is malformed | Skip that rule; emit warning to stderr; don't break hook |
| P27 | Rule references a non-existent primitive field | Ignore unknown field; apply rest of rule |
| P28 | Rule regex has catastrophic backtracking | Per-rule 50ms abort budget (decision #7); on timeout, skip the rule and record `meta.regexTimedOut` |
| P29 | Exit code 137 (OOM kill) | Rule treats same as generic failure; preserves full output |
| P30 | Haiku returns lines NOT present in raw output (hallucination) | Drop hallucinated lines; keep only exact line matches |
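P11's deterministic priority (longest `match.commands` prefix wins; ties fall back to alphabetical rule ID) is mechanical enough to sketch; `pickRule` and the `RuleRef` shape are illustrative, not the actual `engine.ts` types:

```typescript
interface RuleRef { id: string; commands: string[] }

// Longest matching command prefix wins; equal lengths tie-break on
// alphabetically smaller rule ID, so the result is fully deterministic.
function pickRule(command: string, rules: RuleRef[]): RuleRef | null {
  let best: { rule: RuleRef; len: number } | null = null;
  for (const rule of rules) {
    for (const prefix of rule.commands) {
      if (!command.startsWith(prefix)) continue;
      if (
        best === null ||
        prefix.length > best.len ||
        (prefix.length === best.len && rule.id < best.rule.id)
      ) best = { rule, len: prefix.length };
    }
  }
  return best?.rule ?? null;
}
```

So `bun test billing.test.ts` resolves to a rule matching `bun test` over one matching only `bun`, regardless of rule-file ordering.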
### CH-series: cross-host E2E
Run each scenario on each supported host. Same input, same expected output. If a host does not support a matcher, the test is marked `skip-on-{host}` with a comment linking the upstream limitation.
| ID | Scenario | Hosts |
|----|----------|-------|
| CH1 | Install hook via `gstack compact install <host>` | Claude Code, Codex, OpenClaw |
| CH2 | Uninstall hook is idempotent | All |
| CH3 | Re-install doesn't duplicate entries | All |
| CH4 | Hook co-exists with user's other PostToolUse hooks | All |
| CH5 | Hook fires on Bash tool | All |
| CH6 | Hook fires on Read tool | Claude Code (confirmed); Codex/OpenClaw verify-then-require |
| CH7 | Hook fires on Grep tool | Same as CH6 |
| CH8 | Hook fires on Glob tool | Same as CH6 |
| CH9 | Hook fires on MCP tool (`mcp__*` matcher) | Claude Code; verify on others |
| CH10 | Config precedence: project > user > built-in | All |
| CH11 | `GSTACK_RAW=1` env var bypasses hook | All |
| CH12 | Rule ID override works (project rule replaces built-in) | All |
| CH13 | `gstack compact doctor` detects drift on each host | All |
| CH14 | Hook error does not crash the agent session | All |
Implementation note: cross-host tests reuse the fixture corpus from the `golden/` tree; the harness wraps each fixture in a host-specific hook invocation envelope and asserts the output is byte-identical across hosts (modulo the `host` field).
### V-series: verifier tests (paid)
| ID | Scenario | Expected |
|----|----------|----------|
| V1 | Rule reduces 200-line test output to 5 lines, exit=1 | Verifier fires (failure + >50% reduction), appends any missing critical lines |
| V2 | Rule reduces 10-line output to 9 lines, exit=1 | Verifier does NOT fire (reduction too small) |
| V3 | Rule reduces 200-line output to 5 lines, exit=0 | Verifier does NOT fire (success path, default config) |
| V4 | `aggressiveReduction` trigger enabled, 300 lines → 20 lines, exit=0 | Verifier fires |
| V5 | `GSTACK_COMPACT_VERIFY=1` env var set | Verifier fires once for that call |
| V6 | `ANTHROPIC_API_KEY` missing | Verifier silently skipped; raw rule output returned |
| V7 | Verifier mocked to return "NONE" | Output identical to pure-rule path |
| V8 | Verifier mocked to return prompt injection | Injection discarded; only verbatim-matched lines appended |
| V9 | Verifier mocked to time out >5s | Skipped; `meta.verifierTimedOut: true` |
| V10 | Verifier mocked to return 500 error | Skipped; rule output returned |
### R-series: adversarial regression
Every bug caught after v1 ship gets a permanent R-series test. Starts empty; grows with scars. Template:
```
R{N}: {commit-sha} — {1-line summary}
Scenario: {reproducer}
Fix: {PR link}
```
### Performance budgets (enforced in CI; revised for realistic Bun cold-start)
| Metric | Target | Hard limit |
|--------|--------|-----------|
| Hook overhead macOS ARM (verifier disabled) | <30ms p50 | <80ms p99 |
| Hook overhead Linux (verifier disabled) | <20ms p50 | <60ms p99 |
| Hook overhead (verifier fires) | <600ms p50 | <2s p99 |
| Bundle deserialize (rules.bundle.json) | <2ms | <10ms |
| mtime drift check (stat of source files) | <0.5ms | <3ms |
| Single-regex execution budget (per rule) | <5ms | <50ms (hard abort) |
| Memory per hook invocation (line-streamed) | <16MB typical | <64MB max |
| Total rule-payload size on disk (source files) | <5KB | <15KB |
| Compiled bundle size on disk | <25KB | <80KB |
Daemon mode is a v2 optimization. If B-series benchmark on the author's corpus shows cold-start meaningfully hurts session-total savings (e.g., total hook overhead >5% of saved tokens' wall time), promote to v1.1.
### B-series real-world benchmark testbench (hard v1 gate)
**Why it exists.** Every competing compactor ships with hand-picked fixture numbers. B-series proves the compactor works on the user's *actual* coding sessions before they enable the hook. It's both the ship-gate and the marketing artifact.
**Architecture** (components in `compact/benchmark/src/`):
```
┌──────────────────────────────────────────────────────────────┐
│ 1. SCAN scanner.ts walks ~/.claude/projects/**/*.jsonl │
│ → pairs tool_use × tool_result blocks │
│ → emits {tool, command, outputBytes, lineCount, │
│ estimatedTokens, sessionId, timestamp} │
├──────────────────────────────────────────────────────────────┤
│ 2. RANK sizer.ts sorts corpus by estimatedTokens desc │
│ → cluster.ts groups by (tool, command-pattern) │
│ → identifies heavy-tail: which 10% of calls │
│ produced 80% of the tokens? │
├──────────────────────────────────────────────────────────────┤
│ 3. SCENARIO scenarios.ts emits fixture files: │
│ B1_bun_test_heavy.jsonl │
│ B2_git_diff_huge.jsonl │
│ B3_tsc_errors_production.jsonl │
│ B4_pnpm_install_fresh.jsonl ... (one per │
│ high-leverage cluster, up to ~20 scenarios) │
├──────────────────────────────────────────────────────────────┤
│ 4. REPLAY replay.ts runs compactor against each scenario, │
│ measures token reduction + diff of dropped lines│
│ → per-rule reduction numbers │
│ → per-scenario before/after token counts │
├──────────────────────────────────────────────────────────────┤
│ 5. PATHOLOGY pathology.ts injects planted critical lines │
│ (line 4 of 200 in a failing test fixture) into │
│ real B-scenarios. Confirms verifier restores │
│ them. Real data + real threats = real proof. │
├──────────────────────────────────────────────────────────────┤
│ 6. REPORT report.ts emits HTML + JSON dashboard to │
│ ~/.gstack/compact/benchmark/latest/ │
│ "On YOUR 30 days of Claude Code data, gstack │
│ compact would save X tokens in Y scenarios." │
└──────────────────────────────────────────────────────────────┘
```
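A minimal sketch of the SCAN step's pairing logic. The block field names (`tool_use`, `tool_result`, `tool_use_id`) follow the Anthropic message shape, but the exact transcript layout is an assumption that must be verified against real `~/.claude/projects/` JSONL before relying on it:

```typescript
interface PairedCall {
  tool: string;
  command?: string;
  outputBytes: number;
  estimatedTokens: number;
}

function pairToolCalls(jsonl: string): PairedCall[] {
  const pending = new Map<string, { tool: string; command?: string }>();
  const pairs: PairedCall[] = [];
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue;
    let event: any;
    try {
      event = JSON.parse(line);
    } catch (e) {
      if (!(e instanceof SyntaxError)) throw e;
      continue; // torn/partial line: skip it, don't abort the scan
    }
    for (const block of event?.message?.content ?? []) {
      if (block.type === "tool_use") {
        pending.set(block.id, { tool: block.name, command: block.input?.command });
      } else if (block.type === "tool_result" && pending.has(block.tool_use_id)) {
        const call = pending.get(block.tool_use_id)!;
        pending.delete(block.tool_use_id);
        const text =
          typeof block.content === "string" ? block.content : JSON.stringify(block.content);
        pairs.push({
          ...call,
          outputBytes: new TextEncoder().encode(text).length,
          estimatedTokens: Math.ceil(text.length / 4), // same char-ratio heuristic as the port
        });
      }
    }
  }
  return pairs;
}
```

Everything downstream (RANK, SCENARIO, REPLAY) consumes this paired shape, which is why pre-implementation task #3 builds the scanner first.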
**v1 ship gate (hard):**
- ≥15% total-token reduction across the aggregated scenario corpus on the author's own 30-day transcript set.
- Zero critical-line loss on planted-bug scenarios (every planted stack frame must survive either the rule or the verifier).
- No scenario regresses to <5% reduction under the new rules (catches rule changes that quietly stop compacting a scenario they used to handle).
**Privacy (non-negotiable):**
- Reads `~/.claude/projects/**/*.jsonl` locally only. Never uploads. Never shares. Never logs scenarios to telemetry.
- Output files live under `~/.gstack/compact/benchmark/` with mode `0600`.
- The command prints a confirmation banner: *"Scanning local transcripts at ~/.claude/projects/ (local-only; nothing leaves this machine)."*
- Any future community corpus is a separate v2 workstream built from hand-contributed, secret-scanned fixtures on OSS projects.
**Ports from analyze_transcripts (TypeScript reimplementation; not a subprocess call):**
- JSONL parsing + tool_use/tool_result pairing pattern (from `event_extractor.rb`).
- Token estimate `ceil(len/4)` (same char-ratio heuristic; sufficient for ranking).
- Event-type taxonomy (`bash_command`, `file_read`, `test_run`, `error_encountered`) for scenario clustering.
- Stress-fixture generation pattern for pathology layering.
**What we do NOT port:** behavioral scoring, pgvector embeddings, decision-exchange graphs, velocity metrics, the Rails/ActiveRecord layer. Out of scope; not what we're measuring.
### Synthetic token-savings evals (E-series, periodic/informational only)
Retained from the original plan but now informational-only because B-series is the real gate.
- **E1:** simulated 30-min coding session on a medium TypeScript project. Measure total tokens with/without gstack compact enabled. Target: ≥15% reduction.
- **E2:** same session at `level=aggressive`. Target: ≥25% reduction, zero test-failure increase.
- **E3:** same session with verifier on `failureCompaction` only. Verifier fire rate ≤10% of tool calls.
- **E4:** adversarial — inject a planted bug in a test output and confirm the verifier restores the critical stack frame.
### Test corpus sourcing
For each rule family, capture 3+ real outputs:
1. Run the tool against a real project (gstack itself for TS; popular OSS for Rust/Go/Python).
2. Capture stdout+stderr+exit code into a fixture file with `toolVersion:` frontmatter (e.g., `jest@29.7.0`).
3. Hand-author the expected compacted output once.
4. Golden file test: rule application must produce byte-identical output.
5. CI drift warning: if the installed tool version differs from the fixture's `toolVersion:`, CI warns (it does not fail). The drift-warning dashboard is checked pre-release.
Draw from:
- tokenjuice's fixture directory patterns (`tests/fixtures/`)
- RTK's per-command examples (their README lists real before/after metrics; verify independently)
- gstack's own test output (eat our own dog food)
- Real failure archives from `~/.gstack/compact/tee/` (once volunteers contribute)
- **B-series real-world scenarios are the primary corpus for reduction measurements.**
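Step 4's byte-identical comparison can be sketched as a small helper that reports the first divergent line, which keeps CI failures readable (file loading is kept out of the helper so it stays trivially testable):

```typescript
// Returns null when actual matches the golden file byte-for-byte,
// otherwise a human-readable description of the first divergence.
function goldenDiff(actual: string, expected: string): string | null {
  if (actual === expected) return null;
  const a = actual.split("\n");
  const e = expected.split("\n");
  for (let i = 0; i < Math.max(a.length, e.length); i++) {
    if (a[i] !== e[i]) {
      return `line ${i + 1}: got ${JSON.stringify(a[i])}, want ${JSON.stringify(e[i])}`;
    }
  }
  // Defensive: line-wise equal but raw strings differ (should not happen).
  return `byte-length mismatch: ${actual.length} vs ${expected.length}`;
}
```

Byte-identical (rather than fuzzy) comparison is deliberate: any formatter or tool-version drift fails loudly and gets triaged via the `toolVersion:` warning.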
## Pattern adoption table
Concrete patterns borrowed from the competitive landscape:
| From | Adopt as | Why |
|------|----------|-----|
| RTK | 4 reduction primitives (filter/group/truncate/dedupe) as JSON rule verbs | Table stakes for a serious compactor |
| RTK | `gstack compact tee` for failure-mode raw save | Better than the original `onFailure.preserveFull` design |
| RTK | `gstack compact gain` + `gstack compact discover` | Trust + continuous improvement |
| RTK | `exclude_commands` per-user blocklist | Must-have config |
| tokenjuice | JSON envelope contract for hook I/O | Clean machine adapter |
| tokenjuice | `gstack compact doctor` | Hooks drift; self-repair matters |
| caveman | Intensity levels (minimal/normal/aggressive) | User-tunable safety/savings knob |
| claude-token-efficient | Rules-file size budget (<5KB total) | Don't bloat context |
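Read together, the adopted patterns imply a rule shape roughly like the following. All field names here are assumptions for illustration; the real schema ships with the bundle compiler:

```typescript
// The four RTK-derived primitives as JSON rule verbs, plus caveman-style
// intensity gating. Purely illustrative field names.
type RuleVerb =
  | { verb: "filter"; dropMatching: string }                 // drop noise lines
  | { verb: "group"; pattern: string; summary: string }      // collapse repeats into one line
  | { verb: "truncate"; keepHead: number; keepTail: number } // keep edges, elide the middle
  | { verb: "dedupe"; window: number };                      // drop near-duplicate lines

interface CompactRule {
  id: string;                                    // e.g. "tests/jest-pass-noise"
  match: { tool: string; commandPattern?: string };
  minLevel: "minimal" | "normal" | "aggressive"; // caveman-style intensity gate
  verbs: RuleVerb[];
}

const example: CompactRule = {
  id: "tests/jest-pass-noise",
  match: { tool: "Bash", commandPattern: "jest|vitest" },
  minLevel: "normal",
  verbs: [
    { verb: "filter", dropMatching: "^PASS " },
    { verb: "truncate", keepHead: 20, keepTail: 40 },
  ],
};
```

The discriminated-union shape keeps the bundle compiler's validation simple: an unknown `verb` fails at schema-validation rather than at hook time.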
## Rollout plan
**ALL PHASES TABLED pending Anthropic `updatedBuiltinToolOutput` API.** See Status section at the top of this doc. The rollout below is the intended sequence if/when the API ships and this design un-tables.
### Un-tabling checklist (do in order when the API arrives)
1. **Confirm the new API's shape.** Read the updated Claude Code hooks reference. Capture a real envelope containing the new output-replacement field for Bash, Read, Grep, Glob. Record in `docs/designs/GCOMPACTION_envelope.md`.
2. **Re-validate the wedge.** Does the new API cover Read/Grep/Glob (do they fire `PostToolUse` now), or just Bash/WebFetch? If Bash-only, wedge (ii) stays dead and the product needs a new pitch before implementation.
3. **Re-run `/plan-eng-review`** against the revised plan with the new API. Most of the 15 locked decisions should carry forward; adjust the Architecture data-flow and any envelope-dependent decisions.
4. **Re-run `/codex review`** against the revised plan. The prior BLOCK verdict's concerns about hook substitution disappear once the API exists; remaining criticals (B-series privacy, regex DoS, JSON-envelope streaming) still apply.
5. **Execute the original rollout below.**
### Original rollout (preserved for un-tabling)
Each tier blocks on the prior tier passing all of its gate-tier tests. Claude-first: Codex and OpenClaw land at v1.1, after the wedge is proven on the primary host.
1. **v0.0 (1 day):** rule engine + 4 primitives + line-oriented streaming pipeline + deep-merge + bundle compiler + envelope contract + golden tests for `tests/*` family only. No host integration yet. Measure savings on offline fixtures.
2. **v0.1 (1 day):** Claude Code hook integration + `gstack compact install` + mtime-based auto-reload. Ship as opt-in; off by default. Ask 10 gstack power users to try it; collect feedback.
3. **v0.5 (1 day):** B-series benchmark testbench (`compact/benchmark/`). Ship `gstack compact benchmark` so users can measure on their own data. Collect self-reported reduction numbers from dogfooders, anonymous from the start (the tool itself uploads nothing).
4. **v1.0 (1 day):** verifier layer with `failureCompaction` trigger on by default + exact-line-match sanitization + layered exitCode/pattern fallback + expanded tee redaction set. **Hard ship gate:** B-series on the author's 30-day local corpus shows ≥15% total reduction AND zero critical-line loss on planted bugs. Publish CHANGELOG entry leading with wedge framing (Claude Code only at v1).
5. **v1.1 (+1 day):** Codex + OpenClaw hook integration. Cross-host E2E suite green. Build/lint/log rule families land with `gstack compact discover`-derived priorities.
6. **v1.2+:** expand rule families, community rule contribution workflow, community-corpus benchmark (hand-authored public fixtures, separate from local B-series).
## Risk analysis
| Risk | Severity | Mitigation |
|------|----------|------------|
| RTK adds an LLM verifier in response | Low | Creator is vocal about zero-dependency Rust. Ship first, build the pattern library. |
| Platform compaction subsumes us (Anthropic Compaction API in Claude Code) | Medium | We operate at a different layer (per-tool output vs whole-context). Position as complementary. |
| Rules drop something critical → "compactor made my agent dumb" | High | B-series real-world benchmark as hard ship gate; tee mode always available; verifier default-on for failures; exact-line-match sanitization. |
| Haiku cost creep (triggers fire more than expected) | Medium | E3 eval + B-series fire-rate metric; cost visible in `gstack compact gain`; per-session rate cap in v1.1 if rate >10%. |
| Rule maintenance debt (jest/vitest output formats change) | Medium | `toolVersion:` fixture frontmatter + CI drift warning; community rule PRs; `discover` flags bypassing commands. |
| Rules file bloats context | Low | CI-enforced <5KB source + <25KB compiled bundle budget; per-rule size warning at schema-validation. |
| Regex DoS blocks the agent | Medium | 50ms AbortSignal budget per rule; timeout logged to `meta.regexTimedOut`; stale rules quarantined on repeated failure. |
| Bundle staleness silently breaks user edits | Low | mtime-check on every hook invocation auto-rebuilds; `gstack compact reload` is a backup not a requirement. |
| Benchmark leaks user's private data | High | Local-only by construction: no network call, mode-0600 output, explicit banner at runtime. Privacy review before v1 ship. |
## Open questions
1. ~~Does Codex's PostToolUse hook support matchers for Read/Grep/Glob?~~ (Deferred to v1.1 — Claude-first at v1.)
2. ~~Does OpenClaw's hook API support PostToolUse specifically?~~ (Deferred to v1.1.)
3. Should the verifier model be pinned, or version-tracked like gstack's other AI calls? (Inclined to pin `claude-haiku-4-5-20251001` and bump explicitly in CHANGELOG.)
4. ~~Built-in secret-redaction regex set for tee files~~ **(resolved: expanded set — AWS/GitHub/GitLab/Slack/JWT/bearer/SSH-private-key. See decision #10.)**
5. Should `gstack compact discover` propose auto-generated rules via Haiku? (Deferred to v2; skill-creep risk.)
6. **New:** Does Claude Code's PostToolUse envelope include `exitCode`? (Still needs empirical verification per pre-implementation task #1; system now has a layered fallback regardless.)
7. **New:** What's the right scenario-count cap for B-series? `cluster.ts` can produce 5-50 scenarios depending on heavy-tail shape. Plan: cap at the top 20 clusters by aggregate output volume.
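The layered exitCode/pattern fallback behind question 6 reduces to a few lines. The pattern set below is illustrative; the real trigger list would come from the rule families:

```typescript
// Layer 1: trust exitCode when the envelope provides it.
// Layer 2: fall back to failure-shaped output patterns when it doesn't.
function looksLikeFailure(exitCode: number | undefined, output: string): boolean {
  if (typeof exitCode === "number") return exitCode !== 0;
  return /\b(FAIL|ERROR|Traceback)\b|panic:/.test(output);
}
```

Keeping layer 1 authoritative means that if the envelope verification (pre-implementation task #1) confirms `exitCode` is present, the pattern layer never fires on Bash output at all.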
## Pre-implementation assignment (must complete before coding)
1. **Verify Claude Code's PostToolUse envelope contents empirically.** Ship a no-op hook; confirm `exitCode`, `command`, `argv`, `combinedText` are all present. This is the pivot for wedge (ii) native-tool coverage AND for the failureCompaction trigger. Output: `docs/designs/GCOMPACTION_envelope.md` with real captured envelopes for Bash + Read + Grep + Glob.
2. **Read RTK's rule definitions** (`ARCHITECTURE.md`, `src/rules/`) and write a 1-paragraph summary of which of the 4 primitives they handle best. Inform our v1 rule set. This is the Search Before Building layer.
3. **Port analyze_transcripts JSONL parser to TypeScript.** `compact/benchmark/src/scanner.ts`. Write a quick-look output that lists the top-50 noisiest tool calls on the author's `~/.claude/projects/`. Confirms the testbench premise before we build the replay loop. This is the B-series foundation.
4. **Write the CHANGELOG entry FIRST.** Target sentence: *"Every tool in your agent's toolbox on Claude Code now produces less noise — test runners, git diffs, package installs — with an intelligent Haiku safety net that restores critical stack frames when our rules over-compact, and a local benchmark that proves the savings on your actual 30 days of coding sessions. Codex + OpenClaw land in v1.1."* If we cannot write that sentence honestly, the wedge isn't there yet.
5. **Ship a rule-only v0** (no Haiku verifier, no benchmark). Measure real token savings with current gstack evals + early B-series prototype. If <10% on local corpus, the whole premise is weaker than claimed — iterate the rules before adding the verifier on top.
## License & attribution
gstack ships under MIT. To keep the license clean for downstream users, this project follows a strict clean-room policy for everything borrowed from the competitive landscape:
- **Every project referenced above is permissive-licensed** (MIT or Apache-2.0). No AGPL, GPL, SSPL, or other copyleft exposure.
- RTK (rtk-ai/rtk): **Apache-2.0** — MIT-compatible; Apache patent grant is a bonus for us.
- tokenjuice, caveman, claude-token-efficient, token-optimizer-mcp, sst/opencode: **MIT**.
- **Patterns, not code.** We read these projects to understand what they solved and why. We implement independently in TypeScript inside `compact/src/`. We do not copy source files, translate source files line-for-line, or lift test fixtures verbatim.
- **Attribution.** Where a pattern is directly borrowed (the 4 primitives from RTK, the JSON envelope from tokenjuice, intensity levels from caveman, rules-file size budget from claude-token-efficient), we credit the source inline in comments and in the "Pattern adoption table" above. The project's `README` and `NOTICE` file (if we add one) list the inspirations.
- **Fixture sourcing.** Golden-file fixtures come from running real tools against real projects — they are our own captures, not imported from RTK or tokenjuice. This keeps the test corpus free of license-tangled content.
- **Forbidden sources.** Before adding any new reference project, run `gh api repos/OWNER/REPO --jq '.license'` and verify the license key is one of: `mit`, `apache-2.0`, `bsd-2-clause`, `bsd-3-clause`, `isc`, `cc0-1.0`, `unlicense`. If the project has no license field, treat it as "all rights reserved" and do not draw from it. Reject `agpl-3.0`, `gpl-*`, `sspl-*`, and any custom or source-available license.
CI enforcement: a `scripts/check-references.ts` script parses `docs/designs/GCOMPACTION.md` for GitHub URLs and re-runs the license check, failing if any referenced project's license moves off the allowlist.
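The core of that script is just URL extraction plus an allowlist check; the `gh api` call is elided here, so this sketch covers only the pure parts:

```typescript
const ALLOWED_LICENSES = new Set([
  "mit", "apache-2.0", "bsd-2-clause", "bsd-3-clause", "isc", "cc0-1.0", "unlicense",
]);

// Pull unique owner/repo slugs out of the design doc's GitHub URLs.
function extractRepos(markdown: string): string[] {
  const repos = new Set<string>();
  for (const m of markdown.matchAll(/github\.com\/([\w.-]+\/[\w.-]+)/g)) {
    repos.add(m[1].replace(/\.git$/, ""));
  }
  return [...repos];
}

// A missing license field is treated as all-rights-reserved: forbidden.
function licenseViolation(licenseKey: string | null): boolean {
  return licenseKey === null || !ALLOWED_LICENSES.has(licenseKey);
}
```

Each extracted slug is then checked against `gh api repos/OWNER/REPO --jq '.license.key'`, and CI fails on the first `licenseViolation`.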
## References
- [RTK (Rust Token Killer) — rtk-ai/rtk](https://github.com/rtk-ai/rtk)
- [RTK issue #538 — native-tool gap](https://github.com/rtk-ai/rtk/issues/538)
- [tokenjuice — vincentkoc/tokenjuice](https://github.com/vincentkoc/tokenjuice)
- [caveman — juliusbrussee/caveman](https://github.com/juliusbrussee/caveman)
- [claude-token-efficient — drona23](https://github.com/drona23/claude-token-efficient)
- [token-optimizer-mcp — ooples](https://github.com/ooples/token-optimizer-mcp)
- [6-Layer Token Savings Stack — doobidoo gist](https://gist.github.com/doobidoo/e5500be6b59e47cadc39e0b7c5cd9871)
- [Claude Code hooks reference](https://code.claude.com/docs/en/hooks)
- [Chroma context rot research](https://research.trychroma.com/context-rot)
- [Morph: Why LLMs Degrade as Context Grows](https://www.morphllm.com/context-rot)
- [Anthropic Opus 4.6 Compaction API — InfoQ](https://www.infoq.com/news/2026/03/opus-4-6-context-compaction/)
- [OpenAI compaction docs](https://developers.openai.com/api/docs/guides/compaction)
- [Google ADK context compression](https://google.github.io/adk-docs/context/compaction/)
- [LangChain autonomous context compression](https://blog.langchain.com/autonomous-context-compression/)
- [sst/opencode context management](https://deepwiki.com/sst/opencode/2.4-context-management-and-compaction)
- [DEV: Deterministic vs. LLM Evaluators — 2026 trade-off study](https://dev.to/anshd_12/deterministic-vs-llm-evaluators-a-2026-technical-trade-off-study-11h)
- [MadPlay: RTK 80% token reduction experiment](https://madplay.github.io/en/post/rtk-reduce-ai-coding-agent-token-usage)
- [Esteban Estrada: RTK 70% Claude Code reduction](https://codestz.dev/experiments/rtk-rust-token-killer)
**End of GCOMPACTION.md canonical section.** On plan approval, everything above is copied verbatim to `docs/designs/GCOMPACTION.md` as a **tabled design artifact**. No code is written; no hook is installed; no CHANGELOG entry is added. The doc exists so a future sprint can unblock quickly when Anthropic ships the built-in-tool output-replace API.
+84
View File
@@ -0,0 +1,84 @@
# Design: slop-scan integration in /review and /ship
Status: deferred
Created: 2026-04-09
Depends on: slop-diff script (scripts/slop-diff.ts, already landed)
## Problem
slop-scan findings are only visible if you run `bun run slop:diff` manually. They
should surface automatically during code review and shipping, the same way SQL safety
and trust boundary checks do.
## Integration points
### /review (Step 4, after checklist pass)
Run `bun run slop:diff` after the critical/informational checklist pass. Show new
findings inline with other review output:
```
Pre-Landing Review: 3 issues (1 critical, 2 informational)
AI Slop: +2 new findings, -0 removed
browse/src/new-feature.ts
defensive.empty-catch: 2 locations
line 42: empty catch, boundary=filesystem
line 87: empty catch, boundary=process
```
Classification: INFORMATIONAL (never blocks merge, just surfaces the pattern).
Fix-First heuristic applies: if the finding is an empty catch around a file op,
auto-fix with `safeUnlink()`. If it's a catch-and-log in extension code, skip
(that's the correct pattern per CLAUDE.md guidelines).
### /ship (Step 3.5, pre-landing review + PR body)
Same integration as /review. Additionally, show a one-line summary in the PR body:
```markdown
## Pre-Landing Review
- 2 issues auto-fixed, 0 needs input
- AI Slop: +0 new / -3 removed ✓
```
### Review Readiness Dashboard
Do NOT add a row. Slop is a diagnostic on the diff, not a review that gets "run"
independently. It shows up inside Eng Review output, not as its own dashboard entry.
## What to auto-fix vs what to skip
Follow CLAUDE.md "Slop-scan" section. Summary:
**Auto-fix (genuine quality improvements):**
- Empty catch around `fs.unlinkSync` → replace with `safeUnlink()`
- Empty catch around `process.kill` → replace with `safeKill()`
- `return await` with no enclosing try → remove `await`
- Untyped catch around URL parsing → add `instanceof TypeError` check
**Skip (correct patterns that slop-scan flags):**
- `.catch(() => {})` on fire-and-forget browser ops (page.close, bringToFront)
- Catch-and-log in Chrome extension code (uncaught errors crash extensions)
- `safeUnlinkQuiet` in shutdown/emergency paths (swallowing all errors is correct)
- Pass-through wrappers that delegate to active session (API stability layer)
## Implementation notes
- `scripts/slop-diff.ts` already handles the heavy lifting (worktree-based base
comparison, line-number-insensitive fingerprinting, graceful fallback)
- The review/ship skills run bash blocks. Integration is: run the script, parse
the output, include in the review findings
- If slop-scan is not installed (`npx slop-scan` fails), skip silently
- The script exits 0 always (diagnostic, never gates)
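As a hedged sketch, the PR-body line could be derived from a parsed summary like this. The summary shape is an assumption about slop-diff's output for illustration, not its real contract:

```typescript
interface SlopSummary { added: number; removed: number }

// null means slop-scan wasn't installed or the diff couldn't run: skip silently,
// matching the "diagnostic, never gates" contract above.
function formatSlopLine(s: SlopSummary | null): string {
  if (s === null) return "";
  const clean = s.added === 0 ? " ✓" : "";
  return `- AI Slop: +${s.added} new / -${s.removed} removed${clean}`;
}
```

The checkmark only appears at zero new findings, mirroring the PR-body example in the /ship section.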
## Effort estimate
| Task | Human | CC+gstack |
|------|-------|-----------|
| Add to review/SKILL.md.tmpl | 2 hours | 10 min |
| Add to ship/SKILL.md.tmpl | 2 hours | 10 min |
| Add to review/checklist.md | 1 hour | 5 min |
| Test with actual PRs | 2 hours | 15 min |
| Regenerate SKILL.md files | — | 1 min |
+19
View File
@@ -16,6 +16,10 @@ allowed-tools:
- Grep
- Glob
- AskUserQuestion
triggers:
- update docs after ship
- document what changed
- post-ship docs
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
@@ -259,6 +263,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -377,6 +383,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Completion Status Protocol
When completing a skill workflow, report status using one of:
+4
View File
@@ -16,6 +16,10 @@ allowed-tools:
- Grep
- Glob
- AskUserQuestion
triggers:
- update docs after ship
- document what changed
- post-ship docs
---
{{PREAMBLE}}
+4 -4
View File
@@ -207,11 +207,11 @@ function captureBasicData(el) {
source: sheet.href || 'inline',
});
}
} catch { /* skip rules that can't be matched */ }
} catch (e) { if (!(e instanceof TypeError) && !(e instanceof DOMException)) throw e; }
}
} catch { /* cross-origin sheet — silently skip */ }
} catch (e) { if (!(e instanceof DOMException)) throw e; }
}
} catch { /* CSSOM not available */ }
} catch (e) { if (!(e instanceof TypeError) && !(e instanceof DOMException)) throw e; }
return { computedStyles, boxModel, matchedRules };
}
@@ -219,7 +219,7 @@ function captureBasicData(el) {
function basicBuildSelector(el) {
if (el.id) {
const sel = '#' + CSS.escape(el.id);
try { if (document.querySelectorAll(sel).length === 1) return sel; } catch {}
try { if (document.querySelectorAll(sel).length === 1) return sel; } catch (e) { if (!(e instanceof TypeError) && !(e instanceof DOMException)) throw e; }
}
const parts = [];
let current = el;
+8 -6
View File
@@ -159,7 +159,8 @@
function isUnique(selector) {
try {
return document.querySelectorAll(selector).length === 1;
} catch {
} catch (e) {
if (!(e instanceof TypeError) && !(e instanceof DOMException)) throw e;
return false;
}
}
@@ -244,11 +245,11 @@
source: sheet.href || 'inline',
});
}
} catch { /* skip rules that can't be matched */ }
} catch (e) { if (!(e instanceof TypeError) && !(e instanceof DOMException)) throw e; }
}
} catch { /* cross-origin sheet — silently skip */ }
} catch (e) { if (!(e instanceof DOMException)) throw e; }
}
} catch { /* CSSOM not available */ }
} catch (e) { if (!(e instanceof TypeError) && !(e instanceof DOMException)) throw e; }
return { computedStyles, boxModel, matchedRules };
}
@@ -290,7 +291,7 @@
try {
frameInfo.frameSrc = window.location.href;
frameInfo.frameName = window.name || null;
} catch { /* cross-origin frame */ }
} catch (e) { if (!(e instanceof DOMException)) throw e; }
}
chrome.runtime.sendMessage({
@@ -347,7 +348,8 @@
function findElement(selector) {
try {
return document.querySelector(selector);
} catch {
} catch (e) {
if (!(e instanceof TypeError) && !(e instanceof DOMException)) throw e;
return null;
}
}
+4
View File
@@ -7,6 +7,10 @@ description: |
"fixing" unrelated code, or when you want to scope changes to one module.
Use when asked to "freeze", "restrict edits", "only edit this folder",
or "lock down edits". (gstack)
triggers:
- freeze edits to directory
- lock editing scope
- restrict file changes
allowed-tools:
- Bash
- Read
+4
View File
@@ -7,6 +7,10 @@ description: |
"fixing" unrelated code, or when you want to scope changes to one module.
Use when asked to "freeze", "restrict edits", "only edit this folder",
or "lock down edits". (gstack)
triggers:
- freeze edits to directory
- lock editing scope
- restrict file changes
allowed-tools:
- Bash
- Read
+5 -1
View File
@@ -6,6 +6,10 @@ description: |
runs the upgrade, and shows what's new. Use when asked to "upgrade gstack",
"update gstack", or "get latest version".
Voice triggers (speech-to-text aliases): "upgrade the tools", "update the tools", "gee stack upgrade", "g stack upgrade".
triggers:
- upgrade gstack
- update gstack version
- get latest gstack
allowed-tools:
- Bash
- Read
@@ -49,7 +53,7 @@ Tell user: "Auto-upgrade enabled. Future updates will install automatically." Th
**If "Not now":** Write snooze state with escalating backoff (first snooze = 24h, second = 48h, third+ = 1 week), then continue with the current skill. Do not mention the upgrade again.
```bash
_SNOOZE_FILE=~/.gstack/update-snoozed
_SNOOZE_FILE="$HOME/.gstack/update-snoozed"
_REMOTE_VER="{new}"
_CUR_LEVEL=0
if [ -f "$_SNOOZE_FILE" ]; then
+5 -1
View File
@@ -10,6 +10,10 @@ voice-triggers:
- "update the tools"
- "gee stack upgrade"
- "g stack upgrade"
triggers:
- upgrade gstack
- update gstack version
- get latest gstack
allowed-tools:
- Bash
- Read
@@ -51,7 +55,7 @@ Tell user: "Auto-upgrade enabled. Future updates will install automatically." Th
**If "Not now":** Write snooze state with escalating backoff (first snooze = 24h, second = 48h, third+ = 1 week), then continue with the current skill. Do not mention the upgrade again.
```bash
_SNOOZE_FILE=~/.gstack/update-snoozed
_SNOOZE_FILE="$HOME/.gstack/update-snoozed"
_REMOTE_VER="{new}"
_CUR_LEVEL=0
if [ -f "$_SNOOZE_FILE" ]; then
+4
View File
@@ -7,6 +7,10 @@ description: |
/freeze (blocks edits outside a specified directory). Use for maximum safety
when touching prod or debugging live systems. Use when asked to "guard mode",
"full safety", "lock it down", or "maximum safety". (gstack)
triggers:
- full safety mode
- guard against mistakes
- maximum safety
allowed-tools:
- Bash
- Read
+4
View File
@@ -7,6 +7,10 @@ description: |
/freeze (blocks edits outside a specified directory). Use for maximum safety
when touching prod or debugging live systems. Use when asked to "guard mode",
"full safety", "lock it down", or "maximum safety". (gstack)
triggers:
- full safety mode
- guard against mistakes
- maximum safety
allowed-tools:
- Bash
- Read
+19
View File
@@ -8,6 +8,10 @@ description: |
0-10 score, and tracks trends over time. Use when: "health check",
"code quality", "how healthy is the codebase", "run all checks",
"quality score". (gstack)
triggers:
- code health check
- quality dashboard
- how healthy is codebase
allowed-tools:
- Bash
- Read
@@ -259,6 +263,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -377,6 +383,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Completion Status Protocol
When completing a skill workflow, report status using one of:
+4
View File
@@ -8,6 +8,10 @@ description: |
0-10 score, and tracks trends over time. Use when: "health check",
"code quality", "how healthy is the codebase", "run all checks",
"quality score". (gstack)
triggers:
- code health check
- quality dashboard
- how healthy is codebase
allowed-tools:
- Bash
- Read
+1 -1
View File
@@ -24,7 +24,7 @@ const claude: HostConfig = {
pathRewrites: [], // Claude is the primary host — no rewrites needed
toolRewrites: {},
suppressedResolvers: [],
suppressedResolvers: ['GBRAIN_CONTEXT_LOAD', 'GBRAIN_SAVE_RESULTS'],
runtimeRoot: {
globalSymlinks: ['bin', 'browse/dist', 'browse/bin', 'gstack-upgrade', 'ETHOS.md'],
+2
View File
@@ -37,6 +37,8 @@ const codex: HostConfig = {
'CODEX_SECOND_OPINION', // review.ts:257 — Codex can't invoke itself
'CODEX_PLAN_REVIEW', // review.ts:541 — Codex can't invoke itself
'REVIEW_ARMY', // review-army.ts:180 — Codex shouldn't orchestrate
'GBRAIN_CONTEXT_LOAD',
'GBRAIN_SAVE_RESULTS',
],
runtimeRoot: {
+2
View File
@@ -28,6 +28,8 @@ const cursor: HostConfig = {
{ from: '.claude/skills', to: '.cursor/skills' },
],
suppressedResolvers: ['GBRAIN_CONTEXT_LOAD', 'GBRAIN_SAVE_RESULTS'],
runtimeRoot: {
globalSymlinks: ['bin', 'browse/dist', 'browse/bin', 'gstack-upgrade', 'ETHOS.md'],
globalFiles: {
+2
View File
@@ -43,6 +43,8 @@ const factory: HostConfig = {
'use the Glob tool': 'find files matching',
},
suppressedResolvers: ['GBRAIN_CONTEXT_LOAD', 'GBRAIN_SAVE_RESULTS'],
runtimeRoot: {
globalSymlinks: ['bin', 'browse/dist', 'browse/bin', 'gstack-upgrade', 'ETHOS.md'],
globalFiles: {
+78
View File
@@ -0,0 +1,78 @@
import type { HostConfig } from '../scripts/host-config';
/**
* GBrain host config.
* Compatible with GBrain >= v0.10.0 (doctor --fast --json, search CLI, entity enrichment).
* When updating, check INSTALL_FOR_AGENTS.md in the GBrain repo for breaking changes.
*/
const gbrain: HostConfig = {
name: 'gbrain',
displayName: 'GBrain',
cliCommand: 'gbrain',
cliAliases: [],
globalRoot: '.gbrain/skills/gstack',
localSkillRoot: '.gbrain/skills/gstack',
hostSubdir: '.gbrain',
usesEnvVars: true,
frontmatter: {
mode: 'allowlist',
keepFields: ['name', 'description', 'triggers'],
descriptionLimit: null,
},
generation: {
generateMetadata: false,
skipSkills: ['codex'],
includeSkills: [],
},
pathRewrites: [
{ from: '~/.claude/skills/gstack', to: '~/.gbrain/skills/gstack' },
{ from: '.claude/skills/gstack', to: '.gbrain/skills/gstack' },
{ from: '.claude/skills', to: '.gbrain/skills' },
{ from: 'CLAUDE.md', to: 'AGENTS.md' },
],
toolRewrites: {
'use the Bash tool': 'use the exec tool',
'use the Write tool': 'use the write tool',
'use the Read tool': 'use the read tool',
'use the Edit tool': 'use the edit tool',
'use the Agent tool': 'use sessions_spawn',
'use the Grep tool': 'search for',
'use the Glob tool': 'find files matching',
'the Bash tool': 'the exec tool',
'the Read tool': 'the read tool',
'the Write tool': 'the write tool',
'the Edit tool': 'the edit tool',
},
// GBrain gets brain-aware resolvers. All other hosts suppress these.
suppressedResolvers: [
'DESIGN_OUTSIDE_VOICES',
'ADVERSARIAL_STEP',
'CODEX_SECOND_OPINION',
'CODEX_PLAN_REVIEW',
'REVIEW_ARMY',
// NOTE: GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS are NOT suppressed here.
// GBrain is the only host that gets brain-first lookup and save-to-brain behavior.
],
runtimeRoot: {
globalSymlinks: ['bin', 'browse/dist', 'browse/bin', 'gstack-upgrade', 'ETHOS.md'],
globalFiles: {
'review': ['checklist.md', 'TODOS-format.md'],
},
},
install: {
prefixable: false,
linkingStrategy: 'symlink-generated',
},
coAuthorTrailer: 'Co-Authored-By: GBrain Agent <agent@gbrain.dev>',
learningsMode: 'basic',
};
export default gbrain;
+73
View File
@@ -0,0 +1,73 @@
import type { HostConfig } from '../scripts/host-config';
const hermes: HostConfig = {
name: 'hermes',
displayName: 'Hermes',
cliCommand: 'hermes',
cliAliases: [],
globalRoot: '.hermes/skills/gstack',
localSkillRoot: '.hermes/skills/gstack',
hostSubdir: '.hermes',
usesEnvVars: true,
frontmatter: {
mode: 'allowlist',
keepFields: ['name', 'description'],
descriptionLimit: null,
},
generation: {
generateMetadata: false,
skipSkills: ['codex'],
includeSkills: [],
},
pathRewrites: [
{ from: '~/.claude/skills/gstack', to: '~/.hermes/skills/gstack' },
{ from: '.claude/skills/gstack', to: '.hermes/skills/gstack' },
{ from: '.claude/skills', to: '.hermes/skills' },
{ from: 'CLAUDE.md', to: 'AGENTS.md' },
],
toolRewrites: {
'use the Bash tool': 'use the terminal tool',
'use the Write tool': 'use the patch tool',
'use the Read tool': 'use the read_file tool',
'use the Edit tool': 'use the patch tool',
'use the Agent tool': 'use delegate_task',
'use the Grep tool': 'search for',
'use the Glob tool': 'find files matching',
'the Bash tool': 'the terminal tool',
'the Read tool': 'the read_file tool',
'the Write tool': 'the patch tool',
'the Edit tool': 'the patch tool',
},
suppressedResolvers: [
'DESIGN_OUTSIDE_VOICES',
'ADVERSARIAL_STEP',
'CODEX_SECOND_OPINION',
'CODEX_PLAN_REVIEW',
'REVIEW_ARMY',
// GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS are NOT suppressed.
// The resolvers handle GBrain-not-installed gracefully ("proceed without brain context").
// If Hermes has GBrain as a mod, brain features activate automatically.
],
runtimeRoot: {
globalSymlinks: ['bin', 'browse/dist', 'browse/bin', 'gstack-upgrade', 'ETHOS.md'],
globalFiles: {
'review': ['checklist.md', 'TODOS-format.md'],
},
},
install: {
prefixable: false,
linkingStrategy: 'symlink-generated',
},
coAuthorTrailer: 'Co-Authored-By: Hermes Agent <agent@nousresearch.com>',
learningsMode: 'basic',
};
export default hermes;
+4 -2
View File
@@ -14,9 +14,11 @@ import opencode from './opencode';
import slate from './slate';
import cursor from './cursor';
import openclaw from './openclaw';
import hermes from './hermes';
import gbrain from './gbrain';
/** All registered host configs. Add new hosts here. */
-export const ALL_HOST_CONFIGS: HostConfig[] = [claude, codex, factory, kiro, opencode, slate, cursor, openclaw];
+export const ALL_HOST_CONFIGS: HostConfig[] = [claude, codex, factory, kiro, opencode, slate, cursor, openclaw, hermes, gbrain];
/** Map from host name to config. */
export const HOST_CONFIG_MAP: Record<string, HostConfig> = Object.fromEntries(
@@ -63,4 +65,4 @@ export function getExternalHosts(): HostConfig[] {
}
// Re-export individual configs for direct import
-export { claude, codex, factory, kiro, opencode, slate, cursor, openclaw };
+export { claude, codex, factory, kiro, opencode, slate, cursor, openclaw, hermes, gbrain };
+2
View File
@@ -30,6 +30,8 @@ const kiro: HostConfig = {
{ from: '.codex/skills', to: '.kiro/skills' },
],
suppressedResolvers: ['GBRAIN_CONTEXT_LOAD', 'GBRAIN_SAVE_RESULTS'],
runtimeRoot: {
globalSymlinks: ['bin', 'browse/dist', 'browse/bin', 'gstack-upgrade', 'ETHOS.md'],
globalFiles: {
+2 -2
View File
@@ -53,6 +53,8 @@ const openclaw: HostConfig = {
'CODEX_SECOND_OPINION',
'CODEX_PLAN_REVIEW',
'REVIEW_ARMY',
'GBRAIN_CONTEXT_LOAD',
'GBRAIN_SAVE_RESULTS',
],
runtimeRoot: {
@@ -69,8 +71,6 @@ const openclaw: HostConfig = {
coAuthorTrailer: 'Co-Authored-By: OpenClaw Agent <agent@openclaw.ai>',
learningsMode: 'basic',
adapter: './scripts/host-adapters/openclaw-adapter',
};
export default openclaw;
+2
View File
@@ -28,6 +28,8 @@ const opencode: HostConfig = {
{ from: '.claude/skills', to: '.opencode/skills' },
],
suppressedResolvers: ['GBRAIN_CONTEXT_LOAD', 'GBRAIN_SAVE_RESULTS'],
runtimeRoot: {
globalSymlinks: ['bin', 'browse/dist', 'browse/bin', 'gstack-upgrade', 'ETHOS.md'],
globalFiles: {
+2
View File
@@ -28,6 +28,8 @@ const slate: HostConfig = {
{ from: '.claude/skills', to: '.slate/skills' },
],
suppressedResolvers: ['GBRAIN_CONTEXT_LOAD', 'GBRAIN_SAVE_RESULTS'],
runtimeRoot: {
globalSymlinks: ['bin', 'browse/dist', 'browse/bin', 'gstack-upgrade', 'ETHOS.md'],
globalFiles: {
+33
View File
@@ -19,6 +19,12 @@ allowed-tools:
- Glob
- AskUserQuestion
- WebSearch
triggers:
- debug this
- fix this bug
- why is this broken
- root cause analysis
- investigate this error
hooks:
PreToolUse:
- matcher: "Edit"
@@ -274,6 +280,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -392,6 +400,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Completion Status Protocol
When completing a skill workflow, report status using one of:
@@ -559,6 +580,8 @@ Fixing symptoms creates whack-a-mole debugging. Every fix that doesn't address r
---
## Phase 1: Root Cause Investigation
Gather context before forming any hypothesis.
@@ -575,6 +598,8 @@ Gather context before forming any hypothesis.
4. **Reproduce:** Can you trigger the bug deterministically? If not, gather more evidence before proceeding.
5. **Check investigation history:** Search prior learnings for investigations on the same files. Recurring bugs in the same area are an architectural smell. If prior investigations exist, note patterns and check if the root cause was structural.
## Prior Learnings
Search for relevant learnings from previous sessions:
@@ -736,6 +761,12 @@ Status: DONE | DONE_WITH_CONCERNS | BLOCKED
════════════════════════════════════════
```
Log the investigation as a learning for future sessions. Use `type: "investigation"` and include the affected files so future investigations on the same area can find this:
```bash
~/.claude/skills/gstack/bin/gstack-learnings-log '{"skill":"investigate","type":"investigation","key":"ROOT_CAUSE_KEY","insight":"ROOT_CAUSE_SUMMARY","confidence":9,"source":"observed","files":["affected/file1.ts","affected/file2.ts"]}'
```
## Capture Learnings
If you discovered a non-obvious pattern, pitfall, or architectural insight during
@@ -761,6 +792,8 @@ staleness detection: if those files are later deleted, the learning can be flagg
**Only log genuine discoveries.** Don't log obvious things. Don't log things the user
already knows. A good test: would this insight save time in a future session? If yes, log it.
---
## Important Rules
+18
View File
@@ -19,6 +19,12 @@ allowed-tools:
- Glob
- AskUserQuestion
- WebSearch
triggers:
- debug this
- fix this bug
- why is this broken
- root cause analysis
- investigate this error
hooks:
PreToolUse:
- matcher: "Edit"
@@ -45,6 +51,8 @@ Fixing symptoms creates whack-a-mole debugging. Every fix that doesn't address r
---
{{GBRAIN_CONTEXT_LOAD}}
## Phase 1: Root Cause Investigation
Gather context before forming any hypothesis.
@@ -61,6 +69,8 @@ Gather context before forming any hypothesis.
4. **Reproduce:** Can you trigger the bug deterministically? If not, gather more evidence before proceeding.
5. **Check investigation history:** Search prior learnings for investigations on the same files. Recurring bugs in the same area are an architectural smell. If prior investigations exist, note patterns and check if the root cause was structural.
{{LEARNINGS_SEARCH}}
Output: **"Root cause hypothesis: ..."** — a specific, testable claim about what is wrong and why.
@@ -186,8 +196,16 @@ Status: DONE | DONE_WITH_CONCERNS | BLOCKED
════════════════════════════════════════
```
Log the investigation as a learning for future sessions. Use `type: "investigation"` and include the affected files so future investigations on the same area can find this:
```bash
~/.claude/skills/gstack/bin/gstack-learnings-log '{"skill":"investigate","type":"investigation","key":"ROOT_CAUSE_KEY","insight":"ROOT_CAUSE_SUMMARY","confidence":9,"source":"observed","files":["affected/file1.ts","affected/file2.ts"]}'
```
{{LEARNINGS_LOG}}
{{GBRAIN_SAVE_RESULTS}}
---
## Important Rules
+20 -1
View File
@@ -13,6 +13,10 @@ allowed-tools:
- Write
- Glob
- AskUserQuestion
triggers:
- merge and deploy
- land the pr
- ship to production
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
@@ -256,6 +260,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -374,6 +380,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Repo Ownership — See Something, Say Something
`REPO_MODE` controls how to handle issues outside your branch:
@@ -555,7 +574,7 @@ plan's living status.
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
-[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
+[ -z "$B" ] && B="$HOME/.claude/skills/gstack/browse/dist/browse"
if [ -x "$B" ]; then
echo "READY: $B"
else
+4
View File
@@ -14,6 +14,10 @@ allowed-tools:
- Glob
- AskUserQuestion
sensitive: true
triggers:
- merge and deploy
- land the pr
- ship to production
---
{{PREAMBLE}}
+19
View File
@@ -8,6 +8,10 @@ description: |
"show learnings", "prune stale learnings", or "export learnings".
Proactively suggest when the user asks about past patterns or wonders
"didn't we fix this before?"
triggers:
- show learnings
- what have we learned
- manage project learnings
allowed-tools:
- Bash
- Read
@@ -259,6 +263,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -377,6 +383,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Completion Status Protocol
When completing a skill workflow, report status using one of:
+4
View File
@@ -8,6 +8,10 @@ description: |
"show learnings", "prune stale learnings", or "export learnings".
Proactively suggest when the user asks about past patterns or wonders
"didn't we fix this before?"
triggers:
- show learnings
- what have we learned
- manage project learnings
allowed-tools:
- Bash
- Read
+29 -2
View File
@@ -23,6 +23,11 @@ allowed-tools:
- Edit
- AskUserQuestion
- WebSearch
triggers:
- brainstorm this
- is this worth building
- help me think through
- office hours
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
@@ -266,6 +271,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -384,6 +391,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Repo Ownership — See Something, Say Something
`REPO_MODE` controls how to handle issues outside your branch:
@@ -565,7 +585,7 @@ plan's living status.
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
-[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
+[ -z "$B" ] && B="$HOME/.claude/skills/gstack/browse/dist/browse"
if [ -x "$B" ]; then
echo "READY: $B"
else
@@ -603,6 +623,8 @@ You are a **YC office hours partner**. Your job is to ensure the problem is unde
---
## Phase 1: Context Gathering
Understand the project and the area the user wants to change.
@@ -1322,7 +1344,10 @@ PRIOR=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If `$PRIOR` exists, the new doc gets a `Supersedes:` field referencing it. This creates a revision chain — you can trace how a design evolved across office hours sessions.
-Write to `~/.gstack/projects/{slug}/{user}-{branch}-design-{datetime}.md`:
+Write to `~/.gstack/projects/{slug}/{user}-{branch}-design-{datetime}.md`.
After writing the design doc, tell the user:
**"Design doc saved to: {full path}. Other skills (/plan-ceo-review, /plan-eng-review) will find it automatically."**
### Startup mode design doc template:
@@ -1511,6 +1536,8 @@ Present the reviewed design doc to the user via AskUserQuestion:
- B) Revise — specify which sections need changes (loop back to revise those sections)
- C) Start over — return to Phase 2
---
## Phase 6: Handoff — The Relationship Closing
+13 -1
View File
@@ -23,6 +23,11 @@ allowed-tools:
- Edit
- AskUserQuestion
- WebSearch
triggers:
- brainstorm this
- is this worth building
- help me think through
- office hours
---
{{PREAMBLE}}
@@ -37,6 +42,8 @@ You are a **YC office hours partner**. Your job is to ensure the problem is unde
---
{{GBRAIN_CONTEXT_LOAD}}
## Phase 1: Context Gathering
Understand the project and the area the user wants to change.
@@ -462,7 +469,10 @@ PRIOR=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If `$PRIOR` exists, the new doc gets a `Supersedes:` field referencing it. This creates a revision chain — you can trace how a design evolved across office hours sessions.
-Write to `~/.gstack/projects/{slug}/{user}-{branch}-design-{datetime}.md`:
+Write to `~/.gstack/projects/{slug}/{user}-{branch}-design-{datetime}.md`.
After writing the design doc, tell the user:
**"Design doc saved to: {full path}. Other skills (/plan-ceo-review, /plan-eng-review) will find it automatically."**
### Startup mode design doc template:
@@ -591,6 +601,8 @@ Present the reviewed design doc to the user via AskUserQuestion:
- B) Revise — specify which sections need changes (loop back to revise those sections)
- C) Start over — return to Phase 2
{{GBRAIN_SAVE_RESULTS}}
---
## Phase 6: Handoff — The Relationship Closing
+20 -1
View File
@@ -8,6 +8,10 @@ description: |
Use when asked to "open gstack browser", "launch browser", "connect chrome",
"open chrome", "real browser", "launch chrome", "side panel", or "control my browser".
Voice triggers (speech-to-text aliases): "show me the browser".
triggers:
- open gstack browser
- launch chromium
- show me the browser
allowed-tools:
- Bash
- Read
@@ -256,6 +260,8 @@ AI orchestrator (e.g., OpenClaw). In spawned sessions:
- Focus on completing the task and reporting results via prose output.
- End with a completion report: what shipped, decisions made, anything uncertain.
## Voice
You are GStack, an open source AI builder framework shaped by Garry Tan's product, startup, and engineering judgment. Encode how he thinks, not his biography.
@@ -374,6 +380,19 @@ AI makes completeness near-free. Always recommend the complete option over short
Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).
## Confusion Protocol
When you encounter high-stakes ambiguity during coding:
- Two plausible architectures or data models for the same requirement
- A request that contradicts existing patterns and you're unsure which to follow
- A destructive operation where the scope is unclear
- Missing context that would change your approach significantly
STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
Ask the user. Do not guess on architectural or data model decisions.
This does NOT apply to routine coding, small features, or obvious changes.
## Repo Ownership — See Something, Say Something
`REPO_MODE` controls how to handle issues outside your branch:
@@ -560,7 +579,7 @@ anti-bot stealth, and custom branding. You see every action in real time.
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
-[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
+[ -z "$B" ] && B="$HOME/.claude/skills/gstack/browse/dist/browse"
if [ -x "$B" ]; then
echo "READY: $B"
else
+4
View File
@@ -9,6 +9,10 @@ description: |
"open chrome", "real browser", "launch chrome", "side panel", or "control my browser".
voice-triggers:
- "show me the browser"
triggers:
- open gstack browser
- launch chromium
- show me the browser
allowed-tools:
- Bash
- Read
@@ -129,6 +129,7 @@ Once selected, commit fully. Do not silently drift.
**Anti-skip rule:** Never condense, abbreviate, or skip any review section regardless of plan type. If a section genuinely has zero findings, say "No issues found" and move on, but you must evaluate it.
Ask the user about each issue ONE AT A TIME. Do NOT batch.
**Reminder: Do NOT make any code changes. Review only.**
### Section 1: Architecture Review
Evaluate system design, component boundaries, data flow (all four paths), state machines, coupling, scaling, security architecture, production failure scenarios, rollback posture. Draw dependency graphs.
@@ -281,7 +281,8 @@ Count the signals for the closing message.
## Phase 5: Design Doc
-Write the design document and save it to memory.
+Write the design document and save it to memory. After writing, tell the user:
**"Design doc saved. Other skills (/plan-ceo-review, /plan-eng-review) will find it automatically."**
### Startup mode design doc template:
Some files were not shown because too many files have changed in this diff.