Merge branch 'main' into garrytan/team-supabase-store

Brings in 48 commits from main (v0.15.7–v0.15.16): deterministic slugs,
TabSession refactor, pair-agent tunnel fix, content security layers,
community security wave, team-friendly install, interactive snapshots.

Conflict resolution:
- .gitignore: merged both sides (kept .factory/ + added .kiro/.opencode/
  .slate/.cursor/.openclaw/ from main)
- open-gstack-browser/SKILL.md: accepted main (renamed from .factory/)
- setup-team-sync/SKILL.md: regenerated via gen:skill-docs
- test/fixtures/golden/*: updated golden baselines for ship SKILL.md
- codex-ship-SKILL.md: accepted main (renamed from .factory/)
- package.json version: synced to VERSION (0.15.16.0)
- bin/gstack-uninstall: check settings file exists before claiming
  SessionStart hook removal (fixes false positive on clean systems)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-04-07 20:47:07 -10:00
258 changed files with 55174 additions and 2692 deletions
# Adding a New Host to gstack
gstack uses a declarative host config system. Each supported AI coding agent
(Claude, Codex, Factory, Kiro, OpenCode, Slate, Cursor, OpenClaw) is defined
as a typed TypeScript config object. Adding a new host means creating one file
and re-exporting it. Zero code changes to the generator, setup, or tooling.
## How it works
```
hosts/
├── claude.ts # Primary host
├── codex.ts # OpenAI Codex CLI
├── factory.ts # Factory Droid
├── kiro.ts # Amazon Kiro
├── opencode.ts # OpenCode
├── slate.ts # Slate (Random Labs)
├── cursor.ts # Cursor
├── openclaw.ts # OpenClaw (hybrid: config + adapter)
└── index.ts # Registry: imports all, derives Host type
```
Each config file exports a `HostConfig` object that tells the generator:
- Where to put generated skills (paths)
- How to transform frontmatter (allowlist/denylist fields)
- What Claude-specific references to rewrite (paths, tool names)
- What binary to detect for auto-install
- What resolver sections to suppress
- What assets to symlink at install time
The generator, setup script, platform-detect, uninstall, health checks, worktree
copy, and tests all read from these configs. None of them have per-host code.
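As a concrete illustration, the core of a config-driven pass can be sketched as a pure function over `pathRewrites` (the helper name and example strings here are invented; the real generator lives in the gen-skill-docs pipeline):

```typescript
// Hypothetical sketch of how a generator pass might apply a host's
// pathRewrites to skill content. The rewrite shape matches HostConfig;
// applyPathRewrites itself is illustrative, not the real implementation.
interface PathRewrite {
  from: string;
  to: string;
}

function applyPathRewrites(content: string, rewrites: PathRewrite[]): string {
  // Literal replaceAll, in declaration order — order matters because
  // longer paths must be rewritten before their shorter prefixes.
  return rewrites.reduce(
    (acc, { from, to }) => acc.split(from).join(to),
    content,
  );
}

const rewritten = applyPathRewrites(
  'Skills live in ~/.claude/skills/gstack and .claude/skills.',
  [
    { from: '~/.claude/skills/gstack', to: '~/.myhost/skills/gstack' },
    { from: '.claude/skills/gstack', to: '.myhost/skills/gstack' },
    { from: '.claude/skills', to: '.myhost/skills' },
  ],
);
// rewritten === 'Skills live in ~/.myhost/skills/gstack and .myhost/skills.'
```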
## Step-by-step: add a new host
### 1. Create the config file
Copy an existing config as a starting point. `hosts/opencode.ts` is a good
minimal example. `hosts/factory.ts` shows tool rewrites and conditional fields.
`hosts/openclaw.ts` shows the adapter pattern for hosts with different tool models.
Create `hosts/myhost.ts`:
```typescript
import type { HostConfig } from '../scripts/host-config';

const myhost: HostConfig = {
  name: 'myhost',
  displayName: 'MyHost',
  cliCommand: 'myhost',       // binary name for `command -v` detection
  cliAliases: [],             // alternative binary names
  globalRoot: '.myhost/skills/gstack',
  localSkillRoot: '.myhost/skills/gstack',
  hostSubdir: '.myhost',
  usesEnvVars: true,          // false only for Claude (uses literal ~ paths)
  frontmatter: {
    mode: 'allowlist',        // 'allowlist' keeps only listed fields
    keepFields: ['name', 'description'],
    descriptionLimit: null,   // set to 1024 for hosts with limits
  },
  generation: {
    generateMetadata: false,  // true only for Codex (openai.yaml)
    skipSkills: ['codex'],    // codex skill is Claude-only
  },
  pathRewrites: [
    { from: '~/.claude/skills/gstack', to: '~/.myhost/skills/gstack' },
    { from: '.claude/skills/gstack', to: '.myhost/skills/gstack' },
    { from: '.claude/skills', to: '.myhost/skills' },
  ],
  runtimeRoot: {
    globalSymlinks: ['bin', 'browse/dist', 'browse/bin', 'gstack-upgrade', 'ETHOS.md'],
    globalFiles: { 'review': ['checklist.md', 'TODOS-format.md'] },
  },
  install: {
    prefixable: false,
    linkingStrategy: 'symlink-generated',
  },
  learningsMode: 'basic',
};

export default myhost;
```
### 2. Register in the index
Edit `hosts/index.ts`:
```typescript
import myhost from './myhost';

// Add to ALL_HOST_CONFIGS array:
export const ALL_HOST_CONFIGS: HostConfig[] = [
  claude, codex, factory, kiro, opencode, slate, cursor, openclaw, myhost,
];

// Add to re-exports:
export { claude, codex, factory, kiro, opencode, slate, cursor, openclaw, myhost };
```
### 3. Add to .gitignore
Add `.myhost/` to `.gitignore` (generated skill docs are gitignored).
### 4. Generate and verify
```bash
# Generate skill docs for the new host
bun run gen:skill-docs --host myhost
# Verify output exists and has no .claude/skills leakage
ls .myhost/skills/gstack-*/SKILL.md
grep -r ".claude/skills" .myhost/skills/ | head -5
# (should be empty)
# Generate for all hosts (includes the new one)
bun run gen:skill-docs --host all
# Health dashboard shows the new host
bun run skill:check
```
### 5. Run tests
```bash
bun test test/gen-skill-docs.test.ts
bun test test/host-config.test.ts
```
The parameterized smoke tests automatically pick up the new host. Zero test
code to write. They verify: output exists, no path leakage, valid frontmatter,
freshness check passes, codex skill excluded.
### 6. Update README.md
Add install instructions for the new host in the appropriate section.
## Config field reference
See `scripts/host-config.ts` for the full `HostConfig` interface with JSDoc
comments on every field.
Key fields:

| Field | Purpose |
|-------|---------|
| `frontmatter.mode` | `allowlist` (keep only listed) or `denylist` (strip listed) |
| `frontmatter.descriptionLimit` | Max chars, `null` for no limit |
| `frontmatter.descriptionLimitBehavior` | `error` (fail build), `truncate`, `warn` |
| `frontmatter.conditionalFields` | Add fields based on template values (e.g., sensitive → disable-model-invocation) |
| `frontmatter.renameFields` | Rename template fields (e.g., voice-triggers → triggers) |
| `pathRewrites` | Literal replaceAll on content. Order matters. |
| `toolRewrites` | Rewrite Claude tool names (e.g., "use the Bash tool" → "run this command") |
| `suppressedResolvers` | Resolver functions that return empty for this host |
| `coAuthorTrailer` | Git co-author string for commits |
| `boundaryInstruction` | Anti-prompt-injection warning for cross-model invocations |
| `adapter` | Path to adapter module for complex transformations |
## Adapter pattern (for hosts with different tool models)
If string-replace tool rewrites aren't enough (the host has fundamentally
different tool semantics), use the adapter pattern. See `hosts/openclaw.ts`
and `scripts/host-adapters/openclaw-adapter.ts`.
The adapter runs as a post-processing step after all generic rewrites. It
exports `transform(content: string, config: HostConfig): string`.
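A minimal sketch of an adapter module in that shape (the `HostConfig` stand-in and the rewrite shown are invented for illustration; the real adapter does OpenClaw-specific work):

```typescript
// Stand-in for the real HostConfig interface in scripts/host-config.ts
type HostConfig = { name: string };

// Runs after all generic rewrites; free to do structure-aware
// transformations that literal string replacement can't express.
export function transform(content: string, config: HostConfig): string {
  // Invented example: collapse Claude-specific tool phrasing
  return content.replace(/use the Bash tool to run/g, 'run this command:');
}

const out = transform('To build, use the Bash tool to run `bun build`.', {
  name: 'openclaw',
});
// out === 'To build, run this command: `bun build`.'
```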
## Validation
The `validateHostConfig()` function in `scripts/host-config.ts` checks:
- Name: lowercase alphanumeric with hyphens
- CLI command: alphanumeric with hyphens/underscores
- Paths: safe characters only (alphanumeric, `.`, `/`, `$`, `{}`, `~`, `-`, `_`)
- No duplicate names, hostSubdirs, or globalRoots across configs
Run `bun run scripts/host-config-export.ts validate` to check all configs.
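The name and path rules can be expressed roughly as the following regexes (a sketch of the rules listed above; `scripts/host-config.ts` is the source of truth):

```typescript
// Sketches of the validation rules above, not the actual implementation.
function validHostName(name: string): boolean {
  // lowercase alphanumeric with hyphens
  return /^[a-z0-9]+(-[a-z0-9]+)*$/.test(name);
}

function validHostPath(p: string): boolean {
  // safe characters only: alphanumeric, '.', '/', '$', '{}', '~', '-', '_'
  return /^[A-Za-z0-9./${}~_-]+$/.test(p);
}

validHostName('myhost');                // true
validHostName('My_Host');               // false
validHostPath('.myhost/skills/gstack'); // true
```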
# gstack x OpenClaw Integration
gstack integrates with OpenClaw as a methodology source, not a ported codebase.
OpenClaw's ACP runtime spawns Claude Code sessions natively. gstack provides the
planning discipline and methodology that makes those sessions better.
This is a lightweight protocol encoded as prompt text. No daemon. No JSON-RPC.
No compatibility matrices. The prompt is the bridge.
## Architecture
```
OpenClaw                                   gstack repo
─────────────────────                      ──────────────
Orchestrator: messaging,                   Source of truth for
calendar, memory, EA                       methodology + planning
│                                          │
├── Native skills (conversational)         ├── Generates native skills
│     office-hours, ceo-review,            │   via gen-skill-docs pipeline
│     investigate, retro                   │
│                                          ├── Generates gstack-lite
├── sessions_spawn(runtime: "acp")         │   (planning discipline)
│     │                                    │
│     └── Claude Code                      ├── Generates gstack-full
│         └── gstack installed at          │   (complete pipeline)
│             ~/.claude/skills/gstack      │
│                                          └── docs/OPENCLAW.md (this file)
└── Dispatch routing (AGENTS.md)
```
## Dispatch Routing
OpenClaw decides at spawn time which tier of gstack support to use:

| Tier | When | Prompt prefix |
|------|------|---------------|
| **Simple** | One-file edits, typos, config changes | No gstack context injected |
| **Medium** | Multi-file features, refactors | gstack-lite CLAUDE.md appended |
| **Heavy** | Specific gstack skill needed | "Load gstack. Run /X" |
| **Full** | Complete features, objectives, projects | gstack-full pipeline appended |
| **Plan** | "Help me plan a Claude Code project" | gstack-plan pipeline appended |
### Decision heuristic
- Can it be done in <10 lines of code? -> **Simple**
- Does it touch multiple files but the approach is obvious? -> **Medium**
- Does the user name a specific skill (/cso, /review, /qa)? -> **Heavy**
- Is it a feature, project, or objective (not a task)? -> **Full**
- Does the user want to PLAN something for Claude Code without implementing yet? -> **Plan**
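For illustration only, the heuristic reads as a routing function (the request fields are invented; OpenClaw's real routing is prompt text in AGENTS.md, not code):

```typescript
type Tier = 'simple' | 'medium' | 'heavy' | 'full' | 'plan';

// Invented signal shape for the sketch
interface SpawnRequest {
  estimatedLines: number;        // rough size of the change
  multiFile: boolean;            // touches multiple files, obvious approach
  namedSkill?: string;           // e.g. '/cso', '/review', '/qa'
  isFeatureOrObjective: boolean; // a feature/project, not a task
  planOnly: boolean;             // plan for Claude Code, don't implement
}

function routeTier(req: SpawnRequest): Tier {
  if (req.planOnly) return 'plan';
  if (req.namedSkill) return 'heavy';
  if (req.isFeatureOrObjective) return 'full';
  if (req.multiFile) return 'medium';
  return req.estimatedLines < 10 ? 'simple' : 'medium';
}
```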
### Dispatch routing guide (for AGENTS.md)
The complete ready-to-paste section lives in `openclaw/agents-gstack-section.md`.
Copy it into your OpenClaw AGENTS.md.
Key behavioral rules (these go ABOVE the dispatch tiers):
1. **Always spawn, never redirect.** When the user asks to use ANY gstack skill,
ALWAYS spawn a Claude Code session. Never tell the user to open Claude Code.
2. **Resolve the repo.** If the user names a repo, set the working directory. If
unknown, ask which repo.
3. **Autoplan runs end-to-end.** Spawn, let it run the full pipeline, report back
in chat. User should never have to leave Telegram.
### CLAUDE.md collision handling
When spawning Claude Code in a repo that already has a CLAUDE.md, APPEND
gstack-lite/full as a new section. Do not replace the repo's existing instructions.
## What gstack generates for OpenClaw
All artifacts live in the `openclaw/` directory and are generated by
`bun run gen:skill-docs --host openclaw`:
### gstack-lite (Medium tier)
`openclaw/gstack-lite-CLAUDE.md` — ~15 lines of planning discipline:
1. Read every file before modifying
2. Write a 5-line plan: what, why, which files, test case, risk
3. Resolve ambiguity using decision principles
4. Self-review before reporting done
5. Completion report: what shipped, decisions made, anything uncertain
A/B tested: 2x time, meaningfully better output.
### gstack-full (Full tier)
`openclaw/gstack-full-CLAUDE.md` — chains existing gstack skills:
1. Read CLAUDE.md and understand the project
2. Run /autoplan (CEO + eng + design review)
3. Implement the approved plan
4. Run /ship to create a PR
5. Report back with PR URL and decisions
### gstack-plan (Plan tier)
`openclaw/gstack-plan-CLAUDE.md` — full review gauntlet, no implementation:
1. Run /office-hours to produce a design doc
2. Run /autoplan (CEO + eng + design + DX reviews + codex adversarial)
3. Save the reviewed plan to `plans/<project-slug>-plan-<date>.md`
4. Report back: plan path, summary, key decisions, recommended next step
The orchestrator persists the plan link to its own memory store (brain repo,
knowledge base, or whatever is configured in AGENTS.md). When the user is
ready to build, spawn a FULL session that references the saved plan.
### Native methodology skills
Published to ClawHub. Install with `clawhub install`:
- `gstack-openclaw-office-hours` — Product interrogation (6 forcing questions)
- `gstack-openclaw-ceo-review` — Strategic challenge (10-section review, 4 modes)
- `gstack-openclaw-investigate` — Operational debugging (4-phase methodology)
- `gstack-openclaw-retro` — Operational retrospective (weekly review)
Source lives in `openclaw/skills/` in the gstack repo. These are hand-crafted
adaptations of the gstack methodology for OpenClaw's conversational context.
No gstack infrastructure (no browse, no telemetry, no preamble).
## Spawned session detection
When Claude Code runs inside a session spawned by OpenClaw, the `OPENCLAW_SESSION`
environment variable should be set. gstack detects this and adjusts:
- Skips interactive prompts (auto-chooses recommended options)
- Skips upgrade checks and telemetry prompts
- Focuses on task completion and prose reporting
Set the env var in sessions_spawn: `env: { OPENCLAW_SESSION: "1" }`
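On the gstack side, detection amounts to an env check roughly like this (a sketch; the actual check may differ):

```typescript
// Sketch of spawned-session detection via OPENCLAW_SESSION
function isSpawnedSession(env: Record<string, string | undefined>): boolean {
  return env.OPENCLAW_SESSION === '1';
}

if (isSpawnedSession(process.env)) {
  // skip interactive prompts, upgrade checks, and telemetry prompts;
  // auto-choose recommended options and report in prose
}
```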
## Installation
For OpenClaw users: tell your OpenClaw agent "install gstack for openclaw."
The agent should:
1. Install gstack-lite CLAUDE.md into its coding session templates
2. Install the 4 native methodology skills
3. Add dispatch routing to AGENTS.md
4. Verify with a test spawn
For gstack developers: `./setup --host openclaw` outputs this documentation.
The actual artifacts are generated by `bun run gen:skill-docs --host openclaw`.
## What we don't do
- No dispatch daemon (ACP handles session spawning)
- No Clawvisor relay (no security layer needed)
- No bidirectional learnings bridge (brain repo is the knowledge store)
- No JSON schemas or protocol versioning
- No SOUL.md from gstack (OpenClaw has its own)
- No full skill porting (coding skills stay native to Claude Code)
# Remote Browser Access — How to Pair With a GStack Browser
A GStack Browser server can be shared with any AI agent that can make HTTP requests.
The agent gets scoped access to a real Chromium browser: navigate pages, read content,
click elements, fill forms, take screenshots. Each agent gets its own tab.
This document is the reference for remote agents. The quick-start instructions are
generated by `$B pair-agent` with the actual credentials baked in.
## Architecture
```
Your Machine                           Remote Agent
─────────────                          ────────────
GStack Browser Server                  Any AI agent
├── Chromium (Playwright)              (OpenClaw, Hermes, Codex, etc.)
├── HTTP API on localhost:PORT               │
├── ngrok tunnel (optional)                  │
│     https://xxx.ngrok.dev  ────────────────┘
└── Token Registry
    ├── Root token (local only)
    ├── Setup keys (5 min, one-time)
    └── Session tokens (24h, scoped)
```
## Connection Flow
1. **User runs** `$B pair-agent` (or `/pair-agent` in Claude Code)
2. **Server creates** a one-time setup key (expires in 5 minutes)
3. **User copies** the instruction block into the other agent's chat
4. **Remote agent runs** `POST /connect` with the setup key
5. **Server returns** a scoped session token (24h default)
6. **Remote agent creates** its own tab via `POST /command` with `newtab`
7. **Remote agent browses** using `POST /command` with its session token + tabId
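From the remote agent's side, the flow can be sketched with `fetch` (the server URL is a placeholder, and parsing the tab id out of the `newtab` response text is an assumption):

```typescript
const SERVER = 'https://example.ngrok.dev'; // placeholder tunnel URL

// Every /command call is a Bearer-authenticated JSON POST
function commandRequest(token: string, body: object): [string, RequestInit] {
  return [`${SERVER}/command`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  }];
}

async function connectAndBrowse(setupKey: string): Promise<string> {
  // 1. Exchange the one-time setup key for a session token (no auth)
  const res = await fetch(`${SERVER}/connect`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ setup_key: setupKey }),
  });
  const { token } = await res.json();

  const run = (body: object) =>
    fetch(...commandRequest(token, body)).then((r) => r.text());

  // 2. Own a tab before writing, then navigate in it
  const tabId = Number(await run({ command: 'newtab', args: [] })); // assumed: id in response text
  return run({ command: 'goto', args: ['https://example.com'], tabId });
}
```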
## API Reference
### Authentication
All endpoints except `/connect` and `/health` require a Bearer token:
```
Authorization: Bearer gsk_sess_...
```
### Endpoints
#### POST /connect
Exchange a setup key for a session token. No auth required. Rate-limited to 3/minute.
```json
Request: {"setup_key": "gsk_setup_..."}
Response: {"token": "gsk_sess_...", "expires": "ISO8601", "scopes": ["read","write"], "agent": "agent-name"}
```
#### POST /command
Send a browser command. Requires Bearer auth.
```json
Request: {"command": "goto", "args": ["https://example.com"], "tabId": 1}
Response: (plain text result of the command)
```
#### GET /health
Server status. No auth required. Returns status, tabs, mode, uptime.
### Commands
#### Navigation
| Command | Args | Description |
|---------|------|-------------|
| `goto` | `["URL"]` | Navigate to a URL |
| `back` | `[]` | Go back |
| `forward` | `[]` | Go forward |
| `reload` | `[]` | Reload page |
#### Reading Content
| Command | Args | Description |
|---------|------|-------------|
| `snapshot` | `["-i"]` | Interactive snapshot with @ref labels (most useful) |
| `text` | `[]` | Full page text |
| `html` | `["selector?"]` | HTML of element or full page |
| `links` | `[]` | All links on page |
| `screenshot` | `["/tmp/s.png"]` | Take a screenshot |
| `url` | `[]` | Current URL |
#### Interaction
| Command | Args | Description |
|---------|------|-------------|
| `click` | `["@e3"]` | Click an element (use @ref from snapshot) |
| `fill` | `["@e5", "text"]` | Fill a form field |
| `select` | `["@e7", "option"]` | Select dropdown value |
| `type` | `["text"]` | Type text (keyboard) |
| `press` | `["Enter"]` | Press a key |
| `scroll` | `["down"]` | Scroll the page |
#### Tabs
| Command | Args | Description |
|---------|------|-------------|
| `newtab` | `["URL?"]` | Create a new tab (required before writing) |
| `tabs` | `[]` | List all tabs |
| `closetab` | `["id?"]` | Close a tab |
## The Snapshot → @ref Pattern
This is the most powerful browsing pattern. Instead of writing CSS selectors:
1. Run `snapshot -i` to get an interactive snapshot with labeled elements
2. The snapshot returns text like:
```
[Page Title]
@e1 [link] "Home"
@e2 [button] "Sign In"
@e3 [input] "Search..."
```
3. Use the `@e` refs directly in commands: `click @e2`, `fill @e3 "search query"`
This pattern is far more reliable than guessing CSS selectors. Always run
`snapshot -i` first, then use the refs.
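A remote agent can also treat snapshot output as structured data. An illustrative parser for the ref lines shown above (the real snapshot format may carry more detail):

```typescript
// Map @refs to their role and label from snapshot text
function parseRefs(snapshot: string): Map<string, { role: string; label: string }> {
  const refs = new Map<string, { role: string; label: string }>();
  for (const line of snapshot.split('\n')) {
    const m = line.match(/^(@e\d+) \[(\w+)\] "(.*)"$/);
    if (m) refs.set(m[1], { role: m[2], label: m[3] });
  }
  return refs;
}

const refs = parseRefs('[Page Title]\n@e2 [button] "Sign In"\n@e3 [input] "Search..."');
// refs.get('@e2') → { role: 'button', label: 'Sign In' }
```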
## Scopes
| Scope | What it allows |
|-------|---------------|
| `read` | snapshot, text, html, links, screenshot, url, tabs, console, etc. |
| `write` | goto, click, fill, scroll, newtab, closetab, etc. |
| `admin` | eval, js, cookies, storage, cookie-import, useragent, etc. |
| `meta` | tab, diff, frame, responsive, watch |
Default tokens get `read` + `write`. Admin requires `--admin` flag when pairing.
## Tab Isolation
Each agent owns the tabs it creates. Rules:
- **Read:** Any agent can read any tab (snapshot, text, screenshot)
- **Write:** Only the tab owner can write (click, fill, goto, etc.)
- **Unowned tabs:** Pre-existing tabs are root-only for writes
- **First step:** Always `newtab` before trying to interact
## Error Codes
| Code | Meaning | What to do |
|------|---------|------------|
| 401 | Token invalid, expired, or revoked | Ask user to run /pair-agent again |
| 403 | Command not in scope, or tab not yours | Use newtab, or ask for --admin |
| 429 | Rate limit exceeded (>10 req/s) | Wait for Retry-After header |
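For 429s, a client can honor `Retry-After` with a small helper (a generic sketch, not part of any GStack client):

```typescript
// Parse Retry-After (seconds) into a wait in milliseconds; default 1s
function retryAfterMs(header: string | null): number {
  return Number(header ?? '1') * 1000;
}

// One retry after waiting, per the guidance above
async function withRetryAfter(send: () => Promise<Response>): Promise<Response> {
  const res = await send();
  if (res.status !== 429) return res;
  await new Promise((r) => setTimeout(r, retryAfterMs(res.headers.get('Retry-After'))));
  return send();
}
```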
## Security Model
- Setup keys expire in 5 minutes and can only be used once
- Session tokens expire in 24 hours (configurable)
- The root token never appears in instruction blocks or connection strings
- Admin scope (JS execution, cookie access) is denied by default
- Tokens can be revoked instantly: `$B tunnel revoke agent-name`
- All agent activity is logged with attribution (clientId)
## Same-Machine Shortcut
If both agents are on the same machine, skip the copy-paste:
```bash
$B pair-agent --local openclaw # writes to ~/.openclaw/skills/gstack/browse-remote.json
$B pair-agent --local codex # writes to ~/.codex/skills/gstack/browse-remote.json
$B pair-agent --local cursor # writes to ~/.cursor/skills/gstack/browse-remote.json
```
No tunnel needed. Uses localhost directly.
## ngrok Tunnel Setup
For remote agents on different machines:
1. Sign up at [ngrok.com](https://ngrok.com) (free tier works)
2. Copy your auth token from the dashboard
3. Save it: `echo 'NGROK_AUTHTOKEN=your_token' > ~/.gstack/ngrok.env`
4. Optionally claim a stable domain: `echo 'NGROK_DOMAIN=your-name.ngrok-free.dev' >> ~/.gstack/ngrok.env`
5. Start with tunnel: `BROWSE_TUNNEL=1 $B restart`
6. Run `$B pair-agent` — it will use the tunnel URL automatically
# GStack Browser V0 — The AI-Native Development Browser
**Date:** 2026-03-30
**Author:** Garry Tan + Claude Code
**Status:** Phase 1a shipped, Phase 1b in progress
**Branch:** garrytan/gstack-as-browser
## The Thesis
Every other AI browser (Atlas, Dia, Comet, Chrome Auto Browse) starts with a
consumer browser and bolts AI onto it. GStack Browser inverts this. It starts
with Claude Code as the runtime and gives it a browser viewport.
The agent is the primary citizen. The browser is the canvas. Skills are
first-class capabilities. You don't "use a browser with AI help." You use
an AI that can see and interact with the web.
This is the IDE for the post-IDE era. Code lives in the terminal. The product
lives in the browser. The AI works across both simultaneously. What Cursor did
for text editors, GStack Browser does for the browser.
## What It Is Today (Phase 1a, shipped)
A double-clickable macOS .app that wraps Playwright's Chromium with the gstack
sidebar extension baked in. You open it and Claude Code can see your screen,
navigate pages, fill forms, take screenshots, inspect CSS, clean up overlays,
and run any gstack skill. All without touching a terminal.
```
GStack Browser.app (389MB, 189MB DMG)
├── Compiled browse binary (58MB) — CLI + HTTP server
├── Chrome extension (172KB) — sidebar, activity feed, inspector
├── Playwright's Chromium (330MB) — the actual browser
└── Launcher script — binds project dir, sets env vars
```
Launch → Chromium opens with sidebar → extension auto-connects to browse server
→ agent ready in ~5 seconds.
## What It Will Be
### Phase 1b: Developer UX (next)
**Command Palette (Cmd+K):** The signature interaction. Opens a fuzzy-filtered
skill picker. Type "/qa" to start QA testing, "/investigate" to debug, "/ship"
to create a PR. Skills are fetched from the browse server, not hardcoded. The
palette is the entry point to everything.
**Quick Screenshot (Cmd+Shift+S):** Capture the current viewport and pipe it into
the sidebar chat with "What do you see?" context. The AI analyzes the screenshot
and gives you actionable feedback. Visual bug reports in one keystroke.
**Status Bar:** A persistent 30px bar at the bottom of every page. Shows agent
status (idle/thinking), workspace name, current branch, and auto-detected dev
servers. Click a dev server pill to navigate. Always-visible context about what
the AI is doing.
**Auto-Detect Dev Servers:** On launch, scans common ports (3000, 3001, 4200,
5173, 5174, 8000, 8080). If exactly one server is found, auto-navigates to it.
Dev server pills in the status bar for one-click switching.
### Phase 2: BoomLooper Integration
The sidebar connects to BoomLooper's Phoenix/Elixir APIs instead of a local
`claude -p` subprocess. BoomLooper provides:
- **Multi-agent orchestration.** Spawn 5 agents in parallel, each with its own
browser tab. One runs QA, one does design review, one watches for regressions.
- **Docker infrastructure.** Each agent gets an isolated container. The browser
inside the container tests the dev server. No port conflicts, no state leakage.
- **Session persistence.** Agent conversations survive browser restarts. Pick up
where you left off.
- **Team visibility.** Your teammates can watch what your agents are doing in
real-time. Like pair programming, but the pair is 5 AI agents and you're the
conductor.
### Phase 3: Browse as BoomLooper Tool
The browse binary becomes an MCP tool in BoomLooper. Agents in Docker containers
use browse commands to test dev servers, take screenshots, fill forms, and verify
deployments. Cross-platform compilation (linux-arm64/x64) required.
### Phase 4: Chromium Fork (trigger-gated)
When the extension side panel hits hard API limits, GStack Browser ships to
external users, build infra exists, and the business justifies maintenance:
fork Chromium. Brave's `chromium_src` override pattern, CC-powered 6-week
rebases (2-4 hours with CC vs 1-2 weeks human). ~20-30 files modified.
### Phase 5: Native Shell
SwiftUI/AppKit app shell with native sidebar, isolated Chromium service. Full
platform integration. May be superseded by Phase 4 if the Chromium fork includes
a native sidebar.
## Vision: What an AI Browser Can Do
### 1. See What You See
The browser is the AI's eyes. Not through screenshots (though it can do that),
but through DOM access, CSS inspection, network monitoring, and accessibility
tree parsing. The AI understands the page structure, not just the pixels.
**Today:** `snapshot` command returns an accessibility-tree representation of any
page. The AI can "see" every button, link, form field, and text element. Element
references (`@e1`, `@e2`) let the AI click, fill, and interact.
**Next:** Real-time page observation. The AI notices when a page changes, when an
error appears in the console, when a network request fails. Proactive debugging
without being asked.
**Future:** Visual understanding. The AI compares before/after screenshots to catch
visual regressions. Pixel-level design review. "This button moved 3px left and the
font changed from 14px to 13px."
### 2. Act on What It Sees
Not just reading pages, but interacting with them like a human user would.
**Today:** Click, fill, select, hover, type, scroll, upload files, handle dialogs,
navigate, manage tabs. All via simple commands through the browse server.
**Next:** Multi-step user flows. "Log in, go to settings, change the timezone,
verify the confirmation message." The AI chains commands with verification at each
step.
**Future:** Autonomous QA agent. "Test every link on this page. Fill every form.
Try to break it." The AI runs exhaustive interaction testing without a script.
Finds bugs a human tester would miss because it tries combinations humans don't
think of.
### 3. Write Code While Browsing
This is the key differentiator. The AI can see the bug in the browser AND fix it
in the code simultaneously.
**Today:** The sidebar chat connects to Claude Code. You say "this button is
misaligned" and the AI reads the CSS, identifies the issue, and proposes a fix.
The `/design-review` skill takes screenshots, identifies visual issues, and
commits fixes with before/after evidence.
**Next:** Live reload loop. The AI edits CSS/HTML, the browser auto-reloads, the
AI verifies the fix visually. No human in the loop for simple visual fixes.
"Fix every spacing issue on this page" becomes a 30-second task.
**Future:** Full-stack debugging. The AI sees a 500 error in the browser, reads
the server logs, traces to the failing line, writes the fix, and verifies in the
browser. One command: "This page is broken. Fix it."
### 4. Understand the Whole Stack
The browser isn't just a viewport. It's a window into the application's health.
**Today:**
- Console log capture — every `console.log`, `console.error`, and warning
- Network request monitoring — every XHR, fetch, websocket, and static asset
- Performance metrics — Core Web Vitals, resource timing, paint events
- Cookie and storage inspection — read and write localStorage, sessionStorage
- CSS inspection — computed styles, box model, rule cascade
**Next:**
- Network request replay — "replay this failing request with different params"
- Performance regression detection — "this page is 200ms slower than yesterday"
- Dependency auditing — "this page loads 47 third-party scripts"
- Accessibility auditing — "this form has no labels, these colors fail contrast"
**Future:**
- Full application telemetry — CPU, memory, GPU usage in real-time
- Cross-browser testing — same test suite across Chrome, Firefox, Safari
- Real user monitoring correlation — "this bug affects 12% of production users"
### 5. The Workspace Model
The browser IS the workspace. Not a tab in a workspace. The workspace itself.
**Today:** Each browser session is bound to a project directory. The sidebar shows
the current branch. The status bar shows detected dev servers.
**Next:** Multi-project support. Switch between projects without closing the
browser. Each project gets its own set of tabs, its own agent, its own context.
Like VSCode workspaces, but for the browser.
**Future:** Team workspaces. Multiple developers share a browser workspace. See
each other's agents working. Collaborative debugging where one person navigates
and the other watches the AI fix things in real-time.
### 6. Skills as Browser Capabilities
Every gstack skill becomes a browser capability.

| Skill | Browser Capability |
|-------|-------------------|
| `/qa` | Test every page, find bugs, fix them, verify fixes |
| `/design-review` | Screenshot → analyze → fix CSS → screenshot again |
| `/investigate` | See the error in browser → trace to code → fix → verify |
| `/benchmark` | Measure page performance → detect regressions → alert |
| `/canary` | Monitor deployed site → screenshot periodically → alert on changes |
| `/ship` | Run tests → review diff → create PR → verify deployment in browser |
| `/cso` | Audit page for XSS, open redirects, clickjacking in real browser |
| `/office-hours` | Browse competitor sites → synthesize observations → design doc |
The command palette (Cmd+K) is the hub. You don't need to know the skills exist.
You type what you want, the fuzzy filter finds the right skill, and the AI runs it
with the browser as context.
### 7. The Design Loop
AI-powered design is a loop, not a handoff.
```
Generate mockup (GPT Image API)
→ Review in browser (side-by-side with live site)
→ Iterate with feedback ("make the header taller")
→ Approve direction
→ Generate production HTML/CSS
→ Preview in browser
→ Fine-tune with /design-review
→ Ship
```
The browser closes the gap between "what it looks like in Figma" and "what it
looks like in production." Because the AI can see both simultaneously.
### 8. The Security Loop
CSO review in a real browser, not just static analysis.
- Inject XSS payloads into every input field, check if they execute
- Test CSRF by replaying requests from a different origin
- Check for open redirects by navigating to crafted URLs
- Verify CSP headers are actually enforced (not just present)
- Test auth flows by manipulating cookies and tokens in real-time
- Check for clickjacking by loading the site in an iframe
Static analysis catches patterns. Browser testing catches reality.
### 9. The Monitoring Loop
Post-deploy canary monitoring, in a real browser.
```
Deploy → Browser loads production URL
→ Screenshot baseline
→ Every 5 minutes: screenshot, compare, check console
→ Alert on: visual regression, new console errors, performance drop
→ Auto-rollback if critical error detected
```
Synthetic monitoring with AI judgment. Not just "did the page return 200" but
"does the page look right and work correctly."
## Architecture
```
+-------------------------------------------------------+
|                     GStack Browser                     |
|                                                        |
|  +------------------+  +--------------------------+   |
|  | Chromium         |  | Extension Side Panel     |   |
|  | (Playwright)     |  | ├── Chat (Claude Code)   |   |
|  |                  |  | ├── Activity Feed        |   |
|  | ┌────────────┐   |  | ├── Element Refs         |   |
|  | │ Status Bar │   |  | ├── CSS Inspector        |   |
|  | └────────────┘   |  | ├── Command Palette      |   |
|  +--------┬---------+  | └── Settings             |   |
|           │            +-------------┬------------+   |
+-----------┼──────────────────────────┼────────────────+
            │                          │
            v                          v
+-----------┴---------+    +-----------┴-----------+
| Browse Server       |    | Sidebar Agent         |
| (HTTP + SSE)        |    | (claude -p wrapper)   |
| :34567              |    | Runs gstack skills    |
|                     |    | Per-tab isolation     |
| Commands:           |    |                       |
| goto, click, fill   |    | Future: BoomLooper    |
| snapshot, screenshot|    | GenServer agents      |
| css, inspect, eval  |    |                       |
+---------┬-----------+    +-----------┬-----------+
          │                            │
          v                            v
+---------┴-----------+    +-----------┴-----------+
| User's App          |    | Claude Code           |
| localhost:3000      |    | (reads/writes code)   |
| (or any URL)        |    |                       |
+---------------------+    +-----------------------+
```
## Competitive Landscape
| Browser | Approach | Differentiator | Weakness |
|---------|----------|---------------|----------|
| **Atlas** | Chromium fork + AI layer | Agentic browser, "OWL" isolated Chromium | Consumer-focused, no code integration |
| **Dia** | AI-native browser | Clean UI, built for AI interaction | No dev tools, no code editing |
| **Comet** | AI browser | Multi-agent browsing | Early, unclear dev workflow |
| **Chrome Auto Browse** | Extension | Google's own, deep Chrome integration | Extension-only, no code editing |
| **Cursor** | VSCode fork + AI | Best-in-class code editing | No browser viewport |
| **GStack Browser** | CC runtime + browser viewport | See bug in browser, fix in code, verify | Currently macOS-only, no consumer features |
GStack Browser doesn't compete with consumer browsers. It competes with the
workflow of switching between browser and editor. The goal is to make that switch
invisible.
## Design System
From DESIGN.md:
- **Primary accent:** Amber-500 (#F59E0B) — agent active, focus states, pulse
- **Background:** Zinc-950 (#09090B) through Zinc-800 (#27272A) — dark, dense
- **Typography:** JetBrains Mono (code/status), DM Sans (UI/labels)
- **Border radius:** 8px (md), 12px (lg), full (pills)
- **Motion:** Pulse animation on agent active, 200ms transitions
- **Layout:** Sidebar (right), status bar (bottom), palette (centered overlay)
## Implementation Status
| Component | Status | Notes |
|-----------|--------|-------|
| .app bundle | **SHIPPED** | 389MB, launches in ~5s |
| DMG packaging | **SHIPPED** | 189MB compressed |
| `GSTACK_CHROMIUM_PATH` | **SHIPPED** | Custom Chromium binary support |
| `BROWSE_EXTENSIONS_DIR` | **SHIPPED** | Extension path override |
| Auth via `/health` | **SHIPPED** | Replaces .auth.json file approach, auto-refreshes on server restart |
| Build script | **SHIPPED** | `scripts/build-app.sh` |
| Model routing | **SHIPPED** | Sonnet for actions, Opus for analysis (`pickSidebarModel`) |
| Debug logging | **SHIPPED** | 40+ silent catches → prefixed console logging across 4 files |
| No idle timeout (headed) | **SHIPPED** | Browser stays alive as long as window is open |
| Cookie import button | **SHIPPED** | One-click in sidebar footer, opens `/cookie-picker` |
| Sidebar arrow hint | **SHIPPED** | Points to sidebar, hides only when sidebar actually opens |
| Architecture doc | **SHIPPED** | `docs/designs/SIDEBAR_MESSAGE_FLOW.md` |
| Command palette | Planned | Phase 1b |
| Quick screenshot | Planned | Phase 1b |
| Status bar | Planned | Phase 1b |
| Dev server detection | Planned | Phase 1b |
| BoomLooper integration | Future | Phase 2 |
| Cross-platform | Future | Phase 3 |
| Chromium fork | Trigger-gated | Phase 4 |
| Native shell | Deferred | Phase 5 |
## The 12-Month Vision
```
TODAY (Phase 1) 6 MONTHS (Phase 2-3) 12 MONTHS (Phase 4-5)
───────────── ────────────────── ────────────────────
macOS .app wrapper BoomLooper multi-agent Chromium fork OR
Extension sidebar Docker containers Native SwiftUI shell
Local claude -p agent Team workspaces Cross-platform
Single project Linux/x64 browse Auto-update
Manual skill invocation Autonomous QA loops Skill marketplace
Performance monitoring Plugin API
Real-time collaboration Enterprise features
```
The 12-month ideal: you open GStack Browser, it detects your project, starts
your dev server, runs your test suite, and reports what's broken. You say "fix
it" and the AI fixes every bug, verifies each fix visually, and creates a PR.
You review the PR in the same browser, approve it, and the AI deploys it and
monitors the canary. All in one window.
That's the browser as AI workspace. Not a browser with AI bolted on. An AI
with a browser bolted on.
## Review History
This plan went through 4 reviews:
1. **CEO Review** (`/plan-ceo-review`, SELECTIVE EXPANSION) — 9 scope proposals,
3 accepted (Cmd+K, Cmd+Shift+S, status bar), 5 deferred, 1 skipped
2. **Design Review** (`/plan-design-review`) — scored 5/10 → 8/10, 9 design
decisions added, 2 approved mockups generated
3. **Eng Review** (`/plan-eng-review`) — 4 issues found, 0 critical gaps,
test plan produced
4. **Codex Review** (outside voice) — 9 findings, 3 critical gaps caught
(server bundling, auth file location, project binding). All resolved.
The Codex review caught 3 real architecture gaps that survived 3 prior reviews.
Cross-model review works.
# Design: GStack Self-Learning Infrastructure
Generated by /office-hours + /plan-ceo-review + /plan-eng-review on 2026-03-28
Updated: 2026-04-01 (post-Session Intelligence, reviewed by Codex)
Branch: garrytan/ce-features
Repo: gstack
Status: ACTIVE
Mode: Open Source / Community
## Problem Statement
GStack runs 30+ skills across sessions but learns nothing between them. A /review
session catches an N+1 query pattern, and the next /review on the same codebase
starts from scratch. A /ship run discovers the test command, and every future /ship
re-discovers it. A /investigate finds a tricky race condition, and no future session
knows about it.
Every AI coding tool has this problem. Cursor has per-user memory. Claude Code has
CLAUDE.md. Windsurf has persistent context. But none of them compound. None of them
structure what they learn. None of them share knowledge across skills.
## What We're Building
Per-project institutional knowledge that compounds across sessions and skills.
Structured, typed, confidence-scored learnings that every gstack skill can read and
write. The goal: after 20 sessions on the same codebase, gstack knows every
architectural decision, every past bug pattern, and every time it was wrong.
## North Star
/autoship (Release 5). A full engineering team in one command. Describe a feature,
approve the plan, everything else is automatic. /autoship can't work without
learnings (R1), review quality (R2), session persistence (R3), and adaptive ceremony
(R4). Releases 1-4 are the infrastructure that makes /autoship actually work.
## Audience
YC founders building with AI. The people who run gstack on real codebases 20+ times
a week and notice when it asks the same question twice.
## Differentiation
| Tool | Memory model | Scope | Structure |
|------|-------------|-------|-----------|
| Cursor | Per-user chat memory | Per-session | Unstructured |
| CLAUDE.md | Static file | Per-project | Manual |
| Windsurf | Persistent context | Per-session | Unstructured |
| **GStack** | **Per-project JSONL** | **Cross-session, cross-skill** | **Typed, scored, decaying** |
---
## State Systems
gstack has four distinct persistence layers. They share storage patterns
(JSONL in `~/.gstack/projects/$SLUG/`) but serve different purposes:
| System | File | What it stores | Written by | Read by |
|--------|------|---------------|------------|---------|
| **Learnings** | `learnings.jsonl` | Institutional knowledge (pitfalls, patterns, preferences) | All skills | All skills (preamble) |
| **Timeline** | `timeline.jsonl` | Event history (skill start/complete, branch, outcome) | Preamble (automatic) | /retro, preamble context recovery |
| **Checkpoints** | `checkpoints/*.md` | Working state snapshots (decisions, remaining work, files) | /checkpoint, /ship, /investigate | Preamble context recovery, /checkpoint resume |
| **Health** | `health-history.jsonl` | Code quality scores over time (per-tool, composite) | /health | /retro, /ship (gate), /health (trends) |
These are not overlapping. Learnings = what you know. Timeline = what happened.
Checkpoints = where you are. Health = how good the code is. Each answers a
different question.
---
## Release Roadmap
### Release 1: "GStack Learns" (v0.13-0.14) — SHIPPED
**Headline:** Every session makes the next one smarter.
What shipped:
- Learnings persistence at `~/.gstack/projects/{slug}/learnings.jsonl`
- `/learn` skill for manual review, search, prune, export
- Confidence calibration on all review findings (1-10 scores with display rules)
- Confidence decay for observed/inferred learnings (1pt/30d)
- Cross-project learnings discovery (opt-in, AskUserQuestion consent)
- "Learning applied" callouts when reviews match past learnings
- Integration into /review, /ship, /plan-*, /office-hours, /investigate, /retro
Schema:
```json
{
"ts": "2026-03-28T12:00:00Z",
"skill": "review",
"type": "pitfall",
"key": "n-plus-one-activerecord",
"insight": "Always check includes() for has_many in list endpoints",
"confidence": 8,
"source": "observed",
"branch": "feature-x",
"commit": "abc1234",
"files": ["app/models/user.rb"]
}
```
Types: `pattern` | `pitfall` | `preference` | `architecture` | `tool`
Sources: `observed` | `user-stated` | `inferred` | `cross-model`
Architecture: append-only JSONL. Duplicates resolved at read time ("latest winner"
per key+type). No write-time mutation, no race conditions.
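The read-time resolution can be sketched as a single pass over the file. This is a minimal sketch, not the real gstack reader — the function name and the trimmed-down `Learning` shape are assumptions; only the "latest line wins per key+type" rule comes from the design above.

```typescript
interface Learning {
  ts: string;
  type: string;
  key: string;
  insight: string;
  confidence: number;
}

// Resolve duplicates at read time: the last line written for a given
// key+type pair wins, so the append-only file needs no write-time mutation.
function resolveLearnings(jsonl: string): Learning[] {
  const latest = new Map<string, Learning>();
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue;
    const entry = JSON.parse(line) as Learning;
    latest.set(`${entry.key}:${entry.type}`, entry);
  }
  return [...latest.values()];
}
```

Because resolution happens entirely at read time, concurrent writers only ever append, which is what makes the no-race-condition claim hold.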
### Release 2: "Review Army" (v0.14.3-0.14.4) — SHIPPED
**Headline:** 10 specialist reviewers on every PR.
What shipped:
- 7 parallel specialist subagents: always-on (testing, maintainability) +
conditional (security, performance, data-migration, API contract, design) +
red team (large diffs / critical findings)
- JSON-structured findings with confidence scores + fingerprint dedup across agents
- PR quality score (0-10) logged per review + /retro trending
- Learning-informed specialist prompts, past pitfalls injected per domain
- Multi-specialist consensus highlighting, confirmed findings get boosted
- Enhanced Delivery Integrity via PLAN_COMPLETION_AUDIT
- Checklist refactored: CRITICAL categories stay in main pass, specialist
categories extracted to focused checklists in review/specialists/
### Release 2.5: "Review Army Expansions" — NOT YET SHIPPED
**Headline:** Ships after R2 proves stable and the core loop checks out.
Pre-check: review R2 quality metrics (PR quality scores, specialist hit rates,
false positive rates, E2E test stability). If core loop has issues, fix those first.
What ships:
- E1: Adaptive specialist gating. Auto-skip specialists with a 0-finding track record.
  Store per-project hit rates via gstack-learnings-log. User can force with --security etc.
- E3: Test stub generation. Each specialist outputs a TEST_STUB alongside findings.
  Framework detected from the project (Jest/Vitest/RSpec/pytest/Go test).
  Flows into Fix-First: AUTO-FIX applies the fix and creates the test file.
- E5: Cross-review finding dedup. Read gstack-review-read for prior review entries.
  Suppress findings matching a prior user-skipped finding.
- E7: Specialist performance tracking. Log per-specialist metrics via gstack-review-log.
  Timeline integration: specialist runs appear in timeline.jsonl for /retro trending.
### Release 3: "Session Intelligence" (v0.15.0) — SHIPPED
**Headline:** Your AI sessions remember what happened.
What shipped:
- Session timeline: every skill auto-logs start/complete events to
`~/.gstack/projects/$SLUG/timeline.jsonl`. Local-only, never sent anywhere,
always on regardless of telemetry setting.
- Context recovery: after compaction or session start, preamble lists recent CEO
plans, checkpoints, and reviews. Agent reads the most recent to recover context.
- Cross-session injection: preamble prints LAST_SESSION and LATEST_CHECKPOINT for
the current branch. You see where you left off before typing anything.
- Predictive skill suggestion: if your last 3 sessions follow a pattern
(review, ship, review), gstack suggests what you probably want next.
- "Welcome back" synthesized context message on session start.
- `/checkpoint` skill: save/resume/list working state snapshots. Cross-branch
listing for Conductor workspace handoff between agents.
- `/health` skill: code quality scorekeeper wrapping project tools (tsc, biome,
knip, shellcheck, tests). Composite 0-10 score, trend tracking, improvement
suggestions when scores drop.
- Timeline binaries: `bin/gstack-timeline-log` and `bin/gstack-timeline-read`.
- Routing rules: /checkpoint and /health added to preamble skill routing.
Design doc: `docs/designs/SESSION_INTELLIGENCE.md`
### Release 4: "Adaptive Ceremony" — NOT YET SHIPPED
**Headline:** GStack respects your time without compromising your safety.
Ceremony and trust are separate concerns. Ceremony = the set of review/test/QA
steps a PR goes through. Trust = a policy engine that determines which ceremony
level applies. They interact but don't merge.
What ships:
**Ceremony levels:**
- FULL: all specialists, adversarial, Codex structured review, coverage audit, plan
completion. For large diffs, new features, migrations, auth changes.
- STANDARD: adversarial + Codex, coverage audit, plan completion. For medium diffs,
typical feature work.
- FAST: adversarial only. For small, well-tested changes on trusted projects.
**Trust policy engine:**
- Scope-aware trust. Trust is earned per change class, not globally. Clean history on
docs-only PRs does not buy trust on migration PRs.
- Change class detection: docs, tests, config, frontend, backend, migrations, auth,
infra. Each class has its own trust threshold.
- Trust signals: consecutive clean reviews (per class), /health score stability,
regression frequency, test coverage trends.
- Trust never fast-tracks: migrations, auth/permission changes, new API endpoints,
infrastructure changes. These always get FULL ceremony regardless of trust level.
- Gradual degradation, not binary reset. A single regression doesn't reset all trust.
It degrades trust for that change class by one level.
**Scope assessment:**
- TINY/SMALL/MEDIUM/LARGE classification in /review, /ship, /autoplan based on
diff size, files touched, and change class.
- Ceremony level = f(scope, trust, change class).
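The `f(scope, trust, change class)` rule above can be sketched as a small pure function. The thresholds and the function shape are illustrative assumptions; the two hard rules it encodes come from the design: never-fast-track classes always get FULL ceremony, and FAST is only earned per change class.

```typescript
type Ceremony = "FULL" | "STANDARD" | "FAST";
type Scope = "TINY" | "SMALL" | "MEDIUM" | "LARGE";

// Change classes the policy says never fast-track, regardless of trust.
const ALWAYS_FULL = new Set(["migrations", "auth", "infra", "new-endpoints"]);

// Sketch of ceremonyLevel = f(scope, trust, change class). Trust is a
// per-class integer earned from clean reviews; thresholds are hypothetical.
function ceremonyLevel(scope: Scope, trust: number, changeClass: string): Ceremony {
  if (ALWAYS_FULL.has(changeClass)) return "FULL";
  if (scope === "LARGE") return "FULL";
  if (scope === "MEDIUM") return trust >= 2 ? "STANDARD" : "FULL";
  // TINY/SMALL: FAST only with earned per-class trust.
  return trust >= 3 ? "FAST" : "STANDARD";
}
```

Note the order of checks: the never-fast-track guard runs first, so no amount of accumulated trust can route a migration or auth change past FULL ceremony.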
**TODO lifecycle:**
- /triage for interactive approval of incoming TODOs
- /resolve for batch resolution via parallel agents
### Release 5: "/autoship — One Command, Full Feature" — NOT YET SHIPPED
**Headline:** Describe a feature. Approve the plan. Everything else is automatic.
/autoship is a resumable state machine, not a linear pipeline. Review and QA can
send work back to build/fix. Compaction can interrupt any phase. The system must
recover gracefully.
```
┌──────────┐
│ START │
└────┬─────┘
┌────▼─────┐
│ /office- │
│ hours │
└────┬─────┘
┌────▼─────┐
│/autoplan │ ◄── single approval gate
└────┬─────┘
┌──────────▼──────────┐
│ BUILD │ ◄── /checkpoint auto-save
└──────────┬──────────┘
┌──────────▼──────────┐
│ /health │ ◄── quality gate
│ (score >= 7.0) │
└──────────┬──────────┘
│ fail → back to BUILD
┌──────────▼──────────┐
│ /review │
└──────────┬──────────┘
│ ASK items → back to BUILD
┌──────────▼──────────┐
│ /qa │
└──────────┬──────────┘
│ bugs found → back to BUILD
┌──────────▼──────────┐
│ /ship │
└──────────┬──────────┘
┌──────────▼──────────┐
│ /checkpoint archive │ ◄── preserve, don't destroy
└─────────────────────┘
```
What ships:
- /autoship autonomous pipeline with the state machine above.
Each phase writes to timeline.jsonl. Checkpoints auto-save before each phase.
Compaction recovery: context recovery reads checkpoint + timeline, resumes at
the last completed phase.
- Checkpoint archival on completion (not deletion). Recovery state is preserved
for debugging failed autoship runs.
- /ideate brainstorming skill (parallel divergent agents + adversarial filtering)
- Research agents in /plan-eng-review (codebase analyst, history analyst,
best practices researcher, learnings researcher)
Depends on: R1 (learnings for research agents), R2 (review army for quality),
R3 (session intelligence for persistence), R4 (adaptive ceremony for speed).
### Release 6: "Execution Studio" — NOT YET SHIPPED
**Headline:** Parallel execution infrastructure.
What ships:
- Swarm orchestration: multi-worktree parallel builds. Builds on Conductor
workspace handoff from /checkpoint (R3). An orchestrator skill dispatches
independent workstreams to parallel agents, each with its own worktree.
- Codex build delegation: auto-detect when to delegate implementation to Codex
CLI based on task type (boilerplate, test generation, mechanical refactors).
- PR feedback resolution: parallel comment resolver across review platforms.
- /onboard: auto-generated contributor guide from codebase analysis.
- /triage-prs: batch PR triage for maintainers.
### Release 7: "Design & Media" — NOT YET SHIPPED
**Headline:** Visual design integration.
What ships:
- Figma design sync (pixel-matching iteration loop)
- Feature video recording (auto-generated PR demos)
- Cross-platform portability (Copilot, Kiro, Windsurf output)
---
## Risk Register
### Proxy signals as permission to skip scrutiny
(Identified by Codex review, 2026-04-01)
/health scores, clean review history, and timeline patterns are useful signals.
They are not proof of safety. If those signals feed ceremony reduction AND /autoship,
the failure mode is rare, silent, high-severity mistakes. Mitigations:
- Certain change classes never fast-track (migrations, auth, infra, new endpoints).
- Trust degrades gradually, not binary reset.
- /autoship always runs FULL ceremony on its first run per project. Trust is earned.
### Stale context recovery
(Identified by Codex review, 2026-04-01)
Context recovery can inject wrong-branch state, obsolete plans, or invalid
checkpoints. Mitigations:
- Checkpoints include branch name in YAML frontmatter. Context recovery filters
by current branch.
- Timeline grep filters by branch before showing LAST_SESSION.
- Stale artifact detection: if checkpoint is >7 days old, note it as potentially
stale rather than presenting as current.
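The staleness check is simple enough to pin down. A sketch under stated assumptions — the helper name and label strings are hypothetical; the 7-day threshold is the one named above:

```typescript
const STALE_AFTER_DAYS = 7;

// Label a checkpoint as potentially stale rather than presenting it as
// current when it is older than the threshold.
function checkpointLabel(checkpointTs: string, now: Date): string {
  const ageDays = (now.getTime() - new Date(checkpointTs).getTime()) / 86_400_000;
  return ageDays > STALE_AFTER_DAYS
    ? "LATEST_CHECKPOINT (possibly stale, >7 days old)"
    : "LATEST_CHECKPOINT";
}
```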
### Validation metrics needed
(Identified by Codex review, 2026-04-01)
Before shipping R4 (Adaptive Ceremony), measure:
- Predictive suggestion accuracy (did the user run the suggested skill?)
- Trust policy false-skip rate (did fast-tracked PRs have post-merge issues?)
- Context recovery accuracy (did recovered context match actual state?)
- /health score correlation with actual code quality (do high scores predict
fewer production bugs?)
These metrics should be collected during R3 usage and reviewed before R4 ships.
---
## Acknowledged Inspiration
The self-learning roadmap was inspired by ideas from the [Compound Engineering](https://github.com/nicobailon/compound-engineering) project by Nico Bailon. Their exploration of learnings persistence, parallel review agents, and autonomous pipelines catalyzed the design of GStack's approach. We adapted every concept to fit GStack's template system, voice, and architecture rather than porting directly.
# Session Intelligence Layer
## The Problem
Claude Code's context window is ephemeral. Every session starts fresh. When
auto-compaction fires at ~167K tokens, it preserves a generic summary but
destroys file reads, reasoning chains, and intermediate decisions.
gstack already produces valuable artifacts that survive on disk: CEO plans,
eng reviews, design reviews, QA reports, learnings. These files contain
decisions, constraints, and context that shaped the current work. But Claude
doesn't know they exist. After compaction, the plans and reviews that
informed every decision silently vanish from context.
The ecosystem is working on this. claude-mem (9K+ stars) captures tool usage
and injects context into future sessions. Claude HUD shows real-time agent
status. Anthropic's own `claude-progress.txt` pattern uses a progress file
that agents read at the start of each session.
Nobody is solving the specific problem of making **skill-produced artifacts**
survive compaction. Because nobody else has gstack's artifact architecture.
## The Insight
gstack already writes structured artifacts to `~/.gstack/projects/$SLUG/`:
- CEO plans: `ceo-plans/`
- Design reviews: `design-reviews/`
- Eng reviews: `eng-reviews/`
- Learnings: `learnings.jsonl`
- Skill usage: `../analytics/skill-usage.jsonl`
The missing piece is not storage. It's awareness. The preamble needs to tell
the agent: "These files exist. They contain decisions you've already made.
After compaction, re-read them."
## The Architecture
```
┌─────────────────────────────────────┐
│ Claude Context Window │
│ (ephemeral, ~167K token limit) │
│ │
│ Compaction fires ──► summary only │
└──────────────┬──────────────────────┘
reads on start / after compaction
┌──────────────▼──────────────────────┐
│ ~/.gstack/projects/$SLUG/ │
│ (persistent, survives everything) │
│ │
│ ceo-plans/ ← /plan-ceo-review
│ eng-reviews/ ← /plan-eng-review
│ design-reviews/ ← /plan-design-review
│ checkpoints/ ← /checkpoint (new)
│ timeline.jsonl ← every skill (new)
│ learnings.jsonl ← /learn
└─────────────────────────────────────┘
rolled up weekly
┌──────────────▼──────────────────────┐
│ /retro │
│ Timeline: 3 /review, 2 /ship, ... │
│ Health trends: compile 8/10 (↑2) │
│ Learnings applied: 4 this week │
└─────────────────────────────────────┘
```
## The Features
### Layer 1: Context Recovery (preamble, all skills)
~10 lines of prose in the preamble. After compaction or context degradation,
the agent checks `~/.gstack/projects/$SLUG/` for recent plans, reviews, and
checkpoints. Lists the directory, reads the most recent file.
Cost: near-zero. Benefit: every skill's plans/reviews survive compaction.
### Layer 2: Session Timeline (preamble, all skills)
Every skill appends a one-line JSONL entry to `timeline.jsonl`: timestamp,
skill name, branch, key outcome. `/retro` renders it.
Makes the project's AI-assisted work history visible. "This week: 3 /review,
2 /ship, 1 /investigate across branches feature-auth and fix-billing."
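The append itself is a one-liner per skill run. A minimal sketch — field names follow the description above (timestamp, skill name, branch, key outcome), but the exact schema and helper name are assumptions, not the real `gstack-timeline-log` binary:

```typescript
import { appendFileSync } from "node:fs";

// Append one JSONL line per skill run; /retro renders the accumulated file.
function logTimelineEvent(path: string, skill: string, branch: string, outcome: string): string {
  const entry = JSON.stringify({ ts: new Date().toISOString(), skill, branch, outcome });
  appendFileSync(path, entry + "\n");
  return entry;
}
```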
### Layer 3: Cross-Session Injection (preamble, all skills)
When a new session starts on a branch with recent artifacts, the preamble
prints a one-liner: "Last session: implemented JWT auth, 3/5 tasks done.
Plan: ~/.gstack/projects/$SLUG/checkpoints/latest.md"
The agent knows where you left off before reading any files.
### Layer 4: /checkpoint (opt-in skill)
Manual snapshot of working state: what's being done, files being edited,
decisions made, what's remaining. Useful before stepping away, before
complex operations, for workspace handoffs, or coming back after days.
### Layer 5: /health (opt-in skill)
Code quality dashboard: type-check, lint, test suite, dead code scan.
Composite 0-10 score. Tracks over time. `/retro` shows trends. `/ship`
gates on configurable threshold.
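One plausible shape for the composite score is a mean of per-tool scores. This is an assumption for illustration — /health's real formula and tool names may differ; only the 0-10 composite and per-tool inputs come from the description above:

```typescript
// Composite health: average the per-tool 0-10 scores to one decimal place.
function compositeHealth(scores: Record<string, number>): number {
  const values = Object.values(scores);
  if (values.length === 0) return 0;
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  return Math.round(mean * 10) / 10;
}
```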
## The Compounding Effect
Each feature is independently useful. Together, they create something
that compounds:
Session 1: /plan-ceo-review produces a plan. Saved to disk.
Session 2: Agent reads the plan after preamble. Doesn't re-ask decisions.
Session 3: /checkpoint saves progress. Timeline shows 2 /review, 1 /ship.
Session 4: Compaction fires mid-refactor. Agent re-reads the checkpoint.
Recovers key decisions, types, remaining work. Continues.
Session 5: /retro rolls up the week. Health trend: 6/10 → 8/10.
Timeline shows 12 skill invocations across 3 branches.
The project's AI history is no longer ephemeral. It persists, compounds,
and makes every future session smarter. That's the session intelligence
layer.
## What This Is Not
- Not a replacement for Claude's built-in compaction (that handles session
state; we handle gstack artifacts)
- Not a full memory system like claude-mem (that handles cross-session
memory via SQLite; we handle structured skill artifacts)
- Not a database or service (just markdown files on disk)
## Research Sources
- [Anthropic: Effective harnesses for long-running agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)
- [Anthropic: Effective context engineering](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
- [claude-mem](https://github.com/thedotmack/claude-mem)
- [Claude HUD](https://github.com/jarrodwatts/claude-hud)
- [CodeScene: Agentic AI coding best practices](https://codescene.com/blog/agentic-ai-coding-best-practice-patterns-for-speed-with-quality)
- [Post-compaction recovery via git-persisted state (Beads)](https://dev.to/jeremy_longshore/building-post-compaction-recovery-for-ai-agent-workflows-with-beads-207l)
# Sidebar Message Flow
How the GStack Browser sidebar actually works. Read this before touching
sidepanel.js, background.js, content.js, server.ts sidebar endpoints,
or sidebar-agent.ts.
## Components
```
┌─────────────────┐ ┌──────────────┐ ┌─────────────┐ ┌────────────────┐
│ sidepanel.js │────▶│ background.js│────▶│ server.ts │────▶│sidebar-agent.ts│
│ (Chrome panel) │ │ (svc worker) │ │ (Bun HTTP) │ │ (Bun process) │
└─────────────────┘ └──────────────┘ └─────────────┘ └────────────────┘
▲ │ │
│ polls /sidebar-chat │ polls queue file │
└───────────────────────────────────────────┘ │
◀──────────────────────┘
POST /sidebar-agent/event
```
## Startup Timeline
```
T+0ms CLI runs `$B connect`
├── Server starts on port 34567
├── Writes state to .gstack/browse.json (pid, port, token)
├── Launches headed Chromium with extension
└── Clears sidebar-agent-queue.jsonl
T+500ms sidebar-agent.ts spawned by CLI
├── Reads auth token from .gstack/browse.json
├── Creates queue file if missing
├── Sets lastLine = current line count
└── Starts polling every 200ms
T+1-3s Extension loads in Chromium
├── background.js: health poll every 1s (fast startup)
│ └── GET /health → gets auth token
├── content.js: injects on welcome page
│ └── Does NOT fire gstack-extension-ready (waits for sidebar)
└── Side panel: may auto-open via chrome.sidePanel.open()
T+2-10s Side panel connects
├── tryConnect() → asks background for port/token
├── Fallback: direct GET /health for token
├── updateConnection(url, token)
│ ├── Starts chat polling (1s interval)
│ ├── Starts tab polling (2s interval)
│ ├── Connects SSE activity stream
│ └── Sends { type: 'sidebarOpened' } to background
└── background relays to content script → hides welcome arrow
T+10s+ Ready for messages
```
## Message Flow: User Types → Claude Responds
```
1. User types "go to hn" in sidebar, hits Enter
2. sidepanel.js sendMessage()
├── Renders user bubble immediately (optimistic)
├── Renders thinking dots immediately
├── Switches to fast poll (300ms)
└── chrome.runtime.sendMessage({ type: 'sidebar-command', message, tabId })
3. background.js
├── Gets active Chrome tab URL
└── POST /sidebar-command { message, activeTabUrl }
with Authorization: Bearer ${authToken}
4. server.ts /sidebar-command handler
├── validateAuth(req)
├── syncActiveTabByUrl(extensionUrl) — syncs Playwright tab to Chrome tab
├── pickSidebarModel(message) — 'sonnet' for actions, 'opus' for analysis
├── Adds user message to chat buffer
├── Builds system prompt + args
└── Appends JSON to ~/.gstack/sidebar-agent-queue.jsonl
5. sidebar-agent.ts poll() (within 200ms)
├── Reads new line from queue file
├── Parses JSON entry
├── Checks processingTabs — skips if tab already has agent running
└── askClaude(entry) — fire and forget
6. sidebar-agent.ts askClaude()
├── spawn('claude', ['-p', prompt, '--model', model, ...])
├── Streams stdout line-by-line (stream-json format)
├── For each event: POST /sidebar-agent/event { type, tool, text, tabId }
└── On close: POST /sidebar-agent/event { type: 'agent_done' }
7. server.ts processAgentEvent()
├── Adds entry to chat buffer (in-memory + disk)
├── On agent_done: sets tab status to 'idle'
└── On agent_done: processes next queued message for that tab
8. sidepanel.js pollChat() (every 300ms during fast poll)
├── GET /sidebar-chat?after=${chatLineCount}&tabId=${tabId}
├── Renders new entries (text, tool_use, agent_done)
└── On agent idle: removes thinking dots, stops fast poll
```
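Step 5's queue read can be sketched as line-offset tracking over the JSONL file. Simplified from the flow above — the real poller also checks `processingTabs` before spawning, and the function name is hypothetical:

```typescript
import { readFileSync, existsSync } from "node:fs";

// Track how many queue lines have been consumed (lastLine) and only
// parse lines past that mark on each 200ms poll.
function pollQueue(path: string, lastLine: number): { entries: unknown[]; lastLine: number } {
  if (!existsSync(path)) return { entries: [], lastLine };
  const lines = readFileSync(path, "utf8").split("\n").filter((l) => l.trim());
  const fresh = lines.slice(lastLine).map((l) => JSON.parse(l));
  return { entries: fresh, lastLine: lines.length };
}
```

This is why the startup timeline sets `lastLine = current line count` when the agent spawns: anything already in the file at that point is treated as consumed.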
## Arrow Hint Hide Flow (4-step signal chain)
The welcome page shows a right-pointing arrow until the sidebar opens.
```
1. sidepanel.js updateConnection()
└── chrome.runtime.sendMessage({ type: 'sidebarOpened' })
2. background.js
└── chrome.tabs.sendMessage(activeTabId, { type: 'sidebarOpened' })
3. content.js onMessage handler
└── document.dispatchEvent(new CustomEvent('gstack-extension-ready'))
4. welcome.html script
└── addEventListener('gstack-extension-ready', () => arrow.classList.add('hidden'))
```
The arrow does NOT hide when the extension loads. Only when the sidebar connects.
## Auth Token Flow
```
Server starts → AUTH_TOKEN = crypto.randomUUID()
├── GET /health (no auth) → returns { token: AUTH_TOKEN }
├── background.js checkHealth() → authToken = data.token
│ └── Refreshes on EVERY health poll (fixes stale token on restart)
├── sidepanel.js tryConnect() → serverToken from background or /health
│ └── Used for chat polling: Authorization: Bearer ${serverToken}
└── sidebar-agent.ts refreshToken() → reads from .gstack/browse.json
└── Used for event relay: Authorization: Bearer ${authToken}
```
If the server restarts, all three components get fresh tokens within 10s
(background health poll interval).
## Model Routing
`pickSidebarModel(message)` in server.ts classifies messages:
| Pattern | Model | Why |
|---------|-------|-----|
| "click @e24", "go to hn", "screenshot" | sonnet | Deterministic tool calls, no thinking needed |
| "what does this page say?", "summarize" | opus | Needs comprehension |
| "find bugs", "check for broken links" | opus | Analysis task |
| "navigate to X and fill the form" | sonnet | Action-oriented, no analysis words |
Analysis words (`what`, `why`, `how`, `summarize`, `describe`, `analyze`, `read X and Y`)
always override action verbs and force opus.
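The routing rule reduces to "analysis words win, everything else defaults to the action model." A sketch of that rule — the word list and function name are assumptions, not the real `pickSidebarModel` implementation:

```typescript
// Analysis words force opus even when action verbs are also present;
// messages without them default to the cheaper action model.
const ANALYSIS_WORDS = /\b(what|why|how|summarize|describe|analyze|find|check)\b/i;

function pickModel(message: string): "sonnet" | "opus" {
  return ANALYSIS_WORDS.test(message) ? "opus" : "sonnet";
}
```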
## Known Failure Modes
| Failure | Symptom | Root Cause | Fix |
|---------|---------|------------|-----|
| Stale auth token | "Unauthorized" in input | Server restarted, background had old token | background.js refreshes token on every health poll |
| Tab ID mismatch | Message sent, no response visible | Server assigned tabId 1, sidebar polling tabId 0 | switchChatTab preserves optimistic UI during switch |
| Sidebar agent not running | Messages queue forever | Agent process failed to spawn or crashed | Check `ps aux \| grep sidebar-agent` |
| Agent stale token | Agent runs but no events appear in sidebar | sidebar-agent has old token from .gstack/browse.json | Agent re-reads token before each event POST |
| Queue file missing | spawnClaude fails | Race between server start and agent start | Both sides create file if missing |
| Optimistic UI blown away | User bubble + dots vanish | switchChatTab replaced DOM with welcome screen | Preserved DOM when lastOptimisticMsg is set |
## Per-Tab Concurrency
Each browser tab can run its own agent simultaneously:
- Server: `tabAgents: Map<number, TabAgentState>` with per-tab queue (max 5)
- sidebar-agent: `processingTabs: Set<number>` prevents duplicate spawns
- Two messages on same tab: queued sequentially, processed in order
- Two messages on different tabs: run concurrently
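The concurrency rules above can be sketched as one dispatcher combining both sides' state (`processingTabs` guard plus the per-tab queue). A simplified model with hypothetical method names, not the real server/agent split:

```typescript
// One agent per tab; extra messages on a busy tab queue (max 5);
// different tabs run concurrently.
class TabDispatcher {
  private processingTabs = new Set<number>();
  private queues = new Map<number, string[]>();

  submit(tabId: number, message: string): "spawned" | "queued" | "dropped" {
    if (this.processingTabs.has(tabId)) {
      const q = this.queues.get(tabId) ?? [];
      if (q.length >= 5) return "dropped"; // per-tab queue cap
      q.push(message);
      this.queues.set(tabId, q);
      return "queued";
    }
    this.processingTabs.add(tabId);
    return "spawned";
  }

  // On agent_done: free the tab, then process its next queued message.
  done(tabId: number): string | undefined {
    this.processingTabs.delete(tabId);
    const next = this.queues.get(tabId)?.shift();
    if (next !== undefined) this.processingTabs.add(tabId);
    return next;
  }
}
```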
## File Locations
| Component | File | Runs in |
|-----------|------|---------|
| Sidebar UI | `extension/sidepanel.js` | Chrome side panel |
| Service worker | `extension/background.js` | Chrome background |
| Content script | `extension/content.js` | Page context |
| Welcome page | `browse/src/welcome.html` | Page context |
| HTTP server | `browse/src/server.ts` | Bun (compiled binary) |
| Agent process | `browse/src/sidebar-agent.ts` | Bun (non-compiled, can spawn) |
| CLI entry | `browse/src/cli.ts` | Bun (compiled binary) |
| Queue file | `~/.gstack/sidebar-agent-queue.jsonl` | Filesystem |
| State file | `.gstack/browse.json` | Filesystem |
| Chat log | `~/.gstack/sessions/<id>/chat.jsonl` | Filesystem |
# Slate Host Integration — Research & Design Doc
**Date:** 2026-04-02
**Branch:** garrytan/slate-agent-support
**Status:** Research complete, blocked on host config refactor
**Supersedes:** None
## What is Slate
Slate is a proprietary coding agent CLI from Random Labs.
Install: `npm i -g @randomlabs/slate` or `brew install anthropic/tap/slate`.
License: Proprietary. 85MB compiled Bun binary (arm64/x64, darwin/linux/windows).
npm package: `@randomlabs/slate@1.0.25` (thin 8.8KB launcher + platform-specific optional deps).
Multi-model: dynamically selects Claude Sonnet/Opus/Haiku, plus other models.
Built for "swarm orchestration" with extended multi-hour sessions.
## Slate is an OpenCode fork
**Confirmed via binary strings analysis** of the 85MB Mach-O arm64 binary:
- Internal name: `name: "opencode"` (literal string in binary)
- All `OPENCODE_*` env vars present alongside `SLATE_*` equivalents
- Shares OpenCode's tool/skill architecture, LSP integration, terminal management
- Own branding, API endpoints (`api.randomlabs.ai`, `agent-worker-prod.randomlabs.workers.dev`), and config paths
This matters for integration: OpenCode conventions mostly apply, but Slate adds
its own paths and env vars on top.
## Skill Discovery (confirmed from binary)
Slate scans ALL four directory families for skills. Error messages in binary confirm:
```
"failed .slate directory scan for skills"
"failed .claude directory scan for skills"
"failed .agents directory scan for skills"
"failed .opencode directory scan for skills"
```
**Discovery paths (priority order from Slate docs):**
1. `.slate/skills/<name>/SKILL.md` — project-level, highest priority
2. `~/.slate/skills/<name>/SKILL.md` — global
3. `.opencode/skills/`, `.agents/skills/` — compatibility fallback
4. `.claude/skills/` — Claude Code compatibility fallback (lowest)
5. Custom paths via `slate.json`
**Glob patterns:** `**/SKILL.md` and `{skill,skills}/**/SKILL.md`
**Commands:** Same directory structure but under `commands/` subdirs:
`/.slate/commands/`, `/.claude/commands/`, `/.agents/commands/`, `/.opencode/commands/`
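The skill discovery order above can be sketched as a resolver. This is a hypothetical helper illustrating the documented priority, not Slate's actual implementation; it also omits the `slate.json` custom paths and the glob matching:

```typescript
// Hypothetical sketch of the documented search order — not Slate's real code.
function skillSearchDirs(projectRoot: string, home: string): string[] {
  return [
    `${projectRoot}/.slate/skills`,    // 1. project-level, highest priority
    `${home}/.slate/skills`,           // 2. global
    `${projectRoot}/.opencode/skills`, // 3. compatibility fallback
    `${projectRoot}/.agents/skills`,   //    compatibility fallback
    `${projectRoot}/.claude/skills`,   // 4. Claude Code fallback, lowest
  ];
}
```

Presumably the first directory containing a given `<name>/SKILL.md` wins, though the docs don't spell out per-skill shadowing behavior.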
**Skill frontmatter:** YAML with `name` and `description` fields (per Slate docs).
No documented length limits on either field.
## Project Instructions
Slate reads both `CLAUDE.md` and `AGENTS.md` for project instructions.
Both literal strings confirmed in binary. No changes needed to existing
gstack projects... CLAUDE.md works as-is.
## Configuration
**Config file:** `slate.json` / `slate.jsonc` (NOT opencode.json)
**Config options (from Slate docs):**
- `privacy` (boolean) — disables telemetry/logging
- Permissions: `allow`, `ask`, `deny` per tool (`read`, `edit`, `bash`, `grep`, `webfetch`, `websearch`, `*`)
- Model slots: `models.main`, `models.subagent`, `models.search`, `models.reasoning`
- MCP servers: local or remote with custom commands and headers
- Custom commands: `/commands` with templates
The setup script should NOT create `slate.json`. Users configure their own permissions.
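A type-level sketch of the documented options. The field names come from the list above; everything else (nesting, optionality) is an assumption, not a verified schema:

```typescript
// Inferred shape of slate.json — structure beyond the documented field
// names is an assumption, not a verified schema.
type PermissionMode = "allow" | "ask" | "deny";
type SlateTool = "read" | "edit" | "bash" | "grep" | "webfetch" | "websearch" | "*";

interface SlateConfigSketch {
  privacy?: boolean; // disables telemetry/logging
  permissions?: Partial<Record<SlateTool, PermissionMode>>;
  models?: {
    main?: string;
    subagent?: string;
    search?: string;
    reasoning?: string;
  };
}

// Example a user (not the setup script) might write:
const example: SlateConfigSketch = {
  privacy: true,
  permissions: { bash: "ask", "*": "allow" },
  models: { main: "anthropic/claude-sonnet-4.6" },
};
```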
## CLI Flags (Headless Mode)
```
--stream-json / --output-format stream-json — JSONL output, "compatible with Anthropic Claude Code SDK"
--dangerously-skip-permissions — bypass all permission checks (CI/automation)
--input-format stream-json — programmatic input
-q — non-interactive mode
-w <dir> — workspace directory
--output-format text — plain text output (default)
```
**Stream-JSON format:** Slate docs claim "compatible with Anthropic Claude Code SDK."
Not yet empirically verified. Given OpenCode heritage, likely matches Claude Code's
NDJSON event schema (type: "assistant", type: "tool_result", type: "result").
**Need to verify:** Run `slate -q "hello" --stream-json` with valid credits and
capture actual JSONL events before building the session runner parser.
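If the compatibility claim holds, a tolerant line parser would look something like this. The event shape is assumed from Claude Code's NDJSON and is exactly what the verification run above needs to confirm:

```typescript
// Assumed event envelope based on Claude Code's NDJSON — unverified for Slate.
type StreamEvent =
  | { type: "assistant"; message?: unknown }
  | { type: "tool_result"; content?: unknown }
  | { type: "result"; result?: string; session_id?: string };

function parseLine(line: string): StreamEvent | null {
  try {
    const ev = JSON.parse(line);
    // Keep only objects that carry a string `type` discriminator.
    return typeof ev?.type === "string" ? (ev as StreamEvent) : null;
  } catch {
    return null; // tolerate non-JSON noise on stdout
  }
}
```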
## Environment Variables (from binary strings)
### Slate-specific
```
SLATE_API_KEY — API key
SLATE_AGENT — agent selection
SLATE_AUTO_SHARE — auto-share setting
SLATE_CLIENT — client identifier
SLATE_CONFIG — config override
SLATE_CONFIG_CONTENT — inline config
SLATE_CONFIG_DIR — config directory
SLATE_DANGEROUSLY_SKIP_PERMISSIONS — bypass permissions
SLATE_DIR — data directory override
SLATE_DISABLE_AUTOUPDATE — disable auto-update
SLATE_DISABLE_CLAUDE_CODE — disable Claude Code integration entirely
SLATE_DISABLE_CLAUDE_CODE_PROMPT — disable Claude Code prompt loading
SLATE_DISABLE_CLAUDE_CODE_SKILLS — disable .claude/skills/ loading
SLATE_DISABLE_DEFAULT_PLUGINS — disable default plugins
SLATE_DISABLE_FILETIME_CHECK — disable file time checks
SLATE_DISABLE_LSP_DOWNLOAD — disable LSP auto-download
SLATE_DISABLE_MODELS_FETCH — disable models config fetch
SLATE_DISABLE_PROJECT_CONFIG — disable project-level config
SLATE_DISABLE_PRUNE — disable session pruning
SLATE_DISABLE_TERMINAL_TITLE — disable terminal title updates
SLATE_ENABLE_EXA — enable Exa search
SLATE_ENABLE_EXPERIMENTAL_MODELS — enable experimental models
SLATE_EXPERIMENTAL — enable experimental features
SLATE_EXPERIMENTAL_BASH_DEFAULT_TIMEOUT_MS — bash timeout override
SLATE_EXPERIMENTAL_DISABLE_COPY_ON_SELECT — disable copy on select
SLATE_EXPERIMENTAL_DISABLE_FILEWATCHER — disable file watcher
SLATE_EXPERIMENTAL_EXA — Exa search (alt flag)
SLATE_EXPERIMENTAL_FILEWATCHER — enable file watcher
SLATE_EXPERIMENTAL_ICON_DISCOVERY — icon discovery
SLATE_EXPERIMENTAL_LSP_TOOL — LSP tool
SLATE_EXPERIMENTAL_LSP_TY — LSP type checking
SLATE_EXPERIMENTAL_MARKDOWN — markdown mode
SLATE_EXPERIMENTAL_OUTPUT_TOKEN_MAX — output token limit
SLATE_EXPERIMENTAL_OXFMT — oxfmt integration
SLATE_EXPERIMENTAL_PLAN_MODE — plan mode
SLATE_FAKE_VCS — fake VCS for testing
SLATE_GIT_BASH_PATH — git bash path (Windows)
SLATE_MODELS_URL — models config URL
SLATE_PERMISSION — permission override
SLATE_SERVER_PASSWORD — server auth
SLATE_SERVER_USERNAME — server auth
SLATE_TELEMETRY_DISABLED — disable telemetry
SLATE_TEST_HOME — test home directory
SLATE_TOKEN_DIR — token storage directory
```
### OpenCode legacy (still functional)
```
OPENCODE_DISABLE_LSP_DOWNLOAD
OPENCODE_EXPERIMENTAL_DISABLE_FILEWATCHER
OPENCODE_EXPERIMENTAL_FILEWATCHER
OPENCODE_EXPERIMENTAL_ICON_DISCOVERY
OPENCODE_EXPERIMENTAL_LSP_TY
OPENCODE_EXPERIMENTAL_OXFMT
OPENCODE_FAKE_VCS
OPENCODE_GIT_BASH_PATH
OPENCODE_LIBC
OPENCODE_TERMINAL
```
### Critical env vars for gstack integration
**`SLATE_DISABLE_CLAUDE_CODE_SKILLS`** — When set, `.claude/skills/` loading is disabled.
This makes publishing to `.slate/skills/` load-bearing, not just an optimization.
Without native `.slate/` publishing, gstack skills vanish when this flag is set.
**`SLATE_TEST_HOME`** — Useful for E2E tests. Can redirect Slate's home directory
to an isolated temp directory, similar to how Codex tests use a temp HOME.
**`SLATE_DANGEROUSLY_SKIP_PERMISSIONS`** — Required for headless E2E tests.
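Taken together, a hypothetical E2E harness would build its env block from these. The variable names are from the binary strings above; the helper itself doesn't exist in gstack yet:

```typescript
// Hypothetical E2E env setup — var names come from the binary strings above.
function slateTestEnv(tempHome: string): Record<string, string> {
  return {
    SLATE_TEST_HOME: tempHome,                // isolate Slate's home directory
    SLATE_DANGEROUSLY_SKIP_PERMISSIONS: "1",  // required for headless runs
    SLATE_DISABLE_CLAUDE_CODE_SKILLS: "1",    // prove .slate/skills/ actually loads
    SLATE_TELEMETRY_DISABLED: "1",            // keep test runs out of telemetry
    SLATE_DISABLE_AUTOUPDATE: "1",            // deterministic binary under test
  };
}
```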
## Model References (from binary)
```
anthropic/claude-sonnet-4.6
anthropic/claude-opus-4
anthropic/claude-haiku-4
anthropic/slate — Slate's own model routing
openai/gpt-5.3-codex
google/nano-banana
randomlabs/fast-default-alpha
```
## API Endpoints (from binary)
```
https://api.randomlabs.ai — main API
https://api.randomlabs.ai/exaproxy — Exa search proxy
https://agent-worker-prod.randomlabs.workers.dev — production worker
https://agent-worker-dev.randomlabs.workers.dev — dev worker
https://dashboard.randomlabs.ai — dashboard
https://docs.randomlabs.ai — documentation
https://randomlabs.ai/config.json — remote config
```
Brew tap: `anthropic/tap/slate` (notable: it lives under Anthropic's tap, not Random Labs's)
## npm Package Structure
```
@randomlabs/slate (8.8 kB, thin launcher)
├── bin/slate — Node.js launcher (finds platform binary in node_modules)
├── bin/slate1 — Bun launcher (same logic, import.meta.filename)
├── postinstall.mjs — Verifies platform binary exists, symlinks if needed
└── package.json — Declares optionalDependencies for all platforms
Platform packages (85MB each):
├── @randomlabs/slate-darwin-arm64
├── @randomlabs/slate-darwin-x64
├── @randomlabs/slate-linux-arm64
├── @randomlabs/slate-linux-x64
├── @randomlabs/slate-linux-x64-musl
├── @randomlabs/slate-linux-arm64-musl
├── @randomlabs/slate-linux-x64-baseline
├── @randomlabs/slate-linux-x64-baseline-musl
├── @randomlabs/slate-darwin-x64-baseline
├── @randomlabs/slate-windows-x64
└── @randomlabs/slate-windows-x64-baseline
```
Binary override: `SLATE_BIN_PATH` env var skips all discovery, runs the specified binary directly.
## What Already Works Today
gstack skills already work in Slate via the `.claude/skills/` fallback path.
No changes needed for basic functionality. Users who install gstack for Claude Code
and also use Slate will find their skills available in both agents.
## What First-Class Support Adds
1. **Reliability** — `.slate/skills/` is Slate's highest-priority path. Immune to
   `SLATE_DISABLE_CLAUDE_CODE_SKILLS`.
2. **Optimized frontmatter** — Strip Claude-specific fields (allowed-tools, hooks, version)
that Slate doesn't use. Keep only `name` and `description`.
3. **Setup script** — Auto-detect `slate` binary, install skills to `~/.slate/skills/`.
4. **E2E tests** — Verify skills work when invoked by Slate directly.
## Blocked On: Host Config Refactor
Codex's outside voice review identified that adding Slate as a 4th host (after Claude,
Codex, Factory) is "host explosion for a path alias." The current architecture has:
- Hard-coded host names in `type Host = 'claude' | 'codex' | 'factory'`
- Per-host branches in `transformFrontmatter()` with near-duplicate logic
- Per-host config in `EXTERNAL_HOST_CONFIG` with similar patterns
- Per-host functions in the setup script (`create_codex_runtime_root`, `link_codex_skill_dirs`)
- Host names duplicated in `bin/gstack-platform-detect`, `bin/gstack-uninstall`, `bin/dev-setup`
Adding Slate means copying all of these patterns again. A refactor to make hosts
data-driven (config objects instead of if/else branches) would make Slate integration
trivial AND make future hosts (any new OpenCode fork, any new agent) zero-effort.
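Under that refactor, Slate becomes one config object instead of five new branches. A sketch only; the field names are illustrative, not the final `HostConfig` interface:

```typescript
// Illustrative sketch of a data-driven host entry — the real HostConfig
// interface may differ once the refactor lands.
interface HostConfigSketch {
  name: string;
  binary: string;                 // binary detected for auto-install
  globalSkillsDir: string;        // where setup publishes global skills
  projectSkillsDir: string;       // project-level publish target
  frontmatterAllowlist: string[]; // fields kept when transforming SKILL.md
}

const slateHost: HostConfigSketch = {
  name: "slate",
  binary: "slate",
  globalSkillsDir: "~/.slate/skills",
  projectSkillsDir: ".slate/skills",
  frontmatterAllowlist: ["name", "description"], // strip Claude-only fields
};
```

With hosts in a registry like this, `transformFrontmatter()` and the setup script iterate over config objects rather than branching per host name.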
### Missing from the plan (identified by Codex)
- `lib/worktree.ts` only copies `.agents/`, not `.slate/` — E2E tests in worktrees won't
have Slate skills
- `bin/gstack-uninstall` doesn't know about `.slate/`
- `bin/dev-setup` doesn't wire `.slate/` for contributor dev mode
- `bin/gstack-platform-detect` doesn't detect Slate
- E2E tests should set `SLATE_DISABLE_CLAUDE_CODE_SKILLS=1` to prove `.slate/` path
actually works (not just falling back to `.claude/`)
## Session Runner Design (for later)
When the JSONL format is verified, the session runner should:
- Spawn: `slate -q "<prompt>" --stream-json --dangerously-skip-permissions -w <dir>`
- Parse: Claude Code SDK-compatible NDJSON (assumed, needs verification)
- Skills: Install to `.slate/skills/` in test fixture (not `.claude/skills/`)
- Auth: Use `SLATE_API_KEY` or existing `~/.slate/` credentials
- Isolation: Use `SLATE_TEST_HOME` for home directory isolation
- Timeout: 300s default (same as Codex)
```typescript
export interface SlateResult {
output: string;
toolCalls: string[];
tokens: number;
exitCode: number;
durationMs: number;
sessionId: string | null;
rawLines: string[];
stderr: string;
}
```
## Docs References
- Slate docs: https://docs.randomlabs.ai
- Quickstart: https://docs.randomlabs.ai/en/getting-started/quickstart
- Skills: https://docs.randomlabs.ai/en/using-slate/skills
- Configuration: https://docs.randomlabs.ai/en/using-slate/configuration
- Hotkeys: https://docs.randomlabs.ai/en/using-slate/hotkey_reference
+283
@@ -12,14 +12,21 @@ Detailed guides for every gstack skill — philosophy, workflow, and examples.
| [`/review`](#review) | **Staff Engineer** | Find the bugs that pass CI but blow up in production. Auto-fixes the obvious ones. Flags completeness gaps. |
| [`/investigate`](#investigate) | **Debugger** | Systematic root-cause debugging. Iron Law: no fixes without investigation. Traces data flow, tests hypotheses, stops after 3 failed fixes. |
| [`/design-review`](#design-review) | **Designer Who Codes** | Live-site visual audit + fix loop. 80-item audit, then fixes what it finds. Atomic commits, before/after screenshots. |
| [`/design-shotgun`](#design-shotgun) | **Design Explorer** | Generate multiple AI design variants, open a comparison board in your browser, and iterate until you approve a direction. Taste memory biases toward your preferences. |
| [`/design-html`](#design-html) | **Design Engineer** | Generates production-quality Pretext-native HTML. Works with approved mockups, CEO plans, design reviews, or from scratch. Text reflows on resize, heights adjust to content. Smart API routing per design type. Framework detection for React/Svelte/Vue. |
| [`/qa`](#qa) | **QA Lead** | Test your app, find bugs, fix them with atomic commits, re-verify. Auto-generates regression tests for every fix. |
| [`/qa-only`](#qa) | **QA Reporter** | Same methodology as /qa but report only. Use when you want a pure bug report without code changes. |
| [`/ship`](#ship) | **Release Engineer** | Sync main, run tests, audit coverage, push, open PR. Bootstraps test frameworks if you don't have one. One command. |
| [`/land-and-deploy`](#land-and-deploy) | **Release Engineer** | Merge the PR, wait for CI and deploy, verify production health. One command from "approved" to "verified in production." |
| [`/canary`](#canary) | **SRE** | Post-deploy monitoring loop. Watches for console errors, performance regressions, and page failures using the browse daemon. |
| [`/benchmark`](#benchmark) | **Performance Engineer** | Baseline page load times, Core Web Vitals, and resource sizes. Compare before/after on every PR. Track trends over time. |
| [`/cso`](#cso) | **Chief Security Officer** | OWASP Top 10 + STRIDE threat modeling security audit. Scans for injection, auth, crypto, and access control issues. |
| [`/document-release`](#document-release) | **Technical Writer** | Update all project docs to match what you just shipped. Catches stale READMEs automatically. |
| [`/retro`](#retro) | **Eng Manager** | Team-aware weekly retro. Per-person breakdowns, shipping streaks, test health trends, growth opportunities. |
| [`/browse`](#browse) | **QA Engineer** | Give the agent eyes. Real Chromium browser, real clicks, real screenshots. ~100ms per command. |
| [`/setup-browser-cookies`](#setup-browser-cookies) | **Session Manager** | Import cookies from your real browser (Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages. |
| [`/autoplan`](#autoplan) | **Review Pipeline** | One command, fully reviewed plan. Runs CEO → design → eng review automatically with encoded decision principles. Surfaces only taste decisions for your approval. |
| [`/learn`](#learn) | **Memory** | Manage what gstack learned across sessions. Review, search, prune, and export project-specific patterns and preferences. |
| | | |
| **Multi-AI** | | |
| [`/codex`](#codex) | **Second Opinion** | Independent review from OpenAI Codex CLI. Three modes: code review (pass/fail gate), adversarial challenge, and open consultation with session continuity. Cross-model analysis when both `/review` and `/codex` have run. |
@@ -29,6 +36,8 @@ Detailed guides for every gstack skill — philosophy, workflow, and examples.
| [`/freeze`](#safety--guardrails) | **Edit Lock** | Restrict all file edits to a single directory. Blocks Edit and Write outside the boundary. Accident prevention for debugging. |
| [`/guard`](#safety--guardrails) | **Full Safety** | Combines /careful + /freeze in one command. Maximum safety for prod work. |
| [`/unfreeze`](#safety--guardrails) | **Unlock** | Remove the /freeze boundary, allowing edits everywhere again. |
| [`/open-gstack-browser`](#open-gstack-browser) | **GStack Browser** | Launch GStack Browser with sidebar, anti-bot stealth, auto model routing, cookie import, and Claude Code integration. Watch every action live. |
| [`/setup-deploy`](#setup-deploy) | **Deploy Configurator** | One-time setup for `/land-and-deploy`. Detects your platform, production URL, and deploy commands. |
| [`/gstack-upgrade`](#gstack-upgrade) | **Self-Updater** | Upgrade gstack to the latest version. Detects global vs vendored install, syncs both, shows what changed. |
---
@@ -399,6 +408,110 @@ Nine commits, each touching one concern. The AI Slop score went from D to A beca
---
## `/design-shotgun`
This is my **design exploration mode**.
You know the feeling. You have a feature, a page, a landing screen... and you're not sure what it should look like. You could describe it to Claude and get one answer. But one answer means one perspective, and design is a taste game. You need to see options.
`/design-shotgun` generates 3 visual design variants using the GPT Image API, opens a comparison board in your browser, and waits for your feedback. You pick a direction, request changes, or ask for entirely new variants. The board supports remix, regenerate, and approval actions.
### The loop
1. You describe what you want (or point at an existing page)
2. The skill reads your `DESIGN.md` for brand constraints (if it exists)
3. It generates 3 distinct design variants as PNGs
4. A comparison board opens in your browser with all 3 side-by-side
5. You click "Approve" on the one you like, or give feedback for another round
6. The approved variant saves to `~/.gstack/projects/$SLUG/designs/` with an `approved.json`
That `approved.json` is one way to feed `/design-html`. The design pipeline chains: shotgun picks the direction, design-html renders it as working code. But `/design-html` also works with CEO plans, design reviews, or just a description.
### Taste memory
The skill remembers your preferences across sessions. If you consistently prefer minimal designs over busy ones, it biases future generations. This isn't a setting you configure... it emerges from your approvals.
### Example
```
You: /design-shotgun — hero section for a developer tools landing page
Claude: [Generates 3 variants]
Variant A: Bold typography, dark background, code snippet hero
Variant B: Split layout, product screenshot left, copy right
Variant C: Minimal, centered headline, gradient accent
[Opens comparison board at localhost:PORT]
You: [Clicks "Approve" on Variant A in the browser]
Claude: Approved Variant A. Saved to ~/.gstack/projects/myapp/designs/
Next: run /design-html to generate production HTML from this mockup.
```
---
## `/design-html`
This is my **design-to-code mode**.
Every AI code generation tool produces static CSS. Hardcoded heights. Text that overflows on resize. Breakpoints that snap instead of flowing. The output looks right at exactly one viewport size and breaks at every other.
`/design-html` fixes this. It generates HTML using [Pretext](https://github.com/chenglou/pretext) by Cheng Lou (ex-React core, Midjourney frontend). Pretext is a 15KB library that computes text layout without DOM measurement. Text reflows. Heights adjust to content. Cards size themselves. Chat bubbles shrinkwrap. All sub-millisecond, all dynamic.
It works with multiple input sources: an approved mockup from `/design-shotgun`, a CEO plan from `/plan-ceo-review`, design review context from `/plan-design-review`, a PNG you provide, or just a description of what you want. It detects what context exists and asks how you want to proceed.
### Smart API routing
Not every page needs the full Pretext engine. The skill reads the design and picks the right tools:
- **Simple layouts** (landing, marketing): `prepare()` + `layout()` for resize-aware heights
- **Card grids** (dashboard, listing): `prepare()` + `layout()` for self-sizing cards
- **Chat UIs**: `walkLineRanges()` for tight-fit bubbles with zero wasted pixels
- **Editorial layouts**: `layoutNextLine()` for text flowing around obstacles
- **Complex editorial**: Full engine with `layoutWithLines()` for manual line rendering
### The refinement loop
1. Reads the approved mockup from `approved.json`
2. Uses GPT-4o vision to extract implementation spec (colors, typography, layout)
3. Generates self-contained HTML with Pretext inlined (15KB, zero network dependency)
4. Spins up a live-reload server so you see changes instantly
5. Screenshots at 3 viewports (mobile, tablet, desktop) to verify layout
6. AskUserQuestion: what needs to change?
7. Surgical edits via the Edit tool (not full regeneration)
8. Repeat until you say "done"
### Framework detection
If your project uses React, Svelte, or Vue (detected from `package.json`), the skill offers to generate a framework component instead of vanilla HTML. Framework output uses `npm install @chenglou/pretext` instead of inline vendoring.
### Example
```
You: /design-html
Claude: Found approved mockup: variant-A.png (hero section)
Design type: simple layout → using prepare() + layout()
[Generates HTML with Pretext, opens live-reload server]
[Screenshots at 375px, 768px, 1440px — all verified]
The HTML is live at localhost:3456. Resize your browser.
Text reflows dynamically. What needs to change?
You: Make the headline bigger and add more spacing below the CTA
Claude: [Edits font-size and margin, browser auto-reloads]
Updated. Two changes applied.
You: Done
Claude: Saved to ~/.gstack/projects/myapp/designs/hero-20260330/finalized.html
```
---
## `/review`
This is my **paranoid staff engineer mode**.
@@ -525,6 +638,82 @@ A lot of branches die when the interesting work is done and only the boring rele
---
## `/land-and-deploy`
This is my **deploy pipeline mode**.
`/ship` creates the PR. `/land-and-deploy` finishes the job: merge, deploy, verify.
It merges the PR, waits for CI, waits for the deploy to finish, then runs canary checks against production. One command from "approved" to "verified in production." If the deploy breaks, it tells you what failed and whether to rollback.
First run on a new project triggers a dry-run walk-through so you can verify the pipeline before it does anything irreversible. After that, it trusts the config and runs straight through.
### Setup
Run `/setup-deploy` first. It detects your platform (Fly.io, Render, Vercel, Netlify, Heroku, GitHub Actions, or custom), discovers your production URL and health check endpoints, and writes the config to CLAUDE.md. One-time, 60 seconds.
### Example
```
You: /land-and-deploy
Claude: Merging PR #42...
CI: 3/3 checks passed
Deploy: Fly.io — deploying v2.1.0...
Health check: https://myapp.fly.dev/health → 200 OK
Canary: 5 pages checked, 0 console errors, p95 < 800ms
Production verified. v2.1.0 is live.
```
---
## `/canary`
This is my **post-deploy monitoring mode**.
After deploy, `/canary` watches the live site for trouble. It loops through your key pages using the browse daemon, checking for console errors, performance regressions, page failures, and visual anomalies. Takes periodic screenshots and compares against pre-deploy baselines.
Use it right after `/land-and-deploy`, or schedule it to run periodically after a risky deploy.
```
You: /canary https://myapp.com
Claude: Monitoring 8 pages every 2 minutes...
Cycle 1: ✓ All pages healthy. p95: 340ms. 0 console errors.
Cycle 2: ✓ All pages healthy. p95: 380ms. 0 console errors.
Cycle 3: ⚠ /dashboard — new console error: "TypeError: Cannot read
property 'map' of undefined" at dashboard.js:142
Screenshot saved.
Alert: 1 new console error after 3 monitoring cycles.
```
---
## `/benchmark`
This is my **performance engineer mode**.
`/benchmark` establishes performance baselines for your pages: load time, Core Web Vitals (LCP, CLS, INP), resource counts, and total transfer size. Run it before and after a PR to catch regressions.
It uses the browse daemon for real Chromium measurements, not synthetic estimates. Multiple runs averaged. Results persist so you can track trends across PRs.
```
You: /benchmark https://myapp.com
Claude: Benchmarking 5 pages (3 runs each)...
/ load: 1.2s LCP: 0.9s CLS: 0.01 resources: 24 (890KB)
/dashboard load: 2.1s LCP: 1.8s CLS: 0.03 resources: 31 (1.4MB)
/settings load: 0.8s LCP: 0.6s CLS: 0.00 resources: 18 (420KB)
Baseline saved. Run again after changes to compare.
```
---
## `/cso`
This is my **Chief Security Officer**.
@@ -711,6 +900,100 @@ Claude: Imported 12 cookies for github.com from Comet.
---
## `/autoplan`
This is my **review autopilot mode**.
Running `/plan-ceo-review`, then `/plan-design-review`, then `/plan-eng-review` individually means answering 15-30 intermediate questions. Each question is valuable, but sometimes you want the gauntlet to run without stopping for every decision.
`/autoplan` reads all three review skills from disk and runs them sequentially: CEO → Design → Eng. It makes decisions automatically using six encoded principles (prefer completeness, match existing patterns, choose reversible options, prefer the option the user chose for similar past decisions, defer ambiguous items, and escalate security concerns). Taste decisions (close approaches, borderline scope expansions, cross-model disagreements) get saved and presented at a final approval gate.
One command, fully reviewed plan out.
```
You: /autoplan
Claude: Running CEO review... [4 scope decisions auto-resolved]
Running design review... [3 design dimensions auto-scored]
Running eng review... [2 architecture decisions auto-resolved]
TASTE DECISIONS (need your input):
1. Scope: Codex suggested adding search — borderline expansion. Add?
2. Design: Two approaches scored within 1 point. Which feels right?
[Shows both options with context]
You: 1) Yes, add search. 2) Option A.
Claude: Plan complete. 9 decisions auto-resolved, 2 taste decisions approved.
```
---
## `/learn`
This is my **institutional memory mode**.
gstack learns from every session. Patterns, pitfalls, preferences, architectural decisions... they accumulate in `~/.gstack/projects/$SLUG/learnings.jsonl`. Each learning has a confidence score, source attribution, and the files it references.
`/learn` lets you see what gstack has absorbed, search for specific patterns, prune stale entries (when referenced files no longer exist), and export learnings for team sharing. The real magic is in other skills... they automatically search learnings before making recommendations, and display "Prior learning applied" when a past insight is relevant.
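The staleness check described above is simple in principle. A sketch, assuming a record shape inferred from this description (the real `learnings.jsonl` schema isn't documented here):

```typescript
// Hypothetical learning record — confidence and file references match the
// description above; the exact on-disk schema is an assumption.
interface Learning {
  text: string;
  confidence: number; // 1-10
  files: string[];    // files this learning references
}

// A learning is "potentially stale" when every file it references is gone.
// Learnings with no file references are never flagged by this check.
function isStale(l: Learning, exists: (path: string) => boolean): boolean {
  return l.files.length > 0 && l.files.every((f) => !exists(f));
}
```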
```
You: /learn
Claude: 23 learnings for this project (14 high confidence, 6 medium, 3 low)
Top patterns:
- [9/10] API responses always wrapped in { data, error } envelope
- [8/10] Tests use factory helpers in test/support/factories.ts
- [8/10] All DB queries go through repository pattern, never direct
3 potentially stale (referenced files deleted):
- "auth middleware uses JWT" — auth/middleware.ts was deleted
[Prune these? Y/N]
```
---
## `/open-gstack-browser`
This is my **co-presence mode**.
`/browse` runs headless by default. You don't see what the agent sees. `/open-gstack-browser` changes that. It launches GStack Browser (rebranded Chromium with anti-bot stealth) controlled by Playwright, with the sidebar extension auto-loaded. You watch every action in real time.
The sidebar chat is a Claude instance that controls the browser. It auto-routes to the right model: Sonnet for navigation and actions (click, goto, fill, screenshot), Opus for reading and analysis (summarize, find bugs, describe). One-click cookie import from the sidebar footer. The browser stays alive as long as the window is open... no idle timeout in headed mode. The menu bar says "GStack Browser" instead of "Chrome for Testing."
```
You: /open-gstack-browser
Claude: Launched GStack Browser with sidebar extension.
Anti-bot stealth active. All $B commands run in headed mode.
Type in the sidebar to direct the browser agent.
Sidebar model routing: sonnet for actions, opus for analysis.
```
---
## `/setup-deploy`
One-time deploy configuration. Run this before your first `/land-and-deploy`.
It auto-detects your deploy platform (Fly.io, Render, Vercel, Netlify, Heroku, GitHub Actions, or custom), discovers your production URL, health check endpoints, and deploy status commands. Writes everything to CLAUDE.md so all future deploys are automatic.
```
You: /setup-deploy
Claude: Detected: Fly.io (fly.toml found)
Production URL: https://myapp.fly.dev
Health check: /health → expects 200
Deploy command: fly deploy
Status command: fly status
Written to CLAUDE.md. Run /land-and-deploy when ready.
```
---
## `/codex`
This is my **second opinion mode**.