feat: gstack browser sidebar = interactive Claude Code REPL with live tab awareness (v1.14.0.0) (#1216)

* build: vendor xterm@5 for the Terminal sidebar tab

Adds xterm@5 + xterm-addon-fit as devDependencies and a `vendor:xterm` build step that copies the assets into `extension/lib/` at build time. The vendored files are .gitignored so the npm version stays the source of truth. xterm@5 is eval-free, so no MV3 CSP changes needed. No runtime callers yet; this just stages the assets.

* feat(server): add pty-session-cookie module for the Terminal tab

Mirrors `sse-session-cookie.ts` exactly. Mints short-lived 30-min HttpOnly cookies for authenticating the Terminal-tab WebSocket upgrade against the terminal-agent. Same TTL, same opportunistic-pruning shape, same "scoped tokens never valid as root" invariant. Two registries instead of one because the cookie names are different (`gstack_sse` vs `gstack_pty`) and the token spaces must not overlap. No callers yet; wired up in the next commit.

* feat(server): add terminal-agent.ts (PTY for the Terminal sidebar tab)

Translates phoenix gbrowser's Go PTY (cmd/gbd/terminal.go) into a Bun non-compiled process. Lives separately from `sidebar-agent.ts` so a WS-framing or PTY-cleanup bug can't take down the chat path (codex outside-voice review caught the coupling risk).

Architecture:
- Bun.serve on 127.0.0.1:0 (never tunneled).
- POST /internal/grant accepts cookie tokens from the parent server over loopback, authenticated with a per-boot internal token.
- GET /ws upgrades require BOTH (a) Origin: chrome-extension://<id> and (b) the gstack_pty cookie minted by /pty-session. Either gate alone is insufficient (CSWSH defense + auth defense).
- Lazy spawn: claude PTY is not started until the WS receives its first data frame. Idle sidebar opens cost nothing.
- Bun PTY API: `terminal: { rows, cols, data(t, chunk) }`, verified at impl time on Bun 1.3.10. proc.terminal.write() for input, proc.terminal.resize() for resize, proc.kill() + 3s SIGKILL fallback on close.
- process.on('uncaughtException'|'unhandledRejection') handlers so a framing bug logs but doesn't kill the listener loop.

Test-only `BROWSE_TERMINAL_BINARY` env override lets the integration tests spawn /bin/bash instead of requiring claude on every CI runner.

Not yet spawned by anything; wired in the next commit.

* feat(server): wire /pty-session route + spawn terminal-agent

Server-side glue connecting the Terminal sidebar tab to the new terminal-agent process.

server.ts:
- New POST /pty-session route. Validates AUTH_TOKEN, mints a gstack_pty HttpOnly cookie via pty-session-cookie.ts, posts the cookie value to the agent's loopback /internal/grant. Returns the terminalPort + Set-Cookie to the extension.
- /health response gains `terminalPort` (just the port number, never a shell token). Tokens flow via the cookie path, never /health, because /health already surfaces AUTH_TOKEN to localhost callers in headed mode (that's a separate v1.1+ TODO).
- /pty-session and /terminal/* are deliberately NOT added to TUNNEL_PATHS, so the dual-listener tunnel surface 404s by default-deny.
- Shutdown path now also pkills terminal-agent and unlinks its state files (terminal-port + terminal-internal-token) so a reconnect doesn't try to hit a dead port.

cli.ts:
- After spawning sidebar-agent.ts, also spawn terminal-agent.ts. Same pattern: pkill old instances, Bun.spawn(['bun', 'run', script]) with BROWSE_STATE_FILE + BROWSE_SERVER_PORT env. Non-fatal if the spawn fails; chat still works without the terminal agent.

* feat(extension): Terminal as default sidebar tab

Adds a primary tab bar (Terminal | Chat) above the existing tab-content panes. Terminal is the default-active tab; clicking Chat returns to the existing claude -p one-shot flow which is preserved verbatim.

manifest.json: adds ws://127.0.0.1:*/ to host_permissions so MV3 doesn't block the WebSocket upgrade.

sidepanel.html: new primary-tabs nav, new #tab-terminal pane with a "Press any key to start Claude Code" bootstrap card, claude-not-found install card, xterm mount point, and "session ended" restart UI. Loads xterm.js + xterm-addon-fit + sidepanel-terminal.js. tab-chat is no longer the .active default.

sidepanel.js: new activePrimaryPaneId() helper that reads which primary tab is selected. Debug-close paths now route back to whichever primary pane is active (was hardcoded to tab-chat). Primary-tab click handler toggles .active classes and aria-selected. window.gstackServerPort and window.gstackAuthToken exposed so sidepanel-terminal.js can build the /pty-session POST and the WS URL.

sidepanel-terminal.js (new): xterm.js lifecycle. Lazy-spawn: first keystroke fires POST /pty-session, then opens ws://127.0.0.1:<terminalPort>/ws. Origin + cookie are set automatically by the browser. Resize observer sends {type:"resize"} text frames. ResizeObserver, tab-switch hooks, restart button, install-card retry. On WS close shows "Session ended, click to restart"; no auto-reconnect (codex outside-voice flagged that as session-burning).

sidepanel.css: primary-tabs bar + Terminal pane styling (full-height xterm container, install card, ended state).

* test: terminal-agent + cookie module + sidebar default-tab regression

Three new test files:

terminal-agent.test.ts (16 tests): pty-session-cookie mint/validate/revoke, Set-Cookie shape (HttpOnly + SameSite=Strict + Path=/, NO Secure since 127.0.0.1 over HTTP), source-level guards that /pty-session and /terminal/* are NOT in TUNNEL_PATHS, /health does NOT surface ptyToken or gstack_pty, terminal-agent binds 127.0.0.1, /ws upgrade enforces chrome-extension:// Origin AND gstack_pty cookie, lazy-spawn invariant (spawnClaude is called from message handler, not upgrade), uncaughtException/unhandledRejection handlers exist, SIGINT-then-SIGKILL cleanup.

terminal-agent-integration.test.ts (7 tests): spawns the agent as a real subprocess in a tmp state dir. Verifies /internal/grant accepts/rejects the loopback token, /ws gates (no Origin → 403, bad Origin → 403, no cookie → 401), real WebSocket round-trip with /bin/bash via the BROWSE_TERMINAL_BINARY override (write 'echo hello-pty-world\n', read it back), and resize message acceptance.

sidebar-tabs.test.ts (13 tests): structural regression suite locking the load-bearing invariants of the default-tab change: Terminal is .active, Chat is not, xterm assets are loaded, debug-close path no longer hardcodes tab-chat (uses activePrimaryPaneId), primary-tab click handler exists, chat surface is not accidentally deleted, terminal JS does NOT auto-reconnect on close, manifest declares ws:// + http:// localhost host permissions, no unsafe-eval.

Plan called for Playwright + extension regression; the codebase doesn't ship Playwright extension launcher infra, so we follow the existing extension-test pattern (source-level structural assertions). Same load-bearing intent: locks the invariants before they regress.

* docs: Terminal flow + threat model + v1.1 follow-ups

SIDEBAR_MESSAGE_FLOW.md: new "Terminal flow" section. Documents the WS upgrade path (/pty-session cookie mint → /ws Origin + cookie gate → lazy claude spawn), the dual-token model (AUTH_TOKEN for /pty-session, gstack_pty cookie for /ws, INTERNAL_TOKEN for server↔agent loopback), and the threat-model boundary: the Terminal tab bypasses the entire prompt-injection security stack on purpose; user keystrokes are the trust source. That trust assumption is load-bearing on three transport guarantees: local-only listener, Origin gate, cookie auth. Drop any one of those three and the tab becomes unsafe.

CLAUDE.md: extends the "Sidebar architecture" note to include terminal-agent.ts in the read-this-first list. Adds a "Terminal tab is its own process" note so a future contributor doesn't bolt PTY logic onto sidebar-agent.ts.

TODOS.md: three new follow-ups under a new "Sidebar Terminal" section:
- v1.1: PTY session survives sidebar reload (Issue 1C deferred).
- v1.1+: audit /health AUTH_TOKEN distribution (codex finding #2, a pre-existing soft leak that cc-pty-import sidesteps but doesn't fix).
- v1.1+: apply terminal-agent's process.on exception handlers to sidebar-agent.ts (codex finding #4: chat path has no fatal handlers).

* feat(extension): Terminal-only sidebar: auth fix, UX polish, chat rip

The chat queue path is gone. The Chrome side panel is now just an interactive claude PTY in xterm.js. Activity / Refs / Inspector still exist behind the `debug` toggle in the footer.

Three threads of change, all from dogfood iteration on top of cc-pty-import:

1. fix(server): cross-port WS auth via Sec-WebSocket-Protocol
- Browsers can't set Authorization on a WebSocket upgrade. We had been minting an HttpOnly gstack_pty cookie via /pty-session, but SameSite=Strict cookies don't survive the cross-port jump from server.ts:34567 to the agent's random port from a chrome-extension origin. The WS opened then immediately closed → "Session ended."
- /pty-session now also returns ptySessionToken in the JSON body.
- Extension calls `new WebSocket(url, [`gstack-pty.<token>`])`. Browser sends Sec-WebSocket-Protocol on the upgrade.
- Agent reads the protocol header, validates against validTokens, and MUST echo the protocol back (Chromium closes the connection immediately if a server doesn't pick one of the offered protocols).
- Cookie path is kept as a fallback for non-browser callers (curl, integration tests).
- New integration test exercises the full protocol-auth round-trip via raw fetch+Upgrade so a future regression of this exact class fails in CI.

2. fix(extension): UX polish on the Terminal pane
- Eager auto-connect when the sidebar opens; no "Press any key to start" friction every reload.
- Always-visible ↻ Restart button in the terminal toolbar (not gated on the ENDED state) so the user can force a fresh claude mid-session.
- MutationObserver on #tab-terminal's class attribute drives a fitAddon.fit() + term.refresh() when the pane becomes visible again; xterm doesn't auto-redraw after display:none → display:flex.

3. feat(extension): rip the chat tab + sidebar-agent.ts
- Sidebar is Terminal-only. No more Terminal | Chat primary nav.
- sidebar-agent.ts deleted. /sidebar-command, /sidebar-chat, /sidebar-agent/event, /sidebar-tabs* and friends all deleted.
- The pickSidebarModel router (sonnet vs opus) is gone; the live PTY uses whatever model the user's `claude` CLI is configured with.
- Quick-actions (🧹 Cleanup / 📸 Screenshot / 🍪 Cookies) survive in the Terminal toolbar. Cleanup now injects its prompt into the live PTY via window.gstackInjectToTerminal; no more /sidebar-command POST. The Inspector "Send to Code" action uses the same injection path.
- clear-chat button removed from the footer.
- sidepanel.js shed ~900 lines of chat polling, optimistic UI, stop-agent, etc.

Net diff: -3.4k lines across 16 files. CLAUDE.md, TODOS.md, and docs/designs/SIDEBAR_MESSAGE_FLOW.md rewritten to match. The sidebar regression test (browse/test/sidebar-tabs.test.ts) is rewritten as 27 structural assertions locking the new layout: Terminal sole pane, no chat input, quick-actions in toolbar, eager-connect, MutationObserver repaint, restart helper.

* feat: live tab awareness for the Terminal pane

claude in the PTY now has continuous tab-aware context. Three pieces:

1. Live state files. background.js listens to chrome.tabs.onActivated / onCreated / onRemoved / onUpdated (throttled to URL/title/status==complete so loading spinners don't spam) and pushes a snapshot. The sidepanel relays it as a custom event; sidepanel-terminal.js sends {type:"tabState"} text frames over the live PTY WebSocket. terminal-agent.ts writes:

     <stateDir>/tabs.json         all open tabs (id, url, title, active, pinned, audible, windowId)
     <stateDir>/active-tab.json   current active tab (skips chrome:// and chrome-extension:// internal pages)

   Atomic write via tmp + rename so claude never reads a half-written document. A fresh snapshot is pushed on WS open so the files exist by the time claude finishes booting.

2. New $B tab-each <command> [args...] meta-command. Fans out a single command across every open tab, returns {command, args, total, results: [{tabId, url, title, status, output}]}. Skips chrome:// pages; restores the originally active tab in a finally block (so a mid-batch error doesn't leave the user looking at a different tab); uses bringToFront: false so the OS window doesn't jump on every fanout. Scope-checks the inner command BEFORE the loop.

3. --append-system-prompt hint at spawn time. Claude is told about both the state files and the $B tab-each command up front, so it doesn't have to discover the surface by trial. Passed via the --append-system-prompt CLI flag, NOT as a leading PTY write, so the hint stays out of the visible transcript.

Tests:
- browse/test/tab-each.test.ts (new): registration + source-level invariants (scope check before loop, finally-restore, bringToFront:false, chrome:// skip) + behavior tests with a mock BrowserManager that verify iteration order, JSON shape, error handling, and active-tab restore.
- browse/test/terminal-agent.test.ts: three new assertions for tabState handler shape, atomic-write pattern, and the --append-system-prompt wiring at spawn.

Verified live: opened 5 tabs, ran $B tab-each url against the live server, got per-tab JSON results back, original active tab restored without OS focus stealing.
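The "atomic write via tmp + rename" step described for the tab state files can be sketched as follows. This is a minimal illustration, not the agent's actual code: the helper name `writeJsonAtomic`, the temp-file naming scheme, and the sample snapshot fields are all assumptions made for the example.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

/**
 * Write JSON so a concurrent reader sees either the previous file or the
 * complete new one, never a partial document.
 * Hypothetical helper; the real terminal-agent code is not shown in this diff.
 */
function writeJsonAtomic(filePath: string, value: unknown): void {
  // The temp file lives next to the target so both are on the same
  // filesystem; a rename across filesystems is not atomic and can fail.
  const tmpPath = `${filePath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(value, null, 2), { mode: 0o600 });
  // rename() swaps the directory entry in one step on POSIX filesystems.
  fs.renameSync(tmpPath, filePath);
}

// Usage: write an illustrative active-tab snapshot, then read it back.
const stateDir = fs.mkdtempSync(path.join(os.tmpdir(), 'tab-state-'));
const activeTabFile = path.join(stateDir, 'active-tab.json');
writeJsonAtomic(activeTabFile, { id: 7, url: 'https://example.com', title: 'Example' });
const roundTrip = JSON.parse(fs.readFileSync(activeTabFile, 'utf8'));
```

Writing straight to the final path would leave a window in which a reader could open a truncated file. With the rename approach that window disappears, at the cost of one extra directory entry that exists only briefly.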
* chore: drop sidebar-agent test refs after chat rip

Five test files / describe blocks targeted the deleted chat path:
- browse/test/security-e2e-fullstack.test.ts (full-stack chat-pipeline E2E with mock claude; whole file gone)
- browse/test/security-review-fullstack.test.ts (review-flow E2E with real classifier; whole file gone)
- browse/test/security-review-sidepanel-e2e.test.ts (Playwright E2E for the security event banner that was ripped from sidepanel.html)
- browse/test/security-audit-r2.test.ts (5 describe blocks: agent queue permissions, isValidQueueEntry stateFile traversal, loadSession session-ID validation, switchChatTab DocumentFragment, pollChat reentrancy guard, /sidebar-tabs URL sanitization, sidebar-agent SIGTERM→SIGKILL escalation, AGENT_SRC top-level read converted to graceful fallback)
- browse/test/security-adversarial-fixes.test.ts (canary stream-chunk split detection on detectCanaryLeak; one tool-output test on sidebar-agent)
- test/skill-validation.test.ts (sidebar agent #584 describe block)

These all assumed sidebar-agent.ts existed and tested chat-queue plumbing, chat-tab DOM round-trip, chat-polling reentrancy, or per-message classifier canary detection. With the live PTY there is no chat queue, no chat tab, no LLM stream to canary-scan, and no per-message subprocess. The Terminal pane's invariants are covered by the new browse/test/sidebar-tabs.test.ts (27 structural assertions), browse/test/terminal-agent.test.ts, and browse/test/terminal-agent-integration.test.ts.

bun test → exit 0, 0 failures.

* chore: bump version and changelog (v1.14.0.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(extension): xterm fills the full Terminal panel height

The Terminal pane only rendered into the top portion of the panel; most of the panel below the prompt was an empty black gap. Three layered issues, all about xterm.js measuring dimensions during a layout state that wasn't ready yet:

1. order-of-operations in connect(): ensureXterm() ran BEFORE setState(LIVE), so term.open() measured els.mount while it was still display:none. xterm caches a 0-size viewport synchronously inside open() and never auto-recovers when the container goes visible. Flipped: setState(LIVE) → ensureXterm.

2. first fit() ran synchronously before the browser had applied the .active class transition. Wrapped in requestAnimationFrame so layout has settled before fit() reads clientHeight.

3. CSS flex-overflow trap: .terminal-mount has flex:1 inside the flex-column #tab-terminal, but .tab-content's `overflow-y: auto` and the lack of `min-height: 0` on .terminal-mount meant the item couldn't shrink below content size. flex:1 then refused to expand into available space and xterm rendered into whatever its initial 2x2 measurement happened to be.

Fixes:
- extension/sidepanel-terminal.js: reorder + RAF fit
- extension/sidepanel.css: .terminal-mount gets `flex: 1 1 0` + `min-height: 0` + `position: relative`. #tab-terminal overrides .tab-content's `overflow-y: auto` to `overflow: hidden` (xterm has its own viewport scroll; the parent shouldn't compete) and explicitly re-declares `display: flex; flex-direction: column` for #tab-terminal.active.

bun test browse/test/sidebar-tabs.test.ts → 27/27 pass. Manually verified: side panel opens → Terminal fills full panel height, xterm scrollback works, debug-tab toggle still repaints correctly.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 21:46:40 +02:00 · 2026-04-25 22:52:15 -07:00
parent 23c4d7b228
commit ed1e4be2f6
35 changed files with 2999 additions and 5113 deletions
@@ -853,7 +853,7 @@ Refs:           After 'snapshot', use @e1, @e2... as selectors:
    // Delete stale state file
    safeUnlinkQuiet(config.stateFile);

-    console.log('Launching headed Chromium with extension + sidebar agent...');
+    console.log('Launching headed Chromium with extension + terminal agent...');
    try {
      // Start server in headed mode with extension auto-loaded
      // Use a well-known port so the Chrome extension auto-connects
@@ -882,56 +882,41 @@ Refs:           After 'snapshot', use @e1, @e2... as selectors:
      const status = await resp.text();
      console.log(`Connected to real Chrome\n${status}`);

-      // Auto-start sidebar agent
-      // __dirname is inside $bunfs in compiled binaries — resolve from execPath instead
-      let agentScript = path.resolve(__dirname, 'sidebar-agent.ts');
-      if (!fs.existsSync(agentScript)) {
-        agentScript = path.resolve(path.dirname(process.execPath), '..', 'src', 'sidebar-agent.ts');
+      // sidebar-agent.ts spawn was here. Ripped alongside the chat queue —
+      // the Terminal pane runs an interactive PTY now, no more one-shot
+      // claude -p subprocesses to multiplex.
+
+      // Auto-start terminal agent (non-compiled bun process). Owns the PTY
+      // WebSocket for the sidebar Terminal pane.
+      let termAgentScript = path.resolve(__dirname, 'terminal-agent.ts');
+      if (!fs.existsSync(termAgentScript)) {
+        termAgentScript = path.resolve(path.dirname(process.execPath), '..', 'src', 'terminal-agent.ts');
      }
      try {
-        if (!fs.existsSync(agentScript)) {
-          throw new Error(`sidebar-agent.ts not found at ${agentScript}`);
+        if (fs.existsSync(termAgentScript)) {
+          // Kill old terminal-agents so a stale port file can't trick the
+          // server into routing /pty-session at a dead listener.
+          try {
+            const { spawnSync } = require('child_process');
+            spawnSync('pkill', ['-f', 'terminal-agent\\.ts'], { stdio: 'ignore', timeout: 3000 });
+          } catch (err: any) {
+            if (err?.code !== 'ENOENT') throw err;
+          }
+          const termProc = Bun.spawn(['bun', 'run', termAgentScript], {
+            cwd: config.projectDir,
+            env: {
+              ...process.env,
+              BROWSE_STATE_FILE: config.stateFile,
+              BROWSE_SERVER_PORT: String(newState.port),
+            },
+            stdio: ['ignore', 'ignore', 'ignore'],
+          });
+          termProc.unref();
+          console.log(`[browse] Terminal agent started (PID: ${termProc.pid})`);
        }
-        // Clear old agent queue
-        const agentQueue = path.join(process.env.HOME || '/tmp', '.gstack', 'sidebar-agent-queue.jsonl');
-        try {
-          fs.mkdirSync(path.dirname(agentQueue), { recursive: true, mode: 0o700 });
-          fs.writeFileSync(agentQueue, '', { mode: 0o600 });
-        } catch (err: any) {
-          if (err?.code !== 'EACCES') throw err;
-        }
-
-        // Resolve browse binary path the same way — execPath-relative
-        let browseBin = path.resolve(__dirname, '..', 'dist', 'browse');
-        if (!fs.existsSync(browseBin)) {
-          browseBin = process.execPath; // the compiled binary itself
-        }
-
-        // Kill any existing sidebar-agent processes before starting a new one.
-        // Old agents have stale auth tokens and will silently fail to relay events,
-        // causing the server to mark the agent as "hung".
-        try {
-          const { spawnSync } = require('child_process');
-          spawnSync('pkill', ['-f', 'sidebar-agent\\.ts'], { stdio: 'ignore', timeout: 3000 });
-        } catch (err: any) {
-          if (err?.code !== 'ENOENT') throw err;
-        }
-
-        const agentProc = Bun.spawn(['bun', 'run', agentScript], {
-          cwd: config.projectDir,
-          env: {
-            ...process.env,
-            BROWSE_BIN: browseBin,
-            BROWSE_STATE_FILE: config.stateFile,
-            BROWSE_SERVER_PORT: String(newState.port),
-          },
-          stdio: ['ignore', 'ignore', 'ignore'],
-        });
-        agentProc.unref();
-        console.log(`[browse] Sidebar agent started (PID: ${agentProc.pid})`);
      } catch (err: any) {
-        console.error(`[browse] Sidebar agent failed to start: ${err.message}`);
-        console.error(`[browse] Run manually: bun run ${agentScript}`);
+        // Non-fatal: the rest of the connect flow continues without the terminal agent.
+        console.error(`[browse] Terminal agent failed to start: ${err.message}`);
      }
    } catch (err: any) {
      console.error(`[browse] Connect failed: ${err.message}`);
@@ -30,7 +30,7 @@ export const WRITE_COMMANDS = new Set([
 ]);

 export const META_COMMANDS = new Set([
-  'tabs', 'tab', 'newtab', 'closetab',
+  'tabs', 'tab', 'tab-each', 'newtab', 'closetab',
  'status', 'stop', 'restart',
  'screenshot', 'pdf', 'responsive',
  'chain', 'diff',
@@ -144,6 +144,7 @@ export const COMMAND_DESCRIPTIONS: Record<string, { category: string; descriptio
  'tab':     { category: 'Tabs', description: 'Switch to tab', usage: 'tab <id>' },
  'newtab':  { category: 'Tabs', description: 'Open new tab. With --json, returns {"tabId":N,"url":...} for programmatic use (make-pdf).', usage: 'newtab [url] [--json]' },
  'closetab':{ category: 'Tabs', description: 'Close tab', usage: 'closetab [id]' },
+  'tab-each':{ category: 'Tabs', description: 'Run a command on every open tab. Returns JSON with per-tab results.', usage: 'tab-each <command> [args...]' },
  // Server
  'status':  { category: 'Server', description: 'Health check' },
  'stop':    { category: 'Server', description: 'Shutdown server' },
@@ -285,6 +285,108 @@ export async function handleMetaCommand(
      return `Closed tab${id ? ` ${id}` : ''}`;
    }

+    case 'tab-each': {
+      // Fan out a single command across every open tab. Returns a JSON
+      // object: { command, args, total, results: [{tabId, url, title, status, output}] }.
+      // Restores the originally active tab when done so the user's view
+      // doesn't shift under them.
+      //
+      // Usage: $B tab-each <command> [args...]
+      //   $B tab-each snapshot -i      → snapshot every tab
+      //   $B tab-each text             → grab clean text from every tab
+      //   $B tab-each goto https://x.y → load the same URL in every tab
+      if (args.length === 0) {
+        throw new Error(
+          'Usage: browse tab-each <command> [args...]\n' +
+          'Example: browse tab-each snapshot -i'
+        );
+      }
+
+      const innerRaw = args[0];
+      const innerName = canonicalizeCommand(innerRaw);
+      const innerArgs = args.slice(1);
+
+      // Scope check the inner command before fanning out, so a single
+      // permission failure aborts the whole batch instead of partially
+      // mutating tabs.
+      if (tokenInfo && tokenInfo.clientId !== 'root' && !checkScope(tokenInfo, innerName)) {
+        throw new Error(
+          `tab-each rejected: subcommand "${innerRaw}" not allowed by your token scope (${tokenInfo.scopes.join(', ')}).`
+        );
+      }
+
+      const tabs = await bm.getTabListWithTitles();
+      const originalActive = tabs.find(t => t.active)?.id ?? bm.getActiveTabId();
+
+      const executeCmd = opts?.executeCommand;
+      const results: Array<{
+        tabId: number;
+        url: string;
+        title: string;
+        status: number;
+        output: string;
+      }> = [];
+
+      try {
+        for (const tab of tabs) {
+          // Skip internal pages (chrome:// and chrome-extension://): they
+          // aren't useful targets and many commands fail outright on them.
+          if (tab.url.startsWith('chrome://') || tab.url.startsWith('chrome-extension://')) {
+            results.push({
+              tabId: tab.id,
+              url: tab.url,
+              title: tab.title || '',
+              status: 0,
+              output: 'skipped: internal page',
+            });
+            continue;
+          }
+          // Switch to the tab. Don't pull focus away — we're a background
+          // operation; the user shouldn't see the OS window jump.
+          bm.switchTab(tab.id, { bringToFront: false });
+
+          let status = 0;
+          let output = '';
+          if (executeCmd) {
+            const r = await executeCmd(
+              { command: innerName, args: innerArgs, tabId: tab.id },
+              tokenInfo,
+            );
+            status = r.status;
+            output = r.result;
+            if (status !== 200) {
+              try { output = JSON.parse(output).error || output; } catch (err: any) { if (!(err instanceof SyntaxError)) throw err; }
+            }
+          } else {
+            // Fallback path (CLI / test harness without a server context).
+            // We don't recurse through read/write/meta directly here because
+            // tab-each is only meaningful with the live server; surface a
+            // clear error.
+            status = 500;
+            output = 'tab-each requires the browse server (no executeCommand context)';
+          }
+
+          results.push({
+            tabId: tab.id,
+            url: tab.url,
+            title: tab.title || '',
+            status,
+            output,
+          });
+        }
+      } finally {
+        // Restore the original active tab so the user's view is unchanged.
+        try { bm.switchTab(originalActive, { bringToFront: false }); } catch {}
+      }
+
+      return JSON.stringify({
+        command: innerName,
+        args: innerArgs,
+        total: results.length,
+        results,
+      }, null, 2);
+    }
+
    // ─── Server Control ────────────────────────────────
    case 'status': {
      const page = bm.getPage();
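The fan-out shape of the `tab-each` handler above (skip internal pages, run the command per tab, restore the active tab in `finally`) can be exercised in isolation with a mock. The sketch below is a simplified, synchronous illustration: `MockTabManager`, `TabInfo`, and `fanOut` are hypothetical names, the real handler awaits `executeCommand` per tab, and scope checking is left out.

```typescript
interface TabInfo { id: number; url: string; title: string; active: boolean }
interface TabResult { tabId: number; url: string; title: string; status: number; output: string }

// Hypothetical stand-in for the real BrowserManager: it only records switches.
class MockTabManager {
  tabs: TabInfo[];
  switched: number[] = [];
  constructor(tabs: TabInfo[]) { this.tabs = tabs; }
  switchTab(id: number): void { this.switched.push(id); }
}

function isInternal(url: string): boolean {
  return url.startsWith('chrome://') || url.startsWith('chrome-extension://');
}

// Synchronous for brevity; the handler in the diff is async.
function fanOut(bm: MockTabManager, run: (tab: TabInfo) => string): TabResult[] {
  const originalActive = bm.tabs.find((t) => t.active)?.id;
  const results: TabResult[] = [];
  try {
    for (const tab of bm.tabs) {
      const base = { tabId: tab.id, url: tab.url, title: tab.title };
      if (isInternal(tab.url)) {
        results.push({ ...base, status: 0, output: 'skipped: internal page' });
        continue;
      }
      bm.switchTab(tab.id);
      results.push({ ...base, status: 200, output: run(tab) });
    }
  } finally {
    // Runs even when run() throws mid-batch, so the view is always restored.
    if (originalActive !== undefined) bm.switchTab(originalActive);
  }
  return results;
}

const bm = new MockTabManager([
  { id: 1, url: 'https://a.example', title: 'A', active: false },
  { id: 2, url: 'chrome://settings', title: 'Settings', active: false },
  { id: 3, url: 'https://b.example', title: 'B', active: true },
]);
const results = fanOut(bm, (tab) => tab.url);

// A failing command still restores tab 3 before the error propagates.
const bm2 = new MockTabManager(bm.tabs);
let threw = false;
try {
  fanOut(bm2, () => { throw new Error('boom'); });
} catch {
  threw = true;
}
```

In the successful run the mock records a switch to each non-internal tab followed by one final switch back to the originally active tab. In the failing run the restore happens before the error reaches the caller, which is the behavior the `finally` block exists to guarantee.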
@@ -0,0 +1,122 @@
+/**
+ * Session cookie registry for the Terminal sidebar tab's PTY WebSocket.
+ *
+ * Why this exists: WebSocket clients in browsers cannot send Authorization
+ * headers on the upgrade request. The terminal-agent's /ws upgrade therefore
+ * authenticates via cookie. We never put the PTY token in /health (codex
+ * outside-voice finding #2: /health already leaks AUTH_TOKEN to any
+ * localhost caller in headed mode; reusing that path for shell access would
+ * widen an existing bug). Instead, the extension does an authenticated
+ * POST /pty-session with the bootstrap AUTH_TOKEN; the server mints a
+ * short-lived cookie scoped to this terminal session and pushes it to the
+ * agent via loopback. The browser then carries the cookie automatically on
+ * the WS upgrade.
+ *
+ * Design mirrors `sse-session-cookie.ts` deliberately. Same TTL, same
+ * scoped-token-must-not-be-valid-as-root invariant, same opportunistic
+ * pruning. Two registries instead of one because the cookie names are
+ * different (`gstack_sse` vs `gstack_pty`) and the token spaces must not
+ * overlap — an SSE-read cookie must never grant PTY access, and vice versa.
+ */
+import * as crypto from 'crypto';
+
+interface Session {
+  createdAt: number;
+  expiresAt: number;
+}
+
+const TTL_MS = 30 * 60 * 1000; // 30 minutes — matches SSE cookie
+const MAX_SESSIONS = 10_000;
+const sessions = new Map<string, Session>();
+
+export const PTY_COOKIE_NAME = 'gstack_pty';
+
+/** Mint a fresh PTY session token. */
+export function mintPtySessionToken(): { token: string; expiresAt: number } {
+  const token = crypto.randomBytes(32).toString('base64url');
+  const now = Date.now();
+  const expiresAt = now + TTL_MS;
+  sessions.set(token, { createdAt: now, expiresAt });
+  pruneExpired(now);
+  return { token, expiresAt };
+}
+
+/**
+ * Validate a token. Returns true only if the token exists AND is not expired.
+ * Lazily removes an expired entry when it is looked up, and opportunistically
+ * prunes a few more on every failed lookup so the registry stays bounded under
+ * reconnect pressure (mintPtySessionToken() prunes on each mint as well).
+ */
+export function validatePtySessionToken(token: string | null | undefined): boolean {
+  if (!token) return false;
+  const s = sessions.get(token);
+  if (!s) {
+    pruneExpired(Date.now());
+    return false;
+  }
+  if (Date.now() > s.expiresAt) {
+    sessions.delete(token);
+    pruneExpired(Date.now());
+    return false;
+  }
+  return true;
+}
+
+/**
+ * Drop a session token (called on WS close so a leaked cookie can't be
+ * replayed against a new PTY).
+ */
+export function revokePtySessionToken(token: string | null | undefined): void {
+  if (!token) return;
+  sessions.delete(token);
+}
+
+/** Parse the PTY session token from a Cookie header. */
+export function extractPtyCookie(req: Request): string | null {
+  const cookieHeader = req.headers.get('cookie');
+  if (!cookieHeader) return null;
+  for (const part of cookieHeader.split(';')) {
+    const [name, ...valueParts] = part.trim().split('=');
+    if (name === PTY_COOKIE_NAME) {
+      return valueParts.join('=') || null;
+    }
+  }
+  return null;
+}
+
+/**
+ * Build the Set-Cookie header value for the PTY session cookie.
+ * - HttpOnly: not readable from JS (mitigates XSS exfiltration).
+ * - SameSite=Strict: not sent on cross-site requests (mitigates CSWSH).
+ * - Path=/: scope to whole origin so /ws and /pty-session both see it.
+ * - Max-Age matches the TTL.
+ *
+ * Secure is intentionally omitted: the daemon binds to 127.0.0.1 over plain
+ * HTTP; setting Secure would prevent the browser from ever sending it back.
+ */
+export function buildPtySetCookie(token: string): string {
+  const maxAge = Math.floor(TTL_MS / 1000);
+  return `${PTY_COOKIE_NAME}=${token}; HttpOnly; SameSite=Strict; Path=/; Max-Age=${maxAge}`;
+}
+
+/** Clear the PTY session cookie. */
+export function buildPtyClearCookie(): string {
+  return `${PTY_COOKIE_NAME}=; HttpOnly; SameSite=Strict; Path=/; Max-Age=0`;
+}
+
+function pruneExpired(now: number): void {
+  let checked = 0;
+  for (const [token, session] of sessions) {
+    if (checked++ >= 20) break;
+    if (session.expiresAt <= now) sessions.delete(token);
+  }
+  while (sessions.size > MAX_SESSIONS) {
+    const first = sessions.keys().next().value;
+    if (!first) break;
+    sessions.delete(first);
+  }
+}
+
+// Test-only reset.
+export function __resetPtySessions(): void {
+  sessions.clear();
+}
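The commit message also describes a second auth path in which the extension offers the token as a `gstack-pty.<token>` WebSocket subprotocol and the agent echoes the chosen protocol back. The agent-side parsing is not part of this excerpt, so the following is only a sketch of how such a header could be read; `extractPtyProtocolToken` and `PTY_SUBPROTOCOL_PREFIX` are hypothetical names.

```typescript
// Prefix taken from the commit message's `gstack-pty.<token>` convention.
const PTY_SUBPROTOCOL_PREFIX = 'gstack-pty.';

/**
 * Pull the PTY token out of a Sec-WebSocket-Protocol header value.
 * Also returns the full offered protocol string, because the server has to
 * echo exactly that value back in the handshake response.
 * Hypothetical helper; not the agent's actual implementation.
 */
function extractPtyProtocolToken(
  header: string | null,
): { protocol: string; token: string } | null {
  if (!header) return null;
  // The header carries a comma-separated list of offered subprotocols.
  for (const raw of header.split(',')) {
    const protocol = raw.trim();
    if (protocol.startsWith(PTY_SUBPROTOCOL_PREFIX)) {
      const token = protocol.slice(PTY_SUBPROTOCOL_PREFIX.length);
      if (token.length > 0) return { protocol, token };
    }
  }
  return null;
}

const offered = extractPtyProtocolToken('chat, gstack-pty.abc123');
const missing = extractPtyProtocolToken('chat');
const empty = extractPtyProtocolToken('gstack-pty.');
```

The extracted token would then go through the same kind of registry check as `validatePtySessionToken` above. Tokens minted with `base64url` encoding contain only letters, digits, `-`, and `_`, all of which are legal in a subprotocol name, so no extra escaping is needed.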
@@ -1,947 +0,0 @@
-/**
- * Sidebar Agent — polls agent-queue from server, spawns claude -p for each
- * message, streams live events back to the server via /sidebar-agent/event.
- *
- * This runs as a NON-COMPILED bun process because compiled bun binaries
- * cannot posix_spawn external executables. The server writes to the queue
- * file, this process reads it and spawns claude.
- *
- * Usage: BROWSE_BIN=/path/to/browse bun run browse/src/sidebar-agent.ts
- */
-
-import { spawn } from 'child_process';
-import * as fs from 'fs';
-import * as path from 'path';
-import { safeUnlink } from './error-handling';
-import {
-  checkCanaryInStructure, logAttempt, hashPayload, extractDomain,
-  combineVerdict, writeSessionState, readSessionState, THRESHOLDS,
-  readDecision, clearDecision, excerptForReview,
-  type LayerSignal,
-} from './security';
-import {
-  loadTestsavant, scanPageContent, checkTranscript,
-  shouldRunTranscriptCheck, getClassifierStatus,
-  loadDeberta, scanPageContentDeberta,
-  type ToolCallInput,
-} from './security-classifier';
-
-const QUEUE = process.env.SIDEBAR_QUEUE_PATH || path.join(process.env.HOME || '/tmp', '.gstack', 'sidebar-agent-queue.jsonl');
-const KILL_FILE = path.join(path.dirname(QUEUE), 'sidebar-agent-kill');
-const SERVER_PORT = parseInt(process.env.BROWSE_SERVER_PORT || '34567', 10);
-const SERVER_URL = `http://127.0.0.1:${SERVER_PORT}`;
-const POLL_MS = 200;  // 200ms poll — keeps time-to-first-token low
-const B = process.env.BROWSE_BIN || path.resolve(__dirname, '../../.claude/skills/gstack/browse/dist/browse');
-
-const CANCEL_DIR = path.join(process.env.HOME || '/tmp', '.gstack');
-function cancelFileForTab(tabId: number): string {
-  return path.join(CANCEL_DIR, `sidebar-agent-cancel-${tabId}`);
-}
-
-interface QueueEntry {
-  prompt: string;
-  args?: string[];
-  stateFile?: string;
-  cwd?: string;
-  tabId?: number | null;
-  message?: string | null;
-  pageUrl?: string | null;
-  sessionId?: string | null;
-  ts?: string;
-  canary?: string; // session-scoped token; leak = prompt injection evidence
-}
-
-function isValidQueueEntry(e: unknown): e is QueueEntry {
-  if (typeof e !== 'object' || e === null) return false;
-  const obj = e as Record<string, unknown>;
-  if (typeof obj.prompt !== 'string' || obj.prompt.length === 0) return false;
-  if (obj.args !== undefined && (!Array.isArray(obj.args) || !obj.args.every(a => typeof a === 'string'))) return false;
-  if (obj.stateFile !== undefined) {
-    if (typeof obj.stateFile !== 'string') return false;
-    if (obj.stateFile.includes('..')) return false;
-  }
-  if (obj.cwd !== undefined) {
-    if (typeof obj.cwd !== 'string') return false;
-    if (obj.cwd.includes('..')) return false;
-  }
-  if (obj.tabId !== undefined && obj.tabId !== null && typeof obj.tabId !== 'number') return false;
-  if (obj.message !== undefined && obj.message !== null && typeof obj.message !== 'string') return false;
-  if (obj.pageUrl !== undefined && obj.pageUrl !== null && typeof obj.pageUrl !== 'string') return false;
-  if (obj.sessionId !== undefined && obj.sessionId !== null && typeof obj.sessionId !== 'string') return false;
-  if (obj.canary !== undefined && typeof obj.canary !== 'string') return false;
-  return true;
-}
-
-let lastLine = 0;
-let authToken: string | null = null;
-// Per-tab processing — each tab can run its own agent concurrently
-const processingTabs = new Set<number>();
-// Active claude subprocesses — keyed by tabId for targeted kill
-const activeProcs = new Map<number, ReturnType<typeof spawn>>();
-let activeProc: ReturnType<typeof spawn> | null = null;
-// Kill-file timestamp last seen — avoids double-kill on same write
-let lastKillTs = 0;
-
-// ─── File drop relay ──────────────────────────────────────────
-
-function getGitRoot(): string | null {
-  try {
-    const { execSync } = require('child_process');
-    return execSync('git rev-parse --show-toplevel', { encoding: 'utf-8', stdio: ['pipe', 'pipe', 'pipe'] }).trim();
-  } catch (err: any) {
-    console.debug('[sidebar-agent] Not in a git repo:', err.message);
-    return null;
-  }
-}
-
-function writeToInbox(message: string, pageUrl?: string, sessionId?: string): void {
-  const gitRoot = getGitRoot();
-  if (!gitRoot) {
-    console.error('[sidebar-agent] Cannot write to inbox — not in a git repo');
-    return;
-  }
-
-  const inboxDir = path.join(gitRoot, '.context', 'sidebar-inbox');
-  fs.mkdirSync(inboxDir, { recursive: true, mode: 0o700 });
-
-  const now = new Date();
-  const timestamp = now.toISOString().replace(/:/g, '-');
-  const filename = `${timestamp}-observation.json`;
-  const tmpFile = path.join(inboxDir, `.${filename}.tmp`);
-  const finalFile = path.join(inboxDir, filename);
-
-  const inboxMessage = {
-    type: 'observation',
-    timestamp: now.toISOString(),
-    page: { url: pageUrl || 'unknown', title: '' },
-    userMessage: message,
-    sidebarSessionId: sessionId || 'unknown',
-  };
-
-  fs.writeFileSync(tmpFile, JSON.stringify(inboxMessage, null, 2), { mode: 0o600 });
-  fs.renameSync(tmpFile, finalFile);
-  console.log(`[sidebar-agent] Wrote inbox message: ${filename}`);
-}
-
-// ─── Auth ────────────────────────────────────────────────────────
-
-async function refreshToken(): Promise<string | null> {
-  // Read token from state file (same-user, mode 0o600) instead of /health
-  try {
-    const stateFile = process.env.BROWSE_STATE_FILE ||
-      path.join(process.env.HOME || '/tmp', '.gstack', 'browse.json');
-    const data = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
-    authToken = data.token || null;
-    return authToken;
-  } catch (err: any) {
-    console.error('[sidebar-agent] Failed to refresh auth token:', err.message);
-    return null;
-  }
-}
-
-// ─── Event relay to server ──────────────────────────────────────
-
-async function sendEvent(event: Record<string, any>, tabId?: number): Promise<void> {
-  if (!authToken) await refreshToken();
-  if (!authToken) return;
-
-  try {
-    await fetch(`${SERVER_URL}/sidebar-agent/event`, {
-      method: 'POST',
-      headers: {
-        'Content-Type': 'application/json',
-        'Authorization': `Bearer ${authToken}`,
-      },
-      body: JSON.stringify({ ...event, tabId: tabId ?? null }),
-    });
-  } catch (err) {
-    console.error('[sidebar-agent] Failed to send event:', err);
-  }
-}
-
-// ─── Claude subprocess ──────────────────────────────────────────
-
-function shorten(str: string): string {
-  return str
-    .replace(new RegExp(B.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'g'), '$B')
-    .replace(/\/Users\/[^/]+/g, '~')
-    .replace(/\/conductor\/workspaces\/[^/]+\/[^/]+/g, '')
-    .replace(/\.claude\/skills\/gstack\//g, '')
-    .replace(/browse\/dist\/browse/g, '$B');
-}
-
-function describeToolCall(tool: string, input: any): string {
-  if (!input) return '';
-
-  // For Bash commands, generate a plain-English description
-  if (tool === 'Bash' && input.command) {
-    const cmd = input.command;
-
-    // Browse binary commands — the most common case
-    const browseMatch = cmd.match(/\$B\s+(\w+)|browse[^\s]*\s+(\w+)/);
-    if (browseMatch) {
-      const browseCmd = browseMatch[1] || browseMatch[2];
-      const args = cmd.split(/\s+/).slice(2).join(' ');
-      switch (browseCmd) {
-        case 'goto': return `Opening ${args.replace(/['"]/g, '')}`;
-        case 'snapshot': return args.includes('-i') ? 'Scanning for interactive elements' : args.includes('-D') ? 'Checking what changed' : 'Taking a snapshot of the page';
-        case 'screenshot': return `Saving screenshot${args ? ` to ${shorten(args)}` : ''}`;
-        case 'click': return `Clicking ${args}`;
-        case 'fill': { const parts = args.split(/\s+/); return `Typing "${parts.slice(1).join(' ')}" into ${parts[0]}`; }
-        case 'text': return 'Reading page text';
-        case 'html': return args ? `Reading HTML of ${args}` : 'Reading full page HTML';
-        case 'links': return 'Finding all links on the page';
-        case 'forms': return 'Looking for forms';
-        case 'console': return 'Checking browser console for errors';
-        case 'network': return 'Checking network requests';
-        case 'url': return 'Checking current URL';
-        case 'back': return 'Going back';
-        case 'forward': return 'Going forward';
-        case 'reload': return 'Reloading the page';
-        case 'scroll': return args ? `Scrolling to ${args}` : 'Scrolling down';
-        case 'wait': return `Waiting for ${args}`;
-        case 'inspect': return args ? `Inspecting CSS of ${args}` : 'Getting CSS for last picked element';
-        case 'style': return `Changing CSS: ${args}`;
-        case 'cleanup': return 'Removing page clutter (ads, popups, banners)';
-        case 'prettyscreenshot': return 'Taking a clean screenshot';
-        case 'css': return `Checking CSS property: ${args}`;
-        case 'is': return `Checking if element is ${args}`;
-        case 'diff': return `Comparing ${args}`;
-        case 'responsive': return 'Taking screenshots at mobile, tablet, and desktop sizes';
-        case 'status': return 'Checking browser status';
-        case 'tabs': return 'Listing open tabs';
-        case 'focus': return 'Bringing browser to front';
-        case 'select': return `Selecting option in ${args}`;
-        case 'hover': return `Hovering over ${args}`;
-        case 'viewport': return `Setting viewport to ${args}`;
-        case 'upload': return `Uploading file to ${args.split(/\s+/)[0]}`;
-        default: return `Running browse ${browseCmd} ${args}`.trim();
-      }
-    }
-
-    // Non-browse bash commands
-    if (cmd.includes('git ')) return `Running: ${shorten(cmd)}`;
-    let short = shorten(cmd);
-    return short.length > 100 ? short.slice(0, 100) + '…' : short;
-  }
-
-  if (tool === 'Read' && input.file_path) {
-    // Skip Claude's internal tool-result file reads — they're plumbing, not user-facing
-    if (input.file_path.includes('/tool-results/') || input.file_path.includes('/.claude/projects/')) return '';
-    return `Reading ${shorten(input.file_path)}`;
-  }
-  if (tool === 'Edit' && input.file_path) return `Editing ${shorten(input.file_path)}`;
-  if (tool === 'Write' && input.file_path) return `Writing ${shorten(input.file_path)}`;
-  if (tool === 'Grep' && input.pattern) return `Searching for "${input.pattern}"`;
-  if (tool === 'Glob' && input.pattern) return `Finding files matching ${input.pattern}`;
-  try { return shorten(JSON.stringify(input)).slice(0, 80); } catch { return ''; }
-}
-
-// Keep the old name as an alias for backward compat
-function summarizeToolInput(tool: string, input: any): string {
-  return describeToolCall(tool, input);
-}
-
-/**
- * Scan a Claude stream event for the session canary. Returns the channel where
- * it leaked, or null if clean. Covers every outbound channel: text blocks,
- * text deltas, tool_use arguments (including nested URL/path/command strings),
- * and result payloads.
- */
-function detectCanaryLeak(event: any, canary: string, buf?: DeltaBuffer): string | null {
-  if (!canary) return null;
-
-  if (event.type === 'assistant' && event.message?.content) {
-    for (const block of event.message.content) {
-      if (block.type === 'text' && typeof block.text === 'string' && block.text.includes(canary)) {
-        return 'assistant_text';
-      }
-      if (block.type === 'tool_use' && checkCanaryInStructure(block.input, canary)) {
-        return `tool_use:${block.name}`;
-      }
-    }
-  }
-  if (event.type === 'content_block_start' && event.content_block?.type === 'tool_use') {
-    if (checkCanaryInStructure(event.content_block.input, canary)) {
-      return `tool_use:${event.content_block.name}`;
-    }
-  }
-  if (event.type === 'content_block_delta' && event.delta?.type === 'text_delta') {
-    if (typeof event.delta.text === 'string') {
-      // Rolling buffer: an attacker can ask Claude to emit the canary split
-      // across two deltas (e.g., "CANARY-" then "ABCDEF"). A per-delta
-      // substring check misses this. Concatenate the previous tail with
-      // this chunk and search, then trim the tail to last canary.length-1
-      // chars for the next event.
-      const combined = buf ? buf.text_delta + event.delta.text : event.delta.text;
-      if (combined.includes(canary)) return 'text_delta';
-      if (buf) buf.text_delta = combined.slice(-(canary.length - 1));
-    }
-  }
-  if (event.type === 'content_block_delta' && event.delta?.type === 'input_json_delta') {
-    if (typeof event.delta.partial_json === 'string') {
-      const combined = buf ? buf.input_json_delta + event.delta.partial_json : event.delta.partial_json;
-      if (combined.includes(canary)) return 'tool_input_delta';
-      if (buf) buf.input_json_delta = combined.slice(-(canary.length - 1));
-    }
-  }
-  if (event.type === 'content_block_stop' && buf) {
-    // Block boundary — reset the rolling buffer so a canary straddling
-    // two independent tool_use blocks isn't inferred.
-    buf.text_delta = '';
-    buf.input_json_delta = '';
-  }
-  if (event.type === 'result' && typeof event.result === 'string' && event.result.includes(canary)) {
-    return 'result';
-  }
-  return null;
-}
-
-/** Rolling-window tails for delta canary detection. See detectCanaryLeak. */
-interface DeltaBuffer {
-  text_delta: string;
-  input_json_delta: string;
-}
-
-interface CanaryContext {
-  canary: string;
-  pageUrl: string;
-  onLeak: (channel: string) => void;
-  deltaBuf: DeltaBuffer;
-}
-
-interface ToolResultScanContext {
-  scan: (toolName: string, text: string) => Promise<void>;
-}
-
-/**
- * Module-level map of tool_use_id → tool name and input (shared by all tabs).
- * Lets the tool_result handler know which tool produced the content (Read,
- * Grep, Glob, Bash $B ...) so we can tag attack logs with the ingress source.
- */
-const toolUseRegistry = new Map<string, { toolName: string; toolInput: unknown }>();
-
-/**
- * Extract plain-text content from a tool_result block. The Claude stream
- * encodes it as either a string or an array of content blocks (text, image).
- * We care about text — images can't carry prompt injection at this layer.
- */
-function extractToolResultText(content: unknown): string {
-  if (typeof content === 'string') return content;
-  if (!Array.isArray(content)) return '';
-  const parts: string[] = [];
-  for (const block of content) {
-    if (block && typeof block === 'object') {
-      const b = block as Record<string, unknown>;
-      if (b.type === 'text' && typeof b.text === 'string') parts.push(b.text);
-    }
-  }
-  return parts.join('\n');
-}
-
-/**
- * Tools whose outputs should be ML-scanned. Bash/$B outputs already get
- * scanned via the page-content flow. Read/Glob/Grep outputs were previously
- * unscanned; a Codex review flagged this gap. Adding coverage here closes it.
- */
-const SCANNED_TOOLS = new Set(['Read', 'Grep', 'Glob', 'Bash', 'WebFetch']);
-
-async function handleStreamEvent(event: any, tabId?: number, canaryCtx?: CanaryContext, toolResultScanCtx?: ToolResultScanContext): Promise<void> {
-  // Canary check runs BEFORE any outbound send — we never want to relay
-  // a leaked token to the sidepanel UI.
-  if (canaryCtx) {
-    const channel = detectCanaryLeak(event, canaryCtx.canary, canaryCtx.deltaBuf);
-    if (channel) {
-      canaryCtx.onLeak(channel);
-      return; // drop the event — never relay content that leaked the canary
-    }
-  }
-
-  if (event.type === 'system' && event.session_id) {
-    // Relay claude session ID for --resume support
-    await sendEvent({ type: 'system', claudeSessionId: event.session_id }, tabId);
-  }
-
-  if (event.type === 'assistant' && event.message?.content) {
-    for (const block of event.message.content) {
-      if (block.type === 'tool_use') {
-        // Register the tool_use so we can correlate tool_results back to
-        // the originating tool when they arrive in the next user-role message.
-        if (block.id) toolUseRegistry.set(block.id, { toolName: block.name, toolInput: block.input });
-        await sendEvent({ type: 'tool_use', tool: block.name, input: summarizeToolInput(block.name, block.input) }, tabId);
-      } else if (block.type === 'text' && block.text) {
-        await sendEvent({ type: 'text', text: block.text }, tabId);
-      }
-    }
-  }
-
-  // Tool results come back in user-role messages. Content can be a string
-  // or an array of typed content blocks.
-  if (event.type === 'user' && event.message?.content) {
-    for (const block of event.message.content) {
-      if (block && typeof block === 'object' && block.type === 'tool_result') {
-        const meta = block.tool_use_id ? toolUseRegistry.get(block.tool_use_id) : null;
-        const toolName = meta?.toolName ?? 'Unknown';
-        const text = extractToolResultText(block.content);
-        // Scan this tool output with the ML classifier if the tool is in
-        // the SCANNED_TOOLS set and the content is non-trivial.
-        if (SCANNED_TOOLS.has(toolName) && text.length >= 32 && toolResultScanCtx) {
-          // Fire-and-forget — never block the stream handler. If BLOCK
-          // fires, onToolResultBlock handles kill + emit.
-          toolResultScanCtx.scan(toolName, text).catch(() => {});
-        }
-        if (block.tool_use_id) toolUseRegistry.delete(block.tool_use_id);
-      }
-    }
-  }
-
-  if (event.type === 'content_block_start' && event.content_block?.type === 'tool_use') {
-    if (event.content_block.id) {
-      toolUseRegistry.set(event.content_block.id, {
-        toolName: event.content_block.name,
-        toolInput: event.content_block.input,
-      });
-    }
-    await sendEvent({ type: 'tool_use', tool: event.content_block.name, input: summarizeToolInput(event.content_block.name, event.content_block.input) }, tabId);
-  }
-
-  if (event.type === 'content_block_delta' && event.delta?.type === 'text_delta' && event.delta.text) {
-    await sendEvent({ type: 'text_delta', text: event.delta.text }, tabId);
-  }
-
-  // Streamed tool-input deltas are not relayed to the sidebar
-  if (event.type === 'content_block_delta' && event.delta?.type === 'input_json_delta') {
-    // Tool input streaming — skip, we already announced the tool
-  }
-
-  if (event.type === 'result') {
-    await sendEvent({ type: 'result', text: event.result || '' }, tabId);
-  }
-
-  // Tool result events: intentionally a no-op here
-  if (event.type === 'tool_result' || (event.type === 'assistant' && event.message?.content)) {
-    // Tool results arrive in user-role messages and are handled above
-  }
-}
-
-/**
- * Fire the prompt-injection-detected event to the server. This terminates
- * the session from the sidepanel's perspective and renders the canary leak
- * banner. Also logs locally (salted hash + domain only) and fires telemetry
- * if configured.
- */
-async function onCanaryLeaked(params: {
-  tabId: number;
-  channel: string;
-  canary: string;
-  pageUrl: string;
-}): Promise<void> {
-  const { tabId, channel, canary, pageUrl } = params;
-  const domain = extractDomain(pageUrl);
-  console.warn(`[sidebar-agent] CANARY LEAK detected on ${channel} for tab ${tabId} (domain=${domain || 'unknown'})`);
-
-  // Local log — salted hash + domain only, never the payload
-  logAttempt({
-    ts: new Date().toISOString(),
-    urlDomain: domain,
-    payloadHash: hashPayload(canary), // hash the canary, not the payload (which might be leaked content)
-    confidence: 1.0,
-    layer: 'canary',
-    verdict: 'block',
-  });
-
-  // Broadcast to sidepanel so it can render the approved banner
-  await sendEvent({
-    type: 'security_event',
-    verdict: 'block',
-    reason: 'canary_leaked',
-    layer: 'canary',
-    channel,
-    domain,
-  }, tabId);
-
-  // Also emit agent_error so the sidepanel's existing error surface
-  // reflects that the session terminated. Keeps old clients working.
-  await sendEvent({
-    type: 'agent_error',
-    error: `Session terminated — prompt injection detected${domain ? ` from ${domain}` : ''}`,
-  }, tabId);
-}
-
-/**
- * Pre-spawn ML scan of the user message. If the classifier fires at BLOCK,
- * we log the attempt, emit a security_event to the sidepanel, and DO NOT
- * spawn claude. Returns true if the scan blocked the session.
- *
- * Fail-open: any classifier error or degraded state returns false (safe) so
- * the sidebar keeps working. The architectural controls (XML framing +
- * command allowlist, live in server.ts:554-577) still defend.
- */
-async function preSpawnSecurityCheck(entry: QueueEntry): Promise<boolean> {
-  const { message, canary, pageUrl, tabId } = entry;
-  if (!message || message.length === 0) return false;
-  const tid = tabId ?? 0;
-
-  // L4: scan the user message for direct injection patterns (TestSavantAI)
-  // L4c: also scan with DeBERTa-v3 when ensemble is enabled (opt-in)
-  const [contentSignal, debertaSignal] = await Promise.all([
-    scanPageContent(message),
-    scanPageContentDeberta(message),
-  ]);
-  const signals: LayerSignal[] = [contentSignal, debertaSignal];
-
-  // L4b: only bother with Haiku if another layer already lit up at >= LOG_ONLY.
-  // Saves ~70% of Haiku calls per plan §E1 "gating optimization".
-  if (shouldRunTranscriptCheck(signals)) {
-    const transcriptSignal = await checkTranscript({
-      user_message: message,
-      tool_calls: [], // no tool calls yet at session start
-    });
-    signals.push(transcriptSignal);
-  }
-
-  const result = combineVerdict(signals);
-  if (result.verdict !== 'block') return false;
-
-  // BLOCK verdict. Log + emit + refuse to spawn.
-  const domain = extractDomain(pageUrl ?? '');
-  const leaderSignal = signals.reduce((a, b) => (a.confidence > b.confidence ? a : b));
-
-  logAttempt({
-    ts: new Date().toISOString(),
-    urlDomain: domain,
-    payloadHash: hashPayload(message),
-    confidence: result.confidence,
-    layer: leaderSignal.layer,
-    verdict: 'block',
-  });
-
-  console.warn(`[sidebar-agent] Pre-spawn BLOCK (${result.reason}) for tab ${tid}, confidence=${result.confidence.toFixed(3)}`);
-
-  await sendEvent({
-    type: 'security_event',
-    verdict: 'block',
-    reason: result.reason ?? 'ml_classifier',
-    layer: leaderSignal.layer,
-    confidence: result.confidence,
-    domain,
-  }, tid);
-  await sendEvent({
-    type: 'agent_error',
-    error: `Session blocked — prompt injection detected${domain ? ` from ${domain}` : ' in your message'}`,
-  }, tid);
-
-  return true;
-}
-
-async function askClaude(queueEntry: QueueEntry): Promise<void> {
-  const { prompt, args, stateFile, cwd, tabId, canary, pageUrl } = queueEntry;
-  const tid = tabId ?? 0;
-
-  processingTabs.add(tid);
-  await sendEvent({ type: 'agent_start' }, tid);
-
-  // Pre-spawn ML scan: if the user message trips the ensemble, refuse to
-  // spawn claude. Fail-open on classifier errors.
-  if (await preSpawnSecurityCheck(queueEntry)) {
-    processingTabs.delete(tid);
-    return;
-  }
-
-  return new Promise((resolve) => {
-    // Canary context is set after proc is spawned (needs proc reference for kill).
-    let canaryCtx: CanaryContext | undefined;
-    let canaryTriggered = false;
-
-    // Use args from queue entry (server sets --model, --allowedTools, prompt framing).
-    // Fall back to defaults only if queue entry has no args (backward compat).
-    // Write doesn't expand attack surface beyond what Bash already provides.
-    // The security boundary is the localhost-only message path, not the tool allowlist.
-    let claudeArgs = args || ['-p', prompt, '--output-format', 'stream-json', '--verbose',
-      '--allowedTools', 'Bash,Read,Glob,Grep,Write'];
-
-    // Validate cwd exists — queue may reference a stale worktree
-    let effectiveCwd = cwd || process.cwd();
-    try { fs.accessSync(effectiveCwd); } catch (err: any) {
-      console.warn('[sidebar-agent] Worktree path inaccessible, falling back to cwd:', effectiveCwd, err.message);
-      effectiveCwd = process.cwd();
-    }
-
-    // Clear any stale cancel signal for this tab before starting
-    const cancelFile = cancelFileForTab(tid);
-    safeUnlink(cancelFile);
-
-    const proc = spawn('claude', claudeArgs, {
-      stdio: ['pipe', 'pipe', 'pipe'],
-      cwd: effectiveCwd,
-      env: {
-        ...process.env,
-        BROWSE_STATE_FILE: stateFile || '',
-        // Connect to the existing headed browse server, never start a new one.
-        // BROWSE_PORT tells the CLI which port to check.
-        // BROWSE_NO_AUTOSTART prevents spawning an invisible headless browser
-        // if the headed server is down — fail fast with a clear error instead.
-        BROWSE_PORT: process.env.BROWSE_PORT || '34567',
-        BROWSE_NO_AUTOSTART: '1',
-        // Pin this agent to its tab — prevents cross-tab interference
-        // when multiple agents run simultaneously
-        BROWSE_TAB: String(tid),
-      },
-    });
-
-    // Track active procs so kill-file polling can terminate them
-    activeProcs.set(tid, proc);
-    activeProc = proc;
-
-    proc.stdin.end();
-
-    // Now that proc exists, set up the canary-leak handler. It fires at most
-    // once; on fire we kill the subprocess, emit security_event + agent_error,
-    // and let the normal close handler resolve the promise.
-    if (canary) {
-      canaryCtx = {
-        canary,
-        pageUrl: pageUrl ?? '',
-        deltaBuf: { text_delta: '', input_json_delta: '' },
-        onLeak: (channel: string) => {
-          if (canaryTriggered) return;
-          canaryTriggered = true;
-          onCanaryLeaked({ tabId: tid, channel, canary, pageUrl: pageUrl ?? '' });
-          try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
-          setTimeout(() => {
-            try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
-          }, 2000);
-        },
-      };
-    }
-
-    // Tool-result ML scan context. Addresses the Codex review gap: Read,
-    // Grep, Glob, and WebFetch outputs enter Claude's context without
-    // passing through the Bash $B pipeline that content-security.ts
-    // already wraps. Scan them here.
-    let toolResultBlockFired = false;
-    const toolResultScanCtx: ToolResultScanContext = {
-      scan: async (toolName: string, text: string) => {
-        if (toolResultBlockFired) return;
-        // Parallel L4 + L4c ensemble scan (DeBERTa no-op when disabled).
-        // We run L4/L4c AND Haiku in parallel on tool outputs regardless of
-        // L4's score, because BrowseSafe-Bench shows L4 (TestSavantAI) has
-        // low recall on browser-agent-specific attacks (~15% at v1). Gating
-        // Haiku on L4 meant our best signal almost never ran. The cost is
-        // ~$0.002 + ~300ms per tool output, bounded by the Haiku timeout
-        // and offset by Haiku actually seeing the real attack context.
-        //
-        // Haiku only runs when the Claude CLI is available (checkHaikuAvailable
-        // caches the probe). In environments without it, the call returns a
-        // degraded signal and the verdict falls back to L4 alone.
-        const [contentSignal, debertaSignal, transcriptSignal] = await Promise.all([
-          scanPageContent(text),
-          scanPageContentDeberta(text),
-          checkTranscript({
-            user_message: queueEntry.message ?? '',
-            tool_calls: [{ tool_name: toolName, tool_input: {} }],
-            tool_output: text,
-          }),
-        ]);
-        const signals: LayerSignal[] = [contentSignal, debertaSignal, transcriptSignal];
-        const result = combineVerdict(signals, { toolOutput: true });
-        if (result.verdict !== 'block') return;
-        toolResultBlockFired = true;
-        const domain = extractDomain(pageUrl ?? '');
-        const payloadHash = hashPayload(text.slice(0, 4096));
-
-        // Log pending — if the user overrides, we'll update via a separate
-        // log line. The attempts.jsonl is append-only so both entries survive.
-        logAttempt({
-          ts: new Date().toISOString(),
-          urlDomain: domain,
-          payloadHash,
-          confidence: result.confidence,
-          layer: 'testsavant_content',
-          verdict: 'block',
-        });
-        console.warn(`[sidebar-agent] Tool-result BLOCK on ${toolName} for tab ${tid} (confidence=${result.confidence.toFixed(3)}) — awaiting user decision`);
-
-        // Surface a REVIEWABLE block event. Sidepanel renders the suspected
-        // text + layer scores + [Allow and continue] / [Block session] buttons.
-        // The user has 60s to decide; default is BLOCK (safe fallback).
-        const layerScores = signals
-          .filter((s) => s.confidence > 0)
-          .map((s) => ({ layer: s.layer, confidence: s.confidence }));
-        await sendEvent({
-          type: 'security_event',
-          verdict: 'block',
-          reason: 'tool_result_ml',
-          layer: 'testsavant_content',
-          confidence: result.confidence,
-          domain,
-          tool: toolName,
-          reviewable: true,
-          suspected_text: excerptForReview(text),
-          signals: layerScores,
-        }, tid);
-
-        // Poll for the user's decision. Default to BLOCK on timeout.
-        const REVIEW_TIMEOUT_MS = 60_000;
-        const POLL_MS = 500;
-        clearDecision(tid); // clear any stale decision from a prior session
-        const deadline = Date.now() + REVIEW_TIMEOUT_MS;
-        let decision: 'allow' | 'block' = 'block';
-        let decisionReason = 'timeout';
-        while (Date.now() < deadline) {
-          const rec = readDecision(tid);
-          if (rec?.decision === 'allow' || rec?.decision === 'block') {
-            decision = rec.decision;
-            decisionReason = rec.reason ?? 'user';
-            break;
-          }
-          await new Promise((r) => setTimeout(r, POLL_MS));
-        }
-        clearDecision(tid);
-
-        if (decision === 'allow') {
-          // User overrode. Log the override so the audit trail captures it.
-          // toolResultBlockFired is still true at this point, so concurrent
-          // scans cannot re-prompt while the override is being recorded.
-          logAttempt({
-            ts: new Date().toISOString(),
-            urlDomain: domain,
-            payloadHash,
-            confidence: result.confidence,
-            layer: 'testsavant_content',
-            verdict: 'user_overrode',
-          });
-          await sendEvent({
-            type: 'security_event',
-            verdict: 'user_overrode',
-            reason: 'tool_result_ml',
-            layer: 'testsavant_content',
-            confidence: result.confidence,
-            domain,
-            tool: toolName,
-          }, tid);
-          console.warn(`[sidebar-agent] Tab ${tid}: user overrode BLOCK — session continues`);
-          // Let the block stay consumed; reset the flag so subsequent tool
-          // results get scanned fresh.
-          toolResultBlockFired = false;
-          return;
-        }
-
-        // User chose BLOCK (or timed out). Kill the session as before.
-        await sendEvent({
-          type: 'agent_error',
-          error: `Session terminated — prompt injection detected in ${toolName} output${decisionReason === 'timeout' ? ' (review timeout)' : ''}`,
-        }, tid);
-        try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
-        setTimeout(() => {
-          try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
-        }, 2000);
-      },
-    };
-
-    // Poll for per-tab cancel signal from server's killAgent()
-    const cancelCheck = setInterval(() => {
-      try {
-        if (fs.existsSync(cancelFile)) {
-          console.log(`[sidebar-agent] Cancel signal received for tab ${tid} — killing claude subprocess`);
-          try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
-          setTimeout(() => { try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; } }, 3000);
-          fs.unlinkSync(cancelFile);
-          clearInterval(cancelCheck);
-        }
-      } catch (err: any) { if (err?.code !== 'ENOENT') throw err; }
-    }, 500);
-
-    let buffer = '';
-
-    proc.stdout.on('data', (data: Buffer) => {
-      buffer += data.toString();
-      const lines = buffer.split('\n');
-      buffer = lines.pop() || '';
-      for (const line of lines) {
-        if (!line.trim()) continue;
-        try { handleStreamEvent(JSON.parse(line), tid, canaryCtx, toolResultScanCtx); } catch (err: any) {
-          console.error(`[sidebar-agent] Tab ${tid}: Failed to parse stream line:`, line.slice(0, 100), err.message);
-        }
-      }
-    });
-
-    let stderrBuffer = '';
-    proc.stderr.on('data', (data: Buffer) => {
-      stderrBuffer += data.toString();
-    });
-
-    proc.on('close', (code) => {
-      clearInterval(cancelCheck);
-      activeProc = null;
-      activeProcs.delete(tid);
-      if (buffer.trim()) {
-        try { handleStreamEvent(JSON.parse(buffer), tid, canaryCtx, toolResultScanCtx); } catch (err: any) {
-          console.error(`[sidebar-agent] Tab ${tid}: Failed to parse final buffer:`, buffer.slice(0, 100), err.message);
-        }
-      }
-      const doneEvent: Record<string, any> = { type: 'agent_done' };
-      if (code !== 0 && stderrBuffer.trim()) {
-        doneEvent.stderr = stderrBuffer.trim().slice(-500);
-      }
-      sendEvent(doneEvent, tid).then(() => {
-        processingTabs.delete(tid);
-        resolve();
-      });
-    });
-
-    proc.on('error', (err) => {
-      clearInterval(cancelCheck);
-      activeProc = null;
-      const errorMsg = stderrBuffer.trim()
-        ? `${err.message}\nstderr: ${stderrBuffer.trim().slice(-500)}`
-        : err.message;
-      sendEvent({ type: 'agent_error', error: errorMsg }, tid).then(() => {
-        processingTabs.delete(tid);
-        resolve();
-      });
-    });
-
-    // Timeout (default 300s / 5 min — multi-page tasks need time)
-    const timeoutMs = parseInt(process.env.SIDEBAR_AGENT_TIMEOUT || '300000', 10);
-    setTimeout(() => {
-      try { proc.kill('SIGTERM'); } catch (killErr: any) {
-        console.warn(`[sidebar-agent] Tab ${tid}: Failed to kill timed-out process:`, killErr.message);
-      }
-      setTimeout(() => { try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; } }, 3000);
-      const timeoutMsg = stderrBuffer.trim()
-        ? `Timed out after ${timeoutMs / 1000}s\nstderr: ${stderrBuffer.trim().slice(-500)}`
-        : `Timed out after ${timeoutMs / 1000}s`;
-      sendEvent({ type: 'agent_error', error: timeoutMsg }, tid).then(() => {
-        processingTabs.delete(tid);
-        resolve();
-      });
-    }, timeoutMs);
-  });
-}
-
-// ─── Poll loop ───────────────────────────────────────────────────
-
-function countLines(): number {
-  try {
-    return fs.readFileSync(QUEUE, 'utf-8').split('\n').filter(Boolean).length;
-  } catch (err: any) {
-    console.error('[sidebar-agent] Failed to read queue file:', err.message);
-    return 0;
-  }
-}
-
-function readLine(n: number): string | null {
-  try {
-    const lines = fs.readFileSync(QUEUE, 'utf-8').split('\n').filter(Boolean);
-    return lines[n - 1] || null;
-  } catch (err: any) {
-    console.error(`[sidebar-agent] Failed to read queue line ${n}:`, err.message);
-    return null;
-  }
-}
-
-async function poll() {
-  const current = countLines();
-  if (current <= lastLine) return;
-
-  while (lastLine < current) {
-    lastLine++;
-    const line = readLine(lastLine);
-    if (!line) continue;
-
-    let parsed: unknown;
-    try { parsed = JSON.parse(line); } catch (err: any) {
-      console.warn(`[sidebar-agent] Skipping malformed queue entry at line ${lastLine}:`, line.slice(0, 80), err.message);
-      continue;
-    }
-    if (!isValidQueueEntry(parsed)) {
-      console.warn(`[sidebar-agent] Skipping invalid queue entry at line ${lastLine}: failed schema validation`);
-      continue;
-    }
-    const entry = parsed;
-
-    const tid = entry.tabId ?? 0;
-    // Skip if this tab already has an agent running — server queues per-tab
-    if (processingTabs.has(tid)) continue;
-
-    console.log(`[sidebar-agent] Processing tab ${tid}: "${entry.message}"`);
-    // Write to inbox so workspace agent can pick it up
-    writeToInbox(entry.message || entry.prompt, entry.pageUrl, entry.sessionId);
-    // Fire and forget — each tab's agent runs concurrently
-    askClaude(entry).catch((err) => {
-      console.error(`[sidebar-agent] Error on tab ${tid}:`, err);
-      sendEvent({ type: 'agent_error', error: String(err) }, tid);
-    });
-  }
-}
-
-// ─── Main ────────────────────────────────────────────────────────
-
-function pollKillFile(): void {
-  try {
-    const stat = fs.statSync(KILL_FILE);
-    const mtime = stat.mtimeMs;
-    if (mtime > lastKillTs) {
-      lastKillTs = mtime;
-      if (activeProcs.size > 0) {
-        console.log(`[sidebar-agent] Kill signal received — terminating ${activeProcs.size} active agent(s)`);
-        for (const [tid, proc] of activeProcs) {
-          try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
-          setTimeout(() => { try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; } }, 2000);
-          processingTabs.delete(tid);
-        }
-        activeProcs.clear();
-      }
-    }
-  } catch {
-    // Kill file doesn't exist yet — normal state
-  }
-}
-
-async function main() {
-  const dir = path.dirname(QUEUE);
-  fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
-  if (!fs.existsSync(QUEUE)) fs.writeFileSync(QUEUE, '', { mode: 0o600 });
-  try { fs.chmodSync(QUEUE, 0o600); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; }
-
-  lastLine = countLines();
-  await refreshToken();
-
-  console.log(`[sidebar-agent] Started. Watching ${QUEUE} from line ${lastLine}`);
-  console.log(`[sidebar-agent] Server: ${SERVER_URL}`);
-  console.log(`[sidebar-agent] Browse binary: ${B}`);
-
-  // If GSTACK_SECURITY_ENSEMBLE=deberta is set, also warm the DeBERTa-v3
-  // ensemble classifier. Fire-and-forget alongside TestSavantAI — they
-  // warm in parallel. No-op when the env var is unset.
-  loadDeberta((msg) => console.log(`[security-classifier] ${msg}`))
-    .catch((err) => console.warn('[sidebar-agent] DeBERTa warmup failed:', err?.message));
-
-  // Warm up the ML classifier in the background. First call triggers a 112MB
-  // download (~30s on average broadband). Non-blocking — the sidebar stays
-  // functional on cold start; classifier just reports 'off' until warmed.
-  //
-  // On warmup completion (success or failure), write the classifier status to
-  // ~/.gstack/security/session-state.json so server.ts's /health endpoint can
-  // report it to the sidepanel for shield icon rendering.
-  loadTestsavant((msg) => console.log(`[security-classifier] ${msg}`))
-    .then(() => {
-      const s = getClassifierStatus();
-      console.log(`[sidebar-agent] Classifier warmup complete: ${JSON.stringify(s)}`);
-      const existing = readSessionState();
-      writeSessionState({
-        sessionId: existing?.sessionId ?? String(process.pid),
-        canary: existing?.canary ?? '',
-        warnedDomains: existing?.warnedDomains ?? [],
-        classifierStatus: s,
-        lastUpdated: new Date().toISOString(),
-      });
-    })
-    .catch((err) => console.warn('[sidebar-agent] Classifier warmup failed (degraded mode):', err?.message));
-
-  setInterval(poll, POLL_MS);
-  setInterval(pollKillFile, POLL_MS);
-}
-
-main().catch(console.error);
@@ -0,0 +1,556 @@
+/**
+ * Terminal Agent — PTY-backed Claude Code terminal for the gstack browser
+ * sidebar. Translates the phoenix gbrowser PTY (cmd/gbd/terminal.go) into
+ * Bun, with a few changes informed by codex's outside-voice review:
+ *
+ *  - Lives in a separate non-compiled bun process from sidebar-agent.ts so
+ *    a bug in WS framing or PTY cleanup can't take down the chat path.
+ *  - Binds 127.0.0.1 only — never on the dual-listener tunnel surface.
+ *  - Origin validation on the WS upgrade is REQUIRED (not defense-in-depth)
+ *    because a localhost shell WS is a real cross-site WebSocket-hijacking
+ *    target.
+ *  - Cookie-based auth via /internal/grant from the parent server, not a
+ *    token in /health.
+ *  - Lazy spawn: claude PTY is not spawned until the WS receives its first
+ *    data frame. Sidebar opens that never type don't burn a claude session.
+ *  - PTY dies with WS close (one PTY per WS). v1.1 may add session
+ *    survival; for v1 we match phoenix's lifecycle.
+ *
+ * The PTY uses Bun's `terminal:` spawn option (verified at impl time on
+ * Bun 1.3.10): pass cols/rows + a data callback; write input via
+ * `proc.terminal.write(buf)`; resize via `proc.terminal.resize(cols, rows)`.
+ */
+import * as fs from 'fs';
+import * as path from 'path';
+import * as crypto from 'crypto';
+import { safeUnlink } from './error-handling';
+
+const STATE_FILE = process.env.BROWSE_STATE_FILE || path.join(process.env.HOME || '/tmp', '.gstack', 'browse.json');
+const PORT_FILE = path.join(path.dirname(STATE_FILE), 'terminal-port');
+const BROWSE_SERVER_PORT = parseInt(process.env.BROWSE_SERVER_PORT || '0', 10);
+const EXTENSION_ID = process.env.BROWSE_EXTENSION_ID || ''; // optional: tighten Origin check
+const INTERNAL_TOKEN = crypto.randomBytes(32).toString('base64url'); // shared with the parent server via INTERNAL_TOKEN_FILE, written at boot
+
+// In-memory cookie token registry. Parent posts /internal/grant after
+// /pty-session; we validate WS cookies against this set.
+const validTokens = new Set<string>();
+
+// Codex finding #4: the process-level handlers below log uncaught errors
+// from framing/cleanup bugs instead of letting them kill the listener
+// loop. (Per-WS PTY sessions are tracked in `sessions`, further down.)
+process.on('uncaughtException', (err) => {
+  console.error('[terminal-agent] uncaughtException:', err);
+});
+process.on('unhandledRejection', (reason) => {
+  console.error('[terminal-agent] unhandledRejection:', reason);
+});
+
+interface PtySession {
+  proc: any | null;        // Bun.Subprocess once spawned
+  cols: number;
+  rows: number;
+  cookie: string;
+  spawned: boolean;
+}
+
+const sessions = new WeakMap<any, PtySession>(); // ws -> session
+
+/** Find claude on PATH, falling back to common install locations. */
+function findClaude(): string | null {
+  // Test-only override. Lets the integration tests spawn /bin/bash instead
+  // of requiring claude to be installed on every CI runner. Must NEVER be set
+  // in production (sidebar UI). Documented in browse/test/terminal-agent-integration.test.ts.
+  const override = process.env.BROWSE_TERMINAL_BINARY;
+  if (override && fs.existsSync(override)) return override;
+  // Bun.which is sync and respects PATH. Falls back to a small list of
+  // common install locations if PATH is stripped (e.g., launched from
+  // Conductor with a minimal env).
+  const which = (Bun as any).which?.('claude');
+  if (which) return which;
+  const candidates = [
+    '/opt/homebrew/bin/claude',
+    '/usr/local/bin/claude',
+    `${process.env.HOME}/.local/bin/claude`,
+    `${process.env.HOME}/.bun/bin/claude`,
+    `${process.env.HOME}/.npm-global/bin/claude`,
+  ];
+  for (const c of candidates) {
+    try { fs.accessSync(c, fs.constants.X_OK); return c; } catch {}
+  }
+  return null;
+}
+
+/** Probe + persist claude availability for the bootstrap card. */
+function writeClaudeAvailable(): void {
+  const stateDir = path.dirname(STATE_FILE);
+  try { fs.mkdirSync(stateDir, { recursive: true, mode: 0o700 }); } catch {}
+  const found = findClaude();
+  const status = {
+    available: !!found,
+    path: found || undefined,
+    install_url: 'https://docs.anthropic.com/en/docs/claude-code',
+    checked_at: new Date().toISOString(),
+  };
+  const target = path.join(stateDir, 'claude-available.json');
+  const tmp = path.join(stateDir, `.tmp-claude-${process.pid}`);
+  try {
+    fs.writeFileSync(tmp, JSON.stringify(status, null, 2), { mode: 0o600 });
+    fs.renameSync(tmp, target);
+  } catch {
+    safeUnlink(tmp);
+  }
+}
+
+/**
+ * System-prompt hint passed to claude via --append-system-prompt. Tells
+ * claude what tab-awareness affordances exist in this session so it
+ * doesn't have to discover them by trial. The user can override anything
+ * here just by saying so — system prompt is a soft hint, not a contract.
+ *
+ * Two paths claude has:
+ *   1. Read live state from <stateDir>/tabs.json + active-tab.json
+ *      (updated continuously by the gstack browser extension).
+ *   2. Run $B tab, $B tabs, $B tab-each <command> to act on tabs. The
+ *      tab-each helper fans a single command across every open tab and
+ *      returns per-tab results as JSON.
+ */
+function buildTabAwarenessHint(stateDir: string): string {
+  const tabsFile = path.join(stateDir, 'tabs.json');
+  const activeFile = path.join(stateDir, 'active-tab.json');
+  return [
+    'You are running inside the gstack browser sidebar with live access to the user\'s browser tabs.',
+    '',
+    'Tab state files (kept fresh automatically by the extension):',
+    `  ${tabsFile}        — all open tabs (id, url, title, active, pinned)`,
+    `  ${activeFile}    — the currently active tab`,
+    'Read these any time the user asks about "tabs", "the current page", or anything multi-tab. Do NOT shell out to $B tabs just to learn what\'s open — read the file.',
+    '',
+    'Tab manipulation commands (via $B):',
+    '  $B tab <id>                 — switch to a tab',
+    '  $B newtab [url]             — open a new tab',
+    '  $B closetab [id]            — close a tab (current if no id)',
+    '  $B tab-each <command>       — fan out a command across every tab; returns JSON results',
+    '',
+    'When the user asks for multi-tab work, prefer $B tab-each. Examples:',
+    '  $B tab-each snapshot -i     — grab a snapshot from every tab',
+    '  $B tab-each text            — pull clean text from every tab',
+    '  $B tab-each title           — list every tab\'s title',
+    '',
+    'You\'re in a real terminal with a real PTY — slash commands, /resume, ANSI colors all work as in a normal claude session.',
+  ].join('\n');
+}
+
+/** Spawn claude in a PTY. Returns null if findClaude() cannot locate the binary. */
+function spawnClaude(cols: number, rows: number, onData: (chunk: Buffer) => void) {
+  const claudePath = findClaude();
+  if (!claudePath) return null;
+
+  // Match phoenix env so claude knows which browse server to talk to and
+  // doesn't try to autostart its own. BROWSE_HEADED=1 keeps the existing
+  // headed-mode browser; BROWSE_NO_AUTOSTART prevents claude's gstack
+  // tooling from racing to spawn another server.
+  const env: Record<string, string> = {
+    ...process.env as any,
+    BROWSE_PORT: String(BROWSE_SERVER_PORT),
+    BROWSE_STATE_FILE: STATE_FILE,
+    BROWSE_NO_AUTOSTART: '1',
+    BROWSE_HEADED: '1',
+    TERM: 'xterm-256color',
+    COLORTERM: 'truecolor',
+  };
+
+  // --append-system-prompt is the right injection surface (per `claude --help`):
+  // it gets appended to the model's system prompt, so claude treats this as
+  // contextual guidance, not a user message. Don't use a leading PTY write
+  // for this — that would show up as if the user typed the hint, polluting
+  // the visible transcript.
+  const stateDir = path.dirname(STATE_FILE);
+  const tabHint = buildTabAwarenessHint(stateDir);
+
+  const proc = (Bun as any).spawn([claudePath, '--append-system-prompt', tabHint], {
+    terminal: {
+      rows,
+      cols,
+      data(_terminal: any, chunk: Buffer) { onData(chunk); },
+    },
+    env,
+  });
+  return proc;
+}
+
+/** Cleanup a PTY session: SIGINT, then SIGKILL after 3s. */
+function disposeSession(session: PtySession): void {
+  // Capture the handle: session.proc is nulled below, before the timer fires.
+  const proc = session.proc;
+  try { proc?.terminal?.close?.(); } catch {}
+  if (proc?.pid) {
+    try { proc.kill?.('SIGINT'); } catch {}
+    setTimeout(() => {
+      try { if (!proc.killed) proc.kill?.('SIGKILL'); } catch {}
+    }, 3000);
+  }
+  session.proc = null;
+  session.spawned = false;
+}
+
+/**
+ * Build the HTTP server. Four routes, all on a listener bound to 127.0.0.1 only:
+ *   POST /internal/grant, /internal/revoke — parent server adds or drops a cookie token
+ *   GET  /claude-available                 — bootstrap card re-probes for the claude CLI
+ *   GET  /ws                               — extension upgrades to WebSocket (PTY transport)
+ * Everything else returns 404.
+ */
+function buildServer() {
+  return Bun.serve({
+    hostname: '127.0.0.1',
+    port: 0,
+    idleTimeout: 0, // PTY connections are long-lived; default idleTimeout would kill them
+
+    fetch(req, server) {
+      const url = new URL(req.url);
+
+      // /internal/grant — loopback-only handshake from parent server.
+      if (url.pathname === '/internal/grant' && req.method === 'POST') {
+        const auth = req.headers.get('authorization');
+        if (auth !== `Bearer ${INTERNAL_TOKEN}`) {
+          return new Response('forbidden', { status: 403 });
+        }
+        return req.json().then((body: any) => {
+          if (typeof body?.token === 'string' && body.token.length > 16) {
+            validTokens.add(body.token);
+          }
+          return new Response('ok');
+        }).catch(() => new Response('bad', { status: 400 }));
+      }
+
+      // /internal/revoke — drop a token (called on WS close or bootstrap reload)
+      if (url.pathname === '/internal/revoke' && req.method === 'POST') {
+        const auth = req.headers.get('authorization');
+        if (auth !== `Bearer ${INTERNAL_TOKEN}`) {
+          return new Response('forbidden', { status: 403 });
+        }
+        return req.json().then((body: any) => {
+          if (typeof body?.token === 'string') validTokens.delete(body.token);
+          return new Response('ok');
+        }).catch(() => new Response('bad', { status: 400 }));
+      }
+
+      // /claude-available — bootstrap card hits this when user clicks "I installed it".
+      if (url.pathname === '/claude-available' && req.method === 'GET') {
+        writeClaudeAvailable();
+        const found = findClaude();
+        return new Response(JSON.stringify({ available: !!found, path: found }), {
+          status: 200,
+          headers: { 'Content-Type': 'application/json' },
+        });
+      }
+
+      // /ws — WebSocket upgrade. CRITICAL gates:
+      //   (1) Origin must be chrome-extension://<id>. Cross-site WS hijacking
+      //       defense — required, not optional.
+      //   (2) Token must be in validTokens. We accept the token via two
+      //       transports for compatibility:
+      //         - Sec-WebSocket-Protocol (preferred for browsers — the only
+      //           auth header settable from the browser WebSocket API)
+      //         - Cookie gstack_pty (works for non-browser callers and
+      //           same-port browser callers; doesn't survive the cross-port
+      //           jump from server.ts:34567 to the agent's random port
+      //           when SameSite=Strict is set)
+      //       Either path works; both verify against the same in-memory
+      //       validTokens Set, populated by the parent server's
+      //       authenticated /pty-session → /internal/grant chain.
+      if (url.pathname === '/ws') {
+        const origin = req.headers.get('origin') || '';
+        const isExtensionOrigin = origin.startsWith('chrome-extension://');
+        if (!isExtensionOrigin) {
+          return new Response('forbidden origin', { status: 403 });
+        }
+        if (EXTENSION_ID && origin !== `chrome-extension://${EXTENSION_ID}`) {
+          return new Response('forbidden origin', { status: 403 });
+        }
+
+        // Try Sec-WebSocket-Protocol first. Format: a single token, possibly
+        // with a `gstack-pty.` prefix (which we strip). Browsers send a
+        // comma-separated list when multiple were requested; we pick the
+        // first that matches a known token.
+        const protoHeader = req.headers.get('sec-websocket-protocol') || '';
+        let token: string | null = null;
+        let acceptedProtocol: string | null = null;
+        for (const raw of protoHeader.split(',').map(s => s.trim()).filter(Boolean)) {
+          const candidate = raw.startsWith('gstack-pty.') ? raw.slice('gstack-pty.'.length) : raw;
+          if (validTokens.has(candidate)) {
+            token = candidate;
+            acceptedProtocol = raw;
+            break;
+          }
+        }
+
+        // Fallback: Cookie gstack_pty (legacy / non-browser callers).
+        if (!token) {
+          const cookieHeader = req.headers.get('cookie') || '';
+          for (const part of cookieHeader.split(';')) {
+            const [name, ...rest] = part.trim().split('=');
+            if (name === 'gstack_pty') {
+              const candidate = rest.join('=') || null;
+              if (candidate && validTokens.has(candidate)) {
+                token = candidate;
+              }
+              break;
+            }
+          }
+        }
+
+        if (!token) {
+          return new Response('unauthorized', { status: 401 });
+        }
+
+        const upgraded = server.upgrade(req, {
+          data: { cookie: token },
+          // Echo the protocol back so the browser accepts the upgrade.
+          // Required when the client sends Sec-WebSocket-Protocol — the
+          // server MUST select one of the offered protocols, otherwise
+          // the browser closes the connection immediately.
+          ...(acceptedProtocol ? { headers: { 'Sec-WebSocket-Protocol': acceptedProtocol } } : {}),
+        });
+        return upgraded ? undefined : new Response('upgrade failed', { status: 500 });
+      }
+
+      return new Response('not found', { status: 404 });
+    },
+
+    websocket: {
+      message(ws, raw) {
+        let session = sessions.get(ws);
+        if (!session) {
+          session = {
+            proc: null,
+            cols: 80,
+            rows: 24,
+            cookie: (ws.data as any)?.cookie || '',
+            spawned: false,
+          };
+          sessions.set(ws, session);
+        }
+
+        // Text frames are control messages: {type: "resize", cols, rows},
+        // {type: "tabSwitch", tabId, url, title}, or {type: "tabState", active, tabs, reason}.
+        // Binary frames are raw input bytes destined for the PTY stdin.
+        if (typeof raw === 'string') {
+          let msg: any;
+          try { msg = JSON.parse(raw); } catch { return; }
+          if (msg?.type === 'resize') {
+            const cols = Math.max(2, Math.floor(Number(msg.cols) || 80));
+            const rows = Math.max(2, Math.floor(Number(msg.rows) || 24));
+            session.cols = cols;
+            session.rows = rows;
+            try { session.proc?.terminal?.resize?.(cols, rows); } catch {}
+            return;
+          }
+          if (msg?.type === 'tabSwitch') {
+            handleTabSwitch(msg);
+            return;
+          }
+          if (msg?.type === 'tabState') {
+            handleTabState(msg);
+            return;
+          }
+          // Unknown text frame — ignore.
+          return;
+        }
+
+        // Binary input. Lazy-spawn claude on the first byte.
+        if (!session.spawned) {
+          session.spawned = true;
+          const proc = spawnClaude(session.cols, session.rows, (chunk) => {
+            try { ws.sendBinary(chunk); } catch {}
+          });
+          if (!proc) {
+            try {
+              ws.send(JSON.stringify({
+                type: 'error',
+                code: 'CLAUDE_NOT_FOUND',
+                message: 'claude CLI not on PATH. Install: https://docs.anthropic.com/en/docs/claude-code',
+              }));
+              ws.close(4404, 'claude not found');
+            } catch {}
+            return;
+          }
+          session.proc = proc;
+          // Watch for child exit so the WS closes cleanly when claude exits.
+          proc.exited?.then?.(() => {
+            try { ws.close(1000, 'pty exited'); } catch {}
+          });
+        }
+        try {
+          // raw is a Uint8Array; Bun.Terminal.write accepts string|Buffer.
+          // Convert to Buffer for safety.
+          session.proc?.terminal?.write?.(Buffer.from(raw as Uint8Array));
+        } catch (err) {
+          console.error('[terminal-agent] terminal.write failed:', err);
+        }
+      },
+
+      close(ws) {
+        const session = sessions.get(ws);
+        if (session) {
+          disposeSession(session);
+          if (session.cookie) {
+            // Drop the cookie so it can't be replayed against a new PTY.
+            validTokens.delete(session.cookie);
+          }
+          sessions.delete(ws);
+        }
+      },
+    },
+  });
+}
+
+/**
+ * Tab state helpers. handleTabSwitch (defined after handleTabState) writes
+ * the active tab to a state file (claude reads it) and notifies the parent
+ * server so its activeTabId stays synced. It skips chrome:// and
+ * chrome-extension:// internal pages.
+ *
+ * handleTabState writes a live tab snapshot: <stateDir>/tabs.json (full
+ * list) plus an updated <stateDir>/active-tab.json (current active). claude
+ * can read these any time without invoking $B tabs, which saves a round-trip
+ * when the model just needs to check the landscape before deciding what to do.
+ */
+function handleTabState(msg: {
+  active?: { tabId?: number; url?: string; title?: string } | null;
+  tabs?: Array<{ tabId?: number; url?: string; title?: string; active?: boolean; windowId?: number; pinned?: boolean; audible?: boolean }>;
+  reason?: string;
+}): void {
+  const stateDir = path.dirname(STATE_FILE);
+  try { fs.mkdirSync(stateDir, { recursive: true, mode: 0o700 }); } catch {}
+
+  // tabs.json — full list
+  if (Array.isArray(msg.tabs)) {
+    const payload = {
+      updatedAt: new Date().toISOString(),
+      reason: msg.reason || 'unknown',
+      tabs: msg.tabs.map(t => ({
+        tabId: t.tabId ?? null,
+        url: t.url || '',
+        title: t.title || '',
+        active: !!t.active,
+        windowId: t.windowId ?? null,
+        pinned: !!t.pinned,
+        audible: !!t.audible,
+      })),
+    };
+    const target = path.join(stateDir, 'tabs.json');
+    const tmp = path.join(stateDir, `.tmp-tabs-${process.pid}`);
+    try {
+      fs.writeFileSync(tmp, JSON.stringify(payload, null, 2), { mode: 0o600 });
+      fs.renameSync(tmp, target);
+    } catch {
+      safeUnlink(tmp);
+    }
+  }
+
+  // active-tab.json — single active tab. Skip chrome-internal pages so
+  // claude doesn't see chrome:// or chrome-extension:// URLs as
+  // "current target."
+  const active = msg.active;
+  if (active && active.url && !active.url.startsWith('chrome://') && !active.url.startsWith('chrome-extension://')) {
+    const ctxFile = path.join(stateDir, 'active-tab.json');
+    const tmp = path.join(stateDir, `.tmp-tab-${process.pid}`);
+    try {
+      fs.writeFileSync(tmp, JSON.stringify({
+        tabId: active.tabId ?? null,
+        url: active.url,
+        title: active.title ?? '',
+      }), { mode: 0o600 });
+      fs.renameSync(tmp, ctxFile);
+    } catch {
+      safeUnlink(tmp);
+    }
+  }
+}
+
+function handleTabSwitch(msg: { tabId?: number; url?: string; title?: string }): void {
+  const url = msg.url || '';
+  if (!url || url.startsWith('chrome://') || url.startsWith('chrome-extension://')) return;
+
+  const stateDir = path.dirname(STATE_FILE);
+  const ctxFile = path.join(stateDir, 'active-tab.json');
+  const tmp = path.join(stateDir, `.tmp-tab-${process.pid}`);
+  try {
+    fs.writeFileSync(tmp, JSON.stringify({
+      tabId: msg.tabId ?? null,
+      url,
+      title: msg.title ?? '',
+    }), { mode: 0o600 });
+    fs.renameSync(tmp, ctxFile);
+  } catch {
+    safeUnlink(tmp);
+  }
+
+  // Best-effort sync to parent server so its activeTabId tracking matches.
+  // No await; this is fire-and-forget.
+  if (BROWSE_SERVER_PORT > 0) {
+    fetch(`http://127.0.0.1:${BROWSE_SERVER_PORT}/command`, {
+      method: 'POST',
+      headers: {
+        'Content-Type': 'application/json',
+        'Authorization': `Bearer ${readBrowseToken()}`,
+      },
+      body: JSON.stringify({
+        command: 'tab',
+        args: [String(msg.tabId ?? ''), '--no-focus'],
+      }),
+    }).catch(() => {});
+  }
+}
+
+function readBrowseToken(): string {
+  try {
+    const raw = fs.readFileSync(STATE_FILE, 'utf-8');
+    const j = JSON.parse(raw);
+    return j.token || '';
+  } catch { return ''; }
+}
+
+// Boot.
+function main() {
+  writeClaudeAvailable();
+  const server = buildServer();
+  const port = (server as any).port || (server as any).address?.port;
+  if (!port) {
+    console.error('[terminal-agent] failed to bind: no port');
+    process.exit(1);
+  }
+
+  // Write port file atomically so the parent server can pick it up.
+  const dir = path.dirname(PORT_FILE);
+  try { fs.mkdirSync(dir, { recursive: true, mode: 0o700 }); } catch {}
+  const tmp = `${PORT_FILE}.tmp-${process.pid}`;
+  fs.writeFileSync(tmp, String(port), { mode: 0o600 });
+  fs.renameSync(tmp, PORT_FILE);
+
+  // The parent needs INTERNAL_TOKEN to call /internal/grant. It is never
+  // printed: the token is written to INTERNAL_TOKEN_FILE at module load
+  // (see the bottom of this file), and the parent reads it from there.
+  // The log line below reports only the bound port and pid.
+  console.log(`[terminal-agent] listening on 127.0.0.1:${port} pid=${process.pid}`);
+
+  // Cleanup port file on exit.
+  const cleanup = () => { safeUnlink(PORT_FILE); process.exit(0); };
+  process.on('SIGTERM', cleanup);
+  process.on('SIGINT', cleanup);
+}
+
+// Persist the internal token so the parent server can authenticate its
+// /internal/grant and /internal/revoke calls with the SAME value.
+//
+// The agent generates INTERNAL_TOKEN once at boot and writes it to a state
+// file the parent reads, rather than passing it through env. This avoids
+// env-passing races.
+const INTERNAL_TOKEN_FILE = path.join(path.dirname(STATE_FILE), 'terminal-internal-token');
+try {
+  fs.mkdirSync(path.dirname(INTERNAL_TOKEN_FILE), { recursive: true, mode: 0o700 });
+  fs.writeFileSync(INTERNAL_TOKEN_FILE, INTERNAL_TOKEN, { mode: 0o600 });
+} catch {}
+
+main();