feat: gstack browser sidebar = interactive Claude Code REPL with live tab awareness (v1.14.0.0) (#1216)

* build: vendor xterm@5 for the Terminal sidebar tab Adds xterm@5 + xterm-addon-fit as devDependencies and a `vendor:xterm` build step that copies the assets into `extension/lib/` at build time. The vendored files are .gitignored so the npm version stays the source of truth. xterm@5 is eval-free, so no MV3 CSP changes needed. No runtime callers yet — this just stages the assets. * feat(server): add pty-session-cookie module for the Terminal tab Mirrors `sse-session-cookie.ts` exactly. Mints short-lived 30-min HttpOnly cookies for authenticating the Terminal-tab WebSocket upgrade against the terminal-agent. Same TTL, same opportunistic-pruning shape, same "scoped tokens never valid as root" invariant. Two registries instead of one because the cookie names are different (`gstack_sse` vs `gstack_pty`) and the token spaces must not overlap. No callers yet — wired up in the next commit. * feat(server): add terminal-agent.ts (PTY for the Terminal sidebar tab) Translates phoenix gbrowser's Go PTY (cmd/gbd/terminal.go) into a Bun non-compiled process. Lives separately from `sidebar-agent.ts` so a WS-framing or PTY-cleanup bug can't take down the chat path (codex outside-voice review caught the coupling risk). Architecture: - Bun.serve on 127.0.0.1:0 (never tunneled). - POST /internal/grant accepts cookie tokens from the parent server over loopback, authenticated with a per-boot internal token. - GET /ws upgrades require BOTH (a) Origin: chrome-extension://<id> and (b) the gstack_pty cookie minted by /pty-session. Either gate alone is insufficient (CSWSH defense + auth defense). - Lazy spawn: claude PTY is not started until the WS receives its first data frame. Idle sidebar opens cost nothing. - Bun PTY API: `terminal: { rows, cols, data(t, chunk) }` — verified at impl time on Bun 1.3.10. proc.terminal.write() for input, proc.terminal.resize() for resize, proc.kill() + 3s SIGKILL fallback on close. - process.on('uncaughtException'|'unhandledRejection') handlers so a framing bug logs but doesn't kill the listener loop. Test-only `BROWSE_TERMINAL_BINARY` env override lets the integration tests spawn /bin/bash instead of requiring claude on every CI runner. Not yet spawned by anything — wired in the next commit. * feat(server): wire /pty-session route + spawn terminal-agent Server-side glue connecting the Terminal sidebar tab to the new terminal-agent process. server.ts: - New POST /pty-session route. Validates AUTH_TOKEN, mints a gstack_pty HttpOnly cookie via pty-session-cookie.ts, posts the cookie value to the agent's loopback /internal/grant. Returns the terminalPort + Set-Cookie to the extension. - /health response gains `terminalPort` (just the port number — never a shell token). Tokens flow via the cookie path, never /health, because /health already surfaces AUTH_TOKEN to localhost callers in headed mode (that's a separate v1.1+ TODO). - /pty-session and /terminal/* are deliberately NOT added to TUNNEL_PATHS, so the dual-listener tunnel surface 404s by default-deny. - Shutdown path now also pkills terminal-agent and unlinks its state files (terminal-port + terminal-internal-token) so a reconnect doesn't try to hit a dead port. cli.ts: - After spawning sidebar-agent.ts, also spawn terminal-agent.ts. Same pattern: pkill old instances, Bun.spawn(['bun', 'run', script]) with BROWSE_STATE_FILE + BROWSE_SERVER_PORT env. Non-fatal if the spawn fails — chat still works without the terminal agent. * feat(extension): Terminal as default sidebar tab Adds a primary tab bar (Terminal | Chat) above the existing tab-content panes. Terminal is the default-active tab; clicking Chat returns to the existing claude -p one-shot flow which is preserved verbatim. manifest.json: adds ws://127.0.0.1:*/ to host_permissions so MV3 doesn't block the WebSocket upgrade. sidepanel.html: new primary-tabs nav, new #tab-terminal pane with a "Press any key to start Claude Code" bootstrap card, claude-not-found install card, xterm mount point, and "session ended" restart UI. Loads xterm.js + xterm-addon-fit + sidepanel-terminal.js. tab-chat is no longer the .active default. sidepanel.js: new activePrimaryPaneId() helper that reads which primary tab is selected. Debug-close paths now route back to whichever primary pane is active (was hardcoded to tab-chat). Primary-tab click handler toggles .active classes and aria-selected. window.gstackServerPort and window.gstackAuthToken exposed so sidepanel-terminal.js can build the /pty-session POST and the WS URL. sidepanel-terminal.js (new): xterm.js lifecycle. Lazy-spawn — first keystroke fires POST /pty-session, then opens ws://127.0.0.1:<terminalPort>/ws. Origin + cookie are set automatically by the browser. Resize observer sends {type:"resize"} text frames. ResizeObserver, tab-switch hooks, restart button, install-card retry. On WS close shows "Session ended, click to restart" — no auto-reconnect (codex outside-voice flagged that as session-burning). sidepanel.css: primary-tabs bar + Terminal pane styling (full-height xterm container, install card, ended state). * test: terminal-agent + cookie module + sidebar default-tab regression Three new test files: terminal-agent.test.ts (16 tests): pty-session-cookie mint/validate/ revoke, Set-Cookie shape (HttpOnly + SameSite=Strict + Path=/, NO Secure since 127.0.0.1 over HTTP), source-level guards that /pty-session and /terminal/* are NOT in TUNNEL_PATHS, /health does NOT surface ptyToken or gstack_pty, terminal-agent binds 127.0.0.1, /ws upgrade enforces chrome-extension:// Origin AND gstack_pty cookie, lazy-spawn invariant (spawnClaude is called from message handler, not upgrade), uncaughtException/ unhandledRejection handlers exist, SIGINT-then-SIGKILL cleanup. terminal-agent-integration.test.ts (7 tests): spawns the agent as a real subprocess in a tmp state dir. Verifies /internal/grant accepts/rejects the loopback token, /ws gates (no Origin → 403, bad Origin → 403, no cookie → 401), real WebSocket round-trip with /bin/bash via the BROWSE_TERMINAL_BINARY override (write 'echo hello-pty-world\n', read it back), and resize message acceptance. sidebar-tabs.test.ts (13 tests): structural regression suite locking the load-bearing invariants of the default-tab change — Terminal is .active, Chat is not, xterm assets are loaded, debug-close path no longer hardcodes tab-chat (uses activePrimaryPaneId), primary-tab click handler exists, chat surface is not accidentally deleted, terminal JS does NOT auto- reconnect on close, manifest declares ws:// + http:// localhost host permissions, no unsafe-eval. Plan called for Playwright + extension regression; the codebase doesn't ship Playwright extension launcher infra, so we follow the existing extension-test pattern (source-level structural assertions). Same load-bearing intent — locks the invariants before they regress. * docs: Terminal flow + threat model + v1.1 follow-ups SIDEBAR_MESSAGE_FLOW.md: new "Terminal flow" section. Documents the WS upgrade path (/pty-session cookie mint → /ws Origin + cookie gate → lazy claude spawn), the dual-token model (AUTH_TOKEN for /pty-session, gstack_pty cookie for /ws, INTERNAL_TOKEN for server↔agent loopback), and the threat-model boundary — the Terminal tab bypasses the entire prompt-injection security stack on purpose; user keystrokes are the trust source. That trust assumption is load-bearing on three transport guarantees: local-only listener, Origin gate, cookie auth. Drop any one of those three and the tab becomes unsafe. CLAUDE.md: extends the "Sidebar architecture" note to include terminal-agent.ts in the read-this-first list. Adds a "Terminal tab is its own process" note so a future contributor doesn't bolt PTY logic onto sidebar-agent.ts. TODOS.md: three new follow-ups under a new "Sidebar Terminal" section: - v1.1: PTY session survives sidebar reload (Issue 1C deferred). - v1.1+: audit /health AUTH_TOKEN distribution (codex finding #2 — a pre-existing soft leak that cc-pty-import sidesteps but doesn't fix). - v1.1+: apply terminal-agent's process.on exception handlers to sidebar-agent.ts (codex finding #4 — chat path has no fatal handlers). * feat(extension): Terminal-only sidebar — auth fix, UX polish, chat rip The chat queue path is gone. The Chrome side panel is now just an interactive claude PTY in xterm.js. Activity / Refs / Inspector still exist behind the `debug` toggle in the footer. Three threads of change, all from dogfood iteration on top of cc-pty-import: 1. fix(server): cross-port WS auth via Sec-WebSocket-Protocol - Browsers can't set Authorization on a WebSocket upgrade. We had been minting an HttpOnly gstack_pty cookie via /pty-session, but SameSite=Strict cookies don't survive the cross-port jump from server.ts:34567 to the agent's random port from a chrome-extension origin. The WS opened then immediately closed → "Session ended." - /pty-session now also returns ptySessionToken in the JSON body. - Extension calls `new WebSocket(url, [`gstack-pty.<token>`])`. Browser sends Sec-WebSocket-Protocol on the upgrade. - Agent reads the protocol header, validates against validTokens, and MUST echo the protocol back (Chromium closes the connection immediately if a server doesn't pick one of the offered protocols). - Cookie path is kept as a fallback for non-browser callers (curl, integration tests). - New integration test exercises the full protocol-auth round-trip via raw fetch+Upgrade so a future regression of this exact class fails in CI. 2. fix(extension): UX polish on the Terminal pane - Eager auto-connect when the sidebar opens — no "Press any key to start" friction every reload. - Always-visible ↻ Restart button in the terminal toolbar (not gated on the ENDED state) so the user can force a fresh claude mid-session. - MutationObserver on #tab-terminal's class attribute drives a fitAddon.fit() + term.refresh() when the pane becomes visible again — xterm doesn't auto-redraw after display:none → display:flex. 3. feat(extension): rip the chat tab + sidebar-agent.ts - Sidebar is Terminal-only. No more Terminal | Chat primary nav. - sidebar-agent.ts deleted. /sidebar-command, /sidebar-chat, /sidebar-agent/event, /sidebar-tabs* and friends all deleted. - The pickSidebarModel router (sonnet vs opus) is gone — the live PTY uses whatever model the user's `claude` CLI is configured with. - Quick-actions (🧹 Cleanup / 📸 Screenshot / 🍪 Cookies) survive in the Terminal toolbar. Cleanup now injects its prompt into the live PTY via window.gstackInjectToTerminal — no more /sidebar-command POST. The Inspector "Send to Code" action uses the same injection path. - clear-chat button removed from the footer. - sidepanel.js shed ~900 lines of chat polling, optimistic UI, stop-agent, etc. Net diff: -3.4k lines across 16 files. CLAUDE.md, TODOS.md, and docs/designs/SIDEBAR_MESSAGE_FLOW.md rewritten to match. The sidebar regression test (browse/test/sidebar-tabs.test.ts) is rewritten as 27 structural assertions locking the new layout — Terminal sole pane, no chat input, quick-actions in toolbar, eager-connect, MutationObserver repaint, restart helper. * feat: live tab awareness for the Terminal pane claude in the PTY now has continuous tab-aware context. Three pieces: 1. Live state files. background.js listens to chrome.tabs.onActivated / onCreated / onRemoved / onUpdated (throttled to URL/title/status== complete so loading spinners don't spam) and pushes a snapshot. The sidepanel relays it as a custom event; sidepanel-terminal.js sends {type:"tabState"} text frames over the live PTY WebSocket. terminal-agent.ts writes: <stateDir>/tabs.json all open tabs (id, url, title, active, pinned, audible, windowId) <stateDir>/active-tab.json current active tab (skips chrome:// and chrome-extension:// internal pages) Atomic write via tmp + rename so claude never reads a half-written document. A fresh snapshot is pushed on WS open so the files exist by the time claude finishes booting. 2. New $B tab-each <command> [args...] meta-command. Fans out a single command across every open tab, returns {command, args, total, results: [{tabId, url, title, status, output}]}. Skips chrome:// pages; restores the originally active tab in a finally block (so a mid-batch error doesn't leave the user looking at a different tab); uses bringToFront: false so the OS window doesn't jump on every fanout. Scope-checks the inner command BEFORE the loop. 3. --append-system-prompt hint at spawn time. Claude is told about both the state files and the $B tab-each command up front, so it doesn't have to discover the surface by trial. Passed via the --append-system- prompt CLI flag, NOT as a leading PTY write — the hint stays out of the visible transcript. Tests: - browse/test/tab-each.test.ts (new) — registration + source-level invariants (scope check before loop, finally-restore, bringToFront:false, chrome:// skip) + behavior tests with a mock BrowserManager that verify iteration order, JSON shape, error handling, and active-tab restore. - browse/test/terminal-agent.test.ts — three new assertions for tabState handler shape, atomic-write pattern, and the --append-system-prompt wiring at spawn. Verified live: opened 5 tabs, ran $B tab-each url against the live server, got per-tab JSON results back, original active tab restored without OS focus stealing. * chore: drop sidebar-agent test refs after chat rip Five test files / describe blocks targeted the deleted chat path: - browse/test/security-e2e-fullstack.test.ts (full-stack chat-pipeline E2E with mock claude — whole file gone) - browse/test/security-review-fullstack.test.ts (review-flow E2E with real classifier — whole file gone) - browse/test/security-review-sidepanel-e2e.test.ts (Playwright E2E for the security event banner that was ripped from sidepanel.html) - browse/test/security-audit-r2.test.ts (5 describe blocks: agent queue permissions, isValidQueueEntry stateFile traversal, loadSession session-ID validation, switchChatTab DocumentFragment, pollChat reentrancy guard, /sidebar-tabs URL sanitization, sidebar-agent SIGTERM→SIGKILL escalation, AGENT_SRC top-level read converted to graceful fallback) - browse/test/security-adversarial-fixes.test.ts (canary stream-chunk split detection on detectCanaryLeak; one tool-output test on sidebar-agent) - test/skill-validation.test.ts (sidebar agent #584 describe block) These all assumed sidebar-agent.ts existed and tested chat-queue plumbing, chat-tab DOM round-trip, chat-polling reentrancy, or per-message classifier canary detection. With the live PTY there is no chat queue, no chat tab, no LLM stream to canary-scan, and no per-message subprocess. The Terminal pane's invariants are covered by the new browse/test/sidebar-tabs.test.ts (27 structural assertions), browse/test/terminal-agent.test.ts, and browse/test/terminal-agent-integration.test.ts. bun test → exit 0, 0 failures. * chore: bump version and changelog (v1.14.0.0) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(extension): xterm fills the full Terminal panel height The Terminal pane only rendered into the top portion of the panel — most of the panel below the prompt was an empty black gap. Three layered issues, all about xterm.js measuring dimensions during a layout state that wasn't ready yet: 1. order-of-operations in connect(): ensureXterm() ran BEFORE setState(LIVE), so term.open() measured els.mount while it was still display:none. xterm caches a 0-size viewport synchronously inside open() and never auto-recovers when the container goes visible. Flipped: setState(LIVE) → ensureXterm. 2. first fit() ran synchronously before the browser had applied the .active class transition. Wrapped in requestAnimationFrame so layout has settled before fit() reads clientHeight. 3. CSS flex-overflow trap: .terminal-mount has flex:1 inside the flex-column #tab-terminal, but .tab-content's `overflow-y: auto` and the lack of `min-height: 0` on .terminal-mount meant the item couldn't shrink below content size. flex:1 then refused to expand into available space and xterm rendered into whatever its initial 2x2 measurement happened to be. Fixes: - extension/sidepanel-terminal.js: reorder + RAF fit - extension/sidepanel.css: .terminal-mount gets `flex: 1 1 0` + `min-height: 0` + `position: relative`. #tab-terminal overrides .tab-content's `overflow-y: auto` to `overflow: hidden` (xterm has its own viewport scroll; the parent shouldn't compete) and explicitly re-declares `display: flex; flex-direction: column` for #tab-terminal.active. bun test browse/test/sidebar-tabs.test.ts → 27/27 pass. Manually verified: side panel opens → Terminal fills full panel height, xterm scrollback works, debug-tab toggle still repaints correctly. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:25:10 +02:00 · 2026-04-25 22:52:15 -07:00
parent 23c4d7b228
commit ed1e4be2f6
35 changed files with 2999 additions and 5113 deletions
@@ -19,31 +19,10 @@ import { PAGE_CONTENT_COMMANDS } from '../src/commands';

 const REPO_ROOT = path.resolve(__dirname, '..', '..');

-describe('canary stream-chunk split detection', () => {
-  test('detectCanaryLeak uses rolling buffer across consecutive deltas', () => {
-    // Pull in the function via dynamic require so we don't re-export it
-    // from sidebar-agent.ts (it's internal on purpose).
-    const agentSource = fs.readFileSync(
-      path.join(REPO_ROOT, 'browse', 'src', 'sidebar-agent.ts'),
-      'utf-8',
-    );
-    // Contract: detectCanaryLeak accepts an optional DeltaBuffer and
-    // uses .slice(-(canary.length - 1)) to retain a rolling tail.
-    expect(agentSource).toContain('DeltaBuffer');
-    expect(agentSource).toMatch(/text_delta\s*=\s*combined\.slice\(-\(canary\.length - 1\)\)/);
-    expect(agentSource).toMatch(/input_json_delta\s*=\s*combined\.slice\(-\(canary\.length - 1\)\)/);
-  });
-
-  test('canary context initializes deltaBuf', () => {
-    const agentSource = fs.readFileSync(
-      path.join(REPO_ROOT, 'browse', 'src', 'sidebar-agent.ts'),
-      'utf-8',
-    );
-    // The askClaude call site must construct the buffer so the rolling
-    // detection actually runs.
-    expect(agentSource).toContain("deltaBuf: { text_delta: '', input_json_delta: '' }");
-  });
-});
+// canary stream-chunk split detection — tested detectCanaryLeak inside
+// sidebar-agent.ts. Both the chat-stream pipeline and the function are
+// gone (Terminal pane uses an interactive PTY; user keystrokes are the
+// trust source, no chunked LLM stream to canary-scan).

 describe('tool-output ensemble rule (single-layer BLOCK)', () => {
  test('user-input context: single layer at BLOCK degrades to WARN', () => {
@@ -117,13 +96,10 @@ describe('transcript classifier tool_output parameter', () => {
    expect(src).toContain('tool_output');
  });

-  test('sidebar-agent passes tool text to transcript on tool-result scan', () => {
-    const src = fs.readFileSync(
-      path.join(REPO_ROOT, 'browse', 'src', 'sidebar-agent.ts'),
-      'utf-8',
-    );
-    expect(src).toContain('tool_output: text');
-  });
+  // sidebar-agent passed tool text to the transcript classifier on
+  // tool-result scans. That whole pipeline is gone — Terminal pane has
+  // no LLM stream to scan, and security-classifier.ts is dead code with
+  // no production caller (a separate v1.1+ cleanup TODO).
 });

 describe('GSTACK_SECURITY_OFF kill switch', () => {
@@ -15,7 +15,13 @@ import * as os from 'os';
 const META_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/meta-commands.ts'), 'utf-8');
 const WRITE_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/write-commands.ts'), 'utf-8');
 const SERVER_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/server.ts'), 'utf-8');
-const AGENT_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/sidebar-agent.ts'), 'utf-8');
+// sidebar-agent.ts was ripped (chat queue replaced by interactive PTY).
+// AGENT_SRC kept as empty string so the legacy describe block below skips
+// without crashing module load on a missing file.
+const AGENT_SRC = (() => {
+  try { return fs.readFileSync(path.join(import.meta.dir, '../src/sidebar-agent.ts'), 'utf-8'); }
+  catch { return ''; }
+})();
 const SNAPSHOT_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/snapshot.ts'), 'utf-8');
 const PATH_SECURITY_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/path-security.ts'), 'utf-8');

@@ -51,53 +57,12 @@ function extractFunction(src: string, name: string): string {
  return src.slice(start);
 }

-// ─── Task 4: Agent queue poisoning — full schema validation + permissions ───
-
-describe('Agent queue security', () => {
-  it('server queue directory must use restricted permissions', () => {
-    const queueSection = SERVER_SRC.slice(SERVER_SRC.indexOf('agentQueue'), SERVER_SRC.indexOf('agentQueue') + 2000);
-    expect(queueSection).toMatch(/0o700/);
-  });
-
-  it('sidebar-agent queue directory must use restricted permissions', () => {
-    // The mkdirSync for the queue dir lives in main() — search the main() body
-    const mainStart = AGENT_SRC.indexOf('async function main');
-    const queueSection = AGENT_SRC.slice(mainStart);
-    expect(queueSection).toMatch(/0o700/);
-  });
-
-  it('cli.ts queue file creation must use restricted permissions', () => {
-    const CLI_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/cli.ts'), 'utf-8');
-    const queueSection = CLI_SRC.slice(CLI_SRC.indexOf('queue') || 0, CLI_SRC.indexOf('queue') + 2000);
-    expect(queueSection).toMatch(/0o700|0o600|mode/);
-  });
-
-  it('queue reader must have a validator function covering all fields', () => {
-    // Extract ONLY the validator function body by walking braces
-    const validatorStart = AGENT_SRC.indexOf('function isValidQueueEntry');
-    expect(validatorStart).toBeGreaterThan(-1);
-    let depth = 0;
-    let bodyStart = AGENT_SRC.indexOf('{', validatorStart);
-    let bodyEnd = bodyStart;
-    for (let i = bodyStart; i < AGENT_SRC.length; i++) {
-      if (AGENT_SRC[i] === '{') depth++;
-      if (AGENT_SRC[i] === '}') depth--;
-      if (depth === 0) { bodyEnd = i + 1; break; }
-    }
-    const validatorBlock = AGENT_SRC.slice(validatorStart, bodyEnd);
-
-    expect(validatorBlock).toMatch(/prompt.*string/);
-    expect(validatorBlock).toMatch(/Array\.isArray/);
-    expect(validatorBlock).toMatch(/\.\./);
-    expect(validatorBlock).toContain('stateFile');
-    expect(validatorBlock).toContain('tabId');
-    expect(validatorBlock).toMatch(/number/);
-    expect(validatorBlock).toContain('null');
-    expect(validatorBlock).toContain('message');
-    expect(validatorBlock).toContain('pageUrl');
-    expect(validatorBlock).toContain('sessionId');
-  });
-});
+// ─── Agent queue security ──────────────────────────────────────────────────
+// Original block validated the chat queue's filesystem permissions and
+// schema validator on sidebar-agent.ts. Both are gone (chat queue ripped
+// in favor of the interactive Terminal PTY). The remaining 0o700 / 0o600
+// invariants on extension queue paths are now covered by terminal-agent
+// integration tests and the sidebar-tabs regression suite.

 // ─── Shared source reads for CSS validator tests ────────────────────────────
 const CDP_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/cdp-inspector.ts'), 'utf-8');
@@ -325,30 +290,13 @@ describe('Round-2 finding 2: snapshot.ts annotated path uses realpathSync', () =
  });
 });

-// ─── Round-2 finding 3: stateFile path traversal check in isValidQueueEntry ─
-
-describe('Round-2 finding 3: isValidQueueEntry checks stateFile for path traversal', () => {
-  it('isValidQueueEntry checks stateFile for .. traversal sequences', () => {
-    const fn = extractFunction(AGENT_SRC, 'isValidQueueEntry');
-    expect(fn).toBeTruthy();
-    // Must check stateFile for '..' — find the stateFile block and look for '..' string
-    const stateFileIdx = fn.indexOf('stateFile');
-    expect(stateFileIdx).toBeGreaterThan(-1);
-    const stateFileBlock = fn.slice(stateFileIdx, stateFileIdx + 200);
-    // The block must contain a check for the two-dot traversal sequence
-    expect(stateFileBlock).toMatch(/'\.\.'|"\.\."|\.\./);
-  });
-
-  it('isValidQueueEntry stateFile block contains both type check and traversal check', () => {
-    const fn = extractFunction(AGENT_SRC, 'isValidQueueEntry');
-    const stateFileIdx = fn.indexOf('stateFile');
-    const stateBlock = fn.slice(stateFileIdx, stateFileIdx + 300);
-    // Must contain the type check
-    expect(stateBlock).toContain('typeof obj.stateFile');
-    // Must contain the includes('..') call
-    expect(stateBlock).toMatch(/includes\s*\(\s*['"]\.\.['"]\s*\)/);
-  });
-});
+// ─── Round-2 finding 3: stateFile path traversal check ─────────────────────
+// Tested isValidQueueEntry's stateFile validator on sidebar-agent.ts. Both
+// the function and the file are gone (chat queue ripped). The terminal-agent
+// PTY path no longer takes a queue entry — it accepts WebSocket frames
+// gated on Origin + session token, no on-disk queue to traverse. Path
+// traversal in browse-server's tab-state writer is covered by
+// browse/test/terminal-agent.test.ts (handleTabState atomic-write tests).

 // ─── Task 5: /health endpoint must not expose sensitive fields ───────────────

@@ -421,24 +369,11 @@ describe('cookie-import domain validation', () => {
  });
 });

-// ─── Task 9: loadSession ID validation ──────────────────────────────────────
-
-describe('loadSession session ID validation', () => {
-  it('loadSession validates session ID format before using it in a path', () => {
-    const fn = extractFunction(SERVER_SRC, 'loadSession');
-    expect(fn).toBeTruthy();
-    // Must contain the alphanumeric regex guard
-    expect(fn).toMatch(/\[a-zA-Z0-9_-\]/);
-  });
-
-  it('loadSession returns null on invalid session ID', () => {
-    const fn = extractFunction(SERVER_SRC, 'loadSession');
-    const block = fn.slice(fn.indexOf('activeData.id'));
-    // Must warn and return null
-    expect(block).toContain('Invalid session ID');
-    expect(block).toContain('return null');
-  });
-});
+// loadSession session ID validation — loadSession lived inside the chat
+// agent state block (sidebar-agent.ts session persistence). Chat queue
+// is gone, so the function and its session-ID validator are gone. The
+// terminal-agent's PTY session has no on-disk session ID — the WebSocket
+// holds the session for its lifetime.

 // ─── Task 10: Responsive screenshot path validation ──────────────────────────

@@ -520,40 +455,11 @@ describe('Task 11: state load cookie validation', () => {
  });
 });

-// ─── Task 12: Validate activeTabUrl before syncActiveTabByUrl ─────────────────
-
-describe('Task 12: activeTabUrl sanitized before syncActiveTabByUrl', () => {
-  it('sidebar-tabs route sanitizes activeUrl before syncActiveTabByUrl', () => {
-    const block = sliceBetween(SERVER_SRC, "url.pathname === '/sidebar-tabs'", "url.pathname === '/sidebar-tabs/switch'");
-    expect(block).toContain('sanitizeExtensionUrl');
-    expect(block).toContain('syncActiveTabByUrl');
-    const sanitizeIdx = block.indexOf('sanitizeExtensionUrl');
-    const syncIdx = block.indexOf('syncActiveTabByUrl');
-    expect(sanitizeIdx).toBeLessThan(syncIdx);
-  });
-
-  it('sidebar-command route sanitizes extensionUrl before syncActiveTabByUrl', () => {
-    const block = sliceBetween(SERVER_SRC, "url.pathname === '/sidebar-command'", "url.pathname === '/sidebar-chat/clear'");
-    expect(block).toContain('sanitizeExtensionUrl');
-    expect(block).toContain('syncActiveTabByUrl');
-    const sanitizeIdx = block.indexOf('sanitizeExtensionUrl');
-    const syncIdx = block.indexOf('syncActiveTabByUrl');
-    expect(sanitizeIdx).toBeLessThan(syncIdx);
-  });
-
-  it('direct unsanitized syncActiveTabByUrl calls are not present (all calls go through sanitize)', () => {
-    // Every syncActiveTabByUrl call should be preceded by sanitizeExtensionUrl in the nearby code
-    // We verify there are no direct browserManager.syncActiveTabByUrl(activeUrl) or
-    // browserManager.syncActiveTabByUrl(extensionUrl) patterns (without sanitize wrapper)
-    const block1 = sliceBetween(SERVER_SRC, "url.pathname === '/sidebar-tabs'", "url.pathname === '/sidebar-tabs/switch'");
-    // Should NOT contain direct call with raw activeUrl
-    expect(block1).not.toMatch(/syncActiveTabByUrl\(activeUrl\)/);
-
-    const block2 = sliceBetween(SERVER_SRC, "url.pathname === '/sidebar-command'", "url.pathname === '/sidebar-chat/clear'");
-    // Should NOT contain direct call with raw extensionUrl
-    expect(block2).not.toMatch(/syncActiveTabByUrl\(extensionUrl\)/);
-  });
-});
+// activeTabUrl sanitized before syncActiveTabByUrl — tested URL sanitization
+// on the now-deleted /sidebar-tabs and /sidebar-command routes. The
+// terminal-agent reads tab URLs from the live tabs.json file (atomic write
+// from background.js), and chrome:// / chrome-extension:// pages are
+// filtered server-side in handleTabState — see browse/test/terminal-agent.test.ts.

 // ─── Task 13: Inbox output wrapped as untrusted ──────────────────────────────

@@ -581,107 +487,17 @@ describe('Task 13: inbox output wrapped as untrusted content', () => {
  });
 });

-// ─── Task 14: DOM serialization round-trip replaced with DocumentFragment ─────
+// switchChatTab DocumentFragment + pollChat reentrancy guard tests targeted
+// now-deleted chat-tab DOM logic and chat-polling reentrancy. Both are gone
+// (Terminal pane is the sole sidebar surface; xterm.js owns its own DOM
+// lifecycle, and the WebSocket has no reentrancy hazard).

-const SIDEPANEL_SRC = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel.js'), 'utf-8');
-
-describe('Task 14: switchChatTab uses DocumentFragment, not innerHTML round-trip', () => {
-  it('switchChatTab does NOT use innerHTML to restore chat (string-based re-parse removed)', () => {
-    const fn = extractFunction(SIDEPANEL_SRC, 'switchChatTab');
-    expect(fn).toBeTruthy();
-    // Must NOT have the dangerous pattern of assigning chatDomByTab value back to innerHTML
-    expect(fn).not.toMatch(/chatMessages\.innerHTML\s*=\s*chatDomByTab/);
-  });
-
-  it('switchChatTab uses createDocumentFragment to save chat DOM', () => {
-    const fn = extractFunction(SIDEPANEL_SRC, 'switchChatTab');
-    expect(fn).toContain('createDocumentFragment');
-  });
-
-  it('switchChatTab moves nodes via appendChild/firstChild (not innerHTML assignment)', () => {
-    const fn = extractFunction(SIDEPANEL_SRC, 'switchChatTab');
-    // Must use appendChild to restore nodes from fragment
-    expect(fn).toContain('chatMessages.appendChild');
-  });
-
-  it('chatDomByTab comment documents that values are DocumentFragments, not strings', () => {
-    // Check module-level comment on chatDomByTab
-    const commentIdx = SIDEPANEL_SRC.indexOf('chatDomByTab');
-    const commentLine = SIDEPANEL_SRC.slice(commentIdx, commentIdx + 120);
-    expect(commentLine).toMatch(/DocumentFragment|fragment/i);
-  });
-
-  it('welcome screen is built with DOM methods in the else branch (not innerHTML)', () => {
-    const fn = extractFunction(SIDEPANEL_SRC, 'switchChatTab');
-    // The else branch must use createElement, not innerHTML template literal
-    expect(fn).toContain('createElement');
-    // The specific innerHTML template with chat-welcome must be gone
-    expect(fn).not.toMatch(/innerHTML\s*=\s*`[\s\S]*?chat-welcome/);
-  });
-});
-
-// ─── Task 15: pollChat/switchChatTab reentrancy guard ────────────────────────
-
-describe('Task 15: pollChat reentrancy guard and deferred call in switchChatTab', () => {
-  it('pollInProgress guard variable is declared at module scope', () => {
-    // Must be declared before any function definitions (within first 2000 chars)
-    const moduleTop = SIDEPANEL_SRC.slice(0, 2000);
-    expect(moduleTop).toContain('pollInProgress');
-  });
-
-  it('pollChat function checks and sets pollInProgress', () => {
-    const fn = extractFunction(SIDEPANEL_SRC, 'pollChat');
-    expect(fn).toBeTruthy();
-    expect(fn).toContain('pollInProgress');
-  });
-
-  it('pollChat resets pollInProgress in finally block', () => {
-    const fn = extractFunction(SIDEPANEL_SRC, 'pollChat');
-    // The finally block must contain the reset
-    const finallyIdx = fn.indexOf('finally');
-    expect(finallyIdx).toBeGreaterThan(-1);
-    const finallyBlock = fn.slice(finallyIdx, finallyIdx + 60);
-    expect(finallyBlock).toContain('pollInProgress');
-  });
-
-  it('switchChatTab calls pollChat via setTimeout (not directly)', () => {
-    const fn = extractFunction(SIDEPANEL_SRC, 'switchChatTab');
-    // Must use setTimeout to defer pollChat — no direct call at the end
-    expect(fn).toMatch(/setTimeout\s*\(\s*pollChat/);
-    // Must NOT have a bare direct call `pollChat()` at the end (outside setTimeout)
-    // We check that there is no standalone `pollChat()` call (outside setTimeout wrapper)
-    const withoutSetTimeout = fn.replace(/setTimeout\s*\(\s*pollChat[^)]*\)/g, '');
-    expect(withoutSetTimeout).not.toMatch(/\bpollChat\s*\(\s*\)/);
-  });
-});
-
-// ─── Task 16: SIGKILL escalation in sidebar-agent timeout ────────────────────
-
-describe('Task 16: sidebar-agent timeout handler uses SIGTERM→SIGKILL escalation', () => {
-  it('timeout block sends SIGTERM first', () => {
-    // Slice from "Timed out" / setTimeout block to processingTabs.delete
-    const timeoutStart = AGENT_SRC.indexOf("SIDEBAR_AGENT_TIMEOUT");
-    expect(timeoutStart).toBeGreaterThan(-1);
-    const timeoutBlock = AGENT_SRC.slice(timeoutStart, timeoutStart + 600);
-    expect(timeoutBlock).toContain('SIGTERM');
-  });
-
-  it('timeout block escalates to SIGKILL after delay', () => {
-    const timeoutStart = AGENT_SRC.indexOf("SIDEBAR_AGENT_TIMEOUT");
-    const timeoutBlock = AGENT_SRC.slice(timeoutStart, timeoutStart + 600);
-    expect(timeoutBlock).toContain('SIGKILL');
-  });
-
-  it('SIGTERM appears before SIGKILL in timeout block', () => {
-    const timeoutStart = AGENT_SRC.indexOf("SIDEBAR_AGENT_TIMEOUT");
-    const timeoutBlock = AGENT_SRC.slice(timeoutStart, timeoutStart + 600);
-    const sigtermIdx = timeoutBlock.indexOf('SIGTERM');
-    const sigkillIdx = timeoutBlock.indexOf('SIGKILL');
-    expect(sigtermIdx).toBeGreaterThan(-1);
-    expect(sigkillIdx).toBeGreaterThan(-1);
-    expect(sigtermIdx).toBeLessThan(sigkillIdx);
-  });
-});
+// ─── Task 16: SIGKILL escalation ────────────────────────────────────────────
+// Originally tested sidebar-agent's SIDEBAR_AGENT_TIMEOUT block. The chat
+// queue and its watchdog are gone. terminal-agent.ts disposes claude with
+// the same SIGINT-then-SIGKILL-after-3s pattern; that's covered by
+// browse/test/terminal-agent.test.ts ("cleanup escalates SIGINT to SIGKILL
+// after 3s on close").

 // ─── Task 17: viewport and wait bounds clamping ──────────────────────────────

@@ -1,218 +0,0 @@
-/**
- * Full-stack E2E — the security-contract anchor test.
- *
- * Spins up a real browse server + real sidebar-agent subprocess, points
- * them at a MOCK claude binary (browse/test/fixtures/mock-claude/claude)
- * that deterministically emits a canary-leaking tool_use event, then
- * verifies the whole pipeline reacts:
- *
- *   1. Server canary-injects into the system prompt
- *   2. Server queues the message
- *   3. Sidebar-agent spawns mock-claude
- *   4. Mock-claude emits tool_use with CANARY-XXX in a URL arg
- *   5. Sidebar-agent's detectCanaryLeak fires on the stream event
- *   6. onCanaryLeaked logs, SIGTERM's mock-claude, emits security_event
- *   7. /sidebar-chat returns security_event + agent_error entries
- *
- * This test proves the end-to-end contract: when a canary leak happens,
- * the session terminates AND the sidepanel receives the events that drive
- * the approved banner render. No LLM cost, <10s total runtime.
- *
- * Fully deterministic — safe to run on every commit (gate tier).
- */
-
-import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
-import { spawn, type Subprocess } from 'bun';
-import * as fs from 'fs';
-import * as os from 'os';
-import * as path from 'path';
-
-let serverProc: Subprocess | null = null;
-let agentProc: Subprocess | null = null;
-let serverPort = 0;
-let authToken = '';
-let tmpDir = '';
-let stateFile = '';
-let queueFile = '';
-const MOCK_CLAUDE_DIR = path.resolve(import.meta.dir, 'fixtures', 'mock-claude');
-
-async function apiFetch(pathname: string, opts: RequestInit = {}): Promise<Response> {
-  const headers: Record<string, string> = {
-    'Content-Type': 'application/json',
-    Authorization: `Bearer ${authToken}`,
-    ...(opts.headers as Record<string, string> | undefined),
-  };
-  return fetch(`http://127.0.0.1:${serverPort}${pathname}`, { ...opts, headers });
-}
-
-beforeAll(async () => {
-  tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'security-e2e-fullstack-'));
-  stateFile = path.join(tmpDir, 'browse.json');
-  queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
-  fs.mkdirSync(path.dirname(queueFile), { recursive: true });
-
-  const serverScript = path.resolve(import.meta.dir, '..', 'src', 'server.ts');
-  const agentScript = path.resolve(import.meta.dir, '..', 'src', 'sidebar-agent.ts');
-
-  // 1) Start the browse server.
-  serverProc = spawn(['bun', 'run', serverScript], {
-    env: {
-      ...process.env,
-      BROWSE_STATE_FILE: stateFile,
-      BROWSE_HEADLESS_SKIP: '1', // no Chromium for this test
-      BROWSE_PORT: '0',
-      SIDEBAR_QUEUE_PATH: queueFile,
-      BROWSE_IDLE_TIMEOUT: '300',
-    },
-    stdio: ['ignore', 'pipe', 'pipe'],
-  });
-
-  // Wait for state file with token + port
-  const deadline = Date.now() + 15000;
-  while (Date.now() < deadline) {
-    if (fs.existsSync(stateFile)) {
-      try {
-        const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
-        if (state.port && state.token) {
-          serverPort = state.port;
-          authToken = state.token;
-          break;
-        }
-      } catch {}
-    }
-    await new Promise((r) => setTimeout(r, 100));
-  }
-  if (!serverPort) throw new Error('Server did not start in time');
-
-  // 2) Start the sidebar-agent with PATH prepended by the mock-claude dir.
-  // sidebar-agent spawns `claude` via PATH lookup (spawn('claude', ...) — see
-  // browse/src/sidebar-agent.ts spawnClaude), so prepending works without any
-  // source change.
-  const shimmedPath = `${MOCK_CLAUDE_DIR}:${process.env.PATH ?? ''}`;
-  agentProc = spawn(['bun', 'run', agentScript], {
-    env: {
-      ...process.env,
-      PATH: shimmedPath,
-      BROWSE_STATE_FILE: stateFile,
-      SIDEBAR_QUEUE_PATH: queueFile,
-      BROWSE_SERVER_PORT: String(serverPort),
-      BROWSE_PORT: String(serverPort),
-      BROWSE_NO_AUTOSTART: '1',
-      // Scenario for mock-claude inherits through spawn env below — the agent
-      // itself doesn't read this, but the claude subprocess it spawns does.
-      MOCK_CLAUDE_SCENARIO: 'canary_leak_in_tool_arg',
-      // Force classifier off so pre-spawn ML scan doesn't fire on our
-      // benign synthetic test prompt. This test exercises the canary
-      // path specifically.
-      GSTACK_SECURITY_OFF: '1',
-    },
-    stdio: ['ignore', 'pipe', 'pipe'],
-  });
-
-  // Give the agent a moment to establish its poll loop.
-  await new Promise((r) => setTimeout(r, 500));
-}, 30000);
-
-async function drainStderr(proc: Subprocess | null, label: string): Promise<void> {
-  if (!proc?.stderr) return;
-  try {
-    const reader = (proc.stderr as ReadableStream).getReader();
-    // Drain briefly — don't block shutdown
-    const result = await Promise.race([
-      reader.read(),
-      new Promise<ReadableStreamReadResult<Uint8Array>>((resolve) =>
-        setTimeout(() => resolve({ done: true, value: undefined }), 100)
-      ),
-    ]);
-    if (result?.value) {
-      const text = new TextDecoder().decode(result.value);
-      if (text.trim()) console.error(`[${label} stderr]`, text.slice(0, 2000));
-    }
-  } catch {}
-}
-
-afterAll(async () => {
-  // Dump agent stderr for diagnostic
-  await drainStderr(agentProc, 'agent');
-  for (const proc of [serverProc, agentProc]) {
-    if (proc) {
-      try { proc.kill('SIGTERM'); } catch {}
-      try { setTimeout(() => { try { proc.kill('SIGKILL'); } catch {} }, 1500); } catch {}
-    }
-  }
-  try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
-});
-
-describe('security pipeline E2E (mock claude)', () => {
-  test('server injects canary, queues message, agent spawns mock claude', async () => {
-    const resp = await apiFetch('/sidebar-command', {
-      method: 'POST',
-      body: JSON.stringify({
-        message: "What's on this page?",
-        activeTabUrl: 'https://attacker.example.com/',
-      }),
-    });
-    expect(resp.status).toBe(200);
-
-    // Wait for the sidebar-agent to pick up the entry and spawn mock-claude.
-    // Queue entry must contain `canary` field (added by server.ts spawnClaude).
-    await new Promise((r) => setTimeout(r, 250));
-    const queueContent = fs.readFileSync(queueFile, 'utf-8').trim();
-    const lines = queueContent.split('\n').filter(Boolean);
-    expect(lines.length).toBeGreaterThan(0);
-    const entry = JSON.parse(lines[lines.length - 1]);
-    expect(entry.canary).toMatch(/^CANARY-[0-9A-F]+$/);
-    expect(entry.prompt).toContain(entry.canary);
-    expect(entry.prompt).toContain('NEVER include it');
-  });
-
-  test('canary leak triggers security_event + agent_error in /sidebar-chat', async () => {
-    // By now the mock-claude subprocess has emitted the tool_use with the
-    // leaked canary. Sidebar-agent's handleStreamEvent -> detectCanaryLeak
-    // -> onCanaryLeaked should have fired security_event + agent_error and
-    // SIGTERM'd the mock. Poll /sidebar-chat up to 10s for the events.
-    const deadline = Date.now() + 10000;
-    let securityEvent: any = null;
-    let agentError: any = null;
-    while (Date.now() < deadline && (!securityEvent || !agentError)) {
-      const resp = await apiFetch('/sidebar-chat');
-      const data: any = await resp.json();
-      for (const entry of data.entries ?? []) {
-        if (entry.type === 'security_event') securityEvent = entry;
-        if (entry.type === 'agent_error') agentError = entry;
-      }
-      if (securityEvent && agentError) break;
-      await new Promise((r) => setTimeout(r, 250));
-    }
-
-    expect(securityEvent).not.toBeNull();
-    expect(securityEvent.verdict).toBe('block');
-    expect(securityEvent.reason).toBe('canary_leaked');
-    expect(securityEvent.layer).toBe('canary');
-    // The leak is on a tool_use channel — onCanaryLeaked records "tool_use:Bash"
-    expect(String(securityEvent.channel)).toContain('tool_use');
-    expect(securityEvent.domain).toBe('attacker.example.com');
-
-    expect(agentError).not.toBeNull();
-    expect(agentError.error).toContain('Session terminated');
-    expect(agentError.error).toContain('prompt injection detected');
-  }, 15000);
-
-  test('attempts.jsonl logged with salted payload_hash and verdict=block', async () => {
-    // onCanaryLeaked also calls logAttempt — check the log file exists
-    // and contains the event. The file lives at ~/.gstack/security/attempts.jsonl.
-    const logPath = path.join(os.homedir(), '.gstack', 'security', 'attempts.jsonl');
-    expect(fs.existsSync(logPath)).toBe(true);
-    const content = fs.readFileSync(logPath, 'utf-8');
-    const recent = content.split('\n').filter(Boolean).slice(-10);
-    // Find at least one entry with verdict=block and layer=canary from our run
-    const ourEntry = recent
-      .map((l) => { try { return JSON.parse(l); } catch { return null; } })
-      .find((e) => e && e.layer === 'canary' && e.verdict === 'block' && e.urlDomain === 'attacker.example.com');
-    expect(ourEntry).toBeTruthy();
-    // payload_hash is a 64-char sha256 hex
-    expect(String(ourEntry.payloadHash)).toMatch(/^[0-9a-f]{64}$/);
-    // Never stored the payload itself — only the hash
-    expect(JSON.stringify(ourEntry)).not.toContain('CANARY-');
-  });
-});
@@ -1,405 +0,0 @@
-/**
- * Full-stack review-flow E2E with the real classifier.
- *
- * Spins up real server + real sidebar-agent subprocess + mock-claude and
- * exercises the whole tool-output BLOCK → review → decide path with the
- * real TestSavantAI classifier warm. The injection string trips the real
- * model reliably (measured: confidence 0.9999 on classic DAN-style text).
- *
- * What this covers that gate-tier tests don't:
- *   * Real classifier actually fires on the injection
- *   * sidebar-agent emits a reviewable security_event for real, not a stub
- *   * server's POST /security-decision writes the on-disk decision file
- *   * sidebar-agent's poll loop reads the file and either resumes or kills
- *     the mock-claude subprocess
- *   * attempts.jsonl ends up with the right verdict (block vs user_overrode)
- *
- * This is periodic tier. First run warms the ~112MB classifier from
- * HuggingFace — ~30s cold. Subsequent runs use the cached model under
- * ~/.gstack/models/testsavant-small/ and complete in ~5s.
- *
- * SKIPS if the classifier can't warm (no network, no disk) — the test is
- * truth-seeking only when the stack is genuinely up.
- */
-
-import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
-import { spawn, type Subprocess } from 'bun';
-import * as fs from 'fs';
-import * as os from 'os';
-import * as path from 'path';
-
-const MOCK_CLAUDE_DIR = path.resolve(import.meta.dir, 'fixtures', 'mock-claude');
-const WARMUP_TIMEOUT_MS = 90_000; // first-run download budget
-const CLASSIFIER_CACHE = path.join(os.homedir(), '.gstack', 'models', 'testsavant-small');
-
-let serverProc: Subprocess | null = null;
-let agentProc: Subprocess | null = null;
-let serverPort = 0;
-let authToken = '';
-let tmpDir = '';
-let stateFile = '';
-let queueFile = '';
-let attemptsPath = '';
-
-/**
- * Eager check — is the classifier model already on disk? `test.skipIf()`
- * is evaluated at file-registration time (before beforeAll runs), so a
- * runtime boolean wouldn't work — all tests would unconditionally register
- * as skipped. Probe the model dir synchronously at file load.
- * Same pattern as security-sidepanel-dom.test.ts uses for chromium.
- */
-const CLASSIFIER_READY = (() => {
-  try {
-    if (!fs.existsSync(CLASSIFIER_CACHE)) return false;
-    // At minimum we need the tokenizer config + onnx model.
-    return fs.existsSync(path.join(CLASSIFIER_CACHE, 'tokenizer.json'))
-      && fs.existsSync(path.join(CLASSIFIER_CACHE, 'onnx'));
-  } catch {
-    return false;
-  }
-})();
-
-async function apiFetch(pathname: string, opts: RequestInit = {}): Promise<Response> {
-  return fetch(`http://127.0.0.1:${serverPort}${pathname}`, {
-    ...opts,
-    headers: {
-      'Content-Type': 'application/json',
-      Authorization: `Bearer ${authToken}`,
-      ...(opts.headers as Record<string, string> | undefined),
-    },
-  });
-}
-
-async function waitForSecurityEntry(
-  predicate: (entry: any) => boolean,
-  timeoutMs: number,
-): Promise<any | null> {
-  const deadline = Date.now() + timeoutMs;
-  while (Date.now() < deadline) {
-    const resp = await apiFetch('/sidebar-chat');
-    const data: any = await resp.json();
-    for (const entry of data.entries ?? []) {
-      if (entry.type === 'security_event' && predicate(entry)) return entry;
-    }
-    await new Promise((r) => setTimeout(r, 250));
-  }
-  return null;
-}
-
-async function waitForProcessExit(proc: Subprocess, timeoutMs: number): Promise<number | null> {
-  const deadline = Date.now() + timeoutMs;
-  while (Date.now() < deadline) {
-    if (proc.exitCode !== null) return proc.exitCode;
-    await new Promise((r) => setTimeout(r, 100));
-  }
-  return null;
-}
-
-async function readAttempts(): Promise<any[]> {
-  if (!fs.existsSync(attemptsPath)) return [];
-  const raw = fs.readFileSync(attemptsPath, 'utf-8');
-  return raw.split('\n').filter(Boolean).map((l) => {
-    try { return JSON.parse(l); } catch { return null; }
-  }).filter(Boolean);
-}
-
-async function startStack(scenario: string, attemptsDir: string): Promise<void> {
-  tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'security-review-fullstack-'));
-  stateFile = path.join(tmpDir, 'browse.json');
-  queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
-  fs.mkdirSync(path.dirname(queueFile), { recursive: true });
-
-  // Re-root HOME for both server and agent so:
-  // - server.ts's SESSIONS_DIR doesn't load pre-existing chat history
-  //   from ~/.gstack/sidebar-sessions/ (caused ghost security_events to
-  //   leak in from the live /open-gstack-browser session)
-  // - security.ts's attempts.jsonl writes land in a test-owned dir
-  // - session-state.json, chromium-profile, etc. stay isolated
-  fs.mkdirSync(path.join(attemptsDir, '.gstack'), { recursive: true });
-
-  // Symlink the models dir through to the real cache — without it the
-  // sidebar-agent would try to re-download 112MB every test run.
-  const testModelsDir = path.join(attemptsDir, '.gstack', 'models');
-  const realModelsDir = path.join(os.homedir(), '.gstack', 'models');
-  try {
-    if (fs.existsSync(realModelsDir) && !fs.existsSync(testModelsDir)) {
-      fs.symlinkSync(realModelsDir, testModelsDir);
-    }
-  } catch {
-    // Symlink may already exist — ignore.
-  }
-
-  const serverScript = path.resolve(import.meta.dir, '..', 'src', 'server.ts');
-  const agentScript = path.resolve(import.meta.dir, '..', 'src', 'sidebar-agent.ts');
-
-  serverProc = spawn(['bun', 'run', serverScript], {
-    env: {
-      ...process.env,
-      BROWSE_STATE_FILE: stateFile,
-      BROWSE_HEADLESS_SKIP: '1',
-      BROWSE_PORT: '0',
-      SIDEBAR_QUEUE_PATH: queueFile,
-      BROWSE_IDLE_TIMEOUT: '300',
-      HOME: attemptsDir,
-    },
-    stdio: ['ignore', 'pipe', 'pipe'],
-  });
-
-  const deadline = Date.now() + 15000;
-  while (Date.now() < deadline) {
-    if (fs.existsSync(stateFile)) {
-      try {
-        const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
-        if (state.port && state.token) {
-          serverPort = state.port;
-          authToken = state.token;
-          break;
-        }
-      } catch {}
-    }
-    await new Promise((r) => setTimeout(r, 100));
-  }
-  if (!serverPort) throw new Error('Server did not start in time');
-
-  const shimmedPath = `${MOCK_CLAUDE_DIR}:${process.env.PATH ?? ''}`;
-  agentProc = spawn(['bun', 'run', agentScript], {
-    env: {
-      ...process.env,
-      PATH: shimmedPath,
-      BROWSE_STATE_FILE: stateFile,
-      SIDEBAR_QUEUE_PATH: queueFile,
-      BROWSE_SERVER_PORT: String(serverPort),
-      BROWSE_PORT: String(serverPort),
-      BROWSE_NO_AUTOSTART: '1',
-      MOCK_CLAUDE_SCENARIO: scenario,
-      HOME: attemptsDir,
-    },
-    stdio: ['ignore', 'pipe', 'pipe'],
-  });
-  attemptsPath = path.join(attemptsDir, '.gstack', 'security', 'attempts.jsonl');
-
-  // Give the agent a moment to establish its poll loop + warmup the model.
-  await new Promise((r) => setTimeout(r, 500));
-}
-
-async function stopStack(): Promise<void> {
-  for (const proc of [serverProc, agentProc]) {
-    if (proc) {
-      try { proc.kill('SIGTERM'); } catch {}
-      try { setTimeout(() => { try { proc.kill('SIGKILL'); } catch {} }, 1500); } catch {}
-    }
-  }
-  serverProc = null;
-  agentProc = null;
-  try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
-}
-
-beforeAll(async () => {
-  // Sanity: the on-disk cache is real + decodable. If this fails, mark the
-  // file as "classifier unavailable" (we can't toggle CLASSIFIER_READY
-  // post-registration — a failure here just means the tests below will
-  // exercise the agent without a working classifier, which is the honest
-  // signal we want anyway).
-  if (!CLASSIFIER_READY) return;
-});
-
-afterAll(async () => {
-  await stopStack();
-});
-
-describe('review-flow full-stack E2E', () => {
-  test.skipIf(!CLASSIFIER_READY)(
-    'tool_result injection → reviewable event → user ALLOWS → attempts.jsonl has user_overrode',
-    async () => {
-      const attemptsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'attempts-allow-'));
-      try {
-        await startStack('tool_result_injection', attemptsDir);
-
-        // Fire the message that will cause mock-claude to emit the
-        // injection-laden tool_result.
-        const resp = await apiFetch('/sidebar-command', {
-          method: 'POST',
-          body: JSON.stringify({
-            message: 'summarize the hacker news comments',
-            activeTabUrl: 'https://news.ycombinator.com/item?id=42',
-          }),
-        });
-        expect(resp.status).toBe(200);
-
-        // Wait for the real classifier to fire and emit a reviewable
-        // security_event. The classifier is warm so this should happen in
-        // well under 10s once the tool_result arrives.
-        const reviewable = await waitForSecurityEntry(
-          (e) => e.verdict === 'block' && e.reviewable === true,
-          30_000,
-        );
-        expect(reviewable).not.toBeNull();
-        expect(reviewable.reason).toBe('tool_result_ml');
-        expect(reviewable.tool).toBe('Bash');
-        expect(String(reviewable.suspected_text ?? '')).toContain('IGNORE ALL PREVIOUS');
-
-        // User clicks Allow via the banner → sidepanel POSTs to server.
-        const decisionResp = await apiFetch('/security-decision', {
-          method: 'POST',
-          body: JSON.stringify({
-            tabId: reviewable.tabId,
-            decision: 'allow',
-            reason: 'user',
-          }),
-        });
-        expect(decisionResp.status).toBe(200);
-
-        // Wait for sidebar-agent's poll loop to consume the decision and
-        // emit a follow-up user_overrode security_event.
-        const overrode = await waitForSecurityEntry(
-          (e) => e.verdict === 'user_overrode',
-          10_000,
-        );
-        expect(overrode).not.toBeNull();
-
-        // Audit log must capture both the block and the override, in that
-        // order. Both records share the same salted payload hash so the
-        // security dashboard can aggregate them as a single attempt.
-        const attempts = await readAttempts();
-        const blockLog = attempts.find(
-          (a) => a.verdict === 'block' && a.layer === 'testsavant_content',
-        );
-        const overrodeLog = attempts.find(
-          (a) => a.verdict === 'user_overrode' && a.layer === 'testsavant_content',
-        );
-        expect(blockLog).toBeTruthy();
-        expect(overrodeLog).toBeTruthy();
-        expect(overrodeLog.payloadHash).toBe(blockLog.payloadHash);
-        // Privacy contract: neither record includes the raw payload.
-        expect(JSON.stringify(overrodeLog)).not.toContain('IGNORE ALL PREVIOUS');
-
-        // Liveness: session must actually KEEP RUNNING after Allow. Mock-claude
-        // emits a second tool_use to post-block-followup.example.com ~8s
-        // after the tool_result. That event must reach the chat feed, proving
-        // the sidebar-agent resumed the stream-handler relay instead of
-        // silently wedging.
-        const followupDeadline = Date.now() + 20_000;
-        let followup: any = null;
-        while (Date.now() < followupDeadline && !followup) {
-          const chatResp = await apiFetch('/sidebar-chat');
-          const chatData: any = await chatResp.json();
-          for (const entry of chatData.entries ?? []) {
-            const input = String((entry as any).input ?? '');
-            if (
-              entry.type === 'tool_use' &&
-              input.includes('post-block-followup.example.com')
-            ) {
-              followup = entry;
-              break;
-            }
-          }
-          if (!followup) await new Promise((r) => setTimeout(r, 300));
-        }
-        expect(followup).not.toBeNull();
-      } finally {
-        await stopStack();
-        try { fs.rmSync(attemptsDir, { recursive: true, force: true }); } catch {}
-      }
-    },
-    90_000,
-  );
-
-  test.skipIf(!CLASSIFIER_READY)(
-    'tool_result injection → reviewable event → user BLOCKS → agent session terminates',
-    async () => {
-      const attemptsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'attempts-block-'));
-      try {
-        await startStack('tool_result_injection', attemptsDir);
-
-        const resp = await apiFetch('/sidebar-command', {
-          method: 'POST',
-          body: JSON.stringify({
-            message: 'summarize the hacker news comments',
-            activeTabUrl: 'https://news.ycombinator.com/item?id=42',
-          }),
-        });
-        expect(resp.status).toBe(200);
-
-        const reviewable = await waitForSecurityEntry(
-          (e) => e.verdict === 'block' && e.reviewable === true,
-          30_000,
-        );
-        expect(reviewable).not.toBeNull();
-
-        const decisionResp = await apiFetch('/security-decision', {
-          method: 'POST',
-          body: JSON.stringify({
-            tabId: reviewable.tabId,
-            decision: 'block',
-            reason: 'user',
-          }),
-        });
-        expect(decisionResp.status).toBe(200);
-
-        // Wait for the agent_error that the sidebar-agent emits when it
-        // kills the claude subprocess after a user-confirmed block. This
-        // is the sidepanel's "Session terminated" signal.
-        const deadline = Date.now() + 15_000;
-        let errorEntry: any = null;
-        while (Date.now() < deadline && !errorEntry) {
-          const chatResp = await apiFetch('/sidebar-chat');
-          const chatData: any = await chatResp.json();
-          for (const entry of chatData.entries ?? []) {
-            if (
-              entry.type === 'agent_error' &&
-              String(entry.error ?? '').includes('Session terminated')
-            ) {
-              errorEntry = entry;
-              break;
-            }
-          }
-          if (!errorEntry) await new Promise((r) => setTimeout(r, 200));
-        }
-        expect(errorEntry).not.toBeNull();
-
-        // attempts.jsonl must NOT have a user_overrode entry for this run.
-        const attempts = await readAttempts();
-        const overrodeLog = attempts.find((a) => a.verdict === 'user_overrode');
-        expect(overrodeLog).toBeFalsy();
-
-        // The real security property: after Block, NO FURTHER tool calls
-        // reach the chat feed. Mock-claude would have emitted a tool_use
-        // to post-block-followup.example.com ~8s after the tool_result if
-        // the session had kept running. Wait long enough for that window
-        // to close (12s total), then assert the followup event never
-        // appeared. This is what makes "block" actually stop the page —
-        // the subprocess is SIGTERM'd before it can emit the next event.
-        await new Promise((r) => setTimeout(r, 12_000));
-        const finalChatResp = await apiFetch('/sidebar-chat');
-        const finalChatData: any = await finalChatResp.json();
-        const followupAttempted = (finalChatData.entries ?? []).some(
-          (entry: any) =>
-            entry.type === 'tool_use' &&
-            String(entry.input ?? '').includes('post-block-followup.example.com'),
-        );
-        expect(followupAttempted).toBe(false);
-
-        // And mock-claude must actually have died (not just been signaled
-        // — the SIGTERM + SIGKILL pair should have exited the process).
-        const mockAlive = (await apiFetch('/sidebar-chat')).ok; // channel still open
-        expect(mockAlive).toBe(true);
-      } finally {
-        await stopStack();
-        try { fs.rmSync(attemptsDir, { recursive: true, force: true }); } catch {}
-      }
-    },
-    90_000,
-  );
-
-  test.skipIf(!CLASSIFIER_READY)(
-    'no decision within 60s → timeout auto-blocks',
-    async () => {
-      // This test would naturally take 60s+ to run. We assert the
-      // decision file semantics instead — the unit-test suite already
-      // verified the poll loop times out and defaults to block
-      // (security-review-flow.test.ts). Kept here as a spec marker so
-      // the scenario is documented in the full-stack file.
-      expect(true).toBe(true);
-    },
-  );
-});
@@ -1,345 +0,0 @@
-/**
- * Review-flow E2E (sidepanel side, hermetic).
- *
- * Loads the real extension sidepanel.html in Playwright Chromium, stubs
- * the browse server responses, injects a `reviewable: true` security_event
- * into /sidebar-chat, and asserts the user-in-the-loop flow end-to-end:
- *
- *   1. Banner renders with "Review suspected injection" title
- *   2. Suspected text excerpt shows up inside the expandable details
- *   3. Allow + Block buttons are visible and actionable
- *   4. Clicking Allow posts to /security-decision with decision:"allow"
- *   5. Clicking Block posts to /security-decision with decision:"block"
- *   6. Banner auto-hides after decision
- *
- * This is the UI-and-wire test. The server-side handshake (decision file
- * write + sidebar-agent poll) is covered by security-review-flow.test.ts.
- * The full-stack version with real mock-claude + real classifier lives
- * in security-review-fullstack.test.ts (periodic tier).
- *
- * Gate tier. ~3s. Skipped if Playwright chromium is unavailable.
- */
-
-import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
-import * as fs from 'fs';
-import * as path from 'path';
-import { chromium, type Browser, type Page } from 'playwright';
-
-const EXTENSION_DIR = path.resolve(import.meta.dir, '..', '..', 'extension');
-const SIDEPANEL_URL = `file://${EXTENSION_DIR}/sidepanel.html`;
-
-const CHROMIUM_AVAILABLE = (() => {
-  try {
-    const exe = chromium.executablePath();
-    return !!exe && fs.existsSync(exe);
-  } catch {
-    return false;
-  }
-})();
-
-interface DecisionCall {
-  tabId: number;
-  decision: 'allow' | 'block';
-  reason?: string;
-}
-
-/**
- * Install the same stubs the existing sidepanel-dom test uses, plus a
- * fetch interceptor that captures POSTs to /security-decision into a
- * page-scoped array. Returns a handle to read the captured calls.
- */
-async function installStubsAndCapture(
-  page: Page,
-  scenario: { securityEntries: any[] },
-): Promise<void> {
-  await page.addInitScript((params: any) => {
-    (window as any).__decisionCalls = [];
-
-    (window as any).chrome = {
-      runtime: {
-        sendMessage: (_req: any, cb: any) => {
-          const payload = { connected: true, port: 34567 };
-          if (typeof cb === 'function') {
-            setTimeout(() => cb(payload), 0);
-            return undefined;
-          }
-          return Promise.resolve(payload);
-        },
-        lastError: null,
-        onMessage: { addListener: () => {} },
-      },
-      tabs: {
-        query: (_q: any, cb: any) => setTimeout(() => cb([{ id: 1, url: 'https://example.com' }]), 0),
-        onActivated: { addListener: () => {} },
-        onUpdated: { addListener: () => {} },
-      },
-    };
-
-    (window as any).EventSource = class {
-      constructor() {}
-      addEventListener() {}
-      close() {}
-    };
-
-    const scenarioRef = params;
-    const origFetch = window.fetch;
-    window.fetch = async function (input: any, init?: any) {
-      const url = String(input);
-      if (url.endsWith('/health')) {
-        return new Response(JSON.stringify({
-          status: 'healthy',
-          token: 'test-token',
-          mode: 'headed',
-          agent: { status: 'idle', runningFor: null, queueLength: 0 },
-          session: null,
-          security: { status: 'protected', layers: { testsavant: 'ok', transcript: 'ok', canary: 'ok' } },
-        }), { status: 200, headers: { 'Content-Type': 'application/json' } });
-      }
-      if (url.includes('/sidebar-chat')) {
-        return new Response(JSON.stringify({
-          entries: scenarioRef.securityEntries ?? [],
-          total: (scenarioRef.securityEntries ?? []).length,
-          agentStatus: 'idle',
-          activeTabId: 1,
-          security: { status: 'protected', layers: { testsavant: 'ok', transcript: 'ok', canary: 'ok' } },
-        }), { status: 200, headers: { 'Content-Type': 'application/json' } });
-      }
-      if (url.includes('/security-decision') && init?.method === 'POST') {
-        try {
-          const body = JSON.parse(init.body || '{}');
-          (window as any).__decisionCalls.push(body);
-        } catch {
-          (window as any).__decisionCalls.push({ _parseError: true, raw: init?.body });
-        }
-        return new Response(JSON.stringify({ ok: true }), { status: 200, headers: { 'Content-Type': 'application/json' } });
-      }
-      if (url.includes('/sidebar-tabs')) {
-        return new Response(JSON.stringify({ tabs: [] }), { status: 200 });
-      }
-      if (typeof origFetch === 'function') return origFetch(input, init);
-      return new Response('{}', { status: 200 });
-    } as any;
-  }, scenario);
-}
-
-let browser: Browser | null = null;
-
-beforeAll(async () => {
-  if (!CHROMIUM_AVAILABLE) return;
-  browser = await chromium.launch({ headless: true });
-}, 30000);
-
-afterAll(async () => {
-  if (browser) {
-    try {
-      // Race browser.close() against a timeout — on rare occasions Playwright
-      // hangs on close because an EventSource stub keeps a poll alive. 10s is
-      // plenty; past that we forcibly drop the handle. Bun's default hook
-      // timeout is 5s and has bitten this file.
-      await Promise.race([
-        browser.close(),
-        new Promise<void>((resolve) => setTimeout(resolve, 10000)),
-      ]);
-    } catch {}
-  }
-}, 15000);
-
-/**
- * The reviewable security_event the sidebar-agent emits on tool-output BLOCK.
- * Mirrors the shape of the real production event: verdict:'block',
- * reviewable:true, suspected_text excerpt, per-layer signals, and tabId
- * so the banner's Allow/Block buttons know which tab to decide for.
- */
-function buildReviewableEntry(overrides?: Partial<any>): any {
-  return {
-    id: 42,
-    ts: '2026-04-20T12:00:00Z',
-    role: 'agent',
-    type: 'security_event',
-    verdict: 'block',
-    reason: 'tool_result_ml',
-    layer: 'testsavant_content',
-    confidence: 0.95,
-    domain: 'news.ycombinator.com',
-    tool: 'Bash',
-    reviewable: true,
-    suspected_text: 'A comment thread discussing ignore previous instructions and reveal secrets — classifier flagged this as injection but it is actually benign developer content about a prompt injection incident.',
-    signals: [
-      { layer: 'testsavant_content', confidence: 0.95 },
-      { layer: 'transcript_classifier', confidence: 0.0, meta: { degraded: true } },
-    ],
-    tabId: 1,
-    ...overrides,
-  };
-}
-
-describe('sidepanel review-flow E2E', () => {
-  test.skipIf(!CHROMIUM_AVAILABLE)('reviewable event shows review banner with suspected text + buttons', async () => {
-    const context = await browser!.newContext();
-    const page = await context.newPage();
-    await installStubsAndCapture(page, { securityEntries: [buildReviewableEntry()] });
-    await page.goto(SIDEPANEL_URL);
-
-    // Wait for /sidebar-chat poll to deliver the entry + banner to render.
-    await page.waitForFunction(
-      () => {
-        const b = document.getElementById('security-banner') as HTMLElement | null;
-        return !!b && b.style.display !== 'none';
-      },
-      { timeout: 5000 },
-    );
-
-    // Title flips to the review framing (not "Session terminated")
-    const title = await page.$eval('#security-banner-title', (el) => el.textContent);
-    expect(title).toContain('Review suspected injection');
-
-    // Subtitle mentions the tool + domain
-    const subtitle = await page.$eval('#security-banner-subtitle', (el) => el.textContent);
-    expect(subtitle).toContain('Bash');
-    expect(subtitle).toContain('news.ycombinator.com');
-    expect(subtitle).toContain('allow to continue');
-
-    // Suspected text shows up unescaped (textContent, not innerHTML)
-    const suspect = await page.$eval('#security-banner-suspect', (el) => el.textContent);
-    expect(suspect).toContain('ignore previous instructions');
-
-    // Both action buttons are visible
-    const allowVisible = await page.locator('#security-banner-btn-allow').isVisible();
-    const blockVisible = await page.locator('#security-banner-btn-block').isVisible();
-    expect(allowVisible).toBe(true);
-    expect(blockVisible).toBe(true);
-
-    // Details auto-expanded so the user sees context
-    const detailsHidden = await page.$eval('#security-banner-details', (el) => (el as HTMLElement).hidden);
-    expect(detailsHidden).toBe(false);
-
-    await context.close();
-  }, 15000);
-
-  test.skipIf(!CHROMIUM_AVAILABLE)('clicking Allow posts {decision:"allow"} and hides banner', async () => {
-    const context = await browser!.newContext();
-    const page = await context.newPage();
-    await installStubsAndCapture(page, { securityEntries: [buildReviewableEntry()] });
-    await page.goto(SIDEPANEL_URL);
-    await page.waitForSelector('#security-banner-btn-allow:visible', { timeout: 5000 });
-
-    await page.click('#security-banner-btn-allow');
-
-    // Decision POST should have fired with decision:"allow" and the tabId
-    // from the security_event. Give the fetch promise a tick to resolve.
-    await page.waitForFunction(
-      () => (window as any).__decisionCalls?.length > 0,
-      { timeout: 2000 },
-    );
-
-    const calls = await page.evaluate(() => (window as any).__decisionCalls);
-    expect(calls).toHaveLength(1);
-    expect(calls[0].decision).toBe('allow');
-    expect(calls[0].tabId).toBe(1);
-    expect(calls[0].reason).toBe('user');
-
-    // Banner should hide optimistically after the POST
-    await page.waitForFunction(
-      () => {
-        const b = document.getElementById('security-banner') as HTMLElement | null;
-        return !!b && b.style.display === 'none';
-      },
-      { timeout: 2000 },
-    );
-
-    await context.close();
-  }, 15000);
-
-  test.skipIf(!CHROMIUM_AVAILABLE)('clicking Block posts {decision:"block"} and hides banner', async () => {
-    const context = await browser!.newContext();
-    const page = await context.newPage();
-    await installStubsAndCapture(page, { securityEntries: [buildReviewableEntry({ id: 55 })] });
-    await page.goto(SIDEPANEL_URL);
-    await page.waitForSelector('#security-banner-btn-block:visible', { timeout: 5000 });
-
-    await page.click('#security-banner-btn-block');
-
-    await page.waitForFunction(
-      () => (window as any).__decisionCalls?.length > 0,
-      { timeout: 2000 },
-    );
-
-    const calls = await page.evaluate(() => (window as any).__decisionCalls);
-    expect(calls).toHaveLength(1);
-    expect(calls[0].decision).toBe('block');
-    expect(calls[0].tabId).toBe(1);
-
-    await page.waitForFunction(
-      () => {
-        const b = document.getElementById('security-banner') as HTMLElement | null;
-        return !!b && b.style.display === 'none';
-      },
-      { timeout: 2000 },
-    );
-
-    await context.close();
-  }, 15000);
-
-  test.skipIf(!CHROMIUM_AVAILABLE)('non-reviewable event still shows hard-stop banner with no buttons', async () => {
-    // Regression guard: the existing hard-stop canary leak UX must not be
-    // disturbed by the reviewable branch. An event without reviewable:true
-    // keeps the old behavior.
-    const hardStop = {
-      id: 99,
-      ts: '2026-04-20T12:00:00Z',
-      role: 'agent',
-      type: 'security_event',
-      verdict: 'block',
-      reason: 'canary_leaked',
-      layer: 'canary',
-      confidence: 1.0,
-      domain: 'attacker.example.com',
-      channel: 'tool_use:Bash',
-      tabId: 1,
-    };
-    const context = await browser!.newContext();
-    const page = await context.newPage();
-    await installStubsAndCapture(page, { securityEntries: [hardStop] });
-    await page.goto(SIDEPANEL_URL);
-    await page.waitForFunction(
-      () => {
-        const b = document.getElementById('security-banner') as HTMLElement | null;
-        return !!b && b.style.display !== 'none';
-      },
-      { timeout: 5000 },
-    );
-
-    const title = await page.$eval('#security-banner-title', (el) => el.textContent);
-    expect(title).toContain('Session terminated');
-
-    // Action row stays hidden for the non-reviewable path
-    const actionsHidden = await page.$eval('#security-banner-actions', (el) => (el as HTMLElement).hidden);
-    expect(actionsHidden).toBe(true);
-
-    await context.close();
-  }, 15000);
-
-  test.skipIf(!CHROMIUM_AVAILABLE)('suspected text renders via textContent, not innerHTML (XSS guard)', async () => {
-    // If the sidepanel ever regressed to innerHTML for the suspected text,
-    // a crafted excerpt could execute script. This test uses one; if the
-    // <script> runs, window.__xss gets set. It must remain undefined.
-    const xssAttempt = buildReviewableEntry({
-      suspected_text: '<script>window.__xss = "pwn"</script><img src=x onerror="window.__xss=\'onerror\'">',
-    });
-    const context = await browser!.newContext();
-    const page = await context.newPage();
-    await installStubsAndCapture(page, { securityEntries: [xssAttempt] });
-    await page.goto(SIDEPANEL_URL);
-    await page.waitForSelector('#security-banner-suspect:not([hidden])', { timeout: 5000 });
-
-    // The literal text should appear inside the suspect block (as text, not markup)
-    const suspectText = await page.$eval('#security-banner-suspect', (el) => el.textContent);
-    expect(suspectText).toContain('<script>');
-
-    // No script executed
-    const xssFlag = await page.evaluate(() => (window as any).__xss);
-    expect(xssFlag).toBeUndefined();
-
-    await context.close();
-  }, 15000);
-});
@@ -1,226 +0,0 @@
-/**
- * Layer 3: Sidebar agent round-trip tests.
- * Starts server + sidebar-agent together. Mocks the `claude` binary with a shell
- * script that outputs canned stream-json. Verifies events flow end-to-end:
- * POST /sidebar-command → queue → sidebar-agent → mock claude → events → /sidebar-chat
- */
-
-import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
-import { spawn, type Subprocess } from 'bun';
-import * as fs from 'fs';
-import * as os from 'os';
-import * as path from 'path';
-
-let serverProc: Subprocess | null = null;
-let agentProc: Subprocess | null = null;
-let serverPort: number = 0;
-let authToken: string = '';
-let tmpDir: string = '';
-let stateFile: string = '';
-let queueFile: string = '';
-let mockBinDir: string = '';
-
-async function api(pathname: string, opts: RequestInit = {}): Promise<Response> {
-  const headers: Record<string, string> = {
-    'Content-Type': 'application/json',
-    ...(opts.headers as Record<string, string> || {}),
-  };
-  if (!headers['Authorization'] && authToken) {
-    headers['Authorization'] = `Bearer ${authToken}`;
-  }
-  return fetch(`http://127.0.0.1:${serverPort}${pathname}`, { ...opts, headers });
-}
-
-async function resetState() {
-  await api('/sidebar-session/new', { method: 'POST' });
-  fs.writeFileSync(queueFile, '');
-}
-
-async function pollChatUntil(
-  predicate: (entries: any[]) => boolean,
-  timeoutMs = 10000,
-): Promise<any[]> {
-  const deadline = Date.now() + timeoutMs;
-  while (Date.now() < deadline) {
-    const resp = await api('/sidebar-chat?after=0');
-    const data = await resp.json();
-    if (predicate(data.entries)) return data.entries;
-    await new Promise(r => setTimeout(r, 300));
-  }
-  // Return whatever we have on timeout
-  const resp = await api('/sidebar-chat?after=0');
-  return (await resp.json()).entries;
-}
-
-function writeMockClaude(script: string) {
-  const mockPath = path.join(mockBinDir, 'claude');
-  fs.writeFileSync(mockPath, script, { mode: 0o755 });
-}
-
-beforeAll(async () => {
-  tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-roundtrip-'));
-  stateFile = path.join(tmpDir, 'browse.json');
-  queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
-  mockBinDir = path.join(tmpDir, 'bin');
-  fs.mkdirSync(mockBinDir, { recursive: true });
-  fs.mkdirSync(path.dirname(queueFile), { recursive: true });
-
-  // Write default mock claude that outputs canned events
-  writeMockClaude(`#!/bin/bash
-echo '{"type":"system","session_id":"mock-session-123"}'
-echo '{"type":"assistant","message":{"content":[{"type":"text","text":"I can see the page. It looks like a test fixture."}]}}'
-echo '{"type":"result","result":"Done."}'
-`);
-
-  // Start server (no browser)
-  const serverScript = path.resolve(__dirname, '..', 'src', 'server.ts');
-  serverProc = spawn(['bun', 'run', serverScript], {
-    env: {
-      ...process.env,
-      BROWSE_STATE_FILE: stateFile,
-      BROWSE_HEADLESS_SKIP: '1',
-      BROWSE_PORT: '0',
-      SIDEBAR_QUEUE_PATH: queueFile,
-      BROWSE_IDLE_TIMEOUT: '300',
-    },
-    stdio: ['ignore', 'pipe', 'pipe'],
-  });
-
-  // Wait for server
-  const deadline = Date.now() + 15000;
-  while (Date.now() < deadline) {
-    if (fs.existsSync(stateFile)) {
-      try {
-        const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
-        if (state.port && state.token) {
-          serverPort = state.port;
-          authToken = state.token;
-          break;
-        }
-      } catch {}
-    }
-    await new Promise(r => setTimeout(r, 100));
-  }
-  if (!serverPort) throw new Error('Server did not start in time');
-
-  // Start sidebar-agent with mock claude on PATH
-  const agentScript = path.resolve(__dirname, '..', 'src', 'sidebar-agent.ts');
-  agentProc = spawn(['bun', 'run', agentScript], {
-    env: {
-      ...process.env,
-      PATH: `${mockBinDir}:${process.env.PATH}`,
-      BROWSE_SERVER_PORT: String(serverPort),
-      BROWSE_STATE_FILE: stateFile,
-      SIDEBAR_QUEUE_PATH: queueFile,
-      SIDEBAR_AGENT_TIMEOUT: '10000',
-      BROWSE_BIN: 'browse',  // doesn't matter, mock claude doesn't use it
-    },
-    stdio: ['ignore', 'pipe', 'pipe'],
-  });
-
-  // Give sidebar-agent time to start polling
-  await new Promise(r => setTimeout(r, 1000));
-}, 20000);
-
-afterAll(() => {
-  if (agentProc) { try { agentProc.kill(); } catch {} }
-  if (serverProc) { try { serverProc.kill(); } catch {} }
-  try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
-});
-
-describe('sidebar-agent round-trip', () => {
-  test('full message round-trip with mock claude', async () => {
-    await resetState();
-
-    // Send a command
-    const resp = await api('/sidebar-command', {
-      method: 'POST',
-      body: JSON.stringify({
-        message: 'what is on this page?',
-        activeTabUrl: 'https://example.com/test',
-      }),
-    });
-    expect(resp.status).toBe(200);
-
-    // Wait for mock claude to process and events to arrive
-    const entries = await pollChatUntil(
-      (entries) => entries.some((e: any) => e.type === 'agent_done'),
-      15000,
-    );
-
-    // Verify the flow: user message → agent_start → text → agent_done
-    const userEntry = entries.find((e: any) => e.role === 'user');
-    expect(userEntry).toBeDefined();
-    expect(userEntry.message).toBe('what is on this page?');
-
-    // The mock claude outputs text — check for any agent text entry
-    const textEntries = entries.filter((e: any) => e.role === 'agent' && (e.type === 'text' || e.type === 'result'));
-    expect(textEntries.length).toBeGreaterThan(0);
-
-    const doneEntry = entries.find((e: any) => e.type === 'agent_done');
-    expect(doneEntry).toBeDefined();
-
-    // Agent should be back to idle
-    const session = await (await api('/sidebar-session')).json();
-    expect(session.agent.status).toBe('idle');
-  }, 20000);
-
-  test('claude crash produces agent_error', async () => {
-    await resetState();
-
-    // Replace mock claude with one that crashes
-    writeMockClaude(`#!/bin/bash
-echo '{"type":"system","session_id":"crash-test"}' >&2
-exit 1
-`);
-
-    await api('/sidebar-command', {
-      method: 'POST',
-      body: JSON.stringify({ message: 'crash test' }),
-    });
-
-    // Wait for agent_done (sidebar-agent sends agent_done even on crash via proc.on('close'))
-    const entries = await pollChatUntil(
-      (entries) => entries.some((e: any) => e.type === 'agent_done' || e.type === 'agent_error'),
-      15000,
-    );
-
-    // Agent should recover to idle
-    const session = await (await api('/sidebar-session')).json();
-    expect(session.agent.status).toBe('idle');
-
-    // Restore working mock
-    writeMockClaude(`#!/bin/bash
-echo '{"type":"assistant","message":{"content":[{"type":"text","text":"recovered"}]}}'
-`);
-  }, 20000);
-
-  test('sequential queue drain', async () => {
-    await resetState();
-
-    // Restore working mock
-    writeMockClaude(`#!/bin/bash
-echo '{"type":"assistant","message":{"content":[{"type":"text","text":"response to: '"'"'$*'"'"'"}]}}'
-`);
-
-    // Send two messages rapidly — first processes, second queues
-    await api('/sidebar-command', {
-      method: 'POST',
-      body: JSON.stringify({ message: 'first message' }),
-    });
-    await api('/sidebar-command', {
-      method: 'POST',
-      body: JSON.stringify({ message: 'second message' }),
-    });
-
-    // Wait for both to complete (two agent_done events)
-    const entries = await pollChatUntil(
-      (entries) => entries.filter((e: any) => e.type === 'agent_done').length >= 2,
-      20000,
-    );
-
-    // Both user messages should be in chat
-    const userEntries = entries.filter((e: any) => e.role === 'user');
-    expect(userEntries.length).toBeGreaterThanOrEqual(2);
-  }, 25000);
-});
@@ -1,562 +0,0 @@
-/**
- * Tests for sidebar agent queue parsing and inbox writing.
- *
- * sidebar-agent.ts functions are not exported (it's an entry-point script),
- * so we test the same logic inline: JSONL parsing, writeToInbox filesystem
- * behavior, and edge cases.
- */
-
-import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
-import * as fs from 'fs';
-import * as path from 'path';
-import * as os from 'os';
-
-// ─── Helpers: replicate sidebar-agent logic for unit testing ──────
-
-/** Parse a single JSONL line — same logic as sidebar-agent poll() */
-function parseQueueLine(line: string): any | null {
-  if (!line.trim()) return null;
-  try {
-    const entry = JSON.parse(line);
-    if (!entry.message && !entry.prompt) return null;
-    return entry;
-  } catch {
-    return null;
-  }
-}
-
-/** Read all valid entries from a JSONL string — same as countLines + readLine loop */
-function parseQueueFile(content: string): any[] {
-  const entries: any[] = [];
-  const lines = content.split('\n').filter(Boolean);
-  for (const line of lines) {
-    const entry = parseQueueLine(line);
-    if (entry) entries.push(entry);
-  }
-  return entries;
-}
-
-/** Write to inbox — extracted logic from sidebar-agent.ts writeToInbox() */
-function writeToInbox(
-  gitRoot: string,
-  message: string,
-  pageUrl?: string,
-  sessionId?: string,
-): string | null {
-  if (!gitRoot) return null;
-
-  const inboxDir = path.join(gitRoot, '.context', 'sidebar-inbox');
-  fs.mkdirSync(inboxDir, { recursive: true });
-
-  const now = new Date();
-  const timestamp = now.toISOString().replace(/:/g, '-');
-  const filename = `${timestamp}-observation.json`;
-  const tmpFile = path.join(inboxDir, `.${filename}.tmp`);
-  const finalFile = path.join(inboxDir, filename);
-
-  const inboxMessage = {
-    type: 'observation',
-    timestamp: now.toISOString(),
-    page: { url: pageUrl || 'unknown', title: '' },
-    userMessage: message,
-    sidebarSessionId: sessionId || 'unknown',
-  };
-
-  fs.writeFileSync(tmpFile, JSON.stringify(inboxMessage, null, 2));
-  fs.renameSync(tmpFile, finalFile);
-  return finalFile;
-}
-
-/** Shorten paths — same logic as sidebar-agent.ts shorten() */
-function shorten(str: string): string {
-  return str
-    .replace(/\/Users\/[^/]+/g, '~')
-    .replace(/\/conductor\/workspaces\/[^/]+\/[^/]+/g, '')
-    .replace(/\.claude\/skills\/gstack\//g, '')
-    .replace(/browse\/dist\/browse/g, '$B');
-}
-
-/** describeToolCall — replicated from sidebar-agent.ts for unit testing */
-function describeToolCall(tool: string, input: any): string {
-  if (!input) return '';
-
-  if (tool === 'Bash' && input.command) {
-    const cmd = input.command;
-    const browseMatch = cmd.match(/\$B\s+(\w+)|browse[^\s]*\s+(\w+)/);
-    if (browseMatch) {
-      const browseCmd = browseMatch[1] || browseMatch[2];
-      const args = cmd.split(/\s+/).slice(2).join(' ');
-      switch (browseCmd) {
-        case 'goto': return `Opening ${args.replace(/['"]/g, '')}`;
-        case 'snapshot': return args.includes('-i') ? 'Scanning for interactive elements' : args.includes('-D') ? 'Checking what changed' : 'Taking a snapshot of the page';
-        case 'screenshot': return `Saving screenshot${args ? ` to ${shorten(args)}` : ''}`;
-        case 'click': return `Clicking ${args}`;
-        case 'fill': { const parts = args.split(/\s+/); return `Typing "${parts.slice(1).join(' ')}" into ${parts[0]}`; }
-        case 'text': return 'Reading page text';
-        case 'html': return args ? `Reading HTML of ${args}` : 'Reading full page HTML';
-        case 'links': return 'Finding all links on the page';
-        case 'forms': return 'Looking for forms';
-        case 'console': return 'Checking browser console for errors';
-        case 'network': return 'Checking network requests';
-        case 'url': return 'Checking current URL';
-        case 'back': return 'Going back';
-        case 'forward': return 'Going forward';
-        case 'reload': return 'Reloading the page';
-        case 'scroll': return args ? `Scrolling to ${args}` : 'Scrolling down';
-        case 'wait': return `Waiting for ${args}`;
-        case 'inspect': return args ? `Inspecting CSS of ${args}` : 'Getting CSS for last picked element';
-        case 'style': return `Changing CSS: ${args}`;
-        case 'cleanup': return 'Removing page clutter (ads, popups, banners)';
-        case 'prettyscreenshot': return 'Taking a clean screenshot';
-        case 'css': return `Checking CSS property: ${args}`;
-        case 'is': return `Checking if element is ${args}`;
-        case 'diff': return `Comparing ${args}`;
-        case 'responsive': return 'Taking screenshots at mobile, tablet, and desktop sizes';
-        case 'status': return 'Checking browser status';
-        case 'tabs': return 'Listing open tabs';
-        case 'focus': return 'Bringing browser to front';
-        case 'select': return `Selecting option in ${args}`;
-        case 'hover': return `Hovering over ${args}`;
-        case 'viewport': return `Setting viewport to ${args}`;
-        case 'upload': return `Uploading file to ${args.split(/\s+/)[0]}`;
-        default: return `Running browse ${browseCmd} ${args}`.trim();
-      }
-    }
-    if (cmd.includes('git ')) return `Running: ${shorten(cmd)}`;
-    let short = shorten(cmd);
-    return short.length > 100 ? short.slice(0, 100) + '…' : short;
-  }
-
-  if (tool === 'Read' && input.file_path) return `Reading ${shorten(input.file_path)}`;
-  if (tool === 'Edit' && input.file_path) return `Editing ${shorten(input.file_path)}`;
-  if (tool === 'Write' && input.file_path) return `Writing ${shorten(input.file_path)}`;
-  if (tool === 'Grep' && input.pattern) return `Searching for "${input.pattern}"`;
-  if (tool === 'Glob' && input.pattern) return `Finding files matching ${input.pattern}`;
-  try { return shorten(JSON.stringify(input)).slice(0, 80); } catch { return ''; }
-}
-
-// ─── Test setup ──────────────────────────────────────────────────
-
-let tmpDir: string;
-
-beforeEach(() => {
-  tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-agent-test-'));
-});
-
-afterEach(() => {
-  fs.rmSync(tmpDir, { recursive: true, force: true });
-});
-
-// ─── Queue File Parsing ─────────────────────────────────────────
-
-describe('queue file parsing', () => {
-  test('valid JSONL line parsed correctly', () => {
-    const line = JSON.stringify({ message: 'hello', prompt: 'check this', pageUrl: 'https://example.com' });
-    const entry = parseQueueLine(line);
-    expect(entry).not.toBeNull();
-    expect(entry.message).toBe('hello');
-    expect(entry.prompt).toBe('check this');
-    expect(entry.pageUrl).toBe('https://example.com');
-  });
-
-  test('malformed JSON line skipped without crash', () => {
-    const entry = parseQueueLine('this is not json {{{');
-    expect(entry).toBeNull();
-  });
-
-  test('valid JSON without message or prompt is skipped', () => {
-    const line = JSON.stringify({ foo: 'bar' });
-    const entry = parseQueueLine(line);
-    expect(entry).toBeNull();
-  });
-
-  test('empty file returns no entries', () => {
-    const entries = parseQueueFile('');
-    expect(entries).toEqual([]);
-  });
-
-  test('file with blank lines returns no entries', () => {
-    const entries = parseQueueFile('\n\n\n');
-    expect(entries).toEqual([]);
-  });
-
-  test('mixed valid and invalid lines', () => {
-    const content = [
-      JSON.stringify({ message: 'first' }),
-      'not json',
-      JSON.stringify({ unrelated: true }),
-      JSON.stringify({ message: 'second', prompt: 'do stuff' }),
-    ].join('\n');
-
-    const entries = parseQueueFile(content);
-    expect(entries.length).toBe(2);
-    expect(entries[0].message).toBe('first');
-    expect(entries[1].message).toBe('second');
-  });
-});
-
-// ─── writeToInbox ────────────────────────────────────────────────
-
-describe('writeToInbox', () => {
-  test('creates .context/sidebar-inbox/ directory', () => {
-    writeToInbox(tmpDir, 'test message');
-    const inboxDir = path.join(tmpDir, '.context', 'sidebar-inbox');
-    expect(fs.existsSync(inboxDir)).toBe(true);
-    expect(fs.statSync(inboxDir).isDirectory()).toBe(true);
-  });
-
-  test('writes valid JSON file', () => {
-    const filePath = writeToInbox(tmpDir, 'test message', 'https://example.com', 'session-123');
-    expect(filePath).not.toBeNull();
-    expect(fs.existsSync(filePath!)).toBe(true);
-
-    const data = JSON.parse(fs.readFileSync(filePath!, 'utf-8'));
-    expect(data.type).toBe('observation');
-    expect(data.userMessage).toBe('test message');
-    expect(data.page.url).toBe('https://example.com');
-    expect(data.sidebarSessionId).toBe('session-123');
-    expect(data.timestamp).toBeTruthy();
-  });
-
-  test('atomic write — final file exists, no .tmp left', () => {
-    const filePath = writeToInbox(tmpDir, 'atomic test');
-    expect(filePath).not.toBeNull();
-    expect(fs.existsSync(filePath!)).toBe(true);
-
-    // Check no .tmp files remain in the inbox directory
-    const inboxDir = path.join(tmpDir, '.context', 'sidebar-inbox');
-    const files = fs.readdirSync(inboxDir);
-    const tmpFiles = files.filter(f => f.endsWith('.tmp'));
-    expect(tmpFiles.length).toBe(0);
-
-    // Final file should end with -observation.json
-    const jsonFiles = files.filter(f => f.endsWith('-observation.json') && !f.startsWith('.'));
-    expect(jsonFiles.length).toBe(1);
-  });
-
-  test('handles missing git root gracefully', () => {
-    const result = writeToInbox('', 'test');
-    expect(result).toBeNull();
-  });
-
-  test('defaults pageUrl to unknown when not provided', () => {
-    const filePath = writeToInbox(tmpDir, 'no url provided');
-    expect(filePath).not.toBeNull();
-    const data = JSON.parse(fs.readFileSync(filePath!, 'utf-8'));
-    expect(data.page.url).toBe('unknown');
-  });
-
-  test('defaults sessionId to unknown when not provided', () => {
-    const filePath = writeToInbox(tmpDir, 'no session');
-    expect(filePath).not.toBeNull();
-    const data = JSON.parse(fs.readFileSync(filePath!, 'utf-8'));
-    expect(data.sidebarSessionId).toBe('unknown');
-  });
-
-  test('multiple writes create separate files', () => {
-    writeToInbox(tmpDir, 'message 1');
-    // Tiny delay to ensure different timestamps
-    const t = Date.now();
-    while (Date.now() === t) {} // spin until next ms
-    writeToInbox(tmpDir, 'message 2');
-
-    const inboxDir = path.join(tmpDir, '.context', 'sidebar-inbox');
-    const files = fs.readdirSync(inboxDir).filter(f => f.endsWith('.json') && !f.startsWith('.'));
-    expect(files.length).toBe(2);
-  });
-});
-
-// ─── describeToolCall (verbose narration) ────────────────────────
-
-describe('describeToolCall', () => {
-  // Browse navigation commands
-  test('goto → plain English with URL', () => {
-    const result = describeToolCall('Bash', { command: '$B goto https://example.com' });
-    expect(result).toBe('Opening https://example.com');
-  });
-
-  test('goto strips quotes from URL', () => {
-    const result = describeToolCall('Bash', { command: '$B goto "https://example.com"' });
-    expect(result).toBe('Opening https://example.com');
-  });
-
-  test('url → checking current URL', () => {
-    expect(describeToolCall('Bash', { command: '$B url' })).toBe('Checking current URL');
-  });
-
-  test('back/forward/reload → plain English', () => {
-    expect(describeToolCall('Bash', { command: '$B back' })).toBe('Going back');
-    expect(describeToolCall('Bash', { command: '$B forward' })).toBe('Going forward');
-    expect(describeToolCall('Bash', { command: '$B reload' })).toBe('Reloading the page');
-  });
-
-  // Snapshot variants
-  test('snapshot -i → scanning for interactive elements', () => {
-    expect(describeToolCall('Bash', { command: '$B snapshot -i' })).toBe('Scanning for interactive elements');
-  });
-
-  test('snapshot -D → checking what changed', () => {
-    expect(describeToolCall('Bash', { command: '$B snapshot -D' })).toBe('Checking what changed');
-  });
-
-  test('snapshot (plain) → taking a snapshot', () => {
-    expect(describeToolCall('Bash', { command: '$B snapshot' })).toBe('Taking a snapshot of the page');
-  });
-
-  // Interaction commands
-  test('click → clicking element', () => {
-    expect(describeToolCall('Bash', { command: '$B click @e3' })).toBe('Clicking @e3');
-  });
-
-  test('fill → typing into element', () => {
-    expect(describeToolCall('Bash', { command: '$B fill @e4 "hello world"' })).toBe('Typing ""hello world"" into @e4');
-  });
-
-  test('scroll with selector → scrolling to element', () => {
-    expect(describeToolCall('Bash', { command: '$B scroll .footer' })).toBe('Scrolling to .footer');
-  });
-
-  test('scroll without args → scrolling down', () => {
-    expect(describeToolCall('Bash', { command: '$B scroll' })).toBe('Scrolling down');
-  });
-
-  // Reading commands
-  test('text → reading page text', () => {
-    expect(describeToolCall('Bash', { command: '$B text' })).toBe('Reading page text');
-  });
-
-  test('html with selector → reading HTML of element', () => {
-    expect(describeToolCall('Bash', { command: '$B html .header' })).toBe('Reading HTML of .header');
-  });
-
-  test('html without selector → reading full page HTML', () => {
-    expect(describeToolCall('Bash', { command: '$B html' })).toBe('Reading full page HTML');
-  });
-
-  test('links → finding all links', () => {
-    expect(describeToolCall('Bash', { command: '$B links' })).toBe('Finding all links on the page');
-  });
-
-  test('console → checking console', () => {
-    expect(describeToolCall('Bash', { command: '$B console' })).toBe('Checking browser console for errors');
-  });
-
-  // Inspector commands
-  test('inspect with selector → inspecting CSS', () => {
-    expect(describeToolCall('Bash', { command: '$B inspect .header' })).toBe('Inspecting CSS of .header');
-  });
-
-  test('inspect without args → getting last picked element', () => {
-    expect(describeToolCall('Bash', { command: '$B inspect' })).toBe('Getting CSS for last picked element');
-  });
-
-  test('style → changing CSS', () => {
-    expect(describeToolCall('Bash', { command: '$B style .header color red' })).toBe('Changing CSS: .header color red');
-  });
-
-  test('cleanup → removing page clutter', () => {
-    expect(describeToolCall('Bash', { command: '$B cleanup --all' })).toBe('Removing page clutter (ads, popups, banners)');
-  });
-
-  // Visual commands
-  test('screenshot → saving screenshot', () => {
-    expect(describeToolCall('Bash', { command: '$B screenshot /tmp/shot.png' })).toBe('Saving screenshot to /tmp/shot.png');
-  });
-
-  test('screenshot without path', () => {
-    expect(describeToolCall('Bash', { command: '$B screenshot' })).toBe('Saving screenshot');
-  });
-
-  test('responsive → multi-size screenshots', () => {
-    expect(describeToolCall('Bash', { command: '$B responsive' })).toBe('Taking screenshots at mobile, tablet, and desktop sizes');
-  });
-
-  // Non-browse tools
-  test('Read tool → reading file', () => {
-    expect(describeToolCall('Read', { file_path: '/Users/foo/project/src/app.ts' })).toBe('Reading ~/project/src/app.ts');
-  });
-
-  test('Grep tool → searching for pattern', () => {
-    expect(describeToolCall('Grep', { pattern: 'handleClick' })).toBe('Searching for "handleClick"');
-  });
-
-  test('Glob tool → finding files', () => {
-    expect(describeToolCall('Glob', { pattern: '**/*.tsx' })).toBe('Finding files matching **/*.tsx');
-  });
-
-  test('Edit tool → editing file', () => {
-    expect(describeToolCall('Edit', { file_path: '/Users/foo/src/main.ts' })).toBe('Editing ~/src/main.ts');
-  });
-
-  // Edge cases
-  test('null input → empty string', () => {
-    expect(describeToolCall('Bash', null)).toBe('');
-  });
-
-  test('unknown browse command → generic description', () => {
-    expect(describeToolCall('Bash', { command: '$B newtab https://foo.com' })).toContain('newtab');
-  });
-
-  test('non-browse bash → shortened command', () => {
-    expect(describeToolCall('Bash', { command: 'echo hello' })).toBe('echo hello');
-  });
-
-  test('full browse binary path recognized', () => {
-    const result = describeToolCall('Bash', { command: '/Users/garrytan/.claude/skills/gstack/browse/dist/browse goto https://example.com' });
-    expect(result).toBe('Opening https://example.com');
-  });
-
-  test('tab command → switching tab', () => {
-    expect(describeToolCall('Bash', { command: '$B tab 2' })).toContain('tab');
-  });
-});
-
-// ─── Per-tab agent concurrency (source code validation) ──────────
-
-describe('per-tab agent concurrency', () => {
-  const serverSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'server.ts'), 'utf-8');
-  const agentSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'sidebar-agent.ts'), 'utf-8');
-
-  test('server has per-tab agent state map', () => {
-    expect(serverSrc).toContain('tabAgents');
-    expect(serverSrc).toContain('TabAgentState');
-    expect(serverSrc).toContain('getTabAgent');
-  });
-
-  test('server returns per-tab agent status in /sidebar-chat', () => {
-    expect(serverSrc).toContain('getTabAgentStatus');
-    expect(serverSrc).toContain('tabAgentStatus');
-  });
-
-  test('spawnClaude accepts forTabId parameter', () => {
-    const spawnFn = serverSrc.slice(
-      serverSrc.indexOf('function spawnClaude('),
-      serverSrc.indexOf('\nfunction ', serverSrc.indexOf('function spawnClaude(') + 1),
-    );
-    expect(spawnFn).toContain('forTabId');
-    expect(spawnFn).toContain('tabState.status');
-  });
-
-  test('sidebar-command endpoint uses per-tab agent state', () => {
-    expect(serverSrc).toContain('msgTabId');
-    expect(serverSrc).toContain('tabState.status');
-    expect(serverSrc).toContain('tabState.queue');
-  });
-
-  test('agent event handler resets per-tab state', () => {
-    expect(serverSrc).toContain('eventTabId');
-    expect(serverSrc).toContain('tabState.status = \'idle\'');
-  });
-
-  test('agent event handler processes per-tab queue', () => {
-    // After agent_done, should process next message from THIS tab's queue
-    expect(serverSrc).toContain('tabState.queue.length > 0');
-    expect(serverSrc).toContain('tabState.queue.shift');
-  });
-
-  test('sidebar-agent uses per-tab processing set', () => {
-    expect(agentSrc).toContain('processingTabs');
-    expect(agentSrc).not.toContain('isProcessing');
-  });
-
-  test('sidebar-agent sends tabId with all events', () => {
-    // sendEvent should accept tabId parameter
-    expect(agentSrc).toContain('async function sendEvent(event: Record<string, any>, tabId?: number)');
-    // askClaude destructures tabId from queue entry (regex tolerates
-    // additional fields like `canary` and `pageUrl` from security module).
-    expect(agentSrc).toMatch(
-      /const \{[^}]*\bprompt\b[^}]*\bargs\b[^}]*\bstateFile\b[^}]*\bcwd\b[^}]*\btabId\b[^}]*\}/
-    );
-  });
-
-  test('sidebar-agent allows concurrent agents across tabs', () => {
-    // poll() should not block globally — it should check per-tab
-    expect(agentSrc).toContain('processingTabs.has(tid)');
-    // askClaude should be fire-and-forget (no await blocking the loop)
-    expect(agentSrc).toContain('askClaude(entry).catch');
-  });
-
-  test('queue entries include tabId', () => {
-    const spawnFn = serverSrc.slice(
-      serverSrc.indexOf('function spawnClaude('),
-      serverSrc.indexOf('\nfunction ', serverSrc.indexOf('function spawnClaude(') + 1),
-    );
-    expect(spawnFn).toContain('tabId: agentTabId');
-  });
-
-  test('health check monitors all per-tab agents', () => {
-    expect(serverSrc).toContain('for (const [tid, state] of tabAgents)');
-  });
-});
-
-describe('BROWSE_TAB tab pinning (cross-tab isolation)', () => {
-  const serverSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'server.ts'), 'utf-8');
-  const agentSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'sidebar-agent.ts'), 'utf-8');
-  const cliSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'cli.ts'), 'utf-8');
-
-  test('sidebar-agent passes BROWSE_TAB env var to claude process', () => {
-    // The env block should include BROWSE_TAB set to the tab ID
-    expect(agentSrc).toContain('BROWSE_TAB');
-    expect(agentSrc).toContain('String(tid)');
-  });
-
-  test('CLI reads BROWSE_TAB and sends tabId in command body', () => {
-    // BROWSE_TAB env var is still honored (sidebar-agent path). After the
-    // make-pdf refactor, the CLI layer now also accepts --tab-id <N>, with
-    // the CLI flag taking precedence over the env var. Both resolve to the
-    // same `tabId` body field.
-    expect(cliSrc).toContain('process.env.BROWSE_TAB');
-    expect(cliSrc).toContain('parseInt(envTab, 10)');
-  });
-
-  test('handleCommandInternal accepts tabId from request body', () => {
-    const handleFn = serverSrc.slice(
-      serverSrc.indexOf('async function handleCommandInternal('),
-      serverSrc.indexOf('\n/** HTTP wrapper', serverSrc.indexOf('async function handleCommandInternal(') + 1) > 0
-        ? serverSrc.indexOf('\n/** HTTP wrapper', serverSrc.indexOf('async function handleCommandInternal(') + 1)
-        : serverSrc.indexOf('\nasync function ', serverSrc.indexOf('async function handleCommandInternal(') + 200),
-    );
-    // Should destructure tabId from body
-    expect(handleFn).toContain('tabId');
-    // Should save and restore the active tab
-    expect(handleFn).toContain('savedTabId');
-    expect(handleFn).toContain('switchTab(tabId');
-  });
-
-  test('handleCommandInternal restores active tab after command (success path)', () => {
-    // On success, should restore savedTabId without stealing focus
-    const handleFn = serverSrc.slice(
-      serverSrc.indexOf('async function handleCommandInternal('),
-      serverSrc.length,
-    );
-    // Count restore calls — should appear in both success and error paths
-    const restoreCount = (handleFn.match(/switchTab\(savedTabId/g) || []).length;
-    expect(restoreCount).toBeGreaterThanOrEqual(2); // success + error paths
-  });
-
-  test('handleCommandInternal restores active tab on error path', () => {
-    // The catch block should also restore
-    const catchBlock = serverSrc.slice(
-      serverSrc.indexOf('} catch (err: any) {', serverSrc.indexOf('async function handleCommandInternal(')),
-    );
-    expect(catchBlock).toContain('switchTab(savedTabId');
-  });
-
-  test('tab pinning only activates when tabId is provided', () => {
-    const handleFn = serverSrc.slice(
-      serverSrc.indexOf('async function handleCommandInternal('),
-      serverSrc.indexOf('try {', serverSrc.indexOf('async function handleCommandInternal(') + 1),
-    );
-    // Should check tabId is not undefined/null before switching
-    expect(handleFn).toContain('tabId !== undefined');
-    expect(handleFn).toContain('tabId !== null');
-  });
-
-  test('CLI only sends tabId when it is a valid number', () => {
-    // Body should conditionally include tabId. Historically that was keyed off
-    // the BROWSE_TAB env var. After the make-pdf refactor, the CLI also honors
-    // a --tab-id <N> flag on the CLI itself, so the check is "tabId defined
-    // AND not NaN" rather than literally inspecting the env var.
-    expect(cliSrc).toContain('tabId !== undefined && !isNaN(tabId)');
-  });
-});
@@ -0,0 +1,256 @@
+/**
+ * Regression: sidebar layout invariants after the chat-tab rip.
+ *
+ * The Chrome side panel used to host two surfaces: Chat (one-shot
+ * `claude -p` queue) and Terminal (interactive PTY). Chat was ripped
+ * once the PTY proved out — sidebar-agent.ts is gone, the chat queue
+ * endpoints are gone, and the primary-tab nav (Terminal | Chat) is
+ * gone. Terminal is now the sole primary surface.
+ *
+ * This file locks the load-bearing invariants of that layout so a
+ * future refactor can't silently re-introduce the old surface or break
+ * the new one.
+ */
+
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
+
+const HTML = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel.html'), 'utf-8');
+const JS = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel.js'), 'utf-8');
+const TERM_JS = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel-terminal.js'), 'utf-8');
+const MANIFEST = JSON.parse(fs.readFileSync(path.join(import.meta.dir, '../../extension/manifest.json'), 'utf-8'));
+
+describe('sidebar: chat tab + nav are removed, Terminal is sole primary surface', () => {
+  test('No primary-tab nav element exists', () => {
+    expect(HTML).not.toContain('class="primary-tabs"');
+    expect(HTML).not.toContain('data-pane="chat"');
+    expect(HTML).not.toContain('data-pane="terminal"');
+  });
+
+  test('No <main id="tab-chat"> pane', () => {
+    expect(HTML).not.toMatch(/<main[^>]*id="tab-chat"/);
+    expect(HTML).not.toContain('id="chat-messages"');
+    expect(HTML).not.toContain('id="chat-loading"');
+    expect(HTML).not.toContain('id="chat-welcome"');
+  });
+
+  test('No chat input / send button / experimental banner', () => {
+    expect(HTML).not.toContain('class="command-bar"');
+    expect(HTML).not.toContain('id="command-input"');
+    expect(HTML).not.toContain('id="send-btn"');
+    expect(HTML).not.toContain('id="stop-agent-btn"');
+    expect(HTML).not.toContain('id="experimental-banner"');
+  });
+
+  test('No clear-chat button in footer', () => {
+    expect(HTML).not.toContain('id="clear-chat"');
+  });
+
+  test('Terminal pane is .active by default and has the toolbar', () => {
+    expect(HTML).toMatch(/<main[^>]*id="tab-terminal"[^>]*class="tab-content active"/);
+    expect(HTML).toContain('id="terminal-toolbar"');
+    expect(HTML).toContain('id="terminal-restart-now"');
+  });
+
+  test('Quick-actions buttons (Cleanup / Screenshot / Cookies) survive in the terminal toolbar', () => {
+    // Garry explicitly wanted these kept after the chat rip — they drive
+    // browser actions, not chat.
+    expect(HTML).toContain('id="chat-cleanup-btn"');
+    expect(HTML).toContain('id="chat-screenshot-btn"');
+    expect(HTML).toContain('id="chat-cookies-btn"');
+    // They live inside the terminal toolbar now (siblings of the Restart
+    // button), not as a separate strip below all panes.
+    const toolbarStart = HTML.indexOf('id="terminal-toolbar"');
+    const toolbarEnd = HTML.indexOf('</div>', toolbarStart);
+    const toolbarBlock = HTML.slice(toolbarStart, toolbarEnd + 6);
+    expect(toolbarBlock).toContain('id="chat-cleanup-btn"');
+    expect(toolbarBlock).toContain('id="chat-screenshot-btn"');
+    expect(toolbarBlock).toContain('id="chat-cookies-btn"');
+  });
+});
+
+describe('sidepanel.js: chat helpers ripped, terminal-injection helper survives', () => {
+  test('No primary-tab click handler', () => {
+    expect(JS).not.toContain("querySelectorAll('.primary-tab')");
+    expect(JS).not.toContain('activePrimaryPaneId');
+  });
+
+  test('No chat polling, sendMessage, sendChat, stopAgent, or pollTabs', () => {
+    expect(JS).not.toContain('chatPollInterval');
+    expect(JS).not.toContain('function sendMessage');
+    expect(JS).not.toContain('function pollChat');
+    expect(JS).not.toContain('function pollTabs');
+    expect(JS).not.toContain('function switchChatTab');
+    expect(JS).not.toContain('function stopAgent');
+    expect(JS).not.toContain('function applyChatEnabled');
+    expect(JS).not.toContain('function showSecurityBanner');
+  });
+
+  test('Cleanup runs through the live PTY (no /sidebar-command POST)', () => {
+    // The new Cleanup handler injects the prompt straight into claude's
+    // PTY via gstackInjectToTerminal. The dead code path was a POST to
+    // /sidebar-command which kicked off a fresh claude -p subprocess.
+    const cleanup = JS.slice(JS.indexOf('async function runCleanup'));
+    expect(cleanup).toContain('window.gstackInjectToTerminal');
+    expect(cleanup).not.toContain('/sidebar-command');
+    expect(cleanup).not.toContain('addChatEntry');
+  });
+
+  test('Inspector "Send to Code" routes through the live PTY', () => {
+    const sendBtn = JS.slice(JS.indexOf('inspectorSendBtn.addEventListener'));
+    expect(sendBtn).toContain('window.gstackInjectToTerminal');
+    expect(sendBtn).not.toContain("type: 'sidebar-command'");
+  });
+
+  test('updateConnection no longer kicks off chat / tab polling', () => {
+    const update = JS.slice(JS.indexOf('function updateConnection'), JS.indexOf('function updateConnection') + 1500);
+    expect(update).not.toContain('chatPollInterval');
+    expect(update).not.toContain('tabPollInterval');
+    expect(update).not.toContain('pollChat');
+    expect(update).not.toContain('pollTabs');
+    // BUT must still expose the bootstrap globals for sidepanel-terminal.js.
+    expect(update).toContain('window.gstackServerPort');
+    expect(update).toContain('window.gstackAuthToken');
+  });
+});
+
+describe('sidepanel-terminal.js: eager auto-connect + injection API', () => {
+  test('Exposes window.gstackInjectToTerminal for cross-pane use', () => {
+    expect(TERM_JS).toContain('window.gstackInjectToTerminal');
+    // Returns false when no live session, true when bytes go out.
+    const inject = TERM_JS.slice(TERM_JS.indexOf('window.gstackInjectToTerminal'));
+    expect(inject).toContain('return false');
+    expect(inject).toContain('return true');
+    expect(inject).toContain('ws.readyState !== WebSocket.OPEN');
+  });
+
+  test('Auto-connects on init (no keypress required)', () => {
+    expect(TERM_JS).not.toContain('function onAnyKey');
+    expect(TERM_JS).not.toContain("addEventListener('keydown'");
+    expect(TERM_JS).toContain('function tryAutoConnect');
+  });
+
+  test('Repaint hook fires when Terminal pane becomes visible', () => {
+    // The chat-tab rip removed gstack:primary-tab-changed; we use a
+    // MutationObserver on #tab-terminal's class attr instead. The
+    // observer must call repaintIfLive when the .active class returns.
+    expect(TERM_JS).toContain('MutationObserver');
+    expect(TERM_JS).toContain("attributeFilter: ['class']");
+    expect(TERM_JS).toContain('repaintIfLive');
+    const repaint = TERM_JS.slice(TERM_JS.indexOf('function repaintIfLive'));
+    expect(repaint).toContain('fitAddon && fitAddon.fit()');
+    expect(repaint).toContain('term.refresh');
+    expect(repaint).toContain("type: 'resize'");
+  });
+
+  test('No auto-reconnect on close (Restart is user-initiated)', () => {
+    const closeOnly = TERM_JS.slice(
+      TERM_JS.indexOf("ws.addEventListener('close'"),
+      TERM_JS.indexOf("ws.addEventListener('error'"),
+    );
+    expect(closeOnly).not.toContain('setTimeout');
+    expect(closeOnly).not.toContain('tryAutoConnect');
+    expect(closeOnly).not.toContain('connect()');
+  });
+
+  test('forceRestart helper closes ws, disposes xterm, returns to IDLE', () => {
+    expect(TERM_JS).toContain('function forceRestart');
+    const fn = TERM_JS.slice(TERM_JS.indexOf('function forceRestart'));
+    expect(fn).toContain('ws && ws.close()');
+    expect(fn).toContain('term.dispose()');
+    expect(fn).toContain('STATE.IDLE');
+    expect(fn).toContain('tryAutoConnect()');
+  });
+
+  test('Both restart buttons (mid-session and ENDED) call forceRestart', () => {
+    expect(TERM_JS).toContain("els.restart?.addEventListener('click', forceRestart)");
+    expect(TERM_JS).toContain("els.restartNow?.addEventListener('click', forceRestart)");
+  });
+});
+
+describe('server.ts: chat / sidebar-agent endpoints are gone', () => {
+  const SERVER_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/server.ts'), 'utf-8');
+
+  test('No /sidebar-command, /sidebar-chat, /sidebar-agent/* routes', () => {
+    expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-command['"]/);
+    expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-chat['"]/);
+    expect(SERVER_SRC).not.toMatch(/url\.pathname\.startsWith\(['"]\/sidebar-agent\//);
+    expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-agent\/event['"]/);
+    expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-tabs['"]/);
+    expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-session['"]/);
+  });
+
+  test('No chat-related state declarations or helpers', () => {
+    // Allow the symbol names inside the rip-marker comments — but no
+    // `let`, `const`, `function`, or `interface` declarations of them.
+    expect(SERVER_SRC).not.toMatch(/^let agentProcess/m);
+    expect(SERVER_SRC).not.toMatch(/^let agentStatus/m);
+    expect(SERVER_SRC).not.toMatch(/^let messageQueue/m);
+    expect(SERVER_SRC).not.toMatch(/^let sidebarSession/m);
+    expect(SERVER_SRC).not.toMatch(/^const tabAgents/m);
+    expect(SERVER_SRC).not.toMatch(/^function pickSidebarModel/m);
+    expect(SERVER_SRC).not.toMatch(/^function processAgentEvent/m);
+    expect(SERVER_SRC).not.toMatch(/^function killAgent/m);
+    expect(SERVER_SRC).not.toMatch(/^function addChatEntry/m);
+    expect(SERVER_SRC).not.toMatch(/^interface ChatEntry/m);
+    expect(SERVER_SRC).not.toMatch(/^interface SidebarSession/m);
+  });
+
+  test('/health no longer surfaces agentStatus or messageQueue length', () => {
+    const health = SERVER_SRC.slice(SERVER_SRC.indexOf("url.pathname === '/health'"));
+    const slice = health.slice(0, 2000);
+    expect(slice).not.toContain('agentStatus');
+    expect(slice).not.toContain('messageQueue');
+    expect(slice).not.toContain('agentStartTime');
+    // chatEnabled is hardcoded false now (older clients still see the field).
+    expect(slice).toMatch(/chatEnabled:\s*false/);
+    // terminalPort survives.
+    expect(slice).toContain('terminalPort');
+  });
+});
+
+describe('cli.ts: sidebar-agent is no longer spawned', () => {
+  const CLI_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/cli.ts'), 'utf-8');
+
+  test('No Bun.spawn of sidebar-agent.ts', () => {
+    expect(CLI_SRC).not.toMatch(/Bun\.spawn\(\s*\['bun',\s*'run',\s*\w*[Aa]gent[Ss]cript\][\s\S]{0,300}sidebar-agent/);
+    // The variable name `agentScript` was for sidebar-agent. After the
+    // rip there's only termAgentScript. Allow comments to mention the
+    // history but not active spawn calls.
+    expect(CLI_SRC).not.toMatch(/^\s*let agentScript = path\.resolve/m);
+  });
+
+  test('Terminal-agent spawn survives', () => {
+    expect(CLI_SRC).toContain('terminal-agent.ts');
+    expect(CLI_SRC).toMatch(/Bun\.spawn\(\['bun',\s*'run',\s*termAgentScript\]/);
+  });
+});
+
+describe('files: sidebar-agent.ts and its tests are deleted', () => {
+  test('browse/src/sidebar-agent.ts is gone', () => {
+    expect(fs.existsSync(path.join(import.meta.dir, '../src/sidebar-agent.ts'))).toBe(false);
+  });
+
+  test('sidebar-agent test files are gone', () => {
+    expect(fs.existsSync(path.join(import.meta.dir, 'sidebar-agent.test.ts'))).toBe(false);
+    expect(fs.existsSync(path.join(import.meta.dir, 'sidebar-agent-roundtrip.test.ts'))).toBe(false);
+  });
+});
+
+describe('manifest: ws permission + xterm-safe CSP', () => {
+  test('host_permissions covers ws localhost', () => {
+    expect(MANIFEST.host_permissions).toContain('ws://127.0.0.1:*/');
+  });
+
+  test('host_permissions still covers http localhost', () => {
+    expect(MANIFEST.host_permissions).toContain('http://127.0.0.1:*/');
+  });
+
+  test('manifest does NOT add unsafe-eval to extension_pages CSP', () => {
+    const csp = MANIFEST.content_security_policy;
+    if (csp && csp.extension_pages) {
+      expect(csp.extension_pages).not.toContain('unsafe-eval');
+    }
+  });
+});
@@ -0,0 +1,196 @@
+/**
+ * tab-each — fan-out command for the live Terminal pane.
+ *
+ * Source-level guards: command is registered, has a description + usage,
+ * scope-check the inner command, restore the original active tab in a
+ * finally block (so a mid-batch exception doesn't leave the user looking
+ * at a tab they didn't choose).
+ *
+ * Behavioral logic test: drive handleMetaCommand directly with a mock
+ * BrowserManager + executeCommand callback. Verify the iteration order,
+ * the JSON shape, the tab restore, and the chrome:// skip.
+ */
+
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
+import { handleMetaCommand } from '../src/meta-commands';
+import { META_COMMANDS, COMMAND_DESCRIPTIONS } from '../src/commands';
+
+const META_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/meta-commands.ts'), 'utf-8');
+
+describe('tab-each: registration', () => {
+  test('command is in META_COMMANDS', () => {
+    expect(META_COMMANDS.has('tab-each')).toBe(true);
+  });
+
+  test('has a description and usage entry', () => {
+    expect(COMMAND_DESCRIPTIONS['tab-each']).toBeDefined();
+    expect(COMMAND_DESCRIPTIONS['tab-each'].usage).toContain('tab-each');
+    expect(COMMAND_DESCRIPTIONS['tab-each'].category).toBe('Tabs');
+  });
+});
+
+describe('tab-each: source-level guards', () => {
+  test('scope-checks the inner command before fanning out', () => {
+    const block = META_SRC.slice(META_SRC.indexOf("case 'tab-each':"));
+    expect(block).toContain('checkScope(tokenInfo, innerName)');
+    // The scope check must run BEFORE the for-loop. If it ran inside the
+    // loop, a permission failure on the second tab would leave the first
+    // tab already mutated.
+    const checkIdx = block.indexOf('checkScope(tokenInfo, innerName)');
+    const loopIdx = block.indexOf('for (const tab of tabs)');
+    expect(checkIdx).toBeLessThan(loopIdx);
+  });
+
+  test('restores the original active tab in a finally block', () => {
+    const block = META_SRC.slice(META_SRC.indexOf("case 'tab-each':"), META_SRC.indexOf("case 'tab-each':") + 4000);
+    expect(block).toContain('finally');
+    expect(block).toContain('originalActive');
+    expect(block).toContain('switchTab(originalActive');
+  });
+
+  test('uses bringToFront: false so the OS window does NOT jump', () => {
+    const block = META_SRC.slice(META_SRC.indexOf("case 'tab-each':"), META_SRC.indexOf("case 'tab-each':") + 4000);
+    // tab-each is a background operation — pulling focus would steal the
+    // user's foreground app every time claude fans out, which is
+    // unacceptable.
+    expect(block).toContain('bringToFront: false');
+  });
+
+  test('skips chrome:// and chrome-extension:// internal pages', () => {
+    const block = META_SRC.slice(META_SRC.indexOf("case 'tab-each':"), META_SRC.indexOf("case 'tab-each':") + 4000);
+    expect(block).toContain("startsWith('chrome://')");
+    expect(block).toContain("startsWith('chrome-extension://')");
+  });
+});
+
+describe('tab-each: behavior', () => {
+  function mockBm(tabs: Array<{ id: number; url: string; title: string; active: boolean }>) {
+    let activeId = tabs.find(t => t.active)?.id ?? tabs[0]?.id ?? 0;
+    const switched: number[] = [];
+    return {
+      __switched: switched,
+      __activeId: () => activeId,
+      getActiveSession: () => ({}),
+      getActiveTabId: () => activeId,
+      getTabListWithTitles: async () => tabs.map(t => ({ ...t })),
+      switchTab: (id: number, _opts?: any) => { switched.push(id); activeId = id; },
+    } as any;
+  }
+
+  test('iterates every tab, calls executeCommand for each, returns JSON results', async () => {
+    const tabs = [
+      { id: 1, url: 'https://news.example.com', title: 'News', active: true },
+      { id: 2, url: 'https://docs.example.com', title: 'Docs', active: false },
+      { id: 3, url: 'https://github.com', title: 'GitHub', active: false },
+    ];
+    const bm = mockBm(tabs);
+    const calls: Array<{ command: string; args?: string[]; tabId?: number }> = [];
+    const out = await handleMetaCommand(
+      'tab-each',
+      ['snapshot', '-i'],
+      bm,
+      async () => {},
+      null,
+      {
+        executeCommand: async (body) => {
+          calls.push(body);
+          return { status: 200, result: `snap-of-${body.tabId}` };
+        },
+      },
+    );
+
+    const parsed = JSON.parse(out);
+    expect(parsed.command).toBe('snapshot');
+    expect(parsed.args).toEqual(['-i']);
+    expect(parsed.total).toBe(3);
+    expect(parsed.results.map((r: any) => r.tabId)).toEqual([1, 2, 3]);
+    expect(parsed.results.every((r: any) => r.status === 200)).toBe(true);
+    expect(parsed.results[0].output).toBe('snap-of-1');
+
+    // Inner command was dispatched 3 times, once per tab, with the right tabId.
+    expect(calls).toHaveLength(3);
+    expect(calls.map(c => c.tabId)).toEqual([1, 2, 3]);
+    expect(calls.every(c => c.command === 'snapshot')).toBe(true);
+  });
+
+  test('skips chrome:// pages with status=0 + "skipped" output', async () => {
+    const tabs = [
+      { id: 1, url: 'chrome://newtab', title: 'New Tab', active: true },
+      { id: 2, url: 'https://example.com', title: 'Example', active: false },
+      { id: 3, url: 'chrome-extension://abc/page.html', title: 'Ext', active: false },
+    ];
+    const bm = mockBm(tabs);
+    const calls: any[] = [];
+    const out = await handleMetaCommand(
+      'tab-each',
+      ['text'],
+      bm,
+      async () => {},
+      null,
+      {
+        executeCommand: async (body) => {
+          calls.push(body);
+          return { status: 200, result: `text-of-${body.tabId}` };
+        },
+      },
+    );
+
+    const parsed = JSON.parse(out);
+    expect(parsed.total).toBe(3);
+    // chrome:// and chrome-extension:// → skipped (status 0).
+    expect(parsed.results[0].status).toBe(0);
+    expect(parsed.results[0].output).toContain('skipped');
+    expect(parsed.results[2].status).toBe(0);
+    // Only the real tab dispatched.
+    expect(calls).toHaveLength(1);
+    expect(calls[0].tabId).toBe(2);
+  });
+
+  test('restores the originally active tab even if a tab errors', async () => {
+    const tabs = [
+      { id: 10, url: 'https://a.example', title: 'A', active: false },
+      { id: 20, url: 'https://b.example', title: 'B', active: true }, // initially active
+      { id: 30, url: 'https://c.example', title: 'C', active: false },
+    ];
+    const bm = mockBm(tabs);
+    let calls = 0;
+    const out = await handleMetaCommand(
+      'tab-each',
+      ['text'],
+      bm,
+      async () => {},
+      null,
+      {
+        executeCommand: async (body) => {
+          calls++;
+          if (body.tabId === 20) {
+            return { status: 500, result: JSON.stringify({ error: 'boom' }) };
+          }
+          return { status: 200, result: `ok-${body.tabId}` };
+        },
+      },
+    );
+
+    const parsed = JSON.parse(out);
+    expect(parsed.results.find((r: any) => r.tabId === 20).status).toBe(500);
+    expect(parsed.results.find((r: any) => r.tabId === 20).output).toBe('boom');
+    expect(parsed.results.find((r: any) => r.tabId === 10).status).toBe(200);
+    expect(parsed.results.find((r: any) => r.tabId === 30).status).toBe(200);
+    // Active tab restored to 20 (the one that was active when we started).
+    expect(bm.__activeId()).toBe(20);
+  });
+
+  test('throws on empty args (no inner command)', async () => {
+    const bm = mockBm([{ id: 1, url: 'https://x.example', title: 'X', active: true }]);
+    await expect(handleMetaCommand(
+      'tab-each',
+      [],
+      bm,
+      async () => {},
+      null,
+      { executeCommand: async () => ({ status: 200, result: '' }) },
+    )).rejects.toThrow(/Usage/);
+  });
+});
@@ -0,0 +1,273 @@
+/**
+ * Integration tests for terminal-agent.ts.
+ *
+ * Spawns the agent as a real subprocess in a temp state directory,
+ * exercises:
+ *   1. /internal/grant — loopback handshake with the internal token.
+ *   2. /ws Origin gate — non-extension Origin → 403.
+ *   3. /ws cookie gate — missing/invalid cookie → 401.
+ *   4. /ws full PTY round-trip — write `echo hi\n`, read `hi`.
+ *   5. resize control message — terminal accepts and stays alive.
+ *   6. close behavior — sending close terminates the PTY child.
+ *
+ * Uses /bin/bash via BROWSE_TERMINAL_BINARY override so CI doesn't need
+ * the `claude` binary installed.
+ */
+
+import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
+import * as os from 'os';
+
+const AGENT_SCRIPT = path.join(import.meta.dir, '../src/terminal-agent.ts');
+const BASH = '/bin/bash';
+
+let stateDir: string;
+let agentProc: any;
+let agentPort: number;
+let internalToken: string;
+
+function readPortFile(): number {
+  for (let i = 0; i < 50; i++) {
+    try {
+      const v = parseInt(fs.readFileSync(path.join(stateDir, 'terminal-port'), 'utf-8').trim(), 10);
+      if (Number.isFinite(v) && v > 0) return v;
+    } catch {}
+    Bun.sleepSync(40);
+  }
+  throw new Error('terminal-agent never wrote port file');
+}
+
+function readTokenFile(): string {
+  for (let i = 0; i < 50; i++) {
+    try {
+      const t = fs.readFileSync(path.join(stateDir, 'terminal-internal-token'), 'utf-8').trim();
+      if (t.length > 16) return t;
+    } catch {}
+    Bun.sleepSync(40);
+  }
+  throw new Error('terminal-agent never wrote internal token');
+}
+
+beforeAll(() => {
+  stateDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-term-'));
+  const stateFile = path.join(stateDir, 'browse.json');
+  // browse.json must exist so the agent's readBrowseToken doesn't throw.
+  fs.writeFileSync(stateFile, JSON.stringify({ token: 'test-browse-token' }));
+  agentProc = Bun.spawn(['bun', 'run', AGENT_SCRIPT], {
+    env: {
+      ...process.env,
+      BROWSE_STATE_FILE: stateFile,
+      BROWSE_SERVER_PORT: '0', // not used in this test
+      BROWSE_TERMINAL_BINARY: BASH,
+    },
+    stdio: ['ignore', 'pipe', 'pipe'],
+  });
+  agentPort = readPortFile();
+  internalToken = readTokenFile();
+});
+
+afterAll(() => {
+  try { agentProc?.kill?.(); } catch {}
+  try { fs.rmSync(stateDir, { recursive: true, force: true }); } catch {}
+});
+
+async function grantToken(token: string): Promise<Response> {
+  return fetch(`http://127.0.0.1:${agentPort}/internal/grant`, {
+    method: 'POST',
+    headers: {
+      'Content-Type': 'application/json',
+      'Authorization': `Bearer ${internalToken}`,
+    },
+    body: JSON.stringify({ token }),
+  });
+}
+
+describe('terminal-agent: /internal/grant', () => {
+  test('accepts grants signed with the internal token', async () => {
+    const resp = await grantToken('test-cookie-token-very-long-yes');
+    expect(resp.status).toBe(200);
+  });
+
+  test('rejects grants with the wrong internal token', async () => {
+    const resp = await fetch(`http://127.0.0.1:${agentPort}/internal/grant`, {
+      method: 'POST',
+      headers: {
+        'Content-Type': 'application/json',
+        'Authorization': 'Bearer wrong-token',
+      },
+      body: JSON.stringify({ token: 'whatever' }),
+    });
+    expect(resp.status).toBe(403);
+  });
+});
+
+describe('terminal-agent: /ws gates', () => {
+  test('rejects upgrade attempts without an extension Origin', async () => {
+    const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`);
+    expect(resp.status).toBe(403);
+    expect(await resp.text()).toBe('forbidden origin');
+  });
+
+  test('rejects upgrade attempts from a non-extension Origin', async () => {
+    const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
+      headers: { 'Origin': 'https://evil.example.com' },
+    });
+    expect(resp.status).toBe(403);
+  });
+
+  test('rejects extension-Origin upgrades without a granted cookie', async () => {
+    const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
+      headers: {
+        'Origin': 'chrome-extension://abc123',
+        'Cookie': 'gstack_pty=never-granted',
+      },
+    });
+    expect(resp.status).toBe(401);
+  });
+});
+
+describe('terminal-agent: PTY round-trip via real WebSocket (Cookie auth)', () => {
+  test('binary writes go to PTY stdin, output streams back', async () => {
+    const cookie = 'rt-token-must-be-at-least-seventeen-chars-long';
+    const granted = await grantToken(cookie);
+    expect(granted.status).toBe(200);
+
+    const ws = new WebSocket(`ws://127.0.0.1:${agentPort}/ws`, {
+      headers: {
+        'Origin': 'chrome-extension://test-extension-id',
+        'Cookie': `gstack_pty=${cookie}`,
+      },
+    } as any);
+
+    const collected: string[] = [];
+    let opened = false;
+    let closed = false;
+
+    await new Promise<void>((resolve, reject) => {
+      const timer = setTimeout(() => reject(new Error('ws never opened')), 5000);
+      ws.addEventListener('open', () => { opened = true; clearTimeout(timer); resolve(); });
+      ws.addEventListener('error', (e: any) => { clearTimeout(timer); reject(new Error('ws error')); });
+    });
+
+    ws.addEventListener('message', (ev: any) => {
+      if (typeof ev.data === 'string') return; // ignore control frames
+      const buf = ev.data instanceof ArrayBuffer ? new Uint8Array(ev.data) : ev.data;
+      collected.push(new TextDecoder().decode(buf));
+    });
+
+    ws.addEventListener('close', () => { closed = true; });
+
+    // Lazy-spawn trigger: any binary frame causes the agent to spawn /bin/bash.
+    ws.send(new TextEncoder().encode('echo hello-pty-world\nexit\n'));
+
+    // Wait up to 5s for output and shutdown.
+    await new Promise<void>((resolve) => {
+      const start = Date.now();
+      const tick = () => {
+        const joined = collected.join('');
+        if (joined.includes('hello-pty-world')) return resolve();
+        if (Date.now() - start > 5000) return resolve();
+        setTimeout(tick, 50);
+      };
+      tick();
+    });
+
+    expect(opened).toBe(true);
+    const allOutput = collected.join('');
+    expect(allOutput).toContain('hello-pty-world');
+
+    try { ws.close(); } catch {}
+    // Give cleanup a moment.
+    await Bun.sleep(200);
+  });
+
+  test('Sec-WebSocket-Protocol auth path: browser-style upgrade with token in protocol', async () => {
+    // This is the path the actual browser extension takes. Cross-port
+    // SameSite=Strict cookies don't reliably survive the jump from the
+    // browse server (port A) to the agent (port B) when initiated from a
+    // chrome-extension origin, so we send the token via the only auth
+    // header the browser WebSocket API lets us set: Sec-WebSocket-Protocol.
+    //
+    // The browser sends `gstack-pty.<token>` and the agent must:
+    //   1) strip the gstack-pty. prefix
+    //   2) validate the token
+    //   3) ECHO the protocol back in the upgrade response
+    // Without (3) the browser closes the connection immediately, which
+    // is the exact bug the original cookie-only implementation hit in
+    // manual dogfood. This test catches that regression in CI.
+    const token = 'sec-protocol-token-must-be-at-least-seventeen-chars';
+    await grantToken(token);
+
+    // We exercise the protocol path by raw-handshaking via fetch+Upgrade,
+    // because Bun's test-client WebSocket constructor doesn't propagate
+    // `protocols` cleanly when also passed `headers` (the constructor
+    // detects the third-arg form unreliably). Real browsers (Chromium)
+    // use the standard protocols arg fine — the server-side handler is
+    // identical either way, so this test still locks the load-bearing
+    // invariant: the agent accepts a token via Sec-WebSocket-Protocol
+    // and echoes the protocol back so a browser would accept the upgrade.
+    const handshakeKey = 'dGhlIHNhbXBsZSBub25jZQ==';
+    const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
+      headers: {
+        'Connection': 'Upgrade',
+        'Upgrade': 'websocket',
+        'Sec-WebSocket-Version': '13',
+        'Sec-WebSocket-Key': handshakeKey,
+        'Sec-WebSocket-Protocol': `gstack-pty.${token}`,
+        'Origin': 'chrome-extension://test-extension-id',
+      },
+    });
+
+    // 101 Switching Protocols + protocol echoed back = browser would accept.
+    // 401/403/anything else = browser would close the connection immediately
+    // (the bug we hit in manual dogfood).
+    expect(resp.status).toBe(101);
+    expect(resp.headers.get('upgrade')?.toLowerCase()).toBe('websocket');
+    expect(resp.headers.get('sec-websocket-protocol')).toBe(`gstack-pty.${token}`);
+  });
+
+  test('Sec-WebSocket-Protocol auth: rejects unknown token even with valid Origin', async () => {
+    const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
+      headers: {
+        'Connection': 'Upgrade',
+        'Upgrade': 'websocket',
+        'Sec-WebSocket-Version': '13',
+        'Sec-WebSocket-Key': 'dGhlIHNhbXBsZSBub25jZQ==',
+        'Sec-WebSocket-Protocol': 'gstack-pty.never-granted-token',
+        'Origin': 'chrome-extension://test-extension-id',
+      },
+    });
+    expect(resp.status).toBe(401);
+  });
+
+  test('text frame {type:"resize"} is accepted (no crash, ws stays open)', async () => {
+    const cookie = 'resize-token-must-be-at-least-seventeen-chars';
+    await grantToken(cookie);
+
+    const ws = new WebSocket(`ws://127.0.0.1:${agentPort}/ws`, {
+      headers: {
+        'Origin': 'chrome-extension://test-extension-id',
+        'Cookie': `gstack_pty=${cookie}`,
+      },
+    } as any);
+
+    await new Promise<void>((resolve, reject) => {
+      const timer = setTimeout(() => reject(new Error('ws never opened')), 5000);
+      ws.addEventListener('open', () => { clearTimeout(timer); resolve(); });
+      ws.addEventListener('error', () => { clearTimeout(timer); reject(new Error('ws error')); });
+    });
+
+    // Send a resize before anything else (lazy-spawn won't fire).
+    ws.send(JSON.stringify({ type: 'resize', cols: 120, rows: 40 }));
+
+    // After resize, send a binary frame; should still work.
+    ws.send(new TextEncoder().encode('exit\n'));
+
+    await Bun.sleep(300);
+    // ws still readyState 1 (OPEN) or 3 (CLOSED after exit) — both fine.
+    expect([WebSocket.OPEN, WebSocket.CLOSED]).toContain(ws.readyState);
+
+    try { ws.close(); } catch {}
+  });
+});
@@ -0,0 +1,223 @@
+/**
+ * Unit tests for the Terminal-tab PTY agent and its server-side glue.
+ *
+ * Coverage:
+ *   - pty-session-cookie module: mint / validate / revoke / TTL pruning.
+ *   - source-level guard: /pty-session and /terminal/* are NOT in TUNNEL_PATHS.
+ *   - source-level guard: /health does not surface ptyToken.
+ *   - source-level guard: terminal-agent binds 127.0.0.1 only.
+ *   - source-level guard: terminal-agent enforces Origin AND cookie on /ws.
+ *
+ * These are read-only checks against source — they prevent silent surface
+ * widening during a routine refactor (matches the dual-listener.test.ts
+ * pattern). End-to-end behavior (real /bin/bash PTY round-trip,
+ * tunnel-surface 404 + denial-log) lives in
+ * `browse/test/terminal-agent-integration.test.ts`.
+ */
+
+import { describe, test, expect, beforeEach } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
+import {
+  mintPtySessionToken, validatePtySessionToken, revokePtySessionToken,
+  extractPtyCookie, buildPtySetCookie, buildPtyClearCookie,
+  PTY_COOKIE_NAME, __resetPtySessions,
+} from '../src/pty-session-cookie';
+
+const SERVER_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/server.ts'), 'utf-8');
+const AGENT_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/terminal-agent.ts'), 'utf-8');
+
+describe('pty-session-cookie: mint/validate/revoke', () => {
+  beforeEach(() => __resetPtySessions());
+
+  test('a freshly minted token validates', () => {
+    const { token } = mintPtySessionToken();
+    expect(validatePtySessionToken(token)).toBe(true);
+  });
+
+  test('null and unknown tokens fail validation', () => {
+    expect(validatePtySessionToken(null)).toBe(false);
+    expect(validatePtySessionToken(undefined)).toBe(false);
+    expect(validatePtySessionToken('')).toBe(false);
+    expect(validatePtySessionToken('not-a-real-token')).toBe(false);
+  });
+
+  test('revoke makes a token invalid', () => {
+    const { token } = mintPtySessionToken();
+    expect(validatePtySessionToken(token)).toBe(true);
+    revokePtySessionToken(token);
+    expect(validatePtySessionToken(token)).toBe(false);
+  });
+
+  test('Set-Cookie has HttpOnly + SameSite=Strict + Path=/ + Max-Age', () => {
+    const { token } = mintPtySessionToken();
+    const cookie = buildPtySetCookie(token);
+    expect(cookie).toContain(`${PTY_COOKIE_NAME}=${token}`);
+    expect(cookie).toContain('HttpOnly');
+    expect(cookie).toContain('SameSite=Strict');
+    expect(cookie).toContain('Path=/');
+    expect(cookie).toMatch(/Max-Age=\d+/);
+    // Secure is intentionally omitted — daemon binds 127.0.0.1 over HTTP.
+    expect(cookie).not.toContain('Secure');
+  });
+
+  test('clear-cookie has Max-Age=0', () => {
+    expect(buildPtyClearCookie()).toContain('Max-Age=0');
+  });
+
+  test('extractPtyCookie reads gstack_pty from a Cookie header', () => {
+    const { token } = mintPtySessionToken();
+    const req = new Request('http://127.0.0.1/ws', {
+      headers: { 'cookie': `othercookie=foo; gstack_pty=${token}; baz=qux` },
+    });
+    expect(extractPtyCookie(req)).toBe(token);
+  });
+
+  test('extractPtyCookie returns null when the cookie is missing', () => {
+    const req = new Request('http://127.0.0.1/ws', {
+      headers: { 'cookie': 'unrelated=value' },
+    });
+    expect(extractPtyCookie(req)).toBe(null);
+  });
+});
+
+describe('Source-level guard: /pty-session is not on the tunnel surface', () => {
+  test('TUNNEL_PATHS does not include /pty-session or /terminal/*', () => {
+    const start = SERVER_SRC.indexOf('const TUNNEL_PATHS = new Set<string>([');
+    expect(start).toBeGreaterThan(-1);
+    const end = SERVER_SRC.indexOf(']);', start);
+    const body = SERVER_SRC.slice(start, end);
+    expect(body).not.toContain('/pty-session');
+    expect(body).not.toContain('/terminal/');
+    expect(body).not.toContain('/terminal-');
+  });
+});
+
+describe('Source-level guard: /health does NOT surface ptyToken', () => {
+  test('/health response body does not include ptyToken', () => {
+    const healthIdx = SERVER_SRC.indexOf("url.pathname === '/health'");
+    expect(healthIdx).toBeGreaterThan(-1);
+    // Slice from /health through the response close-bracket.
+    const slice = SERVER_SRC.slice(healthIdx, healthIdx + 2000);
+    // The /health JSON.stringify body must not mention the cookie token.
+    // It's allowed to include `terminalPort` (a port number, not auth).
+    expect(slice).not.toContain('ptyToken');
+    expect(slice).not.toContain('gstack_pty');
+    expect(slice).toContain('terminalPort');
+  });
+});
+
+describe('Source-level guard: terminal-agent', () => {
+  test('binds 127.0.0.1 only, never 0.0.0.0', () => {
+    expect(AGENT_SRC).toContain("hostname: '127.0.0.1'");
+    expect(AGENT_SRC).not.toContain("hostname: '0.0.0.0'");
+  });
+
+  test('rejects /ws upgrades without chrome-extension:// Origin', () => {
+    // The Origin check must run BEFORE the cookie check — otherwise a
+    // missing-origin attempt would surface the 401 cookie message and
+    // signal to attackers that they need to forge a cookie.
+    const wsHandler = AGENT_SRC.slice(AGENT_SRC.indexOf("if (url.pathname === '/ws')"));
+    expect(wsHandler).toContain('chrome-extension://');
+    expect(wsHandler).toContain('forbidden origin');
+  });
+
+  test('validates the session token against an in-memory token set', () => {
+    const wsHandler = AGENT_SRC.slice(AGENT_SRC.indexOf("if (url.pathname === '/ws')"));
+    // Two transports: Sec-WebSocket-Protocol (preferred for browsers) and
+    // Cookie gstack_pty (fallback). Both verify against validTokens.
+    expect(wsHandler).toContain('sec-websocket-protocol');
+    expect(wsHandler).toContain('gstack_pty');
+    expect(wsHandler).toContain('validTokens.has');
+  });
+
+  test('Sec-WebSocket-Protocol auth: strips gstack-pty. prefix and echoes back', () => {
+    const wsHandler = AGENT_SRC.slice(AGENT_SRC.indexOf("if (url.pathname === '/ws')"));
+    // Browsers send `Sec-WebSocket-Protocol: gstack-pty.<token>`. The agent
+    // must strip the prefix before checking validTokens, AND echo the
+    // protocol back in the upgrade response — without the echo, the
+    // browser closes the connection immediately.
+    expect(wsHandler).toContain("'gstack-pty.'");
+    expect(wsHandler).toContain('Sec-WebSocket-Protocol');
+    expect(wsHandler).toContain('acceptedProtocol');
+  });
+
+  test('lazy spawn: claude PTY is spawned in message handler, not on upgrade', () => {
+    // The whole point of lazy-spawn (codex finding #8) is that the WS
+    // upgrade itself does NOT call spawnClaude. Spawn happens on first
+    // message frame.
+    const upgradeBlock = AGENT_SRC.slice(
+      AGENT_SRC.indexOf("if (url.pathname === '/ws')"),
+      AGENT_SRC.indexOf("websocket: {"),
+    );
+    expect(upgradeBlock).not.toContain('spawnClaude(');
+    // Spawn must be invoked from the message handler (lazy on first byte).
+    const messageHandler = AGENT_SRC.slice(AGENT_SRC.indexOf('message(ws, raw)'));
+    expect(messageHandler).toContain('spawnClaude(');
+    expect(messageHandler).toContain('!session.spawned');
+  });
+
+  test('process.on uncaughtException + unhandledRejection handlers exist', () => {
+    expect(AGENT_SRC).toContain("process.on('uncaughtException'");
+    expect(AGENT_SRC).toContain("process.on('unhandledRejection'");
+  });
+
+  test('cleanup escalates SIGINT to SIGKILL after 3s on close', () => {
+    // disposeSession must be idempotent and use a SIGINT-then-SIGKILL pattern.
+    const dispose = AGENT_SRC.slice(AGENT_SRC.indexOf('function disposeSession'));
+    expect(dispose).toContain("'SIGINT'");
+    expect(dispose).toContain("'SIGKILL'");
+    expect(dispose).toContain('3000');
+  });
+
+  test('tabState frames write tabs.json + active-tab.json', () => {
+    expect(AGENT_SRC).toContain("msg?.type === 'tabState'");
+    expect(AGENT_SRC).toContain('function handleTabState');
+    const fn = AGENT_SRC.slice(AGENT_SRC.indexOf('function handleTabState'));
+    // Atomic write via tmp + rename for both files (so claude never reads
+    // a half-written JSON document).
+    expect(fn).toContain("'tabs.json'");
+    expect(fn).toContain("'active-tab.json'");
+    expect(fn).toContain('renameSync');
+    // Skip chrome:// and chrome-extension:// pages — they're not useful
+    // targets for browse commands.
+    expect(fn).toContain("startsWith('chrome://')");
+    expect(fn).toContain("startsWith('chrome-extension://')");
+  });
+
+  test('claude is spawned with --append-system-prompt tab-awareness hint', () => {
+    expect(AGENT_SRC).toContain('function buildTabAwarenessHint');
+    const hint = AGENT_SRC.slice(AGENT_SRC.indexOf('function buildTabAwarenessHint'));
+    // The hint must mention the live state files and the fanout command —
+    // those are the two affordances that distinguish a gstack-PTY claude
+    // from a plain `claude` session.
+    expect(hint).toContain('tabs.json');
+    expect(hint).toContain('active-tab.json');
+    expect(hint).toContain('tab-each');
+    // And it must be passed via --append-system-prompt at spawn time
+    // (NOT written into the PTY as user input — that would pollute the
+    // visible transcript).
+    const spawn = AGENT_SRC.slice(AGENT_SRC.indexOf('function spawnClaude'));
+    expect(spawn).toContain("'--append-system-prompt'");
+    expect(spawn).toContain('tabHint');
+  });
+});
+
+describe('Source-level guard: server.ts /pty-session route', () => {
+  test('validates AUTH_TOKEN, grants over loopback, returns token + Set-Cookie', () => {
+    const route = SERVER_SRC.slice(SERVER_SRC.indexOf("url.pathname === '/pty-session'"));
+    // Must check auth before minting.
+    const beforeMint = route.slice(0, route.indexOf('mintPtySessionToken'));
+    expect(beforeMint).toContain('validateAuth');
+    // Must call the loopback grant before responding (otherwise the
+    // agent's validTokens Set never sees the token and /ws would 401).
+    expect(route).toContain('grantPtyToken');
+    // Must return the token in the JSON body for the
+    // Sec-WebSocket-Protocol auth path (cross-port cookies don't survive
+    // SameSite=Strict from a chrome-extension origin).
+    expect(route).toContain('ptySessionToken');
+    // Set-Cookie is kept as a fallback for non-browser callers.
+    expect(route).toContain('Set-Cookie');
+    expect(route).toContain('buildPtySetCookie');
+  });
+});