feat: gstack browser sidebar = interactive Claude Code REPL with live tab awareness (v1.14.0.0) (#1216)

* build: vendor xterm@5 for the Terminal sidebar tab

Adds xterm@5 + xterm-addon-fit as devDependencies and a `vendor:xterm`
build step that copies the assets into `extension/lib/` at build time.
The vendored files are .gitignored so the npm version stays the source
of truth. xterm@5 is eval-free, so no MV3 CSP changes needed.

No runtime callers yet — this just stages the assets.

* feat(server): add pty-session-cookie module for the Terminal tab

Mirrors `sse-session-cookie.ts` exactly. Mints short-lived 30-min HttpOnly
cookies for authenticating the Terminal-tab WebSocket upgrade against
the terminal-agent. Same TTL, same opportunistic-pruning shape, same
"scoped tokens never valid as root" invariant. Two registries instead of
one because the cookie names are different (`gstack_sse` vs `gstack_pty`)
and the token spaces must not overlap.

No callers yet — wired up in the next commit.
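
The registry shape described above (30-minute TTL, opportunistic pruning, one registry per cookie name) could look roughly like the sketch below. `TokenRegistry`, `TTL_MS`, and the injected clock are hypothetical names for illustration, not the actual exports of `pty-session-cookie.ts`.

```typescript
// Hypothetical sketch of a short-lived token registry. Names are illustrative
// and are NOT taken from pty-session-cookie.ts.
import { randomBytes } from "node:crypto";

const TTL_MS = 30 * 60 * 1000; // 30-minute lifetime, per the commit message

class TokenRegistry {
  private readonly expiries = new Map<string, number>();
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  /** Mint a fresh random token and record when it expires. */
  mint(): string {
    this.prune();
    const token = randomBytes(32).toString("hex");
    this.expiries.set(token, this.now() + TTL_MS);
    return token;
  }

  /** True only for a token this registry minted that has not yet expired. */
  validate(token: string | null | undefined): boolean {
    if (!token) return false;
    const expiresAt = this.expiries.get(token);
    if (expiresAt === undefined) return false;
    if (expiresAt <= this.now()) {
      this.expiries.delete(token);
      return false;
    }
    return true;
  }

  revoke(token: string): void {
    this.expiries.delete(token);
  }

  /** Opportunistic pruning: drop expired entries whenever a token is minted. */
  private prune(): void {
    const t = this.now();
    for (const [token, expiresAt] of this.expiries) {
      if (expiresAt <= t) this.expiries.delete(token);
    }
  }
}

// Separate instances keep the gstack_sse and gstack_pty token spaces disjoint.
const sseTokens = new TokenRegistry();
const ptyTokens = new TokenRegistry();
```

Keeping two instances rather than one shared map means a token minted for one cookie can never validate against the other.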

* feat(server): add terminal-agent.ts (PTY for the Terminal sidebar tab)

Translates phoenix gbrowser's Go PTY (cmd/gbd/terminal.go) into a Bun
non-compiled process. Lives separately from `sidebar-agent.ts` so a
WS-framing or PTY-cleanup bug can't take down the chat path (codex
outside-voice review caught the coupling risk).

Architecture:
- Bun.serve on 127.0.0.1:0 (never tunneled).
- POST /internal/grant accepts cookie tokens from the parent server over
  loopback, authenticated with a per-boot internal token.
- GET /ws upgrades require BOTH (a) Origin: chrome-extension://<id> and
  (b) the gstack_pty cookie minted by /pty-session. Either gate alone is
  insufficient (CSWSH defense + auth defense).
- Lazy spawn: claude PTY is not started until the WS receives its first
  data frame. Idle sidebar opens cost nothing.
- Bun PTY API: `terminal: { rows, cols, data(t, chunk) }` — verified at
  impl time on Bun 1.3.10. proc.terminal.write() for input,
  proc.terminal.resize() for resize, proc.kill() + 3s SIGKILL fallback
  on close.
- process.on('uncaughtException'|'unhandledRejection') handlers so a
  framing bug logs but doesn't kill the listener loop.
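
A minimal sketch of the two-gate /ws check from the list above, assuming the gate is a pure function over the request's Origin and Cookie headers. `checkUpgrade`, `readCookie`, and the result type are hypothetical names. The 403/401 split mirrors the status codes the integration tests in this PR assert.

```typescript
// Sketch only: function, type, and parameter names are assumptions made for
// illustration and do not come from terminal-agent.ts.
type GateResult = { ok: true } | { ok: false; status: 401 | 403 };

/** Pull one cookie value out of a raw Cookie header, or null if absent. */
function readCookie(header: string | null, name: string): string | null {
  if (!header) return null;
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    if (part.slice(0, eq).trim() === name) return part.slice(eq + 1).trim();
  }
  return null;
}

function checkUpgrade(
  req: { origin: string | null; cookie: string | null },
  extensionId: string,
  isValidToken: (token: string) => boolean,
): GateResult {
  // Gate (a): Origin must be exactly this extension (CSWSH defense).
  if (req.origin !== `chrome-extension://${extensionId}`) {
    return { ok: false, status: 403 };
  }
  // Gate (b): the gstack_pty cookie must carry a granted token (auth defense).
  const token = readCookie(req.cookie, "gstack_pty");
  if (!token || !isValidToken(token)) {
    return { ok: false, status: 401 };
  }
  return { ok: true };
}
```

Checking Origin first means a cross-site page is rejected before any token comparison happens.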

Test-only `BROWSE_TERMINAL_BINARY` env override lets the integration
tests spawn /bin/bash instead of requiring claude on every CI runner.
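
The lazy-spawn rule and the test-only binary override could be sketched as follows. `LazyPtySession`, `resolveBinary`, and the `Pty` type are hypothetical. Forwarding the first frame to the freshly spawned PTY is an assumption, and the real agent may handle it differently.

```typescript
// Hypothetical shape. terminal-agent.ts may be organised quite differently.
type Pty = { write(data: string): void };

class LazyPtySession {
  private pty: Pty | null = null;
  private readonly spawn: () => Pty;

  constructor(spawn: () => Pty) {
    this.spawn = spawn;
  }

  /** WS open: deliberately does nothing, so an idle sidebar costs nothing. */
  onOpen(): void {}

  /** Data frame: the first one starts the PTY, then input is forwarded. */
  onData(frame: string): void {
    if (this.pty === null) this.pty = this.spawn();
    this.pty.write(frame); // assumption: the triggering frame is also forwarded
  }

  get started(): boolean {
    return this.pty !== null;
  }
}

/** Binary to launch: the test override when set, otherwise `claude`. */
function resolveBinary(env: Record<string, string | undefined>): string {
  return env.BROWSE_TERMINAL_BINARY || "claude";
}
```

Calling the spawner from the message handler rather than the upgrade handler is the property the unit tests later assert at source level.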

Not yet spawned by anything — wired in the next commit.

* feat(server): wire /pty-session route + spawn terminal-agent

Server-side glue connecting the Terminal sidebar tab to the new
terminal-agent process.

server.ts:
- New POST /pty-session route. Validates AUTH_TOKEN, mints a gstack_pty
  HttpOnly cookie via pty-session-cookie.ts, posts the cookie value to
  the agent's loopback /internal/grant. Returns the terminalPort + Set-Cookie
  to the extension.
- /health response gains `terminalPort` (just the port number — never a
  shell token). Tokens flow via the cookie path, never /health, because
  /health already surfaces AUTH_TOKEN to localhost callers in headed mode
  (that's a separate v1.1+ TODO).
- /pty-session and /terminal/* are deliberately NOT added to TUNNEL_PATHS,
  so the dual-listener tunnel surface 404s by default-deny.
- Shutdown path now also pkills terminal-agent and unlinks its state files
  (terminal-port + terminal-internal-token) so a reconnect doesn't try to
  hit a dead port.

cli.ts:
- After spawning sidebar-agent.ts, also spawn terminal-agent.ts. Same
  pattern: pkill old instances, Bun.spawn(['bun', 'run', script]) with
  BROWSE_STATE_FILE + BROWSE_SERVER_PORT env. Non-fatal if the spawn
  fails — chat still works without the terminal agent.

* feat(extension): Terminal as default sidebar tab

Adds a primary tab bar (Terminal | Chat) above the existing tab-content
panes. Terminal is the default-active tab; clicking Chat returns to the
existing claude -p one-shot flow which is preserved verbatim.

manifest.json: adds ws://127.0.0.1:*/ to host_permissions so MV3 doesn't
block the WebSocket upgrade.

sidepanel.html: new primary-tabs nav, new #tab-terminal pane with a
"Press any key to start Claude Code" bootstrap card, claude-not-found
install card, xterm mount point, and "session ended" restart UI. Loads
xterm.js + xterm-addon-fit + sidepanel-terminal.js. tab-chat is no
longer the .active default.

sidepanel.js: new activePrimaryPaneId() helper that reads which primary
tab is selected. Debug-close paths now route back to whichever primary
pane is active (was hardcoded to tab-chat). Primary-tab click handler
toggles .active classes and aria-selected. window.gstackServerPort and
window.gstackAuthToken exposed so sidepanel-terminal.js can build the
/pty-session POST and the WS URL.

sidepanel-terminal.js (new): xterm.js lifecycle. Lazy-spawn — first
keystroke fires POST /pty-session, then opens
ws://127.0.0.1:<terminalPort>/ws. Origin + cookie are set automatically
by the browser. A ResizeObserver sends {type:"resize"} text frames on
layout changes. Also wires tab-switch hooks, the restart button, and
install-card retry.
On WS close shows "Session ended, click to restart" — no auto-reconnect
(codex outside-voice flagged that as session-burning).

sidepanel.css: primary-tabs bar + Terminal pane styling (full-height
xterm container, install card, ended state).

* test: terminal-agent + cookie module + sidebar default-tab regression

Three new test files:

terminal-agent.test.ts (16 tests): pty-session-cookie mint/validate/
revoke, Set-Cookie shape (HttpOnly + SameSite=Strict + Path=/, NO Secure
since 127.0.0.1 over HTTP), source-level guards that /pty-session and
/terminal/* are NOT in TUNNEL_PATHS, /health does NOT surface ptyToken
or gstack_pty, terminal-agent binds 127.0.0.1, /ws upgrade enforces
chrome-extension:// Origin AND gstack_pty cookie, lazy-spawn invariant
(spawnClaude is called from message handler, not upgrade), uncaughtException/
unhandledRejection handlers exist, SIGINT-then-SIGKILL cleanup.
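
For reference, the Set-Cookie shape those assertions lock in might be built like this. `buildPtyCookie` is a hypothetical helper, and the `Max-Age` attribute is an assumption derived from the 30-minute TTL. The source only names HttpOnly, SameSite=Strict, Path=/, and the absence of Secure.

```typescript
// Illustrative helper. The name and the Max-Age attribute are assumptions.
function buildPtyCookie(token: string, maxAgeSeconds: number): string {
  return [
    `gstack_pty=${token}`,
    "HttpOnly", // not readable from page JavaScript
    "SameSite=Strict",
    "Path=/",
    `Max-Age=${maxAgeSeconds}`,
    // No Secure attribute: the listener is plain HTTP on 127.0.0.1.
  ].join("; ");
}
```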

terminal-agent-integration.test.ts (7 tests): spawns the agent as a real
subprocess in a tmp state dir. Verifies /internal/grant accepts/rejects
the loopback token, /ws gates (no Origin → 403, bad Origin → 403, no
cookie → 401), real WebSocket round-trip with /bin/bash via the
BROWSE_TERMINAL_BINARY override (write 'echo hello-pty-world\n', read it
back), and resize message acceptance.

sidebar-tabs.test.ts (13 tests): structural regression suite locking the
load-bearing invariants of the default-tab change — Terminal is .active,
Chat is not, xterm assets are loaded, debug-close path no longer hardcodes
tab-chat (uses activePrimaryPaneId), primary-tab click handler exists,
chat surface is not accidentally deleted, terminal JS does NOT auto-
reconnect on close, manifest declares ws:// + http:// localhost host
permissions, no unsafe-eval.

Plan called for Playwright + extension regression; the codebase doesn't
ship Playwright extension launcher infra, so we follow the existing
extension-test pattern (source-level structural assertions). Same
load-bearing intent — locks the invariants before they regress.

* docs: Terminal flow + threat model + v1.1 follow-ups

SIDEBAR_MESSAGE_FLOW.md: new "Terminal flow" section. Documents the WS
upgrade path (/pty-session cookie mint → /ws Origin + cookie gate →
lazy claude spawn), the three-token model (AUTH_TOKEN for /pty-session,
gstack_pty cookie for /ws, INTERNAL_TOKEN for server↔agent loopback),
and the threat-model boundary — the Terminal tab bypasses the entire
prompt-injection security stack on purpose; user keystrokes are the
trust source. That trust assumption is load-bearing on three transport
guarantees: local-only listener, Origin gate, cookie auth. Drop any
one of those three and the tab becomes unsafe.

CLAUDE.md: extends the "Sidebar architecture" note to include
terminal-agent.ts in the read-this-first list. Adds a "Terminal tab is
its own process" note so a future contributor doesn't bolt PTY logic
onto sidebar-agent.ts.

TODOS.md: three new follow-ups under a new "Sidebar Terminal" section:
  - v1.1: PTY session survives sidebar reload (Issue 1C deferred).
  - v1.1+: audit /health AUTH_TOKEN distribution (codex finding #2 —
    a pre-existing soft leak that cc-pty-import sidesteps but doesn't
    fix).
  - v1.1+: apply terminal-agent's process.on exception handlers to
    sidebar-agent.ts (codex finding #4 — chat path has no fatal
    handlers).

* feat(extension): Terminal-only sidebar — auth fix, UX polish, chat rip

The chat queue path is gone. The Chrome side panel is now just an
interactive claude PTY in xterm.js. Activity / Refs / Inspector still
exist behind the `debug` toggle in the footer.

Three threads of change, all from dogfood iteration on top of
cc-pty-import:

1. fix(server): cross-port WS auth via Sec-WebSocket-Protocol
   - Browsers can't set Authorization on a WebSocket upgrade. We had
     been minting an HttpOnly gstack_pty cookie via /pty-session, but
     SameSite=Strict cookies don't survive the cross-port jump from
     server.ts:34567 to the agent's random port from a chrome-extension
     origin. The WS opened then immediately closed → "Session ended."
   - /pty-session now also returns ptySessionToken in the JSON body.
   - Extension calls `new WebSocket(url, [`gstack-pty.<token>`])`.
     Browser sends Sec-WebSocket-Protocol on the upgrade.
   - Agent reads the protocol header, validates against validTokens,
     and MUST echo the protocol back (Chromium closes the connection
     immediately if a server doesn't pick one of the offered protocols).
   - Cookie path is kept as a fallback for non-browser callers (curl,
     integration tests).
   - New integration test exercises the full protocol-auth round-trip
     via raw fetch+Upgrade so a future regression of this exact class
     fails in CI.

2. fix(extension): UX polish on the Terminal pane
   - Eager auto-connect when the sidebar opens — no "Press any key to
     start" friction every reload.
   - Always-visible ↻ Restart button in the terminal toolbar (not
     gated on the ENDED state) so the user can force a fresh claude
     mid-session.
   - MutationObserver on #tab-terminal's class attribute drives a
     fitAddon.fit() + term.refresh() when the pane becomes visible
     again — xterm doesn't auto-redraw after display:none → display:flex.

3. feat(extension): rip the chat tab + sidebar-agent.ts
   - Sidebar is Terminal-only. No more Terminal | Chat primary nav.
   - sidebar-agent.ts deleted. /sidebar-command, /sidebar-chat,
     /sidebar-agent/event, /sidebar-tabs* and friends all deleted.
   - The pickSidebarModel router (sonnet vs opus) is gone — the live
     PTY uses whatever model the user's `claude` CLI is configured with.
   - Quick-actions (🧹 Cleanup / 📸 Screenshot / 🍪 Cookies) survive
     in the Terminal toolbar. Cleanup now injects its prompt into the
     live PTY via window.gstackInjectToTerminal — no more
     /sidebar-command POST. The Inspector "Send to Code" action uses
     the same injection path.
   - clear-chat button removed from the footer.
   - sidepanel.js shed ~900 lines of chat polling, optimistic UI,
     stop-agent, etc.
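
The agent-side half of the Sec-WebSocket-Protocol auth in thread 1 could be sketched as below, assuming a pure helper that picks which offered protocol to echo. `selectAuthProtocol` and `PROTOCOL_PREFIX` are hypothetical names. Only the `gstack-pty.<token>` format comes from the text above.

```typescript
// Sketch under assumptions: helper and constant names are illustrative.
const PROTOCOL_PREFIX = "gstack-pty.";

/**
 * Given the raw Sec-WebSocket-Protocol request header, return the offered
 * subprotocol that carries a valid token (to be echoed back verbatim on the
 * upgrade response), or null if none qualifies.
 */
function selectAuthProtocol(
  header: string | null,
  isValidToken: (token: string) => boolean,
): string | null {
  if (!header) return null;
  // The header is a comma-separated list of offered subprotocols.
  for (const offered of header.split(",").map((p) => p.trim())) {
    if (!offered.startsWith(PROTOCOL_PREFIX)) continue;
    const token = offered.slice(PROTOCOL_PREFIX.length);
    if (token && isValidToken(token)) return offered;
  }
  return null;
}
```

Returning the full offered string rather than just the token keeps the echo step trivial: the caller passes the result straight into the response's Sec-WebSocket-Protocol header.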

Net diff: -3.4k lines across 16 files. CLAUDE.md, TODOS.md, and
docs/designs/SIDEBAR_MESSAGE_FLOW.md rewritten to match. The sidebar
regression test (browse/test/sidebar-tabs.test.ts) is rewritten as 27
structural assertions locking the new layout — Terminal sole pane,
no chat input, quick-actions in toolbar, eager-connect, MutationObserver
repaint, restart helper.

* feat: live tab awareness for the Terminal pane

claude in the PTY now has continuous tab-aware context. Three pieces:

1. Live state files. background.js listens to chrome.tabs.onActivated /
   onCreated / onRemoved / onUpdated (throttled to URL/title/status==
   complete so loading spinners don't spam) and pushes a snapshot. The
   sidepanel relays it as a custom event; sidepanel-terminal.js sends
   {type:"tabState"} text frames over the live PTY WebSocket.
   terminal-agent.ts writes:
     <stateDir>/tabs.json          all open tabs (id, url, title, active,
                                   pinned, audible, windowId)
     <stateDir>/active-tab.json    current active tab (skips chrome:// and
                                   chrome-extension:// internal pages)
   Atomic write via tmp + rename so claude never reads a half-written
   document. A fresh snapshot is pushed on WS open so the files exist by
   the time claude finishes booting.

2. New $B tab-each <command> [args...] meta-command. Fans out a single
   command across every open tab, returns
   {command, args, total, results: [{tabId, url, title, status, output}]}.
   Skips chrome:// pages; restores the originally active tab in a finally
   block (so a mid-batch error doesn't leave the user looking at a
   different tab); uses bringToFront: false so the OS window doesn't
   jump on every fanout. Scope-checks the inner command BEFORE the loop.

3. --append-system-prompt hint at spawn time. Claude is told about both
   the state files and the $B tab-each command up front, so it doesn't
   have to discover the surface by trial. Passed via the --append-system-
   prompt CLI flag, NOT as a leading PTY write — the hint stays out of
   the visible transcript.
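
The tmp + rename write from piece 1 can be sketched like this, assuming synchronous Node-compatible fs calls. `writeJsonAtomic`, `pickActiveTab`, and the `TabInfo` fields are hypothetical. The real handler in terminal-agent.ts may differ.

```typescript
// Minimal sketch of an atomic JSON write. Names are illustrative only.
import { mkdtempSync, readFileSync, readdirSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type TabInfo = { id: number; url: string; title: string; active: boolean };

/** Write to a sibling temp file, then rename over the target in one step. */
function writeJsonAtomic(filePath: string, value: unknown): void {
  const tmpPath = `${filePath}.${process.pid}.tmp`; // same directory as target
  writeFileSync(tmpPath, JSON.stringify(value, null, 2));
  renameSync(tmpPath, filePath); // readers see the old or the new file, never half
}

/** Active tab, skipping browser-internal pages as the commit describes. */
function pickActiveTab(tabs: TabInfo[]): TabInfo | null {
  const isInternal = (url: string) =>
    url.startsWith("chrome://") || url.startsWith("chrome-extension://");
  return tabs.find((t) => t.active && !isInternal(t.url)) ?? null;
}

// Usage against a throwaway state directory.
const stateDir = mkdtempSync(join(tmpdir(), "gstack-state-"));
const tabs: TabInfo[] = [
  { id: 1, url: "chrome://settings", title: "Settings", active: false },
  { id: 2, url: "https://example.com/", title: "Example", active: true },
];
writeJsonAtomic(join(stateDir, "tabs.json"), tabs);
writeJsonAtomic(join(stateDir, "active-tab.json"), pickActiveTab(tabs));
const roundTrip = JSON.parse(readFileSync(join(stateDir, "tabs.json"), "utf-8"));
const leftovers = readdirSync(stateDir).filter((name) => name.endsWith(".tmp"));
```

The temp file sits next to the target so the rename stays on one filesystem, which is what makes the swap atomic on POSIX systems.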

Tests:
- browse/test/tab-each.test.ts (new) — registration + source-level
  invariants (scope check before loop, finally-restore, bringToFront:false,
  chrome:// skip) + behavior tests with a mock BrowserManager that verify
  iteration order, JSON shape, error handling, and active-tab restore.
- browse/test/terminal-agent.test.ts — three new assertions for
  tabState handler shape, atomic-write pattern, and the
  --append-system-prompt wiring at spawn.
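
The invariants these tests pin down (scope check before the loop, finally-restore, bringToFront: false, chrome:// skip) can be illustrated with the sketch below. `TabDriver` is a hypothetical stand-in for the BrowserManager. The `status` values and the per-tab error capture are assumptions. Only the result field names come from the commit text.

```typescript
// Sketch only: TabDriver and the "ok"/"error" status values are assumptions.
type Tab = { id: number; url: string; title: string };
type TabResult = Tab extends infer _ ? {
  tabId: number; url: string; title: string; status: "ok" | "error"; output: string;
} : never;

interface TabDriver {
  listTabs(): Tab[];
  activeTabId(): number;
  switchTo(tabId: number, opts: { bringToFront: boolean }): Promise<void>;
  run(command: string, args: string[]): Promise<string>;
}

async function tabEach(
  driver: TabDriver,
  command: string,
  args: string[],
  isAllowed: (command: string) => boolean,
) {
  // Scope-check the inner command BEFORE touching any tab.
  if (!isAllowed(command)) throw new Error(`command not permitted: ${command}`);

  const originalId = driver.activeTabId();
  const tabs = driver.listTabs().filter((t) => !t.url.startsWith("chrome://"));
  const results: TabResult[] = [];
  try {
    for (const tab of tabs) {
      const base = { tabId: tab.id, url: tab.url, title: tab.title };
      try {
        await driver.switchTo(tab.id, { bringToFront: false });
        const output = await driver.run(command, args);
        results.push({ ...base, status: "ok", output });
      } catch (err) {
        const output = err instanceof Error ? err.message : String(err);
        results.push({ ...base, status: "error", output });
      }
    }
  } finally {
    // Restore the originally active tab even if the batch failed midway.
    await driver.switchTo(originalId, { bringToFront: false });
  }
  return { command, args, total: results.length, results };
}
```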

Verified live: opened 5 tabs, ran $B tab-each url against the live
server, got per-tab JSON results back, original active tab restored
without OS focus stealing.

* chore: drop sidebar-agent test refs after chat rip

Six test files / describe blocks targeted the deleted chat path:
- browse/test/security-e2e-fullstack.test.ts (full-stack chat-pipeline E2E
  with mock claude — whole file gone)
- browse/test/security-review-fullstack.test.ts (review-flow E2E with real
  classifier — whole file gone)
- browse/test/security-review-sidepanel-e2e.test.ts (Playwright E2E for
  the security event banner that was ripped from sidepanel.html)
- browse/test/security-audit-r2.test.ts (7 describe blocks: agent queue
  permissions, isValidQueueEntry stateFile traversal, loadSession session-ID
  validation, switchChatTab DocumentFragment, pollChat reentrancy guard,
  /sidebar-tabs URL sanitization, sidebar-agent SIGTERM→SIGKILL escalation;
  plus the AGENT_SRC top-level read converted to a graceful fallback)
- browse/test/security-adversarial-fixes.test.ts (canary stream-chunk split
  detection on detectCanaryLeak; one tool-output test on sidebar-agent)
- test/skill-validation.test.ts (sidebar agent #584 describe block)

These all assumed sidebar-agent.ts existed and tested chat-queue plumbing,
chat-tab DOM round-trip, chat-polling reentrancy, or per-message classifier
canary detection. With the live PTY there is no chat queue, no chat tab,
no LLM stream to canary-scan, and no per-message subprocess. The Terminal
pane's invariants are covered by the new browse/test/sidebar-tabs.test.ts
(27 structural assertions), browse/test/terminal-agent.test.ts, and
browse/test/terminal-agent-integration.test.ts.

bun test → exit 0, 0 failures.

* chore: bump version and changelog (v1.14.0.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(extension): xterm fills the full Terminal panel height

The Terminal pane only rendered into the top portion of the panel — most
of the panel below the prompt was an empty black gap. Three layered
issues, all about xterm.js measuring dimensions during a layout state
that wasn't ready yet:

1. order-of-operations in connect(): ensureXterm() ran BEFORE
   setState(LIVE), so term.open() measured els.mount while it was still
   display:none. xterm caches a 0-size viewport synchronously inside
   open() and never auto-recovers when the container goes visible.
   Flipped: setState(LIVE) → ensureXterm.

2. first fit() ran synchronously before the browser had applied the
   .active class transition. Wrapped in requestAnimationFrame so layout
   has settled before fit() reads clientHeight.

3. CSS flex-overflow trap: .terminal-mount has flex:1 inside the
   flex-column #tab-terminal, but .tab-content's `overflow-y: auto` and
   the lack of `min-height: 0` on .terminal-mount meant the item
   couldn't shrink below content size. flex:1 then refused to expand
   into available space and xterm rendered into whatever its initial
   2x2 measurement happened to be.
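
The ordering fix in items 1 and 2 can be sketched with injected dependencies so the sequence is visible without a DOM. `connectInOrder` and its parameter names are hypothetical. In the extension the `raf` argument would be `requestAnimationFrame`, and `open`/`fit` correspond to xterm's `term.open()` and the fit addon's `fit()`.

```typescript
// Illustrative only: the real connect() in sidepanel-terminal.js is not shown
// in this commit message, so this mirrors the described order, not its code.
interface ConnectDeps {
  showPane(): void;                  // e.g. setState(LIVE): container becomes visible
  open(): void;                      // e.g. term.open(mount): xterm measures here
  fit(): void;                       // e.g. fitAddon.fit()
  raf(callback: () => void): void;   // e.g. requestAnimationFrame
}

function connectInOrder(deps: ConnectDeps): void {
  deps.showPane();                 // 1. make the mount visible first
  deps.open();                     // 2. open() now measures a non-zero container
  deps.raf(() => deps.fit());      // 3. fit only after layout has settled
}
```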

Fixes:
- extension/sidepanel-terminal.js: reorder + RAF fit
- extension/sidepanel.css: .terminal-mount gets `flex: 1 1 0` +
  `min-height: 0` + `position: relative`. #tab-terminal overrides
  .tab-content's `overflow-y: auto` to `overflow: hidden` (xterm has
  its own viewport scroll; the parent shouldn't compete) and explicitly
  re-declares `display: flex; flex-direction: column` for #tab-terminal.active.

bun test browse/test/sidebar-tabs.test.ts → 27/27 pass.
Manually verified: side panel opens → Terminal fills full panel height,
xterm scrollback works, debug-tab toggle still repaints correctly.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Garry Tan
2026-04-25 22:52:15 -07:00
committed by GitHub
parent 23c4d7b228
commit ed1e4be2f6
35 changed files with 2999 additions and 5113 deletions
+8 -32
@@ -19,31 +19,10 @@ import { PAGE_CONTENT_COMMANDS } from '../src/commands';
const REPO_ROOT = path.resolve(__dirname, '..', '..');
describe('canary stream-chunk split detection', () => {
test('detectCanaryLeak uses rolling buffer across consecutive deltas', () => {
// Pull in the function via dynamic require so we don't re-export it
// from sidebar-agent.ts (it's internal on purpose).
const agentSource = fs.readFileSync(
path.join(REPO_ROOT, 'browse', 'src', 'sidebar-agent.ts'),
'utf-8',
);
// Contract: detectCanaryLeak accepts an optional DeltaBuffer and
// uses .slice(-(canary.length - 1)) to retain a rolling tail.
expect(agentSource).toContain('DeltaBuffer');
expect(agentSource).toMatch(/text_delta\s*=\s*combined\.slice\(-\(canary\.length - 1\)\)/);
expect(agentSource).toMatch(/input_json_delta\s*=\s*combined\.slice\(-\(canary\.length - 1\)\)/);
});
test('canary context initializes deltaBuf', () => {
const agentSource = fs.readFileSync(
path.join(REPO_ROOT, 'browse', 'src', 'sidebar-agent.ts'),
'utf-8',
);
// The askClaude call site must construct the buffer so the rolling
// detection actually runs.
expect(agentSource).toContain("deltaBuf: { text_delta: '', input_json_delta: '' }");
});
});
// canary stream-chunk split detection — tested detectCanaryLeak inside
// sidebar-agent.ts. Both the chat-stream pipeline and the function are
// gone (Terminal pane uses an interactive PTY; user keystrokes are the
// trust source, no chunked LLM stream to canary-scan).
describe('tool-output ensemble rule (single-layer BLOCK)', () => {
test('user-input context: single layer at BLOCK degrades to WARN', () => {
@@ -117,13 +96,10 @@ describe('transcript classifier tool_output parameter', () => {
expect(src).toContain('tool_output');
});
test('sidebar-agent passes tool text to transcript on tool-result scan', () => {
const src = fs.readFileSync(
path.join(REPO_ROOT, 'browse', 'src', 'sidebar-agent.ts'),
'utf-8',
);
expect(src).toContain('tool_output: text');
});
// sidebar-agent passed tool text to the transcript classifier on
// tool-result scans. That whole pipeline is gone — Terminal pane has
// no LLM stream to scan, and security-classifier.ts is dead code with
// no production caller (a separate v1.1+ cleanup TODO).
});
describe('GSTACK_SECURITY_OFF kill switch', () => {
+40 -224
@@ -15,7 +15,13 @@ import * as os from 'os';
 const META_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/meta-commands.ts'), 'utf-8');
 const WRITE_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/write-commands.ts'), 'utf-8');
 const SERVER_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/server.ts'), 'utf-8');
-const AGENT_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/sidebar-agent.ts'), 'utf-8');
+// sidebar-agent.ts was ripped (chat queue replaced by interactive PTY).
+// AGENT_SRC kept as empty string so the legacy describe block below skips
+// without crashing module load on a missing file.
+const AGENT_SRC = (() => {
+  try { return fs.readFileSync(path.join(import.meta.dir, '../src/sidebar-agent.ts'), 'utf-8'); }
+  catch { return ''; }
+})();
 const SNAPSHOT_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/snapshot.ts'), 'utf-8');
 const PATH_SECURITY_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/path-security.ts'), 'utf-8');
@@ -51,53 +57,12 @@ function extractFunction(src: string, name: string): string {
return src.slice(start);
}
// ─── Task 4: Agent queue poisoning — full schema validation + permissions ───
describe('Agent queue security', () => {
it('server queue directory must use restricted permissions', () => {
const queueSection = SERVER_SRC.slice(SERVER_SRC.indexOf('agentQueue'), SERVER_SRC.indexOf('agentQueue') + 2000);
expect(queueSection).toMatch(/0o700/);
});
it('sidebar-agent queue directory must use restricted permissions', () => {
// The mkdirSync for the queue dir lives in main() — search the main() body
const mainStart = AGENT_SRC.indexOf('async function main');
const queueSection = AGENT_SRC.slice(mainStart);
expect(queueSection).toMatch(/0o700/);
});
it('cli.ts queue file creation must use restricted permissions', () => {
const CLI_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/cli.ts'), 'utf-8');
const queueSection = CLI_SRC.slice(CLI_SRC.indexOf('queue') || 0, CLI_SRC.indexOf('queue') + 2000);
expect(queueSection).toMatch(/0o700|0o600|mode/);
});
it('queue reader must have a validator function covering all fields', () => {
// Extract ONLY the validator function body by walking braces
const validatorStart = AGENT_SRC.indexOf('function isValidQueueEntry');
expect(validatorStart).toBeGreaterThan(-1);
let depth = 0;
let bodyStart = AGENT_SRC.indexOf('{', validatorStart);
let bodyEnd = bodyStart;
for (let i = bodyStart; i < AGENT_SRC.length; i++) {
if (AGENT_SRC[i] === '{') depth++;
if (AGENT_SRC[i] === '}') depth--;
if (depth === 0) { bodyEnd = i + 1; break; }
}
const validatorBlock = AGENT_SRC.slice(validatorStart, bodyEnd);
expect(validatorBlock).toMatch(/prompt.*string/);
expect(validatorBlock).toMatch(/Array\.isArray/);
expect(validatorBlock).toMatch(/\.\./);
expect(validatorBlock).toContain('stateFile');
expect(validatorBlock).toContain('tabId');
expect(validatorBlock).toMatch(/number/);
expect(validatorBlock).toContain('null');
expect(validatorBlock).toContain('message');
expect(validatorBlock).toContain('pageUrl');
expect(validatorBlock).toContain('sessionId');
});
});
// ─── Agent queue security ──────────────────────────────────────────────────
// Original block validated the chat queue's filesystem permissions and
// schema validator on sidebar-agent.ts. Both are gone (chat queue ripped
// in favor of the interactive Terminal PTY). The remaining 0o700 / 0o600
// invariants on extension queue paths are now covered by terminal-agent
// integration tests and the sidebar-tabs regression suite.
// ─── Shared source reads for CSS validator tests ────────────────────────────
const CDP_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/cdp-inspector.ts'), 'utf-8');
@@ -325,30 +290,13 @@ describe('Round-2 finding 2: snapshot.ts annotated path uses realpathSync', () =
});
});
// ─── Round-2 finding 3: stateFile path traversal check in isValidQueueEntry
describe('Round-2 finding 3: isValidQueueEntry checks stateFile for path traversal', () => {
it('isValidQueueEntry checks stateFile for .. traversal sequences', () => {
const fn = extractFunction(AGENT_SRC, 'isValidQueueEntry');
expect(fn).toBeTruthy();
// Must check stateFile for '..' — find the stateFile block and look for '..' string
const stateFileIdx = fn.indexOf('stateFile');
expect(stateFileIdx).toBeGreaterThan(-1);
const stateFileBlock = fn.slice(stateFileIdx, stateFileIdx + 200);
// The block must contain a check for the two-dot traversal sequence
expect(stateFileBlock).toMatch(/'\.\.'|"\.\."|\.\./);
});
it('isValidQueueEntry stateFile block contains both type check and traversal check', () => {
const fn = extractFunction(AGENT_SRC, 'isValidQueueEntry');
const stateFileIdx = fn.indexOf('stateFile');
const stateBlock = fn.slice(stateFileIdx, stateFileIdx + 300);
// Must contain the type check
expect(stateBlock).toContain('typeof obj.stateFile');
// Must contain the includes('..') call
expect(stateBlock).toMatch(/includes\s*\(\s*['"]\.\.['"]\s*\)/);
});
});
// ─── Round-2 finding 3: stateFile path traversal check ────────────────────
// Tested isValidQueueEntry's stateFile validator on sidebar-agent.ts. Both
// the function and the file are gone (chat queue ripped). The terminal-agent
// PTY path no longer takes a queue entry — it accepts WebSocket frames
// gated on Origin + session token, no on-disk queue to traverse. Path
// traversal in browse-server's tab-state writer is covered by
// browse/test/terminal-agent.test.ts (handleTabState atomic-write tests).
// ─── Task 5: /health endpoint must not expose sensitive fields ───────────────
@@ -421,24 +369,11 @@ describe('cookie-import domain validation', () => {
});
});
// ─── Task 9: loadSession ID validation ──────────────────────────────────────
describe('loadSession session ID validation', () => {
it('loadSession validates session ID format before using it in a path', () => {
const fn = extractFunction(SERVER_SRC, 'loadSession');
expect(fn).toBeTruthy();
// Must contain the alphanumeric regex guard
expect(fn).toMatch(/\[a-zA-Z0-9_-\]/);
});
it('loadSession returns null on invalid session ID', () => {
const fn = extractFunction(SERVER_SRC, 'loadSession');
const block = fn.slice(fn.indexOf('activeData.id'));
// Must warn and return null
expect(block).toContain('Invalid session ID');
expect(block).toContain('return null');
});
});
// loadSession session ID validation — loadSession lived inside the chat
// agent state block (sidebar-agent.ts session persistence). Chat queue
// is gone, so the function and its session-ID validator are gone. The
// terminal-agent's PTY session has no on-disk session ID — the WebSocket
// holds the session for its lifetime.
// ─── Task 10: Responsive screenshot path validation ──────────────────────────
@@ -520,40 +455,11 @@ describe('Task 11: state load cookie validation', () => {
});
});
// ─── Task 12: Validate activeTabUrl before syncActiveTabByUrl ─────────────────
describe('Task 12: activeTabUrl sanitized before syncActiveTabByUrl', () => {
it('sidebar-tabs route sanitizes activeUrl before syncActiveTabByUrl', () => {
const block = sliceBetween(SERVER_SRC, "url.pathname === '/sidebar-tabs'", "url.pathname === '/sidebar-tabs/switch'");
expect(block).toContain('sanitizeExtensionUrl');
expect(block).toContain('syncActiveTabByUrl');
const sanitizeIdx = block.indexOf('sanitizeExtensionUrl');
const syncIdx = block.indexOf('syncActiveTabByUrl');
expect(sanitizeIdx).toBeLessThan(syncIdx);
});
it('sidebar-command route sanitizes extensionUrl before syncActiveTabByUrl', () => {
const block = sliceBetween(SERVER_SRC, "url.pathname === '/sidebar-command'", "url.pathname === '/sidebar-chat/clear'");
expect(block).toContain('sanitizeExtensionUrl');
expect(block).toContain('syncActiveTabByUrl');
const sanitizeIdx = block.indexOf('sanitizeExtensionUrl');
const syncIdx = block.indexOf('syncActiveTabByUrl');
expect(sanitizeIdx).toBeLessThan(syncIdx);
});
it('direct unsanitized syncActiveTabByUrl calls are not present (all calls go through sanitize)', () => {
// Every syncActiveTabByUrl call should be preceded by sanitizeExtensionUrl in the nearby code
// We verify there are no direct browserManager.syncActiveTabByUrl(activeUrl) or
// browserManager.syncActiveTabByUrl(extensionUrl) patterns (without sanitize wrapper)
const block1 = sliceBetween(SERVER_SRC, "url.pathname === '/sidebar-tabs'", "url.pathname === '/sidebar-tabs/switch'");
// Should NOT contain direct call with raw activeUrl
expect(block1).not.toMatch(/syncActiveTabByUrl\(activeUrl\)/);
const block2 = sliceBetween(SERVER_SRC, "url.pathname === '/sidebar-command'", "url.pathname === '/sidebar-chat/clear'");
// Should NOT contain direct call with raw extensionUrl
expect(block2).not.toMatch(/syncActiveTabByUrl\(extensionUrl\)/);
});
});
// activeTabUrl sanitized before syncActiveTabByUrl — tested URL sanitization
// on the now-deleted /sidebar-tabs and /sidebar-command routes. The
// terminal-agent reads tab URLs from the live tabs.json file (atomic write
// from background.js), and chrome:// / chrome-extension:// pages are
// filtered server-side in handleTabState — see browse/test/terminal-agent.test.ts.
// ─── Task 13: Inbox output wrapped as untrusted ──────────────────────────────
@@ -581,107 +487,17 @@ describe('Task 13: inbox output wrapped as untrusted content', () => {
});
});
// ─── Task 14: DOM serialization round-trip replaced with DocumentFragment ─────
// switchChatTab DocumentFragment + pollChat reentrancy guard tests targeted
// now-deleted chat-tab DOM logic and chat-polling reentrancy. Both are gone
// (Terminal pane is the sole sidebar surface; xterm.js owns its own DOM
// lifecycle, and the WebSocket has no reentrancy hazard).
const SIDEPANEL_SRC = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel.js'), 'utf-8');
describe('Task 14: switchChatTab uses DocumentFragment, not innerHTML round-trip', () => {
it('switchChatTab does NOT use innerHTML to restore chat (string-based re-parse removed)', () => {
const fn = extractFunction(SIDEPANEL_SRC, 'switchChatTab');
expect(fn).toBeTruthy();
// Must NOT have the dangerous pattern of assigning chatDomByTab value back to innerHTML
expect(fn).not.toMatch(/chatMessages\.innerHTML\s*=\s*chatDomByTab/);
});
it('switchChatTab uses createDocumentFragment to save chat DOM', () => {
const fn = extractFunction(SIDEPANEL_SRC, 'switchChatTab');
expect(fn).toContain('createDocumentFragment');
});
it('switchChatTab moves nodes via appendChild/firstChild (not innerHTML assignment)', () => {
const fn = extractFunction(SIDEPANEL_SRC, 'switchChatTab');
// Must use appendChild to restore nodes from fragment
expect(fn).toContain('chatMessages.appendChild');
});
it('chatDomByTab comment documents that values are DocumentFragments, not strings', () => {
// Check module-level comment on chatDomByTab
const commentIdx = SIDEPANEL_SRC.indexOf('chatDomByTab');
const commentLine = SIDEPANEL_SRC.slice(commentIdx, commentIdx + 120);
expect(commentLine).toMatch(/DocumentFragment|fragment/i);
});
it('welcome screen is built with DOM methods in the else branch (not innerHTML)', () => {
const fn = extractFunction(SIDEPANEL_SRC, 'switchChatTab');
// The else branch must use createElement, not innerHTML template literal
expect(fn).toContain('createElement');
// The specific innerHTML template with chat-welcome must be gone
expect(fn).not.toMatch(/innerHTML\s*=\s*`[\s\S]*?chat-welcome/);
});
});
// ─── Task 15: pollChat/switchChatTab reentrancy guard ────────────────────────
describe('Task 15: pollChat reentrancy guard and deferred call in switchChatTab', () => {
it('pollInProgress guard variable is declared at module scope', () => {
// Must be declared before any function definitions (within first 2000 chars)
const moduleTop = SIDEPANEL_SRC.slice(0, 2000);
expect(moduleTop).toContain('pollInProgress');
});
it('pollChat function checks and sets pollInProgress', () => {
const fn = extractFunction(SIDEPANEL_SRC, 'pollChat');
expect(fn).toBeTruthy();
expect(fn).toContain('pollInProgress');
});
it('pollChat resets pollInProgress in finally block', () => {
const fn = extractFunction(SIDEPANEL_SRC, 'pollChat');
// The finally block must contain the reset
const finallyIdx = fn.indexOf('finally');
expect(finallyIdx).toBeGreaterThan(-1);
const finallyBlock = fn.slice(finallyIdx, finallyIdx + 60);
expect(finallyBlock).toContain('pollInProgress');
});
it('switchChatTab calls pollChat via setTimeout (not directly)', () => {
const fn = extractFunction(SIDEPANEL_SRC, 'switchChatTab');
// Must use setTimeout to defer pollChat — no direct call at the end
expect(fn).toMatch(/setTimeout\s*\(\s*pollChat/);
// Must NOT call `pollChat()` directly: strip the setTimeout wrapper first,
// then assert that no bare `pollChat()` call remains.
const withoutSetTimeout = fn.replace(/setTimeout\s*\(\s*pollChat[^)]*\)/g, '');
expect(withoutSetTimeout).not.toMatch(/\bpollChat\s*\(\s*\)/);
});
});
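// Sketch of the guard shape the Task 15 assertions look for: a module-scope
// flag, an early return while a poll is in flight, a reset in `finally`, and
// a setTimeout deferral. The names `pollsStarted`, `fetchEntries` and
// `schedulePoll` are hypothetical; the real pollChat lives in the sidepanel
// script and may differ.

```typescript
let pollInProgress = false;
let pollsStarted = 0;

async function pollChat(fetchEntries: () => Promise<void>): Promise<boolean> {
  // A poll is already in flight: skip this call instead of overlapping it.
  if (pollInProgress) return false;
  pollInProgress = true;
  try {
    pollsStarted += 1;
    await fetchEntries();
    return true;
  } finally {
    // Runs on success and on throw, so a failed poll cannot wedge the guard.
    pollInProgress = false;
  }
}

// Deferring through setTimeout lets the current call stack (for example a
// tab switch) finish before the next poll starts.
function schedulePoll(fetchEntries: () => Promise<void>): void {
  setTimeout(() => { void pollChat(fetchEntries); }, 0);
}
```

// The `finally` reset is the part the third assertion pins down: without it,
// one rejected fetch would leave the flag set and silently stop all polling.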
// ─── Task 16: SIGKILL escalation in sidebar-agent timeout ────────────────────
describe('Task 16: sidebar-agent timeout handler uses SIGTERM→SIGKILL escalation', () => {
it('timeout block sends SIGTERM first', () => {
// Inspect the 600 chars that follow the first SIDEBAR_AGENT_TIMEOUT reference
const timeoutStart = AGENT_SRC.indexOf("SIDEBAR_AGENT_TIMEOUT");
expect(timeoutStart).toBeGreaterThan(-1);
const timeoutBlock = AGENT_SRC.slice(timeoutStart, timeoutStart + 600);
expect(timeoutBlock).toContain('SIGTERM');
});
it('timeout block escalates to SIGKILL after delay', () => {
const timeoutStart = AGENT_SRC.indexOf("SIDEBAR_AGENT_TIMEOUT");
const timeoutBlock = AGENT_SRC.slice(timeoutStart, timeoutStart + 600);
expect(timeoutBlock).toContain('SIGKILL');
});
it('SIGTERM appears before SIGKILL in timeout block', () => {
const timeoutStart = AGENT_SRC.indexOf("SIDEBAR_AGENT_TIMEOUT");
const timeoutBlock = AGENT_SRC.slice(timeoutStart, timeoutStart + 600);
const sigtermIdx = timeoutBlock.indexOf('SIGTERM');
const sigkillIdx = timeoutBlock.indexOf('SIGKILL');
expect(sigtermIdx).toBeGreaterThan(-1);
expect(sigkillIdx).toBeGreaterThan(-1);
expect(sigtermIdx).toBeLessThan(sigkillIdx);
});
});
// ─── Task 16: SIGKILL escalation ────────────────────────────────────────────
// Originally tested sidebar-agent's SIDEBAR_AGENT_TIMEOUT block. The chat
// queue and its watchdog are gone. terminal-agent.ts disposes claude with
// the same SIGINT-then-SIGKILL-after-3s pattern; that's covered by
// browse/test/terminal-agent.test.ts ("cleanup escalates SIGINT to SIGKILL
// after 3s on close").
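// Sketch of the SIGINT-then-SIGKILL-after-3s pattern named above. This is not
// the terminal-agent.ts code: `Killable`, `Schedule` and
// `terminateWithEscalation` are hypothetical names, and the injectable
// scheduler exists only so the escalation can be exercised without waiting.

```typescript
// Structural stand-in for a subprocess handle (assumption, not Bun's type).
interface Killable {
  exitCode: number | null;
  kill(signal: string): void;
}

type Schedule = (fn: () => void, delayMs: number) => unknown;

function terminateWithEscalation(
  proc: Killable,
  graceMs = 3000,
  schedule: Schedule = (fn, delayMs) => setTimeout(fn, delayMs),
): void {
  // Ask politely first so the child can flush and clean up.
  proc.kill('SIGINT');
  schedule(() => {
    // Escalate only if the child has still not exited after the grace period.
    if (proc.exitCode === null) proc.kill('SIGKILL');
  }, graceMs);
}
```

// Checking exitCode before the second signal avoids signalling a process that
// has already exited by the time the grace period ends.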
// ─── Task 17: viewport and wait bounds clamping ──────────────────────────────
@@ -1,218 +0,0 @@
/**
* Full-stack E2E — the security-contract anchor test.
*
* Spins up a real browse server + real sidebar-agent subprocess, points
* them at a MOCK claude binary (browse/test/fixtures/mock-claude/claude)
* that deterministically emits a canary-leaking tool_use event, then
* verifies the whole pipeline reacts:
*
* 1. Server canary-injects into the system prompt
* 2. Server queues the message
* 3. Sidebar-agent spawns mock-claude
* 4. Mock-claude emits tool_use with CANARY-XXX in a URL arg
* 5. Sidebar-agent's detectCanaryLeak fires on the stream event
* 6. onCanaryLeaked logs, SIGTERM's mock-claude, emits security_event
* 7. /sidebar-chat returns security_event + agent_error entries
*
* This test proves the end-to-end contract: when a canary leak happens,
* the session terminates AND the sidepanel receives the events that drive
* the approved banner render. No LLM cost, <10s total runtime.
*
* Fully deterministic — safe to run on every commit (gate tier).
*/
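// Sketch of what step 5 amounts to: scanning every string inside a stream
// event for the canary token. The real detectCanaryLeak signature is not
// shown in this file, so `containsCanary` and the `Json` type are
// hypothetical stand-ins.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Returns true if any string nested anywhere in `value` contains the canary.
function containsCanary(value: Json, canary: string): boolean {
  if (typeof value === 'string') return value.includes(canary);
  if (Array.isArray(value)) return value.some((item) => containsCanary(item, canary));
  if (value !== null && typeof value === 'object') {
    return Object.values(value).some((item) => containsCanary(item, canary));
  }
  // Numbers, booleans and null cannot carry the token.
  return false;
}
```

// Walking the whole event rather than one known field matters here because
// the mock leaks the canary inside a URL argument of a tool_use, not in
// plain assistant text.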
import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import { spawn, type Subprocess } from 'bun';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';
let serverProc: Subprocess | null = null;
let agentProc: Subprocess | null = null;
let serverPort = 0;
let authToken = '';
let tmpDir = '';
let stateFile = '';
let queueFile = '';
const MOCK_CLAUDE_DIR = path.resolve(import.meta.dir, 'fixtures', 'mock-claude');
async function apiFetch(pathname: string, opts: RequestInit = {}): Promise<Response> {
const headers: Record<string, string> = {
'Content-Type': 'application/json',
Authorization: `Bearer ${authToken}`,
...(opts.headers as Record<string, string> | undefined),
};
return fetch(`http://127.0.0.1:${serverPort}${pathname}`, { ...opts, headers });
}
beforeAll(async () => {
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'security-e2e-fullstack-'));
stateFile = path.join(tmpDir, 'browse.json');
queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
fs.mkdirSync(path.dirname(queueFile), { recursive: true });
const serverScript = path.resolve(import.meta.dir, '..', 'src', 'server.ts');
const agentScript = path.resolve(import.meta.dir, '..', 'src', 'sidebar-agent.ts');
// 1) Start the browse server.
serverProc = spawn(['bun', 'run', serverScript], {
env: {
...process.env,
BROWSE_STATE_FILE: stateFile,
BROWSE_HEADLESS_SKIP: '1', // no Chromium for this test
BROWSE_PORT: '0',
SIDEBAR_QUEUE_PATH: queueFile,
BROWSE_IDLE_TIMEOUT: '300',
},
stdio: ['ignore', 'pipe', 'pipe'],
});
// Wait for state file with token + port
const deadline = Date.now() + 15000;
while (Date.now() < deadline) {
if (fs.existsSync(stateFile)) {
try {
const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
if (state.port && state.token) {
serverPort = state.port;
authToken = state.token;
break;
}
} catch {}
}
await new Promise((r) => setTimeout(r, 100));
}
if (!serverPort) throw new Error('Server did not start in time');
// 2) Start the sidebar-agent with PATH prepended by the mock-claude dir.
// sidebar-agent spawns `claude` via PATH lookup (spawn('claude', ...) — see
// browse/src/sidebar-agent.ts spawnClaude), so prepending works without any
// source change.
const shimmedPath = `${MOCK_CLAUDE_DIR}:${process.env.PATH ?? ''}`;
agentProc = spawn(['bun', 'run', agentScript], {
env: {
...process.env,
PATH: shimmedPath,
BROWSE_STATE_FILE: stateFile,
SIDEBAR_QUEUE_PATH: queueFile,
BROWSE_SERVER_PORT: String(serverPort),
BROWSE_PORT: String(serverPort),
BROWSE_NO_AUTOSTART: '1',
// Scenario for mock-claude inherits through spawn env below — the agent
// itself doesn't read this, but the claude subprocess it spawns does.
MOCK_CLAUDE_SCENARIO: 'canary_leak_in_tool_arg',
// Force classifier off so pre-spawn ML scan doesn't fire on our
// benign synthetic test prompt. This test exercises the canary
// path specifically.
GSTACK_SECURITY_OFF: '1',
},
stdio: ['ignore', 'pipe', 'pipe'],
});
// Give the agent a moment to establish its poll loop.
await new Promise((r) => setTimeout(r, 500));
}, 30000);
async function drainStderr(proc: Subprocess | null, label: string): Promise<void> {
if (!proc?.stderr) return;
try {
const reader = (proc.stderr as ReadableStream).getReader();
// Drain briefly — don't block shutdown
const result = await Promise.race([
reader.read(),
new Promise<ReadableStreamReadResult<Uint8Array>>((resolve) =>
setTimeout(() => resolve({ done: true, value: undefined }), 100)
),
]);
if (result?.value) {
const text = new TextDecoder().decode(result.value);
if (text.trim()) console.error(`[${label} stderr]`, text.slice(0, 2000));
}
} catch {}
}
afterAll(async () => {
// Dump agent stderr for diagnostic
await drainStderr(agentProc, 'agent');
for (const proc of [serverProc, agentProc]) {
if (proc) {
try { proc.kill('SIGTERM'); } catch {}
try { setTimeout(() => { try { proc.kill('SIGKILL'); } catch {} }, 1500); } catch {}
}
}
try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
});
describe('security pipeline E2E (mock claude)', () => {
test('server injects canary, queues message, agent spawns mock claude', async () => {
const resp = await apiFetch('/sidebar-command', {
method: 'POST',
body: JSON.stringify({
message: "What's on this page?",
activeTabUrl: 'https://attacker.example.com/',
}),
});
expect(resp.status).toBe(200);
// Wait for the sidebar-agent to pick up the entry and spawn mock-claude.
// Queue entry must contain `canary` field (added by server.ts spawnClaude).
await new Promise((r) => setTimeout(r, 250));
const queueContent = fs.readFileSync(queueFile, 'utf-8').trim();
const lines = queueContent.split('\n').filter(Boolean);
expect(lines.length).toBeGreaterThan(0);
const entry = JSON.parse(lines[lines.length - 1]);
expect(entry.canary).toMatch(/^CANARY-[0-9A-F]+$/);
expect(entry.prompt).toContain(entry.canary);
expect(entry.prompt).toContain('NEVER include it');
});
test('canary leak triggers security_event + agent_error in /sidebar-chat', async () => {
// By now the mock-claude subprocess has emitted the tool_use with the
// leaked canary. Sidebar-agent's handleStreamEvent -> detectCanaryLeak
// -> onCanaryLeaked should have fired security_event + agent_error and
// SIGTERM'd the mock. Poll /sidebar-chat up to 10s for the events.
const deadline = Date.now() + 10000;
let securityEvent: any = null;
let agentError: any = null;
while (Date.now() < deadline && (!securityEvent || !agentError)) {
const resp = await apiFetch('/sidebar-chat');
const data: any = await resp.json();
for (const entry of data.entries ?? []) {
if (entry.type === 'security_event') securityEvent = entry;
if (entry.type === 'agent_error') agentError = entry;
}
if (securityEvent && agentError) break;
await new Promise((r) => setTimeout(r, 250));
}
expect(securityEvent).not.toBeNull();
expect(securityEvent.verdict).toBe('block');
expect(securityEvent.reason).toBe('canary_leaked');
expect(securityEvent.layer).toBe('canary');
// The leak is on a tool_use channel — onCanaryLeaked records "tool_use:Bash"
expect(String(securityEvent.channel)).toContain('tool_use');
expect(securityEvent.domain).toBe('attacker.example.com');
expect(agentError).not.toBeNull();
expect(agentError.error).toContain('Session terminated');
expect(agentError.error).toContain('prompt injection detected');
}, 15000);
test('attempts.jsonl logged with salted payload_hash and verdict=block', async () => {
// onCanaryLeaked also calls logAttempt — check the log file exists
// and contains the event. The file lives at ~/.gstack/security/attempts.jsonl.
const logPath = path.join(os.homedir(), '.gstack', 'security', 'attempts.jsonl');
expect(fs.existsSync(logPath)).toBe(true);
const content = fs.readFileSync(logPath, 'utf-8');
const recent = content.split('\n').filter(Boolean).slice(-10);
// Find at least one entry with verdict=block and layer=canary from our run
const ourEntry = recent
.map((l) => { try { return JSON.parse(l); } catch { return null; } })
.find((e) => e && e.layer === 'canary' && e.verdict === 'block' && e.urlDomain === 'attacker.example.com');
expect(ourEntry).toBeTruthy();
// payload_hash is a 64-char sha256 hex
expect(String(ourEntry.payloadHash)).toMatch(/^[0-9a-f]{64}$/);
// Never stored the payload itself — only the hash
expect(JSON.stringify(ourEntry)).not.toContain('CANARY-');
});
});
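// Sketch of the audit-log property the last test checks: the record carries
// a 64-char sha256 hex digest and never the payload itself. How the real
// logAttempt salts and hashes is not shown here; plain SHA-256 over salt
// plus payload, and the helper names below, are assumptions. The field names
// layer, verdict, urlDomain and payloadHash are the ones the test reads.

```typescript
import { createHash } from 'node:crypto';

interface AttemptRecord {
  layer: string;
  verdict: string;
  urlDomain: string;
  payloadHash: string;
}

// Hash salt and payload together; the separator keeps ("ab","c") and
// ("a","bc") from producing the same input.
function saltedPayloadHash(salt: string, payload: string): string {
  return createHash('sha256').update(salt).update('\n').update(payload).digest('hex');
}

// Build the record from metadata plus the digest; the payload is not stored.
function buildAttemptRecord(
  salt: string,
  payload: string,
  meta: Omit<AttemptRecord, 'payloadHash'>,
): AttemptRecord {
  return { ...meta, payloadHash: saltedPayloadHash(salt, payload) };
}
```

// Because the same salt and payload always give the same digest, two records
// for one attempt can be correlated without either of them storing the text.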
@@ -1,405 +0,0 @@
/**
* Full-stack review-flow E2E with the real classifier.
*
* Spins up real server + real sidebar-agent subprocess + mock-claude and
* exercises the whole tool-output BLOCK → review → decide path with the
* real TestSavantAI classifier warm. The injection string trips the real
* model reliably (measured: confidence 0.9999 on classic DAN-style text).
*
* What this covers that gate-tier tests don't:
* * Real classifier actually fires on the injection
* * sidebar-agent emits a reviewable security_event for real, not a stub
* * server's POST /security-decision writes the on-disk decision file
* * sidebar-agent's poll loop reads the file and either resumes or kills
* the mock-claude subprocess
* * attempts.jsonl ends up with the right verdict (block vs user_overrode)
*
* This is periodic tier. First run warms the ~112MB classifier from
* HuggingFace — ~30s cold. Subsequent runs use the cached model under
* ~/.gstack/models/testsavant-small/ and complete in ~5s.
*
* SKIPS unless the classifier model is already cached on disk (see
* CLASSIFIER_READY below). The test is truth-seeking only when the stack
* is genuinely up.
*/
import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import { spawn, type Subprocess } from 'bun';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';
const MOCK_CLAUDE_DIR = path.resolve(import.meta.dir, 'fixtures', 'mock-claude');
const WARMUP_TIMEOUT_MS = 90_000; // first-run download budget
const CLASSIFIER_CACHE = path.join(os.homedir(), '.gstack', 'models', 'testsavant-small');
let serverProc: Subprocess | null = null;
let agentProc: Subprocess | null = null;
let serverPort = 0;
let authToken = '';
let tmpDir = '';
let stateFile = '';
let queueFile = '';
let attemptsPath = '';
/**
* Eager check — is the classifier model already on disk? `test.skipIf()`
* is evaluated at file-registration time (before beforeAll runs), so a
* runtime boolean wouldn't work — all tests would unconditionally register
* as skipped. Probe the model dir synchronously at file load.
* Same pattern as security-sidepanel-dom.test.ts uses for chromium.
*/
const CLASSIFIER_READY = (() => {
try {
if (!fs.existsSync(CLASSIFIER_CACHE)) return false;
// At minimum we need the tokenizer config + onnx model.
return fs.existsSync(path.join(CLASSIFIER_CACHE, 'tokenizer.json'))
&& fs.existsSync(path.join(CLASSIFIER_CACHE, 'onnx'));
} catch {
return false;
}
})();
async function apiFetch(pathname: string, opts: RequestInit = {}): Promise<Response> {
return fetch(`http://127.0.0.1:${serverPort}${pathname}`, {
...opts,
headers: {
'Content-Type': 'application/json',
Authorization: `Bearer ${authToken}`,
...(opts.headers as Record<string, string> | undefined),
},
});
}
async function waitForSecurityEntry(
predicate: (entry: any) => boolean,
timeoutMs: number,
): Promise<any | null> {
const deadline = Date.now() + timeoutMs;
while (Date.now() < deadline) {
const resp = await apiFetch('/sidebar-chat');
const data: any = await resp.json();
for (const entry of data.entries ?? []) {
if (entry.type === 'security_event' && predicate(entry)) return entry;
}
await new Promise((r) => setTimeout(r, 250));
}
return null;
}
async function waitForProcessExit(proc: Subprocess, timeoutMs: number): Promise<number | null> {
const deadline = Date.now() + timeoutMs;
while (Date.now() < deadline) {
if (proc.exitCode !== null) return proc.exitCode;
await new Promise((r) => setTimeout(r, 100));
}
return null;
}
async function readAttempts(): Promise<any[]> {
if (!fs.existsSync(attemptsPath)) return [];
const raw = fs.readFileSync(attemptsPath, 'utf-8');
return raw.split('\n').filter(Boolean).map((l) => {
try { return JSON.parse(l); } catch { return null; }
}).filter(Boolean);
}
async function startStack(scenario: string, attemptsDir: string): Promise<void> {
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'security-review-fullstack-'));
stateFile = path.join(tmpDir, 'browse.json');
queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
fs.mkdirSync(path.dirname(queueFile), { recursive: true });
// Re-root HOME for both server and agent so:
// - server.ts's SESSIONS_DIR doesn't load pre-existing chat history
// from ~/.gstack/sidebar-sessions/ (caused ghost security_events to
// leak in from the live /open-gstack-browser session)
// - security.ts's attempts.jsonl writes land in a test-owned dir
// - session-state.json, chromium-profile, etc. stay isolated
fs.mkdirSync(path.join(attemptsDir, '.gstack'), { recursive: true });
// Symlink the models dir through to the real cache — without it the
// sidebar-agent would try to re-download 112MB every test run.
const testModelsDir = path.join(attemptsDir, '.gstack', 'models');
const realModelsDir = path.join(os.homedir(), '.gstack', 'models');
try {
if (fs.existsSync(realModelsDir) && !fs.existsSync(testModelsDir)) {
fs.symlinkSync(realModelsDir, testModelsDir);
}
} catch {
// Symlink may already exist — ignore.
}
const serverScript = path.resolve(import.meta.dir, '..', 'src', 'server.ts');
const agentScript = path.resolve(import.meta.dir, '..', 'src', 'sidebar-agent.ts');
serverProc = spawn(['bun', 'run', serverScript], {
env: {
...process.env,
BROWSE_STATE_FILE: stateFile,
BROWSE_HEADLESS_SKIP: '1',
BROWSE_PORT: '0',
SIDEBAR_QUEUE_PATH: queueFile,
BROWSE_IDLE_TIMEOUT: '300',
HOME: attemptsDir,
},
stdio: ['ignore', 'pipe', 'pipe'],
});
const deadline = Date.now() + 15000;
while (Date.now() < deadline) {
if (fs.existsSync(stateFile)) {
try {
const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
if (state.port && state.token) {
serverPort = state.port;
authToken = state.token;
break;
}
} catch {}
}
await new Promise((r) => setTimeout(r, 100));
}
if (!serverPort) throw new Error('Server did not start in time');
const shimmedPath = `${MOCK_CLAUDE_DIR}:${process.env.PATH ?? ''}`;
agentProc = spawn(['bun', 'run', agentScript], {
env: {
...process.env,
PATH: shimmedPath,
BROWSE_STATE_FILE: stateFile,
SIDEBAR_QUEUE_PATH: queueFile,
BROWSE_SERVER_PORT: String(serverPort),
BROWSE_PORT: String(serverPort),
BROWSE_NO_AUTOSTART: '1',
MOCK_CLAUDE_SCENARIO: scenario,
HOME: attemptsDir,
},
stdio: ['ignore', 'pipe', 'pipe'],
});
attemptsPath = path.join(attemptsDir, '.gstack', 'security', 'attempts.jsonl');
// Give the agent a moment to establish its poll loop and start warming up the model.
await new Promise((r) => setTimeout(r, 500));
}
async function stopStack(): Promise<void> {
for (const proc of [serverProc, agentProc]) {
if (proc) {
try { proc.kill('SIGTERM'); } catch {}
try { setTimeout(() => { try { proc.kill('SIGKILL'); } catch {} }, 1500); } catch {}
}
}
serverProc = null;
agentProc = null;
try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
}
beforeAll(async () => {
// No setup happens here. CLASSIFIER_READY is computed at file load and
// cannot be toggled after test registration: when it is false, every test
// below is skipped via test.skipIf; when it is true, each test starts its
// own stack with startStack(). A cache that exists but fails to load will
// surface as a failing test rather than a skip.
if (!CLASSIFIER_READY) return;
});
afterAll(async () => {
await stopStack();
});
describe('review-flow full-stack E2E', () => {
test.skipIf(!CLASSIFIER_READY)(
'tool_result injection → reviewable event → user ALLOWS → attempts.jsonl has user_overrode',
async () => {
const attemptsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'attempts-allow-'));
try {
await startStack('tool_result_injection', attemptsDir);
// Fire the message that will cause mock-claude to emit the
// injection-laden tool_result.
const resp = await apiFetch('/sidebar-command', {
method: 'POST',
body: JSON.stringify({
message: 'summarize the hacker news comments',
activeTabUrl: 'https://news.ycombinator.com/item?id=42',
}),
});
expect(resp.status).toBe(200);
// Wait for the real classifier to fire and emit a reviewable
// security_event. The classifier is warm so this should happen in
// well under 10s once the tool_result arrives.
const reviewable = await waitForSecurityEntry(
(e) => e.verdict === 'block' && e.reviewable === true,
30_000,
);
expect(reviewable).not.toBeNull();
expect(reviewable.reason).toBe('tool_result_ml');
expect(reviewable.tool).toBe('Bash');
expect(String(reviewable.suspected_text ?? '')).toContain('IGNORE ALL PREVIOUS');
// User clicks Allow via the banner → sidepanel POSTs to server.
const decisionResp = await apiFetch('/security-decision', {
method: 'POST',
body: JSON.stringify({
tabId: reviewable.tabId,
decision: 'allow',
reason: 'user',
}),
});
expect(decisionResp.status).toBe(200);
// Wait for sidebar-agent's poll loop to consume the decision and
// emit a follow-up user_overrode security_event.
const overrode = await waitForSecurityEntry(
(e) => e.verdict === 'user_overrode',
10_000,
);
expect(overrode).not.toBeNull();
// Audit log must capture both the block and the override, in that
// order. Both records share the same salted payload hash so the
// security dashboard can aggregate them as a single attempt.
const attempts = await readAttempts();
const blockLog = attempts.find(
(a) => a.verdict === 'block' && a.layer === 'testsavant_content',
);
const overrodeLog = attempts.find(
(a) => a.verdict === 'user_overrode' && a.layer === 'testsavant_content',
);
expect(blockLog).toBeTruthy();
expect(overrodeLog).toBeTruthy();
expect(overrodeLog.payloadHash).toBe(blockLog.payloadHash);
// Privacy contract: neither record includes the raw payload.
expect(JSON.stringify(overrodeLog)).not.toContain('IGNORE ALL PREVIOUS');
// Liveness: session must actually KEEP RUNNING after Allow. Mock-claude
// emits a second tool_use to post-block-followup.example.com ~8s
// after the tool_result. That event must reach the chat feed, proving
// the sidebar-agent resumed the stream-handler relay instead of
// silently wedging.
const followupDeadline = Date.now() + 20_000;
let followup: any = null;
while (Date.now() < followupDeadline && !followup) {
const chatResp = await apiFetch('/sidebar-chat');
const chatData: any = await chatResp.json();
for (const entry of chatData.entries ?? []) {
const input = String((entry as any).input ?? '');
if (
entry.type === 'tool_use' &&
input.includes('post-block-followup.example.com')
) {
followup = entry;
break;
}
}
if (!followup) await new Promise((r) => setTimeout(r, 300));
}
expect(followup).not.toBeNull();
} finally {
await stopStack();
try { fs.rmSync(attemptsDir, { recursive: true, force: true }); } catch {}
}
},
90_000,
);
test.skipIf(!CLASSIFIER_READY)(
'tool_result injection → reviewable event → user BLOCKS → agent session terminates',
async () => {
const attemptsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'attempts-block-'));
try {
await startStack('tool_result_injection', attemptsDir);
const resp = await apiFetch('/sidebar-command', {
method: 'POST',
body: JSON.stringify({
message: 'summarize the hacker news comments',
activeTabUrl: 'https://news.ycombinator.com/item?id=42',
}),
});
expect(resp.status).toBe(200);
const reviewable = await waitForSecurityEntry(
(e) => e.verdict === 'block' && e.reviewable === true,
30_000,
);
expect(reviewable).not.toBeNull();
const decisionResp = await apiFetch('/security-decision', {
method: 'POST',
body: JSON.stringify({
tabId: reviewable.tabId,
decision: 'block',
reason: 'user',
}),
});
expect(decisionResp.status).toBe(200);
// Wait for the agent_error that the sidebar-agent emits when it
// kills the claude subprocess after a user-confirmed block. This
// is the sidepanel's "Session terminated" signal.
const deadline = Date.now() + 15_000;
let errorEntry: any = null;
while (Date.now() < deadline && !errorEntry) {
const chatResp = await apiFetch('/sidebar-chat');
const chatData: any = await chatResp.json();
for (const entry of chatData.entries ?? []) {
if (
entry.type === 'agent_error' &&
String(entry.error ?? '').includes('Session terminated')
) {
errorEntry = entry;
break;
}
}
if (!errorEntry) await new Promise((r) => setTimeout(r, 200));
}
expect(errorEntry).not.toBeNull();
// attempts.jsonl must NOT have a user_overrode entry for this run.
const attempts = await readAttempts();
const overrodeLog = attempts.find((a) => a.verdict === 'user_overrode');
expect(overrodeLog).toBeFalsy();
// The real security property: after Block, NO FURTHER tool calls
// reach the chat feed. Mock-claude would have emitted a tool_use
// to post-block-followup.example.com ~8s after the tool_result if
// the session had kept running. Wait long enough for that window
// to close (12s total), then assert the followup event never
// appeared. This is what makes "block" actually stop the page —
// the subprocess is SIGTERM'd before it can emit the next event.
await new Promise((r) => setTimeout(r, 12_000));
const finalChatResp = await apiFetch('/sidebar-chat');
const finalChatData: any = await finalChatResp.json();
const followupAttempted = (finalChatData.entries ?? []).some(
(entry: any) =>
entry.type === 'tool_use' &&
String(entry.input ?? '').includes('post-block-followup.example.com'),
);
expect(followupAttempted).toBe(false);
// The server must still answer after the kill. This does not observe
// mock-claude's exit directly; the absent follow-up tool_use above is the
// evidence that the subprocess stopped.
const serverResponsive = (await apiFetch('/sidebar-chat')).ok; // channel still open
expect(serverResponsive).toBe(true);
} finally {
await stopStack();
try { fs.rmSync(attemptsDir, { recursive: true, force: true }); } catch {}
}
},
90_000,
);
test.skipIf(!CLASSIFIER_READY)(
'no decision within 60s → timeout auto-blocks',
async () => {
// This test would naturally take 60s+ to run. We assert the
// decision file semantics instead — the unit-test suite already
// verified the poll loop times out and defaults to block
// (security-review-flow.test.ts). Kept here as a spec marker so
// the scenario is documented in the full-stack file.
expect(true).toBe(true);
},
);
});
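// The placeholder test above documents the rule "no decision within 60s
// defaults to block". A sketch of that rule as a pure function follows.
// `resolveDecision` and its types are hypothetical, not the poll loop in
// sidebar-agent.ts; only the 60s default and the fail-closed outcome come
// from the test title and comment.

```typescript
type Decision = 'allow' | 'block';
type Resolution = Decision | 'pending';

function resolveDecision(
  decision: Decision | null,
  elapsedMs: number,
  timeoutMs = 60_000,
): Resolution {
  // An explicit user decision always wins.
  if (decision !== null) return decision;
  // No decision yet: keep waiting until the deadline, then fail closed.
  return elapsedMs >= timeoutMs ? 'block' : 'pending';
}
```

// Failing closed means an unattended banner can never turn into an implicit
// allow.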
@@ -1,345 +0,0 @@
/**
* Review-flow E2E (sidepanel side, hermetic).
*
* Loads the real extension sidepanel.html in Playwright Chromium, stubs
* the browse server responses, injects a `reviewable: true` security_event
* into /sidebar-chat, and asserts the user-in-the-loop flow end-to-end:
*
* 1. Banner renders with "Review suspected injection" title
* 2. Suspected text excerpt shows up inside the expandable details
* 3. Allow + Block buttons are visible and actionable
* 4. Clicking Allow posts to /security-decision with decision:"allow"
* 5. Clicking Block posts to /security-decision with decision:"block"
* 6. Banner auto-hides after decision
*
* This is the UI-and-wire test. The server-side handshake (decision file
* write + sidebar-agent poll) is covered by security-review-flow.test.ts.
* The full-stack version with real mock-claude + real classifier lives
* in security-review-fullstack.test.ts (periodic tier).
*
* Gate tier. ~3s. Skipped if Playwright chromium is unavailable.
*/
import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import { chromium, type Browser, type Page } from 'playwright';
const EXTENSION_DIR = path.resolve(import.meta.dir, '..', '..', 'extension');
const SIDEPANEL_URL = `file://${EXTENSION_DIR}/sidepanel.html`;
const CHROMIUM_AVAILABLE = (() => {
try {
const exe = chromium.executablePath();
return !!exe && fs.existsSync(exe);
} catch {
return false;
}
})();
interface DecisionCall {
tabId: number;
decision: 'allow' | 'block';
reason?: string;
}
/**
* Install the same stubs the existing sidepanel-dom test uses, plus a
* fetch interceptor that captures POSTs to /security-decision into a
* page-scoped array, window.__decisionCalls, which tests read back via
* page.evaluate().
*/
async function installStubsAndCapture(
page: Page,
scenario: { securityEntries: any[] },
): Promise<void> {
await page.addInitScript((params: any) => {
(window as any).__decisionCalls = [];
(window as any).chrome = {
runtime: {
sendMessage: (_req: any, cb: any) => {
const payload = { connected: true, port: 34567 };
if (typeof cb === 'function') {
setTimeout(() => cb(payload), 0);
return undefined;
}
return Promise.resolve(payload);
},
lastError: null,
onMessage: { addListener: () => {} },
},
tabs: {
query: (_q: any, cb: any) => setTimeout(() => cb([{ id: 1, url: 'https://example.com' }]), 0),
onActivated: { addListener: () => {} },
onUpdated: { addListener: () => {} },
},
};
(window as any).EventSource = class {
constructor() {}
addEventListener() {}
close() {}
};
const scenarioRef = params;
const origFetch = window.fetch;
window.fetch = async function (input: any, init?: any) {
const url = String(input);
if (url.endsWith('/health')) {
return new Response(JSON.stringify({
status: 'healthy',
token: 'test-token',
mode: 'headed',
agent: { status: 'idle', runningFor: null, queueLength: 0 },
session: null,
security: { status: 'protected', layers: { testsavant: 'ok', transcript: 'ok', canary: 'ok' } },
}), { status: 200, headers: { 'Content-Type': 'application/json' } });
}
if (url.includes('/sidebar-chat')) {
return new Response(JSON.stringify({
entries: scenarioRef.securityEntries ?? [],
total: (scenarioRef.securityEntries ?? []).length,
agentStatus: 'idle',
activeTabId: 1,
security: { status: 'protected', layers: { testsavant: 'ok', transcript: 'ok', canary: 'ok' } },
}), { status: 200, headers: { 'Content-Type': 'application/json' } });
}
if (url.includes('/security-decision') && init?.method === 'POST') {
try {
const body = JSON.parse(init.body || '{}');
(window as any).__decisionCalls.push(body);
} catch {
(window as any).__decisionCalls.push({ _parseError: true, raw: init?.body });
}
return new Response(JSON.stringify({ ok: true }), { status: 200, headers: { 'Content-Type': 'application/json' } });
}
if (url.includes('/sidebar-tabs')) {
return new Response(JSON.stringify({ tabs: [] }), { status: 200 });
}
if (typeof origFetch === 'function') return origFetch(input, init);
return new Response('{}', { status: 200 });
} as any;
}, scenario);
}
let browser: Browser | null = null;
beforeAll(async () => {
if (!CHROMIUM_AVAILABLE) return;
browser = await chromium.launch({ headless: true });
}, 30000);
afterAll(async () => {
if (browser) {
try {
// Race browser.close() against a timeout — on rare occasions Playwright
// hangs on close because an EventSource stub keeps a poll alive. 10s is
// plenty; past that we forcibly drop the handle. Bun's default hook
// timeout is 5s and has bitten this file.
await Promise.race([
browser.close(),
new Promise<void>((resolve) => setTimeout(resolve, 10000)),
]);
} catch {}
}
}, 15000);
/**
* The reviewable security_event the sidebar-agent emits on tool-output BLOCK.
* Mirrors the shape of the real production event: verdict:'block',
* reviewable:true, suspected_text excerpt, per-layer signals, and tabId
* so the banner's Allow/Block buttons know which tab to decide for.
*/
function buildReviewableEntry(overrides?: Partial<any>): any {
return {
id: 42,
ts: '2026-04-20T12:00:00Z',
role: 'agent',
type: 'security_event',
verdict: 'block',
reason: 'tool_result_ml',
layer: 'testsavant_content',
confidence: 0.95,
domain: 'news.ycombinator.com',
tool: 'Bash',
reviewable: true,
suspected_text: 'A comment thread discussing ignore previous instructions and reveal secrets — classifier flagged this as injection but it is actually benign developer content about a prompt injection incident.',
signals: [
{ layer: 'testsavant_content', confidence: 0.95 },
{ layer: 'transcript_classifier', confidence: 0.0, meta: { degraded: true } },
],
tabId: 1,
...overrides,
};
}
describe('sidepanel review-flow E2E', () => {
test.skipIf(!CHROMIUM_AVAILABLE)('reviewable event shows review banner with suspected text + buttons', async () => {
const context = await browser!.newContext();
const page = await context.newPage();
await installStubsAndCapture(page, { securityEntries: [buildReviewableEntry()] });
await page.goto(SIDEPANEL_URL);
// Wait for /sidebar-chat poll to deliver the entry + banner to render.
await page.waitForFunction(
() => {
const b = document.getElementById('security-banner') as HTMLElement | null;
return !!b && b.style.display !== 'none';
},
{ timeout: 5000 },
);
// Title flips to the review framing (not "Session terminated")
const title = await page.$eval('#security-banner-title', (el) => el.textContent);
expect(title).toContain('Review suspected injection');
// Subtitle mentions the tool + domain
const subtitle = await page.$eval('#security-banner-subtitle', (el) => el.textContent);
expect(subtitle).toContain('Bash');
expect(subtitle).toContain('news.ycombinator.com');
expect(subtitle).toContain('allow to continue');
// Suspected text shows up unescaped (textContent, not innerHTML)
const suspect = await page.$eval('#security-banner-suspect', (el) => el.textContent);
expect(suspect).toContain('ignore previous instructions');
// Both action buttons are visible
const allowVisible = await page.locator('#security-banner-btn-allow').isVisible();
const blockVisible = await page.locator('#security-banner-btn-block').isVisible();
expect(allowVisible).toBe(true);
expect(blockVisible).toBe(true);
// Details auto-expanded so the user sees context
const detailsHidden = await page.$eval('#security-banner-details', (el) => (el as HTMLElement).hidden);
expect(detailsHidden).toBe(false);
await context.close();
}, 15000);
test.skipIf(!CHROMIUM_AVAILABLE)('clicking Allow posts {decision:"allow"} and hides banner', async () => {
const context = await browser!.newContext();
const page = await context.newPage();
await installStubsAndCapture(page, { securityEntries: [buildReviewableEntry()] });
await page.goto(SIDEPANEL_URL);
await page.waitForSelector('#security-banner-btn-allow:visible', { timeout: 5000 });
await page.click('#security-banner-btn-allow');
// Decision POST should have fired with decision:"allow" and the tabId
// from the security_event. Give the fetch promise a tick to resolve.
await page.waitForFunction(
() => (window as any).__decisionCalls?.length > 0,
{ timeout: 2000 },
);
const calls = await page.evaluate(() => (window as any).__decisionCalls);
expect(calls).toHaveLength(1);
expect(calls[0].decision).toBe('allow');
expect(calls[0].tabId).toBe(1);
expect(calls[0].reason).toBe('user');
// Banner should hide optimistically after the POST
await page.waitForFunction(
() => {
const b = document.getElementById('security-banner') as HTMLElement | null;
return !!b && b.style.display === 'none';
},
{ timeout: 2000 },
);
await context.close();
}, 15000);
test.skipIf(!CHROMIUM_AVAILABLE)('clicking Block posts {decision:"block"} and hides banner', async () => {
const context = await browser!.newContext();
const page = await context.newPage();
await installStubsAndCapture(page, { securityEntries: [buildReviewableEntry({ id: 55 })] });
await page.goto(SIDEPANEL_URL);
await page.waitForSelector('#security-banner-btn-block:visible', { timeout: 5000 });
await page.click('#security-banner-btn-block');
await page.waitForFunction(
() => (window as any).__decisionCalls?.length > 0,
{ timeout: 2000 },
);
const calls = await page.evaluate(() => (window as any).__decisionCalls);
expect(calls).toHaveLength(1);
expect(calls[0].decision).toBe('block');
expect(calls[0].tabId).toBe(1);
await page.waitForFunction(
() => {
const b = document.getElementById('security-banner') as HTMLElement | null;
return !!b && b.style.display === 'none';
},
{ timeout: 2000 },
);
await context.close();
}, 15000);
test.skipIf(!CHROMIUM_AVAILABLE)('non-reviewable event still shows hard-stop banner with no buttons', async () => {
// Regression guard: the existing hard-stop canary leak UX must not be
// disturbed by the reviewable branch. An event without reviewable:true
// keeps the old behavior.
const hardStop = {
id: 99,
ts: '2026-04-20T12:00:00Z',
role: 'agent',
type: 'security_event',
verdict: 'block',
reason: 'canary_leaked',
layer: 'canary',
confidence: 1.0,
domain: 'attacker.example.com',
channel: 'tool_use:Bash',
tabId: 1,
};
const context = await browser!.newContext();
const page = await context.newPage();
await installStubsAndCapture(page, { securityEntries: [hardStop] });
await page.goto(SIDEPANEL_URL);
await page.waitForFunction(
() => {
const b = document.getElementById('security-banner') as HTMLElement | null;
return !!b && b.style.display !== 'none';
},
{ timeout: 5000 },
);
const title = await page.$eval('#security-banner-title', (el) => el.textContent);
expect(title).toContain('Session terminated');
// Action row stays hidden for the non-reviewable path
const actionsHidden = await page.$eval('#security-banner-actions', (el) => (el as HTMLElement).hidden);
expect(actionsHidden).toBe(true);
await context.close();
}, 15000);
test.skipIf(!CHROMIUM_AVAILABLE)('suspected text renders via textContent, not innerHTML (XSS guard)', async () => {
// If the sidepanel ever regressed to innerHTML for the suspected text,
// a crafted excerpt could execute script. This test uses one; if the
// <script> runs, window.__xss gets set. It must remain undefined.
const xssAttempt = buildReviewableEntry({
suspected_text: '<script>window.__xss = "pwn"</script><img src=x onerror="window.__xss=\'onerror\'">',
});
const context = await browser!.newContext();
const page = await context.newPage();
await installStubsAndCapture(page, { securityEntries: [xssAttempt] });
await page.goto(SIDEPANEL_URL);
await page.waitForSelector('#security-banner-suspect:not([hidden])', { timeout: 5000 });
// The literal text should appear inside the suspect block (as text, not markup)
const suspectText = await page.$eval('#security-banner-suspect', (el) => el.textContent);
expect(suspectText).toContain('<script>');
// No script executed
const xssFlag = await page.evaluate(() => (window as any).__xss);
expect(xssFlag).toBeUndefined();
await context.close();
}, 15000);
});
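The banner tests above hinge on one branch: an entry carrying `reviewable: true` gets the review framing with Allow/Block buttons, and anything else keeps the hard-stop framing. A minimal sketch of that branch as a pure function follows. The `bannerFor` helper and the `SecurityEntry` / `BannerView` shapes are hypothetical names used only for illustration; the title strings come from the assertions above, and the real sidepanel code may be structured differently.

```typescript
// Hypothetical shapes for illustration; not the extension's actual types.
interface SecurityEntry {
  type: 'security_event';
  reviewable?: boolean;
  suspected_text?: string;
}

interface BannerView {
  title: string;
  showActions: boolean; // whether the Allow / Block row is visible
  suspectText: string | null; // meant to be assigned via textContent, never innerHTML
}

// Decide which banner framing an entry gets (assumed logic, inferred from the tests).
function bannerFor(entry: SecurityEntry): BannerView {
  if (entry.reviewable === true) {
    return {
      title: 'Review suspected injection',
      showActions: true,
      suspectText: entry.suspected_text ?? null,
    };
  }
  // Anything without reviewable:true keeps the hard-stop behavior.
  return { title: 'Session terminated', showActions: false, suspectText: null };
}
```

Keeping the decision in a pure function like this would let the branch be unit-tested without Chromium, with the E2E tests left to cover the DOM wiring.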
-226
@@ -1,226 +0,0 @@
/**
* Layer 3: Sidebar agent round-trip tests.
* Starts server + sidebar-agent together. Mocks the `claude` binary with a shell
* script that outputs canned stream-json. Verifies events flow end-to-end:
* POST /sidebar-command → queue → sidebar-agent → mock claude → events → /sidebar-chat
*/
import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import { spawn, type Subprocess } from 'bun';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';
let serverProc: Subprocess | null = null;
let agentProc: Subprocess | null = null;
let serverPort: number = 0;
let authToken: string = '';
let tmpDir: string = '';
let stateFile: string = '';
let queueFile: string = '';
let mockBinDir: string = '';
async function api(pathname: string, opts: RequestInit = {}): Promise<Response> {
const headers: Record<string, string> = {
'Content-Type': 'application/json',
...(opts.headers as Record<string, string> || {}),
};
if (!headers['Authorization'] && authToken) {
headers['Authorization'] = `Bearer ${authToken}`;
}
return fetch(`http://127.0.0.1:${serverPort}${pathname}`, { ...opts, headers });
}
async function resetState() {
await api('/sidebar-session/new', { method: 'POST' });
fs.writeFileSync(queueFile, '');
}
async function pollChatUntil(
predicate: (entries: any[]) => boolean,
timeoutMs = 10000,
): Promise<any[]> {
const deadline = Date.now() + timeoutMs;
while (Date.now() < deadline) {
const resp = await api('/sidebar-chat?after=0');
const data = await resp.json();
if (predicate(data.entries)) return data.entries;
await new Promise(r => setTimeout(r, 300));
}
// Return whatever we have on timeout
const resp = await api('/sidebar-chat?after=0');
return (await resp.json()).entries;
}
function writeMockClaude(script: string) {
const mockPath = path.join(mockBinDir, 'claude');
fs.writeFileSync(mockPath, script, { mode: 0o755 });
}
beforeAll(async () => {
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-roundtrip-'));
stateFile = path.join(tmpDir, 'browse.json');
queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
mockBinDir = path.join(tmpDir, 'bin');
fs.mkdirSync(mockBinDir, { recursive: true });
fs.mkdirSync(path.dirname(queueFile), { recursive: true });
// Write default mock claude that outputs canned events
writeMockClaude(`#!/bin/bash
echo '{"type":"system","session_id":"mock-session-123"}'
echo '{"type":"assistant","message":{"content":[{"type":"text","text":"I can see the page. It looks like a test fixture."}]}}'
echo '{"type":"result","result":"Done."}'
`);
// Start server (no browser)
const serverScript = path.resolve(__dirname, '..', 'src', 'server.ts');
serverProc = spawn(['bun', 'run', serverScript], {
env: {
...process.env,
BROWSE_STATE_FILE: stateFile,
BROWSE_HEADLESS_SKIP: '1',
BROWSE_PORT: '0',
SIDEBAR_QUEUE_PATH: queueFile,
BROWSE_IDLE_TIMEOUT: '300',
},
stdio: ['ignore', 'pipe', 'pipe'],
});
// Wait for server
const deadline = Date.now() + 15000;
while (Date.now() < deadline) {
if (fs.existsSync(stateFile)) {
try {
const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
if (state.port && state.token) {
serverPort = state.port;
authToken = state.token;
break;
}
} catch { /* unreadable or partial state file: retry on next iteration */ }
}
await new Promise(r => setTimeout(r, 100));
}
if (!serverPort) throw new Error('Server did not start in time');
// Start sidebar-agent with mock claude on PATH
const agentScript = path.resolve(__dirname, '..', 'src', 'sidebar-agent.ts');
agentProc = spawn(['bun', 'run', agentScript], {
env: {
...process.env,
PATH: `${mockBinDir}:${process.env.PATH}`,
BROWSE_SERVER_PORT: String(serverPort),
BROWSE_STATE_FILE: stateFile,
SIDEBAR_QUEUE_PATH: queueFile,
SIDEBAR_AGENT_TIMEOUT: '10000',
BROWSE_BIN: 'browse', // doesn't matter, mock claude doesn't use it
},
stdio: ['ignore', 'pipe', 'pipe'],
});
// Give sidebar-agent time to start polling
await new Promise(r => setTimeout(r, 1000));
}, 20000);
afterAll(() => {
if (agentProc) { try { agentProc.kill(); } catch {} }
if (serverProc) { try { serverProc.kill(); } catch {} }
try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
});
describe('sidebar-agent round-trip', () => {
test('full message round-trip with mock claude', async () => {
await resetState();
// Send a command
const resp = await api('/sidebar-command', {
method: 'POST',
body: JSON.stringify({
message: 'what is on this page?',
activeTabUrl: 'https://example.com/test',
}),
});
expect(resp.status).toBe(200);
// Wait for mock claude to process and events to arrive
const entries = await pollChatUntil(
(entries) => entries.some((e: any) => e.type === 'agent_done'),
15000,
);
// Verify the flow: user message → agent_start → text → agent_done
const userEntry = entries.find((e: any) => e.role === 'user');
expect(userEntry).toBeDefined();
expect(userEntry.message).toBe('what is on this page?');
// The mock claude outputs text — check for any agent text entry
const textEntries = entries.filter((e: any) => e.role === 'agent' && (e.type === 'text' || e.type === 'result'));
expect(textEntries.length).toBeGreaterThan(0);
const doneEntry = entries.find((e: any) => e.type === 'agent_done');
expect(doneEntry).toBeDefined();
// Agent should be back to idle
const session = await (await api('/sidebar-session')).json();
expect(session.agent.status).toBe('idle');
}, 20000);
test('claude crash produces agent_error', async () => {
await resetState();
// Replace mock claude with one that crashes
writeMockClaude(`#!/bin/bash
echo '{"type":"system","session_id":"crash-test"}' >&2
exit 1
`);
await api('/sidebar-command', {
method: 'POST',
body: JSON.stringify({ message: 'crash test' }),
});
// Wait for agent_done or agent_error (sidebar-agent sends agent_done even on crash via proc.on('close'))
const entries = await pollChatUntil(
(entries) => entries.some((e: any) => e.type === 'agent_done' || e.type === 'agent_error'),
15000,
);
// Agent should recover to idle
const session = await (await api('/sidebar-session')).json();
expect(session.agent.status).toBe('idle');
// Restore working mock
writeMockClaude(`#!/bin/bash
echo '{"type":"assistant","message":{"content":[{"type":"text","text":"recovered"}]}}'
`);
}, 20000);
test('sequential queue drain', async () => {
await resetState();
// Restore working mock
writeMockClaude(`#!/bin/bash
echo '{"type":"assistant","message":{"content":[{"type":"text","text":"response to: '"'"'$*'"'"'"}]}}'
`);
// Send two messages rapidly — first processes, second queues
await api('/sidebar-command', {
method: 'POST',
body: JSON.stringify({ message: 'first message' }),
});
await api('/sidebar-command', {
method: 'POST',
body: JSON.stringify({ message: 'second message' }),
});
// Wait for both to complete (two agent_done events)
const entries = await pollChatUntil(
(entries) => entries.filter((e: any) => e.type === 'agent_done').length >= 2,
20000,
);
// Both user messages should be in chat
const userEntries = entries.filter((e: any) => e.role === 'user');
expect(userEntries.length).toBeGreaterThanOrEqual(2);
}, 25000);
});
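The `beforeAll` hook above treats the server as ready only once the state file parses as JSON and carries both `port` and `token`. A sketch of that check as a pure function is shown here, assuming a hypothetical `parseReadyState` name and `ReadyState` type (the test itself inlines this logic rather than calling a helper).

```typescript
// Hypothetical type for illustration; the real state file may carry more fields.
interface ReadyState {
  port: number;
  token: string;
}

// Returns the port/token pair once the state file is complete, else null.
function parseReadyState(raw: string): ReadyState | null {
  try {
    const state = JSON.parse(raw);
    // Both fields must be present and truthy, mirroring `state.port && state.token`.
    if (state && state.port && state.token) {
      return { port: Number(state.port), token: String(state.token) };
    }
  } catch {
    // Partial or invalid JSON: the caller is expected to retry.
  }
  return null;
}
```

A caller would read the file, pass its contents to this function, and keep polling until the result is non-null or the deadline passes.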
-562
@@ -1,562 +0,0 @@
/**
* Tests for sidebar agent queue parsing and inbox writing.
*
* sidebar-agent.ts functions are not exported (it's an entry-point script),
* so we test the same logic inline: JSONL parsing, writeToInbox filesystem
* behavior, and edge cases.
*/
import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
// ─── Helpers: replicate sidebar-agent logic for unit testing ──────
/** Parse a single JSONL line — same logic as sidebar-agent poll() */
function parseQueueLine(line: string): any | null {
if (!line.trim()) return null;
try {
const entry = JSON.parse(line);
if (!entry.message && !entry.prompt) return null;
return entry;
} catch {
return null;
}
}
/** Read all valid entries from a JSONL string — same as countLines + readLine loop */
function parseQueueFile(content: string): any[] {
const entries: any[] = [];
const lines = content.split('\n').filter(Boolean);
for (const line of lines) {
const entry = parseQueueLine(line);
if (entry) entries.push(entry);
}
return entries;
}
/** Write to inbox — extracted logic from sidebar-agent.ts writeToInbox() */
function writeToInbox(
gitRoot: string,
message: string,
pageUrl?: string,
sessionId?: string,
): string | null {
if (!gitRoot) return null;
const inboxDir = path.join(gitRoot, '.context', 'sidebar-inbox');
fs.mkdirSync(inboxDir, { recursive: true });
const now = new Date();
const timestamp = now.toISOString().replace(/:/g, '-');
const filename = `${timestamp}-observation.json`;
const tmpFile = path.join(inboxDir, `.${filename}.tmp`);
const finalFile = path.join(inboxDir, filename);
const inboxMessage = {
type: 'observation',
timestamp: now.toISOString(),
page: { url: pageUrl || 'unknown', title: '' },
userMessage: message,
sidebarSessionId: sessionId || 'unknown',
};
fs.writeFileSync(tmpFile, JSON.stringify(inboxMessage, null, 2));
fs.renameSync(tmpFile, finalFile);
return finalFile;
}
/** Shorten paths — same logic as sidebar-agent.ts shorten() */
function shorten(str: string): string {
return str
.replace(/\/Users\/[^/]+/g, '~')
.replace(/\/conductor\/workspaces\/[^/]+\/[^/]+/g, '')
.replace(/\.claude\/skills\/gstack\//g, '')
.replace(/browse\/dist\/browse/g, '$B');
}
/** describeToolCall — replicated from sidebar-agent.ts for unit testing */
function describeToolCall(tool: string, input: any): string {
if (!input) return '';
if (tool === 'Bash' && input.command) {
const cmd = input.command;
const browseMatch = cmd.match(/\$B\s+(\w+)|browse[^\s]*\s+(\w+)/);
if (browseMatch) {
const browseCmd = browseMatch[1] || browseMatch[2];
const args = cmd.split(/\s+/).slice(2).join(' ');
switch (browseCmd) {
case 'goto': return `Opening ${args.replace(/['"]/g, '')}`;
case 'snapshot': return args.includes('-i') ? 'Scanning for interactive elements' : args.includes('-D') ? 'Checking what changed' : 'Taking a snapshot of the page';
case 'screenshot': return `Saving screenshot${args ? ` to ${shorten(args)}` : ''}`;
case 'click': return `Clicking ${args}`;
case 'fill': { const parts = args.split(/\s+/); return `Typing "${parts.slice(1).join(' ')}" into ${parts[0]}`; }
case 'text': return 'Reading page text';
case 'html': return args ? `Reading HTML of ${args}` : 'Reading full page HTML';
case 'links': return 'Finding all links on the page';
case 'forms': return 'Looking for forms';
case 'console': return 'Checking browser console for errors';
case 'network': return 'Checking network requests';
case 'url': return 'Checking current URL';
case 'back': return 'Going back';
case 'forward': return 'Going forward';
case 'reload': return 'Reloading the page';
case 'scroll': return args ? `Scrolling to ${args}` : 'Scrolling down';
case 'wait': return `Waiting for ${args}`;
case 'inspect': return args ? `Inspecting CSS of ${args}` : 'Getting CSS for last picked element';
case 'style': return `Changing CSS: ${args}`;
case 'cleanup': return 'Removing page clutter (ads, popups, banners)';
case 'prettyscreenshot': return 'Taking a clean screenshot';
case 'css': return `Checking CSS property: ${args}`;
case 'is': return `Checking if element is ${args}`;
case 'diff': return `Comparing ${args}`;
case 'responsive': return 'Taking screenshots at mobile, tablet, and desktop sizes';
case 'status': return 'Checking browser status';
case 'tabs': return 'Listing open tabs';
case 'focus': return 'Bringing browser to front';
case 'select': return `Selecting option in ${args}`;
case 'hover': return `Hovering over ${args}`;
case 'viewport': return `Setting viewport to ${args}`;
case 'upload': return `Uploading file to ${args.split(/\s+/)[0]}`;
default: return `Running browse ${browseCmd} ${args}`.trim();
}
}
if (cmd.includes('git ')) return `Running: ${shorten(cmd)}`;
const short = shorten(cmd);
return short.length > 100 ? short.slice(0, 100) + '…' : short;
}
if (tool === 'Read' && input.file_path) return `Reading ${shorten(input.file_path)}`;
if (tool === 'Edit' && input.file_path) return `Editing ${shorten(input.file_path)}`;
if (tool === 'Write' && input.file_path) return `Writing ${shorten(input.file_path)}`;
if (tool === 'Grep' && input.pattern) return `Searching for "${input.pattern}"`;
if (tool === 'Glob' && input.pattern) return `Finding files matching ${input.pattern}`;
try { return shorten(JSON.stringify(input)).slice(0, 80); } catch { return ''; }
}
// ─── Test setup ──────────────────────────────────────────────────
let tmpDir: string;
beforeEach(() => {
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-agent-test-'));
});
afterEach(() => {
fs.rmSync(tmpDir, { recursive: true, force: true });
});
// ─── Queue File Parsing ─────────────────────────────────────────
describe('queue file parsing', () => {
test('valid JSONL line parsed correctly', () => {
const line = JSON.stringify({ message: 'hello', prompt: 'check this', pageUrl: 'https://example.com' });
const entry = parseQueueLine(line);
expect(entry).not.toBeNull();
expect(entry.message).toBe('hello');
expect(entry.prompt).toBe('check this');
expect(entry.pageUrl).toBe('https://example.com');
});
test('malformed JSON line skipped without crash', () => {
const entry = parseQueueLine('this is not json {{{');
expect(entry).toBeNull();
});
test('valid JSON without message or prompt is skipped', () => {
const line = JSON.stringify({ foo: 'bar' });
const entry = parseQueueLine(line);
expect(entry).toBeNull();
});
test('empty file returns no entries', () => {
const entries = parseQueueFile('');
expect(entries).toEqual([]);
});
test('file with blank lines returns no entries', () => {
const entries = parseQueueFile('\n\n\n');
expect(entries).toEqual([]);
});
test('mixed valid and invalid lines', () => {
const content = [
JSON.stringify({ message: 'first' }),
'not json',
JSON.stringify({ unrelated: true }),
JSON.stringify({ message: 'second', prompt: 'do stuff' }),
].join('\n');
const entries = parseQueueFile(content);
expect(entries.length).toBe(2);
expect(entries[0].message).toBe('first');
expect(entries[1].message).toBe('second');
});
});
// ─── writeToInbox ────────────────────────────────────────────────
describe('writeToInbox', () => {
test('creates .context/sidebar-inbox/ directory', () => {
writeToInbox(tmpDir, 'test message');
const inboxDir = path.join(tmpDir, '.context', 'sidebar-inbox');
expect(fs.existsSync(inboxDir)).toBe(true);
expect(fs.statSync(inboxDir).isDirectory()).toBe(true);
});
test('writes valid JSON file', () => {
const filePath = writeToInbox(tmpDir, 'test message', 'https://example.com', 'session-123');
expect(filePath).not.toBeNull();
expect(fs.existsSync(filePath!)).toBe(true);
const data = JSON.parse(fs.readFileSync(filePath!, 'utf-8'));
expect(data.type).toBe('observation');
expect(data.userMessage).toBe('test message');
expect(data.page.url).toBe('https://example.com');
expect(data.sidebarSessionId).toBe('session-123');
expect(data.timestamp).toBeTruthy();
});
test('atomic write — final file exists, no .tmp left', () => {
const filePath = writeToInbox(tmpDir, 'atomic test');
expect(filePath).not.toBeNull();
expect(fs.existsSync(filePath!)).toBe(true);
// Check no .tmp files remain in the inbox directory
const inboxDir = path.join(tmpDir, '.context', 'sidebar-inbox');
const files = fs.readdirSync(inboxDir);
const tmpFiles = files.filter(f => f.endsWith('.tmp'));
expect(tmpFiles.length).toBe(0);
// Final file should end with -observation.json
const jsonFiles = files.filter(f => f.endsWith('-observation.json') && !f.startsWith('.'));
expect(jsonFiles.length).toBe(1);
});
test('handles missing git root gracefully', () => {
const result = writeToInbox('', 'test');
expect(result).toBeNull();
});
test('defaults pageUrl to unknown when not provided', () => {
const filePath = writeToInbox(tmpDir, 'no url provided');
expect(filePath).not.toBeNull();
const data = JSON.parse(fs.readFileSync(filePath!, 'utf-8'));
expect(data.page.url).toBe('unknown');
});
test('defaults sessionId to unknown when not provided', () => {
const filePath = writeToInbox(tmpDir, 'no session');
expect(filePath).not.toBeNull();
const data = JSON.parse(fs.readFileSync(filePath!, 'utf-8'));
expect(data.sidebarSessionId).toBe('unknown');
});
test('multiple writes create separate files', () => {
writeToInbox(tmpDir, 'message 1');
// Tiny delay to ensure different timestamps
const t = Date.now();
while (Date.now() === t) {} // spin until next ms
writeToInbox(tmpDir, 'message 2');
const inboxDir = path.join(tmpDir, '.context', 'sidebar-inbox');
const files = fs.readdirSync(inboxDir).filter(f => f.endsWith('.json') && !f.startsWith('.'));
expect(files.length).toBe(2);
});
});
// ─── describeToolCall (verbose narration) ────────────────────────
describe('describeToolCall', () => {
// Browse navigation commands
test('goto → plain English with URL', () => {
const result = describeToolCall('Bash', { command: '$B goto https://example.com' });
expect(result).toBe('Opening https://example.com');
});
test('goto strips quotes from URL', () => {
const result = describeToolCall('Bash', { command: '$B goto "https://example.com"' });
expect(result).toBe('Opening https://example.com');
});
test('url → checking current URL', () => {
expect(describeToolCall('Bash', { command: '$B url' })).toBe('Checking current URL');
});
test('back/forward/reload → plain English', () => {
expect(describeToolCall('Bash', { command: '$B back' })).toBe('Going back');
expect(describeToolCall('Bash', { command: '$B forward' })).toBe('Going forward');
expect(describeToolCall('Bash', { command: '$B reload' })).toBe('Reloading the page');
});
// Snapshot variants
test('snapshot -i → scanning for interactive elements', () => {
expect(describeToolCall('Bash', { command: '$B snapshot -i' })).toBe('Scanning for interactive elements');
});
test('snapshot -D → checking what changed', () => {
expect(describeToolCall('Bash', { command: '$B snapshot -D' })).toBe('Checking what changed');
});
test('snapshot (plain) → taking a snapshot', () => {
expect(describeToolCall('Bash', { command: '$B snapshot' })).toBe('Taking a snapshot of the page');
});
// Interaction commands
test('click → clicking element', () => {
expect(describeToolCall('Bash', { command: '$B click @e3' })).toBe('Clicking @e3');
});
test('fill → typing into element', () => {
expect(describeToolCall('Bash', { command: '$B fill @e4 "hello world"' })).toBe('Typing ""hello world"" into @e4');
});
test('scroll with selector → scrolling to element', () => {
expect(describeToolCall('Bash', { command: '$B scroll .footer' })).toBe('Scrolling to .footer');
});
test('scroll without args → scrolling down', () => {
expect(describeToolCall('Bash', { command: '$B scroll' })).toBe('Scrolling down');
});
// Reading commands
test('text → reading page text', () => {
expect(describeToolCall('Bash', { command: '$B text' })).toBe('Reading page text');
});
test('html with selector → reading HTML of element', () => {
expect(describeToolCall('Bash', { command: '$B html .header' })).toBe('Reading HTML of .header');
});
test('html without selector → reading full page HTML', () => {
expect(describeToolCall('Bash', { command: '$B html' })).toBe('Reading full page HTML');
});
test('links → finding all links', () => {
expect(describeToolCall('Bash', { command: '$B links' })).toBe('Finding all links on the page');
});
test('console → checking console', () => {
expect(describeToolCall('Bash', { command: '$B console' })).toBe('Checking browser console for errors');
});
// Inspector commands
test('inspect with selector → inspecting CSS', () => {
expect(describeToolCall('Bash', { command: '$B inspect .header' })).toBe('Inspecting CSS of .header');
});
test('inspect without args → getting last picked element', () => {
expect(describeToolCall('Bash', { command: '$B inspect' })).toBe('Getting CSS for last picked element');
});
test('style → changing CSS', () => {
expect(describeToolCall('Bash', { command: '$B style .header color red' })).toBe('Changing CSS: .header color red');
});
test('cleanup → removing page clutter', () => {
expect(describeToolCall('Bash', { command: '$B cleanup --all' })).toBe('Removing page clutter (ads, popups, banners)');
});
// Visual commands
test('screenshot → saving screenshot', () => {
expect(describeToolCall('Bash', { command: '$B screenshot /tmp/shot.png' })).toBe('Saving screenshot to /tmp/shot.png');
});
test('screenshot without path', () => {
expect(describeToolCall('Bash', { command: '$B screenshot' })).toBe('Saving screenshot');
});
test('responsive → multi-size screenshots', () => {
expect(describeToolCall('Bash', { command: '$B responsive' })).toBe('Taking screenshots at mobile, tablet, and desktop sizes');
});
// Non-browse tools
test('Read tool → reading file', () => {
expect(describeToolCall('Read', { file_path: '/Users/foo/project/src/app.ts' })).toBe('Reading ~/project/src/app.ts');
});
test('Grep tool → searching for pattern', () => {
expect(describeToolCall('Grep', { pattern: 'handleClick' })).toBe('Searching for "handleClick"');
});
test('Glob tool → finding files', () => {
expect(describeToolCall('Glob', { pattern: '**/*.tsx' })).toBe('Finding files matching **/*.tsx');
});
test('Edit tool → editing file', () => {
expect(describeToolCall('Edit', { file_path: '/Users/foo/src/main.ts' })).toBe('Editing ~/src/main.ts');
});
// Edge cases
test('null input → empty string', () => {
expect(describeToolCall('Bash', null)).toBe('');
});
test('unknown browse command → generic description', () => {
expect(describeToolCall('Bash', { command: '$B newtab https://foo.com' })).toContain('newtab');
});
test('non-browse bash → shortened command', () => {
expect(describeToolCall('Bash', { command: 'echo hello' })).toBe('echo hello');
});
test('full browse binary path recognized', () => {
const result = describeToolCall('Bash', { command: '/Users/garrytan/.claude/skills/gstack/browse/dist/browse goto https://example.com' });
expect(result).toBe('Opening https://example.com');
});
test('tab command → switching tab', () => {
expect(describeToolCall('Bash', { command: '$B tab 2' })).toContain('tab');
});
});
// ─── Per-tab agent concurrency (source code validation) ──────────
describe('per-tab agent concurrency', () => {
const serverSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'server.ts'), 'utf-8');
const agentSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'sidebar-agent.ts'), 'utf-8');
test('server has per-tab agent state map', () => {
expect(serverSrc).toContain('tabAgents');
expect(serverSrc).toContain('TabAgentState');
expect(serverSrc).toContain('getTabAgent');
});
test('server returns per-tab agent status in /sidebar-chat', () => {
expect(serverSrc).toContain('getTabAgentStatus');
expect(serverSrc).toContain('tabAgentStatus');
});
test('spawnClaude accepts forTabId parameter', () => {
const spawnFn = serverSrc.slice(
serverSrc.indexOf('function spawnClaude('),
serverSrc.indexOf('\nfunction ', serverSrc.indexOf('function spawnClaude(') + 1),
);
expect(spawnFn).toContain('forTabId');
expect(spawnFn).toContain('tabState.status');
});
test('sidebar-command endpoint uses per-tab agent state', () => {
expect(serverSrc).toContain('msgTabId');
expect(serverSrc).toContain('tabState.status');
expect(serverSrc).toContain('tabState.queue');
});
test('agent event handler resets per-tab state', () => {
expect(serverSrc).toContain('eventTabId');
expect(serverSrc).toContain('tabState.status = \'idle\'');
});
test('agent event handler processes per-tab queue', () => {
// After agent_done, should process next message from THIS tab's queue
expect(serverSrc).toContain('tabState.queue.length > 0');
expect(serverSrc).toContain('tabState.queue.shift');
});
test('sidebar-agent uses per-tab processing set', () => {
expect(agentSrc).toContain('processingTabs');
expect(agentSrc).not.toContain('isProcessing');
});
test('sidebar-agent sends tabId with all events', () => {
// sendEvent should accept tabId parameter
expect(agentSrc).toContain('async function sendEvent(event: Record<string, any>, tabId?: number)');
// askClaude destructures tabId from queue entry (regex tolerates
// additional fields like `canary` and `pageUrl` from security module).
expect(agentSrc).toMatch(
/const \{[^}]*\bprompt\b[^}]*\bargs\b[^}]*\bstateFile\b[^}]*\bcwd\b[^}]*\btabId\b[^}]*\}/
);
});
test('sidebar-agent allows concurrent agents across tabs', () => {
// poll() should not block globally — it should check per-tab
expect(agentSrc).toContain('processingTabs.has(tid)');
// askClaude should be fire-and-forget (no await blocking the loop)
expect(agentSrc).toContain('askClaude(entry).catch');
});
test('queue entries include tabId', () => {
const spawnFn = serverSrc.slice(
serverSrc.indexOf('function spawnClaude('),
serverSrc.indexOf('\nfunction ', serverSrc.indexOf('function spawnClaude(') + 1),
);
expect(spawnFn).toContain('tabId: agentTabId');
});
test('health check monitors all per-tab agents', () => {
expect(serverSrc).toContain('for (const [tid, state] of tabAgents)');
});
});
describe('BROWSE_TAB tab pinning (cross-tab isolation)', () => {
const serverSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'server.ts'), 'utf-8');
const agentSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'sidebar-agent.ts'), 'utf-8');
const cliSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'cli.ts'), 'utf-8');
test('sidebar-agent passes BROWSE_TAB env var to claude process', () => {
// The env block should include BROWSE_TAB set to the tab ID
expect(agentSrc).toContain('BROWSE_TAB');
expect(agentSrc).toContain('String(tid)');
});
test('CLI reads BROWSE_TAB and sends tabId in command body', () => {
// BROWSE_TAB env var is still honored (sidebar-agent path). After the
// make-pdf refactor, the CLI layer now also accepts --tab-id <N>, with
// the CLI flag taking precedence over the env var. Both resolve to the
// same `tabId` body field.
expect(cliSrc).toContain('process.env.BROWSE_TAB');
expect(cliSrc).toContain('parseInt(envTab, 10)');
});
test('handleCommandInternal accepts tabId from request body', () => {
const handleFn = serverSrc.slice(
serverSrc.indexOf('async function handleCommandInternal('),
serverSrc.indexOf('\n/** HTTP wrapper', serverSrc.indexOf('async function handleCommandInternal(') + 1) > 0
? serverSrc.indexOf('\n/** HTTP wrapper', serverSrc.indexOf('async function handleCommandInternal(') + 1)
: serverSrc.indexOf('\nasync function ', serverSrc.indexOf('async function handleCommandInternal(') + 200),
);
// Should destructure tabId from body
expect(handleFn).toContain('tabId');
// Should save and restore the active tab
expect(handleFn).toContain('savedTabId');
expect(handleFn).toContain('switchTab(tabId');
});
test('handleCommandInternal restores active tab after command (success path)', () => {
// On success, should restore savedTabId without stealing focus
const handleFn = serverSrc.slice(
serverSrc.indexOf('async function handleCommandInternal('),
serverSrc.length,
);
// Count restore calls — should appear in both success and error paths
const restoreCount = (handleFn.match(/switchTab\(savedTabId/g) || []).length;
expect(restoreCount).toBeGreaterThanOrEqual(2); // success + error paths
});
test('handleCommandInternal restores active tab on error path', () => {
// The catch block should also restore
const catchBlock = serverSrc.slice(
serverSrc.indexOf('} catch (err: any) {', serverSrc.indexOf('async function handleCommandInternal(')),
);
expect(catchBlock).toContain('switchTab(savedTabId');
});
test('tab pinning only activates when tabId is provided', () => {
const handleFn = serverSrc.slice(
serverSrc.indexOf('async function handleCommandInternal('),
serverSrc.indexOf('try {', serverSrc.indexOf('async function handleCommandInternal(') + 1),
);
// Should check tabId is not undefined/null before switching
expect(handleFn).toContain('tabId !== undefined');
expect(handleFn).toContain('tabId !== null');
});
test('CLI only sends tabId when it is a valid number', () => {
// Body should conditionally include tabId. Historically that was keyed off
// the BROWSE_TAB env var. After the make-pdf refactor, the CLI also honors
// a --tab-id <N> flag on the CLI itself, so the check is "tabId defined
// AND not NaN" rather than literally inspecting the env var.
expect(cliSrc).toContain('tabId !== undefined && !isNaN(tabId)');
});
});
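The precedence described in the comments above (a `--tab-id <N>` flag wins over `BROWSE_TAB`, and `tabId` is only sent when it parses to a number) can be sketched roughly as follows. This is an illustration under assumptions, not the CLI's actual code: `resolveTabId` and `buildCommandBody` are hypothetical names, and the argv/env shapes are guesses.

```typescript
// Hypothetical helper: pick a tab ID from the CLI flag first, then the env var.
function resolveTabId(
  argv: string[],
  env: Record<string, string | undefined>,
): number | undefined {
  const flagIdx = argv.indexOf('--tab-id');
  const flagValue = flagIdx === -1 ? undefined : argv[flagIdx + 1];
  if (flagValue !== undefined) return parseInt(flagValue, 10); // may be NaN; filtered later
  const envTab = env['BROWSE_TAB'];
  return envTab !== undefined ? parseInt(envTab, 10) : undefined;
}

// Hypothetical helper: attach tabId only when it is defined and numeric.
function buildCommandBody(
  command: string,
  args: string[],
  tabId: number | undefined,
): { command: string; args: string[]; tabId?: number } {
  return {
    command,
    args,
    ...(tabId !== undefined && !isNaN(tabId) ? { tabId } : {}),
  };
}
```

Keeping the NaN filter in the body builder rather than in the resolver mirrors the `tabId !== undefined && !isNaN(tabId)` check the test looks for.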
@@ -0,0 +1,256 @@
/**
* Regression: sidebar layout invariants after the chat-tab rip.
*
* The Chrome side panel used to host two surfaces: Chat (one-shot
* `claude -p` queue) and Terminal (interactive PTY). Chat was ripped
* once the PTY proved out — sidebar-agent.ts is gone, the chat queue
* endpoints are gone, and the primary-tab nav (Terminal | Chat) is
* gone. Terminal is now the sole primary surface.
*
* This file locks the load-bearing invariants of that layout so a
* future refactor can't silently re-introduce the old surface or break
* the new one.
*/
import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
const HTML = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel.html'), 'utf-8');
const JS = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel.js'), 'utf-8');
const TERM_JS = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel-terminal.js'), 'utf-8');
const MANIFEST = JSON.parse(fs.readFileSync(path.join(import.meta.dir, '../../extension/manifest.json'), 'utf-8'));
describe('sidebar: chat tab + nav are removed, Terminal is sole primary surface', () => {
test('No primary-tab nav element exists', () => {
expect(HTML).not.toContain('class="primary-tabs"');
expect(HTML).not.toContain('data-pane="chat"');
expect(HTML).not.toContain('data-pane="terminal"');
});
test('No <main id="tab-chat"> pane', () => {
expect(HTML).not.toMatch(/<main[^>]*id="tab-chat"/);
expect(HTML).not.toContain('id="chat-messages"');
expect(HTML).not.toContain('id="chat-loading"');
expect(HTML).not.toContain('id="chat-welcome"');
});
test('No chat input / send button / experimental banner', () => {
expect(HTML).not.toContain('class="command-bar"');
expect(HTML).not.toContain('id="command-input"');
expect(HTML).not.toContain('id="send-btn"');
expect(HTML).not.toContain('id="stop-agent-btn"');
expect(HTML).not.toContain('id="experimental-banner"');
});
test('No clear-chat button in footer', () => {
expect(HTML).not.toContain('id="clear-chat"');
});
test('Terminal pane is .active by default and has the toolbar', () => {
expect(HTML).toMatch(/<main[^>]*id="tab-terminal"[^>]*class="tab-content active"/);
expect(HTML).toContain('id="terminal-toolbar"');
expect(HTML).toContain('id="terminal-restart-now"');
});
test('Quick-actions buttons (Cleanup / Screenshot / Cookies) survive in the terminal toolbar', () => {
// Garry explicitly wanted these kept after the chat rip — they drive
// browser actions, not chat.
expect(HTML).toContain('id="chat-cleanup-btn"');
expect(HTML).toContain('id="chat-screenshot-btn"');
expect(HTML).toContain('id="chat-cookies-btn"');
// They live inside the terminal toolbar now (siblings of the Restart
// button), not as a separate strip below all panes.
const toolbarStart = HTML.indexOf('id="terminal-toolbar"');
const toolbarEnd = HTML.indexOf('</div>', toolbarStart);
const toolbarBlock = HTML.slice(toolbarStart, toolbarEnd + 6);
expect(toolbarBlock).toContain('id="chat-cleanup-btn"');
expect(toolbarBlock).toContain('id="chat-screenshot-btn"');
expect(toolbarBlock).toContain('id="chat-cookies-btn"');
});
});
describe('sidepanel.js: chat helpers ripped, terminal-injection helper survives', () => {
test('No primary-tab click handler', () => {
expect(JS).not.toContain("querySelectorAll('.primary-tab')");
expect(JS).not.toContain('activePrimaryPaneId');
});
test('No chat polling, sendMessage, pollChat, stopAgent, or pollTabs', () => {
expect(JS).not.toContain('chatPollInterval');
expect(JS).not.toContain('function sendMessage');
expect(JS).not.toContain('function pollChat');
expect(JS).not.toContain('function pollTabs');
expect(JS).not.toContain('function switchChatTab');
expect(JS).not.toContain('function stopAgent');
expect(JS).not.toContain('function applyChatEnabled');
expect(JS).not.toContain('function showSecurityBanner');
});
test('Cleanup runs through the live PTY (no /sidebar-command POST)', () => {
// The new Cleanup handler injects the prompt straight into claude's
// PTY via gstackInjectToTerminal. The dead code path was a POST to
// /sidebar-command which kicked off a fresh claude -p subprocess.
const cleanup = JS.slice(JS.indexOf('async function runCleanup'));
expect(cleanup).toContain('window.gstackInjectToTerminal');
expect(cleanup).not.toContain('/sidebar-command');
expect(cleanup).not.toContain('addChatEntry');
});
test('Inspector "Send to Code" routes through the live PTY', () => {
const sendBtn = JS.slice(JS.indexOf('inspectorSendBtn.addEventListener'));
expect(sendBtn).toContain('window.gstackInjectToTerminal');
expect(sendBtn).not.toContain("type: 'sidebar-command'");
});
test('updateConnection no longer kicks off chat / tab polling', () => {
const update = JS.slice(JS.indexOf('function updateConnection'), JS.indexOf('function updateConnection') + 1500);
expect(update).not.toContain('chatPollInterval');
expect(update).not.toContain('tabPollInterval');
expect(update).not.toContain('pollChat');
expect(update).not.toContain('pollTabs');
// BUT must still expose the bootstrap globals for sidepanel-terminal.js.
expect(update).toContain('window.gstackServerPort');
expect(update).toContain('window.gstackAuthToken');
});
});
describe('sidepanel-terminal.js: eager auto-connect + injection API', () => {
test('Exposes window.gstackInjectToTerminal for cross-pane use', () => {
expect(TERM_JS).toContain('window.gstackInjectToTerminal');
// Returns false when no live session, true when bytes go out.
const inject = TERM_JS.slice(TERM_JS.indexOf('window.gstackInjectToTerminal'));
expect(inject).toContain('return false');
expect(inject).toContain('return true');
expect(inject).toContain('ws.readyState !== WebSocket.OPEN');
});
test('Auto-connects on init (no keypress required)', () => {
expect(TERM_JS).not.toContain('function onAnyKey');
expect(TERM_JS).not.toContain("addEventListener('keydown'");
expect(TERM_JS).toContain('function tryAutoConnect');
});
test('Repaint hook fires when Terminal pane becomes visible', () => {
// The chat-tab rip removed gstack:primary-tab-changed; we use a
// MutationObserver on #tab-terminal's class attr instead. The
// observer must call repaintIfLive when the .active class returns.
expect(TERM_JS).toContain('MutationObserver');
expect(TERM_JS).toContain("attributeFilter: ['class']");
expect(TERM_JS).toContain('repaintIfLive');
const repaint = TERM_JS.slice(TERM_JS.indexOf('function repaintIfLive'));
expect(repaint).toContain('fitAddon && fitAddon.fit()');
expect(repaint).toContain('term.refresh');
expect(repaint).toContain("type: 'resize'");
});
test('No auto-reconnect on close (Restart is user-initiated)', () => {
const closeOnly = TERM_JS.slice(
TERM_JS.indexOf("ws.addEventListener('close'"),
TERM_JS.indexOf("ws.addEventListener('error'"),
);
expect(closeOnly).not.toContain('setTimeout');
expect(closeOnly).not.toContain('tryAutoConnect');
expect(closeOnly).not.toContain('connect()');
});
test('forceRestart helper closes ws, disposes xterm, returns to IDLE', () => {
expect(TERM_JS).toContain('function forceRestart');
const fn = TERM_JS.slice(TERM_JS.indexOf('function forceRestart'));
expect(fn).toContain('ws && ws.close()');
expect(fn).toContain('term.dispose()');
expect(fn).toContain('STATE.IDLE');
expect(fn).toContain('tryAutoConnect()');
});
test('Both restart buttons (mid-session and ENDED) call forceRestart', () => {
expect(TERM_JS).toContain("els.restart?.addEventListener('click', forceRestart)");
expect(TERM_JS).toContain("els.restartNow?.addEventListener('click', forceRestart)");
});
});
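The injection contract checked above (return `false` when there is no live session, `true` once bytes go out) might look something like the sketch below. `SocketLike`, `WS_OPEN`, and `makeInjector` are hypothetical names introduced for illustration; the real helper is `window.gstackInjectToTerminal` and works against a browser `WebSocket`.

```typescript
// Minimal socket shape for the sketch; the real code uses the browser WebSocket.
interface SocketLike {
  readyState: number;
  send(data: Uint8Array): void;
}

const WS_OPEN = 1; // numeric value of WebSocket.OPEN

// Hypothetical factory: returns an injector bound to whichever socket is current.
function makeInjector(getSocket: () => SocketLike | null): (text: string) => boolean {
  return (text) => {
    const ws = getSocket();
    if (!ws || ws.readyState !== WS_OPEN) return false; // no live session
    ws.send(new TextEncoder().encode(text)); // PTY input travels as a binary frame
    return true;
  };
}
```

Returning a boolean rather than throwing lets callers such as the Cleanup button decide how to report a dead session.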
describe('server.ts: chat / sidebar-agent endpoints are gone', () => {
const SERVER_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/server.ts'), 'utf-8');
test('No /sidebar-command, /sidebar-chat, /sidebar-agent/* routes', () => {
expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-command['"]/);
expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-chat['"]/);
expect(SERVER_SRC).not.toMatch(/url\.pathname\.startsWith\(['"]\/sidebar-agent\//);
expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-agent\/event['"]/);
expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-tabs['"]/);
expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-session['"]/);
});
test('No chat-related state declarations or helpers', () => {
// Allow the symbol names inside the rip-marker comments — but no
// `let`, `const`, `function`, or `interface` declarations of them.
expect(SERVER_SRC).not.toMatch(/^let agentProcess/m);
expect(SERVER_SRC).not.toMatch(/^let agentStatus/m);
expect(SERVER_SRC).not.toMatch(/^let messageQueue/m);
expect(SERVER_SRC).not.toMatch(/^let sidebarSession/m);
expect(SERVER_SRC).not.toMatch(/^const tabAgents/m);
expect(SERVER_SRC).not.toMatch(/^function pickSidebarModel/m);
expect(SERVER_SRC).not.toMatch(/^function processAgentEvent/m);
expect(SERVER_SRC).not.toMatch(/^function killAgent/m);
expect(SERVER_SRC).not.toMatch(/^function addChatEntry/m);
expect(SERVER_SRC).not.toMatch(/^interface ChatEntry/m);
expect(SERVER_SRC).not.toMatch(/^interface SidebarSession/m);
});
test('/health no longer surfaces agentStatus or messageQueue length', () => {
const health = SERVER_SRC.slice(SERVER_SRC.indexOf("url.pathname === '/health'"));
const slice = health.slice(0, 2000);
expect(slice).not.toContain('agentStatus');
expect(slice).not.toContain('messageQueue');
expect(slice).not.toContain('agentStartTime');
// chatEnabled is hardcoded false now (older clients still see the field).
expect(slice).toMatch(/chatEnabled:\s*false/);
// terminalPort survives.
expect(slice).toContain('terminalPort');
});
});
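The line-anchored patterns above (`^let agentProcess` with the `m` flag) are what allow rip-marker comments to keep mentioning a removed symbol while a real top-level declaration still fails the test. A small sketch of that idea, using a hypothetical `hasTopLevelDecl` helper and assuming `keyword` and `name` are plain identifiers:

```typescript
// Hypothetical helper: true when `name` is declared with `keyword` at the very
// start of a line. Mentions inside comments or indented code do not match.
function hasTopLevelDecl(src: string, keyword: string, name: string): boolean {
  return new RegExp(`^${keyword} ${name}`, 'm').test(src);
}

// Illustrative source text, not taken from server.ts.
const sampleSrc = [
  '// agentProcess was removed together with the chat queue',
  'let terminalPort = 0;',
].join('\n');
```

With `sampleSrc`, the comment on the first line does not count as a declaration of `agentProcess`, while `terminalPort` does.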
describe('cli.ts: sidebar-agent is no longer spawned', () => {
const CLI_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/cli.ts'), 'utf-8');
test('No Bun.spawn of sidebar-agent.ts', () => {
expect(CLI_SRC).not.toMatch(/Bun\.spawn\(\s*\['bun',\s*'run',\s*\w*[Aa]gent[Ss]cript\][\s\S]{0,300}sidebar-agent/);
// The variable name `agentScript` was for sidebar-agent. After the
// rip there's only termAgentScript. Allow comments to mention the
// history but not active spawn calls.
expect(CLI_SRC).not.toMatch(/^\s*let agentScript = path\.resolve/m);
});
test('Terminal-agent spawn survives', () => {
expect(CLI_SRC).toContain('terminal-agent.ts');
expect(CLI_SRC).toMatch(/Bun\.spawn\(\['bun',\s*'run',\s*termAgentScript\]/);
});
});
describe('files: sidebar-agent.ts and its tests are deleted', () => {
test('browse/src/sidebar-agent.ts is gone', () => {
expect(fs.existsSync(path.join(import.meta.dir, '../src/sidebar-agent.ts'))).toBe(false);
});
test('sidebar-agent test files are gone', () => {
expect(fs.existsSync(path.join(import.meta.dir, 'sidebar-agent.test.ts'))).toBe(false);
expect(fs.existsSync(path.join(import.meta.dir, 'sidebar-agent-roundtrip.test.ts'))).toBe(false);
});
});
describe('manifest: ws permission + xterm-safe CSP', () => {
test('host_permissions covers ws localhost', () => {
expect(MANIFEST.host_permissions).toContain('ws://127.0.0.1:*/');
});
test('host_permissions still covers http localhost', () => {
expect(MANIFEST.host_permissions).toContain('http://127.0.0.1:*/');
});
test('manifest does NOT add unsafe-eval to extension_pages CSP', () => {
const csp = MANIFEST.content_security_policy;
if (csp && csp.extension_pages) {
expect(csp.extension_pages).not.toContain('unsafe-eval');
}
});
});
@@ -0,0 +1,196 @@
/**
* tab-each — fan-out command for the live Terminal pane.
*
* Source-level guards: the command is registered, has a description and
* usage entry, scope-checks the inner command, and restores the original
* active tab in a finally block (so a mid-batch exception doesn't leave
* the user looking at a tab they didn't choose).
*
* Behavioral logic test: drive handleMetaCommand directly with a mock
* BrowserManager + executeCommand callback. Verify the iteration order,
* the JSON shape, the tab restore, and the chrome:// skip.
*/
import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import { handleMetaCommand } from '../src/meta-commands';
import { META_COMMANDS, COMMAND_DESCRIPTIONS } from '../src/commands';
const META_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/meta-commands.ts'), 'utf-8');
describe('tab-each: registration', () => {
test('command is in META_COMMANDS', () => {
expect(META_COMMANDS.has('tab-each')).toBe(true);
});
test('has a description and usage entry', () => {
expect(COMMAND_DESCRIPTIONS['tab-each']).toBeDefined();
expect(COMMAND_DESCRIPTIONS['tab-each'].usage).toContain('tab-each');
expect(COMMAND_DESCRIPTIONS['tab-each'].category).toBe('Tabs');
});
});
describe('tab-each: source-level guards', () => {
test('scope-checks the inner command before fanning out', () => {
const block = META_SRC.slice(META_SRC.indexOf("case 'tab-each':"));
expect(block).toContain('checkScope(tokenInfo, innerName)');
// The scope check must run BEFORE the for-loop. If it ran inside the
// loop, a permission failure on the second tab would leave the first
// tab already mutated.
const checkIdx = block.indexOf('checkScope(tokenInfo, innerName)');
const loopIdx = block.indexOf('for (const tab of tabs)');
expect(checkIdx).toBeLessThan(loopIdx);
});
test('restores the original active tab in a finally block', () => {
const block = META_SRC.slice(META_SRC.indexOf("case 'tab-each':"), META_SRC.indexOf("case 'tab-each':") + 4000);
expect(block).toContain('finally');
expect(block).toContain('originalActive');
expect(block).toContain('switchTab(originalActive');
});
test('uses bringToFront: false so the OS window does NOT jump', () => {
const block = META_SRC.slice(META_SRC.indexOf("case 'tab-each':"), META_SRC.indexOf("case 'tab-each':") + 4000);
// tab-each is a background operation — pulling focus would steal the
// user's foreground app every time claude fans out, which is
// unacceptable.
expect(block).toContain('bringToFront: false');
});
test('skips chrome:// and chrome-extension:// internal pages', () => {
const block = META_SRC.slice(META_SRC.indexOf("case 'tab-each':"), META_SRC.indexOf("case 'tab-each':") + 4000);
expect(block).toContain("startsWith('chrome://')");
expect(block).toContain("startsWith('chrome-extension://')");
});
});
describe('tab-each: behavior', () => {
function mockBm(tabs: Array<{ id: number; url: string; title: string; active: boolean }>) {
let activeId = tabs.find(t => t.active)?.id ?? tabs[0]?.id ?? 0;
const switched: number[] = [];
return {
__switched: switched,
__activeId: () => activeId,
getActiveSession: () => ({}),
getActiveTabId: () => activeId,
getTabListWithTitles: async () => tabs.map(t => ({ ...t })),
switchTab: (id: number, _opts?: any) => { switched.push(id); activeId = id; },
} as any;
}
test('iterates every tab, calls executeCommand for each, returns JSON results', async () => {
const tabs = [
{ id: 1, url: 'https://news.example.com', title: 'News', active: true },
{ id: 2, url: 'https://docs.example.com', title: 'Docs', active: false },
{ id: 3, url: 'https://github.com', title: 'GitHub', active: false },
];
const bm = mockBm(tabs);
const calls: Array<{ command: string; args?: string[]; tabId?: number }> = [];
const out = await handleMetaCommand(
'tab-each',
['snapshot', '-i'],
bm,
async () => {},
null,
{
executeCommand: async (body) => {
calls.push(body);
return { status: 200, result: `snap-of-${body.tabId}` };
},
},
);
const parsed = JSON.parse(out);
expect(parsed.command).toBe('snapshot');
expect(parsed.args).toEqual(['-i']);
expect(parsed.total).toBe(3);
expect(parsed.results.map((r: any) => r.tabId)).toEqual([1, 2, 3]);
expect(parsed.results.every((r: any) => r.status === 200)).toBe(true);
expect(parsed.results[0].output).toBe('snap-of-1');
// Inner command was dispatched 3 times, once per tab, with the right tabId.
expect(calls).toHaveLength(3);
expect(calls.map(c => c.tabId)).toEqual([1, 2, 3]);
expect(calls.every(c => c.command === 'snapshot')).toBe(true);
});
test('skips chrome:// pages with status=0 + "skipped" output', async () => {
const tabs = [
{ id: 1, url: 'chrome://newtab', title: 'New Tab', active: true },
{ id: 2, url: 'https://example.com', title: 'Example', active: false },
{ id: 3, url: 'chrome-extension://abc/page.html', title: 'Ext', active: false },
];
const bm = mockBm(tabs);
const calls: any[] = [];
const out = await handleMetaCommand(
'tab-each',
['text'],
bm,
async () => {},
null,
{
executeCommand: async (body) => {
calls.push(body);
return { status: 200, result: `text-of-${body.tabId}` };
},
},
);
const parsed = JSON.parse(out);
expect(parsed.total).toBe(3);
// chrome:// and chrome-extension:// → skipped (status 0).
expect(parsed.results[0].status).toBe(0);
expect(parsed.results[0].output).toContain('skipped');
expect(parsed.results[2].status).toBe(0);
// Only the real tab dispatched.
expect(calls).toHaveLength(1);
expect(calls[0].tabId).toBe(2);
});
test('restores the originally active tab even if a tab errors', async () => {
const tabs = [
{ id: 10, url: 'https://a.example', title: 'A', active: false },
{ id: 20, url: 'https://b.example', title: 'B', active: true }, // initially active
{ id: 30, url: 'https://c.example', title: 'C', active: false },
];
const bm = mockBm(tabs);
let calls = 0;
const out = await handleMetaCommand(
'tab-each',
['text'],
bm,
async () => {},
null,
{
executeCommand: async (body) => {
calls++;
if (body.tabId === 20) {
return { status: 500, result: JSON.stringify({ error: 'boom' }) };
}
return { status: 200, result: `ok-${body.tabId}` };
},
},
);
const parsed = JSON.parse(out);
expect(parsed.results.find((r: any) => r.tabId === 20).status).toBe(500);
expect(parsed.results.find((r: any) => r.tabId === 20).output).toBe('boom');
expect(parsed.results.find((r: any) => r.tabId === 10).status).toBe(200);
expect(parsed.results.find((r: any) => r.tabId === 30).status).toBe(200);
// The 500 on tab 20 did not stop the batch: every tab was dispatched once.
expect(calls).toBe(3);
// Active tab restored to 20 (the one that was active when we started).
expect(bm.__activeId()).toBe(20);
});
test('throws on empty args (no inner command)', async () => {
const bm = mockBm([{ id: 1, url: 'https://x.example', title: 'X', active: true }]);
await expect(handleMetaCommand(
'tab-each',
[],
bm,
async () => {},
null,
{ executeCommand: async () => ({ status: 200, result: '' }) },
)).rejects.toThrow(/Usage/);
});
});
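Taken together, the tests above pin down a fan-out shape: validate once, iterate every tab, skip browser-internal pages, and restore the original tab in `finally`. The sketch below is one way to satisfy those checks; it is not the code in `meta-commands.ts`. `TabHost`, `tabEachSketch`, `unwrapError`, the `allowed` callback, and the exact "skipped" wording are all assumptions made for illustration.

```typescript
type Tab = { id: number; url: string };
type Exec = (body: { command: string; args: string[]; tabId: number }) =>
  Promise<{ status: number; result: string }>;

// Hypothetical slice of the browser manager used by the sketch.
interface TabHost {
  getActiveTabId(): number;
  listTabs(): Promise<Tab[]>;
  switchTab(id: number, opts?: { bringToFront?: boolean }): void;
}

const isInternal = (url: string): boolean =>
  url.startsWith('chrome://') || url.startsWith('chrome-extension://');

// Assumed convention: error results are JSON of the form {"error": "..."}.
function unwrapError(result: string): string {
  try {
    const parsed = JSON.parse(result);
    return typeof parsed?.error === 'string' ? parsed.error : result;
  } catch {
    return result;
  }
}

async function tabEachSketch(
  args: string[],
  host: TabHost,
  exec: Exec,
  allowed: (name: string) => boolean,
): Promise<string> {
  const [command, ...rest] = args;
  if (!command) throw new Error('Usage: tab-each <command> [args...]');
  // Scope check happens before the loop so a denial never leaves tabs half-mutated.
  if (!allowed(command)) throw new Error(`Command not permitted: ${command}`);

  const originalActive = host.getActiveTabId();
  const tabs = await host.listTabs();
  const results: Array<{ tabId: number; status: number; output: string }> = [];
  try {
    for (const tab of tabs) {
      if (isInternal(tab.url)) {
        results.push({ tabId: tab.id, status: 0, output: 'skipped (internal page)' });
        continue;
      }
      const r = await exec({ command, args: rest, tabId: tab.id });
      const output = r.status === 200 ? r.result : unwrapError(r.result);
      results.push({ tabId: tab.id, status: r.status, output });
    }
  } finally {
    // Restore without pulling the OS window to the foreground.
    host.switchTab(originalActive, { bringToFront: false });
  }
  return JSON.stringify({ command, args: rest, total: tabs.length, results });
}
```

The sketch relies on the inner dispatch to pin each tab through the `tabId` body field, so it only calls `switchTab` once, during restore.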
@@ -0,0 +1,273 @@
/**
* Integration tests for terminal-agent.ts.
*
* Spawns the agent as a real subprocess in a temp state directory,
* exercises:
* 1. /internal/grant: loopback handshake with the internal token.
* 2. /ws Origin gate: missing or non-extension Origin → 403.
* 3. /ws cookie gate: missing/invalid cookie → 401.
* 4. /ws full PTY round-trip: write `echo hello-pty-world\n`, read it back.
* 5. /ws Sec-WebSocket-Protocol auth: granted token → 101 with the
*    protocol echoed back; unknown token → 401.
* 6. resize control message: terminal accepts it without crashing.
*
* Uses /bin/bash via BROWSE_TERMINAL_BINARY override so CI doesn't need
* the `claude` binary installed.
*/
import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
const AGENT_SCRIPT = path.join(import.meta.dir, '../src/terminal-agent.ts');
const BASH = '/bin/bash';
let stateDir: string;
let agentProc: any;
let agentPort: number;
let internalToken: string;
function readPortFile(): number {
for (let i = 0; i < 50; i++) {
try {
const v = parseInt(fs.readFileSync(path.join(stateDir, 'terminal-port'), 'utf-8').trim(), 10);
if (Number.isFinite(v) && v > 0) return v;
} catch {}
Bun.sleepSync(40);
}
throw new Error('terminal-agent never wrote port file');
}
function readTokenFile(): string {
for (let i = 0; i < 50; i++) {
try {
const t = fs.readFileSync(path.join(stateDir, 'terminal-internal-token'), 'utf-8').trim();
if (t.length > 16) return t;
} catch {}
Bun.sleepSync(40);
}
throw new Error('terminal-agent never wrote internal token');
}
beforeAll(() => {
stateDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-term-'));
const stateFile = path.join(stateDir, 'browse.json');
// browse.json must exist so the agent's readBrowseToken doesn't throw.
fs.writeFileSync(stateFile, JSON.stringify({ token: 'test-browse-token' }));
agentProc = Bun.spawn(['bun', 'run', AGENT_SCRIPT], {
env: {
...process.env,
BROWSE_STATE_FILE: stateFile,
BROWSE_SERVER_PORT: '0', // not used in this test
BROWSE_TERMINAL_BINARY: BASH,
},
stdio: ['ignore', 'pipe', 'pipe'],
});
agentPort = readPortFile();
internalToken = readTokenFile();
});
afterAll(() => {
try { agentProc?.kill?.(); } catch {}
try { fs.rmSync(stateDir, { recursive: true, force: true }); } catch {}
});
async function grantToken(token: string): Promise<Response> {
return fetch(`http://127.0.0.1:${agentPort}/internal/grant`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${internalToken}`,
},
body: JSON.stringify({ token }),
});
}
describe('terminal-agent: /internal/grant', () => {
test('accepts grants authenticated with the internal token', async () => {
const resp = await grantToken('test-cookie-token-very-long-yes');
expect(resp.status).toBe(200);
});
test('rejects grants with the wrong internal token', async () => {
const resp = await fetch(`http://127.0.0.1:${agentPort}/internal/grant`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer wrong-token',
},
body: JSON.stringify({ token: 'whatever' }),
});
expect(resp.status).toBe(403);
});
});
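The grant handshake exercised above reduces to a bearer check followed by registering the cookie token. A minimal sketch of that decision, assuming a hypothetical `handleGrant` that returns only the HTTP status; the 400 branch for a malformed body is an assumption and is not covered by these tests.

```typescript
// Hypothetical core of POST /internal/grant: returns the status code to send.
function handleGrant(
  authorization: string | null,
  body: { token?: unknown },
  internalToken: string,
  validTokens: Set<string>,
): number {
  // Only the parent server knows the per-boot internal token.
  if (authorization !== `Bearer ${internalToken}`) return 403;
  // Assumed behavior: reject bodies without a usable token string.
  if (typeof body.token !== 'string' || body.token.length === 0) return 400;
  validTokens.add(body.token); // later consulted by the /ws upgrade gate
  return 200;
}
```

A production handler would likely also want a constant-time comparison for the bearer token; the plain `!==` here keeps the sketch short.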
describe('terminal-agent: /ws gates', () => {
test('rejects upgrade attempts without an extension Origin', async () => {
const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`);
expect(resp.status).toBe(403);
expect(await resp.text()).toBe('forbidden origin');
});
test('rejects upgrade attempts from a non-extension Origin', async () => {
const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
headers: { 'Origin': 'https://evil.example.com' },
});
expect(resp.status).toBe(403);
});
test('rejects extension-Origin upgrades without a granted cookie', async () => {
const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
headers: {
'Origin': 'chrome-extension://abc123',
'Cookie': 'gstack_pty=never-granted',
},
});
expect(resp.status).toBe(401);
});
});
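The three rejections above imply a fixed gate order: Origin first (403), then the session token (401), and only then the upgrade. The sketch below illustrates that ordering with a hypothetical `checkWsGates` function; the `unauthorized` body text is an assumption, while `forbidden origin` comes from the test.

```typescript
type GateResult =
  | { status: 101 }
  | { status: 403; body: 'forbidden origin' }
  | { status: 401; body: string };

// Hypothetical gate check for GET /ws. Both gates must pass; either alone is not enough.
function checkWsGates(
  origin: string | null,
  token: string | null,
  validTokens: Set<string>,
): GateResult {
  // Gate 1 (CSWSH defense): only extension pages may open the socket.
  if (!origin || !origin.startsWith('chrome-extension://')) {
    return { status: 403, body: 'forbidden origin' };
  }
  // Gate 2 (auth): the token must have been granted by the parent server.
  if (!token || !validTokens.has(token)) {
    return { status: 401, body: 'unauthorized' };
  }
  return { status: 101 };
}
```

Because the Origin gate runs first, a request with neither Origin nor cookie sees 403 rather than 401, which matches the first test in this block.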
describe('terminal-agent: PTY round-trip via real WebSocket (Cookie auth)', () => {
test('binary writes go to PTY stdin, output streams back', async () => {
const cookie = 'rt-token-must-be-at-least-seventeen-chars-long';
const granted = await grantToken(cookie);
expect(granted.status).toBe(200);
const ws = new WebSocket(`ws://127.0.0.1:${agentPort}/ws`, {
headers: {
'Origin': 'chrome-extension://test-extension-id',
'Cookie': `gstack_pty=${cookie}`,
},
} as any);
const collected: string[] = [];
let opened = false;
let closed = false;
await new Promise<void>((resolve, reject) => {
const timer = setTimeout(() => reject(new Error('ws never opened')), 5000);
ws.addEventListener('open', () => { opened = true; clearTimeout(timer); resolve(); });
ws.addEventListener('error', () => { clearTimeout(timer); reject(new Error('ws error')); });
});
ws.addEventListener('message', (ev: any) => {
if (typeof ev.data === 'string') return; // ignore control frames
const buf = ev.data instanceof ArrayBuffer ? new Uint8Array(ev.data) : ev.data;
collected.push(new TextDecoder().decode(buf));
});
ws.addEventListener('close', () => { closed = true; });
// Lazy-spawn trigger: any binary frame causes the agent to spawn /bin/bash.
ws.send(new TextEncoder().encode('echo hello-pty-world\nexit\n'));
// Poll for up to 5s until the echoed marker appears in the PTY output.
// On timeout we fall through and let the assertion below fail.
await new Promise<void>((resolve) => {
const start = Date.now();
const tick = () => {
const joined = collected.join('');
if (joined.includes('hello-pty-world')) return resolve();
if (Date.now() - start > 5000) return resolve();
setTimeout(tick, 50);
};
tick();
});
expect(opened).toBe(true);
const allOutput = collected.join('');
expect(allOutput).toContain('hello-pty-world');
try { ws.close(); } catch {}
// Give cleanup a moment.
await Bun.sleep(200);
});
test('Sec-WebSocket-Protocol auth path: browser-style upgrade with token in protocol', async () => {
// This is the path the actual browser extension takes. Cross-port
// SameSite=Strict cookies don't reliably survive the jump from the
// browse server (port A) to the agent (port B) when initiated from a
// chrome-extension origin, so we send the token via the only auth
// header the browser WebSocket API lets us set: Sec-WebSocket-Protocol.
//
// The browser sends `gstack-pty.<token>` and the agent must:
// 1) strip the gstack-pty. prefix
// 2) validate the token
// 3) ECHO the protocol back in the upgrade response
// Without (3) the browser closes the connection immediately, which
// is the exact bug the original cookie-only implementation hit in
// manual dogfood. This test catches that regression in CI.
const token = 'sec-protocol-token-must-be-at-least-seventeen-chars';
await grantToken(token);
// We exercise the protocol path by raw-handshaking via fetch+Upgrade,
// because Bun's test-client WebSocket constructor doesn't propagate
// `protocols` cleanly when also passed `headers` (the constructor
// detects the third-arg form unreliably). Real browsers (Chromium)
// use the standard protocols arg fine — the server-side handler is
// identical either way, so this test still locks the load-bearing
// invariant: the agent accepts a token via Sec-WebSocket-Protocol
// and echoes the protocol back so a browser would accept the upgrade.
const handshakeKey = 'dGhlIHNhbXBsZSBub25jZQ==';
const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
headers: {
'Connection': 'Upgrade',
'Upgrade': 'websocket',
'Sec-WebSocket-Version': '13',
'Sec-WebSocket-Key': handshakeKey,
'Sec-WebSocket-Protocol': `gstack-pty.${token}`,
'Origin': 'chrome-extension://test-extension-id',
},
});
// 101 Switching Protocols + protocol echoed back = browser would accept.
// 401/403/anything else = browser would close the connection immediately
// (the bug we hit in manual dogfood).
expect(resp.status).toBe(101);
expect(resp.headers.get('upgrade')?.toLowerCase()).toBe('websocket');
expect(resp.headers.get('sec-websocket-protocol')).toBe(`gstack-pty.${token}`);
});
test('Sec-WebSocket-Protocol auth: rejects unknown token even with valid Origin', async () => {
const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
headers: {
'Connection': 'Upgrade',
'Upgrade': 'websocket',
'Sec-WebSocket-Version': '13',
'Sec-WebSocket-Key': 'dGhlIHNhbXBsZSBub25jZQ==',
'Sec-WebSocket-Protocol': 'gstack-pty.never-granted-token',
'Origin': 'chrome-extension://test-extension-id',
},
});
expect(resp.status).toBe(401);
});
test('text frame {type:"resize"} is accepted (no crash, ws stays open)', async () => {
const cookie = 'resize-token-must-be-at-least-seventeen-chars';
await grantToken(cookie);
const ws = new WebSocket(`ws://127.0.0.1:${agentPort}/ws`, {
headers: {
'Origin': 'chrome-extension://test-extension-id',
'Cookie': `gstack_pty=${cookie}`,
},
} as any);
await new Promise<void>((resolve, reject) => {
const timer = setTimeout(() => reject(new Error('ws never opened')), 5000);
ws.addEventListener('open', () => { clearTimeout(timer); resolve(); });
ws.addEventListener('error', () => { clearTimeout(timer); reject(new Error('ws error')); });
});
// Send a resize before any binary frame, so the PTY has not been lazily spawned yet.
ws.send(JSON.stringify({ type: 'resize', cols: 120, rows: 40 }));
// After resize, send a binary frame; should still work.
ws.send(new TextEncoder().encode('exit\n'));
await Bun.sleep(300);
// ws still readyState 1 (OPEN) or 3 (CLOSED after exit) — both fine.
expect([WebSocket.OPEN, WebSocket.CLOSED]).toContain(ws.readyState);
try { ws.close(); } catch {}
});
});
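The two token transports tested above (the `gstack-pty.<token>` subprotocol, with the `gstack_pty` cookie as fallback) can be sketched as a single extraction step. `extractPtyToken` is a hypothetical name, and the parsing details are assumptions rather than the agent's actual code.

```typescript
const PROTOCOL_PREFIX = 'gstack-pty.';

// Hypothetical helper: pull the session token from the offered subprotocols,
// falling back to the gstack_pty cookie. `echo` is the value to send back in
// the Sec-WebSocket-Protocol response header (absent on the cookie path).
function extractPtyToken(
  protocolHeader: string | null,
  cookieHeader: string | null,
): { token: string; echo?: string } | null {
  for (const offered of (protocolHeader ?? '').split(',')) {
    const proto = offered.trim();
    if (proto.startsWith(PROTOCOL_PREFIX) && proto.length > PROTOCOL_PREFIX.length) {
      return { token: proto.slice(PROTOCOL_PREFIX.length), echo: proto };
    }
  }
  for (const pair of (cookieHeader ?? '').split(';')) {
    const eq = pair.indexOf('=');
    if (eq !== -1 && pair.slice(0, eq).trim() === 'gstack_pty') {
      return { token: pair.slice(eq + 1).trim() };
    }
  }
  return null;
}
```

Carrying `echo` alongside the token makes it hard to forget step (3) from the test comment: a browser drops the connection if the selected subprotocol is not echoed in the upgrade response.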
@@ -0,0 +1,223 @@
/**
* Unit tests for the Terminal-tab PTY agent and its server-side glue.
*
* Coverage:
* - pty-session-cookie module: mint / validate / revoke / TTL pruning.
* - source-level guard: /pty-session and /terminal/* are NOT in TUNNEL_PATHS.
* - source-level guard: /health does not surface ptyToken.
* - source-level guard: terminal-agent binds 127.0.0.1 only.
* - source-level guard: terminal-agent enforces Origin AND cookie on /ws.
*
* These are read-only checks against source — they prevent silent surface
* widening during a routine refactor (matches the dual-listener.test.ts
* pattern). End-to-end behavior (real /bin/bash PTY round-trip,
* tunnel-surface 404 + denial-log) lives in
* `browse/test/terminal-agent-integration.test.ts`.
*/
import { describe, test, expect, beforeEach } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import {
mintPtySessionToken, validatePtySessionToken, revokePtySessionToken,
extractPtyCookie, buildPtySetCookie, buildPtyClearCookie,
PTY_COOKIE_NAME, __resetPtySessions,
} from '../src/pty-session-cookie';
const SERVER_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/server.ts'), 'utf-8');
const AGENT_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/terminal-agent.ts'), 'utf-8');
describe('pty-session-cookie: mint/validate/revoke', () => {
beforeEach(() => __resetPtySessions());
test('a freshly minted token validates', () => {
const { token } = mintPtySessionToken();
expect(validatePtySessionToken(token)).toBe(true);
});
test('null and unknown tokens fail validation', () => {
expect(validatePtySessionToken(null)).toBe(false);
expect(validatePtySessionToken(undefined)).toBe(false);
expect(validatePtySessionToken('')).toBe(false);
expect(validatePtySessionToken('not-a-real-token')).toBe(false);
});
test('revoke makes a token invalid', () => {
const { token } = mintPtySessionToken();
expect(validatePtySessionToken(token)).toBe(true);
revokePtySessionToken(token);
expect(validatePtySessionToken(token)).toBe(false);
});
test('Set-Cookie has HttpOnly + SameSite=Strict + Path=/ + Max-Age', () => {
const { token } = mintPtySessionToken();
const cookie = buildPtySetCookie(token);
expect(cookie).toContain(`${PTY_COOKIE_NAME}=${token}`);
expect(cookie).toContain('HttpOnly');
expect(cookie).toContain('SameSite=Strict');
expect(cookie).toContain('Path=/');
expect(cookie).toMatch(/Max-Age=\d+/);
// Secure is intentionally omitted — daemon binds 127.0.0.1 over HTTP.
expect(cookie).not.toContain('Secure');
});
test('clear-cookie has Max-Age=0', () => {
expect(buildPtyClearCookie()).toContain('Max-Age=0');
});
test('extractPtyCookie reads gstack_pty from a Cookie header', () => {
const { token } = mintPtySessionToken();
const req = new Request('http://127.0.0.1/ws', {
headers: { 'cookie': `othercookie=foo; gstack_pty=${token}; baz=qux` },
});
expect(extractPtyCookie(req)).toBe(token);
});
test('extractPtyCookie returns null when the cookie is missing', () => {
const req = new Request('http://127.0.0.1/ws', {
headers: { 'cookie': 'unrelated=value' },
});
expect(extractPtyCookie(req)).toBe(null);
});
});
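The mint/validate/revoke behavior and the cookie attributes asserted above could be backed by a small expiry map. The sketch below is an assumption-laden illustration, not the `pty-session-cookie` module: `SessionRegistry` and `buildSetCookieSketch` are hypothetical names, the 30-minute TTL is taken from the commit description, and token generation is left to the caller.

```typescript
const SKETCH_COOKIE_NAME = 'gstack_pty';
const SKETCH_TTL_MS = 30 * 60 * 1000; // assumed 30-minute lifetime

// Hypothetical registry: token -> expiry timestamp, pruned opportunistically.
class SessionRegistry {
  private expiry = new Map<string, number>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  add(token: string, now: number): void {
    this.prune(now); // drop stale entries whenever a new token is registered
    this.expiry.set(token, now + this.ttlMs);
  }

  validate(token: string | null | undefined, now: number): boolean {
    if (!token) return false;
    const exp = this.expiry.get(token);
    if (exp === undefined) return false;
    if (exp <= now) {
      this.expiry.delete(token);
      return false;
    }
    return true;
  }

  revoke(token: string): void {
    this.expiry.delete(token);
  }

  private prune(now: number): void {
    this.expiry.forEach((exp, token) => {
      if (exp <= now) this.expiry.delete(token);
    });
  }
}

// Secure is left out on purpose: the daemon is assumed to serve plain HTTP on 127.0.0.1.
function buildSetCookieSketch(token: string): string {
  const maxAge = Math.floor(SKETCH_TTL_MS / 1000);
  return `${SKETCH_COOKIE_NAME}=${token}; HttpOnly; SameSite=Strict; Path=/; Max-Age=${maxAge}`;
}
```

Passing `now` explicitly keeps expiry deterministic in tests; a real module would more likely read the clock itself.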
describe('Source-level guard: /pty-session is not on the tunnel surface', () => {
test('TUNNEL_PATHS does not include /pty-session or /terminal/*', () => {
const start = SERVER_SRC.indexOf('const TUNNEL_PATHS = new Set<string>([');
expect(start).toBeGreaterThan(-1);
const end = SERVER_SRC.indexOf(']);', start);
const body = SERVER_SRC.slice(start, end);
expect(body).not.toContain('/pty-session');
expect(body).not.toContain('/terminal/');
expect(body).not.toContain('/terminal-');
});
});
describe('Source-level guard: /health does NOT surface ptyToken', () => {
test('/health response body does not include ptyToken', () => {
const healthIdx = SERVER_SRC.indexOf("url.pathname === '/health'");
expect(healthIdx).toBeGreaterThan(-1);
// Take a fixed 2000-character window starting at the /health route check.
// This is a heuristic: it assumes the response body is built inside that window.
const slice = SERVER_SRC.slice(healthIdx, healthIdx + 2000);
// The /health JSON.stringify body must not mention the cookie token.
// It's allowed to include `terminalPort` (a port number, not auth).
expect(slice).not.toContain('ptyToken');
expect(slice).not.toContain('gstack_pty');
expect(slice).toContain('terminalPort');
});
});
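// Sketch of a shared helper for the source-level guards in this file, on the
// assumption that each guard slices the source between two string markers.
// The name sketchSliceBetween is hypothetical. It throws when a marker is
// missing, so a `not.toContain` assertion on the result cannot pass merely
// because indexOf() returned -1.
function sketchSliceBetween(src: string, startMarker: string, endMarker: string): string {
  const start = src.indexOf(startMarker);
  if (start === -1) throw new Error(`start marker not found: ${startMarker}`);
  // Search for the end marker only after the start marker.
  const end = src.indexOf(endMarker, start + startMarker.length);
  if (end === -1) throw new Error(`end marker not found after start: ${endMarker}`);
  return src.slice(start, end);
}
// Example: sketchSliceBetween('a [x, y]; b', '[', ']') returns '[x, y'.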
describe('Source-level guard: terminal-agent', () => {
test('binds 127.0.0.1 only, never 0.0.0.0', () => {
expect(AGENT_SRC).toContain("hostname: '127.0.0.1'");
expect(AGENT_SRC).not.toContain("hostname: '0.0.0.0'");
});
test('rejects /ws upgrades without chrome-extension:// Origin', () => {
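// The Origin check must run BEFORE the cookie check. Otherwise a
// missing-origin attempt would surface the 401 cookie message and
// signal to attackers that they need to forge a cookie.
// This guard only asserts that the Origin check and its rejection
// message are present; it does not verify their order.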
const wsHandler = AGENT_SRC.slice(AGENT_SRC.indexOf("if (url.pathname === '/ws')"));
expect(wsHandler).toContain('chrome-extension://');
expect(wsHandler).toContain('forbidden origin');
});
test('validates the session token against an in-memory token set', () => {
const wsHandler = AGENT_SRC.slice(AGENT_SRC.indexOf("if (url.pathname === '/ws')"));
// Two transports: Sec-WebSocket-Protocol (preferred for browsers) and
// Cookie gstack_pty (fallback). Both verify against validTokens.
expect(wsHandler).toContain('sec-websocket-protocol');
expect(wsHandler).toContain('gstack_pty');
expect(wsHandler).toContain('validTokens.has');
});
test('Sec-WebSocket-Protocol auth: strips gstack-pty. prefix and echoes back', () => {
const wsHandler = AGENT_SRC.slice(AGENT_SRC.indexOf("if (url.pathname === '/ws')"));
// Browsers send `Sec-WebSocket-Protocol: gstack-pty.<token>`. The agent
// must strip the prefix before checking validTokens, AND echo the
// protocol back in the upgrade response — without the echo, the
// browser closes the connection immediately.
expect(wsHandler).toContain("'gstack-pty.'");
expect(wsHandler).toContain('Sec-WebSocket-Protocol');
expect(wsHandler).toContain('acceptedProtocol');
});
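// Sketch of the prefix-strip-and-echo step described in the test above. The
// helper name and return shape are hypothetical; the agent's real handler
// may be structured differently. A Sec-WebSocket-Protocol header carries a
// comma-separated list of subprotocols, so each entry is checked in turn.
function sketchParsePtyProtocol(
  header: string | null,
): { token: string; acceptedProtocol: string } | null {
  const prefix = 'gstack-pty.';
  if (!header) return null;
  for (const entry of header.split(',')) {
    const proto = entry.trim();
    if (proto.startsWith(prefix) && proto.length > prefix.length) {
      // `token` is what would be looked up in the valid-token set;
      // `acceptedProtocol` is the value to echo back in the upgrade response.
      return { token: proto.slice(prefix.length), acceptedProtocol: proto };
    }
  }
  return null;
}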
test('lazy spawn: claude PTY is spawned in message handler, not on upgrade', () => {
// The whole point of lazy-spawn (codex finding #8) is that the WS
// upgrade itself does NOT call spawnClaude. Spawn happens on first
// message frame.
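const upgradeStart = AGENT_SRC.indexOf("if (url.pathname === '/ws')");
const upgradeEnd = AGENT_SRC.indexOf("websocket: {");
// Both markers must exist and appear in this order. Otherwise slice() can
// return an empty string and the negative assertion below passes vacuously.
expect(upgradeStart).toBeGreaterThan(-1);
expect(upgradeEnd).toBeGreaterThan(upgradeStart);
const upgradeBlock = AGENT_SRC.slice(upgradeStart, upgradeEnd);
expect(upgradeBlock).not.toContain('spawnClaude(');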
// Spawn must be invoked from the message handler (lazy on first byte).
const messageHandler = AGENT_SRC.slice(AGENT_SRC.indexOf('message(ws, raw)'));
expect(messageHandler).toContain('spawnClaude(');
expect(messageHandler).toContain('!session.spawned');
});
test('process.on uncaughtException + unhandledRejection handlers exist', () => {
expect(AGENT_SRC).toContain("process.on('uncaughtException'");
expect(AGENT_SRC).toContain("process.on('unhandledRejection'");
});
test('cleanup escalates SIGINT to SIGKILL after 3s on close', () => {
// disposeSession must be idempotent and use a SIGINT-then-SIGKILL pattern.
const dispose = AGENT_SRC.slice(AGENT_SRC.indexOf('function disposeSession'));
expect(dispose).toContain("'SIGINT'");
expect(dispose).toContain("'SIGKILL'");
expect(dispose).toContain('3000');
});
test('tabState frames write tabs.json + active-tab.json', () => {
expect(AGENT_SRC).toContain("msg?.type === 'tabState'");
expect(AGENT_SRC).toContain('function handleTabState');
const fn = AGENT_SRC.slice(AGENT_SRC.indexOf('function handleTabState'));
// Atomic write via tmp + rename for both files (so claude never reads
// a half-written JSON document).
expect(fn).toContain("'tabs.json'");
expect(fn).toContain("'active-tab.json'");
expect(fn).toContain('renameSync');
// Skip chrome:// and chrome-extension:// pages — they're not useful
// targets for browse commands.
expect(fn).toContain("startsWith('chrome://')");
expect(fn).toContain("startsWith('chrome-extension://')");
});
test('claude is spawned with --append-system-prompt tab-awareness hint', () => {
expect(AGENT_SRC).toContain('function buildTabAwarenessHint');
const hint = AGENT_SRC.slice(AGENT_SRC.indexOf('function buildTabAwarenessHint'));
// The hint must mention the live state files and the fanout command —
// those are the two affordances that distinguish a gstack-PTY claude
// from a plain `claude` session.
expect(hint).toContain('tabs.json');
expect(hint).toContain('active-tab.json');
expect(hint).toContain('tab-each');
// And it must be passed via --append-system-prompt at spawn time
// (NOT written into the PTY as user input — that would pollute the
// visible transcript).
const spawn = AGENT_SRC.slice(AGENT_SRC.indexOf('function spawnClaude'));
expect(spawn).toContain("'--append-system-prompt'");
expect(spawn).toContain('tabHint');
});
});
describe('Source-level guard: server.ts /pty-session route', () => {
test('validates AUTH_TOKEN, grants over loopback, returns token + Set-Cookie', () => {
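const routeIdx = SERVER_SRC.indexOf("url.pathname === '/pty-session'");
expect(routeIdx).toBeGreaterThan(-1);
const route = SERVER_SRC.slice(routeIdx);
// Must check auth before minting.
const mintIdx = route.indexOf('mintPtySessionToken');
// If the mint call were missing, slice(0, -1) would cover almost the whole
// route and the auth assertion could match code that runs after minting.
expect(mintIdx).toBeGreaterThan(-1);
const beforeMint = route.slice(0, mintIdx);
expect(beforeMint).toContain('validateAuth');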
// Must call the loopback grant before responding (otherwise the
// agent's validTokens Set never sees the token and /ws would 401).
expect(route).toContain('grantPtyToken');
// Must return the token in the JSON body for the
// Sec-WebSocket-Protocol auth path (cross-port cookies don't survive
// SameSite=Strict from a chrome-extension origin).
expect(route).toContain('ptySessionToken');
// Set-Cookie is kept as a fallback for non-browser callers.
expect(route).toContain('Set-Cookie');
expect(route).toContain('buildPtySetCookie');
});
});