feat(extension): Terminal-only sidebar — auth fix, UX polish, chat rip

The chat queue path is gone. The Chrome side panel is now just an
interactive claude PTY in xterm.js. Activity / Refs / Inspector still
exist behind the `debug` toggle in the footer.

Three threads of change, all from dogfood iteration on top of
cc-pty-import:

1. fix(server): cross-port WS auth via Sec-WebSocket-Protocol
   - Browsers can't set Authorization on a WebSocket upgrade. We had
     been minting an HttpOnly gstack_pty cookie via /pty-session, but
     SameSite=Strict cookies don't survive the cross-port jump from
     server.ts:34567 to the agent's random port from a chrome-extension
     origin. The WS opened then immediately closed → "Session ended."
   - /pty-session now also returns ptySessionToken in the JSON body.
   - Extension calls `new WebSocket(url, [`gstack-pty.<token>`])`.
     Browser sends Sec-WebSocket-Protocol on the upgrade.
   - Agent reads the protocol header, validates against validTokens,
     and MUST echo the protocol back (Chromium closes the connection
     immediately if a server doesn't pick one of the offered protocols).
   - Cookie path is kept as a fallback for non-browser callers (curl,
     integration tests).
   - New integration test exercises the full protocol-auth round-trip
     via raw fetch+Upgrade so a future regression of this exact class
     fails in CI.

2. fix(extension): UX polish on the Terminal pane
   - Eager auto-connect when the sidebar opens — no "Press any key to
     start" friction every reload.
   - Always-visible ↻ Restart button in the terminal toolbar (not
     gated on the ENDED state) so the user can force a fresh claude
     mid-session.
   - MutationObserver on #tab-terminal's class attribute drives a
     fitAddon.fit() + term.refresh() when the pane becomes visible
     again — xterm doesn't auto-redraw after display:none → display:flex.

3. feat(extension): rip the chat tab + sidebar-agent.ts
   - Sidebar is Terminal-only. No more Terminal | Chat primary nav.
   - sidebar-agent.ts deleted. /sidebar-command, /sidebar-chat,
     /sidebar-agent/event, /sidebar-tabs* and friends all deleted.
   - The pickSidebarModel router (sonnet vs opus) is gone — the live
     PTY uses whatever model the user's `claude` CLI is configured with.
   - Quick-actions (🧹 Cleanup / 📸 Screenshot / 🍪 Cookies) survive
     in the Terminal toolbar. Cleanup now injects its prompt into the
     live PTY via window.gstackInjectToTerminal — no more
     /sidebar-command POST. The Inspector "Send to Code" action uses
     the same injection path.
   - clear-chat button removed from the footer.
   - sidepanel.js shed ~900 lines of chat polling, optimistic UI,
     stop-agent, etc.

Net diff: -3.4k lines across 16 files. CLAUDE.md, TODOS.md, and
docs/designs/SIDEBAR_MESSAGE_FLOW.md rewritten to match. The sidebar
regression test (browse/test/sidebar-tabs.test.ts) is rewritten as 27
structural assertions locking the new layout — Terminal sole pane,
no chat input, quick-actions in toolbar, eager-connect, MutationObserver
repaint, restart helper.
This commit is contained in:
Garry Tan
2026-04-25 21:03:04 -07:00
parent 0361acfb6a
commit 006dbe19f1
16 changed files with 771 additions and 4229 deletions
+6 -55
View File
@@ -853,7 +853,7 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
// Delete stale state file
safeUnlinkQuiet(config.stateFile);
console.log('Launching headed Chromium with extension + sidebar agent...');
console.log('Launching headed Chromium with extension + terminal agent...');
try {
// Start server in headed mode with extension auto-loaded
// Use a well-known port so the Chrome extension auto-connects
@@ -882,61 +882,12 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
const status = await resp.text();
console.log(`Connected to real Chrome\n${status}`);
// Auto-start sidebar agent
// __dirname is inside $bunfs in compiled binaries — resolve from execPath instead
let agentScript = path.resolve(__dirname, 'sidebar-agent.ts');
if (!fs.existsSync(agentScript)) {
agentScript = path.resolve(path.dirname(process.execPath), '..', 'src', 'sidebar-agent.ts');
}
try {
if (!fs.existsSync(agentScript)) {
throw new Error(`sidebar-agent.ts not found at ${agentScript}`);
}
// Clear old agent queue
const agentQueue = path.join(process.env.HOME || '/tmp', '.gstack', 'sidebar-agent-queue.jsonl');
try {
fs.mkdirSync(path.dirname(agentQueue), { recursive: true, mode: 0o700 });
fs.writeFileSync(agentQueue, '', { mode: 0o600 });
} catch (err: any) {
if (err?.code !== 'EACCES') throw err;
}
// sidebar-agent.ts spawn was here. Ripped alongside the chat queue —
// the Terminal pane runs an interactive PTY now, no more one-shot
// claude -p subprocesses to multiplex.
// Resolve browse binary path the same way — execPath-relative
let browseBin = path.resolve(__dirname, '..', 'dist', 'browse');
if (!fs.existsSync(browseBin)) {
browseBin = process.execPath; // the compiled binary itself
}
// Kill any existing sidebar-agent processes before starting a new one.
// Old agents have stale auth tokens and will silently fail to relay events,
// causing the server to mark the agent as "hung".
try {
const { spawnSync } = require('child_process');
spawnSync('pkill', ['-f', 'sidebar-agent\\.ts'], { stdio: 'ignore', timeout: 3000 });
} catch (err: any) {
if (err?.code !== 'ENOENT') throw err;
}
const agentProc = Bun.spawn(['bun', 'run', agentScript], {
cwd: config.projectDir,
env: {
...process.env,
BROWSE_BIN: browseBin,
BROWSE_STATE_FILE: config.stateFile,
BROWSE_SERVER_PORT: String(newState.port),
},
stdio: ['ignore', 'ignore', 'ignore'],
});
agentProc.unref();
console.log(`[browse] Sidebar agent started (PID: ${agentProc.pid})`);
} catch (err: any) {
console.error(`[browse] Sidebar agent failed to start: ${err.message}`);
console.error(`[browse] Run manually: bun run ${agentScript}`);
}
// Auto-start terminal agent (non-compiled, parallel to sidebar-agent).
// Owns the PTY WebSocket for the Terminal sidebar tab. Crash-isolated
// from the chat agent per codex outside-voice review.
// Auto-start terminal agent (non-compiled bun process). Owns the PTY
// WebSocket for the sidebar Terminal pane.
let termAgentScript = path.resolve(__dirname, 'terminal-agent.ts');
if (!fs.existsSync(termAgentScript)) {
termAgentScript = path.resolve(path.dirname(process.execPath), '..', 'src', 'terminal-agent.ts');
+47 -911
View File
File diff suppressed because it is too large Load Diff
-947
View File
@@ -1,947 +0,0 @@
/**
* Sidebar Agent polls agent-queue from server, spawns claude -p for each
* message, streams live events back to the server via /sidebar-agent/event.
*
* This runs as a NON-COMPILED bun process because compiled bun binaries
* cannot posix_spawn external executables. The server writes to the queue
* file, this process reads it and spawns claude.
*
* Usage: BROWSE_BIN=/path/to/browse bun run browse/src/sidebar-agent.ts
*/
import { spawn } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';
import { safeUnlink } from './error-handling';
import {
checkCanaryInStructure, logAttempt, hashPayload, extractDomain,
combineVerdict, writeSessionState, readSessionState, THRESHOLDS,
readDecision, clearDecision, excerptForReview,
type LayerSignal,
} from './security';
import {
loadTestsavant, scanPageContent, checkTranscript,
shouldRunTranscriptCheck, getClassifierStatus,
loadDeberta, scanPageContentDeberta,
type ToolCallInput,
} from './security-classifier';
const QUEUE = process.env.SIDEBAR_QUEUE_PATH || path.join(process.env.HOME || '/tmp', '.gstack', 'sidebar-agent-queue.jsonl');
const KILL_FILE = path.join(path.dirname(QUEUE), 'sidebar-agent-kill');
const SERVER_PORT = parseInt(process.env.BROWSE_SERVER_PORT || '34567', 10);
const SERVER_URL = `http://127.0.0.1:${SERVER_PORT}`;
const POLL_MS = 200; // 200ms poll — keeps time-to-first-token low
const B = process.env.BROWSE_BIN || path.resolve(__dirname, '../../.claude/skills/gstack/browse/dist/browse');
const CANCEL_DIR = path.join(process.env.HOME || '/tmp', '.gstack');
function cancelFileForTab(tabId: number): string {
return path.join(CANCEL_DIR, `sidebar-agent-cancel-${tabId}`);
}
interface QueueEntry {
prompt: string;
args?: string[];
stateFile?: string;
cwd?: string;
tabId?: number | null;
message?: string | null;
pageUrl?: string | null;
sessionId?: string | null;
ts?: string;
canary?: string; // session-scoped token; leak = prompt injection evidence
}
function isValidQueueEntry(e: unknown): e is QueueEntry {
if (typeof e !== 'object' || e === null) return false;
const obj = e as Record<string, unknown>;
if (typeof obj.prompt !== 'string' || obj.prompt.length === 0) return false;
if (obj.args !== undefined && (!Array.isArray(obj.args) || !obj.args.every(a => typeof a === 'string'))) return false;
if (obj.stateFile !== undefined) {
if (typeof obj.stateFile !== 'string') return false;
if (obj.stateFile.includes('..')) return false;
}
if (obj.cwd !== undefined) {
if (typeof obj.cwd !== 'string') return false;
if (obj.cwd.includes('..')) return false;
}
if (obj.tabId !== undefined && obj.tabId !== null && typeof obj.tabId !== 'number') return false;
if (obj.message !== undefined && obj.message !== null && typeof obj.message !== 'string') return false;
if (obj.pageUrl !== undefined && obj.pageUrl !== null && typeof obj.pageUrl !== 'string') return false;
if (obj.sessionId !== undefined && obj.sessionId !== null && typeof obj.sessionId !== 'string') return false;
if (obj.canary !== undefined && typeof obj.canary !== 'string') return false;
return true;
}
let lastLine = 0;
let authToken: string | null = null;
// Per-tab processing — each tab can run its own agent concurrently
const processingTabs = new Set<number>();
// Active claude subprocesses — keyed by tabId for targeted kill
const activeProcs = new Map<number, ReturnType<typeof spawn>>();
let activeProc: ReturnType<typeof spawn> | null = null;
// Kill-file timestamp last seen — avoids double-kill on same write
let lastKillTs = 0;
// ─── File drop relay ──────────────────────────────────────────
function getGitRoot(): string | null {
try {
const { execSync } = require('child_process');
return execSync('git rev-parse --show-toplevel', { encoding: 'utf-8', stdio: ['pipe', 'pipe', 'pipe'] }).trim();
} catch (err: any) {
console.debug('[sidebar-agent] Not in a git repo:', err.message);
return null;
}
}
function writeToInbox(message: string, pageUrl?: string, sessionId?: string): void {
const gitRoot = getGitRoot();
if (!gitRoot) {
console.error('[sidebar-agent] Cannot write to inbox — not in a git repo');
return;
}
const inboxDir = path.join(gitRoot, '.context', 'sidebar-inbox');
fs.mkdirSync(inboxDir, { recursive: true, mode: 0o700 });
const now = new Date();
const timestamp = now.toISOString().replace(/:/g, '-');
const filename = `${timestamp}-observation.json`;
const tmpFile = path.join(inboxDir, `.${filename}.tmp`);
const finalFile = path.join(inboxDir, filename);
const inboxMessage = {
type: 'observation',
timestamp: now.toISOString(),
page: { url: pageUrl || 'unknown', title: '' },
userMessage: message,
sidebarSessionId: sessionId || 'unknown',
};
fs.writeFileSync(tmpFile, JSON.stringify(inboxMessage, null, 2), { mode: 0o600 });
fs.renameSync(tmpFile, finalFile);
console.log(`[sidebar-agent] Wrote inbox message: ${filename}`);
}
// ─── Auth ────────────────────────────────────────────────────────
async function refreshToken(): Promise<string | null> {
// Read token from state file (same-user, mode 0o600) instead of /health
try {
const stateFile = process.env.BROWSE_STATE_FILE ||
path.join(process.env.HOME || '/tmp', '.gstack', 'browse.json');
const data = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
authToken = data.token || null;
return authToken;
} catch (err: any) {
console.error('[sidebar-agent] Failed to refresh auth token:', err.message);
return null;
}
}
// ─── Event relay to server ──────────────────────────────────────
async function sendEvent(event: Record<string, any>, tabId?: number): Promise<void> {
if (!authToken) await refreshToken();
if (!authToken) return;
try {
await fetch(`${SERVER_URL}/sidebar-agent/event`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${authToken}`,
},
body: JSON.stringify({ ...event, tabId: tabId ?? null }),
});
} catch (err) {
console.error('[sidebar-agent] Failed to send event:', err);
}
}
// ─── Claude subprocess ──────────────────────────────────────────
function shorten(str: string): string {
return str
.replace(new RegExp(B.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'g'), '$B')
.replace(/\/Users\/[^/]+/g, '~')
.replace(/\/conductor\/workspaces\/[^/]+\/[^/]+/g, '')
.replace(/\.claude\/skills\/gstack\//g, '')
.replace(/browse\/dist\/browse/g, '$B');
}
function describeToolCall(tool: string, input: any): string {
if (!input) return '';
// For Bash commands, generate a plain-English description
if (tool === 'Bash' && input.command) {
const cmd = input.command;
// Browse binary commands — the most common case
const browseMatch = cmd.match(/\$B\s+(\w+)|browse[^\s]*\s+(\w+)/);
if (browseMatch) {
const browseCmd = browseMatch[1] || browseMatch[2];
const args = cmd.split(/\s+/).slice(2).join(' ');
switch (browseCmd) {
case 'goto': return `Opening ${args.replace(/['"]/g, '')}`;
case 'snapshot': return args.includes('-i') ? 'Scanning for interactive elements' : args.includes('-D') ? 'Checking what changed' : 'Taking a snapshot of the page';
case 'screenshot': return `Saving screenshot${args ? ` to ${shorten(args)}` : ''}`;
case 'click': return `Clicking ${args}`;
case 'fill': { const parts = args.split(/\s+/); return `Typing "${parts.slice(1).join(' ')}" into ${parts[0]}`; }
case 'text': return 'Reading page text';
case 'html': return args ? `Reading HTML of ${args}` : 'Reading full page HTML';
case 'links': return 'Finding all links on the page';
case 'forms': return 'Looking for forms';
case 'console': return 'Checking browser console for errors';
case 'network': return 'Checking network requests';
case 'url': return 'Checking current URL';
case 'back': return 'Going back';
case 'forward': return 'Going forward';
case 'reload': return 'Reloading the page';
case 'scroll': return args ? `Scrolling to ${args}` : 'Scrolling down';
case 'wait': return `Waiting for ${args}`;
case 'inspect': return args ? `Inspecting CSS of ${args}` : 'Getting CSS for last picked element';
case 'style': return `Changing CSS: ${args}`;
case 'cleanup': return 'Removing page clutter (ads, popups, banners)';
case 'prettyscreenshot': return 'Taking a clean screenshot';
case 'css': return `Checking CSS property: ${args}`;
case 'is': return `Checking if element is ${args}`;
case 'diff': return `Comparing ${args}`;
case 'responsive': return 'Taking screenshots at mobile, tablet, and desktop sizes';
case 'status': return 'Checking browser status';
case 'tabs': return 'Listing open tabs';
case 'focus': return 'Bringing browser to front';
case 'select': return `Selecting option in ${args}`;
case 'hover': return `Hovering over ${args}`;
case 'viewport': return `Setting viewport to ${args}`;
case 'upload': return `Uploading file to ${args.split(/\s+/)[0]}`;
default: return `Running browse ${browseCmd} ${args}`.trim();
}
}
// Non-browse bash commands
if (cmd.includes('git ')) return `Running: ${shorten(cmd)}`;
let short = shorten(cmd);
return short.length > 100 ? short.slice(0, 100) + '…' : short;
}
if (tool === 'Read' && input.file_path) {
// Skip Claude's internal tool-result file reads — they're plumbing, not user-facing
if (input.file_path.includes('/tool-results/') || input.file_path.includes('/.claude/projects/')) return '';
return `Reading ${shorten(input.file_path)}`;
}
if (tool === 'Edit' && input.file_path) return `Editing ${shorten(input.file_path)}`;
if (tool === 'Write' && input.file_path) return `Writing ${shorten(input.file_path)}`;
if (tool === 'Grep' && input.pattern) return `Searching for "${input.pattern}"`;
if (tool === 'Glob' && input.pattern) return `Finding files matching ${input.pattern}`;
try { return shorten(JSON.stringify(input)).slice(0, 80); } catch { return ''; }
}
// Keep the old name as an alias for backward compat
function summarizeToolInput(tool: string, input: any): string {
return describeToolCall(tool, input);
}
/**
* Scan a Claude stream event for the session canary. Returns the channel where
* it leaked, or null if clean. Covers every outbound channel: text blocks,
* text deltas, tool_use arguments (including nested URL/path/command strings),
* and result payloads.
*/
function detectCanaryLeak(event: any, canary: string, buf?: DeltaBuffer): string | null {
if (!canary) return null;
if (event.type === 'assistant' && event.message?.content) {
for (const block of event.message.content) {
if (block.type === 'text' && typeof block.text === 'string' && block.text.includes(canary)) {
return 'assistant_text';
}
if (block.type === 'tool_use' && checkCanaryInStructure(block.input, canary)) {
return `tool_use:${block.name}`;
}
}
}
if (event.type === 'content_block_start' && event.content_block?.type === 'tool_use') {
if (checkCanaryInStructure(event.content_block.input, canary)) {
return `tool_use:${event.content_block.name}`;
}
}
if (event.type === 'content_block_delta' && event.delta?.type === 'text_delta') {
if (typeof event.delta.text === 'string') {
// Rolling buffer: an attacker can ask Claude to emit the canary split
// across two deltas (e.g., "CANARY-" then "ABCDEF"). A per-delta
// substring check misses this. Concatenate the previous tail with
// this chunk and search, then trim the tail to last canary.length-1
// chars for the next event.
const combined = buf ? buf.text_delta + event.delta.text : event.delta.text;
if (combined.includes(canary)) return 'text_delta';
if (buf) buf.text_delta = combined.slice(-(canary.length - 1));
}
}
if (event.type === 'content_block_delta' && event.delta?.type === 'input_json_delta') {
if (typeof event.delta.partial_json === 'string') {
const combined = buf ? buf.input_json_delta + event.delta.partial_json : event.delta.partial_json;
if (combined.includes(canary)) return 'tool_input_delta';
if (buf) buf.input_json_delta = combined.slice(-(canary.length - 1));
}
}
if (event.type === 'content_block_stop' && buf) {
// Block boundary — reset the rolling buffer so a canary straddling
// two independent tool_use blocks isn't inferred.
buf.text_delta = '';
buf.input_json_delta = '';
}
if (event.type === 'result' && typeof event.result === 'string' && event.result.includes(canary)) {
return 'result';
}
return null;
}
/** Rolling-window tails for delta canary detection. See detectCanaryLeak. */
interface DeltaBuffer {
text_delta: string;
input_json_delta: string;
}
interface CanaryContext {
canary: string;
pageUrl: string;
onLeak: (channel: string) => void;
deltaBuf: DeltaBuffer;
}
interface ToolResultScanContext {
scan: (toolName: string, text: string) => Promise<void>;
}
/**
* Per-tab map of tool_use_id tool name. Lets the tool_result handler
* know what tool produced the content (Read, Grep, Glob, Bash $B ...) so
* we can tag attack logs with the ingress source.
*/
const toolUseRegistry = new Map<string, { toolName: string; toolInput: unknown }>();
/**
* Extract plain-text content from a tool_result block. The Claude stream
* encodes it as either a string or an array of content blocks (text, image).
* We care about text images can't carry prompt injection at this layer.
*/
function extractToolResultText(content: unknown): string {
if (typeof content === 'string') return content;
if (!Array.isArray(content)) return '';
const parts: string[] = [];
for (const block of content) {
if (block && typeof block === 'object') {
const b = block as Record<string, unknown>;
if (b.type === 'text' && typeof b.text === 'string') parts.push(b.text);
}
}
return parts.join('\n');
}
/**
* Tools whose outputs should be ML-scanned. Bash/$B outputs already get
* scanned via the page-content flow. Read/Glob/Grep outputs have been
* uncovered Codex review flagged this gap. Adding coverage here closes it.
*/
const SCANNED_TOOLS = new Set(['Read', 'Grep', 'Glob', 'Bash', 'WebFetch']);
async function handleStreamEvent(event: any, tabId?: number, canaryCtx?: CanaryContext, toolResultScanCtx?: ToolResultScanContext): Promise<void> {
// Canary check runs BEFORE any outbound send — we never want to relay
// a leaked token to the sidepanel UI.
if (canaryCtx) {
const channel = detectCanaryLeak(event, canaryCtx.canary, canaryCtx.deltaBuf);
if (channel) {
canaryCtx.onLeak(channel);
return; // drop the event — never relay content that leaked the canary
}
}
if (event.type === 'system' && event.session_id) {
// Relay claude session ID for --resume support
await sendEvent({ type: 'system', claudeSessionId: event.session_id }, tabId);
}
if (event.type === 'assistant' && event.message?.content) {
for (const block of event.message.content) {
if (block.type === 'tool_use') {
// Register the tool_use so we can correlate tool_results back to
// the originating tool when they arrive in the next user-role message.
if (block.id) toolUseRegistry.set(block.id, { toolName: block.name, toolInput: block.input });
await sendEvent({ type: 'tool_use', tool: block.name, input: summarizeToolInput(block.name, block.input) }, tabId);
} else if (block.type === 'text' && block.text) {
await sendEvent({ type: 'text', text: block.text }, tabId);
}
}
}
// Tool results come back in user-role messages. Content can be a string
// or an array of typed content blocks.
if (event.type === 'user' && event.message?.content) {
for (const block of event.message.content) {
if (block && typeof block === 'object' && block.type === 'tool_result') {
const meta = block.tool_use_id ? toolUseRegistry.get(block.tool_use_id) : null;
const toolName = meta?.toolName ?? 'Unknown';
const text = extractToolResultText(block.content);
// Scan this tool output with the ML classifier if the tool is in
// the SCANNED_TOOLS set and the content is non-trivial.
if (SCANNED_TOOLS.has(toolName) && text.length >= 32 && toolResultScanCtx) {
// Fire-and-forget — never block the stream handler. If BLOCK
// fires, onToolResultBlock handles kill + emit.
toolResultScanCtx.scan(toolName, text).catch(() => {});
}
if (block.tool_use_id) toolUseRegistry.delete(block.tool_use_id);
}
}
}
if (event.type === 'content_block_start' && event.content_block?.type === 'tool_use') {
if (event.content_block.id) {
toolUseRegistry.set(event.content_block.id, {
toolName: event.content_block.name,
toolInput: event.content_block.input,
});
}
await sendEvent({ type: 'tool_use', tool: event.content_block.name, input: summarizeToolInput(event.content_block.name, event.content_block.input) }, tabId);
}
if (event.type === 'content_block_delta' && event.delta?.type === 'text_delta' && event.delta.text) {
await sendEvent({ type: 'text_delta', text: event.delta.text }, tabId);
}
// Relay tool results so the sidebar can show what happened
if (event.type === 'content_block_delta' && event.delta?.type === 'input_json_delta') {
// Tool input streaming — skip, we already announced the tool
}
if (event.type === 'result') {
await sendEvent({ type: 'result', text: event.result || '' }, tabId);
}
// Tool result events — summarize and relay
if (event.type === 'tool_result' || (event.type === 'assistant' && event.message?.content)) {
// Tool results come in the next assistant turn — handled above
}
}
/**
* Fire the prompt-injection-detected event to the server. This terminates
* the session from the sidepanel's perspective and renders the canary leak
* banner. Also logs locally (salted hash + domain only) and fires telemetry
* if configured.
*/
async function onCanaryLeaked(params: {
tabId: number;
channel: string;
canary: string;
pageUrl: string;
}): Promise<void> {
const { tabId, channel, canary, pageUrl } = params;
const domain = extractDomain(pageUrl);
console.warn(`[sidebar-agent] CANARY LEAK detected on ${channel} for tab ${tabId} (domain=${domain || 'unknown'})`);
// Local log — salted hash + domain only, never the payload
logAttempt({
ts: new Date().toISOString(),
urlDomain: domain,
payloadHash: hashPayload(canary), // hash the canary, not the payload (which might be leaked content)
confidence: 1.0,
layer: 'canary',
verdict: 'block',
});
// Broadcast to sidepanel so it can render the approved banner
await sendEvent({
type: 'security_event',
verdict: 'block',
reason: 'canary_leaked',
layer: 'canary',
channel,
domain,
}, tabId);
// Also emit agent_error so the sidepanel's existing error surface
// reflects that the session terminated. Keeps old clients working.
await sendEvent({
type: 'agent_error',
error: `Session terminated — prompt injection detected${domain ? ` from ${domain}` : ''}`,
}, tabId);
}
/**
* Pre-spawn ML scan of the user message. If the classifier fires at BLOCK,
* we log the attempt, emit a security_event to the sidepanel, and DO NOT
* spawn claude. Returns true if the scan blocked the session.
*
* Fail-open: any classifier error or degraded state returns false (safe) so
* the sidebar keeps working. The architectural controls (XML framing +
* command allowlist, live in server.ts:554-577) still defend.
*/
async function preSpawnSecurityCheck(entry: QueueEntry): Promise<boolean> {
const { message, canary, pageUrl, tabId } = entry;
if (!message || message.length === 0) return false;
const tid = tabId ?? 0;
// L4: scan the user message for direct injection patterns (TestSavantAI)
// L4c: also scan with DeBERTa-v3 when ensemble is enabled (opt-in)
const [contentSignal, debertaSignal] = await Promise.all([
scanPageContent(message),
scanPageContentDeberta(message),
]);
const signals: LayerSignal[] = [contentSignal, debertaSignal];
// L4b: only bother with Haiku if another layer already lit up at >= LOG_ONLY.
// Saves ~70% of Haiku calls per plan §E1 "gating optimization".
if (shouldRunTranscriptCheck(signals)) {
const transcriptSignal = await checkTranscript({
user_message: message,
tool_calls: [], // no tool calls yet at session start
});
signals.push(transcriptSignal);
}
const result = combineVerdict(signals);
if (result.verdict !== 'block') return false;
// BLOCK verdict. Log + emit + refuse to spawn.
const domain = extractDomain(pageUrl ?? '');
const leaderSignal = signals.reduce((a, b) => (a.confidence > b.confidence ? a : b));
logAttempt({
ts: new Date().toISOString(),
urlDomain: domain,
payloadHash: hashPayload(message),
confidence: result.confidence,
layer: leaderSignal.layer,
verdict: 'block',
});
console.warn(`[sidebar-agent] Pre-spawn BLOCK (${result.reason}) for tab ${tid}, confidence=${result.confidence.toFixed(3)}`);
await sendEvent({
type: 'security_event',
verdict: 'block',
reason: result.reason ?? 'ml_classifier',
layer: leaderSignal.layer,
confidence: result.confidence,
domain,
}, tid);
await sendEvent({
type: 'agent_error',
error: `Session blocked — prompt injection detected${domain ? ` from ${domain}` : ' in your message'}`,
}, tid);
return true;
}
async function askClaude(queueEntry: QueueEntry): Promise<void> {
const { prompt, args, stateFile, cwd, tabId, canary, pageUrl } = queueEntry;
const tid = tabId ?? 0;
processingTabs.add(tid);
await sendEvent({ type: 'agent_start' }, tid);
// Pre-spawn ML scan: if the user message trips the ensemble, refuse to
// spawn claude. Fail-open on classifier errors.
if (await preSpawnSecurityCheck(queueEntry)) {
processingTabs.delete(tid);
return;
}
return new Promise((resolve) => {
// Canary context is set after proc is spawned (needs proc reference for kill).
let canaryCtx: CanaryContext | undefined;
let canaryTriggered = false;
// Use args from queue entry (server sets --model, --allowedTools, prompt framing).
// Fall back to defaults only if queue entry has no args (backward compat).
// Write doesn't expand attack surface beyond what Bash already provides.
// The security boundary is the localhost-only message path, not the tool allowlist.
let claudeArgs = args || ['-p', prompt, '--output-format', 'stream-json', '--verbose',
'--allowedTools', 'Bash,Read,Glob,Grep,Write'];
// Validate cwd exists — queue may reference a stale worktree
let effectiveCwd = cwd || process.cwd();
try { fs.accessSync(effectiveCwd); } catch (err: any) {
console.warn('[sidebar-agent] Worktree path inaccessible, falling back to cwd:', effectiveCwd, err.message);
effectiveCwd = process.cwd();
}
// Clear any stale cancel signal for this tab before starting
const cancelFile = cancelFileForTab(tid);
safeUnlink(cancelFile);
const proc = spawn('claude', claudeArgs, {
stdio: ['pipe', 'pipe', 'pipe'],
cwd: effectiveCwd,
env: {
...process.env,
BROWSE_STATE_FILE: stateFile || '',
// Connect to the existing headed browse server, never start a new one.
// BROWSE_PORT tells the CLI which port to check.
// BROWSE_NO_AUTOSTART prevents spawning an invisible headless browser
// if the headed server is down — fail fast with a clear error instead.
BROWSE_PORT: process.env.BROWSE_PORT || '34567',
BROWSE_NO_AUTOSTART: '1',
// Pin this agent to its tab — prevents cross-tab interference
// when multiple agents run simultaneously
BROWSE_TAB: String(tid),
},
});
// Track active procs so kill-file polling can terminate them
activeProcs.set(tid, proc);
activeProc = proc;
proc.stdin.end();
// Now that proc exists, set up the canary-leak handler. It fires at most
// once; on fire we kill the subprocess, emit security_event + agent_error,
// and let the normal close handler resolve the promise.
if (canary) {
canaryCtx = {
canary,
pageUrl: pageUrl ?? '',
deltaBuf: { text_delta: '', input_json_delta: '' },
onLeak: (channel: string) => {
if (canaryTriggered) return;
canaryTriggered = true;
onCanaryLeaked({ tabId: tid, channel, canary, pageUrl: pageUrl ?? '' });
try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
setTimeout(() => {
try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
}, 2000);
},
};
}
// Tool-result ML scan context. Addresses the Codex review gap: Read,
// Grep, Glob, and WebFetch outputs enter Claude's context without
// passing through the Bash $B pipeline that content-security.ts
// already wraps. Scan them here.
let toolResultBlockFired = false;
const toolResultScanCtx: ToolResultScanContext = {
scan: async (toolName: string, text: string) => {
if (toolResultBlockFired) return;
// Parallel L4 + L4c ensemble scan (DeBERTa no-op when disabled).
// We run L4/L4c AND Haiku in parallel on tool outputs regardless of
// L4's score, because BrowseSafe-Bench shows L4 (TestSavantAI) has
// low recall on browser-agent-specific attacks (~15% at v1). Gating
// Haiku on L4 meant our best signal almost never ran. The cost is
// ~$0.002 + ~300ms per tool output, bounded by the Haiku timeout
// and offset by Haiku actually seeing the real attack context.
//
// Haiku only runs when the Claude CLI is available (checkHaikuAvailable
// caches the probe). In environments without it, the call returns a
// degraded signal and the verdict falls back to L4 alone.
const [contentSignal, debertaSignal, transcriptSignal] = await Promise.all([
scanPageContent(text),
scanPageContentDeberta(text),
checkTranscript({
user_message: queueEntry.message ?? '',
tool_calls: [{ tool_name: toolName, tool_input: {} }],
tool_output: text,
}),
]);
const signals: LayerSignal[] = [contentSignal, debertaSignal, transcriptSignal];
const result = combineVerdict(signals, { toolOutput: true });
if (result.verdict !== 'block') return;
toolResultBlockFired = true;
const domain = extractDomain(pageUrl ?? '');
const payloadHash = hashPayload(text.slice(0, 4096));
// Log pending — if the user overrides, we'll update via a separate
// log line. The attempts.jsonl is append-only so both entries survive.
logAttempt({
ts: new Date().toISOString(),
urlDomain: domain,
payloadHash,
confidence: result.confidence,
layer: 'testsavant_content',
verdict: 'block',
});
console.warn(`[sidebar-agent] Tool-result BLOCK on ${toolName} for tab ${tid} (confidence=${result.confidence.toFixed(3)}) — awaiting user decision`);
// Surface a REVIEWABLE block event. Sidepanel renders the suspected
// text + layer scores + [Allow and continue] / [Block session] buttons.
// The user has 60s to decide; default is BLOCK (safe fallback).
const layerScores = signals
.filter((s) => s.confidence > 0)
.map((s) => ({ layer: s.layer, confidence: s.confidence }));
await sendEvent({
type: 'security_event',
verdict: 'block',
reason: 'tool_result_ml',
layer: 'testsavant_content',
confidence: result.confidence,
domain,
tool: toolName,
reviewable: true,
suspected_text: excerptForReview(text),
signals: layerScores,
}, tid);
// Poll for the user's decision. Default to BLOCK on timeout.
const REVIEW_TIMEOUT_MS = 60_000;
const POLL_MS = 500;
clearDecision(tid); // clear any stale decision from a prior session
const deadline = Date.now() + REVIEW_TIMEOUT_MS;
let decision: 'allow' | 'block' = 'block';
let decisionReason = 'timeout';
while (Date.now() < deadline) {
const rec = readDecision(tid);
if (rec?.decision === 'allow' || rec?.decision === 'block') {
decision = rec.decision;
decisionReason = rec.reason ?? 'user';
break;
}
await new Promise((r) => setTimeout(r, POLL_MS));
}
clearDecision(tid);
if (decision === 'allow') {
// User overrode. Log the override so the audit trail captures it.
// toolResultBlockFired stays true so we don't re-prompt within the
// same message — one override per BLOCK event.
logAttempt({
ts: new Date().toISOString(),
urlDomain: domain,
payloadHash,
confidence: result.confidence,
layer: 'testsavant_content',
verdict: 'user_overrode',
});
await sendEvent({
type: 'security_event',
verdict: 'user_overrode',
reason: 'tool_result_ml',
layer: 'testsavant_content',
confidence: result.confidence,
domain,
tool: toolName,
}, tid);
console.warn(`[sidebar-agent] Tab ${tid}: user overrode BLOCK — session continues`);
// Let the block stay consumed; reset the flag so subsequent tool
// results get scanned fresh.
toolResultBlockFired = false;
return;
}
// User chose BLOCK (or timed out). Kill the session as before.
await sendEvent({
type: 'agent_error',
error: `Session terminated — prompt injection detected in ${toolName} output${decisionReason === 'timeout' ? ' (review timeout)' : ''}`,
}, tid);
try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
setTimeout(() => {
try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
}, 2000);
},
};
// Poll for per-tab cancel signal from server's killAgent()
const cancelCheck = setInterval(() => {
try {
if (fs.existsSync(cancelFile)) {
console.log(`[sidebar-agent] Cancel signal received for tab ${tid} — killing claude subprocess`);
try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
setTimeout(() => { try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; } }, 3000);
fs.unlinkSync(cancelFile);
clearInterval(cancelCheck);
}
} catch (err: any) { if (err?.code !== 'ENOENT') throw err; }
}, 500);
let buffer = '';
proc.stdout.on('data', (data: Buffer) => {
buffer += data.toString();
const lines = buffer.split('\n');
buffer = lines.pop() || '';
for (const line of lines) {
if (!line.trim()) continue;
try { handleStreamEvent(JSON.parse(line), tid, canaryCtx, toolResultScanCtx); } catch (err: any) {
console.error(`[sidebar-agent] Tab ${tid}: Failed to parse stream line:`, line.slice(0, 100), err.message);
}
}
});
let stderrBuffer = '';
proc.stderr.on('data', (data: Buffer) => {
stderrBuffer += data.toString();
});
proc.on('close', (code) => {
clearInterval(cancelCheck);
activeProc = null;
activeProcs.delete(tid);
if (buffer.trim()) {
try { handleStreamEvent(JSON.parse(buffer), tid, canaryCtx, toolResultScanCtx); } catch (err: any) {
console.error(`[sidebar-agent] Tab ${tid}: Failed to parse final buffer:`, buffer.slice(0, 100), err.message);
}
}
const doneEvent: Record<string, any> = { type: 'agent_done' };
if (code !== 0 && stderrBuffer.trim()) {
doneEvent.stderr = stderrBuffer.trim().slice(-500);
}
sendEvent(doneEvent, tid).then(() => {
processingTabs.delete(tid);
resolve();
});
});
proc.on('error', (err) => {
clearInterval(cancelCheck);
activeProc = null;
const errorMsg = stderrBuffer.trim()
? `${err.message}\nstderr: ${stderrBuffer.trim().slice(-500)}`
: err.message;
sendEvent({ type: 'agent_error', error: errorMsg }, tid).then(() => {
processingTabs.delete(tid);
resolve();
});
});
// Timeout (default 300s / 5 min — multi-page tasks need time)
const timeoutMs = parseInt(process.env.SIDEBAR_AGENT_TIMEOUT || '300000', 10);
setTimeout(() => {
try { proc.kill('SIGTERM'); } catch (killErr: any) {
console.warn(`[sidebar-agent] Tab ${tid}: Failed to kill timed-out process:`, killErr.message);
}
setTimeout(() => { try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; } }, 3000);
const timeoutMsg = stderrBuffer.trim()
? `Timed out after ${timeoutMs / 1000}s\nstderr: ${stderrBuffer.trim().slice(-500)}`
: `Timed out after ${timeoutMs / 1000}s`;
sendEvent({ type: 'agent_error', error: timeoutMsg }, tid).then(() => {
processingTabs.delete(tid);
resolve();
});
}, timeoutMs);
});
}
// ─── Poll loop ───────────────────────────────────────────────────
function countLines(): number {
try {
return fs.readFileSync(QUEUE, 'utf-8').split('\n').filter(Boolean).length;
} catch (err: any) {
console.error('[sidebar-agent] Failed to read queue file:', err.message);
return 0;
}
}
function readLine(n: number): string | null {
try {
const lines = fs.readFileSync(QUEUE, 'utf-8').split('\n').filter(Boolean);
return lines[n - 1] || null;
} catch (err: any) {
console.error(`[sidebar-agent] Failed to read queue line ${n}:`, err.message);
return null;
}
}
async function poll() {
const current = countLines();
if (current <= lastLine) return;
while (lastLine < current) {
lastLine++;
const line = readLine(lastLine);
if (!line) continue;
let parsed: unknown;
try { parsed = JSON.parse(line); } catch (err: any) {
console.warn(`[sidebar-agent] Skipping malformed queue entry at line ${lastLine}:`, line.slice(0, 80), err.message);
continue;
}
if (!isValidQueueEntry(parsed)) {
console.warn(`[sidebar-agent] Skipping invalid queue entry at line ${lastLine}: failed schema validation`);
continue;
}
const entry = parsed;
const tid = entry.tabId ?? 0;
// Skip if this tab already has an agent running — server queues per-tab
if (processingTabs.has(tid)) continue;
console.log(`[sidebar-agent] Processing tab ${tid}: "${entry.message}"`);
// Write to inbox so workspace agent can pick it up
writeToInbox(entry.message || entry.prompt, entry.pageUrl, entry.sessionId);
// Fire and forget — each tab's agent runs concurrently
askClaude(entry).catch((err) => {
console.error(`[sidebar-agent] Error on tab ${tid}:`, err);
sendEvent({ type: 'agent_error', error: String(err) }, tid);
});
}
}
// ─── Main ────────────────────────────────────────────────────────
function pollKillFile(): void {
try {
const stat = fs.statSync(KILL_FILE);
const mtime = stat.mtimeMs;
if (mtime > lastKillTs) {
lastKillTs = mtime;
if (activeProcs.size > 0) {
console.log(`[sidebar-agent] Kill signal received — terminating ${activeProcs.size} active agent(s)`);
for (const [tid, proc] of activeProcs) {
try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
setTimeout(() => { try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; } }, 2000);
processingTabs.delete(tid);
}
activeProcs.clear();
}
}
} catch {
// Kill file doesn't exist yet — normal state
}
}
async function main() {
const dir = path.dirname(QUEUE);
fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
if (!fs.existsSync(QUEUE)) fs.writeFileSync(QUEUE, '', { mode: 0o600 });
try { fs.chmodSync(QUEUE, 0o600); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; }
lastLine = countLines();
await refreshToken();
console.log(`[sidebar-agent] Started. Watching ${QUEUE} from line ${lastLine}`);
console.log(`[sidebar-agent] Server: ${SERVER_URL}`);
console.log(`[sidebar-agent] Browse binary: ${B}`);
// If GSTACK_SECURITY_ENSEMBLE=deberta is set, also warm the DeBERTa-v3
// ensemble classifier. Fire-and-forget alongside TestSavantAI — they
// warm in parallel. No-op when the env var is unset.
loadDeberta((msg) => console.log(`[security-classifier] ${msg}`))
.catch((err) => console.warn('[sidebar-agent] DeBERTa warmup failed:', err?.message));
// Warm up the ML classifier in the background. First call triggers a 112MB
// download (~30s on average broadband). Non-blocking — the sidebar stays
// functional on cold start; classifier just reports 'off' until warmed.
//
// On warmup completion (success or failure), write the classifier status to
// ~/.gstack/security/session-state.json so server.ts's /health endpoint can
// report it to the sidepanel for shield icon rendering.
loadTestsavant((msg) => console.log(`[security-classifier] ${msg}`))
.then(() => {
const s = getClassifierStatus();
console.log(`[sidebar-agent] Classifier warmup complete: ${JSON.stringify(s)}`);
const existing = readSessionState();
writeSessionState({
sessionId: existing?.sessionId ?? String(process.pid),
canary: existing?.canary ?? '',
warnedDomains: existing?.warnedDomains ?? [],
classifierStatus: s,
lastUpdated: new Date().toISOString(),
});
})
.catch((err) => console.warn('[sidebar-agent] Classifier warmup failed (degraded mode):', err?.message));
setInterval(poll, POLL_MS);
setInterval(pollKillFile, POLL_MS);
}
main().catch(console.error);
+49 -11
View File
@@ -200,10 +200,18 @@ function buildServer() {
// /ws — WebSocket upgrade. CRITICAL gates:
// (1) Origin must be chrome-extension://<id>. Cross-site WS hijacking
// defense per codex finding #9.
// (2) Cookie gstack_pty must be in validTokens. The cookie was
// minted by the parent server's /pty-session route under a
// valid AUTH_TOKEN, so a request without it can't get a shell.
// defense — required, not optional.
// (2) Token must be in validTokens. We accept the token via two
// transports for compatibility:
// - Sec-WebSocket-Protocol (preferred for browsers — the only
// auth header settable from the browser WebSocket API)
// - Cookie gstack_pty (works for non-browser callers and
// same-port browser callers; doesn't survive the cross-port
// jump from server.ts:34567 to the agent's random port
// when SameSite=Strict is set)
// Either path works; both verify against the same in-memory
// validTokens Set, populated by the parent server's
// authenticated /pty-session → /internal/grant chain.
if (url.pathname === '/ws') {
const origin = req.headers.get('origin') || '';
const isExtensionOrigin = origin.startsWith('chrome-extension://');
@@ -214,18 +222,48 @@ function buildServer() {
return new Response('forbidden origin', { status: 403 });
}
const cookieHeader = req.headers.get('cookie') || '';
let cookieToken: string | null = null;
for (const part of cookieHeader.split(';')) {
const [name, ...rest] = part.trim().split('=');
if (name === 'gstack_pty') { cookieToken = rest.join('=') || null; break; }
// Try Sec-WebSocket-Protocol first. Format: a single token, possibly
// with a `gstack-pty.` prefix (which we strip). Browsers send a
// comma-separated list when multiple were requested; we pick the
// first that matches a known token.
const protoHeader = req.headers.get('sec-websocket-protocol') || '';
let token: string | null = null;
let acceptedProtocol: string | null = null;
for (const raw of protoHeader.split(',').map(s => s.trim()).filter(Boolean)) {
const candidate = raw.startsWith('gstack-pty.') ? raw.slice('gstack-pty.'.length) : raw;
if (validTokens.has(candidate)) {
token = candidate;
acceptedProtocol = raw;
break;
}
}
if (!cookieToken || !validTokens.has(cookieToken)) {
// Fallback: Cookie gstack_pty (legacy / non-browser callers).
if (!token) {
const cookieHeader = req.headers.get('cookie') || '';
for (const part of cookieHeader.split(';')) {
const [name, ...rest] = part.trim().split('=');
if (name === 'gstack_pty') {
const candidate = rest.join('=') || null;
if (candidate && validTokens.has(candidate)) {
token = candidate;
}
break;
}
}
}
if (!token) {
return new Response('unauthorized', { status: 401 });
}
const upgraded = server.upgrade(req, {
data: { cookie: cookieToken },
data: { cookie: token },
// Echo the protocol back so the browser accepts the upgrade.
// Required when the client sends Sec-WebSocket-Protocol — the
// server MUST select one of the offered protocols, otherwise
// the browser closes the connection immediately.
...(acceptedProtocol ? { headers: { 'Sec-WebSocket-Protocol': acceptedProtocol } } : {}),
});
return upgraded ? undefined : new Response('upgrade failed', { status: 500 });
}
-226
View File
@@ -1,226 +0,0 @@
/**
* Layer 3: Sidebar agent round-trip tests.
* Starts server + sidebar-agent together. Mocks the `claude` binary with a shell
* script that outputs canned stream-json. Verifies events flow end-to-end:
* POST /sidebar-command queue sidebar-agent mock claude events /sidebar-chat
*/
import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import { spawn, type Subprocess } from 'bun';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';
let serverProc: Subprocess | null = null;
let agentProc: Subprocess | null = null;
let serverPort: number = 0;
let authToken: string = '';
let tmpDir: string = '';
let stateFile: string = '';
let queueFile: string = '';
let mockBinDir: string = '';
async function api(pathname: string, opts: RequestInit = {}): Promise<Response> {
const headers: Record<string, string> = {
'Content-Type': 'application/json',
...(opts.headers as Record<string, string> || {}),
};
if (!headers['Authorization'] && authToken) {
headers['Authorization'] = `Bearer ${authToken}`;
}
return fetch(`http://127.0.0.1:${serverPort}${pathname}`, { ...opts, headers });
}
async function resetState() {
await api('/sidebar-session/new', { method: 'POST' });
fs.writeFileSync(queueFile, '');
}
async function pollChatUntil(
predicate: (entries: any[]) => boolean,
timeoutMs = 10000,
): Promise<any[]> {
const deadline = Date.now() + timeoutMs;
while (Date.now() < deadline) {
const resp = await api('/sidebar-chat?after=0');
const data = await resp.json();
if (predicate(data.entries)) return data.entries;
await new Promise(r => setTimeout(r, 300));
}
// Return whatever we have on timeout
const resp = await api('/sidebar-chat?after=0');
return (await resp.json()).entries;
}
function writeMockClaude(script: string) {
const mockPath = path.join(mockBinDir, 'claude');
fs.writeFileSync(mockPath, script, { mode: 0o755 });
}
beforeAll(async () => {
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-roundtrip-'));
stateFile = path.join(tmpDir, 'browse.json');
queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
mockBinDir = path.join(tmpDir, 'bin');
fs.mkdirSync(mockBinDir, { recursive: true });
fs.mkdirSync(path.dirname(queueFile), { recursive: true });
// Write default mock claude that outputs canned events
writeMockClaude(`#!/bin/bash
echo '{"type":"system","session_id":"mock-session-123"}'
echo '{"type":"assistant","message":{"content":[{"type":"text","text":"I can see the page. It looks like a test fixture."}]}}'
echo '{"type":"result","result":"Done."}'
`);
// Start server (no browser)
const serverScript = path.resolve(__dirname, '..', 'src', 'server.ts');
serverProc = spawn(['bun', 'run', serverScript], {
env: {
...process.env,
BROWSE_STATE_FILE: stateFile,
BROWSE_HEADLESS_SKIP: '1',
BROWSE_PORT: '0',
SIDEBAR_QUEUE_PATH: queueFile,
BROWSE_IDLE_TIMEOUT: '300',
},
stdio: ['ignore', 'pipe', 'pipe'],
});
// Wait for server
const deadline = Date.now() + 15000;
while (Date.now() < deadline) {
if (fs.existsSync(stateFile)) {
try {
const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
if (state.port && state.token) {
serverPort = state.port;
authToken = state.token;
break;
}
} catch {}
}
await new Promise(r => setTimeout(r, 100));
}
if (!serverPort) throw new Error('Server did not start in time');
// Start sidebar-agent with mock claude on PATH
const agentScript = path.resolve(__dirname, '..', 'src', 'sidebar-agent.ts');
agentProc = spawn(['bun', 'run', agentScript], {
env: {
...process.env,
PATH: `${mockBinDir}:${process.env.PATH}`,
BROWSE_SERVER_PORT: String(serverPort),
BROWSE_STATE_FILE: stateFile,
SIDEBAR_QUEUE_PATH: queueFile,
SIDEBAR_AGENT_TIMEOUT: '10000',
BROWSE_BIN: 'browse', // doesn't matter, mock claude doesn't use it
},
stdio: ['ignore', 'pipe', 'pipe'],
});
// Give sidebar-agent time to start polling
await new Promise(r => setTimeout(r, 1000));
}, 20000);
afterAll(() => {
if (agentProc) { try { agentProc.kill(); } catch {} }
if (serverProc) { try { serverProc.kill(); } catch {} }
try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
});
describe('sidebar-agent round-trip', () => {
test('full message round-trip with mock claude', async () => {
await resetState();
// Send a command
const resp = await api('/sidebar-command', {
method: 'POST',
body: JSON.stringify({
message: 'what is on this page?',
activeTabUrl: 'https://example.com/test',
}),
});
expect(resp.status).toBe(200);
// Wait for mock claude to process and events to arrive
const entries = await pollChatUntil(
(entries) => entries.some((e: any) => e.type === 'agent_done'),
15000,
);
// Verify the flow: user message → agent_start → text → agent_done
const userEntry = entries.find((e: any) => e.role === 'user');
expect(userEntry).toBeDefined();
expect(userEntry.message).toBe('what is on this page?');
// The mock claude outputs text — check for any agent text entry
const textEntries = entries.filter((e: any) => e.role === 'agent' && (e.type === 'text' || e.type === 'result'));
expect(textEntries.length).toBeGreaterThan(0);
const doneEntry = entries.find((e: any) => e.type === 'agent_done');
expect(doneEntry).toBeDefined();
// Agent should be back to idle
const session = await (await api('/sidebar-session')).json();
expect(session.agent.status).toBe('idle');
}, 20000);
test('claude crash produces agent_error', async () => {
await resetState();
// Replace mock claude with one that crashes
writeMockClaude(`#!/bin/bash
echo '{"type":"system","session_id":"crash-test"}' >&2
exit 1
`);
await api('/sidebar-command', {
method: 'POST',
body: JSON.stringify({ message: 'crash test' }),
});
// Wait for agent_done (sidebar-agent sends agent_done even on crash via proc.on('close'))
const entries = await pollChatUntil(
(entries) => entries.some((e: any) => e.type === 'agent_done' || e.type === 'agent_error'),
15000,
);
// Agent should recover to idle
const session = await (await api('/sidebar-session')).json();
expect(session.agent.status).toBe('idle');
// Restore working mock
writeMockClaude(`#!/bin/bash
echo '{"type":"assistant","message":{"content":[{"type":"text","text":"recovered"}]}}'
`);
}, 20000);
test('sequential queue drain', async () => {
await resetState();
// Restore working mock
writeMockClaude(`#!/bin/bash
echo '{"type":"assistant","message":{"content":[{"type":"text","text":"response to: '"'"'$*'"'"'"}]}}'
`);
// Send two messages rapidly — first processes, second queues
await api('/sidebar-command', {
method: 'POST',
body: JSON.stringify({ message: 'first message' }),
});
await api('/sidebar-command', {
method: 'POST',
body: JSON.stringify({ message: 'second message' }),
});
// Wait for both to complete (two agent_done events)
const entries = await pollChatUntil(
(entries) => entries.filter((e: any) => e.type === 'agent_done').length >= 2,
20000,
);
// Both user messages should be in chat
const userEntries = entries.filter((e: any) => e.role === 'user');
expect(userEntries.length).toBeGreaterThanOrEqual(2);
}, 25000);
});
-562
View File
@@ -1,562 +0,0 @@
/**
* Tests for sidebar agent queue parsing and inbox writing.
*
* sidebar-agent.ts functions are not exported (it's an entry-point script),
* so we test the same logic inline: JSONL parsing, writeToInbox filesystem
* behavior, and edge cases.
*/
import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
// ─── Helpers: replicate sidebar-agent logic for unit testing ──────
/** Parse a single JSONL line — same logic as sidebar-agent poll() */
function parseQueueLine(line: string): any | null {
if (!line.trim()) return null;
try {
const entry = JSON.parse(line);
if (!entry.message && !entry.prompt) return null;
return entry;
} catch {
return null;
}
}
/** Read all valid entries from a JSONL string — same as countLines + readLine loop */
function parseQueueFile(content: string): any[] {
const entries: any[] = [];
const lines = content.split('\n').filter(Boolean);
for (const line of lines) {
const entry = parseQueueLine(line);
if (entry) entries.push(entry);
}
return entries;
}
/** Write to inbox — extracted logic from sidebar-agent.ts writeToInbox() */
function writeToInbox(
gitRoot: string,
message: string,
pageUrl?: string,
sessionId?: string,
): string | null {
if (!gitRoot) return null;
const inboxDir = path.join(gitRoot, '.context', 'sidebar-inbox');
fs.mkdirSync(inboxDir, { recursive: true });
const now = new Date();
const timestamp = now.toISOString().replace(/:/g, '-');
const filename = `${timestamp}-observation.json`;
const tmpFile = path.join(inboxDir, `.${filename}.tmp`);
const finalFile = path.join(inboxDir, filename);
const inboxMessage = {
type: 'observation',
timestamp: now.toISOString(),
page: { url: pageUrl || 'unknown', title: '' },
userMessage: message,
sidebarSessionId: sessionId || 'unknown',
};
fs.writeFileSync(tmpFile, JSON.stringify(inboxMessage, null, 2));
fs.renameSync(tmpFile, finalFile);
return finalFile;
}
/** Shorten paths — same logic as sidebar-agent.ts shorten() */
function shorten(str: string): string {
return str
.replace(/\/Users\/[^/]+/g, '~')
.replace(/\/conductor\/workspaces\/[^/]+\/[^/]+/g, '')
.replace(/\.claude\/skills\/gstack\//g, '')
.replace(/browse\/dist\/browse/g, '$B');
}
/** describeToolCall — replicated from sidebar-agent.ts for unit testing */
function describeToolCall(tool: string, input: any): string {
if (!input) return '';
if (tool === 'Bash' && input.command) {
const cmd = input.command;
const browseMatch = cmd.match(/\$B\s+(\w+)|browse[^\s]*\s+(\w+)/);
if (browseMatch) {
const browseCmd = browseMatch[1] || browseMatch[2];
const args = cmd.split(/\s+/).slice(2).join(' ');
switch (browseCmd) {
case 'goto': return `Opening ${args.replace(/['"]/g, '')}`;
case 'snapshot': return args.includes('-i') ? 'Scanning for interactive elements' : args.includes('-D') ? 'Checking what changed' : 'Taking a snapshot of the page';
case 'screenshot': return `Saving screenshot${args ? ` to ${shorten(args)}` : ''}`;
case 'click': return `Clicking ${args}`;
case 'fill': { const parts = args.split(/\s+/); return `Typing "${parts.slice(1).join(' ')}" into ${parts[0]}`; }
case 'text': return 'Reading page text';
case 'html': return args ? `Reading HTML of ${args}` : 'Reading full page HTML';
case 'links': return 'Finding all links on the page';
case 'forms': return 'Looking for forms';
case 'console': return 'Checking browser console for errors';
case 'network': return 'Checking network requests';
case 'url': return 'Checking current URL';
case 'back': return 'Going back';
case 'forward': return 'Going forward';
case 'reload': return 'Reloading the page';
case 'scroll': return args ? `Scrolling to ${args}` : 'Scrolling down';
case 'wait': return `Waiting for ${args}`;
case 'inspect': return args ? `Inspecting CSS of ${args}` : 'Getting CSS for last picked element';
case 'style': return `Changing CSS: ${args}`;
case 'cleanup': return 'Removing page clutter (ads, popups, banners)';
case 'prettyscreenshot': return 'Taking a clean screenshot';
case 'css': return `Checking CSS property: ${args}`;
case 'is': return `Checking if element is ${args}`;
case 'diff': return `Comparing ${args}`;
case 'responsive': return 'Taking screenshots at mobile, tablet, and desktop sizes';
case 'status': return 'Checking browser status';
case 'tabs': return 'Listing open tabs';
case 'focus': return 'Bringing browser to front';
case 'select': return `Selecting option in ${args}`;
case 'hover': return `Hovering over ${args}`;
case 'viewport': return `Setting viewport to ${args}`;
case 'upload': return `Uploading file to ${args.split(/\s+/)[0]}`;
default: return `Running browse ${browseCmd} ${args}`.trim();
}
}
if (cmd.includes('git ')) return `Running: ${shorten(cmd)}`;
let short = shorten(cmd);
return short.length > 100 ? short.slice(0, 100) + '…' : short;
}
if (tool === 'Read' && input.file_path) return `Reading ${shorten(input.file_path)}`;
if (tool === 'Edit' && input.file_path) return `Editing ${shorten(input.file_path)}`;
if (tool === 'Write' && input.file_path) return `Writing ${shorten(input.file_path)}`;
if (tool === 'Grep' && input.pattern) return `Searching for "${input.pattern}"`;
if (tool === 'Glob' && input.pattern) return `Finding files matching ${input.pattern}`;
try { return shorten(JSON.stringify(input)).slice(0, 80); } catch { return ''; }
}
// ─── Test setup ──────────────────────────────────────────────────
let tmpDir: string;
beforeEach(() => {
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-agent-test-'));
});
afterEach(() => {
fs.rmSync(tmpDir, { recursive: true, force: true });
});
// ─── Queue File Parsing ─────────────────────────────────────────
describe('queue file parsing', () => {
test('valid JSONL line parsed correctly', () => {
const line = JSON.stringify({ message: 'hello', prompt: 'check this', pageUrl: 'https://example.com' });
const entry = parseQueueLine(line);
expect(entry).not.toBeNull();
expect(entry.message).toBe('hello');
expect(entry.prompt).toBe('check this');
expect(entry.pageUrl).toBe('https://example.com');
});
test('malformed JSON line skipped without crash', () => {
const entry = parseQueueLine('this is not json {{{');
expect(entry).toBeNull();
});
test('valid JSON without message or prompt is skipped', () => {
const line = JSON.stringify({ foo: 'bar' });
const entry = parseQueueLine(line);
expect(entry).toBeNull();
});
test('empty file returns no entries', () => {
const entries = parseQueueFile('');
expect(entries).toEqual([]);
});
test('file with blank lines returns no entries', () => {
const entries = parseQueueFile('\n\n\n');
expect(entries).toEqual([]);
});
test('mixed valid and invalid lines', () => {
const content = [
JSON.stringify({ message: 'first' }),
'not json',
JSON.stringify({ unrelated: true }),
JSON.stringify({ message: 'second', prompt: 'do stuff' }),
].join('\n');
const entries = parseQueueFile(content);
expect(entries.length).toBe(2);
expect(entries[0].message).toBe('first');
expect(entries[1].message).toBe('second');
});
});
// ─── writeToInbox ────────────────────────────────────────────────
describe('writeToInbox', () => {
test('creates .context/sidebar-inbox/ directory', () => {
writeToInbox(tmpDir, 'test message');
const inboxDir = path.join(tmpDir, '.context', 'sidebar-inbox');
expect(fs.existsSync(inboxDir)).toBe(true);
expect(fs.statSync(inboxDir).isDirectory()).toBe(true);
});
test('writes valid JSON file', () => {
const filePath = writeToInbox(tmpDir, 'test message', 'https://example.com', 'session-123');
expect(filePath).not.toBeNull();
expect(fs.existsSync(filePath!)).toBe(true);
const data = JSON.parse(fs.readFileSync(filePath!, 'utf-8'));
expect(data.type).toBe('observation');
expect(data.userMessage).toBe('test message');
expect(data.page.url).toBe('https://example.com');
expect(data.sidebarSessionId).toBe('session-123');
expect(data.timestamp).toBeTruthy();
});
test('atomic write — final file exists, no .tmp left', () => {
const filePath = writeToInbox(tmpDir, 'atomic test');
expect(filePath).not.toBeNull();
expect(fs.existsSync(filePath!)).toBe(true);
// Check no .tmp files remain in the inbox directory
const inboxDir = path.join(tmpDir, '.context', 'sidebar-inbox');
const files = fs.readdirSync(inboxDir);
const tmpFiles = files.filter(f => f.endsWith('.tmp'));
expect(tmpFiles.length).toBe(0);
// Final file should end with -observation.json
const jsonFiles = files.filter(f => f.endsWith('-observation.json') && !f.startsWith('.'));
expect(jsonFiles.length).toBe(1);
});
test('handles missing git root gracefully', () => {
const result = writeToInbox('', 'test');
expect(result).toBeNull();
});
test('defaults pageUrl to unknown when not provided', () => {
const filePath = writeToInbox(tmpDir, 'no url provided');
expect(filePath).not.toBeNull();
const data = JSON.parse(fs.readFileSync(filePath!, 'utf-8'));
expect(data.page.url).toBe('unknown');
});
test('defaults sessionId to unknown when not provided', () => {
const filePath = writeToInbox(tmpDir, 'no session');
expect(filePath).not.toBeNull();
const data = JSON.parse(fs.readFileSync(filePath!, 'utf-8'));
expect(data.sidebarSessionId).toBe('unknown');
});
test('multiple writes create separate files', () => {
writeToInbox(tmpDir, 'message 1');
// Tiny delay to ensure different timestamps
const t = Date.now();
while (Date.now() === t) {} // spin until next ms
writeToInbox(tmpDir, 'message 2');
const inboxDir = path.join(tmpDir, '.context', 'sidebar-inbox');
const files = fs.readdirSync(inboxDir).filter(f => f.endsWith('.json') && !f.startsWith('.'));
expect(files.length).toBe(2);
});
});
// ─── describeToolCall (verbose narration) ────────────────────────
describe('describeToolCall', () => {
// Browse navigation commands
test('goto → plain English with URL', () => {
const result = describeToolCall('Bash', { command: '$B goto https://example.com' });
expect(result).toBe('Opening https://example.com');
});
test('goto strips quotes from URL', () => {
const result = describeToolCall('Bash', { command: '$B goto "https://example.com"' });
expect(result).toBe('Opening https://example.com');
});
test('url → checking current URL', () => {
expect(describeToolCall('Bash', { command: '$B url' })).toBe('Checking current URL');
});
test('back/forward/reload → plain English', () => {
expect(describeToolCall('Bash', { command: '$B back' })).toBe('Going back');
expect(describeToolCall('Bash', { command: '$B forward' })).toBe('Going forward');
expect(describeToolCall('Bash', { command: '$B reload' })).toBe('Reloading the page');
});
// Snapshot variants
test('snapshot -i → scanning for interactive elements', () => {
expect(describeToolCall('Bash', { command: '$B snapshot -i' })).toBe('Scanning for interactive elements');
});
test('snapshot -D → checking what changed', () => {
expect(describeToolCall('Bash', { command: '$B snapshot -D' })).toBe('Checking what changed');
});
test('snapshot (plain) → taking a snapshot', () => {
expect(describeToolCall('Bash', { command: '$B snapshot' })).toBe('Taking a snapshot of the page');
});
// Interaction commands
test('click → clicking element', () => {
expect(describeToolCall('Bash', { command: '$B click @e3' })).toBe('Clicking @e3');
});
test('fill → typing into element', () => {
expect(describeToolCall('Bash', { command: '$B fill @e4 "hello world"' })).toBe('Typing ""hello world"" into @e4');
});
test('scroll with selector → scrolling to element', () => {
expect(describeToolCall('Bash', { command: '$B scroll .footer' })).toBe('Scrolling to .footer');
});
test('scroll without args → scrolling down', () => {
expect(describeToolCall('Bash', { command: '$B scroll' })).toBe('Scrolling down');
});
// Reading commands
test('text → reading page text', () => {
expect(describeToolCall('Bash', { command: '$B text' })).toBe('Reading page text');
});
test('html with selector → reading HTML of element', () => {
expect(describeToolCall('Bash', { command: '$B html .header' })).toBe('Reading HTML of .header');
});
test('html without selector → reading full page HTML', () => {
expect(describeToolCall('Bash', { command: '$B html' })).toBe('Reading full page HTML');
});
test('links → finding all links', () => {
expect(describeToolCall('Bash', { command: '$B links' })).toBe('Finding all links on the page');
});
test('console → checking console', () => {
expect(describeToolCall('Bash', { command: '$B console' })).toBe('Checking browser console for errors');
});
// Inspector commands
test('inspect with selector → inspecting CSS', () => {
expect(describeToolCall('Bash', { command: '$B inspect .header' })).toBe('Inspecting CSS of .header');
});
test('inspect without args → getting last picked element', () => {
expect(describeToolCall('Bash', { command: '$B inspect' })).toBe('Getting CSS for last picked element');
});
test('style → changing CSS', () => {
expect(describeToolCall('Bash', { command: '$B style .header color red' })).toBe('Changing CSS: .header color red');
});
test('cleanup → removing page clutter', () => {
expect(describeToolCall('Bash', { command: '$B cleanup --all' })).toBe('Removing page clutter (ads, popups, banners)');
});
// Visual commands
test('screenshot → saving screenshot', () => {
expect(describeToolCall('Bash', { command: '$B screenshot /tmp/shot.png' })).toBe('Saving screenshot to /tmp/shot.png');
});
test('screenshot without path', () => {
expect(describeToolCall('Bash', { command: '$B screenshot' })).toBe('Saving screenshot');
});
test('responsive → multi-size screenshots', () => {
expect(describeToolCall('Bash', { command: '$B responsive' })).toBe('Taking screenshots at mobile, tablet, and desktop sizes');
});
// Non-browse tools
test('Read tool → reading file', () => {
expect(describeToolCall('Read', { file_path: '/Users/foo/project/src/app.ts' })).toBe('Reading ~/project/src/app.ts');
});
test('Grep tool → searching for pattern', () => {
expect(describeToolCall('Grep', { pattern: 'handleClick' })).toBe('Searching for "handleClick"');
});
test('Glob tool → finding files', () => {
expect(describeToolCall('Glob', { pattern: '**/*.tsx' })).toBe('Finding files matching **/*.tsx');
});
test('Edit tool → editing file', () => {
expect(describeToolCall('Edit', { file_path: '/Users/foo/src/main.ts' })).toBe('Editing ~/src/main.ts');
});
// Edge cases
test('null input → empty string', () => {
expect(describeToolCall('Bash', null)).toBe('');
});
test('unknown browse command → generic description', () => {
expect(describeToolCall('Bash', { command: '$B newtab https://foo.com' })).toContain('newtab');
});
test('non-browse bash → shortened command', () => {
expect(describeToolCall('Bash', { command: 'echo hello' })).toBe('echo hello');
});
test('full browse binary path recognized', () => {
const result = describeToolCall('Bash', { command: '/Users/garrytan/.claude/skills/gstack/browse/dist/browse goto https://example.com' });
expect(result).toBe('Opening https://example.com');
});
test('tab command → switching tab', () => {
expect(describeToolCall('Bash', { command: '$B tab 2' })).toContain('tab');
});
});
// ─── Per-tab agent concurrency (source code validation) ──────────
describe('per-tab agent concurrency', () => {
const serverSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'server.ts'), 'utf-8');
const agentSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'sidebar-agent.ts'), 'utf-8');
test('server has per-tab agent state map', () => {
expect(serverSrc).toContain('tabAgents');
expect(serverSrc).toContain('TabAgentState');
expect(serverSrc).toContain('getTabAgent');
});
test('server returns per-tab agent status in /sidebar-chat', () => {
expect(serverSrc).toContain('getTabAgentStatus');
expect(serverSrc).toContain('tabAgentStatus');
});
test('spawnClaude accepts forTabId parameter', () => {
const spawnFn = serverSrc.slice(
serverSrc.indexOf('function spawnClaude('),
serverSrc.indexOf('\nfunction ', serverSrc.indexOf('function spawnClaude(') + 1),
);
expect(spawnFn).toContain('forTabId');
expect(spawnFn).toContain('tabState.status');
});
test('sidebar-command endpoint uses per-tab agent state', () => {
expect(serverSrc).toContain('msgTabId');
expect(serverSrc).toContain('tabState.status');
expect(serverSrc).toContain('tabState.queue');
});
test('agent event handler resets per-tab state', () => {
expect(serverSrc).toContain('eventTabId');
expect(serverSrc).toContain('tabState.status = \'idle\'');
});
test('agent event handler processes per-tab queue', () => {
// After agent_done, should process next message from THIS tab's queue
expect(serverSrc).toContain('tabState.queue.length > 0');
expect(serverSrc).toContain('tabState.queue.shift');
});
test('sidebar-agent uses per-tab processing set', () => {
expect(agentSrc).toContain('processingTabs');
expect(agentSrc).not.toContain('isProcessing');
});
test('sidebar-agent sends tabId with all events', () => {
// sendEvent should accept tabId parameter
expect(agentSrc).toContain('async function sendEvent(event: Record<string, any>, tabId?: number)');
// askClaude destructures tabId from queue entry (regex tolerates
// additional fields like `canary` and `pageUrl` from security module).
expect(agentSrc).toMatch(
/const \{[^}]*\bprompt\b[^}]*\bargs\b[^}]*\bstateFile\b[^}]*\bcwd\b[^}]*\btabId\b[^}]*\}/
);
});
test('sidebar-agent allows concurrent agents across tabs', () => {
// poll() should not block globally — it should check per-tab
expect(agentSrc).toContain('processingTabs.has(tid)');
// askClaude should be fire-and-forget (no await blocking the loop)
expect(agentSrc).toContain('askClaude(entry).catch');
});
test('queue entries include tabId', () => {
const spawnFn = serverSrc.slice(
serverSrc.indexOf('function spawnClaude('),
serverSrc.indexOf('\nfunction ', serverSrc.indexOf('function spawnClaude(') + 1),
);
expect(spawnFn).toContain('tabId: agentTabId');
});
test('health check monitors all per-tab agents', () => {
expect(serverSrc).toContain('for (const [tid, state] of tabAgents)');
});
});
describe('BROWSE_TAB tab pinning (cross-tab isolation)', () => {
const serverSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'server.ts'), 'utf-8');
const agentSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'sidebar-agent.ts'), 'utf-8');
const cliSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'cli.ts'), 'utf-8');
test('sidebar-agent passes BROWSE_TAB env var to claude process', () => {
// The env block should include BROWSE_TAB set to the tab ID
expect(agentSrc).toContain('BROWSE_TAB');
expect(agentSrc).toContain('String(tid)');
});
test('CLI reads BROWSE_TAB and sends tabId in command body', () => {
// BROWSE_TAB env var is still honored (sidebar-agent path). After the
// make-pdf refactor, the CLI layer now also accepts --tab-id <N>, with
// the CLI flag taking precedence over the env var. Both resolve to the
// same `tabId` body field.
expect(cliSrc).toContain('process.env.BROWSE_TAB');
expect(cliSrc).toContain('parseInt(envTab, 10)');
});
test('handleCommandInternal accepts tabId from request body', () => {
const handleFn = serverSrc.slice(
serverSrc.indexOf('async function handleCommandInternal('),
serverSrc.indexOf('\n/** HTTP wrapper', serverSrc.indexOf('async function handleCommandInternal(') + 1) > 0
? serverSrc.indexOf('\n/** HTTP wrapper', serverSrc.indexOf('async function handleCommandInternal(') + 1)
: serverSrc.indexOf('\nasync function ', serverSrc.indexOf('async function handleCommandInternal(') + 200),
);
// Should destructure tabId from body
expect(handleFn).toContain('tabId');
// Should save and restore the active tab
expect(handleFn).toContain('savedTabId');
expect(handleFn).toContain('switchTab(tabId');
});
test('handleCommandInternal restores active tab after command (success path)', () => {
// On success, should restore savedTabId without stealing focus
const handleFn = serverSrc.slice(
serverSrc.indexOf('async function handleCommandInternal('),
serverSrc.length,
);
// Count restore calls — should appear in both success and error paths
const restoreCount = (handleFn.match(/switchTab\(savedTabId/g) || []).length;
expect(restoreCount).toBeGreaterThanOrEqual(2); // success + error paths
});
test('handleCommandInternal restores active tab on error path', () => {
// The catch block should also restore
const catchBlock = serverSrc.slice(
serverSrc.indexOf('} catch (err: any) {', serverSrc.indexOf('async function handleCommandInternal(')),
);
expect(catchBlock).toContain('switchTab(savedTabId');
});
test('tab pinning only activates when tabId is provided', () => {
const handleFn = serverSrc.slice(
serverSrc.indexOf('async function handleCommandInternal('),
serverSrc.indexOf('try {', serverSrc.indexOf('async function handleCommandInternal(') + 1),
);
// Should check tabId is not undefined/null before switching
expect(handleFn).toContain('tabId !== undefined');
expect(handleFn).toContain('tabId !== null');
});
test('CLI only sends tabId when it is a valid number', () => {
// Body should conditionally include tabId. Historically that was keyed off
// the BROWSE_TAB env var. After the make-pdf refactor, the CLI also honors
// a --tab-id <N> flag on the CLI itself, so the check is "tabId defined
// AND not NaN" rather than literally inspecting the env var.
expect(cliSrc).toContain('tabId !== undefined && !isNaN(tabId)');
});
});
+204 -81
View File
@@ -1,26 +1,15 @@
/**
* Regression: changing the default sidebar tab to Terminal must NOT break
* the existing Chat path or the debug-tab return-to logic.
* Regression: sidebar layout invariants after the chat-tab rip.
*
* Original /plan-eng-review Issue 3A asked for a Playwright + extension
* E2E test. The codebase doesn't ship Playwright extension launcher
* infrastructure (extension tests here are source-level), so this regression
* is implemented as a structural assertion suite over the extension files.
* That's enough to lock the load-bearing invariants:
* The Chrome side panel used to host two surfaces: Chat (one-shot
* `claude -p` queue) and Terminal (interactive PTY). Chat was ripped
* once the PTY proved out sidebar-agent.ts is gone, the chat queue
* endpoints are gone, and the primary-tab nav (Terminal | Chat) is
* gone. Terminal is now the sole primary surface.
*
* 1. Terminal is the default-active primary tab.
* 2. Chat exists as a non-active primary tab.
* 3. The xterm assets are loaded.
* 4. The debug-close path no longer hardcodes `tab-chat` (uses the
* activePrimaryPaneId helper that respects whichever primary tab
* the user has selected).
* 5. Manifest declares the ws://127.0.0.1 host permission so MV3
* doesn't block the WebSocket upgrade.
* 6. The chat surface (chat-messages, chat input wiring) still exists
* and was not accidentally deleted alongside the default-tab change.
*
* If a future refactor regresses any of these, this test fails BEFORE the
* change ships.
* This file locks the load-bearing invariants of that layout so a
* future refactor can't silently re-introduce the old surface or break
* the new one.
*/
import { describe, test, expect } from 'bun:test';
@@ -32,84 +21,220 @@ const JS = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel
const TERM_JS = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel-terminal.js'), 'utf-8');
const MANIFEST = JSON.parse(fs.readFileSync(path.join(import.meta.dir, '../../extension/manifest.json'), 'utf-8'));
describe('sidebar tabs regression: Terminal is default, Chat survives', () => {
test('primary tab bar declares Terminal and Chat with Terminal active', () => {
// Terminal is the active button.
expect(HTML).toMatch(/<button[^>]*class="primary-tab active"[^>]*data-pane="terminal"/);
// Chat is a primary tab, present and non-active.
expect(HTML).toMatch(/<button[^>]*class="primary-tab"[^>]*data-pane="chat"/);
describe('sidebar: chat tab + nav are removed, Terminal is sole primary surface', () => {
test('No primary-tab nav element exists', () => {
expect(HTML).not.toContain('class="primary-tabs"');
expect(HTML).not.toContain('data-pane="chat"');
expect(HTML).not.toContain('data-pane="terminal"');
});
test('Terminal pane is active and Chat pane is not active', () => {
// tab-terminal has the .active class on its <main>.
expect(HTML).toMatch(/<main id="tab-terminal" class="tab-content active"/);
// tab-chat is present but NOT active.
expect(HTML).toMatch(/<main id="tab-chat" class="tab-content"(?! active)/);
test('No <main id="tab-chat"> pane', () => {
expect(HTML).not.toMatch(/<main[^>]*id="tab-chat"/);
expect(HTML).not.toContain('id="chat-messages"');
expect(HTML).not.toContain('id="chat-loading"');
expect(HTML).not.toContain('id="chat-welcome"');
});
test('xterm assets are loaded for the Terminal pane', () => {
expect(HTML).toContain('lib/xterm.css');
expect(HTML).toContain('lib/xterm.js');
expect(HTML).toContain('lib/xterm-addon-fit.js');
expect(HTML).toContain('sidepanel-terminal.js');
test('No chat input / send button / experimental banner', () => {
expect(HTML).not.toContain('class="command-bar"');
expect(HTML).not.toContain('id="command-input"');
expect(HTML).not.toContain('id="send-btn"');
expect(HTML).not.toContain('id="stop-agent-btn"');
expect(HTML).not.toContain('id="experimental-banner"');
});
test('chat surface still exists (no accidental deletion)', () => {
// The chat input and chat-messages containers are load-bearing for the
// existing sidebar-agent flow. If the default-tab change accidentally
// removed them, this catches it before users do.
expect(HTML).toContain('id="chat-messages"');
expect(HTML).toContain('id="chat-loading"');
test('No clear-chat button in footer', () => {
expect(HTML).not.toContain('id="clear-chat"');
});
test('debug-close path no longer hardcodes tab-chat', () => {
// Before the Terminal default flip, sidepanel.js had two literal
// `getElementById('tab-chat').classList.add('active')` calls inside the
// debug-close handlers. Both must now go through activePrimaryPaneId()
// so closing debug returns to whichever primary tab is selected.
expect(JS).toContain('function activePrimaryPaneId');
// Old hardcoded form is gone (don't ban the string everywhere — there
// are legitimate references elsewhere in the file).
const debugToggleBlock = JS.slice(
JS.indexOf("debugToggle.addEventListener('click'"),
JS.indexOf("closeDebug.addEventListener('click'"),
);
expect(debugToggleBlock).not.toContain("'tab-chat'");
expect(debugToggleBlock).toContain('activePrimaryPaneId');
test('Terminal pane is .active by default and has the toolbar', () => {
expect(HTML).toMatch(/<main[^>]*id="tab-terminal"[^>]*class="tab-content active"/);
expect(HTML).toContain('id="terminal-toolbar"');
expect(HTML).toContain('id="terminal-restart-now"');
});
test('primary-tab click handler exists and toggles classes', () => {
expect(JS).toContain("querySelectorAll('.primary-tab')");
expect(JS).toContain('aria-selected');
test('Quick-actions buttons (Cleanup / Screenshot / Cookies) survive in the terminal toolbar', () => {
// Garry explicitly wanted these kept after the chat rip — they drive
// browser actions, not chat.
expect(HTML).toContain('id="chat-cleanup-btn"');
expect(HTML).toContain('id="chat-screenshot-btn"');
expect(HTML).toContain('id="chat-cookies-btn"');
// They live inside the terminal toolbar now (siblings of the Restart
// button), not as a separate strip below all panes.
const toolbarStart = HTML.indexOf('id="terminal-toolbar"');
const toolbarEnd = HTML.indexOf('</div>', toolbarStart);
const toolbarBlock = HTML.slice(toolbarStart, toolbarEnd + 6);
expect(toolbarBlock).toContain('id="chat-cleanup-btn"');
expect(toolbarBlock).toContain('id="chat-screenshot-btn"');
expect(toolbarBlock).toContain('id="chat-cookies-btn"');
});
});
describe('sidebar terminal: lazy spawn + auth chain', () => {
test('terminal JS waits for first key to start (lazy-spawn)', () => {
expect(TERM_JS).toContain('function onAnyKey');
expect(TERM_JS).toContain('terminalActive');
expect(TERM_JS).toContain('connect()');
describe('sidepanel.js: chat helpers ripped, terminal-injection helper survives', () => {
test('No primary-tab click handler', () => {
expect(JS).not.toContain("querySelectorAll('.primary-tab')");
expect(JS).not.toContain('activePrimaryPaneId');
});
test('terminal JS does NOT auto-reconnect on close (codex finding #8)', () => {
// Close handler transitions to ENDED and shows a restart button,
// not a reconnect timer.
const closeBlock = TERM_JS.slice(TERM_JS.indexOf("addEventListener('close'"));
expect(closeBlock).toContain('ENDED');
// Forbid bare setTimeout(...connect... patterns inside this file's
// close handler — would indicate auto-reconnect crept back in.
expect(TERM_JS).not.toMatch(/close[\s\S]{0,200}setTimeout\([^)]*connect/);
test('No chat polling, sendMessage, sendChat, stopAgent, or pollTabs', () => {
expect(JS).not.toContain('chatPollInterval');
expect(JS).not.toContain('function sendMessage');
expect(JS).not.toContain('function pollChat');
expect(JS).not.toContain('function pollTabs');
expect(JS).not.toContain('function switchChatTab');
expect(JS).not.toContain('function stopAgent');
expect(JS).not.toContain('function applyChatEnabled');
expect(JS).not.toContain('function showSecurityBanner');
});
test('terminal JS reaches /pty-session with the bootstrap auth token', () => {
expect(TERM_JS).toContain('/pty-session');
expect(TERM_JS).toContain('Bearer ${token}');
expect(TERM_JS).toContain('credentials');
test('Cleanup runs through the live PTY (no /sidebar-command POST)', () => {
// The new Cleanup handler injects the prompt straight into claude's
// PTY via gstackInjectToTerminal. The dead code path was a POST to
// /sidebar-command which kicked off a fresh claude -p subprocess.
const cleanup = JS.slice(JS.indexOf('async function runCleanup'));
expect(cleanup).toContain('window.gstackInjectToTerminal');
expect(cleanup).not.toContain('/sidebar-command');
expect(cleanup).not.toContain('addChatEntry');
});
test('terminal JS opens ws://127.0.0.1 (not wss)', () => {
expect(TERM_JS).toContain('new WebSocket(`ws://127.0.0.1:');
// Origin is implicit (browser sets chrome-extension://<id>); no manual override.
test('Inspector "Send to Code" routes through the live PTY', () => {
const sendBtn = JS.slice(JS.indexOf('inspectorSendBtn.addEventListener'));
expect(sendBtn).toContain('window.gstackInjectToTerminal');
expect(sendBtn).not.toContain("type: 'sidebar-command'");
});
test('updateConnection no longer kicks off chat / tab polling', () => {
const update = JS.slice(JS.indexOf('function updateConnection'), JS.indexOf('function updateConnection') + 1500);
expect(update).not.toContain('chatPollInterval');
expect(update).not.toContain('tabPollInterval');
expect(update).not.toContain('pollChat');
expect(update).not.toContain('pollTabs');
// BUT must still expose the bootstrap globals for sidepanel-terminal.js.
expect(update).toContain('window.gstackServerPort');
expect(update).toContain('window.gstackAuthToken');
});
});
describe('sidepanel-terminal.js: eager auto-connect + injection API', () => {
test('Exposes window.gstackInjectToTerminal for cross-pane use', () => {
expect(TERM_JS).toContain('window.gstackInjectToTerminal');
// Returns false when no live session, true when bytes go out.
const inject = TERM_JS.slice(TERM_JS.indexOf('window.gstackInjectToTerminal'));
expect(inject).toContain('return false');
expect(inject).toContain('return true');
expect(inject).toContain('ws.readyState !== WebSocket.OPEN');
});
test('Auto-connects on init (no keypress required)', () => {
expect(TERM_JS).not.toContain('function onAnyKey');
expect(TERM_JS).not.toContain("addEventListener('keydown'");
expect(TERM_JS).toContain('function tryAutoConnect');
});
test('Repaint hook fires when Terminal pane becomes visible', () => {
// The chat-tab rip removed gstack:primary-tab-changed; we use a
// MutationObserver on #tab-terminal's class attr instead. The
// observer must call repaintIfLive when the .active class returns.
expect(TERM_JS).toContain('MutationObserver');
expect(TERM_JS).toContain("attributeFilter: ['class']");
expect(TERM_JS).toContain('repaintIfLive');
const repaint = TERM_JS.slice(TERM_JS.indexOf('function repaintIfLive'));
expect(repaint).toContain('fitAddon && fitAddon.fit()');
expect(repaint).toContain('term.refresh');
expect(repaint).toContain("type: 'resize'");
});
test('No auto-reconnect on close (Restart is user-initiated)', () => {
const closeOnly = TERM_JS.slice(
TERM_JS.indexOf("ws.addEventListener('close'"),
TERM_JS.indexOf("ws.addEventListener('error'"),
);
expect(closeOnly).not.toContain('setTimeout');
expect(closeOnly).not.toContain('tryAutoConnect');
expect(closeOnly).not.toContain('connect()');
});
test('forceRestart helper closes ws, disposes xterm, returns to IDLE', () => {
expect(TERM_JS).toContain('function forceRestart');
const fn = TERM_JS.slice(TERM_JS.indexOf('function forceRestart'));
expect(fn).toContain('ws && ws.close()');
expect(fn).toContain('term.dispose()');
expect(fn).toContain('STATE.IDLE');
expect(fn).toContain('tryAutoConnect()');
});
test('Both restart buttons (mid-session and ENDED) call forceRestart', () => {
expect(TERM_JS).toContain("els.restart?.addEventListener('click', forceRestart)");
expect(TERM_JS).toContain("els.restartNow?.addEventListener('click', forceRestart)");
});
});
describe('server.ts: chat / sidebar-agent endpoints are gone', () => {
const SERVER_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/server.ts'), 'utf-8');
test('No /sidebar-command, /sidebar-chat, /sidebar-agent/* routes', () => {
expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-command['"]/);
expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-chat['"]/);
expect(SERVER_SRC).not.toMatch(/url\.pathname\.startsWith\(['"]\/sidebar-agent\//);
expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-agent\/event['"]/);
expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-tabs['"]/);
expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-session['"]/);
});
test('No chat-related state declarations or helpers', () => {
// Allow the symbol names inside the rip-marker comments — but no
// `let`, `const`, `function`, or `interface` declarations of them.
expect(SERVER_SRC).not.toMatch(/^let agentProcess/m);
expect(SERVER_SRC).not.toMatch(/^let agentStatus/m);
expect(SERVER_SRC).not.toMatch(/^let messageQueue/m);
expect(SERVER_SRC).not.toMatch(/^let sidebarSession/m);
expect(SERVER_SRC).not.toMatch(/^const tabAgents/m);
expect(SERVER_SRC).not.toMatch(/^function pickSidebarModel/m);
expect(SERVER_SRC).not.toMatch(/^function processAgentEvent/m);
expect(SERVER_SRC).not.toMatch(/^function killAgent/m);
expect(SERVER_SRC).not.toMatch(/^function addChatEntry/m);
expect(SERVER_SRC).not.toMatch(/^interface ChatEntry/m);
expect(SERVER_SRC).not.toMatch(/^interface SidebarSession/m);
});
test('/health no longer surfaces agentStatus or messageQueue length', () => {
const health = SERVER_SRC.slice(SERVER_SRC.indexOf("url.pathname === '/health'"));
const slice = health.slice(0, 2000);
expect(slice).not.toContain('agentStatus');
expect(slice).not.toContain('messageQueue');
expect(slice).not.toContain('agentStartTime');
// chatEnabled is hardcoded false now (older clients still see the field).
expect(slice).toMatch(/chatEnabled:\s*false/);
// terminalPort survives.
expect(slice).toContain('terminalPort');
});
});
describe('cli.ts: sidebar-agent is no longer spawned', () => {
const CLI_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/cli.ts'), 'utf-8');
test('No Bun.spawn of sidebar-agent.ts', () => {
expect(CLI_SRC).not.toMatch(/Bun\.spawn\(\s*\['bun',\s*'run',\s*\w*[Aa]gent[Ss]cript\][\s\S]{0,300}sidebar-agent/);
// The variable name `agentScript` was for sidebar-agent. After the
// rip there's only termAgentScript. Allow comments to mention the
// history but not active spawn calls.
expect(CLI_SRC).not.toMatch(/^\s*let agentScript = path\.resolve/m);
});
test('Terminal-agent spawn survives', () => {
expect(CLI_SRC).toContain('terminal-agent.ts');
expect(CLI_SRC).toMatch(/Bun\.spawn\(\['bun',\s*'run',\s*termAgentScript\]/);
});
});
describe('files: sidebar-agent.ts and its tests are deleted', () => {
test('browse/src/sidebar-agent.ts is gone', () => {
expect(fs.existsSync(path.join(import.meta.dir, '../src/sidebar-agent.ts'))).toBe(false);
});
test('sidebar-agent test files are gone', () => {
expect(fs.existsSync(path.join(import.meta.dir, 'sidebar-agent.test.ts'))).toBe(false);
expect(fs.existsSync(path.join(import.meta.dir, 'sidebar-agent-roundtrip.test.ts'))).toBe(false);
});
});
@@ -123,8 +248,6 @@ describe('manifest: ws permission + xterm-safe CSP', () => {
});
test('manifest does NOT add unsafe-eval to extension_pages CSP', () => {
// xterm@5 is eval-free (verified at vendor time). If a future xterm
// upgrade requires unsafe-eval, this test fires and forces a decision.
const csp = MANIFEST.content_security_policy;
if (csp && csp.extension_pages) {
expect(csp.extension_pages).not.toContain('unsafe-eval');
+60 -1
View File
@@ -127,7 +127,7 @@ describe('terminal-agent: /ws gates', () => {
});
});
describe('terminal-agent: PTY round-trip via real WebSocket', () => {
describe('terminal-agent: PTY round-trip via real WebSocket (Cookie auth)', () => {
test('binary writes go to PTY stdin, output streams back', async () => {
const cookie = 'rt-token-must-be-at-least-seventeen-chars-long';
const granted = await grantToken(cookie);
@@ -182,6 +182,65 @@ describe('terminal-agent: PTY round-trip via real WebSocket', () => {
await Bun.sleep(200);
});
test('Sec-WebSocket-Protocol auth path: browser-style upgrade with token in protocol', async () => {
// This is the path the actual browser extension takes. Cross-port
// SameSite=Strict cookies don't reliably survive the jump from the
// browse server (port A) to the agent (port B) when initiated from a
// chrome-extension origin, so we send the token via the only auth
// header the browser WebSocket API lets us set: Sec-WebSocket-Protocol.
//
// The browser sends `gstack-pty.<token>` and the agent must:
// 1) strip the gstack-pty. prefix
// 2) validate the token
// 3) ECHO the protocol back in the upgrade response
// Without (3) the browser closes the connection immediately, which
// is the exact bug the original cookie-only implementation hit in
// manual dogfood. This test catches that regression in CI.
const token = 'sec-protocol-token-must-be-at-least-seventeen-chars';
await grantToken(token);
// We exercise the protocol path by raw-handshaking via fetch+Upgrade,
// because Bun's test-client WebSocket constructor doesn't propagate
// `protocols` cleanly when also passed `headers` (the constructor
// detects the third-arg form unreliably). Real browsers (Chromium)
// use the standard protocols arg fine — the server-side handler is
// identical either way, so this test still locks the load-bearing
// invariant: the agent accepts a token via Sec-WebSocket-Protocol
// and echoes the protocol back so a browser would accept the upgrade.
const handshakeKey = 'dGhlIHNhbXBsZSBub25jZQ==';
const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
headers: {
'Connection': 'Upgrade',
'Upgrade': 'websocket',
'Sec-WebSocket-Version': '13',
'Sec-WebSocket-Key': handshakeKey,
'Sec-WebSocket-Protocol': `gstack-pty.${token}`,
'Origin': 'chrome-extension://test-extension-id',
},
});
// 101 Switching Protocols + protocol echoed back = browser would accept.
// 401/403/anything else = browser would close the connection immediately
// (the bug we hit in manual dogfood).
expect(resp.status).toBe(101);
expect(resp.headers.get('upgrade')?.toLowerCase()).toBe('websocket');
expect(resp.headers.get('sec-websocket-protocol')).toBe(`gstack-pty.${token}`);
});
test('Sec-WebSocket-Protocol auth: rejects unknown token even with valid Origin', async () => {
const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
headers: {
'Connection': 'Upgrade',
'Upgrade': 'websocket',
'Sec-WebSocket-Version': '13',
'Sec-WebSocket-Key': 'dGhlIHNhbXBsZSBub25jZQ==',
'Sec-WebSocket-Protocol': 'gstack-pty.never-granted-token',
'Origin': 'chrome-extension://test-extension-id',
},
});
expect(resp.status).toBe(401);
});
test('text frame {type:"resize"} is accepted (no crash, ws stays open)', async () => {
const cookie = 'resize-token-must-be-at-least-seventeen-chars';
await grantToken(cookie);
+23 -4
View File
@@ -122,12 +122,26 @@ describe('Source-level guard: terminal-agent', () => {
expect(wsHandler).toContain('forbidden origin');
});
test('validates gstack_pty cookie against an in-memory token set', () => {
test('validates the session token against an in-memory token set', () => {
const wsHandler = AGENT_SRC.slice(AGENT_SRC.indexOf("if (url.pathname === '/ws')"));
// Two transports: Sec-WebSocket-Protocol (preferred for browsers) and
// Cookie gstack_pty (fallback). Both verify against validTokens.
expect(wsHandler).toContain('sec-websocket-protocol');
expect(wsHandler).toContain('gstack_pty');
expect(wsHandler).toContain('validTokens.has');
});
test('Sec-WebSocket-Protocol auth: strips gstack-pty. prefix and echoes back', () => {
const wsHandler = AGENT_SRC.slice(AGENT_SRC.indexOf("if (url.pathname === '/ws')"));
// Browsers send `Sec-WebSocket-Protocol: gstack-pty.<token>`. The agent
// must strip the prefix before checking validTokens, AND echo the
// protocol back in the upgrade response — without the echo, the
// browser closes the connection immediately.
expect(wsHandler).toContain("'gstack-pty.'");
expect(wsHandler).toContain('Sec-WebSocket-Protocol');
expect(wsHandler).toContain('acceptedProtocol');
});
test('lazy spawn: claude PTY is spawned in message handler, not on upgrade', () => {
// The whole point of lazy-spawn (codex finding #8) is that the WS
// upgrade itself does NOT call spawnClaude. Spawn happens on first
@@ -158,14 +172,19 @@ describe('Source-level guard: terminal-agent', () => {
});
describe('Source-level guard: server.ts /pty-session route', () => {
test('validates AUTH_TOKEN and uses cookie-based grant', () => {
test('validates AUTH_TOKEN, grants over loopback, returns token + Set-Cookie', () => {
const route = SERVER_SRC.slice(SERVER_SRC.indexOf("url.pathname === '/pty-session'"));
// Must check auth before minting.
const beforeMint = route.slice(0, route.indexOf('mintPtySessionToken'));
expect(beforeMint).toContain('validateAuth');
// Must call the loopback grant before responding.
// Must call the loopback grant before responding (otherwise the
// agent's validTokens Set never sees the token and /ws would 401).
expect(route).toContain('grantPtyToken');
// Must Set-Cookie with the minted token.
// Must return the token in the JSON body for the
// Sec-WebSocket-Protocol auth path (cross-port cookies don't survive
// SameSite=Strict from a chrome-extension origin).
expect(route).toContain('ptySessionToken');
// Set-Cookie is kept as a fallback for non-browser callers.
expect(route).toContain('Set-Cookie');
expect(route).toContain('buildPtySetCookie');
});