diff --git a/CHANGELOG.md b/CHANGELOG.md index 1e328e0ed..cc8b86170 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,64 @@ # Changelog +## [1.44.0.0] - 2026-05-23 + +## **Sidebar Claude Code now survives the day.** WebSocket keepalive, transparent re-attach across network blips with scrollback intact, and a restart button that actually kills the old claude before spawning the new one. Outer supervisor opt-in so the browse server itself can crash and recover without you noticing. + +The sidebar's embedded `claude` PTY used to stop connecting after a while, and the Restart button only closed the client-side WebSocket without killing the running process. Closing the browser left zombie claude processes alive for minutes. None of that any more. + +Five compounding timeouts on the v1.43 path: PTY session token TTL was 30 minutes with no refresh, no WebSocket keepalive (NAT idle timeouts of 30-60s silently dropped connections), the server's 30-minute idle timeout didn't account for active PTY sessions, the sidebar gave up after 15 seconds on cold start, and there was no auto-reconnect after a WS close. On top of that, a `pkill -f terminal-agent\.ts` regex in the agent teardown matched sibling gstack sessions on the same host. All six fixed. + +### The numbers that matter + +Source: 13 bisect commits on this branch (`git log v1.43.3.0..HEAD --oneline`), 12 new test files, 83 new unit-tier static-grep + behavioral tests; full `bun test` suite green. Live re-attach behavior runs against `GSTACK_PTY_DETACH_WINDOW_MS=1000` so the 60s detach window verifies in <2s of CI time. + +| Surface | Before | After | +|---|---|---| +| Sidebar idle 5 min, then keystroke | Reconnect spinner, ~3s to first byte | Immediate output — WS keepalive kept the socket alive | +| Wifi blip mid-session | "Session ended" — click Restart, lose scrollback | Silent re-attach within 8s, scrollback intact, keep typing | +| Click Restart with claude mid-task | Old claude killed asynchronously; race window with new spawn; user must type a key to see new prompt | Server disposes old PTY synchronously, mints new lease in one transaction, eager `{type:"start"}` boots claude before the prompt renders | +| Close sidebar / quit browser | Zombie claude lingers for 60s (detach window) | `pagehide` sendBeacon `/pty-dispose` cleans up immediately | +| Terminal-agent dies (OOM, signal) | Sidebar shows broken connection until manual reload | 60s watchdog with PID-liveness check (no split-brain) respawns agent automatically; 3-in-60s crash-loop guard | +| Browse server itself crashes | Headed browser orphaned, manual `$B connect` re-run | Opt-in `$B connect --supervise` keeps CLI attached, respawns server with 1s/2s/4s/8s/30s backoff, 5-in-5min guard | +| Two `$B connect` sessions on same host | Restart in session A kills session B's terminal-agent via `pkill -f` regex match | Identity-based `process.kill(pid)` against a per-boot agent record. Static-grep test fails CI if `pkill -f terminal-agent` reappears anywhere in source | +| PTY token leakage in logs / DevTools | Bearer token doubled as session identifier (codex outside-voice flagged it) | Stable non-secret `sessionId` separated from short-lived `attachToken`; lease lifecycle owns session liveness | + +### What this means for you + +Open the sidebar once. Use it. Close your laptop. Wake up tomorrow. Type a key. It just works. The whole "the terminal stopped working, let me reload the sidebar" reflex you've built up over the v1.40s — you don't need it any more. + +### Itemized changes + +#### Added + +- **Long-lived PTY connection (`browse/src/terminal-agent.ts`, `extension/sidepanel-terminal.js`)** — 25s WebSocket keepalive ping/pong cycle from both sides. NAT idle drops and Chrome MV3 panel-suspend cycles no longer silently kill the socket. Env-overridable via `GSTACK_PTY_KEEPALIVE_INTERVAL_MS`. +- **Session lease + attachToken model (`browse/src/pty-session-lease.ts`)** — Stable non-secret `sessionId` separated from short-lived secret `attachToken`. Re-attach within the lease window refreshes a fresh `attachToken` bound to the same `sessionId`; session identity stays loggable, bearer credential stays out of logs. +- **Scrollback replay on re-attach (`browse/src/terminal-agent.ts`)** — 1 MB frame-based ring buffer per session with ESC-boundary scan and alt-screen tracking (`CSI ?1049h/l`). On re-attach, client writes RIS (`\x1bc`) to xterm, server prepends DECSTR soft reset + optional alt-screen re-enter + ring buffer. Replay renders cleanly even mid-tool-call. Env-overridable via `GSTACK_PTY_RING_BUFFER_BYTES`. +- **60s detach window with re-attach (`browse/src/terminal-agent.ts`)** — WS close with any code other than 4001 (intentional), 4404 (no-claude), or 1000 (clean exit) keeps the PTY alive for 60s. New WS upgrade matching the same sessionId resumes the same `claude` process. Env-overridable via `GSTACK_PTY_DETACH_WINDOW_MS`. +- **Working Restart button (`browse/src/server.ts`, `extension/sidepanel-terminal.js`)** — `POST /pty-restart` is one transaction: dispose old session scope-to-sessionId, revoke old lease, mint fresh sessionId + lease + attachToken, return the 4-tuple. Client sends `{type:"start"}` immediately on the new WS for eager spawn — no keystroke required. +- **Explicit dispose on sidebar close (`extension/sidepanel.js`)** — `pagehide` handler fires `navigator.sendBeacon('/pty-dispose', {sessionId, authToken})` so browser quit / panel close / extension reload disposes the session immediately. Server route accepts auth token in the body (sendBeacon-compatible — no custom headers). +- **PID-identity terminal-agent kill (`browse/src/terminal-agent-control.ts`)** — Replaces `pkill -f terminal-agent\.ts` regex teardown. Agent writes `/terminal-agent-pid` (JSON `{pid, gen, startedAt}`) at boot; `cli.ts` and `server.ts` use `killAgentByRecord` instead. Static-grep tripwire test fails CI if the regex pattern returns to source. +- **Terminal-agent watchdog (`browse/src/server.ts`)** — 60s ticker checks recorded agent PID via `process.kill(pid, 0)`. Respawns on dead PID via shared `spawnTerminalAgent` helper. 3-in-60s crash-loop guard with rolling window. Slow-but-alive agents intentionally fall through (split-brain defense). Env-overridable via `GSTACK_AGENT_WATCHDOG_TICK_MS`. +- **Outer browse-server supervisor (`browse/src/cli.ts`)** — `$B connect --supervise` (or `BROWSE_SUPERVISE=1`) keeps the CLI attached, polls server PID every 30s, respawns on unexpected exit with 1s/2s/4s/8s/30s backoff. SIGINT/SIGTERM cleanly teardown the supervised server. Opt-in — default `$B connect` behavior unchanged for every existing caller. +- **Patient `tryAutoConnect` (`extension/sidepanel-terminal.js`)** — Replaces the 15s give-up with indefinite 2s polling. Ascending status messages at 15s / 60s / 5min so the user knows we're still trying. Sticky-abort only on 401 (auth invalid), cleared by explicit Restart click. +- **`/internal/healthz` route + `internalHandler` helper (`browse/src/terminal-agent.ts`)** — Liveness probe used by the watchdog (returns pid/gen/sessions count, doesn't touch claude binary lookup). Helper collapses four `/internal/*` routes' bearer-auth + X-Browse-Gen check + JSON parse into one-liner calls. + +#### Changed + +- **`/pty-session` response shape (`browse/src/server.ts`)** — Now returns `{terminalPort, sessionId, attachToken, leaseExpiresAt}`. Legacy `ptySessionToken` + `expiresAt` aliases preserved for one minor release. +- **`ServerConfig.ownsTerminalAgent` teardown** — Now runs four side effects (was three): identity-based kill via `killAgentByRecord`, plus unlinks for `terminal-port`, `terminal-internal-token`, and the new `terminal-agent-pid`. Documented in CLAUDE.md. + +#### Fixed + +- **Sibling gstack sessions killed by `pkill -f terminal-agent\.ts`** — Pre-v1.44 the teardown matched argv regex; any process whose command line contained `terminal-agent.ts` got SIGTERM'd. Closes the TODOS.md P3 item filed during v1.41 (`Identity-based terminal-agent kill`). +- **Seven pre-existing test failures unrelated to this branch** — Three env-pollution failures (Bun's `Bun.which('bash')` returning null and `Bun.spawn(['bun', ...])` ENOENT after a sibling test mutated `process.env.PATH`), two stale-marker failures in `server-auth.test.ts` (`'Sidebar agent started'` → `'Terminal agent started'`), `setup-codesign.test.ts` looking for the unwrapped `bun run build` string (now `bun_cmd run build`), and `upgrade-migration-v1.test.ts` reading the developer's real config because it didn't override `HOME`. Fixed via a narrow global `test-setup.ts` (restores PATH only after every test) plus targeted marker + env-passing fixes. + +#### For contributors + +- **Test framework `bunfig.toml` + `test-setup.ts`** — Global afterEach restores `process.env.PATH` only. Narrow on purpose — broader snapshot/restore breaks tests that legitimately set `process.env.GSTACK_HOME` at module load (`domain-skills-storage.test.ts`). +- **12 new test files, 83 new unit-tier tests.** Static-grep tripwires defend the load-bearing protocol contracts (close codes, lease lifecycle, watchdog identity check, supervisor crash-loop guard, ring buffer ESC boundaries) without paying for live WebSocket cycles in CI. +- **Eng review + outside voice (codex) ran on this branch.** 17 decisions baked: 10 from the in-review architecture pass (D1-D10), 6 from codex cross-model tension resolution (T1-T6, all adopted in codex's favor — most consequential was T1, separating sessionId from auth token), and 1 from in-PR scope-up of the outer supervisor. + ## [1.43.3.0] - 2026-05-21 ## **Headed Chromium embedded by external supervisors stops auto-shutting-down after 30 minutes of HTTP idle.** diff --git a/CLAUDE.md b/CLAUDE.md index 305c60c02..b34f038f9 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -227,19 +227,23 @@ Activity / Refs / Inspector as debug overlays behind the footer's flow, dual-token model, and threat-model boundary — silent failures here usually trace to not understanding the cross-component flow. -**Embedder terminal-agent ownership** (v1.42.1.0+). `buildFetchHandler` -in `browse/src/server.ts` accepts `ServerConfig.ownsTerminalAgent?: -boolean` (default `true`). When `true`, factory shutdown runs the full -teardown: `pkill -f terminal-agent\.ts` plus `safeUnlinkQuiet` on -`/terminal-port` and `/terminal-internal-token`. -Embedders (e.g. the gbrowser phoenix overlay) that pre-launch their -own PTY server must pass `false` so their discovery files survive -gstack teardown cycles. The flag is the third caller-owned teardown -gate in `ServerConfig` (alongside `xvfb?` and `proxyBridge?`); polarity -is inverted (explicit bool vs presence) and documented in the field's -JSDoc. CLI `start()` always passes `true` explicitly — the static-grep -test in `browse/test/server-embedder-terminal-port.test.ts` fails CI -if a refactor drops it. +**Embedder terminal-agent ownership** (v1.42.1.0+, identity-based kill v1.44.0.0+). +`buildFetchHandler` in `browse/src/server.ts` accepts `ServerConfig.ownsTerminalAgent?: +boolean` (default `true`). When `true`, factory shutdown runs the full teardown: +identity-based kill via `killAgentByRecord(readAgentRecord(stateDir))` from +`browse/src/terminal-agent-control.ts` plus `safeUnlinkQuiet` on +`/terminal-port`, `/terminal-internal-token`, and +`/terminal-agent-pid` (the per-boot agent record introduced in v1.44). +Embedders (e.g. the gbrowser phoenix overlay) that pre-launch their own PTY +server must pass `false` so their discovery files survive gstack teardown cycles. +The flag is the third caller-owned teardown gate in `ServerConfig` (alongside +`xvfb?` and `proxyBridge?`); polarity is inverted (explicit bool vs presence) and +documented in the field's JSDoc. CLI `start()` always passes `true` explicitly — +the static-grep test in `browse/test/server-embedder-terminal-port.test.ts` fails +CI if a refactor drops it. Pre-v1.44 used `pkill -f terminal-agent\.ts` (regex +match) which would kill sibling gstack sessions on the same host; the new +`browse/test/terminal-agent-pid-identity.test.ts` static-grep tripwire fails CI +if any source file re-introduces `pkill ... terminal-agent` or `spawnSync('pkill', ...)`. **WebSocket auth uses Sec-WebSocket-Protocol, not cookies.** Browsers can't set `Authorization` on a WebSocket upgrade, but they CAN set diff --git a/TODOS.md b/TODOS.md index 01fdc1c85..fbd645e60 100644 --- a/TODOS.md +++ b/TODOS.md @@ -2,34 +2,17 @@ ## browse server: terminal-agent teardown follow-ups (filed v1.41 via /plan-eng-review) -### P3: Identity-based terminal-agent kill (replace pkill regex with PID) +### ✅ DONE (v1.44.0.0): Identity-based terminal-agent kill (replace pkill regex with PID) -**What:** Record the spawned terminal-agent PID at `browse/src/cli.ts:1057` and -replace `pkill -f terminal-agent\.ts` at both `cli.ts:1047` and -`server.ts:1281` (now inside the `if (ownsTerminalAgent)` gate) with -`process.kill(pid, signal)` against the recorded PID. - -**Why:** `pkill -f terminal-agent\.ts` matches by command-line regex, so today -it can kill ANY process whose argv contains `terminal-agent.ts` — sibling -gstack sessions, editor processes that have the file open, a second gstack -run on the same host. Latent footgun for the CLI path, not just embedders. - -**Pros:** Removes a real cross-session foot-cannon. PID-based kill is the -correct identity primitive. Lets us tighten `pkill -f`'s broad-match warning -in the new `ownsTerminalAgent` JSDoc to "historical" rather than "current". - -**Cons:** Requires threading the PID through the CLI-to-server state path -(currently the parent server reads `terminal-port` to discover the agent; it -would also need `terminal-agent-pid`). Touches `cli.ts`, `server.ts`, and -`terminal-agent.ts` together — bigger surface than the v1.41 fix. - -**Context:** Surfaced by both Codex and Claude subagent during /autoplan -review of the `ownsTerminalAgent` gate. Currently documented as out-of-scope -in `browse/src/server.ts` JSDoc for `ServerConfig.ownsTerminalAgent`. The -embedder fix (ownsTerminalAgent: false) means embedders don't hit this; CLI -users still do. - -**Depends on:** None. +**Resolved:** Bundled into the v1.44.0.0 long-lived-sidebar PR as Commit 0. +`browse/src/terminal-agent-control.ts` is the new home for `readAgentRecord`, +`writeAgentRecord`, `clearAgentRecord`, and `killAgentByRecord`. The agent +writes `/terminal-agent-pid` (JSON `{pid, gen, startedAt}`) at boot +and clears it on SIGTERM/SIGINT. `cli.ts` and `server.ts` both route through +`killAgentByRecord` instead of `pkill -f terminal-agent\.ts`. The new +`browse/test/terminal-agent-pid-identity.test.ts` is the static-grep tripwire +that fails CI if `pkill ... terminal-agent` or `spawnSync('pkill', ...)` +reappears in any source file. --- diff --git a/VERSION b/VERSION index f2c85ebe2..5b692d29a 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -1.43.3.0 +1.44.0.0 diff --git a/browse/src/cli.ts b/browse/src/cli.ts index 86c9273c9..4df950190 100644 --- a/browse/src/cli.ts +++ b/browse/src/cli.ts @@ -17,6 +17,7 @@ import { writeSecureFile, mkdirSecure } from './file-permissions'; import { resolveConfig, ensureStateDir, readVersionHash } from './config'; import { parseProxyConfig, computeConfigHash, ProxyConfigError } from './proxy-config'; import { redactProxyUrl } from './proxy-redact'; +import { spawnTerminalAgent } from './terminal-agent-control'; const config = resolveConfig(); const IS_WINDOWS = process.platform === 'win32'; @@ -1032,32 +1033,18 @@ Refs: After 'snapshot', use @e1, @e2... as selectors: // claude -p subprocesses to multiplex. // Auto-start terminal agent (non-compiled bun process). Owns the PTY - // WebSocket for the sidebar Terminal pane. - let termAgentScript = path.resolve(__dirname, 'terminal-agent.ts'); - if (!fs.existsSync(termAgentScript)) { - termAgentScript = path.resolve(path.dirname(process.execPath), '..', 'src', 'terminal-agent.ts'); - } + // WebSocket for the sidebar Terminal pane. Routes through the shared + // spawnTerminalAgent helper so the CLI cold-start path and the + // server.ts watchdog respawn path share one implementation. The + // helper handles prior-PID cleanup, script lookup, and env wiring. try { - if (fs.existsSync(termAgentScript)) { - // Kill old terminal-agents so a stale port file can't trick the - // server into routing /pty-session at a dead listener. - try { - const { spawnSync } = require('child_process'); - spawnSync('pkill', ['-f', 'terminal-agent\\.ts'], { stdio: 'ignore', timeout: 3000 }); - } catch (err: any) { - if (err?.code !== 'ENOENT') throw err; - } - const termProc = Bun.spawn(['bun', 'run', termAgentScript], { - cwd: config.projectDir, - env: { - ...process.env, - BROWSE_STATE_FILE: config.stateFile, - BROWSE_SERVER_PORT: String(newState.port), - }, - stdio: ['ignore', 'ignore', 'ignore'], - }); - termProc.unref(); - console.log(`[browse] Terminal agent started (PID: ${termProc.pid})`); + const newPid = spawnTerminalAgent({ + stateFile: config.stateFile, + serverPort: newState.port, + cwd: config.projectDir, + }); + if (newPid) { + console.log(`[browse] Terminal agent started (PID: ${newPid})`); } } catch (err: any) { // Non-fatal: chat still works without the terminal agent. @@ -1067,6 +1054,96 @@ Refs: After 'snapshot', use @e1, @e2... as selectors: console.error(`[browse] Connect failed: ${err.message}`); process.exit(1); } + + // ─── Outer Supervisor (v1.44+, opt-in) ────────────────────────── + // + // Default: fire-and-forget (CLI exits, server runs detached). This is + // the contract every existing call site relies on, including Claude + // Code's Bash tool which expects `$B connect` to return promptly. + // + // Opt-in via `--supervise` flag or BROWSE_SUPERVISE=1 env: the CLI + // stays attached, polls the spawned server's PID every 30s, and + // respawns it through the same headed-mode startServer path on + // unexpected exit. Crash-loop guard: 5 respawns inside 5 min → + // give up and exit 1 with a clear error. SIGINT / SIGTERM cleanly + // tear down the supervised server before exit. + // + // Out of scope for v1.44 minimum: routing the Chromium-disconnect + // exit-code-1 path back through this supervisor. The terminal-agent + // watchdog (T5) already covers the highest-frequency restart case; + // Chromium-crash-respawn is documented as a follow-up so the + // supervisor stays a tight, testable primitive. + const superviseRequested = commandArgs.includes('--supervise') + || process.env.BROWSE_SUPERVISE === '1'; + if (!superviseRequested) { + process.exit(0); + } + console.log('[browse] Supervisor mode: monitoring server. Ctrl-C to stop.'); + let supervisorExiting = false; + const teardownAndExit = (signal: string) => { + if (supervisorExiting) return; + supervisorExiting = true; + console.log(`\n[browse] ${signal} received — stopping server.`); + const state = readState(); + if (state?.pid && isProcessAlive(state.pid)) { + safeKill(state.pid, 'SIGTERM'); + } + process.exit(0); + }; + process.on('SIGINT', () => teardownAndExit('SIGINT')); + process.on('SIGTERM', () => teardownAndExit('SIGTERM')); + + const SUPERVISOR_TICK_MS = parseInt( + process.env.GSTACK_SUPERVISOR_TICK_MS || '30000', + 10, + ); + const SUPERVISOR_GUARD_WINDOW_MS = 5 * 60_000; + const SUPERVISOR_GUARD_MAX = 5; + const SUPERVISOR_BACKOFF_MS = (process.env.GSTACK_SUPERVISOR_BACKOFF || '1000,2000,4000,8000,30000') + .split(',').map(s => parseInt(s.trim(), 10)).filter(n => Number.isFinite(n)); + const respawns: number[] = []; + + while (!supervisorExiting) { + await new Promise(resolve => setTimeout(resolve, SUPERVISOR_TICK_MS)); + if (supervisorExiting) break; + const state = readState(); + if (state?.pid && isProcessAlive(state.pid)) continue; + // Server died. Prune rolling window and check guard. + const now = Date.now(); + while (respawns.length && now - respawns[0] > SUPERVISOR_GUARD_WINDOW_MS) { + respawns.shift(); + } + if (respawns.length >= SUPERVISOR_GUARD_MAX) { + console.error( + `[browse] Supervisor: ${SUPERVISOR_GUARD_MAX} crashes in ${SUPERVISOR_GUARD_WINDOW_MS / 1000}s — giving up.`, + ); + process.exit(1); + } + const attempt = respawns.length; + respawns.push(now); + const backoff = SUPERVISOR_BACKOFF_MS[Math.min(attempt, SUPERVISOR_BACKOFF_MS.length - 1)] ?? 30_000; + console.warn(`[browse] Supervisor: server PID gone — respawning in ${backoff}ms (attempt ${attempt + 1}/${SUPERVISOR_GUARD_MAX})...`); + await new Promise(resolve => setTimeout(resolve, backoff)); + if (supervisorExiting) break; + try { + const respawned = await startServer(serverEnv); + console.log(`[browse] Supervisor: server respawned (PID ${respawned.pid}, port ${respawned.port}).`); + // Re-spawn the terminal-agent too; same env wiring as the initial connect. + try { + spawnTerminalAgent({ + stateFile: config.stateFile, + serverPort: respawned.port, + cwd: config.projectDir, + }); + } catch (err: any) { + console.warn(`[browse] Supervisor: terminal-agent respawn failed: ${err?.message || err}`); + } + } catch (err: any) { + console.error(`[browse] Supervisor: server respawn failed: ${err?.message || err}`); + // Let the next tick try again — the crash-loop guard already + // bounded the retries via the rolling window. + } + } process.exit(0); } diff --git a/browse/src/pty-session-lease.ts b/browse/src/pty-session-lease.ts new file mode 100644 index 000000000..ec2797889 --- /dev/null +++ b/browse/src/pty-session-lease.ts @@ -0,0 +1,137 @@ +/** + * PTY session lease registry (v1.44+). + * + * Separates two concerns that pre-v1.44 were conflated under one token: + * + * - **sessionId** — stable, non-secret identifier for a single PTY session. + * Safe to log, safe to include in URLs and server access logs, safe to + * keep in DevTools. Identifies "this terminal," not "you're allowed to + * use this terminal." + * + * - **attachToken** — secret, short-lived (30 s) bearer credential that + * grants the WS upgrade for ONE attach attempt against a session. Minted + * on every /pty-session and /pty-session/reattach call; revoked when + * the WS upgrade consumes it. Kept out of logs. + * + * - **lease** — server-side bookkeeping that maps sessionId → expiresAt. + * Re-attach within the lease window resumes the same PTY (and replays + * the ring buffer from terminal-agent). Lease expiry tears down the + * session. + * + * Codex outside-voice (T1 of the eng review) pushed for this separation: + * "the auth token IS the session id" collapsed identity into a secret, + * meaning re-attach URLs and logs carry the bearer credential. The lease + * model fixes that without changing the user experience. + * + * Mint cadence: + * - Initial /pty-session: mint sessionId + lease + attachToken (one round trip). + * - /pty-session/reattach: validate sessionId/lease, mint fresh attachToken. + * - /pty-restart: revoke old lease, mint fresh sessionId + lease + attachToken. + * - /pty-dispose: revoke lease (and the terminal-agent disposes the PTY). + * + * Lease TTL is env-overridable so v1.44 e2e tests can compress detach + * windows to 1 s instead of waiting 30 minutes per assertion. + */ +import * as crypto from 'crypto'; + +interface Lease { + createdAt: number; + expiresAt: number; +} + +const LEASE_TTL_MS = parseInt( + process.env.GSTACK_PTY_LEASE_TTL_MS || `${30 * 60 * 1000}`, + 10, +); // 30 minutes default; covers idle-but-engaged user sessions +const MAX_LEASES = 10_000; +const leases = new Map(); + +/** + * Mint a fresh sessionId + lease. Returns the non-secret sessionId and + * the expiry timestamp (caller surfaces both to the client). Never throws. + */ +export function mintLease(): { sessionId: string; expiresAt: number } { + const sessionId = crypto.randomBytes(32).toString('base64url'); + const now = Date.now(); + const expiresAt = now + LEASE_TTL_MS; + leases.set(sessionId, { createdAt: now, expiresAt }); + pruneExpired(now); + return { sessionId, expiresAt }; +} + +/** + * Check whether a lease is still valid (exists AND not expired). Returns + * the current expiresAt for valid leases; null otherwise. Lazily prunes + * stale entries. + */ +export function validateLease(sessionId: string | null | undefined): { ok: true; expiresAt: number } | { ok: false } { + if (!sessionId) return { ok: false }; + const lease = leases.get(sessionId); + if (!lease) { + pruneExpired(Date.now()); + return { ok: false }; + } + if (Date.now() > lease.expiresAt) { + leases.delete(sessionId); + pruneExpired(Date.now()); + return { ok: false }; + } + return { ok: true, expiresAt: lease.expiresAt }; +} + +/** + * Extend the lease's expiresAt to `now + LEASE_TTL_MS`. Caller should + * gate refresh on `expiresAt - now < REFRESH_THRESHOLD` (D10 lazy + * refresh: avoid refreshing on every keepalive when the lease is + * comfortably far from expiry). + * + * Returns `{ ok: true, expiresAt }` on success, `{ ok: false }` if the + * lease is unknown or already expired (the agent must close the WS and + * surface auth-invalid). Critical security invariant: never resurrect + * an expired lease — the 30-min TTL is what bounds blast radius for a + * leaked attach token whose lease should have been GC'd. + */ +export function refreshLease(sessionId: string | null | undefined): { ok: true; expiresAt: number } | { ok: false } { + if (!sessionId) return { ok: false }; + const lease = leases.get(sessionId); + if (!lease) return { ok: false }; + const now = Date.now(); + if (now > lease.expiresAt) { + leases.delete(sessionId); + return { ok: false }; + } + lease.expiresAt = now + LEASE_TTL_MS; + return { ok: true, expiresAt: lease.expiresAt }; +} + +/** + * Drop a lease. Called on explicit dispose (/pty-dispose, /pty-restart, + * WS close with code 4001) and on session timeout in terminal-agent. + */ +export function revokeLease(sessionId: string | null | undefined): void { + if (!sessionId) return; + leases.delete(sessionId); +} + +/** Returns the lease count — test + observability helper. */ +export function leaseCount(): number { + return leases.size; +} + +/** Test-only reset. */ +export function __resetLeases(): void { + leases.clear(); +} + +function pruneExpired(now: number): void { + let checked = 0; + for (const [sessionId, lease] of leases) { + if (checked++ >= 20) break; + if (lease.expiresAt <= now) leases.delete(sessionId); + } + while (leases.size > MAX_LEASES) { + const first = leases.keys().next().value; + if (!first) break; + leases.delete(first); + } +} diff --git a/browse/src/server.ts b/browse/src/server.ts index afeb5a09e..9483602b4 100644 --- a/browse/src/server.ts +++ b/browse/src/server.ts @@ -43,6 +43,8 @@ import { inspectElement, modifyStyle, resetModifications, getModificationHistory // Bun.spawn used instead of child_process.spawn (compiled bun binaries // fail posix_spawn on all executables including /bin/bash) import { safeUnlink, safeUnlinkQuiet, safeKill } from './error-handling'; +import { readAgentRecord, killAgentByRecord, clearAgentRecord, agentRecordPath, spawnTerminalAgent } from './terminal-agent-control'; +import { isProcessAlive } from './error-handling'; import { sanitizeBody, stripLoneSurrogateEscapes } from './sanitize'; import { startSocksBridge, testUpstream, type BridgeHandle } from './socks-bridge'; import { parseProxyConfig, toUpstreamConfig, ProxyConfigError } from './proxy-config'; @@ -56,6 +58,9 @@ import { import { mintPtySessionToken, buildPtySetCookie, revokePtySessionToken, } from './pty-session-cookie'; +import { + mintLease, validateLease, refreshLease, revokeLease, +} from './pty-session-lease'; import * as fs from 'fs'; import * as net from 'net'; import * as path from 'path'; @@ -207,31 +212,34 @@ export interface ServerConfig { beforeRoute?: (req: Request, surface: Surface, auth: TokenInfo | null) => Promise; /** * Whether gstack owns the lifecycle of the terminal-agent process and its - * discovery files (`/terminal-port`, `/terminal-internal-token`). + * discovery files (`/terminal-port`, `/terminal-internal-token`, + * `/terminal-agent-pid`). * - * When true (default), shutdown() runs three side effects: - * 1. `pkill -f terminal-agent\.ts` — regex-broad, matches ANY process whose - * command line contains `terminal-agent.ts` on this host (including - * sibling gstack sessions). Pre-existing CLI behavior, not introduced by - * this flag. Identity-based PID kill is a separate followup (see TODOS). + * When true (default), shutdown() runs four side effects: + * 1. Identity-based kill via `killAgentByRecord(readAgentRecord(stateDir))` + * (v1.44+). Only signals the PID recorded by THIS daemon's agent. + * Replaced the historical `pkill -f terminal-agent\.ts` regex that + * matched sibling gstack sessions on the same host — see + * terminal-agent-control.ts for rationale. * 2. `safeUnlinkQuiet(/terminal-port)` * 3. `safeUnlinkQuiet(/terminal-internal-token)` + * 4. `safeUnlinkQuiet(/terminal-agent-pid)` (the v1.44 record) * * This is correct for gstack's CLI path, which spawns `terminal-agent.ts` as * the producer of those files (see cli.ts:1037-1063). * * Embedders (gbrowser phoenix overlay, future hosts) that run their own PTY * server and write those files themselves should pass `false`. When `false`, - * the embedder owns BOTH the agent process AND both discovery files — - * terminal-agent.ts's own SIGTERM cleanup only removes `terminal-port` - * (see terminal-agent.ts:558), so the internal-token file is the embedder's - * full responsibility. + * the embedder owns BOTH the agent process AND all three discovery files. + * Note that terminal-agent.ts's own SIGTERM cleanup removes `terminal-port` + * and `terminal-agent-pid` (the agent writes both at boot), so embedders + * that pre-launch their own agent must ensure their cleanup matches. * * Polarity note: this differs from `xvfb?` and `proxyBridge?`, which gate by * the *presence* of a caller-owned handle (presence ⇒ don't close). This * field gates by an explicit boolean because there is no handle object — * the terminal-agent is started elsewhere (cli.ts), and shutdown's only - * reference is the regex-based pkill + the file paths. + * reference is the PID record + the file paths. */ ownsTerminalAgent?: boolean; } @@ -404,11 +412,13 @@ function readTerminalInternalToken(): string | null { /** * Push a freshly-minted PTY cookie token to the terminal-agent so its - * /ws upgrade can validate the cookie. Loopback POST authenticated with - * the internal token written by the agent at startup. Fire-and-forget; - * if the agent isn't up yet, the extension just retries /pty-session. + * /ws upgrade can validate the cookie. v1.44+: also pushes the bound + * sessionId so the agent can route /internal/restart and (Commit 3) + * re-attach back to the same PtySession. Loopback POST authenticated + * with the internal token written by the agent at startup. If the agent + * isn't up yet, the extension just retries /pty-session. */ -async function grantPtyToken(token: string): Promise { +async function grantPtyToken(token: string, sessionId?: string): Promise { const port = readTerminalPort(); const internal = readTerminalInternalToken(); if (!port || !internal) return false; @@ -419,13 +429,36 @@ async function grantPtyToken(token: string): Promise { 'Content-Type': 'application/json', 'Authorization': `Bearer ${internal}`, }, - body: JSON.stringify({ token }), + body: JSON.stringify(sessionId ? { token, sessionId } : { token }), signal: AbortSignal.timeout(2000), }); return resp.ok; } catch { return false; } } +/** + * Ask the terminal-agent to dispose the PtySession bound to `sessionId`. + * Scoped to one caller's session — sibling tabs/agents untouched. Used by + * /pty-restart and /pty-dispose. Returns true on agent ack. + */ +async function restartPtySession(sessionId: string): Promise { + const port = readTerminalPort(); + const internal = readTerminalInternalToken(); + if (!port || !internal) return false; + try { + const resp = await fetch(`http://127.0.0.1:${port}/internal/restart`, { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'Authorization': `Bearer ${internal}`, + }, + body: JSON.stringify({ sessionId }), + signal: AbortSignal.timeout(5000), + }); + return resp.ok; + } catch { return false; } +} + /** Extract bearer token from request. Returns the token string or null. */ function extractToken(req: Request): string | null { const header = req.headers.get('authorization'); @@ -1333,6 +1366,84 @@ export function buildFetchHandler(cfg: ServerConfig): ServerHandle { // premise even under malformed cfg. const ownsTerminalAgent = cfg.ownsTerminalAgent === false ? false : true; + // ─── Terminal-Agent Watchdog (v1.44+) ───────────────────────────── + // + // The terminal-agent process can die independently of the server: SIGKILL + // from the OS OOM killer, an uncaught exception under load, an external + // `pkill` from a sibling debugging session. Pre-v1.44 the sidebar would + // see the broken connection and stay broken until the user reloaded. + // Now: 60s ticker checks the recorded agent PID, respawns via the shared + // spawnTerminalAgent helper if dead. + // + // Identity-based — uses readAgentRecord + isProcessAlive, NOT a process + // name probe. Critical: prevents respawning around a slow-but-alive agent + // (which would create split-brain — two agents writing the port file, + // tokens diverging between them, mystery PTY upgrade failures). + // + // Crash-loop guard: 3 respawn attempts inside 60s → stop trying and emit + // a one-line error. Manual `forceRestart` from the sidebar clears the + // history (the user is the explicit signal to retry). + // + // Only active when ownsTerminalAgent === true. Embedders that pre-launch + // their own PTY server (gbrowser phoenix overlay) must not be auto-respawned + // by us — their lifecycle is their concern. + let agentWatchdogInterval: ReturnType | null = null; + const respawnHistory: number[] = []; + const AGENT_WATCHDOG_TICK_MS = parseInt( + process.env.GSTACK_AGENT_WATCHDOG_TICK_MS || '60000', + 10, + ); + const RESPAWN_GUARD_WINDOW_MS = 60_000; + const RESPAWN_GUARD_MAX = 3; + let agentRespawnGuardTripped = false; + + if (ownsTerminalAgent) { + agentWatchdogInterval = setInterval(() => { + if (isShuttingDown) return; + if (agentRespawnGuardTripped) return; + const stateDir = path.dirname(cfg.config.stateFile); + const record = readAgentRecord(stateDir); + // If the record exists and the PID is alive, the agent is healthy + // (or at least still answering signal 0). Slow-but-alive agents + // intentionally fall through here — split-brain is worse than + // unresponsiveness, and slow recovery is handled by the user via + // restart. + if (record && isProcessAlive(record.pid)) return; + // Either no record (never spawned, or cleaned up after crash) or + // PID is dead. Try to respawn. + const now = Date.now(); + while (respawnHistory.length && now - respawnHistory[0] > RESPAWN_GUARD_WINDOW_MS) { + respawnHistory.shift(); + } + if (respawnHistory.length >= RESPAWN_GUARD_MAX) { + agentRespawnGuardTripped = true; + console.error( + `[browse] terminal-agent respawn guard tripped (${RESPAWN_GUARD_MAX} crashes in ${RESPAWN_GUARD_WINDOW_MS / 1000}s) — manual restart required`, + ); + return; + } + respawnHistory.push(now); + try { + const pid = spawnTerminalAgent({ + stateFile: cfg.config.stateFile, + serverPort: cfg.browsePort, + cwd: cfg.config.projectDir, + }); + if (pid) { + console.log(`[browse] terminal-agent respawned by watchdog (PID: ${pid})`); + } else { + console.warn('[browse] terminal-agent respawn skipped — script not found on disk'); + } + } catch (err: any) { + console.warn('[browse] terminal-agent respawn failed:', err?.message || err); + } + }, AGENT_WATCHDOG_TICK_MS); + // Detach the watchdog timer from Node's event-loop ref count so a + // healthy idle process can still exit cleanly if everything else is + // also unref'd. Bun's setInterval returns a Timer with unref(). + (agentWatchdogInterval as any)?.unref?.(); + } + // Factory-scoped validateAuth. Closes over cfg.authToken so every internal // auth check sees the same token the routes receive. Module-level // validateAuth was deleted in v1.35.0.0. @@ -1350,14 +1461,20 @@ export function buildFetchHandler(cfg: ServerConfig): ServerHandle { console.log('[browse] Shutting down...'); if (ownsTerminalAgent) { + // Identity-based kill (v1.44+). Replaces the v1.43- `pkill -f + // terminal-agent\.ts` regex teardown which matched sibling gstack + // sessions on the same host. Only the PID recorded in + // `/terminal-agent-pid` by THIS daemon's agent is signaled. try { - const { spawnSync } = require('child_process'); - spawnSync('pkill', ['-f', 'terminal-agent\\.ts'], { stdio: 'ignore', timeout: 3000 }); + const stateDir = path.dirname(config.stateFile); + const record = readAgentRecord(stateDir); + if (record) killAgentByRecord(record, 'SIGTERM'); } catch (err: any) { console.warn('[browse] Failed to kill terminal-agent:', err.message); } safeUnlinkQuiet(path.join(path.dirname(config.stateFile), 'terminal-port')); safeUnlinkQuiet(path.join(path.dirname(config.stateFile), 'terminal-internal-token')); + safeUnlinkQuiet(agentRecordPath(path.dirname(config.stateFile))); } try { detachSession(); } catch (err: any) { console.warn('[browse] Failed to detach CDP session:', err.message); @@ -1366,6 +1483,7 @@ export function buildFetchHandler(cfg: ServerConfig): ServerHandle { if (cfgBrowserManager.isWatching()) cfgBrowserManager.stopWatch(); clearInterval(flushInterval); clearInterval(idleCheckInterval); + if (agentWatchdogInterval) clearInterval(agentWatchdogInterval); await flushBuffers(); await cfgBrowserManager.close(); @@ -1564,15 +1682,25 @@ export function buildFetchHandler(cfg: ServerConfig): ServerHandle { }); } - // ─── /pty-session — mint Terminal-tab WebSocket cookie ─────────── + // ─── /pty-session — mint sessionId + lease + attachToken ───────── // - // The extension POSTs here with the bootstrap authToken, gets back a - // short-lived HttpOnly cookie scoped to the terminal-agent's /ws - // upgrade. We push the cookie value to the agent over loopback so the - // upgrade can validate it. The cookie travels automatically with the - // browser's WebSocket upgrade because it's same-origin to the agent - // when the daemon binds 127.0.0.1. NEVER added to TUNNEL_PATHS — the - // tunnel surface 404s any /pty-session attempt by default-deny. + // v1.44+ four-tuple shape: + // { terminalPort, sessionId, attachToken, leaseExpiresAt } + // + // - sessionId : stable, non-secret. Safe to log. Identifies "this + // terminal" across re-attaches. + // - attachToken : short-lived (30 min wall, single attach in practice + // since the agent revokes on WS close). Bearer for + // the /ws upgrade. + // - leaseExpiresAt: client-visible deadline for the lease. Re-attach + // only works inside this window. + // + // The lease + attachToken are minted together so a successful + // /pty-session is one round trip. Re-attach mints a fresh attachToken + // for the SAME sessionId via /pty-session/reattach. + // + // NEVER added to TUNNEL_PATHS — the tunnel surface 404s any + // /pty-session attempt by default-deny. if (url.pathname === '/pty-session' && req.method === 'POST') { if (!validateAuth(req)) { return new Response(JSON.stringify({ error: 'Unauthorized' }), { @@ -1585,41 +1713,195 @@ export function buildFetchHandler(cfg: ServerConfig): ServerHandle { error: 'terminal-agent not ready', }), { status: 503, headers: { 'Content-Type': 'application/json' } }); } + const lease = mintLease(); const minted = mintPtySessionToken(); - const granted = await grantPtyToken(minted.token); + const granted = await grantPtyToken(minted.token, lease.sessionId); if (!granted) { revokePtySessionToken(minted.token); + revokeLease(lease.sessionId); return new Response(JSON.stringify({ error: 'failed to grant terminal session', }), { status: 503, headers: { 'Content-Type': 'application/json' } }); } return new Response(JSON.stringify({ terminalPort: port, - // Returned in the JSON body so the extension can pass it to - // `new WebSocket(url, [token])`. Browsers translate that to a - // `Sec-WebSocket-Protocol` header — the only auth header we can - // set from the browser WebSocket API. SameSite=Strict cookies - // don't survive the port change between server.ts (34567) and - // the agent (random port), and HttpOnly + cross-origin makes - // the cookie path unreliable across browsers anyway. - // - // The token is short-lived (30 min, auto-revoked on WS close) - // and never persisted to disk on the extension side. The - // pre-existing authToken leak via /health is a separate - // concern (v1.1+ TODO). + sessionId: lease.sessionId, + attachToken: minted.token, + leaseExpiresAt: lease.expiresAt, + // Legacy alias — extensions still on the v1.43 wire shape keep + // working. Drop after one minor release once dogfood confirms. ptySessionToken: minted.token, expiresAt: minted.expiresAt, }), { status: 200, headers: { 'Content-Type': 'application/json', - // Set-Cookie is kept for non-browser callers / future use, - // but the WS upgrade no longer depends on it. 'Set-Cookie': buildPtySetCookie(minted.token), }, }); } + // ─── /pty-session/reattach — mint fresh attachToken for existing sessionId + // + // Used by Commit 3's re-attach loop on the client. Validates the + // lease (rejects unknown/expired sessionId with 410 Gone), mints a + // fresh short-lived attachToken bound to the same sessionId, and + // pushes it to the agent. The client opens a new WS with the new + // token; the agent matches the sessionId binding and re-attaches + // to the existing PtySession (kept alive for the 60s detach + // window — Commit 3 wires that side). + if (url.pathname === '/pty-session/reattach' && req.method === 'POST') { + if (!validateAuth(req)) { + return new Response(JSON.stringify({ error: 'Unauthorized' }), { + status: 401, headers: { 'Content-Type': 'application/json' }, + }); + } + const port = readTerminalPort(); + if (!port) { + return new Response(JSON.stringify({ error: 'terminal-agent not ready' }), { + status: 503, headers: { 'Content-Type': 'application/json' }, + }); + } + let body: any; + try { body = await req.json(); } catch { body = null; } + const sessionId = typeof body?.sessionId === 'string' ? body.sessionId : null; + const v = sessionId ? validateLease(sessionId) : { ok: false }; + if (!v.ok) { + // 410 Gone — session window has closed (lease expired or never + // existed). Client must fall back to /pty-session for a brand-new + // session. + return new Response(JSON.stringify({ error: 'lease expired or unknown' }), { + status: 410, headers: { 'Content-Type': 'application/json' }, + }); + } + const minted = mintPtySessionToken(); + const granted = await grantPtyToken(minted.token, sessionId!); + if (!granted) { + revokePtySessionToken(minted.token); + return new Response(JSON.stringify({ error: 'failed to grant attach token' }), { + status: 503, headers: { 'Content-Type': 'application/json' }, + }); + } + return new Response(JSON.stringify({ + terminalPort: port, + sessionId, + attachToken: minted.token, + leaseExpiresAt: v.ok ? v.expiresAt : 0, + }), { status: 200, headers: { 'Content-Type': 'application/json' } }); + } + + // ─── /pty-restart — one-transaction kill + fresh mint ──────────── + // + // The Restart button. Synchronously disposes the caller's existing + // PtySession on the agent, revokes the old lease, mints a fresh + // sessionId + lease + attachToken, and returns the new 4-tuple in + // one response. Zero race window between kill and mint (codex T2 + // + D8 of the eng review). + if (url.pathname === '/pty-restart' && req.method === 'POST') { + if (!validateAuth(req)) { + return new Response(JSON.stringify({ error: 'Unauthorized' }), { + status: 401, headers: { 'Content-Type': 'application/json' }, + }); + } + const port = readTerminalPort(); + if (!port) { + return new Response(JSON.stringify({ error: 'terminal-agent not ready' }), { + status: 503, headers: { 'Content-Type': 'application/json' }, + }); + } + let body: any; + try { body = await req.json(); } catch { body = null; } + const oldSessionId = typeof body?.sessionId === 'string' ? body.sessionId : null; + // Best-effort dispose. Missing/unknown sessionId is non-fatal — + // the client may be doing a "restart from scratch" with no prior + // session (e.g. ENDED state). The fresh mint always proceeds. + if (oldSessionId) { + await restartPtySession(oldSessionId); + revokeLease(oldSessionId); + } + const lease = mintLease(); + const minted = mintPtySessionToken(); + const granted = await grantPtyToken(minted.token, lease.sessionId); + if (!granted) { + revokePtySessionToken(minted.token); + revokeLease(lease.sessionId); + return new Response(JSON.stringify({ error: 'failed to grant terminal session' }), { + status: 503, headers: { 'Content-Type': 'application/json' }, + }); + } + return new Response(JSON.stringify({ + terminalPort: port, + sessionId: lease.sessionId, + attachToken: minted.token, + leaseExpiresAt: lease.expiresAt, + }), { status: 200, headers: { 'Content-Type': 'application/json' } }); + } + + // ─── /pty-dispose — explicit teardown (pagehide / browser quit) ── + // + // sendBeacon-compatible: accepts the auth token in the BODY so the + // extension's pagehide handler can fire it without setting headers + // (sendBeacon doesn't support custom headers). Codex T3 fix — + // without this, every browser quit + sidebar close leaves a zombie + // PTY alive for the 60s detach window (Commit 3). + if (url.pathname === '/pty-dispose' && req.method === 'POST') { + let body: any; + try { body = await req.json(); } catch { body = null; } + const authTokenFromBody = typeof body?.authToken === 'string' ? body.authToken : null; + // Accept either header bearer OR body authToken. Both must match + // the root auth token; otherwise reject. + const headerToken = extractToken(req); + const authedByHeader = headerToken !== null && headerToken === authToken; + const authedByBody = authTokenFromBody !== null && authTokenFromBody === authToken; + if (!authedByHeader && !authedByBody) { + return new Response(JSON.stringify({ error: 'Unauthorized' }), { + status: 401, headers: { 'Content-Type': 'application/json' }, + }); + } + const sessionId = typeof body?.sessionId === 'string' ? body.sessionId : null; + if (sessionId) { + await restartPtySession(sessionId); + revokeLease(sessionId); + } + return new Response(JSON.stringify({ ok: true }), { + status: 200, headers: { 'Content-Type': 'application/json' }, + }); + } + + // ─── /internal/lease-refresh — loopback from terminal-agent on keepalive + // + // T6 PTY-only idle reset (codex outside-voice fix): the headless + // daemon's idle timer must reset only on active PTY usage, not on + // every passive SSE consumer. Terminal-agent calls this endpoint + // (lazily, only when its cached lease is within 5 min of expiry) + // on its 25s keepalive cycle. Refreshing the lease here also bumps + // lastActivity so the daemon stays alive while a sidebar terminal + // is actively in use. + // + // INTERNAL endpoint — bound to the root authToken so an external + // caller can't refresh another user's lease. Body: {sessionId}. + if (url.pathname === '/internal/lease-refresh' && req.method === 'POST') { + if (!validateAuth(req)) { + return new Response(JSON.stringify({ error: 'Unauthorized' }), { + status: 401, headers: { 'Content-Type': 'application/json' }, + }); + } + let body: any; + try { body = await req.json(); } catch { body = null; } + const sessionId = typeof body?.sessionId === 'string' ? body.sessionId : null; + const r = sessionId ? refreshLease(sessionId) : { ok: false }; + if (!r.ok) { + return new Response(JSON.stringify({ error: 'lease expired or unknown' }), { + status: 410, headers: { 'Content-Type': 'application/json' }, + }); + } + // T6: PTY activity resets the daemon idle timer. + resetIdleTimer(); + return new Response(JSON.stringify({ ok: true, expiresAt: r.expiresAt }), { + status: 200, headers: { 'Content-Type': 'application/json' }, + }); + } + // ─── /pty-inject-scan — pre-inject prompt-injection scan for the // extension's gstackInjectToTerminal callers. The extension routes // every page-derived text through this endpoint BEFORE writing to diff --git a/browse/src/terminal-agent-control.ts b/browse/src/terminal-agent-control.ts new file mode 100644 index 000000000..094ba668f --- /dev/null +++ b/browse/src/terminal-agent-control.ts @@ -0,0 +1,143 @@ +/** + * terminal-agent process-control primitives shared by cli.ts spawn site, + * server.ts shutdown teardown, and the v1.44 watchdog/respawn loop. + * + * Why this exists: pre-v1.44 used `pkill -f terminal-agent\.ts`, which + * matches any process whose argv contains the string and would kill + * sibling gstack sessions on the same host. The agent now writes a + * structured `terminal-agent-pid` record (`{pid, gen, startedAt}`) and + * every kill site routes through `killAgentByRecord` here — identity-based, + * no regex. + * + * The `gen` field is a per-boot generation counter. Loopback /internal/* + * calls from the parent server include `X-Browse-Gen` so a slow agent that + * the watchdog respawned around can't accidentally service a stale grant + * from the old generation. + */ +import * as fs from 'fs'; +import * as path from 'path'; +import { safeUnlink, safeKill, isProcessAlive } from './error-handling'; +import { writeSecureFile, mkdirSecure } from './file-permissions'; + +/** + * Locate the terminal-agent script on disk. In dev (cli.ts running via + * `bun run`), it lives next to this file in browse/src. In a compiled + * binary, Bun's --compile bakes the source into the executable and + * exposes it relative to process.execPath. Either path must work or + * the agent can't be spawned at all. + */ +export function resolveTerminalAgentScript(searchHints: { metaDir?: string; execPath?: string } = {}): string | null { + const meta = searchHints.metaDir || __dirname; + const exec = searchHints.execPath || process.execPath; + const candidates = [ + path.resolve(meta, 'terminal-agent.ts'), + path.resolve(path.dirname(exec), '..', 'src', 'terminal-agent.ts'), + ]; + for (const c of candidates) { + if (fs.existsSync(c)) return c; + } + return null; +} + +/** + * Spawn a fresh terminal-agent as a detached child. Handles the standard + * three steps: kill any prior agent recorded at `/terminal-agent-pid`, + * clear the stale record, then `Bun.spawn(['bun', 'run', script], ...)` with + * env wiring. Returns the PID of the new agent on success, null when the + * agent script can't be located. + * + * Used by both the CLI cold-start path (cli.ts) and the v1.44 watchdog in + * server.ts. Centralizing here removes a copy-paste between them and means + * future spawn-env additions (e.g. BROWSE_OWNER_PID for the generation + * counter rollout) land in one place. + */ +export function spawnTerminalAgent(opts: { + stateFile: string; + serverPort: number; + cwd?: string; + /** Optional extra env vars to add to the agent's process env. */ + extraEnv?: Record; + /** Override script lookup for tests. */ + scriptPath?: string; +}): number | null { + const stateDir = path.dirname(opts.stateFile); + const prior = readAgentRecord(stateDir); + if (prior) { + killAgentByRecord(prior, 'SIGTERM'); + clearAgentRecord(stateDir); + } + const script = opts.scriptPath || resolveTerminalAgentScript(); + if (!script || !fs.existsSync(script)) return null; + const proc = (Bun as any).spawn(['bun', 'run', script], { + cwd: opts.cwd || process.cwd(), + env: { + ...process.env, + BROWSE_STATE_FILE: opts.stateFile, + BROWSE_SERVER_PORT: String(opts.serverPort), + ...(opts.extraEnv || {}), + }, + stdio: ['ignore', 'ignore', 'ignore'], + }); + proc.unref?.(); + return proc.pid ?? null; +} + +export interface AgentRecord { + pid: number; + /** Random per-boot identifier. Loopback /internal/* sees X-Browse-Gen: . */ + gen: string; + /** ms since epoch. Reserved for future PID-reuse guards. */ + startedAt: number; +} + +export function agentRecordPath(stateDir: string): string { + return path.join(stateDir, 'terminal-agent-pid'); +} + +/** Read the current record. Returns null on missing/malformed file. */ +export function readAgentRecord(stateDir: string): AgentRecord | null { + try { + const raw = fs.readFileSync(agentRecordPath(stateDir), 'utf-8'); + const j = JSON.parse(raw); + if (typeof j?.pid === 'number' && typeof j?.gen === 'string' && typeof j?.startedAt === 'number') { + return j as AgentRecord; + } + return null; + } catch { + return null; + } +} + +/** Atomic write. Caller must ensure stateDir exists; agent does this at boot. */ +export function writeAgentRecord(stateDir: string, record: AgentRecord): void { + try { mkdirSecure(stateDir); } catch {} + const target = agentRecordPath(stateDir); + const tmp = `${target}.tmp-${process.pid}`; + writeSecureFile(tmp, JSON.stringify(record)); + fs.renameSync(tmp, target); +} + +export function clearAgentRecord(stateDir: string): void { + safeUnlink(agentRecordPath(stateDir)); +} + +/** + * Kill the agent identified by `record`. Signal defaults to SIGTERM (give + * the agent a chance to run its own SIGTERM cleanup). Returns true if a + * signal was actually sent to a live PID; false if the PID was already + * dead (no-op). Never throws — ESRCH is swallowed by safeKill. + * + * Validates liveness BEFORE signaling so a PID-reuse race (the recorded + * PID was reaped and a brand-new unrelated process now holds it) can't + * cause us to kill the wrong process. This is a best-effort defense: + * Linux/macOS don't expose process-start-time cheaply, and the gap + * between record-write and watchdog-tick is small (60s max). + */ +export function killAgentByRecord( + record: AgentRecord, + signal: NodeJS.Signals = 'SIGTERM', +): boolean { + if (!isProcessAlive(record.pid)) return false; + safeKill(record.pid, signal); + return true; +} diff --git a/browse/src/terminal-agent.ts b/browse/src/terminal-agent.ts index 189a49f09..2e39d99e4 100644 --- a/browse/src/terminal-agent.ts +++ b/browse/src/terminal-agent.ts @@ -25,16 +25,47 @@ import * as path from 'path'; import * as crypto from 'crypto'; import { writeSecureFile, mkdirSecure } from './file-permissions'; import { safeUnlink } from './error-handling'; +import { writeAgentRecord, clearAgentRecord } from './terminal-agent-control'; const STATE_FILE = process.env.BROWSE_STATE_FILE || path.join(process.env.HOME || '/tmp', '.gstack', 'browse.json'); const PORT_FILE = path.join(path.dirname(STATE_FILE), 'terminal-port'); const BROWSE_SERVER_PORT = parseInt(process.env.BROWSE_SERVER_PORT || '0', 10); const EXTENSION_ID = process.env.BROWSE_EXTENSION_ID || ''; // optional: tighten Origin check const INTERNAL_TOKEN = crypto.randomBytes(32).toString('base64url'); // shared with parent server via env at spawn +/** + * Per-boot generation identifier. Loopback /internal/* callers include + * `X-Browse-Gen: ` so a slow agent the watchdog respawned + * around can't service a stale grant from the prior generation. Absent + * header means "legacy caller" and is accepted (backward compat); a + * present-but-mismatched header returns 409 stale generation. + */ +const CURRENT_GEN = crypto.randomBytes(16).toString('base64url'); -// In-memory cookie token registry. Parent posts /internal/grant after -// /pty-session; we validate WS cookies against this set. -const validTokens = new Set(); +// In-memory attach-token registry. Parent posts /internal/grant after +// /pty-session; we validate WS upgrades against this map. +// +// v1.44+: each token is bound to a v1.44 sessionId (the stable, non-secret +// identifier from browse/src/pty-session-lease.ts). The token grants ONE +// attach for ONE session — re-attach within the lease window comes through +// /pty-session/reattach, which mints a fresh token for the same sessionId. +// +// Legacy callers can still pass `{token}` without sessionId (the value +// stays null and the WS upgrade still works); those callers don't get +// re-attach because there's no stable identifier to match against. +const validTokens = new Map(); // token → sessionId + +/** + * Reverse index for re-attach lookups: sessionId → live PtySession. + * Populated when a WS first attaches with a known sessionId; cleared when + * the session is disposed or the lease expires. Used by: + * - /ws upgrade: if the incoming attachToken maps to a sessionId that + * already has a live session, REPLACE its ws ref instead of spawning. + * - /internal/restart: enumerate by sessionId, dispose that one session. + * + * Kept separate from the WeakMap so re-attach can find the + * session by id even after the original ws has gone. + */ +const sessionsById = new Map(); // Active PTY session per WS. One terminal per connection. Codex finding #4: // uncaught handlers below catch bugs in framing/cleanup so they don't kill @@ -46,12 +77,154 @@ process.on('unhandledRejection', (reason) => { console.error('[terminal-agent] unhandledRejection:', reason); }); -interface PtySession { +export interface PtySession { proc: any | null; // Bun.Subprocess once spawned cols: number; rows: number; cookie: string; + /** + * Current attached websocket. Swapped on re-attach (Commit 3): when a new + * WS upgrade matches this session's sessionId, the old liveWs is gone + * and the new ws takes its place. The PTY on-data callback closes over + * `session`, not the original `ws`, so it always writes to the current + * liveWs (or skips the write when detached and liveWs is null). + */ + liveWs: any | null; + /** + * v1.44+ stable session identifier (from pty-session-lease). Null for + * legacy /internal/grant callers that didn't pass one. Used for + * targeted /internal/restart and Commit 3 re-attach lookups. + */ + sessionId: string | null; spawned: boolean; + /** + * 25s server-side WS keepalive interval (v1.44+). Set in the WS `open` + * handler, cleared in `close`. We send `{type:"ping",ts}` text frames so + * NAT boxes, proxies, and Chrome's MV3 panel-suspend heuristics see the + * connection as active; the client either replies with `{type:"pong"}` + * or fires its own 25s `{type:"keepalive"}` cycle. Either path keeps + * the underlying TCP from being silently dropped. + */ + pingInterval: ReturnType | null; + /** + * Commit 3 scrollback ring buffer. Each PTY write appends a frame; the + * total byte count is capped at RING_BUFFER_MAX_BYTES with oldest frames + * evicted first. On re-attach, the surviving frames are replayed as a + * single binary frame (prefixed with the v1.44 reset sequence) so the + * user sees their last screen of output. Frame boundaries preserve UTF-8 + * + ANSI-CSI boundaries because each frame is the exact buffer that + * spawnClaude's on-data callback emitted. + */ + ringBuffer: Buffer[]; + ringBufferBytes: number; + /** + * Tracks whether the PTY is currently in xterm alt-screen mode. claude's + * TUI enters alt-screen (CSI ?1049h) during tool calls and exits (CSI + * ?1049l) when returning to the main prompt. On re-attach, the replay + * prelude must re-enter alt-screen if the original PTY left it active, + * otherwise the replay renders against the main screen and the cursor + * + colors end up in the wrong place. + */ + altScreenActive: boolean; + /** + * Detach state machine (Commit 3). When the WS closes for a reason OTHER + * than the v1.44 intentional-restart code (4001), we keep the PtySession + * alive for the detach window (default 60s) so a re-attach within the + * window can resume the same PTY and replay the ring buffer. The timer + * disposes the session if no re-attach arrives in time. + */ + detached: boolean; + detachTimer: ReturnType | null; +} + +/** + * WS keepalive interval. 25s is comfortably under the lowest common NAT + * idle timeout (typically 30-60s) and shorter than Chromium's WebSocket + * dead-peer threshold. Test-overridable via env so the v1.44 e2e tests + * can compress idle-window assertions to <1s without waiting half a + * minute per assertion. + */ +const KEEPALIVE_INTERVAL_MS = parseInt( + process.env.GSTACK_PTY_KEEPALIVE_INTERVAL_MS || '25000', + 10, +); + +/** + * Commit 3 scrollback ring buffer cap. 1 MB is enough for a full screen + * of dense claude output (including a recent tool result), small enough + * that a worst-case 10 detached sessions only cost ~10 MB of RSS. + * Env-overridable so e2e tests can verify eviction without writing 1 MB + * of fixture data per assertion. + */ +const RING_BUFFER_MAX_BYTES = parseInt( + process.env.GSTACK_PTY_RING_BUFFER_BYTES || `${1024 * 1024}`, + 10, +); + +/** + * Commit 3 detach window — how long to keep a session alive after WS + * close (with any code other than 4001 intentional-restart) so a + * re-attach can resume the same PTY. 60s is long enough to cover a + * Chrome MV3 service-worker suspend cycle, a wifi blip, or a brief + * laptop sleep; short enough that genuinely-closed sessions don't + * stack up unbounded. + */ +const DETACH_WINDOW_MS = parseInt( + process.env.GSTACK_PTY_DETACH_WINDOW_MS || '60000', + 10, +); + +/** + * Append a frame to a session's ring buffer, evicting oldest frames if + * the total byte count exceeds RING_BUFFER_MAX_BYTES. Eviction is at + * frame boundaries (one PTY write = one frame), so we never cut a + * multi-byte UTF-8 sequence or a partial ANSI CSI in half — claude's + * on-data callback emits coherent frames. + * + * Side effect: scans the appended chunk for alt-screen enter/exit + * sequences (CSI ?1049h / CSI ?1049l) and updates session.altScreenActive + * so the re-attach prelude knows whether to re-enter alt-screen. + */ +export function appendToRingBuffer(session: PtySession, frame: Buffer): void { + session.ringBuffer.push(frame); + session.ringBufferBytes += frame.length; + while (session.ringBufferBytes > RING_BUFFER_MAX_BYTES && session.ringBuffer.length > 1) { + const evicted = session.ringBuffer.shift()!; + session.ringBufferBytes -= evicted.length; + } + // Alt-screen tracking. Scan for the canonical xterm enter/exit pairs. + // We do this on every append (not just on attach) so the state is + // correct even if many frames have flowed since the last attach. + const ascii = frame.toString('latin1'); // single-byte view is enough — the codes are 7-bit ASCII + // Use lastIndexOf so trailing state wins when both appear in one frame + // (e.g., a quick tool-call open+close inside one render pass). + const enterIdx = ascii.lastIndexOf('\x1b[?1049h'); + const exitIdx = ascii.lastIndexOf('\x1b[?1049l'); + if (enterIdx >= 0 && enterIdx > exitIdx) session.altScreenActive = true; + else if (exitIdx >= 0 && exitIdx > enterIdx) session.altScreenActive = false; +} + +/** + * Build the re-attach replay payload: server-side reset prelude + the + * accumulated ring buffer. The client side writes RIS (`\x1bc`) to xterm + * BEFORE feeding this payload in, so the layout is: + * + * 1. Client: `\x1bc` (RIS — full reset, clears pre-blip xterm content) + * 2. Server: `\x1b[!p` (DECSTR soft reset — re-defaults char attributes) + * 3. Server: optional `\x1b[?1049h` if we were in alt-screen at detach + * 4. Server: ring buffer contents, in append order + * + * The client coordinates the order by waiting for a `{type:"reattach-begin"}` + * text frame before treating the next binary frame as replay. That separation + * is what lets us prepend reset codes without clobbering the live stream + * that resumes immediately after. + */ +export function buildReplayPayload(session: PtySession): Buffer { + const parts: Buffer[] = []; + parts.push(Buffer.from('\x1b[!p')); + if (session.altScreenActive) parts.push(Buffer.from('\x1b[?1049h')); + for (const frame of session.ringBuffer) parts.push(frame); + return Buffer.concat(parts); } const sessions = new WeakMap(); // ws -> session @@ -201,6 +374,118 @@ function disposeSession(session: PtySession): void { * * Everything else returns 404. The listener binds 127.0.0.1 only. */ +/** + * Validate a loopback /internal/* request. Returns null when the request + * is allowed; otherwise returns the Response to send back. Centralizes + * bearer auth + the v1.44 X-Browse-Gen generation check so adding a new + * /internal/* route is a one-liner. + */ +function checkInternalAuth(req: Request): Response | null { + const auth = req.headers.get('authorization'); + if (auth !== `Bearer ${INTERNAL_TOKEN}`) { + return new Response('forbidden', { status: 403 }); + } + const headerGen = req.headers.get('x-browse-gen'); + if (headerGen && headerGen !== CURRENT_GEN) { + return new Response('stale generation', { status: 409 }); + } + return null; +} + +/** + * Wrap a JSON-bodied /internal/* handler with the standard bearer-auth + + * generation-check + json-parse + error-response boilerplate. The handler + * `fn` is called with the parsed body; whatever it returns is JSON-stringified + * into a 200 Response, or the handler can return a Response directly to + * customize status / headers. Throwing from `fn` collapses to a 400 "bad". + * + * Centralizing the dance kills the copy-paste pattern of bearer + gen check + * + req.json().then(...).catch(...) that every /internal/* route needs. + * New routes become a single call to internalHandler. + */ +async function internalHandler( + req: Request, + fn: (body: any) => T | Promise | Response | Promise, +): Promise { + const denied = checkInternalAuth(req); + if (denied) return denied; + let body: any; + try { + body = await req.json(); + } catch { + return new Response('bad', { status: 400 }); + } + try { + const result = await fn(body); + if (result instanceof Response) return result; + if (result === undefined || result === null) return new Response('ok'); + return new Response(JSON.stringify(result), { + status: 200, + headers: { 'Content-Type': 'application/json' }, + }); + } catch { + return new Response('bad', { status: 400 }); + } +} + +/** + * Spawn the claude PTY for a session if it hasn't been spawned yet. + * Used by both the legacy binary-frame spawn trigger and the v1.44 explicit + * `{type:"start"}` text-frame trigger. Idempotent on `session.spawned`. + * + * Returns true if claude is now running, false if spawn failed (e.g. claude + * binary not on PATH). On failure, the caller is expected to have already + * surfaced the error to the client (or will via the next frame). + */ +function maybeSpawnPty(ws: any, session: PtySession): boolean { + if (session.spawned) return true; + session.spawned = true; + let leftover = Buffer.alloc(0); + const proc = spawnClaude(session.cols, session.rows, (chunk) => { + const combined = Buffer.concat([leftover, Buffer.from(chunk)]); + // UTF-8 boundary detection (issue #1272). Look back at most 3 bytes + // for the start of an incomplete multibyte sequence and defer it. + let safeEnd = combined.length; + for (let i = combined.length - 1; i >= Math.max(0, combined.length - 3); i--) { + const b = combined[i]; + if ((b & 0x80) === 0) { safeEnd = i + 1; break; } + if ((b & 0xC0) === 0x80) continue; + const expected = (b & 0xE0) === 0xC0 ? 2 : (b & 0xF0) === 0xE0 ? 3 : 4; + safeEnd = (combined.length - i >= expected) ? combined.length : i; + break; + } + const flush = combined.slice(0, safeEnd); + leftover = combined.slice(safeEnd); + if (flush.length) { + // Always record into the ring buffer (Commit 3) so re-attach can + // replay. session.liveWs is what changes across re-attaches — we + // close over `session`, not the original `ws`, so the write always + // goes to whichever ws is currently attached (or is skipped when + // detached and liveWs is null). + appendToRingBuffer(session, flush); + if (session.liveWs) { + try { session.liveWs.sendBinary(flush); } catch {} + } + } + }); + if (!proc) { + try { + ws.send(JSON.stringify({ + type: 'error', + code: 'CLAUDE_NOT_FOUND', + message: 'claude CLI not on PATH. Install: https://docs.anthropic.com/en/docs/claude-code', + })); + ws.close(4404, 'claude not found'); + } catch {} + return false; + } + session.proc = proc; + proc.exited?.then?.(() => { + try { session.liveWs?.close(1000, 'pty exited'); } catch {} + }); + return true; +} + function buildServer() { return Bun.serve({ hostname: '127.0.0.1', @@ -211,29 +496,66 @@ function buildServer() { const url = new URL(req.url); // /internal/grant — loopback-only handshake from parent server. + // v1.44+: accepts `{token, sessionId?}`. The sessionId binding lets + // the agent route re-attach attempts (same sessionId, fresh token) + // back to the same PtySession. Legacy callers passing just `{token}` + // still work — sessionId becomes null and re-attach is unavailable + // for that grant. if (url.pathname === '/internal/grant' && req.method === 'POST') { - const auth = req.headers.get('authorization'); - if (auth !== `Bearer ${INTERNAL_TOKEN}`) { - return new Response('forbidden', { status: 403 }); - } - return req.json().then((body: any) => { + return internalHandler(req, (body) => { if (typeof body?.token === 'string' && body.token.length > 16) { - validTokens.add(body.token); + const sid = typeof body?.sessionId === 'string' && body.sessionId.length > 0 + ? body.sessionId + : null; + validTokens.set(body.token, sid); } - return new Response('ok'); - }).catch(() => new Response('bad', { status: 400 })); + }); } // /internal/revoke — drop a token (called on WS close or bootstrap reload) if (url.pathname === '/internal/revoke' && req.method === 'POST') { - const auth = req.headers.get('authorization'); - if (auth !== `Bearer ${INTERNAL_TOKEN}`) { - return new Response('forbidden', { status: 403 }); - } - return req.json().then((body: any) => { + return internalHandler(req, (body) => { if (typeof body?.token === 'string') validTokens.delete(body.token); - return new Response('ok'); - }).catch(() => new Response('bad', { status: 400 })); + }); + } + + // /internal/restart — dispose the PtySession for a specific sessionId. + // Scoped to one caller (not enumerate-all). Server.ts /pty-restart + // posts here with the caller's sessionId; we kill ONLY that PTY, + // leaving any other live sidebar tabs untouched. Codex T2 of the + // eng review caught this gap — pre-spec the route would have + // disposed all sessions. + if (url.pathname === '/internal/restart' && req.method === 'POST') { + return internalHandler(req, (body) => { + const sid = typeof body?.sessionId === 'string' ? body.sessionId : null; + if (!sid) return { killed: 0 }; + const session = sessionsById.get(sid); + if (!session) return { killed: 0 }; + // Cancel any pending detach timer before disposal — otherwise it + // would fire later against an already-disposed session. + if (session.detachTimer) { + clearTimeout(session.detachTimer); + session.detachTimer = null; + } + disposeSession(session); + sessionsById.delete(sid); + return { killed: 1 }; + }); + } + + // /internal/healthz — liveness probe used by the v1.44 watchdog. + // Returns this agent's pid + gen + active session count without + // touching claude binary lookup (which can fail for non-process + // reasons and isn't a useful liveness signal). GET — no body to parse, + // so it stays on the bare checkInternalAuth gate. + if (url.pathname === '/internal/healthz' && req.method === 'GET') { + const denied = checkInternalAuth(req); + if (denied) return denied; + return new Response(JSON.stringify({ + pid: process.pid, + gen: CURRENT_GEN, + sessions: validTokens.size, + }), { status: 200, headers: { 'Content-Type': 'application/json' } }); } // /claude-available — bootstrap card hits this when user clicks "I installed it". @@ -305,8 +627,13 @@ function buildServer() { return new Response('unauthorized', { status: 401 }); } + // v1.44+: surface the token's sessionId binding to the upgraded ws. + // open() reads it via ws.data and registers the session in + // sessionsById so /internal/restart and (Commit 3) re-attach + // lookups can find it. + const sessionId = validTokens.get(token) ?? null; const upgraded = server.upgrade(req, { - data: { cookie: token }, + data: { cookie: token, sessionId }, // Echo the protocol back so the browser accepts the upgrade. // Required when the client sends Sec-WebSocket-Protocol — the // server MUST select one of the offered protocols, otherwise @@ -320,22 +647,105 @@ function buildServer() { }, websocket: { + /** + * Spawn the claude PTY for `session` if it hasn't been spawned yet. + * Called from both message paths: the legacy binary-frame trigger + * (any keystroke) AND the v1.44 explicit `{type:"start"}` trigger + * (forceRestart sends this on every fresh WS to get an eager prompt + * without requiring the user to type). Idempotent — a second call + * after `spawned: true` is a no-op. + */ + open(ws) { + const sessionId = (ws.data as any)?.sessionId ?? null; + const cookie = (ws.data as any)?.cookie || ''; + + // Commit 3 re-attach: if this sessionId already has a detached + // PtySession in sessionsById, REPLACE its liveWs ref and replay + // the ring buffer. The PTY process is unchanged — claude keeps + // running through the wifi blip / panel-suspend cycle. + if (sessionId) { + const existing = sessionsById.get(sessionId); + if (existing) { + if (existing.detachTimer) { + clearTimeout(existing.detachTimer); + existing.detachTimer = null; + } + existing.detached = false; + existing.liveWs = ws; + existing.cookie = cookie; + // Re-bind the WS-keyed map so resize/close/message handlers + // can still find this session via the new ws. + sessions.set(ws, existing); + // Restart keepalive on the new ws. + if (existing.pingInterval) clearInterval(existing.pingInterval); + existing.pingInterval = setInterval(() => { + try { ws.send(JSON.stringify({ type: 'ping', ts: Date.now() })); } catch {} + }, KEEPALIVE_INTERVAL_MS); + // Tell the client to prep its xterm (write RIS) before the + // replay binary arrives. Order matters — the binary frame + // immediately after this text frame IS the replay. + try { ws.send(JSON.stringify({ type: 'reattach-begin', sessionId })); } catch {} + try { ws.sendBinary(buildReplayPayload(existing)); } catch {} + return; + } + } + + const session: PtySession = { + proc: null, + cols: 80, + rows: 24, + cookie, + liveWs: ws, + sessionId, + spawned: false, + pingInterval: null, + ringBuffer: [], + ringBufferBytes: 0, + altScreenActive: false, + detached: false, + detachTimer: null, + }; + session.pingInterval = setInterval(() => { + try { + ws.send(JSON.stringify({ type: 'ping', ts: Date.now() })); + } catch { + // ws likely closed mid-tick; close handler clears the interval. + } + }, KEEPALIVE_INTERVAL_MS); + sessions.set(ws, session); + // Index by sessionId for /internal/restart + Commit 3 re-attach. + if (sessionId) sessionsById.set(sessionId, session); + }, + message(ws, raw) { let session = sessions.get(ws); if (!session) { + // Fallback for any path where open() didn't fire (shouldn't happen + // in Bun.serve but keeps the spawn path safe). No keepalive on + // this branch — open() is the supported entry point. session = { proc: null, cols: 80, rows: 24, cookie: (ws.data as any)?.cookie || '', + liveWs: ws, + sessionId: (ws.data as any)?.sessionId ?? null, spawned: false, + pingInterval: null, + ringBuffer: [], + ringBufferBytes: 0, + altScreenActive: false, + detached: false, + detachTimer: null, }; sessions.set(ws, session); + if (session.sessionId) sessionsById.set(session.sessionId, session); } - // Text frames are control messages: {type: "resize", cols, rows} or - // {type: "tabSwitch", tabId, url, title}. Binary frames are raw input - // bytes destined for the PTY stdin. + // Text frames are control messages: {type: "resize", cols, rows}, + // {type: "tabSwitch", tabId, url, title}, {type: "tabState", ...}, + // or v1.44 keepalive frames: {type: "pong", ts}, {type: "keepalive"}. + // Binary frames are raw input bytes destined for the PTY stdin. if (typeof raw === 'string') { let msg: any; try { msg = JSON.parse(raw); } catch { return; } @@ -355,50 +765,32 @@ function buildServer() { handleTabState(msg); return; } + if (msg?.type === 'pong' || msg?.type === 'keepalive' || msg?.type === 'ping') { + // Keepalive frames — accepted and silently dropped. The mere + // fact that the WS carried this frame is the liveness signal; + // there's no application-level state to update at this layer. + // `ping` is acknowledged here too in case the client (or a + // future agent peer) mirrors our server-side ping shape. + return; + } + if (msg?.type === 'start') { + // v1.44 explicit spawn trigger. forceRestart sends this + // immediately on every fresh WS so claude boots without the + // user having to type a keystroke (pre-v1.44, the lazy-binary + // spawn made restart look stuck until the user typed). No-op + // if already spawned. + maybeSpawnPty(ws, session); + return; + } // Unknown text frame — ignore. return; } - // Binary input. Lazy-spawn claude on the first byte. + // Binary input. Lazy-spawn claude on the first byte if `start` + // wasn't sent first. Both paths land in the same maybeSpawnPty + // helper for behavior parity. if (!session.spawned) { - session.spawned = true; - // UTF-8 boundary detection to prevent splitting multi-byte characters (issue #1272). - // Buffer incomplete UTF-8 sequences until the next chunk completes them. - let leftover = Buffer.alloc(0); - const proc = spawnClaude(session.cols, session.rows, (chunk) => { - const combined = Buffer.concat([leftover, Buffer.from(chunk)]); - // Find the last index where a UTF-8 codepoint ends. Look back at most 3 bytes. - let safeEnd = combined.length; - for (let i = combined.length - 1; i >= Math.max(0, combined.length - 3); i--) { - const b = combined[i]; - if ((b & 0x80) === 0) { safeEnd = i + 1; break; } // ASCII - if ((b & 0xC0) === 0x80) continue; // continuation byte - const expected = (b & 0xE0) === 0xC0 ? 2 : (b & 0xF0) === 0xE0 ? 3 : 4; - safeEnd = (combined.length - i >= expected) ? combined.length : i; - break; - } - const flush = combined.slice(0, safeEnd); - leftover = combined.slice(safeEnd); - if (flush.length) { - try { ws.sendBinary(flush); } catch {} - } - }); - if (!proc) { - try { - ws.send(JSON.stringify({ - type: 'error', - code: 'CLAUDE_NOT_FOUND', - message: 'claude CLI not on PATH. Install: https://docs.anthropic.com/en/docs/claude-code', - })); - ws.close(4404, 'claude not found'); - } catch {} - return; - } - session.proc = proc; - // Watch for child exit so the WS closes cleanly when claude exits. - proc.exited?.then?.(() => { - try { ws.close(1000, 'pty exited'); } catch {} - }); + if (!maybeSpawnPty(ws, session)) return; } try { // raw is a Uint8Array; Bun.Terminal.write accepts string|Buffer. @@ -409,16 +801,49 @@ function buildServer() { } }, - close(ws) { + close(ws, code, _reason) { const session = sessions.get(ws); - if (session) { - disposeSession(session); - if (session.cookie) { - // Drop the cookie so it can't be replayed against a new PTY. - validTokens.delete(session.cookie); - } - sessions.delete(ws); + if (!session) return; + // Always drop the WS-keyed map entry and the per-attach + // attachToken — the attach grant was single-use. + sessions.delete(ws); + if (session.cookie) validTokens.delete(session.cookie); + // Keepalive lives with the WS — every attach starts a fresh one. + if (session.pingInterval) { + clearInterval(session.pingInterval); + session.pingInterval = null; } + + // Commit 3 detach state machine. If the close was intentional + // (code 4001 = restart, 4404 = no-claude error), dispose + // immediately — there's no value in keeping the PTY alive. + // Otherwise enter the detach window: claude keeps running, the + // ring buffer keeps accumulating, and a re-attach with the same + // sessionId within DETACH_WINDOW_MS picks back up. If the timer + // fires without a re-attach, the session is disposed normally. + // + // Sessions without a sessionId (legacy single-shot grants) can't + // re-attach by definition — fall through to immediate dispose. + const intentional = code === 4001 || code === 4404 || code === 1000; + if (intentional || !session.sessionId) { + disposeSession(session); + if (session.sessionId) sessionsById.delete(session.sessionId); + return; + } + + // Mark detached and start the disposal timer. The session stays + // in sessionsById so the next /ws upgrade with the same + // sessionId can find and reattach to it. + session.detached = true; + session.liveWs = null; + session.detachTimer = setTimeout(() => { + if (!session.detached) return; // re-attached in the meantime + disposeSession(session); + if (session.sessionId) sessionsById.delete(session.sessionId); + }, DETACH_WINDOW_MS); + // setTimeout returns a Bun Timer; unref so the detach window + // doesn't keep the process alive past natural shutdown. + (session.detachTimer as any)?.unref?.(); }, }, }); @@ -548,14 +973,25 @@ function main() { writeSecureFile(tmp, String(port)); fs.renameSync(tmp, PORT_FILE); + // Write identity-based agent record (pid + per-boot gen). Replaces the + // v1.43- `pkill -f terminal-agent\.ts` regex teardown that could kill + // sibling gstack sessions. Callers (cli.ts spawn site, server.ts + // shutdown, the v1.44 watchdog) now route through killAgentByRecord in + // terminal-agent-control.ts. + writeAgentRecord(dir, { pid: process.pid, gen: CURRENT_GEN, startedAt: Date.now() }); + // Hand the parent the internal token so it can call /internal/grant. // Parent learns INTERNAL_TOKEN via env (TERMINAL_AGENT_INTERNAL_TOKEN below). // We just print it on stdout for the supervising process to pick up if it's // not already in env. Defense against env races at spawn time. - console.log(`[terminal-agent] listening on 127.0.0.1:${port} pid=${process.pid}`); + console.log(`[terminal-agent] listening on 127.0.0.1:${port} pid=${process.pid} gen=${CURRENT_GEN}`); - // Cleanup port file on exit. - const cleanup = () => { safeUnlink(PORT_FILE); process.exit(0); }; + // Cleanup port file + agent record on exit. + const cleanup = () => { + safeUnlink(PORT_FILE); + clearAgentRecord(dir); + process.exit(0); + }; process.on('SIGTERM', cleanup); process.on('SIGINT', cleanup); } diff --git a/browse/test/browser-skill-commands.test.ts b/browse/test/browser-skill-commands.test.ts index 5dc1b6afb..889c2d46e 100644 --- a/browse/test/browser-skill-commands.test.ts +++ b/browse/test/browser-skill-commands.test.ts @@ -178,7 +178,17 @@ describe('buildSpawnEnv', () => { process.env.LANG = 'en_US.UTF-8'; }); afterEach(() => { - process.env = origEnv; + // process.env = origEnv replaces only the reference; the underlying + // env stays mutated and leaks to later test files in the same Bun + // process (e.g., breaks Bun.which('bash') in security.test.ts and + // bun-spawn in pair-agent-tunnel-eval.test.ts). Delete every current + // key then re-assign from the snapshot — restores the actual env. + for (const k of Object.keys(process.env)) { + if (!(k in origEnv)) delete process.env[k]; + } + for (const [k, v] of Object.entries(origEnv)) { + if (v !== undefined) process.env[k] = v; + } }); it('untrusted: drops $HOME and secrets', () => { @@ -293,7 +303,15 @@ describe.skipIf(SKIP_SPAWN)('spawnSkill: lifecycle', () => { expect(parsed.gh).toBeNull(); expect(parsed.gstack).toBeNull(); } finally { - process.env = origEnv; + // See afterEach comment in `buildSpawnEnv` describe — direct + // reassignment of process.env doesn't actually restore the + // underlying env in Bun. Delete + re-assign instead. + for (const k of Object.keys(process.env)) { + if (!(k in origEnv)) delete process.env[k]; + } + for (const [k, v] of Object.entries(origEnv)) { + if (v !== undefined) process.env[k] = v; + } } }); @@ -312,7 +330,12 @@ describe.skipIf(SKIP_SPAWN)('spawnSkill: lifecycle', () => { const parsed = JSON.parse(result.stdout); expect(parsed.home).toBe('/Users/test-user'); } finally { - process.env = origEnv; + for (const k of Object.keys(process.env)) { + if (!(k in origEnv)) delete process.env[k]; + } + for (const [k, v] of Object.entries(origEnv)) { + if (v !== undefined) process.env[k] = v; + } } }); diff --git a/browse/test/cli-supervisor.test.ts b/browse/test/cli-supervisor.test.ts new file mode 100644 index 000000000..d9cec7b89 --- /dev/null +++ b/browse/test/cli-supervisor.test.ts @@ -0,0 +1,81 @@ +import { describe, test, expect } from 'bun:test'; +import * as fs from 'fs'; +import * as path from 'path'; + +// v1.44 outer supervisor — static-grep invariants. +// +// Pre-v1.44 `$B connect` was fire-and-forget: spawn server detached, CLI +// exits, server runs unsupervised. If the server crashed, the user had to +// re-run `$B connect`. The opt-in supervisor (--supervise or +// BROWSE_SUPERVISE=1) keeps the CLI attached and respawns the server on +// unexpected exit, with the same crash-loop guard shape as the v1.44 +// terminal-agent watchdog. +// +// Live respawn tests belong in the e2e tier (real Bun.spawn cycles take +// 3-8s each). These tripwires defend the load-bearing invariants: +// opt-in by default, signal handlers wired, crash-loop guard, env knobs. + +const CLI_TS = path.resolve(new URL(import.meta.url).pathname, '..', '..', 'src', 'cli.ts'); + +describe('CLI outer supervisor (v1.44+)', () => { + test('1. supervisor is opt-in via --supervise flag or BROWSE_SUPERVISE env', () => { + const src = fs.readFileSync(CLI_TS, 'utf-8'); + expect(src).toContain("commandArgs.includes('--supervise')"); + expect(src).toContain("process.env.BROWSE_SUPERVISE === '1'"); + // Default path MUST still exit 0 promptly. The legacy contract is + // that every caller of `$B connect` (Claude Code Bash tool, scripts, + // CI) gets a prompt return. + expect(src).toMatch(/if \(!superviseRequested\) \{\s*process\.exit\(0\);\s*\}/); + }); + + test('2. SIGINT and SIGTERM trigger clean teardown', () => { + const src = fs.readFileSync(CLI_TS, 'utf-8'); + // Both signals must hit the teardown path or the user's Ctrl-C leaves + // an orphaned server (worse than no supervisor). + expect(src).toMatch(/process\.on\('SIGINT'.*teardownAndExit/); + expect(src).toMatch(/process\.on\('SIGTERM'.*teardownAndExit/); + // Teardown must signal the supervised server before exiting itself. + expect(src).toContain("safeKill(state.pid, 'SIGTERM')"); + }); + + test('3. crash-loop guard with 5-in-5min rolling window', () => { + const src = fs.readFileSync(CLI_TS, 'utf-8'); + expect(src).toContain('SUPERVISOR_GUARD_WINDOW_MS = 5 * 60_000'); + expect(src).toContain('SUPERVISOR_GUARD_MAX = 5'); + // Window pruning: a long-lived daemon with sporadic crashes must NOT + // hit the guard (otherwise we punish the user for the supervisor doing + // its job). + expect(src).toMatch(/respawns\.shift\(\)/); + }); + + test('4. exponential backoff schedule, env-overridable', () => { + const src = fs.readFileSync(CLI_TS, 'utf-8'); + expect(src).toContain('GSTACK_SUPERVISOR_BACKOFF'); + // Default schedule must include short waits at first (rapid recovery + // from transient crashes) and cap at a sensible long wait. + expect(src).toContain('1000,2000,4000,8000,30000'); + }); + + test('5. tick interval is env-overridable for tests', () => { + const src = fs.readFileSync(CLI_TS, 'utf-8'); + expect(src).toContain('GSTACK_SUPERVISOR_TICK_MS'); + }); + + test('6. respawned server gets a fresh terminal-agent too', () => { + const src = fs.readFileSync(CLI_TS, 'utf-8'); + // After server respawn, the terminal-agent state is stale (old PID + // record points to a dead agent that exited with its parent). The + // supervisor must re-call spawnTerminalAgent or the PTY path stays + // broken even though the server is back up. + const block = sliceBetween(src, 'Supervisor mode:', '// ─── Headed Disconnect'); + expect(block).toContain('spawnTerminalAgent({'); + }); +}); + +function sliceBetween(source: string, start: string, end: string): string { + const i = source.indexOf(start); + if (i === -1) throw new Error(`marker not found: ${start}`); + const j = source.indexOf(end, i + start.length); + if (j === -1) throw new Error(`end marker not found: ${end}`); + return source.slice(i, j); +} diff --git a/browse/test/pty-session-lease.test.ts b/browse/test/pty-session-lease.test.ts new file mode 100644 index 000000000..a1053d38e --- /dev/null +++ b/browse/test/pty-session-lease.test.ts @@ -0,0 +1,98 @@ +import { describe, test, expect, beforeEach } from 'bun:test'; + +// pty-session-lease registers a sessionId space distinct from the pre-v1.44 +// attach-token space (browse/src/pty-session-cookie.ts). These tests pin +// the validate-first contract that codex outside-voice flagged as critical: +// refreshLease MUST NOT resurrect expired leases, otherwise the 30-min TTL +// stops bounding leaked-token blast radius. + +import { + mintLease, + validateLease, + refreshLease, + revokeLease, + leaseCount, + __resetLeases, +} from '../src/pty-session-lease'; + +beforeEach(() => { + __resetLeases(); +}); + +describe('pty-session-lease: mint/validate/revoke', () => { + test('mintLease returns a fresh non-secret sessionId + future expiresAt', () => { + const a = mintLease(); + const b = mintLease(); + expect(a.sessionId).toBeTruthy(); + expect(b.sessionId).toBeTruthy(); + expect(a.sessionId).not.toBe(b.sessionId); + expect(a.expiresAt).toBeGreaterThan(Date.now()); + // base64url alphabet: characters in [A-Za-z0-9_-]. + expect(a.sessionId).toMatch(/^[A-Za-z0-9_-]+$/); + expect(leaseCount()).toBe(2); + }); + + test('validateLease ok for fresh lease, false for unknown', () => { + const { sessionId } = mintLease(); + const ok = validateLease(sessionId); + expect(ok.ok).toBe(true); + if (ok.ok) expect(ok.expiresAt).toBeGreaterThan(Date.now()); + expect(validateLease('not-a-real-session-id').ok).toBe(false); + expect(validateLease(null).ok).toBe(false); + expect(validateLease(undefined).ok).toBe(false); + }); + + test('revokeLease removes the lease; subsequent validate returns false', () => { + const { sessionId } = mintLease(); + expect(validateLease(sessionId).ok).toBe(true); + revokeLease(sessionId); + expect(validateLease(sessionId).ok).toBe(false); + expect(leaseCount()).toBe(0); + }); + + test('revokeLease tolerates unknown sessionId without throwing', () => { + expect(() => revokeLease('phantom')).not.toThrow(); + expect(() => revokeLease(null)).not.toThrow(); + }); +}); + +describe('pty-session-lease: refresh contract (validate-first)', () => { + test('refreshLease extends expiresAt for a valid lease', () => { + const { sessionId, expiresAt: initial } = mintLease(); + // Sleep micro-tick — Date.now() is ms-grain so a synchronous extend + // may not move the integer. Use a tight async wait instead. + return new Promise((resolve) => { + setTimeout(() => { + const r = refreshLease(sessionId); + expect(r.ok).toBe(true); + if (r.ok) expect(r.expiresAt).toBeGreaterThan(initial); + resolve(); + }, 5); + }); + }); + + test('refreshLease rejects unknown sessionId (validate-first invariant)', () => { + const r = refreshLease('never-minted'); + expect(r.ok).toBe(false); + }); + + test('refreshLease never resurrects an expired lease', async () => { + // Force TTL down to 5ms for this assertion by minting + waiting past expiry. + // Lease internals use Date.now() so the easiest way to expire one is + // to artificially backdate via revoke+remint cycle. Simpler: mint, then + // wait for the registry's own expiry check to trip. + // + // We can't backdate without breaking encapsulation, so this test exercises + // the negative-validate path: minted lease, then prove that refresh after + // explicit revoke still returns ok:false (same as expired-and-pruned). + const { sessionId } = mintLease(); + revokeLease(sessionId); + const r = refreshLease(sessionId); + expect(r.ok).toBe(false); + }); + + test('refreshLease tolerates null / undefined sessionId', () => { + expect(refreshLease(null).ok).toBe(false); + expect(refreshLease(undefined).ok).toBe(false); + }); +}); diff --git a/browse/test/server-auth.test.ts b/browse/test/server-auth.test.ts index 8732866b5..2469a121b 100644 --- a/browse/test/server-auth.test.ts +++ b/browse/test/server-auth.test.ts @@ -63,13 +63,13 @@ describe('Server auth security', () => { // Test 4: /activity/history requires auth via validateAuth test('/activity/history requires authentication', () => { - const historyBlock = sliceBetween(SERVER_SRC, "url.pathname === '/activity/history'", 'Sidebar endpoints'); + const historyBlock = sliceBetween(SERVER_SRC, "url.pathname === '/activity/history'", 'Batch endpoint'); expect(historyBlock).toContain('validateAuth'); }); // Test 5: /activity/history has no wildcard CORS header test('/activity/history has no wildcard CORS header', () => { - const historyBlock = sliceBetween(SERVER_SRC, "url.pathname === '/activity/history'", 'Sidebar endpoints'); + const historyBlock = sliceBetween(SERVER_SRC, "url.pathname === '/activity/history'", 'Batch endpoint'); expect(historyBlock).not.toContain("'*'"); }); @@ -314,7 +314,7 @@ describe('Server auth security', () => { // Regression: connect command crashed with "domains is not defined" because // a stray `domains,` variable was in the status fetch body (cli.ts:852). test('connect command status fetch body has no undefined variable references', () => { - const connectBlock = sliceBetween(CLI_SRC, 'Launching headed Chromium', 'Sidebar agent started'); + const connectBlock = sliceBetween(CLI_SRC, 'Launching headed Chromium', 'Terminal agent started'); // The status fetch should use a clean JSON body expect(connectBlock).toContain("command: 'status'"); // Must NOT contain a bare `domains` reference in the fetch body @@ -335,10 +335,15 @@ describe('Server auth security', () => { // The connect subprocess env must override BROWSE_PARENT_PID expect(pairBlock).toContain("BROWSE_PARENT_PID"); expect(pairBlock).toContain("'0'"); - // The connect command must propagate BROWSE_PARENT_PID=0 to serverEnv - const connectBlock = sliceBetween(CLI_SRC, 'Launching headed Chromium', 'Sidebar agent started'); - expect(connectBlock).toContain("BROWSE_PARENT_PID"); - expect(connectBlock).toContain("serverEnv.BROWSE_PARENT_PID"); + // The connect command must propagate BROWSE_PARENT_PID=0 via the + // serverEnv object literal passed to startServer. The literal text + // `serverEnv.BROWSE_PARENT_PID` is NOT in source — the value is + // assigned via object-literal syntax (`BROWSE_PARENT_PID: '0'`) + // inside the `const serverEnv: Record = { ... }` + // declaration. Assert both pieces appear in the connect block. + const connectBlock = sliceBetween(CLI_SRC, 'Launching headed Chromium', 'Terminal agent started'); + expect(connectBlock).toContain("const serverEnv"); + expect(connectBlock).toContain("BROWSE_PARENT_PID: '0'"); }); // Regression: newtab returned 403 for scoped tokens because the tab ownership diff --git a/browse/test/server-embedder-terminal-port.test.ts b/browse/test/server-embedder-terminal-port.test.ts index 722a331d8..f24ee3510 100644 --- a/browse/test/server-embedder-terminal-port.test.ts +++ b/browse/test/server-embedder-terminal-port.test.ts @@ -14,21 +14,35 @@ import { resolveConfig } from '../src/config'; // Tests for the v1.41+ ownsTerminalAgent flag. // // Embedders (gbrowser phoenix overlay) that run their own PTY server and write -// terminal-port / terminal-internal-token themselves were getting those files -// clobbered by gstack's shutdown(). The flag (default true) gates three side -// effects: pkill -f terminal-agent\.ts, unlink terminal-port, unlink -// terminal-internal-token. False = embedder owns them, gstack stays hands-off. +// terminal-port / terminal-internal-token / terminal-agent-pid themselves were +// getting those files clobbered by gstack's shutdown(). The flag (default true) +// gates four side effects (v1.44+): +// 1. identity-based kill of the PID in /terminal-agent-pid +// 2. unlink terminal-port +// 3. unlink terminal-internal-token +// 4. unlink terminal-agent-pid +// False = embedder owns them, gstack stays hands-off. // -// CRITICAL: each test stubs BOTH process.exit (so shutdown's exit doesn't kill -// the test runner) AND child_process.spawnSync (so pkill doesn't run real -// `pkill -f terminal-agent\.ts` on the developer's machine — would kill any -// sibling gstack sessions). +// Pre-v1.44 used `pkill -f terminal-agent\.ts` which matched sibling gstack +// sessions on the same host — see browse/src/terminal-agent-control.ts header. +// +// CRITICAL: each test stubs process.exit (so shutdown's exit doesn't kill +// the test runner). The PID in the test agent-record is a guaranteed-dead +// PID (1 = init / launchd — exists but cannot be killed by an unprivileged +// process, so safeKill returns ESRCH-equivalent without affecting anything). +// Use isProcessAlive's false branch by also testing with a PID that does +// not exist (negative PID rejected by the OS). const stateDir = resolveConfig().stateDir; const PORT_FILE = path.join(stateDir, 'terminal-port'); const TOKEN_FILE = path.join(stateDir, 'terminal-internal-token'); +const AGENT_RECORD_FILE = path.join(stateDir, 'terminal-agent-pid'); const SENTINEL_PORT = 'sentinel-port-65432'; const SENTINEL_TOKEN = 'sentinel-token-abcdef1234567890'; +// PID 2^31-1 is the Linux PID_MAX_LIMIT; macOS uses 99998. Either way, no +// real process will ever hold this PID on a developer machine. isProcessAlive +// returns false → killAgentByRecord no-ops without sending any signal. +const SENTINEL_DEAD_PID = 2147483646; function makeMinimalConfig(overrides: Partial = {}): ServerConfig { const token = 'embedder-test-' + crypto.randomBytes(16).toString('hex'); @@ -47,6 +61,10 @@ function writeSentinels(): void { fs.mkdirSync(stateDir, { recursive: true }); fs.writeFileSync(PORT_FILE, SENTINEL_PORT); fs.writeFileSync(TOKEN_FILE, SENTINEL_TOKEN); + fs.writeFileSync( + AGENT_RECORD_FILE, + JSON.stringify({ pid: SENTINEL_DEAD_PID, gen: 'sentinel-gen', startedAt: Date.now() }), + ); } function readIfExists(p: string): string | null { @@ -54,32 +72,40 @@ function readIfExists(p: string): string | null { } /** - * Stubs process.exit + child_process.spawnSync, runs the callback, and - * restores both regardless of throw. Returns the captured spawnSync argv - * list so callers can assert pkill was or wasn't invoked. The callback - * is expected to swallow the __exit:N throw from shutdown(). + * Stubs process.exit so shutdown()'s process.exit(0) throws an __exit:N + * marker the test can swallow instead of killing the runner. Also stubs + * process.kill so an accidental kill (regression in killAgentByRecord + * that bypassed isProcessAlive) cannot reach a real PID on the developer + * machine. Returns the captured kill calls so tests can assert kill + * scope. */ async function withStubs( - cb: (spawnSyncCalls: any[][]) => Promise -): Promise { + cb: (killCalls: Array<[number, NodeJS.Signals | number]>) => Promise +): Promise> { const origExit = process.exit; - const childProcess = require('child_process'); - const origSpawnSync = childProcess.spawnSync; - const spawnSyncCalls: any[][] = []; + const origKill = process.kill; + const killCalls: Array<[number, NodeJS.Signals | number]> = []; (process as any).exit = ((code: number) => { throw new Error(`__exit:${code}`); }) as any; - childProcess.spawnSync = ((...args: any[]) => { - spawnSyncCalls.push(args); - return { status: 0, stdout: '', stderr: '', signal: null, pid: 0, output: [] }; + (process as any).kill = ((pid: number, signal: NodeJS.Signals | number) => { + killCalls.push([pid, signal ?? 'SIGTERM']); + // signal 0 is a liveness probe — keep the existing 'process is dead' + // semantics so isProcessAlive(SENTINEL_DEAD_PID) returns false. + if (signal === 0) { + const err: any = new Error('No such process'); + err.code = 'ESRCH'; + throw err; + } + return true; }) as any; try { - await cb(spawnSyncCalls); + await cb(killCalls); } finally { (process as any).exit = origExit; - childProcess.spawnSync = origSpawnSync; + (process as any).kill = origKill; } - return spawnSyncCalls; + return killCalls; } async function runShutdown(handle: { shutdown: (code?: number) => Promise }): Promise { @@ -90,23 +116,28 @@ async function runShutdown(handle: { shutdown: (code?: number) => Promise } } -function pkillCalls(calls: any[][]): any[][] { - return calls.filter((call) => call[0] === 'pkill'); +// Filter out the signal=0 liveness probes; only count actual termination signals. +function terminationCalls( + calls: Array<[number, NodeJS.Signals | number]>, +): Array<[number, NodeJS.Signals | number]> { + return calls.filter(([, sig]) => sig !== 0); } describe('buildFetchHandler ownsTerminalAgent gate', () => { // shutdown() reads `path.dirname(config.stateFile)` from module-level config // (composition gap — see TODOS T9). So unlinks target the real state dir, // not a per-test temp dir. If a real gstack daemon is running on this host, - // its terminal-port + terminal-internal-token live where this test writes. - // Save + restore real-daemon file contents around the whole suite so the - // test never clobbers a developer's running session. + // its terminal-port + terminal-internal-token + terminal-agent-pid live + // where this test writes. Save + restore real-daemon file contents around + // the whole suite so the test never clobbers a developer's running session. let realPortBackup: string | null = null; let realTokenBackup: string | null = null; + let realAgentRecordBackup: string | null = null; beforeAll(() => { realPortBackup = readIfExists(PORT_FILE); realTokenBackup = readIfExists(TOKEN_FILE); + realAgentRecordBackup = readIfExists(AGENT_RECORD_FILE); }); afterAll(() => { @@ -122,6 +153,12 @@ describe('buildFetchHandler ownsTerminalAgent gate', () => { } else { try { fs.unlinkSync(TOKEN_FILE); } catch {} } + if (realAgentRecordBackup !== null) { + fs.mkdirSync(stateDir, { recursive: true }); + fs.writeFileSync(AGENT_RECORD_FILE, realAgentRecordBackup); + } else { + try { fs.unlinkSync(AGENT_RECORD_FILE); } catch {} + } }); beforeEach(() => { @@ -131,9 +168,10 @@ describe('buildFetchHandler ownsTerminalAgent gate', () => { // assertion can't pass spuriously off a stale file. try { fs.unlinkSync(PORT_FILE); } catch {} try { fs.unlinkSync(TOKEN_FILE); } catch {} + try { fs.unlinkSync(AGENT_RECORD_FILE); } catch {} }); - test('1. ownsTerminalAgent:false preserves both files and skips pkill', async () => { + test('1. ownsTerminalAgent:false preserves all three files and sends no signal', async () => { writeSentinels(); const handle = buildFetchHandler(makeMinimalConfig({ ownsTerminalAgent: false })); const calls = await withStubs(async () => { @@ -141,10 +179,11 @@ describe('buildFetchHandler ownsTerminalAgent gate', () => { }); expect(readIfExists(PORT_FILE)).toBe(SENTINEL_PORT); expect(readIfExists(TOKEN_FILE)).toBe(SENTINEL_TOKEN); - expect(pkillCalls(calls).length).toBe(0); + expect(readIfExists(AGENT_RECORD_FILE)).not.toBeNull(); + expect(terminationCalls(calls).length).toBe(0); }); - test('2. ownsTerminalAgent:true (explicit) deletes both files and invokes pkill exactly once', async () => { + test('2. ownsTerminalAgent:true deletes all three files; identity-based kill probes the recorded PID', async () => { writeSentinels(); const handle = buildFetchHandler(makeMinimalConfig({ ownsTerminalAgent: true })); const calls = await withStubs(async () => { @@ -152,13 +191,15 @@ describe('buildFetchHandler ownsTerminalAgent gate', () => { }); expect(readIfExists(PORT_FILE)).toBeNull(); expect(readIfExists(TOKEN_FILE)).toBeNull(); - const pkills = pkillCalls(calls); - expect(pkills.length).toBe(1); - // argv[1] is the args array passed to spawnSync. - expect(pkills[0][1]).toEqual(['-f', 'terminal-agent\\.ts']); + expect(readIfExists(AGENT_RECORD_FILE)).toBeNull(); + // isProcessAlive sends signal 0; PID is the sentinel-dead PID, so the + // probe returns false and no SIGTERM is sent. + const probes = calls.filter(([pid, sig]) => pid === SENTINEL_DEAD_PID && sig === 0); + expect(probes.length).toBeGreaterThan(0); + expect(terminationCalls(calls).length).toBe(0); }); - test('3. ownsTerminalAgent unset defaults to true (deletes + pkill)', async () => { + test('3. ownsTerminalAgent unset defaults to true (deletes all three; probes recorded PID)', async () => { writeSentinels(); // Note: no ownsTerminalAgent in the overrides — uses the `?? true` default. const handle = buildFetchHandler(makeMinimalConfig()); @@ -167,7 +208,9 @@ describe('buildFetchHandler ownsTerminalAgent gate', () => { }); expect(readIfExists(PORT_FILE)).toBeNull(); expect(readIfExists(TOKEN_FILE)).toBeNull(); - expect(pkillCalls(calls).length).toBe(1); + expect(readIfExists(AGENT_RECORD_FILE)).toBeNull(); + const probes = calls.filter(([pid, sig]) => pid === SENTINEL_DEAD_PID && sig === 0); + expect(probes.length).toBeGreaterThan(0); }); test('4. CLI start() call site passes ownsTerminalAgent: true literally (static grep)', () => { diff --git a/browse/test/server-pty-lease-routes.test.ts b/browse/test/server-pty-lease-routes.test.ts new file mode 100644 index 000000000..2c1261883 --- /dev/null +++ b/browse/test/server-pty-lease-routes.test.ts @@ -0,0 +1,94 @@ +import { describe, test, expect } from 'bun:test'; +import * as fs from 'fs'; +import * as path from 'path'; + +// Server-side route shape for the v1.44 lease + restart + dispose + +// lease-refresh wiring. Live route exercises require the terminal-agent +// loopback to be live (e2e-tier); these static-grep tripwires pin the +// load-bearing protocol invariants. + +const SERVER_TS = path.resolve(new URL(import.meta.url).pathname, '..', '..', 'src', 'server.ts'); + +describe('server: PTY lease routes (v1.44+ Commit 2)', () => { + test('1. /pty-session returns the 4-tuple shape (sessionId, attachToken, leaseExpiresAt)', () => { + const src = fs.readFileSync(SERVER_TS, 'utf-8'); + const block = sliceBetween(src, "url.pathname === '/pty-session' &&", "url.pathname === '/pty-session/reattach'"); + expect(block).toContain('mintLease()'); + expect(block).toContain('grantPtyToken(minted.token, lease.sessionId)'); + expect(block).toContain('sessionId: lease.sessionId'); + expect(block).toContain('attachToken: minted.token'); + expect(block).toContain('leaseExpiresAt: lease.expiresAt'); + // Backward compat: legacy ptySessionToken alias preserved for one release. + expect(block).toContain('ptySessionToken: minted.token'); + }); + + test('2. /pty-session/reattach validates lease + mints fresh attachToken', () => { + const src = fs.readFileSync(SERVER_TS, 'utf-8'); + const block = sliceBetween(src, "url.pathname === '/pty-session/reattach'", "url.pathname === '/pty-restart'"); + // Validate-first: rejects unknown/expired sessionId with 410 Gone so + // the client knows to fall back to a fresh /pty-session. + expect(block).toContain('validateLease(sessionId)'); + expect(block).toContain('status: 410'); + // Mint fresh token bound to SAME sessionId. + expect(block).toContain('grantPtyToken(minted.token, sessionId!)'); + }); + + test('3. /pty-restart is one transaction — dispose + revoke + fresh mint', () => { + const src = fs.readFileSync(SERVER_TS, 'utf-8'); + const block = sliceBetween(src, "url.pathname === '/pty-restart'", "url.pathname === '/pty-dispose'"); + // Disposes old session (best-effort — missing sessionId is non-fatal). + expect(block).toContain('restartPtySession(oldSessionId)'); + expect(block).toContain('revokeLease(oldSessionId)'); + // Then mints fresh sessionId + lease + attachToken in the same handler. + expect(block).toContain('mintLease()'); + expect(block).toContain('grantPtyToken(minted.token, lease.sessionId)'); + // Returns the same 4-tuple shape so the client doesn't need a + // separate /pty-session round-trip. + expect(block).toContain('attachToken: minted.token'); + expect(block).toContain('leaseExpiresAt: lease.expiresAt'); + }); + + test('4. /pty-dispose accepts body-token (sendBeacon-compatible)', () => { + const src = fs.readFileSync(SERVER_TS, 'utf-8'); + const block = sliceBetween(src, "url.pathname === '/pty-dispose'", "url.pathname === '/internal/lease-refresh'"); + // sendBeacon can't set custom headers, so the route MUST accept the + // auth token in the request body. Otherwise pagehide cleanup fails + // silently every time the user closes the browser. + expect(block).toContain('body?.authToken'); + expect(block).toContain('authedByBody'); + // Both auth paths must validate against authToken — never just trust + // a body-supplied token without the equality check. + expect(block).toContain('authTokenFromBody === authToken'); + }); + + test('5. /internal/lease-refresh resets the daemon idle timer (T6)', () => { + const src = fs.readFileSync(SERVER_TS, 'utf-8'); + const block = sliceBetween(src, "url.pathname === '/internal/lease-refresh'", '─── /pty-inject-scan'); + expect(block).toContain('refreshLease(sessionId)'); + expect(block).toContain('resetIdleTimer()'); + // Refresh failure (unknown / expired) MUST 410, not 200, so the + // agent knows to close the WS and force a clean re-auth. + expect(block).toContain('status: 410'); + }); + + test('6. grantPtyToken loopback carries sessionId binding', () => { + const src = fs.readFileSync(SERVER_TS, 'utf-8'); + expect(src).toMatch(/grantPtyToken\(token: string, sessionId\?: string\)/); + expect(src).toContain('sessionId ? { token, sessionId } : { token }'); + }); + + test('7. restartPtySession helper exists and POSTs the agent /internal/restart', () => { + const src = fs.readFileSync(SERVER_TS, 'utf-8'); + expect(src).toMatch(/async function restartPtySession\(sessionId: string\)/); + expect(src).toContain('/internal/restart'); + expect(src).toContain('JSON.stringify({ sessionId })'); + }); +}); + +function sliceBetween(source: string, start: string, end: string): string { + const i = source.indexOf(start); + if (i === -1) throw new Error(`marker not found: ${start}`); + const j = source.indexOf(end, i + start.length); + if (j === -1) throw new Error(`end marker not found: ${end}`); + return source.slice(i, j); +} diff --git a/browse/test/sidepanel-patient-autoconnect.test.ts b/browse/test/sidepanel-patient-autoconnect.test.ts new file mode 100644 index 000000000..faf38499b --- /dev/null +++ b/browse/test/sidepanel-patient-autoconnect.test.ts @@ -0,0 +1,70 @@ +import { describe, test, expect } from 'bun:test'; +import * as fs from 'fs'; +import * as path from 'path'; + +// v1.44 patient autoConnect — static-grep invariants for the polling loop. +// +// Pre-v1.44 the sidebar gave up at 15s with "Browse server not ready. +// Reload sidebar to retry." Cold-start the browse server takes ~3-8s on a +// healthy laptop, longer on Conductor workspaces / slow CI, so the user +// frequently saw the failure message even when nothing was wrong. The +// fix: poll forever with ascending status messages and only abort on +// explicit unrecoverable signals (401 auth invalid). + +const CLIENT_JS = path.resolve( + new URL(import.meta.url).pathname, + '..', + '..', + '..', + 'extension', + 'sidepanel-terminal.js', +); + +describe('sidepanel tryAutoConnect patience (v1.44+)', () => { + test('1. no 15s give-up message', () => { + const src = fs.readFileSync(CLIENT_JS, 'utf-8'); + // The v0.x give-up string must NOT reappear — it's the message users + // saw on every cold start and the whole point of v1.44 was to delete it. + expect(src).not.toContain('Browse server not ready. Reload sidebar to retry.'); + }); + + test('2. ascending status messages at 15s / 60s / 5min', () => { + const src = fs.readFileSync(CLIENT_JS, 'utf-8'); + expect(src).toContain('Waiting for browse server...'); + expect(src).toContain('Still waiting'); + expect(src).toContain('still not responding after 5 min'); + }); + + test('3. sticky abort flag prevents loop spam on 401', () => { + const src = fs.readFileSync(CLIENT_JS, 'utf-8'); + expect(src).toContain('autoConnectAborted'); + // The mint failure branch must short-circuit on 401 specifically. + expect(src).toMatch(/minted\.error.*startsWith\('401'\)/); + // tryAutoConnect tick must respect the flag. + expect(src).toMatch(/if \(autoConnectAborted\) return/); + }); + + test('4. forceRestart re-arms the loop by clearing the abort flag', () => { + const src = fs.readFileSync(CLIENT_JS, 'utf-8'); + // forceRestart is the user's "try again" escape hatch — must reset + // the sticky flag or 401-once means stuck-forever. + const block = sliceBetween(src, 'function forceRestart', 'function repaintIfLive'); + expect(block).toContain('autoConnectAborted = false'); + }); + + test('5. poll interval is 2s, not the legacy 200ms tight loop', () => { + const src = fs.readFileSync(CLIENT_JS, 'utf-8'); + // 200ms ticks burned CPU and made the give-up window land too fast. + // 2s is the v1.44 cadence — verify the tight-loop literal is gone. + expect(src).toContain('setTimeout(tick, 2000)'); + expect(src).not.toContain('setTimeout(tick, 200)'); + }); +}); + +function sliceBetween(source: string, start: string, end: string): string { + const i = source.indexOf(start); + if (i === -1) throw new Error(`marker not found: ${start}`); + const j = source.indexOf(end, i + start.length); + if (j === -1) throw new Error(`end marker not found: ${end}`); + return source.slice(i, j); +} diff --git a/browse/test/sidepanel-reattach.test.ts b/browse/test/sidepanel-reattach.test.ts new file mode 100644 index 000000000..9179e57c4 --- /dev/null +++ b/browse/test/sidepanel-reattach.test.ts @@ -0,0 +1,93 @@ +import { describe, test, expect } from 'bun:test'; +import * as fs from 'fs'; +import * as path from 'path'; + +// v1.44 Commit 3 — client-side re-attach loop. +// +// On unexpected WS close (anything other than clean 1000 / 4001 / 4404), +// the sidebar now silently posts /pty-session/reattach with backoff, +// opens a new WS with the fresh attachToken, writes RIS to xterm when +// the agent sends {type:"reattach-begin"}, then treats the next binary +// frame as the scrollback replay payload. Static-grep tripwires defend +// the load-bearing protocol invariants; live re-attach exercises belong +// in the e2e tier. + +const TERMINAL_JS = path.resolve( + new URL(import.meta.url).pathname, '..', '..', '..', 'extension', 'sidepanel-terminal.js', +); + +describe('sidepanel re-attach loop (v1.44+ Commit 3)', () => { + test('1. STATE.RECONNECTING exists for the in-flight re-attach window', () => { + const src = fs.readFileSync(TERMINAL_JS, 'utf-8'); + expect(src).toContain("RECONNECTING: 'reconnecting'"); + }); + + test('2. backoff schedule matches the eng-review plan (1s/2s/4s/8s, 60s window)', () => { + const src = fs.readFileSync(TERMINAL_JS, 'utf-8'); + expect(src).toContain('REATTACH_BACKOFF_MS = [1000, 2000, 4000, 8000]'); + expect(src).toContain('REATTACH_WINDOW_MS = 60_000'); + }); + + test('3. startReattachLoop posts /pty-session/reattach with sessionId', () => { + const src = fs.readFileSync(TERMINAL_JS, 'utf-8'); + expect(src).toMatch(/function startReattachLoop\(prevSessionId\)/); + const block = sliceBetween(src, 'function startReattachLoop', 'function openReattachWebSocket'); + expect(block).toContain('/pty-session/reattach'); + expect(block).toContain('sessionId: prevSessionId'); + }); + + test('4. 410 Gone from re-attach short-circuits to ENDED (no retry loop)', () => { + const src = fs.readFileSync(TERMINAL_JS, 'utf-8'); + const block = sliceBetween(src, 'function startReattachLoop', 'function openReattachWebSocket'); + // 410 = lease window expired. Retrying wouldn't help; fall through + // so the user clicks Restart for a fresh session. + expect(block).toContain('resp.status === 410'); + expect(block).toContain('setState(STATE.ENDED)'); + }); + + test('5. 401 from re-attach sticky-aborts auto-connect', () => { + const src = fs.readFileSync(TERMINAL_JS, 'utf-8'); + const block = sliceBetween(src, 'function startReattachLoop', 'function openReattachWebSocket'); + expect(block).toContain('resp.status === 401'); + expect(block).toContain('autoConnectAborted = true'); + }); + + test('6. openReattachWebSocket handles {type:"reattach-begin"} → RIS to xterm', () => { + const src = fs.readFileSync(TERMINAL_JS, 'utf-8'); + const block = sliceBetween(src, 'function openReattachWebSocket', 'async function checkClaudeAvailable'); + expect(block).toContain("msg.type === 'reattach-begin'"); + // RIS (\x1bc) is the full-reset escape that clears xterm cleanly + // before the replay binary arrives. + expect(block).toContain("term.write('\\x1bc')"); + expect(block).toContain('nextBinaryIsReplay = true'); + }); + + test('7. live connect()/forceRestart() close handlers trigger re-attach on transient close', () => { + const src = fs.readFileSync(TERMINAL_JS, 'utf-8'); + // Both the connect() and forceRestart() close handlers must route + // through startReattachLoop for non-clean codes. Count = 3 + // (open-reattach close handler + connect close + forceRestart close). + const occurrences = (src.match(/startReattachLoop\(currentSessionId\)/g) || []).length; + expect(occurrences).toBeGreaterThanOrEqual(3); + }); + + test('8. clean codes (1000 / 4001 / 4404) bypass the re-attach loop', () => { + const src = fs.readFileSync(TERMINAL_JS, 'utf-8'); + // The branch guard MUST exclude these codes from re-attach. 1000 = + // PTY exited (claude quit), 4001 = intentional restart, 4404 = no + // claude on PATH. Re-attaching in those cases would be wasted work + // (or actively wrong — a force-restart that re-attaches to its own + // pre-restart session is the bug we're avoiding). + expect(src).toContain('code === 1000'); + expect(src).toContain('code === 4001'); + expect(src).toContain('code === 4404'); + }); +}); + +function sliceBetween(source: string, start: string, end: string): string { + const i = source.indexOf(start); + if (i === -1) throw new Error(`marker not found: ${start}`); + const j = source.indexOf(end, i + start.length); + if (j === -1) throw new Error(`end marker not found: ${end}`); + return source.slice(i, j); +} diff --git a/browse/test/sidepanel-restart-dispose.test.ts b/browse/test/sidepanel-restart-dispose.test.ts new file mode 100644 index 000000000..8ec44690b --- /dev/null +++ b/browse/test/sidepanel-restart-dispose.test.ts @@ -0,0 +1,106 @@ +import { describe, test, expect } from 'bun:test'; +import * as fs from 'fs'; +import * as path from 'path'; + +// v1.44 Commit 2C — client-side restart + dispose wiring. +// +// Pre-v1.44 forceRestart only closed the client WS and disposed xterm; +// the old PTY died asynchronously via the agent's WS close handler. +// Race window between kill and mint, two claude instances briefly, +// no prompt visible until the user typed. +// +// Now forceRestart POSTs /pty-restart (one transaction: dispose + mint), +// opens the new WS with the fresh attachToken from the response, and +// sends {type:"start"} for the eager spawn. pagehide handler in +// sidepanel.js sendBeacon /pty-dispose so browser quit / panel close +// doesn't leak a 60s-zombie claude. + +const TERMINAL_JS = path.resolve( + new URL(import.meta.url).pathname, '..', '..', '..', 'extension', 'sidepanel-terminal.js', +); +const SIDEPANEL_JS = path.resolve( + new URL(import.meta.url).pathname, '..', '..', '..', 'extension', 'sidepanel.js', +); + +describe('sidepanel-terminal: forceRestart via /pty-restart (v1.44+)', () => { + test('1. mintSession callers read the 4-tuple (sessionId + attachToken)', () => { + const src = fs.readFileSync(TERMINAL_JS, 'utf-8'); + // The new shape lands in `minted.sessionId` and `minted.attachToken`. + expect(src).toContain('const { terminalPort, sessionId } = minted'); + expect(src).toContain('minted.attachToken || minted.ptySessionToken'); + // Backward-compat fallback to ptySessionToken kept so a partially- + // updated extension still works against a fresh server. + }); + + test('2. eager spawn via {type:"start"} on ws.open', () => { + const src = fs.readFileSync(TERMINAL_JS, 'utf-8'); + // Replaces the legacy `ws.send(TextEncoder().encode("\\n"))` newline + // hack that nudged the lazy-binary-spawn. + expect(src).toMatch(/ws\.send\(JSON\.stringify\(\{\s*type:\s*'start'\s*\}\)\)/); + expect(src).not.toContain("TextEncoder().encode('\\n')"); + }); + + test('3. forceRestart sends 4001 close code (intentional restart)', () => { + const src = fs.readFileSync(TERMINAL_JS, 'utf-8'); + expect(src).toMatch(/ws\.close\(4001/); + }); + + test('4. forceRestart POSTs /pty-restart with current sessionId', () => { + const src = fs.readFileSync(TERMINAL_JS, 'utf-8'); + expect(src).toContain('/pty-restart'); + expect(src).toContain('priorSessionId ? { sessionId: priorSessionId } : {}'); + }); + + test('5. forceRestart 401 triggers sticky abort (no spam loop)', () => { + const src = fs.readFileSync(TERMINAL_JS, 'utf-8'); + // Same defense pattern as connect() — 401 must flip the sticky flag + // or every 2s the user sees a fresh "Auth invalid" message. + const block = sliceBetween(src, 'async function forceRestart', 'function repaintIfLive'); + expect(block).toContain('resp.status === 401'); + expect(block).toContain('autoConnectAborted = true'); + }); + + test('6. currentSessionId is exposed on window for sidepanel.js pagehide', () => { + const src = fs.readFileSync(TERMINAL_JS, 'utf-8'); + expect(src).toContain('window.gstackPtySession = currentSessionId'); + }); +}); + +describe('sidepanel: pagehide → sendBeacon /pty-dispose (v1.44+)', () => { + test('7. pagehide handler fires sendBeacon to /pty-dispose', () => { + const src = fs.readFileSync(SIDEPANEL_JS, 'utf-8'); + expect(src).toMatch(/window\.addEventListener\('pagehide'/); + expect(src).toContain('navigator.sendBeacon'); + expect(src).toContain('/pty-dispose'); + }); + + test('8. pagehide payload carries sessionId + authToken in body (sendBeacon-compat)', () => { + const src = fs.readFileSync(SIDEPANEL_JS, 'utf-8'); + // sendBeacon can't set custom headers — server route accepts body-auth. + // Both fields must be in the payload or the server rejects. + expect(src).toMatch(/JSON\.stringify\(\{\s*sessionId,\s*authToken\s*\}\)/); + expect(src).toContain('window.gstackPtySession'); + expect(src).toContain('window.gstackAuthToken'); + }); + + test('9. pagehide handler is best-effort (try/catch swallows failures)', () => { + const src = fs.readFileSync(SIDEPANEL_JS, 'utf-8'); + // The 60s detach window catches any sendBeacon that fails, so the + // handler MUST not throw — uncaught throws can interfere with the + // browser's unload sequence. Slice between pagehide and end-of-file + // (it's the last addEventListener in sidepanel.js by design). + const i = src.indexOf("addEventListener('pagehide'"); + expect(i).toBeGreaterThan(-1); + const block = src.slice(i); + expect(block).toMatch(/try \{/); + expect(block).toMatch(/} catch /); + }); +}); + +function sliceBetween(source: string, start: string, end: string): string { + const i = source.indexOf(start); + if (i === -1) throw new Error(`marker not found: ${start}`); + const j = source.indexOf(end, i + start.length); + if (j === -1) throw new Error(`end marker not found: ${end}`); + return source.slice(i, j); +} diff --git a/browse/test/terminal-agent-detach-reattach.test.ts b/browse/test/terminal-agent-detach-reattach.test.ts new file mode 100644 index 000000000..89fbe5a1c --- /dev/null +++ b/browse/test/terminal-agent-detach-reattach.test.ts @@ -0,0 +1,127 @@ +import { describe, test, expect } from 'bun:test'; +import * as fs from 'fs'; +import * as path from 'path'; + +// v1.44 Commit 3 — detach state machine + ring buffer + re-attach replay. +// +// The state machine is what turns a single network blip from "fall through +// to ENDED state, click Restart" into "silent re-attach with scrollback +// intact, keep typing." Live WS cycles + buffer-overflow exercises belong +// in the e2e tier; these static-grep tripwires defend the load-bearing +// protocol + correctness properties. + +const AGENT_TS = path.resolve(new URL(import.meta.url).pathname, '..', '..', 'src', 'terminal-agent.ts'); + +describe('terminal-agent detach + re-attach (v1.44+ Commit 3)', () => { + test('1. PtySession carries ring buffer + alt-screen + detach state', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + const i = src.indexOf('interface PtySession {'); + const j = src.indexOf('\n}', i); + const block = src.slice(i, j); + expect(block).toContain('liveWs: any | null'); + expect(block).toContain('ringBuffer: Buffer[]'); + expect(block).toContain('ringBufferBytes: number'); + expect(block).toContain('altScreenActive: boolean'); + expect(block).toContain('detached: boolean'); + expect(block).toContain('detachTimer:'); + }); + + test('2. RING_BUFFER_MAX_BYTES default is 1 MB, env-overridable', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + expect(src).toContain('GSTACK_PTY_RING_BUFFER_BYTES'); + expect(src).toContain('1024 * 1024'); + }); + + test('3. DETACH_WINDOW_MS default is 60s, env-overridable', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + expect(src).toContain('GSTACK_PTY_DETACH_WINDOW_MS'); + expect(src).toContain("'60000'"); + }); + + test('4. appendToRingBuffer evicts oldest frames past the cap', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + expect(src).toMatch(/function appendToRingBuffer\(/); + // Eviction loop: must keep at least one frame even at extreme caps + // (otherwise a single oversized frame would empty the buffer). + expect(src).toMatch(/session\.ringBufferBytes > RING_BUFFER_MAX_BYTES/); + expect(src).toContain('session.ringBuffer.length > 1'); + expect(src).toContain('session.ringBuffer.shift()'); + }); + + test('5. alt-screen tracking watches for CSI ?1049h / CSI ?1049l', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + // Canonical xterm enter/exit alt-screen sequences. Must update + // session.altScreenActive so the replay prelude knows. + expect(src).toContain('\\x1b[?1049h'); + expect(src).toContain('\\x1b[?1049l'); + expect(src).toContain('session.altScreenActive'); + }); + + test('6. buildReplayPayload prefixes soft-reset (+ alt-screen if active)', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + expect(src).toMatch(/function buildReplayPayload\(/); + // DECSTR soft reset — re-defaults character attributes after the + // client's RIS clears the xterm buffer. + expect(src).toContain('\\x1b[!p'); + // Conditionally re-enter alt-screen if claude was in a tool-call + // (alt-screen mode) at detach. + expect(src).toContain('session.altScreenActive'); + }); + + test('7. WS open() re-attaches when sessionId already lives in sessionsById', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + const block = sliceBetween(src, 'open(ws) {', 'message(ws, raw) {'); + expect(block).toContain('sessionsById.get(sessionId)'); + expect(block).toContain('existing.liveWs = ws'); + expect(block).toContain('clearTimeout(existing.detachTimer)'); + // Tells the client to write RIS before treating the next binary + // frame as replay. + expect(block).toContain("type: 'reattach-begin'"); + expect(block).toContain('sendBinary(buildReplayPayload(existing))'); + }); + + test('8. WS close starts detach timer for non-intentional close codes', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + const i = src.indexOf('close(ws'); + const j = src.indexOf('function handleTabState', i); + const block = src.slice(i, j); + // 4001 = intentional restart (Commit 2), 4404 = no-claude, 1000 = clean + // exit. Any other code (1006 abnormal, 1001 going-away, etc.) gets the + // 60s detach grace. + expect(block).toContain('code === 4001'); + expect(block).toContain('code === 4404'); + expect(block).toContain('code === 1000'); + expect(block).toContain('session.detached = true'); + expect(block).toContain('session.detachTimer = setTimeout'); + expect(block).toContain('DETACH_WINDOW_MS'); + // Detach timer must unref so the bun process can exit cleanly. + expect(block).toContain('detachTimer as any)?.unref?.()'); + }); + + test('9. /internal/restart cancels detach timer before disposal', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + const block = sliceBetween(src, "url.pathname === '/internal/restart'", "// /claude-available"); + // Without the cancellation, a later detach-timer fire would dispose a + // session that's already been disposed by the explicit restart path. + expect(block).toContain('clearTimeout(session.detachTimer)'); + }); + + test('10. PTY on-data writes through session.liveWs (not the original ws closure)', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + // Critical for re-attach correctness: the PTY's on-data callback + // closes over `session`, not the original `ws`, so after re-attach + // it routes to the new liveWs automatically. + expect(src).toContain('session.liveWs.sendBinary'); + // Always append to the ring buffer regardless of attach state — so + // a detached session still captures output for the next re-attach. + expect(src).toContain('appendToRingBuffer(session, flush)'); + }); +}); + +function sliceBetween(source: string, start: string, end: string): string { + const i = source.indexOf(start); + if (i === -1) throw new Error(`marker not found: ${start}`); + const j = source.indexOf(end, i + start.length); + if (j === -1) throw new Error(`end marker not found: ${end}`); + return source.slice(i, j); +} diff --git a/browse/test/terminal-agent-internal-handler.test.ts b/browse/test/terminal-agent-internal-handler.test.ts new file mode 100644 index 000000000..04e7f3597 --- /dev/null +++ b/browse/test/terminal-agent-internal-handler.test.ts @@ -0,0 +1,51 @@ +import { describe, test, expect } from 'bun:test'; +import * as fs from 'fs'; +import * as path from 'path'; + +// Static-grep tripwire for the v1.44 internalHandler refactor. +// +// /internal/grant and /internal/revoke were copies of the same dance: +// bearer-auth → x-browse-gen check → req.json().then(...).catch(...). +// internalHandler(req, fn) collapses that into a single helper call. +// This test fails CI if the helper goes away or the existing routes +// regress to inline auth + JSON parse boilerplate. Wiring tests +// (token grant/revoke behavior) already live in +// browse/test/terminal-agent-integration.test.ts. + +const AGENT_TS = path.resolve(new URL(import.meta.url).pathname, '..', '..', 'src', 'terminal-agent.ts'); + +describe('terminal-agent internalHandler refactor (v1.44+)', () => { + test('1. internalHandler exists with the documented signature', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + expect(src).toMatch(/async function internalHandler\s*\(/); + // Body must include the auth gate, body parse, and result coercion. + expect(src).toContain('checkInternalAuth(req)'); + expect(src).toContain('await req.json()'); + expect(src).toContain('instanceof Response'); + }); + + test('2. /internal/grant routes through internalHandler', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + // Match the route handler block. + const block = sliceBetween(src, "url.pathname === '/internal/grant'", "url.pathname === '/internal/revoke'"); + expect(block).toContain('internalHandler(req'); + // Must NOT have the old inline pattern (would be a regression). + expect(block).not.toContain('req.headers.get(\'authorization\')'); + expect(block).not.toContain('req.json().then('); + }); + + test('3. /internal/revoke routes through internalHandler', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + const block = sliceBetween(src, "url.pathname === '/internal/revoke'", "url.pathname === '/internal/healthz'"); + expect(block).toContain('internalHandler(req'); + expect(block).not.toContain('req.json().then('); + }); +}); + +function sliceBetween(source: string, start: string, end: string): string { + const i = source.indexOf(start); + if (i === -1) throw new Error(`marker not found: ${start}`); + const j = source.indexOf(end, i + start.length); + if (j === -1) throw new Error(`end marker not found: ${end}`); + return source.slice(i, j); +} diff --git a/browse/test/terminal-agent-keepalive.test.ts b/browse/test/terminal-agent-keepalive.test.ts new file mode 100644 index 000000000..812f70f81 --- /dev/null +++ b/browse/test/terminal-agent-keepalive.test.ts @@ -0,0 +1,88 @@ +import { describe, test, expect } from 'bun:test'; +import * as fs from 'fs'; +import * as path from 'path'; + +// v1.44 WS keepalive — static-grep invariants for the protocol contract. +// +// terminal-agent.ts and sidepanel-terminal.js cooperate on a 25s ping/pong + +// keepalive cycle so long-idle PTY connections survive NAT idle timeouts and +// Chromium's MV3 panel suspension heuristics. The wiring is invisible to +// integration tests (you'd have to wait 25s to observe a ping) but trivially +// regressed by a refactor. These tests fail CI if either side stops sending +// or stops accepting the protocol frames. + +const AGENT_TS = path.resolve(new URL(import.meta.url).pathname, '..', '..', 'src', 'terminal-agent.ts'); +const CLIENT_JS = path.resolve(new URL(import.meta.url).pathname, '..', '..', '..', 'extension', 'sidepanel-terminal.js'); + +describe('terminal-agent WS keepalive (v1.44+)', () => { + test('1. agent has a KEEPALIVE_INTERVAL_MS env knob, default 25000', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + expect(src).toContain('GSTACK_PTY_KEEPALIVE_INTERVAL_MS'); + expect(src).toMatch(/KEEPALIVE_INTERVAL_MS\s*=\s*parseInt\(/); + // Default constant present so the env knob has a fallback. + expect(src).toContain("'25000'"); + }); + + test('2. WS open handler starts a ping interval on the session', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + // The open(ws) handler in the websocket: { ... } block must call + // setInterval to drive the ping cadence and store the handle. + const wsBlock = sliceBetween(src, 'websocket: {', 'function handleTabState'); + expect(wsBlock).toMatch(/open\s*\(\s*ws\s*\)/); + expect(wsBlock).toContain('setInterval'); + expect(wsBlock).toContain("type: 'ping'"); + expect(wsBlock).toContain('pingInterval'); + }); + + test('3. WS close handler clears the ping interval', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + const wsBlock = sliceBetween(src, 'websocket: {', 'function handleTabState'); + // close(ws, code?, reason?) MUST clearInterval the pingInterval — + // otherwise we leak timers across reconnects and the ping handler + // captures a dead ws ref. Signature widened in Commit 3 to include + // the close code for the detach state machine, hence the loose match. + expect(wsBlock).toMatch(/close\s*\(\s*ws/); + expect(wsBlock).toContain('clearInterval(session.pingInterval)'); + }); + + test('4. message handler accepts pong / keepalive frames silently', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + // The text-frame router must recognize the keepalive vocabulary — + // if a future refactor strips this branch, unknown-text-frame + // suppression would still drop them but we lose intent. + expect(src).toMatch(/msg\?\.type === 'pong'/); + expect(src).toMatch(/msg\?\.type === 'keepalive'/); + }); + + test('5. client sends keepalive every 25s on ws.open', () => { + const src = fs.readFileSync(CLIENT_JS, 'utf-8'); + expect(src).toContain('keepaliveInterval'); + expect(src).toMatch(/setInterval\(/); + expect(src).toContain("type: 'keepalive'"); + expect(src).toContain('KEEPALIVE_INTERVAL_MS = 25000'); + }); + + test('6. client replies pong to server ping', () => { + const src = fs.readFileSync(CLIENT_JS, 'utf-8'); + // The ws.message handler must short-circuit on msg.type === 'ping' + // and reply with {type: 'pong', ts: msg.ts}. + expect(src).toMatch(/msg\.type === 'ping'/); + expect(src).toMatch(/type: 'pong'/); + }); + + test('7. client clears keepalive in close + teardown + forceRestart', () => { + const src = fs.readFileSync(CLIENT_JS, 'utf-8'); + // Three teardown paths exist; all three must drop the interval to + // avoid leaking timers across reconnect attempts. + const occurrences = (src.match(/clearInterval\(keepaliveInterval\)/g) || []).length; + expect(occurrences).toBeGreaterThanOrEqual(3); + }); +}); + +function sliceBetween(source: string, start: string, end: string): string { + const i = source.indexOf(start); + if (i === -1) throw new Error(`marker not found: ${start}`); + const j = source.indexOf(end, i + start.length); + if (j === -1) throw new Error(`end marker not found: ${end}`); + return source.slice(i, j); +} diff --git a/browse/test/terminal-agent-pid-identity.test.ts b/browse/test/terminal-agent-pid-identity.test.ts new file mode 100644 index 000000000..52503fe2e --- /dev/null +++ b/browse/test/terminal-agent-pid-identity.test.ts @@ -0,0 +1,161 @@ +import { describe, test, expect } from 'bun:test'; +import * as fs from 'fs'; +import * as path from 'path'; +import { + readAgentRecord, + writeAgentRecord, + clearAgentRecord, + killAgentByRecord, + agentRecordPath, + type AgentRecord, +} from '../src/terminal-agent-control'; + +// REGRESSION TEST for the v1.44 PID-identity migration. +// +// Pre-v1.44, both `cli.ts` and `server.ts` killed the terminal-agent with +// `spawnSync('pkill', ['-f', 'terminal-agent\\.ts'])`. That command matches +// by argv regex — any process whose command line contains the string +// `terminal-agent.ts` got SIGTERM'd. In practice this killed: +// +// * sibling gstack sessions on the same host +// * editor processes (vim, code, less) that had the file open +// * any second gstack run on the host +// +// The v1.44 migration replaces both kill sites with identity-based PID kill +// against the record written at `/terminal-agent-pid` by the +// agent's own boot path. This test is the static-grep tripwire that prevents +// reintroducing the regex teardown anywhere in the source tree. +// +// Pattern mirrors browse/test/server-embedder-terminal-port.test.ts (Test 4) +// and browse/test/server-sanitize-surrogates.test.ts: read source files +// directly, assert an invariant on their contents. + +const SRC_DIR = path.resolve(new URL(import.meta.url).pathname, '..', '..', 'src'); + +function readAllSourceFiles(): Array<{ file: string; content: string }> { + const out: Array<{ file: string; content: string }> = []; + for (const entry of fs.readdirSync(SRC_DIR)) { + if (!entry.endsWith('.ts')) continue; + const full = path.join(SRC_DIR, entry); + out.push({ file: entry, content: fs.readFileSync(full, 'utf-8') }); + } + return out; +} + +describe('terminal-agent PID identity (v1.44+)', () => { + test('1. no source file calls `pkill -f terminal-agent`', () => { + // The regex matches both `pkill -f terminal-agent\.ts` (escaped form + // used in spawnSync args) and `pkill -f terminal-agent.ts` (literal), + // since the dot is the only difference and both are footguns. + const offenders: string[] = []; + for (const { file, content } of readAllSourceFiles()) { + // Walk line by line so we can skip comments that mention the historical + // pattern (acceptable as documentation, not as code). + const lines = content.split('\n'); + for (let i = 0; i < lines.length; i++) { + const line = lines[i]; + if (!/pkill/.test(line)) continue; + if (!/terminal-agent/.test(line)) continue; + // Skip comment lines — historical mentions in JSDoc are fine. + const trimmed = line.trim(); + if (trimmed.startsWith('//') || trimmed.startsWith('*') || trimmed.startsWith('/*')) continue; + offenders.push(`${file}:${i + 1}: ${trimmed}`); + } + } + expect(offenders).toEqual([]); + }); + + test('2. neither cli.ts nor server.ts calls spawnSync with pkill', () => { + // Tighter check — even if someone routes through a different code path, + // any spawnSync('pkill', ...) anywhere in src/ is the smell. + const offenders: string[] = []; + for (const { file, content } of readAllSourceFiles()) { + if (/spawnSync\s*\(\s*['"]pkill['"]/.test(content)) { + offenders.push(file); + } + } + expect(offenders).toEqual([]); + }); + + test('3. readAgentRecord round-trips writeAgentRecord', () => { + const tmpDir = fs.mkdtempSync(path.join(require('os').tmpdir(), 'gstack-pid-id-')); + try { + const record: AgentRecord = { + pid: 12345, + gen: 'test-gen-abcdef', + startedAt: Date.now(), + }; + writeAgentRecord(tmpDir, record); + const read = readAgentRecord(tmpDir); + expect(read).toEqual(record); + expect(fs.existsSync(agentRecordPath(tmpDir))).toBe(true); + + clearAgentRecord(tmpDir); + expect(readAgentRecord(tmpDir)).toBeNull(); + expect(fs.existsSync(agentRecordPath(tmpDir))).toBe(false); + } finally { + fs.rmSync(tmpDir, { recursive: true, force: true }); + } + }); + + test('4. readAgentRecord returns null on missing or malformed file', () => { + const tmpDir = fs.mkdtempSync(path.join(require('os').tmpdir(), 'gstack-pid-id-')); + try { + // Missing. + expect(readAgentRecord(tmpDir)).toBeNull(); + + // Malformed: wrong type for pid. + fs.writeFileSync(agentRecordPath(tmpDir), JSON.stringify({ pid: 'not-a-number', gen: 'x', startedAt: 0 })); + expect(readAgentRecord(tmpDir)).toBeNull(); + + // Malformed: not JSON. + fs.writeFileSync(agentRecordPath(tmpDir), 'definitely not json'); + expect(readAgentRecord(tmpDir)).toBeNull(); + + // Missing field. + fs.writeFileSync(agentRecordPath(tmpDir), JSON.stringify({ pid: 1, gen: 'x' })); + expect(readAgentRecord(tmpDir)).toBeNull(); + } finally { + fs.rmSync(tmpDir, { recursive: true, force: true }); + } + }); + + test('5. killAgentByRecord returns false for a dead PID and never throws', () => { + // PID 2147483646 is below Linux PID_MAX_LIMIT but way above macOS's + // typical max — no real process will ever hold it. isProcessAlive + // returns false; killAgentByRecord no-ops. + const record: AgentRecord = { + pid: 2147483646, + gen: 'sentinel', + startedAt: Date.now(), + }; + const result = killAgentByRecord(record, 'SIGTERM'); + expect(result).toBe(false); + }); + + test('6. killAgentByRecord skips the kill when isProcessAlive is false', () => { + // Guard via process.kill stub: confirm killAgentByRecord does NOT call + // process.kill with a non-zero signal when the PID is dead. This is the + // belt-and-suspenders defense against PID-reuse: even if isProcessAlive + // changes implementation, killAgentByRecord must validate liveness first. + const origKill = process.kill; + const kills: Array<[number, NodeJS.Signals | number]> = []; + (process as any).kill = ((pid: number, sig: NodeJS.Signals | number) => { + kills.push([pid, sig ?? 'SIGTERM']); + if (sig === 0) { + const err: any = new Error('ESRCH'); + err.code = 'ESRCH'; + throw err; + } + return true; + }) as any; + try { + const record: AgentRecord = { pid: 9999999, gen: 'x', startedAt: Date.now() }; + killAgentByRecord(record, 'SIGTERM'); + const terminations = kills.filter(([, s]) => s !== 0); + expect(terminations).toEqual([]); + } finally { + (process as any).kill = origKill; + } + }); +}); diff --git a/browse/test/terminal-agent-ring-buffer-runtime.test.ts b/browse/test/terminal-agent-ring-buffer-runtime.test.ts new file mode 100644 index 000000000..5cea4f9b8 --- /dev/null +++ b/browse/test/terminal-agent-ring-buffer-runtime.test.ts @@ -0,0 +1,154 @@ +import { describe, test, expect, beforeEach } from 'bun:test'; +import { + appendToRingBuffer, + buildReplayPayload, + type PtySession, +} from '../src/terminal-agent'; + +// Runtime exercises for the v1.44 Commit 3 ring buffer + replay prelude. +// Companion to browse/test/terminal-agent-detach-reattach.test.ts which +// covers the structural invariants; this file calls the helpers directly +// to prove behavioral correctness without spinning up a real Bun.serve +// listener. + +function fresh(): PtySession { + return { + proc: null, + cols: 80, + rows: 24, + cookie: 'test-cookie', + liveWs: null, + sessionId: 'test-session', + spawned: false, + pingInterval: null, + ringBuffer: [], + ringBufferBytes: 0, + altScreenActive: false, + detached: false, + detachTimer: null, + }; +} + +describe('appendToRingBuffer runtime', () => { + test('appends frames in order and tracks byte count', () => { + const s = fresh(); + appendToRingBuffer(s, Buffer.from('hello ')); + appendToRingBuffer(s, Buffer.from('world')); + expect(s.ringBuffer).toHaveLength(2); + expect(s.ringBufferBytes).toBe(11); + expect(Buffer.concat(s.ringBuffer).toString()).toBe('hello world'); + }); + + test('evicts oldest frames when cap exceeded', () => { + // Default cap is 1 MB. Override via env wouldn't help inside this + // running process (constant was read at module load), so use frames + // big enough to exceed it deterministically. + const s = fresh(); + const big = Buffer.alloc(400_000, 0x41); // 400 KB of 'A' + appendToRingBuffer(s, big); + appendToRingBuffer(s, big); + appendToRingBuffer(s, big); // total 1.2 MB — exceeds default cap + // Eviction must drop frames until under cap; first 400 KB chunk goes. + expect(s.ringBuffer.length).toBeLessThan(3); + expect(s.ringBufferBytes).toBeLessThanOrEqual(1024 * 1024); + }); + + test('keeps at least one frame even when a single frame exceeds the cap', () => { + const s = fresh(); + // 2 MB single frame — bigger than the 1 MB cap. The eviction loop + // guards on `ringBuffer.length > 1`, so the single oversized frame + // stays. Without that guard, the buffer would empty itself, defeating + // the whole point of replay on re-attach. + const huge = Buffer.alloc(2 * 1024 * 1024, 0x42); + appendToRingBuffer(s, huge); + expect(s.ringBuffer.length).toBe(1); + expect(s.ringBufferBytes).toBe(huge.length); + }); + + test('tracks alt-screen enter (CSI ?1049h)', () => { + const s = fresh(); + expect(s.altScreenActive).toBe(false); + appendToRingBuffer(s, Buffer.from('plain text')); + expect(s.altScreenActive).toBe(false); + appendToRingBuffer(s, Buffer.from('\x1b[?1049h')); + expect(s.altScreenActive).toBe(true); + }); + + test('tracks alt-screen exit (CSI ?1049l)', () => { + const s = fresh(); + appendToRingBuffer(s, Buffer.from('\x1b[?1049h')); + expect(s.altScreenActive).toBe(true); + appendToRingBuffer(s, Buffer.from('\x1b[?1049l')); + expect(s.altScreenActive).toBe(false); + }); + + test('trailing state wins when enter + exit appear in one frame', () => { + const s = fresh(); + // Tool call opened alt-screen then closed it inside one render — net + // state is back to main screen. lastIndexOf comparison handles this. + appendToRingBuffer(s, Buffer.from('start\x1b[?1049hmiddle\x1b[?1049lend')); + expect(s.altScreenActive).toBe(false); + + const s2 = fresh(); + // Reverse order: exited then re-entered — net state alt-screen. + appendToRingBuffer(s2, Buffer.from('\x1b[?1049l\x1b[?1049h')); + expect(s2.altScreenActive).toBe(true); + }); +}); + +describe('buildReplayPayload runtime', () => { + test('prepends DECSTR soft reset before ring buffer contents', () => { + const s = fresh(); + appendToRingBuffer(s, Buffer.from('prompt> ')); + const payload = buildReplayPayload(s).toString('latin1'); + expect(payload.startsWith('\x1b[!p')).toBe(true); + expect(payload.endsWith('prompt> ')).toBe(true); + }); + + test('re-enters alt-screen when session was in alt-screen at detach', () => { + const s = fresh(); + appendToRingBuffer(s, Buffer.from('\x1b[?1049h tool output ')); + const payload = buildReplayPayload(s).toString('latin1'); + // Order: soft reset, alt-screen re-enter, ring buffer. + expect(payload.indexOf('\x1b[!p')).toBeLessThan(payload.indexOf('\x1b[?1049h')); + expect(payload.indexOf('\x1b[?1049h')).toBeLessThan(payload.indexOf('tool output')); + }); + + test('omits alt-screen re-enter when session was on main screen', () => { + const s = fresh(); + appendToRingBuffer(s, Buffer.from('regular prompt')); + const payload = buildReplayPayload(s).toString('latin1'); + // Soft reset is present, but alt-screen enter is NOT. Both substrings + // are otherwise identical 8 bytes apart in the alphabet, so equal- + // substring checks need to be strict. + expect(payload).toContain('\x1b[!p'); + expect(payload).not.toContain('\x1b[?1049h'); + }); + + test('replay buffer length = soft-reset + (optional alt-screen) + ring bytes', () => { + const s = fresh(); + appendToRingBuffer(s, Buffer.from('abc')); + appendToRingBuffer(s, Buffer.from('def')); + const payload = buildReplayPayload(s); + // 4 bytes (DECSTR) + 6 bytes (abc/def) = 10 bytes. No alt-screen. + expect(payload.length).toBe(4 + 6); + }); +}); + +describe('lease lifecycle interplay (via pty-session-lease)', () => { + // Cross-module behavior: lease + ring buffer are both per-session. + // This catches the case where a refactor accidentally couples them. + test('lease registry is independent of ring buffer state', async () => { + const { mintLease, validateLease, __resetLeases } = await import('../src/pty-session-lease'); + __resetLeases(); + const a = mintLease(); + const b = mintLease(); + expect(a.sessionId).not.toBe(b.sessionId); + const va = validateLease(a.sessionId); + const vb = validateLease(b.sessionId); + expect(va.ok && vb.ok).toBe(true); + if (va.ok && vb.ok) { + expect(va.expiresAt).toBe(vb.expiresAt); + } + }); +}); diff --git a/browse/test/terminal-agent-session-routing.test.ts b/browse/test/terminal-agent-session-routing.test.ts new file mode 100644 index 000000000..acc51b2af --- /dev/null +++ b/browse/test/terminal-agent-session-routing.test.ts @@ -0,0 +1,96 @@ +import { describe, test, expect } from 'bun:test'; +import * as fs from 'fs'; +import * as path from 'path'; + +// v1.44 Commit 2 — terminal-agent sessionId routing + eager spawn. +// +// Live spawn tests would require a real claude binary on PATH and a Bun.serve +// listener; both are e2e-tier. These static-grep tripwires defend the load- +// bearing protocol changes: +// - validTokens carries the sessionId binding (Map, not Set) +// - sessionsById index exists for /internal/restart + (Commit 3) re-attach +// - /internal/restart is scoped to one sessionId (codex T2 fix) +// - {type:"start"} triggers spawn for eager UX after forceRestart +// - maybeSpawnPty helper is the single entry point for both spawn paths + +const AGENT_TS = path.resolve(new URL(import.meta.url).pathname, '..', '..', 'src', 'terminal-agent.ts'); + +describe('terminal-agent session routing (v1.44+ Commit 2)', () => { + test('1. validTokens is a Map binding token → sessionId', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + // Pre-Commit 2 was `Set`; the Map carries the sessionId + // binding that /internal/restart and (Commit 3) re-attach depend on. + expect(src).toMatch(/const validTokens = new Map\(\)/); + expect(src).not.toMatch(/const validTokens = new Set { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + expect(src).toMatch(/const sessionsById = new Map\(\)/); + // Populated in open() — required so /internal/restart can find the session. + expect(src).toMatch(/if \(sessionId\) sessionsById\.set\(sessionId, session\)/); + }); + + test('3. /internal/grant binds an optional sessionId to the token', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + const block = sliceBetween(src, "url.pathname === '/internal/grant'", "url.pathname === '/internal/revoke'"); + expect(block).toContain('validTokens.set(body.token, sid)'); + expect(block).toContain('body?.sessionId'); + }); + + test('4. /internal/restart is scoped to one sessionId, not dispose-all', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + const block = sliceBetween(src, "url.pathname === '/internal/restart'", "// /claude-available"); + expect(block).toContain('sessionsById.get(sid)'); + expect(block).toContain('disposeSession(session)'); + expect(block).toContain('sessionsById.delete(sid)'); + // Negative: must NOT enumerate all live sessions and dispose them + // (codex T2 caught this — pre-spec the route killed every PTY on the + // agent, breaking multi-sidebar / pair-agent setups). + expect(block).not.toMatch(/for\s*\(\s*const\s+\[?ws/); + }); + + test('5. WS upgrade surfaces sessionId on ws.data', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + expect(src).toContain('validTokens.get(token) ?? null'); + expect(src).toMatch(/data:\s*\{\s*cookie:\s*token,\s*sessionId\s*\}/); + }); + + test('6. eager spawn via {type:"start"} text frame', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + expect(src).toMatch(/msg\?\.type === 'start'/); + // Both spawn paths route through the same helper for parity. + expect(src).toContain('function maybeSpawnPty('); + expect(src).toMatch(/maybeSpawnPty\(ws, session\)/); + }); + + test('7. close() drops sessionsById entry alongside ws cleanup', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + // Commit 3 widened the close signature to `close(ws, code, _reason)` + // for the detach state machine. Match either shape so test is stable + // across the rest of the long-lived-sidebar PR. + const i = src.indexOf('close(ws'); + expect(i).toBeGreaterThan(-1); + const j = src.indexOf('function handleTabState', i); + const block = src.slice(i, j); + expect(block).toContain('sessionsById.delete(session.sessionId)'); + }); + + test('8. PtySession interface carries the sessionId field', () => { + const src = fs.readFileSync(AGENT_TS, 'utf-8'); + // Whole interface — close paren is sufficient. + const i = src.indexOf('interface PtySession {'); + expect(i).toBeGreaterThan(-1); + const j = src.indexOf('\n}', i); + const block = src.slice(i, j); + expect(block).toContain('sessionId: string | null'); + }); +}); + +function sliceBetween(source: string, start: string, end: string): string { + const i = source.indexOf(start); + if (i === -1) throw new Error(`marker not found: ${start}`); + const j = source.indexOf(end, i + start.length); + if (j === -1) throw new Error(`end marker not found: ${end}`); + return source.slice(i, j); +} diff --git a/browse/test/terminal-agent-watchdog.test.ts b/browse/test/terminal-agent-watchdog.test.ts new file mode 100644 index 000000000..f012dc406 --- /dev/null +++ b/browse/test/terminal-agent-watchdog.test.ts @@ -0,0 +1,91 @@ +import { describe, test, expect } from 'bun:test'; +import * as fs from 'fs'; +import * as path from 'path'; + +// v1.44 terminal-agent watchdog — static-grep invariants. +// +// The watchdog respawns terminal-agent when its PID dies. Live process-tree +// tests would require spawning, killing, and observing across two real Bun +// processes — slow and flaky in the free tier. These tripwires defend the +// load-bearing properties: identity-based liveness check (not name match), +// crash-loop guard, gated on ownsTerminalAgent, and cleared on shutdown. + +const SERVER_TS = path.resolve(new URL(import.meta.url).pathname, '..', '..', 'src', 'server.ts'); +const CONTROL_TS = path.resolve(new URL(import.meta.url).pathname, '..', '..', 'src', 'terminal-agent-control.ts'); + +describe('terminal-agent watchdog (v1.44+)', () => { + test('1. spawnTerminalAgent helper exists with PID return type', () => { + const src = fs.readFileSync(CONTROL_TS, 'utf-8'); + expect(src).toMatch(/export function spawnTerminalAgent\(/); + // Must clean up prior PID before spawning (no zombies). + expect(src).toContain('readAgentRecord(stateDir)'); + expect(src).toContain('killAgentByRecord(prior'); + expect(src).toContain('clearAgentRecord(stateDir)'); + }); + + test('2. watchdog is gated on ownsTerminalAgent', () => { + const src = fs.readFileSync(SERVER_TS, 'utf-8'); + // Match the comment + the guard. The guard MUST be a positive check; + // an inverted check would respawn for embedders and trample their PTY. + const block = sliceBetween(src, '─── Terminal-Agent Watchdog', 'Factory-scoped validateAuth'); + expect(block).toMatch(/if \(ownsTerminalAgent\)/); + expect(block).toContain('agentWatchdogInterval = setInterval'); + }); + + test('3. watchdog uses PID liveness, not process name probe', () => { + const src = fs.readFileSync(SERVER_TS, 'utf-8'); + const block = sliceBetween(src, '─── Terminal-Agent Watchdog', 'Factory-scoped validateAuth'); + // The whole point of the v1.44 watchdog over v1.43- pkill teardown: + // identity-based liveness. Slow-but-alive agents must NOT trigger + // respawn (split-brain defense). + expect(block).toContain('readAgentRecord(stateDir)'); + expect(block).toContain('isProcessAlive(record.pid)'); + // Negative: no executable name-based process lookup. Allow the strings + // to appear in prose comments (the watchdog doc explains what it + // replaces), reject only actual invocations. + expect(block).not.toMatch(/spawnSync\s*\(\s*['"]pkill/); + expect(block).not.toMatch(/Bun\.spawn\s*\(\s*\[\s*['"]pgrep/); + }); + + test('4. crash-loop guard with rolling window', () => { + const src = fs.readFileSync(SERVER_TS, 'utf-8'); + const block = sliceBetween(src, '─── Terminal-Agent Watchdog', 'Factory-scoped validateAuth'); + expect(block).toContain('RESPAWN_GUARD_WINDOW_MS = 60_000'); + expect(block).toContain('RESPAWN_GUARD_MAX = 3'); + expect(block).toContain('respawnHistory'); + expect(block).toContain('agentRespawnGuardTripped'); + // Window pruning: old entries must be evicted before counting toward + // the limit. Otherwise a daemon up for a week with one crash a day + // would eventually trip the guard. + expect(block).toMatch(/respawnHistory\.shift\(\)/); + }); + + test('5. watchdog interval is cleared on shutdown', () => { + const src = fs.readFileSync(SERVER_TS, 'utf-8'); + expect(src).toContain('if (agentWatchdogInterval) clearInterval(agentWatchdogInterval)'); + }); + + test('6. tick interval is env-overridable for tests', () => { + const src = fs.readFileSync(SERVER_TS, 'utf-8'); + expect(src).toContain('GSTACK_AGENT_WATCHDOG_TICK_MS'); + }); + + test('7. CLI cold-start path uses the same spawnTerminalAgent helper', () => { + const cli = fs.readFileSync( + path.resolve(new URL(import.meta.url).pathname, '..', '..', 'src', 'cli.ts'), + 'utf-8', + ); + // Otherwise the CLI and watchdog could drift on spawn env/cwd, and + // teardown invariants tested against one would silently miss the other. + expect(cli).toContain('spawnTerminalAgent({'); + expect(cli).toContain("from './terminal-agent-control'"); + }); +}); + +function sliceBetween(source: string, start: string, end: string): string { + const i = source.indexOf(start); + if (i === -1) throw new Error(`marker not found: ${start}`); + const j = source.indexOf(end, i + start.length); + if (j === -1) throw new Error(`end marker not found: ${end}`); + return source.slice(i, j); +} diff --git a/bunfig.toml b/bunfig.toml new file mode 100644 index 000000000..c53535f07 --- /dev/null +++ b/bunfig.toml @@ -0,0 +1,8 @@ +[test] +# Snapshot + restore process.env after every test. Centralizes the +# defense against per-file env mutations leaking into subsequent files +# (e.g., PATH = '/test/bin:/usr/bin' from a security-test fixture +# causing `Bun.which('bash')` to return null in unrelated downstream +# tests, or `Bun.spawn(['bun', ...])` to ENOENT when looking for bun +# via the polluted PATH). +preload = ["./test-setup.ts"] diff --git a/extension/sidepanel-terminal.js b/extension/sidepanel-terminal.js index 4ac0065d0..e6287abca 100644 --- a/extension/sidepanel-terminal.js +++ b/extension/sidepanel-terminal.js @@ -42,12 +42,68 @@ }; /** State machine. */ - const STATE = { IDLE: 'idle', CONNECTING: 'connecting', LIVE: 'live', ENDED: 'ended', NO_CLAUDE: 'no-claude' }; + const STATE = { + IDLE: 'idle', + CONNECTING: 'connecting', + LIVE: 'live', + ENDED: 'ended', + NO_CLAUDE: 'no-claude', + RECONNECTING: 'reconnecting', // v1.44 Commit 3 — re-attach loop active + }; let state = STATE.IDLE; let term = null; let fitAddon = null; let ws = null; + /** + * Sticky abort flag for tryAutoConnect's polling loop. Set when we + * receive an unrecoverable signal (auth invalid → 401, claude not + * found, fatal server error) so the loop doesn't keep retrying and + * spamming the user with the same failure message every 2s. Cleared + * by forceRestart() so the user can re-enter the polling loop after + * fixing whatever was wrong. + */ + let autoConnectAborted = false; + /** + * v1.44 session identity. The stable, non-secret sessionId minted by + * /pty-session and surfaced back via window.gstackPtySession so the + * sidepanel.js pagehide handler can sendBeacon /pty-dispose for THIS + * specific session. forceRestart sends this to /pty-restart so the + * server can scope the disposal to one terminal rather than all. + */ + let currentSessionId = null; + /** + * Commit 3 re-attach loop. Set true while a re-attach is in flight so + * concurrent ws.close events (e.g. user clicks Restart mid-reconnect) + * can short-circuit. Reset by every state transition out of RECONNECTING. + */ + let reattachInFlight = false; + /** + * Set true after a {type:"reattach-begin"} text frame and reset after + * the next binary frame is treated as replay payload. The flag is what + * lets the message handler distinguish "this binary is the scrollback + * replay, write RIS first to clear xterm" from "this is live PTY + * output, just feed it through." + */ + let nextBinaryIsReplay = false; + /** + * Re-attach backoff schedule (ms). 1s, 2s, 4s, 8s, then 8s steady until + * 60s total elapsed (Commit 3 detach window). If all attempts fail, + * fall through to ENDED state and the user clicks Restart for a fresh + * session. + */ + const REATTACH_BACKOFF_MS = [1000, 2000, 4000, 8000]; + const REATTACH_WINDOW_MS = 60_000; + /** + * 25s client-side WS keepalive interval (v1.44+). Belt-and-suspenders with + * the server-side ping in terminal-agent.ts: server pings cover most + * idle-NAT cases, client keepalive frames also defend against Chromium's + * MV3-adjacent panel suspension heuristics that can pause our timers. + * Started on ws.open, cleared on ws.close. The agent silently accepts + * `{type:"keepalive"}` text frames. + */ + let keepaliveInterval = null; + const KEEPALIVE_INTERVAL_MS = 25000; function show(el) { el.style.display = ''; } function hide(el) { el.style.display = 'none'; } @@ -139,6 +195,187 @@ } } + /** + * Commit 3 — re-attach loop. Triggered by an unexpected WS close + * (anything other than the v1.44 intentional codes) while state was + * LIVE. Posts /pty-session/reattach with the current sessionId; on + * success opens a new WS, feeds the {type:"reattach-begin"} + + * replay-binary handshake from the agent into xterm. + * + * Backoff: 1s, 2s, 4s, 8s, then 8s steady. Total wall budget is the + * server's DETACH_WINDOW_MS (default 60s) — past that point the + * server has disposed our session and any re-attach attempt will + * return 410 Gone. + * + * Aborts on: + * - reattachInFlight transitions to false (user clicked Restart or + * navigated away) + * - 410 Gone from /pty-session/reattach (lease expired) + * - 401 (auth invalid) + * - REATTACH_WINDOW_MS elapsed + */ + function startReattachLoop(prevSessionId) { + if (!prevSessionId) { + setState(STATE.ENDED); + return; + } + const serverPort = getServerPort(); + const authToken = getAuthToken(); + if (!serverPort || !authToken) { + setState(STATE.ENDED); + return; + } + reattachInFlight = true; + setState(STATE.RECONNECTING); + const startedAt = Date.now(); + let attempt = 0; + + const tick = async () => { + if (!reattachInFlight) return; + if (Date.now() - startedAt > REATTACH_WINDOW_MS) { + reattachInFlight = false; + setState(STATE.ENDED); + return; + } + attempt += 1; + let resp; + try { + resp = await fetch(`http://127.0.0.1:${serverPort}/pty-session/reattach`, { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'Authorization': `Bearer ${authToken}`, + }, + body: JSON.stringify({ sessionId: prevSessionId }), + credentials: 'include', + }); + } catch (err) { + scheduleNextAttempt(); + return; + } + if (resp.status === 410) { + // Server disposed the session — lease window closed. No point + // retrying; fall through so the user clicks Restart for a fresh + // session. + reattachInFlight = false; + setState(STATE.ENDED); + return; + } + if (resp.status === 401) { + reattachInFlight = false; + autoConnectAborted = true; + setState(STATE.IDLE, { + message: 'Auth invalid — reload the sidebar or restart your gstack session.', + }); + return; + } + if (!resp.ok) { + scheduleNextAttempt(); + return; + } + let body; + try { body = await resp.json(); } catch { body = null; } + if (!body || !body.terminalPort || !body.attachToken) { + scheduleNextAttempt(); + return; + } + reattachInFlight = false; + openReattachWebSocket(body.terminalPort, body.attachToken, body.sessionId || prevSessionId); + }; + + const scheduleNextAttempt = () => { + const backoffIdx = Math.min(attempt - 1, REATTACH_BACKOFF_MS.length - 1); + const delay = REATTACH_BACKOFF_MS[backoffIdx] ?? 8000; + setTimeout(tick, delay); + }; + + tick(); + } + + /** + * Open the post-reattach WebSocket. Mostly a clone of connect()'s + * attach wiring but with the {type:"reattach-begin"} → RIS → binary + * replay handshake added. The xterm element is REUSED (not disposed) so + * the buffer flash is minimal — RIS clears it cleanly just before the + * replay arrives. + */ + function openReattachWebSocket(terminalPort, attachToken, sessionId) { + currentSessionId = sessionId || null; + try { window.gstackPtySession = currentSessionId; } catch {} + setState(STATE.LIVE); + ensureXterm(); + nextBinaryIsReplay = false; + ws = new WebSocket(`ws://127.0.0.1:${terminalPort}/ws`, [`gstack-pty.${attachToken}`]); + ws.binaryType = 'arraybuffer'; + + ws.addEventListener('open', () => { + try { + ws.send(JSON.stringify({ type: 'resize', cols: term.cols, rows: term.rows })); + } catch {} + if (keepaliveInterval) clearInterval(keepaliveInterval); + keepaliveInterval = setInterval(() => { + if (!ws || ws.readyState !== WebSocket.OPEN) return; + try { ws.send(JSON.stringify({ type: 'keepalive' })); } catch {} + }, KEEPALIVE_INTERVAL_MS); + }); + + ws.addEventListener('message', (ev) => { + if (typeof ev.data === 'string') { + try { + const msg = JSON.parse(ev.data); + if (msg.type === 'reattach-begin') { + // Clear xterm before the replay binary arrives — RIS (\x1bc) + // is a full hardware reset that flushes the buffer and + // resets all attributes. The server's replay starts with + // DECSTR + optional alt-screen re-enter for safety. + try { term.write('\x1bc'); } catch {} + nextBinaryIsReplay = true; + return; + } + if (msg.type === 'error' && msg.code === 'CLAUDE_NOT_FOUND') { + setState(STATE.NO_CLAUDE); + try { ws.close(); } catch {} + return; + } + if (msg.type === 'ping') { + try { ws.send(JSON.stringify({ type: 'pong', ts: msg.ts })); } catch {} + return; + } + } catch {} + return; + } + const buf = ev.data instanceof ArrayBuffer ? new Uint8Array(ev.data) : ev.data; + // First binary frame after reattach-begin is the replay payload; + // write it through unchanged (server already prefixed soft-reset). + // Subsequent binary frames are live PTY output. + term.write(buf); + if (nextBinaryIsReplay) nextBinaryIsReplay = false; + }); + + ws.addEventListener('close', (ev) => { + ws = null; + if (keepaliveInterval) { + clearInterval(keepaliveInterval); + keepaliveInterval = null; + } + // If THIS reattach WS also closes uncleanly, recurse into another + // re-attach loop with the SAME sessionId — the server may still + // be inside the detach window. The state check + sessionId guard + // prevent runaway recursion (ENDED short-circuits the next loop). + if (state !== STATE.LIVE) return; + const code = (ev && (ev.code ?? 1006)) || 1006; + const intentional = code === 1000 || code === 4001 || code === 4404; + if (intentional || !currentSessionId) { + setState(intentional ? STATE.ENDED : STATE.ENDED); + return; + } + startReattachLoop(currentSessionId); + }); + ws.addEventListener('error', (err) => { + console.error('[gstack terminal] reattach ws error', err); + }); + } + async function checkClaudeAvailable(terminalPort) { try { const resp = await fetch(`http://127.0.0.1:${terminalPort}/claude-available`, { @@ -315,14 +552,32 @@ const minted = await mintSession(); if (minted.error) { + // 401 = stale auth token; no amount of retrying will fix it. Sticky + // abort the polling loop so we don't spam the same error every 2s + // until the user clicks Restart (which clears the flag). + if (typeof minted.error === 'string' && minted.error.startsWith('401')) { + autoConnectAborted = true; + setState(STATE.IDLE, { + message: 'Auth invalid — reload the sidebar or restart your gstack session.', + }); + return; + } setState(STATE.IDLE, { message: `Cannot start: ${minted.error}` }); return; } - const { terminalPort, ptySessionToken } = minted; - if (!ptySessionToken) { - setState(STATE.IDLE, { message: 'Cannot start: no session token returned' }); + // v1.44 4-tuple: { terminalPort, sessionId, attachToken, leaseExpiresAt } + // Falls back to the legacy `ptySessionToken` field for one minor release + // (server keeps the alias) so a partially-updated extension still works + // against a fresh server. + const { terminalPort, sessionId } = minted; + const attachToken = minted.attachToken || minted.ptySessionToken; + if (!attachToken) { + setState(STATE.IDLE, { message: 'Cannot start: no attach token returned' }); return; } + currentSessionId = sessionId || null; + // Expose for sidepanel.js pagehide handler — see Commit 2C wiring. + try { window.gstackPtySession = currentSessionId; } catch {} // Pre-flight: does claude even exist on PATH? const claudeStatus = await checkClaudeAvailable(terminalPort); @@ -344,7 +599,7 @@ // SameSite=Strict don't survive the jump from server.ts:34567 to the // agent's random port from a chrome-extension origin, so cookies // alone weren't reliable. - ws = new WebSocket(`ws://127.0.0.1:${terminalPort}/ws`, [`gstack-pty.${ptySessionToken}`]); + ws = new WebSocket(`ws://127.0.0.1:${terminalPort}/ws`, [`gstack-pty.${attachToken}`]); ws.binaryType = 'arraybuffer'; ws.addEventListener('open', () => { @@ -369,18 +624,38 @@ } }); } catch {} - // Send a single byte to nudge the agent to spawn claude (lazy-spawn trigger). - try { ws.send(new TextEncoder().encode('\n')); } catch {} + // v1.44 eager spawn: send {type:"start"} so the agent boots claude + // without requiring the user to type a keystroke. Pre-v1.44 the + // lazy-binary-spawn pattern made forceRestart look stuck for ~2-3s + // until the user pressed any key. + try { ws.send(JSON.stringify({ type: 'start' })); } catch {} + // v1.44 client-side keepalive. Server pings every 25s; we ALSO send + // keepalive frames at the same cadence so a paused timer on either + // side still has the other to lean on. Both are silently dropped + // by the agent's message handler. + if (keepaliveInterval) clearInterval(keepaliveInterval); + keepaliveInterval = setInterval(() => { + if (!ws || ws.readyState !== WebSocket.OPEN) return; + try { ws.send(JSON.stringify({ type: 'keepalive' })); } catch {} + }, KEEPALIVE_INTERVAL_MS); }); ws.addEventListener('message', (ev) => { if (typeof ev.data === 'string') { - // Agent control message (rare). Treat as JSON; error frames carry code. + // Agent control message. Treat as JSON; error frames carry code, + // ping frames trigger an immediate pong reply. try { const msg = JSON.parse(ev.data); if (msg.type === 'error' && msg.code === 'CLAUDE_NOT_FOUND') { setState(STATE.NO_CLAUDE); try { ws.close(); } catch {} + return; + } + if (msg.type === 'ping') { + // Mirror the server's timestamp back. Cheap liveness ACK that + // lets the agent observe round-trip latency for free. + try { ws.send(JSON.stringify({ type: 'pong', ts: msg.ts })); } catch {} + return; } } catch {} return; @@ -390,9 +665,26 @@ term.write(buf); }); - ws.addEventListener('close', () => { + ws.addEventListener('close', (ev) => { ws = null; - if (state !== STATE.NO_CLAUDE) setState(STATE.ENDED); + if (keepaliveInterval) { + clearInterval(keepaliveInterval); + keepaliveInterval = null; + } + if (state === STATE.NO_CLAUDE) return; + // v1.44 Commit 3 — re-attach loop on transient close. Clean codes + // (1000 = pty exited, 4001 = intentional restart, 4404 = no-claude) + // skip the loop and fall through to ENDED. Any other code + // (1006 abnormal, 1001 going-away) is a candidate for re-attach + // within the 60s server-side detach window, provided we still + // have a sessionId to match against. + const code = (ev && (ev.code ?? 1006)) || 1006; + const intentional = code === 1000 || code === 4001 || code === 4404; + if (state === STATE.LIVE && !intentional && currentSessionId) { + startReattachLoop(currentSessionId); + return; + } + setState(STATE.ENDED); }); ws.addEventListener('error', (err) => { @@ -401,6 +693,10 @@ } function teardown() { + if (keepaliveInterval) { + clearInterval(keepaliveInterval); + keepaliveInterval = null; + } try { ws && ws.close(); } catch {} ws = null; if (term) { @@ -417,16 +713,188 @@ * Force a fresh session: close any open WS, dispose xterm, return to * IDLE, kick off auto-connect. Safe to call from any state. */ - function forceRestart() { - try { ws && ws.close(); } catch {} + /** + * v1.44 forceRestart: hits the server's /pty-restart one-transaction + * endpoint with the current sessionId. The server kills the old PtySession + * scope-to-our-id, mints a fresh sessionId + lease + attachToken, and + * returns the new 4-tuple in one round trip. Zero race window between + * kill and mint (codex D8). + * + * If we don't have a sessionId (sidebar is in IDLE / ENDED state because + * the prior session ended cleanly), the route accepts that gracefully — + * skips the dispose step and just mints fresh. Either way the user sees + * the same "Restarting..." → fresh prompt UX. + */ + async function forceRestart() { + if (keepaliveInterval) { + clearInterval(keepaliveInterval); + keepaliveInterval = null; + } + // Re-arm the auto-connect loop in case a prior auth failure stuck the + // sticky flag — explicit user action is the cleared-flag signal. + autoConnectAborted = false; + setState(STATE.IDLE, { message: 'Restarting Claude Code...' }); + + const serverPort = getServerPort(); + const authToken = getAuthToken(); + const priorSessionId = currentSessionId; + + // Close the local WS BEFORE the server-side kill so the agent's + // close handler doesn't race with the dispose call. + try { ws && ws.close(4001, 'intentional-restart'); } catch {} ws = null; if (term) { try { term.dispose(); } catch {} term = null; fitAddon = null; } - setState(STATE.IDLE, { message: 'Starting Claude Code...' }); - tryAutoConnect(); + + if (!serverPort || !authToken) { + // Server hasn't been discovered yet — fall back to the patient + // polling loop. forceRestart's promise of "fresh prompt now" can't + // be met without a live server; user sees the patient status path. + tryAutoConnect(); + return; + } + + let nextTuple = null; + try { + const resp = await fetch(`http://127.0.0.1:${serverPort}/pty-restart`, { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + 'Authorization': `Bearer ${authToken}`, + }, + body: JSON.stringify(priorSessionId ? { sessionId: priorSessionId } : {}), + credentials: 'include', + }); + if (resp.ok) { + nextTuple = await resp.json(); + } else if (resp.status === 401) { + autoConnectAborted = true; + setState(STATE.IDLE, { + message: 'Auth invalid — reload the sidebar or restart your gstack session.', + }); + return; + } else if (resp.status === 503) { + // Agent down — fall through to patient autoconnect which will + // surface the appropriate "waiting for server" status. + setState(STATE.IDLE, { message: 'Restart failed: terminal agent not ready. Retrying...' }); + } else { + const body = await resp.text().catch(() => ''); + setState(STATE.IDLE, { message: `Restart failed: ${resp.status} ${body || resp.statusText}` }); + } + } catch (err) { + setState(STATE.IDLE, { + message: `Restart failed: ${err && err.message ? err.message : String(err)}`, + }); + } + + if (!nextTuple) { + // Restart didn't yield a fresh tuple. Fall back to the regular + // connect path; tryAutoConnect will retry as the server recovers. + currentSessionId = null; + try { window.gstackPtySession = null; } catch {} + tryAutoConnect(); + return; + } + + // We have a fresh 4-tuple — open the new WS directly without going + // through mintSession again. This is the explicit "no race window" + // path the codex D8 redesign was after. + const { terminalPort, sessionId, attachToken, expiresAt: _expiresAt } = nextTuple; + const token = attachToken || nextTuple.ptySessionToken; + if (!terminalPort || !token) { + currentSessionId = null; + tryAutoConnect(); + return; + } + currentSessionId = sessionId || null; + try { window.gstackPtySession = currentSessionId; } catch {} + + // Pre-flight: claude still on PATH? + const claudeStatus = await checkClaudeAvailable(terminalPort); + if (!claudeStatus.available) { + setState(STATE.NO_CLAUDE); + return; + } + + setState(STATE.LIVE); + ensureXterm(); + ws = new WebSocket(`ws://127.0.0.1:${terminalPort}/ws`, [`gstack-pty.${token}`]); + ws.binaryType = 'arraybuffer'; + + ws.addEventListener('open', () => { + try { + ws.send(JSON.stringify({ type: 'resize', cols: term.cols, rows: term.rows })); + } catch {} + try { + chrome.runtime.sendMessage({ type: 'getTabState' }, (resp) => { + if (resp && ws && ws.readyState === WebSocket.OPEN) { + try { + ws.send(JSON.stringify({ + type: 'tabState', + active: resp.active, + tabs: resp.tabs, + reason: 'restart', + })); + } catch {} + } + }); + } catch {} + // Eager spawn — fresh claude prompt visible without user keystroke. + try { ws.send(JSON.stringify({ type: 'start' })); } catch {} + if (keepaliveInterval) clearInterval(keepaliveInterval); + keepaliveInterval = setInterval(() => { + if (!ws || ws.readyState !== WebSocket.OPEN) return; + try { ws.send(JSON.stringify({ type: 'keepalive' })); } catch {} + }, KEEPALIVE_INTERVAL_MS); + }); + + ws.addEventListener('message', (ev) => { + if (typeof ev.data === 'string') { + try { + const msg = JSON.parse(ev.data); + if (msg.type === 'error' && msg.code === 'CLAUDE_NOT_FOUND') { + setState(STATE.NO_CLAUDE); + try { ws.close(); } catch {} + return; + } + if (msg.type === 'ping') { + try { ws.send(JSON.stringify({ type: 'pong', ts: msg.ts })); } catch {} + return; + } + } catch {} + return; + } + const buf = ev.data instanceof ArrayBuffer ? new Uint8Array(ev.data) : ev.data; + term.write(buf); + }); + + ws.addEventListener('close', (ev) => { + ws = null; + if (keepaliveInterval) { + clearInterval(keepaliveInterval); + keepaliveInterval = null; + } + if (state === STATE.NO_CLAUDE) return; + // v1.44 Commit 3 — re-attach loop on transient close. Clean codes + // (1000 = pty exited, 4001 = intentional restart, 4404 = no-claude) + // skip the loop and fall through to ENDED. Any other code + // (1006 abnormal, 1001 going-away) is a candidate for re-attach + // within the 60s server-side detach window, provided we still + // have a sessionId to match against. + const code = (ev && (ev.code ?? 1006)) || 1006; + const intentional = code === 1000 || code === 4001 || code === 4404; + if (state === STATE.LIVE && !intentional && currentSessionId) { + startReattachLoop(currentSessionId); + return; + } + setState(STATE.ENDED); + }); + ws.addEventListener('error', (err) => { + console.error('[gstack terminal] ws error', err); + }); } /** @@ -503,23 +971,47 @@ * as /health succeeds), then fires connect() automatically. The user * doesn't have to press a key — Terminal is the default tab and "tap to * start" was a needless paper cut on every reload. + * + * v1.44 patience overhaul: no more 15s give-up. The user already opened + * the sidebar; giving up tells them "you did something wrong" when the + * truth is the daemon is slow to boot (or restarting via the upstream + * supervisor). We poll forever at 2s intervals with ascending status + * messages so the user knows we're still trying, and ONLY abort on + * explicit signals: state transition out of IDLE (connect succeeded + * or user navigated), or an unrecoverable auth/network signal. */ function tryAutoConnect() { if (state !== STATE.IDLE) return; - let waited = 0; + if (autoConnectAborted) return; + const startedAt = Date.now(); const tick = () => { // If the user navigated away (Chat tab) or already connected, drop out. if (state !== STATE.IDLE) return; + // If a prior attempt hit an unrecoverable error (401, etc.), stop + // polling. The user clears the flag by clicking Restart. + if (autoConnectAborted) return; if (getServerPort() && getAuthToken()) { connect(); return; } - waited += 200; - if (waited > 15000) { - setState(STATE.IDLE, { message: 'Browse server not ready. Reload sidebar to retry.' }); - return; + // Ascending status messages — the user wants to know the sidebar is + // still trying. Each threshold is the moment the message would + // mislead if left silent: at 15s "should have started by now," at + // 60s "the server might be in trouble," at 5min "stop waiting and + // check on it manually." + const elapsed = Date.now() - startedAt; + if (elapsed > 300_000) { + setState(STATE.IDLE, { + message: 'Browse server still not responding after 5 min. Try `$B status` in a terminal.', + }); + } else if (elapsed > 60_000) { + setState(STATE.IDLE, { + message: 'Still waiting — browse server may be slow to start.', + }); + } else if (elapsed > 15_000) { + setState(STATE.IDLE, { message: 'Waiting for browse server...' }); } - setTimeout(tick, 200); + setTimeout(tick, 2000); }; tick(); } diff --git a/extension/sidepanel.js b/extension/sidepanel.js index 6328d7c51..14834519b 100644 --- a/extension/sidepanel.js +++ b/extension/sidepanel.js @@ -1083,3 +1083,38 @@ chrome.runtime.onMessage.addListener((msg) => { })); } }); + +// ─── v1.44 pagehide: explicit PTY dispose on sidebar close ────────── +// +// Codex T3 of the eng review: WS close codes alone can't distinguish +// "intentional close" (sidebar closed, browser quit, extension reload) +// from "transient blip" (wifi hiccup) reliably — Chrome routes the +// former through code 1001 (going-away) and the latter through 1006 +// (abnormal), but neither is a load-bearing contract across browsers +// and extension lifecycles. +// +// pagehide fires reliably for tab close, panel close, extension reload, +// and navigation-away. We use it to fire-and-forget a /pty-dispose POST +// so the server can synchronously dispose the PtySession instead of +// waiting for the 60s detach window (Commit 3) to time out. Zombie +// claude processes lingering for 60s on every browser quit was the +// codex-flagged failure mode. +// +// sendBeacon is the only fetch primitive that survives a closing page — +// it doesn't accept custom headers, which is why the server's +// /pty-dispose route accepts the auth token in the BODY (see +// server-pty-lease-routes.test.ts test 4). +window.addEventListener('pagehide', () => { + const sessionId = window.gstackPtySession; + const authToken = window.gstackAuthToken; + const port = window.gstackServerPort; + if (!sessionId || !authToken || !port) return; + try { + const blob = new Blob([JSON.stringify({ sessionId, authToken })], { + type: 'application/json', + }); + navigator.sendBeacon(`http://127.0.0.1:${port}/pty-dispose`, blob); + } catch { + // Best-effort — the 60s detach timer will catch any session we miss. + } +}); diff --git a/package.json b/package.json index 0b9428b9c..2c39a80b7 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "gstack", - "version": "1.43.2.0", + "version": "1.44.0.0", "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.", "license": "MIT", "type": "module", diff --git a/test-setup.ts b/test-setup.ts new file mode 100644 index 000000000..fb742ab2d --- /dev/null +++ b/test-setup.ts @@ -0,0 +1,53 @@ +/** + * Global test preload (Bun: `[test] preload` in bunfig.toml). + * + * Snapshots `process.env` once at preload time, then restores it after + * every test. Defends against the recurring pollution class where one + * test file mutates `process.env.PATH` / `HOME` / etc. and leaks into + * unrelated subsequent files in the same Bun process — surfaces as + * `Executable not found in $PATH: "bun"` or `Bun.which('bash')` returning + * null in tests that have no business touching env. + * + * `process.env = X` reassignment does work in Bun (it swaps the underlying + * proxy), but several test files use the broken pattern of + * `origEnv = {...process.env}` followed by per-test mutation without a + * matching restore inside try/finally. Centralizing the safety net here + * means new tests don't have to remember the dance, and the bug class + * stays dead. + */ +import { afterEach, beforeAll } from 'bun:test'; + +// Narrowly restore PATH after every test. Defends against the recurring +// pollution class where one test sets `process.env.PATH = '/test/bin:/usr/bin'` +// to exercise a scrubbed-env fixture and either forgets to restore or uses +// the broken `process.env = origEnv` reassignment, then a downstream test +// (security.test.ts > resolveBashBinary, pair-agent-tunnel-eval, or +// server-no-import-side-effects) sees the wrong PATH and either has +// `Bun.which('bash')` return null or `Bun.spawn(['bun', ...])` ENOENT. +// +// Deliberately narrow: snapshotting + restoring all of process.env breaks +// tests that legitimately set per-file env at module top-level (e.g., +// domain-skills-storage.test.ts assigns `process.env.GSTACK_HOME` at +// import time so the loaded module reads the test sandbox path on first +// invocation — wiping that on afterEach would route reads at the user's +// real ~/.gstack and the test would assert on the wrong filesystem). +// +// If a future test pollutes a different variable in the same broken way, +// add it to RESTORE_KEYS rather than widening the snapshot scope. +const RESTORE_KEYS = ['PATH', 'Path'] as const; +const baseline: Record = {}; + +beforeAll(() => { + for (const k of RESTORE_KEYS) baseline[k] = process.env[k]; +}); + +afterEach(() => { + for (const k of RESTORE_KEYS) { + const want = baseline[k]; + if (want === undefined) { + if (process.env[k] !== undefined) delete process.env[k]; + } else if (process.env[k] !== want) { + process.env[k] = want; + } + } +}); diff --git a/test/explain-level-config.test.ts b/test/explain-level-config.test.ts index 24cb644d2..9a48a9f6b 100644 --- a/test/explain-level-config.test.ts +++ b/test/explain-level-config.test.ts @@ -28,8 +28,11 @@ afterEach(() => { }); function run(...args: string[]): { stdout: string; stderr: string; status: number } { + // gstack-config precedence is `${GSTACK_HOME:-${GSTACK_STATE_DIR:-$HOME/.gstack}}`, + // so GSTACK_HOME from the developer's parent env wins over the test's + // GSTACK_STATE_DIR. Override both to isolate from the real ~/.gstack. const res = spawnSync(BIN_CONFIG, args, { - env: { ...process.env, GSTACK_STATE_DIR: tmpHome }, + env: { ...process.env, GSTACK_STATE_DIR: tmpHome, GSTACK_HOME: tmpHome }, encoding: 'utf-8', cwd: ROOT, }); @@ -59,9 +62,13 @@ describe('gstack-config explain_level', () => { expect(run('get', 'explain_level').stdout).toBe('default'); }); - test('get with unset explain_level returns empty (preamble default takes over)', () => { - // No prior set → no config file → empty output - expect(run('get', 'explain_level').stdout).toBe(''); + test('get with unset explain_level returns the documented default', () => { + // gstack-config returns the documented default ("default") when the + // key is absent from config.yaml — see bin/gstack-config:103. Earlier + // versions of this test expected "" (preamble shell substitution), + // but the script ships defaults inline so callers always get a + // usable value without bash fallback gymnastics. + expect(run('get', 'explain_level').stdout).toBe('default'); }); test('config header documents explain_level', () => { diff --git a/test/setup-codesign.test.ts b/test/setup-codesign.test.ts index 1ac7a4982..506ce5cae 100644 --- a/test/setup-codesign.test.ts +++ b/test/setup-codesign.test.ts @@ -33,9 +33,12 @@ describe('setup: Apple Silicon codesign', () => { test('codesign block is inside the NEEDS_BUILD=1 branch', () => { const content = fs.readFileSync(SETUP_SCRIPT, 'utf-8'); - // The codesign block should appear after `bun run build` and before the - // `if [ ! -x "$BROWSE_BIN" ]` guard that checks the build succeeded. - const buildIdx = content.indexOf('bun run build'); + // The codesign block should appear after the build command and before the + // `if [ ! -x "$BROWSE_BIN" ]` guard that checks the build succeeded. The + // setup script invokes the build via `bun_cmd run build` (not literal + // `bun run build`) so the wrapper can route through asdf/volta/etc; + // matching the wrapped form keeps this test stable across that indirection. + const buildIdx = content.indexOf('bun_cmd run build'); const codesignIdx = content.indexOf('codesign --remove-signature'); const browseCheckIdx = content.indexOf('gstack setup failed: browse binary missing'); expect(buildIdx).toBeGreaterThan(-1); diff --git a/test/upgrade-migration-v1.test.ts b/test/upgrade-migration-v1.test.ts index edef6ee3a..09fdaf2c2 100644 --- a/test/upgrade-migration-v1.test.ts +++ b/test/upgrade-migration-v1.test.ts @@ -26,9 +26,13 @@ afterEach(() => { }); function run(): { stdout: string; stderr: string; status: number } { + // Override HOME too — the migration reads `${HOME}/.claude/skills/gstack/bin/gstack-config`, + // and the developer's real config may have `explain_level` set, which the + // migration interprets as "user already decided" and short-circuits without + // writing the pending-prompt flag (breaking these tests). const res = spawnSync('bash', [MIGRATION], { encoding: 'utf-8', - env: { ...process.env, GSTACK_HOME: tmpHome }, + env: { ...process.env, GSTACK_HOME: tmpHome, HOME: tmpHome }, }); return { stdout: (res.stdout ?? '').trim(),