feat(terminal-agent): 25s WS keepalive ping/pong + client keepalive frames

PTY connections were dying silently after NAT idle timeouts (as short as
30-60s on aggressive home routers and some carrier-grade NAT) and Chrome MV3
panel suspension. Neither side noticed until the user's next keystroke
produced no output. Both sides now drive a 25s keepalive cycle.

Server side (browse/src/terminal-agent.ts):
  * New ws.open handler constructs the PtySession eagerly and starts a
    setInterval that sends `{type:"ping",ts:Date.now()}` every 25s.
    Interval handle stored on session.pingInterval so close() can clear it.
  * PtySession.pingInterval field added; cleared in ws.close before
    disposeSession runs. Prevents timer leak across reconnects.
  * Message handler accepts `{type:"ping"|"pong"|"keepalive"}` silently.
    These frames are an application-level liveness signal whose only job
    is to put traffic on the socket, so there is no state to update.
    Existing resize/tabSwitch/tabState handling unchanged.
  * GSTACK_PTY_KEEPALIVE_INTERVAL_MS env knob (default 25000) lets the
    upcoming e2e tests compress idle assertions without 30s waits.
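
A minimal sketch of the server-side cycle described above, not the code in
terminal-agent.ts. Only the env var name, the 25000 default, the frame shape,
and the `pingInterval` field come from this message; the helper names
(`resolveKeepaliveMs`, `startPing`, `stopPing`) and the fallback on invalid
values are assumptions made for illustration.

```javascript
const DEFAULT_KEEPALIVE_MS = 25000;

// Read the interval knob; fall back to the default when it is missing,
// non-numeric, or not positive (assumed behavior, not confirmed by the commit).
function resolveKeepaliveMs(env) {
  const parsed = Number.parseInt(env.GSTACK_PTY_KEEPALIVE_INTERVAL_MS ?? '', 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_KEEPALIVE_MS;
}

// Start pinging a socket-like object. The returned handle is what would be
// stored on session.pingInterval.
function startPing(ws, intervalMs, now = Date.now) {
  return setInterval(() => {
    try {
      ws.send(JSON.stringify({ type: 'ping', ts: now() }));
    } catch {
      // Ignore send failures on a dying socket; the close path clears the timer.
    }
  }, intervalMs);
}

// Clear the timer and null the field so calling this twice is harmless.
function stopPing(session) {
  if (session.pingInterval) {
    clearInterval(session.pingInterval);
    session.pingInterval = null;
  }
}
```

In this sketch the open handler would call
`session.pingInterval = startPing(ws, resolveKeepaliveMs(process.env))` and the
close handler would call `stopPing(session)` before disposing the session.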

Client side (extension/sidepanel-terminal.js):
  * Belt-and-suspenders: client also runs a 25s setInterval that sends
    `{type:"keepalive"}`. Keeps traffic flowing if a server-side ping
    ever gets dropped; conversely, the server ping covers us when Chrome
    pauses the panel's timers (rare but possible in MV3).
  * Ping reply: on `{type:"ping",ts}` from the server, immediately send
    `{type:"pong",ts}`. Lets the agent observe round-trip latency for
    free and confirms the channel is bidirectional.
  * Interval cleared in three teardown paths: ws.close handler,
    teardown(), forceRestart(). Three paths exist because the sidebar
    can exit the LIVE state through any of them; all three must clean up
    or we leak timers across reconnects.
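
The ping/pong exchange above can be sketched as two pure functions. This is
an illustration under assumptions: `replyFor` and `roundTripMs` are
hypothetical names, and the agent in this commit accepts pong frames silently,
so the latency calculation only shows what the echoed `ts` would make possible.

```javascript
// Client side: build the reply for an incoming text frame, or null when the
// frame needs no reply (malformed JSON, or any type other than "ping").
function replyFor(rawFrame) {
  let msg;
  try {
    msg = JSON.parse(rawFrame);
  } catch {
    return null;
  }
  if (msg && msg.type === 'ping') {
    // Mirror the server's timestamp back unchanged.
    return JSON.stringify({ type: 'pong', ts: msg.ts });
  }
  return null;
}

// Agent side (hypothetical): derive round-trip time from an echoed pong.
// Returns null for anything that is not a pong with a numeric ts.
function roundTripMs(rawFrame, now = Date.now()) {
  let msg;
  try {
    msg = JSON.parse(rawFrame);
  } catch {
    return null;
  }
  if (msg && msg.type === 'pong' && typeof msg.ts === 'number') {
    return now - msg.ts;
  }
  return null;
}
```

Because the server generated `ts` from its own clock, the subtraction never
compares clocks across machines, which is why echoing the value is enough.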

Test (browse/test/terminal-agent-keepalive.test.ts):
  * Static-grep tripwires for the 7-point protocol contract: agent has
    a configurable interval, open() starts the ping, close() clears it,
    message handler accepts keepalive vocabulary, client sends keepalive
    + replies pong, and all three client teardown paths clear the timer.
  * Wire-level tests (actually observe a ping after 25s) belong in the
    e2e tier — adding them here would either flake on slow CI or require
    a real Bun.serve listener per test which we don't want to pay for
    in the free tier.
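
For readers unfamiliar with static-grep tripwires, here is a rough sketch of
the idea: read a source file as text and fail when an expected pattern
disappears. The function name, the contract object, and the regexes are all
hypothetical; they are not the assertions in terminal-agent-keepalive.test.ts.

```javascript
// Return the names of contract points whose pattern is absent from the source.
function missingContractPoints(source, contract) {
  return Object.entries(contract)
    .filter(([, pattern]) => !pattern.test(source))
    .map(([name]) => name);
}

// Illustrative patterns for three of the client-side points. The regexes carry
// no `g` flag, so repeated .test() calls stay stateless.
const clientContract = {
  'sends keepalive': /type:\s*['"]keepalive['"]/,
  'replies pong': /type:\s*['"]pong['"]/,
  'clears timer': /clearInterval\(keepaliveInterval\)/,
};
```

A test would load the client file with `fs.readFileSync(path, 'utf8')` and
assert that `missingContractPoints(text, clientContract)` is empty. The
trade-off is the one stated above: this catches accidental deletion of the
protocol code but says nothing about behavior on the wire.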

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Garry Tan
2026-05-23 23:09:23 -07:00
parent 4462f0caa4
commit d8751e91df
3 changed files with 192 additions and 4 deletions
+40 -1
@@ -48,6 +48,16 @@
let term = null;
let fitAddon = null;
let ws = null;
/**
* 25s client-side WS keepalive interval (v1.44+). Belt-and-suspenders with
* the server-side ping in terminal-agent.ts: server pings cover most
* idle-NAT cases, client keepalive frames also defend against Chromium's
* MV3-adjacent panel suspension heuristics that can pause our timers.
* Started on ws.open; cleared on ws.close, teardown(), and forceRestart().
* The agent silently accepts `{type:"keepalive"}` text frames.
*/
let keepaliveInterval = null;
const KEEPALIVE_INTERVAL_MS = 25000;
function show(el) { el.style.display = ''; }
function hide(el) { el.style.display = 'none'; }
@@ -371,16 +381,33 @@
} catch {}
// Send a single byte to nudge the agent to spawn claude (lazy-spawn trigger).
try { ws.send(new TextEncoder().encode('\n')); } catch {}
// v1.44 client-side keepalive. Server pings every 25s; we ALSO send
// keepalive frames at the same cadence so a paused timer on either
// side still has the other to lean on. The agent's message handler
// silently accepts our keepalive and pong frames.
if (keepaliveInterval) clearInterval(keepaliveInterval);
keepaliveInterval = setInterval(() => {
if (!ws || ws.readyState !== WebSocket.OPEN) return;
try { ws.send(JSON.stringify({ type: 'keepalive' })); } catch {}
}, KEEPALIVE_INTERVAL_MS);
});
ws.addEventListener('message', (ev) => {
if (typeof ev.data === 'string') {
// Agent control message. Treat as JSON; error frames carry code,
// ping frames trigger an immediate pong reply.
try {
const msg = JSON.parse(ev.data);
if (msg.type === 'error' && msg.code === 'CLAUDE_NOT_FOUND') {
setState(STATE.NO_CLAUDE);
try { ws.close(); } catch {}
return;
}
if (msg.type === 'ping') {
// Mirror the server's timestamp back. Cheap liveness ACK that
// lets the agent observe round-trip latency for free.
try { ws.send(JSON.stringify({ type: 'pong', ts: msg.ts })); } catch {}
return;
}
} catch {}
return;
@@ -392,6 +419,10 @@
ws.addEventListener('close', () => {
ws = null;
if (keepaliveInterval) {
clearInterval(keepaliveInterval);
keepaliveInterval = null;
}
if (state !== STATE.NO_CLAUDE) setState(STATE.ENDED);
});
@@ -401,6 +432,10 @@
}
function teardown() {
if (keepaliveInterval) {
clearInterval(keepaliveInterval);
keepaliveInterval = null;
}
try { ws && ws.close(); } catch {}
ws = null;
if (term) {
@@ -418,6 +453,10 @@
* IDLE, kick off auto-connect. Safe to call from any state.
*/
function forceRestart() {
if (keepaliveInterval) {
clearInterval(keepaliveInterval);
keepaliveInterval = null;
}
try { ws && ws.close(); } catch {}
ws = null;
if (term) {