mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-27 03:59:59 +02:00
feat(browse): terminal-agent watchdog with PID liveness + crash-loop guard
terminal-agent could die independently of the server — SIGKILL from the OS
OOM killer, an uncaught exception under PTY churn, an external `pkill` from
a sibling debugging session. Pre-v1.44 the sidebar would observe the broken
connection and stay broken until the user reloaded the sidebar. Now a 60s
ticker checks the recorded agent PID and respawns via the shared
spawnTerminalAgent helper when dead.
Identity-based liveness (T4 from the eng review):
* Uses readAgentRecord + isProcessAlive (signal 0 probe), not a name match.
* Slow-but-alive agents intentionally fall through — respawning around a
living agent would create split-brain (two agents writing the port
file, tokens diverging between them, mystery upgrade 401s).
* Pairs with the v1.44 generation counter in /internal/* loopback calls:
if a stale agent does come back to life mid-cycle, its X-Browse-Gen
no longer matches and the parent's calls 409 cleanly.
Crash-loop guard:
* 3 respawn attempts inside a rolling 60s window → stop trying. A daemon
up for a week with one crash a day shouldn't trip the guard.
* On trip: one-line error to console (`respawn guard tripped`) and the
watchdog goes dormant. Manual restart via the sidebar Restart button
is the explicit signal to re-arm (added in Commit 2 of the larger PR).
Shared spawn path (refactor):
* New spawnTerminalAgent(opts) in terminal-agent-control.ts handles:
prior-PID cleanup → spawn → record stash. Both the CLI cold-start path
in cli.ts and the new server.ts watchdog route through it. Removes the
copy-paste between them; future env wiring lands in one place.
Gated on cfg.ownsTerminalAgent — embedders that pre-launch their own PTY
server (gbrowser phoenix overlay) still own the full lifecycle.
GSTACK_AGENT_WATCHDOG_TICK_MS env knob compresses the 60s tick for e2e
tests without 60s waits per assertion.
Tests:
* browse/test/terminal-agent-watchdog.test.ts — 7 static-grep tripwires
for the load-bearing invariants (ownsTerminalAgent gate, PID-based
liveness, crash-loop guard with window pruning, shutdown cleanup,
CLI cold-start uses the same helper, env knob exists).
* Live process-kill tests belong in the e2e tier; cheaper invariants
here catch refactor regressions in ~1ms each.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+12
-32
@@ -16,7 +16,7 @@ import { writeSecureFile, mkdirSecure } from './file-permissions';
|
||||
import { resolveConfig, ensureStateDir, readVersionHash } from './config';
|
||||
import { parseProxyConfig, computeConfigHash, ProxyConfigError } from './proxy-config';
|
||||
import { redactProxyUrl } from './proxy-redact';
|
||||
import { readAgentRecord, killAgentByRecord, clearAgentRecord } from './terminal-agent-control';
|
||||
import { spawnTerminalAgent } from './terminal-agent-control';
|
||||
|
||||
const config = resolveConfig();
|
||||
const IS_WINDOWS = process.platform === 'win32';
|
||||
@@ -1034,38 +1034,18 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
|
||||
// claude -p subprocesses to multiplex.
|
||||
|
||||
// Auto-start terminal agent (non-compiled bun process). Owns the PTY
|
||||
// WebSocket for the sidebar Terminal pane.
|
||||
let termAgentScript = path.resolve(__dirname, 'terminal-agent.ts');
|
||||
if (!fs.existsSync(termAgentScript)) {
|
||||
termAgentScript = path.resolve(path.dirname(process.execPath), '..', 'src', 'terminal-agent.ts');
|
||||
}
|
||||
// WebSocket for the sidebar Terminal pane. Routes through the shared
|
||||
// spawnTerminalAgent helper so the CLI cold-start path and the
|
||||
// server.ts watchdog respawn path share one implementation. The
|
||||
// helper handles prior-PID cleanup, script lookup, and env wiring.
|
||||
try {
|
||||
if (fs.existsSync(termAgentScript)) {
|
||||
// Kill any stale terminal-agent from a prior run so its port file
|
||||
// can't trick the server into routing /pty-session at a dead
|
||||
// listener. Identity-based (v1.44+) — only kills the PID recorded
|
||||
// in `<stateDir>/terminal-agent-pid`. Pre-v1.44 used
|
||||
// `pkill -f terminal-agent\.ts` which matched sibling gstack
|
||||
// sessions; see terminal-agent-control.ts header for rationale.
|
||||
{
|
||||
const stateDir = path.dirname(config.stateFile);
|
||||
const prior = readAgentRecord(stateDir);
|
||||
if (prior) {
|
||||
killAgentByRecord(prior, 'SIGTERM');
|
||||
clearAgentRecord(stateDir);
|
||||
}
|
||||
}
|
||||
const termProc = Bun.spawn(['bun', 'run', termAgentScript], {
|
||||
cwd: config.projectDir,
|
||||
env: {
|
||||
...process.env,
|
||||
BROWSE_STATE_FILE: config.stateFile,
|
||||
BROWSE_SERVER_PORT: String(newState.port),
|
||||
},
|
||||
stdio: ['ignore', 'ignore', 'ignore'],
|
||||
});
|
||||
termProc.unref();
|
||||
console.log(`[browse] Terminal agent started (PID: ${termProc.pid})`);
|
||||
const newPid = spawnTerminalAgent({
|
||||
stateFile: config.stateFile,
|
||||
serverPort: newState.port,
|
||||
cwd: config.projectDir,
|
||||
});
|
||||
if (newPid) {
|
||||
console.log(`[browse] Terminal agent started (PID: ${newPid})`);
|
||||
}
|
||||
} catch (err: any) {
|
||||
// Non-fatal: chat still works without the terminal agent.
|
||||
|
||||
Reference in New Issue
Block a user