mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-26 21:12:26 +02:00
920a13a17f463c28a3db75cc27482affb13a4fee
1 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
920a13a17f |
v1.44.0.0 feat: long-lived sidebar — keepalive, restart, re-attach, scrollback replay (#1678)
* fix(browse): identity-based terminal-agent kill replaces pkill regex
Commit 0 of the v1.44 long-lived-sidebar PR — foundation for the watchdog
and removes a latent cross-session footgun.
`pkill -f terminal-agent\.ts` (cli.ts spawn site + server.ts shutdown) matched
by argv regex and would kill ANY process whose argv contained the string —
sibling gstack sessions on the same host, an editor with the file open, a
second `$B connect` run. Identity-based PID kill via a new helper module
removes that whole class of bug.
* New `browse/src/terminal-agent-control.ts`: `readAgentRecord`,
`writeAgentRecord`, `clearAgentRecord`, `killAgentByRecord`. Validates
PID liveness via `isProcessAlive` before signaling (PID-reuse defense).
* `terminal-agent.ts` writes `<stateDir>/terminal-agent-pid` (JSON
`{pid, gen, startedAt}`) at boot; clears on SIGTERM/SIGINT.
* New per-boot `CURRENT_GEN` (16-byte random); `/internal/*` callers can
include `X-Browse-Gen` to defend against split-brain in the upcoming
watchdog. Absent header is accepted (backward compat); mismatch returns
409. New `checkInternalAuth` helper centralizes bearer + gen checks.
* New `/internal/healthz` route — agent liveness probe used by the
upcoming watchdog (returns pid/gen/sessions, no claude-binary lookup).
* `cli.ts` and `server.ts` both call `killAgentByRecord` instead of pkill.
* `ServerConfig.ownsTerminalAgent` JSDoc updated; the gated teardown now
runs 4 side effects (was 3) — adds the new agent-record unlink.
Test changes:
* New `browse/test/terminal-agent-pid-identity.test.ts` — static-grep
tripwire that fails CI if any source file re-introduces `pkill ...
terminal-agent` or `spawnSync('pkill', ...)`; round-trips
write/read/clear; verifies killAgentByRecord no-ops on dead PIDs.
* `browse/test/server-embedder-terminal-port.test.ts` rewritten to
intercept `process.kill` (not `child_process.spawnSync`); writes a
sentinel agent-record with a guaranteed-dead PID; asserts probe-only
(signal 0) calls, no termination signals; verifies all 3 discovery
files including the new terminal-agent-pid.
Closes TODOS.md P3 ("Identity-based terminal-agent kill").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(tests): repair 7 pre-existing failures (env pollution + stale markers)
All 7 failures existed on main before this branch — verified via `git stash`
round-trip. Bundling them into the long-lived-sidebar PR because we kept
tripping over them while running `bun test` to verify Commit 0.
* Global afterEach restores `process.env.PATH` (new bunfig.toml +
test-setup.ts). browser-skill-commands.test.ts sets
`PATH = '/test/bin:/usr/bin'` to exercise a scrubbed-env fixture and
used the broken `process.env = origEnv` reassignment pattern that
swaps the proxy reference; the underlying env stayed mutated and
leaked downstream. Fixed three call sites in that file and added a
narrow PATH-only global guardrail so a future polluter can't bring
the bug back. Killed: pair-agent-tunnel-eval (bun ENOENT),
security.test.ts > resolveBashBinary (Bun.which('bash') null),
server-no-import-side-effects (bun ENOENT).
* server-auth.test.ts: two `sliceBetween` markers referenced strings
deleted when sidebar-agent.ts was ripped — `'Sidebar agent started'`
→ `'Terminal agent started'`, `'Sidebar endpoints'` → `'Batch endpoint'`.
Also fixed the pair-agent BROWSE_PARENT_PID assertion (the literal
`serverEnv.BROWSE_PARENT_PID` never existed in source; the actual
contract is the object-literal `BROWSE_PARENT_PID: '0'` inside the
`const serverEnv` declaration).
* test/upgrade-migration-v1.test.ts: also overrides HOME in the spawn
env. The migration shells out to `${HOME}/.claude/skills/gstack/bin/gstack-config`
and a developer's real config with `explain_level` set causes the
script to take the "user already decided" branch and skip writing
the pending-prompt flag the test asserts on.
* test/setup-codesign.test.ts: replaced fragile `bun run build`
string-match (which hit a comment 700 lines later) with the actual
invocation `bun_cmd run build` used in the setup script.
Net: full suite is now green; CI no longer trips on bash/bun-ENOENT
from PATH pollution or on test markers that drifted with the codebase.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(terminal-agent): extract internalHandler<T> helper for /internal/* routes
Replaces the copy-pasted bearer-auth + X-Browse-Gen + req.json().then().catch()
boilerplate on /internal/grant and /internal/revoke with a single
internalHandler<T>(req, fn) wrapper. Future /internal/* routes added by the
v1.44 long-lived-sidebar work (/internal/lease-refresh, /internal/restart)
land as one-liners using the same helper. Pure refactor; no behavior change.
/internal/healthz stays on the bare checkInternalAuth gate because it's a
GET with no JSON body to parse — the helper's body-parse path would 400 it.
* browse/src/terminal-agent.ts — new internalHandler<T>; /internal/grant
+ /internal/revoke routed through it.
* browse/test/terminal-agent-internal-handler.test.ts — static-grep
tripwire that fails CI if the helper goes away or either of the two
refactored routes regresses to the old inline pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(terminal-agent): 25s WS keepalive ping/pong + client keepalive frames
PTY connections were dying silently after NAT idle timeouts (30-60s on most
home routers, even shorter on some carrier-grade NAT) and Chrome MV3 panel
suspension. Neither side noticed until the user's next keystroke produced
no output. Both sides now drive a 25s keepalive cycle.
Server side (browse/src/terminal-agent.ts):
* New ws.open handler constructs the PtySession eagerly and starts a
setInterval that sends `{type:"ping",ts:Date.now()}` every 25s.
Interval handle stored on session.pingInterval so close() can clear it.
* PtySession.pingInterval field added; cleared in ws.close before
disposeSession runs. Prevents timer leak across reconnects.
* Message handler accepts `{type:"ping"|"pong"|"keepalive"}` silently —
keepalive frames are a liveness signal at the TCP layer, no state to
update. Existing resize/tabSwitch/tabState handling unchanged.
* GSTACK_PTY_KEEPALIVE_INTERVAL_MS env knob (default 25000) lets the
upcoming e2e tests compress idle assertions without 30s waits.
Client side (extension/sidepanel-terminal.js):
* Belt-and-suspenders: client also runs a 25s setInterval that sends
`{type:"keepalive"}`. Defends against Chrome pausing our timers if
the server-side ping ever gets dropped (rare but possible in MV3).
* Ping reply: on `{type:"ping",ts}` from the server, immediately send
`{type:"pong",ts}`. Lets the agent observe round-trip latency for
free and confirms the channel is bidirectional.
* Interval cleared in three teardown paths: ws.close handler,
teardown(), forceRestart(). Three paths exist because the sidebar
can exit the LIVE state through any of them; all three must clean up
or we leak timers across reconnects.
Test (browse/test/terminal-agent-keepalive.test.ts):
* Static-grep tripwires for the 7-point protocol contract: agent has
a configurable interval, open() starts the ping, close() clears it,
message handler accepts keepalive vocabulary, client sends keepalive
+ replies pong, and all three client teardown paths clear the timer.
* Wire-level tests (actually observe a ping after 25s) belong in the
e2e tier — adding them here would either flake on slow CI or require
a real Bun.serve listener per test which we don't want to pay for
in the free tier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(sidebar): patient tryAutoConnect — poll forever with ascending status, abort only on 401
The 15s give-up message ("Browse server not ready. Reload sidebar to retry.")
fired on every cold start where the daemon took >15s to bind — common on
Conductor workspaces, CI runners, and any system under load. The user
already opened the sidebar; telling them to give up is the wrong default.
Now polls every 2s indefinitely with ascending status messages:
* 0 - 15s : silent (handles the happy path on a warm laptop)
* 15 - 60s : "Waiting for browse server..."
* 60s - 5m : "Still waiting — browse server may be slow to start."
* > 5m : "Browse server still not responding after 5 min. Try `$B status`."
Loop aborts on three signals only:
* state transitions out of IDLE (connect succeeded or user navigated)
* autoConnectAborted sticky flag set on unrecoverable error
* the panel itself unloading (browser handles this; pagehide cleanup
arrives with T8 of the larger plan)
401 from /pty-session sets the sticky flag with a clear "Auth invalid —
reload the sidebar or restart your gstack session." message. Without the
flag, the loop would re-call connect() every 2s and spam the same error;
with it, the user sees the message once and the loop holds. forceRestart()
clears the flag so clicking Restart is the explicit "try again" escape hatch.
Bumped poll interval 200ms → 2000ms — the legacy tight loop burned CPU
for no reason. 2s is plenty fast for a "did the daemon come up yet" check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(browse): terminal-agent watchdog with PID liveness + crash-loop guard
terminal-agent could die independently of the server — SIGKILL from the OS
OOM killer, an uncaught exception under PTY churn, an external `pkill` from
a sibling debugging session. Pre-v1.44 the sidebar would observe the broken
connection and stay broken until the user reloaded the sidebar. Now a 60s
ticker checks the recorded agent PID and respawns via the shared
spawnTerminalAgent helper when dead.
Identity-based liveness (T4 from the eng review):
* Uses readAgentRecord + isProcessAlive (signal 0 probe), not a name match.
* Slow-but-alive agents intentionally fall through — respawning around a
living agent would create split-brain (two agents writing the port
file, tokens diverging between them, mystery upgrade 401s).
* Pairs with the v1.44 generation counter in /internal/* loopback calls:
if a stale agent does come back to life mid-cycle, its X-Browse-Gen
no longer matches and the parent's calls 409 cleanly.
Crash-loop guard:
* 3 respawn attempts inside a rolling 60s window → stop trying. A daemon
up for a week with one crash a day shouldn't trip the guard.
* On trip: one-line error to console (`respawn guard tripped`) and the
watchdog goes dormant. Manual restart via the sidebar Restart button
is the explicit signal to re-arm (added in Commit 2 of the larger PR).
Shared spawn path (refactor):
* New spawnTerminalAgent(opts) in terminal-agent-control.ts handles:
prior-PID cleanup → spawn → record stash. Both the CLI cold-start path
in cli.ts and the new server.ts watchdog route through it. Removes the
copy-paste between them; future env wiring lands in one place.
Gated on cfg.ownsTerminalAgent — embedders that pre-launch their own PTY
server (gbrowser phoenix overlay) still own the full lifecycle.
GSTACK_AGENT_WATCHDOG_TICK_MS env knob compresses the 60s tick for e2e
tests without 60s waits per assertion.
Tests:
* browse/test/terminal-agent-watchdog.test.ts — 7 static-grep tripwires
for the load-bearing invariants (ownsTerminalAgent gate, PID-based
liveness, crash-loop guard with window pruning, shutdown cleanup,
CLI cold-start uses the same helper, env knob exists).
* Live process-kill tests belong in the e2e tier; cheaper invariants
here catch refactor regressions in ~1ms each.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(cli): opt-in outer supervisor — respawn browse server on crash
Pre-v1.44 `$B connect` was fire-and-forget: spawn server detached, CLI
exits, server runs unsupervised. If the server crashed (OOM, uncaught
exception, signal kill from a runaway debugger), the user had to notice,
re-run `$B connect`, and resume work. The v1.44 terminal-agent watchdog
recovers from one layer of failure; this commit closes the outer loop.
Opt-in via `--supervise` flag or `BROWSE_SUPERVISE=1` env. Default
behavior is unchanged — every existing caller (Claude Code's Bash tool,
scripts, CI) still gets a prompt return. When the flag is set:
* CLI stays attached, polls server PID every 30s via readState() +
isProcessAlive (same identity primitive as the terminal-agent watchdog).
* On unexpected exit: respawn via the same headed-mode startServer path
used initially, then re-spawn the terminal-agent so the PTY recovers
too (otherwise sidebar Restart is the only path back).
* Crash-loop guard: 5 respawns in a rolling 5-min window → exit 1 with
a clear error. Window pruning means a long-lived daemon with sporadic
crashes does NOT trip the guard (otherwise we punish the user for the
supervisor doing its job).
* Backoff: 1s, 2s, 4s, 8s, 30s capped. Env-overridable via
GSTACK_SUPERVISOR_BACKOFF for tests.
* SIGINT / SIGTERM: clean teardown — signals the supervised server
before exiting itself. Without this, Ctrl-C leaves an orphaned server.
Out of scope (deferred follow-up): routing the Chromium-disconnect
exit-code-1 path back through this supervisor. The terminal-agent
watchdog already covers the highest-frequency restart case; Chromium
crash recovery joins the queue as its own commit.
Test (browse/test/cli-supervisor.test.ts):
* 6 static-grep tripwires: opt-in default, signal wiring, crash-loop
guard with window pruning, backoff schedule env knob, tick interval
env knob, terminal-agent re-spawn after server respawn.
* Live respawn tests belong in the e2e tier (real spawn cycles take
3-8s each; spamming these in the free tier would balloon CI time).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(browse): pty-session-lease registry — stable sessionId + lease lifecycle
Foundation for Commit 2 of the long-lived-sidebar PR. Separates two
concerns that pre-v1.44 were conflated under one token:
* sessionId — stable, non-secret identifier for a single PTY session.
Safe to log, safe in URLs, safe in DevTools. Identifies "this terminal,"
not "you're allowed to use this terminal."
* lease — server-side bookkeeping that maps sessionId → expiresAt.
Re-attach within the lease window resumes the same PTY; expiry tears
it down.
The companion attach-token primitive (short-lived 30s bearer) reuses the
existing browse/src/pty-session-cookie.ts module unchanged — the lease
adds a name-space alongside, it doesn't replace anything.
Codex outside-voice (T1 of the eng review) flagged the original D4
"token IS sessionId" design as conflating identity with auth. The fix
is this lease registry: re-attach URLs carry the stable sessionId
(loggable), the short-lived attachToken stays out of logs.
API:
* mintLease() → { sessionId, expiresAt }
* validateLease(sessionId) → { ok: true, expiresAt } | { ok: false }
* refreshLease(sessionId) — validate-first, never resurrects expired
leases. Security-critical: the 30-min TTL is what bounds blast
radius for a leaked attachToken whose lease should have GC'd.
* revokeLease(sessionId) — explicit dispose path.
* leaseCount() — observability helper.
* __resetLeases() — test-only.
TTL env knob (GSTACK_PTY_LEASE_TTL_MS) lets v1.44 e2e tests compress
the detach window to 1s instead of waiting 30 minutes per assertion.
Server.ts wiring + /pty-session shape change + /pty-restart + /pty-dispose
+ /pty-session/reattach all land in subsequent commits in this branch.
Test (browse/test/pty-session-lease.test.ts):
* 8 cases pinning mint uniqueness, validate-first refresh contract,
revoke idempotency, null/undefined tolerance, and the negative case
that refresh never resurrects a revoked lease (same code path as
expired-and-pruned).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(terminal-agent): sessionId-aware grant + scoped restart + eager spawn
Wires the pty-session-lease primitive (
|