CalvinBackup/gstack - gstack - MS-GitHub-Backup (Gitea)

CalvinBackup/gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-05-27 21:43:05 +02:00

Author	SHA1	Message	Date
Garry Tan	920a13a17f	v1.44.0.0 feat: long-lived sidebar — keepalive, restart, re-attach, scrollback replay (#1678 ) * fix(browse): identity-based terminal-agent kill replaces pkill regex Commit 0 of the v1.44 long-lived-sidebar PR — foundation for the watchdog and removes a latent cross-session footgun. `pkill -f terminal-agent\.ts` (cli.ts spawn site + server.ts shutdown) matched by argv regex and would kill ANY process whose argv contained the string — sibling gstack sessions on the same host, an editor with the file open, a second `$B connect` run. Identity-based PID kill via a new helper module removes that whole class of bug. * New `browse/src/terminal-agent-control.ts`: `readAgentRecord`, `writeAgentRecord`, `clearAgentRecord`, `killAgentByRecord`. Validates PID liveness via `isProcessAlive` before signaling (PID-reuse defense). * `terminal-agent.ts` writes `<stateDir>/terminal-agent-pid` (JSON `{pid, gen, startedAt}`) at boot; clears on SIGTERM/SIGINT. * New per-boot `CURRENT_GEN` (16-byte random); `/internal/` callers can include `X-Browse-Gen` to defend against split-brain in the upcoming watchdog. Absent header is accepted (backward compat); mismatch returns 409. New `checkInternalAuth` helper centralizes bearer + gen checks. New `/internal/healthz` route — agent liveness probe used by the upcoming watchdog (returns pid/gen/sessions, no claude-binary lookup). * `cli.ts` and `server.ts` both call `killAgentByRecord` instead of pkill. * `ServerConfig.ownsTerminalAgent` JSDoc updated; the gated teardown now runs 4 side effects (was 3) — adds the new agent-record unlink. Test changes: * New `browse/test/terminal-agent-pid-identity.test.ts` — static-grep tripwire that fails CI if any source file re-introduces `pkill ... terminal-agent` or `spawnSync('pkill', ...)`; round-trips write/read/clear; verifies killAgentByRecord no-ops on dead PIDs. * `browse/test/server-embedder-terminal-port.test.ts` rewritten to intercept `process.kill` (not `child_process.spawnSync`); writes a sentinel agent-record with a guaranteed-dead PID; asserts probe-only (signal 0) calls, no termination signals; verifies all 3 discovery files including the new terminal-agent-pid. Closes TODOS.md P3 ("Identity-based terminal-agent kill"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tests): repair 7 pre-existing failures (env pollution + stale markers) All 7 failures existed on main before this branch — verified via `git stash` round-trip. Bundling them into the long-lived-sidebar PR because we kept tripping over them while running `bun test` to verify Commit 0. * Global afterEach restores `process.env.PATH` (new bunfig.toml + test-setup.ts). browser-skill-commands.test.ts sets `PATH = '/test/bin:/usr/bin'` to exercise a scrubbed-env fixture and used the broken `process.env = origEnv` reassignment pattern that swaps the proxy reference; the underlying env stayed mutated and leaked downstream. Fixed three call sites in that file and added a narrow PATH-only global guardrail so a future polluter can't bring the bug back. Killed: pair-agent-tunnel-eval (bun ENOENT), security.test.ts > resolveBashBinary (Bun.which('bash') null), server-no-import-side-effects (bun ENOENT). * server-auth.test.ts: two `sliceBetween` markers referenced strings deleted when sidebar-agent.ts was ripped — `'Sidebar agent started'` → `'Terminal agent started'`, `'Sidebar endpoints'` → `'Batch endpoint'`. Also fixed the pair-agent BROWSE_PARENT_PID assertion (the literal `serverEnv.BROWSE_PARENT_PID` never existed in source; the actual contract is the object-literal `BROWSE_PARENT_PID: '0'` inside the `const serverEnv` declaration). * test/upgrade-migration-v1.test.ts: also overrides HOME in the spawn env. The migration shells out to `${HOME}/.claude/skills/gstack/bin/gstack-config` and a developer's real config with `explain_level` set causes the script to take the "user already decided" branch and skip writing the pending-prompt flag the test asserts on. * test/setup-codesign.test.ts: replaced fragile `bun run build` string-match (which hit a comment 700 lines later) with the actual invocation `bun_cmd run build` used in the setup script. Net: full suite is now green; CI no longer trips on bash/bun-ENOENT from PATH pollution or on test markers that drifted with the codebase. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(terminal-agent): extract internalHandler<T> helper for /internal/* routes Replaces the copy-pasted bearer-auth + X-Browse-Gen + req.json().then().catch() boilerplate on /internal/grant and /internal/revoke with a single internalHandler<T>(req, fn) wrapper. Future /internal/* routes added by the v1.44 long-lived-sidebar work (/internal/lease-refresh, /internal/restart) land as one-liners using the same helper. Pure refactor; no behavior change. /internal/healthz stays on the bare checkInternalAuth gate because it's a GET with no JSON body to parse — the helper's body-parse path would 400 it. * browse/src/terminal-agent.ts — new internalHandler<T>; /internal/grant + /internal/revoke routed through it. * browse/test/terminal-agent-internal-handler.test.ts — static-grep tripwire that fails CI if the helper goes away or either of the two refactored routes regresses to the old inline pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(terminal-agent): 25s WS keepalive ping/pong + client keepalive frames PTY connections were dying silently after NAT idle timeouts (30-60s on most home routers, even shorter on some carrier-grade NAT) and Chrome MV3 panel suspension. Neither side noticed until the user's next keystroke produced no output. Both sides now drive a 25s keepalive cycle. Server side (browse/src/terminal-agent.ts): * New ws.open handler constructs the PtySession eagerly and starts a setInterval that sends `{type:"ping",ts:Date.now()}` every 25s. Interval handle stored on session.pingInterval so close() can clear it. * PtySession.pingInterval field added; cleared in ws.close before disposeSession runs. Prevents timer leak across reconnects. * Message handler accepts `{type:"ping"\|"pong"\|"keepalive"}` silently — keepalive frames are a liveness signal at the TCP layer, no state to update. Existing resize/tabSwitch/tabState handling unchanged. * GSTACK_PTY_KEEPALIVE_INTERVAL_MS env knob (default 25000) lets the upcoming e2e tests compress idle assertions without 30s waits. Client side (extension/sidepanel-terminal.js): * Belt-and-suspenders: client also runs a 25s setInterval that sends `{type:"keepalive"}`. Defends against Chrome pausing our timers if the server-side ping ever gets dropped (rare but possible in MV3). * Ping reply: on `{type:"ping",ts}` from the server, immediately send `{type:"pong",ts}`. Lets the agent observe round-trip latency for free and confirms the channel is bidirectional. * Interval cleared in three teardown paths: ws.close handler, teardown(), forceRestart(). Three paths exist because the sidebar can exit the LIVE state through any of them; all three must clean up or we leak timers across reconnects. Test (browse/test/terminal-agent-keepalive.test.ts): * Static-grep tripwires for the 7-point protocol contract: agent has a configurable interval, open() starts the ping, close() clears it, message handler accepts keepalive vocabulary, client sends keepalive + replies pong, and all three client teardown paths clear the timer. * Wire-level tests (actually observe a ping after 25s) belong in the e2e tier — adding them here would either flake on slow CI or require a real Bun.serve listener per test which we don't want to pay for in the free tier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(sidebar): patient tryAutoConnect — poll forever with ascending status, abort only on 401 The 15s give-up message ("Browse server not ready. Reload sidebar to retry.") fired on every cold start where the daemon took >15s to bind — common on Conductor workspaces, CI runners, and any system under load. The user already opened the sidebar; telling them to give up is the wrong default. Now polls every 2s indefinitely with ascending status messages: * 0 - 15s : silent (handles the happy path on a warm laptop) * 15 - 60s : "Waiting for browse server..." * 60s - 5m : "Still waiting — browse server may be slow to start." * > 5m : "Browse server still not responding after 5 min. Try `$B status`." Loop aborts on three signals only: * state transitions out of IDLE (connect succeeded or user navigated) * autoConnectAborted sticky flag set on unrecoverable error * the panel itself unloading (browser handles this; pagehide cleanup arrives with T8 of the larger plan) 401 from /pty-session sets the sticky flag with a clear "Auth invalid — reload the sidebar or restart your gstack session." message. Without the flag, the loop would re-call connect() every 2s and spam the same error; with it, the user sees the message once and the loop holds. forceRestart() clears the flag so clicking Restart is the explicit "try again" escape hatch. Bumped poll interval 200ms → 2000ms — the legacy tight loop burned CPU for no reason. 2s is plenty fast for a "did the daemon come up yet" check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(browse): terminal-agent watchdog with PID liveness + crash-loop guard terminal-agent could die independently of the server — SIGKILL from the OS OOM killer, an uncaught exception under PTY churn, an external `pkill` from a sibling debugging session. Pre-v1.44 the sidebar would observe the broken connection and stay broken until the user reloaded the sidebar. Now a 60s ticker checks the recorded agent PID and respawns via the shared spawnTerminalAgent helper when dead. Identity-based liveness (T4 from the eng review): * Uses readAgentRecord + isProcessAlive (signal 0 probe), not a name match. * Slow-but-alive agents intentionally fall through — respawning around a living agent would create split-brain (two agents writing the port file, tokens diverging between them, mystery upgrade 401s). * Pairs with the v1.44 generation counter in /internal/* loopback calls: if a stale agent does come back to life mid-cycle, its X-Browse-Gen no longer matches and the parent's calls 409 cleanly. Crash-loop guard: * 3 respawn attempts inside a rolling 60s window → stop trying. A daemon up for a week with one crash a day shouldn't trip the guard. * On trip: one-line error to console (`respawn guard tripped`) and the watchdog goes dormant. Manual restart via the sidebar Restart button is the explicit signal to re-arm (added in Commit 2 of the larger PR). Shared spawn path (refactor): * New spawnTerminalAgent(opts) in terminal-agent-control.ts handles: prior-PID cleanup → spawn → record stash. Both the CLI cold-start path in cli.ts and the new server.ts watchdog route through it. Removes the copy-paste between them; future env wiring lands in one place. Gated on cfg.ownsTerminalAgent — embedders that pre-launch their own PTY server (gbrowser phoenix overlay) still own the full lifecycle. GSTACK_AGENT_WATCHDOG_TICK_MS env knob compresses the 60s tick for e2e tests without 60s waits per assertion. Tests: * browse/test/terminal-agent-watchdog.test.ts — 7 static-grep tripwires for the load-bearing invariants (ownsTerminalAgent gate, PID-based liveness, crash-loop guard with window pruning, shutdown cleanup, CLI cold-start uses the same helper, env knob exists). * Live process-kill tests belong in the e2e tier; cheaper invariants here catch refactor regressions in ~1ms each. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(cli): opt-in outer supervisor — respawn browse server on crash Pre-v1.44 `$B connect` was fire-and-forget: spawn server detached, CLI exits, server runs unsupervised. If the server crashed (OOM, uncaught exception, signal kill from a runaway debugger), the user had to notice, re-run `$B connect`, and resume work. The v1.44 terminal-agent watchdog recovers from one layer of failure; this commit closes the outer loop. Opt-in via `--supervise` flag or `BROWSE_SUPERVISE=1` env. Default behavior is unchanged — every existing caller (Claude Code's Bash tool, scripts, CI) still gets a prompt return. When the flag is set: * CLI stays attached, polls server PID every 30s via readState() + isProcessAlive (same identity primitive as the terminal-agent watchdog). * On unexpected exit: respawn via the same headed-mode startServer path used initially, then re-spawn the terminal-agent so the PTY recovers too (otherwise sidebar Restart is the only path back). * Crash-loop guard: 5 respawns in a rolling 5-min window → exit 1 with a clear error. Window pruning means a long-lived daemon with sporadic crashes does NOT trip the guard (otherwise we punish the user for the supervisor doing its job). * Backoff: 1s, 2s, 4s, 8s, 30s capped. Env-overridable via GSTACK_SUPERVISOR_BACKOFF for tests. * SIGINT / SIGTERM: clean teardown — signals the supervised server before exiting itself. Without this, Ctrl-C leaves an orphaned server. Out of scope (deferred follow-up): routing the Chromium-disconnect exit-code-1 path back through this supervisor. The terminal-agent watchdog already covers the highest-frequency restart case; Chromium crash recovery joins the queue as its own commit. Test (browse/test/cli-supervisor.test.ts): * 6 static-grep tripwires: opt-in default, signal wiring, crash-loop guard with window pruning, backoff schedule env knob, tick interval env knob, terminal-agent re-spawn after server respawn. * Live respawn tests belong in the e2e tier (real spawn cycles take 3-8s each; spamming these in the free tier would balloon CI time). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(browse): pty-session-lease registry — stable sessionId + lease lifecycle Foundation for Commit 2 of the long-lived-sidebar PR. Separates two concerns that pre-v1.44 were conflated under one token: * sessionId — stable, non-secret identifier for a single PTY session. Safe to log, safe in URLs, safe in DevTools. Identifies "this terminal," not "you're allowed to use this terminal." * lease — server-side bookkeeping that maps sessionId → expiresAt. Re-attach within the lease window resumes the same PTY; expiry tears it down. The companion attach-token primitive (short-lived 30s bearer) reuses the existing browse/src/pty-session-cookie.ts module unchanged — the lease adds a name-space alongside, it doesn't replace anything. Codex outside-voice (T1 of the eng review) flagged the original D4 "token IS sessionId" design as conflating identity with auth. The fix is this lease registry: re-attach URLs carry the stable sessionId (loggable), the short-lived attachToken stays out of logs. API: * mintLease() → { sessionId, expiresAt } * validateLease(sessionId) → { ok: true, expiresAt } \| { ok: false } * refreshLease(sessionId) — validate-first, never resurrects expired leases. Security-critical: the 30-min TTL is what bounds blast radius for a leaked attachToken whose lease should have GC'd. * revokeLease(sessionId) — explicit dispose path. * leaseCount() — observability helper. * __resetLeases() — test-only. TTL env knob (GSTACK_PTY_LEASE_TTL_MS) lets v1.44 e2e tests compress the detach window to 1s instead of waiting 30 minutes per assertion. Server.ts wiring + /pty-session shape change + /pty-restart + /pty-dispose + /pty-session/reattach all land in subsequent commits in this branch. Test (browse/test/pty-session-lease.test.ts): * 8 cases pinning mint uniqueness, validate-first refresh contract, revoke idempotency, null/undefined tolerance, and the negative case that refresh never resurrects a revoked lease (same code path as expired-and-pruned). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(terminal-agent): sessionId-aware grant + scoped restart + eager spawn Wires the pty-session-lease primitive (`3aada48b`) into terminal-agent so the Commit 2 work in server.ts (next commit) can route /pty-restart and re-attach by session identity rather than by single-use token. Changes: * validTokens: Set<string> → Map<string, string\|null>. Each grant carries its bound sessionId (or null for legacy single-grant callers). On WS upgrade, the agent surfaces the bound sessionId via ws.data so open() can register the session in the new reverse index. * sessionsById: Map<sessionId, PtySession> — populated in open(), cleared in close(). Required so /internal/restart can find and dispose one specific session by id rather than enumerating all live sessions. * /internal/restart: scoped to one sessionId. Codex T2 of the eng review caught the gap — pre-spec the route would have disposed every PTY on the agent, breaking pair-agent and any future multi-sidebar setup. The body now requires `{sessionId}`; missing or unknown id returns `{killed: 0}` and leaves siblings alone. * maybeSpawnPty(ws, session): hoisted from the inline binary-frame spawn block so both the legacy "spawn on first keystroke" trigger AND the new `{type:"start"}` text-frame trigger land in the same code path. Idempotent on session.spawned. * `{type:"start"}` text frame: explicit spawn trigger. forceRestart (extension side, lands in Commit 2C) sends this immediately on every fresh WS so claude boots without requiring a keystroke. Pre-v1.44 the lazy-binary-spawn pattern made the restart feel stuck. * close(ws): drops the sessionsById entry alongside the existing sessions WeakMap + validTokens cleanup. Commit 3 will revisit this to keep the session alive for a 60s detach window before disposing. Test (browse/test/terminal-agent-session-routing.test.ts): * 8 static-grep tripwires pinning the load-bearing properties: validTokens is a Map (not Set), sessionsById exists, /internal/restart is scoped (negative-assert against enumerate-all patterns), WS upgrade plumbs sessionId, maybeSpawnPty is the single spawn entry, close() drops the index. Live spawn cycles belong in the e2e tier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(server): /pty-session 4-tuple + /pty-restart + /pty-dispose + lease-refresh Wires the lease + attachToken model end-to-end on the server side. The client side (extension) lands in the next commit; agent side already shipped in `449144cd`. Routes: * POST /pty-session — mints sessionId (stable, loggable) + lease (server-side bookkeeping) + attachToken (short-lived bearer for the WS upgrade). Returns the 4-tuple in one round trip. Legacy ptySessionToken / expiresAt aliases kept for one minor release so extensions on the v1.43 wire shape keep working. * POST /pty-session/reattach — validates a sessionId's lease and mints a FRESH attachToken bound to the same sessionId. Used by Commit 3's re-attach loop; 410 Gone when the lease has expired so the client knows to fall back to a brand-new /pty-session. * POST /pty-restart — one transaction: dispose the caller's existing PtySession on the agent (via /internal/restart, scoped to one sessionId — codex T2), revoke the old lease, mint a fresh sessionId + lease + attachToken, return the 4-tuple. Zero race window between kill and mint (codex T2 + D8 of the eng review). * POST /pty-dispose — explicit teardown. sendBeacon-compatible: accepts auth token in the body so the extension's pagehide handler (Commit 2C) can fire it without setting custom headers (sendBeacon doesn't support those). Without this route, every clean browser quit leaves a zombie PTY alive for the 60s detach window — codex T3 caught it. * POST /internal/lease-refresh — loopback from terminal-agent on its 25s keepalive cycle (lazy: only when lease is within 5 min of expiry). Refreshes the lease AND resets the daemon idle timer. T6 of the eng review: PTY activity (not arbitrary SSE consumers) is what keeps the daemon alive when the sidebar is in use. Helpers: * grantPtyToken now accepts optional sessionId and passes it through to the agent's /internal/grant body. The agent binds token → sessionId in its validTokens Map so /ws upgrades carry the sessionId for /internal/restart and Commit 3 re-attach lookups. * restartPtySession() — new loopback helper that POSTs the agent's scoped /internal/restart with a sessionId body. Used by /pty-restart and /pty-dispose. Auth contract on /pty-dispose deliberately accepts the auth token in EITHER the Authorization header OR the request body. The body path is required for sendBeacon (which can't set custom headers); the header path stays available for non-beacon callers and tests. Test (browse/test/server-pty-lease-routes.test.ts): * 7 static-grep tripwires pinning the 4-tuple shape, validate-first re-attach with 410 fallback, one-transaction restart semantics, sendBeacon-compatible dispose auth, and the T6 PTY-only idle reset. * Live route exercises (full mint + grant + WS upgrade cycle) belong in the e2e tier — they require a real terminal-agent loopback and take seconds per assertion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(sidebar): forceRestart via /pty-restart + pagehide /pty-dispose Closes the Commit 2 loop: server-side lease + restart routes shipped in 25ef24e9; this commit wires the extension client to use them. End-to-end result — clicking Restart now actually kills the server's PTY before opening a new WS (zero race window), and closing the sidebar / quitting the browser disposes the PTY immediately instead of letting it linger for the upcoming 60s detach window. sidepanel-terminal.js: * mintSession callers read the v1.44 4-tuple (sessionId + attachToken) from /pty-session, with a backward-compat fallback to ptySessionToken so a partially-updated extension still works against a fresh server for one minor release. * Eager spawn via {type:"start"} text frame replaces the legacy `TextEncoder().encode("\n")` newline hack. Pre-v1.44, the lazy-binary- spawn pattern made forceRestart look stuck until the user typed — now claude boots before the prompt renders. * forceRestart() rewritten as an async one-transaction handler: 1. close current WS with code 4001 (intentional-restart) 2. POST /pty-restart with priorSessionId so the server can scope the dispose, then mint fresh sessionId + lease + attachToken in the same response 3. Open new WS with the returned attachToken, send {type:"start"} immediately for eager spawn 4. On 401: sticky-abort the auto-connect loop (no spam) 5. On 503 / network failure: fall back to patient autoconnect * currentSessionId tracked and exposed on window.gstackPtySession so sidepanel.js's pagehide handler can sendBeacon the dispose. sidepanel.js: * New pagehide handler fires navigator.sendBeacon('/pty-dispose', {sessionId, authToken}) on tab close, panel close, browser quit, or extension reload. sendBeacon-compatible: auth token rides in the body since sendBeacon can't set custom headers (server route accepts body-auth per `25ef24e9`). * try/catch around the entire body so a sendBeacon failure can't interfere with the browser's unload sequence — the 60s detach window from Commit 3 catches anything we miss. There's bounded duplication between connect() and forceRestart() (~70 lines of WS attach/handler wiring). Extracting a shared helper is a clean follow-up but out of scope for the v1.44 ship — both paths are exercised by the same e2e test. Test (browse/test/sidepanel-restart-dispose.test.ts): * 9 static-grep tripwires pinning the 4-tuple parse, eager spawn, close-code 4001 contract, /pty-restart wire shape, sticky-abort 401 path, sessionId window plumbing, sendBeacon body contract, and the best-effort try/catch around pagehide. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(terminal-agent): scrollback ring buffer + detach state machine + re-attach The agent side of Commit 3 — the "magic" feature. A network blip (wifi hiccup, MV3 panel suspend, brief Chromium pause) now silently reconnects the sidebar to the SAME claude session with scrollback intact. No more "Session ended" message + manual Restart click + losing your tool-call output. Server-side /pty-session/reattach (`25ef24e9`) and the extension re-attach loop (next commit) close the loop end-to-end. Ring buffer (T10): * Per-session frames: Buffer[] capped at 1 MB (env-overridable via GSTACK_PTY_RING_BUFFER_BYTES). Each PTY write is one frame, so eviction is at frame boundaries and never cuts a UTF-8 sequence or ANSI CSI in half. * appendToRingBuffer eviction loop keeps at least one frame even at extreme caps — a single oversized frame can't empty the buffer. * Alt-screen tracking via canonical xterm CSI ?1049h / CSI ?1049l sequences. lastIndexOf comparison so trailing state wins when both appear in one render frame (quick tool-call open+close). Replay payload (T5 — codex outside-voice): * buildReplayPayload prefixes DECSTR soft reset (\x1b[!p) and conditionally re-enters alt-screen if claude was in a tool call at detach. The client writes RIS (\x1bc) FIRST to clear pre-blip xterm content; the server's prelude resets character attributes; the ring buffer replays cleanly on top. * Order is enforced by the {type:"reattach-begin"} text frame the agent sends right before the binary replay — client waits for it, writes RIS, then treats the next binary frame as the replay payload. Detach state machine (T9): * PtySession.liveWs decouples the PTY callback from the original ws closure. On re-attach, swapping session.liveWs is enough — the on-data callback writes to the new ws automatically. * close(ws, code, _reason): codes 4001 (intentional restart), 4404 (no-claude), and 1000 (clean exit) trigger immediate dispose. Anything else (1006 abnormal, 1001 going-away from network blip / panel suspend) starts a 60s detach timer instead. claude keeps running, output keeps accumulating in the ring buffer. * Detach timer is unref'd so the bun process can still exit cleanly on natural shutdown. * Sessions without a sessionId (legacy single-shot grants) can't re-attach by definition — those fall through to immediate dispose. Re-attach lookup (T9): * WS open() checks sessionsById[sessionId] FIRST. If a detached session is sitting there, cancel its detach timer, swap liveWs, rebind the WS-keyed map, restart keepalive, send reattach-begin + replay payload. The PTY process is unchanged. * /internal/restart now cancels any pending detach timer before disposal — otherwise the timer would later try to dispose an already-disposed session. Env knobs for e2e: * GSTACK_PTY_RING_BUFFER_BYTES — compress to 256 for eviction tests. * GSTACK_PTY_DETACH_WINDOW_MS — compress to 1000 for "did the timer fire?" tests without waiting a minute per assertion. Tests: * browse/test/terminal-agent-detach-reattach.test.ts — 10 static-grep tripwires for the load-bearing properties: interface shape, env knobs, eviction floor, alt-screen tracking, replay prelude composition, re-attach lookup, close-code routing, detach timer unref, /internal/restart timer cancellation, on-data through session.liveWs. * browse/test/terminal-agent-session-routing.test.ts test 7 widened to match the new close(ws, code, _reason) signature. * browse/test/terminal-agent-keepalive.test.ts test 3 widened similarly. Both stay regressions for the prior contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(sidebar): silent re-attach with scrollback replay (Commit 3 client side) Closes the v1.44 long-lived-sidebar loop end-to-end. When the WS dies for a transient reason (wifi blip, MV3 panel suspend, brief Chromium pause), the sidebar now silently re-attaches to the SAME claude session inside the server's 60s detach window. Scrollback replays cleanly; the user keeps typing without noticing anything happened. State machine: * New STATE.RECONNECTING covers the in-flight re-attach window. setState transitions out of this state reset reattachInFlight so a concurrent user action (Restart click, panel navigate) short-circuits cleanly. * Backoff schedule REATTACH_BACKOFF_MS = [1000, 2000, 4000, 8000] then 8s steady until REATTACH_WINDOW_MS (60s) elapses. Past that point the server has disposed our session and /pty-session/reattach returns 410 Gone. startReattachLoop(prevSessionId): * Posts /pty-session/reattach with sessionId. * On 200 with a valid 4-tuple, opens the post-reattach WS directly. * On 410 (lease expired) — short-circuits to ENDED. No retry; the user clicks Restart for a fresh session. * On 401 — sticky-aborts the auto-connect loop. Same defense as `25ef24e9` so we don't spam "Auth invalid" every 2s. * On network failure or other non-OK status — schedules the next backoff tick. openReattachWebSocket(terminalPort, attachToken, sessionId): * Mostly a clone of connect()'s attach wiring. Reuses the live xterm element — RIS clears the buffer cleanly when the agent's {type:"reattach-begin"} arrives, so the visual flash is minimal. * Handshake: on `{type:"reattach-begin"}` text frame → write `\x1bc` (RIS) to xterm + set nextBinaryIsReplay = true. The next binary frame IS the server-built replay payload (DECSTR soft-reset prefix + optional alt-screen re-enter + ring buffer contents). * If THIS reattach WS also dies uncleanly, recurses into another re-attach loop with the same sessionId — the server's detach window may still be open. State guard prevents runaway recursion. connect() + forceRestart() close handlers (existing): * Both updated to call startReattachLoop on transient close codes (anything other than 1000 / 4001 / 4404). Was just setState(ENDED). * Clean codes still bypass — re-attaching to a force-restart's pre-restart session would be the bug we're avoiding. Test (browse/test/sidepanel-reattach.test.ts): * 8 static-grep tripwires for the load-bearing properties: state constant, backoff schedule, /pty-session/reattach wiring, 410 short-circuit (no retry past lease window), 401 sticky-abort, reattach-begin → RIS handshake, all three close handlers route through the loop, clean-code bypass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.44.0.0) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(terminal-agent): runtime tests for ring buffer + replay + alt-screen tracking Companion to browse/test/terminal-agent-detach-reattach.test.ts (static-grep tripwires) — calls appendToRingBuffer + buildReplayPayload directly to prove behavioral correctness without spinning up a real Bun.serve listener. * 11 runtime cases: append + byte counting, oversize eviction with one-frame floor (the eviction loop guard that prevents an oversized single frame from emptying the buffer), alt-screen tracking via canonical xterm CSI ?1049h / CSI ?1049l, trailing-state-wins for enter+exit pairs inside a single render frame, soft-reset prefix ordering, optional alt-screen re-enter, payload length math. * Exports appendToRingBuffer, buildReplayPayload, and the PtySession interface from terminal-agent.ts (purely for testability — they were module-private; the change is annotation-only). * Lease registry sanity check: mint two sessions, verify distinct sessionIds, both valid simultaneously. Catches future refactors that accidentally couple lease + ring buffer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tests): explain_level unset returns the documented default, not empty Pre-existing failure on main — the test expected gstack-config to return "" for an unset explain_level (with the comment "preamble default takes over"), but the script at bin/gstack-config:103 explicitly returns "default" inline for that key. Earlier versions of the script may have relied on shell-substitution fallback, but the current contract is inline-default-on-get so callers always receive a usable value without bash gymnastics. Updated the test to match the actual contract. Also added GSTACK_HOME override alongside GSTACK_STATE_DIR in the spawn env so developer-machine config doesn't bleed into the test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-24 01:43:51 -07:00
Garry Tan	7ca04d8ef0	v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594 ) * fix(gstack-paths): guard CLAUDE_PLUGIN_DATA against cross-plugin contamination (#1569) gstack-paths previously trusted CLAUDE_PLUGIN_DATA as a fallback for GSTACK_STATE_ROOT whenever GSTACK_HOME was unset. When another plugin (e.g. Codex) persists its own CLAUDE_PLUGIN_DATA into the session env via CLAUDE_ENV_FILE, gstack picked it up and wrote checkpoints, analytics, and learnings into that plugin's directory. Anyone with the Codex plugin installed alongside gstack hit this silently. Fix: guard the CLAUDE_PLUGIN_DATA branch so it only fires when CLAUDE_PLUGIN_ROOT confirms we're running as the gstack plugin (path contains "gstack"). Skill installs fall through to \$HOME/.gstack. Contributed by @ElliotDrel via #1570. Closes #1569. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gbrain-sync): sourceLocalPath handles wrapped {sources:[...]} shape from gbrain v0.20+ gbrain v0.20+ changed `gbrain sources list --json` to return {sources: [...]} instead of a flat array. sourceLocalPath crashed upstream with `list.find is not a function` on every /sync-gbrain invocation against modern gbrain. Accept both shapes for forward/backward compat, matching probeSource/sourcePageCount in lib/gbrain-sources.ts. Contributed by @jakehann11 via #1571. Closes #1567. Supersedes #1564 (@tonyjzhou, same fix, different shape — credit retained). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(brain-context-load): probe gbrain via execFile, not shell builtin (#1559) gbrainAvailable() used `execFileSync("command", ["-v", "gbrain"])`, which fails in any environment where the `command` builtin isn't on the spawned process's PATH (most non-interactive shells). The probe then reported gbrain as missing even when it was installed, and context-load silently skipped vector/list queries. Fix: probe `gbrain --version` directly with a 500ms timeout (matching the rest of the file's MCP_TIMEOUT_MS). Same semantics, works everywhere execFile works. Contributed by @jbetala7 via #1560. Closes #1559. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(gbrain-doctor): pin schema_version:2 doctor parse path (#1418) Adds an exec-path regression test that runs a fake gbrain shim emitting the v0.25+ doctor JSON shape (schema_version: 2, status: "warnings", exit 1 for health_score < 100, no top-level `engine` field). Confirms freshDetectEngineTier recovers stdout from the non-zero exit and falls back to GBRAIN_HOME/config.json for the engine label. The pre-existing test for #1415 only stripped gbrain from PATH; this test exercises the actual doctor parse path, closing the gap that codex's plan review flagged. Also documents the schema_version separation in lib/gbrain-local-status.ts: the local CacheEntry stays at version 1, distinct from the doctor-output schema_version which we accept across versions in gstack-memory-helpers. Closes #1418 (credit @mvanhorn for surfacing the doctor + schema_v2 collapse). The fix landed pre-emptively in v1.29.x; this commit pins it with a stronger test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(memory-ingest): pin put_page regression + scrub stale name from --help and comments (#1346) #1346 reported that gstack-memory-ingest still called the renamed gbrain put_page subcommand on gbrain v0.18+. The actual code migrated to `gbrain put` and later to batch `gbrain import <dir>` before this report landed — only documentation lag remained. This commit: - Updates the --help string ("Skip gbrain put calls (still updates state file)") so user-facing docs match the shipped subcommand - Updates two inline comments that still referenced the old name - Adds test/memory-ingest-no-put_page.test.ts: a regression pin that strips comments from bin/gstack-memory-ingest.ts and fails the build if "put_page" appears in any active code or string literal, plus a sanity check that the file still calls a supported gbrain page-write verb (put or import) Closes #1346. Reporter @kylma-code surfaced the doc lag; the original code migration credit is on the v1.27.x wave. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(resolvers): rewrite all gbrain put_page instructions to canonical put <slug> scripts/resolvers/gbrain.ts emitted user-facing copy-paste instructions using the renamed `gbrain put_page` subcommand across 10 skills (office-hours, investigate, plan-ceo-review, retro, plan-eng-review, ship, cso, design-consultation, fallback, entity-stub). Every gstack user copying those snippets hit "unknown command: put_page" on gbrain v0.18+. This commit: - Rewrites all 10 instruction templates to use `gbrain put <slug> --content "$(cat <<EOF...EOF)"` with title/tags moved into YAML frontmatter inside --content, matching the v0.18+ subcommand shape - Updates README.md and USING_GBRAIN_WITH_GSTACK.md "common commands" table to reference `gbrain put` and `gbrain get` - Adds test/resolvers-gbrain-put-rewrite.test.ts pinning two invariants: (a) resolver source ships only canonical instructions, (b) every tracked SKILL.md file is free of `gbrain put_page` CHANGELOG entries are deliberately left untouched (historical record). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build): extract package.json build to scripts/build.sh for Windows Bun compat (#1538, #1537, #1530, #1457, #1561) Bun's Windows shell parser rejects multiple constructs the inline package.json build chain used: brace groups `{ cmd; }`, subshells with redirection `( git ... ) > path/.version`, and (in Bun 1.3.x) subshells near redirections in general. Every Windows install + every auto-upgrade since v1.34.2.0 has failed on `bun run build`. Extracts the build chain to scripts/build.sh and the .version writes to scripts/write-version-files.sh. POSIX-portable, no Bun shell parsing involved. Also adds Windows-specific bun.exe handling for non-ASCII PATHs (a separate Windows footgun where Bun's --compile fails when the binary lives under a path with non-ASCII chars). Updates test/build-script-shell-compat.test.ts to assert the new shape: no subshells with redirections anywhere in the build chain, and build delegates to scripts/build.sh which delegates .version writes. Contributed by @Charlie-El via #1544. Supersedes #1531 (@scarson, fixed in build helper), #1480 (@mikepsinn, partial overlap), #1460 (@realcarsonterry, brace-group fix subsumed) — credit retained. Closes #1538, #1537, #1530, #1457, #1561. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(windows): .exe glob in .gitignore + .exe extension resolution in find-browse (#1554) bun build --compile on Windows appends .exe to the output filename, producing browse.exe instead of browse. find-browse's existsSync probe only checked the bare path and returned null on Windows even when the binary was correctly built. .gitignore similarly only excluded the bare bin/gstack-global-discover path, leaving the .exe variant tracked. This commit: - .gitignore: changes `bin/gstack-global-discover` → `bin/gstack-global-discover` so the Windows .exe variant is ignored - browse/src/find-browse.ts: adds isExecutable + findExecutable helpers that fall back to .exe/.cmd/.bat probing on Windows, mirroring the same helper already in make-pdf/src/browseClient.ts and pdftotext.ts Contributed by @Mike-E-Log via #1554. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> ci(windows): add fresh-install E2E gate that runs bun run build on windows-latest Adds .github/workflows/windows-setup-e2e.yml as the gate that catches Bun shell-parser regressions in the build chain before they reach users. Triggers on PRs touching package.json, scripts/build.sh, scripts/write-version-files.sh, setup, browse cli/find-browse, or gstack-paths. What it verifies: 1. bun run build completes on Windows (the previously-broken path that #1538/#1537/#1530/#1457/#1561 reported) 2. All compiled binaries land on disk (browse.exe, find-browse.exe, design.exe, gstack-global-discover.exe) 3. find-browse resolves to the .exe variant on Windows (regression gate for #1554) 4. gstack-paths returns non-empty GSTACK_STATE_ROOT/PLAN_ROOT/TMP_ROOT on Windows (regression gate for #1570) Complements the existing windows-free-tests.yml (curated unit subset); this new workflow exercises the install path itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(codex): move diff scope into prompt instead of --base (Codex CLI 0.130+ argv conflict) (#1209) Codex CLI ≥ 0.130.0 rejects passing a custom prompt and --base together (mutually exclusive at argv level). Every /codex review, /review, and /ship structured Codex review call ended with an argv error before the model ran. Fix: scope the diff in prompt text using "Run git diff origin/<base>...HEAD 2>/dev/null \|\| git diff <base>...HEAD" instead of `--base <base>`. Preserves the filesystem boundary instruction across all invocations and keeps Codex's review prompt tuning. Touches: - codex/SKILL.md.tmpl + regenerated codex/SKILL.md - scripts/resolvers/review.ts + regenerated review/SKILL.md, ship/SKILL.md - test/gen-skill-docs.test.ts: new regression that fails if any of the five known files still contain the prompt+--base shape - test/skill-validation.test.ts: corresponding negative + positive pin on the rendered SKILL.md files Contributed by @jbetala7 via #1209. Closes #1479. Supersedes #1527 (@mvanhorn — same intent, different patch shape, CONFLICTING) and #1449 (@Gujiassh — broader refactor, CONFLICTING). Credit retained in CHANGELOG. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): diff from git merge-base, not git diff origin/<base> (#1492) git diff origin/<base> shows everything since the common ancestor in both directions — it includes commits that landed on origin/<base> after this branch was created as deletions. That made /review and /ship's pre-landing structured review report inflated diff totals and flagged "removed" code that was actually still present in the working tree. Fix: compute DIFF_BASE via git merge-base origin/<base> HEAD and diff the working tree against that point. Same coverage of uncommitted edits, no phantom deletions from out-of-order base advancement. Applies to /review's Step 1 (diff existence check), Step 3 (get the diff), the build-on-intent scope-creep check, the structured review DIFF_INS/DIFF_DEL stats, and the Claude adversarial subagent prompt. Same change flows into ship/SKILL.md via the shared resolver. Touches: - review/SKILL.md.tmpl + regenerated review/SKILL.md, ship/SKILL.md - scripts/resolvers/review.ts - scripts/resolvers/review-army.ts Contributed by @mvanhorn via #1492. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(codex): pin filesystem-boundary preservation across all codex review surfaces (#1503, #1522) #1503 reported that the bare codex review --base path stripped the filesystem boundary instruction, letting Codex spend tokens reading .claude/skills/ and agents/. #1522 proposed adding a skill-path detector that switched to the custom-instructions route when the diff touched skill files. After C10 (#1209) restructured codex review to always carry the boundary in the prompt (the prompt+--base argv conflict forced the restructure), the skill-path detector becomes redundant — every default call already preserves the boundary. This commit pins the post-#1209 invariant with a test that fails the build if any future refactor strips the boundary from codex/SKILL.md, review/SKILL.md, or ship/SKILL.md. Closes #1503 by regression test. #1522 (@genisis0x) is superseded by #1209 (the prompt rewrite covers its safety concern); credit retained in CHANGELOG. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(skills): use command -v instead of which for codex detection (#1197) `which` is not on PATH in every shell — some Windows shells, BusyBox- only containers, and minimal CI images all fail when skills probe codex availability via `which codex`. `command -v` is a POSIX builtin and always available where the skill is running. Touched: - codex/SKILL.md.tmpl: CODEX_BIN=$(command -v codex \|\| echo "") - scripts/resolvers/review.ts and scripts/resolvers/design.ts: 3 + 3 sites each rewritten to `command -v codex >/dev/null 2>&1` - Regenerated all 10 affected SKILL.md files (codex, review, ship, design-consultation, design-review, office-hours, plan-ceo-review, plan-design-review, plan-devex-review, plan-eng-review) - test/skill-validation.test.ts: updated pin + defensive regression test that fails if `which codex` returns to codex/SKILL.md - test/skill-e2e-plan.test.ts: updated summary regex Contributed by @mvanhorn via #1197. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(codex): surface non-zero exits so wrappers stop reading as silent stalls (#1467, #1327) When codex exits non-zero (parse errors, arg-shape breaks, model API errors that propagate as non-zero status), the calling agent previously saw an empty output and burned 30-60 minutes misdiagnosing as a silent model/API stall. The hang-detection block only caught exit 124 (the timeout-wrapper signal). Adds elif blocks in all four codex invocation sites (Review default, Challenge, Consult new-session, Consult resume) that: - Echo "[codex exit N] <stderr first line>" to stdout - Indent the first 20 stderr lines for inline context - Log codex_nonzero_exit telemetry tagged with the call site Contributed by @genisis0x via #1467. Closes #1327. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(design): disclose OpenAI key source + warn on cwd .env match (#1278, closes #1248) The design binary previously called process.env.OPENAI_API_KEY without checking where the key came from. If a user ran $D inside someone else's project that had OPENAI_API_KEY in its .env, the resulting generation billed that project's account. Silent and irreversible. Fix: resolveApiKeyInfo() returns both the key and its source. When the env-var path matches an OPENAI_API_KEY entry in the current directory's .env, .env.<NODE_ENV>, or .env.local file, we set a warning. requireApiKey() prints "Using OpenAI key from <source>" plus the warning before the run — never the key itself. Adds 6 unit tests covering: config-vs-env precedence, env-only (no match), env+cwd .env match, quoted/exported values, value-mismatch (no false positive), and the no-leak invariant for requireApiKey stderr output. Contributed by @jbetala7 via #1278. Closes #1248. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(browse): guard full-page screenshots against Anthropic vision API >2000px brick (#1214) Full-page screenshots of tall pages routinely exceeded 2000px on the longest dimension, silently bricking the agent's session: the resulting base64 reached the Anthropic vision API which rejected the oversized image, leaving the agent burning turns on a useless blob with no stderr trace from the browse side. Adds browse/src/screenshot-size-guard.ts as a shared helper: - guardScreenshotBuffer(buf) → downscales in-memory if max(w,h) > 2000 - guardScreenshotPath(path) → file-mode variant that rewrites in place - Aspect ratio preserved via sharp's resize fit:inside - Stderr diagnostic on any downscale so callers can see when it fired - Lazy sharp import so non-screenshot paths pay no startup cost Wires the guard into all three full-page callsites codex review flagged: - browse/src/snapshot.ts: annotated + heatmap fullPage captures - browse/src/meta-commands.ts: screenshot command (path + base64 fullPage modes) plus the responsive 3-viewport sweep - browse/src/write-commands.ts: prettyscreenshot fullPage path Covers seven unit cases (pass-through, downscale, aspect ratio, exactly-2000px edge, file-mode rewrite) plus a static invariant test that fails the build if any of the three callsites stops importing the guard. Closes #1214. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(security): add Node sidecar entry for L4 prompt-injection classifier (#1370) The L4 TestSavant classifier in browse/src/security-classifier.ts can't be imported into the compiled browse server (onnxruntime-node dlopen fails from Bun's compile extract dir per CLAUDE.md). The agent that used to host it (sidebar-agent.ts) was removed when the PTY proved out — leaving the classifier file shipped but with zero callers. Exactly the gap codex flagged in #1370. Adds browse/src/security-sidecar-entry.ts: a Node script that runs the classifier as a subprocess of the browse server. It reads NDJSON requests from stdin and writes id-correlated NDJSON responses to stdout, supporting: - op: "scan-page-content" — full L4 classifier scan - op: "ping" — liveness probe for the client's health check - op: "status" — classifier readiness (used by /pty-inject-scan to surface l4 { available: bool } in its response) Plus browse/src/find-security-sidecar.ts: a resolver that locates node + the bundled JS entry (browse/dist/security-sidecar.js, built in a follow-up package.json change) or falls back to the dev TS entry. Returns null cleanly when node isn't on PATH so the calling endpoint can degrade per D7 (extension WARN + user confirm). C17 of the security-stack wave. C18 adds the IPC client + lifecycle management; C19 wires the endpoint; C20 routes the extension through it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(security): sidecar IPC client with lifecycle + circuit breaker (#1370) Adds browse/src/security-sidecar-client.ts to manage the Node L4 classifier subprocess from the compiled browse server: - Lazy spawn on first scan; reuses the same process across requests - Id-correlated request/response via NDJSON over stdio - 5s default per-scan timeout; 64KB payload cap (short-circuits before spawn so oversized requests don't waste a process) - 3-in-10-minutes respawn cap → trips circuit breaker; subsequent scans throw immediately so the /pty-inject-scan endpoint can surface l4 { available: false } to the extension and degrade to WARN+confirm - process.on('exit') sends SIGTERM to the child for clean teardown - isSidecarAvailable() lets the endpoint probe before scan calls so the response shape reflects degraded mode honestly Unit tests cover the payload cap, the availability probe, and the breaker-doesn't-crash invariant under repeated rejected calls. C18 of the security-stack wave. C19 adds POST /pty-inject-scan; C20 routes the extension through it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(security): add POST /pty-inject-scan endpoint for pre-PTY-inject scans (#1370) The sidebar's gstackInjectToTerminal callers (toolbar Cleanup, Inspector "Send to Code") were piping page-derived text directly into the live claude PTY with ZERO classifier processing — the gap codex flagged in #1370. The documented sidebar security stack had a hole the size of every Cleanup-button click. Adds POST /pty-inject-scan to browse/src/server.ts: - Local-only binding (NOT in TUNNEL_PATHS — tunnel attempts get the general 404 path; never reaches the scan logic) - Root-token auth via existing validateAuth() — 401 on unauth - 64KB request cap → 413 + payload-too-large body - 5s scan timeout via sidecar client - URL-blocklist forced to BLOCK in PTY context (page-derived REPL input is higher-risk than ordinary tool output) - L4 ML classifier via the sidecar when available; degrades to WARN per D7 when sidecar is unavailable - Response goes through JSON.stringify(..., sanitizeReplacer) per v1.38.0.0 Unicode-egress hardening - Imports only from security-sidecar-client.ts, never directly from security-classifier.ts (which would brick the compiled Bun binary) Seven static-invariant tests pin the POST verb, auth gate, 64KB cap, tunnel-listener exclusion, sanitizeReplacer wrapping, l4 availability shape, and the no-direct-classifier-import rule. C19 of the security-stack wave. C20 routes the extension through it; C21 adds the invariant AST check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(extension): route gstackInjectToTerminal through /pty-inject-scan (#1370) Closes the documented-vs-shipped gap codex flagged in #1370. The sidebar's two PTY-injection call sites (Inspector "Send to Code" and toolbar Cleanup) now pre-scan via the new /pty-inject-scan endpoint before writing to the live claude REPL. Adds window.gstackScanForPTYInject(text, origin) to extension/sidepanel-terminal.js: - Async, returns { allow, verdict, reasons, l4 } - POST to /pty-inject-scan with the existing root-token auth - WARN+confirm on scan failure (network down, sidecar absent, etc.) rather than silent PASS — D7 honest-degradation gstackInjectToTerminal stays synchronous, returns boolean. Per D6: keeping the inject sync means existing `const ok = ...?.()` callers don't break, and the invariant test in test/extension-pty-inject-invariant.test.ts can statically pin that every call goes through the scan first. extension/sidepanel.js call sites updated: - inspectorSendBtn click → await scan, BLOCK drops + WARN prompts via window.confirm, PASS injects silently - runCleanup() → same flow. Static cleanup prompt always PASSes but still routes through scan to honor the invariant. C20 of the security-stack wave. C21 adds the static invariant test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(security): invariant — extension PTY inject must be scan-gated (#1370) Static-analysis invariant test that fails the build if any extension/.js path calls window.gstackInjectToTerminal without a preceding window.gstackScanForPTYInject in the same enclosing function. Closes the documented-vs-shipped gap codex demanded a machine check on. Rules: - Rule 1: any file that calls inject must also reference scan - Rule 2: in the enclosing function (function declaration, arrow, async (), event handler), a scan call must appear before the inject call by source position - Exemption: sidepanel-terminal.js (the file that DEFINES the inject function) is exempt from Rule 2 since the definition is not a call Plus two structural checks: - sidepanel-terminal.js defines both the inject and scan functions - inject stays SYNCHRONOUS (no `async` modifier) per D6 — async would silently break the `const ok = ...?.()` pattern at every caller C21 of the security-stack wave. The sidecar architecture (#1370) is complete: server-side L1-L3 + L4-via-sidecar (C17+C18+C19), extension pre-scan wiring (C20), and now the regression gate (C21). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> feat(browse): opt-in extended stealth mode with 6 detection-vector patches (#1112) Rebases @garrytan's PR #1112 (Apr 2026, abandoned) onto the current browse/src/stealth.ts contract. The existing minimal "codex narrowed" stealth (webdriver-mask + AutomationControlled launch arg) stays the default. PR #1112's six additional patches are added behind an opt-in GSTACK_STEALTH=extended env flag. Extended-mode patches (applied AFTER the default mask, in order): 1. delete navigator.webdriver from prototype (not just the getter — detectors check `"webdriver" in navigator`) 2. WebGL renderer spoof to Apple M1 Pro (SwiftShader was the #1 software-GPU tell in containers) 3. navigator.plugins returns a PluginArray-prototype-passing array with MimeType objects and namedItem() 4. window.chrome populated with chrome.app, chrome.runtime, chrome.loadTimes(), chrome.csi() with realistic shapes 5. navigator.mediaDevices backfilled when headless drops it 6. CDP cdc_-prefixed window globals cleared Why opt-in: the default mode's contract is fingerprint CONSISTENCY, which protects against detectors that flag spoofing mismatch. Extended mode actively lies about the environment; sites that reflect on these properties can break. Users who hit detection in default mode can flip GSTACK_STEALTH=extended for SannySoft 100% pass-rate. Twenty unit tests pin the env-flag semantics, all six patches' code presence, and the applyStealth wiring order. Live SannySoft pass-rate verification stays in the periodic-tier E2E suite. Contributed by @garrytan via #1112 (rebased — original PR opened before the codex-narrowed minimum landed; rebase preserves the narrowed default while adding the SannySoft-passing path as opt-in). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> test(fixtures): regenerate ship-SKILL.md golden baselines after C10-C13 + C16 templates Updates the three ship-SKILL.md golden baselines (claude, codex, factory hosts) to match the new shape produced by: - C10 #1209 codex argv (prompt + diff scope, no --base) - C11 #1492 merge-base diff (DIFF_BASE= preamble) - C13 #1197 command -v for codex detection - C12 + boundary preservation per regen-enforcing test Per CLAUDE.md SKILL.md workflow: edit the .tmpl, run gen:skill-docs, commit the regenerated outputs together. Goldens are part of the regen contract — without this commit, test/host-config.test.ts' golden-baseline checks fail with the diff codex review surfaced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): v1.41.0.0 — Daegu wave (24 bisect commits, 14 user-facing fixes) Bumps VERSION 1.40.0.0 → 1.41.0.0. CHANGELOG entry follows the release-summary format in CLAUDE.md: two-line headline, lead paragraph, "The numbers that matter" table, "What this means for builders" closer, then itemized Added/Changed/Fixed/For contributors with inline credit to every PR author and original issue reporter. Scale-aware bump per CLAUDE.md: 24 commits, ~6000 LOC net, substantial new capability across security (PTY sidecar wiring), install (Windows build chain), compat (gbrain 0.18-0.35, Codex CLI 0.130+), and quality (screenshot guard, design key disclosure, extended stealth opt-in). MINOR is the right call. Closes for users: #1567, #1559, #1569, #1346, #1418, #1538, #1537, #1530, #1457, #1561, #1554, #1479, #1503, #1248, #1214, #1370, #1327, #1193 pattern, #1152 pattern. Credit retained inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(find-browse): resolve source-checkout layout <git-root>/browse/dist/browse[.exe] windows-setup-e2e.yml runs `bun browse/src/find-browse.ts` against a freshly-built repo where binaries land at browse/dist/browse.exe (no .claude/skills/gstack/ install layout). The previous markers chain only matched .codex/.agents/.claude prefixed paths, so find-browse exited "not found" even when the binary was present. Adds a source-checkout fallback after the marker scan: if no installed layout resolves but <git-root>/browse/dist/browse[.exe] exists, return that. Three real callers hit this path: - gstack repo dev workflow before `./setup` runs - windows-setup-e2e.yml CI (the breakage that surfaced this) - make-pdf consumers running from a sibling source checkout Smoke-verified: a fresh git repo with browse/dist/browse on disk now resolves through the source-checkout branch (was returning null before this commit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): bump v1.41.0.0 → v1.42.0.0 to clear queue collision with #1574 The version-gate workflow flagged a collision: PR #1574 (garrytan/colombo-v3) already claims v1.41.0.0, and #1592 (fix/audit-critical-high-bugs) claims v1.41.1.0. Per CLAUDE.md's workspace-aware ship rule, queue-advancing past a claimed version within the same bump level is permitted — MINOR work landing on top of a queued MINOR still reads as MINOR relative to main. Util's suggested next slot is v1.42.0.0; taking it. CHANGELOG entry header bumped + dated 2026-05-19; entry body unchanged (same wave content, same credit list). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 07:35:01 -07:00
Garry Tan	00f966b3ec	v1.30.0.0 fix wave: 21 community PRs + Windows CI extension + codex flag-semantics smoke (#1391 ) * fix(codex): use resume-compatible flags * fix: V-001 security vulnerability Automated security fix generated by Orbis Security AI * docs: align prompt-injection thresholds to security.ts (v1.6.4.0 catch-up) CLAUDE.md:290 and ARCHITECTURE.md:159 were missed when WARN was bumped 0.60 → 0.75 in `d75402bb` (v1.6.4.0, "cut Haiku classifier FP from 44% to 23%, gate now enforced", #1135). browse/src/security.ts:37 has WARN: 0.75 and BROWSER.md:743 was updated alongside that commit; CLAUDE.md and ARCHITECTURE.md still read 0.60. Also adds the SOLO_CONTENT_BLOCK: 0.92 entry to CLAUDE.md (already in security.ts:50 and BROWSER.md:745, missing from CLAUDE.md's threshold table). No code change. No behavior change. Pure doc-vs-code alignment. Verification: $ grep -n "WARN" browse/src/security.ts CLAUDE.md ARCHITECTURE.md BROWSER.md browse/src/security.ts:37: WARN: 0.75, CLAUDE.md:290: - \`WARN: 0.75\` ... ARCHITECTURE.md:159: ...>= \`WARN\` (0.75)... BROWSER.md:743: - \`WARN: 0.75\` ... Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: Korean/CJK IME input and rendering in Sidebar Terminal Fixes #1272 This commit addresses three separate Korean/CJK bugs in the Sidebar Terminal: Bug 1 - IME Input: Korean text typed via IME composition was not reaching the PTY correctly. Added compositionstart/compositionend event listeners to suppress partial jamo fragments and only send the final composed string. Bug 2a - Font Rendering: Added CJK monospace font fallbacks ("Noto Sans Mono CJK KR", "Malgun Gothic") to both the xterm.js fontFamily config and the CSS --font-mono variable. This ensures consistent cell-width calculations for Korean characters. Bug 2b - UTF-8 Boundary Detection: Added buffering logic to prevent multi-byte UTF-8 characters (Korean is 3 bytes) from being split across WebSocket chunks. This follows the same pattern as PR #1007 which fixed the sidebar-agent path, but extends it to the terminal-agent path. Special thanks to @ldybob for the excellent root cause analysis and proposed solutions in issue #1272. Tested on WSL2 + Windows 11 with Korean IME. * fix(ship): tighten Plan Completion gate (VAS-449 remediation) VAS-446 shipped with a PLAN.md acceptance criterion (domain-hq has /docs/dashboard.md) silently skipped. /ship's Plan Completion subagent existed at ship time (added in v1.4.1.0) but the gate let the failure through. Four structural fixes: 1. Path concreteness rule: items naming a concrete filesystem path MUST be classified DONE/NOT DONE via [ -f <path> ], never UNVERIFIABLE. 2. Validator detection: CONTENT-SHAPE items scan target repo's package.json for validate-* scripts and run them before falling back to UNVERIFIABLE. 3. Per-item UNVERIFIABLE confirmation: replaces blanket "I've checked each one" with per-item Y/N/D loop. The blanket-confirm path is the exact failure VAS-449 surfaced. 4. Subagent fail-closed: if Plan Completion subagent + inline fallback both fail, surface explicit AskUserQuestion instead of silent pass. Replaces the prior "Never block /ship on subagent failure" fail-open. Locked in by test/ship-plan-completion-invariants.test.ts (5 assertions, no LLM dependency, ~60ms). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(browse): bash.exe wrap for telemetry on Windows reportAttemptTelemetry() in browse/src/security.ts calls spawn(bin, args) where bin is the gstack-telemetry-log bash script. On Windows this fails silently with ENOENT — CreateProcess can't dispatch on shebang lines. Adopts v1.24.0.0's Bun.which + GSTACK__BIN override pattern (from browse/src/claude-bin.ts:resolveClaudeCommand, introduced in #1252) for resolving bash.exe. resolveBashBinary() honors GSTACK_BASH_BIN absolute-path or PATH-resolvable override, falling back to Bun.which('bash') which finds Git Bash on the standard Windows install. buildTelemetrySpawnCommand() wraps the script invocation on win32 only; POSIX path is bit-identical. Returns null when bash can't be resolved on Windows so caller skips spawn — local attempts.jsonl audit trail keeps working without surfacing a Windows-only failure. 8 new unit tests cover resolveBashBinary (POSIX bash, absolute override, quote-stripping, BASH_BIN fallback, empty-PATH null) and buildTelemetrySpawnCommand (POSIX pass-through, win32 bash wrap, win32 null on unresolvable, arg-array immutability). POSIX path is bit-identical — Bun.which('bash') on Linux/macOS returns the same /bin/bash or /usr/bin/bash that the old hardcoded spawn relied on. fix(make-pdf): Bun.which-based binary resolution for browse + pdftotext on Windows Extends v1.24.0.0's Bun.which + GSTACK__BIN override pattern (introduced in browse/src/claude-bin.ts via #1252) to the two other binary resolvers in the codebase: make-pdf/src/browseClient.ts:resolveBrowseBin and make-pdf/src/pdftotext.ts:resolvePdftotext. Same Windows quirks (fs.accessSync(X_OK) degrades to existence-check; `which` isn't available outside Git Bash; bun --compile --outfile X emits X.exe), same Bun.which-based fix shape, same env override convention. Changes: - GSTACK_BROWSE_BIN / GSTACK_PDFTOTEXT_BIN as the v1.24-aligned overrides; BROWSE_BIN / PDFTOTEXT_BIN remain as back-compat aliases. - Bun.which() replaces execFileSync('which', ...) for PATH lookup. Handles Windows PATHEXT natively; no more `where`-vs-`which` branch. - findExecutable(base) helper exported from each module, probes .exe/.cmd/.bat after the bare-path miss on win32. Linux/macOS behavior is bit-identical (isExecutable short-circuits before the win32 branch ever runs). - macCandidates renamed posixCandidates (always was — /opt/homebrew, /usr/local, /usr/bin). No Windows candidates added; Poppler installs scatter across Scoop/Chocolatey/portable zips and guessing causes false positives. - Error messages get a Windows install hint (scoop install poppler / oschwartz10612) and `setx` example for GSTACK__BIN. - Pre-existing test 'honors BROWSE_BIN when it points at a real executable' was hardcoded /bin/sh — made cross-platform via a REAL_EXE constant (cmd.exe on win32, /bin/sh on POSIX). Was a Windows-CI blocker on its own. Coordination: PR #1094 (@BkashJEE) covered browseClient.ts independently with a narrower scope; this PR's pdftotext + cross-platform tests + GSTACK__BIN naming are additive. Either order of merge works. Test plan: - bun test make-pdf/test/browseClient.test.ts make-pdf/test/pdftotext.test.ts on win32 — 29 pass, 0 fail (12 new assertions: findExecutable POSIX/win32/null, resolveBrowseBin GSTACK_BROWSE_BIN + BROWSE_BIN + precedence + quote-strip, same shape for resolvePdftotext + Windows install hint in error message). - POSIX branch unchanged — fs.accessSync(X_OK) on Linux/macOS short-circuits before any win32 logic runs, matching the v1.24 claude-bin.ts pattern. fix(browse): NTFS ACL hardening for Windows state files via icacls gstack's ~/.gstack/ state directory holds bearer tokens, canary tokens, agent queue contents (with prompt history), session state, security-decision logs, and saved cookie bundles — all written with { mode: 0o600 } / 0o700. On Windows, those mode bits are a silent no-op: Node's fs module doesn't translate POSIX modes to NTFS ACLs, and inherited ACLs leave every "restricted" file readable by other principals on the machine (verified via icacls — six ACEs, the intended user is the LAST of six). Threat model is non-trivial on: - Self-hosted CI runners (different service account on the same Windows box can read developer tokens, canary tokens, prompt history) - Shared development machines (agencies, studios, lab environments) - Multi-tenant servers with shared home directories Orthogonal to v1.24.0.0's binary-resolution work — complementary at the write side. v1.24's bin/gstack-paths resolves ~/.gstack/ correctly across plugin / global / local installs; this PR ensures files written into those resolved paths actually get the POSIX 0o600 semantic translated to NTFS. The fix: - New browse/src/file-permissions.ts (158 LOC, 5 public + 1 test-reset). restrictFilePermissions / restrictDirectoryPermissions wrap chmod (POSIX) or icacls /inheritance:r /grant:r <user>:(F) (Windows). writeSecureFile / appendSecureFile / mkdirSecure are drop-in wrappers for the common patterns. - 19 call sites converted across 9 source files: browser-manager.ts, browser-skill-write.ts, cli.ts, config.ts, meta-commands.ts, security-classifier.ts, security.ts (4 sites), server.ts (5 sites), terminal-agent.ts (8 sites), tunnel-denial-log.ts. - (OI)(CI) inheritance flags on directories mean files created via fs.write* inside an mkdirSecure-created dir inherit the owner-only ACL automatically — important for tunnel-denial-log.ts where appends use async fsp.appendFile. Error handling: icacls failures (nonexistent path, missing icacls.exe, hardened environments) log a one-shot warning to stderr and proceed. Once-per-process gating prevents log spam if the condition persists. Filesystem stays functional; the file just ends up with inherited ACLs. Test plan: - bun test browse/test/file-permissions.test.ts — 13 pass, 0 fail (POSIX mode-bit assertions, Windows no-throw, mkdir idempotence, recursive creation, Buffer payloads, append-creates-then-reapplies-once semantics) - bun test browse/test/security.test.ts — 38 pass, 0 fail (existing security test suite plus the bash-binary resolution tests added in fix #1119; the converted writeFileSync/appendFileSync/mkdirSync sites in security.ts integrate cleanly) - Empirical icacls before/after on a real file — 6 ACEs → 1 ACE - bun build typecheck on all modified files — clean (server.ts has a pre-existing playwright-core/electron resolution issue unrelated to this PR) POSIX behavior is bit-identical to old code — fs.chmodSync(path, 0o6XX) on the helper's POSIX branch matches the inline { mode: 0o6XX } it replaces. Linux and macOS see no behavior change. Inviting pushback on three judgment calls (in PR description): 1. icacls vs npm library 2. ACL scope — just user, or user + SYSTEM? 3. Graceful degradation — once-per-process warn, not silent, not hard-fail. * fix(browse): declare lastConsoleFlushed to restore console-log persistence flushBuffers() references a `lastConsoleFlushed` cursor at server.ts:337 and assigns it at :344, but the `let lastConsoleFlushed = 0;` declaration is missing — only the network and dialog siblings are declared at lines 327-328. Result: every 1-second flushBuffers tick (line 376) throws `ReferenceError: lastConsoleFlushed is not defined`, gets swallowed by the catch at line 369 ("[browse] Buffer flush failed: ..."), and the console branch's append never runs. browse-console.log is never written in any production deployment since this regressed. Discovered by stress-testing the daemon with 15 concurrent CLIs against cold state — the race surfaced the buffer-flush error spam in one spawned daemon's stderr. Verified by running the daemon against a real file:// page with console.log events: in-memory `browse console` returns the entries, but `.gstack/browse-console.log` is never created on disk. Regression introduced by `1a100a2a` "fix: eliminate duplicate command sets in chain, improve flush perf and type safety" — the flush refactor switched from `Bun.write` to `fs.appendFileSync` and added the `lastConsoleFlushed` cursor pattern alongside its network/dialog siblings, but missed the matching `let` declaration. Tests don't currently exercise flushBuffers, so the regression shipped silently. Fix: - Declare `let lastConsoleFlushed = 0;` next to `lastNetworkFlushed` and `lastDialogFlushed` (browse/src/server.ts:327) - Add a source-level guard test (browse/test/server-flush-trackers.test.ts) that fails any future refactor that adds a fourth `lastFlushed` cursor without the matching declaration. Same pattern as terminal-agent.test.ts and dual-listener.test.ts — read source as text, assert invariant, no daemon required. Test plan: - [x] New regression test fails on current main, passes with the fix - [x] `bun run build` clean - [x] Manual smoke: spawn daemon -> goto file:// page with console.log -> wait 4s -> .gstack/browse-console.log now exists with the expected entries (163 bytes vs zero before) 🤖 Generated with [Claude Code](https://claude.com/claude-code) fix(browse): per-process state-file temp path to fix concurrent-write ENOENT The daemon writes `.gstack/browse.json` via the standard atomic-rename pattern: `writeFileSync(tmp, …) → renameSync(tmp, stateFile)`. Four sites in server.ts use this pattern (initial daemon-startup state at :2002, /tunnel/start handler at :1479, BROWSE_TUNNEL=1 inline tunnel update at :2083, BROWSE_TUNNEL_LOCAL_ONLY=1 update at :2113), and all four hard-code the same temp filename `${stateFile}.tmp`. Under concurrent writers the shared filename races on the rename: t0 Writer A: writeFileSync(stateFile + '.tmp', payloadA) t1 Writer B: writeFileSync(stateFile + '.tmp', payloadB) // overwrites A t2 Writer A: renameSync(stateFile + '.tmp', stateFile) // moves B's payload t3 Writer B: renameSync(stateFile + '.tmp', stateFile) // ENOENT — file gone Reproduced empirically with 15 concurrent CLIs against a fresh `.gstack/`: [browse] Failed to start: ENOENT: no such file or directory, rename '…/.gstack/browse.json.tmp' -> '…/.gstack/browse.json' Pre-fix success rate: 0 / 15 under cold-start race. Post-fix success rate: 15 / 15, zero ENOENT. Fix: - New `tmpStatePath()` helper (server.ts:333) returns `${stateFile}.tmp.${pid}.${randomBytes(4).toString('hex')}` - All 4 call sites use `tmpStatePath()` instead of the shared literal - Atomic rename still gives last-writer-wins semantics on the final state.json content; only behavior change is that concurrent writers no longer kill each other on the rename step Source-level guard test (browse/test/server-tmp-state-path.test.ts) locks two invariants: (1) no remaining `stateFile + '.tmp'` literals, (2) every state-write `writeFileSync` call uses `tmpStatePath()`. Same read-source-as-text pattern as terminal-agent.test.ts and dual-listener.test.ts — no daemon required, runs in tier-1 free. Test plan: - [x] Targeted source-level guard test passes (3 / 0) - [x] `bun run build` clean - [x] Live regression: 15 concurrent CLIs against cold state → 15 / 15 healthy, 0 ENOENT (vs 0 / 15 pre-fix) - [x] No `.tmp.` orphans left behind after rename succeeds - [x] Related test cluster (server-auth, dual-listener, cdp-mutex, findport) — same pre-existing flakes as `main`, no new regressions introduced 🤖 Generated with [Claude Code](https://claude.com/claude-code) fix(browse): clear refs when iframe auto-detaches in getActiveFrameOrPage Asymmetric cleanup between two equivalent staleness conditions: onMainFrameNavigated() → clearRefs() + activeFrame = null ✓ getActiveFrameOrPage() → activeFrame = null (refs NOT cleared) ✗ Both paths see the same staleness condition — refs were captured against a frame that no longer exists. The main-frame path correctly clears both pieces of state. The iframe-detach path nulls the frame but leaves the refMap intact. The lazy click-time check in `resolveRef` (tab-session.ts:97) partially saves us — `entry.locator.count()` on a detached-frame locator throws or returns 0, so the click errors out as "Ref X is stale". But the user has no signal that frame context silently changed underfoot: the next `snapshot` runs against `this.page` (main) while old iframe refs still litter `refMap` with the same role+name keys. New refs collide with stale ones, the resolver picks one at random, the user clicks the wrong element. TODOS.md line 816-820 documents "Detached frame auto-recovery" as a shipped iframe-support feature in v0.12.1.0. This restores the documented intent — the recovery should leave the session in a clean state, not a half-cleared one. Fix: 1 line — add `this.clearRefs()` next to `this.activeFrame = null` inside the if-branch. Test plan: - [x] New regression test: 4/4 pass - refs cleared when getActiveFrameOrPage detects detached iframe - refs preserved when active frame is still attached (no regression) - refs preserved when no frame set (page-level path untouched) - matches onMainFrameNavigated symmetry — both paths reach the same clean end state - [x] `bun run build` clean 🤖 Generated with [Claude Code](https://claude.com/claude-code) * fix(codex): resolve python for JSON parser * fix: add fail-fast probe for base branch in ship step 12 * fix(plan-devex-review): remove contradictory plan-mode handshake * fix(design): honor Retry-After header in variants 429 handler Closes #1244. The 429 handler in `generateVariant` discarded the `Retry-After` response header and fell straight through to a local exponential schedule (2s/4s/8s). In image-generation batches, that burns retry attempts inside the provider's cooldown window and the request never recovers. Now we parse `Retry-After` per RFC 7231 — both delta-seconds (`Retry-After: 5`) and HTTP-date (`Retry-After: Fri, 31 Dec 1999 23:59:59 GMT`). Honored waits are capped at 60s to bound stalls from hostile or buggy headers. Delta-seconds are validated as digits-only (rejects `2abc`). When `Retry-After` is honored (including 0 / past-date "retry now"), the next iteration's leading exponential sleep is skipped so we don't double-wait. Invalid or missing headers fall through to the existing exponential schedule unchanged. Behavior matrix: \| Header \| Behavior \| \|---------------------------------\|-------------------------------------------\| \| Retry-After: 5 \| wait 5s, skip leading on next attempt \| \| Retry-After: 999999 \| capped to 60s, skip leading \| \| Retry-After: 2abc \| invalid, fall through to exponential \| \| Retry-After: 0 \| wait 0, skip leading (retry immediately) \| \| Retry-After: <past HTTP-date> \| wait 0, skip leading \| \| Retry-After: <future date> \| wait diff capped at 60s, skip leading \| \| no header \| fall through to existing exponential \| `generateVariant` now accepts an optional `fetchFn` parameter (defaults to `globalThis.fetch`) so tests can inject a stub. Production call sites are unchanged. Tests cover the five behavior buckets above, asserting both the 1st-to-2nd call timing gap and call counts. All five pass in ~8s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(docs): correct per-skill symlink removal snippet in README uninstall Closes #1130. The manual-uninstall fallback in `## Uninstall` → `### Option 2` used `find ~/.claude/skills -maxdepth 1 -type l`, which finds nothing on real installs. Each `~/.claude/skills/<name>/` is a real directory, and only `<name>/SKILL.md` inside it is a symlink into `gstack/`. The find never matched, so the snippet silently removed nothing. Replace with a directory walk that inspects each `<name>/SKILL.md`: find ~/.claude/skills -mindepth 1 -maxdepth 1 -type d ! -name gstack → check $dir/SKILL.md is a symlink → readlink it → if target is gstack/* or /gstack/: rm -f the link, rmdir the dir (only if empty — preserves any user-added files) Excludes the top-level `gstack/` dir from the walk; that's removed by step 3 of the same uninstall block. `bin/gstack-uninstall` (the script-mode path) already handles the layout correctly via its own walk; only this manual fallback needed updating. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: reject partial browse client env integers * fix(gemini-adapter): detect new ~/.gemini/oauth_creds.json auth path gemini-cli >=0.30 stores OAuth credentials at ~/.gemini/oauth_creds.json instead of the legacy ~/.config/gemini/ directory. The benchmark adapter's availability check now succeeds for users on recent gemini-cli releases who have authenticated via interactive login. Both paths are accepted so users on older versions still work. * fix(browser): add --no-sandbox for root user on Linux/WSL2 Chromium's sandbox can't initialize when running as root on Linux, causing an immediate exit. Extend the existing CI/CONTAINER check to also cover this case, keeping the Windows-safe `typeof getuid` guard. * security: pass cwd to git via execFileSync, not interpolation through /bin/sh `bin/gstack-memory-ingest.ts:632-643` ran `execSync(\`git -C ${JSON.stringify(cwd)} remote get-url origin 2>/dev/null\`, ...)`. JSON.stringify escapes `"` and `\` but not `$` or backticks, so a `cwd` of `"$(touch /tmp/marker)"` survived JSON quoting and detonated under /bin/sh's command-substitution-inside-double-quotes. `cwd` originates from transcript JSONL records under `~/.claude/projects/<encoded-cwd>/<uuid>.jsonl` and `~/.codex/sessions/YYYY/MM/DD/rollout-.jsonl`. The walker grabs the first `.cwd` it sees per session. That's an untrusted surface in the gstack threat model — the L1-L6 sidebar security stack exists exactly because agent transcripts can carry attacker-influenced text. Two pivots above the local same-uid bar: (a) prompt-injection appending `cwd="$(...)"` to the active session log turns the next /sync-gbrain run into RCE under the user's uid; (b) cross-machine transcript share (a colleague's `.claude/projects` snippet untar'd into HOME, a documented gbrain dogfooding shape) → RCE on first sync. Fix swaps the one execSync for `execFileSync("git", ["-C", cwd, "remote", "get-url", "origin"], ...)`. No shell, argv passed directly to git. The same module already uses execFileSync for `gbrainAvailable()` (line 762 pre-patch) and `gbrainPutPage()` (line 816 pre-patch) — this single execSync was the outlier. Test: `gstack-memory-ingest security: untrusted cwd cannot trigger shell substitution` plants a Claude-Code-shaped JSONL with cwd=`$(touch <marker>)` and asserts the marker file is not created after `--incremental --quiet`. Negative control: with the patch reverted, the test fails (marker created); with the patch applied, it passes (18/18 in test/gstack-memory-ingest.test.ts). security: gate domain-skill auto-promote on classifier_score > 0 `browse/src/domain-skill-commands.ts:140` (handleSave) writes `classifier_score: 0` with the comment "L4 deferred to load-time / sidebar-agent fills this in on first prompt-injection load." But CLAUDE.md "Sidebar architecture" documents that sidebar-agent.ts was ripped, and grep for recordSkillUse + classifierFlagged callers across browse/src/ returns zero hits outside the module under test. Net effect: every quarantined skill that survives three benign uses without flag (`recordSkillUse(... , classifierFlagged: false)` x3) auto-promotes to `active` and lands in prompt context wrapped as UNTRUSTED on every subsequent visit to that host. The L4 score that was supposed to gate the promotion was never written — the production save path puts 0 on disk and nothing later updates it. Threat model: a domain-skill body authored by an agent under the influence of a poisoned page (the new `gstackInjectToTerminal` PTY path runs no L1-L3 either) would lose its auto-promote barrier after three uses. The exploit isn't single-step but the bar is exactly N=3 prompt-injection-shaped uses on a hostile page, which is well within reach. Fix adds a single condition to the auto-promote gate in `recordSkillUse`: if (state === 'quarantined' && useCount >= PROMOTE_THRESHOLD && flagCount === 0 && current.classifier_score > 0) { state = 'active'; } `classifier_score` is set once at writeSkill and never updated. Production saves it as 0 (handleSave), so the gate stays closed; existing tests that explicitly pass `classifierScore: 0.1` still auto-promote (the auto-promote path is preserved for the day L4 is rewired). Manual promotion via `domain-skill promote-to-global` is unaffected (it goes through `promoteToGlobal` which has its own state-machine guard at line 337+). Test: new regression case `does NOT auto-promote when classifier_score is 0 (production handleSave shape)` plants a skill with classifierScore=0 (matches domain-skill-commands.ts:140), runs three uses without flag, asserts the skill stays quarantined and readSkill returns null. Negative control: revert the patch, the test fails with `Received: "active"`. With the patch: 15/15 pass. * fix(ship): port #1302 SKILL.md edits to .tmpl + resolver source PR #1302 added Verification Mode + UNVERIFIABLE classification + per-item confirmation gate to ship/SKILL.md, but only the generated SKILL.md was edited — not the .tmpl source or scripts/resolvers/review.ts. The next `bun run gen:skill-docs` run would have wiped the changes. Port the same content into the resolver and .tmpl so regeneration produces the intended output. * ci(windows): extend free-tests lane to cover icacls + Bun.which resolvers from fix-wave PRs Closes #1306/#1307/#1308 validation gap. The four newly-added test files already have process.platform guards so they run safely on both POSIX and Windows lanes — only platform-relevant assertions execute on each. Tests added to the windows-latest lane: - browse/test/file-permissions.test.ts (#1308 icacls + writeSecureFile) - browse/test/security.test.ts (#1306 bash.exe wrap pure-function path) - make-pdf/test/browseClient.test.ts (#1307 Bun.which browse resolver) - make-pdf/test/pdftotext.test.ts (#1307 Bun.which pdftotext resolver) * test(codex): live flag-semantics smoke for codex exec resume Closes #1270's regex-only test gap. PR #1270 asserted that codex/SKILL.md's `codex exec resume` invocation drops -C/-s and uses sandbox_mode config. That regex catches the skill template regressing, but not codex CLI itself flipping flag semantics again. This test probes `codex exec resume --help` and asserts the surface gstack relies on: -c/sandbox_mode is accepted, top-level -C is absent. Skips silently when codex isn't on PATH, so dev machines without codex installed never see it fail. * chore: regen SKILL.md after fix wave One regen commit at the end of the merge wave per the plan. plan-devex-review loses the contradictory plan-mode handshake (#1333). review/SKILL.md picks up the Verification Mode + UNVERIFIABLE classification additions that #1302 authored against ship/SKILL.md (same resolver shared between ship and review modes). * fix(server.ts): keep fs.writeFileSync for state-file writes #1308's writeSecureFile wrapper added Windows icacls hardening for the 4 state-file write sites in server.ts, but #1310's regression test grep's for fs.writeFileSync(tmpStatePath()) calls. The two changes are technically compatible only if the test relaxes — keeping the test strict (the safer choice for catching regressions on the cold-start race) means the 4 state- file sites stay on fs.writeFileSync(..., { mode: 0o600 }). POSIX 0o600 hardening is preserved on those 4 sites. Windows icacls hardening still applies to all the other writeSecureFile call sites #1308 added (auth.json, mkdirSecure, etc.). Also refreshes golden baselines after #1302 / port + minor wording tweak in scripts/resolvers/review.ts to keep gen-skill-docs.test.ts assertion 'Cite the specific file' satisfied. * v1.30.0.0: fix wave — 21 community PRs + 2 closing fixes for Windows + codex CI gaps Headline release. Browse stops dropping console logs, cold-start race fixed, codex resume works without python3, Windows hardening (icacls + Bun.which + bash.exe wrap), ship gate gets VAS-449 remediation, two closing fixes that put icacls/Bun.which/codex flag semantics under CI. * test(domain-skills): cover #1369 classifier_score=0 quarantine + score>0 promote path The pre-existing T6 test seeded skills via writeSkill (which defaults classifier_score to 0 until L4 is rewired) and then expected 3 uses to auto-promote. PR #1369 added `current.classifier_score > 0` to the gate specifically to block that path — a quarantined skill written under the influence of a poisoned page would otherwise auto-promote after three benign uses. Updated test asserts both halves of the new contract: - classifier_score=0 + 3 uses → stays quarantined (the security guarantee) - classifier_score>0 + 3 more uses → promotes to active (unblock path) Catches both regressions: the gate going away (would re-allow the bypass) and the unblock path breaking (would silently quarantine all skills forever once L4 is rewired). --------- Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com> Co-authored-by: orbisai0security <mediratta01.pally@gmail.com> Co-authored-by: Bryce Alan <brycealan.eth@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Terry Carson YM <cym3118288@gmail.com> Co-authored-by: Vasko Ckorovski <vckorovski@gmail.com> Co-authored-by: Samuel Carson <samuel.carson@gmail.com> Co-authored-by: Yashwant Kotipalli <yashwant7kotipalli@gmail.com> Co-authored-by: Jasper Chen <jasperchen925@gmail.com> Co-authored-by: Stefan Neamtu <stefan.neamtu@gmail.com> Co-authored-by: 陈家名 <chenjiaming@kezaihui.com> Co-authored-by: Abigail Atheryon <abi@atheryon.ai> Co-authored-by: Furkan Köykıran <furkankoykiran@gmail.com> Co-authored-by: gus <gustavoraularagon@gmail.com>	2026-05-09 08:06:47 -07:00
Garry Tan	ed1e4be2f6	feat: gstack browser sidebar = interactive Claude Code REPL with live tab awareness (v1.14.0.0) (#1216 ) * build: vendor xterm@5 for the Terminal sidebar tab Adds xterm@5 + xterm-addon-fit as devDependencies and a `vendor:xterm` build step that copies the assets into `extension/lib/` at build time. The vendored files are .gitignored so the npm version stays the source of truth. xterm@5 is eval-free, so no MV3 CSP changes needed. No runtime callers yet — this just stages the assets. * feat(server): add pty-session-cookie module for the Terminal tab Mirrors `sse-session-cookie.ts` exactly. Mints short-lived 30-min HttpOnly cookies for authenticating the Terminal-tab WebSocket upgrade against the terminal-agent. Same TTL, same opportunistic-pruning shape, same "scoped tokens never valid as root" invariant. Two registries instead of one because the cookie names are different (`gstack_sse` vs `gstack_pty`) and the token spaces must not overlap. No callers yet — wired up in the next commit. * feat(server): add terminal-agent.ts (PTY for the Terminal sidebar tab) Translates phoenix gbrowser's Go PTY (cmd/gbd/terminal.go) into a Bun non-compiled process. Lives separately from `sidebar-agent.ts` so a WS-framing or PTY-cleanup bug can't take down the chat path (codex outside-voice review caught the coupling risk). Architecture: - Bun.serve on 127.0.0.1:0 (never tunneled). - POST /internal/grant accepts cookie tokens from the parent server over loopback, authenticated with a per-boot internal token. - GET /ws upgrades require BOTH (a) Origin: chrome-extension://<id> and (b) the gstack_pty cookie minted by /pty-session. Either gate alone is insufficient (CSWSH defense + auth defense). - Lazy spawn: claude PTY is not started until the WS receives its first data frame. Idle sidebar opens cost nothing. - Bun PTY API: `terminal: { rows, cols, data(t, chunk) }` — verified at impl time on Bun 1.3.10. proc.terminal.write() for input, proc.terminal.resize() for resize, proc.kill() + 3s SIGKILL fallback on close. - process.on('uncaughtException'\|'unhandledRejection') handlers so a framing bug logs but doesn't kill the listener loop. Test-only `BROWSE_TERMINAL_BINARY` env override lets the integration tests spawn /bin/bash instead of requiring claude on every CI runner. Not yet spawned by anything — wired in the next commit. * feat(server): wire /pty-session route + spawn terminal-agent Server-side glue connecting the Terminal sidebar tab to the new terminal-agent process. server.ts: - New POST /pty-session route. Validates AUTH_TOKEN, mints a gstack_pty HttpOnly cookie via pty-session-cookie.ts, posts the cookie value to the agent's loopback /internal/grant. Returns the terminalPort + Set-Cookie to the extension. - /health response gains `terminalPort` (just the port number — never a shell token). Tokens flow via the cookie path, never /health, because /health already surfaces AUTH_TOKEN to localhost callers in headed mode (that's a separate v1.1+ TODO). - /pty-session and /terminal/* are deliberately NOT added to TUNNEL_PATHS, so the dual-listener tunnel surface 404s by default-deny. - Shutdown path now also pkills terminal-agent and unlinks its state files (terminal-port + terminal-internal-token) so a reconnect doesn't try to hit a dead port. cli.ts: - After spawning sidebar-agent.ts, also spawn terminal-agent.ts. Same pattern: pkill old instances, Bun.spawn(['bun', 'run', script]) with BROWSE_STATE_FILE + BROWSE_SERVER_PORT env. Non-fatal if the spawn fails — chat still works without the terminal agent. * feat(extension): Terminal as default sidebar tab Adds a primary tab bar (Terminal \| Chat) above the existing tab-content panes. Terminal is the default-active tab; clicking Chat returns to the existing claude -p one-shot flow which is preserved verbatim. manifest.json: adds ws://127.0.0.1:/ to host_permissions so MV3 doesn't block the WebSocket upgrade. sidepanel.html: new primary-tabs nav, new #tab-terminal pane with a "Press any key to start Claude Code" bootstrap card, claude-not-found install card, xterm mount point, and "session ended" restart UI. Loads xterm.js + xterm-addon-fit + sidepanel-terminal.js. tab-chat is no longer the .active default. sidepanel.js: new activePrimaryPaneId() helper that reads which primary tab is selected. Debug-close paths now route back to whichever primary pane is active (was hardcoded to tab-chat). Primary-tab click handler toggles .active classes and aria-selected. window.gstackServerPort and window.gstackAuthToken exposed so sidepanel-terminal.js can build the /pty-session POST and the WS URL. sidepanel-terminal.js (new): xterm.js lifecycle. Lazy-spawn — first keystroke fires POST /pty-session, then opens ws://127.0.0.1:<terminalPort>/ws. Origin + cookie are set automatically by the browser. Resize observer sends {type:"resize"} text frames. ResizeObserver, tab-switch hooks, restart button, install-card retry. On WS close shows "Session ended, click to restart" — no auto-reconnect (codex outside-voice flagged that as session-burning). sidepanel.css: primary-tabs bar + Terminal pane styling (full-height xterm container, install card, ended state). test: terminal-agent + cookie module + sidebar default-tab regression Three new test files: terminal-agent.test.ts (16 tests): pty-session-cookie mint/validate/ revoke, Set-Cookie shape (HttpOnly + SameSite=Strict + Path=/, NO Secure since 127.0.0.1 over HTTP), source-level guards that /pty-session and /terminal/* are NOT in TUNNEL_PATHS, /health does NOT surface ptyToken or gstack_pty, terminal-agent binds 127.0.0.1, /ws upgrade enforces chrome-extension:// Origin AND gstack_pty cookie, lazy-spawn invariant (spawnClaude is called from message handler, not upgrade), uncaughtException/ unhandledRejection handlers exist, SIGINT-then-SIGKILL cleanup. terminal-agent-integration.test.ts (7 tests): spawns the agent as a real subprocess in a tmp state dir. Verifies /internal/grant accepts/rejects the loopback token, /ws gates (no Origin → 403, bad Origin → 403, no cookie → 401), real WebSocket round-trip with /bin/bash via the BROWSE_TERMINAL_BINARY override (write 'echo hello-pty-world\n', read it back), and resize message acceptance. sidebar-tabs.test.ts (13 tests): structural regression suite locking the load-bearing invariants of the default-tab change — Terminal is .active, Chat is not, xterm assets are loaded, debug-close path no longer hardcodes tab-chat (uses activePrimaryPaneId), primary-tab click handler exists, chat surface is not accidentally deleted, terminal JS does NOT auto- reconnect on close, manifest declares ws:// + http:// localhost host permissions, no unsafe-eval. Plan called for Playwright + extension regression; the codebase doesn't ship Playwright extension launcher infra, so we follow the existing extension-test pattern (source-level structural assertions). Same load-bearing intent — locks the invariants before they regress. * docs: Terminal flow + threat model + v1.1 follow-ups SIDEBAR_MESSAGE_FLOW.md: new "Terminal flow" section. Documents the WS upgrade path (/pty-session cookie mint → /ws Origin + cookie gate → lazy claude spawn), the dual-token model (AUTH_TOKEN for /pty-session, gstack_pty cookie for /ws, INTERNAL_TOKEN for server↔agent loopback), and the threat-model boundary — the Terminal tab bypasses the entire prompt-injection security stack on purpose; user keystrokes are the trust source. That trust assumption is load-bearing on three transport guarantees: local-only listener, Origin gate, cookie auth. Drop any one of those three and the tab becomes unsafe. CLAUDE.md: extends the "Sidebar architecture" note to include terminal-agent.ts in the read-this-first list. Adds a "Terminal tab is its own process" note so a future contributor doesn't bolt PTY logic onto sidebar-agent.ts. TODOS.md: three new follow-ups under a new "Sidebar Terminal" section: - v1.1: PTY session survives sidebar reload (Issue 1C deferred). - v1.1+: audit /health AUTH_TOKEN distribution (codex finding #2 — a pre-existing soft leak that cc-pty-import sidesteps but doesn't fix). - v1.1+: apply terminal-agent's process.on exception handlers to sidebar-agent.ts (codex finding #4 — chat path has no fatal handlers). * feat(extension): Terminal-only sidebar — auth fix, UX polish, chat rip The chat queue path is gone. The Chrome side panel is now just an interactive claude PTY in xterm.js. Activity / Refs / Inspector still exist behind the `debug` toggle in the footer. Three threads of change, all from dogfood iteration on top of cc-pty-import: 1. fix(server): cross-port WS auth via Sec-WebSocket-Protocol - Browsers can't set Authorization on a WebSocket upgrade. We had been minting an HttpOnly gstack_pty cookie via /pty-session, but SameSite=Strict cookies don't survive the cross-port jump from server.ts:34567 to the agent's random port from a chrome-extension origin. The WS opened then immediately closed → "Session ended." - /pty-session now also returns ptySessionToken in the JSON body. - Extension calls `new WebSocket(url, [`gstack-pty.<token>`])`. Browser sends Sec-WebSocket-Protocol on the upgrade. - Agent reads the protocol header, validates against validTokens, and MUST echo the protocol back (Chromium closes the connection immediately if a server doesn't pick one of the offered protocols). - Cookie path is kept as a fallback for non-browser callers (curl, integration tests). - New integration test exercises the full protocol-auth round-trip via raw fetch+Upgrade so a future regression of this exact class fails in CI. 2. fix(extension): UX polish on the Terminal pane - Eager auto-connect when the sidebar opens — no "Press any key to start" friction every reload. - Always-visible ↻ Restart button in the terminal toolbar (not gated on the ENDED state) so the user can force a fresh claude mid-session. - MutationObserver on #tab-terminal's class attribute drives a fitAddon.fit() + term.refresh() when the pane becomes visible again — xterm doesn't auto-redraw after display:none → display:flex. 3. feat(extension): rip the chat tab + sidebar-agent.ts - Sidebar is Terminal-only. No more Terminal \| Chat primary nav. - sidebar-agent.ts deleted. /sidebar-command, /sidebar-chat, /sidebar-agent/event, /sidebar-tabs* and friends all deleted. - The pickSidebarModel router (sonnet vs opus) is gone — the live PTY uses whatever model the user's `claude` CLI is configured with. - Quick-actions (🧹 Cleanup / 📸 Screenshot / 🍪 Cookies) survive in the Terminal toolbar. Cleanup now injects its prompt into the live PTY via window.gstackInjectToTerminal — no more /sidebar-command POST. The Inspector "Send to Code" action uses the same injection path. - clear-chat button removed from the footer. - sidepanel.js shed ~900 lines of chat polling, optimistic UI, stop-agent, etc. Net diff: -3.4k lines across 16 files. CLAUDE.md, TODOS.md, and docs/designs/SIDEBAR_MESSAGE_FLOW.md rewritten to match. The sidebar regression test (browse/test/sidebar-tabs.test.ts) is rewritten as 27 structural assertions locking the new layout — Terminal sole pane, no chat input, quick-actions in toolbar, eager-connect, MutationObserver repaint, restart helper. * feat: live tab awareness for the Terminal pane claude in the PTY now has continuous tab-aware context. Three pieces: 1. Live state files. background.js listens to chrome.tabs.onActivated / onCreated / onRemoved / onUpdated (throttled to URL/title/status== complete so loading spinners don't spam) and pushes a snapshot. The sidepanel relays it as a custom event; sidepanel-terminal.js sends {type:"tabState"} text frames over the live PTY WebSocket. terminal-agent.ts writes: <stateDir>/tabs.json all open tabs (id, url, title, active, pinned, audible, windowId) <stateDir>/active-tab.json current active tab (skips chrome:// and chrome-extension:// internal pages) Atomic write via tmp + rename so claude never reads a half-written document. A fresh snapshot is pushed on WS open so the files exist by the time claude finishes booting. 2. New $B tab-each <command> [args...] meta-command. Fans out a single command across every open tab, returns {command, args, total, results: [{tabId, url, title, status, output}]}. Skips chrome:// pages; restores the originally active tab in a finally block (so a mid-batch error doesn't leave the user looking at a different tab); uses bringToFront: false so the OS window doesn't jump on every fanout. Scope-checks the inner command BEFORE the loop. 3. --append-system-prompt hint at spawn time. Claude is told about both the state files and the $B tab-each command up front, so it doesn't have to discover the surface by trial. Passed via the --append-system- prompt CLI flag, NOT as a leading PTY write — the hint stays out of the visible transcript. Tests: - browse/test/tab-each.test.ts (new) — registration + source-level invariants (scope check before loop, finally-restore, bringToFront:false, chrome:// skip) + behavior tests with a mock BrowserManager that verify iteration order, JSON shape, error handling, and active-tab restore. - browse/test/terminal-agent.test.ts — three new assertions for tabState handler shape, atomic-write pattern, and the --append-system-prompt wiring at spawn. Verified live: opened 5 tabs, ran $B tab-each url against the live server, got per-tab JSON results back, original active tab restored without OS focus stealing. * chore: drop sidebar-agent test refs after chat rip Five test files / describe blocks targeted the deleted chat path: - browse/test/security-e2e-fullstack.test.ts (full-stack chat-pipeline E2E with mock claude — whole file gone) - browse/test/security-review-fullstack.test.ts (review-flow E2E with real classifier — whole file gone) - browse/test/security-review-sidepanel-e2e.test.ts (Playwright E2E for the security event banner that was ripped from sidepanel.html) - browse/test/security-audit-r2.test.ts (5 describe blocks: agent queue permissions, isValidQueueEntry stateFile traversal, loadSession session-ID validation, switchChatTab DocumentFragment, pollChat reentrancy guard, /sidebar-tabs URL sanitization, sidebar-agent SIGTERM→SIGKILL escalation, AGENT_SRC top-level read converted to graceful fallback) - browse/test/security-adversarial-fixes.test.ts (canary stream-chunk split detection on detectCanaryLeak; one tool-output test on sidebar-agent) - test/skill-validation.test.ts (sidebar agent #584 describe block) These all assumed sidebar-agent.ts existed and tested chat-queue plumbing, chat-tab DOM round-trip, chat-polling reentrancy, or per-message classifier canary detection. With the live PTY there is no chat queue, no chat tab, no LLM stream to canary-scan, and no per-message subprocess. The Terminal pane's invariants are covered by the new browse/test/sidebar-tabs.test.ts (27 structural assertions), browse/test/terminal-agent.test.ts, and browse/test/terminal-agent-integration.test.ts. bun test → exit 0, 0 failures. * chore: bump version and changelog (v1.14.0.0) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(extension): xterm fills the full Terminal panel height The Terminal pane only rendered into the top portion of the panel — most of the panel below the prompt was an empty black gap. Three layered issues, all about xterm.js measuring dimensions during a layout state that wasn't ready yet: 1. order-of-operations in connect(): ensureXterm() ran BEFORE setState(LIVE), so term.open() measured els.mount while it was still display:none. xterm caches a 0-size viewport synchronously inside open() and never auto-recovers when the container goes visible. Flipped: setState(LIVE) → ensureXterm. 2. first fit() ran synchronously before the browser had applied the .active class transition. Wrapped in requestAnimationFrame so layout has settled before fit() reads clientHeight. 3. CSS flex-overflow trap: .terminal-mount has flex:1 inside the flex-column #tab-terminal, but .tab-content's `overflow-y: auto` and the lack of `min-height: 0` on .terminal-mount meant the item couldn't shrink below content size. flex:1 then refused to expand into available space and xterm rendered into whatever its initial 2x2 measurement happened to be. Fixes: - extension/sidepanel-terminal.js: reorder + RAF fit - extension/sidepanel.css: .terminal-mount gets `flex: 1 1 0` + `min-height: 0` + `position: relative`. #tab-terminal overrides .tab-content's `overflow-y: auto` to `overflow: hidden` (xterm has its own viewport scroll; the parent shouldn't compete) and explicitly re-declares `display: flex; flex-direction: column` for #tab-terminal.active. bun test browse/test/sidebar-tabs.test.ts → 27/27 pass. Manually verified: side panel opens → Terminal fills full panel height, xterm scrollback works, debug-tab toggle still repaints correctly. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 22:52:15 -07:00