gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-05-01 19:25:10 +02:00

Author	SHA1	Message	Date
Garry Tan	ed1e4be2f6	feat: gstack browser sidebar = interactive Claude Code REPL with live tab awareness (v1.14.0.0) (#1216 ) * build: vendor xterm@5 for the Terminal sidebar tab Adds xterm@5 + xterm-addon-fit as devDependencies and a `vendor:xterm` build step that copies the assets into `extension/lib/` at build time. The vendored files are .gitignored so the npm version stays the source of truth. xterm@5 is eval-free, so no MV3 CSP changes needed. No runtime callers yet — this just stages the assets. * feat(server): add pty-session-cookie module for the Terminal tab Mirrors `sse-session-cookie.ts` exactly. Mints short-lived 30-min HttpOnly cookies for authenticating the Terminal-tab WebSocket upgrade against the terminal-agent. Same TTL, same opportunistic-pruning shape, same "scoped tokens never valid as root" invariant. Two registries instead of one because the cookie names are different (`gstack_sse` vs `gstack_pty`) and the token spaces must not overlap. No callers yet — wired up in the next commit. * feat(server): add terminal-agent.ts (PTY for the Terminal sidebar tab) Translates phoenix gbrowser's Go PTY (cmd/gbd/terminal.go) into a Bun non-compiled process. Lives separately from `sidebar-agent.ts` so a WS-framing or PTY-cleanup bug can't take down the chat path (codex outside-voice review caught the coupling risk). Architecture: - Bun.serve on 127.0.0.1:0 (never tunneled). - POST /internal/grant accepts cookie tokens from the parent server over loopback, authenticated with a per-boot internal token. - GET /ws upgrades require BOTH (a) Origin: chrome-extension://<id> and (b) the gstack_pty cookie minted by /pty-session. Either gate alone is insufficient (CSWSH defense + auth defense). - Lazy spawn: claude PTY is not started until the WS receives its first data frame. Idle sidebar opens cost nothing. - Bun PTY API: `terminal: { rows, cols, data(t, chunk) }` — verified at impl time on Bun 1.3.10. proc.terminal.write() for input, proc.terminal.resize() for resize, proc.kill() + 3s SIGKILL fallback on close. - process.on('uncaughtException'\|'unhandledRejection') handlers so a framing bug logs but doesn't kill the listener loop. Test-only `BROWSE_TERMINAL_BINARY` env override lets the integration tests spawn /bin/bash instead of requiring claude on every CI runner. Not yet spawned by anything — wired in the next commit. * feat(server): wire /pty-session route + spawn terminal-agent Server-side glue connecting the Terminal sidebar tab to the new terminal-agent process. server.ts: - New POST /pty-session route. Validates AUTH_TOKEN, mints a gstack_pty HttpOnly cookie via pty-session-cookie.ts, posts the cookie value to the agent's loopback /internal/grant. Returns the terminalPort + Set-Cookie to the extension. - /health response gains `terminalPort` (just the port number — never a shell token). Tokens flow via the cookie path, never /health, because /health already surfaces AUTH_TOKEN to localhost callers in headed mode (that's a separate v1.1+ TODO). - /pty-session and /terminal/* are deliberately NOT added to TUNNEL_PATHS, so the dual-listener tunnel surface 404s by default-deny. - Shutdown path now also pkills terminal-agent and unlinks its state files (terminal-port + terminal-internal-token) so a reconnect doesn't try to hit a dead port. cli.ts: - After spawning sidebar-agent.ts, also spawn terminal-agent.ts. Same pattern: pkill old instances, Bun.spawn(['bun', 'run', script]) with BROWSE_STATE_FILE + BROWSE_SERVER_PORT env. Non-fatal if the spawn fails — chat still works without the terminal agent. * feat(extension): Terminal as default sidebar tab Adds a primary tab bar (Terminal \| Chat) above the existing tab-content panes. Terminal is the default-active tab; clicking Chat returns to the existing claude -p one-shot flow which is preserved verbatim. manifest.json: adds ws://127.0.0.1:/ to host_permissions so MV3 doesn't block the WebSocket upgrade. sidepanel.html: new primary-tabs nav, new #tab-terminal pane with a "Press any key to start Claude Code" bootstrap card, claude-not-found install card, xterm mount point, and "session ended" restart UI. Loads xterm.js + xterm-addon-fit + sidepanel-terminal.js. tab-chat is no longer the .active default. sidepanel.js: new activePrimaryPaneId() helper that reads which primary tab is selected. Debug-close paths now route back to whichever primary pane is active (was hardcoded to tab-chat). Primary-tab click handler toggles .active classes and aria-selected. window.gstackServerPort and window.gstackAuthToken exposed so sidepanel-terminal.js can build the /pty-session POST and the WS URL. sidepanel-terminal.js (new): xterm.js lifecycle. Lazy-spawn — first keystroke fires POST /pty-session, then opens ws://127.0.0.1:<terminalPort>/ws. Origin + cookie are set automatically by the browser. Resize observer sends {type:"resize"} text frames. ResizeObserver, tab-switch hooks, restart button, install-card retry. On WS close shows "Session ended, click to restart" — no auto-reconnect (codex outside-voice flagged that as session-burning). sidepanel.css: primary-tabs bar + Terminal pane styling (full-height xterm container, install card, ended state). test: terminal-agent + cookie module + sidebar default-tab regression Three new test files: terminal-agent.test.ts (16 tests): pty-session-cookie mint/validate/ revoke, Set-Cookie shape (HttpOnly + SameSite=Strict + Path=/, NO Secure since 127.0.0.1 over HTTP), source-level guards that /pty-session and /terminal/* are NOT in TUNNEL_PATHS, /health does NOT surface ptyToken or gstack_pty, terminal-agent binds 127.0.0.1, /ws upgrade enforces chrome-extension:// Origin AND gstack_pty cookie, lazy-spawn invariant (spawnClaude is called from message handler, not upgrade), uncaughtException/ unhandledRejection handlers exist, SIGINT-then-SIGKILL cleanup. terminal-agent-integration.test.ts (7 tests): spawns the agent as a real subprocess in a tmp state dir. Verifies /internal/grant accepts/rejects the loopback token, /ws gates (no Origin → 403, bad Origin → 403, no cookie → 401), real WebSocket round-trip with /bin/bash via the BROWSE_TERMINAL_BINARY override (write 'echo hello-pty-world\n', read it back), and resize message acceptance. sidebar-tabs.test.ts (13 tests): structural regression suite locking the load-bearing invariants of the default-tab change — Terminal is .active, Chat is not, xterm assets are loaded, debug-close path no longer hardcodes tab-chat (uses activePrimaryPaneId), primary-tab click handler exists, chat surface is not accidentally deleted, terminal JS does NOT auto- reconnect on close, manifest declares ws:// + http:// localhost host permissions, no unsafe-eval. Plan called for Playwright + extension regression; the codebase doesn't ship Playwright extension launcher infra, so we follow the existing extension-test pattern (source-level structural assertions). Same load-bearing intent — locks the invariants before they regress. * docs: Terminal flow + threat model + v1.1 follow-ups SIDEBAR_MESSAGE_FLOW.md: new "Terminal flow" section. Documents the WS upgrade path (/pty-session cookie mint → /ws Origin + cookie gate → lazy claude spawn), the dual-token model (AUTH_TOKEN for /pty-session, gstack_pty cookie for /ws, INTERNAL_TOKEN for server↔agent loopback), and the threat-model boundary — the Terminal tab bypasses the entire prompt-injection security stack on purpose; user keystrokes are the trust source. That trust assumption is load-bearing on three transport guarantees: local-only listener, Origin gate, cookie auth. Drop any one of those three and the tab becomes unsafe. CLAUDE.md: extends the "Sidebar architecture" note to include terminal-agent.ts in the read-this-first list. Adds a "Terminal tab is its own process" note so a future contributor doesn't bolt PTY logic onto sidebar-agent.ts. TODOS.md: three new follow-ups under a new "Sidebar Terminal" section: - v1.1: PTY session survives sidebar reload (Issue 1C deferred). - v1.1+: audit /health AUTH_TOKEN distribution (codex finding #2 — a pre-existing soft leak that cc-pty-import sidesteps but doesn't fix). - v1.1+: apply terminal-agent's process.on exception handlers to sidebar-agent.ts (codex finding #4 — chat path has no fatal handlers). * feat(extension): Terminal-only sidebar — auth fix, UX polish, chat rip The chat queue path is gone. The Chrome side panel is now just an interactive claude PTY in xterm.js. Activity / Refs / Inspector still exist behind the `debug` toggle in the footer. Three threads of change, all from dogfood iteration on top of cc-pty-import: 1. fix(server): cross-port WS auth via Sec-WebSocket-Protocol - Browsers can't set Authorization on a WebSocket upgrade. We had been minting an HttpOnly gstack_pty cookie via /pty-session, but SameSite=Strict cookies don't survive the cross-port jump from server.ts:34567 to the agent's random port from a chrome-extension origin. The WS opened then immediately closed → "Session ended." - /pty-session now also returns ptySessionToken in the JSON body. - Extension calls `new WebSocket(url, [`gstack-pty.<token>`])`. Browser sends Sec-WebSocket-Protocol on the upgrade. - Agent reads the protocol header, validates against validTokens, and MUST echo the protocol back (Chromium closes the connection immediately if a server doesn't pick one of the offered protocols). - Cookie path is kept as a fallback for non-browser callers (curl, integration tests). - New integration test exercises the full protocol-auth round-trip via raw fetch+Upgrade so a future regression of this exact class fails in CI. 2. fix(extension): UX polish on the Terminal pane - Eager auto-connect when the sidebar opens — no "Press any key to start" friction every reload. - Always-visible ↻ Restart button in the terminal toolbar (not gated on the ENDED state) so the user can force a fresh claude mid-session. - MutationObserver on #tab-terminal's class attribute drives a fitAddon.fit() + term.refresh() when the pane becomes visible again — xterm doesn't auto-redraw after display:none → display:flex. 3. feat(extension): rip the chat tab + sidebar-agent.ts - Sidebar is Terminal-only. No more Terminal \| Chat primary nav. - sidebar-agent.ts deleted. /sidebar-command, /sidebar-chat, /sidebar-agent/event, /sidebar-tabs* and friends all deleted. - The pickSidebarModel router (sonnet vs opus) is gone — the live PTY uses whatever model the user's `claude` CLI is configured with. - Quick-actions (🧹 Cleanup / 📸 Screenshot / 🍪 Cookies) survive in the Terminal toolbar. Cleanup now injects its prompt into the live PTY via window.gstackInjectToTerminal — no more /sidebar-command POST. The Inspector "Send to Code" action uses the same injection path. - clear-chat button removed from the footer. - sidepanel.js shed ~900 lines of chat polling, optimistic UI, stop-agent, etc. Net diff: -3.4k lines across 16 files. CLAUDE.md, TODOS.md, and docs/designs/SIDEBAR_MESSAGE_FLOW.md rewritten to match. The sidebar regression test (browse/test/sidebar-tabs.test.ts) is rewritten as 27 structural assertions locking the new layout — Terminal sole pane, no chat input, quick-actions in toolbar, eager-connect, MutationObserver repaint, restart helper. * feat: live tab awareness for the Terminal pane claude in the PTY now has continuous tab-aware context. Three pieces: 1. Live state files. background.js listens to chrome.tabs.onActivated / onCreated / onRemoved / onUpdated (throttled to URL/title/status== complete so loading spinners don't spam) and pushes a snapshot. The sidepanel relays it as a custom event; sidepanel-terminal.js sends {type:"tabState"} text frames over the live PTY WebSocket. terminal-agent.ts writes: <stateDir>/tabs.json all open tabs (id, url, title, active, pinned, audible, windowId) <stateDir>/active-tab.json current active tab (skips chrome:// and chrome-extension:// internal pages) Atomic write via tmp + rename so claude never reads a half-written document. A fresh snapshot is pushed on WS open so the files exist by the time claude finishes booting. 2. New $B tab-each <command> [args...] meta-command. Fans out a single command across every open tab, returns {command, args, total, results: [{tabId, url, title, status, output}]}. Skips chrome:// pages; restores the originally active tab in a finally block (so a mid-batch error doesn't leave the user looking at a different tab); uses bringToFront: false so the OS window doesn't jump on every fanout. Scope-checks the inner command BEFORE the loop. 3. --append-system-prompt hint at spawn time. Claude is told about both the state files and the $B tab-each command up front, so it doesn't have to discover the surface by trial. Passed via the --append-system- prompt CLI flag, NOT as a leading PTY write — the hint stays out of the visible transcript. Tests: - browse/test/tab-each.test.ts (new) — registration + source-level invariants (scope check before loop, finally-restore, bringToFront:false, chrome:// skip) + behavior tests with a mock BrowserManager that verify iteration order, JSON shape, error handling, and active-tab restore. - browse/test/terminal-agent.test.ts — three new assertions for tabState handler shape, atomic-write pattern, and the --append-system-prompt wiring at spawn. Verified live: opened 5 tabs, ran $B tab-each url against the live server, got per-tab JSON results back, original active tab restored without OS focus stealing. * chore: drop sidebar-agent test refs after chat rip Five test files / describe blocks targeted the deleted chat path: - browse/test/security-e2e-fullstack.test.ts (full-stack chat-pipeline E2E with mock claude — whole file gone) - browse/test/security-review-fullstack.test.ts (review-flow E2E with real classifier — whole file gone) - browse/test/security-review-sidepanel-e2e.test.ts (Playwright E2E for the security event banner that was ripped from sidepanel.html) - browse/test/security-audit-r2.test.ts (5 describe blocks: agent queue permissions, isValidQueueEntry stateFile traversal, loadSession session-ID validation, switchChatTab DocumentFragment, pollChat reentrancy guard, /sidebar-tabs URL sanitization, sidebar-agent SIGTERM→SIGKILL escalation, AGENT_SRC top-level read converted to graceful fallback) - browse/test/security-adversarial-fixes.test.ts (canary stream-chunk split detection on detectCanaryLeak; one tool-output test on sidebar-agent) - test/skill-validation.test.ts (sidebar agent #584 describe block) These all assumed sidebar-agent.ts existed and tested chat-queue plumbing, chat-tab DOM round-trip, chat-polling reentrancy, or per-message classifier canary detection. With the live PTY there is no chat queue, no chat tab, no LLM stream to canary-scan, and no per-message subprocess. The Terminal pane's invariants are covered by the new browse/test/sidebar-tabs.test.ts (27 structural assertions), browse/test/terminal-agent.test.ts, and browse/test/terminal-agent-integration.test.ts. bun test → exit 0, 0 failures. * chore: bump version and changelog (v1.14.0.0) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(extension): xterm fills the full Terminal panel height The Terminal pane only rendered into the top portion of the panel — most of the panel below the prompt was an empty black gap. Three layered issues, all about xterm.js measuring dimensions during a layout state that wasn't ready yet: 1. order-of-operations in connect(): ensureXterm() ran BEFORE setState(LIVE), so term.open() measured els.mount while it was still display:none. xterm caches a 0-size viewport synchronously inside open() and never auto-recovers when the container goes visible. Flipped: setState(LIVE) → ensureXterm. 2. first fit() ran synchronously before the browser had applied the .active class transition. Wrapped in requestAnimationFrame so layout has settled before fit() reads clientHeight. 3. CSS flex-overflow trap: .terminal-mount has flex:1 inside the flex-column #tab-terminal, but .tab-content's `overflow-y: auto` and the lack of `min-height: 0` on .terminal-mount meant the item couldn't shrink below content size. flex:1 then refused to expand into available space and xterm rendered into whatever its initial 2x2 measurement happened to be. Fixes: - extension/sidepanel-terminal.js: reorder + RAF fit - extension/sidepanel.css: .terminal-mount gets `flex: 1 1 0` + `min-height: 0` + `position: relative`. #tab-terminal overrides .tab-content's `overflow-y: auto` to `overflow: hidden` (xterm has its own viewport scroll; the parent shouldn't compete) and explicitly re-declares `display: flex; flex-direction: column` for #tab-terminal.active. bun test browse/test/sidebar-tabs.test.ts → 27/27 pass. Manually verified: side panel opens → Terminal fills full panel height, xterm scrollback works, debug-tab toggle still repaints correctly. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 22:52:15 -07:00
Garry Tan	c15b805cd8	feat(browse): Puppeteer parity — load-html, screenshot --selector, viewport --scale, file:// (v1.1.0.0) (#1062 ) * feat(browse): TabSession loadedHtml + command aliases + DX polish primitives Adds the foundation layer for Puppeteer-parity features: - TabSession.loadedHtml + setTabContent/getLoadedHtml/clearLoadedHtml — enables load-html content to survive context recreation (viewport --scale) via in-memory replay. ASCII lifecycle diagram in the source explains the clear-before-navigation contract. - COMMAND_ALIASES + canonicalizeCommand() helper — single source of truth for name aliases (setcontent / set-content / setContent → load-html), consumed by server dispatch and chain prevalidation. - buildUnknownCommandError() pure function — rich error messages with Levenshtein-based "Did you mean" suggestions (distance ≤ 2, input length ≥ 4 to skip 2-letter noise) and NEW_IN_VERSION upgrade hints. - load-html registered in WRITE_COMMANDS + SCOPE_WRITE so scoped write tokens can use it. - screenshot and viewport descriptions updated for upcoming flags. - New browse/test/dx-polish.test.ts (15 tests): alias canonicalization, Levenshtein threshold + alphabetical tiebreak, short-input guard, NEW_IN_VERSION upgrade hint, alias + scope integration invariants. No consumers yet — pure additive foundation. Safe to bisect on its own. * feat(browse): accept file:// in goto with smart cwd/home-relative parsing Extends validateNavigationUrl to accept file:// URLs scoped to safe dirs (cwd + TEMP_DIR) via the existing validateReadPath policy. The workhorse is a new normalizeFileUrl() helper that handles non-standard relative forms BEFORE the WHATWG URL parser sees them: file:///abs/path.html → unchanged file://./docs/page.html → file://<cwd>/docs/page.html file://~/Documents/page.html → file://<HOME>/Documents/page.html file://docs/page.html → file://<cwd>/docs/page.html file://localhost/abs/path → unchanged file://host.example.com/... → rejected (UNC/network) file:// and file:/// → rejected (would list a directory) Host heuristic rejects segments with '.', ':', '\\', '%', IPv6 brackets, or Windows drive-letter patterns — so file://docs.v1/page.html, file://127.0.0.1/x, file://[::1]/x, and file://C:/Users/x are explicit errors. Uses fileURLToPath() + pathToFileURL() from node:url (never string-concat) so URL escapes like %20 decode correctly and Node rejects encoded-slash traversal (%2F..%2F) outright. Signature change: validateNavigationUrl now returns Promise<string> (the normalized URL) instead of Promise<void>. Existing callers that ignore the return value still compile — they just don't benefit from smart-parsing until updated in follow-up commits. Callers will be migrated in the next few commits (goto, diff, newTab, restoreState). Rewrites the url-validation test file: updates existing tests for the new return type, adds 20+ new tests covering every normalizeFileUrl shape variant, URL-encoding edge cases, and path-traversal rejection. References: codex consult v3 P1 findings on URL parser semantics and fileURLToPath. * feat(browse): BrowserManager deviceScaleFactor + setContent replay + file:// plumbing Three tightly-coupled changes to BrowserManager, all in service of the Puppeteer-parity workflow: 1. deviceScaleFactor + currentViewport tracking. New private fields (default scale=1, viewport=1280x720) + setDeviceScaleFactor(scale, w, h) method. deviceScaleFactor is a context-level Playwright option — changing it requires recreateContext(). The method validates (finite number, 1-3 cap, headed-mode rejected), stores new values, calls recreateContext(), and rolls back the fields on failure so a bad call doesn't leave inconsistent state. Context options at all three sites (launch, recreate happy path, recreate fallback) now honor the stored values instead of hardcoding 1280x720. 2. BrowserState.loadedHtml + loadedHtmlWaitUntil. saveState captures per-tab loadedHtml from the session; restoreState replays it via newSession. setTabContent() — NOT bare page.setContent() — so TabSession.loadedHtml is rehydrated and survives subsequent scale changes. In-memory only, never persisted to disk (HTML may contain secrets or customer data). 3. newTab + restoreState now consume validateNavigationUrl's normalized return value. file://./x, file://~/x, and bare-segment forms now take effect at every navigation site, not just the top-level goto command. Together these enable: load-html → viewport --scale 2 → viewport --scale 1.5 → screenshot, with content surviving both context recreations. Codex v2 P0 flagged that bare page.setContent in restoreState would lose content on the second scale change — this commit implements the rehydration path. References: codex v2 P0 (TabSession rehydration), codex v3 P1 (4-caller return value), plan Feature 3 + Feature 4. * feat(browse): load-html, screenshot --selector, viewport --scale, alias dispatch Wires the new handlers and dispatch logic that the previous commits made possible: write-commands.ts - New 'load-html' case: validateReadPath for safe-dir scoping, stat-based actionable errors (not found, directory, oversize), extension allowlist (.html/.htm/.xhtml/.svg), magic-byte sniff with UTF-8 BOM strip accepting any <[a-zA-Z!?] markup opener (not just <!doctype — bare fragments like <div>...</div> work for setContent), 50MB cap via GSTACK_BROWSE_MAX_HTML_BYTES override, frame-context rejection. Calls session.setTabContent() so replay metadata is rehydrated. - viewport command extended: optional [<WxH>], optional [--scale <n>], scale-only variant reads current size via page.viewportSize(). Invalid scale (NaN, Infinity, empty, out of 1-3) throws with named value. Headed mode rejected explicitly. - clearLoadedHtml() called BEFORE goto/back/forward/reload navigation (not after) so a timed-out goto post-commit doesn't leave stale metadata that could resurrect on a later context recreation. Codex v2 P1 catch. - goto uses validateNavigationUrl's normalized return value. meta-commands.ts - screenshot --selector <css> flag: explicit element-screenshot form. Rejects alongside positional selector (both = error), preserves --clip conflict at line 161, composes with --base64 at lines 168-174. - chain canonicalizes each step with canonicalizeCommand — step shape is now { rawName, name, args } so prevalidation, dispatch, WRITE_COMMANDS.has, watch blocking, and result labels all use canonical names while audit labels show 'rawName→name' when aliased. Codex v3 P2 catch — prior shape only canonicalized at prevalidation and diverged everywhere else. - diff command consumes validateNavigationUrl return value for both URLs. server.ts - Command canonicalization inserted immediately after parse, before scope / watch / tab-ownership / content-wrapping checks. rawCommand preserved for future audit (not wired into audit log in this commit — follow-up). - Unknown-command handler replaced with buildUnknownCommandError() from commands.ts — produces 'Unknown command: X. Did you mean Y?' with optional upgrade hint for NEW_IN_VERSION entries. security-audit-r2.test.ts - Updated chain-loop marker from 'for (const cmd of commands)' to 'for (const c of commands)' to match the new chain step shape. Same isWatching + BLOCKED invariants still asserted. * chore: bump version and changelog (v1.1.0.0) - VERSION: 1.0.0.0 → 1.1.0.0 (MINOR bump — new user-facing commands) - package.json: matching version bump - CHANGELOG.md: new 1.1.0.0 entry describing load-html, screenshot --selector, viewport --scale, file:// support, setContent replay, and DX polish in user voice with a dedicated Security section for file:// safe-dirs policy - browse/SKILL.md.tmpl: adds pattern #12 "Render local HTML", pattern #13 "Retina screenshots", and a full Puppeteer → browse cheatsheet with side-by- side API mapping and a worked tweet-renderer migration example - browse/SKILL.md + SKILL.md: regenerated from templates via `bun run gen:skill-docs` to reflect the new command descriptions Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: pre-landing review fixes (9 findings from specialist + adversarial review) Adversarial review (Claude subagent + Codex) surfaced 9 bugs across CRITICAL/HIGH severity. All fixed: 1. tab-session.ts:setTabContent — state mutation moved AFTER the setContent await. Prior order left phantom HTML in replay metadata if setContent threw (timeout, browser crash), which a later viewport --scale would silently replay. Now loadedHtml is only recorded on successful load. 2. browser-manager.ts:setDeviceScaleFactor — rollback now forces a second recreateContext after restoring the old fields. The fallback path in the original recreateContext builds a blank context using whatever this.deviceScaleFactor/currentViewport hold at that moment (which were the NEW values we were trying to apply). Rolling back the fields without a second recreate left the live context at new-scale while state tracked old-scale. Now: restore fields, force re-recreate with old values, only if that ALSO fails do we return a combined error. 3. commands.ts:buildUnknownCommandError — Levenshtein tiebreak simplified to 'd <= 2 && d < bestDist' (strict less). Candidates are pre-sorted alphabetically, so first equal-distance wins by default. The prior '(d === bestDist && best !== undefined && cand < best)' clause was dead code. 4. tab-session.ts:onMainFrameNavigated — now clears loadedHtml, not just refs + frame. Without this, a user who load-html'd then clicked a link (or had a form submit / JS redirect / OAuth flow) would retain the stale replay metadata. The next viewport --scale would silently revert the tab to the ORIGINAL loaded HTML, losing whatever the post-navigation content was. Silent data corruption. Browser-emitted navigations trigger this path via wirePageEvents. 5. browser-manager.ts:saveState + restoreState — tab ownership now flows through BrowserState.owner. Without this, a scoped agent's viewport --scale would strand them: tab IDs change during recreate, ownership map held stale IDs, owner lookup failed. New IDs had no owner, so writes without tabId were denied (DoS). Worse, if the agent sent a stale tabId the server's swallowed-tab-switch-error path would let the command hit whatever tab was currently active (cross-tab authz bypass). Now: clear ownership before restore, re-add per-tab with new IDs. 6. meta-commands.ts:state load — disk-loaded state.pages is now explicit allowlist (url, isActive, storage:null) instead of object spread. Spreading accepted loadedHtml, loadedHtmlWaitUntil, and owner from a user-writable state file, letting a tampered state.json smuggle HTML past load-html's safe-dirs / extension / magic-byte / 50MB-cap validators, or forge tab ownership. Now stripped at the boundary. 7. url-validation.ts:normalizeFileUrl — preserves query string + fragment across normalization. file://./app.html?route=home#login previously resolved to a filesystem path that URL-encoded '?' as %3F and '#' as %23, or (for absolute forms) pathToFileURL dropped them entirely. SPAs and fixture URLs with query params 404'd or loaded the wrong route. Now: split on ?/# before path resolution, reattach after. 8. url-validation.ts:validateNavigationUrl — reattaches parsed.search + parsed.hash to the normalized file:// URL. Same fix at the main validator for absolute paths that go through fileURLToPath round-trip. 9. server.ts:writeAuditEntry — audit entries now include aliasOf when the user typed an alias ('setcontent' → cmd: 'load-html', aliasOf: 'setcontent'). Previously the isAliased variable was computed but dropped, losing the raw input from the forensic trail. Completes the plan's codex v3 P2 requirement. Also added bm.getCurrentViewport() and switched 'viewport --scale'- without-size to read from it (more reliable than page.viewportSize() on headed/transition contexts). Tests pass: exit 0, no failures. Build clean. * test: integration coverage for load-html, screenshot --selector, viewport --scale, replay, aliases Adds 28 Playwright-integration tests that close the coverage gap flagged by the ship-workflow coverage audit (50% → expected ~80%+). load-html (12 tests): - happy path loads HTML file, page text matches - bare HTML fragments (<div>...</div>) accepted, not just full documents - missing file arg throws usage - non-.html extension rejected by allowlist - /etc/passwd.html rejected by safe-dirs policy - ENOENT path rejected with actionable "not found" error - directory target rejected - binary file (PNG magic bytes) disguised as .html rejected by magic-byte check - UTF-8 BOM stripped before magic-byte check — BOM-prefixed HTML accepted - --wait-until networkidle exercises non-default branch - invalid --wait-until value rejected - unknown flag rejected screenshot --selector (5 tests): - --selector flag captures element, validates Screenshot saved (element) - conflicts with positional selector (both = error) - conflicts with --clip (mutually exclusive) - composes with --base64 (returns data:image/png;base64,...) - missing value throws usage viewport --scale (5 tests): - WxH --scale 2 produces PNG with 2x element dimensions (parses IHDR bytes 16-23) - --scale without WxH keeps current size + applies scale - non-finite value (abc) throws "not a finite number" - out-of-range (4, 0.5) throws "between 1 and 3" - missing value throws setContent replay across context recreation (3 tests): - load-html → viewport --scale 2: content survives (hits setTabContent replay path) - double cycle 2x → 1.5x: content still survives (proves TabSession rehydration) - goto after load-html clears replay: subsequent viewport --scale does NOT resurrect the stale HTML (validates the onMainFrameNavigated fix) Command aliases (2 tests): - setcontent routes to load-html via chain canonicalization - set-content (hyphenated) also routes — both end-to-end through chain dispatch Fixture paths use /tmp (SAFE_DIRECTORIES entry) instead of $TMPDIR which is /var/folders/... on macOS and outside the safe-dirs boundary. Chain result labels use rawName→name format when an alias is resolved (matches the meta-commands.ts chain refactor). Full suite: exit 0, 223/223 pass. * docs: update BROWSER.md + CHANGELOG for v1.1.0.0 BROWSER.md: - Command reference table updated: goto now lists file:// support, load-html added to Navigate row, viewport flagged with --scale option, screenshot row shows --selector + --base64 flags - Screenshot modes table adds the fifth mode (element crop via --selector flag) and notes the tag-selector-not-caught-positionally gotcha - New "Retina screenshots — viewport --scale" subsection explains deviceScaleFactor mechanics, context recreation side effects, and headed-mode rejection - New "Loading local HTML — goto file:// vs load-html" subsection explains the two paths, their tradeoffs (URL state, relative asset resolution), the safe-dirs policy, extension allowlist + magic-byte sniff, 50MB cap, setContent replay across recreateContext, and the alias routing (setcontent → load-html before scope check) CHANGELOG.md (v1.1.0.0 security section expanded, no existing content removed): - State files cannot smuggle HTML or forge tab ownership (allowlist on disk-loaded page fields) - Audit log records aliasOf when a canonical command was reached via an alias (setcontent → load-html) - load-html content clears on real navigations (clicks, form submits, JS redirects) — not just explicit goto. Also notes SPA query/fragment preservation for goto file:// Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 23:25:33 +08:00
Garry Tan	b73f364411	feat: browser data platform for AI agents (v0.16.0.0) (#907 ) * refactor: extract path-security.ts shared module validateOutputPath, validateReadPath, and SAFE_DIRECTORIES were duplicated across write-commands.ts, meta-commands.ts, and read-commands.ts. Extract to a single shared module with re-exports for backward compatibility. Also adds validateTempPath() for the upcoming GET /file endpoint (TEMP_DIR only, not cwd, to prevent remote agents from reading project files). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: default paired agents to full access, split SCOPE_CONTROL The trust boundary for paired agents is the pairing ceremony itself, not the scope. An agent with write scope can already click anything and navigate anywhere. Gating js/cookies behind --admin was security theater. Changes: - Default pair scopes: read+write+admin+meta (was read+write) - New SCOPE_CONTROL for browser-wide destructive ops (stop, restart, disconnect, state, handoff, resume, connect) - --admin flag now grants control scope (backward compat) - New --restrict flag for limited access (e.g., --restrict read) - Updated hint text: "re-pair with --control" instead of "--admin" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add media and data commands for page content extraction media command: discovers all img/video/audio/background-image elements on the page. Returns JSON with URLs, dimensions, srcset, loading state, HLS/DASH detection. Supports --images/--videos/--audio filters and optional CSS selector scoping. data command: extracts structured data embedded in pages (JSON-LD, Open Graph, Twitter Cards, meta tags). One command returns product prices, article metadata, social share info without DOM scraping. Both are READ scope with untrusted content wrapping. Shared media-extract.ts helper for reuse by the upcoming scrape command. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add download, scrape, and archive commands download: fetch any URL or @ref element to disk using browser session cookies via page.request.fetch(). Supports blob: URLs via in-page base64 conversion. --base64 flag returns inline data URI (cap 10MB). Detects HLS/DASH and rejects with yt-dlp hint. scrape: bulk media download composing media discovery + download loop. Sequential with 100ms delay, URL deduplication, configurable --limit. Writes manifest.json with per-file metadata for machine consumption. archive: saves complete page as MHTML via CDP Page.captureSnapshot. No silent fallback -- errors clearly if CDP unavailable. All three are WRITE scope (write to disk, blocked in watch mode). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add GET /file endpoint for remote agent file retrieval Remote paired agents can now retrieve downloaded files over HTTP. TEMP_DIR only (not cwd) to prevent project file exfiltration. - Bearer token auth (root or scoped with read scope) - Path validation via validateTempPath() (symlink-aware) - 200MB size cap - Extension-based MIME detection - Zero-copy streaming via Bun.file() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add scroll --times N for automated repeated scrolling Extends the scroll command with --times N flag for infinite feed scraping. Scrolls N times with configurable --wait delay (default 1000ms) between each scroll for content loading. Usage: scroll --times 10 scroll --times 5 --wait 2000 scroll --times 3 .feed-container Composable with scrape: scroll to load content, then scrape images. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add network response body capture (--capture/--export/--bodies) The killer feature for social media scraping. Extends the existing network command to intercept API response bodies: network --capture [--filter graphql] # start capturing network --capture stop # stop network --export /tmp/api.jsonl # export as JSONL network --bodies # show summary Uses page.on('response') listener with URL pattern filtering. SizeCappedBuffer (50MB total, 5MB per-entry cap) evicts oldest entries when full. Binary responses stored as base64, text as-is. This lets agents tap Instagram's GraphQL API, TikTok's hydration data, and any SPA's internal API responses instead of fragile DOM scraping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add screenshot --base64 for inline image return Returns data:image/png;base64,... instead of writing to disk. Cap at 10MB. Works with all screenshot modes (element, clip, viewport). Eliminates the two-step screenshot+file-serve dance for remote agents. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add data platform tests and media fixture Tests for SizeCappedBuffer (eviction, export, summary), validateTempPath (TEMP_DIR only, rejects cwd), command registration (all new commands in correct scope sets), and MIME mapping source checks. Rich HTML fixture with: standard images, lazy-loaded images, srcset, video with sources + HLS, audio, CSS background-images, JSON-LD, Open Graph, Twitter Cards, and meta tags. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: regenerate SKILL.md with Extraction category Add Extraction category to browse command table ordering. Regenerate SKILL.md files to include media, data, download, scrape, archive commands in the generated documentation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.16.0.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 00:41:55 -07:00
Garry Tan	03973c2fab	fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847 ) * fix(bin): pass search params via env vars (RCE fix) (#819) Replace shell string interpolation with process.env in gstack-learnings-search to prevent arbitrary code execution via crafted learnings entries. Also fixes the CROSS_PROJECT interpolation that the original PR missed. Adds 3 regression tests verifying no shell interpolation remains in the bun -e block. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): add path validation to upload command (#821) Add isPathWithin() and path traversal checks to the upload command, blocking file exfiltration via crafted upload paths. Uses existing SAFE_DIRECTORIES constant instead of a local copy. Adds 3 regression tests. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): symlink resolution in meta-commands validateOutputPath (#820) Add realpathSync to validateOutputPath in meta-commands.ts to catch symlink-based directory escapes in screenshot, pdf, and responsive commands. Resolves SAFE_DIRECTORIES through realpathSync to handle macOS /tmp -> /private/tmp symlinks. Existing path validation tests pass with the hardened implementation. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add uninstall instructions to README (#812) Community PR #812 by @0531Kim. Adds two uninstall paths: the gstack-uninstall script (handles everything) and manual removal steps for when the repo isn't cloned. Includes CLAUDE.md cleanup note and Playwright cache guidance. Co-Authored-By: 0531Kim <0531Kim@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): Windows launcher extraEnv + headed-mode token (#822) Community PR #822 by @pieterklue. Three fixes: 1. Windows launcher now merges extraEnv into spawned server env (was only passing BROWSE_STATE_FILE, dropping all other env vars) 2. Welcome page fallback serves inline HTML instead of about:blank redirect (avoids ERR_UNSAFE_REDIRECT on Windows) 3. /health returns auth token in headed mode even without Origin header (fixes Playwright Chromium extensions that don't send it) Also adds HOME/USERPROFILE fallback for cross-platform compatibility. Co-Authored-By: pieterklue <pieterklue@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): terminate orphan server when parent process exits (#808) Community PR #808 by @mmporong. Passes BROWSE_PARENT_PID to the spawned server process. The server polls every 15s with signal 0 and calls shutdown() if the parent is gone. Prevents orphaned chrome-headless-shell processes when Claude Code sessions exit abnormally. Co-Authored-By: mmporong <mmporong@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): IPv6 ULA blocking, cookie redaction, per-tab cancel, targeted token (#664) Community PR #664 by @mr-k-man (security audit round 1, new parts only). - IPv6 ULA prefix blocking (fc00::/7) in url-validation.ts with false-positive guard for hostnames like fd.example.com - Cookie value redaction for tokens, API keys, JWTs in browse cookies command - Per-tab cancel files in killAgent() replacing broken global kill-signal - design/serve.ts: realpathSync upgrade prevents symlink bypass in /api/reload - extension: targeted getToken handler replaces token-in-health-broadcast - Supabase migration 003: column-level GRANT restricts anon UPDATE scope - Telemetry sync: upsert error logging - 10 new tests for IPv6, cookie redaction, DNS rebinding, path traversal Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): CSS injection guard, timeout clamping, session validation, tests (#806) Community PR #806 by @mr-k-man (security audit round 2, new parts only). - CSS value validation (DANGEROUS_CSS) in cdp-inspector, write-commands, extension inspector - Queue file permissions (0o700/0o600) in cli, server, sidebar-agent - escapeRegExp for frame --url ReDoS fix - Responsive screenshot path validation with validateOutputPath - State load cookie filtering (reject localhost/.internal/metadata cookies) - Session ID format validation in loadSession - /health endpoint: remove currentUrl and currentMessage fields - QueueEntry interface + isValidQueueEntry validator for sidebar-agent - SIGTERM->SIGKILL escalation in timeout handler - Viewport dimension clamping (1-16384), wait timeout clamping (1s-300s) - Cookie domain validation in cookie-import and cookie-import-browser - DocumentFragment-based tab switching (XSS fix in sidepanel) - pollInProgress reentrancy guard for pollChat - toggleClass/injectCSS input validation in extension inspector - Snapshot annotated path validation with realpathSync - 714-line security-audit-r2.test.ts + 33-line learnings-injection.test.ts Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.13.0) Community security wave: 8 PRs from 4 contributors (@garagon, @mr-k-man, @mmporong, @0531Kim, @pieterklue). IPv6 ULA blocking, cookie redaction, per-tab cancel signaling, CSS injection guards, timeout clamping, session validation, DocumentFragment XSS fix, parent process watchdog, uninstall docs, Windows fixes, and 750+ lines of security regression tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: garagon <garagon@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: 0531Kim <0531Kim@users.noreply.github.com> Co-authored-by: pieterklue <pieterklue@users.noreply.github.com> Co-authored-by: mmporong <mmporong@users.noreply.github.com> Co-authored-by: mr-k-man <mr-k-man@users.noreply.github.com>	2026-04-06 00:47:04 -07:00

4 Commits