gstack/browse/src/meta-commands.ts
Garry Tan 54d4cde773 security: tunnel dual-listener + SSRF + envelope + path wave (v1.6.0.0) (#1137)
* refactor(security): loosen /connect rate limit from 3/min to 300/min

Setup keys are 24 random bytes (unbruteforceable), so a tight rate limit
does not meaningfully prevent key guessing. It exists only to cap
bandwidth, CPU, and log-flood damage from someone who discovered the
ngrok URL. A legitimate pair-agent session hits /connect once; 300/min
is 60x that pattern and never hit accidentally.

3/min caused pairing to fail on any retry flow (network blip, second
paired client) with no upside. Per-IP tracking was considered and
rejected: it would add a bounded Map plus LRU eviction for a defense
that is already adequate at the global layer.
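The global (not per-IP) counter this describes can be sketched as a fixed one-minute window; names and the exact window mechanics are illustrative assumptions, not the actual server.ts code:

```typescript
// Illustrative global fixed-window rate limit for /connect.
// One counter for the whole process — no per-IP Map, no LRU.
const CONNECT_LIMIT = 300; // requests per minute, globally
let windowStart = 0;
let windowCount = 0;

function allowConnect(now: number = Date.now()): boolean {
  if (now - windowStart >= 60_000) {
    // New one-minute window: reset the global counter.
    windowStart = now;
    windowCount = 0;
  }
  windowCount++;
  return windowCount <= CONNECT_LIMIT;
}
```

A legitimate pairing retry never comes close to the limit; only a flood against a discovered ngrok URL trips it.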

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): add tunnel-denial-log module for attack visibility

Append-only log of tunnel-surface auth denials to
~/.gstack/security/attempts.jsonl. Gives operators visibility into who
is probing tunneled daemons so the next security wave can be driven by
real attack data instead of speculation.

Design notes:
- Async via fs.promises.appendFile. Never appendFileSync — blocking the
  event loop on every denial during a flood is what an attacker wants
  (prior learning: sync-audit-log-io, 10/10 confidence).
- In-process rate cap at 60 writes/minute globally. Excess denials are
  counted in memory but not written to disk — prevents disk DoS.
- Writes to the same ~/.gstack/security/attempts.jsonl used by the
  prompt-injection attempt log. File rotation is handled by the existing
  security pipeline (10MB, 5 generations).

No consumers in this commit; wired up in the dual-listener refactor that
follows.
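The async-append plus in-process write cap can be sketched as below. The cap decision is pulled into a pure helper so it is visible on its own; all names here are assumptions mirroring the prose, not the module's real exports.

```typescript
import { appendFile } from 'fs/promises';

// At most 60 disk writes per minute, globally; excess denials are counted
// in memory but never written — prevents disk DoS during a flood.
const MAX_WRITES_PER_MINUTE = 60;
let writeWindowStart = 0;
let writesThisWindow = 0;
let droppedDenials = 0;

/** Pure cap decision: true if this denial may be written to disk. */
function shouldWriteDenial(now: number): boolean {
  if (now - writeWindowStart >= 60_000) {
    writeWindowStart = now;
    writesThisWindow = 0;
  }
  if (writesThisWindow >= MAX_WRITES_PER_MINUTE) {
    droppedDenials++; // remembered in memory only
    return false;
  }
  writesThisWindow++;
  return true;
}

async function logTunnelDenial(logPath: string, entry: object): Promise<void> {
  if (!shouldWriteDenial(Date.now())) return;
  // fs.promises.appendFile, never appendFileSync: blocking the event loop
  // on every denial during a flood is what an attacker wants.
  await appendFile(logPath, JSON.stringify({ ts: Date.now(), ...entry }) + '\n');
}
```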

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): dual-listener tunnel architecture

The /health endpoint leaked AUTH_TOKEN to any caller that hit the ngrok
URL (by spoofing a chrome-extension:// origin, or by catching headed mode).
Surfaced by @garagon in PR #1026; the original fix was header-inference
on the single port. Codex's outside-voice review during /plan-ceo-review
called that approach brittle (ngrok header behavior could change, local
proxies would false-positive), and pushed for the structural fix.

This is that fix. Stop making /health a root-token bootstrap endpoint on
any surface the tunnel can reach. The server now binds two HTTP
listeners when a tunnel is active. The local listener (extension, CLI,
sidebar) stays on 127.0.0.1 and is never exposed to ngrok. ngrok
forwards only to the tunnel listener, which serves only /connect
(unauth, rate-limited) and /command with a locked allowlist of
browser-driving commands. The security property comes from physical
port separation, not from header inference — a tunnel caller cannot
reach /health, /cookie-picker, or /inspector, because those routes
live on a different TCP socket.

What this commit adds to browse/src/server.ts:
  * Surface type ('local' | 'tunnel') and TUNNEL_PATHS +
    TUNNEL_COMMANDS allowlists near the top of the file.
  * makeFetchHandler(surface) factory replacing the single fetch arrow;
    closure-captures the surface so the filter that runs before route
    dispatch knows which socket accepted the request.
  * Tunnel filter at dispatch entry: 404s anything not on TUNNEL_PATHS,
    403s root-token bearers with a clear pairing hint, 401s non-/connect
    requests that lack a scoped token. Every denial is logged via
    logTunnelDenial (from tunnel-denial-log).
  * GET /connect alive probe (unauth on both surfaces) so /pair and
    /tunnel/start can detect dead ngrok tunnels without reaching
    /health — /health is no longer tunnel-reachable.
  * Lazy tunnel listener lifecycle. /tunnel/start binds a dedicated
    Bun.serve on an ephemeral port, points ngrok.forward at THAT port
    (not the local port), hard-fails on bind error (no local fallback),
    tears down cleanly on ngrok failure. BROWSE_TUNNEL=1 startup uses
    the same pattern.
  * closeTunnel() helper — single teardown path for both the ngrok
    listener and the tunnel Bun.serve listener.
  * resolveNgrokAuthtoken() helper — shared authtoken lookup across
    /tunnel/start and BROWSE_TUNNEL=1 startup (was duplicated).
  * TUNNEL_COMMANDS check in /command dispatch: on the tunnel surface,
    commands outside the allowlist return 403 with a list of allowed
    commands as a hint.
  * Probe paths in /pair and /tunnel/start migrated from /health to
    GET /connect — the only unauth path reachable on the tunnel surface
    under the new architecture.
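The surface-aware filter that runs before route dispatch can be sketched as a pure decision function. In the real server.ts the surface is closure-captured by makeFetchHandler(surface); here it is an explicit parameter so the decision table is testable. The token shape is an illustrative assumption; TUNNEL_PATHS membership comes from the commit text.

```typescript
type Surface = 'local' | 'tunnel';

// Closed allowlist of paths that exist on the tunnel listener at all.
const TUNNEL_PATHS = new Set(['/connect', '/command', '/sidebar-chat']);

/** Returns an HTTP status to short-circuit with, or null to continue dispatch. */
function tunnelFilterStatus(
  surface: Surface,
  pathname: string,
  token: { kind: 'root' | 'scoped' } | null,
): number | null {
  if (surface === 'local') return null; // local listener: no tunnel filter
  if (!TUNNEL_PATHS.has(pathname)) return 404; // /health etc. do not exist here
  if (token?.kind === 'root') return 403; // root token is never valid over the tunnel
  if (pathname !== '/connect' && !token) return 401; // everything else needs a scoped token
  return null; // fall through to normal route dispatch
}
```

Each non-null return corresponds to a logged denial in the real code.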

Test updates in browse/test/server-auth.test.ts:
  * /pair liveness-verify test: assert via closeTunnel() helper instead
    of the inline `tunnelActive = false; tunnelUrl = null` lines that
    the helper subsumes.
  * /tunnel/start cached-tunnel test: same closeTunnel() adaptation.

Credit
  Derived from PR #1026 by @garagon — thanks for flagging the critical
  bug that drove the architectural rewrite. The per-request
  isTunneledRequest approach from #1026 is superseded by physical port
  separation here; the underlying report remains the root cause for the
  entire v1.6.0.0 wave.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(security): add source-level guards for dual-listener architecture

23 source-level assertions that keep future contributors from silently
widening the tunnel surface during a routine refactor. Covers:

  * Surface type + tunnelServer state variable shape
  * TUNNEL_PATHS is a closed set of /connect, /command, /sidebar-chat
    (and NOT /health, /welcome, /cookie-picker, /inspector/*, /pair,
    /token, /refs, /activity/stream, /tunnel/{start,stop})
  * TUNNEL_COMMANDS includes browser-driving ops only (and NOT
    launch-browser, tunnel-start, token-mint, cookie-import, etc.)
  * makeFetchHandler(surface) factory exists and is wired to both
    listeners with the correct surface parameter
  * Tunnel filter runs BEFORE any route dispatch, with 404/403/401
    responses and logged denials for each reason
  * GET /connect returns {alive: true} unauth
  * /command dispatch enforces TUNNEL_COMMANDS on tunnel surface
  * closeTunnel() helper tears down ngrok + Bun.serve listener
  * /tunnel/start binds on ephemeral port, points ngrok at TUNNEL_PORT
    (not local port), hard-fails on bind error (no fallback), probes
    cached tunnel via GET /connect (not /health), tears down on
    ngrok.forward failure
  * BROWSE_TUNNEL=1 startup uses the dual-listener pattern
  * logTunnelDenial wired for all three denial reasons
  * /connect rate limit is 300/min, not 3/min

All 23 tests pass. Behavioral integration tests (spawn subprocess, real
network) live in the E2E suite that lands later in this wave.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* security: gate download + scrape through validateNavigationUrl (SSRF)

The `goto` command was correctly wired through validateNavigationUrl,
but `download` and `scrape` called page.request.fetch(url, ...) directly.
A caller with the default write scope could hit the /command endpoint
and ask the daemon to fetch http://169.254.169.254/latest/meta-data/
(AWS IMDSv1) or the GCP/Azure/internal equivalents. The response body
comes back as base64 or lands on disk where GET /file serves it.

Fix: call validateNavigationUrl(url) immediately before each
page.request.fetch() call site in download and in the scrape loop.
Same blocklist that already protects `goto`: file://, javascript:,
data:, chrome://, cloud metadata (IPv4 all encodings, IPv6 ULA,
metadata.*.internal).
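A rough sketch of the blocklist shape validateNavigationUrl applies — simplified: the real implementation covers many more IP encodings and IPv6 ULA; the `.internal` suffix check here is a broad stand-in for the metadata.*.internal patterns named above.

```typescript
const BLOCKED_SCHEMES = new Set(['file:', 'javascript:', 'data:', 'chrome:']);

function isBlockedUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return true; // unparseable URLs are rejected outright
  }
  if (BLOCKED_SCHEMES.has(url.protocol)) return true;
  const host = url.hostname.toLowerCase();
  // Cloud metadata endpoints: IPv4 link-local and common internal hostnames.
  if (host === '169.254.169.254') return true;
  if (host === 'metadata.google.internal' || host.endsWith('.internal')) return true;
  return false;
}
```

The fix is placement, not the check itself: this gate must run immediately before every page.request.fetch() call site, not only before `goto`.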

Tests: extend browse/test/url-validation.test.ts with a source-level
guard that walks every `await page.request.fetch(` call site and
asserts a validateNavigationUrl call precedes it within the same
branch. Regression trips before code review if a future refactor
drops the gate.

* security: route splitForScoped through envelope sentinel escape

The scoped-token snapshot path in snapshot.ts built its untrusted
block by pushing the raw accessibility-tree lines between the literal
`═══ BEGIN UNTRUSTED WEB CONTENT ═══` / `═══ END UNTRUSTED WEB CONTENT ═══`
sentinels. The full-page wrap path in content-security.ts already
applied a zero-width-space escape on those exact strings to prevent
sentinel injection, but the scoped path skipped it.

Net effect: a page whose rendered text contains the literal sentinel
can close the envelope early from inside untrusted content and forge
a fake "trusted" block for the LLM. That includes fabricating
interactive `@eN` references the agent will act on.

Fix:
  * Extract the zero-width-space escape into a named, exported helper
    `escapeEnvelopeSentinels(content)` in content-security.ts.
  * Have `wrapUntrustedPageContent` call it (behavior unchanged on
    that path — same bytes out).
  * Import the helper in snapshot.ts and map it over `untrustedLines`
    in the `splitForScoped` branch before pushing the BEGIN sentinel.
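A minimal sketch of the zero-width-space escape. Assumption: the real helper defuses the sentinel by inserting U+200B inside it so untrusted content can never reproduce the exact trusted byte sequence; the exact insertion point is illustrative.

```typescript
const ZWSP = '\u200B';
const SENTINELS = [
  '═══ BEGIN UNTRUSTED WEB CONTENT ═══',
  '═══ END UNTRUSTED WEB CONTENT ═══',
];

/** Defuse envelope sentinels appearing inside untrusted content. */
function escapeEnvelopeSentinels(content: string): string {
  let out = content;
  for (const s of SENTINELS) {
    // Insert a zero-width space after the first character so the literal
    // sentinel byte sequence no longer occurs, while the text still looks
    // identical to a human reader.
    out = out.split(s).join(s[0] + ZWSP + s.slice(1));
  }
  return out;
}
```

Mapping this over `untrustedLines` before pushing the BEGIN sentinel is what closes the scoped-path gap.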

Tests: add a describe block in content-security.test.ts that covers
  * `escapeEnvelopeSentinels` defuses BEGIN and END markers;
  * `escapeEnvelopeSentinels` leaves normal text untouched;
  * `wrapUntrustedPageContent` still emits exactly one real envelope
    pair when hostile content contains forged sentinels;
  * snapshot.ts imports the helper;
  * the scoped-snapshot branch calls `escapeEnvelopeSentinels` before
    pushing the BEGIN sentinel (source-level regression — if a future
    refactor reorders this, the test trips).

* security: extend hidden-element detection to all DOM-reading channels

The Confusion Protocol envelope wrap (`wrapUntrustedPageContent`)
covers every scoped PAGE_CONTENT_COMMAND, but the hidden-element
ARIA-injection detection layer only ran for `text`. Other DOM-reading
channels (html, links, forms, accessibility, attrs, data, media,
ux-audit) returned their output through the envelope with no hidden-
content filter, so a page serving a display:none div that instructs
the agent to disregard prior system messages, or an aria-label that
claims to put the LLM in admin mode, leaked the injection payload on
any non-text channel. The envelope alone does not mitigate this, and
the page itself never rendered the hostile content to the human
operator.

Fix:
  * New export `DOM_CONTENT_COMMANDS` in commands.ts — the subset of
    PAGE_CONTENT_COMMANDS that derives its output from the live DOM.
    Console and dialog stay out; they read separate runtime state.
  * server.ts runs `markHiddenElements` + `cleanupHiddenMarkers` for
    every scoped command in this set. `text` keeps its existing
    `getCleanTextWithStripping` path (hidden elements physically
    stripped before the read). All other channels keep their output
    format but emit flagged elements as CONTENT WARNINGS on the
    envelope, so the LLM sees what it would otherwise have consumed
    silently.
  * Hidden-element descriptions merge into `combinedWarnings`
    alongside content-filter warnings before the wrap call.
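The set relationship can be reconstructed as below; membership lists are taken from the prose and the dispatch gate is shown as a predicate, which is an illustrative simplification of the server.ts wiring.

```typescript
const PAGE_CONTENT_COMMANDS = new Set([
  'text', 'html', 'links', 'forms', 'accessibility', 'attrs',
  'data', 'media', 'ux-audit', 'console', 'dialog',
]);

// DOM-reading subset: console and dialog read separate runtime state,
// not the live DOM, so they stay out.
const DOM_CONTENT_COMMANDS = new Set(
  [...PAGE_CONTENT_COMMANDS].filter((c) => c !== 'console' && c !== 'dialog'),
);

// Dispatch gates on set membership, not the literal 'text' string.
function needsHiddenElementScan(command: string): boolean {
  return DOM_CONTENT_COMMANDS.has(command);
}
```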

Tests: new describe block in content-security.test.ts covering
  * `DOM_CONTENT_COMMANDS` export shape and channel membership;
  * dispatch gates on `DOM_CONTENT_COMMANDS.has(command)`, not the
    literal `text` string;
  * hiddenContentWarnings plumbs into `combinedWarnings` and reaches
    wrapUntrustedPageContent;
  * DOM_CONTENT_COMMANDS is a strict subset of PAGE_CONTENT_COMMANDS.

Existing datamarking, envelope wrap, centralized-wrapping, and chain
security suites stay green (52 pass, 0 fail).

* security: validate --from-file payload paths for parity with direct paths

The direct `load-html <file>` path runs every caller-supplied file path
through validateReadPath() so reads stay confined to SAFE_DIRECTORIES
(cwd, TEMP_DIR). The `load-html --from-file <payload.json>` shortcut
and its sibling `pdf --from-file <payload.json>` skipped that check and
went straight to fs.readFileSync(). An MCP caller that picks the
payload path (or any caller whose payload argument is reachable from
attacker-influenced text) could use --from-file as a read-anywhere
escape hatch for the safe-dirs policy.

Fix: call validateReadPath(path.resolve(payloadPath)) before readFileSync
at both sites. Error surface mirrors the direct-path branch so ops and
agent errors stay consistent.
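The parity fix reduces to validating before reading at both sites. A sketch, assuming a simplified SAFE_DIRECTORIES of cwd plus /tmp (the real list and error text live in path-security.ts):

```typescript
import * as fs from 'fs';
import * as path from 'path';

const SAFE_DIRECTORIES = [process.cwd(), '/tmp'];

/** Throw unless the resolved path is confined to a safe directory. */
function validateReadPath(p: string): string {
  const resolved = path.resolve(p);
  const ok = SAFE_DIRECTORIES.some(
    (dir) => resolved === dir || resolved.startsWith(dir + path.sep),
  );
  if (!ok) throw new Error(`Path outside SAFE_DIRECTORIES: ${resolved}`);
  return resolved;
}

function readFromFilePayload(payloadPath: string): string {
  // Validation runs BEFORE the read at both --from-file call sites,
  // mirroring what the direct load-html <file> path already does.
  return fs.readFileSync(validateReadPath(payloadPath), 'utf-8');
}
```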

Test coverage in browse/test/from-file-path-validation.test.ts:
  - source-level: validateReadPath precedes readFileSync in the load-html
    --from-file branch (write-commands.ts) and the pdf --from-file parser
    (meta-commands.ts)
  - error-message parity: both sites reference SAFE_DIRECTORIES

Related security audit pattern: R3 F002 (validateNavigationUrl gap on
download/scrape) and R3 F008 (markHiddenElements gap on 10 DOM commands)
were the same shape — a defense that existed on the primary code path
but not its shortcut sibling. This PR closes the same class of gap on
the --from-file shortcuts.

* fix(design): escape url.origin when injecting into served HTML

serve.ts injected url.origin into a single-quoted JS string in
the response body. A local request with a crafted Host header
(e.g. Host: "evil'-alert(1)-'x") would break out of the string
and execute JS in the 127.0.0.1:<port> origin opened by the
design board. Low severity — bound to localhost, requires a
local attacker — but no reason not to escape.

Fix: JSON.stringify(url.origin) produces a properly quoted,
escaped JS string literal in one call.
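The one-call fix in sketch form (the surrounding HTML is illustrative, not the serve.ts template):

```typescript
// JSON.stringify yields a double-quoted, escaped JS string literal, so a
// hostile Host header cannot break out of the string context it lands in.
function injectOrigin(origin: string): string {
  return `<script>const ORIGIN = ${JSON.stringify(origin)};</script>`;
}
```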

Also includes Prettier reformatting (single→double quotes,
trailing commas, line wrapping) applied by the repo's
PostToolUse formatter hook. Security change is the one line
in the HTML injection; everything else is whitespace/style.

* fix(scripts): drop shell:true from slop-diff npx invocations

spawnSync('npx', [...], { shell: true }) invokes /bin/sh -c
with the args concatenated, subjecting them to shell parsing
(word splitting, glob expansion, metacharacter interpretation).
No user input reaches these calls today, so not exploitable —
but the posture is wrong: npx + shell args should be direct.

Fix: scope shell:true to process.platform === 'win32' where
npx is actually a .cmd requiring the shell. POSIX runs the
npx binary directly with array-form args.
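In sketch form, with the platform decision pulled into a helper so it can be checked on any host (the real scripts read process.platform inline):

```typescript
import { spawnSync } from 'child_process';

// shell only on Windows, where npx is a .cmd shim that genuinely needs
// cmd.exe; POSIX runs the npx binary directly with array-form args, so
// arguments are never subject to word splitting or glob expansion.
function npxShellOption(platform: string): boolean {
  return platform === 'win32';
}

function runNpx(args: string[]) {
  return spawnSync('npx', args, {
    shell: npxShellOption(process.platform),
    encoding: 'utf-8',
  });
}
```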

Also includes Prettier reformatting (single→double quotes,
trailing commas, line wrapping) applied by the repo's
PostToolUse formatter hook. Security-relevant change is just
the two shell:true -> shell: process.platform === 'win32'
lines; everything else is whitespace/style.

* security(E3): gate GSTACK_SLUG on /welcome path traversal

The /welcome handler interpolates GSTACK_SLUG directly into the filesystem
path used to locate the project-local welcome page. Without validation, a
slug like "../../etc/passwd" would resolve to
~/.gstack/projects/../../etc/passwd/designs/welcome-page-20260331/finalized.html
— classic path traversal.

Not exploitable today: GSTACK_SLUG is set by the gstack CLI at daemon launch,
and an attacker would already need local env-var access to poison it. But
the gate is one regex (^[a-z0-9_-]+$), and a defense-in-depth pass costs us
nothing when the cost of being wrong is arbitrary file read via /welcome.

Fall back to the safe 'unknown' literal when the slug fails validation —
same fallback the code already uses when GSTACK_SLUG is unset. No behavior
change for legitimate slugs (they all match the regex).
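The whole gate in sketch form (the function name is an assumption; the regex and fallback literal are from the text above):

```typescript
const SLUG_RE = /^[a-z0-9_-]+$/;

/** Gate GSTACK_SLUG before it is interpolated into a filesystem path. */
function safeSlug(raw: string | undefined): string {
  // Fall back to the same 'unknown' literal the code already uses when
  // GSTACK_SLUG is unset; traversal sequences like '../..' cannot match.
  return raw && SLUG_RE.test(raw) ? raw : 'unknown';
}
```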

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* security(N1): replace ?token= SSE auth with HttpOnly session cookie

Activity stream and inspector events SSE endpoints accepted the root
AUTH_TOKEN via `?token=` query param (EventSource can't send Authorization
headers). URLs leak to browser history, referer headers, server logs,
crash reports, and refactoring accidents. Codex flagged this during the
/plan-ceo-review outside voice pass.

New auth model: the extension calls POST /sse-session with a Bearer token
and receives a view-only session cookie (HttpOnly, SameSite=Strict, 30-min
TTL). EventSource is opened with `withCredentials: true` so the browser
sends the cookie back on the SSE connection. The ?token= query param is
GONE — no more URL-borne secrets.

Scope isolation (prior learning cookie-picker-auth-isolation, 10/10
confidence): the SSE session cookie grants access to /activity/stream and
/inspector/events ONLY. The token is never valid against /command, /token,
or any mutating endpoint. A leaked cookie can watch activity; it cannot
execute browser commands.

Components
  * browse/src/sse-session-cookie.ts — registry: mint/validate/extract/
    build-cookie. 256-bit tokens, 30-min TTL, lazy expiry pruning,
    no imports from token-registry (scope isolation enforced by module
    boundary).
  * browse/src/server.ts — POST /sse-session mint endpoint (requires
    Bearer). /activity/stream and /inspector/events now accept Bearer
    OR the session cookie, and reject ?token= query param.
  * extension/sidepanel.js — ensureSseSessionCookie() bootstrap call,
    EventSource opened with withCredentials:true on both SSE endpoints.
    Tested via the source guards; behavioral test is the E2E pairing
    flow that lands later in the wave.
  * browse/test/sse-session-cookie.test.ts — 20 unit tests covering
    mint entropy, TTL enforcement, cookie flag invariants, cookie
    parsing from multi-cookie headers, and scope-isolation contract
    guard (module must not import token-registry).
  * browse/test/server-auth.test.ts — existing /activity/stream auth
    test updated to assert the new cookie-based gate and the absence
    of the ?token= query param.

Cookie flag choices:
  * HttpOnly: token not readable from page JS (mitigates XSS
    exfiltration).
  * SameSite=Strict: cookie not sent on cross-site requests (mitigates
    CSRF). Fine for SSE because the extension connects to 127.0.0.1
    directly.
  * Path=/: cookie scoped to the whole origin.
  * Max-Age=1800: 30 minutes, matches TTL. Extension re-mints on
    reconnect when daemon restarts.
  * Secure NOT set: daemon binds to 127.0.0.1 over plain HTTP. Adding
    Secure would block the browser from ever sending the cookie back.
    Add Secure when gstack ships over HTTPS.
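The mint/validate shape implied by the flag list can be sketched as below; the registry layout and function names are assumptions, while the gstack_sse cookie name, flags, and TTL come from the commit text.

```typescript
import { randomBytes } from 'crypto';

const SSE_TTL_MS = 30 * 60 * 1000;
const sessions = new Map<string, number>(); // token -> expiry (epoch ms)

function mintSseSession(now = Date.now()): { token: string; setCookie: string } {
  const token = randomBytes(32).toString('hex'); // 256-bit token
  sessions.set(token, now + SSE_TTL_MS);
  // Secure deliberately NOT set: the daemon serves plain HTTP on 127.0.0.1,
  // and Secure would stop the browser from ever sending the cookie back.
  const setCookie = `gstack_sse=${token}; HttpOnly; SameSite=Strict; Path=/; Max-Age=1800`;
  return { token, setCookie };
}

function validateSseSession(token: string, now = Date.now()): boolean {
  const expiry = sessions.get(token);
  if (expiry === undefined) return false;
  if (now > expiry) {
    sessions.delete(token); // lazy expiry pruning
    return false;
  }
  return true;
}
```

Scope isolation is a module boundary, not a check: this registry knows nothing about root tokens, so a leaked cookie can never authorize /command.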

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* security(N2): document Windows v20 ABE elevation path on CDP port

The existing comment around the cookie-import-browser --remote-debugging-port
launch claimed "threat model: no worse than baseline." That's wrong on
Windows with App-Bound Encryption v20. A same-user local process that
opens the cookie SQLite DB directly CANNOT decrypt v20 values (DPAPI
context is bound to the browser process). The CDP port lets them bypass
that: connect to the debug port, call Network.getAllCookies inside Chrome,
walk away with decrypted v20 cookies.

The correct fix is to switch from TCP --remote-debugging-port to
--remote-debugging-pipe so the CDP transport is a stdio pipe, not a
socket. That requires restructuring the CDP WebSocket client in this
module and Playwright doesn't expose the pipe transport out of the box.
Non-trivial, deferred from the v1.6.0.0 wave.

This commit updates the comment to correctly describe the threat and
points at the tracking issue. No code change to the launch itself.
Follow-up: #1136.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(E2): document dual-listener tunnel architecture in ARCHITECTURE.md

Adds an explicit per-endpoint disposition table to the Security model
section, covering the v1.6.0.0 dual-listener refactor. Every HTTP
endpoint now has a documented local-vs-tunnel answer. Future audits
(and future contributors wondering "is it safe to add X to the tunnel
surface?") can read this instead of reverse-engineering server.ts.

Also documents:
  * Why physical port separation beats per-request header inference
    (ngrok behavior drift, local proxies can forge headers, etc.)
  * Tunnel surface denial logging → ~/.gstack/security/attempts.jsonl
  * SSE session cookie model (gstack_sse, 30-min TTL, stream-scope only,
    module-boundary-enforced scope isolation)
  * N2 non-goal for Windows v20 ABE via CDP port (tracking #1136)

No code changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(E1): end-to-end pair-agent flow against a spawned daemon

Spawns the browse daemon as a subprocess with BROWSE_HEADLESS_SKIP=1 so
the HTTP layer runs without a real browser.  Exercises:

  * GET /health — token delivery for chrome-extension origin, withheld
    otherwise (the F1 + PR #1026 invariant)
  * GET /connect — alive probe returns {alive:true} unauth
  * POST /pair — root Bearer required (403 without), returns setup_key
  * POST /connect — setup_key exchange mints a distinct scoped token
  * POST /command — 401 without auth
  * POST /sse-session — Bearer required, Set-Cookie has HttpOnly +
    SameSite=Strict (the N1 invariant)
  * GET /activity/stream — 401 without auth
  * GET /activity/stream?token= — 401 (the old ?token= query param is
    REJECTED, which is the whole point of N1)
  * GET /welcome — serves HTML, does not leak /etc/passwd content under
    the default 'unknown' slug (E3 regex gate)

12 behavioral tests, ~220ms end-to-end, no network dependencies, no
ngrok, no real browser.  This is the receipt for the wave's central
'pair-agent still works + the security boundary holds' claim.

Tunnel-port binding (/tunnel/start) is deliberately NOT exercised here
— it requires an ngrok authtoken and live network.  The dual-listener
route allowlist is covered by source-level guards in
dual-listener.test.ts; behavioral tunnel testing belongs in a separate
paid-evals harness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* release(v1.6.0.0): bump VERSION + CHANGELOG for security wave

Architectural bump, not patch: dual-listener HTTP refactor changes the
daemon's tunnel-exposure model.  See CHANGELOG for the full release
summary (~950 words) covering the five root causes this wave closes:

  1. /health token leak over ngrok (F1 + E3 + test infra)
  2. /cookie-picker + /inspector exposed over the tunnel (F1)
  3. ?token=<ROOT> in SSE URLs leaking to logs/referer/history (N1)
  4. /welcome GSTACK_SLUG path traversal (E3)
  5. Windows v20 ABE elevation via CDP port (N2 — documented non-goal,
     tracked as #1136)

Plus the base PRs: SSRF gate (#1029), envelope sentinel escape (#1031),
DOM-channel hidden-element coverage (#1032), --from-file path validation
(#1103), and 2 commits from #1073 (@theqazi).

VERSION + package.json bumped to 1.6.0.0.  CHANGELOG entry covers
credits (@garagon, @Hybirdss, @HMAKT99, @theqazi), review lineage (CEO
→ Codex outside voice → Eng), and the non-goal tracking issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: pre-landing review findings (4 auto-fixes)

Addresses 4 findings from the Claude adversarial subagent on the
v1.6.0.0 security wave diff.  No user-visible behavior change; all
are defense-in-depth hardening of newly-introduced code.

1. GET /connect rate-limited (was POST-only) [HIGH conf 8/10]
   Attacker discovering the ngrok URL could probe unlimited GETs for
   daemon enumeration.  Now shares the global /connect counter.

2. ngrok listener leak on tunnel startup failure [MEDIUM conf 8/10]
   If ngrok.forward() resolved but tunnelListener.url() or the
   state-file write threw, the Bun listener was torn down but the
   ngrok session was leaked.  Fixed in BOTH /tunnel/start and
   BROWSE_TUNNEL=1 startup paths.

3. GSTACK_SKILL_ROOT path-traversal gate [MEDIUM conf 8/10]
   Symmetric with E3's GSTACK_SLUG regex gate — reject values
   containing '..' before interpolating into the welcome-page path.

4. SSE session registry pruning [LOW conf 7/10]
   pruneExpired() only checked 10 entries per mint call.  Now runs
   on every validate too, checks 20 entries, with a hard 10k cap as
   backstop.  Prevents registry growth under sustained extension
   reconnect pressure.
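The bounded pruning in finding 4 can be sketched as below; the batch size and hard cap come from the text, the registry shape is an assumption.

```typescript
const HARD_CAP = 10_000;
const PRUNE_BATCH = 20;
const registry = new Map<string, number>(); // token -> expiry (epoch ms)

/** Lazy, bounded pruning: runs on every mint AND every validate. */
function pruneExpired(now: number): void {
  let checked = 0;
  for (const [token, expiry] of registry) {
    if (checked++ >= PRUNE_BATCH) break; // bounded work per call
    if (now > expiry) registry.delete(token);
  }
  // Hard backstop against sustained reconnect pressure: evict oldest
  // entries (Map iteration order is insertion order) past the cap.
  while (registry.size > HARD_CAP) {
    const oldest = registry.keys().next().value as string;
    registry.delete(oldest);
  }
}
```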

Tests remain green (56/56 in sse-session-cookie + dual-listener +
pair-agent-e2e suites).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v1.6.0.0

Reflect the dual-listener tunnel architecture, SSE session cookies,
SSRF guards, and Windows v20 ABE non-goal across the three docs
users actually read for remote-agent and browser auth context:

- docs/REMOTE_BROWSER_ACCESS.md: rewrote Architecture diagram for
  dual listeners, fixed /connect rate limit (3/min → 300/min),
  removed stale "/health requires no auth" (now 404 on tunnel),
  added SSE cookie auth, expanded Security Model with tunnel
  allowlist, SSRF guards, /welcome path traversal defense, and
  the Windows v20 ABE tracking note.
- BROWSER.md: added dual-listener paragraph to Authentication and
  linked to ARCHITECTURE.md endpoint table. Replaced the stale
  ?token= SSE auth note with the HttpOnly gstack_sse cookie flow.
- CLAUDE.md: added Transport-layer security section above the
  sidebar prompt-injection stack so contributors editing server.ts,
  sse-session-cookie.ts, or tunnel-denial-log.ts see the load-bearing
  module boundaries before touching them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(make-pdf): write --from-file payload to /tmp, not os.tmpdir()

make-pdf's browseClient wrote its --from-file payload to os.tmpdir(),
which is /var/folders/... on macOS. v1.6.0.0's PR #1103 cherry-pick
tightened browse load-html --from-file to validate against the
safe-dirs allowlist ([TEMP_DIR, cwd] where TEMP_DIR is '/tmp' on
macOS/Linux, os.tmpdir() on Windows). This closed a CLI/API parity
gap but broke make-pdf on macOS because /var/folders/... is outside
the allowlist.

Fix: mirror browse's TEMP_DIR convention — use '/tmp' on non-Windows,
os.tmpdir() on Windows. The make-pdf-gate CI failure on macOS-latest
(run 72440797490) is caused by exactly this: the payload file was
rejected by validateReadPath.
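The convention in sketch form, parameterized by platform so the decision is visible on any host (the real code reads process.platform directly):

```typescript
import * as os from 'os';

// '/tmp' on macOS/Linux keeps --from-file payloads inside the safe-dirs
// allowlist; os.tmpdir() on macOS is /var/folders/..., which is outside it.
function tempDirFor(platform: string): string {
  return platform === 'win32' ? os.tmpdir() : '/tmp';
}
```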

Verified locally: the combined-gate e2e test now passes after
rebuilding make-pdf/dist/pdf.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sidebar): killAgent resets per-tab state; align tests with current agent event format

Two pre-existing bugs surfaced while running the full e2e suite on the
sec-wave branch.  Both pre-date v1.6.0.0 (same failures on main at
e23ff280) but blocked the ship verification, so fixing now.

### Bug 1: killAgent leaked stale per-tab state

`killAgent()` reset the legacy globals (agentProcess, agentStatus,
etc.) but never touched the per-tab `tabAgents` Map.  Meanwhile
`/sidebar-command` routes on `tabState.status` from that Map, not the
legacy globals.  Consequence: after a kill (including the implicit
kill in `/sidebar-session/new`), the next /sidebar-command on the
same tab saw `tabState.status === 'processing'` and fell into the
queue branch, silently NOT spawning an agent.  Integration tests that
called resetState between cases all failed with empty queues.

Fix: when targetTabId is supplied, reset that one tab's state; when
called without a tab (session-new, full kill), reset ALL tab states.
Matches the semantic boundary already used for the cancel-file write.

### Bug 2: sidebar-integration tests drifted from current event format

`agent events appear in /sidebar-chat` posted the raw Claude streaming
format (`{type: 'assistant', message: {content: [...]}}`) but
`processAgentEvent` in server.ts only handles the simplified types
that sidebar-agent.ts pre-processes into (text, text_delta, tool_use,
result, agent_error, security_event).  The architecture moved
pre-processing into sidebar-agent.ts at some point and this test
never got updated.  Fixed by sending the pre-processed `{type:
'text', text: '...'}` format — which is actually what the server sees
in production.

Also removed the `entry.prompt` URL-containment check in the
queue-write test.  The URL is carried on entry.pageUrl (metadata) by
design: the system prompt tells Claude to run `browse url` to fetch
the actual page rather than trust any URL in the prompt body.  That's
the URL-based prompt-injection defense.  The prompt SHOULD NOT
contain the URL, so the test assertion was wrong for the current
security posture.

### Verification

- `bun test browse/test/sidebar-integration.test.ts` → 13/13 pass
  (was 6/13 on both main and branch before this commit)
- Full `bun run test` → exit 0, zero fail markers
- No behavior change for production sidebar flows: killAgent was
  already supposed to return the agent to idle; it just wasn't fully
  doing so.  Per-tab reset now matches the documented semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: gus <gustavoraularagon@gmail.com>
Co-authored-by: Mohammed Qazi <10266060+theqazi@users.noreply.github.com>
2026-04-21 21:58:27 -07:00


/**
 * Meta commands — tabs, server control, screenshots, chain, diff, snapshot
 */
import type { BrowserManager } from './browser-manager';
import { handleSnapshot } from './snapshot';
import { getCleanText } from './read-commands';
import { READ_COMMANDS, WRITE_COMMANDS, META_COMMANDS, PAGE_CONTENT_COMMANDS, wrapUntrustedContent, canonicalizeCommand } from './commands';
import { validateNavigationUrl } from './url-validation';
import { checkScope, type TokenInfo } from './token-registry';
import { validateOutputPath, validateReadPath, SAFE_DIRECTORIES, escapeRegExp } from './path-security';
// Re-export for backward compatibility (tests import from meta-commands)
export { validateOutputPath, escapeRegExp } from './path-security';
import * as Diff from 'diff';
import * as fs from 'fs';
import * as path from 'path';
import { TEMP_DIR } from './platform';
import { resolveConfig } from './config';
import type { Frame } from 'playwright';
/** Tokenize a pipe segment respecting double-quoted strings. */
function tokenizePipeSegment(segment: string): string[] {
  const tokens: string[] = [];
  let current = '';
  let inQuote = false;
  for (let i = 0; i < segment.length; i++) {
    const ch = segment[i];
    if (ch === '"') {
      inQuote = !inQuote;
    } else if (ch === ' ' && !inQuote) {
      if (current) { tokens.push(current); current = ''; }
    } else {
      current += ch;
    }
  }
  if (current) tokens.push(current);
  return tokens;
}
// ─── PDF flag parsing (make-pdf contract) ─────────────────────────────
//
// The $B pdf command grew from a 2-line wrapper (format: 'A4') into a real
// PDF engine frontend. make-pdf/dist/pdf shells out to `browse pdf` with
// this flag set, so the contract here has to be stable.
//
// Mutex rules enforced:
// --format vs --width/--height
// --margins vs any --margin-*
// --page-numbers vs --footer-template (page-numbers writes the footer itself)
//
// Units for dimensions: "1in" | "72pt" | "25mm" | "2.54cm". Bare numbers
// are interpreted as pixels (Playwright's default), which is almost never
// what callers want — we warn but don't reject.
//
// Large payloads: header/footer HTML and custom CSS can exceed Windows'
// 8191-char CreateProcess cap via argv. Callers pass `--from-file <path>`
// to a JSON file holding the full options. make-pdf always uses this path.
interface ParsedPdfArgs {
  output: string;
  format?: string;
  width?: string;
  height?: string;
  marginTop?: string;
  marginRight?: string;
  marginBottom?: string;
  marginLeft?: string;
  headerTemplate?: string;
  footerTemplate?: string;
  pageNumbers?: boolean;
  tagged?: boolean;
  outline?: boolean;
  printBackground?: boolean;
  preferCSSPageSize?: boolean;
  toc?: boolean;
}
function parsePdfArgs(args: string[]): ParsedPdfArgs {
  // --from-file short-circuits argv parsing entirely
  for (let i = 0; i < args.length; i++) {
    if (args[i] === '--from-file') {
      const payloadPath = args[++i];
      if (!payloadPath) throw new Error('pdf: --from-file requires a path');
      return parsePdfFromFile(payloadPath);
    }
  }
  const result: ParsedPdfArgs = {
    output: `${TEMP_DIR}/browse-page.pdf`,
  };
  let margins: string | undefined;
  const positional: string[] = [];
  for (let i = 0; i < args.length; i++) {
    const a = args[i];
    if (a === '--format') { result.format = requireValue(args, ++i, 'format'); }
    else if (a === '--page-size') { result.format = requireValue(args, ++i, 'page-size'); }
    else if (a === '--width') { result.width = requireValue(args, ++i, 'width'); }
    else if (a === '--height') { result.height = requireValue(args, ++i, 'height'); }
    else if (a === '--margins') { margins = requireValue(args, ++i, 'margins'); }
    else if (a === '--margin-top') { result.marginTop = requireValue(args, ++i, 'margin-top'); }
    else if (a === '--margin-right') { result.marginRight = requireValue(args, ++i, 'margin-right'); }
    else if (a === '--margin-bottom') { result.marginBottom = requireValue(args, ++i, 'margin-bottom'); }
    else if (a === '--margin-left') { result.marginLeft = requireValue(args, ++i, 'margin-left'); }
    else if (a === '--header-template') { result.headerTemplate = requireValue(args, ++i, 'header-template'); }
    else if (a === '--footer-template') { result.footerTemplate = requireValue(args, ++i, 'footer-template'); }
    else if (a === '--page-numbers') { result.pageNumbers = true; }
    else if (a === '--tagged') { result.tagged = true; }
    else if (a === '--outline') { result.outline = true; }
    else if (a === '--print-background') { result.printBackground = true; }
    else if (a === '--prefer-css-page-size') { result.preferCSSPageSize = true; }
    else if (a === '--toc') { result.toc = true; }
    else if (a.startsWith('--')) { throw new Error(`Unknown pdf flag: ${a}`); }
    else { positional.push(a); }
  }
  if (positional.length > 0) result.output = positional[0];
  if (margins !== undefined) {
    if (result.marginTop || result.marginRight || result.marginBottom || result.marginLeft) {
      throw new Error('pdf: --margins is mutex with --margin-top/--margin-right/--margin-bottom/--margin-left');
    }
    result.marginTop = result.marginRight = result.marginBottom = result.marginLeft = margins;
}
if (result.format && (result.width || result.height)) {
throw new Error('pdf: --format is mutex with --width/--height');
}
if (result.pageNumbers && result.footerTemplate) {
throw new Error('pdf: --page-numbers is mutex with --footer-template (page-numbers writes the footer itself)');
}
return result;
}
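The unit convention noted in the contract comment (suffixed dimensions are accepted, bare numbers are pixels and draw a warning rather than a rejection) could be sketched as a standalone check. This helper is hypothetical, not part of the parser above:

```typescript
// Hypothetical sketch of the dimension-unit rule: "1in" | "72pt" | "25mm" |
// "2.54cm" pass cleanly; a bare number is legal (pixels, Playwright's
// default) but almost never intended, so it warns instead of rejecting.
const DIMENSION_RE = /^\d+(\.\d+)?(in|pt|mm|cm|px)?$/;

function checkDimension(value: string): 'ok' | 'warn-pixels' | 'invalid' {
  const m = value.match(DIMENSION_RE);
  if (!m) return 'invalid';
  return m[2] ? 'ok' : 'warn-pixels';
}

console.log(checkDimension('1in'));  // 'ok'
console.log(checkDimension('100'));  // 'warn-pixels'
console.log(checkDimension('abc'));  // 'invalid'
```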
function parsePdfFromFile(payloadPath: string): ParsedPdfArgs {
// Parity with load-html --from-file (browse/src/write-commands.ts) and
// the direct load-html <file> path: every caller-supplied file path
// must pass validateReadPath so the safe-dirs policy can't be skirted
// by routing reads through the --from-file shortcut.
const resolvedPath = path.resolve(payloadPath);
try {
validateReadPath(resolvedPath);
} catch {
throw new Error(
`pdf: --from-file ${payloadPath} must be under ${SAFE_DIRECTORIES.join(' or ')} (security policy). Copy the payload into the project tree or /tmp first.`
);
}
// Read the same resolved path that was validated, so validation and read
// can't diverge on relative paths.
const raw = fs.readFileSync(resolvedPath, 'utf8');
const json = JSON.parse(raw);
const out: ParsedPdfArgs = {
output: json.output || `${TEMP_DIR}/browse-page.pdf`,
format: json.format,
width: json.width,
height: json.height,
marginTop: json.marginTop,
marginRight: json.marginRight,
marginBottom: json.marginBottom,
marginLeft: json.marginLeft,
headerTemplate: json.headerTemplate,
footerTemplate: json.footerTemplate,
pageNumbers: json.pageNumbers === true,
tagged: json.tagged === true,
outline: json.outline === true,
printBackground: json.printBackground === true,
preferCSSPageSize: json.preferCSSPageSize === true,
toc: json.toc === true,
};
return out;
}
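For illustration, a --from-file payload might look like the following (hypothetical values; field names mirror ParsedPdfArgs, and keys not read by parsePdfFromFile are silently ignored):

```typescript
// Hypothetical example of a --from-file JSON payload as make-pdf might
// write it. Mutex rules still apply after parsing: format excludes
// width/height, and pageNumbers excludes footerTemplate.
const examplePayload = {
  output: '/tmp/report.pdf',
  format: 'A4',
  marginTop: '1in',
  marginBottom: '1in',
  pageNumbers: true,
  printBackground: true,
};

console.log(JSON.stringify(examplePayload, null, 2));
```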
function requireValue(args: string[], i: number, flag: string): string {
const v = args[i];
if (v === undefined || v.startsWith('--')) {
throw new Error(`pdf: --${flag} requires a value`);
}
return v;
}
function buildPdfOptions(parsed: ParsedPdfArgs): Record<string, unknown> {
const opts: Record<string, unknown> = {};
// Page size
if (parsed.format) {
opts.format = parsed.format.charAt(0).toUpperCase() + parsed.format.slice(1).toLowerCase();
} else if (parsed.width || parsed.height) {
// Playwright accepts width and height independently — don't silently drop
// a lone --width or --height and fall back to Letter.
if (parsed.width) opts.width = parsed.width;
if (parsed.height) opts.height = parsed.height;
} else {
opts.format = 'Letter';
}
// Margins
const margin: Record<string, string> = {};
if (parsed.marginTop) margin.top = parsed.marginTop;
if (parsed.marginRight) margin.right = parsed.marginRight;
if (parsed.marginBottom) margin.bottom = parsed.marginBottom;
if (parsed.marginLeft) margin.left = parsed.marginLeft;
if (Object.keys(margin).length > 0) opts.margin = margin;
// Header/footer
const displayHeaderFooter =
!!parsed.headerTemplate || !!parsed.footerTemplate || parsed.pageNumbers === true;
if (displayHeaderFooter) {
opts.displayHeaderFooter = true;
// Provide minimal empty templates when only one side is set; otherwise
// Chromium emits its default ugly URL/date in the other slot.
if (parsed.headerTemplate !== undefined) opts.headerTemplate = parsed.headerTemplate;
else if (parsed.pageNumbers || parsed.footerTemplate) opts.headerTemplate = '<div></div>';
if (parsed.pageNumbers) {
opts.footerTemplate = [
'<div style="font-size:9pt; font-family:Helvetica,Arial,sans-serif; color:#666; ',
'width:100%; text-align:center;">',
'<span class="pageNumber"></span> of <span class="totalPages"></span>',
'</div>',
].join('');
} else if (parsed.footerTemplate !== undefined) {
opts.footerTemplate = parsed.footerTemplate;
} else {
opts.footerTemplate = '<div></div>';
}
}
if (parsed.tagged === true) opts.tagged = true;
if (parsed.outline === true) opts.outline = true;
if (parsed.printBackground === true) opts.printBackground = true;
if (parsed.preferCSSPageSize === true) opts.preferCSSPageSize = true;
return opts;
}
/** Options passed from handleCommandInternal for chain routing */
export interface MetaCommandOpts {
chainDepth?: number;
/** Callback to route subcommands through the full security pipeline (handleCommandInternal) */
executeCommand?: (body: { command: string; args?: string[]; tabId?: number }, tokenInfo?: TokenInfo | null) => Promise<{ status: number; result: string; json?: boolean }>;
}
export async function handleMetaCommand(
command: string,
args: string[],
bm: BrowserManager,
shutdown: () => Promise<void> | void,
tokenInfo?: TokenInfo | null,
opts?: MetaCommandOpts,
): Promise<string> {
// Per-tab operations use the active session; global operations use bm directly
const session = bm.getActiveSession();
switch (command) {
// ─── Tabs ──────────────────────────────────────────
case 'tabs': {
const tabs = await bm.getTabListWithTitles();
return tabs.map(t =>
`${t.active ? '→ ' : '  '}[${t.id}] ${t.title || '(untitled)'} - ${t.url}`
).join('\n');
}
case 'tab': {
const id = parseInt(args[0], 10);
if (isNaN(id)) throw new Error('Usage: browse tab <id>');
bm.switchTab(id);
return `Switched to tab ${id}`;
}
case 'newtab': {
// --json returns structured output (machine-parseable). Other flag-like
// tokens are treated as the url. make-pdf always passes --json.
let url: string | undefined;
let jsonMode = false;
for (const a of args) {
if (a === '--json') { jsonMode = true; }
else if (!url) { url = a; }
}
const id = await bm.newTab(url);
if (jsonMode) {
return JSON.stringify({ tabId: id, url: url ?? null });
}
return `Opened tab ${id}${url ? ` (${url})` : ''}`;
}
case 'closetab': {
const id = args[0] ? parseInt(args[0], 10) : undefined;
await bm.closeTab(id);
return `Closed tab${id ? ` ${id}` : ''}`;
}
// ─── Server Control ────────────────────────────────
case 'status': {
const page = bm.getPage();
const tabs = bm.getTabCount();
const mode = bm.getConnectionMode();
return [
`Status: healthy`,
`Mode: ${mode}`,
`URL: ${page.url()}`,
`Tabs: ${tabs}`,
`PID: ${process.pid}`,
].join('\n');
}
case 'url': {
return bm.getCurrentUrl();
}
case 'stop': {
await shutdown();
return 'Server stopped';
}
case 'restart': {
// Signal that we want a restart — the CLI will detect exit and restart
console.log('[browse] Restart requested. Exiting for CLI to restart.');
await shutdown();
return 'Restarting...';
}
// ─── Visual ────────────────────────────────────────
case 'screenshot': {
// Parse priority: flags (--viewport, --clip, --base64) → selector (@ref, CSS) → output path
const page = bm.getPage();
let outputPath = `${TEMP_DIR}/browse-screenshot.png`;
let clipRect: { x: number; y: number; width: number; height: number } | undefined;
let targetSelector: string | undefined;
let viewportOnly = false;
let base64Mode = false;
const remaining: string[] = [];
let flagSelector: string | undefined;
for (let i = 0; i < args.length; i++) {
if (args[i] === '--viewport') {
viewportOnly = true;
} else if (args[i] === '--base64') {
base64Mode = true;
} else if (args[i] === '--selector') {
flagSelector = args[++i];
if (!flagSelector) throw new Error('Usage: screenshot --selector <css> [path]');
} else if (args[i] === '--clip') {
const coords = args[++i];
if (!coords) throw new Error('Usage: screenshot --clip x,y,w,h [path]');
const parts = coords.split(',').map(Number);
if (parts.length !== 4 || parts.some(isNaN))
throw new Error('Usage: screenshot --clip x,y,width,height — all must be numbers');
clipRect = { x: parts[0], y: parts[1], width: parts[2], height: parts[3] };
} else if (args[i].startsWith('--')) {
throw new Error(`Unknown screenshot flag: ${args[i]}`);
} else {
remaining.push(args[i]);
}
}
// Separate target (selector/@ref) from output path
for (const arg of remaining) {
// File paths containing / and ending with an image/pdf extension are never CSS selectors
const isFilePath = arg.includes('/') && /\.(png|jpe?g|webp|pdf)$/i.test(arg);
if (isFilePath) {
outputPath = arg;
} else if (arg.startsWith('@e') || arg.startsWith('@c') || arg.startsWith('.') || arg.startsWith('#') || arg.includes('[')) {
targetSelector = arg;
} else {
outputPath = arg;
}
}
// --selector flag takes precedence; conflict with positional selector.
if (flagSelector !== undefined) {
if (targetSelector !== undefined) {
throw new Error('--selector conflicts with positional selector — choose one');
}
targetSelector = flagSelector;
}
validateOutputPath(outputPath);
if (clipRect && targetSelector) {
throw new Error('Cannot use --clip with a selector/ref — choose one');
}
if (viewportOnly && clipRect) {
throw new Error('Cannot use --viewport with --clip — choose one');
}
// --base64 mode: capture to buffer instead of disk
if (base64Mode) {
let buffer: Buffer;
if (targetSelector) {
const resolved = await bm.resolveRef(targetSelector);
const locator = 'locator' in resolved ? resolved.locator : page.locator(resolved.selector);
buffer = await locator.screenshot({ timeout: 5000 });
} else if (clipRect) {
buffer = await page.screenshot({ clip: clipRect });
} else {
buffer = await page.screenshot({ fullPage: !viewportOnly });
}
if (buffer.length > 10 * 1024 * 1024) {
throw new Error('Screenshot too large for --base64 (>10MB). Use disk path instead.');
}
return `data:image/png;base64,${buffer.toString('base64')}`;
}
if (targetSelector) {
const resolved = await bm.resolveRef(targetSelector);
const locator = 'locator' in resolved ? resolved.locator : page.locator(resolved.selector);
await locator.screenshot({ path: outputPath, timeout: 5000 });
return `Screenshot saved (element): ${outputPath}`;
}
if (clipRect) {
await page.screenshot({ path: outputPath, clip: clipRect });
return `Screenshot saved (clip ${clipRect.x},${clipRect.y},${clipRect.width},${clipRect.height}): ${outputPath}`;
}
await page.screenshot({ path: outputPath, fullPage: !viewportOnly });
return `Screenshot saved${viewportOnly ? ' (viewport)' : ''}: ${outputPath}`;
}
case 'pdf': {
const page = bm.getPage();
const parsed = parsePdfArgs(args);
validateOutputPath(parsed.output);
// If --toc: wait up to 3s for Paged.js to signal by setting
// window.__pagedjsAfterFired = true. If the polyfill isn't injected
// (make-pdf v1 ships without Paged.js; TOC renders without page
// numbers), we fall through silently — callers that require strict
// TOC pagination should pass --require-paged-js too.
if (parsed.toc) {
const deadline = Date.now() + 3000;
let ready = false;
while (Date.now() < deadline) {
try {
ready = Boolean(await page.evaluate('!!window.__pagedjsAfterFired'));
} catch { /* tab may still be hydrating */ }
if (ready) break;
await new Promise(r => setTimeout(r, 150));
}
// Intentionally non-fatal. Paged.js is optional in v1.
}
const opts = buildPdfOptions(parsed);
opts.path = parsed.output;
await page.pdf(opts);
return `PDF saved: ${parsed.output}`;
}
case 'responsive': {
const page = bm.getPage();
const prefix = args[0] || `${TEMP_DIR}/browse-responsive`;
validateOutputPath(prefix);
const viewports = [
{ name: 'mobile', width: 375, height: 812 },
{ name: 'tablet', width: 768, height: 1024 },
{ name: 'desktop', width: 1280, height: 720 },
];
const originalViewport = page.viewportSize();
const results: string[] = [];
for (const vp of viewports) {
await page.setViewportSize({ width: vp.width, height: vp.height });
const screenshotPath = `${prefix}-${vp.name}.png`;
validateOutputPath(screenshotPath);
await page.screenshot({ path: screenshotPath, fullPage: true });
results.push(`${vp.name} (${vp.width}x${vp.height}): ${screenshotPath}`);
}
// Restore original viewport
if (originalViewport) {
await page.setViewportSize(originalViewport);
}
return results.join('\n');
}
// ─── Chain ─────────────────────────────────────────
case 'chain': {
// Read JSON array from args[0] (if provided) or expect it was passed as body
const jsonStr = args[0];
if (!jsonStr) throw new Error(
'Usage: echo \'[["goto","url"],["text"]]\' | browse chain\n' +
' or: browse chain \'goto url | click @e5 | snapshot -ic\''
);
let rawCommands: string[][];
try {
rawCommands = JSON.parse(jsonStr);
if (!Array.isArray(rawCommands)) throw new Error('not array');
} catch (err: any) {
// Fallback: pipe-delimited format "goto url | click @e5 | snapshot -ic"
if (!(err instanceof SyntaxError) && err?.message !== 'not array') throw err;
rawCommands = jsonStr.split(' | ')
.filter(seg => seg.trim().length > 0)
.map(seg => tokenizePipeSegment(seg.trim()));
}
// Canonicalize aliases across the whole chain. Pair canonical name with the raw
// input so result labels + error messages reflect what the user typed, but every
// dispatch path (scope check, WRITE_COMMANDS.has, watch blocking, handler lookup)
// uses the canonical name. Otherwise `chain '[["setcontent","/tmp/x.html"]]'`
// bypasses prevalidation or runs under the wrong command set.
const commands = rawCommands.map(cmd => {
const [rawName, ...cmdArgs] = cmd;
const name = canonicalizeCommand(rawName);
return { rawName, name, args: cmdArgs };
});
// Pre-validate ALL subcommands against the token's scope before executing any.
// Uses canonical name so aliases don't bypass scope checks.
if (tokenInfo && tokenInfo.clientId !== 'root') {
for (const c of commands) {
if (!checkScope(tokenInfo, c.name)) {
throw new Error(
`Chain rejected: subcommand "${c.rawName}" not allowed by your token scope (${tokenInfo.scopes.join(', ')}). ` +
`All subcommands must be within scope.`
);
}
}
}
// Route each subcommand through handleCommandInternal for full security:
// scope, domain, tab ownership, content wrapping — all enforced per subcommand.
// Chain-specific options: skip rate check (chain = 1 request), skip activity
// events (chain emits 1 event), increment chain depth (recursion guard).
const executeCmd = opts?.executeCommand;
const results: string[] = [];
let lastWasWrite = false;
if (executeCmd) {
// Full security pipeline via handleCommandInternal.
// Pass rawName so the server's own canonicalization is a no-op (already canonical).
for (const c of commands) {
const cr = await executeCmd(
{ command: c.name, args: c.args },
tokenInfo,
);
const label = c.rawName === c.name ? c.name : `${c.rawName} → ${c.name}`;
if (cr.status === 200) {
results.push(`[${label}] ${cr.result}`);
} else {
// Parse error from JSON result
let errMsg = cr.result;
try { errMsg = JSON.parse(cr.result).error || cr.result; } catch (err: any) { if (!(err instanceof SyntaxError)) throw err; }
results.push(`[${label}] ERROR: ${errMsg}`);
}
lastWasWrite = WRITE_COMMANDS.has(c.name);
}
} else {
// Fallback: direct dispatch (CLI mode, no server context)
const { handleReadCommand } = await import('./read-commands');
const { handleWriteCommand } = await import('./write-commands');
for (const c of commands) {
const name = c.name;
const cmdArgs = c.args;
const label = c.rawName === name ? name : `${c.rawName} → ${name}`;
try {
let result: string;
if (WRITE_COMMANDS.has(name)) {
if (bm.isWatching()) {
result = 'BLOCKED: write commands disabled in watch mode';
} else {
result = await handleWriteCommand(name, cmdArgs, session, bm);
}
lastWasWrite = true;
} else if (READ_COMMANDS.has(name)) {
result = await handleReadCommand(name, cmdArgs, session);
if (PAGE_CONTENT_COMMANDS.has(name)) {
result = wrapUntrustedContent(result, bm.getCurrentUrl());
}
lastWasWrite = false;
} else if (META_COMMANDS.has(name)) {
result = await handleMetaCommand(name, cmdArgs, bm, shutdown, tokenInfo, opts);
lastWasWrite = false;
} else {
throw new Error(`Unknown command: ${c.rawName}`);
}
results.push(`[${label}] ${result}`);
} catch (err: any) {
results.push(`[${label}] ERROR: ${err.message}`);
}
}
}
// Wait for network to settle after write commands before returning
if (lastWasWrite) {
await bm.getPage().waitForLoadState('networkidle', { timeout: 2000 }).catch(() => {});
}
return results.join('\n\n');
}
// ─── Diff ──────────────────────────────────────────
case 'diff': {
const [url1, url2] = args;
if (!url1 || !url2) throw new Error('Usage: browse diff <url1> <url2>');
const page = bm.getPage();
const normalizedUrl1 = await validateNavigationUrl(url1);
await page.goto(normalizedUrl1, { waitUntil: 'domcontentloaded', timeout: 15000 });
const text1 = await getCleanText(page);
const normalizedUrl2 = await validateNavigationUrl(url2);
await page.goto(normalizedUrl2, { waitUntil: 'domcontentloaded', timeout: 15000 });
const text2 = await getCleanText(page);
const changes = Diff.diffLines(text1, text2);
const output: string[] = [`--- ${url1}`, `+++ ${url2}`, ''];
for (const part of changes) {
const prefix = part.added ? '+' : part.removed ? '-' : ' ';
const lines = part.value.split('\n').filter(l => l.length > 0);
for (const line of lines) {
output.push(`${prefix} ${line}`);
}
}
return wrapUntrustedContent(output.join('\n'), `diff: ${url1} vs ${url2}`);
}
// ─── Snapshot ─────────────────────────────────────
case 'snapshot': {
const isScoped = tokenInfo && tokenInfo.clientId !== 'root';
const snapshotResult = await handleSnapshot(args, session, {
splitForScoped: !!isScoped,
});
// Scoped tokens get split format (refs outside envelope); root gets basic wrapping
if (isScoped) {
return snapshotResult; // already has envelope from split format
}
return wrapUntrustedContent(snapshotResult, bm.getCurrentUrl());
}
// ─── Handoff ────────────────────────────────────
case 'handoff': {
const message = args.join(' ') || 'User takeover requested';
return await bm.handoff(message);
}
case 'resume': {
bm.resume();
// Re-snapshot to capture current page state after human interaction
const isScoped2 = tokenInfo && tokenInfo.clientId !== 'root';
const snapshot = await handleSnapshot(['-i'], session, { splitForScoped: !!isScoped2 });
if (isScoped2) {
return `RESUMED\n${snapshot}`;
}
return `RESUMED\n${wrapUntrustedContent(snapshot, bm.getCurrentUrl())}`;
}
// ─── Headed Mode ──────────────────────────────────────
case 'connect': {
// connect is handled as a pre-server command in cli.ts
// If we get here, server is already running — tell the user
if (bm.getConnectionMode() === 'headed') {
return 'Already in headed mode with extension.';
}
return 'The connect command must be run from the CLI (not sent to a running server). Run: $B connect';
}
case 'disconnect': {
if (bm.getConnectionMode() !== 'headed') {
return 'Not in headed mode — nothing to disconnect.';
}
// Signal that we want a restart in headless mode
console.log('[browse] Disconnecting headed browser. Restarting in headless mode.');
await shutdown();
return 'Disconnected. Server will restart in headless mode on next command.';
}
case 'focus': {
if (bm.getConnectionMode() !== 'headed') {
return 'focus requires headed mode. Run `$B connect` first.';
}
try {
const { execSync } = await import('child_process');
// Try common Chromium-based browser app names to bring to foreground
const appNames = ['Comet', 'Google Chrome', 'Arc', 'Brave Browser', 'Microsoft Edge'];
let activated = false;
for (const appName of appNames) {
try {
execSync(`osascript -e 'tell application "${appName}" to activate'`, { stdio: 'pipe', timeout: 3000 });
activated = true;
break;
} catch (err: any) {
// Try next browser — osascript fails if app not found or AppleScript errors
if (err?.status === undefined && !err?.message?.includes('Command failed')) throw err;
}
}
if (!activated) {
return 'Could not bring browser to foreground. macOS only.';
}
// If a ref was passed, scroll it into view
if (args.length > 0 && args[0].startsWith('@')) {
try {
const resolved = await bm.resolveRef(args[0]);
if ('locator' in resolved) {
await resolved.locator.scrollIntoViewIfNeeded({ timeout: 5000 });
return `Browser activated. Scrolled ${args[0]} into view.`;
}
} catch (err: any) {
// Ref not found or element gone — still activated the browser
if (!err?.message?.includes('not found') && !err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('timeout')) throw err;
}
}
return 'Browser window activated.';
} catch (err: any) {
return `focus failed: ${err.message}. macOS only.`;
}
}
// ─── Watch ──────────────────────────────────────────
case 'watch': {
if (args[0] === 'stop') {
if (!bm.isWatching()) return 'Not currently watching.';
const result = bm.stopWatch();
const durationSec = Math.round(result.duration / 1000);
const lastSnapshot = result.snapshots.length > 0
? wrapUntrustedContent(result.snapshots[result.snapshots.length - 1], bm.getCurrentUrl())
: '(none)';
return [
`WATCH STOPPED (${durationSec}s, ${result.snapshots.length} snapshots)`,
'',
'Last snapshot:',
lastSnapshot,
].join('\n');
}
if (bm.isWatching()) return 'Already watching. Run `$B watch stop` to stop.';
if (bm.getConnectionMode() !== 'headed') {
return 'watch requires headed mode. Run `$B connect` first.';
}
bm.startWatch();
return 'WATCHING — observing user browsing. Periodic snapshots every 5s.\nRun `$B watch stop` to stop and get summary.';
}
// ─── Inbox ──────────────────────────────────────────
case 'inbox': {
const { execSync } = await import('child_process');
let gitRoot: string;
try {
gitRoot = execSync('git rev-parse --show-toplevel', { encoding: 'utf-8', stdio: ['pipe', 'pipe', 'pipe'] }).trim();
} catch (err: any) {
// execSync throws with exit status on non-git directories
if (err?.status === undefined && !err?.message?.includes('Command failed')) throw err;
return 'Not in a git repository — cannot locate inbox.';
}
const inboxDir = path.join(gitRoot, '.context', 'sidebar-inbox');
if (!fs.existsSync(inboxDir)) return 'Inbox empty.';
const files = fs.readdirSync(inboxDir)
.filter(f => f.endsWith('.json') && !f.startsWith('.'))
.sort()
.reverse(); // newest first
if (files.length === 0) return 'Inbox empty.';
const messages: { timestamp: string; url: string; userMessage: string }[] = [];
for (const file of files) {
try {
const data = JSON.parse(fs.readFileSync(path.join(inboxDir, file), 'utf-8'));
messages.push({
timestamp: data.timestamp || '',
url: data.page?.url || 'unknown',
userMessage: data.userMessage || '',
});
} catch (err: any) {
// Skip malformed JSON or unreadable files
if (!(err instanceof SyntaxError) && err?.code !== 'ENOENT' && err?.code !== 'EACCES') throw err;
}
}
if (messages.length === 0) return 'Inbox empty.';
const lines: string[] = [];
lines.push(`SIDEBAR INBOX (${messages.length} message${messages.length === 1 ? '' : 's'})`);
lines.push('────────────────────────────────');
for (const msg of messages) {
const ts = msg.timestamp ? `[${msg.timestamp}]` : '[unknown]';
lines.push(`${ts} ${wrapUntrustedContent(msg.url, 'inbox-url')}`);
lines.push(` "${wrapUntrustedContent(msg.userMessage, 'inbox-message')}"`);
lines.push('');
}
lines.push('────────────────────────────────');
// Handle --clear flag
if (args.includes('--clear')) {
for (const file of files) {
try { fs.unlinkSync(path.join(inboxDir, file)); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; }
}
lines.push(`Cleared ${files.length} message${files.length === 1 ? '' : 's'}.`);
}
return lines.join('\n');
}
// ─── State ────────────────────────────────────────
case 'state': {
const [action, name] = args;
if (!action || !name) throw new Error('Usage: state save|load <name>');
// Sanitize name: alphanumeric + hyphens + underscores only
if (!/^[a-zA-Z0-9_-]+$/.test(name)) {
throw new Error('State name must be alphanumeric (a-z, 0-9, _, -)');
}
const config = resolveConfig();
const stateDir = path.join(config.stateDir, 'browse-states');
fs.mkdirSync(stateDir, { recursive: true });
const statePath = path.join(stateDir, `${name}.json`);
if (action === 'save') {
const state = await bm.saveState();
// V1: cookies + URLs only (not localStorage — breaks on load-before-navigate)
const saveData = {
version: 1,
savedAt: new Date().toISOString(),
cookies: state.cookies,
pages: state.pages.map(p => ({ url: p.url, isActive: p.isActive })),
};
fs.writeFileSync(statePath, JSON.stringify(saveData, null, 2), { mode: 0o600 });
return `State saved: ${statePath} (${state.cookies.length} cookies, ${state.pages.length} pages)\n⚠️ Cookies stored in plaintext. Delete when no longer needed.`;
}
if (action === 'load') {
if (!fs.existsSync(statePath)) throw new Error(`State not found: ${statePath}`);
const data = JSON.parse(fs.readFileSync(statePath, 'utf-8'));
if (!Array.isArray(data.cookies) || !Array.isArray(data.pages)) {
throw new Error('Invalid state file: expected cookies and pages arrays');
}
// Validate and filter cookies — reject malformed or internal-network cookies
const validatedCookies = data.cookies.filter((c: any) => {
if (typeof c !== 'object' || !c) return false;
if (typeof c.name !== 'string' || typeof c.value !== 'string') return false;
if (typeof c.domain !== 'string' || !c.domain) return false;
const d = c.domain.startsWith('.') ? c.domain.slice(1) : c.domain;
if (d === 'localhost' || d.endsWith('.internal') || d === '169.254.169.254') return false;
return true;
});
if (validatedCookies.length < data.cookies.length) {
console.warn(`[browse] Filtered ${data.cookies.length - validatedCookies.length} invalid cookies from state file`);
}
// Warn on state files older than 7 days
if (data.savedAt) {
const ageMs = Date.now() - new Date(data.savedAt).getTime();
const SEVEN_DAYS = 7 * 24 * 60 * 60 * 1000;
if (ageMs > SEVEN_DAYS) {
console.warn(`[browse] Warning: State file is ${Math.round(ageMs / 86400000)} days old. Consider re-saving.`);
}
}
// Close existing pages, then restore (replace, not merge)
bm.setFrame(null);
await bm.closeAllPages();
// Allowlist disk-loaded page fields — NEVER accept loadedHtml, loadedHtmlWaitUntil,
// or owner from disk. Those are in-memory-only invariants; allowing them would let
// a tampered state file smuggle HTML past load-html's safe-dirs + magic-byte + size
// checks, or forge tab ownership for cross-agent authorization bypass.
await bm.restoreState({
cookies: validatedCookies,
pages: data.pages.map((p: any) => ({
url: typeof p.url === 'string' ? p.url : '',
isActive: Boolean(p.isActive),
storage: null,
})),
});
return `State loaded: ${validatedCookies.length} cookies, ${data.pages.length} pages`;
}
throw new Error('Usage: state save|load <name>');
}
// ─── Frame ───────────────────────────────────────
case 'frame': {
const target = args[0];
if (!target) throw new Error('Usage: frame <selector|@ref|--name name|--url pattern|main>');
if (target === 'main') {
bm.setFrame(null);
bm.clearRefs();
return 'Switched to main frame';
}
const page = bm.getPage();
let frame: Frame | null = null;
if (target === '--name') {
if (!args[1]) throw new Error('Usage: frame --name <name>');
frame = page.frame({ name: args[1] });
} else if (target === '--url') {
if (!args[1]) throw new Error('Usage: frame --url <pattern>');
frame = page.frame({ url: new RegExp(escapeRegExp(args[1])) });
} else {
// CSS selector or @ref for the iframe element
const resolved = await bm.resolveRef(target);
const locator = 'locator' in resolved ? resolved.locator : page.locator(resolved.selector);
const elementHandle = await locator.elementHandle({ timeout: 5000 });
frame = await elementHandle?.contentFrame() ?? null;
await elementHandle?.dispose();
}
if (!frame) throw new Error(`Frame not found: ${target}`);
bm.setFrame(frame);
bm.clearRefs();
return `Switched to frame: ${frame.url()}`;
}
// ─── UX Audit ─────────────────────────────────────
case 'ux-audit': {
const page = bm.getPage();
// Extract page structure for UX behavioral analysis.
// The agent interprets the data and applies Krug's six usability tests.
// Text is read via textContent (not innerText) to avoid forcing layout on
// large DOMs; getBoundingClientRect/getComputedStyle run only on capped sets.
const data = await page.evaluate(() => {
const HEADING_CAP = 50;
const INTERACTIVE_CAP = 200;
const TEXT_BLOCK_CAP = 50;
// Site ID: logo or brand element
const logoEl = document.querySelector('[class*="logo"], [id*="logo"], header img, [aria-label*="home"], a[href="/"]');
const siteId = logoEl ? {
found: true,
text: (logoEl.textContent || '').trim().slice(0, 100),
tag: logoEl.tagName,
alt: (logoEl as HTMLImageElement).alt || null,
} : { found: false, text: null, tag: null, alt: null };
// Page name: main heading
const h1 = document.querySelector('h1');
const pageName = h1 ? {
found: true,
text: h1.textContent?.trim().slice(0, 200) || '',
} : { found: false, text: null };
// Navigation: primary nav elements
const navEls = document.querySelectorAll('nav, [role="navigation"]');
const navItems: Array<{ text: string; links: number }> = [];
navEls.forEach((nav, i) => {
if (i >= 5) return;
const links = nav.querySelectorAll('a');
navItems.push({
text: (nav.getAttribute('aria-label') || `nav-${i}`).slice(0, 50),
links: links.length,
});
});
// "You are here" indicator: current/active nav items
// Scoped to nav containers to avoid false positives from animation classes
const activeNavItems = document.querySelectorAll('nav [aria-current], nav .active, nav .current, [role="navigation"] [aria-current], [role="navigation"] .active, [role="navigation"] .current');
const youAreHere = Array.from(activeNavItems).slice(0, 5).map(el => ({
text: (el.textContent || '').trim().slice(0, 50),
tag: el.tagName,
}));
// Search: search box presence
const searchEl = document.querySelector('input[type="search"], [role="search"], input[name*="search"], input[placeholder*="search" i], input[aria-label*="search" i]');
const search = { found: !!searchEl };
// Breadcrumbs
const breadcrumbEl = document.querySelector('[aria-label*="breadcrumb" i], .breadcrumb, .breadcrumbs, [class*="breadcrumb"]');
const breadcrumbs = breadcrumbEl ? {
found: true,
items: Array.from(breadcrumbEl.querySelectorAll('a, span, li')).slice(0, 10).map(el => (el.textContent || '').trim().slice(0, 30)),
} : { found: false, items: [] };
// Headings: heading hierarchy
const headings = Array.from(document.querySelectorAll('h1,h2,h3,h4,h5,h6')).slice(0, HEADING_CAP).map(h => ({
tag: h.tagName,
text: (h.textContent || '').trim().slice(0, 80),
size: getComputedStyle(h).fontSize,
}));
// Interactive elements: buttons, links, inputs
const interactiveEls = Array.from(document.querySelectorAll('a, button, input, select, textarea, [role="button"], [tabindex]')).slice(0, INTERACTIVE_CAP);
const interactive = interactiveEls.map(el => {
const rect = el.getBoundingClientRect();
return {
tag: el.tagName,
text: (el.textContent || (el as HTMLInputElement).placeholder || '').trim().slice(0, 50),
type: (el as HTMLInputElement).type || null,
role: el.getAttribute('role'),
w: Math.round(rect.width),
h: Math.round(rect.height),
visible: rect.width > 0 && rect.height > 0,
};
}).filter(el => el.visible);
// Text blocks: paragraphs and large text areas
const textBlocks = Array.from(document.querySelectorAll('p, [class*="description"], [class*="intro"], [class*="welcome"], [class*="hero"] p, main p')).slice(0, TEXT_BLOCK_CAP).map(el => ({
text: (el.textContent || '').trim().slice(0, 200),
wordCount: (el.textContent || '').trim().split(/\s+/).filter(Boolean).length,
}));
// Total visible text word count (textContent avoids layout computation)
const bodyText = (document.body?.textContent || '').trim();
const totalWords = bodyText.split(/\s+/).filter(Boolean).length;
return {
url: window.location.href,
title: document.title,
siteId,
pageName,
navigation: navItems,
youAreHere,
search,
breadcrumbs,
headings,
interactive,
textBlocks,
totalWords,
};
});
return JSON.stringify(data, null, 2);
}
default:
throw new Error(`Unknown meta command: ${command}`);
}
}
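The two chain input formats accepted by the `chain` case above parse to the same command list. A minimal sketch (hypothetical commands; the simplified splitter below omits the quote handling that tokenizePipeSegment provides):

```typescript
// Sketch of the two equivalent chain input formats: a JSON array of
// [command, ...args] tuples, or a " | "-delimited pipe string.
function splitPipeChain(input: string): string[][] {
  return input.split(' | ')
    .filter(seg => seg.trim().length > 0)
    .map(seg => seg.trim().split(' ')); // simplified: no quote handling
}

const fromJson: string[][] =
  JSON.parse('[["goto","https://example.com"],["snapshot","-ic"]]');
const fromPipe = splitPipeChain('goto https://example.com | snapshot -ic');

console.log(JSON.stringify(fromJson) === JSON.stringify(fromPipe)); // true
```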