gstack/docs/REMOTE_BROWSER_ACCESS.md
Garry Tan 3bff673671 docs: update project documentation for v1.6.0.0
Reflect the dual-listener tunnel architecture, SSE session cookies,
SSRF guards, and Windows v20 ABE non-goal across the three docs
users actually read for remote-agent and browser auth context:

- docs/REMOTE_BROWSER_ACCESS.md: rewrote Architecture diagram for
  dual listeners, fixed /connect rate limit (3/min → 300/min),
  removed stale "/health requires no auth" (now 404 on tunnel),
  added SSE cookie auth, expanded Security Model with tunnel
  allowlist, SSRF guards, /welcome path traversal defense, and
  the Windows v20 ABE tracking note.
- BROWSER.md: added dual-listener paragraph to Authentication and
  linked to ARCHITECTURE.md endpoint table. Replaced the stale
  ?token= SSE auth note with the HttpOnly gstack_sse cookie flow.
- CLAUDE.md: added Transport-layer security section above the
  sidebar prompt-injection stack so contributors editing server.ts,
  sse-session-cookie.ts, or tunnel-denial-log.ts see the load-bearing
  module boundaries before touching them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 21:11:55 -07:00


Remote Browser Access — How to Pair With a GStack Browser

A GStack Browser server can be shared with any AI agent that can make HTTP requests. The agent gets scoped access to a real Chromium browser: navigate pages, read content, click elements, fill forms, take screenshots. Each agent gets its own tab.

This document is the reference for remote agents. The quick-start instructions are generated by $B pair-agent with the actual credentials baked in.

Architecture

Your Machine                          Remote Agent
─────────────                         ────────────
GStack Browser Server                 Any AI agent
  ├── Chromium (Playwright)           (OpenClaw, Hermes, Codex, etc.)
  ├── Local listener  127.0.0.1:LOCAL         │
  │    (bootstrap, CLI, sidebar, cookies)      │
  ├── Tunnel listener 127.0.0.1:TUNNEL ◄───────┤
  │    (pair-agent only: /connect, /command,   │
  │     /sidebar-chat — locked allowlist)      │
  ├── ngrok tunnel (forwards tunnel port only) │
  │     https://xxx.ngrok.dev ─────────────────┘
  └── Token Registry
        ├── Root token (local listener only)
        ├── Setup keys (5 min, one-time)
        ├── Session tokens (24h, scoped)
        └── SSE session cookies (30 min, stream-scope)
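
The Token Registry lifetimes in the diagram can be modeled with a small in-memory sketch. This is not the GStack implementation: the class name, the injectable clock, the hex encoding, and the session-token length are assumptions made for illustration. Only the `gsk_setup_` and `gsk_sess_` prefixes, the 5-minute one-time setup key, and the 24-hour session default come from this document.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

SETUP_KEY_TTL = 5 * 60        # setup keys: 5 minutes, one-time
SESSION_TTL = 24 * 60 * 60    # session tokens: 24 hours by default


@dataclass
class TokenRegistry:
    """Toy registry for illustration; names and storage are hypothetical."""
    now: Callable[[], float] = time.time
    _setup: Dict[str, float] = field(default_factory=dict)     # key -> expiry
    _sessions: Dict[str, float] = field(default_factory=dict)  # token -> expiry

    def issue_setup_key(self) -> str:
        # 24 random bytes; the hex encoding is an assumption.
        key = "gsk_setup_" + secrets.token_hex(24)
        self._setup[key] = self.now() + SETUP_KEY_TTL
        return key

    def exchange(self, setup_key: str) -> Optional[str]:
        """Trade a setup key for a session token; None if unknown, used, or expired."""
        expiry = self._setup.pop(setup_key, None)  # pop enforces one-time use
        if expiry is None or self.now() >= expiry:
            return None
        token = "gsk_sess_" + secrets.token_hex(24)  # length is an assumption
        self._sessions[token] = self.now() + SESSION_TTL
        return token

    def is_valid(self, token: str) -> bool:
        expiry = self._sessions.get(token)
        return expiry is not None and self.now() < expiry
```

Removing the setup key before checking its expiry means a replayed key fails even if the first exchange was rejected, which is the stricter reading of "one-time".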

Dual-listener architecture (v1.6.0.0)

The daemon binds two HTTP sockets. The local listener serves the full command surface to 127.0.0.1 only and is never forwarded. The tunnel listener is bound lazily on /tunnel/start (and torn down on /tunnel/stop) with a locked path allowlist. ngrok forwards only the tunnel port.

A caller who stumbles onto your ngrok URL cannot reach /health, /cookie-picker, /inspector/*, or /welcome — those paths don't exist on that TCP socket. Root tokens sent over the tunnel get 403. The tunnel listener accepts only /connect, /command (with a scoped token + the 17-command browser-driving allowlist), and /sidebar-chat.

See ARCHITECTURE.md for the full endpoint table.
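
The tunnel-listener rules above can be read as a short decision function. The sketch below is illustrative only: `token_kind` is a hypothetical label for how the server classified the credential, the command set contains just the five commands this document names (the real allowlist has 17), and requiring a scoped token on /sidebar-chat is an assumption. The reason strings are the ones listed under Security Model.

```python
from typing import Optional

TUNNEL_PATHS = {"/connect", "/command", "/sidebar-chat"}

# Partial: only commands named in this document, not the full 17-entry allowlist.
ALLOWED_COMMANDS_SAMPLE = {"goto", "click", "fill", "snapshot", "text"}


def tunnel_denial_reason(path: str, token_kind: Optional[str],
                         command: Optional[str] = None) -> Optional[str]:
    """Return a denial reason, or None if the request passes this sketch.

    token_kind is "root", "scoped", or None when no token was sent.
    """
    if path not in TUNNEL_PATHS:
        return "path_not_on_tunnel"
    if token_kind == "root":
        return "root_token_on_tunnel"
    if path == "/connect":
        return None  # unauthenticated by design; rate limiting is not modeled
    if token_kind != "scoped":
        return "missing_scoped_token"
    if path == "/command" and command not in ALLOWED_COMMANDS_SAMPLE:
        return f"disallowed_command:{command}"
    return None
```

For example, `tunnel_denial_reason("/health", None)` yields `path_not_on_tunnel`, matching the statement that /health does not exist on the tunnel socket.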

Connection Flow

  1. User runs $B pair-agent (or /pair-agent in Claude Code)
  2. Server creates a one-time setup key (expires in 5 minutes)
  3. User copies the instruction block into the other agent's chat
  4. Remote agent runs POST /connect with the setup key
  5. Server returns a scoped session token (24h default)
  6. Remote agent creates its own tab via POST /command with newtab
  7. Remote agent browses using POST /command with its session token + tabId
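
Steps 4 through 7 can be sketched from the agent's side as follows. The HTTP transport is injected as a `post` callable so the flow stays independent of any particular client library. `pair_and_open`, `post`, and `parse_tab_id` are hypothetical names, and because this document does not specify the text that `newtab` returns, extracting the tab id is left to a caller-supplied function.

```python
import json
from typing import Callable, Optional

# post(path, json_body, bearer_token_or_None) -> response body as text
Post = Callable[[str, dict, Optional[str]], str]


def pair_and_open(post: Post, setup_key: str, url: str,
                  parse_tab_id: Callable[[str], int]) -> dict:
    """Exchange the setup key, create a tab, then navigate in that tab."""
    # Steps 4-5: /connect takes no Authorization header and returns JSON.
    session = json.loads(post("/connect", {"setup_key": setup_key}, None))
    token = session["token"]

    # Step 6: create a tab owned by this agent.
    newtab_out = post("/command", {"command": "newtab", "args": []}, token)
    tab_id = parse_tab_id(newtab_out)  # output format is not specified here

    # Step 7: later commands carry the session token and the tabId.
    result = post("/command",
                  {"command": "goto", "args": [url], "tabId": tab_id}, token)
    return {"token": token, "tabId": tab_id, "result": result}
```

Keeping the token and tabId together in the returned dict makes it harder to send a command to a tab the agent does not own.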

API Reference

Authentication

All command endpoints require a Bearer token:

Authorization: Bearer gsk_sess_...

/connect is unauthenticated (rate-limited) — it's how a remote agent exchanges a setup key for a scoped session token. /health is unauthenticated on the local listener (bootstrap) but does NOT exist on the tunnel listener (404).

SSE endpoints (/activity/stream, /inspector/events) accept either a Bearer token or the HttpOnly gstack_sse cookie (minted via POST /sse-session, 30-minute TTL, stream-scope only — cannot be used against /command). As of v1.6.0.0 the ?token=<ROOT> query-string auth is no longer accepted.

Endpoints

POST /connect

Exchange a setup key for a session token. No auth required. Rate-limited to 300 requests/minute. The limit is flood defense rather than brute-force protection: setup keys are 24 random bytes, so guessing one is infeasible.

Request:  {"setup_key": "gsk_setup_..."}
Response: {"token": "gsk_sess_...", "expires": "ISO8601", "scopes": ["read","write"], "agent": "agent-name"}
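
A client will usually want to keep the response fields together and know when the token lapses. The sketch below parses the response shape shown above. `Session` and `parse_connect_response` are hypothetical names, and treating a timestamp without an offset as UTC is an assumption.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class Session:
    token: str
    expires: datetime
    scopes: List[str]
    agent: str

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expires


def parse_connect_response(body: str) -> Session:
    data = json.loads(body)
    raw = data["expires"]
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    expires = datetime.fromisoformat(raw)
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return Session(data["token"], expires, list(data["scopes"]), data["agent"])
```

Checking `is_expired` before each request lets the agent ask for a fresh pairing instead of discovering the lapse through a 401.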

POST /command

Send a browser command. Requires Bearer auth.

Request:  {"command": "goto", "args": ["https://example.com"], "tabId": 1}
Response: (plain text result of the command)
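
As a concrete illustration, here is a minimal standard-library sketch of a /command call. `build_command_request` and `send` are hypothetical helper names, and the `Content-Type` header is an assumption based on the JSON request body. The base URL is whatever tunnel or localhost address the pairing instructions supplied.

```python
import json
import urllib.request
from typing import List, Optional


def build_command_request(base_url: str, token: str, command: str,
                          args: List[str], tab_id: Optional[int] = None
                          ) -> urllib.request.Request:
    """Build (but do not send) a POST /command request."""
    body = {"command": command, "args": args}
    if tab_id is not None:
        body["tabId"] = tab_id
    return urllib.request.Request(
        base_url.rstrip("/") + "/command",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed; the body is JSON
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the plain-text command result."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so callers should catch it to read the status code and headers.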

GET /health

Server status. Local listener only: no auth required there, and the tunnel listener does not serve this path (404). Returns status, tabs, mode, uptime.

Commands

Navigation

| Command | Args | Description |
|---|---|---|
| goto | ["URL"] | Navigate to a URL |
| back | [] | Go back |
| forward | [] | Go forward |
| reload | [] | Reload page |

Reading Content

| Command | Args | Description |
|---|---|---|
| snapshot | ["-i"] | Interactive snapshot with @ref labels (most useful) |
| text | [] | Full page text |
| html | ["selector?"] | HTML of element or full page |
| links | [] | All links on page |
| screenshot | ["/tmp/s.png"] | Take a screenshot |
| url | [] | Current URL |

Interaction

| Command | Args | Description |
|---|---|---|
| click | ["@e3"] | Click an element (use @ref from snapshot) |
| fill | ["@e5", "text"] | Fill a form field |
| select | ["@e7", "option"] | Select dropdown value |
| type | ["text"] | Type text (keyboard) |
| press | ["Enter"] | Press a key |
| scroll | ["down"] | Scroll the page |

Tabs

| Command | Args | Description |
|---|---|---|
| newtab | ["URL?"] | Create a new tab (required before writing) |
| tabs | [] | List all tabs |
| closetab | ["id?"] | Close a tab |

The Snapshot → @ref Pattern

This is the most powerful browsing pattern. Instead of writing CSS selectors:

  1. Run snapshot -i to get an interactive snapshot with labeled elements
  2. The snapshot returns text like:
    [Page Title]
    @e1 [link] "Home"
    @e2 [button] "Sign In"
    @e3 [input] "Search..."
    
  3. Use the @e refs directly in commands: click @e2, fill @e3 "search query"

Refs are much more reliable than guessing CSS selectors. Always run snapshot -i first, then use the refs it returns.
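
An agent can turn the snapshot text into a lookup table before choosing what to click. The sketch below parses only the line shape shown in the example above (`@eN [role] "label"`). Real snapshots may contain other line forms, which this parser skips. `Ref`, `parse_refs`, and `find_ref` are hypothetical names.

```python
import re
from typing import Dict, NamedTuple, Optional


class Ref(NamedTuple):
    role: str
    label: str


# Matches lines such as: @e2 [button] "Sign In"
_REF_LINE = re.compile(r'^\s*(@e\d+)\s+\[([^\]]+)\]\s+"(.*)"\s*$')


def parse_refs(snapshot: str) -> Dict[str, Ref]:
    """Map each @ref in the snapshot text to its role and label."""
    refs: Dict[str, Ref] = {}
    for line in snapshot.splitlines():
        m = _REF_LINE.match(line)
        if m:
            refs[m.group(1)] = Ref(m.group(2), m.group(3))
    return refs


def find_ref(refs: Dict[str, Ref], role: str, label_contains: str) -> Optional[str]:
    """Return the first ref with this role whose label contains the text."""
    needle = label_contains.lower()
    for ref, info in refs.items():
        if info.role == role and needle in info.label.lower():
            return ref
    return None
```

With the example snapshot, `find_ref(refs, "button", "sign in")` returns `@e2`, which can then be passed as the argument to `click`.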

Scopes

| Scope | What it allows |
|---|---|
| read | snapshot, text, html, links, screenshot, url, tabs, console, etc. |
| write | goto, click, fill, scroll, newtab, closetab, etc. |
| admin | eval, js, cookies, storage, cookie-import, useragent, etc. |
| meta | tab, diff, frame, responsive, watch |

Default tokens get read + write. The admin scope requires the --admin flag when pairing.
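
A client can check a command against its granted scopes before sending it and avoid a predictable 403. The mapping below contains only the commands the scope table names. Most rows end in "etc.", so it is partial, and treating each scope as independent (admin not implying read or write) is an assumption.

```python
from typing import Iterable, Optional

# Partial mapping: only the commands named in the scope table.
SCOPE_COMMANDS = {
    "read": {"snapshot", "text", "html", "links", "screenshot", "url", "tabs", "console"},
    "write": {"goto", "click", "fill", "scroll", "newtab", "closetab"},
    "admin": {"eval", "js", "cookies", "storage", "cookie-import", "useragent"},
    "meta": {"tab", "diff", "frame", "responsive", "watch"},
}

DEFAULT_SCOPES = ("read", "write")


def required_scope(command: str) -> Optional[str]:
    """Return the scope a command belongs to, or None if it is not listed."""
    for scope, commands in SCOPE_COMMANDS.items():
        if command in commands:
            return scope
    return None


def is_in_scope(command: str, scopes: Iterable[str] = DEFAULT_SCOPES) -> bool:
    scope = required_scope(command)
    return scope is not None and scope in set(scopes)
```

Unlisted commands return `False` here, so a caller may prefer to send them anyway and let the server decide.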

Tab Isolation

Each agent owns the tabs it creates. Rules:

  • Read: Any agent can read any tab (snapshot, text, screenshot)
  • Write: Only the tab owner can write (click, fill, goto, etc.)
  • Unowned tabs: Pre-existing tabs are root-only for writes
  • First step: Always newtab before trying to interact
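
The ownership rules above reduce to a three-way check. This sketch follows them literally. `ROOT` is a hypothetical identifier for the root-token holder, and whether root may also write to tabs owned by an agent is not stated here, so the sketch denies it.

```python
from typing import Optional

ROOT = "root"  # hypothetical identifier for the root-token holder


def can_use_tab(client_id: str, tab_owner: Optional[str], access: str) -> bool:
    """access is "read" or "write"; tab_owner is None for pre-existing tabs."""
    if access == "read":
        return True                   # any agent can read any tab
    if tab_owner is None:
        return client_id == ROOT      # unowned tabs: root-only writes
    return client_id == tab_owner     # owned tabs: owner-only writes
```

An agent that has not yet called `newtab` owns nothing, so every write it attempts fails this check, which is why `newtab` comes first.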

Error Codes

| Code | Meaning | What to do |
|---|---|---|
| 401 | Token invalid, expired, or revoked | Ask user to run /pair-agent again |
| 403 | Command not in scope, or tab not yours | Use newtab, or ask for --admin |
| 429 | Rate limit exceeded (>10 req/s) | Wait for Retry-After header |
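
The table translates directly into client-side handling. In this sketch the action labels and the function name are hypothetical, and the header lookup assumes the caller passes a mapping keyed exactly as `Retry-After`. Only the delta-seconds form of the header is parsed. Any other form, or a missing header, falls back to a default wait.

```python
from typing import Mapping, Tuple


def next_action(status: int, headers: Mapping[str, str],
                default_wait: float = 1.0) -> Tuple[str, float]:
    """Map a response status to (action, seconds_to_wait)."""
    if status == 401:
        return ("re_pair", 0.0)          # ask the user to run /pair-agent again
    if status == 403:
        return ("newtab_or_admin", 0.0)  # create a tab, or request --admin
    if status == 429:
        raw = headers.get("Retry-After", "")
        try:
            wait = float(raw)            # delta-seconds form
        except ValueError:
            wait = default_wait          # HTTP-date form or missing header
        return ("retry", max(wait, 0.0))
    return ("ok" if 200 <= status < 300 else "fail", 0.0)
```

A caller would sleep for the returned number of seconds before retrying a 429, and stop retrying on 401 or 403 because repeating the same request cannot succeed.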

Security Model

  • Physical port separation. Local listener and tunnel listener are separate TCP sockets. ngrok only forwards the tunnel port. Tunnel callers cannot reach bootstrap endpoints at all (404, wrong port).
  • Tunnel command allowlist. /command over the tunnel only accepts 17 browser-driving commands (goto, click, fill, snapshot, text, etc.). Server-management commands (tunnel, pair, token, useragent, eval, js) are denied on the tunnel.
  • Root token is tunnel-blocked. A request bearing the root token over the tunnel listener returns 403 with a pairing hint. Only scoped session tokens work over the tunnel.
  • Setup keys expire in 5 minutes and can only be used once.
  • Session tokens expire in 24 hours (configurable).
  • The root token never appears in instruction blocks or connection strings.
  • Admin scope (JS execution, cookie access) is denied by default.
  • Tokens can be revoked instantly: $B tunnel revoke agent-name
  • SSE auth uses a 30-minute HttpOnly SameSite=Strict cookie, stream-scope only (never valid against /command).
  • Path traversal guarded on /welcome. GSTACK_SLUG must match ^[a-z0-9_-]+$; otherwise the server falls back to the built-in template.
  • SSRF guards on goto, download, and scrape paths — validates URL target against a localhost/private-range blocklist.
  • Tunnel surface denial logging. Every rejection on the tunnel listener (path_not_on_tunnel, root_token_on_tunnel, missing_scoped_token, disallowed_command:*) is appended to ~/.gstack/security/attempts.jsonl with timestamp, source IP, path, method. Rate-capped at 60 writes/min.
  • All agent activity is logged with attribution (clientId).
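
Two of the guards above, the /welcome slug check and the SSRF blocklist, are simple enough to sketch. This is an illustration of the idea, not the server's code: the function names are hypothetical, rejecting non-HTTP schemes is an assumption, and the check only inspects the literal host in the URL. A production guard would also need to consider what a hostname resolves to.

```python
import ipaddress
import re
from typing import Optional
from urllib.parse import urlparse

_SLUG = re.compile(r"[a-z0-9_-]+")


def safe_slug(slug: str) -> Optional[str]:
    """Return the slug if it matches ^[a-z0-9_-]+$, else None (use the built-in template)."""
    # fullmatch avoids the trailing-newline quirk of "$" in Python regexes.
    return slug if _SLUG.fullmatch(slug) else None


def is_blocked_target(url: str) -> bool:
    """True if the URL points at localhost or a private/link-local address."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if parts.scheme not in ("http", "https") or not host:
        return True  # assumption: only http(s) targets are allowed
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname; resolution is out of scope for this sketch
    return ip.is_loopback or ip.is_private or ip.is_link_local or ip.is_unspecified
```

For example, `safe_slug("../etc/passwd")` returns `None`, and `is_blocked_target("http://169.254.169.254/latest")` returns `True` because the address is link-local.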

Known non-goal (tracked as #1136): on Windows, the cookie-import-browser path launches Chrome with --remote-debugging-port=<random>. With App-Bound Encryption v20, a same-user local process can connect to that port and exfiltrate decrypted v20 cookies — an elevation path relative to reading the SQLite DB directly. Fix direction is --remote-debugging-pipe instead of TCP.

Same-Machine Shortcut

If both agents are on the same machine, skip the copy-paste:

$B pair-agent --local openclaw    # writes to ~/.openclaw/skills/gstack/browse-remote.json
$B pair-agent --local codex       # writes to ~/.codex/skills/gstack/browse-remote.json
$B pair-agent --local cursor      # writes to ~/.cursor/skills/gstack/browse-remote.json

No tunnel needed. Uses localhost directly.

ngrok Tunnel Setup

For remote agents on different machines:

  1. Sign up at ngrok.com (free tier works)
  2. Copy your auth token from the dashboard
  3. Save it: echo 'NGROK_AUTHTOKEN=your_token' > ~/.gstack/ngrok.env
  4. Optionally claim a stable domain: echo 'NGROK_DOMAIN=your-name.ngrok-free.dev' >> ~/.gstack/ngrok.env
  5. Start with tunnel: BROWSE_TUNNEL=1 $B restart
  6. Run $B pair-agent — it will use the tunnel URL automatically
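
Because step 3 writes the file with `>` and step 4 appends with `>>`, re-running step 3 later silently drops the domain line. A quick pre-flight check of the file contents can catch that. The sketch below assumes plain KEY=VALUE lines as written by the commands above. The function names are hypothetical, and skipping `#` comment lines is an assumption.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank and '#' lines are skipped."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_ngrok_keys(text: str) -> List[str]:
    """Return required keys that are absent or empty (NGROK_DOMAIN is optional)."""
    env = parse_env(text)
    return [key for key in ("NGROK_AUTHTOKEN",) if not env.get(key)]
```

Reading `~/.gstack/ngrok.env` and passing its text to `missing_ngrok_keys` before step 5 gives an early, explicit failure instead of a tunnel that does not start.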