Remote Browser Access — How to Pair With a GStack Browser
A GStack Browser server can be shared with any AI agent that can make HTTP requests. The agent gets scoped access to a real Chromium browser: navigate pages, read content, click elements, fill forms, take screenshots. Each agent gets its own tab.
This document is the reference for remote agents. The quick-start instructions are generated by `$B pair-agent` with the actual credentials baked in.
Architecture
```
Your Machine                                    Remote Agent
─────────────                                   ────────────
GStack Browser Server                           Any AI agent
├── Chromium (Playwright)                       (OpenClaw, Hermes, Codex, etc.)
├── Local listener 127.0.0.1:LOCAL                    │
│     (bootstrap, CLI, sidebar, cookies)              │
├── Tunnel listener 127.0.0.1:TUNNEL ◄────────────────┤
│     (pair-agent only: /connect, /command,           │
│      /sidebar-chat; locked allowlist)               │
├── ngrok tunnel (forwards tunnel port only)          │
│     https://xxx.ngrok.dev ──────────────────────────┘
└── Token Registry
    ├── Root token (local listener only)
    ├── Setup keys (5 min, one-time)
    ├── Session tokens (24h, scoped)
    └── SSE session cookies (30 min, stream-scope)
```
Dual-listener architecture (v1.6.0.0)
The daemon binds two HTTP sockets. The local listener serves the full command surface to 127.0.0.1 only and is never forwarded. The tunnel listener is bound lazily on /tunnel/start (and torn down on /tunnel/stop) with a locked path allowlist. ngrok forwards only the tunnel port.
A caller who stumbles onto your ngrok URL cannot reach /health, /cookie-picker, /inspector/*, or /welcome — those paths don't exist on that TCP socket. Root tokens sent over the tunnel get 403. The tunnel listener accepts only /connect, /command (with a scoped token + the 17-command browser-driving allowlist), and /sidebar-chat.
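The tunnel listener's checks can be pictured as one gate function. This is a minimal TypeScript sketch, not gstack's server code. `gateTunnelRequest`, `TUNNEL_PATHS`, and `TUNNEL_COMMANDS` are hypothetical names, and the six-command set is an illustrative subset of the real 17. The check ordering, the 401 for a missing scoped token, and requiring a scoped token on `/sidebar-chat` are also assumptions. The 404/403 outcomes and the denial reasons follow this document.

```typescript
// Hypothetical tunnel-listener gate. Paths, 404/403 outcomes, and denial reasons
// follow the doc; ordering, the 401, and the command subset are assumptions.
type Denial =
  | "path_not_on_tunnel"
  | "root_token_on_tunnel"
  | "missing_scoped_token"
  | `disallowed_command:${string}`;

type GateResult = { ok: true } | { ok: false; status: number; reason: Denial };

const TUNNEL_PATHS = new Set(["/connect", "/command", "/sidebar-chat"]);

// Illustrative subset only; the real allowlist has 17 browser-driving commands.
const TUNNEL_COMMANDS = new Set(["newtab", "goto", "click", "fill", "snapshot", "text"]);

function gateTunnelRequest(
  path: string,
  bearer: string | undefined,
  rootToken: string,
  command?: string,
): GateResult {
  // Anything off the path allowlist does not exist on this socket.
  if (!TUNNEL_PATHS.has(path)) return { ok: false, status: 404, reason: "path_not_on_tunnel" };
  // The root token is refused over the tunnel, whatever the path.
  if (bearer === rootToken) return { ok: false, status: 403, reason: "root_token_on_tunnel" };
  // /connect is the unauthenticated setup-key exchange.
  if (path === "/connect") return { ok: true };
  // Other paths need a scoped session token; the registry lookup is omitted here.
  if (bearer === undefined || !bearer.startsWith("gsk_sess_")) {
    return { ok: false, status: 401, reason: "missing_scoped_token" };
  }
  // /command additionally checks the browser-driving command allowlist.
  if (path === "/command" && (command === undefined || !TUNNEL_COMMANDS.has(command))) {
    return { ok: false, status: 403, reason: `disallowed_command:${command ?? ""}` };
  }
  return { ok: true };
}
```

Because the path check runs first in this sketch, an off-allowlist path such as `/health` returns 404 even when the request carries the root token.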
See ARCHITECTURE.md for the full endpoint table.
Connection Flow
- User runs `$B pair-agent` (or `/pair-agent` in Claude Code)
- Server creates a one-time setup key (expires in 5 minutes)
- User copies the instruction block into the other agent's chat
- Remote agent runs `POST /connect` with the setup key
- Server returns a scoped session token (24h default)
- Remote agent creates its own tab via `POST /command` with `newtab`
- Remote agent browses using `POST /command` with its session token + `tabId`
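The connection flow above can be sketched as a small client. `pair`, `runCommand`, `parseTabId`, `pairAndBrowse`, and the injected `Post` transport are hypothetical names. Request and response bodies follow the Endpoints section. Two things are assumptions, because this document does not specify the output: treating any non-200 status as failure, and reading the tab id as the first integer in the `newtab` result.

```typescript
// Hypothetical client for the pairing flow. The transport is injected so the same
// flow can run over fetch or against a stub.
interface HttpResponse { status: number; body: string }
type Post = (url: string, body: unknown, headers: Record<string, string>) => Promise<HttpResponse>;

// Response shape of POST /connect, as documented under Endpoints.
interface ConnectResponse { token: string; expires: string; scopes: string[]; agent: string }

// Real transport via the global fetch (Node 18+, Bun, Deno, browsers).
const fetchPost: Post = async (url, body, headers) => {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", ...headers },
    body: JSON.stringify(body),
  });
  return { status: res.status, body: await res.text() };
};

async function pair(base: string, setupKey: string, post: Post): Promise<ConnectResponse> {
  const res = await post(`${base}/connect`, { setup_key: setupKey }, {});
  if (res.status !== 200) throw new Error(`/connect failed with ${res.status}`);
  const parsed = JSON.parse(res.body) as ConnectResponse;
  if (typeof parsed.token !== "string") throw new Error("no token in /connect response");
  return parsed;
}

async function runCommand(
  base: string, token: string, post: Post, command: string, args: string[], tabId?: number,
): Promise<string> {
  // When tabId is undefined, JSON.stringify omits it from the serialized body.
  const res = await post(`${base}/command`, { command, args, tabId }, { Authorization: `Bearer ${token}` });
  if (res.status !== 200) throw new Error(`${command} failed with ${res.status}`);
  return res.body; // plain-text result of the command
}

// Assumption: the newtab result text contains the new tab id as its first integer.
function parseTabId(text: string): number {
  const m = /\d+/.exec(text);
  if (m === null) throw new Error(`no tab id in: ${text}`);
  return Number(m[0]);
}

// Pair, open an owned tab, navigate, and return an interactive snapshot.
async function pairAndBrowse(base: string, setupKey: string, url: string, post: Post = fetchPost): Promise<string> {
  const { token } = await pair(base, setupKey, post);
  const tabId = parseTabId(await runCommand(base, token, post, "newtab", []));
  await runCommand(base, token, post, "goto", [url], tabId);
  return runCommand(base, token, post, "snapshot", ["-i"], tabId);
}
```

Against a real pairing, a call might look like `pairAndBrowse("https://xxx.ngrok.dev", "gsk_setup_...", "https://example.com")`. The host and setup key come from the instruction block that `$B pair-agent` prints.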
API Reference
Authentication
All command endpoints require a Bearer token:
Authorization: Bearer gsk_sess_...
`/connect` is unauthenticated (rate-limited); it's how a remote agent exchanges a setup key for a scoped session token. `/health` is unauthenticated on the local listener (bootstrap) but does NOT exist on the tunnel listener (404).
SSE endpoints (`/activity/stream`, `/inspector/events`) accept either a Bearer token or the HttpOnly `gstack_sse` cookie (minted via `POST /sse-session`, 30-minute TTL, stream-scope only, so it cannot be used against `/command`). As of v1.6.0.0 the `?token=<ROOT>` query-string auth is no longer accepted.
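The SSE credential rules can be sketched as a small selection function, plus the cookie header the mint step might emit. `acceptedCredential`, `sseSetCookie`, and the `Credentials` shape are hypothetical. These details come from this document:

- the 30-minute TTL
- HttpOnly and SameSite=Strict
- the two SSE paths
- the refusal of `?token=`

`Path=/`, the cookie value format, and server-side tracking of the mint time are assumptions. Bearer token validation is not modeled.

```typescript
// Hypothetical credential selection for SSE vs. command endpoints.
const SSE_PATHS = new Set(["/activity/stream", "/inspector/events"]);
const SSE_COOKIE_TTL_MS = 30 * 60 * 1000; // 30 minutes

interface Credentials {
  bearer?: string;            // from "Authorization: Bearer ..."
  sseCookieMintedAt?: number; // epoch ms when the gstack_sse cookie was issued (assumed tracked)
  queryToken?: string;        // legacy ?token= value; present only to show it is ignored
}

function acceptedCredential(path: string, c: Credentials, now: number): "bearer" | "sse_cookie" | null {
  if (c.bearer !== undefined) return "bearer"; // token validity and scope checks not modeled
  const cookieFresh = c.sseCookieMintedAt !== undefined && now - c.sseCookieMintedAt < SSE_COOKIE_TTL_MS;
  // The cookie is stream-scope only: honored on SSE paths, never on /command.
  if (cookieFresh && SSE_PATHS.has(path)) return "sse_cookie";
  return null; // c.queryToken is deliberately never consulted
}

// Set-Cookie value POST /sse-session might return; Path=/ and the value format are guesses.
function sseSetCookie(value: string): string {
  return `gstack_sse=${value}; HttpOnly; SameSite=Strict; Max-Age=${SSE_COOKIE_TTL_MS / 1000}; Path=/`;
}
```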
Endpoints
POST /connect
Exchange a setup key for a session token. No auth required. Rate-limited to 300/minute as a flood defense; setup keys are 24 random bytes, so brute-forcing them is infeasible.
Request: {"setup_key": "gsk_setup_..."}
Response: {"token": "gsk_sess_...", "expires": "ISO8601", "scopes": ["read","write"], "agent": "agent-name"}
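As a rough bound on the brute-force claim for `/connect`, assume two things: the 300/minute limit applies to all callers combined, and each setup key is a uniformly random 24-byte (192-bit) value. An attacker then gets at most 1,500 guesses during a key's 5-minute lifetime:

$$
P_{\text{guess}} \le \frac{300\,\tfrac{\text{attempts}}{\text{min}} \times 5\,\text{min}}{2^{8 \times 24}} = \frac{1500}{2^{192}} \approx \frac{1.5 \times 10^{3}}{6.28 \times 10^{57}} \approx 2.4 \times 10^{-55}
$$

If the limit were instead per source IP, the bound would scale with the number of addresses an attacker controls, and it would still stay negligible.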
POST /command
Send a browser command. Requires Bearer auth.
Request: {"command": "goto", "args": ["https://example.com"], "tabId": 1}
Response: (plain text result of the command)
GET /health
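Server status. Local listener only: no auth required there, but the path does not exist on the tunnel listener (404), so remote agents cannot call it. Returns status, tabs, mode, uptime.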
Commands
Navigation
| Command | Args | Description |
|---|---|---|
| `goto` | `["URL"]` | Navigate to a URL |
| `back` | `[]` | Go back |
| `forward` | `[]` | Go forward |
| `reload` | `[]` | Reload page |
Reading Content
| Command | Args | Description |
|---|---|---|
| `snapshot` | `["-i"]` | Interactive snapshot with @ref labels (most useful) |
| `text` | `[]` | Full page text |
| `html` | `["selector?"]` | HTML of element or full page |
| `links` | `[]` | All links on page |
| `screenshot` | `["/tmp/s.png"]` | Take a screenshot |
| `url` | `[]` | Current URL |
Interaction
| Command | Args | Description |
|---|---|---|
| `click` | `["@e3"]` | Click an element (use @ref from snapshot) |
| `fill` | `["@e5", "text"]` | Fill a form field |
| `select` | `["@e7", "option"]` | Select dropdown value |
| `type` | `["text"]` | Type text (keyboard) |
| `press` | `["Enter"]` | Press a key |
| `scroll` | `["down"]` | Scroll the page |
Tabs
| Command | Args | Description |
|---|---|---|
| `newtab` | `["URL?"]` | Create a new tab (required before writing) |
| `tabs` | `[]` | List all tabs |
| `closetab` | `["id?"]` | Close a tab |
The Snapshot → @ref Pattern
This is the most powerful browsing pattern. Instead of writing CSS selectors:
- Run `snapshot -i` to get an interactive snapshot with labeled elements
- The snapshot returns text like: `[Page Title] @e1 [link] "Home" @e2 [button] "Sign In" @e3 [input] "Search..."`
- Use the `@e` refs directly in commands: `click @e2`, `fill @e3 "search query"`
This is how the snapshot system works, and it's much more reliable than guessing CSS selectors. Always `snapshot -i` first, then use the refs.
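Turning snapshot text into a lookup table is a small parsing step. This is a minimal sketch. It assumes each element appears exactly in the `@eN [role] "label"` shape of the one-line example above; real snapshot output may span lines or carry extra attributes. `parseRefs` and `findRef` are hypothetical helpers, not part of gstack.

```typescript
// Hypothetical helper: extract @ref labels from `snapshot -i` output.
interface SnapshotRef { ref: string; role: string; label: string }

function parseRefs(snapshot: string): SnapshotRef[] {
  const re = /@(e\d+)\s+\[([^\]]+)\]\s+"([^"]*)"/g; // fresh regex per call, so lastIndex starts at 0
  const refs: SnapshotRef[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(snapshot)) !== null) {
    refs.push({ ref: `@${m[1]}`, role: m[2], label: m[3] });
  }
  return refs;
}

// First ref with the given role and visible label, e.g. the "Sign In" button.
function findRef(refs: SnapshotRef[], role: string, label: string): string | undefined {
  return refs.find((r) => r.role === role && r.label === label)?.ref;
}
```

The ref that `findRef` returns goes straight into a command body such as `{"command": "click", "args": ["@e2"], "tabId": 1}`.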
Scopes
| Scope | What it allows |
|---|---|
| `read` | snapshot, text, html, links, screenshot, url, tabs, console, etc. |
| `write` | goto, click, fill, scroll, newtab, closetab, etc. |
| `admin` | eval, js, cookies, storage, cookie-import, useragent, etc. |
| `meta` | tab, diff, frame, responsive, watch |
Default tokens get `read` + `write`. Admin requires the `--admin` flag when pairing.
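The scope table can be expressed as a lookup from command to required scope. This is a sketch only, and `checkScope` and `COMMAND_SCOPE` are hypothetical. It maps only the commands named in the table. The "etc." entries are not guessed, so this sketch denies anything unmapped. Returning 403 on denial follows the Error Codes table.

```typescript
// Hypothetical scope check built from the table above (named commands only).
type Scope = "read" | "write" | "admin" | "meta";

const COMMAND_SCOPE: Record<string, Scope> = {
  snapshot: "read", text: "read", html: "read", links: "read",
  screenshot: "read", url: "read", tabs: "read", console: "read",
  goto: "write", click: "write", fill: "write", scroll: "write",
  newtab: "write", closetab: "write",
  eval: "admin", js: "admin", cookies: "admin", storage: "admin",
  "cookie-import": "admin", useragent: "admin",
  tab: "meta", diff: "meta", frame: "meta", responsive: "meta", watch: "meta",
};

// Default pairing grants read + write; admin needs --admin.
const DEFAULT_SCOPES: Scope[] = ["read", "write"];

// 200 if the granted scopes cover the command, 403 otherwise.
function checkScope(command: string, granted: Scope[] = DEFAULT_SCOPES): 200 | 403 {
  const needed = COMMAND_SCOPE[command];
  if (needed === undefined) return 403; // unknown to this sketch: deny rather than guess
  return granted.includes(needed) ? 200 : 403;
}
```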
Tab Isolation
Each agent owns the tabs it creates. Rules:
- Read: Any agent can read any tab (snapshot, text, screenshot)
- Write: Only the tab owner can write (click, fill, goto, etc.)
- Unowned tabs: Pre-existing tabs are root-only for writes
- First step: Always run `newtab` before trying to interact
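The ownership rules can be reduced to a short predicate. In this sketch, `tabAllows`, `Tab`, and `Caller` are hypothetical, and the owner is recorded as the creating agent's `clientId`. One point is an assumption: letting root write tabs that an agent owns. The rules above only say that pre-existing tabs are root-only for writes.

```typescript
// Hypothetical tab-ownership predicate for the rules above.
interface Tab { id: number; owner: string | null } // owner = creating agent's clientId; null = pre-existing
type Caller = { kind: "root" } | { kind: "agent"; clientId: string };

function tabAllows(caller: Caller, tab: Tab, op: "read" | "write"): boolean {
  if (op === "read") return true;          // any agent can read any tab
  if (caller.kind === "root") return true; // assumption: root may write any tab
  if (tab.owner === null) return false;    // pre-existing tabs are root-only for writes
  return tab.owner === caller.clientId;    // otherwise only the owner may write
}
```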
Error Codes
| Code | Meaning | What to do |
|---|---|---|
| 401 | Token invalid, expired, or revoked | Ask user to run /pair-agent again |
| 403 | Command not in scope, or tab not yours | Use newtab, or ask for --admin |
| 429 | Rate limit exceeded (>10 req/s) | Wait for Retry-After header |
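A client can map these codes to its next move. This sketch's `nextStep` and `retryDelayMs` are hypothetical. It reads `Retry-After` in both forms HTTP allows (delay-seconds or an HTTP-date). The 1-second fallback when the header is missing or unparsable is an assumption.

```typescript
// Hypothetical client reaction to the error codes above.
type NextStep =
  | { action: "repair" }                 // 401: ask the user to run /pair-agent again
  | { action: "newtab_or_admin" }        // 403: open your own tab, or ask for --admin
  | { action: "retry"; afterMs: number } // 429: wait, then retry
  | { action: "fail"; status: number };

function retryDelayMs(retryAfter: string | null, now: number): number {
  if (retryAfter === null || retryAfter.trim() === "") return 1000; // assumed fallback
  const secs = Number(retryAfter);
  if (Number.isFinite(secs) && secs >= 0) return secs * 1000; // delta-seconds form
  const when = Date.parse(retryAfter);                          // HTTP-date form
  return Number.isNaN(when) ? 1000 : Math.max(0, when - now);
}

function nextStep(status: number, retryAfter: string | null, now: number = Date.now()): NextStep {
  switch (status) {
    case 401: return { action: "repair" };
    case 403: return { action: "newtab_or_admin" };
    case 429: return { action: "retry", afterMs: retryDelayMs(retryAfter, now) };
    default: return { action: "fail", status };
  }
}
```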
Security Model
- Physical port separation. Local listener and tunnel listener are separate TCP sockets. ngrok only forwards the tunnel port. Tunnel callers cannot reach bootstrap endpoints at all (404, wrong port).
- Tunnel command allowlist. `/command` over the tunnel only accepts 17 browser-driving commands (`goto`, `click`, `fill`, `snapshot`, `text`, etc.). Server-management commands (`tunnel`, `pair`, `token`, `useragent`, `eval`, `js`) are denied on the tunnel.
- Root token is tunnel-blocked. A request bearing the root token over the tunnel listener returns 403 with a pairing hint. Only scoped session tokens work over the tunnel.
- Setup keys expire in 5 minutes and can only be used once.
- Session tokens expire in 24 hours (configurable).
- The root token never appears in instruction blocks or connection strings.
- Admin scope (JS execution, cookie access) is denied by default.
- Tokens can be revoked instantly: `$B tunnel revoke agent-name`
- SSE auth uses a 30-minute HttpOnly SameSite=Strict cookie, stream-scope only (never valid against `/command`).
- Path traversal guarded on `/welcome`: `GSTACK_SLUG` must match `^[a-z0-9_-]+$`, or the server falls back to the built-in template.
- SSRF guards on `goto`, `download`, and scrape paths validate the URL target against a localhost/private-range blocklist.
- Tunnel surface denial logging. Every rejection on the tunnel listener (`path_not_on_tunnel`, `root_token_on_tunnel`, `missing_scoped_token`, `disallowed_command:*`) is appended to `~/.gstack/security/attempts.jsonl` with timestamp, source IP, path, and method. Rate-capped at 60 writes/min.
- All agent activity is logged with attribution (`clientId`).
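Three of the guards above are simple enough to sketch. These are illustrations under stated assumptions, not gstack's code. All of the following are assumptions:

- the `"builtin"` fallback name
- the exact blocked targets: 0/8, 10/8, 127/8, 169.254/16, 172.16/12, 192.168/16, `localhost`, and `[::1]`
- rejecting non-HTTP schemes
- the JSON field names
- the fixed one-minute window

The SSRF check looks only at the URL itself. A hostname that resolves to a private address would need a separate check after DNS resolution.

```typescript
// Hypothetical sketches of the slug, SSRF, and denial-log guards.

// 1. /welcome: a slug that fails the pattern falls back to the built-in template.
const SLUG_RE = /^[a-z0-9_-]+$/;
function welcomeTemplate(slug: string | undefined): string {
  return slug !== undefined && SLUG_RE.test(slug) ? slug : "builtin"; // "builtin" is a placeholder
}

// 2. SSRF: block non-HTTP schemes, localhost, IPv6 loopback, and private,
//    loopback, or link-local IPv4 literals. Resolved hostnames are not checked here.
function isBlockedTarget(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw); // the WHATWG parser normalizes forms like http://2130706433/
  } catch {
    return true; // unparsable input is treated as blocked
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return true;
  const host = url.hostname;
  if (host === "localhost" || host.endsWith(".localhost") || host === "[::1]") return true;
  const parts = host.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p))) return false; // not an IPv4 literal
  const [a, b] = parts;
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// 3. Denial log: one JSON line per rejection, at most 60 writes per window.
class DenialLog {
  private windowStart = 0;
  private writes = 0;
  readonly lines: string[] = []; // stands in for appends to ~/.gstack/security/attempts.jsonl

  record(reason: string, ip: string, path: string, method: string, now: number): boolean {
    if (now - this.windowStart >= 60000) {
      this.windowStart = now; // start a new one-minute window
      this.writes = 0;
    }
    if (this.writes >= 60) return false; // over the cap: the write is dropped
    this.writes++;
    this.lines.push(JSON.stringify({ ts: new Date(now).toISOString(), reason, ip, path, method }));
    return true;
  }
}
```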
Known non-goal (tracked as #1136): on Windows, the `cookie-import-browser` path launches Chrome with `--remote-debugging-port=<random>`. With App-Bound Encryption v20, a same-user local process can connect to that port and exfiltrate decrypted v20 cookies. That is an elevation path relative to reading the SQLite DB directly. The fix direction is `--remote-debugging-pipe` instead of TCP.
Same-Machine Shortcut
If both agents are on the same machine, skip the copy-paste:
```bash
$B pair-agent --local openclaw   # writes to ~/.openclaw/skills/gstack/browse-remote.json
$B pair-agent --local codex      # writes to ~/.codex/skills/gstack/browse-remote.json
$B pair-agent --local cursor     # writes to ~/.cursor/skills/gstack/browse-remote.json
```
No tunnel needed. Uses localhost directly.
ngrok Tunnel Setup
For remote agents on different machines:
- Sign up at ngrok.com (free tier works)
- Copy your auth token from the dashboard
- Save it: `echo 'NGROK_AUTHTOKEN=your_token' > ~/.gstack/ngrok.env`
- Optionally claim a stable domain: `echo 'NGROK_DOMAIN=your-name.ngrok-free.dev' >> ~/.gstack/ngrok.env`
- Start with tunnel: `BROWSE_TUNNEL=1 $B restart`
- Run `$B pair-agent`; it will use the tunnel URL automatically