# Remote Browser Access: How to Pair With a GStack Browser

A GStack Browser server can be shared with any AI agent that can make HTTP requests. The agent gets scoped access to a real Chromium browser: navigate pages, read content, click elements, fill forms, take screenshots. Each agent gets its own tab.

This document is the reference for remote agents. The quick-start instructions are generated by `$B pair-agent` with the actual credentials baked in.

## Architecture

```
Your Machine                            Remote Agent
─────────────                           ────────────
GStack Browser Server                   Any AI agent
├── Chromium (Playwright)               (OpenClaw, Hermes, Codex, etc.)
├── Local listener 127.0.0.1:LOCAL           │
│   (bootstrap, CLI, sidebar, cookies)       │
├── Tunnel listener 127.0.0.1:TUNNEL ◄───────┤
│   (pair-agent only: /connect, /command,    │
│    /sidebar-chat - locked allowlist)       │
├── ngrok tunnel (forwards tunnel port only) │
│   https://xxx.ngrok.dev ───────────────────┘
└── Token Registry
    ├── Root token (local listener only)
    ├── Setup keys (5 min, one-time)
    ├── Session tokens (24h, scoped)
    └── SSE session cookies (30 min, stream-scope)
```

### Dual-listener architecture (v1.6.0.0)

The daemon binds two HTTP sockets. The **local listener** serves the full command surface to 127.0.0.1 only and is never forwarded. The **tunnel listener** is bound lazily on `/tunnel/start` (and torn down on `/tunnel/stop`) with a locked path allowlist. ngrok forwards only the tunnel port.

A caller who stumbles onto your ngrok URL cannot reach `/health`, `/cookie-picker`, `/inspector/*`, or `/welcome`; those paths don't exist on that TCP socket. Root tokens sent over the tunnel get 403. The tunnel listener accepts only `/connect`, `/command` (with a scoped token + the 26-command browser-driving allowlist), and `/sidebar-chat`.

See [ARCHITECTURE.md](../ARCHITECTURE.md#dual-listener-tunnel-architecture-v1600) for the full endpoint table.

## Connection Flow

1. **User runs** `$B pair-agent` (or `/pair-agent` in Claude Code)
2. **Server creates** a one-time setup key (expires in 5 minutes)
3. **User copies** the instruction block into the other agent's chat
4. **Remote agent runs** `POST /connect` with the setup key
5. **Server returns** a scoped session token (24h default)
6. **Remote agent creates** its own tab via `POST /command` with `newtab`
7. **Remote agent browses** using `POST /command` with its session token + tabId

## API Reference

### Authentication

All command endpoints require a Bearer token:

```
Authorization: Bearer gsk_sess_...
```

`/connect` is unauthenticated (rate-limited): it is how a remote agent exchanges a setup key for a scoped session token.

`/health` is unauthenticated on the local listener (bootstrap) but does NOT exist on the tunnel listener (404).

SSE endpoints (`/activity/stream`, `/inspector/events`) accept either a Bearer token or the HttpOnly `gstack_sse` cookie (minted via `POST /sse-session`, 30-minute TTL, stream-scope only, so it cannot be used against `/command`). As of v1.6.0.0 the `?token=` query-string auth is no longer accepted.

### Endpoints

#### POST /connect

Exchange a setup key for a session token. No auth required. Rate-limited to 300/minute (flood defense; setup keys are 24 random bytes, so brute-forcing them is infeasible).

```json
Request:  {"setup_key": "gsk_setup_..."}
Response: {"token": "gsk_sess_...", "expires": "ISO8601", "scopes": ["read","write"], "agent": "agent-name"}
```

#### POST /command

Send a browser command. Requires Bearer auth.

```json
Request:  {"command": "goto", "args": ["https://example.com"], "tabId": 1}
Response: (plain text result of the command)
```

#### GET /health

Server status. No auth required. Available on the local listener only; the tunnel listener returns 404. Returns status, tabs, mode, uptime.
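
From the remote agent's side, the `/connect` and `/command` exchanges described above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the generated quick-start instructions: the helper names (`build_connect_request`, `build_command_request`, `send`, `pair_and_open`) are hypothetical, and sending `Content-Type: application/json` is an assumption, since this reference does not state which headers the server expects beyond `Authorization`.

```python
import json
import urllib.request
from typing import List, Optional, Tuple


def build_connect_request(base_url: str, setup_key: str) -> urllib.request.Request:
    """Build the unauthenticated POST /connect request that trades a setup key for a token."""
    body = json.dumps({"setup_key": setup_key}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/connect",
        data=body,
        # Assumption: the server accepts a JSON body with this content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_command_request(
    base_url: str,
    token: str,
    command: str,
    args: Optional[List[str]] = None,
    tab_id: Optional[int] = None,
) -> urllib.request.Request:
    """Build a POST /command request carrying the scoped session token as a Bearer token."""
    payload = {"command": command, "args": list(args or [])}
    if tab_id is not None:
        payload["tabId"] = tab_id
    return urllib.request.Request(
        base_url.rstrip("/") + "/command",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send a request and return the response body as text (raises on HTTP errors)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def pair_and_open(base_url: str, setup_key: str, url: str) -> Tuple[str, str]:
    """Steps 4 to 6 of the connection flow: connect, then open a tab owned by this agent."""
    session = json.loads(send(build_connect_request(base_url, setup_key)))
    token = session["token"]
    # The newtab result is plain text; its exact format is not specified here,
    # so it is returned unparsed for the caller to inspect.
    result = send(build_command_request(base_url, token, "newtab", [url]))
    return token, result
```

A caller would pass the base URL and setup key from the pairing instruction block, for example `pair_and_open("https://xxx.ngrok.dev", "gsk_setup_...", "https://example.com")`, and then include the tab ID reported by the server in later `build_command_request` calls.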

### Commands

#### Navigation

| Command | Args | Description |
|---------|------|-------------|
| `goto` | `["URL"]` | Navigate to a URL |
| `back` | `[]` | Go back |
| `forward` | `[]` | Go forward |
| `reload` | `[]` | Reload page |

#### Reading Content

| Command | Args | Description |
|---------|------|-------------|
| `snapshot` | `["-i"]` | Interactive snapshot with @ref labels (most useful) |
| `text` | `[]` | Full page text |
| `html` | `["selector?"]` | HTML of element or full page |
| `links` | `[]` | All links on page |
| `screenshot` | `["/tmp/s.png"]` | Take a screenshot |
| `url` | `[]` | Current URL |

#### Interaction

| Command | Args | Description |
|---------|------|-------------|
| `click` | `["@e3"]` | Click an element (use @ref from snapshot) |
| `fill` | `["@e5", "text"]` | Fill a form field |
| `select` | `["@e7", "option"]` | Select dropdown value |
| `type` | `["text"]` | Type text (keyboard) |
| `press` | `["Enter"]` | Press a key |
| `scroll` | `["down"]` | Scroll the page |

#### Tabs

| Command | Args | Description |
|---------|------|-------------|
| `newtab` | `["URL?"]` | Create a new tab (required before writing) |
| `tabs` | `[]` | List all tabs |
| `closetab` | `["id?"]` | Close a tab |

## The Snapshot → @ref Pattern

This is the most powerful browsing pattern. Instead of writing CSS selectors:

1. Run `snapshot -i` to get an interactive snapshot with labeled elements
2. The snapshot returns text like:
   ```
   [Page Title]
   @e1 [link] "Home"
   @e2 [button] "Sign In"
   @e3 [input] "Search..."
   ```
3. Use the `@e` refs directly in commands: `click @e2`, `fill @e3 "search query"`

This is how the snapshot system works, and it's much more reliable than guessing CSS selectors. Always `snapshot -i` first, then use the refs.

## Scopes

| Scope | What it allows |
|-------|---------------|
| `read` | snapshot, text, html, links, screenshot, url, tabs, console, etc. |
| `write` | goto, click, fill, scroll, newtab, closetab, etc. |
| `admin` | eval, js, cookies, storage, cookie-import, useragent, etc. |
| `meta` | tab, diff, frame, responsive, watch |

Default tokens get `read` + `write`. Admin requires `--admin` flag when pairing.

## Tab Isolation

Each agent owns the tabs it creates. Rules:

- **Read:** Any agent can read any tab (snapshot, text, screenshot)
- **Write:** Only the tab owner can write (click, fill, goto, etc.)
- **Unowned tabs:** Pre-existing tabs are root-only for writes
- **First step:** Always `newtab` before trying to interact

## Error Codes

| Code | Meaning | What to do |
|------|---------|------------|
| 401 | Token invalid, expired, or revoked | Ask user to run /pair-agent again |
| 403 | Command not in scope, or tab not yours | Use newtab, or ask for --admin |
| 429 | Rate limit exceeded (>10 req/s) | Wait for Retry-After header |

## Security Model

- **Physical port separation.** Local listener and tunnel listener are separate TCP sockets. ngrok only forwards the tunnel port. Tunnel callers cannot reach bootstrap endpoints at all (404, wrong port).
- **Tunnel command allowlist.** `/command` over the tunnel only accepts 26 browser-driving commands (goto, click, fill, snapshot, text, newtab, tabs, back, forward, reload, closetab, etc.). Server-management commands (tunnel, pair, token, useragent, js) are denied on the tunnel.
- **Root token is tunnel-blocked.** A request bearing the root token over the tunnel listener returns 403 with a pairing hint. Only scoped session tokens work over the tunnel.
- **Setup keys** expire in 5 minutes and can only be used once.
- **Session tokens** expire in 24 hours (configurable).
- The root token never appears in instruction blocks or connection strings.
- **Admin scope** (JS execution, cookie access) is denied by default.
- Tokens can be revoked instantly: `$B tunnel revoke agent-name`
- **SSE auth** uses a 30-minute HttpOnly SameSite=Strict cookie, stream-scope only (never valid against `/command`).
- **Path traversal guarded** on `/welcome`: `GSTACK_SLUG` must match `^[a-z0-9_-]+$` or falls back to the built-in template.
- **SSRF guards** on `goto`, `download`, and scrape paths: validates URL target against a localhost/private-range blocklist.
- **Tunnel surface denial logging.** Every rejection on the tunnel listener (`path_not_on_tunnel`, `root_token_on_tunnel`, `missing_scoped_token`, `disallowed_command:*`) is appended to `~/.gstack/security/attempts.jsonl` with timestamp, source IP, path, method. Rate-capped at 60 writes/min.
- All agent activity is logged with attribution (clientId).

**Known non-goal (tracked as #1136):** on Windows, the cookie-import-browser path launches Chrome with `--remote-debugging-port=`. With App-Bound Encryption v20, a same-user local process can connect to that port and exfiltrate decrypted v20 cookies, an elevation path relative to reading the SQLite DB directly. Fix direction is `--remote-debugging-pipe` instead of TCP.

## Same-Machine Shortcut

If both agents are on the same machine, skip the copy-paste:

```bash
$B pair-agent --local openclaw   # writes to ~/.openclaw/skills/gstack/browse-remote.json
$B pair-agent --local codex      # writes to ~/.codex/skills/gstack/browse-remote.json
$B pair-agent --local cursor     # writes to ~/.cursor/skills/gstack/browse-remote.json
```

No tunnel needed. Uses localhost directly.

## ngrok Tunnel Setup

For remote agents on different machines:

1. Sign up at [ngrok.com](https://ngrok.com) (free tier works)
2. Copy your auth token from the dashboard
3. Save it: `echo 'NGROK_AUTHTOKEN=your_token' > ~/.gstack/ngrok.env`
4. Optionally claim a stable domain: `echo 'NGROK_DOMAIN=your-name.ngrok-free.dev' >> ~/.gstack/ngrok.env`
5. Start with tunnel: `BROWSE_TUNNEL=1 $B restart`
6. Run `$B pair-agent`; it will use the tunnel URL automatically
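
Once the tunnel is up and an agent is paired, most of its client-side logic comes down to two things covered earlier in this document: picking `@e` refs out of a `snapshot -i` result, and reacting to the status codes in the Error Codes table. The sketch below illustrates both under stated assumptions. It assumes each element appears on its own line in the form shown in the snapshot example (`@e2 [button] "Sign In"`); the real output may carry more detail. The function names, the action labels, and the 1-second fallback when `Retry-After` is missing or not a plain number are all hypothetical choices, not part of the GStack API.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches lines such as: @e2 [button] "Sign In"
REF_LINE = re.compile(r'^\s*(@e\d+)\s+\[([^\]]+)\]\s+"(.*)"\s*$')


def parse_refs(snapshot_text: str) -> List[Dict[str, str]]:
    """Extract ref, role, and label from each element line of a snapshot."""
    refs = []
    for line in snapshot_text.splitlines():
        match = REF_LINE.match(line)
        if match:
            refs.append(
                {"ref": match.group(1), "role": match.group(2), "label": match.group(3)}
            )
    return refs


def find_ref(refs: List[Dict[str, str]], role: str, label: str) -> Optional[str]:
    """Return the first ref whose role and label match exactly, or None."""
    for entry in refs:
        if entry["role"] == role and entry["label"] == label:
            return entry["ref"]
    return None


def next_step(status: int, retry_after: Optional[str] = None) -> Tuple[str, float]:
    """Map an HTTP status to (action, delay_seconds) following the Error Codes table."""
    if status == 429:
        # Wait for the Retry-After value; fall back to 1 second (an assumption)
        # when the header is absent or is not a plain number of seconds.
        try:
            delay = float(retry_after) if retry_after is not None else 1.0
        except ValueError:
            delay = 1.0
        return ("retry", delay)
    if status == 401:
        return ("ask-user-to-pair-again", 0.0)
    if status == 403:
        return ("newtab-or-request-admin", 0.0)
    if 200 <= status < 300:
        return ("ok", 0.0)
    return ("fail", 0.0)
```

For instance, after a `snapshot -i` call an agent could run `find_ref(parse_refs(text), "button", "Sign In")` and pass the resulting ref as the argument to `click`, re-running the snapshot whenever the page changes.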