Garry Tan bf66cec3d5 docs: remote browser access reference for paired agents
Full API reference, snapshot→@ref pattern, scopes, tab isolation,
error codes, ngrok setup, and same-machine shortcuts. The instruction
block points here for deeper reading.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:53:47 -07:00


Remote Browser Access — How to Pair With a GStack Browser

A GStack Browser server can be shared with any AI agent that can make HTTP requests. The agent gets scoped access to a real Chromium browser: navigate pages, read content, click elements, fill forms, take screenshots. Each agent gets its own tab.

This document is the reference for remote agents. The quick-start instructions are generated by $B pair-agent with the actual credentials baked in.

Architecture

Your Machine                          Remote Agent
─────────────                         ────────────
GStack Browser Server                 Any AI agent
  ├── Chromium (Playwright)           (OpenClaw, Hermes, Codex, etc.)
  ├── HTTP API on localhost:PORT           │
  ├── ngrok tunnel (optional)              │
  │     https://xxx.ngrok.dev ─────────────┘
  └── Token Registry
        ├── Root token (local only)
        ├── Setup keys (5 min, one-time)
        └── Session tokens (24h, scoped)

Connection Flow

  1. User runs $B pair-agent (or /pair-agent in Claude Code)
  2. Server creates a one-time setup key (expires in 5 minutes)
  3. User copies the instruction block into the other agent's chat
  4. Remote agent sends POST /connect with the setup key
  5. Server returns a scoped session token (24h default)
  6. Remote agent creates its own tab via POST /command with the newtab command
  7. Remote agent browses using POST /command with its session token and tabId
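Steps 4 to 7 can be sketched in Python as below. This is a minimal sketch, not gstack's client. The `send(method, path, body, token)` callable is a hypothetical transport that you would back with urllib or any HTTP library. It also assumes that the plain-text newtab result contains the new tab's numeric id, which this document does not specify.

```python
import json
import re
from typing import Callable, Optional, Tuple

# Hypothetical transport: send(method, path, json_body, token) -> (status, text).
# A real one would wrap urllib.request and add "Authorization: Bearer <token>"
# whenever token is not None.
Send = Callable[[str, str, Optional[dict], Optional[str]], Tuple[int, str]]


def pair_and_open_tab(send: Send, setup_key: str, start_url: str) -> Tuple[str, int, str]:
    """Run steps 4-7 of the connection flow; return (token, tab_id, snapshot)."""
    # Steps 4-5: trade the one-time setup key for a scoped session token.
    status, text = send("POST", "/connect", {"setup_key": setup_key}, None)
    if status != 200:
        raise RuntimeError(f"/connect failed with HTTP {status}: {text}")
    token = json.loads(text)["token"]

    # Step 6: create a tab this agent owns. ASSUMPTION: the plain-text
    # newtab result contains the new tab's numeric id.
    status, text = send("POST", "/command",
                        {"command": "newtab", "args": [start_url]}, token)
    match = re.search(r"\d+", text)
    if status != 200 or match is None:
        raise RuntimeError(f"newtab failed with HTTP {status}: {text}")
    tab_id = int(match.group())

    # Step 7: every later command carries the session token and the tabId.
    status, text = send("POST", "/command",
                        {"command": "snapshot", "args": ["-i"], "tabId": tab_id}, token)
    if status != 200:
        raise RuntimeError(f"snapshot failed with HTTP {status}: {text}")
    return token, tab_id, text
```

Injecting the transport keeps the flow testable without a live server. It also makes it easy to swap in whichever HTTP client the agent already uses.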

API Reference

Authentication

All endpoints except /connect and /health require a Bearer token:

Authorization: Bearer gsk_sess_...

Endpoints

POST /connect

Exchange a setup key for a session token. No auth required. Rate-limited to 3 requests per minute.

Request:  {"setup_key": "gsk_setup_..."}
Response: {"token": "gsk_sess_...", "expires": "ISO8601", "scopes": ["read","write"], "agent": "agent-name"}
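/connect is limited to 3 requests per minute, so an agent should parse the response once and keep the session around until it nears expiry. The sketch below does this and assumes `expires` is an ISO 8601 timestamp ending in `Z` or an explicit UTC offset. The `Session` class and its `expires_within` helper are hypothetical names.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Session:
    token: str
    expires: datetime
    scopes: list
    agent: str

    def expires_within(self, margin: timedelta, now: datetime) -> bool:
        """True if the token expires within `margin` of `now` (time to re-pair)."""
        return self.expires - now <= margin


def parse_connect_response(text: str) -> Session:
    data = json.loads(text)
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    expires = datetime.fromisoformat(data["expires"].replace("Z", "+00:00"))
    if expires.tzinfo is None:
        # ASSUMPTION: an offset-less timestamp is treated as UTC.
        expires = expires.replace(tzinfo=timezone.utc)
    return Session(data["token"], expires, list(data["scopes"]), data["agent"])
```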

POST /command

Send a browser command. Requires Bearer auth.

Request:  {"command": "goto", "args": ["https://example.com"], "tabId": 1}
Response: (plain text result of the command)
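The following builds a /command request using only the Python standard library. The endpoint, Bearer header, and body shape come from the reference above. The Content-Type header and the `build_command_request` and `send_command` names are assumptions. The reply is read as plain text and is not parsed as JSON.

```python
import json
import urllib.request
from typing import Optional


def build_command_request(base_url: str, token: str, command: str,
                          args: list, tab_id: Optional[int] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /command request."""
    payload = {"command": command, "args": args}
    if tab_id is not None:
        payload["tabId"] = tab_id
    return urllib.request.Request(
        base_url.rstrip("/") + "/command",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},  # assumed, not documented
        method="POST",
    )


def send_command(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request; /command answers with plain text, not JSON."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```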

GET /health

Server status. No auth required. The response includes status, tabs, mode, and uptime.

Commands

Navigation

Command   Args      Description
goto      ["URL"]   Navigate to a URL
back      []        Go back
forward   []        Go forward
reload    []        Reload page

Reading Content

Command      Args             Description
snapshot     ["-i"]           Interactive snapshot with @ref labels (most useful)
text         []               Full page text
html         ["selector?"]    HTML of element or full page
links        []               All links on page
screenshot   ["/tmp/s.png"]   Take a screenshot
url          []               Current URL

Interaction

Command   Args                Description
click     ["@e3"]             Click an element (use @ref from snapshot)
fill      ["@e5", "text"]     Fill a form field
select    ["@e7", "option"]   Select dropdown value
type      ["text"]            Type text (keyboard)
press     ["Enter"]           Press a key
scroll    ["down"]            Scroll the page

Tabs

Command    Args       Description
newtab     ["URL?"]   Create a new tab (required before writing)
tabs       []         List all tabs
closetab   ["id?"]    Close a tab

The Snapshot → @ref Pattern

This is the most powerful browsing pattern. Instead of writing CSS selectors, use the element refs from a snapshot:

  1. Run snapshot -i to get an interactive snapshot with labeled elements
  2. The snapshot returns text like:
    [Page Title]
    @e1 [link] "Home"
    @e2 [button] "Sign In"
    @e3 [input] "Search..."
    
  3. Use the @e refs directly in commands: click @e2, fill @e3 "search query"

Refs are much more reliable than guessed CSS selectors. Always run snapshot -i first, then use the refs it returns.
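An agent can automate this pattern by parsing the snapshot text into a ref table. The sketch assumes every ref line looks like the example above (`@eN [role] "label"`). Real snapshots may contain other line shapes, and this code simply skips them. `parse_snapshot` and `find_ref` are hypothetical helpers.

```python
import re

# Matches lines such as:  @e2 [button] "Sign In"
REF_LINE = re.compile(r'^\s*(@e\d+)\s+\[([^\]]+)\]\s+"(.*)"\s*$')


def parse_snapshot(text: str) -> dict:
    """Map each @ref to a (role, label) pair; non-ref lines are ignored."""
    refs = {}
    for line in text.splitlines():
        m = REF_LINE.match(line)
        if m:
            refs[m.group(1)] = (m.group(2), m.group(3))
    return refs


def find_ref(refs: dict, role: str, label: str) -> str:
    """Return the first @ref whose role and label match exactly."""
    for ref, (ref_role, ref_label) in refs.items():
        if ref_role == role and ref_label == label:
            return ref
    raise KeyError(f"no {role} labelled {label!r} in snapshot")


# Usage with the example snapshot above; add "tabId" before sending.
snapshot = '''[Page Title]
@e1 [link] "Home"
@e2 [button] "Sign In"
@e3 [input] "Search..."
'''
refs = parse_snapshot(snapshot)
search = find_ref(refs, "input", "Search...")
commands = [
    {"command": "fill", "args": [search, "search query"]},
    {"command": "press", "args": ["Enter"]},
]
```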

Scopes

Scope   What it allows
read    snapshot, text, html, links, screenshot, url, tabs, console, etc.
write   goto, click, fill, scroll, newtab, closetab, etc.
admin   eval, js, cookies, storage, cookie-import, useragent, etc.
meta    tab, diff, frame, responsive, watch

Tokens get read and write scopes by default. Admin scope requires the --admin flag when pairing.
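An agent can check a command against its token's scopes before sending it and so avoid a 403. The map below is copied from the table above. Several rows there end with "etc.", so the map is deliberately incomplete: unknown commands return None, and the server remains the final authority. The names are hypothetical.

```python
from typing import Optional

# Commands per scope, copied from the table; deliberately partial.
SCOPE_COMMANDS = {
    "read": {"snapshot", "text", "html", "links", "screenshot", "url", "tabs", "console"},
    "write": {"goto", "click", "fill", "scroll", "newtab", "closetab"},
    "admin": {"eval", "js", "cookies", "storage", "cookie-import", "useragent"},
    "meta": {"tab", "diff", "frame", "responsive", "watch"},
}

DEFAULT_SCOPES = ("read", "write")


def required_scope(command: str) -> Optional[str]:
    for scope, commands in SCOPE_COMMANDS.items():
        if command in commands:
            return scope
    return None  # not listed in the table; let the server decide


def is_allowed(command: str, token_scopes=DEFAULT_SCOPES) -> Optional[bool]:
    """True/False when the command is listed, None when it is unknown."""
    scope = required_scope(command)
    return None if scope is None else scope in token_scopes
```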

Tab Isolation

Each agent owns the tabs it creates. Rules:

  • Read: Any agent can read any tab (snapshot, text, screenshot)
  • Write: Only the tab owner can write (click, fill, goto, etc.)
  • Unowned tabs: Pre-existing tabs accept writes only from the root token
  • First step: Always run newtab before trying to interact
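These rules amount to a small decision function. This is an illustrative model, not gstack's code. `ROOT` is a hypothetical id standing in for the root token. The document does not say whether root may write to agent-owned tabs, so the sketch allows only the owner.

```python
from typing import Optional

ROOT = "root"  # hypothetical id standing in for the local root token


def may_use_tab(client_id: str, tab_owner: Optional[str], needs_write: bool) -> bool:
    """Apply the tab-isolation rules.

    tab_owner is the client that created the tab, or None for a
    pre-existing (unowned) tab. Scope checks are a separate step.
    """
    if not needs_write:
        return True                   # any agent may read any tab
    if tab_owner is None:
        return client_id == ROOT      # unowned tabs: root-only writes
    return client_id == tab_owner     # only the owner writes to its tab
```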

Error Codes

Code   Meaning                                  What to do
401    Token invalid, expired, or revoked       Ask the user to run /pair-agent again
403    Command not in scope, or tab not yours   Use newtab, or ask for --admin
429    Rate limit exceeded (>10 req/s)          Wait as long as the Retry-After header says, then retry
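Here is a retry wrapper that maps these codes to actions. The `send()` callable, which returns `(status, headers, body)`, and the exception names are hypothetical. This version handles only the delay-seconds form of Retry-After, although HTTP also allows an HTTP date.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical transport: one call sends one request and returns
# (status, headers, body).
Send = Callable[[], Tuple[int, Mapping[str, str], str]]


class ReauthRequired(Exception):
    """401: token invalid, expired, or revoked; ask the user to re-run /pair-agent."""


class Forbidden(Exception):
    """403: command not in scope, or the tab belongs to someone else."""


def call_with_backoff(send: Send, max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    for _ in range(max_attempts):
        status, headers, body = send()
        if status == 401:
            raise ReauthRequired(body)
        if status == 403:
            raise Forbidden(body)
        if status == 429:
            # Only the delay-seconds form of Retry-After is handled; anything
            # else (missing header, HTTP date) falls back to one second.
            retry_after = headers.get("Retry-After", "")
            sleep(float(retry_after) if retry_after.isdigit() else 1.0)
            continue
        if status >= 400:
            raise RuntimeError(f"HTTP {status}: {body}")
        return body
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

401 and 403 are raised immediately rather than retried. Neither fixes itself by waiting: both need the user or the agent to change something first.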

Security Model

  • Setup keys expire in 5 minutes and can only be used once
  • Session tokens expire in 24 hours (configurable)
  • The root token never appears in instruction blocks or connection strings
  • Admin scope (JS execution, cookie access) is denied by default
  • Tokens can be revoked instantly: $B tunnel revoke agent-name
  • All agent activity is logged with attribution (clientId)
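The lifetime and single-use rules can be modelled in a few lines. This toy registry is an illustration built on our own assumptions (in-memory dicts, an explicit `now` clock, and `secrets` for key generation), not gstack's implementation. Only the key prefixes, the TTLs, and per-agent revocation come from this document.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timedelta

SETUP_TTL = timedelta(minutes=5)
SESSION_TTL = timedelta(hours=24)  # default; the document says this is configurable


@dataclass
class TokenRegistry:
    """Toy in-memory model of the security rules; not gstack's implementation."""
    setup_keys: dict = field(default_factory=dict)  # key -> (agent, scopes, expires)
    sessions: dict = field(default_factory=dict)    # token -> (agent, scopes, expires)

    def create_setup_key(self, agent: str, now: datetime,
                         scopes: tuple = ("read", "write")) -> str:
        key = "gsk_setup_" + secrets.token_urlsafe(16)
        self.setup_keys[key] = (agent, scopes, now + SETUP_TTL)
        return key

    def connect(self, key: str, now: datetime) -> str:
        # pop() removes the key on first use, so it can never be replayed.
        agent, scopes, expires = self.setup_keys.pop(key, (None, None, None))
        if agent is None or now >= expires:
            raise PermissionError("setup key unknown, already used, or expired")
        token = "gsk_sess_" + secrets.token_urlsafe(24)
        self.sessions[token] = (agent, scopes, now + SESSION_TTL)
        return token

    def check(self, token: str, now: datetime) -> tuple:
        agent, scopes, expires = self.sessions.get(token, (None, None, None))
        if agent is None or now >= expires:
            raise PermissionError("401: token invalid, expired, or revoked")
        return agent, scopes

    def revoke(self, agent: str) -> int:
        """Drop every session for `agent`, like `$B tunnel revoke agent-name`."""
        doomed = [t for t, (owner, _, _) in self.sessions.items() if owner == agent]
        for t in doomed:
            del self.sessions[t]
        return len(doomed)
```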

Same-Machine Shortcut

If both agents are on the same machine, skip the copy-paste:

$B pair-agent --local openclaw    # writes to ~/.openclaw/skills/gstack/browse-remote.json
$B pair-agent --local codex       # writes to ~/.codex/skills/gstack/browse-remote.json
$B pair-agent --local cursor      # writes to ~/.cursor/skills/gstack/browse-remote.json

No tunnel is needed; the agent connects to localhost directly.
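Before asking the user to pair, an agent can look for the file that --local writes. The `~/.<agent>/skills/gstack/browse-remote.json` pattern is generalized from the three examples above. The file's contents are not documented here, so this hypothetical helper only locates and parses the file and does not read any particular field.

```python
import json
from pathlib import Path
from typing import Optional


def local_pairing_path(agent: str, home: Optional[Path] = None) -> Path:
    """~/.<agent>/skills/gstack/browse-remote.json, generalized from the examples."""
    return (home or Path.home()) / f".{agent}" / "skills" / "gstack" / "browse-remote.json"


def load_local_pairing(agent: str, home: Optional[Path] = None) -> Optional[dict]:
    """Return the parsed file, or None if pair-agent --local has not run yet."""
    path = local_pairing_path(agent, home)
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```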

ngrok Tunnel Setup

For remote agents on different machines:

  1. Sign up at ngrok.com (free tier works)
  2. Copy your auth token from the dashboard
  3. Save it: echo 'NGROK_AUTHTOKEN=your_token' > ~/.gstack/ngrok.env
  4. Optionally claim a stable domain: echo 'NGROK_DOMAIN=your-name.ngrok-free.dev' >> ~/.gstack/ngrok.env
  5. Start with tunnel: BROWSE_TUNNEL=1 $B restart
  6. Run $B pair-agent — it will use the tunnel URL automatically
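Before restarting with BROWSE_TUNNEL=1, it can help to sanity-check the file written in steps 3 and 4. This is a standalone checker, not gstack's own parser. It assumes simple KEY=value lines and treats NGROK_DOMAIN as optional, as step 4 does.

```python
from pathlib import Path


def parse_env_file(text: str) -> dict:
    """Parse simple KEY=value lines like those written by the echo commands above."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_ngrok_config(path: Path = Path.home() / ".gstack" / "ngrok.env") -> list:
    """Return a list of problems; an empty list means the file looks usable."""
    if not path.is_file():
        return [f"{path} does not exist (step 3)"]
    env = parse_env_file(path.read_text(encoding="utf-8"))
    problems = []
    if not env.get("NGROK_AUTHTOKEN"):
        problems.append("NGROK_AUTHTOKEN is missing (step 3)")
    return problems
```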