Remote Browser Access — How to Pair With a GStack Browser
A GStack Browser server can be shared with any AI agent that can make HTTP requests. The agent gets scoped access to a real Chromium browser: navigate pages, read content, click elements, fill forms, take screenshots. Each agent gets its own tab.
This document is the reference for remote agents. The quick-start instructions are generated by `$B pair-agent` with the actual credentials baked in.
Architecture
```
Your Machine                              Remote Agent
─────────────                             ────────────
GStack Browser Server                     Any AI agent
  ├── Chromium (Playwright)               (OpenClaw, Hermes, Codex, etc.)
  ├── HTTP API on localhost:PORT                 │
  ├── ngrok tunnel (optional)                    │
  │     https://xxx.ngrok.dev ───────────────────┘
  └── Token Registry
      ├── Root token (local only)
      ├── Setup keys (5 min, one-time)
      └── Session tokens (24h, scoped)
```
Connection Flow
- User runs `$B pair-agent` (or `/pair-agent` in Claude Code)
- Server creates a one-time setup key (expires in 5 minutes)
- User copies the instruction block into the other agent's chat
- Remote agent runs `POST /connect` with the setup key
- Server returns a scoped session token (24h default)
- Remote agent creates its own tab via `POST /command` with `newtab`
- Remote agent browses using `POST /command` with its session token + `tabId`
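The flow above can be sketched from the remote agent's side in Python. This is a minimal sketch, not the client GStack ships. `send` is a hypothetical transport you supply (for example one built on `urllib.request`) that prefixes the server's base URL. `pair_and_open` is a hypothetical helper. Extracting the tab ID from the `newtab` result (first integer in the text) is an assumption, because this reference only says `/command` returns plain text.

```python
import json
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport you supply: send(method, path, headers, body) -> (status, text).
# It is responsible for prefixing the server's base URL (localhost or ngrok).
Send = Callable[[str, str, Dict[str, str], Optional[bytes]], Tuple[int, str]]


def pair_and_open(send: Send, setup_key: str, url: str) -> Tuple[str, int]:
    """Walk the connection flow: /connect, newtab, then goto in the new tab."""
    # Exchange the one-time setup key; /connect needs no Authorization header.
    status, text = send("POST", "/connect", {"Content-Type": "application/json"},
                        json.dumps({"setup_key": setup_key}).encode("utf-8"))
    if status != 200:
        raise RuntimeError(f"/connect failed with HTTP {status}: {text}")
    token = json.loads(text)["token"]  # scoped session token (24h by default)

    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

    def command(name: str, args: list, tab_id: Optional[int] = None) -> str:
        payload = {"command": name, "args": args}
        if tab_id is not None:
            payload["tabId"] = tab_id
        code, result = send("POST", "/command", headers, json.dumps(payload).encode("utf-8"))
        if code != 200:
            raise RuntimeError(f"{name} failed with HTTP {code}: {result}")
        return result  # plain-text result of the command

    # Create a tab this agent owns. ASSUMPTION: the plain-text result contains the
    # new tab's numeric ID; adjust this parsing to what your server actually returns.
    created = command("newtab", [])
    match = re.search(r"\d+", created)
    if match is None:
        raise RuntimeError(f"could not find a tab ID in newtab result: {created!r}")
    tab_id = int(match.group())

    # Browse inside the owned tab.
    command("goto", [url], tab_id)
    return token, tab_id
```

Keeping `send` injectable lets you exercise the sequence without a live server. In a real agent it would wrap an HTTP library.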
API Reference
Authentication
All endpoints except `/connect` and `/health` require a Bearer token:

`Authorization: Bearer gsk_sess_...`
Endpoints
POST /connect
Exchange a setup key for a session token. No auth required. Rate-limited to 3 requests per minute.

Request: `{"setup_key": "gsk_setup_..."}`

Response: `{"token": "gsk_sess_...", "expires": "ISO8601", "scopes": ["read","write"], "agent": "agent-name"}`
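An agent can turn the `/connect` response into a typed session object and check scopes and expiry before it starts browsing. This is a minimal Python sketch that assumes `expires` is an ISO 8601 timestamp ending in `Z` or carrying a UTC offset. `SessionInfo` and `parse_connect_response` are hypothetical names, not part of GStack.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class SessionInfo:
    token: str
    expires: datetime  # timezone-aware
    scopes: List[str]
    agent: str

    def is_expired(self, now: Optional[datetime] = None) -> bool:
        """True once the (aware) current time reaches the expiry time."""
        now = now or datetime.now(timezone.utc)
        return now >= self.expires

    def has_scope(self, scope: str) -> bool:
        return scope in self.scopes


def parse_connect_response(body: str) -> SessionInfo:
    data = json.loads(body)
    raw = data["expires"]
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    expires = datetime.fromisoformat(raw)
    if expires.tzinfo is None:
        # ASSUMPTION: treat a timestamp without an offset as UTC.
        expires = expires.replace(tzinfo=timezone.utc)
    return SessionInfo(token=data["token"], expires=expires,
                       scopes=list(data.get("scopes", [])), agent=data.get("agent", ""))
```

Checking `has_scope("admin")` up front avoids a round trip that would only end in a 403.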
POST /command
Send a browser command. Requires Bearer auth.

Request: `{"command": "goto", "args": ["https://example.com"], "tabId": 1}`

Response: plain-text result of the command
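The request shape above maps directly onto Python's standard `urllib.request`. This sketch builds the `Request` separately from sending it, so it can be inspected. `build_command_request` and `run_command` are hypothetical helpers. The base URL is whatever your instruction block contains, either the localhost port or the ngrok URL.

```python
import json
import urllib.request
from typing import List, Optional


def build_command_request(base_url: str, token: str, command: str,
                          args: List[str], tab_id: Optional[int] = None) -> urllib.request.Request:
    """Build a POST /command request with Bearer auth and a JSON body."""
    payload = {"command": command, "args": args}
    if tab_id is not None:
        payload["tabId"] = tab_id
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/command",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def run_command(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the plain-text command result."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

`urlopen` raises `urllib.error.HTTPError` for non-2xx responses such as 401, 403, or 429, so callers can inspect `err.code` and `err.headers` and react to the error codes below.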
GET /health
Server status. No auth required. Returns status, tabs, mode, uptime.
Commands
Navigation
| Command | Args | Description |
|---|---|---|
| `goto` | `["URL"]` | Navigate to a URL |
| `back` | `[]` | Go back |
| `forward` | `[]` | Go forward |
| `reload` | `[]` | Reload page |
Reading Content
| Command | Args | Description |
|---|---|---|
| `snapshot` | `["-i"]` | Interactive snapshot with @ref labels (most useful) |
| `text` | `[]` | Full page text |
| `html` | `["selector?"]` | HTML of element or full page |
| `links` | `[]` | All links on page |
| `screenshot` | `["/tmp/s.png"]` | Take a screenshot |
| `url` | `[]` | Current URL |
Interaction
| Command | Args | Description |
|---|---|---|
| `click` | `["@e3"]` | Click an element (use @ref from snapshot) |
| `fill` | `["@e5", "text"]` | Fill a form field |
| `select` | `["@e7", "option"]` | Select dropdown value |
| `type` | `["text"]` | Type text (keyboard) |
| `press` | `["Enter"]` | Press a key |
| `scroll` | `["down"]` | Scroll the page |
Tabs
| Command | Args | Description |
|---|---|---|
| `newtab` | `["URL?"]` | Create a new tab (required before writing) |
| `tabs` | `[]` | List all tabs |
| `closetab` | `["id?"]` | Close a tab |
The Snapshot → @ref Pattern
This is the most powerful browsing pattern. Instead of writing CSS selectors:
- Run `snapshot -i` to get an interactive snapshot with labeled elements
- The snapshot returns text like: `[Page Title] @e1 [link] "Home" @e2 [button] "Sign In" @e3 [input] "Search..."`
- Use the `@e` refs directly in commands: `click @e2`, `fill @e3 "search query"` (over HTTP: `{"command": "click", "args": ["@e2"], "tabId": 1}`)

Each ref points at an element the snapshot actually found on the page, which is much more reliable than guessing CSS selectors. Always run `snapshot -i` first, then use the refs.
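Because refs come back as text, an agent can pick them out programmatically instead of reading the snapshot by eye. This Python sketch assumes each element appears as `@eN [role] "label"`, as in the sample above. The real snapshot may carry more detail. `parse_refs` and `find_ref` are hypothetical helpers.

```python
import re
from typing import List, Optional, Tuple

# ASSUMPTION: each element is rendered as  @eN [role] "label"  as in the sample.
REF_PATTERN = re.compile(r'(@e\d+)\s+\[([^\]]+)\]\s+"([^"]*)"')


def parse_refs(snapshot: str) -> List[Tuple[str, str, str]]:
    """Return (ref, role, label) for every labeled element in the snapshot text."""
    return REF_PATTERN.findall(snapshot)


def find_ref(snapshot: str, role: str, label: str) -> Optional[str]:
    """Return the first ref whose role and label match exactly, or None."""
    for ref, ref_role, ref_label in parse_refs(snapshot):
        if ref_role == role and ref_label == label:
            return ref
    return None


sample = '[Page Title] @e1 [link] "Home" @e2 [button] "Sign In" @e3 [input] "Search..."'
sign_in = find_ref(sample, "button", "Sign In")  # "@e2"
click_payload = {"command": "click", "args": [sign_in]}
```

The pattern works whether the snapshot puts elements on one line or one per line, because it only anchors on the `@e` ref, the bracketed role, and the quoted label.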
Scopes
| Scope | What it allows |
|---|---|
| `read` | snapshot, text, html, links, screenshot, url, tabs, console, etc. |
| `write` | goto, click, fill, scroll, newtab, closetab, etc. |
| `admin` | eval, js, cookies, storage, cookie-import, useragent, etc. |
| `meta` | tab, diff, frame, responsive, watch |
Default tokens get `read` and `write`. The `admin` scope requires the `--admin` flag when pairing.
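A client can check scopes locally before sending a command and avoid an avoidable 403. The table lists only some commands per scope (note the "etc."), so this Python sketch maps just the commands named there. Unlisted commands return `None` rather than a guess. `REQUIRED_SCOPE`, `required_scope`, and `allowed` are hypothetical.

```python
from typing import Dict, Iterable, Optional

# Only the commands the scope table names explicitly; the real server knows more.
_SCOPE_COMMANDS = {
    "read": ["snapshot", "text", "html", "links", "screenshot", "url", "tabs", "console"],
    "write": ["goto", "click", "fill", "scroll", "newtab", "closetab"],
    "admin": ["eval", "js", "cookies", "storage", "cookie-import", "useragent"],
    "meta": ["tab", "diff", "frame", "responsive", "watch"],
}

# Invert the table: command name -> scope it needs.
REQUIRED_SCOPE: Dict[str, str] = {
    cmd: scope for scope, cmds in _SCOPE_COMMANDS.items() for cmd in cmds
}


def required_scope(command: str) -> Optional[str]:
    return REQUIRED_SCOPE.get(command)


def allowed(command: str, granted: Iterable[str]) -> Optional[bool]:
    """True/False when the command's scope is known; None when the table doesn't list it."""
    scope = required_scope(command)
    if scope is None:
        return None
    return scope in set(granted)
```

Returning `None` for unknown commands keeps the check honest: the server remains the authority, and a `None` simply means "send it and handle a 403 if it comes back".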
Tab Isolation
Each agent owns the tabs it creates. Rules:
- Read: Any agent can read any tab (snapshot, text, screenshot)
- Write: Only the tab owner can write (click, fill, goto, etc.)
- Unowned tabs: Pre-existing tabs are root-only for writes
- First step: Always run `newtab` before trying to interact
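The ownership rules above amount to a small permission check. This Python sketch models them. `Tab`, `can_read`, `can_write`, and the `"root"` client marker are hypothetical. Scope checks are left out so that the ownership logic stays visible; a token still needs `read` or `write` scope.

```python
from dataclasses import dataclass
from typing import Optional

ROOT = "root"  # hypothetical client id standing in for the local root token


@dataclass
class Tab:
    tab_id: int
    owner: Optional[str]  # None for tabs that existed before any agent paired


def can_read(client: str, tab: Tab) -> bool:
    # Any agent can read any tab (snapshot, text, screenshot).
    return True


def can_write(client: str, tab: Tab) -> bool:
    if tab.owner is None:
        # Pre-existing (unowned) tabs are root-only for writes.
        return client == ROOT
    # Owned tabs: only the agent that created the tab may write to it.
    # (The reference doesn't say whether root may also write here; this sketch says no.)
    return client == tab.owner
```

This is why `newtab` comes first: a freshly created tab is the only one an agent is guaranteed to be able to write to.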
Error Codes
| Code | Meaning | What to do |
|---|---|---|
| 401 | Token invalid, expired, or revoked | Ask the user to run `/pair-agent` again |
| 403 | Command not in scope, or tab not yours | Create your own tab with `newtab`, or ask the user to re-pair with `--admin` |
| 429 | Rate limit exceeded (>10 req/s) | Wait for the delay given in the `Retry-After` header, then retry |
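A client can react to these codes mechanically. In this Python sketch, a 429 is retried after the `Retry-After` delay, which is parsed as either delta-seconds or an HTTP date (the two forms HTTP allows). A 401 or 403 is raised to the caller, because both need the user's involvement. `send_with_retry` and `retry_after_seconds` are hypothetical. The zero-argument `send` callable returning `(status, headers, body)` is an assumed transport shape. The 1-second fallback for a missing header is an arbitrary choice.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport result: (status, response headers, body text).
Response = Tuple[int, Dict[str, str], str]


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None,
                        default: float = 1.0) -> float:
    """Turn a Retry-After value (delta-seconds or HTTP-date) into a delay in seconds."""
    if not value:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # unparseable date; older Pythons raise TypeError
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def send_with_retry(send: Callable[[], Response], max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call send(), retrying on 429 and mapping 401/403 to the table's advice."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if 200 <= status < 300:
            return body
        if status == 429 and attempt < max_attempts:
            # Header names are case-insensitive in HTTP.
            retry_after = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
            sleep(retry_after_seconds(retry_after))
            continue
        if status == 401:
            raise PermissionError("401: token invalid, expired, or revoked; ask the user to run /pair-agent again")
        if status == 403:
            raise PermissionError("403: command not in scope or tab not yours; use newtab or ask for --admin")
        raise RuntimeError(f"HTTP {status}: {body}")
    raise RuntimeError("retries exhausted")  # not reached: the last attempt always returns or raises
```

Injecting `sleep` keeps the retry logic testable without actually waiting.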
Security Model
- Setup keys expire in 5 minutes and can only be used once
- Session tokens expire in 24 hours (configurable)
- The root token never appears in instruction blocks or connection strings
- Admin scope (JS execution, cookie access) is denied by default
- Tokens can be revoked instantly: `$B tunnel revoke agent-name`
- All agent activity is logged with attribution (`clientId`)
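The setup-key rules (five-minute lifetime, single use) follow a standard one-time-token pattern. This Python sketch illustrates that pattern only. It is not GStack's token registry, and `SetupKeyStore`, `issue`, and `redeem` are hypothetical. Keys are random URL-safe strings from the `secrets` module with the `gsk_setup_` prefix seen above.

```python
import secrets
import time
from typing import Callable, Dict

SETUP_KEY_TTL = 5 * 60  # seconds: setup keys expire in 5 minutes


class SetupKeyStore:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._keys: Dict[str, float] = {}  # key -> expiry time on the clock

    def issue(self) -> str:
        """Create a new random setup key that expires SETUP_KEY_TTL seconds from now."""
        key = "gsk_setup_" + secrets.token_urlsafe(24)
        self._keys[key] = self._clock() + SETUP_KEY_TTL
        return key

    def redeem(self, key: str) -> bool:
        """Consume the key: True only on its first use, and only before it expires."""
        expiry = self._keys.pop(key, None)  # pop makes it one-time: a second redeem finds nothing
        return expiry is not None and self._clock() < expiry
```

Removing the key on the first redeem attempt, even when it has already expired, means a leaked instruction block stops working as soon as it has been used once.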
Same-Machine Shortcut
If both agents are on the same machine, skip the copy-paste:
```bash
$B pair-agent --local openclaw   # writes to ~/.openclaw/skills/gstack/browse-remote.json
$B pair-agent --local codex      # writes to ~/.codex/skills/gstack/browse-remote.json
$B pair-agent --local cursor     # writes to ~/.cursor/skills/gstack/browse-remote.json
```
No tunnel needed. Uses localhost directly.
ngrok Tunnel Setup
For remote agents on different machines:
- Sign up at ngrok.com (free tier works)
- Copy your auth token from the dashboard
- Save it: `echo 'NGROK_AUTHTOKEN=your_token' > ~/.gstack/ngrok.env`
- Optionally, claim a stable domain: `echo 'NGROK_DOMAIN=your-name.ngrok-free.dev' >> ~/.gstack/ngrok.env`
- Start with the tunnel enabled: `BROWSE_TUNNEL=1 $B restart`
- Run `$B pair-agent`; it will use the tunnel URL automatically
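Before restarting with `BROWSE_TUNNEL=1`, you can confirm that `~/.gstack/ngrok.env` holds what the `echo` commands wrote. This Python sketch reads the simple `KEY=value` lines those commands produce. It is a hypothetical helper, not something `$B` ships, and it does not handle quoting or `export` prefixes.

```python
from pathlib import Path
from typing import Dict, List, Optional


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_ngrok_env(path: Optional[Path] = None) -> Dict[str, str]:
    """Read ~/.gstack/ngrok.env (or another path) into a dict."""
    path = path or Path.home() / ".gstack" / "ngrok.env"
    return parse_env_text(path.read_text())


def ngrok_env_problems(values: Dict[str, str]) -> List[str]:
    """Report settings missing from the steps above; NGROK_DOMAIN is optional."""
    problems = []
    if not values.get("NGROK_AUTHTOKEN"):
        problems.append("NGROK_AUTHTOKEN is missing")
    return problems
```

For example, `ngrok_env_problems(load_ngrok_env())` returns an empty list once the auth token has been saved.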