mirror of https://github.com/garrytan/gstack.git synced 2026-05-06 21:46:40 +02:00


Garry Tan bf66cec3d5 docs: remote browser access reference for paired agents

Full API reference, snapshot→@ref pattern, scopes, tab isolation,
error codes, ngrok setup, and same-machine shortcuts. The instruction
block points here for deeper reading.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-04 23:53:47 -07:00


Remote Browser Access — How to Pair With a GStack Browser

A GStack Browser server can be shared with any AI agent that can make HTTP requests. The agent gets scoped access to a real Chromium browser: navigate pages, read content, click elements, fill forms, take screenshots. Each agent gets its own tab.

This document is the reference for remote agents. The quick-start instructions are generated by $B pair-agent with the actual credentials baked in.

Architecture

Your Machine                          Remote Agent
─────────────                         ────────────
GStack Browser Server                 Any AI agent
  ├── Chromium (Playwright)           (OpenClaw, Hermes, Codex, etc.)
  ├── HTTP API on localhost:PORT           │
  ├── ngrok tunnel (optional)              │
  │     https://xxx.ngrok.dev ─────────────┘
  └── Token Registry
        ├── Root token (local only)
        ├── Setup keys (5 min, one-time)
        └── Session tokens (24h, scoped)

Connection Flow

1. User runs $B pair-agent (or /pair-agent in Claude Code)
2. Server creates a one-time setup key (expires in 5 minutes)
3. User copies the instruction block into the other agent's chat
4. Remote agent runs POST /connect with the setup key
5. Server returns a scoped session token (24h default)
6. Remote agent creates its own tab via POST /command with newtab
7. Remote agent browses using POST /command with its session token + tabId
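The agent-side half of this flow (connect, create a tab, send commands) can be sketched as below. This is a minimal sketch, not GStack's own client: the function names and the injectable `send` transport are hypothetical, and the way the tab id is read from the plain-text `newtab` result is an assumption, since this reference does not specify that format.

```python
import json
import re
import urllib.request
from typing import Callable

# Transport signature: (url, headers, json_body) -> response text.
Transport = Callable[[str, dict, dict], str]


def http_post(url: str, headers: dict, body: dict) -> str:
    """Default transport: POST a JSON body and return the response text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")


def pair_and_open_tab(base_url: str, setup_key: str, send: Transport = http_post):
    """Exchange the setup key for a session token, then create an owned tab."""
    session = json.loads(send(f"{base_url}/connect", {}, {"setup_key": setup_key}))
    auth = {"Authorization": f"Bearer {session['token']}"}
    result = send(f"{base_url}/command", auth, {"command": "newtab", "args": []})
    # ASSUMPTION: the plain-text newtab result contains the new tab's numeric id.
    match = re.search(r"\d+", result)
    tab_id = int(match.group()) if match else None
    return auth, tab_id


def run(base_url: str, auth: dict, tab_id, command: str, args: list,
        send: Transport = http_post) -> str:
    """Send one command against this agent's own tab."""
    body = {"command": command, "args": args, "tabId": tab_id}
    return send(f"{base_url}/command", auth, body)
```

In real use, `base_url` and the setup key come from the instruction block generated by $B pair-agent. Passing a fake `send` makes the flow easy to exercise without a running server.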

API Reference

Authentication

All endpoints except /connect and /health require a Bearer token:

Authorization: Bearer gsk_sess_...

Endpoints

POST /connect

Exchange a setup key for a session token. No auth required. Rate-limited to 3 requests per minute.

Request:  {"setup_key": "gsk_setup_..."}
Response: {"token": "gsk_sess_...", "expires": "ISO8601", "scopes": ["read","write"], "agent": "agent-name"}
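An agent will usually want to keep the fields of this response around and know when the token has lapsed. The sketch below parses the response shape shown above into a small record. The `Session` class and `parse_connect_response` are hypothetical names, and treating an offset-less timestamp as UTC is an assumption.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Session:
    token: str
    expires: datetime
    scopes: frozenset
    agent: str

    def is_expired(self, now=None) -> bool:
        """True once the expiry time has been reached."""
        now = now or datetime.now(timezone.utc)
        return now >= self.expires


def parse_connect_response(body: str) -> Session:
    data = json.loads(body)
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalise it.
    expires = datetime.fromisoformat(data["expires"].replace("Z", "+00:00"))
    if expires.tzinfo is None:
        # ASSUMPTION: timestamps without an offset are UTC.
        expires = expires.replace(tzinfo=timezone.utc)
    return Session(data["token"], expires, frozenset(data["scopes"]), data["agent"])
```

Checking `is_expired()` before each request lets the agent ask the user to pair again instead of waiting for a 401.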

POST /command

Send a browser command. Requires Bearer auth.

Request:  {"command": "goto", "args": ["https://example.com"], "tabId": 1}
Response: (plain text result of the command)
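As an illustration, the request above can be built with Python's standard library as follows. The helper names are hypothetical, and omitting `tabId` when none is given (for example, for the first `newtab` call) is an assumption rather than documented behavior.

```python
import json
import urllib.request


def build_command_request(base_url, token, command, args=(), tab_id=None):
    """Build (but do not send) a POST /command request with Bearer auth."""
    body = {"command": command, "args": list(args)}
    if tab_id is not None:
        body["tabId"] = tab_id
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/command",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req, timeout=30.0):
    """Send the request and return the plain-text command result."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A call such as `send(build_command_request(base_url, token, "goto", ["https://example.com"], tab_id=1))` would then return the plain-text result described above.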

GET /health

Server status. No auth required. Returns status, tabs, mode, uptime.

Commands

Navigation

Command	Args	Description
`goto`	`["URL"]`	Navigate to a URL
`back`	`[]`	Go back
`forward`	`[]`	Go forward
`reload`	`[]`	Reload page

Reading Content

Command	Args	Description
`snapshot`	`["-i"]`	Interactive snapshot with @ref labels (most useful)
`text`	`[]`	Full page text
`html`	`["selector?"]`	HTML of element or full page
`links`	`[]`	All links on page
`screenshot`	`["/tmp/s.png"]`	Take a screenshot
`url`	`[]`	Current URL

Interaction

Command	Args	Description
`click`	`["@e3"]`	Click an element (use @ref from snapshot)
`fill`	`["@e5", "text"]`	Fill a form field
`select`	`["@e7", "option"]`	Select dropdown value
`type`	`["text"]`	Type text (keyboard)
`press`	`["Enter"]`	Press a key
`scroll`	`["down"]`	Scroll the page

Tabs

Command	Args	Description
`newtab`	`["URL?"]`	Create a new tab (required before writing)
`tabs`	`[]`	List all tabs
`closetab`	`["id?"]`	Close a tab

The Snapshot → @ref Pattern

This is the most powerful browsing pattern. Instead of writing CSS selectors:

1. Run snapshot -i to get an interactive snapshot with labeled elements

2. The snapshot returns text like:

[Page Title]
@e1 [link] "Home"
@e2 [button] "Sign In"
@e3 [input] "Search..."

3. Use the @e refs directly in commands: click @e2, fill @e3 "search query"

Because the refs come from the snapshot itself, this is much more reliable than guessing CSS selectors. Always run snapshot -i first, then use the refs it returns.
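To pick a ref programmatically, an agent can parse the snapshot text. The sketch below assumes every element line has exactly the shape of the sample above (`@eN [role] "label"`). Real snapshots may contain other line shapes, which this parser skips. `parse_snapshot` and `find_ref` are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional


class Ref(NamedTuple):
    ref: str    # e.g. "@e2"
    role: str   # e.g. "button"
    label: str  # e.g. "Sign In"


# Matches lines such as: @e2 [button] "Sign In"
_LINE = re.compile(r'^\s*(@e\d+)\s+\[([^\]]+)\]\s*(?:"(.*)")?\s*$')


def parse_snapshot(text: str) -> list:
    """Return every labeled element found in a snapshot, in page order."""
    refs = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            refs.append(Ref(m.group(1), m.group(2), m.group(3) or ""))
    return refs


def find_ref(refs, role: str, label_contains: str) -> Optional[str]:
    """Return the first ref with this role whose label contains the text."""
    needle = label_contains.lower()
    for r in refs:
        if r.role == role and needle in r.label.lower():
            return r.ref
    return None
```

The returned ref can be passed straight into `click` or `fill` as the first argument.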

Scopes

Scope	What it allows
`read`	snapshot, text, html, links, screenshot, url, tabs, console, etc.
`write`	goto, click, fill, scroll, newtab, closetab, etc.
`admin`	eval, js, cookies, storage, cookie-import, useragent, etc.
`meta`	tab, diff, frame, responsive, watch

Default tokens get read + write. Admin requires --admin flag when pairing.
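An agent can check locally whether a command is likely to be in scope before sending it. The lookup below is a sketch built only from the commands named in the table. Each row ends in "etc.", so the mapping is incomplete and the server remains the authority. The names `SCOPE_COMMANDS`, `required_scope`, and `can_run` are hypothetical.

```python
from typing import Optional

# Only the commands explicitly named in the scopes table; the real lists are longer.
SCOPE_COMMANDS = {
    "read": {"snapshot", "text", "html", "links", "screenshot", "url", "tabs", "console"},
    "write": {"goto", "click", "fill", "scroll", "newtab", "closetab"},
    "admin": {"eval", "js", "cookies", "storage", "cookie-import", "useragent"},
    "meta": {"tab", "diff", "frame", "responsive", "watch"},
}

COMMAND_SCOPE = {
    cmd: scope for scope, cmds in SCOPE_COMMANDS.items() for cmd in cmds
}


def required_scope(command: str) -> Optional[str]:
    """Scope needed for a command, or None if the table does not list it."""
    return COMMAND_SCOPE.get(command)


def can_run(command: str, granted) -> bool:
    """False only when the table says a scope is needed and it is not granted.

    ASSUMPTION: unlisted commands are allowed through so the server can decide.
    """
    scope = required_scope(command)
    return scope is None or scope in granted
```

With the default scopes, `can_run("eval", {"read", "write"})` is false, which matches the note that admin commands need the --admin flag at pairing time.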

Tab Isolation

Each agent owns the tabs it creates. Rules:

Read: Any agent can read any tab (snapshot, text, screenshot)
Write: Only the tab owner can write (click, fill, goto, etc.)
Unowned tabs: Pre-existing tabs are root-only for writes
First step: Always run newtab before trying to interact
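The ownership rules can be expressed as a small predicate. This is an illustrative model of the rules as stated here from a non-root agent's point of view, not the server's actual implementation. It ignores scopes, and `agent_may` is a hypothetical name.

```python
from typing import Optional


def agent_may(agent: str, tab_owner: Optional[str], kind: str) -> bool:
    """Decide whether a non-root agent may act on a tab.

    kind is "read" or "write". tab_owner is None for pre-existing, unowned tabs.
    """
    if kind == "read":
        # Any agent can read any tab.
        return True
    if kind == "write":
        # Only the owner can write; unowned tabs are root-only for writes.
        return tab_owner is not None and tab_owner == agent
    raise ValueError(f"unknown access kind: {kind!r}")
```

This is why a fresh agent gets a 403 when it tries to click in a tab it did not create: until it runs newtab, it owns nothing it can write to.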

Error Codes

Code	Meaning	What to do
401	Token invalid, expired, or revoked	Ask user to run /pair-agent again
403	Command not in scope, or tab not yours	Use newtab, or ask for --admin
429	Rate limit exceeded (more than 10 requests per second)	Wait for the time given in the Retry-After header, then retry
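A client can turn these codes into a next step. The sketch below maps a status code to an action label and works out how long to wait on a 429. The action labels and the one-second fallback are hypothetical choices. Accepting both a number of seconds and an HTTP date for Retry-After follows the general HTTP definition of that header, since this reference does not say which form the server sends.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(retry_after, now=None, default=1.0) -> float:
    """Seconds to wait, from a Retry-After value (delta-seconds or HTTP date)."""
    if retry_after is None:
        return default
    value = retry_after.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def classify(status: int, headers: dict):
    """Map an HTTP status to (action, wait_seconds)."""
    if status == 401:
        return ("repair", None)      # ask the user to run /pair-agent again
    if status == 403:
        return ("forbidden", None)   # use newtab, or ask for --admin
    if status == 429:
        return ("wait", retry_delay(headers.get("Retry-After")))
    return ("ok", None) if 200 <= status < 300 else ("error", None)
```

A caller would sleep for the returned number of seconds on "wait" and then resend the same command.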

Security Model

Setup keys expire in 5 minutes and can only be used once
Session tokens expire in 24 hours (configurable)
The root token never appears in instruction blocks or connection strings
Admin scope (JS execution, cookie access) is denied by default
Tokens can be revoked instantly: $B tunnel revoke agent-name
All agent activity is logged with attribution (clientId)

Same-Machine Shortcut

If both agents are on the same machine, skip the copy-paste:

$B pair-agent --local openclaw    # writes to ~/.openclaw/skills/gstack/browse-remote.json
$B pair-agent --local codex       # writes to ~/.codex/skills/gstack/browse-remote.json
$B pair-agent --local cursor      # writes to ~/.cursor/skills/gstack/browse-remote.json

No tunnel needed. Uses localhost directly.

ngrok Tunnel Setup

For remote agents on different machines:

1. Sign up at ngrok.com (free tier works)
2. Copy your auth token from the dashboard
3. Save it: echo 'NGROK_AUTHTOKEN=your_token' > ~/.gstack/ngrok.env
4. Optionally claim a stable domain: echo 'NGROK_DOMAIN=your-name.ngrok-free.dev' >> ~/.gstack/ngrok.env
5. Start with tunnel: BROWSE_TUNNEL=1 $B restart
6. Run $B pair-agent. It will use the tunnel URL automatically.
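Before restarting with the tunnel enabled, it can help to confirm that ngrok.env contains what the steps above wrote into it. The check below is a sketch that parses simple KEY=VALUE lines. How GStack itself reads the file is not described here, and `parse_env` and `check_ngrok_env` are hypothetical helpers.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any, around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_ngrok_env(values: dict) -> list:
    """Return a list of problems; an empty list means the basics are present."""
    problems = []
    if not values.get("NGROK_AUTHTOKEN"):
        problems.append("NGROK_AUTHTOKEN is missing")
    return problems
```

For example, `check_ngrok_env(parse_env(pathlib.Path("~/.gstack/ngrok.env").expanduser().read_text()))` returns an empty list once the auth token line is in place. NGROK_DOMAIN is optional, so it is not checked.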
