diff --git a/docs/REMOTE_BROWSER_ACCESS.md b/docs/REMOTE_BROWSER_ACCESS.md
new file mode 100644
index 00000000..c7d22ca1
--- /dev/null
+++ b/docs/REMOTE_BROWSER_ACCESS.md
@@ -0,0 +1,178 @@
+# Remote Browser Access: How to Pair With a GStack Browser
+
+A GStack Browser server can be shared with any AI agent that can make HTTP requests.
+The agent gets scoped access to a real Chromium browser: navigate pages, read content,
+click elements, fill forms, take screenshots. Each agent gets its own tab.
+
+This document is the reference for remote agents. The quick-start instructions are
+generated by `$B pair-agent` with the actual credentials baked in.
+
+## Architecture
+
+```
+Your Machine                          Remote Agent
+─────────────                         ────────────
+GStack Browser Server                 Any AI agent
+  ├── Chromium (Playwright)           (OpenClaw, Hermes, Codex, etc.)
+  ├── HTTP API on localhost:PORT         │
+  ├── ngrok tunnel (optional)            │
+  │   https://xxx.ngrok.dev ─────────────┘
+  └── Token Registry
+      ├── Root token (local only)
+      ├── Setup keys (5 min, one-time)
+      └── Session tokens (24h, scoped)
+```
+
+## Connection Flow
+
+1. **User runs** `$B pair-agent` (or `/pair-agent` in Claude Code)
+2. **Server creates** a one-time setup key (expires in 5 minutes)
+3. **User copies** the instruction block into the other agent's chat
+4. **Remote agent runs** `POST /connect` with the setup key
+5. **Server returns** a scoped session token (24h default)
+6. **Remote agent creates** its own tab via `POST /command` with `newtab`
+7. **Remote agent browses** using `POST /command` with its session token + tabId
+
+## API Reference
+
+### Authentication
+
+All endpoints except `/connect` and `/health` require a Bearer token:
+
+```
+Authorization: Bearer gsk_sess_...
+```
+
+### Endpoints
+
+#### POST /connect
+Exchange a setup key for a session token. No auth required. Rate-limited to 3/minute.
+
+```json
+Request:  {"setup_key": "gsk_setup_..."}
+Response: {"token": "gsk_sess_...", "expires": "ISO8601", "scopes": ["read","write"], "agent": "agent-name"}
+```
+
+#### POST /command
+Send a browser command. Requires Bearer auth.
+
+```json
+Request:  {"command": "goto", "args": ["https://example.com"], "tabId": 1}
+Response: (plain text result of the command)
+```
+
+#### GET /health
+Server status. No auth required. Returns status, tabs, mode, uptime.
+
+### Commands
+
+#### Navigation
+| Command | Args | Description |
+|---------|------|-------------|
+| `goto` | `["URL"]` | Navigate to a URL |
+| `back` | `[]` | Go back |
+| `forward` | `[]` | Go forward |
+| `reload` | `[]` | Reload page |
+
+#### Reading Content
+| Command | Args | Description |
+|---------|------|-------------|
+| `snapshot` | `["-i"]` | Interactive snapshot with @ref labels (most useful) |
+| `text` | `[]` | Full page text |
+| `html` | `["selector?"]` | HTML of element or full page |
+| `links` | `[]` | All links on page |
+| `screenshot` | `["/tmp/s.png"]` | Take a screenshot |
+| `url` | `[]` | Current URL |
+
+#### Interaction
+| Command | Args | Description |
+|---------|------|-------------|
+| `click` | `["@e3"]` | Click an element (use @ref from snapshot) |
+| `fill` | `["@e5", "text"]` | Fill a form field |
+| `select` | `["@e7", "option"]` | Select dropdown value |
+| `type` | `["text"]` | Type text (keyboard) |
+| `press` | `["Enter"]` | Press a key |
+| `scroll` | `["down"]` | Scroll the page |
+
+#### Tabs
+| Command | Args | Description |
+|---------|------|-------------|
+| `newtab` | `["URL?"]` | Create a new tab (required before writing) |
+| `tabs` | `[]` | List all tabs |
+| `closetab` | `["id?"]` | Close a tab |
+
+## The Snapshot → @ref Pattern
+
+This is the most powerful browsing pattern. Instead of writing CSS selectors:
+
+1. Run `snapshot -i` to get an interactive snapshot with labeled elements
+2. The snapshot returns text like:
+   ```
+   [Page Title]
+   @e1 [link] "Home"
+   @e2 [button] "Sign In"
+   @e3 [input] "Search..."
+   ```
+3. Use the `@e` refs directly in commands: `click @e2`, `fill @e3 "search query"`
+
+This is how the snapshot system works, and it's much more reliable than guessing
+CSS selectors. Always `snapshot -i` first, then use the refs.
+
+## Scopes
+
+| Scope | What it allows |
+|-------|---------------|
+| `read` | snapshot, text, html, links, screenshot, url, tabs, console, etc. |
+| `write` | goto, click, fill, scroll, newtab, closetab, etc. |
+| `admin` | eval, js, cookies, storage, cookie-import, useragent, etc. |
+| `meta` | tab, diff, frame, responsive, watch |
+
+Default tokens get `read` + `write`. Admin requires `--admin` flag when pairing.
+
+## Tab Isolation
+
+Each agent owns the tabs it creates. Rules:
+- **Read:** Any agent can read any tab (snapshot, text, screenshot)
+- **Write:** Only the tab owner can write (click, fill, goto, etc.)
+- **Unowned tabs:** Pre-existing tabs are root-only for writes
+- **First step:** Always `newtab` before trying to interact
+
+## Error Codes
+
+| Code | Meaning | What to do |
+|------|---------|------------|
+| 401 | Token invalid, expired, or revoked | Ask user to run /pair-agent again |
+| 403 | Command not in scope, or tab not yours | Use newtab, or ask for --admin |
+| 429 | Rate limit exceeded (>10 req/s) | Wait for Retry-After header |
+
+## Security Model
+
+- Setup keys expire in 5 minutes and can only be used once
+- Session tokens expire in 24 hours (configurable)
+- The root token never appears in instruction blocks or connection strings
+- Admin scope (JS execution, cookie access) is denied by default
+- Tokens can be revoked instantly: `$B tunnel revoke agent-name`
+- All agent activity is logged with attribution (clientId)
+
+## Same-Machine Shortcut
+
+If both agents are on the same machine, skip the copy-paste:
+
+```bash
+$B pair-agent --local openclaw    # writes to ~/.openclaw/skills/gstack/browse-remote.json
+$B pair-agent --local codex       # writes to ~/.codex/skills/gstack/browse-remote.json
+$B pair-agent --local cursor      # writes to ~/.cursor/skills/gstack/browse-remote.json
+```
+
+No tunnel needed. Uses localhost directly.
+
+## ngrok Tunnel Setup
+
+For remote agents on different machines:
+
+1. Sign up at [ngrok.com](https://ngrok.com) (free tier works)
+2. Copy your auth token from the dashboard
+3. Save it: `echo 'NGROK_AUTHTOKEN=your_token' > ~/.gstack/ngrok.env`
+4. Optionally claim a stable domain: `echo 'NGROK_DOMAIN=your-name.ngrok-free.dev' >> ~/.gstack/ngrok.env`
+5. Start with tunnel: `BROWSE_TUNNEL=1 $B restart`
+6. Run `$B pair-agent`; it will use the tunnel URL automatically
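
The connection flow, the `POST /command` request shape, and the snapshot → @ref pattern described in this document could be wired together roughly as follows. This is a minimal sketch using only the Python standard library, not the project's own client. The helper names (`post_json`, `connect`, `command_body`, `run`, `parse_refs`) are hypothetical, the `Content-Type: application/json` header is an assumption, and the snapshot line format is inferred solely from the example snapshot shown above.

```python
import json
import re
import urllib.request


def post_json(url, payload, token=None):
    """POST a JSON body and return the raw response text."""
    # Assumption: the server accepts a JSON content type on request bodies.
    headers = {"Content-Type": "application/json"}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")


def connect(base_url, setup_key):
    """Exchange a one-time setup key for a session token via POST /connect."""
    reply = json.loads(post_json(f"{base_url}/connect", {"setup_key": setup_key}))
    return reply["token"]


def command_body(command, args=(), tab_id=None):
    """Build the JSON body for POST /command; tabId is left out when not given."""
    body = {"command": command, "args": list(args)}
    if tab_id is not None:
        body["tabId"] = tab_id
    return body


def run(base_url, token, command, args=(), tab_id=None):
    """Send one browser command and return its plain-text result."""
    return post_json(f"{base_url}/command", command_body(command, args, tab_id), token)


# Assumed line shape, taken from the sample snapshot: @e2 [button] "Sign In"
REF_LINE = re.compile(r'^(@e\d+)\s+\[(\w+)\]\s+"(.*)"$')


def parse_refs(snapshot_text):
    """Map each @ref in a snapshot to its (role, label) pair."""
    refs = {}
    for line in snapshot_text.splitlines():
        match = REF_LINE.match(line.strip())
        if match:
            ref, role, label = match.groups()
            refs[ref] = (role, label)
    return refs
```

A session would call `connect(base_url, setup_key)` once, then `run(base_url, token, "newtab", ["https://example.com"])`, then `run(base_url, token, "snapshot", ["-i"], tab_id)` and pass the result to `parse_refs` before issuing `click` or `fill`. This document does not specify how `newtab` reports the new tab's id, so read it from the command's text result. `urllib` raises `urllib.error.HTTPError` for the 401, 403, and 429 responses listed under Error Codes.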