mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-05 05:05:08 +02:00
docs: remote browser access reference for paired agents
Full API reference, snapshot→@ref pattern, scopes, tab isolation, error codes, ngrok setup, and same-machine shortcuts. The instruction block points here for deeper reading.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,178 @@
# Remote Browser Access: How to Pair With a GStack Browser

A GStack Browser server can be shared with any AI agent that can make HTTP requests.
The agent gets scoped access to a real Chromium browser: navigate pages, read content,
click elements, fill forms, take screenshots. Each agent gets its own tab.

This document is the reference for remote agents. The quick-start instructions are
generated by `$B pair-agent` with the actual credentials baked in.

## Architecture

```
Your Machine                          Remote Agent
─────────────                         ────────────
GStack Browser Server                 Any AI agent
  ├── Chromium (Playwright)           (OpenClaw, Hermes, Codex, etc.)
  ├── HTTP API on localhost:PORT            │
  ├── ngrok tunnel (optional)               │
  │     https://xxx.ngrok.dev ──────────────┘
  └── Token Registry
        ├── Root token (local only)
        ├── Setup keys (5 min, one-time)
        └── Session tokens (24h, scoped)
```

## Connection Flow

1. **User runs** `$B pair-agent` (or `/pair-agent` in Claude Code)
2. **Server creates** a one-time setup key (expires in 5 minutes)
3. **User copies** the instruction block into the other agent's chat
4. **Remote agent runs** `POST /connect` with the setup key
5. **Server returns** a scoped session token (24h default)
6. **Remote agent creates** its own tab via `POST /command` with `newtab`
7. **Remote agent browses** using `POST /command` with its session token + tabId

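Steps 4 to 7 can be sketched as a short sequence of calls. This is a minimal sketch, not the project's client: `connect`, `command`, and `parse_tab_id` are hypothetical callables supplied by the caller. The reference does not say what text `newtab` returns, so extracting the tab id is deliberately left to `parse_tab_id`.

```python
from typing import Callable


def pair_and_browse(
    connect: Callable[[dict], dict],        # hypothetical: POSTs a JSON body to /connect
    command: Callable[[str, dict], str],    # hypothetical: POSTs to /command with a Bearer token
    parse_tab_id: Callable[[str], int],     # hypothetical: reads the tab id out of newtab's text result
    setup_key: str,
    url: str,
) -> str:
    """Exchange the setup key, open a tab, navigate, and return a snapshot."""
    # Steps 4-5: trade the one-time setup key for a scoped session token.
    session = connect({"setup_key": setup_key})
    token = session["token"]

    # Step 6: create a tab owned by this agent before any write command.
    tab_id = parse_tab_id(command(token, {"command": "newtab", "args": []}))

    # Step 7: browse inside that tab using the session token and tabId.
    command(token, {"command": "goto", "args": [url], "tabId": tab_id})
    return command(token, {"command": "snapshot", "args": ["-i"], "tabId": tab_id})
```

Injecting the transport keeps the sequence visible and lets it be exercised without a live server; a real agent would back `connect` and `command` with its HTTP library of choice.
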
## API Reference

### Authentication

All endpoints except `/connect` and `/health` require a Bearer token:

```
Authorization: Bearer gsk_sess_...
```

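A client can encode this rule in one helper. The sketch below is illustrative only; `PUBLIC_PATHS` and `headers_for` are names made up for this example, not part of GStack.

```python
from typing import Dict, Optional

# Per the rule above, only these two paths skip the Bearer token.
PUBLIC_PATHS = {"/connect", "/health"}


def headers_for(path: str, token: Optional[str]) -> Dict[str, str]:
    """Return the auth headers a request to `path` needs."""
    if path in PUBLIC_PATHS:
        return {}
    if not token:
        # Fail locally instead of sending a request that should be rejected with 401.
        raise ValueError(f"{path} requires a session token")
    return {"Authorization": f"Bearer {token}"}
```
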
### Endpoints

#### POST /connect

Exchange a setup key for a session token. No auth required. Rate-limited to 3 requests per minute.

```json
Request:  {"setup_key": "gsk_setup_..."}
Response: {"token": "gsk_sess_...", "expires": "ISO8601", "scopes": ["read","write"], "agent": "agent-name"}
```

#### POST /command

Send a browser command. Requires Bearer auth.

```json
Request:  {"command": "goto", "args": ["https://example.com"], "tabId": 1}
Response: (plain text result of the command)
```

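As a rough illustration, the request above can be built with Python's standard library. Two things here are assumptions rather than documented behavior: that the server expects a `Content-Type: application/json` header, and that `tabId` may be omitted for commands such as `newtab`. The function names are hypothetical.

```python
import json
import urllib.request
from typing import List, Optional


def build_command_request(
    base_url: str,
    token: str,
    command: str,
    args: List[str],
    tab_id: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /command request."""
    body = {"command": command, "args": args}
    if tab_id is not None:
        body["tabId"] = tab_id
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/command",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption, see lead-in
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the plain-text result of the command."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

`base_url` is either the `localhost:PORT` address or the ngrok URL from the instruction block. Note that `urlopen` raises `urllib.error.HTTPError` on 4xx responses, so callers need to catch it to inspect the status code.
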
#### GET /health

Returns server status. No auth required. The response includes status, tabs, mode, and uptime.

### Commands

#### Navigation

| Command | Args | Description |
|---------|------|-------------|
| `goto` | `["URL"]` | Navigate to a URL |
| `back` | `[]` | Go back |
| `forward` | `[]` | Go forward |
| `reload` | `[]` | Reload page |

#### Reading Content

| Command | Args | Description |
|---------|------|-------------|
| `snapshot` | `["-i"]` | Interactive snapshot with @ref labels (most useful) |
| `text` | `[]` | Full page text |
| `html` | `["selector?"]` | HTML of element or full page |
| `links` | `[]` | All links on page |
| `screenshot` | `["/tmp/s.png"]` | Take a screenshot |
| `url` | `[]` | Current URL |

#### Interaction

| Command | Args | Description |
|---------|------|-------------|
| `click` | `["@e3"]` | Click an element (use @ref from snapshot) |
| `fill` | `["@e5", "text"]` | Fill a form field |
| `select` | `["@e7", "option"]` | Select dropdown value |
| `type` | `["text"]` | Type text (keyboard) |
| `press` | `["Enter"]` | Press a key |
| `scroll` | `["down"]` | Scroll the page |

#### Tabs

| Command | Args | Description |
|---------|------|-------------|
| `newtab` | `["URL?"]` | Create a new tab (required before writing) |
| `tabs` | `[]` | List all tabs |
| `closetab` | `["id?"]` | Close a tab |

## The Snapshot → @ref Pattern

This is the most powerful browsing pattern. Instead of writing CSS selectors:

1. Run `snapshot -i` to get an interactive snapshot with labeled elements
2. The snapshot returns text like:
   ```
   [Page Title]
   @e1 [link] "Home"
   @e2 [button] "Sign In"
   @e3 [input] "Search..."
   ```
3. Use the `@e` refs directly in commands: `click @e2`, `fill @e3 "search query"`

Refs are much more reliable than guessing CSS selectors. Always run `snapshot -i`
first, then use the refs.

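An agent that wants to pick refs programmatically can parse the snapshot text. The sketch below handles only the line shape shown in the sample above (`@eN [role] "label"`); real snapshots may contain other line shapes, which this parser simply skips. `Ref`, `parse_refs`, and `find_ref` are hypothetical helper names.

```python
import re
from typing import Dict, NamedTuple


class Ref(NamedTuple):
    role: str
    label: str


# Matches lines shaped like: @e2 [button] "Sign In"
REF_LINE = re.compile(r'^\s*(@e\d+)\s+\[([^\]]+)\]\s+"(.*)"\s*$')


def parse_refs(snapshot_text: str) -> Dict[str, Ref]:
    """Map each @ref in a snapshot to its role and label."""
    refs: Dict[str, Ref] = {}
    for line in snapshot_text.splitlines():
        match = REF_LINE.match(line)
        if match:
            ref, role, label = match.groups()
            refs[ref] = Ref(role, label)
    return refs


def find_ref(refs: Dict[str, Ref], role: str, label: str) -> str:
    """Return the first @ref with the given role and exact label."""
    for ref, item in refs.items():
        if item.role == role and item.label == label:
            return ref
    raise LookupError(f"no [{role}] labeled {label!r} in snapshot")
```

With the sample snapshot, `find_ref(parse_refs(text), "button", "Sign In")` yields `@e2`, which can be passed straight to `click`.
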
## Scopes

| Scope | What it allows |
|-------|---------------|
| `read` | snapshot, text, html, links, screenshot, url, tabs, console, etc. |
| `write` | goto, click, fill, scroll, newtab, closetab, etc. |
| `admin` | eval, js, cookies, storage, cookie-import, useragent, etc. |
| `meta` | tab, diff, frame, responsive, watch |

Default tokens get `read` + `write`. Admin requires the `--admin` flag when pairing.

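A client can check a command against its granted scopes before sending it. The lookup below is a sketch built only from the commands named in the table; commands hidden behind "etc." are unknown here, so the helper reports nothing for them and leaves the decision to the server. `SCOPE_COMMANDS` and `missing_scope` are hypothetical names.

```python
from typing import Iterable, Optional

# Only the commands explicitly listed in the scopes table.
SCOPE_COMMANDS = {
    "read": {"snapshot", "text", "html", "links", "screenshot", "url", "tabs", "console"},
    "write": {"goto", "click", "fill", "scroll", "newtab", "closetab"},
    "admin": {"eval", "js", "cookies", "storage", "cookie-import", "useragent"},
    "meta": {"tab", "diff", "frame", "responsive", "watch"},
}


def missing_scope(command: str, granted: Iterable[str]) -> Optional[str]:
    """Return the scope `command` needs but the token lacks, else None."""
    granted_set = set(granted)
    for scope, commands in SCOPE_COMMANDS.items():
        if command in commands:
            return None if scope in granted_set else scope
    return None  # not listed in the table: let the server decide
```

For a default token, `missing_scope("eval", ["read", "write"])` returns `"admin"`, which tells the agent to ask the user for `--admin` rather than retry.
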
## Tab Isolation

Each agent owns the tabs it creates. Rules:

- **Read:** Any agent can read any tab (snapshot, text, screenshot)
- **Write:** Only the tab owner can write (click, fill, goto, etc.)
- **Unowned tabs:** Pre-existing tabs are root-only for writes
- **First step:** Always `newtab` before trying to interact

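The rules above amount to a small decision function. This is an illustrative model, not the server's implementation; `can_use_tab` and its parameters are hypothetical. The reference does not say whether the root token may write to agent-owned tabs, so the sketch follows the literal rule that only the owner can.

```python
from typing import Optional


def can_use_tab(
    agent: str,
    tab_owner: Optional[str],   # None means a pre-existing, unowned tab
    is_write: bool,
    is_root: bool = False,
) -> bool:
    """Apply the tab isolation rules to a single command."""
    if not is_write:
        return True              # Read: any agent can read any tab
    if tab_owner is None:
        return is_root           # Unowned tabs: root-only for writes
    return tab_owner == agent    # Write: only the tab owner
```

A `False` result for a write is the case that surfaces as a 403, which is why agents should call `newtab` before interacting.
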
## Error Codes

| Code | Meaning | What to do |
|------|---------|------------|
| 401 | Token invalid, expired, or revoked | Ask user to run /pair-agent again |
| 403 | Command not in scope, or tab not yours | Use newtab, or ask for --admin |
| 429 | Rate limit exceeded (>10 req/s) | Wait for Retry-After header |

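The table can be turned into a small dispatch function on the client side. The sketch below assumes `Retry-After` carries a number of seconds (the header may also carry an HTTP date, which this sketch does not parse) and falls back to a one-second wait, an arbitrary default chosen for illustration. `Action`, `retry_after_seconds`, and `next_action` are hypothetical names.

```python
from typing import Mapping, NamedTuple


class Action(NamedTuple):
    kind: str            # "ok", "re_pair", "forbidden", "retry", or "fail"
    wait_seconds: float  # only meaningful when kind == "retry"
    hint: str


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read Retry-After as seconds; header names are matched case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return default  # e.g. an HTTP-date value, not handled here
    return default


def next_action(status: int, headers: Mapping[str, str]) -> Action:
    """Map an HTTP status to what the agent should do next."""
    if 200 <= status < 300:
        return Action("ok", 0.0, "")
    if status == 401:
        return Action("re_pair", 0.0, "Ask the user to run /pair-agent again")
    if status == 403:
        return Action("forbidden", 0.0, "Use newtab, or ask for --admin")
    if status == 429:
        return Action("retry", retry_after_seconds(headers), "Rate limit exceeded")
    return Action("fail", 0.0, f"Unexpected status {status}")
```
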
## Security Model

- Setup keys expire in 5 minutes and can only be used once
- Session tokens expire in 24 hours (configurable)
- The root token never appears in instruction blocks or connection strings
- Admin scope (JS execution, cookie access) is denied by default
- Tokens can be revoked instantly: `$B tunnel revoke agent-name`
- All agent activity is logged with attribution (clientId)

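Because session tokens expire, a long-running agent may want to check the `expires` value returned by `POST /connect` and ask for a new pairing before the token lapses, rather than waiting for a 401. A minimal sketch, assuming `expires` is an ISO 8601 timestamp as the response example indicates; the five-minute margin and the function names are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_expires(expires: str) -> datetime:
    """Parse the ISO 8601 `expires` field into an aware datetime."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    if expires.endswith("Z"):
        expires = expires[:-1] + "+00:00"
    parsed = datetime.fromisoformat(expires)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed


def needs_new_pairing(
    expires: str,
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(minutes=5),
) -> bool:
    """True when the token has expired or will within `margin`."""
    current = now if now is not None else datetime.now(timezone.utc)
    return parse_expires(expires) - current <= margin
```

Revocation cannot be detected this way, so a 401 still has to be handled even when the token has not yet expired.
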
## Same-Machine Shortcut

If both agents are on the same machine, skip the copy-paste:

```bash
$B pair-agent --local openclaw    # writes to ~/.openclaw/skills/gstack/browse-remote.json
$B pair-agent --local codex       # writes to ~/.codex/skills/gstack/browse-remote.json
$B pair-agent --local cursor      # writes to ~/.cursor/skills/gstack/browse-remote.json
```

No tunnel needed. Uses localhost directly.

## ngrok Tunnel Setup

For remote agents on different machines:

1. Sign up at [ngrok.com](https://ngrok.com) (free tier works)
2. Copy your auth token from the dashboard
3. Save it: `echo 'NGROK_AUTHTOKEN=your_token' > ~/.gstack/ngrok.env`
4. Optionally claim a stable domain: `echo 'NGROK_DOMAIN=your-name.ngrok-free.dev' >> ~/.gstack/ngrok.env`
5. Start with tunnel: `BROWSE_TUNNEL=1 $B restart`
6. Run `$B pair-agent`; it will use the tunnel URL automatically