mirror of https://github.com/garrytan/gstack.git synced 2026-05-06 13:45:35 +02:00


Garry Tan 3bff673671 docs: update project documentation for v1.6.0.0

Reflect the dual-listener tunnel architecture, SSE session cookies,
SSRF guards, and Windows v20 ABE non-goal across the three docs
users actually read for remote-agent and browser auth context:

- docs/REMOTE_BROWSER_ACCESS.md: rewrote Architecture diagram for
  dual listeners, fixed /connect rate limit (3/min → 300/min),
  removed stale "/health requires no auth" (now 404 on tunnel),
  added SSE cookie auth, expanded Security Model with tunnel
  allowlist, SSRF guards, /welcome path traversal defense, and
  the Windows v20 ABE tracking note.
- BROWSER.md: added dual-listener paragraph to Authentication and
  linked to ARCHITECTURE.md endpoint table. Replaced the stale
  ?token= SSE auth note with the HttpOnly gstack_sse cookie flow.
- CLAUDE.md: added Transport-layer security section above the
  sidebar prompt-injection stack so contributors editing server.ts,
  sse-session-cookie.ts, or tunnel-denial-log.ts see the load-bearing
  module boundaries before touching them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-21 21:11:55 -07:00


Remote Browser Access — How to Pair With a GStack Browser

A GStack Browser server can be shared with any AI agent that can make HTTP requests. The agent gets scoped access to a real Chromium browser: navigate pages, read content, click elements, fill forms, take screenshots. Each agent gets its own tab.

This document is the reference for remote agents. The quick-start instructions are generated by $B pair-agent with the actual credentials baked in.

Architecture

Your Machine                          Remote Agent
─────────────                         ────────────
GStack Browser Server                 Any AI agent
  ├── Chromium (Playwright)           (OpenClaw, Hermes, Codex, etc.)
  ├── Local listener  127.0.0.1:LOCAL         │
  │    (bootstrap, CLI, sidebar, cookies)      │
  ├── Tunnel listener 127.0.0.1:TUNNEL ◄───────┤
  │    (pair-agent only: /connect, /command,   │
  │     /sidebar-chat — locked allowlist)      │
  ├── ngrok tunnel (forwards tunnel port only) │
  │     https://xxx.ngrok.dev ─────────────────┘
  └── Token Registry
        ├── Root token (local listener only)
        ├── Setup keys (5 min, one-time)
        ├── Session tokens (24h, scoped)
        └── SSE session cookies (30 min, stream-scope)
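The setup-key entry in the Token Registry (5 minutes, one-time) can be sketched as a small in-memory store. This is an illustrative sketch only: the class name `SetupKeyRegistry`, its methods, and the hex encoding of the key are assumptions, not the server's actual implementation.

```python
import secrets
from datetime import datetime, timedelta, timezone

# Lifetime taken from the diagram above; everything else here is hypothetical.
SETUP_KEY_TTL = timedelta(minutes=5)


class SetupKeyRegistry:
    """Hypothetical in-memory registry for one-time setup keys."""

    def __init__(self):
        self._issued = {}  # key -> time it was issued

    def issue(self, now):
        # 24 random bytes; hex encoding is an assumption made for this sketch.
        key = "gsk_setup_" + secrets.token_hex(24)
        self._issued[key] = now
        return key

    def redeem(self, key, now):
        """Return True at most once per key, and only inside the 5-minute window."""
        issued_at = self._issued.pop(key, None)  # pop() enforces one-time use
        if issued_at is None:
            return False
        return now - issued_at <= SETUP_KEY_TTL


now = datetime.now(timezone.utc)
registry = SetupKeyRegistry()
key = registry.issue(now)
print(registry.redeem(key, now))  # True: first use, still fresh
print(registry.redeem(key, now))  # False: the key was already consumed
```

Removing the key on the first redemption attempt, whether or not it is still fresh, keeps the one-time guarantee simple: a key can never be retried.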

Dual-listener architecture (v1.6.0.0)

The daemon binds two HTTP sockets. The local listener serves the full command surface to 127.0.0.1 only and is never forwarded. The tunnel listener is bound lazily on /tunnel/start (and torn down on /tunnel/stop) with a locked path allowlist. ngrok forwards only the tunnel port.

A caller who stumbles onto your ngrok URL cannot reach /health, /cookie-picker, /inspector/*, or /welcome — those paths don't exist on that TCP socket. Root tokens sent over the tunnel get 403. The tunnel listener accepts only /connect, /command (with a scoped token + the 17-command browser-driving allowlist), and /sidebar-chat.

See ARCHITECTURE.md for the full endpoint table.
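As a rough illustration of the tunnel-listener rules described above, the following sketch classifies a request by path and token kind. The function name, the `token_kind` labels, and the ordering of the checks are assumptions; the reason strings are the ones the Security Model lists for the denial log, and only the 404 and 403 statuses are stated in this document.

```python
# Paths the text names as reachable on the tunnel listener.
TUNNEL_PATHS = {"/connect", "/command", "/sidebar-chat"}

# Status codes this document states explicitly; other reasons are left unmapped.
KNOWN_STATUS = {
    "path_not_on_tunnel": 404,
    "root_token_on_tunnel": 403,
}


def classify_tunnel_request(path, token_kind):
    """Return a denial reason, or None if the request may proceed.

    token_kind is a hypothetical label: "root", "scoped", or None (no token).
    """
    if path not in TUNNEL_PATHS:
        return "path_not_on_tunnel"
    if token_kind == "root":
        return "root_token_on_tunnel"
    if path == "/command" and token_kind != "scoped":
        return "missing_scoped_token"
    return None


print(classify_tunnel_request("/health", None))       # path_not_on_tunnel
print(classify_tunnel_request("/command", "root"))    # root_token_on_tunnel
print(classify_tunnel_request("/command", "scoped"))  # None
```

The per-command allowlist for /command is not modeled here, since this document does not enumerate all 17 commands.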

Connection Flow

1. User runs $B pair-agent (or /pair-agent in Claude Code)
2. Server creates a one-time setup key (expires in 5 minutes)
3. User copies the instruction block into the other agent's chat
4. Remote agent runs POST /connect with the setup key
5. Server returns a scoped session token (24h default)
6. Remote agent creates its own tab via POST /command with newtab
7. Remote agent browses using POST /command with its session token + tabId

API Reference

Authentication

All command endpoints require a Bearer token:

Authorization: Bearer gsk_sess_...

/connect is unauthenticated (rate-limited) — it's how a remote agent exchanges a setup key for a scoped session token. /health is unauthenticated on the local listener (bootstrap) but does NOT exist on the tunnel listener (404).

SSE endpoints (/activity/stream, /inspector/events) accept either a Bearer token or the HttpOnly gstack_sse cookie (minted via POST /sse-session, 30-minute TTL, stream-scope only — cannot be used against /command). As of v1.6.0.0 the ?token=<ROOT> query-string auth is no longer accepted.

Endpoints

POST /connect

Exchange a setup key for a session token. No auth required. Rate-limited to 300 requests/minute. The limit is flood defense only: setup keys are 24 random bytes, so brute-forcing one is infeasible.

Request:  {"setup_key": "gsk_setup_..."}
Response: {"token": "gsk_sess_...", "expires": "ISO8601", "scopes": ["read","write"], "agent": "agent-name"}
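A minimal client-side sketch of this exchange, using only the Python standard library. The helper names are hypothetical, and the `Content-Type: application/json` header is an assumption (the document shows a JSON body but does not name the header). The request is built but not sent, so the example runs without a server.

```python
import json
import urllib.request
from datetime import datetime


def build_connect_request(base_url, setup_key):
    """Build (but do not send) the POST /connect request."""
    body = json.dumps({"setup_key": setup_key}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/connect",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def parse_connect_response(raw):
    """Extract the session token, expiry, and scopes from the JSON reply."""
    reply = json.loads(raw)
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    expires = datetime.fromisoformat(reply["expires"].replace("Z", "+00:00"))
    return reply["token"], expires, set(reply["scopes"])


request = build_connect_request("https://xxx.ngrok.dev", "gsk_setup_example")
print(request.get_method(), request.full_url)
```

To actually perform the exchange, the request would be passed to `urllib.request.urlopen` and the response body handed to `parse_connect_response`.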

POST /command

Send a browser command. Requires Bearer auth.

Request:  {"command": "goto", "args": ["https://example.com"], "tabId": 1}
Response: (plain text result of the command)
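One way a remote agent might build this request in Python is sketched below. The function names are hypothetical; the JSON `Content-Type` header and the choice to omit `tabId` when none is given (for example on `newtab`) are assumptions, not documented server behavior.

```python
import json
import urllib.request


def build_command_request(base_url, token, command, args, tab_id=None):
    """Build (but do not send) a POST /command request with Bearer auth."""
    payload = {"command": command, "args": list(args)}
    if tab_id is not None:
        payload["tabId"] = tab_id
    return urllib.request.Request(
        base_url.rstrip("/") + "/command",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",  # assumed header
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the plain-text command result."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_command_request(
    "https://xxx.ngrok.dev", "gsk_sess_example", "goto", ["https://example.com"], tab_id=1
)
print(request.data.decode("utf-8"))
```

Keeping request construction separate from `send` makes the payload easy to inspect or log before anything reaches the browser.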

GET /health

Server status. No auth required on the local listener. Not available on the tunnel listener (404). Returns status, tabs, mode, uptime.

Commands

Command	Args	Description
`goto`	`["URL"]`	Navigate to a URL
`back`	`[]`	Go back
`forward`	`[]`	Go forward
`reload`	`[]`	Reload page

Reading Content

Command	Args	Description
`snapshot`	`["-i"]`	Interactive snapshot with @ref labels (most useful)
`text`	`[]`	Full page text
`html`	`["selector?"]`	HTML of element or full page
`links`	`[]`	All links on page
`screenshot`	`["/tmp/s.png"]`	Take a screenshot
`url`	`[]`	Current URL

Interaction

Command	Args	Description
`click`	`["@e3"]`	Click an element (use @ref from snapshot)
`fill`	`["@e5", "text"]`	Fill a form field
`select`	`["@e7", "option"]`	Select dropdown value
`type`	`["text"]`	Type text (keyboard)
`press`	`["Enter"]`	Press a key
`scroll`	`["down"]`	Scroll the page

Tabs

Command	Args	Description
`newtab`	`["URL?"]`	Create a new tab (required before writing)
`tabs`	`[]`	List all tabs
`closetab`	`["id?"]`	Close a tab

The Snapshot → @ref Pattern

This is the most powerful browsing pattern. Instead of writing CSS selectors:

1. Run snapshot -i to get an interactive snapshot with labeled elements

2. The snapshot returns text like:

   [Page Title]
   @e1 [link] "Home"
   @e2 [button] "Sign In"
   @e3 [input] "Search..."

3. Use the @e refs directly in commands: click @e2, fill @e3 "search query"

Refs come straight from the page's current snapshot, so they are much more reliable than guessing CSS selectors. Always run snapshot -i first, then use the refs.
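An agent that wants to pick refs programmatically could parse the snapshot text along these lines. This is a sketch under one assumption: that every labeled line looks like the sample above (`@eN [role] "label"`). Real snapshots may contain other line shapes, which this parser simply skips. The helper names are hypothetical.

```python
import re

# Matches lines shaped like: @e2 [button] "Sign In"
REF_LINE = re.compile(r'^(@e\d+) \[([^\]]+)\] "(.*)"$')


def parse_refs(snapshot_text):
    """Map each @ref to its (role, label) pair; other lines are ignored."""
    refs = {}
    for line in snapshot_text.splitlines():
        match = REF_LINE.match(line.strip())
        if match:
            ref, role, label = match.groups()
            refs[ref] = (role, label)
    return refs


def find_ref(refs, role, label):
    """Return the first ref with the given role and label, or None."""
    for ref, (ref_role, ref_label) in refs.items():
        if ref_role == role and ref_label == label:
            return ref
    return None


sample = '[Page Title]\n@e1 [link] "Home"\n@e2 [button] "Sign In"\n@e3 [input] "Search..."'
refs = parse_refs(sample)
print(find_ref(refs, "button", "Sign In"))  # @e2
```

The returned ref can then be passed as the argument to `click` or `fill`.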

Scopes

Scope	What it allows
`read`	snapshot, text, html, links, screenshot, url, tabs, console, etc.
`write`	goto, click, fill, scroll, newtab, closetab, etc.
`admin`	eval, js, cookies, storage, cookie-import, useragent, etc.
`meta`	tab, diff, frame, responsive, watch

Default tokens get read + write. Admin requires the --admin flag when pairing.
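A client can check a command against its granted scopes before sending it, which avoids a round trip that would end in a 403. The mapping below is copied from the table above and is deliberately partial (the table ends each row with "etc."); the function names are hypothetical.

```python
# Partial mapping taken from the Scopes table; the server's full list is longer.
COMMAND_SCOPES = {
    "read": {"snapshot", "text", "html", "links", "screenshot", "url", "tabs", "console"},
    "write": {"goto", "click", "fill", "scroll", "newtab", "closetab"},
    "admin": {"eval", "js", "cookies", "storage", "cookie-import", "useragent"},
    "meta": {"tab", "diff", "frame", "responsive", "watch"},
}


def required_scope(command):
    """Return the scope a command needs, or None if this table does not list it."""
    for scope, commands in COMMAND_SCOPES.items():
        if command in commands:
            return scope
    return None


def is_in_scope(command, granted):
    """True only if the command is listed and its scope was granted."""
    scope = required_scope(command)
    return scope is not None and scope in granted


default_scopes = {"read", "write"}
print(is_in_scope("goto", default_scopes))  # True
print(is_in_scope("eval", default_scopes))  # False: needs admin
```

Commands missing from this partial table are treated as out of scope here; the server remains the authority on what a token may do.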

Tab Isolation

Each agent owns the tabs it creates. Rules:

Read: Any agent can read any tab (snapshot, text, screenshot)
Write: Only the tab owner can write (click, fill, goto, etc.)
Unowned tabs: Pre-existing tabs are root-only for writes
First step: Always newtab before trying to interact
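The rules above reduce to a small decision function. This sketch is illustrative only: the function name and parameters are hypothetical, and because the rules do not say whether root may write to an agent-owned tab, the sketch applies the owner-only rule literally.

```python
def can_use_tab(agent, tab_owner, is_write, is_root=False):
    """Apply the tab isolation rules.

    agent:     the caller's client id
    tab_owner: the owning client id, or None for a pre-existing (unowned) tab
    """
    if not is_write:
        return True  # any agent can read any tab
    if tab_owner is None:
        return is_root  # unowned tabs are root-only for writes
    return agent == tab_owner  # only the tab owner can write


print(can_use_tab("agent-a", "agent-b", is_write=False))  # True: reads are open
print(can_use_tab("agent-a", "agent-b", is_write=True))   # False: not the owner
print(can_use_tab("agent-a", None, is_write=True))        # False: unowned tab
```

This is why the first step is always `newtab`: a freshly created tab is the only one an agent is guaranteed to be able to write to.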

Error Codes

Code	Meaning	What to do
401	Token invalid, expired, or revoked	Ask user to run /pair-agent again
403	Command not in scope, or tab not yours	Use newtab, or ask for --admin
429	Rate limit exceeded (>10 req/s)	Wait for Retry-After header
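A remote agent might translate these codes into next steps roughly as follows. The action labels and the 1-second default delay are assumptions made for this sketch, and only the seconds form of Retry-After is handled (the header may also carry an HTTP date).

```python
def next_action(status, headers):
    """Map an HTTP status to a (action, delay_seconds) pair for the agent loop."""
    if status == 401:
        return ("ask_user_to_pair_again", None)
    if status == 403:
        return ("use_newtab_or_request_admin", None)
    if status == 429:
        raw = headers.get("Retry-After", "").strip()
        # Fall back to an assumed 1-second pause if the header is absent or not numeric.
        delay = float(raw) if raw.isdigit() else 1.0
        return ("wait", delay)
    if 200 <= status < 300:
        return ("ok", None)
    return ("fail", None)


print(next_action(429, {"Retry-After": "2"}))  # ('wait', 2.0)
print(next_action(401, {}))                    # ('ask_user_to_pair_again', None)
```

`headers` is treated as a plain dict here, so the lookup is case-sensitive; a real client would normally use its HTTP library's case-insensitive header mapping.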

Security Model

Physical port separation. Local listener and tunnel listener are separate TCP sockets. ngrok only forwards the tunnel port. Tunnel callers cannot reach bootstrap endpoints at all (404, wrong port).
Tunnel command allowlist. /command over the tunnel only accepts 17 browser-driving commands (goto, click, fill, snapshot, text, etc.). Server-management commands (tunnel, pair, token, useragent, eval, js) are denied on the tunnel.
Root token is tunnel-blocked. A request bearing the root token over the tunnel listener returns 403 with a pairing hint. Only scoped session tokens work over the tunnel.
Setup keys expire in 5 minutes and can only be used once.
Session tokens expire in 24 hours (configurable).
The root token never appears in instruction blocks or connection strings.
Admin scope (JS execution, cookie access) is denied by default.
Tokens can be revoked instantly: $B tunnel revoke agent-name
SSE auth uses a 30-minute HttpOnly SameSite=Strict cookie, stream-scope only (never valid against /command).
Path traversal is guarded on /welcome: GSTACK_SLUG must match ^[a-z0-9_-]+$; otherwise the server falls back to the built-in template.
SSRF guards on the goto, download, and scrape paths validate the URL target against a localhost/private-range blocklist.
Tunnel surface denial logging. Every rejection on the tunnel listener (path_not_on_tunnel, root_token_on_tunnel, missing_scoped_token, disallowed_command:*) is appended to ~/.gstack/security/attempts.jsonl with timestamp, source IP, path, method. Rate-capped at 60 writes/min.
All agent activity is logged with attribution (clientId).
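Two of the guards above, the /welcome slug check and the SSRF blocklist, can be illustrated in a few lines. This is a simplified sketch, not the server's code: the function names and the `"default"` fallback value are hypothetical, and the SSRF check only inspects literal hosts. A production guard would also need to check the addresses a hostname resolves to.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Same character class as the documented pattern ^[a-z0-9_-]+$.
SLUG_PATTERN = re.compile(r"[a-z0-9_-]+")


def safe_slug(slug, fallback="default"):
    """Return the slug if it is safe to use in a path, else the fallback."""
    # fullmatch() avoids the trailing-newline case that "$" would accept.
    return slug if SLUG_PATTERN.fullmatch(slug) else fallback


def is_blocked_target(url):
    """True if the URL's literal host is localhost or a private-range address."""
    host = urlparse(url).hostname
    if host is None:
        return True  # no host to validate, so refuse
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; resolution checks are out of scope here
    return address.is_loopback or address.is_private or address.is_link_local


print(safe_slug("../etc/passwd"))                   # default
print(is_blocked_target("http://169.254.169.254"))  # True
print(is_blocked_target("https://example.com"))     # False
```

Rejecting by pattern rather than stripping bad characters means a malformed slug can never be "repaired" into a valid-looking path.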

Known non-goal (tracked as #1136): on Windows, the cookie-import-browser path launches Chrome with --remote-debugging-port=<random>. With App-Bound Encryption v20, a same-user local process can connect to that port and exfiltrate decrypted v20 cookies — an elevation path relative to reading the SQLite DB directly. Fix direction is --remote-debugging-pipe instead of TCP.

Same-Machine Shortcut

If both agents are on the same machine, skip the copy-paste:

$B pair-agent --local openclaw    # writes to ~/.openclaw/skills/gstack/browse-remote.json
$B pair-agent --local codex       # writes to ~/.codex/skills/gstack/browse-remote.json
$B pair-agent --local cursor      # writes to ~/.cursor/skills/gstack/browse-remote.json

No tunnel needed. Uses localhost directly.

ngrok Tunnel Setup

For remote agents on different machines:

1. Sign up at ngrok.com (free tier works)
2. Copy your auth token from the dashboard
3. Save it: echo 'NGROK_AUTHTOKEN=your_token' > ~/.gstack/ngrok.env
4. Optionally claim a stable domain: echo 'NGROK_DOMAIN=your-name.ngrok-free.dev' >> ~/.gstack/ngrok.env
5. Start with tunnel: BROWSE_TUNNEL=1 $B restart
6. Run $B pair-agent. It will use the tunnel URL automatically
