# Remote Browser Access: How to Pair With a GStack Browser

A GStack Browser server can be shared with any AI agent that can make HTTP requests.
The agent gets scoped access to a real Chromium browser: navigate pages, read content,
click elements, fill forms, take screenshots. Each agent gets its own tab.

This document is the reference for remote agents. The quick-start instructions are
generated by `$B pair-agent` with the actual credentials baked in.

## Architecture

```
Your Machine                                  Remote Agent
─────────────                                 ────────────
GStack Browser Server                         Any AI agent
  ├── Chromium (Playwright)                   (OpenClaw, Hermes, Codex, etc.)
  ├── Local listener 127.0.0.1:LOCAL              │
  │     (bootstrap, CLI, sidebar, cookies)        │
  ├── Tunnel listener 127.0.0.1:TUNNEL  ◄─────────┤
  │     (pair-agent only: /connect, /command,     │
  │      /sidebar-chat, locked allowlist)         │
  ├── ngrok tunnel (forwards tunnel port only)    │
  │     https://xxx.ngrok.dev ────────────────────┘
  └── Token Registry
        ├── Root token (local listener only)
        ├── Setup keys (5 min, one-time)
        ├── Session tokens (24h, scoped)
        └── SSE session cookies (30 min, stream-scope)
```

### Dual-listener architecture (v1.6.0.0)

The daemon binds two HTTP sockets. The **local listener** serves the full command surface to 127.0.0.1 only and is never forwarded. The **tunnel listener** is bound lazily on `/tunnel/start` (and torn down on `/tunnel/stop`) with a locked path allowlist. ngrok forwards only the tunnel port.

A caller who stumbles onto your ngrok URL cannot reach `/health`, `/cookie-picker`, `/inspector/*`, or `/welcome`: those paths don't exist on that TCP socket. Requests that present the root token over the tunnel get a 403. The tunnel listener accepts only `/connect`, `/command` (with a scoped token and a command from the 26-command browser-driving allowlist), and `/sidebar-chat`.

See [ARCHITECTURE.md](../ARCHITECTURE.md#dual-listener-tunnel-architecture-v1600) for the full endpoint table.
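
To make the tunnel gate concrete, here is a minimal TypeScript sketch of the checks a tunnel request passes through. The function `tunnelGate`, the request shape, and the shortened command list are hypothetical illustrations, not the server's actual code. The rejection reasons mirror the denial-log reasons listed under Security Model. Returning 403 for a missing token is an assumption, and validation of the scoped token itself is left out.

```typescript
// Hypothetical sketch of the tunnel listener's request gate.
// Only the checks described in this document are modeled; scoped-token
// validation and per-tab ownership are left out.

interface TunnelRequest {
  path: string;
  bearer?: string;   // value after "Authorization: Bearer "
  command?: unknown; // body.command for POST /command
}

interface GateResult {
  status: number;  // 200 = allowed through to the handler
  reason?: string; // denial reason, as written to the attempts log
}

// The only paths that exist on the tunnel socket.
const TUNNEL_PATHS = new Set(["/connect", "/command", "/sidebar-chat"]);

// Abbreviated for illustration; the real allowlist has 26 commands.
const TUNNEL_COMMANDS = new Set([
  "goto", "click", "fill", "snapshot", "text", "newtab",
  "tabs", "back", "forward", "reload", "closetab", "url",
]);

function tunnelGate(req: TunnelRequest, rootToken: string): GateResult {
  // Unknown paths behave as if they do not exist on this socket.
  if (!TUNNEL_PATHS.has(req.path)) {
    return { status: 404, reason: "path_not_on_tunnel" };
  }
  // The root token is never honored over the tunnel.
  if (req.bearer === rootToken) {
    return { status: 403, reason: "root_token_on_tunnel" };
  }
  if (req.path === "/command") {
    if (!req.bearer) {
      return { status: 403, reason: "missing_scoped_token" };
    }
    // Non-string or non-allowlisted commands are rejected.
    if (typeof req.command !== "string" || !TUNNEL_COMMANDS.has(req.command)) {
      return { status: 403, reason: `disallowed_command:${String(req.command)}` };
    }
  }
  return { status: 200 };
}
```

A non-string `command` (null, a number, a missing field) fails the `typeof` check. Malformed bodies are therefore denied with a logged reason instead of reaching a handler.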

## Connection Flow

1. **User runs** `$B pair-agent` (or `/pair-agent` in Claude Code)
2. **Server creates** a one-time setup key (expires in 5 minutes)
3. **User copies** the instruction block into the other agent's chat
4. **Remote agent runs** `POST /connect` with the setup key
5. **Server returns** a scoped session token (24h default)
6. **Remote agent creates** its own tab via `POST /command` with `newtab`
7. **Remote agent browses** using `POST /command` with its session token + tabId
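
Seen from the remote agent's side, steps 4 through 7 could look like this minimal TypeScript sketch. It relies only on the request and response shapes documented in the API reference below. `connect`, `runCommand`, and the `FetchLike` type are hypothetical helper names. The fetch function is injected so the sketch can be exercised without a live server.

```typescript
// Sketch of a remote agent's pairing flow against the documented endpoints.
// FetchLike accepts the global fetch or a test double.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Steps 4-5: trade the one-time setup key for a scoped session token.
async function connect(base: string, setupKey: string, fetchImpl: FetchLike): Promise<string> {
  const res = await fetchImpl(`${base}/connect`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ setup_key: setupKey }),
  });
  if (!res.ok) throw new Error(`/connect failed with HTTP ${res.status}`);
  const data = JSON.parse(await res.text()) as { token: string };
  return data.token;
}

// Steps 6-7: send one browser command with the session token.
// The response body is the command's plain-text result.
async function runCommand(
  base: string,
  token: string,
  command: string,
  args: string[],
  fetchImpl: FetchLike,
  tabId?: number,
): Promise<string> {
  const body: { command: string; args: string[]; tabId?: number } = { command, args };
  if (tabId !== undefined) body.tabId = tabId;
  const res = await fetchImpl(`${base}/command`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${command} failed with HTTP ${res.status}`);
  return res.text();
}
```

Against a real server you would:

1. Pass the global `fetch`.
2. Call `runCommand(base, token, "newtab", ["https://example.com"], fetch)`.
3. Read the tab id out of the plain-text reply. Its exact wording is not specified here.
4. Pass that id as `tabId` on every later command.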

## API Reference

### Authentication

All command endpoints require a Bearer token:

```
Authorization: Bearer gsk_sess_...
```

`/connect` is unauthenticated (rate-limited). It is how a remote agent exchanges a setup key for a scoped session token. `/health` is unauthenticated on the local listener (bootstrap) but does NOT exist on the tunnel listener (404).

SSE endpoints (`/activity/stream`, `/inspector/events`) accept either a Bearer token or the HttpOnly `gstack_sse` cookie. The cookie is minted via `POST /sse-session`, has a 30-minute TTL, and is stream-scope only, so it cannot be used against `/command`. As of v1.6.0.0 the `?token=<ROOT>` query-string auth is no longer accepted.
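
As a rough model of these rules, the sketch below decides whether a request is authorized. It is hypothetical: `authorize`, the `validBearers` set, and the cookie map (cookie value to issue time) stand in for the server's real token registry. Scope checks are ignored.

```typescript
// Hypothetical model of the auth rules for command and SSE endpoints.
const SSE_PATHS = new Set(["/activity/stream", "/inspector/events"]);
const SSE_COOKIE_TTL_MS = 30 * 60 * 1000; // 30-minute gstack_sse cookie

interface Credentials {
  bearer?: string;     // Authorization: Bearer <token>
  sseCookie?: string;  // value of the gstack_sse cookie
  queryToken?: string; // ?token=... (no longer accepted as of v1.6.0.0)
}

function authorize(
  path: string,
  creds: Credentials,
  validBearers: Set<string>,
  sseCookies: Map<string, number>, // cookie value -> issue time (ms)
  now: number,
): boolean {
  // A valid bearer token works on any endpoint (scopes not modeled here).
  if (creds.bearer !== undefined && validBearers.has(creds.bearer)) return true;
  // The SSE cookie is stream-scoped: honored only on SSE paths, within its TTL.
  if (creds.sseCookie !== undefined && SSE_PATHS.has(path)) {
    const issuedAt = sseCookies.get(creds.sseCookie);
    return issuedAt !== undefined && now - issuedAt < SSE_COOKIE_TTL_MS;
  }
  // creds.queryToken is intentionally never consulted.
  return false;
}
```

The cookie check only runs on SSE paths. A stolen stream cookie therefore grants nothing on `/command`, which is the point of scoping it.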

### Endpoints

#### POST /connect

Exchange a setup key for a session token. No auth required. Rate-limited to 300/minute as a flood defense. Brute-forcing is not a realistic threat because setup keys are 24 random bytes (192 bits).

```json
Request:  {"setup_key": "gsk_setup_..."}
Response: {"token": "gsk_sess_...", "expires": "ISO8601", "scopes": ["read","write"], "agent": "agent-name"}
```
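
The exchange rules (5-minute one-time setup keys, 24-hour session tokens) can be modeled with a small in-memory registry. This is a hypothetical TypeScript sketch: `PairingRegistry` and its methods are illustrative names. The secret generator is injected because the real key format beyond the `gsk_setup_` and `gsk_sess_` prefixes is not described here. Setup keys carry 24 random bytes.

```typescript
// Hypothetical in-memory model of the /connect exchange.
const SETUP_KEY_TTL_MS = 5 * 60 * 1000;     // setup keys: 5 minutes
const SESSION_TTL_MS = 24 * 60 * 60 * 1000; // sessions: 24 hours by default

interface Session {
  token: string;
  expires: string; // ISO 8601, as in the /connect response
  scopes: string[];
  agent: string;
}

interface PendingKey {
  agent: string;
  scopes: string[];
  createdAt: number;
}

class PairingRegistry {
  private readonly pending = new Map<string, PendingKey>();
  private readonly sessions = new Map<string, Session>();
  private readonly newSecret: () => string;

  // newSecret should return cryptographically random text in real use.
  constructor(newSecret: () => string) {
    this.newSecret = newSecret;
  }

  // Called on the `$B pair-agent` side.
  createSetupKey(agent: string, scopes: string[], now: number): string {
    const key = `gsk_setup_${this.newSecret()}`;
    this.pending.set(key, { agent, scopes, createdAt: now });
    return key;
  }

  // Called by POST /connect; returns null for unknown, reused, or expired keys.
  connect(setupKey: string, now: number): Session | null {
    const entry = this.pending.get(setupKey);
    if (entry === undefined) return null;
    this.pending.delete(setupKey); // one-time: consumed on the first attempt
    if (now - entry.createdAt > SETUP_KEY_TTL_MS) return null;
    const session: Session = {
      token: `gsk_sess_${this.newSecret()}`,
      expires: new Date(now + SESSION_TTL_MS).toISOString(),
      scopes: entry.scopes,
      agent: entry.agent,
    };
    this.sessions.set(session.token, session);
    return session;
  }

  // Looks up a session token for an incoming /command request.
  lookup(token: string, now: number): Session | null {
    const s = this.sessions.get(token);
    return s !== undefined && Date.parse(s.expires) > now ? s : null;
  }
}
```

This sketch deletes the key before checking its age. An expired key is therefore gone after one attempt rather than lingering in memory. That is a choice of this model, not documented server behavior.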

#### POST /command

Send a browser command. Requires Bearer auth.

```json
Request:  {"command": "goto", "args": ["https://example.com"], "tabId": 1}
Response: (plain text result of the command)
```

#### GET /health

Server status. No auth required. Available on the local listener only (404 over the tunnel). Returns status, tabs, mode, uptime.

### Commands

#### Navigation

| Command | Args | Description |
|---------|------|-------------|
| `goto` | `["URL"]` | Navigate to a URL |
| `back` | `[]` | Go back |
| `forward` | `[]` | Go forward |
| `reload` | `[]` | Reload page |

#### Reading Content

| Command | Args | Description |
|---------|------|-------------|
| `snapshot` | `["-i"]` | Interactive snapshot with @ref labels (most useful) |
| `text` | `[]` | Full page text |
| `html` | `["selector?"]` | HTML of element or full page |
| `links` | `[]` | All links on page |
| `screenshot` | `["/tmp/s.png"]` | Take a screenshot |
| `url` | `[]` | Current URL |

#### Interaction

| Command | Args | Description |
|---------|------|-------------|
| `click` | `["@e3"]` | Click an element (use @ref from snapshot) |
| `fill` | `["@e5", "text"]` | Fill a form field |
| `select` | `["@e7", "option"]` | Select dropdown value |
| `type` | `["text"]` | Type text (keyboard) |
| `press` | `["Enter"]` | Press a key |
| `scroll` | `["down"]` | Scroll the page |

#### Tabs

| Command | Args | Description |
|---------|------|-------------|
| `newtab` | `["URL?"]` | Create a new tab (required before writing) |
| `tabs` | `[]` | List all tabs |
| `closetab` | `["id?"]` | Close a tab |

## The Snapshot → @ref Pattern

This is the most powerful browsing pattern. Instead of writing CSS selectors:

1. Run `snapshot -i` to get an interactive snapshot with labeled elements
2. The snapshot returns text like:
   ```
   [Page Title]
   @e1 [link] "Home"
   @e2 [button] "Sign In"
   @e3 [input] "Search..."
   ```
3. Use the `@e` refs directly in commands: `click @e2`, `fill @e3 "search query"` (over HTTP: `{"command": "click", "args": ["@e2"]}`)

Each ref labels an element the snapshot actually found on the page. That makes refs much more reliable than guessed CSS selectors. Always run `snapshot -i` first, then use the refs.
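
An agent that wants to pick refs programmatically can parse the snapshot text. This TypeScript sketch assumes every interactive line has the `@eN [role] "label"` shape shown in the sample above. The real output may carry more detail. `parseSnapshotRefs` and `findRef` are hypothetical helpers.

```typescript
// Hypothetical parser for `snapshot -i` output lines such as:
//   @e2 [button] "Sign In"
interface RefEntry {
  ref: string;  // e.g. "@e2"
  role: string; // e.g. "button"
  name: string; // e.g. "Sign In"
}

const REF_LINE = /^\s*(@e\d+)\s+\[([^\]]+)\]\s+"(.*)"\s*$/;

function parseSnapshotRefs(snapshot: string): Map<string, RefEntry> {
  const refs = new Map<string, RefEntry>();
  for (const line of snapshot.split("\n")) {
    const m = REF_LINE.exec(line);
    // Lines without an @ref (such as the page title) are skipped.
    if (m !== null) refs.set(m[1], { ref: m[1], role: m[2], name: m[3] });
  }
  return refs;
}

// Returns the first ref whose role and label match, if any.
function findRef(refs: Map<string, RefEntry>, role: string, name: string): string | undefined {
  for (const entry of Array.from(refs.values())) {
    if (entry.role === role && entry.name === name) return entry.ref;
  }
  return undefined;
}
```

For the sample above, `findRef(refs, "button", "Sign In")` yields `"@e2"`. That value goes straight into the `args` of a `click` command.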

## Scopes

| Scope | What it allows |
|-------|----------------|
| `read` | snapshot, text, html, links, screenshot, url, tabs, console, etc. |
| `write` | goto, click, fill, scroll, newtab, closetab, etc. |
| `admin` | eval, js, cookies, storage, cookie-import, useragent, etc. |
| `meta` | tab, diff, frame, responsive, watch |

Default tokens get `read` + `write`. The `admin` scope requires the `--admin` flag when pairing.
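
The scope table can be treated as a command-to-scope lookup that is checked against the token's granted scopes. The TypeScript sketch below copies only the commands the table names. The "etc." entries are unknown here, so unlisted commands are denied. `COMMAND_SCOPE`, `DEFAULT_SCOPES`, and `scopeAllows` are hypothetical names, not the server's code.

```typescript
// Hypothetical command-to-scope table built from the Scopes section.
type Scope = "read" | "write" | "admin" | "meta";

const COMMAND_SCOPE: Record<string, Scope | undefined> = {
  // read
  snapshot: "read", text: "read", html: "read", links: "read",
  screenshot: "read", url: "read", tabs: "read", console: "read",
  // write
  goto: "write", click: "write", fill: "write", scroll: "write",
  newtab: "write", closetab: "write",
  // admin
  eval: "admin", js: "admin", cookies: "admin", storage: "admin",
  "cookie-import": "admin", useragent: "admin",
  // meta
  tab: "meta", diff: "meta", frame: "meta", responsive: "meta", watch: "meta",
};

// "Default tokens get read + write."
const DEFAULT_SCOPES: Scope[] = ["read", "write"];

function scopeAllows(command: string, granted: readonly Scope[]): boolean {
  const needed = COMMAND_SCOPE[command];
  // Commands missing from this partial table are denied.
  return needed !== undefined && granted.includes(needed);
}
```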

## Tab Isolation

Each agent owns the tabs it creates. Rules:

- **Read:** Any agent can read any tab (snapshot, text, screenshot)
- **Write:** Only the tab owner can write (click, fill, goto, etc.)
- **Unowned tabs:** Pre-existing tabs are root-only for writes
- **First step:** Always `newtab` before trying to interact
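
The tab rules reduce to a small decision function. This is a hypothetical TypeScript sketch:

- `canUseTab` and the `TabInfo` shape are illustrative.
- An `owner` of `undefined` stands for a pre-existing tab.
- `newtab` is not modeled because it creates a tab rather than acting on one.
- Letting the root token bypass ownership everywhere is an assumption. The rules above only say pre-existing tabs are root-only for writes.

```typescript
// Hypothetical model of the tab-isolation rules.
type Access = "read" | "write";

interface TabInfo {
  id: number;
  owner?: string; // clientId of the creating agent; undefined for pre-existing tabs
}

function canUseTab(tab: TabInfo, access: Access, clientId: string, isRoot: boolean): boolean {
  if (isRoot) return true;            // assumed: root bypasses ownership
  if (access === "read") return true; // any agent can read any tab
  // Writes require ownership, so unowned (pre-existing) tabs are root-only.
  return tab.owner !== undefined && tab.owner === clientId;
}
```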

## Error Codes

| Code | Meaning | What to do |
|------|---------|------------|
| 401 | Token invalid, expired, or revoked | Ask the user to run `/pair-agent` again |
| 403 | Command not in scope, not allowed over the tunnel, or tab not yours | Use `newtab`, or ask the user to pair with `--admin` |
| 429 | Rate limit exceeded (>10 req/s) | Wait for the interval in the `Retry-After` header, then retry |
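
A client can handle 429s by honoring `Retry-After`. Per HTTP semantics, its value is either a number of seconds or an HTTP date. This TypeScript sketch is hypothetical: `retryDelayMs`, `sendWithRetry`, the 1-second fallback, and the three-attempt cap are illustrative choices. `send` can wrap a `fetch` call to `/command`. 401 and 403 are deliberately not retried, because they need action from the user or a `newtab` rather than a wait.

```typescript
// Minimal shape of an HTTP reply; a fetch Response satisfies it.
interface HttpReply {
  status: number;
  headers: { get(name: string): string | null };
}

// Converts a Retry-After value (seconds or HTTP date) into milliseconds.
function retryDelayMs(retryAfter: string | null, now: number, fallbackMs = 1000): number {
  if (retryAfter === null) return fallbackMs;
  const value = retryAfter.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const at = Date.parse(value);
  return Number.isNaN(at) ? fallbackMs : Math.max(0, at - now);
}

// Retries only on 429; every other status is returned to the caller as-is.
async function sendWithRetry(
  send: () => Promise<HttpReply>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 3,
): Promise<HttpReply> {
  let reply = await send();
  for (let attempt = 1; attempt < maxAttempts && reply.status === 429; attempt++) {
    await sleep(retryDelayMs(reply.headers.get("Retry-After"), Date.now()));
    reply = await send();
  }
  return reply;
}
```

In real use, `sleep` can be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`. It is injected so the retry logic can be tested without waiting.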

## Security Model

- **Physical port separation.** Local listener and tunnel listener are separate TCP sockets. ngrok only forwards the tunnel port. Tunnel callers cannot reach bootstrap endpoints at all (404, wrong port).
- **Tunnel command allowlist.** `/command` over the tunnel only accepts 26 browser-driving commands (goto, click, fill, snapshot, text, newtab, tabs, back, forward, reload, closetab, etc.). Server-management commands (tunnel, pair, token, useragent, js) are denied on the tunnel.
- **Root token is tunnel-blocked.** A request bearing the root token over the tunnel listener returns 403 with a pairing hint. Only scoped session tokens work over the tunnel.
- **Setup keys** expire in 5 minutes and can only be used once.
- **Session tokens** expire in 24 hours (configurable).
- The root token never appears in instruction blocks or connection strings.
- **Admin scope** (JS execution, cookie access) is denied by default.
- Tokens can be revoked instantly: `$B tunnel revoke agent-name`
- **SSE auth** uses a 30-minute HttpOnly SameSite=Strict cookie, stream-scope only (never valid against `/command`).
- **Path traversal guarded** on `/welcome`: `GSTACK_SLUG` must match `^[a-z0-9_-]+$`, otherwise the server falls back to the built-in template.
- **SSRF guards** on `goto`, `download`, and scrape paths: the target URL is validated against a localhost/private-range blocklist.
- **Tunnel surface denial logging.** Every rejection on the tunnel listener (`path_not_on_tunnel`, `root_token_on_tunnel`, `missing_scoped_token`, `disallowed_command:*`) is appended to `~/.gstack/security/attempts.jsonl` with the timestamp, source IP, path, and method. Log writes are rate-capped at 60 per minute.
- All agent activity is logged with attribution (clientId).
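
Two of these guards are simple enough to sketch. The TypeScript below is hypothetical:

- `welcomeSlug` applies the documented `GSTACK_SLUG` pattern and its fallback.
- `isBlockedTarget` is a deliberately partial IPv4-and-localhost blocklist.

A production SSRF guard would also need IPv6, DNS resolution of hostnames, and redirect handling.

```typescript
// Hypothetical sketches of two guards from the Security Model list.

// /welcome: accept GSTACK_SLUG only if it matches the documented pattern;
// null means "fall back to the built-in template".
const SLUG_PATTERN = /^[a-z0-9_-]+$/;

function welcomeSlug(envSlug: string | undefined): string | null {
  return envSlug !== undefined && SLUG_PATTERN.test(envSlug) ? envSlug : null;
}

// SSRF: partial localhost/private-range blocklist (IPv4 literals only).
function isPrivateIPv4(host: string): boolean {
  const parts = host.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)) {
    return false;
  }
  const [a, b] = parts.map(Number);
  return (
    a === 0 ||                           // 0.0.0.0/8
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // loopback
    (a === 169 && b === 254) ||          // link-local, incl. cloud metadata
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)             // 192.168.0.0/16
  );
}

function isBlockedTarget(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return true; // refuse anything that does not parse as a URL
  }
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  return isPrivateIPv4(host);
}
```

The WHATWG URL parser normalizes shorthand IPv4 forms such as `http://127.1` to `127.0.0.1` before this check runs. Checking `url.hostname` is therefore safer than matching on the raw string.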

**Known non-goal (tracked as #1136):** on Windows, the cookie-import-browser path launches Chrome with `--remote-debugging-port=<random>`. With App-Bound Encryption v20, a same-user local process can connect to that port and exfiltrate decrypted v20 cookies. That is an elevation path compared with reading the SQLite DB directly. The planned fix is to use `--remote-debugging-pipe` instead of a TCP port.

## Same-Machine Shortcut

If both agents are on the same machine, skip the copy-paste:

```bash
$B pair-agent --local openclaw   # writes to ~/.openclaw/skills/gstack/browse-remote.json
$B pair-agent --local codex      # writes to ~/.codex/skills/gstack/browse-remote.json
$B pair-agent --local cursor     # writes to ~/.cursor/skills/gstack/browse-remote.json
```

No tunnel is needed; the agent connects over localhost directly.

## ngrok Tunnel Setup

For remote agents on different machines:

1. Sign up at [ngrok.com](https://ngrok.com) (free tier works)
2. Copy your auth token from the dashboard
3. Save it: `echo 'NGROK_AUTHTOKEN=your_token' > ~/.gstack/ngrok.env` (`>` overwrites any existing file, so do this step first)
4. Optionally claim a stable domain: `echo 'NGROK_DOMAIN=your-name.ngrok-free.dev' >> ~/.gstack/ngrok.env`
5. Start with tunnel: `BROWSE_TUNNEL=1 $B restart`
6. Run `$B pair-agent`. It will use the tunnel URL automatically.
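
Steps 3 and 4 write `ngrok.env` as plain `KEY=VALUE` lines, so reading it back is straightforward. This TypeScript sketch is hypothetical: `parseEnvFile` and `tunnelConfig` are not gstack functions, and the server may parse the file differently. It shows which values a tunnel start needs: the auth token is required and the domain is optional.

```typescript
// Hypothetical reader for ~/.gstack/ngrok.env contents (plain KEY=VALUE lines).
interface NgrokConfig {
  authtoken: string;
  domain?: string;
}

function parseEnvFile(text: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "="
    env[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return env;
}

// The auth token is required; the stable domain is optional.
function tunnelConfig(text: string): NgrokConfig | null {
  const env = parseEnvFile(text);
  const authtoken = env["NGROK_AUTHTOKEN"];
  if (!authtoken) return null;
  const domain = env["NGROK_DOMAIN"];
  return domain ? { authtoken, domain } : { authtoken };
}
```

This parser does not strip quotes around values. That matches what the `echo` commands above write, because single quotes are consumed by the shell.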