mirror of https://github.com/garrytan/gstack.git synced 2026-05-01 19:25:10 +02:00

Garry Tan 8f3701b761 v1.16.0.0 feat: tunnel allowlist 17→26 + canDispatchOverTunnel pure function (#1253)

* feat: extend tunnel allowlist to 26 commands + extract canDispatchOverTunnel

Adds newtab, tabs, back, forward, reload, snapshot, fill, url, closetab to
TUNNEL_COMMANDS (matching what cli.ts and REMOTE_BROWSER_ACCESS.md already
documented). Each new command is bounded by the existing per-tab ownership
check at server.ts:613-624 — scoped tokens default to tabPolicy: 'own-only'
so paired agents still can't operate on tabs they don't own.

Refactors the inline gate check at server.ts:1771-1783 into a pure exported
function canDispatchOverTunnel(command). Same behavior as the inline check;
the difference is unit-testability without HTTP.

Adds BROWSE_TUNNEL_LOCAL_ONLY=1 test-mode flag that binds the second Bun.serve
listener with makeFetchHandler('tunnel') on 127.0.0.1 — no ngrok needed.
Production tunnel still requires BROWSE_TUNNEL=1 + valid NGROK_AUTHTOKEN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: source-level guards + pure-function unit test + dual-listener behavioral eval

Three layers of regression coverage for the tunnel allowlist:

1. dual-listener.test.ts: replaces must-include/must-exclude with exact-set
   equality on the 26-command literal (the prior intersection-only style let
   new commands sneak into the source without test updates). Adds a regex
   assertion that the `command !== 'newtab'` ownership exemption at
   server.ts:613 still exists — catches refactors that re-introduce the
   catch-22 from the other side. Updates the /command handler test to look
   for canDispatchOverTunnel(body?.command) instead of the inline check.

2. tunnel-gate-unit.test.ts (new): 53 expects covering all 26 allowed,
   20 blocked, null/undefined/empty/non-string defensive handling, and alias
   canonicalization (e.g. 'set-content' resolves to 'load-html' which is
   correctly rejected since 'load-html' isn't tunnel-allowed).

3. pair-agent-tunnel-eval.test.ts (new): 4 behavioral tests that spawn the
   daemon under BROWSE_HEADLESS_SKIP=1 BROWSE_TUNNEL_LOCAL_ONLY=1, bind both
   listeners on 127.0.0.1, mint a scoped token via /pair → /connect, and
   assert: (a) newtab over tunnel passes the gate; (b) pair over tunnel
   403s with disallowed_command:pair AND writes a denial-log entry;
   (c) pair over local does NOT trigger the tunnel gate (proves the gate
   is surface-scoped); (d) regression for the catch-22 — newtab + goto on
   the resulting tab does not 403 with "Tab not owned by your agent".

All four tests run free under bun test (no API spend, no ngrok).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: bump tunnel allowlist count 17 -> 26 in CLAUDE.md and REMOTE_BROWSER_ACCESS.md

Both docs already named the 9 new commands as remote-accessible (the operator
guide's per-command sections at lines 86-119 and 168, plus cli.ts:546-586's
instruction blocks). The allowlist count was the only place the drift was
visible. Also corrected REMOTE_BROWSER_ACCESS.md's denied-commands list:
'eval' is in the allowlist, not the denied list — prior doc was wrong.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.21.0.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: re-version v1.21.0.0 -> v1.16.0.0 (lowest unclaimed slot)

The previous bump landed at v1.21.0.0 because gstack-next-version
advances past the highest claimed slot (v1.20.0.0 from #1252) rather
than picking the lowest unclaimed. v1.16-v1.18 are unclaimed and
v1.16.0.0 preserves monotonic version ordering on main once #1234
(v1.17), #1233 (v1.19), and #1252 (v1.20) merge after us.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): version-gate enforces collisions, allows lower-but-unclaimed slots

The gate was rejecting any PR VERSION below the util's next-slot
recommendation, even when the lower slot was unclaimed. This blocked
PRs that legitimately want to land at an unclaimed slot below the queue
max — which is what /ship should pick when the goal is monotonic version
ordering on main (lower-numbered PRs landing first preserves order; the
util's "advance past max claimed" semantics only optimizes for fresh
runs picking unique slots, not for queue ordering on merge).

New gate logic:

1. Hard-fail if PR VERSION <= base VERSION (no actual bump).
2. Hard-fail if PR VERSION exactly matches another open PR's VERSION
   (real collision).
3. Pass otherwise. If the PR is below the util's suggestion, emit an
   informational ::notice:: explaining the slot is unclaimed.

The util's output stays informational — it tells fresh /ship runs what
the next-up slot should be, but the gate only blocks actual conflicts.
This is a strict relaxation: every PR that passed the old gate also
passes the new one.

Confirmed by dry-run against the current queue (4 open PRs claiming
1.17.0.0, 1.19.0.0, 1.21.1.0, 1.22.0.0):
  - v1.16.0.0  → pass with informational notice (unclaimed)
  - v1.17.0.0  → fail (collision with #1234)
  - v1.15.0.0  → fail (no bump from base)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-28 00:57:28 -07:00

Remote Browser Access — How to Pair With a GStack Browser

A GStack Browser server can be shared with any AI agent that can make HTTP requests. The agent gets scoped access to a real Chromium browser: navigate pages, read content, click elements, fill forms, take screenshots. Each agent gets its own tab.

This document is the reference for remote agents. The quick-start instructions are generated by $B pair-agent with the actual credentials baked in.

Architecture

Your Machine                          Remote Agent
─────────────                         ────────────
GStack Browser Server                 Any AI agent
  ├── Chromium (Playwright)           (OpenClaw, Hermes, Codex, etc.)
  ├── Local listener  127.0.0.1:LOCAL         │
  │    (bootstrap, CLI, sidebar, cookies)      │
  ├── Tunnel listener 127.0.0.1:TUNNEL ◄───────┤
  │    (pair-agent only: /connect, /command,   │
  │     /sidebar-chat — locked allowlist)      │
  ├── ngrok tunnel (forwards tunnel port only) │
  │     https://xxx.ngrok.dev ─────────────────┘
  └── Token Registry
        ├── Root token (local listener only)
        ├── Setup keys (5 min, one-time)
        ├── Session tokens (24h, scoped)
        └── SSE session cookies (30 min, stream-scope)

Dual-listener architecture (v1.6.0.0)

The daemon binds two HTTP sockets. The local listener serves the full command surface to 127.0.0.1 only and is never forwarded. The tunnel listener is bound lazily on /tunnel/start (and torn down on /tunnel/stop) with a locked path allowlist. ngrok forwards only the tunnel port.

A caller who stumbles onto your ngrok URL cannot reach /health, /cookie-picker, /inspector/*, or /welcome — those paths don't exist on that TCP socket. Root tokens sent over the tunnel get 403. The tunnel listener accepts only /connect, /command (with a scoped token + the 26-command browser-driving allowlist), and /sidebar-chat.
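The gate described above can be sketched as a pure function. This is a minimal illustration, not the daemon's actual code: the type names, the function name `gateTunnelRequest`, the order of the checks, and the 401 status for a missing token are assumptions. The denial reason strings are the ones named under Security Model, and the allowlist itself is passed in as a predicate rather than reproduced here.

```typescript
// Sketch only: type names, function name, and check order are hypothetical.
type TunnelTokenKind = "root" | "scoped" | "none";

interface TunnelRequestInfo {
  path: string;
  token: TunnelTokenKind;
  command?: string; // only meaningful for /command
}

interface TunnelGateDecision {
  allowed: boolean;
  status: number;  // HTTP status the caller would see
  reason?: string; // denial reason string, as named in this document
}

// The only paths this document says exist on the tunnel listener.
const TUNNEL_PATHS = ["/connect", "/command", "/sidebar-chat"];

function gateTunnelRequest(
  req: TunnelRequestInfo,
  isTunnelCommand: (command: string) => boolean, // stands in for the 26-command allowlist
): TunnelGateDecision {
  if (TUNNEL_PATHS.indexOf(req.path) === -1) {
    return { allowed: false, status: 404, reason: "path_not_on_tunnel" };
  }
  // /connect is unauthenticated: it is how a setup key becomes a session token.
  if (req.path === "/connect") {
    return { allowed: true, status: 200 };
  }
  if (req.token === "root") {
    return { allowed: false, status: 403, reason: "root_token_on_tunnel" };
  }
  if (req.token === "none") {
    // Assumption: 401 for a missing token; this document does not state the status.
    return { allowed: false, status: 401, reason: "missing_scoped_token" };
  }
  if (req.path === "/command") {
    const command = req.command ?? "";
    if (!isTunnelCommand(command)) {
      return { allowed: false, status: 403, reason: `disallowed_command:${command}` };
    }
  }
  return { allowed: true, status: 200 };
}
```

Keeping the decision free of HTTP plumbing means it can be exercised in a unit test without starting either listener.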

See ARCHITECTURE.md for the full endpoint table.

Connection Flow

1. User runs $B pair-agent (or /pair-agent in Claude Code)
2. Server creates a one-time setup key (expires in 5 minutes)
3. User copies the instruction block into the other agent's chat
4. Remote agent runs POST /connect with the setup key
5. Server returns a scoped session token (24h default)
6. Remote agent creates its own tab via POST /command with newtab
7. Remote agent browses using POST /command with its session token + tabId
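From the remote agent's side, the last four steps of this flow might look like the sketch below. The `PostJson` transport type and both function names are hypothetical; the transport is injected so the logic can run without a live server. The request and response shapes follow the API Reference in this document. The format of the text that newtab returns is not specified here, so the sketch hands it back unparsed.

```typescript
// Hypothetical transport: POSTs a JSON body and resolves with status and body text.
type PostJson = (
  url: string,
  headers: Record<string, string>,
  body: string,
) => Promise<{ status: number; text: string }>;

// Exchange the one-time setup key for a scoped session token (POST /connect).
async function connectWithSetupKey(
  post: PostJson,
  baseUrl: string,
  setupKey: string,
): Promise<string> {
  const res = await post(
    `${baseUrl}/connect`,
    { "Content-Type": "application/json" },
    JSON.stringify({ setup_key: setupKey }),
  );
  if (res.status < 200 || res.status >= 300) {
    throw new Error(`/connect failed with HTTP ${res.status}`);
  }
  const parsed = JSON.parse(res.text) as { token?: unknown };
  if (typeof parsed.token !== "string") {
    throw new Error("/connect response did not contain a token");
  }
  return parsed.token;
}

// Send one browser command (POST /command) with the Bearer session token.
// tabId is omitted for newtab and supplied for commands that target a tab.
async function sendBrowserCommand(
  post: PostJson,
  baseUrl: string,
  token: string,
  command: string,
  args: string[],
  tabId?: number,
): Promise<string> {
  const payload: { command: string; args: string[]; tabId?: number } = { command, args };
  if (tabId !== undefined) {
    payload.tabId = tabId;
  }
  const res = await post(
    `${baseUrl}/command`,
    { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
    JSON.stringify(payload),
  );
  if (res.status < 200 || res.status >= 300) {
    throw new Error(`${command} failed with HTTP ${res.status}: ${res.text}`);
  }
  return res.text; // plain-text result of the command
}
```

In a real agent, `post` would be a thin wrapper around the runtime's HTTP client, for example a `fetch` call with `method: "POST"` that returns the response status and body text.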

API Reference

Authentication

All command endpoints require a Bearer token:

Authorization: Bearer gsk_sess_...

/connect is unauthenticated (rate-limited) — it's how a remote agent exchanges a setup key for a scoped session token. /health is unauthenticated on the local listener (bootstrap) but does NOT exist on the tunnel listener (404).

SSE endpoints (/activity/stream, /inspector/events) accept either a Bearer token or the HttpOnly gstack_sse cookie (minted via POST /sse-session, 30-minute TTL, stream-scope only — cannot be used against /command). As of v1.6.0.0 the ?token=<ROOT> query-string auth is no longer accepted.

Endpoints

POST /connect

Exchange a setup key for a session token. No auth required. Rate-limited to 300 requests/minute as flood defense; setup keys are 24 random bytes, so brute-forcing them is not practical.

Request:  {"setup_key": "gsk_setup_..."}
Response: {"token": "gsk_sess_...", "expires": "ISO8601", "scopes": ["read","write"], "agent": "agent-name"}

POST /command

Send a browser command. Requires Bearer auth.

Request:  {"command": "goto", "args": ["https://example.com"], "tabId": 1}
Response: (plain text result of the command)

GET /health

Server status. No auth required. Local listener only: this path does not exist on the tunnel listener (404). Returns status, tabs, mode, uptime.

Commands

Navigation

Command	Args	Description
`goto`	`["URL"]`	Navigate to a URL
`back`	`[]`	Go back
`forward`	`[]`	Go forward
`reload`	`[]`	Reload page

Reading Content

Command	Args	Description
`snapshot`	`["-i"]`	Interactive snapshot with @ref labels (most useful)
`text`	`[]`	Full page text
`html`	`["selector?"]`	HTML of element or full page
`links`	`[]`	All links on page
`screenshot`	`["/tmp/s.png"]`	Take a screenshot
`url`	`[]`	Current URL

Interaction

Command	Args	Description
`click`	`["@e3"]`	Click an element (use @ref from snapshot)
`fill`	`["@e5", "text"]`	Fill a form field
`select`	`["@e7", "option"]`	Select dropdown value
`type`	`["text"]`	Type text (keyboard)
`press`	`["Enter"]`	Press a key
`scroll`	`["down"]`	Scroll the page

Tabs

Command	Args	Description
`newtab`	`["URL?"]`	Create a new tab (required before writing)
`tabs`	`[]`	List all tabs
`closetab`	`["id?"]`	Close a tab

The Snapshot → @ref Pattern

This is the most powerful browsing pattern. Instead of writing CSS selectors:

1. Run snapshot -i to get an interactive snapshot with labeled elements.

2. The snapshot returns text like:

[Page Title]
@e1 [link] "Home"
@e2 [button] "Sign In"
@e3 [input] "Search..."

3. Use the @e refs directly in commands: click @e2, fill @e3 "search query"

This is how the snapshot system is meant to be used, and it is much more reliable than guessing CSS selectors. Always run snapshot -i first, then use the refs it returns.
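An agent can turn the snapshot text into a lookup table before choosing what to click. The parser below is a sketch that assumes every labeled line has exactly the shape shown in the example output (`@eN [role] "label"`); the names `SnapshotRef`, `parseSnapshotRefs`, and `findRef` are hypothetical, and real snapshot output may contain line shapes this does not handle.

```typescript
interface SnapshotRef {
  ref: string;   // e.g. "@e2"
  role: string;  // e.g. "button"
  label: string; // e.g. "Sign In"
}

// Matches lines such as: @e2 [button] "Sign In"
const REF_LINE = /^(@e\d+)\s+\[([^\]]+)\]\s+"(.*)"$/;

// Collect every labeled element; lines that do not match (such as the
// page title) are skipped.
function parseSnapshotRefs(snapshot: string): SnapshotRef[] {
  const refs: SnapshotRef[] = [];
  for (const line of snapshot.split("\n")) {
    const match = REF_LINE.exec(line.trim());
    if (match) {
      const [, ref = "", role = "", label = ""] = match;
      refs.push({ ref, role, label });
    }
  }
  return refs;
}

// Find the ref for an element by role and exact label, if present.
function findRef(refs: SnapshotRef[], role: string, label: string): string | undefined {
  const hit = refs.find((r) => r.role === role && r.label === label);
  return hit ? hit.ref : undefined;
}
```

With the example snapshot above, looking up the button labeled "Sign In" yields `@e2`, which can be passed directly as the argument to click.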

Scopes

Scope	What it allows
`read`	snapshot, text, html, links, screenshot, url, tabs, console, etc.
`write`	goto, click, fill, scroll, newtab, closetab, etc.
`admin`	eval, js, cookies, storage, cookie-import, useragent, etc.
`meta`	tab, diff, frame, responsive, watch

Default tokens get the read and write scopes. The admin scope requires the --admin flag when pairing.
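A scope check over this table could be sketched as follows. The map lists only the commands named in the table (each row ends in "etc.", so the real mapping is larger), and the names `COMMAND_SCOPE`, `DEFAULT_SCOPES`, and `scopeAllows` are hypothetical. Treating an unmapped command as denied is an assumption of this sketch, not documented server behavior.

```typescript
type Scope = "read" | "write" | "admin" | "meta";

// Only the commands named in the Scopes table; the real mapping is larger.
const COMMAND_SCOPE = new Map<string, Scope>([
  ["snapshot", "read"], ["text", "read"], ["html", "read"], ["links", "read"],
  ["screenshot", "read"], ["url", "read"], ["tabs", "read"], ["console", "read"],
  ["goto", "write"], ["click", "write"], ["fill", "write"], ["scroll", "write"],
  ["newtab", "write"], ["closetab", "write"],
  ["eval", "admin"], ["js", "admin"], ["cookies", "admin"], ["storage", "admin"],
  ["cookie-import", "admin"], ["useragent", "admin"],
  ["tab", "meta"], ["diff", "meta"], ["frame", "meta"], ["responsive", "meta"],
  ["watch", "meta"],
]);

// Scopes a token receives when pairing without --admin.
const DEFAULT_SCOPES: Scope[] = ["read", "write"];

// True when the token's scopes cover the command.
// Assumption: commands missing from the map are denied.
function scopeAllows(granted: Scope[], command: string): boolean {
  const needed = COMMAND_SCOPE.get(command);
  return needed !== undefined && granted.indexOf(needed) !== -1;
}
```

A default token therefore passes for goto and snapshot but not for cookies, which needs a token paired with --admin.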

Tab Isolation

Each agent owns the tabs it creates. Rules:

Read: Any agent can read any tab (snapshot, text, screenshot)
Write: Only the tab owner can write (click, fill, goto, etc.)
Unowned tabs: Pre-existing tabs are root-only for writes
First step: Always newtab before trying to interact
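These rules reduce to a small ownership check for scoped agents. The sketch below uses hypothetical names (`TabRecord`, `canAgentUseTab`) and models an unowned, pre-existing tab as `owner: null`; it covers scoped agents only and says nothing about what the root token may do.

```typescript
interface TabRecord {
  id: number;
  owner: string | null; // agent name, or null for a pre-existing (unowned) tab
}

type TabAccess = "read" | "write";

// Decide whether a scoped agent may use a tab.
function canAgentUseTab(agentName: string, tab: TabRecord, access: TabAccess): boolean {
  if (access === "read") {
    return true; // any agent can read any tab
  }
  // Writes need ownership; an unowned tab (owner === null) never matches.
  return tab.owner === agentName;
}
```

This is why newtab comes first: until an agent has created a tab, no tab passes the write check for it.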

Error Codes

Code	Meaning	What to do
401	Token invalid, expired, or revoked	Ask user to run /pair-agent again
403	Command not in scope, or tab not yours	Use newtab, or ask for --admin
429	Rate limit exceeded (>10 req/s)	Wait for the interval given in the Retry-After header, then retry
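A client might map these codes to actions as in the sketch below. The type and function names are hypothetical. It assumes Retry-After carries a number of seconds; if the header is missing or is not a plain number (HTTP also permits a date there), the sketch falls back to a one-second wait, which is an arbitrary choice rather than documented behavior.

```typescript
type AgentNextStep =
  | { kind: "proceed" }
  | { kind: "re-pair" }      // 401: ask the user to run /pair-agent again
  | { kind: "fix-request" }  // 403: use newtab, or ask for --admin
  | { kind: "retry"; delayMs: number } // 429: wait, then retry
  | { kind: "give-up" };

function nextStepForStatus(httpStatus: number, retryAfter: string | null): AgentNextStep {
  if (httpStatus >= 200 && httpStatus < 300) {
    return { kind: "proceed" };
  }
  switch (httpStatus) {
    case 401:
      return { kind: "re-pair" };
    case 403:
      return { kind: "fix-request" };
    case 429: {
      const missing = retryAfter === null || retryAfter.trim() === "";
      const seconds = missing ? NaN : Number(retryAfter);
      // Fallback of 1000 ms is an assumption for unparseable header values.
      const delayMs = Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : 1000;
      return { kind: "retry", delayMs };
    }
    default:
      return { kind: "give-up" };
  }
}
```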

Security Model

Physical port separation. Local listener and tunnel listener are separate TCP sockets. ngrok only forwards the tunnel port. Tunnel callers cannot reach bootstrap endpoints at all (404, wrong port).
Tunnel command allowlist. /command over the tunnel only accepts 26 browser-driving commands (goto, click, fill, snapshot, text, newtab, tabs, back, forward, reload, closetab, etc.). Server-management commands (tunnel, pair, token, useragent, js) are denied on the tunnel.
Root token is tunnel-blocked. A request bearing the root token over the tunnel listener returns 403 with a pairing hint. Only scoped session tokens work over the tunnel.
Setup keys expire in 5 minutes and can only be used once.
Session tokens expire in 24 hours (configurable).
The root token never appears in instruction blocks or connection strings.
Admin scope (JS execution, cookie access) is denied by default.
Tokens can be revoked instantly: $B tunnel revoke agent-name
SSE auth uses a 30-minute HttpOnly SameSite=Strict cookie, stream-scope only (never valid against /command).
Path traversal guarded on /welcome — GSTACK_SLUG must match ^[a-z0-9_-]+$ or falls back to the built-in template.
SSRF guards on goto, download, and scrape paths — validates URL target against a localhost/private-range blocklist.
Tunnel surface denial logging. Every rejection on the tunnel listener (path_not_on_tunnel, root_token_on_tunnel, missing_scoped_token, disallowed_command:*) is appended to ~/.gstack/security/attempts.jsonl with timestamp, source IP, path, method. Rate-capped at 60 writes/min.
All agent activity is logged with attribution (clientId).
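As one concrete instance from the list above, the GSTACK_SLUG guard can be sketched with the documented pattern. The function name `resolveWelcomeSlug` and the use of `null` to mean "fall back to the built-in template" are assumptions made for illustration, not the server's actual code.

```typescript
// Pattern quoted from this document: lowercase letters, digits, "_" and "-".
const SLUG_PATTERN = /^[a-z0-9_-]+$/;

// Returns the slug when it is safe to use in a path, or null to signal that
// the caller should use the built-in template instead.
function resolveWelcomeSlug(raw: string | undefined): string | null {
  if (raw === undefined) {
    return null;
  }
  return SLUG_PATTERN.test(raw) ? raw : null;
}
```

Because the pattern admits neither "." nor "/", a value such as "../secrets" can never reach the filesystem lookup.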

Known non-goal (tracked as #1136): on Windows, the cookie-import-browser path launches Chrome with --remote-debugging-port=<random>. With App-Bound Encryption v20, a same-user local process can connect to that port and exfiltrate decrypted v20 cookies — an elevation path relative to reading the SQLite DB directly. Fix direction is --remote-debugging-pipe instead of TCP.

Same-Machine Shortcut

If both agents are on the same machine, skip the copy-paste:

$B pair-agent --local openclaw    # writes to ~/.openclaw/skills/gstack/browse-remote.json
$B pair-agent --local codex       # writes to ~/.codex/skills/gstack/browse-remote.json
$B pair-agent --local cursor      # writes to ~/.cursor/skills/gstack/browse-remote.json

No tunnel needed. Uses localhost directly.

ngrok Tunnel Setup

For remote agents on different machines:

1. Sign up at ngrok.com (free tier works)
2. Copy your auth token from the dashboard
3. Save it: echo 'NGROK_AUTHTOKEN=your_token' > ~/.gstack/ngrok.env
4. Optionally claim a stable domain: echo 'NGROK_DOMAIN=your-name.ngrok-free.dev' >> ~/.gstack/ngrok.env
5. Start with the tunnel enabled: BROWSE_TUNNEL=1 $B restart
6. Run $B pair-agent; it will use the tunnel URL automatically
