v1.16.0.0 feat: tunnel allowlist 17→26 + canDispatchOverTunnel pure function (#1253)

* feat: extend tunnel allowlist to 26 commands + extract canDispatchOverTunnel

  Adds newtab, tabs, back, forward, reload, snapshot, fill, url, closetab to TUNNEL_COMMANDS (matching what cli.ts and REMOTE_BROWSER_ACCESS.md already documented). Each new command is bounded by the existing per-tab ownership check at server.ts:613-624: scoped tokens default to tabPolicy: 'own-only' so paired agents still can't operate on tabs they don't own.

  Refactors the inline gate check at server.ts:1771-1783 into a pure exported function canDispatchOverTunnel(command). Same behavior as the inline check; the difference is unit-testability without HTTP.

  Adds BROWSE_TUNNEL_LOCAL_ONLY=1 test-mode flag that binds the second Bun.serve listener with makeFetchHandler('tunnel') on 127.0.0.1 (no ngrok needed). Production tunnel still requires BROWSE_TUNNEL=1 + valid NGROK_AUTHTOKEN.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: source-level guards + pure-function unit test + dual-listener behavioral eval

  Three layers of regression coverage for the tunnel allowlist:

  1. dual-listener.test.ts: replaces must-include/must-exclude with exact-set equality on the 26-command literal (the prior intersection-only style let new commands sneak into the source without test updates). Adds a regex assertion that the `command !== 'newtab'` ownership exemption at server.ts:613 still exists, which catches refactors that re-introduce the catch-22 from the other side. Updates the /command handler test to look for canDispatchOverTunnel(body?.command) instead of the inline check.
  2. tunnel-gate-unit.test.ts (new): 53 expects covering all 26 allowed, 20 blocked, null/undefined/empty/non-string defensive handling, and alias canonicalization (e.g. 'set-content' resolves to 'load-html' which is correctly rejected since 'load-html' isn't tunnel-allowed).
  3. pair-agent-tunnel-eval.test.ts (new): 4 behavioral tests that spawn the daemon under BROWSE_HEADLESS_SKIP=1 BROWSE_TUNNEL_LOCAL_ONLY=1, bind both listeners on 127.0.0.1, mint a scoped token via /pair → /connect, and assert:
     (a) newtab over tunnel passes the gate;
     (b) pair over tunnel 403s with disallowed_command:pair AND writes a denial-log entry;
     (c) pair over local does NOT trigger the tunnel gate (proves the gate is surface-scoped);
     (d) regression for the catch-22: newtab + goto on the resulting tab does not 403 with "Tab not owned by your agent".

  All four tests run free under bun test (no API spend, no ngrok).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: bump tunnel allowlist count 17 -> 26 in CLAUDE.md and REMOTE_BROWSER_ACCESS.md

  Both docs already named the 9 new commands as remote-accessible (the operator guide's per-command sections at lines 86-119 and 168, plus cli.ts:546-586's instruction blocks). The allowlist count was the only place the drift was visible.

  Also corrected REMOTE_BROWSER_ACCESS.md's denied-commands list: 'eval' is in the allowlist, not the denied list; the prior doc was wrong.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.21.0.0)

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: re-version v1.21.0.0 -> v1.16.0.0 (lowest unclaimed slot)

  The previous bump landed at v1.21.0.0 because gstack-next-version advances past the highest claimed slot (v1.20.0.0 from #1252) rather than picking the lowest unclaimed. v1.16-v1.18 are unclaimed and v1.16.0.0 preserves monotonic version ordering on main once #1234 (v1.17), #1233 (v1.19), and #1252 (v1.20) merge after us.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): version-gate enforces collisions, allows lower-but-unclaimed slots

  The gate was rejecting any PR VERSION below the util's next-slot recommendation, even when the lower slot was unclaimed. This blocked PRs that legitimately want to land at an unclaimed slot below the queue max, which is what /ship should pick when the goal is monotonic version ordering on main (lower-numbered PRs landing first preserves order; the util's "advance past max claimed" semantics only optimizes for fresh runs picking unique slots, not for queue ordering on merge).

  New gate logic:

  1. Hard-fail if PR VERSION <= base VERSION (no actual bump).
  2. Hard-fail if PR VERSION exactly matches another open PR's VERSION (real collision).
  3. Pass otherwise. If the PR is below the util's suggestion, emit an informational ::notice:: explaining the slot is unclaimed.

  The util's output stays informational: it tells fresh /ship runs what the next-up slot should be, but the gate only blocks actual conflicts. This is a strict relaxation: every PR that passed the old gate also passes the new one.

  Confirmed by dry-run against the current queue (4 open PRs claiming 1.17.0.0, 1.19.0.0, 1.21.1.0, 1.22.0.0):

  - v1.16.0.0 → pass with informational notice (unclaimed)
  - v1.17.0.0 → fail (collision with #1234)
  - v1.15.0.0 → fail (no bump from base)

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
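The three-step version-gate rule in the commit message is not shown in this diff, so here is a minimal TypeScript sketch of how it might look. Everything in it is an assumption made for illustration: the names `compareVersions`, `checkVersionGate`, and `GateResult`, and the `suggestedNext` parameter, are hypothetical and are not taken from the repository's CI script.

```typescript
// Hypothetical sketch of the relaxed version gate. All names here are
// illustrative; the real CI gate is not part of this diff.
type GateResult =
  | { ok: true; notice?: string }
  | { ok: false; reason: string };

// Compare dotted numeric versions such as "1.16.0.0".
// Returns a negative number, zero, or a positive number.
function compareVersions(a: string, b: string): number {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

function checkVersionGate(
  prVersion: string,
  baseVersion: string,
  otherOpenPrVersions: string[],
  suggestedNext: string, // assumed: the util's informational recommendation
): GateResult {
  // 1. Hard-fail when the PR does not actually bump past the base version.
  if (compareVersions(prVersion, baseVersion) <= 0) {
    return { ok: false, reason: `no bump: ${prVersion} <= base ${baseVersion}` };
  }
  // 2. Hard-fail when another open PR claims exactly the same version.
  if (otherOpenPrVersions.some(v => compareVersions(v, prVersion) === 0)) {
    return { ok: false, reason: `collision: ${prVersion} is claimed by another open PR` };
  }
  // 3. Pass; attach a notice when the slot is below the suggestion.
  if (compareVersions(prVersion, suggestedNext) < 0) {
    return { ok: true, notice: `${prVersion} is below ${suggestedNext} but unclaimed` };
  }
  return { ok: true };
}

// Usage mirroring the dry-run queue from the commit message
// (the suggested slot "1.23.0.0" is a made-up value for the example).
const queue = ['1.17.0.0', '1.19.0.0', '1.21.1.0', '1.22.0.0'];
for (const v of ['1.16.0.0', '1.17.0.0', '1.15.0.0']) {
  console.log(v, JSON.stringify(checkVersionGate(v, '1.15.0.0', queue, '1.23.0.0')));
}
```

Under these assumptions the util's suggestion only ever affects the notice text, never the pass/fail outcome, which is the "strict relaxation" the commit message describes.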
2026-05-01 19:25:10 +02:00 · 2026-04-28 00:57:28 -07:00
parent dde55103fc
commit 8f3701b761
10 changed files with 489 additions and 35 deletions
@@ -1,5 +1,54 @@
 # Changelog

+## [1.16.0.0] - 2026-04-28
+
+## **Paired-agent tunnel allowlist now matches what the docs already promised. Catch-22 resolved, gate is unit-testable.**
+
+The visible bug: a paired remote agent over the ngrok tunnel hit 403s on `newtab`, `tabs`, `goto` on an existing tab, and a chain of other commands the operator docs claimed worked. The hidden bug: the v1.6.0.0 `TUNNEL_COMMANDS` allowlist was set at 17 entries while `docs/REMOTE_BROWSER_ACCESS.md`, `browse/src/cli.ts:546-586`, and the operator-facing instruction blocks all documented 26. The shipped allowlist had silently drifted from the design intent, and the gap persisted across releases. This release closes it: 9 commands added (`newtab`, `tabs`, `back`, `forward`, `reload`, `snapshot`, `fill`, `url`, `closetab`), each bounded by the existing per-tab ownership check at `server.ts:613-624`. Scoped tokens default to `tabPolicy: 'own-only'`, so a paired agent still can't navigate, fill, or close on tabs it doesn't own. The isolation is the same as before; it just covers more verbs.
+
+### The numbers that matter
+
+Branch totals come from `git diff --shortstat origin/main..HEAD`. Test counts come from `bun test browse/test/dual-listener.test.ts browse/test/tunnel-gate-unit.test.ts browse/test/pair-agent-tunnel-eval.test.ts browse/test/pair-agent-e2e.test.ts` against the merged tree.
+
+| Metric | Δ |
+|---|---|
+| Tunnel allowlist size | **17 → 26 commands** (+53%) |
+| Catch-22 resolution | `newtab` → `goto` → `back` chain works for the first time |
+| Gate testability | inline `TUNNEL_COMMANDS.has()` check → **pure exported `canDispatchOverTunnel()`** function |
+| New unit-test coverage | **53 expects** in `tunnel-gate-unit.test.ts` (allowed, blocked, null/undefined/non-string, alias canonicalization) |
+| New behavioral coverage | **4 tests** in `pair-agent-tunnel-eval.test.ts` running BOTH listeners locally (no ngrok) |
+| Source-level guard | exact-set equality against the 26-command literal + ownership-exemption regex |
+| All free tests | **69 pass / 0 fail** on the four touched test files |
+| Codex review passes | **2 outside-voice rounds** during plan mode, 6 of 7 findings incorporated |
+
+### What this means for users running paired agents
+
+Three things change immediately. **First**, paired agents can actually open and drive their own tab without hitting the catch-22 the prior allowlist created. `newtab` succeeds (the ownership-exemption at `server.ts:613` was always there, but the allowlist gated the entry); `goto`, `back`, `forward`, `reload`, `fill`, `closetab` all work on the just-created tab; `snapshot`, `url`, `tabs` give the agent the read-side surface needed to be useful. **Second**, the tunnel-surface gate is unit-testable now — `canDispatchOverTunnel(command)` is pure, exported from `browse/src/server.ts`, and covered by 53 expects. A future refactor that decouples the allowlist literal from the gate logic fails a free test in milliseconds. **Third**, `pair-agent-tunnel-eval.test.ts` exercises the gate end-to-end with BOTH the local and tunnel listeners bound on 127.0.0.1 (no ngrok required) so the routing decision — "this request hit the tunnel listener, run the gate; this one hit the local listener, skip the gate" — is asserted on every PR. The new `BROWSE_TUNNEL_LOCAL_ONLY=1` env var binds the second listener locally without invoking ngrok, gated to no-op outside test mode. Production tunnel still requires `BROWSE_TUNNEL=1` + a valid `NGROK_AUTHTOKEN`.
+
+### Itemized changes
+
+#### Added
+
+- 9 new commands in `browse/src/server.ts:111-120` `TUNNEL_COMMANDS` set: `newtab`, `tabs`, `back`, `forward`, `reload`, `snapshot`, `fill`, `url`, `closetab`. The set is now exported so tests can reference the literal directly.
+- `canDispatchOverTunnel(command: string | undefined | null): boolean` in `browse/src/server.ts` — pure exported function. Handles non-string input, runs `canonicalizeCommand` for alias resolution, returns `TUNNEL_COMMANDS.has(canonical)`.
+- `BROWSE_TUNNEL_LOCAL_ONLY=1` env var in `browse/src/server.ts:2080-2104`. Test-only sibling branch to `BROWSE_TUNNEL=1` that binds the second `Bun.serve` listener via `makeFetchHandler('tunnel')` without invoking ngrok. Persists `tunnelLocalPort` to the state file for the eval to read.
+- `browse/test/tunnel-gate-unit.test.ts`: 53 expects covering all 26 allowed commands, 20 blocked commands (pair, unpair, cookies, setup, launch, restart, stop, tunnel-start, token-mint, etc.), null/undefined/empty/non-string defensive handling, and alias canonicalization (e.g. `set-content` resolves to `load-html` and is correctly rejected since `load-html` isn't tunnel-allowed).
+- `browse/test/pair-agent-tunnel-eval.test.ts`: 4 behavioral tests that spawn the daemon under `BROWSE_HEADLESS_SKIP=1 BROWSE_TUNNEL_LOCAL_ONLY=1`, bind both listeners on 127.0.0.1, mint a scoped token via the existing `/pair` → `/connect` ceremony, and assert: (1) `newtab` over the tunnel passes the gate; (2) `pair` over the tunnel 403s with `disallowed_command:pair` AND writes a fresh denial-log entry to `~/.gstack/security/attempts.jsonl`; (3) `pair` over the local listener does NOT trigger the tunnel gate; (4) regression test for the catch-22 — `newtab` followed by `goto` on the resulting tab does not 403 with `Tab not owned by your agent`.
+
+#### Changed
+
+- `browse/test/dual-listener.test.ts`: must-include + must-exclude assertions replaced with one exact-set-equality test against the 26-command literal. The intersection-only style of the prior tests let new commands sneak into the source without a corresponding test update — the bidirectional check catches it both ways. Added a regex assertion that the `command !== 'newtab'` ownership-exemption clause at `server.ts:613` still exists (catches refactors that re-introduce the catch-22 from the other side).
+- `browse/test/dual-listener.test.ts`: `/command` handler test updated to assert the inline `TUNNEL_COMMANDS.has(cmd)` check is now `canDispatchOverTunnel(body?.command)` — proves the gate is delegated to the pure function and not duplicated.
+- `docs/REMOTE_BROWSER_ACCESS.md:35,168`: bumped "17-command allowlist" to "26-command allowlist". Corrected the denied-commands list (removed `eval`, which IS in the allowlist; the prior doc was wrong).
+- `CLAUDE.md`: bumped the transport-layer security section's "17-command browser-driving allowlist" reference to "26-command".
+
+#### For contributors
+
+- The plan was reviewed under `/plan-eng-review` plus 2 sequential codex outside-voice passes during plan mode. Round-1 codex caught a doc-target mistake (we were going to update `SIDEBAR_MESSAGE_FLOW.md` instead of `REMOTE_BROWSER_ACCESS.md`) and a wrong-layer test design. Round-2 codex caught that the round-1 correction was still wrong (the chosen test harness only binds the local listener) AND that the docs promised 6 more commands than the allowlist had. 6 of the 7 substantive findings landed in the implementation; the 7th (a pre-existing `/pair-agent` `/health` probe mismatch at `cli.ts:656-668`) is logged as out of scope.
+- One known accepted risk: `tabs` over the tunnel returns metadata for ALL tabs in the browser, not just tabs the agent owns. The user authored the trust relationship when they paired the agent, the agent already can't read CONTENT of unowned tabs (write commands blocked, the active tab can't be switched without a `tab <id>` command that's NOT in the allowlist), and tab IDs already leak via the 403 `hint` field on disallowed `goto`. Codex noted that tightening this requires touching the ownership gate itself (the gate falls back to `getActiveTabId()` BEFORE dispatch in `server.ts:603-614`), which is materially out of scope for a catch-22 fix. Logged in the plan failure-mode table as accepted.
+
+
+
 ## [1.15.0.0] - 2026-04-26

 ## **Real-PTY test harness ships. 11 plan-mode E2E tests, 23 unit tests, and 50K fewer tokens per invocation.**
@@ -258,7 +258,7 @@ through `POST /pty-session` only.
 **Transport-layer security** (v1.6.0.0+). When `pair-agent` starts an ngrok tunnel,
 the daemon binds two HTTP listeners: a local listener (127.0.0.1, full command
 surface, never forwarded) and a tunnel listener (locked allowlist: `/connect`,
-`/command` with a scoped token + 17-command browser-driving allowlist,
+`/command` with a scoped token + 26-command browser-driving allowlist,
 `/sidebar-chat`). ngrok forwards only the tunnel port. Root tokens over the tunnel
 return 403. SSE endpoints use a 30-minute HttpOnly `gstack_sse` cookie minted via
 `POST /sse-session` (never valid against `/command`). Tunnel-surface rejections go
@@ -1 +1 @@
-1.15.0.0
+1.16.0.0
@@ -108,13 +108,31 @@ const TUNNEL_PATHS = new Set<string>([
 * extension-inspector state. This allowlist maps to the eng-review decision
 * logged in the CEO plan for sec-wave v1.6.0.0.
 */
-const TUNNEL_COMMANDS = new Set<string>([
+export const TUNNEL_COMMANDS = new Set<string>([
+  // Original 17
  'goto', 'click', 'text', 'screenshot',
  'html', 'links', 'forms', 'accessibility',
  'attrs', 'media', 'data',
  'scroll', 'press', 'type', 'select', 'wait', 'eval',
+  // Tab + navigation primitives operator docs and CLI hints already promised
+  'newtab', 'tabs', 'back', 'forward', 'reload',
+  // Read/inspect/write operators paired agents need to be useful
+  'snapshot', 'fill', 'url', 'closetab',
 ]);

+/**
+ * Pure gate: returns true iff the command is reachable over the tunnel surface.
+ * Extracted from the inline /command handler so the gate logic is unit-testable
+ * without standing up an HTTP listener. Behavior is identical to the inline
+ * check; the function canonicalizes the command (so aliases hit the same set)
+ * and returns false for null, undefined, empty, or non-string input.
+ */
+export function canDispatchOverTunnel(command: string | undefined | null): boolean {
+  if (typeof command !== 'string' || command.length === 0) return false;
+  const cmd = canonicalizeCommand(command);
+  return TUNNEL_COMMANDS.has(cmd);
+}
+
 /**
 * Read ngrok authtoken from env var, ~/.gstack/ngrok.env, or ngrok's native
 * config files.  Returns null if nothing found.  Shared between the
@@ -1772,8 +1790,7 @@ async function start() {
        // Paired remote agents drive the browser but cannot configure the
        // daemon, launch new browsers, import cookies, or rotate tokens.
        if (surface === 'tunnel') {
-          const cmd = canonicalizeCommand(body?.command);
-          if (!cmd || !TUNNEL_COMMANDS.has(cmd)) {
+          if (!canDispatchOverTunnel(body?.command)) {
            logTunnelDenial(req, url, `disallowed_command:${body?.command}`);
            return new Response(JSON.stringify({
              error: `Command '${body?.command}' is not allowed over the tunnel surface`,
@@ -2060,6 +2077,29 @@ async function start() {
        tunnelListener = null;
      }
    }
+  } else if (process.env.BROWSE_TUNNEL_LOCAL_ONLY === '1') {
+    // Test-only: bind the dual-listener tunnel surface on 127.0.0.1 with NO
+    // ngrok forwarding. Lets behavioral evals exercise the surface==='tunnel' gate
+    // without an ngrok authtoken or live network. Production tunneling still
+    // requires BROWSE_TUNNEL=1 + a valid authtoken above.
+    try {
+      const boundTunnel = Bun.serve({
+        port: 0,
+        hostname: '127.0.0.1',
+        fetch: makeFetchHandler('tunnel'),
+      });
+      tunnelServer = boundTunnel;
+      tunnelActive = true;
+      const tunnelPort = boundTunnel.port;
+      console.log(`[browse] Tunnel listener bound (local-only test mode) on 127.0.0.1:${tunnelPort}`);
+      const stateContent = JSON.parse(fs.readFileSync(config.stateFile, 'utf-8'));
+      stateContent.tunnelLocalPort = tunnelPort;
+      const tmpState = config.stateFile + '.tmp';
+      fs.writeFileSync(tmpState, JSON.stringify(stateContent, null, 2), { mode: 0o600 });
+      fs.renameSync(tmpState, config.stateFile);
+    } catch (err: any) {
+      console.error(`[browse] BROWSE_TUNNEL_LOCAL_ONLY=1 listener bind failed: ${err.message}`);
+    }
  }
 }
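The gate added in the `server.ts` hunks above can be tried in isolation. The sketch below is self-contained and rests on one loud assumption: `canonicalizeCommand` is stubbed with a hypothetical one-entry alias table (only the `set-content` → `load-html` pair is mentioned in the changelog), since the real helper is not shown in this diff. The parameter is also widened to `unknown` here purely so the non-string probes type-check.

```typescript
// Hypothetical stand-in for the real canonicalizeCommand, which is not part
// of this diff. Only the 'set-content' -> 'load-html' alias comes from the
// changelog text; the Map-based lookup is an assumption.
const ALIASES = new Map<string, string>([['set-content', 'load-html']]);

function canonicalizeCommand(command: string): string {
  return ALIASES.get(command) ?? command;
}

// The 26-command allowlist, copied from the diff.
const TUNNEL_COMMANDS = new Set<string>([
  'goto', 'click', 'text', 'screenshot',
  'html', 'links', 'forms', 'accessibility',
  'attrs', 'media', 'data',
  'scroll', 'press', 'type', 'select', 'wait', 'eval',
  'newtab', 'tabs', 'back', 'forward', 'reload',
  'snapshot', 'fill', 'url', 'closetab',
]);

// Same shape as the gate in the diff: reject anything that is not a
// non-empty string, canonicalize, then test set membership.
function canDispatchOverTunnel(command: unknown): boolean {
  if (typeof command !== 'string' || command.length === 0) return false;
  return TUNNEL_COMMANDS.has(canonicalizeCommand(command));
}

// Probe a mix of allowed, blocked, aliased, and malformed inputs.
const probes: unknown[] = ['newtab', 'pair', 'set-content', 42, null, ''];
for (const p of probes) {
  console.log(JSON.stringify(p), canDispatchOverTunnel(p));
}
```

Because the alias is resolved before the membership test, an alias can never be more permissive than its canonical command; that is why `set-content` is rejected alongside `load-html`.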

@@ -70,17 +70,37 @@ describe('Tunnel path allowlist', () => {
 });

 describe('Tunnel command allowlist', () => {
-  test('TUNNEL_COMMANDS is a closed set of browser-driving commands only', () => {
+  // The full closed set of commands reachable over the tunnel surface. Adding
+  // or removing a command here means changing the literal in server.ts AND
+  // updating this list — that double-edit is the point. A single-source
+  // "include the items in the source" assertion would silently widen the
+  // surface during a refactor that adds a command to server.ts without test
+  // review. The exact-set match catches it.
+  const EXPECTED_TUNNEL_COMMANDS = new Set([
+    // Original 17
+    'goto', 'click', 'text', 'screenshot',
+    'html', 'links', 'forms', 'accessibility',
+    'attrs', 'media', 'data',
+    'scroll', 'press', 'type', 'select', 'wait', 'eval',
+    // Tab + navigation primitives operator docs and CLI hints already promised
+    'newtab', 'tabs', 'back', 'forward', 'reload',
+    // Read/inspect/write operators paired agents need to be useful
+    'snapshot', 'fill', 'url', 'closetab',
+  ]);
+
+  test('TUNNEL_COMMANDS literal matches the closed allowlist exactly (catches additions/removals without test update)', () => {
    const cmds = extractSetContents(SERVER_SRC, 'TUNNEL_COMMANDS');
-    // Must include the core browser-driving commands
-    const required = [
-      'goto', 'click', 'text', 'screenshot', 'html', 'links',
-      'forms', 'accessibility', 'attrs', 'media', 'data',
-      'scroll', 'press', 'type', 'select', 'wait', 'eval',
-    ];
-    for (const c of required) {
+    // Both directions: anything in the source must be expected, and anything
+    // expected must be in the source. The intersection-only style of the old
+    // must-include / must-exclude tests let new commands sneak into the source
+    // without a corresponding test update.
+    for (const c of cmds) {
+      expect(EXPECTED_TUNNEL_COMMANDS.has(c)).toBe(true);
+    }
+    for (const c of EXPECTED_TUNNEL_COMMANDS) {
      expect(cmds.has(c)).toBe(true);
    }
+    expect(cmds.size).toBe(EXPECTED_TUNNEL_COMMANDS.size);
  });

  test('TUNNEL_COMMANDS does NOT include daemon-configuration or bootstrap commands', () => {
@@ -89,12 +109,21 @@ describe('Tunnel command allowlist', () => {
      'launch', 'launch-browser', 'connect', 'disconnect',
      'restart', 'stop', 'tunnel-start', 'tunnel-stop',
      'token-mint', 'token-revoke', 'cookie-picker', 'cookie-import',
-      'inspector-pick',
+      'inspector-pick', 'pair', 'unpair', 'cookies', 'setup',
    ];
    for (const c of forbidden) {
      expect(cmds.has(c)).toBe(false);
    }
  });
+
+  test('newtab ownership exemption preserved (catches refactors that re-introduce the catch-22)', () => {
+    // The /command handler must skip the per-tab ownership check when the
+    // command is `newtab`, otherwise paired agents have no way to create their
+    // own tab — every other write command requires an owned tab, and you can't
+    // own a tab you haven't created. The string `command !== 'newtab'` is the
+    // contract that breaks the catch-22.
+    expect(SERVER_SRC).toMatch(/command\s*!==\s*['"]newtab['"]/);
+  });
 });

 describe('Request handler factory', () => {
@@ -176,14 +205,14 @@ describe('GET /connect alive probe', () => {
 });

 describe('/command tunnel command allowlist', () => {
-  test('/command handler checks TUNNEL_COMMANDS when surface is tunnel', () => {
+  test('/command handler delegates to canDispatchOverTunnel when surface is tunnel', () => {
    const commandBlock = sliceBetween(
      SERVER_SRC,
      "url.pathname === '/command' && req.method === 'POST'",
      'return handleCommand(body, tokenInfo)'
    );
    expect(commandBlock).toContain("surface === 'tunnel'");
-    expect(commandBlock).toContain('TUNNEL_COMMANDS.has');
+    expect(commandBlock).toContain('canDispatchOverTunnel(body?.command)');
    expect(commandBlock).toContain('disallowed_command');
    expect(commandBlock).toContain('is not allowed over the tunnel surface');
    expect(commandBlock).toContain('status: 403');
@@ -0,0 +1,215 @@
+/**
+ * Tunnel-surface behavioral eval for the pair-agent flow.
+ *
+ * Spawns the daemon under `BROWSE_HEADLESS_SKIP=1 BROWSE_TUNNEL_LOCAL_ONLY=1`
+ * so BOTH listeners come up: the local listener on `port` and the tunnel
+ * listener on `tunnelLocalPort`. No ngrok, no live network — the surface tag
+ * (`local` vs `tunnel`) is set by which listener received the request, which
+ * is testable as long as both bind locally.
+ *
+ * This file is the only place that exercises the tunnel-surface gate
+ * end-to-end. The source-level guards in `dual-listener.test.ts` catch
+ * literal/exemption regressions, the unit test in `tunnel-gate-unit.test.ts`
+ * catches gate-logic regressions, and this file catches routing-or-listener
+ * regressions (e.g. someone accidentally swaps `'local'` and `'tunnel'` at
+ * the makeFetchHandler call site).
+ *
+ * The browser dispatch path under BROWSE_HEADLESS_SKIP=1 surfaces an error
+ * because there is no Playwright context, so the assertion target is
+ * specifically that the GATE was passed (i.e. the response is NOT a 403 with
+ * `disallowed_command:<x>`), not that the dispatch succeeded.
+ */
+
+import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
+import * as fs from 'fs';
+import * as os from 'os';
+import * as path from 'path';
+
+const ROOT = path.resolve(import.meta.dir, '../..');
+const SERVER_ENTRY = path.join(ROOT, 'browse/src/server.ts');
+
+interface DaemonHandle {
+  proc: ReturnType<typeof Bun.spawn>;
+  localPort: number;
+  tunnelPort: number;
+  rootToken: string;
+  scopedToken: string;
+  stateFile: string;
+  tempDir: string;
+  localUrl: string;
+  tunnelUrl: string;
+  attemptsLogPath: string;
+}
+
+async function waitForReady(baseUrl: string, timeoutMs = 20_000): Promise<void> {
+  const deadline = Date.now() + timeoutMs;
+  while (Date.now() < deadline) {
+    try {
+      const resp = await fetch(`${baseUrl}/health`, {
+        signal: AbortSignal.timeout(1000),
+      });
+      if (resp.ok) return;
+    } catch {
+      // not ready yet
+    }
+    await new Promise(r => setTimeout(r, 200));
+  }
+  throw new Error(`Daemon did not become ready within ${timeoutMs}ms at ${baseUrl}`);
+}
+
+async function waitForTunnelPort(stateFile: string, timeoutMs = 20_000): Promise<number> {
+  const deadline = Date.now() + timeoutMs;
+  while (Date.now() < deadline) {
+    try {
+      const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
+      if (typeof state.tunnelLocalPort === 'number') return state.tunnelLocalPort;
+    } catch {
+      // state file not written yet
+    }
+    await new Promise(r => setTimeout(r, 200));
+  }
+  throw new Error(`Tunnel local port did not appear in ${stateFile} within ${timeoutMs}ms`);
+}
+
+async function spawnDaemonWithTunnel(): Promise<DaemonHandle> {
+  // Isolate this test's analytics + denial log directory so we can assert on a
+  // fresh attempts.jsonl without colliding with the user's real ~/.gstack.
+  const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pair-agent-tunnel-eval-'));
+  const stateFile = path.join(tempDir, 'browse.json');
+  const fakeHome = path.join(tempDir, 'home');
+  fs.mkdirSync(fakeHome, { recursive: true });
+  const localPort = 30000 + Math.floor(Math.random() * 30000);
+  const attemptsLogPath = path.join(fakeHome, '.gstack', 'security', 'attempts.jsonl');
+
+  const proc = Bun.spawn(['bun', 'run', SERVER_ENTRY], {
+    cwd: ROOT,
+    env: {
+      ...process.env,
+      HOME: fakeHome,
+      BROWSE_HEADLESS_SKIP: '1',
+      BROWSE_TUNNEL_LOCAL_ONLY: '1',
+      BROWSE_PORT: String(localPort),
+      BROWSE_STATE_FILE: stateFile,
+      BROWSE_PARENT_PID: '0',
+      BROWSE_IDLE_TIMEOUT: '600000',
+    },
+    stdio: ['ignore', 'pipe', 'pipe'],
+  });
+
+  const localUrl = `http://127.0.0.1:${localPort}`;
+  await waitForReady(localUrl);
+  const tunnelPort = await waitForTunnelPort(stateFile);
+  const tunnelUrl = `http://127.0.0.1:${tunnelPort}`;
+
+  // Read the root token, then exchange it for a scoped token via /pair → /connect.
+  const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
+  const rootToken = state.token;
+
+  const pairResp = await fetch(`${localUrl}/pair`, {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${rootToken}` },
+    body: JSON.stringify({ clientId: 'tunnel-eval' }),
+  });
+  if (!pairResp.ok) throw new Error(`/pair failed: ${pairResp.status}`);
+  const { setup_key } = await pairResp.json() as any;
+
+  const connectResp = await fetch(`${localUrl}/connect`, {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify({ setup_key }),
+  });
+  if (!connectResp.ok) throw new Error(`/connect failed: ${connectResp.status}`);
+  const { token: scopedToken } = await connectResp.json() as any;
+
+  return { proc, localPort, tunnelPort, rootToken, scopedToken, stateFile, tempDir, localUrl, tunnelUrl, attemptsLogPath };
+}
+
+function killDaemon(handle: DaemonHandle): void {
+  try { handle.proc.kill('SIGKILL'); } catch {}
+  try { fs.rmSync(handle.tempDir, { recursive: true, force: true }); } catch {}
+}
+
+async function postCommand(baseUrl: string, token: string, body: any): Promise<{ status: number; bodyText: string }> {
+  const resp = await fetch(`${baseUrl}/command`, {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
+    body: JSON.stringify(body),
+  });
+  return { status: resp.status, bodyText: await resp.text() };
+}
+
+describe('pair-agent over tunnel surface — gate fires on the right surface only', () => {
+  let daemon: DaemonHandle;
+
+  beforeAll(async () => {
+    daemon = await spawnDaemonWithTunnel();
+  }, 30_000);
+
+  afterAll(() => {
+    if (daemon) killDaemon(daemon);
+  });
+
+  test('newtab on tunnel surface passes the allowlist gate (not 403 disallowed_command)', async () => {
+    const { status, bodyText } = await postCommand(daemon.tunnelUrl, daemon.scopedToken, { command: 'newtab' });
+    // Browser dispatch under BROWSE_HEADLESS_SKIP=1 will fail differently
+    // (no Playwright context), but the gate must NOT 403 with
+    // disallowed_command.
+    if (status === 403) {
+      expect(bodyText).not.toContain('disallowed_command:newtab');
+      expect(bodyText).not.toContain('is not allowed over the tunnel surface');
+    }
+  });
+
+  test('pair on tunnel surface 403s with disallowed_command and writes a denial-log entry', async () => {
+    // Snapshot attempts.jsonl size before the call so we can detect the new entry.
+    let beforeBytes = 0;
+    try { beforeBytes = fs.statSync(daemon.attemptsLogPath).size; } catch {}
+
+    const { status, bodyText } = await postCommand(daemon.tunnelUrl, daemon.scopedToken, { command: 'pair' });
+    expect(status).toBe(403);
+    expect(bodyText).toContain('is not allowed over the tunnel surface');
+
+    // Wait briefly for the denial-log writer (it's synchronous fs.appendFile in
+    // tunnel-denial-log.ts but the OS may need a tick to flush).
+    await new Promise(r => setTimeout(r, 250));
+    expect(fs.existsSync(daemon.attemptsLogPath)).toBe(true);
+    const after = fs.readFileSync(daemon.attemptsLogPath, 'utf-8');
+    const newSection = after.slice(beforeBytes);
+    expect(newSection).toContain('disallowed_command:pair');
+  });
+
+  test('pair on local surface does NOT trigger the tunnel allowlist gate', async () => {
+    // The same scoped token over the LOCAL listener must not see the
+    // disallowed_command path — the tunnel gate is surface-scoped.
+    const { status, bodyText } = await postCommand(daemon.localUrl, daemon.scopedToken, { command: 'pair' });
+    // Whatever happens (404 unknown command, 403 from a token-scope check, or
+    // 200 if the local handler accepts it) the response must NOT come from the
+    // tunnel allowlist gate.
+    expect(bodyText).not.toContain('disallowed_command:pair');
+    expect(bodyText).not.toContain('is not allowed over the tunnel surface');
+    expect([200, 400, 403, 404, 500]).toContain(status);
+  });
+
+  test('catch-22 regression: newtab + goto on the just-created tab passes ownership check', async () => {
+    // Without the `command !== 'newtab'` exemption at server.ts:613, scoped
+    // agents can't open a tab (newtab fails ownership) and can't goto an
+    // existing tab (also fails ownership). This proves the exemption holds:
+    // newtab succeeds the gate AND the ownership check, then the agent can
+    // hand off the tabId to a follow-up command without hitting the
+    // "Tab not owned by your agent" error.
+    const newtabResp = await postCommand(daemon.tunnelUrl, daemon.scopedToken, { command: 'newtab' });
+    if (newtabResp.status === 403) {
+      expect(newtabResp.bodyText).not.toContain('disallowed_command');
+      expect(newtabResp.bodyText).not.toContain('Tab not owned by your agent');
+    }
+
+    // Even if the headless-skip dispatch fails before returning a tabId, a
+    // follow-up `goto` over the tunnel surface must not 403 with
+    // `disallowed_command:goto`. We are NOT asserting that the goto
+    // succeeds — only that the allowlist + ownership exemption don't reject
+    // it as a class.
+    const gotoResp = await postCommand(daemon.tunnelUrl, daemon.scopedToken, { command: 'goto', args: ['http://127.0.0.1:1/'] });
+    expect(gotoResp.bodyText).not.toContain('disallowed_command:goto');
+    expect(gotoResp.bodyText).not.toContain('is not allowed over the tunnel surface');
+  });
+});
@@ -0,0 +1,97 @@
+/**
+ * Unit-test the pure tunnel-gate function extracted from the /command handler.
+ *
+ * The gate decides whether a paired remote agent's request to `/command` over
+ * the tunnel surface is allowed (returns true) or 403'd (returns false). Pure,
+ * synchronous, no HTTP — testable without standing up a Bun.serve listener.
+ *
+ * The behavioral coverage of the gate firing on the right surface (and only
+ * the right surface) lives in `pair-agent-tunnel-eval.test.ts` (free
+ * behavioral eval: no API spend, no ngrok).
+ */
+
+import { describe, test, expect } from 'bun:test';
+import { canDispatchOverTunnel, TUNNEL_COMMANDS } from '../src/server';
+
+describe('canDispatchOverTunnel — closed allowlist', () => {
+  test('every command in TUNNEL_COMMANDS dispatches over tunnel', () => {
+    for (const cmd of TUNNEL_COMMANDS) {
+      expect(canDispatchOverTunnel(cmd)).toBe(true);
+    }
+  });
+
+  test('TUNNEL_COMMANDS contains the 26-command closed set', () => {
+    // Mirror the source-level guard in dual-listener.test.ts. If this ever
+    // disagrees with the literal in server.ts, one of them is wrong.
+    const expected = new Set([
+      'goto', 'click', 'text', 'screenshot',
+      'html', 'links', 'forms', 'accessibility',
+      'attrs', 'media', 'data',
+      'scroll', 'press', 'type', 'select', 'wait', 'eval',
+      'newtab', 'tabs', 'back', 'forward', 'reload',
+      'snapshot', 'fill', 'url', 'closetab',
+    ]);
+    expect(TUNNEL_COMMANDS.size).toBe(expected.size);
+    for (const c of expected) expect(TUNNEL_COMMANDS.has(c)).toBe(true);
+    for (const c of TUNNEL_COMMANDS) expect(expected.has(c)).toBe(true);
+  });
+});
+
+describe('canDispatchOverTunnel — daemon-config + bootstrap commands rejected', () => {
+  const blocked = [
+    'pair', 'unpair', 'cookies', 'setup',
+    'launch', 'launch-browser', 'connect', 'disconnect',
+    'restart', 'stop', 'tunnel-start', 'tunnel-stop',
+    'token-mint', 'token-revoke', 'cookie-picker', 'cookie-import',
+    'inspector-pick', 'extension-inspect',
+    'invalid-command-xyz', 'totally-made-up',
+  ];
+  for (const cmd of blocked) {
+    test(`rejects '${cmd}'`, () => {
+      expect(canDispatchOverTunnel(cmd)).toBe(false);
+    });
+  }
+});
+
+describe('canDispatchOverTunnel — null/undefined/empty input', () => {
+  test('returns false for empty string', () => {
+    expect(canDispatchOverTunnel('')).toBe(false);
+  });
+
+  test('returns false for undefined', () => {
+    expect(canDispatchOverTunnel(undefined)).toBe(false);
+  });
+
+  test('returns false for null', () => {
+    expect(canDispatchOverTunnel(null)).toBe(false);
+  });
+
+  test('returns false for non-string input (defensive)', () => {
+    // The body parser may hand the gate a number or object if a malicious
+    // client sends `{"command": 42}`. The pure gate must treat anything
+    // non-string as not-allowed rather than throw.
+    expect(canDispatchOverTunnel(42 as unknown as string)).toBe(false);
+    expect(canDispatchOverTunnel({} as unknown as string)).toBe(false);
+  });
+});
+
+describe('canDispatchOverTunnel — alias canonicalization', () => {
+  // canonicalizeCommand resolves aliases (e.g. 'set-content' → 'load-html').
+  // Any aliased form of an allowlisted canonical command should also pass the
+  // gate; aliases that resolve to a non-allowlisted canonical command should
+  // not. The one alias exercised here ('set-content') is hardcoded; nothing
+  // is read from the alias registry in commands.ts.
+  test('aliases that resolve to non-allowlisted commands are rejected', () => {
+    // 'set-content' canonicalizes to 'load-html'. 'load-html' is NOT in
+    // TUNNEL_COMMANDS, so 'set-content' must also be rejected: an alias must
+    // never become a back door to a command that is denied under its
+    // canonical name.
+    expect(canDispatchOverTunnel('set-content')).toBe(false);
+  });
+
+  test('canonical commands pass directly without alias lookup', () => {
+    expect(canDispatchOverTunnel('goto')).toBe(true);
+    expect(canDispatchOverTunnel('newtab')).toBe(true);
+    expect(canDispatchOverTunnel('closetab')).toBe(true);
+  });
+});
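
Review note: the unit tests above pin down the gate's contract (non-string input is rejected without throwing, aliases are canonicalized before the allowlist lookup) without showing the function body. The sketch below is one way a function with that contract could look. It is not the code in server.ts: the alias map, the trimmed allowlist, and the names `ALIASES_SKETCH`, `ALLOWED_SKETCH`, and `canDispatchSketch` are all assumptions made for illustration.

```typescript
// Hypothetical alias table. The real registry lives in commands.ts and is not
// shown in this diff; only the 'set-content' -> 'load-html' pair is documented.
const ALIASES_SKETCH: ReadonlyMap<string, string> = new Map([['set-content', 'load-html']]);

// Trimmed allowlist for illustration. The real TUNNEL_COMMANDS has 26 entries.
const ALLOWED_SKETCH: ReadonlySet<string> = new Set(['goto', 'click', 'newtab', 'closetab']);

// Pure and synchronous: no HTTP, no I/O, never throws.
function canDispatchSketch(command: unknown): boolean {
  // A hostile body such as {"command": 42} must be rejected, not crash the gate.
  if (typeof command !== 'string' || command.length === 0) return false;
  // Resolve the alias first so the check always runs against the canonical name.
  const canonical = ALIASES_SKETCH.get(command) ?? command;
  return ALLOWED_SKETCH.has(canonical);
}

console.log(canDispatchSketch('goto'));        // true
console.log(canDispatchSketch('set-content')); // false ('load-html' is not allowlisted)
console.log(canDispatchSketch(42));            // false
```

Taking `unknown` rather than `string` in the sketch avoids the `as unknown as string` casts the defensive tests need; whether the real signature should change is a judgment call for the author.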
@@ -32,7 +32,7 @@ GStack Browser Server                 Any AI agent

 The daemon binds two HTTP sockets. The **local listener** serves the full command surface to 127.0.0.1 only and is never forwarded. The **tunnel listener** is bound lazily on `/tunnel/start` (and torn down on `/tunnel/stop`) with a locked path allowlist. ngrok forwards only the tunnel port.

-A caller who stumbles onto your ngrok URL cannot reach `/health`, `/cookie-picker`, `/inspector/*`, or `/welcome` — those paths don't exist on that TCP socket. Root tokens sent over the tunnel get 403. The tunnel listener accepts only `/connect`, `/command` (with a scoped token + the 17-command browser-driving allowlist), and `/sidebar-chat`.
+A caller who stumbles onto your ngrok URL cannot reach `/health`, `/cookie-picker`, `/inspector/*`, or `/welcome` — those paths don't exist on that TCP socket. Root tokens sent over the tunnel get 403. The tunnel listener accepts only `/connect`, `/command` (with a scoped token + the 26-command browser-driving allowlist), and `/sidebar-chat`.

 See [ARCHITECTURE.md](../ARCHITECTURE.md#dual-listener-tunnel-architecture-v1600) for the full endpoint table.

@@ -165,7 +165,7 @@ Each agent owns the tabs it creates. Rules:
 ## Security Model

 - **Physical port separation.** Local listener and tunnel listener are separate TCP sockets. ngrok only forwards the tunnel port. Tunnel callers cannot reach bootstrap endpoints at all (404, wrong port).
- **Tunnel command allowlist.** `/command` over the tunnel only accepts 17 browser-driving commands (goto, click, fill, snapshot, text, etc.). Server-management commands (tunnel, pair, token, useragent, eval, js) are denied on the tunnel.
+- **Tunnel command allowlist.** `/command` over the tunnel only accepts 26 browser-driving commands (goto, click, fill, snapshot, text, newtab, tabs, back, forward, reload, closetab, etc.). Server-management commands (tunnel, pair, token, useragent, js) are denied on the tunnel.
 - **Root token is tunnel-blocked.** A request bearing the root token over the tunnel listener returns 403 with a pairing hint. Only scoped session tokens work over the tunnel.
 - **Setup keys** expire in 5 minutes and can only be used once.
 - **Session tokens** expire in 24 hours (configurable).
@@ -1,6 +1,6 @@
 {
  "name": "gstack",
-  "version": "1.15.0.0",
+  "version": "1.16.0.0",
  "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
  "license": "MIT",
  "type": "module",
@@ -1,14 +1,19 @@
 #!/usr/bin/env bun
-// compare-pr-version — CI gate helper. Compares the util's next-slot output
-// against the PR's branch VERSION. Exits 0 (pass), 1 (confirmed collision),
-// or 2 (util was offline — fail-open per user decision, exit 0 with warning).
+// compare-pr-version — CI gate helper. Validates the PR's branch VERSION
+// against the queue of other open PRs' claimed versions. Exits 0 (pass)
+// or 1 (VERSION not bumped past base, or confirmed collision).
 //
 // Input:
 //   argv[2] — path to next.json (the util's JSON output)
 //   argv[3] — optional PR number for log lines
 //
 // Design note: fail-open on util error. A gstack bug must never freeze the
-// merge queue. Confirmed collisions (util OK, PR version < next slot) DO block.
+// merge queue. The gate enforces two rules: this PR must bump VERSION past
+// the base version, and it must not claim the same version as another open
+// PR. Lower-than-the-util's-suggestion is fine if the slot is unclaimed;
+// that preserves monotonic version ordering on main when this PR lands
+// ahead of higher-numbered queued PRs. The util's output is informational
+// (the *recommended* slot for fresh /ship runs).

 import { readFileSync } from "node:fs";

@@ -58,25 +63,44 @@ if (!pPR || !pNext) {
 }

 const tag = prNumber ? `PR #${prNumber}` : "this PR";
+const claimed = (parsed.claimed ?? []) as Array<{ pr: number; branch: string; version: string; url?: string }>;

 // Emit a GitHub step summary (always helpful, even on pass).
-const claimedList = (parsed.claimed ?? [])
-  .map((c: any) => `  #${c.pr} ${c.branch} → v${c.version}`)
+const claimedList = claimed
+  .map((c) => `  #${c.pr} ${c.branch} → v${c.version}`)
  .join("\n");

 console.log(`::group::Version gate (${tag})`);
-console.log(`  PR VERSION:  v${prVersion}`);
-console.log(`  Next slot:   v${nextSlot}`);
-console.log(`  Queue (${(parsed.claimed ?? []).length} open PRs claiming versions):`);
+console.log(`  PR VERSION:    v${prVersion}`);
+console.log(`  Suggested:     v${nextSlot} (util's next-slot recommendation)`);
+console.log(`  Queue (${claimed.length} open PRs claiming versions):`);
 if (claimedList) console.log(claimedList);
 console.log("::endgroup::");

-if (cmp(pPR, pNext) >= 0) {
-  console.log(`✓ ${tag} claims v${prVersion} — slot is free (next would be v${nextSlot}).`);
-  process.exit(0);
+// Hard rule 1: this PR's VERSION must be strictly greater than the base
+// version, otherwise we're not actually bumping.
+const pBase = parseV((parsed.base_version ?? "").trim());
+if (pBase && cmp(pPR, pBase) <= 0) {
+  console.log(`::error::VERSION not bumped: ${tag} claims v${prVersion} but base is v${parsed.base_version}.`);
+  process.exit(1);
 }

-// Confirmed collision: PR version is stale.
-console.log(`::error::VERSION drift: ${tag} claims v${prVersion} but the queue has moved — next free slot is v${nextSlot}.`);
-console.log(`::error::Rerun /ship from the feature branch to reconcile. /ship's ALREADY_BUMPED branch handles this atomically (VERSION, package.json, CHANGELOG, PR title).`);
-process.exit(1);
+// Hard rule 2: no collision with another open PR's claimed VERSION.
+const collision = claimed.find((c) => c.version.trim() === prVersion);
+if (collision) {
+  console.log(`::error::VERSION collision: ${tag} claims v${prVersion} but #${collision.pr} (${collision.branch}) already claims the same slot.`);
+  console.log(`::error::Rerun /ship to pick a different slot, or coordinate with #${collision.pr} on landing order.`);
+  process.exit(1);
+}
+
+// Optional informational note: PR version is below the util's suggested next
+// slot. This is allowed — the suggested slot is a recommendation for /ship's
+// next run, but landing at a lower-but-unclaimed slot first preserves
+// monotonic ordering on main when this PR merges ahead of higher-numbered
+// queued PRs.
+if (cmp(pPR, pNext) < 0) {
+  console.log(`::notice::${tag} claims v${prVersion}, below util's suggestion v${nextSlot}. Slot is unclaimed; gate passes. If this PR lands ahead of queued PRs at higher slots, version ordering on main remains monotonic.`);
+}
+
+console.log(`✓ ${tag} claims v${prVersion} — slot is free.`);
+process.exit(0);
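
Review note: the new gate logic can be read as a pure decision over four inputs: the PR version, the base version, the util's suggestion, and the claimed queue. The sketch below restates it that way so the rules can be checked without `process.exit`. The script's own `parseV` and `cmp` are not shown in this diff, so `parseVersion` and `compareVersions` are hypothetical stand-ins that assume four-part numeric versions; the fail-open branch for unparseable input is likewise an assumption.

```typescript
type Claim = { pr: number; branch: string; version: string };
type Verdict = { ok: boolean; reason: 'not_bumped' | 'collision' | 'below_suggestion' | 'free' | 'unparseable' };

// Hypothetical stand-in for the script's parseV: accepts "1.16.0.0"-style strings only.
function parseVersion(s: string): number[] | null {
  const m = /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/.exec(s.trim());
  return m ? [Number(m[1]), Number(m[2]), Number(m[3]), Number(m[4])] : null;
}

// Hypothetical stand-in for the script's cmp: -1, 0, or 1, comparing left to right.
function compareVersions(a: number[], b: number[]): number {
  for (let i = 0; i < 4; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

function versionGate(prVersion: string, baseVersion: string, suggested: string, claimed: Claim[]): Verdict {
  const pr = parseVersion(prVersion);
  const next = parseVersion(suggested);
  // Assumption: unparseable input fails open, in the spirit of the design note.
  if (!pr || !next) return { ok: true, reason: 'unparseable' };

  // Rule 1: the PR must actually bump past the base version.
  const base = parseVersion(baseVersion);
  if (base && compareVersions(pr, base) <= 0) return { ok: false, reason: 'not_bumped' };

  // Rule 2: no other open PR may claim the same slot.
  if (claimed.some((c) => c.version.trim() === prVersion.trim())) return { ok: false, reason: 'collision' };

  // Below the suggestion but unclaimed: allowed, reported as a notice only.
  return { ok: true, reason: compareVersions(pr, next) < 0 ? 'below_suggestion' : 'free' };
}
```

Fed a queue shaped like the one in the commit message (slots 1.17, 1.19, and 1.20 claimed, suggestion v1.21.0.0, base v1.15.0.0), a PR at v1.16.0.0 passes with `below_suggestion`, while a PR at v1.20.0.0 is blocked as a `collision`.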
@@ -1 +1 @@
-1.15.0.0
+1.16.0.0