gstack/browse/test/tunnel-gate-unit.test.ts
Garry Tan 8f3701b761 v1.16.0.0 feat: tunnel allowlist 17→26 + canDispatchOverTunnel pure function (#1253)
* feat: extend tunnel allowlist to 26 commands + extract canDispatchOverTunnel

Adds newtab, tabs, back, forward, reload, snapshot, fill, url, closetab to
TUNNEL_COMMANDS (matching what cli.ts and REMOTE_BROWSER_ACCESS.md already
documented). Each new command is bounded by the existing per-tab ownership
check at server.ts:613-624 — scoped tokens default to tabPolicy: 'own-only'
so paired agents still can't operate on tabs they don't own.

Refactors the inline gate check at server.ts:1771-1783 into a pure exported
function canDispatchOverTunnel(command). Same behavior as the inline check;
the difference is unit-testability without HTTP.
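
A minimal sketch of what such a pure gate could look like, inferred only from the behavior the unit tests in this file assert. The `ALIASES` map and `canonicalize` helper are hypothetical stand-ins for the real alias registry, and the allowlist literal is copied from the test file; the actual code in server.ts may differ.

```typescript
// Hypothetical sketch; not the actual server.ts implementation.
const TUNNEL_COMMANDS: ReadonlySet<string> = new Set([
  'goto', 'click', 'text', 'screenshot',
  'html', 'links', 'forms', 'accessibility',
  'attrs', 'media', 'data',
  'scroll', 'press', 'type', 'select', 'wait', 'eval',
  'newtab', 'tabs', 'back', 'forward', 'reload',
  'snapshot', 'fill', 'url', 'closetab',
]);

// Assumed alias table. 'set-content' is the only alias this commit names.
const ALIASES = new Map<string, string>([['set-content', 'load-html']]);

// Resolve an alias to its canonical name; unknown names pass through as-is.
function canonicalize(command: string): string {
  return ALIASES.get(command) ?? command;
}

// Pure and synchronous: no request object, no I/O, never throws.
function canDispatchOverTunnel(command: unknown): boolean {
  // Anything that is not a non-empty string is rejected outright.
  if (typeof command !== 'string' || command.length === 0) return false;
  // Closed allowlist: only canonical names in the set may dispatch.
  return TUNNEL_COMMANDS.has(canonicalize(command));
}
```

Taking `unknown` rather than `string` lets the gate absorb whatever the body parser hands it (for example `body?.command` when the body is missing) without a cast at the call site.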

Adds BROWSE_TUNNEL_LOCAL_ONLY=1 test-mode flag that binds the second Bun.serve
listener with makeFetchHandler('tunnel') on 127.0.0.1 — no ngrok needed.
Production tunnel still requires BROWSE_TUNNEL=1 + valid NGROK_AUTHTOKEN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: source-level guards + pure-function unit test + dual-listener behavioral eval

Three layers of regression coverage for the tunnel allowlist:

1. dual-listener.test.ts: replaces must-include/must-exclude with exact-set
   equality on the 26-command literal (the prior intersection-only style let
   new commands sneak into the source without test updates). Adds a regex
   assertion that the `command !== 'newtab'` ownership exemption at
   server.ts:613 still exists — catches refactors that re-introduce the
   catch-22 from the other side. Updates the /command handler test to look
   for canDispatchOverTunnel(body?.command) instead of the inline check.

2. tunnel-gate-unit.test.ts (new): 53 expects covering all 26 allowed,
   20 blocked, null/undefined/empty/non-string defensive handling, and alias
   canonicalization (e.g. 'set-content' resolves to 'load-html' which is
   correctly rejected since 'load-html' isn't tunnel-allowed).

3. pair-agent-tunnel-eval.test.ts (new): 4 behavioral tests that spawn the
   daemon under BROWSE_HEADLESS_SKIP=1 BROWSE_TUNNEL_LOCAL_ONLY=1, bind both
   listeners on 127.0.0.1, mint a scoped token via /pair → /connect, and
   assert: (a) newtab over tunnel passes the gate; (b) pair over tunnel
   403s with disallowed_command:pair AND writes a denial-log entry;
   (c) pair over local does NOT trigger the tunnel gate (proves the gate
   is surface-scoped); (d) regression for the catch-22 — newtab + goto on
   the resulting tab does not 403 with "Tab not owned by your agent".

All four tests run free under bun test (no API spend, no ngrok).
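
To illustrate why layer 1 moved to exact-set equality, here is a small sketch contrasting the two assertion styles. The helper name `diffCommandSets`, the toy command lists, and the name 'newcmd' are all hypothetical; this is not the code in dual-listener.test.ts.

```typescript
// Hypothetical helper: report how two command sets differ.
function diffCommandSets(expected: Iterable<string>, actual: Iterable<string>) {
  const exp = new Set(expected);
  const act = new Set(actual);
  return {
    missing: Array.from(exp).filter((c) => !act.has(c)),
    unexpected: Array.from(act).filter((c) => !exp.has(c)),
  };
}

// Toy data, not the real allowlist: 'newcmd' was added to the source
// without anyone updating the test's expectations.
const expectedCommands = ['goto', 'click'];
const sourceCommands = new Set(['goto', 'click', 'newcmd']);

// Must-include style: still passes, so the addition goes unnoticed.
const mustIncludePasses = expectedCommands.every((c) => sourceCommands.has(c));

// Exact-set style: any extra or missing name is a failure.
const diff = diffCommandSets(expectedCommands, sourceCommands);
const exactSetPasses = diff.missing.length === 0 && diff.unexpected.length === 0;
```

Reporting `missing` and `unexpected` separately also makes a failure message say which side drifted.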

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: bump tunnel allowlist count 17 -> 26 in CLAUDE.md and REMOTE_BROWSER_ACCESS.md

Both docs already named the 9 new commands as remote-accessible (the operator
guide's per-command sections at lines 86-119 and 168, plus cli.ts:546-586's
instruction blocks). The allowlist count was the only place the drift was
visible. Also corrected REMOTE_BROWSER_ACCESS.md's denied-commands list:
'eval' is in the allowlist, not the denied list — prior doc was wrong.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.21.0.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: re-version v1.21.0.0 -> v1.16.0.0 (lowest unclaimed slot)

The previous bump landed at v1.21.0.0 because gstack-next-version
advances past the highest claimed slot (v1.20.0.0 from #1252) rather
than picking the lowest unclaimed. v1.16-v1.18 are unclaimed and
v1.16.0.0 preserves monotonic version ordering on main once #1234
(v1.17), #1233 (v1.19), and #1252 (v1.20) merge after us.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): version-gate enforces collisions, allows lower-but-unclaimed slots

The gate was rejecting any PR VERSION below the util's next-slot
recommendation, even when the lower slot was unclaimed. This blocked
PRs that legitimately want to land at an unclaimed slot below the queue
max — which is what /ship should pick when the goal is monotonic version
ordering on main (lower-numbered PRs landing first preserves order; the
util's "advance past max claimed" semantics only optimizes for fresh
runs picking unique slots, not for queue ordering on merge).

New gate logic:

1. Hard-fail if PR VERSION <= base VERSION (no actual bump).
2. Hard-fail if PR VERSION exactly matches another open PR's VERSION
   (real collision).
3. Pass otherwise. If the PR is below the util's suggestion, emit an
   informational ::notice:: explaining the slot is unclaimed.

The util's output stays informational — it tells fresh /ship runs what
the next-up slot should be, but the gate only blocks actual conflicts.
This is a strict relaxation: every PR that passed the old gate also
passes the new one.

Confirmed by dry-run against the current queue (4 open PRs claiming
1.17.0.0, 1.19.0.0, 1.21.1.0, 1.22.0.0):
  - v1.16.0.0  → pass with informational notice (unclaimed)
  - v1.17.0.0  → fail (collision with #1234)
  - v1.15.0.0  → fail (no bump from base)
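
The three rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the actual CI script: the function names, the result shape, and the four-part numeric comparison are hypothetical. The sketch takes the base VERSION and the util's suggestion as parameters because neither value is stated in this message.

```typescript
// Hypothetical result shape for the gate decision.
type GateResult =
  | { ok: false; reason: 'no-bump' | 'collision' }
  | { ok: true; notice: boolean };

// Parse 'v1.16.0.0' or '1.16.0.0' into numeric parts.
function parseVersion(v: string): number[] {
  return v.replace(/^v/, '').split('.').map(Number);
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const d = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (d !== 0) return d;
  }
  return 0;
}

function checkVersionGate(
  pr: string,
  base: string,
  claimedByOpenPrs: string[],
  suggested: string,
): GateResult {
  // Rule 1: the PR must actually bump past the base VERSION.
  if (compareVersions(pr, base) <= 0) return { ok: false, reason: 'no-bump' };
  // Rule 2: an exact match with another open PR is a real collision.
  if (claimedByOpenPrs.some((c) => compareVersions(pr, c) === 0)) {
    return { ok: false, reason: 'collision' };
  }
  // Rule 3: pass; flag a notice when the slot is below the util's suggestion.
  return { ok: true, notice: compareVersions(pr, suggested) < 0 };
}
```

Called with the four claimed versions listed above, a base of 1.15.0.0, and any suggestion above 1.22.0.0 (both values assumed here), the sketch gives the same three outcomes as the dry-run.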

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 00:57:28 -07:00


/**
 * Unit-test the pure tunnel-gate function extracted from the /command handler.
 *
 * The gate decides whether a paired remote agent's request to `/command` over
 * the tunnel surface is allowed (returns true) or 403'd (returns false). It is
 * pure and synchronous with no HTTP, so it is testable without standing up a
 * Bun.serve listener.
 *
 * Behavioral coverage of the gate firing on the right surface (and only the
 * right surface) lives in `pair-agent-tunnel-eval.test.ts` (gate-tier; runs
 * free under bun test, with no API spend and no ngrok).
 */
import { describe, test, expect } from 'bun:test';
import { canDispatchOverTunnel, TUNNEL_COMMANDS } from '../src/server';
describe('canDispatchOverTunnel — closed allowlist', () => {
  test('every command in TUNNEL_COMMANDS dispatches over tunnel', () => {
    for (const cmd of TUNNEL_COMMANDS) {
      expect(canDispatchOverTunnel(cmd)).toBe(true);
    }
  });

  test('TUNNEL_COMMANDS contains the 26-command closed set', () => {
    // Mirror the source-level guard in dual-listener.test.ts. If this ever
    // disagrees with the literal in server.ts, one of them is wrong.
    const expected = new Set([
      'goto', 'click', 'text', 'screenshot',
      'html', 'links', 'forms', 'accessibility',
      'attrs', 'media', 'data',
      'scroll', 'press', 'type', 'select', 'wait', 'eval',
      'newtab', 'tabs', 'back', 'forward', 'reload',
      'snapshot', 'fill', 'url', 'closetab',
    ]);
    expect(TUNNEL_COMMANDS.size).toBe(expected.size);
    for (const c of expected) expect(TUNNEL_COMMANDS.has(c)).toBe(true);
    for (const c of TUNNEL_COMMANDS) expect(expected.has(c)).toBe(true);
  });
});
describe('canDispatchOverTunnel — daemon-config + bootstrap commands rejected', () => {
  const blocked = [
    'pair', 'unpair', 'cookies', 'setup',
    'launch', 'launch-browser', 'connect', 'disconnect',
    'restart', 'stop', 'tunnel-start', 'tunnel-stop',
    'token-mint', 'token-revoke', 'cookie-picker', 'cookie-import',
    'inspector-pick', 'extension-inspect',
    'invalid-command-xyz', 'totally-made-up',
  ];
  for (const cmd of blocked) {
    test(`rejects '${cmd}'`, () => {
      expect(canDispatchOverTunnel(cmd)).toBe(false);
    });
  }
});
describe('canDispatchOverTunnel — null/undefined/empty input', () => {
  test('returns false for empty string', () => {
    expect(canDispatchOverTunnel('')).toBe(false);
  });

  test('returns false for undefined', () => {
    expect(canDispatchOverTunnel(undefined)).toBe(false);
  });

  test('returns false for null', () => {
    expect(canDispatchOverTunnel(null)).toBe(false);
  });

  test('returns false for non-string input (defensive)', () => {
    // The body parser may hand the gate a number or object if a malicious
    // client sends `{"command": 42}`. The pure gate must treat anything
    // non-string as not-allowed rather than throw.
    expect(canDispatchOverTunnel(42 as unknown as string)).toBe(false);
    expect(canDispatchOverTunnel({} as unknown as string)).toBe(false);
  });
});
describe('canDispatchOverTunnel — alias canonicalization', () => {
  // canonicalizeCommand resolves aliases (e.g. 'set-content' → 'load-html').
  // An alias of an allowlisted canonical command should pass the gate; an
  // alias that resolves to a non-allowlisted canonical command should not.
  // The alias name below is hardcoded, so update it if the registry in
  // commands.ts ever renames or removes 'set-content'.
  test('aliases that resolve to non-allowlisted commands are rejected', () => {
    // 'set-content' canonicalizes to 'load-html'. 'load-html' is NOT in
    // TUNNEL_COMMANDS, so 'set-content' must also be rejected. This guards
    // against an alias slipping a non-tunnel-allowed canonical command
    // through the gate under a different name.
    expect(canDispatchOverTunnel('set-content')).toBe(false);
  });

  test('canonical commands pass directly without alias lookup', () => {
    // Also catches a future alias that accidentally maps a tunnel-allowed
    // name to a non-tunnel-allowed canonical (e.g. 'goto' → 'navigate').
    expect(canDispatchOverTunnel('goto')).toBe(true);
    expect(canDispatchOverTunnel('newtab')).toBe(true);
    expect(canDispatchOverTunnel('closetab')).toBe(true);
  });
});