mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-01 19:25:10 +02:00
8f3701b761
* feat: extend tunnel allowlist to 26 commands + extract canDispatchOverTunnel
Adds newtab, tabs, back, forward, reload, snapshot, fill, url, closetab to
TUNNEL_COMMANDS (matching what cli.ts and REMOTE_BROWSER_ACCESS.md already
documented). Each new command is bounded by the existing per-tab ownership
check at server.ts:613-624 — scoped tokens default to tabPolicy: 'own-only'
so paired agents still can't operate on tabs they don't own.
Refactors the inline gate check at server.ts:1771-1783 into a pure exported
function canDispatchOverTunnel(command). Same behavior as the inline check;
the difference is unit-testability without HTTP.
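
A minimal sketch of what such a pure gate function could look like, for illustration only. The allowlist below is a partial subset built from commands named in this message (the real TUNNEL_COMMANDS literal has 26 entries), and the alias table, the `Sketch` names, and the exact defensive checks are assumptions, not the code in server.ts:

```typescript
// Illustrative subset only: the real allowlist has 26 entries.
const TUNNEL_COMMANDS_SKETCH = new Set<string>([
  'newtab', 'tabs', 'back', 'forward', 'reload',
  'snapshot', 'fill', 'url', 'closetab', 'goto',
]);

// Hypothetical alias table; this message only confirms 'set-content' -> 'load-html'.
const ALIASES_SKETCH = new Map<string, string>([['set-content', 'load-html']]);

// Pure function: no HTTP, no server state, so it can be unit-tested directly.
function canDispatchOverTunnelSketch(command: unknown): boolean {
  // Reject null, undefined, empty strings, and non-string values up front.
  if (typeof command !== 'string' || command.length === 0) return false;
  // Canonicalize aliases before the lookup so an alias cannot bypass the gate.
  const canonical = ALIASES_SKETCH.get(command) ?? command;
  return TUNNEL_COMMANDS_SKETCH.has(canonical);
}
```

Under these assumptions, `canDispatchOverTunnelSketch('newtab')` is allowed, while `'pair'` and the alias `'set-content'` are rejected because neither resolves to an allowlisted command.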
Adds BROWSE_TUNNEL_LOCAL_ONLY=1 test-mode flag that binds the second Bun.serve
listener with makeFetchHandler('tunnel') on 127.0.0.1 — no ngrok needed.
Production tunnel still requires BROWSE_TUNNEL=1 + valid NGROK_AUTHTOKEN.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: source-level guards + pure-function unit test + dual-listener behavioral eval
Three layers of regression coverage for the tunnel allowlist:
1. dual-listener.test.ts: replaces must-include/must-exclude with exact-set
equality on the 26-command literal (the prior intersection-only style let
new commands sneak into the source without test updates). Adds a regex
assertion that the `command !== 'newtab'` ownership exemption at
server.ts:613 still exists — catches refactors that re-introduce the
catch-22 from the other side. Updates the /command handler test to look
for canDispatchOverTunnel(body?.command) instead of the inline check.
2. tunnel-gate-unit.test.ts (new): 53 expects covering all 26 allowed,
20 blocked, null/undefined/empty/non-string defensive handling, and alias
canonicalization (e.g. 'set-content' resolves to 'load-html' which is
correctly rejected since 'load-html' isn't tunnel-allowed).
3. pair-agent-tunnel-eval.test.ts (new): 4 behavioral tests that spawn the
daemon under BROWSE_HEADLESS_SKIP=1 BROWSE_TUNNEL_LOCAL_ONLY=1, bind both
listeners on 127.0.0.1, mint a scoped token via /pair → /connect, and
assert: (a) newtab over tunnel passes the gate; (b) pair over tunnel
403s with disallowed_command:pair AND writes a denial-log entry;
(c) pair over local does NOT trigger the tunnel gate (proves the gate
is surface-scoped); (d) regression for the catch-22 — newtab + goto on
the resulting tab does not 403 with "Tab not owned by your agent".
All four tests run free under bun test (no API spend, no ngrok).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: bump tunnel allowlist count 17 -> 26 in CLAUDE.md and REMOTE_BROWSER_ACCESS.md
Both docs already named the 9 new commands as remote-accessible (the operator
guide's per-command sections at lines 86-119 and 168, plus cli.ts:546-586's
instruction blocks). The allowlist count was the only place the drift was
visible. Also corrected REMOTE_BROWSER_ACCESS.md's denied-commands list:
'eval' is in the allowlist, not the denied list — prior doc was wrong.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v1.21.0.0)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: re-version v1.21.0.0 -> v1.16.0.0 (lowest unclaimed slot)
The previous bump landed at v1.21.0.0 because gstack-next-version
advances past the highest claimed slot (v1.20.0.0 from #1252) rather
than picking the lowest unclaimed. v1.16-v1.18 are unclaimed and
v1.16.0.0 preserves monotonic version ordering on main once #1234
(v1.17), #1233 (v1.19), and #1252 (v1.20) merge after us.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ci): version-gate enforces collisions, allows lower-but-unclaimed slots
The gate was rejecting any PR VERSION below the util's next-slot
recommendation, even when the lower slot was unclaimed. This blocked
PRs that legitimately want to land at an unclaimed slot below the queue
max — which is what /ship should pick when the goal is monotonic version
ordering on main (lower-numbered PRs landing first preserves order; the
util's "advance past max claimed" semantics only optimizes for fresh
runs picking unique slots, not for queue ordering on merge).
New gate logic:
1. Hard-fail if PR VERSION <= base VERSION (no actual bump).
2. Hard-fail if PR VERSION exactly matches another open PR's VERSION
(real collision).
3. Pass otherwise. If the PR is below the util's suggestion, emit an
informational ::notice:: explaining the slot is unclaimed.
The util's output stays informational — it tells fresh /ship runs what
the next-up slot should be, but the gate only blocks actual conflicts.
This is a strict relaxation: every PR that passed the old gate also
passes the new one.
Confirmed by dry-run against the current queue (4 open PRs claiming
1.17.0.0, 1.19.0.0, 1.21.1.0, 1.22.0.0):
- v1.16.0.0 → pass with informational notice (unclaimed)
- v1.17.0.0 → fail (collision with #1234)
- v1.15.0.0 → fail (no bump from base)
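
The three gate rules above can be sketched as a small pure function. This is an illustration under stated assumptions, not the CI script itself: the function and type names are hypothetical, the base version is assumed to be 1.15.0.0, and the util's suggestion is assumed to be 1.23.0.0 (one past the highest claimed slot in the dry-run).

```typescript
type GateResult = { pass: boolean; reason: string };

// Parse "v1.16.0.0" or "1.16.0.0" into numeric segments.
function parseVersion(v: string): number[] {
  return v.replace(/^v/, '').split('.').map(Number);
}

// Returns -1, 0, or 1, comparing segment by segment.
function compareVersions(a: string, b: string): number {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const d = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (d !== 0) return d < 0 ? -1 : 1;
  }
  return 0;
}

function versionGateSketch(
  prVersion: string,
  baseVersion: string,
  claimedByOpenPrs: string[],
  utilSuggestion: string,
): GateResult {
  // Rule 1: hard-fail when the PR does not actually bump past base.
  if (compareVersions(prVersion, baseVersion) <= 0) {
    return { pass: false, reason: 'no bump from base' };
  }
  // Rule 2: hard-fail on an exact match with another open PR's VERSION.
  if (claimedByOpenPrs.some(c => compareVersions(c, prVersion) === 0)) {
    return { pass: false, reason: 'collision' };
  }
  // Rule 3: pass; a slot below the suggestion only earns an informational notice.
  if (compareVersions(prVersion, utilSuggestion) < 0) {
    return { pass: true, reason: 'notice: below suggestion but unclaimed' };
  }
  return { pass: true, reason: 'ok' };
}
```

Called with the dry-run queue (`['1.17.0.0', '1.19.0.0', '1.21.1.0', '1.22.0.0']`) and the assumed base and suggestion, the sketch passes v1.16.0.0 with a notice and fails both v1.17.0.0 and v1.15.0.0, matching the three cases listed above.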
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
216 lines
9.2 KiB
TypeScript
/**
 * Tunnel-surface behavioral eval for the pair-agent flow.
 *
 * Spawns the daemon under `BROWSE_HEADLESS_SKIP=1 BROWSE_TUNNEL_LOCAL_ONLY=1`
 * so BOTH listeners come up: the local listener on `port` and the tunnel
 * listener on `tunnelLocalPort`. No ngrok, no live network — the surface tag
 * (`local` vs `tunnel`) is set by which listener received the request, which
 * is testable as long as both bind locally.
 *
 * This file is the only place that exercises the tunnel-surface gate
 * end-to-end. The source-level guards in `dual-listener.test.ts` catch
 * literal/exemption regressions, the unit test in `tunnel-gate-unit.test.ts`
 * catches gate-logic regressions, and this file catches routing-or-listener
 * regressions (e.g. someone accidentally swaps `'local'` and `'tunnel'` at
 * the makeFetchHandler call site).
 *
 * The browser dispatch path under BROWSE_HEADLESS_SKIP=1 surfaces an error
 * because there is no Playwright context, so the assertion target is
 * specifically that the GATE was passed (i.e. the response is NOT a 403 with
 * `disallowed_command:<x>`), not that the dispatch succeeded.
 */

import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

const ROOT = path.resolve(import.meta.dir, '../..');
const SERVER_ENTRY = path.join(ROOT, 'browse/src/server.ts');

interface DaemonHandle {
  proc: ReturnType<typeof Bun.spawn>;
  localPort: number;
  tunnelPort: number;
  rootToken: string;
  scopedToken: string;
  stateFile: string;
  tempDir: string;
  localUrl: string;
  tunnelUrl: string;
  attemptsLogPath: string;
}

async function waitForReady(baseUrl: string, timeoutMs = 20_000): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const resp = await fetch(`${baseUrl}/health`, {
        signal: AbortSignal.timeout(1000),
      });
      if (resp.ok) return;
    } catch {
      // not ready yet
    }
    await new Promise(r => setTimeout(r, 200));
  }
  throw new Error(`Daemon did not become ready within ${timeoutMs}ms at ${baseUrl}`);
}

async function waitForTunnelPort(stateFile: string, timeoutMs = 20_000): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
      if (typeof state.tunnelLocalPort === 'number') return state.tunnelLocalPort;
    } catch {
      // state file not written yet
    }
    await new Promise(r => setTimeout(r, 200));
  }
  throw new Error(`Tunnel local port did not appear in ${stateFile} within ${timeoutMs}ms`);
}

async function spawnDaemonWithTunnel(): Promise<DaemonHandle> {
  // Isolate this test's analytics + denial log directory so we can assert on a
  // fresh attempts.jsonl without colliding with the user's real ~/.gstack.
  const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pair-agent-tunnel-eval-'));
  const stateFile = path.join(tempDir, 'browse.json');
  const fakeHome = path.join(tempDir, 'home');
  fs.mkdirSync(fakeHome, { recursive: true });
  const localPort = 30000 + Math.floor(Math.random() * 30000);
  const attemptsLogPath = path.join(fakeHome, '.gstack', 'security', 'attempts.jsonl');

  const proc = Bun.spawn(['bun', 'run', SERVER_ENTRY], {
    cwd: ROOT,
    env: {
      ...process.env,
      HOME: fakeHome,
      BROWSE_HEADLESS_SKIP: '1',
      BROWSE_TUNNEL_LOCAL_ONLY: '1',
      BROWSE_PORT: String(localPort),
      BROWSE_STATE_FILE: stateFile,
      BROWSE_PARENT_PID: '0',
      BROWSE_IDLE_TIMEOUT: '600000',
    },
    stdio: ['ignore', 'pipe', 'pipe'],
  });

  const localUrl = `http://127.0.0.1:${localPort}`;
  await waitForReady(localUrl);
  const tunnelPort = await waitForTunnelPort(stateFile);
  const tunnelUrl = `http://127.0.0.1:${tunnelPort}`;

  // Read the root token, then exchange it for a scoped token via /pair → /connect.
  const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
  const rootToken = state.token;

  const pairResp = await fetch(`${localUrl}/pair`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${rootToken}` },
    body: JSON.stringify({ clientId: 'tunnel-eval' }),
  });
  if (!pairResp.ok) throw new Error(`/pair failed: ${pairResp.status}`);
  const { setup_key } = await pairResp.json() as any;

  const connectResp = await fetch(`${localUrl}/connect`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ setup_key }),
  });
  if (!connectResp.ok) throw new Error(`/connect failed: ${connectResp.status}`);
  const { token: scopedToken } = await connectResp.json() as any;

  return { proc, localPort, tunnelPort, rootToken, scopedToken, stateFile, tempDir, localUrl, tunnelUrl, attemptsLogPath };
}

function killDaemon(handle: DaemonHandle): void {
  try { handle.proc.kill('SIGKILL'); } catch {}
  try { fs.rmSync(handle.tempDir, { recursive: true, force: true }); } catch {}
}

async function postCommand(baseUrl: string, token: string, body: any): Promise<{ status: number; bodyText: string }> {
  const resp = await fetch(`${baseUrl}/command`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify(body),
  });
  return { status: resp.status, bodyText: await resp.text() };
}

describe('pair-agent over tunnel surface — gate fires on the right surface only', () => {
  let daemon: DaemonHandle;

  beforeAll(async () => {
    daemon = await spawnDaemonWithTunnel();
  }, 30_000);

  afterAll(() => {
    if (daemon) killDaemon(daemon);
  });

  test('newtab on tunnel surface passes the allowlist gate (not 403 disallowed_command)', async () => {
    const { status, bodyText } = await postCommand(daemon.tunnelUrl, daemon.scopedToken, { command: 'newtab' });
    // Browser dispatch under BROWSE_HEADLESS_SKIP=1 will fail differently
    // (no Playwright context), but the gate must NOT 403 with
    // disallowed_command.
    if (status === 403) {
      expect(bodyText).not.toContain('disallowed_command:newtab');
      expect(bodyText).not.toContain('is not allowed over the tunnel surface');
    }
  });

  test('pair on tunnel surface 403s with disallowed_command and writes a denial-log entry', async () => {
    // Snapshot attempts.jsonl size before the call so we can detect the new entry.
    let beforeBytes = 0;
    try { beforeBytes = fs.statSync(daemon.attemptsLogPath).size; } catch {}

    const { status, bodyText } = await postCommand(daemon.tunnelUrl, daemon.scopedToken, { command: 'pair' });
    expect(status).toBe(403);
    expect(bodyText).toContain('is not allowed over the tunnel surface');

    // Wait briefly for the denial-log writer (it's synchronous fs.appendFile in
    // tunnel-denial-log.ts but the OS may need a tick to flush).
    await new Promise(r => setTimeout(r, 250));
    expect(fs.existsSync(daemon.attemptsLogPath)).toBe(true);
    const after = fs.readFileSync(daemon.attemptsLogPath, 'utf-8');
    const newSection = after.slice(beforeBytes);
    expect(newSection).toContain('disallowed_command:pair');
  });

  test('pair on local surface does NOT trigger the tunnel allowlist gate', async () => {
    // The same scoped token over the LOCAL listener must not see the
    // disallowed_command path — the tunnel gate is surface-scoped.
    const { status, bodyText } = await postCommand(daemon.localUrl, daemon.scopedToken, { command: 'pair' });
    // Whatever happens (404 unknown command, 403 from a token-scope check, or
    // 200 if the local handler accepts it) the response must NOT come from the
    // tunnel allowlist gate.
    expect(bodyText).not.toContain('disallowed_command:pair');
    expect(bodyText).not.toContain('is not allowed over the tunnel surface');
    expect([200, 400, 403, 404, 500]).toContain(status);
  });

  test('catch-22 regression: newtab + goto on the just-created tab passes ownership check', async () => {
    // Without the `command !== 'newtab'` exemption at server.ts:613, scoped
    // agents can't open a tab (newtab fails ownership) and can't goto an
    // existing tab (also fails ownership). This proves the exemption holds:
    // newtab succeeds the gate AND the ownership check, then the agent can
    // hand off the tabId to a follow-up command without hitting the
    // "Tab not owned by your agent" error.
    const newtabResp = await postCommand(daemon.tunnelUrl, daemon.scopedToken, { command: 'newtab' });
    if (newtabResp.status === 403) {
      expect(newtabResp.bodyText).not.toContain('disallowed_command');
      expect(newtabResp.bodyText).not.toContain('Tab not owned by your agent');
    }

    // Even if the headless-skip dispatch fails before returning a tabId, a
    // follow-up `goto` over the tunnel surface must not 403 with
    // `disallowed_command:goto`. We are NOT asserting that the goto
    // succeeds — only that the allowlist + ownership exemption don't reject
    // it as a class.
    const gotoResp = await postCommand(daemon.tunnelUrl, daemon.scopedToken, { command: 'goto', args: ['http://127.0.0.1:1/'] });
    expect(gotoResp.bodyText).not.toContain('disallowed_command:goto');
    expect(gotoResp.bodyText).not.toContain('is not allowed over the tunnel surface');
  });
});