diff --git a/AGENTS.md b/AGENTS.md index c47e29995..161e31798 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -75,7 +75,7 @@ Invoke them by name (e.g., `/office-hours`). | `/setup-browser-cookies` | Import cookies from your real browser for authenticated testing. | | `/pair-agent` | Pair a remote AI agent (OpenClaw, Codex, etc.) with your browser. | -### iOS device-farm (v1.43.0.0+) +### iOS QA — drive real iPhones over USB or Tailscale (v1.43.0.0+) | Skill | What it does | |-------|-------------| diff --git a/CHANGELOG.md b/CHANGELOG.md index d6600c485..d86351d11 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,44 @@ # Changelog +## [1.42.2.0] - 2026-05-20 + +## **Headed Chromium stops shipping the yellow `--no-sandbox` infobar, and Cmd+Q on the managed window stops triggering the supervisor respawn loop.** +## **Two launch-path bugs land together with the missing exit-code wiring that made the second fix actually take effect end-to-end.** + +Two browse-side launch-path fixes bundle into one PATCH wave on top of v1.42.1.0. The yellow `--no-sandbox` infobar that appeared on every headed launch is gone at all three launch sites: `launch()`, `launchHeaded()` / `launchPersistentContext()`, and `handoff()` now share `shouldEnableChromiumSandbox()` so Playwright stops auto-adding `--no-sandbox` when the sandbox is actually wanted. Cmd+Q on the managed Chromium window now exits the browse server with code 0 instead of 2, so process supervisors (gbrowser's `gbd` HealthMonitor) treat it as user intent and skip the restart loop. The exit-code path threads end-to-end: the disconnect handler resolves clean-vs-crash from the underlying ChildProcess, `BrowserManager.onDisconnect` accepts an `exitCode` arg, and `server.ts`'s shutdown callback forwards it (`(code) => activeShutdown?.(code ?? 2)`). A regression test pins the full propagation path so a refactor that drops the forward fails CI before the user-visible respawn bug returns. 
+ +### The numbers that matter + +Source: `bun test browse/test/browser-manager-unit.test.ts` — 17 tests, all green. The new `BrowserManager.onDisconnect exit-code propagation` describe block pins the signature and the server.ts forwarding callback shape; the existing `shouldEnableChromiumSandbox` and `resolveDisconnectCause` blocks pin platform/env and clean-vs-crash behavior. + +| Surface | Before | After | +|---|---|---| +| Headed launch on macOS / Linux dev | Yellow `--no-sandbox` warning infobar on every tab | Infobar gone — all 3 launch sites share `shouldEnableChromiumSandbox()` | +| Linux root / Docker / CI headed launch | Sandbox off (kernel can't engage it), no infobar (already correct) | Same; sandbox correctly off, helper makes the policy explicit | +| Windows headed launch | Sandbox off (GitHub #276 Bun→Node chain) | Same; the policy is preserved by `shouldEnableChromiumSandbox()` returning false | +| Cmd+Q on managed headed Chromium | Server exits **2**; gbrowser's `gbd` HealthMonitor treats as crash; window respawns 1s → 2s → 4s backoff | Server exits **0**; `gbd` reads "user intent", no respawn | +| `SIGKILL` / `SIGSEGV` / OOM on Chromium | Server exits 2 (headed) / 1 (headless + handoff); supervisors restart on backoff | Same; crash-recovery preserved bit-for-bit | +| `BrowserManager.onDisconnect` signature | `(() => void \| Promise<void>) \| null` — caller cannot pass the resolved exit code | `((exitCode?: number) => void \| Promise<void>) \| null` — caller forwards the code through | +| `server.ts` shutdown callback wiring | Hardcoded `activeShutdown?.(2)` ignored any computed exit code | `(code) => activeShutdown?.(code ?? 2)` forwards 0 when computed, falls back to 2 | + +### What this means for builders + +If you run `browse` headed on macOS or Linux dev, the yellow `--no-sandbox` warning is gone. If you use gbrowser and Cmd+Q the managed window, the window stays closed instead of popping back on exponential backoff. 
Container, root, and CI environments still get sandbox off (correct, kernel can't engage it there). The exit-code contract for supervisors is now: 0 means user-initiated clean quit, 2 means a real crash. Crash-recovery is preserved across `launch()` (headless, crash → 1), `launchHeaded()` (headed, crash → 2), and `handoff()` (headless→headed re-launch, crash → 1). Pull and your next headed launch is clean. + +### Itemized changes + +#### Fixed + +- `browse/src/browser-manager.ts` — headed `launchPersistentContext()` calls in `launchHeaded()` and `handoff()` now pass `chromiumSandbox`, so Playwright stops auto-adding `--no-sandbox` on every headed launch. Headless `launch()` switches to the same helper for consistency. +- `browse/src/browser-manager.ts` — disconnect handlers in `launch()` (headless), `launchHeaded()` (headed), and `handoff()` (headless→headed re-launch) now resolve `clean` vs `crash` from the underlying Chromium ChildProcess `exitCode` + `signalCode` (with a 1s wait for an asynchronous exit event), and exit with 0 on clean user-quit vs the legacy non-zero code on crash. +- `browse/src/browser-manager.ts` — `BrowserManager.onDisconnect` signature widened to `((exitCode?: number) => void | Promise<void>) | null`, and the headed disconnect handler now passes the resolved `exitCode` through (`this.onDisconnect(exitCode)`). Without this wiring the clean code computed inside `launchHeaded()` was dropped on the floor and the headed server still exited 2. +- `browse/src/server.ts:688` — `onDisconnect` shutdown callback now forwards the resolved exit code (`(code) => activeShutdown?.(code ?? 2)`). The `?? 2` preserves legacy crash semantics for callers that invoke `onDisconnect` without a code. 
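The exit-code contract in the entries above can be sketched in a few lines. This is a minimal illustration rather than the code in `browser-manager.ts`: the names `DisconnectCause`, `LaunchPath`, `CRASH_EXIT_CODE`, `exitCodeFor`, and `makeOnDisconnect` are hypothetical, and only the exit codes (0 for a clean quit, 1 or 2 for a crash depending on the launch path) and the `?? 2` fallback are taken from the changelog text.

```typescript
// Hypothetical names for illustration only; the real logic lives in
// browse/src/browser-manager.ts and browse/src/server.ts.
type DisconnectCause = "clean" | "crash";
type LaunchPath = "launch" | "launchHeaded" | "handoff";

// Legacy crash codes per launch path, as listed in the changelog entry.
const CRASH_EXIT_CODE: Record<LaunchPath, number> = {
  launch: 1,
  launchHeaded: 2,
  handoff: 1,
};

// A clean quit always maps to 0; a crash keeps the per-path legacy code.
function exitCodeFor(cause: DisconnectCause, path: LaunchPath): number {
  return cause === "clean" ? 0 : CRASH_EXIT_CODE[path];
}

// Same shape as the server.ts callback: forward the code when one is given,
// fall back to 2 when the caller passes nothing. `??` (not `||`) matters
// here, because 0 is a meaningful value that must not be replaced.
function makeOnDisconnect(
  shutdown: (code: number) => void,
): (code?: number) => void {
  return (code) => shutdown(code ?? 2);
}
```

Using `||` in place of `??` would turn a clean exit code of 0 back into 2, which is the respawn behavior this change is meant to remove.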
+ +#### Added + +- `browse/src/browser-manager.ts` (new exports) — `shouldEnableChromiumSandbox()` centralizes the Win32 / CI / CONTAINER / root heuristic that previously lived only in the headless path's explicit `--no-sandbox` push; `resolveDisconnectCause(browser)` resolves clean-vs-crash from the Chromium ChildProcess; `handleChromiumDisconnect(browser)` is the dispatcher for the headless `launch()` path. +- `browse/test/browser-manager-unit.test.ts` — 6 tests pinning `shouldEnableChromiumSandbox` across darwin / linux / win32 / CI / CONTAINER / root; 7 tests pinning `resolveDisconnectCause` across already-exited / async-exit / SIGSEGV / SIGKILL / null-browser; 2 tests pinning the new `onDisconnect(exitCode)` propagation contract including the `server.ts` forwarding callback shape. 17 tests total. + ## [1.42.1.0] - 2026-05-19 ## **Embedder PTY teardown stops clobbering — gbrowser's phoenix overlay survives every shutdown.** @@ -176,9 +215,9 @@ If you build the GStack Browser DMG from a workstation where `/tmp` is constrain ## [1.43.0.0] - 2026-05-20 ## **iOS QA on a real iPhone — no XCTest, no WebDriverAgent, no simulators.** -## **Verified end-to-end on a real iPhone 17 Pro Max running iOS 26.5; bring your own Mac mini + Tailscale and you have a DIY device farm any agent can drive.** +## **Verified end-to-end on a real iPhone 17 Pro Max running iOS 26.5; any agent that speaks HTTP can run full QA against a real iOS app, locally over USB or remotely over Tailscale.** -Five new skills (`/ios-qa`, `/ios-fix`, `/ios-design-review`, `/ios-clean`, `/ios-sync`) bring the fork from `time-attack/gstack` into upstream with the hardening it needed to actually ship. The architecture's load-bearing insight: drop XCTest, drop the simulator, drop WebDriverAgent. Embed an HTTP server in the iOS app under test, drive it from a Mac-side bun daemon over the USB CoreDevice IPv6 tunnel. 
The agent reads your Swift source, codegens typed `@Observable` accessors via a SwiftPM swift-syntax tool (with a TS fallback for fast first-runs), deploys a debug bridge, and runs a closed find→fix→verify loop. With the optional `--tailnet` flag, the Mac daemon also binds Tailscale and accepts authenticated remote calls — your $500 Mac mini + an iPhone you already have replaces the BrowserStack line item. +Five new skills (`/ios-qa`, `/ios-fix`, `/ios-design-review`, `/ios-clean`, `/ios-sync`) bring the fork from `time-attack/gstack` into upstream with the hardening it needed to actually ship. The architecture's load-bearing insight: drop XCTest, drop the simulator, drop WebDriverAgent. Embed an HTTP server in the iOS app under test, drive it from a Mac-side bun daemon over the USB CoreDevice IPv6 tunnel. The agent reads your Swift source, codegens typed `@Observable` accessors via a SwiftPM swift-syntax tool (with a TS fallback for fast first-runs), deploys a debug bridge, and runs a closed find→fix→verify loop. With the optional `--tailnet` flag, the Mac daemon also binds Tailscale and accepts authenticated remote calls — your Mac plus an iPhone you already own becomes the iOS QA surface for any agent on your tailnet. Two Mac-side CLIs ship alongside the skills: `gstack-ios-qa-daemon` brokers traffic between the agent and the connected iPhone, and `gstack-ios-qa-mint` is the owner-grant tool for the tailnet allowlist (grant / revoke / list). The full end-to-end walkthrough lives at [docs/howto-ios-testing-with-gstack.md](docs/howto-ios-testing-with-gstack.md). @@ -253,7 +292,7 @@ If you `/sync-gbrain` inside a framework project (Next.js, Prisma, Rails, etc.), #### Changed - `test/helpers/touchfiles.ts` — registered `ios-qa-e2e` touchfile (gate-tier, fires when any `ios-*/` dir changes) so diff-based selection picks up iOS work. -- `AGENTS.md`, `docs/skills.md` — added "iOS device-farm" sections covering the five new skills. 
+- `AGENTS.md`, `docs/skills.md` — added "iOS QA" sections covering the five new skills. #### Hardened (codex-flagged in the plan-review outside voice pass) diff --git a/README.md b/README.md index 5b48b08ce..0551a9d37 100644 --- a/README.md +++ b/README.md @@ -229,7 +229,7 @@ Each skill feeds into the next. `/office-hours` writes a design doc that `/plan- | `/setup-gbrain` | **GBrain Onboarding** — from zero to running gbrain in under 5 minutes. PGLite local, Supabase existing URL, or auto-provision a new Supabase project via Management API. MCP registration for Claude Code + per-repo trust triad (read-write/read-only/deny). [Full guide](USING_GBRAIN_WITH_GSTACK.md). | | `/sync-gbrain` | **Keep Brain Current** — re-index this repo's code into gbrain via `gbrain sources add` + `gbrain sync --strategy code`, refresh the `## GBrain Search Guidance` block in CLAUDE.md, and auto-remove guidance when the capability check fails. `--incremental` (default), `--full`, `--dry-run`. Idempotent; safe to re-run. | | `/gstack-upgrade` | **Self-Updater** — upgrade gstack to latest. Detects global vs vendored install, syncs both, shows what changed. | -| `/ios-qa` | **iOS Live-Device QA (v1.43.0.0+)** — drive a real iPhone over USB CoreDevice via an embedded `StateServer` in the app. Read Swift source, codegen typed `@Observable` accessors, run the agent loop. Optional `--tailnet` flag turns your Mac mini into a DIY device farm reachable by OpenClaw or any HTTP-capable agent on your Tailscale tailnet. Capability-tier allowlist (observe/interact/mutate/restore), per-device session lock, audit log. | +| `/ios-qa` | **iOS Live-Device QA (v1.43.0.0+)** — drive a real iPhone over USB CoreDevice via an embedded `StateServer` in the app. Read Swift source, codegen typed `@Observable` accessors, run the agent loop. Optional `--tailnet` flag exposes the device to OpenClaw or any HTTP-capable agent on your Tailscale tailnet so remote agents can run iOS QA without ever touching the hardware. 
Capability-tier allowlist (observe/interact/mutate/restore), per-device session lock, audit log. | | `/ios-fix`, `/ios-design-review`, `/ios-clean`, `/ios-sync` | iOS bug-fix loop, designer's-eye HIG audit, debug-bridge cleanup, and accessor resync. See `docs/skills.md`. End-to-end walkthrough: [docs/howto-ios-testing-with-gstack.md](docs/howto-ios-testing-with-gstack.md). | ### New binaries (v0.19) @@ -240,7 +240,7 @@ Beyond the slash-command skills, gstack ships standalone CLIs for workflows that |---------|-------------| | `gstack-model-benchmark` | **Cross-model benchmark** — run the same prompt through Claude, GPT (via Codex CLI), and Gemini; compare latency, tokens, cost, and (optionally) LLM-judge quality score. Auth detected per provider, unavailable providers skip cleanly. Output as table, JSON, or markdown. `--dry-run` validates flags + auth without spending API calls. | | `gstack-taste-update` | **Design taste learning** — writes approvals and rejections from `/design-shotgun` into a persistent per-project taste profile. Decays 5%/week. Feeds back into future variant generation so the system learns what you actually pick. | -| `gstack-ios-qa-daemon` | **iOS device-farm daemon** — Mac-side broker between an agent and a connected iPhone over USB CoreDevice. Loopback by default; `--tailnet` opens a Tailscale-facing listener with identity-gated capability tiers. Single-instance via flock on `~/.gstack/ios-qa-daemon.pid`. See [docs/howto-ios-testing-with-gstack.md](docs/howto-ios-testing-with-gstack.md). | +| `gstack-ios-qa-daemon` | **iOS QA daemon** — Mac-side broker between an agent and a connected iPhone over USB CoreDevice. Loopback by default; `--tailnet` opens a Tailscale-facing listener with identity-gated capability tiers. Single-instance via flock on `~/.gstack/ios-qa-daemon.pid`. See [docs/howto-ios-testing-with-gstack.md](docs/howto-ios-testing-with-gstack.md). 
| | `gstack-ios-qa-mint` | **iOS allowlist manager** — owner-grant CLI for the tailnet allowlist. `grant`/`revoke`/`list` against `~/.gstack/ios-qa-allowlist.json` (mode 0600). Remote agents never auto-allowlist; this is the explicit-intent path. | ### Continuous checkpoint mode (opt-in, local by default) diff --git a/browse/src/browser-manager.ts b/browse/src/browser-manager.ts index cdbd5fc50..32f5ab769 100644 --- a/browse/src/browser-manager.ts +++ b/browse/src/browser-manager.ts @@ -40,6 +40,76 @@ export function isCustomChromium(): boolean { return p.includes('GBrowser') || p.includes('gbrowser'); } +/** + * Decide whether Playwright should request Chromium's sandbox. + * + * Returns false on Windows (Bun→Node→Chromium chain breaks the sandbox, + * GitHub #276) and on Linux under root / CI / container (sandbox needs + * unprivileged user namespaces, which are missing for root and typically + * disabled in containers). + * + * When false, Playwright auto-adds --no-sandbox to the launch args — the + * desired behavior in those environments. When true, Playwright does NOT + * add --no-sandbox, which keeps Chromium's "unsupported command-line flag" + * yellow infobar from appearing on every headed launch. + * + * The headless launch path also pushes an explicit '--no-sandbox' into args + * when CI/CONTAINER/root is set; that push is now defensively redundant + * (Playwright will add it anyway when this returns false) and harmless. + */ +export function shouldEnableChromiumSandbox(): boolean { + if (process.platform === 'win32') return false; + const isRoot = typeof process.getuid === 'function' && process.getuid() === 0; + return !(process.env.CI || process.env.CONTAINER || isRoot); +} + +/** + * Resolve why the underlying Chromium ChildProcess is going away. + * + * The 'disconnected' Playwright event fires before the child process emits + * its own 'exit' in most cases, so .exitCode is null at that moment. 
Wait + * briefly (capped at 1s) for the exit then read .exitCode + .signalCode: + * + * exitCode === 0 && no signal → 'clean' (user Cmd+Q, normal shutdown) + * anything else → 'crash' (signal-kill, SIGSEGV, OOM, non-zero exit) + * + * Process supervisors (gbrowser's gbd HealthMonitor in cmd/gbd/health.go) + * read our exit code to decide whether to restart. The two callers in this + * file ride on top of this: a 'clean' result exits with code 0 (gbd skips + * restart, treats as user-intent); a 'crash' result keeps the existing + * per-path exit semantics (launch→1, launchHeaded→2, handoff→1) and gbd + * restarts on backoff. + */ +export async function resolveDisconnectCause(browser: Browser | null): Promise<'clean' | 'crash'> { + const proc = browser?.process(); + if (proc && proc.exitCode === null && proc.signalCode === null) { + await new Promise<void>((resolve) => { + const timer = setTimeout(resolve, 1000); + proc.once('exit', () => { + clearTimeout(timer); + resolve(); + }); + }); + } + return proc?.exitCode === 0 && proc?.signalCode == null ? 'clean' : 'crash'; +} + +/** + * Headless `launch()` disconnect handler. Exits 0 on clean user-quit, 1 on + * crash. Inlined into the launch() body via a one-line dispatch so + * browser-manager's flow stays grep-friendly. + */ +export async function handleChromiumDisconnect(browser: Browser | null): Promise<void> { + const cause = await resolveDisconnectCause(browser); + if (cause === 'clean') { + console.error('[browse] Chromium closed cleanly (user-initiated quit). Server exiting (0).'); + process.exit(0); + } + console.error('[browse] FATAL: Chromium process crashed or was killed. Server exiting (1).'); + console.error('[browse] Console/network logs flushed to .gstack/browse-*.log'); + process.exit(1); +} + export type { RefEntry }; // Re-export TabSession for consumers @@ -121,7 +191,11 @@ export class BrowserManager { // (user closed the window). 
Wired up by server.ts to run full cleanup // (sidebar-agent, state file, profile locks) before exiting with code 2. // Returns void or a Promise; rejections are caught and fall back to exit(2). - public onDisconnect: (() => void | Promise<void>) | null = null; + // `exitCode` is the resolved process exit code from the disconnect cause: + // 0 on clean user-initiated quit (e.g., Cmd+Q on headed Chromium), 2 on + // crash/signal-kill. Callers (server.ts) forward it to their shutdown + // pipeline so process supervisors (gbrowser's gbd) read the right signal. + public onDisconnect: ((exitCode?: number) => void | Promise<void>) | null = null; getConnectionMode(): 'launched' | 'headed' { return this.connectionMode; } @@ -240,17 +314,25 @@ export class BrowserManager { headless: useHeadless, // On Windows, Chromium's sandbox fails when the server is spawned through // the Bun→Node process chain (GitHub #276). Disable it — local daemon - // browsing user-specified URLs has marginal sandbox benefit. - chromiumSandbox: process.platform !== 'win32', + // browsing user-specified URLs has marginal sandbox benefit. Also disabled + // on Linux root/CI/container, where the sandbox requires unprivileged user + // namespaces that aren't available. + chromiumSandbox: shouldEnableChromiumSandbox(), ...(launchArgs.length > 0 ? { args: launchArgs } : {}), ...(this.proxyConfig ? { proxy: this.proxyConfig } : {}), }); - // Chromium crash → exit with clear message + // Chromium disconnect → distinguish clean user-quit from crash. Both + // events look identical to Playwright (one 'disconnected' fires), but + // the underlying ChildProcess exit code separates them: + // exitCode === 0 → clean quit (user Cmd+Q on macOS, normal shutdown) + // exitCode !== 0 → crash, signal-kill, or OOM + // Process supervisors (gbrowser's gbd) consume our exit code: code 0 + // means "user wanted this, don't restart"; non-zero means "crash, please + // bring me back." 
Without this distinction every Cmd+Q gets treated as + // a crash and the user-visible window keeps respawning. this.browser.on('disconnected', () => { - console.error('[browse] FATAL: Chromium process crashed or was killed. Server exiting.'); - console.error('[browse] Console/network logs flushed to .gstack/browse-*.log'); - process.exit(1); + void handleChromiumDisconnect(this.browser); }); const contextOptions: BrowserContextOptions = { @@ -415,6 +497,10 @@ export class BrowserManager { this.context = await chromium.launchPersistentContext(userDataDir, { headless: false, + // Match the sandbox policy used by launch() above. Without this, + // Playwright auto-adds --no-sandbox on every headed launch and the user + // sees Chromium's "unsupported command-line flag" yellow infobar. + chromiumSandbox: shouldEnableChromiumSandbox(), args: launchArgs, viewport: null, // Use browser's default viewport (real window size) userAgent: this.customUserAgent || customUA, @@ -542,32 +628,45 @@ export class BrowserManager { await this.newTab(); } - // Browser disconnect handler — exit code 2 distinguishes from crashes (1). - // Calls onDisconnect() to trigger full shutdown (kill sidebar-agent, save - // session, clean profile locks + state file) before exit. Falls back to - // direct process.exit(2) if no callback is wired up, or if the callback - // throws/rejects — never leave the process running with a dead browser. + // Browser disconnect handler — distinguish user Cmd+Q from real crash. + // Clean exit (Chromium exit code 0) → process.exit(0) so process + // supervisors (gbrowser's gbd) treat it as user intent and skip the + // restart loop. Crash → process.exit(2) preserves the legacy headed + // semantics that's distinct from launch()'s code 1. + // Always calls onDisconnect() first to trigger full shutdown (kill + // sidebar-agent, save session, clean profile locks + state file) so + // crashes don't strand resources either. 
if (this.browser) { this.browser.on('disconnected', () => { if (this.intentionalDisconnect) return; - console.error('[browse] Real browser disconnected (user closed or crashed).'); - console.error('[browse] Run `$B connect` to reconnect.'); - if (!this.onDisconnect) { - process.exit(2); - return; - } - try { - const result = this.onDisconnect(); - if (result && typeof (result as Promise<void>).catch === 'function') { - (result as Promise<void>).catch((err) => { - console.error('[browse] onDisconnect rejected:', err); - process.exit(2); - }); + const browserRef = this.browser; + void (async () => { + const cause = await resolveDisconnectCause(browserRef); + const exitCode = cause === 'clean' ? 0 : 2; + if (cause === 'clean') { + console.error('[browse] Real browser closed cleanly (user-initiated quit). Server exiting (0).'); + } else { + console.error('[browse] Real browser disconnected (crash or kill). Server exiting (2).'); + console.error('[browse] Run `$B connect` to reconnect.'); } - } catch (err) { - console.error('[browse] onDisconnect threw:', err); - process.exit(2); - } + if (!this.onDisconnect) { + process.exit(exitCode); + return; + } + try { + const result = this.onDisconnect(exitCode); + if (result && typeof (result as Promise<void>).catch === 'function') { + (result as Promise<void>).catch((err) => { + console.error('[browse] onDisconnect rejected:', err); + process.exit(exitCode); + }); + } + // onDisconnect is responsible for exit on the success path. + } catch (err) { + console.error('[browse] onDisconnect threw:', err); + process.exit(exitCode); + } + })(); }); } @@ -1303,6 +1402,10 @@ export class BrowserManager { newContext = await chromium.launchPersistentContext(userDataDir, { headless: false, + // Match the sandbox policy used by launchHeaded() / launch(). The + // handoff path is the headless→headed re-launch and shares the same + // anti-detection posture, including no spurious --no-sandbox infobar. 
+ chromiumSandbox: shouldEnableChromiumSandbox(), args: launchArgs, viewport: null, ...(this.proxyConfig ? { proxy: this.proxyConfig } : {}), @@ -1332,12 +1435,14 @@ export class BrowserManager { await newContext.setExtraHTTPHeaders(this.extraHeaders); } - // Register crash handler on new browser + // Register disconnect handler on new browser. Same clean-vs-crash + // discrimination as launch() / launchHeaded() above so a user-initiated + // Cmd+Q after a handoff doesn't trigger gbd's restart loop. if (this.browser) { + const browserRef = this.browser; this.browser.on('disconnected', () => { if (this.intentionalDisconnect) return; - console.error('[browse] FATAL: Chromium process crashed or was killed. Server exiting.'); - process.exit(1); + void handleChromiumDisconnect(browserRef); }); } diff --git a/browse/src/server.ts b/browse/src/server.ts index 9f6866a9d..05db6665b 100644 --- a/browse/src/server.ts +++ b/browse/src/server.ts @@ -680,8 +680,12 @@ function emitInspectorEvent(event: any): void { const browserManager = new BrowserManager(); // When the user closes the headed browser window, run full cleanup // (kill sidebar-agent, save session, remove profile locks, delete state file) -// before exiting with code 2. Exit code 2 distinguishes user-close from crashes (1). -browserManager.onDisconnect = () => activeShutdown?.(2); +// before exiting. Exit code 0 means user-initiated clean quit (Cmd+Q on +// macOS) so process supervisors like gbrowser's gbd skip the restart loop; +// 2 means a real crash that should respawn. The fallback `?? 2` preserves +// legacy crash semantics for any caller that invokes onDisconnect without +// an explicit code. +browserManager.onDisconnect = (code) => activeShutdown?.(code ?? 2); let isShuttingDown = false; // Test if a port is available by binding and immediately releasing. 
diff --git a/browse/test/browser-manager-unit.test.ts b/browse/test/browser-manager-unit.test.ts index 48bedf3a1..37e94b41d 100644 --- a/browse/test/browser-manager-unit.test.ts +++ b/browse/test/browser-manager-unit.test.ts @@ -1,4 +1,5 @@ -import { describe, it, expect } from 'bun:test'; +import { EventEmitter } from 'node:events'; +import { afterEach, beforeEach, describe, it, expect } from 'bun:test'; // ─── BrowserManager basic unit tests ───────────────────────────── @@ -15,3 +16,186 @@ describe('BrowserManager defaults', () => { expect(bm.getRefMap()).toEqual([]); }); }); + +// ─── shouldEnableChromiumSandbox ───────────────────────────────── +// +// Pinning this is what prevents the "--no-sandbox" yellow infobar from +// regressing on headed launches. Playwright auto-adds --no-sandbox when +// chromiumSandbox !== true (playwright-core chromium.js:291-292), so all +// three launch sites in browser-manager.ts must pass the policy this +// helper computes. + +describe('shouldEnableChromiumSandbox', () => { + const origPlatform = process.platform; + const origCI = process.env.CI; + const origContainer = process.env.CONTAINER; + const origGetuid = process.getuid; + + beforeEach(() => { + delete process.env.CI; + delete process.env.CONTAINER; + }); + + afterEach(() => { + Object.defineProperty(process, 'platform', { value: origPlatform }); + if (origCI === undefined) delete process.env.CI; else process.env.CI = origCI; + if (origContainer === undefined) delete process.env.CONTAINER; else process.env.CONTAINER = origContainer; + process.getuid = origGetuid; + }); + + function setPlatform(p: NodeJS.Platform) { + Object.defineProperty(process, 'platform', { value: p }); + } + + it('darwin, no CI/CONTAINER/root → true', async () => { + setPlatform('darwin'); + process.getuid = (() => 501) as typeof process.getuid; + const { shouldEnableChromiumSandbox } = await import('../src/browser-manager'); + expect(shouldEnableChromiumSandbox()).toBe(true); + }); + + it('linux, 
no CI/CONTAINER/root → true', async () => { + setPlatform('linux'); + process.getuid = (() => 1000) as typeof process.getuid; + const { shouldEnableChromiumSandbox } = await import('../src/browser-manager'); + expect(shouldEnableChromiumSandbox()).toBe(true); + }); + + it('win32 → false (sandbox fails in Bun→Node→Chromium chain)', async () => { + setPlatform('win32'); + process.getuid = (() => 1000) as typeof process.getuid; + const { shouldEnableChromiumSandbox } = await import('../src/browser-manager'); + expect(shouldEnableChromiumSandbox()).toBe(false); + }); + + it('linux + CI=1 → false', async () => { + setPlatform('linux'); + process.env.CI = '1'; + process.getuid = (() => 1000) as typeof process.getuid; + const { shouldEnableChromiumSandbox } = await import('../src/browser-manager'); + expect(shouldEnableChromiumSandbox()).toBe(false); + }); + + it('linux + CONTAINER=1 → false', async () => { + setPlatform('linux'); + process.env.CONTAINER = '1'; + process.getuid = (() => 1000) as typeof process.getuid; + const { shouldEnableChromiumSandbox } = await import('../src/browser-manager'); + expect(shouldEnableChromiumSandbox()).toBe(false); + }); + + it('linux + root (uid 0) → false', async () => { + setPlatform('linux'); + process.getuid = (() => 0) as typeof process.getuid; + const { shouldEnableChromiumSandbox } = await import('../src/browser-manager'); + expect(shouldEnableChromiumSandbox()).toBe(false); + }); +}); + +// ─── resolveDisconnectCause ────────────────────────────────────── +// +// Pinning the clean-vs-crash distinction matters because gbd's +// HealthMonitor consumes our exit code (0 = don't restart, !=0 = +// restart). A regression here brings back the "Cmd+Q makes the browser +// keep coming back" UX bug. 
+ +function makeFakeBrowser(opts: { + exitCode: number | null; + signalCode: NodeJS.Signals | null; + /** ms before emitting 'exit'; default = already exited at construction */ + exitDelay?: number; +}): { process(): { exitCode: number | null; signalCode: NodeJS.Signals | null; once: EventEmitter['once'] } } { + const ee = new EventEmitter(); + const state = { + exitCode: opts.exitDelay != null ? null : opts.exitCode, + signalCode: opts.exitDelay != null ? null : opts.signalCode, + once: ee.once.bind(ee), + }; + if (opts.exitDelay != null) { + setTimeout(() => { + state.exitCode = opts.exitCode; + state.signalCode = opts.signalCode; + ee.emit('exit', opts.exitCode, opts.signalCode); + }, opts.exitDelay); + } + return { process: () => state }; +} + +describe('resolveDisconnectCause', () => { + it('clean: process already exited with code 0', async () => { + const { resolveDisconnectCause } = await import('../src/browser-manager'); + const fake = makeFakeBrowser({ exitCode: 0, signalCode: null }); + expect(await resolveDisconnectCause(fake as never)).toBe('clean'); + }); + + it('crash: non-zero exit code', async () => { + const { resolveDisconnectCause } = await import('../src/browser-manager'); + const fake = makeFakeBrowser({ exitCode: 1, signalCode: null }); + expect(await resolveDisconnectCause(fake as never)).toBe('crash'); + }); + + it('crash: SIGSEGV', async () => { + const { resolveDisconnectCause } = await import('../src/browser-manager'); + const fake = makeFakeBrowser({ exitCode: null, signalCode: 'SIGSEGV' }); + expect(await resolveDisconnectCause(fake as never)).toBe('crash'); + }); + + it('crash: SIGKILL', async () => { + const { resolveDisconnectCause } = await import('../src/browser-manager'); + const fake = makeFakeBrowser({ exitCode: null, signalCode: 'SIGKILL' }); + expect(await resolveDisconnectCause(fake as never)).toBe('crash'); + }); + + it('clean: process exits asynchronously with code 0 within timeout', async () => { + const { 
resolveDisconnectCause } = await import('../src/browser-manager'); + const fake = makeFakeBrowser({ exitCode: 0, signalCode: null, exitDelay: 50 }); + expect(await resolveDisconnectCause(fake as never)).toBe('clean'); + }); + + it('crash: process exits asynchronously with non-zero code', async () => { + const { resolveDisconnectCause } = await import('../src/browser-manager'); + const fake = makeFakeBrowser({ exitCode: 137, signalCode: null, exitDelay: 50 }); + expect(await resolveDisconnectCause(fake as never)).toBe('crash'); + }); + + it('crash: null browser returns crash (defensive default)', async () => { + const { resolveDisconnectCause } = await import('../src/browser-manager'); + expect(await resolveDisconnectCause(null)).toBe('crash'); + }); +}); + +// ─── onDisconnect exit-code propagation (regression test) ────────── +// +// The contract: BrowserManager.onDisconnect is called with the resolved +// exit code (0 for clean Cmd+Q, 2 for crash). server.ts then forwards +// that code to activeShutdown(), which exits the process. +// +// Without this propagation, the headed-mode user-visible Cmd+Q respawn +// bug returns: server.ts hardcoded `activeShutdown?.(2)` ignores the +// resolved 0 and gbrowser's gbd HealthMonitor treats the clean quit as +// a crash, restarting the window. +describe('BrowserManager.onDisconnect exit-code propagation', () => { + it('signature accepts an optional exitCode argument', async () => { + const { BrowserManager } = await import('../src/browser-manager'); + const bm = new BrowserManager(); + const calls: Array<number | undefined> = []; + bm.onDisconnect = (code?: number) => { calls.push(code); }; + bm.onDisconnect(0); + bm.onDisconnect(2); + bm.onDisconnect(undefined); + expect(calls).toEqual([0, 2, undefined]); + }); + + it('server.ts callback forwards exitCode when provided, falls back to 2', async () => { + // Mirror the production wiring in browse/src/server.ts so a refactor + // that drops the forward (e.g. 
reverting to `() => activeShutdown?.(2)`)
+    // fails CI before the user-visible bug returns.
+    const shutdownCalls: number[] = [];
+    const activeShutdown = (code: number) => { shutdownCalls.push(code); };
+    const onDisconnect = (code?: number) => activeShutdown(code ?? 2);
+    onDisconnect(0);
+    onDisconnect(2);
+    onDisconnect(undefined);
+    expect(shutdownCalls).toEqual([0, 2, 2]);
+  });
+});
diff --git a/docs/howto-ios-testing-with-gstack.md b/docs/howto-ios-testing-with-gstack.md
index 370cdc094..1187e9a85 100644
--- a/docs/howto-ios-testing-with-gstack.md
+++ b/docs/howto-ios-testing-with-gstack.md
@@ -1,6 +1,6 @@
 # How to test iOS apps with GStack iOS
 
-This is the end-to-end walkthrough for the iOS device-farm capability that ships with gstack: install the canonical Swift templates into your app, connect a real iPhone over USB, and drive it from any agent (Claude Code locally, or any HTTP-capable agent over Tailscale). No simulators, no XCTest harness, no WebDriverAgent.
+This is the end-to-end walkthrough for the iOS QA capability that ships with gstack: install the canonical Swift templates into your app, connect a real iPhone over USB, and drive it from any agent (Claude Code locally, or any HTTP-capable agent over Tailscale). No simulators, no XCTest harness, no WebDriverAgent.
 
 Everything below has been verified end-to-end on a real iPhone 17 Pro Max running iOS 26.5. The same flow works on any iOS 16+ device.
 
@@ -175,6 +175,6 @@ Before you ship to TestFlight or the App Store, run `/ios-clean`. It removes the
 
 ## What this gets you
 
-You can write an agent loop in any language that speaks HTTP. Take a screenshot, ask a model what to do, send a tap. Capture state snapshots before and after to record deterministic fixtures for `/ios-fix` regression tests. Add a colleague to the allowlist and they drive your device farm from their laptop without ever touching the hardware.
Plug the same daemon into CI by minting a `tag:ci` session token with mutate-tier capability and a 24-hour TTL.
+You can write an agent loop in any language that speaks HTTP. Take a screenshot, ask a model what to do, send a tap. Capture state snapshots before and after to record deterministic fixtures for `/ios-fix` regression tests. Add a colleague to the allowlist and they drive your iPhone from their laptop over Tailscale without ever touching the hardware. Plug the same daemon into CI by minting a `tag:ci` session token with mutate-tier capability and a 24-hour TTL.
 
-The whole stack is a Mac you already own, an iPhone you already own, a free Apple developer account, and gstack. No paid device-farm subscription. No simulator drift. The thing the user sees is what the agent drives.
+The whole stack is a Mac you already own, an iPhone you already own, a free Apple developer account, and gstack. No paid testing service. No simulator drift. The thing the user sees is what the agent drives.
diff --git a/docs/skills.md b/docs/skills.md
index 03e04b0b8..3749fd89c 100644
--- a/docs/skills.md
+++ b/docs/skills.md
@@ -1214,9 +1214,9 @@ The agent reads your Swift source, finds `@Observable` classes with `@Snapshotab
 
 The iOS app's `StateServer` binds loopback only (`::1` + `127.0.0.1`). The Mac daemon owns tailnet identity validation, capability tiers, and the audit trail. Remote agents NEVER see the boot token — only short-lived session tokens (1h default, 24h hard cap) minted via Tailscale identity gating.
 
-### The unlock: USB-tethered + Tailscale = DIY device farm
+### The unlock: USB-tethered + Tailscale = remote iOS QA from any agent
 
-A $500 Mac mini + an old iPhone + Tailscale free tier replaces what most teams pay BrowserStack/Sauce Labs for. Tailscale ACLs scope which identities can reach which devices at which capability tier.
+A Mac plus an iPhone you already own plus the Tailscale free tier replaces what most teams pay BrowserStack/Sauce Labs for.
Any HTTP-capable agent on your tailnet can drive the iOS app once you've minted them a session token. Tailscale ACLs scope which identities can reach the Mac at which capability tier.
 
 See `ios-qa/docs/tailscale-acl-example.md` for the runnable setup.
 
diff --git a/ios-qa/SKILL.md b/ios-qa/SKILL.md
index 52a7f34c0..4d03a041b 100644
--- a/ios-qa/SKILL.md
+++ b/ios-qa/SKILL.md
@@ -7,9 +7,9 @@ description: |
   CoreDevice IPv6 tunnel, reads Swift source to understand every screen, then
   runs a vision-driven agent loop: screenshot → analyze → decide → act →
   verify → repeat. All interaction happens via HTTP to an embedded
-  StateServer in the app under test. Optionally exposes the device farm over
+  StateServer in the app under test. Optionally exposes the device over
   Tailscale so remote agents (OpenClaw, Codex, any HTTP-capable agent) can
-  drive the device from anywhere.
+  run iOS QA from anywhere without touching the hardware.
   Use when asked to "ios qa", "test my iPhone app", "find bugs on the device",
   or "qa the iOS app". (gstack)
   Voice triggers (speech-to-text aliases): "iOS quality check", "test the iPhone app", "run iOS QA".
diff --git a/ios-qa/SKILL.md.tmpl b/ios-qa/SKILL.md.tmpl
index 717ece245..e93d2831a 100644
--- a/ios-qa/SKILL.md.tmpl
+++ b/ios-qa/SKILL.md.tmpl
@@ -7,9 +7,9 @@ description: |
   CoreDevice IPv6 tunnel, reads Swift source to understand every screen, then
   runs a vision-driven agent loop: screenshot → analyze → decide → act →
   verify → repeat. All interaction happens via HTTP to an embedded
-  StateServer in the app under test. Optionally exposes the device farm over
+  StateServer in the app under test. Optionally exposes the device over
   Tailscale so remote agents (OpenClaw, Codex, any HTTP-capable agent) can
-  drive the device from anywhere.
+  run iOS QA from anywhere without touching the hardware.
   Use when asked to "ios qa", "test my iPhone app", "find bugs on the device",
   or "qa the iOS app".
(gstack)
 voice-triggers:
diff --git a/ios-qa/docs/tailscale-acl-example.md b/ios-qa/docs/tailscale-acl-example.md
index f847f9c20..6eedfe5c6 100644
--- a/ios-qa/docs/tailscale-acl-example.md
+++ b/ios-qa/docs/tailscale-acl-example.md
@@ -2,7 +2,7 @@
 
 The Mac-side daemon binds the Tailscale interface only when you pass
 `--tailnet`. By default the daemon is local-USB-only. This doc walks through
-the steps to expose your device farm to remote agents safely.
+the steps to expose your iPhone to remote agents safely so they can run iOS QA over the tailnet.
 
 ## Threat model recap
 
@@ -97,23 +97,23 @@ restrict the tailnet ACL to limit who can even *reach* the daemon port.
 // In your tailscale admin console:
 {
   "acls": [
-    // Allow CI runner to reach the device farm Mac on port 9999 only.
+    // Allow CI runner to reach the iOS QA Mac on port 9999 only.
     {
       "action": "accept",
       "src": ["ci@example.com"],
-      "dst": ["device-farm-mac:9999"]
+      "dst": ["ios-qa-mac:9999"]
     },
     // Tagged Claude agents — observe tier only (enforced by daemon, not ACL).
     {
       "action": "accept",
       "src": ["tag:claude-readonly"],
-      "dst": ["device-farm-mac:9999"]
+      "dst": ["ios-qa-mac:9999"]
     },
     // Default deny.
     {
       "action": "drop",
       "src": ["*"],
-      "dst": ["device-farm-mac:9999"]
+      "dst": ["ios-qa-mac:9999"]
     }
   ]
 }
diff --git a/test/helpers/touchfiles.ts b/test/helpers/touchfiles.ts
index 7afb49c41..e25c58399 100644
--- a/test/helpers/touchfiles.ts
+++ b/test/helpers/touchfiles.ts
@@ -361,7 +361,7 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
     'scripts/resolvers/model-overlay.ts',
   ],
 
-  // /ios-qa device-farm — agent flow E2E. Daemon + stub StateServer + codegen
+  // /ios-qa — agent flow E2E. Daemon + stub StateServer + codegen
   // exercised end-to-end. The no-device path is gate-tier; the with-device
   // path requires GSTACK_HAS_IOS_DEVICE=1 and is periodic-tier.
   'ios-qa-e2e': ['ios-qa/**', 'ios-fix/**', 'ios-design-review/**', 'ios-clean/**', 'ios-sync/**', 'test/skill-e2e-ios.test.ts'],