From c0a80ee77cce3d63ad27dc0450fa7bd998aaa74d Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Sat, 21 Mar 2026 12:52:33 -0700 Subject: [PATCH] docs: document real browser mode + Chrome extension in BROWSER.md and README.md BROWSER.md: new sections for connect/disconnect/focus commands, Chrome extension Side Panel install, CDP-aware skills, activity streaming. Updated command reference table, key components, env vars, source map. README.md: updated /browse description, added "Real browser mode" to What's New section. Co-Authored-By: Claude Opus 4.6 (1M context) --- BROWSER.md | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ README.md | 4 +++- 2 files changed, 68 insertions(+), 1 deletion(-) diff --git a/BROWSER.md b/BROWSER.md index b024cdd4..00ca899e 100644 --- a/BROWSER.md +++ b/BROWSER.md @@ -18,6 +18,7 @@ This document covers the command reference and internals of gstack's headless br | Cookies | `cookie-import`, `cookie-import-browser` | Import cookies from file or real browser | | Multi-step | `chain` (JSON from stdin) | Batch commands in one call | | Handoff | `handoff [reason]`, `resume` | Switch to visible Chrome for user takeover | +| Real browser | `connect`, `disconnect`, `focus` | Control real Chrome, visible window | All selector arguments accept CSS selectors, `@e` refs after `snapshot`, or `@c` refs after `snapshot -C`. 50+ commands total plus cookie import. @@ -70,6 +71,8 @@ browse/ │ ├── cookie-import-browser.ts # Decrypt + import cookies from real Chromium browsers │ ├── cookie-picker-routes.ts # HTTP routes for interactive cookie picker UI │ ├── cookie-picker-ui.ts # Self-contained HTML/CSS/JS for cookie picker +│ ├── chrome-launcher.ts # Browser discovery, CDP probe, runtime detection +│ ├── activity.ts # Activity streaming (SSE) for Chrome extension │ └── buffers.ts # CircularBuffer + console/network/dialog capture ├── test/ # Integration tests + HTML fixtures └── dist/ @@ -124,6 +127,64 @@ The server hooks into Playwright's `page.on('console')`, `page.on('response')`, The `console`, `network`, and `dialog` commands read from the in-memory buffers, not disk. +### Real browser mode (`connect`) + +Instead of headless Chromium, `connect` launches your real Chrome as a headed window controlled by Playwright. You see everything Claude does in real time. + +```bash +$B connect # launch real Chrome, headed +$B goto https://app.com # navigates in the visible window +$B snapshot -i # refs from the real page +$B click @e3 # clicks in the real window +$B focus # bring Chrome window to foreground (macOS) +$B status # shows Mode: cdp +$B disconnect # back to headless mode +``` + +The window has a subtle green shimmer line at the top edge and a floating "gstack" pill in the bottom-right corner so you always know which Chrome window is being controlled. + +**How it works:** Playwright's `channel: 'chrome'` launches your system Chrome binary via a native pipe protocol — not CDP WebSocket. All existing browse commands work unchanged because they go through Playwright's abstraction layer. + +**When to use it:** +- QA testing where you want to watch Claude click through your app +- Design review where you need to see exactly what Claude sees +- Debugging where headless behavior differs from real Chrome +- Demos where you're sharing your screen + +**Commands:** + +| Command | What it does | +|---------|-------------| +| `connect` | Launch real Chrome, restart server in headed mode | +| `disconnect` | Close real Chrome, restart in headless mode | +| `focus` | Bring Chrome to foreground (macOS). `focus @e3` also scrolls element into view | +| `status` | Shows `Mode: cdp` when connected, `Mode: launched` when headless | + +**CDP-aware skills:** When in real-browser mode, `/qa` and `/design-review` automatically skip cookie import prompts and headless workarounds. + +### Chrome extension (Side Panel) + +A Manifest V3 Chrome extension shows live browse activity in a Side Panel: + +``` +extension/ + manifest.json — Manifest V3, sidePanel permission + background.js — polls /health, relays refs, badge status + sidepanel.html/js — live activity feed, dark theme + popup.html/js — port config, connection status + content.js/css — @ref overlays + connection pill +``` + +**Install:** `chrome://extensions` → Developer mode → Load unpacked → select `extension/` + +**Setup:** Click the gstack icon → enter the browse server port (shown by `$B status` or in `.gstack/browse.json`). + +**Features:** +- Toolbar badge: green when connected, gray when not +- Side Panel: live feed of every browse command and result +- @ref overlays: floating panel showing current refs on the page +- Connection pill: subtle indicator on every page when connected + ### User handoff When the headless browser can't proceed (CAPTCHA, MFA, complex auth), `handoff` opens a visible Chrome window at the exact same page with all cookies, localStorage, and tabs preserved. The user solves the problem manually, then `resume` returns control to the agent with a fresh snapshot. @@ -171,6 +232,8 @@ No port collisions. No shared state. Each project is fully isolated. | `BROWSE_IDLE_TIMEOUT` | 1800000 (30 min) | Idle shutdown timeout in ms | | `BROWSE_STATE_FILE` | `.gstack/browse.json` | Path to state file (CLI passes to server) | | `BROWSE_SERVER_SCRIPT` | auto-detected | Path to server.ts | +| `BROWSE_CDP_URL` | (none) | Set to `channel:chrome` for real browser mode | +| `BROWSE_CDP_PORT` | 0 | CDP port (used internally) | ### Performance @@ -250,6 +313,8 @@ Tests spin up a local HTTP server (`browse/test/test-server.ts`) serving HTML fi | `browse/src/cookie-import-browser.ts` | Decrypt Chromium cookies via macOS Keychain + PBKDF2/AES-128-CBC. Auto-detects installed browsers. | | `browse/src/cookie-picker-routes.ts` | HTTP routes for `/cookie-picker/*` — browser list, domain search, import, remove. | | `browse/src/cookie-picker-ui.ts` | Self-contained HTML generator for the interactive cookie picker (dark theme, no frameworks). | +| `browse/src/chrome-launcher.ts` | Browser binary discovery, CDP port probe, runtime detection (Conductor/Claude Code/Codex/terminal). | +| `browse/src/activity.ts` | Activity streaming — `ActivityEntry` type, `CircularBuffer`, privacy filtering, SSE subscriber management. | | `browse/src/buffers.ts` | `CircularBuffer` (O(1) ring buffer) + console/network/dialog capture with async disk flush. | ### Deploying to the active skill diff --git a/README.md b/README.md index 07047797..b84fb67e 100644 --- a/README.md +++ b/README.md @@ -142,7 +142,7 @@ One sprint, one person, one feature — that takes about 30 minutes with gstack. | `/ship` | **Release Engineer** | Sync main, run tests, audit coverage, push, open PR. Bootstraps test frameworks if you don't have one. One command. | | `/document-release` | **Technical Writer** | Update all project docs to match what you just shipped. Catches stale READMEs automatically. | | `/retro` | **Eng Manager** | Team-aware weekly retro. Per-person breakdowns, shipping streaks, test health trends, growth opportunities. | -| `/browse` | **QA Engineer** | Give the agent eyes. Real Chromium browser, real clicks, real screenshots. ~100ms per command. | +| `/browse` | **QA Engineer** | Give the agent eyes. Real Chromium browser, real clicks, real screenshots. ~100ms per command. `$B connect` launches your real Chrome as a headed window — watch every action live. | | `/setup-browser-cookies` | **Session Manager** | Import cookies from your real browser (Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages. | ### Power tools @@ -172,6 +172,8 @@ One sprint, one person, one feature — that takes about 30 minutes with gstack. **`/document-release` is the engineer you never had.** It reads every doc file in your project, cross-references the diff, and updates everything that drifted. README, ARCHITECTURE, CONTRIBUTING, CLAUDE.md, TODOS — all kept current automatically. And now `/ship` auto-invokes it — docs stay current without an extra command. +**Real browser mode.** `$B connect` launches your actual Chrome as a headed window controlled by Playwright. You watch Claude click, fill, and navigate in real time — same window, same screen. A subtle green shimmer at the top edge tells you which Chrome window gstack controls. All existing browse commands work unchanged. `$B disconnect` returns to headless. A Chrome extension Side Panel shows a live activity feed of every command. This is co-presence — Claude isn't remote-controlling a hidden browser, it's sitting next to you in the same cockpit. + **Browser handoff when the AI gets stuck.** Hit a CAPTCHA, auth wall, or MFA prompt? `$B handoff` opens a visible Chrome at the exact same page with all your cookies and tabs intact. Solve the problem, tell Claude you're done, `$B resume` picks up right where it left off. The agent even suggests it automatically after 3 consecutive failures. **Multi-AI second opinion.** `/codex` gets an independent review from OpenAI's Codex CLI — a completely different AI looking at the same diff. Three modes: code review with a pass/fail gate, adversarial challenge that actively tries to break your code, and open consultation with session continuity. When both `/review` (Claude) and `/codex` (OpenAI) have reviewed the same branch, you get a cross-model analysis showing which findings overlap and which are unique to each.