docs: headed mode + sidebar agent documentation (v0.12.0)

- README: sidebar agent section, personal automation example (school parent
  portal), two auth paths (manual login + cookie import), DevTools MCP mention
- BROWSER.md: sidebar agent section with usage, timeout, session isolation,
  authentication, and random delay documentation
- connect-chrome template: add sidebar chat onboarding step
- CHANGELOG: v0.12.0 entry covering headed mode, sidebar agent, extension
- VERSION: bump to 0.12.0.0
- TODOS: Chrome DevTools MCP integration as P0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Garry Tan
2026-03-26 10:46:08 -06:00
parent 376b572c5a
commit ee1e33a607
7 changed files with 91 additions and 5 deletions
+28
@@ -217,6 +217,34 @@ Or do it manually:
- **Side Panel is empty:** The feed only shows activity after the extension connects. Run a browse command (`$B snapshot`) to see it appear.
- **Extension disappeared after Chrome update:** Sideloaded extensions persist across updates. If it's gone, reload it from Step 3.
### Sidebar agent
The Chrome side panel includes a chat interface. Type a message and a child Claude instance executes it in the browser. The sidebar agent has access to the `Bash`, `Read`, `Glob`, and `Grep` tools: the same set as Claude Code minus `Edit` and `Write`, so it cannot edit files directly, by design.
**How it works:**
1. You type a message in the side panel chat
2. The extension POSTs to the local browse server (`/sidebar-command`)
3. The server queues the message and the sidebar-agent process spawns `claude -p` with your message + the current page context
4. Claude executes browse commands via Bash (`$B snapshot`, `$B click @e3`, etc.)
5. Progress streams back to the side panel in real time
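The queue-and-run part of this flow can be sketched roughly as follows. This is a minimal illustration, not gstack's actual server code: `SidebarCommand`, `Runner`, `buildPrompt`, and `SidebarQueue` are hypothetical names, the page context is assumed to be just a URL, and the runner is injected so that spawning `claude -p` stays outside the sketch.

```typescript
// Hypothetical sketch: every name and shape below is illustrative, not gstack's actual code.

interface SidebarCommand {
  message: string; // text typed into the side panel chat
  pageUrl: string; // assumed form of the "current page context"
}

// Runs one prompt and resolves with the agent's output. A real runner might
// spawn `claude -p`; a test can pass a fake instead.
type Runner = (prompt: string) => Promise<string>;

// Step 3: combine the user's message with the page context into one prompt.
function buildPrompt(cmd: SidebarCommand): string {
  return `Current page: ${cmd.pageUrl}\n\nRequest: ${cmd.message}`;
}

class SidebarQueue {
  private pending: SidebarCommand[] = [];
  private draining = false;
  private run: Runner;
  private onProgress: (text: string) => void;

  constructor(run: Runner, onProgress: (text: string) => void) {
    this.run = run;
    this.onProgress = onProgress;
  }

  // Step 2: what a POST /sidebar-command handler could call.
  // The returned promise settles when the drain started by this call finishes,
  // or immediately if another drain is already in progress.
  enqueue(cmd: SidebarCommand): Promise<void> {
    this.pending.push(cmd);
    return this.drain();
  }

  // Steps 3 to 5: run queued commands one at a time, in arrival order.
  // This sketch reports one result per command; real streaming would emit
  // partial output as it arrives.
  private async drain(): Promise<void> {
    if (this.draining) return;
    this.draining = true;
    try {
      let cmd: SidebarCommand | undefined;
      while ((cmd = this.pending.shift()) !== undefined) {
        this.onProgress(await this.run(buildPrompt(cmd)));
      }
    } finally {
      this.draining = false;
    }
  }
}
```

Running commands strictly one at a time is a design assumption of the sketch. It keeps two chat messages from driving the same browser tab at once.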
**What you can do:**
- "Take a snapshot and describe what you see"
- "Click the Login button, fill in test@example.com / password123, and submit"
- "Go through every row in this table and extract the names and emails"
- "Navigate to Settings > Account and screenshot it"
**Timeout:** Each task gets up to 5 minutes. Multi-page workflows (navigating a directory, filling forms across pages) work within this window. If a task times out, the side panel shows an error and you can retry or break it into smaller steps.
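One way to enforce a per-task limit like this is to settle on whichever comes first, the task or a timer. The sketch below assumes a promise-based task. `TASK_TIMEOUT_MS` and `withTimeout` are hypothetical names chosen for illustration, and the source does not say how gstack actually implements its timeout.

```typescript
// Hypothetical sketch: the constant mirrors the documented 5-minute limit;
// the helper itself is illustrative, not gstack's actual code.
const TASK_TIMEOUT_MS = 5 * 60 * 1000;

// Resolve with the task's result, or reject once `ms` milliseconds have passed.
function withTimeout<T>(task: Promise<T>, ms: number = TASK_TIMEOUT_MS): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Task timed out after ${ms} ms`)),
      ms,
    );
    // Whichever settles first wins; clear the timer so it cannot keep the process alive.
    task.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

Note that rejecting the wrapper does not stop the underlying work. A real implementation would also need to terminate the child process when the timer fires.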
**Session isolation:** Each sidebar session runs in its own git worktree. The sidebar agent won't interfere with your main Claude Code session.
**Authentication:** The sidebar agent uses the same browser session as headed mode. Two options:
1. Log in manually in the headed browser; your session persists for the sidebar agent
2. Import cookies from your real Chrome via `/setup-browser-cookies`
**Random delays:** If you need the agent to pause between actions (e.g., to avoid rate limits), use `sleep` in bash or `$B wait <milliseconds>`.
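If the pause should vary between actions, a small helper can pick the duration. The sketch below is illustrative only: `randomDelayMs`, `waitCommand`, and `sleep` are hypothetical helpers, and the only thing taken from the documentation above is the `$B wait <milliseconds>` command form.

```typescript
// Hypothetical helpers for illustration; not part of gstack.

// Pick a whole number of milliseconds in [minMs, maxMs], both ends inclusive.
// `rng` must return a value in [0, 1), like Math.random.
function randomDelayMs(
  minMs: number,
  maxMs: number,
  rng: () => number = Math.random,
): number {
  if (minMs < 0 || maxMs < minMs) {
    throw new RangeError("expected 0 <= minMs <= maxMs");
  }
  return minMs + Math.floor(rng() * (maxMs - minMs + 1));
}

// Format the browse command that pauses for the chosen duration.
function waitCommand(ms: number): string {
  return `$B wait ${ms}`;
}

// In-process alternative: resolve after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}
```

For example, `waitCommand(randomDelayMs(500, 1500))` yields a command that pauses for somewhere between half a second and a second and a half.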
### User handoff
When the headless browser can't proceed (CAPTCHA, MFA, complex auth), `handoff` opens a visible Chrome window at the exact same page with all cookies, localStorage, and tabs preserved. The user solves the problem manually, then `resume` returns control to the agent with a fresh snapshot.
+20
@@ -1,5 +1,25 @@
# Changelog
## [0.12.0.0] - 2026-03-26 — Headed Mode + Sidebar Agent
You can now watch Claude work in a real Chrome window and direct it from a sidebar chat.
### Added
- **Headed mode with sidebar agent.** `$B connect` launches a visible Chrome window with the gstack extension. The Side Panel shows a live activity feed of every command and a chat interface where you type natural language instructions. A child Claude instance executes your requests in the browser: navigating pages, clicking buttons, filling forms, extracting data. Each task gets up to 5 minutes.
- **Personal automation.** The sidebar agent handles repetitive browser tasks beyond dev workflows. Browse your kid's school parent portal and add parent contact info to Google Contacts. Fill out vendor onboarding forms. Extract data from dashboards. Log in once in the headed browser or import cookies from your real Chrome with `/setup-browser-cookies`.
- **Chrome extension.** Toolbar badge (green=connected, gray=not), Side Panel with activity feed + chat + refs tab, @ref overlays on the page, and a connection pill showing which window gstack controls. Auto-loads when you run `$B connect`.
- **`/connect-chrome` skill.** Guided setup: launches Chrome, verifies the extension, demos the activity feed, and introduces the sidebar chat.
### Changed
- **Sidebar agent ungated.** Previously required the `--chat` flag. Now always available in headed mode. The sidebar agent has the same security model as Claude Code itself (Bash, Read, Glob, Grep on localhost).
- **Agent timeout raised to 5 minutes.** Multi-page tasks (navigating directories, filling forms across pages) need more than the previous 2-minute limit.
## [0.11.19.0] - 2026-03-24
### Fixed
+5 -1
@@ -189,7 +189,11 @@ gstack works well with one sprint. It gets interesting with ten running at once.
**`/document-release` is the engineer you never had.** It reads every doc file in your project, cross-references the diff, and updates everything that drifted. README, ARCHITECTURE, CONTRIBUTING, CLAUDE.md, TODOS — all kept current automatically. And now `/ship` auto-invokes it — docs stay current without an extra command.
**Real browser mode.** `$B connect` launches your actual Chrome as a headed window controlled by Playwright. You watch Claude click, fill, and navigate in real time — same window, same screen. A subtle green shimmer at the top edge tells you which Chrome window gstack controls. All existing browse commands work unchanged. `$B disconnect` returns to headless. A Chrome extension Side Panel shows a live activity feed of every command. This is co-presence — Claude isn't remote-controlling a hidden browser, it's sitting next to you in the same cockpit.
**Real browser mode.** `$B connect` launches your actual Chrome as a headed window controlled by Playwright. You watch Claude click, fill, and navigate in real time — same window, same screen. A subtle green shimmer at the top edge tells you which Chrome window gstack controls. All existing browse commands work unchanged. `$B disconnect` returns to headless. A Chrome extension Side Panel shows a live activity feed of every command and a chat sidebar where you can direct Claude. This is co-presence — Claude isn't remote-controlling a hidden browser, it's sitting next to you in the same cockpit.
**Sidebar agent — your AI browser assistant.** Type natural language instructions in the Chrome side panel and a child Claude instance executes them. "Navigate to the settings page and screenshot it." "Fill out this form with test data." "Go through every item in this list and extract the prices." Each task gets up to 5 minutes. The sidebar agent runs in an isolated session, so it won't interfere with your main Claude Code window. It's like having a second pair of hands in the browser.
**Personal automation.** The sidebar agent isn't just for dev workflows. Example: "Browse my kid's school parent portal and add all the other parents' names, phone numbers, and photos to my Google Contacts." Two ways to get authenticated: (1) log in once in the headed browser — your session persists, or (2) run `/setup-browser-cookies` to import cookies from your real Chrome. Once authenticated, Claude navigates the directory, extracts the data, and creates the contacts.
**Browser handoff when the AI gets stuck.** Hit a CAPTCHA, auth wall, or MFA prompt? `$B handoff` opens a visible Chrome at the exact same page with all your cookies and tabs intact. Solve the problem, tell Claude you're done, `$B resume` picks up right where it left off. The agent even suggests it automatically after 3 consecutive failures.
+20
@@ -14,6 +14,26 @@
**Priority:** P2
**Depends on:** Blog post about Search Before Building
## Chrome DevTools MCP Integration
### Real Chrome session access
**What:** Integrate Chrome DevTools MCP to connect to the user's real Chrome session with real cookies, real state, no Playwright middleman.
**Why:** Right now, headed mode launches a fresh Chromium profile. Users must log in manually or import cookies. Chrome DevTools MCP connects to the user's actual Chrome, giving instant access to every authenticated site. This is the future of browser automation for AI agents.
**Context:** Google shipped Chrome DevTools MCP in Chrome 146+ (June 2025). It provides screenshots, console messages, performance traces, Lighthouse audits, and full page interaction through the user's real browser. gstack should use it for real-session access while keeping Playwright for headless CI/testing workflows.
Potential new skills:
- `/debug-browser`: JS error tracing with source-mapped stack traces
- `/perf-debug`: performance traces, Core Web Vitals, network waterfall
This may replace `/setup-browser-cookies` for most use cases, since the user's real cookies are already there.
**Effort:** L (human: ~2 weeks / CC: ~2 hours)
**Priority:** P0
**Depends on:** Chrome 146+, DevTools MCP server installed
## Browse
### Bundle server.ts into compiled binary
+1 -1
@@ -1 +1 @@
0.11.19.0
0.12.0.0
+16 -2
@@ -104,13 +104,27 @@ Tell the user: "Check the Side Panel — you should see the `goto` and `snapshot
commands appear in the activity feed. Every command Claude runs will show up here
in real time."
## Step 5: What's next
## Step 5: Sidebar chat
After the activity feed demo, tell the user about the sidebar chat:
> The Side Panel also has a **chat tab**. Try typing a message like "take a
> snapshot and describe this page." A child Claude instance will execute your
> request in the browser — you'll see the commands appear in the activity feed.
>
> The sidebar agent can navigate pages, click buttons, fill forms, and read
> content. Each task gets up to 5 minutes. It runs in an isolated session, so
> it won't interfere with this Claude Code window.
## Step 6: What's next
Tell the user:
> You're all set! Chrome is under Claude's control with the Side Panel showing
> live activity. Here's what you can do:
> live activity and a chat sidebar for direct commands. Here's what you can do:
>
> - **Chat in the sidebar** — type natural language instructions and Claude
> executes them in the browser
> - **Run any browse command** — `$B goto`, `$B click`, `$B snapshot` — and
> watch it happen in Chrome + the Side Panel
> - **Use /qa or /design-review** — they'll run in the visible Chrome window
+1 -1
@@ -1,6 +1,6 @@
{
"name": "gstack",
"version": "0.11.19.0",
"version": "0.12.0.0",
"description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
"license": "MIT",
"type": "module",