mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
dc0bae82d3

* fix: sidebar agent uses extension's activeTabUrl instead of stale Playwright URL

  When the user navigates manually in headed Chrome, Playwright's page.url() stays on the old page. The sidebar agent was using this stale URL in its system prompt, causing it to navigate to the wrong page (e.g., Hacker News instead of the user's current page). The Chrome extension now captures the active tab URL via chrome.tabs.query() and sends it as activeTabUrl in the /sidebar-command POST body. The server prefers this over Playwright's URL. The URL is sanitized (http/https only, control chars stripped, 2048 char limit) to prevent prompt injection.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: connect-chrome pre-flight cleanup + improved onboarding docs

  Adds Step 0 pre-flight cleanup that kills stale browse servers and cleans Chromium profile locks before connecting. Improves the onboarding flow with clearer instructions for finding the extension, opening the Side Panel, and troubleshooting connection issues. Fixes Mode check from cdp to headed.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: sidebar agent test suite (layers 1-2)

  Layer 1 (unit): 18 tests for URL sanitization in sidebar-utils.ts — http/https pass, chrome:// rejected, javascript: rejected, control chars stripped, truncation.

  Layer 2 (integration): 13 tests for server HTTP endpoints — auth, sidebar-command queue writes, activeTabUrl override/fallback, event relay to chat buffer, message queuing, queue overflow (429), chat clear, agent kill.

  Source changes for testability:
  - Extract sanitizeExtensionUrl() to browse/src/sidebar-utils.ts
  - Add BROWSE_HEADLESS_SKIP env var to skip browser launch in HTTP-only tests
  - Add SIDEBAR_QUEUE_PATH env var to both server.ts and sidebar-agent.ts
  - Add SIDEBAR_AGENT_TIMEOUT env var to sidebar-agent.ts
  - Sync package.json version to match VERSION (0.12.2.0)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: sidebar agent round-trip tests with mock claude (layer 3)

  Starts server + sidebar-agent together with a mock claude binary (shell script outputting canned stream-json). Verifies the full queue-based message flow:
  - Full round-trip: POST /sidebar-command → queue → agent → mock claude → events → chat
  - Claude crash recovery: mock exits 1, agent_error appears, status returns to idle
  - Sequential queue drain: two rapid messages both process in order

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: sidebar agent E2E tests with real Claude (layer 4)

  Two E2E tests that exercise the full sidebar agent flow with real Claude:
  - sidebar-navigate: POST /sidebar-command asking Claude to describe a fixture page, verify it responds with page content through the chat buffer
  - sidebar-url-accuracy: POST with activeTabUrl differing from Playwright URL, verify the queue prompt uses the extension URL (the core bug fix)

  Both registered as periodic tier (~$0.80 total, non-deterministic).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sidebar E2E tests — sequential execution + eval collector fix

  Both tests now pass:
  - sidebar-url-accuracy: deterministic queue file check (no Claude needed)
  - sidebar-navigate: real Claude responds through sidebar agent queue

  Fixed: testIfSelected (sequential, not concurrent) to avoid queue file conflicts. Added cost_usd field for eval collector compatibility.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: kill stale sidebar-agent processes before starting new one

  Each /connect-chrome starts a new sidebar-agent subprocess with unref() but never kills the previous one. Old agents accumulate as zombies with stale auth tokens. When they pick up queue entries, their event relay fails (401), so the server never receives agent_done and marks the agent as "hung". The user sees the sidebar freeze.

  Fix: pkill any existing sidebar-agent.ts processes before spawning.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.12.6.0)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add P1 TODO for sidebar Write tool + error visibility

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

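
The first commit in the message above says the server sanitizes `activeTabUrl` before it reaches the sidebar agent's prompt. Only http/https URLs pass, control characters are stripped, and the result is capped at 2048 characters. The real code is `sanitizeExtensionUrl()` in `browse/src/sidebar-utils.ts` (TypeScript). What follows is only a hypothetical shell sketch of those three rules. It can help predict whether a given tab URL will be accepted. The function name `sanitize_url`, the order of the steps, and the case-insensitive scheme match are assumptions, not taken from the gstack source.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not gstack code) of the three sanitization rules
# described for sanitizeExtensionUrl(): http/https only, control characters
# stripped, 2048-character cap. Prints the sanitized URL, or prints nothing
# and returns 1 if the URL is rejected.
sanitize_url() {
  local url="$1"
  # Strip ASCII control characters (0x00-0x1F) and DEL (0x7F).
  url=$(printf '%s' "$url" | tr -d '\000-\037\177')
  # Accept only http:// and https:// (scheme matched case-insensitively).
  case "$url" in
    [Hh][Tt][Tt][Pp]://*|[Hh][Tt][Tt][Pp][Ss]://*) ;;
    *) return 1 ;;
  esac
  # Cap the result at 2048 characters.
  printf '%s\n' "${url:0:2048}"
}
```

With these rules, `sanitize_url "chrome://extensions"` and `sanitize_url "javascript:alert(1)"` are rejected, and a URL containing an embedded tab comes back with the tab removed.
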
203 lines
7.3 KiB
Cheetah

---
name: connect-chrome
version: 0.1.0
description: |
  Launch real Chrome controlled by gstack with the Side Panel extension auto-loaded.
  One command: connects Claude to a visible Chrome window where you can watch every
  action in real time. The extension shows a live activity feed in the Side Panel.
  Use when asked to "connect chrome", "open chrome", "real browser", "launch chrome",
  "side panel", or "control my browser".
allowed-tools:
  - Bash
  - Read
  - AskUserQuestion
---

{{PREAMBLE}}

# /connect-chrome — Launch Real Chrome with Side Panel

Connect Claude to a visible Chrome window with the gstack extension auto-loaded.
You see every click, every navigation, every action in real time.

{{BROWSE_SETUP}}

## Step 0: Pre-flight cleanup

Before connecting, kill any stale browse servers and clean up lock files that
may have persisted from a crash. This prevents "already connected" false
positives and Chromium profile lock conflicts.

```bash
# Kill any existing browse server recorded in the project's state file
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
_STATE="$_ROOT/.gstack/browse.json"
if [ -n "$_ROOT" ] && [ -f "$_STATE" ]; then
  # Match both compact ("pid":123) and pretty-printed ("pid": 123) JSON
  _OLD_PID=$(grep -o '"pid": *[0-9]*' "$_STATE" 2>/dev/null | grep -o '[0-9][0-9]*' | head -n 1)
  if [ -n "$_OLD_PID" ]; then
    kill "$_OLD_PID" 2>/dev/null || true
    sleep 1
    # Escalate to SIGKILL only if the process survived SIGTERM
    kill -0 "$_OLD_PID" 2>/dev/null && kill -9 "$_OLD_PID" 2>/dev/null || true
  fi
  rm -f "$_STATE"
fi

# Clean Chromium profile locks (can persist after crashes)
_PROFILE_DIR="$HOME/.gstack/chromium-profile"
for _LF in SingletonLock SingletonSocket SingletonCookie; do
  rm -f "$_PROFILE_DIR/$_LF" 2>/dev/null || true
done

echo "Pre-flight cleanup done"
```
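
A small check like the following can confirm that the cleanup above took effect before moving on to Step 1. It is a hypothetical helper (`verify_preflight` is not part of gstack) and assumes the same profile directory as the cleanup block. On Linux and macOS, Chromium's `Singleton*` entries are usually symlinks whose targets may no longer exist. A plain `-e` test follows the link and would miss a dangling one, so the check uses `-L` as well.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not gstack code): verify_preflight <profile-dir> [old-pid]
# Returns 0 if no Chromium Singleton* entries remain in the profile directory
# and, when a PID is given, that process is no longer running.
verify_preflight() {
  local profile_dir="$1" old_pid="${2:-}" lf ok=0
  for lf in SingletonLock SingletonSocket SingletonCookie; do
    # -L catches dangling symlinks, which -e alone reports as absent
    if [ -e "$profile_dir/$lf" ] || [ -L "$profile_dir/$lf" ]; then
      echo "still present: $profile_dir/$lf" >&2
      ok=1
    fi
  done
  # kill -0 sends no signal; it only tests whether the process exists
  if [ -n "$old_pid" ] && kill -0 "$old_pid" 2>/dev/null; then
    echo "browse server $old_pid is still running" >&2
    ok=1
  fi
  return "$ok"
}
```

Run it in the same shell as the cleanup, for example with `verify_preflight "$HOME/.gstack/chromium-profile" "$_OLD_PID"`. If it reports anything, re-run the cleanup before connecting.
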

## Step 1: Connect

```bash
$B connect
```

This launches Playwright's bundled Chromium in headed mode with:

- A visible window you can watch (not your regular Chrome — it stays untouched)
- The gstack Chrome extension auto-loaded via `launchPersistentContext`
- A golden shimmer line at the top of every page so you know which window is controlled
- A sidebar agent process for chat commands

The `connect` command auto-discovers the extension from the gstack install
directory. It always uses port **34567** so the extension can auto-connect.

After connecting, print the full output to the user. Confirm you see
`Mode: headed` in the output.

If the output shows an error or the mode is not `headed`, run `$B status` and
share the output with the user before proceeding.
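
The Step 1 procedure (print the output, look for `Mode: headed`, and otherwise fall back to `$B status`) fits in one small function. This is a hypothetical sketch, not part of gstack. `connect_and_check` takes the browse binary (the value of `$B` from the setup step) as its argument. It assumes `$B` holds a single executable path and that `connect` prints a `Mode: headed` line, as described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not gstack code): connect_and_check <browse-binary>
# Runs "<browse-binary> connect" and shows its full output. Succeeds only if
# the command exited 0 and reported "Mode: headed". Otherwise it prints the
# output of "<browse-binary> status" for the user and returns 1.
connect_and_check() {
  local b="$1" out rc
  out=$("$b" connect 2>&1)
  rc=$?
  printf '%s\n' "$out"
  if [ "$rc" -eq 0 ] && printf '%s\n' "$out" | grep -q 'Mode: headed'; then
    return 0
  fi
  echo "connect did not report 'Mode: headed' (exit $rc); current status:" >&2
  "$b" status
  return 1
}
```

Typical use is `connect_and_check "$B"`. A non-zero return means the status output is already on screen and should be shared with the user.
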

## Step 2: Verify

```bash
$B status
```

Confirm the output shows `Mode: headed`. Read the port from the state file:

```bash
grep -o '"port": *[0-9]*' "$(git rev-parse --show-toplevel 2>/dev/null)/.gstack/browse.json" 2>/dev/null | grep -o '[0-9][0-9]*'
```

The port should be **34567**. If it's different, note it — the user may need it
for the Side Panel.
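
The comparison with the expected port can also be done mechanically. The hypothetical helper below (`check_port`, not part of gstack) reads the port from a state file and warns when it is not 34567. It assumes `browse.json` stores the port as a plain JSON number under the key `port`, as the `grep` above does. It accepts both compact (`"port":34567`) and pretty-printed (`"port": 34567`) JSON.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not gstack code): check_port <state-file> [expected-port]
# Prints the port recorded in the state file. Returns 0 if it matches the
# expected port (default 34567), 1 if it differs, and 2 if no port is found.
check_port() {
  local state="$1" expected="${2:-34567}" port
  port=$(grep -o '"port": *[0-9]*' "$state" 2>/dev/null | grep -o '[0-9][0-9]*' | head -n 1)
  if [ -z "$port" ]; then
    echo "no port found in $state" >&2
    return 2
  fi
  echo "$port"
  if [ "$port" != "$expected" ]; then
    echo "port is $port, not $expected; the user may need to enter it in the Side Panel" >&2
    return 1
  fi
}
```

For example, `check_port "$(git rev-parse --show-toplevel)/.gstack/browse.json"` prints `34567` and returns 0 in the expected setup.
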

Also find the extension path so you can help the user if they need to load it manually:

```bash
_EXT_PATH=""
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
[ -n "$_ROOT" ] && [ -f "$_ROOT/.claude/skills/gstack/extension/manifest.json" ] && _EXT_PATH="$_ROOT/.claude/skills/gstack/extension"
[ -z "$_EXT_PATH" ] && [ -f "$HOME/.claude/skills/gstack/extension/manifest.json" ] && _EXT_PATH="$HOME/.claude/skills/gstack/extension"
echo "EXTENSION_PATH: ${_EXT_PATH:-NOT FOUND}"
```
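
When walking the user through `chrome://extensions` or **Load unpacked**, it helps to confirm that the directory found above really is the extension they should look for. Every Chrome extension manifest has a required `name` field, and that is the label shown on `chrome://extensions`. The hypothetical helper below (`ext_name`, not part of gstack) prints it. It uses a simple `sed` match rather than a JSON parser. That assumes the `name` value fits on one line and contains no escaped quotes.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not gstack code): ext_name <extension-dir>
# Prints the "name" field from <extension-dir>/manifest.json, or returns 1 if
# the manifest or the field is missing.
ext_name() {
  local manifest="$1/manifest.json" name
  [ -f "$manifest" ] || return 1
  # Take the value from the first line containing a "name" key
  # (keys such as "short_name" do not match because of the leading quote).
  name=$(sed -n 's/.*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$manifest" | head -n 1)
  [ -n "$name" ] || return 1
  printf '%s\n' "$name"
}
```

For example, `ext_name "$_EXT_PATH"` prints the name the user should see in the extensions list.
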

## Step 3: Guide the user to the Side Panel

Use AskUserQuestion:

> Chrome is launched with gstack control. You should see Playwright's Chromium
> (not your regular Chrome) with a golden shimmer line at the top of the page.
>
> The Side Panel extension should be auto-loaded. To open it:
> 1. Look for the **puzzle piece icon** (Extensions) in the toolbar — it may
>    already show the gstack icon if the extension loaded successfully
> 2. Click the **puzzle piece** → find **gstack browse** → click the **pin icon**
> 3. Click the pinned **gstack icon** in the toolbar
> 4. The Side Panel should open on the right showing a live activity feed
>
> **Port:** 34567 (auto-detected — the extension connects automatically in the
> Playwright-controlled Chrome).

Options:
- A) I can see the Side Panel — let's go!
- B) I can see Chrome but can't find the extension
- C) Something went wrong

If B: Tell the user:

> The extension is loaded into Playwright's Chromium at launch time, but
> sometimes it doesn't appear immediately. Try these steps:
>
> 1. Type `chrome://extensions` in the address bar
> 2. Look for **"gstack browse"** — it should be listed and enabled
> 3. If it's there but not pinned, go back to any page, click the puzzle piece
>    icon, and pin it
> 4. If it's NOT listed at all, turn on **Developer mode** (toggle in the top
>    right of `chrome://extensions`), click **"Load unpacked"**, and select the
>    extension directory:
>    - On macOS, press **Cmd+Shift+G** in the file picker dialog
>    - Paste this path: `{EXTENSION_PATH}` (use the path from Step 2)
>    - Click **Select**
>
> After loading, pin it and click the icon to open the Side Panel.
>
> If the Side Panel badge stays gray (disconnected), click the gstack icon
> and enter port **34567** manually.

If C:

1. Run `$B status` and show the output
2. If the server is not healthy, re-run Step 0 cleanup + Step 1 connect
3. If the server IS healthy but the browser isn't visible, try `$B focus`
4. If that fails, ask the user what they see (error message, blank screen, etc.)

## Step 4: Demo

After the user confirms the Side Panel is working, run a quick demo:

```bash
$B goto https://news.ycombinator.com
```

Wait 2 seconds, then:

```bash
$B snapshot -i
```

Tell the user: "Check the Side Panel — you should see the `goto` and `snapshot`
commands appear in the activity feed. Every command Claude runs shows up here
in real time."

## Step 5: Sidebar chat

After the activity feed demo, tell the user about the sidebar chat:

> The Side Panel also has a **chat tab**. Try typing a message like "take a
> snapshot and describe this page." A sidebar agent (a child Claude instance)
> executes your request in the browser — you'll see the commands appear in
> the activity feed as they happen.
>
> The sidebar agent can navigate pages, click buttons, fill forms, and read
> content. Each task gets up to 5 minutes. It runs in an isolated session, so
> it won't interfere with this Claude Code window.

## Step 6: What's next

Tell the user:

> You're all set! Here's what you can do with the connected Chrome:
>
> **Watch Claude work in real time:**
> - Run any gstack skill (`/qa`, `/design-review`, `/benchmark`) and watch
>   every action happen in the visible Chrome window + Side Panel feed
> - No cookie import needed: you and Claude share the Playwright browser's own
>   session, so any site you log into in this window is logged in for Claude too
>
> **Control the browser directly:**
> - **Sidebar chat** — type natural language in the Side Panel and the sidebar
>   agent executes it (e.g., "fill in the login form and submit")
> - **Browse commands** — `$B goto <url>`, `$B click <sel>`, `$B fill <sel> <val>`,
>   `$B snapshot -i` — all visible in Chrome + Side Panel
>
> **Window management:**
> - `$B focus` — bring Chrome to the foreground anytime
> - `$B disconnect` — close headed Chrome and return to headless mode
>
> **What skills look like in headed mode:**
> - `/qa` runs its full test suite in the visible browser — you see every page
>   load, every click, every assertion
> - `/design-review` takes screenshots in the real browser — same pixels you see
> - `/benchmark` measures performance in the headed browser

Then proceed with whatever the user asked to do. If they didn't specify a task,
ask what they'd like to test or browse.