Merge origin/main into garrytan/browserharness

Resolves 52 conflicts from the merge:

VERSION + CHANGELOG + package.json: kept v1.16.0.0 (next slot above
main's v1.15.0.0). CHANGELOG entry for v1.16.0.0 (browser-skills) sits
above v1.15.0.0 (slim preamble + plan-mode E2E harness) and the rest
of main's history.

TODOS.md: kept browser-skills phases (P1 Phase 2, P2 Phase 3, P2
Phase 4) AND main's new entries (Sidebar Terminal v1.1, Structural
STOP-Ask forcing function P1).

README.md: took main's GBrain section (newer /setup-gbrain story).

browse/src/server.ts: took main's chat-queue refactor (sidebar agent
ripped in favor of interactive PTY) and re-applied browser-skills'
LOCAL_LISTEN_PORT module-level state + daemonPort plumbing through
MetaCommandOpts.

scripts/resolvers/preamble.ts: took main's reorder of AskUserQuestion
Format ahead of model overlay (v1.6.4.0 fix).

scripts/resolvers/preamble/generate-brain-sync-block.ts: took main's
slimmer version (slim preamble v1.15.0.0).

bin/gstack-brain-{init,sync}, bin/gstack-config, test/brain-sync.test.ts:
took main's mature versions (gbrain-sync shipped via #1151).

test/skill-validation.test.ts: took main's known-large-fixtures form +
removed sidebar-agent #584 assertions (file was deleted in main); kept
my Bundled browser-skills frontmatter contract block.

SKILL.md files (37 of them) + golden fixtures: took main's, then ran
`bun run gen:skill-docs --host all` to re-add the new $B skill +
domain-skill + cdp commands to the generated docs.

All 805 tests pass across browser-skills + skill-validation + gen-skill-docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Garry Tan
2026-04-26 14:24:50 -07:00
167 changed files with 23453 additions and 20217 deletions
+155 -145
@@ -1,190 +1,200 @@
# Sidebar Message Flow
# Sidebar Flow
How the GStack Browser sidebar actually works. Read this before touching
sidepanel.js, background.js, content.js, server.ts sidebar endpoints,
or sidebar-agent.ts.
`sidepanel.js`, `background.js`, `content.js`, `terminal-agent.ts`, or
sidebar-related server endpoints.
The sidebar has one primary surface — the **Terminal** pane, an interactive
`claude` PTY. Activity / Refs / Inspector survive as debug overlays behind
the `debug` toggle in the footer. The chat queue path (one-shot `claude -p`,
sidebar-agent.ts) was ripped once the PTY proved out — the Terminal pane is
strictly more capable.
## Components
```
┌─────────────────┐ ┌──────────────┐ ┌─────────────┐ ┌────────────────┐
│ sidepanel.js │────▶│ background.js│────▶│ server.ts │────▶│sidebar-agent.ts│
│ (Chrome panel) │ │ (svc worker) │ │ (Bun HTTP) │ │ (Bun process) │
└─────────────────┘ └──────────────┘ └─────────────┘ └────────────────┘
▲ │ │
polls /sidebar-chat polls queue file
└───────────────────────────────────────────┘ │
──────────────────────
POST /sidebar-agent/event
┌─────────────────┐ ┌──────────────┐ ┌──────────────────┐
│ sidepanel.js + │────▶│ server.ts │────▶│terminal-agent.ts│
│ -terminal.js │ │ (compiled) │ │ (non-compiled) │
│ (xterm.js) │ │ │ │ PTY listener │
└─────────────────┘ └──────────────┘ └──────────────────┘
│ ws://127.0.0.1:<termPort>/ws (Sec-WebSocket-Protocol auth)
└───────────────────────┼──────────────────────▶│ Bun.spawn(claude)
│ │ terminal: {data}
│ ▼
│ ┌──────────────────┐
│ │ claude PTY │
│ └──────────────────┘
POST /pty-session │
(Bearer AUTH_TOKEN) │
┌──────────────────┐
│ pty-session- │
│ cookie.ts │
│ (in-memory token │
│ registry) │
└──────────────────┘
│ POST /internal/grant (loopback)
┌──────────────────┐
│ validTokens Set │
│ in agent memory │
└──────────────────┘
```
## Startup Timeline
The compiled browse server can't `posix_spawn` external executables —
`terminal-agent.ts` runs as a separate non-compiled `bun run` process and
owns the `claude` subprocess.
## Startup + first-keystroke timeline
```
T+0ms CLI runs `$B connect`
├── Server starts on port 34567
├── Writes state to .gstack/browse.json (pid, port, token)
├── Launches headed Chromium with extension
└── Clears sidebar-agent-queue.jsonl
├── Server starts (compiled)
└── Spawns terminal-agent.ts via `bun run`
T+500ms sidebar-agent.ts spawned by CLI
├── Reads auth token from .gstack/browse.json
├── Creates queue file if missing
├── Sets lastLine = current line count
└── Starts polling every 200ms
T+500ms terminal-agent.ts boots
├── Bun.serve on 127.0.0.1:0 (random port)
├── Writes <stateDir>/terminal-port (server reads it for /health)
├── Writes <stateDir>/terminal-internal-token (loopback handshake)
└── Probes claude → writes claude-available.json
T+1-3s Extension loads in Chromium
├── background.js: health poll every 1s (fast startup)
│ └── GET /health → gets auth token
├── content.js: injects on welcome page
│ └── Does NOT fire gstack-extension-ready (waits for sidebar)
└── Side panel: may auto-open via chrome.sidePanel.open()
T+1-3s Extension loads, sidebar opens
├── sidepanel-terminal.js: setState(IDLE), shows "Starting Claude Code..."
└── tryAutoConnect() polls until window.gstackServerPort + token are set
T+2-10s Side panel connects
├── tryConnect() → asks background for port/token
├── Fallback: direct GET /health for token
├── updateConnection(url, token)
│ ├── Starts chat polling (1s interval)
│ ├── Starts tab polling (2s interval)
├── Connects SSE activity stream
└── Sends { type: 'sidebarOpened' } to background
└── background relays to content script → hides welcome arrow
T+10s+ Ready for messages
T+ready tryAutoConnect calls connect()
├── POST /pty-session (Authorization: Bearer AUTH_TOKEN)
│ └── server mints session token, posts /internal/grant to agent
│ └── responds with {terminalPort, ptySessionToken}
├── GET /claude-available (preflight)
├── new WebSocket(`ws://127.0.0.1:<terminalPort>/ws`,
[`gstack-pty.<token>`])
│ └── Browser sends Sec-WebSocket-Protocol + Origin
│ └── Agent validates Origin AND token BEFORE upgrading
│ └── Agent echoes the protocol back (REQUIRED — browser
│ closes the connection without it)
├── On open: send {type:"resize"} then a single \n byte
└── Agent message handler sees the byte → spawnClaude()
```
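The browser side of the `connect()` steps in the timeline can be sketched as below. This is a minimal sketch, not the actual `sidepanel-terminal.js`: `ptySubprotocol`, `terminalWsUrl`, and `openingFrames` are hypothetical helper names, and the `cols`/`rows` fields on the resize frame are an assumption (the timeline only says `{type:"resize"}`).

```typescript
// Hypothetical helpers illustrating the connect() steps; names are not from the codebase.

const PTY_PROTOCOL_PREFIX = "gstack-pty.";

/** Shape of the POST /pty-session response body, per the timeline. */
interface PtySession {
  terminalPort: number;
  ptySessionToken: string;
}

/** The session token rides in Sec-WebSocket-Protocol as `gstack-pty.<token>`. */
function ptySubprotocol(session: PtySession): string {
  return PTY_PROTOCOL_PREFIX + session.ptySessionToken;
}

/** The agent listens on loopback only, on the port returned by /pty-session. */
function terminalWsUrl(session: PtySession): string {
  return `ws://127.0.0.1:${session.terminalPort}/ws`;
}

/**
 * Frames sent once the socket opens: a resize message, then a single
 * newline byte. The first byte is what makes the agent spawn claude.
 * ASSUMPTION: the resize payload carries cols/rows.
 */
function openingFrames(cols: number, rows: number): string[] {
  return [JSON.stringify({ type: "resize", cols, rows }), "\n"];
}

// Browser wiring (not executed here):
//   const ws = new WebSocket(terminalWsUrl(s), [ptySubprotocol(s)]);
//   ws.onopen = () => openingFrames(80, 24).forEach((frame) => ws.send(frame));
```

Keeping the string-building separate from the `WebSocket` call makes the handshake inputs easy to check without a browser.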
## Message Flow: User Types → Claude Responds
## Auth: WebSocket can't send Authorization headers
```
1. User types "go to hn" in sidebar, hits Enter
Browser WebSocket clients can't set `Authorization`. They CAN set
`Sec-WebSocket-Protocol` via the second arg of `new WebSocket(url,
protocols)`. We exploit that:
2. sidepanel.js sendMessage()
├── Renders user bubble immediately (optimistic)
├── Renders thinking dots immediately
├── Switches to fast poll (300ms)
└── chrome.runtime.sendMessage({ type: 'sidebar-command', message, tabId })
1. `POST /pty-session` (auth: Bearer AUTH_TOKEN) → server mints a
short-lived session token, pushes it to the agent over loopback,
returns it in the JSON body.
2. Extension calls `new WebSocket(url, ['gstack-pty.<token>'])`.
3. Agent reads `Sec-WebSocket-Protocol`, strips `gstack-pty.`, validates
against `validTokens`, echoes the protocol back. Echo is mandatory —
without it Chromium closes the connection on receipt of the upgrade
response.
3. background.js
├── Gets active Chrome tab URL
└── POST /sidebar-command { message, activeTabUrl }
with Authorization: Bearer ${authToken}
A `Set-Cookie: gstack_pty=...` header is also returned for non-browser
callers (curl, integration tests). The cookie path was the original v1
design, but `SameSite=Strict` cookies don't survive the cross-port jump
from server.ts:34567 to agent:<random> when the request originates from a
chrome-extension origin. The protocol-token path is what the browser
actually uses.
4. server.ts /sidebar-command handler
├── validateAuth(req)
├── syncActiveTabByUrl(extensionUrl) — syncs Playwright tab to Chrome tab
├── pickSidebarModel(message) — 'sonnet' for actions, 'opus' for analysis
├── Adds user message to chat buffer
├── Builds system prompt + args
└── Appends JSON to ~/.gstack/sidebar-agent-queue.jsonl
### Dual-token model
5. sidebar-agent.ts poll() (within 200ms)
├── Reads new line from queue file
├── Parses JSON entry
├── Checks processingTabs — skips if tab already has agent running
└── askClaude(entry) — fire and forget
| Token | Lives in | Used for | Lifetime |
|-------|----------|----------|----------|
| `AUTH_TOKEN` | `<stateDir>/browse.json`; in-memory in server.ts | `/pty-session` POST (mint cookie + token) | server lifetime |
| `gstack-pty.<...>` (Sec-WebSocket-Protocol) | Browser memory only; agent `validTokens` Set | `/ws` upgrade auth | 30 min, auto-revoked on WS close |
| `INTERNAL_TOKEN` | `<stateDir>/terminal-internal-token`; in agent memory | server → agent loopback `/internal/grant` | agent lifetime |
6. sidebar-agent.ts askClaude()
├── spawn('claude', ['-p', prompt, '--model', model, ...])
├── Streams stdout line-by-line (stream-json format)
├── For each event: POST /sidebar-agent/event { type, tool, text, tabId }
└── On close: POST /sidebar-agent/event { type: 'agent_done' }
`AUTH_TOKEN` is **never** valid for `/ws` directly. The session token is
**never** valid for `/pty-session` or `/command`. Strict separation
prevents an SSE or page-content token leak from escalating into shell
access.
7. server.ts processAgentEvent()
├── Adds entry to chat buffer (in-memory + disk)
├── On agent_done: sets tab status to 'idle'
└── On agent_done: processes next queued message for that tab
## Threat model
8. sidepanel.js pollChat() (every 300ms during fast poll)
├── GET /sidebar-chat?after=${chatLineCount}&tabId=${tabId}
├── Renders new entries (text, tool_use, agent_done)
└── On agent idle: removes thinking dots, stops fast poll
```
The Terminal pane **bypasses the prompt-injection security stack** on
purpose — the user is typing directly to claude, there's no untrusted
page content in the loop. Trust source is the keyboard, same as any
local terminal.
## Arrow Hint Hide Flow (4-step signal chain)
That trust assumption rests on three transport guarantees:
The welcome page shows a right-pointing arrow until the sidebar opens.
1. **Local-only listener.** terminal-agent.ts binds `127.0.0.1` only.
The dual-listener tunnel surface (server.ts `TUNNEL_PATHS`) does
not include `/pty-session` or `/terminal/*`, so the tunnel returns
404 by default-deny.
2. **Origin gate.** `/ws` upgrades require
`Origin: chrome-extension://<id>`. A localhost web page can't mount
a cross-site WebSocket hijack against the shell because its Origin
is a regular `http(s)://...`.
3. **Session token auth.** Minted only by an authenticated
`/pty-session` POST, scoped to one WS, auto-revoked on close.
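
Guarantees 2 and 3 can be sketched as one pre-upgrade check. This is a minimal sketch under stated assumptions, not the real `terminal-agent.ts`: `SessionTokens` and `checkUpgrade` are hypothetical names, the 403/401 status codes are assumptions, and a `Map` with expiry timestamps stands in for the `validTokens` Set so the 30-minute lifetime can be shown.

```typescript
// Hypothetical sketch of the agent's pre-upgrade check; names are illustrative.

const PROTOCOL_PREFIX = "gstack-pty.";
const TOKEN_TTL_MS = 30 * 60 * 1000; // 30 min session-token lifetime

/** In-memory session-token registry with expiry and explicit revocation. */
class SessionTokens {
  private expiry = new Map<string, number>();

  /** Called when the server pushes a freshly minted token over loopback. */
  grant(token: string, now: number = Date.now()): void {
    this.expiry.set(token, now + TOKEN_TTL_MS);
  }

  isValid(token: string, now: number = Date.now()): boolean {
    const expiresAt = this.expiry.get(token);
    return expiresAt !== undefined && now < expiresAt;
  }

  /** Called on WebSocket close so a stolen token can't be replayed. */
  revoke(token: string): void {
    this.expiry.delete(token);
  }
}

type UpgradeDecision =
  | { ok: true; echoProtocol: string; token: string }
  | { ok: false; status: number };

/** Validates Origin and token BEFORE the upgrade is accepted. */
function checkUpgrade(
  origin: string | null,
  protocolHeader: string | null,
  tokens: SessionTokens,
  extensionOrigin: string,
  now: number = Date.now(),
): UpgradeDecision {
  // Origin gate: only the extension's own origin may connect.
  if (origin !== extensionOrigin) return { ok: false, status: 403 };

  // Sec-WebSocket-Protocol may list several comma-separated values.
  const offered = (protocolHeader ?? "").split(",").map((p) => p.trim());
  const match = offered.find((p) => p.startsWith(PROTOCOL_PREFIX));
  if (match === undefined) return { ok: false, status: 401 };

  const token = match.slice(PROTOCOL_PREFIX.length);
  if (!tokens.isValid(token, now)) return { ok: false, status: 401 };

  // The accepted protocol must be echoed back in the upgrade response.
  return { ok: true, echoProtocol: match, token };
}
```

The caller would pass `echoProtocol` back in the upgrade response and call `revoke(token)` from its close handler.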
```
1. sidepanel.js updateConnection()
└── chrome.runtime.sendMessage({ type: 'sidebarOpened' })
Drop any one of those three and the whole tab becomes unsafe.
2. background.js
└── chrome.tabs.sendMessage(activeTabId, { type: 'sidebarOpened' })
## Lifecycle
3. content.js onMessage handler
└── document.dispatchEvent(new CustomEvent('gstack-extension-ready'))
- **Eager auto-connect.** Sidebar opens → tryAutoConnect polls for the
bootstrap globals and connects as soon as they're set. No keypress
required.
- **One PTY per WS.** Closing the WebSocket SIGINTs claude, then SIGKILLs
after 3s. The session token is revoked so a stolen token can't be
replayed.
- **No auto-reconnect on close.** The user sees "Session ended, click to
start a new session." Auto-reconnect would burn a fresh claude session
on every reload. v1.1 may add session resumption keyed on tab/session
id (see TODOS).
- **Manual restart anytime.** A `↻ Restart` button lives in the
always-visible terminal toolbar — works mid-session, not just from the
ENDED state.
4. welcome.html script
└── addEventListener('gstack-extension-ready', () => arrow.classList.add('hidden'))
```
## Quick-action toolbar
The arrow does NOT hide when the extension loads. Only when the sidebar connects.
Three browser-action buttons live next to the Restart button at the top
of the Terminal pane:
## Auth Token Flow
| Button | Behavior |
|--------|----------|
| 🧹 Cleanup | `window.gstackInjectToTerminal(prompt)` — pipes a "remove ads/banners" instruction into the live PTY. claude in the terminal sees it and acts. |
| 📸 Screenshot | `POST /command screenshot` — direct browse-server call, no PTY involvement. |
| 🍪 Cookies | Navigates to the `/cookie-picker` page. |
```
Server starts → AUTH_TOKEN = crypto.randomUUID()
├── GET /health (no auth) → returns { token: AUTH_TOKEN }
├── background.js checkHealth() → authToken = data.token
│ └── Refreshes on EVERY health poll (fixes stale token on restart)
├── sidepanel.js tryConnect() → serverToken from background or /health
│ └── Used for chat polling: Authorization: Bearer ${serverToken}
└── sidebar-agent.ts refreshToken() → reads from .gstack/browse.json
└── Used for event relay: Authorization: Bearer ${authToken}
```
The Inspector's "Send to Code" button uses the same `gstackInjectToTerminal`
path to forward CSS inspector data into claude.
If the server restarts, all three components get fresh tokens within 10s
(background health poll interval).
## Debug surfaces (Activity / Refs / Inspector)
## Model Routing
Behind the `debug` toggle in the footer. SSE-driven, independent of the
Terminal pane:
`pickSidebarModel(message)` in server.ts classifies messages:
- **Activity** — streams every browse command via `/activity/stream` SSE.
- **Refs** — REST: `GET /refs` — current page's `@ref` element labels.
- **Inspector** — CDP-based element picker; SSE on `/inspector/events`.
| Pattern | Model | Why |
|---------|-------|-----|
| "click @e24", "go to hn", "screenshot" | sonnet | Deterministic tool calls, no thinking needed |
| "what does this page say?", "summarize" | opus | Needs comprehension |
| "find bugs", "check for broken links" | opus | Analysis task |
| "navigate to X and fill the form" | sonnet | Action-oriented, no analysis words |
When the debug strip closes, the Terminal pane becomes visible again.
xterm.js doesn't auto-redraw when its container flips from `display:none`
to `display:flex`, so sidepanel-terminal.js runs a `MutationObserver` on
`#tab-terminal`'s class attribute and forces a fit + refresh when
`.active` returns.
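
A minimal sketch of that redraw-on-activation logic follows. It is not the actual `sidepanel-terminal.js`: `TerminalLike` and `makeActivationHandler` are hypothetical names, with `TerminalLike` standing in for the xterm.js terminal plus whatever fit mechanism the sidebar uses.

```typescript
// Hypothetical sketch; TerminalLike is a stand-in, not the xterm.js API.

interface TerminalLike {
  fit(): void; // recompute cols/rows from the container size
  refresh(): void; // repaint the visible rows
}

/**
 * Returns a callback suitable for a class-attribute observer. It redraws
 * only on the inactive -> active transition, not on every class mutation.
 */
function makeActivationHandler(
  isActive: () => boolean,
  term: TerminalLike,
): () => void {
  let wasActive = isActive();
  return () => {
    const nowActive = isActive();
    if (nowActive && !wasActive) {
      term.fit();
      term.refresh();
    }
    wasActive = nowActive;
  };
}

// Browser wiring (not executed here):
//   const pane = document.querySelector("#tab-terminal")!;
//   const onChange = makeActivationHandler(
//     () => pane.classList.contains("active"),
//     term,
//   );
//   new MutationObserver(onChange).observe(pane, {
//     attributes: true,
//     attributeFilter: ["class"],
//   });
```

Tracking the previous state avoids a redundant fit when some other class on the pane changes while it is already active.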
Analysis words (`what`, `why`, `how`, `summarize`, `describe`, `analyze`, `read X and Y`)
always override action verbs and force opus.
## Known Failure Modes
| Failure | Symptom | Root Cause | Fix |
|---------|---------|------------|-----|
| Stale auth token | "Unauthorized" in input | Server restarted, background had old token | background.js refreshes token on every health poll |
| Tab ID mismatch | Message sent, no response visible | Server assigned tabId 1, sidebar polling tabId 0 | switchChatTab preserves optimistic UI during switch |
| Sidebar agent not running | Messages queue forever | Agent process failed to spawn or crashed | Check `ps aux \| grep sidebar-agent` |
| Agent stale token | Agent runs but no events appear in sidebar | sidebar-agent has old token from .gstack/browse.json | Agent re-reads token before each event POST |
| Queue file missing | spawnClaude fails | Race between server start and agent start | Both sides create file if missing |
| Optimistic UI blown away | User bubble + dots vanish | switchChatTab replaced DOM with welcome screen | Preserved DOM when lastOptimisticMsg is set |
## Per-Tab Concurrency
Each browser tab can run its own agent simultaneously:
- Server: `tabAgents: Map<number, TabAgentState>` with per-tab queue (max 5)
- sidebar-agent: `processingTabs: Set<number>` prevents duplicate spawns
- Two messages on same tab: queued sequentially, processed in order
- Two messages on different tabs: run concurrently
## File Locations
## Files
| Component | File | Runs in |
|-----------|------|---------|
| Sidebar UI | `extension/sidepanel.js` | Chrome side panel |
| Sidebar UI shell | `extension/sidepanel.html` + `sidepanel.js` + `sidepanel.css` | Chrome side panel |
| Terminal UI | `extension/sidepanel-terminal.js` + `extension/lib/xterm.js` | Chrome side panel |
| Service worker | `extension/background.js` | Chrome background |
| Content script | `extension/content.js` | Page context |
| Welcome page | `browse/src/welcome.html` | Page context |
| HTTP server | `browse/src/server.ts` | Bun (compiled binary) |
| Agent process | `browse/src/sidebar-agent.ts` | Bun (non-compiled, can spawn) |
| PTY agent | `browse/src/terminal-agent.ts` | Bun (non-compiled) |
| PTY token store | `browse/src/pty-session-cookie.ts` | Bun (compiled, in server.ts) |
| CLI entry | `browse/src/cli.ts` | Bun (compiled binary) |
| Queue file | `~/.gstack/sidebar-agent-queue.jsonl` | Filesystem |
| State file | `.gstack/browse.json` | Filesystem |
| Chat log | `~/.gstack/sessions/<id>/chat.jsonl` | Filesystem |
| State file | `<stateDir>/browse.json` | Filesystem |
| Terminal port | `<stateDir>/terminal-port` | Filesystem |
| Internal token | `<stateDir>/terminal-internal-token` | Filesystem |
| Claude probe | `<stateDir>/claude-available.json` | Filesystem |
| Active tab | `<stateDir>/active-tab.json` | Filesystem (claude reads) |