mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-07 14:06:42 +02:00
Merge remote-tracking branch 'origin/main' into garrytan/openclaw-plan
This commit is contained in:
@@ -0,0 +1,376 @@
|
||||
# GStack Browser V0 — The AI-Native Development Browser
|
||||
|
||||
**Date:** 2026-03-30
|
||||
**Author:** Garry Tan + Claude Code
|
||||
**Status:** Phase 1a shipped, Phase 1b in progress
|
||||
**Branch:** garrytan/gstack-as-browser
|
||||
|
||||
## The Thesis
|
||||
|
||||
Every other AI browser (Atlas, Dia, Comet, Chrome Auto Browse) starts with a
|
||||
consumer browser and bolts AI onto it. GStack Browser inverts this. It starts
|
||||
with Claude Code as the runtime and gives it a browser viewport.
|
||||
|
||||
The agent is the primary citizen. The browser is the canvas. Skills are
|
||||
first-class capabilities. You don't "use a browser with AI help." You use
|
||||
an AI that can see and interact with the web.
|
||||
|
||||
This is the IDE for the post-IDE era. Code lives in the terminal. The product
|
||||
lives in the browser. The AI works across both simultaneously. What Cursor did
|
||||
for text editors, GStack Browser does for the browser.
|
||||
|
||||
## What It Is Today (Phase 1a, shipped)
|
||||
|
||||
A double-clickable macOS .app that wraps Playwright's Chromium with the gstack
|
||||
sidebar extension baked in. You open it and Claude Code can see your screen,
|
||||
navigate pages, fill forms, take screenshots, inspect CSS, clean up overlays,
|
||||
and run any gstack skill. All without touching a terminal.
|
||||
|
||||
```
|
||||
GStack Browser.app (389MB, 189MB DMG)
|
||||
├── Compiled browse binary (58MB) — CLI + HTTP server
|
||||
├── Chrome extension (172KB) — sidebar, activity feed, inspector
|
||||
├── Playwright's Chromium (330MB) — the actual browser
|
||||
└── Launcher script — binds project dir, sets env vars
|
||||
```
|
||||
|
||||
Launch → Chromium opens with sidebar → extension auto-connects to browse server
|
||||
→ agent ready in ~5 seconds.
|
||||
|
||||
## What It Will Be
|
||||
|
||||
### Phase 1b: Developer UX (next)
|
||||
|
||||
**Command Palette (Cmd+K):** The signature interaction. Opens a fuzzy-filtered
|
||||
skill picker. Type "/qa" to start QA testing, "/investigate" to debug, "/ship"
|
||||
to create a PR. Skills are fetched from the browse server, not hardcoded. The
|
||||
palette is the entry point to everything.
|
||||
|
||||
**Quick Screenshot (Cmd+Shift+S):** Capture the current viewport and pipe it into
|
||||
the sidebar chat with "What do you see?" context. The AI analyzes the screenshot
|
||||
and gives you actionable feedback. Visual bug reports in one keystroke.
|
||||
|
||||
**Status Bar:** A persistent 30px bar at the bottom of every page. Shows agent
|
||||
status (idle/thinking), workspace name, current branch, and auto-detected dev
|
||||
servers. Click a dev server pill to navigate. Always-visible context about what
|
||||
the AI is doing.
|
||||
|
||||
**Auto-Detect Dev Servers:** On launch, scans common ports (3000, 3001, 4200,
|
||||
5173, 5174, 8000, 8080). If exactly one server is found, auto-navigates to it.
|
||||
Dev server pills in the status bar for one-click switching.
|
||||
|
||||
### Phase 2: BoomLooper Integration
|
||||
|
||||
The sidebar connects to BoomLooper's Phoenix/Elixir APIs instead of a local
|
||||
`claude -p` subprocess. BoomLooper provides:
|
||||
|
||||
- **Multi-agent orchestration.** Spawn 5 agents in parallel, each with its own
|
||||
browser tab. One runs QA, one does design review, one watches for regressions.
|
||||
- **Docker infrastructure.** Each agent gets an isolated container. The browser
|
||||
inside the container tests the dev server. No port conflicts, no state leakage.
|
||||
- **Session persistence.** Agent conversations survive browser restarts. Pick up
|
||||
where you left off.
|
||||
- **Team visibility.** Your teammates can watch what your agents are doing in
|
||||
real-time. Like pair programming, but the pair is 5 AI agents and you're the
|
||||
conductor.
|
||||
|
||||
### Phase 3: Browse as BoomLooper Tool
|
||||
|
||||
The browse binary becomes an MCP tool in BoomLooper. Agents in Docker containers
|
||||
use browse commands to test dev servers, take screenshots, fill forms, and verify
|
||||
deployments. Cross-platform compilation (linux-arm64/x64) required.
|
||||
|
||||
### Phase 4: Chromium Fork (trigger-gated)
|
||||
|
||||
When the extension side panel hits hard API limits, GStack Browser ships to
|
||||
external users, build infra exists, and the business justifies maintenance:
|
||||
fork Chromium. Brave's `chromium_src` override pattern, CC-powered 6-week
|
||||
rebases (2-4 hours with CC vs 1-2 weeks human). ~20-30 files modified.
|
||||
|
||||
### Phase 5: Native Shell
|
||||
|
||||
SwiftUI/AppKit app shell with native sidebar, isolated Chromium service. Full
|
||||
platform integration. May be superseded by Phase 4 if the Chromium fork includes
|
||||
a native sidebar.
|
||||
|
||||
## Vision: What an AI Browser Can Do
|
||||
|
||||
### 1. See What You See
|
||||
|
||||
The browser is the AI's eyes. Not through screenshots (though it can do that),
|
||||
but through DOM access, CSS inspection, network monitoring, and accessibility
|
||||
tree parsing. The AI understands the page structure, not just the pixels.
|
||||
|
||||
**Today:** `snapshot` command returns an accessibility-tree representation of any
|
||||
page. The AI can "see" every button, link, form field, and text element. Element
|
||||
references (`@e1`, `@e2`) let the AI click, fill, and interact.
|
||||
|
||||
**Next:** Real-time page observation. The AI notices when a page changes, when an
|
||||
error appears in the console, when a network request fails. Proactive debugging
|
||||
without being asked.
|
||||
|
||||
**Future:** Visual understanding. The AI compares before/after screenshots to catch
|
||||
visual regressions. Pixel-level design review. "This button moved 3px left and the
|
||||
font changed from 14px to 13px."
|
||||
|
||||
### 2. Act on What It Sees
|
||||
|
||||
Not just reading pages, but interacting with them like a human user would.
|
||||
|
||||
**Today:** Click, fill, select, hover, type, scroll, upload files, handle dialogs,
|
||||
navigate, manage tabs. All via simple commands through the browse server.
|
||||
|
||||
**Next:** Multi-step user flows. "Log in, go to settings, change the timezone,
|
||||
verify the confirmation message." The AI chains commands with verification at each
|
||||
step.
|
||||
|
||||
**Future:** Autonomous QA agent. "Test every link on this page. Fill every form.
|
||||
Try to break it." The AI runs exhaustive interaction testing without a script.
|
||||
Finds bugs a human tester would miss because it tries combinations humans don't
|
||||
think of.
|
||||
|
||||
### 3. Write Code While Browsing
|
||||
|
||||
This is the key differentiator. The AI can see the bug in the browser AND fix it
|
||||
in the code simultaneously.
|
||||
|
||||
**Today:** The sidebar chat connects to Claude Code. You say "this button is
|
||||
misaligned" and the AI reads the CSS, identifies the issue, and proposes a fix.
|
||||
The `/design-review` skill takes screenshots, identifies visual issues, and
|
||||
commits fixes with before/after evidence.
|
||||
|
||||
**Next:** Live reload loop. The AI edits CSS/HTML, the browser auto-reloads, the
|
||||
AI verifies the fix visually. No human in the loop for simple visual fixes.
|
||||
"Fix every spacing issue on this page" becomes a 30-second task.
|
||||
|
||||
**Future:** Full-stack debugging. The AI sees a 500 error in the browser, reads
|
||||
the server logs, traces to the failing line, writes the fix, and verifies in the
|
||||
browser. One command: "This page is broken. Fix it."
|
||||
|
||||
### 4. Understand the Whole Stack
|
||||
|
||||
The browser isn't just a viewport. It's a window into the application's health.
|
||||
|
||||
**Today:**
|
||||
- Console log capture — every `console.log`, `console.error`, and warning
|
||||
- Network request monitoring — every XHR, fetch, websocket, and static asset
|
||||
- Performance metrics — Core Web Vitals, resource timing, paint events
|
||||
- Cookie and storage inspection — read and write localStorage, sessionStorage
|
||||
- CSS inspection — computed styles, box model, rule cascade
|
||||
|
||||
**Next:**
|
||||
- Network request replay — "replay this failing request with different params"
|
||||
- Performance regression detection — "this page is 200ms slower than yesterday"
|
||||
- Dependency auditing — "this page loads 47 third-party scripts"
|
||||
- Accessibility auditing — "this form has no labels, these colors fail contrast"
|
||||
|
||||
**Future:**
|
||||
- Full application telemetry — CPU, memory, GPU usage in real-time
|
||||
- Cross-browser testing — same test suite across Chrome, Firefox, Safari
|
||||
- Real user monitoring correlation — "this bug affects 12% of production users"
|
||||
|
||||
### 5. The Workspace Model
|
||||
|
||||
The browser IS the workspace. Not a tab in a workspace. The workspace itself.
|
||||
|
||||
**Today:** Each browser session is bound to a project directory. The sidebar shows
|
||||
the current branch. The status bar shows detected dev servers.
|
||||
|
||||
**Next:** Multi-project support. Switch between projects without closing the
|
||||
browser. Each project gets its own set of tabs, its own agent, its own context.
|
||||
Like VSCode workspaces, but for the browser.
|
||||
|
||||
**Future:** Team workspaces. Multiple developers share a browser workspace. See
|
||||
each other's agents working. Collaborative debugging where one person navigates
|
||||
and the other watches the AI fix things in real-time.
|
||||
|
||||
### 6. Skills as Browser Capabilities
|
||||
|
||||
Every gstack skill becomes a browser capability.
|
||||
|
||||
| Skill | Browser Capability |
|
||||
|-------|-------------------|
|
||||
| `/qa` | Test every page, find bugs, fix them, verify fixes |
|
||||
| `/design-review` | Screenshot → analyze → fix CSS → screenshot again |
|
||||
| `/investigate` | See the error in browser → trace to code → fix → verify |
|
||||
| `/benchmark` | Measure page performance → detect regressions → alert |
|
||||
| `/canary` | Monitor deployed site → screenshot periodically → alert on changes |
|
||||
| `/ship` | Run tests → review diff → create PR → verify deployment in browser |
|
||||
| `/cso` | Audit page for XSS, open redirects, clickjacking in real browser |
|
||||
| `/office-hours` | Browse competitor sites → synthesize observations → design doc |
|
||||
|
||||
The command palette (Cmd+K) is the hub. You don't need to know the skills exist.
|
||||
You type what you want, the fuzzy filter finds the right skill, and the AI runs it
|
||||
with the browser as context.
|
||||
|
||||
### 7. The Design Loop
|
||||
|
||||
AI-powered design is a loop, not a handoff.
|
||||
|
||||
```
|
||||
Generate mockup (GPT Image API)
|
||||
→ Review in browser (side-by-side with live site)
|
||||
→ Iterate with feedback ("make the header taller")
|
||||
→ Approve direction
|
||||
→ Generate production HTML/CSS
|
||||
→ Preview in browser
|
||||
→ Fine-tune with /design-review
|
||||
→ Ship
|
||||
```
|
||||
|
||||
The browser closes the gap between "what it looks like in Figma" and "what it
|
||||
looks like in production." Because the AI can see both simultaneously.
|
||||
|
||||
### 8. The Security Loop
|
||||
|
||||
CSO review in a real browser, not just static analysis.
|
||||
|
||||
- Inject XSS payloads into every input field, check if they execute
|
||||
- Test CSRF by replaying requests from a different origin
|
||||
- Check for open redirects by navigating to crafted URLs
|
||||
- Verify CSP headers are actually enforced (not just present)
|
||||
- Test auth flows by manipulating cookies and tokens in real-time
|
||||
- Check for clickjacking by loading the site in an iframe
|
||||
|
||||
Static analysis catches patterns. Browser testing catches reality.
|
||||
|
||||
### 9. The Monitoring Loop
|
||||
|
||||
Post-deploy canary monitoring, in a real browser.
|
||||
|
||||
```
|
||||
Deploy → Browser loads production URL
|
||||
→ Screenshot baseline
|
||||
→ Every 5 minutes: screenshot, compare, check console
|
||||
→ Alert on: visual regression, new console errors, performance drop
|
||||
→ Auto-rollback if critical error detected
|
||||
```
|
||||
|
||||
Synthetic monitoring with AI judgment. Not just "did the page return 200" but
|
||||
"does the page look right and work correctly."
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
+-------------------------------------------------------+
|
||||
| GStack Browser |
|
||||
| |
|
||||
| +------------------+ +---------------------------+ |
|
||||
| | Chromium | | Extension Side Panel | |
|
||||
| | (Playwright) | | ├── Chat (Claude Code) | |
|
||||
| | | | ├── Activity Feed | |
|
||||
| | ┌────────────┐ | | ├── Element Refs | |
|
||||
| | │ Status Bar │ | | ├── CSS Inspector | |
|
||||
| | └────────────┘ | | ├── Command Palette | |
|
||||
| +--------┬──────────+ | └── Settings | |
|
||||
| │ +-------------┬--------------+ |
|
||||
+-----------┼────────────────────────────┼─────────────────+
|
||||
│ │
|
||||
v v
|
||||
+---------┴-----------+ +-----------┴-----------+
|
||||
| Browse Server | | Sidebar Agent |
|
||||
| (HTTP + SSE) | | (claude -p wrapper) |
|
||||
| :34567 | | Runs gstack skills |
|
||||
| | | Per-tab isolation |
|
||||
| Commands: | | |
|
||||
| goto, click, fill | | Future: BoomLooper |
|
||||
| snapshot, screenshot| | GenServer agents |
|
||||
| css, inspect, eval | | |
|
||||
+---------┬-----------+ +-----------┬-----------+
|
||||
│ │
|
||||
v v
|
||||
+---------┴-----------+ +-----------┴-----------+
|
||||
| User's App | | Claude Code |
|
||||
| localhost:3000 | | (reads/writes code) |
|
||||
| (or any URL) | | |
|
||||
+---------------------+ +-----------------------+
|
||||
```
|
||||
|
||||
## Competitive Landscape
|
||||
|
||||
| Browser | Approach | Differentiator | Weakness |
|
||||
|---------|----------|---------------|----------|
|
||||
| **Atlas** | Chromium fork + AI layer | Agentic browser, "OWL" isolated Chromium | Consumer-focused, no code integration |
|
||||
| **Dia** | AI-native browser | Clean UI, built for AI interaction | No dev tools, no code editing |
|
||||
| **Comet** | AI browser | Multi-agent browsing | Early, unclear dev workflow |
|
||||
| **Chrome Auto Browse** | Extension | Google's own, deep Chrome integration | Extension-only, no code editing |
|
||||
| **Cursor** | VSCode fork + AI | Best-in-class code editing | No browser viewport |
|
||||
| **GStack Browser** | CC runtime + browser viewport | See bug in browser, fix in code, verify | Currently macOS-only, no consumer features |
|
||||
|
||||
GStack Browser doesn't compete with consumer browsers. It competes with the
|
||||
workflow of switching between browser and editor. The goal is to make that switch
|
||||
invisible.
|
||||
|
||||
## Design System
|
||||
|
||||
From DESIGN.md:
|
||||
- **Primary accent:** Amber-500 (#F59E0B) — agent active, focus states, pulse
|
||||
- **Background:** Zinc-950 (#09090B) through Zinc-800 (#27272A) — dark, dense
|
||||
- **Typography:** JetBrains Mono (code/status), DM Sans (UI/labels)
|
||||
- **Border radius:** 8px (md), 12px (lg), full (pills)
|
||||
- **Motion:** Pulse animation on agent active, 200ms transitions
|
||||
- **Layout:** Sidebar (right), status bar (bottom), palette (centered overlay)
|
||||
|
||||
## Implementation Status
|
||||
|
||||
| Component | Status | Notes |
|
||||
|-----------|--------|-------|
|
||||
| .app bundle | **SHIPPED** | 389MB, launches in ~5s |
|
||||
| DMG packaging | **SHIPPED** | 189MB compressed |
|
||||
| `GSTACK_CHROMIUM_PATH` | **SHIPPED** | Custom Chromium binary support |
|
||||
| `BROWSE_EXTENSIONS_DIR` | **SHIPPED** | Extension path override |
|
||||
| Auth via `/health` | **SHIPPED** | Replaces .auth.json file approach, auto-refreshes on server restart |
|
||||
| Build script | **SHIPPED** | `scripts/build-app.sh` |
|
||||
| Model routing | **SHIPPED** | Sonnet for actions, Opus for analysis (`pickSidebarModel`) |
|
||||
| Debug logging | **SHIPPED** | 40+ silent catches → prefixed console logging across 4 files |
|
||||
| No idle timeout (headed) | **SHIPPED** | Browser stays alive as long as window is open |
|
||||
| Cookie import button | **SHIPPED** | One-click in sidebar footer, opens `/cookie-picker` |
|
||||
| Sidebar arrow hint | **SHIPPED** | Points to sidebar, hides only when sidebar actually opens |
|
||||
| Architecture doc | **SHIPPED** | `docs/designs/SIDEBAR_MESSAGE_FLOW.md` |
|
||||
| Command palette | Planned | Phase 1b |
|
||||
| Quick screenshot | Planned | Phase 1b |
|
||||
| Status bar | Planned | Phase 1b |
|
||||
| Dev server detection | Planned | Phase 1b |
|
||||
| BoomLooper integration | Future | Phase 2 |
|
||||
| Cross-platform | Future | Phase 3 |
|
||||
| Chromium fork | Trigger-gated | Phase 4 |
|
||||
| Native shell | Deferred | Phase 5 |
|
||||
|
||||
## The 12-Month Vision
|
||||
|
||||
```
|
||||
TODAY (Phase 1) 6 MONTHS (Phase 2-3) 12 MONTHS (Phase 4-5)
|
||||
───────────── ────────────────── ────────────────────
|
||||
macOS .app wrapper BoomLooper multi-agent Chromium fork OR
|
||||
Extension sidebar Docker containers Native SwiftUI shell
|
||||
Local claude -p agent Team workspaces Cross-platform
|
||||
Single project Linux/x64 browse Auto-update
|
||||
Manual skill invocation Autonomous QA loops Skill marketplace
|
||||
Performance monitoring Plugin API
|
||||
Real-time collaboration Enterprise features
|
||||
```
|
||||
|
||||
The 12-month ideal: you open GStack Browser, it detects your project, starts
|
||||
your dev server, runs your test suite, and reports what's broken. You say "fix
|
||||
it" and the AI fixes every bug, verifies each fix visually, and creates a PR.
|
||||
You review the PR in the same browser, approve it, and the AI deploys it and
|
||||
monitors the canary. All in one window.
|
||||
|
||||
That's the browser as AI workspace. Not a browser with AI bolted on. An AI
|
||||
with a browser bolted on.
|
||||
|
||||
## Review History
|
||||
|
||||
This plan went through 4 reviews:
|
||||
|
||||
1. **CEO Review** (`/plan-ceo-review`, SELECTIVE EXPANSION) — 9 scope proposals,
|
||||
3 accepted (Cmd+K, Cmd+Shift+S, status bar), 5 deferred, 1 skipped
|
||||
2. **Design Review** (`/plan-design-review`) — scored 5/10 → 8/10, 9 design
|
||||
decisions added, 2 approved mockups generated
|
||||
3. **Eng Review** (`/plan-eng-review`) — 4 issues found, 0 critical gaps,
|
||||
test plan produced
|
||||
4. **Codex Review** (outside voice) — 9 findings, 3 critical gaps caught
|
||||
(server bundling, auth file location, project binding). All resolved.
|
||||
|
||||
The Codex review caught 3 real architecture gaps that survived 3 prior reviews.
|
||||
Cross-model review works.
|
||||
@@ -0,0 +1,190 @@
|
||||
# Sidebar Message Flow
|
||||
|
||||
How the GStack Browser sidebar actually works. Read this before touching
|
||||
sidepanel.js, background.js, content.js, server.ts sidebar endpoints,
|
||||
or sidebar-agent.ts.
|
||||
|
||||
## Components
|
||||
|
||||
```
|
||||
┌─────────────────┐ ┌──────────────┐ ┌─────────────┐ ┌────────────────┐
|
||||
│ sidepanel.js │────▶│ background.js│────▶│ server.ts │────▶│sidebar-agent.ts│
|
||||
│ (Chrome panel) │ │ (svc worker) │ │ (Bun HTTP) │ │ (Bun process) │
|
||||
└─────────────────┘ └──────────────┘ └─────────────┘ └────────────────┘
|
||||
▲ │ │
|
||||
│ polls /sidebar-chat │ polls queue file │
|
||||
└───────────────────────────────────────────┘ │
|
||||
◀──────────────────────┘
|
||||
POST /sidebar-agent/event
|
||||
```
|
||||
|
||||
## Startup Timeline
|
||||
|
||||
```
|
||||
T+0ms CLI runs `$B connect`
|
||||
├── Server starts on port 34567
|
||||
├── Writes state to .gstack/browse.json (pid, port, token)
|
||||
├── Launches headed Chromium with extension
|
||||
└── Clears sidebar-agent-queue.jsonl
|
||||
|
||||
T+500ms sidebar-agent.ts spawned by CLI
|
||||
├── Reads auth token from .gstack/browse.json
|
||||
├── Creates queue file if missing
|
||||
├── Sets lastLine = current line count
|
||||
└── Starts polling every 200ms
|
||||
|
||||
T+1-3s Extension loads in Chromium
|
||||
├── background.js: health poll every 1s (fast startup)
|
||||
│ └── GET /health → gets auth token
|
||||
├── content.js: injects on welcome page
|
||||
│ └── Does NOT fire gstack-extension-ready (waits for sidebar)
|
||||
└── Side panel: may auto-open via chrome.sidePanel.open()
|
||||
|
||||
T+2-10s Side panel connects
|
||||
├── tryConnect() → asks background for port/token
|
||||
├── Fallback: direct GET /health for token
|
||||
├── updateConnection(url, token)
|
||||
│ ├── Starts chat polling (1s interval)
|
||||
│ ├── Starts tab polling (2s interval)
|
||||
│ ├── Connects SSE activity stream
|
||||
│ └── Sends { type: 'sidebarOpened' } to background
|
||||
└── background relays to content script → hides welcome arrow
|
||||
|
||||
T+10s+ Ready for messages
|
||||
```
|
||||
|
||||
## Message Flow: User Types → Claude Responds
|
||||
|
||||
```
|
||||
1. User types "go to hn" in sidebar, hits Enter
|
||||
|
||||
2. sidepanel.js sendMessage()
|
||||
├── Renders user bubble immediately (optimistic)
|
||||
├── Renders thinking dots immediately
|
||||
├── Switches to fast poll (300ms)
|
||||
└── chrome.runtime.sendMessage({ type: 'sidebar-command', message, tabId })
|
||||
|
||||
3. background.js
|
||||
├── Gets active Chrome tab URL
|
||||
└── POST /sidebar-command { message, activeTabUrl }
|
||||
with Authorization: Bearer ${authToken}
|
||||
|
||||
4. server.ts /sidebar-command handler
|
||||
├── validateAuth(req)
|
||||
├── syncActiveTabByUrl(extensionUrl) — syncs Playwright tab to Chrome tab
|
||||
├── pickSidebarModel(message) — 'sonnet' for actions, 'opus' for analysis
|
||||
├── Adds user message to chat buffer
|
||||
├── Builds system prompt + args
|
||||
└── Appends JSON to ~/.gstack/sidebar-agent-queue.jsonl
|
||||
|
||||
5. sidebar-agent.ts poll() (within 200ms)
|
||||
├── Reads new line from queue file
|
||||
├── Parses JSON entry
|
||||
├── Checks processingTabs — skips if tab already has agent running
|
||||
└── askClaude(entry) — fire and forget
|
||||
|
||||
6. sidebar-agent.ts askClaude()
|
||||
├── spawn('claude', ['-p', prompt, '--model', model, ...])
|
||||
├── Streams stdout line-by-line (stream-json format)
|
||||
├── For each event: POST /sidebar-agent/event { type, tool, text, tabId }
|
||||
└── On close: POST /sidebar-agent/event { type: 'agent_done' }
|
||||
|
||||
7. server.ts processAgentEvent()
|
||||
├── Adds entry to chat buffer (in-memory + disk)
|
||||
├── On agent_done: sets tab status to 'idle'
|
||||
└── On agent_done: processes next queued message for that tab
|
||||
|
||||
8. sidepanel.js pollChat() (every 300ms during fast poll)
|
||||
├── GET /sidebar-chat?after=${chatLineCount}&tabId=${tabId}
|
||||
├── Renders new entries (text, tool_use, agent_done)
|
||||
└── On agent idle: removes thinking dots, stops fast poll
|
||||
```
|
||||
|
||||
## Arrow Hint Hide Flow (4-step signal chain)
|
||||
|
||||
The welcome page shows a right-pointing arrow until the sidebar opens.
|
||||
|
||||
```
|
||||
1. sidepanel.js updateConnection()
|
||||
└── chrome.runtime.sendMessage({ type: 'sidebarOpened' })
|
||||
|
||||
2. background.js
|
||||
└── chrome.tabs.sendMessage(activeTabId, { type: 'sidebarOpened' })
|
||||
|
||||
3. content.js onMessage handler
|
||||
└── document.dispatchEvent(new CustomEvent('gstack-extension-ready'))
|
||||
|
||||
4. welcome.html script
|
||||
└── addEventListener('gstack-extension-ready', () => arrow.classList.add('hidden'))
|
||||
```
|
||||
|
||||
The arrow does NOT hide when the extension loads. Only when the sidebar connects.
|
||||
|
||||
## Auth Token Flow
|
||||
|
||||
```
|
||||
Server starts → AUTH_TOKEN = crypto.randomUUID()
|
||||
│
|
||||
├── GET /health (no auth) → returns { token: AUTH_TOKEN }
|
||||
│
|
||||
├── background.js checkHealth() → authToken = data.token
|
||||
│ └── Refreshes on EVERY health poll (fixes stale token on restart)
|
||||
│
|
||||
├── sidepanel.js tryConnect() → serverToken from background or /health
|
||||
│ └── Used for chat polling: Authorization: Bearer ${serverToken}
|
||||
│
|
||||
└── sidebar-agent.ts refreshToken() → reads from .gstack/browse.json
|
||||
└── Used for event relay: Authorization: Bearer ${authToken}
|
||||
```
|
||||
|
||||
If the server restarts, all three components get fresh tokens within 10s
|
||||
(background health poll interval).
|
||||
|
||||
## Model Routing
|
||||
|
||||
`pickSidebarModel(message)` in server.ts classifies messages:
|
||||
|
||||
| Pattern | Model | Why |
|
||||
|---------|-------|-----|
|
||||
| "click @e24", "go to hn", "screenshot" | sonnet | Deterministic tool calls, no thinking needed |
|
||||
| "what does this page say?", "summarize" | opus | Needs comprehension |
|
||||
| "find bugs", "check for broken links" | opus | Analysis task |
|
||||
| "navigate to X and fill the form" | sonnet | Action-oriented, no analysis words |
|
||||
|
||||
Analysis words (`what`, `why`, `how`, `summarize`, `describe`, `analyze`, `read X and Y`)
|
||||
always override action verbs and force opus.
|
||||
|
||||
## Known Failure Modes
|
||||
|
||||
| Failure | Symptom | Root Cause | Fix |
|
||||
|---------|---------|------------|-----|
|
||||
| Stale auth token | "Unauthorized" in input | Server restarted, background had old token | background.js refreshes token on every health poll |
|
||||
| Tab ID mismatch | Message sent, no response visible | Server assigned tabId 1, sidebar polling tabId 0 | switchChatTab preserves optimistic UI during switch |
|
||||
| Sidebar agent not running | Messages queue forever | Agent process failed to spawn or crashed | Check `ps aux | grep sidebar-agent` |
|
||||
| Agent stale token | Agent runs but no events appear in sidebar | sidebar-agent has old token from .gstack/browse.json | Agent re-reads token before each event POST |
|
||||
| Queue file missing | spawnClaude fails | Race between server start and agent start | Both sides create file if missing |
|
||||
| Optimistic UI blown away | User bubble + dots vanish | switchChatTab replaced DOM with welcome screen | Preserved DOM when lastOptimisticMsg is set |
|
||||
|
||||
## Per-Tab Concurrency
|
||||
|
||||
Each browser tab can run its own agent simultaneously:
|
||||
|
||||
- Server: `tabAgents: Map<number, TabAgentState>` with per-tab queue (max 5)
|
||||
- sidebar-agent: `processingTabs: Set<number>` prevents duplicate spawns
|
||||
- Two messages on same tab: queued sequentially, processed in order
|
||||
- Two messages on different tabs: run concurrently
|
||||
|
||||
## File Locations
|
||||
|
||||
| Component | File | Runs in |
|
||||
|-----------|------|---------|
|
||||
| Sidebar UI | `extension/sidepanel.js` | Chrome side panel |
|
||||
| Service worker | `extension/background.js` | Chrome background |
|
||||
| Content script | `extension/content.js` | Page context |
|
||||
| Welcome page | `browse/src/welcome.html` | Page context |
|
||||
| HTTP server | `browse/src/server.ts` | Bun (compiled binary) |
|
||||
| Agent process | `browse/src/sidebar-agent.ts` | Bun (non-compiled, can spawn) |
|
||||
| CLI entry | `browse/src/cli.ts` | Bun (compiled binary) |
|
||||
| Queue file | `~/.gstack/sidebar-agent-queue.jsonl` | Filesystem |
|
||||
| State file | `.gstack/browse.json` | Filesystem |
|
||||
| Chat log | `~/.gstack/sessions/<id>/chat.jsonl` | Filesystem |
|
||||
@@ -0,0 +1,290 @@
|
||||
# Slate Host Integration — Research & Design Doc
|
||||
|
||||
**Date:** 2026-04-02
|
||||
**Branch:** garrytan/slate-agent-support
|
||||
**Status:** Research complete, blocked on host config refactor
|
||||
**Supersedes:** None
|
||||
|
||||
## What is Slate
|
||||
|
||||
Slate is a proprietary coding agent CLI from Random Labs.
|
||||
Install: `npm i -g @randomlabs/slate` or `brew install anthropic/tap/slate`.
|
||||
License: Proprietary. 85MB compiled Bun binary (arm64/x64, darwin/linux/windows).
|
||||
npm package: `@randomlabs/slate@1.0.25` (thin 8.8KB launcher + platform-specific optional deps).
|
||||
|
||||
Multi-model: dynamically selects Claude Sonnet/Opus/Haiku, plus other models.
|
||||
Built for "swarm orchestration" with extended multi-hour sessions.
|
||||
|
||||
## Slate is an OpenCode fork
|
||||
|
||||
**Confirmed via binary strings analysis** of the 85MB Mach-O arm64 binary:
|
||||
|
||||
- Internal name: `name: "opencode"` (literal string in binary)
|
||||
- All `OPENCODE_*` env vars present alongside `SLATE_*` equivalents
|
||||
- Shares OpenCode's tool/skill architecture, LSP integration, terminal management
|
||||
- Own branding, API endpoints (`api.randomlabs.ai`, `agent-worker-prod.randomlabs.workers.dev`), and config paths
|
||||
|
||||
This matters for integration: OpenCode conventions mostly apply, but Slate adds
|
||||
its own paths and env vars on top.
|
||||
|
||||
## Skill Discovery (confirmed from binary)
|
||||
|
||||
Slate scans ALL four directory families for skills. Error messages in binary confirm:
|
||||
|
||||
```
|
||||
"failed .slate directory scan for skills"
|
||||
"failed .claude directory scan for skills"
|
||||
"failed .agents directory scan for skills"
|
||||
"failed .opencode directory scan for skills"
|
||||
```
|
||||
|
||||
**Discovery paths (priority order from Slate docs):**
|
||||
|
||||
1. `.slate/skills/<name>/SKILL.md` — project-level, highest priority
|
||||
2. `~/.slate/skills/<name>/SKILL.md` — global
|
||||
3. `.opencode/skills/`, `.agents/skills/` — compatibility fallback
|
||||
4. `.claude/skills/` — Claude Code compatibility fallback (lowest)
|
||||
5. Custom paths via `slate.json`
|
||||
|
||||
**Glob patterns:** `**/SKILL.md` and `{skill,skills}/**/SKILL.md`
|
||||
|
||||
**Commands:** Same directory structure but under `commands/` subdirs:
|
||||
`/.slate/commands/`, `/.claude/commands/`, `/.agents/commands/`, `/.opencode/commands/`
|
||||
|
||||
**Skill frontmatter:** YAML with `name` and `description` fields (per Slate docs).
|
||||
No documented length limits on either field.
|
||||
|
||||
## Project Instructions
|
||||
|
||||
Slate reads both `CLAUDE.md` and `AGENTS.md` for project instructions.
|
||||
Both literal strings confirmed in binary. No changes needed to existing
|
||||
gstack projects... CLAUDE.md works as-is.
|
||||
|
||||
## Configuration
|
||||
|
||||
**Config file:** `slate.json` / `slate.jsonc` (NOT opencode.json)
|
||||
|
||||
**Config options (from Slate docs):**
|
||||
- `privacy` (boolean) — disables telemetry/logging
|
||||
- Permissions: `allow`, `ask`, `deny` per tool (`read`, `edit`, `bash`, `grep`, `webfetch`, `websearch`, `*`)
|
||||
- Model slots: `models.main`, `models.subagent`, `models.search`, `models.reasoning`
|
||||
- MCP servers: local or remote with custom commands and headers
|
||||
- Custom commands: `/commands` with templates
|
||||
|
||||
The setup script should NOT create `slate.json`. Users configure their own permissions.
|
||||
|
||||
## CLI Flags (Headless Mode)
|
||||
|
||||
```
|
||||
--stream-json / --output-format stream-json — JSONL output, "compatible with Anthropic Claude Code SDK"
|
||||
--dangerously-skip-permissions — bypass all permission checks (CI/automation)
|
||||
--input-format stream-json — programmatic input
|
||||
-q — non-interactive mode
|
||||
-w <dir> — workspace directory
|
||||
--output-format text — plain text output (default)
|
||||
```
|
||||
|
||||
**Stream-JSON format:** Slate docs claim "compatible with Anthropic Claude Code SDK."
|
||||
Not yet empirically verified. Given OpenCode heritage, likely matches Claude Code's
|
||||
NDJSON event schema (type: "assistant", type: "tool_result", type: "result").
|
||||
|
||||
**Need to verify:** Run `slate -q "hello" --stream-json` with valid credits and
|
||||
capture actual JSONL events before building the session runner parser.
|
||||
|
||||
## Environment Variables (from binary strings)
|
||||
|
||||
### Slate-specific
|
||||
```
|
||||
SLATE_API_KEY — API key
|
||||
SLATE_AGENT — agent selection
|
||||
SLATE_AUTO_SHARE — auto-share setting
|
||||
SLATE_CLIENT — client identifier
|
||||
SLATE_CONFIG — config override
|
||||
SLATE_CONFIG_CONTENT — inline config
|
||||
SLATE_CONFIG_DIR — config directory
|
||||
SLATE_DANGEROUSLY_SKIP_PERMISSIONS — bypass permissions
|
||||
SLATE_DIR — data directory override
|
||||
SLATE_DISABLE_AUTOUPDATE — disable auto-update
|
||||
SLATE_DISABLE_CLAUDE_CODE — disable Claude Code integration entirely
|
||||
SLATE_DISABLE_CLAUDE_CODE_PROMPT — disable Claude Code prompt loading
|
||||
SLATE_DISABLE_CLAUDE_CODE_SKILLS — disable .claude/skills/ loading
|
||||
SLATE_DISABLE_DEFAULT_PLUGINS — disable default plugins
|
||||
SLATE_DISABLE_FILETIME_CHECK — disable file time checks
|
||||
SLATE_DISABLE_LSP_DOWNLOAD — disable LSP auto-download
|
||||
SLATE_DISABLE_MODELS_FETCH — disable models config fetch
|
||||
SLATE_DISABLE_PROJECT_CONFIG — disable project-level config
|
||||
SLATE_DISABLE_PRUNE — disable session pruning
|
||||
SLATE_DISABLE_TERMINAL_TITLE — disable terminal title updates
|
||||
SLATE_ENABLE_EXA — enable Exa search
|
||||
SLATE_ENABLE_EXPERIMENTAL_MODELS — enable experimental models
|
||||
SLATE_EXPERIMENTAL — enable experimental features
|
||||
SLATE_EXPERIMENTAL_BASH_DEFAULT_TIMEOUT_MS — bash timeout override
|
||||
SLATE_EXPERIMENTAL_DISABLE_COPY_ON_SELECT — disable copy on select
|
||||
SLATE_EXPERIMENTAL_DISABLE_FILEWATCHER — disable file watcher
|
||||
SLATE_EXPERIMENTAL_EXA — Exa search (alt flag)
|
||||
SLATE_EXPERIMENTAL_FILEWATCHER — enable file watcher
|
||||
SLATE_EXPERIMENTAL_ICON_DISCOVERY — icon discovery
|
||||
SLATE_EXPERIMENTAL_LSP_TOOL — LSP tool
|
||||
SLATE_EXPERIMENTAL_LSP_TY — LSP type checking
|
||||
SLATE_EXPERIMENTAL_MARKDOWN — markdown mode
|
||||
SLATE_EXPERIMENTAL_OUTPUT_TOKEN_MAX — output token limit
|
||||
SLATE_EXPERIMENTAL_OXFMT — oxfmt integration
|
||||
SLATE_EXPERIMENTAL_PLAN_MODE — plan mode
|
||||
SLATE_FAKE_VCS — fake VCS for testing
|
||||
SLATE_GIT_BASH_PATH — git bash path (Windows)
|
||||
SLATE_MODELS_URL — models config URL
|
||||
SLATE_PERMISSION — permission override
|
||||
SLATE_SERVER_PASSWORD — server auth
|
||||
SLATE_SERVER_USERNAME — server auth
|
||||
SLATE_TELEMETRY_DISABLED — disable telemetry
|
||||
SLATE_TEST_HOME — test home directory
|
||||
SLATE_TOKEN_DIR — token storage directory
|
||||
```
|
||||
|
||||
### OpenCode legacy (still functional)
|
||||
```
|
||||
OPENCODE_DISABLE_LSP_DOWNLOAD
|
||||
OPENCODE_EXPERIMENTAL_DISABLE_FILEWATCHER
|
||||
OPENCODE_EXPERIMENTAL_FILEWATCHER
|
||||
OPENCODE_EXPERIMENTAL_ICON_DISCOVERY
|
||||
OPENCODE_EXPERIMENTAL_LSP_TY
|
||||
OPENCODE_EXPERIMENTAL_OXFMT
|
||||
OPENCODE_FAKE_VCS
|
||||
OPENCODE_GIT_BASH_PATH
|
||||
OPENCODE_LIBC
|
||||
OPENCODE_TERMINAL
|
||||
```
|
||||
|
||||
### Critical env vars for gstack integration
|
||||
|
||||
**`SLATE_DISABLE_CLAUDE_CODE_SKILLS`** — When set, `.claude/skills/` loading is disabled.
|
||||
This makes publishing to `.slate/skills/` load-bearing, not just an optimization.
|
||||
Without native `.slate/` publishing, gstack skills vanish when this flag is set.
|
||||
|
||||
**`SLATE_TEST_HOME`** — Useful for E2E tests. Can redirect Slate's home directory
|
||||
to an isolated temp directory, similar to how Codex tests use a temp HOME.
|
||||
|
||||
**`SLATE_DANGEROUSLY_SKIP_PERMISSIONS`** — Required for headless E2E tests.
|
||||
|
||||
## Model References (from binary)
|
||||
|
||||
```
|
||||
anthropic/claude-sonnet-4.6
|
||||
anthropic/claude-opus-4
|
||||
anthropic/claude-haiku-4
|
||||
anthropic/slate — Slate's own model routing
|
||||
openai/gpt-5.3-codex
|
||||
google/nano-banana
|
||||
randomlabs/fast-default-alpha
|
||||
```
|
||||
|
||||
## API Endpoints (from binary)
|
||||
|
||||
```
|
||||
https://api.randomlabs.ai — main API
|
||||
https://api.randomlabs.ai/exaproxy — Exa search proxy
|
||||
https://agent-worker-prod.randomlabs.workers.dev — production worker
|
||||
https://agent-worker-dev.randomlabs.workers.dev — dev worker
|
||||
https://dashboard.randomlabs.ai — dashboard
|
||||
https://docs.randomlabs.ai — documentation
|
||||
https://randomlabs.ai/config.json — remote config
|
||||
```
|
||||
|
||||
Brew tap: `anthropic/tap/slate` (notable: under Anthropic's tap, not Random Labs)
|
||||
|
||||
## npm Package Structure
|
||||
|
||||
```
|
||||
@randomlabs/slate (8.8 kB, thin launcher)
|
||||
├── bin/slate — Node.js launcher (finds platform binary in node_modules)
|
||||
├── bin/slate1 — Bun launcher (same logic, import.meta.filename)
|
||||
├── postinstall.mjs — Verifies platform binary exists, symlinks if needed
|
||||
└── package.json — Declares optionalDependencies for all platforms
|
||||
|
||||
Platform packages (85MB each):
|
||||
├── @randomlabs/slate-darwin-arm64
|
||||
├── @randomlabs/slate-darwin-x64
|
||||
├── @randomlabs/slate-linux-arm64
|
||||
├── @randomlabs/slate-linux-x64
|
||||
├── @randomlabs/slate-linux-x64-musl
|
||||
├── @randomlabs/slate-linux-arm64-musl
|
||||
├── @randomlabs/slate-linux-x64-baseline
|
||||
├── @randomlabs/slate-linux-x64-baseline-musl
|
||||
├── @randomlabs/slate-darwin-x64-baseline
|
||||
├── @randomlabs/slate-windows-x64
|
||||
└── @randomlabs/slate-windows-x64-baseline
|
||||
```
|
||||
|
||||
Binary override: `SLATE_BIN_PATH` env var skips all discovery, runs the specified binary directly.
|
||||
|
||||
## What Already Works Today
|
||||
|
||||
gstack skills already work in Slate via the `.claude/skills/` fallback path.
|
||||
No changes needed for basic functionality. Users who install gstack for Claude Code
|
||||
and also use Slate will find their skills available in both agents.
|
||||
|
||||
## What First-Class Support Adds
|
||||
|
||||
1. **Reliability** — `.slate/skills/` is Slate's highest-priority path. Immune to
|
||||
`SLATE_DISABLE_CLAUDE_CODE_SKILLS`.
|
||||
2. **Optimized frontmatter** — Strip Claude-specific fields (allowed-tools, hooks, version)
|
||||
that Slate doesn't use. Keep only `name` and `description`.
|
||||
3. **Setup script** — Auto-detect `slate` binary, install skills to `~/.slate/skills/`.
|
||||
4. **E2E tests** — Verify skills work when invoked by Slate directly.
|
||||
|
||||
## Blocked On: Host Config Refactor
|
||||
|
||||
Codex's outside voice review identified that adding Slate as a 4th host (after Claude,
|
||||
Codex, Factory) is "host explosion for a path alias." The current architecture has:
|
||||
|
||||
- Hard-coded host names in `type Host = 'claude' | 'codex' | 'factory'`
|
||||
- Per-host branches in `transformFrontmatter()` with near-duplicate logic
|
||||
- Per-host config in `EXTERNAL_HOST_CONFIG` with similar patterns
|
||||
- Per-host functions in the setup script (`create_codex_runtime_root`, `link_codex_skill_dirs`)
|
||||
- Host names duplicated in `bin/gstack-platform-detect`, `bin/gstack-uninstall`, `bin/dev-setup`
|
||||
|
||||
Adding Slate means copying all of these patterns again. A refactor to make hosts
|
||||
data-driven (config objects instead of if/else branches) would make Slate integration
|
||||
trivial AND make future hosts (any new OpenCode fork, any new agent) zero-effort.
|
||||
|
||||
### Missing from the plan (identified by Codex)
|
||||
|
||||
- `lib/worktree.ts` only copies `.agents/`, not `.slate/` — E2E tests in worktrees won't
|
||||
have Slate skills
|
||||
- `bin/gstack-uninstall` doesn't know about `.slate/`
|
||||
- `bin/dev-setup` doesn't wire `.slate/` for contributor dev mode
|
||||
- `bin/gstack-platform-detect` doesn't detect Slate
|
||||
- E2E tests should set `SLATE_DISABLE_CLAUDE_CODE_SKILLS=1` to prove `.slate/` path
|
||||
actually works (not just falling back to `.claude/`)
|
||||
|
||||
## Session Runner Design (for later)
|
||||
|
||||
When the JSONL format is verified, the session runner should:
|
||||
|
||||
- Spawn: `slate -q "<prompt>" --stream-json --dangerously-skip-permissions -w <dir>`
|
||||
- Parse: Claude Code SDK-compatible NDJSON (assumed, needs verification)
|
||||
- Skills: Install to `.slate/skills/` in test fixture (not `.claude/skills/`)
|
||||
- Auth: Use `SLATE_API_KEY` or existing `~/.slate/` credentials
|
||||
- Isolation: Use `SLATE_TEST_HOME` for home directory isolation
|
||||
- Timeout: 300s default (same as Codex)
|
||||
|
||||
```typescript
|
||||
export interface SlateResult {
|
||||
output: string;
|
||||
toolCalls: string[];
|
||||
tokens: number;
|
||||
exitCode: number;
|
||||
durationMs: number;
|
||||
sessionId: string | null;
|
||||
rawLines: string[];
|
||||
stderr: string;
|
||||
}
|
||||
```
|
||||
|
||||
## Docs References
|
||||
|
||||
- Slate docs: https://docs.randomlabs.ai
|
||||
- Quickstart: https://docs.randomlabs.ai/en/getting-started/quickstart
|
||||
- Skills: https://docs.randomlabs.ai/en/using-slate/skills
|
||||
- Configuration: https://docs.randomlabs.ai/en/using-slate/configuration
|
||||
- Hotkeys: https://docs.randomlabs.ai/en/using-slate/hotkey_reference
|
||||
Reference in New Issue
Block a user