mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 11:45:20 +02:00
Merge origin/main into garrytan/slim-gstack-skills
VERSION → 1.15.0.0 (MINOR bump on top of main's v1.14.0.0).

Branch's v1.13.1.0 work (preamble compression + real-PTY harness + 5 plan-mode tests passing) consolidated with v1.15.0.0 work (6 new E2E tests on the harness + parseNumberedOptions + budget regression utils) into a single release entry. v1.13.1.0 never landed on main, so its content rolls into the final shippable version per the never-orphan rule in CLAUDE.md.

Conflicts resolved:
- VERSION: 1.13.1.0 (HEAD) + 1.14.0.0 (main) → 1.15.0.0
- package.json: matching 1.15.0.0
- CHANGELOG.md: replaced HEAD's 1.13.1.0 entry with a consolidated 1.15.0.0 entry above main's untouched 1.14.0.0 entry. Itemized changes split per-version (no shared header).

CLAUDE.md adds "Scale-aware bumps" guidance under CHANGELOG + VERSION style. Big diffs (>2K LOC, new capability) bump MINOR; PATCH is for fixes/small adds; MAJOR for breaking changes. Codified after a v1.14.1.0 PATCH attempt got correctly pushed back on for a release with ~10K lines added and ~24K lines removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -20,6 +20,10 @@ bin/gstack-global-discover
.gbrain/
.context/
extension/.auth.json
# xterm assets are vendored from npm at build time; not source-of-truth.
extension/lib/xterm.js
extension/lib/xterm.css
extension/lib/xterm-addon-fit.js
.gstack-worktrees/
/tmp/
*.log
+78
-7
@@ -1,14 +1,14 @@
|
||||
# Changelog
|
||||
|
||||
## [1.13.1.0] - 2026-04-25
|
||||
## [1.15.0.0] - 2026-04-26
|
||||
|
||||
## **Skill prompts get a 25% haircut. Plan-mode E2E tests work for the first time ever.**
|
||||
## **Skill prompts get a 25% haircut. Plan-mode E2E coverage doubles, and AUQ rendering is now testable.**
|
||||
|
||||
Two pieces of work in one release. First, every preamble resolver got compressed: 18 resolvers (Voice, Writing Style, AskUserQuestion Format, Completeness Principle, Plan Mode Info, Brain Sync, Routing Injection, and 11 more) lost a third of their prose without losing a single semantic rule. The full corpus of generated `SKILL.md` files dropped from 3.08 MB to 2.30 MB across 47 outputs. Second, the 5 plan-mode E2E tests added in v1.11.1.0 and rewritten in v1.12.1.0 turned out to have never actually passed. The SDK harness they used couldn't observe Claude's plan-mode confirmation UI, so `result.askUserQuestions.length` was always 0. They fail on `origin/main`. They fail on v1.0.0.0. This release ships a real-PTY harness that drives the actual `claude` binary, watches the rendered terminal, and gets all 5 to green.
|
||||
Three pieces of work in one release. First, every preamble resolver got compressed: 18 resolvers (Voice, Writing Style, AskUserQuestion Format, Completeness Principle, Plan Mode Info, Brain Sync, Routing Injection, and 11 more) lost a third of their prose without losing a single semantic rule. The full corpus of generated `SKILL.md` files dropped from 3.08 MB to 2.30 MB across 47 outputs. Second, the 5 plan-mode E2E tests added in v1.11.1.0 and rewritten in v1.12.1.0 turned out to have never actually passed — the SDK harness they used couldn't observe Claude's plan-mode confirmation UI. This release ships a real-PTY harness that drives the actual `claude` binary, watches the rendered terminal, and gets all 5 to green. Third, on top of that harness, 6 new E2E tests cover behaviors no test could reach before: AUQ format compliance, plan-design UI-scope detection (positive path), tool-budget regression, /ship idempotency end-to-end, /plan-ceo answer-routing, and /autoplan phase ordering.
|
||||
|
||||
### The numbers that matter
|
||||
|
||||
Token-level reduction comes from regenerating every `SKILL.md` against the slim resolvers (`bun run gen:skill-docs --host all`). Plan-mode E2E numbers come from `EVALS=1 EVALS_TIER=gate bun test test/skill-e2e-plan-*-plan-mode.test.ts` on a clean working tree.
|
||||
Token-level reduction comes from regenerating every `SKILL.md` against the slim resolvers (`bun run gen:skill-docs --host all`). Plan-mode E2E numbers come from `EVALS=1 EVALS_TIER=gate bun test test/skill-e2e-plan-*-plan-mode.test.ts` on a clean working tree. New E2E test verification uses the same gate flag against the new test files.
|
||||
|
||||
| Metric | Before | After | Δ |
|---|---|---|---|
|
||||
@@ -17,23 +17,38 @@ Token-level reduction comes from regenerating every `SKILL.md` against the slim
| `plan-ceo-review` preamble | 54 KB | 31 KB | −43% |
| Plan-mode E2E tests passing | 0/5 | 5/5 | +5 |
| Plan-mode E2E wall time | ∞ (never green) | 790 s (sequential) | proven |
| Real-PTY E2E test count | 5 | 11 | +6 |
| Gate-tier paid E2E added | 0 | 3 | auq-format, design-with-ui, budget-regression |
| Periodic-tier paid E2E added | 0 | 3 | mode-routing, ship-idempotency, autoplan-chain |
| New helper unit tests | 0 | 23 | parser + budget regression coverage |

| Skill class | Old preamble | New preamble | Δ |
|---|---|---|---|
| Tier-2+ review skills | ~50 KB | ~30 KB | −40% |
| Tier-1 quick skills | ~12 KB | ~9 KB | −25% |
|
||||
|
||||
The biggest wins are the tier-≥3 plan reviews that load full preamble surface (Brain Sync, Context Recovery, Routing Injection): they keep all the load-bearing functionality and lose almost half the bytes. Every gstack invocation is now ~50K tokens lighter.
|
||||
The biggest wins are the tier-≥3 plan reviews that load full preamble surface (Brain Sync, Context Recovery, Routing Injection): they keep all the load-bearing functionality and lose almost half the bytes. Every gstack invocation is now ~50K tokens lighter, and the test harness can finally observe what users actually see in the terminal.
|
||||
|
||||
### What this means for builders
|
||||
|
||||
Faster every-skill startup, cheaper prompt-cache pricing on cold reads, more headroom inside the 200K context window for actual work. And for anyone who tried `/plan-ceo-review` in plan mode and watched it silently write a plan file: those tests now actually verify that doesn't happen. Run `bun run gen:skill-docs --host all` after pulling. The 5 plan-mode tests will run in CI on the next gate-tier eval pass.
|
||||
Faster every-skill startup, cheaper prompt-cache pricing on cold reads, more headroom inside the 200K context window for actual work. The plan-mode E2E tests now actually verify the skill doesn't silently write a plan file when `/plan-ceo-review` runs in plan mode. And the 3 new gate-tier tests catch a class of regression that was previously invisible: AUQ format drift (`Recommendation:` line missing), UI-scope misdetection (positive path), and tool-call budget bloat (a skill burning 3× the tools it used to). Run `bun run gen:skill-docs --host all` after pulling. The 8 gate-tier tests (the 5 plan-mode tests plus the 3 new ones) will run in CI on the next gate-tier eval pass.
|
||||
|
||||
### Itemized changes
|
||||
|
||||
#### Added
|
||||
|
||||
- `test/helpers/claude-pty-runner.ts`: real-PTY test harness using `Bun.spawn({terminal:})` (Bun 1.3.10+ has built-in PTY — no `node-pty`, no native modules). Exposes `launchClaudePty()` for raw session control and `runPlanSkillObservation()` as the high-level contract for plan-mode skill tests.
|
||||
- `parseNumberedOptions(visible)` and `isPermissionDialogVisible(visible)` helpers in `claude-pty-runner.ts`. Tests can now look up an option index by its label without hard-coding positions, and auto-grant Claude Code's file-edit / workspace-trust / bash-permission dialogs that fire during preamble side-effects.
|
||||
- `findBudgetRegressions()` and `assertNoBudgetRegression()` in `test/helpers/eval-store.ts`. Pure functions returning tests that grew >2× in tools or turns vs the prior eval run, with floors at 5 prior tools / 3 prior turns to avoid noise. Env override `GSTACK_BUDGET_RATIO`.
|
||||
- 6 new real-PTY E2E tests on the harness:
|
||||
- `skill-e2e-auq-format-compliance.test.ts` (gate, ~$0.50/run): asserts every gstack `AskUserQuestion` rendering contains the 7 mandated format elements (ELI10, Recommendation, Pros/Cons with ✅/❌, Net, `(recommended)` label).
|
||||
- `skill-e2e-plan-design-with-ui.test.ts` (gate, ~$0.80/run): positive coverage for `/plan-design-review` UI-scope detection. Counterpart to the existing no-UI early-exit test — without it, a regression that flips the detector to "early-exit always" would ship undetected.
|
||||
- `skill-budget-regression.test.ts` (gate, free): branch-scoped library-only assertion that no skill burns >2× tools or turns vs its prior recorded run.
|
||||
- `skill-e2e-plan-ceo-mode-routing.test.ts` (periodic, ~$3/run): verifies AUQ answer routing — HOLD SCOPE picks routes to rigor language, SCOPE EXPANSION picks route to expansion language.
|
||||
- `skill-e2e-ship-idempotency.test.ts` (periodic, ~$3/run): runs `/ship` end-to-end against a real git fixture with `STATE: ALREADY_BUMPED` baked in; asserts no double-bump, no double-commit, no fixture mutation.
|
||||
- `skill-e2e-autoplan-chain.test.ts` (periodic, ~$8/run): asserts `/autoplan` phase ordering by tee'ing timestamps as each `**Phase N complete.**` marker appears.
|
||||
- `test/helpers-unit.test.ts`: 23 unit tests covering `parseNumberedOptions` edge cases (empty, partial paint, >9 options, stale-vs-fresh anchoring) and `findBudgetRegressions` (noise floor, env override, missing tool data).
|
||||
- `test/fixtures/plans/ui-heavy-feature.md`: planted plan with explicit UI scope keywords for the new design-with-UI test.
|
||||
- Auto-handling of the workspace-trust dialog so tests run in temp directories without manual intervention.
|
||||
- Outcome contract: `asked` | `plan_ready` | `silent_write` | `exited` | `timeout`. Tests pass on `asked` or `plan_ready`, fail on the rest.
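
The budget check described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in `test/helpers/eval-store.ts`: the `EvalRun` shape, the name `findBudgetRegressionsSketch`, and the inclusive treatment of the noise floors are assumptions, and the ratio is passed as a parameter where the real helper is described as reading `GSTACK_BUDGET_RATIO`.

```typescript
// Hypothetical record shape; the real eval-store schema is not shown here.
interface EvalRun {
  test: string;
  tools: number;
  turns: number;
}

interface BudgetRegression {
  test: string;
  metric: "tools" | "turns";
  prior: number;
  current: number;
}

// Noise floors from the changelog: tiny prior runs are ignored.
// Treating the floor as inclusive (>=) is an assumption.
const MIN_PRIOR_TOOLS = 5;
const MIN_PRIOR_TURNS = 3;

function findBudgetRegressionsSketch(
  prior: EvalRun[],
  current: EvalRun[],
  ratio: number = 2,
): BudgetRegression[] {
  const priorByTest = new Map<string, EvalRun>();
  for (const run of prior) priorByTest.set(run.test, run);

  const regressions: BudgetRegression[] = [];
  for (const run of current) {
    const before = priorByTest.get(run.test);
    if (before === undefined) continue; // no baseline, nothing to compare against

    if (before.tools >= MIN_PRIOR_TOOLS && run.tools > before.tools * ratio) {
      regressions.push({ test: run.test, metric: "tools", prior: before.tools, current: run.tools });
    }
    if (before.turns >= MIN_PRIOR_TURNS && run.turns > before.turns * ratio) {
      regressions.push({ test: run.test, metric: "turns", prior: before.turns, current: run.turns });
    }
  }
  return regressions;
}
```

Keeping the comparison a pure function of two run lists means the gate-tier check can run against recorded data without calling the model, which is presumably why the changelog lists that test as free.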
|
||||
|
||||
@@ -43,6 +58,7 @@ Faster every-skill startup, cheaper prompt-cache pricing on cold reads, more hea
|
||||
- All 47 generated `SKILL.md` files regenerated; 3 ship golden fixtures regenerated.
|
||||
- Plan-* skills retain full preamble surface (Brain Sync, Context Recovery, Routing Injection) — the early slim attempt that cut these was reverted after diagnosing them as load-bearing.
|
||||
- 5 plan-mode E2E tests rewritten on the new harness with a 300s observation budget.
|
||||
- `isNumberedOptionListVisible` regex tolerates whitespace collapse from TTY cursor-positioning escapes (`\x1b[40C`) which `stripAnsi` removes — `\b2\.` was failing on word-to-word transitions where stripped output read `text2.`.
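
The whitespace-collapse failure in the last item can be reproduced in a few lines. The sketch below is illustrative only: `stripCsi` is a simplified stand-in for `stripAnsi`, and `TOLERANT_SECOND_OPTION` is a hypothetical pattern, not the regex that actually shipped in `claude-pty-runner.ts`.

```typescript
// Simplified stand-in for stripAnsi: drops CSI sequences such as "\x1b[40C".
const CSI_SEQUENCE = /\x1b\[[0-9;?]*[A-Za-z]/g;

function stripCsi(raw: string): string {
  return raw.replace(CSI_SEQUENCE, "");
}

// Strict form from the changelog: requires a word boundary before "2.".
const STRICT_SECOND_OPTION = /\b2\./;

// Hypothetical tolerant form: only rejects a preceding digit, so "text2."
// matches while "12." does not.
const TOLERANT_SECOND_OPTION = /(?:^|[^0-9])2\./;

function hasSecondOption(visible: string, pattern: RegExp): boolean {
  return pattern.test(visible);
}

// The terminal positions option 2 with a cursor-forward escape instead of
// printing spaces, so stripping the escape glues the two options together.
const painted = "1. Approve\x1b[40C2. Reject";
const visibleText = stripCsi(painted); // "1. Approve2. Reject"
```

With this input, `hasSecondOption(visibleText, STRICT_SECOND_OPTION)` is false because `e` and `2` are both word characters, while the tolerant pattern still finds the option.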
|
||||
|
||||
#### Fixed
|
||||
|
||||
@@ -56,9 +72,64 @@ Faster every-skill startup, cheaper prompt-cache pricing on cold reads, more hea
|
||||
|
||||
#### For contributors
|
||||
|
||||
- `test/helpers/touchfiles.ts`: 5 plan-mode test selections + e2e-harness-audit selection now point at `claude-pty-runner.ts` instead of the deleted helper.
|
||||
- `test/helpers/touchfiles.ts`: 5 plan-mode test selections + e2e-harness-audit selection now point at `claude-pty-runner.ts` instead of the deleted helper. 6 new entries (`auq-format-pty`, `plan-ceo-mode-routing`, `plan-design-with-ui-scope`, `budget-regression-pty`, `ship-idempotency-pty`, `autoplan-chain-pty`) with tier classifications: 3 gate, 3 periodic.
|
||||
- `test/e2e-harness-audit.test.ts`: recognizes `runPlanSkillObservation` as a valid coverage path alongside the legacy `canUseTool` / `runPlanModeSkillTest` patterns.
|
||||
- New unit test: `test/gen-skill-docs.test.ts` asserts plan-review preambles stay under 33 KB and the slim Voice section preserves its load-bearing semantic contract (lead-with-the-point, name-the-file, user-outcome framing, no-corporate, no-AI-vocab, user-sovereignty).
|
||||
- `test/touchfiles.test.ts`: skill-specific change selection count updated 15 → 18 to match the 6 new touchfile entries that depend on `plan-ceo-review/**`.
|
||||
|
||||
## [1.14.0.0] - 2026-04-25
|
||||
|
||||
## **The gstack browser sidebar is now an interactive Claude Code REPL with live tab awareness.**
|
||||
|
||||
Open the side panel and Claude Code is right there in a real terminal. Type, watch the agent work, switch browser tabs and Claude sees the change. The old one-shot chat queue is gone. Two-way conversation, slash commands, `/resume`, ANSI colors, all of it. Plus a `$B tab-each` command that fans out a single browse command across every open tab and returns per-tab JSON results.
|
||||
|
||||
### The numbers that matter
|
||||
| Metric | Before | After | Δ |
|---|---|---|---|
| Sidebar surfaces | Chat (one-shot `claude -p`) + 3 debug | Terminal (live PTY) + 3 debug | -1 surface, +interactive |
| Subprocesses spawned per session | Many (one per chat message) | One (PTY claude, lazy-spawned) | -N |
| Lines in `extension/sidepanel.js` | 1969 | 1042 | -47% |
| Total diff | — | 27 files, +2875 / -3885 | -1010 net |
| New unit + integration + regression tests | 0 | 56+ | +56 |
| Live `tabs.json` push latency | n/a (no live state) | <50ms after `chrome.tabs` event | new capability |
|
||||
|
||||
### What this means for builders
|
||||
|
||||
Open the sidebar, type. Real PTY means slash commands, `/resume`, real ANSI rendering, real claude process lifecycle. Switch browser tabs while Claude is running and `<stateDir>/tabs.json` + `active-tab.json` update in place — Claude reads them, no need to ask `$B tabs`. Need to do the same thing on every tab? `$B tab-each <command>` returns a JSON array, original active tab restored when done, no OS focus stealing.
|
||||
|
||||
The old chat queue is gone. `sidebar-agent.ts`, `/sidebar-command`, `/sidebar-chat`, `/sidebar-agent/event` all deleted. The Cleanup / Screenshot / Cookies toolbar buttons survive in the Terminal pane — Cleanup pipes its prompt straight into the live PTY via `window.gstackInjectToTerminal()` instead of spawning yet another `claude -p`.
|
||||
|
||||
### Itemized changes
|
||||
|
||||
#### Added
|
||||
|
||||
- **Interactive Terminal sidebar tab.** xterm.js + a non-compiled `terminal-agent.ts` Bun process that spawns claude with `Bun.spawn({terminal: {rows, cols, data}})`. Auto-connects when the side panel opens, no keypress needed.
|
||||
- **`$B tab-each <command>`** — fan-out helper for multi-tab work. Returns `{command, args, total, results: [{tabId, url, title, status, output}]}`. Skips chrome:// pages, scope-checks the inner command before iterating, restores the original active tab in a `finally` block, never pulls focus away from the user's foreground app.
|
||||
- **Live tab state files.** `<stateDir>/tabs.json` (full list with id, url, title, active, pinned, audible, windowId) and `<stateDir>/active-tab.json` (current active). Updated atomically on every `chrome.tabs` event (activated, created, removed, URL/title change). Claude reads on demand instead of running `$B tabs`.
|
||||
- **Tab-awareness system prompt** injected via `claude --append-system-prompt` at spawn so the model knows about the state files and the `$B tab-each` command without being told.
|
||||
- **Always-visible Restart button** in the Terminal toolbar. Force-restart claude any time, not just from the "session ended" state.
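
The `$B tab-each` behavior described in the list above (skip `chrome://` pages, collect per-tab results, restore the original tab in a `finally` block) can be sketched as below. This is a simplified illustration under stated assumptions: `FakeTabManager` is a hypothetical in-memory stand-in for the real browser manager, the `"ok"` / `"error"` status strings are guesses, the scope check on the inner command is omitted, and the loop is synchronous for brevity where the real command would be asynchronous.

```typescript
interface Tab {
  id: number;
  url: string;
  title: string;
}

interface TabResult {
  tabId: number;
  url: string;
  title: string;
  status: "ok" | "error"; // assumed status values
  output: string;
}

interface TabEachReport {
  command: string;
  args: string[];
  total: number;
  results: TabResult[];
}

// Hypothetical in-memory stand-in for the real browser manager.
class FakeTabManager {
  tabs: Tab[];
  activeId: number;

  constructor(tabs: Tab[], activeId: number) {
    this.tabs = tabs;
    this.activeId = activeId;
  }

  switchTo(id: number): void {
    this.activeId = id;
  }
}

function tabEachSketch(
  manager: FakeTabManager,
  command: string,
  args: string[],
  run: (tab: Tab, command: string, args: string[]) => string,
): TabEachReport {
  const originalActiveId = manager.activeId;
  const results: TabResult[] = [];
  try {
    for (const tab of manager.tabs) {
      if (tab.url.startsWith("chrome://")) continue; // internal pages are skipped
      manager.switchTo(tab.id);
      try {
        const output = run(tab, command, args);
        results.push({ tabId: tab.id, url: tab.url, title: tab.title, status: "ok", output });
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        results.push({ tabId: tab.id, url: tab.url, title: tab.title, status: "error", output: message });
      }
    }
  } finally {
    // Runs even if something above throws, so the user's view is restored.
    manager.switchTo(originalActiveId);
  }
  return { command, args, total: results.length, results };
}
```

Catching errors per tab keeps one failing page from aborting the whole fan-out, and the outer `finally` is what guarantees the active tab is put back.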
|
||||
|
||||
#### Changed
|
||||
- **Sidebar is Terminal-only.** No more `Terminal | Chat` primary tab nav. Activity / Refs / Inspector still live behind the `debug` toggle in the footer. Quick-actions (🧹 Cleanup / 📸 Screenshot / 🍪 Cookies) moved into the Terminal toolbar.
|
||||
- **WebSocket auth uses `Sec-WebSocket-Protocol`** instead of cookies. Browsers can't set `Authorization` on WS upgrades, and `SameSite=Strict` cookies don't survive the cross-port jump from server.ts:34567 to the agent's random port from a chrome-extension origin. The token rides on `new WebSocket(url, [`gstack-pty.<token>`])` and the agent echoes the protocol back (Chromium closes connections that don't pick a protocol).
|
||||
- **Cleanup button now drives the live PTY.** Clicking "🧹 Cleanup" injects the cleanup prompt straight into claude via `window.gstackInjectToTerminal()`. The Inspector "Send to Code" action uses the same path. No more `/sidebar-command` POSTs.
|
||||
- **Repaint after debug-tab close.** xterm.js doesn't auto-redraw when its container flips from `display: none` back to `display: flex`. A MutationObserver on `#tab-terminal`'s class attribute now forces a `fitAddon.fit() + term.refresh() + resize` push when the pane becomes visible.
|
||||
|
||||
#### Removed
|
||||
- **`browse/src/sidebar-agent.ts`** — the one-shot `claude -p` queue worker. ~900 lines.
|
||||
- **Server endpoints**: `/sidebar-command`, `/sidebar-chat[/clear]`, `/sidebar-agent/{event,kill,stop}`, `/sidebar-tabs[/switch]`, `/sidebar-session{,/new,/list}`, `/sidebar-queue/dismiss`. ~600 lines.
|
||||
- **Chat-related state** in server.ts: `ChatEntry`, `SidebarSession`, `TabAgentState`, `pickSidebarModel`, `addChatEntry`, `processAgentEvent`, `killAgent`, the agent-health watchdog, `chatBuffer`, the per-tab agent map.
|
||||
- **Chat UI in sidepanel.html**: primary-tab nav, `<main id="tab-chat">`, the chat input bar, the experimental "Browser co-pilot" banner, the security event banner, the `clear-chat` footer button.
|
||||
- **Five obsolete test files**: `sidebar-agent.test.ts`, `sidebar-agent-roundtrip.test.ts`, `security-e2e-fullstack.test.ts`, `security-review-fullstack.test.ts`, `security-review-sidepanel-e2e.test.ts`. Plus 5 chat-only describe blocks inside surviving security tests (loadSession session-ID validation, switchChatTab DocumentFragment, pollChat reentrancy, sidebar-tabs URL sanitization, agent queue security).
|
||||
|
||||
#### For contributors
|
||||
- **`browse/src/pty-session-cookie.ts`** mirrors `sse-session-cookie.ts`. Same TTL, same opportunistic pruning, separate registry (PTY tokens must never be valid as SSE tokens or vice versa).
|
||||
- **`docs/designs/SIDEBAR_MESSAGE_FLOW.md`** rewritten around the Terminal flow: WebSocket upgrade, dual-token model (`AUTH_TOKEN` for `/pty-session`, `gstack-pty.<token>` for `/ws`, `INTERNAL_TOKEN` for server↔agent loopback), threat-model boundary (Terminal tab bypasses the prompt-injection stack on purpose; user keystrokes are the trust source).
|
||||
- **`browse/test/terminal-agent.test.ts`** (16 tests) + `terminal-agent-integration.test.ts` (real `/bin/bash` PTY round-trip, raw `Sec-WebSocket-Protocol` upgrade verification) + `tab-each.test.ts` (10 tests with mock `BrowserManager`) + `sidebar-tabs.test.ts` (27 structural assertions locking the chat-rip invariants).
|
||||
- **CLAUDE.md** updated with the dual-token model, the cookie-vs-protocol rationale, and the cross-pane injection pattern.
|
||||
- **`vendor:xterm`** build step copies `xterm@5.x` and `xterm-addon-fit` from `node_modules/` into `extension/lib/` at build time. xterm files are gitignored.
|
||||
- **TODOS.md** carries three v1.1+ follow-ups: PTY session survival across sidebar reload (Issue 1C deferred), `/health` `AUTH_TOKEN` distribution audit (codex finding, pre-existing soft leak), and dropping the now-dead `security-classifier.ts` ML pipeline.
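
The "same TTL, same opportunistic pruning, separate registry" idea from the first item can be illustrated with a small sketch. Everything here is hypothetical: `TokenRegistry`, its method names, and the injected token source are assumptions for illustration, not the contents of `pty-session-cookie.ts`. Real code would mint tokens from a cryptographically secure random source.

```typescript
class TokenRegistry {
  private readonly expiries = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly newToken: () => string;

  constructor(ttlMs: number, newToken: () => string) {
    this.ttlMs = ttlMs;
    this.newToken = newToken;
  }

  // Opportunistic pruning: expired entries are swept whenever a token is minted,
  // so no background timer is needed.
  mint(now: number): string {
    this.prune(now);
    const token = this.newToken();
    this.expiries.set(token, now + this.ttlMs);
    return token;
  }

  validate(token: string, now: number): boolean {
    const expiry = this.expiries.get(token);
    return expiry !== undefined && now < expiry;
  }

  get size(): number {
    return this.expiries.size;
  }

  private prune(now: number): void {
    this.expiries.forEach((expiry, token) => {
      if (expiry <= now) this.expiries.delete(token);
    });
  }
}

// Two independent registries: a token minted for one surface is unknown to the other.
// The counter-based token source is a placeholder for a secure random generator.
let counter = 0;
const ptyTokens = new TokenRegistry(60_000, () => `pty-${++counter}`);
const sseTokens = new TokenRegistry(60_000, () => `sse-${++counter}`);
const ptyToken = ptyTokens.mint(0);
```

Because each registry owns its own map, `sseTokens.validate(ptyToken, 1)` is false even though the token is still live in `ptyTokens`.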
|
||||
|
||||
## [1.13.0.0] - 2026-04-25
|
||||
|
||||
|
||||
@@ -225,12 +225,35 @@ When you need to interact with a browser (QA, dogfooding, cookie setup), use the
|
||||
project uses.
|
||||
|
||||
**Sidebar architecture:** Before modifying `sidepanel.js`, `background.js`,
|
||||
`content.js`, `sidebar-agent.ts`, or sidebar-related server endpoints, read
|
||||
`docs/designs/SIDEBAR_MESSAGE_FLOW.md`. It documents the full initialization
|
||||
timeline, message flow, auth token chain, tab concurrency model, and known
|
||||
failure modes. The sidebar spans 5 files across 2 codebases (extension + server)
|
||||
with non-obvious ordering dependencies. The doc exists to prevent the kind of
|
||||
silent failures that come from not understanding the cross-component flow.
|
||||
`content.js`, `terminal-agent.ts`, or sidebar-related server endpoints,
|
||||
read `docs/designs/SIDEBAR_MESSAGE_FLOW.md`. The sidebar has one primary
|
||||
surface — the **Terminal** pane (interactive `claude` PTY) — with
|
||||
Activity / Refs / Inspector as debug overlays behind the footer's
|
||||
`debug` toggle. The chat queue path was ripped once the PTY proved out;
|
||||
`sidebar-agent.ts` and the `/sidebar-command` / `/sidebar-chat` /
|
||||
`/sidebar-agent/event` endpoints are gone. The doc covers the WS auth
|
||||
flow, dual-token model, and threat-model boundary — silent failures
|
||||
here usually trace to not understanding the cross-component flow.
|
||||
|
||||
**WebSocket auth uses Sec-WebSocket-Protocol, not cookies.** Browsers
|
||||
can't set `Authorization` on a WebSocket upgrade, but they CAN set
|
||||
`Sec-WebSocket-Protocol` via `new WebSocket(url, [token])`. The agent
|
||||
reads it, validates against `validTokens`, and MUST echo the protocol
|
||||
back in the upgrade response — without the echo, Chromium closes the
|
||||
connection immediately. `Set-Cookie: gstack_pty=...` is kept as a
|
||||
fallback for non-browser callers (the cross-port `SameSite=Strict`
|
||||
cookie path doesn't survive from a chrome-extension origin).
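
A minimal sketch of the server-side half of this handshake, assuming the `gstack-pty.<token>` subprotocol form and a `validTokens` set as described. The function name `selectPtyProtocol` is hypothetical, and the actual upgrade call (where the returned value would be sent back as the `Sec-WebSocket-Protocol` response header) is left out.

```typescript
const PTY_PROTOCOL_PREFIX = "gstack-pty.";

// Returns the offered subprotocol to echo back in the upgrade response,
// or null when no offered protocol carries a valid token.
function selectPtyProtocol(
  headerValue: string | null,
  validTokens: Set<string>,
): string | null {
  if (headerValue === null || headerValue === "") return null;

  // The request header is a comma-separated list of offered subprotocols.
  for (const offered of headerValue.split(",")) {
    const protocol = offered.trim();
    if (!protocol.startsWith(PTY_PROTOCOL_PREFIX)) continue;

    const token = protocol.slice(PTY_PROTOCOL_PREFIX.length);
    if (validTokens.has(token)) return protocol;
  }
  return null;
}
```

On the client side this pairs with `new WebSocket(url, ["gstack-pty." + token])`. A `null` result means the upgrade should be rejected; a non-null result is the exact string to echo, since the browser closes the connection if the server does not select one of the offered protocols.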
|
||||
|
||||
**Cross-pane PTY injection.** The toolbar's Cleanup button and the
|
||||
Inspector's "Send to Code" action both pipe text into the live claude
|
||||
PTY via `window.gstackInjectToTerminal(text)`, exposed by
|
||||
`sidepanel-terminal.js`. No `/sidebar-command` POST — the live REPL is
|
||||
the only execution surface in the sidebar now.
|
||||
|
||||
**`/health` MUST NOT surface any shell-grant token.** It already leaks
|
||||
`AUTH_TOKEN` to localhost callers in headed mode (a v1.1+ TODO). Don't
|
||||
make that worse by adding the PTY session token there. PTY auth flows
|
||||
through `POST /pty-session` only.
|
||||
|
||||
**Transport-layer security** (v1.6.0.0+). When `pair-agent` starts an ngrok tunnel,
|
||||
the daemon binds two HTTP listeners: a local listener (127.0.0.1, full command
|
||||
@@ -437,6 +460,31 @@ claims v1.7.0.0 as a MINOR and branch B is also a MINOR, B lands at v1.8.0.0
|
||||
`bin/gstack-next-version` advances within the chosen bump level rather than
|
||||
repicking the level when collisions happen.
|
||||
|
||||
**Scale-aware bumps — use common sense.** When the diff is big, bump MINOR (or
|
||||
MAJOR), not PATCH. PATCH is for bug fixes and small additions; MINOR is for
|
||||
substantial new capability or substantial reduction; MAJOR is for breaking
|
||||
changes. Rough guideposts (don't treat as rules, treat as smell-checks):
|
||||
|
||||
- **PATCH (X.Y.Z+1.0)**: bug fix, doc tweak, small additive change, single
|
||||
test/file added. Net diff under ~500 lines, no new user-facing capability.
|
||||
- **MINOR (X.Y+1.0.0)**: new capability shipped (skill, harness, command, big
|
||||
refactor), substantial code reduction (compression, migration), or coordinated
|
||||
multi-file change. Net diff over ~2000 lines added/removed, OR a user-visible
|
||||
feature you'd put in a tweet.
|
||||
- **MAJOR (X+1.0.0.0)**: breaking change to public surface (CLI flag rename,
|
||||
skill removed, config format changed), OR a release big enough to be the
|
||||
headline of a blog post.
|
||||
|
||||
If you find yourself debating "is 10K added + 24K removed really a PATCH?" — it
|
||||
isn't. Bump MINOR. Same for "this adds a whole new test harness with 6 new E2E
|
||||
tests + helper utilities" — MINOR. The bump level is communication to the user
|
||||
about what kind of release this is; don't undersell it.
|
||||
|
||||
When merging origin/main brings a higher VERSION, re-evaluate the bump level
|
||||
against the SCALE of your branch's work, not just whether main moved forward.
|
||||
If main bumped MINOR and your branch is also a substantial change, you bump
|
||||
MINOR again on top (e.g., main at v1.14.0.0, your branch lands v1.15.0.0).
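
As a rough smell-check, the guideposts above can be written down as a small function. This is a sketch only and not what `bin/gstack-next-version` does: `DiffSummary`, `suggestBump`, and `applyBump` are hypothetical names, "net diff" is read here as lines added plus lines removed, and the unspecified range between ~500 and ~2000 lines falls back to PATCH where a human would use judgment.

```typescript
type BumpLevel = "PATCH" | "MINOR" | "MAJOR";

// Hypothetical summary of a branch's diff.
interface DiffSummary {
  added: number;
  removed: number;
  breaking: boolean; // breaking change to public surface
  newCapability: boolean; // user-visible feature, harness, command, big refactor
}

function suggestBump(diff: DiffSummary): BumpLevel {
  if (diff.breaking) return "MAJOR";
  if (diff.newCapability || diff.added + diff.removed > 2000) return "MINOR";
  return "PATCH";
}

// Four-part versions: MAJOR.MINOR.PATCH.MICRO, lower parts reset to 0 on a bump.
function applyBump(version: string, level: BumpLevel): string {
  const [major = 0, minor = 0, patch = 0] = version.split(".").map((part) => Number(part));
  if (level === "MAJOR") return `${major + 1}.0.0.0`;
  if (level === "MINOR") return `${major}.${minor + 1}.0.0`;
  return `${major}.${minor}.${patch + 1}.0`;
}
```

For the case that prompted this guidance, `suggestBump({ added: 10000, removed: 24000, breaking: false, newCapability: true })` gives `"MINOR"`, and `applyBump("1.14.0.0", "MINOR")` gives `"1.15.0.0"`.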
|
||||
|
||||
**VERSION and CHANGELOG are branch-scoped.** Every feature branch that ships gets its
|
||||
own version bump and CHANGELOG entry. The entry describes what THIS branch adds —
|
||||
not what was already on main.
|
||||
|
||||
@@ -880,6 +880,7 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`.
|
||||
| `closetab [id]` | Close tab |
| `newtab [url] [--json]` | Open new tab. With --json, returns {"tabId":N,"url":...} for programmatic use (make-pdf). |
| `tab <id>` | Switch to tab |
| `tab-each <command> [args...]` | Run a command on every open tab. Returns JSON with per-tab results. |
| `tabs` | List open tabs |
|
||||
|
||||
### Server
|
||||
|
||||
@@ -1,5 +1,57 @@
|
||||
# TODOS
|
||||
|
||||
## Sidebar Terminal (cc-pty-import follow-ups)
|
||||
|
||||
### v1.1: PTY session survives sidebar reload
|
||||
|
||||
**What:** Today the Terminal tab's PTY dies with the WebSocket — sidebar
|
||||
reload, side-panel close, even a quick navigate-away in another tab close
|
||||
the session. v1.1 should key the PTY on a tab/session id so a reload
|
||||
reattaches to the existing claude process and you keep `/resume` history.
|
||||
|
||||
**Why:** Mid-task resilience. When you've been pair-programming with claude
|
||||
for 20 minutes and an accidental Cmd-R blows it away, the cost is real.
|
||||
|
||||
**Pros:** Better UX, fewer interrupted sessions. **Cons:** Session-tracking
|
||||
state, ghost-process risk, lifecycle bugs (when DOES the PTY actually go
|
||||
away?). v1 chose the simple "PTY dies with WS" model deliberately.
|
||||
|
||||
**Context:** /plan-eng-review Issue 1C decision (cc-pty-import branch,
|
||||
2026-04-25). v1 ships with phoenix's lifecycle. **Depends on:**
|
||||
cc-pty-import landed.
|
||||
|
||||
**Priority:** P2 (nice-to-have).
|
||||
**Effort:** M. Likely needs a per-tab session map keyed by chrome.tabs.id
|
||||
plus a TTL so abandoned PTYs eventually exit.
|
||||
|
||||
---
|
||||
|
||||
### v1.1+: Audit `/health` token distribution
|
||||
|
||||
**What:** Codex's outside-voice review on cc-pty-import flagged that
|
||||
`/health` already surfaces `AUTH_TOKEN` to any localhost caller in headed
|
||||
mode (`server.ts:1657`). That's a pre-existing soft leak — anything
|
||||
running on localhost gets the root token by hitting `/health`.
|
||||
|
||||
**Why:** cc-pty-import sidesteps it by NOT putting the PTY token there
|
||||
(uses an HttpOnly cookie path instead). But the underlying leak is still
|
||||
shippable surface. A second extension or a localhost web app could
|
||||
currently scrape `AUTH_TOKEN` and hit any browse-server endpoint.
|
||||
|
||||
**Pros:** Closes a real privilege-escalation path on multi-extension
|
||||
machines. **Cons:** Either we tighten the gate (Origin must be OUR
|
||||
extension id, not just any chrome-extension://) or we move bootstrap
|
||||
discovery off `/health` entirely. Either has migration cost for tests
|
||||
and the existing extension.
|
||||
|
||||
**Context:** codex finding #2 on cc-pty-import plan-eng review. Not in
|
||||
scope of that PR; deliberately deferred to keep PTY-import small.
|
||||
|
||||
**Priority:** P2.
|
||||
**Effort:** M.
|
||||
|
||||
---
|
||||
|
||||
## Testing
|
||||
|
||||
## P1: Structural STOP-Ask forcing function across all skills
|
||||
|
||||
@@ -804,6 +804,7 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero
|
||||
| `closetab [id]` | Close tab |
| `newtab [url] [--json]` | Open new tab. With --json, returns {"tabId":N,"url":...} for programmatic use (make-pdf). |
| `tab <id>` | Switch to tab |
| `tab-each <command> [args...]` | Run a command on every open tab. Returns JSON with per-tab results. |
| `tabs` | List open tabs |
|
||||
|
||||
### Server
|
||||
|
||||
+32
-47
@@ -853,7 +853,7 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
|
||||
// Delete stale state file
|
||||
safeUnlinkQuiet(config.stateFile);
|
||||
|
||||
console.log('Launching headed Chromium with extension + sidebar agent...');
|
||||
console.log('Launching headed Chromium with extension + terminal agent...');
|
||||
try {
|
||||
// Start server in headed mode with extension auto-loaded
|
||||
// Use a well-known port so the Chrome extension auto-connects
|
||||
@@ -882,56 +882,41 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
|
||||
const status = await resp.text();
|
||||
console.log(`Connected to real Chrome\n${status}`);
|
||||
|
||||
// Auto-start sidebar agent
|
||||
// __dirname is inside $bunfs in compiled binaries — resolve from execPath instead
|
||||
let agentScript = path.resolve(__dirname, 'sidebar-agent.ts');
|
||||
if (!fs.existsSync(agentScript)) {
|
||||
agentScript = path.resolve(path.dirname(process.execPath), '..', 'src', 'sidebar-agent.ts');
|
||||
// sidebar-agent.ts spawn was here. Ripped alongside the chat queue —
|
||||
// the Terminal pane runs an interactive PTY now, no more one-shot
|
||||
// claude -p subprocesses to multiplex.
|
||||
|
||||
// Auto-start terminal agent (non-compiled bun process). Owns the PTY
|
||||
// WebSocket for the sidebar Terminal pane.
|
||||
let termAgentScript = path.resolve(__dirname, 'terminal-agent.ts');
|
||||
if (!fs.existsSync(termAgentScript)) {
|
||||
termAgentScript = path.resolve(path.dirname(process.execPath), '..', 'src', 'terminal-agent.ts');
|
||||
}
|
||||
try {
|
||||
if (!fs.existsSync(agentScript)) {
|
||||
throw new Error(`sidebar-agent.ts not found at ${agentScript}`);
|
||||
if (fs.existsSync(termAgentScript)) {
|
||||
// Kill old terminal-agents so a stale port file can't trick the
|
||||
// server into routing /pty-session at a dead listener.
|
||||
try {
|
||||
const { spawnSync } = require('child_process');
|
||||
spawnSync('pkill', ['-f', 'terminal-agent\\.ts'], { stdio: 'ignore', timeout: 3000 });
|
||||
} catch (err: any) {
|
||||
if (err?.code !== 'ENOENT') throw err;
|
||||
}
|
||||
const termProc = Bun.spawn(['bun', 'run', termAgentScript], {
|
||||
cwd: config.projectDir,
|
||||
env: {
|
||||
...process.env,
|
||||
BROWSE_STATE_FILE: config.stateFile,
|
||||
BROWSE_SERVER_PORT: String(newState.port),
|
||||
},
|
||||
stdio: ['ignore', 'ignore', 'ignore'],
|
||||
});
|
||||
termProc.unref();
|
||||
console.log(`[browse] Terminal agent started (PID: ${termProc.pid})`);
|
||||
}
|
||||
// Clear old agent queue
|
||||
const agentQueue = path.join(process.env.HOME || '/tmp', '.gstack', 'sidebar-agent-queue.jsonl');
|
||||
try {
|
||||
fs.mkdirSync(path.dirname(agentQueue), { recursive: true, mode: 0o700 });
|
||||
fs.writeFileSync(agentQueue, '', { mode: 0o600 });
|
||||
} catch (err: any) {
|
||||
if (err?.code !== 'EACCES') throw err;
|
||||
}
|
||||
|
||||
// Resolve browse binary path the same way — execPath-relative
|
||||
let browseBin = path.resolve(__dirname, '..', 'dist', 'browse');
|
||||
if (!fs.existsSync(browseBin)) {
|
||||
browseBin = process.execPath; // the compiled binary itself
|
||||
}
|
||||
|
||||
// Kill any existing sidebar-agent processes before starting a new one.
|
||||
// Old agents have stale auth tokens and will silently fail to relay events,
|
||||
// causing the server to mark the agent as "hung".
|
||||
try {
|
||||
const { spawnSync } = require('child_process');
|
||||
spawnSync('pkill', ['-f', 'sidebar-agent\\.ts'], { stdio: 'ignore', timeout: 3000 });
|
||||
} catch (err: any) {
|
||||
if (err?.code !== 'ENOENT') throw err;
|
||||
}
|
||||
|
||||
const agentProc = Bun.spawn(['bun', 'run', agentScript], {
|
||||
cwd: config.projectDir,
|
||||
env: {
|
||||
...process.env,
|
||||
BROWSE_BIN: browseBin,
|
||||
BROWSE_STATE_FILE: config.stateFile,
|
||||
BROWSE_SERVER_PORT: String(newState.port),
|
||||
},
|
||||
stdio: ['ignore', 'ignore', 'ignore'],
|
||||
});
|
||||
agentProc.unref();
|
||||
console.log(`[browse] Sidebar agent started (PID: ${agentProc.pid})`);
|
||||
} catch (err: any) {
|
||||
console.error(`[browse] Sidebar agent failed to start: ${err.message}`);
|
||||
console.error(`[browse] Run manually: bun run ${agentScript}`);
|
||||
// Non-fatal: chat still works without the terminal agent.
|
||||
console.error(`[browse] Terminal agent failed to start: ${err.message}`);
|
||||
}
|
||||
} catch (err: any) {
|
||||
console.error(`[browse] Connect failed: ${err.message}`);
|
||||
|
||||
@@ -30,7 +30,7 @@ export const WRITE_COMMANDS = new Set([
|
||||
]);
|
||||
|
||||
export const META_COMMANDS = new Set([
|
||||
'tabs', 'tab', 'newtab', 'closetab',
|
||||
'tabs', 'tab', 'tab-each', 'newtab', 'closetab',
|
||||
'status', 'stop', 'restart',
|
||||
'screenshot', 'pdf', 'responsive',
|
||||
'chain', 'diff',
|
||||
@@ -144,6 +144,7 @@ export const COMMAND_DESCRIPTIONS: Record<string, { category: string; descriptio
|
||||
'tab': { category: 'Tabs', description: 'Switch to tab', usage: 'tab <id>' },
|
||||
'newtab': { category: 'Tabs', description: 'Open new tab. With --json, returns {"tabId":N,"url":...} for programmatic use (make-pdf).', usage: 'newtab [url] [--json]' },
|
||||
'closetab':{ category: 'Tabs', description: 'Close tab', usage: 'closetab [id]' },
|
||||
'tab-each':{ category: 'Tabs', description: 'Run a command on every open tab. Returns JSON with per-tab results.', usage: 'tab-each <command> [args...]' },
|
||||
// Server
|
||||
'status': { category: 'Server', description: 'Health check' },
|
||||
'stop': { category: 'Server', description: 'Shutdown server' },
|
||||
|
||||
@@ -285,6 +285,108 @@ export async function handleMetaCommand(
|
||||
return `Closed tab${id ? ` ${id}` : ''}`;
|
||||
}
|
||||
|
||||
    case 'tab-each': {
      // Fan out a single command across every open tab. Returns a JSON
      // object: { results: [{tabId, url, title, status, output}], total }.
      // Restores the originally active tab when done so the user's view
      // doesn't shift under them.
      //
      // Usage: $B tab-each <command> [args...]
      //   $B tab-each snapshot -i → snapshot every tab
      //   $B tab-each text → grab clean text from every tab
      //   $B tab-each goto https://x.y → load the same URL in every tab
      if (args.length === 0) {
        throw new Error(
          'Usage: browse tab-each <command> [args...]\n' +
          'Example: browse tab-each snapshot -i'
        );
      }

      const innerRaw = args[0];
      const innerName = canonicalizeCommand(innerRaw);
      const innerArgs = args.slice(1);

      // Scope check the inner command before fanning out, so a single
      // permission failure aborts the whole batch instead of partially
      // mutating tabs.
      if (tokenInfo && tokenInfo.clientId !== 'root' && !checkScope(tokenInfo, innerName)) {
        throw new Error(
          `tab-each rejected: subcommand "${innerRaw}" not allowed by your token scope (${tokenInfo.scopes.join(', ')}).`
        );
      }

      const tabs = await bm.getTabListWithTitles();
      const originalActive = tabs.find(t => t.active)?.id ?? bm.getActiveTabId();

      const executeCmd = opts?.executeCommand;
      const results: Array<{
        tabId: number;
        url: string;
        title: string;
        status: number;
        output: string;
      }> = [];

      try {
        for (const tab of tabs) {
          // Skip chrome:// internal pages — they aren't useful targets and
          // many commands fail outright on them.
          if (tab.url.startsWith('chrome://') || tab.url.startsWith('chrome-extension://')) {
            results.push({
              tabId: tab.id,
              url: tab.url,
              title: tab.title || '',
              status: 0,
              output: 'skipped: internal page',
            });
            continue;
          }
          // Switch to the tab. Don't pull focus away — we're a background
          // operation; the user shouldn't see the OS window jump.
          bm.switchTab(tab.id, { bringToFront: false });

          let status = 0;
          let output = '';
          if (executeCmd) {
            const r = await executeCmd(
              { command: innerName, args: innerArgs, tabId: tab.id },
              tokenInfo,
            );
            status = r.status;
            output = r.result;
            if (status !== 200) {
              try { output = JSON.parse(output).error || output; } catch (err: any) { if (!(err instanceof SyntaxError)) throw err; }
            }
          } else {
            // Fallback path (CLI / test harness without a server context).
            // We don't recurse through read/write/meta directly here because
            // tab-each is only meaningful with the live server; surface a
            // clear error.
            status = 500;
            output = 'tab-each requires the browse server (no executeCommand context)';
          }

          results.push({
            tabId: tab.id,
            url: tab.url,
            title: tab.title || '',
            status,
            output,
          });
        }
      } finally {
        // Restore the original active tab so the user's view is unchanged.
        try { bm.switchTab(originalActive, { bringToFront: false }); } catch {}
      }

      return JSON.stringify({
        command: innerName,
        args: innerArgs,
        total: results.length,
        results,
      }, null, 2);
    }
|
||||
|
||||
// ─── Server Control ────────────────────────────────
|
||||
case 'status': {
|
||||
const page = bm.getPage();
|
||||
|
||||
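The fan-out-and-restore shape of `tab-each` can be sketched in isolation. This is a minimal illustration under stated assumptions, not the handler from the diff: `Tab`, `TabResult`, `fanOut`, `switchTo`, and `run` are hypothetical names, and `run` is synchronous here for brevity where the real handler awaits `executeCommand`.

```typescript
// Hypothetical stand-ins for the browser manager's tab list and results.
interface Tab { id: number; url: string; active: boolean }
interface TabResult { tabId: number; status: number; output: string }

function fanOut(
  tabs: Tab[],
  switchTo: (id: number) => void,
  run: (tab: Tab) => { status: number; output: string },
): TabResult[] {
  // Remember which tab was active before touching anything.
  const original = tabs.find(t => t.active)?.id;
  const results: TabResult[] = [];
  try {
    for (const tab of tabs) {
      // Internal pages are reported in the results but never executed against.
      if (tab.url.startsWith('chrome://') || tab.url.startsWith('chrome-extension://')) {
        results.push({ tabId: tab.id, status: 0, output: 'skipped: internal page' });
        continue;
      }
      switchTo(tab.id);
      results.push({ tabId: tab.id, ...run(tab) });
    }
  } finally {
    // Runs even if `run` throws, so the caller's view is restored either way.
    if (original !== undefined) switchTo(original);
  }
  return results;
}
```

Putting the restore in `finally` rather than after the loop is the part that matters: a command that throws halfway through the batch still leaves the originally active tab selected.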
@@ -0,0 +1,122 @@
|
||||
/**
 * Session cookie registry for the Terminal sidebar tab's PTY WebSocket.
 *
 * Why this exists: WebSocket clients in browsers cannot send Authorization
 * headers on the upgrade request. The terminal-agent's /ws upgrade therefore
 * authenticates via cookie. We never put the PTY token in /health (codex
 * outside-voice finding #2: /health already leaks AUTH_TOKEN to any
 * localhost caller in headed mode; reusing that path for shell access would
 * widen an existing bug). Instead, the extension does an authenticated
 * POST /pty-session with the bootstrap AUTH_TOKEN; the server mints a
 * short-lived cookie scoped to this terminal session and pushes it to the
 * agent via loopback. The browser then carries the cookie automatically on
 * the WS upgrade.
 *
 * Design mirrors `sse-session-cookie.ts` deliberately. Same TTL, same
 * scoped-token-must-not-be-valid-as-root invariant, same opportunistic
 * pruning. Two registries instead of one because the cookie names are
 * different (`gstack_sse` vs `gstack_pty`) and the token spaces must not
 * overlap — an SSE-read cookie must never grant PTY access, and vice versa.
 */
import * as crypto from 'crypto';

interface Session {
  createdAt: number;
  expiresAt: number;
}

const TTL_MS = 30 * 60 * 1000; // 30 minutes — matches SSE cookie
const MAX_SESSIONS = 10_000;
const sessions = new Map<string, Session>();

export const PTY_COOKIE_NAME = 'gstack_pty';

/** Mint a fresh PTY session token. */
export function mintPtySessionToken(): { token: string; expiresAt: number } {
  const token = crypto.randomBytes(32).toString('base64url');
  const now = Date.now();
  const expiresAt = now + TTL_MS;
  sessions.set(token, { createdAt: now, expiresAt });
  pruneExpired(now);
  return { token, expiresAt };
}

/**
 * Validate a token. Returns true only if the token exists AND is not expired.
 * Lazily removes expired entries; opportunistically prunes a few more on
 * every call so the registry stays bounded under reconnect pressure.
 */
export function validatePtySessionToken(token: string | null | undefined): boolean {
  if (!token) return false;
  const s = sessions.get(token);
  if (!s) {
    pruneExpired(Date.now());
    return false;
  }
  if (Date.now() > s.expiresAt) {
    sessions.delete(token);
    pruneExpired(Date.now());
    return false;
  }
  return true;
}

/**
 * Drop a session token (called on WS close so a leaked cookie can't be
 * replayed against a new PTY).
 */
export function revokePtySessionToken(token: string | null | undefined): void {
  if (!token) return;
  sessions.delete(token);
}

/** Parse the PTY session token from a Cookie header. */
export function extractPtyCookie(req: Request): string | null {
  const cookieHeader = req.headers.get('cookie');
  if (!cookieHeader) return null;
  for (const part of cookieHeader.split(';')) {
    const [name, ...valueParts] = part.trim().split('=');
    if (name === PTY_COOKIE_NAME) {
      return valueParts.join('=') || null;
    }
  }
  return null;
}

/**
 * Build the Set-Cookie header value for the PTY session cookie.
 * - HttpOnly: not readable from JS (mitigates XSS exfiltration).
 * - SameSite=Strict: not sent on cross-site requests (mitigates CSWSH).
 * - Path=/: scope to whole origin so /ws and /pty-session both see it.
 * - Max-Age matches the TTL.
 *
 * Secure is intentionally omitted: the daemon binds to 127.0.0.1 over plain
 * HTTP; setting Secure would prevent the browser from ever sending it back.
 */
export function buildPtySetCookie(token: string): string {
  const maxAge = Math.floor(TTL_MS / 1000);
  return `${PTY_COOKIE_NAME}=${token}; HttpOnly; SameSite=Strict; Path=/; Max-Age=${maxAge}`;
}

/** Clear the PTY session cookie. */
export function buildPtyClearCookie(): string {
  return `${PTY_COOKIE_NAME}=; HttpOnly; SameSite=Strict; Path=/; Max-Age=0`;
}

function pruneExpired(now: number): void {
  let checked = 0;
  for (const [token, session] of sessions) {
    if (checked++ >= 20) break;
    if (session.expiresAt <= now) sessions.delete(token);
  }
  while (sessions.size > MAX_SESSIONS) {
    const first = sessions.keys().next().value;
    if (!first) break;
    sessions.delete(first);
  }
}

// Test-only reset.
export function __resetPtySessions(): void {
  sessions.clear();
}
|
||||
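The mint, attach, and validate flow described in the header comment above can be walked through end to end in a few lines. The following is a rough sketch under stated assumptions, not the server's actual wiring: `issueCookie`, `tokenFromCookieHeader`, and `mayUpgrade` are hypothetical names, the registry is a simplified local stand-in, and `now` is passed explicitly only so expiry is easy to exercise.

```typescript
import * as crypto from 'crypto';

const COOKIE_NAME = 'gstack_pty';
const TTL_MS = 30 * 60 * 1000;
const expiry = new Map<string, number>(); // token -> expiresAt (ms since epoch)

// Step 1 (hypothetical): an authenticated POST handler mints a token and
// returns it to the browser as a Set-Cookie header value.
function issueCookie(now: number = Date.now()): { token: string; setCookie: string } {
  const token = crypto.randomBytes(32).toString('base64url');
  expiry.set(token, now + TTL_MS);
  const maxAge = Math.floor(TTL_MS / 1000);
  return {
    token,
    setCookie: `${COOKIE_NAME}=${token}; HttpOnly; SameSite=Strict; Path=/; Max-Age=${maxAge}`,
  };
}

// Step 2: the upgrade handler reads the Cookie header the browser attached.
function tokenFromCookieHeader(header: string | null): string | null {
  if (!header) return null;
  for (const part of header.split(';')) {
    const [name, ...rest] = part.trim().split('=');
    if (name === COOKIE_NAME) return rest.join('=') || null;
  }
  return null;
}

// Step 3: accept the upgrade only for a known, unexpired token.
function mayUpgrade(cookieHeader: string | null, now: number = Date.now()): boolean {
  const token = tokenFromCookieHeader(cookieHeader);
  if (!token) return false;
  const expiresAt = expiry.get(token);
  if (expiresAt === undefined) return false;
  if (now > expiresAt) {
    expiry.delete(token); // lazily drop the expired entry
    return false;
  }
  return true;
}
```

Because the token only ever travels in an `HttpOnly` cookie, page scripts never see it; the handler's only input is the `Cookie` header on the upgrade request.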
+148
-905
File diff suppressed because it is too large
@@ -1,947 +0,0 @@
|
||||
/**
|
||||
* Sidebar Agent — polls agent-queue from server, spawns claude -p for each
|
||||
* message, streams live events back to the server via /sidebar-agent/event.
|
||||
*
|
||||
* This runs as a NON-COMPILED bun process because compiled bun binaries
|
||||
* cannot posix_spawn external executables. The server writes to the queue
|
||||
* file, this process reads it and spawns claude.
|
||||
*
|
||||
* Usage: BROWSE_BIN=/path/to/browse bun run browse/src/sidebar-agent.ts
|
||||
*/
|
||||
|
||||
import { spawn } from 'child_process';
|
||||
import * as fs from 'fs';
|
||||
import * as path from 'path';
|
||||
import { safeUnlink } from './error-handling';
|
||||
import {
|
||||
checkCanaryInStructure, logAttempt, hashPayload, extractDomain,
|
||||
combineVerdict, writeSessionState, readSessionState, THRESHOLDS,
|
||||
readDecision, clearDecision, excerptForReview,
|
||||
type LayerSignal,
|
||||
} from './security';
|
||||
import {
|
||||
loadTestsavant, scanPageContent, checkTranscript,
|
||||
shouldRunTranscriptCheck, getClassifierStatus,
|
||||
loadDeberta, scanPageContentDeberta,
|
||||
type ToolCallInput,
|
||||
} from './security-classifier';
|
||||
|
||||
const QUEUE = process.env.SIDEBAR_QUEUE_PATH || path.join(process.env.HOME || '/tmp', '.gstack', 'sidebar-agent-queue.jsonl');
|
||||
const KILL_FILE = path.join(path.dirname(QUEUE), 'sidebar-agent-kill');
|
||||
const SERVER_PORT = parseInt(process.env.BROWSE_SERVER_PORT || '34567', 10);
|
||||
const SERVER_URL = `http://127.0.0.1:${SERVER_PORT}`;
|
||||
const POLL_MS = 200; // 200ms poll — keeps time-to-first-token low
|
||||
const B = process.env.BROWSE_BIN || path.resolve(__dirname, '../../.claude/skills/gstack/browse/dist/browse');
|
||||
|
||||
const CANCEL_DIR = path.join(process.env.HOME || '/tmp', '.gstack');
|
||||
function cancelFileForTab(tabId: number): string {
|
||||
return path.join(CANCEL_DIR, `sidebar-agent-cancel-${tabId}`);
|
||||
}
|
||||
|
||||
interface QueueEntry {
|
||||
prompt: string;
|
||||
args?: string[];
|
||||
stateFile?: string;
|
||||
cwd?: string;
|
||||
tabId?: number | null;
|
||||
message?: string | null;
|
||||
pageUrl?: string | null;
|
||||
sessionId?: string | null;
|
||||
ts?: string;
|
||||
canary?: string; // session-scoped token; leak = prompt injection evidence
|
||||
}
|
||||
|
||||
function isValidQueueEntry(e: unknown): e is QueueEntry {
|
||||
if (typeof e !== 'object' || e === null) return false;
|
||||
const obj = e as Record<string, unknown>;
|
||||
if (typeof obj.prompt !== 'string' || obj.prompt.length === 0) return false;
|
||||
if (obj.args !== undefined && (!Array.isArray(obj.args) || !obj.args.every(a => typeof a === 'string'))) return false;
|
||||
if (obj.stateFile !== undefined) {
|
||||
if (typeof obj.stateFile !== 'string') return false;
|
||||
if (obj.stateFile.includes('..')) return false;
|
||||
}
|
||||
if (obj.cwd !== undefined) {
|
||||
if (typeof obj.cwd !== 'string') return false;
|
||||
if (obj.cwd.includes('..')) return false;
|
||||
}
|
||||
if (obj.tabId !== undefined && obj.tabId !== null && typeof obj.tabId !== 'number') return false;
|
||||
if (obj.message !== undefined && obj.message !== null && typeof obj.message !== 'string') return false;
|
||||
if (obj.pageUrl !== undefined && obj.pageUrl !== null && typeof obj.pageUrl !== 'string') return false;
|
||||
if (obj.sessionId !== undefined && obj.sessionId !== null && typeof obj.sessionId !== 'string') return false;
|
||||
if (obj.canary !== undefined && typeof obj.canary !== 'string') return false;
|
||||
return true;
|
||||
}
|
||||
|
||||
let lastLine = 0;
|
||||
let authToken: string | null = null;
|
||||
// Per-tab processing — each tab can run its own agent concurrently
|
||||
const processingTabs = new Set<number>();
|
||||
// Active claude subprocesses — keyed by tabId for targeted kill
|
||||
const activeProcs = new Map<number, ReturnType<typeof spawn>>();
|
||||
let activeProc: ReturnType<typeof spawn> | null = null;
|
||||
// Kill-file timestamp last seen — avoids double-kill on same write
|
||||
let lastKillTs = 0;
|
||||
|
||||
// ─── File drop relay ──────────────────────────────────────────
|
||||
|
||||
function getGitRoot(): string | null {
|
||||
try {
|
||||
const { execSync } = require('child_process');
|
||||
return execSync('git rev-parse --show-toplevel', { encoding: 'utf-8', stdio: ['pipe', 'pipe', 'pipe'] }).trim();
|
||||
} catch (err: any) {
|
||||
console.debug('[sidebar-agent] Not in a git repo:', err.message);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
function writeToInbox(message: string, pageUrl?: string, sessionId?: string): void {
|
||||
const gitRoot = getGitRoot();
|
||||
if (!gitRoot) {
|
||||
console.error('[sidebar-agent] Cannot write to inbox — not in a git repo');
|
||||
return;
|
||||
}
|
||||
|
||||
const inboxDir = path.join(gitRoot, '.context', 'sidebar-inbox');
|
||||
fs.mkdirSync(inboxDir, { recursive: true, mode: 0o700 });
|
||||
|
||||
const now = new Date();
|
||||
const timestamp = now.toISOString().replace(/:/g, '-');
|
||||
const filename = `${timestamp}-observation.json`;
|
||||
const tmpFile = path.join(inboxDir, `.${filename}.tmp`);
|
||||
const finalFile = path.join(inboxDir, filename);
|
||||
|
||||
const inboxMessage = {
|
||||
type: 'observation',
|
||||
timestamp: now.toISOString(),
|
||||
page: { url: pageUrl || 'unknown', title: '' },
|
||||
userMessage: message,
|
||||
sidebarSessionId: sessionId || 'unknown',
|
||||
};
|
||||
|
||||
fs.writeFileSync(tmpFile, JSON.stringify(inboxMessage, null, 2), { mode: 0o600 });
|
||||
fs.renameSync(tmpFile, finalFile);
|
||||
console.log(`[sidebar-agent] Wrote inbox message: ${filename}`);
|
||||
}
|
||||
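`writeToInbox` above writes to a dot-prefixed temp file and then renames it into place, so a reader polling the inbox directory should never pick up a half-written JSON file. A minimal sketch of that pattern on its own follows; `writeAtomically` and the demo directory are hypothetical names introduced for illustration.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Write `data` to `dir/filename` via a temp file in the same directory.
function writeAtomically(dir: string, filename: string, data: string): string {
  fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
  const tmpFile = path.join(dir, `.${filename}.tmp`);
  const finalFile = path.join(dir, filename);
  fs.writeFileSync(tmpFile, data, { mode: 0o600 });
  // A rename within one directory swaps the name in a single step on POSIX
  // filesystems, so the final path either does not exist or is complete.
  fs.renameSync(tmpFile, finalFile);
  return finalFile;
}

// Usage against a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'inbox-sketch-'));
const written = writeAtomically(demoDir, 'observation.json', JSON.stringify({ type: 'observation' }));
```

Keeping the temp file in the same directory as the target matters: a rename across filesystems is not a single-step operation.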
|
||||
// ─── Auth ────────────────────────────────────────────────────────
|
||||
|
||||
async function refreshToken(): Promise<string | null> {
|
||||
// Read token from state file (same-user, mode 0o600) instead of /health
|
||||
try {
|
||||
const stateFile = process.env.BROWSE_STATE_FILE ||
|
||||
path.join(process.env.HOME || '/tmp', '.gstack', 'browse.json');
|
||||
const data = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
|
||||
authToken = data.token || null;
|
||||
return authToken;
|
||||
} catch (err: any) {
|
||||
console.error('[sidebar-agent] Failed to refresh auth token:', err.message);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
// ─── Event relay to server ──────────────────────────────────────
|
||||
|
||||
async function sendEvent(event: Record<string, any>, tabId?: number): Promise<void> {
|
||||
if (!authToken) await refreshToken();
|
||||
if (!authToken) return;
|
||||
|
||||
try {
|
||||
await fetch(`${SERVER_URL}/sidebar-agent/event`, {
|
||||
method: 'POST',
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
'Authorization': `Bearer ${authToken}`,
|
||||
},
|
||||
body: JSON.stringify({ ...event, tabId: tabId ?? null }),
|
||||
});
|
||||
} catch (err) {
|
||||
console.error('[sidebar-agent] Failed to send event:', err);
|
||||
}
|
||||
}
|
||||
|
||||
// ─── Claude subprocess ──────────────────────────────────────────
|
||||
|
||||
function shorten(str: string): string {
|
||||
return str
|
||||
.replace(new RegExp(B.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'g'), '$B')
|
||||
.replace(/\/Users\/[^/]+/g, '~')
|
||||
.replace(/\/conductor\/workspaces\/[^/]+\/[^/]+/g, '')
|
||||
.replace(/\.claude\/skills\/gstack\//g, '')
|
||||
.replace(/browse\/dist\/browse/g, '$B');
|
||||
}
|
||||
|
||||
function describeToolCall(tool: string, input: any): string {
|
||||
if (!input) return '';
|
||||
|
||||
// For Bash commands, generate a plain-English description
|
||||
if (tool === 'Bash' && input.command) {
|
||||
const cmd = input.command;
|
||||
|
||||
// Browse binary commands — the most common case
|
||||
const browseMatch = cmd.match(/\$B\s+(\w+)|browse[^\s]*\s+(\w+)/);
|
||||
if (browseMatch) {
|
||||
const browseCmd = browseMatch[1] || browseMatch[2];
|
||||
const args = cmd.split(/\s+/).slice(2).join(' ');
|
||||
switch (browseCmd) {
|
||||
case 'goto': return `Opening ${args.replace(/['"]/g, '')}`;
|
||||
case 'snapshot': return args.includes('-i') ? 'Scanning for interactive elements' : args.includes('-D') ? 'Checking what changed' : 'Taking a snapshot of the page';
|
||||
case 'screenshot': return `Saving screenshot${args ? ` to ${shorten(args)}` : ''}`;
|
||||
case 'click': return `Clicking ${args}`;
|
||||
case 'fill': { const parts = args.split(/\s+/); return `Typing "${parts.slice(1).join(' ')}" into ${parts[0]}`; }
|
||||
case 'text': return 'Reading page text';
|
||||
case 'html': return args ? `Reading HTML of ${args}` : 'Reading full page HTML';
|
||||
case 'links': return 'Finding all links on the page';
|
||||
case 'forms': return 'Looking for forms';
|
||||
case 'console': return 'Checking browser console for errors';
|
||||
case 'network': return 'Checking network requests';
|
||||
case 'url': return 'Checking current URL';
|
||||
case 'back': return 'Going back';
|
||||
case 'forward': return 'Going forward';
|
||||
case 'reload': return 'Reloading the page';
|
||||
case 'scroll': return args ? `Scrolling to ${args}` : 'Scrolling down';
|
||||
case 'wait': return `Waiting for ${args}`;
|
||||
case 'inspect': return args ? `Inspecting CSS of ${args}` : 'Getting CSS for last picked element';
|
||||
case 'style': return `Changing CSS: ${args}`;
|
||||
case 'cleanup': return 'Removing page clutter (ads, popups, banners)';
|
||||
case 'prettyscreenshot': return 'Taking a clean screenshot';
|
||||
case 'css': return `Checking CSS property: ${args}`;
|
||||
case 'is': return `Checking if element is ${args}`;
|
||||
case 'diff': return `Comparing ${args}`;
|
||||
case 'responsive': return 'Taking screenshots at mobile, tablet, and desktop sizes';
|
||||
case 'status': return 'Checking browser status';
|
||||
case 'tabs': return 'Listing open tabs';
|
||||
case 'focus': return 'Bringing browser to front';
|
||||
case 'select': return `Selecting option in ${args}`;
|
||||
case 'hover': return `Hovering over ${args}`;
|
||||
case 'viewport': return `Setting viewport to ${args}`;
|
||||
case 'upload': return `Uploading file to ${args.split(/\s+/)[0]}`;
|
||||
default: return `Running browse ${browseCmd} ${args}`.trim();
|
||||
}
|
||||
}
|
||||
|
||||
// Non-browse bash commands
|
||||
if (cmd.includes('git ')) return `Running: ${shorten(cmd)}`;
|
||||
let short = shorten(cmd);
|
||||
return short.length > 100 ? short.slice(0, 100) + '…' : short;
|
||||
}
|
||||
|
||||
if (tool === 'Read' && input.file_path) {
|
||||
// Skip Claude's internal tool-result file reads — they're plumbing, not user-facing
|
||||
if (input.file_path.includes('/tool-results/') || input.file_path.includes('/.claude/projects/')) return '';
|
||||
return `Reading ${shorten(input.file_path)}`;
|
||||
}
|
||||
if (tool === 'Edit' && input.file_path) return `Editing ${shorten(input.file_path)}`;
|
||||
if (tool === 'Write' && input.file_path) return `Writing ${shorten(input.file_path)}`;
|
||||
if (tool === 'Grep' && input.pattern) return `Searching for "${input.pattern}"`;
|
||||
if (tool === 'Glob' && input.pattern) return `Finding files matching ${input.pattern}`;
|
||||
try { return shorten(JSON.stringify(input)).slice(0, 80); } catch { return ''; }
|
||||
}
|
||||
|
||||
// Keep the old name as an alias for backward compat
|
||||
function summarizeToolInput(tool: string, input: any): string {
|
||||
return describeToolCall(tool, input);
|
||||
}
|
||||
|
||||
/**
|
||||
* Scan a Claude stream event for the session canary. Returns the channel where
|
||||
* it leaked, or null if clean. Covers every outbound channel: text blocks,
|
||||
* text deltas, tool_use arguments (including nested URL/path/command strings),
|
||||
* and result payloads.
|
||||
*/
|
||||
function detectCanaryLeak(event: any, canary: string, buf?: DeltaBuffer): string | null {
|
||||
if (!canary) return null;
|
||||
|
||||
if (event.type === 'assistant' && event.message?.content) {
|
||||
for (const block of event.message.content) {
|
||||
if (block.type === 'text' && typeof block.text === 'string' && block.text.includes(canary)) {
|
||||
return 'assistant_text';
|
||||
}
|
||||
if (block.type === 'tool_use' && checkCanaryInStructure(block.input, canary)) {
|
||||
return `tool_use:${block.name}`;
|
||||
}
|
||||
}
|
||||
}
|
||||
if (event.type === 'content_block_start' && event.content_block?.type === 'tool_use') {
|
||||
if (checkCanaryInStructure(event.content_block.input, canary)) {
|
||||
return `tool_use:${event.content_block.name}`;
|
||||
}
|
||||
}
|
||||
if (event.type === 'content_block_delta' && event.delta?.type === 'text_delta') {
|
||||
if (typeof event.delta.text === 'string') {
|
||||
// Rolling buffer: an attacker can ask Claude to emit the canary split
|
||||
// across two deltas (e.g., "CANARY-" then "ABCDEF"). A per-delta
|
||||
// substring check misses this. Concatenate the previous tail with
|
||||
// this chunk and search, then trim the tail to last canary.length-1
|
||||
// chars for the next event.
|
||||
const combined = buf ? buf.text_delta + event.delta.text : event.delta.text;
|
||||
if (combined.includes(canary)) return 'text_delta';
|
||||
if (buf) buf.text_delta = combined.slice(-(canary.length - 1));
|
||||
}
|
||||
}
|
||||
if (event.type === 'content_block_delta' && event.delta?.type === 'input_json_delta') {
|
||||
if (typeof event.delta.partial_json === 'string') {
|
||||
const combined = buf ? buf.input_json_delta + event.delta.partial_json : event.delta.partial_json;
|
||||
if (combined.includes(canary)) return 'tool_input_delta';
|
||||
if (buf) buf.input_json_delta = combined.slice(-(canary.length - 1));
|
||||
}
|
||||
}
|
||||
if (event.type === 'content_block_stop' && buf) {
|
||||
// Block boundary — reset the rolling buffer so a canary straddling
|
||||
// two independent tool_use blocks isn't inferred.
|
||||
buf.text_delta = '';
|
||||
buf.input_json_delta = '';
|
||||
}
|
||||
if (event.type === 'result' && typeof event.result === 'string' && event.result.includes(canary)) {
|
||||
return 'result';
|
||||
}
|
||||
return null;
|
||||
}
|
||||
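The rolling-buffer idea in `detectCanaryLeak` (keep the last `canary.length - 1` characters of the stream and prepend them to the next chunk) can be shown standalone. This is a small sketch of the technique, assuming a single text channel; `makeSplitDetector` is a hypothetical helper, not a function from the diff.

```typescript
// Returns a function that is fed stream chunks one at a time and reports
// whether `needle` has appeared, including when it is split across chunks.
function makeSplitDetector(needle: string): (chunk: string) => boolean {
  let tail = '';
  return (chunk: string): boolean => {
    const combined = tail + chunk;
    if (combined.includes(needle)) return true;
    // Keep only as many trailing characters as could begin a match that
    // completes in a later chunk. Guard length 1, where slice(-0) would
    // keep the whole string instead of nothing.
    tail = needle.length > 1 ? combined.slice(-(needle.length - 1)) : '';
    return false;
  };
}

// Usage: the needle arrives in two pieces and is still caught.
const feed = makeSplitDetector('CANARY-ABCDEF');
const firstChunkHit = feed('some text CANARY-');
const secondChunkHit = feed('ABCDEF more text');
```

The tail is bounded by the needle length, so memory stays constant no matter how long the stream runs. A per-chunk `includes` check alone would miss the second chunk here.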
|
||||
/** Rolling-window tails for delta canary detection. See detectCanaryLeak. */
|
||||
interface DeltaBuffer {
|
||||
text_delta: string;
|
||||
input_json_delta: string;
|
||||
}
|
||||
|
||||
interface CanaryContext {
|
||||
canary: string;
|
||||
pageUrl: string;
|
||||
onLeak: (channel: string) => void;
|
||||
deltaBuf: DeltaBuffer;
|
||||
}
|
||||
|
||||
interface ToolResultScanContext {
|
||||
scan: (toolName: string, text: string) => Promise<void>;
|
||||
}
|
||||
|
||||
/**
|
||||
* Per-tab map of tool_use_id → tool name. Lets the tool_result handler
|
||||
* know what tool produced the content (Read, Grep, Glob, Bash $B ...) so
|
||||
* we can tag attack logs with the ingress source.
|
||||
*/
|
||||
const toolUseRegistry = new Map<string, { toolName: string; toolInput: unknown }>();
|
||||
|
||||
/**
|
||||
* Extract plain-text content from a tool_result block. The Claude stream
|
||||
* encodes it as either a string or an array of content blocks (text, image).
|
||||
* We care about text — images can't carry prompt injection at this layer.
|
||||
*/
|
||||
function extractToolResultText(content: unknown): string {
|
||||
if (typeof content === 'string') return content;
|
||||
if (!Array.isArray(content)) return '';
|
||||
const parts: string[] = [];
|
||||
for (const block of content) {
|
||||
if (block && typeof block === 'object') {
|
||||
const b = block as Record<string, unknown>;
|
||||
if (b.type === 'text' && typeof b.text === 'string') parts.push(b.text);
|
||||
}
|
||||
}
|
||||
return parts.join('\n');
|
||||
}
|
||||
|
||||
/**
|
||||
* Tools whose outputs should be ML-scanned. Bash/$B outputs already get
|
||||
* scanned via the page-content flow. Read/Glob/Grep outputs have been
|
||||
* uncovered — Codex review flagged this gap. Adding coverage here closes it.
|
||||
*/
|
||||
const SCANNED_TOOLS = new Set(['Read', 'Grep', 'Glob', 'Bash', 'WebFetch']);
|
||||
|
||||
async function handleStreamEvent(event: any, tabId?: number, canaryCtx?: CanaryContext, toolResultScanCtx?: ToolResultScanContext): Promise<void> {
|
||||
// Canary check runs BEFORE any outbound send — we never want to relay
|
||||
// a leaked token to the sidepanel UI.
|
||||
if (canaryCtx) {
|
||||
const channel = detectCanaryLeak(event, canaryCtx.canary, canaryCtx.deltaBuf);
|
||||
if (channel) {
|
||||
canaryCtx.onLeak(channel);
|
||||
return; // drop the event — never relay content that leaked the canary
|
||||
}
|
||||
}
|
||||
|
||||
if (event.type === 'system' && event.session_id) {
|
||||
// Relay claude session ID for --resume support
|
||||
await sendEvent({ type: 'system', claudeSessionId: event.session_id }, tabId);
|
||||
}
|
||||
|
||||
if (event.type === 'assistant' && event.message?.content) {
|
||||
for (const block of event.message.content) {
|
||||
if (block.type === 'tool_use') {
|
||||
// Register the tool_use so we can correlate tool_results back to
|
||||
// the originating tool when they arrive in the next user-role message.
|
||||
if (block.id) toolUseRegistry.set(block.id, { toolName: block.name, toolInput: block.input });
|
||||
await sendEvent({ type: 'tool_use', tool: block.name, input: summarizeToolInput(block.name, block.input) }, tabId);
|
||||
} else if (block.type === 'text' && block.text) {
|
||||
await sendEvent({ type: 'text', text: block.text }, tabId);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Tool results come back in user-role messages. Content can be a string
|
||||
// or an array of typed content blocks.
|
||||
if (event.type === 'user' && event.message?.content) {
|
||||
for (const block of event.message.content) {
|
||||
if (block && typeof block === 'object' && block.type === 'tool_result') {
|
||||
const meta = block.tool_use_id ? toolUseRegistry.get(block.tool_use_id) : null;
|
||||
const toolName = meta?.toolName ?? 'Unknown';
|
||||
const text = extractToolResultText(block.content);
|
||||
// Scan this tool output with the ML classifier if the tool is in
|
||||
// the SCANNED_TOOLS set and the content is non-trivial.
|
||||
if (SCANNED_TOOLS.has(toolName) && text.length >= 32 && toolResultScanCtx) {
|
||||
// Fire-and-forget — never block the stream handler. If BLOCK
|
||||
// fires, onToolResultBlock handles kill + emit.
|
||||
toolResultScanCtx.scan(toolName, text).catch(() => {});
|
||||
}
|
||||
if (block.tool_use_id) toolUseRegistry.delete(block.tool_use_id);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if (event.type === 'content_block_start' && event.content_block?.type === 'tool_use') {
|
||||
if (event.content_block.id) {
|
||||
toolUseRegistry.set(event.content_block.id, {
|
||||
toolName: event.content_block.name,
|
||||
toolInput: event.content_block.input,
|
||||
});
|
||||
}
|
||||
await sendEvent({ type: 'tool_use', tool: event.content_block.name, input: summarizeToolInput(event.content_block.name, event.content_block.input) }, tabId);
|
||||
}
|
||||
|
||||
if (event.type === 'content_block_delta' && event.delta?.type === 'text_delta' && event.delta.text) {
|
||||
await sendEvent({ type: 'text_delta', text: event.delta.text }, tabId);
|
||||
}
|
||||
|
||||
// Relay tool results so the sidebar can show what happened
|
||||
if (event.type === 'content_block_delta' && event.delta?.type === 'input_json_delta') {
|
||||
// Tool input streaming — skip, we already announced the tool
|
||||
}
|
||||
|
||||
if (event.type === 'result') {
|
||||
await sendEvent({ type: 'result', text: event.result || '' }, tabId);
|
||||
}
|
||||
|
||||
// Tool result events — summarize and relay
|
||||
if (event.type === 'tool_result' || (event.type === 'assistant' && event.message?.content)) {
|
||||
// Tool results come in the next assistant turn — handled above
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Fire the prompt-injection-detected event to the server. This terminates
|
||||
* the session from the sidepanel's perspective and renders the canary leak
|
||||
* banner. Also logs locally (salted hash + domain only) and fires telemetry
|
||||
* if configured.
|
||||
*/
|
||||
async function onCanaryLeaked(params: {
|
||||
tabId: number;
|
||||
channel: string;
|
||||
canary: string;
|
||||
pageUrl: string;
|
||||
}): Promise<void> {
|
||||
const { tabId, channel, canary, pageUrl } = params;
|
||||
const domain = extractDomain(pageUrl);
|
||||
console.warn(`[sidebar-agent] CANARY LEAK detected on ${channel} for tab ${tabId} (domain=${domain || 'unknown'})`);
|
||||
|
||||
// Local log — salted hash + domain only, never the payload
|
||||
logAttempt({
|
||||
ts: new Date().toISOString(),
|
||||
urlDomain: domain,
|
||||
payloadHash: hashPayload(canary), // hash the canary, not the payload (which might be leaked content)
|
||||
confidence: 1.0,
|
||||
layer: 'canary',
|
||||
verdict: 'block',
|
||||
});
|
||||
|
||||
// Broadcast to sidepanel so it can render the approved banner
|
||||
await sendEvent({
|
||||
type: 'security_event',
|
||||
verdict: 'block',
|
||||
reason: 'canary_leaked',
|
||||
layer: 'canary',
|
||||
channel,
|
||||
domain,
|
||||
}, tabId);
|
||||
|
||||
// Also emit agent_error so the sidepanel's existing error surface
|
||||
// reflects that the session terminated. Keeps old clients working.
|
||||
await sendEvent({
|
||||
type: 'agent_error',
|
||||
error: `Session terminated — prompt injection detected${domain ? ` from ${domain}` : ''}`,
|
||||
}, tabId);
|
||||
}
|
||||
|
||||
/**
|
||||
* Pre-spawn ML scan of the user message. If the classifier fires at BLOCK,
|
||||
* we log the attempt, emit a security_event to the sidepanel, and DO NOT
|
||||
* spawn claude. Returns true if the scan blocked the session.
|
||||
*
|
||||
* Fail-open: any classifier error or degraded state returns false (not blocked) so
|
||||
* the sidebar keeps working. The architectural controls (XML framing +
|
||||
* command allowlist, live in server.ts:554-577) still defend.
|
||||
*/
|
||||
async function preSpawnSecurityCheck(entry: QueueEntry): Promise<boolean> {
|
||||
const { message, canary, pageUrl, tabId } = entry;
|
||||
if (!message || message.length === 0) return false;
|
||||
const tid = tabId ?? 0;
|
||||
|
||||
// L4: scan the user message for direct injection patterns (TestSavantAI)
|
||||
// L4c: also scan with DeBERTa-v3 when ensemble is enabled (opt-in)
|
||||
const [contentSignal, debertaSignal] = await Promise.all([
|
||||
scanPageContent(message),
|
||||
scanPageContentDeberta(message),
|
||||
]);
|
||||
const signals: LayerSignal[] = [contentSignal, debertaSignal];
|
||||
|
||||
// L4b: only bother with Haiku if another layer already lit up at >= LOG_ONLY.
|
||||
// Saves ~70% of Haiku calls per plan §E1 "gating optimization".
|
||||
if (shouldRunTranscriptCheck(signals)) {
|
||||
const transcriptSignal = await checkTranscript({
|
||||
user_message: message,
|
||||
tool_calls: [], // no tool calls yet at session start
|
||||
});
|
||||
signals.push(transcriptSignal);
|
||||
}
|
||||
|
||||
const result = combineVerdict(signals);
|
||||
if (result.verdict !== 'block') return false;
|
||||
|
||||
// BLOCK verdict. Log + emit + refuse to spawn.
|
||||
const domain = extractDomain(pageUrl ?? '');
|
||||
const leaderSignal = signals.reduce((a, b) => (a.confidence > b.confidence ? a : b));
|
||||
|
||||
logAttempt({
|
||||
ts: new Date().toISOString(),
|
||||
urlDomain: domain,
|
||||
payloadHash: hashPayload(message),
|
||||
confidence: result.confidence,
|
||||
layer: leaderSignal.layer,
|
||||
verdict: 'block',
|
||||
});
|
||||
|
||||
console.warn(`[sidebar-agent] Pre-spawn BLOCK (${result.reason}) for tab ${tid}, confidence=${result.confidence.toFixed(3)}`);
|
||||
|
||||
await sendEvent({
|
||||
type: 'security_event',
|
||||
verdict: 'block',
|
||||
reason: result.reason ?? 'ml_classifier',
|
||||
layer: leaderSignal.layer,
|
||||
confidence: result.confidence,
|
||||
domain,
|
||||
}, tid);
|
||||
await sendEvent({
|
||||
type: 'agent_error',
|
||||
error: `Session blocked — prompt injection detected${domain ? ` from ${domain}` : ' in your message'}`,
|
||||
}, tid);
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
async function askClaude(queueEntry: QueueEntry): Promise<void> {
|
||||
const { prompt, args, stateFile, cwd, tabId, canary, pageUrl } = queueEntry;
|
||||
const tid = tabId ?? 0;
|
||||
|
||||
processingTabs.add(tid);
|
||||
await sendEvent({ type: 'agent_start' }, tid);
|
||||
|
||||
// Pre-spawn ML scan: if the user message trips the ensemble, refuse to
|
||||
// spawn claude. Fail-open on classifier errors.
|
||||
if (await preSpawnSecurityCheck(queueEntry)) {
|
||||
processingTabs.delete(tid);
|
||||
return;
|
||||
}
|
||||
|
||||
return new Promise((resolve) => {
|
||||
// Canary context is set after proc is spawned (needs proc reference for kill).
|
||||
let canaryCtx: CanaryContext | undefined;
|
||||
let canaryTriggered = false;
|
||||
|
||||
// Use args from queue entry (server sets --model, --allowedTools, prompt framing).
|
||||
// Fall back to defaults only if queue entry has no args (backward compat).
|
||||
// Write doesn't expand attack surface beyond what Bash already provides.
|
||||
// The security boundary is the localhost-only message path, not the tool allowlist.
|
||||
let claudeArgs = args || ['-p', prompt, '--output-format', 'stream-json', '--verbose',
|
||||
'--allowedTools', 'Bash,Read,Glob,Grep,Write'];
|
||||
|
||||
// Validate cwd exists — queue may reference a stale worktree
|
||||
let effectiveCwd = cwd || process.cwd();
|
||||
try { fs.accessSync(effectiveCwd); } catch (err: any) {
|
||||
console.warn('[sidebar-agent] Worktree path inaccessible, falling back to cwd:', effectiveCwd, err.message);
|
||||
effectiveCwd = process.cwd();
|
||||
}
|
||||
|
||||
// Clear any stale cancel signal for this tab before starting
|
||||
const cancelFile = cancelFileForTab(tid);
|
||||
safeUnlink(cancelFile);
|
||||
|
||||
const proc = spawn('claude', claudeArgs, {
|
||||
stdio: ['pipe', 'pipe', 'pipe'],
|
||||
cwd: effectiveCwd,
|
||||
env: {
|
||||
...process.env,
|
||||
BROWSE_STATE_FILE: stateFile || '',
|
||||
// Connect to the existing headed browse server, never start a new one.
|
||||
// BROWSE_PORT tells the CLI which port to check.
|
||||
// BROWSE_NO_AUTOSTART prevents spawning an invisible headless browser
|
||||
// if the headed server is down — fail fast with a clear error instead.
|
||||
BROWSE_PORT: process.env.BROWSE_PORT || '34567',
|
||||
BROWSE_NO_AUTOSTART: '1',
|
||||
// Pin this agent to its tab — prevents cross-tab interference
|
||||
// when multiple agents run simultaneously
|
||||
BROWSE_TAB: String(tid),
|
||||
},
|
||||
});
|
||||
|
||||
// Track active procs so kill-file polling can terminate them
|
||||
activeProcs.set(tid, proc);
|
||||
activeProc = proc;
|
||||
|
||||
proc.stdin.end();
|
||||
|
||||
// Now that proc exists, set up the canary-leak handler. It fires at most
|
||||
// once; on fire we kill the subprocess, emit security_event + agent_error,
|
||||
// and let the normal close handler resolve the promise.
|
||||
if (canary) {
|
||||
canaryCtx = {
|
||||
canary,
|
||||
pageUrl: pageUrl ?? '',
|
||||
deltaBuf: { text_delta: '', input_json_delta: '' },
|
||||
onLeak: (channel: string) => {
|
||||
if (canaryTriggered) return;
|
||||
canaryTriggered = true;
|
||||
onCanaryLeaked({ tabId: tid, channel, canary, pageUrl: pageUrl ?? '' });
|
||||
try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
|
||||
setTimeout(() => {
|
||||
try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
|
||||
}, 2000);
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
// Tool-result ML scan context. Addresses the Codex review gap: Read,
|
||||
// Grep, Glob, and WebFetch outputs enter Claude's context without
|
||||
// passing through the Bash $B pipeline that content-security.ts
|
||||
// already wraps. Scan them here.
|
||||
let toolResultBlockFired = false;
|
||||
const toolResultScanCtx: ToolResultScanContext = {
|
||||
scan: async (toolName: string, text: string) => {
|
||||
if (toolResultBlockFired) return;
|
||||
// Parallel L4 + L4c ensemble scan (DeBERTa no-op when disabled).
|
||||
// We run L4/L4c AND Haiku in parallel on tool outputs regardless of
|
||||
// L4's score, because BrowseSafe-Bench shows L4 (TestSavantAI) has
|
||||
// low recall on browser-agent-specific attacks (~15% at v1). Gating
|
||||
// Haiku on L4 meant our best signal almost never ran. The cost is
|
||||
// ~$0.002 + ~300ms per tool output, bounded by the Haiku timeout
|
||||
// and offset by Haiku actually seeing the real attack context.
|
||||
//
|
||||
// Haiku only runs when the Claude CLI is available (checkHaikuAvailable
|
||||
// caches the probe). In environments without it, the call returns a
|
||||
// degraded signal and the verdict falls back to L4 alone.
|
||||
const [contentSignal, debertaSignal, transcriptSignal] = await Promise.all([
|
||||
scanPageContent(text),
|
||||
scanPageContentDeberta(text),
|
||||
checkTranscript({
|
||||
user_message: queueEntry.message ?? '',
|
||||
tool_calls: [{ tool_name: toolName, tool_input: {} }],
|
||||
tool_output: text,
|
||||
}),
|
||||
]);
|
||||
const signals: LayerSignal[] = [contentSignal, debertaSignal, transcriptSignal];
|
||||
const result = combineVerdict(signals, { toolOutput: true });
|
||||
if (result.verdict !== 'block') return;
|
||||
toolResultBlockFired = true;
|
||||
const domain = extractDomain(pageUrl ?? '');
|
||||
const payloadHash = hashPayload(text.slice(0, 4096));
|
||||
|
||||
// Log pending — if the user overrides, we'll update via a separate
|
||||
// log line. The attempts.jsonl is append-only so both entries survive.
|
||||
logAttempt({
|
||||
ts: new Date().toISOString(),
|
||||
urlDomain: domain,
|
||||
payloadHash,
|
||||
confidence: result.confidence,
|
||||
layer: 'testsavant_content',
|
||||
verdict: 'block',
|
||||
});
|
||||
console.warn(`[sidebar-agent] Tool-result BLOCK on ${toolName} for tab ${tid} (confidence=${result.confidence.toFixed(3)}) — awaiting user decision`);
|
||||
|
||||
// Surface a REVIEWABLE block event. Sidepanel renders the suspected
|
||||
// text + layer scores + [Allow and continue] / [Block session] buttons.
|
||||
// The user has 60s to decide; default is BLOCK (safe fallback).
|
||||
const layerScores = signals
|
||||
.filter((s) => s.confidence > 0)
|
||||
.map((s) => ({ layer: s.layer, confidence: s.confidence }));
|
||||
await sendEvent({
|
||||
type: 'security_event',
|
||||
verdict: 'block',
|
||||
reason: 'tool_result_ml',
|
||||
layer: 'testsavant_content',
|
||||
confidence: result.confidence,
|
||||
domain,
|
||||
tool: toolName,
|
||||
reviewable: true,
|
||||
suspected_text: excerptForReview(text),
|
||||
signals: layerScores,
|
||||
}, tid);
|
||||
|
||||
// Poll for the user's decision. Default to BLOCK on timeout.
|
||||
const REVIEW_TIMEOUT_MS = 60_000;
|
||||
const POLL_MS = 500;
|
||||
clearDecision(tid); // clear any stale decision from a prior session
|
||||
const deadline = Date.now() + REVIEW_TIMEOUT_MS;
|
||||
let decision: 'allow' | 'block' = 'block';
|
||||
let decisionReason = 'timeout';
|
||||
while (Date.now() < deadline) {
|
||||
const rec = readDecision(tid);
|
||||
if (rec?.decision === 'allow' || rec?.decision === 'block') {
|
||||
decision = rec.decision;
|
||||
decisionReason = rec.reason ?? 'user';
|
||||
break;
|
||||
}
|
||||
await new Promise((r) => setTimeout(r, POLL_MS));
|
||||
}
|
||||
clearDecision(tid);
|
||||
|
||||
if (decision === 'allow') {
|
||||
// User overrode. Log the override so the audit trail captures it.
|
||||
// toolResultBlockFired stayed true during the review, so concurrent tool
|
||||
// results did not re-prompt. It is reset below once the override is logged.
|
||||
logAttempt({
|
||||
ts: new Date().toISOString(),
|
||||
urlDomain: domain,
|
||||
payloadHash,
|
||||
confidence: result.confidence,
|
||||
layer: 'testsavant_content',
|
||||
verdict: 'user_overrode',
|
||||
});
|
||||
await sendEvent({
|
||||
type: 'security_event',
|
||||
verdict: 'user_overrode',
|
||||
reason: 'tool_result_ml',
|
||||
layer: 'testsavant_content',
|
||||
confidence: result.confidence,
|
||||
domain,
|
||||
tool: toolName,
|
||||
}, tid);
|
||||
console.warn(`[sidebar-agent] Tab ${tid}: user overrode BLOCK — session continues`);
|
||||
// Let the block stay consumed; reset the flag so subsequent tool
|
||||
// results get scanned fresh.
|
||||
toolResultBlockFired = false;
|
||||
return;
|
||||
}
|
||||
|
||||
// User chose BLOCK (or timed out). Kill the session as before.
|
||||
await sendEvent({
|
||||
type: 'agent_error',
|
||||
error: `Session terminated — prompt injection detected in ${toolName} output${decisionReason === 'timeout' ? ' (review timeout)' : ''}`,
|
||||
}, tid);
|
||||
try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
|
||||
setTimeout(() => {
|
||||
try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
|
||||
}, 2000);
|
||||
},
|
||||
};
|
||||
|
||||
// Poll for per-tab cancel signal from server's killAgent()
|
||||
const cancelCheck = setInterval(() => {
|
||||
try {
|
||||
if (fs.existsSync(cancelFile)) {
|
||||
console.log(`[sidebar-agent] Cancel signal received for tab ${tid} — killing claude subprocess`);
|
||||
try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
|
||||
setTimeout(() => { try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; } }, 3000);
|
||||
fs.unlinkSync(cancelFile);
|
||||
clearInterval(cancelCheck);
|
||||
}
|
||||
} catch (err: any) { if (err?.code !== 'ENOENT') throw err; }
|
||||
}, 500);
|
||||
|
||||
let buffer = '';
|
||||
|
||||
proc.stdout.on('data', (data: Buffer) => {
|
||||
buffer += data.toString();
|
||||
const lines = buffer.split('\n');
|
||||
buffer = lines.pop() || '';
|
||||
for (const line of lines) {
|
||||
if (!line.trim()) continue;
|
||||
try { handleStreamEvent(JSON.parse(line), tid, canaryCtx, toolResultScanCtx); } catch (err: any) {
|
||||
console.error(`[sidebar-agent] Tab ${tid}: Failed to parse stream line:`, line.slice(0, 100), err.message);
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
let stderrBuffer = '';
|
||||
proc.stderr.on('data', (data: Buffer) => {
|
||||
stderrBuffer += data.toString();
|
||||
});
|
||||
|
||||
proc.on('close', (code) => {
|
||||
clearInterval(cancelCheck);
|
||||
activeProc = null;
|
||||
activeProcs.delete(tid);
|
||||
if (buffer.trim()) {
|
||||
try { handleStreamEvent(JSON.parse(buffer), tid, canaryCtx, toolResultScanCtx); } catch (err: any) {
|
||||
console.error(`[sidebar-agent] Tab ${tid}: Failed to parse final buffer:`, buffer.slice(0, 100), err.message);
|
||||
}
|
||||
}
|
||||
const doneEvent: Record<string, any> = { type: 'agent_done' };
|
||||
if (code !== 0 && stderrBuffer.trim()) {
|
||||
doneEvent.stderr = stderrBuffer.trim().slice(-500);
|
||||
}
|
||||
sendEvent(doneEvent, tid).then(() => {
|
||||
processingTabs.delete(tid);
|
||||
resolve();
|
||||
});
|
||||
});
|
||||
|
||||
proc.on('error', (err) => {
|
||||
clearInterval(cancelCheck);
|
||||
activeProc = null;
|
||||
const errorMsg = stderrBuffer.trim()
|
||||
? `${err.message}\nstderr: ${stderrBuffer.trim().slice(-500)}`
|
||||
: err.message;
|
||||
sendEvent({ type: 'agent_error', error: errorMsg }, tid).then(() => {
|
||||
processingTabs.delete(tid);
|
||||
resolve();
|
||||
});
|
||||
});
|
||||
|
||||
// Timeout (default 300s / 5 min — multi-page tasks need time)
|
||||
const timeoutMs = parseInt(process.env.SIDEBAR_AGENT_TIMEOUT || '300000', 10);
|
||||
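setTimeout(() => {
|
||||
// Skip the stale timeout if the subprocess already exited on its own.
|
||||
if (proc.exitCode !== null || proc.signalCode !== null) return;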
|
||||
try { proc.kill('SIGTERM'); } catch (killErr: any) {
|
||||
console.warn(`[sidebar-agent] Tab ${tid}: Failed to kill timed-out process:`, killErr.message);
|
||||
}
|
||||
setTimeout(() => { try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; } }, 3000);
|
||||
const timeoutMsg = stderrBuffer.trim()
|
||||
? `Timed out after ${timeoutMs / 1000}s\nstderr: ${stderrBuffer.trim().slice(-500)}`
|
||||
: `Timed out after ${timeoutMs / 1000}s`;
|
||||
sendEvent({ type: 'agent_error', error: timeoutMsg }, tid).then(() => {
|
||||
processingTabs.delete(tid);
|
||||
resolve();
|
||||
});
|
||||
}, timeoutMs);
|
||||
});
|
||||
}
|
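The stdout handler in askClaude above buffers partial lines and parses each complete line as JSON. The same idea can be isolated as a pure function, which makes the chunk-boundary behaviour easy to test. This is a minimal sketch only: `splitNdjson` and `SplitResult` are hypothetical names that do not appear in the commit, and unlike the handler above it collects malformed lines instead of logging them.

```typescript
// Hypothetical helper (not part of sidebar-agent.ts): splits newline-delimited
// JSON into parsed events plus the unfinished tail of the stream.
interface SplitResult {
  events: unknown[];   // values parsed from complete lines
  rest: string;        // trailing partial line to carry into the next chunk
  malformed: string[]; // complete lines that were not valid JSON
}

function splitNdjson(carry: string, chunk: string): SplitResult {
  const lines = (carry + chunk).split('\n');
  // The last element is either '' (chunk ended on a newline) or a partial line.
  const rest = lines.pop() ?? '';
  const events: unknown[] = [];
  const malformed: string[] = [];
  for (const line of lines) {
    if (!line.trim()) continue; // ignore blank lines
    try {
      events.push(JSON.parse(line));
    } catch {
      malformed.push(line);
    }
  }
  return { events, rest, malformed };
}

// A JSON object split across two chunks is only parsed once it is complete.
const first = splitNdjson('', '{"type":"a"}\n{"ty');
const second = splitNdjson(first.rest, 'pe":"b"}\n');
console.log(first.events.length, first.rest, second.events.length);
```

Returning the remainder rather than mutating a shared buffer keeps the function free of state, so the caller decides where the carry-over lives.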
||||
|
||||
// ─── Poll loop ───────────────────────────────────────────────────
|
||||
|
||||
function countLines(): number {
|
||||
try {
|
||||
return fs.readFileSync(QUEUE, 'utf-8').split('\n').filter(Boolean).length;
|
||||
} catch (err: any) {
|
||||
console.error('[sidebar-agent] Failed to read queue file:', err.message);
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
|
||||
function readLine(n: number): string | null {
|
||||
try {
|
||||
const lines = fs.readFileSync(QUEUE, 'utf-8').split('\n').filter(Boolean);
|
||||
return lines[n - 1] || null;
|
||||
} catch (err: any) {
|
||||
console.error(`[sidebar-agent] Failed to read queue line ${n}:`, err.message);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
async function poll() {
|
||||
const current = countLines();
|
||||
if (current <= lastLine) return;
|
||||
|
||||
while (lastLine < current) {
|
||||
lastLine++;
|
||||
const line = readLine(lastLine);
|
||||
if (!line) continue;
|
||||
|
||||
let parsed: unknown;
|
||||
try { parsed = JSON.parse(line); } catch (err: any) {
|
||||
console.warn(`[sidebar-agent] Skipping malformed queue entry at line ${lastLine}:`, line.slice(0, 80), err.message);
|
||||
continue;
|
||||
}
|
||||
if (!isValidQueueEntry(parsed)) {
|
||||
console.warn(`[sidebar-agent] Skipping invalid queue entry at line ${lastLine}: failed schema validation`);
|
||||
continue;
|
||||
}
|
||||
const entry = parsed;
|
||||
|
||||
const tid = entry.tabId ?? 0;
|
||||
// Skip if this tab already has an agent running — server queues per-tab
|
||||
if (processingTabs.has(tid)) continue;
|
||||
|
||||
console.log(`[sidebar-agent] Processing tab ${tid}: "${entry.message}"`);
|
||||
// Write to inbox so workspace agent can pick it up
|
||||
writeToInbox(entry.message || entry.prompt, entry.pageUrl, entry.sessionId);
|
||||
// Fire and forget — each tab's agent runs concurrently
|
||||
askClaude(entry).catch((err) => {
|
||||
console.error(`[sidebar-agent] Error on tab ${tid}:`, err);
|
||||
sendEvent({ type: 'agent_error', error: String(err) }, tid);
|
||||
});
|
||||
}
|
||||
}
|
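The poll loop above tracks a line cursor into an append-only queue file and skips entries that fail to parse or validate. A minimal sketch of that cursor logic as a pure function follows. `readNewEntries`, `QueueRead`, `DemoEntry` and `isDemoEntry` are hypothetical names for illustration, and the sketch takes the file content as a string so the file is read once per poll rather than once per line.

```typescript
// Hypothetical sketch (not part of sidebar-agent.ts): return the entries that
// appeared after `cursor` in an append-only JSONL file, plus the new cursor.
interface QueueRead<T> {
  entries: T[];
  cursor: number; // number of non-empty lines consumed so far
}

function readNewEntries<T>(
  content: string,
  cursor: number,
  isValid: (v: unknown) => v is T,
): QueueRead<T> {
  const lines = content.split('\n').filter(Boolean);
  const entries: T[] = [];
  for (let i = cursor; i < lines.length; i++) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(lines[i]);
    } catch {
      continue; // malformed line: skip it but still advance past it
    }
    if (isValid(parsed)) entries.push(parsed);
  }
  // Never move the cursor backwards, even if the file shrank.
  return { entries, cursor: Math.max(cursor, lines.length) };
}

// Illustrative entry shape; the real QueueEntry has more fields.
interface DemoEntry { message: string; tabId?: number; }
const isDemoEntry = (v: unknown): v is DemoEntry =>
  typeof v === 'object' && v !== null &&
  typeof (v as { message?: unknown }).message === 'string';

const file = '{"message":"one"}\nnot json\n{"message":"two","tabId":3}\n';
const firstPass = readNewEntries(file, 0, isDemoEntry);
const secondPass = readNewEntries(file + '{"message":"three"}\n', firstPass.cursor, isDemoEntry);
console.log(firstPass.entries.length, secondPass.entries.map((e) => e.message));
```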
||||
|
||||
// ─── Main ────────────────────────────────────────────────────────
|
||||
|
||||
function pollKillFile(): void {
|
||||
try {
|
||||
const stat = fs.statSync(KILL_FILE);
|
||||
const mtime = stat.mtimeMs;
|
||||
if (mtime > lastKillTs) {
|
||||
lastKillTs = mtime;
|
||||
if (activeProcs.size > 0) {
|
||||
console.log(`[sidebar-agent] Kill signal received — terminating ${activeProcs.size} active agent(s)`);
|
||||
for (const [tid, proc] of activeProcs) {
|
||||
try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
|
||||
setTimeout(() => { try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; } }, 2000);
|
||||
processingTabs.delete(tid);
|
||||
}
|
||||
activeProcs.clear();
|
||||
}
|
||||
}
|
||||
} catch {
|
||||
// Kill file doesn't exist yet — normal state
|
||||
}
|
||||
}
|
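The SIGTERM-then-SIGKILL escalation appears several times in this file with slightly different grace periods. One way to express it once is sketched below. `terminateWithGrace`, `safeKill`, `Killable` and `Scheduler` are hypothetical names, not part of the commit, and the scheduler is injected only so the behaviour can be exercised without real timers or processes.

```typescript
// Hypothetical sketch: ask a process to stop, then force-kill it after a
// grace period if it has not exited by then.
type Signal = 'SIGTERM' | 'SIGKILL';
interface Killable { kill(signal: Signal): unknown; }
type Scheduler = (fn: () => void, ms: number) => unknown;

function safeKill(proc: Killable, signal: Signal): void {
  try {
    proc.kill(signal);
  } catch (err: any) {
    // ESRCH means the process is already gone, which is fine here.
    if (err?.code !== 'ESRCH') throw err;
  }
}

function terminateWithGrace(
  proc: Killable,
  hasExited: () => boolean,
  graceMs = 2000,
  schedule: Scheduler = setTimeout,
): void {
  safeKill(proc, 'SIGTERM');
  schedule(() => {
    if (!hasExited()) safeKill(proc, 'SIGKILL');
  }, graceMs);
}

// Demo with a fake process and a scheduler that queues callbacks.
const sent: Signal[] = [];
const pending: Array<() => void> = [];
const fakeProc: Killable = { kill: (s) => { sent.push(s); } };
terminateWithGrace(fakeProc, () => false, 2000, (fn) => { pending.push(fn); });
pending.forEach((fn) => fn()); // simulate the grace period elapsing
console.log(sent.join(','));
```

Passing `hasExited` explicitly avoids relying on a reference that may have been cleared by the time the timer fires.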
||||
|
||||
async function main() {
|
||||
const dir = path.dirname(QUEUE);
|
||||
fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
|
||||
if (!fs.existsSync(QUEUE)) fs.writeFileSync(QUEUE, '', { mode: 0o600 });
|
||||
try { fs.chmodSync(QUEUE, 0o600); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; }
|
||||
|
||||
lastLine = countLines();
|
||||
await refreshToken();
|
||||
|
||||
console.log(`[sidebar-agent] Started. Watching ${QUEUE} from line ${lastLine}`);
|
||||
console.log(`[sidebar-agent] Server: ${SERVER_URL}`);
|
||||
console.log(`[sidebar-agent] Browse binary: ${B}`);
|
||||
|
||||
// If GSTACK_SECURITY_ENSEMBLE=deberta is set, also warm the DeBERTa-v3
|
||||
// ensemble classifier. Fire-and-forget alongside TestSavantAI — they
|
||||
// warm in parallel. No-op when the env var is unset.
|
||||
loadDeberta((msg) => console.log(`[security-classifier] ${msg}`))
|
||||
.catch((err) => console.warn('[sidebar-agent] DeBERTa warmup failed:', err?.message));
|
||||
|
||||
// Warm up the ML classifier in the background. First call triggers a 112MB
|
||||
// download (~30s on average broadband). Non-blocking — the sidebar stays
|
||||
// functional on cold start; classifier just reports 'off' until warmed.
|
||||
//
|
||||
// On warmup completion (success or failure), write the classifier status to
|
||||
// ~/.gstack/security/session-state.json so server.ts's /health endpoint can
|
||||
// report it to the sidepanel for shield icon rendering.
|
||||
loadTestsavant((msg) => console.log(`[security-classifier] ${msg}`))
|
||||
.then(() => {
|
||||
const s = getClassifierStatus();
|
||||
console.log(`[sidebar-agent] Classifier warmup complete: ${JSON.stringify(s)}`);
|
||||
const existing = readSessionState();
|
||||
writeSessionState({
|
||||
sessionId: existing?.sessionId ?? String(process.pid),
|
||||
canary: existing?.canary ?? '',
|
||||
warnedDomains: existing?.warnedDomains ?? [],
|
||||
classifierStatus: s,
|
||||
lastUpdated: new Date().toISOString(),
|
||||
});
|
||||
})
|
||||
.catch((err) => console.warn('[sidebar-agent] Classifier warmup failed (degraded mode):', err?.message));
|
||||
|
||||
setInterval(poll, POLL_MS);
|
||||
setInterval(pollKillFile, POLL_MS);
|
||||
}
|
||||
|
||||
main().catch(console.error);
|
||||
@@ -0,0 +1,556 @@
|
||||
/**
|
||||
* Terminal Agent — PTY-backed Claude Code terminal for the gstack browser
|
||||
* sidebar. Translates the phoenix gbrowser PTY (cmd/gbd/terminal.go) into
|
||||
* Bun, with a few changes informed by codex's outside-voice review:
|
||||
*
|
||||
* - Lives in a separate non-compiled bun process from sidebar-agent.ts so
|
||||
* a bug in WS framing or PTY cleanup can't take down the chat path.
|
||||
* - Binds 127.0.0.1 only — never on the dual-listener tunnel surface.
|
||||
* - Origin validation on the WS upgrade is REQUIRED (not defense-in-depth)
|
||||
* because a localhost shell WS is a real cross-site WebSocket-hijacking
|
||||
* target.
|
||||
* - Cookie-based auth via /internal/grant from the parent server, not a
|
||||
* token in /health.
|
||||
* - Lazy spawn: claude PTY is not spawned until the WS receives its first
|
||||
* data frame. Sidebar opens that never type don't burn a claude session.
|
||||
* - PTY dies with WS close (one PTY per WS). v1.1 may add session
|
||||
* survival; for v1 we match phoenix's lifecycle.
|
||||
*
|
||||
* The PTY uses Bun's `terminal:` spawn option (verified at impl time on
|
||||
* Bun 1.3.10): pass cols/rows + a data callback; write input via
|
||||
* `proc.terminal.write(buf)`; resize via `proc.terminal.resize(cols, rows)`.
|
||||
*/
|
||||
import * as fs from 'fs';
|
||||
import * as path from 'path';
|
||||
import * as crypto from 'crypto';
|
||||
import { safeUnlink } from './error-handling';
|
||||
|
||||
const STATE_FILE = process.env.BROWSE_STATE_FILE || path.join(process.env.HOME || '/tmp', '.gstack', 'browse.json');
|
||||
const PORT_FILE = path.join(path.dirname(STATE_FILE), 'terminal-port');
|
||||
const BROWSE_SERVER_PORT = parseInt(process.env.BROWSE_SERVER_PORT || '0', 10);
|
||||
const EXTENSION_ID = process.env.BROWSE_EXTENSION_ID || ''; // optional: tighten Origin check
|
||||
const INTERNAL_TOKEN = crypto.randomBytes(32).toString('base64url'); // shared with parent server via env at spawn
|
||||
|
||||
// In-memory cookie token registry. Parent posts /internal/grant after
|
||||
// /pty-session; we validate WS cookies against this set.
|
||||
const validTokens = new Set<string>();
|
||||
|
||||
// Active PTY session per WS. One terminal per connection. Codex finding #4:
|
||||
// uncaught handlers below catch bugs in framing/cleanup so they don't kill
|
||||
// the listener loop.
|
||||
process.on('uncaughtException', (err) => {
|
||||
console.error('[terminal-agent] uncaughtException:', err);
|
||||
});
|
||||
process.on('unhandledRejection', (reason) => {
|
||||
console.error('[terminal-agent] unhandledRejection:', reason);
|
||||
});
|
||||
|
||||
interface PtySession {
|
||||
proc: any | null; // Bun.Subprocess once spawned
|
||||
cols: number;
|
||||
rows: number;
|
||||
cookie: string;
|
||||
spawned: boolean;
|
||||
}
|
||||
|
||||
const sessions = new WeakMap<any, PtySession>(); // ws -> session
|
||||
|
||||
/** Find claude on PATH, falling back to common install locations. */
|
||||
function findClaude(): string | null {
|
||||
// Test-only override. Lets the integration tests spawn /bin/bash instead
|
||||
// of requiring claude to be installed on every CI runner. NEVER read in
|
||||
// production (sidebar UI). Documented in browse/test/terminal-agent-integration.test.ts.
|
||||
const override = process.env.BROWSE_TERMINAL_BINARY;
|
||||
if (override && fs.existsSync(override)) return override;
|
||||
// Bun.which is sync and respects PATH. Falls back to a small list of
|
||||
// common install locations if PATH is stripped (e.g., launched from
|
||||
// Conductor with a minimal env).
|
||||
const which = (Bun as any).which?.('claude');
|
||||
if (which) return which;
|
||||
const candidates = [
|
||||
'/opt/homebrew/bin/claude',
|
||||
'/usr/local/bin/claude',
|
||||
`${process.env.HOME}/.local/bin/claude`,
|
||||
`${process.env.HOME}/.bun/bin/claude`,
|
||||
`${process.env.HOME}/.npm-global/bin/claude`,
|
||||
];
|
||||
for (const c of candidates) {
|
||||
try { fs.accessSync(c, fs.constants.X_OK); return c; } catch {}
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
/** Probe + persist claude availability for the bootstrap card. */
|
||||
function writeClaudeAvailable(): void {
|
||||
const stateDir = path.dirname(STATE_FILE);
|
||||
try { fs.mkdirSync(stateDir, { recursive: true, mode: 0o700 }); } catch {}
|
||||
const found = findClaude();
|
||||
const status = {
|
||||
available: !!found,
|
||||
path: found || undefined,
|
||||
install_url: 'https://docs.anthropic.com/en/docs/claude-code',
|
||||
checked_at: new Date().toISOString(),
|
||||
};
|
||||
const target = path.join(stateDir, 'claude-available.json');
|
||||
const tmp = path.join(stateDir, `.tmp-claude-${process.pid}`);
|
||||
try {
|
||||
fs.writeFileSync(tmp, JSON.stringify(status, null, 2), { mode: 0o600 });
|
||||
fs.renameSync(tmp, target);
|
||||
} catch {
|
||||
safeUnlink(tmp);
|
||||
}
|
||||
}
|
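writeClaudeAvailable above writes to a temp file and renames it over the target, so a reader never sees a half-written JSON file. A minimal standalone sketch of that pattern follows. `writeJsonAtomic` is a hypothetical helper name, and the temp-file naming is an assumption rather than the scheme used in the commit.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper: write JSON to a sibling temp file, then rename it into
// place. Keeping both paths in one directory keeps the rename on one filesystem.
function writeJsonAtomic(target: string, value: unknown): boolean {
  const tmp = path.join(
    path.dirname(target),
    `.tmp-${process.pid}-${path.basename(target)}`,
  );
  try {
    fs.writeFileSync(tmp, JSON.stringify(value, null, 2), { mode: 0o600 });
    fs.renameSync(tmp, target);
    return true;
  } catch {
    // Best-effort cleanup; the temp file may never have been created.
    try { fs.unlinkSync(tmp); } catch { /* nothing to remove */ }
    return false;
  }
}

// Demo in a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'atomic-demo-'));
const demoFile = path.join(demoDir, 'status.json');
const wrote = writeJsonAtomic(demoFile, { available: true });
const roundTrip = JSON.parse(fs.readFileSync(demoFile, 'utf-8'));
console.log(wrote, roundTrip.available, fs.readdirSync(demoDir));
```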
||||
|
||||
/**
|
||||
* System-prompt hint passed to claude via --append-system-prompt. Tells
|
||||
* claude what tab-awareness affordances exist in this session so it
|
||||
* doesn't have to discover them by trial. The user can override anything
|
||||
* here just by saying so — system prompt is a soft hint, not a contract.
|
||||
*
|
||||
* Two paths claude has:
|
||||
* 1. Read live state from <stateDir>/tabs.json + active-tab.json
|
||||
* (updated continuously by the gstack browser extension).
|
||||
* 2. Run $B tab, $B tabs, $B tab-each <command> to act on tabs. The
|
||||
* tab-each helper fans a single command across every open tab and
|
||||
* returns per-tab results as JSON.
|
||||
*/
|
||||
function buildTabAwarenessHint(stateDir: string): string {
|
||||
const tabsFile = path.join(stateDir, 'tabs.json');
|
||||
const activeFile = path.join(stateDir, 'active-tab.json');
|
||||
return [
|
||||
'You are running inside the gstack browser sidebar with live access to the user\'s browser tabs.',
|
||||
'',
|
||||
'Tab state files (kept fresh automatically by the extension):',
|
||||
` ${tabsFile} — all open tabs (id, url, title, active, pinned)`,
|
||||
` ${activeFile} — the currently active tab`,
|
||||
'Read these any time the user asks about "tabs", "the current page", or anything multi-tab. Do NOT shell out to $B tabs just to learn what\'s open — read the file.',
|
||||
'',
|
||||
'Tab manipulation commands (via $B):',
|
||||
' $B tab <id> — switch to a tab',
|
||||
' $B newtab [url] — open a new tab',
|
||||
' $B closetab [id] — close a tab (current if no id)',
|
||||
' $B tab-each <command> — fan out a command across every tab; returns JSON results',
|
||||
'',
|
||||
'When the user asks for multi-tab work, prefer $B tab-each. Examples:',
|
||||
' $B tab-each snapshot -i — grab a snapshot from every tab',
|
||||
' $B tab-each text — pull clean text from every tab',
|
||||
' $B tab-each title — list every tab\'s title',
|
||||
'',
|
||||
'You\'re in a real terminal with a real PTY — slash commands, /resume, ANSI colors all work as in a normal claude session.',
|
||||
].join('\n');
|
||||
}
|
||||
|
||||
/** Spawn claude in a PTY. Returns null if claude cannot be found. */
|
||||
function spawnClaude(cols: number, rows: number, onData: (chunk: Buffer) => void) {
|
||||
const claudePath = findClaude();
|
||||
if (!claudePath) return null;
|
||||
|
||||
// Match phoenix env so claude knows which browse server to talk to and
|
||||
// doesn't try to autostart its own. BROWSE_HEADED=1 keeps the existing
|
||||
// headed-mode browser; BROWSE_NO_AUTOSTART prevents claude's gstack
|
||||
// tooling from racing to spawn another server.
|
||||
const env: Record<string, string> = {
|
||||
...process.env as any,
|
||||
BROWSE_PORT: String(BROWSE_SERVER_PORT),
|
||||
BROWSE_STATE_FILE: STATE_FILE,
|
||||
BROWSE_NO_AUTOSTART: '1',
|
||||
BROWSE_HEADED: '1',
|
||||
TERM: 'xterm-256color',
|
||||
COLORTERM: 'truecolor',
|
||||
};
|
||||
|
||||
// --append-system-prompt is the right injection surface (per `claude --help`):
|
||||
// it gets appended to the model's system prompt, so claude treats this as
|
||||
// contextual guidance, not a user message. Don't use a leading PTY write
|
||||
// for this — that would show up as if the user typed the hint, polluting
|
||||
// the visible transcript.
|
||||
const stateDir = path.dirname(STATE_FILE);
|
||||
const tabHint = buildTabAwarenessHint(stateDir);
|
||||
|
||||
const proc = (Bun as any).spawn([claudePath, '--append-system-prompt', tabHint], {
|
||||
terminal: {
|
||||
rows,
|
||||
cols,
|
||||
data(_terminal: any, chunk: Buffer) { onData(chunk); },
|
||||
},
|
||||
env,
|
||||
});
|
||||
return proc;
|
||||
}
|
||||
|
||||
/** Clean up a PTY session: SIGINT, then SIGKILL after 3s. */
|
||||
function disposeSession(session: PtySession): void {
|
||||
try { session.proc?.terminal?.close?.(); } catch {}
|
||||
if (session.proc?.pid) {
|
||||
try { session.proc.kill?.('SIGINT'); } catch {}
|
||||
const proc = session.proc; // capture now: session.proc is set to null below, before this timer fires
|
||||
setTimeout(() => {
|
||||
try {
|
||||
if (!proc.killed) proc.kill?.('SIGKILL');
|
||||
} catch {}
|
||||
}, 3000);
|
||||
}
|
||||
session.proc = null;
|
||||
session.spawned = false;
|
||||
}
|
||||
|
||||
/**
|
||||
* Build the HTTP server. Two routes:
|
||||
* POST /internal/grant — parent server pushes a fresh cookie token
|
||||
* GET /ws — extension upgrades to WebSocket (PTY transport)
|
||||
*
|
||||
* Everything else returns 404. The listener binds 127.0.0.1 only.
|
||||
*/
|
||||
function buildServer() {
|
||||
return Bun.serve({
|
||||
hostname: '127.0.0.1',
|
||||
port: 0,
|
||||
idleTimeout: 0, // PTY connections are long-lived; default idleTimeout would kill them
|
||||
|
||||
fetch(req, server) {
|
||||
const url = new URL(req.url);
|
||||
|
||||
// /internal/grant — loopback-only handshake from parent server.
|
||||
if (url.pathname === '/internal/grant' && req.method === 'POST') {
|
||||
const auth = req.headers.get('authorization');
|
||||
if (auth !== `Bearer ${INTERNAL_TOKEN}`) {
|
||||
return new Response('forbidden', { status: 403 });
|
||||
}
|
||||
return req.json().then((body: any) => {
|
||||
if (typeof body?.token === 'string' && body.token.length > 16) {
|
||||
validTokens.add(body.token);
|
||||
}
|
||||
return new Response('ok');
|
||||
}).catch(() => new Response('bad', { status: 400 }));
|
||||
}
|
||||
|
||||
// /internal/revoke — drop a token (called on WS close or bootstrap reload)
|
||||
if (url.pathname === '/internal/revoke' && req.method === 'POST') {
|
||||
const auth = req.headers.get('authorization');
|
||||
if (auth !== `Bearer ${INTERNAL_TOKEN}`) {
|
||||
return new Response('forbidden', { status: 403 });
|
||||
}
|
||||
return req.json().then((body: any) => {
|
||||
if (typeof body?.token === 'string') validTokens.delete(body.token);
|
||||
return new Response('ok');
|
||||
}).catch(() => new Response('bad', { status: 400 }));
|
||||
}
|
||||
|
||||
// /claude-available — bootstrap card hits this when user clicks "I installed it".
|
||||
if (url.pathname === '/claude-available' && req.method === 'GET') {
|
||||
writeClaudeAvailable();
|
||||
const found = findClaude();
|
||||
return new Response(JSON.stringify({ available: !!found, path: found }), {
|
||||
status: 200,
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
});
|
||||
}
|
||||
|
||||
      // /ws — WebSocket upgrade. CRITICAL gates:
      //   (1) Origin must be chrome-extension://<id>. Cross-site WS hijacking
      //       defense — required, not optional.
      //   (2) Token must be in validTokens. We accept the token via two
      //       transports for compatibility:
      //         - Sec-WebSocket-Protocol (preferred for browsers — the only
      //           auth header settable from the browser WebSocket API)
      //         - Cookie gstack_pty (works for non-browser callers and
      //           same-port browser callers; doesn't survive the cross-port
      //           jump from server.ts:34567 to the agent's random port
      //           when SameSite=Strict is set)
      //       Either path works; both verify against the same in-memory
      //       validTokens Set, populated by the parent server's
      //       authenticated /pty-session → /internal/grant chain.
      if (url.pathname === '/ws') {
        const origin = req.headers.get('origin') || '';
        const isExtensionOrigin = origin.startsWith('chrome-extension://');
        if (!isExtensionOrigin) {
          return new Response('forbidden origin', { status: 403 });
        }
        if (EXTENSION_ID && origin !== `chrome-extension://${EXTENSION_ID}`) {
          return new Response('forbidden origin', { status: 403 });
        }

        // Try Sec-WebSocket-Protocol first. Format: a single token, possibly
        // with a `gstack-pty.` prefix (which we strip). Browsers send a
        // comma-separated list when multiple were requested; we pick the
        // first that matches a known token.
        const protoHeader = req.headers.get('sec-websocket-protocol') || '';
        let token: string | null = null;
        let acceptedProtocol: string | null = null;
        for (const raw of protoHeader.split(',').map(s => s.trim()).filter(Boolean)) {
          const candidate = raw.startsWith('gstack-pty.') ? raw.slice('gstack-pty.'.length) : raw;
          if (validTokens.has(candidate)) {
            token = candidate;
            acceptedProtocol = raw;
            break;
          }
        }

        // Fallback: Cookie gstack_pty (legacy / non-browser callers).
        if (!token) {
          const cookieHeader = req.headers.get('cookie') || '';
          for (const part of cookieHeader.split(';')) {
            const [name, ...rest] = part.trim().split('=');
            if (name === 'gstack_pty') {
              const candidate = rest.join('=') || null;
              if (candidate && validTokens.has(candidate)) {
                token = candidate;
              }
              break;
            }
          }
        }

        if (!token) {
          return new Response('unauthorized', { status: 401 });
        }

        const upgraded = server.upgrade(req, {
          data: { cookie: token },
          // Echo the protocol back so the browser accepts the upgrade.
          // Required when the client sends Sec-WebSocket-Protocol — the
          // server MUST select one of the offered protocols, otherwise
          // the browser closes the connection immediately.
          ...(acceptedProtocol ? { headers: { 'Sec-WebSocket-Protocol': acceptedProtocol } } : {}),
        });
        return upgraded ? undefined : new Response('upgrade failed', { status: 500 });
      }

      return new Response('not found', { status: 404 });
    },

    websocket: {
      message(ws, raw) {
        let session = sessions.get(ws);
        if (!session) {
          session = {
            proc: null,
            cols: 80,
            rows: 24,
            cookie: (ws.data as any)?.cookie || '',
            spawned: false,
          };
          sessions.set(ws, session);
        }

        // Text frames are control messages: {type: "resize", cols, rows} or
        // {type: "tabSwitch", tabId, url, title}. Binary frames are raw input
        // bytes destined for the PTY stdin.
        if (typeof raw === 'string') {
          let msg: any;
          try { msg = JSON.parse(raw); } catch { return; }
          if (msg?.type === 'resize') {
            const cols = Math.max(2, Math.floor(Number(msg.cols) || 80));
            const rows = Math.max(2, Math.floor(Number(msg.rows) || 24));
            session.cols = cols;
            session.rows = rows;
            try { session.proc?.terminal?.resize?.(cols, rows); } catch {}
            return;
          }
          if (msg?.type === 'tabSwitch') {
            handleTabSwitch(msg);
            return;
          }
          if (msg?.type === 'tabState') {
            handleTabState(msg);
            return;
          }
          // Unknown text frame — ignore.
          return;
        }

        // Binary input. Lazy-spawn claude on the first byte.
        if (!session.spawned) {
          session.spawned = true;
          const proc = spawnClaude(session.cols, session.rows, (chunk) => {
            try { ws.sendBinary(chunk); } catch {}
          });
          if (!proc) {
            try {
              ws.send(JSON.stringify({
                type: 'error',
                code: 'CLAUDE_NOT_FOUND',
                message: 'claude CLI not on PATH. Install: https://docs.anthropic.com/en/docs/claude-code',
              }));
              ws.close(4404, 'claude not found');
            } catch {}
            return;
          }
          session.proc = proc;
          // Watch for child exit so the WS closes cleanly when claude exits.
          proc.exited?.then?.(() => {
            try { ws.close(1000, 'pty exited'); } catch {}
          });
        }
        try {
          // raw is a Uint8Array; Bun.Terminal.write accepts string|Buffer.
          // Convert to Buffer for safety.
          session.proc?.terminal?.write?.(Buffer.from(raw as Uint8Array));
        } catch (err) {
          console.error('[terminal-agent] terminal.write failed:', err);
        }
      },

      close(ws) {
        const session = sessions.get(ws);
        if (session) {
          disposeSession(session);
          if (session.cookie) {
            // Drop the cookie so it can't be replayed against a new PTY.
            validTokens.delete(session.cookie);
          }
          sessions.delete(ws);
        }
      },
    },
  });
}
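
The token lookup in the `/ws` gate above can be read as a pure function over two header strings. The following is a minimal sketch under that assumption; `pickToken`, `TokenMatch`, and `PROTOCOL_PREFIX` are hypothetical names introduced for illustration and do not appear in the source, and the token value is made up.

```typescript
// Illustrative sketch only: pickToken, TokenMatch and PROTOCOL_PREFIX are
// hypothetical names, not identifiers from terminal-agent.ts.
const PROTOCOL_PREFIX = 'gstack-pty.';

interface TokenMatch {
  token: string;
  // Subprotocol string to echo back, or null when the cookie path matched.
  acceptedProtocol: string | null;
}

function pickToken(
  protoHeader: string,
  cookieHeader: string,
  validTokens: Set<string>,
): TokenMatch | null {
  // 1) Sec-WebSocket-Protocol: comma-separated offers, first known token wins.
  for (const raw of protoHeader.split(',').map(s => s.trim()).filter(Boolean)) {
    const candidate = raw.startsWith(PROTOCOL_PREFIX)
      ? raw.slice(PROTOCOL_PREFIX.length)
      : raw;
    if (validTokens.has(candidate)) {
      return { token: candidate, acceptedProtocol: raw };
    }
  }
  // 2) Fallback: the gstack_pty cookie. Only the first such cookie is checked.
  for (const part of cookieHeader.split(';')) {
    const [name, ...rest] = part.trim().split('=');
    if (name === 'gstack_pty') {
      const candidate = rest.join('=');
      return candidate && validTokens.has(candidate)
        ? { token: candidate, acceptedProtocol: null }
        : null;
    }
  }
  return null;
}

// Demo with a made-up token value.
const known = new Set(['tok-aaaaaaaaaaaaaaaa']);
const viaProtocol = pickToken('other, gstack-pty.tok-aaaaaaaaaaaaaaaa', '', known);
const viaCookie = pickToken('', 'theme=dark; gstack_pty=tok-aaaaaaaaaaaaaaaa', known);
const rejected = pickToken('gstack-pty.unknown', 'gstack_pty=also-unknown', known);
console.log(viaProtocol, viaCookie, rejected);
```

Returning the raw offered subprotocol alongside the token mirrors the handler's need to echo exactly what the client sent, prefix included, when it completes the upgrade.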

/**
 * Live tab snapshot. Writes <stateDir>/tabs.json (full list) and updates
 * <stateDir>/active-tab.json (current active). claude can read these any
 * time without invoking $B tabs — saves a round-trip when the model just
 * needs to check the landscape before deciding what to do.
 */
function handleTabState(msg: {
  active?: { tabId?: number; url?: string; title?: string } | null;
  tabs?: Array<{ tabId?: number; url?: string; title?: string; active?: boolean; windowId?: number; pinned?: boolean; audible?: boolean }>;
  reason?: string;
}): void {
  const stateDir = path.dirname(STATE_FILE);
  try { fs.mkdirSync(stateDir, { recursive: true, mode: 0o700 }); } catch {}

  // tabs.json — full list
  if (Array.isArray(msg.tabs)) {
    const payload = {
      updatedAt: new Date().toISOString(),
      reason: msg.reason || 'unknown',
      tabs: msg.tabs.map(t => ({
        tabId: t.tabId ?? null,
        url: t.url || '',
        title: t.title || '',
        active: !!t.active,
        windowId: t.windowId ?? null,
        pinned: !!t.pinned,
        audible: !!t.audible,
      })),
    };
    const target = path.join(stateDir, 'tabs.json');
    const tmp = path.join(stateDir, `.tmp-tabs-${process.pid}`);
    try {
      fs.writeFileSync(tmp, JSON.stringify(payload, null, 2), { mode: 0o600 });
      fs.renameSync(tmp, target);
    } catch {
      safeUnlink(tmp);
    }
  }

  // active-tab.json — single active tab. Skip chrome-internal pages so
  // claude doesn't see chrome:// or chrome-extension:// URLs as
  // "current target."
  const active = msg.active;
  if (active && active.url && !active.url.startsWith('chrome://') && !active.url.startsWith('chrome-extension://')) {
    const ctxFile = path.join(stateDir, 'active-tab.json');
    const tmp = path.join(stateDir, `.tmp-tab-${process.pid}`);
    try {
      fs.writeFileSync(tmp, JSON.stringify({
        tabId: active.tabId ?? null,
        url: active.url,
        title: active.title ?? '',
      }), { mode: 0o600 });
      fs.renameSync(tmp, ctxFile);
    } catch {
      safeUnlink(tmp);
    }
  }
}

/**
 * Tab-switch helper: write the active tab to a state file (claude reads it)
 * and notify the parent server so its activeTabId stays synced. Skips
 * chrome:// and chrome-extension:// internal pages.
 */
function handleTabSwitch(msg: { tabId?: number; url?: string; title?: string }): void {
  const url = msg.url || '';
  if (!url || url.startsWith('chrome://') || url.startsWith('chrome-extension://')) return;

  const stateDir = path.dirname(STATE_FILE);
  const ctxFile = path.join(stateDir, 'active-tab.json');
  const tmp = path.join(stateDir, `.tmp-tab-${process.pid}`);
  try {
    fs.writeFileSync(tmp, JSON.stringify({
      tabId: msg.tabId ?? null,
      url,
      title: msg.title ?? '',
    }), { mode: 0o600 });
    fs.renameSync(tmp, ctxFile);
  } catch {
    safeUnlink(tmp);
  }

  // Best-effort sync to parent server so its activeTabId tracking matches.
  // No await; this is fire-and-forget.
  if (BROWSE_SERVER_PORT > 0) {
    fetch(`http://127.0.0.1:${BROWSE_SERVER_PORT}/command`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${readBrowseToken()}`,
      },
      body: JSON.stringify({
        command: 'tab',
        args: [String(msg.tabId ?? ''), '--no-focus'],
      }),
    }).catch(() => {});
  }
}

function readBrowseToken(): string {
  try {
    const raw = fs.readFileSync(STATE_FILE, 'utf-8');
    const j = JSON.parse(raw);
    return j.token || '';
  } catch { return ''; }
}

// Boot.
function main() {
  writeClaudeAvailable();
  const server = buildServer();
  const port = (server as any).port || (server as any).address?.port;
  if (!port) {
    console.error('[terminal-agent] failed to bind: no port');
    process.exit(1);
  }

  // Write port file atomically so the parent server can pick it up.
  const dir = path.dirname(PORT_FILE);
  try { fs.mkdirSync(dir, { recursive: true, mode: 0o700 }); } catch {}
  const tmp = `${PORT_FILE}.tmp-${process.pid}`;
  fs.writeFileSync(tmp, String(port), { mode: 0o600 });
  fs.renameSync(tmp, PORT_FILE);

  // Announce readiness to the supervising process. This line carries only
  // the port and pid, not INTERNAL_TOKEN: the parent gets the token it needs
  // for /internal/grant from INTERNAL_TOKEN_FILE (written at module load,
  // just before main() is called), which avoids env races at spawn time.
  console.log(`[terminal-agent] listening on 127.0.0.1:${port} pid=${process.pid}`);

  // Cleanup port file on exit.
  const cleanup = () => { safeUnlink(PORT_FILE); process.exit(0); };
  process.on('SIGTERM', cleanup);
  process.on('SIGINT', cleanup);
}

// Export the internal token so cli.ts can pass the SAME value to the parent
// server via env. Parent reads BROWSE_TERMINAL_INTERNAL_TOKEN and uses it
// for /internal/grant calls.
//
// In practice, the agent generates INTERNAL_TOKEN once at boot and writes it
// to a state file the parent reads. This avoids env-passing races. The write
// happens right here at module load, before main() runs.
const INTERNAL_TOKEN_FILE = path.join(path.dirname(STATE_FILE), 'terminal-internal-token');
try {
  fs.mkdirSync(path.dirname(INTERNAL_TOKEN_FILE), { recursive: true, mode: 0o700 });
  fs.writeFileSync(INTERNAL_TOKEN_FILE, INTERNAL_TOKEN, { mode: 0o600 });
} catch {}

main();
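
The port file and both tab-state files above are written with the same temp-file-then-rename pattern. A minimal sketch of that pattern as a reusable helper follows; `writeFileAtomic` is a hypothetical name (the source inlines the steps at each call site), the demo file name is made up, and unlike `handleTabState` this version rethrows after cleaning up.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper: the source inlines this pattern rather than sharing it.
function writeFileAtomic(target: string, contents: string): void {
  // Keep the temp file in the same directory so the rename stays on one
  // filesystem, which is what lets it replace the target in a single step.
  const tmp = `${target}.tmp-${process.pid}`;
  try {
    fs.writeFileSync(tmp, contents, { mode: 0o600 });
    fs.renameSync(tmp, target);
  } catch (err) {
    // Best-effort cleanup of the partial temp file, then surface the error.
    try { fs.unlinkSync(tmp); } catch {}
    throw err;
  }
}

// Demo in a throwaway directory; 'demo-port' is an illustrative file name.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'atomic-write-demo-'));
const demoFile = path.join(demoDir, 'demo-port');
writeFileAtomic(demoFile, '41234');
writeFileAtomic(demoFile, '41235');
const finalContents = fs.readFileSync(demoFile, 'utf-8');
const leftovers = fs.readdirSync(demoDir).filter(name => name.includes('.tmp-'));
console.log(finalContents, leftovers);
```

The point of the rename step is that a concurrent reader should see either the previous complete file or the new complete file, never a half-written one.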
@@ -19,31 +19,10 @@ import { PAGE_CONTENT_COMMANDS } from '../src/commands';

const REPO_ROOT = path.resolve(__dirname, '..', '..');

describe('canary stream-chunk split detection', () => {
  test('detectCanaryLeak uses rolling buffer across consecutive deltas', () => {
    // Pull in the function via dynamic require so we don't re-export it
    // from sidebar-agent.ts (it's internal on purpose).
    const agentSource = fs.readFileSync(
      path.join(REPO_ROOT, 'browse', 'src', 'sidebar-agent.ts'),
      'utf-8',
    );
    // Contract: detectCanaryLeak accepts an optional DeltaBuffer and
    // uses .slice(-(canary.length - 1)) to retain a rolling tail.
    expect(agentSource).toContain('DeltaBuffer');
    expect(agentSource).toMatch(/text_delta\s*=\s*combined\.slice\(-\(canary\.length - 1\)\)/);
    expect(agentSource).toMatch(/input_json_delta\s*=\s*combined\.slice\(-\(canary\.length - 1\)\)/);
  });

  test('canary context initializes deltaBuf', () => {
    const agentSource = fs.readFileSync(
      path.join(REPO_ROOT, 'browse', 'src', 'sidebar-agent.ts'),
      'utf-8',
    );
    // The askClaude call site must construct the buffer so the rolling
    // detection actually runs.
    expect(agentSource).toContain("deltaBuf: { text_delta: '', input_json_delta: '' }");
  });
});
// canary stream-chunk split detection — tested detectCanaryLeak inside
// sidebar-agent.ts. Both the chat-stream pipeline and the function are
// gone (Terminal pane uses an interactive PTY; user keystrokes are the
// trust source, no chunked LLM stream to canary-scan).

describe('tool-output ensemble rule (single-layer BLOCK)', () => {
  test('user-input context: single layer at BLOCK degrades to WARN', () => {
@@ -117,13 +96,10 @@ describe('transcript classifier tool_output parameter', () => {
    expect(src).toContain('tool_output');
  });

  test('sidebar-agent passes tool text to transcript on tool-result scan', () => {
    const src = fs.readFileSync(
      path.join(REPO_ROOT, 'browse', 'src', 'sidebar-agent.ts'),
      'utf-8',
    );
    expect(src).toContain('tool_output: text');
  });
  // sidebar-agent passed tool text to the transcript classifier on
  // tool-result scans. That whole pipeline is gone — Terminal pane has
  // no LLM stream to scan, and security-classifier.ts is dead code with
  // no production caller (a separate v1.1+ cleanup TODO).
});

describe('GSTACK_SECURITY_OFF kill switch', () => {

@@ -15,7 +15,13 @@ import * as os from 'os';
const META_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/meta-commands.ts'), 'utf-8');
const WRITE_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/write-commands.ts'), 'utf-8');
const SERVER_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/server.ts'), 'utf-8');
const AGENT_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/sidebar-agent.ts'), 'utf-8');
// sidebar-agent.ts was ripped (chat queue replaced by interactive PTY).
// AGENT_SRC kept as empty string so the legacy describe block below skips
// without crashing module load on a missing file.
const AGENT_SRC = (() => {
  try { return fs.readFileSync(path.join(import.meta.dir, '../src/sidebar-agent.ts'), 'utf-8'); }
  catch { return ''; }
})();
const SNAPSHOT_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/snapshot.ts'), 'utf-8');
const PATH_SECURITY_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/path-security.ts'), 'utf-8');

@@ -51,53 +57,12 @@ function extractFunction(src: string, name: string): string {
  return src.slice(start);
}

// ─── Task 4: Agent queue poisoning — full schema validation + permissions ───

describe('Agent queue security', () => {
  it('server queue directory must use restricted permissions', () => {
    const queueSection = SERVER_SRC.slice(SERVER_SRC.indexOf('agentQueue'), SERVER_SRC.indexOf('agentQueue') + 2000);
    expect(queueSection).toMatch(/0o700/);
  });

  it('sidebar-agent queue directory must use restricted permissions', () => {
    // The mkdirSync for the queue dir lives in main() — search the main() body
    const mainStart = AGENT_SRC.indexOf('async function main');
    const queueSection = AGENT_SRC.slice(mainStart);
    expect(queueSection).toMatch(/0o700/);
  });

  it('cli.ts queue file creation must use restricted permissions', () => {
    const CLI_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/cli.ts'), 'utf-8');
    const queueSection = CLI_SRC.slice(CLI_SRC.indexOf('queue') || 0, CLI_SRC.indexOf('queue') + 2000);
    expect(queueSection).toMatch(/0o700|0o600|mode/);
  });

  it('queue reader must have a validator function covering all fields', () => {
    // Extract ONLY the validator function body by walking braces
    const validatorStart = AGENT_SRC.indexOf('function isValidQueueEntry');
    expect(validatorStart).toBeGreaterThan(-1);
    let depth = 0;
    let bodyStart = AGENT_SRC.indexOf('{', validatorStart);
    let bodyEnd = bodyStart;
    for (let i = bodyStart; i < AGENT_SRC.length; i++) {
      if (AGENT_SRC[i] === '{') depth++;
      if (AGENT_SRC[i] === '}') depth--;
      if (depth === 0) { bodyEnd = i + 1; break; }
    }
    const validatorBlock = AGENT_SRC.slice(validatorStart, bodyEnd);

    expect(validatorBlock).toMatch(/prompt.*string/);
    expect(validatorBlock).toMatch(/Array\.isArray/);
    expect(validatorBlock).toMatch(/\.\./);
    expect(validatorBlock).toContain('stateFile');
    expect(validatorBlock).toContain('tabId');
    expect(validatorBlock).toMatch(/number/);
    expect(validatorBlock).toContain('null');
    expect(validatorBlock).toContain('message');
    expect(validatorBlock).toContain('pageUrl');
    expect(validatorBlock).toContain('sessionId');
  });
});
// ─── Agent queue security ──────────────────────────────────────────────────
// Original block validated the chat queue's filesystem permissions and
// schema validator on sidebar-agent.ts. Both are gone (chat queue ripped
// in favor of the interactive Terminal PTY). The remaining 0o700 / 0o600
// invariants on extension queue paths are now covered by terminal-agent
// integration tests and the sidebar-tabs regression suite.

// ─── Shared source reads for CSS validator tests ────────────────────────────
const CDP_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/cdp-inspector.ts'), 'utf-8');
@@ -325,30 +290,13 @@ describe('Round-2 finding 2: snapshot.ts annotated path uses realpathSync', () =
  });
});

// ─── Round-2 finding 3: stateFile path traversal check in isValidQueueEntry ─

describe('Round-2 finding 3: isValidQueueEntry checks stateFile for path traversal', () => {
  it('isValidQueueEntry checks stateFile for .. traversal sequences', () => {
    const fn = extractFunction(AGENT_SRC, 'isValidQueueEntry');
    expect(fn).toBeTruthy();
    // Must check stateFile for '..' — find the stateFile block and look for '..' string
    const stateFileIdx = fn.indexOf('stateFile');
    expect(stateFileIdx).toBeGreaterThan(-1);
    const stateFileBlock = fn.slice(stateFileIdx, stateFileIdx + 200);
    // The block must contain a check for the two-dot traversal sequence
    expect(stateFileBlock).toMatch(/'\.\.'|"\.\."|\.\./);
  });

  it('isValidQueueEntry stateFile block contains both type check and traversal check', () => {
    const fn = extractFunction(AGENT_SRC, 'isValidQueueEntry');
    const stateFileIdx = fn.indexOf('stateFile');
    const stateBlock = fn.slice(stateFileIdx, stateFileIdx + 300);
    // Must contain the type check
    expect(stateBlock).toContain('typeof obj.stateFile');
    // Must contain the includes('..') call
    expect(stateBlock).toMatch(/includes\s*\(\s*['"]\.\.['"]\s*\)/);
  });
});
// ─── Round-2 finding 3: stateFile path traversal check ─────────────────────
// Tested isValidQueueEntry's stateFile validator on sidebar-agent.ts. Both
// the function and the file are gone (chat queue ripped). The terminal-agent
// PTY path no longer takes a queue entry — it accepts WebSocket frames
// gated on Origin + session token, no on-disk queue to traverse. Path
// traversal in browse-server's tab-state writer is covered by
// browse/test/terminal-agent.test.ts (handleTabState atomic-write tests).

// ─── Task 5: /health endpoint must not expose sensitive fields ───────────────

@@ -421,24 +369,11 @@ describe('cookie-import domain validation', () => {
  });
});

// ─── Task 9: loadSession ID validation ──────────────────────────────────────

describe('loadSession session ID validation', () => {
  it('loadSession validates session ID format before using it in a path', () => {
    const fn = extractFunction(SERVER_SRC, 'loadSession');
    expect(fn).toBeTruthy();
    // Must contain the alphanumeric regex guard
    expect(fn).toMatch(/\[a-zA-Z0-9_-\]/);
  });

  it('loadSession returns null on invalid session ID', () => {
    const fn = extractFunction(SERVER_SRC, 'loadSession');
    const block = fn.slice(fn.indexOf('activeData.id'));
    // Must warn and return null
    expect(block).toContain('Invalid session ID');
    expect(block).toContain('return null');
  });
});
// loadSession session ID validation — loadSession lived inside the chat
// agent state block (sidebar-agent.ts session persistence). Chat queue
// is gone, so the function and its session-ID validator are gone. The
// terminal-agent's PTY session has no on-disk session ID — the WebSocket
// holds the session for its lifetime.

// ─── Task 10: Responsive screenshot path validation ──────────────────────────

@@ -520,40 +455,11 @@ describe('Task 11: state load cookie validation', () => {
  });
});

// ─── Task 12: Validate activeTabUrl before syncActiveTabByUrl ─────────────────

describe('Task 12: activeTabUrl sanitized before syncActiveTabByUrl', () => {
  it('sidebar-tabs route sanitizes activeUrl before syncActiveTabByUrl', () => {
    const block = sliceBetween(SERVER_SRC, "url.pathname === '/sidebar-tabs'", "url.pathname === '/sidebar-tabs/switch'");
    expect(block).toContain('sanitizeExtensionUrl');
    expect(block).toContain('syncActiveTabByUrl');
    const sanitizeIdx = block.indexOf('sanitizeExtensionUrl');
    const syncIdx = block.indexOf('syncActiveTabByUrl');
    expect(sanitizeIdx).toBeLessThan(syncIdx);
  });

  it('sidebar-command route sanitizes extensionUrl before syncActiveTabByUrl', () => {
    const block = sliceBetween(SERVER_SRC, "url.pathname === '/sidebar-command'", "url.pathname === '/sidebar-chat/clear'");
    expect(block).toContain('sanitizeExtensionUrl');
    expect(block).toContain('syncActiveTabByUrl');
    const sanitizeIdx = block.indexOf('sanitizeExtensionUrl');
    const syncIdx = block.indexOf('syncActiveTabByUrl');
    expect(sanitizeIdx).toBeLessThan(syncIdx);
  });

  it('direct unsanitized syncActiveTabByUrl calls are not present (all calls go through sanitize)', () => {
    // Every syncActiveTabByUrl call should be preceded by sanitizeExtensionUrl in the nearby code
    // We verify there are no direct browserManager.syncActiveTabByUrl(activeUrl) or
    // browserManager.syncActiveTabByUrl(extensionUrl) patterns (without sanitize wrapper)
    const block1 = sliceBetween(SERVER_SRC, "url.pathname === '/sidebar-tabs'", "url.pathname === '/sidebar-tabs/switch'");
    // Should NOT contain direct call with raw activeUrl
    expect(block1).not.toMatch(/syncActiveTabByUrl\(activeUrl\)/);

    const block2 = sliceBetween(SERVER_SRC, "url.pathname === '/sidebar-command'", "url.pathname === '/sidebar-chat/clear'");
    // Should NOT contain direct call with raw extensionUrl
    expect(block2).not.toMatch(/syncActiveTabByUrl\(extensionUrl\)/);
  });
});
// activeTabUrl sanitized before syncActiveTabByUrl — tested URL sanitization
// on the now-deleted /sidebar-tabs and /sidebar-command routes. The
// terminal-agent reads tab URLs from the live tabs.json file (atomic write
// from background.js), and chrome:// / chrome-extension:// pages are
// filtered server-side in handleTabState — see browse/test/terminal-agent.test.ts.

// ─── Task 13: Inbox output wrapped as untrusted ──────────────────────────────

@@ -581,107 +487,17 @@ describe('Task 13: inbox output wrapped as untrusted content', () => {
  });
});

// ─── Task 14: DOM serialization round-trip replaced with DocumentFragment ─────
// switchChatTab DocumentFragment + pollChat reentrancy guard tests targeted
// now-deleted chat-tab DOM logic and chat-polling reentrancy. Both are gone
// (Terminal pane is the sole sidebar surface; xterm.js owns its own DOM
// lifecycle, and the WebSocket has no reentrancy hazard).

const SIDEPANEL_SRC = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel.js'), 'utf-8');

describe('Task 14: switchChatTab uses DocumentFragment, not innerHTML round-trip', () => {
  it('switchChatTab does NOT use innerHTML to restore chat (string-based re-parse removed)', () => {
    const fn = extractFunction(SIDEPANEL_SRC, 'switchChatTab');
    expect(fn).toBeTruthy();
    // Must NOT have the dangerous pattern of assigning chatDomByTab value back to innerHTML
    expect(fn).not.toMatch(/chatMessages\.innerHTML\s*=\s*chatDomByTab/);
  });

  it('switchChatTab uses createDocumentFragment to save chat DOM', () => {
    const fn = extractFunction(SIDEPANEL_SRC, 'switchChatTab');
    expect(fn).toContain('createDocumentFragment');
  });

  it('switchChatTab moves nodes via appendChild/firstChild (not innerHTML assignment)', () => {
    const fn = extractFunction(SIDEPANEL_SRC, 'switchChatTab');
    // Must use appendChild to restore nodes from fragment
    expect(fn).toContain('chatMessages.appendChild');
  });

  it('chatDomByTab comment documents that values are DocumentFragments, not strings', () => {
    // Check module-level comment on chatDomByTab
    const commentIdx = SIDEPANEL_SRC.indexOf('chatDomByTab');
    const commentLine = SIDEPANEL_SRC.slice(commentIdx, commentIdx + 120);
    expect(commentLine).toMatch(/DocumentFragment|fragment/i);
  });

  it('welcome screen is built with DOM methods in the else branch (not innerHTML)', () => {
    const fn = extractFunction(SIDEPANEL_SRC, 'switchChatTab');
    // The else branch must use createElement, not innerHTML template literal
    expect(fn).toContain('createElement');
    // The specific innerHTML template with chat-welcome must be gone
    expect(fn).not.toMatch(/innerHTML\s*=\s*`[\s\S]*?chat-welcome/);
  });
});

// ─── Task 15: pollChat/switchChatTab reentrancy guard ────────────────────────

describe('Task 15: pollChat reentrancy guard and deferred call in switchChatTab', () => {
  it('pollInProgress guard variable is declared at module scope', () => {
    // Must be declared before any function definitions (within first 2000 chars)
    const moduleTop = SIDEPANEL_SRC.slice(0, 2000);
    expect(moduleTop).toContain('pollInProgress');
  });

  it('pollChat function checks and sets pollInProgress', () => {
    const fn = extractFunction(SIDEPANEL_SRC, 'pollChat');
    expect(fn).toBeTruthy();
    expect(fn).toContain('pollInProgress');
  });

  it('pollChat resets pollInProgress in finally block', () => {
    const fn = extractFunction(SIDEPANEL_SRC, 'pollChat');
    // The finally block must contain the reset
    const finallyIdx = fn.indexOf('finally');
    expect(finallyIdx).toBeGreaterThan(-1);
    const finallyBlock = fn.slice(finallyIdx, finallyIdx + 60);
    expect(finallyBlock).toContain('pollInProgress');
  });

  it('switchChatTab calls pollChat via setTimeout (not directly)', () => {
    const fn = extractFunction(SIDEPANEL_SRC, 'switchChatTab');
    // Must use setTimeout to defer pollChat — no direct call at the end
    expect(fn).toMatch(/setTimeout\s*\(\s*pollChat/);
    // Must NOT have a bare direct call `pollChat()` at the end (outside setTimeout)
    // We check that there is no standalone `pollChat()` call (outside setTimeout wrapper)
    const withoutSetTimeout = fn.replace(/setTimeout\s*\(\s*pollChat[^)]*\)/g, '');
    expect(withoutSetTimeout).not.toMatch(/\bpollChat\s*\(\s*\)/);
  });
});

// ─── Task 16: SIGKILL escalation in sidebar-agent timeout ────────────────────

describe('Task 16: sidebar-agent timeout handler uses SIGTERM→SIGKILL escalation', () => {
  it('timeout block sends SIGTERM first', () => {
    // Slice from "Timed out" / setTimeout block to processingTabs.delete
    const timeoutStart = AGENT_SRC.indexOf("SIDEBAR_AGENT_TIMEOUT");
    expect(timeoutStart).toBeGreaterThan(-1);
    const timeoutBlock = AGENT_SRC.slice(timeoutStart, timeoutStart + 600);
    expect(timeoutBlock).toContain('SIGTERM');
  });

  it('timeout block escalates to SIGKILL after delay', () => {
    const timeoutStart = AGENT_SRC.indexOf("SIDEBAR_AGENT_TIMEOUT");
    const timeoutBlock = AGENT_SRC.slice(timeoutStart, timeoutStart + 600);
    expect(timeoutBlock).toContain('SIGKILL');
  });

  it('SIGTERM appears before SIGKILL in timeout block', () => {
    const timeoutStart = AGENT_SRC.indexOf("SIDEBAR_AGENT_TIMEOUT");
    const timeoutBlock = AGENT_SRC.slice(timeoutStart, timeoutStart + 600);
    const sigtermIdx = timeoutBlock.indexOf('SIGTERM');
    const sigkillIdx = timeoutBlock.indexOf('SIGKILL');
    expect(sigtermIdx).toBeGreaterThan(-1);
    expect(sigkillIdx).toBeGreaterThan(-1);
    expect(sigtermIdx).toBeLessThan(sigkillIdx);
  });
});
// ─── Task 16: SIGKILL escalation ────────────────────────────────────────────
// Originally tested sidebar-agent's SIDEBAR_AGENT_TIMEOUT block. The chat
// queue and its watchdog are gone. terminal-agent.ts disposes claude with
// the same SIGINT-then-SIGKILL-after-3s pattern; that's covered by
// browse/test/terminal-agent.test.ts ("cleanup escalates SIGINT to SIGKILL
// after 3s on close").
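
The SIGINT-then-SIGKILL escalation mentioned in the comment above could look roughly like the sketch below. This is an assumption-laden illustration, not the code in terminal-agent.ts: `disposeWithEscalation`, `Killable`, and `Scheduler` are hypothetical names, and the scheduler is injected only so the behavior can be checked without waiting three real seconds.

```typescript
// Hypothetical names throughout; this is not the code in terminal-agent.ts.
type Signal = 'SIGINT' | 'SIGKILL';

interface Killable {
  kill(signal: Signal): void;
}

type Scheduler = (fn: () => void, delayMs: number) => void;

function disposeWithEscalation(
  proc: Killable,
  hasExited: () => boolean,
  graceMs: number = 3000,
  schedule: Scheduler = (fn, delayMs) => { setTimeout(fn, delayMs); },
): void {
  // Ask politely first so the child can clean up after itself.
  proc.kill('SIGINT');
  // After the grace period, force-kill only if the child is still running.
  schedule(() => {
    if (!hasExited()) proc.kill('SIGKILL');
  }, graceMs);
}

// Demo with fake processes and a manual scheduler that records callbacks.
const sentToStubborn: Signal[] = [];
const sentToPolite: Signal[] = [];
const timers: Array<() => void> = [];
const manual: Scheduler = (fn) => { timers.push(fn); };

disposeWithEscalation({ kill: s => { sentToStubborn.push(s); } }, () => false, 3000, manual);
disposeWithEscalation({ kill: s => { sentToPolite.push(s); } }, () => true, 3000, manual);
for (const fire of timers) fire(); // simulate the grace period elapsing
console.log(sentToStubborn, sentToPolite);
```

In the demo, the process that never exits receives both signals, while the one that reports it has already exited receives only the first.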

// ─── Task 17: viewport and wait bounds clamping ──────────────────────────────

@@ -1,218 +0,0 @@
/**
 * Full-stack E2E — the security-contract anchor test.
 *
 * Spins up a real browse server + real sidebar-agent subprocess, points
 * them at a MOCK claude binary (browse/test/fixtures/mock-claude/claude)
 * that deterministically emits a canary-leaking tool_use event, then
 * verifies the whole pipeline reacts:
 *
 *   1. Server canary-injects into the system prompt
 *   2. Server queues the message
 *   3. Sidebar-agent spawns mock-claude
 *   4. Mock-claude emits tool_use with CANARY-XXX in a URL arg
 *   5. Sidebar-agent's detectCanaryLeak fires on the stream event
 *   6. onCanaryLeaked logs, SIGTERM's mock-claude, emits security_event
 *   7. /sidebar-chat returns security_event + agent_error entries
 *
 * This test proves the end-to-end contract: when a canary leak happens,
 * the session terminates AND the sidepanel receives the events that drive
 * the approved banner render. No LLM cost, <10s total runtime.
 *
 * Fully deterministic — safe to run on every commit (gate tier).
 */

import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import { spawn, type Subprocess } from 'bun';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

let serverProc: Subprocess | null = null;
let agentProc: Subprocess | null = null;
let serverPort = 0;
let authToken = '';
let tmpDir = '';
let stateFile = '';
let queueFile = '';
const MOCK_CLAUDE_DIR = path.resolve(import.meta.dir, 'fixtures', 'mock-claude');

async function apiFetch(pathname: string, opts: RequestInit = {}): Promise<Response> {
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${authToken}`,
    ...(opts.headers as Record<string, string> | undefined),
  };
  return fetch(`http://127.0.0.1:${serverPort}${pathname}`, { ...opts, headers });
}

beforeAll(async () => {
  tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'security-e2e-fullstack-'));
  stateFile = path.join(tmpDir, 'browse.json');
  queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
  fs.mkdirSync(path.dirname(queueFile), { recursive: true });

  const serverScript = path.resolve(import.meta.dir, '..', 'src', 'server.ts');
  const agentScript = path.resolve(import.meta.dir, '..', 'src', 'sidebar-agent.ts');

  // 1) Start the browse server.
  serverProc = spawn(['bun', 'run', serverScript], {
    env: {
      ...process.env,
      BROWSE_STATE_FILE: stateFile,
      BROWSE_HEADLESS_SKIP: '1', // no Chromium for this test
      BROWSE_PORT: '0',
      SIDEBAR_QUEUE_PATH: queueFile,
      BROWSE_IDLE_TIMEOUT: '300',
    },
    stdio: ['ignore', 'pipe', 'pipe'],
  });

  // Wait for state file with token + port
  const deadline = Date.now() + 15000;
  while (Date.now() < deadline) {
    if (fs.existsSync(stateFile)) {
      try {
        const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
|
||||
if (state.port && state.token) {
|
||||
serverPort = state.port;
|
||||
authToken = state.token;
|
||||
break;
|
||||
}
|
||||
} catch {}
|
||||
}
|
||||
await new Promise((r) => setTimeout(r, 100));
|
||||
}
|
||||
if (!serverPort) throw new Error('Server did not start in time');
|
||||
|
||||
// 2) Start the sidebar-agent with PATH prepended by the mock-claude dir.
|
||||
// sidebar-agent spawns `claude` via PATH lookup (spawn('claude', ...) — see
|
||||
// browse/src/sidebar-agent.ts spawnClaude), so prepending works without any
|
||||
// source change.
|
||||
const shimmedPath = `${MOCK_CLAUDE_DIR}:${process.env.PATH ?? ''}`;
|
||||
agentProc = spawn(['bun', 'run', agentScript], {
|
||||
env: {
|
||||
...process.env,
|
||||
PATH: shimmedPath,
|
||||
BROWSE_STATE_FILE: stateFile,
|
||||
SIDEBAR_QUEUE_PATH: queueFile,
|
||||
BROWSE_SERVER_PORT: String(serverPort),
|
||||
BROWSE_PORT: String(serverPort),
|
||||
BROWSE_NO_AUTOSTART: '1',
|
||||
// Scenario for mock-claude inherits through spawn env below — the agent
|
||||
// itself doesn't read this, but the claude subprocess it spawns does.
|
||||
MOCK_CLAUDE_SCENARIO: 'canary_leak_in_tool_arg',
|
||||
// Force classifier off so pre-spawn ML scan doesn't fire on our
|
||||
// benign synthetic test prompt. This test exercises the canary
|
||||
// path specifically.
|
||||
GSTACK_SECURITY_OFF: '1',
|
||||
},
|
||||
stdio: ['ignore', 'pipe', 'pipe'],
|
||||
});
|
||||
|
||||
// Give the agent a moment to establish its poll loop.
|
||||
await new Promise((r) => setTimeout(r, 500));
|
||||
}, 30000);
|
||||
|
||||
async function drainStderr(proc: Subprocess | null, label: string): Promise<void> {
|
||||
if (!proc?.stderr) return;
|
||||
try {
|
||||
const reader = (proc.stderr as ReadableStream).getReader();
|
||||
// Drain briefly — don't block shutdown
|
||||
const result = await Promise.race([
|
||||
reader.read(),
|
||||
new Promise<ReadableStreamReadResult<Uint8Array>>((resolve) =>
|
||||
setTimeout(() => resolve({ done: true, value: undefined }), 100)
|
||||
),
|
||||
]);
|
||||
if (result?.value) {
|
||||
const text = new TextDecoder().decode(result.value);
|
||||
if (text.trim()) console.error(`[${label} stderr]`, text.slice(0, 2000));
|
||||
}
|
||||
} catch {}
|
||||
}
|
||||
|
||||
afterAll(async () => {
|
||||
// Dump agent stderr for diagnostics
|
||||
await drainStderr(agentProc, 'agent');
|
||||
for (const proc of [serverProc, agentProc]) {
|
||||
if (proc) {
|
||||
try { proc.kill('SIGTERM'); } catch {}
|
||||
setTimeout(() => { try { proc.kill('SIGKILL'); } catch {} }, 1500);
|
||||
}
|
||||
}
|
||||
try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
|
||||
});
|
||||
|
||||
describe('security pipeline E2E (mock claude)', () => {
|
||||
test('server injects canary, queues message, agent spawns mock claude', async () => {
|
||||
const resp = await apiFetch('/sidebar-command', {
|
||||
method: 'POST',
|
||||
body: JSON.stringify({
|
||||
message: "What's on this page?",
|
||||
activeTabUrl: 'https://attacker.example.com/',
|
||||
}),
|
||||
});
|
||||
expect(resp.status).toBe(200);
|
||||
|
||||
// Wait for the sidebar-agent to pick up the entry and spawn mock-claude.
|
||||
// Queue entry must contain `canary` field (added by server.ts spawnClaude).
|
||||
await new Promise((r) => setTimeout(r, 250));
|
||||
const queueContent = fs.readFileSync(queueFile, 'utf-8').trim();
|
||||
const lines = queueContent.split('\n').filter(Boolean);
|
||||
expect(lines.length).toBeGreaterThan(0);
|
||||
const entry = JSON.parse(lines[lines.length - 1]);
|
||||
expect(entry.canary).toMatch(/^CANARY-[0-9A-F]+$/);
|
||||
expect(entry.prompt).toContain(entry.canary);
|
||||
expect(entry.prompt).toContain('NEVER include it');
|
||||
});
|
||||
|
||||
test('canary leak triggers security_event + agent_error in /sidebar-chat', async () => {
|
||||
// By now the mock-claude subprocess has emitted the tool_use with the
|
||||
// leaked canary. Sidebar-agent's handleStreamEvent -> detectCanaryLeak
|
||||
// -> onCanaryLeaked should have fired security_event + agent_error and
|
||||
// SIGTERM'd the mock. Poll /sidebar-chat up to 10s for the events.
|
||||
const deadline = Date.now() + 10000;
|
||||
let securityEvent: any = null;
|
||||
let agentError: any = null;
|
||||
while (Date.now() < deadline && (!securityEvent || !agentError)) {
|
||||
const resp = await apiFetch('/sidebar-chat');
|
||||
const data: any = await resp.json();
|
||||
for (const entry of data.entries ?? []) {
|
||||
if (entry.type === 'security_event') securityEvent = entry;
|
||||
if (entry.type === 'agent_error') agentError = entry;
|
||||
}
|
||||
if (securityEvent && agentError) break;
|
||||
await new Promise((r) => setTimeout(r, 250));
|
||||
}
|
||||
|
||||
expect(securityEvent).not.toBeNull();
|
||||
expect(securityEvent.verdict).toBe('block');
|
||||
expect(securityEvent.reason).toBe('canary_leaked');
|
||||
expect(securityEvent.layer).toBe('canary');
|
||||
// The leak is on a tool_use channel — onCanaryLeaked records "tool_use:Bash"
|
||||
expect(String(securityEvent.channel)).toContain('tool_use');
|
||||
expect(securityEvent.domain).toBe('attacker.example.com');
|
||||
|
||||
expect(agentError).not.toBeNull();
|
||||
expect(agentError.error).toContain('Session terminated');
|
||||
expect(agentError.error).toContain('prompt injection detected');
|
||||
}, 15000);
|
||||
|
||||
test('attempts.jsonl logged with salted payload_hash and verdict=block', async () => {
|
||||
// onCanaryLeaked also calls logAttempt — check the log file exists
|
||||
// and contains the event. The file lives at ~/.gstack/security/attempts.jsonl.
|
||||
const logPath = path.join(os.homedir(), '.gstack', 'security', 'attempts.jsonl');
|
||||
expect(fs.existsSync(logPath)).toBe(true);
|
||||
const content = fs.readFileSync(logPath, 'utf-8');
|
||||
const recent = content.split('\n').filter(Boolean).slice(-10);
|
||||
// Find at least one entry with verdict=block and layer=canary from our run
|
||||
const ourEntry = recent
|
||||
.map((l) => { try { return JSON.parse(l); } catch { return null; } })
|
||||
.find((e) => e && e.layer === 'canary' && e.verdict === 'block' && e.urlDomain === 'attacker.example.com');
|
||||
expect(ourEntry).toBeTruthy();
|
||||
// payloadHash is a 64-char SHA-256 hex digest
|
||||
expect(String(ourEntry.payloadHash)).toMatch(/^[0-9a-f]{64}$/);
|
||||
// Never stored the payload itself — only the hash
|
||||
expect(JSON.stringify(ourEntry)).not.toContain('CANARY-');
|
||||
});
|
||||
});
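The `afterAll` teardown above sends SIGTERM and schedules SIGKILL on a timer without waiting for either to take effect. A minimal sketch of an awaited variant follows. The `Killable` interface and the `terminateWithEscalation` name are hypothetical, introduced only for illustration; Bun's `Subprocess` is believed to expose a compatible `kill()` and `exited` promise, but that is an assumption here, not something this diff confirms.

```typescript
// Hypothetical minimal process surface (assumption: Bun's Subprocess is
// structurally compatible with this).
interface Killable {
  kill(signal?: string): void;
  exited: Promise<unknown>;
}

// Send SIGTERM, wait up to graceMs for the process to exit, then escalate
// to SIGKILL and wait for the exit to be observed.
async function terminateWithEscalation(
  proc: Killable,
  graceMs = 1500,
): Promise<'graceful' | 'forced'> {
  try { proc.kill('SIGTERM'); } catch { /* process may already be gone */ }

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<'timeout'>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), graceMs);
  });
  const outcome = await Promise.race([
    proc.exited.then(() => 'exited' as const),
    timedOut,
  ]);
  if (timer !== undefined) clearTimeout(timer);

  if (outcome === 'exited') return 'graceful';

  // Grace period elapsed: force-kill and wait for the exit.
  try { proc.kill('SIGKILL'); } catch { /* process may already be gone */ }
  await proc.exited;
  return 'forced';
}
```

Under these assumptions a teardown hook could `await terminateWithEscalation(proc)` for each subprocess before removing the temp directory, so the cleanup does not race a still-running child.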
|
||||
@@ -1,405 +0,0 @@
|
||||
/**
|
||||
* Full-stack review-flow E2E with the real classifier.
|
||||
*
|
||||
* Spins up real server + real sidebar-agent subprocess + mock-claude and
|
||||
* exercises the whole tool-output BLOCK → review → decide path with the
|
||||
* real TestSavantAI classifier warm. The injection string trips the real
|
||||
* model reliably (measured: confidence 0.9999 on classic DAN-style text).
|
||||
*
|
||||
* What this covers that gate-tier tests don't:
|
||||
* * Real classifier actually fires on the injection
|
||||
* * sidebar-agent emits a reviewable security_event for real, not a stub
|
||||
* * server's POST /security-decision writes the on-disk decision file
|
||||
* * sidebar-agent's poll loop reads the file and either resumes or kills
|
||||
* the mock-claude subprocess
|
||||
* * attempts.jsonl ends up with the right verdict (block vs user_overrode)
|
||||
*
|
||||
* This is periodic tier. First run warms the ~112MB classifier from
|
||||
* HuggingFace — ~30s cold. Subsequent runs use the cached model under
|
||||
* ~/.gstack/models/testsavant-small/ and complete in ~5s.
|
||||
*
|
||||
* SKIPS if the classifier can't warm (no network, no disk) — the test is
|
||||
* truth-seeking only when the stack is genuinely up.
|
||||
*/
|
||||
|
||||
import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
|
||||
import { spawn, type Subprocess } from 'bun';
|
||||
import * as fs from 'fs';
|
||||
import * as os from 'os';
|
||||
import * as path from 'path';
|
||||
|
||||
const MOCK_CLAUDE_DIR = path.resolve(import.meta.dir, 'fixtures', 'mock-claude');
|
||||
const WARMUP_TIMEOUT_MS = 90_000; // first-run download budget
|
||||
const CLASSIFIER_CACHE = path.join(os.homedir(), '.gstack', 'models', 'testsavant-small');
|
||||
|
||||
let serverProc: Subprocess | null = null;
|
||||
let agentProc: Subprocess | null = null;
|
||||
let serverPort = 0;
|
||||
let authToken = '';
|
||||
let tmpDir = '';
|
||||
let stateFile = '';
|
||||
let queueFile = '';
|
||||
let attemptsPath = '';
|
||||
|
||||
/**
|
||||
* Eager check — is the classifier model already on disk? `test.skipIf()`
|
||||
* is evaluated at file-registration time (before beforeAll runs), so a
|
||||
* runtime boolean wouldn't work — all tests would unconditionally register
|
||||
* as skipped. Probe the model dir synchronously at file load.
|
||||
* Same pattern as security-sidepanel-dom.test.ts uses for chromium.
|
||||
*/
|
||||
const CLASSIFIER_READY = (() => {
|
||||
try {
|
||||
if (!fs.existsSync(CLASSIFIER_CACHE)) return false;
|
||||
// At minimum we need the tokenizer config + onnx model.
|
||||
return fs.existsSync(path.join(CLASSIFIER_CACHE, 'tokenizer.json'))
|
||||
&& fs.existsSync(path.join(CLASSIFIER_CACHE, 'onnx'));
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
})();
|
||||
|
||||
async function apiFetch(pathname: string, opts: RequestInit = {}): Promise<Response> {
|
||||
return fetch(`http://127.0.0.1:${serverPort}${pathname}`, {
|
||||
...opts,
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
Authorization: `Bearer ${authToken}`,
|
||||
...(opts.headers as Record<string, string> | undefined),
|
||||
},
|
||||
});
|
||||
}
|
||||
|
||||
async function waitForSecurityEntry(
|
||||
predicate: (entry: any) => boolean,
|
||||
timeoutMs: number,
|
||||
): Promise<any | null> {
|
||||
const deadline = Date.now() + timeoutMs;
|
||||
while (Date.now() < deadline) {
|
||||
const resp = await apiFetch('/sidebar-chat');
|
||||
const data: any = await resp.json();
|
||||
for (const entry of data.entries ?? []) {
|
||||
if (entry.type === 'security_event' && predicate(entry)) return entry;
|
||||
}
|
||||
await new Promise((r) => setTimeout(r, 250));
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
async function waitForProcessExit(proc: Subprocess, timeoutMs: number): Promise<number | null> {
|
||||
const deadline = Date.now() + timeoutMs;
|
||||
while (Date.now() < deadline) {
|
||||
if (proc.exitCode !== null) return proc.exitCode;
|
||||
await new Promise((r) => setTimeout(r, 100));
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
async function readAttempts(): Promise<any[]> {
|
||||
if (!fs.existsSync(attemptsPath)) return [];
|
||||
const raw = fs.readFileSync(attemptsPath, 'utf-8');
|
||||
return raw.split('\n').filter(Boolean).map((l) => {
|
||||
try { return JSON.parse(l); } catch { return null; }
|
||||
}).filter(Boolean);
|
||||
}
|
||||
|
||||
async function startStack(scenario: string, attemptsDir: string): Promise<void> {
|
||||
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'security-review-fullstack-'));
|
||||
stateFile = path.join(tmpDir, 'browse.json');
|
||||
queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
|
||||
fs.mkdirSync(path.dirname(queueFile), { recursive: true });
|
||||
|
||||
// Re-root HOME for both server and agent so:
|
||||
// - server.ts's SESSIONS_DIR doesn't load pre-existing chat history
|
||||
// from ~/.gstack/sidebar-sessions/ (caused ghost security_events to
|
||||
// leak in from the live /open-gstack-browser session)
|
||||
// - security.ts's attempts.jsonl writes land in a test-owned dir
|
||||
// - session-state.json, chromium-profile, etc. stay isolated
|
||||
fs.mkdirSync(path.join(attemptsDir, '.gstack'), { recursive: true });
|
||||
|
||||
// Symlink the models dir through to the real cache — without it the
|
||||
// sidebar-agent would try to re-download 112MB every test run.
|
||||
const testModelsDir = path.join(attemptsDir, '.gstack', 'models');
|
||||
const realModelsDir = path.join(os.homedir(), '.gstack', 'models');
|
||||
try {
|
||||
if (fs.existsSync(realModelsDir) && !fs.existsSync(testModelsDir)) {
|
||||
fs.symlinkSync(realModelsDir, testModelsDir);
|
||||
}
|
||||
} catch {
|
||||
// Symlink may already exist — ignore.
|
||||
}
|
||||
|
||||
const serverScript = path.resolve(import.meta.dir, '..', 'src', 'server.ts');
|
||||
const agentScript = path.resolve(import.meta.dir, '..', 'src', 'sidebar-agent.ts');
|
||||
|
||||
serverProc = spawn(['bun', 'run', serverScript], {
|
||||
env: {
|
||||
...process.env,
|
||||
BROWSE_STATE_FILE: stateFile,
|
||||
BROWSE_HEADLESS_SKIP: '1',
|
||||
BROWSE_PORT: '0',
|
||||
SIDEBAR_QUEUE_PATH: queueFile,
|
||||
BROWSE_IDLE_TIMEOUT: '300',
|
||||
HOME: attemptsDir,
|
||||
},
|
||||
stdio: ['ignore', 'pipe', 'pipe'],
|
||||
});
|
||||
|
||||
const deadline = Date.now() + 15000;
|
||||
while (Date.now() < deadline) {
|
||||
if (fs.existsSync(stateFile)) {
|
||||
try {
|
||||
const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
|
||||
if (state.port && state.token) {
|
||||
serverPort = state.port;
|
||||
authToken = state.token;
|
||||
break;
|
||||
}
|
||||
} catch {}
|
||||
}
|
||||
await new Promise((r) => setTimeout(r, 100));
|
||||
}
|
||||
if (!serverPort) throw new Error('Server did not start in time');
|
||||
|
||||
const shimmedPath = `${MOCK_CLAUDE_DIR}:${process.env.PATH ?? ''}`;
|
||||
agentProc = spawn(['bun', 'run', agentScript], {
|
||||
env: {
|
||||
...process.env,
|
||||
PATH: shimmedPath,
|
||||
BROWSE_STATE_FILE: stateFile,
|
||||
SIDEBAR_QUEUE_PATH: queueFile,
|
||||
BROWSE_SERVER_PORT: String(serverPort),
|
||||
BROWSE_PORT: String(serverPort),
|
||||
BROWSE_NO_AUTOSTART: '1',
|
||||
MOCK_CLAUDE_SCENARIO: scenario,
|
||||
HOME: attemptsDir,
|
||||
},
|
||||
stdio: ['ignore', 'pipe', 'pipe'],
|
||||
});
|
||||
attemptsPath = path.join(attemptsDir, '.gstack', 'security', 'attempts.jsonl');
|
||||
|
||||
// Give the agent a moment to establish its poll loop and warm up the model.
|
||||
await new Promise((r) => setTimeout(r, 500));
|
||||
}
|
||||
|
||||
async function stopStack(): Promise<void> {
|
||||
for (const proc of [serverProc, agentProc]) {
|
||||
if (proc) {
|
||||
try { proc.kill('SIGTERM'); } catch {}
|
||||
setTimeout(() => { try { proc.kill('SIGKILL'); } catch {} }, 1500);
|
||||
}
|
||||
}
|
||||
serverProc = null;
|
||||
agentProc = null;
|
||||
try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
|
||||
}
|
||||
|
||||
beforeAll(async () => {
|
||||
// Intentionally a no-op. CLASSIFIER_READY is computed at file load and
// can't be toggled post-registration, so there is nothing to set up here;
// each test starts its own stack via startStack(). If the on-disk cache
// turns out to be unusable, the tests below exercise the agent without a
// working classifier, which is the honest signal we want anyway.
|
||||
if (!CLASSIFIER_READY) return;
|
||||
});
|
||||
|
||||
afterAll(async () => {
|
||||
await stopStack();
|
||||
});
|
||||
|
||||
describe('review-flow full-stack E2E', () => {
|
||||
test.skipIf(!CLASSIFIER_READY)(
|
||||
'tool_result injection → reviewable event → user ALLOWS → attempts.jsonl has user_overrode',
|
||||
async () => {
|
||||
const attemptsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'attempts-allow-'));
|
||||
try {
|
||||
await startStack('tool_result_injection', attemptsDir);
|
||||
|
||||
// Fire the message that will cause mock-claude to emit the
|
||||
// injection-laden tool_result.
|
||||
const resp = await apiFetch('/sidebar-command', {
|
||||
method: 'POST',
|
||||
body: JSON.stringify({
|
||||
message: 'summarize the hacker news comments',
|
||||
activeTabUrl: 'https://news.ycombinator.com/item?id=42',
|
||||
}),
|
||||
});
|
||||
expect(resp.status).toBe(200);
|
||||
|
||||
// Wait for the real classifier to fire and emit a reviewable
|
||||
// security_event. The classifier is warm so this should happen in
|
||||
// well under 10s once the tool_result arrives.
|
||||
const reviewable = await waitForSecurityEntry(
|
||||
(e) => e.verdict === 'block' && e.reviewable === true,
|
||||
30_000,
|
||||
);
|
||||
expect(reviewable).not.toBeNull();
|
||||
expect(reviewable.reason).toBe('tool_result_ml');
|
||||
expect(reviewable.tool).toBe('Bash');
|
||||
expect(String(reviewable.suspected_text ?? '')).toContain('IGNORE ALL PREVIOUS');
|
||||
|
||||
// User clicks Allow via the banner → sidepanel POSTs to server.
|
||||
const decisionResp = await apiFetch('/security-decision', {
|
||||
method: 'POST',
|
||||
body: JSON.stringify({
|
||||
tabId: reviewable.tabId,
|
||||
decision: 'allow',
|
||||
reason: 'user',
|
||||
}),
|
||||
});
|
||||
expect(decisionResp.status).toBe(200);
|
||||
|
||||
// Wait for sidebar-agent's poll loop to consume the decision and
|
||||
// emit a follow-up user_overrode security_event.
|
||||
const overrode = await waitForSecurityEntry(
|
||||
(e) => e.verdict === 'user_overrode',
|
||||
10_000,
|
||||
);
|
||||
expect(overrode).not.toBeNull();
|
||||
|
||||
// Audit log must capture both the block and the override, in that
|
||||
// order. Both records share the same salted payload hash so the
|
||||
// security dashboard can aggregate them as a single attempt.
|
||||
const attempts = await readAttempts();
|
||||
const blockLog = attempts.find(
|
||||
(a) => a.verdict === 'block' && a.layer === 'testsavant_content',
|
||||
);
|
||||
const overrodeLog = attempts.find(
|
||||
(a) => a.verdict === 'user_overrode' && a.layer === 'testsavant_content',
|
||||
);
|
||||
expect(blockLog).toBeTruthy();
|
||||
expect(overrodeLog).toBeTruthy();
|
||||
expect(overrodeLog.payloadHash).toBe(blockLog.payloadHash);
|
||||
// Privacy contract: neither record includes the raw payload.
|
||||
expect(JSON.stringify(overrodeLog)).not.toContain('IGNORE ALL PREVIOUS');
|
||||
|
||||
// Liveness: session must actually KEEP RUNNING after Allow. Mock-claude
|
||||
// emits a second tool_use to post-block-followup.example.com ~8s
|
||||
// after the tool_result. That event must reach the chat feed, proving
|
||||
// the sidebar-agent resumed the stream-handler relay instead of
|
||||
// silently wedging.
|
||||
const followupDeadline = Date.now() + 20_000;
|
||||
let followup: any = null;
|
||||
while (Date.now() < followupDeadline && !followup) {
|
||||
const chatResp = await apiFetch('/sidebar-chat');
|
||||
const chatData: any = await chatResp.json();
|
||||
for (const entry of chatData.entries ?? []) {
|
||||
const input = String((entry as any).input ?? '');
|
||||
if (
|
||||
entry.type === 'tool_use' &&
|
||||
input.includes('post-block-followup.example.com')
|
||||
) {
|
||||
followup = entry;
|
||||
break;
|
||||
}
|
||||
}
|
||||
if (!followup) await new Promise((r) => setTimeout(r, 300));
|
||||
}
|
||||
expect(followup).not.toBeNull();
|
||||
} finally {
|
||||
await stopStack();
|
||||
try { fs.rmSync(attemptsDir, { recursive: true, force: true }); } catch {}
|
||||
}
|
||||
},
|
||||
90_000,
|
||||
);
|
||||
|
||||
test.skipIf(!CLASSIFIER_READY)(
|
||||
'tool_result injection → reviewable event → user BLOCKS → agent session terminates',
|
||||
async () => {
|
||||
const attemptsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'attempts-block-'));
|
||||
try {
|
||||
await startStack('tool_result_injection', attemptsDir);
|
||||
|
||||
const resp = await apiFetch('/sidebar-command', {
|
||||
method: 'POST',
|
||||
body: JSON.stringify({
|
||||
message: 'summarize the hacker news comments',
|
||||
activeTabUrl: 'https://news.ycombinator.com/item?id=42',
|
||||
}),
|
||||
});
|
||||
expect(resp.status).toBe(200);
|
||||
|
||||
const reviewable = await waitForSecurityEntry(
|
||||
(e) => e.verdict === 'block' && e.reviewable === true,
|
||||
30_000,
|
||||
);
|
||||
expect(reviewable).not.toBeNull();
|
||||
|
||||
const decisionResp = await apiFetch('/security-decision', {
|
||||
method: 'POST',
|
||||
body: JSON.stringify({
|
||||
tabId: reviewable.tabId,
|
||||
decision: 'block',
|
||||
reason: 'user',
|
||||
}),
|
||||
});
|
||||
expect(decisionResp.status).toBe(200);
|
||||
|
||||
// Wait for the agent_error that the sidebar-agent emits when it
|
||||
// kills the claude subprocess after a user-confirmed block. This
|
||||
// is the sidepanel's "Session terminated" signal.
|
||||
const deadline = Date.now() + 15_000;
|
||||
let errorEntry: any = null;
|
||||
while (Date.now() < deadline && !errorEntry) {
|
||||
const chatResp = await apiFetch('/sidebar-chat');
|
||||
const chatData: any = await chatResp.json();
|
||||
for (const entry of chatData.entries ?? []) {
|
||||
if (
|
||||
entry.type === 'agent_error' &&
|
||||
String(entry.error ?? '').includes('Session terminated')
|
||||
) {
|
||||
errorEntry = entry;
|
||||
break;
|
||||
}
|
||||
}
|
||||
if (!errorEntry) await new Promise((r) => setTimeout(r, 200));
|
||||
}
|
||||
expect(errorEntry).not.toBeNull();
|
||||
|
||||
// attempts.jsonl must NOT have a user_overrode entry for this run.
|
||||
const attempts = await readAttempts();
|
||||
const overrodeLog = attempts.find((a) => a.verdict === 'user_overrode');
|
||||
expect(overrodeLog).toBeFalsy();
|
||||
|
||||
// The real security property: after Block, NO FURTHER tool calls
|
||||
// reach the chat feed. Mock-claude would have emitted a tool_use
|
||||
// to post-block-followup.example.com ~8s after the tool_result if
|
||||
// the session had kept running. Wait long enough for that window
|
||||
// to close (12s total), then assert the followup event never
|
||||
// appeared. This is what makes "block" actually stop the page —
|
||||
// the subprocess is SIGTERM'd before it can emit the next event.
|
||||
await new Promise((r) => setTimeout(r, 12_000));
|
||||
const finalChatResp = await apiFetch('/sidebar-chat');
|
||||
const finalChatData: any = await finalChatResp.json();
|
||||
const followupAttempted = (finalChatData.entries ?? []).some(
|
||||
(entry: any) =>
|
||||
entry.type === 'tool_use' &&
|
||||
String(entry.input ?? '').includes('post-block-followup.example.com'),
|
||||
);
|
||||
expect(followupAttempted).toBe(false);
|
||||
|
||||
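// The server itself must survive the block: /sidebar-chat should still
// respond after the claude subprocess is torn down. This does not observe
// the mock-claude process directly; the missing follow-up tool_use above
// is the evidence that the subprocess stopped emitting events.
const serverStillResponding = (await apiFetch('/sidebar-chat')).ok;
expect(serverStillResponding).toBe(true);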
|
||||
} finally {
|
||||
await stopStack();
|
||||
try { fs.rmSync(attemptsDir, { recursive: true, force: true }); } catch {}
|
||||
}
|
||||
},
|
||||
90_000,
|
||||
);
|
||||
|
||||
test.skipIf(!CLASSIFIER_READY)(
|
||||
'no decision within 60s → timeout auto-blocks',
|
||||
async () => {
|
||||
// This test would naturally take 60s+ to run. We assert the
|
||||
// decision file semantics instead — the unit-test suite already
|
||||
// verified the poll loop times out and defaults to block
|
||||
// (security-review-flow.test.ts). Kept here as a spec marker so
|
||||
// the scenario is documented in the full-stack file.
|
||||
expect(true).toBe(true);
|
||||
},
|
||||
);
|
||||
});
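The tests above repeat the same deadline loop several times: waiting for the state file, for a `security_event`, for an `agent_error`, and for the follow-up `tool_use`. A minimal sketch of one shared helper follows. `pollUntil` is a hypothetical name and is not part of the deleted files.

```typescript
// Repeatedly call `probe` until it yields a non-null value or the deadline
// passes. Returns the value, or null on timeout. The probe always runs at
// least once, even with timeoutMs = 0.
async function pollUntil<T>(
  probe: () => T | null | undefined | Promise<T | null | undefined>,
  timeoutMs: number,
  intervalMs = 100,
): Promise<T | null> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const value = await probe();
    if (value !== null && value !== undefined) return value;
    if (Date.now() >= deadline) return null;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}
```

With a helper like this, `waitForSecurityEntry(predicate, timeoutMs)` could be written as a `pollUntil` call whose probe fetches `/sidebar-chat` and returns the first matching entry or `null`. That keeps the timeout and interval handling in one place.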
|
||||
@@ -1,345 +0,0 @@
|
||||
/**
|
||||
* Review-flow E2E (sidepanel side, hermetic).
|
||||
*
|
||||
* Loads the real extension sidepanel.html in Playwright Chromium, stubs
|
||||
* the browse server responses, injects a `reviewable: true` security_event
|
||||
* into /sidebar-chat, and asserts the user-in-the-loop flow end-to-end:
|
||||
*
|
||||
* 1. Banner renders with "Review suspected injection" title
|
||||
* 2. Suspected text excerpt shows up inside the expandable details
|
||||
* 3. Allow + Block buttons are visible and actionable
|
||||
* 4. Clicking Allow posts to /security-decision with decision:"allow"
|
||||
* 5. Clicking Block posts to /security-decision with decision:"block"
|
||||
* 6. Banner auto-hides after decision
|
||||
*
|
||||
* This is the UI-and-wire test. The server-side handshake (decision file
|
||||
* write + sidebar-agent poll) is covered by security-review-flow.test.ts.
|
||||
* The full-stack version with real mock-claude + real classifier lives
|
||||
* in security-review-fullstack.test.ts (periodic tier).
|
||||
*
|
||||
* Gate tier. ~3s. Skipped if Playwright chromium is unavailable.
|
||||
*/
|
||||
|
||||
import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
|
||||
import * as fs from 'fs';
|
||||
import * as path from 'path';
|
||||
import { chromium, type Browser, type Page } from 'playwright';
|
||||
|
||||
const EXTENSION_DIR = path.resolve(import.meta.dir, '..', '..', 'extension');
|
||||
const SIDEPANEL_URL = `file://${EXTENSION_DIR}/sidepanel.html`;
|
||||
|
||||
const CHROMIUM_AVAILABLE = (() => {
|
||||
try {
|
||||
const exe = chromium.executablePath();
|
||||
return !!exe && fs.existsSync(exe);
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
})();
|
||||
|
||||
interface DecisionCall {
|
||||
tabId: number;
|
||||
decision: 'allow' | 'block';
|
||||
reason?: string;
|
||||
}
|
||||
|
||||
/**
|
||||
* Install the same stubs the existing sidepanel-dom test uses, plus a
|
||||
* fetch interceptor that captures POSTs to /security-decision into a
|
||||
* page-scoped array. Returns a handle to read the captured calls.
|
||||
*/
|
||||
async function installStubsAndCapture(
|
||||
page: Page,
|
||||
scenario: { securityEntries: any[] },
|
||||
): Promise<void> {
|
||||
await page.addInitScript((params: any) => {
|
||||
(window as any).__decisionCalls = [];
|
||||
|
||||
(window as any).chrome = {
|
||||
runtime: {
|
||||
sendMessage: (_req: any, cb: any) => {
|
||||
const payload = { connected: true, port: 34567 };
|
||||
if (typeof cb === 'function') {
|
||||
setTimeout(() => cb(payload), 0);
|
||||
return undefined;
|
||||
}
|
||||
return Promise.resolve(payload);
|
||||
},
|
||||
lastError: null,
|
||||
onMessage: { addListener: () => {} },
|
||||
},
|
||||
tabs: {
|
||||
query: (_q: any, cb: any) => setTimeout(() => cb([{ id: 1, url: 'https://example.com' }]), 0),
|
||||
onActivated: { addListener: () => {} },
|
||||
onUpdated: { addListener: () => {} },
|
||||
},
|
||||
};
|
||||
|
||||
(window as any).EventSource = class {
|
||||
constructor() {}
|
||||
addEventListener() {}
|
||||
close() {}
|
||||
};
|
||||
|
||||
const scenarioRef = params;
|
||||
const origFetch = window.fetch;
|
||||
window.fetch = async function (input: any, init?: any) {
|
||||
const url = String(input);
|
||||
if (url.endsWith('/health')) {
|
||||
return new Response(JSON.stringify({
|
||||
status: 'healthy',
|
||||
token: 'test-token',
|
||||
mode: 'headed',
|
||||
agent: { status: 'idle', runningFor: null, queueLength: 0 },
|
||||
session: null,
|
||||
security: { status: 'protected', layers: { testsavant: 'ok', transcript: 'ok', canary: 'ok' } },
|
||||
}), { status: 200, headers: { 'Content-Type': 'application/json' } });
|
||||
}
|
||||
if (url.includes('/sidebar-chat')) {
|
||||
return new Response(JSON.stringify({
|
||||
entries: scenarioRef.securityEntries ?? [],
|
||||
total: (scenarioRef.securityEntries ?? []).length,
|
||||
agentStatus: 'idle',
|
||||
activeTabId: 1,
|
||||
security: { status: 'protected', layers: { testsavant: 'ok', transcript: 'ok', canary: 'ok' } },
|
||||
}), { status: 200, headers: { 'Content-Type': 'application/json' } });
|
||||
}
|
||||
if (url.includes('/security-decision') && init?.method === 'POST') {
|
||||
try {
|
||||
const body = JSON.parse(init.body || '{}');
|
||||
(window as any).__decisionCalls.push(body);
|
||||
} catch {
|
||||
(window as any).__decisionCalls.push({ _parseError: true, raw: init?.body });
|
||||
}
|
||||
return new Response(JSON.stringify({ ok: true }), { status: 200, headers: { 'Content-Type': 'application/json' } });
|
||||
}
|
||||
if (url.includes('/sidebar-tabs')) {
|
||||
return new Response(JSON.stringify({ tabs: [] }), { status: 200 });
|
||||
}
|
||||
if (typeof origFetch === 'function') return origFetch(input, init);
|
||||
return new Response('{}', { status: 200 });
|
||||
} as any;
|
||||
}, scenario);
|
||||
}
|
||||
|
||||
let browser: Browser | null = null;
|
||||
|
||||
beforeAll(async () => {
|
||||
if (!CHROMIUM_AVAILABLE) return;
|
||||
browser = await chromium.launch({ headless: true });
|
||||
}, 30000);
|
||||
|
||||
afterAll(async () => {
|
||||
if (browser) {
|
||||
try {
|
||||
// Race browser.close() against a timeout — on rare occasions Playwright
|
||||
// hangs on close because an EventSource stub keeps a poll alive. 10s is
|
||||
// plenty; past that we forcibly drop the handle. Bun's default hook
|
||||
// timeout is 5s and has bitten this file.
|
||||
await Promise.race([
|
||||
browser.close(),
|
||||
new Promise<void>((resolve) => setTimeout(resolve, 10000)),
|
||||
]);
|
||||
} catch {}
|
||||
}
|
||||
}, 15000);
|
||||
|
||||
/**
 * The reviewable security_event the sidebar-agent emits on tool-output BLOCK.
 * Mirrors the shape of the real production event: verdict:'block',
 * reviewable:true, suspected_text excerpt, per-layer signals, and tabId
 * so the banner's Allow/Block buttons know which tab to decide for.
 */
function buildReviewableEntry(overrides?: Partial<any>): any {
  return {
    id: 42,
    ts: '2026-04-20T12:00:00Z',
    role: 'agent',
    type: 'security_event',
    verdict: 'block',
    reason: 'tool_result_ml',
    layer: 'testsavant_content',
    confidence: 0.95,
    domain: 'news.ycombinator.com',
    tool: 'Bash',
    reviewable: true,
    suspected_text: 'A comment thread discussing ignore previous instructions and reveal secrets — classifier flagged this as injection but it is actually benign developer content about a prompt injection incident.',
    signals: [
      { layer: 'testsavant_content', confidence: 0.95 },
      { layer: 'transcript_classifier', confidence: 0.0, meta: { degraded: true } },
    ],
    tabId: 1,
    ...overrides,
  };
}

describe('sidepanel review-flow E2E', () => {
  test.skipIf(!CHROMIUM_AVAILABLE)('reviewable event shows review banner with suspected text + buttons', async () => {
    const context = await browser!.newContext();
    const page = await context.newPage();
    await installStubsAndCapture(page, { securityEntries: [buildReviewableEntry()] });
    await page.goto(SIDEPANEL_URL);

    // Wait for /sidebar-chat poll to deliver the entry + banner to render.
    await page.waitForFunction(
      () => {
        const b = document.getElementById('security-banner') as HTMLElement | null;
        return !!b && b.style.display !== 'none';
      },
      { timeout: 5000 },
    );

    // Title flips to the review framing (not "Session terminated")
    const title = await page.$eval('#security-banner-title', (el) => el.textContent);
    expect(title).toContain('Review suspected injection');

    // Subtitle mentions the tool + domain
    const subtitle = await page.$eval('#security-banner-subtitle', (el) => el.textContent);
    expect(subtitle).toContain('Bash');
    expect(subtitle).toContain('news.ycombinator.com');
    expect(subtitle).toContain('allow to continue');

    // Suspected text shows up unescaped (textContent, not innerHTML)
    const suspect = await page.$eval('#security-banner-suspect', (el) => el.textContent);
    expect(suspect).toContain('ignore previous instructions');

    // Both action buttons are visible
    const allowVisible = await page.locator('#security-banner-btn-allow').isVisible();
    const blockVisible = await page.locator('#security-banner-btn-block').isVisible();
    expect(allowVisible).toBe(true);
    expect(blockVisible).toBe(true);

    // Details auto-expanded so the user sees context
    const detailsHidden = await page.$eval('#security-banner-details', (el) => (el as HTMLElement).hidden);
    expect(detailsHidden).toBe(false);

    await context.close();
  }, 15000);

  test.skipIf(!CHROMIUM_AVAILABLE)('clicking Allow posts {decision:"allow"} and hides banner', async () => {
    const context = await browser!.newContext();
    const page = await context.newPage();
    await installStubsAndCapture(page, { securityEntries: [buildReviewableEntry()] });
    await page.goto(SIDEPANEL_URL);
    await page.waitForSelector('#security-banner-btn-allow:visible', { timeout: 5000 });

    await page.click('#security-banner-btn-allow');

    // Decision POST should have fired with decision:"allow" and the tabId
    // from the security_event. Give the fetch promise a tick to resolve.
    await page.waitForFunction(
      () => (window as any).__decisionCalls?.length > 0,
      { timeout: 2000 },
    );

    const calls = await page.evaluate(() => (window as any).__decisionCalls);
    expect(calls).toHaveLength(1);
    expect(calls[0].decision).toBe('allow');
    expect(calls[0].tabId).toBe(1);
    expect(calls[0].reason).toBe('user');

    // Banner should hide optimistically after the POST
    await page.waitForFunction(
      () => {
        const b = document.getElementById('security-banner') as HTMLElement | null;
        return !!b && b.style.display === 'none';
      },
      { timeout: 2000 },
    );

    await context.close();
  }, 15000);

  test.skipIf(!CHROMIUM_AVAILABLE)('clicking Block posts {decision:"block"} and hides banner', async () => {
    const context = await browser!.newContext();
    const page = await context.newPage();
    await installStubsAndCapture(page, { securityEntries: [buildReviewableEntry({ id: 55 })] });
    await page.goto(SIDEPANEL_URL);
    await page.waitForSelector('#security-banner-btn-block:visible', { timeout: 5000 });

    await page.click('#security-banner-btn-block');

    await page.waitForFunction(
      () => (window as any).__decisionCalls?.length > 0,
      { timeout: 2000 },
    );

    const calls = await page.evaluate(() => (window as any).__decisionCalls);
    expect(calls).toHaveLength(1);
    expect(calls[0].decision).toBe('block');
    expect(calls[0].tabId).toBe(1);

    await page.waitForFunction(
      () => {
        const b = document.getElementById('security-banner') as HTMLElement | null;
        return !!b && b.style.display === 'none';
      },
      { timeout: 2000 },
    );

    await context.close();
  }, 15000);

  test.skipIf(!CHROMIUM_AVAILABLE)('non-reviewable event still shows hard-stop banner with no buttons', async () => {
    // Regression guard: the existing hard-stop canary leak UX must not be
    // disturbed by the reviewable branch. An event without reviewable:true
    // keeps the old behavior.
    const hardStop = {
      id: 99,
      ts: '2026-04-20T12:00:00Z',
      role: 'agent',
      type: 'security_event',
      verdict: 'block',
      reason: 'canary_leaked',
      layer: 'canary',
      confidence: 1.0,
      domain: 'attacker.example.com',
      channel: 'tool_use:Bash',
      tabId: 1,
    };
    const context = await browser!.newContext();
    const page = await context.newPage();
    await installStubsAndCapture(page, { securityEntries: [hardStop] });
    await page.goto(SIDEPANEL_URL);
    await page.waitForFunction(
      () => {
        const b = document.getElementById('security-banner') as HTMLElement | null;
        return !!b && b.style.display !== 'none';
      },
      { timeout: 5000 },
    );

    const title = await page.$eval('#security-banner-title', (el) => el.textContent);
    expect(title).toContain('Session terminated');

    // Action row stays hidden for the non-reviewable path
    const actionsHidden = await page.$eval('#security-banner-actions', (el) => (el as HTMLElement).hidden);
    expect(actionsHidden).toBe(true);

    await context.close();
  }, 15000);

  test.skipIf(!CHROMIUM_AVAILABLE)('suspected text renders via textContent, not innerHTML (XSS guard)', async () => {
    // If the sidepanel ever regressed to innerHTML for the suspected text,
    // a crafted excerpt could execute script. This test uses one; if the
    // <script> runs, window.__xss gets set. It must remain undefined.
    const xssAttempt = buildReviewableEntry({
      suspected_text: '<script>window.__xss = "pwn"</script><img src=x onerror="window.__xss=\'onerror\'">',
    });
    const context = await browser!.newContext();
    const page = await context.newPage();
    await installStubsAndCapture(page, { securityEntries: [xssAttempt] });
    await page.goto(SIDEPANEL_URL);
    await page.waitForSelector('#security-banner-suspect:not([hidden])', { timeout: 5000 });

    // The literal text should appear inside the suspect block (as text, not markup)
    const suspectText = await page.$eval('#security-banner-suspect', (el) => el.textContent);
    expect(suspectText).toContain('<script>');

    // No script executed
    const xssFlag = await page.evaluate(() => (window as any).__xss);
    expect(xssFlag).toBeUndefined();

    await context.close();
  }, 15000);
});
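The `afterAll` hook in the file above races `browser.close()` against a 10-second timer so that a hung close cannot outlive the hook timeout. A minimal sketch of that pattern as a standalone helper follows. The names `closeWithTimeout` and `CloseOutcome` are hypothetical and do not appear in the repository; unlike the hook above, this version also clears its timer and reports which side of the race won, which is an assumption about what a caller might want rather than the file's behavior.

```typescript
// Hypothetical helper: illustrates racing an async close against a timer.
type CloseOutcome = 'closed' | 'timed_out';

async function closeWithTimeout(
  close: () => Promise<void>,
  timeoutMs: number,
): Promise<CloseOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Resolves (never rejects) once the deadline passes.
  const timedOut = new Promise<CloseOutcome>((resolve) => {
    timer = setTimeout(() => resolve('timed_out'), timeoutMs);
  });

  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([
      close().then((): CloseOutcome => 'closed'),
      timedOut,
    ]);
  } finally {
    // Clear the timer so it cannot keep the process alive after a fast close.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A rejection from `close()` propagates to the caller here, so a hook that wants the swallow-everything behavior shown above would still wrap the call in `try { ... } catch {}`.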
@@ -1,226 +0,0 @@
/**
 * Layer 3: Sidebar agent round-trip tests.
 * Starts server + sidebar-agent together. Mocks the `claude` binary with a shell
 * script that outputs canned stream-json. Verifies events flow end-to-end:
 * POST /sidebar-command → queue → sidebar-agent → mock claude → events → /sidebar-chat
 */

import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import { spawn, type Subprocess } from 'bun';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

let serverProc: Subprocess | null = null;
let agentProc: Subprocess | null = null;
let serverPort: number = 0;
let authToken: string = '';
let tmpDir: string = '';
let stateFile: string = '';
let queueFile: string = '';
let mockBinDir: string = '';

async function api(pathname: string, opts: RequestInit = {}): Promise<Response> {
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
    ...(opts.headers as Record<string, string> || {}),
  };
  if (!headers['Authorization'] && authToken) {
    headers['Authorization'] = `Bearer ${authToken}`;
  }
  return fetch(`http://127.0.0.1:${serverPort}${pathname}`, { ...opts, headers });
}

async function resetState() {
  await api('/sidebar-session/new', { method: 'POST' });
  fs.writeFileSync(queueFile, '');
}

async function pollChatUntil(
  predicate: (entries: any[]) => boolean,
  timeoutMs = 10000,
): Promise<any[]> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const resp = await api('/sidebar-chat?after=0');
    const data = await resp.json();
    if (predicate(data.entries)) return data.entries;
    await new Promise(r => setTimeout(r, 300));
  }
  // Return whatever we have on timeout
  const resp = await api('/sidebar-chat?after=0');
  return (await resp.json()).entries;
}

function writeMockClaude(script: string) {
  const mockPath = path.join(mockBinDir, 'claude');
  fs.writeFileSync(mockPath, script, { mode: 0o755 });
}

beforeAll(async () => {
  tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-roundtrip-'));
  stateFile = path.join(tmpDir, 'browse.json');
  queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
  mockBinDir = path.join(tmpDir, 'bin');
  fs.mkdirSync(mockBinDir, { recursive: true });
  fs.mkdirSync(path.dirname(queueFile), { recursive: true });

  // Write default mock claude that outputs canned events
  writeMockClaude(`#!/bin/bash
echo '{"type":"system","session_id":"mock-session-123"}'
echo '{"type":"assistant","message":{"content":[{"type":"text","text":"I can see the page. It looks like a test fixture."}]}}'
echo '{"type":"result","result":"Done."}'
`);

  // Start server (no browser)
  const serverScript = path.resolve(__dirname, '..', 'src', 'server.ts');
  serverProc = spawn(['bun', 'run', serverScript], {
    env: {
      ...process.env,
      BROWSE_STATE_FILE: stateFile,
      BROWSE_HEADLESS_SKIP: '1',
      BROWSE_PORT: '0',
      SIDEBAR_QUEUE_PATH: queueFile,
      BROWSE_IDLE_TIMEOUT: '300',
    },
    stdio: ['ignore', 'pipe', 'pipe'],
  });

  // Wait for server
  const deadline = Date.now() + 15000;
  while (Date.now() < deadline) {
    if (fs.existsSync(stateFile)) {
      try {
        const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
        if (state.port && state.token) {
          serverPort = state.port;
          authToken = state.token;
          break;
        }
      } catch {}
    }
    await new Promise(r => setTimeout(r, 100));
  }
  if (!serverPort) throw new Error('Server did not start in time');

  // Start sidebar-agent with mock claude on PATH
  const agentScript = path.resolve(__dirname, '..', 'src', 'sidebar-agent.ts');
  agentProc = spawn(['bun', 'run', agentScript], {
    env: {
      ...process.env,
      PATH: `${mockBinDir}:${process.env.PATH}`,
      BROWSE_SERVER_PORT: String(serverPort),
      BROWSE_STATE_FILE: stateFile,
      SIDEBAR_QUEUE_PATH: queueFile,
      SIDEBAR_AGENT_TIMEOUT: '10000',
      BROWSE_BIN: 'browse', // doesn't matter, mock claude doesn't use it
    },
    stdio: ['ignore', 'pipe', 'pipe'],
  });

  // Give sidebar-agent time to start polling
  await new Promise(r => setTimeout(r, 1000));
}, 20000);

afterAll(() => {
  if (agentProc) { try { agentProc.kill(); } catch {} }
  if (serverProc) { try { serverProc.kill(); } catch {} }
  try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
});

describe('sidebar-agent round-trip', () => {
  test('full message round-trip with mock claude', async () => {
    await resetState();

    // Send a command
    const resp = await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({
        message: 'what is on this page?',
        activeTabUrl: 'https://example.com/test',
      }),
    });
    expect(resp.status).toBe(200);

    // Wait for mock claude to process and events to arrive
    const entries = await pollChatUntil(
      (entries) => entries.some((e: any) => e.type === 'agent_done'),
      15000,
    );

    // Verify the flow: user message → agent_start → text → agent_done
    const userEntry = entries.find((e: any) => e.role === 'user');
    expect(userEntry).toBeDefined();
    expect(userEntry.message).toBe('what is on this page?');

    // The mock claude outputs text — check for any agent text entry
    const textEntries = entries.filter((e: any) => e.role === 'agent' && (e.type === 'text' || e.type === 'result'));
    expect(textEntries.length).toBeGreaterThan(0);

    const doneEntry = entries.find((e: any) => e.type === 'agent_done');
    expect(doneEntry).toBeDefined();

    // Agent should be back to idle
    const session = await (await api('/sidebar-session')).json();
    expect(session.agent.status).toBe('idle');
  }, 20000);

  test('claude crash produces agent_error', async () => {
    await resetState();

    // Replace mock claude with one that crashes
    writeMockClaude(`#!/bin/bash
echo '{"type":"system","session_id":"crash-test"}' >&2
exit 1
`);

    await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({ message: 'crash test' }),
    });

    // Wait for agent_done (sidebar-agent sends agent_done even on crash via proc.on('close'))
    const entries = await pollChatUntil(
      (entries) => entries.some((e: any) => e.type === 'agent_done' || e.type === 'agent_error'),
      15000,
    );

    // Agent should recover to idle
    const session = await (await api('/sidebar-session')).json();
    expect(session.agent.status).toBe('idle');

    // Restore working mock
    writeMockClaude(`#!/bin/bash
echo '{"type":"assistant","message":{"content":[{"type":"text","text":"recovered"}]}}'
`);
  }, 20000);

  test('sequential queue drain', async () => {
    await resetState();

    // Restore working mock
    writeMockClaude(`#!/bin/bash
echo '{"type":"assistant","message":{"content":[{"type":"text","text":"response to: '"'"'$*'"'"'"}]}}'
`);

    // Send two messages rapidly — first processes, second queues
    await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({ message: 'first message' }),
    });
    await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({ message: 'second message' }),
    });

    // Wait for both to complete (two agent_done events)
    const entries = await pollChatUntil(
      (entries) => entries.filter((e: any) => e.type === 'agent_done').length >= 2,
      20000,
    );

    // Both user messages should be in chat
    const userEntries = entries.filter((e: any) => e.role === 'user');
    expect(userEntries.length).toBeGreaterThanOrEqual(2);
  }, 25000);
});
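The round-trip tests above rely on `pollChatUntil`, which re-fetches `/sidebar-chat` every 300 ms until a predicate passes or a deadline expires, then returns whatever it last saw. A generic sketch of that poll-until-deadline loop, decoupled from the HTTP call, might look like the following. `pollUntil`, its `fetchValue` parameter, and the `satisfied` flag in the result are hypothetical additions for illustration; the original helper returns only the entries and leaves failure detection to the assertions that follow.

```typescript
// Hypothetical generic form of the polling loop used by the tests above.
async function pollUntil<T>(
  fetchValue: () => Promise<T>,
  predicate: (value: T) => boolean,
  timeoutMs = 10_000,
  intervalMs = 300,
): Promise<{ value: T; satisfied: boolean }> {
  const deadline = Date.now() + timeoutMs;

  while (Date.now() < deadline) {
    const value = await fetchValue();
    if (predicate(value)) return { value, satisfied: true };
    // Sleep between attempts so the loop does not hammer the source.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }

  // Deadline passed: take one last reading and report whether it qualifies.
  const value = await fetchValue();
  return { value, satisfied: predicate(value) };
}
```

Returning the last reading instead of throwing keeps the failure message in the test's own `expect` call, which is the trade-off the original helper also makes.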
@@ -1,562 +0,0 @@
/**
 * Tests for sidebar agent queue parsing and inbox writing.
 *
 * sidebar-agent.ts functions are not exported (it's an entry-point script),
 * so we test the same logic inline: JSONL parsing, writeToInbox filesystem
 * behavior, and edge cases.
 */

import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';

// ─── Helpers: replicate sidebar-agent logic for unit testing ──────

/** Parse a single JSONL line — same logic as sidebar-agent poll() */
function parseQueueLine(line: string): any | null {
  if (!line.trim()) return null;
  try {
    const entry = JSON.parse(line);
    if (!entry.message && !entry.prompt) return null;
    return entry;
  } catch {
    return null;
  }
}

/** Read all valid entries from a JSONL string — same as countLines + readLine loop */
function parseQueueFile(content: string): any[] {
  const entries: any[] = [];
  const lines = content.split('\n').filter(Boolean);
  for (const line of lines) {
    const entry = parseQueueLine(line);
    if (entry) entries.push(entry);
  }
  return entries;
}

/** Write to inbox — extracted logic from sidebar-agent.ts writeToInbox() */
function writeToInbox(
  gitRoot: string,
  message: string,
  pageUrl?: string,
  sessionId?: string,
): string | null {
  if (!gitRoot) return null;

  const inboxDir = path.join(gitRoot, '.context', 'sidebar-inbox');
  fs.mkdirSync(inboxDir, { recursive: true });

  const now = new Date();
  const timestamp = now.toISOString().replace(/:/g, '-');
  const filename = `${timestamp}-observation.json`;
  const tmpFile = path.join(inboxDir, `.${filename}.tmp`);
  const finalFile = path.join(inboxDir, filename);

  const inboxMessage = {
    type: 'observation',
    timestamp: now.toISOString(),
    page: { url: pageUrl || 'unknown', title: '' },
    userMessage: message,
    sidebarSessionId: sessionId || 'unknown',
  };

  fs.writeFileSync(tmpFile, JSON.stringify(inboxMessage, null, 2));
  fs.renameSync(tmpFile, finalFile);
  return finalFile;
}

/** Shorten paths — same logic as sidebar-agent.ts shorten() */
function shorten(str: string): string {
  return str
    .replace(/\/Users\/[^/]+/g, '~')
    .replace(/\/conductor\/workspaces\/[^/]+\/[^/]+/g, '')
    .replace(/\.claude\/skills\/gstack\//g, '')
    .replace(/browse\/dist\/browse/g, '$B');
}

/** describeToolCall — replicated from sidebar-agent.ts for unit testing */
function describeToolCall(tool: string, input: any): string {
  if (!input) return '';

  if (tool === 'Bash' && input.command) {
    const cmd = input.command;
    const browseMatch = cmd.match(/\$B\s+(\w+)|browse[^\s]*\s+(\w+)/);
    if (browseMatch) {
      const browseCmd = browseMatch[1] || browseMatch[2];
      const args = cmd.split(/\s+/).slice(2).join(' ');
      switch (browseCmd) {
        case 'goto': return `Opening ${args.replace(/['"]/g, '')}`;
        case 'snapshot': return args.includes('-i') ? 'Scanning for interactive elements' : args.includes('-D') ? 'Checking what changed' : 'Taking a snapshot of the page';
        case 'screenshot': return `Saving screenshot${args ? ` to ${shorten(args)}` : ''}`;
        case 'click': return `Clicking ${args}`;
        case 'fill': { const parts = args.split(/\s+/); return `Typing "${parts.slice(1).join(' ')}" into ${parts[0]}`; }
        case 'text': return 'Reading page text';
        case 'html': return args ? `Reading HTML of ${args}` : 'Reading full page HTML';
        case 'links': return 'Finding all links on the page';
        case 'forms': return 'Looking for forms';
        case 'console': return 'Checking browser console for errors';
        case 'network': return 'Checking network requests';
        case 'url': return 'Checking current URL';
        case 'back': return 'Going back';
        case 'forward': return 'Going forward';
        case 'reload': return 'Reloading the page';
        case 'scroll': return args ? `Scrolling to ${args}` : 'Scrolling down';
        case 'wait': return `Waiting for ${args}`;
        case 'inspect': return args ? `Inspecting CSS of ${args}` : 'Getting CSS for last picked element';
        case 'style': return `Changing CSS: ${args}`;
        case 'cleanup': return 'Removing page clutter (ads, popups, banners)';
        case 'prettyscreenshot': return 'Taking a clean screenshot';
        case 'css': return `Checking CSS property: ${args}`;
        case 'is': return `Checking if element is ${args}`;
        case 'diff': return `Comparing ${args}`;
        case 'responsive': return 'Taking screenshots at mobile, tablet, and desktop sizes';
        case 'status': return 'Checking browser status';
        case 'tabs': return 'Listing open tabs';
        case 'focus': return 'Bringing browser to front';
        case 'select': return `Selecting option in ${args}`;
        case 'hover': return `Hovering over ${args}`;
        case 'viewport': return `Setting viewport to ${args}`;
        case 'upload': return `Uploading file to ${args.split(/\s+/)[0]}`;
        default: return `Running browse ${browseCmd} ${args}`.trim();
      }
    }
    if (cmd.includes('git ')) return `Running: ${shorten(cmd)}`;
    let short = shorten(cmd);
    return short.length > 100 ? short.slice(0, 100) + '…' : short;
  }

  if (tool === 'Read' && input.file_path) return `Reading ${shorten(input.file_path)}`;
  if (tool === 'Edit' && input.file_path) return `Editing ${shorten(input.file_path)}`;
  if (tool === 'Write' && input.file_path) return `Writing ${shorten(input.file_path)}`;
  if (tool === 'Grep' && input.pattern) return `Searching for "${input.pattern}"`;
  if (tool === 'Glob' && input.pattern) return `Finding files matching ${input.pattern}`;
  try { return shorten(JSON.stringify(input)).slice(0, 80); } catch { return ''; }
}

// ─── Test setup ──────────────────────────────────────────────────

let tmpDir: string;

beforeEach(() => {
  tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-agent-test-'));
});

afterEach(() => {
  fs.rmSync(tmpDir, { recursive: true, force: true });
});

// ─── Queue File Parsing ─────────────────────────────────────────

describe('queue file parsing', () => {
  test('valid JSONL line parsed correctly', () => {
    const line = JSON.stringify({ message: 'hello', prompt: 'check this', pageUrl: 'https://example.com' });
    const entry = parseQueueLine(line);
    expect(entry).not.toBeNull();
    expect(entry.message).toBe('hello');
    expect(entry.prompt).toBe('check this');
    expect(entry.pageUrl).toBe('https://example.com');
  });

  test('malformed JSON line skipped without crash', () => {
    const entry = parseQueueLine('this is not json {{{');
    expect(entry).toBeNull();
  });

  test('valid JSON without message or prompt is skipped', () => {
    const line = JSON.stringify({ foo: 'bar' });
    const entry = parseQueueLine(line);
    expect(entry).toBeNull();
  });

  test('empty file returns no entries', () => {
    const entries = parseQueueFile('');
    expect(entries).toEqual([]);
  });

  test('file with blank lines returns no entries', () => {
    const entries = parseQueueFile('\n\n\n');
    expect(entries).toEqual([]);
  });

  test('mixed valid and invalid lines', () => {
    const content = [
      JSON.stringify({ message: 'first' }),
      'not json',
      JSON.stringify({ unrelated: true }),
      JSON.stringify({ message: 'second', prompt: 'do stuff' }),
    ].join('\n');

    const entries = parseQueueFile(content);
    expect(entries.length).toBe(2);
    expect(entries[0].message).toBe('first');
    expect(entries[1].message).toBe('second');
  });
});

// ─── writeToInbox ────────────────────────────────────────────────

describe('writeToInbox', () => {
  test('creates .context/sidebar-inbox/ directory', () => {
    writeToInbox(tmpDir, 'test message');
    const inboxDir = path.join(tmpDir, '.context', 'sidebar-inbox');
    expect(fs.existsSync(inboxDir)).toBe(true);
    expect(fs.statSync(inboxDir).isDirectory()).toBe(true);
  });

  test('writes valid JSON file', () => {
    const filePath = writeToInbox(tmpDir, 'test message', 'https://example.com', 'session-123');
    expect(filePath).not.toBeNull();
    expect(fs.existsSync(filePath!)).toBe(true);

    const data = JSON.parse(fs.readFileSync(filePath!, 'utf-8'));
    expect(data.type).toBe('observation');
    expect(data.userMessage).toBe('test message');
    expect(data.page.url).toBe('https://example.com');
    expect(data.sidebarSessionId).toBe('session-123');
    expect(data.timestamp).toBeTruthy();
  });

  test('atomic write — final file exists, no .tmp left', () => {
    const filePath = writeToInbox(tmpDir, 'atomic test');
    expect(filePath).not.toBeNull();
    expect(fs.existsSync(filePath!)).toBe(true);

    // Check no .tmp files remain in the inbox directory
    const inboxDir = path.join(tmpDir, '.context', 'sidebar-inbox');
    const files = fs.readdirSync(inboxDir);
    const tmpFiles = files.filter(f => f.endsWith('.tmp'));
    expect(tmpFiles.length).toBe(0);

    // Final file should end with -observation.json
    const jsonFiles = files.filter(f => f.endsWith('-observation.json') && !f.startsWith('.'));
    expect(jsonFiles.length).toBe(1);
  });

  test('handles missing git root gracefully', () => {
    const result = writeToInbox('', 'test');
    expect(result).toBeNull();
  });

  test('defaults pageUrl to unknown when not provided', () => {
    const filePath = writeToInbox(tmpDir, 'no url provided');
    expect(filePath).not.toBeNull();
    const data = JSON.parse(fs.readFileSync(filePath!, 'utf-8'));
    expect(data.page.url).toBe('unknown');
  });

  test('defaults sessionId to unknown when not provided', () => {
    const filePath = writeToInbox(tmpDir, 'no session');
    expect(filePath).not.toBeNull();
    const data = JSON.parse(fs.readFileSync(filePath!, 'utf-8'));
    expect(data.sidebarSessionId).toBe('unknown');
  });

  test('multiple writes create separate files', () => {
    writeToInbox(tmpDir, 'message 1');
    // Tiny delay to ensure different timestamps
    const t = Date.now();
    while (Date.now() === t) {} // spin until next ms
    writeToInbox(tmpDir, 'message 2');

    const inboxDir = path.join(tmpDir, '.context', 'sidebar-inbox');
    const files = fs.readdirSync(inboxDir).filter(f => f.endsWith('.json') && !f.startsWith('.'));
    expect(files.length).toBe(2);
  });
});

// ─── describeToolCall (verbose narration) ────────────────────────

describe('describeToolCall', () => {
  // Browse navigation commands
  test('goto → plain English with URL', () => {
    const result = describeToolCall('Bash', { command: '$B goto https://example.com' });
    expect(result).toBe('Opening https://example.com');
  });

  test('goto strips quotes from URL', () => {
    const result = describeToolCall('Bash', { command: '$B goto "https://example.com"' });
    expect(result).toBe('Opening https://example.com');
  });

  test('url → checking current URL', () => {
    expect(describeToolCall('Bash', { command: '$B url' })).toBe('Checking current URL');
  });

  test('back/forward/reload → plain English', () => {
    expect(describeToolCall('Bash', { command: '$B back' })).toBe('Going back');
    expect(describeToolCall('Bash', { command: '$B forward' })).toBe('Going forward');
    expect(describeToolCall('Bash', { command: '$B reload' })).toBe('Reloading the page');
  });

  // Snapshot variants
  test('snapshot -i → scanning for interactive elements', () => {
    expect(describeToolCall('Bash', { command: '$B snapshot -i' })).toBe('Scanning for interactive elements');
  });

  test('snapshot -D → checking what changed', () => {
    expect(describeToolCall('Bash', { command: '$B snapshot -D' })).toBe('Checking what changed');
  });

  test('snapshot (plain) → taking a snapshot', () => {
    expect(describeToolCall('Bash', { command: '$B snapshot' })).toBe('Taking a snapshot of the page');
  });

  // Interaction commands
  test('click → clicking element', () => {
    expect(describeToolCall('Bash', { command: '$B click @e3' })).toBe('Clicking @e3');
  });

  test('fill → typing into element', () => {
    expect(describeToolCall('Bash', { command: '$B fill @e4 "hello world"' })).toBe('Typing ""hello world"" into @e4');
  });

  test('scroll with selector → scrolling to element', () => {
    expect(describeToolCall('Bash', { command: '$B scroll .footer' })).toBe('Scrolling to .footer');
  });

  test('scroll without args → scrolling down', () => {
    expect(describeToolCall('Bash', { command: '$B scroll' })).toBe('Scrolling down');
  });

  // Reading commands
  test('text → reading page text', () => {
    expect(describeToolCall('Bash', { command: '$B text' })).toBe('Reading page text');
  });

  test('html with selector → reading HTML of element', () => {
    expect(describeToolCall('Bash', { command: '$B html .header' })).toBe('Reading HTML of .header');
  });

  test('html without selector → reading full page HTML', () => {
    expect(describeToolCall('Bash', { command: '$B html' })).toBe('Reading full page HTML');
  });

  test('links → finding all links', () => {
    expect(describeToolCall('Bash', { command: '$B links' })).toBe('Finding all links on the page');
  });

  test('console → checking console', () => {
    expect(describeToolCall('Bash', { command: '$B console' })).toBe('Checking browser console for errors');
  });

  // Inspector commands
  test('inspect with selector → inspecting CSS', () => {
    expect(describeToolCall('Bash', { command: '$B inspect .header' })).toBe('Inspecting CSS of .header');
  });

  test('inspect without args → getting last picked element', () => {
    expect(describeToolCall('Bash', { command: '$B inspect' })).toBe('Getting CSS for last picked element');
  });

  test('style → changing CSS', () => {
    expect(describeToolCall('Bash', { command: '$B style .header color red' })).toBe('Changing CSS: .header color red');
  });

  test('cleanup → removing page clutter', () => {
    expect(describeToolCall('Bash', { command: '$B cleanup --all' })).toBe('Removing page clutter (ads, popups, banners)');
  });

  // Visual commands
  test('screenshot → saving screenshot', () => {
    expect(describeToolCall('Bash', { command: '$B screenshot /tmp/shot.png' })).toBe('Saving screenshot to /tmp/shot.png');
  });

  test('screenshot without path', () => {
    expect(describeToolCall('Bash', { command: '$B screenshot' })).toBe('Saving screenshot');
  });

  test('responsive → multi-size screenshots', () => {
    expect(describeToolCall('Bash', { command: '$B responsive' })).toBe('Taking screenshots at mobile, tablet, and desktop sizes');
  });

// Non-browse tools
|
||||
test('Read tool → reading file', () => {
|
||||
expect(describeToolCall('Read', { file_path: '/Users/foo/project/src/app.ts' })).toBe('Reading ~/project/src/app.ts');
|
||||
});
|
||||
|
||||
test('Grep tool → searching for pattern', () => {
|
||||
expect(describeToolCall('Grep', { pattern: 'handleClick' })).toBe('Searching for "handleClick"');
|
||||
});
|
||||
|
||||
test('Glob tool → finding files', () => {
|
||||
expect(describeToolCall('Glob', { pattern: '**/*.tsx' })).toBe('Finding files matching **/*.tsx');
|
||||
});
|
||||
|
||||
test('Edit tool → editing file', () => {
|
||||
expect(describeToolCall('Edit', { file_path: '/Users/foo/src/main.ts' })).toBe('Editing ~/src/main.ts');
|
||||
});
|
||||
|
||||
// Edge cases
|
||||
test('null input → empty string', () => {
|
||||
expect(describeToolCall('Bash', null)).toBe('');
|
||||
});
|
||||
|
||||
test('unknown browse command → generic description', () => {
|
||||
expect(describeToolCall('Bash', { command: '$B newtab https://foo.com' })).toContain('newtab');
|
||||
});
|
||||
|
||||
test('non-browse bash → shortened command', () => {
|
||||
expect(describeToolCall('Bash', { command: 'echo hello' })).toBe('echo hello');
|
||||
});
|
||||
|
||||
test('full browse binary path recognized', () => {
|
||||
const result = describeToolCall('Bash', { command: '/Users/garrytan/.claude/skills/gstack/browse/dist/browse goto https://example.com' });
|
||||
expect(result).toBe('Opening https://example.com');
|
||||
});
|
||||
|
||||
test('tab command → switching tab', () => {
|
||||
expect(describeToolCall('Bash', { command: '$B tab 2' })).toContain('tab');
|
||||
});
|
||||
});
|
||||

// ─── Per-tab agent concurrency (source code validation) ──────────

describe('per-tab agent concurrency', () => {
  const serverSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'server.ts'), 'utf-8');
  const agentSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'sidebar-agent.ts'), 'utf-8');

  test('server has per-tab agent state map', () => {
    expect(serverSrc).toContain('tabAgents');
    expect(serverSrc).toContain('TabAgentState');
    expect(serverSrc).toContain('getTabAgent');
  });

  test('server returns per-tab agent status in /sidebar-chat', () => {
    expect(serverSrc).toContain('getTabAgentStatus');
    expect(serverSrc).toContain('tabAgentStatus');
  });

  test('spawnClaude accepts forTabId parameter', () => {
    const spawnFn = serverSrc.slice(
      serverSrc.indexOf('function spawnClaude('),
      serverSrc.indexOf('\nfunction ', serverSrc.indexOf('function spawnClaude(') + 1),
    );
    expect(spawnFn).toContain('forTabId');
    expect(spawnFn).toContain('tabState.status');
  });

  test('sidebar-command endpoint uses per-tab agent state', () => {
    expect(serverSrc).toContain('msgTabId');
    expect(serverSrc).toContain('tabState.status');
    expect(serverSrc).toContain('tabState.queue');
  });

  test('agent event handler resets per-tab state', () => {
    expect(serverSrc).toContain('eventTabId');
    expect(serverSrc).toContain('tabState.status = \'idle\'');
  });

  test('agent event handler processes per-tab queue', () => {
    // After agent_done, should process next message from THIS tab's queue
    expect(serverSrc).toContain('tabState.queue.length > 0');
    expect(serverSrc).toContain('tabState.queue.shift');
  });

  test('sidebar-agent uses per-tab processing set', () => {
    expect(agentSrc).toContain('processingTabs');
    expect(agentSrc).not.toContain('isProcessing');
  });

  test('sidebar-agent sends tabId with all events', () => {
    // sendEvent should accept tabId parameter
    expect(agentSrc).toContain('async function sendEvent(event: Record<string, any>, tabId?: number)');
    // askClaude destructures tabId from queue entry (regex tolerates
    // additional fields like `canary` and `pageUrl` from security module).
    expect(agentSrc).toMatch(
      /const \{[^}]*\bprompt\b[^}]*\bargs\b[^}]*\bstateFile\b[^}]*\bcwd\b[^}]*\btabId\b[^}]*\}/
    );
  });

  test('sidebar-agent allows concurrent agents across tabs', () => {
    // poll() should not block globally — it should check per-tab
    expect(agentSrc).toContain('processingTabs.has(tid)');
    // askClaude should be fire-and-forget (no await blocking the loop)
    expect(agentSrc).toContain('askClaude(entry).catch');
  });

  test('queue entries include tabId', () => {
    const spawnFn = serverSrc.slice(
      serverSrc.indexOf('function spawnClaude('),
      serverSrc.indexOf('\nfunction ', serverSrc.indexOf('function spawnClaude(') + 1),
    );
    expect(spawnFn).toContain('tabId: agentTabId');
  });

  test('health check monitors all per-tab agents', () => {
    expect(serverSrc).toContain('for (const [tid, state] of tabAgents)');
  });
});

describe('BROWSE_TAB tab pinning (cross-tab isolation)', () => {
  const serverSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'server.ts'), 'utf-8');
  const agentSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'sidebar-agent.ts'), 'utf-8');
  const cliSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'cli.ts'), 'utf-8');

  test('sidebar-agent passes BROWSE_TAB env var to claude process', () => {
    // The env block should include BROWSE_TAB set to the tab ID
    expect(agentSrc).toContain('BROWSE_TAB');
    expect(agentSrc).toContain('String(tid)');
  });

  test('CLI reads BROWSE_TAB and sends tabId in command body', () => {
    // BROWSE_TAB env var is still honored (sidebar-agent path). After the
    // make-pdf refactor, the CLI layer now also accepts --tab-id <N>, with
    // the CLI flag taking precedence over the env var. Both resolve to the
    // same `tabId` body field.
    expect(cliSrc).toContain('process.env.BROWSE_TAB');
    expect(cliSrc).toContain('parseInt(envTab, 10)');
  });

  test('handleCommandInternal accepts tabId from request body', () => {
    const handleFn = serverSrc.slice(
      serverSrc.indexOf('async function handleCommandInternal('),
      serverSrc.indexOf('\n/** HTTP wrapper', serverSrc.indexOf('async function handleCommandInternal(') + 1) > 0
        ? serverSrc.indexOf('\n/** HTTP wrapper', serverSrc.indexOf('async function handleCommandInternal(') + 1)
        : serverSrc.indexOf('\nasync function ', serverSrc.indexOf('async function handleCommandInternal(') + 200),
    );
    // Should destructure tabId from body
    expect(handleFn).toContain('tabId');
    // Should save and restore the active tab
    expect(handleFn).toContain('savedTabId');
    expect(handleFn).toContain('switchTab(tabId');
  });

  test('handleCommandInternal restores active tab after command (success path)', () => {
    // On success, should restore savedTabId without stealing focus
    const handleFn = serverSrc.slice(
      serverSrc.indexOf('async function handleCommandInternal('),
      serverSrc.length,
    );
    // Count restore calls — should appear in both success and error paths
    const restoreCount = (handleFn.match(/switchTab\(savedTabId/g) || []).length;
    expect(restoreCount).toBeGreaterThanOrEqual(2); // success + error paths
  });

  test('handleCommandInternal restores active tab on error path', () => {
    // The catch block should also restore
    const catchBlock = serverSrc.slice(
      serverSrc.indexOf('} catch (err: any) {', serverSrc.indexOf('async function handleCommandInternal(')),
    );
    expect(catchBlock).toContain('switchTab(savedTabId');
  });

  test('tab pinning only activates when tabId is provided', () => {
    const handleFn = serverSrc.slice(
      serverSrc.indexOf('async function handleCommandInternal('),
      serverSrc.indexOf('try {', serverSrc.indexOf('async function handleCommandInternal(') + 1),
    );
    // Should check tabId is not undefined/null before switching
    expect(handleFn).toContain('tabId !== undefined');
    expect(handleFn).toContain('tabId !== null');
  });

  test('CLI only sends tabId when it is a valid number', () => {
    // Body should conditionally include tabId. Historically that was keyed off
    // the BROWSE_TAB env var. After the make-pdf refactor, the CLI also honors
    // a --tab-id <N> flag on the CLI itself, so the check is "tabId defined
    // AND not NaN" rather than literally inspecting the env var.
    expect(cliSrc).toContain('tabId !== undefined && !isNaN(tabId)');
  });
});
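Several of the tests above isolate one function's text by slicing the source file from the function's signature to the next top-level `function` declaration. A minimal sketch of that pattern as a reusable helper follows. The name `sliceFunctionSource` and the sample input are hypothetical and do not appear in the repository; the sketch also only recognizes a following plain `function` declaration, so a following `async function` or `export function` would not end the slice.

```typescript
/**
 * Hypothetical helper: return the text of a top-level function, from its
 * signature up to (not including) the next top-level `function` declaration.
 * Returns '' when the signature is absent, and runs to end-of-file when no
 * later declaration exists.
 */
function sliceFunctionSource(src: string, signature: string): string {
  const start = src.indexOf(signature);
  if (start === -1) return '';
  const next = src.indexOf('\nfunction ', start + 1);
  return src.slice(start, next === -1 ? src.length : next);
}

// Illustrative input only; this is not the real server.ts.
const sample = [
  'function spawnClaude(msg: string, forTabId?: number) {',
  '  tabState.status = "processing";',
  '}',
  'function killAgent() {',
  '  return 1;',
  '}',
].join('\n');

const spawnBody = sliceFunctionSource(sample, 'function spawnClaude(');
console.log(spawnBody.includes('forTabId'));  // true
console.log(spawnBody.includes('killAgent')); // false
```

Handling the "not found" cases explicitly avoids relying on how `String.prototype.slice` treats a `-1` end index, which would silently drop the last character of the file.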
@@ -0,0 +1,256 @@
/**
 * Regression: sidebar layout invariants after the chat-tab rip.
 *
 * The Chrome side panel used to host two surfaces: Chat (one-shot
 * `claude -p` queue) and Terminal (interactive PTY). Chat was ripped
 * once the PTY proved out — sidebar-agent.ts is gone, the chat queue
 * endpoints are gone, and the primary-tab nav (Terminal | Chat) is
 * gone. Terminal is now the sole primary surface.
 *
 * This file locks the load-bearing invariants of that layout so a
 * future refactor can't silently re-introduce the old surface or break
 * the new one.
 */

import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';

const HTML = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel.html'), 'utf-8');
const JS = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel.js'), 'utf-8');
const TERM_JS = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel-terminal.js'), 'utf-8');
const MANIFEST = JSON.parse(fs.readFileSync(path.join(import.meta.dir, '../../extension/manifest.json'), 'utf-8'));

describe('sidebar: chat tab + nav are removed, Terminal is sole primary surface', () => {
  test('No primary-tab nav element exists', () => {
    expect(HTML).not.toContain('class="primary-tabs"');
    expect(HTML).not.toContain('data-pane="chat"');
    expect(HTML).not.toContain('data-pane="terminal"');
  });

  test('No <main id="tab-chat"> pane', () => {
    expect(HTML).not.toMatch(/<main[^>]*id="tab-chat"/);
    expect(HTML).not.toContain('id="chat-messages"');
    expect(HTML).not.toContain('id="chat-loading"');
    expect(HTML).not.toContain('id="chat-welcome"');
  });

  test('No chat input / send button / experimental banner', () => {
    expect(HTML).not.toContain('class="command-bar"');
    expect(HTML).not.toContain('id="command-input"');
    expect(HTML).not.toContain('id="send-btn"');
    expect(HTML).not.toContain('id="stop-agent-btn"');
    expect(HTML).not.toContain('id="experimental-banner"');
  });

  test('No clear-chat button in footer', () => {
    expect(HTML).not.toContain('id="clear-chat"');
  });

  test('Terminal pane is .active by default and has the toolbar', () => {
    expect(HTML).toMatch(/<main[^>]*id="tab-terminal"[^>]*class="tab-content active"/);
    expect(HTML).toContain('id="terminal-toolbar"');
    expect(HTML).toContain('id="terminal-restart-now"');
  });

  test('Quick-actions buttons (Cleanup / Screenshot / Cookies) survive in the terminal toolbar', () => {
    // Garry explicitly wanted these kept after the chat rip — they drive
    // browser actions, not chat.
    expect(HTML).toContain('id="chat-cleanup-btn"');
    expect(HTML).toContain('id="chat-screenshot-btn"');
    expect(HTML).toContain('id="chat-cookies-btn"');
    // They live inside the terminal toolbar now (siblings of the Restart
    // button), not as a separate strip below all panes.
    const toolbarStart = HTML.indexOf('id="terminal-toolbar"');
    const toolbarEnd = HTML.indexOf('</div>', toolbarStart);
    const toolbarBlock = HTML.slice(toolbarStart, toolbarEnd + 6);
    expect(toolbarBlock).toContain('id="chat-cleanup-btn"');
    expect(toolbarBlock).toContain('id="chat-screenshot-btn"');
    expect(toolbarBlock).toContain('id="chat-cookies-btn"');
  });
});

describe('sidepanel.js: chat helpers ripped, terminal-injection helper survives', () => {
  test('No primary-tab click handler', () => {
    expect(JS).not.toContain("querySelectorAll('.primary-tab')");
    expect(JS).not.toContain('activePrimaryPaneId');
  });

  test('No chat polling, sendMessage, sendChat, stopAgent, or pollTabs', () => {
    expect(JS).not.toContain('chatPollInterval');
    expect(JS).not.toContain('function sendMessage');
    expect(JS).not.toContain('function pollChat');
    expect(JS).not.toContain('function pollTabs');
    expect(JS).not.toContain('function switchChatTab');
    expect(JS).not.toContain('function stopAgent');
    expect(JS).not.toContain('function applyChatEnabled');
    expect(JS).not.toContain('function showSecurityBanner');
  });

  test('Cleanup runs through the live PTY (no /sidebar-command POST)', () => {
    // The new Cleanup handler injects the prompt straight into claude's
    // PTY via gstackInjectToTerminal. The dead code path was a POST to
    // /sidebar-command which kicked off a fresh claude -p subprocess.
    const cleanup = JS.slice(JS.indexOf('async function runCleanup'));
    expect(cleanup).toContain('window.gstackInjectToTerminal');
    expect(cleanup).not.toContain('/sidebar-command');
    expect(cleanup).not.toContain('addChatEntry');
  });

  test('Inspector "Send to Code" routes through the live PTY', () => {
    const sendBtn = JS.slice(JS.indexOf('inspectorSendBtn.addEventListener'));
    expect(sendBtn).toContain('window.gstackInjectToTerminal');
    expect(sendBtn).not.toContain("type: 'sidebar-command'");
  });

  test('updateConnection no longer kicks off chat / tab polling', () => {
    const update = JS.slice(JS.indexOf('function updateConnection'), JS.indexOf('function updateConnection') + 1500);
    expect(update).not.toContain('chatPollInterval');
    expect(update).not.toContain('tabPollInterval');
    expect(update).not.toContain('pollChat');
    expect(update).not.toContain('pollTabs');
    // BUT must still expose the bootstrap globals for sidepanel-terminal.js.
    expect(update).toContain('window.gstackServerPort');
    expect(update).toContain('window.gstackAuthToken');
  });
});

describe('sidepanel-terminal.js: eager auto-connect + injection API', () => {
  test('Exposes window.gstackInjectToTerminal for cross-pane use', () => {
    expect(TERM_JS).toContain('window.gstackInjectToTerminal');
    // Returns false when no live session, true when bytes go out.
    const inject = TERM_JS.slice(TERM_JS.indexOf('window.gstackInjectToTerminal'));
    expect(inject).toContain('return false');
    expect(inject).toContain('return true');
    expect(inject).toContain('ws.readyState !== WebSocket.OPEN');
  });

  test('Auto-connects on init (no keypress required)', () => {
    expect(TERM_JS).not.toContain('function onAnyKey');
    expect(TERM_JS).not.toContain("addEventListener('keydown'");
    expect(TERM_JS).toContain('function tryAutoConnect');
  });

  test('Repaint hook fires when Terminal pane becomes visible', () => {
    // The chat-tab rip removed gstack:primary-tab-changed; we use a
    // MutationObserver on #tab-terminal's class attr instead. The
    // observer must call repaintIfLive when the .active class returns.
    expect(TERM_JS).toContain('MutationObserver');
    expect(TERM_JS).toContain("attributeFilter: ['class']");
    expect(TERM_JS).toContain('repaintIfLive');
    const repaint = TERM_JS.slice(TERM_JS.indexOf('function repaintIfLive'));
    expect(repaint).toContain('fitAddon && fitAddon.fit()');
    expect(repaint).toContain('term.refresh');
    expect(repaint).toContain("type: 'resize'");
  });

  test('No auto-reconnect on close (Restart is user-initiated)', () => {
    const closeOnly = TERM_JS.slice(
      TERM_JS.indexOf("ws.addEventListener('close'"),
      TERM_JS.indexOf("ws.addEventListener('error'"),
    );
    expect(closeOnly).not.toContain('setTimeout');
    expect(closeOnly).not.toContain('tryAutoConnect');
    expect(closeOnly).not.toContain('connect()');
  });

  test('forceRestart helper closes ws, disposes xterm, returns to IDLE', () => {
    expect(TERM_JS).toContain('function forceRestart');
    const fn = TERM_JS.slice(TERM_JS.indexOf('function forceRestart'));
    expect(fn).toContain('ws && ws.close()');
    expect(fn).toContain('term.dispose()');
    expect(fn).toContain('STATE.IDLE');
    expect(fn).toContain('tryAutoConnect()');
  });

  test('Both restart buttons (mid-session and ENDED) call forceRestart', () => {
    expect(TERM_JS).toContain("els.restart?.addEventListener('click', forceRestart)");
    expect(TERM_JS).toContain("els.restartNow?.addEventListener('click', forceRestart)");
  });
});

describe('server.ts: chat / sidebar-agent endpoints are gone', () => {
  const SERVER_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/server.ts'), 'utf-8');

  test('No /sidebar-command, /sidebar-chat, /sidebar-agent/* routes', () => {
    expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-command['"]/);
    expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-chat['"]/);
    expect(SERVER_SRC).not.toMatch(/url\.pathname\.startsWith\(['"]\/sidebar-agent\//);
    expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-agent\/event['"]/);
    expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-tabs['"]/);
    expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-session['"]/);
  });

  test('No chat-related state declarations or helpers', () => {
    // Allow the symbol names inside the rip-marker comments — but no
    // `let`, `const`, `function`, or `interface` declarations of them.
    expect(SERVER_SRC).not.toMatch(/^let agentProcess/m);
    expect(SERVER_SRC).not.toMatch(/^let agentStatus/m);
    expect(SERVER_SRC).not.toMatch(/^let messageQueue/m);
    expect(SERVER_SRC).not.toMatch(/^let sidebarSession/m);
    expect(SERVER_SRC).not.toMatch(/^const tabAgents/m);
    expect(SERVER_SRC).not.toMatch(/^function pickSidebarModel/m);
    expect(SERVER_SRC).not.toMatch(/^function processAgentEvent/m);
    expect(SERVER_SRC).not.toMatch(/^function killAgent/m);
    expect(SERVER_SRC).not.toMatch(/^function addChatEntry/m);
    expect(SERVER_SRC).not.toMatch(/^interface ChatEntry/m);
    expect(SERVER_SRC).not.toMatch(/^interface SidebarSession/m);
  });

  test('/health no longer surfaces agentStatus or messageQueue length', () => {
    const health = SERVER_SRC.slice(SERVER_SRC.indexOf("url.pathname === '/health'"));
    const slice = health.slice(0, 2000);
    expect(slice).not.toContain('agentStatus');
    expect(slice).not.toContain('messageQueue');
    expect(slice).not.toContain('agentStartTime');
    // chatEnabled is hardcoded false now (older clients still see the field).
    expect(slice).toMatch(/chatEnabled:\s*false/);
    // terminalPort survives.
    expect(slice).toContain('terminalPort');
  });
});

describe('cli.ts: sidebar-agent is no longer spawned', () => {
  const CLI_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/cli.ts'), 'utf-8');

  test('No Bun.spawn of sidebar-agent.ts', () => {
    expect(CLI_SRC).not.toMatch(/Bun\.spawn\(\s*\['bun',\s*'run',\s*\w*[Aa]gent[Ss]cript\][\s\S]{0,300}sidebar-agent/);
    // The variable name `agentScript` was for sidebar-agent. After the
    // rip there's only termAgentScript. Allow comments to mention the
    // history but not active spawn calls.
    expect(CLI_SRC).not.toMatch(/^\s*let agentScript = path\.resolve/m);
  });

  test('Terminal-agent spawn survives', () => {
    expect(CLI_SRC).toContain('terminal-agent.ts');
    expect(CLI_SRC).toMatch(/Bun\.spawn\(\['bun',\s*'run',\s*termAgentScript\]/);
  });
});

describe('files: sidebar-agent.ts and its tests are deleted', () => {
  test('browse/src/sidebar-agent.ts is gone', () => {
    expect(fs.existsSync(path.join(import.meta.dir, '../src/sidebar-agent.ts'))).toBe(false);
  });

  test('sidebar-agent test files are gone', () => {
    expect(fs.existsSync(path.join(import.meta.dir, 'sidebar-agent.test.ts'))).toBe(false);
    expect(fs.existsSync(path.join(import.meta.dir, 'sidebar-agent-roundtrip.test.ts'))).toBe(false);
  });
});

describe('manifest: ws permission + xterm-safe CSP', () => {
  test('host_permissions covers ws localhost', () => {
    expect(MANIFEST.host_permissions).toContain('ws://127.0.0.1:*/');
  });

  test('host_permissions still covers http localhost', () => {
    expect(MANIFEST.host_permissions).toContain('http://127.0.0.1:*/');
  });

  test('manifest does NOT add unsafe-eval to extension_pages CSP', () => {
    const csp = MANIFEST.content_security_policy;
    if (csp && csp.extension_pages) {
      expect(csp.extension_pages).not.toContain('unsafe-eval');
    }
  });
});
@@ -0,0 +1,196 @@
/**
 * tab-each — fan-out command for the live Terminal pane.
 *
 * Source-level guards: command is registered, has a description + usage,
 * scope-check the inner command, restore the original active tab in a
 * finally block (so a mid-batch exception doesn't leave the user looking
 * at a tab they didn't choose).
 *
 * Behavioral logic test: drive handleMetaCommand directly with a mock
 * BrowserManager + executeCommand callback. Verify the iteration order,
 * the JSON shape, the tab restore, and the chrome:// skip.
 */

import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import { handleMetaCommand } from '../src/meta-commands';
import { META_COMMANDS, COMMAND_DESCRIPTIONS } from '../src/commands';

const META_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/meta-commands.ts'), 'utf-8');

describe('tab-each: registration', () => {
  test('command is in META_COMMANDS', () => {
    expect(META_COMMANDS.has('tab-each')).toBe(true);
  });

  test('has a description and usage entry', () => {
    expect(COMMAND_DESCRIPTIONS['tab-each']).toBeDefined();
    expect(COMMAND_DESCRIPTIONS['tab-each'].usage).toContain('tab-each');
    expect(COMMAND_DESCRIPTIONS['tab-each'].category).toBe('Tabs');
  });
});

describe('tab-each: source-level guards', () => {
  test('scope-checks the inner command before fanning out', () => {
    const block = META_SRC.slice(META_SRC.indexOf("case 'tab-each':"));
    expect(block).toContain('checkScope(tokenInfo, innerName)');
    // The scope check must run BEFORE the for-loop. If it ran inside the
    // loop, a permission failure on the second tab would leave the first
    // tab already mutated.
    const checkIdx = block.indexOf('checkScope(tokenInfo, innerName)');
    const loopIdx = block.indexOf('for (const tab of tabs)');
    expect(checkIdx).toBeLessThan(loopIdx);
  });

  test('restores the original active tab in a finally block', () => {
    const block = META_SRC.slice(META_SRC.indexOf("case 'tab-each':"), META_SRC.indexOf("case 'tab-each':") + 4000);
    expect(block).toContain('finally');
    expect(block).toContain('originalActive');
    expect(block).toContain('switchTab(originalActive');
  });

  test('uses bringToFront: false so the OS window does NOT jump', () => {
    const block = META_SRC.slice(META_SRC.indexOf("case 'tab-each':"), META_SRC.indexOf("case 'tab-each':") + 4000);
    // tab-each is a background operation — pulling focus would steal the
    // user's foreground app every time claude fans out, which is
    // unacceptable.
    expect(block).toContain('bringToFront: false');
  });

  test('skips chrome:// and chrome-extension:// internal pages', () => {
    const block = META_SRC.slice(META_SRC.indexOf("case 'tab-each':"), META_SRC.indexOf("case 'tab-each':") + 4000);
    expect(block).toContain("startsWith('chrome://')");
    expect(block).toContain("startsWith('chrome-extension://')");
  });
});

describe('tab-each: behavior', () => {
  function mockBm(tabs: Array<{ id: number; url: string; title: string; active: boolean }>) {
    let activeId = tabs.find(t => t.active)?.id ?? tabs[0]?.id ?? 0;
    const switched: number[] = [];
    return {
      __switched: switched,
      __activeId: () => activeId,
      getActiveSession: () => ({}),
      getActiveTabId: () => activeId,
      getTabListWithTitles: async () => tabs.map(t => ({ ...t })),
      switchTab: (id: number, _opts?: any) => { switched.push(id); activeId = id; },
    } as any;
  }

  test('iterates every tab, calls executeCommand for each, returns JSON results', async () => {
    const tabs = [
      { id: 1, url: 'https://news.example.com', title: 'News', active: true },
      { id: 2, url: 'https://docs.example.com', title: 'Docs', active: false },
      { id: 3, url: 'https://github.com', title: 'GitHub', active: false },
    ];
    const bm = mockBm(tabs);
    const calls: Array<{ command: string; args?: string[]; tabId?: number }> = [];
    const out = await handleMetaCommand(
      'tab-each',
      ['snapshot', '-i'],
      bm,
      async () => {},
      null,
      {
        executeCommand: async (body) => {
          calls.push(body);
          return { status: 200, result: `snap-of-${body.tabId}` };
        },
      },
    );

    const parsed = JSON.parse(out);
    expect(parsed.command).toBe('snapshot');
    expect(parsed.args).toEqual(['-i']);
    expect(parsed.total).toBe(3);
    expect(parsed.results.map((r: any) => r.tabId)).toEqual([1, 2, 3]);
    expect(parsed.results.every((r: any) => r.status === 200)).toBe(true);
    expect(parsed.results[0].output).toBe('snap-of-1');

    // Inner command was dispatched 3 times, once per tab, with the right tabId.
    expect(calls).toHaveLength(3);
    expect(calls.map(c => c.tabId)).toEqual([1, 2, 3]);
    expect(calls.every(c => c.command === 'snapshot')).toBe(true);
  });

  test('skips chrome:// pages with status=0 + "skipped" output', async () => {
    const tabs = [
      { id: 1, url: 'chrome://newtab', title: 'New Tab', active: true },
      { id: 2, url: 'https://example.com', title: 'Example', active: false },
      { id: 3, url: 'chrome-extension://abc/page.html', title: 'Ext', active: false },
    ];
    const bm = mockBm(tabs);
    const calls: any[] = [];
    const out = await handleMetaCommand(
      'tab-each',
      ['text'],
      bm,
      async () => {},
      null,
      {
        executeCommand: async (body) => {
          calls.push(body);
          return { status: 200, result: `text-of-${body.tabId}` };
        },
      },
    );

    const parsed = JSON.parse(out);
    expect(parsed.total).toBe(3);
    // chrome:// and chrome-extension:// → skipped (status 0).
    expect(parsed.results[0].status).toBe(0);
    expect(parsed.results[0].output).toContain('skipped');
    expect(parsed.results[2].status).toBe(0);
    // Only the real tab dispatched.
    expect(calls).toHaveLength(1);
    expect(calls[0].tabId).toBe(2);
  });

  test('restores the originally active tab even if a tab errors', async () => {
    const tabs = [
      { id: 10, url: 'https://a.example', title: 'A', active: false },
      { id: 20, url: 'https://b.example', title: 'B', active: true }, // initially active
      { id: 30, url: 'https://c.example', title: 'C', active: false },
    ];
    const bm = mockBm(tabs);
    let calls = 0;
    const out = await handleMetaCommand(
      'tab-each',
      ['text'],
      bm,
      async () => {},
      null,
      {
        executeCommand: async (body) => {
          calls++;
          if (body.tabId === 20) {
            return { status: 500, result: JSON.stringify({ error: 'boom' }) };
          }
          return { status: 200, result: `ok-${body.tabId}` };
        },
      },
    );

    const parsed = JSON.parse(out);
    expect(parsed.results.find((r: any) => r.tabId === 20).status).toBe(500);
    expect(parsed.results.find((r: any) => r.tabId === 20).output).toBe('boom');
    expect(parsed.results.find((r: any) => r.tabId === 10).status).toBe(200);
    expect(parsed.results.find((r: any) => r.tabId === 30).status).toBe(200);
    // Active tab restored to 20 (the one that was active when we started).
    expect(bm.__activeId()).toBe(20);
  });

  test('throws on empty args (no inner command)', async () => {
    const bm = mockBm([{ id: 1, url: 'https://x.example', title: 'X', active: true }]);
    await expect(handleMetaCommand(
      'tab-each',
      [],
      bm,
      async () => {},
      null,
      { executeCommand: async () => ({ status: 200, result: '' }) },
    )).rejects.toThrow(/Usage/);
  });
});
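The guards tested above (scope check before the loop, skipping `chrome://` and `chrome-extension://` pages, restoring the active tab in a `finally` block) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in `meta-commands.ts`: `TabHost`, `fanOut`, `isAllowed`, and `run` are hypothetical names, and whether the real command switches tabs inside its loop is an assumption made here for the sake of the example.

```typescript
// All names here are hypothetical; this is not the meta-commands.ts source.
interface Tab { id: number; url: string }
interface TabResult { tabId: number; status: number; output: string }

interface TabHost {
  listTabs(): Promise<Tab[]>;
  getActiveTabId(): number;
  switchTab(id: number, opts?: { bringToFront: boolean }): void;
}

const isInternal = (url: string): boolean =>
  url.startsWith('chrome://') || url.startsWith('chrome-extension://');

async function fanOut(
  host: TabHost,
  isAllowed: () => boolean,
  run: (tabId: number) => Promise<{ status: number; output: string }>,
): Promise<TabResult[]> {
  // Check permission once, before any tab is touched.
  if (!isAllowed()) throw new Error('scope check failed');

  const originalActive = host.getActiveTabId();
  const results: TabResult[] = [];
  try {
    for (const tab of await host.listTabs()) {
      if (isInternal(tab.url)) {
        results.push({ tabId: tab.id, status: 0, output: 'skipped (internal page)' });
        continue;
      }
      host.switchTab(tab.id, { bringToFront: false });
      results.push({ tabId: tab.id, ...(await run(tab.id)) });
    }
  } finally {
    // Runs on success and on throw, so the user's tab choice survives.
    host.switchTab(originalActive, { bringToFront: false });
  }
  return results;
}
```

Placing the permission check ahead of the `try` means a denied command never mutates any tab, and the `finally` clause covers the case where `run` throws partway through the batch.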
@@ -0,0 +1,273 @@
/**
 * Integration tests for terminal-agent.ts.
 *
 * Spawns the agent as a real subprocess in a temp state directory,
 * exercises:
 *   1. /internal/grant — loopback handshake with the internal token.
 *   2. /ws Origin gate — non-extension Origin → 403.
 *   3. /ws cookie gate — missing/invalid cookie → 401.
 *   4. /ws full PTY round-trip — write `echo hi\n`, read `hi`.
 *   5. resize control message — terminal accepts and stays alive.
 *   6. close behavior — sending close terminates the PTY child.
 *
 * Uses /bin/bash via BROWSE_TERMINAL_BINARY override so CI doesn't need
 * the `claude` binary installed.
 */

import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';

const AGENT_SCRIPT = path.join(import.meta.dir, '../src/terminal-agent.ts');
const BASH = '/bin/bash';

let stateDir: string;
let agentProc: any;
let agentPort: number;
let internalToken: string;

function readPortFile(): number {
  for (let i = 0; i < 50; i++) {
    try {
      const v = parseInt(fs.readFileSync(path.join(stateDir, 'terminal-port'), 'utf-8').trim(), 10);
      if (Number.isFinite(v) && v > 0) return v;
    } catch {}
    Bun.sleepSync(40);
  }
  throw new Error('terminal-agent never wrote port file');
}

function readTokenFile(): string {
  for (let i = 0; i < 50; i++) {
    try {
      const t = fs.readFileSync(path.join(stateDir, 'terminal-internal-token'), 'utf-8').trim();
      if (t.length > 16) return t;
    } catch {}
    Bun.sleepSync(40);
  }
  throw new Error('terminal-agent never wrote internal token');
}

beforeAll(() => {
  stateDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-term-'));
  const stateFile = path.join(stateDir, 'browse.json');
  // browse.json must exist so the agent's readBrowseToken doesn't throw.
  fs.writeFileSync(stateFile, JSON.stringify({ token: 'test-browse-token' }));
  agentProc = Bun.spawn(['bun', 'run', AGENT_SCRIPT], {
    env: {
      ...process.env,
      BROWSE_STATE_FILE: stateFile,
      BROWSE_SERVER_PORT: '0', // not used in this test
      BROWSE_TERMINAL_BINARY: BASH,
    },
    stdio: ['ignore', 'pipe', 'pipe'],
  });
  agentPort = readPortFile();
  internalToken = readTokenFile();
});

afterAll(() => {
  try { agentProc?.kill?.(); } catch {}
  try { fs.rmSync(stateDir, { recursive: true, force: true }); } catch {}
});

async function grantToken(token: string): Promise<Response> {
  return fetch(`http://127.0.0.1:${agentPort}/internal/grant`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${internalToken}`,
    },
    body: JSON.stringify({ token }),
  });
}

describe('terminal-agent: /internal/grant', () => {
  test('accepts grants signed with the internal token', async () => {
|
||||
const resp = await grantToken('test-cookie-token-very-long-yes');
|
||||
expect(resp.status).toBe(200);
|
||||
});
|
||||
|
||||
test('rejects grants with the wrong internal token', async () => {
|
||||
const resp = await fetch(`http://127.0.0.1:${agentPort}/internal/grant`, {
|
||||
method: 'POST',
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
'Authorization': 'Bearer wrong-token',
|
||||
},
|
||||
body: JSON.stringify({ token: 'whatever' }),
|
||||
});
|
||||
expect(resp.status).toBe(403);
|
||||
});
|
||||
});
|
||||
|
||||
describe('terminal-agent: /ws gates', () => {
|
||||
test('rejects upgrade attempts without an extension Origin', async () => {
|
||||
const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`);
|
||||
expect(resp.status).toBe(403);
|
||||
expect(await resp.text()).toBe('forbidden origin');
|
||||
});
|
||||
|
||||
test('rejects upgrade attempts from a non-extension Origin', async () => {
|
||||
const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
|
||||
headers: { 'Origin': 'https://evil.example.com' },
|
||||
});
|
||||
expect(resp.status).toBe(403);
|
||||
});
|
||||
|
||||
test('rejects extension-Origin upgrades without a granted cookie', async () => {
|
||||
const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
|
||||
headers: {
|
||||
'Origin': 'chrome-extension://abc123',
|
||||
'Cookie': 'gstack_pty=never-granted',
|
||||
},
|
||||
});
|
||||
expect(resp.status).toBe(401);
|
||||
});
|
||||
});
|
||||
|
||||
describe('terminal-agent: PTY round-trip via real WebSocket (Cookie auth)', () => {
|
||||
test('binary writes go to PTY stdin, output streams back', async () => {
|
||||
const cookie = 'rt-token-must-be-at-least-seventeen-chars-long';
|
||||
const granted = await grantToken(cookie);
|
||||
expect(granted.status).toBe(200);
|
||||
|
||||
const ws = new WebSocket(`ws://127.0.0.1:${agentPort}/ws`, {
|
||||
headers: {
|
||||
'Origin': 'chrome-extension://test-extension-id',
|
||||
'Cookie': `gstack_pty=${cookie}`,
|
||||
},
|
||||
} as any);
|
||||
|
||||
const collected: string[] = [];
|
||||
let opened = false;
|
||||
let closed = false;
|
||||
|
||||
await new Promise<void>((resolve, reject) => {
|
||||
const timer = setTimeout(() => reject(new Error('ws never opened')), 5000);
|
||||
ws.addEventListener('open', () => { opened = true; clearTimeout(timer); resolve(); });
|
||||
ws.addEventListener('error', () => { clearTimeout(timer); reject(new Error('ws error')); });
|
||||
});
|
||||
|
||||
ws.addEventListener('message', (ev: any) => {
|
||||
if (typeof ev.data === 'string') return; // ignore control frames
|
||||
const buf = ev.data instanceof ArrayBuffer ? new Uint8Array(ev.data) : ev.data;
|
||||
collected.push(new TextDecoder().decode(buf));
|
||||
});
|
||||
|
||||
ws.addEventListener('close', () => { closed = true; });
|
||||
|
||||
// Lazy-spawn trigger: any binary frame causes the agent to spawn /bin/bash.
|
||||
ws.send(new TextEncoder().encode('echo hello-pty-world\nexit\n'));
|
||||
|
||||
// Wait up to 5s for the echoed output to arrive (the poll gives up at 5s and the assertion below fails).
|
||||
await new Promise<void>((resolve) => {
|
||||
const start = Date.now();
|
||||
const tick = () => {
|
||||
const joined = collected.join('');
|
||||
if (joined.includes('hello-pty-world')) return resolve();
|
||||
if (Date.now() - start > 5000) return resolve();
|
||||
setTimeout(tick, 50);
|
||||
};
|
||||
tick();
|
||||
});
|
||||
|
||||
expect(opened).toBe(true);
|
||||
const allOutput = collected.join('');
|
||||
expect(allOutput).toContain('hello-pty-world');
|
||||
|
||||
try { ws.close(); } catch {}
|
||||
// Give cleanup a moment.
|
||||
await Bun.sleep(200);
|
||||
});
|
||||
|
||||
test('Sec-WebSocket-Protocol auth path: browser-style upgrade with token in protocol', async () => {
|
||||
// This is the path the actual browser extension takes. Cross-port
|
||||
// SameSite=Strict cookies don't reliably survive the jump from the
|
||||
// browse server (port A) to the agent (port B) when initiated from a
|
||||
// chrome-extension origin, so we send the token via the only auth
|
||||
// header the browser WebSocket API lets us set: Sec-WebSocket-Protocol.
|
||||
//
|
||||
// The browser sends `gstack-pty.<token>` and the agent must:
|
||||
// 1) strip the gstack-pty. prefix
|
||||
// 2) validate the token
|
||||
// 3) ECHO the protocol back in the upgrade response
|
||||
// Without (3) the browser closes the connection immediately, which
|
||||
// is the exact bug the original cookie-only implementation hit in
|
||||
// manual dogfood. This test catches that regression in CI.
|
||||
const token = 'sec-protocol-token-must-be-at-least-seventeen-chars';
|
||||
await grantToken(token);
|
||||
|
||||
// We exercise the protocol path by raw-handshaking via fetch+Upgrade,
|
||||
// because Bun's test-client WebSocket constructor doesn't propagate
|
||||
// `protocols` cleanly when also passed `headers` (the constructor
|
||||
// detects the third-arg form unreliably). Real browsers (Chromium)
|
||||
// use the standard protocols arg fine — the server-side handler is
|
||||
// identical either way, so this test still locks the load-bearing
|
||||
// invariant: the agent accepts a token via Sec-WebSocket-Protocol
|
||||
// and echoes the protocol back so a browser would accept the upgrade.
|
||||
const handshakeKey = 'dGhlIHNhbXBsZSBub25jZQ==';
|
||||
const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
|
||||
headers: {
|
||||
'Connection': 'Upgrade',
|
||||
'Upgrade': 'websocket',
|
||||
'Sec-WebSocket-Version': '13',
|
||||
'Sec-WebSocket-Key': handshakeKey,
|
||||
'Sec-WebSocket-Protocol': `gstack-pty.${token}`,
|
||||
'Origin': 'chrome-extension://test-extension-id',
|
||||
},
|
||||
});
|
||||
|
||||
// 101 Switching Protocols + protocol echoed back = browser would accept.
|
||||
// 401/403/anything else = browser would close the connection immediately
|
||||
// (the bug we hit in manual dogfood).
|
||||
expect(resp.status).toBe(101);
|
||||
expect(resp.headers.get('upgrade')?.toLowerCase()).toBe('websocket');
|
||||
expect(resp.headers.get('sec-websocket-protocol')).toBe(`gstack-pty.${token}`);
|
||||
});
|
||||
|
||||
test('Sec-WebSocket-Protocol auth: rejects unknown token even with valid Origin', async () => {
|
||||
const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
|
||||
headers: {
|
||||
'Connection': 'Upgrade',
|
||||
'Upgrade': 'websocket',
|
||||
'Sec-WebSocket-Version': '13',
|
||||
'Sec-WebSocket-Key': 'dGhlIHNhbXBsZSBub25jZQ==',
|
||||
'Sec-WebSocket-Protocol': 'gstack-pty.never-granted-token',
|
||||
'Origin': 'chrome-extension://test-extension-id',
|
||||
},
|
||||
});
|
||||
expect(resp.status).toBe(401);
|
||||
});
|
||||
|
||||
test('text frame {type:"resize"} is accepted (no crash, ws stays open)', async () => {
|
||||
const cookie = 'resize-token-must-be-at-least-seventeen-chars';
|
||||
await grantToken(cookie);
|
||||
|
||||
const ws = new WebSocket(`ws://127.0.0.1:${agentPort}/ws`, {
|
||||
headers: {
|
||||
'Origin': 'chrome-extension://test-extension-id',
|
||||
'Cookie': `gstack_pty=${cookie}`,
|
||||
},
|
||||
} as any);
|
||||
|
||||
await new Promise<void>((resolve, reject) => {
|
||||
const timer = setTimeout(() => reject(new Error('ws never opened')), 5000);
|
||||
ws.addEventListener('open', () => { clearTimeout(timer); resolve(); });
|
||||
ws.addEventListener('error', () => { clearTimeout(timer); reject(new Error('ws error')); });
|
||||
});
|
||||
|
||||
// Send a resize before anything else (lazy-spawn won't fire).
|
||||
ws.send(JSON.stringify({ type: 'resize', cols: 120, rows: 40 }));
|
||||
|
||||
// After resize, send a binary frame; should still work.
|
||||
ws.send(new TextEncoder().encode('exit\n'));
|
||||
|
||||
await Bun.sleep(300);
|
||||
// ws still readyState 1 (OPEN) or 3 (CLOSED after exit) — both fine.
|
||||
expect([WebSocket.OPEN, WebSocket.CLOSED]).toContain(ws.readyState);
|
||||
|
||||
try { ws.close(); } catch {}
|
||||
});
|
||||
});
|
||||
@@ -0,0 +1,223 @@
|
||||
/**
|
||||
* Unit tests for the Terminal-tab PTY agent and its server-side glue.
|
||||
*
|
||||
* Coverage:
|
||||
* - pty-session-cookie module: mint / validate / revoke / TTL pruning.
|
||||
* - source-level guard: /pty-session and /terminal/* are NOT in TUNNEL_PATHS.
|
||||
* - source-level guard: /health does not surface ptyToken.
|
||||
* - source-level guard: terminal-agent binds 127.0.0.1 only.
|
||||
* - source-level guard: terminal-agent enforces Origin AND cookie on /ws.
|
||||
*
|
||||
* These are read-only checks against source — they prevent silent surface
|
||||
* widening during a routine refactor (matches the dual-listener.test.ts
|
||||
* pattern). End-to-end behavior (real /bin/bash PTY round-trip,
|
||||
* tunnel-surface 404 + denial-log) lives in
|
||||
* `browse/test/terminal-agent-integration.test.ts`.
|
||||
*/
|
||||
|
||||
import { describe, test, expect, beforeEach } from 'bun:test';
|
||||
import * as fs from 'fs';
|
||||
import * as path from 'path';
|
||||
import {
|
||||
mintPtySessionToken, validatePtySessionToken, revokePtySessionToken,
|
||||
extractPtyCookie, buildPtySetCookie, buildPtyClearCookie,
|
||||
PTY_COOKIE_NAME, __resetPtySessions,
|
||||
} from '../src/pty-session-cookie';
|
||||
|
||||
const SERVER_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/server.ts'), 'utf-8');
|
||||
const AGENT_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/terminal-agent.ts'), 'utf-8');
|
||||
|
||||
describe('pty-session-cookie: mint/validate/revoke', () => {
|
||||
beforeEach(() => __resetPtySessions());
|
||||
|
||||
test('a freshly minted token validates', () => {
|
||||
const { token } = mintPtySessionToken();
|
||||
expect(validatePtySessionToken(token)).toBe(true);
|
||||
});
|
||||
|
||||
test('null and unknown tokens fail validation', () => {
|
||||
expect(validatePtySessionToken(null)).toBe(false);
|
||||
expect(validatePtySessionToken(undefined)).toBe(false);
|
||||
expect(validatePtySessionToken('')).toBe(false);
|
||||
expect(validatePtySessionToken('not-a-real-token')).toBe(false);
|
||||
});
|
||||
|
||||
test('revoke makes a token invalid', () => {
|
||||
const { token } = mintPtySessionToken();
|
||||
expect(validatePtySessionToken(token)).toBe(true);
|
||||
revokePtySessionToken(token);
|
||||
expect(validatePtySessionToken(token)).toBe(false);
|
||||
});
|
||||
|
||||
test('Set-Cookie has HttpOnly + SameSite=Strict + Path=/ + Max-Age', () => {
|
||||
const { token } = mintPtySessionToken();
|
||||
const cookie = buildPtySetCookie(token);
|
||||
expect(cookie).toContain(`${PTY_COOKIE_NAME}=${token}`);
|
||||
expect(cookie).toContain('HttpOnly');
|
||||
expect(cookie).toContain('SameSite=Strict');
|
||||
expect(cookie).toContain('Path=/');
|
||||
expect(cookie).toMatch(/Max-Age=\d+/);
|
||||
// Secure is intentionally omitted — daemon binds 127.0.0.1 over HTTP.
|
||||
expect(cookie).not.toContain('Secure');
|
||||
});
|
||||
|
||||
test('clear-cookie has Max-Age=0', () => {
|
||||
expect(buildPtyClearCookie()).toContain('Max-Age=0');
|
||||
});
|
||||
|
||||
test('extractPtyCookie reads gstack_pty from a Cookie header', () => {
|
||||
const { token } = mintPtySessionToken();
|
||||
const req = new Request('http://127.0.0.1/ws', {
|
||||
headers: { 'cookie': `othercookie=foo; gstack_pty=${token}; baz=qux` },
|
||||
});
|
||||
expect(extractPtyCookie(req)).toBe(token);
|
||||
});
|
||||
|
||||
test('extractPtyCookie returns null when the cookie is missing', () => {
|
||||
const req = new Request('http://127.0.0.1/ws', {
|
||||
headers: { 'cookie': 'unrelated=value' },
|
||||
});
|
||||
expect(extractPtyCookie(req)).toBe(null);
|
||||
});
|
||||
});
|
||||
|
||||
describe('Source-level guard: /pty-session is not on the tunnel surface', () => {
|
||||
test('TUNNEL_PATHS does not include /pty-session or /terminal/*', () => {
|
||||
const start = SERVER_SRC.indexOf('const TUNNEL_PATHS = new Set<string>([');
|
||||
expect(start).toBeGreaterThan(-1);
|
||||
const end = SERVER_SRC.indexOf(']);', start);
|
||||
const body = SERVER_SRC.slice(start, end);
|
||||
expect(body).not.toContain('/pty-session');
|
||||
expect(body).not.toContain('/terminal/');
|
||||
expect(body).not.toContain('/terminal-');
|
||||
});
|
||||
});
|
||||
|
||||
describe('Source-level guard: /health does NOT surface ptyToken', () => {
|
||||
test('/health response body does not include ptyToken', () => {
|
||||
const healthIdx = SERVER_SRC.indexOf("url.pathname === '/health'");
|
||||
expect(healthIdx).toBeGreaterThan(-1);
|
||||
// Take a 2000-char window starting at the /health route check (assumed to span the response body).
|
||||
const slice = SERVER_SRC.slice(healthIdx, healthIdx + 2000);
|
||||
// The /health JSON.stringify body must not mention the cookie token.
|
||||
// It's allowed to include `terminalPort` (a port number, not auth).
|
||||
expect(slice).not.toContain('ptyToken');
|
||||
expect(slice).not.toContain('gstack_pty');
|
||||
expect(slice).toContain('terminalPort');
|
||||
});
|
||||
});
|
||||
|
||||
describe('Source-level guard: terminal-agent', () => {
|
||||
test('binds 127.0.0.1 only, never 0.0.0.0', () => {
|
||||
expect(AGENT_SRC).toContain("hostname: '127.0.0.1'");
|
||||
expect(AGENT_SRC).not.toContain("hostname: '0.0.0.0'");
|
||||
});
|
||||
|
||||
test('rejects /ws upgrades without chrome-extension:// Origin', () => {
|
||||
// The Origin check must run BEFORE the cookie check — otherwise a
|
||||
// missing-origin attempt would surface the 401 cookie message and
|
||||
// signal to attackers that they need to forge a cookie.
|
||||
const wsHandler = AGENT_SRC.slice(AGENT_SRC.indexOf("if (url.pathname === '/ws')"));
|
||||
expect(wsHandler).toContain('chrome-extension://');
|
||||
expect(wsHandler).toContain('forbidden origin');
|
||||
});
|
||||
|
||||
test('validates the session token against an in-memory token set', () => {
|
||||
const wsHandler = AGENT_SRC.slice(AGENT_SRC.indexOf("if (url.pathname === '/ws')"));
|
||||
// Two transports: Sec-WebSocket-Protocol (preferred for browsers) and
|
||||
// Cookie gstack_pty (fallback). Both verify against validTokens.
|
||||
expect(wsHandler).toContain('sec-websocket-protocol');
|
||||
expect(wsHandler).toContain('gstack_pty');
|
||||
expect(wsHandler).toContain('validTokens.has');
|
||||
});
|
||||
|
||||
test('Sec-WebSocket-Protocol auth: strips gstack-pty. prefix and echoes back', () => {
|
||||
const wsHandler = AGENT_SRC.slice(AGENT_SRC.indexOf("if (url.pathname === '/ws')"));
|
||||
// Browsers send `Sec-WebSocket-Protocol: gstack-pty.<token>`. The agent
|
||||
// must strip the prefix before checking validTokens, AND echo the
|
||||
// protocol back in the upgrade response — without the echo, the
|
||||
// browser closes the connection immediately.
|
||||
expect(wsHandler).toContain("'gstack-pty.'");
|
||||
expect(wsHandler).toContain('Sec-WebSocket-Protocol');
|
||||
expect(wsHandler).toContain('acceptedProtocol');
|
||||
});
|
||||
|
||||
test('lazy spawn: claude PTY is spawned in message handler, not on upgrade', () => {
|
||||
// The whole point of lazy-spawn (codex finding #8) is that the WS
|
||||
// upgrade itself does NOT call spawnClaude. Spawn happens on first
|
||||
// message frame.
|
||||
const upgradeBlock = AGENT_SRC.slice(
|
||||
AGENT_SRC.indexOf("if (url.pathname === '/ws')"),
|
||||
AGENT_SRC.indexOf("websocket: {"),
|
||||
);
|
||||
expect(upgradeBlock).not.toContain('spawnClaude(');
|
||||
// Spawn must be invoked from the message handler (lazy on first byte).
|
||||
const messageHandler = AGENT_SRC.slice(AGENT_SRC.indexOf('message(ws, raw)'));
|
||||
expect(messageHandler).toContain('spawnClaude(');
|
||||
expect(messageHandler).toContain('!session.spawned');
|
||||
});
|
||||
|
||||
test('process.on uncaughtException + unhandledRejection handlers exist', () => {
|
||||
expect(AGENT_SRC).toContain("process.on('uncaughtException'");
|
||||
expect(AGENT_SRC).toContain("process.on('unhandledRejection'");
|
||||
});
|
||||
|
||||
test('cleanup escalates SIGINT to SIGKILL after 3s on close', () => {
|
||||
// disposeSession must be idempotent and use a SIGINT-then-SIGKILL pattern.
|
||||
const dispose = AGENT_SRC.slice(AGENT_SRC.indexOf('function disposeSession'));
|
||||
expect(dispose).toContain("'SIGINT'");
|
||||
expect(dispose).toContain("'SIGKILL'");
|
||||
expect(dispose).toContain('3000');
|
||||
});
|
||||
|
||||
test('tabState frames write tabs.json + active-tab.json', () => {
|
||||
expect(AGENT_SRC).toContain("msg?.type === 'tabState'");
|
||||
expect(AGENT_SRC).toContain('function handleTabState');
|
||||
const fn = AGENT_SRC.slice(AGENT_SRC.indexOf('function handleTabState'));
|
||||
// Atomic write via tmp + rename for both files (so claude never reads
|
||||
// a half-written JSON document).
|
||||
expect(fn).toContain("'tabs.json'");
|
||||
expect(fn).toContain("'active-tab.json'");
|
||||
expect(fn).toContain('renameSync');
|
||||
// Skip chrome:// and chrome-extension:// pages — they're not useful
|
||||
// targets for browse commands.
|
||||
expect(fn).toContain("startsWith('chrome://')");
|
||||
expect(fn).toContain("startsWith('chrome-extension://')");
|
||||
});
|
||||
|
||||
test('claude is spawned with --append-system-prompt tab-awareness hint', () => {
|
||||
expect(AGENT_SRC).toContain('function buildTabAwarenessHint');
|
||||
const hint = AGENT_SRC.slice(AGENT_SRC.indexOf('function buildTabAwarenessHint'));
|
||||
// The hint must mention the live state files and the fanout command —
|
||||
// those are the two affordances that distinguish a gstack-PTY claude
|
||||
// from a plain `claude` session.
|
||||
expect(hint).toContain('tabs.json');
|
||||
expect(hint).toContain('active-tab.json');
|
||||
expect(hint).toContain('tab-each');
|
||||
// And it must be passed via --append-system-prompt at spawn time
|
||||
// (NOT written into the PTY as user input — that would pollute the
|
||||
// visible transcript).
|
||||
const spawn = AGENT_SRC.slice(AGENT_SRC.indexOf('function spawnClaude'));
|
||||
expect(spawn).toContain("'--append-system-prompt'");
|
||||
expect(spawn).toContain('tabHint');
|
||||
});
|
||||
});
|
||||
|
||||
describe('Source-level guard: server.ts /pty-session route', () => {
|
||||
test('validates AUTH_TOKEN, grants over loopback, returns token + Set-Cookie', () => {
|
||||
const route = SERVER_SRC.slice(SERVER_SRC.indexOf("url.pathname === '/pty-session'"));
|
||||
// Must check auth before minting.
|
||||
const beforeMint = route.slice(0, route.indexOf('mintPtySessionToken'));
|
||||
expect(beforeMint).toContain('validateAuth');
|
||||
// Must call the loopback grant before responding (otherwise the
|
||||
// agent's validTokens Set never sees the token and /ws would 401).
|
||||
expect(route).toContain('grantPtyToken');
|
||||
// Must return the token in the JSON body for the
|
||||
// Sec-WebSocket-Protocol auth path (cross-port cookies don't survive
|
||||
// SameSite=Strict from a chrome-extension origin).
|
||||
expect(route).toContain('ptySessionToken');
|
||||
// Set-Cookie is kept as a fallback for non-browser callers.
|
||||
expect(route).toContain('Set-Cookie');
|
||||
expect(route).toContain('buildPtySetCookie');
|
||||
});
|
||||
});
|
||||
@@ -15,6 +15,8 @@
|
||||
"devDependencies": {
|
||||
"@anthropic-ai/claude-agent-sdk": "0.2.117",
|
||||
"@anthropic-ai/sdk": "^0.78.0",
|
||||
"xterm": "5",
|
||||
"xterm-addon-fit": "^0.8.0",
|
||||
},
|
||||
},
|
||||
},
|
||||
@@ -537,6 +539,10 @@
|
||||
|
||||
"ws": ["ws@8.20.0", "", { "peerDependencies": { "bufferutil": "^4.0.1", "utf-8-validate": ">=5.0.2" }, "optionalPeers": ["bufferutil", "utf-8-validate"] }, "sha512-sAt8BhgNbzCtgGbt2OxmpuryO63ZoDk/sqaB/znQm94T4fCEsy/yV+7CdC1kJhOU9lboAEU7R3kquuycDoibVA=="],
|
||||
|
||||
"xterm": ["xterm@5.3.0", "", {}, "sha512-8QqjlekLUFTrU6x7xck1MsPzPA571K5zNqWm0M0oroYEWVOptZ0+ubQSkQ3uxIEhcIHRujJy6emDWX4A7qyFzg=="],
|
||||
|
||||
"xterm-addon-fit": ["xterm-addon-fit@0.8.0", "", { "peerDependencies": { "xterm": "^5.0.0" } }, "sha512-yj3Np7XlvxxhYF/EJ7p3KHaMt6OdwQ+HDu573Vx1lRXsVxOcnVJs51RgjZOouIZOczTsskaS+CpXspK81/DLqw=="],
|
||||
|
||||
"y18n": ["y18n@5.0.8", "", {}, "sha512-0pfFzegeDWJHJIAmTLRP2DwHjdF5s7jo9tuztdQxAhINCdvS+3nGINqPd00AphqJR/0LhANUS6/+7SCb98YOfA=="],
|
||||
|
||||
"yargs": ["yargs@17.7.2", "", { "dependencies": { "cliui": "^8.0.1", "escalade": "^3.1.1", "get-caller-file": "^2.0.5", "require-directory": "^2.1.1", "string-width": "^4.2.3", "y18n": "^5.0.5", "yargs-parser": "^21.1.1" } }, "sha512-7dSzzRQ++CKnNI/krKnYRV7JKKPUXMEh61soaHKg9mrWEhzFWhFnxPxGl+69cD1Ou63C13NUPCnmIcrvqCuM6w=="],
|
||||
|
||||
@@ -1,190 +1,200 @@
|
||||
# Sidebar Message Flow
|
||||
# Sidebar Flow
|
||||
|
||||
How the GStack Browser sidebar actually works. Read this before touching
|
||||
sidepanel.js, background.js, content.js, server.ts sidebar endpoints,
|
||||
or sidebar-agent.ts.
|
||||
`sidepanel.js`, `background.js`, `content.js`, `terminal-agent.ts`, or
|
||||
sidebar-related server endpoints.
|
||||
|
||||
The sidebar has one primary surface — the **Terminal** pane, an interactive
|
||||
`claude` PTY. Activity / Refs / Inspector survive as debug overlays behind
|
||||
the `debug` toggle in the footer. The chat queue path (one-shot `claude -p`,
|
||||
sidebar-agent.ts) was ripped out once the PTY proved out; the Terminal pane is
|
||||
strictly more capable.
|
||||
|
||||
## Components
|
||||
|
||||
```
|
||||
┌─────────────────┐ ┌──────────────┐ ┌─────────────┐ ┌────────────────┐
|
||||
│ sidepanel.js │────▶│ background.js│────▶│ server.ts │────▶│sidebar-agent.ts│
|
||||
│ (Chrome panel) │ │ (svc worker) │ │ (Bun HTTP) │ │ (Bun process) │
|
||||
└─────────────────┘ └──────────────┘ └─────────────┘ └────────────────┘
|
||||
▲ │ │
|
||||
│ polls /sidebar-chat │ polls queue file │
|
||||
└───────────────────────────────────────────┘ │
|
||||
◀──────────────────────┘
|
||||
POST /sidebar-agent/event
|
||||
┌─────────────────┐ ┌──────────────┐ ┌──────────────────┐
|
||||
│ sidepanel.js + │────▶│ server.ts │────▶│terminal-agent.ts │
|
||||
│ -terminal.js │ │ (compiled) │ │ (non-compiled) │
|
||||
│ (xterm.js) │ │ │ │ PTY listener │
|
||||
└─────────────────┘ └──────────────┘ └──────────────────┘
|
||||
▲ │ │
|
||||
│ ws://127.0.0.1:<termPort>/ws (Sec-WebSocket-Protocol auth)
|
||||
└───────────────────────┼──────────────────────▶│ Bun.spawn(claude)
|
||||
│ │ terminal: {data}
|
||||
│ ▼
|
||||
│ ┌──────────────────┐
|
||||
│ │ claude PTY │
|
||||
│ └──────────────────┘
|
||||
POST /pty-session │
|
||||
(Bearer AUTH_TOKEN) │
|
||||
▼
|
||||
┌──────────────────┐
|
||||
│ pty-session- │
|
||||
│ cookie.ts │
|
||||
│ (in-memory token │
|
||||
│ registry) │
|
||||
└──────────────────┘
|
||||
│
|
||||
│ POST /internal/grant (loopback)
|
||||
▼
|
||||
┌──────────────────┐
|
||||
│ validTokens Set │
|
||||
│ in agent memory │
|
||||
└──────────────────┘
|
||||
```
|
||||
|
||||
## Startup Timeline
|
||||
The compiled browse server can't `posix_spawn` external executables —
|
||||
`terminal-agent.ts` runs as a separate non-compiled `bun run` process and
|
||||
owns the `claude` subprocess.
|
||||
|
||||
## Startup + first-keystroke timeline
|
||||
|
||||
```
|
||||
T+0ms CLI runs `$B connect`
|
||||
├── Server starts on port 34567
|
||||
├── Writes state to .gstack/browse.json (pid, port, token)
|
||||
├── Launches headed Chromium with extension
|
||||
└── Clears sidebar-agent-queue.jsonl
|
||||
├── Server starts (compiled)
|
||||
└── Spawns terminal-agent.ts via `bun run`
|
||||
|
||||
T+500ms sidebar-agent.ts spawned by CLI
|
||||
├── Reads auth token from .gstack/browse.json
|
||||
├── Creates queue file if missing
|
||||
├── Sets lastLine = current line count
|
||||
└── Starts polling every 200ms
|
||||
T+500ms terminal-agent.ts boots
|
||||
├── Bun.serve on 127.0.0.1:0 (random port)
|
||||
├── Writes <stateDir>/terminal-port (server reads it for /health)
|
||||
├── Writes <stateDir>/terminal-internal-token (loopback handshake)
|
||||
└── Probes claude → writes claude-available.json
|
||||
|
||||
T+1-3s Extension loads in Chromium
|
||||
├── background.js: health poll every 1s (fast startup)
|
||||
│ └── GET /health → gets auth token
|
||||
├── content.js: injects on welcome page
|
||||
│ └── Does NOT fire gstack-extension-ready (waits for sidebar)
|
||||
└── Side panel: may auto-open via chrome.sidePanel.open()
|
||||
T+1-3s Extension loads, sidebar opens
|
||||
├── sidepanel-terminal.js: setState(IDLE), shows "Starting Claude Code..."
|
||||
└── tryAutoConnect() polls until window.gstackServerPort + token are set
|
||||
|
||||
T+2-10s Side panel connects
|
||||
├── tryConnect() → asks background for port/token
|
||||
├── Fallback: direct GET /health for token
|
||||
├── updateConnection(url, token)
|
||||
│ ├── Starts chat polling (1s interval)
|
||||
│ ├── Starts tab polling (2s interval)
|
||||
│ ├── Connects SSE activity stream
|
||||
│ └── Sends { type: 'sidebarOpened' } to background
|
||||
└── background relays to content script → hides welcome arrow
|
||||
|
||||
T+10s+ Ready for messages
|
||||
T+ready tryAutoConnect calls connect()
|
||||
├── POST /pty-session (Authorization: Bearer AUTH_TOKEN)
|
||||
│ └── server mints session token, posts /internal/grant to agent
|
||||
│ └── responds with {terminalPort, ptySessionToken}
|
||||
├── GET /claude-available (preflight)
|
||||
├── new WebSocket(`ws://127.0.0.1:<terminalPort>/ws`,
|
||||
│ [`gstack-pty.<token>`])
|
||||
│ └── Browser sends Sec-WebSocket-Protocol + Origin
|
||||
│ └── Agent validates Origin AND token BEFORE upgrading
|
||||
│ └── Agent echoes the protocol back (REQUIRED — browser
|
||||
│ closes the connection without it)
|
||||
├── On open: send {type:"resize"} then a single \n byte
|
||||
└── Agent message handler sees the byte → spawnClaude()
|
||||
```
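
The token handshake at the end of the timeline above can be sketched as a pair of pure helpers. This is a minimal sketch, not the code in `terminal-agent.ts`: the names `buildPtyProtocol` and `parsePtyProtocol` are hypothetical, and the only details taken from this document are the `gstack-pty.` prefix and the requirement that the agent echo the accepted protocol back.

```typescript
// Hypothetical helpers for illustration; names are not taken from the repo.
const PTY_PROTOCOL_PREFIX = 'gstack-pty.';

/** Client side: the subprotocol string passed as the second arg of `new WebSocket`. */
function buildPtyProtocol(token: string): string {
  return `${PTY_PROTOCOL_PREFIX}${token}`;
}

/**
 * Agent side: extract the session token from a Sec-WebSocket-Protocol header.
 * The header may carry several comma-separated subprotocols; this returns the
 * first one with the prefix, plus the exact string to echo in the 101 response.
 */
function parsePtyProtocol(
  header: string | null,
): { token: string; acceptedProtocol: string } | null {
  if (!header) return null;
  for (const raw of header.split(',')) {
    const proto = raw.trim();
    if (proto.startsWith(PTY_PROTOCOL_PREFIX) && proto.length > PTY_PROTOCOL_PREFIX.length) {
      return {
        token: proto.slice(PTY_PROTOCOL_PREFIX.length),
        acceptedProtocol: proto,
      };
    }
  }
  return null;
}
```

On the extension side, the result of `buildPtyProtocol(ptySessionToken)` would go in the protocols array of `new WebSocket(url, [...])`. On the agent side, `acceptedProtocol` is the value to send back as `Sec-WebSocket-Protocol` once the token has been checked.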
|
||||
|
||||
## Message Flow: User Types → Claude Responds
|
||||
## Auth: WebSocket can't send Authorization headers
|
||||
|
||||
```
|
||||
1. User types "go to hn" in sidebar, hits Enter
|
||||
Browser WebSocket clients can't set `Authorization`. They CAN set
|
||||
`Sec-WebSocket-Protocol` via the second arg of `new WebSocket(url,
|
||||
protocols)`. We exploit that:
|
||||
|
||||
2. sidepanel.js sendMessage()
|
||||
├── Renders user bubble immediately (optimistic)
|
||||
├── Renders thinking dots immediately
|
||||
├── Switches to fast poll (300ms)
|
||||
└── chrome.runtime.sendMessage({ type: 'sidebar-command', message, tabId })
|
||||
1. `POST /pty-session` (auth: Bearer AUTH_TOKEN) → server mints a
|
||||
short-lived session token, pushes it to the agent over loopback,
|
||||
returns it in the JSON body.
|
||||
2. Extension calls `new WebSocket(url, ['gstack-pty.<token>'])`.
|
||||
3. Agent reads `Sec-WebSocket-Protocol`, strips `gstack-pty.`, validates
|
||||
against `validTokens`, echoes the protocol back. Echo is mandatory —
|
||||
without it Chromium closes the connection on receipt of the upgrade
|
||||
response.
|
||||
|
||||
3. background.js
|
||||
├── Gets active Chrome tab URL
|
||||
└── POST /sidebar-command { message, activeTabUrl }
|
||||
with Authorization: Bearer ${authToken}
|
||||
A `Set-Cookie: gstack_pty=...` header is also returned for non-browser
|
||||
callers (curl, integration tests). The cookie path was the original v1
|
||||
design but `SameSite=Strict` cookies don't survive the cross-port jump
|
||||
from server.ts:34567 → agent:<random> from a chrome-extension origin.
|
||||
The protocol-token path is what the browser actually uses.
|
||||
|
||||
4. server.ts /sidebar-command handler
|
||||
├── validateAuth(req)
|
||||
├── syncActiveTabByUrl(extensionUrl) — syncs Playwright tab to Chrome tab
|
||||
├── pickSidebarModel(message) — 'sonnet' for actions, 'opus' for analysis
|
||||
├── Adds user message to chat buffer
|
||||
├── Builds system prompt + args
|
||||
└── Appends JSON to ~/.gstack/sidebar-agent-queue.jsonl
|
||||
### Dual-token model
|
||||
|
||||
5. sidebar-agent.ts poll() (within 200ms)
|
||||
├── Reads new line from queue file
|
||||
├── Parses JSON entry
|
||||
├── Checks processingTabs — skips if tab already has agent running
|
||||
└── askClaude(entry) — fire and forget
|
||||
| Token | Lives in | Used for | Lifetime |
|
||||
|-------|----------|----------|----------|
|
||||
| `AUTH_TOKEN` | `<stateDir>/browse.json`; in-memory in server.ts | `/pty-session` POST (mint cookie + token) | server lifetime |
|
||||
| `gstack-pty.<...>` (Sec-WebSocket-Protocol) | Browser memory only; agent `validTokens` Set | `/ws` upgrade auth | 30 min, auto-revoked on WS close |
|
||||
| `INTERNAL_TOKEN` | `<stateDir>/terminal-internal-token`; in agent memory | server → agent loopback `/internal/grant` | agent lifetime |
|
||||
|
||||
6. sidebar-agent.ts askClaude()
|
||||
├── spawn('claude', ['-p', prompt, '--model', model, ...])
|
||||
├── Streams stdout line-by-line (stream-json format)
|
||||
├── For each event: POST /sidebar-agent/event { type, tool, text, tabId }
|
||||
└── On close: POST /sidebar-agent/event { type: 'agent_done' }
|
||||
`AUTH_TOKEN` is **never** valid for `/ws` directly. The session token is
|
||||
**never** valid for `/pty-session` or `/command`. Strict separation
|
||||
prevents an SSE or page-content token leak from escalating into shell
|
||||
access.
|
||||
|
||||
7. server.ts processAgentEvent()
|
||||
├── Adds entry to chat buffer (in-memory + disk)
|
||||
├── On agent_done: sets tab status to 'idle'
|
||||
└── On agent_done: processes next queued message for that tab
|
||||
## Threat model
|
||||
|
||||
8. sidepanel.js pollChat() (every 300ms during fast poll)
|
||||
├── GET /sidebar-chat?after=${chatLineCount}&tabId=${tabId}
|
||||
├── Renders new entries (text, tool_use, agent_done)
|
||||
└── On agent idle: removes thinking dots, stops fast poll
|
||||
```
|
||||
The Terminal pane **bypasses the prompt-injection security stack** on
|
||||
purpose — the user is typing directly to claude, there's no untrusted
|
||||
page content in the loop. Trust source is the keyboard, same as any
|
||||
local terminal.
|
||||
|
||||
## Arrow Hint Hide Flow (4-step signal chain)
|
||||
That trust assumption rests on three transport guarantees:
|
||||
|
||||
The welcome page shows a right-pointing arrow until the sidebar opens.
|
||||
1. **Local-only listener.** terminal-agent.ts binds `127.0.0.1` only.
|
||||
The dual-listener tunnel surface (server.ts `TUNNEL_PATHS`) does
|
||||
not include `/pty-session` or `/terminal/*`, so the tunnel returns
|
||||
404 by default-deny.
|
||||
2. **Origin gate.** `/ws` upgrades require
|
||||
`Origin: chrome-extension://<id>`. A localhost web page can't mount
|
||||
a cross-site WebSocket hijack against the shell because its Origin
|
||||
is a regular `http(s)://...`.
|
||||
3. **Session token auth.** Minted only by an authenticated
|
||||
`/pty-session` POST, scoped to one WS, auto-revoked on close.
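
The Origin gate and session-token check described above can be sketched as one pure function. This is a minimal illustration, not the code in terminal-agent.ts: `authorizeUpgrade`, `revokeOnClose`, and the `reason` strings are hypothetical names, and the sketch only checks the `chrome-extension://` prefix rather than a specific extension id. The `gstack-pty.` prefix and the `validTokens` Set come from the token table.

```javascript
// Hypothetical sketch: these function names are not taken from terminal-agent.ts.
const TOKEN_PREFIX = 'gstack-pty.';

/**
 * Decide whether a /ws upgrade may proceed.
 * req.origin    : value of the Origin header
 * req.protocols : value of the Sec-WebSocket-Protocol header
 * validTokens   : Set of session tokens granted by the server
 */
function authorizeUpgrade(req, validTokens) {
  // Origin gate: a regular http(s) page is rejected before any token check.
  if (!req.origin || !req.origin.startsWith('chrome-extension://')) {
    return { ok: false, reason: 'bad-origin' };
  }
  // Sec-WebSocket-Protocol carries a comma-separated list of offered subprotocols.
  const offered = (req.protocols || '').split(',').map((p) => p.trim());
  const entry = offered.find((p) => p.startsWith(TOKEN_PREFIX));
  if (!entry) return { ok: false, reason: 'no-token' };
  const token = entry.slice(TOKEN_PREFIX.length);
  if (!validTokens.has(token)) return { ok: false, reason: 'unknown-token' };
  return { ok: true, token };
}

/** Revoke on WS close so a captured token cannot be replayed. */
function revokeOnClose(token, validTokens) {
  validTokens.delete(token);
}
```

Keeping the decision in a pure function makes each guarantee testable on its own, without opening a socket.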
|
||||
|
||||
```
|
||||
1. sidepanel.js updateConnection()
|
||||
└── chrome.runtime.sendMessage({ type: 'sidebarOpened' })
|
||||
Drop any one of those three and the whole tab becomes unsafe.
|
||||
|
||||
2. background.js
|
||||
└── chrome.tabs.sendMessage(activeTabId, { type: 'sidebarOpened' })
|
||||
## Lifecycle
|
||||
|
||||
3. content.js onMessage handler
|
||||
└── document.dispatchEvent(new CustomEvent('gstack-extension-ready'))
|
||||
- **Eager auto-connect.** Sidebar opens → tryAutoConnect polls for the
|
||||
bootstrap globals and connects as soon as they're set. No keypress
|
||||
required.
|
||||
- **One PTY per WS.** Closing the WebSocket SIGINTs claude, then SIGKILLs
|
||||
after 3s. The session token is revoked so a stolen token can't be
|
||||
replayed.
|
||||
- **No auto-reconnect on close.** The user sees "Session ended, click to
|
||||
start a new session." Auto-reconnect would burn a fresh claude session
|
||||
on every reload. v1.1 may add session resumption keyed on tab/session
|
||||
id (see TODOS).
|
||||
- **Manual restart anytime.** A `↻ Restart` button lives in the always-
|
||||
visible terminal toolbar — works mid-session, not just from the ENDED
|
||||
state.
|
||||
|
||||
4. welcome.html script
|
||||
└── addEventListener('gstack-extension-ready', () => arrow.classList.add('hidden'))
|
||||
```
|
||||
## Quick-action toolbar
|
||||
|
||||
The arrow does NOT hide when the extension loads. Only when the sidebar connects.
|
||||
Three browser-action buttons live next to the Restart button at the top
|
||||
of the Terminal pane:
|
||||
|
||||
## Auth Token Flow
|
||||
| Button | Behavior |
|
||||
|--------|----------|
|
||||
| 🧹 Cleanup | `window.gstackInjectToTerminal(prompt)` — pipes a "remove ads/banners" instruction into the live PTY. claude in the terminal sees it and acts. |
|
||||
| 📸 Screenshot | `POST /command screenshot` — direct browse-server call, no PTY involvement. |
|
||||
| 🍪 Cookies | Navigates to the `/cookie-picker` page. |
|
||||
|
||||
```
|
||||
Server starts → AUTH_TOKEN = crypto.randomUUID()
|
||||
│
|
||||
├── GET /health (no auth) → returns { token: AUTH_TOKEN }
|
||||
│
|
||||
├── background.js checkHealth() → authToken = data.token
|
||||
│ └── Refreshes on EVERY health poll (fixes stale token on restart)
|
||||
│
|
||||
├── sidepanel.js tryConnect() → serverToken from background or /health
|
||||
│ └── Used for chat polling: Authorization: Bearer ${serverToken}
|
||||
│
|
||||
└── sidebar-agent.ts refreshToken() → reads from .gstack/browse.json
|
||||
└── Used for event relay: Authorization: Bearer ${authToken}
|
||||
```
|
||||
The Inspector's "Send to Code" button uses the same `gstackInjectToTerminal`
|
||||
path to forward CSS inspector data into claude.
|
||||
|
||||
If the server restarts, all three components get fresh tokens within 10s
|
||||
(background health poll interval).
|
||||
## Debug surfaces (Activity / Refs / Inspector)
|
||||
|
||||
## Model Routing
|
||||
Behind the `debug` toggle in the footer. SSE-driven, independent of the
|
||||
Terminal pane:
|
||||
|
||||
`pickSidebarModel(message)` in server.ts classifies messages:
|
||||
- **Activity** — streams every browse command via `/activity/stream` SSE.
|
||||
- **Refs** — REST: `GET /refs` — current page's `@ref` element labels.
|
||||
- **Inspector** — CDP-based element picker; SSE on `/inspector/events`.
|
||||
|
||||
| Pattern | Model | Why |
|
||||
|---------|-------|-----|
|
||||
| "click @e24", "go to hn", "screenshot" | sonnet | Deterministic tool calls, no thinking needed |
|
||||
| "what does this page say?", "summarize" | opus | Needs comprehension |
|
||||
| "find bugs", "check for broken links" | opus | Analysis task |
|
||||
| "navigate to X and fill the form" | sonnet | Action-oriented, no analysis words |
|
||||
When the debug strip closes, the Terminal pane becomes visible again.
|
||||
xterm.js doesn't auto-redraw when its container flips from `display:none`
|
||||
to `display:flex`, so sidepanel-terminal.js runs a `MutationObserver` on
|
||||
`#tab-terminal`'s class attribute and forces a fit + refresh when
|
||||
`.active` returns.
|
||||
|
||||
Analysis words (`what`, `why`, `how`, `summarize`, `describe`, `analyze`, `read X and Y`)
|
||||
always override action verbs and force opus.
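
The override rule can be sketched as a single regular-expression test. This is a simplified illustration, not the actual `pickSidebarModel` in server.ts: `pickModelSketch` and `ANALYSIS_WORDS` are hypothetical names, and the sketch covers only the single-word analysis terms listed above. It does not handle the `read X and Y` pattern or task phrases such as "find bugs".

```javascript
// Hypothetical sketch; the real classifier in server.ts may differ.
// Word boundaries keep "how" from matching inside words like "show".
const ANALYSIS_WORDS = /\b(what|why|how|summarize|describe|analyze)\b/i;

function pickModelSketch(message) {
  // Analysis words win even when the message also contains action verbs.
  return ANALYSIS_WORDS.test(message) ? 'opus' : 'sonnet';
}
```

With this rule, a mixed message such as "go to hn and summarize" routes to opus because the analysis word takes precedence over the navigation verb.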
|
||||
|
||||
## Known Failure Modes
|
||||
|
||||
| Failure | Symptom | Root Cause | Fix |
|
||||
|---------|---------|------------|-----|
|
||||
| Stale auth token | "Unauthorized" in input | Server restarted, background had old token | background.js refreshes token on every health poll |
|
||||
| Tab ID mismatch | Message sent, no response visible | Server assigned tabId 1, sidebar polling tabId 0 | switchChatTab preserves optimistic UI during switch |
|
||||
| Sidebar agent not running | Messages queue forever | Agent process failed to spawn or crashed | Check `ps aux \| grep sidebar-agent` |
|
||||
| Agent stale token | Agent runs but no events appear in sidebar | sidebar-agent has old token from .gstack/browse.json | Agent re-reads token before each event POST |
|
||||
| Queue file missing | spawnClaude fails | Race between server start and agent start | Both sides create file if missing |
|
||||
| Optimistic UI blown away | User bubble + dots vanish | switchChatTab replaced DOM with welcome screen | Preserved DOM when lastOptimisticMsg is set |
|
||||
|
||||
## Per-Tab Concurrency
|
||||
|
||||
Each browser tab can run its own agent simultaneously:
|
||||
|
||||
- Server: `tabAgents: Map<number, TabAgentState>` with per-tab queue (max 5)
|
||||
- sidebar-agent: `processingTabs: Set<number>` prevents duplicate spawns
|
||||
- Two messages on same tab: queued sequentially, processed in order
|
||||
- Two messages on different tabs: run concurrently
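
The queueing rules above can be sketched with a Map of per-tab queues and a Set of busy tabs. This is a minimal illustration, not the server's `tabAgents` implementation: `TabScheduler`, `submit`, and `drain` are hypothetical names, and the sketch assumes "max 5" counts pending messages, not the one currently running.

```javascript
// Hypothetical sketch of per-tab scheduling; not the code in server.ts.
const MAX_QUEUE = 5; // per-tab queue cap stated above

class TabScheduler {
  constructor(run) {
    this.run = run;              // async (tabId, message) => void
    this.queues = new Map();     // tabId -> pending messages
    this.processing = new Set(); // tabIds with an agent in flight
  }

  /** Enqueue a message; returns false when the tab's queue is full. */
  submit(tabId, message) {
    const queue = this.queues.get(tabId) || [];
    if (queue.length >= MAX_QUEUE) return false;
    queue.push(message);
    this.queues.set(tabId, queue);
    this.drain(tabId);
    return true;
  }

  /** Start the next message for a tab unless that tab is already busy. */
  drain(tabId) {
    if (this.processing.has(tabId)) return; // one agent per tab
    const queue = this.queues.get(tabId);
    if (!queue || queue.length === 0) return;
    const message = queue.shift();
    this.processing.add(tabId);
    Promise.resolve()
      .then(() => this.run(tabId, message))
      .catch(() => {}) // a failed run must not block the tab's queue
      .then(() => {
        this.processing.delete(tabId);
        this.drain(tabId); // same tab: next message, in order
      });
  }
}
```

Because the busy check is keyed by tab id, messages on different tabs start immediately while messages on the same tab wait their turn.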
|
||||
|
||||
## File Locations
|
||||
## Files
|
||||
|
||||
| Component | File | Runs in |
|
||||
|-----------|------|---------|
|
||||
| Sidebar UI | `extension/sidepanel.js` | Chrome side panel |
|
||||
| Sidebar UI shell | `extension/sidepanel.html` + `sidepanel.js` + `sidepanel.css` | Chrome side panel |
|
||||
| Terminal UI | `extension/sidepanel-terminal.js` + `extension/lib/xterm.js` | Chrome side panel |
|
||||
| Service worker | `extension/background.js` | Chrome background |
|
||||
| Content script | `extension/content.js` | Page context |
|
||||
| Welcome page | `browse/src/welcome.html` | Page context |
|
||||
| HTTP server | `browse/src/server.ts` | Bun (compiled binary) |
|
||||
| Agent process | `browse/src/sidebar-agent.ts` | Bun (non-compiled, can spawn) |
|
||||
| PTY agent | `browse/src/terminal-agent.ts` | Bun (non-compiled) |
|
||||
| PTY token store | `browse/src/pty-session-cookie.ts` | Bun (compiled, in server.ts) |
|
||||
| CLI entry | `browse/src/cli.ts` | Bun (compiled binary) |
|
||||
| Queue file | `~/.gstack/sidebar-agent-queue.jsonl` | Filesystem |
|
||||
| State file | `.gstack/browse.json` | Filesystem |
|
||||
| Chat log | `~/.gstack/sessions/<id>/chat.jsonl` | Filesystem |
|
||||
| State file | `<stateDir>/browse.json` | Filesystem |
|
||||
| Terminal port | `<stateDir>/terminal-port` | Filesystem |
|
||||
| Internal token | `<stateDir>/terminal-internal-token` | Filesystem |
|
||||
| Claude probe | `<stateDir>/claude-available.json` | Filesystem |
|
||||
| Active tab | `<stateDir>/active-tab.json` | Filesystem (claude reads) |
|
||||
|
||||
+59
-4
@@ -287,6 +287,7 @@ chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
|
||||
const ALLOWED_TYPES = new Set([
|
||||
'getPort', 'setPort', 'getServerUrl', 'getToken', 'fetchRefs',
|
||||
'openSidePanel', 'sidebarOpened', 'command', 'sidebar-command',
|
||||
'getTabState',
|
||||
// Inspector message types
|
||||
'startInspector', 'stopInspector', 'elementPicked', 'pickerCancelled',
|
||||
'applyStyle', 'toggleClass', 'injectCSS', 'resetAll',
|
||||
@@ -302,6 +303,11 @@ chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
|
||||
return true;
|
||||
}
|
||||
|
||||
if (msg.type === 'getTabState') {
|
||||
snapshotTabs().then(snap => sendResponse(snap || { active: null, tabs: [] }));
|
||||
return true; // async sendResponse
|
||||
}
|
||||
|
||||
if (msg.type === 'setPort') {
|
||||
savePort(msg.port).then(() => {
|
||||
checkHealth();
|
||||
@@ -506,11 +512,48 @@ chrome.runtime.onInstalled.addListener(() => {
|
||||
// Fire on every service worker startup (covers persistent context reuse)
|
||||
autoOpenSidePanel();
|
||||
|
||||
// ─── Tab Switch Detection ────────────────────────────────────────
|
||||
// Notify sidepanel instantly when the user switches tabs in the browser.
|
||||
// This is faster than polling — the sidebar swaps chat context immediately.
|
||||
// ─── Tab Awareness ───────────────────────────────────────────────
|
||||
// Push live tab state to the sidepanel so claude in the Terminal pane
|
||||
// always has up-to-date tabs.json + active-tab.json on disk. The
|
||||
// sidepanel relays these to terminal-agent.ts over the live WebSocket;
|
||||
// terminal-agent writes the files for claude to read.
|
||||
|
||||
async function snapshotTabs() {
|
||||
try {
|
||||
const [active] = await chrome.tabs.query({ active: true, currentWindow: true });
|
||||
const all = await chrome.tabs.query({});
|
||||
const slim = all.map(t => ({
|
||||
tabId: t.id,
|
||||
url: t.url || '',
|
||||
title: t.title || '',
|
||||
active: !!t.active,
|
||||
windowId: t.windowId,
|
||||
pinned: !!t.pinned,
|
||||
audible: !!t.audible,
|
||||
}));
|
||||
return {
|
||||
active: active ? { tabId: active.id, url: active.url || '', title: active.title || '' } : null,
|
||||
tabs: slim,
|
||||
};
|
||||
} catch {
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
async function pushTabState(reason) {
|
||||
const snapshot = await snapshotTabs();
|
||||
if (!snapshot) return;
|
||||
chrome.runtime.sendMessage({
|
||||
type: 'browserTabState',
|
||||
reason,
|
||||
...snapshot,
|
||||
}).catch(() => {}); // expected: sidepanel may not be open
|
||||
}
|
||||
|
||||
chrome.tabs.onActivated.addListener((activeInfo) => {
|
||||
// Keep the legacy event for any consumer still listening to it (the chat
|
||||
// path is gone but the message type is harmless), and also fire the new
|
||||
// unified state push so claude's tabs.json reflects the new active tab.
|
||||
chrome.tabs.get(activeInfo.tabId, (tab) => {
|
||||
if (chrome.runtime.lastError || !tab) return;
|
||||
chrome.runtime.sendMessage({
|
||||
@@ -518,8 +561,20 @@ chrome.tabs.onActivated.addListener((activeInfo) => {
|
||||
tabId: activeInfo.tabId,
|
||||
url: tab.url || '',
|
||||
title: tab.title || '',
|
||||
}).catch(() => {}); // expected: sidepanel may not be open
|
||||
}).catch(() => {});
|
||||
});
|
||||
pushTabState('activated');
|
||||
});
|
||||
|
||||
chrome.tabs.onCreated.addListener(() => pushTabState('created'));
|
||||
chrome.tabs.onRemoved.addListener(() => pushTabState('removed'));
|
||||
chrome.tabs.onUpdated.addListener((_id, changeInfo) => {
|
||||
// Throttle: only re-push on URL or title changes, or once the load
|
||||
// completes, not on every loading tick. We don't want to spam claude
|
||||
// with a state push every 50ms while a page loads.
|
||||
if (changeInfo.url || changeInfo.title || changeInfo.status === 'complete') {
|
||||
pushTabState('updated');
|
||||
}
|
||||
});
|
||||
|
||||
// ─── Startup ────────────────────────────────────────────────────
|
||||
|
||||
@@ -4,7 +4,7 @@
|
||||
"version": "0.1.0",
|
||||
"description": "Live activity feed and @ref overlays for gstack browse",
|
||||
"permissions": ["sidePanel", "storage", "activeTab", "scripting"],
|
||||
"host_permissions": ["http://127.0.0.1:*/"],
|
||||
"host_permissions": ["http://127.0.0.1:*/", "ws://127.0.0.1:*/"],
|
||||
"action": {
|
||||
"default_icon": {
|
||||
"16": "icons/icon-16.png",
|
||||
|
||||
@@ -0,0 +1,442 @@
|
||||
/**
|
||||
* Terminal sidebar tab — interactive Claude Code PTY in xterm.js.
|
||||
*
|
||||
* Lifecycle (per plan + codex review):
|
||||
* 1. Sidebar opens. Terminal is the default-active tab.
|
||||
* 2. Bootstrap card shows "Starting Claude Code..." while tryAutoConnect()
|
||||
* waits for sidepanel.js to publish the server port + AUTH_TOKEN.
|
||||
* 3. As soon as both are set (no keypress needed), the extension
|
||||
* a) POSTs /pty-session on the browse server with the AUTH_TOKEN to
|
||||
* mint a short-lived session token for the terminal-agent.
|
||||
* b) Opens ws://127.0.0.1:<terminalPort>/ws with the token in
|
||||
* Sec-WebSocket-Protocol. Terminal-agent validates the token + the
|
||||
* chrome-extension:// Origin (codex finding #9), then spawns
|
||||
* claude in a PTY.
|
||||
* 4. Bytes pump both ways. Resize observer sends {type:"resize"} text
|
||||
* frames; tab-state pushes send {type:"tabState"} frames.
|
||||
* 5. PTY exits or WS closes -> we show "Session ended" with a restart
|
||||
* button. We do NOT auto-reconnect (codex finding #8: auto-reconnect
|
||||
* = burn fresh claude session every time).
|
||||
*
|
||||
* Keep this file dependency-free. xterm.js + xterm-addon-fit are loaded
|
||||
* via <script src> tags in sidepanel.html (window.Terminal, window.FitAddon).
|
||||
*/
|
||||
(function () {
|
||||
'use strict';
|
||||
|
||||
const Terminal = window.Terminal;
|
||||
const FitAddonModule = window.FitAddon;
|
||||
if (!Terminal) {
|
||||
console.error('[gstack terminal] xterm not loaded');
|
||||
return;
|
||||
}
|
||||
|
||||
const els = {
|
||||
bootstrap: document.getElementById('terminal-bootstrap'),
|
||||
bootstrapStatus: document.getElementById('terminal-bootstrap-status'),
|
||||
installCard: document.getElementById('terminal-install-card'),
|
||||
installRetry: document.getElementById('terminal-install-retry'),
|
||||
mount: document.getElementById('terminal-mount'),
|
||||
ended: document.getElementById('terminal-ended'),
|
||||
restart: document.getElementById('terminal-restart'),
|
||||
restartNow: document.getElementById('terminal-restart-now'),
|
||||
};
|
||||
|
||||
/** State machine. */
|
||||
const STATE = { IDLE: 'idle', CONNECTING: 'connecting', LIVE: 'live', ENDED: 'ended', NO_CLAUDE: 'no-claude' };
|
||||
let state = STATE.IDLE;
|
||||
|
||||
let term = null;
|
||||
let fitAddon = null;
|
||||
let ws = null;
|
||||
|
||||
function show(el) { el.style.display = ''; }
|
||||
function hide(el) { el.style.display = 'none'; }
|
||||
|
||||
function setState(next, opts = {}) {
|
||||
state = next;
|
||||
switch (next) {
|
||||
case STATE.IDLE:
|
||||
show(els.bootstrap);
|
||||
hide(els.installCard);
|
||||
hide(els.mount);
|
||||
hide(els.ended);
|
||||
els.bootstrapStatus.textContent = opts.message || 'Press any key to start Claude Code.';
|
||||
break;
|
||||
case STATE.CONNECTING:
|
||||
show(els.bootstrap);
|
||||
hide(els.installCard);
|
||||
hide(els.mount);
|
||||
hide(els.ended);
|
||||
els.bootstrapStatus.textContent = 'Connecting...';
|
||||
break;
|
||||
case STATE.LIVE:
|
||||
hide(els.bootstrap);
|
||||
hide(els.installCard);
|
||||
show(els.mount);
|
||||
hide(els.ended);
|
||||
break;
|
||||
case STATE.ENDED:
|
||||
hide(els.bootstrap);
|
||||
hide(els.installCard);
|
||||
hide(els.mount);
|
||||
show(els.ended);
|
||||
break;
|
||||
case STATE.NO_CLAUDE:
|
||||
show(els.bootstrap);
|
||||
show(els.installCard);
|
||||
hide(els.mount);
|
||||
hide(els.ended);
|
||||
els.bootstrapStatus.textContent = '';
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Read auth + terminalPort from the server's /health. We don't fetch this
|
||||
* here — sidepanel.js already polls /health for connection state and
|
||||
* exposes the relevant fields on window.gstackHealth (set below in init()).
|
||||
* If terminalPort is missing, the agent isn't ready yet.
|
||||
*/
|
||||
function getHealth() {
|
||||
return window.gstackHealth || {};
|
||||
}
|
||||
|
||||
function getServerPort() {
|
||||
return window.gstackServerPort || null;
|
||||
}
|
||||
|
||||
function getAuthToken() {
|
||||
return window.gstackAuthToken || null;
|
||||
}
|
||||
|
||||
/**
|
||||
* POST /pty-session to mint a fresh terminal session. Returns
|
||||
* { terminalPort, ptySessionToken, expiresAt } on success, or
|
||||
* { error } on failure. The token rides on the WebSocket
|
||||
* Sec-WebSocket-Protocol header, which is the only auth header
|
||||
* the browser WebSocket API lets us set. The token is NOT persisted —
|
||||
* each sidebar load mints a fresh one and discards it on close.
|
||||
*/
|
||||
async function mintSession() {
|
||||
const serverPort = getServerPort();
|
||||
const token = getAuthToken();
|
||||
if (!serverPort || !token) {
|
||||
return { error: 'browse server not ready' };
|
||||
}
|
||||
try {
|
||||
const resp = await fetch(`http://127.0.0.1:${serverPort}/pty-session`, {
|
||||
method: 'POST',
|
||||
headers: { 'Authorization': `Bearer ${token}` },
|
||||
credentials: 'include',
|
||||
});
|
||||
if (!resp.ok) {
|
||||
const body = await resp.text().catch(() => '');
|
||||
return { error: `${resp.status} ${body || resp.statusText}` };
|
||||
}
|
||||
return await resp.json();
|
||||
} catch (err) {
|
||||
return { error: err && err.message ? err.message : String(err) };
|
||||
}
|
||||
}
|
||||
|
||||
async function checkClaudeAvailable(terminalPort) {
|
||||
try {
|
||||
const resp = await fetch(`http://127.0.0.1:${terminalPort}/claude-available`, {
|
||||
credentials: 'include',
|
||||
});
|
||||
if (!resp.ok) return { available: false };
|
||||
return await resp.json();
|
||||
} catch {
|
||||
return { available: false };
|
||||
}
|
||||
}
|
||||
|
||||
function ensureXterm() {
|
||||
if (term) return;
|
||||
term = new Terminal({
|
||||
fontFamily: '"JetBrains Mono", "SF Mono", Menlo, monospace',
|
||||
fontSize: 13,
|
||||
theme: { background: '#0a0a0a', foreground: '#e5e5e5' },
|
||||
cursorBlink: true,
|
||||
scrollback: 5000,
|
||||
allowTransparency: false,
|
||||
convertEol: false,
|
||||
});
|
||||
if (FitAddonModule && FitAddonModule.FitAddon) {
|
||||
fitAddon = new FitAddonModule.FitAddon();
|
||||
term.loadAddon(fitAddon);
|
||||
}
|
||||
// CRITICAL: caller must make els.mount visible BEFORE invoking
|
||||
// ensureXterm. xterm.js measures the container synchronously inside
|
||||
// term.open() — if the mount is display:none, xterm caches a 0-size
|
||||
// viewport and never auto-grows even after the container goes
|
||||
// visible. The visible-first pattern is enforced by connect()
|
||||
// calling setState(STATE.LIVE) before us.
|
||||
term.open(els.mount);
|
||||
// First fit waits for the next paint frame so the browser has
|
||||
// applied the .active class transition. Otherwise term.cols/rows
|
||||
// can come back as the minimum (2x2) when the mount's clientHeight
|
||||
// is still being computed.
|
||||
requestAnimationFrame(() => {
|
||||
try {
|
||||
fitAddon && fitAddon.fit();
|
||||
if (ws && ws.readyState === WebSocket.OPEN) {
|
||||
ws.send(JSON.stringify({ type: 'resize', cols: term.cols, rows: term.rows }));
|
||||
}
|
||||
} catch {}
|
||||
});
|
||||
|
||||
const ro = new ResizeObserver(() => {
|
||||
try {
|
||||
fitAddon && fitAddon.fit();
|
||||
if (ws && ws.readyState === WebSocket.OPEN) {
|
||||
ws.send(JSON.stringify({ type: 'resize', cols: term.cols, rows: term.rows }));
|
||||
}
|
||||
} catch {}
|
||||
});
|
||||
ro.observe(els.mount);
|
||||
|
||||
term.onData((data) => {
|
||||
if (ws && ws.readyState === WebSocket.OPEN) {
|
||||
ws.send(new TextEncoder().encode(data));
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
* Inject a string into the live PTY (the same way a real keystroke would).
|
||||
* Used by the toolbar's Cleanup button and the Inspector's "Send to Code"
|
||||
* action so the user can drive claude from outside-the-keyboard surfaces.
|
||||
* Returns true if the bytes went out, false if no live session.
|
||||
*/
|
||||
window.gstackInjectToTerminal = function (text) {
|
||||
if (!text || !ws || ws.readyState !== WebSocket.OPEN) return false;
|
||||
try {
|
||||
ws.send(new TextEncoder().encode(text));
|
||||
return true;
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
};
|
||||
|
||||
async function connect() {
|
||||
if (state !== STATE.IDLE) return; // already connecting/live
|
||||
setState(STATE.CONNECTING);
|
||||
|
||||
const minted = await mintSession();
|
||||
if (minted.error) {
|
||||
setState(STATE.IDLE, { message: `Cannot start: ${minted.error}` });
|
||||
return;
|
||||
}
|
||||
const { terminalPort, ptySessionToken } = minted;
|
||||
if (!ptySessionToken) {
|
||||
setState(STATE.IDLE, { message: 'Cannot start: no session token returned' });
|
||||
return;
|
||||
}
|
||||
|
||||
// Pre-flight: does claude even exist on PATH?
|
||||
const claudeStatus = await checkClaudeAvailable(terminalPort);
|
||||
if (!claudeStatus.available) {
|
||||
setState(STATE.NO_CLAUDE);
|
||||
return;
|
||||
}
|
||||
|
||||
// setState(LIVE) flips terminal-mount from display:none to display:flex.
|
||||
// We MUST do that BEFORE ensureXterm() — xterm.js measures the container
|
||||
// synchronously inside term.open() and a hidden container yields a 0x0
|
||||
// terminal that never recovers. ensureXterm + the requestAnimationFrame
|
||||
// fit() inside it run after the browser has applied the layout.
|
||||
setState(STATE.LIVE);
|
||||
ensureXterm();
|
||||
|
||||
// Token rides on Sec-WebSocket-Protocol — the only auth header the
|
||||
// browser WebSocket API lets us set. Cross-port HttpOnly cookies with
|
||||
// SameSite=Strict don't survive the jump from server.ts:34567 to the
|
||||
// agent's random port from a chrome-extension origin, so cookies
|
||||
// alone weren't reliable.
|
||||
ws = new WebSocket(`ws://127.0.0.1:${terminalPort}/ws`, [`gstack-pty.${ptySessionToken}`]);
|
||||
ws.binaryType = 'arraybuffer';
|
||||
|
||||
ws.addEventListener('open', () => {
|
||||
try {
|
||||
ws.send(JSON.stringify({ type: 'resize', cols: term.cols, rows: term.rows }));
|
||||
} catch {}
|
||||
// Push a fresh tab snapshot so claude's tabs.json is populated by
|
||||
// the time the lazy spawn finishes booting. Background.js exposes
|
||||
// the snapshot helper via chrome.runtime; we ask for it here and
|
||||
// forward whatever comes back.
|
||||
try {
|
||||
chrome.runtime.sendMessage({ type: 'getTabState' }, (resp) => {
|
||||
if (resp && ws && ws.readyState === WebSocket.OPEN) {
|
||||
try {
|
||||
ws.send(JSON.stringify({
|
||||
type: 'tabState',
|
||||
active: resp.active,
|
||||
tabs: resp.tabs,
|
||||
reason: 'initial',
|
||||
}));
|
||||
} catch {}
|
||||
}
|
||||
});
|
||||
} catch {}
|
||||
// Send a single byte to nudge the agent to spawn claude (lazy-spawn trigger).
|
||||
try { ws.send(new TextEncoder().encode('\n')); } catch {}
|
||||
});
|
||||
|
||||
ws.addEventListener('message', (ev) => {
|
||||
if (typeof ev.data === 'string') {
|
||||
// Agent control message (rare). Treat as JSON; error frames carry code.
|
||||
try {
|
||||
const msg = JSON.parse(ev.data);
|
||||
if (msg.type === 'error' && msg.code === 'CLAUDE_NOT_FOUND') {
|
||||
setState(STATE.NO_CLAUDE);
|
||||
try { ws.close(); } catch {}
|
||||
}
|
||||
} catch {}
|
||||
return;
|
||||
}
|
||||
// Binary: feed to xterm.
|
||||
const buf = ev.data instanceof ArrayBuffer ? new Uint8Array(ev.data) : ev.data;
|
||||
term.write(buf);
|
||||
});
|
||||
|
||||
ws.addEventListener('close', () => {
|
||||
ws = null;
|
||||
if (state !== STATE.NO_CLAUDE) setState(STATE.ENDED);
|
||||
});
|
||||
|
||||
ws.addEventListener('error', (err) => {
|
||||
console.error('[gstack terminal] ws error', err);
|
||||
});
|
||||
}
|
||||
|
||||
function teardown() {
|
||||
try { ws && ws.close(); } catch {}
|
||||
ws = null;
|
||||
if (term) {
|
||||
try { term.dispose(); } catch {}
|
||||
term = null;
|
||||
fitAddon = null;
|
||||
}
|
||||
setState(STATE.IDLE);
|
||||
}
|
||||
|
||||
// ─── Wiring ───────────────────────────────────────────────────
|
||||
|
||||
/**
|
||||
* Force a fresh session: close any open WS, dispose xterm, return to
|
||||
* IDLE, kick off auto-connect. Safe to call from any state.
|
||||
*/
|
||||
function forceRestart() {
|
||||
try { ws && ws.close(); } catch {}
|
||||
ws = null;
|
||||
if (term) {
|
||||
try { term.dispose(); } catch {}
|
||||
term = null;
|
||||
fitAddon = null;
|
||||
}
|
||||
setState(STATE.IDLE, { message: 'Starting Claude Code...' });
|
||||
tryAutoConnect();
|
||||
}
|
||||
|
||||
/**
|
||||
* Repaint xterm when the Terminal pane becomes visible. xterm.js has a
|
||||
* known issue where its renderer doesn't redraw after a display:none →
|
||||
* display:flex flip — the canvas/DOM stays blank until something forces
|
||||
* a layout pass. fit() recomputes dimensions, refresh() redraws.
|
||||
*/
|
||||
function repaintIfLive() {
|
||||
if (state !== STATE.LIVE || !term) return;
|
||||
try { fitAddon && fitAddon.fit(); } catch {}
|
||||
try { term.refresh(0, term.rows - 1); } catch {}
|
||||
try {
|
||||
if (ws && ws.readyState === WebSocket.OPEN) {
|
||||
ws.send(JSON.stringify({ type: 'resize', cols: term.cols, rows: term.rows }));
|
||||
}
|
||||
} catch {}
|
||||
}
|
||||
|
||||
function init() {
|
||||
setState(STATE.IDLE, { message: 'Starting Claude Code...' });
|
||||
|
||||
els.installRetry?.addEventListener('click', () => {
|
||||
// Re-probe claude on PATH, then try a connect.
|
||||
setState(STATE.IDLE, { message: 'Starting Claude Code...' });
|
||||
tryAutoConnect();
|
||||
});
|
||||
|
||||
// Two restart buttons:
|
||||
// - els.restart lives inside the ENDED state card (visible only after
|
||||
// a session has ended).
|
||||
// - els.restartNow lives in the always-visible toolbar (lets the user
|
||||
// force a fresh claude mid-session without waiting for it to exit).
|
||||
els.restart?.addEventListener('click', forceRestart);
|
||||
els.restartNow?.addEventListener('click', forceRestart);
|
||||
|
||||
|
||||
// Live browser-tab state. background.js → sidepanel.js → us. We
|
||||
// forward over the live PTY WebSocket; terminal-agent.ts writes
|
||||
// <stateDir>/active-tab.json + <stateDir>/tabs.json so claude can
|
||||
// always read the current tab landscape.
|
||||
document.addEventListener('gstack:tab-state', (ev) => {
|
||||
if (!ws || ws.readyState !== WebSocket.OPEN) return;
|
||||
try {
|
||||
ws.send(JSON.stringify({
|
||||
type: 'tabState',
|
||||
active: ev.detail?.active,
|
||||
tabs: ev.detail?.tabs,
|
||||
reason: ev.detail?.reason,
|
||||
}));
|
||||
} catch {}
|
||||
});
|
||||
|
||||
// Repaint after a debug-tab → primary-pane transition. The debug
|
||||
// tabs (Activity / Refs / Inspector) hide the Terminal pane via
|
||||
// .tab-content { display: none }; xterm doesn't auto-redraw when its
|
||||
// container flips back to visible, so we watch #tab-terminal's class
|
||||
// attribute and force a fit + refresh when .active returns.
|
||||
const observer = new MutationObserver(() => {
|
||||
const pane = document.getElementById('tab-terminal');
|
||||
if (pane?.classList.contains('active')) {
|
||||
requestAnimationFrame(repaintIfLive);
|
||||
}
|
||||
});
|
||||
const target = document.getElementById('tab-terminal');
|
||||
if (target) observer.observe(target, { attributes: true, attributeFilter: ['class'] });
|
||||
|
||||
tryAutoConnect();
|
||||
}
|
||||
|
||||
/**
|
||||
* Eager-connect when the sidebar opens. Polls for sidepanel.js to populate
|
||||
* window.gstackServerPort + window.gstackAuthToken (which it does as soon
|
||||
* as /health succeeds), then fires connect() automatically. The user
|
||||
* doesn't have to press a key — Terminal is the default tab and "tap to
|
||||
* start" was a needless paper cut on every reload.
|
||||
*/
|
||||
function tryAutoConnect() {
|
||||
if (state !== STATE.IDLE) return;
|
||||
let waited = 0;
|
||||
const tick = () => {
|
||||
// If the user navigated away (Chat tab) or already connected, drop out.
|
||||
if (state !== STATE.IDLE) return;
|
||||
if (getServerPort() && getAuthToken()) {
|
||||
connect();
|
||||
return;
|
||||
}
|
||||
waited += 200;
|
||||
if (waited > 15000) {
|
||||
setState(STATE.IDLE, { message: 'Browse server not ready. Reload sidebar to retry.' });
|
||||
return;
|
||||
}
|
||||
setTimeout(tick, 200);
|
||||
};
|
||||
tick();
|
||||
}
|
||||
|
||||
if (document.readyState === 'loading') {
|
||||
document.addEventListener('DOMContentLoaded', init);
|
||||
} else {
|
||||
init();
|
||||
}
|
||||
})();
|
||||
@@ -675,6 +675,118 @@ body::after {
|
||||
}
|
||||
.tab-content.active { display: flex; flex-direction: column; }
|
||||
|
||||
/* ─── Terminal Tab ────────────────────────────────────────────── */
|
||||
/* The Terminal pane manages its own scrolling (xterm has a viewport with
|
||||
scrollback). The default .tab-content rules above set overflow-y: auto,
|
||||
which collapses min-height for nested flex children — that's why
|
||||
.terminal-mount couldn't grow to fill available space. Override here. */
|
||||
#tab-terminal {
|
||||
background: #0a0a0a;
|
||||
padding: 0;
|
||||
overflow: hidden;
|
||||
min-height: 0;
|
||||
}
|
||||
#tab-terminal.active {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
}
|
||||
.terminal-toolbar {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
gap: 6px;
|
||||
padding: 4px 8px;
|
||||
border-bottom: 1px solid #1a1a1a;
|
||||
background: #0a0a0a;
|
||||
flex-shrink: 0;
|
||||
}
|
||||
.terminal-toolbar-actions {
|
||||
display: flex;
|
||||
gap: 4px;
|
||||
flex-wrap: wrap;
|
||||
}
|
||||
.terminal-toolbar-btn {
|
||||
background: transparent;
|
||||
border: 1px solid #27272a;
|
||||
color: #a1a1aa;
|
||||
padding: 3px 10px;
|
||||
font-size: 11px;
|
||||
font-family: 'JetBrains Mono', monospace;
|
||||
border-radius: 3px;
|
||||
cursor: pointer;
|
||||
}
|
||||
.terminal-toolbar-btn:hover {
|
||||
color: #f59e0b;
|
||||
border-color: #f59e0b;
|
||||
}
|
||||
.terminal-bootstrap {
|
||||
flex: 1;
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
text-align: center;
|
||||
color: #71717a;
|
||||
padding: 24px;
|
||||
}
|
||||
.terminal-bootstrap-icon {
|
||||
font-size: 32px;
|
||||
color: #f59e0b;
|
||||
margin-bottom: 8px;
|
||||
}
|
||||
.terminal-bootstrap p { margin: 4px 0; }
|
||||
.terminal-install-card {
|
||||
margin: 24px;
|
||||
padding: 16px;
|
||||
border: 1px solid #27272a;
|
||||
border-radius: 6px;
|
||||
text-align: center;
|
||||
}
|
||||
.terminal-install-card a { color: #f59e0b; }
|
||||
.install-retry-btn {
|
||||
margin-top: 12px;
|
||||
padding: 6px 14px;
|
||||
background: #f59e0b;
|
||||
color: #0a0a0a;
|
||||
border: none;
|
||||
border-radius: 4px;
|
||||
font-family: inherit;
|
||||
font-size: 12px;
|
||||
cursor: pointer;
|
||||
}
|
||||
.install-retry-btn:hover { opacity: 0.9; }
|
||||
.terminal-mount {
  /* min-height: 0 is the standard flex-overflow fix — without it, a flex
     item with overflowing content can't shrink below its content size,
     so flex:1 refuses to expand into available space and xterm renders
     into whatever the content happens to be (i.e. its own initial 2x2
     measurement). With min-height:0 the item respects the flex parent's
     remaining space and xterm grows to fill it. */
  flex: 1 1 0;
  min-height: 0;
  width: 100%;
  background: #0a0a0a;
  padding: 8px;
  box-sizing: border-box;
  /* position: relative so xterm's absolutely-positioned helpers (the
     hidden textarea for input) anchor inside us, not on body. */
  position: relative;
}
.terminal-mount .xterm,
.terminal-mount .xterm .xterm-viewport,
.terminal-mount .xterm .xterm-screen {
  height: 100% !important;
}
.terminal-ended {
  flex: 1;
  display: flex;
  flex-direction: column;
  align-items: center;
  justify-content: center;
  color: #71717a;
  padding: 24px;
}

/* ─── Activity Feed ───────────────────────────────────── */
#activity-feed { flex: 1; }
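The `.terminal-mount` comment in the stylesheet above explains that a collapsed flex item leaves xterm stuck at a tiny initial grid. A minimal sketch of why the container's pixel size matters so much, assuming a simplified "fit" calculation: this is not the code of `xterm-addon-fit`, and the cell size, the `fitGrid` name, and the 2x2 floor are illustrative assumptions chosen to mirror the wording of the comment.

```typescript
// Simplified, hypothetical model of a "fit" step: how many terminal cells
// fit into a container of a given pixel size. NOT the xterm-addon-fit source.
interface Size { width: number; height: number }
interface Grid { cols: number; rows: number }

// Assumed floor, mirroring the "2x2" mentioned in the CSS comment.
const MIN_GRID: Grid = { cols: 2, rows: 2 };

function fitGrid(container: Size, cell: Size, padding = 8): Grid {
  // Subtract the padding on both sides (the stylesheet uses padding: 8px).
  const usableWidth = Math.max(0, container.width - 2 * padding);
  const usableHeight = Math.max(0, container.height - 2 * padding);
  return {
    cols: Math.max(MIN_GRID.cols, Math.floor(usableWidth / cell.width)),
    rows: Math.max(MIN_GRID.rows, Math.floor(usableHeight / cell.height)),
  };
}

// Hypothetical glyph cell of 8x16 px.
const cell: Size = { width: 8, height: 16 };

// A flex item that never received any height measures as 0x0 and hits the floor.
const collapsed = fitGrid({ width: 0, height: 0 }, cell);

// With min-height: 0 the item takes the parent's remaining space instead.
const filled = fitGrid({ width: 416, height: 656 }, cell);
```

Under these assumptions the collapsed container yields the floor grid, while the 416x656 container yields 50 columns by 40 rows, which is why the measurement has to happen after the flex layout has given the mount its real size.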
+32
-64
@@ -3,6 +3,7 @@
<head>
  <meta charset="utf-8">
  <link rel="stylesheet" href="sidepanel.css">
  <link rel="stylesheet" href="lib/xterm.css">
</head>
<body>
  <!-- Security shield — reflects ~/.gstack/security/session-state.json status.
@@ -24,54 +25,38 @@
    </div>
  </div>

  <!-- Security event banner — fires on prompt injection detection.
       Variant A from /plan-design-review 2026-04-19: centered alert-heavy,
       big red error icon, mono layer scores in expandable details. -->
  <div class="security-banner" id="security-banner" role="alert" aria-live="assertive" style="display:none">
    <button class="security-banner-close" id="security-banner-close" aria-label="Dismiss">×</button>
    <div class="security-banner-icon" aria-hidden="true">
      <svg width="28" height="28" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
        <circle cx="12" cy="12" r="10"></circle>
        <line x1="12" y1="8" x2="12" y2="12"></line>
        <line x1="12" y1="16" x2="12.01" y2="16"></line>
      </svg>
    </div>
    <div class="security-banner-title" id="security-banner-title">Session terminated</div>
    <div class="security-banner-subtitle" id="security-banner-subtitle">prompt injection detected</div>
    <button class="security-banner-expand" id="security-banner-expand" aria-expanded="false" aria-controls="security-banner-details">
      <span>What happened</span>
      <svg class="security-banner-chevron" width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
        <polyline points="6 9 12 15 18 9"></polyline>
      </svg>
    </button>
    <div class="security-banner-details" id="security-banner-details" hidden>
      <div class="security-banner-section-label">SECURITY LAYERS</div>
      <div class="security-banner-layers" id="security-banner-layers"></div>
      <div class="security-banner-section-label" id="security-banner-suspect-label" hidden>SUSPECTED TEXT</div>
      <pre class="security-banner-suspect" id="security-banner-suspect" hidden></pre>
    </div>
    <div class="security-banner-actions" id="security-banner-actions" hidden>
      <button type="button" class="security-banner-btn security-banner-btn-block" id="security-banner-btn-block">Block session</button>
      <button type="button" class="security-banner-btn security-banner-btn-allow" id="security-banner-btn-allow">Allow and continue</button>
    </div>
  </div>

  <!-- Browser tab bar -->
  <div class="browser-tabs" id="browser-tabs" style="display:none"></div>

  <!-- Chat Tab (default, full height) -->
  <main id="tab-chat" class="tab-content active">
    <div class="chat-messages" id="chat-messages">
      <div class="chat-loading" id="chat-loading">
        <div class="chat-loading-spinner"></div>
        <p id="loading-status">Looking for browse server...</p>
        <pre id="loading-debug" class="muted" style="font-size:11px; font-family:'JetBrains Mono',monospace; white-space:pre-wrap; margin-top:8px; color:#71717A;"></pre>
      </div>
      <div class="chat-welcome" id="chat-welcome" style="display:none">
        <div class="chat-welcome-icon">G</div>
        <p>Send a message to Claude Code.</p>
        <p class="muted">Your agent will see it and act on it.</p>
  <!-- Terminal pane is now the sole primary surface. Activity / Refs /
       Inspector still exist behind the `debug` toggle in the footer. -->
  <main id="tab-terminal" class="tab-content active" role="tabpanel" aria-label="Terminal">
    <!-- Toolbar with browser quick-actions on the left, Restart on the right.
         Restart is always visible so the user can force a fresh claude any
         time, not just from the ENDED state. -->
    <div class="terminal-toolbar" id="terminal-toolbar">
      <div class="terminal-toolbar-actions">
        <button id="chat-cleanup-btn" class="terminal-toolbar-btn" title="Remove ads, banners, popups">🧹 Cleanup</button>
        <button id="chat-screenshot-btn" class="terminal-toolbar-btn" title="Take a screenshot">📸 Screenshot</button>
        <button id="chat-cookies-btn" class="terminal-toolbar-btn" title="Import cookies from your browser">🍪 Cookies</button>
      </div>
      <button class="terminal-toolbar-btn" id="terminal-restart-now" title="Restart Claude Code session">↻ Restart</button>
    </div>
    <div class="terminal-bootstrap" id="terminal-bootstrap">
      <div class="terminal-bootstrap-icon">▸</div>
      <p id="terminal-bootstrap-status">Starting Claude Code...</p>
      <p class="muted" id="terminal-bootstrap-hint">Real PTY. Real terminal. Real claude.</p>
      <pre id="loading-debug" class="muted" style="font-size:11px; font-family:'JetBrains Mono',monospace; white-space:pre-wrap; margin-top:8px; color:#71717A;"></pre>
    </div>
    <div class="terminal-install-card" id="terminal-install-card" style="display:none">
      <p><strong>Claude Code not found</strong></p>
      <p class="muted">Install: <a href="https://docs.anthropic.com/en/docs/claude-code" target="_blank">docs.anthropic.com/en/docs/claude-code</a></p>
      <button class="install-retry-btn" id="terminal-install-retry">I installed it — try again</button>
    </div>
    <div class="terminal-mount" id="terminal-mount" style="display:none"></div>
    <div class="terminal-ended" id="terminal-ended" style="display:none">
      <p>Session ended.</p>
      <button class="install-retry-btn" id="terminal-restart">Start a new session</button>
    </div>
  </main>
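The terminal tab in this hunk holds four sibling panes (`terminal-bootstrap`, `terminal-install-card`, `terminal-mount`, `terminal-ended`), three of which start with `display:none`, which suggests only one is meant to be visible at a time. A minimal sketch of that switching logic, assuming it is driven by a single state value: the state names and the `paneDisplay` helper are hypothetical, and only the element ids come from the markup. The contents of `sidepanel-terminal.js` are not shown in this diff.

```typescript
// Hypothetical state names; only the element ids are taken from the markup.
type PaneState = "bootstrapping" | "not-installed" | "live" | "ended";

const PANE_IDS: Record<PaneState, string> = {
  bootstrapping: "terminal-bootstrap",
  "not-installed": "terminal-install-card",
  live: "terminal-mount",
  ended: "terminal-ended",
};

// Returns the inline `display` value for each pane id. An empty string
// clears the inline style so the stylesheet rule (for example display: flex
// on .terminal-bootstrap) takes over; "none" hides the pane.
function paneDisplay(state: PaneState): Record<string, "" | "none"> {
  const display: Record<string, "" | "none"> = {};
  for (const [key, id] of Object.entries(PANE_IDS)) {
    display[id] = key === state ? "" : "none";
  }
  return display;
}

const whileLive = paneDisplay("live");
const whenMissing = paneDisplay("not-installed");
```

In a browser the result could be applied by looping over the entries and assigning each value to `document.getElementById(id).style.display`. Keeping the mapping as a pure function means it can be checked without a DOM.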
@@ -174,30 +159,10 @@
    </div>
  </main>

  <!-- Experimental chat banner (shown when chatEnabled) -->
  <div id="experimental-banner" class="experimental-banner" style="display: none;">
    Browser co-pilot — controls this browser, reports back to your workspace
  </div>

  <!-- Quick Actions Toolbar -->
  <div class="quick-actions" id="quick-actions">
    <button id="chat-cleanup-btn" class="quick-action-btn" title="Remove ads, banners, popups">🧹 Cleanup</button>
    <button id="chat-screenshot-btn" class="quick-action-btn" title="Take a screenshot">📸 Screenshot</button>
    <button id="chat-cookies-btn" class="quick-action-btn" title="Import cookies from your browser">🍪 Cookies</button>
  </div>

  <!-- Command Bar -->
  <div class="command-bar">
    <button class="stop-btn" id="stop-agent-btn" title="Stop agent" style="display: none;">■</button>
    <input type="text" class="command-input" id="command-input" placeholder="Ask about this page..." autocomplete="off" spellcheck="false">
    <button class="send-btn" id="send-btn" title="Send">↑</button>
  </div>

  <!-- Footer with connection + debug toggle -->
  <footer>
    <div class="footer-left">
      <button class="debug-toggle" id="debug-toggle" title="Toggle debug panels">debug</button>
      <button class="footer-btn" id="clear-chat" title="Clear chat">clear</button>
      <button class="footer-btn" id="reload-sidebar" title="Reload sidebar">reload</button>
    </div>
    <div class="footer-right">
@@ -215,6 +180,9 @@
    <button class="tab close-debug" id="close-debug" title="Close debug">×</button>
  </nav>

  <script src="lib/xterm.js"></script>
  <script src="lib/xterm-addon-fit.js"></script>
  <script src="sidepanel.js"></script>
  <script src="sidepanel-terminal.js"></script>
</body>
</html>
+82
-950
File diff suppressed because it is too large
+6
-3
@@ -1,6 +1,6 @@
{
  "name": "gstack",
  "version": "1.13.1.0",
  "version": "1.15.0.0",
  "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
  "license": "MIT",
  "type": "module",
@@ -9,7 +9,8 @@
    "make-pdf": "./make-pdf/dist/pdf"
  },
  "scripts": {
    "build": "bun run gen:skill-docs --host all; bun build --compile browse/src/cli.ts --outfile browse/dist/browse && bun build --compile browse/src/find-browse.ts --outfile browse/dist/find-browse && bun build --compile design/src/cli.ts --outfile design/dist/design && bun build --compile make-pdf/src/cli.ts --outfile make-pdf/dist/pdf && bun build --compile bin/gstack-global-discover.ts --outfile bin/gstack-global-discover && bash browse/scripts/build-node-server.sh && git rev-parse HEAD > browse/dist/.version && git rev-parse HEAD > design/dist/.version && git rev-parse HEAD > make-pdf/dist/.version && chmod +x browse/dist/browse browse/dist/find-browse design/dist/design make-pdf/dist/pdf bin/gstack-global-discover && (rm -f .*.bun-build || true)",
    "build": "bun run vendor:xterm && bun run gen:skill-docs --host all; bun build --compile browse/src/cli.ts --outfile browse/dist/browse && bun build --compile browse/src/find-browse.ts --outfile browse/dist/find-browse && bun build --compile design/src/cli.ts --outfile design/dist/design && bun build --compile make-pdf/src/cli.ts --outfile make-pdf/dist/pdf && bun build --compile bin/gstack-global-discover.ts --outfile bin/gstack-global-discover && bash browse/scripts/build-node-server.sh && git rev-parse HEAD > browse/dist/.version && git rev-parse HEAD > design/dist/.version && git rev-parse HEAD > make-pdf/dist/.version && chmod +x browse/dist/browse browse/dist/find-browse design/dist/design make-pdf/dist/pdf bin/gstack-global-discover && (rm -f .*.bun-build || true)",
    "vendor:xterm": "mkdir -p extension/lib && cp node_modules/xterm/lib/xterm.js extension/lib/xterm.js && cp node_modules/xterm/css/xterm.css extension/lib/xterm.css && cp node_modules/xterm-addon-fit/lib/xterm-addon-fit.js extension/lib/xterm-addon-fit.js",
    "dev:make-pdf": "bun run make-pdf/src/cli.ts",
    "dev:design": "bun run design/src/cli.ts",
    "gen:skill-docs": "bun run scripts/gen-skill-docs.ts",
@@ -62,6 +63,8 @@
  ],
  "devDependencies": {
    "@anthropic-ai/claude-agent-sdk": "0.2.117",
    "@anthropic-ai/sdk": "^0.78.0"
    "@anthropic-ai/sdk": "^0.78.0",
    "xterm": "5",
    "xterm-addon-fit": "^0.8.0"
  }
}
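The `vendor:xterm` script above creates `extension/lib` and copies three files out of `node_modules`. A minimal sketch of the same copy step as a TypeScript helper, assuming Node's built-in `fs` and `path` modules: the `vendorAssets` function and the stub-file demo are hypothetical and are not part of the repository, while the source and destination paths are taken from the script.

```typescript
import { copyFileSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Source -> destination pairs, as listed in the "vendor:xterm" script.
const XTERM_ASSETS: ReadonlyArray<readonly [string, string]> = [
  ["node_modules/xterm/lib/xterm.js", "extension/lib/xterm.js"],
  ["node_modules/xterm/css/xterm.css", "extension/lib/xterm.css"],
  ["node_modules/xterm-addon-fit/lib/xterm-addon-fit.js", "extension/lib/xterm-addon-fit.js"],
];

// Hypothetical helper: copies each asset relative to `root` and returns the
// destination paths. Like `cp`, copyFileSync throws if a source is missing.
function vendorAssets(root: string, assets = XTERM_ASSETS): string[] {
  const written: string[] = [];
  for (const [from, to] of assets) {
    const dest = join(root, to);
    mkdirSync(dirname(dest), { recursive: true }); // equivalent of mkdir -p
    copyFileSync(join(root, from), dest);
    written.push(dest);
  }
  return written;
}

// Demo against a throwaway directory, with stub files standing in for the
// real packages so the sketch runs without installing anything.
const root = mkdtempSync(join(tmpdir(), "vendor-xterm-"));
for (const [from] of XTERM_ASSETS) {
  mkdirSync(dirname(join(root, from)), { recursive: true });
  writeFileSync(join(root, from), `/* stub for ${from} */`);
}
const written = vendorAssets(root);
const firstCopy = readFileSync(written[0], "utf8");
```

Because the copied files are generated output, this pairs with the `.gitignore` entries for `extension/lib/xterm.js`, `extension/lib/xterm.css`, and `extension/lib/xterm-addon-fit.js` added in the same commit.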
@@ -1677,30 +1677,8 @@ describe('no compiled binaries in git', () => {
  });
});

describe('sidebar agent (#584)', () => {
  // #584 — Sidebar Write: sidebar-agent.ts allowedTools includes Write
  test('sidebar-agent.ts allowedTools includes Write', () => {
    const content = fs.readFileSync(path.join(ROOT, 'browse', 'src', 'sidebar-agent.ts'), 'utf-8');
    // Find the allowedTools line in the askClaude function
    const match = content.match(/--allowedTools['"]\s*,\s*['"]([^'"]+)['"]/);
    expect(match).not.toBeNull();
    expect(match![1]).toContain('Write');
  });

  // #584 — Server Write: server.ts allowedTools includes Write (DRY parity)
  test('server.ts allowedTools excludes Write (agent is read-only + Bash)', () => {
    const content = fs.readFileSync(path.join(ROOT, 'browse', 'src', 'server.ts'), 'utf-8');
    // Find the sidebar allowedTools in the headed-mode path
    const match = content.match(/--allowedTools['"]\s*,\s*['"]([^'"]+)['"]/);
    expect(match).not.toBeNull();
    expect(match![1]).toContain('Bash');
    expect(match![1]).not.toContain('Write');
  });

  // #584 — Sidebar stderr: stderr handler is not empty
  test('sidebar-agent.ts stderr handler is not empty', () => {
    const content = fs.readFileSync(path.join(ROOT, 'browse', 'src', 'sidebar-agent.ts'), 'utf-8');
    // The stderr handler should NOT be an empty arrow function
    expect(content).not.toContain("proc.stderr.on('data', () => {})");
  });
});
// `sidebar agent (#584)` describe block was here. sidebar-agent.ts and
// the entire chat-queue path were ripped in favor of the interactive
// claude PTY (terminal-agent.ts); these assertions had no target file.
// Terminal-pane invariants are covered by browse/test/sidebar-tabs.test.ts
// and browse/test/terminal-agent.test.ts.
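The assertions in the `sidebar agent (#584)` block read a source file as text and pulled the `--allowedTools` value out with a regular expression. A minimal sketch of that source-level check in isolation, assuming the tool list is a comma-separated string: the regex is copied from the hunk above, while `extractAllowedTools` and the sample source line are hypothetical.

```typescript
// Regex copied from the assertions in the hunk above: it captures the string
// literal that follows '--allowedTools' in an argv-style array.
const ALLOWED_TOOLS_RE = /--allowedTools['"]\s*,\s*['"]([^'"]+)['"]/;

// Hypothetical helper: returns the individual tool names, or null when the
// flag is not present. Assumes a comma-separated list.
function extractAllowedTools(source: string): string[] | null {
  const match = source.match(ALLOWED_TOOLS_RE);
  if (match === null) return null;
  return match[1]
    .split(",")
    .map((tool) => tool.trim())
    .filter((tool) => tool.length > 0);
}

// Hypothetical source line, only shaped like a spawned command's argv.
const sample = `spawn('claude', ['-p', prompt, '--allowedTools', 'Bash,Read,Glob,Grep'])`;
const tools = extractAllowedTools(sample);
const absent = extractAllowedTools(`spawn('claude', ['-p', prompt])`);
```

Splitting into exact names makes a check such as `tools.includes('Write')` stricter than a substring test on the raw capture, which would also match a longer name that merely contains `Write`.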