mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-02 00:01:37 +02:00
3bef43bc5ad85a3e46ce54db5ae6d0f7f181fef3
28 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
19770ea8b4 |
v1.51.0.0 feat: $B memory diagnostic + 4 CDP-resource leak fixes (#1751)
* add withCdpSession + getOrCreateCdpSession helpers
Two CDP-session lifecycle helpers in cdp-bridge.ts:
- withCdpSession(page, fn): ephemeral session with try/finally detach.
For one-shot CDP work (archive snapshots, $B memory, single
Page.captureScreenshot) where the caller doesn't need session reuse.
- getOrCreateCdpSession(page, cache): cached long-lived session that
registers a page.once('close') hook to BOTH delete the cache entry
AND call session.detach(). Pre-helper code only deleted the cache
entry, leaving the Chromium-side CDP target attached until the
underlying transport dropped.
Pure addition. Existing callers untouched in this commit; they migrate
in the next commit alongside the static-grep test that pins the
invariant.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* migrate 3 CDP-session sites to lifecycle helpers
Fixes the CDP-target leak class identified by /codex outside-voice on
the eng review (D11 EXPAND_SCOPE). All three sites called
`page.context().newCDPSession(page)` directly and either forgot the
detach entirely (cdp-bridge cache cleanup), only detached on the
success path (write-commands archive), or detached on framenavigated
but not page-close (cdp-inspector).
- cdp-bridge.ts: `getCdpSession` now delegates to
`getOrCreateCdpSession`, which registers a `page.once('close')` hook
that BOTH removes the cache entry AND calls `session.detach()`.
- cdp-inspector.ts: same migration for the inspector's session pool.
Keeps the existing framenavigated detach (more granular than close
for DOM/CSS state invalidation) plus an inspector-layer close hook
for the initializedPages WeakSet.
- write-commands.ts archive: wraps Page.captureSnapshot in
withCdpSession so the detach runs in `finally`, including the path
where captureSnapshot throws.
The static-grep tripwire (next commit) pins the invariant so future
direct calls to newCDPSession fail CI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* add CDP-session cleanup tripwire + helper unit tests
browse/test/cdp-session-cleanup.test.ts pins the invariant that no
source file outside cdp-bridge.ts may call newCDPSession() directly.
If a future refactor reintroduces the direct call, CI fails with a
file:line list and a pointer to the right helper to use instead
(withCdpSession for one-shot, getOrCreateCdpSession for cached).
Also covers the helpers themselves with fake-Page unit tests:
- withCdpSession detaches on success
- withCdpSession detaches on throw (the actual leak fix)
- withCdpSession swallows detach errors so they don't mask fn errors
- getOrCreateCdpSession caches the session across calls
- close hook detaches AND clears the cache
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* extract createSseEndpoint helper with cleanup contract
browse/src/sse-helpers.ts owns the SSE cleanup invariant:
cleanup runs on abort, enqueue failure, AND heartbeat failure,
exactly once, regardless of which edge fires first.
Pre-helper, /activity/stream and /inspector/events ran cleanup only on
the req.signal.abort edge. If the underlying TCP died without firing
abort (Chromium MV3 service-worker suspend, intermediate proxy
half-close), the subscriber closure stayed in the Set capturing the
ReadableStreamDefaultController plus any payloads queued behind it. Over
a multi-day sidebar session this compounded into multi-MB of retained
controllers per dead connection.
Caller surface: initialReplay (optional, for gap replay or state
snapshots), subscribe (live-event source), liveEventName (SSE event
name for live wrap), heartbeatMs. send() helper handles JSON encoding
with sanitizeReplacer + lone-surrogate stripping.
Unit tests pin all three cleanup edges + idempotency + replay ordering
+ surrogate sanitization. Endpoint refactors land in the next commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* route /activity/stream + /inspector/events through createSseEndpoint
Both endpoints collapse from ~45 lines of in-line ReadableStream wiring
to ~8 lines of helper config. Behavior preserved bit-for-bit by the
new sse-helpers tests:
- initial replay (activity gap + history, inspector state snapshot)
- live event subscription
- 15s heartbeat
- SSE framing
- sanitizeReplacer applied to every JSON.stringify
The leak fix is the cleanup contract: pre-refactor, both endpoints ran
cleanup only on req.signal.abort. If TCP died without firing abort
(Chromium MV3 SW suspend, intermediate proxy half-close), the
subscriber closure stayed in the Set forever capturing the
ReadableStreamDefaultController + queued payloads. Post-refactor, an
enqueue-failure or heartbeat-failure on a dead consumer triggers the
same idempotent cleanup as abort would.
Net: -83 / +15 in server.ts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* cap inspector modificationHistory at 200 entries
Pre-cap, modificationHistory was an unbounded module-scoped array that
grew for every CSS edit through $B css across the entire session.
Small per-entry footprint but no upper bound, the kind of slow leak
that compounds over multi-day inspector use.
Cap is 200, oldest evicted on push past the cap. modHistoryTotalPushed
stays monotonic across the session so undoModification can tell the
user when their target index has been evicted, instead of just the
opaque pre-cap "No modification at index 500" with no context.
__testInternals export lets the cap + eviction error be unit-tested
without spinning up a CDP-driven Page. Production code must continue
to go through modifyStyle / undoModification / resetModifications.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* add BrowserManager.getMemorySnapshot() + shared types
Diagnostic foundation for $B memory and the /memory endpoint that land
in the next two commits. Collects:
- Bun process memory via process.memoryUsage (cross-platform, accurate).
- Per-tab JS heap via CDP Performance.getMetrics, lazy per tracked page,
swallows target-died errors so a dying tab doesn't poison the
snapshot for the rest.
- Chromium process tree via SystemInfo.getProcessInfo (PID + type +
CPU time). RSS is NOT exposed via CDP — the eng review (D2 USE_CDP)
picked CDP over shelling to `ps`, so notes[] tells the caller why
the RSS column is absent and points at the follow-up TODO.
cdp-inspector exports getModificationHistoryStats so the snapshot can
surface buffer occupancy + cap + evicted count without reaching into
module-private state.
memory-snapshot.ts holds the shared types so server.ts and read-commands
can import without circular dep on browser-manager.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* add \$B memory command
Registers 'memory' in META_COMMANDS, wires the meta-command dispatch
to a lazy-imported handler in memory-command.ts. Lazy because the
import graph (cdp-bridge + memory-snapshot + buffer accessors) isn't
useful to projects that never run the diagnostic.
The handler assembles MemoryStructureStats from the modules that own
each buffer (cdp-inspector mod history stats, activity subscriber
count, console/network/dialog buffer lengths, captureBuffer bytes,
inspectorSubscriber count via a new server.ts export) and calls
BrowserManager.getMemorySnapshot. Output is text by default, JSON with
--json so the sidebar footer and test harness can consume it
programmatically. buildMemorySnapshotJson is the entry the /memory
endpoint will call in the next commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* add /memory endpoint (SSE-session-cookie gated)
GET /memory returns the BrowserManager memory snapshot as JSON. Auth
matches /activity/stream and /inspector/events: Bearer header OR
view-only SSE-session cookie (the extension fetches the cookie once
via POST /sse-session, then polls /memory with withCredentials: true).
Deliberately NOT extending /health for the sidebar footer poll —
TODOS.md "Audit /health token distribution" records that /health
already surfaces AUTH_TOKEN to any localhost caller in headed mode. A
separate endpoint with the standard SSE auth keeps the future /health
fix from cascading into the sidebar.
sanitizeReplacer is applied at egress because tab.url and tab.title
come from page content — lone-surrogate bytes from broken emoji could
otherwise reach the sidebar and (when forwarded to Claude API) trigger
HTTP 400.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* add sidebar footer RSS readout (polls /memory every 30s)
Footer now shows "<bun-rss> · <tab-count>" sourced from the /memory
endpoint, polled every 30s. Color thresholds: orange warn at 2 GB Bun
RSS or 50 tabs; red bad at 8 GB or 200 tabs (matches the tab-guardrail
threshold landing in a later commit). The footer gives the user an
early signal that the cliff is forming, instead of only learning when
the OS OOM-kills the process.
Backoff per Codex's flag: if a poll takes > 2s response time the
sidebar drops to a 5-minute cadence until the next successful fast
poll. The diagnostic shouldn't add load to a browser that's already
unhealthy.
Start/stop is wired to the existing setServerInfo() hook so the timer
only runs while the sidebar is connected to a server.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* stop materializing response bodies in requestfinished listener
The Bun-side accelerant on the gbrowser-OOM investigation. Pre-fix,
the per-page requestfinished listener called \`await res.body()\` just
to read .length — Playwright fetches the bytes from Chromium across
CDP into a Bun Buffer, only for the listener to discard the buffer
after a single length read. On a long-lived headed browser with
media-heavy pages this is multi-GB/hour of Buffer allocation churn.
Bun GCs it, but the cross-process CDP traffic + transient allocation
pressure feeds the OOM trajectory.
The fix: req.sizes() pulls from the Network.loadingFinished event
Chromium already emits. No body materialization. Accurate for chunked
transfer, gzip-compressed responses, and streaming media — the cases
where a naive Content-Length header read (the original review's
proposal) would have missed the size entirely (Codex flag on the eng
review, D10 USE_CDP_EVENT_BATCHED).
The D10 stretch goal — replacing N per-page listeners with a single
context-level CDP listener via Target.setAutoAttach — is deferred and
tracked in TODOS. The listener architecture change is significantly
more plumbing than the leak fix and not on the critical path for
stopping the body materialization.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* tab guardrail (50/200 thresholds) + sidebar action toast
Server side (browser-manager.ts):
Idempotent threshold tracker fires an activity entry exactly once at
each upward crossing of 50 (soft warn) and 200 (hard warn). Re-arms
when the count drops below. Activity-feed surface gives the
audit-trail invariant even with the sidebar closed; the toast UX
lives in the sidebar.
Sidebar side (extension/sidepanel.{html,css,js}):
Every /memory poll evaluates two trigger conditions:
- Any single tab > 4 GB JS heap (catches the WebGL/video runaway
case Codex flagged on the eng review).
- Tab count >= 200.
Toast shows top 5 tabs ranked by max(jsHeap, nodes*1KB + listeners*200)
so a WebGL-heavy tab with small JS heap still surfaces. Default-selected
checkboxes + "Close selected" run \`\$B closetab <id>\` through the
existing /command path — no chrome.tabs.remove bridge needed. "Snooze"
bumps tabsAbove/heapAbove thresholds in chrome.storage.session so the
toast stays hidden until the user accumulates more tabs OR one tab
grows another 2 GB.
Tests: browse/test/tab-guardrail.test.ts pins the server-side
fires-once + re-arms invariants without spinning up Chromium.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* add memory-leak reproducer (gate tier)
browse/test/memory-leak-reproducer.test.ts pins the invariant from
the D10 fix: wirePageEvents.requestfinished must call req.sizes() but
must NEVER call res.body(). Fakes a page emitting a burst of 200
requestfinished events, each with a notional 1 MB response — pre-fix
this would allocate 200 MB of Buffer per burst, post-fix not one byte
of body content is materialized.
The test also asserts networkBuffer entries are still populated with
the right size, so size reporting in the network panel doesn't
regress.
A real-Chromium peak-RSS reproducer (periodic tier) is deferred —
see TODOS "Reproducer with WebGL / video / MSE buffer pressure". This
gate-tier test is sufficient to catch the leak class being
reintroduced by any future refactor of the requestfinished listener.
Wall clock: ~400ms.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* TODOS: 4 follow-ups from gbrowser-OOM PR
Captures the items deliberately deferred from the v1.49 leak-fix PR
so the deferrals don't fall off the radar:
- P2: MV3 extension service-worker memory profile (Codex finding #4)
- P2: Native + GPU memory breakdown in \$B memory (Codex finding #5)
- P3: Single-context CDP listener for Network.loadingFinished (D10
stretch goal)
- P3: Real-Chromium peak-RSS reproducer for periodic tier (Codex
finding on transient amplification + ANGLE_B_NUMBERS CHANGELOG
framing dependency)
Each entry follows the standard TODOS.md format: What / Why / Pros /
Cons / Context / Priority / Effort.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* regen SKILL.md after adding \$B memory command
The C8 commit added 'memory' to META_COMMANDS + COMMAND_DESCRIPTIONS
but didn't regenerate the SKILL.md files. The category was 'Diagnostics'
which isn't in scripts/resolvers/browse.ts:categoryOrder; switched to
'Server' (matches the existing 'status' / 'restart' / 'handoff'
pattern) so the table renders under the existing ### Server section.
Test fix: gen-skill-docs.test.ts asserts every command appears in the
generated SKILL.md and gstack/llms.txt; without this regen the test
fails with "Expected to contain: 'memory'".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* add coverage for \$B memory diagnostic surface
17 tests across the formatter + byte renderer + JSON entry point:
- formatBytes() 4-tier (bytes, KB, MB, GB) + 160 GB sanity case
(the friend's OOM number from the original screenshot, so the
renderer doesn't blow up at real leak scale)
- handleMemoryCommand --json mode parseable shape
- handleMemoryCommand text mode: Bun server line, no-tabs branch,
top-10 sort with "...and N more" tail, Chromium process grouping
by type, "unavailable" line when processes is null, modification-
history evicted-count format, notes section rendering, long-URL
ellipsis truncation
- buildMemorySnapshotJson returns shape matching the type
The formatSnapshotText renderer is private to memory-command.ts;
tests exercise it through handleMemoryCommand's text-mode return
path. The eviction-count format is pinned via a parallel format
contract assertion since the renderer reads live module state.
Coverage gate: brings the diagnostic surface from 0% to ~80%.
Extension UI (sidepanel.js footer + toast) remains uncovered —
adding tests there would require extracting fmtBytesShort and
tabRamScore from sidepanel.js into a testable TS module, which is
deferred to a follow-up to keep this PR scoped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v1.51.0.0)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: update project documentation for v1.51.0.0
Add $B memory command to BROWSER.md server lifecycle table. Document the
new createSseEndpoint helper + CDP session lifecycle helpers (withCdpSession,
getOrCreateCdpSession) in CLAUDE.md alongside the existing server hardening
notes, with the static-grep tripwire callout so future contributors route
through the helpers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(test): pin SSE sanitizer wiring to the v1.51 createSseEndpoint helper
The two `wiring invariants` tests grepped server.ts for
`JSON.stringify(entry, sanitizeReplacer)` and
`JSON.stringify(event, sanitizeReplacer)` — patterns that lived inline
in /activity/stream and /inspector/events before the v1.51 refactor
moved both endpoints behind createSseEndpoint. Sanitization still
happens (the helper applies it inside its send() and live-event
callback), but the static-grep was pinned to the old wiring and started
failing on Windows free-tests after the refactor landed.
Updated to check the new contract:
- /activity/stream + /inspector/events route through createSseEndpoint
(regex match of the route handler block ending in the helper call).
- sse-helpers.ts contains JSON.stringify + sanitizeReplacer + imports
stripLoneSurrogates from ./sanitize (catches drift to a private copy).
- server.ts retains its own sanitizeReplacer for non-SSE egress paths
(handleCommandInternal); the two replacers coexist by design.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7ca04d8ef0 |
v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594)
* fix(gstack-paths): guard CLAUDE_PLUGIN_DATA against cross-plugin contamination (#1569) gstack-paths previously trusted CLAUDE_PLUGIN_DATA as a fallback for GSTACK_STATE_ROOT whenever GSTACK_HOME was unset. When another plugin (e.g. Codex) persists its own CLAUDE_PLUGIN_DATA into the session env via CLAUDE_ENV_FILE, gstack picked it up and wrote checkpoints, analytics, and learnings into that plugin's directory. Anyone with the Codex plugin installed alongside gstack hit this silently. Fix: guard the CLAUDE_PLUGIN_DATA branch so it only fires when CLAUDE_PLUGIN_ROOT confirms we're running as the gstack plugin (path contains "gstack"). Skill installs fall through to \$HOME/.gstack. Contributed by @ElliotDrel via #1570. Closes #1569. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gbrain-sync): sourceLocalPath handles wrapped {sources:[...]} shape from gbrain v0.20+ gbrain v0.20+ changed `gbrain sources list --json` to return {sources: [...]} instead of a flat array. sourceLocalPath crashed upstream with `list.find is not a function` on every /sync-gbrain invocation against modern gbrain. Accept both shapes for forward/backward compat, matching probeSource/sourcePageCount in lib/gbrain-sources.ts. Contributed by @jakehann11 via #1571. Closes #1567. Supersedes #1564 (@tonyjzhou, same fix, different shape — credit retained). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(brain-context-load): probe gbrain via execFile, not shell builtin (#1559) gbrainAvailable() used `execFileSync("command", ["-v", "gbrain"])`, which fails in any environment where the `command` builtin isn't on the spawned process's PATH (most non-interactive shells). The probe then reported gbrain as missing even when it was installed, and context-load silently skipped vector/list queries. Fix: probe `gbrain --version` directly with a 500ms timeout (matching the rest of the file's MCP_TIMEOUT_MS). Same semantics, works everywhere execFile works. Contributed by @jbetala7 via #1560. Closes #1559. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(gbrain-doctor): pin schema_version:2 doctor parse path (#1418) Adds an exec-path regression test that runs a fake gbrain shim emitting the v0.25+ doctor JSON shape (schema_version: 2, status: "warnings", exit 1 for health_score < 100, no top-level `engine` field). Confirms freshDetectEngineTier recovers stdout from the non-zero exit and falls back to GBRAIN_HOME/config.json for the engine label. The pre-existing test for #1415 only stripped gbrain from PATH; this test exercises the actual doctor parse path, closing the gap that codex's plan review flagged. Also documents the schema_version separation in lib/gbrain-local-status.ts: the local CacheEntry stays at version 1, distinct from the doctor-output schema_version which we accept across versions in gstack-memory-helpers. Closes #1418 (credit @mvanhorn for surfacing the doctor + schema_v2 collapse). The fix landed pre-emptively in v1.29.x; this commit pins it with a stronger test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(memory-ingest): pin put_page regression + scrub stale name from --help and comments (#1346) #1346 reported that gstack-memory-ingest still called the renamed gbrain put_page subcommand on gbrain v0.18+. The actual code migrated to `gbrain put` and later to batch `gbrain import <dir>` before this report landed — only documentation lag remained. This commit: - Updates the --help string ("Skip gbrain put calls (still updates state file)") so user-facing docs match the shipped subcommand - Updates two inline comments that still referenced the old name - Adds test/memory-ingest-no-put_page.test.ts: a regression pin that strips comments from bin/gstack-memory-ingest.ts and fails the build if "put_page" appears in any active code or string literal, plus a sanity check that the file still calls a supported gbrain page-write verb (put or import) Closes #1346. Reporter @kylma-code surfaced the doc lag; the original code migration credit is on the v1.27.x wave. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(resolvers): rewrite all gbrain put_page instructions to canonical put <slug> scripts/resolvers/gbrain.ts emitted user-facing copy-paste instructions using the renamed `gbrain put_page` subcommand across 10 skills (office-hours, investigate, plan-ceo-review, retro, plan-eng-review, ship, cso, design-consultation, fallback, entity-stub). Every gstack user copying those snippets hit "unknown command: put_page" on gbrain v0.18+. This commit: - Rewrites all 10 instruction templates to use `gbrain put <slug> --content "$(cat <<EOF...EOF)"` with title/tags moved into YAML frontmatter inside --content, matching the v0.18+ subcommand shape - Updates README.md and USING_GBRAIN_WITH_GSTACK.md "common commands" table to reference `gbrain put` and `gbrain get` - Adds test/resolvers-gbrain-put-rewrite.test.ts pinning two invariants: (a) resolver source ships only canonical instructions, (b) every tracked SKILL.md file is free of `gbrain put_page` CHANGELOG entries are deliberately left untouched (historical record). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build): extract package.json build to scripts/build.sh for Windows Bun compat (#1538, #1537, #1530, #1457, #1561) Bun's Windows shell parser rejects multiple constructs the inline package.json build chain used: brace groups `{ cmd; }`, subshells with redirection `( git ... ) > path/.version`, and (in Bun 1.3.x) subshells near redirections in general. Every Windows install + every auto-upgrade since v1.34.2.0 has failed on `bun run build`. Extracts the build chain to scripts/build.sh and the .version writes to scripts/write-version-files.sh. POSIX-portable, no Bun shell parsing involved. Also adds Windows-specific bun.exe handling for non-ASCII PATHs (a separate Windows footgun where Bun's --compile fails when the binary lives under a path with non-ASCII chars). Updates test/build-script-shell-compat.test.ts to assert the new shape: no subshells with redirections anywhere in the build chain, and build delegates to scripts/build.sh which delegates .version writes. Contributed by @Charlie-El via #1544. Supersedes #1531 (@scarson, fixed in build helper), #1480 (@mikepsinn, partial overlap), #1460 (@realcarsonterry, brace-group fix subsumed) — credit retained. Closes #1538, #1537, #1530, #1457, #1561. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(windows): .exe glob in .gitignore + .exe extension resolution in find-browse (#1554) bun build --compile on Windows appends .exe to the output filename, producing browse.exe instead of browse. find-browse's existsSync probe only checked the bare path and returned null on Windows even when the binary was correctly built. .gitignore similarly only excluded the bare bin/gstack-global-discover path, leaving the .exe variant tracked. This commit: - .gitignore: changes `bin/gstack-global-discover` → `bin/gstack-global-discover*` so the Windows .exe variant is ignored - browse/src/find-browse.ts: adds isExecutable + findExecutable helpers that fall back to .exe/.cmd/.bat probing on Windows, mirroring the same helper already in make-pdf/src/browseClient.ts and pdftotext.ts Contributed by @Mike-E-Log via #1554. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(windows): add fresh-install E2E gate that runs bun run build on windows-latest Adds .github/workflows/windows-setup-e2e.yml as the gate that catches Bun shell-parser regressions in the build chain before they reach users. Triggers on PRs touching package.json, scripts/build.sh, scripts/write-version-files.sh, setup, browse cli/find-browse, or gstack-paths. What it verifies: 1. bun run build completes on Windows (the previously-broken path that #1538/#1537/#1530/#1457/#1561 reported) 2. All compiled binaries land on disk (browse.exe, find-browse.exe, design.exe, gstack-global-discover.exe) 3. find-browse resolves to the .exe variant on Windows (regression gate for #1554) 4. gstack-paths returns non-empty GSTACK_STATE_ROOT/PLAN_ROOT/TMP_ROOT on Windows (regression gate for #1570) Complements the existing windows-free-tests.yml (curated unit subset); this new workflow exercises the install path itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(codex): move diff scope into prompt instead of --base (Codex CLI 0.130+ argv conflict) (#1209) Codex CLI ≥ 0.130.0 rejects passing a custom prompt and --base together (mutually exclusive at argv level). Every /codex review, /review, and /ship structured Codex review call ended with an argv error before the model ran. Fix: scope the diff in prompt text using "Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD" instead of `--base <base>`. Preserves the filesystem boundary instruction across all invocations and keeps Codex's review prompt tuning. Touches: - codex/SKILL.md.tmpl + regenerated codex/SKILL.md - scripts/resolvers/review.ts + regenerated review/SKILL.md, ship/SKILL.md - test/gen-skill-docs.test.ts: new regression that fails if any of the five known files still contain the prompt+--base shape - test/skill-validation.test.ts: corresponding negative + positive pin on the rendered SKILL.md files Contributed by @jbetala7 via #1209. Closes #1479. Supersedes #1527 (@mvanhorn — same intent, different patch shape, CONFLICTING) and #1449 (@Gujiassh — broader refactor, CONFLICTING). Credit retained in CHANGELOG. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): diff from git merge-base, not git diff origin/<base> (#1492) git diff origin/<base> shows everything since the common ancestor in both directions — it includes commits that landed on origin/<base> after this branch was created as deletions. That made /review and /ship's pre-landing structured review report inflated diff totals and flagged "removed" code that was actually still present in the working tree. Fix: compute DIFF_BASE via git merge-base origin/<base> HEAD and diff the working tree against that point. Same coverage of uncommitted edits, no phantom deletions from out-of-order base advancement. Applies to /review's Step 1 (diff existence check), Step 3 (get the diff), the build-on-intent scope-creep check, the structured review DIFF_INS/DIFF_DEL stats, and the Claude adversarial subagent prompt. Same change flows into ship/SKILL.md via the shared resolver. Touches: - review/SKILL.md.tmpl + regenerated review/SKILL.md, ship/SKILL.md - scripts/resolvers/review.ts - scripts/resolvers/review-army.ts Contributed by @mvanhorn via #1492. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(codex): pin filesystem-boundary preservation across all codex review surfaces (#1503, #1522) #1503 reported that the bare codex review --base path stripped the filesystem boundary instruction, letting Codex spend tokens reading .claude/skills/ and agents/. #1522 proposed adding a skill-path detector that switched to the custom-instructions route when the diff touched skill files. After C10 (#1209) restructured codex review to always carry the boundary in the prompt (the prompt+--base argv conflict forced the restructure), the skill-path detector becomes redundant — every default call already preserves the boundary. This commit pins the post-#1209 invariant with a test that fails the build if any future refactor strips the boundary from codex/SKILL.md, review/SKILL.md, or ship/SKILL.md. Closes #1503 by regression test. #1522 (@genisis0x) is superseded by #1209 (the prompt rewrite covers its safety concern); credit retained in CHANGELOG. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(skills): use command -v instead of which for codex detection (#1197) `which` is not on PATH in every shell — some Windows shells, BusyBox- only containers, and minimal CI images all fail when skills probe codex availability via `which codex`. `command -v` is a POSIX builtin and always available where the skill is running. Touched: - codex/SKILL.md.tmpl: CODEX_BIN=$(command -v codex || echo "") - scripts/resolvers/review.ts and scripts/resolvers/design.ts: 3 + 3 sites each rewritten to `command -v codex >/dev/null 2>&1` - Regenerated all 10 affected SKILL.md files (codex, review, ship, design-consultation, design-review, office-hours, plan-ceo-review, plan-design-review, plan-devex-review, plan-eng-review) - test/skill-validation.test.ts: updated pin + defensive regression test that fails if `which codex` returns to codex/SKILL.md - test/skill-e2e-plan.test.ts: updated summary regex Contributed by @mvanhorn via #1197. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(codex): surface non-zero exits so wrappers stop reading as silent stalls (#1467, #1327) When codex exits non-zero (parse errors, arg-shape breaks, model API errors that propagate as non-zero status), the calling agent previously saw an empty output and burned 30-60 minutes misdiagnosing as a silent model/API stall. The hang-detection block only caught exit 124 (the timeout-wrapper signal). Adds elif blocks in all four codex invocation sites (Review default, Challenge, Consult new-session, Consult resume) that: - Echo "[codex exit N] <stderr first line>" to stdout - Indent the first 20 stderr lines for inline context - Log codex_nonzero_exit telemetry tagged with the call site Contributed by @genisis0x via #1467. Closes #1327. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(design): disclose OpenAI key source + warn on cwd .env match (#1278, closes #1248) The design binary previously called process.env.OPENAI_API_KEY without checking where the key came from. If a user ran $D inside someone else's project that had OPENAI_API_KEY in its .env, the resulting generation billed that project's account. Silent and irreversible. Fix: resolveApiKeyInfo() returns both the key and its source. When the env-var path matches an OPENAI_API_KEY entry in the current directory's .env, .env.<NODE_ENV>, or .env.local file, we set a warning. requireApiKey() prints "Using OpenAI key from <source>" plus the warning before the run — never the key itself. Adds 6 unit tests covering: config-vs-env precedence, env-only (no match), env+cwd .env match, quoted/exported values, value-mismatch (no false positive), and the no-leak invariant for requireApiKey stderr output. Contributed by @jbetala7 via #1278. Closes #1248. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(browse): guard full-page screenshots against Anthropic vision API >2000px brick (#1214) Full-page screenshots of tall pages routinely exceeded 2000px on the longest dimension, silently bricking the agent's session: the resulting base64 reached the Anthropic vision API which rejected the oversized image, leaving the agent burning turns on a useless blob with no stderr trace from the browse side. Adds browse/src/screenshot-size-guard.ts as a shared helper: - guardScreenshotBuffer(buf) → downscales in-memory if max(w,h) > 2000 - guardScreenshotPath(path) → file-mode variant that rewrites in place - Aspect ratio preserved via sharp's resize fit:inside - Stderr diagnostic on any downscale so callers can see when it fired - Lazy sharp import so non-screenshot paths pay no startup cost Wires the guard into all three full-page callsites codex review flagged: - browse/src/snapshot.ts: annotated + heatmap fullPage captures - browse/src/meta-commands.ts: screenshot command (path + base64 fullPage modes) plus the responsive 3-viewport sweep - browse/src/write-commands.ts: prettyscreenshot fullPage path Covers seven unit cases (pass-through, downscale, aspect ratio, exactly-2000px edge, file-mode rewrite) plus a static invariant test that fails the build if any of the three callsites stops importing the guard. Closes #1214. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(security): add Node sidecar entry for L4 prompt-injection classifier (#1370) The L4 TestSavant classifier in browse/src/security-classifier.ts can't be imported into the compiled browse server (onnxruntime-node dlopen fails from Bun's compile extract dir per CLAUDE.md). The agent that used to host it (sidebar-agent.ts) was removed when the PTY proved out — leaving the classifier file shipped but with zero callers. Exactly the gap codex flagged in #1370. Adds browse/src/security-sidecar-entry.ts: a Node script that runs the classifier as a subprocess of the browse server. It reads NDJSON requests from stdin and writes id-correlated NDJSON responses to stdout, supporting: - op: "scan-page-content" — full L4 classifier scan - op: "ping" — liveness probe for the client's health check - op: "status" — classifier readiness (used by /pty-inject-scan to surface l4 { available: bool } in its response) Plus browse/src/find-security-sidecar.ts: a resolver that locates node + the bundled JS entry (browse/dist/security-sidecar.js, built in a follow-up package.json change) or falls back to the dev TS entry. Returns null cleanly when node isn't on PATH so the calling endpoint can degrade per D7 (extension WARN + user confirm). C17 of the security-stack wave. C18 adds the IPC client + lifecycle management; C19 wires the endpoint; C20 routes the extension through it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(security): sidecar IPC client with lifecycle + circuit breaker (#1370) Adds browse/src/security-sidecar-client.ts to manage the Node L4 classifier subprocess from the compiled browse server: - Lazy spawn on first scan; reuses the same process across requests - Id-correlated request/response via NDJSON over stdio - 5s default per-scan timeout; 64KB payload cap (short-circuits before spawn so oversized requests don't waste a process) - 3-in-10-minutes respawn cap → trips circuit breaker; subsequent scans throw immediately so the /pty-inject-scan endpoint can surface l4 { available: false } to the extension and degrade to WARN+confirm - process.on('exit') sends SIGTERM to the child for clean teardown - isSidecarAvailable() lets the endpoint probe before scan calls so the response shape reflects degraded mode honestly Unit tests cover the payload cap, the availability probe, and the breaker-doesn't-crash invariant under repeated rejected calls. C18 of the security-stack wave. C19 adds POST /pty-inject-scan; C20 routes the extension through it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(security): add POST /pty-inject-scan endpoint for pre-PTY-inject scans (#1370) The sidebar's gstackInjectToTerminal callers (toolbar Cleanup, Inspector "Send to Code") were piping page-derived text directly into the live claude PTY with ZERO classifier processing — the gap codex flagged in #1370. The documented sidebar security stack had a hole the size of every Cleanup-button click. Adds POST /pty-inject-scan to browse/src/server.ts: - Local-only binding (NOT in TUNNEL_PATHS — tunnel attempts get the general 404 path; never reaches the scan logic) - Root-token auth via existing validateAuth() — 401 on unauth - 64KB request cap → 413 + payload-too-large body - 5s scan timeout via sidecar client - URL-blocklist forced to BLOCK in PTY context (page-derived REPL input is higher-risk than ordinary tool output) - L4 ML classifier via the sidecar when available; degrades to WARN per D7 when sidecar is unavailable - Response goes through JSON.stringify(..., sanitizeReplacer) per v1.38.0.0 Unicode-egress hardening - Imports only from security-sidecar-client.ts, never directly from security-classifier.ts (which would brick the compiled Bun binary) Seven static-invariant tests pin the POST verb, auth gate, 64KB cap, tunnel-listener exclusion, sanitizeReplacer wrapping, l4 availability shape, and the no-direct-classifier-import rule. C19 of the security-stack wave. C20 routes the extension through it; C21 adds the invariant AST check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(extension): route gstackInjectToTerminal through /pty-inject-scan (#1370) Closes the documented-vs-shipped gap codex flagged in #1370. The sidebar's two PTY-injection call sites (Inspector "Send to Code" and toolbar Cleanup) now pre-scan via the new /pty-inject-scan endpoint before writing to the live claude REPL. Adds window.gstackScanForPTYInject(text, origin) to extension/sidepanel-terminal.js: - Async, returns { allow, verdict, reasons, l4 } - POST to /pty-inject-scan with the existing root-token auth - WARN+confirm on scan failure (network down, sidecar absent, etc.) rather than silent PASS — D7 honest-degradation gstackInjectToTerminal stays synchronous, returns boolean. Per D6: keeping the inject sync means existing `const ok = ...?.()` callers don't break, and the invariant test in test/extension-pty-inject-invariant.test.ts can statically pin that every call goes through the scan first. extension/sidepanel.js call sites updated: - inspectorSendBtn click → await scan, BLOCK drops + WARN prompts via window.confirm, PASS injects silently - runCleanup() → same flow. Static cleanup prompt always PASSes but still routes through scan to honor the invariant. C20 of the security-stack wave. C21 adds the static invariant test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(security): invariant — extension PTY inject must be scan-gated (#1370) Static-analysis invariant test that fails the build if any extension/*.js path calls window.gstackInjectToTerminal without a preceding window.gstackScanForPTYInject in the same enclosing function. Closes the documented-vs-shipped gap codex demanded a machine check on. Rules: - Rule 1: any file that calls inject must also reference scan - Rule 2: in the enclosing function (function declaration, arrow, async (), event handler), a scan call must appear before the inject call by source position - Exemption: sidepanel-terminal.js (the file that DEFINES the inject function) is exempt from Rule 2 since the definition is not a call Plus two structural checks: - sidepanel-terminal.js defines both the inject and scan functions - inject stays SYNCHRONOUS (no `async` modifier) per D6 — async would silently break the `const ok = ...?.()` pattern at every caller C21 of the security-stack wave. The sidecar architecture (#1370) is complete: server-side L1-L3 + L4-via-sidecar (C17+C18+C19), extension pre-scan wiring (C20), and now the regression gate (C21). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(browse): opt-in extended stealth mode with 6 detection-vector patches (#1112) Rebases @garrytan's PR #1112 (Apr 2026, abandoned) onto the current browse/src/stealth.ts contract. The existing minimal "codex narrowed" stealth (webdriver-mask + AutomationControlled launch arg) stays the default. PR #1112's six additional patches are added behind an opt-in GSTACK_STEALTH=extended env flag. Extended-mode patches (applied AFTER the default mask, in order): 1. delete navigator.webdriver from prototype (not just the getter — detectors check `"webdriver" in navigator`) 2. WebGL renderer spoof to Apple M1 Pro (SwiftShader was the #1 software-GPU tell in containers) 3. navigator.plugins returns a PluginArray-prototype-passing array with MimeType objects and namedItem() 4. window.chrome populated with chrome.app, chrome.runtime, chrome.loadTimes(), chrome.csi() with realistic shapes 5. navigator.mediaDevices backfilled when headless drops it 6. CDP cdc_*-prefixed window globals cleared Why opt-in: the default mode's contract is fingerprint CONSISTENCY, which protects against detectors that flag spoofing mismatch. Extended mode actively lies about the environment; sites that reflect on these properties can break. Users who hit detection in default mode can flip GSTACK_STEALTH=extended for SannySoft 100% pass-rate. Twenty unit tests pin the env-flag semantics, all six patches' code presence, and the applyStealth wiring order. Live SannySoft pass-rate verification stays in the periodic-tier E2E suite. Contributed by @garrytan via #1112 (rebased — original PR opened before the codex-narrowed minimum landed; rebase preserves the narrowed default while adding the SannySoft-passing path as opt-in). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(fixtures): regenerate ship-SKILL.md golden baselines after C10-C13 + C16 templates Updates the three ship-SKILL.md golden baselines (claude, codex, factory hosts) to match the new shape produced by: - C10 #1209 codex argv (prompt + diff scope, no --base) - C11 #1492 merge-base diff (DIFF_BASE= preamble) - C13 #1197 command -v for codex detection - C12 + boundary preservation per regen-enforcing test Per CLAUDE.md SKILL.md workflow: edit the .tmpl, run gen:skill-docs, commit the regenerated outputs together. Goldens are part of the regen contract — without this commit, test/host-config.test.ts' golden-baseline checks fail with the diff codex review surfaced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): v1.41.0.0 — Daegu wave (24 bisect commits, 14 user-facing fixes) Bumps VERSION 1.40.0.0 → 1.41.0.0. CHANGELOG entry follows the release-summary format in CLAUDE.md: two-line headline, lead paragraph, "The numbers that matter" table, "What this means for builders" closer, then itemized Added/Changed/Fixed/For contributors with inline credit to every PR author and original issue reporter. Scale-aware bump per CLAUDE.md: 24 commits, ~6000 LOC net, substantial new capability across security (PTY sidecar wiring), install (Windows build chain), compat (gbrain 0.18-0.35, Codex CLI 0.130+), and quality (screenshot guard, design key disclosure, extended stealth opt-in). MINOR is the right call. Closes for users: #1567, #1559, #1569, #1346, #1418, #1538, #1537, #1530, #1457, #1561, #1554, #1479, #1503, #1248, #1214, #1370, #1327, #1193 pattern, #1152 pattern. Credit retained inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(find-browse): resolve source-checkout layout <git-root>/browse/dist/browse[.exe] windows-setup-e2e.yml runs `bun browse/src/find-browse.ts` against a freshly-built repo where binaries land at browse/dist/browse.exe (no .claude/skills/gstack/ install layout). The previous markers chain only matched .codex/.agents/.claude prefixed paths, so find-browse exited "not found" even when the binary was present. Adds a source-checkout fallback after the marker scan: if no installed layout resolves but <git-root>/browse/dist/browse[.exe] exists, return that. Three real callers hit this path: - gstack repo dev workflow before `./setup` runs - windows-setup-e2e.yml CI (the breakage that surfaced this) - make-pdf consumers running from a sibling source checkout Smoke-verified: a fresh git repo with browse/dist/browse on disk now resolves through the source-checkout branch (was returning null before this commit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): bump v1.41.0.0 → v1.42.0.0 to clear queue collision with #1574 The version-gate workflow flagged a collision: PR #1574 (garrytan/colombo-v3) already claims v1.41.0.0, and #1592 (fix/audit-critical-high-bugs) claims v1.41.1.0. Per CLAUDE.md's workspace-aware ship rule, queue-advancing past a claimed version within the same bump level is permitted — MINOR work landing on top of a queued MINOR still reads as MINOR relative to main. Util's suggested next slot is v1.42.0.0; taking it. CHANGELOG entry header bumped + dated 2026-05-19; entry body unchanged (same wave content, same credit list). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
40d00bd2ce |
v1.41.1.0 fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up) (#1592)
* fix(build-app): escape sed replacement metachars in Chromium rebrand
build-app.sh injects \$APP_NAME directly into the replacement half of
sed's s/// when patching Chromium's localized InfoPlist.strings. If
\$APP_NAME ever carries '/', '&', or '\\' — the command either breaks
or starts interpreting input as sed syntax. The trailing '|| true'
would then silently hide the failure and ship a DMG that still says
'Google Chrome for Testing' in the menu bar.
Escape replacement metachars before substitution. No change for the
default name 'GStack Browser'.
* fix(build-app): bail out if 'mktemp -d' fails instead of cp-ing into '/'
The DMG creation step sets DMG_TMP from 'mktemp -d' with no error check.
If mktemp fails (tmpfs full, permissions, TMPDIR misconfigured), DMG_TMP
is empty and the very next line — 'cp -a "\$APP_DIR" "\$DMG_TMP/"' —
expands to 'cp -a "<app>" "/"', which copies the bundle into the root of
the filesystem.
Refuse to continue unless mktemp produced a real directory. Defensive
second check catches the (rare) case where mktemp succeeds but returns
something that isn't a directory we can cp into.
* fix(telemetry-sync): drop predictable $$ tmp-file fallback
gstack-telemetry-sync tried 'mktemp /tmp/gstack-sync-XXXXXX' and on
failure fell back to '/tmp/gstack-sync-$$'. $$ is the PID — predictable
and reusable, so on shared hosts another user can pre-create or symlink
the path and either steal the response body or clobber an unrelated
file when curl writes through it.
Drop the fallback. If mktemp cannot produce a unique file we just skip
this sync cycle — the events stay on disk and the next run picks them
up. Also install an EXIT trap so the response file is cleaned up on
unexpected exit, not just on the happy path.
* fix(verify-rls): drop predictable $$-based tmp file fallback
Same shape as gstack-telemetry-sync: on mktemp failure the script fell
back to '/tmp/verify-rls-$$-$TOTAL', which is fully predictable from the
PID and a per-check counter. On a shared box another user can pre-create
or symlink the path and either capture the HTTP response body (which may
leak what the RLS tests revealed) or corrupt an unrelated file that curl
writes through.
Make mktemp strict. On failure return from the check function; the caller
tallies a FAIL and the run moves on.
* fix(security-classifier): close writer + delete tmp on download error
downloadFile() opens an fs.WriteStream to '<dest>.tmp.<pid>' and drives
it from a fetch body reader, but if reader.read() or writer.write()
throws mid-download the writer is never closed. That leaks an FD per
failed attempt and leaves the half-written tmp on disk. A later retry
can land in renameSync(tmp, dest) with a truncated TestSavantAI /
DeBERTa ONNX file — which then loads but produces garbage classifier
verdicts until the user manually nukes the models cache.
Wrap the download loop in try/catch. On failure, destroy() the writer
and unlink the tmp before rethrowing, so the next attempt starts from a
clean slate.
* fix(meta-commands): guard JSON.parse in pdf --from-file parser
parsePdfFromFile() runs JSON.parse on user-supplied file contents with
no try/catch. A malformed payload surfaces as an uncaught SyntaxError
from the 'pdf' command handler and the user sees an opaque stack trace
instead of "this file isn't valid JSON". Worse, the same call path is
used by make-pdf when header/footer HTML would overflow Windows'
CreateProcess argv cap, so a corrupt payload file there can take down
the make-pdf run.
Wrap JSON.parse. Re-throw with a message that names the offending file
and echoes the parser's own explanation. Also reject top-level non-
objects (null, array, primitive) since the rest of the function treats
json as an object — catching that here produces a clear error instead
of a TypeError further down.
* fix(global-discover): stop dropping sessions when header >8KB
extractCwdFromJsonl() reads the first 8KB of each JSONL session file and
runs JSON.parse on every newline-split line. When a session record
happens to straddle the 8KB cap, the last line ends in a truncated JSON
fragment, JSON.parse throws, the catch block 'continue's silently, and
if that was the only line carrying 'cwd' the whole project gets dropped
from the discovery output without a warning.
Two independent hardening steps:
1. Raise the read cap to 64KB. Session headers observed in Claude
Code / Codex / Gemini transcripts fit comfortably; this just moves
the cliff out of the normal range.
2. Drop the final segment after splitting on '\\n'. If the read hit
the cap mid-line, that segment is guaranteed incomplete; if the
file ended inside the buffer, the split produces an empty final
segment and dropping it is a no-op.
Together these make the parser robust regardless of how verbose the
leading records are.
* test: export downloadFile, parsePdfFromFile, extractCwdFromJsonl
These three internal helpers are now imported by regression tests
landing in the next commits (PR #1169 follow-up). Pattern matches the
existing normalizeRemoteUrl export in gstack-global-discover.ts which
test/global-discover.test.ts already imports side-effect-free.
No change to runtime behavior; gstack has no public package entrypoint
that would re-export these, so the in-repo surface is unchanged for
callers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(security-classifier): await writer close before unlinking tmp on error
The earlier downloadFile() error-path cleanup hit a race: Node's
createWriteStream lazily opens the FD and flushes buffered writes during
destroy(), so a naive `fs.unlinkSync(tmp)` immediately after `writer.destroy()`
hits ENOENT (file not yet on disk), then the writer's destroy finishes on the
next tick and creates the file fresh — leaving the half-written tmp behind
exactly as the original fix tried to prevent.
The new sequence awaits the writer's 'close' event before unlinking, so the FD
is fully torn down and no subsequent flush can re-create the path.
Caught by browse/test/security-classifier-download-cleanup.test.ts in the
next commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(browse): regression tests for downloadFile cleanup + parsePdfFromFile guard
Covers PR #1169 bugs #6 and #7:
- security-classifier-download-cleanup.test.ts pins downloadFile error-path
cleanup against three failure shapes: reader rejects mid-stream, non-2xx
response, missing body. Asserts the dest file is not created and no
<dest>.tmp.* siblings remain (glob-matched, not exact path — codex push:
if the fix later switches to mkdtempSync, the assertion still holds).
Includes a happy-path case so the cleanup isn't fighting a correct download.
- regression-pr1169-pdf-from-file-invalid-json.test.ts pins parsePdfFromFile
to throw a helpful error for: invalid JSON, empty file, top-level array,
top-level number, top-level string, top-level null, top-level boolean.
Codex push: JSON.parse accepts primitives too, so Array.isArray + typeof
guard must be tested separately from the JSON.parse try/catch.
Both files use mkdtempSync(process.cwd()/...) for fixture isolation since
SAFE_DIRECTORIES allows TEMP_DIR or cwd; cwd is universal across CI hosts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(global-discover): regression for extractCwdFromJsonl 64KB cap
PR #1169 bug #8: the 8KB read cap landed mid-line on Claude Code session
headers, JSON.parse threw on the truncated tail, the catch silently
continued, and the project disappeared from /gstack discovery output.
Six new cases under describe("extractCwdFromJsonl 64KB cap"):
- happy path: small JSONL with obj.cwd returns it
- 12KB first line with obj.cwd: returns cwd (the bug case)
- 80KB single line overflowing 64KB: returns null without crashing
- complete line followed by partial second line: trailing-partial-drop
must not poison the result; returns first line's cwd
- missing file: returns null (file read error swallowed)
- malformed first line + valid second line within cap: skips bad,
returns second's cwd
Tests use the exported extractCwdFromJsonl (added in earlier export
commit) and live in a separate describe block from the existing
"4KB / 128KB buffer" tests, which exercise the unrelated scanCodex
meta.payload.cwd path at L338 — different function, different bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: regression tests for shell-script bugs in PR #1169 (#2-#5)
Two new test files pinning the four shell-script invariants from the
external audit:
regression-pr1169-build-app-sed.test.ts — bugs #2 + #3
- Runtime isolation: extracts the sed-escape sequence from build-app.sh
and runs it against hostile $APP_NAME values ("Foo/Bar&Baz", "Cool\App",
"A/B\C&D"). Asserts the literal hostile name round-trips through a real
`sed s///` invocation, locking the metachar safety end-to-end.
- Static check: the rebrand block must contain both the escape line AND
the sed line referencing $APP_NAME_SED_ESCAPED; bare $APP_NAME
interpolation directly into the s/// replacement is rejected.
- Static check: DMG_TMP=$(mktemp -d) is followed by an explicit `|| { ... exit }`
failure handler AND a `[ -z "$DMG_TMP" ] || [ ! -d "$DMG_TMP" ]` validation
AND the cp -a appears AFTER both guards.
- Runtime fake-bin: extracts the guard shape, runs with a fake mktemp that
exits 1, asserts the script exits non-zero before any cp block can reach.
regression-pr1169-mktemp-fallbacks.test.ts — bugs #4 + #5
- Per codex pushback, the invariant is "no `mktemp ... || echo <path>`
fallback shape" — not just "no $$ token." That's a stronger invariant
that catches future swaps to $RANDOM or hardcoded paths.
- For each of bin/gstack-telemetry-sync and supabase/verify-rls.sh:
- no echo-based fallback after mktemp
- no $$ inside any /tmp path literal
- mktemp failure path explicitly exits / returns non-zero
- telemetry-sync also pins the `trap rm -f $RESP_FILE EXIT` cleanup
so success paths don't leak the tmp on normal exit.
All seven new test files are gate-tier (deterministic, sub-second, no LLM,
no network). Runtime shell tests use fake-bin PATH stubs in temp dirs;
no $HOME mutation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v1.41.1.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: RagavRida <ragavrida@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
00f966b3ec |
v1.30.0.0 fix wave: 21 community PRs + Windows CI extension + codex flag-semantics smoke (#1391)
* fix(codex): use resume-compatible flags * fix: V-001 security vulnerability Automated security fix generated by Orbis Security AI * docs: align prompt-injection thresholds to security.ts (v1.6.4.0 catch-up) CLAUDE.md:290 and ARCHITECTURE.md:159 were missed when WARN was bumped 0.60 → 0.75 in |
||
|
|
e8893a18b1 |
v1.20.0.0 feat: browser-skills runtime + gbrain-support carryover (#1233)
* feat(gbrain-sync): queue primitives + writer shims
Adds bin/gstack-brain-enqueue (atomic append to sync queue) and
bin/gstack-jsonl-merge (git merge driver, ts-sort with SHA-256 fallback).
Wires one backgrounded enqueue call into learnings-log, timeline-log,
review-log, and developer-profile --migrate. question-log and
question-preferences stay local per Codex v2 decision.
gstack-config gains gbrain_sync_mode (off/artifacts-only/full) and
gbrain_sync_mode_prompted keys, plus GSTACK_HOME env alignment so
tests don't leak into real ~/.gstack/config.yaml.
* feat(gbrain-sync): --once drain + secret scan + push
bin/gstack-brain-sync is the core sync binary. Subcommands: --once
(drain queue, allowlist-filter, privacy-class-filter, secret-scan
staged diff, commit with template, push with fetch+merge retry),
--status, --skip-file <path>, --drop-queue --yes, --discover-new
(cursor-based detection of artifact writes that skip the shim).
Secret regex families: AWS keys, GitHub tokens (ghp_/gho_/ghu_/ghs_/
ghr_/github_pat_), OpenAI sk-, PEM blocks, JWTs, bearer-token-in-JSON.
On hit: unstage, preserve queue, print remediation hint (--skip-file
or edit), exit clean. No daemon — invoked by preamble at skill
boundaries.
* feat(gbrain-sync): init, restore, uninstall, consumer registry
bin/gstack-brain-init: idempotent first-run. git init ~/.gstack/,
.gitignore=*, canonical .brain-allowlist + .brain-privacy-map.json,
pre-commit secret-scan hook (defense-in-depth), merge driver registration
via git config, gh repo create --private OR arbitrary --remote <url>,
initial push, ~/.gstack-brain-remote.txt for new-machine discovery,
GBrain consumer registration via HTTP POST.
bin/gstack-brain-restore: safe new-machine bootstrap. Refuses clobber
of existing allowlisted files, clones to staging, rsync-copies tracked
files, re-registers merge drivers (required — not cloned from remote),
rehydrates consumers.json, prompts for per-consumer tokens.
bin/gstack-brain-uninstall: clean off-ramp. Removes .git + .brain-*
files + consumers.json + config keys. Preserves user data (learnings,
plans, retros, profile). Optional --delete-remote for GitHub repos.
bin/gstack-brain-consumer + bin/gstack-brain-reader (symlink alias):
registry management. Internal 'consumer' term; user-facing 'reader'
per DX review decision.
* feat(gbrain-sync): preamble block — privacy gate + boundary sync
scripts/resolvers/preamble/generate-brain-sync-block.ts emits bash that
runs at every skill invocation:
- Detects ~/.gstack-brain-remote.txt on machines without local .git
and surfaces a restore-available hint (does NOT auto-run restore).
- Runs gstack-brain-sync --once at skill start to drain any pending
writes (and at skill end via prose instruction).
- Once-per-day auto-pull (cached via .brain-last-pull) for append-only
JSONL files.
- Emits BRAIN_SYNC: status line every skill run.
Also emits prose for the host LLM to fire the one-time privacy
stop-gate (full / artifacts-only / off) when gbrain is detected and
gbrain_sync_mode_prompted is false. Wired into preamble.ts composition.
* test(gbrain-sync): 27-test consolidated suite
test/brain-sync.test.ts covers:
- Config: validation, defaults, GSTACK_HOME env isolation
- Enqueue: no-op gates, skip list, concurrent atomicity, JSON escape
- JSONL merge driver: 3-way + ts-sort + SHA-256 fallback
- Init + sync: canonical file creation, merge driver registration,
push-reject + fetch+merge retry path
- Init refuses different remote (idempotency)
- Cross-machine restore round-trip (machine A write → machine B sees)
- Secret scan across all 6 regex families (AWS, GH, OpenAI, PEM, JWT,
bearer-JSON). --skip-file unblock remediation
- Uninstall removes sync config, preserves user data
- --discover-new idempotence via mtime+size cursor
Behaviors verified via integration smokes during implementation. Known
follow-up: bun-test 5s default timeout needs 30s wrapper for
spawnSync-heavy tests.
* docs(gbrain-sync): user guide + error lookup + README section
docs/gbrain-sync.md: setup walkthrough, privacy modes, cross-machine
workflow, secret protection, two-machine conflict handling, uninstall,
troubleshooting reference.
docs/gbrain-sync-errors.md: problem/cause/fix index for every
user-visible error. Patterned on Rust's error docs + Stripe's API
error reference.
README.md: 'Cross-machine memory with GBrain sync' section near the
top (discovery moment), plus docs-table entry.
* chore: bump version and changelog (v1.7.0.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore: regenerate SKILL.md files for gbrain-sync preamble block
Re-runs bun run gen:skill-docs after adding generateBrainSyncBlock
to scripts/resolvers/preamble.ts in
|
||
|
|
ed1e4be2f6 |
feat: gstack browser sidebar = interactive Claude Code REPL with live tab awareness (v1.14.0.0) (#1216)
* build: vendor xterm@5 for the Terminal sidebar tab
Adds xterm@5 + xterm-addon-fit as devDependencies and a `vendor:xterm`
build step that copies the assets into `extension/lib/` at build time.
The vendored files are .gitignored so the npm version stays the source
of truth. xterm@5 is eval-free, so no MV3 CSP changes needed.
No runtime callers yet — this just stages the assets.
* feat(server): add pty-session-cookie module for the Terminal tab
Mirrors `sse-session-cookie.ts` exactly. Mints short-lived 30-min HttpOnly
cookies for authenticating the Terminal-tab WebSocket upgrade against
the terminal-agent. Same TTL, same opportunistic-pruning shape, same
"scoped tokens never valid as root" invariant. Two registries instead of
one because the cookie names are different (`gstack_sse` vs `gstack_pty`)
and the token spaces must not overlap.
No callers yet — wired up in the next commit.
* feat(server): add terminal-agent.ts (PTY for the Terminal sidebar tab)
Translates phoenix gbrowser's Go PTY (cmd/gbd/terminal.go) into a Bun
non-compiled process. Lives separately from `sidebar-agent.ts` so a
WS-framing or PTY-cleanup bug can't take down the chat path (codex
outside-voice review caught the coupling risk).
Architecture:
- Bun.serve on 127.0.0.1:0 (never tunneled).
- POST /internal/grant accepts cookie tokens from the parent server over
loopback, authenticated with a per-boot internal token.
- GET /ws upgrades require BOTH (a) Origin: chrome-extension://<id> and
(b) the gstack_pty cookie minted by /pty-session. Either gate alone is
insufficient (CSWSH defense + auth defense).
- Lazy spawn: claude PTY is not started until the WS receives its first
data frame. Idle sidebar opens cost nothing.
- Bun PTY API: `terminal: { rows, cols, data(t, chunk) }` — verified at
impl time on Bun 1.3.10. proc.terminal.write() for input,
proc.terminal.resize() for resize, proc.kill() + 3s SIGKILL fallback
on close.
- process.on('uncaughtException'|'unhandledRejection') handlers so a
framing bug logs but doesn't kill the listener loop.
Test-only `BROWSE_TERMINAL_BINARY` env override lets the integration
tests spawn /bin/bash instead of requiring claude on every CI runner.
Not yet spawned by anything — wired in the next commit.
* feat(server): wire /pty-session route + spawn terminal-agent
Server-side glue connecting the Terminal sidebar tab to the new
terminal-agent process.
server.ts:
- New POST /pty-session route. Validates AUTH_TOKEN, mints a gstack_pty
HttpOnly cookie via pty-session-cookie.ts, posts the cookie value to
the agent's loopback /internal/grant. Returns the terminalPort + Set-Cookie
to the extension.
- /health response gains `terminalPort` (just the port number — never a
shell token). Tokens flow via the cookie path, never /health, because
/health already surfaces AUTH_TOKEN to localhost callers in headed mode
(that's a separate v1.1+ TODO).
- /pty-session and /terminal/* are deliberately NOT added to TUNNEL_PATHS,
so the dual-listener tunnel surface 404s by default-deny.
- Shutdown path now also pkills terminal-agent and unlinks its state files
(terminal-port + terminal-internal-token) so a reconnect doesn't try to
hit a dead port.
cli.ts:
- After spawning sidebar-agent.ts, also spawn terminal-agent.ts. Same
pattern: pkill old instances, Bun.spawn(['bun', 'run', script]) with
BROWSE_STATE_FILE + BROWSE_SERVER_PORT env. Non-fatal if the spawn
fails — chat still works without the terminal agent.
* feat(extension): Terminal as default sidebar tab
Adds a primary tab bar (Terminal | Chat) above the existing tab-content
panes. Terminal is the default-active tab; clicking Chat returns to the
existing claude -p one-shot flow which is preserved verbatim.
manifest.json: adds ws://127.0.0.1:*/ to host_permissions so MV3 doesn't
block the WebSocket upgrade.
sidepanel.html: new primary-tabs nav, new #tab-terminal pane with a
"Press any key to start Claude Code" bootstrap card, claude-not-found
install card, xterm mount point, and "session ended" restart UI. Loads
xterm.js + xterm-addon-fit + sidepanel-terminal.js. tab-chat is no
longer the .active default.
sidepanel.js: new activePrimaryPaneId() helper that reads which primary
tab is selected. Debug-close paths now route back to whichever primary
pane is active (was hardcoded to tab-chat). Primary-tab click handler
toggles .active classes and aria-selected. window.gstackServerPort and
window.gstackAuthToken exposed so sidepanel-terminal.js can build the
/pty-session POST and the WS URL.
sidepanel-terminal.js (new): xterm.js lifecycle. Lazy-spawn — first
keystroke fires POST /pty-session, then opens
ws://127.0.0.1:<terminalPort>/ws. Origin + cookie are set automatically
by the browser. Resize observer sends {type:"resize"} text frames.
ResizeObserver, tab-switch hooks, restart button, install-card retry.
On WS close shows "Session ended, click to restart" — no auto-reconnect
(codex outside-voice flagged that as session-burning).
sidepanel.css: primary-tabs bar + Terminal pane styling (full-height
xterm container, install card, ended state).
* test: terminal-agent + cookie module + sidebar default-tab regression
Three new test files:
terminal-agent.test.ts (16 tests): pty-session-cookie mint/validate/
revoke, Set-Cookie shape (HttpOnly + SameSite=Strict + Path=/, NO Secure
since 127.0.0.1 over HTTP), source-level guards that /pty-session and
/terminal/* are NOT in TUNNEL_PATHS, /health does NOT surface ptyToken
or gstack_pty, terminal-agent binds 127.0.0.1, /ws upgrade enforces
chrome-extension:// Origin AND gstack_pty cookie, lazy-spawn invariant
(spawnClaude is called from message handler, not upgrade), uncaughtException/
unhandledRejection handlers exist, SIGINT-then-SIGKILL cleanup.
terminal-agent-integration.test.ts (7 tests): spawns the agent as a real
subprocess in a tmp state dir. Verifies /internal/grant accepts/rejects
the loopback token, /ws gates (no Origin → 403, bad Origin → 403, no
cookie → 401), real WebSocket round-trip with /bin/bash via the
BROWSE_TERMINAL_BINARY override (write 'echo hello-pty-world\n', read it
back), and resize message acceptance.
sidebar-tabs.test.ts (13 tests): structural regression suite locking the
load-bearing invariants of the default-tab change — Terminal is .active,
Chat is not, xterm assets are loaded, debug-close path no longer hardcodes
tab-chat (uses activePrimaryPaneId), primary-tab click handler exists,
chat surface is not accidentally deleted, terminal JS does NOT auto-
reconnect on close, manifest declares ws:// + http:// localhost host
permissions, no unsafe-eval.
Plan called for Playwright + extension regression; the codebase doesn't
ship Playwright extension launcher infra, so we follow the existing
extension-test pattern (source-level structural assertions). Same
load-bearing intent — locks the invariants before they regress.
* docs: Terminal flow + threat model + v1.1 follow-ups
SIDEBAR_MESSAGE_FLOW.md: new "Terminal flow" section. Documents the WS
upgrade path (/pty-session cookie mint → /ws Origin + cookie gate →
lazy claude spawn), the dual-token model (AUTH_TOKEN for /pty-session,
gstack_pty cookie for /ws, INTERNAL_TOKEN for server↔agent loopback),
and the threat-model boundary — the Terminal tab bypasses the entire
prompt-injection security stack on purpose; user keystrokes are the
trust source. That trust assumption is load-bearing on three transport
guarantees: local-only listener, Origin gate, cookie auth. Drop any
one of those three and the tab becomes unsafe.
CLAUDE.md: extends the "Sidebar architecture" note to include
terminal-agent.ts in the read-this-first list. Adds a "Terminal tab is
its own process" note so a future contributor doesn't bolt PTY logic
onto sidebar-agent.ts.
TODOS.md: three new follow-ups under a new "Sidebar Terminal" section:
- v1.1: PTY session survives sidebar reload (Issue 1C deferred).
- v1.1+: audit /health AUTH_TOKEN distribution (codex finding #2 —
a pre-existing soft leak that cc-pty-import sidesteps but doesn't
fix).
- v1.1+: apply terminal-agent's process.on exception handlers to
sidebar-agent.ts (codex finding #4 — chat path has no fatal
handlers).
* feat(extension): Terminal-only sidebar — auth fix, UX polish, chat rip
The chat queue path is gone. The Chrome side panel is now just an
interactive claude PTY in xterm.js. Activity / Refs / Inspector still
exist behind the `debug` toggle in the footer.
Three threads of change, all from dogfood iteration on top of
cc-pty-import:
1. fix(server): cross-port WS auth via Sec-WebSocket-Protocol
- Browsers can't set Authorization on a WebSocket upgrade. We had
been minting an HttpOnly gstack_pty cookie via /pty-session, but
SameSite=Strict cookies don't survive the cross-port jump from
server.ts:34567 to the agent's random port from a chrome-extension
origin. The WS opened then immediately closed → "Session ended."
- /pty-session now also returns ptySessionToken in the JSON body.
- Extension calls `new WebSocket(url, [`gstack-pty.<token>`])`.
Browser sends Sec-WebSocket-Protocol on the upgrade.
- Agent reads the protocol header, validates against validTokens,
and MUST echo the protocol back (Chromium closes the connection
immediately if a server doesn't pick one of the offered protocols).
- Cookie path is kept as a fallback for non-browser callers (curl,
integration tests).
- New integration test exercises the full protocol-auth round-trip
via raw fetch+Upgrade so a future regression of this exact class
fails in CI.
2. fix(extension): UX polish on the Terminal pane
- Eager auto-connect when the sidebar opens — no "Press any key to
start" friction every reload.
- Always-visible ↻ Restart button in the terminal toolbar (not
gated on the ENDED state) so the user can force a fresh claude
mid-session.
- MutationObserver on #tab-terminal's class attribute drives a
fitAddon.fit() + term.refresh() when the pane becomes visible
again — xterm doesn't auto-redraw after display:none → display:flex.
3. feat(extension): rip the chat tab + sidebar-agent.ts
- Sidebar is Terminal-only. No more Terminal | Chat primary nav.
- sidebar-agent.ts deleted. /sidebar-command, /sidebar-chat,
/sidebar-agent/event, /sidebar-tabs* and friends all deleted.
- The pickSidebarModel router (sonnet vs opus) is gone — the live
PTY uses whatever model the user's `claude` CLI is configured with.
- Quick-actions (🧹 Cleanup / 📸 Screenshot / 🍪 Cookies) survive
in the Terminal toolbar. Cleanup now injects its prompt into the
live PTY via window.gstackInjectToTerminal — no more
/sidebar-command POST. The Inspector "Send to Code" action uses
the same injection path.
- clear-chat button removed from the footer.
- sidepanel.js shed ~900 lines of chat polling, optimistic UI,
stop-agent, etc.
Net diff: -3.4k lines across 16 files. CLAUDE.md, TODOS.md, and
docs/designs/SIDEBAR_MESSAGE_FLOW.md rewritten to match. The sidebar
regression test (browse/test/sidebar-tabs.test.ts) is rewritten as 27
structural assertions locking the new layout — Terminal sole pane,
no chat input, quick-actions in toolbar, eager-connect, MutationObserver
repaint, restart helper.
* feat: live tab awareness for the Terminal pane
claude in the PTY now has continuous tab-aware context. Three pieces:
1. Live state files. background.js listens to chrome.tabs.onActivated /
onCreated / onRemoved / onUpdated (throttled to URL/title/status==
complete so loading spinners don't spam) and pushes a snapshot. The
sidepanel relays it as a custom event; sidepanel-terminal.js sends
{type:"tabState"} text frames over the live PTY WebSocket.
terminal-agent.ts writes:
<stateDir>/tabs.json all open tabs (id, url, title, active,
pinned, audible, windowId)
<stateDir>/active-tab.json current active tab (skips chrome:// and
chrome-extension:// internal pages)
Atomic write via tmp + rename so claude never reads a half-written
document. A fresh snapshot is pushed on WS open so the files exist by
the time claude finishes booting.
2. New $B tab-each <command> [args...] meta-command. Fans out a single
command across every open tab, returns
{command, args, total, results: [{tabId, url, title, status, output}]}.
Skips chrome:// pages; restores the originally active tab in a finally
block (so a mid-batch error doesn't leave the user looking at a
different tab); uses bringToFront: false so the OS window doesn't
jump on every fanout. Scope-checks the inner command BEFORE the loop.
3. --append-system-prompt hint at spawn time. Claude is told about both
the state files and the $B tab-each command up front, so it doesn't
have to discover the surface by trial. Passed via the --append-system-
prompt CLI flag, NOT as a leading PTY write — the hint stays out of
the visible transcript.
Tests:
- browse/test/tab-each.test.ts (new) — registration + source-level
invariants (scope check before loop, finally-restore, bringToFront:false,
chrome:// skip) + behavior tests with a mock BrowserManager that verify
iteration order, JSON shape, error handling, and active-tab restore.
- browse/test/terminal-agent.test.ts — three new assertions for
tabState handler shape, atomic-write pattern, and the
--append-system-prompt wiring at spawn.
Verified live: opened 5 tabs, ran $B tab-each url against the live
server, got per-tab JSON results back, original active tab restored
without OS focus stealing.
* chore: drop sidebar-agent test refs after chat rip
Five test files / describe blocks targeted the deleted chat path:
- browse/test/security-e2e-fullstack.test.ts (full-stack chat-pipeline E2E
with mock claude — whole file gone)
- browse/test/security-review-fullstack.test.ts (review-flow E2E with real
classifier — whole file gone)
- browse/test/security-review-sidepanel-e2e.test.ts (Playwright E2E for
the security event banner that was ripped from sidepanel.html)
- browse/test/security-audit-r2.test.ts (5 describe blocks: agent queue
permissions, isValidQueueEntry stateFile traversal, loadSession session-ID
validation, switchChatTab DocumentFragment, pollChat reentrancy guard,
/sidebar-tabs URL sanitization, sidebar-agent SIGTERM→SIGKILL escalation,
AGENT_SRC top-level read converted to graceful fallback)
- browse/test/security-adversarial-fixes.test.ts (canary stream-chunk split
detection on detectCanaryLeak; one tool-output test on sidebar-agent)
- test/skill-validation.test.ts (sidebar agent #584 describe block)
These all assumed sidebar-agent.ts existed and tested chat-queue plumbing,
chat-tab DOM round-trip, chat-polling reentrancy, or per-message classifier
canary detection. With the live PTY there is no chat queue, no chat tab,
no LLM stream to canary-scan, and no per-message subprocess. The Terminal
pane's invariants are covered by the new browse/test/sidebar-tabs.test.ts
(27 structural assertions), browse/test/terminal-agent.test.ts, and
browse/test/terminal-agent-integration.test.ts.
bun test → exit 0, 0 failures.
* chore: bump version and changelog (v1.14.0.0)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(extension): xterm fills the full Terminal panel height
The Terminal pane only rendered into the top portion of the panel — most
of the panel below the prompt was an empty black gap. Three layered
issues, all about xterm.js measuring dimensions during a layout state
that wasn't ready yet:
1. order-of-operations in connect(): ensureXterm() ran BEFORE
setState(LIVE), so term.open() measured els.mount while it was still
display:none. xterm caches a 0-size viewport synchronously inside
open() and never auto-recovers when the container goes visible.
Flipped: setState(LIVE) → ensureXterm.
2. first fit() ran synchronously before the browser had applied the
.active class transition. Wrapped in requestAnimationFrame so layout
has settled before fit() reads clientHeight.
3. CSS flex-overflow trap: .terminal-mount has flex:1 inside the
flex-column #tab-terminal, but .tab-content's `overflow-y: auto` and
the lack of `min-height: 0` on .terminal-mount meant the item
couldn't shrink below content size. flex:1 then refused to expand
into available space and xterm rendered into whatever its initial
2x2 measurement happened to be.
Fixes:
- extension/sidepanel-terminal.js: reorder + RAF fit
- extension/sidepanel.css: .terminal-mount gets `flex: 1 1 0` +
`min-height: 0` + `position: relative`. #tab-terminal overrides
.tab-content's `overflow-y: auto` to `overflow: hidden` (xterm has
its own viewport scroll; the parent shouldn't compete) and explicitly
re-declares `display: flex; flex-direction: column` for #tab-terminal.active.
bun test browse/test/sidebar-tabs.test.ts → 27/27 pass.
Manually verified: side panel opens → Terminal fills full panel height,
xterm scrollback works, debug-tab toggle still repaints correctly.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
54d4cde773 |
security: tunnel dual-listener + SSRF + envelope + path wave (v1.6.0.0) (#1137)
* refactor(security): loosen /connect rate limit from 3/min to 300/min
Setup keys are 24 random bytes (unbruteforceable), so a tight rate limit
does not meaningfully prevent key guessing. It exists only to cap
bandwidth, CPU, and log-flood damage from someone who discovered the
ngrok URL. A legitimate pair-agent session hits /connect once; 300/min
is 60x that pattern and never hit accidentally.
3/min caused pairing to fail on any retry flow (network blip, second
paired client) with no upside. Per-IP tracking was considered and
rejected — adds a bounded Map + LRU for defense already adequate at the
global layer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(security): add tunnel-denial-log module for attack visibility
Append-only log of tunnel-surface auth denials to
~/.gstack/security/attempts.jsonl. Gives operators visibility into who
is probing tunneled daemons so the next security wave can be driven by
real attack data instead of speculation.
Design notes:
- Async via fs.promises.appendFile. Never appendFileSync — blocking the
event loop on every denial during a flood is what an attacker wants
(prior learning: sync-audit-log-io, 10/10 confidence).
- In-process rate cap at 60 writes/minute globally. Excess denials are
counted in memory but not written to disk — prevents disk DoS.
- Writes to the same ~/.gstack/security/attempts.jsonl used by the
prompt-injection attempt log. File rotation is handled by the existing
security pipeline (10MB, 5 generations).
No consumers in this commit; wired up in the dual-listener refactor that
follows.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(security): dual-listener tunnel architecture
The /health endpoint leaked AUTH_TOKEN to any caller that hit the ngrok
URL (spoofing chrome-extension:// origin, or catching headed mode).
Surfaced by @garagon in PR #1026; the original fix was header-inference
on the single port. Codex's outside-voice review during /plan-ceo-review
called that approach brittle (ngrok header behavior could change, local
proxies would false-positive), and pushed for the structural fix.
This is that fix. Stop making /health a root-token bootstrap endpoint on
any surface the tunnel can reach. The server now binds two HTTP
listeners when a tunnel is active. The local listener (extension, CLI,
sidebar) stays on 127.0.0.1 and is never exposed to ngrok. ngrok
forwards only to the tunnel listener, which serves only /connect
(unauth, rate-limited) and /command with a locked allowlist of
browser-driving commands. Security property comes from physical port
separation, not from header inference — a tunnel caller cannot reach
/health or /cookie-picker or /inspector because they live on a
different TCP socket.
What this commit adds to browse/src/server.ts:
* Surface type ('local' | 'tunnel') and TUNNEL_PATHS +
TUNNEL_COMMANDS allowlists near the top of the file.
* makeFetchHandler(surface) factory replacing the single fetch arrow;
closure-captures the surface so the filter that runs before route
dispatch knows which socket accepted the request.
* Tunnel filter at dispatch entry: 404s anything not on TUNNEL_PATHS,
403s root-token bearers with a clear pairing hint, 401s non-/connect
requests that lack a scoped token. Every denial is logged via
logTunnelDenial (from tunnel-denial-log).
* GET /connect alive probe (unauth on both surfaces) so /pair and
/tunnel/start can detect dead ngrok tunnels without reaching
/health — /health is no longer tunnel-reachable.
* Lazy tunnel listener lifecycle. /tunnel/start binds a dedicated
Bun.serve on an ephemeral port, points ngrok.forward at THAT port
(not the local port), hard-fails on bind error (no local fallback),
tears down cleanly on ngrok failure. BROWSE_TUNNEL=1 startup uses
the same pattern.
* closeTunnel() helper — single teardown path for both the ngrok
listener and the tunnel Bun.serve listener.
* resolveNgrokAuthtoken() helper — shared authtoken lookup across
/tunnel/start and BROWSE_TUNNEL=1 startup (was duplicated).
* TUNNEL_COMMANDS check in /command dispatch: on the tunnel surface,
commands outside the allowlist return 403 with a list of allowed
commands as a hint.
* Probe paths in /pair and /tunnel/start migrated from /health to
GET /connect — the only unauth path reachable on the tunnel surface
under the new architecture.
Test updates in browse/test/server-auth.test.ts:
* /pair liveness-verify test: assert via closeTunnel() helper instead
of the inline `tunnelActive = false; tunnelUrl = null` lines that
the helper subsumes.
* /tunnel/start cached-tunnel test: same closeTunnel() adaptation.
Credit
Derived from PR #1026 by @garagon — thanks for flagging the critical
bug that drove the architectural rewrite. The per-request
isTunneledRequest approach from #1026 is superseded by physical port
separation here; the underlying report remains the root cause for the
entire v1.6.0.0 wave.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(security): add source-level guards for dual-listener architecture
23 source-level assertions that keep future contributors from silently
widening the tunnel surface during a routine refactor. Covers:
* Surface type + tunnelServer state variable shape
* TUNNEL_PATHS is a closed set of /connect, /command, /sidebar-chat
(and NOT /health, /welcome, /cookie-picker, /inspector/*, /pair,
/token, /refs, /activity/stream, /tunnel/{start,stop})
* TUNNEL_COMMANDS includes browser-driving ops only (and NOT
launch-browser, tunnel-start, token-mint, cookie-import, etc.)
* makeFetchHandler(surface) factory exists and is wired to both
listeners with the correct surface parameter
* Tunnel filter runs BEFORE any route dispatch, with 404/403/401
responses and logged denials for each reason
* GET /connect returns {alive: true} unauth
* /command dispatch enforces TUNNEL_COMMANDS on tunnel surface
* closeTunnel() helper tears down ngrok + Bun.serve listener
* /tunnel/start binds on ephemeral port, points ngrok at TUNNEL_PORT
(not local port), hard-fails on bind error (no fallback), probes
cached tunnel via GET /connect (not /health), tears down on
ngrok.forward failure
* BROWSE_TUNNEL=1 startup uses the dual-listener pattern
* logTunnelDenial wired for all three denial reasons
* /connect rate limit is 300/min, not 3/min
All 23 tests pass. Behavioral integration tests (spawn subprocess, real
network) live in the E2E suite that lands later in this wave.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* security: gate download + scrape through validateNavigationUrl (SSRF)
The `goto` command was correctly wired through validateNavigationUrl,
but `download` and `scrape` called page.request.fetch(url, ...) directly.
A caller with the default write scope could hit the /command endpoint
and ask the daemon to fetch http://169.254.169.254/latest/meta-data/
(AWS IMDSv1) or the GCP/Azure/internal equivalents. The response body
comes back as base64 or lands on disk where GET /file serves it.
Fix: call validateNavigationUrl(url) immediately before each
page.request.fetch() call site in download and in the scrape loop.
Same blocklist that already protects `goto`: file://, javascript:,
data:, chrome://, cloud metadata (IPv4 all encodings, IPv6 ULA,
metadata.*.internal).
Tests: extend browse/test/url-validation.test.ts with a source-level
guard that walks every `await page.request.fetch(` call site and
asserts a validateNavigationUrl call precedes it within the same
branch. Regression trips before code review if a future refactor
drops the gate.
* security: route splitForScoped through envelope sentinel escape
The scoped-token snapshot path in snapshot.ts built its untrusted
block by pushing the raw accessibility-tree lines between the literal
`═══ BEGIN UNTRUSTED WEB CONTENT ═══` / `═══ END UNTRUSTED WEB CONTENT ═══`
sentinels. The full-page wrap path in content-security.ts already
applied a zero-width-space escape on those exact strings to prevent
sentinel injection, but the scoped path skipped it.
Net effect: a page whose rendered text contains the literal sentinel
can close the envelope early from inside untrusted content and forge
a fake "trusted" block for the LLM. That includes fabricating
interactive `@eN` references the agent will act on.
Fix:
* Extract the zero-width-space escape into a named, exported helper
`escapeEnvelopeSentinels(content)` in content-security.ts.
* Have `wrapUntrustedPageContent` call it (behavior unchanged on
that path — same bytes out).
* Import the helper in snapshot.ts and map it over `untrustedLines`
in the `splitForScoped` branch before pushing the BEGIN sentinel.
Tests: add a describe block in content-security.test.ts that covers
* `escapeEnvelopeSentinels` defuses BEGIN and END markers;
* `escapeEnvelopeSentinels` leaves normal text untouched;
* `wrapUntrustedPageContent` still emits exactly one real envelope
pair when hostile content contains forged sentinels;
* snapshot.ts imports the helper;
* the scoped-snapshot branch calls `escapeEnvelopeSentinels` before
pushing the BEGIN sentinel (source-level regression — if a future
refactor reorders this, the test trips).
* security: extend hidden-element detection to all DOM-reading channels
The Confusion Protocol envelope wrap (`wrapUntrustedPageContent`)
covers every scoped PAGE_CONTENT_COMMAND, but the hidden-element
ARIA-injection detection layer only ran for `text`. Other DOM-reading
channels (html, links, forms, accessibility, attrs, data, media,
ux-audit) returned their output through the envelope with no hidden-
content filter, so a page serving a display:none div that instructs
the agent to disregard prior system messages, or an aria-label that
claims to put the LLM in admin mode, leaked the injection payload on
any non-text channel. The envelope alone does not mitigate this, and
the page itself never rendered the hostile content to the human
operator.
Fix:
* New export `DOM_CONTENT_COMMANDS` in commands.ts — the subset of
PAGE_CONTENT_COMMANDS that derives its output from the live DOM.
Console and dialog stay out; they read separate runtime state.
* server.ts runs `markHiddenElements` + `cleanupHiddenMarkers` for
every scoped command in this set. `text` keeps its existing
`getCleanTextWithStripping` path (hidden elements physically
stripped before the read). All other channels keep their output
format but emit flagged elements as CONTENT WARNINGS on the
envelope, so the LLM sees what it would otherwise have consumed
silently.
* Hidden-element descriptions merge into `combinedWarnings`
alongside content-filter warnings before the wrap call.
Tests: new describe block in content-security.test.ts covering
* `DOM_CONTENT_COMMANDS` export shape and channel membership;
* dispatch gates on `DOM_CONTENT_COMMANDS.has(command)`, not the
literal `text` string;
* hiddenContentWarnings plumbs into `combinedWarnings` and reaches
wrapUntrustedPageContent;
* DOM_CONTENT_COMMANDS is a strict subset of PAGE_CONTENT_COMMANDS.
Existing datamarking, envelope wrap, centralized-wrapping, and chain
security suites stay green (52 pass, 0 fail).
* security: validate --from-file payload paths for parity with direct paths
The direct `load-html <file>` path runs every caller-supplied file path
through validateReadPath() so reads stay confined to SAFE_DIRECTORIES
(cwd, TEMP_DIR). The `load-html --from-file <payload.json>` shortcut
and its sibling `pdf --from-file <payload.json>` skipped that check and
went straight to fs.readFileSync(). An MCP caller that picks the
payload path (or any caller whose payload argument is reachable from
attacker-influenced text) could use --from-file as a read-anywhere
escape hatch for the safe-dirs policy.
Fix: call validateReadPath(path.resolve(payloadPath)) before readFileSync
at both sites. Error surface mirrors the direct-path branch so ops and
agent errors stay consistent.
Test coverage in browse/test/from-file-path-validation.test.ts:
- source-level: validateReadPath precedes readFileSync in the load-html
--from-file branch (write-commands.ts) and the pdf --from-file parser
(meta-commands.ts)
- error-message parity: both sites reference SAFE_DIRECTORIES
Related security audit pattern: R3 F002 (validateNavigationUrl gap on
download/scrape) and R3 F008 (markHiddenElements gap on 10 DOM commands)
were the same shape — a defense that existed on the primary code path
but not its shortcut sibling. This PR closes the same class of gap on
the --from-file shortcuts.
* fix(design): escape url.origin when injecting into served HTML
serve.ts injected url.origin into a single-quoted JS string in
the response body. A local request with a crafted Host header
(e.g. Host: "evil'-alert(1)-'x") would break out of the string
and execute JS in the 127.0.0.1:<port> origin opened by the
design board. Low severity — bound to localhost, requires a
local attacker — but no reason not to escape.
Fix: JSON.stringify(url.origin) produces a properly quoted,
escaped JS string literal in one call.
Also includes Prettier reformatting (single→double quotes,
trailing commas, line wrapping) applied by the repo's
PostToolUse formatter hook. Security change is the one line
in the HTML injection; everything else is whitespace/style.
* fix(scripts): drop shell:true from slop-diff npx invocations
spawnSync('npx', [...], { shell: true }) invokes /bin/sh -c
with the args concatenated, subjecting them to shell parsing
(word splitting, glob expansion, metacharacter interpretation).
No user input reaches these calls today, so not exploitable —
but the posture is wrong: npx + shell args should be direct.
Fix: scope shell:true to process.platform === 'win32' where
npx is actually a .cmd requiring the shell. POSIX runs the
npx binary directly with array-form args.
Also includes Prettier reformatting (single→double quotes,
trailing commas, line wrapping) applied by the repo's
PostToolUse formatter hook. Security-relevant change is just
the two shell:true -> shell: process.platform === 'win32'
lines; everything else is whitespace/style.
* security(E3): gate GSTACK_SLUG on /welcome path traversal
The /welcome handler interpolates GSTACK_SLUG directly into the filesystem
path used to locate the project-local welcome page. Without validation, a
slug like "../../etc/passwd" would resolve to
~/.gstack/projects/../../etc/passwd/designs/welcome-page-20260331/finalized.html
— classic path traversal.
Not exploitable today: GSTACK_SLUG is set by the gstack CLI at daemon launch,
and an attacker would already need local env-var access to poison it. But
the gate is one regex (^[a-z0-9_-]+$), and a defense-in-depth pass costs us
nothing when the cost of being wrong is arbitrary file read via /welcome.
Fall back to the safe 'unknown' literal when the slug fails validation —
same fallback the code already uses when GSTACK_SLUG is unset. No behavior
change for legitimate slugs (they all match the regex).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* security(N1): replace ?token= SSE auth with HttpOnly session cookie
Activity stream and inspector events SSE endpoints accepted the root
AUTH_TOKEN via `?token=` query param (EventSource can't send Authorization
headers). URLs leak to browser history, referer headers, server logs,
crash reports, and refactoring accidents. Codex flagged this during the
/plan-ceo-review outside voice pass.
New auth model: the extension calls POST /sse-session with a Bearer token
and receives a view-only session cookie (HttpOnly, SameSite=Strict, 30-min
TTL). EventSource is opened with `withCredentials: true` so the browser
sends the cookie back on the SSE connection. The ?token= query param is
GONE — no more URL-borne secrets.
Scope isolation (prior learning cookie-picker-auth-isolation, 10/10
confidence): the SSE session cookie grants access to /activity/stream and
/inspector/events ONLY. The token is never valid against /command, /token,
or any mutating endpoint. A leaked cookie can watch activity; it cannot
execute browser commands.
Components
* browse/src/sse-session-cookie.ts — registry: mint/validate/extract/
build-cookie. 256-bit tokens, 30-min TTL, lazy expiry pruning,
no imports from token-registry (scope isolation enforced by module
boundary).
* browse/src/server.ts — POST /sse-session mint endpoint (requires
Bearer). /activity/stream and /inspector/events now accept Bearer
OR the session cookie, and reject ?token= query param.
* extension/sidepanel.js — ensureSseSessionCookie() bootstrap call,
EventSource opened with withCredentials:true on both SSE endpoints.
Tested via the source guards; behavioral test is the E2E pairing
flow that lands later in the wave.
* browse/test/sse-session-cookie.test.ts — 20 unit tests covering
mint entropy, TTL enforcement, cookie flag invariants, cookie
parsing from multi-cookie headers, and scope-isolation contract
guard (module must not import token-registry).
* browse/test/server-auth.test.ts — existing /activity/stream auth
test updated to assert the new cookie-based gate and the absence
of the ?token= query param.
Cookie flag choices:
* HttpOnly: token not readable from page JS (mitigates XSS
exfiltration).
* SameSite=Strict: cookie not sent on cross-site requests (mitigates
CSRF). Fine for SSE because the extension connects to 127.0.0.1
directly.
* Path=/: cookie scoped to the whole origin.
* Max-Age=1800: 30 minutes, matches TTL. Extension re-mints on
reconnect when daemon restarts.
* Secure NOT set: daemon binds to 127.0.0.1 over plain HTTP. Adding
Secure would block the browser from ever sending the cookie back.
Add Secure when gstack ships over HTTPS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* security(N2): document Windows v20 ABE elevation path on CDP port
The existing comment around the cookie-import-browser --remote-debugging-port
launch claimed "threat model: no worse than baseline." That's wrong on
Windows with App-Bound Encryption v20. A same-user local process that
opens the cookie SQLite DB directly CANNOT decrypt v20 values (DPAPI
context is bound to the browser process). The CDP port lets them bypass
that: connect to the debug port, call Network.getAllCookies inside Chrome,
walk away with decrypted v20 cookies.
The correct fix is to switch from TCP --remote-debugging-port to
--remote-debugging-pipe so the CDP transport is a stdio pipe, not a
socket. That requires restructuring the CDP WebSocket client in this
module and Playwright doesn't expose the pipe transport out of the box.
Non-trivial, deferred from the v1.6.0.0 wave.
This commit updates the comment to correctly describe the threat and
points at the tracking issue. No code change to the launch itself.
Follow-up: #1136.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(E2): document dual-listener tunnel architecture in ARCHITECTURE.md
Adds an explicit per-endpoint disposition table to the Security model
section, covering the v1.6.0.0 dual-listener refactor. Every HTTP
endpoint now has a documented local-vs-tunnel answer. Future audits
(and future contributors wondering "is it safe to add X to the tunnel
surface?") can read this instead of reverse-engineering server.ts.
Also documents:
* Why physical port separation beats per-request header inference
(ngrok behavior drift, local proxies can forge headers, etc.)
* Tunnel surface denial logging → ~/.gstack/security/attempts.jsonl
* SSE session cookie model (gstack_sse, 30-min TTL, stream-scope only,
module-boundary-enforced scope isolation)
* N2 non-goal for Windows v20 ABE via CDP port (tracking #1136)
No code changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(E1): end-to-end pair-agent flow against a spawned daemon
Spawns the browse daemon as a subprocess with BROWSE_HEADLESS_SKIP=1 so
the HTTP layer runs without a real browser. Exercises:
* GET /health — token delivery for chrome-extension origin, withheld
otherwise (the F1 + PR #1026 invariant)
* GET /connect — alive probe returns {alive:true} unauth
* POST /pair — root Bearer required (403 without), returns setup_key
* POST /connect — setup_key exchange mints a distinct scoped token
* POST /command — 401 without auth
* POST /sse-session — Bearer required, Set-Cookie has HttpOnly +
SameSite=Strict (the N1 invariant)
* GET /activity/stream — 401 without auth
* GET /activity/stream?token= — 401 (the old ?token= query param is
REJECTED, which is the whole point of N1)
* GET /welcome — serves HTML, does not leak /etc/passwd content under
the default 'unknown' slug (E3 regex gate)
12 behavioral tests, ~220ms end-to-end, no network dependencies, no
ngrok, no real browser. This is the receipt for the wave's central
'pair-agent still works + the security boundary holds' claim.
Tunnel-port binding (/tunnel/start) is deliberately NOT exercised here
— it requires an ngrok authtoken and live network. The dual-listener
route allowlist is covered by source-level guards in
dual-listener.test.ts; behavioral tunnel testing belongs in a separate
paid-evals harness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* release(v1.6.0.0): bump VERSION + CHANGELOG for security wave
Architectural bump, not patch: dual-listener HTTP refactor changes the
daemon's tunnel-exposure model. See CHANGELOG for the full release
summary (~950 words) covering the five root causes this wave closes:
1. /health token leak over ngrok (F1 + E3 + test infra)
2. /cookie-picker + /inspector exposed over the tunnel (F1)
3. ?token=<ROOT> in SSE URLs leaking to logs/referer/history (N1)
4. /welcome GSTACK_SLUG path traversal (E3)
5. Windows v20 ABE elevation via CDP port (N2 — documented non-goal,
tracked as #1136)
Plus the base PRs: SSRF gate (#1029), envelope sentinel escape (#1031),
DOM-channel hidden-element coverage (#1032), --from-file path validation
(#1103), and 2 commits from #1073 (@theqazi).
VERSION + package.json bumped to 1.6.0.0. CHANGELOG entry covers
credits (@garagon, @Hybirdss, @HMAKT99, @theqazi), review lineage (CEO
→ Codex outside voice → Eng), and the non-goal tracking issue.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: pre-landing review findings (4 auto-fixes)
Addresses 4 findings from the Claude adversarial subagent on the
v1.6.0.0 security wave diff. No user-visible behavior change; all
are defense-in-depth hardening of newly-introduced code.
1. GET /connect rate-limited (was POST-only) [HIGH conf 8/10]
Attacker discovering the ngrok URL could probe unlimited GETs for
daemon enumeration. Now shares the global /connect counter.
2. ngrok listener leak on tunnel startup failure [MEDIUM conf 8/10]
If ngrok.forward() resolved but tunnelListener.url() or the
state-file write threw, the Bun listener was torn down but the
ngrok session was leaked. Fixed in BOTH /tunnel/start and
BROWSE_TUNNEL=1 startup paths.
3. GSTACK_SKILL_ROOT path-traversal gate [MEDIUM conf 8/10]
Symmetric with E3's GSTACK_SLUG regex gate — reject values
containing '..' before interpolating into the welcome-page path.
4. SSE session registry pruning [LOW conf 7/10]
pruneExpired() only checked 10 entries per mint call. Now runs
on every validate too, checks 20 entries, with a hard 10k cap as
backstop. Prevents registry growth under sustained extension
reconnect pressure.
Tests remain green (56/56 in sse-session-cookie + dual-listener +
pair-agent-e2e suites).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: update project documentation for v1.6.0.0
Reflect the dual-listener tunnel architecture, SSE session cookies,
SSRF guards, and Windows v20 ABE non-goal across the three docs
users actually read for remote-agent and browser auth context:
- docs/REMOTE_BROWSER_ACCESS.md: rewrote Architecture diagram for
dual listeners, fixed /connect rate limit (3/min → 300/min),
removed stale "/health requires no auth" (now 404 on tunnel),
added SSE cookie auth, expanded Security Model with tunnel
allowlist, SSRF guards, /welcome path traversal defense, and
the Windows v20 ABE tracking note.
- BROWSER.md: added dual-listener paragraph to Authentication and
linked to ARCHITECTURE.md endpoint table. Replaced the stale
?token= SSE auth note with the HttpOnly gstack_sse cookie flow.
- CLAUDE.md: added Transport-layer security section above the
sidebar prompt-injection stack so contributors editing server.ts,
sse-session-cookie.ts, or tunnel-denial-log.ts see the load-bearing
module boundaries before touching them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(make-pdf): write --from-file payload to /tmp, not os.tmpdir()
make-pdf's browseClient wrote its --from-file payload to os.tmpdir(),
which is /var/folders/... on macOS. v1.6.0.0's PR #1103 cherry-pick
tightened browse load-html --from-file to validate against the
safe-dirs allowlist ([TEMP_DIR, cwd] where TEMP_DIR is '/tmp' on
macOS/Linux, os.tmpdir() on Windows). This closed a CLI/API parity
gap but broke make-pdf on macOS because /var/folders/... is outside
the allowlist.
Fix: mirror browse's TEMP_DIR convention — use '/tmp' on non-Windows,
os.tmpdir() on Windows. The make-pdf-gate CI failure on macOS-latest
(run 72440797490) is caused by exactly this: the payload file was
rejected by validateReadPath.
Verified locally: the combined-gate e2e test now passes after
rebuilding make-pdf/dist/pdf.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(sidebar): killAgent resets per-tab state; align tests with current agent event format
Two pre-existing bugs surfaced while running the full e2e suite on the
sec-wave branch. Both pre-date v1.6.0.0 (same failures on main at
|
||
|
|
d0782c4c4d |
feat(v1.4.0.0): /make-pdf — markdown to publication-quality PDFs (#1086)
* feat(browse): full $B pdf flag contract + tab-scoped load-html/js/pdf
Grow $B pdf from a 2-line wrapper (hard-coded A4) into a real PDF engine
frontend so make-pdf can shell out to it without duplicating Playwright:
- pdf: --format, --width/--height, --margins, --margin-*, --header-template,
--footer-template, --page-numbers, --tagged, --outline, --print-background,
--prefer-css-page-size, --toc. Mutex rules enforced. --from-file <json>
dodges Windows argv limits (8191 char CreateProcess cap).
- load-html: add --from-file <json> mode for large inline HTML. Size + magic
byte checks still apply to the inline content, not the payload file path.
- newtab: add --json returning {"tabId":N,"url":...} for programmatic use.
- cli: extract --tab-id flag and route as body.tabId to the HTTP layer so
parallel callers can target specific tabs without racing on the active
tab (makes make-pdf's per-render tab isolation possible).
- --toc: non-fatal 3s wait for window.__pagedjsAfterFired. Paged.js ships
later; v1 renders TOC statically via the markdown renderer.
Codex round 2 flagged these P0 issues during plan review. All resolved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(resolvers): add MAKE_PDF_SETUP + makePdfDir host paths
Skill templates can now embed {{MAKE_PDF_SETUP}} to resolve $P to the
make-pdf binary via the same discovery order as $B / $D: env override
(MAKE_PDF_BIN), local skill root, global install, or PATH.
Mirrors the pattern established by generateBrowseSetup() and
generateDesignSetup() in scripts/resolvers/design.ts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(make-pdf): new /make-pdf skill + orchestrator binary
Turn markdown into publication-quality PDFs. $P generate input.md out.pdf
produces a PDF with 1in margins, intelligent page breaks, page numbers,
running header, CONFIDENTIAL footer, and curly quotes/em dashes — all on
Helvetica so copy-paste extraction works ("S ai li ng" bug avoided).
Architecture (per Codex round 2):
markdown → render.ts (marked + sanitize + smartypants) → orchestrator
→ $B newtab --json → $B load-html --tab-id → $B js (poll Paged.js)
→ $B pdf --tab-id → $B closetab
browseClient.ts shells out to the compiled browse CLI rather than
duplicating Playwright. --tab-id isolation per render means parallel
$P generate calls don't race on the active tab. try/finally tab cleanup
survives Paged.js timeouts, browser crashes, and output-path failures.
Features in v1:
--cover left-aligned cover page (eyebrow + title + hairline rule)
--toc clickable static TOC (Paged.js page numbers deferred)
--watermark <text> diagonal DRAFT/CONFIDENTIAL layer
--no-chapter-breaks opt out of H1-starts-new-page
--page-numbers "N of M" footer (default on)
--tagged --outline accessible PDF + bookmark outline (default on)
--allow-network opt in to external image loading (default off for privacy)
--quiet --verbose stderr control
Design decisions locked from the /plan-design-review pass:
- Helvetica everywhere (Chromium emits single-word Tj operators for
system fonts; bundled webfonts emit per-glyph and break extraction).
- Left-aligned body, flush-left paragraphs, no text-indent, 12pt gap.
- Cover shares 1in margins with body pages; no flexbox-center, no
inset padding.
- The reference HTMLs at .context/designs/*.html are the implementation
source of truth for print-css.ts.
Tests (56 unit + 1 E2E combined-features gate):
- smartypants: code/URL-safe, verified against 10 fixtures
- sanitizer: strips <script>/<iframe>/on*/javascript: URLs
- render: HTML assembly, CJK fallback, cover/TOC/chapter wrap
- print-css: all @page rules, margin variants, watermark
- pdftotext: normalize()+copyPasteGate() cross-OS tolerance
- browseClient: binary resolution + typed error propagation
- combined-features gate (P0): 2-chapter fixture with smartypants +
hyphens + ligatures + bold/italic + inline code + lists + blockquote
passes through PDF → pdftotext → expected.txt diff
Deferred to Phase 4 (future PR): Paged.js vendored for accurate TOC page
numbers, highlight.js for syntax highlighting, drop caps, pull quotes,
two-column, CMYK, watermark visual-diff acceptance.
Plan: .context/ceo-plans/2026-04-19-perfect-pdf-generator.md
References: .context/designs/make-pdf-*.html
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(build): wire make-pdf into build/test/setup/bin + add marked dep
- package.json: compile make-pdf/dist/pdf as part of bun run build; add
"make-pdf" to bin entry; include make-pdf/test/ in the free test pass;
add marked@18.0.2 as a dep (markdown parser, ~40KB).
- setup: add make-pdf/dist/pdf to the Apple Silicon codesign loop.
- .gitignore: add make-pdf/dist/ (matches browse/dist/ and design/dist/).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci(make-pdf): matrix copy-paste gate on Ubuntu + macOS
Runs the combined-features P0 gate on pull requests that touch make-pdf/
or browse's PDF surface. Installs poppler (macOS) / poppler-utils (Ubuntu)
per OS. Windows deferred to tolerant mode (Xpdf / Poppler-Windows
extraction variance not yet calibrated against the normalized comparator —
Codex round 2 #18).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(skills): regenerate SKILL.md for make-pdf addition + browse pdf flags
bun run gen:skill-docs picks up:
- the new /make-pdf skill (make-pdf/SKILL.md)
- updated browse command descriptions for 'pdf', 'load-html', 'newtab'
reflecting the new flag contract and --from-file mode
Source of truth stays the .tmpl files + COMMAND_DESCRIPTIONS;
these are regenerated artifacts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(tests): repair stale test expectations + emit _EXPLAIN_LEVEL / _QUESTION_TUNING from preamble
Three pre-existing test failures on main were blocking /ship:
- test/skill-validation.test.ts "Step 3.4 test coverage audit" expected the
literal strings "CODE PATH COVERAGE" and "USER FLOW COVERAGE" which were
removed when the Step 7 coverage diagram was compressed. Updated assertions
to check the stable `Code paths:` / `User flows:` labels that still ship.
- test/skill-validation.test.ts "ship step numbering" allowed-substeps list
didn't include 15.0 (WIP squash) and 15.1 (bisectable commits) which were
added for continuous checkpoint mode. Extended the allowlist.
- test/writing-style-resolver.test.ts and test/plan-tune.test.ts expected
`_EXPLAIN_LEVEL` and `_QUESTION_TUNING` bash variables in the preamble but
generate-preamble-bash.ts had been refactored and those lines were dropped.
Without them, downstream skills can't read `explain_level` or
`question_tuning` config at runtime — terse mode and /plan-tune features
were silently broken.
Added the two bash echo blocks back to generatePreambleBash and refreshed
the golden-file fixtures to match. All three preamble-related golden
baselines (claude/codex/factory) are synchronized with the new output.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v1.4.0.0)
New /make-pdf skill + $P binary.
Turn any markdown file into a publication-quality PDF. Default output is
a 1in-margin Helvetica letter with page numbers in the footer. `--cover`
adds a left-aligned cover page, `--toc` generates a clickable table of
contents, `--watermark DRAFT` overlays a diagonal watermark. Copy-paste
extraction from the PDF produces clean words, not "S a i l i n g"
spaced out letter by letter. CI gate (macOS + Ubuntu) runs a combined-
features fixture through pdftotext on every PR.
make-pdf shells out to browse rather than duplicating Playwright.
$B pdf grew into a real PDF engine with full flag contract (--format,
--margins, --header-template, --footer-template, --page-numbers,
--tagged, --outline, --toc, --tab-id, --from-file). $B load-html and
$B js gained --tab-id. $B newtab --json returns structured output.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(changelog): rewrite v1.4.0.0 headline — positive voice, no VC framing
The original headline led with "a PDF you wouldn't be embarrassed to send
to a VC": double-negative voice and audience-too-narrow. /make-pdf works
for essays, letters, memos, reports, proposals, and briefs. Framing the
whole release around founders-to-investors misses the wider audience.
New headline: "Turn any markdown file into a PDF that looks finished."
New tagline: "This one reads like a real essay or a real letter."
Positive voice. Broader aperture. Same energy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c15b805cd8 |
feat(browse): Puppeteer parity — load-html, screenshot --selector, viewport --scale, file:// (v1.1.0.0) (#1062)
* feat(browse): TabSession loadedHtml + command aliases + DX polish primitives
Adds the foundation layer for Puppeteer-parity features:
- TabSession.loadedHtml + setTabContent/getLoadedHtml/clearLoadedHtml —
enables load-html content to survive context recreation (viewport --scale)
via in-memory replay. ASCII lifecycle diagram in the source explains the
clear-before-navigation contract.
- COMMAND_ALIASES + canonicalizeCommand() helper — single source of truth
for name aliases (setcontent / set-content / setContent → load-html),
consumed by server dispatch and chain prevalidation.
- buildUnknownCommandError() pure function — rich error messages with
Levenshtein-based "Did you mean" suggestions (distance ≤ 2, input
length ≥ 4 to skip 2-letter noise) and NEW_IN_VERSION upgrade hints.
- load-html registered in WRITE_COMMANDS + SCOPE_WRITE so scoped write
tokens can use it.
- screenshot and viewport descriptions updated for upcoming flags.
- New browse/test/dx-polish.test.ts (15 tests): alias canonicalization,
Levenshtein threshold + alphabetical tiebreak, short-input guard,
NEW_IN_VERSION upgrade hint, alias + scope integration invariants.
No consumers yet — pure additive foundation. Safe to bisect on its own.
* feat(browse): accept file:// in goto with smart cwd/home-relative parsing
Extends validateNavigationUrl to accept file:// URLs scoped to safe dirs
(cwd + TEMP_DIR) via the existing validateReadPath policy. The workhorse is a
new normalizeFileUrl() helper that handles non-standard relative forms BEFORE
the WHATWG URL parser sees them:
file:///abs/path.html → unchanged
file://./docs/page.html → file://<cwd>/docs/page.html
file://~/Documents/page.html → file://<HOME>/Documents/page.html
file://docs/page.html → file://<cwd>/docs/page.html
file://localhost/abs/path → unchanged
file://host.example.com/... → rejected (UNC/network)
file:// and file:/// → rejected (would list a directory)
Host heuristic rejects segments with '.', ':', '\\', '%', IPv6 brackets, or
Windows drive-letter patterns — so file://docs.v1/page.html, file://127.0.0.1/x,
file://[::1]/x, and file://C:/Users/x are explicit errors.
Uses fileURLToPath() + pathToFileURL() from node:url (never string-concat) so
URL escapes like %20 decode correctly and Node rejects encoded-slash traversal
(%2F..%2F) outright.
Signature change: validateNavigationUrl now returns Promise<string> (the
normalized URL) instead of Promise<void>. Existing callers that ignore the
return value still compile — they just don't benefit from smart-parsing until
updated in follow-up commits. Callers will be migrated in the next few commits
(goto, diff, newTab, restoreState).
Rewrites the url-validation test file: updates existing tests for the new
return type, adds 20+ new tests covering every normalizeFileUrl shape variant,
URL-encoding edge cases, and path-traversal rejection.
References: codex consult v3 P1 findings on URL parser semantics and fileURLToPath.
* feat(browse): BrowserManager deviceScaleFactor + setContent replay + file:// plumbing
Three tightly-coupled changes to BrowserManager, all in service of the
Puppeteer-parity workflow:
1. deviceScaleFactor + currentViewport tracking. New private fields (default
scale=1, viewport=1280x720) + setDeviceScaleFactor(scale, w, h) method.
deviceScaleFactor is a context-level Playwright option — changing it
requires recreateContext(). The method validates (finite number, 1-3 cap,
headed-mode rejected), stores new values, calls recreateContext(), and
rolls back the fields on failure so a bad call doesn't leave inconsistent
state. Context options at all three sites (launch, recreate happy path,
recreate fallback) now honor the stored values instead of hardcoding
1280x720.
2. BrowserState.loadedHtml + loadedHtmlWaitUntil. saveState captures per-tab
loadedHtml from the session; restoreState replays it via newSession.
setTabContent() — NOT bare page.setContent() — so TabSession.loadedHtml
is rehydrated and survives *subsequent* scale changes. In-memory only,
never persisted to disk (HTML may contain secrets or customer data).
3. newTab + restoreState now consume validateNavigationUrl's normalized
return value. file://./x, file://~/x, and bare-segment forms now take
effect at every navigation site, not just the top-level goto command.
Together these enable: load-html → viewport --scale 2 → viewport --scale 1.5
→ screenshot, with content surviving both context recreations. Codex v2 P0
flagged that bare page.setContent in restoreState would lose content on the
second scale change — this commit implements the rehydration path.
References: codex v2 P0 (TabSession rehydration), codex v3 P1 (4-caller
return value), plan Feature 3 + Feature 4.
* feat(browse): load-html, screenshot --selector, viewport --scale, alias dispatch
Wires the new handlers and dispatch logic that the previous commits made
possible:
write-commands.ts
- New 'load-html' case: validateReadPath for safe-dir scoping, stat-based
actionable errors (not found, directory, oversize), extension allowlist
(.html/.htm/.xhtml/.svg), magic-byte sniff with UTF-8 BOM strip accepting
any <[a-zA-Z!?] markup opener (not just <!doctype — bare fragments like
<div>...</div> work for setContent), 50MB cap via GSTACK_BROWSE_MAX_HTML_BYTES
override, frame-context rejection. Calls session.setTabContent() so replay
metadata is rehydrated.
- viewport command extended: optional [<WxH>], optional [--scale <n>],
scale-only variant reads current size via page.viewportSize(). Invalid
scale (NaN, Infinity, empty, out of 1-3) throws with named value. Headed
mode rejected explicitly.
- clearLoadedHtml() called BEFORE goto/back/forward/reload navigation
(not after) so a timed-out goto post-commit doesn't leave stale metadata
that could resurrect on a later context recreation. Codex v2 P1 catch.
- goto uses validateNavigationUrl's normalized return value.
meta-commands.ts
- screenshot --selector <css> flag: explicit element-screenshot form.
Rejects alongside positional selector (both = error), preserves --clip
conflict at line 161, composes with --base64 at lines 168-174.
- chain canonicalizes each step with canonicalizeCommand — step shape is
now { rawName, name, args } so prevalidation, dispatch, WRITE_COMMANDS.has,
watch blocking, and result labels all use canonical names while audit
labels show 'rawName→name' when aliased. Codex v3 P2 catch — prior shape
only canonicalized at prevalidation and diverged everywhere else.
- diff command consumes validateNavigationUrl return value for both URLs.
server.ts
- Command canonicalization inserted immediately after parse, before scope /
watch / tab-ownership / content-wrapping checks. rawCommand preserved for
future audit (not wired into audit log in this commit — follow-up).
- Unknown-command handler replaced with buildUnknownCommandError() from
commands.ts — produces 'Unknown command: X. Did you mean Y?' with optional
upgrade hint for NEW_IN_VERSION entries.
security-audit-r2.test.ts
- Updated chain-loop marker from 'for (const cmd of commands)' to
'for (const c of commands)' to match the new chain step shape. Same
isWatching + BLOCKED invariants still asserted.
* chore: bump version and changelog (v1.1.0.0)
- VERSION: 1.0.0.0 → 1.1.0.0 (MINOR bump — new user-facing commands)
- package.json: matching version bump
- CHANGELOG.md: new 1.1.0.0 entry describing load-html, screenshot --selector,
viewport --scale, file:// support, setContent replay, and DX polish in user
voice with a dedicated Security section for file:// safe-dirs policy
- browse/SKILL.md.tmpl: adds pattern #12 "Render local HTML", pattern #13
"Retina screenshots", and a full Puppeteer → browse cheatsheet with side-by-
side API mapping and a worked tweet-renderer migration example
- browse/SKILL.md + SKILL.md: regenerated from templates via `bun run gen:skill-docs`
to reflect the new command descriptions
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: pre-landing review fixes (9 findings from specialist + adversarial review)
Adversarial review (Claude subagent + Codex) surfaced 9 bugs across
CRITICAL/HIGH severity. All fixed:
1. tab-session.ts:setTabContent — state mutation moved AFTER the setContent
await. Prior order left phantom HTML in replay metadata if setContent
threw (timeout, browser crash), which a later viewport --scale would
silently replay. Now loadedHtml is only recorded on successful load.
2. browser-manager.ts:setDeviceScaleFactor — rollback now forces a second
recreateContext after restoring the old fields. The fallback path in
the original recreateContext builds a blank context using whatever
this.deviceScaleFactor/currentViewport hold at that moment (which were
the NEW values we were trying to apply). Rolling back the fields without
a second recreate left the live context at new-scale while state tracked
old-scale. Now: restore fields, force re-recreate with old values, only
if that ALSO fails do we return a combined error.
3. commands.ts:buildUnknownCommandError — Levenshtein tiebreak simplified
to 'd <= 2 && d < bestDist' (strict less). Candidates are pre-sorted
alphabetically, so first equal-distance wins by default. The prior
'(d === bestDist && best !== undefined && cand < best)' clause was dead
code.
4. tab-session.ts:onMainFrameNavigated — now clears loadedHtml, not just
refs + frame. Without this, a user who load-html'd then clicked a link
(or had a form submit / JS redirect / OAuth flow) would retain the stale
replay metadata. The next viewport --scale would silently revert the
tab to the ORIGINAL loaded HTML, losing whatever the post-navigation
content was. Silent data corruption. Browser-emitted navigations trigger
this path via wirePageEvents.
5. browser-manager.ts:saveState + restoreState — tab ownership now flows
through BrowserState.owner. Without this, a scoped agent's viewport
--scale would strand them: tab IDs change during recreate, ownership
map held stale IDs, owner lookup failed. New IDs had no owner, so
writes without tabId were denied (DoS). Worse, if the agent sent a
stale tabId the server's swallowed-tab-switch-error path would let the
command hit whatever tab was currently active (cross-tab authz bypass).
Now: clear ownership before restore, re-add per-tab with new IDs.
6. meta-commands.ts:state load — disk-loaded state.pages is now explicit
allowlist (url, isActive, storage:null) instead of object spread.
Spreading accepted loadedHtml, loadedHtmlWaitUntil, and owner from a
user-writable state file, letting a tampered state.json smuggle HTML
past load-html's safe-dirs / extension / magic-byte / 50MB-cap
validators, or forge tab ownership. Now stripped at the boundary.
7. url-validation.ts:normalizeFileUrl — preserves query string + fragment
across normalization. file://./app.html?route=home#login previously
resolved to a filesystem path that URL-encoded '?' as %3F and '#' as
%23, or (for absolute forms) pathToFileURL dropped them entirely. SPAs
and fixture URLs with query params 404'd or loaded the wrong route.
Now: split on ?/# before path resolution, reattach after.
8. url-validation.ts:validateNavigationUrl — reattaches parsed.search +
parsed.hash to the normalized file:// URL. Same fix at the main
validator for absolute paths that go through fileURLToPath round-trip.
9. server.ts:writeAuditEntry — audit entries now include aliasOf when the
user typed an alias ('setcontent' → cmd: 'load-html', aliasOf:
'setcontent'). Previously the isAliased variable was computed but
dropped, losing the raw input from the forensic trail. Completes the
plan's codex v3 P2 requirement.
Also added bm.getCurrentViewport() and switched 'viewport --scale'-
without-size to read from it (more reliable than page.viewportSize() on
headed/transition contexts).
Tests pass: exit 0, no failures. Build clean.
* test: integration coverage for load-html, screenshot --selector, viewport --scale, replay, aliases
Adds 28 Playwright-integration tests that close the coverage gap flagged
by the ship-workflow coverage audit (50% → expected ~80%+).
**load-html (12 tests):**
- happy path loads HTML file, page text matches
- bare HTML fragments (<div>...</div>) accepted, not just full documents
- missing file arg throws usage
- non-.html extension rejected by allowlist
- /etc/passwd.html rejected by safe-dirs policy
- ENOENT path rejected with actionable "not found" error
- directory target rejected
- binary file (PNG magic bytes) disguised as .html rejected by magic-byte check
- UTF-8 BOM stripped before magic-byte check — BOM-prefixed HTML accepted
- --wait-until networkidle exercises non-default branch
- invalid --wait-until value rejected
- unknown flag rejected
**screenshot --selector (5 tests):**
- --selector flag captures element, validates Screenshot saved (element)
- conflicts with positional selector (both = error)
- conflicts with --clip (mutually exclusive)
- composes with --base64 (returns data:image/png;base64,...)
- missing value throws usage
**viewport --scale (5 tests):**
- WxH --scale 2 produces PNG with 2x element dimensions (parses IHDR bytes 16-23)
- --scale without WxH keeps current size + applies scale
- non-finite value (abc) throws "not a finite number"
- out-of-range (4, 0.5) throws "between 1 and 3"
- missing value throws
**setContent replay across context recreation (3 tests):**
- load-html → viewport --scale 2: content survives (hits setTabContent replay path)
- double cycle 2x → 1.5x: content still survives (proves TabSession rehydration)
- goto after load-html clears replay: subsequent viewport --scale does NOT
resurrect the stale HTML (validates the onMainFrameNavigated fix)
**Command aliases (2 tests):**
- setcontent routes to load-html via chain canonicalization
- set-content (hyphenated) also routes — both end-to-end through chain dispatch
Fixture paths use /tmp (SAFE_DIRECTORIES entry) instead of $TMPDIR which is
/var/folders/... on macOS and outside the safe-dirs boundary. Chain result
labels use rawName→name format when an alias is resolved (matches the
meta-commands.ts chain refactor).
Full suite: exit 0, 223/223 pass.
* docs: update BROWSER.md + CHANGELOG for v1.1.0.0
BROWSER.md:
- Command reference table updated: goto now lists file:// support,
load-html added to Navigate row, viewport flagged with --scale
option, screenshot row shows --selector + --base64 flags
- Screenshot modes table adds the fifth mode (element crop via
--selector flag) and notes the tag-selector-not-caught-positionally
gotcha
- New "Retina screenshots — viewport --scale" subsection explains
deviceScaleFactor mechanics, context recreation side effects, and
headed-mode rejection
- New "Loading local HTML — goto file:// vs load-html" subsection
explains the two paths, their tradeoffs (URL state, relative asset
resolution), the safe-dirs policy, extension allowlist + magic-byte
sniff, 50MB cap, setContent replay across recreateContext, and the
alias routing (setcontent → load-html before scope check)
CHANGELOG.md (v1.1.0.0 security section expanded, no existing content
removed):
- State files cannot smuggle HTML or forge tab ownership (allowlist
on disk-loaded page fields)
- Audit log records aliasOf when a canonical command was reached via
an alias (setcontent → load-html)
- load-html content clears on real navigations (clicks, form submits,
JS redirects) — not just explicit goto. Also notes SPA query/fragment
preservation for goto file://
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
2300067267 |
feat: UX behavioral foundations + ux-audit command (v0.17.0.0) (#1000)
* feat: UX behavioral foundations — Krug's usability principles as shared design infrastructure Add UX_PRINCIPLES resolver distilling Steve Krug's "Don't Make Me Think" into actionable guidance for AI agents. Injected into all 4 design skills as a shared behavioral foundation complementing the existing visual checklist (WHAT to check) and cognitive patterns (HOW designers see) with HOW USERS ACTUALLY BEHAVE. Methodology rewire: 6 Krug usability tests woven into existing design-review phases — Trunk Test, 3-Second Scan, Page Area Test, Happy Talk Detection with word count metric, Mindless Choice Audit, Goodwill Reservoir tracking with visual dashboard. First-person narration mode for design-review output with anti-slop guardrail. Hard rules: 4 Krug always/never rules in DESIGN_HARD_RULES (placeholder-as-label, floating headings, visited link distinction, minimum type size). Krug, Redish, Jarrett added to plan-design-review references. Token ceiling: gen-skill-docs.ts warns if any SKILL.md exceeds 100KB (~25K tokens). Documented in CLAUDE.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: $B ux-audit command + snapshot --heatmap flag New browse meta-command: ux-audit extracts page structure (site ID, navigation, headings, interactive elements, text blocks) as structured JSON for agent-side UX behavioral analysis. Pure data extraction — the agent applies the 6 usability tests and makes judgment calls. Element caps: 50 headings, 100 links, 200 interactive, 50 text blocks. New snapshot flag: -H/--heatmap accepts a JSON color map mapping ref IDs to colors (green/yellow/red/blue/orange/gray). Extends existing snapshot -a annotation system with per-ref colors instead of hardcoded red. Color whitelist validation prevents CSS injection. Composable — any skill can use it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.17.0.0 ARCHITECTURE.md: added {{UX_PRINCIPLES}} resolver to placeholder table. VERSION: bumped to 0.17.0.0 for UX behavioral foundations release. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.17.0.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: adversarial review fixes for ux-audit and heatmap Security: - Remove live form value extraction from ux-audit (leaked input field values) - Add ux-audit to PAGE_CONTENT_COMMANDS (untrusted content wrapping) Correctness: - Scope youAreHere selector to nav containers (was matching animation classes) - Validate heatmap JSON is a plain object (string/array/null produced garbage) - Use textContent instead of innerText for word count (avoids layout computation) - Remove dead url variable and unused LINK_CAP constant Found by Codex + Claude adversarial review. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
c6e6a21d1a |
refactor: AI slop reduction with cross-model quality review (v0.16.3.0) (#941)
* refactor: add error-handling utility module with selective catches safeUnlink (ignores ENOENT), safeKill (ignores ESRCH), isProcessAlive (extracted from cli.ts with Windows support), and json() Response helper. All catches check err.code and rethrow unexpected errors instead of swallowing silently. Unit tests cover happy path + error code paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: replace defensive try/catches in server.ts with utilities Replace ~12 try/catch sites with safeUnlink/safeKill calls in shutdown, emergencyCleanup, killAgent, and log cleanup. Convert empty catches to selective catches with error code checks. Remove needless welcome page try/catches (fs.existsSync doesn't need wrapping). Reduces slop-scan empty-catch locations from 11 to 8 and error-swallowing from 24 to 18. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: extract isProcessAlive and replace try/catches in cli.ts Move isProcessAlive to shared error-handling module. Replace ~20 try/catch sites with safeUnlink/safeKill in killServer, connect, disconnect, and cleanup flows. Convert empty catches to selective catches. Reduces slop-scan empty-catch from 22 to 2 locations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: remove unnecessary return await in content-security and read-commands Remove 6 redundant return-await patterns where there's no enclosing try block. Eliminates all defensive.async-noise findings from these files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: add slop-scan config to exclude vendor files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: replace empty catches with selective error handling in sidebar-agent Convert 8 empty catch blocks to selective catches that check err.code (ESRCH for process kills, ENOENT for file ops). Import safeUnlink for cancel file cleanup. Unexpected errors now propagate instead of being silently swallowed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: replace empty catches and mark pass-through wrappers in browser-manager Convert 12 empty catch blocks to selective catches: filesystem ops check ENOENT/EACCES, browser ops check for closed/Target messages, URL parsing checks TypeError. Add 'alias for active session' comments above 6 pass-through wrapper methods to document their purpose (and exempt from slop-scan pass-through-wrappers rule). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: selective catches in gstack-global-discover Convert 8 defensive catch blocks to selective error handling. Filesystem ops check ENOENT/EACCES, process ops check exit status. Unexpected errors now propagate instead of returning silent defaults. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: selective catches in write-commands, cdp-inspector, meta-commands, snapshot Convert ~27 empty/obscuring catches to selective error handling across 4 browse source files. CDP ops check for closed/Target/detached messages, DOM ops check TypeError/DOMException, filesystem ops check ENOENT/EACCES, JSON parsing checks SyntaxError. Remove dead code in cdp-inspector where try/catch wrapped synchronous no-ops. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: selective catches in Chrome extension files Convert empty catches and error-swallowing patterns across inspector.js, content.js, background.js, and sidepanel.js. DOM catches filter TypeError/DOMException, chrome API catches filter Extension context invalidated, network catches filter Failed to fetch. Unexpected errors now propagate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: restore isProcessAlive boolean semantics, add safeUnlinkQuiet, remove unused json() isProcessAlive now catches ALL errors and returns false (pure boolean probe). Callers use it in if/while conditions without try/catch, so throwing on EPERM was a behavior change that could crash the CLI. Windows path gets its safety catch restored. safeUnlinkQuiet added for best-effort cleanup paths where throwing on non-ENOENT errors (like EPERM during shutdown) would abort cleanup. json() removed — dead code, never imported anywhere. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use safeUnlinkQuiet in shutdown and cleanup paths Shutdown, emergency cleanup, and disconnect paths should never throw on file deletion failures. Switched from safeUnlink (throws on EPERM) to safeUnlinkQuiet (swallows all errors) in these best-effort paths. Normal operation paths (startup, lock release) keep safeUnlink. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * revert: remove brittle string-matching catches and alias comments in browser-manager Revert 6 catches that matched error messages via includes('closed'), includes('Target'), etc. back to empty catches. These fire-and-forget operations (page.close, bringToFront, dialog dismiss) genuinely don't care about any error type. String matching on error messages is brittle and will break on Playwright version bumps. Remove 6 'alias for active session' comments that existed solely to game slop-scan's pass-through-wrapper exemption rule. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * revert: remove brittle string-matching catches in extension files Revert error-swallowing fixes in background.js and sidepanel.js that matched error messages via includes('Failed to fetch'), includes( 'Extension context invalidated'), etc. In Chrome extensions, uncaught errors crash the entire extension. The original catch-and-log pattern is the correct choice for extension code where any error is non-fatal. content.js and inspector.js changes kept — their TypeError/DOMException catches are typed, not string-based. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add slop-scan usage guidelines to CLAUDE.md Instructions for using slop-scan to improve genuine code quality, not to game metrics or hide that we're AI-coded. Documents what to fix (empty catches on file/process ops, typed exception narrows, return await) and what NOT to fix (string-matching on error messages, linter gaming comments, tightening extension/cleanup catches). Includes utility function reference and baseline score tracking. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: add slop-scan as diagnostic in test suite Runs slop-scan after bun test as a non-blocking diagnostic. Prints the summary (top files, hotspots) so you see the number without it gating anything. Available standalone via bun run slop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: slop-diff shows only NEW findings introduced on this branch Runs slop-scan on HEAD and the merge-base, diffs results with line-number-insensitive fingerprinting so shifted code doesn't create false positives. Uses git worktree for clean base comparison. Shows net new vs removed findings. Runs automatically after bun test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: design doc for slop-scan integration in /review and /ship Deferred plan for surfacing slop-diff findings automatically during code review and shipping. Documents integration points, auto-fix vs skip heuristics, and implementation notes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.16.3.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
b73f364411 |
feat: browser data platform for AI agents (v0.16.0.0) (#907)
* refactor: extract path-security.ts shared module validateOutputPath, validateReadPath, and SAFE_DIRECTORIES were duplicated across write-commands.ts, meta-commands.ts, and read-commands.ts. Extract to a single shared module with re-exports for backward compatibility. Also adds validateTempPath() for the upcoming GET /file endpoint (TEMP_DIR only, not cwd, to prevent remote agents from reading project files). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: default paired agents to full access, split SCOPE_CONTROL The trust boundary for paired agents is the pairing ceremony itself, not the scope. An agent with write scope can already click anything and navigate anywhere. Gating js/cookies behind --admin was security theater. Changes: - Default pair scopes: read+write+admin+meta (was read+write) - New SCOPE_CONTROL for browser-wide destructive ops (stop, restart, disconnect, state, handoff, resume, connect) - --admin flag now grants control scope (backward compat) - New --restrict flag for limited access (e.g., --restrict read) - Updated hint text: "re-pair with --control" instead of "--admin" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add media and data commands for page content extraction media command: discovers all img/video/audio/background-image elements on the page. Returns JSON with URLs, dimensions, srcset, loading state, HLS/DASH detection. Supports --images/--videos/--audio filters and optional CSS selector scoping. data command: extracts structured data embedded in pages (JSON-LD, Open Graph, Twitter Cards, meta tags). One command returns product prices, article metadata, social share info without DOM scraping. Both are READ scope with untrusted content wrapping. Shared media-extract.ts helper for reuse by the upcoming scrape command. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add download, scrape, and archive commands download: fetch any URL or @ref element to disk using browser session cookies via page.request.fetch(). Supports blob: URLs via in-page base64 conversion. --base64 flag returns inline data URI (cap 10MB). Detects HLS/DASH and rejects with yt-dlp hint. scrape: bulk media download composing media discovery + download loop. Sequential with 100ms delay, URL deduplication, configurable --limit. Writes manifest.json with per-file metadata for machine consumption. archive: saves complete page as MHTML via CDP Page.captureSnapshot. No silent fallback -- errors clearly if CDP unavailable. All three are WRITE scope (write to disk, blocked in watch mode). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add GET /file endpoint for remote agent file retrieval Remote paired agents can now retrieve downloaded files over HTTP. TEMP_DIR only (not cwd) to prevent project file exfiltration. - Bearer token auth (root or scoped with read scope) - Path validation via validateTempPath() (symlink-aware) - 200MB size cap - Extension-based MIME detection - Zero-copy streaming via Bun.file() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add scroll --times N for automated repeated scrolling Extends the scroll command with --times N flag for infinite feed scraping. Scrolls N times with configurable --wait delay (default 1000ms) between each scroll for content loading. Usage: scroll --times 10 scroll --times 5 --wait 2000 scroll --times 3 .feed-container Composable with scrape: scroll to load content, then scrape images. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add network response body capture (--capture/--export/--bodies) The killer feature for social media scraping. Extends the existing network command to intercept API response bodies: network --capture [--filter graphql] # start capturing network --capture stop # stop network --export /tmp/api.jsonl # export as JSONL network --bodies # show summary Uses page.on('response') listener with URL pattern filtering. SizeCappedBuffer (50MB total, 5MB per-entry cap) evicts oldest entries when full. Binary responses stored as base64, text as-is. This lets agents tap Instagram's GraphQL API, TikTok's hydration data, and any SPA's internal API responses instead of fragile DOM scraping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add screenshot --base64 for inline image return Returns data:image/png;base64,... instead of writing to disk. Cap at 10MB. Works with all screenshot modes (element, clip, viewport). Eliminates the two-step screenshot+file-serve dance for remote agents. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add data platform tests and media fixture Tests for SizeCappedBuffer (eviction, export, summary), validateTempPath (TEMP_DIR only, rejects cwd), command registration (all new commands in correct scope sets), and MIME mapping source checks. Rich HTML fixture with: standard images, lazy-loaded images, srcset, video with sources + HLS, audio, CSS background-images, JSON-LD, Open Graph, Twitter Cards, and meta tags. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: regenerate SKILL.md with Extraction category Add Extraction category to browse command table ordering. Regenerate SKILL.md files to include media, data, download, scrape, archive commands in the generated documentation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.16.0.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
1868636f49 |
refactor: extract TabSession for per-tab state isolation (v0.15.16.0) (#873)
* plan: batch command endpoint + multi-tab parallel execution for GStack Browser * refactor: extract TabSession from BrowserManager for per-tab state Move per-tab state (refMap, lastSnapshot, frame) into a new TabSession class. BrowserManager delegates to the active TabSession via getActiveSession(). Zero behavior change — all existing tests pass. This is the foundation for the /batch endpoint: both /command and /batch will use the same handler functions with TabSession, eliminating shared state races during parallel tab execution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: update handler signatures to use TabSession Change handleReadCommand and handleSnapshot to take TabSession instead of BrowserManager. Change handleWriteCommand to take both TabSession (per-tab ops) and BrowserManager (global ops like viewport, headers, dialog). handleMetaCommand keeps BrowserManager for tab management. Tests use thin wrapper functions that bridge the old 3-arg call pattern to the new signatures via bm.getActiveSession(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add POST /batch endpoint for parallel multi-tab execution Execute multiple commands across tabs in a single HTTP request. Commands targeting different tabs run concurrently via Promise.allSettled. Commands targeting the same tab run sequentially within that group. Features: - Batch-safe command subset (text, goto, click, snapshot, screenshot, etc.) - newtab/closetab as special commands within batch - SSE streaming mode (stream: true) for partial results - Per-command error isolation (one tab failing doesn't abort the batch) - Max 50 commands per batch, soft batch-level timeout A 143-page crawl drops from ~45 min (serial HTTP) to ~5 min (20 tabs in parallel, batched commands). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add batch endpoint integration tests 10 tests covering: - Multi-tab parallel execution (goto + text on different tabs) - Same-tab sequential ordering - Per-command error isolation (one tab fails, others succeed) - Page-scoped refs (snapshot refs are per-session, not global) - Per-tab lastSnapshot (snapshot -D with independent baselines) - getSession/getActiveSession API - Batch-safe command subset validation - closeTab via page.close preserves at-least-one-page invariant - Parallel goto on 3 tabs simultaneously Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden codex-review E2E — extract SKILL.md section, bump maxTurns to 25 The test was copying the full 55KB/1075-line codex SKILL.md into the fixture, requiring 8 Read calls just to consume it and exhausting the 15-turn budget before reaching the actual codex review command. Now extracts only the review-relevant section (~6KB/148 lines), reducing Read calls from 8 to 1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: move batch endpoint plan into BROWSER.md as feature documentation The batch endpoint is implemented — document it as an actual feature in BROWSER.md (architecture, API shape, design decisions, usage pattern) and remove the standalone plan file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.16.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: gstack <ship@gstack.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
8ca950f6f1 |
feat: content security — 4-layer prompt injection defense for pair-agent (#815)
* feat: token registry for multi-agent browser access Per-agent scoped tokens with read/write/admin/meta command categories, domain glob restrictions, rate limiting, expiry, and revocation. Setup key exchange for the /pair-agent ceremony (5-min one-time key → 24h session token). Idempotent exchange handles tunnel drops. 39 tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: integrate token registry + scoped auth into browse server Server changes for multi-agent browser access: - /connect endpoint: setup key exchange for /pair-agent ceremony - /token endpoint: root-only minting of scoped sub-tokens - /token/:clientId DELETE: revoke agent tokens - /agents endpoint: list connected agents (root-only) - /health: strips root token when tunnel is active (P0 security fix) - /command: scope/rate/domain checks via token registry before dispatch - Idle timer skips shutdown when tunnel is active Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: ngrok tunnel integration + @ngrok/ngrok dependency BROWSE_TUNNEL=1 env var starts an ngrok tunnel after Bun.serve(). Reads NGROK_AUTHTOKEN from env or ~/.gstack/ngrok.env. Reads NGROK_DOMAIN for dedicated domain (stable URL). Updates state file with tunnel URL. Feasibility spike confirmed: SDK works in compiled Bun binary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: tab isolation for multi-agent browser access Add per-tab ownership tracking to BrowserManager. Scoped agents must create their own tab via newtab before writing. Unowned tabs (pre-existing, user-opened) are root-only for writes. Read access always allowed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: tab enforcement + POST /pair endpoint + activity attribution Server-side tab ownership check blocks scoped agents from writing to unowned tabs. Special-case newtab records ownership for scoped tokens. POST /pair endpoint creates setup keys for the pairing ceremony. Activity events now include clientId for attribution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: pair-agent CLI command + instruction block generator One command to pair a remote agent: $B pair-agent. Creates a setup key via POST /pair, prints a copy-pasteable instruction block with curl commands. Smart tunnel fallback (tunnel URL > auto-start > localhost). Flags: --for HOST, --local HOST, --admin, --client NAME. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: tab isolation + instruction block generator tests 14 tests covering tab ownership lifecycle (access checks, unowned tabs, transferTab) and instruction block generator (scopes, URLs, admin flag, troubleshooting section). Fix server-auth test that used fragile sliceBetween boundaries broken by new endpoints. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.9.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: CSO security fixes — token leak, domain bypass, input validation 1. Remove root token from /health endpoint entirely (CSO #1 CRITICAL). Origin header is spoofable. Extension reads from ~/.gstack/.auth.json. 2. Add domain check for newtab URL (CSO #5). Previously only goto was checked, allowing domain-restricted agents to bypass via newtab. 3. Validate scope values, rateLimit, expiresSeconds in createToken() (CSO #4). Rejects invalid scopes and negative values. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /pair-agent skill — syntactic sugar for browser sharing Users remember /pair-agent, not $B pair-agent. The skill walks through agent selection (OpenClaw, Hermes, Codex, Cursor, generic), local vs remote setup, tunnel configuration, and includes platform-specific notes for each agent type. Wraps the CLI command with context. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: remote browser access reference for paired agents Full API reference, snapshot→@ref pattern, scopes, tab isolation, error codes, ngrok setup, and same-machine shortcuts. The instruction block points here for deeper reading. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: improved instruction block with snapshot→@ref pattern The paste-into-agent instruction block now teaches the snapshot→@ref workflow (the most powerful browsing pattern), shows the server URL prominently, and uses clearer formatting. Tests updated to match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: smart ngrok detection + auto-tunnel in pair-agent The pair-agent command now checks ngrok's native config (not just ~/.gstack/ngrok.env) and auto-starts the tunnel when ngrok is available. The skill template walks users through ngrok install and auth if not set up, instead of just printing a dead localhost URL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: on-demand tunnel start via POST /tunnel/start pair-agent now auto-starts the ngrok tunnel without restarting the server. New POST /tunnel/start endpoint reads authtoken from env, ~/.gstack/ngrok.env, or ngrok's native config. CLI detects ngrok availability and calls the endpoint automatically. Zero manual steps when ngrok is installed and authed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: pair-agent skill must output the instruction block verbatim Added CRITICAL instruction: the agent MUST output the full instruction block so the user can copy it. Previously the agent could summarize over it, leaving the user with nothing to paste. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: scoped tokens rejected on /command — auth gate ordering bug The blanket validateAuth() gate (root-only) sat above the /command endpoint, rejecting all scoped tokens with 401 before they reached getTokenInfo(). Moved /command above the gate so both root and scoped tokens are accepted. This was the bug Wintermute hit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: pair-agent auto-launches headed mode before pairing When pair-agent detects headless mode, it auto-switches to headed (visible Chromium window) so the user can watch what the remote agent does. Use --headless to skip this. Fixed compiled binary path resolution (process.execPath, not process.argv[1] which is virtual /$bunfs/ in Bun compiled binaries). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: comprehensive tests for auth ordering, tunnel, ngrok, headed mode 16 new tests covering: - /command sits above blanket auth gate (Wintermute bug) - /command uses getTokenInfo not validateAuth - /tunnel/start requires root, checks native ngrok config, returns already_active - /pair creates setup keys not session tokens - Tab ownership checked before command dispatch - Activity events include clientId - Instruction block teaches snapshot→@ref pattern - pair-agent auto-headed mode, process.execPath, --headless skip - isNgrokAvailable checks all 3 sources (gstack env, env var, native config) - handlePairAgent calls /tunnel/start not server restart Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: chain scope bypass + /health info leak when tunneled 1. Chain command now pre-validates ALL subcommand scopes before executing any. A read+meta token can no longer escalate to admin via chain (eval, js, cookies were dispatched without scope checks). tokenInfo flows through handleMetaCommand into the chain handler. Rejects entire chain if any subcommand fails. 2. /health strips sensitive fields (currentUrl, agent.currentMessage, session) when tunnel is active. Only operational metadata (status, mode, uptime, tabs) exposed to the internet. Previously anyone reaching the ngrok URL could surveil browsing activity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: tout /pair-agent as headline feature in CHANGELOG + README Lead with what it does for the user: type /pair-agent, paste into your other agent, done. First time AI agents from different companies can coordinate through a shared browser with real security boundaries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: expand /pair-agent, /design-shotgun, /design-html in README Each skill gets a real narrative paragraph explaining the workflow, not just a table cell. design-shotgun: visual exploration with taste memory. design-html: production HTML with Pretext computed layout. pair-agent: cross-vendor AI agent coordination through shared browser. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: split handleCommand into handleCommandInternal + HTTP wrapper Chain subcommands now route through handleCommandInternal for full security enforcement (scope, domain, tab ownership, rate limiting, content wrapping). Adds recursion guard for nested chains, rate-limit exemption for chain subcommands, and activity event suppression (1 event per chain, not per sub). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add content-security.ts with datamarking, envelope, and filter hooks Four-layer prompt injection defense for pair-agent browser sharing: - Datamarking: session-scoped watermark for text exfiltration detection - Content envelope: trust boundary wrapping with ZWSP marker escaping - Content filter hooks: extensible filter pipeline with warn/block modes - Built-in URL blocklist: requestbin, pipedream, webhook.site, etc. BROWSE_CONTENT_FILTER env var controls mode: off|warn|block (default: warn) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: centralize content wrapping in handleCommandInternal response path Single wrapping location replaces fragmented per-handler wrapping: - Scoped tokens: content filters + datamarking + enhanced envelope - Root tokens: existing basic wrapping (backward compat) - Chain subcommands exempt from top-level wrapping (wrapped individually) - Adds 'attrs' to PAGE_CONTENT_COMMANDS (ARIA value exposure defense) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: hidden element stripping for scoped token text extraction Detects CSS-hidden elements (opacity, font-size, off-screen, same-color, clip-path) and ARIA label injection patterns. Marks elements with data-gstack-hidden, extracts text from a clean clone (no DOM mutation), then removes markers. Only active for scoped tokens on text command. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: snapshot split output format for scoped tokens Scoped tokens get a split snapshot: trusted @refs section (for click/fill) separated from untrusted web content in an envelope. Ref names truncated to 50 chars in trusted section. Root tokens unchanged (backward compat). Resume command also uses split format for scoped tokens. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add SECURITY section to pair-agent instruction block Instructs remote agents to treat content inside untrusted envelopes as potentially malicious. Lists common injection phrases to watch for. Directs agents to only use @refs from the trusted INTERACTIVE ELEMENTS section, not from page content. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add 4 prompt injection test fixtures - injection-visible.html: visible injection in product review text - injection-hidden.html: 7 CSS hiding techniques + ARIA injection + false positive - injection-social.html: social engineering in legitimate-looking content - injection-combined.html: all attack types + envelope escape attempt Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: comprehensive content security tests (47 tests) Covers all 4 defense layers: - Datamarking: marker format, session consistency, text-only application - Content envelope: wrapping, ZWSP marker escaping, filter warnings - Content filter hooks: URL blocklist, custom filters, warn/block modes - Instruction block: SECURITY section content, ordering, generation - Centralized wrapping: source-level verification of integration - Chain security: recursion guard, rate-limit exemption, activity suppression - Hidden element stripping: 7 CSS techniques, ARIA injection, false positives - Snapshot split format: scoped vs root output, resume integration Also fixes: visibility:hidden detection, case-insensitive ARIA pattern matching. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: pair-agent skill compliance + fix all 16 pre-existing test failures Root cause: pair-agent was added without completing the gen-skill-docs compliance checklist. All 16 failures traced back to this. Fixes: - Sync package.json version to VERSION (0.15.9.0) - Add "(gstack)" to pair-agent description for discoverability - Add pair-agent to Codex path exception (legitimately documents ~/.codex/) - Add CLI_COMMANDS (status, pair-agent, tunnel) to skill parser allowlist - Regenerate SKILL.md for all hosts (claude, codex, factory, kiro, etc.) - Update golden file baselines for ship skill - Fix relink tests: pass GSTACK_INSTALL_DIR to auto-relink calls so they use the fast mock install instead of scanning real ~/.claude/skills/gstack Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.12.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: E2E exit reason precedence + worktree prune race condition Two fixes for E2E test reliability: 1. session-runner.ts: error_max_turns was misclassified as error_api because is_error flag was checked before subtype. Now known subtypes like error_max_turns are preserved even when is_error is set. The is_error override only applies when subtype=success (API failure). 2. worktree.ts: pruneStale() now skips worktrees < 1 hour old to avoid deleting worktrees from concurrent test runs still in progress. Previously any second test execution would kill the first's worktrees. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: restore token in /health for localhost extension auth The CSO security fix stripped the token from /health to prevent leaking when tunneled. But the extension needs it to authenticate on localhost. Now returns token only when not tunneled (safe: localhost-only path). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: verify /health token is localhost-only, never served through tunnel Updated tests to match the restored token behavior: - Test 1: token assignment exists AND is inside the !tunnelActive guard - Test 1b: tunnel branch (else block) does not contain AUTH_TOKEN Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add security rationale for token in /health on localhost Explains why this is an accepted risk (no escalation over file-based token access), CORS protection, and tunnel guard. Prevents future CSO scans from stripping it without providing an alternative auth path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: verify tunnel is alive before returning URL to pair-agent Root cause: when ngrok dies externally (pkill, crash, timeout), the server still reports tunnelActive=true with a dead URL. pair-agent prints an instruction block pointing at a dead tunnel. The remote agent gets "endpoint offline" and the user has to manually restart everything. Three-layer fix: - Server /pair endpoint: probes tunnel URL before returning it. If dead, resets tunnelActive/tunnelUrl and returns null (triggers CLI restart). - Server /tunnel/start: probes cached tunnel before returning already_active. If dead, falls through to restart ngrok automatically. - CLI pair-agent: double-checks tunnel URL from server before printing instruction block. Falls through to auto-start on failure. 4 regression tests verify all three probe points + CLI verification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add POST /batch endpoint for multi-command batching Remote agents controlling GStack Browser through a tunnel pay 2-5s of latency per HTTP round-trip. A typical "navigate and read" takes 4 sequential commands = 10-20 seconds. The /batch endpoint collapses N commands into a single HTTP round-trip, cutting a 20-tab crawl from ~60s to ~5s. Sequential execution through the full security pipeline (scope, domain, tab ownership, content wrapping). Rate limiting counts the batch as 1 request. Activity events emitted at batch level, not per-command. Max 50 commands per batch. Nested batches rejected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add source-level security tests for /batch endpoint 8 tests verifying: auth gate placement, scoped token support, max command limit, nested batch rejection, rate limiting bypass, batch-level activity events, command field validation, and tabId passthrough. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: correct CHANGELOG date from 2026-04-06 to 2026-04-05 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: consolidate Hermes into generic HTTP option in pair-agent Hermes doesn't have a host-specific config — it uses the same generic curl instructions as any other agent. Removing the dedicated option simplifies the menu and eliminates a misleading distinction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump VERSION to 0.15.14.0, add CHANGELOG entry for batch endpoint Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate pair-agent/SKILL.md after main merge Vendoring deprecation section from main's template wasn't reflected in the generated file. Fixes check-freshness CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: checkTabAccess uses options object, add own-only tab policy Refactors checkTabAccess(tabId, clientId, isWrite) to use an options object { isWrite?, ownOnly? }. Adds tabPolicy === 'own-only' support in the server command dispatch — scoped tokens with this policy are restricted to their own tabs for all commands, not just writes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add --domain flag to pair-agent CLI for domain restrictions Allows passing --domain to pair-agent to restrict the remote agent's navigation to specific domains (comma-separated). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * revert: remove batch commands CHANGELOG entry and VERSION bump The batch endpoint work belongs on the browser-batch-multitab branch (port-louis), not this branch. Reverting VERSION to 0.15.14.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: adopt main's headed-mode /health token serving Our merge kept the old !tunnelActive guard which conflicted with main's security-audit-r2 tests that require no currentUrl/currentMessage in /health. Adopts main's approach: serve token conditionally based on headed mode or chrome-extension origin. Updates server-auth tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: improve snapshot flags docs completeness for LLM judge Adds $B placeholder explanation, explicit syntax line, and detailed flag behavior (-d depth values, -s CSS selector syntax, -D unified diff format and baseline persistence, -a screenshot vs text output relationship). Fixes snapshot flags reference LLM eval scoring completeness < 4. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
03973c2fab |
fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847)
* fix(bin): pass search params via env vars (RCE fix) (#819) Replace shell string interpolation with process.env in gstack-learnings-search to prevent arbitrary code execution via crafted learnings entries. Also fixes the CROSS_PROJECT interpolation that the original PR missed. Adds 3 regression tests verifying no shell interpolation remains in the bun -e block. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): add path validation to upload command (#821) Add isPathWithin() and path traversal checks to the upload command, blocking file exfiltration via crafted upload paths. Uses existing SAFE_DIRECTORIES constant instead of a local copy. Adds 3 regression tests. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): symlink resolution in meta-commands validateOutputPath (#820) Add realpathSync to validateOutputPath in meta-commands.ts to catch symlink-based directory escapes in screenshot, pdf, and responsive commands. Resolves SAFE_DIRECTORIES through realpathSync to handle macOS /tmp -> /private/tmp symlinks. Existing path validation tests pass with the hardened implementation. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add uninstall instructions to README (#812) Community PR #812 by @0531Kim. Adds two uninstall paths: the gstack-uninstall script (handles everything) and manual removal steps for when the repo isn't cloned. Includes CLAUDE.md cleanup note and Playwright cache guidance. Co-Authored-By: 0531Kim <0531Kim@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): Windows launcher extraEnv + headed-mode token (#822) Community PR #822 by @pieterklue. Three fixes: 1. Windows launcher now merges extraEnv into spawned server env (was only passing BROWSE_STATE_FILE, dropping all other env vars) 2. Welcome page fallback serves inline HTML instead of about:blank redirect (avoids ERR_UNSAFE_REDIRECT on Windows) 3. /health returns auth token in headed mode even without Origin header (fixes Playwright Chromium extensions that don't send it) Also adds HOME/USERPROFILE fallback for cross-platform compatibility. Co-Authored-By: pieterklue <pieterklue@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): terminate orphan server when parent process exits (#808) Community PR #808 by @mmporong. Passes BROWSE_PARENT_PID to the spawned server process. The server polls every 15s with signal 0 and calls shutdown() if the parent is gone. Prevents orphaned chrome-headless-shell processes when Claude Code sessions exit abnormally. Co-Authored-By: mmporong <mmporong@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): IPv6 ULA blocking, cookie redaction, per-tab cancel, targeted token (#664) Community PR #664 by @mr-k-man (security audit round 1, new parts only). - IPv6 ULA prefix blocking (fc00::/7) in url-validation.ts with false-positive guard for hostnames like fd.example.com - Cookie value redaction for tokens, API keys, JWTs in browse cookies command - Per-tab cancel files in killAgent() replacing broken global kill-signal - design/serve.ts: realpathSync upgrade prevents symlink bypass in /api/reload - extension: targeted getToken handler replaces token-in-health-broadcast - Supabase migration 003: column-level GRANT restricts anon UPDATE scope - Telemetry sync: upsert error logging - 10 new tests for IPv6, cookie redaction, DNS rebinding, path traversal Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): CSS injection guard, timeout clamping, session validation, tests (#806) Community PR #806 by @mr-k-man (security audit round 2, new parts only). - CSS value validation (DANGEROUS_CSS) in cdp-inspector, write-commands, extension inspector - Queue file permissions (0o700/0o600) in cli, server, sidebar-agent - escapeRegExp for frame --url ReDoS fix - Responsive screenshot path validation with validateOutputPath - State load cookie filtering (reject localhost/.internal/metadata cookies) - Session ID format validation in loadSession - /health endpoint: remove currentUrl and currentMessage fields - QueueEntry interface + isValidQueueEntry validator for sidebar-agent - SIGTERM->SIGKILL escalation in timeout handler - Viewport dimension clamping (1-16384), wait timeout clamping (1s-300s) - Cookie domain validation in cookie-import and cookie-import-browser - DocumentFragment-based tab switching (XSS fix in sidepanel) - pollInProgress reentrancy guard for pollChat - toggleClass/injectCSS input validation in extension inspector - Snapshot annotated path validation with realpathSync - 714-line security-audit-r2.test.ts + 33-line learnings-injection.test.ts Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.13.0) Community security wave: 8 PRs from 4 contributors (@garagon, @mr-k-man, @mmporong, @0531Kim, @pieterklue). IPv6 ULA blocking, cookie redaction, per-tab cancel signaling, CSS injection guards, timeout clamping, session validation, DocumentFragment XSS fix, parent process watchdog, uninstall docs, Windows fixes, and 750+ lines of security regression tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: garagon <garagon@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: 0531Kim <0531Kim@users.noreply.github.com> Co-authored-by: pieterklue <pieterklue@users.noreply.github.com> Co-authored-by: mmporong <mmporong@users.noreply.github.com> Co-authored-by: mr-k-man <mr-k-man@users.noreply.github.com> |
||
|
|
3cda8deec9 |
fix: security audit round 2 (v0.13.4.0) (#640)
* fix: chrome-cdp localhost-only binding Restrict Chrome CDP to localhost by adding --remote-debugging-address=127.0.0.1 and --remote-allow-origins to prevent network-accessible debugging sessions. Clears 1 Socket anomaly (Chrome CDP session exposure). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: extension sender validation + message type allowlist Add sender.id check and ALLOWED_TYPES allowlist to the Chrome extension's message handler. Defense-in-depth against message spoofing from external extensions or future externally_connectable changes. Clears 2 Socket anomalies (extension permissions). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: checksum-verified bun install Replace unverified curl|bash bun installation with checksum-verified download-then-execute pattern. The install script is downloaded, sha256 verified against a known hash, then executed. Preserves the Bun-native install path without adding a Node/npm dependency. Clears Snyk W012 + 3 Socket anomalies. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: content trust boundary markers in browse output Wrap page-content commands (text, html, links, forms, accessibility, console, dialog, snapshot) with --- BEGIN/END UNTRUSTED EXTERNAL CONTENT --- markers. Covers direct commands (server.ts), chain sub-commands, and snapshot output (meta-commands.ts). Adds PAGE_CONTENT_COMMANDS set and wrapUntrustedContent() helper in commands.ts (single source of truth, DRY). Expands the SKILL.md trust warning with explicit processing rules for agents. Clears Snyk W011 (third-party content exposure). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden trust boundary markers against escape attacks - Sanitize URLs in markers (remove newlines, cap at 200 chars) to prevent marker injection via history.pushState - Escape marker strings in content (zero-width space) so malicious pages can't forge the END marker to break out of the untrusted block - Wrap resume command snapshot with trust boundary markers - Wrap diff command output with trust boundary markers - Wrap watch stop last snapshot with trust boundary markers Found by cross-model adversarial review (Claude + Codex). * chore: bump version and changelog (v0.13.4.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: gitignore .factory/ and remove from tracking Factory Droid support was removed in this branch. The .factory/ directory was re-added by merging main (which had v0.13.5.0 Factory support). Gitignore it so it stays out. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
7450b5160b |
fix: security audit remediation — 12 fixes, 20 tests (v0.13.1.0) (#595)
* fix: remove auth token from /health, secure extension bootstrap (CRITICAL-02 + HIGH-03) - Remove token from /health response (was leaked to any localhost process) - Write .auth.json to extension dir for Manifest V3 bootstrap - sidebar-agent reads token from state file via BROWSE_STATE_FILE env var - Remove getToken handler from extension (token via health broadcast) - Extension loads token before first health poll to prevent race condition * fix: require auth on cookie-picker data routes (CRITICAL-01) - Add Bearer token auth gate on all /cookie-picker/* data/action routes - GET /cookie-picker HTML page stays unauthenticated (UI shell) - Token embedded in served HTML for picker's fetch calls - CORS preflight now allows Authorization header * fix: add state file TTL and plaintext cookie warning (HIGH-02) - Add savedAt timestamp to state save output - Warn on load if state file older than 7 days - Auto-delete stale state files (>7 days) on server startup - Warning about plaintext cookie storage in save message * fix: innerHTML XSS in extension content script and sidepanel (MEDIUM-01) - content.js: replace innerHTML with createElement/textContent for ref panel - sidepanel.js: escape entry.command with escapeHtml() in activity feed - Both found by security audit + Codex adversarial red team * fix: symlink bypass in validateReadPath (MEDIUM-02) - Always resolve to absolute path first (fixes relative path bypass) - Use realpathSync to follow symlinks before boundary check - Throw on non-ENOENT realpathSync failures (explicit over silent) - Resolve SAFE_DIRECTORIES through realpathSync (macOS /tmp → /private/tmp) - Resolve directory part for non-existent files (ENOENT with symlinked parent) * fix: freeze hook symlink bypass and prefix collision (MEDIUM-03) - Add POSIX-portable path resolution (cd + pwd -P, works on macOS) - Fix prefix collision: /project-evil no longer matches /project freeze dir - Use trailing slash in boundary check to require directory boundary * fix: shell script injection in gstack-config and telemetry (MEDIUM-04) - gstack-config: validate keys (alphanumeric+underscore only) - gstack-config: use grep -F (fixed string) instead of -E (regex) - gstack-config: escape sed special chars in values, drop newlines - gstack-telemetry-log: sanitize REPO_SLUG and BRANCH via json_safe() * test: 20 security tests for audit remediation - server-auth: verify token removed from /health, auth on /refs, /activity/* - cookie-picker: auth required on data routes, HTML page unauthenticated - path-validation: symlink bypass blocked, realpathSync failure throws - gstack-config: regex key rejected, sed special chars preserved - state-ttl: savedAt timestamp, 7-day TTL warning - telemetry: branch/repo with quotes don't corrupt JSON - adversarial: sidepanel escapes entry.command, freeze prefix collision * chore: bump version and changelog (v0.13.1.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: tone down changelog — defense in depth, not catastrophic bugs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
b343ba2797 |
fix: community PRs + security hardening + E2E stability (v0.12.7.0) (#552)
* fix(security): skip hidden directories in skill template discovery
discoverTemplates() scans subdirectories for SKILL.md.tmpl files but
only skips node_modules, .git, and dist. Hidden directories like
.claude/, .agents/, and .codex/ (which contain symlinked skill
installs) were being scanned, allowing a malicious .tmpl in a
symlinked skill to inject into the generation pipeline.
Fix: add !d.name.startsWith('.') to the subdirs() filter. This skips
all dot-prefixed directories, matching the standard convention that
hidden dirs are not source code.
* fix(security): sanitize telemetry JSONL inputs against injection
SKILL, OUTCOME, SESSION_ID, SOURCE, and EVENT_TYPE values go directly
into printf %s for JSONL output. If any contain double quotes,
backslashes, or newlines, the JSON breaks — or worse, injects
arbitrary fields.
Fix: strip quotes, backslashes, and control characters from all
string fields before JSONL construction via json_safe() helper.
* fix(security): validate JSON input in gstack-review-log
gstack-review-log appends its argument directly to a JSONL file with
no validation. Malformed or crafted input could corrupt the review log
or inject arbitrary content.
Fix: validate input is parseable JSON via python3 before appending.
Reject with exit 1 and stderr message if invalid.
* fix: treat relative dot-paths as file paths in screenshot command
Closes #495
* fix: use host-specific co-author trailer in /ship and /document-release
Codex-generated skills hardcoded a Claude co-author trailer in commit
messages. Users running gstack under Codex pushed commits attributed
to the wrong AI assistant.
Add {{CO_AUTHOR_TRAILER}} resolver that emits the correct trailer
based on ctx.host:
- claude: Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- codex: Co-Authored-By: OpenAI Codex <noreply@openai.com>
Replace hardcoded trailers in ship/SKILL.md.tmpl and
document-release/SKILL.md.tmpl with the resolver placeholder.
Fixes #282. Fixes #383.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: auto-upgrade marker no longer masks newer remote versions
When a just-upgraded-from marker persists across sessions, the update
check would write UP_TO_DATE to cache and exit immediately — never
fetching the remote VERSION. Users silently miss updates that landed
after their last upgrade.
Remove the early exit and premature cache write so the script falls
through to the remote check after consuming the marker. This ensures
JUST_UPGRADED is still emitted for the preamble, while also detecting
any newer versions available upstream.
Fixes #515
* fix: decouple doc generation from binary compilation in build script
The build script chains gen:skill-docs and bun build --compile with &&,
so a doc generation failure (e.g. missing Codex host config, template
error) prevents the browse binary from being compiled. Users end up
with a broken install where setup reports the binary is missing.
Replace && with ; for the two gen:skill-docs steps so they run
independently of the compilation chain. Doc generation errors are still
visible in stderr, but no longer block binary compilation.
Fixes #482
* fix: extend security sanitization + add 10 tests for merged community PRs
- Extend json_safe() to ERROR_CLASS and FAILED_STEP fields
- Improve ERROR_MESSAGE escaping to handle backslashes and newlines
- Replace python3 with bun for JSON validation in gstack-review-log
- Add 7 telemetry injection prevention tests
- Add 2 review-log JSON validation tests
- Add 1 discover-skills hidden directory filtering test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: stabilize flaky E2E tests (browse-basic, ship-base-branch, dashboard-via)
browse-basic: bump maxTurns 5→7 (agent reads PNG per SKILL.md instruction)
ship-base-branch: extract Step 0 only instead of full 1900-line ship/SKILL.md
dashboard-via: extract dashboard section only + increase timeout 90s→180s
Root cause: copying full SKILL.md files into test fixtures caused context bloat,
leading to timeouts and flaky turn limits. Extracting only the relevant section
cut dashboard-via from timing out at 240s to finishing in 38s.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add E2E fixture extraction rule to CLAUDE.md
Never copy full SKILL.md files into E2E test fixtures. Extract only
the section the test needs. Also: run targeted evals in foreground,
never pkill and restart mid-run.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: stabilize journey-think-bigger routing test
Use exact trigger phrases from plan-ceo-review skill description
("think bigger", "expand scope", "ambitious enough") instead of
the ambiguous "thinking too small". Reduce maxTurns 5→3 to cut
cost per attempt ($0.12 vs $0.25). Test remains periodic tier
since LLM routing is inherently non-deterministic.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* remove: delete journey-think-bigger routing test
Never passed reliably. Tests ambiguous routing ("think bigger" →
plan-ceo-review) but Claude legitimately answers directly instead
of invoking a skill. The other 10 journey tests cover routing
with clear, actionable signals.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.12.7.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Arun Kumar Thiagarajan <arunkt.bm14@gmail.com>
Co-authored-by: bluzername <bluzer@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Greg Jackson <gregario@users.noreply.github.com>
|
||
|
|
7665adf4fe |
feat: headed mode + sidebar agent + Chrome extension (v0.12.0) (#517)
* feat: CDP connect — control real Chrome/Comet via Playwright Add `connectCDP()` to BrowserManager: connects to a running browser via Chrome DevTools Protocol. All existing browse commands work unchanged through Playwright's abstraction layer. - chrome-launcher.ts: browser discovery, CDP probe, auto-relaunch with rollback - browser-manager.ts: connectCDP(), mode guards (close/closeTab/recreateContext/handoff), auto-reconnect on browser restart, getRefMap() for extension API - server.ts: CDP branch in start(), /health gains mode field, /refs endpoint, idle timer only resets on /command (not passive endpoints) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: browse connect/disconnect/focus CLI commands - connect: pre-server command that discovers browser, starts server in CDP mode - disconnect: drops CDP connection, restarts in headless mode - focus: brings browser window to foreground via osascript (macOS) - status: now shows Mode: cdp | launched | headed - startServer() accepts extra env vars for CDP URL/port passthrough Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: CDP-aware skill templates — skip cookie import in real browser mode Skills now check `$B status` for CDP mode and skip: - /qa: cookie import prompt, user-agent override, headless workarounds - /design-review: cookie import for authenticated pages - /setup-browser-cookies: returns "not needed" in CDP mode Regenerated SKILL.md files from updated templates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: activity streaming — SSE endpoint for Chrome extension Side Panel Real-time browse command feed via Server-Sent Events: - activity.ts: ActivityEntry type, CircularBuffer (capacity 1000), privacy filtering (redacts passwords, auth tokens, sensitive URL params), cursor-based gap detection, async subscriber notification - server.ts: /activity/stream SSE, /activity/history REST, handleCommand instrumented with command_start/command_end events - 18 unit tests for filterArgs privacy, emitActivity, subscribe lifecycle Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: Chrome extension Side Panel + Conductor API proposal Chrome extension (Manifest V3, sideload): - Side Panel with live activity feed, @ref overlays, dark terminal aesthetic - Background worker: health polling, SSE relay, ref fetching - Popup: port config, connection status, side panel launcher - Content script: floating ref panel with @ref badges Conductor API proposal (docs/designs/CONDUCTOR_SESSION_API.md): - SSE endpoint for full Claude Code session mirroring in Side Panel - Discovery via HTTP endpoint (not filesystem — extensions can't read files) TODOS.md: add $B watch, multi-agent tabs, cross-platform CDP, Web Store publishing. Mark CDP mode as shipped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: detect Conductor runtime, skip osascript quit for sandboxed apps macOS App Management blocks Electron apps (Conductor) from quitting other apps via osascript. Now detects the runtime environment: - terminal/claude-code/codex: can manage apps freely - conductor: prints manual restart instructions + polls for 60s detectRuntime() checks env vars and parent process. When Chrome needs restart but we can't quit it, prints step-by-step instructions and waits for the user to restart Chrome with --remote-debugging-port. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: detect Conductor via actual env vars (CONDUCTOR_WORKSPACE_NAME) Previous detection checked CONDUCTOR_WORKSPACE_ID which doesn't exist. Conductor sets CONDUCTOR_WORKSPACE_NAME, CONDUCTOR_BIN_DIR, CONDUCTOR_PORT, and __CFBundleIdentifier=com.conductor.app. Check these FIRST because Conductor sessions also have ANTHROPIC_API_KEY (which was matching claude-code). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: connection status pill — floating indicator when gstack controls Chrome Small pill in bottom-right corner of every page: "● gstack · 3 refs" Shows when connected via CDP, fades to 30% opacity after 3s, full on hover. Disappears entirely when disconnected. Background worker now notifies content scripts on connect/disconnect state changes so the pill appears/disappears without polling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Chrome requires --user-data-dir for remote debugging Chrome refuses --remote-debugging-port without an explicit --user-data-dir. Add userDataDir to BrowserBinary registry (macOS Application Support paths) and pass it in both auto-launch and manual restart instructions. Fix double-quoting in CLI manual restart instructions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Chrome must be fully quit before launching with --remote-debugging-port Chrome refuses to enable CDP on its default profile when another instance is running (even with explicit --user-data-dir). The only reliable path: fully quit Chrome first, then relaunch with the flag. Updated instructions to emphasize this clearly with verification step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: bin/chrome-cdp — quit Chrome and relaunch with CDP in one command Quits Chrome gracefully, waits for full exit, relaunches with --remote-debugging-port, polls until CDP is ready. Usage: chrome-cdp [port] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use Playwright channel:chrome instead of broken connectOverCDP Playwright's connectOverCDP hangs with Chrome 146 due to CDP protocol version mismatch. Switch to channel:'chrome' which uses Playwright's native pipe protocol to launch the system Chrome binary directly. This is simpler and more reliable: - No CDP port discovery needed - No --remote-debugging-port or --user-data-dir hassles - $B connect just works — launches real Chrome headed window - All Playwright APIs (snapshot, click, fill) work unchanged bin/chrome-cdp updated with symlinked profile approach (kept for manual CDP use cases, but $B connect no longer needs it). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: green border + gstack label on controlled Chrome window Injects a 2px green border and small "gstack" label on every page loaded in the controlled Chrome window via context.addInitScript(). Users can instantly tell which Chrome window Claude controls. Also fixes close() for channel:chrome mode (uses browser.close() not browser.disconnect() which doesn't exist). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: cleanup chrome-launcher runtime detection, remove puppeteer-core dep Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style(design): redesign controlled Chrome indicator Replace crude green border + label with polished indicator: - 2px shimmer gradient at top edge (green→cyan→green, 3s loop) - Floating pill bottom-right with frosted glass bg, fades to 25% opacity after 4s so it doesn't compete with page content - prefers-reduced-motion disables shimmer animation - Much more subtle — looks like a developer tool, not broken CSS Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: document real browser mode + Chrome extension in BROWSER.md and README.md BROWSER.md: new sections for connect/disconnect/focus commands, Chrome extension Side Panel install, CDP-aware skills, activity streaming. Updated command reference table, key components, env vars, source map. README.md: updated /browse description, added "Real browser mode" to What's New section. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: step-by-step Chrome extension install guide in BROWSER.md Replace terse bullet points with numbered walkthrough covering: developer mode toggle, load unpacked, macOS file picker tip (Cmd+Shift+G), pin extension, configure port, open side panel. Added troubleshooting section. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Cmd+Shift+. tip for hidden folders in macOS file picker macOS hides folders starting with . by default. Added both shortcuts: Cmd+Shift+G (paste path directly) and Cmd+Shift+. (show hidden files). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: integrate hidden folder tips into the install flow naturally Move Cmd+Shift+G and Cmd+Shift+. tips inline with the file picker step instead of as a separate tip block after it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: auto-load Chrome extension when $B connect launches Chrome Extension auto-loads via --load-extension flag — no manual chrome://extensions install needed. findExtensionPath() checks repo root, global install, and dev paths. Also adds bin/gstack-extension helper for manual install in regular Chrome, and rewrites BROWSER.md install docs with auto-load as primary path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /connect-chrome skill — one command to launch Chrome with Side Panel New skill that runs $B connect, verifies the connection, guides the user to open the Side Panel, and demos the live activity feed. Extension auto-loads via --load-extension so no manual chrome://extensions install needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use launchPersistentContext for Chrome extension loading Playwright's chromium.launch() silently ignores --load-extension. Switch to launchPersistentContext with ignoreDefaultArgs to remove --disable-extensions flag. Use bundled Chromium (real Chrome blocks unpacked extensions). Fixed port 34567 for CDP mode so the extension auto-connects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sync extension to DESIGN.md — amber accent, zinc neutrals, grain texture Import design system from gstack-website. Update all extension colors: green (#4ade80) → amber (#F59E0B/#FBBF24), zinc gray neutrals, grain texture overlay. Regenerate icons as amber "G" monogram on dark background. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar chat with Claude Code — icon opens side panel directly Replace popup flyout with direct side panel open on icon click. Primary UI is now a chat interface that sends messages to Claude Code via file queue. Activity/Refs tabs moved behind a debug toggle in the footer. Command bar with history, auto-poll for responses, amber design system. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar agent — Claude-powered chat backend via file queue Add /sidebar-command, /sidebar-response, and /sidebar-chat endpoints to the browse server. sidebar-agent.ts watches the command queue file, spawns claude -p with browse context for each message, and streams responses back to the sidebar chat. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove duplicate gstack pill overlay, hide crash restore bubble The addInitScript indicator and the extension's content script were both injecting bottom-right pills, causing duplicates. Remove the pill from addInitScript (extension handles it). Replace --restore-last-session with --hide-crash-restore-bubble to suppress the "Chromium didn't shut down correctly" dialog. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: state file authority — CDP server cannot be silently replaced Hardens the connect/disconnect lifecycle: - ensureServer() refuses to auto-start headless when CDP server is alive - $B connect does full cleanup: SIGTERM → 2s → SIGKILL, profile locks, state - shutdown() cleans Chromium SingletonLock/Socket/Cookie files - uncaughtException/unhandledRejection handlers do emergency cleanup This prevents the bug where a headless server overwrites the CDP server's state file, causing $B commands to hit the wrong browser. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar agent streaming events + session state management Enhance sidebar-agent.ts with: - Live streaming of claude -p events (tool_use, text, result) to sidebar - Session state file for BROWSE_STATE_FILE propagation to claude subprocess - Improved logging (stderr, exit codes, event types) - stdin.end() to prevent claude waiting for input - summarizeToolInput() with path shortening for compact sidebar display Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar chat UI — streaming events, agent status, reconnect retry Sidebar panel improvements: - Chat tab renders streaming agent events (tool_use, text, result) - Thinking dots animation while agent processes - Agent error display with styled error blocks - tryConnect() with 2s retry loop for initial connection - Debug tabs (Activity/Refs) hidden behind gear toggle - Clear chat button - Compact tool call display with path shortening Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: server-integrated sidebar agent with sessions and message queue Move the sidebar agent from a separate bun process into server.ts: - Agent spawns claude -p directly when messages arrive via /sidebar-command - In-memory chat buffer backed by per-session chat.jsonl on disk - Session manager: create, load, persist, list sessions - Message queue (cap 5) with agent status tracking (idle/processing/hung) - Stop/kill endpoints with queue dismiss support - /health now returns agent status + session info - All sidebar endpoints require Bearer auth - Agent killed on server shutdown - 120s timeout detects hung claude processes Eliminates: file-queue polling, separate sidebar-agent.ts process, stale auth tokens, state file conflicts between processes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: extension auth + token flow for server-integrated agent Update Chrome extension to use Bearer auth on all sidebar endpoints: - background.js captures auth token from /health, exposes via getToken msg - background.js sets openPanelOnActionClick for direct side panel access - sidepanel.js gets token from background, sends in all fetch headers - Health broadcasts include token so sidebar auto-authenticates - Removes popup from manifest — icon click opens side panel directly Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: self-healing sidebar — reconnect banner, state machine, copy button Sidebar UI now handles disconnection gracefully: - Connection state machine: connected → reconnecting → dead - Amber pulsing banner during reconnect (2s retry, 30 attempts) - Red "Server offline" banner with Reconnect + Copy /connect-chrome buttons - Green "Reconnected" toast that fades after 3s on successful reconnect - Copy button lets user paste /connect-chrome into any Claude Code session Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: crash handling — save session, kill agent, distinct exit codes Hardened shutdown/crash behavior: - Browser disconnect exits with code 2 (distinct from crash code 1) - emergencyCleanup kills agent subprocess and saves session state - Clean shutdown saves session before exit (chat history persists) - Clear user message on browser disconnect: "Run $B connect to reconnect" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: worktree-per-session isolation for sidebar agent Each sidebar session gets an isolated git worktree so the agent's file operations don't conflict with the user's working directory: - createWorktree() creates detached HEAD worktree in ~/.gstack/worktrees/ - Falls back to main cwd for non-git repos or on creation failure - Handles collision cleanup from prior crashes - removeWorktree() cleans up on session switch and shutdown - worktreePath persisted in session.json Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(qa): ISSUE-001 — disconnect blocked by CDP guard in ensureServer $B disconnect was routed through ensureServer() which refused to start a headless server when a CDP state file existed. Disconnect is now handled before ensureServer() (like connect), with force-kill + cleanup fallback when the CDP server is unresponsive. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve claude binary path for daemon-spawned agent The browse server runs as a daemon and may not inherit the user's shell PATH. Add findClaudeBin() that checks ~/.local/bin/claude (standard install location), which claude, and common system paths. Shows a clear error in the sidebar chat if claude CLI is not found. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve claude symlinks + check Conductor bundled binary posix_spawn fails on symlinks in compiled bun binaries. Now: - Checks Conductor app's bundled binary first (not a symlink) - Scans ~/.local/share/claude/versions/ for direct versioned binaries - Uses fs.realpathSync() to resolve symlinks before spawning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: compiled bun binary cannot posix_spawn — use external agent process Compiled bun binaries fail posix_spawn on ALL executables (even /bin/bash). The server now writes to an agent queue file, and a separate non-compiled bun process (sidebar-agent.ts) reads the queue, spawns claude, and POSTs events back via /sidebar-agent/event. Changes: - server.ts: spawnClaude writes to queue file instead of spawning directly - server.ts: new /sidebar-agent/event endpoint for agent → server relay - server.ts: fix result event field name (event.text vs event.result) - sidebar-agent.ts: rewritten to poll queue file, relay events via HTTP - cli.ts: $B connect auto-starts sidebar-agent as non-compiled bun process Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: loading spinner on sidebar open while connecting to server Shows an amber spinner with "Connecting..." when the sidebar first opens, replacing the empty state. After the first successful /sidebar-chat poll: - If chat history exists: renders it immediately - If no history: shows the welcome message Prevents the jarring empty-then-populated flash on sidebar open. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: zero-friction side panel — auto-open on install, pill is clickable Three changes to eliminate manual side panel setup: - Auto-open side panel on extension install/update (onInstalled listener) - gstack pill (bottom-right) is now clickable — opens the side panel - Pill has pointer-events: auto so clicks always register (was: none) User no longer needs to find the puzzle piece icon, pin the extension, or know the side panel exists. It opens automatically on first launch and can be re-opened by clicking the floating gstack pill. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: kill CDP naming, delete chrome-launcher.ts dead code The connectCDP() method and connectionMode: 'cdp' naming was a legacy artifact — real Chrome was tried but failed (silently blocks --load-extension), so the implementation already used Playwright's bundled Chromium via launchPersistentContext(). The naming was misleading. Changes: - Delete chrome-launcher.ts (361 LOC) — only import was in unreachable attemptReconnect() method - Delete dead attemptReconnect() and reconnecting field - Delete preExistingTabIds (was for protecting real Chrome tabs we never connect to) - Rename connectCDP() → launchHeaded() - Rename connectionMode: 'cdp' → 'headed' across all files - Replace BROWSE_CDP_URL/BROWSE_CDP_PORT env vars with BROWSE_HEADED=1 - Regenerate SKILL.md files for updated command descriptions - Move BrowserManager unit tests to browser-manager-unit.test.ts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: converge handoff into connect — extension loads on handoff Handoff now uses launchPersistentContext() with extension auto-loading, same as the connect/launchHeaded() path. This means when the agent gets stuck (2FA, CAPTCHA) and hands off to the user, the Chrome extension + side panel are available automatically. Before: handoff used chromium.launch() + newContext() — no extension After: handoff uses chromium.launchPersistentContext() — extension loads Also sets connectionMode to 'headed' and disables dialog auto-accept on handoff, matching connect behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: gate sidebar chat behind --chat flag $B connect (default): headed Chromium + extension with Activity + Refs tabs only. No separate agent spawned. Clean, no confusion. $B connect --chat: same + Chat tab with standalone claude -p agent. Shows experimental banner: "Standalone mode — this is a separate agent from your workspace." Implementation: - cli.ts: parse --chat, set BROWSE_SIDEBAR_CHAT env, conditionally spawn sidebar-agent - server.ts: gate /sidebar-* routes behind chatEnabled, return 403 when disabled, include chatEnabled in /health response - sidepanel.js: applyChatEnabled() hides/shows Chat tab + banner - background.js: forward chatEnabled from health response - sidepanel.html/css: experimental banner with amber styling Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: file drop relay + $B inbox command Sidebar agent now writes structured messages to .context/sidebar-inbox/ when processing user input. The workspace agent can read these via $B inbox to see what the user reported from the browser. File drop format: .context/sidebar-inbox/{timestamp}-observation.json { type, timestamp, page: {url}, userMessage, sidebarSessionId } Atomic writes (tmp + rename) prevent partial reads. $B inbox --clear removes messages after display. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: $B watch — passive observation mode Claude enters read-only mode and captures periodic snapshots (every 5s) while the user browses. Mutation commands (click, fill, etc.) are blocked during watch. $B watch stop exits and returns a summary with the last snapshot. Requires headed mode ($B connect). This is the inverse of the scout pattern — the workspace agent watches through the browser instead of the sidebar relaying to it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add coverage for sidebar-agent, file-drop, and watch mode 33 new tests covering: - Sidebar agent queue parsing (valid/malformed/empty JSONL) - writeToInbox file drop (directory creation, atomic writes, JSON format) - Inbox command (display, sorting, --clear, malformed file handling) - Watch mode state machine (start/stop cycles, snapshots, duration) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: TODOS cleanup + Chrome vs Chromium exploration doc - Update TODOS.md: mark CDP mode, $B watch, sidebar scout as SHIPPED - Delete dead "cross-platform CDP browser discovery" TODO - Rename dependencies from "CDP connect" to "headed mode" - Add docs/designs/CHROME_VS_CHROMIUM_EXPLORATION.md memorializing the architecture exploration and decision to use Playwright Chromium Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Conductor Chrome sidebar integration design doc Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sidebar-agent validates cwd before spawning claude The queue entry may reference a worktree that was cleaned up between sessions. Now falls back to process.cwd() if the path doesn't exist, preventing silent spawn failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: gen-skill-docs resolver merge + preamble tier gate + plan file discovery The local RESOLVERS record in gen-skill-docs.ts was shadowing the imported canonical resolvers, causing stale test coverage and preamble generators to be used instead of the authoritative versions in resolvers/. Changes: - Merge imported RESOLVERS with local overrides (spread + override pattern) - Fix preamble tier gate: tier 1 skills no longer get AskUserQuestion format - Make plan file discovery host-agnostic (search multiple plan dirs) - Add missing E2E tier entries for ship/review plan completion tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: ungate sidebar agent + raise timeout to 5 minutes (v0.12.0) Sidebar chat is now always available in headed mode — no --chat flag needed. Agent tasks get 5 minutes instead of 2, enabling multi-page workflows like navigating directories and filling forms across pages. Changes: - cli.ts: remove --chat flag, always set BROWSE_SIDEBAR_CHAT=1, always spawn agent - server.ts: remove chatEnabled gate (403 response), raise AGENT_TIMEOUT_MS to 300s - sidebar-agent.ts: raise child process timeout from 120s to 300s Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: headed mode + sidebar agent documentation (v0.12.0) - README: sidebar agent section, personal automation example (school parent portal), two auth paths (manual login + cookie import), DevTools MCP mention - BROWSER.md: sidebar agent section with usage, timeout, session isolation, authentication, and random delay documentation - connect-chrome template: add sidebar chat onboarding step - CHANGELOG: v0.12.0 entry covering headed mode, sidebar agent, extension - VERSION: bump to 0.12.0.0 - TODOS: Chrome DevTools MCP integration as P0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files Generated from updated templates + resolver merge. Key changes: - Tier 1 skills no longer include AskUserQuestion format section - Ship/review skills now include coverage gate with thresholds - Connect-chrome skill includes sidebar chat onboarding step - Plan file discovery uses host-agnostic paths Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate Codex connect-chrome skill Updated preamble with proactive prompt and sidebar chat onboarding step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: network idle, state persistence, iframe support, chain pipe format (v0.12.1.0) (#516) * feat: network idle detection + chain pipe format - Upgrade click/fill/select from domcontentloaded to networkidle wait (2s timeout, best-effort). Catches XHR/fetch triggered by interactions. - Add pipe-delimited format to chain as JSON fallback: $B chain 'goto url | click @e5 | snapshot -ic' - Add post-loop networkidle wait in chain when last command was a write. - Frame-aware: commands use target (getActiveFrameOrPage) for locator ops, page-only ops (goto/back/forward/reload) guard against frame context. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: $B state save/load + $B frame — new browse commands - state save/load: persist cookies + URLs to .gstack/browse-states/{name}.json File perms 0o600, name sanitized to [a-zA-Z0-9_-]. V1 skips localStorage (breaks on load-before-navigate). Load replaces session via closeAllPages(). - frame: switch command context to iframe via CSS selector, @ref, --name, or --url. 'frame main' returns to main frame. Execution target abstraction (getActiveFrameOrPage) across read-commands, snapshot, and write-commands. - Frame context cleared on tab switch, navigation, resume, and handoff. - Snapshot shows [Context: iframe src="..."] header when in frame. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add tests for network idle, chain pipe format, state, and frame - Network idle: click on fetch button waits for XHR, static click is fast - Chain pipe: pipe-delimited commands, quoted args, JSON still works - State: save/load round-trip, name sanitization, missing state error - Frame: switch to iframe + back, snapshot context header, fill in frame, goto-in-frame guard, usage error New fixtures: network-idle.html (fetch + static buttons), iframe.html (srcdoc) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: review fixes — iframe ref scoping, detached frame recovery, state validation - snapshot.ts: ref locators, cursor-interactive scan, and cursor locator now use target (frame-aware) instead of page — fixes @ref clicking in iframes - browser-manager.ts: getActiveFrameOrPage auto-recovers from detached frames via isDetached() check - meta-commands.ts: state load resets activeFrame, elementHandle disposed after contentFrame(), state file schema validation (cookies + pages arrays), filter empty pipe segments in chain tokenizer - write-commands.ts: upload command uses target.locator() for frame support Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files + rebuild binary Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.12.1.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
cf3582c637 |
fix: community security + stability fixes (wave 1) (#325)
* feat: add /cso skill — OWASP Top 10 + STRIDE security audit * fix: harden gstack-slug against shell injection via eval Whitelist safe characters (a-zA-Z0-9._-) in SLUG and BRANCH output to prevent shell metacharacter injection when used with eval. Only affects self-hosted git servers with lax naming rules — GitHub and GitLab enforce safe characters already. Defense-in-depth. * fix(security): sanitize gstack-slug output against shell injection The gstack-slug script is consumed via eval $(gstack-slug) throughout skill templates. If a git remote URL contains shell metacharacters like $(), backticks, or semicolons, they would be executed by eval. Fix: strip all characters except [a-zA-Z0-9._-] from both SLUG and BRANCH before output. This preserves normal values while neutralizing any injection payload in malicious remote URLs. Before: eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → executes rm After: eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → SLUG=foo-barrm-rf- * fix(security): redact sensitive values in storage command output The browse `storage` command dumps all localStorage and sessionStorage as JSON. This can expose tokens, API keys, JWTs, and session credentials in QA reports and agent transcripts. Fix: redact values where the key matches sensitive patterns (token, secret, key, password, auth, jwt, csrf) or the value starts with known credential prefixes (eyJ for JWT, sk- for Stripe, ghp_ for GitHub, etc.). Redacted values show length to aid debugging: [REDACTED — 128 chars] * fix(browse): kill old server before restart to prevent orphaned chromium processes When the health check fails or the server connection drops, `ensureServer()` and `sendCommand()` would call `startServer()` without first killing the previous server process. This left orphaned `chrome-headless-shell` renderer processes running at ~120% CPU each. After several reconnect cycles (e.g. pages that crash during hydration or trigger hard navigations via `window.location.href`), dozens of zombie chromium processes accumulate and exhaust system resources. Fix: call `killServer()` on the stale PID before spawning a new server in both the `ensureServer()` unhealthy path and the `sendCommand()` connection- lost retry path. Fixes #294 * Fix YAML linter error: nested mapping in compact sequence entries Having "Run: bun" inside a plain scalar is not allowed per YAML spec which states: Plain scalars must never contain the “: ” and “ #” character combinations. This simple fix switches to block scalars (|) to eliminate the ambiguity without changing runtime behavior. * fix(security): add Azure metadata endpoint to SSRF blocklist Add metadata.azure.internal to BLOCKED_METADATA_HOSTS alongside the existing AWS/GCP endpoints. Closes the coverage gap identified in #125. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add coverage for storage redaction Test key-based redaction (auth_token, api_key), value-based redaction (JWT prefix, GitHub PAT prefix), pass-through for normal keys, and length preservation in redacted output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add community PR triage process to CONTRIBUTING.md Document the wave-based PR triage pattern used for batching community contributions. References PR #205 (v0.8.3) as the original example. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: adjust test key names to avoid redaction pattern collision Rename testKey→testData and normalKey→displayName in storage tests to avoid triggering #238's SENSITIVE_KEY regex (which matches 'key'). Also generate Codex variant of /cso skill. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.9.10.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: zero-noise /cso security audits with FP filtering (v0.11.0.0) Absorb Anthropic's security-review false positive filtering into /cso: - 17 hard exclusions (DOS, test files, log spoofing, SSRF path-only, regex injection, race conditions unless concrete, etc.) - 9 precedents (React XSS-safe, env vars trusted, client-side code doesn't need auth, shell scripts need concrete untrusted input path) - 8/10 confidence gate — below threshold = don't report - Independent sub-agent verification for each finding - Exploit scenario requirement per finding - Framework-aware analysis (Rails CSRF, React escaping, Angular sanitization) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: consolidate CHANGELOG — merge /cso launch + community wave into v0.11.0.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: rewrite README — lead with Karpathy quote, cut LinkedIn phrases, add /cso Opens with the revolution (Karpathy, Steinberger/OpenClaw), keeps credentials and LOC numbers, cuts filler phrases, adds hater bait, restores hiring block, removes bloated "What's new" section, adds /cso to skills table and install. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(cso): adversarial review fixes — FP filtering, prompt injection, language coverage - Exclusion #10: test files must verify not imported by non-test code - Exclusion #13: distinguish user-message AI input from system-prompt injection - Exclusion #14: ReDoS in user-input regex IS a real CVE class, don't exclude - Add anti-manipulation rule: ignore audit-influencing instructions in codebase - Fix confidence gate: remove contradictory 7-8 tier, hard cutoff at 8 - Fix verifier anchoring: send only file+line, not category/description - Add Go, PHP, Java, C#, Kotlin to grep patterns (was 4 languages, now 8) - Add GraphQL, gRPC, WebSocket endpoint detection to attack surface mapping Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(docs): correct skill counts, add /autoplan to README tables Skill count was wrong in 3 places (said 19+7=26, said 25, actual is 28). Added /autoplan to specialist table. Fixed troubleshooting skills list to include all skills added since v0.7.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): DNS rebinding protection for SSRF blocklist validateNavigationUrl is now async — resolves hostname to IP and checks against blocked metadata IPs. Prevents DNS rebinding where evil.com initially resolves to a safe IP, then switches to 169.254.169.254. All callers updated to await. Tests updated for async assertions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): lockfile prevents concurrent server start races Adds exclusive lockfile (O_CREAT|O_EXCL) around ensureServer to prevent TOCTOU race where two CLI invocations could both kill the old server and start new ones, leaving an orphaned chromium process. Second caller now waits for the first to finish starting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): improve storage redaction — word-boundary keys + more value prefixes Key regex: use underscore/dot/hyphen boundaries instead of \b (which treats _ as word char). Now correctly redacts auth_token, session_token while skipping keyboardShortcuts, monkeyPatch, primaryKey. Value regex: add AWS (AKIA), Stripe (sk_live_, pk_live_), Anthropic (sk-ant-), Google (AIza), Sendgrid (SG.), Supabase (sbp_) prefixes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: migrate all remaining eval callers to source, fix stale CHANGELOG claim 5 templates and 2 bin scripts still used eval $(gstack-slug). All now use source <(gstack-slug). Updated gstack-slug comment to match. Fixed v0.8.3 CHANGELOG entry that falsely claimed eval was fully eliminated — it was the output sanitization that made it safe, not a calling convention change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(docs): add /autoplan to install instructions, regen skill docs The install instruction blocks and troubleshooting section were missing /autoplan. All three skill list locations now include the complete 28-skill set. Regenerated codex/agents SKILL.md files to match template changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.11.0.0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs(cso): add disclaimer — not a substitute for professional security audits LLMs can miss subtle vulns and produce false negatives. For production systems with sensitive data, hire a real firm. /cso is a first pass, not your only line of defense. Disclaimer appended to every report. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Arun Kumar Thiagarajan <arunkt.bm14@gmail.com> Co-authored-by: Tyrone Robb <tyrone.robb@icloud.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Orkun Duman <orkun1675@gmail.com> |
||
|
|
d7c732b282 |
fix: Windows support — Node.js server fallback for Playwright (#255)
* fix: Windows support — Node.js server fallback for Playwright Setup hangs on Windows 11 because Bun's child_process can't handle Playwright's --remote-debugging-pipe (fd 3/4 pipe handles). Fall back to Node.js on Windows for both the setup verification and server runtime. macOS/Linux completely unaffected — all Windows code behind IS_WINDOWS / process.platform === 'win32' guards. Based on community PR #194 by @sozairali. Fixed sed -i portability (perl -pi -e) in build-node-server.sh for macOS compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: cross-platform path handling for Windows compatibility Replace hardcoded '/tmp' and 'dir + "/"' path checks with platform-aware constants from new platform.ts module. On macOS/Linux this evaluates identically ('/tmp', '/'); on Windows it uses os.tmpdir() and path.sep. Zero behavior change on Unix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add tests for Windows polyfill, platform constants, and Node server resolution Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: Windows support in README + CHANGELOG (v0.9.1.1) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.9.3.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
c0f3c3a91a |
fix: security hardening + issue triage (v0.8.3) (#205)
* fix: check for bun before running setup (#147) Users without bun installed got a cryptic "command not found" error. Now prints a clear message with install instructions. Closes #147 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: block SSRF via URL validation in browse commands (#17) Adds validateNavigationUrl() that blocks non-HTTP(S) schemes (file://, javascript:, data:) and cloud metadata endpoints (169.254.169.254, metadata.google.internal). Applied to goto, diff, and newTab commands. Localhost and private IPs remain allowed for local dev QA. Closes #17 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: replace eval $(gstack-slug) with source <(...) (#133) Eliminates unnecessary use of eval across all skill templates and generated files. source <(...) has identical behavior without the shell injection surface. Also hardens gstack-diff-scope usage. Closes #133 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: rename /debug to /investigate to avoid Claude Code conflict (#190) Claude Code has a built-in /debug command that shadows the gstack skill. Renaming to /investigate which better reflects the systematic root-cause investigation methodology. Closes #190 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add unit tests for path validation helpers validateOutputPath() and validateReadPath() are security-critical functions with zero test coverage. Adds 14 tests covering safe paths, traversal attacks, and prefix collision edge cases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.8.3) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update /debug → /investigate references in docs CLAUDE.md, README.md, and docs/skills.md still referenced the old /debug skill name after the rename. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden URL validation against hostname bypasses (Codex P1) Codex review found that metadata IPs could be reached via hex (0xA9FEA9FE), decimal (2852039166), octal, trailing dot, and IPv6 bracket forms. Now normalizes hostnames before checking the blocklist and probes numeric IP representations via URL constructor. Also moves URL validation before page allocation in newTab() to prevent zombie tabs on rejection (Codex P3). 5 new test cases for bypass variants. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
2d97ab9931 |
feat: browse handoff — headless-to-headed browser switching (v0.7.4) (#201)
* feat: browse handoff — headless-to-headed browser switching Add `handoff` and `resume` commands that let users take over a visible Chrome when the headless browser gets stuck (CAPTCHAs, auth walls, MFA). Architecture: launch-first-close-second for safe rollback. State transfer via extracted saveState()/restoreState() helpers (DRY with recreateContext). Auto-handoff hint after 3 consecutive command failures. * test: handoff unit + integration tests (15 tests) Covers saveState/restoreState, failure tracking, edge cases (already headed, resume without handoff), and full integration flow with cookie and tab preservation across headless-to-headed switch. * docs: handoff section in browse template + TODOS update Add User Handoff section to browse/SKILL.md.tmpl with usage examples. Update State Persistence TODO noting saveState/restoreState reusability. * chore: bump version and changelog (v0.7.4) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
1a100a2a23 |
fix: eliminate duplicate command sets in chain, improve flush perf and type safety
- Remove duplicate CHAIN_READ/CHAIN_WRITE/CHAIN_META sets from meta-commands.ts and import from commands.ts (single source of truth). The duplicated sets would silently fail to route new commands added to commands.ts. - Replace read+concat+write log flush with fs.appendFileSync — O(new entries) instead of O(total log size) per flush cycle. - Replace `any` types for contextOptions with Playwright's BrowserContextOptions and add proper types for storage state in recreateContext(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
f3ee0ee28a |
feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) (#83)
* feat: browser ref staleness detection via async count() validation resolveRef() now checks element count to detect stale refs after page mutations (e.g. SPA navigation). RefEntry stores role+name metadata for better diagnostics. 3 new snapshot tests for staleness detection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: qa-only skill, qa fix loop, plan-to-QA artifact flow Add /qa-only (report-only, Edit tool blocked), restructure /qa with find-fix-verify cycle, add {{QA_METHODOLOGY}} DRY placeholder for shared methodology. /plan-eng-review now writes test-plan artifacts to ~/.gstack/projects/<slug>/ for QA consumption. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: eval efficiency metrics — turns, duration, commentary across all surfaces Add generateCommentary() for natural-language delta interpretation, per-test turns/duration in comparison and summary output, judgePassed unit tests, 3 new E2E tests (qa-only, qa fix loop, plan artifact). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version and changelog (v0.4.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update ARCHITECTURE, BROWSER, CONTRIBUTING, README for v0.4.0 - ARCHITECTURE: add ref staleness detection section, update RefEntry type - BROWSER: add ref staleness paragraph to snapshot system docs - CONTRIBUTING: update eval tool descriptions with commentary feature - README: fix missing qa-only in project-local uninstall command Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add user-facing benefit descriptions to v0.4.0 changelog Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
2aa745cb0e |
feat: screenshot element/region clipping (v0.3.7) (#56)
* feat: screenshot element/region clipping (--clip, --viewport, CSS/@ref)
Add element crop (CSS selector or @ref), region clip (--clip x,y,w,h),
and viewport-only (--viewport) modes to the screenshot command. Uses
Playwright's native locator.screenshot() and page.screenshot({ clip }).
Full page remains the default. Includes 10 new tests covering all modes
and error paths.
* chore: bump version and changelog (v0.3.7)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add screenshot modes to BROWSER.md command reference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
f7b95329c1 |
feat: Phase 3.5 — cookie import, QA testing, team retro (v0.3.1) (#29)
* Phase 2: Enhanced browser — dialog handling, upload, state checks, snapshots - CircularBuffer O(1) ring buffer for console/network/dialog (was O(n) array+shift) - Async buffer flush with Bun.write() (was appendFileSync) - Dialog auto-accept/dismiss with buffer + prompt text support - File upload command (upload <sel> <file...>) - Element state checks (is visible/hidden/enabled/disabled/checked/editable/focused) - Annotated screenshots with ref labels overlaid (-a flag) - Snapshot diffing against previous snapshot (-D flag) - Cursor-interactive element scan for non-ARIA clickables (-C flag) - Snapshot scoping depth limit (-d N flag) - Health check with page.evaluate + 2s timeout - Playwright error wrapping — actionable messages for AI agents - Fix useragent — context recreation preserves cookies/storage/URLs - wait --networkidle / --load / --domcontentloaded flags - console --errors filter (error + warning only) - cookie-import <json-file> with auto-fill domain from page URL - 166 integration tests (was ~63) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Phase 2: Rewrite SKILL.md as QA playbook + command reference Reorient SKILL.md files from raw command reference to QA-first playbook with 10 workflow patterns (test user flows, verify deployments, dogfood features, responsive layouts, file upload, forms, dialogs, compare pages). Compact command reference tables at the bottom. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Phase 3: /qa skill — systematic QA testing with health scores New /qa skill for systematic web app QA testing. Three modes: - full: 5-10 documented issues with screenshots and repro steps - quick: 30-second smoke test with health score - regression: compare against saved baseline Includes issue taxonomy (7 categories, 4 severity levels), structured report template, health score rubric (weighted across 7 categories), framework detection guidance (Next.js, Rails, WordPress, SPA). Also adds browse/bin/find-browse (DRY binary discovery using git rev-parse), .gstack/ to .gitignore, and updated TODO roadmap. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Bump to v0.3.0 — Phase 2 + Phase 3 changelog Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: cookie-import-browser — Chromium cookie decryption module + tests Pure logic module for reading and decrypting cookies from macOS Chromium browsers (Comet, Chrome, Arc, Brave, Edge). Supports v10 AES-128-CBC encryption with macOS Keychain access, PBKDF2 key derivation, and per-browser key caching. 18 unit tests with encrypted cookie fixtures. * feat: cookie picker web UI + route handler Two-panel dark-theme picker served from the browse server. Left panel shows source browser domains with search and import buttons. Right panel shows imported domains with trash buttons. No cookie values exposed. 6 API endpoints, importedDomains Set tracking, inline clearCookies. * feat: wire cookie-import-browser into browse server Add cookie-picker route dispatch (no auth, localhost-only), add cookie-import-browser to WRITE_COMMANDS and CHAIN_WRITE, add serverPort property to BrowserManager, add write command with two modes (picker UI vs --domain direct import), update CLI help text. * chore: /setup-browser-cookies skill + docs (Phase 3.5) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version and changelog (v0.3.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * security: redact sensitive values from command output (PR #21) type no longer echoes text (reports character count), cookie redacts value with ****, header redacts Authorization/Cookie/X-API-Key/X-Auth-Token, storage set drops value, forms redacts password fields. Prevents secrets from persisting in LLM transcripts. 7 new tests. Credit: fredluz (PR #21) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * security: path traversal prevention for screenshot/pdf/eval (PR #26) Add validateOutputPath() for screenshot/pdf/responsive (restricts to /tmp and cwd) and validateReadPath() for eval (blocks .. sequences and absolute paths outside safe dirs). 7 new tests. Credit: Jah-yee (PR #26) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: auto-install Playwright Chromium in setup (PR #22) Setup now verifies Playwright can launch Chromium, and auto-installs it via `bunx playwright install chromium` if missing. Exits non-zero if build or Chromium launch fails. Credit: AkbarDevop (PR #22) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * security: fix path validation bypass, CORS restriction, cookie-import path check - startsWith('/tmp') matched '/tmpevil' — now requires trailing slash - CORS Access-Control-Allow-Origin changed from * to http://127.0.0.1:<port> - cookie-import now validates file paths (was missing validateReadPath) - 3 new tests for prefix collision and cookie-import path traversal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address review informational issues + add regression tests - Add cookie-import to CHAIN_WRITE set for chain command routing - Add path validation to snapshot -a -o output path - Fix package.json version to match 0.3.1 - Use crypto.randomUUID() for temp DB paths (unpredictable filenames) - Add regression tests for chain cookie-import and snapshot path validation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add /qa, /setup-browser-cookies to README + update BROWSER.md - Add /qa and /setup-browser-cookies to skills table, install/update/uninstall blurbs - Add dedicated README sections for both new skills with usage examples - Update demo workflow to show cookie import → QA → browse flow - Update BROWSER.md: cookie import commands, new source files, test count (203) - Update skill count from 6 to 8 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: team-aware /retro v2.0 — per-person praise and growth opportunities - Identify current user via git config, orient narrative as "you" vs teammates - Add per-author metrics: commits, LOC, focus areas, commit type mix, sessions - New "Your Week" section with personal deep-dive for whoever runs the command - New "Team Breakdown" with per-person praise and growth opportunities - Track AI-assisted commits via Co-Authored-By trailers - Personal + team shipping streaks - Tone: praise like a 1:1, growth like investment advice, never compare negatively Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add Conductor parallel sessions section to README Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
3d901066cd |
Initial release — gstack v0.0.1
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |