gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-05-01 19:25:10 +02:00

Author	SHA1	Message	Date
Garry Tan	c15b805cd8	feat(browse): Puppeteer parity — load-html, screenshot --selector, viewport --scale, file:// (v1.1.0.0) (#1062 ) * feat(browse): TabSession loadedHtml + command aliases + DX polish primitives Adds the foundation layer for Puppeteer-parity features: - TabSession.loadedHtml + setTabContent/getLoadedHtml/clearLoadedHtml — enables load-html content to survive context recreation (viewport --scale) via in-memory replay. ASCII lifecycle diagram in the source explains the clear-before-navigation contract. - COMMAND_ALIASES + canonicalizeCommand() helper — single source of truth for name aliases (setcontent / set-content / setContent → load-html), consumed by server dispatch and chain prevalidation. - buildUnknownCommandError() pure function — rich error messages with Levenshtein-based "Did you mean" suggestions (distance ≤ 2, input length ≥ 4 to skip 2-letter noise) and NEW_IN_VERSION upgrade hints. - load-html registered in WRITE_COMMANDS + SCOPE_WRITE so scoped write tokens can use it. - screenshot and viewport descriptions updated for upcoming flags. - New browse/test/dx-polish.test.ts (15 tests): alias canonicalization, Levenshtein threshold + alphabetical tiebreak, short-input guard, NEW_IN_VERSION upgrade hint, alias + scope integration invariants. No consumers yet — pure additive foundation. Safe to bisect on its own. * feat(browse): accept file:// in goto with smart cwd/home-relative parsing Extends validateNavigationUrl to accept file:// URLs scoped to safe dirs (cwd + TEMP_DIR) via the existing validateReadPath policy. The workhorse is a new normalizeFileUrl() helper that handles non-standard relative forms BEFORE the WHATWG URL parser sees them: file:///abs/path.html → unchanged file://./docs/page.html → file://<cwd>/docs/page.html file://~/Documents/page.html → file://<HOME>/Documents/page.html file://docs/page.html → file://<cwd>/docs/page.html file://localhost/abs/path → unchanged file://host.example.com/... → rejected (UNC/network) file:// and file:/// → rejected (would list a directory) Host heuristic rejects segments with '.', ':', '\\', '%', IPv6 brackets, or Windows drive-letter patterns — so file://docs.v1/page.html, file://127.0.0.1/x, file://[::1]/x, and file://C:/Users/x are explicit errors. Uses fileURLToPath() + pathToFileURL() from node:url (never string-concat) so URL escapes like %20 decode correctly and Node rejects encoded-slash traversal (%2F..%2F) outright. Signature change: validateNavigationUrl now returns Promise<string> (the normalized URL) instead of Promise<void>. Existing callers that ignore the return value still compile — they just don't benefit from smart-parsing until updated in follow-up commits. Callers will be migrated in the next few commits (goto, diff, newTab, restoreState). Rewrites the url-validation test file: updates existing tests for the new return type, adds 20+ new tests covering every normalizeFileUrl shape variant, URL-encoding edge cases, and path-traversal rejection. References: codex consult v3 P1 findings on URL parser semantics and fileURLToPath. * feat(browse): BrowserManager deviceScaleFactor + setContent replay + file:// plumbing Three tightly-coupled changes to BrowserManager, all in service of the Puppeteer-parity workflow: 1. deviceScaleFactor + currentViewport tracking. New private fields (default scale=1, viewport=1280x720) + setDeviceScaleFactor(scale, w, h) method. deviceScaleFactor is a context-level Playwright option — changing it requires recreateContext(). The method validates (finite number, 1-3 cap, headed-mode rejected), stores new values, calls recreateContext(), and rolls back the fields on failure so a bad call doesn't leave inconsistent state. Context options at all three sites (launch, recreate happy path, recreate fallback) now honor the stored values instead of hardcoding 1280x720. 2. BrowserState.loadedHtml + loadedHtmlWaitUntil. saveState captures per-tab loadedHtml from the session; restoreState replays it via newSession. setTabContent() — NOT bare page.setContent() — so TabSession.loadedHtml is rehydrated and survives subsequent scale changes. In-memory only, never persisted to disk (HTML may contain secrets or customer data). 3. newTab + restoreState now consume validateNavigationUrl's normalized return value. file://./x, file://~/x, and bare-segment forms now take effect at every navigation site, not just the top-level goto command. Together these enable: load-html → viewport --scale 2 → viewport --scale 1.5 → screenshot, with content surviving both context recreations. Codex v2 P0 flagged that bare page.setContent in restoreState would lose content on the second scale change — this commit implements the rehydration path. References: codex v2 P0 (TabSession rehydration), codex v3 P1 (4-caller return value), plan Feature 3 + Feature 4. * feat(browse): load-html, screenshot --selector, viewport --scale, alias dispatch Wires the new handlers and dispatch logic that the previous commits made possible: write-commands.ts - New 'load-html' case: validateReadPath for safe-dir scoping, stat-based actionable errors (not found, directory, oversize), extension allowlist (.html/.htm/.xhtml/.svg), magic-byte sniff with UTF-8 BOM strip accepting any <[a-zA-Z!?] markup opener (not just <!doctype — bare fragments like <div>...</div> work for setContent), 50MB cap via GSTACK_BROWSE_MAX_HTML_BYTES override, frame-context rejection. Calls session.setTabContent() so replay metadata is rehydrated. - viewport command extended: optional [<WxH>], optional [--scale <n>], scale-only variant reads current size via page.viewportSize(). Invalid scale (NaN, Infinity, empty, out of 1-3) throws with named value. Headed mode rejected explicitly. - clearLoadedHtml() called BEFORE goto/back/forward/reload navigation (not after) so a timed-out goto post-commit doesn't leave stale metadata that could resurrect on a later context recreation. Codex v2 P1 catch. - goto uses validateNavigationUrl's normalized return value. meta-commands.ts - screenshot --selector <css> flag: explicit element-screenshot form. Rejects alongside positional selector (both = error), preserves --clip conflict at line 161, composes with --base64 at lines 168-174. - chain canonicalizes each step with canonicalizeCommand — step shape is now { rawName, name, args } so prevalidation, dispatch, WRITE_COMMANDS.has, watch blocking, and result labels all use canonical names while audit labels show 'rawName→name' when aliased. Codex v3 P2 catch — prior shape only canonicalized at prevalidation and diverged everywhere else. - diff command consumes validateNavigationUrl return value for both URLs. server.ts - Command canonicalization inserted immediately after parse, before scope / watch / tab-ownership / content-wrapping checks. rawCommand preserved for future audit (not wired into audit log in this commit — follow-up). - Unknown-command handler replaced with buildUnknownCommandError() from commands.ts — produces 'Unknown command: X. Did you mean Y?' with optional upgrade hint for NEW_IN_VERSION entries. security-audit-r2.test.ts - Updated chain-loop marker from 'for (const cmd of commands)' to 'for (const c of commands)' to match the new chain step shape. Same isWatching + BLOCKED invariants still asserted. * chore: bump version and changelog (v1.1.0.0) - VERSION: 1.0.0.0 → 1.1.0.0 (MINOR bump — new user-facing commands) - package.json: matching version bump - CHANGELOG.md: new 1.1.0.0 entry describing load-html, screenshot --selector, viewport --scale, file:// support, setContent replay, and DX polish in user voice with a dedicated Security section for file:// safe-dirs policy - browse/SKILL.md.tmpl: adds pattern #12 "Render local HTML", pattern #13 "Retina screenshots", and a full Puppeteer → browse cheatsheet with side-by- side API mapping and a worked tweet-renderer migration example - browse/SKILL.md + SKILL.md: regenerated from templates via `bun run gen:skill-docs` to reflect the new command descriptions Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: pre-landing review fixes (9 findings from specialist + adversarial review) Adversarial review (Claude subagent + Codex) surfaced 9 bugs across CRITICAL/HIGH severity. All fixed: 1. tab-session.ts:setTabContent — state mutation moved AFTER the setContent await. Prior order left phantom HTML in replay metadata if setContent threw (timeout, browser crash), which a later viewport --scale would silently replay. Now loadedHtml is only recorded on successful load. 2. browser-manager.ts:setDeviceScaleFactor — rollback now forces a second recreateContext after restoring the old fields. The fallback path in the original recreateContext builds a blank context using whatever this.deviceScaleFactor/currentViewport hold at that moment (which were the NEW values we were trying to apply). Rolling back the fields without a second recreate left the live context at new-scale while state tracked old-scale. Now: restore fields, force re-recreate with old values, only if that ALSO fails do we return a combined error. 3. commands.ts:buildUnknownCommandError — Levenshtein tiebreak simplified to 'd <= 2 && d < bestDist' (strict less). Candidates are pre-sorted alphabetically, so first equal-distance wins by default. The prior '(d === bestDist && best !== undefined && cand < best)' clause was dead code. 4. tab-session.ts:onMainFrameNavigated — now clears loadedHtml, not just refs + frame. Without this, a user who load-html'd then clicked a link (or had a form submit / JS redirect / OAuth flow) would retain the stale replay metadata. The next viewport --scale would silently revert the tab to the ORIGINAL loaded HTML, losing whatever the post-navigation content was. Silent data corruption. Browser-emitted navigations trigger this path via wirePageEvents. 5. browser-manager.ts:saveState + restoreState — tab ownership now flows through BrowserState.owner. Without this, a scoped agent's viewport --scale would strand them: tab IDs change during recreate, ownership map held stale IDs, owner lookup failed. New IDs had no owner, so writes without tabId were denied (DoS). Worse, if the agent sent a stale tabId the server's swallowed-tab-switch-error path would let the command hit whatever tab was currently active (cross-tab authz bypass). Now: clear ownership before restore, re-add per-tab with new IDs. 6. meta-commands.ts:state load — disk-loaded state.pages is now explicit allowlist (url, isActive, storage:null) instead of object spread. Spreading accepted loadedHtml, loadedHtmlWaitUntil, and owner from a user-writable state file, letting a tampered state.json smuggle HTML past load-html's safe-dirs / extension / magic-byte / 50MB-cap validators, or forge tab ownership. Now stripped at the boundary. 7. url-validation.ts:normalizeFileUrl — preserves query string + fragment across normalization. file://./app.html?route=home#login previously resolved to a filesystem path that URL-encoded '?' as %3F and '#' as %23, or (for absolute forms) pathToFileURL dropped them entirely. SPAs and fixture URLs with query params 404'd or loaded the wrong route. Now: split on ?/# before path resolution, reattach after. 8. url-validation.ts:validateNavigationUrl — reattaches parsed.search + parsed.hash to the normalized file:// URL. Same fix at the main validator for absolute paths that go through fileURLToPath round-trip. 9. server.ts:writeAuditEntry — audit entries now include aliasOf when the user typed an alias ('setcontent' → cmd: 'load-html', aliasOf: 'setcontent'). Previously the isAliased variable was computed but dropped, losing the raw input from the forensic trail. Completes the plan's codex v3 P2 requirement. Also added bm.getCurrentViewport() and switched 'viewport --scale'- without-size to read from it (more reliable than page.viewportSize() on headed/transition contexts). Tests pass: exit 0, no failures. Build clean. * test: integration coverage for load-html, screenshot --selector, viewport --scale, replay, aliases Adds 28 Playwright-integration tests that close the coverage gap flagged by the ship-workflow coverage audit (50% → expected ~80%+). load-html (12 tests): - happy path loads HTML file, page text matches - bare HTML fragments (<div>...</div>) accepted, not just full documents - missing file arg throws usage - non-.html extension rejected by allowlist - /etc/passwd.html rejected by safe-dirs policy - ENOENT path rejected with actionable "not found" error - directory target rejected - binary file (PNG magic bytes) disguised as .html rejected by magic-byte check - UTF-8 BOM stripped before magic-byte check — BOM-prefixed HTML accepted - --wait-until networkidle exercises non-default branch - invalid --wait-until value rejected - unknown flag rejected screenshot --selector (5 tests): - --selector flag captures element, validates Screenshot saved (element) - conflicts with positional selector (both = error) - conflicts with --clip (mutually exclusive) - composes with --base64 (returns data:image/png;base64,...) - missing value throws usage viewport --scale (5 tests): - WxH --scale 2 produces PNG with 2x element dimensions (parses IHDR bytes 16-23) - --scale without WxH keeps current size + applies scale - non-finite value (abc) throws "not a finite number" - out-of-range (4, 0.5) throws "between 1 and 3" - missing value throws setContent replay across context recreation (3 tests): - load-html → viewport --scale 2: content survives (hits setTabContent replay path) - double cycle 2x → 1.5x: content still survives (proves TabSession rehydration) - goto after load-html clears replay: subsequent viewport --scale does NOT resurrect the stale HTML (validates the onMainFrameNavigated fix) Command aliases (2 tests): - setcontent routes to load-html via chain canonicalization - set-content (hyphenated) also routes — both end-to-end through chain dispatch Fixture paths use /tmp (SAFE_DIRECTORIES entry) instead of $TMPDIR which is /var/folders/... on macOS and outside the safe-dirs boundary. Chain result labels use rawName→name format when an alias is resolved (matches the meta-commands.ts chain refactor). Full suite: exit 0, 223/223 pass. * docs: update BROWSER.md + CHANGELOG for v1.1.0.0 BROWSER.md: - Command reference table updated: goto now lists file:// support, load-html added to Navigate row, viewport flagged with --scale option, screenshot row shows --selector + --base64 flags - Screenshot modes table adds the fifth mode (element crop via --selector flag) and notes the tag-selector-not-caught-positionally gotcha - New "Retina screenshots — viewport --scale" subsection explains deviceScaleFactor mechanics, context recreation side effects, and headed-mode rejection - New "Loading local HTML — goto file:// vs load-html" subsection explains the two paths, their tradeoffs (URL state, relative asset resolution), the safe-dirs policy, extension allowlist + magic-byte sniff, 50MB cap, setContent replay across recreateContext, and the alias routing (setcontent → load-html before scope check) CHANGELOG.md (v1.1.0.0 security section expanded, no existing content removed): - State files cannot smuggle HTML or forge tab ownership (allowlist on disk-loaded page fields) - Audit log records aliasOf when a canonical command was reached via an alias (setcontent → load-html) - load-html content clears on real navigations (clicks, form submits, JS redirects) — not just explicit goto. Also notes SPA query/fragment preservation for goto file:// Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 23:25:33 +08:00
Garry Tan	7e96fe299b	fix: security wave 3 — 12 fixes, 7 contributors (v0.16.4.0) (#988 ) * fix(security): validateOutputPath symlink bypass — check file-level symlinks validateOutputPath() previously only resolved symlinks on the parent directory. A symlink at /tmp/evil.png → /etc/crontab passed the parent check (parent is /tmp, which is safe) but the write followed the symlink outside safe dirs. Add lstatSync() check: if the target file exists and is a symlink, resolve through it and verify the real target is within SAFE_DIRECTORIES. ENOENT (file doesn't exist yet) falls through to the existing parent-dir check. Closes #921 Co-Authored-By: Yunsu <Hybirdss@users.noreply.github.com> * fix(security): shell injection in bin/ scripts — use env vars instead of interpolation gstack-settings-hook interpolated $SETTINGS_FILE directly into bun -e double-quoted blocks. A path containing quotes or backticks breaks the JS string context, enabling arbitrary code execution. Replace direct interpolation with environment variables (process.env). Same fix applied to gstack-team-init which had the same pattern. Systematic audit confirmed only these two scripts were vulnerable — all other bin/ scripts already use stdin piping or env vars. Closes #858 Co-Authored-By: Gus <garagon@users.noreply.github.com> * fix(security): cookie-import path validation bypass + hardcoded /tmp Two fixes: 1. cookie-import relative path bypass (#707): path.isAbsolute() gated the entire validation, so relative paths like "sensitive-file.json" bypassed the safe-directory check entirely. Now always resolves to absolute path with realpathSync for symlink resolution, matching validateOutputPath(). 2. Hardcoded /tmp in cookie-import-browser (#708): openDbFromCopy used /tmp directly instead of os.tmpdir(), breaking Windows support. Also adds explicit imports for SAFE_DIRECTORIES and isPathWithin in write-commands.ts (previously resolved implicitly through bundler). Closes #852 Co-Authored-By: Toby Morning <urbantech@users.noreply.github.com> * fix(security): redact form fields with sensitive names, not just type=password Form redaction only applied to type="password" fields. Hidden and text fields named csrf_token, api_key, session_id, etc. were exposed unredacted in LLM context, leaking secrets. Extend redaction to check field name and id against sensitive patterns: token, secret, key, password, credential, auth, jwt, session, csrf, sid, api_key. Uses the same pattern style as SENSITIVE_COOKIE_NAME. Closes #860 Co-Authored-By: Gus <garagon@users.noreply.github.com> * fix(security): restrict session file permissions to owner-only Design session files written to /tmp with default umask (0644) were world-readable on shared systems. Sessions contain design prompts and feedback history. Set mode 0o600 (owner read/write only) on both create and update paths. Closes #859 Co-Authored-By: Gus <garagon@users.noreply.github.com> * fix(security): enforce frozen lockfile during setup bun install without --frozen-lockfile resolves ^semver ranges from npm on every run. If an attacker publishes a compromised compatible version of any dependency, the next ./setup pulls it silently. Add --frozen-lockfile with fallback to plain install (for fresh clones where bun.lock may not exist yet). Matches the pattern already used in the .agents/ generation block (line 237). Closes #614 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * fix: remove duplicate recursive chmod on /tmp in Dockerfile.ci chmod -R 1777 /tmp recursively sets sticky bit on files (no defined behavior), not just the directory. Deduplicate to single chmod 1777 /tmp. Closes #747 Co-Authored-By: Maksim Soltan <Gonzih@users.noreply.github.com> * fix(security): learnings input validation + cross-project trust gate Three fixes to the learnings system: 1. Input validation in gstack-learnings-log: type must be from allowed list, key must be alphanumeric, confidence must be 1-10 integer, source must be from allowed list. Prevents injection via malformed fields. 2. Prompt injection defense: insight field checked against 10 instruction-like patterns (ignore previous, system:, override, etc.). Rejected with clear error message. 3. Cross-project trust gate in gstack-learnings-search: AI-generated learnings from other projects are filtered out. Only user-stated learnings cross project boundaries. Prevents silent prompt injection across codebases. Also adds trusted field (true for user-stated source, false for AI-generated) to enable the trust gate at read time. Closes #841 Co-Authored-By: Ziad Al Sharif <Ziadstr@users.noreply.github.com> * feat(security): track cookie-imported domains and scope cookie imports Foundation for origin-pinned JS execution (#616). Tracks which domains cookies were imported from so the JS/eval commands can verify execution stays within imported origins. Changes: - BrowserManager: new cookieImportedDomains Set with track/get/has methods - cookie-import: tracks imported cookie domains after addCookies - cookie-import-browser: tracks domains on --domain direct import - cookie-import-browser --all: new explicit opt-in for all-domain import (previously implicit behavior, now requires deliberate flag) Closes #615 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * feat(security): pin JS/eval execution to cookie-imported origins When cookies have been imported for specific domains, block JS execution on pages whose origin doesn't match. Prevents the attack chain: 1. Agent imports cookies for github.com 2. Prompt injection navigates to attacker.com 3. Agent runs js document.cookie → exfiltrates github cookies assertJsOriginAllowed() checks the current page hostname against imported cookie domains with subdomain matching (.github.com allows api.github.com). When no cookies are imported, all origins allowed (nothing to protect). about:blank and data: URIs are allowed (no cookies at risk). Depends on #615 (cookie domain tracking). Closes #616 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * feat(security): add persistent command audit log Append-only JSONL audit trail for all browse server commands. Unlike in-memory ring buffers, the audit log persists across restarts and is never truncated. Each entry records: timestamp, command, args (truncated to 200 chars), page origin, duration, status, error (truncated to 300 chars), hasCookies flag, connection mode. All writes are best-effort — audit failures never block command execution. Log stored at ~/.gstack/.browse/browse-audit.jsonl. Closes #617 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * fix(security): block hex-encoded IPv4-mapped IPv6 metadata bypass URL constructor normalizes ::ffff:169.254.169.254 to ::ffff:a9fe:a9fe (hex form), which was not in the blocklist. Similarly, ::169.254.169.254 normalizes to ::a9fe:a9fe. Add both hex-encoded forms to BLOCKED_METADATA_HOSTS so they're caught by the direct hostname check in validateNavigationUrl. Closes #739 Co-Authored-By: Osman Mehmood <mehmoodosman@users.noreply.github.com> * chore: bump version and changelog (v0.16.4.0) Security wave 3: 12 fixes, 7 contributors. Cookie origin pinning, command audit log, domain tracking. Symlink bypass, path validation, shell injection, form redaction, learnings injection, IPv6 SSRF, session permissions, frozen lockfile. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Yunsu <Hybirdss@users.noreply.github.com> Co-authored-by: Gus <garagon@users.noreply.github.com> Co-authored-by: Toby Morning <urbantech@users.noreply.github.com> Co-authored-by: Alberto Martinez <halbert04@users.noreply.github.com> Co-authored-by: Maksim Soltan <Gonzih@users.noreply.github.com> Co-authored-by: Ziad Al Sharif <Ziadstr@users.noreply.github.com> Co-authored-by: Osman Mehmood <mehmoodosman@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 07:49:37 -10:00
Garry Tan	1868636f49	refactor: extract TabSession for per-tab state isolation (v0.15.16.0) (#873 ) * plan: batch command endpoint + multi-tab parallel execution for GStack Browser * refactor: extract TabSession from BrowserManager for per-tab state Move per-tab state (refMap, lastSnapshot, frame) into a new TabSession class. BrowserManager delegates to the active TabSession via getActiveSession(). Zero behavior change — all existing tests pass. This is the foundation for the /batch endpoint: both /command and /batch will use the same handler functions with TabSession, eliminating shared state races during parallel tab execution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: update handler signatures to use TabSession Change handleReadCommand and handleSnapshot to take TabSession instead of BrowserManager. Change handleWriteCommand to take both TabSession (per-tab ops) and BrowserManager (global ops like viewport, headers, dialog). handleMetaCommand keeps BrowserManager for tab management. Tests use thin wrapper functions that bridge the old 3-arg call pattern to the new signatures via bm.getActiveSession(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add POST /batch endpoint for parallel multi-tab execution Execute multiple commands across tabs in a single HTTP request. Commands targeting different tabs run concurrently via Promise.allSettled. Commands targeting the same tab run sequentially within that group. Features: - Batch-safe command subset (text, goto, click, snapshot, screenshot, etc.) - newtab/closetab as special commands within batch - SSE streaming mode (stream: true) for partial results - Per-command error isolation (one tab failing doesn't abort the batch) - Max 50 commands per batch, soft batch-level timeout A 143-page crawl drops from ~45 min (serial HTTP) to ~5 min (20 tabs in parallel, batched commands). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add batch endpoint integration tests 10 tests covering: - Multi-tab parallel execution (goto + text on different tabs) - Same-tab sequential ordering - Per-command error isolation (one tab fails, others succeed) - Page-scoped refs (snapshot refs are per-session, not global) - Per-tab lastSnapshot (snapshot -D with independent baselines) - getSession/getActiveSession API - Batch-safe command subset validation - closeTab via page.close preserves at-least-one-page invariant - Parallel goto on 3 tabs simultaneously Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden codex-review E2E — extract SKILL.md section, bump maxTurns to 25 The test was copying the full 55KB/1075-line codex SKILL.md into the fixture, requiring 8 Read calls just to consume it and exhausting the 15-turn budget before reaching the actual codex review command. Now extracts only the review-relevant section (~6KB/148 lines), reducing Read calls from 8 to 1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: move batch endpoint plan into BROWSER.md as feature documentation The batch endpoint is implemented — document it as an actual feature in BROWSER.md (architecture, API shape, design decisions, usage pattern) and remove the standalone plan file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.16.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: gstack <ship@gstack.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 00:23:36 -07:00
Garry Tan	03973c2fab	fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847 ) * fix(bin): pass search params via env vars (RCE fix) (#819) Replace shell string interpolation with process.env in gstack-learnings-search to prevent arbitrary code execution via crafted learnings entries. Also fixes the CROSS_PROJECT interpolation that the original PR missed. Adds 3 regression tests verifying no shell interpolation remains in the bun -e block. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): add path validation to upload command (#821) Add isPathWithin() and path traversal checks to the upload command, blocking file exfiltration via crafted upload paths. Uses existing SAFE_DIRECTORIES constant instead of a local copy. Adds 3 regression tests. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): symlink resolution in meta-commands validateOutputPath (#820) Add realpathSync to validateOutputPath in meta-commands.ts to catch symlink-based directory escapes in screenshot, pdf, and responsive commands. Resolves SAFE_DIRECTORIES through realpathSync to handle macOS /tmp -> /private/tmp symlinks. Existing path validation tests pass with the hardened implementation. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add uninstall instructions to README (#812) Community PR #812 by @0531Kim. Adds two uninstall paths: the gstack-uninstall script (handles everything) and manual removal steps for when the repo isn't cloned. Includes CLAUDE.md cleanup note and Playwright cache guidance. Co-Authored-By: 0531Kim <0531Kim@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): Windows launcher extraEnv + headed-mode token (#822) Community PR #822 by @pieterklue. Three fixes: 1. Windows launcher now merges extraEnv into spawned server env (was only passing BROWSE_STATE_FILE, dropping all other env vars) 2. Welcome page fallback serves inline HTML instead of about:blank redirect (avoids ERR_UNSAFE_REDIRECT on Windows) 3. /health returns auth token in headed mode even without Origin header (fixes Playwright Chromium extensions that don't send it) Also adds HOME/USERPROFILE fallback for cross-platform compatibility. Co-Authored-By: pieterklue <pieterklue@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): terminate orphan server when parent process exits (#808) Community PR #808 by @mmporong. Passes BROWSE_PARENT_PID to the spawned server process. The server polls every 15s with signal 0 and calls shutdown() if the parent is gone. Prevents orphaned chrome-headless-shell processes when Claude Code sessions exit abnormally. Co-Authored-By: mmporong <mmporong@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): IPv6 ULA blocking, cookie redaction, per-tab cancel, targeted token (#664) Community PR #664 by @mr-k-man (security audit round 1, new parts only). - IPv6 ULA prefix blocking (fc00::/7) in url-validation.ts with false-positive guard for hostnames like fd.example.com - Cookie value redaction for tokens, API keys, JWTs in browse cookies command - Per-tab cancel files in killAgent() replacing broken global kill-signal - design/serve.ts: realpathSync upgrade prevents symlink bypass in /api/reload - extension: targeted getToken handler replaces token-in-health-broadcast - Supabase migration 003: column-level GRANT restricts anon UPDATE scope - Telemetry sync: upsert error logging - 10 new tests for IPv6, cookie redaction, DNS rebinding, path traversal Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): CSS injection guard, timeout clamping, session validation, tests (#806) Community PR #806 by @mr-k-man (security audit round 2, new parts only). - CSS value validation (DANGEROUS_CSS) in cdp-inspector, write-commands, extension inspector - Queue file permissions (0o700/0o600) in cli, server, sidebar-agent - escapeRegExp for frame --url ReDoS fix - Responsive screenshot path validation with validateOutputPath - State load cookie filtering (reject localhost/.internal/metadata cookies) - Session ID format validation in loadSession - /health endpoint: remove currentUrl and currentMessage fields - QueueEntry interface + isValidQueueEntry validator for sidebar-agent - SIGTERM->SIGKILL escalation in timeout handler - Viewport dimension clamping (1-16384), wait timeout clamping (1s-300s) - Cookie domain validation in cookie-import and cookie-import-browser - DocumentFragment-based tab switching (XSS fix in sidepanel) - pollInProgress reentrancy guard for pollChat - toggleClass/injectCSS input validation in extension inspector - Snapshot annotated path validation with realpathSync - 714-line security-audit-r2.test.ts + 33-line learnings-injection.test.ts Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.13.0) Community security wave: 8 PRs from 4 contributors (@garagon, @mr-k-man, @mmporong, @0531Kim, @pieterklue). IPv6 ULA blocking, cookie redaction, per-tab cancel signaling, CSS injection guards, timeout clamping, session validation, DocumentFragment XSS fix, parent process watchdog, uninstall docs, Windows fixes, and 750+ lines of security regression tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: garagon <garagon@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: 0531Kim <0531Kim@users.noreply.github.com> Co-authored-by: pieterklue <pieterklue@users.noreply.github.com> Co-authored-by: mmporong <mmporong@users.noreply.github.com> Co-authored-by: mr-k-man <mr-k-man@users.noreply.github.com>	2026-04-06 00:47:04 -07:00
Garry Tan	3cda8deec9	fix: security audit round 2 (v0.13.4.0) (#640 ) * fix: chrome-cdp localhost-only binding Restrict Chrome CDP to localhost by adding --remote-debugging-address=127.0.0.1 and --remote-allow-origins to prevent network-accessible debugging sessions. Clears 1 Socket anomaly (Chrome CDP session exposure). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: extension sender validation + message type allowlist Add sender.id check and ALLOWED_TYPES allowlist to the Chrome extension's message handler. Defense-in-depth against message spoofing from external extensions or future externally_connectable changes. Clears 2 Socket anomalies (extension permissions). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: checksum-verified bun install Replace unverified curl\|bash bun installation with checksum-verified download-then-execute pattern. The install script is downloaded, sha256 verified against a known hash, then executed. Preserves the Bun-native install path without adding a Node/npm dependency. Clears Snyk W012 + 3 Socket anomalies. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: content trust boundary markers in browse output Wrap page-content commands (text, html, links, forms, accessibility, console, dialog, snapshot) with --- BEGIN/END UNTRUSTED EXTERNAL CONTENT --- markers. Covers direct commands (server.ts), chain sub-commands, and snapshot output (meta-commands.ts). Adds PAGE_CONTENT_COMMANDS set and wrapUntrustedContent() helper in commands.ts (single source of truth, DRY). Expands the SKILL.md trust warning with explicit processing rules for agents. Clears Snyk W011 (third-party content exposure). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden trust boundary markers against escape attacks - Sanitize URLs in markers (remove newlines, cap at 200 chars) to prevent marker injection via history.pushState - Escape marker strings in content (zero-width space) so malicious pages can't forge the END marker to break out of the untrusted block - Wrap resume command snapshot with trust boundary markers - Wrap diff command output with trust boundary markers - Wrap watch stop last snapshot with trust boundary markers Found by cross-model adversarial review (Claude + Codex). * chore: bump version and changelog (v0.13.4.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: gitignore .factory/ and remove from tracking Factory Droid support was removed in this branch. The .factory/ directory was re-added by merging main (which had v0.13.5.0 Factory support). Gitignore it so it stays out. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 22:46:33 -06:00
Garry Tan	7450b5160b	fix: security audit remediation — 12 fixes, 20 tests (v0.13.1.0) (#595 ) * fix: remove auth token from /health, secure extension bootstrap (CRITICAL-02 + HIGH-03) - Remove token from /health response (was leaked to any localhost process) - Write .auth.json to extension dir for Manifest V3 bootstrap - sidebar-agent reads token from state file via BROWSE_STATE_FILE env var - Remove getToken handler from extension (token via health broadcast) - Extension loads token before first health poll to prevent race condition * fix: require auth on cookie-picker data routes (CRITICAL-01) - Add Bearer token auth gate on all /cookie-picker/* data/action routes - GET /cookie-picker HTML page stays unauthenticated (UI shell) - Token embedded in served HTML for picker's fetch calls - CORS preflight now allows Authorization header * fix: add state file TTL and plaintext cookie warning (HIGH-02) - Add savedAt timestamp to state save output - Warn on load if state file older than 7 days - Auto-delete stale state files (>7 days) on server startup - Warning about plaintext cookie storage in save message * fix: innerHTML XSS in extension content script and sidepanel (MEDIUM-01) - content.js: replace innerHTML with createElement/textContent for ref panel - sidepanel.js: escape entry.command with escapeHtml() in activity feed - Both found by security audit + Codex adversarial red team * fix: symlink bypass in validateReadPath (MEDIUM-02) - Always resolve to absolute path first (fixes relative path bypass) - Use realpathSync to follow symlinks before boundary check - Throw on non-ENOENT realpathSync failures (explicit over silent) - Resolve SAFE_DIRECTORIES through realpathSync (macOS /tmp → /private/tmp) - Resolve directory part for non-existent files (ENOENT with symlinked parent) * fix: freeze hook symlink bypass and prefix collision (MEDIUM-03) - Add POSIX-portable path resolution (cd + pwd -P, works on macOS) - Fix prefix collision: /project-evil no longer matches /project freeze dir - Use trailing slash in boundary check to require directory boundary * fix: shell script injection in gstack-config and telemetry (MEDIUM-04) - gstack-config: validate keys (alphanumeric+underscore only) - gstack-config: use grep -F (fixed string) instead of -E (regex) - gstack-config: escape sed special chars in values, drop newlines - gstack-telemetry-log: sanitize REPO_SLUG and BRANCH via json_safe() * test: 20 security tests for audit remediation - server-auth: verify token removed from /health, auth on /refs, /activity/* - cookie-picker: auth required on data routes, HTML page unauthenticated - path-validation: symlink bypass blocked, realpathSync failure throws - gstack-config: regex key rejected, sed special chars preserved - state-ttl: savedAt timestamp, 7-day TTL warning - telemetry: branch/repo with quotes don't corrupt JSON - adversarial: sidepanel escapes entry.command, freeze prefix collision * chore: bump version and changelog (v0.13.1.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: tone down changelog — defense in depth, not catastrophic bugs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-28 08:35:24 -06:00
Garry Tan	b343ba2797	fix: community PRs + security hardening + E2E stability (v0.12.7.0) (#552 ) * fix(security): skip hidden directories in skill template discovery discoverTemplates() scans subdirectories for SKILL.md.tmpl files but only skips node_modules, .git, and dist. Hidden directories like .claude/, .agents/, and .codex/ (which contain symlinked skill installs) were being scanned, allowing a malicious .tmpl in a symlinked skill to inject into the generation pipeline. Fix: add !d.name.startsWith('.') to the subdirs() filter. This skips all dot-prefixed directories, matching the standard convention that hidden dirs are not source code. * fix(security): sanitize telemetry JSONL inputs against injection SKILL, OUTCOME, SESSION_ID, SOURCE, and EVENT_TYPE values go directly into printf %s for JSONL output. If any contain double quotes, backslashes, or newlines, the JSON breaks — or worse, injects arbitrary fields. Fix: strip quotes, backslashes, and control characters from all string fields before JSONL construction via json_safe() helper. * fix(security): validate JSON input in gstack-review-log gstack-review-log appends its argument directly to a JSONL file with no validation. Malformed or crafted input could corrupt the review log or inject arbitrary content. Fix: validate input is parseable JSON via python3 before appending. Reject with exit 1 and stderr message if invalid. * fix: treat relative dot-paths as file paths in screenshot command Closes #495 * fix: use host-specific co-author trailer in /ship and /document-release Codex-generated skills hardcoded a Claude co-author trailer in commit messages. Users running gstack under Codex pushed commits attributed to the wrong AI assistant. Add {{CO_AUTHOR_TRAILER}} resolver that emits the correct trailer based on ctx.host: - claude: Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> - codex: Co-Authored-By: OpenAI Codex <noreply@openai.com> Replace hardcoded trailers in ship/SKILL.md.tmpl and document-release/SKILL.md.tmpl with the resolver placeholder. Fixes #282. Fixes #383. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: auto-upgrade marker no longer masks newer remote versions When a just-upgraded-from marker persists across sessions, the update check would write UP_TO_DATE to cache and exit immediately — never fetching the remote VERSION. Users silently miss updates that landed after their last upgrade. Remove the early exit and premature cache write so the script falls through to the remote check after consuming the marker. This ensures JUST_UPGRADED is still emitted for the preamble, while also detecting any newer versions available upstream. Fixes #515 * fix: decouple doc generation from binary compilation in build script The build script chains gen:skill-docs and bun build --compile with &&, so a doc generation failure (e.g. missing Codex host config, template error) prevents the browse binary from being compiled. Users end up with a broken install where setup reports the binary is missing. Replace && with ; for the two gen:skill-docs steps so they run independently of the compilation chain. Doc generation errors are still visible in stderr, but no longer block binary compilation. Fixes #482 * fix: extend security sanitization + add 10 tests for merged community PRs - Extend json_safe() to ERROR_CLASS and FAILED_STEP fields - Improve ERROR_MESSAGE escaping to handle backslashes and newlines - Replace python3 with bun for JSON validation in gstack-review-log - Add 7 telemetry injection prevention tests - Add 2 review-log JSON validation tests - Add 1 discover-skills hidden directory filtering test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: stabilize flaky E2E tests (browse-basic, ship-base-branch, dashboard-via) browse-basic: bump maxTurns 5→7 (agent reads PNG per SKILL.md instruction) ship-base-branch: extract Step 0 only instead of full 1900-line ship/SKILL.md dashboard-via: extract dashboard section only + increase timeout 90s→180s Root cause: copying full SKILL.md files into test fixtures caused context bloat, leading to timeouts and flaky turn limits. Extracting only the relevant section cut dashboard-via from timing out at 240s to finishing in 38s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add E2E fixture extraction rule to CLAUDE.md Never copy full SKILL.md files into E2E test fixtures. Extract only the section the test needs. Also: run targeted evals in foreground, never pkill and restart mid-run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: stabilize journey-think-bigger routing test Use exact trigger phrases from plan-ceo-review skill description ("think bigger", "expand scope", "ambitious enough") instead of the ambiguous "thinking too small". Reduce maxTurns 5→3 to cut cost per attempt ($0.12 vs $0.25). Test remains periodic tier since LLM routing is inherently non-deterministic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * remove: delete journey-think-bigger routing test Never passed reliably. Tests ambiguous routing ("think bigger" → plan-ceo-review) but Claude legitimately answers directly instead of invoking a skill. The other 10 journey tests cover routing with clear, actionable signals. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.12.7.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Arun Kumar Thiagarajan <arunkt.bm14@gmail.com> Co-authored-by: bluzername <bluzer@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Greg Jackson <gregario@users.noreply.github.com>	2026-03-26 23:21:27 -06:00
Garry Tan	7665adf4fe	feat: headed mode + sidebar agent + Chrome extension (v0.12.0) (#517 ) * feat: CDP connect — control real Chrome/Comet via Playwright Add `connectCDP()` to BrowserManager: connects to a running browser via Chrome DevTools Protocol. All existing browse commands work unchanged through Playwright's abstraction layer. - chrome-launcher.ts: browser discovery, CDP probe, auto-relaunch with rollback - browser-manager.ts: connectCDP(), mode guards (close/closeTab/recreateContext/handoff), auto-reconnect on browser restart, getRefMap() for extension API - server.ts: CDP branch in start(), /health gains mode field, /refs endpoint, idle timer only resets on /command (not passive endpoints) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: browse connect/disconnect/focus CLI commands - connect: pre-server command that discovers browser, starts server in CDP mode - disconnect: drops CDP connection, restarts in headless mode - focus: brings browser window to foreground via osascript (macOS) - status: now shows Mode: cdp \| launched \| headed - startServer() accepts extra env vars for CDP URL/port passthrough Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: CDP-aware skill templates — skip cookie import in real browser mode Skills now check `$B status` for CDP mode and skip: - /qa: cookie import prompt, user-agent override, headless workarounds - /design-review: cookie import for authenticated pages - /setup-browser-cookies: returns "not needed" in CDP mode Regenerated SKILL.md files from updated templates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: activity streaming — SSE endpoint for Chrome extension Side Panel Real-time browse command feed via Server-Sent Events: - activity.ts: ActivityEntry type, CircularBuffer (capacity 1000), privacy filtering (redacts passwords, auth tokens, sensitive URL params), cursor-based gap detection, async subscriber notification - server.ts: /activity/stream SSE, /activity/history REST, handleCommand instrumented with command_start/command_end events - 18 unit tests for filterArgs privacy, emitActivity, subscribe lifecycle Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: Chrome extension Side Panel + Conductor API proposal Chrome extension (Manifest V3, sideload): - Side Panel with live activity feed, @ref overlays, dark terminal aesthetic - Background worker: health polling, SSE relay, ref fetching - Popup: port config, connection status, side panel launcher - Content script: floating ref panel with @ref badges Conductor API proposal (docs/designs/CONDUCTOR_SESSION_API.md): - SSE endpoint for full Claude Code session mirroring in Side Panel - Discovery via HTTP endpoint (not filesystem — extensions can't read files) TODOS.md: add $B watch, multi-agent tabs, cross-platform CDP, Web Store publishing. Mark CDP mode as shipped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: detect Conductor runtime, skip osascript quit for sandboxed apps macOS App Management blocks Electron apps (Conductor) from quitting other apps via osascript. Now detects the runtime environment: - terminal/claude-code/codex: can manage apps freely - conductor: prints manual restart instructions + polls for 60s detectRuntime() checks env vars and parent process. When Chrome needs restart but we can't quit it, prints step-by-step instructions and waits for the user to restart Chrome with --remote-debugging-port. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: detect Conductor via actual env vars (CONDUCTOR_WORKSPACE_NAME) Previous detection checked CONDUCTOR_WORKSPACE_ID which doesn't exist. Conductor sets CONDUCTOR_WORKSPACE_NAME, CONDUCTOR_BIN_DIR, CONDUCTOR_PORT, and __CFBundleIdentifier=com.conductor.app. Check these FIRST because Conductor sessions also have ANTHROPIC_API_KEY (which was matching claude-code). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: connection status pill — floating indicator when gstack controls Chrome Small pill in bottom-right corner of every page: "● gstack · 3 refs" Shows when connected via CDP, fades to 30% opacity after 3s, full on hover. Disappears entirely when disconnected. Background worker now notifies content scripts on connect/disconnect state changes so the pill appears/disappears without polling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Chrome requires --user-data-dir for remote debugging Chrome refuses --remote-debugging-port without an explicit --user-data-dir. Add userDataDir to BrowserBinary registry (macOS Application Support paths) and pass it in both auto-launch and manual restart instructions. Fix double-quoting in CLI manual restart instructions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Chrome must be fully quit before launching with --remote-debugging-port Chrome refuses to enable CDP on its default profile when another instance is running (even with explicit --user-data-dir). The only reliable path: fully quit Chrome first, then relaunch with the flag. Updated instructions to emphasize this clearly with verification step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: bin/chrome-cdp — quit Chrome and relaunch with CDP in one command Quits Chrome gracefully, waits for full exit, relaunches with --remote-debugging-port, polls until CDP is ready. Usage: chrome-cdp [port] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use Playwright channel:chrome instead of broken connectOverCDP Playwright's connectOverCDP hangs with Chrome 146 due to CDP protocol version mismatch. Switch to channel:'chrome' which uses Playwright's native pipe protocol to launch the system Chrome binary directly. This is simpler and more reliable: - No CDP port discovery needed - No --remote-debugging-port or --user-data-dir hassles - $B connect just works — launches real Chrome headed window - All Playwright APIs (snapshot, click, fill) work unchanged bin/chrome-cdp updated with symlinked profile approach (kept for manual CDP use cases, but $B connect no longer needs it). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: green border + gstack label on controlled Chrome window Injects a 2px green border and small "gstack" label on every page loaded in the controlled Chrome window via context.addInitScript(). Users can instantly tell which Chrome window Claude controls. Also fixes close() for channel:chrome mode (uses browser.close() not browser.disconnect() which doesn't exist). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: cleanup chrome-launcher runtime detection, remove puppeteer-core dep Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style(design): redesign controlled Chrome indicator Replace crude green border + label with polished indicator: - 2px shimmer gradient at top edge (green→cyan→green, 3s loop) - Floating pill bottom-right with frosted glass bg, fades to 25% opacity after 4s so it doesn't compete with page content - prefers-reduced-motion disables shimmer animation - Much more subtle — looks like a developer tool, not broken CSS Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: document real browser mode + Chrome extension in BROWSER.md and README.md BROWSER.md: new sections for connect/disconnect/focus commands, Chrome extension Side Panel install, CDP-aware skills, activity streaming. Updated command reference table, key components, env vars, source map. README.md: updated /browse description, added "Real browser mode" to What's New section. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: step-by-step Chrome extension install guide in BROWSER.md Replace terse bullet points with numbered walkthrough covering: developer mode toggle, load unpacked, macOS file picker tip (Cmd+Shift+G), pin extension, configure port, open side panel. Added troubleshooting section. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Cmd+Shift+. tip for hidden folders in macOS file picker macOS hides folders starting with . by default. Added both shortcuts: Cmd+Shift+G (paste path directly) and Cmd+Shift+. (show hidden files). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: integrate hidden folder tips into the install flow naturally Move Cmd+Shift+G and Cmd+Shift+. tips inline with the file picker step instead of as a separate tip block after it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: auto-load Chrome extension when $B connect launches Chrome Extension auto-loads via --load-extension flag — no manual chrome://extensions install needed. findExtensionPath() checks repo root, global install, and dev paths. Also adds bin/gstack-extension helper for manual install in regular Chrome, and rewrites BROWSER.md install docs with auto-load as primary path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /connect-chrome skill — one command to launch Chrome with Side Panel New skill that runs $B connect, verifies the connection, guides the user to open the Side Panel, and demos the live activity feed. Extension auto-loads via --load-extension so no manual chrome://extensions install needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use launchPersistentContext for Chrome extension loading Playwright's chromium.launch() silently ignores --load-extension. Switch to launchPersistentContext with ignoreDefaultArgs to remove --disable-extensions flag. Use bundled Chromium (real Chrome blocks unpacked extensions). Fixed port 34567 for CDP mode so the extension auto-connects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sync extension to DESIGN.md — amber accent, zinc neutrals, grain texture Import design system from gstack-website. Update all extension colors: green (#4ade80) → amber (#F59E0B/#FBBF24), zinc gray neutrals, grain texture overlay. Regenerate icons as amber "G" monogram on dark background. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar chat with Claude Code — icon opens side panel directly Replace popup flyout with direct side panel open on icon click. Primary UI is now a chat interface that sends messages to Claude Code via file queue. Activity/Refs tabs moved behind a debug toggle in the footer. Command bar with history, auto-poll for responses, amber design system. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar agent — Claude-powered chat backend via file queue Add /sidebar-command, /sidebar-response, and /sidebar-chat endpoints to the browse server. sidebar-agent.ts watches the command queue file, spawns claude -p with browse context for each message, and streams responses back to the sidebar chat. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove duplicate gstack pill overlay, hide crash restore bubble The addInitScript indicator and the extension's content script were both injecting bottom-right pills, causing duplicates. Remove the pill from addInitScript (extension handles it). Replace --restore-last-session with --hide-crash-restore-bubble to suppress the "Chromium didn't shut down correctly" dialog. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: state file authority — CDP server cannot be silently replaced Hardens the connect/disconnect lifecycle: - ensureServer() refuses to auto-start headless when CDP server is alive - $B connect does full cleanup: SIGTERM → 2s → SIGKILL, profile locks, state - shutdown() cleans Chromium SingletonLock/Socket/Cookie files - uncaughtException/unhandledRejection handlers do emergency cleanup This prevents the bug where a headless server overwrites the CDP server's state file, causing $B commands to hit the wrong browser. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar agent streaming events + session state management Enhance sidebar-agent.ts with: - Live streaming of claude -p events (tool_use, text, result) to sidebar - Session state file for BROWSE_STATE_FILE propagation to claude subprocess - Improved logging (stderr, exit codes, event types) - stdin.end() to prevent claude waiting for input - summarizeToolInput() with path shortening for compact sidebar display Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar chat UI — streaming events, agent status, reconnect retry Sidebar panel improvements: - Chat tab renders streaming agent events (tool_use, text, result) - Thinking dots animation while agent processes - Agent error display with styled error blocks - tryConnect() with 2s retry loop for initial connection - Debug tabs (Activity/Refs) hidden behind gear toggle - Clear chat button - Compact tool call display with path shortening Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: server-integrated sidebar agent with sessions and message queue Move the sidebar agent from a separate bun process into server.ts: - Agent spawns claude -p directly when messages arrive via /sidebar-command - In-memory chat buffer backed by per-session chat.jsonl on disk - Session manager: create, load, persist, list sessions - Message queue (cap 5) with agent status tracking (idle/processing/hung) - Stop/kill endpoints with queue dismiss support - /health now returns agent status + session info - All sidebar endpoints require Bearer auth - Agent killed on server shutdown - 120s timeout detects hung claude processes Eliminates: file-queue polling, separate sidebar-agent.ts process, stale auth tokens, state file conflicts between processes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: extension auth + token flow for server-integrated agent Update Chrome extension to use Bearer auth on all sidebar endpoints: - background.js captures auth token from /health, exposes via getToken msg - background.js sets openPanelOnActionClick for direct side panel access - sidepanel.js gets token from background, sends in all fetch headers - Health broadcasts include token so sidebar auto-authenticates - Removes popup from manifest — icon click opens side panel directly Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: self-healing sidebar — reconnect banner, state machine, copy button Sidebar UI now handles disconnection gracefully: - Connection state machine: connected → reconnecting → dead - Amber pulsing banner during reconnect (2s retry, 30 attempts) - Red "Server offline" banner with Reconnect + Copy /connect-chrome buttons - Green "Reconnected" toast that fades after 3s on successful reconnect - Copy button lets user paste /connect-chrome into any Claude Code session Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: crash handling — save session, kill agent, distinct exit codes Hardened shutdown/crash behavior: - Browser disconnect exits with code 2 (distinct from crash code 1) - emergencyCleanup kills agent subprocess and saves session state - Clean shutdown saves session before exit (chat history persists) - Clear user message on browser disconnect: "Run $B connect to reconnect" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: worktree-per-session isolation for sidebar agent Each sidebar session gets an isolated git worktree so the agent's file operations don't conflict with the user's working directory: - createWorktree() creates detached HEAD worktree in ~/.gstack/worktrees/ - Falls back to main cwd for non-git repos or on creation failure - Handles collision cleanup from prior crashes - removeWorktree() cleans up on session switch and shutdown - worktreePath persisted in session.json Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(qa): ISSUE-001 — disconnect blocked by CDP guard in ensureServer $B disconnect was routed through ensureServer() which refused to start a headless server when a CDP state file existed. Disconnect is now handled before ensureServer() (like connect), with force-kill + cleanup fallback when the CDP server is unresponsive. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve claude binary path for daemon-spawned agent The browse server runs as a daemon and may not inherit the user's shell PATH. Add findClaudeBin() that checks ~/.local/bin/claude (standard install location), which claude, and common system paths. Shows a clear error in the sidebar chat if claude CLI is not found. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve claude symlinks + check Conductor bundled binary posix_spawn fails on symlinks in compiled bun binaries. Now: - Checks Conductor app's bundled binary first (not a symlink) - Scans ~/.local/share/claude/versions/ for direct versioned binaries - Uses fs.realpathSync() to resolve symlinks before spawning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: compiled bun binary cannot posix_spawn — use external agent process Compiled bun binaries fail posix_spawn on ALL executables (even /bin/bash). The server now writes to an agent queue file, and a separate non-compiled bun process (sidebar-agent.ts) reads the queue, spawns claude, and POSTs events back via /sidebar-agent/event. Changes: - server.ts: spawnClaude writes to queue file instead of spawning directly - server.ts: new /sidebar-agent/event endpoint for agent → server relay - server.ts: fix result event field name (event.text vs event.result) - sidebar-agent.ts: rewritten to poll queue file, relay events via HTTP - cli.ts: $B connect auto-starts sidebar-agent as non-compiled bun process Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: loading spinner on sidebar open while connecting to server Shows an amber spinner with "Connecting..." when the sidebar first opens, replacing the empty state. After the first successful /sidebar-chat poll: - If chat history exists: renders it immediately - If no history: shows the welcome message Prevents the jarring empty-then-populated flash on sidebar open. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: zero-friction side panel — auto-open on install, pill is clickable Three changes to eliminate manual side panel setup: - Auto-open side panel on extension install/update (onInstalled listener) - gstack pill (bottom-right) is now clickable — opens the side panel - Pill has pointer-events: auto so clicks always register (was: none) User no longer needs to find the puzzle piece icon, pin the extension, or know the side panel exists. It opens automatically on first launch and can be re-opened by clicking the floating gstack pill. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: kill CDP naming, delete chrome-launcher.ts dead code The connectCDP() method and connectionMode: 'cdp' naming was a legacy artifact — real Chrome was tried but failed (silently blocks --load-extension), so the implementation already used Playwright's bundled Chromium via launchPersistentContext(). The naming was misleading. Changes: - Delete chrome-launcher.ts (361 LOC) — only import was in unreachable attemptReconnect() method - Delete dead attemptReconnect() and reconnecting field - Delete preExistingTabIds (was for protecting real Chrome tabs we never connect to) - Rename connectCDP() → launchHeaded() - Rename connectionMode: 'cdp' → 'headed' across all files - Replace BROWSE_CDP_URL/BROWSE_CDP_PORT env vars with BROWSE_HEADED=1 - Regenerate SKILL.md files for updated command descriptions - Move BrowserManager unit tests to browser-manager-unit.test.ts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: converge handoff into connect — extension loads on handoff Handoff now uses launchPersistentContext() with extension auto-loading, same as the connect/launchHeaded() path. This means when the agent gets stuck (2FA, CAPTCHA) and hands off to the user, the Chrome extension + side panel are available automatically. Before: handoff used chromium.launch() + newContext() — no extension After: handoff uses chromium.launchPersistentContext() — extension loads Also sets connectionMode to 'headed' and disables dialog auto-accept on handoff, matching connect behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: gate sidebar chat behind --chat flag $B connect (default): headed Chromium + extension with Activity + Refs tabs only. No separate agent spawned. Clean, no confusion. $B connect --chat: same + Chat tab with standalone claude -p agent. Shows experimental banner: "Standalone mode — this is a separate agent from your workspace." Implementation: - cli.ts: parse --chat, set BROWSE_SIDEBAR_CHAT env, conditionally spawn sidebar-agent - server.ts: gate /sidebar-* routes behind chatEnabled, return 403 when disabled, include chatEnabled in /health response - sidepanel.js: applyChatEnabled() hides/shows Chat tab + banner - background.js: forward chatEnabled from health response - sidepanel.html/css: experimental banner with amber styling Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: file drop relay + $B inbox command Sidebar agent now writes structured messages to .context/sidebar-inbox/ when processing user input. The workspace agent can read these via $B inbox to see what the user reported from the browser. File drop format: .context/sidebar-inbox/{timestamp}-observation.json { type, timestamp, page: {url}, userMessage, sidebarSessionId } Atomic writes (tmp + rename) prevent partial reads. $B inbox --clear removes messages after display. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: $B watch — passive observation mode Claude enters read-only mode and captures periodic snapshots (every 5s) while the user browses. Mutation commands (click, fill, etc.) are blocked during watch. $B watch stop exits and returns a summary with the last snapshot. Requires headed mode ($B connect). This is the inverse of the scout pattern — the workspace agent watches through the browser instead of the sidebar relaying to it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add coverage for sidebar-agent, file-drop, and watch mode 33 new tests covering: - Sidebar agent queue parsing (valid/malformed/empty JSONL) - writeToInbox file drop (directory creation, atomic writes, JSON format) - Inbox command (display, sorting, --clear, malformed file handling) - Watch mode state machine (start/stop cycles, snapshots, duration) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: TODOS cleanup + Chrome vs Chromium exploration doc - Update TODOS.md: mark CDP mode, $B watch, sidebar scout as SHIPPED - Delete dead "cross-platform CDP browser discovery" TODO - Rename dependencies from "CDP connect" to "headed mode" - Add docs/designs/CHROME_VS_CHROMIUM_EXPLORATION.md memorializing the architecture exploration and decision to use Playwright Chromium Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Conductor Chrome sidebar integration design doc Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sidebar-agent validates cwd before spawning claude The queue entry may reference a worktree that was cleaned up between sessions. Now falls back to process.cwd() if the path doesn't exist, preventing silent spawn failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: gen-skill-docs resolver merge + preamble tier gate + plan file discovery The local RESOLVERS record in gen-skill-docs.ts was shadowing the imported canonical resolvers, causing stale test coverage and preamble generators to be used instead of the authoritative versions in resolvers/. Changes: - Merge imported RESOLVERS with local overrides (spread + override pattern) - Fix preamble tier gate: tier 1 skills no longer get AskUserQuestion format - Make plan file discovery host-agnostic (search multiple plan dirs) - Add missing E2E tier entries for ship/review plan completion tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: ungate sidebar agent + raise timeout to 5 minutes (v0.12.0) Sidebar chat is now always available in headed mode — no --chat flag needed. Agent tasks get 5 minutes instead of 2, enabling multi-page workflows like navigating directories and filling forms across pages. Changes: - cli.ts: remove --chat flag, always set BROWSE_SIDEBAR_CHAT=1, always spawn agent - server.ts: remove chatEnabled gate (403 response), raise AGENT_TIMEOUT_MS to 300s - sidebar-agent.ts: raise child process timeout from 120s to 300s Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: headed mode + sidebar agent documentation (v0.12.0) - README: sidebar agent section, personal automation example (school parent portal), two auth paths (manual login + cookie import), DevTools MCP mention - BROWSER.md: sidebar agent section with usage, timeout, session isolation, authentication, and random delay documentation - connect-chrome template: add sidebar chat onboarding step - CHANGELOG: v0.12.0 entry covering headed mode, sidebar agent, extension - VERSION: bump to 0.12.0.0 - TODOS: Chrome DevTools MCP integration as P0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files Generated from updated templates + resolver merge. Key changes: - Tier 1 skills no longer include AskUserQuestion format section - Ship/review skills now include coverage gate with thresholds - Connect-chrome skill includes sidebar chat onboarding step - Plan file discovery uses host-agnostic paths Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate Codex connect-chrome skill Updated preamble with proactive prompt and sidebar chat onboarding step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: network idle, state persistence, iframe support, chain pipe format (v0.12.1.0) (#516) * feat: network idle detection + chain pipe format - Upgrade click/fill/select from domcontentloaded to networkidle wait (2s timeout, best-effort). Catches XHR/fetch triggered by interactions. - Add pipe-delimited format to chain as JSON fallback: $B chain 'goto url \| click @e5 \| snapshot -ic' - Add post-loop networkidle wait in chain when last command was a write. - Frame-aware: commands use target (getActiveFrameOrPage) for locator ops, page-only ops (goto/back/forward/reload) guard against frame context. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: $B state save/load + $B frame — new browse commands - state save/load: persist cookies + URLs to .gstack/browse-states/{name}.json File perms 0o600, name sanitized to [a-zA-Z0-9_-]. V1 skips localStorage (breaks on load-before-navigate). Load replaces session via closeAllPages(). - frame: switch command context to iframe via CSS selector, @ref, --name, or --url. 'frame main' returns to main frame. Execution target abstraction (getActiveFrameOrPage) across read-commands, snapshot, and write-commands. - Frame context cleared on tab switch, navigation, resume, and handoff. - Snapshot shows [Context: iframe src="..."] header when in frame. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add tests for network idle, chain pipe format, state, and frame - Network idle: click on fetch button waits for XHR, static click is fast - Chain pipe: pipe-delimited commands, quoted args, JSON still works - State: save/load round-trip, name sanitization, missing state error - Frame: switch to iframe + back, snapshot context header, fill in frame, goto-in-frame guard, usage error New fixtures: network-idle.html (fetch + static buttons), iframe.html (srcdoc) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: review fixes — iframe ref scoping, detached frame recovery, state validation - snapshot.ts: ref locators, cursor-interactive scan, and cursor locator now use target (frame-aware) instead of page — fixes @ref clicking in iframes - browser-manager.ts: getActiveFrameOrPage auto-recovers from detached frames via isDetached() check - meta-commands.ts: state load resets activeFrame, elementHandle disposed after contentFrame(), state file schema validation (cookies + pages arrays), filter empty pipe segments in chain tokenizer - write-commands.ts: upload command uses target.locator() for frame support Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files + rebuild binary Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.12.1.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 11:15:24 -06:00
Garry Tan	cf3582c637	fix: community security + stability fixes (wave 1) (#325 ) * feat: add /cso skill — OWASP Top 10 + STRIDE security audit * fix: harden gstack-slug against shell injection via eval Whitelist safe characters (a-zA-Z0-9._-) in SLUG and BRANCH output to prevent shell metacharacter injection when used with eval. Only affects self-hosted git servers with lax naming rules — GitHub and GitLab enforce safe characters already. Defense-in-depth. * fix(security): sanitize gstack-slug output against shell injection The gstack-slug script is consumed via eval $(gstack-slug) throughout skill templates. If a git remote URL contains shell metacharacters like $(), backticks, or semicolons, they would be executed by eval. Fix: strip all characters except [a-zA-Z0-9._-] from both SLUG and BRANCH before output. This preserves normal values while neutralizing any injection payload in malicious remote URLs. Before: eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → executes rm After: eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → SLUG=foo-barrm-rf- * fix(security): redact sensitive values in storage command output The browse `storage` command dumps all localStorage and sessionStorage as JSON. This can expose tokens, API keys, JWTs, and session credentials in QA reports and agent transcripts. Fix: redact values where the key matches sensitive patterns (token, secret, key, password, auth, jwt, csrf) or the value starts with known credential prefixes (eyJ for JWT, sk- for Stripe, ghp_ for GitHub, etc.). Redacted values show length to aid debugging: [REDACTED — 128 chars] * fix(browse): kill old server before restart to prevent orphaned chromium processes When the health check fails or the server connection drops, `ensureServer()` and `sendCommand()` would call `startServer()` without first killing the previous server process. This left orphaned `chrome-headless-shell` renderer processes running at ~120% CPU each. After several reconnect cycles (e.g. pages that crash during hydration or trigger hard navigations via `window.location.href`), dozens of zombie chromium processes accumulate and exhaust system resources. Fix: call `killServer()` on the stale PID before spawning a new server in both the `ensureServer()` unhealthy path and the `sendCommand()` connection- lost retry path. Fixes #294 * Fix YAML linter error: nested mapping in compact sequence entries Having "Run: bun" inside a plain scalar is not allowed per YAML spec which states: Plain scalars must never contain the “: ” and “ #” character combinations. This simple fix switches to block scalars (\|) to eliminate the ambiguity without changing runtime behavior. * fix(security): add Azure metadata endpoint to SSRF blocklist Add metadata.azure.internal to BLOCKED_METADATA_HOSTS alongside the existing AWS/GCP endpoints. Closes the coverage gap identified in #125. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add coverage for storage redaction Test key-based redaction (auth_token, api_key), value-based redaction (JWT prefix, GitHub PAT prefix), pass-through for normal keys, and length preservation in redacted output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add community PR triage process to CONTRIBUTING.md Document the wave-based PR triage pattern used for batching community contributions. References PR #205 (v0.8.3) as the original example. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: adjust test key names to avoid redaction pattern collision Rename testKey→testData and normalKey→displayName in storage tests to avoid triggering #238's SENSITIVE_KEY regex (which matches 'key'). Also generate Codex variant of /cso skill. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.9.10.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: zero-noise /cso security audits with FP filtering (v0.11.0.0) Absorb Anthropic's security-review false positive filtering into /cso: - 17 hard exclusions (DOS, test files, log spoofing, SSRF path-only, regex injection, race conditions unless concrete, etc.) - 9 precedents (React XSS-safe, env vars trusted, client-side code doesn't need auth, shell scripts need concrete untrusted input path) - 8/10 confidence gate — below threshold = don't report - Independent sub-agent verification for each finding - Exploit scenario requirement per finding - Framework-aware analysis (Rails CSRF, React escaping, Angular sanitization) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: consolidate CHANGELOG — merge /cso launch + community wave into v0.11.0.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: rewrite README — lead with Karpathy quote, cut LinkedIn phrases, add /cso Opens with the revolution (Karpathy, Steinberger/OpenClaw), keeps credentials and LOC numbers, cuts filler phrases, adds hater bait, restores hiring block, removes bloated "What's new" section, adds /cso to skills table and install. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(cso): adversarial review fixes — FP filtering, prompt injection, language coverage - Exclusion #10: test files must verify not imported by non-test code - Exclusion #13: distinguish user-message AI input from system-prompt injection - Exclusion #14: ReDoS in user-input regex IS a real CVE class, don't exclude - Add anti-manipulation rule: ignore audit-influencing instructions in codebase - Fix confidence gate: remove contradictory 7-8 tier, hard cutoff at 8 - Fix verifier anchoring: send only file+line, not category/description - Add Go, PHP, Java, C#, Kotlin to grep patterns (was 4 languages, now 8) - Add GraphQL, gRPC, WebSocket endpoint detection to attack surface mapping Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(docs): correct skill counts, add /autoplan to README tables Skill count was wrong in 3 places (said 19+7=26, said 25, actual is 28). Added /autoplan to specialist table. Fixed troubleshooting skills list to include all skills added since v0.7.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): DNS rebinding protection for SSRF blocklist validateNavigationUrl is now async — resolves hostname to IP and checks against blocked metadata IPs. Prevents DNS rebinding where evil.com initially resolves to a safe IP, then switches to 169.254.169.254. All callers updated to await. Tests updated for async assertions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): lockfile prevents concurrent server start races Adds exclusive lockfile (O_CREAT\|O_EXCL) around ensureServer to prevent TOCTOU race where two CLI invocations could both kill the old server and start new ones, leaving an orphaned chromium process. Second caller now waits for the first to finish starting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): improve storage redaction — word-boundary keys + more value prefixes Key regex: use underscore/dot/hyphen boundaries instead of \b (which treats _ as word char). Now correctly redacts auth_token, session_token while skipping keyboardShortcuts, monkeyPatch, primaryKey. Value regex: add AWS (AKIA), Stripe (sk_live_, pk_live_), Anthropic (sk-ant-), Google (AIza), Sendgrid (SG.), Supabase (sbp_) prefixes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: migrate all remaining eval callers to source, fix stale CHANGELOG claim 5 templates and 2 bin scripts still used eval $(gstack-slug). All now use source <(gstack-slug). Updated gstack-slug comment to match. Fixed v0.8.3 CHANGELOG entry that falsely claimed eval was fully eliminated — it was the output sanitization that made it safe, not a calling convention change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(docs): add /autoplan to install instructions, regen skill docs The install instruction blocks and troubleshooting section were missing /autoplan. All three skill list locations now include the complete 28-skill set. Regenerated codex/agents SKILL.md files to match template changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.11.0.0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs(cso): add disclaimer — not a substitute for professional security audits LLMs can miss subtle vulns and produce false negatives. For production systems with sensitive data, hire a real firm. /cso is a first pass, not your only line of defense. Disclaimer appended to every report. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Arun Kumar Thiagarajan <arunkt.bm14@gmail.com> Co-authored-by: Tyrone Robb <tyrone.robb@icloud.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Orkun Duman <orkun1675@gmail.com>	2026-03-22 13:19:10 -07:00
Garry Tan	318ffdbdf0	fix: js statement wrapping + click auto-routes option to selectOption (v0.4.5) (#117 ) * fix: js statement wrapping + click auto-routes option to selectOption Bug 1: js command wrapped all code as expressions — const, semicolons, and multi-line code broke with SyntaxError. Added needsBlockWrapper() and wrapForEvaluate() helpers (shared with eval) to detect statements and use block wrapper {…} instead of expression wrapper (…). Bug 2: clicking <option> refs hung forever because Playwright can't .click() native select UI. Click handler now checks ARIA role + DOM tagName and auto-routes to selectOption() via parent <select>. Bug 3: click timeouts on <option> elements gave no guidance. Now throws helpful error: "Use browse select instead of click." * chore: bump version and changelog (v0.4.5) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 21:50:43 -05:00
Garry Tan	78e519e3b7	feat: await support in browse js/eval + contributor mode v2 (#104 ) * feat: support await in $B js and eval commands Auto-wrap await expressions in async IIFE context so $B js "await fetch(...)" works without SyntaxError. - hasAwait() strips comments before detection - js: expression wrapping (async()=>(expr))() - eval: smart wrapping — single-line=expression, multi-line=block - 6 new unit tests covering async, false-positive, and return semantics * feat: redesign contributor mode — periodic reflection with 0-10 rating Replace passive "report when things break" with active reflection: - Rate gstack experience 0-10 at workflow step boundaries - Historical calibration example (await bug) anchors the reporting bar - "What would make this a 10" field focuses on actionable improvements - Removed category lists in favor of judgment-based assessment * test: add deterministic contributor mode preamble validation 40 new skill-validation tests (4 checks × 10 skills) verify: - 0-10 rating scale present - Calibration example present - "What would make this a 10" field present - Periodic reflection (not per-command) Update existing E2E contributor eval for new report format. * chore: bump version and changelog (v0.4.2) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: improve contributor mode + qa-quick E2E reliability Contributor mode: - Add "do not truncate" directive to template — agent was stopping after "My rating" without completing Steps/Raw output/What would make this a 10 sections - Restore assertions for Steps to reproduce and Date footer QA quick: - Make test server URL prominent: top of prompt, explicit "already running" and "do NOT discover ports" instructions - Bump session timeout 180s→240s and test timeout 240s→300s - Set B= at top of prompt (was buried in prose) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use flexible assertions for contributor mode E2E Agent writes thorough reports with creative section names ("Repro Steps" vs "Steps to reproduce"). Match intent not formatting: - /repro\|steps to reproduce/ for reproduction steps - /date.2026/ for date footer presence Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> docs: add E2E eval failure blame protocol "Not related to our changes" is an extraordinary claim that requires extraordinary proof. When evals fail during /ship: 1. Run the same eval on main — prove it fails there too 2. If it passes on main, it IS your change — trace the blame 3. If you can't verify, say "unverified" not "pre-existing" Added to CLAUDE.md and as a comment in skill-e2e.test.ts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update CONTRIBUTING.md and BROWSER.md for v0.4.2 CONTRIBUTING.md: update contributor mode description — now describes periodic 0-10 reflection loop instead of passive friction detection. BROWSER.md: add js/eval async documentation — await expressions are auto-wrapped in async context, single-line eval returns values directly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: restore v0.4.2 changelog entries lost during cherry-pick conflict The base branch detection entries from main were dropped when resolving the CHANGELOG conflict — should have merged both sets, not replaced. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 11:28:58 -05:00
Garry Tan	2aa745cb0e	feat: screenshot element/region clipping (v0.3.7) (#56 ) * feat: screenshot element/region clipping (--clip, --viewport, CSS/@ref) Add element crop (CSS selector or @ref), region clip (--clip x,y,w,h), and viewport-only (--viewport) modes to the screenshot command. Uses Playwright's native locator.screenshot() and page.screenshot({ clip }). Full page remains the default. Includes 10 new tests covering all modes and error paths. * chore: bump version and changelog (v0.3.7) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add screenshot modes to BROWSER.md command reference Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 12:47:42 -07:00
Garry Tan	07b4e15b34	feat: v0.3.2 — project-local state, diff-aware QA, Greptile integration (#36 ) * fix: cookie import picker returns JSON instead of HTML jsonResponse() was defined at module scope but referenced `url` which only existed as a parameter of handleCookiePickerRoute(). Every API call crashed, the catch block also crashed, and Bun returned a default HTML page that the frontend couldn't parse as JSON. Thread port via corsOrigin() helper and options objects. Add route-level tests to prevent this class of bug from shipping again. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add help command to browse server Agents that don't have SKILL.md loaded (or misread flags) had no way to self-discover the CLI. The help command returns a formatted reference of all commands and snapshot flags. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: version-aware find-browse with META signal protocol Agents in other workspaces found stale browse binaries that were missing newer flags. find-browse now compares the local binary's git SHA against origin/main via git ls-remote (4hr cache), and emits META:UPDATE_AVAILABLE when behind. SKILL.md setup checks parse META signals and prompt the user to update. - New compiled binary: browse/dist/find-browse (TypeScript, testable) - Bash shim at browse/bin/find-browse delegates to compiled binary - .version file written at build time with git commit SHA - Build script compiles both browse and find-browse binaries - Graceful degradation: offline, missing .version, corrupt cache all skip check Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: clean up .bun-build temp files after compile bun build --compile leaves ~58MB temp files in the working directory. Add rm -f ..bun-build to the build script to clean up after each build. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> fix: make help command reachable by removing it from META_COMMANDS help was in META_COMMANDS, so it dispatched to handleMetaCommand() which threw "Unknown meta command: help". Removing it from the set lets the dedicated else-if handler in handleCommand() execute correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version and changelog (v0.3.2) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add shared Greptile comment triage reference doc Shared reference for fetching, filtering, and classifying Greptile review comments on GitHub PRs. Used by both /review and /ship skills. Includes parallel API fetching, suppressions check, classification logic, reply APIs, and history file writes. * feat: make /review and /ship Greptile-aware /review: Step 2.5 fetches and classifies Greptile comments, Step 5 resolves them with AskUserQuestion for valid issues and false positives. /ship: Step 3.75 triages Greptile comments between pre-landing review and version bump. Adds Greptile Review section to PR body in Step 8. Re-runs tests if any Greptile fixes are applied. * feat: add Greptile batting average to /retro Reads ~/.gstack/greptile-history.md, computes signal ratio (valid catches vs false positives), includes in metrics table, JSON snapshot, and Code Quality Signals narrative. * docs: add Greptile integration section to README Personal endorsement, two-layer review narrative, full UX walkthrough transcript, skills table updates. Add Greptile training feedback loop to TODO.md future ideas. * feat: add local dev mode for testing skills from within the repo bin/dev-setup creates .claude/skills/gstack symlink to the working tree so Claude Code discovers skills locally. bin/dev-teardown cleans up. DEVELOPING_GSTACK.md documents the workflow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: narrow gitignore to .claude/skills/ instead of all .claude/ Avoids ignoring legitimate Claude Code config like settings.json or CLAUDE.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: rename DEVELOPING_GSTACK.md to CONTRIBUTING.md Rewritten as a contributor-friendly guide instead of a dry plan doc. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: explain why dev-setup is needed in CONTRIBUTING.md quick start Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add browser interaction guidance to CLAUDE.md Prevents Claude from using mcp__claude-in-chrome__* tools instead of /browse. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add shared config module for project-local browse state Centralizes path resolution (git root detection, state dir, log paths) into config.ts. Both cli.ts and server.ts import from it, eliminating duplicated PORT_OFFSET/BROWSE_PORT/STATE_FILE logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: rewrite port selection to use random ports Replace CONDUCTOR_PORT magic offset and 9400-9409 scan with random port 10000-60000. Atomic state file writes, log paths from config module, binaryVersion field for auto-restart on update. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: move browse state from /tmp to project-local .gstack/ CLI now uses config module for state paths, passes BROWSE_STATE_FILE to spawned server. Adds version mismatch auto-restart, legacy /tmp cleanup with PID verification, and removes stale global install fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update crash log path reference to .gstack/ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: add config tests and update CLI lifecycle test 14 new tests for config resolution, ensureStateDir, readVersionHash, resolveServerScript, and version mismatch detection. Remove obsolete CONDUCTOR_PORT/BROWSE_PORT filtering from commands.test.ts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update BROWSER.md and TODO.md for project-local state Replace /tmp paths with .gstack/, remove CONDUCTOR_PORT docs, document random port selection and per-project isolation. Add server bundling TODO. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update README, CHANGELOG, and CONTRIBUTING for v0.3.2 - README: replace Conductor-aware language with project-local isolation, add Greptile setup note - CHANGELOG: comprehensive v0.3.2 entry with all state management changes - CONTRIBUTING: add instructions for testing branches in other repos Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add diff-aware mode to /qa — auto-tests affected pages from branch diff When on a feature branch, /qa now reads git diff main, identifies affected pages/routes from changed files, and tests them automatically. No URL required. The most natural flow: write code, /ship, /qa. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: update CHANGELOG for complete v0.3.2 coverage Add missing entries: diff-aware QA mode, Greptile integration, local dev mode, crash log path fix, README/SKILL.md updates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 18:10:56 -07:00
Garry Tan	f7b95329c1	feat: Phase 3.5 — cookie import, QA testing, team retro (v0.3.1) (#29 ) * Phase 2: Enhanced browser — dialog handling, upload, state checks, snapshots - CircularBuffer O(1) ring buffer for console/network/dialog (was O(n) array+shift) - Async buffer flush with Bun.write() (was appendFileSync) - Dialog auto-accept/dismiss with buffer + prompt text support - File upload command (upload <sel> <file...>) - Element state checks (is visible/hidden/enabled/disabled/checked/editable/focused) - Annotated screenshots with ref labels overlaid (-a flag) - Snapshot diffing against previous snapshot (-D flag) - Cursor-interactive element scan for non-ARIA clickables (-C flag) - Snapshot scoping depth limit (-d N flag) - Health check with page.evaluate + 2s timeout - Playwright error wrapping — actionable messages for AI agents - Fix useragent — context recreation preserves cookies/storage/URLs - wait --networkidle / --load / --domcontentloaded flags - console --errors filter (error + warning only) - cookie-import <json-file> with auto-fill domain from page URL - 166 integration tests (was ~63) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Phase 2: Rewrite SKILL.md as QA playbook + command reference Reorient SKILL.md files from raw command reference to QA-first playbook with 10 workflow patterns (test user flows, verify deployments, dogfood features, responsive layouts, file upload, forms, dialogs, compare pages). Compact command reference tables at the bottom. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Phase 3: /qa skill — systematic QA testing with health scores New /qa skill for systematic web app QA testing. Three modes: - full: 5-10 documented issues with screenshots and repro steps - quick: 30-second smoke test with health score - regression: compare against saved baseline Includes issue taxonomy (7 categories, 4 severity levels), structured report template, health score rubric (weighted across 7 categories), framework detection guidance (Next.js, Rails, WordPress, SPA). Also adds browse/bin/find-browse (DRY binary discovery using git rev-parse), .gstack/ to .gitignore, and updated TODO roadmap. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Bump to v0.3.0 — Phase 2 + Phase 3 changelog Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: cookie-import-browser — Chromium cookie decryption module + tests Pure logic module for reading and decrypting cookies from macOS Chromium browsers (Comet, Chrome, Arc, Brave, Edge). Supports v10 AES-128-CBC encryption with macOS Keychain access, PBKDF2 key derivation, and per-browser key caching. 18 unit tests with encrypted cookie fixtures. * feat: cookie picker web UI + route handler Two-panel dark-theme picker served from the browse server. Left panel shows source browser domains with search and import buttons. Right panel shows imported domains with trash buttons. No cookie values exposed. 6 API endpoints, importedDomains Set tracking, inline clearCookies. * feat: wire cookie-import-browser into browse server Add cookie-picker route dispatch (no auth, localhost-only), add cookie-import-browser to WRITE_COMMANDS and CHAIN_WRITE, add serverPort property to BrowserManager, add write command with two modes (picker UI vs --domain direct import), update CLI help text. * chore: /setup-browser-cookies skill + docs (Phase 3.5) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version and changelog (v0.3.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * security: redact sensitive values from command output (PR #21) type no longer echoes text (reports character count), cookie redacts value with ***, header redacts Authorization/Cookie/X-API-Key/X-Auth-Token, storage set drops value, forms redacts password fields. Prevents secrets from persisting in LLM transcripts. 7 new tests. Credit: fredluz (PR #21) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> security: path traversal prevention for screenshot/pdf/eval (PR #26) Add validateOutputPath() for screenshot/pdf/responsive (restricts to /tmp and cwd) and validateReadPath() for eval (blocks .. sequences and absolute paths outside safe dirs). 7 new tests. Credit: Jah-yee (PR #26) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: auto-install Playwright Chromium in setup (PR #22) Setup now verifies Playwright can launch Chromium, and auto-installs it via `bunx playwright install chromium` if missing. Exits non-zero if build or Chromium launch fails. Credit: AkbarDevop (PR #22) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * security: fix path validation bypass, CORS restriction, cookie-import path check - startsWith('/tmp') matched '/tmpevil' — now requires trailing slash - CORS Access-Control-Allow-Origin changed from * to http://127.0.0.1:<port> - cookie-import now validates file paths (was missing validateReadPath) - 3 new tests for prefix collision and cookie-import path traversal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address review informational issues + add regression tests - Add cookie-import to CHAIN_WRITE set for chain command routing - Add path validation to snapshot -a -o output path - Fix package.json version to match 0.3.1 - Use crypto.randomUUID() for temp DB paths (unpredictable filenames) - Add regression tests for chain cookie-import and snapshot path validation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add /qa, /setup-browser-cookies to README + update BROWSER.md - Add /qa and /setup-browser-cookies to skills table, install/update/uninstall blurbs - Add dedicated README sections for both new skills with usage examples - Update demo workflow to show cookie import → QA → browse flow - Update BROWSER.md: cookie import commands, new source files, test count (203) - Update skill count from 6 to 8 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: team-aware /retro v2.0 — per-person praise and growth opportunities - Identify current user via git config, orient narrative as "you" vs teammates - Add per-author metrics: commits, LOC, focus areas, commit type mix, sessions - New "Your Week" section with personal deep-dive for whoever runs the command - New "Team Breakdown" with per-person praise and growth opportunities - Track AI-assisted commits via Co-Authored-By trailers - Personal + team shipping streaks - Tone: praise like a 1:1, growth like investment advice, never compare negatively Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add Conductor parallel sessions section to README Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 00:31:41 -07:00
morluto	a29743b056	fix: harden browse install and lifecycle checks (#4 ) Thanks @morluto	2026-03-12 07:35:20 -07:00
Garry Tan	3d901066cd	Initial release — gstack v0.0.1 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 01:32:16 -07:00

16 Commits