gstack

CalvinBackup/gstack

Fork 0

mirror of https://github.com/garrytan/gstack.git synced 2026-06-21 01:00:10 +02:00

Commit Graph

Author	SHA1	Message	Date
Garry Tan	a861c00cfa	v1.58.3.0 feat: gbrowser anti-detection Layer C stealth (#2047 ) * feat: Layer C stealth — chrome., Notification, per-install hardware, toString Proxy (gbrowser T1+T3+D6) Three additions stacked into the existing applyStealth() init script to close the visible automation tells that today push GBrowser users into Google's /sorry/index captcha and similar: T1 — Strip Playwright's automation default args: --enable-automation (kills "Chrome is being controlled" infobar) --disable-popup-blocking, --disable-component-update, --disable-default-apps (Patchright's list — each is a documented tell) Now centralized in STEALTH_IGNORE_DEFAULT_ARGS export, used by BOTH launchHeaded() and handoff() (the headless → headed re-launch path). D6 — Drop "GStackBrowser" UA branding suffix: Real Chrome's UA ends `Safari/537.36`, not `Safari/537.36 GStackBrowser`. The branded suffix was a high-entropy classifier for any vendor that grep'd UA for known automation/test-browser strings. Branding still lives in the wrapper .app name + Dock icon + tray — does not need to leak via the UA string for the product to be "GBrowser." Resolves the "looks like Chrome but identifies as GStackBrowser" contradiction codex review #18 flagged. T3 — Layer C init-script additions in stealth.ts: 1. Function.prototype.toString Proxy (must run first). Wraps every patched getter / function in a WeakSet so they report `function NAME() { [native code] }` at every recursion depth, defeating the depth-3+ integrity check (fn.toString.toString.toString().includes('[native code]')). 2. window.chrome.runtime / chrome.app / chrome.csi / chrome.loadTimes restoration with full enum shape (OnInstalledReason, PlatformArch, PlatformOs, etc.) + method bodies. Real Chrome ships these; their absence is universally checked. Vendor research (gbrowser plan deep-dive on Cloudflare + DataDome) confirmed both vendors probe this shape directly. 3. Notification.permission aligned to 'default'. The existing inline addInitScript already spoofs permissions.query({name:'notifications'}) to return 'prompt' — Notification.permission being 'denied' while Permissions returns 'prompt' is a cross-source inconsistency that detectors flag specifically. 4. Per-install hardware values via GSTACK_HW_CONCURRENCY / GSTACK_DEVICE_MEMORY env vars (set by gbd's host_profile.go from system_profiler + sysctl). Reporting real host values within the Chrome shape avoids the cross-user GBrowser fingerprint cluster that hardcoded defaults would create. Codex review #10 flagged hardcoding as creating contradictions across Apple Silicon / Intel / UA-CH architecture. 5. Selenium 25-global cleanup + PhantomJS + NightmareJS + Watir + Playwright (__pwInitScripts, __playwright__binding__) static-name deletion. The inline block continues to handle the dynamic cdc_/__webdriver/__selenium/__driver prefixes. D7 (codex correction) kept: still do NOT fake navigator.plugins or navigator.languages. Synthesizing those triggers MORE consistency flags from modern fingerprinters than letting Chromium surface them natively. Test coverage: - 15 new tests in stealth-layer-c.test.ts covering: launch-flag exports, script structure, toString-Proxy installs first, every spoof present, hardware values interpolated from input (not hardcoded), Selenium global cleanup spot-check, no GStackBrowser leak in stealth payload, backwards-compat exports preserved. - All 8 existing stealth-webdriver tests still pass. - All 2 existing browser-manager-unit tests still pass. For GBrowser specifically: this is the gstack-side half of Phase 1 / T1 + T3 + D6 in the anti-detection plan. The gbrowser repo's submodule pointer bump will land alongside this. feat: buildGStackLaunchArgs — Pack 1 cmdline-switch construction for gbrowser New stealth.ts export that turns the GSTACK_* env vars (already populated by gbrowser's gbd from host_profile.go) into the --gstack-* cmdline switches the Pack 1 Chromium patches read at WebGL getParameter, NavigatorUA::userAgentData, NavigatorConcurrentHardware::hardwareConcurrency, and NavigatorDeviceMemory::deviceMemory time. Wired into all three launchArgs sites: launch() (headless), launchHeaded() (real product path), and handoff() (headless → headed re-launch). Mapping: GSTACK_GPU_VENDOR → --gstack-gpu-vendor GSTACK_GPU_RENDERER → --gstack-gpu-renderer GSTACK_PLATFORM → --gstack-ua-platform (with mapping: MacARM/MacIntel → macOS, Win32 → Windows, Linux x86_64 → Linux) GSTACK_GPU_CHIPSET → --gstack-ua-model GSTACK_HW_CONCURRENCY → --gstack-hw-concurrency GSTACK_DEVICE_MEMORY → --gstack-device-memory Each switch is emitted only when its env var is non-empty — empty values fall through to the patch's "no override" path, which returns the real Chromium native value. Safe to ship on Chromium builds without the Pack 1 patches applied (zero behavior change). The patches themselves live in the gbrowser repo at chromium/patches/ {webgl-vendor-spoof,ua-client-hints-stealth,worker-navigator-stealth}.patch. Both halves (gstack arg construction + gbrowser C++ patches) must land + Chromium rebuild before the spoof reaches the WebGL/UA-CH/ hardware accessors. Currently dormant until then. Tests (browse/test/stealth-layer-c.test.ts): 7 new buildGStackLaunchArgs cases — empty env, all-populated, partial, platform mapping (MacARM/MacIntel/Win32/Linux), unrecognized platform fallthrough, vendor-with-spaces escape-safety. All 32 stealth/browser-manager tests pass. For GBrowser specifically: gstack-side half of the Pack 1 flag plumbing. gbrowser repo will bump the submodule pointer to this commit, then re-run bun run test/anti-bot/evidence-run.ts to verify creepjs's "33% headless" score drops after Pack 1 + Chromium rebuild. * feat: buildGStackLaunchArgs adds --gstack-suppress-prepare-stack-trace Pack 2 / B11 flag plumbing for the new error-preparestacktrace-stealth.patch in gbrowser/chromium/patches/. Always emit --gstack-suppress-prepare-stack-trace unless the caller explicitly sets GSTACK_CDP_STEALTH=off in the environment. Off by default in patch behavior (no-op without the C++ patch), so this is safe on stock Playwright Chromium too. Closes the Cloudflare canary trick where a page sets Error.prepareStackTrace and watches for it to fire during CDP serialization of a logged Error object. Tests: All 33 stealth/browser-manager tests pass. New cases: - GSTACK_CDP_STEALTH=off disables suppression - empty env still emits the always-on flag (count=1) - all-populated env now emits 7 flags (was 6) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(browse): enable Chromium sandbox on headed launchPersistentContext Mirrors v1.40.0.1 from main lineage (PR #1617). Cherry-picked onto gbrowser-anti-detection so the GBrowser submodule can consume the fix without waiting for main to merge. Playwright auto-adds --no-sandbox whenever chromiumSandbox !== true (playwright-core/lib/server/chromium/chromium.js:291-292). The headless chromium.launch() site set the option; the two headed sites (launchHeaded() and handoff()) did not. Every headed launch on macOS and Linux showed Chromium's yellow "unsupported command-line flag: --no-sandbox" infobar. shouldEnableChromiumSandbox() centralizes the Win32 / CI / CONTAINER / root heuristic that previously lived only in the headless path's explicit --no-sandbox push at :225. All three launch sites now use the helper, and six unit tests pin the policy across darwin, linux, win32, CI, CONTAINER, and root. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v1.40.0.2 fix(browse): Cmd+Q on managed Chromium stops triggering supervisor respawn Three browser.on('disconnected') handlers in browse/src/browser-manager.ts (launch, launchHeaded, handoff) each exited with a non-zero code on every disconnect, regardless of cause. Process supervisors that consume our exit code (gbrowser's gbd HealthMonitor in cmd/gbd/health.go) treated user Cmd+Q identical to a Chromium crash and respawned with exponential backoff, so the visible browser kept reappearing after the user closed it. Add resolveDisconnectCause(browser) that reads the underlying ChildProcess exitCode + signalCode (waiting up to 1s for the exit event if the disconnected event fired first). Exit code 0 + no signal = clean user quit; anything else = crash, signal-kill, or OOM. Wire the resolver into all three disconnect handlers: - launch() (headless): clean → exit 0, crash → exit 1 (was always 1) - launchHeaded() (headed): clean → exit 0, crash → exit 2 (was always 2) onDisconnect() cleanup callback still runs in both cases. - handoff() (re-launch): same as launch() via the helper. Preserve the per-path crash codes (1 vs 2) so any supervisor that differentiated headed vs headless crashes keeps working. Seven new unit tests in browse-manager-unit.test.ts cover the resolver across already-exited, signal-killed (SIGSEGV / SIGKILL), async exits, and null-browser inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(browse): apply stealth on every launch path + share automation-artifact cleanup handoff() built cmdline args but never called applyStealth, so a handed-off browser had no JS stealth (no webdriver mask, no chrome.* shape, no toString proxy). And the cdc_/Permissions cleanup shim lived inline in launchHeaded() only, so headless launch() reported Notification.permission='default' without the matching permissions.query='prompt' answer — the exact cross-source inconsistency the shim exists to prevent. Move the cleanup into AUTOMATION_ARTIFACT_CLEANUP_SCRIPT inside applyStealth so all three launch paths (launch, launchHeaded, handoff) get identical stealth, and call applyStealth(newContext) in handoff() before restoreState() navigates. A static tripwire in browser-manager-unit.test.ts fails CI if any launch path drops the applyStealth call again. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(browse): make --gstack-suppress-prepare-stack-trace opt-in, not default-on buildGStackLaunchArgs() pushed the flag unless GSTACK_CDP_STEALTH=off, i.e. on-by-default — contradicting its own comment ("off by default, only for gbrowser builds"). The switch is read by a C++ patch that only exists in gbrowser; on stock Playwright Chromium it is an unknown switch. Flip to opt-in: emit only when GSTACK_CDP_STEALTH is on/1/true. gbd opts in by exporting GSTACK_CDP_STEALTH=on; stock installs leave it unset so the flag never reaches a Chromium that wouldn't understand it. Comment now matches code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(browse): correct stale stealth comments The file-level stealth.ts docstring claimed "we DON'T fake navigator.plugins" while the same file now ships EXTENDED_STEALTH_SCRIPT, which does fake plugins when GSTACK_STEALTH=extended. Clarify that Layer C (the always-on default) doesn't fake plugins and the opt-in extended mode does, as the documented "actively lies, may break sites" escape hatch. Also fix the launch()/launchHeaded() comments that said "mask navigator.webdriver only" — applyStealth (Layer C) also restores window.chrome., aligns Notification.permission, and sets per-install hardware. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> test(browse): runtime + extended-mode coverage for the stealth blend The stealth tests were all static string-shape assertions; nothing executed the script in a real page. Add real-Chromium runtime checks via applyStealth + page.evaluate: - Layer C runtime: window.chrome.* rich shape, Notification.permission='default' paired with permissions.query notifications='prompt' (guards the shim now running on every path), and patched getters reporting [native code]. - Per-install hardware: navigator.hardwareConcurrency/deviceMemory reflect the GSTACK_* env profile. - Extended-mode blend: navigator.plugins is faked when GSTACK_STEALTH=extended, Layer C still wins window.chrome.runtime, and navigator.webdriver stays false (own-prop getter survives extended's prototype delete). - Persistent-context (launchHeaded/handoff) parity now uses a page created AFTER applyStealth — the old test checked pages()[0], which predates the init script, so webdriver was false only via the launch arg, not Layer C. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(browse): handoff() + launchHeaded() spread the shared STEALTH_LAUNCH_ARGS handoff() built its launch args from only ['--hide-crash-restore-bubble', ...buildGStackLaunchArgs()], omitting STEALTH_LAUNCH_ARGS — so a handed-off browser kept the --disable-blink-features=AutomationControlled tell that launch() and launchHeaded() strip. launchHeaded() also hardcoded the flag as a literal. Both now spread the shared constant, so the AutomationControlled flag lives in one place across all three launch paths. Tripwires: STEALTH_LAUNCH_ARGS spread into >= 3 sites (no inline literal) and STEALTH_IGNORE_DEFAULT_ARGS wired into both persistent-context paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(browse): drop dead HostProfile.platform, export test internals HostProfile.platform was set by readHostProfile but never read by buildStealthScript — the platform spoof is owned by the UA-CH cmdline switch in buildGStackLaunchArgs (which reads GSTACK_PLATFORM directly). Remove the dead field. Export readHostProfile and AUTOMATION_ARTIFACT_CLEANUP_SCRIPT so their clamp/shape invariants can be unit-tested. Correct the stale "25 Selenium globals" count comment and note the extended cdc_ scan is redundant-but-retained for standalone use. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(browse): cover readHostProfile clamp, toString depth-3, chrome.* calls Pre-landing review coverage gaps: - readHostProfile clamps 0/negative/NaN/missing env to 8 (a deviceMemory=0 or NaN would be a glaring bot tell) — now asserted. - toString proxy survives the depth-3 recursion trick (fn.toString.toString.toString().includes('[native code]')), the headline claim that was only tested at depth-1. - chrome.csi() and chrome.loadTimes() are invoked (not just typeof-checked) and runtime.connect() throws the native-shaped "No matching signature" error. - AUTOMATION_ARTIFACT_CLEANUP_SCRIPT static shape (cdc_/__webdriver strip + notifications->prompt) as a hermetic backup for the live-Chromium pairing test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(browse): recreateContext() re-applies stealth (closes 4th un-stealth path) useragent and viewport --scale route through recreateContext(), which rebuilds the BrowserContext via newContext() — a fresh context with no init scripts. It never called applyStealth, so a routine useragent/viewport-scale command silently dropped webdriver masking, window.chrome.* shape, hardware spoof, and the cdc/Permissions cleanup on every restored page. Caught by the cross-model adversarial review (Codex) after the Claude pass and eng review missed it. Both the main and fallback paths now call applyStealth before any page is created. The launch-path tripwire is raised to >= 4 sites and now asserts the recreateContext() body specifically, so the regression class can't recur. Also documents the load-bearing trust assumption on buildGStackLaunchArgs / readHostProfile (GSTACK_* must be gbd-sourced, never page/remote data — the injection-safety argument depends on it) and the notifications-permission spoof tradeoff. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.58.3.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: sync browser stealth docs to Layer C (v1.58.3.0) BROWSER.md "Stealth scope" still described the default as navigator.webdriver masking only; Layer C is now the always-on default across all four context-creation paths. Update the stealth-scope prose, the "What GStack Browser means" blurb (stock-Chrome UA, no GStackBrowser suffix, captchas can still get through at the CDP layer), the stealth.ts source-map line, and the env-vars table (GSTACK_STEALTH, GSTACK_CDP_STEALTH, GSTACK_GPU_, GSTACK_PLATFORM, GSTACK_HW_CONCURRENCY/GSTACK_DEVICE_MEMORY + the explicit --gstack- switches and ignoreDefaultArgs stripping). Correct the stale "narrows to navigator.webdriver masking only" premise on the open CDP-patch TODO (the TODO itself stays open — the CDP-protocol layer is still unaddressed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-18 10:45:05 -07:00
Garry Tan	7ca04d8ef0	v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594 ) * fix(gstack-paths): guard CLAUDE_PLUGIN_DATA against cross-plugin contamination (#1569) gstack-paths previously trusted CLAUDE_PLUGIN_DATA as a fallback for GSTACK_STATE_ROOT whenever GSTACK_HOME was unset. When another plugin (e.g. Codex) persists its own CLAUDE_PLUGIN_DATA into the session env via CLAUDE_ENV_FILE, gstack picked it up and wrote checkpoints, analytics, and learnings into that plugin's directory. Anyone with the Codex plugin installed alongside gstack hit this silently. Fix: guard the CLAUDE_PLUGIN_DATA branch so it only fires when CLAUDE_PLUGIN_ROOT confirms we're running as the gstack plugin (path contains "gstack"). Skill installs fall through to \$HOME/.gstack. Contributed by @ElliotDrel via #1570. Closes #1569. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gbrain-sync): sourceLocalPath handles wrapped {sources:[...]} shape from gbrain v0.20+ gbrain v0.20+ changed `gbrain sources list --json` to return {sources: [...]} instead of a flat array. sourceLocalPath crashed upstream with `list.find is not a function` on every /sync-gbrain invocation against modern gbrain. Accept both shapes for forward/backward compat, matching probeSource/sourcePageCount in lib/gbrain-sources.ts. Contributed by @jakehann11 via #1571. Closes #1567. Supersedes #1564 (@tonyjzhou, same fix, different shape — credit retained). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(brain-context-load): probe gbrain via execFile, not shell builtin (#1559) gbrainAvailable() used `execFileSync("command", ["-v", "gbrain"])`, which fails in any environment where the `command` builtin isn't on the spawned process's PATH (most non-interactive shells). The probe then reported gbrain as missing even when it was installed, and context-load silently skipped vector/list queries. Fix: probe `gbrain --version` directly with a 500ms timeout (matching the rest of the file's MCP_TIMEOUT_MS). Same semantics, works everywhere execFile works. Contributed by @jbetala7 via #1560. Closes #1559. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(gbrain-doctor): pin schema_version:2 doctor parse path (#1418) Adds an exec-path regression test that runs a fake gbrain shim emitting the v0.25+ doctor JSON shape (schema_version: 2, status: "warnings", exit 1 for health_score < 100, no top-level `engine` field). Confirms freshDetectEngineTier recovers stdout from the non-zero exit and falls back to GBRAIN_HOME/config.json for the engine label. The pre-existing test for #1415 only stripped gbrain from PATH; this test exercises the actual doctor parse path, closing the gap that codex's plan review flagged. Also documents the schema_version separation in lib/gbrain-local-status.ts: the local CacheEntry stays at version 1, distinct from the doctor-output schema_version which we accept across versions in gstack-memory-helpers. Closes #1418 (credit @mvanhorn for surfacing the doctor + schema_v2 collapse). The fix landed pre-emptively in v1.29.x; this commit pins it with a stronger test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(memory-ingest): pin put_page regression + scrub stale name from --help and comments (#1346) #1346 reported that gstack-memory-ingest still called the renamed gbrain put_page subcommand on gbrain v0.18+. The actual code migrated to `gbrain put` and later to batch `gbrain import <dir>` before this report landed — only documentation lag remained. This commit: - Updates the --help string ("Skip gbrain put calls (still updates state file)") so user-facing docs match the shipped subcommand - Updates two inline comments that still referenced the old name - Adds test/memory-ingest-no-put_page.test.ts: a regression pin that strips comments from bin/gstack-memory-ingest.ts and fails the build if "put_page" appears in any active code or string literal, plus a sanity check that the file still calls a supported gbrain page-write verb (put or import) Closes #1346. Reporter @kylma-code surfaced the doc lag; the original code migration credit is on the v1.27.x wave. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(resolvers): rewrite all gbrain put_page instructions to canonical put <slug> scripts/resolvers/gbrain.ts emitted user-facing copy-paste instructions using the renamed `gbrain put_page` subcommand across 10 skills (office-hours, investigate, plan-ceo-review, retro, plan-eng-review, ship, cso, design-consultation, fallback, entity-stub). Every gstack user copying those snippets hit "unknown command: put_page" on gbrain v0.18+. This commit: - Rewrites all 10 instruction templates to use `gbrain put <slug> --content "$(cat <<EOF...EOF)"` with title/tags moved into YAML frontmatter inside --content, matching the v0.18+ subcommand shape - Updates README.md and USING_GBRAIN_WITH_GSTACK.md "common commands" table to reference `gbrain put` and `gbrain get` - Adds test/resolvers-gbrain-put-rewrite.test.ts pinning two invariants: (a) resolver source ships only canonical instructions, (b) every tracked SKILL.md file is free of `gbrain put_page` CHANGELOG entries are deliberately left untouched (historical record). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build): extract package.json build to scripts/build.sh for Windows Bun compat (#1538, #1537, #1530, #1457, #1561) Bun's Windows shell parser rejects multiple constructs the inline package.json build chain used: brace groups `{ cmd; }`, subshells with redirection `( git ... ) > path/.version`, and (in Bun 1.3.x) subshells near redirections in general. Every Windows install + every auto-upgrade since v1.34.2.0 has failed on `bun run build`. Extracts the build chain to scripts/build.sh and the .version writes to scripts/write-version-files.sh. POSIX-portable, no Bun shell parsing involved. Also adds Windows-specific bun.exe handling for non-ASCII PATHs (a separate Windows footgun where Bun's --compile fails when the binary lives under a path with non-ASCII chars). Updates test/build-script-shell-compat.test.ts to assert the new shape: no subshells with redirections anywhere in the build chain, and build delegates to scripts/build.sh which delegates .version writes. Contributed by @Charlie-El via #1544. Supersedes #1531 (@scarson, fixed in build helper), #1480 (@mikepsinn, partial overlap), #1460 (@realcarsonterry, brace-group fix subsumed) — credit retained. Closes #1538, #1537, #1530, #1457, #1561. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(windows): .exe glob in .gitignore + .exe extension resolution in find-browse (#1554) bun build --compile on Windows appends .exe to the output filename, producing browse.exe instead of browse. find-browse's existsSync probe only checked the bare path and returned null on Windows even when the binary was correctly built. .gitignore similarly only excluded the bare bin/gstack-global-discover path, leaving the .exe variant tracked. This commit: - .gitignore: changes `bin/gstack-global-discover` → `bin/gstack-global-discover` so the Windows .exe variant is ignored - browse/src/find-browse.ts: adds isExecutable + findExecutable helpers that fall back to .exe/.cmd/.bat probing on Windows, mirroring the same helper already in make-pdf/src/browseClient.ts and pdftotext.ts Contributed by @Mike-E-Log via #1554. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> ci(windows): add fresh-install E2E gate that runs bun run build on windows-latest Adds .github/workflows/windows-setup-e2e.yml as the gate that catches Bun shell-parser regressions in the build chain before they reach users. Triggers on PRs touching package.json, scripts/build.sh, scripts/write-version-files.sh, setup, browse cli/find-browse, or gstack-paths. What it verifies: 1. bun run build completes on Windows (the previously-broken path that #1538/#1537/#1530/#1457/#1561 reported) 2. All compiled binaries land on disk (browse.exe, find-browse.exe, design.exe, gstack-global-discover.exe) 3. find-browse resolves to the .exe variant on Windows (regression gate for #1554) 4. gstack-paths returns non-empty GSTACK_STATE_ROOT/PLAN_ROOT/TMP_ROOT on Windows (regression gate for #1570) Complements the existing windows-free-tests.yml (curated unit subset); this new workflow exercises the install path itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(codex): move diff scope into prompt instead of --base (Codex CLI 0.130+ argv conflict) (#1209) Codex CLI ≥ 0.130.0 rejects passing a custom prompt and --base together (mutually exclusive at argv level). Every /codex review, /review, and /ship structured Codex review call ended with an argv error before the model ran. Fix: scope the diff in prompt text using "Run git diff origin/<base>...HEAD 2>/dev/null \|\| git diff <base>...HEAD" instead of `--base <base>`. Preserves the filesystem boundary instruction across all invocations and keeps Codex's review prompt tuning. Touches: - codex/SKILL.md.tmpl + regenerated codex/SKILL.md - scripts/resolvers/review.ts + regenerated review/SKILL.md, ship/SKILL.md - test/gen-skill-docs.test.ts: new regression that fails if any of the five known files still contain the prompt+--base shape - test/skill-validation.test.ts: corresponding negative + positive pin on the rendered SKILL.md files Contributed by @jbetala7 via #1209. Closes #1479. Supersedes #1527 (@mvanhorn — same intent, different patch shape, CONFLICTING) and #1449 (@Gujiassh — broader refactor, CONFLICTING). Credit retained in CHANGELOG. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): diff from git merge-base, not git diff origin/<base> (#1492) git diff origin/<base> shows everything since the common ancestor in both directions — it includes commits that landed on origin/<base> after this branch was created as deletions. That made /review and /ship's pre-landing structured review report inflated diff totals and flagged "removed" code that was actually still present in the working tree. Fix: compute DIFF_BASE via git merge-base origin/<base> HEAD and diff the working tree against that point. Same coverage of uncommitted edits, no phantom deletions from out-of-order base advancement. Applies to /review's Step 1 (diff existence check), Step 3 (get the diff), the build-on-intent scope-creep check, the structured review DIFF_INS/DIFF_DEL stats, and the Claude adversarial subagent prompt. Same change flows into ship/SKILL.md via the shared resolver. Touches: - review/SKILL.md.tmpl + regenerated review/SKILL.md, ship/SKILL.md - scripts/resolvers/review.ts - scripts/resolvers/review-army.ts Contributed by @mvanhorn via #1492. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(codex): pin filesystem-boundary preservation across all codex review surfaces (#1503, #1522) #1503 reported that the bare codex review --base path stripped the filesystem boundary instruction, letting Codex spend tokens reading .claude/skills/ and agents/. #1522 proposed adding a skill-path detector that switched to the custom-instructions route when the diff touched skill files. After C10 (#1209) restructured codex review to always carry the boundary in the prompt (the prompt+--base argv conflict forced the restructure), the skill-path detector becomes redundant — every default call already preserves the boundary. This commit pins the post-#1209 invariant with a test that fails the build if any future refactor strips the boundary from codex/SKILL.md, review/SKILL.md, or ship/SKILL.md. Closes #1503 by regression test. #1522 (@genisis0x) is superseded by #1209 (the prompt rewrite covers its safety concern); credit retained in CHANGELOG. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(skills): use command -v instead of which for codex detection (#1197) `which` is not on PATH in every shell — some Windows shells, BusyBox- only containers, and minimal CI images all fail when skills probe codex availability via `which codex`. `command -v` is a POSIX builtin and always available where the skill is running. Touched: - codex/SKILL.md.tmpl: CODEX_BIN=$(command -v codex \|\| echo "") - scripts/resolvers/review.ts and scripts/resolvers/design.ts: 3 + 3 sites each rewritten to `command -v codex >/dev/null 2>&1` - Regenerated all 10 affected SKILL.md files (codex, review, ship, design-consultation, design-review, office-hours, plan-ceo-review, plan-design-review, plan-devex-review, plan-eng-review) - test/skill-validation.test.ts: updated pin + defensive regression test that fails if `which codex` returns to codex/SKILL.md - test/skill-e2e-plan.test.ts: updated summary regex Contributed by @mvanhorn via #1197. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(codex): surface non-zero exits so wrappers stop reading as silent stalls (#1467, #1327) When codex exits non-zero (parse errors, arg-shape breaks, model API errors that propagate as non-zero status), the calling agent previously saw an empty output and burned 30-60 minutes misdiagnosing as a silent model/API stall. The hang-detection block only caught exit 124 (the timeout-wrapper signal). Adds elif blocks in all four codex invocation sites (Review default, Challenge, Consult new-session, Consult resume) that: - Echo "[codex exit N] <stderr first line>" to stdout - Indent the first 20 stderr lines for inline context - Log codex_nonzero_exit telemetry tagged with the call site Contributed by @genisis0x via #1467. Closes #1327. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(design): disclose OpenAI key source + warn on cwd .env match (#1278, closes #1248) The design binary previously called process.env.OPENAI_API_KEY without checking where the key came from. If a user ran $D inside someone else's project that had OPENAI_API_KEY in its .env, the resulting generation billed that project's account. Silent and irreversible. Fix: resolveApiKeyInfo() returns both the key and its source. When the env-var path matches an OPENAI_API_KEY entry in the current directory's .env, .env.<NODE_ENV>, or .env.local file, we set a warning. requireApiKey() prints "Using OpenAI key from <source>" plus the warning before the run — never the key itself. Adds 6 unit tests covering: config-vs-env precedence, env-only (no match), env+cwd .env match, quoted/exported values, value-mismatch (no false positive), and the no-leak invariant for requireApiKey stderr output. Contributed by @jbetala7 via #1278. Closes #1248. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(browse): guard full-page screenshots against Anthropic vision API >2000px brick (#1214) Full-page screenshots of tall pages routinely exceeded 2000px on the longest dimension, silently bricking the agent's session: the resulting base64 reached the Anthropic vision API which rejected the oversized image, leaving the agent burning turns on a useless blob with no stderr trace from the browse side. Adds browse/src/screenshot-size-guard.ts as a shared helper: - guardScreenshotBuffer(buf) → downscales in-memory if max(w,h) > 2000 - guardScreenshotPath(path) → file-mode variant that rewrites in place - Aspect ratio preserved via sharp's resize fit:inside - Stderr diagnostic on any downscale so callers can see when it fired - Lazy sharp import so non-screenshot paths pay no startup cost Wires the guard into all three full-page callsites codex review flagged: - browse/src/snapshot.ts: annotated + heatmap fullPage captures - browse/src/meta-commands.ts: screenshot command (path + base64 fullPage modes) plus the responsive 3-viewport sweep - browse/src/write-commands.ts: prettyscreenshot fullPage path Covers seven unit cases (pass-through, downscale, aspect ratio, exactly-2000px edge, file-mode rewrite) plus a static invariant test that fails the build if any of the three callsites stops importing the guard. Closes #1214. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(security): add Node sidecar entry for L4 prompt-injection classifier (#1370) The L4 TestSavant classifier in browse/src/security-classifier.ts can't be imported into the compiled browse server (onnxruntime-node dlopen fails from Bun's compile extract dir per CLAUDE.md). The agent that used to host it (sidebar-agent.ts) was removed when the PTY proved out — leaving the classifier file shipped but with zero callers. Exactly the gap codex flagged in #1370. Adds browse/src/security-sidecar-entry.ts: a Node script that runs the classifier as a subprocess of the browse server. It reads NDJSON requests from stdin and writes id-correlated NDJSON responses to stdout, supporting: - op: "scan-page-content" — full L4 classifier scan - op: "ping" — liveness probe for the client's health check - op: "status" — classifier readiness (used by /pty-inject-scan to surface l4 { available: bool } in its response) Plus browse/src/find-security-sidecar.ts: a resolver that locates node + the bundled JS entry (browse/dist/security-sidecar.js, built in a follow-up package.json change) or falls back to the dev TS entry. Returns null cleanly when node isn't on PATH so the calling endpoint can degrade per D7 (extension WARN + user confirm). C17 of the security-stack wave. C18 adds the IPC client + lifecycle management; C19 wires the endpoint; C20 routes the extension through it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(security): sidecar IPC client with lifecycle + circuit breaker (#1370) Adds browse/src/security-sidecar-client.ts to manage the Node L4 classifier subprocess from the compiled browse server: - Lazy spawn on first scan; reuses the same process across requests - Id-correlated request/response via NDJSON over stdio - 5s default per-scan timeout; 64KB payload cap (short-circuits before spawn so oversized requests don't waste a process) - 3-in-10-minutes respawn cap → trips circuit breaker; subsequent scans throw immediately so the /pty-inject-scan endpoint can surface l4 { available: false } to the extension and degrade to WARN+confirm - process.on('exit') sends SIGTERM to the child for clean teardown - isSidecarAvailable() lets the endpoint probe before scan calls so the response shape reflects degraded mode honestly Unit tests cover the payload cap, the availability probe, and the breaker-doesn't-crash invariant under repeated rejected calls. C18 of the security-stack wave. C19 adds POST /pty-inject-scan; C20 routes the extension through it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(security): add POST /pty-inject-scan endpoint for pre-PTY-inject scans (#1370) The sidebar's gstackInjectToTerminal callers (toolbar Cleanup, Inspector "Send to Code") were piping page-derived text directly into the live claude PTY with ZERO classifier processing — the gap codex flagged in #1370. The documented sidebar security stack had a hole the size of every Cleanup-button click. Adds POST /pty-inject-scan to browse/src/server.ts: - Local-only binding (NOT in TUNNEL_PATHS — tunnel attempts get the general 404 path; never reaches the scan logic) - Root-token auth via existing validateAuth() — 401 on unauth - 64KB request cap → 413 + payload-too-large body - 5s scan timeout via sidecar client - URL-blocklist forced to BLOCK in PTY context (page-derived REPL input is higher-risk than ordinary tool output) - L4 ML classifier via the sidecar when available; degrades to WARN per D7 when sidecar is unavailable - Response goes through JSON.stringify(..., sanitizeReplacer) per v1.38.0.0 Unicode-egress hardening - Imports only from security-sidecar-client.ts, never directly from security-classifier.ts (which would brick the compiled Bun binary) Seven static-invariant tests pin the POST verb, auth gate, 64KB cap, tunnel-listener exclusion, sanitizeReplacer wrapping, l4 availability shape, and the no-direct-classifier-import rule. C19 of the security-stack wave. C20 routes the extension through it; C21 adds the invariant AST check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(extension): route gstackInjectToTerminal through /pty-inject-scan (#1370) Closes the documented-vs-shipped gap codex flagged in #1370. The sidebar's two PTY-injection call sites (Inspector "Send to Code" and toolbar Cleanup) now pre-scan via the new /pty-inject-scan endpoint before writing to the live claude REPL. Adds window.gstackScanForPTYInject(text, origin) to extension/sidepanel-terminal.js: - Async, returns { allow, verdict, reasons, l4 } - POST to /pty-inject-scan with the existing root-token auth - WARN+confirm on scan failure (network down, sidecar absent, etc.) rather than silent PASS — D7 honest-degradation gstackInjectToTerminal stays synchronous, returns boolean. Per D6: keeping the inject sync means existing `const ok = ...?.()` callers don't break, and the invariant test in test/extension-pty-inject-invariant.test.ts can statically pin that every call goes through the scan first. extension/sidepanel.js call sites updated: - inspectorSendBtn click → await scan, BLOCK drops + WARN prompts via window.confirm, PASS injects silently - runCleanup() → same flow. Static cleanup prompt always PASSes but still routes through scan to honor the invariant. C20 of the security-stack wave. C21 adds the static invariant test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(security): invariant — extension PTY inject must be scan-gated (#1370) Static-analysis invariant test that fails the build if any extension/.js path calls window.gstackInjectToTerminal without a preceding window.gstackScanForPTYInject in the same enclosing function. Closes the documented-vs-shipped gap codex demanded a machine check on. Rules: - Rule 1: any file that calls inject must also reference scan - Rule 2: in the enclosing function (function declaration, arrow, async (), event handler), a scan call must appear before the inject call by source position - Exemption: sidepanel-terminal.js (the file that DEFINES the inject function) is exempt from Rule 2 since the definition is not a call Plus two structural checks: - sidepanel-terminal.js defines both the inject and scan functions - inject stays SYNCHRONOUS (no `async` modifier) per D6 — async would silently break the `const ok = ...?.()` pattern at every caller C21 of the security-stack wave. The sidecar architecture (#1370) is complete: server-side L1-L3 + L4-via-sidecar (C17+C18+C19), extension pre-scan wiring (C20), and now the regression gate (C21). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> feat(browse): opt-in extended stealth mode with 6 detection-vector patches (#1112) Rebases @garrytan's PR #1112 (Apr 2026, abandoned) onto the current browse/src/stealth.ts contract. The existing minimal "codex narrowed" stealth (webdriver-mask + AutomationControlled launch arg) stays the default. PR #1112's six additional patches are added behind an opt-in GSTACK_STEALTH=extended env flag. Extended-mode patches (applied AFTER the default mask, in order): 1. delete navigator.webdriver from prototype (not just the getter — detectors check `"webdriver" in navigator`) 2. WebGL renderer spoof to Apple M1 Pro (SwiftShader was the #1 software-GPU tell in containers) 3. navigator.plugins returns a PluginArray-prototype-passing array with MimeType objects and namedItem() 4. window.chrome populated with chrome.app, chrome.runtime, chrome.loadTimes(), chrome.csi() with realistic shapes 5. navigator.mediaDevices backfilled when headless drops it 6. CDP cdc_-prefixed window globals cleared Why opt-in: the default mode's contract is fingerprint CONSISTENCY, which protects against detectors that flag spoofing mismatch. Extended mode actively lies about the environment; sites that reflect on these properties can break. Users who hit detection in default mode can flip GSTACK_STEALTH=extended for SannySoft 100% pass-rate. Twenty unit tests pin the env-flag semantics, all six patches' code presence, and the applyStealth wiring order. Live SannySoft pass-rate verification stays in the periodic-tier E2E suite. Contributed by @garrytan via #1112 (rebased — original PR opened before the codex-narrowed minimum landed; rebase preserves the narrowed default while adding the SannySoft-passing path as opt-in). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> test(fixtures): regenerate ship-SKILL.md golden baselines after C10-C13 + C16 templates Updates the three ship-SKILL.md golden baselines (claude, codex, factory hosts) to match the new shape produced by: - C10 #1209 codex argv (prompt + diff scope, no --base) - C11 #1492 merge-base diff (DIFF_BASE= preamble) - C13 #1197 command -v for codex detection - C12 + boundary preservation per regen-enforcing test Per CLAUDE.md SKILL.md workflow: edit the .tmpl, run gen:skill-docs, commit the regenerated outputs together. Goldens are part of the regen contract — without this commit, test/host-config.test.ts' golden-baseline checks fail with the diff codex review surfaced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): v1.41.0.0 — Daegu wave (24 bisect commits, 14 user-facing fixes) Bumps VERSION 1.40.0.0 → 1.41.0.0. CHANGELOG entry follows the release-summary format in CLAUDE.md: two-line headline, lead paragraph, "The numbers that matter" table, "What this means for builders" closer, then itemized Added/Changed/Fixed/For contributors with inline credit to every PR author and original issue reporter. Scale-aware bump per CLAUDE.md: 24 commits, ~6000 LOC net, substantial new capability across security (PTY sidecar wiring), install (Windows build chain), compat (gbrain 0.18-0.35, Codex CLI 0.130+), and quality (screenshot guard, design key disclosure, extended stealth opt-in). MINOR is the right call. Closes for users: #1567, #1559, #1569, #1346, #1418, #1538, #1537, #1530, #1457, #1561, #1554, #1479, #1503, #1248, #1214, #1370, #1327, #1193 pattern, #1152 pattern. Credit retained inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(find-browse): resolve source-checkout layout <git-root>/browse/dist/browse[.exe] windows-setup-e2e.yml runs `bun browse/src/find-browse.ts` against a freshly-built repo where binaries land at browse/dist/browse.exe (no .claude/skills/gstack/ install layout). The previous markers chain only matched .codex/.agents/.claude prefixed paths, so find-browse exited "not found" even when the binary was present. Adds a source-checkout fallback after the marker scan: if no installed layout resolves but <git-root>/browse/dist/browse[.exe] exists, return that. Three real callers hit this path: - gstack repo dev workflow before `./setup` runs - windows-setup-e2e.yml CI (the breakage that surfaced this) - make-pdf consumers running from a sibling source checkout Smoke-verified: a fresh git repo with browse/dist/browse on disk now resolves through the source-checkout branch (was returning null before this commit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): bump v1.41.0.0 → v1.42.0.0 to clear queue collision with #1574 The version-gate workflow flagged a collision: PR #1574 (garrytan/colombo-v3) already claims v1.41.0.0, and #1592 (fix/audit-critical-high-bugs) claims v1.41.1.0. Per CLAUDE.md's workspace-aware ship rule, queue-advancing past a claimed version within the same bump level is permitted — MINOR work landing on top of a queued MINOR still reads as MINOR relative to main. Util's suggested next slot is v1.42.0.0; taking it. CHANGELOG entry header bumped + dated 2026-05-19; entry body unchanged (same wave content, same credit list). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 07:35:01 -07:00

Author

SHA1

Message

Date

Garry Tan

a861c00cfa

v1.58.3.0 feat: gbrowser anti-detection Layer C stealth (#2047 )

* feat: Layer C stealth — chrome.*, Notification, per-install hardware, toString Proxy (gbrowser T1+T3+D6)

Three additions stacked into the existing applyStealth() init script
to close the visible automation tells that today push GBrowser users
into Google's /sorry/index captcha and similar:

T1 — Strip Playwright's automation default args:
  --enable-automation                              (kills "Chrome is being
                                                    controlled" infobar)
  --disable-popup-blocking, --disable-component-update,
  --disable-default-apps                           (Patchright's list — each
                                                    is a documented tell)

  Now centralized in STEALTH_IGNORE_DEFAULT_ARGS export, used by BOTH
  launchHeaded() and handoff() (the headless → headed re-launch path).

D6 — Drop "GStackBrowser" UA branding suffix:
  Real Chrome's UA ends `Safari/537.36`, not `Safari/537.36 GStackBrowser`.
  The branded suffix was a high-entropy classifier for any vendor that
  grep'd UA for known automation/test-browser strings. Branding still
  lives in the wrapper .app name + Dock icon + tray — does not need
  to leak via the UA string for the product to be "GBrowser." Resolves
  the "looks like Chrome but identifies as GStackBrowser" contradiction
  codex review #18 flagged.

T3 — Layer C init-script additions in stealth.ts:

  1. Function.prototype.toString Proxy (must run first). Wraps every
     patched getter / function in a WeakSet so they report
     `function NAME() { [native code] }` at every recursion depth,
     defeating the depth-3+ integrity check
     (fn.toString.toString.toString().includes('[native code]')).

  2. window.chrome.runtime / chrome.app / chrome.csi / chrome.loadTimes
     restoration with full enum shape (OnInstalledReason, PlatformArch,
     PlatformOs, etc.) + method bodies. Real Chrome ships these; their
     absence is universally checked. Vendor research (gbrowser plan
     deep-dive on Cloudflare + DataDome) confirmed both vendors probe
     this shape directly.

  3. Notification.permission aligned to 'default'. The existing inline
     addInitScript already spoofs permissions.query({name:'notifications'})
     to return 'prompt' — Notification.permission being 'denied' while
     Permissions returns 'prompt' is a cross-source inconsistency that
     detectors flag specifically.

  4. Per-install hardware values via GSTACK_HW_CONCURRENCY /
     GSTACK_DEVICE_MEMORY env vars (set by gbd's host_profile.go from
     system_profiler + sysctl). Reporting real host values within the
     Chrome shape avoids the cross-user GBrowser fingerprint cluster
     that hardcoded defaults would create. Codex review #10 flagged
     hardcoding as creating contradictions across Apple Silicon / Intel
     / UA-CH architecture.

  5. Selenium 25-global cleanup + PhantomJS + NightmareJS + Watir +
     Playwright (__pwInitScripts, __playwright__binding__) static-name
     deletion. The inline block continues to handle the dynamic
     cdc_/__webdriver/__selenium/__driver prefixes.

D7 (codex correction) kept: still do NOT fake navigator.plugins or
navigator.languages. Synthesizing those triggers MORE consistency
flags from modern fingerprinters than letting Chromium surface them
natively.

Test coverage:
- 15 new tests in stealth-layer-c.test.ts covering: launch-flag
  exports, script structure, toString-Proxy installs first, every
  spoof present, hardware values interpolated from input (not
  hardcoded), Selenium global cleanup spot-check, no GStackBrowser
  leak in stealth payload, backwards-compat exports preserved.
- All 8 existing stealth-webdriver tests still pass.
- All 2 existing browser-manager-unit tests still pass.

For GBrowser specifically: this is the gstack-side half of Phase 1 / T1
+ T3 + D6 in the anti-detection plan. The gbrowser repo's submodule
pointer bump will land alongside this.

* feat: buildGStackLaunchArgs — Pack 1 cmdline-switch construction for gbrowser

New stealth.ts export that turns the GSTACK_* env vars (already populated
by gbrowser's gbd from host_profile.go) into the --gstack-* cmdline
switches the Pack 1 Chromium patches read at WebGL getParameter,
NavigatorUA::userAgentData, NavigatorConcurrentHardware::hardwareConcurrency,
and NavigatorDeviceMemory::deviceMemory time.

Wired into all three launchArgs sites: launch() (headless), launchHeaded()
(real product path), and handoff() (headless → headed re-launch).

Mapping:
  GSTACK_GPU_VENDOR      → --gstack-gpu-vendor
  GSTACK_GPU_RENDERER    → --gstack-gpu-renderer
  GSTACK_PLATFORM        → --gstack-ua-platform (with mapping:
                            MacARM/MacIntel → macOS, Win32 → Windows,
                            Linux x86_64 → Linux)
  GSTACK_GPU_CHIPSET     → --gstack-ua-model
  GSTACK_HW_CONCURRENCY  → --gstack-hw-concurrency
  GSTACK_DEVICE_MEMORY   → --gstack-device-memory

Each switch is emitted only when its env var is non-empty — empty
values fall through to the patch's "no override" path, which returns
the real Chromium native value. Safe to ship on Chromium builds
without the Pack 1 patches applied (zero behavior change).

The patches themselves live in the gbrowser repo at chromium/patches/
{webgl-vendor-spoof,ua-client-hints-stealth,worker-navigator-stealth}.patch.
Both halves (gstack arg construction + gbrowser C++ patches) must
land + Chromium rebuild before the spoof reaches the WebGL/UA-CH/
hardware accessors. Currently dormant until then.

Tests (browse/test/stealth-layer-c.test.ts):
  7 new buildGStackLaunchArgs cases — empty env, all-populated, partial,
  platform mapping (MacARM/MacIntel/Win32/Linux), unrecognized platform
  fallthrough, vendor-with-spaces escape-safety.
  All 32 stealth/browser-manager tests pass.

For GBrowser specifically: gstack-side half of the Pack 1 flag plumbing.
gbrowser repo will bump the submodule pointer to this commit, then re-run
bun run test/anti-bot/evidence-run.ts to verify creepjs's "33% headless"
score drops after Pack 1 + Chromium rebuild.

* feat: buildGStackLaunchArgs adds --gstack-suppress-prepare-stack-trace

Pack 2 / B11 flag plumbing for the new
error-preparestacktrace-stealth.patch in gbrowser/chromium/patches/.

Always emit --gstack-suppress-prepare-stack-trace unless the caller
explicitly sets GSTACK_CDP_STEALTH=off in the environment. Off by
default in patch behavior (no-op without the C++ patch), so this is
safe on stock Playwright Chromium too.

Closes the Cloudflare canary trick where a page sets
Error.prepareStackTrace and watches for it to fire during CDP
serialization of a logged Error object.

Tests:
  All 33 stealth/browser-manager tests pass. New cases:
  - GSTACK_CDP_STEALTH=off disables suppression
  - empty env still emits the always-on flag (count=1)
  - all-populated env now emits 7 flags (was 6)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): enable Chromium sandbox on headed launchPersistentContext

Mirrors v1.40.0.1 from main lineage (PR #1617). Cherry-picked onto
gbrowser-anti-detection so the GBrowser submodule can consume the fix
without waiting for main to merge.

Playwright auto-adds --no-sandbox whenever chromiumSandbox !== true
(playwright-core/lib/server/chromium/chromium.js:291-292). The headless
chromium.launch() site set the option; the two headed sites
(launchHeaded() and handoff()) did not. Every headed launch on macOS
and Linux showed Chromium's yellow "unsupported command-line flag:
--no-sandbox" infobar.

shouldEnableChromiumSandbox() centralizes the Win32 / CI / CONTAINER /
root heuristic that previously lived only in the headless path's
explicit --no-sandbox push at :225. All three launch sites now use the
helper, and six unit tests pin the policy across darwin, linux, win32,
CI, CONTAINER, and root.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v1.40.0.2 fix(browse): Cmd+Q on managed Chromium stops triggering supervisor respawn

Three browser.on('disconnected') handlers in browse/src/browser-manager.ts
(launch, launchHeaded, handoff) each exited with a non-zero code on every
disconnect, regardless of cause. Process supervisors that consume our exit
code (gbrowser's gbd HealthMonitor in cmd/gbd/health.go) treated user
Cmd+Q identical to a Chromium crash and respawned with exponential
backoff, so the visible browser kept reappearing after the user closed it.

Add resolveDisconnectCause(browser) that reads the underlying ChildProcess
exitCode + signalCode (waiting up to 1s for the exit event if the
disconnected event fired first). Exit code 0 + no signal = clean user
quit; anything else = crash, signal-kill, or OOM.

Wire the resolver into all three disconnect handlers:
- launch() (headless): clean → exit 0, crash → exit 1 (was always 1)
- launchHeaded() (headed): clean → exit 0, crash → exit 2 (was always 2)
  onDisconnect() cleanup callback still runs in both cases.
- handoff() (re-launch): same as launch() via the helper.

Preserve the per-path crash codes (1 vs 2) so any supervisor that
differentiated headed vs headless crashes keeps working.

Seven new unit tests in browse-manager-unit.test.ts cover the resolver
across already-exited, signal-killed (SIGSEGV / SIGKILL), async exits,
and null-browser inputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): apply stealth on every launch path + share automation-artifact cleanup

handoff() built cmdline args but never called applyStealth, so a handed-off
browser had no JS stealth (no webdriver mask, no chrome.* shape, no toString
proxy). And the cdc_/Permissions cleanup shim lived inline in launchHeaded()
only, so headless launch() reported Notification.permission='default' without
the matching permissions.query='prompt' answer — the exact cross-source
inconsistency the shim exists to prevent.

Move the cleanup into AUTOMATION_ARTIFACT_CLEANUP_SCRIPT inside applyStealth so
all three launch paths (launch, launchHeaded, handoff) get identical stealth,
and call applyStealth(newContext) in handoff() before restoreState() navigates.

A static tripwire in browser-manager-unit.test.ts fails CI if any launch path
drops the applyStealth call again.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(browse): make --gstack-suppress-prepare-stack-trace opt-in, not default-on

buildGStackLaunchArgs() pushed the flag unless GSTACK_CDP_STEALTH=off, i.e.
on-by-default — contradicting its own comment ("off by default, only for
gbrowser builds"). The switch is read by a C++ patch that only exists in
gbrowser; on stock Playwright Chromium it is an unknown switch.

Flip to opt-in: emit only when GSTACK_CDP_STEALTH is on/1/true. gbd opts in by
exporting GSTACK_CDP_STEALTH=on; stock installs leave it unset so the flag
never reaches a Chromium that wouldn't understand it. Comment now matches code.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(browse): correct stale stealth comments

The file-level stealth.ts docstring claimed "we DON'T fake navigator.plugins"
while the same file now ships EXTENDED_STEALTH_SCRIPT, which does fake plugins
when GSTACK_STEALTH=extended. Clarify that Layer C (the always-on default)
doesn't fake plugins and the opt-in extended mode does, as the documented
"actively lies, may break sites" escape hatch.

Also fix the launch()/launchHeaded() comments that said "mask navigator.webdriver
only" — applyStealth (Layer C) also restores window.chrome.*, aligns
Notification.permission, and sets per-install hardware.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(browse): runtime + extended-mode coverage for the stealth blend

The stealth tests were all static string-shape assertions; nothing executed
the script in a real page. Add real-Chromium runtime checks via applyStealth +
page.evaluate:

- Layer C runtime: window.chrome.* rich shape, Notification.permission='default'
  paired with permissions.query notifications='prompt' (guards the shim now
  running on every path), and patched getters reporting [native code].
- Per-install hardware: navigator.hardwareConcurrency/deviceMemory reflect the
  GSTACK_* env profile.
- Extended-mode blend: navigator.plugins is faked when GSTACK_STEALTH=extended,
  Layer C still wins window.chrome.runtime, and navigator.webdriver stays false
  (own-prop getter survives extended's prototype delete).
- Persistent-context (launchHeaded/handoff) parity now uses a page created
  AFTER applyStealth — the old test checked pages()[0], which predates the init
  script, so webdriver was false only via the launch arg, not Layer C.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(browse): handoff() + launchHeaded() spread the shared STEALTH_LAUNCH_ARGS

handoff() built its launch args from only ['--hide-crash-restore-bubble',
...buildGStackLaunchArgs()], omitting STEALTH_LAUNCH_ARGS — so a handed-off
browser kept the --disable-blink-features=AutomationControlled tell that
launch() and launchHeaded() strip. launchHeaded() also hardcoded the flag as a
literal. Both now spread the shared constant, so the AutomationControlled flag
lives in one place across all three launch paths.

Tripwires: STEALTH_LAUNCH_ARGS spread into >= 3 sites (no inline literal) and
STEALTH_IGNORE_DEFAULT_ARGS wired into both persistent-context paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(browse): drop dead HostProfile.platform, export test internals

HostProfile.platform was set by readHostProfile but never read by
buildStealthScript — the platform spoof is owned by the UA-CH cmdline switch in
buildGStackLaunchArgs (which reads GSTACK_PLATFORM directly). Remove the dead
field. Export readHostProfile and AUTOMATION_ARTIFACT_CLEANUP_SCRIPT so their
clamp/shape invariants can be unit-tested. Correct the stale "25 Selenium
globals" count comment and note the extended cdc_ scan is redundant-but-retained
for standalone use.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(browse): cover readHostProfile clamp, toString depth-3, chrome.* calls

Pre-landing review coverage gaps:
- readHostProfile clamps 0/negative/NaN/missing env to 8 (a deviceMemory=0 or
  NaN would be a glaring bot tell) — now asserted.
- toString proxy survives the depth-3 recursion trick
  (fn.toString.toString.toString().includes('[native code]')), the headline
  claim that was only tested at depth-1.
- chrome.csi() and chrome.loadTimes() are invoked (not just typeof-checked) and
  runtime.connect() throws the native-shaped "No matching signature" error.
- AUTOMATION_ARTIFACT_CLEANUP_SCRIPT static shape (cdc_/__webdriver strip +
  notifications->prompt) as a hermetic backup for the live-Chromium pairing test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(browse): recreateContext() re-applies stealth (closes 4th un-stealth path)

useragent and viewport --scale route through recreateContext(), which rebuilds
the BrowserContext via newContext() — a fresh context with no init scripts. It
never called applyStealth, so a routine useragent/viewport-scale command
silently dropped webdriver masking, window.chrome.* shape, hardware spoof, and
the cdc/Permissions cleanup on every restored page. Caught by the cross-model
adversarial review (Codex) after the Claude pass and eng review missed it.

Both the main and fallback paths now call applyStealth before any page is
created. The launch-path tripwire is raised to >= 4 sites and now asserts the
recreateContext() body specifically, so the regression class can't recur.

Also documents the load-bearing trust assumption on buildGStackLaunchArgs /
readHostProfile (GSTACK_* must be gbd-sourced, never page/remote data — the
injection-safety argument depends on it) and the notifications-permission
spoof tradeoff.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.58.3.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: sync browser stealth docs to Layer C (v1.58.3.0)

BROWSER.md "Stealth scope" still described the default as navigator.webdriver
masking only; Layer C is now the always-on default across all four
context-creation paths. Update the stealth-scope prose, the "What GStack
Browser means" blurb (stock-Chrome UA, no GStackBrowser suffix, captchas can
still get through at the CDP layer), the stealth.ts source-map line, and the
env-vars table (GSTACK_STEALTH, GSTACK_CDP_STEALTH, GSTACK_GPU_*, GSTACK_PLATFORM,
GSTACK_HW_CONCURRENCY/GSTACK_DEVICE_MEMORY + the explicit --gstack-* switches and
ignoreDefaultArgs stripping). Correct the stale "narrows to navigator.webdriver
masking only" premise on the open CDP-patch TODO (the TODO itself stays open —
the CDP-protocol layer is still unaddressed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-06-18 10:45:05 -07:00

Garry Tan

7ca04d8ef0

v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594 )

* fix(gstack-paths): guard CLAUDE_PLUGIN_DATA against cross-plugin contamination (#1569)

gstack-paths previously trusted CLAUDE_PLUGIN_DATA as a fallback for
GSTACK_STATE_ROOT whenever GSTACK_HOME was unset. When another plugin
(e.g. Codex) persists its own CLAUDE_PLUGIN_DATA into the session env
via CLAUDE_ENV_FILE, gstack picked it up and wrote checkpoints,
analytics, and learnings into that plugin's directory. Anyone with the
Codex plugin installed alongside gstack hit this silently.

Fix: guard the CLAUDE_PLUGIN_DATA branch so it only fires when
CLAUDE_PLUGIN_ROOT confirms we're running as the gstack plugin (path
contains "gstack"). Skill installs fall through to \$HOME/.gstack.

Contributed by @ElliotDrel via #1570. Closes #1569.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gbrain-sync): sourceLocalPath handles wrapped {sources:[...]} shape from gbrain v0.20+

gbrain v0.20+ changed `gbrain sources list --json` to return
{sources: [...]} instead of a flat array. sourceLocalPath crashed
upstream with `list.find is not a function` on every /sync-gbrain
invocation against modern gbrain. Accept both shapes for
forward/backward compat, matching probeSource/sourcePageCount in
lib/gbrain-sources.ts.

Contributed by @jakehann11 via #1571. Closes #1567. Supersedes #1564
(@tonyjzhou, same fix, different shape — credit retained).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(brain-context-load): probe gbrain via execFile, not shell builtin (#1559)

gbrainAvailable() used `execFileSync("command", ["-v", "gbrain"])`,
which fails in any environment where the `command` builtin isn't on
the spawned process's PATH (most non-interactive shells). The probe
then reported gbrain as missing even when it was installed, and
context-load silently skipped vector/list queries.

Fix: probe `gbrain --version` directly with a 500ms timeout (matching
the rest of the file's MCP_TIMEOUT_MS). Same semantics, works
everywhere execFile works.

Contributed by @jbetala7 via #1560. Closes #1559.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain-doctor): pin schema_version:2 doctor parse path (#1418)

Adds an exec-path regression test that runs a fake gbrain shim emitting
the v0.25+ doctor JSON shape (schema_version: 2, status: "warnings",
exit 1 for health_score < 100, no top-level `engine` field). Confirms
freshDetectEngineTier recovers stdout from the non-zero exit and falls
back to GBRAIN_HOME/config.json for the engine label.

The pre-existing test for #1415 only stripped gbrain from PATH; this
test exercises the actual doctor parse path, closing the gap that
codex's plan review flagged.

Also documents the schema_version separation in
lib/gbrain-local-status.ts: the local CacheEntry stays at version 1,
distinct from the doctor-output schema_version which we accept across
versions in gstack-memory-helpers.

Closes #1418 (credit @mvanhorn for surfacing the doctor + schema_v2
collapse). The fix landed pre-emptively in v1.29.x; this commit pins
it with a stronger test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(memory-ingest): pin put_page regression + scrub stale name from --help and comments (#1346)

#1346 reported that gstack-memory-ingest still called the renamed
gbrain put_page subcommand on gbrain v0.18+. The actual code migrated
to `gbrain put` and later to batch `gbrain import <dir>` before this
report landed — only documentation lag remained.

This commit:
- Updates the --help string ("Skip gbrain put calls (still updates
  state file)") so user-facing docs match the shipped subcommand
- Updates two inline comments that still referenced the old name
- Adds test/memory-ingest-no-put_page.test.ts: a regression pin that
  strips comments from bin/gstack-memory-ingest.ts and fails the build
  if "put_page" appears in any active code or string literal, plus a
  sanity check that the file still calls a supported gbrain page-write
  verb (put or import)

Closes #1346. Reporter @kylma-code surfaced the doc lag; the original
code migration credit is on the v1.27.x wave.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(resolvers): rewrite all gbrain put_page instructions to canonical put <slug>

scripts/resolvers/gbrain.ts emitted user-facing copy-paste instructions
using the renamed `gbrain put_page` subcommand across 10 skills
(office-hours, investigate, plan-ceo-review, retro, plan-eng-review,
ship, cso, design-consultation, fallback, entity-stub). Every gstack
user copying those snippets hit "unknown command: put_page" on gbrain
v0.18+.

This commit:
- Rewrites all 10 instruction templates to use `gbrain put <slug>
  --content "$(cat <<EOF...EOF)"` with title/tags moved into YAML
  frontmatter inside --content, matching the v0.18+ subcommand shape
- Updates README.md and USING_GBRAIN_WITH_GSTACK.md "common commands"
  table to reference `gbrain put` and `gbrain get`
- Adds test/resolvers-gbrain-put-rewrite.test.ts pinning two
  invariants: (a) resolver source ships only canonical instructions,
  (b) every tracked SKILL.md file is free of `gbrain put_page`

CHANGELOG entries are deliberately left untouched (historical record).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build): extract package.json build to scripts/build.sh for Windows Bun compat (#1538, #1537, #1530, #1457, #1561)

Bun's Windows shell parser rejects multiple constructs the inline
package.json build chain used: brace groups `{ cmd; }`, subshells with
redirection `( git ... ) > path/.version`, and (in Bun 1.3.x) subshells
near redirections in general. Every Windows install + every
auto-upgrade since v1.34.2.0 has failed on `bun run build`.

Extracts the build chain to scripts/build.sh and the .version writes to
scripts/write-version-files.sh. POSIX-portable, no Bun shell parsing
involved. Also adds Windows-specific bun.exe handling for non-ASCII
PATHs (a separate Windows footgun where Bun's --compile fails when the
binary lives under a path with non-ASCII chars).

Updates test/build-script-shell-compat.test.ts to assert the new shape:
no subshells with redirections anywhere in the build chain, and build
delegates to scripts/build.sh which delegates .version writes.

Contributed by @Charlie-El via #1544. Supersedes #1531 (@scarson, fixed
in build helper), #1480 (@mikepsinn, partial overlap), #1460
(@realcarsonterry, brace-group fix subsumed) — credit retained.
Closes #1538, #1537, #1530, #1457, #1561.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): .exe glob in .gitignore + .exe extension resolution in find-browse (#1554)

bun build --compile on Windows appends .exe to the output filename,
producing browse.exe instead of browse. find-browse's existsSync probe
only checked the bare path and returned null on Windows even when the
binary was correctly built. .gitignore similarly only excluded the
bare bin/gstack-global-discover path, leaving the .exe variant
tracked.

This commit:
- .gitignore: changes `bin/gstack-global-discover` →
  `bin/gstack-global-discover*` so the Windows .exe variant is ignored
- browse/src/find-browse.ts: adds isExecutable + findExecutable helpers
  that fall back to .exe/.cmd/.bat probing on Windows, mirroring the
  same helper already in make-pdf/src/browseClient.ts and pdftotext.ts

Contributed by @Mike-E-Log via #1554.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(windows): add fresh-install E2E gate that runs bun run build on windows-latest

Adds .github/workflows/windows-setup-e2e.yml as the gate that catches
Bun shell-parser regressions in the build chain before they reach
users. Triggers on PRs touching package.json, scripts/build.sh,
scripts/write-version-files.sh, setup, browse cli/find-browse, or
gstack-paths.

What it verifies:
1. bun run build completes on Windows (the previously-broken path that
   #1538/#1537/#1530/#1457/#1561 reported)
2. All compiled binaries land on disk (browse.exe, find-browse.exe,
   design.exe, gstack-global-discover.exe)
3. find-browse resolves to the .exe variant on Windows (regression
   gate for #1554)
4. gstack-paths returns non-empty GSTACK_STATE_ROOT/PLAN_ROOT/TMP_ROOT
   on Windows (regression gate for #1570)

Complements the existing windows-free-tests.yml (curated unit subset);
this new workflow exercises the install path itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(codex): move diff scope into prompt instead of --base (Codex CLI 0.130+ argv conflict) (#1209)

Codex CLI ≥ 0.130.0 rejects passing a custom prompt and --base together
(mutually exclusive at argv level). Every /codex review, /review, and
/ship structured Codex review call ended with an argv error before the
model ran.

Fix: scope the diff in prompt text using
"Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD"
instead of `--base <base>`. Preserves the filesystem boundary
instruction across all invocations and keeps Codex's review prompt
tuning.

Touches:
- codex/SKILL.md.tmpl + regenerated codex/SKILL.md
- scripts/resolvers/review.ts + regenerated review/SKILL.md, ship/SKILL.md
- test/gen-skill-docs.test.ts: new regression that fails if any of the
  five known files still contain the prompt+--base shape
- test/skill-validation.test.ts: corresponding negative + positive pin
  on the rendered SKILL.md files

Contributed by @jbetala7 via #1209. Closes #1479. Supersedes #1527
(@mvanhorn — same intent, different patch shape, CONFLICTING) and
#1449 (@Gujiassh — broader refactor, CONFLICTING). Credit retained
in CHANGELOG.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): diff from git merge-base, not git diff origin/<base> (#1492)

git diff origin/<base> shows everything since the common ancestor in
both directions — it includes commits that landed on origin/<base>
after this branch was created as deletions. That made /review and
/ship's pre-landing structured review report inflated diff totals and
flagged "removed" code that was actually still present in the working
tree.

Fix: compute DIFF_BASE via git merge-base origin/<base> HEAD and diff
the working tree against that point. Same coverage of uncommitted
edits, no phantom deletions from out-of-order base advancement.

Applies to /review's Step 1 (diff existence check), Step 3 (get the
diff), the build-on-intent scope-creep check, the structured review
DIFF_INS/DIFF_DEL stats, and the Claude adversarial subagent prompt.
Same change flows into ship/SKILL.md via the shared resolver.

Touches:
- review/SKILL.md.tmpl + regenerated review/SKILL.md, ship/SKILL.md
- scripts/resolvers/review.ts
- scripts/resolvers/review-army.ts

Contributed by @mvanhorn via #1492.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(codex): pin filesystem-boundary preservation across all codex review surfaces (#1503, #1522)

#1503 reported that the bare codex review --base path stripped the
filesystem boundary instruction, letting Codex spend tokens reading
.claude/skills/ and agents/. #1522 proposed adding a skill-path
detector that switched to the custom-instructions route when the diff
touched skill files.

After C10 (#1209) restructured codex review to always carry the
boundary in the prompt (the prompt+--base argv conflict forced the
restructure), the skill-path detector becomes redundant — every
default call already preserves the boundary.

This commit pins the post-#1209 invariant with a test that fails the
build if any future refactor strips the boundary from codex/SKILL.md,
review/SKILL.md, or ship/SKILL.md. Closes #1503 by regression test.

#1522 (@genisis0x) is superseded by #1209 (the prompt rewrite covers
its safety concern); credit retained in CHANGELOG.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): use command -v instead of which for codex detection (#1197)

`which` is not on PATH in every shell — some Windows shells, BusyBox-
only containers, and minimal CI images all fail when skills probe
codex availability via `which codex`. `command -v` is a POSIX builtin
and always available where the skill is running.

Touched:
- codex/SKILL.md.tmpl: CODEX_BIN=$(command -v codex || echo "")
- scripts/resolvers/review.ts and scripts/resolvers/design.ts:
  3 + 3 sites each rewritten to `command -v codex >/dev/null 2>&1`
- Regenerated all 10 affected SKILL.md files (codex, review, ship,
  design-consultation, design-review, office-hours, plan-ceo-review,
  plan-design-review, plan-devex-review, plan-eng-review)
- test/skill-validation.test.ts: updated pin + defensive regression
  test that fails if `which codex` returns to codex/SKILL.md
- test/skill-e2e-plan.test.ts: updated summary regex

Contributed by @mvanhorn via #1197.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(codex): surface non-zero exits so wrappers stop reading as silent stalls (#1467, #1327)

When codex exits non-zero (parse errors, arg-shape breaks, model API
errors that propagate as non-zero status), the calling agent
previously saw an empty output and burned 30-60 minutes misdiagnosing
as a silent model/API stall. The hang-detection block only caught
exit 124 (the timeout-wrapper signal).

Adds elif blocks in all four codex invocation sites (Review default,
Challenge, Consult new-session, Consult resume) that:
- Echo "[codex exit N] <stderr first line>" to stdout
- Indent the first 20 stderr lines for inline context
- Log codex_nonzero_exit telemetry tagged with the call site

Contributed by @genisis0x via #1467. Closes #1327.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(design): disclose OpenAI key source + warn on cwd .env match (#1278, closes #1248)

The design binary previously called process.env.OPENAI_API_KEY without
checking where the key came from. If a user ran $D inside someone
else's project that had OPENAI_API_KEY in its .env, the resulting
generation billed that project's account. Silent and irreversible.

Fix: resolveApiKeyInfo() returns both the key and its source. When the
env-var path matches an OPENAI_API_KEY entry in the current
directory's .env, .env.<NODE_ENV>, or .env.local file, we set a
warning. requireApiKey() prints "Using OpenAI key from <source>" plus
the warning before the run — never the key itself.

Adds 6 unit tests covering: config-vs-env precedence, env-only (no
match), env+cwd .env match, quoted/exported values, value-mismatch
(no false positive), and the no-leak invariant for requireApiKey
stderr output.

Contributed by @jbetala7 via #1278. Closes #1248.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): guard full-page screenshots against Anthropic vision API >2000px brick (#1214)

Full-page screenshots of tall pages routinely exceeded 2000px on the
longest dimension, silently bricking the agent's session: the
resulting base64 reached the Anthropic vision API which rejected the
oversized image, leaving the agent burning turns on a useless blob
with no stderr trace from the browse side.

Adds browse/src/screenshot-size-guard.ts as a shared helper:
- guardScreenshotBuffer(buf) → downscales in-memory if max(w,h) > 2000
- guardScreenshotPath(path) → file-mode variant that rewrites in place
- Aspect ratio preserved via sharp's resize fit:inside
- Stderr diagnostic on any downscale so callers can see when it fired
- Lazy sharp import so non-screenshot paths pay no startup cost

Wires the guard into all three full-page callsites codex review
flagged:
- browse/src/snapshot.ts: annotated + heatmap fullPage captures
- browse/src/meta-commands.ts: screenshot command (path + base64
  fullPage modes) plus the responsive 3-viewport sweep
- browse/src/write-commands.ts: prettyscreenshot fullPage path

Covers seven unit cases (pass-through, downscale, aspect ratio,
exactly-2000px edge, file-mode rewrite) plus a static invariant test
that fails the build if any of the three callsites stops importing the
guard.

Closes #1214.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): add Node sidecar entry for L4 prompt-injection classifier (#1370)

The L4 TestSavant classifier in browse/src/security-classifier.ts
can't be imported into the compiled browse server (onnxruntime-node
dlopen fails from Bun's compile extract dir per CLAUDE.md). The agent
that used to host it (sidebar-agent.ts) was removed when the PTY
proved out — leaving the classifier file shipped but with zero
callers. Exactly the gap codex flagged in #1370.

Adds browse/src/security-sidecar-entry.ts: a Node script that runs the
classifier as a subprocess of the browse server. It reads NDJSON
requests from stdin and writes id-correlated NDJSON responses to
stdout, supporting:
  - op: "scan-page-content" — full L4 classifier scan
  - op: "ping" — liveness probe for the client's health check
  - op: "status" — classifier readiness (used by /pty-inject-scan to
    surface l4 { available: bool } in its response)

Plus browse/src/find-security-sidecar.ts: a resolver that locates
node + the bundled JS entry (browse/dist/security-sidecar.js, built in
a follow-up package.json change) or falls back to the dev TS entry.
Returns null cleanly when node isn't on PATH so the calling endpoint
can degrade per D7 (extension WARN + user confirm).

C17 of the security-stack wave. C18 adds the IPC client + lifecycle
management; C19 wires the endpoint; C20 routes the extension through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): sidecar IPC client with lifecycle + circuit breaker (#1370)

Adds browse/src/security-sidecar-client.ts to manage the Node L4
classifier subprocess from the compiled browse server:

- Lazy spawn on first scan; reuses the same process across requests
- Id-correlated request/response via NDJSON over stdio
- 5s default per-scan timeout; 64KB payload cap (short-circuits before
  spawn so oversized requests don't waste a process)
- 3-in-10-minutes respawn cap → trips circuit breaker; subsequent
  scans throw immediately so the /pty-inject-scan endpoint can surface
  l4 { available: false } to the extension and degrade to WARN+confirm
- process.on('exit') sends SIGTERM to the child for clean teardown
- isSidecarAvailable() lets the endpoint probe before scan calls so
  the response shape reflects degraded mode honestly

Unit tests cover the payload cap, the availability probe, and the
breaker-doesn't-crash invariant under repeated rejected calls.

C18 of the security-stack wave. C19 adds POST /pty-inject-scan; C20
routes the extension through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): add POST /pty-inject-scan endpoint for pre-PTY-inject scans (#1370)

The sidebar's gstackInjectToTerminal callers (toolbar Cleanup,
Inspector "Send to Code") were piping page-derived text directly into
the live claude PTY with ZERO classifier processing — the gap codex
flagged in #1370. The documented sidebar security stack had a hole
the size of every Cleanup-button click.

Adds POST /pty-inject-scan to browse/src/server.ts:
- Local-only binding (NOT in TUNNEL_PATHS — tunnel attempts get the
  general 404 path; never reaches the scan logic)
- Root-token auth via existing validateAuth() — 401 on unauth
- 64KB request cap → 413 + payload-too-large body
- 5s scan timeout via sidecar client
- URL-blocklist forced to BLOCK in PTY context (page-derived REPL
  input is higher-risk than ordinary tool output)
- L4 ML classifier via the sidecar when available; degrades to WARN
  per D7 when sidecar is unavailable
- Response goes through JSON.stringify(..., sanitizeReplacer) per
  v1.38.0.0 Unicode-egress hardening
- Imports only from security-sidecar-client.ts, never directly from
  security-classifier.ts (which would brick the compiled Bun binary)

Seven static-invariant tests pin the POST verb, auth gate, 64KB cap,
tunnel-listener exclusion, sanitizeReplacer wrapping, l4 availability
shape, and the no-direct-classifier-import rule.

C19 of the security-stack wave. C20 routes the extension through it;
C21 adds the invariant AST check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(extension): route gstackInjectToTerminal through /pty-inject-scan (#1370)

Closes the documented-vs-shipped gap codex flagged in #1370. The
sidebar's two PTY-injection call sites (Inspector "Send to Code" and
toolbar Cleanup) now pre-scan via the new /pty-inject-scan endpoint
before writing to the live claude REPL.

Adds window.gstackScanForPTYInject(text, origin) to
extension/sidepanel-terminal.js:
- Async, returns { allow, verdict, reasons, l4 }
- POST to /pty-inject-scan with the existing root-token auth
- WARN+confirm on scan failure (network down, sidecar absent, etc.)
  rather than silent PASS — D7 honest-degradation

gstackInjectToTerminal stays synchronous, returns boolean. Per D6:
keeping the inject sync means existing `const ok = ...?.()` callers
don't break, and the invariant test in
test/extension-pty-inject-invariant.test.ts can statically pin that
every call goes through the scan first.

extension/sidepanel.js call sites updated:
- inspectorSendBtn click → await scan, BLOCK drops + WARN prompts via
  window.confirm, PASS injects silently
- runCleanup() → same flow. Static cleanup prompt always PASSes but
  still routes through scan to honor the invariant.

C20 of the security-stack wave. C21 adds the static invariant test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(security): invariant — extension PTY inject must be scan-gated (#1370)

Static-analysis invariant test that fails the build if any
extension/*.js path calls window.gstackInjectToTerminal without a
preceding window.gstackScanForPTYInject in the same enclosing
function. Closes the documented-vs-shipped gap codex demanded a
machine check on.

Rules:
- Rule 1: any file that calls inject must also reference scan
- Rule 2: in the enclosing function (function declaration, arrow,
  async (), event handler), a scan call must appear before the inject
  call by source position
- Exemption: sidepanel-terminal.js (the file that DEFINES the inject
  function) is exempt from Rule 2 since the definition is not a call

Plus two structural checks:
- sidepanel-terminal.js defines both the inject and scan functions
- inject stays SYNCHRONOUS (no `async` modifier) per D6 — async would
  silently break the `const ok = ...?.()` pattern at every caller

C21 of the security-stack wave. The sidecar architecture (#1370) is
complete: server-side L1-L3 + L4-via-sidecar (C17+C18+C19), extension
pre-scan wiring (C20), and now the regression gate (C21).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(browse): opt-in extended stealth mode with 6 detection-vector patches (#1112)

Rebases @garrytan's PR #1112 (Apr 2026, abandoned) onto the current
browse/src/stealth.ts contract. The existing minimal "codex narrowed"
stealth (webdriver-mask + AutomationControlled launch arg) stays the
default. PR #1112's six additional patches are added behind an opt-in
GSTACK_STEALTH=extended env flag.

Extended-mode patches (applied AFTER the default mask, in order):
  1. delete navigator.webdriver from prototype (not just the getter —
     detectors check `"webdriver" in navigator`)
  2. WebGL renderer spoof to Apple M1 Pro (SwiftShader was the #1
     software-GPU tell in containers)
  3. navigator.plugins returns a PluginArray-prototype-passing array
     with MimeType objects and namedItem()
  4. window.chrome populated with chrome.app, chrome.runtime,
     chrome.loadTimes(), chrome.csi() with realistic shapes
  5. navigator.mediaDevices backfilled when headless drops it
  6. CDP cdc_*-prefixed window globals cleared

Why opt-in: the default mode's contract is fingerprint CONSISTENCY,
which protects against detectors that flag spoofing mismatch. Extended
mode actively lies about the environment; sites that reflect on these
properties can break. Users who hit detection in default mode can flip
GSTACK_STEALTH=extended for SannySoft 100% pass-rate.

Twenty unit tests pin the env-flag semantics, all six patches' code
presence, and the applyStealth wiring order. Live SannySoft pass-rate
verification stays in the periodic-tier E2E suite.

Contributed by @garrytan via #1112 (rebased — original PR opened
before the codex-narrowed minimum landed; rebase preserves the
narrowed default while adding the SannySoft-passing path as opt-in).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(fixtures): regenerate ship-SKILL.md golden baselines after C10-C13 + C16 templates

Updates the three ship-SKILL.md golden baselines (claude, codex,
factory hosts) to match the new shape produced by:
- C10 #1209 codex argv (prompt + diff scope, no --base)
- C11 #1492 merge-base diff (DIFF_BASE= preamble)
- C13 #1197 command -v for codex detection
- C12 + boundary preservation per regen-enforcing test

Per CLAUDE.md SKILL.md workflow: edit the .tmpl, run gen:skill-docs,
commit the regenerated outputs together. Goldens are part of the
regen contract — without this commit, test/host-config.test.ts'
golden-baseline checks fail with the diff codex review surfaced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v1.41.0.0 — Daegu wave (24 bisect commits, 14 user-facing fixes)

Bumps VERSION 1.40.0.0 → 1.41.0.0. CHANGELOG entry follows the
release-summary format in CLAUDE.md: two-line headline, lead
paragraph, "The numbers that matter" table, "What this means for
builders" closer, then itemized Added/Changed/Fixed/For contributors
with inline credit to every PR author and original issue reporter.

Scale-aware bump per CLAUDE.md: 24 commits, ~6000 LOC net,
substantial new capability across security (PTY sidecar wiring),
install (Windows build chain), compat (gbrain 0.18-0.35, Codex CLI
0.130+), and quality (screenshot guard, design key disclosure,
extended stealth opt-in). MINOR is the right call.

Closes for users: #1567, #1559, #1569, #1346, #1418, #1538, #1537,
#1530, #1457, #1561, #1554, #1479, #1503, #1248, #1214, #1370, #1327,
#1193 pattern, #1152 pattern. Credit retained inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(find-browse): resolve source-checkout layout <git-root>/browse/dist/browse[.exe]

windows-setup-e2e.yml runs `bun browse/src/find-browse.ts` against a
freshly-built repo where binaries land at browse/dist/browse.exe (no
.claude/skills/gstack/ install layout). The previous markers chain
only matched .codex/.agents/.claude prefixed paths, so find-browse
exited "not found" even when the binary was present.

Adds a source-checkout fallback after the marker scan: if no
installed layout resolves but <git-root>/browse/dist/browse[.exe]
exists, return that. Three real callers hit this path:
- gstack repo dev workflow before `./setup` runs
- windows-setup-e2e.yml CI (the breakage that surfaced this)
- make-pdf consumers running from a sibling source checkout

Smoke-verified: a fresh git repo with browse/dist/browse on disk now
resolves through the source-checkout branch (was returning null
before this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): bump v1.41.0.0 → v1.42.0.0 to clear queue collision with #1574

The version-gate workflow flagged a collision: PR #1574
(garrytan/colombo-v3) already claims v1.41.0.0, and #1592
(fix/audit-critical-high-bugs) claims v1.41.1.0. Per CLAUDE.md's
workspace-aware ship rule, queue-advancing past a claimed version
within the same bump level is permitted — MINOR work landing on top
of a queued MINOR still reads as MINOR relative to main.

Util's suggested next slot is v1.42.0.0; taking it. CHANGELOG entry
header bumped + dated 2026-05-19; entry body unchanged (same wave
content, same credit list).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 07:35:01 -07:00

2 Commits