gstack/browse/src/snapshot.ts
Garry Tan 7ca04d8ef0 v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594)
* fix(gstack-paths): guard CLAUDE_PLUGIN_DATA against cross-plugin contamination (#1569)

gstack-paths previously trusted CLAUDE_PLUGIN_DATA as a fallback for
GSTACK_STATE_ROOT whenever GSTACK_HOME was unset. When another plugin
(e.g. Codex) persists its own CLAUDE_PLUGIN_DATA into the session env
via CLAUDE_ENV_FILE, gstack picked it up and wrote checkpoints,
analytics, and learnings into that plugin's directory. Anyone with the
Codex plugin installed alongside gstack hit this silently.

Fix: guard the CLAUDE_PLUGIN_DATA branch so it only fires when
CLAUDE_PLUGIN_ROOT confirms we're running as the gstack plugin (path
contains "gstack"). Skill installs fall through to $HOME/.gstack.
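
The guard can be pictured roughly as below. This is a minimal sketch, not
the shipped gstack-paths code: the function name, the env type, and the
exact precedence order are assumptions; only the variable names and the
"path contains gstack" check come from the description above.

```typescript
// Hypothetical env shape; only the variable names are taken from the commit text.
type PathsEnvSketch = {
  GSTACK_HOME?: string;
  CLAUDE_PLUGIN_DATA?: string;
  CLAUDE_PLUGIN_ROOT?: string;
  HOME?: string;
};

// Hypothetical helper illustrating the guarded fallback.
function resolveStateRootSketch(env: PathsEnvSketch): string {
  // An explicit GSTACK_HOME always wins (assumed precedence).
  if (env.GSTACK_HOME) return env.GSTACK_HOME;
  // Trust CLAUDE_PLUGIN_DATA only when CLAUDE_PLUGIN_ROOT points at gstack.
  const runningAsGstackPlugin = (env.CLAUDE_PLUGIN_ROOT ?? "").includes("gstack");
  if (env.CLAUDE_PLUGIN_DATA && runningAsGstackPlugin) return env.CLAUDE_PLUGIN_DATA;
  // Skill installs and foreign-plugin sessions fall through to the home default.
  return `${env.HOME ?? ""}/.gstack`;
}
```

With another plugin's CLAUDE_PLUGIN_DATA in the environment and a
CLAUDE_PLUGIN_ROOT that does not mention gstack, the sketch returns the
$HOME/.gstack default instead of the foreign directory.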

Contributed by @ElliotDrel via #1570. Closes #1569.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gbrain-sync): sourceLocalPath handles wrapped {sources:[...]} shape from gbrain v0.20+

gbrain v0.20+ changed `gbrain sources list --json` to return
{sources: [...]} instead of a flat array. sourceLocalPath crashed
upstream with `list.find is not a function` on every /sync-gbrain
invocation against modern gbrain. Accept both shapes for
forward/backward compat, matching probeSource/sourcePageCount in
lib/gbrain-sources.ts.
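
Accepting both shapes can be sketched as a small normalizer. The type and
function names here are hypothetical, and the entry fields are assumptions
for illustration; the real sourceLocalPath may differ.

```typescript
// Hypothetical entry shape; the field names are assumptions.
type SourceEntrySketch = { id: string; local_path?: string };

// Accepts either a flat array (older gbrain) or {sources: [...]} (v0.20+).
function normalizeSourcesSketch(parsed: unknown): SourceEntrySketch[] {
  if (Array.isArray(parsed)) return parsed as SourceEntrySketch[];
  if (parsed !== null && typeof parsed === "object") {
    const wrapped = (parsed as { sources?: unknown }).sources;
    if (Array.isArray(wrapped)) return wrapped as SourceEntrySketch[];
  }
  // Unknown shape: return an empty list rather than crash on .find().
  return [];
}
```

Callers can then run `.find()` on the result regardless of which gbrain
version produced the JSON.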

Contributed by @jakehann11 via #1571. Closes #1567. Supersedes #1564
(@tonyjzhou, same fix, different shape — credit retained).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(brain-context-load): probe gbrain via execFile, not shell builtin (#1559)

gbrainAvailable() used `execFileSync("command", ["-v", "gbrain"])`.
execFile bypasses the shell, so this only works where a standalone
`command` executable is on the spawned process's PATH; `command` is
normally a shell builtin, so the call fails in many environments. The
probe then reported gbrain as missing even when it was installed, and
context-load silently skipped vector/list queries.

Fix: probe `gbrain --version` directly with a 500ms timeout (matching
the rest of the file's MCP_TIMEOUT_MS). Same semantics, works
everywhere execFile works.
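
A direct probe of this kind might look like the sketch below. The function
name and the generic `bin` parameter are illustrative assumptions; the
500ms default mirrors the timeout mentioned above.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical helper: true when `<bin> --version` exits 0 within the timeout.
function binaryAvailableSketch(bin: string, timeoutMs = 500): boolean {
  try {
    // No shell is involved; execFileSync throws on ENOENT, non-zero exit, or timeout.
    execFileSync(bin, ["--version"], { timeout: timeoutMs, stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}
```

Because nothing depends on a shell builtin, the probe behaves the same in
interactive and non-interactive environments.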

Contributed by @jbetala7 via #1560. Closes #1559.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain-doctor): pin schema_version:2 doctor parse path (#1418)

Adds an exec-path regression test that runs a fake gbrain shim emitting
the v0.25+ doctor JSON shape (schema_version: 2, status: "warnings",
exit 1 for health_score < 100, no top-level `engine` field). Confirms
freshDetectEngineTier recovers stdout from the non-zero exit and falls
back to GBRAIN_HOME/config.json for the engine label.

The pre-existing test for #1415 only stripped gbrain from PATH; this
test exercises the actual doctor parse path, closing the gap that
codex's plan review flagged.

Also documents the schema_version separation in
lib/gbrain-local-status.ts: the local CacheEntry stays at version 1,
distinct from the doctor-output schema_version which we accept across
versions in gstack-memory-helpers.

Closes #1418 (credit @mvanhorn for surfacing the doctor + schema_v2
collapse). The fix landed pre-emptively in v1.29.x; this commit pins
it with a stronger test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(memory-ingest): pin put_page regression + scrub stale name from --help and comments (#1346)

#1346 reported that gstack-memory-ingest still called the renamed
gbrain put_page subcommand on gbrain v0.18+. The actual code migrated
to `gbrain put` and later to batch `gbrain import <dir>` before this
report landed — only documentation lag remained.

This commit:
- Updates the --help string ("Skip gbrain put calls (still updates
  state file)") so user-facing docs match the shipped subcommand
- Updates two inline comments that still referenced the old name
- Adds test/memory-ingest-no-put_page.test.ts: a regression pin that
  strips comments from bin/gstack-memory-ingest.ts and fails the build
  if "put_page" appears in any active code or string literal, plus a
  sanity check that the file still calls a supported gbrain page-write
  verb (put or import)

Closes #1346. Reporter @kylma-code surfaced the doc lag; the original
code migration credit is on the v1.27.x wave.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(resolvers): rewrite all gbrain put_page instructions to canonical put <slug>

scripts/resolvers/gbrain.ts emitted user-facing copy-paste instructions
using the renamed `gbrain put_page` subcommand across 10 skills
(office-hours, investigate, plan-ceo-review, retro, plan-eng-review,
ship, cso, design-consultation, fallback, entity-stub). Every gstack
user copying those snippets hit "unknown command: put_page" on gbrain
v0.18+.

This commit:
- Rewrites all 10 instruction templates to use `gbrain put <slug>
  --content "$(cat <<EOF...EOF)"` with title/tags moved into YAML
  frontmatter inside --content, matching the v0.18+ subcommand shape
- Updates README.md and USING_GBRAIN_WITH_GSTACK.md "common commands"
  table to reference `gbrain put` and `gbrain get`
- Adds test/resolvers-gbrain-put-rewrite.test.ts pinning two
  invariants: (a) resolver source ships only canonical instructions,
  (b) every tracked SKILL.md file is free of `gbrain put_page`

CHANGELOG entries are deliberately left untouched (historical record).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build): extract package.json build to scripts/build.sh for Windows Bun compat (#1538, #1537, #1530, #1457, #1561)

Bun's Windows shell parser rejects multiple constructs the inline
package.json build chain used: brace groups `{ cmd; }`, subshells with
redirection `( git ... ) > path/.version`, and (in Bun 1.3.x) subshells
near redirections in general. Every Windows install + every
auto-upgrade since v1.34.2.0 has failed on `bun run build`.

Extracts the build chain to scripts/build.sh and the .version writes to
scripts/write-version-files.sh. POSIX-portable, no Bun shell parsing
involved. Also adds Windows-specific bun.exe handling for non-ASCII
PATHs (a separate Windows footgun where Bun's --compile fails when the
binary lives under a path with non-ASCII chars).

Updates test/build-script-shell-compat.test.ts to assert the new shape:
no subshells with redirections anywhere in the build chain, and build
delegates to scripts/build.sh which delegates .version writes.

Contributed by @Charlie-El via #1544. Supersedes #1531 (@scarson, fixed
in build helper), #1480 (@mikepsinn, partial overlap), #1460
(@realcarsonterry, brace-group fix subsumed) — credit retained.
Closes #1538, #1537, #1530, #1457, #1561.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): .exe glob in .gitignore + .exe extension resolution in find-browse (#1554)

bun build --compile on Windows appends .exe to the output filename,
producing browse.exe instead of browse. find-browse's existsSync probe
only checked the bare path and returned null on Windows even when the
binary was correctly built. .gitignore similarly only excluded the
bare bin/gstack-global-discover path, leaving the .exe variant
tracked.

This commit:
- .gitignore: changes `bin/gstack-global-discover` →
  `bin/gstack-global-discover*` so the Windows .exe variant is ignored
- browse/src/find-browse.ts: adds isExecutable + findExecutable helpers
  that fall back to .exe/.cmd/.bat probing on Windows, mirroring the
  same helper already in make-pdf/src/browseClient.ts and pdftotext.ts
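
The extension fallback can be sketched as follows. The helper names carry a
Sketch suffix because they are illustrative, not the code in
find-browse.ts, and the `exists` callback stands in for the real
filesystem check.

```typescript
// Hypothetical: candidate paths to probe, bare path first.
function executableCandidatesSketch(basePath: string, platform: string): string[] {
  if (platform !== "win32") return [basePath];
  return [basePath, ...[".exe", ".cmd", ".bat"].map(ext => basePath + ext)];
}

// Hypothetical: first candidate that exists, or null when none does.
function findExecutableSketch(
  basePath: string,
  platform: string,
  exists: (p: string) => boolean,
): string | null {
  return executableCandidatesSketch(basePath, platform).find(exists) ?? null;
}
```

On Windows a build that produced only browse.exe resolves through the
second candidate; on other platforms only the bare path is probed.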

Contributed by @Mike-E-Log via #1554.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(windows): add fresh-install E2E gate that runs bun run build on windows-latest

Adds .github/workflows/windows-setup-e2e.yml as the gate that catches
Bun shell-parser regressions in the build chain before they reach
users. Triggers on PRs touching package.json, scripts/build.sh,
scripts/write-version-files.sh, setup, browse cli/find-browse, or
gstack-paths.

What it verifies:
1. bun run build completes on Windows (the previously-broken path that
   #1538/#1537/#1530/#1457/#1561 reported)
2. All compiled binaries land on disk (browse.exe, find-browse.exe,
   design.exe, gstack-global-discover.exe)
3. find-browse resolves to the .exe variant on Windows (regression
   gate for #1554)
4. gstack-paths returns non-empty GSTACK_STATE_ROOT/PLAN_ROOT/TMP_ROOT
   on Windows (regression gate for #1570)

Complements the existing windows-free-tests.yml (curated unit subset);
this new workflow exercises the install path itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(codex): move diff scope into prompt instead of --base (Codex CLI 0.130+ argv conflict) (#1209)

Codex CLI ≥ 0.130.0 rejects passing a custom prompt and --base together
(mutually exclusive at argv level). Every /codex review, /review, and
/ship structured Codex review call ended with an argv error before the
model ran.

Fix: scope the diff in prompt text using
"Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD"
instead of `--base <base>`. Preserves the filesystem boundary
instruction across all invocations and keeps Codex's review prompt
tuning.

Touches:
- codex/SKILL.md.tmpl + regenerated codex/SKILL.md
- scripts/resolvers/review.ts + regenerated review/SKILL.md, ship/SKILL.md
- test/gen-skill-docs.test.ts: new regression that fails if any of the
  five known files still contain the prompt+--base shape
- test/skill-validation.test.ts: corresponding negative + positive pin
  on the rendered SKILL.md files

Contributed by @jbetala7 via #1209. Closes #1479. Supersedes #1527
(@mvanhorn — same intent, different patch shape, CONFLICTING) and
#1449 (@Gujiassh — broader refactor, CONFLICTING). Credit retained
in CHANGELOG.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): diff from git merge-base, not git diff origin/<base> (#1492)

git diff origin/<base> compares the working tree against the current
tip of origin/<base>, not against the common ancestor, so commits that
landed on origin/<base> after this branch was created show up as
deletions. That made /review and /ship's pre-landing structured review
report inflated diff totals and flag "removed" code that was actually
still present in the working tree.

Fix: compute DIFF_BASE via git merge-base origin/<base> HEAD and diff
the working tree against that point. Same coverage of uncommitted
edits, no phantom deletions from out-of-order base advancement.

Applies to /review's Step 1 (diff existence check), Step 3 (get the
diff), the build-on-intent scope-creep check, the structured review
DIFF_INS/DIFF_DEL stats, and the Claude adversarial subagent prompt.
Same change flows into ship/SKILL.md via the shared resolver.

Touches:
- review/SKILL.md.tmpl + regenerated review/SKILL.md, ship/SKILL.md
- scripts/resolvers/review.ts
- scripts/resolvers/review-army.ts

Contributed by @mvanhorn via #1492.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(codex): pin filesystem-boundary preservation across all codex review surfaces (#1503, #1522)

#1503 reported that the bare codex review --base path stripped the
filesystem boundary instruction, letting Codex spend tokens reading
.claude/skills/ and agents/. #1522 proposed adding a skill-path
detector that switched to the custom-instructions route when the diff
touched skill files.

After C10 (#1209) restructured codex review to always carry the
boundary in the prompt (the prompt+--base argv conflict forced the
restructure), the skill-path detector becomes redundant — every
default call already preserves the boundary.

This commit pins the post-#1209 invariant with a test that fails the
build if any future refactor strips the boundary from codex/SKILL.md,
review/SKILL.md, or ship/SKILL.md. Closes #1503 by regression test.

#1522 (@genisis0x) is superseded by #1209 (the prompt rewrite covers
its safety concern); credit retained in CHANGELOG.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): use command -v instead of which for codex detection (#1197)

`which` is not on PATH in every shell — some Windows shells, BusyBox-
only containers, and minimal CI images all fail when skills probe
codex availability via `which codex`. `command -v` is a POSIX builtin
and always available where the skill is running.

Touched:
- codex/SKILL.md.tmpl: CODEX_BIN=$(command -v codex || echo "")
- scripts/resolvers/review.ts and scripts/resolvers/design.ts:
  3 + 3 sites each rewritten to `command -v codex >/dev/null 2>&1`
- Regenerated all 10 affected SKILL.md files (codex, review, ship,
  design-consultation, design-review, office-hours, plan-ceo-review,
  plan-design-review, plan-devex-review, plan-eng-review)
- test/skill-validation.test.ts: updated pin + defensive regression
  test that fails if `which codex` returns to codex/SKILL.md
- test/skill-e2e-plan.test.ts: updated summary regex

Contributed by @mvanhorn via #1197.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(codex): surface non-zero exits so wrappers stop reading as silent stalls (#1467, #1327)

When codex exits non-zero (parse errors, arg-shape breaks, model API
errors that propagate as non-zero status), the calling agent
previously saw an empty output and burned 30-60 minutes misdiagnosing
it as a silent model/API stall. The hang-detection block only caught
exit 124 (the timeout-wrapper signal).

Adds elif blocks in all four codex invocation sites (Review default,
Challenge, Consult new-session, Consult resume) that:
- Echo "[codex exit N] <stderr first line>" to stdout
- Indent the first 20 stderr lines for inline context
- Log codex_nonzero_exit telemetry tagged with the call site
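
The output shape described in the first two bullets can be sketched as a
pure formatter. The shipped blocks are shell `elif` branches in the skill
templates; this TypeScript version, its function name, and the choice to
skip blank stderr lines are assumptions made for illustration.

```typescript
// Hypothetical formatter: "[codex exit N] <first stderr line>" plus indented context.
function formatCodexFailureSketch(exitCode: number, stderr: string): string {
  // Blank lines are dropped so the header shows the first meaningful line (assumption).
  const lines = stderr.split("\n").filter(l => l.trim() !== "");
  const header = `[codex exit ${exitCode}] ${lines[0] ?? ""}`.trimEnd();
  // Indent up to the first 20 stderr lines for inline context.
  const context = lines.slice(0, 20).map(l => `  ${l}`);
  return [header, ...context].join("\n");
}
```

The header line gives the calling agent an explicit failure signal even
when stdout is otherwise empty.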

Contributed by @genisis0x via #1467. Closes #1327.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(design): disclose OpenAI key source + warn on cwd .env match (#1278, closes #1248)

The design binary previously read process.env.OPENAI_API_KEY without
checking where the key came from. If a user ran $D inside someone
else's project that had OPENAI_API_KEY in its .env, the resulting
generation billed that project's account. Silent and irreversible.

Fix: resolveApiKeyInfo() returns both the key and its source. When the
env-var path matches an OPENAI_API_KEY entry in the current
directory's .env, .env.<NODE_ENV>, or .env.local file, we set a
warning. requireApiKey() prints "Using OpenAI key from <source>" plus
the warning before the run — never the key itself.
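
The cwd .env match can be sketched as a line scan that tolerates `export`
prefixes and quoted values. The function name and parsing rules below are
assumptions, not the code behind resolveApiKeyInfo(); the sketch returns
only a boolean so the key itself never needs to be printed.

```typescript
// Hypothetical check: does this .env content define `name` with exactly `value`?
function envFileDefinesValueSketch(content: string, name: string, value: string): boolean {
  for (const rawLine of content.split(/\r?\n/)) {
    // Tolerate a leading "export " and skip comments.
    const line = rawLine.trim().replace(/^export\s+/, "");
    if (line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    if (line.slice(0, eq).trim() !== name) continue;
    let candidate = line.slice(eq + 1).trim();
    // Strip one layer of matching single or double quotes.
    const quoted = candidate.match(/^(["'])(.*)\1$/);
    if (quoted) candidate = quoted[2];
    if (candidate === value) return true;
  }
  return false;
}
```

A mismatch in value returns false, which corresponds to the
"no false positive" case listed in the unit tests below.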

Adds 6 unit tests covering: config-vs-env precedence, env-only (no
match), env+cwd .env match, quoted/exported values, value-mismatch
(no false positive), and the no-leak invariant for requireApiKey
stderr output.

Contributed by @jbetala7 via #1278. Closes #1248.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): guard full-page screenshots against Anthropic vision API >2000px brick (#1214)

Full-page screenshots of tall pages routinely exceeded 2000px on the
longest dimension, silently bricking the agent's session: the
resulting base64 reached the Anthropic vision API which rejected the
oversized image, leaving the agent burning turns on a useless blob
with no stderr trace from the browse side.

Adds browse/src/screenshot-size-guard.ts as a shared helper:
- guardScreenshotBuffer(buf) → downscales in-memory if max(w,h) > 2000
- guardScreenshotPath(path) → file-mode variant that rewrites in place
- Aspect ratio preserved via sharp's resize fit:inside
- Stderr diagnostic on any downscale so callers can see when it fired
- Lazy sharp import so non-screenshot paths pay no startup cost
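
The sizing rule behind the guard can be sketched as plain arithmetic. This
is not the helper itself (which delegates the resize to sharp); the names
are hypothetical and the rounding may differ from sharp's.

```typescript
const MAX_DIMENSION_SKETCH = 2000;

// Hypothetical: target size that fits inside max x max while keeping aspect ratio.
function fitInsideSketch(
  width: number,
  height: number,
  max: number = MAX_DIMENSION_SKETCH,
): { width: number; height: number; downscaled: boolean } {
  const longest = Math.max(width, height);
  // Exactly `max` on the longest side passes through untouched.
  if (longest <= max) return { width, height, downscaled: false };
  const scale = max / longest;
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
    downscaled: true,
  };
}
```

For a 1280x8000 full-page capture the scale factor is 0.25, giving
320x2000.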

Wires the guard into all three full-page callsites codex review
flagged:
- browse/src/snapshot.ts: annotated + heatmap fullPage captures
- browse/src/meta-commands.ts: screenshot command (path + base64
  fullPage modes) plus the responsive 3-viewport sweep
- browse/src/write-commands.ts: prettyscreenshot fullPage path

Covers seven unit cases (pass-through, downscale, aspect ratio,
exactly-2000px edge, file-mode rewrite) plus a static invariant test
that fails the build if any of the three callsites stops importing the
guard.

Closes #1214.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): add Node sidecar entry for L4 prompt-injection classifier (#1370)

The L4 TestSavant classifier in browse/src/security-classifier.ts
can't be imported into the compiled browse server (onnxruntime-node
dlopen fails from Bun's compile extract dir per CLAUDE.md). The agent
that used to host it (sidebar-agent.ts) was removed when the PTY
proved out — leaving the classifier file shipped but with zero
callers. Exactly the gap codex flagged in #1370.

Adds browse/src/security-sidecar-entry.ts: a Node script that runs the
classifier as a subprocess of the browse server. It reads NDJSON
requests from stdin and writes id-correlated NDJSON responses to
stdout, supporting:
  - op: "scan-page-content" — full L4 classifier scan
  - op: "ping" — liveness probe for the client's health check
  - op: "status" — classifier readiness (used by /pty-inject-scan to
    surface l4 { available: bool } in its response)
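
An id-correlated NDJSON handler for these three ops might look like the
sketch below. Only the op names come from this commit; the request and
response field names, the error strings, and the injected `scan` callback
are assumptions standing in for the real classifier.

```typescript
// Hypothetical request shape; only the op names are from the commit text.
type SidecarRequestSketch = { id: string; op: string; text?: string };

// Handles one NDJSON line and returns one NDJSON response line.
function handleSidecarLineSketch(
  line: string,
  classifierReady: boolean,
  scan: (text: string) => string,
): string {
  let req: SidecarRequestSketch;
  try {
    req = JSON.parse(line) as SidecarRequestSketch;
  } catch {
    return JSON.stringify({ id: null, ok: false, error: "invalid-json" });
  }
  switch (req.op) {
    case "ping":
      return JSON.stringify({ id: req.id, ok: true });
    case "status":
      return JSON.stringify({ id: req.id, ok: true, available: classifierReady });
    case "scan-page-content":
      if (!classifierReady) return JSON.stringify({ id: req.id, ok: false, error: "not-ready" });
      return JSON.stringify({ id: req.id, ok: true, verdict: scan(req.text ?? "") });
    default:
      return JSON.stringify({ id: req.id, ok: false, error: "unknown-op" });
  }
}
```

Echoing the request id in every response lets the client match replies to
pending requests even when several are in flight on the same stdio pipe.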

Plus browse/src/find-security-sidecar.ts: a resolver that locates
node + the bundled JS entry (browse/dist/security-sidecar.js, built in
a follow-up package.json change) or falls back to the dev TS entry.
Returns null cleanly when node isn't on PATH so the calling endpoint
can degrade per D7 (extension WARN + user confirm).

C17 of the security-stack wave. C18 adds the IPC client + lifecycle
management; C19 wires the endpoint; C20 routes the extension through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): sidecar IPC client with lifecycle + circuit breaker (#1370)

Adds browse/src/security-sidecar-client.ts to manage the Node L4
classifier subprocess from the compiled browse server:

- Lazy spawn on first scan; reuses the same process across requests
- Id-correlated request/response via NDJSON over stdio
- 5s default per-scan timeout; 64KB payload cap (short-circuits before
  spawn so oversized requests don't waste a process)
- 3-in-10-minutes respawn cap → trips circuit breaker; subsequent
  scans throw immediately so the /pty-inject-scan endpoint can surface
  l4 { available: false } to the extension and degrade to WARN+confirm
- process.on('exit') sends SIGTERM to the child for clean teardown
- isSidecarAvailable() lets the endpoint probe before scan calls so
  the response shape reflects degraded mode honestly
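
The respawn cap can be sketched as a sliding-window counter. The class name
and the injectable clock are hypothetical, and whether the real breaker
re-closes once old respawns age out of the window is an assumption here.

```typescript
// Hypothetical breaker: allows at most `maxRespawns` respawns per `windowMs`.
class RespawnBreakerSketch {
  private spawnTimes: number[] = [];
  private readonly maxRespawns: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(maxRespawns = 3, windowMs = 10 * 60 * 1000, now: () => number = () => Date.now()) {
    this.maxRespawns = maxRespawns;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Records a respawn attempt; returns false while the breaker is tripped.
  tryRespawn(): boolean {
    const t = this.now();
    // Keep only respawns that are still inside the window.
    this.spawnTimes = this.spawnTimes.filter(s => t - s < this.windowMs);
    if (this.spawnTimes.length >= this.maxRespawns) return false;
    this.spawnTimes.push(t);
    return true;
  }
}
```

A caller that gets `false` would skip spawning and fail the scan
immediately, which is the point at which the endpoint can report
l4 { available: false }.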

Unit tests cover the payload cap, the availability probe, and the
breaker-doesn't-crash invariant under repeated rejected calls.

C18 of the security-stack wave. C19 adds POST /pty-inject-scan; C20
routes the extension through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): add POST /pty-inject-scan endpoint for pre-PTY-inject scans (#1370)

The sidebar's gstackInjectToTerminal callers (toolbar Cleanup,
Inspector "Send to Code") were piping page-derived text directly into
the live claude PTY with ZERO classifier processing — the gap codex
flagged in #1370. The documented sidebar security stack had a hole
the size of every Cleanup-button click.

Adds POST /pty-inject-scan to browse/src/server.ts:
- Local-only binding (NOT in TUNNEL_PATHS — tunnel attempts get the
  general 404 path; never reaches the scan logic)
- Root-token auth via existing validateAuth() — 401 on unauth
- 64KB request cap → 413 + payload-too-large body
- 5s scan timeout via sidecar client
- URL-blocklist forced to BLOCK in PTY context (page-derived REPL
  input is higher-risk than ordinary tool output)
- L4 ML classifier via the sidecar when available; degrades to WARN
  per D7 when sidecar is unavailable
- Response goes through JSON.stringify(..., sanitizeReplacer) per
  v1.38.0.0 Unicode-egress hardening
- Imports only from security-sidecar-client.ts, never directly from
  security-classifier.ts (which would brick the compiled Bun binary)

Seven static-invariant tests pin the POST verb, auth gate, 64KB cap,
tunnel-listener exclusion, sanitizeReplacer wrapping, l4 availability
shape, and the no-direct-classifier-import rule.

C19 of the security-stack wave. C20 routes the extension through it;
C21 adds the invariant AST check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(extension): route gstackInjectToTerminal through /pty-inject-scan (#1370)

Closes the documented-vs-shipped gap codex flagged in #1370. The
sidebar's two PTY-injection call sites (Inspector "Send to Code" and
toolbar Cleanup) now pre-scan via the new /pty-inject-scan endpoint
before writing to the live claude REPL.

Adds window.gstackScanForPTYInject(text, origin) to
extension/sidepanel-terminal.js:
- Async, returns { allow, verdict, reasons, l4 }
- POST to /pty-inject-scan with the existing root-token auth
- WARN+confirm on scan failure (network down, sidecar absent, etc.)
  rather than silent PASS — D7 honest-degradation

gstackInjectToTerminal stays synchronous, returns boolean. Per D6:
keeping the inject sync means existing `const ok = ...?.()` callers
don't break, and the invariant test in
test/extension-pty-inject-invariant.test.ts can statically pin that
every call goes through the scan first.

extension/sidepanel.js call sites updated:
- inspectorSendBtn click → await scan, BLOCK drops + WARN prompts via
  window.confirm, PASS injects silently
- runCleanup() → same flow. Static cleanup prompt always PASSes but
  still routes through scan to honor the invariant.

C20 of the security-stack wave. C21 adds the static invariant test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(security): invariant — extension PTY inject must be scan-gated (#1370)

Static-analysis invariant test that fails the build if any
extension/*.js path calls window.gstackInjectToTerminal without a
preceding window.gstackScanForPTYInject in the same enclosing
function. Closes the documented-vs-shipped gap codex demanded a
machine check on.

Rules:
- Rule 1: any file that calls inject must also reference scan
- Rule 2: in the enclosing function (function declaration, arrow,
  async (), event handler), a scan call must appear before the inject
  call by source position
- Exemption: sidepanel-terminal.js (the file that DEFINES the inject
  function) is exempt from Rule 2 since the definition is not a call

Plus two structural checks:
- sidepanel-terminal.js defines both the inject and scan functions
- inject stays SYNCHRONOUS (no `async` modifier) per D6 — async would
  silently break the `const ok = ...?.()` pattern at every caller

C21 of the security-stack wave. The sidecar architecture (#1370) is
complete: server-side L1-L3 + L4-via-sidecar (C17+C18+C19), extension
pre-scan wiring (C20), and now the regression gate (C21).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(browse): opt-in extended stealth mode with 6 detection-vector patches (#1112)

Rebases @garrytan's PR #1112 (Apr 2026, abandoned) onto the current
browse/src/stealth.ts contract. The existing minimal "codex narrowed"
stealth (webdriver-mask + AutomationControlled launch arg) stays the
default. PR #1112's six additional patches are added behind an opt-in
GSTACK_STEALTH=extended env flag.

Extended-mode patches (applied AFTER the default mask, in order):
  1. delete navigator.webdriver from prototype (not just the getter —
     detectors check `"webdriver" in navigator`)
  2. WebGL renderer spoof to Apple M1 Pro (SwiftShader was the #1
     software-GPU tell in containers)
  3. navigator.plugins returns a PluginArray-prototype-passing array
     with MimeType objects and namedItem()
  4. window.chrome populated with chrome.app, chrome.runtime,
     chrome.loadTimes(), chrome.csi() with realistic shapes
  5. navigator.mediaDevices backfilled when headless drops it
  6. CDP cdc_*-prefixed window globals cleared

Why opt-in: the default mode's contract is fingerprint CONSISTENCY,
which protects against detectors that flag spoofing mismatch. Extended
mode actively lies about the environment; sites that reflect on these
properties can break. Users who hit detection in default mode can flip
GSTACK_STEALTH=extended for SannySoft 100% pass-rate.

Twenty unit tests pin the env-flag semantics, all six patches' code
presence, and the applyStealth wiring order. Live SannySoft pass-rate
verification stays in the periodic-tier E2E suite.

Contributed by @garrytan via #1112 (rebased — original PR opened
before the codex-narrowed minimum landed; rebase preserves the
narrowed default while adding the SannySoft-passing path as opt-in).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(fixtures): regenerate ship-SKILL.md golden baselines after C10-C13 + C16 templates

Updates the three ship-SKILL.md golden baselines (claude, codex,
factory hosts) to match the new shape produced by:
- C10 #1209 codex argv (prompt + diff scope, no --base)
- C11 #1492 merge-base diff (DIFF_BASE= preamble)
- C13 #1197 command -v for codex detection
- C12 + boundary preservation per regen-enforcing test

Per CLAUDE.md SKILL.md workflow: edit the .tmpl, run gen:skill-docs,
commit the regenerated outputs together. Goldens are part of the
regen contract — without this commit, test/host-config.test.ts'
golden-baseline checks fail with the diff codex review surfaced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v1.41.0.0 — Daegu wave (24 bisect commits, 14 user-facing fixes)

Bumps VERSION 1.40.0.0 → 1.41.0.0. CHANGELOG entry follows the
release-summary format in CLAUDE.md: two-line headline, lead
paragraph, "The numbers that matter" table, "What this means for
builders" closer, then itemized Added/Changed/Fixed/For contributors
with inline credit to every PR author and original issue reporter.

Scale-aware bump per CLAUDE.md: 24 commits, ~6000 LOC net,
substantial new capability across security (PTY sidecar wiring),
install (Windows build chain), compat (gbrain 0.18-0.35, Codex CLI
0.130+), and quality (screenshot guard, design key disclosure,
extended stealth opt-in). MINOR is the right call.

Closes for users: #1567, #1559, #1569, #1346, #1418, #1538, #1537,
#1530, #1457, #1561, #1554, #1479, #1503, #1248, #1214, #1370, #1327,
#1193 pattern, #1152 pattern. Credit retained inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(find-browse): resolve source-checkout layout <git-root>/browse/dist/browse[.exe]

windows-setup-e2e.yml runs `bun browse/src/find-browse.ts` against a
freshly-built repo where binaries land at browse/dist/browse.exe (no
.claude/skills/gstack/ install layout). The previous markers chain
only matched .codex/.agents/.claude prefixed paths, so find-browse
exited "not found" even when the binary was present.

Adds a source-checkout fallback after the marker scan: if no
installed layout resolves but <git-root>/browse/dist/browse[.exe]
exists, return that. Three real callers hit this path:
- gstack repo dev workflow before `./setup` runs
- windows-setup-e2e.yml CI (the breakage that surfaced this)
- make-pdf consumers running from a sibling source checkout

Smoke-verified: a fresh git repo with browse/dist/browse on disk now
resolves through the source-checkout branch (was returning null
before this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): bump v1.41.0.0 → v1.42.0.0 to clear queue collision with #1574

The version-gate workflow flagged a collision: PR #1574
(garrytan/colombo-v3) already claims v1.41.0.0, and #1592
(fix/audit-critical-high-bugs) claims v1.41.1.0. Per CLAUDE.md's
workspace-aware ship rule, queue-advancing past a claimed version
within the same bump level is permitted — MINOR work landing on top
of a queued MINOR still reads as MINOR relative to main.

Util's suggested next slot is v1.42.0.0; taking it. CHANGELOG entry
header bumped + dated 2026-05-19; entry body unchanged (same wave
content, same credit list).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 07:35:01 -07:00


/**
 * Snapshot command — accessibility tree with ref-based element selection
 *
 * Architecture (Locator map — no DOM mutation):
 *   1. page.locator(scope).ariaSnapshot() → YAML-like accessibility tree
 *   2. Parse tree, assign refs @e1, @e2, ...
 *   3. Build Playwright Locator for each ref (getByRole + nth)
 *   4. Store Map<string, Locator> on BrowserManager
 *   5. Return compact text output with refs prepended
 *
 * Extended features:
 *   --diff / -D: Compare against last snapshot, return unified diff
 *   --annotate / -a: Screenshot with overlay boxes at each @ref
 *   --output / -o: Output path for annotated screenshot
 *   -C / --cursor-interactive: Scan for cursor:pointer/onclick/tabindex elements
 *
 * Later: "click @e3" → look up Locator → locator.click()
 */
import type { Page, Frame, Locator } from 'playwright';
import type { TabSession, RefEntry } from './tab-session';
import * as Diff from 'diff';
import { TEMP_DIR, isPathWithin } from './platform';
import { escapeEnvelopeSentinels } from './content-security';
import { stripLoneSurrogates } from './sanitize';
import { guardScreenshotPath } from './screenshot-size-guard';

// Roles considered "interactive" for the -i flag
const INTERACTIVE_ROLES = new Set([
  'button', 'link', 'textbox', 'checkbox', 'radio', 'combobox',
  'listbox', 'menuitem', 'menuitemcheckbox', 'menuitemradio',
  'option', 'searchbox', 'slider', 'spinbutton', 'switch', 'tab',
  'treeitem',
]);

interface SnapshotOptions {
  interactive?: boolean; // -i: only interactive elements
  compact?: boolean; // -c: remove empty structural elements
  depth?: number; // -d N: limit tree depth
  selector?: string; // -s SEL: scope to CSS selector
  diff?: boolean; // -D / --diff: diff against last snapshot
  annotate?: boolean; // -a / --annotate: annotated screenshot
  outputPath?: string; // -o / --output: path for annotated screenshot
  cursorInteractive?: boolean; // -C / --cursor-interactive: scan cursor:pointer etc.
  heatmap?: string; // -H / --heatmap: JSON color map for ref overlays
}

/**
 * Snapshot flag metadata — single source of truth for CLI parsing and doc generation.
 *
 * Imported by:
 *   - gen-skill-docs.ts (generates {{SNAPSHOT_FLAGS}} tables)
 *   - skill-parser.ts (validates flags in SKILL.md examples)
 */
export const SNAPSHOT_FLAGS: Array<{
  short: string;
  long: string;
  description: string;
  takesValue?: boolean;
  valueHint?: string;
  optionKey: keyof SnapshotOptions;
}> = [
  { short: '-i', long: '--interactive', description: 'Interactive elements only (buttons, links, inputs) with @e refs. Also auto-enables cursor-interactive scan (-C) to capture dropdowns and popovers.', optionKey: 'interactive' },
  { short: '-c', long: '--compact', description: 'Compact (no empty structural nodes)', optionKey: 'compact' },
  { short: '-d', long: '--depth', description: 'Limit tree depth (0 = root only, default: unlimited)', takesValue: true, valueHint: '<N>', optionKey: 'depth' },
  { short: '-s', long: '--selector', description: 'Scope to CSS selector', takesValue: true, valueHint: '<sel>', optionKey: 'selector' },
  { short: '-D', long: '--diff', description: 'Unified diff against previous snapshot (first call stores baseline)', optionKey: 'diff' },
  { short: '-a', long: '--annotate', description: 'Annotated screenshot with red overlay boxes and ref labels', optionKey: 'annotate' },
  { short: '-o', long: '--output', description: 'Output path for annotated screenshot (default: <temp>/browse-annotated.png)', takesValue: true, valueHint: '<path>', optionKey: 'outputPath' },
  { short: '-C', long: '--cursor-interactive', description: 'Cursor-interactive elements (@c refs — divs with pointer, onclick). Auto-enabled when -i is used.', optionKey: 'cursorInteractive' },
  { short: '-H', long: '--heatmap', description: 'Color-coded overlay screenshot from JSON map: \'{"@e1":"green","@e3":"red"}\'. Valid colors: green, yellow, red, blue, orange, gray.', takesValue: true, valueHint: '<json>', optionKey: 'heatmap' },
];
interface ParsedNode {
indent: number;
role: string;
name: string | null;
props: string; // e.g., "[level=1]"
children: string; // inline text content after ":"
rawLine: string;
}
/**
* Parse CLI args into SnapshotOptions — driven by SNAPSHOT_FLAGS metadata.
*/
export function parseSnapshotArgs(args: string[]): SnapshotOptions {
const opts: SnapshotOptions = {};
for (let i = 0; i < args.length; i++) {
const flag = SNAPSHOT_FLAGS.find(f => f.short === args[i] || f.long === args[i]);
if (!flag) throw new Error(`Unknown snapshot flag: ${args[i]}`);
if (flag.takesValue) {
const value = args[++i];
if (!value) throw new Error(`Usage: snapshot ${flag.short} <value>`);
if (flag.optionKey === 'depth') {
const depth = parseInt(value, 10);
// Reject non-numeric and negative depths (a negative depth would filter out every node)
if (isNaN(depth) || depth < 0) throw new Error('Usage: snapshot -d <number>');
opts.depth = depth;
} else {
(opts as any)[flag.optionKey] = value;
}
} else {
(opts as any)[flag.optionKey] = true;
}
}
return opts;
}
/**
* Parse one line of ariaSnapshot output.
*
* Format examples:
* - heading "Test" [level=1]
* - link "Link A":
* - /url: /a
* - textbox "Name"
* - paragraph: Some text
* - combobox "Role":
*/
function parseLine(line: string): ParsedNode | null {
// Match: (indent)(- )(role)( "name")?( [props])?(: inline)?
const match = line.match(/^(\s*)-\s+(\w+)(?:\s+"([^"]*)")?(?:\s+(\[.*?\]))?\s*(?::\s*(.*))?$/);
if (!match) {
// Skip metadata lines like "- /url: /a"
return null;
}
return {
indent: match[1].length,
role: match[2],
name: match[3] ?? null,
props: match[4] || '',
children: match[5]?.trim() || '',
rawLine: line,
};
}
/**
* Take an accessibility snapshot and build the ref map.
*/
export async function handleSnapshot(
args: string[],
session: TabSession,
securityOpts?: { splitForScoped?: boolean },
): Promise<string> {
const opts = parseSnapshotArgs(args);
const page = session.getPage();
// Frame-aware target for accessibility tree
const target = session.getActiveFrameOrPage();
const inFrame = session.getFrame() !== null;
// Get accessibility tree via ariaSnapshot
let rootLocator: Locator;
if (opts.selector) {
rootLocator = target.locator(opts.selector);
const count = await rootLocator.count();
if (count === 0) throw new Error(`Selector not found: ${opts.selector}`);
} else {
rootLocator = target.locator('body');
}
const ariaText = await rootLocator.ariaSnapshot();
if (!ariaText || ariaText.trim().length === 0) {
session.setRefMap(new Map());
return '(no accessible elements found)';
}
// Parse the ariaSnapshot output
const lines = ariaText.split('\n');
const refMap = new Map<string, RefEntry>();
const output: string[] = [];
let refCounter = 1;
// Track role+name occurrences for nth() disambiguation
const roleNameCounts = new Map<string, number>();
const roleNameSeen = new Map<string, number>();
// First pass: count role+name pairs for disambiguation
for (const line of lines) {
const node = parseLine(line);
if (!node) continue;
const key = `${node.role}:${node.name || ''}`;
roleNameCounts.set(key, (roleNameCounts.get(key) || 0) + 1);
}
// Second pass: assign refs and build locators
for (const line of lines) {
const node = parseLine(line);
if (!node) continue;
const depth = Math.floor(node.indent / 2);
const isInteractive = INTERACTIVE_ROLES.has(node.role);
// Advance the occurrence index for every parsed node, including nodes that the
// filters below skip, so nth() indices stay aligned with the first-pass counts
const key = `${node.role}:${node.name || ''}`;
const seenIndex = roleNameSeen.get(key) || 0;
roleNameSeen.set(key, seenIndex + 1);
// Depth filter
if (opts.depth !== undefined && depth > opts.depth) continue;
// Interactive filter: skip non-interactive nodes (already counted above)
if (opts.interactive && !isInteractive) continue;
// Compact filter: skip elements with no name and no inline content that aren't interactive
if (opts.compact && !isInteractive && !node.name && !node.children) continue;
// Assign ref
const ref = `e${refCounter++}`;
const indent = ' '.repeat(depth);
// Build Playwright locator
const totalCount = roleNameCounts.get(key) || 1;
let locator: Locator;
if (opts.selector) {
locator = target.locator(opts.selector).getByRole(node.role as any, {
name: node.name || undefined,
});
} else {
locator = target.getByRole(node.role as any, {
name: node.name || undefined,
});
}
// Disambiguate with nth() if multiple elements share role+name
if (totalCount > 1) {
locator = locator.nth(seenIndex);
}
refMap.set(ref, { locator, role: node.role, name: node.name || '' });
// Format output line
let outputLine = `${indent}@${ref} [${node.role}]`;
if (node.name) outputLine += ` "${node.name}"`;
if (node.props) outputLine += ` ${node.props}`;
if (node.children) outputLine += `: ${node.children}`;
output.push(outputLine);
}
// ─── Cursor-interactive scan (-C, or auto with -i) ────────
// Auto-enable cursor scan when interactive mode is on — agents asking for
// interactive elements should always see clickable non-ARIA items too.
if (opts.interactive && !opts.cursorInteractive) {
opts.cursorInteractive = true;
}
if (opts.cursorInteractive) {
try {
const cursorElements = await target.evaluate(() => {
const STANDARD_INTERACTIVE = new Set([
'A', 'BUTTON', 'INPUT', 'SELECT', 'TEXTAREA', 'SUMMARY', 'DETAILS',
]);
const results: Array<{ selector: string; text: string; reason: string }> = [];
const allElements = document.querySelectorAll('*');
for (const el of allElements) {
// Skip standard interactive elements (already in ARIA tree)
if (STANDARD_INTERACTIVE.has(el.tagName)) continue;
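// Skip hidden elements (offsetParent is null under display:none; per the CSSOM View
// spec it is also null for position:fixed elements, so those are skipped here too)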
if (!(el as HTMLElement).offsetParent && el.tagName !== 'BODY') continue;
const style = getComputedStyle(el);
const hasCursorPointer = style.cursor === 'pointer';
const hasOnclick = el.hasAttribute('onclick');
const hasTabindex = el.hasAttribute('tabindex') && parseInt(el.getAttribute('tabindex')!, 10) >= 0;
const hasRole = el.hasAttribute('role');
// Check if element is inside a floating container (portal/popover/dropdown)
const isInFloating = (() => {
let parent: Element | null = el;
while (parent && parent !== document.documentElement) {
const pStyle = getComputedStyle(parent);
const isFloating = (pStyle.position === 'fixed' || pStyle.position === 'absolute') &&
parseInt(pStyle.zIndex || '0', 10) >= 10;
const hasPortalAttr = parent.hasAttribute('data-floating-ui-portal') ||
parent.hasAttribute('data-radix-popper-content-wrapper') ||
parent.hasAttribute('data-radix-portal') ||
parent.hasAttribute('data-popper-placement') ||
parent.getAttribute('role') === 'listbox' ||
parent.getAttribute('role') === 'menu';
if (isFloating || hasPortalAttr) return true;
parent = parent.parentElement;
}
return false;
})();
if (!hasCursorPointer && !hasOnclick && !hasTabindex) {
// For elements inside floating containers, also check for role="option"/"menuitem"
if (isInFloating && hasRole) {
const role = el.getAttribute('role');
if (role !== 'option' && role !== 'menuitem' && role !== 'menuitemcheckbox' && role !== 'menuitemradio') continue;
} else {
continue;
}
}
// Skip elements with ARIA roles UNLESS they're inside a floating container
// (floating container items may be missed by the accessibility tree)
if (hasRole && !isInFloating) continue;
// Build deterministic nth-child CSS path
const parts: string[] = [];
let current: Element | null = el;
while (current && current !== document.documentElement) {
const parent = current.parentElement;
if (!parent) break;
const siblings = [...parent.children];
const index = siblings.indexOf(current) + 1;
parts.unshift(`${current.tagName.toLowerCase()}:nth-child(${index})`);
current = parent;
}
const selector = parts.join(' > ');
const text = (el as HTMLElement).innerText?.trim().slice(0, 80) || el.tagName.toLowerCase();
const reasons: string[] = [];
if (isInFloating) reasons.push('popover-child');
if (hasCursorPointer) reasons.push('cursor:pointer');
if (hasOnclick) reasons.push('onclick');
if (hasTabindex) reasons.push(`tabindex=${el.getAttribute('tabindex')}`);
if (hasRole) reasons.push(`role=${el.getAttribute('role')}`);
results.push({ selector, text, reason: reasons.join(', ') });
}
return results;
});
if (cursorElements.length > 0) {
output.push('');
output.push('── cursor-interactive (not in ARIA tree) ──');
let cRefCounter = 1;
for (const elem of cursorElements) {
const ref = `c${cRefCounter++}`;
const locator = target.locator(elem.selector);
refMap.set(ref, { locator, role: 'cursor-interactive', name: elem.text });
output.push(`@${ref} [${elem.reason}] "${elem.text}"`);
}
}
} catch (err: any) {
// Cursor scan fails on pages with strict CSP or when page has navigated
if (!err?.message?.includes('Execution context') && !err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('Content Security')) throw err;
output.push('');
output.push('(cursor scan failed — CSP restriction)');
}
}
// Store ref map on the tab session
session.setRefMap(refMap);
if (output.length === 0) {
return '(no interactive elements found)';
}
const snapshotText = output.join('\n');
// ─── Annotated screenshot (-a) ────────────────────────────
if (opts.annotate) {
const screenshotPath = opts.outputPath || `${TEMP_DIR}/browse-annotated.png`;
// Validate output path — resolve symlinks to prevent symlink traversal attacks
{
const nodePath = require('path') as typeof import('path');
const nodeFs = require('fs') as typeof import('fs');
const absolute = nodePath.resolve(screenshotPath);
const safeDirs = [TEMP_DIR, process.cwd()].map((d: string) => {
try { return nodeFs.realpathSync(d); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; return d; }
});
let realPath: string;
try {
realPath = nodeFs.realpathSync(absolute);
} catch (err: any) {
if (err.code === 'ENOENT') {
try {
const dir = nodeFs.realpathSync(nodePath.dirname(absolute));
realPath = nodePath.join(dir, nodePath.basename(absolute));
} catch (err2: any) {
if (err2?.code !== 'ENOENT') throw err2;
realPath = absolute;
}
} else {
throw new Error(`Cannot resolve real path: ${screenshotPath} (${err.code})`);
}
}
if (!safeDirs.some((dir: string) => isPathWithin(realPath, dir))) {
throw new Error(`Path must be within: ${safeDirs.join(', ')}`);
}
}
try {
// Inject overlay divs at each ref's bounding box
const boxes: Array<{ ref: string; box: { x: number; y: number; width: number; height: number } }> = [];
for (const [ref, entry] of refMap) {
try {
const box = await entry.locator.boundingBox({ timeout: 1000 });
if (box) {
boxes.push({ ref: `@${ref}`, box });
}
} catch (err: any) {
// Element may be offscreen, hidden, or page navigated — skip
if (!err?.message?.includes('Timeout') && !err?.message?.includes('timeout') && !err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('Execution context')) throw err;
}
}
await page.evaluate((boxes) => {
for (const { ref, box } of boxes) {
const overlay = document.createElement('div');
overlay.className = '__browse_annotation__';
overlay.style.cssText = `
position: absolute; top: ${box.y}px; left: ${box.x}px;
width: ${box.width}px; height: ${box.height}px;
border: 2px solid red; background: rgba(255,0,0,0.1);
pointer-events: none; z-index: 99999;
font-size: 10px; color: red; font-weight: bold;
`;
const label = document.createElement('span');
label.textContent = ref;
label.style.cssText = 'position: absolute; top: -14px; left: 0; background: red; color: white; padding: 0 3px; font-size: 10px;';
overlay.appendChild(label);
document.body.appendChild(overlay);
}
}, boxes);
await page.screenshot({ path: screenshotPath, fullPage: true });
await guardScreenshotPath(screenshotPath);
// Always remove overlays
await page.evaluate(() => {
document.querySelectorAll('.__browse_annotation__').forEach(el => el.remove());
});
output.push('');
output.push(`[annotated screenshot: ${screenshotPath}]`);
} catch (err: any) {
// Remove overlays first so they never linger on the page, even when the error is rethrown below
try {
await page.evaluate(() => {
document.querySelectorAll('.__browse_annotation__').forEach(el => el.remove());
});
} catch (err2: any) {
if (!err2?.message?.includes('closed') && !err2?.message?.includes('Target') && !err2?.message?.includes('Execution context')) throw err2;
}
// Only swallow page/browser errors; rethrow anything unexpected
if (!err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('Execution context') && !err?.message?.includes('screenshot')) throw err;
}
}
// ─── Heatmap mode (-H) ──────────────────────────────────────
if (opts.heatmap) {
const heatmapPath = opts.outputPath || `${TEMP_DIR}/browse-heatmap.png`;
// Validate output path
{
const nodePath = require('path') as typeof import('path');
const nodeFs = require('fs') as typeof import('fs');
const absolute = nodePath.resolve(heatmapPath);
const safeDirs = [TEMP_DIR, process.cwd()].map((d: string) => {
try { return nodeFs.realpathSync(d); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; return d; }
});
let realPath: string;
try {
realPath = nodeFs.realpathSync(absolute);
} catch (err: any) {
if (err.code === 'ENOENT') {
try {
const dir = nodeFs.realpathSync(nodePath.dirname(absolute));
realPath = nodePath.join(dir, nodePath.basename(absolute));
} catch (err2: any) {
if (err2?.code !== 'ENOENT') throw err2;
realPath = absolute;
}
} else {
throw new Error(`Cannot resolve real path: ${heatmapPath} (${err.code})`);
}
}
if (!safeDirs.some((dir: string) => isPathWithin(realPath, dir))) {
throw new Error(`Path must be within: ${safeDirs.join(', ')}`);
}
}
// Parse and validate color map
const VALID_COLORS = new Set(['green', 'yellow', 'red', 'blue', 'orange', 'gray']);
const COLOR_MAP: Record<string, { border: string; bg: string }> = {
green: { border: '#00b400', bg: 'rgba(0,180,0,0.15)' },
yellow: { border: '#ffb400', bg: 'rgba(255,180,0,0.15)' },
red: { border: '#ff0000', bg: 'rgba(255,0,0,0.15)' },
blue: { border: '#0066ff', bg: 'rgba(0,102,255,0.15)' },
orange: { border: '#ff6600', bg: 'rgba(255,102,0,0.15)' },
gray: { border: '#888888', bg: 'rgba(136,136,136,0.15)' },
};
let colorAssignments: Record<string, string>;
try {
const parsed = JSON.parse(opts.heatmap);
if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
throw new Error('not an object');
}
colorAssignments = parsed;
} catch {
throw new Error('Invalid heatmap JSON. Expected object: \'{"@e1":"green","@e3":"red"}\'');
}
// Validate colors
for (const [ref, color] of Object.entries(colorAssignments)) {
if (!VALID_COLORS.has(color)) {
throw new Error(`Invalid heatmap color "${color}" for ${ref}. Valid: ${[...VALID_COLORS].join(', ')}`);
}
}
try {
const boxes: Array<{ ref: string; box: { x: number; y: number; width: number; height: number }; color: string }> = [];
for (const [refKey, color] of Object.entries(colorAssignments)) {
const cleanRef = refKey.startsWith('@') ? refKey.slice(1) : refKey;
const entry = refMap.get(cleanRef);
if (!entry) continue; // Skip refs not found on page
try {
const box = await entry.locator.boundingBox({ timeout: 1000 });
if (box) {
const colors = COLOR_MAP[color] || COLOR_MAP.gray;
boxes.push({ ref: `@${cleanRef}`, box, color: JSON.stringify(colors) });
}
} catch {
// Element may be offscreen or hidden — skip
}
}
await page.evaluate((boxes) => {
for (const { ref, box, color } of boxes) {
const colors = JSON.parse(color);
const overlay = document.createElement('div');
overlay.className = '__browse_heatmap__';
overlay.style.cssText = `
position: absolute; top: ${box.y}px; left: ${box.x}px;
width: ${box.width}px; height: ${box.height}px;
border: 2px solid ${colors.border}; background: ${colors.bg};
pointer-events: none; z-index: 99999;
font-size: 10px; color: ${colors.border}; font-weight: bold;
`;
const label = document.createElement('span');
label.textContent = ref;
label.style.cssText = `position: absolute; top: -14px; left: 0; background: ${colors.border}; color: white; padding: 0 3px; font-size: 10px;`;
overlay.appendChild(label);
document.body.appendChild(overlay);
}
}, boxes);
await page.screenshot({ path: heatmapPath, fullPage: true });
await guardScreenshotPath(heatmapPath);
// Remove heatmap overlays
await page.evaluate(() => {
document.querySelectorAll('.__browse_heatmap__').forEach(el => el.remove());
});
output.push('');
output.push(`[heatmap screenshot: ${heatmapPath}]`);
} catch (err: any) {
// Cleanup on failure
try {
await page.evaluate(() => {
document.querySelectorAll('.__browse_heatmap__').forEach(el => el.remove());
});
} catch {}
if (!err?.message?.includes('closed') && !err?.message?.includes('Target') && !err?.message?.includes('Execution context') && !err?.message?.includes('screenshot')) throw err;
}
}
// ─── Diff mode (-D) ───────────────────────────────────────
if (opts.diff) {
const lastSnapshot = session.getLastSnapshot();
if (!lastSnapshot) {
session.setLastSnapshot(snapshotText);
return snapshotText + '\n\n(no previous snapshot to diff against — this snapshot stored as baseline)';
}
const changes = Diff.diffLines(lastSnapshot, snapshotText);
const diffOutput: string[] = ['--- previous snapshot', '+++ current snapshot', ''];
for (const part of changes) {
const prefix = part.added ? '+' : part.removed ? '-' : ' ';
const diffLines = part.value.split('\n').filter(l => l.length > 0);
for (const line of diffLines) {
diffOutput.push(`${prefix} ${line}`);
}
}
session.setLastSnapshot(snapshotText);
return stripLoneSurrogates(diffOutput.join('\n'));
}
// Store for future diffs
session.setLastSnapshot(snapshotText);
// Add frame context header when operating inside an iframe
if (inFrame) {
const frameUrl = session.getFrame()?.url() ?? 'unknown';
output.unshift(`[Context: iframe src="${frameUrl}"]`);
}
// Split output for scoped tokens: trusted refs + untrusted text
if (securityOpts?.splitForScoped) {
const trustedRefs: string[] = [];
const untrustedLines: string[] = [];
for (const line of output) {
// Lines starting with @ref are interactive elements (trusted metadata)
const refMatch = line.match(/^(\s*)@(e\d+|c\d+)\s+\[([^\]]+)\]\s*(.*)/);
if (refMatch) {
const [, indent, ref, role, rest] = refMatch;
// Truncate element name/content to 50 chars for trusted section
const nameMatch = rest.match(/^"(.+?)"/);
let truncName = nameMatch ? nameMatch[1] : rest.trim();
if (truncName.length > 50) truncName = truncName.slice(0, 47) + '...';
trustedRefs.push(`${indent}@${ref} [${role}] "${truncName}"`);
}
// All lines go to untrusted section (full content)
untrustedLines.push(line);
}
const parts: string[] = [];
if (trustedRefs.length > 0) {
parts.push('INTERACTIVE ELEMENTS (trusted — use these @refs for click/fill):');
parts.push(...trustedRefs);
parts.push('');
}
// Defuse any envelope sentinel that appears inside the page's own
// accessibility text. Without this, a page whose rendered content
// contains the literal `═══ END UNTRUSTED WEB CONTENT ═══` string
// can close the envelope early and forge a fake "trusted" block
// for the LLM. Same escape that wrapUntrustedPageContent applies.
const safeUntrusted = untrustedLines.map(escapeEnvelopeSentinels);
parts.push('═══ BEGIN UNTRUSTED WEB CONTENT ═══');
parts.push(...safeUntrusted);
parts.push('═══ END UNTRUSTED WEB CONTENT ═══');
return stripLoneSurrogates(parts.join('\n'));
}
return stripLoneSurrogates(output.join('\n'));
}