Commit Graph

3 Commits

Author SHA1 Message Date
Garry Tan 66f3a180d3 v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642)
* fix(gbrain-sync): --full produces an empty code index on first run of a new repo

`gbrain reindex-code` only RE-EMBEDS pages that already exist; it never walks
the filesystem. On a freshly-registered source (0 pages), a --full run that
called reindex-code alone found nothing ("No code pages to reindex"), finished
in ~1s, and left the code index permanently empty while still reporting OK.

Fix: --full now runs `sync --strategy code` FIRST to create pages via the file
walk, then runs `reindex-code` to honor the documented "full walk + reindex"
contract for both fresh and populated sources.

Contributed by @jetsetterfl via #1584.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gbrain-local-status): classifier falsely reports broken-db inside repos with their own DATABASE_URL

The freshClassify probe ran `gbrain sources list --json` with the inherited
process env. When the probe ran from inside a repo with its own .env (an app
DATABASE_URL on a different port), Bun autoloaded the project's .env, gbrain
connected to the wrong database, and the classifier reported broken-db on
otherwise-healthy brains.

Fix: route the probe env through `buildGbrainEnv` from lib/gbrain-exec, the
same helper the sync orchestrator uses. DATABASE_URL is seeded from
~/.gbrain/config.json so the result is cwd-independent. The 60s cache can no
longer propagate a poisoned negative to clean directories.

Contributed by @jetsetterfl via #1583.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(retro): stale-base + bad-today-anchor pre-flight guard (#1624)

/retro silently produced confidently-wrong output when "today" drifted (model
session-context error) or when origin/<default> was materially behind the
actual remote — git log --since returned zero or near-zero commits and the
narrative was fabricated from nothing.

Adds Step 0.5 with four ordered pre-check branches before any window analysis:

  A. No 'origin' remote → skip with "base freshness not verified" note
  B. Detached HEAD → skip with "base freshness not verified" note
  C. `git fetch origin <default>` fails (offline) → warn, proceed against
     last-known origin/<default>
  D. Fetch succeeded → compare today vs latest origin/<default> commit; if
     gap > window-days, BLOCK with explicit citation of latest-commit date.

Skip paths still proceed to Step 1, but the disclosure is carried into the
retro narrative ("offline run, window not freshness-verified") so the output
is never silently confidently-wrong.

Atomic .tmpl + gen:skill-docs regen commit (T-Codex-3 pattern).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(retro): regression for #1624 stale-base pre-flight guard

13 static-invariant tests pinning the four ordered pre-check branches in
retro/SKILL.md.tmpl:Step 0.5:

  A. no-remote skip            — must check origin presence + set verdict
  B. detached-HEAD skip        — must gate behind prior verdict (ordering)
  C. fetch-fail warn           — must match `if !` or `||` shape, gate by verdict
  D. stale-base BLOCK          — must read latest-commit ISO date, cite remediation

Plus a disclosure-survives-to-narrative invariant: skip-path verdicts must be
named in prose so the retro output carries the cited reason rather than
silently misreporting.

Failing build if Step 0.5 is removed, branches re-ordered (no-remote no longer
wins), or the BLOCK message stops citing today/latest-commit/remediation
path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gbrain-sync): configurable timeouts + resume from gbrain checkpoint (#1611)

The memory and code stages hardcoded a 35-min spawn timeout. On brains with
~2000+ staged files, /sync-gbrain --full reliably SIGTERM'd the child at
exactly 35 minutes with exit 143. gbrain left ~/.gbrain/import-checkpoint.json
pointing at the staging dir, but gstack-memory-ingest's SIGTERM handler
unconditionally cleaned the dir up — so the next run found a checkpoint
pointing at nothing and restaged from scratch, repeating the SIGTERM forever.

Three changes:

1. Configurable timeouts via env (bounds 60_000ms - 86_400_000ms, default
   2_100_000ms = 35min unchanged):
     GSTACK_SYNC_MEMORY_TIMEOUT_MS
     GSTACK_SYNC_CODE_TIMEOUT_MS
   Out-of-range or non-numeric values warn and fall back to the default.

2. SIGTERM in gstack-memory-ingest no longer always cleans up the staging
   dir. If gbrain has written ~/.gbrain/import-checkpoint.json pointing at
   the active staging dir, the dir is PRESERVED for next-run resume.
   Otherwise (no checkpoint pointing here, crash before gbrain ever
   touched it) it's cleaned up as before.

3. Next /sync-gbrain run detects gbrain's checkpoint via decideResume() in
   gstack-gbrain-sync.ts:
     - no checkpoint               → fresh ingest pass
     - checkpoint + staging ok     → set GSTACK_INGEST_RESUME_DIR; child
                                      reuses staging dir and skips
                                      writeStaged; gbrain import resumes
                                      from processedIndex+1
     - checkpoint + staging gone   → warn "previous checkpoint stale
                                      (staging dir gone), restaging from
                                      scratch" and proceed

Reuses gbrain's own checkpoint as the source of truth (D1 — no double-store
state). Detect-then-fallback semantics per C1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain-sync): regression for #1611 timeouts + resume

19 tests across three surfaces:

  - resolveStageTimeoutMs (10 tests): undefined/empty → default; non-numeric,
    zero, negative, below-floor, above-ceiling → warn + default; at-floor,
    at-ceiling, valid mid-range → accepted as-is.

  - decideResume (6 tests): no checkpoint, corrupt JSON, checkpoint + staging
    ok, checkpoint + staging missing, checkpoint with no dir, checkpoint with
    empty dir.

  - SIGTERM staging preservation (3 static invariants): memory-ingest signal
    handler must check stagingDirIsCheckpointed BEFORE cleanup; preserve
    branch must come before cleanup branch (ordering); orchestrator must
    pass GSTACK_INGEST_RESUME_DIR to the grandchild on resume.

Also threads process.env.HOME through readGbrainCheckpoint and
stagingDirIsCheckpointed so tests can redirect home. os.homedir() caches
at process start and ignores later mutation, so the env override is the
only reliable test injection point.

Failing build if the timeout bounds are removed, the resume detection
short-circuits incorrectly, or the SIGTERM handler regresses to
unconditional cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): pre-emit verification gate kills Django-shape FP class (#1539)

External user filed 4/8 false positives on a /review run against a Django +
DRF + PostgreSQL repo (Sprint 2.5). Every FP class was the same shape:
"resolvable in <5 minutes by viewing the actual code or running a simple
grep" — fields that don't exist on the model, dict.get()-might-be-None on a
form that returns {}-initialized cleaned_data, standard ORM save behavior
called out as data loss.

Extends the Confidence Calibration resolver (consumed by review, cso,
plan-eng-review, ship) with a Pre-emit verification gate:

  Every finding MUST quote the specific code line that motivates it
  (file:line + verbatim text). If the reviewer cannot produce the quote,
  the finding is unverified — its confidence is forced to 4-5 so the
  existing "Suppress from main report" rule fires automatically. The
  finding still goes to the appendix for calibration audit, but the user
  does not see it in the critical-pass output.

Reuses the existing suppression mechanism — no new code path. The FP
classes the gate kills are enumerated in the resolver text so reviewers
see the named patterns.

Framework-meta nudge included for Django Meta, Rails associations,
SQLAlchemy relationships, TypeORM decorators, Sequelize init, Prisma
generated client — the reviewer must quote the meta-construct that
generates the symbol, not just grep for the literal name. Deeper
framework-aware ORM verification (model introspection, migration-history-
aware checks) is deliberately deferred to a future wave per T-Codex-2.

Atomic .tmpl-equivalent (resolver) edit + gen:skill-docs regen commit
per T-Codex-3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(review): regression for #1539 pre-emit verification gate

12 tests pinning the gate behavior:

  - Resolver emits the gate header + #1539 reference
  - Gate requires quoting file:line + verbatim text
  - Unverified findings forced to confidence 4-5 (auto-suppress via
    existing <7-rule, no new mechanism)
  - Framework-meta nudge names Django, Rails, SQLAlchemy, TypeORM,
    Sequelize, Prisma
  - Deferred design doc reference present (1539-framework-aware-review.md)
  - Four named FP classes from #1539 enumerated:
      * field doesn't exist on model
      * dict.get() might be None
      * save() might lose fields
      * update_fields might miss X
  - All four downstream SKILL.md consumers (review, cso, plan-eng-review,
    ship) carry the gate text after gen:skill-docs
  - Existing confidence 9-10 'Show normally' + 3-4 'Suppress' rows
    unchanged (regression on existing behavior)

Failing build if the gate is removed, the suppression mechanism is
re-invented separately, the framework-meta nudge drops a framework, or
gen:skill-docs stops propagating the gate to consumers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(config): expose explain_level default

* fix(benchmark): parse positional prompt after flags

* fix(artifacts): reject malformed remote paths

* fix(learnings): preserve current entries in cross-project search

* fix(setup): register root gstack slash alias

* fix(memory): probe gitleaks without shell builtin

* fix(gbrain-lib): pin LC_ALL=C in varname validator (macOS locale guard)

In many macOS shells the default locale (e.g. en_US.UTF-8) makes bash
glob brackets like `[A-Z]` match lowercase letters too, so the existing
`case "$name" in [A-Z_][A-Z0-9_]*)` branch lets names like `lower-case`
through validation. The function then trips `printf -v "$varname"` and
`export "$varname"` with `not a valid identifier` errors that surface
mid-prompt, which is exactly what the validator was supposed to prevent.

Pinning `LC_ALL=C` inside the function gives ASCII-only bracket semantics
on both macOS and Linux, matching the documented `[A-Z_][A-Z0-9_]*`
contract. Declared `local` so it doesn't leak to the calling shell —
`gstack-gbrain-lib.sh` is documented as a sourced helper, so a bare
assignment would mutate the caller's locale for the rest of the process
(silently affecting downstream `sort`, `tr`, locale-aware globs in the
same shell, etc.).

The existing regression test
`test/gbrain-lib-verify.test.ts:'rejects invalid var names'`
already covers the macOS repro shape (passes `lower-case` and expects
the validator to reject + emit `invalid var name`). On Linux CI the
test silently passed because `LC_ALL=C` is the typical default; on
macOS dev boxes it fails.

Verified:
- `bun test test/gbrain-lib-verify.test.ts`: 22 pass, 0 fail (on macOS).
- `_gstack_gbrain_validate_varname lower-case; echo $?` → 2.
- `_gstack_gbrain_validate_varname FOO_BAR; echo $?` → 0.
- Caller's LC_ALL preserved across calls (confirmed via sourced bash).

* fix(land-and-deploy): detect merged PR after gh failure

After `gh pr merge` exits non-zero, the PR may already be MERGED server-side
(concurrent merge landed, or local cleanup phase failed AFTER the merge
succeeded). Calling `gh pr merge` a second time then errors with a confusing
"already merged" — and worse, the deploy workflow never runs because we
stopped on the first failure.

Adds a Post-failure PR-state check (§4a-postfail) that runs after ANY
non-zero exit from `gh pr merge`:

  - state == MERGED  → record MERGE_PATH=direct, OFFER (don't force)
                       stale-worktree cleanup on the base branch with
                       uncommitted-work guard, proceed to §4a CI watch
  - state == OPEN    → check autoMergeRequest; if non-null treat as
                       merge-queue wait; if null surface both errors and STOP
  - state == CLOSED  → STOP

Hard invariant: never retry `gh pr merge` after a non-zero exit. Server
state is authoritative.

Re-authored from PR #1620 into land-and-deploy/SKILL.md.tmpl (the source of
truth) instead of the generated SKILL.md, so the next gen:skill-docs run
preserves the change. Original diff by @davidfoy via #1620.

Related: cli/cli#3442, cli/cli#13380.

Contributed by @davidfoy via #1620.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: detect PgBouncer transaction-mode pooler and set GBRAIN_PREPARE=true (#1435)

When gbrain connects through a PgBouncer transaction-mode pooler (port
6543), it auto-disables prepared statements. This breaks `gbrain search`
silently — the /sync-gbrain capability check fails and the GBrain Search
Guidance block never gets written to CLAUDE.md.

Three-layer fix:

1. **lib/gbrain-exec.ts** — `buildGbrainEnv()` now detects port 6543 in
   the effective DATABASE_URL and sets `GBRAIN_PREPARE=true` in the env
   passed to every gbrain spawn. This is the single chokepoint — all
   gstack gbrain invocations inherit the fix. Caller can opt out with
   `GBRAIN_PREPARE=false`.

2. **sync-gbrain/SKILL.md{,.tmpl}** — capability check now exports
   `GBRAIN_PREPARE=true` explicitly and retries search up to 3x with 1s
   delay for async index propagation under connection pooling.

3. **bin/gstack-gbrain-detect** — surfaces `gbrain_pooler_mode` field
   ("transaction" | "session" | null) in the preamble probe JSON so
   /setup-gbrain and /sync-gbrain can advise users about pooler state.

Closes #1435

Built with [ClosedLoop.AI](https://closedloop.ai) | [GitHub](https://github.com/closedloop-ai/claude-plugins)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(supabase-provision): rewrite transaction/6543 -> session/5432 for new projects

- Single-object pooler API responses default to transaction-mode at 6543,
  but the shared pooler tenant on new projects only listens on session/5432
- Add a `pool_mode == transaction && db_port == 6543` rewrite + stderr note
- Escape hatch via `GSTACK_SUPABASE_TRUST_API_PORT=1` for forward-compat
- 5 new tests covering rewrite, no-op shapes, env opt-out, array path

Fixes #1301.

* fix(browse): GSTACK_CHROMIUM_NO_SANDBOX opt-out for Ubuntu/AppArmor (#1562)

Ubuntu/AppArmor configurations often block unprivileged Chromium sandboxing
for headless agent sessions even for normal users — /qa hangs without
--no-sandbox. The kernel policy denies the unprivileged user namespaces
Chromium needs.

Adds GSTACK_CHROMIUM_NO_SANDBOX=1 as an explicit user override that forces
the sandbox off without changing the default for everyone else. Re-authored
from PR #1562 onto v1.42.2.0's shouldEnableChromiumSandbox() helper —
purely additive, preserves the headed-launch sandbox-on-by-default behavior
that v1.42.2.0 shipped to kill the --no-sandbox yellow infobar.

Three new regression tests cover:
  - linux + override=1 → false (the named use case)
  - darwin + override=1 → false (env wins on any platform)
  - override=0 → does NOT trigger (must be exactly "1")

Original diff by @techcenter68 via #1562.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): mirror isCustomChromium() guard in headless launch()

When BROWSE_EXTENSIONS_DIR is set alongside GSTACK_CHROMIUM_PATH pointing
at a baked-extension build (GBrowser / GStack Browser), the headless launch()
path was unconditionally adding --disable-extensions-except / --load-extension.
This causes the same ServiceWorkerState::SetWorkerId DCHECK crash that
launchHeaded() already guards against via isCustomChromium().

Mirror the existing guard: skip --load-extension flags when isCustomChromium()
returns true; always push the off-screen window geometry args.

* fix(browse): daemonize macOS/Linux server via setsid()

`Bun.spawn().unref()` only releases the child from Bun's event loop —
it does NOT call setsid(). The spawned bun server inherits the spawning
shell's process session. When the CLI runs inside a session-managed shell
that exits shortly after the CLI returns (Claude Code's per-command Bash
sandbox, Conductor, OpenClaw, CI step runners), the session leader's exit
sends SIGHUP to every PID in the session — killing the bun server and
its Chromium grandchildren within seconds of a successful `connect`.

Setting `BROWSE_PARENT_PID=0` (already done by the `connect` command and
pair-agent) disables the parent-process watchdog but does NOT save the
server here: SIGHUP from session teardown still reaps it.

Replace the macOS/Linux `Bun.spawn().unref()` with Node's
`child_process.spawn({ detached: true })`, which calls setsid() and
gives the server its own session leader role (PPID=1, STAT=Ss). This
mirrors the Windows path's rationale (PR #191 by @fqueiro) — same root
cause, different OS surface.

Verified on macOS in Conductor: pre-fix the server dies ~10–15s after
connect across separate Bash invocations; post-fix the same PID stays
alive (PPID=1, SESS=0, STAT=Ss) and responds to `status`/`goto`/
`snapshot` across many separate shell calls.

The `proc?.stderr` startup-error branch is removed since both platforms
now spawn with `stdio: 'ignore'`; both fall through to the on-disk
`browse-startup-error.log` written by `server.ts`'s start().catch.

* fix(design): bump image-gen timeout to 240s + pin gpt-image-2

The design binary calls /v1/responses (gpt-4o + image_generation tool,
quality:high, 1536x1024) but aborted the request after a hardcoded 120s.
That class of request consistently takes ~140-160s end-to-end, so every
generate/variants/evolve/iterate call aborted before the image returned.

In /design-shotgun this cascades: Step 3c launches N parallel agents,
each calling `$D generate`, each aborts at 120s and retries, all fail,
the comparison board never opens — the skill appears to hang indefinitely.

Reproduced the exact API call with a longer budget: HTTP 200, valid
image, 143.5s. A real /design-shotgun run after the patch generated 3
variants in parallel at 150.0s / 161.0s / 152.1s, all exit 0 — note the
161s case, which a naive 150s bump would still have failed.

- Bump AbortController timeout 120_000 -> 240_000 in generate.ts,
  variants.ts, evolve.ts, iterate.ts (both call sites)
- Pin the image_generation tool to model "gpt-image-2"

design/test/variants-retry-after.test.ts: 5 pass, 0 fail. The
feedback-roundtrip.test.ts failures are a pre-existing browse-module
breakage (session.clearLoadedHtml undefined), unrelated to this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: fill coverage gaps for PRs #1606, #1612, #1620

Three cherry-picked PRs in this wave landed without unit-test coverage for
the specific invariant they protect:

  #1606 (@andrey-esipov) — LC_ALL=C pin in _gstack_gbrain_validate_varname
    8 tests by sourcing bin/gstack-gbrain-lib.sh and calling the validator
    directly. Asserts uppercase/digit/underscore accepted, lowercase
    REJECTED (the macOS-locale regression case), mixed-case rejected,
    LC_ALL=C scoping is local (doesn't leak to caller).

  #1612 (@bharat2913) — setsid daemonize via Node child_process.spawn
    4 static-invariant tests on browse/src/cli.ts. The actual setsid
    syscall is hard to assert without a real spawn, so we pin the source
    shape: nodeSpawn imported from child_process; non-Windows branch uses
    nodeSpawn(...) with detached:true and .unref(); comment documents
    setsid/SIGHUP root cause; Bun.spawn() is NOT used on macOS/Linux.

  #1620 (@davidfoy, re-authored into .tmpl per A3) — §4a-postfail
    12 static invariants on land-and-deploy/SKILL.md.tmpl + generated
    SKILL.md. Pins all three state branches (MERGED/OPEN/CLOSED), the
    authoritative state query, the merge-SHA capture, non-destructive
    worktree cleanup with uncommitted-work guard, autoMergeRequest probe
    on OPEN, hard "never retry gh pr merge" rule, and atomic regen
    propagation.

Failing build if any of the three invariants regresses.

Note: gbrain-lib-validate-varname.test.ts also surfaces a pre-existing
glob-pattern overpermissiveness (hyphens + dots accepted) — not in
#1606's scope; documented inline as a separate cleanup target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(learnings): align injection-prevention tests with PR #1619 tagged-line shape

PR #1619 (preserve current entries in cross-project search) refactored
gstack-learnings-search to tag rows inline (`current\t<json>` vs
`cross\t<json>`) instead of filtering inside the bun block via
process.env.GSTACK_SEARCH_SLUG. The bun block no longer reads SLUG or
CROSS env vars — it parses the per-line tag and sets a per-entry
_crossProject flag.

The pre-existing test/learnings-injection.test.ts still asserted on the
old SLUG + CROSS env var shape. Updates:

  - Remove the SLUG env var assertion (no longer set on bash command line)
  - Remove the bun-block CROSS env var assertion (block reads the tag now,
    not the env)
  - Add a new positive assertion that the bun block parses the tag
    (sourceTag | tabIndex | crossProject)
  - Keep the shell-interpolation safety assertion unchanged — that's
    independent of the SLUG refactor

The CROSS env var is still SET on the bash command line (it controls
whether the cross-project find runs at all), but the bun child no longer
reads it. The existing "env vars set on bash command line" test continues
to pin that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(fixtures): regenerate ship-SKILL.md golden baselines

ship/SKILL.md consumes the Confidence Calibration resolver via the
preamble pipeline. This wave's #1539 pre-emit verification gate extends
the resolver text, which propagated to ship/SKILL.md via gen:skill-docs.
The golden fixtures in test/fixtures/golden/ matched the pre-#1539 shape
and failed the host-config regression check.

Refreshes claude-ship-SKILL.md, codex-ship-SKILL.md, and factory-ship-SKILL.md
to match the current generated output. Matches the Daegu wave's bisect
commit 23 ("test(fixtures): regenerate ship-SKILL.md golden baselines").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain-detect): include gbrain_pooler_mode in schema regression (PR #1591)

PR #1591 (PgBouncer transaction-mode detection, @mikeangstadt) added
gbrain_pooler_mode to the gstack-gbrain-detect JSON output but did not
update the schema regression check in
test/gstack-gbrain-detect-mcp-mode.test.ts. Adding the key in alphabetical
order matching the rest of the schema array. Downstream sync-gbrain ignores
unknown keys, so this is forward-compat.

Without this, the test fails with a diff:
  + "gbrain_pooler_mode"
because keys is the actual set returned and the expected array was
pre-#1591.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v1.43.0.0 — post-Daegu paper-cut wave

Bumps VERSION 1.42.2.0 → 1.43.0.0 (MINOR per scale-aware bump rules: new
env-var surface GSTACK_SYNC_*_TIMEOUT_MS + GSTACK_CHROMIUM_NO_SANDBOX,
behavior expansion in browse/src/browser-manager.ts headless launch,
three skill-template prompt changes affecting /retro, /review,
/sync-gbrain).

CHANGELOG entry leads with what stopped happening: /retro stops
fabricating retros against stale bases, /sync-gbrain stops SIGTERM-looping
35-min restarts on big brains, /review stops shipping framework FPs the
reviewer never grep'd.

18 fixes total — 15 community PRs + 3 self-filed silent-failure issues
(#1624, #1611, #1539) — in one bundled PR with 26 bisect commits and 7
new regression test files. Every wave-touched test file passes in
isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): bump v1.43.0.0 → v1.43.2.0 for queue collision

CI check-version-stale flagged v1.43.0.0 already claimed by PR #1574
(garrytan/colombo-v3). PR #1639 (garrytan/muscat-v3) claims v1.43.1.0.
Next available MINOR slot is v1.43.2.0.

Bump VERSION + package.json + CHANGELOG entry header. No behavior
changes — purely re-versioning to clear the queue collision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com>
Co-authored-by: Andrey Esipov <andrey.esipov@outlook.com>
Co-authored-by: David Foy <davidfoy@users.noreply.github.com>
Co-authored-by: mikeangstadt <mike.angstadt@closedloop.ai>
Co-authored-by: 0xDevNinja <manmit0x@gmail.com>
Co-authored-by: techcenter68 <techcenter68@users.noreply.github.com>
Co-authored-by: shohu <shohu33@gmail.com>
Co-authored-by: Bharat <bharat@theysaid.io>
Co-authored-by: Matteo Hertel <info@matteohertel.com>
2026-05-21 21:21:07 -07:00
Garry Tan 029356e1f0 v1.42.2.0 fix wave: browse launch hardening (2 bug fixes + headed exit-code wiring) (#1629)
* v1.42.1.1 fix wave: browse launch hardening (2 bug fixes + headed exit-code wiring)

Bundles two browse launch-path bug fixes plus the missing exit-code wiring
that made the second fix actually work end-to-end.

PR #1617 — Chromium sandbox policy at all 3 launch sites
- shouldEnableChromiumSandbox() centralizes the Win32 / CI / CONTAINER /
  root heuristic that previously lived only in the headless launch path.
- launch(), launchHeaded() / launchPersistentContext(), and handoff() now
  share the policy so Playwright stops auto-adding --no-sandbox on every
  headed launch and the yellow "unsupported command-line flag" infobar
  disappears on macOS and Linux dev.

PR #1626 — clean Cmd+Q stops triggering supervisor respawn
- resolveDisconnectCause(browser) reads the underlying Chromium
  ChildProcess exitCode + signalCode (with a 1s wait for an async exit
  event) to distinguish clean user-quit from crash.
- handleChromiumDisconnect(browser) dispatches the headless launch()
  disconnect path: clean → exit(0), crash → exit(1).
- launchHeaded() disconnect handler resolves cause inline and computes
  exitCode = 0 (clean) | 2 (crash) before forwarding to onDisconnect.
- handoff() disconnect handler uses the same shared helper.

Codex-caught propagation fix (this commit, not in either source PR)
- BrowserManager.onDisconnect signature widened to accept an exitCode
  argument. Without this, launchHeaded's locally-computed exit code was
  dropped before reaching server.ts.
- browse/src/server.ts:688 — onDisconnect callback now forwards the
  resolved code: (code) => activeShutdown?.(code ?? 2). The ?? 2
  preserves legacy crash semantics for callers that invoke onDisconnect
  without an explicit code.

Tests
- browse/test/browser-manager-unit.test.ts goes from 2 → 17 tests.
- 6 new tests pin shouldEnableChromiumSandbox across darwin / linux /
  win32 / CI / CONTAINER / root.
- 7 new tests pin resolveDisconnectCause across already-exited,
  async-exit, SIGSEGV, SIGKILL, and null-browser.
- 2 new tests (this commit) pin the onDisconnect(exitCode) propagation
  contract including the exact server.ts forwarding callback shape so a
  refactor that drops the forward fails CI before the user-visible
  respawn bug returns.

Refs PRs #1617, #1626; companion gbrowser PR #23.

* chore: bump version v1.42.1.1 → v1.42.2.0

User-requested rebump (claims v1.42.2.0 slot on the queue).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 19:30:08 -07:00
Garry Tan 7665adf4fe feat: headed mode + sidebar agent + Chrome extension (v0.12.0) (#517)
* feat: CDP connect — control real Chrome/Comet via Playwright

Add `connectCDP()` to BrowserManager: connects to a running browser via
Chrome DevTools Protocol. All existing browse commands work unchanged
through Playwright's abstraction layer.

- chrome-launcher.ts: browser discovery, CDP probe, auto-relaunch with rollback
- browser-manager.ts: connectCDP(), mode guards (close/closeTab/recreateContext/handoff),
  auto-reconnect on browser restart, getRefMap() for extension API
- server.ts: CDP branch in start(), /health gains mode field, /refs endpoint,
  idle timer only resets on /command (not passive endpoints)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: browse connect/disconnect/focus CLI commands

- connect: pre-server command that discovers browser, starts server in CDP mode
- disconnect: drops CDP connection, restarts in headless mode
- focus: brings browser window to foreground via osascript (macOS)
- status: now shows Mode: cdp | launched | headed
- startServer() accepts extra env vars for CDP URL/port passthrough

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: CDP-aware skill templates — skip cookie import in real browser mode

Skills now check `$B status` for CDP mode and skip:
- /qa: cookie import prompt, user-agent override, headless workarounds
- /design-review: cookie import for authenticated pages
- /setup-browser-cookies: returns "not needed" in CDP mode

Regenerated SKILL.md files from updated templates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: activity streaming — SSE endpoint for Chrome extension Side Panel

Real-time browse command feed via Server-Sent Events:
- activity.ts: ActivityEntry type, CircularBuffer (capacity 1000), privacy
  filtering (redacts passwords, auth tokens, sensitive URL params),
  cursor-based gap detection, async subscriber notification
- server.ts: /activity/stream SSE, /activity/history REST, handleCommand
  instrumented with command_start/command_end events
- 18 unit tests for filterArgs privacy, emitActivity, subscribe lifecycle

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Chrome extension Side Panel + Conductor API proposal

Chrome extension (Manifest V3, sideload):
- Side Panel with live activity feed, @ref overlays, dark terminal aesthetic
- Background worker: health polling, SSE relay, ref fetching
- Popup: port config, connection status, side panel launcher
- Content script: floating ref panel with @ref badges

Conductor API proposal (docs/designs/CONDUCTOR_SESSION_API.md):
- SSE endpoint for full Claude Code session mirroring in Side Panel
- Discovery via HTTP endpoint (not filesystem — extensions can't read files)

TODOS.md: add $B watch, multi-agent tabs, cross-platform CDP, Web Store publishing.
Mark CDP mode as shipped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: detect Conductor runtime, skip osascript quit for sandboxed apps

macOS App Management blocks Electron apps (Conductor) from quitting
other apps via osascript. Now detects the runtime environment:
- terminal/claude-code/codex: can manage apps freely
- conductor: prints manual restart instructions + polls for 60s

detectRuntime() checks env vars and parent process. When Chrome needs
restart but we can't quit it, prints step-by-step instructions and
waits for the user to restart Chrome with --remote-debugging-port.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: detect Conductor via actual env vars (CONDUCTOR_WORKSPACE_NAME)

Previous detection checked CONDUCTOR_WORKSPACE_ID which doesn't exist.
Conductor sets CONDUCTOR_WORKSPACE_NAME, CONDUCTOR_BIN_DIR, CONDUCTOR_PORT,
and __CFBundleIdentifier=com.conductor.app. Check these FIRST because
Conductor sessions also have ANTHROPIC_API_KEY (which was matching claude-code).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: connection status pill — floating indicator when gstack controls Chrome

Small pill in bottom-right corner of every page: "● gstack · 3 refs"
Shows when connected via CDP, fades to 30% opacity after 3s, full on hover.
Disappears entirely when disconnected.

Background worker now notifies content scripts on connect/disconnect state
changes so the pill appears/disappears without polling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Chrome requires --user-data-dir for remote debugging

Chrome refuses --remote-debugging-port without an explicit --user-data-dir.
Add userDataDir to BrowserBinary registry (macOS Application Support paths)
and pass it in both auto-launch and manual restart instructions.

Fix double-quoting in CLI manual restart instructions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Chrome must be fully quit before launching with --remote-debugging-port

Chrome refuses to enable CDP on its default profile when another instance
is running (even with explicit --user-data-dir). The only reliable path:
fully quit Chrome first, then relaunch with the flag.

Updated instructions to emphasize this clearly with verification step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: bin/chrome-cdp — quit Chrome and relaunch with CDP in one command

Quits Chrome gracefully, waits for full exit, relaunches with
--remote-debugging-port, polls until CDP is ready. Usage: chrome-cdp [port]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use Playwright channel:chrome instead of broken connectOverCDP

Playwright's connectOverCDP hangs with Chrome 146 due to CDP protocol
version mismatch. Switch to channel:'chrome' which uses Playwright's
native pipe protocol to launch the system Chrome binary directly.

This is simpler and more reliable:
- No CDP port discovery needed
- No --remote-debugging-port or --user-data-dir hassles
- $B connect just works — launches real Chrome headed window
- All Playwright APIs (snapshot, click, fill) work unchanged

bin/chrome-cdp updated with symlinked profile approach (kept for
manual CDP use cases, but $B connect no longer needs it).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: green border + gstack label on controlled Chrome window

Injects a 2px green border and small "gstack" label on every page
loaded in the controlled Chrome window via context.addInitScript().
Users can instantly tell which Chrome window Claude controls.

Also fixes close() for channel:chrome mode (uses browser.close()
not browser.disconnect() which doesn't exist).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: cleanup chrome-launcher runtime detection, remove puppeteer-core dep

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(design): redesign controlled Chrome indicator

Replace crude green border + label with polished indicator:
- 2px shimmer gradient at top edge (green→cyan→green, 3s loop)
- Floating pill bottom-right with frosted glass bg, fades to 25%
  opacity after 4s so it doesn't compete with page content
- prefers-reduced-motion disables shimmer animation
- Much more subtle — looks like a developer tool, not broken CSS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: document real browser mode + Chrome extension in BROWSER.md and README.md

BROWSER.md: new sections for connect/disconnect/focus commands,
Chrome extension Side Panel install, CDP-aware skills, activity streaming.
Updated command reference table, key components, env vars, source map.

README.md: updated /browse description, added "Real browser mode" to
What's New section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: step-by-step Chrome extension install guide in BROWSER.md

Replace terse bullet points with numbered walkthrough covering:
developer mode toggle, load unpacked, macOS file picker tip (Cmd+Shift+G),
pin extension, configure port, open side panel. Added troubleshooting section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add Cmd+Shift+. tip for hidden folders in macOS file picker

macOS hides folders starting with . by default. Added both shortcuts:
Cmd+Shift+G (paste path directly) and Cmd+Shift+. (show hidden files).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: integrate hidden folder tips into the install flow naturally

Move Cmd+Shift+G and Cmd+Shift+. tips inline with the file picker
step instead of as a separate tip block after it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: auto-load Chrome extension when $B connect launches Chrome

Extension auto-loads via --load-extension flag — no manual chrome://extensions
install needed. findExtensionPath() checks repo root, global install, and dev
paths. Also adds bin/gstack-extension helper for manual install in regular
Chrome, and rewrites BROWSER.md install docs with auto-load as primary path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /connect-chrome skill — one command to launch Chrome with Side Panel

New skill that runs $B connect, verifies the connection, guides the user
to open the Side Panel, and demos the live activity feed. Extension auto-loads
via --load-extension so no manual chrome://extensions install needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use launchPersistentContext for Chrome extension loading

Playwright's chromium.launch() silently ignores --load-extension.
Switch to launchPersistentContext with ignoreDefaultArgs to remove
--disable-extensions flag. Use bundled Chromium (real Chrome blocks
unpacked extensions). Fixed port 34567 for CDP mode so the extension
auto-connects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: sync extension to DESIGN.md — amber accent, zinc neutrals, grain texture

Import design system from gstack-website. Update all extension colors:
green (#4ade80) → amber (#F59E0B/#FBBF24), zinc gray neutrals, grain
texture overlay. Regenerate icons as amber "G" monogram on dark background.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: sidebar chat with Claude Code — icon opens side panel directly

Replace popup flyout with direct side panel open on icon click. Primary
UI is now a chat interface that sends messages to Claude Code via file
queue. Activity/Refs tabs moved behind a debug toggle in the footer.
Command bar with history, auto-poll for responses, amber design system.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: sidebar agent — Claude-powered chat backend via file queue

Add /sidebar-command, /sidebar-response, and /sidebar-chat endpoints
to the browse server. sidebar-agent.ts watches the command queue file,
spawns claude -p with browse context for each message, and streams
responses back to the sidebar chat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove duplicate gstack pill overlay, hide crash restore bubble

The addInitScript indicator and the extension's content script were both
injecting bottom-right pills, causing duplicates. Remove the pill from
addInitScript (extension handles it). Replace --restore-last-session with
--hide-crash-restore-bubble to suppress the "Chromium didn't shut down
correctly" dialog.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: state file authority — CDP server cannot be silently replaced

Hardens the connect/disconnect lifecycle:
- ensureServer() refuses to auto-start headless when CDP server is alive
- $B connect does full cleanup: SIGTERM → 2s → SIGKILL, profile locks, state
- shutdown() cleans Chromium SingletonLock/Socket/Cookie files
- uncaughtException/unhandledRejection handlers do emergency cleanup

This prevents the bug where a headless server overwrites the CDP server's
state file, causing $B commands to hit the wrong browser.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: sidebar agent streaming events + session state management

Enhance sidebar-agent.ts with:
- Live streaming of claude -p events (tool_use, text, result) to sidebar
- Session state file for BROWSE_STATE_FILE propagation to claude subprocess
- Improved logging (stderr, exit codes, event types)
- stdin.end() to prevent claude waiting for input
- summarizeToolInput() with path shortening for compact sidebar display

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: sidebar chat UI — streaming events, agent status, reconnect retry

Sidebar panel improvements:
- Chat tab renders streaming agent events (tool_use, text, result)
- Thinking dots animation while agent processes
- Agent error display with styled error blocks
- tryConnect() with 2s retry loop for initial connection
- Debug tabs (Activity/Refs) hidden behind gear toggle
- Clear chat button
- Compact tool call display with path shortening

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: server-integrated sidebar agent with sessions and message queue

Move the sidebar agent from a separate bun process into server.ts:
- Agent spawns claude -p directly when messages arrive via /sidebar-command
- In-memory chat buffer backed by per-session chat.jsonl on disk
- Session manager: create, load, persist, list sessions
- Message queue (cap 5) with agent status tracking (idle/processing/hung)
- Stop/kill endpoints with queue dismiss support
- /health now returns agent status + session info
- All sidebar endpoints require Bearer auth
- Agent killed on server shutdown
- 120s timeout detects hung claude processes

Eliminates: file-queue polling, separate sidebar-agent.ts process,
stale auth tokens, state file conflicts between processes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: extension auth + token flow for server-integrated agent

Update Chrome extension to use Bearer auth on all sidebar endpoints:
- background.js captures auth token from /health, exposes via getToken msg
- background.js sets openPanelOnActionClick for direct side panel access
- sidepanel.js gets token from background, sends in all fetch headers
- Health broadcasts include token so sidebar auto-authenticates
- Removes popup from manifest — icon click opens side panel directly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: self-healing sidebar — reconnect banner, state machine, copy button

Sidebar UI now handles disconnection gracefully:
- Connection state machine: connected → reconnecting → dead
- Amber pulsing banner during reconnect (2s retry, 30 attempts)
- Red "Server offline" banner with Reconnect + Copy /connect-chrome buttons
- Green "Reconnected" toast that fades after 3s on successful reconnect
- Copy button lets user paste /connect-chrome into any Claude Code session

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: crash handling — save session, kill agent, distinct exit codes

Hardened shutdown/crash behavior:
- Browser disconnect exits with code 2 (distinct from crash code 1)
- emergencyCleanup kills agent subprocess and saves session state
- Clean shutdown saves session before exit (chat history persists)
- Clear user message on browser disconnect: "Run $B connect to reconnect"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: worktree-per-session isolation for sidebar agent

Each sidebar session gets an isolated git worktree so the agent's file
operations don't conflict with the user's working directory:
- createWorktree() creates detached HEAD worktree in ~/.gstack/worktrees/
- Falls back to main cwd for non-git repos or on creation failure
- Handles collision cleanup from prior crashes
- removeWorktree() cleans up on session switch and shutdown
- worktreePath persisted in session.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(qa): ISSUE-001 — disconnect blocked by CDP guard in ensureServer

$B disconnect was routed through ensureServer() which refused to start a
headless server when a CDP state file existed. Disconnect is now handled
before ensureServer() (like connect), with force-kill + cleanup fallback
when the CDP server is unresponsive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve claude binary path for daemon-spawned agent

The browse server runs as a daemon and may not inherit the user's shell
PATH. Add findClaudeBin() that checks ~/.local/bin/claude (standard
install location), which claude, and common system paths. Shows a clear
error in the sidebar chat if claude CLI is not found.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve claude symlinks + check Conductor bundled binary

posix_spawn fails on symlinks in compiled bun binaries. Now:
- Checks Conductor app's bundled binary first (not a symlink)
- Scans ~/.local/share/claude/versions/ for direct versioned binaries
- Uses fs.realpathSync() to resolve symlinks before spawning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: compiled bun binary cannot posix_spawn — use external agent process

Compiled bun binaries fail posix_spawn on ALL executables (even /bin/bash).
The server now writes to an agent queue file, and a separate non-compiled
bun process (sidebar-agent.ts) reads the queue, spawns claude, and POSTs
events back via /sidebar-agent/event.

Changes:
- server.ts: spawnClaude writes to queue file instead of spawning directly
- server.ts: new /sidebar-agent/event endpoint for agent → server relay
- server.ts: fix result event field name (event.text vs event.result)
- sidebar-agent.ts: rewritten to poll queue file, relay events via HTTP
- cli.ts: $B connect auto-starts sidebar-agent as non-compiled bun process

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: loading spinner on sidebar open while connecting to server

Shows an amber spinner with "Connecting..." when the sidebar first opens,
replacing the empty state. After the first successful /sidebar-chat poll:
- If chat history exists: renders it immediately
- If no history: shows the welcome message

Prevents the jarring empty-then-populated flash on sidebar open.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: zero-friction side panel — auto-open on install, pill is clickable

Three changes to eliminate manual side panel setup:
- Auto-open side panel on extension install/update (onInstalled listener)
- gstack pill (bottom-right) is now clickable — opens the side panel
- Pill has pointer-events: auto so clicks always register (was: none)

User no longer needs to find the puzzle piece icon, pin the extension,
or know the side panel exists. It opens automatically on first launch
and can be re-opened by clicking the floating gstack pill.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: kill CDP naming, delete chrome-launcher.ts dead code

The connectCDP() method and connectionMode: 'cdp' naming was a legacy
artifact — real Chrome was tried but failed (silently blocks
--load-extension), so the implementation already used Playwright's
bundled Chromium via launchPersistentContext(). The naming was
misleading.

Changes:
- Delete chrome-launcher.ts (361 LOC) — only import was in unreachable
  attemptReconnect() method
- Delete dead attemptReconnect() and reconnecting field
- Delete preExistingTabIds (was for protecting real Chrome tabs we
  never connect to)
- Rename connectCDP() → launchHeaded()
- Rename connectionMode: 'cdp' → 'headed' across all files
- Replace BROWSE_CDP_URL/BROWSE_CDP_PORT env vars with BROWSE_HEADED=1
- Regenerate SKILL.md files for updated command descriptions
- Move BrowserManager unit tests to browser-manager-unit.test.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: converge handoff into connect — extension loads on handoff

Handoff now uses launchPersistentContext() with extension auto-loading,
same as the connect/launchHeaded() path. This means when the agent
gets stuck (2FA, CAPTCHA) and hands off to the user, the Chrome
extension + side panel are available automatically.

Before: handoff used chromium.launch() + newContext() — no extension
After: handoff uses chromium.launchPersistentContext() — extension loads

Also sets connectionMode to 'headed' and disables dialog auto-accept
on handoff, matching connect behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: gate sidebar chat behind --chat flag

$B connect (default): headed Chromium + extension with Activity + Refs
tabs only. No separate agent spawned. Clean, no confusion.

$B connect --chat: same + Chat tab with standalone claude -p agent.
Shows experimental banner: "Standalone mode — this is a separate
agent from your workspace."

Implementation:
- cli.ts: parse --chat, set BROWSE_SIDEBAR_CHAT env, conditionally
  spawn sidebar-agent
- server.ts: gate /sidebar-* routes behind chatEnabled, return 403
  when disabled, include chatEnabled in /health response
- sidepanel.js: applyChatEnabled() hides/shows Chat tab + banner
- background.js: forward chatEnabled from health response
- sidepanel.html/css: experimental banner with amber styling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: file drop relay + $B inbox command

Sidebar agent now writes structured messages to .context/sidebar-inbox/
when processing user input. The workspace agent can read these via
$B inbox to see what the user reported from the browser.

File drop format:
  .context/sidebar-inbox/{timestamp}-observation.json
  { type, timestamp, page: {url}, userMessage, sidebarSessionId }

Atomic writes (tmp + rename) prevent partial reads. $B inbox --clear
removes messages after display.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: $B watch — passive observation mode

Claude enters read-only mode and captures periodic snapshots (every 5s)
while the user browses. Mutation commands (click, fill, etc.) are
blocked during watch. $B watch stop exits and returns a summary with
the last snapshot.

Requires headed mode ($B connect). This is the inverse of the scout
pattern — the workspace agent watches through the browser instead of
the sidebar relaying to it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add coverage for sidebar-agent, file-drop, and watch mode

33 new tests covering:
- Sidebar agent queue parsing (valid/malformed/empty JSONL)
- writeToInbox file drop (directory creation, atomic writes, JSON format)
- Inbox command (display, sorting, --clear, malformed file handling)
- Watch mode state machine (start/stop cycles, snapshots, duration)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: TODOS cleanup + Chrome vs Chromium exploration doc

- Update TODOS.md: mark CDP mode, $B watch, sidebar scout as SHIPPED
- Delete dead "cross-platform CDP browser discovery" TODO
- Rename dependencies from "CDP connect" to "headed mode"
- Add docs/designs/CHROME_VS_CHROMIUM_EXPLORATION.md memorializing
  the architecture exploration and decision to use Playwright Chromium

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add Conductor Chrome sidebar integration design doc

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sidebar-agent validates cwd before spawning claude

The queue entry may reference a worktree that was cleaned up between
sessions. Now falls back to process.cwd() if the path doesn't exist,
preventing silent spawn failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gen-skill-docs resolver merge + preamble tier gate + plan file discovery

The local RESOLVERS record in gen-skill-docs.ts was shadowing the imported
canonical resolvers, causing stale test coverage and preamble generators
to be used instead of the authoritative versions in resolvers/.

Changes:
- Merge imported RESOLVERS with local overrides (spread + override pattern)
- Fix preamble tier gate: tier 1 skills no longer get AskUserQuestion format
- Make plan file discovery host-agnostic (search multiple plan dirs)
- Add missing E2E tier entries for ship/review plan completion tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: ungate sidebar agent + raise timeout to 5 minutes (v0.12.0)

Sidebar chat is now always available in headed mode — no --chat flag needed.
Agent tasks get 5 minutes instead of 2, enabling multi-page workflows like
navigating directories and filling forms across pages.

Changes:
- cli.ts: remove --chat flag, always set BROWSE_SIDEBAR_CHAT=1, always spawn agent
- server.ts: remove chatEnabled gate (403 response), raise AGENT_TIMEOUT_MS to 300s
- sidebar-agent.ts: raise child process timeout from 120s to 300s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: headed mode + sidebar agent documentation (v0.12.0)

- README: sidebar agent section, personal automation example (school parent
  portal), two auth paths (manual login + cookie import), DevTools MCP mention
- BROWSER.md: sidebar agent section with usage, timeout, session isolation,
  authentication, and random delay documentation
- connect-chrome template: add sidebar chat onboarding step
- CHANGELOG: v0.12.0 entry covering headed mode, sidebar agent, extension
- VERSION: bump to 0.12.0.0
- TODOS: Chrome DevTools MCP integration as P0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files

Generated from updated templates + resolver merge. Key changes:
- Tier 1 skills no longer include AskUserQuestion format section
- Ship/review skills now include coverage gate with thresholds
- Connect-chrome skill includes sidebar chat onboarding step
- Plan file discovery uses host-agnostic paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate Codex connect-chrome skill

Updated preamble with proactive prompt and sidebar chat onboarding step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: network idle, state persistence, iframe support, chain pipe format (v0.12.1.0) (#516)

* feat: network idle detection + chain pipe format

- Upgrade click/fill/select from domcontentloaded to networkidle wait
  (2s timeout, best-effort). Catches XHR/fetch triggered by interactions.
- Add pipe-delimited format to chain as JSON fallback:
  $B chain 'goto url | click @e5 | snapshot -ic'
- Add post-loop networkidle wait in chain when last command was a write.
- Frame-aware: commands use target (getActiveFrameOrPage) for locator ops,
  page-only ops (goto/back/forward/reload) guard against frame context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: $B state save/load + $B frame — new browse commands

- state save/load: persist cookies + URLs to .gstack/browse-states/{name}.json
  File perms 0o600, name sanitized to [a-zA-Z0-9_-]. V1 skips localStorage
  (breaks on load-before-navigate). Load replaces session via closeAllPages().
- frame: switch command context to iframe via CSS selector, @ref, --name, or
  --url. 'frame main' returns to main frame. Execution target abstraction
  (getActiveFrameOrPage) across read-commands, snapshot, and write-commands.
- Frame context cleared on tab switch, navigation, resume, and handoff.
- Snapshot shows [Context: iframe src="..."] header when in frame.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add tests for network idle, chain pipe format, state, and frame

- Network idle: click on fetch button waits for XHR, static click is fast
- Chain pipe: pipe-delimited commands, quoted args, JSON still works
- State: save/load round-trip, name sanitization, missing state error
- Frame: switch to iframe + back, snapshot context header, fill in frame,
  goto-in-frame guard, usage error

New fixtures: network-idle.html (fetch + static buttons), iframe.html (srcdoc)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: review fixes — iframe ref scoping, detached frame recovery, state validation

- snapshot.ts: ref locators, cursor-interactive scan, and cursor locator
  now use target (frame-aware) instead of page — fixes @ref clicking in iframes
- browser-manager.ts: getActiveFrameOrPage auto-recovers from detached frames
  via isDetached() check
- meta-commands.ts: state load resets activeFrame, elementHandle disposed after
  contentFrame(), state file schema validation (cookies + pages arrays),
  filter empty pipe segments in chain tokenizer
- write-commands.ts: upload command uses target.locator() for frame support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files + rebuild binary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.12.1.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 11:15:24 -06:00