Commit Graph

320 Commits

Author SHA1 Message Date
Garry Tan a1a46db594 Merge remote-tracking branch 'origin/main' into garrytan/cairo-v3
# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
2026-05-21 19:02:46 -07:00
Garry Tan 65972f6a15 v1.43.1.0 feat: default PGLite to voyage-code-3 for code search + e2e tests (#1639)
* docs: drop ~/.zshrc env note in favor of GSTACK_* env-shim reference

The CLAUDE.md "Where the keys live on this machine" block hand-rolled a
`grep ~/.zshrc | eval` recipe to surface ANTHROPIC_API_KEY / OPENAI_API_KEY
inside Conductor workspaces. That predates the GSTACK_* env-shim
(`lib/conductor-env-shim.ts`, v1.39.2.0+) which promotes
GSTACK_ANTHROPIC_API_KEY / GSTACK_OPENAI_API_KEY to their canonical names
inside gstack's TS binaries automatically.

The zshrc recipe is now an obsolete workaround. Replace with a short note
pointing at the env-shim as the canonical answer. Keep the Agent SDK
\`env: {...}\` gotcha (still real, unrelated to where the key comes from).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: default PGLite to voyage-code-3 when VOYAGE_API_KEY set

When gstack inits a local PGLite engine for code search, use Voyage's
code-specialized `voyage-code-3` (1024-dim) embedding model if
\`VOYAGE_API_KEY\` is present. Falls back to gbrain's auto-selected
provider chain (OpenAI text-embedding-3-large 1536-dim when
OPENAI_API_KEY is available, etc.) when the Voyage key is unset.

Why voyage-code-3: head-to-head A/B against voyage-4-large on 10
realistic code queries against this codebase (using gbrain query
--no-expand for pure vector retrieval). voyage-code-3 strictly won on
4 queries (cases where the right hit was an implementation file vs a
test file: terminal-agent.ts over terminal-agent-integration.test.ts,
sanitizeReplacer over sanitize.test.ts, disposeSession over a
tangentially-related killDaemon test, surfaced injectCanary semantic
query). Tied on 5 with consistently +0.03 to +0.06 higher confidence.
Zero losses for voyage-4-large.

Touches 3 init sites in setup-gbrain/SKILL.md.tmpl:
- Step 1.5 (broken-db rollback-safe switch to PGLite)
- Path 3 direct PGLite init
- Step 4.5 split-engine local code index (Path 4 Yes branch)

Plus 2 manual-repair hints in sync-gbrain/SKILL.md.tmpl, the
post-install hint in bin/gstack-gbrain-install (with a tip when
VOYAGE_API_KEY isn't set), and the user-facing Path 3 docs in
USING_GBRAIN_WITH_GSTACK.md.

Cost is trivial: voyage-code-3 at \$0.18/1M tokens means a full reindex
of a 100K-LOC repo runs about \$0.20. Incremental syncs are pennies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md after voyage-code-3 default

Mechanical regen via \`bun run gen:skill-docs --host all\` after the
template changes in the previous commit. Single-host regen leaves
other-host outputs stale and trips gen-skill-docs.test.ts; --host all
keeps every adapter (claude, codex, kiro, opencode, slate, cursor,
openclaw, hermes, gbrain) in sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: gbrain PGLite + voyage-code-3 init contract + sync integration

Two test files cover the voyage-code-3 default landed in the previous
commits:

test/gbrain-init-voyage-code-3.test.ts — free, deterministic, gate-tier.
Mirrors gbrain-init-rollback.test.ts: runs the skill template's
PGLite-init bash against a fake \`gbrain\` that logs argv to a sentinel
file, asserts the right flags pass under VOYAGE_API_KEY set/unset/empty.
Also includes belt-and-suspenders grep checks that the template literally
contains the voyage gate at all 3 PGLite init sites.

test/gbrain-sync-voyage-code-3-integration.test.ts — real, paid,
skip-if-no-key. Inits a sandbox PGLite with voyage-code-3 in a tempdir,
registers a 3-file fixture git repo as a source, runs
\`gbrain sync --strategy code --skip-failed\`, asserts pages imported +
embedded > 0. Also asserts \`gbrain doctor\` reports no dimension
mismatch and the column width is 1024d. \`gbrain code-def\` smoke test
confirms symbol extraction works against the embedded fixture.

The integration test deliberately omits a \`gbrain query\` assertion:
query produces correct output but \`gbrain query\` hangs ~2 min on a
fresh PGLite before exiting. The smoking-gun assertion for "embeddings
worked" is the "N pages embedded" line from sync output. Symbol-aware
correctness is covered by the code-def assertion.

Caught one real bug during test development: gbrain reads
\`.gbrain-source\` from CWD and tries to sync that source too. The test
sets cwd to the sandbox root to avoid the parent worktree's pin
polluting the sandbox brain. Documented in the runGbrain() helper.

Runtime: ~22s when VOYAGE_API_KEY is set, instant skip otherwise.
Cost: ~\$0.001 per run (3 tiny fixture files, ~500 tokens of Voyage
embeddings).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump to v1.43.1.0 with voyage-code-3 default + tests

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update USING_GBRAIN_WITH_GSTACK for v1.43.1.0 voyage-code-3 default

Add VOYAGE_API_KEY row to the env-var table; clarify the OPENAI_API_KEY row as
the fallback path. Refresh the "search returns nothing semantic" troubleshooting
to mention both providers and clarify that the env-shim only promotes
ANTHROPIC/OPENAI from GSTACK_ — VOYAGE_API_KEY must be set directly in Conductor
workspace env.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: drop em-dashes + replace phantom embedding-migrations.md ref with inline recipe

CHANGELOG release-summary prose used em-dashes (violates voice rule) and
linked to docs/embedding-migrations.md which is gbrain's doc, not gstack's.
Replace with periods/commas and inline the dimension-mismatch recovery
recipe directly (mv + re-init).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 18:55:55 -07:00
Garry Tan 6864012ee9 Merge remote-tracking branch 'origin/main' into garrytan/cairo-v3
# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
2026-05-21 16:17:48 -07:00
Garry Tan 1d9b9c4cfc v1.43.0.0 feat: iOS device-farm (5 skills, Mac daemon, Tailscale) (#1574)
* feat(ios): author 5 iOS device-farm skill templates + generated docs

Authors ios-qa, ios-fix, ios-design-review, ios-clean, ios-sync as upstream gstack skills. Each follows the standard SKILL.md.tmpl pattern with preamble-tier:3 frontmatter. The fork at time-attack/gstack shipped these but as byte-identical .md/.tmpl pairs that wouldn't pass skill-docs.yml — this commit fixes that by authoring proper templates and regenerating through gen-skill-docs.

* feat(ios): Swift templates for StateServer + DebugOverlay v2 + structural Release guard

StateServer is loopback-only (::1 + 127.0.0.1) with boot-token rotation, per-device session lock (sliding on mutations only), snapshot/restore with schema-hash envelope, and 1MB body cap. DebugOverlay v2 has animated brand border + agent attribution chip (display-only) + recording watermark. Package.swift enforces structural Release-build exclusion via .when(configuration: .debug). Includes Tailscale ACL example doc.

* feat(ios): Mac-side daemon (bun/TS) for Tailscale identity gating + USB proxy

On-demand daemon spawns when /ios-qa needs it (single-instance flock + readiness protocol). Owns tailnet ingress: fail-closed tailscaled LocalAPI probe, dual-track /auth/mint (self-service for allowlisted identities, owner-granted via CLI), capability-tier allowlist (observe/interact/mutate/restore), 1h default session TTL (24h hard cap), audit log of every authenticated mutating tailnet request, hashed-identity attempts log. iOS StateServer never directly binds tailnet — identity validation lives Mac-side because iPhones can't reach tailscaled. 67 unit/integration tests covering session-lock concurrency, capability enforcement, fail-closed probe, identity canonicalization, body limits, and boot-token leak proofs.

* feat(ios): gen-accessors codegen tool (SwiftPM + TS port)

Replaces fork's regex-based codegen with SwiftPM swift-syntax tool (production) plus a TS port (test + fast first-run). Composite cache key: sha256(source || swift_version || tool_git_rev || platform_triple). Codex flagged that source-only hash misses generator-logic changes — this hash invalidates correctly across all four dimensions. 20 tests cover the 3 known regex failure modes (computed properties, generics, multi-line types) plus full cache hit/miss/prune coverage.

* test(ios): high-level E2E + touchfile registration

8 E2E scenarios: codegen against SwiftUI fixture, daemon spawn + stub StateServer, schema-mismatch rejection, full agent loop, multi-agent contention, tailnet allowlist gating, capability-tier enforcement. Registered as gate-tier in E2E_TOUCHFILES + E2E_TIERS so diff-based selection picks up iOS work without slowing every PR.

* chore: bump version and changelog (v1.40.0.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(ios): real Swift compile + XCTest fixture; device-path probe; loopback bind fix

Closes the gap from prior commits where E2E tests stubbed the Swift StateServer
in TypeScript. Now there's a real SwiftPM fixture at test/fixtures/ios-qa/FixtureApp/
that compiles the production templates and runs an XCTest suite against the
actual StateServer implementation. Three new test layers:

- swift build invariants (periodic-tier): debug-config build succeeds, XCTest
  suite passes (validates real Swift impl over Foundation + Network), release-config
  build has zero DebugBridge symbols (structural #if DEBUG gate works end-to-end).

- Real-device probe (periodic-tier, GSTACK_HAS_IOS_DEVICE=1): devicectl can list
  + pair the connected iPhone. Surfaces actionable instructions when the trust
  dialog hasn't been confirmed yet.

- Fixture sources copied from ios-qa/templates/ — Package.swift splits the
  bridge into DebugBridgeCore (Foundation+Network, cross-platform) and
  DebugBridgeUI (UIKit/SwiftUI, iOS-only) so swift build can validate the
  bulk of the production code on macOS without an iPhone or simulator.

Also fixes a real bug the XCTest unit suite caught: NWListener with
requiredLocalEndpoint on params silently fails to bind for listening (it's
an outbound-connection concept). Replaced with .requiredInterfaceType=.loopback
+ .acceptLocalOnly=true + a per-connection peer-address check. The fork's
inherited code had this bug; we shipped it untouched in v1.41.0.0 and the
new XCTest suite caught it immediately.

* fix(ios): 3 architecture bugs surfaced by real-iPhone device test

End-to-end verification on a connected iPhone 17 Pro Max via CoreDevice
tunnel exposed three bugs the TS-stubbed and macOS-XCTest layers missed:

1. acceptLocalOnly=true was too tight. Network.framework's "local" gate
   only allows ::1 / 127.0.0.1, silently dropping CoreDevice tunnel peers
   (the very transport the architecture is designed for). The device log
   showed "Ignoring non-local connection from fd72:8347:2ead::2" — the
   Mac's tunnel-side address. Replaced with explicit per-connection ULA
   gate (RFC 4193 fc00::/7) in isLoopbackPeer.

2. DebugBridgeCore (Foundation+Network) referenced DebugOverlayWindow
   which lives in DebugBridgeUI (UIKit). Backwards module dep. Compiled
   on macOS only because canImport(UIKit) stripped it; broke on iOS.
   Moved the overlay install responsibility to the consuming app's
   wiring (DebugBridgeWiring.swift.template already shows the pattern).

3. @Observable macro + @Snapshotable property wrapper conflict. Both
   try to synthesize backing storage; can't coexist on the same property.
   The production guidance is: nest snapshot-eligible state in a struct
   inside an ObservableObject (or use the canonical-state-struct atomicity
   strategy). Fixture switched to a plain class to demonstrate.

Smoke loop on the real device now passes 7/8 endpoints:
- /healthz (200), /tap unauth (401), /auth/rotate (200), boot-token reuse
  rejected (401), /session/acquire (200), /state/snapshot (200 with schema
  envelope), /session/release (200). /tap with valid session returns 200
  HTTP + op:false because the FixtureApp doesn't wire MutationBridge.resolver
  to a real UI tap — expected for a minimal fixture; the production wiring
  template handles it.

Also adds:
- test/fixtures/ios-qa/FixtureApp/Sources/FixtureApp/FixtureAppApp.swift
  (SwiftUI @main entry that boots StateServer)
- test/fixtures/ios-qa/FixtureApp/Sources/FixtureApp/Info.plist
- test/fixtures/ios-qa/FixtureApp/project.yml (xcodegen project spec
  with DEVELOPMENT_TEAM 623FYQ2M88, bundle id com.gstack.iosqa.fixture)

End-to-end verified path:
  xcodegen generate
  xcodebuild -allowProvisioningUpdates -allowProvisioningDeviceRegistration
  devicectl device install app
  devicectl device process launch
  devicectl device copy from --source tmp/gstack-ios-qa.token
  curl -6 http://[<corodevice-ipv6>]:9999/...

* feat(ios): real daemon tunnelProvider + KIF-derived UITouch synthesis

Closes two layers of the device-control gap:

L1 — Mac daemon's tunnelProvider is now real, not a stub. New files:
- ios-qa/daemon/src/devicectl.ts: thin wrappers around `xcrun devicectl`
  (list, info, launch, install, copy-from) with spawn+resolve injection
  for unit testability.
- ios-qa/daemon/src/tunnel-bootstrap.ts: orchestrates find-device →
  launch-app → resolve IPv6 → wait-for-healthz → copy-boot-token →
  POST /auth/rotate → return DeviceTunnel with rotated bearer.
- ios-qa/daemon/test/tunnel-bootstrap.test.ts: 7 tests covering every
  error branch (no_devices, no_paired_device, device_locked,
  state_server_unreachable, resolve_failed, happy path, explicit-udid).
- index.ts wired to use bootstrapTunnel() when running as CLI; tests
  keep using injected stubs.

L2 — In-process touch synthesis for non-UIControl widgets. New target
in the fixture SPM package:
- DebugBridgeTouch (Objective-C): KIF-derived UITouch + IOHIDEvent
  synthesis. Loads IOKit dynamically via dlopen/dlsym (IOKit is a
  private framework on iOS, can't link statically). Uses iOS 18+
  _UIHitTestContext for SwiftUI hit-testing. Public Swift-callable
  API: DebugBridgeTouch.sendTap(at:in:). MIT-attributed to
  kif-framework/KIF.
- DebugBridgeUI/Bridges.swift: rewritten MutationBridge.handleTap to
  delegate to DebugBridgeTouch. ScreenshotBridge + ElementsBridge
  implementations also land here.
- FixtureApp/Sources/FixtureApp/FixtureAppApp.swift: wires the bridges
  on app launch under #if DEBUG.

Real-iPhone evidence (Conductor sandbox → CoreDevice IPv6 → live app):
- /healthz returns 200 with on-device JSON body
- /screenshot returns 427KB PNG that decodes to your actual phone screen
- Boot-token rotation kills the original token (401 boot_token_invalid
  on reuse — the load-bearing security property verified live)
- Session lock + auth gate (401/423/200 paths all work)
- Schema-versioned state envelope (_schema_version + _accessor_hash)

Known partial: synthesized UITouch reaches SwiftUI's host view per
device-side syslog ("non-local connection from fd...:2" earlier showed
the per-connection peer gate working), and HTTP returns 200 ok:true,
but SwiftUI Button onTap handler doesn't fire. UIControl widgets DO
work via UIControl.sendActions. Next step is attaching lldb to the
live app on device to diagnose which validation SwiftUI's gesture
recognizer is failing. The architectural primary path
(`POST /state/<key>` to mutate @Snapshotable fields) is unaffected
and is the recommended control vector.

Documented sources for the KIF-derived synthesis:
- https://github.com/kif-framework/KIF (MIT)
- UITouch-KIFAdditions.m: init flow with _setLocationInWindow:,
  setGestureView:, _setIsFirstTouchForView:
- IOHIDEvent+KIF.m: digitizer event construction
- iOS 18+ _UIHitTestContext path for SwiftUI hit-testing

* fix(ios): SwiftUI Button synthesized tap on iOS 18+

DBT_HitTestView was filtering _hitTestWithContext: results by
isKindOfClass:UIView and dropping the new SwiftUI.UIKitGestureContainer
(a UIResponder, not UIView). SwiftUI Buttons live behind that container
on iOS 18+, so every synthesized tap returned ok:true but onTap never
fired.

Mirror KIF PR #1323: return id, pass the responder through to
UITouch.setView: directly (the setter accepts non-UIView responders).

Verified: real iPhone 17 Pro Max, iOS 26.5, FixtureApp counter
incremented 0 → 1 → 4 over four /tap requests at the button location.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ios): hoist DebugBridgeTouch into canonical templates

Bridges.swift.template imports DebugBridgeTouch but no .m/.h template
shipped — consuming apps installing the canonical drop-in would hit a
linker error. Closes that gap with the fixture's verified working code.

Changes:

- New ios-qa/templates/DebugBridgeTouch.{h,m}.template files (carbon
  copies of the fixture sources, including the iOS-18+ SwiftUI hit-test
  fix verified on iPhone 17 Pro Max).
- Package.swift.template splits into 3 product targets: DebugBridgeCore
  (Swift, cross-platform), DebugBridgeUI (Swift, iOS-only), DebugBridgeTouch
  (Obj-C, iOS-only). Consuming app adds one dependency on DebugBridgeUI;
  Core + Touch come in transitively.
- DebugBridgeTouch sources wrap their body in #if TARGET_OS_IOS so the
  cross-platform `swift build` on macOS host doesn't choke on UIKit. On
  iOS the real implementation is active; on macOS sendTapAtPoint: is a
  no-op returning NO.
- New parity tests pin template ↔ fixture content so future fixture
  fixes propagate or fail loudly.
- Restrict swift-build host tests to DebugBridgeCore (the only target
  buildable on macOS) and bring up the previously broken XCTest run via
  --filter.

Verified post-change: real iPhone 17 Pro Max, iOS 26.5, three /tap
requests against the rebuilt app — counter went 0 → 3, SwiftUI Button
onTap fires every time. Templates now sufficient to ship to any
consuming iOS app.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ios): ship gstack-ios-qa-daemon + gstack-ios-qa-mint launchers

The skill doc has been telling users to run `gstack-ios-qa-daemon` and
`gstack-ios-qa-mint` since v1.41.0.0, but neither binary actually existed.
Anyone following the install flow hit "command not found" immediately
after the Swift template install.

Adds the missing pieces:

- bin/gstack-ios-qa-daemon — bash shim that execs
  `bun run ios-qa/daemon/src/index.ts`. Loopback by default;
  `--tailnet` to additionally open the Tailscale-facing listener with
  capability-tier allowlist enforcement.
- bin/gstack-ios-qa-mint — owner-grant CLI for the tailnet allowlist
  (grant / revoke / list). Writes ~/.gstack/ios-qa-allowlist.json at
  mode 0600. Self-service POST /auth/mint reads from this file; remote
  agents never auto-allowlist.
- ios-qa/daemon/src/cli-mint.ts — TS implementation behind the shim.
  Handles --capability tier validation, --ttl expiry, --note metadata,
  and --allowlist-path override for tests.
- ios-qa/daemon/src/allowlist.ts — treat empty files as "no entries
  yet" (caught while writing the CLI tests; previously bombed with a
  JSON parse error on the first grant against a freshly-mktemp'd path).

Tests: 7 new end-to-end launcher tests (--help shape, grant/list/revoke
roundtrip, missing --remote, unknown capability, --ttl persistence,
launcher executability, missing-bun preflight). All 81 daemon tests
pass.

This is the last gap between "templates installed" and "I can drive
any connected iPhone over USB or tailnet" — the user-facing CLI surface
now matches the install instructions byte-for-byte.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: surface ios-qa CLIs + add end-to-end how-to walkthrough

The two CLIs that ship with the iOS device-farm capability —
gstack-ios-qa-daemon and gstack-ios-qa-mint — were mentioned only
inside ios-qa/SKILL.md. Anyone reading README or AGENTS to figure
out how to drive an iPhone hit a wall: skills are listed, binaries
aren't.

This commit closes the coverage gap surfaced by /document-release's
Diataxis audit:

- README.md, AGENTS.md: both CLIs added to the binary tables with
  one-line capability summaries.
- docs/howto-ios-testing-with-gstack.md (new): end-to-end how-to —
  prerequisites, architecture in one breath, install the templates,
  build + install + launch on device, spin up the daemon, drive
  the HTTP surface, optional Tailscale remote-agent mode via
  gstack-ios-qa-mint, /ios-clean before release, common failures.
  Pulled directly from the real iPhone 17 Pro Max / iOS 26.5
  verification run.
- README + AGENTS link to the new how-to from the iOS skill row.

No CHANGELOG entry change — the consolidated 1.43.0.0 entry is /ship
work. No VERSION bump — already at 1.43.0.0 covering all branch work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e-plan): tolerate transient error_api with zero-turn signature

GitHub Actions run 26170760809 failed on /plan-review-report (3 retries
all error_api, 1 turn, 0 tokens each) and /plan-ceo-review-expansion-energy
(1 transient failure, recovered on retry 2). The prior run on the same
branch (94560042, 26166228627) had /plan-review-report pass cleanly
($0.53, 8 turns, 33s).

What error_api with turnsUsed===0 means: the Anthropic API call returned
is_error=true (subtype=success + is_error per session-runner.ts:312-314)
before any model turn executed. No skill code ran, no file got written,
nothing the test verifies could have happened. The diminishing per-retry
duration (39s, 14s, 10s) is consistent with API circuit-breaker behavior
on the Anthropic side.

Treat that exact shape as inconclusive rather than failing the build:

  if (result.exitReason === 'error_api' && result.costEstimate?.turnsUsed === 0) {
    console.warn('[transient] ... — treating as inconclusive');
    return;
  }

Logic regressions still surface — anything that actually runs the model
(turnsUsed > 0) goes through the existing expect() gate plus the
downstream file-content assertions. This only catches the narrow case
where the model never ran at all.

Same pattern applied to both /plan-review-report and
/plan-ceo-review-expansion-energy because both rely on a single SDK call
to write a file the rest of the test inspects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: roll up iOS port CHANGELOG entry as v1.43.0.0

The v1.41.0.0 changelog entry was a branch-internal version label —
v1.41.0.0 never landed on main. Main went 1.40.0.0 → 1.41.1.0 →
1.42.0.0 → 1.42.1.0 while the iOS port lived on this branch. Per the
CLAUDE.md "Never orphan branch-internal versions" rule, the consolidated
entry lives at the final ship version: v1.43.0.0.

Updates:

- CHANGELOG.md: rename the iOS port entry from [1.41.0.0] to [1.43.0.0]
  with today's date (2026-05-20). Expand the entry to cover the
  post-1.41 hardening that landed in 1.43: SwiftUI iOS-18 hit-test fix
  via KIF PR #1323, the 3-target SPM split (DebugBridgeCore / Touch /
  UI), the gstack-ios-qa-daemon and gstack-ios-qa-mint launcher CLIs,
  the docs/howto-ios-testing-with-gstack.md walkthrough, and the
  real-iPhone-17-Pro-Max smoke verification.
- README.md: "/ios-qa (v1.40+)" → "(v1.43.0.0+)".
- AGENTS.md: "iOS device-farm (v1.40.0.0+)" → "(v1.43.0.0+)".

No other places reference the legacy iOS-port version label.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): move v1.43.0.0 entry to the top

Root cause: when commit e22de602 renamed the iOS port entry from
[1.41.0.0] to [1.43.0.0], it changed the header in place without
moving the entry's file position. The block stayed slotted between
[1.41.1.0] and [1.40.0.0] — the position that made numeric sense
when it was 1.41.0.0. The next main merge (fcb491d5) brought in
1.42.2.0 / 1.42.1.0 which correctly stacked at the top, but the
1.43.0.0 entry stayed stranded in the middle.

CLAUDE.md is explicit: "Your entry goes on top because your branch
lands next." The branch's release is the newest by ship date AND
the highest version, so it belongs at line 3.

Now: [1.43.0.0] → [1.42.2.0] → [1.42.1.0] → [1.42.0.0] → [1.41.1.0]
→ [1.40.0.0]. Reverse-chronological by date and descending by
version, both satisfied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 16:09:26 -07:00
Garry Tan 6f31954299 chore(release): bump v1.43.0.0 → v1.43.2.0 for queue collision
CI check-version-stale flagged v1.43.0.0 already claimed by PR #1574
(garrytan/colombo-v3). PR #1639 (garrytan/muscat-v3) claims v1.43.1.0.
Next available MINOR slot is v1.43.2.0.

Bump VERSION + package.json + CHANGELOG entry header. No behavior
changes — purely re-versioning to clear the queue collision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 11:55:17 -07:00
Garry Tan 72dac4e392 chore(release): v1.43.0.0 — post-Daegu paper-cut wave
Bumps VERSION 1.42.2.0 → 1.43.0.0 (MINOR per scale-aware bump rules: new
env-var surface GSTACK_SYNC_*_TIMEOUT_MS + GSTACK_CHROMIUM_NO_SANDBOX,
behavior expansion in browse/src/browser-manager.ts headless launch,
three skill-template prompt changes affecting /retro, /review,
/sync-gbrain).

CHANGELOG entry leads with what stopped happening: /retro stops
fabricating retros against stale bases, /sync-gbrain stops SIGTERM-looping
35-min restarts on big brains, /review stops shipping framework FPs the
reviewer never grep'd.

18 fixes total — 15 community PRs + 3 self-filed silent-failure issues
(#1624, #1611, #1539) — in one bundled PR with 26 bisect commits and 7
new regression test files. Every wave-touched test file passes in
isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 10:26:40 -07:00
Garry Tan 0ee920bbe6 test(gbrain-detect): include gbrain_pooler_mode in schema regression (PR #1591)
PR #1591 (PgBouncer transaction-mode detection, @mikeangstadt) added
gbrain_pooler_mode to the gstack-gbrain-detect JSON output but did not
update the schema regression check in
test/gstack-gbrain-detect-mcp-mode.test.ts. Adding the key in alphabetical
order matching the rest of the schema array. Downstream sync-gbrain ignores
unknown keys, so this is forward-compat.

Without this, the test fails with a diff:
  + "gbrain_pooler_mode"
because keys is the actual set returned and the expected array was
pre-#1591.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 10:23:42 -07:00
Garry Tan 8df2a9ca00 test(fixtures): regenerate ship-SKILL.md golden baselines
ship/SKILL.md consumes the Confidence Calibration resolver via the
preamble pipeline. This wave's #1539 pre-emit verification gate extends
the resolver text, which propagated to ship/SKILL.md via gen:skill-docs.
The golden fixtures in test/fixtures/golden/ matched the pre-#1539 shape
and failed the host-config regression check.

Refreshes claude-ship-SKILL.md, codex-ship-SKILL.md, and factory-ship-SKILL.md
to match the current generated output. Matches the Daegu wave's bisect
commit 23 ("test(fixtures): regenerate ship-SKILL.md golden baselines").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 10:19:54 -07:00
Garry Tan 144327dc3d test(learnings): align injection-prevention tests with PR #1619 tagged-line shape
PR #1619 (preserve current entries in cross-project search) refactored
gstack-learnings-search to tag rows inline (`current\t<json>` vs
`cross\t<json>`) instead of filtering inside the bun block via
process.env.GSTACK_SEARCH_SLUG. The bun block no longer reads SLUG or
CROSS env vars — it parses the per-line tag and sets a per-entry
_crossProject flag.

The pre-existing test/learnings-injection.test.ts still asserted on the
old SLUG + CROSS env var shape. Updates:

  - Remove the SLUG env var assertion (no longer set on bash command line)
  - Remove the bun-block CROSS env var assertion (block reads the tag now,
    not the env)
  - Add a new positive assertion that the bun block parses the tag
    (sourceTag | tabIndex | crossProject)
  - Keep the shell-interpolation safety assertion unchanged — that's
    independent of the SLUG refactor

The CROSS env var is still SET on the bash command line (it controls
whether the cross-project find runs at all), but the bun child no longer
reads it. The existing "env vars set on bash command line" test continues
to pin that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 10:14:26 -07:00
Garry Tan e75a5e8e5f test: fill coverage gaps for PRs #1606, #1612, #1620
Three cherry-picked PRs in this wave landed without unit-test coverage for
the specific invariant they protect:

  #1606 (@andrey-esipov) — LC_ALL=C pin in _gstack_gbrain_validate_varname
    8 tests by sourcing bin/gstack-gbrain-lib.sh and calling the validator
    directly. Asserts uppercase/digit/underscore accepted, lowercase
    REJECTED (the macOS-locale regression case), mixed-case rejected,
    LC_ALL=C scoping is local (doesn't leak to caller).

  #1612 (@bharat2913) — setsid daemonize via Node child_process.spawn
    4 static-invariant tests on browse/src/cli.ts. The actual setsid
    syscall is hard to assert without a real spawn, so we pin the source
    shape: nodeSpawn imported from child_process; non-Windows branch uses
    nodeSpawn(...) with detached:true and .unref(); comment documents
    setsid/SIGHUP root cause; Bun.spawn() is NOT used on macOS/Linux.

  #1620 (@davidfoy, re-authored into .tmpl per A3) — §4a-postfail
    12 static invariants on land-and-deploy/SKILL.md.tmpl + generated
    SKILL.md. Pins all three state branches (MERGED/OPEN/CLOSED), the
    authoritative state query, the merge-SHA capture, non-destructive
    worktree cleanup with uncommitted-work guard, autoMergeRequest probe
    on OPEN, hard "never retry gh pr merge" rule, and atomic regen
    propagation.

Failing build if any of the three invariants regresses.

Note: gbrain-lib-validate-varname.test.ts also surfaces a pre-existing
glob-pattern overpermissiveness (hyphens + dots accepted) — not in
#1606's scope; documented inline as a separate cleanup target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 10:03:18 -07:00
Matteo Hertel bd3a6c68b2 fix(design): bump image-gen timeout to 240s + pin gpt-image-2
The design binary calls /v1/responses (gpt-4o + image_generation tool,
quality:high, 1536x1024) but aborted the request after a hardcoded 120s.
That class of request consistently takes ~140-160s end-to-end, so every
generate/variants/evolve/iterate call aborted before the image returned.

In /design-shotgun this cascades: Step 3c launches N parallel agents,
each calling `$D generate`, each aborts at 120s and retries, all fail,
the comparison board never opens — the skill appears to hang indefinitely.

Reproduced the exact API call with a longer budget: HTTP 200, valid
image, 143.5s. A real /design-shotgun run after the patch generated 3
variants in parallel at 150.0s / 161.0s / 152.1s, all exit 0 — note the
161s case, which a naive 150s bump would still have failed.

- Bump AbortController timeout 120_000 -> 240_000 in generate.ts,
  variants.ts, evolve.ts, iterate.ts (both call sites)
- Pin the image_generation tool to model "gpt-image-2"

design/test/variants-retry-after.test.ts: 5 pass, 0 fail. The
feedback-roundtrip.test.ts failures are a pre-existing browse-module
breakage (session.clearLoadedHtml undefined), unrelated to this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 09:56:05 -07:00
Bharat 707a82e88c fix(browse): daemonize macOS/Linux server via setsid()
`Bun.spawn().unref()` only releases the child from Bun's event loop —
it does NOT call setsid(). The spawned bun server inherits the spawning
shell's process session. When the CLI runs inside a session-managed shell
that exits shortly after the CLI returns (Claude Code's per-command Bash
sandbox, Conductor, OpenClaw, CI step runners), the session leader's exit
sends SIGHUP to every PID in the session — killing the bun server and
its Chromium grandchildren within seconds of a successful `connect`.

Setting `BROWSE_PARENT_PID=0` (already done by the `connect` command and
pair-agent) disables the parent-process watchdog but does NOT save the
server here: SIGHUP from session teardown still reaps it.

Replace the macOS/Linux `Bun.spawn().unref()` with Node's
`child_process.spawn({ detached: true })`, which calls setsid() and
gives the server its own session leader role (PPID=1, STAT=Ss). This
mirrors the Windows path's rationale (PR #191 by @fqueiro) — same root
cause, different OS surface.

Verified on macOS in Conductor: pre-fix the server dies ~10–15s after
connect across separate Bash invocations; post-fix the same PID stays
alive (PPID=1, SESS=0, STAT=Ss) and responds to `status`/`goto`/
`snapshot` across many separate shell calls.

The `proc?.stderr` startup-error branch is removed since both platforms
now spawn with `stdio: 'ignore'`; both fall through to the on-disk
`browse-startup-error.log` written by `server.ts`'s start().catch.
2026-05-21 09:56:03 -07:00
shohu 7703f7cfbf fix(browse): mirror isCustomChromium() guard in headless launch()
When BROWSE_EXTENSIONS_DIR is set alongside GSTACK_CHROMIUM_PATH pointing
at a baked-extension build (GBrowser / GStack Browser), the headless launch()
path was unconditionally adding --disable-extensions-except / --load-extension.
This causes the same ServiceWorkerState::SetWorkerId DCHECK crash that
launchHeaded() already guards against via isCustomChromium().

Mirror the existing guard: skip --load-extension flags when isCustomChromium()
returns true; always push the off-screen window geometry args.
2026-05-21 09:56:02 -07:00
techcenter68 e7074b54d7 fix(browse): GSTACK_CHROMIUM_NO_SANDBOX opt-out for Ubuntu/AppArmor (#1562)
Ubuntu/AppArmor configurations often block unprivileged Chromium sandboxing
for headless agent sessions even for normal users — /qa hangs without
--no-sandbox. The kernel policy denies the unprivileged user namespaces
Chromium needs.

Adds GSTACK_CHROMIUM_NO_SANDBOX=1 as an explicit user override that forces
the sandbox off without changing the default for everyone else. Re-authored
from PR #1562 onto v1.42.2.0's shouldEnableChromiumSandbox() helper —
purely additive, preserves the headed-launch sandbox-on-by-default behavior
that v1.42.2.0 shipped to kill the --no-sandbox yellow infobar.

Three new regression tests cover:
  - linux + override=1 → false (the named use case)
  - darwin + override=1 → false (env wins on any platform)
  - override=0 → does NOT trigger (must be exactly "1")

Original diff by @techcenter68 via #1562.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 09:55:54 -07:00
0xDevNinja 7ea6b1dc89 fix(supabase-provision): rewrite transaction/6543 -> session/5432 for new projects
- Single-object pooler API responses default to transaction-mode at 6543,
  but the shared pooler tenant on new projects only listens on session/5432
- Add a `pool_mode == transaction && db_port == 6543` rewrite + stderr note
- Escape hatch via `GSTACK_SUPABASE_TRUST_API_PORT=1` for forward-compat
- 5 new tests covering rewrite, no-op shapes, env opt-out, array path

Fixes #1301.
2026-05-21 09:51:08 -07:00
mikeangstadt db2ed599a3 fix: detect PgBouncer transaction-mode pooler and set GBRAIN_PREPARE=true (#1435)
When gbrain connects through a PgBouncer transaction-mode pooler (port
6543), it auto-disables prepared statements. This breaks `gbrain search`
silently — the /sync-gbrain capability check fails and the GBrain Search
Guidance block never gets written to CLAUDE.md.

Three-layer fix:

1. **lib/gbrain-exec.ts** — `buildGbrainEnv()` now detects port 6543 in
   the effective DATABASE_URL and sets `GBRAIN_PREPARE=true` in the env
   passed to every gbrain spawn. This is the single chokepoint — all
   gstack gbrain invocations inherit the fix. Caller can opt out with
   `GBRAIN_PREPARE=false`.

2. **sync-gbrain/SKILL.md{,.tmpl}** — capability check now exports
   `GBRAIN_PREPARE=true` explicitly and retries search up to 3x with 1s
   delay for async index propagation under connection pooling.

3. **bin/gstack-gbrain-detect** — surfaces `gbrain_pooler_mode` field
   ("transaction" | "session" | null) in the preamble probe JSON so
   /setup-gbrain and /sync-gbrain can advise users about pooler state.

Closes #1435

Built with [ClosedLoop.AI](https://closedloop.ai) | [GitHub](https://github.com/closedloop-ai/claude-plugins)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-21 09:51:06 -07:00
David Foy c427340fce fix(land-and-deploy): detect merged PR after gh failure
After `gh pr merge` exits non-zero, the PR may already be MERGED server-side
(concurrent merge landed, or local cleanup phase failed AFTER the merge
succeeded). Calling `gh pr merge` a second time then errors with a confusing
"already merged" — and worse, the deploy workflow never runs because we
stopped on the first failure.

Adds a Post-failure PR-state check (§4a-postfail) that runs after ANY
non-zero exit from `gh pr merge`:

  - state == MERGED  → record MERGE_PATH=direct, OFFER (don't force)
                       stale-worktree cleanup on the base branch with
                       uncommitted-work guard, proceed to §4a CI watch
  - state == OPEN    → check autoMergeRequest; if non-null treat as
                       merge-queue wait; if null surface both errors and STOP
  - state == CLOSED  → STOP

Hard invariant: never retry `gh pr merge` after a non-zero exit. Server
state is authoritative.

Re-authored from PR #1620 into land-and-deploy/SKILL.md.tmpl (the source of
truth) instead of the generated SKILL.md, so the next gen:skill-docs run
preserves the change. Original diff by @davidfoy via #1620.

Related: cli/cli#3442, cli/cli#13380.

Contributed by @davidfoy via #1620.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 09:50:51 -07:00
Andrey Esipov 5e20b41743 fix(gbrain-lib): pin LC_ALL=C in varname validator (macOS locale guard)
In many macOS shells the default locale (e.g. en_US.UTF-8) makes bash
glob brackets like `[A-Z]` match lowercase letters too, so the existing
`case "$name" in [A-Z_][A-Z0-9_]*)` branch lets names like `lower-case`
through validation. The function then trips `printf -v "$varname"` and
`export "$varname"` with `not a valid identifier` errors that surface
mid-prompt, which is exactly what the validator was supposed to prevent.

Pinning `LC_ALL=C` inside the function gives ASCII-only bracket semantics
on both macOS and Linux, matching the documented `[A-Z_][A-Z0-9_]*`
contract. Declared `local` so it doesn't leak to the calling shell —
`gstack-gbrain-lib.sh` is documented as a sourced helper, so a bare
assignment would mutate the caller's locale for the rest of the process
(silently affecting downstream `sort`, `tr`, locale-aware globs in the
same shell, etc.).

The existing regression test
`test/gbrain-lib-verify.test.ts:'rejects invalid var names'`
already covers the macOS repro shape (passes `lower-case` and expects
the validator to reject + emit `invalid var name`). On Linux CI the
test silently passed because `LC_ALL=C` is the typical default; on
macOS dev boxes it fails.

Verified:
- `bun test test/gbrain-lib-verify.test.ts`: 22 pass, 0 fail (on macOS).
- `_gstack_gbrain_validate_varname lower-case; echo $?` → 2.
- `_gstack_gbrain_validate_varname FOO_BAR; echo $?` → 0.
- Caller's LC_ALL preserved across calls (confirmed via sourced bash).
2026-05-21 09:49:56 -07:00
Jayesh Betala 07a84a0bc7 fix(memory): probe gitleaks without shell builtin 2026-05-21 09:49:55 -07:00
Jayesh Betala 78d30524fd fix(setup): register root gstack slash alias 2026-05-21 09:49:53 -07:00
Jayesh Betala 873799c90a fix(learnings): preserve current entries in cross-project search 2026-05-21 09:49:52 -07:00
Jayesh Betala b9eefbed68 fix(artifacts): reject malformed remote paths 2026-05-21 09:49:50 -07:00
Jayesh Betala 7320f36ab4 fix(benchmark): parse positional prompt after flags 2026-05-21 09:49:36 -07:00
Jayesh Betala d7f474f8a4 fix(config): expose explain_level default 2026-05-21 09:49:03 -07:00
Garry Tan 7ec546deb4 test(review): regression for #1539 pre-emit verification gate
12 tests pinning the gate behavior:

  - Resolver emits the gate header + #1539 reference
  - Gate requires quoting file:line + verbatim text
  - Unverified findings forced to confidence 4-5 (auto-suppress via
    existing <7-rule, no new mechanism)
  - Framework-meta nudge names Django, Rails, SQLAlchemy, TypeORM,
    Sequelize, Prisma
  - Deferred design doc reference present (1539-framework-aware-review.md)
  - Four named FP classes from #1539 enumerated:
      * field doesn't exist on model
      * dict.get() might be None
      * save() might lose fields
      * update_fields might miss X
  - All four downstream SKILL.md consumers (review, cso, plan-eng-review,
    ship) carry the gate text after gen:skill-docs
  - Existing confidence 9-10 'Show normally' + 3-4 'Suppress' rows
    unchanged (regression on existing behavior)

Failing build if the gate is removed, the suppression mechanism is
re-invented separately, the framework-meta nudge drops a framework, or
gen:skill-docs stops propagating the gate to consumers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 09:48:24 -07:00
Garry Tan 2a517753ec fix(review): pre-emit verification gate kills Django-shape FP class (#1539)
External user filed 4/8 false positives on a /review run against a Django +
DRF + PostgreSQL repo (Sprint 2.5). Every FP class was the same shape:
"resolvable in <5 minutes by viewing the actual code or running a simple
grep" — fields that don't exist on the model, dict.get()-might-be-None on a
form that returns {}-initialized cleaned_data, standard ORM save behavior
called out as data loss.

Extends the Confidence Calibration resolver (consumed by review, cso,
plan-eng-review, ship) with a Pre-emit verification gate:

  Every finding MUST quote the specific code line that motivates it
  (file:line + verbatim text). If the reviewer cannot produce the quote,
  the finding is unverified — its confidence is forced to 4-5 so the
  existing "Suppress from main report" rule fires automatically. The
  finding still goes to the appendix for calibration audit, but the user
  does not see it in the critical-pass output.

Reuses the existing suppression mechanism — no new code path. The FP
classes the gate kills are enumerated in the resolver text so reviewers
see the named patterns.

Framework-meta nudge included for Django Meta, Rails associations,
SQLAlchemy relationships, TypeORM decorators, Sequelize init, Prisma
generated client — the reviewer must quote the meta-construct that
generates the symbol, not just grep for the literal name. Deeper
framework-aware ORM verification (model introspection, migration-history-
aware checks) is deliberately deferred to a future wave per T-Codex-2.

Atomic .tmpl-equivalent (resolver) edit + gen:skill-docs regen commit
per T-Codex-3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 09:47:33 -07:00
Garry Tan 64a7bee176 test(gbrain-sync): regression for #1611 timeouts + resume
19 tests across three surfaces:

  - resolveStageTimeoutMs (10 tests): undefined/empty → default; non-numeric,
    zero, negative, below-floor, above-ceiling → warn + default; at-floor,
    at-ceiling, valid mid-range → accepted as-is.

  - decideResume (6 tests): no checkpoint, corrupt JSON, checkpoint + staging
    ok, checkpoint + staging missing, checkpoint with no dir, checkpoint with
    empty dir.

  - SIGTERM staging preservation (3 static invariants): memory-ingest signal
    handler must check stagingDirIsCheckpointed BEFORE cleanup; preserve
    branch must come before cleanup branch (ordering); orchestrator must
    pass GSTACK_INGEST_RESUME_DIR to the grandchild on resume.

Also threads process.env.HOME through readGbrainCheckpoint and
stagingDirIsCheckpointed so tests can redirect home. os.homedir() caches
at process start and ignores later mutation, so the env override is the
only reliable test injection point.

Failing build if the timeout bounds are removed, the resume detection
short-circuits incorrectly, or the SIGTERM handler regresses to
unconditional cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 09:45:56 -07:00
Garry Tan 700c9a4ff8 fix(gbrain-sync): configurable timeouts + resume from gbrain checkpoint (#1611)
The memory and code stages hardcoded a 35-min spawn timeout. On brains with
~2000+ staged files, /sync-gbrain --full reliably SIGTERM'd the child at
exactly 35 minutes with exit 143. gbrain left ~/.gbrain/import-checkpoint.json
pointing at the staging dir, but gstack-memory-ingest's SIGTERM handler
unconditionally cleaned the dir up — so the next run found a checkpoint
pointing at nothing and restaged from scratch, repeating the SIGTERM forever.

Three changes:

1. Configurable timeouts via env (bounds 60_000ms - 86_400_000ms, default
   2_100_000ms = 35min unchanged):
     GSTACK_SYNC_MEMORY_TIMEOUT_MS
     GSTACK_SYNC_CODE_TIMEOUT_MS
   Out-of-range or non-numeric values warn and fall back to the default.

2. SIGTERM in gstack-memory-ingest no longer always cleans up the staging
   dir. If gbrain has written ~/.gbrain/import-checkpoint.json pointing at
   the active staging dir, the dir is PRESERVED for next-run resume.
   Otherwise (no checkpoint pointing here, crash before gbrain ever
   touched it) it's cleaned up as before.

3. Next /sync-gbrain run detects gbrain's checkpoint via decideResume() in
   gstack-gbrain-sync.ts:
     - no checkpoint               → fresh ingest pass
     - checkpoint + staging ok     → set GSTACK_INGEST_RESUME_DIR; child
                                      reuses staging dir and skips
                                      writeStaged; gbrain import resumes
                                      from processedIndex+1
     - checkpoint + staging gone   → warn "previous checkpoint stale
                                      (staging dir gone), restaging from
                                      scratch" and proceed

Reuses gbrain's own checkpoint as the source of truth (D1 — no double-store
state). Detect-then-fallback semantics per C1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 09:44:10 -07:00
Garry Tan a05546cddc test(retro): regression for #1624 stale-base pre-flight guard
13 static-invariant tests pinning the four ordered pre-check branches in
retro/SKILL.md.tmpl:Step 0.5:

  A. no-remote skip            — must check origin presence + set verdict
  B. detached-HEAD skip        — must gate behind prior verdict (ordering)
  C. fetch-fail warn           — must match `if !` or `||` shape, gate by verdict
  D. stale-base BLOCK          — must read latest-commit ISO date, cite remediation

Plus a disclosure-survives-to-narrative invariant: skip-path verdicts must be
named in prose so the retro output carries the cited reason rather than
silently misreporting.

Failing build if Step 0.5 is removed, branches re-ordered (no-remote no longer
wins), or the BLOCK message stops citing today/latest-commit/remediation
path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 09:40:06 -07:00
Garry Tan d709139513 fix(retro): stale-base + bad-today-anchor pre-flight guard (#1624)
/retro silently produced confidently-wrong output when "today" drifted (model
session-context error) or when origin/<default> was materially behind the
actual remote — git log --since returned zero or near-zero commits and the
narrative was fabricated from nothing.

Adds Step 0.5 with four ordered pre-check branches before any window analysis:

  A. No 'origin' remote → skip with "base freshness not verified" note
  B. Detached HEAD → skip with "base freshness not verified" note
  C. `git fetch origin <default>` fails (offline) → warn, proceed against
     last-known origin/<default>
  D. Fetch succeeded → compare today vs latest origin/<default> commit; if
     gap > window-days, BLOCK with explicit citation of latest-commit date.

Skip paths still proceed to Step 1, but the disclosure is carried into the
retro narrative ("offline run, window not freshness-verified") so the output
is never silently confidently-wrong.

Atomic .tmpl + gen:skill-docs regen commit (T-Codex-3 pattern).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 09:37:00 -07:00
Garry Tan d6b6737ba3 fix(gbrain-local-status): classifier falsely reports broken-db inside repos with their own DATABASE_URL
The freshClassify probe ran `gbrain sources list --json` with the inherited
process env. When the probe ran from inside a repo with its own .env (an app
DATABASE_URL on a different port), Bun autoloaded the project's .env, gbrain
connected to the wrong database, and the classifier reported broken-db on
otherwise-healthy brains.

Fix: route the probe env through `buildGbrainEnv` from lib/gbrain-exec, the
same helper the sync orchestrator uses. DATABASE_URL is seeded from
~/.gbrain/config.json so the result is cwd-independent. The 60s cache can no
longer propagate a poisoned negative to clean directories.

Contributed by @jetsetterfl via #1583.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 09:35:42 -07:00
Garry Tan b84ec551a4 fix(gbrain-sync): --full produces an empty code index on first run of a new repo
`gbrain reindex-code` only RE-EMBEDS pages that already exist; it never walks
the filesystem. On a freshly-registered source (0 pages), a --full run that
called reindex-code alone found nothing ("No code pages to reindex"), finished
in ~1s, and left the code index permanently empty while still reporting OK.

Fix: --full now runs `sync --strategy code` FIRST to create pages via the file
walk, then runs `reindex-code` to honor the documented "full walk + reindex"
contract for both fresh and populated sources.

Contributed by @jetsetterfl via #1584.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 09:34:53 -07:00
Garry Tan 029356e1f0 v1.42.2.0 fix wave: browse launch hardening (2 bug fixes + headed exit-code wiring) (#1629)
* v1.42.1.1 fix wave: browse launch hardening (2 bug fixes + headed exit-code wiring)

Bundles two browse launch-path bug fixes plus the missing exit-code wiring
that made the second fix actually work end-to-end.

PR #1617 — Chromium sandbox policy at all 3 launch sites
- shouldEnableChromiumSandbox() centralizes the Win32 / CI / CONTAINER /
  root heuristic that previously lived only in the headless launch path.
- launch(), launchHeaded() / launchPersistentContext(), and handoff() now
  share the policy so Playwright stops auto-adding --no-sandbox on every
  headed launch and the yellow "unsupported command-line flag" infobar
  disappears on macOS and Linux dev.

PR #1626 — clean Cmd+Q stops triggering supervisor respawn
- resolveDisconnectCause(browser) reads the underlying Chromium
  ChildProcess exitCode + signalCode (with a 1s wait for an async exit
  event) to distinguish clean user-quit from crash.
- handleChromiumDisconnect(browser) dispatches the headless launch()
  disconnect path: clean → exit(0), crash → exit(1).
- launchHeaded() disconnect handler resolves cause inline and computes
  exitCode = 0 (clean) | 2 (crash) before forwarding to onDisconnect.
- handoff() disconnect handler uses the same shared helper.

Codex-caught propagation fix (this commit, not in either source PR)
- BrowserManager.onDisconnect signature widened to accept an exitCode
  argument. Without this, launchHeaded's locally-computed exit code was
  dropped before reaching server.ts.
- browse/src/server.ts:688 — onDisconnect callback now forwards the
  resolved code: (code) => activeShutdown?.(code ?? 2). The ?? 2
  preserves legacy crash semantics for callers that invoke onDisconnect
  without an explicit code.

Tests
- browse/test/browser-manager-unit.test.ts goes from 2 → 17 tests.
- 6 new tests pin shouldEnableChromiumSandbox across darwin / linux /
  win32 / CI / CONTAINER / root.
- 7 new tests pin resolveDisconnectCause across already-exited,
  async-exit, SIGSEGV, SIGKILL, and null-browser.
- 2 new tests (this commit) pin the onDisconnect(exitCode) propagation
  contract including the exact server.ts forwarding callback shape so a
  refactor that drops the forward fails CI before the user-visible
  respawn bug returns.

Refs PRs #1617, #1626; companion gbrowser PR #23.

* chore: bump version v1.42.1.1 → v1.42.2.0

User-requested rebump (claims v1.42.2.0 slot on the queue).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 19:30:08 -07:00
Garry Tan b03cd1ae2d v1.42.1.0 feat: gate terminal-agent teardown on ServerConfig.ownsTerminalAgent (unblocks gbrowser embedder) (#1615)
* feat: gate terminal-agent teardown on ServerConfig.ownsTerminalAgent

Adds ownsTerminalAgent?: boolean to ServerConfig (default true). Wraps the
three shutdown side effects (pkill -f terminal-agent\.ts + 2 safeUnlinkQuiet
calls for terminal-port and terminal-internal-token) inside a single
if (ownsTerminalAgent) block. Embedders (gbrowser phoenix overlay) pass
false to keep their own PTY lifecycle intact across gstack's teardown.

CLI start() call site passes ownsTerminalAgent: true explicitly; static-grep
test in the new test file catches a refactor that drops it.

Strict opt-out: only explicit false flips the gate (cfg.ownsTerminalAgent
=== false ? false : true). Defends against JS callers passing truthy non-bool
values.

Adds __resetShuttingDown test-only export mirroring __resetRegistry. The
module-scoped isShuttingDown latch otherwise silently no-ops a second
shutdown() in the same process.

Drops dead try/catch wrappers around safeUnlinkQuiet inside the new gate —
safeUnlinkQuiet already swallows all errors internally.

New test file (4 cases) stubs both process.exit AND child_process.spawnSync
so a real pkill -f terminal-agent\.ts never fires on the developer machine.
beforeAll/afterAll save and restore real-daemon file contents in the state
dir so the test cannot clobber a running gstack session.

* chore: file followup TODOs (identity-based pkill, cfg.config composition gap, ownership-object trigger)

Three P3 followups surfaced by /autoplan + /plan-eng-review while reviewing
the ownsTerminalAgent gate:

- Identity-based terminal-agent kill: pkill -f terminal-agent\.ts is a latent
  CLI footgun (regex match kills sibling gstack sessions, editor processes,
  etc.). Replace with PID-tracked process.kill at both cli.ts:1047 and
  server.ts:1281.

- shutdown() reads module-level config, not cfg.config (pre-existing
  composition gap). Same gap applies to cleanSingletonLocks(resolveChromiumProfile())
  at server.ts:1298 (should be cfg.chromiumProfile). Both are followup work
  for the embedder-composition story.

- 4th caller-owned teardown gate trigger: today ServerConfig has 3 (xvfb?,
  proxyBridge?, ownsTerminalAgent). If a 4th appears, collapse to
  cfg.callerOwns?: Set<...> ownership object.

* chore: bump version and changelog (v1.42.1.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: note ServerConfig.ownsTerminalAgent in CLAUDE.md sidebar block

Adds a one-paragraph reference for the v1.42.1.0 embedder teardown gate
right after the Sidebar architecture block. Covers default semantics,
when embedders must pass `false`, polarity inversion vs xvfb?/proxyBridge?,
and the static-grep CI test that pins the CLI call site.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 08:41:29 -07:00
Garry Tan 7ca04d8ef0 v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594)
* fix(gstack-paths): guard CLAUDE_PLUGIN_DATA against cross-plugin contamination (#1569)

gstack-paths previously trusted CLAUDE_PLUGIN_DATA as a fallback for
GSTACK_STATE_ROOT whenever GSTACK_HOME was unset. When another plugin
(e.g. Codex) persists its own CLAUDE_PLUGIN_DATA into the session env
via CLAUDE_ENV_FILE, gstack picked it up and wrote checkpoints,
analytics, and learnings into that plugin's directory. Anyone with the
Codex plugin installed alongside gstack hit this silently.

Fix: guard the CLAUDE_PLUGIN_DATA branch so it only fires when
CLAUDE_PLUGIN_ROOT confirms we're running as the gstack plugin (path
contains "gstack"). Skill installs fall through to \$HOME/.gstack.

Contributed by @ElliotDrel via #1570. Closes #1569.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gbrain-sync): sourceLocalPath handles wrapped {sources:[...]} shape from gbrain v0.20+

gbrain v0.20+ changed `gbrain sources list --json` to return
{sources: [...]} instead of a flat array. sourceLocalPath crashed
upstream with `list.find is not a function` on every /sync-gbrain
invocation against modern gbrain. Accept both shapes for
forward/backward compat, matching probeSource/sourcePageCount in
lib/gbrain-sources.ts.

Contributed by @jakehann11 via #1571. Closes #1567. Supersedes #1564
(@tonyjzhou, same fix, different shape — credit retained).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(brain-context-load): probe gbrain via execFile, not shell builtin (#1559)

gbrainAvailable() used `execFileSync("command", ["-v", "gbrain"])`,
which fails in any environment where the `command` builtin isn't on
the spawned process's PATH (most non-interactive shells). The probe
then reported gbrain as missing even when it was installed, and
context-load silently skipped vector/list queries.

Fix: probe `gbrain --version` directly with a 500ms timeout (matching
the rest of the file's MCP_TIMEOUT_MS). Same semantics, works
everywhere execFile works.

Contributed by @jbetala7 via #1560. Closes #1559.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain-doctor): pin schema_version:2 doctor parse path (#1418)

Adds an exec-path regression test that runs a fake gbrain shim emitting
the v0.25+ doctor JSON shape (schema_version: 2, status: "warnings",
exit 1 for health_score < 100, no top-level `engine` field). Confirms
freshDetectEngineTier recovers stdout from the non-zero exit and falls
back to GBRAIN_HOME/config.json for the engine label.

The pre-existing test for #1415 only stripped gbrain from PATH; this
test exercises the actual doctor parse path, closing the gap that
codex's plan review flagged.

Also documents the schema_version separation in
lib/gbrain-local-status.ts: the local CacheEntry stays at version 1,
distinct from the doctor-output schema_version which we accept across
versions in gstack-memory-helpers.

Closes #1418 (credit @mvanhorn for surfacing the doctor + schema_v2
collapse). The fix landed pre-emptively in v1.29.x; this commit pins
it with a stronger test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(memory-ingest): pin put_page regression + scrub stale name from --help and comments (#1346)

#1346 reported that gstack-memory-ingest still called the renamed
gbrain put_page subcommand on gbrain v0.18+. The actual code migrated
to `gbrain put` and later to batch `gbrain import <dir>` before this
report landed — only documentation lag remained.

This commit:
- Updates the --help string ("Skip gbrain put calls (still updates
  state file)") so user-facing docs match the shipped subcommand
- Updates two inline comments that still referenced the old name
- Adds test/memory-ingest-no-put_page.test.ts: a regression pin that
  strips comments from bin/gstack-memory-ingest.ts and fails the build
  if "put_page" appears in any active code or string literal, plus a
  sanity check that the file still calls a supported gbrain page-write
  verb (put or import)

Closes #1346. Reporter @kylma-code surfaced the doc lag; the original
code migration credit is on the v1.27.x wave.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(resolvers): rewrite all gbrain put_page instructions to canonical put <slug>

scripts/resolvers/gbrain.ts emitted user-facing copy-paste instructions
using the renamed `gbrain put_page` subcommand across 10 skills
(office-hours, investigate, plan-ceo-review, retro, plan-eng-review,
ship, cso, design-consultation, fallback, entity-stub). Every gstack
user copying those snippets hit "unknown command: put_page" on gbrain
v0.18+.

This commit:
- Rewrites all 10 instruction templates to use `gbrain put <slug>
  --content "$(cat <<EOF...EOF)"` with title/tags moved into YAML
  frontmatter inside --content, matching the v0.18+ subcommand shape
- Updates README.md and USING_GBRAIN_WITH_GSTACK.md "common commands"
  table to reference `gbrain put` and `gbrain get`
- Adds test/resolvers-gbrain-put-rewrite.test.ts pinning two
  invariants: (a) resolver source ships only canonical instructions,
  (b) every tracked SKILL.md file is free of `gbrain put_page`

CHANGELOG entries are deliberately left untouched (historical record).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build): extract package.json build to scripts/build.sh for Windows Bun compat (#1538, #1537, #1530, #1457, #1561)

Bun's Windows shell parser rejects multiple constructs the inline
package.json build chain used: brace groups `{ cmd; }`, subshells with
redirection `( git ... ) > path/.version`, and (in Bun 1.3.x) subshells
near redirections in general. Every Windows install + every
auto-upgrade since v1.34.2.0 has failed on `bun run build`.

Extracts the build chain to scripts/build.sh and the .version writes to
scripts/write-version-files.sh. POSIX-portable, no Bun shell parsing
involved. Also adds Windows-specific bun.exe handling for non-ASCII
PATHs (a separate Windows footgun where Bun's --compile fails when the
binary lives under a path with non-ASCII chars).

Updates test/build-script-shell-compat.test.ts to assert the new shape:
no subshells with redirections anywhere in the build chain, and build
delegates to scripts/build.sh which delegates .version writes.

Contributed by @Charlie-El via #1544. Supersedes #1531 (@scarson, fixed
in build helper), #1480 (@mikepsinn, partial overlap), #1460
(@realcarsonterry, brace-group fix subsumed) — credit retained.
Closes #1538, #1537, #1530, #1457, #1561.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): .exe glob in .gitignore + .exe extension resolution in find-browse (#1554)

bun build --compile on Windows appends .exe to the output filename,
producing browse.exe instead of browse. find-browse's existsSync probe
only checked the bare path and returned null on Windows even when the
binary was correctly built. .gitignore similarly only excluded the
bare bin/gstack-global-discover path, leaving the .exe variant
tracked.

This commit:
- .gitignore: changes `bin/gstack-global-discover` →
  `bin/gstack-global-discover*` so the Windows .exe variant is ignored
- browse/src/find-browse.ts: adds isExecutable + findExecutable helpers
  that fall back to .exe/.cmd/.bat probing on Windows, mirroring the
  same helper already in make-pdf/src/browseClient.ts and pdftotext.ts

Contributed by @Mike-E-Log via #1554.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(windows): add fresh-install E2E gate that runs bun run build on windows-latest

Adds .github/workflows/windows-setup-e2e.yml as the gate that catches
Bun shell-parser regressions in the build chain before they reach
users. Triggers on PRs touching package.json, scripts/build.sh,
scripts/write-version-files.sh, setup, browse cli/find-browse, or
gstack-paths.

What it verifies:
1. bun run build completes on Windows (the previously-broken path that
   #1538/#1537/#1530/#1457/#1561 reported)
2. All compiled binaries land on disk (browse.exe, find-browse.exe,
   design.exe, gstack-global-discover.exe)
3. find-browse resolves to the .exe variant on Windows (regression
   gate for #1554)
4. gstack-paths returns non-empty GSTACK_STATE_ROOT/PLAN_ROOT/TMP_ROOT
   on Windows (regression gate for #1570)

Complements the existing windows-free-tests.yml (curated unit subset);
this new workflow exercises the install path itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(codex): move diff scope into prompt instead of --base (Codex CLI 0.130+ argv conflict) (#1209)

Codex CLI ≥ 0.130.0 rejects passing a custom prompt and --base together
(mutually exclusive at argv level). Every /codex review, /review, and
/ship structured Codex review call ended with an argv error before the
model ran.

Fix: scope the diff in prompt text using
"Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD"
instead of `--base <base>`. Preserves the filesystem boundary
instruction across all invocations and keeps Codex's review prompt
tuning.

Touches:
- codex/SKILL.md.tmpl + regenerated codex/SKILL.md
- scripts/resolvers/review.ts + regenerated review/SKILL.md, ship/SKILL.md
- test/gen-skill-docs.test.ts: new regression that fails if any of the
  five known files still contain the prompt+--base shape
- test/skill-validation.test.ts: corresponding negative + positive pin
  on the rendered SKILL.md files

Contributed by @jbetala7 via #1209. Closes #1479. Supersedes #1527
(@mvanhorn — same intent, different patch shape, CONFLICTING) and
#1449 (@Gujiassh — broader refactor, CONFLICTING). Credit retained
in CHANGELOG.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): diff from git merge-base, not git diff origin/<base> (#1492)

git diff origin/<base> shows everything since the common ancestor in
both directions — it includes commits that landed on origin/<base>
after this branch was created as deletions. That made /review and
/ship's pre-landing structured review report inflated diff totals and
flagged "removed" code that was actually still present in the working
tree.

Fix: compute DIFF_BASE via git merge-base origin/<base> HEAD and diff
the working tree against that point. Same coverage of uncommitted
edits, no phantom deletions from out-of-order base advancement.

Applies to /review's Step 1 (diff existence check), Step 3 (get the
diff), the build-on-intent scope-creep check, the structured review
DIFF_INS/DIFF_DEL stats, and the Claude adversarial subagent prompt.
Same change flows into ship/SKILL.md via the shared resolver.

Touches:
- review/SKILL.md.tmpl + regenerated review/SKILL.md, ship/SKILL.md
- scripts/resolvers/review.ts
- scripts/resolvers/review-army.ts

Contributed by @mvanhorn via #1492.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(codex): pin filesystem-boundary preservation across all codex review surfaces (#1503, #1522)

#1503 reported that the bare codex review --base path stripped the
filesystem boundary instruction, letting Codex spend tokens reading
.claude/skills/ and agents/. #1522 proposed adding a skill-path
detector that switched to the custom-instructions route when the diff
touched skill files.

After C10 (#1209) restructured codex review to always carry the
boundary in the prompt (the prompt+--base argv conflict forced the
restructure), the skill-path detector becomes redundant — every
default call already preserves the boundary.

This commit pins the post-#1209 invariant with a test that fails the
build if any future refactor strips the boundary from codex/SKILL.md,
review/SKILL.md, or ship/SKILL.md. Closes #1503 by regression test.

#1522 (@genisis0x) is superseded by #1209 (the prompt rewrite covers
its safety concern); credit retained in CHANGELOG.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): use command -v instead of which for codex detection (#1197)

`which` is not on PATH in every shell — some Windows shells, BusyBox-
only containers, and minimal CI images all fail when skills probe
codex availability via `which codex`. `command -v` is a POSIX builtin
and always available where the skill is running.

Touched:
- codex/SKILL.md.tmpl: CODEX_BIN=$(command -v codex || echo "")
- scripts/resolvers/review.ts and scripts/resolvers/design.ts:
  3 + 3 sites each rewritten to `command -v codex >/dev/null 2>&1`
- Regenerated all 10 affected SKILL.md files (codex, review, ship,
  design-consultation, design-review, office-hours, plan-ceo-review,
  plan-design-review, plan-devex-review, plan-eng-review)
- test/skill-validation.test.ts: updated pin + defensive regression
  test that fails if `which codex` returns to codex/SKILL.md
- test/skill-e2e-plan.test.ts: updated summary regex

Contributed by @mvanhorn via #1197.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(codex): surface non-zero exits so wrappers stop reading as silent stalls (#1467, #1327)

When codex exits non-zero (parse errors, arg-shape breaks, model API
errors that propagate as non-zero status), the calling agent
previously saw an empty output and burned 30-60 minutes misdiagnosing
as a silent model/API stall. The hang-detection block only caught
exit 124 (the timeout-wrapper signal).

Adds elif blocks in all four codex invocation sites (Review default,
Challenge, Consult new-session, Consult resume) that:
- Echo "[codex exit N] <stderr first line>" to stdout
- Indent the first 20 stderr lines for inline context
- Log codex_nonzero_exit telemetry tagged with the call site

Contributed by @genisis0x via #1467. Closes #1327.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(design): disclose OpenAI key source + warn on cwd .env match (#1278, closes #1248)

The design binary previously called process.env.OPENAI_API_KEY without
checking where the key came from. If a user ran $D inside someone
else's project that had OPENAI_API_KEY in its .env, the resulting
generation billed that project's account. Silent and irreversible.

Fix: resolveApiKeyInfo() returns both the key and its source. When the
env-var path matches an OPENAI_API_KEY entry in the current
directory's .env, .env.<NODE_ENV>, or .env.local file, we set a
warning. requireApiKey() prints "Using OpenAI key from <source>" plus
the warning before the run — never the key itself.

Adds 6 unit tests covering: config-vs-env precedence, env-only (no
match), env+cwd .env match, quoted/exported values, value-mismatch
(no false positive), and the no-leak invariant for requireApiKey
stderr output.

Contributed by @jbetala7 via #1278. Closes #1248.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): guard full-page screenshots against Anthropic vision API >2000px brick (#1214)

Full-page screenshots of tall pages routinely exceeded 2000px on the
longest dimension, silently bricking the agent's session: the
resulting base64 reached the Anthropic vision API which rejected the
oversized image, leaving the agent burning turns on a useless blob
with no stderr trace from the browse side.

Adds browse/src/screenshot-size-guard.ts as a shared helper:
- guardScreenshotBuffer(buf) → downscales in-memory if max(w,h) > 2000
- guardScreenshotPath(path) → file-mode variant that rewrites in place
- Aspect ratio preserved via sharp's resize fit:inside
- Stderr diagnostic on any downscale so callers can see when it fired
- Lazy sharp import so non-screenshot paths pay no startup cost

Wires the guard into all three full-page callsites codex review
flagged:
- browse/src/snapshot.ts: annotated + heatmap fullPage captures
- browse/src/meta-commands.ts: screenshot command (path + base64
  fullPage modes) plus the responsive 3-viewport sweep
- browse/src/write-commands.ts: prettyscreenshot fullPage path

Covers seven unit cases (pass-through, downscale, aspect ratio,
exactly-2000px edge, file-mode rewrite) plus a static invariant test
that fails the build if any of the three callsites stops importing the
guard.

Closes #1214.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): add Node sidecar entry for L4 prompt-injection classifier (#1370)

The L4 TestSavant classifier in browse/src/security-classifier.ts
can't be imported into the compiled browse server (onnxruntime-node
dlopen fails from Bun's compile extract dir per CLAUDE.md). The agent
that used to host it (sidebar-agent.ts) was removed when the PTY
proved out — leaving the classifier file shipped but with zero
callers. Exactly the gap codex flagged in #1370.

Adds browse/src/security-sidecar-entry.ts: a Node script that runs the
classifier as a subprocess of the browse server. It reads NDJSON
requests from stdin and writes id-correlated NDJSON responses to
stdout, supporting:
  - op: "scan-page-content" — full L4 classifier scan
  - op: "ping" — liveness probe for the client's health check
  - op: "status" — classifier readiness (used by /pty-inject-scan to
    surface l4 { available: bool } in its response)

Plus browse/src/find-security-sidecar.ts: a resolver that locates
node + the bundled JS entry (browse/dist/security-sidecar.js, built in
a follow-up package.json change) or falls back to the dev TS entry.
Returns null cleanly when node isn't on PATH so the calling endpoint
can degrade per D7 (extension WARN + user confirm).

C17 of the security-stack wave. C18 adds the IPC client + lifecycle
management; C19 wires the endpoint; C20 routes the extension through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): sidecar IPC client with lifecycle + circuit breaker (#1370)

Adds browse/src/security-sidecar-client.ts to manage the Node L4
classifier subprocess from the compiled browse server:

- Lazy spawn on first scan; reuses the same process across requests
- Id-correlated request/response via NDJSON over stdio
- 5s default per-scan timeout; 64KB payload cap (short-circuits before
  spawn so oversized requests don't waste a process)
- 3-in-10-minutes respawn cap → trips circuit breaker; subsequent
  scans throw immediately so the /pty-inject-scan endpoint can surface
  l4 { available: false } to the extension and degrade to WARN+confirm
- process.on('exit') sends SIGTERM to the child for clean teardown
- isSidecarAvailable() lets the endpoint probe before scan calls so
  the response shape reflects degraded mode honestly

Unit tests cover the payload cap, the availability probe, and the
breaker-doesn't-crash invariant under repeated rejected calls.

C18 of the security-stack wave. C19 adds POST /pty-inject-scan; C20
routes the extension through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): add POST /pty-inject-scan endpoint for pre-PTY-inject scans (#1370)

The sidebar's gstackInjectToTerminal callers (toolbar Cleanup,
Inspector "Send to Code") were piping page-derived text directly into
the live claude PTY with ZERO classifier processing — the gap codex
flagged in #1370. The documented sidebar security stack had a hole
the size of every Cleanup-button click.

Adds POST /pty-inject-scan to browse/src/server.ts:
- Local-only binding (NOT in TUNNEL_PATHS — tunnel attempts get the
  general 404 path; never reaches the scan logic)
- Root-token auth via existing validateAuth() — 401 on unauth
- 64KB request cap → 413 + payload-too-large body
- 5s scan timeout via sidecar client
- URL-blocklist forced to BLOCK in PTY context (page-derived REPL
  input is higher-risk than ordinary tool output)
- L4 ML classifier via the sidecar when available; degrades to WARN
  per D7 when sidecar is unavailable
- Response goes through JSON.stringify(..., sanitizeReplacer) per
  v1.38.0.0 Unicode-egress hardening
- Imports only from security-sidecar-client.ts, never directly from
  security-classifier.ts (which would brick the compiled Bun binary)

Seven static-invariant tests pin the POST verb, auth gate, 64KB cap,
tunnel-listener exclusion, sanitizeReplacer wrapping, l4 availability
shape, and the no-direct-classifier-import rule.

C19 of the security-stack wave. C20 routes the extension through it;
C21 adds the invariant AST check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(extension): route gstackInjectToTerminal through /pty-inject-scan (#1370)

Closes the documented-vs-shipped gap codex flagged in #1370. The
sidebar's two PTY-injection call sites (Inspector "Send to Code" and
toolbar Cleanup) now pre-scan via the new /pty-inject-scan endpoint
before writing to the live claude REPL.

Adds window.gstackScanForPTYInject(text, origin) to
extension/sidepanel-terminal.js:
- Async, returns { allow, verdict, reasons, l4 }
- POST to /pty-inject-scan with the existing root-token auth
- WARN+confirm on scan failure (network down, sidecar absent, etc.)
  rather than silent PASS — D7 honest-degradation

gstackInjectToTerminal stays synchronous, returns boolean. Per D6:
keeping the inject sync means existing `const ok = ...?.()` callers
don't break, and the invariant test in
test/extension-pty-inject-invariant.test.ts can statically pin that
every call goes through the scan first.

extension/sidepanel.js call sites updated:
- inspectorSendBtn click → await scan, BLOCK drops + WARN prompts via
  window.confirm, PASS injects silently
- runCleanup() → same flow. Static cleanup prompt always PASSes but
  still routes through scan to honor the invariant.

C20 of the security-stack wave. C21 adds the static invariant test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(security): invariant — extension PTY inject must be scan-gated (#1370)

Static-analysis invariant test that fails the build if any
extension/*.js path calls window.gstackInjectToTerminal without a
preceding window.gstackScanForPTYInject in the same enclosing
function. Closes the documented-vs-shipped gap codex demanded a
machine check on.

Rules:
- Rule 1: any file that calls inject must also reference scan
- Rule 2: in the enclosing function (function declaration, arrow,
  async (), event handler), a scan call must appear before the inject
  call by source position
- Exemption: sidepanel-terminal.js (the file that DEFINES the inject
  function) is exempt from Rule 2 since the definition is not a call

Plus two structural checks:
- sidepanel-terminal.js defines both the inject and scan functions
- inject stays SYNCHRONOUS (no `async` modifier) per D6 — async would
  silently break the `const ok = ...?.()` pattern at every caller

C21 of the security-stack wave. The sidecar architecture (#1370) is
complete: server-side L1-L3 + L4-via-sidecar (C17+C18+C19), extension
pre-scan wiring (C20), and now the regression gate (C21).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(browse): opt-in extended stealth mode with 6 detection-vector patches (#1112)

Rebases @garrytan's PR #1112 (Apr 2026, abandoned) onto the current
browse/src/stealth.ts contract. The existing minimal "codex narrowed"
stealth (webdriver-mask + AutomationControlled launch arg) stays the
default. PR #1112's six additional patches are added behind an opt-in
GSTACK_STEALTH=extended env flag.

Extended-mode patches (applied AFTER the default mask, in order):
  1. delete navigator.webdriver from prototype (not just the getter —
     detectors check `"webdriver" in navigator`)
  2. WebGL renderer spoof to Apple M1 Pro (SwiftShader was the #1
     software-GPU tell in containers)
  3. navigator.plugins returns a PluginArray-prototype-passing array
     with MimeType objects and namedItem()
  4. window.chrome populated with chrome.app, chrome.runtime,
     chrome.loadTimes(), chrome.csi() with realistic shapes
  5. navigator.mediaDevices backfilled when headless drops it
  6. CDP cdc_*-prefixed window globals cleared

Why opt-in: the default mode's contract is fingerprint CONSISTENCY,
which protects against detectors that flag spoofing mismatch. Extended
mode actively lies about the environment; sites that reflect on these
properties can break. Users who hit detection in default mode can flip
GSTACK_STEALTH=extended for SannySoft 100% pass-rate.

Twenty unit tests pin the env-flag semantics, all six patches' code
presence, and the applyStealth wiring order. Live SannySoft pass-rate
verification stays in the periodic-tier E2E suite.

Contributed by @garrytan via #1112 (rebased — original PR opened
before the codex-narrowed minimum landed; rebase preserves the
narrowed default while adding the SannySoft-passing path as opt-in).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(fixtures): regenerate ship-SKILL.md golden baselines after C10-C13 + C16 templates

Updates the three ship-SKILL.md golden baselines (claude, codex,
factory hosts) to match the new shape produced by:
- C10 #1209 codex argv (prompt + diff scope, no --base)
- C11 #1492 merge-base diff (DIFF_BASE= preamble)
- C13 #1197 command -v for codex detection
- C12 + boundary preservation per regen-enforcing test

Per CLAUDE.md SKILL.md workflow: edit the .tmpl, run gen:skill-docs,
commit the regenerated outputs together. Goldens are part of the
regen contract — without this commit, test/host-config.test.ts'
golden-baseline checks fail with the diff codex review surfaced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v1.41.0.0 — Daegu wave (24 bisect commits, 14 user-facing fixes)

Bumps VERSION 1.40.0.0 → 1.41.0.0. CHANGELOG entry follows the
release-summary format in CLAUDE.md: two-line headline, lead
paragraph, "The numbers that matter" table, "What this means for
builders" closer, then itemized Added/Changed/Fixed/For contributors
with inline credit to every PR author and original issue reporter.

Scale-aware bump per CLAUDE.md: 24 commits, ~6000 LOC net,
substantial new capability across security (PTY sidecar wiring),
install (Windows build chain), compat (gbrain 0.18-0.35, Codex CLI
0.130+), and quality (screenshot guard, design key disclosure,
extended stealth opt-in). MINOR is the right call.

Closes for users: #1567, #1559, #1569, #1346, #1418, #1538, #1537,
#1530, #1457, #1561, #1554, #1479, #1503, #1248, #1214, #1370, #1327,
#1193 pattern, #1152 pattern. Credit retained inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(find-browse): resolve source-checkout layout <git-root>/browse/dist/browse[.exe]

windows-setup-e2e.yml runs `bun browse/src/find-browse.ts` against a
freshly-built repo where binaries land at browse/dist/browse.exe (no
.claude/skills/gstack/ install layout). The previous markers chain
only matched .codex/.agents/.claude prefixed paths, so find-browse
exited "not found" even when the binary was present.

Adds a source-checkout fallback after the marker scan: if no
installed layout resolves but <git-root>/browse/dist/browse[.exe]
exists, return that. Three real callers hit this path:
- gstack repo dev workflow before `./setup` runs
- windows-setup-e2e.yml CI (the breakage that surfaced this)
- make-pdf consumers running from a sibling source checkout

Smoke-verified: a fresh git repo with browse/dist/browse on disk now
resolves through the source-checkout branch (was returning null
before this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): bump v1.41.0.0 → v1.42.0.0 to clear queue collision with #1574

The version-gate workflow flagged a collision: PR #1574
(garrytan/colombo-v3) already claims v1.41.0.0, and #1592
(fix/audit-critical-high-bugs) claims v1.41.1.0. Per CLAUDE.md's
workspace-aware ship rule, queue-advancing past a claimed version
within the same bump level is permitted — MINOR work landing on top
of a queued MINOR still reads as MINOR relative to main.

Util's suggested next slot is v1.42.0.0; taking it. CHANGELOG entry
header bumped + dated 2026-05-19; entry body unchanged (same wave
content, same credit list).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 07:35:01 -07:00
Garry Tan 40d00bd2ce v1.41.1.0 fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up) (#1592)
* fix(build-app): escape sed replacement metachars in Chromium rebrand

build-app.sh injects \$APP_NAME directly into the replacement half of
sed's s/// when patching Chromium's localized InfoPlist.strings. If
\$APP_NAME ever carries '/', '&', or '\\' — the command either breaks
or starts interpreting input as sed syntax. The trailing '|| true'
would then silently hide the failure and ship a DMG that still says
'Google Chrome for Testing' in the menu bar.

Escape replacement metachars before substitution. No change for the
default name 'GStack Browser'.

* fix(build-app): bail out if 'mktemp -d' fails instead of cp-ing into '/'

The DMG creation step sets DMG_TMP from 'mktemp -d' with no error check.
If mktemp fails (tmpfs full, permissions, TMPDIR misconfigured), DMG_TMP
is empty and the very next line — 'cp -a "\$APP_DIR" "\$DMG_TMP/"' —
expands to 'cp -a "<app>" "/"', which copies the bundle into the root of
the filesystem.

Refuse to continue unless mktemp produced a real directory. Defensive
second check catches the (rare) case where mktemp succeeds but returns
something that isn't a directory we can cp into.

* fix(telemetry-sync): drop predictable $$ tmp-file fallback

gstack-telemetry-sync tried 'mktemp /tmp/gstack-sync-XXXXXX' and on
failure fell back to '/tmp/gstack-sync-$$'. $$ is the PID — predictable
and reusable, so on shared hosts another user can pre-create or symlink
the path and either steal the response body or clobber an unrelated
file when curl writes through it.

Drop the fallback. If mktemp cannot produce a unique file we just skip
this sync cycle — the events stay on disk and the next run picks them
up. Also install an EXIT trap so the response file is cleaned up on
unexpected exit, not just on the happy path.

* fix(verify-rls): drop predictable $$-based tmp file fallback

Same shape as gstack-telemetry-sync: on mktemp failure the script fell
back to '/tmp/verify-rls-$$-$TOTAL', which is fully predictable from the
PID and a per-check counter. On a shared box another user can pre-create
or symlink the path and either capture the HTTP response body (which may
leak what the RLS tests revealed) or corrupt an unrelated file that curl
writes through.

Make mktemp strict. On failure return from the check function; the caller
tallies a FAIL and the run moves on.

* fix(security-classifier): close writer + delete tmp on download error

downloadFile() opens an fs.WriteStream to '<dest>.tmp.<pid>' and drives
it from a fetch body reader, but if reader.read() or writer.write()
throws mid-download the writer is never closed. That leaks an FD per
failed attempt and leaves the half-written tmp on disk. A later retry
can land in renameSync(tmp, dest) with a truncated TestSavantAI /
DeBERTa ONNX file — which then loads but produces garbage classifier
verdicts until the user manually nukes the models cache.

Wrap the download loop in try/catch. On failure, destroy() the writer
and unlink the tmp before rethrowing, so the next attempt starts from a
clean slate.

* fix(meta-commands): guard JSON.parse in pdf --from-file parser

parsePdfFromFile() runs JSON.parse on user-supplied file contents with
no try/catch. A malformed payload surfaces as an uncaught SyntaxError
from the 'pdf' command handler and the user sees an opaque stack trace
instead of "this file isn't valid JSON". Worse, the same call path is
used by make-pdf when header/footer HTML would overflow Windows'
CreateProcess argv cap, so a corrupt payload file there can take down
the make-pdf run.

Wrap JSON.parse. Re-throw with a message that names the offending file
and echoes the parser's own explanation. Also reject top-level non-
objects (null, array, primitive) since the rest of the function treats
json as an object — catching that here produces a clear error instead
of a TypeError further down.

* fix(global-discover): stop dropping sessions when header >8KB

extractCwdFromJsonl() reads the first 8KB of each JSONL session file and
runs JSON.parse on every newline-split line. When a session record
happens to straddle the 8KB cap, the last line ends in a truncated JSON
fragment, JSON.parse throws, the catch block 'continue's silently, and
if that was the only line carrying 'cwd' the whole project gets dropped
from the discovery output without a warning.

Two independent hardening steps:
  1. Raise the read cap to 64KB. Session headers observed in Claude
     Code / Codex / Gemini transcripts fit comfortably; this just moves
     the cliff out of the normal range.
  2. Drop the final segment after splitting on '\\n'. If the read hit
     the cap mid-line, that segment is guaranteed incomplete; if the
     file ended inside the buffer, the split produces an empty final
     segment and dropping it is a no-op.

Together these make the parser robust regardless of how verbose the
leading records are.

* test: export downloadFile, parsePdfFromFile, extractCwdFromJsonl

These three internal helpers are now imported by regression tests
landing in the next commits (PR #1169 follow-up). Pattern matches the
existing normalizeRemoteUrl export in gstack-global-discover.ts which
test/global-discover.test.ts already imports side-effect-free.

No change to runtime behavior; gstack has no public package entrypoint
that would re-export these, so the in-repo surface is unchanged for
callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security-classifier): await writer close before unlinking tmp on error

The earlier downloadFile() error-path cleanup hit a race: Node's
createWriteStream lazily opens the FD and flushes buffered writes during
destroy(), so a naive `fs.unlinkSync(tmp)` immediately after `writer.destroy()`
hits ENOENT (file not yet on disk), then the writer's destroy finishes on the
next tick and creates the file fresh — leaving the half-written tmp behind
exactly as the original fix tried to prevent.

The new sequence awaits the writer's 'close' event before unlinking, so the FD
is fully torn down and no subsequent flush can re-create the path.

Caught by browse/test/security-classifier-download-cleanup.test.ts in the
next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(browse): regression tests for downloadFile cleanup + parsePdfFromFile guard

Covers PR #1169 bugs #6 and #7:

- security-classifier-download-cleanup.test.ts pins downloadFile error-path
  cleanup against three failure shapes: reader rejects mid-stream, non-2xx
  response, missing body. Asserts the dest file is not created and no
  <dest>.tmp.* siblings remain (glob-matched, not exact path — codex push:
  if the fix later switches to mkdtempSync, the assertion still holds).
  Includes a happy-path case so the cleanup isn't fighting a correct download.

- regression-pr1169-pdf-from-file-invalid-json.test.ts pins parsePdfFromFile
  to throw a helpful error for: invalid JSON, empty file, top-level array,
  top-level number, top-level string, top-level null, top-level boolean.
  Codex push: JSON.parse accepts primitives too, so Array.isArray + typeof
  guard must be tested separately from the JSON.parse try/catch.

Both files use mkdtempSync(process.cwd()/...) for fixture isolation since
SAFE_DIRECTORIES allows TEMP_DIR or cwd; cwd is universal across CI hosts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(global-discover): regression for extractCwdFromJsonl 64KB cap

PR #1169 bug #8: the 8KB read cap landed mid-line on Claude Code session
headers, JSON.parse threw on the truncated tail, the catch silently
continued, and the project disappeared from /gstack discovery output.

Six new cases under describe("extractCwdFromJsonl 64KB cap"):

- happy path: small JSONL with obj.cwd returns it
- 12KB first line with obj.cwd: returns cwd (the bug case)
- 80KB single line overflowing 64KB: returns null without crashing
- complete line followed by partial second line: trailing-partial-drop
  must not poison the result; returns first line's cwd
- missing file: returns null (file read error swallowed)
- malformed first line + valid second line within cap: skips bad,
  returns second's cwd

Tests use the exported extractCwdFromJsonl (added in earlier export
commit) and live in a separate describe block from the existing
"4KB / 128KB buffer" tests, which exercise the unrelated scanCodex
meta.payload.cwd path at L338 — different function, different bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: regression tests for shell-script bugs in PR #1169 (#2-#5)

Two new test files pinning the four shell-script invariants from the
external audit:

regression-pr1169-build-app-sed.test.ts — bugs #2 + #3
- Runtime isolation: extracts the sed-escape sequence from build-app.sh
  and runs it against hostile $APP_NAME values ("Foo/Bar&Baz", "Cool\App",
  "A/B\C&D"). Asserts the literal hostile name round-trips through a real
  `sed s///` invocation, locking the metachar safety end-to-end.
- Static check: the rebrand block must contain both the escape line AND
  the sed line referencing $APP_NAME_SED_ESCAPED; bare $APP_NAME
  interpolation directly into the s/// replacement is rejected.
- Static check: DMG_TMP=$(mktemp -d) is followed by an explicit `|| { ... exit }`
  failure handler AND a `[ -z "$DMG_TMP" ] || [ ! -d "$DMG_TMP" ]` validation
  AND the cp -a appears AFTER both guards.
- Runtime fake-bin: extracts the guard shape, runs with a fake mktemp that
  exits 1, asserts the script exits non-zero before any cp block can reach.

regression-pr1169-mktemp-fallbacks.test.ts — bugs #4 + #5
- Per codex pushback, the invariant is "no `mktemp ... || echo <path>`
  fallback shape" — not just "no $$ token." That's a stronger invariant
  that catches future swaps to $RANDOM or hardcoded paths.
- For each of bin/gstack-telemetry-sync and supabase/verify-rls.sh:
  - no echo-based fallback after mktemp
  - no $$ inside any /tmp path literal
  - mktemp failure path explicitly exits / returns non-zero
  - telemetry-sync also pins the `trap rm -f $RESP_FILE EXIT` cleanup
    so success paths don't leak the tmp on normal exit.

All seven new test files are gate-tier (deterministic, sub-second, no LLM,
no network). Runtime shell tests use fake-bin PATH stubs in temp dirs;
no $HOME mutation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.41.1.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: RagavRida <ragavrida@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 06:56:41 -07:00
Garry Tan 026751ea20 v1.40.0.0 fix wave: gbrain sync hardening (8 community PRs + migration) (#1547)
* fix(gbrain-sync): fold hostname into code-source id hash + migration (#1414)

Cherry-picked from #1468 by 0xDevNinja and extended with the
hostname-fold migration that codex review surfaced.

Pre-fix `deriveCodeSourceId` hashed the absolute repo path alone, so two
machines with identical home-dir layouts (chezmoi-managed dotfiles,
ansible-provisioned VMs) derived the same id and clobbered each other's
`local_path` in a federated brain. Last-writer-wins, with cryptic "Not a
git repository" errors on the loser.

Hash key is now `\${hostname}::\${path}`. Conductor worktrees on a single
host stay distinct (path entropy unchanged within a host); cross-machine
federations stop colliding.

Migration (D1=B + codex refinements): every existing user has a
pre-#1468 path-only-hash source id in their brain that no longer matches
what `deriveCodeSourceId` produces. Without migration, the next sync
registers a fresh source and orphans the old one. This commit adds:

- \`derivePathOnlyHashLegacyId\` — separate helper for the pre-#1468 form.
  Distinct from \`deriveLegacyCodeSourceId\` (pre-pathhash v1.x form);
  both probes run.

- \`planHostnameFoldMigration\` — feature-checks \`gbrain sources rename
  <old> <new>\` (exact argument shape, not just \`--help\`), gates on
  path-drift (skip migration if old source's \`local_path\` differs from
  current repo root), and falls back to register-new + sync-OK +
  remove-old when rename is unsupported. As of gbrain 0.35.0.0 the
  rename subcommand does not exist, so users go through the cleanup
  path; the rename path stays dormant until gbrain ships it.

- \`removeOrphanedSource\` — called only AFTER new-source sync verifies
  page_count > 0. Closes the data-loss window codex flagged where
  "register new, remove old before sync" can wipe pages if sync fails.

- \`sourceLocalPath\` — looks up a source's \`local_path\` from
  \`gbrain sources list --json\` for the drift gate.

- Helpers accept an optional \`env\` parameter so tests can inject a
  gbrain shim via PATH without process-wide PATH mutation (Bun's
  spawnSync doesn't pick up runtime PATH changes). Pre-positions for
  commit 4's centralized gbrain-exec helper.

- \`if (import.meta.main)\` guard around \`main()\` so the helpers can be
  imported for in-process unit tests.

Tests cover: pure derivation, ids-match degenerate case, no-legacy
short-circuit, path-drift skip path, rename path with shim, cleanup
fallback when rename unsupported, cleanup fallback when rename call
itself fails, source-lookup happy/missing/error paths.

\`GSTACK_HOSTNAME\` env var is a test-only knob; production uses
\`os.hostname()\`.

Fixes #1414

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(gbrain-sync): cut source-id slugs on hyphen boundaries (+ #1357)

Cherry-picked from #1481 by drummerms and extended with the explicit
HTTPS-remote regression case for #1357 (decision D2=A).

`constrainSourceId` truncated the slug with `slug.slice(-tailBudget)`,
which cut mid-word when the boundary fell inside a token. For a repo
where the combined `prefix-org-repo-pathhash` exceeded 32 chars, this
produced embarrassing artifacts like `gstack-code-kill-270c0001-c32152`
(from `drummerms-av-sow-wiz-skill-270c0001`).

Two changes carried from #1481, adapted for the #1468 hostpathhash:

1. `constrainSourceId` now walks hyphen-separated tokens from the right,
   accumulating whole tokens until adding the next would exceed
   `tailBudget`. When no token fits, falls through to the existing
   `${prefix}-${hash}` form.

2. `deriveCodeSourceId` now retries with `repo-only-hostpathhash`
   (dropping the org segment) when the full `org-repo-hostpathhash`
   triggers truncation. Keeps the repo name readable when it fits at all.

Plus a new test asserting the source id is period-free for the exact
HTTPS-with-.git remote shape from #1357 (`https://github.com/foo/bar.git`).
canonicalizeRemote strips `.git`; the sanitizer strips any residual
non-alnum. The test closes #1357 by pinning the property.

Closes #1357

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(gbrain): probe CLI without command builtin

* fix(gbrain-sync): centralize gbrain spawn surface + seed DATABASE_URL

Cherry-picked from #1508 by jasshultz, restructured per codex review #4
and #7 to widen scope and centralize the spawn surface.

The bug: gbrain auto-loads .env.local from cwd via dotenv. When
/sync-gbrain runs inside a Next.js / Prisma / Rails project whose
.env.local defines its own DATABASE_URL (pointing at the app's local
DB), gbrain reads that value instead of its own
~/.gbrain/config.json — auth fails, code + memory stages crash.

This commit:

- Adds lib/gbrain-exec.ts: buildGbrainEnv, spawnGbrain, execGbrainJson,
  execGbrainText, spawnGbrainAsync (the last one for memory-ingest's
  streaming gbrain import call). buildGbrainEnv seeds DATABASE_URL from
  ${GBRAIN_HOME:-$HOME/.gbrain}/config.json, returns a fresh env object
  (never the caller's by identity — codex review #11), and honors the
  GSTACK_RESPECT_ENV_DATABASE_URL=1 escape hatch.

- Routes every gbrain spawn in bin/gstack-gbrain-sync.ts and
  bin/gstack-memory-ingest.ts through the helpers. Both files now own
  zero direct spawnSync("gbrain"|spawn("gbrain"|execFileSync("gbrain"
  call sites.

- Threads buildGbrainEnv into the spawnSync("bun", [memory-ingest], ...)
  grandchild in runMemoryIngest (codex review #7). Without this, the
  parent fix is half-baked — the bun child inherits a clean env but
  needs DATABASE_URL pre-seeded too. spawnGbrainAsync inside
  memory-ingest provides defense in depth for standalone invocations.

- Adds GBRAIN_HOME support — aligns with detectEngineTier (already
  honors GBRAIN_HOME) so all gstack-side gbrain calls agree on which
  config file matters. Resolves baseEnv.HOME first, then homedir(), so
  test injection works without process-wide HOME mutation.

- Adds test/build-gbrain-env.test.ts: 10 unit tests covering all five
  env-seeding branches (seed from config / override caller /
  GSTACK_RESPECT escape hatch / missing config / unparseable config /
  no database_url field / GBRAIN_HOME path / object-identity guard /
  unrelated-vars preservation / idempotent-when-matches).

- Adds test/gbrain-exec-invariant.test.ts: static-source check that
  greps both bin/gstack-gbrain-sync.ts and bin/gstack-memory-ingest.ts
  for direct spawnSync("gbrain"|spawn("gbrain"|execFileSync("gbrain"|
  execSync(...gbrain matches and fails the build if any are found.
  Refactor-proof against future contributors adding a new gbrain spawn
  without env threading.

The invariant is intentionally narrow — only the two files where the
DATABASE_URL bug actually hurts users are guarded. Migrating the
spawn sites in lib/gbrain-local-status.ts, lib/gstack-memory-helpers.ts,
and bin/gstack-brain-context-load.ts is a follow-up.

Co-Authored-By: Jason Shultz <jasshultz@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>

* fix(gbrain-sync): add .gbrain-source to consumer repo .gitignore (#1384)

The v1.29.0.0 changelog promised .gbrain-source would be added to the
consuming repo's .gitignore so the per-worktree pin stays local, but the
change actually only added it to gstack's own .gitignore. Without the
consumer-side entry, the pin gets committed and Conductor sibling
worktrees of the same repo + branch step on each other's pin every time
anyone commits.

Add ensureGbrainSourceGitignored after a successful gbrain sources
attach in runCodeImport. Idempotent on repeat runs (line-trim match),
creates .gitignore if missing, logs a warning and continues on
permission errors so a read-only checkout doesn't fail the sync.

Gate the top-level main() call behind import.meta.main so tests can
import the helper without triggering a full sync run on module load.

Tests in test/gbrain-source-gitignore.test.ts cover: create-when-missing,
append-without-trailing-newline, append-with-trailing-newline,
idempotent on repeat, recognize whitespace-surrounded entry, no-throw
on read-only file. 6 pass.

* fix(gbrain-sources): bump gbrain sources list --json timeout 10s → 30s

Supabase free-tier cold-starts can push `gbrain sources list --json` past
10s (observed 14.5s in the wild), causing probeSource() to throw ETIMEDOUT
during /sync-gbrain code stage even though the underlying CLI was healthy.
Matches the 30s ceiling already used by `sources add` / `sources remove`
in the same file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(brain-allowlist): sync project-root eng-review-test-plan artifacts (#1452)

Cherry-picked from #1465 by genisis0x and extended with the v1.40.0.0
upgrade migration that codex review #5 surfaced.

#1465 alone only patches bin/gstack-artifacts-init, which means fresh
installs and re-inits pick up the new pattern. But existing users who
already ran v1.38.1.0 have a `.migrations/v1.38.1.0.done` marker — that
migration won't re-run no matter what we change. So their installed
`.brain-allowlist`, `.brain-privacy-map.json`, and `.gitattributes` stay
without the new pattern, and `/plan-eng-review` artifacts continue to
silently drop out of their federation queue.

This commit:

- bin/gstack-artifacts-init: adds projects/*/*-eng-review-test-plan-*.md
  to the three managed blocks. v1.38.1.0 covered design + test-plan; this
  completes the set for /plan-eng-review.

- gstack-upgrade/migrations/v1.40.0.0.sh: targeted in-place repair for
  existing installs. Same idempotent jq-based shape as v1.38.1.0. Adds
  the new pattern to .brain-allowlist (before the USER ADDITIONS marker),
  .brain-privacy-map.json (as class=artifact), and .gitattributes (as
  merge=union). NEVER commits + pushes — the user controls when the
  patches ship to their federated artifacts repo.

- test/artifacts-init-migration.test.ts: 5 new tests covering the
  v1.40.0.0 migration applied on top of a post-v1.38.1.0 state, jq
  patching, gitattributes append, idempotent re-run, and done-marker
  write when files are missing entirely.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(gbrain-install): skip postinstall on Windows MSYS/MINGW + post-install probe

Cherry-picked from #1487 by genisis0x and extended with the post-install
subcommand probe per T6 / codex review #19.

`bun install` in $INSTALL_DIR fails on Windows MSYS/MINGW/Cygwin shells
because gbrain's native postinstall script mis-parses path arguments
and aborts with a non-zero exit, breaking gstack-gbrain-install for
Windows users running git-bash/MSYS2. The package installs cleanly
without scripts.

This commit:

- Adds Windows shell detection via `uname -s` matching
  MINGW*/MSYS*/CYGWIN*/Windows_NT (#1487's case statement already covers
  all four — codex review #18 confirmed MINGW* is included). Windows
  paths get `bun install --ignore-scripts`; macOS and Linux unchanged.

- Adds a post-install probe of `gbrain sources --help`. `gbrain --version`
  already runs (D19 PATH-shadowing validation), but version success
  doesn't prove the subcommand surface is reachable — and
  `--ignore-scripts` may have skipped artifacts that subcommands need.
  Probe failure logs a clear warning (with Windows-specific remediation
  pointing at re-running `bun install` outside MSYS) but does NOT exit
  non-zero; users may still get value from gbrain even if the probe
  fails transiently.

Refs #1271

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: v1.40.0.0 — gbrain sync hardening wave

Bumps VERSION 1.39.2.0 → 1.40.0.0 (MINOR — substantial gbrain capability
hardening across sync pipeline, install path, federation allowlist;
~600 net LOC added across 8 community PRs + plan-review refinements).

CHANGELOG entry follows the release-summary format: two-line headline,
lead paragraph, "numbers that matter" with before/after table across 8
user-visible surfaces, "what this means for builders" closer, itemized
Added/Changed/Fixed/NOT fixed/For contributors sections.

Per-commit contributor credits: 0xDevNinja, drummerms, Jayesh Betala,
Jason Shultz, genisis0x. Also names NikhileshNanduri and realcarsonterry
in the wave's "Fixed" section for independent submissions of the
.gbrain-source gitignore bug.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: 0xDevNinja <manmit0x@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: drummerms <mike@av2o.com>
Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com>
Co-authored-by: Jason Shultz <jasshultz@gmail.com>
Co-authored-by: genisis0x <manietdavv@gmail.com>
2026-05-17 08:26:36 -07:00
Garry Tan 33cb4715ef v1.39.2.0 feat: GSTACK_* env-shim for Conductor + gbrain/gstack setup docs (#1534)
* feat: GSTACK_* env-key shim for Conductor workspaces

New lib/conductor-env-shim.ts promotes GSTACK_ANTHROPIC_API_KEY and
GSTACK_OPENAI_API_KEY to canonical names when canonical is empty. Wired
into the four TS entry points that hit paid APIs or gbrain embeddings:
gstack-gbrain-sync.ts, gstack-model-benchmark, preflight-agent-sdk.ts,
test/helpers/e2e-helpers.ts. Side-effect-only import, 15 lines total.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: gbrain+gstack setup, Conductor env mapping (v1.39.2.0)

USING_GBRAIN_WITH_GSTACK.md: new "What you get after setup" section,
Path 4 (remote MCP / split-engine), /sync-gbrain workflow stages +
watermark mechanics, "Conductor + GSTACK_* env vars" section, env vars
table extended, two troubleshooting entries (silent embedding failure
and FILE_TOO_LARGE watermark block).

CONTRIBUTING.md "Conductor workspaces": new paragraph on the GSTACK_*
prefix pattern and the four entry points importing the shim.

VERSION 1.39.1.0 → 1.39.2.0 and CHANGELOG entry covering the shim +
docs (full release-summary format with before/after table).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: unit coverage for conductor-env-shim

Refactor lib/conductor-env-shim.ts to export promoteConductorEnv()
so unit tests can manipulate env and call it directly (a bare side-
effect IIFE on import isn't reachable from bun:test once cached).
The on-import IIFE still runs — existing four-entry-point imports
keep working unchanged.

test/conductor-env-shim.test.ts covers all three branches:
GSTACK_FOO present + FOO empty → promotion; FOO already set →
no-overwrite; nothing in env → no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: Conductor strips canonical API keys (not just "doesn't inherit")

The prior docs framed the GSTACK_* prefix as collision-avoidance:
"Conductor exposes API keys under a GSTACK_ prefix so it never
collides with whatever the host system has set." That understates
the mechanism — Conductor actively strips ANTHROPIC_API_KEY and
OPENAI_API_KEY from every workspace's process env, so setting them
in ~/.zshrc or .env doesn't help. The fix path is to set the
GSTACK_-prefixed forms in Conductor's workspace env config; Conductor
passes those through untouched.

Three docs updated to reflect the strip, not the polite framing:
USING_GBRAIN_WITH_GSTACK.md (Conductor section), CONTRIBUTING.md
(Conductor workspaces paragraph), CHANGELOG.md (release summary).

README.md gains a "Running gstack in Conductor?" callout in the
GBrain section pointing at the canonical doc's anchor, plus a fourth
path entry (remote gbrain MCP / split-engine) that was already
documented in USING_GBRAIN but missing from the README summary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 12:32:33 -07:00
Garry Tan f58977041c v1.39.1.0 feat: EXIT PLAN MODE GATE for plan-mode review skills (#1512)
* feat: EXIT PLAN MODE GATE for plan-mode review skills

Add a terminal BLOCKING checklist that verifies the plan file ends with
`## GSTACK REVIEW REPORT` before ExitPlanMode is called. Lives at EOF of all
four plan-* review skills (eng/ceo/design/devex) and inside codex Step 2A.
Tones down the preamble's "Plan Status Footer" to a neutral forward reference
so review-report rules don't bleed into operational skills (/ship /qa /review).

Single source of truth: `generateExitPlanModeGate` in scripts/resolvers/review.ts,
registered as EXIT_PLAN_MODE_GATE in scripts/resolvers/index.ts. New test in
test/gen-skill-docs.test.ts strips fenced code blocks before matching `## `
headings and asserts the gate is the terminal heading in all four plan-* review
SKILL.md files. Codex's SKILL.md uses toContain (mid-file by design — Step 2B/2C
are not plan-touching modes).

Decisions locked via /plan-eng-review + /codex outside-voice:
- D1=A: 4 plan-* reviews + codex (autoplan, office-hours deferred)
- D2=B → D4=A: tone preamble down to neutral forward reference
- D3=A: add automated test in test/gen-skill-docs.test.ts
- D5=B: keep codex gate inside Step 2A (mid-file acceptable per gate self-gating)

Codex pre-merge findings folded in: line numbers obsolete (use EOF), test regex
must strip fences, fresh skill list (not stale REVIEW_SKILLS constant), gate
check 4 short-circuits when no plan file in context.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: bump version and changelog (v1.39.1.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: package.json build script uses subshells, not brace groups

The three `{ git rev-parse HEAD 2>/dev/null || true; } > path/.version`
brace groups in the build script regressed when v1.38.0.0 merged into this
branch (resolved with --ours during conflict). Bun on Windows can't parse
brace groups in this position; the v1.38.0.0 invariant requires `(...)`
subshells. Windows CI test `package.json build scripts — POSIX shell compat`
caught it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 08:13:20 -07:00
Garry Tan 25cf5edf21 v1.39.0.0 feat: buildFetchHandler factory unblocks gbrowser submodule consumption (#1511)
* feat: buildFetchHandler factory unblocks gbrowser submodule consumption

Add buildFetchHandler(cfg: ServerConfig): ServerHandle in browse/src/server.ts.
Refactor start() to delegate handler construction to the factory and read env
once via resolveConfigFromEnv(). Wire the beforeRoute hook (runs after the
tunnel surface filter, before per-route dispatch).

Auth is now cfg-driven end-to-end. Module-level AUTH_TOKEN const +
initRegistry(AUTH_TOKEN) boot call, validateAuth, and shutdown are deleted;
factory closure owns them. start() threads cfg.authToken into launchHeaded,
the state-file write, and the factory.

initRegistry is idempotent for same-token re-init; throws clearly for
different-token re-init. __resetRegistry() test helper added (mirrors
__resetConnectRateLimit). Existing tests that did rotateRoot() ->
initRegistry('fixed-token') swap to __resetRegistry() to avoid the new guard.

14 factory contract tests added covering ServerHandle shape, auth wiring,
validation throws, hook semantics across both surfaces, and registry
idempotency.

Source-pattern tests in dual-listener.test.ts and server-auth.test.ts
updated for the new identifiers (handle.fetchLocal/fetchTunnel, authToken,
shutdownFn).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: bump version and changelog (v1.39.0.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 21:55:29 -07:00
Garry Tan ea51b45e08 v1.38.1.0 fix wave: surrogate-safe page captures (#1440), Implementation Tasks across review skills (#1454), root-level artifact patterns (#1452) (#1504)
* fix(browse): sanitize lone Unicode surrogates at commandResult chokepoint + /batch envelope (#1440)

Page captures with mixed-script Unicode round-trip cleanly to the Claude API.
Two new utilities in browse/src/sanitize.ts: stripLoneSurrogates for raw UTF-16
strings, stripLoneSurrogateEscapes for \uXXXX JSON escape text. sanitizeBody
picks the right pass based on cr.json.

buildCommandResponse is extracted from handleCommand (now exported) and
applies sanitization before new Response(). /batch was bypassing this
chokepoint via direct JSON.stringify, so it sanitizes each cr.result before
pushing AND wraps the envelope with stripLoneSurrogateEscapes. Defense in
depth wraps at getCleanText, getCleanTextWithStripping, html, accessibility,
and snapshot.ts return points so downstream consumers (datamarking, envelope
wrapping) see sanitized text before the response is built.

25 new unit tests across sanitize.test.ts and build-command-response.test.ts.
content-security.test.ts updated to accept either pre- or post-sanitize form
of the snapshot scoped branch (source-level regression check).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat: bug fix wave v1.36.0.0 — Implementation Tasks, allowlist patterns, surrogate-safe page captures (#1440 #1452 #1454)

Three filed issues land together:

#1440 — Page captures from real-world HTML hit 'API Error 400: no low
surrogate in string'. Sanitizers + buildCommandResponse extraction shipped in
the prior commit; this commit adds the migration script that patches existing
brain-allowlist/privacy-map/gitattributes installs and the supporting tests.

#1452 — Federation sync was silently skipping root-level design and test-plan
docs. bin/gstack-artifacts-init adds two patterns to all three managed blocks
(.brain-allowlist, .brain-privacy-map.json, .gitattributes). Idempotent
migration v1.36.0.0.sh repairs existing installs in place via jq (preserves
JSON validity) — no commit + push from the migration.

#1454 — All four review skills (CEO/design/eng/DX) emit an Implementation
Tasks markdown section AND write a jq-built JSONL artifact per phase.
/autoplan reads all four files, scopes by current branch + 5-commit window,
dedupes on exact (component, sorted(files), title), and renders an aggregated
list in the Final Approval Gate.

New tests:
- browse/test/sanitize.test.ts (18 cases)
- browse/test/build-command-response.test.ts (7 cases)
- test/artifacts-init-migration.test.ts (7 cases)

VERSION → 1.36.0.0. Skips the v1.34.x slot taken by 'gstack consumable as
submodule' and the v1.35.0.0 slot taken by /document-generate. #1428 was
shipped separately by v1.34.2.0 with a different approach; follow-up #1503
filed for the bare-path filesystem boundary concern surfaced during our
analysis.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: bump to v1.38.1.0

VERSION + package.json + CHANGELOG header + migration filename + test
reference all consistently at v1.38.1.0. Migration renamed:
gstack-upgrade/migrations/v1.38.0.0.sh -> v1.38.1.0.sh.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 21:46:50 -07:00
Garry Tan 3bf43766d5 v1.38.0.0 fix wave: Windows install hardening + Unicode sanitization at server egress (4 community PRs) (#1505)
* fix(browse): single-point Unicode sanitization at server egress

Add sanitizeLoneSurrogates (regex-based UTF-16 lone-half cleaner) and
sanitizeReplacer (JSON.stringify replacer that runs the cleaner on every
string field during encoding).

Split handleCommandInternal into handleCommandInternalImpl (raw) plus a
thin sanitizing wrapper. The wrapper applies sanitizeLoneSurrogates to
cr.result so both single-command (handleCommand line 1034) and batch-loop
(line 1966) egress paths inherit it. Inline INVARIANT comment near the
wrapper documents the architectural constraint.

Both SSE producers (activity feed at /activity/stream and inspector
stream) stringify with sanitizeReplacer. Post-stringify regex is
ineffective on those paths because JSON.stringify has already converted
the lone surrogate into the escape sequence "\\\\uD800" before any regex
could match it; the replacer runs during stringify on the raw string
value, so the substitution lands.

Originated from @realcarsonterry PR #1463 (handleCommand-only wrap).
Architectural lift to handleCommandInternal + SSE coverage authored on
this branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(setup): _link_or_copy helper for Windows file-copy fallback

On Windows without Developer Mode (MSYS2/Git Bash), plain ln -snf
silently creates a frozen file copy that doesn't refresh on git pull.
Skill files become stale after every upgrade.

Add a _link_or_copy SRC DST helper near IS_WINDOWS detection (line ~33).
It auto-dispatches: on Unix it preserves ln -snf semantics, on Windows
it copies (cp -R for directories, cp -f for files). When the source is
a Unix-style name-only alias that doesn't resolve on disk (the
connect-chrome → gstack/open-gstack-browser pattern), the helper
returns 0 silently on Windows rather than aborting setup under set -e.

Rewrite all 42 prior ln -snf call sites to route through the helper:
link_claude_skill_dirs (line 437), team-claude install paths (lines 556,
581, 592), Codex host adapter block (lines 618-640), Factory host
adapter block (lines 658-678), OpenCode host adapter block (lines
696-731), Kiro host adapter block (lines 939-953), plus migration and
alias sites.

Add _print_windows_copy_note_once helper and call it from
link_claude_skill_dirs after any linking work completes so Windows
users see one user-visible note explaining they must re-run ./setup
after every git pull.

Extend cleanup_old_claude_symlinks and cleanup_prefixed_claude_symlinks
with a Windows branch: when the target is a real directory containing a
real-file SKILL.md (no symlink to readlink), and IS_WINDOWS=1, treat
the name-matched directory as gstack-managed and remove it. This makes
--prefix / --no-prefix flips work on Windows instead of leaving stale
copies behind.

Originated from @realcarsonterry PR #1462 (1 of 42 sites). Helper
extraction, 42-site rewrite, alias-resolution edge case, and Windows
cleanup compat authored on this branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(docs): rename stale gbrain_sync_mode to artifacts_sync_mode + register /document-generate

Five stale gstack-config references in docs/ pointed to the deprecated
gbrain_sync_mode key (renamed to artifacts_sync_mode in v1.27.0.0):
- docs/gbrain-sync.md: lines 62, 110, 111, 173
- docs/gbrain-sync-errors.md: lines 26, 203

Users following the docs would set a key that gstack-brain-sync no
longer reads, silently breaking artifacts sync.

Originated from @realcarsonterry PR #1461 (verbatim).

Also register /document-generate in AGENTS.md (Operational + memory
table) and docs/skills.md (skill index). The skill shipped in v1.35.0.0
but the doc-inventory cross-check in test/skill-validation.test.ts was
failing because neither file mentioned it.

Allowlist the new test/docs-config-keys.test.ts file in
test/no-stale-gstack-brain-refs.test.ts — it intentionally lists the
deprecated keys in its DEPRECATED_KEYS denylist (defending the rename).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci(windows): migrate windows-free-tests to paid faster runner + register wave tests

Move the Windows free-test job from GitHub-hosted windows-latest to
Blacksmith's paid Windows runner (blacksmith-2vcpu-windows-2022).
Spin-up drops from ~60s to ~10s and Bun installs land 3-4x faster. The
label can swap to namespace-profile-windows or ubicloud-windows-* if
this repo's Blacksmith installation isn't configured.

Register the four new wave tests in the workflow's curated test list:
  - browse/test/server-sanitize-surrogates.test.ts
  - test/setup-windows-fallback.test.ts
  - test/build-script-shell-compat.test.ts
  - test/docs-config-keys.test.ts

These tests cover the Windows-hardening surface that this wave ships
(sanitizer wiring, _link_or_copy helper, build-script subshells, doc-
config drift), so they need to run on Windows where the bug shapes
actually manifest.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: wave coverage for sanitizer, link_or_copy, build script, doc drift

Four new test files (29 cases total):

browse/test/server-sanitize-surrogates.test.ts:
  - 11 unit cases for sanitizeLoneSurrogates (passthrough, valid pair,
    lone high/low mid-string, trailing/leading lone, adjacent doubles,
    pair-then-lone, lone-then-pair, empty)
  - 2 bug-repro tests pinning the regression intent (UTF-8 round-trip,
    JSON.parse round-trip with codepoint assertion)
  - 4 wiring invariants asserting the architectural choke points stay
    intact (handleCommandInternalImpl rename, central sanitization
    line, sanitizeReplacer function exists, SSE producers stringify
    with replacer)
  Function extracted from server.ts via regex + eval'd in test scope
  so no production-code export is needed.

test/setup-windows-fallback.test.ts:
  - Static invariant (D7): zero raw `ln` calls outside the
    _link_or_copy helper body and comments
  - Helper-existence assertions
  - 4-cell behavior matrix (file/dir × Windows/Unix) via awk-style
    helper extraction + bash -c sourcing
  - Windows-note printer registration check
  Mirrors test/setup-conductor-worktree.test.ts patterns.

test/build-script-shell-compat.test.ts:
  - Regex assertion that package.json scripts.* contain no bash brace
    groups (Bun-Windows-hostile)
  - Subshell-precedence check for `.version` redirects
  Strips single-quoted strings before regexing so embedded JS code
  inside echo '...' doesn't false-positive.

test/docs-config-keys.test.ts:
  - DEPRECATED_KEYS denylist scanned across docs/**/*.md
  - Round-trip test for `gstack-config get artifacts_sync_mode`
  Defends the v1.27.0.0 rename from doc drift.

Updates to two existing tests:
  - test/setup-conductor-worktree.test.ts: expect `_link_or_copy`
    instead of `ln -snf` at the Conductor-worktree guard call site
  - test/gen-skill-docs.test.ts: same swap at three assertion sites
    (Codex section, Claude link_claude_skill_dirs body, Codex
    link_codex_skill_dirs body)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: bump v1.38.0.0 + build-script subshells + CHANGELOG

VERSION 1.35.0.0 → 1.38.0.0 (MINOR). PR #1500 (lyon-v2) claimed
v1.37.0.0 ahead of this branch; v1.38.0.0 is the next free MINOR slot
per bin/gstack-next-version queue check. Workspace-aware ship rule
applies — queue-advancing past a claimed version within the same
bump level is explicitly permitted.

package.json build script: three `{ git rev-parse HEAD ...; }` brace
groups → `( git rev-parse HEAD ... )` subshells. Bun's Windows shell
parser doesn't grok bash brace groups; subshells are POSIX-universal.
Originated from @realcarsonterry PR #1460.

CHANGELOG entry covers the full wave:
- Windows install hardening (42-site _link_or_copy + cleanup compat)
- Unicode sanitization architecture (handleCommandInternal + SSE
  replacer)
- Build script POSIX-shell compat (subshells)
- Doc rename (gbrain_sync_mode → artifacts_sync_mode)
- Windows CI on paid faster runner
- 4 new wave tests (29 cases)
Frames each item as a current system property, not a fix narrative.

Credits @realcarsonterry for PRs #1460, #1461, #1462, #1463 (the seed
of the wave). Scope expansion to all 42 setup sites, every server
egress path, Windows CI migration, and codex-flagged P0/P1 fixes
(connect-chrome alias on Windows, SSE replacer, prefix-cleanup
Windows compat) authored on this branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: post-ship sync for v1.38.0.0

Document the two architectural invariants that landed in v1.38.0.0 in
their persistent homes (not just CHANGELOG):

- README Windows section: add the `./setup` re-run-after-git-pull
  requirement that `_print_windows_copy_note_once` shows at runtime.
- CONTRIBUTING "Things to know": add the no-raw-`ln` invariant for
  contributors editing `setup`, with the test that enforces it.
- ARCHITECTURE: new "Unicode sanitization at server egress" section
  between Shell injection prevention and Prompt injection defense,
  with egress table (HTTP/batch/SSE) and the post-stringify-regex
  rationale.
- CLAUDE.md: cross-references for both invariants, matching the
  v1.6.0.0 dual-listener pattern (each constraint says which files
  to read before editing and which test pins it).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci(windows): use windows-latest-8-cores instead of unregistered Blacksmith label

actionlint failed PR #1505 because `blacksmith-2vcpu-windows-2022` isn't
in the repo's approved runner-label list (actionlint.yaml only registers
`ubicloud-standard-2`, and Ubicloud doesn't ship a Windows pool).

Switch to GitHub's paid larger Windows runner `windows-latest-8-cores`
— 4x the cores of the free `windows-latest` at the larger-runner billing
rate, no new third-party CI provider, no actionlint config changes.

CHANGELOG: replace "Blacksmith" / "blacksmith-2vcpu-windows-2022" /
"~6x faster spin-up" claims with the actual choice (8 cores vs 4, paid
larger runner).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci(windows): switch from windows-latest-8-cores to ubicloud-standard-2-windows

`windows-latest-8-cores` sat queued indefinitely because the GitHub
larger-runner billing isn't enabled at the org level — the
"Queued — Waiting to run this check" status surfaced on PR #1505 with
no progress for the whole CI run.

Switch to Ubicloud Windows runners (`ubicloud-standard-2-windows`) so
Windows CI uses the same provider as the existing Linux evals
(`ubicloud-standard-2`). Billing stays under one account instead of
two.

Register the new label in actionlint.yaml alongside the existing
ubicloud-standard-2 entry so actionlint doesn't reject it as unknown.

CHANGELOG entry updated: runner row reflects the actual provider chosen,
"Itemized changes" mentions the actionlint.yaml registration, and the
narrative paragraph documents why `windows-latest-8-cores` failed first.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci: migrate all workflows to Ubicloud (Linux + Windows, 8-core)

Switch every `runs-on` in this repo to Ubicloud so CI has a single billing
surface, consistent capacity, and 4x more cores on the workloads that were
previously stuck on free `ubuntu-latest` (2 cores). Windows uses Ubicloud's
Windows pool too — `ubicloud-standard-8-windows` — so the queued-forever
problem with GitHub's `windows-latest-8-cores` paid larger runner (org-level
larger-runner billing not enabled) goes away.

Workflows touched (9):
- evals.yml, evals-periodic.yml, ci-image.yml — bump default + matrix from
  `ubicloud-standard-2` to `ubicloud-standard-8`. The one matrix entry that
  was already on -8 stays.
- windows-free-tests.yml — `ubicloud-standard-2-windows` → `ubicloud-standard-8-windows`.
- make-pdf-gate.yml — matrix `ubuntu-latest` → `ubicloud-standard-8`. macOS
  entry preserved; the poppler-install `if: matrix.os` conditional swaps to
  match the new label.
- actionlint.yml, pr-title-sync.yml, skill-docs.yml, version-gate.yml —
  `ubuntu-latest` → `ubicloud-standard-8`.

.github/actionlint.yaml registers all four Ubicloud labels in one place:
- ubicloud-standard-2
- ubicloud-standard-8
- ubicloud-standard-2-windows  (the v1.38.0.0 windows-free-tests target)
- ubicloud-standard-8-windows  (this PR's windows-free-tests target)

Removed the duplicate `actionlint.yaml` at the repo root that I accidentally
created in the prior commit — actionlint only reads `.github/actionlint.yaml`,
so the root file was dead weight.

CHANGELOG entry updated: a single "all Ubicloud" sentence in the narrative
plus a metrics-row covering the runner pool change, and the itemized line
expanded to enumerate the 9 affected workflows. The previously-orphaned
"Itemized changes" line about just `windows-free-tests.yml` is replaced.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci(windows): revert to free `windows-latest`

Ubicloud doesn't ship Windows runners — confirmed via their docs. The
`ubicloud-standard-*-windows` labels I added do not exist and were causing
`windows-free-tests` to sit "Queued — Waiting to run this check" forever
(GitHub Actions can't tell a typoed label from a self-hosted runner that's
about to register; it just waits).

Three prior Windows-runner attempts all failed for different reasons:
- `blacksmith-2vcpu-windows-2022` — Blacksmith app not installed on the org
- `windows-latest-8-cores` — GitHub paid larger-runner billing not enabled
- `ubicloud-standard-2/8-windows` — Ubicloud doesn't offer Windows at all

The free `windows-latest` runner (4 cores, ~60s spin-up, $0) is the one
path that actually runs. The wave-coverage Windows tests are <30s of real
work; total job time stays under 2 minutes.

Cleaned up `.github/actionlint.yaml` to drop the bogus
`ubicloud-standard-*-windows` entries — kept only the two real Linux labels.

CHANGELOG: split the runner-pool row into Linux (migrated to Ubicloud-8)
vs Windows (stays on free windows-latest), with the why on each. Itemized
line for windows-free-tests rewritten to reflect the actual outcome.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(windows): skip Unix-only cases on Windows runner

windows-free-tests on GitHub free windows-latest fails three cases that
depend on Unix tooling the runner doesn't have:

1. `setup-windows-fallback.test.ts` behavior matrix — IS_WINDOWS=0 cells
   assert `ln -snf` produces a real symlink. On Windows-without-Developer-
   Mode (which the free `windows-latest` runner is), `ln -snf` silently
   creates a file copy. That's literally the bug `_link_or_copy` exists
   to work around, so the assertion can never pass there. Skip the whole
   describe block on win32. The static-invariant test (zero raw `ln`
   outside the helper body) above the matrix still runs and pins the
   shape the Windows install relies on.

2. `docs-config-keys.test.ts` round-trip — spawnSync(`bin/gstack-config`)
   on Windows doesn't read the bash shebang and fails to exec. Skip on
   win32; the deprecated-key denylist test in the same file still runs
   and is the actual invariant defending the v1.27.0.0 rename at the doc
   layer.

Use `describe.skipIf(process.platform === 'win32', ...)` and
`test.skipIf(process.platform === 'win32', ...)`. Tests still run on
macOS and Linux unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 21:19:58 -07:00
Garry Tan e362b0ae2f v1.37.0.0 feat: split-engine gbrain (remote MCP brain + local PGLite for code) (#1500)
* feat(gbrain): add lib/gbrain-local-status classifier with 5-state engine status + 60s cache

Foundation for split-engine gbrain: shared classifier used by both
bin/gstack-gbrain-detect (preamble probe) and bin/gstack-gbrain-sync.ts
(orchestrator SKIP-when-not-ok). Single source of truth.

Probes via `gbrain sources list --json` and classifies stderr against the
same patterns lib/gbrain-sources.ts:66-67 already uses ("Cannot connect to
database", "config.json"). Returns one of: ok, no-cli, missing-config,
broken-config, broken-db. Defensive default: unrecognized failures
classify as broken-config so the raw stderr can be surfaced upstream.

Cache at ~/.gstack/.gbrain-local-status-cache.json keyed on
{home, path_hash, gbrain_bin_path, gbrain_version, config_mtime, config_size}
with 60s TTL. Cache invalidates on any invariant change. --no-cache option
busts the cache for callers that just mutated state (/setup-gbrain,
/sync-gbrain after init/migration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(gbrain): rewrite gstack-gbrain-detect bash→TS + add gbrain_local_status field

Replaces the bash detect helper with a bun shebang script sharing the
gbrain_local_status classifier from lib/gbrain-local-status.ts with the
sync orchestrator. Single source of truth for engine-status classification
between preamble-probe and orchestrator-skip paths.

Filename stays gstack-gbrain-detect (no .ts extension) so existing skill
preamble callers shell out unchanged. Shebang `#!/usr/bin/env -S bun run`
resolves bun at runtime.

Output is key/type backward-compatible with the bash version per plan
codex #5: the 9 pre-existing keys (gbrain_on_path, gbrain_version,
gbrain_config_exists, gbrain_engine, gbrain_doctor_ok, gbrain_mcp_mode,
gstack_brain_sync_mode, gstack_brain_git, gstack_artifacts_remote) stay
identical in name + type + value semantics. One new key added:
gbrain_local_status (5-state string enum).

Updates the existing schema regression at test/gstack-gbrain-detect-mcp-mode.test.ts
to include the new key. Adds test/gbrain-detect-shape.test.ts asserting
the regression contract for future changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gbrain): orchestrator SKIP when local engine not ok + remote-http transcripts via artifacts pipeline

Two changes in the sync orchestrator, both per plan D11/D12:

1. bin/gstack-gbrain-sync.ts: runCodeImport + runMemoryIngest call
   localEngineStatus() (shared classifier from lib/gbrain-local-status.ts).
   When status is not 'ok', return a SKIP stage result with a clear reason
   instead of crashing with "source registration failed: gbrain not
   configured". Brain-sync stage runs regardless — it doesn't depend on
   local engine. dry-run preview path is gated above the check so it
   continues to show would-do steps even when the engine is broken.

2. bin/gstack-memory-ingest.ts: when gbrain MCP is registered as
   remote-http (Path 4), persist staged transcripts to
   ~/.gstack/transcripts/run-<pid>-<ts>/ instead of the ephemeral
   ~/.gstack/.staging-ingest-<pid>-<ts>/ tmp dir, and SKIP the local
   `gbrain import` call entirely. The artifacts pipeline (gstack-brain-sync
   push to git, brain admin pulls and indexes) handles routing to the
   remote brain. Local PGLite (when present via Step 4.5) stays code-only.

State recording still happens — prepared pages get their mtime+sha256
stamped under remote-http mode so the next /sync-gbrain doesn't
re-stage them. Cleanup is skipped intentionally so the persisted dir
survives until gstack-brain-sync moves it.

Adds test/gbrain-sync-skip.test.ts covering 5 SKIP scenarios (broken-db,
broken-config, no-cli, missing-config, ok pass-through). All 25
sync-related unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gbrain): v1.34.0.0 migration notice + transcripts allowlist for artifacts pipeline

Per plan D5 + D11. Two pieces of the split-engine rollout:

1. gstack-upgrade/migrations/v1.34.0.0.sh — prints a one-time
   discoverability notice for existing Path 4 (remote-http MCP) users
   whose machine has no local engine yet. Tells them about /setup-gbrain
   Step 4.5 (the new local-PGLite opt-in). Silent for everyone else.
   User can suppress permanently via `gstack-config set
   local_code_index_offered true`. Touchfile at
   ~/.gstack/.migrations/v1.34.0.0.done makes it idempotent.

2. bin/gstack-artifacts-init — adds `transcripts/run-*/*.md` and
   `transcripts/run-*/**/*.md` to the managed allowlist so the
   gstack-memory-ingest persistent staging dir (used in remote-http
   mode per D11) gets pushed to the artifacts repo. Brain admin's
   pull job then indexes transcripts into the remote brain.
   Privacy class: behavioral (matches transcript content).

Adds test/gstack-upgrade-migration-v1_34_0_0.test.ts with 5 cases:
state match, no-MCP, local-config-present, opt-out, and idempotency.
All 5 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gbrain): /setup-gbrain Step 1.5/4.5 + /sync-gbrain Step 1.5 templates

Per plan D4, D10, D11, D12. Wires the skill prose to the new
split-engine flow + classifier introduced in earlier commits.

setup-gbrain/SKILL.md.tmpl:
  - Step 1: detect output description now includes the v1.34.0.0
    gbrain_local_status field (5 values).
  - Step 1.5 (NEW): broken-db / broken-config remediation. AskUserQuestion
    with 4 options — Retry / Switch to PGLite / Switch brain mode / Quit
    (plan D4). Retry is recommended first since broken-db often = transient
    Postgres outage. PGLite is explicitly one-way + destructive (moves
    existing config to ~/.gbrain/config.json.gstack-bak-<ts>); rollback on
    init failure restores the .bak (plan D7).
  - Step 4d → Step 4.5 (NEW): in Path 4, after the verify step, offer
    local PGLite for code search. AskUserQuestion Yes/No (plan D10/D11).
    Yes path runs gstack-gbrain-install + `gbrain init --pglite --json`
    with the same rollback-safe sequence. No path skips Steps 3/4/5/7.5.
  - Step 10 verdict (Path 4): adds "Code search" row reflecting Step 4.5
    choice. Updates "Transcripts" row to describe the new D11 routing
    (artifacts repo → remote brain).

sync-gbrain/SKILL.md.tmpl:
  - Step 1 split-engine prose: corrects the prior misleading claim that
    "memory routes through whatever setup-gbrain configured, including
    remote-MCP" (codex finding #3). Memory stage shells out to local
    `gbrain import` in local-stdio mode; in remote-http mode it persists
    to ~/.gstack/transcripts/ for the artifacts pipeline.
  - Step 1.5 (NEW): local-engine pre-flight. STOP on no-cli, broken-config,
    broken-db. Soft skip (continue with code+memory SKIP) on
    missing-config + remote-http per plan D12. Surfaces actionable user
    remediation message instead of the orchestrator crashing two stages
    with ERR.

Regenerated SKILL.md for all hosts (claude, kiro, opencode, slate,
cursor, openclaw, hermes, gbrain). All 712 skill-validation + gen-skill-docs
tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain): .bak-rollback contract for Step 1.5 / 4.5 init failure path

Per plan D7 (rollback semantics) and codex #10 (rollback scope). The
/setup-gbrain skill instructs the model to follow a specific shell
sequence when running `gbrain init --pglite` against an existing
config:

  1. mv ~/.gbrain/config.json ~/.gbrain/config.json.gstack-bak-<ts>
  2. gbrain init --pglite --json
  3. on non-zero exit: mv .bak back; surface error

This test verifies that contract using a fake `gbrain` binary that
fails on init. Three cases:

  - FAILURE: gbrain init exits non-zero → broken config restored to
    original path, no leftover .bak.
  - SUCCESS: gbrain init exits 0 → new config in place, .bak survives
    for audit (user reviews + deletes manually).
  - SCOPE: any partial PGLite directory at ~/.gbrain/pglite/ is NOT
    auto-cleaned. We only promise to restore config.json; PGLite
    cleanup is the user's call (codex #10).

If the skill template rewrites this sequence in a future change, this
test should fail until the test's shell is updated too. That's the
point — keep the test and the skill template aligned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain): periodic E2E for /setup-gbrain Path 4 + Step 4.5 Yes flow

End-to-end coverage of the new opt-in question via runAgentSdkTest.
Stubs the MCP endpoint at /tools/list with a 200 response carrying a
fake gbrain v0.32.3.0 serverInfo, and fakes the gbrain + claude CLIs
so init writes a PGLite config and mcp add succeeds. Asserts the model:

  1. invokes gstack-gbrain-install (Step 4.5 Yes branch)
  2. invokes `gbrain init --pglite --json`
  3. writes a working ~/.gbrain/config.json with engine=pglite
  4. registers the remote MCP via `claude mcp add --transport http`
  5. never leaks the bearer token to CLAUDE.md

Classified as periodic-tier per plan D6 (codex #12 flagged AgentSDK
flakiness; gate-tier coverage of the split-engine behavior lives in the
deterministic unit tests at gbrain-local-status.test.ts and
gbrain-sync-skip.test.ts). Touchfile fires the test when the skill
template, install/verify/init helpers, the local-status classifier, or
the agent-sdk-runner harness changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(gbrain): bump migration to v1.35.0.0 after main merge

main shipped v1.34.0.0 (factory-export submodule) + v1.34.1.0 (update-check
hardening) while this branch was in flight. The migration file I named
v1.34.0.0.sh now belongs at v1.35.0.0 — the next minor on top of main,
matching the scale of split-engine work (new lib + orchestrator skip +
template overhaul + transcripts routing).

Renames the migration script and its test file; updates all internal
version references in both files. Behavior unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(gbrain): memoize gbrain resolution + use --fast doctor in detect

Cuts detect's wall time substantially by sharing fork-exec results
between the helper that walks the JSON output and the localEngineStatus
classifier from lib/gbrain-local-status.ts.

Before: detect made 2x `command -v gbrain` calls (one in detect's
detectGbrain, one in the classifier's resolveGbrainBin) and 2x
`gbrain --version` calls. With memoization keyed on PATH, both
collapse to one fork each (~400ms saved per skill preamble).

Also adds `--fast` to the `gbrain doctor --json` call in detect so a
broken-db config (Garry's repro) doesn't burn a full 5s timeout on the
doctor's DB-connection check. The classifier still probes the DB
directly via `gbrain sources list --json` for engine reachability —
that's `gbrain_local_status`, separate from the coarse
`gbrain_doctor_ok` summary flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain): relax E2E assertions to smoke-test contract

Per codex #12 (AgentSDK harness is non-deterministic): the E2E now
asserts the model followed the split-engine path WITHOUT requiring a
specific subcommand sequence. Three assertions:

  1. AskUserQuestion was called (model reached interactive branches)
  2. At least one of {gstack-gbrain-install, `gbrain init --pglite`,
     `claude mcp add`} fired (model followed the skill, not a no-op)
  3. The fake bearer token never leaked to CLAUDE.md (security regression)

Deterministic per-step coverage of the same flow lives in the gate-tier
unit tests (gbrain-local-status, gbrain-sync-skip, init-rollback,
upgrade-migration). The E2E exists to catch the "model can't follow
the skill at all" regression class, not to pin the exact tool sequence.

Test passes in 280s against the live Agent SDK.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(version): bump CLI smoke-test timeout to 15s (flaky at 5s under load)

The gstack-next-version integration smoke test spawns a child process
that does git operations + sibling-worktree probing. Wall time hovers
4-5s on M-series Macs; flakes at exactly 5001-5002ms when the test
suite runs under load (bun's parallel scheduling). Bumping per-test
timeout to 15s eliminates the flake without changing test logic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.37.0.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 17:20:48 -07:00
Garry Tan 40e34deb7a v1.35.0.0 feat: add /document-generate skill + enhance /document-release with Diataxis coverage map (#1477)
* feat(document-release): add Diataxis coverage map, diagram drift detection, and docs debt tracking

Inspired by @doodlestein's documentation-website skill. Three key ideas incorporated:

1. Step 1.5: Coverage Map (Blast-Radius Analysis) — before editing any docs,
   scan the diff for new public surface and assess documentation coverage across
   Diataxis quadrants (reference/how-to/tutorial/explanation). Flags gaps without
   auto-generating content.

2. Architecture diagram drift detection — extracts entity names from ASCII/Mermaid
   diagrams and cross-references against the diff to catch stale diagrams.

3. Enhanced CHANGELOG sell test — Diataxis rubric scoring (0-3) replaces the
   subjective 'would a user want this?' check.

4. Documentation Debt section in PR body — surfaces coverage gaps and diagram
   drift as actionable items for future work.

All changes are audit-only: the skill flags what's missing, never auto-generates
missing documentation pages. Stays in its lane as a post-ship updater.

Co-Authored-By: Hermes Agent <agent@nousresearch.com>

* feat(document-generate): add Diataxis documentation generation skill

New /document-generate skill, the companion to /document-release. While
/document-release audits and fixes existing docs post-ship, /document-generate
writes missing documentation from scratch using the Diataxis framework.

Inspired by doodlestein documentation-website-for-software-project skill.

Co-Authored-By: Hermes Agent <agent@nousresearch.com>

* chore(docs): regenerate gstack/llms.txt with /document-generate entry

CI's check-freshness step ran gen:skill-docs and found llms.txt stale —
the index wasn't regenerated when /document-generate was added in the
preceding commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(docs): regen document-generate/SKILL.md after merging main

Main brought in the Non-ASCII characters directive in the AskUserQuestion
Format resolver (scripts/resolvers/preamble/generate-ask-user-format.ts).
Regenerating document-generate/SKILL.md propagates the new section into
the generated output. check-freshness should now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(CLAUDE.md): add workflow for fork PRs from garrytan-agents

Fork PRs from non-collaborators don't get base-repo secrets passed to
their CI workflows, so eval/E2E jobs fail with empty-env auth. New
section: when checking out a PR from garrytan-agents, push the branch
to garrytan/gstack and re-target the PR from there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: sync project docs for v1.35.0.0 + bump VERSION

- README.md: add /document-generate to skills table (Technical Writer
  category) + install-command skill lists
- CLAUDE.md: add document-generate/ to project structure tree
- SKILL.md.tmpl + regenerated SKILL.md: add /document-generate routing
  line ("write docs from scratch")
- VERSION: 1.34.0.0 → 1.35.0.0 (MINOR: new skill + enhancement)

CHANGELOG entry deferred to /ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.35.0.0)

CHANGELOG entry for the document-generate skill + document-release
Diataxis enhancements. package.json synced to VERSION (drift repair
after merging main which had bumped pkg to 1.34.2.0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: generate /document-generate Diataxis docs (tutorial + how-to + explanation)

Fills the documentation debt items flagged by /document-release in PR #1477:
critical-gap tutorial coverage and common-gap explanation coverage for the
new /document-generate skill.

Quadrants: tutorial, how-to, explanation (reference already covered by
document-generate/SKILL.md).

- docs/tutorial-document-generate.md (1009 words): newcomer 90-second flow
- docs/howto-document-a-shipped-feature.md (770 words): post-ship audit + fill workflow
- docs/explanation-diataxis-in-gstack.md (1106 words): why Diataxis, trade-offs, alternatives
- README.md: links the three docs from the /document-generate skills-table row

All cross-links verified — every Related section points at an existing file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Hermes Agent <agent@nousresearch.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 11:35:32 -04:00
Garry Tan b9371d716e v1.34.2.0 fix wave: /codex review on CLI 0.130+, /investigate learnings, /sync-gbrain on Supabase (3 community-reported bugs) (#1478)
* fix(learnings): accept type:"investigation" in gstack-learnings-log

The /investigate skill instructed agents to log learnings with type:"investigation",
but bin/gstack-learnings-log:22 rejected anything not in
[pattern, pitfall, preference, architecture, tool, operational]. Every
investigation run exited 1 to stderr and the learning was dropped, silently
to the user.

Fix: add 'investigation' to ALLOWED_TYPES.

Regression test: round-trips a learning with type:"investigation" and asserts
exit 0 + file write; second test reads investigate/SKILL.md.tmpl and asserts
it emits the literal type:"investigation" string, guarding the
template/validator contract at both ends.

Fixes #1423. Reported by diogolealassis.

* fix(gbrain): engine detection survives gbrain ≥0.25 schema + non-zero doctor exit

freshDetectEngineTier() in lib/gstack-memory-helpers.ts returned engine:
"unknown" for every Supabase user on gbrain ≥0.25. Two stacking bugs:

1. execSync("gbrain doctor --json --fast 2>/dev/null") threw on non-zero
   exit. gbrain doctor exits 1 whenever health_score < 100, which is
   essentially every fresh install due to resolver_health warnings. The
   JSON output never reached the parser.
2. gbrain ≥0.25 shipped schema_version:2 doctor output that dropped the
   top-level 'engine' field entirely.

Result: every /sync-gbrain on Supabase logged 'engine=unknown' and skipped
all sync stages silently.

Fix:
- Replace execSync with execFileSync (no shell, no bash-specific 2>/dev/null
  redirect; portable to Windows).
- Recover stdout from the thrown error object so non-zero exits still parse.
- Fall back to reading gbrain's config.json (respecting GBRAIN_HOME env var,
  defaulting to ~/.gbrain/config.json) when doctor output doesn't surface
  an engine field.
- Add logGbrainError() helper that appends one-line JSONL to
  ~/.gstack/.gbrain-errors.jsonl on parse failure, so future regressions
  leave a forensic trail.

The "supabase" tier here means "remote postgres" in practice — gbrain
config uses engine:"postgres" for both real Supabase and any other
remote postgres (e.g. local-postgres-for-testing). Downstream sync code
treats them identically, so the label compression is intentional and
documented inline.

Regression test: existing detectEngineTier suite now isolates HOME +
GBRAIN_HOME + PATH to temp dirs (closes a flake source where the prior
tests would read whatever was on the reviewer's machine). New test
forces gbrain off PATH, writes a synthetic config.json with
engine:"postgres", asserts detectEngineTier() returns
engine:"supabase".

Fixes #1415. Patch shape contributed by Shiv @shivasymbl (tested on
gstack v1.31.0.0 + gbrain v0.31.3 + Supabase).

* fix(codex): /codex review works on Codex CLI ≥0.130.0

Codex CLI 0.130.0 made [PROMPT] and --base <BRANCH> mutually exclusive at
argv level. Step 2A of codex/SKILL.md.tmpl had always passed both (the
filesystem boundary prefix as the prompt argument + the base branch), so
every /codex review call died with:

  error: the argument '[PROMPT]' cannot be used with '--base <BRANCH>'

Fix: split Step 2A into two paths.

Default (no custom user instructions): bare 'codex review --base <base>'.
Codex's review prompt is internally diff-scoped, so the model focuses on
the changes against base. The filesystem boundary prefix is dropped here
because Codex 0.130 has no documented system-prompt config key
(probed -c 'system_prompt="..."' against 0.130 — the flag is silently
accepted but the value isn't applied). Skill files under .claude/ and
agents/ are public, so this is a token-efficiency concern, not a safety
one.

Custom instructions (/codex review <focus>): route through codex exec
with the diff written to a tempfile, inlined into the prompt between
explicit DIFF_START / DIFF_END markers. The boundary is preserved here
because codex exec isn't auto-scoped to the diff. The DIFF_START/END
delimiters tell the model where data ends and instructions resume, which
materially reduces prompt-injection hijack rates when the diff contains
adversarial content.

Note on bash semantics: codex's earlier review flagged the exec route as
"command injection via $_DIFF interpolation." That framing is wrong —
bash parameter expansion does not re-evaluate $(...) or backticks inside
the expanded value, so a diff containing $(rm -rf /) is plain string
data to codex exec. The real risk is prompt injection (model-side, not
shell-side), which the DIFF_START/END pattern mitigates.

Regression tests in test/codex-hardening.test.ts assert across BOTH
codex/SKILL.md.tmpl AND the generated codex/SKILL.md:
1. No 'codex review' invocation line combines a quoted-string OR variable
   positional argument with --base.
2. Step 2A still contains either bare 'codex review --base' OR 'codex
   exec' (guards against accidental deletion of both fix paths).

Fixes #1428. Reported by Stashub.

* test: raise timeouts for slow integration tests

Two test files were timing out at the default 5s on developer machines,
both pre-existing on origin/main but unrelated to this branch's bug fixes:

- test/gstack-artifacts-init.test.ts: 13 tests spawning real subprocesses
  via fake gh/glab/git shims in PATH. bun's fork+exec overhead pushed
  these past 5s consistently. Added a local test-wrapper that aliases
  test() with a 30s timeout (matches the brain-sync.test.ts pattern
  already in the repo).
- test/gstack-next-version.test.ts: one integration smoke test that
  spawns 'bun run ./bin/gstack-next-version' and parses the resulting
  JSON. The subprocess does a 'gh pr list' against the live GitHub API
  to enumerate claimed version slots. Network latency makes 5s tight;
  raised this single test to 30s.

No production code changed. The tests already passed deterministically
once given enough wall-clock time.

* chore: bump version and changelog (v1.34.2.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 11:11:52 -04:00
Garry Tan 386fe518f9 v1.34.1.0 fix: gstack-update-check resists stale GitHub raw CDN + adds semver-order guard (#1475)
* fix: gstack-update-check resolves remote VERSION via SHA-pinned URL

Replace branch-raw fetch with git ls-remote + SHA-pinned raw URL. Add
semver-order guard via sort -V so REMOTE < LOCAL stays silent instead
of emitting a backwards UPGRADE_AVAILABLE line. Fence git ls-remote
with GIT_TERMINAL_PROMPT=0 + 5s low-speed timeout. Honor explicit
GSTACK_REMOTE_URL overrides for test fixtures and private mirrors.

3 new tests cover stale-CDN regression, multi-segment 1.9 vs 1.10
both directions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: bump version and changelog (v1.34.1.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-13 13:37:31 -04:00
Garry Tan 0c88517a0f v1.34.0.0 feat: gstack consumable as submodule (factory-export API + AUTH_TOKEN env + import.meta.main gate) (#1472)
* feat(config): add resolveGstackHome, resolveChromiumProfile, cleanSingletonLocks

Three new exported helpers in browse/src/config.ts:

- resolveGstackHome(): honors GSTACK_HOME env, falls back to os.homedir()/.gstack
  Matches the existing convention in browse/src/telemetry.ts:26 and
  browse/src/domain-skills.ts:66.

- resolveChromiumProfile(explicit?): explicit arg wins -> CHROMIUM_PROFILE env
  -> resolveGstackHome()/chromium-profile. Lets gbrowser pass per-workspace
  profile paths through ServerConfig instead of relying on ambient env state.

- cleanSingletonLocks(dir): removes SingletonLock/Socket/Cookie via safeUnlinkQuiet.
  Defensive guard refuses to operate unless dir basename is 'chromium-profile'
  OR matches explicit CHROMIUM_PROFILE env value, preventing accidental
  deletion in unrelated directories.

Extends browse/test/config.test.ts with 12 tests covering env precedence,
guard behavior, ENOENT swallowing, and CHROMIUM_PROFILE override.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security-classifier): TDZ when claude CLI is missing from PATH

The checkTranscript Promise executor in browse/src/security-classifier.ts
referenced `finish()` at the !claude early-return guard before declaring
it 5 lines later. JavaScript throws ReferenceError: Cannot access 'finish'
before initialization (TDZ) for that path, but the path is only reachable
when resolveClaudeCommand returns null inside the spawn block (a TOCTOU
window vs. the outer checkHaikuAvailable cache).

Fix: hoist `let stdout = ''`, `let done = false`, and `const finish` block
above `const claude = resolveClaudeCommand()` so finish is in scope before
any reference to it. Behavior is identical when claude is on PATH; the
fix only matters for the dormant missing-CLI degraded path.

Adds browse/test/security-classifier-tdz.test.ts as the regression guard:
clears PATH + override env vars, calls checkTranscript, asserts the result
serializes with degraded:true and a meaningful reason field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(browser-manager): isCustomChromium gate + per-workspace profile + lock cleanup

Three fold-ins so gbrowser can become a thin overlay instead of forking
browse-server:

- Export isCustomChromium(): detects custom Chromium builds that bake the
  extension in as a component extension. Prefers explicit
  GSTACK_CHROMIUM_KIND=custom-extension-baked signal; falls back to
  GSTACK_CHROMIUM_PATH substring containing 'GBrowser' / 'gbrowser'.
  Gates the --load-extension push at launchHeaded so we don't trigger
  ServiceWorkerState::SetWorkerId DCHECK when two copies of the same
  service worker race to register.

- Swap hardcoded path.join(HOME, '.gstack', 'chromium-profile') in
  launchHeaded for resolveChromiumProfile() so phoenix can pass a
  per-workspace profile via CHROMIUM_PROFILE env (one daemon per gbd
  workspace, each with a distinct profile dir).

- Call cleanSingletonLocks(userDataDir) immediately after mkdirSync.
  Chromium's ProcessSingleton refuses to start when stale
  SingletonLock/Socket/Cookie files survive a SIGKILL or hard crash;
  pre-launch cleanup defends against the crash case. Safe under external
  coordination (gbd.lock for gbrowser, single-instance CLI check for
  gstack).

The existing .auth.json write at L291-302 is preserved — extensions
still need it for bootstrap even when component-baked.

Adds browse/test/browser-manager-custom-chromium.test.ts with 8 tests
covering both the env-kind and path-substring signals plus stock /
playwright-bundled Chromium negative cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(server): factory-export API surface + import.meta.main gate

Surfaces the embedder API gbrowser (phoenix) needs to consume gstack as a
submodule, and gates module-load side effects so the file is safe to
import without auto-starting a daemon.

Changes to browse/src/server.ts:

- AUTH_TOKEN now honors process.env.AUTH_TOKEN (trimmed) before falling
  back to crypto.randomUUID(). Whitespace-only values are rejected so the
  security boundary can't be silently weakened.

- New exported types: ServerConfig and ServerHandle. ServerConfig documents
  the full factory contract (authToken, browsePort, idleTimeoutMs, config,
  browserManager, chromiumProfile, xvfb, proxyBridge, startTime, beforeRoute).
  ServerHandle documents the return shape (fetchLocal, fetchTunnel,
  shutdown, stopListeners). Caller-owned lifecycle annotations on xvfb and
  proxyBridge prevent double-close bugs from surprise ownership.

- New exported function: resolveConfigFromEnv() builds a ServerConfig-shaped
  object from process.env for CLI use. Embedders construct their own
  ServerConfig explicitly.

- start() is now exported. Embedders can call it with env vars set as a
  v1 escape hatch until full buildFetchHandler extraction lands.

- Signal handlers (SIGINT, SIGTERM, Windows exit, uncaughtException,
  unhandledRejection) and the auto-kickoff at module bottom are now wrapped
  in `if (import.meta.main)`. CLI path is unchanged. Embedders register
  their own handlers.

- shutdown() and emergencyCleanup() now call cleanSingletonLocks(
  resolveChromiumProfile()) instead of inline path+loop. Single
  implementation, defensive guard, honors per-workspace CHROMIUM_PROFILE.

New tests:
- browse/test/server-no-import-side-effects.test.ts: spawns a fresh Bun
  subprocess that imports server.ts, asserts no signal handlers registered,
  no state-dir populated. Guards the core refactor invariant from
  regression.
- browse/test/server-factory.test.ts: 12 tests covering AUTH_TOKEN env
  behavior (honored, whitespace-rejected, trimmed), preserved exports
  (TUNNEL_COMMANDS, canDispatchOverTunnel), and ServerConfig/ServerHandle
  type compatibility.

Deferred to follow-up PR: full buildFetchHandler extraction that hoists
the 13 module-level mutables + helpers into a factory closure. Phoenix
can ship v0.6.0.0 against the start()+env surface today; the cleaner
factory comes next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: harden auth-token validation, TDZ try/catch, lockfile path safety

Three security hardening fixes from /ship adversarial review:

1. AUTH_TOKEN unicode-whitespace bypass (server.ts:67-83).
   Old: `process.env.AUTH_TOKEN?.trim() || randomUUID()` only stripped
   ASCII whitespace. A misconfigured embedder shipping AUTH_TOKEN=$''
   (BOM) or $'​' (zero-width space) would silently get a
   one-character bearer secret. New `sanitizeAuthToken()` strips all
   unicode whitespace via regex and requires >= 16 chars after stripping;
   anything shorter falls back to crypto.randomUUID(). Same sanitizer
   used by `resolveConfigFromEnv()` so the embedder path is hardened too.

2. security-classifier.ts checkTranscript safety net.
   `resolveClaudeCommand()` and `spawn()` can throw under transient
   conditions (PATH probe failure, posix_spawn ENOMEM). Old code let the
   throw propagate and rejected the Promise with a raw exception. Now
   wrapped in try/catch that calls finish() with a degraded signal,
   matching the graceful-degradation contract the layer already promises
   for missing-CLI / exit-nonzero / parse-error.

3. cleanSingletonLocks defensive guard tightened (config.ts).
   Old: basename === 'chromium-profile' OR userDataDir === $CHROMIUM_PROFILE.
   The second branch was env-controlled and the first was bypassable by
   passing a relative path that resolved to chromium-profile via CWD
   drift. New guard: refuses relative paths outright, resolves both
   sides via path.resolve(), and only accepts the env-match path when
   $CHROMIUM_PROFILE is itself absolute.

Test updates: replace the old `.trim()` test with three new cases
covering unicode-whitespace stripping, short-token rejection, and
zero-width-only rejection (server-factory.test.ts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.34.0.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:22:30 -04:00
Garry Tan dc6252d1df v1.33.2.0 fix: setup guards against Conductor worktree pollution of global install (#1446)
* fix(setup): skip Claude skill registration when run from a worktree of the global install

Add a guard before `ln -snf "$SOURCE_GSTACK_DIR" "$HOME/.claude/skills/gstack"`
that detects whether the target already exists as a separate real directory.
On macOS/BSD, `ln -snf SRC DST` does not replace a real DST — it creates
DST/$(basename SRC) → SRC inside it. Running ./setup from each Conductor
worktree of the gstack repo was leaking per-worktree child symlinks into the
global install, which Claude Code then picked up as separate top-level skills.

The guard uses `cd ... && pwd -P` to resolve the existing real dir and compare
against the source (mirroring setup's own `SOURCE_GSTACK_DIR` resolution).
When they differ, prints a four-line remediation hint naming both paths and
exits the Claude registration branch cleanly. Binaries still build locally.

The four other code paths through this branch are unchanged: fresh install,
retarget an existing symlink, self-rerun where the existing dir resolves to
the same source, and --local installs.

Includes 8 tests covering static guard placement, `pwd -P` resolution, the
remediation message, a behavioral reproduction of the BSD `ln -snf` child-
symlink bug, and every branch of the guard (skip on real-dir-elsewhere, allow
on fresh, allow on existing symlink, allow on self-rerun).

* chore: bump version and changelog (v1.33.2.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 20:35:58 -07:00
Garry Tan 1a4f0c9c15 v1.33.1.0 fix(learnings): token-OR query + task-shaped retrieval in 3 long skills (#1442)
* fix(learnings): use token-OR matching in gstack-learnings-search --query

Split the query on whitespace into tokens; a learning matches if ANY
token appears as a substring in ANY of key/insight/files. Previously
the whole query was a single substring, so multi-word queries like
"debug investigation" only matched learnings whose insight contained
that exact contiguous phrase, which is usually nothing.

Whitespace-only query falls through to no-query (matches today's no-flag
behavior). Single-word queries behave exactly as before.

Adds test/gstack-learnings-search.test.ts: 3 assertions covering
multi-token, single-token, and no-query backwards compat.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(resolver): parameterized LEARNINGS_SEARCH with shell-injection guard

The {{LEARNINGS_SEARCH}} macro now accepts a query=KEYWORD argument that
gets interpolated as --query "<keyword>" into the generated bash. Empty
value falls through to no-query (principle of least surprise: a stray
{{LEARNINGS_SEARCH:query=}} placeholder gets today's behavior, not a
build failure). Pattern reuses the parameterized-macro parsing from
composition.ts. The 13 templates that don't pass a query stay
byte-identical in their generated SKILL.md output.

Shell-injection guard: the query value is whitelisted to
^[A-Za-z0-9 _-]+$ at gen-skill-docs time. Any \$(), backticks,
semicolons, or quotes throw a loud build error instead of emitting
executable bash. Static template queries are safe by inspection;
this defends against future contributors writing dangerous values.

Adds 5 assertions to test/gen-skill-docs.test.ts covering no-args,
claude+query=foo bar on both cross-project and project-scoped branches,
codex host variant, empty value semantics, and shell-injection payloads
(\$(whoami), backticks, ;, &, ", \\, \$x) throwing build errors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(skills): task-shaped queries + mid-flow refresh in /investigate /qa /ship

The three long skills now pull learnings keyed to their theme at the
top, then re-pull at phase boundaries as work shifts to new sub-tasks.

Top-of-skill queries (5-6 token unions, token-OR matched):
- investigate: "debug investigation root cause hypothesis bug fix"
- qa: "qa testing bug regression flake fixture"
- ship: "release ship version changelog merge pr"

Mid-flow refresh blocks (concrete keyword recipe + worked examples):
- investigate: between Phase 1 (hypothesis) and Phase 2 (analysis),
  keyed to the hypothesis noun. Examples: auth-cookie, session-expiry.
- qa: between Phase 7 (triage) and Phase 8 (fix loop), keyed to the
  buggy component name. Examples: checkout-button, signup-form.
- ship: just before Step 12 (VERSION bump), keyed to the headline
  feature. Examples: learnings-search, pacing, worktree-ship.

Keyword recipe enforces alphanumeric+hyphen only (no quotes, slashes,
dots, colons) so dynamic queries cannot inject shell metacharacters.

The other 13 short-lived skills keep the bare {{LEARNINGS_SEARCH}} form.
Backwards-compat verified via diff: their generated SKILL.md output is
byte-identical to before this change.

Golden ship fixtures regenerated to match the new ship/SKILL.md output.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: bump version and changelog (v1.33.1.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: refresh codex+factory ship golden fixtures

Follow-up to 513c9660 — the codex and factory host outputs needed
regeneration too, missed in the initial commit because gen:skill-docs
was only run for the claude host. Now matches gen:skill-docs --host all.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 19:34:33 -07:00
Garry Tan d21ba06b5a v1.33.0.0 feat: /sync-gbrain memory-stage batch-import refactor (D1-D8) + F6/F9 + signal cleanup (#1432)
* refactor: batch-import architecture (D1-D8) + F6 atomic state + F9 full-file hash

bin/gstack-memory-ingest.ts: rewrite memory ingest around `gbrain import <dir>`
batch path. Replaces per-file gbrainPutPage loop (~470s of subprocess startup
per cold run) with prepare-then-batch:

  walkAllSources
    -> preparePages: mtime-skip + optional gitleaks (--scan-secrets) + parse
    -> writeStaged: mkdir -p per slug segment, hierarchical (D1)
    -> snapshot ~/.gbrain/sync-failures.jsonl byte offset
    -> runGbrainImport (async spawn) -> parseImportJson
    -> readNewFailures: read appended bytes, map back to source paths (D7)
    -> state.sessions[path] = {...} for files NOT in failed set
    -> saveStateAtomic (F6) + cleanupStagingDir

Architecture decisions:
  D1 hierarchical staging dir
  D2 cut over, deleted gbrainPutPage entirely
  D3 source-file gitleaks made opt-in via --scan-secrets (gstack-brain-sync
     owns the cross-machine boundary; per-file scan was redundant ~470s tax)
  D4 OK/ERR verdict (no DEGRADED tri-state)
  D5 unified state schema (no separate skip-list)
  D6 trust gbrain content_hash idempotency (no skip_reason bookkeeping)
  D7 byte-offset snapshot of sync-failures.jsonl + per-source mapping
  F6 saveState uses tmp+rename atomic write
  F9 fileSha256 removes 1MB cap; full-file hash (no more silent tail-edit
     misses on long partial transcripts)

Signal handling: installSignalForwarder propagates SIGTERM/SIGINT to the
gbrain child process AND synchronously cleans the staging dir before
process.exit. Pre-fix, orchestrator timeouts left gbrain processes
orphaned holding the PGLite write lock (observed: 15-hour-CPU-time
orphan still alive a day later).

parseImportJson returns null on unparseable output (treated as ERR by
caller) instead of silently zeroing through.

gbrainAvailable() probes for the `import` subcommand instead of `put`.

Plan + review chain at /Users/garrytan/.claude/plans/purrfect-tumbling-quiche.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: orchestrator OK/ERR verdict parser for batch memory ingest

gstack-gbrain-sync.ts: memory-stage parser now picks [memory-ingest] ERR
lines preferentially over the latest [memory-ingest] line, strips the
prefix and any leading 'ERR: ' for cleaner summary output, and surfaces
'(killed by signal / timeout)' when the child exits with status=null.

Matches D6's OK/ERR contract: per-file failures (FILE_TOO_LARGE etc.)
show in the summary count but only system-level failures (gbrain crash,
process kill, missing CLI) mark the stage ERR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: batch-ingest writer regressions + refresh golden ship fixtures

test/gstack-memory-ingest.test.ts: 5 new tests for the batch-import
architecture:
  1. D1 hierarchical staging slug round-trip — asserts staged file lives
     in transcripts/claude-code/<dir>/*.md, not flat at staging root
  2. Frontmatter injection — asserts title/type/tags written into the
     staged page's YAML block
  3. D7 sync-failures.jsonl exclusion — files listed as failed by
     gbrain do NOT get state-recorded; one of two test sessions lands,
     the other stays un-ingested for retry next run
  4. Missing-`import`-subcommand error path — when gbrain only advertises
     legacy `put`, memory-ingest exits 1 with [memory-ingest] ERR
  5. --scan-secrets opt-in path — verifies a dirty-source file is
     skipped via the secret-scan match when the flag is on, while a
     clean session in the same run still gets staged

Replaces the prior put-per-file shim with an import-batch shim. The
shim fails loudly (exit 99) if the new code ever regresses to per-file
`gbrain put` calls.

test/fixtures/golden/{claude,codex,factory}-ship-SKILL.md: refresh
golden baselines to match the current generated SKILL.md content after
the v1.31.0.0 AskUserQuestion fallback-clause deletion. Goldens were
stale from that release; test was failing on origin/main before this
PR. Caught by the /ship test pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v1.33.0.0 docs: design doc, P2 perf TODOs, gbrain guidance block, changelog

docs/designs/SYNC_GBRAIN_BATCH_INGEST.md: full design doc with the 8
decisions (D1-D8), source-verified gbrain behaviors (content_hash
idempotency, frontmatter parity, path-authoritative slug, per-file
failure surface), measured performance vs plan target, F9 hash
migration one-time cliff note, and follow-up TODOs.

CLAUDE.md: append `## GBrain Search Guidance` block from /sync-gbrain
indicating this worktree's pin and how the agent should prefer gbrain
search over Grep for semantic queries.

TODOS.md: P2 `gbrain import` perf-on-large-staging-dirs investigation
(5,131 files takes >10min in gbrain when 501 takes 10s — likely N+1
SQL or auto-link reconciliation). P3 cache-no-changes-since-last-import
at the prepare-batch level for true no-op fast paths.

VERSION + package.json: bump to 1.33.0.0 (queue-aware via
bin/gstack-next-version — skipped v1.32.0.0 which is claimed by
sibling worktree garrytan/wellington / PR #1431).

CHANGELOG.md: v1.33.0.0 entry per the release-summary format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: setup-gbrain/memory.md reflects opt-in per-file gitleaks

Per-file gitleaks scanning during memory ingest is now opt-in via
--scan-secrets (or GSTACK_MEMORY_INGEST_SCAN_SECRETS=1). Update the
user-facing reference doc so it stops claiming "every page passes
through gitleaks." Also corrects the /gbrain-sync → /sync-gbrain
command typo and the post-incident recovery section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 18:47:33 -07:00