mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-17 07:10:12 +02:00
garrytan/cairo-v3
320 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
a1a46db594 |
Merge remote-tracking branch 'origin/main' into garrytan/cairo-v3
# Conflicts: # CHANGELOG.md # VERSION # package.json |
||
|
|
65972f6a15 |
v1.43.1.0 feat: default PGLite to voyage-code-3 for code search + e2e tests (#1639)
* docs: drop ~/.zshrc env note in favor of GSTACK_* env-shim reference
The CLAUDE.md "Where the keys live on this machine" block hand-rolled a
`grep ~/.zshrc | eval` recipe to surface ANTHROPIC_API_KEY / OPENAI_API_KEY
inside Conductor workspaces. That predates the GSTACK_* env-shim
(`lib/conductor-env-shim.ts`, v1.39.2.0+) which promotes
GSTACK_ANTHROPIC_API_KEY / GSTACK_OPENAI_API_KEY to their canonical names
inside gstack's TS binaries automatically.
The zshrc recipe is now an obsolete workaround. Replace with a short note
pointing at the env-shim as the canonical answer. Keep the Agent SDK
`env: {...}` gotcha (still real, unrelated to where the key comes from).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: default PGLite to voyage-code-3 when VOYAGE_API_KEY set
When gstack inits a local PGLite engine for code search, use Voyage's
code-specialized `voyage-code-3` (1024-dim) embedding model if
`VOYAGE_API_KEY` is present. Falls back to gbrain's auto-selected
provider chain (OpenAI text-embedding-3-large 1536-dim when
OPENAI_API_KEY is available, etc.) when the Voyage key is unset.
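The selection rule described above can be sketched as a small pure function. This is a minimal illustration, not the skill template's actual bash: the function name, the return shape, and the choice to treat an empty or whitespace-only key as unset are assumptions. Only the env var name, the model name, and the 1024 dimension come from the commit message.

```typescript
// Hypothetical shape for the chosen embedding model (not a gbrain type).
interface EmbeddingChoice {
  model: string;
  dimensions: number;
}

// Returns the Voyage code model when a usable VOYAGE_API_KEY is present,
// or null to signal "defer to gbrain's auto-selected provider chain".
function pickPgliteEmbedding(
  env: Record<string, string | undefined>,
): EmbeddingChoice | null {
  const key = env.VOYAGE_API_KEY;
  // Assumption: an empty or whitespace-only key counts as unset.
  if (key !== undefined && key.trim() !== "") {
    return { model: "voyage-code-3", dimensions: 1024 };
  }
  return null;
}
```

Returning null for the fallback keeps the provider-chain decision with gbrain, which matches the commit's description of leaving the default untouched when the Voyage key is absent.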
Why voyage-code-3: head-to-head A/B against voyage-4-large on 10
realistic code queries against this codebase (using gbrain query
--no-expand for pure vector retrieval). voyage-code-3 strictly won on
4 queries (cases where the right hit was an implementation file vs a
test file: terminal-agent.ts over terminal-agent-integration.test.ts,
sanitizeReplacer over sanitize.test.ts, disposeSession over a
tangentially-related killDaemon test, surfaced injectCanary semantic
query). Tied on 5 with consistently +0.03 to +0.06 higher confidence.
Zero losses for voyage-4-large.
Touches 3 init sites in setup-gbrain/SKILL.md.tmpl:
- Step 1.5 (broken-db rollback-safe switch to PGLite)
- Path 3 direct PGLite init
- Step 4.5 split-engine local code index (Path 4 Yes branch)
Plus 2 manual-repair hints in sync-gbrain/SKILL.md.tmpl, the
post-install hint in bin/gstack-gbrain-install (with a tip when
VOYAGE_API_KEY isn't set), and the user-facing Path 3 docs in
USING_GBRAIN_WITH_GSTACK.md.
Cost is trivial: voyage-code-3 at $0.18/1M tokens means a full reindex
of a 100K-LOC repo runs about $0.20. Incremental syncs are pennies.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: regenerate SKILL.md after voyage-code-3 default
Mechanical regen via `bun run gen:skill-docs --host all` after the
template changes in the previous commit. Single-host regen leaves
other-host outputs stale and trips gen-skill-docs.test.ts; --host all
keeps every adapter (claude, codex, kiro, opencode, slate, cursor,
openclaw, hermes, gbrain) in sync.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: gbrain PGLite + voyage-code-3 init contract + sync integration
Two test files cover the voyage-code-3 default landed in the previous
commits:
test/gbrain-init-voyage-code-3.test.ts — free, deterministic, gate-tier.
Mirrors gbrain-init-rollback.test.ts: runs the skill template's
PGLite-init bash against a fake `gbrain` that logs argv to a sentinel
file, asserts the right flags pass under VOYAGE_API_KEY set/unset/empty.
Also includes belt-and-suspenders grep checks that the template literally
contains the voyage gate at all 3 PGLite init sites.
test/gbrain-sync-voyage-code-3-integration.test.ts — real, paid,
skip-if-no-key. Inits a sandbox PGLite with voyage-code-3 in a tempdir,
registers a 3-file fixture git repo as a source, runs
`gbrain sync --strategy code --skip-failed`, asserts pages imported +
embedded > 0. Also asserts `gbrain doctor` reports no dimension
mismatch and the column width is 1024d. `gbrain code-def` smoke test
confirms symbol extraction works against the embedded fixture.
The integration test deliberately omits a `gbrain query` assertion:
query produces correct output but `gbrain query` hangs ~2 min on a
fresh PGLite before exiting. The smoking-gun assertion for "embeddings
worked" is the "N pages embedded" line from sync output. Symbol-aware
correctness is covered by the code-def assertion.
Caught one real bug during test development: gbrain reads
`.gbrain-source` from CWD and tries to sync that source too. The test
sets cwd to the sandbox root to avoid the parent worktree's pin
polluting the sandbox brain. Documented in the runGbrain() helper.
Runtime: ~22s when VOYAGE_API_KEY is set, instant skip otherwise.
Cost: ~$0.001 per run (3 tiny fixture files, ~500 tokens of Voyage
embeddings).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump to v1.43.1.0 with voyage-code-3 default + tests
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: update USING_GBRAIN_WITH_GSTACK for v1.43.1.0 voyage-code-3 default
Add VOYAGE_API_KEY row to the env-var table; clarify the OPENAI_API_KEY row as
the fallback path. Refresh the "search returns nothing semantic" troubleshooting
to mention both providers and clarify that the env-shim only promotes
ANTHROPIC/OPENAI from GSTACK_ — VOYAGE_API_KEY must be set directly in Conductor
workspace env.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs: drop em-dashes + replace phantom embedding-migrations.md ref with inline recipe
CHANGELOG release-summary prose used em-dashes (violates voice rule) and
linked to docs/embedding-migrations.md which is gbrain's doc, not gstack's.
Replace with periods/commas and inline the dimension-mismatch recovery
recipe directly (mv + re-init).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
6864012ee9 |
Merge remote-tracking branch 'origin/main' into garrytan/cairo-v3
# Conflicts: # CHANGELOG.md # VERSION # package.json |
||
|
|
1d9b9c4cfc |
v1.43.0.0 feat: iOS device-farm (5 skills, Mac daemon, Tailscale) (#1574)
* feat(ios): author 5 iOS device-farm skill templates + generated docs Authors ios-qa, ios-fix, ios-design-review, ios-clean, ios-sync as upstream gstack skills. Each follows the standard SKILL.md.tmpl pattern with preamble-tier:3 frontmatter. The fork at time-attack/gstack shipped these but as byte-identical .md/.tmpl pairs that wouldn't pass skill-docs.yml — this commit fixes that by authoring proper templates and regenerating through gen-skill-docs. * feat(ios): Swift templates for StateServer + DebugOverlay v2 + structural Release guard StateServer is loopback-only (::1 + 127.0.0.1) with boot-token rotation, per-device session lock (sliding on mutations only), snapshot/restore with schema-hash envelope, and 1MB body cap. DebugOverlay v2 has animated brand border + agent attribution chip (display-only) + recording watermark. Package.swift enforces structural Release-build exclusion via .when(configuration: .debug). Includes Tailscale ACL example doc. * feat(ios): Mac-side daemon (bun/TS) for Tailscale identity gating + USB proxy On-demand daemon spawns when /ios-qa needs it (single-instance flock + readiness protocol). Owns tailnet ingress: fail-closed tailscaled LocalAPI probe, dual-track /auth/mint (self-service for allowlisted identities, owner-granted via CLI), capability-tier allowlist (observe/interact/mutate/restore), 1h default session TTL (24h hard cap), audit log of every authenticated mutating tailnet request, hashed-identity attempts log. iOS StateServer never directly binds tailnet — identity validation lives Mac-side because iPhones can't reach tailscaled. 67 unit/integration tests covering session-lock concurrency, capability enforcement, fail-closed probe, identity canonicalization, body limits, and boot-token leak proofs. * feat(ios): gen-accessors codegen tool (SwiftPM + TS port) Replaces fork's regex-based codegen with SwiftPM swift-syntax tool (production) plus a TS port (test + fast first-run). 
Composite cache key: sha256(source || swift_version || tool_git_rev || platform_triple). Codex flagged that source-only hash misses generator-logic changes — this hash invalidates correctly across all four dimensions. 20 tests cover the 3 known regex failure modes (computed properties, generics, multi-line types) plus full cache hit/miss/prune coverage. * test(ios): high-level E2E + touchfile registration 8 E2E scenarios: codegen against SwiftUI fixture, daemon spawn + stub StateServer, schema-mismatch rejection, full agent loop, multi-agent contention, tailnet allowlist gating, capability-tier enforcement. Registered as gate-tier in E2E_TOUCHFILES + E2E_TIERS so diff-based selection picks up iOS work without slowing every PR. * chore: bump version and changelog (v1.40.0.0) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(ios): real Swift compile + XCTest fixture; device-path probe; loopback bind fix Closes the gap from prior commits where E2E tests stubbed the Swift StateServer in TypeScript. Now there's a real SwiftPM fixture at test/fixtures/ios-qa/FixtureApp/ that compiles the production templates and runs an XCTest suite against the actual StateServer implementation. Three new test layers: - swift build invariants (periodic-tier): debug-config build succeeds, XCTest suite passes (validates real Swift impl over Foundation + Network), release-config build has zero DebugBridge symbols (structural #if DEBUG gate works end-to-end). - Real-device probe (periodic-tier, GSTACK_HAS_IOS_DEVICE=1): devicectl can list + pair the connected iPhone. Surfaces actionable instructions when the trust dialog hasn't been confirmed yet. - Fixture sources copied from ios-qa/templates/ — Package.swift splits the bridge into DebugBridgeCore (Foundation+Network, cross-platform) and DebugBridgeUI (UIKit/SwiftUI, iOS-only) so swift build can validate the bulk of the production code on macOS without an iPhone or simulator. 
Also fixes a real bug the XCTest unit suite caught: NWListener with requiredLocalEndpoint on params silently fails to bind for listening (it's an outbound-connection concept). Replaced with .requiredInterfaceType=.loopback + .acceptLocalOnly=true + a per-connection peer-address check. The fork's inherited code had this bug; we shipped it untouched in v1.41.0.0 and the new XCTest suite caught it immediately. * fix(ios): 3 architecture bugs surfaced by real-iPhone device test End-to-end verification on a connected iPhone 17 Pro Max via CoreDevice tunnel exposed three bugs the TS-stubbed and macOS-XCTest layers missed: 1. acceptLocalOnly=true was too tight. Network.framework's "local" gate only allows ::1 / 127.0.0.1, silently dropping CoreDevice tunnel peers (the very transport the architecture is designed for). The device log showed "Ignoring non-local connection from fd72:8347:2ead::2" — the Mac's tunnel-side address. Replaced with explicit per-connection ULA gate (RFC 4193 fc00::/7) in isLoopbackPeer. 2. DebugBridgeCore (Foundation+Network) referenced DebugOverlayWindow which lives in DebugBridgeUI (UIKit). Backwards module dep. Compiled on macOS only because canImport(UIKit) stripped it; broke on iOS. Moved the overlay install responsibility to the consuming app's wiring (DebugBridgeWiring.swift.template already shows the pattern). 3. @Observable macro + @Snapshotable property wrapper conflict. Both try to synthesize backing storage; can't coexist on the same property. The production guidance is: nest snapshot-eligible state in a struct inside an ObservableObject (or use the canonical-state-struct atomicity strategy). Fixture switched to a plain class to demonstrate. Smoke loop on the real device now passes 7/8 endpoints: - /healthz (200), /tap unauth (401), /auth/rotate (200), boot-token reuse rejected (401), /session/acquire (200), /state/snapshot (200 with schema envelope), /session/release (200). 
/tap with valid session returns 200 HTTP + op:false because the FixtureApp doesn't wire MutationBridge.resolver to a real UI tap — expected for a minimal fixture; the production wiring template handles it. Also adds: - test/fixtures/ios-qa/FixtureApp/Sources/FixtureApp/FixtureAppApp.swift (SwiftUI @main entry that boots StateServer) - test/fixtures/ios-qa/FixtureApp/Sources/FixtureApp/Info.plist - test/fixtures/ios-qa/FixtureApp/project.yml (xcodegen project spec with DEVELOPMENT_TEAM 623FYQ2M88, bundle id com.gstack.iosqa.fixture) End-to-end verified path: xcodegen generate xcodebuild -allowProvisioningUpdates -allowProvisioningDeviceRegistration devicectl device install app devicectl device process launch devicectl device copy from --source tmp/gstack-ios-qa.token curl -6 http://[<corodevice-ipv6>]:9999/... * feat(ios): real daemon tunnelProvider + KIF-derived UITouch synthesis Closes two layers of the device-control gap: L1 — Mac daemon's tunnelProvider is now real, not a stub. New files: - ios-qa/daemon/src/devicectl.ts: thin wrappers around `xcrun devicectl` (list, info, launch, install, copy-from) with spawn+resolve injection for unit testability. - ios-qa/daemon/src/tunnel-bootstrap.ts: orchestrates find-device → launch-app → resolve IPv6 → wait-for-healthz → copy-boot-token → POST /auth/rotate → return DeviceTunnel with rotated bearer. - ios-qa/daemon/test/tunnel-bootstrap.test.ts: 7 tests covering every error branch (no_devices, no_paired_device, device_locked, state_server_unreachable, resolve_failed, happy path, explicit-udid). - index.ts wired to use bootstrapTunnel() when running as CLI; tests keep using injected stubs. L2 — In-process touch synthesis for non-UIControl widgets. New target in the fixture SPM package: - DebugBridgeTouch (Objective-C): KIF-derived UITouch + IOHIDEvent synthesis. Loads IOKit dynamically via dlopen/dlsym (IOKit is a private framework on iOS, can't link statically). Uses iOS 18+ _UIHitTestContext for SwiftUI hit-testing. 
Public Swift-callable API: DebugBridgeTouch.sendTap(at:in:). MIT-attributed to kif-framework/KIF. - DebugBridgeUI/Bridges.swift: rewritten MutationBridge.handleTap to delegate to DebugBridgeTouch. ScreenshotBridge + ElementsBridge implementations also land here. - FixtureApp/Sources/FixtureApp/FixtureAppApp.swift: wires the bridges on app launch under #if DEBUG. Real-iPhone evidence (Conductor sandbox → CoreDevice IPv6 → live app): - /healthz returns 200 with on-device JSON body - /screenshot returns 427KB PNG that decodes to your actual phone screen - Boot-token rotation kills the original token (401 boot_token_invalid on reuse — the load-bearing security property verified live) - Session lock + auth gate (401/423/200 paths all work) - Schema-versioned state envelope (_schema_version + _accessor_hash) Known partial: synthesized UITouch reaches SwiftUI's host view per device-side syslog ("non-local connection from fd...:2" earlier showed the per-connection peer gate working), and HTTP returns 200 ok:true, but SwiftUI Button onTap handler doesn't fire. UIControl widgets DO work via UIControl.sendActions. Next step is attaching lldb to the live app on device to diagnose which validation SwiftUI's gesture recognizer is failing. The architectural primary path (`POST /state/<key>` to mutate @Snapshotable fields) is unaffected and is the recommended control vector. Documented sources for the KIF-derived synthesis: - https://github.com/kif-framework/KIF (MIT) - UITouch-KIFAdditions.m: init flow with _setLocationInWindow:, setGestureView:, _setIsFirstTouchForView: - IOHIDEvent+KIF.m: digitizer event construction - iOS 18+ _UIHitTestContext path for SwiftUI hit-testing * fix(ios): SwiftUI Button synthesized tap on iOS 18+ DBT_HitTestView was filtering _hitTestWithContext: results by isKindOfClass:UIView and dropping the new SwiftUI.UIKitGestureContainer (a UIResponder, not UIView). 
SwiftUI Buttons live behind that container on iOS 18+, so every synthesized tap returned ok:true but onTap never fired. Mirror KIF PR #1323: return id, pass the responder through to UITouch.setView: directly (the setter accepts non-UIView responders). Verified: real iPhone 17 Pro Max, iOS 26.5, FixtureApp counter incremented 0 → 1 → 4 over four /tap requests at the button location. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ios): hoist DebugBridgeTouch into canonical templates Bridges.swift.template imports DebugBridgeTouch but no .m/.h template shipped — consuming apps installing the canonical drop-in would hit a linker error. Closes that gap with the fixture's verified working code. Changes: - New ios-qa/templates/DebugBridgeTouch.{h,m}.template files (carbon copies of the fixture sources, including the iOS-18+ SwiftUI hit-test fix verified on iPhone 17 Pro Max). - Package.swift.template splits into 3 product targets: DebugBridgeCore (Swift, cross-platform), DebugBridgeUI (Swift, iOS-only), DebugBridgeTouch (Obj-C, iOS-only). Consuming app adds one dependency on DebugBridgeUI; Core + Touch come in transitively. - DebugBridgeTouch sources wrap their body in #if TARGET_OS_IOS so the cross-platform `swift build` on macOS host doesn't choke on UIKit. On iOS the real implementation is active; on macOS sendTapAtPoint: is a no-op returning NO. - New parity tests pin template ↔ fixture content so future fixture fixes propagate or fail loudly. - Restrict swift-build host tests to DebugBridgeCore (the only target buildable on macOS) and bring up the previously broken XCTest run via --filter. Verified post-change: real iPhone 17 Pro Max, iOS 26.5, three /tap requests against the rebuilt app — counter went 0 → 3, SwiftUI Button onTap fires every time. Templates now sufficient to ship to any consuming iOS app. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ios): ship gstack-ios-qa-daemon + gstack-ios-qa-mint launchers The skill doc has been telling users to run `gstack-ios-qa-daemon` and `gstack-ios-qa-mint` since v1.41.0.0, but neither binary actually existed. Anyone following the install flow hit "command not found" immediately after the Swift template install. Adds the missing pieces: - bin/gstack-ios-qa-daemon — bash shim that execs `bun run ios-qa/daemon/src/index.ts`. Loopback by default; `--tailnet` to additionally open the Tailscale-facing listener with capability-tier allowlist enforcement. - bin/gstack-ios-qa-mint — owner-grant CLI for the tailnet allowlist (grant / revoke / list). Writes ~/.gstack/ios-qa-allowlist.json at mode 0600. Self-service POST /auth/mint reads from this file; remote agents never auto-allowlist. - ios-qa/daemon/src/cli-mint.ts — TS implementation behind the shim. Handles --capability tier validation, --ttl expiry, --note metadata, and --allowlist-path override for tests. - ios-qa/daemon/src/allowlist.ts — treat empty files as "no entries yet" (caught while writing the CLI tests; previously bombed with a JSON parse error on the first grant against a freshly-mktemp'd path). Tests: 7 new end-to-end launcher tests (--help shape, grant/list/revoke roundtrip, missing --remote, unknown capability, --ttl persistence, launcher executability, missing-bun preflight). All 81 daemon tests pass. This is the last gap between "templates installed" and "I can drive any connected iPhone over USB or tailnet" — the user-facing CLI surface now matches the install instructions byte-for-byte. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: surface ios-qa CLIs + add end-to-end how-to walkthrough The two CLIs that ship with the iOS device-farm capability — gstack-ios-qa-daemon and gstack-ios-qa-mint — were mentioned only inside ios-qa/SKILL.md. 
Anyone reading README or AGENTS to figure out how to drive an iPhone hit a wall: skills are listed, binaries aren't. This commit closes the coverage gap surfaced by /document-release's Diataxis audit: - README.md, AGENTS.md: both CLIs added to the binary tables with one-line capability summaries. - docs/howto-ios-testing-with-gstack.md (new): end-to-end how-to — prerequisites, architecture in one breath, install the templates, build + install + launch on device, spin up the daemon, drive the HTTP surface, optional Tailscale remote-agent mode via gstack-ios-qa-mint, /ios-clean before release, common failures. Pulled directly from the real iPhone 17 Pro Max / iOS 26.5 verification run. - README + AGENTS link to the new how-to from the iOS skill row. No CHANGELOG entry change — the consolidated 1.43.0.0 entry is /ship work. No VERSION bump — already at 1.43.0.0 covering all branch work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(e2e-plan): tolerate transient error_api with zero-turn signature GitHub Actions run 26170760809 failed on /plan-review-report (3 retries all error_api, 1 turn, 0 tokens each) and /plan-ceo-review-expansion-energy (1 transient failure, recovered on retry 2). The prior run on the same branch ( |
||
|
|
6f31954299 |
chore(release): bump v1.43.0.0 → v1.43.2.0 for queue collision
CI check-version-stale flagged v1.43.0.0 already claimed by PR #1574 (garrytan/colombo-v3). PR #1639 (garrytan/muscat-v3) claims v1.43.1.0. Next available MINOR slot is v1.43.2.0. Bump VERSION + package.json + CHANGELOG entry header. No behavior changes — purely re-versioning to clear the queue collision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
72dac4e392 |
chore(release): v1.43.0.0 — post-Daegu paper-cut wave
Bumps VERSION 1.42.2.0 → 1.43.0.0 (MINOR per scale-aware bump rules: new env-var surface GSTACK_SYNC_*_TIMEOUT_MS + GSTACK_CHROMIUM_NO_SANDBOX, behavior expansion in browse/src/browser-manager.ts headless launch, three skill-template prompt changes affecting /retro, /review, /sync-gbrain). CHANGELOG entry leads with what stopped happening: /retro stops fabricating retros against stale bases, /sync-gbrain stops SIGTERM-looping 35-min restarts on big brains, /review stops shipping framework FPs the reviewer never grep'd. 18 fixes total — 15 community PRs + 3 self-filed silent-failure issues (#1624, #1611, #1539) — in one bundled PR with 26 bisect commits and 7 new regression test files. Every wave-touched test file passes in isolation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
0ee920bbe6 |
test(gbrain-detect): include gbrain_pooler_mode in schema regression (PR #1591)
PR #1591 (PgBouncer transaction-mode detection, @mikeangstadt) added gbrain_pooler_mode to the gstack-gbrain-detect JSON output but did not update the schema regression check in test/gstack-gbrain-detect-mcp-mode.test.ts. Adding the key in alphabetical order matching the rest of the schema array. Downstream sync-gbrain ignores unknown keys, so this is forward-compat. Without this, the test fails with a diff: + "gbrain_pooler_mode" because keys is the actual set returned and the expected array was pre-#1591. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
8df2a9ca00 |
test(fixtures): regenerate ship-SKILL.md golden baselines
ship/SKILL.md consumes the Confidence Calibration resolver via the preamble pipeline. This wave's #1539 pre-emit verification gate extends the resolver text, which propagated to ship/SKILL.md via gen:skill-docs. The golden fixtures in test/fixtures/golden/ matched the pre-#1539 shape and failed the host-config regression check. Refreshes claude-ship-SKILL.md, codex-ship-SKILL.md, and factory-ship-SKILL.md to match the current generated output. Matches the Daegu wave's bisect commit 23 ("test(fixtures): regenerate ship-SKILL.md golden baselines"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
144327dc3d |
test(learnings): align injection-prevention tests with PR #1619 tagged-line shape
PR #1619 (preserve current entries in cross-project search) refactored gstack-learnings-search to tag rows inline (`current\t<json>` vs `cross\t<json>`) instead of filtering inside the bun block via process.env.GSTACK_SEARCH_SLUG. The bun block no longer reads SLUG or CROSS env vars; it parses the per-line tag and sets a per-entry _crossProject flag.
The pre-existing test/learnings-injection.test.ts still asserted on the old SLUG + CROSS env var shape. Updates:
- Remove the SLUG env var assertion (no longer set on bash command line)
- Remove the bun-block CROSS env var assertion (block reads the tag now, not the env)
- Add a new positive assertion that the bun block parses the tag (sourceTag | tabIndex | crossProject)
- Keep the shell-interpolation safety assertion unchanged; that's independent of the SLUG refactor
The CROSS env var is still SET on the bash command line (it controls whether the cross-project find runs at all), but the bun child no longer reads it. The existing "env vars set on bash command line" test continues to pin that.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
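The tagged-line shape this commit refers to can be sketched as follows. It is a minimal illustration of parsing `current\t<json>` and `cross\t<json>` rows into entries with a `_crossProject` flag. The function name, the entry type, and the decision to drop malformed rows are assumptions, not the actual gstack-learnings-search code.

```typescript
// Hypothetical entry type: the parsed JSON object plus the per-entry flag.
interface LearningEntry {
  _crossProject: boolean;
  [key: string]: unknown;
}

// Parses one "<tag>\t<json>" row. Returns null for rows that do not match
// the expected shape (assumption: malformed rows are skipped, not fatal).
function parseTaggedLine(line: string): LearningEntry | null {
  const tabIndex = line.indexOf("\t");
  if (tabIndex === -1) return null;

  const sourceTag = line.slice(0, tabIndex);
  if (sourceTag !== "current" && sourceTag !== "cross") return null;

  try {
    const parsed: unknown = JSON.parse(line.slice(tabIndex + 1));
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
      return null;
    }
    // The tag, not an env var, decides whether the row is cross-project.
    return { ...(parsed as Record<string, unknown>), _crossProject: sourceTag === "cross" };
  } catch {
    return null; // payload was not valid JSON
  }
}
```

Because the tag travels with each row, the parser needs no environment input at all, which is the property the updated assertions pin.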
||
|
|
e75a5e8e5f |
test: fill coverage gaps for PRs #1606, #1612, #1620
Three cherry-picked PRs in this wave landed without unit-test coverage for the specific invariant they protect:
#1606 (@andrey-esipov): LC_ALL=C pin in _gstack_gbrain_validate_varname
8 tests by sourcing bin/gstack-gbrain-lib.sh and calling the validator directly. Asserts uppercase/digit/underscore accepted, lowercase REJECTED (the macOS-locale regression case), mixed-case rejected, LC_ALL=C scoping is local (doesn't leak to caller).
#1612 (@bharat2913): setsid daemonize via Node child_process.spawn
4 static-invariant tests on browse/src/cli.ts. The actual setsid syscall is hard to assert without a real spawn, so we pin the source shape: nodeSpawn imported from child_process; non-Windows branch uses nodeSpawn(...) with detached:true and .unref(); comment documents setsid/SIGHUP root cause; Bun.spawn() is NOT used on macOS/Linux.
#1620 (@davidfoy, re-authored into .tmpl per A3): §4a-postfail
12 static invariants on land-and-deploy/SKILL.md.tmpl + generated SKILL.md. Pins all three state branches (MERGED/OPEN/CLOSED), the authoritative state query, the merge-SHA capture, non-destructive worktree cleanup with uncommitted-work guard, autoMergeRequest probe on OPEN, hard "never retry gh pr merge" rule, and atomic regen propagation. Failing build if any of the three invariants regresses.
Note: gbrain-lib-validate-varname.test.ts also surfaces a pre-existing glob-pattern overpermissiveness (hyphens + dots accepted). Not in #1606's scope; documented inline as a separate cleanup target.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
bd3a6c68b2 |
fix(design): bump image-gen timeout to 240s + pin gpt-image-2
The design binary calls /v1/responses (gpt-4o + image_generation tool, quality:high, 1536x1024) but aborted the request after a hardcoded 120s. That class of request consistently takes ~140-160s end-to-end, so every generate/variants/evolve/iterate call aborted before the image returned.
In /design-shotgun this cascades: Step 3c launches N parallel agents, each calling `$D generate`, each aborts at 120s and retries, all fail, the comparison board never opens, and the skill appears to hang indefinitely.
Reproduced the exact API call with a longer budget: HTTP 200, valid image, 143.5s. A real /design-shotgun run after the patch generated 3 variants in parallel at 150.0s / 161.0s / 152.1s, all exit 0. Note the 161s case, which a naive 150s bump would still have failed.
- Bump AbortController timeout 120_000 -> 240_000 in generate.ts, variants.ts, evolve.ts, iterate.ts (both call sites)
- Pin the image_generation tool to model "gpt-image-2"
design/test/variants-retry-after.test.ts: 5 pass, 0 fail. The feedback-roundtrip.test.ts failures are a pre-existing browse-module breakage (session.clearLoadedHtml undefined), unrelated to this change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
707a82e88c |
fix(browse): daemonize macOS/Linux server via setsid()
`Bun.spawn().unref()` only releases the child from Bun's event loop —
it does NOT call setsid(). The spawned bun server inherits the spawning
shell's process session. When the CLI runs inside a session-managed shell
that exits shortly after the CLI returns (Claude Code's per-command Bash
sandbox, Conductor, OpenClaw, CI step runners), the session leader's exit
sends SIGHUP to every PID in the session — killing the bun server and
its Chromium grandchildren within seconds of a successful `connect`.
Setting `BROWSE_PARENT_PID=0` (already done by the `connect` command and
pair-agent) disables the parent-process watchdog but does NOT save the
server here: SIGHUP from session teardown still reaps it.
Replace the macOS/Linux `Bun.spawn().unref()` with Node's
`child_process.spawn({ detached: true })`, which calls setsid() and
gives the server its own session leader role (PPID=1, STAT=Ss). This
mirrors the Windows path's rationale (PR #191 by @fqueiro) — same root
cause, different OS surface.
Verified on macOS in Conductor: pre-fix the server dies ~10–15s after
connect across separate Bash invocations; post-fix the same PID stays
alive (PPID=1, SESS=0, STAT=Ss) and responds to `status`/`goto`/
`snapshot` across many separate shell calls.
The `proc?.stderr` startup-error branch is removed since both platforms
now spawn with `stdio: 'ignore'`; both fall through to the on-disk
`browse-startup-error.log` written by `server.ts`'s start().catch.
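The detached-spawn pattern described in this commit can be sketched with Node's `child_process` API. This is a minimal illustration rather than the code in browse/src/cli.ts; the helper name `spawnDetached` is hypothetical. On non-Windows platforms, Node documents that `detached: true` makes the child the leader of a new process group and session, which is what shields it from the parent session's SIGHUP.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: start a long-lived process that should outlive the
// shell session that launched it. Returns the child's PID if the spawn began.
function spawnDetached(command: string, args: string[]): number | undefined {
  const child = spawn(command, args, {
    detached: true,  // new session on POSIX, so session teardown does not SIGHUP it
    stdio: "ignore", // no inherited pipes that would tie the child to the parent
  });
  child.unref(); // let the parent's event loop exit without waiting for the child
  return child.pid;
}
```

With `stdio: "ignore"` the parent cannot read startup errors from the child's stderr, which is why the commit falls back to an on-disk startup error log.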
|
||
|
|
7703f7cfbf |
fix(browse): mirror isCustomChromium() guard in headless launch()
When BROWSE_EXTENSIONS_DIR is set alongside GSTACK_CHROMIUM_PATH pointing at a baked-extension build (GBrowser / GStack Browser), the headless launch() path was unconditionally adding --disable-extensions-except / --load-extension. This causes the same ServiceWorkerState::SetWorkerId DCHECK crash that launchHeaded() already guards against via isCustomChromium(). Mirror the existing guard: skip --load-extension flags when isCustomChromium() returns true; always push the off-screen window geometry args. |
||
|
|
e7074b54d7 |
fix(browse): GSTACK_CHROMIUM_NO_SANDBOX opt-out for Ubuntu/AppArmor (#1562)
Ubuntu/AppArmor configurations often block unprivileged Chromium sandboxing for headless agent sessions even for normal users, so /qa hangs without --no-sandbox. The kernel policy denies the unprivileged user namespaces Chromium needs.
Adds GSTACK_CHROMIUM_NO_SANDBOX=1 as an explicit user override that forces the sandbox off without changing the default for everyone else.
Re-authored from PR #1562 onto v1.42.2.0's shouldEnableChromiumSandbox() helper. Purely additive: preserves the headed-launch sandbox-on-by-default behavior that v1.42.2.0 shipped to kill the --no-sandbox yellow infobar.
Three new regression tests cover:
- linux + override=1 → false (the named use case)
- darwin + override=1 → false (env wins on any platform)
- override=0 → does NOT trigger (must be exactly "1")
Original diff by @techcenter68 via #1562.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
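The override semantics covered by the three regression tests can be sketched like this. The platform-specific default inside shouldEnableChromiumSandbox() is not shown in the commit, so this illustration takes that default as a parameter. The function name `applySandboxOverride` is hypothetical.

```typescript
// Applies the GSTACK_CHROMIUM_NO_SANDBOX opt-out on top of whatever the
// platform default would be. Only the exact string "1" disables the sandbox.
function applySandboxOverride(
  defaultEnabled: boolean,
  env: Record<string, string | undefined>,
): boolean {
  if (env.GSTACK_CHROMIUM_NO_SANDBOX === "1") {
    return false; // explicit user override wins on any platform
  }
  return defaultEnabled; // "0", unset, or any other value leaves the default alone
}
```

Requiring exactly "1" avoids the common trap where a truthy check treats "0" or "false" as enabling the override.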
||
|
|
7ea6b1dc89 |
fix(supabase-provision): rewrite transaction/6543 -> session/5432 for new projects
- Single-object pooler API responses default to transaction-mode at 6543, but the shared pooler tenant on new projects only listens on session/5432
- Add a `pool_mode == transaction && db_port == 6543` rewrite + stderr note
- Escape hatch via `GSTACK_SUPABASE_TRUST_API_PORT=1` for forward-compat
- 5 new tests covering rewrite, no-op shapes, env opt-out, array path
Fixes #1301.
|
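The rewrite rule above can be sketched as a pure function. This is an illustration under assumptions: the field names `pool_mode` and `db_port` come from the commit message, but the type, the function name, the injectable `warn` callback, and the wording of the note are hypothetical. The real fix may well live in shell rather than TypeScript.

```typescript
// Hypothetical subset of the pooler config returned by the API.
interface PoolerConfig {
  pool_mode: string;
  db_port: number;
}

// Rewrites transaction/6543 to session/5432 unless the caller opts out.
function normalizePooler(
  cfg: PoolerConfig,
  env: Record<string, string | undefined>,
  warn: (msg: string) => void = console.error, // console.error writes to stderr
): PoolerConfig {
  if (env.GSTACK_SUPABASE_TRUST_API_PORT === "1") {
    return cfg; // escape hatch: trust whatever the API reported
  }
  if (cfg.pool_mode === "transaction" && cfg.db_port === 6543) {
    warn("pooler reported transaction/6543; rewriting to session/5432");
    return { ...cfg, pool_mode: "session", db_port: 5432 };
  }
  return cfg; // every other shape passes through untouched
}
```

Matching on both fields together keeps the rewrite narrow, so a transaction-mode pooler on a non-default port is left alone.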
||
|
|
db2ed599a3 |
fix: detect PgBouncer transaction-mode pooler and set GBRAIN_PREPARE=true (#1435)
When gbrain connects through a PgBouncer transaction-mode pooler (port
6543), it auto-disables prepared statements. This breaks `gbrain search`
silently — the /sync-gbrain capability check fails and the GBrain Search
Guidance block never gets written to CLAUDE.md.
Three-layer fix:
1. **lib/gbrain-exec.ts** — `buildGbrainEnv()` now detects port 6543 in
the effective DATABASE_URL and sets `GBRAIN_PREPARE=true` in the env
passed to every gbrain spawn. This is the single chokepoint — all
gstack gbrain invocations inherit the fix. Caller can opt out with
`GBRAIN_PREPARE=false`.
2. **sync-gbrain/SKILL.md{,.tmpl}** — capability check now exports
`GBRAIN_PREPARE=true` explicitly and retries search up to 3x with 1s
delay for async index propagation under connection pooling.
3. **bin/gstack-gbrain-detect** — surfaces `gbrain_pooler_mode` field
("transaction" | "session" | null) in the preamble probe JSON so
/setup-gbrain and /sync-gbrain can advise users about pooler state.
Closes #1435
Built with [ClosedLoop.AI](https://closedloop.ai) | [GitHub](https://github.com/closedloop-ai/claude-plugins)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
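The port detection in layer 1 of the commit above could look roughly like the following. This is a sketch under stated assumptions, not the code in `lib/gbrain-exec.ts`: the function name and signature are hypothetical, and treating any caller-supplied `GBRAIN_PREPARE` value as an opt-out is a simplification of the "opt out with `GBRAIN_PREPARE=false`" rule.

```typescript
function buildGbrainEnvSketch(
  baseEnv: Record<string, string | undefined>,
  databaseUrl: string | undefined,
): Record<string, string | undefined> {
  const env = { ...baseEnv };
  // Respect an explicit caller choice (for example GBRAIN_PREPARE=false).
  if (env.GBRAIN_PREPARE !== undefined) return env;
  if (databaseUrl === undefined) return env;
  let port = "";
  try {
    port = new URL(databaseUrl).port;
  } catch {
    return env; // Unparseable URL: leave the env untouched.
  }
  // Port 6543 is the transaction-mode pooler port named in the commit.
  if (port === "6543") env.GBRAIN_PREPARE = "true";
  return env;
}
```

Doing the detection in one env-building helper is what makes it a single chokepoint: every spawn that goes through the helper inherits the setting.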
|
|
c427340fce |
fix(land-and-deploy): detect merged PR after gh failure
After `gh pr merge` exits non-zero, the PR may already be MERGED server-side
(concurrent merge landed, or local cleanup phase failed AFTER the merge
succeeded). Calling `gh pr merge` a second time then errors with a confusing
"already merged" — and worse, the deploy workflow never runs because we
stopped on the first failure.
Adds a Post-failure PR-state check (§4a-postfail) that runs after ANY
non-zero exit from `gh pr merge`:
- state == MERGED → record MERGE_PATH=direct, OFFER (don't force)
stale-worktree cleanup on the base branch with
uncommitted-work guard, proceed to §4a CI watch
- state == OPEN → check autoMergeRequest; if non-null treat as
merge-queue wait; if null surface both errors and STOP
- state == CLOSED → STOP
Hard invariant: never retry `gh pr merge` after a non-zero exit. Server
state is authoritative.
Re-authored from PR #1620 into land-and-deploy/SKILL.md.tmpl (the source of
truth) instead of the generated SKILL.md, so the next gen:skill-docs run
preserves the change. Original diff by @davidfoy via #1620.
Related: cli/cli#3442, cli/cli#13380.
Contributed by @davidfoy via #1620.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
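The post-failure decision table from the commit above can be written as a small pure function. This is an illustrative sketch only: the actual change is prose in a skill template, and the type names, the function name, and the `NextStep` labels are hypothetical. The `state` values and the `autoMergeRequest` check follow the commit message.

```typescript
type PrState = "MERGED" | "OPEN" | "CLOSED";
type NextStep = "proceed-to-ci-watch" | "wait-for-merge-queue" | "stop";

interface PrView {
  state: PrState;
  autoMergeRequest: object | null;
}

// Runs only after a non-zero exit from the merge command; it never retries the merge.
function afterMergeFailureSketch(pr: PrView): NextStep {
  switch (pr.state) {
    case "MERGED":
      // The merge landed server-side despite the local failure.
      return "proceed-to-ci-watch";
    case "OPEN":
      // A non-null autoMergeRequest means the PR is queued to merge.
      return pr.autoMergeRequest !== null ? "wait-for-merge-queue" : "stop";
    case "CLOSED":
      return "stop";
  }
}
```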
|
|
5e20b41743 |
fix(gbrain-lib): pin LC_ALL=C in varname validator (macOS locale guard)
In many macOS shells the default locale (e.g. en_US.UTF-8) makes bash glob
brackets like `[A-Z]` match lowercase letters too, so the existing
`case "$name" in [A-Z_][A-Z0-9_]*)` branch lets names like `lower-case`
through validation. The function then trips `printf -v "$varname"` and
`export "$varname"` with `not a valid identifier` errors that surface
mid-prompt, which is exactly what the validator was supposed to prevent.
Pinning `LC_ALL=C` inside the function gives ASCII-only bracket semantics on
both macOS and Linux, matching the documented `[A-Z_][A-Z0-9_]*` contract.
Declared `local` so it doesn't leak to the calling shell:
`gstack-gbrain-lib.sh` is documented as a sourced helper, so a bare
assignment would mutate the caller's locale for the rest of the process
(silently affecting downstream `sort`, `tr`, locale-aware globs in the same
shell, etc.).
The existing regression test
`test/gbrain-lib-verify.test.ts:'rejects invalid var names'` already covers
the macOS repro shape (passes `lower-case` and expects the validator to
reject + emit `invalid var name`). On Linux CI the test silently passed
because `LC_ALL=C` is the typical default; on macOS dev boxes it fails.
Verified:
- `bun test test/gbrain-lib-verify.test.ts`: 22 pass, 0 fail (on macOS).
- `_gstack_gbrain_validate_varname lower-case; echo $?` → 2.
- `_gstack_gbrain_validate_varname FOO_BAR; echo $?` → 0.
- Caller's LC_ALL preserved across calls (confirmed via sourced bash).
|
||
|
|
07a84a0bc7 | fix(memory): probe gitleaks without shell builtin | ||
|
|
78d30524fd | fix(setup): register root gstack slash alias | ||
|
|
873799c90a | fix(learnings): preserve current entries in cross-project search | ||
|
|
b9eefbed68 | fix(artifacts): reject malformed remote paths | ||
|
|
7320f36ab4 | fix(benchmark): parse positional prompt after flags | ||
|
|
d7f474f8a4 | fix(config): expose explain_level default | ||
|
|
7ec546deb4 |
test(review): regression for #1539 pre-emit verification gate
12 tests pinning the gate behavior:
- Resolver emits the gate header + #1539 reference
- Gate requires quoting file:line + verbatim text
- Unverified findings forced to confidence 4-5 (auto-suppress via existing
<7-rule, no new mechanism)
- Framework-meta nudge names Django, Rails, SQLAlchemy, TypeORM, Sequelize,
Prisma
- Deferred design doc reference present (1539-framework-aware-review.md)
- Four named FP classes from #1539 enumerated:
* field doesn't exist on model
* dict.get() might be None
* save() might lose fields
* update_fields might miss X
- All four downstream SKILL.md consumers (review, cso, plan-eng-review,
ship) carry the gate text after gen:skill-docs
- Existing confidence 9-10 'Show normally' + 3-4 'Suppress' rows unchanged
(regression on existing behavior)
Failing build if the gate is removed, the suppression mechanism is
re-invented separately, the framework-meta nudge drops a framework, or
gen:skill-docs stops propagating the gate to consumers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
2a517753ec |
fix(review): pre-emit verification gate kills Django-shape FP class (#1539)
External user filed 4/8 false positives on a /review run against a Django +
DRF + PostgreSQL repo (Sprint 2.5). Every FP class was the same shape:
"resolvable in <5 minutes by viewing the actual code or running a simple
grep" — fields that don't exist on the model, dict.get()-might-be-None on a
form that returns {}-initialized cleaned_data, standard ORM save behavior
called out as data loss.
Extends the Confidence Calibration resolver (consumed by review, cso,
plan-eng-review, ship) with a Pre-emit verification gate:
Every finding MUST quote the specific code line that motivates it
(file:line + verbatim text). If the reviewer cannot produce the quote,
the finding is unverified — its confidence is forced to 4-5 so the
existing "Suppress from main report" rule fires automatically. The
finding still goes to the appendix for calibration audit, but the user
does not see it in the critical-pass output.
Reuses the existing suppression mechanism — no new code path. The FP
classes the gate kills are enumerated in the resolver text so reviewers
see the named patterns.
Framework-meta nudge included for Django Meta, Rails associations,
SQLAlchemy relationships, TypeORM decorators, Sequelize init, Prisma
generated client — the reviewer must quote the meta-construct that
generates the symbol, not just grep for the literal name. Deeper
framework-aware ORM verification (model introspection, migration-history-
aware checks) is deliberately deferred to a future wave per T-Codex-2.
Atomic .tmpl-equivalent (resolver) edit + gen:skill-docs regen commit
per T-Codex-3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
64a7bee176 |
test(gbrain-sync): regression for #1611 timeouts + resume
19 tests across three surfaces:
- resolveStageTimeoutMs (10 tests): undefined/empty → default; non-numeric,
zero, negative, below-floor, above-ceiling → warn + default; at-floor,
at-ceiling, valid mid-range → accepted as-is.
- decideResume (6 tests): no checkpoint, corrupt JSON, checkpoint + staging
ok, checkpoint + staging missing, checkpoint with no dir, checkpoint with
empty dir.
- SIGTERM staging preservation (3 static invariants): memory-ingest signal
handler must check stagingDirIsCheckpointed BEFORE cleanup; preserve
branch must come before cleanup branch (ordering); orchestrator must
pass GSTACK_INGEST_RESUME_DIR to the grandchild on resume.
Also threads process.env.HOME through readGbrainCheckpoint and
stagingDirIsCheckpointed so tests can redirect home. os.homedir() caches
at process start and ignores later mutation, so the env override is the
only reliable test injection point.
Failing build if the timeout bounds are removed, the resume detection
short-circuits incorrectly, or the SIGTERM handler regresses to
unconditional cleanup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
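The timeout cases pinned by these tests suggest a resolver along the following lines. This is a sketch, not the shipped `resolveStageTimeoutMs`: the bounds (60_000 ms to 86_400_000 ms) and the 2_100_000 ms default are taken from the #1611 fix, while the signature, the integer check, and the warning text are assumptions.

```typescript
const DEFAULT_TIMEOUT_MS = 2_100_000; // 35 minutes
const MIN_TIMEOUT_MS = 60_000;
const MAX_TIMEOUT_MS = 86_400_000;

function resolveStageTimeoutMsSketch(
  raw: string | undefined,
  warn: (msg: string) => void = console.warn,
): number {
  // Unset or empty: silently use the default.
  if (raw === undefined || raw.trim() === "") return DEFAULT_TIMEOUT_MS;
  const value = Number(raw);
  // Non-numeric, zero, negative, or out-of-range values warn and fall back.
  if (!Number.isInteger(value) || value < MIN_TIMEOUT_MS || value > MAX_TIMEOUT_MS) {
    warn(`invalid timeout "${raw}", using default ${DEFAULT_TIMEOUT_MS}ms`);
    return DEFAULT_TIMEOUT_MS;
  }
  return value;
}
```

Injecting `warn` keeps the function testable without capturing stderr.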
|
|
700c9a4ff8 |
fix(gbrain-sync): configurable timeouts + resume from gbrain checkpoint (#1611)
The memory and code stages hardcoded a 35-min spawn timeout. On brains with
~2000+ staged files, /sync-gbrain --full reliably SIGTERM'd the child at
exactly 35 minutes with exit 143. gbrain left ~/.gbrain/import-checkpoint.json
pointing at the staging dir, but gstack-memory-ingest's SIGTERM handler
unconditionally cleaned the dir up — so the next run found a checkpoint
pointing at nothing and restaged from scratch, repeating the SIGTERM forever.
Three changes:
1. Configurable timeouts via env (bounds 60_000ms - 86_400_000ms, default
2_100_000ms = 35min unchanged):
GSTACK_SYNC_MEMORY_TIMEOUT_MS
GSTACK_SYNC_CODE_TIMEOUT_MS
Out-of-range or non-numeric values warn and fall back to the default.
2. SIGTERM in gstack-memory-ingest no longer always cleans up the staging
dir. If gbrain has written ~/.gbrain/import-checkpoint.json pointing at
the active staging dir, the dir is PRESERVED for next-run resume.
Otherwise (no checkpoint pointing here, crash before gbrain ever
touched it) it's cleaned up as before.
3. Next /sync-gbrain run detects gbrain's checkpoint via decideResume() in
gstack-gbrain-sync.ts:
- no checkpoint → fresh ingest pass
- checkpoint + staging ok → set GSTACK_INGEST_RESUME_DIR; child
reuses staging dir and skips
writeStaged; gbrain import resumes
from processedIndex+1
- checkpoint + staging gone → warn "previous checkpoint stale
(staging dir gone), restaging from
scratch" and proceed
Reuses gbrain's own checkpoint as the source of truth (D1 — no double-store
state). Detect-then-fallback semantics per C1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
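The three resume outcomes in change 3 above can be modeled as a pure decision over the checkpoint contents. The sketch below is not the `decideResume()` in `gstack-gbrain-sync.ts`: the checkpoint field name `dir`, the result type, and the choice to treat corrupt JSON or a missing directory field as a fresh run are all assumptions made for illustration.

```typescript
type ResumeDecision =
  | { kind: "fresh" }
  | { kind: "resume"; stagingDir: string }
  | { kind: "stale" };

function decideResumeSketch(
  checkpointJson: string | null, // raw checkpoint file contents, or null if absent
  dirExists: (path: string) => boolean, // injected so the logic stays testable
): ResumeDecision {
  if (checkpointJson === null) return { kind: "fresh" };
  let parsed: unknown;
  try {
    parsed = JSON.parse(checkpointJson);
  } catch {
    return { kind: "fresh" }; // Assumed: corrupt checkpoint means start over.
  }
  // "dir" is a hypothetical field name for the staging directory.
  const dir = (parsed as { dir?: unknown } | null)?.dir;
  if (typeof dir !== "string" || dir === "") return { kind: "fresh" };
  // Checkpoint plus live staging dir: resume. Checkpoint pointing at nothing: stale.
  return dirExists(dir) ? { kind: "resume", stagingDir: dir } : { kind: "stale" };
}
```

In the "stale" case the caller would warn and restage from scratch, matching the detect-then-fallback behavior the commit describes.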
|
|
a05546cddc |
test(retro): regression for #1624 stale-base pre-flight guard
13 static-invariant tests pinning the four ordered pre-check branches in
retro/SKILL.md.tmpl, Step 0.5:
A. no-remote skip: must check origin presence + set verdict
B. detached-HEAD skip: must gate behind prior verdict (ordering)
C. fetch-fail warn: must match `if !` or `||` shape, gate by verdict
D. stale-base BLOCK: must read latest-commit ISO date, cite remediation
Plus a disclosure-survives-to-narrative invariant: skip-path verdicts must
be named in prose so the retro output carries the cited reason rather than
silently misreporting.
Failing build if Step 0.5 is removed, branches re-ordered (no-remote no
longer wins), or the BLOCK message stops citing
today/latest-commit/remediation path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d709139513 |
fix(retro): stale-base + bad-today-anchor pre-flight guard (#1624)
/retro silently produced confidently-wrong output when "today" drifted (model
session-context error) or when origin/<default> was materially behind the
actual remote — git log --since returned zero or near-zero commits and the
narrative was fabricated from nothing.
Adds Step 0.5 with four ordered pre-check branches before any window analysis:
A. No 'origin' remote → skip with "base freshness not verified" note
B. Detached HEAD → skip with "base freshness not verified" note
C. `git fetch origin <default>` fails (offline) → warn, proceed against
last-known origin/<default>
D. Fetch succeeded → compare today vs latest origin/<default> commit; if
gap > window-days, BLOCK with explicit citation of latest-commit date.
Skip paths still proceed to Step 1, but the disclosure is carried into the
retro narrative ("offline run, window not freshness-verified") so the output
is never silently confidently-wrong.
Atomic .tmpl + gen:skill-docs regen commit (T-Codex-3 pattern).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
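The four ordered branches of Step 0.5 can be expressed as a short decision function. The real guard is written as instructions in a skill template, so this is only a sketch of the ordering and the staleness comparison: the input shape, the verdict labels, and the use of whole days between two `Date` values are assumptions.

```typescript
type Verdict =
  | "skip-no-remote"
  | "skip-detached-head"
  | "warn-fetch-failed"
  | "block-stale-base"
  | "ok";

interface PreflightInputs {
  hasOrigin: boolean;
  detachedHead: boolean;
  fetchOk: boolean;
  today: Date;
  latestCommit: Date; // date of the latest commit on origin/<default>
  windowDays: number;
}

const MS_PER_DAY = 86_400_000;

function retroPreflightSketch(i: PreflightInputs): Verdict {
  // Order matters: earlier branches win over later ones.
  if (!i.hasOrigin) return "skip-no-remote";
  if (i.detachedHead) return "skip-detached-head";
  if (!i.fetchOk) return "warn-fetch-failed";
  const gapDays = (i.today.getTime() - i.latestCommit.getTime()) / MS_PER_DAY;
  // A gap larger than the retro window means the base (or "today") is suspect.
  return gapDays > i.windowDays ? "block-stale-base" : "ok";
}
```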
|
|
d6b6737ba3 |
fix(gbrain-local-status): classifier falsely reports broken-db inside repos with their own DATABASE_URL
The freshClassify probe ran `gbrain sources list --json` with the inherited
process env. When the probe ran from inside a repo with its own .env (an app
DATABASE_URL on a different port), Bun autoloaded the project's .env, gbrain
connected to the wrong database, and the classifier reported broken-db on
otherwise-healthy brains.
Fix: route the probe env through `buildGbrainEnv` from lib/gbrain-exec, the
same helper the sync orchestrator uses. DATABASE_URL is seeded from
~/.gbrain/config.json so the result is cwd-independent. The 60s cache can no
longer propagate a poisoned negative to clean directories.
Contributed by @jetsetterfl via #1583.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b84ec551a4 |
fix(gbrain-sync): --full produces an empty code index on first run of a new repo
`gbrain reindex-code` only RE-EMBEDS pages that already exist; it never walks
the filesystem. On a freshly-registered source (0 pages), a --full run that
called reindex-code alone found nothing ("No code pages to reindex"), finished
in ~1s, and left the code index permanently empty while still reporting OK.
Fix: --full now runs `sync --strategy code` FIRST to create pages via the file
walk, then runs `reindex-code` to honor the documented "full walk + reindex"
contract for both fresh and populated sources.
Contributed by @jetsetterfl via #1584.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
029356e1f0 |
v1.42.2.0 fix wave: browse launch hardening (2 bug fixes + headed exit-code wiring) (#1629)
* v1.42.1.1 fix wave: browse launch hardening (2 bug fixes + headed exit-code wiring)
Bundles two browse launch-path bug fixes plus the missing exit-code wiring
that made the second fix actually work end-to-end.
PR #1617: Chromium sandbox policy at all 3 launch sites
- shouldEnableChromiumSandbox() centralizes the Win32 / CI / CONTAINER /
root heuristic that previously lived only in the headless launch path.
- launch(), launchHeaded() / launchPersistentContext(), and handoff() now
share the policy so Playwright stops auto-adding --no-sandbox on every
headed launch and the yellow "unsupported command-line flag" infobar
disappears on macOS and Linux dev.
PR #1626: clean Cmd+Q stops triggering supervisor respawn
- resolveDisconnectCause(browser) reads the underlying Chromium
ChildProcess exitCode + signalCode (with a 1s wait for an async exit
event) to distinguish clean user-quit from crash.
- handleChromiumDisconnect(browser) dispatches the headless launch()
disconnect path: clean → exit(0), crash → exit(1).
- launchHeaded() disconnect handler resolves cause inline and computes
exitCode = 0 (clean) | 2 (crash) before forwarding to onDisconnect.
- handoff() disconnect handler uses the same shared helper.
Codex-caught propagation fix (this commit, not in either source PR)
- BrowserManager.onDisconnect signature widened to accept an exitCode
argument. Without this, launchHeaded's locally-computed exit code was
dropped before reaching server.ts.
- browse/src/server.ts:688: onDisconnect callback now forwards the
resolved code: (code) => activeShutdown?.(code ?? 2). The ?? 2 preserves
legacy crash semantics for callers that invoke onDisconnect without an
explicit code.
Tests
- browse/test/browser-manager-unit.test.ts goes from 2 → 17 tests.
- 6 new tests pin shouldEnableChromiumSandbox across darwin / linux /
win32 / CI / CONTAINER / root.
- 7 new tests pin resolveDisconnectCause across already-exited,
async-exit, SIGSEGV, SIGKILL, and null-browser.
- 2 new tests (this commit) pin the onDisconnect(exitCode) propagation
contract including the exact server.ts forwarding callback shape so a
refactor that drops the forward fails CI before the user-visible respawn
bug returns.
Refs PRs #1617, #1626; companion gbrowser PR #23.
* chore: bump version v1.42.1.1 → v1.42.2.0
User-requested rebump (claims v1.42.2.0 slot on the queue).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
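The exit-code propagation described in the commit above can be illustrated in isolation. The forwarding callback `(code) => activeShutdown?.(code ?? 2)` and the headed mapping of clean to 0 and crash to 2 are quoted from the commit message; everything else here is a hypothetical stand-in, including the rule that a clean quit means exit code 0 with no signal, which the log does not spell out.

```typescript
type DisconnectCause = "clean" | "crash";

// Assumed classification: exit code 0 and no signal counts as a clean user quit.
function classifyExitSketch(exitCode: number | null, signalCode: string | null): DisconnectCause {
  return exitCode === 0 && signalCode === null ? "clean" : "crash";
}

// Headed-launch mapping from the commit: 0 for clean, 2 for crash.
function headedExitCodeSketch(cause: DisconnectCause): number {
  return cause === "clean" ? 0 : 2;
}

// Stand-in for the server's shutdown hook; records the codes it receives.
const shutdownCalls: number[] = [];
let activeShutdown: ((code: number) => void) | undefined = (code) => {
  shutdownCalls.push(code);
};

// Forwarding shape from the commit: a missing code falls back to the legacy crash code 2.
const onDisconnect = (code?: number) => activeShutdown?.(code ?? 2);
```

The `?? 2` fallback matters because `0` is a valid code: a `||` fallback would turn every clean exit back into a crash.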
|
|
b03cd1ae2d |
v1.42.1.0 feat: gate terminal-agent teardown on ServerConfig.ownsTerminalAgent (unblocks gbrowser embedder) (#1615)
* feat: gate terminal-agent teardown on ServerConfig.ownsTerminalAgent
Adds ownsTerminalAgent?: boolean to ServerConfig (default true). Wraps the
three shutdown side effects (pkill -f terminal-agent\.ts + 2 safeUnlinkQuiet
calls for terminal-port and terminal-internal-token) inside a single
if (ownsTerminalAgent) block. Embedders (gbrowser phoenix overlay) pass
false to keep their own PTY lifecycle intact across gstack's teardown.
CLI start() call site passes ownsTerminalAgent: true explicitly; static-grep
test in the new test file catches a refactor that drops it.
Strict opt-out: only explicit false flips the gate
(cfg.ownsTerminalAgent === false ? false : true). Defends against JS callers
passing truthy non-bool values.
Adds __resetShuttingDown test-only export mirroring __resetRegistry. The
module-scoped isShuttingDown latch otherwise silently no-ops a second
shutdown() in the same process.
Drops dead try/catch wrappers around safeUnlinkQuiet inside the new gate;
safeUnlinkQuiet already swallows all errors internally.
New test file (4 cases) stubs both process.exit AND child_process.spawnSync
so a real pkill -f terminal-agent\.ts never fires on the developer machine.
beforeAll/afterAll save and restore real-daemon file contents in the state
dir so the test cannot clobber a running gstack session.
* chore: file followup TODOs (identity-based pkill, cfg.config composition gap, ownership-object trigger)
Three P3 followups surfaced by /autoplan + /plan-eng-review while reviewing
the ownsTerminalAgent gate:
- Identity-based terminal-agent kill: pkill -f terminal-agent\.ts is a
latent CLI footgun (regex match kills sibling gstack sessions, editor
processes, etc.). Replace with PID-tracked process.kill at both
cli.ts:1047 and server.ts:1281.
- shutdown() reads module-level config, not cfg.config (pre-existing
composition gap). Same gap applies to
cleanSingletonLocks(resolveChromiumProfile()) at server.ts:1298 (should be
cfg.chromiumProfile). Both are followup work for the embedder-composition
story.
- 4th caller-owned teardown gate trigger: today ServerConfig has 3 (xvfb?,
proxyBridge?, ownsTerminalAgent). If a 4th appears, collapse to
cfg.callerOwns?: Set<...> ownership object.
* chore: bump version and changelog (v1.42.1.0)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: note ServerConfig.ownsTerminalAgent in CLAUDE.md sidebar block
Adds a one-paragraph reference for the v1.42.1.0 embedder teardown gate
right after the Sidebar architecture block. Covers default semantics, when
embedders must pass `false`, polarity inversion vs xvfb?/proxyBridge?, and
the static-grep CI test that pins the CLI call site.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
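The strict opt-out rule quoted in the commit above, `cfg.ownsTerminalAgent === false ? false : true`, behaves as follows. The expression itself is from the commit message; the `ServerConfigSketch` type, the helper names, and the list-returning teardown function are hypothetical simplifications that only name the three side effects instead of performing them.

```typescript
interface ServerConfigSketch {
  ownsTerminalAgent?: boolean;
}

// Only an explicit boolean false disables teardown; anything else keeps the default.
function resolveOwnsTerminalAgent(cfg: ServerConfigSketch): boolean {
  return cfg.ownsTerminalAgent === false ? false : true;
}

// Names of the teardown steps that would run for a given config.
function terminalAgentTeardownSteps(cfg: ServerConfigSketch): string[] {
  if (!resolveOwnsTerminalAgent(cfg)) return []; // Embedder owns the PTY lifecycle.
  return ["kill terminal-agent", "unlink terminal-port", "unlink terminal-internal-token"];
}
```

Comparing against `false` with `===` means an untyped JavaScript caller passing `0`, `""`, or `"no"` still gets the default teardown, which is the defensive behavior the commit calls out.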
|
|
7ca04d8ef0 |
v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594)
* fix(gstack-paths): guard CLAUDE_PLUGIN_DATA against cross-plugin contamination (#1569) gstack-paths previously trusted CLAUDE_PLUGIN_DATA as a fallback for GSTACK_STATE_ROOT whenever GSTACK_HOME was unset. When another plugin (e.g. Codex) persists its own CLAUDE_PLUGIN_DATA into the session env via CLAUDE_ENV_FILE, gstack picked it up and wrote checkpoints, analytics, and learnings into that plugin's directory. Anyone with the Codex plugin installed alongside gstack hit this silently. Fix: guard the CLAUDE_PLUGIN_DATA branch so it only fires when CLAUDE_PLUGIN_ROOT confirms we're running as the gstack plugin (path contains "gstack"). Skill installs fall through to \$HOME/.gstack. Contributed by @ElliotDrel via #1570. Closes #1569. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gbrain-sync): sourceLocalPath handles wrapped {sources:[...]} shape from gbrain v0.20+ gbrain v0.20+ changed `gbrain sources list --json` to return {sources: [...]} instead of a flat array. sourceLocalPath crashed upstream with `list.find is not a function` on every /sync-gbrain invocation against modern gbrain. Accept both shapes for forward/backward compat, matching probeSource/sourcePageCount in lib/gbrain-sources.ts. Contributed by @jakehann11 via #1571. Closes #1567. Supersedes #1564 (@tonyjzhou, same fix, different shape — credit retained). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(brain-context-load): probe gbrain via execFile, not shell builtin (#1559) gbrainAvailable() used `execFileSync("command", ["-v", "gbrain"])`, which fails in any environment where the `command` builtin isn't on the spawned process's PATH (most non-interactive shells). The probe then reported gbrain as missing even when it was installed, and context-load silently skipped vector/list queries. Fix: probe `gbrain --version` directly with a 500ms timeout (matching the rest of the file's MCP_TIMEOUT_MS). Same semantics, works everywhere execFile works. 
Contributed by @jbetala7 via #1560. Closes #1559. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(gbrain-doctor): pin schema_version:2 doctor parse path (#1418) Adds an exec-path regression test that runs a fake gbrain shim emitting the v0.25+ doctor JSON shape (schema_version: 2, status: "warnings", exit 1 for health_score < 100, no top-level `engine` field). Confirms freshDetectEngineTier recovers stdout from the non-zero exit and falls back to GBRAIN_HOME/config.json for the engine label. The pre-existing test for #1415 only stripped gbrain from PATH; this test exercises the actual doctor parse path, closing the gap that codex's plan review flagged. Also documents the schema_version separation in lib/gbrain-local-status.ts: the local CacheEntry stays at version 1, distinct from the doctor-output schema_version which we accept across versions in gstack-memory-helpers. Closes #1418 (credit @mvanhorn for surfacing the doctor + schema_v2 collapse). The fix landed pre-emptively in v1.29.x; this commit pins it with a stronger test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(memory-ingest): pin put_page regression + scrub stale name from --help and comments (#1346) #1346 reported that gstack-memory-ingest still called the renamed gbrain put_page subcommand on gbrain v0.18+. The actual code migrated to `gbrain put` and later to batch `gbrain import <dir>` before this report landed — only documentation lag remained. 
This commit: - Updates the --help string ("Skip gbrain put calls (still updates state file)") so user-facing docs match the shipped subcommand - Updates two inline comments that still referenced the old name - Adds test/memory-ingest-no-put_page.test.ts: a regression pin that strips comments from bin/gstack-memory-ingest.ts and fails the build if "put_page" appears in any active code or string literal, plus a sanity check that the file still calls a supported gbrain page-write verb (put or import) Closes #1346. Reporter @kylma-code surfaced the doc lag; the original code migration credit is on the v1.27.x wave. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(resolvers): rewrite all gbrain put_page instructions to canonical put <slug> scripts/resolvers/gbrain.ts emitted user-facing copy-paste instructions using the renamed `gbrain put_page` subcommand across 10 skills (office-hours, investigate, plan-ceo-review, retro, plan-eng-review, ship, cso, design-consultation, fallback, entity-stub). Every gstack user copying those snippets hit "unknown command: put_page" on gbrain v0.18+. This commit: - Rewrites all 10 instruction templates to use `gbrain put <slug> --content "$(cat <<EOF...EOF)"` with title/tags moved into YAML frontmatter inside --content, matching the v0.18+ subcommand shape - Updates README.md and USING_GBRAIN_WITH_GSTACK.md "common commands" table to reference `gbrain put` and `gbrain get` - Adds test/resolvers-gbrain-put-rewrite.test.ts pinning two invariants: (a) resolver source ships only canonical instructions, (b) every tracked SKILL.md file is free of `gbrain put_page` CHANGELOG entries are deliberately left untouched (historical record). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build): extract package.json build to scripts/build.sh for Windows Bun compat (#1538, #1537, #1530, #1457, #1561) Bun's Windows shell parser rejects multiple constructs the inline package.json build chain used: brace groups `{ cmd; }`, subshells with redirection `( git ... ) > path/.version`, and (in Bun 1.3.x) subshells near redirections in general. Every Windows install + every auto-upgrade since v1.34.2.0 has failed on `bun run build`. Extracts the build chain to scripts/build.sh and the .version writes to scripts/write-version-files.sh. POSIX-portable, no Bun shell parsing involved. Also adds Windows-specific bun.exe handling for non-ASCII PATHs (a separate Windows footgun where Bun's --compile fails when the binary lives under a path with non-ASCII chars). Updates test/build-script-shell-compat.test.ts to assert the new shape: no subshells with redirections anywhere in the build chain, and build delegates to scripts/build.sh which delegates .version writes. Contributed by @Charlie-El via #1544. Supersedes #1531 (@scarson, fixed in build helper), #1480 (@mikepsinn, partial overlap), #1460 (@realcarsonterry, brace-group fix subsumed) — credit retained. Closes #1538, #1537, #1530, #1457, #1561. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(windows): .exe glob in .gitignore + .exe extension resolution in find-browse (#1554) bun build --compile on Windows appends .exe to the output filename, producing browse.exe instead of browse. find-browse's existsSync probe only checked the bare path and returned null on Windows even when the binary was correctly built. .gitignore similarly only excluded the bare bin/gstack-global-discover path, leaving the .exe variant tracked. 
This commit: - .gitignore: changes `bin/gstack-global-discover` → `bin/gstack-global-discover*` so the Windows .exe variant is ignored - browse/src/find-browse.ts: adds isExecutable + findExecutable helpers that fall back to .exe/.cmd/.bat probing on Windows, mirroring the same helper already in make-pdf/src/browseClient.ts and pdftotext.ts Contributed by @Mike-E-Log via #1554. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(windows): add fresh-install E2E gate that runs bun run build on windows-latest Adds .github/workflows/windows-setup-e2e.yml as the gate that catches Bun shell-parser regressions in the build chain before they reach users. Triggers on PRs touching package.json, scripts/build.sh, scripts/write-version-files.sh, setup, browse cli/find-browse, or gstack-paths. What it verifies: 1. bun run build completes on Windows (the previously-broken path that #1538/#1537/#1530/#1457/#1561 reported) 2. All compiled binaries land on disk (browse.exe, find-browse.exe, design.exe, gstack-global-discover.exe) 3. find-browse resolves to the .exe variant on Windows (regression gate for #1554) 4. gstack-paths returns non-empty GSTACK_STATE_ROOT/PLAN_ROOT/TMP_ROOT on Windows (regression gate for #1570) Complements the existing windows-free-tests.yml (curated unit subset); this new workflow exercises the install path itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(codex): move diff scope into prompt instead of --base (Codex CLI 0.130+ argv conflict) (#1209) Codex CLI ≥ 0.130.0 rejects passing a custom prompt and --base together (mutually exclusive at argv level). Every /codex review, /review, and /ship structured Codex review call ended with an argv error before the model ran. Fix: scope the diff in prompt text using "Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD" instead of `--base <base>`. 
Preserves the filesystem boundary instruction across all invocations and keeps Codex's review prompt tuning. Touches: - codex/SKILL.md.tmpl + regenerated codex/SKILL.md - scripts/resolvers/review.ts + regenerated review/SKILL.md, ship/SKILL.md - test/gen-skill-docs.test.ts: new regression that fails if any of the five known files still contain the prompt+--base shape - test/skill-validation.test.ts: corresponding negative + positive pin on the rendered SKILL.md files Contributed by @jbetala7 via #1209. Closes #1479. Supersedes #1527 (@mvanhorn — same intent, different patch shape, CONFLICTING) and #1449 (@Gujiassh — broader refactor, CONFLICTING). Credit retained in CHANGELOG. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): diff from git merge-base, not git diff origin/<base> (#1492) git diff origin/<base> shows everything since the common ancestor in both directions — it includes commits that landed on origin/<base> after this branch was created as deletions. That made /review and /ship's pre-landing structured review report inflated diff totals and flagged "removed" code that was actually still present in the working tree. Fix: compute DIFF_BASE via git merge-base origin/<base> HEAD and diff the working tree against that point. Same coverage of uncommitted edits, no phantom deletions from out-of-order base advancement. Applies to /review's Step 1 (diff existence check), Step 3 (get the diff), the build-on-intent scope-creep check, the structured review DIFF_INS/DIFF_DEL stats, and the Claude adversarial subagent prompt. Same change flows into ship/SKILL.md via the shared resolver. Touches: - review/SKILL.md.tmpl + regenerated review/SKILL.md, ship/SKILL.md - scripts/resolvers/review.ts - scripts/resolvers/review-army.ts Contributed by @mvanhorn via #1492. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(codex): pin filesystem-boundary preservation across all codex review surfaces (#1503, #1522) #1503 reported that the bare codex review --base path stripped the filesystem boundary instruction, letting Codex spend tokens reading .claude/skills/ and agents/. #1522 proposed adding a skill-path detector that switched to the custom-instructions route when the diff touched skill files. After C10 (#1209) restructured codex review to always carry the boundary in the prompt (the prompt+--base argv conflict forced the restructure), the skill-path detector becomes redundant — every default call already preserves the boundary. This commit pins the post-#1209 invariant with a test that fails the build if any future refactor strips the boundary from codex/SKILL.md, review/SKILL.md, or ship/SKILL.md. Closes #1503 by regression test. #1522 (@genisis0x) is superseded by #1209 (the prompt rewrite covers its safety concern); credit retained in CHANGELOG. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(skills): use command -v instead of which for codex detection (#1197) `which` is not on PATH in every shell — some Windows shells, BusyBox- only containers, and minimal CI images all fail when skills probe codex availability via `which codex`. `command -v` is a POSIX builtin and always available where the skill is running. 
Touched: - codex/SKILL.md.tmpl: CODEX_BIN=$(command -v codex || echo "") - scripts/resolvers/review.ts and scripts/resolvers/design.ts: 3 + 3 sites each rewritten to `command -v codex >/dev/null 2>&1` - Regenerated all 10 affected SKILL.md files (codex, review, ship, design-consultation, design-review, office-hours, plan-ceo-review, plan-design-review, plan-devex-review, plan-eng-review) - test/skill-validation.test.ts: updated pin + defensive regression test that fails if `which codex` returns to codex/SKILL.md - test/skill-e2e-plan.test.ts: updated summary regex Contributed by @mvanhorn via #1197. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(codex): surface non-zero exits so wrappers stop reading as silent stalls (#1467, #1327) When codex exits non-zero (parse errors, arg-shape breaks, model API errors that propagate as non-zero status), the calling agent previously saw an empty output and burned 30-60 minutes misdiagnosing as a silent model/API stall. The hang-detection block only caught exit 124 (the timeout-wrapper signal). Adds elif blocks in all four codex invocation sites (Review default, Challenge, Consult new-session, Consult resume) that: - Echo "[codex exit N] <stderr first line>" to stdout - Indent the first 20 stderr lines for inline context - Log codex_nonzero_exit telemetry tagged with the call site Contributed by @genisis0x via #1467. Closes #1327. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(design): disclose OpenAI key source + warn on cwd .env match (#1278, closes #1248) The design binary previously called process.env.OPENAI_API_KEY without checking where the key came from. If a user ran $D inside someone else's project that had OPENAI_API_KEY in its .env, the resulting generation billed that project's account. Silent and irreversible. Fix: resolveApiKeyInfo() returns both the key and its source. 
When the env-var path matches an OPENAI_API_KEY entry in the current directory's .env, .env.<NODE_ENV>, or .env.local file, we set a warning. requireApiKey() prints "Using OpenAI key from <source>" plus the warning before the run — never the key itself.
Adds 6 unit tests covering: config-vs-env precedence, env-only (no match), env+cwd .env match, quoted/exported values, value-mismatch (no false positive), and the no-leak invariant for requireApiKey stderr output.
Contributed by @jbetala7 via #1278. Closes #1248.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(browse): guard full-page screenshots against Anthropic vision API >2000px brick (#1214)
Full-page screenshots of tall pages routinely exceeded 2000px on the longest dimension, silently bricking the agent's session: the resulting base64 reached the Anthropic vision API which rejected the oversized image, leaving the agent burning turns on a useless blob with no stderr trace from the browse side.
Adds browse/src/screenshot-size-guard.ts as a shared helper:
- guardScreenshotBuffer(buf) → downscales in-memory if max(w,h) > 2000
- guardScreenshotPath(path) → file-mode variant that rewrites in place
- Aspect ratio preserved via sharp's resize fit:inside
- Stderr diagnostic on any downscale so callers can see when it fired
- Lazy sharp import so non-screenshot paths pay no startup cost
Wires the guard into all three full-page callsites codex review flagged:
- browse/src/snapshot.ts: annotated + heatmap fullPage captures
- browse/src/meta-commands.ts: screenshot command (path + base64 fullPage modes) plus the responsive 3-viewport sweep
- browse/src/write-commands.ts: prettyscreenshot fullPage path
Covers seven unit cases (pass-through, downscale, aspect ratio, exactly-2000px edge, file-mode rewrite) plus a static invariant test that fails the build if any of the three callsites stops importing the guard.
Closes #1214.
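The size rule behind the screenshot guard can be sketched as a pure function. This is only an illustration under stated assumptions: `fitWithinMaxEdge` and `MAX_EDGE_PX` are hypothetical names, and the real helper delegates the resize to sharp, whose rounding may differ from `Math.round`.

```typescript
// Hypothetical sketch of the size rule only; the actual guard resizes with sharp.
const MAX_EDGE_PX = 2000;

interface FittedSize {
  width: number;
  height: number;
  downscaled: boolean;
}

// Returns the original size when the longest edge is within the limit,
// otherwise scales both edges by the same factor so the aspect ratio is kept.
function fitWithinMaxEdge(width: number, height: number, maxEdge: number = MAX_EDGE_PX): FittedSize {
  const longest = Math.max(width, height);
  if (longest <= maxEdge) {
    return { width, height, downscaled: false };
  }
  const scale = maxEdge / longest;
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
    downscaled: true,
  };
}
```

An image whose longest edge is exactly 2000px passes through untouched, which matches the "exactly-2000px edge" case listed in the commit.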
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(security): add Node sidecar entry for L4 prompt-injection classifier (#1370)
The L4 TestSavant classifier in browse/src/security-classifier.ts can't be imported into the compiled browse server (onnxruntime-node dlopen fails from Bun's compile extract dir per CLAUDE.md). The agent that used to host it (sidebar-agent.ts) was removed when the PTY proved out — leaving the classifier file shipped but with zero callers. Exactly the gap codex flagged in #1370.
Adds browse/src/security-sidecar-entry.ts: a Node script that runs the classifier as a subprocess of the browse server. It reads NDJSON requests from stdin and writes id-correlated NDJSON responses to stdout, supporting:
- op: "scan-page-content" — full L4 classifier scan
- op: "ping" — liveness probe for the client's health check
- op: "status" — classifier readiness (used by /pty-inject-scan to surface l4 { available: bool } in its response)
Plus browse/src/find-security-sidecar.ts: a resolver that locates node + the bundled JS entry (browse/dist/security-sidecar.js, built in a follow-up package.json change) or falls back to the dev TS entry. Returns null cleanly when node isn't on PATH so the calling endpoint can degrade per D7 (extension WARN + user confirm).
C17 of the security-stack wave. C18 adds the IPC client + lifecycle management; C19 wires the endpoint; C20 routes the extension through it.
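The id-correlated NDJSON exchange described above can be sketched as a function that maps one request line to one response line. Only `id`, `op`, and the three op names come from the commit message; the `text`, `ok`, `result`, and `error` fields, the `handleRequestLine` name, and the injected `classify` callback are assumptions made for illustration.

```typescript
// Hypothetical response shape; the real sidecar's field names may differ.
interface SidecarResponse {
  id: string | null;
  ok: boolean;
  result?: unknown;
  error?: string;
}

// Turns one NDJSON request line into one NDJSON response line that echoes
// the request id, so a client can match responses to pending requests.
function handleRequestLine(
  line: string,
  classify: (text: string) => unknown, // stand-in for the real classifier
  ready: boolean,
): string {
  let response: SidecarResponse;
  try {
    const req = JSON.parse(line) as { id?: unknown; op?: unknown; text?: unknown };
    const id = typeof req.id === "string" ? req.id : null;
    if (req.op === "ping") {
      response = { id, ok: true, result: "pong" };
    } else if (req.op === "status") {
      response = { id, ok: true, result: { available: ready } };
    } else if (req.op === "scan-page-content" && typeof req.text === "string") {
      response = { id, ok: true, result: classify(req.text) };
    } else {
      response = { id, ok: false, error: "unsupported request" };
    }
  } catch (err) {
    // Malformed input still yields a well-formed response line.
    response = { id: null, ok: false, error: err instanceof Error ? err.message : String(err) };
  }
  return JSON.stringify(response) + "\n";
}
```

Keeping the handler free of I/O means the stdin/stdout plumbing can be tested separately from the protocol logic.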
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(security): sidecar IPC client with lifecycle + circuit breaker (#1370)
Adds browse/src/security-sidecar-client.ts to manage the Node L4 classifier subprocess from the compiled browse server:
- Lazy spawn on first scan; reuses the same process across requests
- Id-correlated request/response via NDJSON over stdio
- 5s default per-scan timeout; 64KB payload cap (short-circuits before spawn so oversized requests don't waste a process)
- 3-in-10-minutes respawn cap → trips circuit breaker; subsequent scans throw immediately so the /pty-inject-scan endpoint can surface l4 { available: false } to the extension and degrade to WARN+confirm
- process.on('exit') sends SIGTERM to the child for clean teardown
- isSidecarAvailable() lets the endpoint probe before scan calls so the response shape reflects degraded mode honestly
Unit tests cover the payload cap, the availability probe, and the breaker-doesn't-crash invariant under repeated rejected calls.
C18 of the security-stack wave. C19 adds POST /pty-inject-scan; C20 routes the extension through it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(security): add POST /pty-inject-scan endpoint for pre-PTY-inject scans (#1370)
The sidebar's gstackInjectToTerminal callers (toolbar Cleanup, Inspector "Send to Code") were piping page-derived text directly into the live claude PTY with ZERO classifier processing — the gap codex flagged in #1370. The documented sidebar security stack had a hole the size of every Cleanup-button click.
Adds POST /pty-inject-scan to browse/src/server.ts:
- Local-only binding (NOT in TUNNEL_PATHS — tunnel attempts get the general 404 path; never reaches the scan logic)
- Root-token auth via existing validateAuth() — 401 on unauth
- 64KB request cap → 413 + payload-too-large body
- 5s scan timeout via sidecar client
- URL-blocklist forced to BLOCK in PTY context (page-derived REPL input is higher-risk than ordinary tool output)
- L4 ML classifier via the sidecar when available; degrades to WARN per D7 when sidecar is unavailable
- Response goes through JSON.stringify(..., sanitizeReplacer) per v1.38.0.0 Unicode-egress hardening
- Imports only from security-sidecar-client.ts, never directly from security-classifier.ts (which would brick the compiled Bun binary)
Seven static-invariant tests pin the POST verb, auth gate, 64KB cap, tunnel-listener exclusion, sanitizeReplacer wrapping, l4 availability shape, and the no-direct-classifier-import rule.
C19 of the security-stack wave. C20 routes the extension through it; C21 adds the invariant AST check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(extension): route gstackInjectToTerminal through /pty-inject-scan (#1370)
Closes the documented-vs-shipped gap codex flagged in #1370. The sidebar's two PTY-injection call sites (Inspector "Send to Code" and toolbar Cleanup) now pre-scan via the new /pty-inject-scan endpoint before writing to the live claude REPL.
Adds window.gstackScanForPTYInject(text, origin) to extension/sidepanel-terminal.js:
- Async, returns { allow, verdict, reasons, l4 }
- POST to /pty-inject-scan with the existing root-token auth
- WARN+confirm on scan failure (network down, sidecar absent, etc.) rather than silent PASS — D7 honest-degradation
gstackInjectToTerminal stays synchronous, returns boolean.
Per D6: keeping the inject sync means existing `const ok = ...?.()` callers don't break, and the invariant test in test/extension-pty-inject-invariant.test.ts can statically pin that every call goes through the scan first.
extension/sidepanel.js call sites updated:
- inspectorSendBtn click → await scan, BLOCK drops + WARN prompts via window.confirm, PASS injects silently
- runCleanup() → same flow. Static cleanup prompt always PASSes but still routes through scan to honor the invariant.
C20 of the security-stack wave. C21 adds the static invariant test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(security): invariant — extension PTY inject must be scan-gated (#1370)
Static-analysis invariant test that fails the build if any extension/*.js path calls window.gstackInjectToTerminal without a preceding window.gstackScanForPTYInject in the same enclosing function. Closes the documented-vs-shipped gap codex demanded a machine check on.
Rules:
- Rule 1: any file that calls inject must also reference scan
- Rule 2: in the enclosing function (function declaration, arrow, async (), event handler), a scan call must appear before the inject call by source position
- Exemption: sidepanel-terminal.js (the file that DEFINES the inject function) is exempt from Rule 2 since the definition is not a call
Plus two structural checks:
- sidepanel-terminal.js defines both the inject and scan functions
- inject stays SYNCHRONOUS (no `async` modifier) per D6 — async would silently break the `const ok = ...?.()` pattern at every caller
C21 of the security-stack wave. The sidecar architecture (#1370) is complete: server-side L1-L3 + L4-via-sidecar (C17+C18+C19), extension pre-scan wiring (C20), and now the regression gate (C21).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(browse): opt-in extended stealth mode with 6 detection-vector patches (#1112)
Rebases @garrytan's PR #1112 (Apr 2026, abandoned) onto the current browse/src/stealth.ts contract. The existing minimal "codex narrowed" stealth (webdriver-mask + AutomationControlled launch arg) stays the default. PR #1112's six additional patches are added behind an opt-in GSTACK_STEALTH=extended env flag.
Extended-mode patches (applied AFTER the default mask, in order):
1. delete navigator.webdriver from prototype (not just the getter — detectors check `"webdriver" in navigator`)
2. WebGL renderer spoof to Apple M1 Pro (SwiftShader was the #1 software-GPU tell in containers)
3. navigator.plugins returns a PluginArray-prototype-passing array with MimeType objects and namedItem()
4. window.chrome populated with chrome.app, chrome.runtime, chrome.loadTimes(), chrome.csi() with realistic shapes
5. navigator.mediaDevices backfilled when headless drops it
6. CDP cdc_*-prefixed window globals cleared
Why opt-in: the default mode's contract is fingerprint CONSISTENCY, which protects against detectors that flag spoofing mismatch. Extended mode actively lies about the environment; sites that reflect on these properties can break. Users who hit detection in default mode can flip GSTACK_STEALTH=extended for SannySoft 100% pass-rate.
Twenty unit tests pin the env-flag semantics, all six patches' code presence, and the applyStealth wiring order. Live SannySoft pass-rate verification stays in the periodic-tier E2E suite.
Contributed by @garrytan via #1112 (rebased — original PR opened before the codex-narrowed minimum landed; rebase preserves the narrowed default while adding the SannySoft-passing path as opt-in).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(fixtures): regenerate ship-SKILL.md golden baselines after C10-C13 + C16 templates
Updates the three ship-SKILL.md golden baselines (claude, codex, factory hosts) to match the new shape produced by:
- C10 #1209 codex argv (prompt + diff scope, no --base)
- C11 #1492 merge-base diff (DIFF_BASE= preamble)
- C13 #1197 command -v for codex detection
- C12 + boundary preservation per regen-enforcing test
Per CLAUDE.md SKILL.md workflow: edit the .tmpl, run gen:skill-docs, commit the regenerated outputs together. Goldens are part of the regen contract — without this commit, test/host-config.test.ts' golden-baseline checks fail with the diff codex review surfaced.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(release): v1.41.0.0 — Daegu wave (24 bisect commits, 14 user-facing fixes)
Bumps VERSION 1.40.0.0 → 1.41.0.0. CHANGELOG entry follows the release-summary format in CLAUDE.md: two-line headline, lead paragraph, "The numbers that matter" table, "What this means for builders" closer, then itemized Added/Changed/Fixed/For contributors with inline credit to every PR author and original issue reporter.
Scale-aware bump per CLAUDE.md: 24 commits, ~6000 LOC net, substantial new capability across security (PTY sidecar wiring), install (Windows build chain), compat (gbrain 0.18-0.35, Codex CLI 0.130+), and quality (screenshot guard, design key disclosure, extended stealth opt-in). MINOR is the right call.
Closes for users: #1567, #1559, #1569, #1346, #1418, #1538, #1537, #1530, #1457, #1561, #1554, #1479, #1503, #1248, #1214, #1370, #1327, #1193 pattern, #1152 pattern. Credit retained inline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(find-browse): resolve source-checkout layout <git-root>/browse/dist/browse[.exe]
windows-setup-e2e.yml runs `bun browse/src/find-browse.ts` against a freshly-built repo where binaries land at browse/dist/browse.exe (no .claude/skills/gstack/ install layout). The previous markers chain only matched .codex/.agents/.claude prefixed paths, so find-browse exited "not found" even when the binary was present.
Adds a source-checkout fallback after the marker scan: if no installed layout resolves but <git-root>/browse/dist/browse[.exe] exists, return that. Three real callers hit this path:
- gstack repo dev workflow before `./setup` runs
- windows-setup-e2e.yml CI (the breakage that surfaced this)
- make-pdf consumers running from a sibling source checkout
Smoke-verified: a fresh git repo with browse/dist/browse on disk now resolves through the source-checkout branch (was returning null before this commit).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(release): bump v1.41.0.0 → v1.42.0.0 to clear queue collision with #1574
The version-gate workflow flagged a collision: PR #1574 (garrytan/colombo-v3) already claims v1.41.0.0, and #1592 (fix/audit-critical-high-bugs) claims v1.41.1.0. Per CLAUDE.md's workspace-aware ship rule, queue-advancing past a claimed version within the same bump level is permitted — MINOR work landing on top of a queued MINOR still reads as MINOR relative to main. Util's suggested next slot is v1.42.0.0; taking it.
CHANGELOG entry header bumped + dated 2026-05-19; entry body unchanged (same wave content, same credit list).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
40d00bd2ce |
v1.41.1.0 fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up) (#1592)
* fix(build-app): escape sed replacement metachars in Chromium rebrand
build-app.sh injects $APP_NAME directly into the replacement half of
sed's s/// when patching Chromium's localized InfoPlist.strings. If
$APP_NAME ever carries '/', '&', or '\' — the command either breaks
or starts interpreting input as sed syntax. The trailing '|| true'
would then silently hide the failure and ship a DMG that still says
'Google Chrome for Testing' in the menu bar.
Escape replacement metachars before substitution. No change for the
default name 'GStack Browser'.
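The escaping rule can be sketched in TypeScript (build-app.sh itself does this in shell). `escapeSedReplacement` is a hypothetical name, and the sketch assumes `/` is the `s///` delimiter and that the name contains no newline.

```typescript
// Hypothetical helper, not the code in build-app.sh: escape the characters
// that are special in the replacement half of sed's s/// when '/' is the
// delimiter. Backslash, '&' (the whole match) and '/' each get a leading '\'.
function escapeSedReplacement(value: string): string {
  // In a JavaScript replacement string, "$&" inserts the matched character,
  // so "\\$&" yields a literal backslash followed by that character.
  return value.replace(/[\\&\/]/g, "\\$&");
}

console.log(escapeSedReplacement("Foo/Bar&Baz"));
console.log(escapeSedReplacement("GStack Browser"));
```

A name without metacharacters, such as the default, comes back unchanged.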
* fix(build-app): bail out if 'mktemp -d' fails instead of cp-ing into '/'
The DMG creation step sets DMG_TMP from 'mktemp -d' with no error check.
If mktemp fails (tmpfs full, permissions, TMPDIR misconfigured), DMG_TMP
is empty and the very next line — 'cp -a "$APP_DIR" "$DMG_TMP/"' —
expands to 'cp -a "<app>" "/"', which copies the bundle into the root of
the filesystem.
Refuse to continue unless mktemp produced a real directory. Defensive
second check catches the (rare) case where mktemp succeeds but returns
something that isn't a directory we can cp into.
* fix(telemetry-sync): drop predictable $$ tmp-file fallback
gstack-telemetry-sync tried 'mktemp /tmp/gstack-sync-XXXXXX' and on
failure fell back to '/tmp/gstack-sync-$$'. $$ is the PID — predictable
and reusable, so on shared hosts another user can pre-create or symlink
the path and either steal the response body or clobber an unrelated
file when curl writes through it.
Drop the fallback. If mktemp cannot produce a unique file we just skip
this sync cycle — the events stay on disk and the next run picks them
up. Also install an EXIT trap so the response file is cleaned up on
unexpected exit, not just on the happy path.
* fix(verify-rls): drop predictable $$-based tmp file fallback
Same shape as gstack-telemetry-sync: on mktemp failure the script fell
back to '/tmp/verify-rls-$$-$TOTAL', which is fully predictable from the
PID and a per-check counter. On a shared box another user can pre-create
or symlink the path and either capture the HTTP response body (which may
leak what the RLS tests revealed) or corrupt an unrelated file that curl
writes through.
Make mktemp strict. On failure return from the check function; the caller
tallies a FAIL and the run moves on.
* fix(security-classifier): close writer + delete tmp on download error
downloadFile() opens an fs.WriteStream to '<dest>.tmp.<pid>' and drives
it from a fetch body reader, but if reader.read() or writer.write()
throws mid-download the writer is never closed. That leaks an FD per
failed attempt and leaves the half-written tmp on disk. A later retry
can land in renameSync(tmp, dest) with a truncated TestSavantAI /
DeBERTa ONNX file — which then loads but produces garbage classifier
verdicts until the user manually nukes the models cache.
Wrap the download loop in try/catch. On failure, destroy() the writer
and unlink the tmp before rethrowing, so the next attempt starts from a
clean slate.
* fix(meta-commands): guard JSON.parse in pdf --from-file parser
parsePdfFromFile() runs JSON.parse on user-supplied file contents with
no try/catch. A malformed payload surfaces as an uncaught SyntaxError
from the 'pdf' command handler and the user sees an opaque stack trace
instead of "this file isn't valid JSON". Worse, the same call path is
used by make-pdf when header/footer HTML would overflow Windows'
CreateProcess argv cap, so a corrupt payload file there can take down
the make-pdf run.
Wrap JSON.parse. Re-throw with a message that names the offending file
and echoes the parser's own explanation. Also reject top-level non-
objects (null, array, primitive) since the rest of the function treats
json as an object — catching that here produces a clear error instead
of a TypeError further down.
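A minimal sketch of the two-stage guard, assuming a hypothetical helper named `parseJsonObject`; the real parsePdfFromFile and its error wording may differ.

```typescript
// Hypothetical sketch of the guard; not the repository's parsePdfFromFile.
function parseJsonObject(text: string, sourceName: string): Record<string, unknown> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    // Name the offending file and pass along the parser's own explanation.
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`${sourceName} is not valid JSON: ${reason}`);
  }
  // JSON.parse also accepts null, arrays and primitives, so the shape check
  // has to be separate from the try/catch above.
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error(`${sourceName} must contain a top-level JSON object`);
  }
  return parsed as Record<string, unknown>;
}
```

Separating the syntax check from the shape check keeps the two failure messages distinct, so a caller can tell a corrupt file from a well-formed file of the wrong shape.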
* fix(global-discover): stop dropping sessions when header >8KB
extractCwdFromJsonl() reads the first 8KB of each JSONL session file and
runs JSON.parse on every newline-split line. When a session record
happens to straddle the 8KB cap, the last line ends in a truncated JSON
fragment, JSON.parse throws, the catch block 'continue's silently, and
if that was the only line carrying 'cwd' the whole project gets dropped
from the discovery output without a warning.
Two independent hardening steps:
1. Raise the read cap to 64KB. Session headers observed in Claude
Code / Codex / Gemini transcripts fit comfortably; this just moves
the cliff out of the normal range.
2. Drop the final segment after splitting on '\n'. If the read hit
the cap mid-line, that segment is guaranteed incomplete; if the
file ended inside the buffer, the split produces an empty final
segment and dropping it is a no-op.
Together these make the parser robust regardless of how verbose the
leading records are.
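The two hardening steps can be sketched together on an in-memory string. This is an illustration, not the repository's extractCwdFromJsonl: the name `extractCwdFromHead` is hypothetical, the cap is applied to characters rather than bytes for simplicity, and it assumes every complete record ends with a newline.

```typescript
// Hypothetical sketch; operates on file contents already read as text.
function extractCwdFromHead(contents: string, cap: number = 64 * 1024): string | null {
  // Step 1: only look at the head of the file, up to the (raised) cap.
  const head = contents.slice(0, cap);
  const segments = head.split("\n");
  // Step 2: drop the final segment. It is either a line truncated by the cap
  // or the empty string that follows a trailing newline.
  segments.pop();
  for (const line of segments) {
    if (line.trim() === "") continue;
    try {
      const record = JSON.parse(line);
      if (record && typeof record.cwd === "string") return record.cwd;
    } catch {
      // Skip malformed lines and keep scanning.
      continue;
    }
  }
  return null;
}
```

A single record longer than the cap still yields `null`, but it no longer does so by way of a swallowed parse error on a truncated fragment.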
* test: export downloadFile, parsePdfFromFile, extractCwdFromJsonl
These three internal helpers are now imported by regression tests
landing in the next commits (PR #1169 follow-up). Pattern matches the
existing normalizeRemoteUrl export in gstack-global-discover.ts which
test/global-discover.test.ts already imports side-effect-free.
No change to runtime behavior; gstack has no public package entrypoint
that would re-export these, so the in-repo surface is unchanged for
callers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(security-classifier): await writer close before unlinking tmp on error
The earlier downloadFile() error-path cleanup hit a race: Node's
createWriteStream lazily opens the FD and flushes buffered writes during
destroy(), so a naive `fs.unlinkSync(tmp)` immediately after `writer.destroy()`
hits ENOENT (file not yet on disk), then the writer's destroy finishes on the
next tick and creates the file fresh — leaving the half-written tmp behind
exactly as the original fix tried to prevent.
The new sequence awaits the writer's 'close' event before unlinking, so the FD
is fully torn down and no subsequent flush can re-create the path.
Caught by browse/test/security-classifier-download-cleanup.test.ts in the
next commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(browse): regression tests for downloadFile cleanup + parsePdfFromFile guard
Covers PR #1169 bugs #6 and #7:
- security-classifier-download-cleanup.test.ts pins downloadFile error-path
cleanup against three failure shapes: reader rejects mid-stream, non-2xx
response, missing body. Asserts the dest file is not created and no
<dest>.tmp.* siblings remain (glob-matched, not exact path — codex push:
if the fix later switches to mkdtempSync, the assertion still holds).
Includes a happy-path case so the cleanup isn't fighting a correct download.
- regression-pr1169-pdf-from-file-invalid-json.test.ts pins parsePdfFromFile
to throw a helpful error for: invalid JSON, empty file, top-level array,
top-level number, top-level string, top-level null, top-level boolean.
Codex push: JSON.parse accepts primitives too, so Array.isArray + typeof
guard must be tested separately from the JSON.parse try/catch.
Both files use mkdtempSync(process.cwd()/...) for fixture isolation since
SAFE_DIRECTORIES allows TEMP_DIR or cwd; cwd is universal across CI hosts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(global-discover): regression for extractCwdFromJsonl 64KB cap
PR #1169 bug #8: the 8KB read cap landed mid-line on Claude Code session
headers, JSON.parse threw on the truncated tail, the catch silently
continued, and the project disappeared from /gstack discovery output.
Six new cases under describe("extractCwdFromJsonl 64KB cap"):
- happy path: small JSONL with obj.cwd returns it
- 12KB first line with obj.cwd: returns cwd (the bug case)
- 80KB single line overflowing 64KB: returns null without crashing
- complete line followed by partial second line: trailing-partial-drop
must not poison the result; returns first line's cwd
- missing file: returns null (file read error swallowed)
- malformed first line + valid second line within cap: skips bad,
returns second's cwd
Tests use the exported extractCwdFromJsonl (added in earlier export
commit) and live in a separate describe block from the existing
"4KB / 128KB buffer" tests, which exercise the unrelated scanCodex
meta.payload.cwd path at L338 — different function, different bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: regression tests for shell-script bugs in PR #1169 (#2-#5)
Two new test files pinning the four shell-script invariants from the
external audit:
regression-pr1169-build-app-sed.test.ts — bugs #2 + #3
- Runtime isolation: extracts the sed-escape sequence from build-app.sh
and runs it against hostile $APP_NAME values ("Foo/Bar&Baz", "Cool\App",
"A/B\C&D"). Asserts the literal hostile name round-trips through a real
`sed s///` invocation, locking the metachar safety end-to-end.
- Static check: the rebrand block must contain both the escape line AND
the sed line referencing $APP_NAME_SED_ESCAPED; bare $APP_NAME
interpolation directly into the s/// replacement is rejected.
- Static check: DMG_TMP=$(mktemp -d) is followed by an explicit `|| { ... exit }`
failure handler AND a `[ -z "$DMG_TMP" ] || [ ! -d "$DMG_TMP" ]` validation
AND the cp -a appears AFTER both guards.
- Runtime fake-bin: extracts the guard shape, runs with a fake mktemp that
exits 1, asserts the script exits non-zero before any cp block can reach.
regression-pr1169-mktemp-fallbacks.test.ts — bugs #4 + #5
- Per codex pushback, the invariant is "no `mktemp ... || echo <path>`
fallback shape" — not just "no $$ token." That's a stronger invariant
that catches future swaps to $RANDOM or hardcoded paths.
- For each of bin/gstack-telemetry-sync and supabase/verify-rls.sh:
- no echo-based fallback after mktemp
- no $$ inside any /tmp path literal
- mktemp failure path explicitly exits / returns non-zero
- telemetry-sync also pins the `trap rm -f $RESP_FILE EXIT` cleanup
so success paths don't leak the tmp on normal exit.
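The "no echo-based fallback after mktemp" invariant can be sketched as a line-oriented static check. `findMktempEchoFallbacks` is a hypothetical name, the regular expression is an assumption about the shape being rejected, and it only catches fallbacks written on a single line.

```typescript
// Hypothetical static check, not the repository's test code: return every
// line of a shell script where a mktemp call is followed by `|| echo ...`.
function findMktempEchoFallbacks(script: string): string[] {
  const fallbackShape = /\bmktemp\b[^\n]*\|\|\s*echo\b/;
  return script.split("\n").filter((line) => fallbackShape.test(line));
}

// Illustrative script lines written for this example, not copied from the repo.
const risky = 'RESP_FILE=$(mktemp /tmp/gstack-sync-XXXXXX 2>/dev/null || echo "/tmp/gstack-sync-$$")';
const strict = "RESP_FILE=$(mktemp /tmp/gstack-sync-XXXXXX) || exit 0";

console.log(findMktempEchoFallbacks([risky, strict].join("\n")).length);
```

Matching on the fallback shape rather than on the `$$` token is what lets the check also reject a later swap to `$RANDOM` or a hardcoded path.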
All seven new test files are gate-tier (deterministic, sub-second, no LLM,
no network). Runtime shell tests use fake-bin PATH stubs in temp dirs;
no $HOME mutation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v1.41.1.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: RagavRida <ragavrida@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
026751ea20 |
v1.40.0.0 fix wave: gbrain sync hardening (8 community PRs + migration) (#1547)
* fix(gbrain-sync): fold hostname into code-source id hash + migration (#1414)
Cherry-picked from #1468 by 0xDevNinja and extended with the hostname-fold migration that codex review surfaced.
Pre-fix `deriveCodeSourceId` hashed the absolute repo path alone, so two machines with identical home-dir layouts (chezmoi-managed dotfiles, ansible-provisioned VMs) derived the same id and clobbered each other's `local_path` in a federated brain. Last-writer-wins, with cryptic "Not a git repository" errors on the loser.
Hash key is now `${hostname}::${path}`. Conductor worktrees on a single host stay distinct (path entropy unchanged within a host); cross-machine federations stop colliding.
Migration (D1=B + codex refinements): every existing user has a pre-#1468 path-only-hash source id in their brain that no longer matches what `deriveCodeSourceId` produces. Without migration, the next sync registers a fresh source and orphans the old one. This commit adds:
- `derivePathOnlyHashLegacyId` — separate helper for the pre-#1468 form. Distinct from `deriveLegacyCodeSourceId` (pre-pathhash v1.x form); both probes run.
- `planHostnameFoldMigration` — feature-checks `gbrain sources rename <old> <new>` (exact argument shape, not just `--help`), gates on path-drift (skip migration if old source's `local_path` differs from current repo root), and falls back to register-new + sync-OK + remove-old when rename is unsupported. As of gbrain 0.35.0.0 the rename subcommand does not exist, so users go through the cleanup path; the rename path stays dormant until gbrain ships it.
- `removeOrphanedSource` — called only AFTER new-source sync verifies page_count > 0. Closes the data-loss window codex flagged where "register new, remove old before sync" can wipe pages if sync fails.
- `sourceLocalPath` — looks up a source's `local_path` from `gbrain sources list --json` for the drift gate.
- Helpers accept an optional `env` parameter so tests can inject a gbrain shim via PATH without process-wide PATH mutation (Bun's spawnSync doesn't pick up runtime PATH changes). Pre-positions for commit 4's centralized gbrain-exec helper.
- `if (import.meta.main)` guard around `main()` so the helpers can be imported for in-process unit tests.
Tests cover: pure derivation, ids-match degenerate case, no-legacy short-circuit, path-drift skip path, rename path with shim, cleanup fallback when rename unsupported, cleanup fallback when rename call itself fails, source-lookup happy/missing/error paths.
`GSTACK_HOSTNAME` env var is a test-only knob; production uses `os.hostname()`.
Fixes #1414
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(gbrain-sync): cut source-id slugs on hyphen boundaries (+ #1357)
Cherry-picked from #1481 by drummerms and extended with the explicit HTTPS-remote regression case for #1357 (decision D2=A).
`constrainSourceId` truncated the slug with `slug.slice(-tailBudget)`, which cut mid-word when the boundary fell inside a token. For a repo where the combined `prefix-org-repo-pathhash` exceeded 32 chars, this produced embarrassing artifacts like `gstack-code-kill-270c0001-c32152` (from `drummerms-av-sow-wiz-skill-270c0001`).
Two changes carried from #1481, adapted for the #1468 hostpathhash:
1. `constrainSourceId` now walks hyphen-separated tokens from the right, accumulating whole tokens until adding the next would exceed `tailBudget`. When no token fits, falls through to the existing `${prefix}-${hash}` form.
2. `deriveCodeSourceId` now retries with `repo-only-hostpathhash` (dropping the org segment) when the full `org-repo-hostpathhash` triggers truncation. Keeps the repo name readable when it fits at all.
Plus a new test asserting the source id is period-free for the exact HTTPS-with-.git remote shape from #1357 (`https://github.com/foo/bar.git`). canonicalizeRemote strips `.git`; the sanitizer strips any residual non-alnum.
The test closes #1357 by pinning the property.
Closes #1357
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(gbrain): probe CLI without command builtin
* fix(gbrain-sync): centralize gbrain spawn surface + seed DATABASE_URL
Cherry-picked from #1508 by jasshultz, restructured per codex review #4 and #7 to widen scope and centralize the spawn surface.
The bug: gbrain auto-loads .env.local from cwd via dotenv. When /sync-gbrain runs inside a Next.js / Prisma / Rails project whose .env.local defines its own DATABASE_URL (pointing at the app's local DB), gbrain reads that value instead of its own ~/.gbrain/config.json — auth fails, code + memory stages crash.
This commit:
- Adds lib/gbrain-exec.ts: buildGbrainEnv, spawnGbrain, execGbrainJson, execGbrainText, spawnGbrainAsync (the last one for memory-ingest's streaming gbrain import call). buildGbrainEnv seeds DATABASE_URL from ${GBRAIN_HOME:-$HOME/.gbrain}/config.json, returns a fresh env object (never the caller's by identity — codex review #11), and honors the GSTACK_RESPECT_ENV_DATABASE_URL=1 escape hatch.
- Routes every gbrain spawn in bin/gstack-gbrain-sync.ts and bin/gstack-memory-ingest.ts through the helpers. Both files now own zero direct spawnSync("gbrain"|spawn("gbrain"|execFileSync("gbrain" call sites.
- Threads buildGbrainEnv into the spawnSync("bun", [memory-ingest], ...) grandchild in runMemoryIngest (codex review #7). Without this, the parent fix is half-baked — the bun child inherits a clean env but needs DATABASE_URL pre-seeded too. spawnGbrainAsync inside memory-ingest provides defense in depth for standalone invocations.
- Adds GBRAIN_HOME support — aligns with detectEngineTier (already honors GBRAIN_HOME) so all gstack-side gbrain calls agree on which config file matters. Resolves baseEnv.HOME first, then homedir(), so test injection works without process-wide HOME mutation.
- Adds test/build-gbrain-env.test.ts: 10 unit tests covering all five env-seeding branches (seed from config / override caller / GSTACK_RESPECT escape hatch / missing config / unparseable config / no database_url field / GBRAIN_HOME path / object-identity guard / unrelated-vars preservation / idempotent-when-matches).
- Adds test/gbrain-exec-invariant.test.ts: static-source check that greps both bin/gstack-gbrain-sync.ts and bin/gstack-memory-ingest.ts for direct spawnSync("gbrain"|spawn("gbrain"|execFileSync("gbrain"| execSync(...gbrain matches and fails the build if any are found. Refactor-proof against future contributors adding a new gbrain spawn without env threading.
The invariant is intentionally narrow — only the two files where the DATABASE_URL bug actually hurts users are guarded. Migrating the spawn sites in lib/gbrain-local-status.ts, lib/gstack-memory-helpers.ts, and bin/gstack-brain-context-load.ts is a follow-up.
Co-Authored-By: Jason Shultz <jasshultz@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(gbrain-sync): add .gbrain-source to consumer repo .gitignore (#1384)
The v1.29.0.0 changelog promised .gbrain-source would be added to the consuming repo's .gitignore so the per-worktree pin stays local, but the change actually only added it to gstack's own .gitignore. Without the consumer-side entry, the pin gets committed and Conductor sibling worktrees of the same repo + branch step on each other's pin every time anyone commits.
Add ensureGbrainSourceGitignored after a successful gbrain sources attach in runCodeImport. Idempotent on repeat runs (line-trim match), creates .gitignore if missing, logs a warning and continues on permission errors so a read-only checkout doesn't fail the sync.
Gate the top-level main() call behind import.meta.main so tests can import the helper without triggering a full sync run on module load.
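The idempotent append behind ensureGbrainSourceGitignored can be sketched as a pure function over the file's text. `withGitignoreEntry` is a hypothetical name; the real helper also handles file I/O and permission errors, which this sketch leaves out.

```typescript
// Hypothetical pure helper: returns the .gitignore text that should be on
// disk. Pass null when the file does not exist yet.
function withGitignoreEntry(existing: string | null, entry: string): string {
  const current = existing ?? "";
  // Line-trim match: an entry surrounded by whitespace still counts as present.
  const alreadyListed = current.split("\n").some((line) => line.trim() === entry);
  if (alreadyListed) return current;
  // Add a newline first when the file does not already end with one.
  const separator = current === "" || current.endsWith("\n") ? "" : "\n";
  return `${current}${separator}${entry}\n`;
}
```

Because the function returns its input unchanged when the entry is present, a caller can compare the result with the original text and skip the write entirely on repeat runs.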
Tests in test/gbrain-source-gitignore.test.ts cover: create-when-missing, append-without-trailing-newline, append-with-trailing-newline, idempotent on repeat, recognize whitespace-surrounded entry, no-throw on read-only file. 6 pass.
* fix(gbrain-sources): bump gbrain sources list --json timeout 10s → 30s
Supabase free-tier cold-starts can push `gbrain sources list --json` past 10s (observed 14.5s in the wild), causing probeSource() to throw ETIMEDOUT during /sync-gbrain code stage even though the underlying CLI was healthy. Matches the 30s ceiling already used by `sources add` / `sources remove` in the same file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(brain-allowlist): sync project-root eng-review-test-plan artifacts (#1452)
Cherry-picked from #1465 by genisis0x and extended with the v1.40.0.0 upgrade migration that codex review #5 surfaced.
#1465 alone only patches bin/gstack-artifacts-init, which means fresh installs and re-inits pick up the new pattern. But existing users who already ran v1.38.1.0 have a `.migrations/v1.38.1.0.done` marker — that migration won't re-run no matter what we change. So their installed `.brain-allowlist`, `.brain-privacy-map.json`, and `.gitattributes` stay without the new pattern, and `/plan-eng-review` artifacts continue to silently drop out of their federation queue.
This commit:
- bin/gstack-artifacts-init: adds projects/*/*-eng-review-test-plan-*.md to the three managed blocks. v1.38.1.0 covered design + test-plan; this completes the set for /plan-eng-review.
- gstack-upgrade/migrations/v1.40.0.0.sh: targeted in-place repair for existing installs. Same idempotent jq-based shape as v1.38.1.0. Adds the new pattern to .brain-allowlist (before the USER ADDITIONS marker), .brain-privacy-map.json (as class=artifact), and .gitattributes (as merge=union). NEVER commits + pushes — the user controls when the patches ship to their federated artifacts repo.
- test/artifacts-init-migration.test.ts: 5 new tests covering the v1.40.0.0 migration applied on top of a post-v1.38.1.0 state, jq patching, gitattributes append, idempotent re-run, and done-marker write when files are missing entirely.
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(gbrain-install): skip postinstall on Windows MSYS/MINGW + post-install probe
Cherry-picked from #1487 by genisis0x and extended with the post-install subcommand probe per T6 / codex review #19.
`bun install` in $INSTALL_DIR fails on Windows MSYS/MINGW/Cygwin shells because gbrain's native postinstall script mis-parses path arguments and aborts with a non-zero exit, breaking gstack-gbrain-install for Windows users running git-bash/MSYS2. The package installs cleanly without scripts.
This commit:
- Adds Windows shell detection via `uname -s` matching MINGW*/MSYS*/CYGWIN*/Windows_NT (#1487's case statement already covers all four — codex review #18 confirmed MINGW* is included). Windows paths get `bun install --ignore-scripts`; macOS and Linux unchanged.
- Adds a post-install probe of `gbrain sources --help`. `gbrain --version` already runs (D19 PATH-shadowing validation), but version success doesn't prove the subcommand surface is reachable — and `--ignore-scripts` may have skipped artifacts that subcommands need. Probe failure logs a clear warning (with Windows-specific remediation pointing at re-running `bun install` outside MSYS) but does NOT exit non-zero; users may still get value from gbrain even if the probe fails transiently.
Refs #1271
Co-Authored-By: Claude <noreply@anthropic.com>
* chore: v1.40.0.0 — gbrain sync hardening wave
Bumps VERSION 1.39.2.0 → 1.40.0.0 (MINOR — substantial gbrain capability hardening across sync pipeline, install path, federation allowlist; ~600 net LOC added across 8 community PRs + plan-review refinements).
CHANGELOG entry follows the release-summary format: two-line headline, lead paragraph, "numbers that matter" with before/after table across 8 user-visible surfaces, "what this means for builders" closer, itemized Added/Changed/Fixed/NOT fixed/For contributors sections.
Per-commit contributor credits: 0xDevNinja, drummerms, Jayesh Betala, Jason Shultz, genisis0x. Also names NikhileshNanduri and realcarsonterry in the wave's "Fixed" section for independent submissions of the .gbrain-source gitignore bug.
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: 0xDevNinja <manmit0x@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: drummerms <mike@av2o.com>
Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com>
Co-authored-by: Jason Shultz <jasshultz@gmail.com>
Co-authored-by: genisis0x <manietdavv@gmail.com>
|
||
|
|
33cb4715ef |
v1.39.2.0 feat: GSTACK_* env-shim for Conductor + gbrain/gstack setup docs (#1534)
* feat: GSTACK_* env-key shim for Conductor workspaces
New lib/conductor-env-shim.ts promotes GSTACK_ANTHROPIC_API_KEY and GSTACK_OPENAI_API_KEY to canonical names when canonical is empty. Wired into the four TS entry points that hit paid APIs or gbrain embeddings: gstack-gbrain-sync.ts, gstack-model-benchmark, preflight-agent-sdk.ts, test/helpers/e2e-helpers.ts. Side-effect-only import, 15 lines total.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: gbrain+gstack setup, Conductor env mapping (v1.39.2.0)
USING_GBRAIN_WITH_GSTACK.md: new "What you get after setup" section, Path 4 (remote MCP / split-engine), /sync-gbrain workflow stages + watermark mechanics, "Conductor + GSTACK_* env vars" section, env vars table extended, two troubleshooting entries (silent embedding failure and FILE_TOO_LARGE watermark block).
CONTRIBUTING.md "Conductor workspaces": new paragraph on the GSTACK_* prefix pattern and the four entry points importing the shim.
VERSION 1.39.1.0 → 1.39.2.0 and CHANGELOG entry covering the shim + docs (full release-summary format with before/after table).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: unit coverage for conductor-env-shim
Refactor lib/conductor-env-shim.ts to export promoteConductorEnv() so unit tests can manipulate env and call it directly (a bare side-effect IIFE on import isn't reachable from bun:test once cached). The on-import IIFE still runs — existing four-entry-point imports keep working unchanged.
test/conductor-env-shim.test.ts covers all three branches: GSTACK_FOO present + FOO empty → promotion; FOO already set → no-overwrite; nothing in env → no-op.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: Conductor strips canonical API keys (not just "doesn't inherit")
The prior docs framed the GSTACK_* prefix as collision-avoidance: "Conductor exposes API keys under a GSTACK_ prefix so it never collides with whatever the host system has set." That understates the mechanism — Conductor actively strips ANTHROPIC_API_KEY and OPENAI_API_KEY from every workspace's process env, so setting them in ~/.zshrc or .env doesn't help. The fix path is to set the GSTACK_-prefixed forms in Conductor's workspace env config; Conductor passes those through untouched.
Three docs updated to reflect the strip, not the polite framing: USING_GBRAIN_WITH_GSTACK.md (Conductor section), CONTRIBUTING.md (Conductor workspaces paragraph), CHANGELOG.md (release summary).
README.md gains a "Running gstack in Conductor?" callout in the GBrain section pointing at the canonical doc's anchor, plus a fourth path entry (remote gbrain MCP / split-engine) that was already documented in USING_GBRAIN but missing from the README summary.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
f58977041c |
v1.39.1.0 feat: EXIT PLAN MODE GATE for plan-mode review skills (#1512)
* feat: EXIT PLAN MODE GATE for plan-mode review skills
Add a terminal BLOCKING checklist that verifies the plan file ends with `## GSTACK REVIEW REPORT` before ExitPlanMode is called. Lives at EOF of all four plan-* review skills (eng/ceo/design/devex) and inside codex Step 2A. Tones down the preamble's "Plan Status Footer" to a neutral forward reference so review-report rules don't bleed into operational skills (/ship /qa /review).
Single source of truth: `generateExitPlanModeGate` in scripts/resolvers/review.ts, registered as EXIT_PLAN_MODE_GATE in scripts/resolvers/index.ts.
New test in test/gen-skill-docs.test.ts strips fenced code blocks before matching `## ` headings and asserts the gate is the terminal heading in all four plan-* review SKILL.md files. Codex's SKILL.md uses toContain (mid-file by design — Step 2B/2C are not plan-touching modes).
Decisions locked via /plan-eng-review + /codex outside-voice:
- D1=A: 4 plan-* reviews + codex (autoplan, office-hours deferred)
- D2=B → D4=A: tone preamble down to neutral forward reference
- D3=A: add automated test in test/gen-skill-docs.test.ts
- D5=B: keep codex gate inside Step 2A (mid-file acceptable per gate self-gating)
Codex pre-merge findings folded in: line numbers obsolete (use EOF), test regex must strip fences, fresh skill list (not stale REVIEW_SKILLS constant), gate check 4 short-circuits when no plan file in context.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore: bump version and changelog (v1.39.1.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix: package.json build script uses subshells, not brace groups
The three `{ git rev-parse HEAD 2>/dev/null || true; } > path/.version` brace groups in the build script regressed when v1.38.0.0 merged into this branch (resolved with --ours during conflict). Bun on Windows can't parse brace groups in this position; the v1.38.0.0 invariant requires `(...)` subshells.
Windows CI test `package.json build scripts — POSIX shell compat` caught it.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
25cf5edf21 |
v1.39.0.0 feat: buildFetchHandler factory unblocks gbrowser submodule consumption (#1511)
* feat: buildFetchHandler factory unblocks gbrowser submodule consumption
Add buildFetchHandler(cfg: ServerConfig): ServerHandle in browse/src/server.ts.
Refactor start() to delegate handler construction to the factory and read env
once via resolveConfigFromEnv(). Wire the beforeRoute hook (runs after the
tunnel surface filter, before per-route dispatch).
Auth is now cfg-driven end-to-end. Module-level AUTH_TOKEN const +
initRegistry(AUTH_TOKEN) boot call, validateAuth, and shutdown are deleted;
factory closure owns them. start() threads cfg.authToken into launchHeaded,
the state-file write, and the factory.
initRegistry is idempotent for same-token re-init; throws clearly for
different-token re-init. __resetRegistry() test helper added (mirrors
__resetConnectRateLimit). Existing tests that did rotateRoot() ->
initRegistry('fixed-token') swap to __resetRegistry() to avoid the new guard.
14 factory contract tests added covering ServerHandle shape, auth wiring,
validation throws, hook semantics across both surfaces, and registry
idempotency.
Source-pattern tests in dual-listener.test.ts and server-auth.test.ts
updated for the new identifiers (handle.fetchLocal/fetchTunnel, authToken,
shutdownFn).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
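The registry guard described above (same-token re-init is a no-op, different-token re-init throws, `__resetRegistry()` clears state for tests) can be sketched as follows. This is an illustration under assumptions: the real registry in browse/src presumably holds more than a token, and the error text here is invented.

```typescript
// Hypothetical sketch of the idempotency guard; only the token is modeled.
let registeredToken: string | null = null;

function initRegistry(token: string): void {
  if (registeredToken === null) {
    registeredToken = token; // first init records the token
    return;
  }
  if (registeredToken === token) return; // same-token re-init: no-op
  // different-token re-init: fail loudly instead of silently rotating auth
  throw new Error(
    "initRegistry: already initialized with a different token; " +
      "call __resetRegistry() first (tests only)",
  );
}

// Test-only helper: forget the token so a suite can start from a clean slate.
function __resetRegistry(): void {
  registeredToken = null;
}
```

Under this shape, a test that used to re-initialize with a fixed token would call `__resetRegistry()` first, which matches the swap the commit describes for the existing tests.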
* chore: bump version and changelog (v1.39.0.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
ea51b45e08 |
v1.38.1.0 fix wave: surrogate-safe page captures (#1440), Implementation Tasks across review skills (#1454), root-level artifact patterns (#1452) (#1504)
* fix(browse): sanitize lone Unicode surrogates at commandResult chokepoint + /batch envelope (#1440)
Page captures with mixed-script Unicode round-trip cleanly to the Claude API. Two new utilities in browse/src/sanitize.ts: stripLoneSurrogates for raw UTF-16 strings, stripLoneSurrogateEscapes for \uXXXX JSON escape text. sanitizeBody picks the right pass based on cr.json. buildCommandResponse is extracted from handleCommand (now exported) and applies sanitization before new Response(). /batch was bypassing this chokepoint via direct JSON.stringify, so it sanitizes each cr.result before pushing AND wraps the envelope with stripLoneSurrogateEscapes.
Defense in depth wraps at getCleanText, getCleanTextWithStripping, html, accessibility, and snapshot.ts return points so downstream consumers (datamarking, envelope wrapping) see sanitized text before the response is built.
25 new unit tests across sanitize.test.ts and build-command-response.test.ts. content-security.test.ts updated to accept either pre- or post-sanitize form of the snapshot scoped branch (source-level regression check).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat: bug fix wave v1.36.0.0 — Implementation Tasks, allowlist patterns, surrogate-safe page captures (#1440 #1452 #1454)
Three filed issues land together:
#1440 — Page captures from real-world HTML hit 'API Error 400: no low surrogate in string'. Sanitizers + buildCommandResponse extraction shipped in the prior commit; this commit adds the migration script that patches existing brain-allowlist/privacy-map/gitattributes installs and the supporting tests.
#1452 — Federation sync was silently skipping root-level design and test-plan docs. bin/gstack-artifacts-init adds two patterns to all three managed blocks (.brain-allowlist, .brain-privacy-map.json, .gitattributes). Idempotent migration v1.36.0.0.sh repairs existing installs in place via jq (preserves JSON validity) — no commit + push from the migration.
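For the #1440 item above, the raw-string pass can be sketched in a few lines. This is a minimal illustration of the technique, not the code in browse/src/sanitize.ts: the commit does not say whether lone halves are replaced or deleted, so substituting U+FFFD is an assumption, and the escape-text variant (stripLoneSurrogateEscapes) is not shown.

```typescript
// Matches either a well-formed surrogate pair or any single surrogate half.
// The pair alternative is tried first at each position, so a lone half is
// whatever still matches after pairs have been consumed.
const PAIR_OR_LONE = /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g;

// Assumption: lone halves become U+FFFD (the Unicode replacement character).
function stripLoneSurrogates(text: string): string {
  // No `u` flag on purpose: the pattern must see individual UTF-16 code units.
  return text.replace(PAIR_OR_LONE, (match) =>
    match.length === 2 ? match : "\uFFFD",
  );
}
```

Valid pairs such as an emoji pass through unchanged, while a string like `"a\uD800b"` comes back with the lone high surrogate replaced, so the text can be encoded without producing an unpaired surrogate.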
#1454 — All four review skills (CEO/design/eng/DX) emit an Implementation Tasks markdown section AND write a jq-built JSONL artifact per phase. /autoplan reads all four files, scopes by current branch + 5-commit window, dedupes on exact (component, sorted(files), title), and renders an aggregated list in the Final Approval Gate.
New tests:
- browse/test/sanitize.test.ts (18 cases)
- browse/test/build-command-response.test.ts (7 cases)
- test/artifacts-init-migration.test.ts (7 cases)
VERSION → 1.36.0.0. Skips the v1.34.x slot taken by 'gstack consumable as submodule' and the v1.35.0.0 slot taken by /document-generate. #1428 was shipped separately by v1.34.2.0 with a different approach; follow-up #1503 filed for the bare-path filesystem boundary concern surfaced during our analysis.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore: bump to v1.38.1.0
VERSION + package.json + CHANGELOG header + migration filename + test reference all consistently at v1.38.1.0. Migration renamed: gstack-upgrade/migrations/v1.38.0.0.sh -> v1.38.1.0.sh.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
3bf43766d5 |
v1.38.0.0 fix wave: Windows install hardening + Unicode sanitization at server egress (4 community PRs) (#1505)
* fix(browse): single-point Unicode sanitization at server egress
Add sanitizeLoneSurrogates (regex-based UTF-16 lone-half cleaner) and sanitizeReplacer (JSON.stringify replacer that runs the cleaner on every string field during encoding).
Split handleCommandInternal into handleCommandInternalImpl (raw) plus a thin sanitizing wrapper. The wrapper applies sanitizeLoneSurrogates to cr.result so both single-command (handleCommand line 1034) and batch-loop (line 1966) egress paths inherit it. Inline INVARIANT comment near the wrapper documents the architectural constraint.
Both SSE producers (activity feed at /activity/stream and inspector stream) stringify with sanitizeReplacer. Post-stringify regex is ineffective on those paths because JSON.stringify has already converted the lone surrogate into the escape sequence "\\\\uD800" before any regex could match it; the replacer runs during stringify on the raw string value, so the substitution lands.
Originated from @realcarsonterry PR #1463 (handleCommand-only wrap). Architectural lift to handleCommandInternal + SSE coverage authored on this branch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(setup): _link_or_copy helper for Windows file-copy fallback
On Windows without Developer Mode (MSYS2/Git Bash), plain ln -snf silently creates a frozen file copy that doesn't refresh on git pull. Skill files become stale after every upgrade.
Add a _link_or_copy SRC DST helper near IS_WINDOWS detection (line ~33). It auto-dispatches: on Unix it preserves ln -snf semantics, on Windows it copies (cp -R for directories, cp -f for files). When the source is a Unix-style name-only alias that doesn't resolve on disk (the connect-chrome → gstack/open-gstack-browser pattern), the helper returns 0 silently on Windows rather than aborting setup under set -e.
Rewrite all 42 prior ln -snf call sites to route through the helper: link_claude_skill_dirs (line 437), team-claude install paths (lines 556, 581, 592), Codex host adapter block (lines 618-640), Factory host adapter block (lines 658-678), OpenCode host adapter block (lines 696-731), Kiro host adapter block (lines 939-953), plus migration and alias sites.
Add _print_windows_copy_note_once helper and call it from link_claude_skill_dirs after any linking work completes so Windows users see one user-visible note explaining they must re-run ./setup after every git pull.
Extend cleanup_old_claude_symlinks and cleanup_prefixed_claude_symlinks with a Windows branch: when the target is a real directory containing a real-file SKILL.md (no symlink to readlink), and IS_WINDOWS=1, treat the name-matched directory as gstack-managed and remove it. This makes --prefix / --no-prefix flips work on Windows instead of leaving stale copies behind.
Originated from @realcarsonterry PR #1462 (1 of 42 sites). Helper extraction, 42-site rewrite, alias-resolution edge case, and Windows cleanup compat authored on this branch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(docs): rename stale gbrain_sync_mode to artifacts_sync_mode + register /document-generate
Five stale gstack-config references in docs/ pointed to the deprecated gbrain_sync_mode key (renamed to artifacts_sync_mode in v1.27.0.0):
- docs/gbrain-sync.md: lines 62, 110, 111, 173
- docs/gbrain-sync-errors.md: lines 26, 203
Users following the docs would set a key that gstack-brain-sync no longer reads, silently breaking artifacts sync. Originated from @realcarsonterry PR #1461 (verbatim).
Also register /document-generate in AGENTS.md (Operational + memory table) and docs/skills.md (skill index). The skill shipped in v1.35.0.0 but the doc-inventory cross-check in test/skill-validation.test.ts was failing because neither file mentioned it.
Allowlist the new test/docs-config-keys.test.ts file in test/no-stale-gstack-brain-refs.test.ts — it intentionally lists the deprecated keys in its DEPRECATED_KEYS denylist (defending the rename).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* ci(windows): migrate windows-free-tests to paid faster runner + register wave tests
Move the Windows free-test job from GitHub-hosted windows-latest to Blacksmith's paid Windows runner (blacksmith-2vcpu-windows-2022). Spin-up drops from ~60s to ~10s and Bun installs land 3-4x faster. The label can swap to namespace-profile-windows or ubicloud-windows-* if this repo's Blacksmith installation isn't configured.
Register the four new wave tests in the workflow's curated test list:
- browse/test/server-sanitize-surrogates.test.ts
- test/setup-windows-fallback.test.ts
- test/build-script-shell-compat.test.ts
- test/docs-config-keys.test.ts
These tests cover the Windows-hardening surface that this wave ships (sanitizer wiring, _link_or_copy helper, build-script subshells, doc-config drift), so they need to run on Windows where the bug shapes actually manifest.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* test: wave coverage for sanitizer, link_or_copy, build script, doc drift
Four new test files (29 cases total):
browse/test/server-sanitize-surrogates.test.ts:
- 11 unit cases for sanitizeLoneSurrogates (passthrough, valid pair, lone high/low mid-string, trailing/leading lone, adjacent doubles, pair-then-lone, lone-then-pair, empty)
- 2 bug-repro tests pinning the regression intent (UTF-8 round-trip, JSON.parse round-trip with codepoint assertion)
- 4 wiring invariants asserting the architectural choke points stay intact (handleCommandInternalImpl rename, central sanitization line, sanitizeReplacer function exists, SSE producers stringify with replacer)
Function extracted from server.ts via regex + eval'd in test scope so no production-code export is needed.
test/setup-windows-fallback.test.ts:
- Static invariant (D7): zero raw `ln` calls outside the _link_or_copy helper body and comments
- Helper-existence assertions
- 4-cell behavior matrix (file/dir × Windows/Unix) via awk-style helper extraction + bash -c sourcing
- Windows-note printer registration check
Mirrors test/setup-conductor-worktree.test.ts patterns.
test/build-script-shell-compat.test.ts:
- Regex assertion that package.json scripts.* contain no bash brace groups (Bun-Windows-hostile)
- Subshell-precedence check for `.version` redirects
Strips single-quoted strings before regexing so embedded JS code inside echo '...' doesn't false-positive.
test/docs-config-keys.test.ts:
- DEPRECATED_KEYS denylist scanned across docs/**/*.md
- Round-trip test for `gstack-config get artifacts_sync_mode`
Defends the v1.27.0.0 rename from doc drift.
Updates to two existing tests:
- test/setup-conductor-worktree.test.ts: expect `_link_or_copy` instead of `ln -snf` at the Conductor-worktree guard call site
- test/gen-skill-docs.test.ts: same swap at three assertion sites (Codex section, Claude link_claude_skill_dirs body, Codex link_codex_skill_dirs body)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore: bump v1.38.0.0 + build-script subshells + CHANGELOG
VERSION 1.35.0.0 → 1.38.0.0 (MINOR). PR #1500 (lyon-v2) claimed v1.37.0.0 ahead of this branch; v1.38.0.0 is the next free MINOR slot per bin/gstack-next-version queue check. Workspace-aware ship rule applies — queue-advancing past a claimed version within the same bump level is explicitly permitted.
package.json build script: three `{ git rev-parse HEAD ...; }` brace groups → `( git rev-parse HEAD ... )` subshells. Bun's Windows shell parser doesn't grok bash brace groups; subshells are POSIX-universal. Originated from @realcarsonterry PR #1460.
CHANGELOG entry covers the full wave:
- Windows install hardening (42-site _link_or_copy + cleanup compat)
- Unicode sanitization architecture (handleCommandInternal + SSE replacer)
- Build script POSIX-shell compat (subshells)
- Doc rename (gbrain_sync_mode → artifacts_sync_mode)
- Windows CI on paid faster runner
- 4 new wave tests (29 cases)
Frames each item as a current system property, not a fix narrative. Credits @realcarsonterry for PRs #1460, #1461, #1462, #1463 (the seed of the wave). Scope expansion to all 42 setup sites, every server egress path, Windows CI migration, and codex-flagged P0/P1 fixes (connect-chrome alias on Windows, SSE replacer, prefix-cleanup Windows compat) authored on this branch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs: post-ship sync for v1.38.0.0
Document the two architectural invariants that landed in v1.38.0.0 in their persistent homes (not just CHANGELOG):
- README Windows section: add the `./setup` re-run-after-git-pull requirement that `_print_windows_copy_note_once` shows at runtime.
- CONTRIBUTING "Things to know": add the no-raw-`ln` invariant for contributors editing `setup`, with the test that enforces it.
- ARCHITECTURE: new "Unicode sanitization at server egress" section between Shell injection prevention and Prompt injection defense, with egress table (HTTP/batch/SSE) and the post-stringify-regex rationale.
- CLAUDE.md: cross-references for both invariants, matching the v1.6.0.0 dual-listener pattern (each constraint says which files to read before editing and which test pins it).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* ci(windows): use windows-latest-8-cores instead of unregistered Blacksmith label
actionlint failed PR #1505 because `blacksmith-2vcpu-windows-2022` isn't in the repo's approved runner-label list (actionlint.yaml only registers `ubicloud-standard-2`, and Ubicloud doesn't ship a Windows pool).
Switch to GitHub's paid larger Windows runner `windows-latest-8-cores` — 4x the cores of the free `windows-latest` at the larger-runner billing rate, no new third-party CI provider, no actionlint config changes.
CHANGELOG: replace "Blacksmith" / "blacksmith-2vcpu-windows-2022" / "~6x faster spin-up" claims with the actual choice (8 cores vs 4, paid larger runner).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* ci(windows): switch from windows-latest-8-cores to ubicloud-standard-2-windows
`windows-latest-8-cores` sat queued indefinitely because the GitHub larger-runner billing isn't enabled at the org level — the "Queued — Waiting to run this check" status surfaced on PR #1505 with no progress for the whole CI run.
Switch to Ubicloud Windows runners (`ubicloud-standard-2-windows`) so Windows CI uses the same provider as the existing Linux evals (`ubicloud-standard-2`). Billing stays under one account instead of two.
Register the new label in actionlint.yaml alongside the existing ubicloud-standard-2 entry so actionlint doesn't reject it as unknown.
CHANGELOG entry updated: runner row reflects the actual provider chosen, "Itemized changes" mentions the actionlint.yaml registration, and the narrative paragraph documents why `windows-latest-8-cores` failed first.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* ci: migrate all workflows to Ubicloud (Linux + Windows, 8-core)
Switch every `runs-on` in this repo to Ubicloud so CI has a single billing surface, consistent capacity, and 4x more cores on the workloads that were previously stuck on free `ubuntu-latest` (2 cores). Windows uses Ubicloud's Windows pool too — `ubicloud-standard-8-windows` — so the queued-forever problem with GitHub's `windows-latest-8-cores` paid larger runner (org-level larger-runner billing not enabled) goes away.
Workflows touched (9):
- evals.yml, evals-periodic.yml, ci-image.yml — bump default + matrix from `ubicloud-standard-2` to `ubicloud-standard-8`.
  The one matrix entry that was already on -8 stays.
- windows-free-tests.yml — `ubicloud-standard-2-windows` → `ubicloud-standard-8-windows`.
- make-pdf-gate.yml — matrix `ubuntu-latest` → `ubicloud-standard-8`. macOS entry preserved; the poppler-install `if: matrix.os` conditional swaps to match the new label.
- actionlint.yml, pr-title-sync.yml, skill-docs.yml, version-gate.yml — `ubuntu-latest` → `ubicloud-standard-8`.
.github/actionlint.yaml registers all four Ubicloud labels in one place:
- ubicloud-standard-2
- ubicloud-standard-8
- ubicloud-standard-2-windows (the v1.38.0.0 windows-free-tests target)
- ubicloud-standard-8-windows (this PR's windows-free-tests target)
Removed the duplicate `actionlint.yaml` at the repo root that I accidentally created in the prior commit — actionlint only reads `.github/actionlint.yaml`, so the root file was dead weight.
CHANGELOG entry updated: a single "all Ubicloud" sentence in the narrative plus a metrics-row covering the runner pool change, and the itemized line expanded to enumerate the 9 affected workflows. The previously-orphaned "Itemized changes" line about just `windows-free-tests.yml` is replaced.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* ci(windows): revert to free `windows-latest`
Ubicloud doesn't ship Windows runners — confirmed via their docs. The `ubicloud-standard-*-windows` labels I added do not exist and were causing `windows-free-tests` to sit "Queued — Waiting to run this check" forever (GitHub Actions can't tell a typoed label from a self-hosted runner that's about to register; it just waits).
Three prior Windows-runner attempts all failed for different reasons:
- `blacksmith-2vcpu-windows-2022` — Blacksmith app not installed on the org
- `windows-latest-8-cores` — GitHub paid larger-runner billing not enabled
- `ubicloud-standard-2/8-windows` — Ubicloud doesn't offer Windows at all
The free `windows-latest` runner (4 cores, ~60s spin-up, $0) is the one path that actually runs.
The wave-coverage Windows tests are <30s of real work; total job time stays under 2 minutes.
Cleaned up `.github/actionlint.yaml` to drop the bogus `ubicloud-standard-*-windows` entries — kept only the two real Linux labels.
CHANGELOG: split the runner-pool row into Linux (migrated to Ubicloud-8) vs Windows (stays on free windows-latest), with the why on each. Itemized line for windows-free-tests rewritten to reflect the actual outcome.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* test(windows): skip Unix-only cases on Windows runner
windows-free-tests on GitHub free windows-latest fails three cases that depend on Unix tooling the runner doesn't have:
1. `setup-windows-fallback.test.ts` behavior matrix — IS_WINDOWS=0 cells assert `ln -snf` produces a real symlink. On Windows-without-Developer-Mode (which the free `windows-latest` runner is), `ln -snf` silently creates a file copy. That's literally the bug `_link_or_copy` exists to work around, so the assertion can never pass there. Skip the whole describe block on win32. The static-invariant test (zero raw `ln` outside the helper body) above the matrix still runs and pins the shape the Windows install relies on.
2. `docs-config-keys.test.ts` round-trip — spawnSync(`bin/gstack-config`) on Windows doesn't read the bash shebang and fails to exec. Skip on win32; the deprecated-key denylist test in the same file still runs and is the actual invariant defending the v1.27.0.0 rename at the doc layer.
Use `describe.skipIf(process.platform === 'win32', ...)` and `test.skipIf(process.platform === 'win32', ...)`. Tests still run on macOS and Linux unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
e362b0ae2f |
v1.37.0.0 feat: split-engine gbrain (remote MCP brain + local PGLite for code) (#1500)
* feat(gbrain): add lib/gbrain-local-status classifier with 5-state engine status + 60s cache
Foundation for split-engine gbrain: shared classifier used by both
bin/gstack-gbrain-detect (preamble probe) and bin/gstack-gbrain-sync.ts
(orchestrator SKIP-when-not-ok). Single source of truth.
Probes via `gbrain sources list --json` and classifies stderr against the
same patterns lib/gbrain-sources.ts:66-67 already uses ("Cannot connect to
database", "config.json"). Returns one of: ok, no-cli, missing-config,
broken-config, broken-db. Defensive default: unrecognized failures
classify as broken-config so the raw stderr can be surfaced upstream.
Cache at ~/.gstack/.gbrain-local-status-cache.json keyed on
{home, path_hash, gbrain_bin_path, gbrain_version, config_mtime, config_size}
with 60s TTL. Cache invalidates on any invariant change. --no-cache option
busts the cache for callers that just mutated state (/setup-gbrain,
/sync-gbrain after init/migration).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
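The cache rule described above (60s TTL, invalidate when any keyed invariant changes, `--no-cache` bypass) can be sketched as a pure lookup. The key fields are the ones named in the commit message; their value types, the `CacheEntry` shape, and the `readCachedStatus` name are assumptions for illustration, and the real lib/gbrain-local-status.ts also handles reading and writing the JSON file.

```typescript
type LocalStatus =
  | "ok" | "no-cli" | "missing-config" | "broken-config" | "broken-db";

// Field names come from the commit message; the value types are assumptions.
interface CacheKey {
  home: string;
  path_hash: string;
  gbrain_bin_path: string;
  gbrain_version: string;
  config_mtime: number;
  config_size: number;
}

// Hypothetical on-disk entry shape.
interface CacheEntry {
  key: CacheKey;
  status: LocalStatus;
  written_at_ms: number;
}

const TTL_MS = 60_000; // 60s TTL

function sameKey(a: CacheKey, b: CacheKey): boolean {
  return (
    a.home === b.home &&
    a.path_hash === b.path_hash &&
    a.gbrain_bin_path === b.gbrain_bin_path &&
    a.gbrain_version === b.gbrain_version &&
    a.config_mtime === b.config_mtime &&
    a.config_size === b.config_size
  );
}

/** Returns the cached status, or null when the caller must re-probe. */
function readCachedStatus(
  entry: CacheEntry | null,
  current: CacheKey,
  nowMs: number,
  noCache = false,
): LocalStatus | null {
  if (noCache || entry === null) return null; // bypass or nothing cached
  const age = nowMs - entry.written_at_ms;
  if (age < 0 || age >= TTL_MS) return null; // expired, or clock went backwards
  return sameKey(entry.key, current) ? entry.status : null; // any change busts it
}
```

Keying on the config file's mtime and size means an edit to the config invalidates the entry without waiting out the TTL, which is presumably why those fields sit alongside the binary path and version.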
* refactor(gbrain): rewrite gstack-gbrain-detect bash→TS + add gbrain_local_status field
Replaces the bash detect helper with a bun shebang script sharing the
gbrain_local_status classifier from lib/gbrain-local-status.ts with the
sync orchestrator. Single source of truth for engine-status classification
between preamble-probe and orchestrator-skip paths.
Filename stays gstack-gbrain-detect (no .ts extension) so existing skill
preamble callers shell out unchanged. Shebang `#!/usr/bin/env -S bun run`
resolves bun at runtime.
Output is key/type backward-compatible with the bash version per plan
codex #5: the 9 pre-existing keys (gbrain_on_path, gbrain_version,
gbrain_config_exists, gbrain_engine, gbrain_doctor_ok, gbrain_mcp_mode,
gstack_brain_sync_mode, gstack_brain_git, gstack_artifacts_remote) stay
identical in name + type + value semantics. One new key added:
gbrain_local_status (5-state string enum).
Updates the existing schema regression at test/gstack-gbrain-detect-mcp-mode.test.ts
to include the new key. Adds test/gbrain-detect-shape.test.ts asserting
the regression contract for future changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(gbrain): orchestrator SKIP when local engine not ok + remote-http transcripts via artifacts pipeline
Two changes in the sync orchestrator, both per plan D11/D12:
1. bin/gstack-gbrain-sync.ts: runCodeImport + runMemoryIngest call
localEngineStatus() (shared classifier from lib/gbrain-local-status.ts).
When status is not 'ok', return a SKIP stage result with a clear reason
instead of crashing with "source registration failed: gbrain not
configured". Brain-sync stage runs regardless — it doesn't depend on
local engine. dry-run preview path is gated above the check so it
continues to show would-do steps even when the engine is broken.
2. bin/gstack-memory-ingest.ts: when gbrain MCP is registered as
remote-http (Path 4), persist staged transcripts to
~/.gstack/transcripts/run-<pid>-<ts>/ instead of the ephemeral
~/.gstack/.staging-ingest-<pid>-<ts>/ tmp dir, and SKIP the local
`gbrain import` call entirely. The artifacts pipeline (gstack-brain-sync
push to git, brain admin pulls and indexes) handles routing to the
remote brain. Local PGLite (when present via Step 4.5) stays code-only.
State recording still happens — prepared pages get their mtime+sha256
stamped under remote-http mode so the next /sync-gbrain doesn't
re-stage them. Cleanup is skipped intentionally so the persisted dir
survives until gstack-brain-sync moves it.
Adds test/gbrain-sync-skip.test.ts covering 5 SKIP scenarios (broken-db,
broken-config, no-cli, missing-config, ok pass-through). All 25
sync-related unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(gbrain): v1.34.0.0 migration notice + transcripts allowlist for artifacts pipeline
Per plan D5 + D11. Two pieces of the split-engine rollout:
1. gstack-upgrade/migrations/v1.34.0.0.sh — prints a one-time
discoverability notice for existing Path 4 (remote-http MCP) users
whose machine has no local engine yet. Tells them about /setup-gbrain
Step 4.5 (the new local-PGLite opt-in). Silent for everyone else.
User can suppress permanently via `gstack-config set
local_code_index_offered true`. Touchfile at
~/.gstack/.migrations/v1.34.0.0.done makes it idempotent.
2. bin/gstack-artifacts-init — adds `transcripts/run-*/*.md` and
`transcripts/run-*/**/*.md` to the managed allowlist so the
gstack-memory-ingest persistent staging dir (used in remote-http
mode per D11) gets pushed to the artifacts repo. Brain admin's
pull job then indexes transcripts into the remote brain.
Privacy class: behavioral (matches transcript content).
Adds test/gstack-upgrade-migration-v1_34_0_0.test.ts with 5 cases:
state match, no-MCP, local-config-present, opt-out, and idempotency.
All 5 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(gbrain): /setup-gbrain Step 1.5/4.5 + /sync-gbrain Step 1.5 templates
Per plan D4, D10, D11, D12. Wires the skill prose to the new
split-engine flow + classifier introduced in earlier commits.
setup-gbrain/SKILL.md.tmpl:
- Step 1: detect output description now includes the v1.34.0.0
gbrain_local_status field (5 values).
- Step 1.5 (NEW): broken-db / broken-config remediation. AskUserQuestion
with 4 options — Retry / Switch to PGLite / Switch brain mode / Quit
(plan D4). Retry is recommended first since broken-db often = transient
Postgres outage. PGLite is explicitly one-way + destructive (moves
existing config to ~/.gbrain/config.json.gstack-bak-<ts>); rollback on
init failure restores the .bak (plan D7).
- Step 4d → Step 4.5 (NEW): in Path 4, after the verify step, offer
local PGLite for code search. AskUserQuestion Yes/No (plan D10/D11).
Yes path runs gstack-gbrain-install + `gbrain init --pglite --json`
with the same rollback-safe sequence. No path skips Steps 3/4/5/7.5.
- Step 10 verdict (Path 4): adds "Code search" row reflecting Step 4.5
choice. Updates "Transcripts" row to describe the new D11 routing
(artifacts repo → remote brain).
sync-gbrain/SKILL.md.tmpl:
- Step 1 split-engine prose: corrects the prior misleading claim that
"memory routes through whatever setup-gbrain configured, including
remote-MCP" (codex finding #3). Memory stage shells out to local
`gbrain import` in local-stdio mode; in remote-http mode it persists
to ~/.gstack/transcripts/ for the artifacts pipeline.
- Step 1.5 (NEW): local-engine pre-flight. STOP on no-cli, broken-config,
broken-db. Soft skip (continue with code+memory SKIP) on
missing-config + remote-http per plan D12. Surfaces actionable user
remediation message instead of the orchestrator crashing two stages
with ERR.
Regenerated SKILL.md for all hosts (claude, kiro, opencode, slate,
cursor, openclaw, hermes, gbrain). All 712 skill-validation + gen-skill-docs
tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(gbrain): .bak-rollback contract for Step 1.5 / 4.5 init failure path
Per plan D7 (rollback semantics) and codex #10 (rollback scope). The
/setup-gbrain skill instructs the model to follow a specific shell
sequence when running `gbrain init --pglite` against an existing
config:
1. mv ~/.gbrain/config.json ~/.gbrain/config.json.gstack-bak-<ts>
2. gbrain init --pglite --json
3. on non-zero exit: mv .bak back; surface error
This test verifies that contract using a fake `gbrain` binary that
fails on init. Three cases:
- FAILURE: gbrain init exits non-zero → broken config restored to
original path, no leftover .bak.
- SUCCESS: gbrain init exits 0 → new config in place, .bak survives
for audit (user reviews + deletes manually).
- SCOPE: any partial PGLite directory at ~/.gbrain/pglite/ is NOT
auto-cleaned. We only promise to restore config.json; PGLite
cleanup is the user's call (codex #10).
If the skill template rewrites this sequence in a future change, this
test should fail until the test's shell is updated too. That's the
point — keep the test and the skill template aligned.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
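The three-step contract above can be sketched in TypeScript as shown below. This is an illustration only: the skill itself runs a shell sequence, and `initWithRollback` plus its `runInit` callback (a stand-in for `gbrain init --pglite --json`) are hypothetical names introduced for the example.

```typescript
import { existsSync, renameSync } from "node:fs";

interface InitOutcome {
  ok: boolean;
  backupPath: string | null; // surviving .bak path, if any
}

// Hypothetical helper. `runInit` stands in for `gbrain init --pglite --json`
// and returns the process exit code.
function initWithRollback(configPath: string, runInit: () => number, ts: string): InitOutcome {
  let backupPath: string | null = null;
  if (existsSync(configPath)) {
    backupPath = `${configPath}.gstack-bak-${ts}`;
    renameSync(configPath, backupPath); // step 1: move the existing config aside
  }
  const exitCode = runInit(); // step 2: run init
  if (exitCode !== 0) {
    if (backupPath !== null) {
      renameSync(backupPath, configPath); // step 3: restore the original config
    }
    return { ok: false, backupPath: null };
  }
  // Success: the backup is left in place for the user to review and delete.
  return { ok: true, backupPath };
}
```

Note that the sketch restores only the config file, matching the SCOPE case: nothing else that init may have created is touched.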
* test(gbrain): periodic E2E for /setup-gbrain Path 4 + Step 4.5 Yes flow
End-to-end coverage of the new opt-in question via runAgentSdkTest.
Stubs the MCP endpoint at /tools/list with a 200 response carrying a
fake gbrain v0.32.3.0 serverInfo, and fakes the gbrain + claude CLIs
so init writes a PGLite config and mcp add succeeds. Asserts the model:
1. invokes gstack-gbrain-install (Step 4.5 Yes branch)
2. invokes `gbrain init --pglite --json`
3. writes a working ~/.gbrain/config.json with engine=pglite
4. registers the remote MCP via `claude mcp add --transport http`
5. never leaks the bearer token to CLAUDE.md
Classified as periodic-tier per plan D6 (codex #12 flagged AgentSDK
flakiness; gate-tier coverage of the split-engine behavior lives in the
deterministic unit tests at gbrain-local-status.test.ts and
gbrain-sync-skip.test.ts). Touchfile fires the test when the skill
template, install/verify/init helpers, the local-status classifier, or
the agent-sdk-runner harness changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(gbrain): bump migration to v1.35.0.0 after main merge
main shipped v1.34.0.0 (factory-export submodule) + v1.34.1.0 (update-check
hardening) while this branch was in flight. The migration file I named
v1.34.0.0.sh now belongs at v1.35.0.0 — the next minor on top of main,
matching the scale of split-engine work (new lib + orchestrator skip +
template overhaul + transcripts routing).
Renames the migration script and its test file; updates all internal
version references in both files. Behavior unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf(gbrain): memoize gbrain resolution + use --fast doctor in detect
Cuts detect's wall time substantially by sharing fork-exec results
between the helper that walks the JSON output and the localEngineStatus
classifier from lib/gbrain-local-status.ts.
Before: detect made 2x `command -v gbrain` calls (one in detect's
detectGbrain, one in the classifier's resolveGbrainBin) and 2x
`gbrain --version` calls. With memoization keyed on PATH, both
collapse to one fork each (~400ms saved per skill preamble).
Also adds `--fast` to the `gbrain doctor --json` call in detect so a
broken-db config (Garry's repro) doesn't burn a full 5s timeout on the
doctor's DB-connection check. The classifier still probes the DB
directly via `gbrain sources list --json` for engine reachability —
that's `gbrain_local_status`, separate from the coarse
`gbrain_doctor_ok` summary flag.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
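The PATH-keyed memoization described above might look something like this. It is a sketch under assumptions: `memoizeByPath` and the `resolve` callback are hypothetical names, and the real code may cache differently.

```typescript
// Hypothetical sketch: `resolve` stands in for an expensive lookup such as
// forking `command -v gbrain` or `gbrain --version`.
function memoizeByPath<T>(resolve: (pathEnv: string) => T): (pathEnv: string) => T {
  const cache = new Map<string, T>();
  return (pathEnv: string): T => {
    if (cache.has(pathEnv)) {
      return cache.get(pathEnv) as T; // same PATH: reuse the earlier result
    }
    const value = resolve(pathEnv);
    cache.set(pathEnv, value);
    return value;
  };
}
```

Keying on the PATH string means a caller that changes PATH mid-process gets a fresh lookup rather than a stale binary location.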
* test(gbrain): relax E2E assertions to smoke-test contract
Per codex #12 (AgentSDK harness is non-deterministic): the E2E now
asserts the model followed the split-engine path WITHOUT requiring a
specific subcommand sequence. Three assertions:
1. AskUserQuestion was called (model reached interactive branches)
2. At least one of {gstack-gbrain-install, `gbrain init --pglite`,
`claude mcp add`} fired (model followed the skill, not a no-op)
3. The fake bearer token never leaked to CLAUDE.md (security regression)
Deterministic per-step coverage of the same flow lives in the gate-tier
unit tests (gbrain-local-status, gbrain-sync-skip, init-rollback,
upgrade-migration). The E2E exists to catch the "model can't follow
the skill at all" regression class, not to pin the exact tool sequence.
Test passes in 280s against the live Agent SDK.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(version): bump CLI smoke-test timeout to 15s (flaky at 5s under load)
The gstack-next-version integration smoke test spawns a child process
that does git operations + sibling-worktree probing. Wall time hovers
4-5s on M-series Macs; flakes at exactly 5001-5002ms when the test
suite runs under load (bun's parallel scheduling). Bumping per-test
timeout to 15s eliminates the flake without changing test logic.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v1.37.0.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
40e34deb7a |
v1.35.0.0 feat: add /document-generate skill + enhance /document-release with Diataxis coverage map (#1477)
* feat(document-release): add Diataxis coverage map, diagram drift detection, and docs debt tracking
Inspired by @doodlestein's documentation-website skill. Three key ideas incorporated:
1. Step 1.5: Coverage Map (Blast-Radius Analysis). Before editing any docs, scan the diff for new public surface and assess documentation coverage across Diataxis quadrants (reference/how-to/tutorial/explanation). Flags gaps without auto-generating content.
2. Architecture diagram drift detection: extracts entity names from ASCII/Mermaid diagrams and cross-references against the diff to catch stale diagrams.
3. Enhanced CHANGELOG sell test: Diataxis rubric scoring (0-3) replaces the subjective 'would a user want this?' check.
4. Documentation Debt section in PR body: surfaces coverage gaps and diagram drift as actionable items for future work.
All changes are audit-only: the skill flags what's missing, never auto-generates missing documentation pages. Stays in its lane as a post-ship updater.
Co-Authored-By: Hermes Agent <agent@nousresearch.com>
* feat(document-generate): add Diataxis documentation generation skill
New /document-generate skill, the companion to /document-release. While /document-release audits and fixes existing docs post-ship, /document-generate writes missing documentation from scratch using the Diataxis framework. Inspired by doodlestein documentation-website-for-software-project skill.
Co-Authored-By: Hermes Agent <agent@nousresearch.com>
* chore(docs): regenerate gstack/llms.txt with /document-generate entry
CI's check-freshness step ran gen:skill-docs and found llms.txt stale: the index wasn't regenerated when /document-generate was added in the preceding commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(docs): regen document-generate/SKILL.md after merging main
Main brought in the Non-ASCII characters directive in the AskUserQuestion Format resolver (scripts/resolvers/preamble/generate-ask-user-format.ts). Regenerating document-generate/SKILL.md propagates the new section into the generated output. check-freshness should now pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(CLAUDE.md): add workflow for fork PRs from garrytan-agents
Fork PRs from non-collaborators don't get base-repo secrets passed to their CI workflows, so eval/E2E jobs fail with empty-env auth. New section: when checking out a PR from garrytan-agents, push the branch to garrytan/gstack and re-target the PR from there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: sync project docs for v1.35.0.0 + bump VERSION
- README.md: add /document-generate to skills table (Technical Writer category) + install-command skill lists
- CLAUDE.md: add document-generate/ to project structure tree
- SKILL.md.tmpl + regenerated SKILL.md: add /document-generate routing line ("write docs from scratch")
- VERSION: 1.34.0.0 → 1.35.0.0 (MINOR: new skill + enhancement)
CHANGELOG entry deferred to /ship.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v1.35.0.0)
CHANGELOG entry for the document-generate skill + document-release Diataxis enhancements. package.json synced to VERSION (drift repair after merging main which had bumped pkg to 1.34.2.0).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: generate /document-generate Diataxis docs (tutorial + how-to + explanation)
Fills the documentation debt items flagged by /document-release in PR #1477: critical-gap tutorial coverage and common-gap explanation coverage for the new /document-generate skill.
Quadrants: tutorial, how-to, explanation (reference already covered by document-generate/SKILL.md).
- docs/tutorial-document-generate.md (1009 words): newcomer 90-second flow
- docs/howto-document-a-shipped-feature.md (770 words): post-ship audit + fill workflow
- docs/explanation-diataxis-in-gstack.md (1106 words): why Diataxis, trade-offs, alternatives
- README.md: links the three docs from the /document-generate skills-table row
All cross-links verified: every Related section points at an existing file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Hermes Agent <agent@nousresearch.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b9371d716e |
v1.34.2.0 fix wave: /codex review on CLI 0.130+, /investigate learnings, /sync-gbrain on Supabase (3 community-reported bugs) (#1478)
* fix(learnings): accept type:"investigation" in gstack-learnings-log
The /investigate skill instructed agents to log learnings with type:"investigation", but bin/gstack-learnings-log:22 rejected anything not in [pattern, pitfall, preference, architecture, tool, operational]. Every investigation run exited 1 with an error on stderr, and the learning was dropped without any notice to the user.
Fix: add 'investigation' to ALLOWED_TYPES.
Regression test: round-trips a learning with type:"investigation" and asserts exit 0 + file write; second test reads investigate/SKILL.md.tmpl and asserts it emits the literal type:"investigation" string, guarding the template/validator contract at both ends.
Fixes #1423. Reported by diogolealassis.
* fix(gbrain): engine detection survives gbrain ≥0.25 schema + non-zero doctor exit
freshDetectEngineTier() in lib/gstack-memory-helpers.ts returned engine: "unknown" for every Supabase user on gbrain ≥0.25. Two stacking bugs:
1. execSync("gbrain doctor --json --fast 2>/dev/null") threw on non-zero exit. gbrain doctor exits 1 whenever health_score < 100, which is essentially every fresh install due to resolver_health warnings. The JSON output never reached the parser.
2. gbrain ≥0.25 shipped schema_version:2 doctor output that dropped the top-level 'engine' field entirely.
Result: every /sync-gbrain on Supabase logged 'engine=unknown' and skipped all sync stages silently.
Fix:
- Replace execSync with execFileSync (no shell, no bash-specific 2>/dev/null redirect; portable to Windows).
- Recover stdout from the thrown error object so non-zero exits still parse.
- Fall back to reading gbrain's config.json (respecting GBRAIN_HOME env var, defaulting to ~/.gbrain/config.json) when doctor output doesn't surface an engine field.
- Add logGbrainError() helper that appends one-line JSONL to ~/.gstack/.gbrain-errors.jsonl on parse failure, so future regressions leave a forensic trail.
The "supabase" tier here means "remote postgres" in practice: gbrain config uses engine:"postgres" for both real Supabase and any other remote postgres (e.g. local-postgres-for-testing). Downstream sync code treats them identically, so the label compression is intentional and documented inline.
Regression test: existing detectEngineTier suite now isolates HOME + GBRAIN_HOME + PATH to temp dirs (closes a flake source where the prior tests would read whatever was on the reviewer's machine). New test forces gbrain off PATH, writes a synthetic config.json with engine:"postgres", asserts detectEngineTier() returns engine:"supabase".
Fixes #1415. Patch shape contributed by Shiv @shivasymbl (tested on gstack v1.31.0.0 + gbrain v0.31.3 + Supabase).
* fix(codex): /codex review works on Codex CLI ≥0.130.0
Codex CLI 0.130.0 made [PROMPT] and --base <BRANCH> mutually exclusive at argv level. Step 2A of codex/SKILL.md.tmpl had always passed both (the filesystem boundary prefix as the prompt argument + the base branch), so every /codex review call died with:
error: the argument '[PROMPT]' cannot be used with '--base <BRANCH>'
Fix: split Step 2A into two paths.
Default (no custom user instructions): bare 'codex review --base <base>'. Codex's review prompt is internally diff-scoped, so the model focuses on the changes against base. The filesystem boundary prefix is dropped here because Codex 0.130 has no documented system-prompt config key (probed -c 'system_prompt="..."' against 0.130; the flag is silently accepted but the value isn't applied). Skill files under .claude/ and agents/ are public, so this is a token-efficiency concern, not a safety one.
Custom instructions (/codex review <focus>): route through codex exec with the diff written to a tempfile, inlined into the prompt between explicit DIFF_START / DIFF_END markers. The boundary is preserved here because codex exec isn't auto-scoped to the diff. The DIFF_START/END delimiters tell the model where data ends and instructions resume, which materially reduces prompt-injection hijack rates when the diff contains adversarial content.
Note on bash semantics: codex's earlier review flagged the exec route as "command injection via $_DIFF interpolation." That framing is wrong: bash parameter expansion does not re-evaluate $(...) or backticks inside the expanded value, so a diff containing $(rm -rf /) is plain string data to codex exec. The real risk is prompt injection (model-side, not shell-side), which the DIFF_START/END pattern mitigates.
Regression tests in test/codex-hardening.test.ts assert across BOTH codex/SKILL.md.tmpl AND the generated codex/SKILL.md:
1. No 'codex review' invocation line combines a quoted-string OR variable positional argument with --base.
2. Step 2A still contains either bare 'codex review --base' OR 'codex exec' (guards against accidental deletion of both fix paths).
Fixes #1428. Reported by Stashub.
* test: raise timeouts for slow integration tests
Two test files were timing out at the default 5s on developer machines, both pre-existing on origin/main but unrelated to this branch's bug fixes:
- test/gstack-artifacts-init.test.ts: 13 tests spawning real subprocesses via fake gh/glab/git shims in PATH. bun's fork+exec overhead pushed these past 5s consistently. Added a local test-wrapper that aliases test() with a 30s timeout (matches the brain-sync.test.ts pattern already in the repo).
- test/gstack-next-version.test.ts: one integration smoke test that spawns 'bun run ./bin/gstack-next-version' and parses the resulting JSON. The subprocess does a 'gh pr list' against the live GitHub API to enumerate claimed version slots. Network latency makes 5s tight; raised this single test to 30s.
No production code changed. The tests already passed deterministically once given enough wall-clock time.
* chore: bump version and changelog (v1.34.2.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
|
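The engine-detection fallback from the gbrain fix in this wave (doctor output first, then config.json) can be sketched as a pure function. This is an assumption-heavy illustration, not the code in lib/gstack-memory-helpers.ts: `detectEngineTierSketch` and `readEngineField` are hypothetical names, the caller is assumed to have already recovered doctor stdout and read the config file, and the "pglite" and "supabase" input values are guesses. Only the postgres-to-"supabase" mapping and the "unknown" result are stated in the commit message.

```typescript
type EngineTier = "pglite" | "supabase" | "unknown";

// Pulls a top-level string `engine` field out of JSON text, if present.
function readEngineField(jsonText: string | null): string | null {
  if (jsonText === null) return null;
  try {
    const parsed: unknown = JSON.parse(jsonText);
    if (typeof parsed === "object" && parsed !== null && "engine" in parsed) {
      const engine = (parsed as { engine: unknown }).engine;
      return typeof engine === "string" ? engine : null;
    }
  } catch {
    // Malformed JSON is treated the same as "no engine field".
  }
  return null;
}

// Hypothetical sketch: doctor output wins; config.json is the fallback.
function detectEngineTierSketch(doctorStdout: string | null, configText: string | null): EngineTier {
  const engine = readEngineField(doctorStdout) ?? readEngineField(configText);
  if (engine === "postgres" || engine === "supabase") return "supabase"; // remote postgres
  if (engine === "pglite") return "pglite"; // assumption: local engine label
  return "unknown";
}
```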
||
|
|
386fe518f9 |
v1.34.1.0 fix: gstack-update-check resists stale GitHub raw CDN + adds semver-order guard (#1475)
* fix: gstack-update-check resolves remote VERSION via SHA-pinned URL
Replace branch-raw fetch with git ls-remote + SHA-pinned raw URL. Add semver-order guard via sort -V so REMOTE < LOCAL stays silent instead of emitting a backwards UPGRADE_AVAILABLE line. Fence git ls-remote with GIT_TERMINAL_PROMPT=0 + 5s low-speed timeout. Honor explicit GSTACK_REMOTE_URL overrides for test fixtures and private mirrors.
3 new tests cover stale-CDN regression, multi-segment 1.9 vs 1.10 both directions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore: bump version and changelog (v1.34.1.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
|
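The semver-order guard in this commit is implemented in shell with `sort -V`. As an illustration of the same idea in TypeScript, the sketch below compares dotted numeric versions segment by segment. `compareVersions` and `shouldAnnounceUpgrade` are hypothetical names, and the sketch assumes purely numeric segments, which is narrower than what `sort -V` accepts.

```typescript
// Compares dotted numeric versions; missing segments count as 0.
// Returns -1 if a < b, 1 if a > b, 0 if equal.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// Only a strictly newer remote should produce an upgrade notice.
function shouldAnnounceUpgrade(local: string, remote: string): boolean {
  return compareVersions(remote, local) > 0;
}
```

A plain string comparison would rank "1.9" above "1.10", which is the multi-segment case the new tests pin in both directions.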
||
|
|
0c88517a0f |
v1.34.0.0 feat: gstack consumable as submodule (factory-export API + AUTH_TOKEN env + import.meta.main gate) (#1472)
* feat(config): add resolveGstackHome, resolveChromiumProfile, cleanSingletonLocks
Three new exported helpers in browse/src/config.ts:
- resolveGstackHome(): honors GSTACK_HOME env, falls back to os.homedir()/.gstack. Matches the existing convention in browse/src/telemetry.ts:26 and browse/src/domain-skills.ts:66.
- resolveChromiumProfile(explicit?): explicit arg wins -> CHROMIUM_PROFILE env -> resolveGstackHome()/chromium-profile. Lets gbrowser pass per-workspace profile paths through ServerConfig instead of relying on ambient env state.
- cleanSingletonLocks(dir): removes SingletonLock/Socket/Cookie via safeUnlinkQuiet. Defensive guard refuses to operate unless dir basename is 'chromium-profile' OR matches explicit CHROMIUM_PROFILE env value, preventing accidental deletion in unrelated directories.
Extends browse/test/config.test.ts with 12 tests covering env precedence, guard behavior, ENOENT swallowing, and CHROMIUM_PROFILE override.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(security-classifier): TDZ when claude CLI is missing from PATH
The checkTranscript Promise executor in browse/src/security-classifier.ts referenced `finish()` at the !claude early-return guard before declaring it 5 lines later. JavaScript throws ReferenceError: Cannot access 'finish' before initialization (TDZ) for that path, but the path is only reachable when resolveClaudeCommand returns null inside the spawn block (a TOCTOU window vs. the outer checkHaikuAvailable cache).
Fix: hoist `let stdout = ''`, `let done = false`, and `const finish` block above `const claude = resolveClaudeCommand()` so finish is in scope before any reference to it. Behavior is identical when claude is on PATH; the fix only matters for the dormant missing-CLI degraded path.
Adds browse/test/security-classifier-tdz.test.ts as the regression guard: clears PATH + override env vars, calls checkTranscript, asserts the result serializes with degraded:true and a meaningful reason field.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(browser-manager): isCustomChromium gate + per-workspace profile + lock cleanup
Three fold-ins so gbrowser can become a thin overlay instead of forking browse-server:
- Export isCustomChromium(): detects custom Chromium builds that bake the extension in as a component extension. Prefers explicit GSTACK_CHROMIUM_KIND=custom-extension-baked signal; falls back to GSTACK_CHROMIUM_PATH substring containing 'GBrowser' / 'gbrowser'. Gates the --load-extension push at launchHeaded so we don't trigger ServiceWorkerState::SetWorkerId DCHECK when two copies of the same service worker race to register.
- Swap hardcoded path.join(HOME, '.gstack', 'chromium-profile') in launchHeaded for resolveChromiumProfile() so phoenix can pass a per-workspace profile via CHROMIUM_PROFILE env (one daemon per gbd workspace, each with a distinct profile dir).
- Call cleanSingletonLocks(userDataDir) immediately after mkdirSync. Chromium's ProcessSingleton refuses to start when stale SingletonLock/Socket/Cookie files survive a SIGKILL or hard crash; pre-launch cleanup defends against the crash case. Safe under external coordination (gbd.lock for gbrowser, single-instance CLI check for gstack).
The existing .auth.json write at L291-302 is preserved: extensions still need it for bootstrap even when component-baked.
Adds browse/test/browser-manager-custom-chromium.test.ts with 8 tests covering both the env-kind and path-substring signals plus stock / playwright-bundled Chromium negative cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(server): factory-export API surface + import.meta.main gate
Surfaces the embedder API gbrowser (phoenix) needs to consume gstack as a submodule, and gates module-load side effects so the file is safe to import without auto-starting a daemon.
Changes to browse/src/server.ts:
- AUTH_TOKEN now honors process.env.AUTH_TOKEN (trimmed) before falling back to crypto.randomUUID(). Whitespace-only values are rejected so the security boundary can't be silently weakened.
- New exported types: ServerConfig and ServerHandle. ServerConfig documents the full factory contract (authToken, browsePort, idleTimeoutMs, config, browserManager, chromiumProfile, xvfb, proxyBridge, startTime, beforeRoute). ServerHandle documents the return shape (fetchLocal, fetchTunnel, shutdown, stopListeners). Caller-owned lifecycle annotations on xvfb and proxyBridge prevent double-close bugs from surprise ownership.
- New exported function: resolveConfigFromEnv() builds a ServerConfig-shaped object from process.env for CLI use. Embedders construct their own ServerConfig explicitly.
- start() is now exported. Embedders can call it with env vars set as a v1 escape hatch until full buildFetchHandler extraction lands.
- Signal handlers (SIGINT, SIGTERM, Windows exit, uncaughtException, unhandledRejection) and the auto-kickoff at module bottom are now wrapped in `if (import.meta.main)`. CLI path is unchanged. Embedders register their own handlers.
- shutdown() and emergencyCleanup() now call cleanSingletonLocks(resolveChromiumProfile()) instead of inline path+loop. Single implementation, defensive guard, honors per-workspace CHROMIUM_PROFILE.
New tests:
- browse/test/server-no-import-side-effects.test.ts: spawns a fresh Bun subprocess that imports server.ts, asserts no signal handlers registered, no state-dir populated. Guards the core refactor invariant from regression.
- browse/test/server-factory.test.ts: 12 tests covering AUTH_TOKEN env behavior (honored, whitespace-rejected, trimmed), preserved exports (TUNNEL_COMMANDS, canDispatchOverTunnel), and ServerConfig/ServerHandle type compatibility.
Deferred to follow-up PR: full buildFetchHandler extraction that hoists the 13 module-level mutables + helpers into a factory closure. Phoenix can ship v0.6.0.0 against the start()+env surface today; the cleaner factory comes next.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: harden auth-token validation, TDZ try/catch, lockfile path safety
Three security hardening fixes from /ship adversarial review:
1. AUTH_TOKEN unicode-whitespace bypass (server.ts:67-83). Old: `process.env.AUTH_TOKEN?.trim() || randomUUID()` only stripped ASCII whitespace. A misconfigured embedder shipping AUTH_TOKEN=$'\uFEFF' (BOM) or $'\u200B' (zero-width space) would silently get a one-character bearer secret. New `sanitizeAuthToken()` strips all unicode whitespace via regex and requires >= 16 chars after stripping; anything shorter falls back to crypto.randomUUID(). Same sanitizer used by `resolveConfigFromEnv()` so the embedder path is hardened too.
2. security-classifier.ts checkTranscript safety net. `resolveClaudeCommand()` and `spawn()` can throw under transient conditions (PATH probe failure, posix_spawn ENOMEM). Old code let the throw propagate and rejected the Promise with a raw exception. Now wrapped in try/catch that calls finish() with a degraded signal, matching the graceful-degradation contract the layer already promises for missing-CLI / exit-nonzero / parse-error.
3. cleanSingletonLocks defensive guard tightened (config.ts). Old: basename === 'chromium-profile' OR userDataDir === $CHROMIUM_PROFILE. The second branch was env-controlled and the first was bypassable by passing a relative path that resolved to chromium-profile via CWD drift. New guard: refuses relative paths outright, resolves both sides via path.resolve(), and only accepts the env-match path when $CHROMIUM_PROFILE is itself absolute.
Test updates: replace the old `.trim()` test with three new cases covering unicode-whitespace stripping, short-token rejection, and zero-width-only rejection (server-factory.test.ts).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v1.34.0.0)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
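The token hardening in this release (strip whitespace-like characters, require at least 16 characters, otherwise generate a random token) could look roughly like the sketch below. It is not the code in browse/src/server.ts: the exact character class is an assumption, as is stripping characters from the middle of the token rather than only the ends. The function name mirrors the one named in the commit message; everything else is illustrative.

```typescript
import { randomUUID } from "node:crypto";

// In JavaScript, \s already matches U+FEFF (BOM) but not the zero-width
// characters, so those are listed explicitly. Assumed character set.
const WHITESPACE_LIKE = /[\s\u200B-\u200D\u2060]/g;
const MIN_TOKEN_LENGTH = 16;

function sanitizeAuthToken(raw: string | undefined): string {
  const stripped = (raw ?? "").replace(WHITESPACE_LIKE, "");
  // Anything too short after stripping is replaced with a random token
  // rather than being accepted as a weak bearer secret.
  return stripped.length >= MIN_TOKEN_LENGTH ? stripped : randomUUID();
}
```

The length floor is what closes the hole: a value made only of invisible characters strips down to the empty string and can never pass as a token.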
||
|
|
dc6252d1df |
v1.33.2.0 fix: setup guards against Conductor worktree pollution of global install (#1446)
* fix(setup): skip Claude skill registration when run from a worktree of the global install
Add a guard before `ln -snf "$SOURCE_GSTACK_DIR" "$HOME/.claude/skills/gstack"` that detects whether the target already exists as a separate real directory. On macOS/BSD, `ln -snf SRC DST` does not replace a real DST: it creates DST/$(basename SRC) → SRC inside it. Running ./setup from each Conductor worktree of the gstack repo was leaking per-worktree child symlinks into the global install, which Claude Code then picked up as separate top-level skills.
The guard uses `cd ... && pwd -P` to resolve the existing real dir and compare against the source (mirroring setup's own `SOURCE_GSTACK_DIR` resolution). When they differ, prints a four-line remediation hint naming both paths and exits the Claude registration branch cleanly. Binaries still build locally.
The four other code paths through this branch are unchanged: fresh install, retarget an existing symlink, self-rerun where the existing dir resolves to the same source, and --local installs.
Includes 8 tests covering static guard placement, `pwd -P` resolution, the remediation message, a behavioral reproduction of the BSD `ln -snf` child-symlink bug, and every branch of the guard (skip on real-dir-elsewhere, allow on fresh, allow on existing symlink, allow on self-rerun).
* chore: bump version and changelog (v1.33.2.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
1a4f0c9c15 |
v1.33.1.0 fix(learnings): token-OR query + task-shaped retrieval in 3 long skills (#1442)
* fix(learnings): use token-OR matching in gstack-learnings-search --query
Split the query on whitespace into tokens; a learning matches if ANY
token appears as a substring in ANY of key/insight/files. Previously
the whole query was a single substring, so multi-word queries like
"debug investigation" only matched learnings whose insight contained
that exact contiguous phrase, which is usually nothing.
Whitespace-only query falls through to no-query (matches today's no-flag
behavior). Single-word queries behave exactly as before.
Adds test/gstack-learnings-search.test.ts: 3 assertions covering
multi-token, single-token, and no-query backwards compat.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
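A minimal sketch of the token-OR matching described above, assuming a learning record with `key`, `insight`, and `files` fields. The `Learning` interface and `matchesQuery` name are illustrative, and case-sensitive matching is an assumption the commit does not confirm.

```typescript
// Hypothetical record shape; only the three field names come from the commit message.
interface Learning {
  key: string;
  insight: string;
  files: string[];
}

// Returns true when ANY whitespace-separated token is a substring of ANY field.
function matchesQuery(learning: Learning, query: string): boolean {
  const tokens = query.split(/\s+/).filter((t) => t.length > 0);
  // A whitespace-only query falls through to the no-query behavior: match everything.
  if (tokens.length === 0) return true;
  const haystacks = [learning.key, learning.insight, ...learning.files];
  return tokens.some((token) => haystacks.some((h) => h.includes(token)));
}
```

With this sketch, the query `"debug investigation"` matches a learning whose insight mentions only `debug`, whereas a whole-phrase substring check would require the exact contiguous phrase.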
* feat(resolver): parameterized LEARNINGS_SEARCH with shell-injection guard
The {{LEARNINGS_SEARCH}} macro now accepts a query=KEYWORD argument that
gets interpolated as --query "<keyword>" into the generated bash. Empty
value falls through to no-query (principle of least surprise: a stray
{{LEARNINGS_SEARCH:query=}} placeholder gets today's behavior, not a
build failure). Pattern reuses the parameterized-macro parsing from
composition.ts. The 13 templates that don't pass a query stay
byte-identical in their generated SKILL.md output.
Shell-injection guard: the query value is whitelisted to
^[A-Za-z0-9 _-]+$ at gen-skill-docs time. Any $(), backticks,
semicolons, or quotes throw a loud build error instead of emitting
executable bash. Static template queries are safe by inspection;
this defends against future contributors writing dangerous values.
Adds 5 assertions to test/gen-skill-docs.test.ts covering no-args,
claude+query=foo bar on both cross-project and project-scoped branches,
codex host variant, empty value semantics, and shell-injection payloads
($(whoami), backticks, ;, &, ", \, $x) throwing build errors.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
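The whitelist guard above could look roughly like the following. The regular expression is the one quoted in the commit; the function name `renderLearningsSearchArgs` and the error wording are assumptions made for illustration, not the actual gen-skill-docs code.

```typescript
// Whitelist quoted in the commit message: letters, digits, space, underscore, hyphen.
const SAFE_QUERY = /^[A-Za-z0-9 _-]+$/;

// Hypothetical helper: turn a macro's query= value into the bash argument string.
function renderLearningsSearchArgs(query?: string): string {
  // Empty or missing value falls through to the no-query form.
  if (query === undefined || query === "") return "";
  if (!SAFE_QUERY.test(query)) {
    // Fail the build loudly instead of emitting executable bash.
    throw new Error(`unsafe LEARNINGS_SEARCH query: ${JSON.stringify(query)}`);
  }
  return `--query "${query}"`;
}
```

Because the check runs at generation time, a rejected value never reaches the generated SKILL.md; quoting alone would not be enough, since `$()` and backticks still expand inside double quotes.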
* feat(skills): task-shaped queries + mid-flow refresh in /investigate /qa /ship
The three long skills now pull learnings keyed to their theme at the
top, then re-pull at phase boundaries as work shifts to new sub-tasks.
Top-of-skill queries (6-7 token unions, token-OR matched):
- investigate: "debug investigation root cause hypothesis bug fix"
- qa: "qa testing bug regression flake fixture"
- ship: "release ship version changelog merge pr"
Mid-flow refresh blocks (concrete keyword recipe + worked examples):
- investigate: between Phase 1 (hypothesis) and Phase 2 (analysis),
keyed to the hypothesis noun. Examples: auth-cookie, session-expiry.
- qa: between Phase 7 (triage) and Phase 8 (fix loop), keyed to the
buggy component name. Examples: checkout-button, signup-form.
- ship: just before Step 12 (VERSION bump), keyed to the headline
feature. Examples: learnings-search, pacing, worktree-ship.
Keyword recipe enforces alphanumeric+hyphen only (no quotes, slashes,
dots, colons) so dynamic queries cannot inject shell metacharacters.
The other 13 short-lived skills keep the bare {{LEARNINGS_SEARCH}} form.
Backwards-compat verified via diff: their generated SKILL.md output is
byte-identical to before this change.
Golden ship fixtures regenerated to match the new ship/SKILL.md output.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
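The keyword recipe is an instruction to the agent rather than code, but its alphanumeric-plus-hyphen rule can be illustrated with a small normalizer. `toKeyword` is a hypothetical helper, and the lowercasing step is an assumption based on the worked examples (`checkout-button`, `signup-form`).

```typescript
// Hypothetical normalizer: reduce a free-form name to alphanumerics and hyphens.
function toKeyword(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // quotes, slashes, dots, colons, spaces become hyphens
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

// toKeyword("Checkout Button") returns "checkout-button"
```

Any output of this function also passes the `^[A-Za-z0-9 _-]+$` whitelist whenever it is non-empty, so a dynamically built query cannot carry shell metacharacters.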
* chore: bump version and changelog (v1.33.1.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* test: refresh codex+factory ship golden fixtures
Follow-up to
|
||
|
|
d21ba06b5a |
v1.33.0.0 feat: /sync-gbrain memory-stage batch-import refactor (D1-D8) + F6/F9 + signal cleanup (#1432)
* refactor: batch-import architecture (D1-D8) + F6 atomic state + F9 full-file hash
bin/gstack-memory-ingest.ts: rewrite memory ingest around `gbrain import <dir>`
batch path. Replaces per-file gbrainPutPage loop (~470s of subprocess startup
per cold run) with prepare-then-batch:
walkAllSources
-> preparePages: mtime-skip + optional gitleaks (--scan-secrets) + parse
-> writeStaged: mkdir -p per slug segment, hierarchical (D1)
-> snapshot ~/.gbrain/sync-failures.jsonl byte offset
-> runGbrainImport (async spawn) -> parseImportJson
-> readNewFailures: read appended bytes, map back to source paths (D7)
-> state.sessions[path] = {...} for files NOT in failed set
-> saveStateAtomic (F6) + cleanupStagingDir
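The snapshot and read-back steps in the pipeline above (D7) can be sketched as follows. The function names echo the commit message, but the bodies and the `FailureEntry` shape are assumptions; the real `sync-failures.jsonl` schema is not shown in the source.

```typescript
import * as fs from "node:fs";

// Hypothetical entry shape; the real sync-failures.jsonl schema is not shown here.
interface FailureEntry {
  path: string;
  error: string;
}

// Record the current size of the log so later reads can skip everything older.
function snapshotOffset(logPath: string): number {
  return fs.existsSync(logPath) ? fs.statSync(logPath).size : 0;
}

// Read only the bytes appended after `offset` and parse them as JSONL.
function readNewFailures(logPath: string, offset: number): FailureEntry[] {
  if (!fs.existsSync(logPath)) return [];
  const size = fs.statSync(logPath).size;
  if (size <= offset) return [];
  const buf = Buffer.alloc(size - offset);
  const fd = fs.openSync(logPath, "r");
  try {
    fs.readSync(fd, buf, 0, buf.length, offset);
  } finally {
    fs.closeSync(fd);
  }
  const entries: FailureEntry[] = [];
  for (const line of buf.toString("utf8").split("\n")) {
    if (line.trim() === "") continue;
    try {
      entries.push(JSON.parse(line) as FailureEntry);
    } catch {
      // Malformed lines are ignored in this sketch.
    }
  }
  return entries;
}
```

Taking the offset before the import starts means failures left over from earlier runs are never attributed to the current batch.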
Architecture decisions:
D1 hierarchical staging dir
D2 cut over, deleted gbrainPutPage entirely
D3 source-file gitleaks made opt-in via --scan-secrets (gstack-brain-sync
owns the cross-machine boundary; per-file scan was redundant ~470s tax)
D4 OK/ERR verdict (no DEGRADED tri-state)
D5 unified state schema (no separate skip-list)
D6 trust gbrain content_hash idempotency (no skip_reason bookkeeping)
D7 byte-offset snapshot of sync-failures.jsonl + per-source mapping
F6 saveState uses tmp+rename atomic write
F9 fileSha256 removes 1MB cap; full-file hash (no more silent tail-edit
misses on long partial transcripts)
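F6 and F9 from the list above can be illustrated with two short helpers. The names `saveStateAtomic` and `fileSha256` come from the commit message; the bodies below are a sketch under assumed behavior (JSON state, temp file in the same directory), not the actual implementation.

```typescript
import * as crypto from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

// F6 sketch: write to a temp file beside the target, then rename over it.
// A rename within one directory is atomic on POSIX filesystems, so readers
// see either the old state or the new one, never a partial write.
function saveStateAtomic(statePath: string, state: unknown): void {
  const tmpPath = path.join(
    path.dirname(statePath),
    `.${path.basename(statePath)}.${process.pid}.tmp`,
  );
  fs.writeFileSync(tmpPath, JSON.stringify(state, null, 2));
  fs.renameSync(tmpPath, statePath);
}

// F9 sketch: hash the whole file with no size cap, so an edit near the end
// of a long transcript still changes the digest.
function fileSha256(filePath: string): string {
  return crypto.createHash("sha256").update(fs.readFileSync(filePath)).digest("hex");
}
```

Reading the whole file into memory keeps the sketch short; a streaming hash would be the usual choice for very large transcripts.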
Signal handling: installSignalForwarder propagates SIGTERM/SIGINT to the
gbrain child process AND synchronously cleans the staging dir before
process.exit. Pre-fix, orchestrator timeouts left gbrain processes
orphaned holding the PGLite write lock (observed: 15-hour-CPU-time
orphan still alive a day later).
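A sketch of the signal forwarding described above, written against minimal interfaces so it can be exercised without real signals. `ChildLike`, `ProcessLike`, and the 128+signal exit codes are assumptions for illustration; only the function name `installSignalForwarder` appears in the source.

```typescript
type Signal = "SIGTERM" | "SIGINT";

// Minimal stand-ins for a spawned child and the current process (hypothetical).
interface ChildLike {
  kill(signal: Signal): boolean;
}
interface ProcessLike {
  on(signal: Signal, handler: () => void): unknown;
  exit(code: number): void;
}

// Forward termination signals to the child, clean up synchronously, then exit.
function installSignalForwarder(
  child: ChildLike,
  cleanup: () => void,
  proc: ProcessLike,
): void {
  const signals: Signal[] = ["SIGTERM", "SIGINT"];
  for (const signal of signals) {
    proc.on(signal, () => {
      child.kill(signal); // propagate first so the child can release its lock
      cleanup(); // synchronous staging-dir cleanup before exiting
      proc.exit(signal === "SIGINT" ? 130 : 143); // conventional 128 + signal number
    });
  }
}
```

The ordering matters: killing the child before exiting is what prevents the orphaned process holding the PGLite write lock.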
parseImportJson returns null on unparseable output (treated as ERR by
caller) instead of silently zeroing through.
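The null-on-unparseable contract can be sketched like this. The `ImportResult` fields are hypothetical, since the JSON that `gbrain import` emits is not shown in the source; the point is only that bad output yields `null` rather than a zeroed result.

```typescript
// Hypothetical result shape; gbrain's real import JSON is not shown in the source.
interface ImportResult {
  imported: number;
  failed: number;
}

// Return null for anything that is not a well-formed result object,
// so the caller can treat it as ERR instead of reading it as "0 imported".
function parseImportJson(stdout: string): ImportResult | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const { imported, failed } = parsed as Record<string, unknown>;
  if (typeof imported !== "number" || typeof failed !== "number") return null;
  return { imported, failed };
}
```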
gbrainAvailable() probes for the `import` subcommand instead of `put`.
Plan + review chain at /Users/garrytan/.claude/plans/purrfect-tumbling-quiche.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: orchestrator OK/ERR verdict parser for batch memory ingest
gstack-gbrain-sync.ts: memory-stage parser now picks [memory-ingest] ERR
lines preferentially over the latest [memory-ingest] line, strips the
prefix and any leading 'ERR: ' for cleaner summary output, and surfaces
'(killed by signal / timeout)' when the child exits with status=null.
Matches D4's OK/ERR contract: per-file failures (FILE_TOO_LARGE etc.)
show in the summary count but only system-level failures (gbrain crash,
process kill, missing CLI) mark the stage ERR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
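The memory-stage parsing rules in this commit can be sketched as a pure function. `summarizeMemoryStage` is a hypothetical name. The sketch also assumes that the last ERR line wins when there are several, and that a null status replaces the summary outright; the source confirms neither.

```typescript
const PREFIX = "[memory-ingest] ";

interface StageSummary {
  verdict: "OK" | "ERR";
  summary: string;
}

// Pick the line to show for the memory stage from the child's output and exit status.
function summarizeMemoryStage(output: string, status: number | null): StageSummary {
  // status === null means the child was killed by a signal or a timeout.
  if (status === null) {
    return { verdict: "ERR", summary: "(killed by signal / timeout)" };
  }
  const lines = output
    .split("\n")
    .filter((l) => l.startsWith(PREFIX))
    .map((l) => l.slice(PREFIX.length));
  const errLines = lines.filter((l) => l.startsWith("ERR"));
  // Prefer an ERR line over the latest line of any kind.
  const chosen =
    errLines.length > 0 ? errLines[errLines.length - 1] : lines[lines.length - 1];
  const summary = (chosen ?? "").replace(/^ERR:\s*/, "");
  const verdict = status === 0 && errLines.length === 0 ? "OK" : "ERR";
  return { verdict, summary };
}
```

Preferring the ERR line keeps a later progress message from masking the actual failure in the summary.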
* test: batch-ingest writer regressions + refresh golden ship fixtures
test/gstack-memory-ingest.test.ts: 5 new tests for the batch-import
architecture:
1. D1 hierarchical staging slug round-trip — asserts staged file lives
in transcripts/claude-code/<dir>/*.md, not flat at staging root
2. Frontmatter injection — asserts title/type/tags written into the
staged page's YAML block
3. D7 sync-failures.jsonl exclusion — files listed as failed by
gbrain do NOT get state-recorded; one of two test sessions lands,
the other stays un-ingested for retry next run
4. Missing-`import`-subcommand error path — when gbrain only advertises
legacy `put`, memory-ingest exits 1 with [memory-ingest] ERR
5. --scan-secrets opt-in path — verifies a dirty-source file is
skipped via the secret-scan match when the flag is on, while a
clean session in the same run still gets staged
Replaces the prior put-per-file shim with an import-batch shim. The
shim fails loudly (exit 99) if the new code ever regresses to per-file
`gbrain put` calls.
test/fixtures/golden/{claude,codex,factory}-ship-SKILL.md: refresh
golden baselines to match the current generated SKILL.md content after
the v1.31.0.0 AskUserQuestion fallback-clause deletion. Goldens were
stale from that release; test was failing on origin/main before this
PR. Caught by the /ship test pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* v1.33.0.0 docs: design doc, P2 perf TODOs, gbrain guidance block, changelog
docs/designs/SYNC_GBRAIN_BATCH_INGEST.md: full design doc with the 8
decisions (D1-D8), source-verified gbrain behaviors (content_hash
idempotency, frontmatter parity, path-authoritative slug, per-file
failure surface), measured performance vs plan target, F9 hash
migration one-time cliff note, and follow-up TODOs.
CLAUDE.md: append `## GBrain Search Guidance` block from /sync-gbrain
indicating this worktree's pin and how the agent should prefer gbrain
search over Grep for semantic queries.
TODOS.md: P2 `gbrain import` perf-on-large-staging-dirs investigation
(5,131 files take >10min in gbrain when 501 take 10s; likely N+1
SQL or auto-link reconciliation). P3 cache-no-changes-since-last-import
at the prepare-batch level for true no-op fast paths.
VERSION + package.json: bump to 1.33.0.0 (queue-aware via
bin/gstack-next-version — skipped v1.32.0.0 which is claimed by
sibling worktree garrytan/wellington / PR #1431).
CHANGELOG.md: v1.33.0.0 entry per the release-summary format.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: setup-gbrain/memory.md reflects opt-in per-file gitleaks
Per-file gitleaks scanning during memory ingest is now opt-in via
--scan-secrets (or GSTACK_MEMORY_INGEST_SCAN_SECRETS=1). Update the
user-facing reference doc so it stops claiming "every page passes
through gitleaks." Also corrects the /gbrain-sync → /sync-gbrain
command typo and the post-incident recovery section.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
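The opt-in described above could be resolved with a check like the following. The flag and environment variable names come from the commit message; `scanSecretsEnabled` and its signature are assumptions for illustration.

```typescript
// Hypothetical helper: per-file gitleaks scanning is off unless explicitly requested.
function scanSecretsEnabled(
  argv: string[],
  env: Record<string, string | undefined>,
): boolean {
  return (
    argv.includes("--scan-secrets") ||
    env["GSTACK_MEMORY_INGEST_SCAN_SECRETS"] === "1"
  );
}
```

In real use this would presumably be called with `process.argv.slice(2)` and `process.env`; with neither the flag nor the variable set, the scan is skipped.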
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|