gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-06-22 09:39:59 +02:00

Author	SHA1	Message	Date
Garry Tan	9fd03fae9e	v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077 ) * fix(gbrain): stop forcing GBRAIN_PREPARE on transaction-mode poolers (#1965) buildGbrainEnv auto-set GBRAIN_PREPARE=true whenever DATABASE_URL targeted port 6543, and the /sync-gbrain capability check exported it for the rest of the skill run. Both had the semantics inverted: gbrain auto-disables prepared statements on transaction-mode poolers because they break every write there ("prepared statement does not exist"); GBRAIN_PREPARE=true is gbrain's documented override for SESSION-mode poolers on 6543, not a requirement for transaction mode. The #1435 search symptom the auto-set worked around was fixed gbrain-side. Remove both force-sets. A caller-set GBRAIN_PREPARE (either value) still passes through untouched, preserving the session-mode-on-6543 escape hatch. isTransactionModePooler stays exported. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(gbrain): classify probe timeout as its own status; sync proceeds instead of skipping (#1964) The 5s engine probe misclassified healthy-but-slow engines (cold Supabase pooler connections measured at 6.9-10.7s) as broken-config, so /sync-gbrain silently skipped code+memory and told the user their config was malformed. - New "timeout" status: probe killed at the deadline with no recognized stderr pattern. Default deadline is now 15s, overridable via GSTACK_GBRAIN_PROBE_TIMEOUT_MS (tests set 300ms against a fake that sleeps 2s). - Sync stages PROCEED on timeout with a stderr warning naming the env knob; a genuinely-dead engine surfaces its real error at the first operation instead of a false config diagnosis. - Consistency everywhere "ok" gated behavior: gstack-gbrain-detect --is-ok exits 0 on timeout, and gen-skill-docs' detection gate accepts it, so a slow engine no longer silently suppresses brain-aware features. - Status cache: key now includes the effective probe timeout (raising it invalidates a cached timeout) and GBRAIN_HOME; config detection honors GBRAIN_HOME so relocated-home users stop being misclassified as missing-config. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(bins): cygpath-normalize SCRIPT_DIR for bun imports; surface learnings-log errors (#1950) Under Windows git-bash, pwd yields a POSIX path (/c/Users/...) that Bun on Windows cannot resolve as an ES module specifier. gstack-learnings-log interpolates SCRIPT_DIR into a bun -e import, so every invocation died with "Cannot find module" — and 2>/dev/null swallowed the error, silently dropping every AI-logged learning for Windows users. - 3-line cygpath -m guard in gstack-learnings-log and gstack-question-log (which gains the same import shape in the next commit). Matches the duplicated IS_WINDOWS convention in setup; no shared shell lib exists. - learnings-log adopts question-log's set +e / TMPERR capture pattern wholesale: validation errors now print to stderr. The old `if [ $? -ne 0 ]` check was dead code under set -euo pipefail — the script exited at the failing assignment before reaching it. - New test/bin-windows-bun-import-paths.test.ts: static invariant (any bash bin interpolating $SCRIPT_DIR into a bun -e import must carry the guard) + behavioral end-to-end run invoked via `bash <bin>` — added to the windows-free-tests workflow list so the conversion is proven on the only platform where the bug exists. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(question-log): dedupe INJECTION_PATTERNS via lib/jsonl-store (#1934) bin/gstack-question-log carried a local copy of the injection-pattern list, so pattern fixes to lib/jsonl-store.ts never propagated — including the /override[:\s]/i false-positive fix arriving via community PR #1940. Import the shared hasInjection instead (enabled by the previous commit's cygpath guard). question-log also gets the lib's stricter superset (human:, disregard, from-now-on, approve-all patterns). Tests pin the contract in a #1940-order-independent way: an "Override: ignore all previous instructions" header is rejected, "prose overrides the deterministic table" is accepted, and a static invariant keeps local INJECTION_PATTERNS duplicates out of the bin. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(security): community-pulse + both dashboards never report fake zeros (#1947) The security-signaling surface failed open at three layers — every failure mode read as a reassuring "0 attacks" / "0 installs": - community-pulse edge function: supabase-js returns {data,error} without throwing, and all five queries discarded `error` — a DB outage produced real-looking zeros via the SUCCESS path, and the catch (also returning zeros with HTTP 200) was unreachable for query failures. Every query now destructures and throws; the catch serves the stale cache (marked "stale": true) when one exists, else 503 {"error":"pulse_unavailable"}. Success responses carry "status":"ok" so clients can distinguish authoritative data from legacy backends. NOTE: the edge function deploys out-of-band (supabase functions deploy community-pulse). - gstack-security-dashboard: captures the HTTP status; non-200 / network failure / error body / missing section → "unknown — backend error"; jq missing → "unknown — install jq" (the lossy grep fallback broke on nested arrays and under-reported attacks as zero — removed); a 200 without the new marker shows figures with an "unverified (legacy backend)" note. Also fixes a latent display bug: the TOTAL grep matched the digit 7 inside "attacks_last_7_days" and misreported every count. - gstack-community-dashboard: same class — curl \|\| echo "{}" plus grep \|\| echo "0" printed "Weekly active installs: 0" on any failure. Now "unknown — backend error (HTTP N)". test/security-dashboard-fallback.test.ts pins the matrix (200+marker, 200-legacy, 503, network failure) x (jq present, jq absent) for both bins: "unknown" states never render as 0. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(telemetry): redact error_message spans before they leave the machine (#1947) error_message was uploaded with only quote/newline escaping — stack traces and failed-API errors can embed credentials, private paths, and hostnames, and the sync path strips only _repo_slug/_branch. New lib/redact-engine.ts export redactFindingSpans(): replaces EVERY finding's span with <REDACTED-{id}> regardless of tier (applyRedactions is the interactive PII-only path and exits nonzero on credential findings, so it can't serve machine egress). Returns null when a span can't be located — callers drop the whole payload rather than risk a leak. gstack-telemetry-log pipes error_message through it at LOG time, so the local JSONL at rest is clean too; surrounding text survives for crash triage. FAIL CLOSED: bun missing, engine error, or non-JSON-string output all null the field. Tests pin: embedded ghp_ token → <REDACTED-github.pat> with context intact; redactor unavailable → null; raw bytes on disk never contain the token. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(redact): prepush guard fails closed on git failure; /ship owns hook install (#1946) Two gaps closed: 1. Fail closed. The git() helper returned "" on ANY non-zero exit or maxBuffer overflow (status null), addedLinesFor produced an empty string, and the push sailed through unscanned — fail-open on exactly the oversized-diff case where a large secret-bearing blob is most likely. The diff call now uses a strict variant that throws; main blocks with a clear message naming the GSTACK_REDACT_PREPUSH=skip escape valve. Probe calls (symbolic-ref, rev-parse, merge-base) keep the permissive helper — their failures are normal control flow. 2. Install path. The hook was installed by nothing ("opt-in, installed by nothing" was the issue's words). ./setup runs in the gstack checkout — the wrong repo for a per-project hook — so it gets a one-line hint only. /ship owns per-repo install: config redact_prepush_hook=true + hook missing → silent install (consent already given); config unset + no ~/.gstack/.redact-prepush-prompted marker → one-time machine-wide AskUserQuestion offer, answer persisted. ship/SKILL.md regenerated in this same commit (check-freshness bisect discipline). Tests: unscannable diff (bogus SHAs) → exit 1 + valve named; empty-but- successful diff → exit 0; static asserts pin setup as hint-only and the ship template as the installer surface. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * feat(redact): six new credential patterns — GitLab, HuggingFace, npm, DigitalOcean, Bearer, GCP SA (#1946) Coverage gaps from the #1946 security review, including token types for tooling gstack itself drives (glab): HIGH (block): gitlab.token (glpat-/glptt-/gldt-), huggingface.token (hf_), npm.token (npm_), digitalocean.token (dop_v1_), gcp.service_account (the JSON-escaped "private_key" form that dodges pem.private_key's literal-block match when minified, confirmed by "private_key_id" proximity). MEDIUM (warn): auth.bearer — the most FP-prone shape in the set (docs are full of "Authorization: Bearer <token>"), so it requires header-context proximity and the same entropy>=3.0 + placeholder validator recipe as env.kv. "Bearer YOUR_TOKEN_HERE" never fires; calibration over coverage, per the cries-wolf principle. All shapes are linear-time; test/redact-pattern-lint.test.ts covers them automatically. Engine tests add positive + placeholder-negative cases per pattern. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test: coverage-audit additions for the fix wave Ship Step 7 gap-fill (all passing, 248 tests across the touched suites): memory + dream stage probe-timeout proceeds, gbrain-detect override paths, stale-flag passthrough, 200-body-missing-.security fail-closed case, telemetry redaction edges, and credential-pattern edge cases. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix: pre-landing review fixes Review army findings (1 critical, auto-fixed with regression tests): - CRITICAL (security specialist, verified live): redactFindingSpans spliced only the regex capture span, and pem.private_key / gcp.service_account capture just the BEGIN-header — the key body survived "redaction" and shipped via telemetry. Marker-only patterns now drop the whole payload (null, fail closed). Overlapping spans (Bearer+JWT on the same bytes) are coalesced before splicing so stale offsets can't leave partial secret bytes behind. - gitStrict: drop the dead `\|\| r.status === null` disjunct (null !== 0 already covers it); add the signal-kill/null-status regression test the docstring promised. - security-dashboard human mode flags stale snapshots ("figures may be out of date") instead of presenting frozen counts as current. - community-dashboard marker check uses jq when available — the grep-only variant misclassified whitespaced/reserialized bodies as legacy. - telemetry fail-closed test now shadows bun with a failing stub (deterministic on any host layout); stale "five status cases" describe title renamed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix: adversarial review fixes (Claude + Codex cross-model passes) Both adversarial passes ran against the wave; every FIXABLE finding landed with a regression test: - probeTimeoutMs clamps to >=1ms: a fractional override floored to 0, and execFileSync treats timeout:0 as NO timeout — the probe that exists to bound hangs could hang forever (found by both models independently). - /ship silent hook install now requires the hooks dir to live inside .git: with core.hooksPath (husky's COMMITTED .husky/), the chaining installer would have renamed the team's committed pre-push and written a machine-local wrapper into the working tree (found by both models). - gstack-config gbrain-refresh accepts the "timeout" status — the last consumer still gating on literal "ok" (Codex); gstack-gbrain-detect's config-derived fields honor GBRAIN_HOME so the detection JSON can't report status ok alongside config_exists false (Codex). - prepush: a remote sha absent locally (shallow clone / stale fetch) falls back to the merge-base/empty-tree range — scans MORE, never blocks a legitimate push into training users toward --no-verify. - dashboards: curl's own 000 no longer doubles to "HTTP 000000"; the community dashboard flags stale snapshots like the security one; array sections parse via jq (the sed/grep loops truncated at the first ']'); the no-jq marker grep tolerates whitespace. - telemetry: multi-line redactor output nulls the field instead of corrupting the JSONL record; setup's hint fires only when the config key is genuinely unset (an explicit false is a recorded decline); the /ship prompt marker honors GSTACK_HOME. Kept as designed (cross-model tension noted): Bearer stays MEDIUM in the prepush gate — a HIGH Bearer would block every docs example; the entropy validator can't eliminate that FP class, and MEDIUM warns visibly. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * chore: bump version and changelog (v1.57.11.0) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * docs: P1 TODO — eval harness live progress + incremental persistence Root-caused during this ship: a killed eval run was indistinguishable from a healthy one for hours (per-file output buffering across mega test files, no incremental eval-store writes, no honest liveness signal). Full context and starting points in the entry. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test: fix operational-learning E2E fixture — copy lib/jsonl-store.ts Pre-existing breakage, proven on main: gstack-learnings-log has imported lib/jsonl-store.ts (shared injection patterns) since v1.57.5.0 / #1910, but the fixture copies only the bin scripts — the bin exits 1 before writing anything, on main silently (stderr swallowed) and on this branch loudly (the #1950 error-surfacing made the four-day-old failure visible). A real install always ships bin/ and lib/ together; the fixture now does too. Verified: the fixture-shaped invocation writes the learning (exit 0) with lib present, exits 1 on both main and this branch without it. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(ios-qa): isolate E2E tests under --concurrent (3 real races) The ios-qa E2E file failed intermittently under `bun test --concurrent` (the eval harness default). Three distinct shared-state races, all fixed: 1. Shared pidfile: a module-level `workDir` reassigned in beforeEach was clobbered by parallel tests, so concurrent daemons collided on the same pidfile and the loser returned `already_running`. Each test now gets its own dir via makeWorkDir(). 2. process.env path globals: tests set GSTACK_IOS_AUDIT_PATH / _ATTEMPTS_PATH / _ALLOWLIST_PATH on the shared process env; concurrent tests stomped each other's audit/attempts destinations. Threaded auditPath/attemptsPath/allowlistPath through DaemonOptions (and mintForCaller) as explicit args — env is no longer load-bearing. 3. afterEach cleanup race: the per-test cleanup drained a shared dir array, so the first test to finish deleted still-running tests' workDirs mid-assertion. Moved to afterAll (cleans once, after all settle). Verified: 5/5 clean full-suite runs at --max-concurrency 15 (was intermittent); daemon unit suite 91/91; daemon source compiles. The paths default to the env-derived locations when options are omitted, so the production CLI path is unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test(pty): pin spawned claude to EVALS model chain (default claude-sonnet-4-6) launchClaudePty spawned the interactive `claude` TUI with no --model flag, so the child inherited the operator's ~/.claude/settings.json model. On a slow-thinking model that meant 5+ min of extended thinking on empty plan-mode context, timing out the plan-mode smoke tests regardless of contention. Pin the model via opts.model ?? EVALS_MODEL ?? 'claude-sonnet-4-6' — byte-identical to session-runner.ts:144, so PTY and `claude -p` evals always agree. Pushed before extraArgs (last flag wins, so a per-test --model still overrides). Placement leaves the spawn region byte-stable for a clean merge with the in-flight hermetic-env branch. Plumbed model through the three plan-skill wrappers. Static-grep tripwires guard the pin, its fallback chain, the before-extraArgs ordering, and all three wrapper forwards. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(pty): detect markdown bold-bullet prose AUQs (fixes office-hours smoke) office-hours auto-mode renders its mode question as `- Building a startup` markdown bullets (office-hours/SKILL.md.tmpl:102) with no letter/number marker. isProseAUQVisible only matched `A)`-style lettered or `1.`-style numbered options, so the question went undetected: the model surfaced it at ~2m19s (well under the 300s budget) but the harness kept scoring the run "working" off the spinner glyphs and timed out — a false timeout on a question that was already on screen. Add Pattern 3: when an interrogative line ('?') is present AND 3+ bold-bullet markers (`- *`) appear in the 4KB tail, classify as a prose AUQ. Bold is the discriminator vs incidental prose bullets; the line anchor is dropped (stripAnsi can collapse option lines) and the existing `❯ 1.` cursor gate still defers to a live native list. Wires through the existing classifyVisible 'asked' path and the timeout high-water-mark, so office-hours now classifies 'asked' instead of 'timeout'. Five unit cases: the office-hours render passes; no-'?', <3-bullet, plain-bullet, and native-cursor cases stay false. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> test(pty): detect stripAnsi-collapsed prose AUQs + judge spinner-precedence The plan-eng/plan-design plan-mode + finding-floor smokes timed out even when the skill HAD rendered a complete prose AskUserQuestion and was waiting: the PTY strips cursor-positioning escapes, collapsing the option newlines/spaces so "A) ..." arrives as "A(recommended)" / "-B:" and "Reply with A, B, or C" as "ReplywithA,B,orC". Every line-anchored detector (Patterns 1-3) returns false on those bytes, so proseAUQEverObserved never latched and the run timed out on a question that was already on screen. Add Pattern 4/5: a two-signal collapsed-form detector — a reply/recommendation marker (space-insensitive "reply with [A-D]", "Recommendation:", or "(recommended)") AND 2+ distinct A-D letters each punctuated by ) : or (. The conjunction is what separates a real AUQ from incidental report prose; verified true on the verbatim failing-run buffers where Patterns 1-3 return false. Also fix the Haiku judge spinner bias: of 614 verdicts, 569 were 'working' and 95 of those noted a question was visible — Claude Code keeps the spinner animating at an idle prose decision, so the judge coin-flipped. Add a precedence override: when an option list AND a Recommendation/Reply instruction are both visible, classify WAITING even with spinner glyphs. Kept the strict dual-signal gate (never option-list-alone) so auto-decide-preserved doesn't flip. 5 unit tests pin the two-signal contract (2 true on real collapsed bytes, 3 false guards). 90 -> 95 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(plan-review): ask-first scope gate for plan-eng + plan-design review On an empty/cold invocation, plan-eng-review and plan-design-review would dive straight into repo exploration (plan-eng) or a 7-pass mockup+audit (plan-design) and only ask the user much later, if at all. plan-ceo-review already asks first via an unconditional Step-0 gate and behaves well; these two did not. Add a hard-STOP scope gate as the FIRST operational instruction in each skill (above the design-doc check / pre-review audit / mockup defaults it explicitly overrides): the first tool call must be AskUserQuestion confirming the review target, before any git/Read/Grep/Glob/Bash or mockup generation. Under --disallowedTools the options render as plain column-0 lettered prose with a Recommendation + "Reply with A, B, or C" line so the answer is detectable. This is correct cold-start UX (confirm what to review before grinding a full review on nothing) and it is the product half of the plan-mode smoke fix; the harness collapsed-form detector is the deterministic half that catches the ask however it renders. Templates + regenerated SKILL.md (default variant). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(tiers): reclassify stochastic plan-eng/plan-design ask-first smokes as periodic plan-eng-review and plan-design-review run a long explore/audit before their first AskUserQuestion, so whether the plan-mode + finding-floor smokes reach a terminal outcome within the 300s/600s budget depends on stochastic ask-first compliance (measured ~50-67%/run even with the hardened gate). Per the "non-deterministic -> periodic" tiering rule, move the four affected smokes (plan-eng/plan-design review-plan-mode + finding-floor) to periodic. The deterministic harness fix (collapsed-form detector + judge precedence) and the ask-first gate lift these from always-failing to mostly-passing and are the real product+harness improvements; periodic monitoring tracks the rate weekly without blocking PRs on an LLM coin-flip. plan-ceo/plan-devex ask-first reliably and stay gate-tier. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ci(evals): gate the deterministic PTY plan-mode smokes in CI The real-PTY plan-mode smokes never ran in CI — the gate was local-only. Add an e2e-pty-plan-smoke matrix suite running the two deterministically-reliable ones (office-hours-auto-mode, plan-mode-no-op) so a regression there blocks PRs. The stochastic plan-eng/plan-design ask-first smokes stay periodic (touchfiles E2E_TIERS) and are not CI-gated. A fresh CI container has no ~/.claude.json, so the spawned interactive `claude` would wedge on the onboarding + API-key-approval dialog. Add a scoped seed step (hasCompletedOnboarding + key approval, its own ANTHROPIC_API_KEY env) before the run — mirrors what the hermetic E2E child env seeds. Per-suite timeout override (35 min) via matrix.suite.timeout so the PTY suite has headroom for --retry 2 without bumping the other 12 suites. Report runner count 12 -> 13. Validate via workflow_dispatch before relying on the gate (PTY-in-CI is new). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ci(evals): install gstack skill registry for the PTY smoke suite The first dry-run of e2e-pty-plan-smoke failed: the spawned interactive `claude` printed "Unknown command: /plan-ceo-review". .claude/skills is gitignored, so a fresh CI checkout has no gstack skill registry and the TUI can't resolve /office-hours or /plan-ceo-review. Add a Register step (scoped to the suite, after Seed, before Run) that mirrors setup's --no-prefix user-scoped registry minimally: $HOME/.claude/skills/gstack -> repo (resolves the preambles' absolute ~/.claude/skills/gstack/bin/* and <skill>/sections/* paths) + per-skill SKILL.md/sections symlinks for the two skills these tests invoke. HOME is /github/home in this container and the runner adds no HOME/CLAUDE_CONFIG_DIR override (no hermetic mode), so $HOME is the right anchor — the Seed step already proved claude reads it. No ./setup (binary build + Chromium + fonts + /dev/tty prompt); SKILL.md + bin/ + sections/ are committed. Self-validating: fails the step loudly on a dangling symlink or missing `name:` frontmatter, so a moved target surfaces here instead of as a silent 35-min "Unknown command" timeout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.58.4.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-06-21 07:15:19 -07:00
Garry Tan	3bef43bc5a	v1.55.0.0 fix wave: gbrain data-loss guards + browser crash-loop + 6 more (#1808 ) * fix(jsonl-merge): make equal-ts resolution converge across machines The JSONL append merge driver sorted timestamped entries by (0, ts) with no further tiebreaker. Equal-ts entries then fell back to stable-sort insertion order (base, ours, theirs), but git assigns the local side to "ours", so two machines resolving the same conflict emitted equal-ts lines in opposite order. The merged files diverged and never converged. gstack-telemetry-log uses second-granularity timestamps, so same-ts collisions are routine. Add the line content as the final sort tiebreaker so the order is total and side-independent. Add a regression test that runs the driver with the two sides swapped and asserts identical output. * fix(gen-skill-docs): quote frontmatter descriptions with interior colons (#1778) Generated SKILL.md frontmatter emitted the catalog-trimmed description: as a plain YAML scalar. A description with an interior ": " (e.g. "Ship workflow: detect...") parses as a nested mapping under strict YAML loaders, so Codex/OpenAI skill loading rejected those skills. applyCatalogTrim now routes the value through toYamlInlineScalar, which quotes (via JSON.stringify) only when a plain scalar would be invalid — interior ": ", inline " #", leading indicator char, or surrounding whitespace. Strings that are already valid plain scalars pass through unchanged to keep regen diffs small. The frontmatter test now parses every generated block (Claude + Codex hosts) with Bun.YAML.parse instead of string-checking that name:/description: substrings exist, so the regression can't reappear. Runs under `bun test` (already in CI). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore(skills): regenerate SKILL.md after frontmatter quoting fix (#1778) 9 catalog-trimmed descriptions whose values contain an interior colon or inline- comment marker are now quoted. Generated output only; rerun of bun run gen:skill-docs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(gbrain-sources): centralize sources-list shape handling in parseSourcesList (#1576) #1576's crash in sourceLocalPath was already fixed in v1.42.0.0 (dual-shape handling). But the readers disagreed: sourceLocalPath accepted both the wrapped {sources:[...]} object (v0.20+) and a bare array, while probeSource and sourcePageCount accepted only the wrapped shape. Extract one parseSourcesList() normalizer and route all three through it, so the shape assumption lives in a single place. This is also the base the #1734 remote_url audit builds on. parseSourcesList returns [] for null/garbage rather than throwing; callers treat 'no rows' as absent. New test/gbrain-sources-parse.test.ts pins both shapes plus the garbage paths and confirms config.remote_url survives for the audit. #1576 is closeable as already-fixed in v1.42.0.0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(gbrain): spawn gbrain + brain-sync through a shell on Windows (#1731) On Windows, bun/npm install gbrain as a gbrain.cmd/.ps1 shim and gstack-brain-sync is a bash shebang script. spawnSync/spawn/execFileSync resolve neither without a shell, so the child spawn failed ENOENT — on the sync orchestrator this surfaced as 'brain-sync exited undefined' (#1731). Add NEEDS_SHELL_ON_WINDOWS (process.platform === 'win32') in gbrain-exec and pass it as shell: to every gbrain/brain-sync child spawn: spawnGbrain, spawnGbrainAsync, execGbrainText (gbrain-exec), the two sources-list/remove/add spawns (gbrain-sources), the version + probe spawns (gbrain-local-status), and the two brain-sync spawns in the orchestrator. POSIX keeps the cheaper no-shell path. macOS/Linux CI can't exercise the Windows path, so test/gbrain-spawn-windows-shell.ts is a static-grep tripwire: it fails CI if a gbrain/brain-sync spawn is added without the shell flag. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(catalog-trim): expect YAML-quoted descriptions with interior colons (#1778) The quoting fix wraps colon-bearing catalog descriptions in double quotes; two catalog-trim assertions still pinned the old unquoted form. Tolerate the optional quotes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(gbrain-sync): defensive guards against destructive gbrain ops (#1734) The orchestrator shelled out to gbrain's destructive subcommands as if they were safe. gbrain can rm-rf a user's working tree during an autopilot race (its own bug, upstream gbrain #1526); gstack now defends itself. New lib/gbrain-guards.ts gates the two destructive reach points, all checked immediately before the op: - Autopilot refuse (multi-signal, affirmative-only): refuse a destructive op when a live 'gbrain autopilot' process (primary) or a known autopilot lock file (secondary; checked under both GBRAIN_HOME and ~/.gbrain since gbrain #1226 ignores GBRAIN_HOME) is present. No signal → proceed; inability to introspect never bricks a normal sync. - sources remove: routed through safeSourcesRemove → decideSourceRemove. Fail CLOSED — refuse to remove a user-managed source (remote_url set, local_path outside gbrain's clones) when gbrain has no --keep-storage to protect the files (it doesn't in 0.41.x). Also fail closed when the source list can't be read. Path containment uses realpath so a symlink can't smuggle a delete out of clones. - sync --strategy code: decideCodeSync refuses URL-managed sources (remote_url set) unless --allow-reclone is passed, since the walk can auto-reclone (rm-rf). Capability detection memoizes per process keyed to gbrain's identity (no stale persistent cache); --keep-storage can't be probed (generic help) so it defaults unsupported → fail closed. Every guard surfaces a visible reason; autopilot/reclone refusals fail the code stage (verdict ERR) rather than silently skipping protection. test/gbrain-guards.test.ts covers all branches hermetically (injected rows + probe overrides): autopilot signals, fail-closed remove, keep-storage path, reclone gate, realpath/symlink containment. Supersedes #1736 (which guarded a nonexistent path). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(sync-gbrain): warn against running during autopilot; prefer --path sources (#1734) Adds a Safety note to the /sync-gbrain guidance (template + regenerated SKILL.md + this repo's CLAUDE.md): don't run while autopilot is active, and prefer `gbrain sources add --path` over URL-managed sources, which can auto-reclone. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(memory-ingest): configurable import timeout + resume-on-timeout messaging (#1611) The gbrain import (the long pole on big brains) had a hardcoded 30-min timeout, so large memory corpora got SIGTERM'd mid-import on /sync-gbrain --full. Make it configurable via GSTACK_INGEST_TIMEOUT_MS (default 30 min, validated 1min–24h). gstack can't drive gbrain's internal resume, but the existing SIGTERM forwarder already preserves gbrain's import-checkpoint.json, so the next run resumes. On a timeout we now say so explicitly ('checkpoint preserved — re-run /sync-gbrain to resume, raise GSTACK_INGEST_TIMEOUT_MS for big brains') instead of surfacing a bare 'exited null'. True gstack-driven ingest-resume is deferred to gbrain (.context/gbrain-asks.md). Also guards the module's main() behind import.meta.main so resolveImportTimeoutMs is unit-testable; the orchestrator runs it as a subprocess where main still fires. New test/memory-ingest-timeout.test.ts pins default/override/invalid resolution. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(browse): stop the headed daemon crash-loop + silent headless downgrade (#1781) A headed session against a beacon-heavy page (analytics/extension load) could tip the single-threaded daemon into a self-inflicted crash-loop: a brief HTTP stall was read as a crash, the restart didn't clear the dead Chromium's SingletonLock, the relaunch failed, and the session silently came back headless. Four fixes: 1. Busy-vs-dead (sendCommand): on a connection error, if the process is alive give /health a bounded probe (3x/250ms) and just retry the command — never kill+restart a live-but-busy server. A 30s timeout now reports 'busy, not restarting' when the process is alive instead of exiting into a kill cycle. 2. Profile-lock cleanup on (re)start: startServer reaps the orphaned Chromium holding the SingletonLock and clears Singleton{Lock,Socket,Cookie} before relaunch, so the auto-restart path gets the same clean profile the manual connect preamble did. 3. Headed persistence: the restart env reapplies BROWSE_HEADED from this invocation OR the persisted server state (mode==='headed'), so a restart from a plain command never downgrades a headed window to invisible headless. Extracted to buildRestartEnv. 4. Force-clean disconnect reaps the Chromium child tree (via the SingletonLock PID) so the next connect starts clean instead of fighting an orphan. Plus macOS window surfacing: connect + focus raise 'Google Chrome for Testing' to the active Space (best-effort osascript) with a Mission Control hint — the first thing users read as 'I can't see the browser'. Shared lock helpers (chromiumProfileDir / cleanChromiumProfileLocks / killOrphanChromium) dedupe the connect, disconnect, and restart paths. browse/test/restart-env.test.ts pins the headed-persistence decision; the full crash-loop repro is an E2E (periodic). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(gbrain-install): remove the v0.18.2 pin, install latest + version floor + doctor self-test (#1744) The installer pinned gbrain at v0.18.2 while gbrain shipped v0.41.x — ~23 versions behind. Remove the hard pin: a fresh clone now stays on the latest default-branch HEAD. --pinned-commit <sha> still pins for reproducibility. Unpinning removes the version gate the pin provided, so add two install-time gates that fail closed (exit 3, matching the existing PATH-shadow/version-mismatch posture): - MIN_GBRAIN_VERSION floor (0.20.0, the sources-list/federated surface gstack needs): refuse an install below it. - gbrain doctor --fast self-test when a brain config already exists (re-install / detected clone): refuse to leave a broken gbrain in place. Pre-init installs skip it; the full /sync-gbrain --dry-run self-test runs from /setup-gbrain after init. Docs updated (USING_GBRAIN_WITH_GSTACK.md no longer says 'edit PINNED_COMMIT'). Detect-install tests bump the success-path fixtures above the floor and add a below-floor exit-3 test. The gbrain-side asks (root #1526 fix, --keep-storage, remove-lease, capability command, ingest-resume, integration CI) are written to .context/gbrain-asks.md for filing against garrytan/gbrain. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(#1778): update claude-ship golden + catalog-mode assertions for quoted descriptions ship's catalog description ('Ship workflow: detect...') has an interior colon, so the #1778 fix now YAML-quotes it. Refresh the claude-ship golden baseline to the quoted output and make the catalog-mode-full trim/restore assertions quote-tolerant. codex/factory ship goldens are unaffected (they use block-scalar descriptions). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(gen-skill-docs): use function replacer so a $ in a description can't corrupt frontmatter (#1778) String.prototype.replace treats $&/$1/$` in the replacement as patterns. A future skill description containing $ (e.g. referencing $B/$D) would silently corrupt the generated frontmatter. Use a function replacer. Behavior-preserving for all current descriptions (regen produces no diff). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.55.0.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(gbrain): document configurable memory-ingest timeout for v1.55.0.0 USING_GBRAIN_WITH_GSTACK.md: note GSTACK_INGEST_TIMEOUT_MS (default 30 min, 1 min-24h range) on the /sync-gbrain memory stage, plus checkpoint-resume on timeout. Fills the reference gap left by the configurable-import-timeout fix (#1611) shipped in v1.55.0.0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-30 14:57:07 -07:00
Garry Tan	66f3a180d3	v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642 ) * fix(gbrain-sync): --full produces an empty code index on first run of a new repo `gbrain reindex-code` only RE-EMBEDS pages that already exist; it never walks the filesystem. On a freshly-registered source (0 pages), a --full run that called reindex-code alone found nothing ("No code pages to reindex"), finished in ~1s, and left the code index permanently empty while still reporting OK. Fix: --full now runs `sync --strategy code` FIRST to create pages via the file walk, then runs `reindex-code` to honor the documented "full walk + reindex" contract for both fresh and populated sources. Contributed by @jetsetterfl via #1584. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gbrain-local-status): classifier falsely reports broken-db inside repos with their own DATABASE_URL The freshClassify probe ran `gbrain sources list --json` with the inherited process env. When the probe ran from inside a repo with its own .env (an app DATABASE_URL on a different port), Bun autoloaded the project's .env, gbrain connected to the wrong database, and the classifier reported broken-db on otherwise-healthy brains. Fix: route the probe env through `buildGbrainEnv` from lib/gbrain-exec, the same helper the sync orchestrator uses. DATABASE_URL is seeded from ~/.gbrain/config.json so the result is cwd-independent. The 60s cache can no longer propagate a poisoned negative to clean directories. Contributed by @jetsetterfl via #1583. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(retro): stale-base + bad-today-anchor pre-flight guard (#1624) /retro silently produced confidently-wrong output when "today" drifted (model session-context error) or when origin/<default> was materially behind the actual remote — git log --since returned zero or near-zero commits and the narrative was fabricated from nothing. Adds Step 0.5 with four ordered pre-check branches before any window analysis: A. No 'origin' remote → skip with "base freshness not verified" note B. Detached HEAD → skip with "base freshness not verified" note C. `git fetch origin <default>` fails (offline) → warn, proceed against last-known origin/<default> D. Fetch succeeded → compare today vs latest origin/<default> commit; if gap > window-days, BLOCK with explicit citation of latest-commit date. Skip paths still proceed to Step 1, but the disclosure is carried into the retro narrative ("offline run, window not freshness-verified") so the output is never silently confidently-wrong. Atomic .tmpl + gen:skill-docs regen commit (T-Codex-3 pattern). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(retro): regression for #1624 stale-base pre-flight guard 13 static-invariant tests pinning the four ordered pre-check branches in retro/SKILL.md.tmpl:Step 0.5: A. no-remote skip — must check origin presence + set verdict B. detached-HEAD skip — must gate behind prior verdict (ordering) C. fetch-fail warn — must match `if !` or `\|\|` shape, gate by verdict D. stale-base BLOCK — must read latest-commit ISO date, cite remediation Plus a disclosure-survives-to-narrative invariant: skip-path verdicts must be named in prose so the retro output carries the cited reason rather than silently misreporting. Failing build if Step 0.5 is removed, branches re-ordered (no-remote no longer wins), or the BLOCK message stops citing today/latest-commit/remediation path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gbrain-sync): configurable timeouts + resume from gbrain checkpoint (#1611) The memory and code stages hardcoded a 35-min spawn timeout. On brains with ~2000+ staged files, /sync-gbrain --full reliably SIGTERM'd the child at exactly 35 minutes with exit 143. gbrain left ~/.gbrain/import-checkpoint.json pointing at the staging dir, but gstack-memory-ingest's SIGTERM handler unconditionally cleaned the dir up — so the next run found a checkpoint pointing at nothing and restaged from scratch, repeating the SIGTERM forever. Three changes: 1. Configurable timeouts via env (bounds 60_000ms - 86_400_000ms, default 2_100_000ms = 35min unchanged): GSTACK_SYNC_MEMORY_TIMEOUT_MS GSTACK_SYNC_CODE_TIMEOUT_MS Out-of-range or non-numeric values warn and fall back to the default. 2. SIGTERM in gstack-memory-ingest no longer always cleans up the staging dir. If gbrain has written ~/.gbrain/import-checkpoint.json pointing at the active staging dir, the dir is PRESERVED for next-run resume. Otherwise (no checkpoint pointing here, crash before gbrain ever touched it) it's cleaned up as before. 3. Next /sync-gbrain run detects gbrain's checkpoint via decideResume() in gstack-gbrain-sync.ts: - no checkpoint → fresh ingest pass - checkpoint + staging ok → set GSTACK_INGEST_RESUME_DIR; child reuses staging dir and skips writeStaged; gbrain import resumes from processedIndex+1 - checkpoint + staging gone → warn "previous checkpoint stale (staging dir gone), restaging from scratch" and proceed Reuses gbrain's own checkpoint as the source of truth (D1 — no double-store state). Detect-then-fallback semantics per C1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(gbrain-sync): regression for #1611 timeouts + resume 19 tests across three surfaces: - resolveStageTimeoutMs (10 tests): undefined/empty → default; non-numeric, zero, negative, below-floor, above-ceiling → warn + default; at-floor, at-ceiling, valid mid-range → accepted as-is. - decideResume (6 tests): no checkpoint, corrupt JSON, checkpoint + staging ok, checkpoint + staging missing, checkpoint with no dir, checkpoint with empty dir. - SIGTERM staging preservation (3 static invariants): memory-ingest signal handler must check stagingDirIsCheckpointed BEFORE cleanup; preserve branch must come before cleanup branch (ordering); orchestrator must pass GSTACK_INGEST_RESUME_DIR to the grandchild on resume. Also threads process.env.HOME through readGbrainCheckpoint and stagingDirIsCheckpointed so tests can redirect home. os.homedir() caches at process start and ignores later mutation, so the env override is the only reliable test injection point. Failing build if the timeout bounds are removed, the resume detection short-circuits incorrectly, or the SIGTERM handler regresses to unconditional cleanup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): pre-emit verification gate kills Django-shape FP class (#1539) External user filed 4/8 false positives on a /review run against a Django + DRF + PostgreSQL repo (Sprint 2.5). Every FP class was the same shape: "resolvable in <5 minutes by viewing the actual code or running a simple grep" — fields that don't exist on the model, dict.get()-might-be-None on a form that returns {}-initialized cleaned_data, standard ORM save behavior called out as data loss. Extends the Confidence Calibration resolver (consumed by review, cso, plan-eng-review, ship) with a Pre-emit verification gate: Every finding MUST quote the specific code line that motivates it (file:line + verbatim text). If the reviewer cannot produce the quote, the finding is unverified — its confidence is forced to 4-5 so the existing "Suppress from main report" rule fires automatically. The finding still goes to the appendix for calibration audit, but the user does not see it in the critical-pass output. Reuses the existing suppression mechanism — no new code path. The FP classes the gate kills are enumerated in the resolver text so reviewers see the named patterns. Framework-meta nudge included for Django Meta, Rails associations, SQLAlchemy relationships, TypeORM decorators, Sequelize init, Prisma generated client — the reviewer must quote the meta-construct that generates the symbol, not just grep for the literal name. Deeper framework-aware ORM verification (model introspection, migration-history- aware checks) is deliberately deferred to a future wave per T-Codex-2. Atomic .tmpl-equivalent (resolver) edit + gen:skill-docs regen commit per T-Codex-3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(review): regression for #1539 pre-emit verification gate 12 tests pinning the gate behavior: - Resolver emits the gate header + #1539 reference - Gate requires quoting file:line + verbatim text - Unverified findings forced to confidence 4-5 (auto-suppress via existing <7-rule, no new mechanism) - Framework-meta nudge names Django, Rails, SQLAlchemy, TypeORM, Sequelize, Prisma - Deferred design doc reference present (1539-framework-aware-review.md) - Four named FP classes from #1539 enumerated: * field doesn't exist on model * dict.get() might be None * save() might lose fields * update_fields might miss X - All four downstream SKILL.md consumers (review, cso, plan-eng-review, ship) carry the gate text after gen:skill-docs - Existing confidence 9-10 'Show normally' + 3-4 'Suppress' rows unchanged (regression on existing behavior) Failing build if the gate is removed, the suppression mechanism is re-invented separately, the framework-meta nudge drops a framework, or gen:skill-docs stops propagating the gate to consumers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(config): expose explain_level default * fix(benchmark): parse positional prompt after flags * fix(artifacts): reject malformed remote paths * fix(learnings): preserve current entries in cross-project search * fix(setup): register root gstack slash alias * fix(memory): probe gitleaks without shell builtin * fix(gbrain-lib): pin LC_ALL=C in varname validator (macOS locale guard) In many macOS shells the default locale (e.g. en_US.UTF-8) makes bash glob brackets like `[A-Z]` match lowercase letters too, so the existing `case "$name" in [A-Z_][A-Z0-9_])` branch lets names like `lower-case` through validation. The function then trips `printf -v "$varname"` and `export "$varname"` with `not a valid identifier` errors that surface mid-prompt, which is exactly what the validator was supposed to prevent. Pinning `LC_ALL=C` inside the function gives ASCII-only bracket semantics on both macOS and Linux, matching the documented `[A-Z_][A-Z0-9_]` contract. Declared `local` so it doesn't leak to the calling shell — `gstack-gbrain-lib.sh` is documented as a sourced helper, so a bare assignment would mutate the caller's locale for the rest of the process (silently affecting downstream `sort`, `tr`, locale-aware globs in the same shell, etc.). The existing regression test `test/gbrain-lib-verify.test.ts:'rejects invalid var names'` already covers the macOS repro shape (passes `lower-case` and expects the validator to reject + emit `invalid var name`). On Linux CI the test silently passed because `LC_ALL=C` is the typical default; on macOS dev boxes it fails. Verified: - `bun test test/gbrain-lib-verify.test.ts`: 22 pass, 0 fail (on macOS). - `_gstack_gbrain_validate_varname lower-case; echo $?` → 2. - `_gstack_gbrain_validate_varname FOO_BAR; echo $?` → 0. - Caller's LC_ALL preserved across calls (confirmed via sourced bash). * fix(land-and-deploy): detect merged PR after gh failure After `gh pr merge` exits non-zero, the PR may already be MERGED server-side (concurrent merge landed, or local cleanup phase failed AFTER the merge succeeded). Calling `gh pr merge` a second time then errors with a confusing "already merged" — and worse, the deploy workflow never runs because we stopped on the first failure. Adds a Post-failure PR-state check (§4a-postfail) that runs after ANY non-zero exit from `gh pr merge`: - state == MERGED → record MERGE_PATH=direct, OFFER (don't force) stale-worktree cleanup on the base branch with uncommitted-work guard, proceed to §4a CI watch - state == OPEN → check autoMergeRequest; if non-null treat as merge-queue wait; if null surface both errors and STOP - state == CLOSED → STOP Hard invariant: never retry `gh pr merge` after a non-zero exit. Server state is authoritative. Re-authored from PR #1620 into land-and-deploy/SKILL.md.tmpl (the source of truth) instead of the generated SKILL.md, so the next gen:skill-docs run preserves the change. Original diff by @davidfoy via #1620. Related: cli/cli#3442, cli/cli#13380. Contributed by @davidfoy via #1620. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: detect PgBouncer transaction-mode pooler and set GBRAIN_PREPARE=true (#1435) When gbrain connects through a PgBouncer transaction-mode pooler (port 6543), it auto-disables prepared statements. This breaks `gbrain search` silently — the /sync-gbrain capability check fails and the GBrain Search Guidance block never gets written to CLAUDE.md. Three-layer fix: 1. lib/gbrain-exec.ts — `buildGbrainEnv()` now detects port 6543 in the effective DATABASE_URL and sets `GBRAIN_PREPARE=true` in the env passed to every gbrain spawn. This is the single chokepoint — all gstack gbrain invocations inherit the fix. Caller can opt out with `GBRAIN_PREPARE=false`. 2. sync-gbrain/SKILL.md{,.tmpl} — capability check now exports `GBRAIN_PREPARE=true` explicitly and retries search up to 3x with 1s delay for async index propagation under connection pooling. 3. bin/gstack-gbrain-detect — surfaces `gbrain_pooler_mode` field ("transaction" \| "session" \| null) in the preamble probe JSON so /setup-gbrain and /sync-gbrain can advise users about pooler state. Closes #1435 Built with [ClosedLoop.AI](https://closedloop.ai) \| [GitHub](https://github.com/closedloop-ai/claude-plugins) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(supabase-provision): rewrite transaction/6543 -> session/5432 for new projects - Single-object pooler API responses default to transaction-mode at 6543, but the shared pooler tenant on new projects only listens on session/5432 - Add a `pool_mode == transaction && db_port == 6543` rewrite + stderr note - Escape hatch via `GSTACK_SUPABASE_TRUST_API_PORT=1` for forward-compat - 5 new tests covering rewrite, no-op shapes, env opt-out, array path Fixes #1301. * fix(browse): GSTACK_CHROMIUM_NO_SANDBOX opt-out for Ubuntu/AppArmor (#1562) Ubuntu/AppArmor configurations often block unprivileged Chromium sandboxing for headless agent sessions even for normal users — /qa hangs without --no-sandbox. The kernel policy denies the unprivileged user namespaces Chromium needs. Adds GSTACK_CHROMIUM_NO_SANDBOX=1 as an explicit user override that forces the sandbox off without changing the default for everyone else. Re-authored from PR #1562 onto v1.42.2.0's shouldEnableChromiumSandbox() helper — purely additive, preserves the headed-launch sandbox-on-by-default behavior that v1.42.2.0 shipped to kill the --no-sandbox yellow infobar. Three new regression tests cover: - linux + override=1 → false (the named use case) - darwin + override=1 → false (env wins on any platform) - override=0 → does NOT trigger (must be exactly "1") Original diff by @techcenter68 via #1562. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(browse): mirror isCustomChromium() guard in headless launch() When BROWSE_EXTENSIONS_DIR is set alongside GSTACK_CHROMIUM_PATH pointing at a baked-extension build (GBrowser / GStack Browser), the headless launch() path was unconditionally adding --disable-extensions-except / --load-extension. This causes the same ServiceWorkerState::SetWorkerId DCHECK crash that launchHeaded() already guards against via isCustomChromium(). Mirror the existing guard: skip --load-extension flags when isCustomChromium() returns true; always push the off-screen window geometry args. * fix(browse): daemonize macOS/Linux server via setsid() `Bun.spawn().unref()` only releases the child from Bun's event loop — it does NOT call setsid(). The spawned bun server inherits the spawning shell's process session. When the CLI runs inside a session-managed shell that exits shortly after the CLI returns (Claude Code's per-command Bash sandbox, Conductor, OpenClaw, CI step runners), the session leader's exit sends SIGHUP to every PID in the session — killing the bun server and its Chromium grandchildren within seconds of a successful `connect`. Setting `BROWSE_PARENT_PID=0` (already done by the `connect` command and pair-agent) disables the parent-process watchdog but does NOT save the server here: SIGHUP from session teardown still reaps it. Replace the macOS/Linux `Bun.spawn().unref()` with Node's `child_process.spawn({ detached: true })`, which calls setsid() and gives the server its own session leader role (PPID=1, STAT=Ss). This mirrors the Windows path's rationale (PR #191 by @fqueiro) — same root cause, different OS surface. Verified on macOS in Conductor: pre-fix the server dies ~10–15s after connect across separate Bash invocations; post-fix the same PID stays alive (PPID=1, SESS=0, STAT=Ss) and responds to `status`/`goto`/ `snapshot` across many separate shell calls. The `proc?.stderr` startup-error branch is removed since both platforms now spawn with `stdio: 'ignore'`; both fall through to the on-disk `browse-startup-error.log` written by `server.ts`'s start().catch. * fix(design): bump image-gen timeout to 240s + pin gpt-image-2 The design binary calls /v1/responses (gpt-4o + image_generation tool, quality:high, 1536x1024) but aborted the request after a hardcoded 120s. That class of request consistently takes ~140-160s end-to-end, so every generate/variants/evolve/iterate call aborted before the image returned. In /design-shotgun this cascades: Step 3c launches N parallel agents, each calling `$D generate`, each aborts at 120s and retries, all fail, the comparison board never opens — the skill appears to hang indefinitely. Reproduced the exact API call with a longer budget: HTTP 200, valid image, 143.5s. A real /design-shotgun run after the patch generated 3 variants in parallel at 150.0s / 161.0s / 152.1s, all exit 0 — note the 161s case, which a naive 150s bump would still have failed. - Bump AbortController timeout 120_000 -> 240_000 in generate.ts, variants.ts, evolve.ts, iterate.ts (both call sites) - Pin the image_generation tool to model "gpt-image-2" design/test/variants-retry-after.test.ts: 5 pass, 0 fail. The feedback-roundtrip.test.ts failures are a pre-existing browse-module breakage (session.clearLoadedHtml undefined), unrelated to this change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: fill coverage gaps for PRs #1606, #1612, #1620 Three cherry-picked PRs in this wave landed without unit-test coverage for the specific invariant they protect: #1606 (@andrey-esipov) — LC_ALL=C pin in _gstack_gbrain_validate_varname 8 tests by sourcing bin/gstack-gbrain-lib.sh and calling the validator directly. Asserts uppercase/digit/underscore accepted, lowercase REJECTED (the macOS-locale regression case), mixed-case rejected, LC_ALL=C scoping is local (doesn't leak to caller). #1612 (@bharat2913) — setsid daemonize via Node child_process.spawn 4 static-invariant tests on browse/src/cli.ts. The actual setsid syscall is hard to assert without a real spawn, so we pin the source shape: nodeSpawn imported from child_process; non-Windows branch uses nodeSpawn(...) with detached:true and .unref(); comment documents setsid/SIGHUP root cause; Bun.spawn() is NOT used on macOS/Linux. #1620 (@davidfoy, re-authored into .tmpl per A3) — §4a-postfail 12 static invariants on land-and-deploy/SKILL.md.tmpl + generated SKILL.md. Pins all three state branches (MERGED/OPEN/CLOSED), the authoritative state query, the merge-SHA capture, non-destructive worktree cleanup with uncommitted-work guard, autoMergeRequest probe on OPEN, hard "never retry gh pr merge" rule, and atomic regen propagation. Failing build if any of the three invariants regresses. Note: gbrain-lib-validate-varname.test.ts also surfaces a pre-existing glob-pattern overpermissiveness (hyphens + dots accepted) — not in #1606's scope; documented inline as a separate cleanup target. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(learnings): align injection-prevention tests with PR #1619 tagged-line shape PR #1619 (preserve current entries in cross-project search) refactored gstack-learnings-search to tag rows inline (`current\t<json>` vs `cross\t<json>`) instead of filtering inside the bun block via process.env.GSTACK_SEARCH_SLUG. The bun block no longer reads SLUG or CROSS env vars — it parses the per-line tag and sets a per-entry _crossProject flag. The pre-existing test/learnings-injection.test.ts still asserted on the old SLUG + CROSS env var shape. Updates: - Remove the SLUG env var assertion (no longer set on bash command line) - Remove the bun-block CROSS env var assertion (block reads the tag now, not the env) - Add a new positive assertion that the bun block parses the tag (sourceTag \| tabIndex \| crossProject) - Keep the shell-interpolation safety assertion unchanged — that's independent of the SLUG refactor The CROSS env var is still SET on the bash command line (it controls whether the cross-project find runs at all), but the bun child no longer reads it. The existing "env vars set on bash command line" test continues to pin that. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(fixtures): regenerate ship-SKILL.md golden baselines ship/SKILL.md consumes the Confidence Calibration resolver via the preamble pipeline. This wave's #1539 pre-emit verification gate extends the resolver text, which propagated to ship/SKILL.md via gen:skill-docs. The golden fixtures in test/fixtures/golden/ matched the pre-#1539 shape and failed the host-config regression check. Refreshes claude-ship-SKILL.md, codex-ship-SKILL.md, and factory-ship-SKILL.md to match the current generated output. Matches the Daegu wave's bisect commit 23 ("test(fixtures): regenerate ship-SKILL.md golden baselines"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(gbrain-detect): include gbrain_pooler_mode in schema regression (PR #1591) PR #1591 (PgBouncer transaction-mode detection, @mikeangstadt) added gbrain_pooler_mode to the gstack-gbrain-detect JSON output but did not update the schema regression check in test/gstack-gbrain-detect-mcp-mode.test.ts. Adding the key in alphabetical order matching the rest of the schema array. Downstream sync-gbrain ignores unknown keys, so this is forward-compat. Without this, the test fails with a diff: + "gbrain_pooler_mode" because keys is the actual set returned and the expected array was pre-#1591. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): v1.43.0.0 — post-Daegu paper-cut wave Bumps VERSION 1.42.2.0 → 1.43.0.0 (MINOR per scale-aware bump rules: new env-var surface GSTACK_SYNC__TIMEOUT_MS + GSTACK_CHROMIUM_NO_SANDBOX, behavior expansion in browse/src/browser-manager.ts headless launch, three skill-template prompt changes affecting /retro, /review, /sync-gbrain). CHANGELOG entry leads with what stopped happening: /retro stops fabricating retros against stale bases, /sync-gbrain stops SIGTERM-looping 35-min restarts on big brains, /review stops shipping framework FPs the reviewer never grep'd. 18 fixes total — 15 community PRs + 3 self-filed silent-failure issues (#1624, #1611, #1539) — in one bundled PR with 26 bisect commits and 7 new regression test files. Every wave-touched test file passes in isolation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> chore(release): bump v1.43.0.0 → v1.43.2.0 for queue collision CI check-version-stale flagged v1.43.0.0 already claimed by PR #1574 (garrytan/colombo-v3). PR #1639 (garrytan/muscat-v3) claims v1.43.1.0. Next available MINOR slot is v1.43.2.0. Bump VERSION + package.json + CHANGELOG entry header. No behavior changes — purely re-versioning to clear the queue collision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com> Co-authored-by: Andrey Esipov <andrey.esipov@outlook.com> Co-authored-by: David Foy <davidfoy@users.noreply.github.com> Co-authored-by: mikeangstadt <mike.angstadt@closedloop.ai> Co-authored-by: 0xDevNinja <manmit0x@gmail.com> Co-authored-by: techcenter68 <techcenter68@users.noreply.github.com> Co-authored-by: shohu <shohu33@gmail.com> Co-authored-by: Bharat <bharat@theysaid.io> Co-authored-by: Matteo Hertel <info@matteohertel.com>	2026-05-21 21:21:07 -07:00
Garry Tan	026751ea20	v1.40.0.0 fix wave: gbrain sync hardening (8 community PRs + migration) (#1547 ) * fix(gbrain-sync): fold hostname into code-source id hash + migration (#1414) Cherry-picked from #1468 by 0xDevNinja and extended with the hostname-fold migration that codex review surfaced. Pre-fix `deriveCodeSourceId` hashed the absolute repo path alone, so two machines with identical home-dir layouts (chezmoi-managed dotfiles, ansible-provisioned VMs) derived the same id and clobbered each other's `local_path` in a federated brain. Last-writer-wins, with cryptic "Not a git repository" errors on the loser. Hash key is now `\${hostname}::\${path}`. Conductor worktrees on a single host stay distinct (path entropy unchanged within a host); cross-machine federations stop colliding. Migration (D1=B + codex refinements): every existing user has a pre-#1468 path-only-hash source id in their brain that no longer matches what `deriveCodeSourceId` produces. Without migration, the next sync registers a fresh source and orphans the old one. This commit adds: - \`derivePathOnlyHashLegacyId\` — separate helper for the pre-#1468 form. Distinct from \`deriveLegacyCodeSourceId\` (pre-pathhash v1.x form); both probes run. - \`planHostnameFoldMigration\` — feature-checks \`gbrain sources rename <old> <new>\` (exact argument shape, not just \`--help\`), gates on path-drift (skip migration if old source's \`local_path\` differs from current repo root), and falls back to register-new + sync-OK + remove-old when rename is unsupported. As of gbrain 0.35.0.0 the rename subcommand does not exist, so users go through the cleanup path; the rename path stays dormant until gbrain ships it. - \`removeOrphanedSource\` — called only AFTER new-source sync verifies page_count > 0. Closes the data-loss window codex flagged where "register new, remove old before sync" can wipe pages if sync fails. - \`sourceLocalPath\` — looks up a source's \`local_path\` from \`gbrain sources list --json\` for the drift gate. - Helpers accept an optional \`env\` parameter so tests can inject a gbrain shim via PATH without process-wide PATH mutation (Bun's spawnSync doesn't pick up runtime PATH changes). Pre-positions for commit 4's centralized gbrain-exec helper. - \`if (import.meta.main)\` guard around \`main()\` so the helpers can be imported for in-process unit tests. Tests cover: pure derivation, ids-match degenerate case, no-legacy short-circuit, path-drift skip path, rename path with shim, cleanup fallback when rename unsupported, cleanup fallback when rename call itself fails, source-lookup happy/missing/error paths. \`GSTACK_HOSTNAME\` env var is a test-only knob; production uses \`os.hostname()\`. Fixes #1414 Co-Authored-By: Claude <noreply@anthropic.com> * fix(gbrain-sync): cut source-id slugs on hyphen boundaries (+ #1357) Cherry-picked from #1481 by drummerms and extended with the explicit HTTPS-remote regression case for #1357 (decision D2=A). `constrainSourceId` truncated the slug with `slug.slice(-tailBudget)`, which cut mid-word when the boundary fell inside a token. For a repo where the combined `prefix-org-repo-pathhash` exceeded 32 chars, this produced embarrassing artifacts like `gstack-code-kill-270c0001-c32152` (from `drummerms-av-sow-wiz-skill-270c0001`). Two changes carried from #1481, adapted for the #1468 hostpathhash: 1. `constrainSourceId` now walks hyphen-separated tokens from the right, accumulating whole tokens until adding the next would exceed `tailBudget`. When no token fits, falls through to the existing `${prefix}-${hash}` form. 2. `deriveCodeSourceId` now retries with `repo-only-hostpathhash` (dropping the org segment) when the full `org-repo-hostpathhash` triggers truncation. Keeps the repo name readable when it fits at all. Plus a new test asserting the source id is period-free for the exact HTTPS-with-.git remote shape from #1357 (`https://github.com/foo/bar.git`). canonicalizeRemote strips `.git`; the sanitizer strips any residual non-alnum. The test closes #1357 by pinning the property. Closes #1357 Co-Authored-By: Claude <noreply@anthropic.com> * fix(gbrain): probe CLI without command builtin * fix(gbrain-sync): centralize gbrain spawn surface + seed DATABASE_URL Cherry-picked from #1508 by jasshultz, restructured per codex review #4 and #7 to widen scope and centralize the spawn surface. The bug: gbrain auto-loads .env.local from cwd via dotenv. When /sync-gbrain runs inside a Next.js / Prisma / Rails project whose .env.local defines its own DATABASE_URL (pointing at the app's local DB), gbrain reads that value instead of its own ~/.gbrain/config.json — auth fails, code + memory stages crash. This commit: - Adds lib/gbrain-exec.ts: buildGbrainEnv, spawnGbrain, execGbrainJson, execGbrainText, spawnGbrainAsync (the last one for memory-ingest's streaming gbrain import call). buildGbrainEnv seeds DATABASE_URL from ${GBRAIN_HOME:-$HOME/.gbrain}/config.json, returns a fresh env object (never the caller's by identity — codex review #11), and honors the GSTACK_RESPECT_ENV_DATABASE_URL=1 escape hatch. - Routes every gbrain spawn in bin/gstack-gbrain-sync.ts and bin/gstack-memory-ingest.ts through the helpers. Both files now own zero direct spawnSync("gbrain"\|spawn("gbrain"\|execFileSync("gbrain" call sites. - Threads buildGbrainEnv into the spawnSync("bun", [memory-ingest], ...) grandchild in runMemoryIngest (codex review #7). Without this, the parent fix is half-baked — the bun child inherits a clean env but needs DATABASE_URL pre-seeded too. spawnGbrainAsync inside memory-ingest provides defense in depth for standalone invocations. - Adds GBRAIN_HOME support — aligns with detectEngineTier (already honors GBRAIN_HOME) so all gstack-side gbrain calls agree on which config file matters. Resolves baseEnv.HOME first, then homedir(), so test injection works without process-wide HOME mutation. - Adds test/build-gbrain-env.test.ts: 10 unit tests covering all five env-seeding branches (seed from config / override caller / GSTACK_RESPECT escape hatch / missing config / unparseable config / no database_url field / GBRAIN_HOME path / object-identity guard / unrelated-vars preservation / idempotent-when-matches). - Adds test/gbrain-exec-invariant.test.ts: static-source check that greps both bin/gstack-gbrain-sync.ts and bin/gstack-memory-ingest.ts for direct spawnSync("gbrain"\|spawn("gbrain"\|execFileSync("gbrain"\| execSync(...gbrain matches and fails the build if any are found. Refactor-proof against future contributors adding a new gbrain spawn without env threading. The invariant is intentionally narrow — only the two files where the DATABASE_URL bug actually hurts users are guarded. Migrating the spawn sites in lib/gbrain-local-status.ts, lib/gstack-memory-helpers.ts, and bin/gstack-brain-context-load.ts is a follow-up. Co-Authored-By: Jason Shultz <jasshultz@gmail.com> Co-Authored-By: Claude <noreply@anthropic.com> * fix(gbrain-sync): add .gbrain-source to consumer repo .gitignore (#1384) The v1.29.0.0 changelog promised .gbrain-source would be added to the consuming repo's .gitignore so the per-worktree pin stays local, but the change actually only added it to gstack's own .gitignore. Without the consumer-side entry, the pin gets committed and Conductor sibling worktrees of the same repo + branch step on each other's pin every time anyone commits. Add ensureGbrainSourceGitignored after a successful gbrain sources attach in runCodeImport. Idempotent on repeat runs (line-trim match), creates .gitignore if missing, logs a warning and continues on permission errors so a read-only checkout doesn't fail the sync. Gate the top-level main() call behind import.meta.main so tests can import the helper without triggering a full sync run on module load. Tests in test/gbrain-source-gitignore.test.ts cover: create-when-missing, append-without-trailing-newline, append-with-trailing-newline, idempotent on repeat, recognize whitespace-surrounded entry, no-throw on read-only file. 6 pass. * fix(gbrain-sources): bump gbrain sources list --json timeout 10s → 30s Supabase free-tier cold-starts can push `gbrain sources list --json` past 10s (observed 14.5s in the wild), causing probeSource() to throw ETIMEDOUT during /sync-gbrain code stage even though the underlying CLI was healthy. Matches the 30s ceiling already used by `sources add` / `sources remove` in the same file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(brain-allowlist): sync project-root eng-review-test-plan artifacts (#1452) Cherry-picked from #1465 by genisis0x and extended with the v1.40.0.0 upgrade migration that codex review #5 surfaced. #1465 alone only patches bin/gstack-artifacts-init, which means fresh installs and re-inits pick up the new pattern. But existing users who already ran v1.38.1.0 have a `.migrations/v1.38.1.0.done` marker — that migration won't re-run no matter what we change. So their installed `.brain-allowlist`, `.brain-privacy-map.json`, and `.gitattributes` stay without the new pattern, and `/plan-eng-review` artifacts continue to silently drop out of their federation queue. This commit: - bin/gstack-artifacts-init: adds projects//-eng-review-test-plan-.md to the three managed blocks. v1.38.1.0 covered design + test-plan; this completes the set for /plan-eng-review. - gstack-upgrade/migrations/v1.40.0.0.sh: targeted in-place repair for existing installs. Same idempotent jq-based shape as v1.38.1.0. Adds the new pattern to .brain-allowlist (before the USER ADDITIONS marker), .brain-privacy-map.json (as class=artifact), and .gitattributes (as merge=union). NEVER commits + pushes — the user controls when the patches ship to their federated artifacts repo. - test/artifacts-init-migration.test.ts: 5 new tests covering the v1.40.0.0 migration applied on top of a post-v1.38.1.0 state, jq patching, gitattributes append, idempotent re-run, and done-marker write when files are missing entirely. Co-Authored-By: Claude <noreply@anthropic.com> fix(gbrain-install): skip postinstall on Windows MSYS/MINGW + post-install probe Cherry-picked from #1487 by genisis0x and extended with the post-install subcommand probe per T6 / codex review #19. `bun install` in $INSTALL_DIR fails on Windows MSYS/MINGW/Cygwin shells because gbrain's native postinstall script mis-parses path arguments and aborts with a non-zero exit, breaking gstack-gbrain-install for Windows users running git-bash/MSYS2. The package installs cleanly without scripts. This commit: - Adds Windows shell detection via `uname -s` matching MINGW/MSYS/CYGWIN/Windows_NT (#1487's case statement already covers all four — codex review #18 confirmed MINGW is included). Windows paths get `bun install --ignore-scripts`; macOS and Linux unchanged. - Adds a post-install probe of `gbrain sources --help`. `gbrain --version` already runs (D19 PATH-shadowing validation), but version success doesn't prove the subcommand surface is reachable — and `--ignore-scripts` may have skipped artifacts that subcommands need. Probe failure logs a clear warning (with Windows-specific remediation pointing at re-running `bun install` outside MSYS) but does NOT exit non-zero; users may still get value from gbrain even if the probe fails transiently. Refs #1271 Co-Authored-By: Claude <noreply@anthropic.com> * chore: v1.40.0.0 — gbrain sync hardening wave Bumps VERSION 1.39.2.0 → 1.40.0.0 (MINOR — substantial gbrain capability hardening across sync pipeline, install path, federation allowlist; ~600 net LOC added across 8 community PRs + plan-review refinements). CHANGELOG entry follows the release-summary format: two-line headline, lead paragraph, "numbers that matter" with before/after table across 8 user-visible surfaces, "what this means for builders" closer, itemized Added/Changed/Fixed/NOT fixed/For contributors sections. Per-commit contributor credits: 0xDevNinja, drummerms, Jayesh Betala, Jason Shultz, genisis0x. Also names NikhileshNanduri and realcarsonterry in the wave's "Fixed" section for independent submissions of the .gbrain-source gitignore bug. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: 0xDevNinja <manmit0x@gmail.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: drummerms <mike@av2o.com> Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com> Co-authored-by: Jason Shultz <jasshultz@gmail.com> Co-authored-by: genisis0x <manietdavv@gmail.com>	2026-05-17 08:26:36 -07:00

4 Commits