v1.55.0.0 fix wave: gbrain data-loss guards + browser crash-loop + 6 more (#1808)

* fix(jsonl-merge): make equal-ts resolution converge across machines

The JSONL append merge driver sorted timestamped entries by (0, ts) with no
further tiebreaker. Equal-ts entries then fell back to stable-sort insertion
order (base, ours, theirs), but git assigns the local side to "ours", so two
machines resolving the same conflict emitted equal-ts lines in opposite order.
The merged files diverged and never converged. gstack-telemetry-log uses
second-granularity timestamps, so same-ts collisions are routine.

Add the line content as the final sort tiebreaker so the order is total and
side-independent. Add a regression test that runs the driver with the two
sides swapped and asserts identical output.

* fix(gen-skill-docs): quote frontmatter descriptions with interior colons (#1778)

Generated SKILL.md frontmatter emitted the catalog-trimmed description: as a
plain YAML scalar. A description with an interior ": " (e.g. "Ship workflow:
detect...") parses as a nested mapping under strict YAML loaders, so Codex/OpenAI
skill loading rejected those skills.

applyCatalogTrim now routes the value through toYamlInlineScalar, which quotes
(via JSON.stringify) only when a plain scalar would be invalid — interior ": ",
inline " #", leading indicator char, or surrounding whitespace. Strings that are
already valid plain scalars pass through unchanged to keep regen diffs small.

The frontmatter test now parses every generated block (Claude + Codex hosts) with
Bun.YAML.parse instead of string-checking that name:/description: substrings exist,
so the regression can't reappear. Runs under `bun test` (already in CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(skills): regenerate SKILL.md after frontmatter quoting fix (#1778)

9 catalog-trimmed descriptions whose values contain an interior colon or inline-
comment marker are now quoted. Generated output only; rerun of bun run gen:skill-docs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(gbrain-sources): centralize sources-list shape handling in parseSourcesList (#1576)

#1576's crash in sourceLocalPath was already fixed in v1.42.0.0 (dual-shape
handling). But the readers disagreed: sourceLocalPath accepted both the wrapped
{sources:[...]} object (v0.20+) and a bare array, while probeSource and
sourcePageCount accepted only the wrapped shape. Extract one parseSourcesList()
normalizer and route all three through it, so the shape assumption lives in a
single place. This is also the base the #1734 remote_url audit builds on.

parseSourcesList returns [] for null/garbage rather than throwing; callers treat
'no rows' as absent. New test/gbrain-sources-parse.test.ts pins both shapes plus
the garbage paths and confirms config.remote_url survives for the audit.

#1576 is closeable as already-fixed in v1.42.0.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(gbrain): spawn gbrain + brain-sync through a shell on Windows (#1731)

On Windows, bun/npm install gbrain as a gbrain.cmd/.ps1 shim and gstack-brain-sync
is a bash shebang script. spawnSync/spawn/execFileSync resolve neither without a
shell, so the child spawn failed ENOENT — on the sync orchestrator this surfaced
as 'brain-sync exited undefined' (#1731).

Add NEEDS_SHELL_ON_WINDOWS (process.platform === 'win32') in gbrain-exec and pass
it as shell: to every gbrain/brain-sync child spawn: spawnGbrain, spawnGbrainAsync,
execGbrainText (gbrain-exec), the two sources-list/remove/add spawns (gbrain-sources),
the version + probe spawns (gbrain-local-status), and the two brain-sync spawns in
the orchestrator. POSIX keeps the cheaper no-shell path.

macOS/Linux CI can't exercise the Windows path, so test/gbrain-spawn-windows-shell.ts
is a static-grep tripwire: it fails CI if a gbrain/brain-sync spawn is added without
the shell flag.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(catalog-trim): expect YAML-quoted descriptions with interior colons (#1778)

The quoting fix wraps colon-bearing catalog descriptions in double quotes;
two catalog-trim assertions still pinned the old unquoted form. Tolerate the
optional quotes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(gbrain-sync): defensive guards against destructive gbrain ops (#1734)

The orchestrator shelled out to gbrain's destructive subcommands as if they were
safe. gbrain can rm-rf a user's working tree during an autopilot race (its own
bug, upstream gbrain #1526); gstack now defends itself. New lib/gbrain-guards.ts
gates the two destructive reach points, all checked immediately before the op:

- Autopilot refuse (multi-signal, affirmative-only): refuse a destructive op when
  a live 'gbrain autopilot' process (primary) or a known autopilot lock file
  (secondary; checked under both GBRAIN_HOME and ~/.gbrain since gbrain #1226
  ignores GBRAIN_HOME) is present. No signal → proceed; inability to introspect
  never bricks a normal sync.
- sources remove: routed through safeSourcesRemove → decideSourceRemove. Fail
  CLOSED — refuse to remove a user-managed source (remote_url set, local_path
  outside gbrain's clones) when gbrain has no --keep-storage to protect the files
  (it doesn't in 0.41.x). Also fail closed when the source list can't be read.
  Path containment uses realpath so a symlink can't smuggle a delete out of clones.
- sync --strategy code: decideCodeSync refuses URL-managed sources (remote_url
  set) unless --allow-reclone is passed, since the walk can auto-reclone (rm-rf).

Capability detection memoizes per process keyed to gbrain's identity (no stale
persistent cache); --keep-storage can't be probed (generic help) so it defaults
unsupported → fail closed. Every guard surfaces a visible reason; autopilot/reclone
refusals fail the code stage (verdict ERR) rather than silently skipping protection.

test/gbrain-guards.test.ts covers all branches hermetically (injected rows + probe
overrides): autopilot signals, fail-closed remove, keep-storage path, reclone gate,
realpath/symlink containment. Supersedes #1736 (which guarded a nonexistent path).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(sync-gbrain): warn against running during autopilot; prefer --path sources (#1734)

Adds a Safety note to the /sync-gbrain guidance (template + regenerated SKILL.md +
this repo's CLAUDE.md): don't run while autopilot is active, and prefer
`gbrain sources add --path` over URL-managed sources, which can auto-reclone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(memory-ingest): configurable import timeout + resume-on-timeout messaging (#1611)

The gbrain import (the long pole on big brains) had a hardcoded 30-min timeout,
so large memory corpora got SIGTERM'd mid-import on /sync-gbrain --full. Make it
configurable via GSTACK_INGEST_TIMEOUT_MS (default 30 min, validated 1min–24h).

gstack can't drive gbrain's internal resume, but the existing SIGTERM forwarder
already preserves gbrain's import-checkpoint.json, so the next run resumes. On a
timeout we now say so explicitly ('checkpoint preserved — re-run /sync-gbrain to
resume, raise GSTACK_INGEST_TIMEOUT_MS for big brains') instead of surfacing a
bare 'exited null'. True gstack-driven ingest-resume is deferred to gbrain
(.context/gbrain-asks.md).

Also guards the module's main() behind import.meta.main so resolveImportTimeoutMs
is unit-testable; the orchestrator runs it as a subprocess where main still fires.
New test/memory-ingest-timeout.test.ts pins default/override/invalid resolution.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(browse): stop the headed daemon crash-loop + silent headless downgrade (#1781)

A headed session against a beacon-heavy page (analytics/extension load) could tip
the single-threaded daemon into a self-inflicted crash-loop: a brief HTTP stall
was read as a crash, the restart didn't clear the dead Chromium's SingletonLock,
the relaunch failed, and the session silently came back headless. Four fixes:

1. Busy-vs-dead (sendCommand): on a connection error, if the process is alive give
   /health a bounded probe (3x/250ms) and just retry the command — never kill+restart
   a live-but-busy server. A 30s timeout now reports 'busy, not restarting' when the
   process is alive instead of exiting into a kill cycle.
2. Profile-lock cleanup on (re)start: startServer reaps the orphaned Chromium holding
   the SingletonLock and clears Singleton{Lock,Socket,Cookie} before relaunch, so the
   auto-restart path gets the same clean profile the manual connect preamble did.
3. Headed persistence: the restart env reapplies BROWSE_HEADED from this invocation OR
   the persisted server state (mode==='headed'), so a restart from a plain command
   never downgrades a headed window to invisible headless. Extracted to buildRestartEnv.
4. Force-clean disconnect reaps the Chromium child tree (via the SingletonLock PID) so
   the next connect starts clean instead of fighting an orphan.

Plus macOS window surfacing: connect + focus raise 'Google Chrome for Testing' to the
active Space (best-effort osascript) with a Mission Control hint — the first thing
users read as 'I can't see the browser'.

Shared lock helpers (chromiumProfileDir / cleanChromiumProfileLocks / killOrphanChromium)
dedupe the connect, disconnect, and restart paths. browse/test/restart-env.test.ts pins
the headed-persistence decision; the full crash-loop repro is an E2E (periodic).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(gbrain-install): remove the v0.18.2 pin, install latest + version floor + doctor self-test (#1744)

The installer pinned gbrain at v0.18.2 while gbrain shipped v0.41.x — ~23 versions
behind. Remove the hard pin: a fresh clone now stays on the latest default-branch
HEAD. --pinned-commit <sha> still pins for reproducibility.

Unpinning removes the version gate the pin provided, so add two install-time gates
that fail closed (exit 3, matching the existing PATH-shadow/version-mismatch posture):
- MIN_GBRAIN_VERSION floor (0.20.0, the sources-list/federated surface gstack needs):
  refuse an install below it.
- gbrain doctor --fast self-test when a brain config already exists (re-install /
  detected clone): refuse to leave a broken gbrain in place. Pre-init installs skip
  it; the full /sync-gbrain --dry-run self-test runs from /setup-gbrain after init.

Docs updated (USING_GBRAIN_WITH_GSTACK.md no longer says 'edit PINNED_COMMIT').
Detect-install tests bump the success-path fixtures above the floor and add a
below-floor exit-3 test. The gbrain-side asks (root #1526 fix, --keep-storage,
remove-lease, capability command, ingest-resume, integration CI) are written to
.context/gbrain-asks.md for filing against garrytan/gbrain.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(#1778): update claude-ship golden + catalog-mode assertions for quoted descriptions

ship's catalog description ('Ship workflow: detect...') has an interior colon, so
the #1778 fix now YAML-quotes it. Refresh the claude-ship golden baseline to the
quoted output and make the catalog-mode-full trim/restore assertions quote-tolerant.
codex/factory ship goldens are unaffected (they use block-scalar descriptions).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(gen-skill-docs): use function replacer so a $ in a description can't corrupt frontmatter (#1778)

String.prototype.replace treats $&/$1/$` in the replacement as patterns. A future
skill description containing $ (e.g. referencing $B/$D) would silently corrupt the
generated frontmatter. Use a function replacer. Behavior-preserving for all current
descriptions (regen produces no diff).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.55.0.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(gbrain): document configurable memory-ingest timeout for v1.55.0.0

USING_GBRAIN_WITH_GSTACK.md: note GSTACK_INGEST_TIMEOUT_MS (default 30 min,
1 min-24h range) on the /sync-gbrain memory stage, plus checkpoint-resume on
timeout. Fills the reference gap left by the configurable-import-timeout fix
(#1611) shipped in v1.55.0.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-05-30 14:57:07 -07:00
committed by GitHub
parent b88223677b
commit 3bef43bc5a
37 changed files with 1241 additions and 116 deletions
+61 -8
View File
@@ -19,9 +19,14 @@
# - git
# - network reachability to https://github.com
#
# The pinned commit is declared here rather than resolved dynamically so
# upgrades are explicit and reviewable. Update PINNED_COMMIT when gstack
# verifies compatibility with a new gbrain release.
# gbrain installs at the latest default-branch HEAD by default — the hard pin
# was removed in #1744 (it had drifted ~23 versions behind). Pass
# --pinned-commit <sha> to install a specific commit for reproducibility. A
# minimum-version floor (MIN_GBRAIN_VERSION) hard-fails the install when the
# resulting gbrain is too old for gstack's sync integration, and a fast
# `gbrain doctor` self-test hard-fails a broken install when gbrain is already
# configured. This keeps the version gate that the pin used to provide without
# freezing users 23 releases behind.
#
# Env:
# GBRAIN_INSTALL_DIR — override default install path (~/gbrain)
@@ -33,8 +38,14 @@
set -euo pipefail
# --- defaults ---
PINNED_COMMIT="08b3698e90532b7b66c445e6b1d8cdfe71822802" # gbrain v0.18.2
PINNED_TAG="v0.18.2"
# No version pin by default — install the latest default-branch HEAD (#1744).
# --pinned-commit <sha> overrides for reproducibility.
PINNED_COMMIT=""
PINNED_TAG=""
# Minimum gbrain version gstack's integration is known to work with. The
# `sources list --json` wrapped-object shape + federated sources landed by 0.20;
# older predates the surface gstack drives. Hard-fail below this floor (#1744).
MIN_GBRAIN_VERSION="0.20.0"
GBRAIN_REPO_URL="https://github.com/garrytan/gbrain.git"
DEFAULT_INSTALL_DIR="${GBRAIN_INSTALL_DIR:-$HOME/gbrain}"
INSTALL_DIR="$DEFAULT_INSTALL_DIR"
@@ -113,7 +124,7 @@ elif [ -n "$DETECTED_CLONE" ]; then
else
# Fresh clone path.
if $DRY_RUN; then
log "DRY RUN: would clone $GBRAIN_REPO_URL @ $PINNED_COMMIT → $INSTALL_DIR"
log "DRY RUN: would clone $GBRAIN_REPO_URL ${PINNED_COMMIT:+@ $PINNED_COMMIT }→ $INSTALL_DIR (latest HEAD unless --pinned-commit)"
exit 0
fi
if [ -d "$INSTALL_DIR" ]; then
@@ -121,8 +132,12 @@ else
fi
log "cloning $GBRAIN_REPO_URL → $INSTALL_DIR"
git clone --quiet "$GBRAIN_REPO_URL" "$INSTALL_DIR"
( cd "$INSTALL_DIR" && git checkout --quiet "$PINNED_COMMIT" )
log "pinned to $PINNED_COMMIT${PINNED_TAG:+ ($PINNED_TAG)}"
if [ -n "$PINNED_COMMIT" ]; then
( cd "$INSTALL_DIR" && git checkout --quiet "$PINNED_COMMIT" )
log "checked out pinned commit $PINNED_COMMIT${PINNED_TAG:+ ($PINNED_TAG)}"
else
log "installed latest gbrain (default-branch HEAD)"
fi
fi
if $DRY_RUN; then
@@ -195,6 +210,44 @@ fi
log "installed gbrain $actual_version from $INSTALL_DIR"
# --- minimum-version floor (#1744) ---
# Unpinning means new installs track gbrain HEAD. Hard-fail if the resulting
# version is below the floor gstack's sync integration needs — same exit-3 posture
# as the PATH-shadow / version-mismatch failures above. A warning here is exactly
# how the data-loss class slipped through, so this gate fails closed.
version_lt() {
# 0 (true) when $1 < $2 by version sort; equal versions are NOT less-than.
[ "$1" = "$2" ] && return 1
[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -1)" = "$1" ]
}
if version_lt "$actual_norm" "$MIN_GBRAIN_VERSION"; then
echo "" >&2
echo "gstack-gbrain-install: gbrain $actual_version is below the minimum gstack-tested version ($MIN_GBRAIN_VERSION)." >&2
echo " gstack's sync integration needs the v0.20+ source/list surface." >&2
echo " Fix: update the gbrain clone at $INSTALL_DIR to a newer release (git pull), then" >&2
echo " re-run /setup-gbrain. Or pass --pinned-commit <sha> to install a specific newer commit." >&2
echo "" >&2
exit 3
fi
# --- functional self-test when gbrain is already configured (#1744) ---
# When a brain config exists (re-install / detected clone), run a fast doctor as
# a hard gate so a broken gbrain is caught at setup, not at data-loss time.
# Pre-init installs skip this (config not written yet); the full
# `/sync-gbrain --dry-run` self-test runs from /setup-gbrain after `gbrain init`.
_GBRAIN_HOME_CHECK="${GBRAIN_HOME:-$HOME/.gbrain}"
if [ -f "$_GBRAIN_HOME_CHECK/config.json" ]; then
if ! gbrain doctor --fast >/dev/null 2>&1; then
echo "" >&2
echo "gstack-gbrain-install: gbrain $actual_version installed but 'gbrain doctor --fast' failed." >&2
echo " Refusing to leave a broken gbrain in place. Run 'gbrain doctor' to see what's wrong," >&2
echo " fix it, then re-run /setup-gbrain." >&2
echo "" >&2
exit 3
fi
log "gbrain doctor --fast passed"
fi
# v1.40.0.0 post-install validation (T6 / codex review #19): --ignore-scripts
# may skip artifacts gbrain needs at runtime, especially on Windows
# MSYS/MINGW where we DID pass --ignore-scripts. `gbrain --version` above
+85 -25
View File
@@ -37,9 +37,10 @@ import { createHash } from "crypto";
import "../lib/conductor-env-shim";
import { detectEngineTier, withErrorContext, canonicalizeRemote } from "../lib/gstack-memory-helpers";
import { ensureSourceRegistered, sourcePageCount } from "../lib/gbrain-sources";
import { ensureSourceRegistered, sourcePageCount, parseSourcesList } from "../lib/gbrain-sources";
import { detectAutopilot, decideSourceRemove, decideCodeSync } from "../lib/gbrain-guards";
import { localEngineStatus, type LocalEngineStatus } from "../lib/gbrain-local-status";
import { buildGbrainEnv, spawnGbrain, execGbrainJson } from "../lib/gbrain-exec";
import { buildGbrainEnv, spawnGbrain, execGbrainJson, NEEDS_SHELL_ON_WINDOWS } from "../lib/gbrain-exec";
// ── Types ──────────────────────────────────────────────────────────────────
@@ -52,6 +53,8 @@ interface CliArgs {
noMemory: boolean;
noBrainSync: boolean;
codeOnly: boolean;
/** #1734: opt-in to sync a URL-managed source whose code walk may auto-reclone. */
allowReclone: boolean;
}
interface CodeStageDetail {
@@ -59,7 +62,7 @@ interface CodeStageDetail {
source_path?: string;
page_count?: number | null;
last_imported?: string;
status?: "ok" | "skipped" | "failed";
status?: "ok" | "skipped" | "failed" | "refused-autopilot" | "refused-reclone";
}
interface StageResult {
@@ -205,6 +208,8 @@ Options:
--no-memory Skip the gstack-memory-ingest stage (transcripts + artifacts).
--no-brain-sync Skip the gstack-brain-sync git pipeline stage.
--code-only Only run the code-import stage (alias for --no-memory --no-brain-sync).
--allow-reclone Permit the code walk for URL-managed sources (remote_url set)
even though gbrain may auto-reclone the working tree (#1734).
--help This text.
Stages run in order: code → memory ingest → curated git push.
@@ -220,6 +225,7 @@ function parseArgs(): CliArgs {
let noMemory = false;
let noBrainSync = false;
let codeOnly = false;
let allowReclone = false;
for (let i = 0; i < args.length; i++) {
const a = args[i];
@@ -231,6 +237,7 @@ function parseArgs(): CliArgs {
case "--no-code": noCode = true; break;
case "--no-memory": noMemory = true; break;
case "--no-brain-sync": noBrainSync = true; break;
case "--allow-reclone": allowReclone = true; break;
case "--code-only":
codeOnly = true;
noMemory = true;
@@ -247,7 +254,7 @@ function parseArgs(): CliArgs {
}
}
return { mode, quiet, noCode, noMemory, noBrainSync, codeOnly };
return { mode, quiet, noCode, noMemory, noBrainSync, codeOnly, allowReclone };
}
// ── Helpers ────────────────────────────────────────────────────────────────
@@ -407,10 +414,7 @@ export function sourceLocalPath(sourceId: string, env?: NodeJS.ProcessEnv): stri
{ baseEnv: env },
);
if (!raw) return null;
const list: Array<{ id?: string; local_path?: string }> = Array.isArray(raw)
? (raw as Array<{ id?: string; local_path?: string }>)
: ((raw as { sources?: Array<{ id?: string; local_path?: string }> }).sources ?? []);
const found = list.find((s) => s.id === sourceId);
const found = parseSourcesList(raw).find((s) => s.id === sourceId);
return found?.local_path ?? null;
}
@@ -469,20 +473,50 @@ export function planHostnameFoldMigration(
return { kind: "pending-cleanup", oldId: legacyPathHashId };
}
export interface GuardedRemoveResult {
removed: boolean;
/** True when a guard refused the remove (autopilot active or unsafe source). */
skipped: boolean;
reason: string;
}
/**
* #1734: run `gbrain sources remove <id> --confirm-destructive` only behind the
* data-loss guards. Checked immediately before the destructive op (E8: as late
* as possible) so the autopilot window is as small as we can make it without a
* gbrain-side lease. Refuses when autopilot is active or when the source is
* user-managed and gbrain can't keep its storage. Pure side-effect helper; the
* caller decides whether a skip is fatal (it never is today — removes are
* best-effort cleanup).
*/
export function safeSourcesRemove(sourceId: string, env?: NodeJS.ProcessEnv): GuardedRemoveResult {
const ap = detectAutopilot(env);
if (ap.active) {
return {
removed: false,
skipped: true,
reason: `autopilot active (${ap.signal}); refusing destructive remove of ${sourceId}. ` +
`Stop autopilot, then re-run /sync-gbrain.`,
};
}
const decision = decideSourceRemove(sourceId, env);
if (!decision.allow) {
return { removed: false, skipped: true, reason: decision.reason };
}
const r = spawnGbrain(
["sources", "remove", sourceId, "--confirm-destructive", ...decision.extraArgs],
{ baseEnv: env },
);
return { removed: r.status === 0, skipped: false, reason: decision.reason };
}
/**
* Remove an orphaned source. Called only after new-source sync verifies pages
* exist, so the old source is provably redundant before deletion.
*
* Flag note: existing call sites used `--confirm-destructive` here and
* `--yes` in `lib/gbrain-sources.ts` — gbrain 0.35.0.0 accepts neither
* deterministically (the subcommand surface help is generic). We pass
* `--confirm-destructive` to match the existing call site convention; the
* flag-helper centralization in commit 4 (lib/gbrain-exec.ts) will resolve
* the inconsistency across the codebase.
* exist, so the old source is provably redundant before deletion. Routed through
* safeSourcesRemove for the #1734 guards.
*/
export function removeOrphanedSource(oldId: string, env?: NodeJS.ProcessEnv): boolean {
const r = spawnGbrain(["sources", "remove", oldId, "--confirm-destructive"], { baseEnv: env });
return r.status === 0;
return safeSourcesRemove(oldId, env).removed;
}
/**
@@ -661,13 +695,12 @@ async function runCodeImport(args: CliArgs): Promise<StageResult> {
const legacyId = deriveLegacyCodeSourceId(root);
let legacyRemoved = false;
if (legacyId !== sourceId) {
const rm = spawnGbrain(["sources", "remove", legacyId, "--confirm-destructive"], {
timeout: 30_000,
baseEnv: gbrainEnv,
});
// Treat absent-source as success (clean state). gbrain emits "not found" on
// missing id; treat any non-zero exit without "not found" as a soft fail.
if (rm.status === 0) legacyRemoved = true;
// #1734: route through the data-loss guards (autopilot + source-safety).
const rm = safeSourcesRemove(legacyId, gbrainEnv);
if (rm.skipped && !args.quiet) {
console.error(`[sync:code] legacy-source cleanup skipped: ${rm.reason}`);
}
if (rm.removed) legacyRemoved = true;
}
// Step 0b: Hostname-fold migration (#1414).
@@ -720,6 +753,29 @@ async function runCodeImport(args: CliArgs): Promise<StageResult> {
process.env.GSTACK_SYNC_CODE_TIMEOUT_MS,
"GSTACK_SYNC_CODE_TIMEOUT_MS",
);
// #1734 guards, checked immediately before the destructive walk (E8):
// - autopilot active → refuse (the race that wiped a working tree).
// - URL-managed source → the walk can auto-reclone (rm-rf); require
// --allow-reclone. Both surface a visible reason and fail the stage so the
// verdict shows ERR rather than silently skipping protection.
const apBeforeWalk = detectAutopilot(gbrainEnv);
if (apBeforeWalk.active) {
return {
name: "code", ran: true, ok: false, duration_ms: Date.now() - t0,
summary: `refused: gbrain autopilot active (${apBeforeWalk.signal}). Stop autopilot, then re-run /sync-gbrain.`,
detail: { source_id: sourceId, source_path: root, status: "refused-autopilot" },
};
}
const reclone = decideCodeSync(sourceId, gbrainEnv, args.allowReclone);
if (!reclone.allow) {
return {
name: "code", ran: true, ok: false, duration_ms: Date.now() - t0,
summary: `refused: ${reclone.reason}`,
detail: { source_id: sourceId, source_path: root, status: "refused-reclone" },
};
}
const walkResult = spawnGbrain(["sync", "--strategy", "code", "--source", sourceId], {
stdio: args.quiet ? ["ignore", "ignore", "ignore"] : ["ignore", "inherit", "inherit"],
timeout: codeTimeoutMs,
@@ -961,13 +1017,17 @@ function runBrainSyncPush(args: CliArgs): StageResult {
return { name: "brain-sync", ran: false, ok: true, duration_ms: 0, summary: "skipped (gstack-brain-sync not installed)" };
}
// #1731: gstack-brain-sync is a bash shebang script; Windows can't spawn it
// without a shell, which surfaced as "brain-sync exited undefined".
spawnSync(brainSyncPath, ["--discover-new"], {
stdio: args.quiet ? ["ignore", "ignore", "ignore"] : ["ignore", "inherit", "inherit"],
timeout: 60 * 1000,
shell: NEEDS_SHELL_ON_WINDOWS,
});
const result = spawnSync(brainSyncPath, ["--once"], {
stdio: args.quiet ? ["ignore", "ignore", "ignore"] : ["ignore", "inherit", "inherit"],
timeout: 60 * 1000,
shell: NEEDS_SHELL_ON_WINDOWS,
});
return {
+10 -3
View File
@@ -53,18 +53,25 @@ for path in paths:
continue
if line in seen:
continue
# Prefer ISO ts field for sort; fall back to SHA-256.
# Prefer ISO ts field for sort; fall back to SHA-256. The line
# content is the final tiebreaker so the order is total: two
# entries sharing a ts must resolve identically regardless of
# which side they arrive on. Without it, equal-ts entries fall
# back to insertion order (base, ours, theirs), and since ours
# and theirs are swapped depending on which machine runs the
# merge, the two sides produce divergent files that never
# converge.
sort_key = None
try:
obj = json.loads(line)
ts = obj.get('ts') or obj.get('timestamp')
if isinstance(ts, str):
sort_key = (0, ts)
sort_key = (0, ts, line)
except (json.JSONDecodeError, ValueError, TypeError):
pass
if sort_key is None:
h = hashlib.sha256(line.encode('utf-8')).hexdigest()
sort_key = (1, h)
sort_key = (1, h, line)
seen[line] = sort_key
except FileNotFoundError:
# Absent base / absent ours / absent theirs are all valid.
+55 -6
View File
@@ -1349,10 +1349,32 @@ function installSignalForwarder(): void {
* that kill the child on parent SIGTERM/SIGINT. Returns the same shape as
* spawnSync's result so the caller doesn't care which mode was used.
*/
/**
* #1611: the `gbrain import` is the long pole on big brains. Its timeout is
* configurable via GSTACK_INGEST_TIMEOUT_MS (default 30 min, 1min24h) so large
* memory corpora aren't SIGTERM'd mid-import. On timeout we SIGTERM the child,
* which preserves gbrain's import-checkpoint.json (see installSignalForwarder)
* so the next run resumes instead of restarting from scratch.
*/
const DEFAULT_IMPORT_TIMEOUT_MS = 30 * 60 * 1000;
export function resolveImportTimeoutMs(
raw: string | undefined = process.env.GSTACK_INGEST_TIMEOUT_MS,
): number {
if (raw === undefined || raw === "") return DEFAULT_IMPORT_TIMEOUT_MS;
const n = Number.parseInt(raw, 10);
if (!Number.isFinite(n) || Number.isNaN(n) || n < 60_000 || n > 86_400_000) {
console.error(
`[memory-ingest] GSTACK_INGEST_TIMEOUT_MS="${raw}" invalid (need 6000086400000ms); using ${DEFAULT_IMPORT_TIMEOUT_MS}ms`,
);
return DEFAULT_IMPORT_TIMEOUT_MS;
}
return n;
}
function runGbrainImport(
stagingDir: string,
timeoutMs: number,
): Promise<{ status: number | null; stdout: string; stderr: string }> {
): Promise<{ status: number | null; stdout: string; stderr: string; timedOut: boolean }> {
installSignalForwarder();
return new Promise((resolve) => {
// Seed DATABASE_URL from gbrain's own config so this stage works
@@ -1385,6 +1407,7 @@ function runGbrainImport(
status: timedOut ? null : status,
stdout,
stderr,
timedOut,
});
});
child.on("error", (err) => {
@@ -1394,6 +1417,7 @@ function runGbrainImport(
status: null,
stdout,
stderr: stderr + `\n[spawn-error] ${(err as Error).message}`,
timedOut,
});
});
});
@@ -1608,13 +1632,33 @@ async function ingestPass(args: CliArgs): Promise<BulkResult> {
// spawn, parent termination orphans the gbrain process (observed
// during 2026-05-10 cold-run testing — gbrain kept running 15 min
// after the orchestrator timed out).
const importResult = await runGbrainImport(stagingDir, 30 * 60 * 1000);
const importResult = await runGbrainImport(stagingDir, resolveImportTimeoutMs());
const stdout = importResult.stdout || "";
const stderr = importResult.stderr || "";
const importJson = parseImportJson(stdout);
if (importResult.status !== 0) {
// #1611: on timeout, gbrain's import-checkpoint.json is preserved (the
// SIGTERM forwarder keeps the staging dir), so the next /sync-gbrain
// resumes rather than restarting. Tell the user instead of looking failed.
if (importResult.timedOut) {
const mins = Math.round(resolveImportTimeoutMs() / 60000);
const msg =
`gbrain import timed out after ${mins}min; checkpoint preserved — re-run ` +
`/sync-gbrain to resume (raise GSTACK_INGEST_TIMEOUT_MS for big brains)`;
console.error(`[memory-ingest] ${msg}`);
return {
written: 0,
skipped_secret: prep.skippedSecret,
skipped_dedup: prep.skippedDedup,
skipped_unattributed: prep.skippedUnattributed,
failed,
duration_ms: Date.now() - t0,
partial_pages: prep.partialPages,
system_error: msg,
};
}
const tail = (stderr.trim().split("\n").pop() || "").slice(0, 300);
const msg = `gbrain import exited ${importResult.status}: ${tail}`;
console.error(`[memory-ingest] ERR: ${msg}`);
@@ -1810,7 +1854,12 @@ async function main(): Promise<void> {
if (result.system_error) process.exit(1);
}
main().catch((err) => {
console.error(`gstack-memory-ingest fatal: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
});
// Guard so the module is import-safe for unit tests (e.g. resolveImportTimeoutMs).
// The orchestrator runs it as `bun gstack-memory-ingest.ts ...`, where
// import.meta.main is true, so the CLI path is unaffected.
if (import.meta.main) {
main().catch((err) => {
console.error(`gstack-memory-ingest fatal: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
});
}