mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-05 01:28:15 +02:00
v1.55.0.0 fix wave: gbrain data-loss guards + browser crash-loop + 6 more (#1808)
* fix(jsonl-merge): make equal-ts resolution converge across machines The JSONL append merge driver sorted timestamped entries by (0, ts) with no further tiebreaker. Equal-ts entries then fell back to stable-sort insertion order (base, ours, theirs), but git assigns the local side to "ours", so two machines resolving the same conflict emitted equal-ts lines in opposite order. The merged files diverged and never converged. gstack-telemetry-log uses second-granularity timestamps, so same-ts collisions are routine. Add the line content as the final sort tiebreaker so the order is total and side-independent. Add a regression test that runs the driver with the two sides swapped and asserts identical output. * fix(gen-skill-docs): quote frontmatter descriptions with interior colons (#1778) Generated SKILL.md frontmatter emitted the catalog-trimmed description: as a plain YAML scalar. A description with an interior ": " (e.g. "Ship workflow: detect...") parses as a nested mapping under strict YAML loaders, so Codex/OpenAI skill loading rejected those skills. applyCatalogTrim now routes the value through toYamlInlineScalar, which quotes (via JSON.stringify) only when a plain scalar would be invalid — interior ": ", inline " #", leading indicator char, or surrounding whitespace. Strings that are already valid plain scalars pass through unchanged to keep regen diffs small. The frontmatter test now parses every generated block (Claude + Codex hosts) with Bun.YAML.parse instead of string-checking that name:/description: substrings exist, so the regression can't reappear. Runs under `bun test` (already in CI). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore(skills): regenerate SKILL.md after frontmatter quoting fix (#1778) 9 catalog-trimmed descriptions whose values contain an interior colon or inline- comment marker are now quoted. Generated output only; rerun of bun run gen:skill-docs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(gbrain-sources): centralize sources-list shape handling in parseSourcesList (#1576) #1576's crash in sourceLocalPath was already fixed in v1.42.0.0 (dual-shape handling). But the readers disagreed: sourceLocalPath accepted both the wrapped {sources:[...]} object (v0.20+) and a bare array, while probeSource and sourcePageCount accepted only the wrapped shape. Extract one parseSourcesList() normalizer and route all three through it, so the shape assumption lives in a single place. This is also the base the #1734 remote_url audit builds on. parseSourcesList returns [] for null/garbage rather than throwing; callers treat 'no rows' as absent. New test/gbrain-sources-parse.test.ts pins both shapes plus the garbage paths and confirms config.remote_url survives for the audit. #1576 is closeable as already-fixed in v1.42.0.0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(gbrain): spawn gbrain + brain-sync through a shell on Windows (#1731) On Windows, bun/npm install gbrain as a gbrain.cmd/.ps1 shim and gstack-brain-sync is a bash shebang script. spawnSync/spawn/execFileSync resolve neither without a shell, so the child spawn failed ENOENT — on the sync orchestrator this surfaced as 'brain-sync exited undefined' (#1731). Add NEEDS_SHELL_ON_WINDOWS (process.platform === 'win32') in gbrain-exec and pass it as shell: to every gbrain/brain-sync child spawn: spawnGbrain, spawnGbrainAsync, execGbrainText (gbrain-exec), the two sources-list/remove/add spawns (gbrain-sources), the version + probe spawns (gbrain-local-status), and the two brain-sync spawns in the orchestrator. POSIX keeps the cheaper no-shell path. macOS/Linux CI can't exercise the Windows path, so test/gbrain-spawn-windows-shell.ts is a static-grep tripwire: it fails CI if a gbrain/brain-sync spawn is added without the shell flag. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(catalog-trim): expect YAML-quoted descriptions with interior colons (#1778) The quoting fix wraps colon-bearing catalog descriptions in double quotes; two catalog-trim assertions still pinned the old unquoted form. Tolerate the optional quotes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(gbrain-sync): defensive guards against destructive gbrain ops (#1734) The orchestrator shelled out to gbrain's destructive subcommands as if they were safe. gbrain can rm-rf a user's working tree during an autopilot race (its own bug, upstream gbrain #1526); gstack now defends itself. New lib/gbrain-guards.ts gates the two destructive reach points, all checked immediately before the op: - Autopilot refuse (multi-signal, affirmative-only): refuse a destructive op when a live 'gbrain autopilot' process (primary) or a known autopilot lock file (secondary; checked under both GBRAIN_HOME and ~/.gbrain since gbrain #1226 ignores GBRAIN_HOME) is present. No signal → proceed; inability to introspect never bricks a normal sync. - sources remove: routed through safeSourcesRemove → decideSourceRemove. Fail CLOSED — refuse to remove a user-managed source (remote_url set, local_path outside gbrain's clones) when gbrain has no --keep-storage to protect the files (it doesn't in 0.41.x). Also fail closed when the source list can't be read. Path containment uses realpath so a symlink can't smuggle a delete out of clones. - sync --strategy code: decideCodeSync refuses URL-managed sources (remote_url set) unless --allow-reclone is passed, since the walk can auto-reclone (rm-rf). Capability detection memoizes per process keyed to gbrain's identity (no stale persistent cache); --keep-storage can't be probed (generic help) so it defaults unsupported → fail closed. Every guard surfaces a visible reason; autopilot/reclone refusals fail the code stage (verdict ERR) rather than silently skipping protection. test/gbrain-guards.test.ts covers all branches hermetically (injected rows + probe overrides): autopilot signals, fail-closed remove, keep-storage path, reclone gate, realpath/symlink containment. Supersedes #1736 (which guarded a nonexistent path). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(sync-gbrain): warn against running during autopilot; prefer --path sources (#1734) Adds a Safety note to the /sync-gbrain guidance (template + regenerated SKILL.md + this repo's CLAUDE.md): don't run while autopilot is active, and prefer `gbrain sources add --path` over URL-managed sources, which can auto-reclone. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(memory-ingest): configurable import timeout + resume-on-timeout messaging (#1611) The gbrain import (the long pole on big brains) had a hardcoded 30-min timeout, so large memory corpora got SIGTERM'd mid-import on /sync-gbrain --full. Make it configurable via GSTACK_INGEST_TIMEOUT_MS (default 30 min, validated 1min–24h). gstack can't drive gbrain's internal resume, but the existing SIGTERM forwarder already preserves gbrain's import-checkpoint.json, so the next run resumes. On a timeout we now say so explicitly ('checkpoint preserved — re-run /sync-gbrain to resume, raise GSTACK_INGEST_TIMEOUT_MS for big brains') instead of surfacing a bare 'exited null'. True gstack-driven ingest-resume is deferred to gbrain (.context/gbrain-asks.md). Also guards the module's main() behind import.meta.main so resolveImportTimeoutMs is unit-testable; the orchestrator runs it as a subprocess where main still fires. New test/memory-ingest-timeout.test.ts pins default/override/invalid resolution. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(browse): stop the headed daemon crash-loop + silent headless downgrade (#1781) A headed session against a beacon-heavy page (analytics/extension load) could tip the single-threaded daemon into a self-inflicted crash-loop: a brief HTTP stall was read as a crash, the restart didn't clear the dead Chromium's SingletonLock, the relaunch failed, and the session silently came back headless. Four fixes: 1. Busy-vs-dead (sendCommand): on a connection error, if the process is alive give /health a bounded probe (3x/250ms) and just retry the command — never kill+restart a live-but-busy server. A 30s timeout now reports 'busy, not restarting' when the process is alive instead of exiting into a kill cycle. 2. Profile-lock cleanup on (re)start: startServer reaps the orphaned Chromium holding the SingletonLock and clears Singleton{Lock,Socket,Cookie} before relaunch, so the auto-restart path gets the same clean profile the manual connect preamble did. 3. Headed persistence: the restart env reapplies BROWSE_HEADED from this invocation OR the persisted server state (mode==='headed'), so a restart from a plain command never downgrades a headed window to invisible headless. Extracted to buildRestartEnv. 4. Force-clean disconnect reaps the Chromium child tree (via the SingletonLock PID) so the next connect starts clean instead of fighting an orphan. Plus macOS window surfacing: connect + focus raise 'Google Chrome for Testing' to the active Space (best-effort osascript) with a Mission Control hint — the first thing users read as 'I can't see the browser'. Shared lock helpers (chromiumProfileDir / cleanChromiumProfileLocks / killOrphanChromium) dedupe the connect, disconnect, and restart paths. browse/test/restart-env.test.ts pins the headed-persistence decision; the full crash-loop repro is an E2E (periodic). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(gbrain-install): remove the v0.18.2 pin, install latest + version floor + doctor self-test (#1744) The installer pinned gbrain at v0.18.2 while gbrain shipped v0.41.x — ~23 versions behind. Remove the hard pin: a fresh clone now stays on the latest default-branch HEAD. --pinned-commit <sha> still pins for reproducibility. Unpinning removes the version gate the pin provided, so add two install-time gates that fail closed (exit 3, matching the existing PATH-shadow/version-mismatch posture): - MIN_GBRAIN_VERSION floor (0.20.0, the sources-list/federated surface gstack needs): refuse an install below it. - gbrain doctor --fast self-test when a brain config already exists (re-install / detected clone): refuse to leave a broken gbrain in place. Pre-init installs skip it; the full /sync-gbrain --dry-run self-test runs from /setup-gbrain after init. Docs updated (USING_GBRAIN_WITH_GSTACK.md no longer says 'edit PINNED_COMMIT'). Detect-install tests bump the success-path fixtures above the floor and add a below-floor exit-3 test. The gbrain-side asks (root #1526 fix, --keep-storage, remove-lease, capability command, ingest-resume, integration CI) are written to .context/gbrain-asks.md for filing against garrytan/gbrain. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(#1778): update claude-ship golden + catalog-mode assertions for quoted descriptions ship's catalog description ('Ship workflow: detect...') has an interior colon, so the #1778 fix now YAML-quotes it. Refresh the claude-ship golden baseline to the quoted output and make the catalog-mode-full trim/restore assertions quote-tolerant. codex/factory ship goldens are unaffected (they use block-scalar descriptions). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(gen-skill-docs): use function replacer so a $ in a description can't corrupt frontmatter (#1778) String.prototype.replace treats $&/$1/$` in the replacement as patterns. A future skill description containing $ (e.g. referencing $B/$D) would silently corrupt the generated frontmatter. Use a function replacer. Behavior-preserving for all current descriptions (regen produces no diff). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.55.0.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(gbrain): document configurable memory-ingest timeout for v1.55.0.0 USING_GBRAIN_WITH_GSTACK.md: note GSTACK_INGEST_TIMEOUT_MS (default 30 min, 1 min-24h range) on the /sync-gbrain memory stage, plus checkpoint-resume on timeout. Fills the reference gap left by the configurable-import-timeout fix (#1611) shipped in v1.55.0.0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -19,9 +19,14 @@
|
||||
# - git
|
||||
# - network reachability to https://github.com
|
||||
#
|
||||
# The pinned commit is declared here rather than resolved dynamically so
|
||||
# upgrades are explicit and reviewable. Update PINNED_COMMIT when gstack
|
||||
# verifies compatibility with a new gbrain release.
|
||||
# gbrain installs at the latest default-branch HEAD by default — the hard pin
|
||||
# was removed in #1744 (it had drifted ~23 versions behind). Pass
|
||||
# --pinned-commit <sha> to install a specific commit for reproducibility. A
|
||||
# minimum-version floor (MIN_GBRAIN_VERSION) hard-fails the install when the
|
||||
# resulting gbrain is too old for gstack's sync integration, and a fast
|
||||
# `gbrain doctor` self-test hard-fails a broken install when gbrain is already
|
||||
# configured. This keeps the version gate that the pin used to provide without
|
||||
# freezing users 23 releases behind.
|
||||
#
|
||||
# Env:
|
||||
# GBRAIN_INSTALL_DIR — override default install path (~/gbrain)
|
||||
@@ -33,8 +38,14 @@
|
||||
set -euo pipefail
|
||||
|
||||
# --- defaults ---
|
||||
PINNED_COMMIT="08b3698e90532b7b66c445e6b1d8cdfe71822802" # gbrain v0.18.2
|
||||
PINNED_TAG="v0.18.2"
|
||||
# No version pin by default — install the latest default-branch HEAD (#1744).
|
||||
# --pinned-commit <sha> overrides for reproducibility.
|
||||
PINNED_COMMIT=""
|
||||
PINNED_TAG=""
|
||||
# Minimum gbrain version gstack's integration is known to work with. The
|
||||
# `sources list --json` wrapped-object shape + federated sources landed by 0.20;
|
||||
# older predates the surface gstack drives. Hard-fail below this floor (#1744).
|
||||
MIN_GBRAIN_VERSION="0.20.0"
|
||||
GBRAIN_REPO_URL="https://github.com/garrytan/gbrain.git"
|
||||
DEFAULT_INSTALL_DIR="${GBRAIN_INSTALL_DIR:-$HOME/gbrain}"
|
||||
INSTALL_DIR="$DEFAULT_INSTALL_DIR"
|
||||
@@ -113,7 +124,7 @@ elif [ -n "$DETECTED_CLONE" ]; then
|
||||
else
|
||||
# Fresh clone path.
|
||||
if $DRY_RUN; then
|
||||
log "DRY RUN: would clone $GBRAIN_REPO_URL @ $PINNED_COMMIT → $INSTALL_DIR"
|
||||
log "DRY RUN: would clone $GBRAIN_REPO_URL ${PINNED_COMMIT:+@ $PINNED_COMMIT }→ $INSTALL_DIR (latest HEAD unless --pinned-commit)"
|
||||
exit 0
|
||||
fi
|
||||
if [ -d "$INSTALL_DIR" ]; then
|
||||
@@ -121,8 +132,12 @@ else
|
||||
fi
|
||||
log "cloning $GBRAIN_REPO_URL → $INSTALL_DIR"
|
||||
git clone --quiet "$GBRAIN_REPO_URL" "$INSTALL_DIR"
|
||||
( cd "$INSTALL_DIR" && git checkout --quiet "$PINNED_COMMIT" )
|
||||
log "pinned to $PINNED_COMMIT${PINNED_TAG:+ ($PINNED_TAG)}"
|
||||
if [ -n "$PINNED_COMMIT" ]; then
|
||||
( cd "$INSTALL_DIR" && git checkout --quiet "$PINNED_COMMIT" )
|
||||
log "checked out pinned commit $PINNED_COMMIT${PINNED_TAG:+ ($PINNED_TAG)}"
|
||||
else
|
||||
log "installed latest gbrain (default-branch HEAD)"
|
||||
fi
|
||||
fi
|
||||
|
||||
if $DRY_RUN; then
|
||||
@@ -195,6 +210,44 @@ fi
|
||||
|
||||
log "installed gbrain $actual_version from $INSTALL_DIR"
|
||||
|
||||
# --- minimum-version floor (#1744) ---
|
||||
# Unpinning means new installs track gbrain HEAD. Hard-fail if the resulting
|
||||
# version is below the floor gstack's sync integration needs — same exit-3 posture
|
||||
# as the PATH-shadow / version-mismatch failures above. A warning here is exactly
|
||||
# how the data-loss class slipped through, so this gate fails closed.
|
||||
version_lt() {
|
||||
# 0 (true) when $1 < $2 by version sort; equal versions are NOT less-than.
|
||||
[ "$1" = "$2" ] && return 1
|
||||
[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -1)" = "$1" ]
|
||||
}
|
||||
if version_lt "$actual_norm" "$MIN_GBRAIN_VERSION"; then
|
||||
echo "" >&2
|
||||
echo "gstack-gbrain-install: gbrain $actual_version is below the minimum gstack-tested version ($MIN_GBRAIN_VERSION)." >&2
|
||||
echo " gstack's sync integration needs the v0.20+ source/list surface." >&2
|
||||
echo " Fix: update the gbrain clone at $INSTALL_DIR to a newer release (git pull), then" >&2
|
||||
echo " re-run /setup-gbrain. Or pass --pinned-commit <sha> to install a specific newer commit." >&2
|
||||
echo "" >&2
|
||||
exit 3
|
||||
fi
|
||||
|
||||
# --- functional self-test when gbrain is already configured (#1744) ---
|
||||
# When a brain config exists (re-install / detected clone), run a fast doctor as
|
||||
# a hard gate so a broken gbrain is caught at setup, not at data-loss time.
|
||||
# Pre-init installs skip this (config not written yet); the full
|
||||
# `/sync-gbrain --dry-run` self-test runs from /setup-gbrain after `gbrain init`.
|
||||
_GBRAIN_HOME_CHECK="${GBRAIN_HOME:-$HOME/.gbrain}"
|
||||
if [ -f "$_GBRAIN_HOME_CHECK/config.json" ]; then
|
||||
if ! gbrain doctor --fast >/dev/null 2>&1; then
|
||||
echo "" >&2
|
||||
echo "gstack-gbrain-install: gbrain $actual_version installed but 'gbrain doctor --fast' failed." >&2
|
||||
echo " Refusing to leave a broken gbrain in place. Run 'gbrain doctor' to see what's wrong," >&2
|
||||
echo " fix it, then re-run /setup-gbrain." >&2
|
||||
echo "" >&2
|
||||
exit 3
|
||||
fi
|
||||
log "gbrain doctor --fast passed"
|
||||
fi
|
||||
|
||||
# v1.40.0.0 post-install validation (T6 / codex review #19): --ignore-scripts
|
||||
# may skip artifacts gbrain needs at runtime, especially on Windows
|
||||
# MSYS/MINGW where we DID pass --ignore-scripts. `gbrain --version` above
|
||||
|
||||
+85
-25
@@ -37,9 +37,10 @@ import { createHash } from "crypto";
|
||||
|
||||
import "../lib/conductor-env-shim";
|
||||
import { detectEngineTier, withErrorContext, canonicalizeRemote } from "../lib/gstack-memory-helpers";
|
||||
import { ensureSourceRegistered, sourcePageCount } from "../lib/gbrain-sources";
|
||||
import { ensureSourceRegistered, sourcePageCount, parseSourcesList } from "../lib/gbrain-sources";
|
||||
import { detectAutopilot, decideSourceRemove, decideCodeSync } from "../lib/gbrain-guards";
|
||||
import { localEngineStatus, type LocalEngineStatus } from "../lib/gbrain-local-status";
|
||||
import { buildGbrainEnv, spawnGbrain, execGbrainJson } from "../lib/gbrain-exec";
|
||||
import { buildGbrainEnv, spawnGbrain, execGbrainJson, NEEDS_SHELL_ON_WINDOWS } from "../lib/gbrain-exec";
|
||||
|
||||
// ── Types ──────────────────────────────────────────────────────────────────
|
||||
|
||||
@@ -52,6 +53,8 @@ interface CliArgs {
|
||||
noMemory: boolean;
|
||||
noBrainSync: boolean;
|
||||
codeOnly: boolean;
|
||||
/** #1734: opt-in to sync a URL-managed source whose code walk may auto-reclone. */
|
||||
allowReclone: boolean;
|
||||
}
|
||||
|
||||
interface CodeStageDetail {
|
||||
@@ -59,7 +62,7 @@ interface CodeStageDetail {
|
||||
source_path?: string;
|
||||
page_count?: number | null;
|
||||
last_imported?: string;
|
||||
status?: "ok" | "skipped" | "failed";
|
||||
status?: "ok" | "skipped" | "failed" | "refused-autopilot" | "refused-reclone";
|
||||
}
|
||||
|
||||
interface StageResult {
|
||||
@@ -205,6 +208,8 @@ Options:
|
||||
--no-memory Skip the gstack-memory-ingest stage (transcripts + artifacts).
|
||||
--no-brain-sync Skip the gstack-brain-sync git pipeline stage.
|
||||
--code-only Only run the code-import stage (alias for --no-memory --no-brain-sync).
|
||||
--allow-reclone Permit the code walk for URL-managed sources (remote_url set)
|
||||
even though gbrain may auto-reclone the working tree (#1734).
|
||||
--help This text.
|
||||
|
||||
Stages run in order: code → memory ingest → curated git push.
|
||||
@@ -220,6 +225,7 @@ function parseArgs(): CliArgs {
|
||||
let noMemory = false;
|
||||
let noBrainSync = false;
|
||||
let codeOnly = false;
|
||||
let allowReclone = false;
|
||||
|
||||
for (let i = 0; i < args.length; i++) {
|
||||
const a = args[i];
|
||||
@@ -231,6 +237,7 @@ function parseArgs(): CliArgs {
|
||||
case "--no-code": noCode = true; break;
|
||||
case "--no-memory": noMemory = true; break;
|
||||
case "--no-brain-sync": noBrainSync = true; break;
|
||||
case "--allow-reclone": allowReclone = true; break;
|
||||
case "--code-only":
|
||||
codeOnly = true;
|
||||
noMemory = true;
|
||||
@@ -247,7 +254,7 @@ function parseArgs(): CliArgs {
|
||||
}
|
||||
}
|
||||
|
||||
return { mode, quiet, noCode, noMemory, noBrainSync, codeOnly };
|
||||
return { mode, quiet, noCode, noMemory, noBrainSync, codeOnly, allowReclone };
|
||||
}
|
||||
|
||||
// ── Helpers ────────────────────────────────────────────────────────────────
|
||||
@@ -407,10 +414,7 @@ export function sourceLocalPath(sourceId: string, env?: NodeJS.ProcessEnv): stri
|
||||
{ baseEnv: env },
|
||||
);
|
||||
if (!raw) return null;
|
||||
const list: Array<{ id?: string; local_path?: string }> = Array.isArray(raw)
|
||||
? (raw as Array<{ id?: string; local_path?: string }>)
|
||||
: ((raw as { sources?: Array<{ id?: string; local_path?: string }> }).sources ?? []);
|
||||
const found = list.find((s) => s.id === sourceId);
|
||||
const found = parseSourcesList(raw).find((s) => s.id === sourceId);
|
||||
return found?.local_path ?? null;
|
||||
}
|
||||
|
||||
@@ -469,20 +473,50 @@ export function planHostnameFoldMigration(
|
||||
return { kind: "pending-cleanup", oldId: legacyPathHashId };
|
||||
}
|
||||
|
||||
export interface GuardedRemoveResult {
|
||||
removed: boolean;
|
||||
/** True when a guard refused the remove (autopilot active or unsafe source). */
|
||||
skipped: boolean;
|
||||
reason: string;
|
||||
}
|
||||
|
||||
/**
|
||||
* #1734: run `gbrain sources remove <id> --confirm-destructive` only behind the
|
||||
* data-loss guards. Checked immediately before the destructive op (E8: as late
|
||||
* as possible) so the autopilot window is as small as we can make it without a
|
||||
* gbrain-side lease. Refuses when autopilot is active or when the source is
|
||||
* user-managed and gbrain can't keep its storage. Pure side-effect helper; the
|
||||
* caller decides whether a skip is fatal (it never is today — removes are
|
||||
* best-effort cleanup).
|
||||
*/
|
||||
export function safeSourcesRemove(sourceId: string, env?: NodeJS.ProcessEnv): GuardedRemoveResult {
|
||||
const ap = detectAutopilot(env);
|
||||
if (ap.active) {
|
||||
return {
|
||||
removed: false,
|
||||
skipped: true,
|
||||
reason: `autopilot active (${ap.signal}); refusing destructive remove of ${sourceId}. ` +
|
||||
`Stop autopilot, then re-run /sync-gbrain.`,
|
||||
};
|
||||
}
|
||||
const decision = decideSourceRemove(sourceId, env);
|
||||
if (!decision.allow) {
|
||||
return { removed: false, skipped: true, reason: decision.reason };
|
||||
}
|
||||
const r = spawnGbrain(
|
||||
["sources", "remove", sourceId, "--confirm-destructive", ...decision.extraArgs],
|
||||
{ baseEnv: env },
|
||||
);
|
||||
return { removed: r.status === 0, skipped: false, reason: decision.reason };
|
||||
}
|
||||
|
||||
/**
|
||||
* Remove an orphaned source. Called only after new-source sync verifies pages
|
||||
* exist, so the old source is provably redundant before deletion.
|
||||
*
|
||||
* Flag note: existing call sites used `--confirm-destructive` here and
|
||||
* `--yes` in `lib/gbrain-sources.ts` — gbrain 0.35.0.0 accepts neither
|
||||
* deterministically (the subcommand surface help is generic). We pass
|
||||
* `--confirm-destructive` to match the existing call site convention; the
|
||||
* flag-helper centralization in commit 4 (lib/gbrain-exec.ts) will resolve
|
||||
* the inconsistency across the codebase.
|
||||
* exist, so the old source is provably redundant before deletion. Routed through
|
||||
* safeSourcesRemove for the #1734 guards.
|
||||
*/
|
||||
export function removeOrphanedSource(oldId: string, env?: NodeJS.ProcessEnv): boolean {
|
||||
const r = spawnGbrain(["sources", "remove", oldId, "--confirm-destructive"], { baseEnv: env });
|
||||
return r.status === 0;
|
||||
return safeSourcesRemove(oldId, env).removed;
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -661,13 +695,12 @@ async function runCodeImport(args: CliArgs): Promise<StageResult> {
|
||||
const legacyId = deriveLegacyCodeSourceId(root);
|
||||
let legacyRemoved = false;
|
||||
if (legacyId !== sourceId) {
|
||||
const rm = spawnGbrain(["sources", "remove", legacyId, "--confirm-destructive"], {
|
||||
timeout: 30_000,
|
||||
baseEnv: gbrainEnv,
|
||||
});
|
||||
// Treat absent-source as success (clean state). gbrain emits "not found" on
|
||||
// missing id; treat any non-zero exit without "not found" as a soft fail.
|
||||
if (rm.status === 0) legacyRemoved = true;
|
||||
// #1734: route through the data-loss guards (autopilot + source-safety).
|
||||
const rm = safeSourcesRemove(legacyId, gbrainEnv);
|
||||
if (rm.skipped && !args.quiet) {
|
||||
console.error(`[sync:code] legacy-source cleanup skipped: ${rm.reason}`);
|
||||
}
|
||||
if (rm.removed) legacyRemoved = true;
|
||||
}
|
||||
|
||||
// Step 0b: Hostname-fold migration (#1414).
|
||||
@@ -720,6 +753,29 @@ async function runCodeImport(args: CliArgs): Promise<StageResult> {
|
||||
process.env.GSTACK_SYNC_CODE_TIMEOUT_MS,
|
||||
"GSTACK_SYNC_CODE_TIMEOUT_MS",
|
||||
);
|
||||
|
||||
// #1734 guards, checked immediately before the destructive walk (E8):
|
||||
// - autopilot active → refuse (the race that wiped a working tree).
|
||||
// - URL-managed source → the walk can auto-reclone (rm-rf); require
|
||||
// --allow-reclone. Both surface a visible reason and fail the stage so the
|
||||
// verdict shows ERR rather than silently skipping protection.
|
||||
const apBeforeWalk = detectAutopilot(gbrainEnv);
|
||||
if (apBeforeWalk.active) {
|
||||
return {
|
||||
name: "code", ran: true, ok: false, duration_ms: Date.now() - t0,
|
||||
summary: `refused: gbrain autopilot active (${apBeforeWalk.signal}). Stop autopilot, then re-run /sync-gbrain.`,
|
||||
detail: { source_id: sourceId, source_path: root, status: "refused-autopilot" },
|
||||
};
|
||||
}
|
||||
const reclone = decideCodeSync(sourceId, gbrainEnv, args.allowReclone);
|
||||
if (!reclone.allow) {
|
||||
return {
|
||||
name: "code", ran: true, ok: false, duration_ms: Date.now() - t0,
|
||||
summary: `refused: ${reclone.reason}`,
|
||||
detail: { source_id: sourceId, source_path: root, status: "refused-reclone" },
|
||||
};
|
||||
}
|
||||
|
||||
const walkResult = spawnGbrain(["sync", "--strategy", "code", "--source", sourceId], {
|
||||
stdio: args.quiet ? ["ignore", "ignore", "ignore"] : ["ignore", "inherit", "inherit"],
|
||||
timeout: codeTimeoutMs,
|
||||
@@ -961,13 +1017,17 @@ function runBrainSyncPush(args: CliArgs): StageResult {
|
||||
return { name: "brain-sync", ran: false, ok: true, duration_ms: 0, summary: "skipped (gstack-brain-sync not installed)" };
|
||||
}
|
||||
|
||||
// #1731: gstack-brain-sync is a bash shebang script; Windows can't spawn it
|
||||
// without a shell, which surfaced as "brain-sync exited undefined".
|
||||
spawnSync(brainSyncPath, ["--discover-new"], {
|
||||
stdio: args.quiet ? ["ignore", "ignore", "ignore"] : ["ignore", "inherit", "inherit"],
|
||||
timeout: 60 * 1000,
|
||||
shell: NEEDS_SHELL_ON_WINDOWS,
|
||||
});
|
||||
const result = spawnSync(brainSyncPath, ["--once"], {
|
||||
stdio: args.quiet ? ["ignore", "ignore", "ignore"] : ["ignore", "inherit", "inherit"],
|
||||
timeout: 60 * 1000,
|
||||
shell: NEEDS_SHELL_ON_WINDOWS,
|
||||
});
|
||||
|
||||
return {
|
||||
|
||||
+10
-3
@@ -53,18 +53,25 @@ for path in paths:
|
||||
continue
|
||||
if line in seen:
|
||||
continue
|
||||
# Prefer ISO ts field for sort; fall back to SHA-256.
|
||||
# Prefer ISO ts field for sort; fall back to SHA-256. The line
|
||||
# content is the final tiebreaker so the order is total: two
|
||||
# entries sharing a ts must resolve identically regardless of
|
||||
# which side they arrive on. Without it, equal-ts entries fall
|
||||
# back to insertion order (base, ours, theirs), and since ours
|
||||
# and theirs are swapped depending on which machine runs the
|
||||
# merge, the two sides produce divergent files that never
|
||||
# converge.
|
||||
sort_key = None
|
||||
try:
|
||||
obj = json.loads(line)
|
||||
ts = obj.get('ts') or obj.get('timestamp')
|
||||
if isinstance(ts, str):
|
||||
sort_key = (0, ts)
|
||||
sort_key = (0, ts, line)
|
||||
except (json.JSONDecodeError, ValueError, TypeError):
|
||||
pass
|
||||
if sort_key is None:
|
||||
h = hashlib.sha256(line.encode('utf-8')).hexdigest()
|
||||
sort_key = (1, h)
|
||||
sort_key = (1, h, line)
|
||||
seen[line] = sort_key
|
||||
except FileNotFoundError:
|
||||
# Absent base / absent ours / absent theirs are all valid.
|
||||
|
||||
@@ -1349,10 +1349,32 @@ function installSignalForwarder(): void {
|
||||
* that kill the child on parent SIGTERM/SIGINT. Returns the same shape as
|
||||
* spawnSync's result so the caller doesn't care which mode was used.
|
||||
*/
|
||||
/**
|
||||
* #1611: the `gbrain import` is the long pole on big brains. Its timeout is
|
||||
* configurable via GSTACK_INGEST_TIMEOUT_MS (default 30 min, 1min–24h) so large
|
||||
* memory corpora aren't SIGTERM'd mid-import. On timeout we SIGTERM the child,
|
||||
* which preserves gbrain's import-checkpoint.json (see installSignalForwarder)
|
||||
* so the next run resumes instead of restarting from scratch.
|
||||
*/
|
||||
const DEFAULT_IMPORT_TIMEOUT_MS = 30 * 60 * 1000;
|
||||
export function resolveImportTimeoutMs(
|
||||
raw: string | undefined = process.env.GSTACK_INGEST_TIMEOUT_MS,
|
||||
): number {
|
||||
if (raw === undefined || raw === "") return DEFAULT_IMPORT_TIMEOUT_MS;
|
||||
const n = Number.parseInt(raw, 10);
|
||||
if (!Number.isFinite(n) || Number.isNaN(n) || n < 60_000 || n > 86_400_000) {
|
||||
console.error(
|
||||
`[memory-ingest] GSTACK_INGEST_TIMEOUT_MS="${raw}" invalid (need 60000–86400000ms); using ${DEFAULT_IMPORT_TIMEOUT_MS}ms`,
|
||||
);
|
||||
return DEFAULT_IMPORT_TIMEOUT_MS;
|
||||
}
|
||||
return n;
|
||||
}
|
||||
|
||||
function runGbrainImport(
|
||||
stagingDir: string,
|
||||
timeoutMs: number,
|
||||
): Promise<{ status: number | null; stdout: string; stderr: string }> {
|
||||
): Promise<{ status: number | null; stdout: string; stderr: string; timedOut: boolean }> {
|
||||
installSignalForwarder();
|
||||
return new Promise((resolve) => {
|
||||
// Seed DATABASE_URL from gbrain's own config so this stage works
|
||||
@@ -1385,6 +1407,7 @@ function runGbrainImport(
|
||||
status: timedOut ? null : status,
|
||||
stdout,
|
||||
stderr,
|
||||
timedOut,
|
||||
});
|
||||
});
|
||||
child.on("error", (err) => {
|
||||
@@ -1394,6 +1417,7 @@ function runGbrainImport(
|
||||
status: null,
|
||||
stdout,
|
||||
stderr: stderr + `\n[spawn-error] ${(err as Error).message}`,
|
||||
timedOut,
|
||||
});
|
||||
});
|
||||
});
|
||||
@@ -1608,13 +1632,33 @@ async function ingestPass(args: CliArgs): Promise<BulkResult> {
|
||||
// spawn, parent termination orphans the gbrain process (observed
|
||||
// during 2026-05-10 cold-run testing — gbrain kept running 15 min
|
||||
// after the orchestrator timed out).
|
||||
const importResult = await runGbrainImport(stagingDir, 30 * 60 * 1000);
|
||||
const importResult = await runGbrainImport(stagingDir, resolveImportTimeoutMs());
|
||||
|
||||
const stdout = importResult.stdout || "";
|
||||
const stderr = importResult.stderr || "";
|
||||
const importJson = parseImportJson(stdout);
|
||||
|
||||
if (importResult.status !== 0) {
|
||||
// #1611: on timeout, gbrain's import-checkpoint.json is preserved (the
|
||||
// SIGTERM forwarder keeps the staging dir), so the next /sync-gbrain
|
||||
// resumes rather than restarting. Tell the user instead of looking failed.
|
||||
if (importResult.timedOut) {
|
||||
const mins = Math.round(resolveImportTimeoutMs() / 60000);
|
||||
const msg =
|
||||
`gbrain import timed out after ${mins}min; checkpoint preserved — re-run ` +
|
||||
`/sync-gbrain to resume (raise GSTACK_INGEST_TIMEOUT_MS for big brains)`;
|
||||
console.error(`[memory-ingest] ${msg}`);
|
||||
return {
|
||||
written: 0,
|
||||
skipped_secret: prep.skippedSecret,
|
||||
skipped_dedup: prep.skippedDedup,
|
||||
skipped_unattributed: prep.skippedUnattributed,
|
||||
failed,
|
||||
duration_ms: Date.now() - t0,
|
||||
partial_pages: prep.partialPages,
|
||||
system_error: msg,
|
||||
};
|
||||
}
|
||||
const tail = (stderr.trim().split("\n").pop() || "").slice(0, 300);
|
||||
const msg = `gbrain import exited ${importResult.status}: ${tail}`;
|
||||
console.error(`[memory-ingest] ERR: ${msg}`);
|
||||
@@ -1810,7 +1854,12 @@ async function main(): Promise<void> {
|
||||
if (result.system_error) process.exit(1);
|
||||
}
|
||||
|
||||
main().catch((err) => {
|
||||
console.error(`gstack-memory-ingest fatal: ${err instanceof Error ? err.message : String(err)}`);
|
||||
process.exit(1);
|
||||
});
|
||||
// Guard so the module is import-safe for unit tests (e.g. resolveImportTimeoutMs).
|
||||
// The orchestrator runs it as `bun gstack-memory-ingest.ts ...`, where
|
||||
// import.meta.main is true, so the CLI path is unaffected.
|
||||
if (import.meta.main) {
|
||||
main().catch((err) => {
|
||||
console.error(`gstack-memory-ingest fatal: ${err instanceof Error ? err.message : String(err)}`);
|
||||
process.exit(1);
|
||||
});
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user