mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-27 05:22:29 +02:00
v1.40.0.0 fix wave: gbrain sync hardening (8 community PRs + migration) (#1547)
* fix(gbrain-sync): fold hostname into code-source id hash + migration (#1414)

Cherry-picked from #1468 by 0xDevNinja and extended with the hostname-fold migration that codex review surfaced.

Pre-fix `deriveCodeSourceId` hashed the absolute repo path alone, so two machines with identical home-dir layouts (chezmoi-managed dotfiles, ansible-provisioned VMs) derived the same id and clobbered each other's `local_path` in a federated brain. Last-writer-wins, with cryptic "Not a git repository" errors on the loser.

Hash key is now `${hostname}::${path}`. Conductor worktrees on a single host stay distinct (path entropy unchanged within a host); cross-machine federations stop colliding.

Migration (D1=B + codex refinements): every existing user has a pre-#1468 path-only-hash source id in their brain that no longer matches what `deriveCodeSourceId` produces. Without migration, the next sync registers a fresh source and orphans the old one. This commit adds:

- `derivePathOnlyHashLegacyId` — separate helper for the pre-#1468 form. Distinct from `deriveLegacyCodeSourceId` (pre-pathhash v1.x form); both probes run.
- `planHostnameFoldMigration` — feature-checks `gbrain sources rename <old> <new>` (exact argument shape, not just `--help`), gates on path-drift (skip migration if old source's `local_path` differs from current repo root), and falls back to register-new + sync-OK + remove-old when rename is unsupported. As of gbrain 0.35.0.0 the rename subcommand does not exist, so users go through the cleanup path; the rename path stays dormant until gbrain ships it.
- `removeOrphanedSource` — called only AFTER new-source sync verifies page_count > 0. Closes the data-loss window codex flagged where "register new, remove old before sync" can wipe pages if sync fails.
- `sourceLocalPath` — looks up a source's `local_path` from `gbrain sources list --json` for the drift gate.
- Helpers accept an optional `env` parameter so tests can inject a gbrain shim via PATH without process-wide PATH mutation (Bun's spawnSync doesn't pick up runtime PATH changes). Pre-positions for commit 4's centralized gbrain-exec helper.
- `if (import.meta.main)` guard around `main()` so the helpers can be imported for in-process unit tests.

Tests cover: pure derivation, ids-match degenerate case, no-legacy short-circuit, path-drift skip path, rename path with shim, cleanup fallback when rename unsupported, cleanup fallback when rename call itself fails, source-lookup happy/missing/error paths.

`GSTACK_HOSTNAME` env var is a test-only knob; production uses `os.hostname()`.

Fixes #1414

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(gbrain-sync): cut source-id slugs on hyphen boundaries (+ #1357)

Cherry-picked from #1481 by drummerms and extended with the explicit HTTPS-remote regression case for #1357 (decision D2=A).

`constrainSourceId` truncated the slug with `slug.slice(-tailBudget)`, which cut mid-word when the boundary fell inside a token. For a repo where the combined `prefix-org-repo-pathhash` exceeded 32 chars, this produced embarrassing artifacts like `gstack-code-kill-270c0001-c32152` (from `drummerms-av-sow-wiz-skill-270c0001`).

Two changes carried from #1481, adapted for the #1468 hostpathhash:

1. `constrainSourceId` now walks hyphen-separated tokens from the right, accumulating whole tokens until adding the next would exceed `tailBudget`. When no token fits, falls through to the existing `${prefix}-${hash}` form.
2. `deriveCodeSourceId` now retries with `repo-only-hostpathhash` (dropping the org segment) when the full `org-repo-hostpathhash` triggers truncation. Keeps the repo name readable when it fits at all.

Plus a new test asserting the source id is period-free for the exact HTTPS-with-.git remote shape from #1357 (`https://github.com/foo/bar.git`). canonicalizeRemote strips `.git`; the sanitizer strips any residual non-alnum. The test closes #1357 by pinning the property.

Closes #1357

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(gbrain): probe CLI without command builtin

* fix(gbrain-sync): centralize gbrain spawn surface + seed DATABASE_URL

Cherry-picked from #1508 by jasshultz, restructured per codex review #4 and #7 to widen scope and centralize the spawn surface.

The bug: gbrain auto-loads .env.local from cwd via dotenv. When /sync-gbrain runs inside a Next.js / Prisma / Rails project whose .env.local defines its own DATABASE_URL (pointing at the app's local DB), gbrain reads that value instead of its own ~/.gbrain/config.json — auth fails, code + memory stages crash.

This commit:

- Adds lib/gbrain-exec.ts: buildGbrainEnv, spawnGbrain, execGbrainJson, execGbrainText, spawnGbrainAsync (the last one for memory-ingest's streaming gbrain import call). buildGbrainEnv seeds DATABASE_URL from ${GBRAIN_HOME:-$HOME/.gbrain}/config.json, returns a fresh env object (never the caller's by identity — codex review #11), and honors the GSTACK_RESPECT_ENV_DATABASE_URL=1 escape hatch.
- Routes every gbrain spawn in bin/gstack-gbrain-sync.ts and bin/gstack-memory-ingest.ts through the helpers. Both files now own zero direct spawnSync("gbrain"|spawn("gbrain"|execFileSync("gbrain" call sites.
- Threads buildGbrainEnv into the spawnSync("bun", [memory-ingest], ...) grandchild in runMemoryIngest (codex review #7). Without this, the parent fix is half-baked — the bun child inherits a clean env but needs DATABASE_URL pre-seeded too. spawnGbrainAsync inside memory-ingest provides defense in depth for standalone invocations.
- Adds GBRAIN_HOME support — aligns with detectEngineTier (already honors GBRAIN_HOME) so all gstack-side gbrain calls agree on which config file matters. Resolves baseEnv.HOME first, then homedir(), so test injection works without process-wide HOME mutation.
- Adds test/build-gbrain-env.test.ts: 10 unit tests covering all five env-seeding branches (seed from config / override caller / GSTACK_RESPECT escape hatch / missing config / unparseable config / no database_url field / GBRAIN_HOME path / object-identity guard / unrelated-vars preservation / idempotent-when-matches).
- Adds test/gbrain-exec-invariant.test.ts: static-source check that greps both bin/gstack-gbrain-sync.ts and bin/gstack-memory-ingest.ts for direct spawnSync("gbrain"|spawn("gbrain"|execFileSync("gbrain"| execSync(...gbrain matches and fails the build if any are found. Refactor-proof against future contributors adding a new gbrain spawn without env threading.

The invariant is intentionally narrow — only the two files where the DATABASE_URL bug actually hurts users are guarded. Migrating the spawn sites in lib/gbrain-local-status.ts, lib/gstack-memory-helpers.ts, and bin/gstack-brain-context-load.ts is a follow-up.

Co-Authored-By: Jason Shultz <jasshultz@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>

* fix(gbrain-sync): add .gbrain-source to consumer repo .gitignore (#1384)

The v1.29.0.0 changelog promised .gbrain-source would be added to the consuming repo's .gitignore so the per-worktree pin stays local, but the change actually only added it to gstack's own .gitignore. Without the consumer-side entry, the pin gets committed and Conductor sibling worktrees of the same repo + branch step on each other's pin every time anyone commits.

Add ensureGbrainSourceGitignored after a successful gbrain sources attach in runCodeImport. Idempotent on repeat runs (line-trim match), creates .gitignore if missing, logs a warning and continues on permission errors so a read-only checkout doesn't fail the sync.

Gate the top-level main() call behind import.meta.main so tests can import the helper without triggering a full sync run on module load.

Tests in test/gbrain-source-gitignore.test.ts cover: create-when-missing, append-without-trailing-newline, append-with-trailing-newline, idempotent on repeat, recognize whitespace-surrounded entry, no-throw on read-only file. 6 pass.

* fix(gbrain-sources): bump gbrain sources list --json timeout 10s → 30s

Supabase free-tier cold-starts can push `gbrain sources list --json` past 10s (observed 14.5s in the wild), causing probeSource() to throw ETIMEDOUT during /sync-gbrain code stage even though the underlying CLI was healthy. Matches the 30s ceiling already used by `sources add` / `sources remove` in the same file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(brain-allowlist): sync project-root eng-review-test-plan artifacts (#1452)

Cherry-picked from #1465 by genisis0x and extended with the v1.40.0.0 upgrade migration that codex review #5 surfaced.

#1465 alone only patches bin/gstack-artifacts-init, which means fresh installs and re-inits pick up the new pattern. But existing users who already ran v1.38.1.0 have a `.migrations/v1.38.1.0.done` marker — that migration won't re-run no matter what we change. So their installed `.brain-allowlist`, `.brain-privacy-map.json`, and `.gitattributes` stay without the new pattern, and `/plan-eng-review` artifacts continue to silently drop out of their federation queue.

This commit:

- bin/gstack-artifacts-init: adds projects/*/*-eng-review-test-plan-*.md to the three managed blocks. v1.38.1.0 covered design + test-plan; this completes the set for /plan-eng-review.
- gstack-upgrade/migrations/v1.40.0.0.sh: targeted in-place repair for existing installs. Same idempotent jq-based shape as v1.38.1.0. Adds the new pattern to .brain-allowlist (before the USER ADDITIONS marker), .brain-privacy-map.json (as class=artifact), and .gitattributes (as merge=union). NEVER commits + pushes — the user controls when the patches ship to their federated artifacts repo.
- test/artifacts-init-migration.test.ts: 5 new tests covering the v1.40.0.0 migration applied on top of a post-v1.38.1.0 state, jq patching, gitattributes append, idempotent re-run, and done-marker write when files are missing entirely.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(gbrain-install): skip postinstall on Windows MSYS/MINGW + post-install probe

Cherry-picked from #1487 by genisis0x and extended with the post-install subcommand probe per T6 / codex review #19.

`bun install` in $INSTALL_DIR fails on Windows MSYS/MINGW/Cygwin shells because gbrain's native postinstall script mis-parses path arguments and aborts with a non-zero exit, breaking gstack-gbrain-install for Windows users running git-bash/MSYS2. The package installs cleanly without scripts.

This commit:

- Adds Windows shell detection via `uname -s` matching MINGW*/MSYS*/CYGWIN*/Windows_NT (#1487's case statement already covers all four — codex review #18 confirmed MINGW* is included). Windows paths get `bun install --ignore-scripts`; macOS and Linux unchanged.
- Adds a post-install probe of `gbrain sources --help`. `gbrain --version` already runs (D19 PATH-shadowing validation), but version success doesn't prove the subcommand surface is reachable — and `--ignore-scripts` may have skipped artifacts that subcommands need. Probe failure logs a clear warning (with Windows-specific remediation pointing at re-running `bun install` outside MSYS) but does NOT exit non-zero; users may still get value from gbrain even if the probe fails transiently.

Refs #1271

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: v1.40.0.0 — gbrain sync hardening wave

Bumps VERSION 1.39.2.0 → 1.40.0.0 (MINOR — substantial gbrain capability hardening across sync pipeline, install path, federation allowlist; ~600 net LOC added across 8 community PRs + plan-review refinements).

CHANGELOG entry follows the release-summary format: two-line headline, lead paragraph, "numbers that matter" with before/after table across 8 user-visible surfaces, "what this means for builders" closer, itemized Added/Changed/Fixed/NOT fixed/For contributors sections.

Per-commit contributor credits: 0xDevNinja, drummerms, Jayesh Betala, Jason Shultz, genisis0x. Also names NikhileshNanduri and realcarsonterry in the wave's "Fixed" section for independent submissions of the .gbrain-source gitignore bug.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: 0xDevNinja <manmit0x@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: drummerms <mike@av2o.com>
Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com>
Co-authored-by: Jason Shultz <jasshultz@gmail.com>
Co-authored-by: genisis0x <manietdavv@gmail.com>
+292
-39
@@ -32,13 +32,14 @@
 import { existsSync, statSync, mkdirSync, writeFileSync, readFileSync, unlinkSync, renameSync } from "fs";
 import { join, dirname } from "path";
 import { execSync, spawnSync } from "child_process";
-import { homedir } from "os";
+import { homedir, hostname } from "os";
 import { createHash } from "crypto";

 import "../lib/conductor-env-shim";
 import { detectEngineTier, withErrorContext, canonicalizeRemote } from "../lib/gstack-memory-helpers";
 import { ensureSourceRegistered, sourcePageCount } from "../lib/gbrain-sources";
 import { localEngineStatus, type LocalEngineStatus } from "../lib/gbrain-local-status";
+import { buildGbrainEnv, spawnGbrain, execGbrainJson } from "../lib/gbrain-exec";

 // ── Types ──────────────────────────────────────────────────────────────────

@@ -161,30 +162,42 @@ function originUrl(): string | null {
 }

 /**
- * Derive a worktree-aware source id for the cwd code corpus.
+ * Derive a host- and worktree-aware source id for the cwd code corpus.
  *
- * Pattern: `gstack-code-<slug>-<pathhash8>` where slug comes from origin
- * (org/repo) and pathhash8 is the first 8 hex chars of sha1(absolute repo
- * path). The pathhash8 is what makes Conductor worktrees of the same repo
- * coexist as separate sources in the same gbrain DB instead of stomping on
- * each other.
+ * Pattern: `gstack-code-<slug>-<hostpathhash8>` where slug comes from origin
+ * (org/repo) and hostpathhash8 is the first 8 hex chars of
+ * sha1(`${hostname}::${absolute repo path}`). Folding hostname into the hash
+ * keeps Conductor worktrees of the same repo as distinct sources on one host
+ * AND keeps two machines that share an absolute layout (e.g. chezmoi-managed
+ * home dirs against a federated brain) from colliding on each other.
  *
  * Falls back to the repo basename when there is no origin (local repo).
  *
+ * `GSTACK_HOSTNAME` env override is honored for deterministic tests; in
+ * production paths it is unset and `os.hostname()` is used.
+ *
  * gbrain enforces source ids to be 1-32 lowercase alnum chars with
  * optional interior hyphens. `constrainSourceId` handles the 32-char cap
  * with a hashed-tail fallback when the combined slug exceeds budget.
  */
 function deriveCodeSourceId(repoPath: string): string {
-  const pathHash = createHash("sha1").update(repoPath).digest("hex").slice(0, 8);
+  const host = process.env.GSTACK_HOSTNAME || hostname();
+  const hostPathHash = createHash("sha1").update(`${host}::${repoPath}`).digest("hex").slice(0, 8);
   const remote = canonicalizeRemote(originUrl());
   if (remote) {
     const segs = remote.split("/").filter(Boolean);
     const slugSource = segs.slice(-2).join("-");
-    return constrainSourceId("gstack-code", `${slugSource}-${pathHash}`);
+    const fullId = constrainSourceId("gstack-code", `${slugSource}-${hostPathHash}`);
+    // If the org+repo+hostpathhash fits cleanly (suffix preserved), use it.
+    if (fullId.endsWith(`-${hostPathHash}`)) return fullId;
+    // Otherwise drop the org prefix and retry with just repo+hostpathhash so
+    // the repo name stays readable. If that still doesn't fit,
+    // constrainSourceId falls back to a deterministic hash-only form.
+    const repoOnly = segs[segs.length - 1] || "repo";
+    return constrainSourceId("gstack-code", `${repoOnly}-${hostPathHash}`);
   }
   const base = repoPath.split("/").pop() || "repo";
-  return constrainSourceId("gstack-code", `${base}-${pathHash}`);
+  return constrainSourceId("gstack-code", `${base}-${hostPathHash}`);
 }

 /**
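The hostname fold in the hunk above can be illustrated in isolation. This is a minimal sketch, not the repository's code: the helper names `hostPathHash8` and `pathHash8`, the host names, and the repo path are illustrative assumptions. Only the `${host}::${path}` hash key and the 8-hex-char SHA-1 prefix are taken from the diff.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper mirroring the post-fix key: sha1(`${host}::${repoPath}`), first 8 hex chars.
function hostPathHash8(host: string, repoPath: string): string {
  return createHash("sha1").update(`${host}::${repoPath}`).digest("hex").slice(0, 8);
}

// Hypothetical helper mirroring the pre-fix key: sha1(repoPath) only.
function pathHash8(repoPath: string): string {
  return createHash("sha1").update(repoPath).digest("hex").slice(0, 8);
}

// Two made-up machines that share an identical absolute layout.
const sharedRepoPath = "/home/dev/src/app";
const idOnHostA = hostPathHash8("laptop-a", sharedRepoPath);
const idOnHostB = hostPathHash8("laptop-b", sharedRepoPath);

// Pre-fix, both machines derive the same suffix because the host never enters the hash.
const legacyOnHostA = pathHash8(sharedRepoPath);
const legacyOnHostB = pathHash8(sharedRepoPath);

console.log({ idOnHostA, idOnHostB, legacyOnHostA, legacyOnHostB });
```

Within one host the path still varies per worktree, so sibling worktrees keep distinct suffixes just as they did before the fold.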
@@ -208,10 +221,162 @@ function deriveLegacyCodeSourceId(repoPath: string): string {
   return constrainSourceId("gstack-code", base);
 }

+/**
+ * Pre-#1468 path-only-hash source id, kept for hostname-fold migration only.
+ *
+ * Before the hostname fold, `deriveCodeSourceId` hashed only the absolute
+ * repo path: `gstack-code-<slug>-<sha1(path).slice(0,8)>`. After #1468 the
+ * hash key is `${hostname}::${path}`, so every existing user's brain has a
+ * legacy id that no longer matches what `deriveCodeSourceId` produces. We
+ * detect this form once, attempt rename-in-place if the gbrain CLI supports
+ * `sources rename`, and otherwise clean up after the new source successfully
+ * syncs. Distinct from `deriveLegacyCodeSourceId` (pre-pathhash v1.x form);
+ * both probes run.
+ */
+export function derivePathOnlyHashLegacyId(repoPath: string): string {
+  const pathHash = createHash("sha1").update(repoPath).digest("hex").slice(0, 8);
+  const remote = canonicalizeRemote(originUrl());
+  if (remote) {
+    const segs = remote.split("/").filter(Boolean);
+    const slugSource = segs.slice(-2).join("-");
+    return constrainSourceId("gstack-code", `${slugSource}-${pathHash}`);
+  }
+  const base = repoPath.split("/").pop() || "repo";
+  return constrainSourceId("gstack-code", `${base}-${pathHash}`);
+}
+
+/**
+ * Feature-check whether the installed gbrain CLI ships `sources rename <old> <new>`.
+ *
+ * Per the v1.40.0.0 design review: probing `gbrain sources rename --help` and
+ * matching for the exact argument shape catches the case where gbrain's
+ * `sources` parent help mentions a `rename` subcommand but the CLI doesn't
+ * accept the `<old> <new>` form (or vice versa). Cached for the lifetime
+ * of the process. As of gbrain 0.35.0.0 this command does not exist, so the
+ * function returns false and the migration path falls back to register-new
+ * + sync-OK + remove-old.
+ */
+let _gbrainSupportsRenameCache: boolean | null = null;
+export function _resetGbrainSupportsRenameCache(): void {
+  _gbrainSupportsRenameCache = null;
+}
+function gbrainSupportsSourcesRename(env?: NodeJS.ProcessEnv): boolean {
+  if (_gbrainSupportsRenameCache !== null) return _gbrainSupportsRenameCache;
+  try {
+    const r = spawnGbrain(["sources", "rename", "--help"], {
+      timeout: 5_000,
+      baseEnv: env,
+    });
+    const out = `${r.stdout || ""}\n${r.stderr || ""}`;
+    // Match the exact argument shape: `rename <old> <new>` (with literal
+    // angle brackets in usage strings) or `rename OLD NEW`.
+    const exact = /sources\s+rename\s+<old>\s+<new>/i.test(out)
+      || /sources\s+rename\s+OLD\s+NEW/.test(out)
+      || /sources\s+rename\s+<oldId>\s+<newId>/i.test(out);
+    _gbrainSupportsRenameCache = exact && r.status === 0;
+  } catch {
+    _gbrainSupportsRenameCache = false;
+  }
+  return _gbrainSupportsRenameCache;
+}
+
+/**
+ * Look up a source's `local_path` from `gbrain sources list --json`.
+ * Returns null when the source is absent or the listing fails.
+ *
+ * `env` is the environment passed to the spawned `gbrain` process; defaults
+ * to `process.env`. Tests inject a PATH that points at a gbrain shim so the
+ * helper can be exercised without a real gbrain CLI.
+ */
+export function sourceLocalPath(sourceId: string, env?: NodeJS.ProcessEnv): string | null {
+  const list = execGbrainJson<Array<{ id: string; local_path?: string }>>(
+    ["sources", "list", "--json"],
+    { baseEnv: env },
+  );
+  if (!list) return null;
+  const found = list.find((s) => s.id === sourceId);
+  return found?.local_path ?? null;
+}
+
+/** Result of `planHostnameFoldMigration` — informs `runCodeImport` of next steps. */
+export type HostnameFoldMigration =
+  | { kind: "none"; reason: "ids-match" | "no-legacy-source" }
+  | { kind: "skipped-path-drift"; oldId: string; oldPath: string; currentPath: string }
+  | { kind: "renamed"; oldId: string; newId: string }
+  | { kind: "pending-cleanup"; oldId: string };
+
+/**
+ * Decide how to migrate from the pre-#1468 path-only-hash source id to the
+ * new hostname-fold id.
+ *
+ * Order:
+ * 1. If old == new → no-op.
+ * 2. Look up old source's local_path. Absent → no legacy source to migrate.
+ * 3. local_path != currentRoot → user moved the repo or two machines share a
+ *    hash slot. Skip migration; let the user clean up manually. We will NOT
+ *    rename or remove anything; the new source is registered alongside.
+ * 4. Otherwise: feature-check `gbrain sources rename`. If supported and the
+ *    rename call exits 0 → renamed, pages preserved.
+ * 5. Else: pending-cleanup. Caller registers + syncs new source first; only
+ *    after sync succeeds with a non-zero page count does it remove the old.
+ *    This avoids a data-loss window where the old source is gone before the
+ *    new one is verifiably populated.
+ */
+export function planHostnameFoldMigration(
+  currentRoot: string,
+  newSourceId: string,
+  legacyPathHashId: string,
+  env?: NodeJS.ProcessEnv,
+): HostnameFoldMigration {
+  if (legacyPathHashId === newSourceId) {
+    return { kind: "none", reason: "ids-match" };
+  }
+  const oldPath = sourceLocalPath(legacyPathHashId, env);
+  if (oldPath === null) {
+    return { kind: "none", reason: "no-legacy-source" };
+  }
+  if (oldPath !== currentRoot) {
+    return {
+      kind: "skipped-path-drift",
+      oldId: legacyPathHashId,
+      oldPath,
+      currentPath: currentRoot,
+    };
+  }
+  if (gbrainSupportsSourcesRename(env)) {
+    const r = spawnGbrain(["sources", "rename", legacyPathHashId, newSourceId], { baseEnv: env });
+    if (r.status === 0) {
+      return { kind: "renamed", oldId: legacyPathHashId, newId: newSourceId };
+    }
+    // Rename failed at runtime — fall through to cleanup path.
+  }
+  return { kind: "pending-cleanup", oldId: legacyPathHashId };
+}
+
+/**
+ * Remove an orphaned source. Called only after new-source sync verifies pages
+ * exist, so the old source is provably redundant before deletion.
+ *
+ * Flag note: existing call sites used `--confirm-destructive` here and
+ * `--yes` in `lib/gbrain-sources.ts` — gbrain 0.35.0.0 accepts neither
+ * deterministically (the subcommand surface help is generic). We pass
+ * `--confirm-destructive` to match the existing call site convention; the
+ * flag-helper centralization in commit 4 (lib/gbrain-exec.ts) will resolve
+ * the inconsistency across the codebase.
+ */
+export function removeOrphanedSource(oldId: string, env?: NodeJS.ProcessEnv): boolean {
+  const r = spawnGbrain(["sources", "remove", oldId, "--confirm-destructive"], { baseEnv: env });
+  return r.status === 0;
+}
+
 /**
  * Build a gbrain-valid source id (1-32 lowercase alnum + interior hyphens). Sanitizes
  * `raw`, prefixes with `prefix`, and falls back to a hashed-tail form when total length
  * would exceed 32 chars.
+ *
+ * Truncation cuts on hyphen boundaries (whole-word units) from the right, never
+ * mid-word. Inputs like "drummerms-av-sow-wiz-skill-270c0001" produce
+ * "${prefix}-270c0001-<hash>", not "${prefix}-kill-270c0001-<hash>".
  */
 function constrainSourceId(prefix: string, raw: string): string {
   const MAX = 32;
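The decision order documented for `planHostnameFoldMigration` can be exercised without a gbrain CLI by injecting the three CLI-backed steps. The following is a sketch under stated assumptions: the `SourceOps` interface, `planMigration`, and `stubOps` are hypothetical names introduced here for illustration. The real function in the hunk above calls `sourceLocalPath`, `gbrainSupportsSourcesRename`, and `spawnGbrain` directly.

```typescript
type MigrationPlan =
  | { kind: "none"; reason: "ids-match" | "no-legacy-source" }
  | { kind: "skipped-path-drift"; oldId: string; oldPath: string; currentPath: string }
  | { kind: "renamed"; oldId: string; newId: string }
  | { kind: "pending-cleanup"; oldId: string };

// Hypothetical seam standing in for the gbrain CLI calls.
interface SourceOps {
  localPath(id: string): string | null;
  renameSupported(): boolean;
  rename(oldId: string, newId: string): boolean;
}

function planMigration(root: string, newId: string, oldId: string, ops: SourceOps): MigrationPlan {
  if (oldId === newId) return { kind: "none", reason: "ids-match" };
  const oldPath = ops.localPath(oldId);
  if (oldPath === null) return { kind: "none", reason: "no-legacy-source" };
  // Path drift: never rename or remove a source that points somewhere else.
  if (oldPath !== root) return { kind: "skipped-path-drift", oldId, oldPath, currentPath: root };
  if (ops.renameSupported() && ops.rename(oldId, newId)) return { kind: "renamed", oldId, newId };
  // Rename unsupported or failed: the caller removes the old source only after a verified sync.
  return { kind: "pending-cleanup", oldId };
}

// In-memory stub: `paths` maps source id to local_path; `canRename` toggles rename support.
function stubOps(paths: Map<string, string>, canRename: boolean): SourceOps {
  return {
    localPath: (id) => paths.get(id) ?? null,
    renameSupported: () => canRename,
    rename: () => canRename,
  };
}

const legacyAtRoot = new Map([["old-id", "/repo"]]);
const planWithoutRename = planMigration("/repo", "new-id", "old-id", stubOps(legacyAtRoot, false));
const planWithRename = planMigration("/repo", "new-id", "old-id", stubOps(legacyAtRoot, true));
const planAfterMove = planMigration("/moved", "new-id", "old-id", stubOps(legacyAtRoot, true));
console.log(planWithoutRename.kind, planWithRename.kind, planAfterMove.kind);
```

Returning a tagged union instead of performing the deletion inline is what lets the caller defer removal until the new source has been synced and has pages.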
@@ -230,17 +395,21 @@ function constrainSourceId(prefix: string, raw: string): string {
   // Total budget: prefix + "-" + tail + "-" + hash
   const tailBudget = MAX - prefix.length - 2 - hash.length;
   if (tailBudget < 1) return `${prefix}-${hash}`;
-  const tail = slug.slice(-tailBudget).replace(/^-+|-+$/g, "");
-  return tail ? `${prefix}-${tail}-${hash}` : `${prefix}-${hash}`;
-}
-
-function gbrainAvailable(): boolean {
-  try {
-    execSync("command -v gbrain", { stdio: "ignore" });
-    return true;
-  } catch {
-    return false;
+  // Cut on hyphen boundaries instead of mid-word. Walk tokens from the right,
+  // accumulating until adding the next token would exceed tailBudget. This
+  // preserves readable suffixes (pathhash, repo name) and avoids embarrassing
+  // mid-word artifacts like "skill" → "kill".
+  const tokens = slug.split("-").filter(Boolean);
+  const kept: string[] = [];
+  let len = 0;
+  for (let i = tokens.length - 1; i >= 0; i--) {
+    const add = kept.length === 0 ? tokens[i].length : tokens[i].length + 1;
+    if (len + add > tailBudget) break;
+    kept.unshift(tokens[i]);
+    len += add;
   }
+  const tail = kept.join("-");
+  return tail ? `${prefix}-${tail}-${hash}` : `${prefix}-${hash}`;
 }

 // ── Lock file (D1) ─────────────────────────────────────────────────────────
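The difference between the old character slice and the new token walk is easiest to see on the slug from the commit message. This is a minimal sketch with hypothetical helper names (`tailMidWord`, `tailOnHyphenBoundary`). The budget of 13 is an assumption inferred from the commit message's example id `gstack-code-kill-270c0001-c32152` (32 total, minus the 11-char prefix, two hyphens, and a 6-char hash); the hash length itself is not visible in this hunk.

```typescript
// Old behavior: slice the last `tailBudget` characters, wherever that lands.
function tailMidWord(slug: string, tailBudget: number): string {
  return slug.slice(-tailBudget).replace(/^-+|-+$/g, "");
}

// New behavior: keep whole hyphen-separated tokens, walking from the right.
function tailOnHyphenBoundary(slug: string, tailBudget: number): string {
  const kept: string[] = [];
  let len = 0;
  for (const token of slug.split("-").filter(Boolean).reverse()) {
    // Every token after the first also costs one joining hyphen.
    const add = kept.length === 0 ? token.length : token.length + 1;
    if (len + add > tailBudget) break;
    kept.unshift(token);
    len += add;
  }
  return kept.join("-");
}

const exampleSlug = "drummerms-av-sow-wiz-skill-270c0001";
const assumedBudget = 13; // assumption: 32 - "gstack-code".length - 2 - 6

console.log(tailMidWord(exampleSlug, assumedBudget));          // kill-270c0001
console.log(tailOnHyphenBoundary(exampleSlug, assumedBudget)); // 270c0001
```

An empty result from the token walk corresponds to the case where the real function falls back to the `${prefix}-${hash}` form.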
@@ -334,9 +503,6 @@ async function runCodeImport(args: CliArgs): Promise<StageResult> {
   if (!root) {
     return { name: "code", ran: false, ok: true, duration_ms: 0, summary: "skipped (not in git repo)" };
   }
-  if (!gbrainAvailable()) {
-    return { name: "code", ran: false, ok: false, duration_ms: 0, summary: "skipped (gbrain CLI not in PATH)" };
-  }

   const sourceId = deriveCodeSourceId(root);

@@ -365,31 +531,52 @@ async function runCodeImport(args: CliArgs): Promise<StageResult> {
     return skipStageForLocalStatus("code", localStatus, t0);
   }

-  // Step 0: Best-effort cleanup of pre-pathhash legacy source.
+  // Step 0a: Best-effort cleanup of pre-pathhash legacy source (v1.x form).
   // Earlier /sync-gbrain versions registered `gstack-code-<slug>` (no path
   // suffix). On a multi-worktree repo, those collapsed onto a single id
   // with last-sync-wins. Federated search would return stale duplicate
   // hits forever if we left the orphan in place. Remove the legacy id once
   // here so users don't accumulate orphans.
   // Failure is non-fatal — we still register the new id below.
+  // gbrainEnv seeds DATABASE_URL from gbrain's config so this stage works
+  // inside Next.js / Prisma / Rails projects with their own .env.local
+  // (codex review #7 — bug fix is wider than #1508 as filed).
+  const gbrainEnv = buildGbrainEnv({ announce: !args.quiet });
   const legacyId = deriveLegacyCodeSourceId(root);
   let legacyRemoved = false;
   if (legacyId !== sourceId) {
-    const rm = spawnSync("gbrain", ["sources", "remove", legacyId, "--confirm-destructive"], {
-      encoding: "utf-8",
+    const rm = spawnGbrain(["sources", "remove", legacyId, "--confirm-destructive"], {
       timeout: 30_000,
-      stdio: ["ignore", "pipe", "pipe"],
+      baseEnv: gbrainEnv,
     });
     // Treat absent-source as success (clean state). gbrain emits "not found" on
     // missing id; treat any non-zero exit without "not found" as a soft fail.
     if (rm.status === 0) legacyRemoved = true;
   }

+  // Step 0b: Hostname-fold migration (#1414).
+  // Before #1468 the source id hashed only the absolute repo path. After the
+  // hostname fold, every existing user has a legacy id that no longer matches
+  // what deriveCodeSourceId produces. Try rename-in-place first (preserves
+  // pages); fall back to register-new → sync-OK → remove-old. Path-drift
+  // (user moved the repo, etc.) skips migration with a warning.
+  const pathOnlyHashLegacyId = derivePathOnlyHashLegacyId(root);
+  const migration = planHostnameFoldMigration(root, sourceId, pathOnlyHashLegacyId, gbrainEnv);
+  if (migration.kind === "skipped-path-drift" && !args.quiet) {
+    console.error(
+      `[sync:code] hostname-fold migration skipped: legacy source ${migration.oldId} `
+        + `points at ${migration.oldPath}, current repo is ${migration.currentPath}. `
+        + `Clean up manually with: gbrain sources remove ${migration.oldId} --confirm-destructive`,
+    );
+  } else if (migration.kind === "renamed" && !args.quiet) {
+    console.error(`[sync:code] hostname-fold migration: renamed ${migration.oldId} → ${migration.newId} (pages preserved)`);
+  }
+
   // Step 1: Ensure source registered (idempotent). Single source of truth in lib —
   // no synchronous duplicate here (per /codex review #12).
   let registered = false;
   try {
-    const result = await ensureSourceRegistered(sourceId, root, { federated: true });
+    const result = await ensureSourceRegistered(sourceId, root, { federated: true, env: gbrainEnv });
     registered = result.changed;
   } catch (err) {
     return {
@@ -407,9 +594,10 @@ async function runCodeImport(args: CliArgs): Promise<StageResult> {
     ? ["reindex-code", "--source", sourceId, "--yes"]
     : ["sync", "--strategy", "code", "--source", sourceId];

-  const syncResult = spawnSync("gbrain", syncArgs, {
+  const syncResult = spawnGbrain(syncArgs, {
     stdio: args.quiet ? ["ignore", "ignore", "ignore"] : ["ignore", "inherit", "inherit"],
     timeout: 35 * 60 * 1000,
+    baseEnv: gbrainEnv,
   });

   if (syncResult.status !== 0) {
@@ -432,14 +620,32 @@ async function runCodeImport(args: CliArgs): Promise<StageResult> {
   // the wrong/default source. Treat it as a stage failure (ok=false) so the
   // verdict block surfaces ERR and the user knows to retry rather than
   // trusting stale results.
-  const attach = spawnSync("gbrain", ["sources", "attach", sourceId], {
-    encoding: "utf-8",
+  const attach = spawnGbrain(["sources", "attach", sourceId], {
     timeout: 10_000,
     cwd: root,
-    stdio: ["ignore", "pipe", "pipe"],
+    baseEnv: gbrainEnv,
   });
-  const pageCount = sourcePageCount(sourceId);
-  const legacyNote = legacyRemoved ? `, removed legacy ${legacyId}` : "";
+  const pageCount = sourcePageCount(sourceId, gbrainEnv);
+
+  // Step 4: Deferred hostname-fold cleanup.
+  // Only remove the pre-#1468 path-only-hash source NOW that the new source
+  // has registered + synced + has pages. Removing before sync would create a
+  // data-loss window if sync failed; removing without a page-count check would
+  // wipe pages when sync silently no-op'd. This is the codex-review-flagged
+  // safety: register → sync → verify → THEN delete.
+  let hostnameLegacyRemoved = false;
+  if (migration.kind === "pending-cleanup" && pageCount !== null && pageCount > 0) {
+    hostnameLegacyRemoved = removeOrphanedSource(migration.oldId, gbrainEnv);
+    if (hostnameLegacyRemoved && !args.quiet) {
+      console.error(`[sync:code] hostname-fold migration: removed legacy ${migration.oldId} after new source sync verified (page_count=${pageCount})`);
+    }
+  }
+
+  const legacyParts: string[] = [];
+  if (legacyRemoved) legacyParts.push(`removed legacy ${legacyId}`);
+  if (migration.kind === "renamed") legacyParts.push(`renamed ${migration.oldId}→${migration.newId}`);
+  if (hostnameLegacyRemoved) legacyParts.push(`removed pre-hostname-fold ${migration.kind === "pending-cleanup" ? migration.oldId : ""}`);
+  const legacyNote = legacyParts.length > 0 ? `, ${legacyParts.join(", ")}` : "";
   const baseSummary = `${registered ? "registered + " : ""}synced ${sourceId} (page_count=${pageCount ?? "unknown"}${legacyNote})`;

   if (attach.status !== 0) {
@@ -460,6 +666,13 @@ async function runCodeImport(args: CliArgs): Promise<StageResult> {
     };
   }

+  // v1.29.0.0 changelog promised the per-worktree pin would be ignored in the
+  // consuming repo, but the change actually only added .gbrain-source to
+  // gstack's own .gitignore. Without the consumer-side entry, the pin gets
+  // committed and breaks the per-worktree promise: Conductor sibling worktrees
+  // step on each other's pin every time anyone commits (#1384).
+  ensureGbrainSourceGitignored(root);
+
   return {
     name: "code",
     ran: true,
@@ -476,6 +689,39 @@ async function runCodeImport(args: CliArgs): Promise<StageResult> {
   };
 }

+/**
+ * Ensure `.gbrain-source` is listed in the consumer repo's `.gitignore`.
+ *
+ * Idempotent: only appends when the entry is not already present (matched on
+ * trimmed lines so a leading/trailing whitespace difference doesn't add a
+ * second copy). Wraps writes in try/catch so a read-only checkout or weird
+ * perms logs a warning and lets the rest of the sync continue.
+ */
+export function ensureGbrainSourceGitignored(root: string): void {
+  const gitignorePath = join(root, ".gitignore");
+  try {
+    let existing = "";
+    try {
+      existing = readFileSync(gitignorePath, "utf-8");
+    } catch {
+      // No .gitignore yet — we'll create it.
+    }
+    const alreadyIgnored = existing
+      .split("\n")
+      .some((line) => line.trim() === ".gbrain-source");
+    if (alreadyIgnored) {
+      return;
+    }
+    const sep = existing.length > 0 && !existing.endsWith("\n") ? "\n" : "";
+    writeFileSync(gitignorePath, existing + sep + ".gbrain-source\n");
+  } catch (err) {
+    const msg = err instanceof Error ? err.message : String(err);
+    console.warn(
+      `[sync:code] could not add .gbrain-source to ${gitignorePath}: ${msg}`,
+    );
+  }
+}
+
 function runMemoryIngest(args: CliArgs): StageResult {
   const t0 = Date.now();

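The idempotent append inside `ensureGbrainSourceGitignored` can be checked without touching the filesystem by isolating the string transformation. This is a sketch only: `withGbrainSourceIgnored` is a hypothetical pure helper introduced here, and the file read, write, and warning paths of the real function are deliberately left out.

```typescript
// Hypothetical pure core: given the current .gitignore text (empty string when the
// file is missing), return the text that should be on disk afterwards.
function withGbrainSourceIgnored(existing: string): string {
  const alreadyIgnored = existing
    .split("\n")
    .some((line) => line.trim() === ".gbrain-source");
  if (alreadyIgnored) return existing;
  // Only add a separator when the file is non-empty and lacks a trailing newline.
  const sep = existing.length > 0 && !existing.endsWith("\n") ? "\n" : "";
  return existing + sep + ".gbrain-source\n";
}

const created = withGbrainSourceIgnored("");
const appended = withGbrainSourceIgnored("node_modules");
const repeated = withGbrainSourceIgnored(appended);
console.log(JSON.stringify({ created, appended, repeated }));
```

Running the helper on its own output returns the input unchanged, which is the property the repeat-run test case in the commit message relies on.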
@@ -498,9 +744,14 @@ function runMemoryIngest(args: CliArgs): StageResult {
   else ingestArgs.push("--incremental");
   if (args.quiet) ingestArgs.push("--quiet");

+  // Thread the seeded env into the bun grandchild (codex review #7 — the
+  // .env.local footgun affects gstack-memory-ingest.ts too, not just the
+  // direct gbrain spawns in this file). The grandchild calls gbrain import
+  // internally and must see the DATABASE_URL from gbrain's own config.
   const result = spawnSync("bun", ingestArgs, {
     encoding: "utf-8",
     timeout: 35 * 60 * 1000,
+    env: buildGbrainEnv({ announce: false }),
   });

   // D6: parse [memory-ingest] lines from the child's stderr. ERR-prefixed
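`buildGbrainEnv` lives in lib/gbrain-exec.ts, which this diff does not show. Going only by the behavior listed in the commit message (seed `DATABASE_URL` from gbrain's config, return a fresh object, honor `GSTACK_RESPECT_ENV_DATABASE_URL=1`, tolerate a missing or unparseable config), the seeding step might look roughly like the sketch below. The function name `seedDatabaseUrl`, its signature, and the sample URLs are assumptions for illustration and are not the real helper's API. The `database_url` field name comes from the test list in the commit message.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical pure helper. `configText` is the raw contents of gbrain's config.json,
// or null when the file does not exist.
function seedDatabaseUrl(baseEnv: Env, configText: string | null): Env {
  const env: Env = { ...baseEnv }; // always a fresh object, never the caller's by identity
  if (baseEnv["GSTACK_RESPECT_ENV_DATABASE_URL"] === "1") return env; // escape hatch
  if (configText === null) return env; // missing config: leave the caller's value alone
  try {
    const parsed: unknown = JSON.parse(configText);
    if (parsed !== null && typeof parsed === "object") {
      const url = (parsed as { database_url?: unknown }).database_url;
      if (typeof url === "string" && url.length > 0) env["DATABASE_URL"] = url;
    }
  } catch {
    // Unparseable config: fall through with the caller's env unchanged.
  }
  return env;
}

// Made-up values: the app's .env.local URL versus gbrain's own config.
const callerEnv: Env = { DATABASE_URL: "postgres://app-local/dev", PATH: "/usr/bin" };
const gbrainConfig = JSON.stringify({ database_url: "postgres://gbrain-host/brain" });
const seededEnv = seedDatabaseUrl(callerEnv, gbrainConfig);
console.log(seededEnv["DATABASE_URL"]);
```

Copying the environment before mutating it keeps the override scoped to the spawned child, so the calling process and any sibling spawns still see the project's own `DATABASE_URL`.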
@@ -675,8 +926,10 @@ async function main(): Promise<void> {
   process.exit(exitCode);
 }

-main().catch((err) => {
-  console.error(`gstack-gbrain-sync fatal: ${err instanceof Error ? err.message : String(err)}`);
-  releaseLock();
-  process.exit(1);
-});
+if (import.meta.main) {
+  main().catch((err) => {
+    console.error(`gstack-gbrain-sync fatal: ${err instanceof Error ? err.message : String(err)}`);
+    releaseLock();
+    process.exit(1);
+  });
+}