Files
gstack/bin/gstack-gbrain-sync.ts
T
Garry Tan bf65487162 v1.26.0.0 feat: V1 transcript ingest + per-skill gbrain manifests + retrieval surface (#1298)
* feat: lib/gstack-memory-helpers shared module for V1 memory ingest pipeline

Lane 0 foundation per plan §"Eng review additions". 5 public functions
imported by the V1 helpers (Lanes A/B/C):

  canonicalizeRemote(url)  — normalize git remote → host/org/repo
  secretScanFile(path)     — gitleaks wrapper with discriminated return
  detectEngineTier()       — cached 60s in ~/.gstack/.gbrain-engine-cache.json
  parseSkillManifest(path) — extract gbrain.context_queries: from frontmatter
  withErrorContext(op,fn,caller) — async-aware error logging

22 unit tests, all passing. State files use schema_version: 1 +
last_writer field per Section 2A standardization. Manifest parser
handles all three kinds (vector/list/filesystem) and ignores
incomplete items.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: bin/gstack-memory-ingest — V1 unified memory ingest helper

Lane A. Walks coding-agent transcripts (Claude Code + Codex; Cursor V1.0.1
follow-up) AND ~/.gstack/ curated artifacts (eureka, learnings, timeline,
ceo-plans, design-docs, retros, builder-profile). Calls gbrain put_page
with type-tagged frontmatter. Uses gstack-memory-helpers (Lane 0):

  - Modes: --probe / --incremental (default, mtime fast-path) / --bulk
  - Default 90-day window; --all-history opts into full archive
  - --sources subset filter; --include-unattributed opt-in for no-remote sessions
  - --limit N for smoke testing; --benchmark for throughput reporting
  - Tolerant JSONL parser handles truncated last lines (D10 partial-flag)
  - State file at ~/.gstack/.transcript-ingest-state.json (LOCAL per ED1)
  - schema_version: 1 with backup-on-mismatch + JSON-corrupt recovery
  - gitleaks via secretScanFile() before every put_page (D19)
  - withErrorContext wraps every put_page for forensic ~/.gstack/.gbrain-errors.jsonl

15 unit tests cover --help, --probe (empty, Claude Code, Codex, mixed
artifacts), --sources filter, state file lifecycle (create, schema mismatch
backup, JSON corrupt backup), truncated-last-line handling, --limit
validation. All passing.

V1.5 P0 follow-ups noted in the file header:
  - Cursor SQLite extraction (V1.0.1)
  - gbrain put_file routing for Supabase Storage tier (cross-repo)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: bin/gstack-gbrain-sync — V1 unified sync verb (Lane B)

Orchestrates three storage tiers per plan §"Storage tiering":
  1. Code (current repo)         → gbrain import (Supabase or local PGLite)
  2. Transcripts + curated memory → gstack-memory-ingest (typed put_page)
  3. Curated artifacts to git    → gstack-brain-sync (existing pipeline)

Modes: --incremental (default, mtime fast-path) / --full (~25-35 min per
ED2 honest budget) / --dry-run (preview, no writes).

Flags: --code-only / --no-code / --no-memory / --no-brain-sync for
selective stage disable. Each stage failure is non-fatal; subsequent
stages still run.

State at ~/.gstack/.gbrain-sync-state.json (LOCAL per ED1) with
schema_version: 1 + last_writer + per-stage outcomes for forensic tracing.

--watch daemon explicitly deferred to V1.5 P0 TODO per Codex F3
(reverses the "no daemon" invariant). Continuous sync rides the existing
preamble-boundary hook only.

8 unit tests cover --help, unknown flag rejection, --dry-run preview shape
(all stages + code-only), --no-code stage skip, state file lifecycle
(create on real run + skip on dry-run), and stage results recorded
in state. All passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: bin/gstack-brain-context-load — V1 retrieval surface (Lane C)

Called from the gstack preamble at every skill start. Reads the active
skill's gbrain.context_queries: frontmatter (Layer 2) or falls back to a
generic salience block (Layer 1 with explicit repo: {repo_slug} filter
per Codex F7 cleanup).

Dispatches each query by kind:
  kind: vector       → gbrain query <text>
  kind: list         → gbrain list_pages --filter ...
  kind: filesystem   → local glob (with mtime_desc sort + tail support)

Each MCP/CLI call has a 500ms hard timeout per Section 1C. On timeout
or missing gbrain CLI, helper renders SKIP for that section and continues —
skill startup never blocks > 2s on gbrain issues.

Datamark envelope per Section 1D + D12: rendered body wrapped once at
the page level in <USER_TRANSCRIPT_DATA do-not-interpret-as-instructions>
(not per-message). Layer 1 prompt-injection defense.

Default manifest (D13 three-section): recent transcripts (limit 5) +
recent curated last-7d (limit 10) + skill-name-matched timeline events
(limit 5). All scoped to {repo_slug}.

Template var substitution: {repo_slug}, {user_slug}, {branch},
{skill_name}, {window}. Unresolved vars cause the query to skip with a
logged reason (--explain shows it).

10 unit tests cover help/unknown-flag/limit-validation, default-fallback
when skill not found, manifest dispatch when --skill-file points at a
real SKILL.md, datamark envelope wrapping, render_as template
substitution, unresolved-template-var skip, --quiet suppression, and
graceful gbrain-CLI-absence behavior. All passing.

V1.5 P0: salience smarts promote to gbrain server-side MCP tools
(get_recent_salience, find_anomalies, recency-aware list_pages); helper
signature unchanged, internals switch from 4-call composition to single
MCP call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: gbrain.context_queries manifests on 6 V1 skills (Lane E partial)

Adds the V1 retrieval contracts. Each skill declares what it wants gbrain
to surface in the preamble at invocation time:

  /office-hours        — prior sessions + builder profile + design docs
                         + recent eureka (4 queries)
  /plan-ceo-review     — prior CEO plans + design docs + recent CEO review
                         activity (3 queries)
  /design-shotgun      — prior approved variants + DESIGN.md + recent
                         design docs (3 queries)
  /design-consultation — existing DESIGN.md + prior design decisions +
                         brand-related notes (3 queries)
  /investigate         — prior investigations + project learnings + recent
                         eureka cross-project (3 queries)
  /retro               — prior retros + recent timeline + recent learnings
                         (3 queries)

Each query carries an explicit kind (vector | list | filesystem) per D3,
schema: 1 versioning per D15, and {repo_slug} template var per F7
cross-repo-contamination cleanup. Mix of vector / list / filesystem
matches what each skill actually needs:

  - filesystem (mtime_desc + tail) for log JSONL + curated markdown
  - list with tags_contains filter for typed gbrain pages
  - (vector reserved for V1.0.1 when gbrain query surface stabilizes)

Smoke test: bun run bin/gstack-brain-context-load.ts --skill-file
office-hours/SKILL.md --repo test-repo --explain returns mode=manifest
queries=4 with the filesystem kinds populating real data from
~/.gstack/builder-profile.jsonl + ~/.gstack/analytics/eureka.jsonl on
this Mac. End-to-end retrieval flow confirmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: setup-gbrain Step 7.5 ingest gate + Step 10 verdict + memory.md ref doc (Lane E partial)

Step 7.5: Transcript & memory ingest gate. After Step 7 wires brain-sync
but before Step 8's CLAUDE.md persist, runs gstack-memory-ingest --probe,
then either silent-bulks (small) or AskUserQuestion-gates with the exact
counts + value promise + 5 options (this-repo-90d, all-history, multi-repo,
incremental-from-now, never). Decision persists to
gstack-config set transcript_ingest_mode <choice>.

Step 10: GREEN/YELLOW/RED verdict block. Re-running /setup-gbrain on a
configured Mac is now a first-class doctor path — every step's detection
+ repair logic feeds into a single verdict at the end. Rows: CLI / Engine /
doctor / MCP / Repo policy / Code import / Memory sync / Transcripts /
CLAUDE.md / Smoke. Tells the user "Run /setup-gbrain again any time gbrain
feels off; it's safe and idempotent."

setup-gbrain/memory.md: user-facing reference doc covering what gets
ingested + what stays local + secret scanning via gitleaks + storage
tiering + querying + deleting + how the agent auto-loads context per skill +
common recovery cases. Linked from Step 8's CLAUDE.md persist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: V1 E2E pipeline + --no-write flag for ingest helper (Lane F)

E2E pipeline test exercises the full Lane A → B → C value loop:
  1. Set up fake $HOME with all 8 memory source types as fixtures
  2. gstack-memory-ingest --probe verifies counts match disk
  3. gstack-memory-ingest --incremental writes state with schema_version: 1
  4. Idempotency: re-run reports 0 changes
  5. --probe distinguishes new vs unchanged after first incremental
  6. gstack-gbrain-sync --dry-run previews 3 stages
  7. --no-code --no-brain-sync --quiet writes sync state with 1 stage entry
  8. office-hours/SKILL.md V1 manifest dispatches 4 queries (mode=manifest)
  9. Datamark envelope wraps every loaded section (Section 1D + D12)
 10. Layer 1 fallback when no skill specified — default 3-section manifest
 11. plan-ceo-review/SKILL.md manifest also dispatches (regression for V1
     manifest authoring across all 6 V1 skills)

Side effect: bin/gstack-memory-ingest.ts gains --no-write flag (also
honored via GSTACK_MEMORY_INGEST_NO_WRITE=1 env var). Skips gbrain put_page
calls while still updating the state file. Used by tests + dry-runs to
avoid real ingest churn when verifying state-file lifecycle. The
--bulk and --incremental modes still call gbrain by default — only
explicit opt-in suppresses writes.

V1 lane test totals (covering all 5 helpers + 6 skill manifests):
  test/gstack-memory-helpers.test.ts     22 tests
  test/gstack-memory-ingest.test.ts      15 tests
  test/gstack-gbrain-sync.test.ts         8 tests
  test/gstack-brain-context-load.test.ts 10 tests
  test/skill-e2e-memory-pipeline.test.ts 10 tests
  ────────────────────────────────────── ─────────
  TOTAL                                  65 passing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.26.0.0)

V1 of memory ingest + retrieval surface. Coding-agent transcripts (Claude
Code + Codex) on disk become first-class queryable pages in gbrain. Six
high-leverage skills auto-load per-skill context manifests at every
invocation. Datamark envelopes wrap loaded pages as Layer 1 prompt-
injection defense. Storage tiering: curated memory rides existing
brain-sync git pipeline; code+transcripts route to Supabase Storage when
configured else local PGLite — never double-store.

Net branch size vs main: +4174/-849 across 39 files. 65 V1 tests, all
green. Goldilocks scope per CEO D18; V1.5 P0 follow-ups documented in
the plan's V1.5 TODOs section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 08:40:30 -07:00

333 lines
11 KiB
TypeScript

#!/usr/bin/env bun
/**
* gstack-gbrain-sync — V1 unified sync verb.
*
* Orchestrates three storage tiers per plan §"Storage tiering":
*
* 1. Code (current repo) → gbrain import (Supabase or local PGLite)
* 2. Transcripts + curated memory → gstack-memory-ingest (typed put_page)
* 3. Curated artifacts to git → gstack-brain-sync (existing pipeline)
*
* Modes:
* --incremental (default) — mtime fast-path; runs all 3 stages with cache hits
* --full — first-run; full walk + import; honest budget per ED2
* --dry-run — preview what would sync; no writes
*
* --watch (V1.5 P0 TODO): file-watcher daemon. Deferred per Codex F3 ("no daemon"
* invariant). For V1, continuous sync rides the preamble-boundary hook only.
*
* Cross-repo TODO (V1.5): when gbrain CLI ships `put_file` + `restore-from-sync`,
* this helper picks them up via version probe (Codex F6 + D9) and routes
* code/transcripts to Supabase Storage instead of put_page.
*/
import { existsSync, statSync, mkdirSync, writeFileSync, readFileSync } from "fs";
import { join, dirname } from "path";
import { execSync, spawnSync } from "child_process";
import { homedir } from "os";
import { detectEngineTier, withErrorContext } from "../lib/gstack-memory-helpers";
// ── Types ──────────────────────────────────────────────────────────────────
type Mode = "incremental" | "full" | "dry-run";
interface CliArgs {
mode: Mode;
quiet: boolean;
noCode: boolean;
noMemory: boolean;
noBrainSync: boolean;
codeOnly: boolean;
}
interface StageResult {
name: string;
ran: boolean;
ok: boolean;
duration_ms: number;
summary: string;
}
// ── Constants ──────────────────────────────────────────────────────────────
const HOME = homedir();
const GSTACK_HOME = process.env.GSTACK_HOME || join(HOME, ".gstack");
const STATE_PATH = join(GSTACK_HOME, ".gbrain-sync-state.json");
// ── CLI ────────────────────────────────────────────────────────────────────
function printUsage(): void {
console.error(`Usage: gstack-gbrain-sync [--incremental|--full|--dry-run] [options]
Modes:
--incremental Default. mtime fast-path; ~50ms steady-state.
--full First-run; full walk + import. Honest ~25-35 min for big Macs (ED2).
--dry-run Preview what would sync; no writes.
Options:
--quiet Suppress per-stage output.
--no-code Skip the gbrain import (current repo) stage.
--no-memory Skip the gstack-memory-ingest stage (transcripts + artifacts).
--no-brain-sync Skip the gstack-brain-sync git pipeline stage.
--code-only Only run the gbrain import stage (alias for --no-memory --no-brain-sync).
--help This text.
Stages run in order: code import → memory ingest → curated git push.
Each stage failure is non-fatal; subsequent stages still run.
`);
}
function parseArgs(): CliArgs {
const args = process.argv.slice(2);
let mode: Mode = "incremental";
let quiet = false;
let noCode = false;
let noMemory = false;
let noBrainSync = false;
let codeOnly = false;
for (let i = 0; i < args.length; i++) {
const a = args[i];
switch (a) {
case "--incremental": mode = "incremental"; break;
case "--full": mode = "full"; break;
case "--dry-run": mode = "dry-run"; break;
case "--quiet": quiet = true; break;
case "--no-code": noCode = true; break;
case "--no-memory": noMemory = true; break;
case "--no-brain-sync": noBrainSync = true; break;
case "--code-only":
codeOnly = true;
noMemory = true;
noBrainSync = true;
break;
case "--help":
case "-h":
printUsage();
process.exit(0);
default:
console.error(`Unknown argument: ${a}`);
printUsage();
process.exit(1);
}
}
return { mode, quiet, noCode, noMemory, noBrainSync, codeOnly };
}
// ── Stage runners ──────────────────────────────────────────────────────────
function repoRoot(): string | null {
try {
const out = execSync("git rev-parse --show-toplevel", { encoding: "utf-8", timeout: 2000 });
return out.trim();
} catch {
return null;
}
}
function gbrainAvailable(): boolean {
try {
execSync("command -v gbrain", { stdio: "ignore" });
return true;
} catch {
return false;
}
}
function runCodeImport(args: CliArgs): StageResult {
const t0 = Date.now();
const root = repoRoot();
if (!root) {
return { name: "code", ran: false, ok: true, duration_ms: 0, summary: "skipped (not in git repo)" };
}
if (!gbrainAvailable()) {
return { name: "code", ran: false, ok: false, duration_ms: 0, summary: "skipped (gbrain CLI not in PATH)" };
}
if (args.mode === "dry-run") {
return { name: "code", ran: false, ok: true, duration_ms: 0, summary: `would: gbrain import ${root} --no-embed` };
}
const importArgs = ["import", root, "--no-embed"];
if (args.mode === "incremental") {
// gbrain import is itself idempotent on re-import; --incremental flag if it supports
importArgs.push("--incremental");
}
try {
spawnSync("gbrain", importArgs, {
stdio: args.quiet ? ["ignore", "ignore", "ignore"] : ["ignore", "inherit", "inherit"],
timeout: 5 * 60 * 1000,
});
// Trigger background embedding catch-up
spawnSync("gbrain", ["embed", "--stale"], {
stdio: ["ignore", "ignore", "ignore"],
timeout: 1000, // background spawn; don't wait
});
return {
name: "code",
ran: true,
ok: true,
duration_ms: Date.now() - t0,
summary: `imported ${root}`,
};
} catch (err) {
return {
name: "code",
ran: true,
ok: false,
duration_ms: Date.now() - t0,
summary: `gbrain import failed: ${(err as Error).message}`,
};
}
}
function runMemoryIngest(args: CliArgs): StageResult {
const t0 = Date.now();
if (args.mode === "dry-run") {
return { name: "memory", ran: false, ok: true, duration_ms: 0, summary: "would: gstack-memory-ingest --probe" };
}
const ingestPath = join(import.meta.dir, "gstack-memory-ingest.ts");
const ingestArgs = ["run", ingestPath];
if (args.mode === "full") ingestArgs.push("--bulk");
else ingestArgs.push("--incremental");
if (args.quiet) ingestArgs.push("--quiet");
const result = spawnSync("bun", ingestArgs, {
encoding: "utf-8",
timeout: 35 * 60 * 1000, // honest 35-min ceiling per ED2
});
const summary = (result.stderr || "").split("\n").filter((l) => l.includes("[memory-ingest]")).slice(-1)[0] || "ingest pass complete";
return {
name: "memory",
ran: true,
ok: result.status === 0,
duration_ms: Date.now() - t0,
summary: result.status === 0 ? summary : `memory ingest exited ${result.status}`,
};
}
function runBrainSyncPush(args: CliArgs): StageResult {
const t0 = Date.now();
if (args.mode === "dry-run") {
return { name: "brain-sync", ran: false, ok: true, duration_ms: 0, summary: "would: gstack-brain-sync --discover-new --once" };
}
const brainSyncPath = join(HOME, ".claude", "skills", "gstack", "bin", "gstack-brain-sync");
if (!existsSync(brainSyncPath)) {
return { name: "brain-sync", ran: false, ok: true, duration_ms: 0, summary: "skipped (gstack-brain-sync not installed)" };
}
// Discover new artifacts then drain queue
spawnSync(brainSyncPath, ["--discover-new"], {
stdio: args.quiet ? ["ignore", "ignore", "ignore"] : ["ignore", "inherit", "inherit"],
timeout: 60 * 1000,
});
const result = spawnSync(brainSyncPath, ["--once"], {
stdio: args.quiet ? ["ignore", "ignore", "ignore"] : ["ignore", "inherit", "inherit"],
timeout: 60 * 1000,
});
return {
name: "brain-sync",
ran: true,
ok: result.status === 0,
duration_ms: Date.now() - t0,
summary: result.status === 0 ? "curated artifacts pushed" : `gstack-brain-sync exited ${result.status}`,
};
}
// ── State file (records last sync timestamp + stage outcomes) ──────────────
interface SyncState {
schema_version: 1;
last_writer: string;
last_sync?: string;
last_full_sync?: string;
last_stages?: StageResult[];
}
function loadSyncState(): SyncState {
if (!existsSync(STATE_PATH)) {
return { schema_version: 1, last_writer: "gstack-gbrain-sync" };
}
try {
const raw = JSON.parse(readFileSync(STATE_PATH, "utf-8")) as SyncState;
if (raw.schema_version === 1) return raw;
} catch {
// fall through
}
return { schema_version: 1, last_writer: "gstack-gbrain-sync" };
}
function saveSyncState(state: SyncState): void {
try {
mkdirSync(dirname(STATE_PATH), { recursive: true });
writeFileSync(STATE_PATH, JSON.stringify(state, null, 2), "utf-8");
} catch {
// non-fatal
}
}
// ── Output ─────────────────────────────────────────────────────────────────
function formatStage(s: StageResult): string {
const status = !s.ran ? "SKIP" : s.ok ? "OK" : "ERR";
const dur = s.duration_ms > 0 ? ` (${(s.duration_ms / 1000).toFixed(1)}s)` : "";
return ` ${status.padEnd(5)} ${s.name.padEnd(12)} ${s.summary}${dur}`;
}
// ── Main ───────────────────────────────────────────────────────────────────
async function main(): Promise<void> {
const args = parseArgs();
if (!args.quiet) {
const engine = detectEngineTier();
console.error(`[gbrain-sync] mode=${args.mode} engine=${engine.engine}`);
}
const state = loadSyncState();
const stages: StageResult[] = [];
if (!args.noCode) {
stages.push(await withErrorContext("sync:code", () => runCodeImport(args), "gstack-gbrain-sync"));
}
if (!args.noMemory) {
stages.push(await withErrorContext("sync:memory", () => runMemoryIngest(args), "gstack-gbrain-sync"));
}
if (!args.noBrainSync) {
stages.push(await withErrorContext("sync:brain-sync", () => runBrainSyncPush(args), "gstack-gbrain-sync"));
}
// Persist state (skip on dry-run)
if (args.mode !== "dry-run") {
state.last_sync = new Date().toISOString();
if (args.mode === "full") state.last_full_sync = state.last_sync;
state.last_stages = stages;
saveSyncState(state);
}
if (!args.quiet || args.mode === "dry-run") {
console.log(`\ngstack-gbrain-sync (${args.mode}):`);
for (const s of stages) console.log(formatStage(s));
const okCount = stages.filter((s) => s.ok).length;
const errCount = stages.filter((s) => !s.ok && s.ran).length;
console.log(`\n ${okCount} ok, ${errCount} error, ${stages.length - okCount - errCount} skipped`);
}
const anyError = stages.some((s) => s.ran && !s.ok);
process.exit(anyError ? 1 : 0);
}
main().catch((err) => {
console.error(`gstack-gbrain-sync fatal: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
});