v1.26.5.0 fix wave: gbrain ingest writer (hybrid frontmatter) + gbrain-valid source ids (#1344)

* fix: use correct `gbrain put <slug>` CLI verb in memory ingest

`put_page` is the MCP tool name, not a CLI subcommand. The actual
gbrain verb is `put <slug>` with content via stdin and tags in YAML
frontmatter. Every transcript / memory ingest fails today on clean
installs.

Switch to the right verb and inject title/type/tags into the
frontmatter that buildTranscriptPage / buildArtifactPage already
produce.

Bundled in the same function:

- timeout: 30s → 60s. Auto-link reconciliation hits 30s once the
  brain has a few hundred pages.
- maxBuffer: 1MB → 16MB. Without it Node truncates gbrain's stderr
  and callers see only `Command failed:` with no detail.
- Surface stderr/stdout in the returned error instead of the bare
  exception.

Verified: `bun test test/gstack-memory-ingest.test.ts` → 15/15 pass.
`bun test` on the three test files touching this path → 362/362.
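
The verb change is easiest to see as a condensed sketch. The helper name below is illustrative (the real writer inlines this in `gbrainPutPage`), and the `PageRecord` shape is abbreviated; the point is that metadata moves off CLI flags and into YAML frontmatter carried on stdin:

```typescript
// Illustrative sketch, not the literal diff: the real writer inlines this
// logic in gbrainPutPage. Metadata no longer rides CLI flags; it lands in
// YAML frontmatter on stdin.
interface PageRecord {
  slug: string;
  title: string;
  type: string;
  tags: string[];
  body: string;
}

function buildPutInvocation(page: PageRecord): { args: string[]; stdin: string } {
  const frontmatter = [
    "---",
    `title: ${JSON.stringify(page.title)}`,
    `type: ${page.type}`,
    `tags: [${page.tags.map((t) => JSON.stringify(t)).join(", ")}]`,
    "---",
    "",
  ].join("\n");
  // Before: ["put_page", "--slug", slug, "--title", ...] — rejected by the CLI.
  // After: positional slug, content (frontmatter + body) via stdin.
  return { args: ["put", page.slug], stdin: `${frontmatter}\n${page.body}` };
}
```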

* fix(sync-gbrain): generate gbrain-valid source ids for repos with dots or long names

`deriveCodeSourceId` previously built the slug from the canonicalized remote with only `/`
and whitespace replaced, leaving dots from hostnames (`github.com`) intact and applying no
length cap. gbrain rejects any source id containing characters outside `[a-z0-9-]` or longer
than 32 chars, so `github.com/<org>/<repo>` produced `gstack-code-github.com-<org>-<repo>`
(40 chars, plus dots) and registration failed:

    code  source registration failed: Invalid source id
          "gstack-code-github.com-radubach-platform". Must be 1-32 lowercase alnum
          chars with optional interior hyphens.

Fix:
- Drop the host segment (`github.com` is the same for nearly every user and just
  consumes the 32-char budget). Use only the last two path segments (org-repo).
- Sanitize any remaining non-alnum to hyphens, then collapse and trim.
- For genuinely long org/repo names that still exceed the budget, keep the tail
  (most distinctive end of the slug) and append a 6-char sha1 hash for collision
  resistance.

Adds a regression test that spawns the CLI in temp git repos with controlled
remotes (dot in hostname, SCP-style, multi-dot host, long names forcing
hash-truncation) and asserts every derived id is ≤32 chars and matches the
gbrain validator regex.
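
The three fix steps above condense into the `constrainSourceId` helper this commit adds (restated here self-contained; the validator regex comes from gbrain's error message):

```typescript
import { createHash } from "crypto";

// Condensed restatement of the merged constrainSourceId: sanitize to
// [a-z0-9-], cap at 32 chars, hash-tail fallback for overflow, and a
// hash-only fallback when the slug sanitizes to empty.
function constrainSourceId(prefix: string, raw: string): string {
  const MAX = 32;
  const slug = raw.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  if (!slug) {
    // All-non-alnum input would otherwise yield "<prefix>-" (trailing hyphen).
    const hash = createHash("sha1").update(raw || "_empty").digest("hex").slice(0, 6);
    return `${prefix}-${hash}`;
  }
  const full = `${prefix}-${slug}`;
  if (full.length <= MAX) return full;
  // Keep the tail (most distinctive end) + 6-char hash for collision resistance.
  const hash = createHash("sha1").update(slug).digest("hex").slice(0, 6);
  const tailBudget = MAX - prefix.length - 2 - hash.length;
  if (tailBudget < 1) return `${prefix}-${hash}`;
  const tail = slug.slice(-tailBudget).replace(/^-+|-+$/g, "");
  return tail ? `${prefix}-${tail}-${hash}` : `${prefix}-${hash}`;
}
```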

* fix(memory-ingest): hybrid frontmatter writer + tightened gbrain availability probe

PR #1328 (merged in the prior commit) correctly injects title/type/tags
into the YAML frontmatter that buildTranscriptPage already prepends. But
buildArtifactPage emits raw markdown without frontmatter, so design-docs,
learnings, and builder-profile-entries were landing in gbrain with empty
title/type/tags. Add the no-frontmatter wrap branch so artifact pages get
the same metadata the inject branch provides for transcripts.

Also bring in gbrainAvailable()'s --help probe (originally proposed in
PR #1341 by Alex Medina), with the regex tightened from /(^|\s)put(\s|$)/m
to /^\s+put\s/m. Anchoring on the indented subcommand format gbrain's
help actually uses keeps the probe from matching "put" appearing as
prose in help text, while still failing fast with one clean error if a
future gbrain renames or removes the put subcommand.

Updates the V1.5 NOTE doc block at the top of the file to describe the
current put-via-stdin shape rather than the legacy put_page flag form.
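
The distinction the tightened regex buys can be shown directly. The help strings below are illustrative, not captured gbrain output:

```typescript
// Illustrative help shapes — not captured gbrain output. The anchored probe
// only matches an indented subcommand line; the looser proposal also
// matched "put" used as prose.
const anchored = /^\s+put\s/m;
const loose = /(^|\s)put(\s|$)/m;

const subcommandHelp = [
  "Usage: gbrain <command> [options]",
  "Commands:",
  "  put <slug>     Write a page (content via stdin)",
  "  search <query> Keyword search",
].join("\n");

const proseHelp = "Use the write command to put content into the brain.";

console.log(anchored.test(subcommandHelp)); // true: matches the indented "  put <slug>" line
console.log(anchored.test(proseHelp));      // false: "put" mid-sentence is not at line start
console.log(loose.test(proseHelp));         // true: the loose regex false-positives here
```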

Co-Authored-By: Alex Medina <oficina@puntoverdemc.com>

* test+fix(memory-ingest): strengthen regression tests, fix inject for malformed-close frontmatter

Imports the shim-based regression tests from PR #1341 (Alex Medina) and
strengthens them to assert title, type, and tags actually arrive in put
stdin — not just `agent: claude-code`. Asserting the metadata fields
matches the regression class that's caused this fix wave: writers can
"succeed" while metadata is silently lost. The original PR #1341 tests
would have passed even with title/type/tags missing.

Strengthening the test surfaced a deeper issue. buildTranscriptPage joins
its frontmatter array with "\n" and appends no trailing newline, so the
close fence is followed directly by content ("\n---<content>"), never
"\n---\n". PR #1328's inject branch searched for "\n---\n" and never
matched — which means even with PR #1328 alone, transcript pages were
landing in gbrain with no title/type/tags. Two-line fix: search for
"\n---" only, since the inject lands before the close fence regardless
of what follows it.

Also imports PR #1341's V1.5 NOTE doc-block update and the section
comment refresh so the prose stays accurate against the new writer
shape.
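
The two-line fix reads most clearly as a minimal inject sketch (function name illustrative — the real logic is inlined in `gbrainPutPage`):

```typescript
// Minimal sketch of the inject branch; the name is illustrative — the real
// logic is inlined in gbrainPutPage. Key detail: search for "\n---", NOT
// "\n---\n", because buildTranscriptPage's close fence is followed
// directly by content with no trailing newline.
function injectIntoFrontmatter(body: string, fields: string[]): string {
  if (!body.startsWith("---\n")) return body; // no frontmatter: not this branch
  const end = body.indexOf("\n---", 4); // close fence; content may follow immediately
  if (end < 0) return body;
  // Inject just before the close fence, inside the frontmatter block.
  return body.slice(0, end) + "\n" + fields.join("\n") + body.slice(end);
}
```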

Co-Authored-By: Alex Medina <oficina@puntoverdemc.com>

* fix+test(gbrain-sync): handle empty-slug edge in constrainSourceId, add no-origin and basename-empty regression tests

PR #1330 (merged in the prior commit) addressed the dot-in-host and
length-overflow cases for source-id derivation, but constrainSourceId
silently returned "${prefix}-" when the input sanitized to an empty
slug — invalid per gbrain's `^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$`
validator on the trailing hyphen. Adds an explicit empty-slug branch
that falls back to a sha1-prefixed id ("gstack-code-<6hex>") so the
output stays gbrain-valid for every input shape.

Two new regression tests cover the corners PR #1330's coverage left
exposed:
- no-origin fallback: a cwd repo with no `origin` remote configured
  must still derive a valid id from the basename.
- basename-sanitizes-to-empty: a repo whose path basename is all
  non-alnum (e.g. "___") must produce the hash-only fallback, not
  an invalid trailing-hyphen id.

Both run the CLI inside temp git repos for genuine end-to-end
coverage (matches the pattern PR #1330 established for its own four
remote-shape cases).

Co-Authored-By: Richard Dubach <radubach@gmail.com>

* chore: bump VERSION to 1.26.5.0 + CHANGELOG entry for fix wave

PATCH bump. Three bug fixes (memory-ingest put_page CLI verb mismatch,
hybrid frontmatter writer for transcripts AND artifacts, gbrain-valid
source-id derivation for github-hosted repos), no new user capability.

CHANGELOG release-summary leads with what users can now do (clean-install
transcripts populate the brain, github-hosted repos register code sources)
and tabulates before/after numbers from real gbrain v0.25.1 smoke output.
Itemized changes credit @smithjoshua, @AZ-1224, and @radubach for the
originating PRs, plus the additional hybrid branch and strengthened tests
added on top per Codex plan-review.

* docs(todos): file P2 (gbrain install-pin staleness) + P3 (source-id host-collision) follow-ups

Two follow-ups surfaced during the v1.26.5.0 fix-wave plan review.

P2 — Issue #1305 part 2: bin/gstack-gbrain-install pins gbrain to
v0.18.2 (commit 08b3698) but doesn't move when gstack ships features
that depend on newer gbrain ops or schema. Fresh /setup-gbrain on
v1.26.x lands users on schema 24 with v1.26 features expecting 32+.
Captured for a future fix-wave.

P3 — Codex P1.3 from the v1.26.5.0 plan review: deriveCodeSourceId
drops the host segment to fit gbrain's 32-char source-id budget,
which means github.com/acme/foo and gitlab.com/acme/foo collapse to
the same source id. Real but rare; PR #1330 author explicitly
considered this and chose budget over cross-host uniqueness. Captured
as a long-tail concern.

---------

Co-authored-by: Joshua Smith <joshualowellsmith@gmail.com>
Co-authored-by: Richard Dubach <radubach@gmail.com>
Co-authored-by: Alex Medina <oficina@puntoverdemc.com>
Author: Garry Tan
Date: 2026-05-06 17:51:36 -07:00
Committed by: GitHub
Parent: 19e699ab9b
Commit: c7aefc1abd
7 changed files with 428 additions and 26 deletions
+41
@@ -1,5 +1,46 @@
# Changelog
## [1.26.5.0] - 2026-05-06
## **The v1.26 memory feature now actually works on a fresh `/setup-gbrain` install, and `/sync-gbrain --full` actually registers github-hosted code sources.**
Two fix-wave bugs closed in one ship. Until this version, the headline v1.26 features ended setup green but did nothing: every transcript page failed `Unknown command: put_page`, and every `github.com/<org>/<repo>` repo got rejected for an invalid source id. After upgrade, clean-install transcripts land in gbrain with title/type/tags intact, and any github-hosted repo registers a code source on the first try.
### The numbers that matter
Both numbers come from running the binaries against the real gbrain v0.25.1 install on this machine, against `origin/main` first (buggy) and the merged branch second.
| Surface | Before (v1.26.4.0) | After (v1.26.5.0) | Δ |
|---|---|---|---|
| Memory-ingest writer verb | `gbrain put_page --slug ... --title ...` (CLI rejects: `Unknown command`) | `gbrain put <slug>` with frontmatter (CLI accepts) | from 100% fail to 0% fail |
| Transcript pages with title/type/tags | none — fields rode CLI flags that no gbrain version accepts | injected into existing frontmatter on every page | search/filter by `--type transcript` actually returns results now |
| Source id derived for `github.com/garrytan/gstack` | `gstack-code-github.com-garrytan-gstack` (38 chars, contains `.`, fails gbrain `[a-z0-9-]{1,32}` validator) | `gstack-code-garrytan-gstack` (27 chars, valid) | 100% of github-hosted repos go from rejected to accepted |
| Availability probe failure mode | every page errors with `Unknown command: put_page` | one clean error: `gbrain CLI not in PATH or missing put subcommand` | log spam goes from N copies to 1 |
| `gbrainPutPage()` timeout | 30 s (auto-link reconciliation hits 30 s on dense brains) | 60 s | brains with hundreds of existing pages stop hitting the ceiling on every put |
| `gbrainPutPage()` error surface | `Command failed:` (Node's default 1 MB maxBuffer truncates stderr) | first 300 chars of `err.stderr` | debugging stops requiring strace; the failure is visible |
The `gbrain put` verb has existed since v0.18.2 and was always the right CLI surface. The `put_page` shape was the MCP tool name leaking into the CLI path. The hybrid writer now handles both transcript pages (existing frontmatter from `buildTranscriptPage`, inject title/type/tags into it) and raw artifact pages (no frontmatter, wrap with new frontmatter).
### What this means for new users
Run `/setup-gbrain` on a clean install, choose any path, and Step 7.5 actually populates the brain with your transcripts plus their metadata. Run `/sync-gbrain --full` on any github-hosted repo and the code stage registers the source instead of failing the `sources add` validator. The headline v1.26 features finally do the thing they shipped to do.
### Itemized changes
#### Fixed
- `bin/gstack-memory-ingest.ts:gbrainPutPage` — switched the writer from the legacy flag-based `gbrain put_page --slug X --title Y --type Z --tags T` form to the CLI surface `gbrain put <slug>` (positional slug, content via stdin, metadata in YAML frontmatter). Two-branch hybrid: when the page body already starts with frontmatter (transcript pages from `buildTranscriptPage`, which prepends agent/session_id/cwd/git_remote/etc. but no title/type/tags), inject title/type/tags into the existing block before the closing `---`. When the body has no frontmatter (raw artifact pages: design-docs, learnings, builder-profile-entries), wrap with a fresh frontmatter carrying the same fields. Either branch produces a page that gbrain's pages list, search, and tag filters actually surface. Contributed by @smithjoshua (PR #1328: base writer + 60 s timeout + 16 MB maxBuffer + stderr first-line surface) and the artifact-wrap branch added on top here.
- `bin/gstack-memory-ingest.ts:gbrainAvailable` — adds a `gbrain --help` probe with a regex anchored on the indented subcommand format (`/^\s+put\s/m`). Replaces the previous `command -v` only check. If a future gbrain renames or removes `put`, the writer fails fast with one clean error per ingest pass instead of N copies of `Unknown command: put_page`. Contributed by @AZ-1224 (PR #1341: probe origin); regex tightening added on top here per Codex P2 plan-review feedback.
- `bin/gstack-gbrain-sync.ts:deriveCodeSourceId` — drops the host segment from canonical remote URLs (the same `github.com-` prefix on every user's id was eating 12 chars of the 32-char gbrain budget for nothing) and falls back to a 6-char sha1 hash on the slug tail when org/repo names still exceed the limit. Every `github.com/<org>/<repo>` derives a gbrain-valid id on the first try. Contributed by @radubach (PR #1330).
- `bin/gstack-gbrain-sync.ts:constrainSourceId` — handles the empty-slug edge case (input sanitizes to all non-alnum chars). Pre-fix the function returned `${prefix}-` which fails gbrain's validator on the trailing hyphen; now falls back to a deterministic sha1-prefixed id. Surfaced via the new `basename-sanitizes-to-empty` regression test added in this version per Codex plan-review.
#### Added
- `test/gstack-memory-ingest.test.ts` — two regression tests stand up a fake `gbrain` shim on PATH and run the real `--bulk` ingest pipeline against a planted Claude Code session. The first asserts the writer hits `gbrain put <slug>` (not `put_page`) and that title, type, AND tags arrive in the put stdin. The second points the writer at a legacy-only shim and asserts the availability probe surfaces a single missing-subcommand error instead of N per-page failures. Contributed by @AZ-1224 (PR #1341); the assertions for title/type/tags arriving in stdin are added on top here. The strengthened test surfaced a deeper issue in PR #1328's inject branch: it searched for `\n---\n` (with trailing newline) but `buildTranscriptPage` joins frontmatter without a trailing newline, so the search never matched. Two-line fix on top: search for `\n---` only.
- `test/gstack-gbrain-sync.test.ts` — four cases from PR #1330 (dot-host, SCP-style remote, multi-dot host, long org/repo forcing hash-truncate) plus two new edge cases this version (no-origin fallback path; basename-sanitizes-to-empty). Each test spawns the CLI inside a temp git repo and asserts the derived id passes gbrain's validator regex. Contributed by @radubach for the four core cases.
#### For contributors
- Codex outside-voice plan review caught three P1 ship-blockers in the originally proposed merge (the no-frontmatter-wrap branch from PR #1341 alone would have silently dropped title/type/tags from every transcript page — its own tests passed because they only asserted `agent: claude-code`). The plan pivoted from `merge #1341 + cherry-pick from #1328` to `merge #1328 + hybrid writer + cherry-pick #1341's tests, strengthened`. Two-pass live smoke against real gbrain (where the database connects) confirmed source-id length goes 38 → 27 chars; memory-ingest writer correctness was verified by the strengthened shim tests against a real `gbrain` CLI process.
- Two follow-up TODOs filed: P2 to bump the `bin/gstack-gbrain-install` pin in lockstep with gstack memory-feature releases (issue #1305 part 2), P3 to handle source-id cross-host collisions (`github.com/acme/foo` and `gitlab.com/acme/foo` currently collapse to the same id; rare but silent).
## [1.26.4.0] - 2026-05-05
## **`/autoplan` review reports now reliably land at the bottom of the plan, even when an older copy lives mid-file.**
+36
@@ -88,6 +88,42 @@
---
### P2: Bump gbrain install-pin in lockstep with gstack memory-feature releases (#1305 part 2)
**What:** `bin/gstack-gbrain-install` pins gbrain to commit `08b3698` (v0.18.2). When gstack ships features that depend on newer gbrain ops or schema (e.g. v1.26.0 manifests + `code-def`/`code-refs`/`reindex-code`), the pin doesn't move with it. Fresh `/setup-gbrain` installs an old gbrain that fails `gbrain doctor` schema_version checks (24 vs latest 32+) until the user manually upgrades.
**Why:** Filed in #1305 alongside the `put_page` CLI bug. Out of scope for the v1.26.5.0 fix wave (separate release-coordination concern: which gbrain version we install vs. how we call it). The install-pin should either (a) auto-bump whenever gstack releases features that need newer gbrain, or (b) detect a stale pin during preamble and either auto-upgrade gbrain or print a one-line FIX hint.
**Pros:** Closes the "fresh-install paper-cut" path. New users land on a healthy schema. Reduces support noise on `/setup-gbrain` flows. Makes the gstack/gbrain release contract visible.
**Cons:** Adds release-cadence coupling between gstack and gbrain. Needs a policy: pin = "minimum version that still works" vs "latest known good." If gbrain ships a breaking change to `put` shape and gstack doesn't update the pin, fresh installs break in a new way.
**Context:** Issue #1305 part 1 (the `put_page` CLI verb bug) was handled in v1.26.5.0. Part 2 (this TODO) is the install-pin staleness. Pin lives in `bin/gstack-gbrain-install` near the top as a constant. Easiest minimal fix: ship the pin as a tracked release artifact (e.g. write it from `package.json` at build time) and add a doctor-style preamble check.
**Effort:** S (human: ~2 days / CC: ~3 hours)
**Priority:** P2
**Depends on:** Nothing.
---
### P3: Source-id host-collision risk in `deriveCodeSourceId` (cross-host duplicate org/repo)
**What:** v1.26.5.0's `deriveCodeSourceId` drops the host segment to fit gbrain's 32-char source-id budget. This means `github.com/acme/foo` and `gitlab.com/acme/foo` collapse to the same `gstack-code-acme-foo`. `ensureSourceRegisteredSync()` in `bin/gstack-gbrain-sync.ts:323` will silently re-register the source when `local_path` differs, evicting one side.
**Why:** Vanishingly rare in practice — same `<org>/<repo>` shape across both github.com and gitlab.com on the same machine almost never happens. But the failure mode is silent (one repo evicts the other in the brain), and the user has no signal anything is wrong.
**Pros:** Closes the silent-eviction edge. Two viable approaches: short host marker (`gh-` / `gl-` / `bb-`) eats 3 chars but keeps cross-host uniqueness; OR include a 3-char hash of the host alongside the org-repo.
**Cons:** Source IDs change shape again — anyone with existing registrations on v1.26.5.0 gets a one-time re-register. Net break-even because the current scheme also changed from v1.26.4.0.
**Context:** Filed in #1320 / #1322 / #1323 / #1331 (the underlying source-id validation bugs), addressed in v1.26.5.0 by dropping host segment + hash-truncating. Cross-host collision was a known accepted tradeoff in PR #1330's design ("vanishingly rare in practice"). Codex outside-voice plan review surfaced it as a long-tail concern; this TODO captures it for a future bump.
**Effort:** XS (human: ~4 hours / CC: ~30 min)
**Priority:** P3
**Depends on:** Nothing.
---
### P3: GBrain skillpack publishing for domain skills
**What:** Domain skills are agent-authored notes per hostname. Right now they're per-machine or per-agent-repo. The natural compounding extension: publish curated skill packs to GBrain (`gstack-brain-sync`) so others can subscribe. "Louise's LinkedIn skills" or "Garry's GitHub skills" become packs anyone can pull.
+1 -1
@@ -1 +1 @@
-1.26.4.0
+1.26.5.0
+39 -7
@@ -33,6 +33,7 @@ import { existsSync, statSync, mkdirSync, writeFileSync, readFileSync, unlinkSyn
import { join, dirname } from "path";
import { execSync, execFileSync, spawnSync } from "child_process";
import { homedir } from "os";
+import { createHash } from "crypto";
import { detectEngineTier, withErrorContext, canonicalizeRemote } from "../lib/gstack-memory-helpers";
import { sourcePageCount } from "../lib/gbrain-sources";
@@ -158,20 +159,51 @@ function originUrl(): string | null {
}
/**
- * Derive a stable source id for the cwd code corpus. Pattern: `gstack-code-<slug>`,
- * where <slug> comes from canonicalizeRemote() then `/` → `-` (e.g.,
- * `github.com/garrytan/gstack` → `gstack-code-github-com-garrytan-gstack`).
+ * Derive a stable source id for the cwd code corpus. Pattern: `gstack-code-<slug>`.
  *
- * Falls back to `gstack-code-<basename(repo)>` when there is no origin (local repo).
+ * gbrain enforces source ids to be 1-32 lowercase alnum chars with optional interior
+ * hyphens. We use the last two segments of the canonical remote (org/repo) and skip
+ * the host — `github.com` etc. is the same for nearly every user and just eats budget.
+ * If the resulting id still exceeds 32 chars, we keep the tail (most distinctive end)
+ * and append a 6-char hash of the full slug for collision resistance.
+ *
+ * Falls back to the repo basename when there is no origin (local repo).
  */
function deriveCodeSourceId(repoPath: string): string {
  const remote = canonicalizeRemote(originUrl());
  if (remote) {
-   return `gstack-code-${remote.replace(/[\/\s]+/g, "-").replace(/-+/g, "-")}`;
+   const segs = remote.split("/").filter(Boolean);
+   const slugSource = segs.slice(-2).join("-");
+   return constrainSourceId("gstack-code", slugSource);
  }
  // Fallback for repos without a remote.
  const base = repoPath.split("/").pop() || "repo";
- return `gstack-code-${base.toLowerCase().replace(/[^a-z0-9-]+/g, "-").replace(/-+/g, "-")}`;
+ return constrainSourceId("gstack-code", base);
}
/**
* Build a gbrain-valid source id (1-32 lowercase alnum + interior hyphens). Sanitizes
* `raw`, prefixes with `prefix`, and falls back to a hashed-tail form when total length
* would exceed 32 chars.
*/
function constrainSourceId(prefix: string, raw: string): string {
const MAX = 32;
const slug = raw.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
// Empty slug after sanitize (e.g. raw was all non-alnum like "___") would
// produce "${prefix}-" which fails gbrain's validator on the trailing
// hyphen. Fall back to a deterministic hash of the original input so the
// result is stable across runs of the same repo.
if (!slug) {
const hash = createHash("sha1").update(raw || "_empty").digest("hex").slice(0, 6);
return `${prefix}-${hash}`;
}
const full = `${prefix}-${slug}`;
if (full.length <= MAX) return full;
const hash = createHash("sha1").update(slug).digest("hex").slice(0, 6);
// Total budget: prefix + "-" + tail + "-" + hash
const tailBudget = MAX - prefix.length - 2 - hash.length;
if (tailBudget < 1) return `${prefix}-${hash}`;
const tail = slug.slice(-tailBudget).replace(/^-+|-+$/g, "");
return tail ? `${prefix}-${tail}-${hash}` : `${prefix}-${hash}`;
}
function gbrainAvailable(): boolean {
+67 -17
@@ -34,8 +34,9 @@
* keep V1 ship-tight. See TODOS.md.
*
 * V1.5 NOTE: When `gbrain put_file` ships in the gbrain CLI (cross-repo P0 TODO),
- * transcripts will route to Supabase Storage instead of put_page. Until then, all
- * content rides put_page; gbrain's native dedup keys on session_id.
+ * transcripts will route to Supabase Storage instead of the page-write path.
+ * Until then, all content rides `gbrain put <slug>` (stdin, YAML frontmatter for
+ * title/type/tags); gbrain's native dedup keys on session_id.
*/
import {
@@ -745,14 +746,25 @@ function buildArtifactPage(path: string, type: MemoryType): PageRecord {
};
}
-// ── Writer (calls gbrain put_page) ─────────────────────────────────────────
+// ── Writer (calls `gbrain put`) ────────────────────────────────────────────
let _gbrainAvailability: boolean | null = null;
function gbrainAvailable(): boolean {
if (_gbrainAvailability !== null) return _gbrainAvailability;
  try {
-   execSync("command -v gbrain", { stdio: "ignore" });
-   _gbrainAvailability = true;
+   // gbrain v0.27 retired the legacy `put_page` flag-form for `put <slug>`
+   // (content via stdin, metadata as YAML frontmatter). Probe `--help` for
+   // the `put` subcommand so we surface a single clean error here rather
+   // than failing every page with "Unknown command: put_page". The regex
+   // anchors on the indented subcommand format gbrain's help actually uses
+   // (" put ..."), not any whitespace-bordered "put" word in prose.
+   const help = execFileSync("gbrain", ["--help"], {
+     encoding: "utf-8",
+     timeout: 5000,
+     stdio: ["ignore", "pipe", "pipe"],
+   });
+   _gbrainAvailability = /^\s+put\s/m.test(help);
  } catch {
    _gbrainAvailability = false;
  }
@@ -761,25 +773,63 @@ function gbrainAvailable(): boolean {
function gbrainPutPage(page: PageRecord): { ok: boolean; error?: string } {
if (!gbrainAvailable()) {
-   return { ok: false, error: "gbrain CLI not in PATH" };
+   return { ok: false, error: "gbrain CLI not in PATH or missing `put` subcommand" };
}
// gbrain v0.27+ uses `put <slug>` (positional, content via stdin) instead
// of the legacy `put_page` flag form. Metadata rides as YAML frontmatter:
// - When the page body already starts with frontmatter (transcripts), inject
// title/type/tags into the existing block so gbrain's frontmatter parser
// picks them up.
// - When the page body has no frontmatter (raw artifacts: design-docs,
// learnings, builder-profile-entries), wrap with a fresh frontmatter
// carrying the same fields. Without this branch, artifact pages would
// land in gbrain with empty title/type/tags.
let body = page.body;
if (body.startsWith("---\n")) {
// Locate the closing --- delimiter. buildTranscriptPage joins with "\n"
// and does not append a trailing newline, so the close fence looks like
// "...\n---" followed directly by body content (no "\n---\n" pattern).
// Match the close on "\n---" only — the inject lands BEFORE the close
// fence, inside the frontmatter block, regardless of what follows it.
const end = body.indexOf("\n---", 4);
if (end > 0) {
const inject = [
`title: ${JSON.stringify(page.title)}`,
`type: ${page.type}`,
`tags:`,
...page.tags.map((t) => ` - ${t}`),
].join("\n");
body = body.slice(0, end) + "\n" + inject + body.slice(end);
}
} else {
body = [
"---",
`title: ${JSON.stringify(page.title)}`,
`type: ${page.type}`,
`tags: [${page.tags.map((t) => JSON.stringify(t)).join(", ")}]`,
"---",
"",
body,
].join("\n");
}
  try {
-   const args = [
-     "put_page",
-     "--slug", page.slug,
-     "--title", page.title,
-     "--type", page.type,
-     "--tags", page.tags.join(","),
-   ];
-   execFileSync("gbrain", args, {
-     input: page.body,
+   execFileSync("gbrain", ["put", page.slug], {
+     input: body,
      encoding: "utf-8",
-     timeout: 30000,
+     // Bumped from 30s: auto-link reconciliation on dense transcripts hits
+     // 30s once the brain has a few hundred existing pages.
+     timeout: 60000,
+     // Bumped from default 1MB: without this, gbrain's actual stderr gets
+     // truncated and callers see only "Command failed:" with no detail.
+     maxBuffer: 16 * 1024 * 1024,
+     stdio: ["pipe", "pipe", "pipe"],
    });
return { ok: true };
- } catch (err) {
-   return { ok: false, error: err instanceof Error ? err.message : String(err) };
+ } catch (err: any) {
+   const stderr = err?.stderr?.toString?.() ?? "";
+   const stdout = err?.stdout?.toString?.() ?? "";
+   const detail = stderr || stdout || (err instanceof Error ? err.message : String(err));
+   return { ok: false, error: detail.split("\n")[0].slice(0, 300) };
  }
}
+102
@@ -108,6 +108,108 @@ describe("gstack-gbrain-sync CLI", () => {
rmSync(home, { recursive: true, force: true });
});
it("derived source ids are gbrain-valid (≤32 chars, alnum + interior hyphens, no dots) for any remote", () => {
// gbrain enforces source ids to be 1-32 lowercase alnum chars with optional interior
// hyphens. Pre-fix, the slug came from canonicalizeRemote() with only `/` and
// whitespace stripped — leaving dots from hostnames (`github.com`) and no length cap.
// For `github.com/<org>/<repo>`, the id was `gstack-code-github.com-<org>-<repo>`,
// which fails validation on both counts. This test exercises the derivation against
// controlled remotes by spawning the CLI in a temp git repo.
const cases = [
"https://github.com/radubach/platform.git", // dot in hostname, total > 32 with old slug
"git@github.com:garrytan/gstack.git", // SCP-style remote
"https://gitlab.example.com/team/proj.git", // multi-dot host, non-github
"https://github.com/some-very-long-org-name/some-very-long-repo-name.git", // forces hash-truncate
];
const VALID_ID = /^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$/;
for (const remote of cases) {
const home = makeTestHome();
const gstackHome = join(home, ".gstack");
mkdirSync(gstackHome, { recursive: true });
const repo = mkdtempSync(join(tmpdir(), "gstack-source-id-repo-"));
spawnSync("git", ["init", "--quiet", "-b", "main"], { cwd: repo });
spawnSync("git", ["remote", "add", "origin", remote], { cwd: repo });
const r = spawnSync("bun", [SCRIPT, "--dry-run", "--code-only", "--quiet"], {
encoding: "utf-8",
timeout: 60000,
cwd: repo,
env: { ...process.env, HOME: home, GSTACK_HOME: gstackHome },
});
expect(r.status).toBe(0);
const m = (r.stdout || "").match(/gbrain sources add (\S+)/);
expect(m).not.toBeNull();
const id = m![1];
expect(id.length).toBeLessThanOrEqual(32);
expect(id).toMatch(VALID_ID);
expect(id.startsWith("gstack-code-")).toBe(true);
rmSync(repo, { recursive: true, force: true });
rmSync(home, { recursive: true, force: true });
}
});
it("derives a gbrain-valid source id when the cwd repo has NO origin remote", () => {
// Fallback path in deriveCodeSourceId(): no `origin` remote configured,
// so the slug comes from the repo basename. The fallback must still
// produce a gbrain-valid id (no dots, ≤32 chars, no trailing hyphen).
const home = makeTestHome();
const gstackHome = join(home, ".gstack");
mkdirSync(gstackHome, { recursive: true });
const repo = mkdtempSync(join(tmpdir(), "gstack-no-origin-"));
spawnSync("git", ["init", "--quiet", "-b", "main"], { cwd: repo });
// No `git remote add origin` — this is the no-remote case.
const r = spawnSync("bun", [SCRIPT, "--dry-run", "--code-only", "--quiet"], {
encoding: "utf-8",
timeout: 60000,
cwd: repo,
env: { ...process.env, HOME: home, GSTACK_HOME: gstackHome },
});
expect(r.status).toBe(0);
const m = (r.stdout || "").match(/gbrain sources add (\S+)/);
expect(m).not.toBeNull();
const id = m![1];
expect(id.startsWith("gstack-code-")).toBe(true);
expect(id.length).toBeLessThanOrEqual(32);
expect(id).toMatch(/^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$/);
rmSync(repo, { recursive: true, force: true });
rmSync(home, { recursive: true, force: true });
});
it("derives a gbrain-valid source id when the basename sanitizes to empty", () => {
// Pathological edge: a repo whose basename is all non-alnum (e.g. "___")
// sanitizes to an empty slug. Pre-fix, constrainSourceId returned
// "gstack-code-" — invalid per the gbrain validator on the trailing
// hyphen. Fix falls back to a deterministic hash of the original input.
const home = makeTestHome();
const gstackHome = join(home, ".gstack");
mkdirSync(gstackHome, { recursive: true });
const parent = mkdtempSync(join(tmpdir(), "gstack-empty-base-"));
const repo = join(parent, "___");
mkdirSync(repo);
spawnSync("git", ["init", "--quiet", "-b", "main"], { cwd: repo });
// No `origin` remote — forces the basename-fallback path.
const r = spawnSync("bun", [SCRIPT, "--dry-run", "--code-only", "--quiet"], {
encoding: "utf-8",
timeout: 60000,
cwd: repo,
env: { ...process.env, HOME: home, GSTACK_HOME: gstackHome },
});
expect(r.status).toBe(0);
const m = (r.stdout || "").match(/gbrain sources add (\S+)/);
expect(m).not.toBeNull();
const id = m![1];
// Expect hash-only fallback shape: gstack-code-<6 hex chars>
expect(id).toMatch(/^gstack-code-[0-9a-f]{6}$/);
expect(id.length).toBeLessThanOrEqual(32);
rmSync(parent, { recursive: true, force: true });
rmSync(home, { recursive: true, force: true });
});
it("dry-run does NOT acquire the lock file (lock is for write paths only)", () => {
const home = makeTestHome();
const gstackHome = join(home, ".gstack");
+142 -1
@@ -15,7 +15,7 @@
*/
import { describe, it, expect, beforeEach, afterEach } from "bun:test";
-import { mkdtempSync, writeFileSync, readFileSync, existsSync, rmSync, mkdirSync, statSync } from "fs";
+import { mkdtempSync, writeFileSync, readFileSync, existsSync, rmSync, mkdirSync, statSync, chmodSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";
import { spawnSync } from "child_process";
@@ -265,3 +265,144 @@ describe("gstack-memory-ingest --limit", () => {
expect(r.stderr).toContain("--limit requires a positive integer");
});
});
// ── Writer regression: gbrain v0.27+ uses `put`, not `put_page` ───────────
/**
 * Stand up a fake `gbrain` shim on PATH that:
 * - advertises `put` in `--help` output (so gbrainAvailable() passes)
 * - records `put <slug>` invocations + their stdin to a log
 * - rejects `put_page` with a non-zero exit, mimicking real gbrain v0.27+
 *
 * If the writer ever regresses to the legacy `put_page` form, the bulk pass
 * will report 0 writes and the assertion on `Wrote: 1` will fail loudly.
 */
function installFakeGbrain(home: string): { binDir: string; logFile: string; stdinFile: string } {
  const binDir = join(home, "fake-bin");
  mkdirSync(binDir, { recursive: true });
  const logFile = join(home, "gbrain-calls.log");
  const stdinFile = join(home, "gbrain-stdin.log");
  const script = `#!/usr/bin/env bash
set -euo pipefail
LOG="${logFile}"
STDIN_LOG="${stdinFile}"
case "\${1:-}" in
--help|-h)
cat <<EOF
Usage: gbrain <command> [options]
Commands:
put <slug> Write a page (content via stdin, YAML frontmatter for metadata)
search <query> Keyword search across pages
ask <question> Hybrid semantic + keyword query
EOF
exit 0
;;
put)
if [ "\${2:-}" = "--help" ]; then
echo "Usage: gbrain put <slug>"
exit 0
fi
echo "put \${2:-}" >> "\$LOG"
{
echo "--- slug=\${2:-} ---"
cat
echo
} >> "\$STDIN_LOG"
exit 0
;;
put_page|put-page)
echo "Unknown command: \$1" >&2
exit 2
;;
*)
echo "Unknown command: \${1:-<empty>}" >&2
exit 2
;;
esac
`;
  const binPath = join(binDir, "gbrain");
  writeFileSync(binPath, script, "utf-8");
  chmodSync(binPath, 0o755);
  return { binDir, logFile, stdinFile };
}
describe("gstack-memory-ingest writer (gbrain v0.27+ `put` interface)", () => {
  it("invokes `gbrain put <slug>` with stdin body, not legacy `put_page`", () => {
    const home = makeTestHome();
    const gstackHome = join(home, ".gstack");
    mkdirSync(gstackHome, { recursive: true });
    const { binDir, logFile, stdinFile } = installFakeGbrain(home);
    // Single Claude Code session fixture. --include-unattributed lets it write
    // even though there's no resolvable git remote in /tmp.
    const session =
      `{"type":"user","message":{"role":"user","content":"hi"},"timestamp":"2026-05-01T00:00:00Z","cwd":"/tmp/foo"}\n` +
      `{"type":"assistant","message":{"role":"assistant","content":"hello"},"timestamp":"2026-05-01T00:00:01Z"}\n`;
    writeClaudeCodeSession(home, "tmp-foo", "abc123", session);
    const r = runScript(["--bulk", "--include-unattributed", "--quiet"], {
      HOME: home,
      GSTACK_HOME: gstackHome,
      PATH: `${binDir}:${process.env.PATH || ""}`,
    });
    expect(r.exitCode).toBe(0);
    expect(existsSync(logFile)).toBe(true);
    const calls = readFileSync(logFile, "utf-8");
    expect(calls).toContain("put ");
    expect(calls).not.toContain("put_page");
    // Body should ride stdin and carry frontmatter that gbrain can parse.
    // The transcript builder prepends its own frontmatter (agent, session_id,
    // etc.) but does NOT include title/type/tags — the writer injects those
    // into the existing frontmatter so gbrain pages list/search/filter
    // actually surface the page. Asserting all three guards against the
    // exact regression that landed in v1.26.0.0 (writer ignored these fields
    // entirely; pages landed empty-titled, un-typed, un-tagged).
    const stdin = readFileSync(stdinFile, "utf-8");
    expect(stdin).toContain("---");
    expect(stdin).toMatch(/agent:\s+claude-code/);
    expect(stdin).toMatch(/title:\s/);
    expect(stdin).toMatch(/type:\s+transcript/);
    expect(stdin).toMatch(/tags:/);
    rmSync(home, { recursive: true, force: true });
  });
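The frontmatter-injection behavior asserted above can be sketched roughly as follows. This is a hypothetical reconstruction, not the ingest script's actual helper: the function name and key formatting are assumptions; only the idea (keep the builder's existing keys, splice in title/type/tags) comes from the test and commit description.

```typescript
// Hypothetical sketch of the hybrid-frontmatter merge performed before the
// page body is piped to `gbrain put <slug>`: preserve the builder's keys
// (agent, session_id, ...) and inject title/type/tags at the top of the
// existing `---` block, or wrap the body in a fresh block if there is none.
function injectFrontmatter(
  page: string,
  extra: { title: string; type: string; tags: string[] }
): string {
  const injected = [
    `title: ${extra.title}`,
    `type: ${extra.type}`,
    `tags: [${extra.tags.join(", ")}]`,
  ].join("\n");
  if (page.startsWith("---\n")) {
    // Splice into the existing frontmatter block (first "---" only).
    return page.replace("---\n", `---\n${injected}\n`);
  }
  return `---\n${injected}\n---\n\n${page}`;
}
```

With a builder page opening `---\nagent: claude-code\n...`, the result carries all of `title:`, `type:`, and `tags:` alongside the original keys, which is exactly what the stdin assertions above check for.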
  it("fails fast when gbrain CLI is missing the `put` subcommand", () => {
    const home = makeTestHome();
    const gstackHome = join(home, ".gstack");
    mkdirSync(gstackHome, { recursive: true });
    // Fake gbrain that ONLY advertises legacy `put_page` (no `put`).
    const binDir = join(home, "legacy-bin");
    mkdirSync(binDir, { recursive: true });
    const script = `#!/usr/bin/env bash
case "\${1:-}" in
--help|-h) echo "Commands:"; echo " put_page Write a page (legacy)"; exit 0 ;;
*) echo "Unknown command: \$1" >&2; exit 2 ;;
esac
`;
    const binPath = join(binDir, "gbrain");
    writeFileSync(binPath, script, "utf-8");
    chmodSync(binPath, 0o755);
    const session =
      `{"type":"user","message":{"role":"user","content":"hi"},"timestamp":"2026-05-01T00:00:00Z","cwd":"/tmp/bar"}\n`;
    writeClaudeCodeSession(home, "tmp-bar", "def456", session);
    const r = runScript(["--bulk", "--include-unattributed"], {
      HOME: home,
      GSTACK_HOME: gstackHome,
      PATH: `${binDir}:${process.env.PATH || ""}`,
    });
    // Bulk completes (the script is per-page tolerant), but every page
    // surfaces the missing-`put` error rather than the old "Unknown command".
    expect(r.stderr + r.stdout).toMatch(/missing `put` subcommand|gbrain CLI not in PATH/);
    rmSync(home, { recursive: true, force: true });
  });
});
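The fail-fast path these two tests exercise amounts to a capability probe over `gbrain --help`. A minimal sketch, under stated assumptions: the function name `gbrainSupportsPut` is hypothetical (the real check lives in gstack-memory-ingest), though the two error strings match what the second test asserts on.

```typescript
import { spawnSync } from "child_process";

// Hypothetical sketch of the writer's preflight: run `gbrain --help`,
// distinguish "CLI not installed" from "CLI too old to have `put`".
// `put\b` deliberately does NOT match `put_page` ("_" is a word char),
// so a legacy-only help listing fails the probe.
function gbrainSupportsPut(): { ok: boolean; error?: string } {
  const r = spawnSync("gbrain", ["--help"], { encoding: "utf-8", timeout: 10000 });
  if (r.error || r.status !== 0) {
    return { ok: false, error: "gbrain CLI not in PATH" };
  }
  if (!/^\s*put\b/m.test(r.stdout || "")) {
    return { ok: false, error: "gbrain CLI is missing `put` subcommand" };
  }
  return { ok: true };
}
```

This is why the shim's `--help` output advertises `put <slug>` in the happy-path test, while the legacy shim (advertising only `put_page`) trips the missing-subcommand branch.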