v1.33.0.0 docs: design doc, P2 perf TODOs, gbrain guidance block, changelog

docs/designs/SYNC_GBRAIN_BATCH_INGEST.md: full design doc with the 8
decisions (D1-D8), source-verified gbrain behaviors (content_hash
idempotency, frontmatter parity, path-authoritative slug, per-file
failure surface), measured performance vs plan target, F9 hash
migration one-time cliff note, and follow-up TODOs.

CLAUDE.md: append `## GBrain Search Guidance` block from /sync-gbrain
indicating this worktree's pin and how the agent should prefer gbrain
search over Grep for semantic queries.

TODOS.md: P2 `gbrain import` perf-on-large-staging-dirs investigation
(5,131 files take >10 min in gbrain when 501 take 10s; likely N+1
SQL or auto-link reconciliation). P3 cache-no-changes-since-last-import
at the prepare-batch level for true no-op fast paths.

VERSION + package.json: bump to 1.33.0.0 (queue-aware via
bin/gstack-next-version — skipped v1.32.0.0 which is claimed by
sibling worktree garrytan/wellington / PR #1431).

CHANGELOG.md: v1.33.0.0 entry per the release-summary format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Garry Tan
2026-05-11 10:16:18 -07:00
parent 9d023c0410
commit 0d6511ad6a
6 changed files with 503 additions and 2 deletions
+61
@@ -1,5 +1,66 @@
# TODOS
## /sync-gbrain memory stage perf follow-up
### P2: Investigate `gbrain import` perf on large staging dirs
**What:** A cold run on a 5131-file staging dir spends >10 min in `gbrain import`
alone (gstack's prepare phase now finishes in <10s after dropping the per-file
gitleaks scan). On 501 files the import took 10s, so roughly 10x the files costs
at least 60x the time. The scaling is superlinear and the bottleneck is inside
gbrain, not the gstack orchestrator.
**Why:** With memory-ingest's prepare phase now fast, the remaining cold-run cost
is entirely on the gbrain side. Users with large corpora (5K+ files) currently pay
~15-30 min on first ingest. Likely culprits in `~/git/gbrain/src/core/import-file.ts`:
- N+1 SQL queries: `engine.getPage(slug)` runs once per file for the content_hash
check (lines 242 and 478); these should be batched into a single query
- Per-page auto-link reconciliation that fires even for unchanged content
- FTS / vector index updates without batching transactions
**Pros:** Lives in gbrain (cleaner separation). Fix in gbrain benefits other
gbrain callers too (`gbrain sync`, MCP `put_page` workflows). Likely 10-50x
speedup from batched queries alone.
**Cons:** Cross-repo change, requires gbrain test coverage for the new batched
path. Not on the gstack critical path; gstack's architecture is already correct.
**Context:** Verified on real corpus 2026-05-10. gstack-side prepare with
`--scan-secrets` off runs in <10s. The full gbrain import on the same staged
dir consumes 100% CPU for >10 min. Both observations come from
`bin/gstack-memory-ingest.ts:ingestPass` reaching the `runGbrainImport` call
quickly, with the child process then taking the bulk of the wall time.
**Depends on:** None — gstack's batch-ingest architecture (D1-D8 in
`docs/designs/SYNC_GBRAIN_BATCH_INGEST.md`) is already shipped and correct.
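A minimal sketch of the batched content_hash check suggested above, assuming the engine could expose a batch lookup. Every name here (`HashLookup`, `getContentHashes`, `filesNeedingImport`, `FakeEngine`) is hypothetical and is not gbrain's actual API; the in-memory engine exists only to count round trips.

```typescript
// Hypothetical sketch: none of these names are gbrain's real API.
type Slug = string;
type ContentHash = string;

interface StagedFile {
  slug: Slug;
  contentHash: ContentHash;
}

// Assumed batch lookup (e.g. one `WHERE slug IN (...)` query) that would
// replace a per-file getPage(slug) call.
interface HashLookup {
  getContentHashes(slugs: Slug[]): Promise<Map<Slug, ContentHash>>;
}

// Split a list into fixed-size chunks so each query stays under the
// database's bound-parameter limit.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Return only the files whose stored hash is missing or different.
async function filesNeedingImport(
  engine: HashLookup,
  staged: StagedFile[],
  batchSize = 500,
): Promise<StagedFile[]> {
  const changed: StagedFile[] = [];
  for (const batch of chunk(staged, batchSize)) {
    const known = await engine.getContentHashes(batch.map((f) => f.slug));
    for (const file of batch) {
      if (known.get(file.slug) !== file.contentHash) changed.push(file);
    }
  }
  return changed;
}

// In-memory engine that counts round trips, for illustration only.
class FakeEngine implements HashLookup {
  queries = 0;
  private readonly stored: Map<Slug, ContentHash>;

  constructor(stored: Map<Slug, ContentHash>) {
    this.stored = stored;
  }

  async getContentHashes(slugs: Slug[]): Promise<Map<Slug, ContentHash>> {
    this.queries += 1;
    const found = new Map<Slug, ContentHash>();
    for (const slug of slugs) {
      const hash = this.stored.get(slug);
      if (hash !== undefined) found.set(slug, hash);
    }
    return found;
  }
}
```

With a batch size of 500, a 5131-file staging dir would need 11 lookups instead of 5131. Whether that alone recovers the expected speedup depends on the other suspected culprits listed above.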
---
### P3: Cache "no changes since last import" at the prepare-batch level
**What:** Even with the prepare phase fast (<10s for 5135 files), walking and
mtime-stat'ing every file on a true no-op run adds a few seconds and creates
spurious staging dirs. Cache the most-recent-source-mtime per-source in the
state file; if no source dir has a newer mtime, skip the walk + stage + import
entirely.
**Why:** Most `/sync-gbrain` invocations have nothing new to ingest. The
fastest path is "do nothing, fast." `gbrain doctor` should still report state,
but the actual ingest pipeline can short-circuit when last_full_walk is recent
and no source-tree mtime has moved.
**Pros:** Trivial implementation (~20 lines in `ingestPass`). Makes the
incremental fast-path actually live up to "<30s" in the original plan.
**Cons:** Adds a cache invalidation surface. If a user edits a file in place
without its parent dir's mtime updating (dir mtime tracks entry creation,
removal, and rename, not content writes; the miss is rare on macOS APFS),
changes get missed. Mitigation: only short-circuit when last_full_walk is
recent (e.g. <1 min ago).
**Context:** Filed during 2026-05-10 perf testing after `--scan-secrets` was
made opt-in. Lower priority than the gbrain-side perf issue above.
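A minimal sketch of the short-circuit decision described in this entry, written as a pure function so it can be tested without touching the filesystem. The state shape and all names (`IngestState`, `canSkipIngest`, `maxWalkAgeMs`) are assumptions for illustration, not gstack's actual state file format.

```typescript
// Hypothetical sketch: the state shape and names are assumptions.
interface IngestState {
  lastFullWalkMs: number; // epoch ms when the last full walk finished
  sourceMtimesMs: Record<string, number>; // newest mtime recorded per source dir
}

// Decide whether the walk + stage + import pipeline can be skipped.
function canSkipIngest(
  state: IngestState | undefined,
  currentMtimesMs: Record<string, number>,
  nowMs: number,
  maxWalkAgeMs: number,
): boolean {
  // No prior state means there has never been a full walk.
  if (!state) return false;

  // Mitigation from the entry: only trust the cache when the walk is recent.
  if (nowMs - state.lastFullWalkMs > maxWalkAgeMs) return false;

  // A source dir that was added or removed invalidates the cache.
  const current = Object.entries(currentMtimesMs);
  if (current.length !== Object.keys(state.sourceMtimesMs).length) return false;

  for (const [source, mtimeMs] of current) {
    const cached = state.sourceMtimesMs[source];
    if (cached === undefined) return false; // unknown source
    if (mtimeMs > cached) return false; // source changed since the last walk
  }
  return true;
}
```

A caller would gather `currentMtimesMs` with something like `fs.statSync(dir).mtimeMs` for each source dir, then run the full pipeline and rewrite the state whenever this returns `false`.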
---
## Browser-skills follow-on (Phases 2-4)
### P1: Browser-skills Phase 2 — `/scrape` and `/skillify` skill templates