# gstack memory ingest — what it does, what stays local, what you can do with it This is the user-facing reference for the V1 transcript + memory ingest feature in `/setup-gbrain`. If you ran `/setup-gbrain` and it asked "Ingest THIS repo's transcripts into gbrain?", this doc explains what happens after you say yes. ## What gets ingested | Source | Type | Where | Sensitivity | |---|---|---|---| | Claude Code session JSONL | `transcript` | `~/.claude/projects/*/` | High — full conversations including tool I/O | | Codex CLI session JSONL | `transcript` | `~/.codex/sessions/YYYY/MM/DD/` | High | | Cursor session SQLite (V1.0.1) | `transcript` | `~/Library/Application Support/Cursor/` | Same — deferred V1.0.1 | | Eureka log | `eureka` | `~/.gstack/analytics/eureka.jsonl` | Medium — your insights, often non-secret | | Project learnings | `learning` | `~/.gstack/projects//learnings.jsonl` | Medium | | Project timeline | `timeline` | `~/.gstack/projects//timeline.jsonl` | Low | | CEO plans | `ceo-plan` | `~/.gstack/projects//ceo-plans/*.md` | Medium | | Design docs | `design-doc` | `~/.gstack/projects//*-design-*.md` | Medium | | Retros | `retro` | `~/.gstack/projects//retros/*.md` | Medium | | Builder profile | `builder-profile-entry` | `~/.gstack/builder-profile.jsonl` | Low | ## What stays local - **State files** (`~/.gstack/.gbrain-sync-state.json`, `~/.gstack/.transcript-ingest-state.json`, `~/.gstack/.gbrain-engine-cache.json`, `~/.gstack/.gbrain-errors.jsonl`) are local-only per ED1 (state file sync semantics decision). They are not synced via the brain remote. - **Sessions with no resolvable git remote** (running in `/tmp/`, scratch dirs, etc.) are skipped by default. Pass `--include-unattributed` to the ingest helper to opt them in. - **Repos under a `deny` trust policy** (set in `/setup-gbrain` Step 6) are skipped — neither code nor transcripts from those repos ingest. ## What gets scanned for secrets Every ingested page passes through **gitleaks** before write (per D19 — replaces the regex scanner that previously ran only on staged git diffs). Gitleaks is industry-standard, covers: - AWS / GCP / Azure access keys - ANTHROPIC_API_KEY, OPENAI_API_KEY, GitHub tokens - Stripe keys, Slack tokens, JWT secrets - Generic high-entropy strings (configurable threshold) A session with a positive finding is **skipped entirely** — not partially redacted. The match line + rule ID are logged to stderr; you can see what was skipped via `bun run bin/gstack-memory-ingest.ts --probe` (which shows new vs. updated counts) or by reviewing the helper's output during `/gbrain-sync --full`. If gitleaks is not installed (run `brew install gitleaks` on macOS, or `apt install gitleaks` on Linux), the helper warns once and disables secret scanning. **In that mode, transcripts ingest unscanned. Don't run ingest without gitleaks if you have any concern about secrets in your sessions.** ## Where it goes Storage tier depends on your gbrain engine (set during `/setup-gbrain`): - **Supabase configured:** code + transcripts go to Supabase Storage (multi-Mac native). Curated memory (eureka/learnings/etc.) goes to the brain-linked git repo via `gstack-brain-sync`. - **Local PGLite only:** everything stays on this Mac. Curated memory syncs via git if you've enabled brain-sync. The "never double-store" rule per the plan: code and transcripts NEVER go in the gbrain-linked git repo. They're too big and they're replaceable from disk on each Mac. ## What you can do with it - **Query in natural language:** ```bash gbrain query "what was I doing on the auth migration" gbrain search "session_id:abc123" ``` - **Browse by type:** ```bash gbrain list_pages --type transcript --limit 10 gbrain list_pages --type ceo-plan ``` - **Read a specific page:** ```bash gbrain get_page transcripts/claude-code/garrytan-gstack/2026-05-01-abc123 ``` - **Delete a page:** ```bash gbrain delete_page ``` Caveat: with brain-sync enabled, the page is removed from gbrain's index but git history retains it. For hard-delete, run `git filter-repo` on the brain remote. - **Bulk-delete by criteria** (V1.0.1 follow-up — `gstack-transcript-prune` helper). For V1.0, use `gbrain delete_page ` per-page or write a small loop over `gbrain list_pages` output. - **Disable entirely:** ```bash gstack-config set transcript_ingest_mode off gstack-config set gbrain_context_load off # also disables retrieval ``` ## How the agent uses it At every gstack skill start, the preamble runs `gstack-brain-context-load` which: 1. Reads the active skill's `gbrain.context_queries:` frontmatter 2. Dispatches each query to gbrain (vector / list / filesystem) 3. Renders results into `## ` sections wrapped in `` envelopes 4. The model sees this as part of the preamble before making any decisions For example, when you run `/office-hours`, the model context automatically includes: - `## Prior office-hours sessions in this repo` (last 5) - `## Your builder profile snapshot` (latest entry) - `## Recent design docs for this project` (last 3) - `## Recent eureka moments` (last 5) So the "Welcome back, last time you were on X" beat is sourced from your actual data, not cold-start. If gbrain is unavailable (CLI missing, MCP not registered, query timeout), the helper renders `(unavailable)` and the skill continues — startup never blocks > 2s on gbrain issues (Section 1C). ## What to do when something feels off Run `/setup-gbrain` again. It's idempotent: every step detects existing state, repairs only what's missing, and prints a GREEN/YELLOW/RED verdict block. If a row is RED, the row tells you what to do. Common cases: - **Salience block is empty** — your transcripts may not be ingested yet. Run `gstack-gbrain-sync --full` to do a full pass. - **"gbrain CLI missing" in the preamble output** — gbrain isn't on your PATH. Run `/setup-gbrain` to install/wire it. - **PGLite engine corrupt (V1.5)** — V1.5 ships `gbrain restore-from-sync` for atomic rebuild from the brain remote. For V1.0, manual recovery: `cd ~/.gbrain && rm -rf db && gbrain init --pglite && gbrain import `. - **A page has stale or wrong content** — `gbrain delete_page `, then re-run `gstack-gbrain-sync --incremental` to re-ingest from source if the source file is still on disk and unchanged. ## Privacy + audit - Every `secretScanFile` finding is logged to stderr at ingest time. - Every gbrain put/delete is logged to `~/.gstack/.gbrain-errors.jsonl` with `{ts, op, duration_ms, outcome}` for forensic tracing. - `~/.gstack/.gbrain-engine-cache.json` shows which storage tier is active (PGLite vs Supabase). - Brain-sync git history shows every curated artifact push with the user's git identity. If you find a transcript page that contains a secret gitleaks missed, the recovery path is: 1. `gbrain delete_page ` — removes from index immediately 2. Rotate the secret (rotate it anyway as a defensive measure) 3. If brain-sync is on: `git filter-repo --invert-paths --path ` on the brain remote for hard-delete from history 4. File a gitleaks issue with the pattern (or extend the gitleaks config at `~/.gitleaks.toml`).