mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-27 13:34:25 +02:00
v1.43.1.0 feat: default PGLite to voyage-code-3 for code search + e2e tests (#1639)
* docs: drop ~/.zshrc env note in favor of GSTACK_* env-shim reference
The CLAUDE.md "Where the keys live on this machine" block hand-rolled a
`grep ~/.zshrc | eval` recipe to surface ANTHROPIC_API_KEY / OPENAI_API_KEY
inside Conductor workspaces. That predates the GSTACK_* env-shim
(`lib/conductor-env-shim.ts`, v1.39.2.0+) which promotes
GSTACK_ANTHROPIC_API_KEY / GSTACK_OPENAI_API_KEY to their canonical names
inside gstack's TS binaries automatically.
The zshrc recipe is now an obsolete workaround. Replace with a short note
pointing at the env-shim as the canonical answer. Keep the Agent SDK
`env: {...}` gotcha (still real, unrelated to where the key comes from).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: default PGLite to voyage-code-3 when VOYAGE_API_KEY set
When gstack inits a local PGLite engine for code search, use Voyage's
code-specialized `voyage-code-3` (1024-dim) embedding model if
`VOYAGE_API_KEY` is present. Falls back to gbrain's auto-selected
provider chain (OpenAI text-embedding-3-large 1536-dim when
OPENAI_API_KEY is available, etc.) when the Voyage key is unset.
Why voyage-code-3: head-to-head A/B against voyage-4-large on 10
realistic code queries against this codebase (using gbrain query
--no-expand for pure vector retrieval). voyage-code-3 strictly won on
4 queries. Three were cases where the right hit was an implementation
file rather than a test file (terminal-agent.ts over
terminal-agent-integration.test.ts, sanitizeReplacer over
sanitize.test.ts, disposeSession over a tangentially related killDaemon
test); the fourth was a semantic query where it surfaced injectCanary.
It tied on 5, with consistently +0.03 to +0.06 higher confidence.
voyage-4-large won none.
Touches 3 init sites in setup-gbrain/SKILL.md.tmpl:
- Step 1.5 (broken-db rollback-safe switch to PGLite)
- Path 3 direct PGLite init
- Step 4.5 split-engine local code index (Path 4 Yes branch)
Plus 2 manual-repair hints in sync-gbrain/SKILL.md.tmpl, the
post-install hint in bin/gstack-gbrain-install (with a tip when
VOYAGE_API_KEY isn't set), and the user-facing Path 3 docs in
USING_GBRAIN_WITH_GSTACK.md.
Cost is trivial: voyage-code-3 at $0.18/1M tokens means a full reindex
of a 100K-LOC repo runs about $0.20. Incremental syncs are pennies.
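
The estimate above can be reproduced with a few lines of shell. This is a rough sketch: the price is the one quoted in this commit message, while `TOKENS_PER_LINE` and the function name `estimate_reindex_cost` are assumptions made for illustration, not measured values or part of gstack.

```bash
# Rough embedding-cost estimate for a full reindex.
PRICE_PER_MILLION_USD="0.18"  # voyage-code-3 price quoted in the commit message
TOKENS_PER_LINE=10            # ASSUMPTION: average tokens per line of code

# Prints the estimated cost in USD for the given number of lines of code.
estimate_reindex_cost() {
  LC_ALL=C awk -v loc="$1" -v tpl="$TOKENS_PER_LINE" -v price="$PRICE_PER_MILLION_USD" \
    'BEGIN { printf "%.2f\n", loc * tpl * price / 1000000 }'
}

estimate_reindex_cost 100000  # prints 0.18
```

At 10 tokens per line a 100K-LOC repo is about 1M tokens, which lands close to the "about $0.20" figure; a denser codebase would scale the result linearly.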
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: regenerate SKILL.md after voyage-code-3 default
Mechanical regen via `bun run gen:skill-docs --host all` after the
template changes in the previous commit. Single-host regen leaves
other-host outputs stale and trips gen-skill-docs.test.ts; --host all
keeps every adapter (claude, codex, kiro, opencode, slate, cursor,
openclaw, hermes, gbrain) in sync.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: gbrain PGLite + voyage-code-3 init contract + sync integration
Two test files cover the voyage-code-3 default landed in the previous
commits:
test/gbrain-init-voyage-code-3.test.ts — free, deterministic, gate-tier.
Mirrors gbrain-init-rollback.test.ts: runs the skill template's
PGLite-init bash against a fake `gbrain` that logs argv to a sentinel
file, asserts the right flags pass under VOYAGE_API_KEY set/unset/empty.
Also includes belt-and-suspenders grep checks that the template literally
contains the voyage gate at all 3 PGLite init sites.
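
The fake-binary approach can be sketched in plain shell. This is a minimal illustration, not the actual test (which is a TypeScript test file): the sandbox layout, the log file name, and the `run_init_snippet` helper are hypothetical. Only the init snippet itself is copied from the template change in this commit.

```bash
SANDBOX="$(mktemp -d)"
ARGV_LOG="$SANDBOX/gbrain-argv.log"  # sentinel file (hypothetical name)

# Fake gbrain: records its arguments, one invocation per line, and succeeds.
mkdir -p "$SANDBOX/bin"
cat > "$SANDBOX/bin/gbrain" <<'EOF'
#!/bin/sh
echo "$*" >> "$GBRAIN_ARGV_LOG"
EOF
chmod +x "$SANDBOX/bin/gbrain"

# Runs the template's PGLite-init snippet with the fake gbrain first on PATH.
# $1 is "unset" to leave VOYAGE_API_KEY unset, otherwise the value to export.
run_init_snippet() {
  (
    export PATH="$SANDBOX/bin:$PATH" GBRAIN_ARGV_LOG="$ARGV_LOG"
    if [ "$1" = "unset" ]; then unset VOYAGE_API_KEY; else export VOYAGE_API_KEY="$1"; fi
    GBRAIN_EMBED_FLAGS=""
    if [ -n "${VOYAGE_API_KEY:-}" ]; then
      GBRAIN_EMBED_FLAGS="--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024"
    fi
    # Unquoted on purpose so the flags split into separate arguments.
    gbrain init --pglite --json $GBRAIN_EMBED_FLAGS
  )
}

: > "$ARGV_LOG"
run_init_snippet "dummy-key"  # key set: expect the voyage flags
run_init_snippet "unset"      # key unset: expect the bare init
run_init_snippet ""           # key set but empty: expect the bare init
cat "$ARGV_LOG"
```

Each line of the log is one invocation, so the set, unset, and empty cases can be asserted independently without a network call or a real gbrain install.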
test/gbrain-sync-voyage-code-3-integration.test.ts — real, paid,
skip-if-no-key. Inits a sandbox PGLite with voyage-code-3 in a tempdir,
registers a 3-file fixture git repo as a source, runs
`gbrain sync --strategy code --skip-failed`, asserts pages imported +
embedded > 0. Also asserts `gbrain doctor` reports no dimension
mismatch and the column width is 1024d. `gbrain code-def` smoke test
confirms symbol extraction works against the embedded fixture.
The integration test deliberately omits a `gbrain query` assertion:
query produces correct output but `gbrain query` hangs ~2 min on a
fresh PGLite before exiting. The smoking-gun assertion for "embeddings
worked" is the "N pages embedded" line from sync output. Symbol-aware
correctness is covered by the code-def assertion.
Caught one real bug during test development: gbrain reads
`.gbrain-source` from CWD and tries to sync that source too. The test
sets cwd to the sandbox root to avoid the parent worktree's pin
polluting the sandbox brain. Documented in the runGbrain() helper.
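
The working-directory fix can be illustrated in shell. The real helper is the TypeScript `runGbrain()` in the test file; `run_in_sandbox` below is a hypothetical shell analogue, and the `pwd` call stands in for a real gbrain invocation.

```bash
# Hypothetical shell analogue of the test's runGbrain() helper: run a command
# with the sandbox root as its working directory, so a .gbrain-source file in
# the caller's directory is never picked up.
run_in_sandbox() {
  sandbox_root="$1"
  shift
  # The subshell keeps the caller's working directory untouched.
  ( cd "$sandbox_root" && "$@" )
}

SANDBOX_ROOT="$(mktemp -d)"
run_in_sandbox "$SANDBOX_ROOT" pwd
# Real use would look like:
#   run_in_sandbox "$SANDBOX_ROOT" gbrain sync --strategy code --skip-failed
```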
Runtime: ~22s when VOYAGE_API_KEY is set, instant skip otherwise.
Cost: under $0.001 per run (3 tiny fixture files, ~500 tokens of Voyage
embeddings).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump to v1.43.1.0 with voyage-code-3 default + tests
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: update USING_GBRAIN_WITH_GSTACK for v1.43.1.0 voyage-code-3 default
Add VOYAGE_API_KEY row to the env-var table; clarify the OPENAI_API_KEY row as
the fallback path. Refresh the "search returns nothing semantic" troubleshooting
to mention both providers and clarify that the env-shim only promotes
ANTHROPIC/OPENAI from GSTACK_ — VOYAGE_API_KEY must be set directly in Conductor
workspace env.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs: drop em-dashes + replace phantom embedding-migrations.md ref with inline recipe
CHANGELOG release-summary prose used em-dashes (violates voice rule) and
linked to docs/embedding-migrations.md which is gbrain's doc, not gstack's.
Replace with periods/commas and inline the dimension-mismatch recovery
recipe directly (mv + re-init).
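
The CHANGELOG text itself is not part of this page, so the following is only a sketch of what a "mv + re-init" recovery might look like. It assumes the PGLite data lives at `~/.gbrain/pglite` (the path named in the setup script's failure message) and that the brain should be re-created with the Voyage flags from this commit. `BRAIN_DIR`, `DRY_RUN`, `run`, and `recover_dimension_mismatch` are names local to this sketch, not gbrain or gstack settings.

```bash
# Sketch only. DRY_RUN=1 (the default here) prints the commands instead of
# running them, so the recipe can be reviewed before touching any data.
BRAIN_DIR="${BRAIN_DIR:-$HOME/.gbrain}"  # ASSUMPTION: gbrain state directory
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

recover_dimension_mismatch() {
  stale="$BRAIN_DIR/pglite.stale-$(date +%s)"
  # Move the old index aside rather than deleting it, then re-init at 1024d.
  run mv "$BRAIN_DIR/pglite" "$stale" &&
    run gbrain init --pglite --json \
      --embedding-model voyage:voyage-code-3 --embedding-dimensions 1024
}

recover_dimension_mismatch
```

Moving the directory instead of removing it keeps the old embeddings recoverable if the re-init fails.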
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+27
-4
@@ -845,7 +845,14 @@ with `GSTACK_DETECT_NO_CACHE=1` (busts the 60s cache). If the new
 ```bash
 BACKUP="$HOME/.gbrain/config.json.gstack-bak-$(date +%s)"
 mv "$HOME/.gbrain/config.json" "$BACKUP"
-if ! gbrain init --pglite --json; then
+# gstack default: voyage-code-3 (1024d) when VOYAGE_API_KEY is set — best for
+# code retrieval. Without the key, fall back to gbrain's own auto-selected
+# embedding provider chain (OpenAI 1536d when OPENAI_API_KEY is present, etc.).
+GBRAIN_EMBED_FLAGS=""
+if [ -n "${VOYAGE_API_KEY:-}" ]; then
+  GBRAIN_EMBED_FLAGS="--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024"
+fi
+if ! gbrain init --pglite --json $GBRAIN_EMBED_FLAGS; then
   # Restore on failure
   mv "$BACKUP" "$HOME/.gbrain/config.json"
   echo "gbrain init failed. Your previous config was restored at $HOME/.gbrain/config.json." >&2
@@ -1052,10 +1059,18 @@ Then follow the same secret-read + verify + init flow as Path 1.
 ### Path 3 (PGLite local)
 
 ```bash
-gbrain init --pglite --json
+# gstack default: voyage-code-3 (1024d) when VOYAGE_API_KEY is set — code
+# retrieval beats general-purpose embeddings on real code queries (validated
+# A/B). Without the key, gbrain auto-selects (OpenAI 1536d when available).
+GBRAIN_EMBED_FLAGS=""
+if [ -n "${VOYAGE_API_KEY:-}" ]; then
+  GBRAIN_EMBED_FLAGS="--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024"
+fi
+gbrain init --pglite --json $GBRAIN_EMBED_FLAGS
 ```
 
-Done. No network, no secrets.
+Done. No network, no secrets (beyond Voyage embedding API calls during sync, if
+`VOYAGE_API_KEY` is set — ~$0.18 per 1M tokens, pennies per repo).
 
 ### Path 4 (Remote gbrain MCP — HTTP transport with bearer token)
 
@@ -1135,7 +1150,15 @@ if [ -f "$HOME/.gbrain/config.json" ]; then
   BACKUP="$HOME/.gbrain/config.json.gstack-bak-$(date +%s)"
   mv "$HOME/.gbrain/config.json" "$BACKUP"
 fi
-if ! gbrain init --pglite --json; then
+# gstack default for local code-search PGLite: voyage-code-3 (1024d) when
+# VOYAGE_API_KEY is set. It wins the A/B over voyage-4-large and OpenAI
+# text-embedding-3-large on this codebase's symbol queries. Falls back to
+# gbrain's auto-selected provider when the key isn't present.
+GBRAIN_EMBED_FLAGS=""
+if [ -n "${VOYAGE_API_KEY:-}" ]; then
+  GBRAIN_EMBED_FLAGS="--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024"
+fi
+if ! gbrain init --pglite --json $GBRAIN_EMBED_FLAGS; then
   if [ -n "${BACKUP:-}" ] && [ -f "$BACKUP" ]; then mv "$BACKUP" "$HOME/.gbrain/config.json"; fi
   echo "gbrain init failed. Existing config (if any) was restored. PGLite at ~/.gbrain/pglite/ may be in a partial state — \`rm -rf ~/.gbrain/pglite\` to reset." >&2
   echo "Continuing setup without local code search; you can re-run /setup-gbrain to retry." >&2