Files
gstack/bin/gstack-gbrain-install
T
Garry Tan 65972f6a15 v1.43.1.0 feat: default PGLite to voyage-code-3 for code search + e2e tests (#1639)
* docs: drop ~/.zshrc env note in favor of GSTACK_* env-shim reference

The CLAUDE.md "Where the keys live on this machine" block hand-rolled a
`grep ~/.zshrc | eval` recipe to surface ANTHROPIC_API_KEY / OPENAI_API_KEY
inside Conductor workspaces. That predates the GSTACK_* env-shim
(`lib/conductor-env-shim.ts`, v1.39.2.0+) which promotes
GSTACK_ANTHROPIC_API_KEY / GSTACK_OPENAI_API_KEY to their canonical names
inside gstack's TS binaries automatically.

The zshrc recipe is now an obsolete workaround. Replace with a short note
pointing at the env-shim as the canonical answer. Keep the Agent SDK
\`env: {...}\` gotcha (still real, unrelated to where the key comes from).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: default PGLite to voyage-code-3 when VOYAGE_API_KEY set

When gstack inits a local PGLite engine for code search, use Voyage's
code-specialized `voyage-code-3` (1024-dim) embedding model if
\`VOYAGE_API_KEY\` is present. Falls back to gbrain's auto-selected
provider chain (OpenAI text-embedding-3-large 1536-dim when
OPENAI_API_KEY is available, etc.) when the Voyage key is unset.

Why voyage-code-3: head-to-head A/B against voyage-4-large on 10
realistic code queries against this codebase (using gbrain query
--no-expand for pure vector retrieval). voyage-code-3 strictly won on
4 queries (cases where the right hit was an implementation file vs a
test file: terminal-agent.ts over terminal-agent-integration.test.ts,
sanitizeReplacer over sanitize.test.ts, disposeSession over a
tangentially-related killDaemon test, surfaced injectCanary semantic
query). Tied on 5 with consistently +0.03 to +0.06 higher confidence.
Zero losses for voyage-4-large.

Touches 3 init sites in setup-gbrain/SKILL.md.tmpl:
- Step 1.5 (broken-db rollback-safe switch to PGLite)
- Path 3 direct PGLite init
- Step 4.5 split-engine local code index (Path 4 Yes branch)

Plus 2 manual-repair hints in sync-gbrain/SKILL.md.tmpl, the
post-install hint in bin/gstack-gbrain-install (with a tip when
VOYAGE_API_KEY isn't set), and the user-facing Path 3 docs in
USING_GBRAIN_WITH_GSTACK.md.

Cost is trivial: voyage-code-3 at \$0.18/1M tokens means a full reindex
of a 100K-LOC repo runs about \$0.20. Incremental syncs are pennies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md after voyage-code-3 default

Mechanical regen via \`bun run gen:skill-docs --host all\` after the
template changes in the previous commit. Single-host regen leaves
other-host outputs stale and trips gen-skill-docs.test.ts; --host all
keeps every adapter (claude, codex, kiro, opencode, slate, cursor,
openclaw, hermes, gbrain) in sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: gbrain PGLite + voyage-code-3 init contract + sync integration

Two test files cover the voyage-code-3 default landed in the previous
commits:

test/gbrain-init-voyage-code-3.test.ts — free, deterministic, gate-tier.
Mirrors gbrain-init-rollback.test.ts: runs the skill template's
PGLite-init bash against a fake \`gbrain\` that logs argv to a sentinel
file, asserts the right flags pass under VOYAGE_API_KEY set/unset/empty.
Also includes belt-and-suspenders grep checks that the template literally
contains the voyage gate at all 3 PGLite init sites.

test/gbrain-sync-voyage-code-3-integration.test.ts — real, paid,
skip-if-no-key. Inits a sandbox PGLite with voyage-code-3 in a tempdir,
registers a 3-file fixture git repo as a source, runs
\`gbrain sync --strategy code --skip-failed\`, asserts pages imported +
embedded > 0. Also asserts \`gbrain doctor\` reports no dimension
mismatch and the column width is 1024d. \`gbrain code-def\` smoke test
confirms symbol extraction works against the embedded fixture.

The integration test deliberately omits a \`gbrain query\` assertion:
query produces correct output but \`gbrain query\` hangs ~2 min on a
fresh PGLite before exiting. The smoking-gun assertion for "embeddings
worked" is the "N pages embedded" line from sync output. Symbol-aware
correctness is covered by the code-def assertion.

Caught one real bug during test development: gbrain reads
\`.gbrain-source\` from CWD and tries to sync that source too. The test
sets cwd to the sandbox root to avoid the parent worktree's pin
polluting the sandbox brain. Documented in the runGbrain() helper.

Runtime: ~22s when VOYAGE_API_KEY is set, instant skip otherwise.
Cost: ~\$0.001 per run (3 tiny fixture files, ~500 tokens of Voyage
embeddings).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump to v1.43.1.0 with voyage-code-3 default + tests

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update USING_GBRAIN_WITH_GSTACK for v1.43.1.0 voyage-code-3 default

Add VOYAGE_API_KEY row to the env-var table; clarify the OPENAI_API_KEY row as
the fallback path. Refresh the "search returns nothing semantic" troubleshooting
to mention both providers and clarify that the env-shim only promotes
ANTHROPIC/OPENAI from GSTACK_ — VOYAGE_API_KEY must be set directly in Conductor
workspace env.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: drop em-dashes + replace phantom embedding-migrations.md ref with inline recipe

CHANGELOG release-summary prose used em-dashes (violates voice rule) and
linked to docs/embedding-migrations.md which is gbrain's doc, not gstack's.
Replace with periods/commas and inline the dimension-mismatch recovery
recipe directly (mv + re-init).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 18:55:55 -07:00

230 lines
8.8 KiB
Bash
Executable File

#!/usr/bin/env bash
# gstack-gbrain-install — install the gbrain CLI on a local Mac.
#
# Usage:
# gstack-gbrain-install [--install-dir <dir>] [--pinned-commit <sha>] [--dry-run]
#
# D5 detect-first: before cloning anywhere, probe likely pre-existing
# locations (~/git/gbrain and ~/gbrain) and reuse a working clone if one
# exists. Falls back to a fresh clone of the pinned commit at ~/gbrain
# (override with GBRAIN_INSTALL_DIR or --install-dir).
#
# D19 PATH-shadowing: after `bun link`, compare `gbrain --version` output
# to the install-dir's package.json version. On mismatch, abort with an
# actionable error listing every gbrain on PATH. Never "silently fixes"
# PATH; setup skills should refuse broken environments.
#
# Prerequisites (checked before doing anything):
# - bun (install: curl -fsSL https://bun.sh/install | bash)
# - git
# - network reachability to https://github.com
#
# The pinned commit is declared here rather than resolved dynamically so
# upgrades are explicit and reviewable. Update PINNED_COMMIT when gstack
# verifies compatibility with a new gbrain release.
#
# Env:
# GBRAIN_INSTALL_DIR — override default install path (~/gbrain)
#
# Exit codes:
# 0 — success (or --dry-run printed the plan)
# 2 — prerequisite missing or invalid argument
# 3 — post-install validation failed (PATH shadow, broken binary, etc.)
set -euo pipefail
# --- defaults ---
PINNED_COMMIT="08b3698e90532b7b66c445e6b1d8cdfe71822802" # gbrain v0.18.2
PINNED_TAG="v0.18.2"
GBRAIN_REPO_URL="https://github.com/garrytan/gbrain.git"
DEFAULT_INSTALL_DIR="${GBRAIN_INSTALL_DIR:-$HOME/gbrain}"
INSTALL_DIR="$DEFAULT_INSTALL_DIR"
DRY_RUN=false
VALIDATE_ONLY=false
die() { echo "gstack-gbrain-install: $*" >&2; exit 2; }
fail() { echo "gstack-gbrain-install: $*" >&2; exit 3; }
log() { echo "gstack-gbrain-install: $*"; }
# --- parse args ---
while [ $# -gt 0 ]; do
case "$1" in
--install-dir) INSTALL_DIR="$2"; shift 2 ;;
--pinned-commit) PINNED_COMMIT="$2"; PINNED_TAG=""; shift 2 ;;
--dry-run) DRY_RUN=true; shift ;;
--validate-only) VALIDATE_ONLY=true; shift ;;
--help|-h) sed -n '2,30p' "$0" | sed 's/^# \{0,1\}//'; exit 0 ;;
*) die "unknown flag: $1" ;;
esac
done
# --- prerequisites ---
check_prereq() {
local bin="$1"
local hint="$2"
if ! command -v "$bin" >/dev/null 2>&1; then
fail "required tool '$bin' not found. $hint"
fi
}
if ! $VALIDATE_ONLY; then
check_prereq bun "Install: curl -fsSL https://bun.sh/install | bash"
check_prereq git "Install: xcode-select --install (macOS) or your package manager"
# GitHub reachability — fail fast if offline rather than hanging `git clone`.
# --max-time 10, --head (no body), quiet. Status code 200-4xx means we reached
# the server (even 404 is reachability proof).
if ! curl -s --head --max-time 10 https://github.com >/dev/null 2>&1; then
fail "cannot reach https://github.com. Check your network and try again."
fi
fi
# --- D5 detect-first: probe common locations before cloning fresh ---
# Accept any directory that looks like a gbrain clone: has package.json
# with name "gbrain" and a `bin.gbrain` entry. Don't accept version mismatches
# here — we'll let bun link run and then D19-validate.
is_valid_clone() {
local dir="$1"
[ -d "$dir" ] || return 1
[ -f "$dir/package.json" ] || return 1
local name
name=$(jq -r '.name // empty' "$dir/package.json" 2>/dev/null || true)
[ "$name" = "gbrain" ] || return 1
local bin
bin=$(jq -r '.bin.gbrain // empty' "$dir/package.json" 2>/dev/null || true)
[ -n "$bin" ] || return 1
return 0
}
DETECTED_CLONE=""
if ! $VALIDATE_ONLY; then
for candidate in "$HOME/git/gbrain" "$HOME/gbrain" "$INSTALL_DIR"; do
if is_valid_clone "$candidate"; then
DETECTED_CLONE="$candidate"
break
fi
done
fi
if $VALIDATE_ONLY; then
log "validate-only mode: skipping detect + clone + install + link"
elif [ -n "$DETECTED_CLONE" ]; then
log "detected existing gbrain clone at $DETECTED_CLONE — reusing"
INSTALL_DIR="$DETECTED_CLONE"
else
# Fresh clone path.
if $DRY_RUN; then
log "DRY RUN: would clone $GBRAIN_REPO_URL @ $PINNED_COMMIT$INSTALL_DIR"
exit 0
fi
if [ -d "$INSTALL_DIR" ]; then
fail "install dir $INSTALL_DIR exists but is not a valid gbrain clone. Remove it or pass --install-dir <other>."
fi
log "cloning $GBRAIN_REPO_URL$INSTALL_DIR"
git clone --quiet "$GBRAIN_REPO_URL" "$INSTALL_DIR"
( cd "$INSTALL_DIR" && git checkout --quiet "$PINNED_COMMIT" )
log "pinned to $PINNED_COMMIT${PINNED_TAG:+ ($PINNED_TAG)}"
fi
if $DRY_RUN; then
log "DRY RUN: would run bun install + bun link in $INSTALL_DIR"
exit 0
fi
# --- install + link ---
# On Windows MSYS/Cygwin shells, bun's postinstall scripts (notably gbrain's
# native-bindings setup) fail to parse path arguments correctly and abort
# `bun install` with a non-zero exit. The package itself installs fine
# without scripts, so detect Windows and pass --ignore-scripts there. The
# `bun link` step below is unaffected.
IS_WINDOWS=0
case "$(uname -s)" in
MINGW*|MSYS*|CYGWIN*|Windows_NT) IS_WINDOWS=1 ;;
esac
if ! $VALIDATE_ONLY; then
if [ "$IS_WINDOWS" -eq 1 ]; then
log "running bun install --ignore-scripts in $INSTALL_DIR (Windows shell detected)"
( cd "$INSTALL_DIR" && bun install --silent --ignore-scripts )
else
log "running bun install in $INSTALL_DIR"
( cd "$INSTALL_DIR" && bun install --silent )
fi
log "running bun link in $INSTALL_DIR"
( cd "$INSTALL_DIR" && bun link --silent )
fi
# --- D19 PATH-shadowing validation ---
# Read the version from the install-dir's package.json; compare to
# `gbrain --version`. If they disagree, PATH is returning a DIFFERENT
# gbrain than the one we just linked. Fail hard with remediation.
expected_version=$(jq -r '.version // empty' "$INSTALL_DIR/package.json" 2>/dev/null || true)
if [ -z "$expected_version" ]; then
fail "cannot read version from $INSTALL_DIR/package.json (install may be broken)"
fi
if ! command -v gbrain >/dev/null 2>&1; then
fail "bun link completed but 'gbrain' is not on PATH. Ensure ~/.bun/bin is in your PATH."
fi
actual_version=$(gbrain --version 2>/dev/null | head -1 | awk '{print $NF}' | tr -d '[:space:]' || true)
if [ -z "$actual_version" ]; then
fail "gbrain is on PATH but 'gbrain --version' produced no output — the binary may be broken."
fi
# Tolerate a leading "v" (gbrain may print either "0.18.2" or "v0.18.2").
expected_norm="${expected_version#v}"
actual_norm="${actual_version#v}"
if [ "$actual_norm" != "$expected_norm" ]; then
echo "" >&2
echo "gstack-gbrain-install: PATH SHADOWING DETECTED" >&2
echo "" >&2
echo " We just linked gbrain $expected_version from $INSTALL_DIR," >&2
echo " but PATH is returning gbrain $actual_version." >&2
echo "" >&2
echo " All gbrain binaries on PATH:" >&2
type -a gbrain 2>&1 | sed 's/^/ /' >&2 || true
echo "" >&2
echo " Fix one of the following, then re-run /setup-gbrain:" >&2
echo " a) rm the shadowing binary: rm \$(which gbrain)" >&2
echo " b) prepend ~/.bun/bin to PATH in your shell rc" >&2
echo " c) point GBRAIN_INSTALL_DIR at the shadowing binary's install dir" >&2
echo "" >&2
exit 3
fi
log "installed gbrain $actual_version from $INSTALL_DIR"
# v1.40.0.0 post-install validation (T6 / codex review #19): --ignore-scripts
# may skip artifacts gbrain needs at runtime, especially on Windows
# MSYS/MINGW where we DID pass --ignore-scripts. `gbrain --version` above
# already confirmed the binary runs; this second probe checks that the
# subcommand surface is reachable (`sources` is the entry point the sync
# stage hits first). If the probe fails, we warn but don't exit non-zero —
# the user may still be able to use other commands.
if ! gbrain sources --help >/dev/null 2>&1; then
echo "" >&2
echo "gstack-gbrain-install: WARNING — gbrain installed but 'gbrain sources --help' did not exit 0." >&2
if [ "$IS_WINDOWS" -eq 1 ]; then
echo " Windows shells skip bun postinstall scripts; some gbrain features may need native build tools." >&2
echo " If /sync-gbrain fails to find subcommands, install gbrain from a non-MSYS shell," >&2
echo " or run: cd $INSTALL_DIR && bun install (without --ignore-scripts)" >&2
else
echo " This may be a transient gbrain CLI issue or a missing native dependency." >&2
echo " If /sync-gbrain fails, re-run: cd $INSTALL_DIR && bun install" >&2
fi
echo "" >&2
fi
echo ""
if [ -n "${VOYAGE_API_KEY:-}" ]; then
echo "Next: gbrain init --pglite --embedding-model voyage:voyage-code-3 --embedding-dimensions 1024"
echo " (or run /setup-gbrain for the full setup flow)"
else
echo "Next: gbrain init --pglite (or run /setup-gbrain for the full setup flow)"
echo ""
echo "Tip: set VOYAGE_API_KEY before init to use voyage-code-3 (best embedding"
echo "model for code retrieval on Voyage). Without it, gbrain falls back to its"
echo "auto-selected provider (OpenAI when OPENAI_API_KEY is set, etc.)."
fi