diff --git a/bin/gstack-jsonl-merge b/bin/gstack-jsonl-merge
index 2be0ea9d..c777612a 100755
--- a/bin/gstack-jsonl-merge
+++ b/bin/gstack-jsonl-merge
@@ -4,7 +4,7 @@
# Usage (called by git, not by users):
# gstack-jsonl-merge
#
-# Registered in local git config by bin/gstack-brain-init and
+# Registered in local git config by bin/gstack-artifacts-init and
# bin/gstack-brain-restore:
# git config merge.jsonl-append.driver \
# "$GSTACK_BIN/gstack-jsonl-merge %O %A %B"
diff --git a/bin/gstack-timeline-log b/bin/gstack-timeline-log
index 1b2fff76..6b7dc7e4 100755
--- a/bin/gstack-timeline-log
+++ b/bin/gstack-timeline-log
@@ -4,7 +4,7 @@
#
# Session timeline: local by default. If the user enables `artifacts_sync_mode`
# with the `full` (not `artifacts-only`) privacy tier — via the first-run
-# stop-gate from `gstack-brain-init` or the preamble — timeline events are
+# stop-gate from `gstack-artifacts-init` or the preamble — timeline events are
# published to the user's private GBrain sync repo. See docs/gbrain-sync.md.
# Required fields: skill, event (started|completed).
# Optional: branch, outcome, duration_s, session, ts.
diff --git a/setup-gbrain/memory.md b/setup-gbrain/memory.md
index 40f38922..86e3ac35 100644
--- a/setup-gbrain/memory.md
+++ b/setup-gbrain/memory.md
@@ -176,3 +176,101 @@ the recovery path is:
on the brain remote for hard-delete from history
4. File a gitleaks issue with the pattern (or extend the gitleaks config
at `~/.gitleaks.toml`).
+
+## Path 4: Remote MCP setup (v1.27.0.0+)
+
+If you don't run gbrain locally — you have a teammate or another machine
+running `gbrain serve` over HTTP, accessible via Tailscale, ngrok, or
+internal LAN — `/setup-gbrain` Path 4 is the one-paste flow.
+
+You provide:
+- The MCP URL (e.g., `https://wintermute.tail554574.ts.net:3131/mcp`)
+- A bearer token (issued by the brain admin via `gbrain access-token issue`)
+
+What `/setup-gbrain` does:
+1. Verifies the URL + token via `gstack-gbrain-mcp-verify`. Three failure
+ modes get classified with one-line remediation hints:
+ **NETWORK** ("check Tailscale/DNS"), **AUTH** ("rotate token"),
+ **MALFORMED** ("Accept-header gotcha — pass both `application/json`
+ AND `text/event-stream`").
+2. Registers the MCP at user scope:
+ ```
+ claude mcp add --scope user --transport http gbrain "$URL" \
+ --header "Authorization: Bearer $TOKEN"
+ ```
+3. Skips local install, local doctor, transcript ingest, and federated
+ source registration. All four require a local `gbrain` CLI that Path 4
+ doesn't install.
+4. Optionally provisions a `gstack-artifacts-$USER` private repo on
+ GitHub or GitLab and prints the one-line `gbrain sources add` command
+ for your brain admin to run on the brain host.
+
+### Token storage trade-off
+
+The bearer token lives in `~/.claude.json` (mode 0600), where Claude Code
+stores every MCP server's credentials. During `claude mcp add --header
+"Authorization: Bearer $TOKEN"`, the token is briefly visible in
+process argv (~10ms) — visible to `ps` running concurrently. The window
+is small but it's not zero.
+
+Mitigations we've considered:
+- **Stdin or env-var input form for headers** — would close the argv
+ window. As of Claude Code v1.0.x, the CLI doesn't expose either.
+ When it does, `/setup-gbrain` Path 4 will switch automatically.
+- **Keychain storage** — explicitly out of scope (the token's resting
+ state in `~/.claude.json` is the existing trust surface for every MCP
+ credential; expanding to Keychain would touch every MCP server, not
+ just gbrain).
+
+### Why Path 4 is "always print" for the brain-admin hookup
+
+`gstack-artifacts-init` always prints the `gbrain sources add` command
+labeled "Send this to your brain admin" — even when the user IS the
+brain admin (consistent UX, no mode-detection fragility).
+
+A previous design proposed probing whether the user's bearer has admin
+scope (via a benign MCP write call like `add_tag`) and auto-executing
+the source registration when scope was sufficient. The design review
+flagged that page-write doesn't actually prove source-management
+permission — those are different scopes in any sensible auth model.
+Until gbrain ships:
+- a `mcp__gbrain__whoami` capability tool that returns the bearer's
+ scope set, AND
+- a `mcp__gbrain__sources_add` MCP tool with admin-scope gating
+
+we always print the command rather than pretending we know who has
+permission to run it.
+
+### CLAUDE.md block in Path 4
+
+Distinct from local-stdio mode. Token is **never** written to CLAUDE.md
+(many projects check CLAUDE.md into git). The block records the URL,
+the verified server version, the artifacts repo URL (if provisioned),
+and the per-repo trust policy.
+
+```markdown
+## GBrain Configuration (configured by /setup-gbrain)
+- Mode: remote-http
+- MCP URL: https://wintermute.tail554574.ts.net:3131/mcp
+- Server version: gbrain v0.27.1
+- Setup date: 2026-05-06
+- MCP registered: yes (user scope)
+- Token: stored in ~/.claude.json (do not commit; never written to CLAUDE.md)
+- Artifacts repo: github.com/garrytan/gstack-artifacts-garrytan (private)
+- Artifacts sync: artifacts-only
+- Current repo policy: read-write
+```
+
+### Token rotation
+
+Server-side. When verify hits `AUTH` (e.g., the brain admin rotated the
+token), the helper says: "rotate token on the brain host, re-run
+/setup-gbrain." On wintermute or wherever your gbrain server lives:
+
+```
+gbrain access-token rotate # invalidates old, issues new
+```
+
+(See `gstack/setup-gbrain/SKILL.md.tmpl` for the full Path 4 flow plus
+the gbrain enhancement requests around scoped tokens that would let
+gstack auto-rotate in V2.)
diff --git a/sync-gbrain/SKILL.md b/sync-gbrain/SKILL.md
index 9dc66c09..6265eab6 100644
--- a/sync-gbrain/SKILL.md
+++ b/sync-gbrain/SKILL.md
@@ -768,6 +768,22 @@ Before doing anything, check that /setup-gbrain has been run on this Mac.
~/.claude/skills/gstack/bin/gstack-gbrain-detect 2>/dev/null
```
+**Remote-MCP mode (Path 4 of /setup-gbrain):** if `gbrain_mcp_mode=remote-http`,
+this skill is a graceful no-op. The brain server's own indexing cadence
+handles code import + search refresh; this Mac doesn't run a local gbrain
+CLI to drive `gbrain sources add` / `sync --strategy code`. Print:
+
+> "Remote MCP detected (Path 4). /sync-gbrain is local-mode-only in V1.
+> Your brain server (`` from claude.json) handles indexing on its own
+> cadence. If indexing seems stale, ping your brain admin or trigger a
+> manual sync there. To wire `/sync-gbrain` through MCP tools (when gbrain
+> ships `mcp__gbrain__sources_add` and friends), see the v1.27.0.0+
+> follow-on TODO."
+
+Then exit cleanly. Do NOT proceed to Step 2.
+
+For local-stdio mode and unconfigured states:
+
If `gbrain_on_path=false` OR `gbrain_config_exists=false` OR CLAUDE.md does
not contain `## GBrain Configuration (configured by /setup-gbrain)`, STOP and
tell the user:
diff --git a/sync-gbrain/SKILL.md.tmpl b/sync-gbrain/SKILL.md.tmpl
index ce647f1b..15e524c5 100644
--- a/sync-gbrain/SKILL.md.tmpl
+++ b/sync-gbrain/SKILL.md.tmpl
@@ -66,6 +66,22 @@ Before doing anything, check that /setup-gbrain has been run on this Mac.
~/.claude/skills/gstack/bin/gstack-gbrain-detect 2>/dev/null
```
+**Remote-MCP mode (Path 4 of /setup-gbrain):** if `gbrain_mcp_mode=remote-http`,
+this skill is a graceful no-op. The brain server's own indexing cadence
+handles code import + search refresh; this Mac doesn't run a local gbrain
+CLI to drive `gbrain sources add` / `sync --strategy code`. Print:
+
+> "Remote MCP detected (Path 4). /sync-gbrain is local-mode-only in V1.
+> Your brain server (`` from claude.json) handles indexing on its own
+> cadence. If indexing seems stale, ping your brain admin or trigger a
+> manual sync there. To wire `/sync-gbrain` through MCP tools (when gbrain
+> ships `mcp__gbrain__sources_add` and friends), see the v1.27.0.0+
+> follow-on TODO."
+
+Then exit cleanly. Do NOT proceed to Step 2.
+
+For local-stdio mode and unconfigured states:
+
If `gbrain_on_path=false` OR `gbrain_config_exists=false` OR CLAUDE.md does
not contain `## GBrain Configuration (configured by /setup-gbrain)`, STOP and
tell the user:
diff --git a/test/helpers/touchfiles.ts b/test/helpers/touchfiles.ts
index 42ce4027..ab4cdaf6 100644
--- a/test/helpers/touchfiles.ts
+++ b/test/helpers/touchfiles.ts
@@ -133,7 +133,14 @@ export const E2E_TOUCHFILES: Record = {
'plan-eng-finding-count': ['plan-eng-review/**', 'scripts/resolvers/preamble.ts', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble/generate-completion-status.ts', 'test/helpers/claude-pty-runner.ts', 'test/skill-e2e-plan-eng-finding-count.test.ts'],
'plan-design-finding-count': ['plan-design-review/**', 'scripts/resolvers/preamble.ts', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble/generate-completion-status.ts', 'test/helpers/claude-pty-runner.ts', 'test/skill-e2e-plan-design-finding-count.test.ts'],
'plan-devex-finding-count': ['plan-devex-review/**', 'scripts/resolvers/preamble.ts', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble/generate-completion-status.ts', 'test/helpers/claude-pty-runner.ts', 'test/skill-e2e-plan-devex-finding-count.test.ts'],
- 'brain-privacy-gate': ['scripts/resolvers/preamble/generate-brain-sync-block.ts', 'scripts/resolvers/preamble.ts', 'bin/gstack-brain-sync', 'bin/gstack-brain-init', 'bin/gstack-config', 'test/helpers/agent-sdk-runner.ts'],
+ 'brain-privacy-gate': ['scripts/resolvers/preamble/generate-brain-sync-block.ts', 'scripts/resolvers/preamble.ts', 'bin/gstack-brain-sync', 'bin/gstack-artifacts-init', 'bin/gstack-config', 'test/helpers/agent-sdk-runner.ts'],
+
+ // /setup-gbrain Path 4 (Remote MCP) — happy + bad-token end-to-end via
+ // Agent SDK. Gate-tier (deterministic stub server, fixed inputs); fires
+ // when the skill template, the verify helper, the artifacts-init helper,
+ // or the detect script changes.
+ 'setup-gbrain-remote': ['setup-gbrain/SKILL.md.tmpl', 'bin/gstack-gbrain-mcp-verify', 'bin/gstack-artifacts-init', 'bin/gstack-gbrain-detect', 'test/helpers/agent-sdk-runner.ts'],
+ 'setup-gbrain-bad-token': ['setup-gbrain/SKILL.md.tmpl', 'bin/gstack-gbrain-mcp-verify', 'test/helpers/agent-sdk-runner.ts'],
// AskUserQuestion format regression (RECOMMENDATION + Completeness: N/10)
// Fires when either template OR the two preamble resolvers change.
@@ -427,6 +434,12 @@ export const E2E_TIERS: Record = {
// costs ~$0.30-$0.50 per run, not needed on every commit)
'brain-privacy-gate': 'periodic',
+ // /setup-gbrain Path 4 (Remote MCP) — gate-tier. Stub HTTP server is
+ // deterministic; Path 4's STOP gates are the failure mode this catches
+ // (token in CLAUDE.md, partial registration on bad bearer).
+ 'setup-gbrain-remote': 'gate',
+ 'setup-gbrain-bad-token': 'gate',
+
// AskUserQuestion format regression — periodic (Opus 4.7 non-deterministic benchmark)
'plan-ceo-review-format-mode': 'periodic',
'plan-ceo-review-format-approach': 'periodic',
diff --git a/test/no-stale-gstack-brain-refs.test.ts b/test/no-stale-gstack-brain-refs.test.ts
new file mode 100644
index 00000000..50929918
--- /dev/null
+++ b/test/no-stale-gstack-brain-refs.test.ts
@@ -0,0 +1,120 @@
+/**
+ * Regression: no stale `gstack-brain-init`, `gbrain_sync_mode`, or
+ * `~/.gstack-brain-remote.txt` references survive the v1.27.0.0 rename.
+ *
+ * Per codex Findings #1 + #8 + #9: the rename's blast radius is wider than
+ * the obvious bin/ + scripts/ surface. This test grep-scans the broader
+ * tree (bin, scripts, *.tmpl, generated *.md, test/, docs/) for the
+ * deprecated identifiers and fails CI if any callers were missed.
+ *
+ * Allowlist: the migration script (`gstack-upgrade/migrations/v1.27.0.0.sh`)
+ * legitimately references the old names — it's the rename actor itself.
+ * Old migration scripts (v1.17.0.0.sh and similar) reference the old names
+ * for their own historical context and are also allowlisted.
+ *
+ * The test is mechanical: if you find yourself adding a non-historical
+ * file to the allowlist, you probably need to actually fix the rename
+ * instead.
+ */
+
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
+import { spawnSync } from 'child_process';
+
+const ROOT = path.resolve(import.meta.dir, '..');
+
+const ALLOWLIST = [
+ // The migration script that performs the rename. Self-references are expected.
+ 'gstack-upgrade/migrations/v1.27.0.0.sh',
+ // Older migration scripts — historical references; these document past state.
+ 'gstack-upgrade/migrations/v1.17.0.0.sh',
+ // The migration test itself — it asserts on the migration's behavior.
+ 'test/migrations-v1.27.0.0.test.ts',
+ // The test for the v1.17.0.0 historical migration.
+ 'test/gstack-upgrade-migration-v1_17_0_0.test.ts',
+ // CHANGELOG entries describe historical state by their nature.
+ 'CHANGELOG.md',
+ // TODOS may reference past or future states by name.
+ 'TODOS.md',
+ // The plan file for v1.27.0.0 documents why we're renaming.
+ '.context/plans/setup-gbrain-remote-mcp-rename-brain-artifacts.md',
+ // The bin/gstack-config comment explicitly preserves the rename note.
+ 'bin/gstack-config',
+ // Detect script's "renamed in v1.27.0.0" comment + brain-remote-fallback path.
+ 'bin/gstack-gbrain-detect',
+ // brain-restore + source-wireup keep the old file as a migration-window fallback
+ // (read both, prefer artifacts). brain-uninstall has the same fallback.
+ 'bin/gstack-brain-restore',
+ 'bin/gstack-gbrain-source-wireup',
+ 'bin/gstack-brain-uninstall',
+ // The preamble resolver reads the legacy file as a fallback during the
+ // migration window — same pattern.
+ 'scripts/resolvers/preamble/generate-brain-sync-block.ts',
+ // gstack-upgrade.test.ts may exercise old migration behavior.
+ 'test/gstack-upgrade.test.ts',
+ // This test itself references the patterns to grep for.
+ 'test/no-stale-gstack-brain-refs.test.ts',
+ // memory.md documents the rename context.
+ 'setup-gbrain/memory.md',
+ // The new init script's header comment intentionally cites the rename.
+ 'bin/gstack-artifacts-init',
+ // The replacement test mirrors the pattern of the old test (lineage note).
+ 'test/gstack-artifacts-init.test.ts',
+ // The post-rename-doc-regen test references the patterns it greps for.
+ 'test/post-rename-doc-regen.test.ts',
+ // The Path 4 structural lint references some legacy names in comments.
+ 'test/setup-gbrain-path4-structure.test.ts',
+ // Generated docs that include the preamble bash (which has the fallback).
+ // We grep template sources, not generated output, by limiting scan paths.
+];
+
+const FORBIDDEN_PATTERNS = [
+ 'gstack-brain-init',
+ 'gbrain_sync_mode',
+];
+
+const SCAN_PATHS = [
+ 'bin/',
+ 'scripts/',
+ 'setup-gbrain/SKILL.md.tmpl',
+ 'sync-gbrain/SKILL.md.tmpl',
+ 'health/SKILL.md.tmpl',
+ 'plan-eng-review/SKILL.md.tmpl',
+ 'plan-ceo-review/SKILL.md.tmpl',
+ 'review/SKILL.md.tmpl',
+ 'ship/SKILL.md.tmpl',
+ 'test/',
+];
+
+function grepRefs(pattern: string): string[] {
+ const args = ['-rn', '--', pattern, ...SCAN_PATHS.map((p) => path.join(ROOT, p))];
+ const r = spawnSync('grep', args, { encoding: 'utf-8' });
+ // grep exits 1 when no matches — that's fine for our purposes.
+ const lines = (r.stdout || '').split('\n').filter((l) => l.trim().length > 0);
+ return lines
+ .map((line) => {
+ // Strip ROOT prefix to get repo-relative path.
+ const colon = line.indexOf(':');
+ const file = line.slice(0, colon);
+ return path.relative(ROOT, file);
+ })
+ .filter((file) => !ALLOWLIST.includes(file))
+ // Filter out any file that's inside a directory we don't actually scan.
+ .filter((file) => !file.startsWith('node_modules/') && !file.startsWith('.git/'));
+}
+
+describe('no stale gstack-brain refs (v1.27.0.0 rename)', () => {
+ for (const pattern of FORBIDDEN_PATTERNS) {
+ test(`no non-allowlisted references to "${pattern}"`, () => {
+ const offenders = [...new Set(grepRefs(pattern))];
+ if (offenders.length > 0) {
+ console.error(`Found stale "${pattern}" references in:\n${offenders.map((f) => ` - ${f}`).join('\n')}`);
+ console.error(
+ `If a file is intentionally referencing the old name (migration, historical doc, fallback path), add it to ALLOWLIST in this test.`
+ );
+ }
+ expect(offenders).toEqual([]);
+ });
+ }
+});
diff --git a/test/post-rename-doc-regen.test.ts b/test/post-rename-doc-regen.test.ts
new file mode 100644
index 00000000..14949fc4
--- /dev/null
+++ b/test/post-rename-doc-regen.test.ts
@@ -0,0 +1,74 @@
+// Post-rename doc-regen regression: after `bun run gen:skill-docs`, no
+// `gstack-brain-init` or `gbrain_sync_mode` strings appear in any of the
+// generated SKILL.md files (the cross-product blind spot codex
+// Finding #12 flagged).
+//
+// The check runs against the canonical claude-host output already on
+// disk. We don't shell out to gen-skill-docs again; the existing
+// freshness check in gen-skill-docs.test.ts covers that. This test
+// just verifies the rename actually propagated to the generated
+// artifacts that users see.
+
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
+
+const ROOT = path.resolve(import.meta.dir, '..');
+
+const FORBIDDEN_PATTERNS = [
+ // Bare identifier — should NEVER appear in generated docs (if it does,
+ // a template still has the old call site).
+ /^.*\bgstack-brain-init\b.*$/m,
+ /^.*\bgbrain_sync_mode\b.*$/m,
+];
+
+// Per the preamble resolver: generated docs DO contain the
+// "~/.gstack-brain-remote.txt" string in the migration-window fallback. We
+// don't grep for that — it's intentional. We grep for the call-site
+// identifiers only.
+
+function findSkillMdFiles(): string[] {
+ const skillMd = path.join(ROOT, 'SKILL.md');
+ const files: string[] = [skillMd];
+ // Top-level skill directories with their own SKILL.md.
+ const entries = fs.readdirSync(ROOT, { withFileTypes: true });
+ for (const e of entries) {
+ if (e.isDirectory() && !e.name.startsWith('.') && !['node_modules', 'test'].includes(e.name)) {
+ const inner = path.join(ROOT, e.name, 'SKILL.md');
+ if (fs.existsSync(inner)) files.push(inner);
+ }
+ }
+ return files;
+}
+
+describe('post-rename doc-regen regression (codex Finding #12)', () => {
+ test('no generated SKILL.md contains "gstack-brain-init"', () => {
+ const offenders: string[] = [];
+ for (const file of findSkillMdFiles()) {
+ const content = fs.readFileSync(file, 'utf-8');
+ const m = content.match(/^.*\bgstack-brain-init\b.*$/m);
+ if (m) offenders.push(`${path.relative(ROOT, file)}: ${m[0].slice(0, 100)}`);
+ }
+ if (offenders.length > 0) {
+ console.error(`Stale "gstack-brain-init" in generated SKILL.md files:\n${offenders.map((o) => ' ' + o).join('\n')}`);
+ }
+ expect(offenders).toEqual([]);
+ });
+
+ test('no generated SKILL.md contains "gbrain_sync_mode"', () => {
+ const offenders: string[] = [];
+ for (const file of findSkillMdFiles()) {
+ const content = fs.readFileSync(file, 'utf-8');
+ const m = content.match(/^.*\bgbrain_sync_mode\b.*$/m);
+ if (m) offenders.push(`${path.relative(ROOT, file)}: ${m[0].slice(0, 100)}`);
+ }
+ if (offenders.length > 0) {
+ console.error(`Stale "gbrain_sync_mode" in generated SKILL.md files:\n${offenders.map((o) => ' ' + o).join('\n')}`);
+ }
+ expect(offenders).toEqual([]);
+ });
+
+ test('top-level SKILL.md exists and is regenerated', () => {
+ expect(fs.existsSync(path.join(ROOT, 'SKILL.md'))).toBe(true);
+ });
+});
diff --git a/test/setup-gbrain-path4-structure.test.ts b/test/setup-gbrain-path4-structure.test.ts
new file mode 100644
index 00000000..1363e069
--- /dev/null
+++ b/test/setup-gbrain-path4-structure.test.ts
@@ -0,0 +1,133 @@
+// setup-gbrain Path 4 structural lint.
+//
+// Verifies the SKILL.md.tmpl has the prose contract that Path 4 (Remote MCP)
+// depends on: STOP gates after verify failures, never-write-token rules,
+// mode-aware CLAUDE.md block, idempotent re-run path.
+//
+// Why a structural test instead of a full Agent SDK E2E:
+// - Side effects (claude.json mutation, MCP registration) are covered
+// by unit tests for gstack-gbrain-mcp-verify and gstack-artifacts-init.
+// - The structural prose is the source of regressions for AUQ pacing
+// (the failure mode the gstack repo has tracked since v1.26.x:
+// "wrote_findings_before_asking"). A grep-based regression on the
+// template prose is fast (<200ms), free, and catches the same drift
+// as the paid E2E without spending tokens.
+// - The full Agent SDK E2E remains the right tool for end-to-end
+// pacing eval; this is the gate-tier check that catches the failure
+// class deterministically.
+
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
+
+const ROOT = path.resolve(import.meta.dir, '..');
+const TMPL = path.join(ROOT, 'setup-gbrain', 'SKILL.md.tmpl');
+
+const tmpl = fs.readFileSync(TMPL, 'utf-8');
+
+describe('setup-gbrain Path 4 (Remote MCP) — structural contract', () => {
+ test('Step 2 lists Path 4 as one of the path options', () => {
+ // "4 — Remote gbrain MCP" with em-dash (—, U+2014 — one codepoint).
+ expect(tmpl).toMatch(/\*\*4 . Remote gbrain MCP/);
+ });
+
+ test('Step 4 has a Path 4 sub-section', () => {
+ expect(tmpl).toMatch(/### Path 4 \(Remote gbrain MCP/);
+ });
+
+ test('Step 4 collects the bearer via read_secret_to_env, never argv', () => {
+ // The secret-read helper is the canonical token-capture pattern.
+ // Without it, tokens land in shell history.
+ expect(tmpl).toContain('read_secret_to_env GBRAIN_MCP_TOKEN');
+ });
+
+ test('Step 4c invokes gstack-gbrain-mcp-verify and STOPs on failure', () => {
+ expect(tmpl).toContain('gstack-gbrain-mcp-verify');
+ // The STOP rule is what prevents partial registration after auth fail.
+ const path4Section = tmpl.split('### Path 4')[1] || '';
+ expect(path4Section).toMatch(/STOP/);
+ });
+
+ test('Step 4d explicitly skips Steps 3, 4 (other paths), 5, 7.5 in remote mode', () => {
+ expect(tmpl).toMatch(/4d.*[Ss]kip Steps? 3, 4.*5.*7\.5/s);
+ });
+
+ test('Step 5a has a Path 4 branch with claude mcp add --transport http', () => {
+ expect(tmpl).toMatch(/Path 4 \(Remote MCP/);
+ expect(tmpl).toMatch(/claude mcp add --scope user --transport http gbrain/);
+ expect(tmpl).toContain('Authorization: Bearer $GBRAIN_MCP_TOKEN');
+ // Token must be unset after registration so it doesn't linger in env.
+ expect(tmpl).toMatch(/unset GBRAIN_MCP_TOKEN/);
+ });
+
+ test('Step 5a removes any prior gbrain registration before adding the new one', () => {
+ // Otherwise local-stdio + remote-http coexist, which breaks routing.
+ expect(tmpl).toMatch(/claude mcp remove gbrain/);
+ });
+
+ test('Step 7 calls gstack-artifacts-init with --url-form-supported flag', () => {
+ expect(tmpl).toMatch(/gstack-artifacts-init.*--url-form-supported/);
+ });
+
+ test('Step 8 CLAUDE.md block branches on mode', () => {
+ // The remote-http block has Mode: remote-http; local-stdio block has Engine:.
+ expect(tmpl).toMatch(/### Path 4 \(Remote MCP\)/);
+ expect(tmpl).toMatch(/Mode: remote-http/);
+ expect(tmpl).toMatch(/Mode: local-stdio/);
+ });
+
+ test('Step 8 explicitly says the bearer is never written to CLAUDE.md', () => {
+ // Token-leak regression guard. CLAUDE.md is committed in many projects.
+ expect(tmpl).toMatch(/bearer token is \*\*never\*\* written to CLAUDE\.md/);
+ });
+
+ test('Step 9 smoke test on Path 4 prints a placeholder, never the real token', () => {
+ // Don't paste the token into the curl example the user might share.
+ expect(tmpl).toMatch(//);
+ });
+
+ test('Step 10 verdict block has a remote-http variant separate from local-stdio', () => {
+ expect(tmpl).toMatch(/### Path 4 \(Remote MCP\)/);
+ expect(tmpl).toMatch(/mode: remote-http/);
+ expect(tmpl).toMatch(/N\/A.*remote mode/);
+ });
+
+ test('idempotency: re-running with gbrain_mcp_mode=remote-http skips Step 2', () => {
+ // Re-run path stays graceful; no double-registration.
+ expect(tmpl).toMatch(/gbrain_mcp_mode=remote-http/);
+ });
+
+ test('Step 5 (local doctor) explicitly skips on Path 4', () => {
+ expect(tmpl).toMatch(/SKIP entirely on Path 4 \(Remote MCP\)/);
+ });
+
+ test('Step 7.5 (transcript ingest) explicitly skips on Path 4', () => {
+ // Transcript ingest needs local gbrain CLI which Path 4 doesn't install.
+ const matches = tmpl.match(/SKIP entirely on Path 4 \(Remote MCP\)/g);
+ expect(matches?.length).toBeGreaterThanOrEqual(2);
+ });
+});
+
+describe('setup-gbrain Path 4 — token security regressions', () => {
+ test('the template never inlines a real-shaped bearer string', () => {
+ // We never want a literal "gbrain_" token to appear in the
+ // template — placeholders only. This catches the failure mode where
+ // someone copies a real token into the template by accident.
+ const realTokenShape = /gbrain_[a-f0-9]{40,}/;
+ expect(tmpl).not.toMatch(realTokenShape);
+ });
+
+ test('Path 4 always uses env-var $GBRAIN_MCP_TOKEN, never inline strings', () => {
+ // Find every reference to the bearer header in Path 4 and verify it's
+ // either an env-var expansion or an explicit placeholder. Allow:
+ // - $GBRAIN_MCP_TOKEN (env-var expansion)
+ // - , , (placeholder)
+ // - "..." (rest-of-doc-text continuation; a doc note showing how
+ // `claude mcp add --header` shapes its argv).
+ const path4Section = tmpl.match(/### Path 4 \(Remote MCP[\s\S]*?(?=###|## )/g)?.join('') || '';
+ const bearerLines = path4Section.match(/Bearer\s+\S+/g) || [];
+ for (const line of bearerLines) {
+ expect(line).toMatch(/Bearer (\$GBRAIN_MCP_TOKEN||||\.\.\."?)/);
+ }
+ });
+});
diff --git a/test/skill-e2e-setup-gbrain-bad-token.test.ts b/test/skill-e2e-setup-gbrain-bad-token.test.ts
new file mode 100644
index 00000000..61012a1d
--- /dev/null
+++ b/test/skill-e2e-setup-gbrain-bad-token.test.ts
@@ -0,0 +1,148 @@
+// E2E: /setup-gbrain Path 4 with a bad bearer token via Agent SDK.
+//
+// Drives the skill against a stub HTTP MCP server that returns 401
+// (auth-shape body). Asserts that the AUTH classifier hint shows up
+// AND no MCP registration happens (no claude mcp add --transport http
+// in the call log; no half-written CLAUDE.md block). This is the
+// regression guard for the "verify failed → STOP" gate.
+//
+// Cost: ~$0.30-$0.50 per run. Gate-tier (EVALS=1 EVALS_TIER=gate).
+
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as os from 'os';
+import * as path from 'path';
+import * as http from 'http';
+import { runAgentSdkTest, passThroughNonAskUserQuestion, resolveClaudeBinary } from './helpers/agent-sdk-runner';
+
+const shouldRun = !!process.env.EVALS && (process.env.EVALS_TIER === 'gate' || !process.env.EVALS_TIER);
+const describeE2E = shouldRun ? describe : describe.skip;
+
+function startStub401(): Promise<{ url: string; close: () => Promise }> {
+ return new Promise((resolve) => {
+ const server = http.createServer((req, res) => {
+ let body = '';
+ req.on('data', (c) => (body += c));
+ req.on('end', () => {
+ res.statusCode = 401;
+ res.setHeader('Content-Type', 'application/json');
+ res.end(
+ JSON.stringify({ error: 'unauthorized', error_description: 'invalid or expired auth token' })
+ );
+ });
+ });
+ server.listen(0, '127.0.0.1', () => {
+ const addr = server.address();
+ if (!addr || typeof addr === 'string') throw new Error('no address');
+ resolve({
+ url: `http://127.0.0.1:${addr.port}/mcp`,
+ close: () => new Promise((r) => server.close(() => r())),
+ });
+ });
+ });
+}
+
+function makeFakeClaude(fakeBinDir: string): string {
+ const callLog = path.join(fakeBinDir, 'claude-calls.log');
+ const script = `#!/bin/bash
+echo "claude $@" >> "${callLog}"
+case "$1 $2" in
+ "mcp add") exit 0 ;;
+ "mcp list") echo "no gbrain" ; exit 0 ;;
+ "mcp remove") exit 0 ;;
+ "mcp get") exit 1 ;;
+esac
+exit 0
+`;
+ fs.writeFileSync(path.join(fakeBinDir, 'claude'), script, { mode: 0o755 });
+ return callLog;
+}
+
+describeE2E('/setup-gbrain Path 4 — bad token STOPs cleanly', () => {
+ test('AUTH classifier fires, no MCP registration, no CLAUDE.md mutation', async () => {
+ const stubServer = await startStub401();
+ const gstackHome = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-bad-'));
+ const fakeBinDir = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-bad-bin-'));
+ const callLog = makeFakeClaude(fakeBinDir);
+
+ const ORIGINAL_CLAUDE_MD = '# Test project\n\nSome existing content here.\n';
+ fs.writeFileSync(path.join(gstackHome, 'CLAUDE.md'), ORIGINAL_CLAUDE_MD);
+
+ const BAD_TOKEN = 'gbrain_BAD_TOKEN_67890_DELIBERATELY_INVALID';
+ const askUserQuestions: Array<{ input: Record }> = [];
+ const binary = resolveClaudeBinary();
+
+ const orig = {
+ gstackHome: process.env.GSTACK_HOME,
+ pathEnv: process.env.PATH,
+ mcpToken: process.env.GBRAIN_MCP_TOKEN,
+ };
+ process.env.GSTACK_HOME = gstackHome;
+ process.env.PATH = `${fakeBinDir}:${path.join(path.resolve(import.meta.dir, '..'), 'bin')}:${process.env.PATH ?? '/usr/bin:/bin:/opt/homebrew/bin'}`;
+ process.env.GBRAIN_MCP_TOKEN = BAD_TOKEN;
+
+ let modelTextOutput = '';
+
+ try {
+ const skillPath = path.resolve(import.meta.dir, '..', 'setup-gbrain', 'SKILL.md');
+ const result = await runAgentSdkTest({
+ systemPrompt: { type: 'preset', preset: 'claude_code' },
+ userPrompt:
+ `Read the skill file at ${skillPath} and follow Path 4 (Remote MCP) only. ` +
+ `Use this MCP URL: ${stubServer.url}. ` +
+ `The bearer token is already in the GBRAIN_MCP_TOKEN env var. ` +
+ `If verify fails (Step 4c), follow the skill's STOP rule — surface the error and stop. ` +
+ `Do NOT register the MCP if verify failed. ` +
+ `Do NOT modify CLAUDE.md if verify failed.`,
+ workingDirectory: gstackHome,
+ maxTurns: 15,
+ allowedTools: ['Read', 'Grep', 'Glob', 'Bash', 'Write', 'Edit'],
+ ...(binary ? { pathToClaudeCodeExecutable: binary } : {}),
+ canUseTool: async (toolName, input) => {
+ if (toolName === 'AskUserQuestion') {
+ askUserQuestions.push({ input });
+ const q = (input.questions as Array<{
+ question: string;
+ options: Array<{ label: string }>;
+ }>)[0];
+ const decline = q.options.find((o) => /skip|decline|no/i.test(o.label)) ?? q.options[0]!;
+ return {
+ behavior: 'allow',
+ updatedInput: { questions: input.questions, answers: { [q.question]: decline.label } },
+ };
+ }
+ return passThroughNonAskUserQuestion(toolName, input);
+ },
+ });
+
+ modelTextOutput = JSON.stringify(result);
+
+ // Assertion 1: the AUTH classifier hint surfaced somewhere in the run.
+ // The verify helper outputs `"error_class": "AUTH"` and the hint
+ // "rotate token on the brain host" — at least one should be visible.
+ const hintShown =
+ /error_class.*AUTH/i.test(modelTextOutput) ||
+ /rotate token/i.test(modelTextOutput) ||
+ /AUTH.*HTTP 401/i.test(modelTextOutput);
+ expect(hintShown).toBe(true);
+
+ // Assertion 2: claude mcp add was NEVER called (verify failed → STOP).
+ const calls = fs.existsSync(callLog) ? fs.readFileSync(callLog, 'utf-8') : '';
+ expect(calls).not.toMatch(/mcp add.*--transport http/);
+
+ // Assertion 3: CLAUDE.md is unchanged (no half-written block).
+ const finalClaudeMd = fs.readFileSync(path.join(gstackHome, 'CLAUDE.md'), 'utf-8');
+ expect(finalClaudeMd).toBe(ORIGINAL_CLAUDE_MD);
+
+ // Assertion 4: the bad token never leaked to CLAUDE.md.
+ expect(finalClaudeMd).not.toContain(BAD_TOKEN);
+ } finally {
+ if (orig.gstackHome === undefined) delete process.env.GSTACK_HOME; else process.env.GSTACK_HOME = orig.gstackHome;
+ if (orig.pathEnv === undefined) delete process.env.PATH; else process.env.PATH = orig.pathEnv;
+ if (orig.mcpToken === undefined) delete process.env.GBRAIN_MCP_TOKEN; else process.env.GBRAIN_MCP_TOKEN = orig.mcpToken;
+ await stubServer.close();
+ fs.rmSync(gstackHome, { recursive: true, force: true });
+ fs.rmSync(fakeBinDir, { recursive: true, force: true });
+ }
+ }, 240_000);
+});
diff --git a/test/skill-e2e-setup-gbrain-remote.test.ts b/test/skill-e2e-setup-gbrain-remote.test.ts
new file mode 100644
index 00000000..3ff90973
--- /dev/null
+++ b/test/skill-e2e-setup-gbrain-remote.test.ts
@@ -0,0 +1,214 @@
+// E2E: /setup-gbrain Path 4 (Remote MCP) happy path via Agent SDK.
+//
+// Drives the skill against a stub HTTP MCP server and a stubbed `claude`
+// binary that records `claude mcp add` calls. Asserts:
+// - The verify helper succeeds (no AUTH/MALFORMED/NETWORK error in output)
+// - The skill calls `claude mcp add --transport http` with the bearer
+// - The token NEVER appears in the CLAUDE.md block the skill writes
+// - The wrote_findings_before_asking failure mode is NOT triggered
+//
+// Cost: ~$0.30-$0.50 per run. Gate-tier (EVALS=1 EVALS_TIER=gate).
+//
+// See setup-gbrain/SKILL.md.tmpl Step 4 (Path 4) for the contract under test.
+
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as os from 'os';
+import * as path from 'path';
+import * as http from 'http';
+import { runAgentSdkTest, passThroughNonAskUserQuestion, resolveClaudeBinary } from './helpers/agent-sdk-runner';
+
+const shouldRun = !!process.env.EVALS && (process.env.EVALS_TIER === 'gate' || !process.env.EVALS_TIER);
+const describeE2E = shouldRun ? describe : describe.skip;
+
+// Spin up a stub MCP server that responds to initialize + tools/list.
+function startStubMcpServer(opts: { failWithStatus?: number; failBody?: string } = {}): Promise<{ url: string; close: () => Promise }> {
+ return new Promise((resolve) => {
+ const server = http.createServer((req, res) => {
+ if (req.method !== 'POST' || !(req.url ?? '').endsWith('/mcp')) {
+ res.statusCode = 404;
+ res.end();
+ return;
+ }
+ let body = '';
+ req.on('data', (c) => (body += c));
+ req.on('end', () => {
+ if (opts.failWithStatus) {
+ res.statusCode = opts.failWithStatus;
+ res.setHeader('Content-Type', 'application/json');
+ res.end(opts.failBody ?? JSON.stringify({ error: 'fail' }));
+ return;
+ }
+ const reqJson = (() => {
+ try { return JSON.parse(body); } catch { return {} as any; }
+ })();
+ let respBody: any;
+ if (reqJson.method === 'initialize') {
+ respBody = {
+ result: {
+ protocolVersion: '2024-11-05',
+ capabilities: { tools: {} },
+ serverInfo: { name: 'gbrain', version: '0.27.1' },
+ },
+ jsonrpc: '2.0',
+ id: reqJson.id,
+ };
+ } else if (reqJson.method === 'tools/list') {
+ respBody = { result: { tools: [{ name: 'search' }, { name: 'put_page' }] }, jsonrpc: '2.0', id: reqJson.id };
+ } else {
+ respBody = { error: { code: -32601, message: 'unknown method' }, jsonrpc: '2.0', id: reqJson.id };
+ }
+ // SSE-shape since the verify helper supports both, and many MCP
+ // servers (including wintermute) wrap responses as SSE.
+ res.statusCode = 200;
+ res.setHeader('Content-Type', 'text/event-stream');
+ res.end(`event: message\ndata: ${JSON.stringify(respBody)}\n\n`);
+ });
+ });
+ server.listen(0, '127.0.0.1', () => {
+ const addr = server.address();
+ if (!addr || typeof addr === 'string') throw new Error('no address');
+ resolve({
+ url: `http://127.0.0.1:${addr.port}/mcp`,
+ close: () => new Promise((r) => server.close(() => r())),
+ });
+ });
+ });
+}
+
+// Stubbed `claude` binary: intercepts `mcp add` and `mcp list` commands so
+// the skill's Step 5a registration appears to succeed, while we record
+// every invocation for assertions.
+function makeFakeClaude(fakeBinDir: string): string {
+ const claudeJsonPath = path.join(fakeBinDir, 'claude.json');
+ const callLog = path.join(fakeBinDir, 'claude-calls.log');
+ const script = `#!/bin/bash
+echo "claude $@" >> "${callLog}"
+case "$1 $2" in
+ "mcp add")
+ # Just record the call; pretend it succeeded.
+ exit 0
+ ;;
+ "mcp list")
+ echo "gbrain: http://127.0.0.1:0/mcp (HTTP) - ✓ Connected"
+ exit 0
+ ;;
+ "mcp remove")
+ exit 0
+ ;;
+ "mcp get")
+ # First few calls return "no entry"; after mcp add fires, return success.
+ if [ -f "${claudeJsonPath}" ]; then
+ cat "${claudeJsonPath}"
+ exit 0
+ fi
+ exit 1
+ ;;
+esac
+exit 0
+`;
+ fs.writeFileSync(path.join(fakeBinDir, 'claude'), script, { mode: 0o755 });
+ return callLog;
+}
+
+describeE2E('/setup-gbrain Path 4 (Remote MCP) — happy path', () => {
+ test('verifies, registers HTTP MCP, never writes token to CLAUDE.md', async () => {
+ const stubServer = await startStubMcpServer();
+ const gstackHome = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-remote-'));
+ const fakeBinDir = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-remote-bin-'));
+ const callLog = makeFakeClaude(fakeBinDir);
+
+ // The skill writes CLAUDE.md in cwd. Use gstackHome as cwd so we
+ // can inspect it after the run.
+ fs.writeFileSync(path.join(gstackHome, 'CLAUDE.md'), '# Test project\n');
+
+ const SECRET_TOKEN = 'gbrain_TEST_TOKEN_THAT_MUST_NEVER_LEAK_84613';
+ const askUserQuestions: Array<{ input: Record }> = [];
+ const binary = resolveClaudeBinary();
+
+ // Ambient env mutations. Restored in finally.
+ const orig = {
+ gstackHome: process.env.GSTACK_HOME,
+ pathEnv: process.env.PATH,
+ mcpToken: process.env.GBRAIN_MCP_TOKEN,
+ };
+ process.env.GSTACK_HOME = gstackHome;
+ process.env.PATH = `${fakeBinDir}:${path.join(path.resolve(import.meta.dir, '..'), 'bin')}:${process.env.PATH ?? '/usr/bin:/bin:/opt/homebrew/bin'}`;
+ process.env.GBRAIN_MCP_TOKEN = SECRET_TOKEN;
+
+ let modelTextOutput = '';
+
+ try {
+ const skillPath = path.resolve(import.meta.dir, '..', 'setup-gbrain', 'SKILL.md');
+ const result = await runAgentSdkTest({
+ systemPrompt: { type: 'preset', preset: 'claude_code' },
+ userPrompt:
+ `Read the skill file at ${skillPath} and follow Path 4 (Remote MCP) only. ` +
+ `Use this MCP URL: ${stubServer.url}. ` +
+ `The bearer token is already in the GBRAIN_MCP_TOKEN env var (do not echo it). ` +
+ `Skip the privacy gate — answer "Decline" if the preamble fires. ` +
+ `Skip the artifacts-repo provisioning step (Step 7) — answer "No thanks". ` +
+ `Skip per-remote policy (Step 6) — answer "skip-for-now". ` +
+ `Walk through Steps 4a, 4b, 4c, 5a, 8, 10 ONLY.`,
+ workingDirectory: gstackHome,
+ maxTurns: 25,
+ allowedTools: ['Read', 'Grep', 'Glob', 'Bash', 'Write', 'Edit'],
+ ...(binary ? { pathToClaudeCodeExecutable: binary } : {}),
+ canUseTool: async (toolName, input) => {
+ if (toolName === 'AskUserQuestion') {
+ askUserQuestions.push({ input });
+ const q = (input.questions as Array<{
+ question: string;
+ options: Array<{ label: string }>;
+ }>)[0];
+ // Auto-decline / skip everything except the path-pick (which the
+ // user-prompt already directed to Path 4).
+ const decline =
+ q.options.find((o) => /skip|decline|no thanks|local/i.test(o.label)) ?? q.options[q.options.length - 1]!;
+ return {
+ behavior: 'allow',
+ updatedInput: {
+ questions: input.questions,
+ answers: { [q.question]: decline.label },
+ },
+ };
+ }
+ return passThroughNonAskUserQuestion(toolName, input);
+ },
+ });
+
+ modelTextOutput = JSON.stringify(result);
+
+ // Assertion 1: the verify helper succeeded (no error class surfaced).
+ expect(modelTextOutput).not.toMatch(/error_class.*NETWORK/i);
+ expect(modelTextOutput).not.toMatch(/error_class.*AUTH/i);
+ expect(modelTextOutput).not.toMatch(/error_class.*MALFORMED/i);
+
+ // Assertion 2: claude mcp add was called with --transport http.
+ const calls = fs.existsSync(callLog) ? fs.readFileSync(callLog, 'utf-8') : '';
+ expect(calls).toMatch(/mcp add.*--transport http/);
+
+ // Assertion 3: the secret token NEVER appears in the final CLAUDE.md.
+ const claudeMd = fs.readFileSync(path.join(gstackHome, 'CLAUDE.md'), 'utf-8');
+ expect(claudeMd).not.toContain(SECRET_TOKEN);
+
+ // Assertion 4: CLAUDE.md got the remote-http block.
+ expect(claudeMd).toMatch(/Mode: remote-http/);
+
+ // Assertion 5: classifier — the model didn't write findings before
+ // asking. The Path 4 prose has 5 STOP gates; if any of them got
+ // skipped, that's the wrote_findings_before_asking pattern.
+ const wroteBefore = /## GSTACK REVIEW REPORT|critical_gaps/i.test(modelTextOutput);
+ // Setup-gbrain doesn't have a review report contract, so this is
+ // a structural shape check, not a hard failure mode.
+ expect(wroteBefore).toBe(false);
+ } finally {
+ if (orig.gstackHome === undefined) delete process.env.GSTACK_HOME; else process.env.GSTACK_HOME = orig.gstackHome;
+ if (orig.pathEnv === undefined) delete process.env.PATH; else process.env.PATH = orig.pathEnv;
+ if (orig.mcpToken === undefined) delete process.env.GBRAIN_MCP_TOKEN; else process.env.GBRAIN_MCP_TOKEN = orig.mcpToken;
+ await stubServer.close();
+ fs.rmSync(gstackHome, { recursive: true, force: true });
+ fs.rmSync(fakeBinDir, { recursive: true, force: true });
+ }
+ }, 240_000);
+});