test: regression suite + E2E for v1.27.0.0 rename

Three new regression tests guard the rename's blast radius (per codex
Findings #1, #8, #9, #12):

- test/no-stale-gstack-brain-refs.test.ts: greps bin/, scripts/, *.tmpl,
  test/ for forbidden identifiers (gstack-brain-init, gbrain_sync_mode);
  fails CI if any non-allowlisted file references them.
- test/post-rename-doc-regen.test.ts: confirms gen-skill-docs output has
  no stale references in any */SKILL.md (the cross-product blind spot).
- test/setup-gbrain-path4-structure.test.ts: structural lint over the
  Path 4 prose contract: STOP gates after verify failure,
  never-write-token rules, mode-aware CLAUDE.md block, bearer always via
  env-var.

Two new gate-tier E2E tests (deterministic stub HTTP server, fixed inputs):

- test/skill-e2e-setup-gbrain-remote.test.ts: Path 4 happy path. Stubs an
  HTTP MCP server, drives the skill via Agent SDK with a stubbed bearer,
  asserts claude.json gets the http MCP entry, CLAUDE.md gets the
  remote-http block, the secret token NEVER leaks to CLAUDE.md.
- test/skill-e2e-setup-gbrain-bad-token.test.ts: stub server returns 401;
  asserts the AUTH classifier hint surfaces, no MCP registration occurs,
  CLAUDE.md is unchanged. Regression guard for the "verify failed → STOP"
  rule.

touchfiles.ts: setup-gbrain-remote and setup-gbrain-bad-token added at
gate-tier so CI catches Path 4 regressions on every PR.

Plus a few comment refs flipped: bin/gstack-jsonl-merge,
bin/gstack-timeline-log (legacy gstack-brain-init mentions in headers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 06:26:45 +02:00 · 2026-05-06 10:15:23 -07:00
parent 935adf3a50
commit b0e0a76dca
11 changed files with 835 additions and 3 deletions
@@ -4,7 +4,7 @@
 # Usage (called by git, not by users):
 #   gstack-jsonl-merge <base> <ours> <theirs>
 #
-# Registered in local git config by bin/gstack-brain-init and
+# Registered in local git config by bin/gstack-artifacts-init and
 # bin/gstack-brain-restore:
 #   git config merge.jsonl-append.driver \
 #       "$GSTACK_BIN/gstack-jsonl-merge %O %A %B"
@@ -4,7 +4,7 @@
 #
 # Session timeline: local by default. If the user enables `artifacts_sync_mode`
 # with the `full` (not `artifacts-only`) privacy tier — via the first-run
-# stop-gate from `gstack-brain-init` or the preamble — timeline events are
+# stop-gate from `gstack-artifacts-init` or the preamble — timeline events are
 # published to the user's private GBrain sync repo. See docs/gbrain-sync.md.
 # Required fields: skill, event (started|completed).
 # Optional: branch, outcome, duration_s, session, ts.
@@ -176,3 +176,101 @@ the recovery path is:
   on the brain remote for hard-delete from history
 4. File a gitleaks issue with the pattern (or extend the gitleaks config
   at `~/.gitleaks.toml`).
+
+## Path 4: Remote MCP setup (v1.27.0.0+)
+
+If you don't run gbrain locally — you have a teammate or another machine
+running `gbrain serve` over HTTP, accessible via Tailscale, ngrok, or
+internal LAN — `/setup-gbrain` Path 4 is the one-paste flow.
+
+You provide:
+- The MCP URL (e.g., `https://wintermute.tail554574.ts.net:3131/mcp`)
+- A bearer token (issued by the brain admin via `gbrain access-token issue`)
+
+What `/setup-gbrain` does:
+1. Verifies the URL + token via `gstack-gbrain-mcp-verify`. Three failure
+   modes get classified with one-line remediation hints:
+   **NETWORK** ("check Tailscale/DNS"), **AUTH** ("rotate token"),
+   **MALFORMED** ("Accept-header gotcha — pass both `application/json`
+   AND `text/event-stream`").
+2. Registers the MCP at user scope:
+   ```
+   claude mcp add --scope user --transport http gbrain "$URL" \
+     --header "Authorization: Bearer $TOKEN"
+   ```
+3. Skips local install, local doctor, transcript ingest, and federated
+   source registration. All four require a local `gbrain` CLI that Path 4
+   doesn't install.
+4. Optionally provisions a `gstack-artifacts-$USER` private repo on
+   GitHub or GitLab and prints the one-line `gbrain sources add` command
+   for your brain admin to run on the brain host.
+
+### Token storage trade-off
+
+The bearer token lives in `~/.claude.json` (mode 0600), where Claude Code
+stores every MCP server's credentials. While `claude mcp add --header
+"Authorization: Bearer $TOKEN"` runs, the token sits in process argv for
+roughly 10ms, where a concurrently running `ps` can read it. The window
+is small but not zero.
+
+Mitigations we've considered:
+- **Stdin or env-var input form for headers** — would close the argv
+  window. As of Claude Code v1.0.x, the CLI doesn't expose either.
+  When it does, `/setup-gbrain` Path 4 will switch automatically.
+- **Keychain storage** — explicitly out of scope (the token's resting
+  state in `~/.claude.json` is the existing trust surface for every MCP
+  credential; expanding to Keychain would touch every MCP server, not
+  just gbrain).
+
+### Why Path 4 is "always print" for the brain-admin hookup
+
+`gstack-artifacts-init` always prints the `gbrain sources add` command
+labeled "Send this to your brain admin" — even when the user IS the
+brain admin (consistent UX, no mode-detection fragility).
+
+A previous design proposed probing whether the user's bearer has admin
+scope (via a benign MCP write call like `add_tag`) and auto-executing
+the source registration when scope was sufficient. The design review
+flagged that page-write doesn't actually prove source-management
+permission — those are different scopes in any sensible auth model.
+Until gbrain ships:
+- a `mcp__gbrain__whoami` capability tool that returns the bearer's
+  scope set, AND
+- a `mcp__gbrain__sources_add` MCP tool with admin-scope gating
+
+we always print the command rather than pretending we know who has
+permission to run it.
+
+### CLAUDE.md block in Path 4
+
+The Path 4 block differs from the local-stdio one. The token is **never**
+written to CLAUDE.md (many projects check CLAUDE.md into git). The block
+records the URL, the verified server version, the artifacts repo URL (if
+provisioned), and the per-repo trust policy.
+
+```markdown
+## GBrain Configuration (configured by /setup-gbrain)
+- Mode: remote-http
+- MCP URL: https://wintermute.tail554574.ts.net:3131/mcp
+- Server version: gbrain v0.27.1
+- Setup date: 2026-05-06
+- MCP registered: yes (user scope)
+- Token: stored in ~/.claude.json (do not commit; never written to CLAUDE.md)
+- Artifacts repo: github.com/garrytan/gstack-artifacts-garrytan (private)
+- Artifacts sync: artifacts-only
+- Current repo policy: read-write
+```
+
+### Token rotation
+
+Rotation happens server-side. When verify hits `AUTH` (e.g., the brain admin
+rotated the token), the helper says: "rotate token on the brain host, re-run
+/setup-gbrain." On wintermute, or wherever your gbrain server lives:
+
+```
+gbrain access-token rotate    # invalidates old, issues new
+```
+
+(See `gstack/setup-gbrain/SKILL.md.tmpl` for the full Path 4 flow plus
+the gbrain enhancement requests around scoped tokens that would let
+gstack auto-rotate in V2.)
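
The three failure classes named in step 1 of the Path 4 doc above can be pictured as a small pure function. This is a minimal sketch, not the actual `gstack-gbrain-mcp-verify` logic (which this diff does not show): `classifyVerifyFailure` and the status-code mapping are hypothetical. Only the class names, the hints, and the 401-means-AUTH behaviour exercised by the bad-token E2E test come from the commit.

```typescript
type ErrorClass = 'NETWORK' | 'AUTH' | 'MALFORMED';

interface VerifyFailure {
  errorClass: ErrorClass;
  hint: string;
}

// Hypothetical helper. `status` is the HTTP status code, or null when the
// request never produced a response (DNS failure, refused connection, timeout).
function classifyVerifyFailure(status: number | null): VerifyFailure | null {
  if (status === null) {
    return { errorClass: 'NETWORK', hint: 'check Tailscale/DNS' };
  }
  if (status === 401 || status === 403) {
    return { errorClass: 'AUTH', hint: 'rotate token' };
  }
  if (status >= 200 && status < 300) {
    return null; // verify succeeded, nothing to classify
  }
  // Assumption: any other status is reported as a malformed request.
  return {
    errorClass: 'MALFORMED',
    hint: 'pass both application/json AND text/event-stream in Accept',
  };
}
```

Returning `null` on success keeps the STOP decision at the call site a single check: any non-null result means no registration and no CLAUDE.md write.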
@@ -768,6 +768,22 @@ Before doing anything, check that /setup-gbrain has been run on this Mac.
 ~/.claude/skills/gstack/bin/gstack-gbrain-detect 2>/dev/null
 ```

+**Remote-MCP mode (Path 4 of /setup-gbrain):** if `gbrain_mcp_mode=remote-http`,
+this skill is a graceful no-op. The brain server's own indexing cadence
+handles code import + search refresh; this Mac doesn't run a local gbrain
+CLI to drive `gbrain sources add` / `sync --strategy code`. Print:
+
+> "Remote MCP detected (Path 4). /sync-gbrain is local-mode-only in V1.
+> Your brain server (`<host>` from claude.json) handles indexing on its own
+> cadence. If indexing seems stale, ping your brain admin or trigger a
+> manual sync there. To wire `/sync-gbrain` through MCP tools (when gbrain
+> ships `mcp__gbrain__sources_add` and friends), see the v1.27.0.0+
+> follow-on TODO."
+
+Then exit cleanly. Do NOT proceed to Step 2.
+
+For local-stdio mode and unconfigured states:
+
 If `gbrain_on_path=false` OR `gbrain_config_exists=false` OR CLAUDE.md does
 not contain `## GBrain Configuration (configured by /setup-gbrain)`, STOP and
 tell the user:
@@ -66,6 +66,22 @@ Before doing anything, check that /setup-gbrain has been run on this Mac.
 ~/.claude/skills/gstack/bin/gstack-gbrain-detect 2>/dev/null
 ```

+**Remote-MCP mode (Path 4 of /setup-gbrain):** if `gbrain_mcp_mode=remote-http`,
+this skill is a graceful no-op. The brain server's own indexing cadence
+handles code import + search refresh; this Mac doesn't run a local gbrain
+CLI to drive `gbrain sources add` / `sync --strategy code`. Print:
+
+> "Remote MCP detected (Path 4). /sync-gbrain is local-mode-only in V1.
+> Your brain server (`<host>` from claude.json) handles indexing on its own
+> cadence. If indexing seems stale, ping your brain admin or trigger a
+> manual sync there. To wire `/sync-gbrain` through MCP tools (when gbrain
+> ships `mcp__gbrain__sources_add` and friends), see the v1.27.0.0+
+> follow-on TODO."
+
+Then exit cleanly. Do NOT proceed to Step 2.
+
+For local-stdio mode and unconfigured states:
+
 If `gbrain_on_path=false` OR `gbrain_config_exists=false` OR CLAUDE.md does
 not contain `## GBrain Configuration (configured by /setup-gbrain)`, STOP and
 tell the user:
@@ -133,7 +133,14 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
  'plan-eng-finding-count':      ['plan-eng-review/**', 'scripts/resolvers/preamble.ts', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble/generate-completion-status.ts', 'test/helpers/claude-pty-runner.ts', 'test/skill-e2e-plan-eng-finding-count.test.ts'],
  'plan-design-finding-count':   ['plan-design-review/**', 'scripts/resolvers/preamble.ts', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble/generate-completion-status.ts', 'test/helpers/claude-pty-runner.ts', 'test/skill-e2e-plan-design-finding-count.test.ts'],
  'plan-devex-finding-count':    ['plan-devex-review/**', 'scripts/resolvers/preamble.ts', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble/generate-completion-status.ts', 'test/helpers/claude-pty-runner.ts', 'test/skill-e2e-plan-devex-finding-count.test.ts'],
-  'brain-privacy-gate':           ['scripts/resolvers/preamble/generate-brain-sync-block.ts', 'scripts/resolvers/preamble.ts', 'bin/gstack-brain-sync', 'bin/gstack-brain-init', 'bin/gstack-config', 'test/helpers/agent-sdk-runner.ts'],
+  'brain-privacy-gate':           ['scripts/resolvers/preamble/generate-brain-sync-block.ts', 'scripts/resolvers/preamble.ts', 'bin/gstack-brain-sync', 'bin/gstack-artifacts-init', 'bin/gstack-config', 'test/helpers/agent-sdk-runner.ts'],
+
+  // /setup-gbrain Path 4 (Remote MCP) — happy + bad-token end-to-end via
+  // Agent SDK. Gate-tier (deterministic stub server, fixed inputs); fires
+  // when the skill template, the verify helper, the artifacts-init helper,
+  // or the detect script changes.
+  'setup-gbrain-remote':          ['setup-gbrain/SKILL.md.tmpl', 'bin/gstack-gbrain-mcp-verify', 'bin/gstack-artifacts-init', 'bin/gstack-gbrain-detect', 'test/helpers/agent-sdk-runner.ts'],
+  'setup-gbrain-bad-token':       ['setup-gbrain/SKILL.md.tmpl', 'bin/gstack-gbrain-mcp-verify', 'test/helpers/agent-sdk-runner.ts'],

  // AskUserQuestion format regression (RECOMMENDATION + Completeness: N/10)
  // Fires when either template OR the two preamble resolvers change.
@@ -427,6 +434,12 @@ export const E2E_TIERS: Record<string, 'gate' | 'periodic'> = {
  // costs ~$0.30-$0.50 per run, not needed on every commit)
  'brain-privacy-gate': 'periodic',

+  // /setup-gbrain Path 4 (Remote MCP) — gate-tier. Stub HTTP server is
+  // deterministic; Path 4's STOP gates are the failure mode this catches
+  // (token in CLAUDE.md, partial registration on bad bearer).
+  'setup-gbrain-remote': 'gate',
+  'setup-gbrain-bad-token': 'gate',
+
  // AskUserQuestion format regression — periodic (Opus 4.7 non-deterministic benchmark)
  'plan-ceo-review-format-mode': 'periodic',
  'plan-ceo-review-format-approach': 'periodic',
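
How the two maps above combine to pick tests for a PR can be sketched as follows. The real selection code is not part of this diff, so `matchesTouchfile` and `selectTests` are hypothetical names, and the glob handling (only a trailing `/**` is understood) is a simplifying assumption.

```typescript
type Tier = 'gate' | 'periodic';

// Hypothetical matcher: "dir/**" matches anything under dir/; every other
// pattern must equal the changed path exactly.
function matchesTouchfile(pattern: string, file: string): boolean {
  if (pattern.endsWith('/**')) {
    return file.startsWith(pattern.slice(0, -2)); // keeps the trailing slash
  }
  return pattern === file;
}

// Return the names of tests in `tier` whose touchfiles intersect the change set.
function selectTests(
  changedFiles: string[],
  touchfiles: Record<string, string[]>,
  tiers: Record<string, Tier>,
  tier: Tier
): string[] {
  return Object.keys(touchfiles).filter(
    (name) =>
      tiers[name] === tier &&
      (touchfiles[name] ?? []).some((pattern) =>
        changedFiles.some((file) => matchesTouchfile(pattern, file))
      )
  );
}
```

Under this sketch, a PR that touches only `bin/gstack-artifacts-init` would select `setup-gbrain-remote` at gate tier, while `brain-privacy-gate` (which lists the same file) stays out because it is periodic.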
@@ -0,0 +1,120 @@
+/**
+ * Regression: no stale `gstack-brain-init` or `gbrain_sync_mode`
+ * references survive the v1.27.0.0 rename.
+ *
+ * Per codex Findings #1 + #8 + #9: the rename's blast radius is wider than
+ * the obvious bin/ + scripts/ surface. This test grep-scans bin/, scripts/,
+ * the *.tmpl sources listed in SCAN_PATHS, and test/ for the deprecated
+ * identifiers and fails CI if any callers were missed. Generated *.md
+ * output is covered by test/post-rename-doc-regen.test.ts, and the legacy
+ * `~/.gstack-brain-remote.txt` path is deliberately not scanned because
+ * several scripts keep it as a migration-window fallback.
+ *
+ * Allowlist: the migration script (`gstack-upgrade/migrations/v1.27.0.0.sh`)
+ * legitimately references the old names — it's the rename actor itself.
+ * Old migration scripts (v1.17.0.0.sh and similar) reference the old names
+ * for their own historical context and are also allowlisted.
+ *
+ * The test is mechanical: if you find yourself adding a non-historical
+ * file to the allowlist, you probably need to actually fix the rename
+ * instead.
+ */
+
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
+import { spawnSync } from 'child_process';
+
+const ROOT = path.resolve(import.meta.dir, '..');
+
+const ALLOWLIST = [
+  // The migration script that performs the rename. Self-references are expected.
+  'gstack-upgrade/migrations/v1.27.0.0.sh',
+  // Older migration scripts — historical references; these document past state.
+  'gstack-upgrade/migrations/v1.17.0.0.sh',
+  // The migration test itself — it asserts on the migration's behavior.
+  'test/migrations-v1.27.0.0.test.ts',
+  // The test for the v1.17.0.0 historical migration.
+  'test/gstack-upgrade-migration-v1_17_0_0.test.ts',
+  // CHANGELOG entries describe historical state by their nature.
+  'CHANGELOG.md',
+  // TODOS may reference past or future states by name.
+  'TODOS.md',
+  // The plan file for v1.27.0.0 documents why we're renaming.
+  '.context/plans/setup-gbrain-remote-mcp-rename-brain-artifacts.md',
+  // The bin/gstack-config comment explicitly preserves the rename note.
+  'bin/gstack-config',
+  // Detect script's "renamed in v1.27.0.0" comment + brain-remote-fallback path.
+  'bin/gstack-gbrain-detect',
+  // brain-restore + source-wireup keep the old file as a migration-window fallback
+  // (read both, prefer artifacts). brain-uninstall has the same fallback.
+  'bin/gstack-brain-restore',
+  'bin/gstack-gbrain-source-wireup',
+  'bin/gstack-brain-uninstall',
+  // The preamble resolver reads the legacy file as a fallback during the
+  // migration window — same pattern.
+  'scripts/resolvers/preamble/generate-brain-sync-block.ts',
+  // gstack-upgrade.test.ts may exercise old migration behavior.
+  'test/gstack-upgrade.test.ts',
+  // This test itself references the patterns to grep for.
+  'test/no-stale-gstack-brain-refs.test.ts',
+  // memory.md documents the rename context.
+  'setup-gbrain/memory.md',
+  // The new init script's header comment intentionally cites the rename.
+  'bin/gstack-artifacts-init',
+  // The replacement test mirrors the pattern of the old test (lineage note).
+  'test/gstack-artifacts-init.test.ts',
+  // The post-rename-doc-regen test references the patterns it greps for.
+  'test/post-rename-doc-regen.test.ts',
+  // The Path 4 structural lint references some legacy names in comments.
+  'test/setup-gbrain-path4-structure.test.ts',
+  // Generated docs embed the preamble bash (which has the fallback), but they
+  // need no entry here: SCAN_PATHS lists template sources, not generated output.
+];
+
+const FORBIDDEN_PATTERNS = [
+  'gstack-brain-init',
+  'gbrain_sync_mode',
+];
+
+const SCAN_PATHS = [
+  'bin/',
+  'scripts/',
+  'setup-gbrain/SKILL.md.tmpl',
+  'sync-gbrain/SKILL.md.tmpl',
+  'health/SKILL.md.tmpl',
+  'plan-eng-review/SKILL.md.tmpl',
+  'plan-ceo-review/SKILL.md.tmpl',
+  'review/SKILL.md.tmpl',
+  'ship/SKILL.md.tmpl',
+  'test/',
+];
+
+function grepRefs(pattern: string): string[] {
+  const args = ['-rn', '--', pattern, ...SCAN_PATHS.map((p) => path.join(ROOT, p))];
+  const r = spawnSync('grep', args, { encoding: 'utf-8' });
+  // grep exits 1 when no matches — that's fine for our purposes.
+  const lines = (r.stdout || '').split('\n').filter((l) => l.trim().length > 0);
+  return lines
+    .map((line) => {
+      // Strip ROOT prefix to get repo-relative path.
+      const colon = line.indexOf(':');
+      const file = line.slice(0, colon);
+      return path.relative(ROOT, file);
+    })
+    .filter((file) => !ALLOWLIST.includes(file))
+    // Filter out any file that's inside a directory we don't actually scan.
+    .filter((file) => !file.startsWith('node_modules/') && !file.startsWith('.git/'));
+}
+
+describe('no stale gstack-brain refs (v1.27.0.0 rename)', () => {
+  for (const pattern of FORBIDDEN_PATTERNS) {
+    test(`no non-allowlisted references to "${pattern}"`, () => {
+      const offenders = [...new Set(grepRefs(pattern))];
+      if (offenders.length > 0) {
+        console.error(`Found stale "${pattern}" references in:\n${offenders.map((f) => `  - ${f}`).join('\n')}`);
+        console.error(
+          `If a file is intentionally referencing the old name (migration, historical doc, fallback path), add it to ALLOWLIST in this test.`
+        );
+      }
+      expect(offenders).toEqual([]);
+    });
+  }
+});
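
The parsing step inside `grepRefs` can be isolated into a pure function, which makes the allowlist logic checkable without spawning `grep`. A minimal sketch under that assumption; `offendersFromGrep` is a hypothetical name, not something this commit adds, and it expects `grep -rn` style lines of the form `<file>:<line>:<text>`.

```typescript
// Turn `grep -rn` output into a de-duplicated list of repo-relative files
// that are not on the allowlist.
function offendersFromGrep(stdout: string, root: string, allowlist: string[]): string[] {
  const prefix = root.endsWith('/') ? root : root + '/';
  const files = new Set<string>();
  for (const line of stdout.split('\n')) {
    const colon = line.indexOf(':');
    if (colon <= 0) continue; // blank line, or no "<file>:" separator at all
    const abs = line.slice(0, colon);
    // Strip the repo root so entries compare against allowlist paths.
    const rel = abs.startsWith(prefix) ? abs.slice(prefix.length) : abs;
    if (!allowlist.includes(rel)) files.add(rel);
  }
  return Array.from(files);
}
```

Skipping lines without a separator is the one behavioural difference from the test above, where such a line would be sliced at index -1 and yield a truncated path.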
@@ -0,0 +1,74 @@
+// Post-rename doc-regen regression: after `bun run gen:skill-docs`, no
+// `gstack-brain-init` or `gbrain_sync_mode` strings appear in any of the
+// generated SKILL.md files (the cross-product blind spot codex
+// Finding #12 flagged).
+//
+// The check runs against the canonical claude-host output already on
+// disk. We don't shell out to gen-skill-docs again; the existing
+// freshness check in gen-skill-docs.test.ts covers that. This test
+// just verifies the rename actually propagated to the generated
+// artifacts that users see.
+
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
+
+const ROOT = path.resolve(import.meta.dir, '..');
+
+const FORBIDDEN_PATTERNS = [
+  // Bare identifier — should NEVER appear in generated docs (if it does,
+  // a template still has the old call site).
+  /^.*\bgstack-brain-init\b.*$/m,
+  /^.*\bgbrain_sync_mode\b.*$/m,
+];
+
+// Per the preamble resolver: generated docs DO contain the
+// "~/.gstack-brain-remote.txt" string in the migration-window fallback. We
+// don't grep for that — it's intentional. We grep for the call-site
+// identifiers only.
+
+function findSkillMdFiles(): string[] {
+  const skillMd = path.join(ROOT, 'SKILL.md');
+  const files: string[] = [skillMd];
+  // Top-level skill directories with their own SKILL.md.
+  const entries = fs.readdirSync(ROOT, { withFileTypes: true });
+  for (const e of entries) {
+    if (e.isDirectory() && !e.name.startsWith('.') && !['node_modules', 'test'].includes(e.name)) {
+      const inner = path.join(ROOT, e.name, 'SKILL.md');
+      if (fs.existsSync(inner)) files.push(inner);
+    }
+  }
+  return files;
+}
+
+describe('post-rename doc-regen regression (codex Finding #12)', () => {
+  test('no generated SKILL.md contains "gstack-brain-init"', () => {
+    const offenders: string[] = [];
+    for (const file of findSkillMdFiles()) {
+      const content = fs.readFileSync(file, 'utf-8');
+      const m = content.match(/^.*\bgstack-brain-init\b.*$/m);
+      if (m) offenders.push(`${path.relative(ROOT, file)}: ${m[0].slice(0, 100)}`);
+    }
+    if (offenders.length > 0) {
+      console.error(`Stale "gstack-brain-init" in generated SKILL.md files:\n${offenders.map((o) => '  ' + o).join('\n')}`);
+    }
+    expect(offenders).toEqual([]);
+  });
+
+  test('no generated SKILL.md contains "gbrain_sync_mode"', () => {
+    const offenders: string[] = [];
+    for (const file of findSkillMdFiles()) {
+      const content = fs.readFileSync(file, 'utf-8');
+      const m = content.match(/^.*\bgbrain_sync_mode\b.*$/m);
+      if (m) offenders.push(`${path.relative(ROOT, file)}: ${m[0].slice(0, 100)}`);
+    }
+    if (offenders.length > 0) {
+      console.error(`Stale "gbrain_sync_mode" in generated SKILL.md files:\n${offenders.map((o) => '  ' + o).join('\n')}`);
+    }
+    expect(offenders).toEqual([]);
+  });
+
+  test('top-level SKILL.md exists and is regenerated', () => {
+    expect(fs.existsSync(path.join(ROOT, 'SKILL.md'))).toBe(true);
+  });
+});
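
In the file above, `FORBIDDEN_PATTERNS` is declared but the two tests inline their own copies of the regexes. One way to drive both checks from a single list is sketched below; `firstStaleLines` is a hypothetical helper, and building the regex from a plain identifier string is an assumption about how one might restructure the test, not what the commit does.

```typescript
// For each identifier, return the first line that contains it as a whole
// word, truncated to 100 characters like the original error output.
function firstStaleLines(content: string, identifiers: string[]): Record<string, string> {
  const hits: Record<string, string> = {};
  for (const id of identifiers) {
    // Escape regex metacharacters so the identifier is matched literally.
    const escaped = id.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const m = content.match(new RegExp(`^.*\\b${escaped}\\b.*$`, 'm'));
    if (m) hits[id] = m[0].slice(0, 100);
  }
  return hits;
}
```

A caller could then loop over the generated SKILL.md files once and assert that every returned record is empty, instead of repeating the read-and-match block per identifier.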
@@ -0,0 +1,133 @@
+// setup-gbrain Path 4 structural lint.
+//
+// Verifies the SKILL.md.tmpl has the prose contract that Path 4 (Remote MCP)
+// depends on: STOP gates after verify failures, never-write-token rules,
+// mode-aware CLAUDE.md block, idempotent re-run path.
+//
+// Why a structural test instead of a full Agent SDK E2E:
+//   - Side effects (claude.json mutation, MCP registration) are covered
+//     by unit tests for gstack-gbrain-mcp-verify and gstack-artifacts-init.
+//   - The structural prose is the source of regressions for AUQ pacing
+//     (the failure mode the gstack repo has tracked since v1.26.x:
+//     "wrote_findings_before_asking"). A grep-based regression on the
+//     template prose is fast (<200ms), free, and catches the same drift
+//     as the paid E2E without spending tokens.
+//   - The full Agent SDK E2E remains the right tool for end-to-end
+//     pacing eval; this is the gate-tier check that catches the failure
+//     class deterministically.
+
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
+
+const ROOT = path.resolve(import.meta.dir, '..');
+const TMPL = path.join(ROOT, 'setup-gbrain', 'SKILL.md.tmpl');
+
+const tmpl = fs.readFileSync(TMPL, 'utf-8');
+
+describe('setup-gbrain Path 4 (Remote MCP) — structural contract', () => {
+  test('Step 2 lists Path 4 as one of the path options', () => {
+    // The heading reads "4 <em-dash> Remote gbrain MCP"; the `.` in the regex
+    // matches that em-dash (U+2014, a single code point).
+    expect(tmpl).toMatch(/\*\*4 . Remote gbrain MCP/);
+  });
+
+  test('Step 4 has a Path 4 sub-section', () => {
+    expect(tmpl).toMatch(/### Path 4 \(Remote gbrain MCP/);
+  });
+
+  test('Step 4 collects the bearer via read_secret_to_env, never argv', () => {
+    // The secret-read helper is the canonical token-capture pattern.
+    // Without it, tokens land in shell history.
+    expect(tmpl).toContain('read_secret_to_env GBRAIN_MCP_TOKEN');
+  });
+
+  test('Step 4c invokes gstack-gbrain-mcp-verify and STOPs on failure', () => {
+    expect(tmpl).toContain('gstack-gbrain-mcp-verify');
+    // The STOP rule is what prevents partial registration after auth fail.
+    const path4Section = tmpl.split('### Path 4')[1] || '';
+    expect(path4Section).toMatch(/STOP/);
+  });
+
+  test('Step 4d explicitly skips Steps 3, 4 (other paths), 5, 7.5 in remote mode', () => {
+    expect(tmpl).toMatch(/4d.*[Ss]kip Steps? 3, 4.*5.*7\.5/s);
+  });
+
+  test('Step 5a has a Path 4 branch with claude mcp add --transport http', () => {
+    expect(tmpl).toMatch(/Path 4 \(Remote MCP/);
+    expect(tmpl).toMatch(/claude mcp add --scope user --transport http gbrain/);
+    expect(tmpl).toContain('Authorization: Bearer $GBRAIN_MCP_TOKEN');
+    // Token must be unset after registration so it doesn't linger in env.
+    expect(tmpl).toMatch(/unset GBRAIN_MCP_TOKEN/);
+  });
+
+  test('Step 5a removes any prior gbrain registration before adding the new one', () => {
+    // Otherwise local-stdio + remote-http coexist, which breaks routing.
+    expect(tmpl).toMatch(/claude mcp remove gbrain/);
+  });
+
+  test('Step 7 calls gstack-artifacts-init with --url-form-supported flag', () => {
+    expect(tmpl).toMatch(/gstack-artifacts-init.*--url-form-supported/);
+  });
+
+  test('Step 8 CLAUDE.md block branches on mode', () => {
+    // The remote-http block has Mode: remote-http; local-stdio block has Engine:.
+    expect(tmpl).toMatch(/### Path 4 \(Remote MCP\)/);
+    expect(tmpl).toMatch(/Mode: remote-http/);
+    expect(tmpl).toMatch(/Mode: local-stdio/);
+  });
+
+  test('Step 8 explicitly says the bearer is never written to CLAUDE.md', () => {
+    // Token-leak regression guard. CLAUDE.md is committed in many projects.
+    expect(tmpl).toMatch(/bearer token is \*\*never\*\* written to CLAUDE\.md/);
+  });
+
+  test('Step 9 smoke test on Path 4 prints a placeholder, never the real token', () => {
+    // Don't paste the token into the curl example the user might share.
+    expect(tmpl).toMatch(/<YOUR_TOKEN>/);
+  });
+
+  test('Step 10 verdict block has a remote-http variant separate from local-stdio', () => {
+    expect(tmpl).toMatch(/### Path 4 \(Remote MCP\)/);
+    expect(tmpl).toMatch(/mode: remote-http/);
+    expect(tmpl).toMatch(/N\/A.*remote mode/);
+  });
+
+  test('idempotency: re-running with gbrain_mcp_mode=remote-http skips Step 2', () => {
+    // Re-run path stays graceful; no double-registration.
+    expect(tmpl).toMatch(/gbrain_mcp_mode=remote-http/);
+  });
+
+  test('Step 5 (local doctor) explicitly skips on Path 4', () => {
+    expect(tmpl).toMatch(/SKIP entirely on Path 4 \(Remote MCP\)/);
+  });
+
+  test('Step 7.5 (transcript ingest) explicitly skips on Path 4', () => {
+    // Transcript ingest needs local gbrain CLI which Path 4 doesn't install.
+    const matches = tmpl.match(/SKIP entirely on Path 4 \(Remote MCP\)/g);
+    expect(matches?.length).toBeGreaterThanOrEqual(2);
+  });
+});
+
+describe('setup-gbrain Path 4 — token security regressions', () => {
+  test('the template never inlines a real-shaped bearer string', () => {
+    // We never want a literal "gbrain_<hex>" token to appear in the
+    // template — placeholders only. This catches the failure mode where
+    // someone copies a real token into the template by accident.
+    const realTokenShape = /gbrain_[a-f0-9]{40,}/;
+    expect(tmpl).not.toMatch(realTokenShape);
+  });
+
+  test('Path 4 always uses env-var $GBRAIN_MCP_TOKEN, never inline strings', () => {
+    // Find every reference to the bearer header in Path 4 and verify it's
+    // either an env-var expansion or an explicit placeholder. Allow:
+    //   - $GBRAIN_MCP_TOKEN  (env-var expansion)
+    //   - <bearer>, <YOUR_TOKEN>, <TOKEN>  (placeholder)
+    //   - "..."  (rest-of-doc-text continuation; a doc note showing how
+    //     `claude mcp add --header` shapes its argv).
+    const path4Section = tmpl.match(/### Path 4 \(Remote MCP[\s\S]*?(?=###|## )/g)?.join('') || '';
+    const bearerLines = path4Section.match(/Bearer\s+\S+/g) || [];
+    for (const line of bearerLines) {
+      expect(line).toMatch(/Bearer (\$GBRAIN_MCP_TOKEN|<bearer>|<YOUR_TOKEN>|<TOKEN>|\.\.\."?)/);
+    }
+  });
+});
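
The last test asserts inside a loop, so a failure reports only the first bad `Bearer` reference. A variant that collects every offender first is sketched here; `disallowedBearerRefs` and `ALLOWED_BEARER` are hypothetical names, and anchoring the allow-pattern with `^` is a small tightening over the unanchored regex in the test.

```typescript
// Values allowed after "Bearer ": the env-var expansion, a placeholder,
// or a "..." continuation.
const ALLOWED_BEARER = /^Bearer (\$GBRAIN_MCP_TOKEN|<bearer>|<YOUR_TOKEN>|<TOKEN>|\.\.\.)/;

// Return every "Bearer <something>" reference in the section whose value
// is not on the allowed list.
function disallowedBearerRefs(section: string): string[] {
  const refs = section.match(/Bearer\s+\S+/g) ?? [];
  return refs.filter((ref) => !ALLOWED_BEARER.test(ref));
}
```

With this shape the test body reduces to one `expect(disallowedBearerRefs(path4Section)).toEqual([])`, and the failure diff lists all offending references at once.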
@@ -0,0 +1,148 @@
+// E2E: /setup-gbrain Path 4 with a bad bearer token via Agent SDK.
+//
+// Drives the skill against a stub HTTP MCP server that returns 401
+// (auth-shape body). Asserts that the AUTH classifier hint shows up
+// AND no MCP registration happens (no claude mcp add --transport http
+// in the call log; no half-written CLAUDE.md block). This is the
+// regression guard for the "verify failed → STOP" gate.
+//
+// Cost: ~$0.30-$0.50 per run. Gate-tier (EVALS=1 EVALS_TIER=gate).
+
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as os from 'os';
+import * as path from 'path';
+import * as http from 'http';
+import { runAgentSdkTest, passThroughNonAskUserQuestion, resolveClaudeBinary } from './helpers/agent-sdk-runner';
+
+const shouldRun = !!process.env.EVALS && (process.env.EVALS_TIER === 'gate' || !process.env.EVALS_TIER);
+const describeE2E = shouldRun ? describe : describe.skip;
+
+function startStub401(): Promise<{ url: string; close: () => Promise<void> }> {
+  return new Promise((resolve) => {
+    const server = http.createServer((req, res) => {
+      let body = '';
+      req.on('data', (c) => (body += c));
+      req.on('end', () => {
+        res.statusCode = 401;
+        res.setHeader('Content-Type', 'application/json');
+        res.end(
+          JSON.stringify({ error: 'unauthorized', error_description: 'invalid or expired auth token' })
+        );
+      });
+    });
+    server.listen(0, '127.0.0.1', () => {
+      const addr = server.address();
+      if (!addr || typeof addr === 'string') throw new Error('no address');
+      resolve({
+        url: `http://127.0.0.1:${addr.port}/mcp`,
+        close: () => new Promise((r) => server.close(() => r())),
+      });
+    });
+  });
+}
+
+function makeFakeClaude(fakeBinDir: string): string {
+  const callLog = path.join(fakeBinDir, 'claude-calls.log');
+  const script = `#!/bin/bash
+echo "claude $@" >> "${callLog}"
+case "$1 $2" in
+  "mcp add") exit 0 ;;
+  "mcp list") echo "no gbrain" ; exit 0 ;;
+  "mcp remove") exit 0 ;;
+  "mcp get") exit 1 ;;
+esac
+exit 0
+`;
+  fs.writeFileSync(path.join(fakeBinDir, 'claude'), script, { mode: 0o755 });
+  return callLog;
+}
+
+describeE2E('/setup-gbrain Path 4 — bad token STOPs cleanly', () => {
+  test('AUTH classifier fires, no MCP registration, no CLAUDE.md mutation', async () => {
+    const stubServer = await startStub401();
+    const gstackHome = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-bad-'));
+    const fakeBinDir = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-bad-bin-'));
+    const callLog = makeFakeClaude(fakeBinDir);
+
+    const ORIGINAL_CLAUDE_MD = '# Test project\n\nSome existing content here.\n';
+    fs.writeFileSync(path.join(gstackHome, 'CLAUDE.md'), ORIGINAL_CLAUDE_MD);
+
+    const BAD_TOKEN = 'gbrain_BAD_TOKEN_67890_DELIBERATELY_INVALID';
+    const askUserQuestions: Array<{ input: Record<string, unknown> }> = [];
+    const binary = resolveClaudeBinary();
+
+    const orig = {
+      gstackHome: process.env.GSTACK_HOME,
+      pathEnv: process.env.PATH,
+      mcpToken: process.env.GBRAIN_MCP_TOKEN,
+    };
+    process.env.GSTACK_HOME = gstackHome;
+    process.env.PATH = `${fakeBinDir}:${path.join(path.resolve(import.meta.dir, '..'), 'bin')}:${process.env.PATH ?? '/usr/bin:/bin:/opt/homebrew/bin'}`;
+    process.env.GBRAIN_MCP_TOKEN = BAD_TOKEN;
+
+    let modelTextOutput = '';
+
+    try {
+      const skillPath = path.resolve(import.meta.dir, '..', 'setup-gbrain', 'SKILL.md');
+      const result = await runAgentSdkTest({
+        systemPrompt: { type: 'preset', preset: 'claude_code' },
+        userPrompt:
+          `Read the skill file at ${skillPath} and follow Path 4 (Remote MCP) only. ` +
+          `Use this MCP URL: ${stubServer.url}. ` +
+          `The bearer token is already in the GBRAIN_MCP_TOKEN env var. ` +
+          `If verify fails (Step 4c), follow the skill's STOP rule — surface the error and stop. ` +
+          `Do NOT register the MCP if verify failed. ` +
+          `Do NOT modify CLAUDE.md if verify failed.`,
+        workingDirectory: gstackHome,
+        maxTurns: 15,
+        allowedTools: ['Read', 'Grep', 'Glob', 'Bash', 'Write', 'Edit'],
+        ...(binary ? { pathToClaudeCodeExecutable: binary } : {}),
+        canUseTool: async (toolName, input) => {
+          if (toolName === 'AskUserQuestion') {
+            askUserQuestions.push({ input });
+            const q = (input.questions as Array<{
+              question: string;
+              options: Array<{ label: string }>;
+            }>)[0];
+            // Word boundaries keep "no" from matching inside labels such as "Not now".
+            const decline = q.options.find((o) => /\b(?:skip|decline|no)\b/i.test(o.label)) ?? q.options[0]!;
+            return {
+              behavior: 'allow',
+              updatedInput: { questions: input.questions, answers: { [q.question]: decline.label } },
+            };
+          }
+          return passThroughNonAskUserQuestion(toolName, input);
+        },
+      });
+
+      modelTextOutput = JSON.stringify(result);
+
+      // Assertion 1: the AUTH classifier hint surfaced somewhere in the run.
+      // The verify helper outputs `"error_class": "AUTH"` and the hint
+      // "rotate token on the brain host" — at least one should be visible.
+      const hintShown =
+        /error_class.*AUTH/i.test(modelTextOutput) ||
+        /rotate token/i.test(modelTextOutput) ||
+        /AUTH.*HTTP 401/i.test(modelTextOutput);
+      expect(hintShown).toBe(true);
+
+      // Assertion 2: claude mcp add was NEVER called (verify failed → STOP).
+      const calls = fs.existsSync(callLog) ? fs.readFileSync(callLog, 'utf-8') : '';
+      expect(calls).not.toMatch(/mcp add.*--transport http/);
+
+      // Assertion 3: CLAUDE.md is unchanged (no half-written block).
+      const finalClaudeMd = fs.readFileSync(path.join(gstackHome, 'CLAUDE.md'), 'utf-8');
+      expect(finalClaudeMd).toBe(ORIGINAL_CLAUDE_MD);
+
+      // Assertion 4: the bad token never leaked to CLAUDE.md.
+      expect(finalClaudeMd).not.toContain(BAD_TOKEN);
+    } finally {
+      if (orig.gstackHome === undefined) delete process.env.GSTACK_HOME; else process.env.GSTACK_HOME = orig.gstackHome;
+      if (orig.pathEnv === undefined) delete process.env.PATH; else process.env.PATH = orig.pathEnv;
+      if (orig.mcpToken === undefined) delete process.env.GBRAIN_MCP_TOKEN; else process.env.GBRAIN_MCP_TOKEN = orig.mcpToken;
+      await stubServer.close();
+      fs.rmSync(gstackHome, { recursive: true, force: true });
+      fs.rmSync(fakeBinDir, { recursive: true, force: true });
+    }
+  }, 240_000);
+});
@@ -0,0 +1,214 @@
+// E2E: /setup-gbrain Path 4 (Remote MCP) happy path via Agent SDK.
+//
+// Drives the skill against a stub HTTP MCP server and a stubbed `claude`
+// binary that records `claude mcp add` calls. Asserts:
+//   - The verify helper succeeds (no AUTH/MALFORMED/NETWORK error in output)
+//   - The skill calls `claude mcp add --transport http`
+//   - CLAUDE.md gets the `Mode: remote-http` block and NEVER contains the token
+//   - The wrote_findings_before_asking failure mode is NOT triggered
+//
+// Cost: ~$0.30-$0.50 per run. Gate-tier (EVALS=1 EVALS_TIER=gate).
+//
+// See setup-gbrain/SKILL.md.tmpl Step 4 (Path 4) for the contract under test.
+
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as os from 'os';
+import * as path from 'path';
+import * as http from 'http';
+import { runAgentSdkTest, passThroughNonAskUserQuestion, resolveClaudeBinary } from './helpers/agent-sdk-runner';
+
+const shouldRun = !!process.env.EVALS && (process.env.EVALS_TIER === 'gate' || !process.env.EVALS_TIER);
+const describeE2E = shouldRun ? describe : describe.skip;
+
+// Spin up a stub MCP server that answers initialize + tools/list, or fails every POST when failWithStatus is set.
+function startStubMcpServer(opts: { failWithStatus?: number; failBody?: string } = {}): Promise<{ url: string; close: () => Promise<void> }> {
+  return new Promise((resolve) => {
+    const server = http.createServer((req, res) => {
+      if (req.method !== 'POST' || !(req.url ?? '').endsWith('/mcp')) {
+        res.statusCode = 404;
+        res.end();
+        return;
+      }
+      let body = '';
+      req.on('data', (c) => (body += c));
+      req.on('end', () => {
+        if (opts.failWithStatus) {
+          res.statusCode = opts.failWithStatus;
+          res.setHeader('Content-Type', 'application/json');
+          res.end(opts.failBody ?? JSON.stringify({ error: 'fail' }));
+          return;
+        }
+        const reqJson = (() => {
+          try { return JSON.parse(body); } catch { return {} as any; }
+        })();
+        let respBody: any;
+        if (reqJson.method === 'initialize') {
+          respBody = {
+            result: {
+              protocolVersion: '2024-11-05',
+              capabilities: { tools: {} },
+              serverInfo: { name: 'gbrain', version: '0.27.1' },
+            },
+            jsonrpc: '2.0',
+            id: reqJson.id,
+          };
+        } else if (reqJson.method === 'tools/list') {
+          respBody = { result: { tools: [{ name: 'search' }, { name: 'put_page' }] }, jsonrpc: '2.0', id: reqJson.id };
+        } else {
+          respBody = { error: { code: -32601, message: 'unknown method' }, jsonrpc: '2.0', id: reqJson.id };
+        }
+        // Respond in SSE framing: the verify helper accepts both plain JSON and
+        // SSE, and many MCP servers (including wintermute) wrap responses as SSE.
+        res.statusCode = 200;
+        res.setHeader('Content-Type', 'text/event-stream');
+        res.end(`event: message\ndata: ${JSON.stringify(respBody)}\n\n`);
+      });
+    });
+    server.listen(0, '127.0.0.1', () => {
+      const addr = server.address();
+      if (!addr || typeof addr === 'string') throw new Error('no address');
+      resolve({
+        url: `http://127.0.0.1:${addr.port}/mcp`,
+        close: () => new Promise((r) => server.close(() => r())),
+      });
+    });
+  });
+}
+
+// Stubbed `claude` binary: intercepts the `mcp add`, `mcp list`, `mcp remove`,
+// and `mcp get` subcommands so the skill's Step 5a registration appears to
+// succeed, while recording every invocation for assertions.
+function makeFakeClaude(fakeBinDir: string): string {
+  const claudeJsonPath = path.join(fakeBinDir, 'claude.json');
+  const callLog = path.join(fakeBinDir, 'claude-calls.log');
+  const script = `#!/bin/bash
+echo "claude $*" >> "${callLog}"
+case "$1 $2" in
+  "mcp add")
+    touch "${claudeJsonPath}"  # Call is already logged; mark the entry registered so later mcp get calls exit 0.
+    exit 0
+    ;;
+  "mcp list")
+    echo "gbrain: http://127.0.0.1:0/mcp (HTTP) - ✓ Connected"
+    exit 0
+    ;;
+  "mcp remove")
+    exit 0
+    ;;
+  "mcp get")
+    # First few calls return "no entry"; after mcp add fires, return success.
+    if [ -f "${claudeJsonPath}" ]; then
+      cat "${claudeJsonPath}"
+      exit 0
+    fi
+    exit 1
+    ;;
+esac
+exit 0
+`;
+  fs.writeFileSync(path.join(fakeBinDir, 'claude'), script, { mode: 0o755 });
+  return callLog;
+}
+
+describeE2E('/setup-gbrain Path 4 (Remote MCP) — happy path', () => {
+  test('verifies, registers HTTP MCP, never writes token to CLAUDE.md', async () => {
+    const stubServer = await startStubMcpServer();
+    const gstackHome = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-remote-'));
+    const fakeBinDir = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-remote-bin-'));
+    const callLog = makeFakeClaude(fakeBinDir);
+
+    // The skill writes CLAUDE.md in cwd. Use gstackHome as cwd so we
+    // can inspect it after the run.
+    fs.writeFileSync(path.join(gstackHome, 'CLAUDE.md'), '# Test project\n');
+
+    const SECRET_TOKEN = 'gbrain_TEST_TOKEN_THAT_MUST_NEVER_LEAK_84613';
+    const askUserQuestions: Array<{ input: Record<string, unknown> }> = [];
+    const binary = resolveClaudeBinary();
+
+    // Ambient env mutations. Restored in finally.
+    const orig = {
+      gstackHome: process.env.GSTACK_HOME,
+      pathEnv: process.env.PATH,
+      mcpToken: process.env.GBRAIN_MCP_TOKEN,
+    };
+    process.env.GSTACK_HOME = gstackHome;
+    process.env.PATH = `${fakeBinDir}:${path.join(path.resolve(import.meta.dir, '..'), 'bin')}:${process.env.PATH ?? '/usr/bin:/bin:/opt/homebrew/bin'}`;
+    process.env.GBRAIN_MCP_TOKEN = SECRET_TOKEN;
+
+    let modelTextOutput = '';
+
+    try {
+      const skillPath = path.resolve(import.meta.dir, '..', 'setup-gbrain', 'SKILL.md');
+      const result = await runAgentSdkTest({
+        systemPrompt: { type: 'preset', preset: 'claude_code' },
+        userPrompt:
+          `Read the skill file at ${skillPath} and follow Path 4 (Remote MCP) only. ` +
+          `Use this MCP URL: ${stubServer.url}. ` +
+          `The bearer token is already in the GBRAIN_MCP_TOKEN env var (do not echo it). ` +
+          `Skip the privacy gate — answer "Decline" if the preamble fires. ` +
+          `Skip the artifacts-repo provisioning step (Step 7) — answer "No thanks". ` +
+          `Skip per-remote policy (Step 6) — answer "skip-for-now". ` +
+          `Walk through Steps 4a, 4b, 4c, 5a, 8, 10 ONLY.`,
+        workingDirectory: gstackHome,
+        maxTurns: 25,
+        allowedTools: ['Read', 'Grep', 'Glob', 'Bash', 'Write', 'Edit'],
+        ...(binary ? { pathToClaudeCodeExecutable: binary } : {}),
+        canUseTool: async (toolName, input) => {
+          if (toolName === 'AskUserQuestion') {
+            askUserQuestions.push({ input });
+            const q = (input.questions as Array<{
+              question: string;
+              options: Array<{ label: string }>;
+            }>)[0];
+            // Auto-decline or skip every prompt, falling back to the last option.
+            // The path pick is covered by the user prompt, which already directs the run to Path 4.
+            const decline =
+              q.options.find((o) => /skip|decline|no thanks|local/i.test(o.label)) ?? q.options[q.options.length - 1]!;
+            return {
+              behavior: 'allow',
+              updatedInput: {
+                questions: input.questions,
+                answers: { [q.question]: decline.label },
+              },
+            };
+          }
+          return passThroughNonAskUserQuestion(toolName, input);
+        },
+      });
+
+      modelTextOutput = JSON.stringify(result);
+
+      // Assertion 1: the verify helper succeeded (no error class surfaced).
+      expect(modelTextOutput).not.toMatch(/error_class.*NETWORK/i);
+      expect(modelTextOutput).not.toMatch(/error_class.*AUTH/i);
+      expect(modelTextOutput).not.toMatch(/error_class.*MALFORMED/i);
+
+      // Assertion 2: claude mcp add was called with --transport http.
+      const calls = fs.existsSync(callLog) ? fs.readFileSync(callLog, 'utf-8') : '';
+      expect(calls).toMatch(/mcp add.*--transport http/);
+
+      // Assertion 3: the secret token NEVER appears in the final CLAUDE.md.
+      const claudeMd = fs.readFileSync(path.join(gstackHome, 'CLAUDE.md'), 'utf-8');
+      expect(claudeMd).not.toContain(SECRET_TOKEN);
+
+      // Assertion 4: CLAUDE.md got the remote-http block.
+      expect(claudeMd).toMatch(/Mode: remote-http/);
+
+      // Assertion 5: the wrote_findings_before_asking pattern did not fire.
+      // The Path 4 prose has 5 STOP gates; a run that skips one and emits
+      // report-style output would trip this check.
+      const wroteBefore = /## GSTACK REVIEW REPORT|critical_gaps/i.test(modelTextOutput);
+      // Setup-gbrain has no review-report contract, so these markers should
+      // never appear; this is a coarse shape check, not proof that every gate held.
+      expect(wroteBefore).toBe(false);
+    } finally {
+      if (orig.gstackHome === undefined) delete process.env.GSTACK_HOME; else process.env.GSTACK_HOME = orig.gstackHome;
+      if (orig.pathEnv === undefined) delete process.env.PATH; else process.env.PATH = orig.pathEnv;
+      if (orig.mcpToken === undefined) delete process.env.GBRAIN_MCP_TOKEN; else process.env.GBRAIN_MCP_TOKEN = orig.mcpToken;
+      await stubServer.close();
+      fs.rmSync(gstackHome, { recursive: true, force: true });
+      fs.rmSync(fakeBinDir, { recursive: true, force: true });
+    }
+  }, 240_000);
+});