test: regression suite + E2E for v1.27.0.0 rename

Three new regression tests guard the rename's blast radius (per codex
Findings #1, #8, #9, #12):

- test/no-stale-gstack-brain-refs.test.ts: greps bin/, scripts/, *.tmpl,
  test/ for forbidden identifiers (gstack-brain-init, gbrain_sync_mode);
  fails CI if any non-allowlisted file references them.
- test/post-rename-doc-regen.test.ts: confirms gen-skill-docs output has
  no stale references in any */SKILL.md (the cross-product blind spot).
- test/setup-gbrain-path4-structure.test.ts: structural lint over the
  Path 4 prose contract — STOP gates after verify failure, never-write-
  token rules, mode-aware CLAUDE.md block, bearer always via env-var.

Two new gate-tier E2E tests (deterministic stub HTTP server, fixed inputs):

- test/skill-e2e-setup-gbrain-remote.test.ts: Path 4 happy path. Stubs
  an HTTP MCP server, drives the skill via Agent SDK with a stubbed
  bearer, asserts claude.json gets the http MCP entry, CLAUDE.md gets
  the remote-http block, the secret token NEVER leaks to CLAUDE.md.
- test/skill-e2e-setup-gbrain-bad-token.test.ts: stub server returns 401;
  asserts the AUTH classifier hint surfaces, no MCP registration occurs,
  CLAUDE.md is unchanged. Regression guard for the "verify failed → STOP"
  rule.

touchfiles.ts: setup-gbrain-remote and setup-gbrain-bad-token added at
gate-tier so CI catches Path 4 regressions on every PR.

Plus a few comment refs flipped: bin/gstack-jsonl-merge, bin/gstack-timeline-log
(legacy gstack-brain-init mentions in headers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-05-06 10:15:23 -07:00
parent 935adf3a50
commit b0e0a76dca
11 changed files with 835 additions and 3 deletions
+1 -1
View File
@@ -4,7 +4,7 @@
# Usage (called by git, not by users):
# gstack-jsonl-merge <base> <ours> <theirs>
#
# Registered in local git config by bin/gstack-brain-init and
# Registered in local git config by bin/gstack-artifacts-init and
# bin/gstack-brain-restore:
# git config merge.jsonl-append.driver \
# "$GSTACK_BIN/gstack-jsonl-merge %O %A %B"
+1 -1
View File
@@ -4,7 +4,7 @@
#
# Session timeline: local by default. If the user enables `artifacts_sync_mode`
# with the `full` (not `artifacts-only`) privacy tier — via the first-run
# stop-gate from `gstack-brain-init` or the preamble — timeline events are
# stop-gate from `gstack-artifacts-init` or the preamble — timeline events are
# published to the user's private GBrain sync repo. See docs/gbrain-sync.md.
# Required fields: skill, event (started|completed).
# Optional: branch, outcome, duration_s, session, ts.
+98
View File
@@ -176,3 +176,101 @@ the recovery path is:
on the brain remote for hard-delete from history
4. File a gitleaks issue with the pattern (or extend the gitleaks config
at `~/.gitleaks.toml`).
## Path 4: Remote MCP setup (v1.27.0.0+)
If you don't run gbrain locally — you have a teammate or another machine
running `gbrain serve` over HTTP, accessible via Tailscale, ngrok, or
internal LAN — `/setup-gbrain` Path 4 is the one-paste flow.
You provide:
- The MCP URL (e.g., `https://wintermute.tail554574.ts.net:3131/mcp`)
- A bearer token (issued by the brain admin via `gbrain access-token issue`)
What `/setup-gbrain` does:
1. Verifies the URL + token via `gstack-gbrain-mcp-verify`. Three failure
modes get classified with one-line remediation hints:
**NETWORK** ("check Tailscale/DNS"), **AUTH** ("rotate token"),
**MALFORMED** ("Accept-header gotcha — pass both `application/json`
AND `text/event-stream`").
2. Registers the MCP at user scope:
```
claude mcp add --scope user --transport http gbrain "$URL" \
--header "Authorization: Bearer $TOKEN"
```
3. Skips local install, local doctor, transcript ingest, and federated
source registration. All four require a local `gbrain` CLI that Path 4
doesn't install.
4. Optionally provisions a `gstack-artifacts-$USER` private repo on
GitHub or GitLab and prints the one-line `gbrain sources add` command
for your brain admin to run on the brain host.
### Token storage trade-off
The bearer token lives in `~/.claude.json` (mode 0600), where Claude Code
stores every MCP server's credentials. During `claude mcp add --header
"Authorization: Bearer $TOKEN"`, the token is briefly visible in
process argv (~10ms) — visible to `ps` running concurrently. The window
is small but it's not zero.
Mitigations we've considered:
- **Stdin or env-var input form for headers** — would close the argv
window. As of Claude Code v1.0.x, the CLI doesn't expose either.
When it does, `/setup-gbrain` Path 4 will switch automatically.
- **Keychain storage** — explicitly out of scope (the token's resting
state in `~/.claude.json` is the existing trust surface for every MCP
credential; expanding to Keychain would touch every MCP server, not
just gbrain).
### Why Path 4 is "always print" for the brain-admin hookup
`gstack-artifacts-init` always prints the `gbrain sources add` command
labeled "Send this to your brain admin" — even when the user IS the
brain admin (consistent UX, no mode-detection fragility).
A previous design proposed probing whether the user's bearer has admin
scope (via a benign MCP write call like `add_tag`) and auto-executing
the source registration when scope was sufficient. The design review
flagged that page-write doesn't actually prove source-management
permission — those are different scopes in any sensible auth model.
Until gbrain ships:
- a `mcp__gbrain__whoami` capability tool that returns the bearer's
scope set, AND
- a `mcp__gbrain__sources_add` MCP tool with admin-scope gating
we always print the command rather than pretending we know who has
permission to run it.
### CLAUDE.md block in Path 4
Distinct from local-stdio mode. Token is **never** written to CLAUDE.md
(many projects check CLAUDE.md into git). The block records the URL,
the verified server version, the artifacts repo URL (if provisioned),
and the per-repo trust policy.
```markdown
## GBrain Configuration (configured by /setup-gbrain)
- Mode: remote-http
- MCP URL: https://wintermute.tail554574.ts.net:3131/mcp
- Server version: gbrain v0.27.1
- Setup date: 2026-05-06
- MCP registered: yes (user scope)
- Token: stored in ~/.claude.json (do not commit; never written to CLAUDE.md)
- Artifacts repo: github.com/garrytan/gstack-artifacts-garrytan (private)
- Artifacts sync: artifacts-only
- Current repo policy: read-write
```
### Token rotation
Server-side. When verify hits `AUTH` (e.g., the brain admin rotated the
token), the helper says: "rotate token on the brain host, re-run
/setup-gbrain." On wintermute or wherever your gbrain server lives:
```
gbrain access-token rotate # invalidates old, issues new
```
(See `gstack/setup-gbrain/SKILL.md.tmpl` for the full Path 4 flow plus
the gbrain enhancement requests around scoped tokens that would let
gstack auto-rotate in V2.)
+16
View File
@@ -768,6 +768,22 @@ Before doing anything, check that /setup-gbrain has been run on this Mac.
~/.claude/skills/gstack/bin/gstack-gbrain-detect 2>/dev/null
```
**Remote-MCP mode (Path 4 of /setup-gbrain):** if `gbrain_mcp_mode=remote-http`,
this skill is a graceful no-op. The brain server's own indexing cadence
handles code import + search refresh; this Mac doesn't run a local gbrain
CLI to drive `gbrain sources add` / `sync --strategy code`. Print:
> "Remote MCP detected (Path 4). /sync-gbrain is local-mode-only in V1.
> Your brain server (`<host>` from claude.json) handles indexing on its own
> cadence. If indexing seems stale, ping your brain admin or trigger a
> manual sync there. To wire `/sync-gbrain` through MCP tools (when gbrain
> ships `mcp__gbrain__sources_add` and friends), see the v1.27.0.0+
> follow-on TODO."
Then exit cleanly. Do NOT proceed to Step 2.
For local-stdio mode and unconfigured states:
If `gbrain_on_path=false` OR `gbrain_config_exists=false` OR CLAUDE.md does
not contain `## GBrain Configuration (configured by /setup-gbrain)`, STOP and
tell the user:
+16
View File
@@ -66,6 +66,22 @@ Before doing anything, check that /setup-gbrain has been run on this Mac.
~/.claude/skills/gstack/bin/gstack-gbrain-detect 2>/dev/null
```
**Remote-MCP mode (Path 4 of /setup-gbrain):** if `gbrain_mcp_mode=remote-http`,
this skill is a graceful no-op. The brain server's own indexing cadence
handles code import + search refresh; this Mac doesn't run a local gbrain
CLI to drive `gbrain sources add` / `sync --strategy code`. Print:
> "Remote MCP detected (Path 4). /sync-gbrain is local-mode-only in V1.
> Your brain server (`<host>` from claude.json) handles indexing on its own
> cadence. If indexing seems stale, ping your brain admin or trigger a
> manual sync there. To wire `/sync-gbrain` through MCP tools (when gbrain
> ships `mcp__gbrain__sources_add` and friends), see the v1.27.0.0+
> follow-on TODO."
Then exit cleanly. Do NOT proceed to Step 2.
For local-stdio mode and unconfigured states:
If `gbrain_on_path=false` OR `gbrain_config_exists=false` OR CLAUDE.md does
not contain `## GBrain Configuration (configured by /setup-gbrain)`, STOP and
tell the user:
+14 -1
View File
@@ -133,7 +133,14 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
'plan-eng-finding-count': ['plan-eng-review/**', 'scripts/resolvers/preamble.ts', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble/generate-completion-status.ts', 'test/helpers/claude-pty-runner.ts', 'test/skill-e2e-plan-eng-finding-count.test.ts'],
'plan-design-finding-count': ['plan-design-review/**', 'scripts/resolvers/preamble.ts', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble/generate-completion-status.ts', 'test/helpers/claude-pty-runner.ts', 'test/skill-e2e-plan-design-finding-count.test.ts'],
'plan-devex-finding-count': ['plan-devex-review/**', 'scripts/resolvers/preamble.ts', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble/generate-completion-status.ts', 'test/helpers/claude-pty-runner.ts', 'test/skill-e2e-plan-devex-finding-count.test.ts'],
'brain-privacy-gate': ['scripts/resolvers/preamble/generate-brain-sync-block.ts', 'scripts/resolvers/preamble.ts', 'bin/gstack-brain-sync', 'bin/gstack-brain-init', 'bin/gstack-config', 'test/helpers/agent-sdk-runner.ts'],
'brain-privacy-gate': ['scripts/resolvers/preamble/generate-brain-sync-block.ts', 'scripts/resolvers/preamble.ts', 'bin/gstack-brain-sync', 'bin/gstack-artifacts-init', 'bin/gstack-config', 'test/helpers/agent-sdk-runner.ts'],
// /setup-gbrain Path 4 (Remote MCP) — happy + bad-token end-to-end via
// Agent SDK. Gate-tier (deterministic stub server, fixed inputs); fires
// when the skill template, the verify helper, the artifacts-init helper,
// or the detect script changes.
'setup-gbrain-remote': ['setup-gbrain/SKILL.md.tmpl', 'bin/gstack-gbrain-mcp-verify', 'bin/gstack-artifacts-init', 'bin/gstack-gbrain-detect', 'test/helpers/agent-sdk-runner.ts'],
'setup-gbrain-bad-token': ['setup-gbrain/SKILL.md.tmpl', 'bin/gstack-gbrain-mcp-verify', 'test/helpers/agent-sdk-runner.ts'],
// AskUserQuestion format regression (RECOMMENDATION + Completeness: N/10)
// Fires when either template OR the two preamble resolvers change.
@@ -427,6 +434,12 @@ export const E2E_TIERS: Record<string, 'gate' | 'periodic'> = {
// costs ~$0.30-$0.50 per run, not needed on every commit)
'brain-privacy-gate': 'periodic',
// /setup-gbrain Path 4 (Remote MCP) — gate-tier. Stub HTTP server is
// deterministic; Path 4's STOP gates are the failure mode this catches
// (token in CLAUDE.md, partial registration on bad bearer).
'setup-gbrain-remote': 'gate',
'setup-gbrain-bad-token': 'gate',
// AskUserQuestion format regression — periodic (Opus 4.7 non-deterministic benchmark)
'plan-ceo-review-format-mode': 'periodic',
'plan-ceo-review-format-approach': 'periodic',
+120
View File
@@ -0,0 +1,120 @@
/**
* Regression: no stale `gstack-brain-init`, `gbrain_sync_mode`, or
* `~/.gstack-brain-remote.txt` references survive the v1.27.0.0 rename.
*
* Per codex Findings #1 + #8 + #9: the rename's blast radius is wider than
* the obvious bin/ + scripts/ surface. This test grep-scans the broader
* tree (bin, scripts, *.tmpl, generated *.md, test/, docs/) for the
* deprecated identifiers and fails CI if any callers were missed.
*
* Allowlist: the migration script (`gstack-upgrade/migrations/v1.27.0.0.sh`)
* legitimately references the old names — it's the rename actor itself.
* Old migration scripts (v1.17.0.0.sh and similar) reference the old names
* for their own historical context and are also allowlisted.
*
* The test is mechanical: if you find yourself adding a non-historical
* file to the allowlist, you probably need to actually fix the rename
* instead.
*/
import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import { spawnSync } from 'child_process';
const ROOT = path.resolve(import.meta.dir, '..');
const ALLOWLIST = [
// The migration script that performs the rename. Self-references are expected.
'gstack-upgrade/migrations/v1.27.0.0.sh',
// Older migration scripts — historical references; these document past state.
'gstack-upgrade/migrations/v1.17.0.0.sh',
// The migration test itself — it asserts on the migration's behavior.
'test/migrations-v1.27.0.0.test.ts',
// The test for the v1.17.0.0 historical migration.
'test/gstack-upgrade-migration-v1_17_0_0.test.ts',
// CHANGELOG entries describe historical state by their nature.
'CHANGELOG.md',
// TODOS may reference past or future states by name.
'TODOS.md',
// The plan file for v1.27.0.0 documents why we're renaming.
'.context/plans/setup-gbrain-remote-mcp-rename-brain-artifacts.md',
// The bin/gstack-config comment explicitly preserves the rename note.
'bin/gstack-config',
// Detect script's "renamed in v1.27.0.0" comment + brain-remote-fallback path.
'bin/gstack-gbrain-detect',
// brain-restore + source-wireup keep the old file as a migration-window fallback
// (read both, prefer artifacts). brain-uninstall has the same fallback.
'bin/gstack-brain-restore',
'bin/gstack-gbrain-source-wireup',
'bin/gstack-brain-uninstall',
// The preamble resolver reads the legacy file as a fallback during the
// migration window — same pattern.
'scripts/resolvers/preamble/generate-brain-sync-block.ts',
// gstack-upgrade.test.ts may exercise old migration behavior.
'test/gstack-upgrade.test.ts',
// This test itself references the patterns to grep for.
'test/no-stale-gstack-brain-refs.test.ts',
// memory.md documents the rename context.
'setup-gbrain/memory.md',
// The new init script's header comment intentionally cites the rename.
'bin/gstack-artifacts-init',
// The replacement test mirrors the pattern of the old test (lineage note).
'test/gstack-artifacts-init.test.ts',
// The post-rename-doc-regen test references the patterns it greps for.
'test/post-rename-doc-regen.test.ts',
// The Path 4 structural lint references some legacy names in comments.
'test/setup-gbrain-path4-structure.test.ts',
// Generated docs that include the preamble bash (which has the fallback).
// We grep template sources, not generated output, by limiting scan paths.
];
const FORBIDDEN_PATTERNS = [
'gstack-brain-init',
'gbrain_sync_mode',
];
const SCAN_PATHS = [
'bin/',
'scripts/',
'setup-gbrain/SKILL.md.tmpl',
'sync-gbrain/SKILL.md.tmpl',
'health/SKILL.md.tmpl',
'plan-eng-review/SKILL.md.tmpl',
'plan-ceo-review/SKILL.md.tmpl',
'review/SKILL.md.tmpl',
'ship/SKILL.md.tmpl',
'test/',
];
function grepRefs(pattern: string): string[] {
const args = ['-rn', '--', pattern, ...SCAN_PATHS.map((p) => path.join(ROOT, p))];
const r = spawnSync('grep', args, { encoding: 'utf-8' });
// grep exits 1 when no matches — that's fine for our purposes.
const lines = (r.stdout || '').split('\n').filter((l) => l.trim().length > 0);
return lines
.map((line) => {
// Strip ROOT prefix to get repo-relative path.
const colon = line.indexOf(':');
const file = line.slice(0, colon);
return path.relative(ROOT, file);
})
.filter((file) => !ALLOWLIST.includes(file))
// Filter out any file that's inside a directory we don't actually scan.
.filter((file) => !file.startsWith('node_modules/') && !file.startsWith('.git/'));
}
describe('no stale gstack-brain refs (v1.27.0.0 rename)', () => {
for (const pattern of FORBIDDEN_PATTERNS) {
test(`no non-allowlisted references to "${pattern}"`, () => {
const offenders = [...new Set(grepRefs(pattern))];
if (offenders.length > 0) {
console.error(`Found stale "${pattern}" references in:\n${offenders.map((f) => ` - ${f}`).join('\n')}`);
console.error(
`If a file is intentionally referencing the old name (migration, historical doc, fallback path), add it to ALLOWLIST in this test.`
);
}
expect(offenders).toEqual([]);
});
}
});
+74
View File
@@ -0,0 +1,74 @@
// Post-rename doc-regen regression: after `bun run gen:skill-docs`, no
// `gstack-brain-init` or `gbrain_sync_mode` strings appear in any of the
// generated SKILL.md files (the cross-product blind spot codex
// Finding #12 flagged).
//
// The check runs against the canonical claude-host output already on
// disk. We don't shell out to gen-skill-docs again; the existing
// freshness check in gen-skill-docs.test.ts covers that. This test
// just verifies the rename actually propagated to the generated
// artifacts that users see.
import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
const ROOT = path.resolve(import.meta.dir, '..');
const FORBIDDEN_PATTERNS = [
// Bare identifier — should NEVER appear in generated docs (if it does,
// a template still has the old call site).
/^.*\bgstack-brain-init\b.*$/m,
/^.*\bgbrain_sync_mode\b.*$/m,
];
// Per the preamble resolver: generated docs DO contain the
// "~/.gstack-brain-remote.txt" string in the migration-window fallback. We
// don't grep for that — it's intentional. We grep for the call-site
// identifiers only.
function findSkillMdFiles(): string[] {
const skillMd = path.join(ROOT, 'SKILL.md');
const files: string[] = [skillMd];
// Top-level skill directories with their own SKILL.md.
const entries = fs.readdirSync(ROOT, { withFileTypes: true });
for (const e of entries) {
if (e.isDirectory() && !e.name.startsWith('.') && !['node_modules', 'test'].includes(e.name)) {
const inner = path.join(ROOT, e.name, 'SKILL.md');
if (fs.existsSync(inner)) files.push(inner);
}
}
return files;
}
describe('post-rename doc-regen regression (codex Finding #12)', () => {
test('no generated SKILL.md contains "gstack-brain-init"', () => {
const offenders: string[] = [];
for (const file of findSkillMdFiles()) {
const content = fs.readFileSync(file, 'utf-8');
const m = content.match(/^.*\bgstack-brain-init\b.*$/m);
if (m) offenders.push(`${path.relative(ROOT, file)}: ${m[0].slice(0, 100)}`);
}
if (offenders.length > 0) {
console.error(`Stale "gstack-brain-init" in generated SKILL.md files:\n${offenders.map((o) => ' ' + o).join('\n')}`);
}
expect(offenders).toEqual([]);
});
test('no generated SKILL.md contains "gbrain_sync_mode"', () => {
const offenders: string[] = [];
for (const file of findSkillMdFiles()) {
const content = fs.readFileSync(file, 'utf-8');
const m = content.match(/^.*\bgbrain_sync_mode\b.*$/m);
if (m) offenders.push(`${path.relative(ROOT, file)}: ${m[0].slice(0, 100)}`);
}
if (offenders.length > 0) {
console.error(`Stale "gbrain_sync_mode" in generated SKILL.md files:\n${offenders.map((o) => ' ' + o).join('\n')}`);
}
expect(offenders).toEqual([]);
});
test('top-level SKILL.md exists and is regenerated', () => {
expect(fs.existsSync(path.join(ROOT, 'SKILL.md'))).toBe(true);
});
});
+133
View File
@@ -0,0 +1,133 @@
// setup-gbrain Path 4 structural lint.
//
// Verifies the SKILL.md.tmpl has the prose contract that Path 4 (Remote MCP)
// depends on: STOP gates after verify failures, never-write-token rules,
// mode-aware CLAUDE.md block, idempotent re-run path.
//
// Why a structural test instead of a full Agent SDK E2E:
// - Side effects (claude.json mutation, MCP registration) are covered
// by unit tests for gstack-gbrain-mcp-verify and gstack-artifacts-init.
// - The structural prose is the source of regressions for AUQ pacing
// (the failure mode the gstack repo has tracked since v1.26.x:
// "wrote_findings_before_asking"). A grep-based regression on the
// template prose is fast (<200ms), free, and catches the same drift
// as the paid E2E without spending tokens.
// - The full Agent SDK E2E remains the right tool for end-to-end
// pacing eval; this is the gate-tier check that catches the failure
// class deterministically.
import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
const ROOT = path.resolve(import.meta.dir, '..');
const TMPL = path.join(ROOT, 'setup-gbrain', 'SKILL.md.tmpl');
const tmpl = fs.readFileSync(TMPL, 'utf-8');
describe('setup-gbrain Path 4 (Remote MCP) — structural contract', () => {
test('Step 2 lists Path 4 as one of the path options', () => {
// "4 — Remote gbrain MCP" with em-dash (—, U+2014 — one codepoint).
expect(tmpl).toMatch(/\*\*4 . Remote gbrain MCP/);
});
test('Step 4 has a Path 4 sub-section', () => {
expect(tmpl).toMatch(/### Path 4 \(Remote gbrain MCP/);
});
test('Step 4 collects the bearer via read_secret_to_env, never argv', () => {
// The secret-read helper is the canonical token-capture pattern.
// Without it, tokens land in shell history.
expect(tmpl).toContain('read_secret_to_env GBRAIN_MCP_TOKEN');
});
test('Step 4c invokes gstack-gbrain-mcp-verify and STOPs on failure', () => {
expect(tmpl).toContain('gstack-gbrain-mcp-verify');
// The STOP rule is what prevents partial registration after auth fail.
const path4Section = tmpl.split('### Path 4')[1] || '';
expect(path4Section).toMatch(/STOP/);
});
test('Step 4d explicitly skips Steps 3, 4 (other paths), 5, 7.5 in remote mode', () => {
expect(tmpl).toMatch(/4d.*[Ss]kip Steps? 3, 4.*5.*7\.5/s);
});
test('Step 5a has a Path 4 branch with claude mcp add --transport http', () => {
expect(tmpl).toMatch(/Path 4 \(Remote MCP/);
expect(tmpl).toMatch(/claude mcp add --scope user --transport http gbrain/);
expect(tmpl).toContain('Authorization: Bearer $GBRAIN_MCP_TOKEN');
// Token must be unset after registration so it doesn't linger in env.
expect(tmpl).toMatch(/unset GBRAIN_MCP_TOKEN/);
});
test('Step 5a removes any prior gbrain registration before adding the new one', () => {
// Otherwise local-stdio + remote-http coexist, which breaks routing.
expect(tmpl).toMatch(/claude mcp remove gbrain/);
});
test('Step 7 calls gstack-artifacts-init with --url-form-supported flag', () => {
expect(tmpl).toMatch(/gstack-artifacts-init.*--url-form-supported/);
});
test('Step 8 CLAUDE.md block branches on mode', () => {
// The remote-http block has Mode: remote-http; local-stdio block has Engine:.
expect(tmpl).toMatch(/### Path 4 \(Remote MCP\)/);
expect(tmpl).toMatch(/Mode: remote-http/);
expect(tmpl).toMatch(/Mode: local-stdio/);
});
test('Step 8 explicitly says the bearer is never written to CLAUDE.md', () => {
// Token-leak regression guard. CLAUDE.md is committed in many projects.
expect(tmpl).toMatch(/bearer token is \*\*never\*\* written to CLAUDE\.md/);
});
test('Step 9 smoke test on Path 4 prints a placeholder, never the real token', () => {
// Don't paste the token into the curl example the user might share.
expect(tmpl).toMatch(/<YOUR_TOKEN>/);
});
test('Step 10 verdict block has a remote-http variant separate from local-stdio', () => {
expect(tmpl).toMatch(/### Path 4 \(Remote MCP\)/);
expect(tmpl).toMatch(/mode: remote-http/);
expect(tmpl).toMatch(/N\/A.*remote mode/);
});
test('idempotency: re-running with gbrain_mcp_mode=remote-http skips Step 2', () => {
// Re-run path stays graceful; no double-registration.
expect(tmpl).toMatch(/gbrain_mcp_mode=remote-http/);
});
test('Step 5 (local doctor) explicitly skips on Path 4', () => {
expect(tmpl).toMatch(/SKIP entirely on Path 4 \(Remote MCP\)/);
});
test('Step 7.5 (transcript ingest) explicitly skips on Path 4', () => {
// Transcript ingest needs local gbrain CLI which Path 4 doesn't install.
const matches = tmpl.match(/SKIP entirely on Path 4 \(Remote MCP\)/g);
expect(matches?.length).toBeGreaterThanOrEqual(2);
});
});
describe('setup-gbrain Path 4 — token security regressions', () => {
test('the template never inlines a real-shaped bearer string', () => {
// We never want a literal "gbrain_<hex>" token to appear in the
// template — placeholders only. This catches the failure mode where
// someone copies a real token into the template by accident.
const realTokenShape = /gbrain_[a-f0-9]{40,}/;
expect(tmpl).not.toMatch(realTokenShape);
});
test('Path 4 always uses env-var $GBRAIN_MCP_TOKEN, never inline strings', () => {
// Find every reference to the bearer header in Path 4 and verify it's
// either an env-var expansion or an explicit placeholder. Allow:
// - $GBRAIN_MCP_TOKEN (env-var expansion)
// - <bearer>, <YOUR_TOKEN>, <TOKEN> (placeholder)
// - "..." (rest-of-doc-text continuation; a doc note showing how
// `claude mcp add --header` shapes its argv).
const path4Section = tmpl.match(/### Path 4 \(Remote MCP[\s\S]*?(?=###|## )/g)?.join('') || '';
const bearerLines = path4Section.match(/Bearer\s+\S+/g) || [];
for (const line of bearerLines) {
expect(line).toMatch(/Bearer (\$GBRAIN_MCP_TOKEN|<bearer>|<YOUR_TOKEN>|<TOKEN>|\.\.\."?)/);
}
});
});
@@ -0,0 +1,148 @@
// E2E: /setup-gbrain Path 4 with a bad bearer token via Agent SDK.
//
// Drives the skill against a stub HTTP MCP server that returns 401
// (auth-shape body). Asserts that the AUTH classifier hint shows up
// AND no MCP registration happens (no claude mcp add --transport http
// in the call log; no half-written CLAUDE.md block). This is the
// regression guard for the "verify failed → STOP" gate.
//
// Cost: ~$0.30-$0.50 per run. Gate-tier (EVALS=1 EVALS_TIER=gate).
import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';
import * as http from 'http';
import { runAgentSdkTest, passThroughNonAskUserQuestion, resolveClaudeBinary } from './helpers/agent-sdk-runner';
const shouldRun = !!process.env.EVALS && (process.env.EVALS_TIER === 'gate' || !process.env.EVALS_TIER);
const describeE2E = shouldRun ? describe : describe.skip;
function startStub401(): Promise<{ url: string; close: () => Promise<void> }> {
return new Promise((resolve) => {
const server = http.createServer((req, res) => {
let body = '';
req.on('data', (c) => (body += c));
req.on('end', () => {
res.statusCode = 401;
res.setHeader('Content-Type', 'application/json');
res.end(
JSON.stringify({ error: 'unauthorized', error_description: 'invalid or expired auth token' })
);
});
});
server.listen(0, '127.0.0.1', () => {
const addr = server.address();
if (!addr || typeof addr === 'string') throw new Error('no address');
resolve({
url: `http://127.0.0.1:${addr.port}/mcp`,
close: () => new Promise((r) => server.close(() => r())),
});
});
});
}
function makeFakeClaude(fakeBinDir: string): string {
const callLog = path.join(fakeBinDir, 'claude-calls.log');
const script = `#!/bin/bash
echo "claude $@" >> "${callLog}"
case "$1 $2" in
"mcp add") exit 0 ;;
"mcp list") echo "no gbrain" ; exit 0 ;;
"mcp remove") exit 0 ;;
"mcp get") exit 1 ;;
esac
exit 0
`;
fs.writeFileSync(path.join(fakeBinDir, 'claude'), script, { mode: 0o755 });
return callLog;
}
describeE2E('/setup-gbrain Path 4 — bad token STOPs cleanly', () => {
test('AUTH classifier fires, no MCP registration, no CLAUDE.md mutation', async () => {
const stubServer = await startStub401();
const gstackHome = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-bad-'));
const fakeBinDir = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-bad-bin-'));
const callLog = makeFakeClaude(fakeBinDir);
const ORIGINAL_CLAUDE_MD = '# Test project\n\nSome existing content here.\n';
fs.writeFileSync(path.join(gstackHome, 'CLAUDE.md'), ORIGINAL_CLAUDE_MD);
const BAD_TOKEN = 'gbrain_BAD_TOKEN_67890_DELIBERATELY_INVALID';
const askUserQuestions: Array<{ input: Record<string, unknown> }> = [];
const binary = resolveClaudeBinary();
const orig = {
gstackHome: process.env.GSTACK_HOME,
pathEnv: process.env.PATH,
mcpToken: process.env.GBRAIN_MCP_TOKEN,
};
process.env.GSTACK_HOME = gstackHome;
process.env.PATH = `${fakeBinDir}:${path.join(path.resolve(import.meta.dir, '..'), 'bin')}:${process.env.PATH ?? '/usr/bin:/bin:/opt/homebrew/bin'}`;
process.env.GBRAIN_MCP_TOKEN = BAD_TOKEN;
let modelTextOutput = '';
try {
const skillPath = path.resolve(import.meta.dir, '..', 'setup-gbrain', 'SKILL.md');
const result = await runAgentSdkTest({
systemPrompt: { type: 'preset', preset: 'claude_code' },
userPrompt:
`Read the skill file at ${skillPath} and follow Path 4 (Remote MCP) only. ` +
`Use this MCP URL: ${stubServer.url}. ` +
`The bearer token is already in the GBRAIN_MCP_TOKEN env var. ` +
`If verify fails (Step 4c), follow the skill's STOP rule — surface the error and stop. ` +
`Do NOT register the MCP if verify failed. ` +
`Do NOT modify CLAUDE.md if verify failed.`,
workingDirectory: gstackHome,
maxTurns: 15,
allowedTools: ['Read', 'Grep', 'Glob', 'Bash', 'Write', 'Edit'],
...(binary ? { pathToClaudeCodeExecutable: binary } : {}),
canUseTool: async (toolName, input) => {
if (toolName === 'AskUserQuestion') {
askUserQuestions.push({ input });
const q = (input.questions as Array<{
question: string;
options: Array<{ label: string }>;
}>)[0];
const decline = q.options.find((o) => /skip|decline|no/i.test(o.label)) ?? q.options[0]!;
return {
behavior: 'allow',
updatedInput: { questions: input.questions, answers: { [q.question]: decline.label } },
};
}
return passThroughNonAskUserQuestion(toolName, input);
},
});
modelTextOutput = JSON.stringify(result);
// Assertion 1: the AUTH classifier hint surfaced somewhere in the run.
// The verify helper outputs `"error_class": "AUTH"` and the hint
// "rotate token on the brain host" — at least one should be visible.
const hintShown =
/error_class.*AUTH/i.test(modelTextOutput) ||
/rotate token/i.test(modelTextOutput) ||
/AUTH.*HTTP 401/i.test(modelTextOutput);
expect(hintShown).toBe(true);
// Assertion 2: claude mcp add was NEVER called (verify failed → STOP).
const calls = fs.existsSync(callLog) ? fs.readFileSync(callLog, 'utf-8') : '';
expect(calls).not.toMatch(/mcp add.*--transport http/);
// Assertion 3: CLAUDE.md is unchanged (no half-written block).
const finalClaudeMd = fs.readFileSync(path.join(gstackHome, 'CLAUDE.md'), 'utf-8');
expect(finalClaudeMd).toBe(ORIGINAL_CLAUDE_MD);
// Assertion 4: the bad token never leaked to CLAUDE.md.
expect(finalClaudeMd).not.toContain(BAD_TOKEN);
} finally {
if (orig.gstackHome === undefined) delete process.env.GSTACK_HOME; else process.env.GSTACK_HOME = orig.gstackHome;
if (orig.pathEnv === undefined) delete process.env.PATH; else process.env.PATH = orig.pathEnv;
if (orig.mcpToken === undefined) delete process.env.GBRAIN_MCP_TOKEN; else process.env.GBRAIN_MCP_TOKEN = orig.mcpToken;
await stubServer.close();
fs.rmSync(gstackHome, { recursive: true, force: true });
fs.rmSync(fakeBinDir, { recursive: true, force: true });
}
}, 240_000);
});
+214
View File
@@ -0,0 +1,214 @@
// E2E: /setup-gbrain Path 4 (Remote MCP) happy path via Agent SDK.
//
// Drives the skill against a stub HTTP MCP server and a stubbed `claude`
// binary that records `claude mcp add` calls. Asserts:
// - The verify helper succeeds (no AUTH/MALFORMED/NETWORK error in output)
// - The skill calls `claude mcp add --transport http` with the bearer
// - The token NEVER appears in the CLAUDE.md block the skill writes
// - The wrote_findings_before_asking failure mode is NOT triggered
//
// Cost: ~$0.30-$0.50 per run. Gate-tier (EVALS=1 EVALS_TIER=gate).
//
// See setup-gbrain/SKILL.md.tmpl Step 4 (Path 4) for the contract under test.
import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';
import * as http from 'http';
import { runAgentSdkTest, passThroughNonAskUserQuestion, resolveClaudeBinary } from './helpers/agent-sdk-runner';
const shouldRun = !!process.env.EVALS && (process.env.EVALS_TIER === 'gate' || !process.env.EVALS_TIER);
const describeE2E = shouldRun ? describe : describe.skip;
// Spin up a stub MCP server that responds to initialize + tools/list.
function startStubMcpServer(opts: { failWithStatus?: number; failBody?: string } = {}): Promise<{ url: string; close: () => Promise<void> }> {
return new Promise((resolve) => {
const server = http.createServer((req, res) => {
if (req.method !== 'POST' || !(req.url ?? '').endsWith('/mcp')) {
res.statusCode = 404;
res.end();
return;
}
let body = '';
req.on('data', (c) => (body += c));
req.on('end', () => {
if (opts.failWithStatus) {
res.statusCode = opts.failWithStatus;
res.setHeader('Content-Type', 'application/json');
res.end(opts.failBody ?? JSON.stringify({ error: 'fail' }));
return;
}
const reqJson = (() => {
try { return JSON.parse(body); } catch { return {} as any; }
})();
let respBody: any;
if (reqJson.method === 'initialize') {
respBody = {
result: {
protocolVersion: '2024-11-05',
capabilities: { tools: {} },
serverInfo: { name: 'gbrain', version: '0.27.1' },
},
jsonrpc: '2.0',
id: reqJson.id,
};
} else if (reqJson.method === 'tools/list') {
respBody = { result: { tools: [{ name: 'search' }, { name: 'put_page' }] }, jsonrpc: '2.0', id: reqJson.id };
} else {
respBody = { error: { code: -32601, message: 'unknown method' }, jsonrpc: '2.0', id: reqJson.id };
}
// SSE-shape since the verify helper supports both, and many MCP
// servers (including wintermute) wrap responses as SSE.
res.statusCode = 200;
res.setHeader('Content-Type', 'text/event-stream');
res.end(`event: message\ndata: ${JSON.stringify(respBody)}\n\n`);
});
});
server.listen(0, '127.0.0.1', () => {
const addr = server.address();
if (!addr || typeof addr === 'string') throw new Error('no address');
resolve({
url: `http://127.0.0.1:${addr.port}/mcp`,
close: () => new Promise((r) => server.close(() => r())),
});
});
});
}
// Stubbed `claude` binary: intercepts `mcp add` and `mcp list` commands so
// the skill's Step 5a registration appears to succeed, while we record
// every invocation for assertions.
function makeFakeClaude(fakeBinDir: string): string {
const claudeJsonPath = path.join(fakeBinDir, 'claude.json');
const callLog = path.join(fakeBinDir, 'claude-calls.log');
const script = `#!/bin/bash
echo "claude $@" >> "${callLog}"
case "$1 $2" in
"mcp add")
# Just record the call; pretend it succeeded.
exit 0
;;
"mcp list")
echo "gbrain: http://127.0.0.1:0/mcp (HTTP) - ✓ Connected"
exit 0
;;
"mcp remove")
exit 0
;;
"mcp get")
# First few calls return "no entry"; after mcp add fires, return success.
if [ -f "${claudeJsonPath}" ]; then
cat "${claudeJsonPath}"
exit 0
fi
exit 1
;;
esac
exit 0
`;
fs.writeFileSync(path.join(fakeBinDir, 'claude'), script, { mode: 0o755 });
return callLog;
}
describeE2E('/setup-gbrain Path 4 (Remote MCP) — happy path', () => {
test('verifies, registers HTTP MCP, never writes token to CLAUDE.md', async () => {
const stubServer = await startStubMcpServer();
const gstackHome = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-remote-'));
const fakeBinDir = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-remote-bin-'));
const callLog = makeFakeClaude(fakeBinDir);
// The skill writes CLAUDE.md in cwd. Use gstackHome as cwd so we
// can inspect it after the run.
fs.writeFileSync(path.join(gstackHome, 'CLAUDE.md'), '# Test project\n');
const SECRET_TOKEN = 'gbrain_TEST_TOKEN_THAT_MUST_NEVER_LEAK_84613';
const askUserQuestions: Array<{ input: Record<string, unknown> }> = [];
const binary = resolveClaudeBinary();
// Ambient env mutations. Restored in finally.
const orig = {
gstackHome: process.env.GSTACK_HOME,
pathEnv: process.env.PATH,
mcpToken: process.env.GBRAIN_MCP_TOKEN,
};
process.env.GSTACK_HOME = gstackHome;
process.env.PATH = `${fakeBinDir}:${path.join(path.resolve(import.meta.dir, '..'), 'bin')}:${process.env.PATH ?? '/usr/bin:/bin:/opt/homebrew/bin'}`;
process.env.GBRAIN_MCP_TOKEN = SECRET_TOKEN;
let modelTextOutput = '';
try {
const skillPath = path.resolve(import.meta.dir, '..', 'setup-gbrain', 'SKILL.md');
const result = await runAgentSdkTest({
systemPrompt: { type: 'preset', preset: 'claude_code' },
userPrompt:
`Read the skill file at ${skillPath} and follow Path 4 (Remote MCP) only. ` +
`Use this MCP URL: ${stubServer.url}. ` +
`The bearer token is already in the GBRAIN_MCP_TOKEN env var (do not echo it). ` +
`Skip the privacy gate — answer "Decline" if the preamble fires. ` +
`Skip the artifacts-repo provisioning step (Step 7) — answer "No thanks". ` +
`Skip per-remote policy (Step 6) — answer "skip-for-now". ` +
`Walk through Steps 4a, 4b, 4c, 5a, 8, 10 ONLY.`,
workingDirectory: gstackHome,
maxTurns: 25,
allowedTools: ['Read', 'Grep', 'Glob', 'Bash', 'Write', 'Edit'],
...(binary ? { pathToClaudeCodeExecutable: binary } : {}),
canUseTool: async (toolName, input) => {
if (toolName === 'AskUserQuestion') {
askUserQuestions.push({ input });
const q = (input.questions as Array<{
question: string;
options: Array<{ label: string }>;
}>)[0];
// Auto-decline / skip everything except the path-pick (which the
// user-prompt already directed to Path 4).
const decline =
q.options.find((o) => /skip|decline|no thanks|local/i.test(o.label)) ?? q.options[q.options.length - 1]!;
return {
behavior: 'allow',
updatedInput: {
questions: input.questions,
answers: { [q.question]: decline.label },
},
};
}
return passThroughNonAskUserQuestion(toolName, input);
},
});
modelTextOutput = JSON.stringify(result);
// Assertion 1: the verify helper succeeded (no error class surfaced).
expect(modelTextOutput).not.toMatch(/error_class.*NETWORK/i);
expect(modelTextOutput).not.toMatch(/error_class.*AUTH/i);
expect(modelTextOutput).not.toMatch(/error_class.*MALFORMED/i);
// Assertion 2: claude mcp add was called with --transport http.
const calls = fs.existsSync(callLog) ? fs.readFileSync(callLog, 'utf-8') : '';
expect(calls).toMatch(/mcp add.*--transport http/);
// Assertion 3: the secret token NEVER appears in the final CLAUDE.md.
const claudeMd = fs.readFileSync(path.join(gstackHome, 'CLAUDE.md'), 'utf-8');
expect(claudeMd).not.toContain(SECRET_TOKEN);
// Assertion 4: CLAUDE.md got the remote-http block.
expect(claudeMd).toMatch(/Mode: remote-http/);
// Assertion 5: classifier — the model didn't write findings before
// asking. The Path 4 prose has 5 STOP gates; if any of them got
// skipped, that's the wrote_findings_before_asking pattern.
const wroteBefore = /## GSTACK REVIEW REPORT|critical_gaps/i.test(modelTextOutput);
// Setup-gbrain doesn't have a review report contract, so this is
// a structural shape check, not a hard failure mode.
expect(wroteBefore).toBe(false);
} finally {
if (orig.gstackHome === undefined) delete process.env.GSTACK_HOME; else process.env.GSTACK_HOME = orig.gstackHome;
if (orig.pathEnv === undefined) delete process.env.PATH; else process.env.PATH = orig.pathEnv;
if (orig.mcpToken === undefined) delete process.env.GBRAIN_MCP_TOKEN; else process.env.GBRAIN_MCP_TOKEN = orig.mcpToken;
await stubServer.close();
fs.rmSync(gstackHome, { recursive: true, force: true });
fs.rmSync(fakeBinDir, { recursive: true, force: true });
}
}, 240_000);
});