mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-08 06:26:45 +02:00
test: regression suite + E2E for v1.27.0.0 rename
Three new regression tests guard the rename's blast radius (per codex Findings #1, #8, #9, #12): - test/no-stale-gstack-brain-refs.test.ts: greps bin/, scripts/, *.tmpl, test/ for forbidden identifiers (gstack-brain-init, gbrain_sync_mode); fails CI if any non-allowlisted file references them. - test/post-rename-doc-regen.test.ts: confirms gen-skill-docs output has no stale references in any */SKILL.md (the cross-product blind spot). - test/setup-gbrain-path4-structure.test.ts: structural lint over the Path 4 prose contract — STOP gates after verify failure, never-write- token rules, mode-aware CLAUDE.md block, bearer always via env-var. Two new gate-tier E2E tests (deterministic stub HTTP server, fixed inputs): - test/skill-e2e-setup-gbrain-remote.test.ts: Path 4 happy path. Stubs an HTTP MCP server, drives the skill via Agent SDK with a stubbed bearer, asserts claude.json gets the http MCP entry, CLAUDE.md gets the remote-http block, the secret token NEVER leaks to CLAUDE.md. - test/skill-e2e-setup-gbrain-bad-token.test.ts: stub server returns 401; asserts the AUTH classifier hint surfaces, no MCP registration occurs, CLAUDE.md is unchanged. Regression guard for the "verify failed → STOP" rule. touchfiles.ts: setup-gbrain-remote and setup-gbrain-bad-token added at gate-tier so CI catches Path 4 regressions on every PR. Plus a few comment refs flipped: bin/gstack-jsonl-merge, bin/gstack-timeline-log (legacy gstack-brain-init mentions in headers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -4,7 +4,7 @@
|
||||
# Usage (called by git, not by users):
|
||||
# gstack-jsonl-merge <base> <ours> <theirs>
|
||||
#
|
||||
# Registered in local git config by bin/gstack-brain-init and
|
||||
# Registered in local git config by bin/gstack-artifacts-init and
|
||||
# bin/gstack-brain-restore:
|
||||
# git config merge.jsonl-append.driver \
|
||||
# "$GSTACK_BIN/gstack-jsonl-merge %O %A %B"
|
||||
|
||||
@@ -4,7 +4,7 @@
|
||||
#
|
||||
# Session timeline: local by default. If the user enables `artifacts_sync_mode`
|
||||
# with the `full` (not `artifacts-only`) privacy tier — via the first-run
|
||||
# stop-gate from `gstack-brain-init` or the preamble — timeline events are
|
||||
# stop-gate from `gstack-artifacts-init` or the preamble — timeline events are
|
||||
# published to the user's private GBrain sync repo. See docs/gbrain-sync.md.
|
||||
# Required fields: skill, event (started|completed).
|
||||
# Optional: branch, outcome, duration_s, session, ts.
|
||||
|
||||
@@ -176,3 +176,101 @@ the recovery path is:
|
||||
on the brain remote for hard-delete from history
|
||||
4. File a gitleaks issue with the pattern (or extend the gitleaks config
|
||||
at `~/.gitleaks.toml`).
|
||||
|
||||
## Path 4: Remote MCP setup (v1.27.0.0+)
|
||||
|
||||
If you don't run gbrain locally — you have a teammate or another machine
|
||||
running `gbrain serve` over HTTP, accessible via Tailscale, ngrok, or
|
||||
internal LAN — `/setup-gbrain` Path 4 is the one-paste flow.
|
||||
|
||||
You provide:
|
||||
- The MCP URL (e.g., `https://wintermute.tail554574.ts.net:3131/mcp`)
|
||||
- A bearer token (issued by the brain admin via `gbrain access-token issue`)
|
||||
|
||||
What `/setup-gbrain` does:
|
||||
1. Verifies the URL + token via `gstack-gbrain-mcp-verify`. Three failure
|
||||
modes get classified with one-line remediation hints:
|
||||
**NETWORK** ("check Tailscale/DNS"), **AUTH** ("rotate token"),
|
||||
**MALFORMED** ("Accept-header gotcha — pass both `application/json`
|
||||
AND `text/event-stream`").
|
||||
2. Registers the MCP at user scope:
|
||||
```
|
||||
claude mcp add --scope user --transport http gbrain "$URL" \
|
||||
--header "Authorization: Bearer $TOKEN"
|
||||
```
|
||||
3. Skips local install, local doctor, transcript ingest, and federated
|
||||
source registration. All four require a local `gbrain` CLI that Path 4
|
||||
doesn't install.
|
||||
4. Optionally provisions a `gstack-artifacts-$USER` private repo on
|
||||
GitHub or GitLab and prints the one-line `gbrain sources add` command
|
||||
for your brain admin to run on the brain host.
|
||||
|
||||
### Token storage trade-off
|
||||
|
||||
The bearer token lives in `~/.claude.json` (mode 0600), where Claude Code
|
||||
stores every MCP server's credentials. During `claude mcp add --header
|
||||
"Authorization: Bearer $TOKEN"`, the token is briefly visible in
|
||||
process argv (~10ms) — visible to `ps` running concurrently. The window
|
||||
is small but it's not zero.
|
||||
|
||||
Mitigations we've considered:
|
||||
- **Stdin or env-var input form for headers** — would close the argv
|
||||
window. As of Claude Code v1.0.x, the CLI doesn't expose either.
|
||||
When it does, `/setup-gbrain` Path 4 will switch automatically.
|
||||
- **Keychain storage** — explicitly out of scope (the token's resting
|
||||
state in `~/.claude.json` is the existing trust surface for every MCP
|
||||
credential; expanding to Keychain would touch every MCP server, not
|
||||
just gbrain).
|
||||
|
||||
### Why Path 4 is "always print" for the brain-admin hookup
|
||||
|
||||
`gstack-artifacts-init` always prints the `gbrain sources add` command
|
||||
labeled "Send this to your brain admin" — even when the user IS the
|
||||
brain admin (consistent UX, no mode-detection fragility).
|
||||
|
||||
A previous design proposed probing whether the user's bearer has admin
|
||||
scope (via a benign MCP write call like `add_tag`) and auto-executing
|
||||
the source registration when scope was sufficient. The design review
|
||||
flagged that page-write doesn't actually prove source-management
|
||||
permission — those are different scopes in any sensible auth model.
|
||||
Until gbrain ships:
|
||||
- a `mcp__gbrain__whoami` capability tool that returns the bearer's
|
||||
scope set, AND
|
||||
- a `mcp__gbrain__sources_add` MCP tool with admin-scope gating
|
||||
|
||||
we always print the command rather than pretending we know who has
|
||||
permission to run it.
|
||||
|
||||
### CLAUDE.md block in Path 4
|
||||
|
||||
Distinct from local-stdio mode. Token is **never** written to CLAUDE.md
|
||||
(many projects check CLAUDE.md into git). The block records the URL,
|
||||
the verified server version, the artifacts repo URL (if provisioned),
|
||||
and the per-repo trust policy.
|
||||
|
||||
```markdown
|
||||
## GBrain Configuration (configured by /setup-gbrain)
|
||||
- Mode: remote-http
|
||||
- MCP URL: https://wintermute.tail554574.ts.net:3131/mcp
|
||||
- Server version: gbrain v0.27.1
|
||||
- Setup date: 2026-05-06
|
||||
- MCP registered: yes (user scope)
|
||||
- Token: stored in ~/.claude.json (do not commit; never written to CLAUDE.md)
|
||||
- Artifacts repo: github.com/garrytan/gstack-artifacts-garrytan (private)
|
||||
- Artifacts sync: artifacts-only
|
||||
- Current repo policy: read-write
|
||||
```
|
||||
|
||||
### Token rotation
|
||||
|
||||
Server-side. When verify hits `AUTH` (e.g., the brain admin rotated the
|
||||
token), the helper says: "rotate token on the brain host, re-run
|
||||
/setup-gbrain." On wintermute or wherever your gbrain server lives:
|
||||
|
||||
```
|
||||
gbrain access-token rotate # invalidates old, issues new
|
||||
```
|
||||
|
||||
(See `gstack/setup-gbrain/SKILL.md.tmpl` for the full Path 4 flow plus
|
||||
the gbrain enhancement requests around scoped tokens that would let
|
||||
gstack auto-rotate in V2.)
|
||||
|
||||
@@ -768,6 +768,22 @@ Before doing anything, check that /setup-gbrain has been run on this Mac.
|
||||
~/.claude/skills/gstack/bin/gstack-gbrain-detect 2>/dev/null
|
||||
```
|
||||
|
||||
**Remote-MCP mode (Path 4 of /setup-gbrain):** if `gbrain_mcp_mode=remote-http`,
|
||||
this skill is a graceful no-op. The brain server's own indexing cadence
|
||||
handles code import + search refresh; this Mac doesn't run a local gbrain
|
||||
CLI to drive `gbrain sources add` / `sync --strategy code`. Print:
|
||||
|
||||
> "Remote MCP detected (Path 4). /sync-gbrain is local-mode-only in V1.
|
||||
> Your brain server (`<host>` from claude.json) handles indexing on its own
|
||||
> cadence. If indexing seems stale, ping your brain admin or trigger a
|
||||
> manual sync there. To wire `/sync-gbrain` through MCP tools (when gbrain
|
||||
> ships `mcp__gbrain__sources_add` and friends), see the v1.27.0.0+
|
||||
> follow-on TODO."
|
||||
|
||||
Then exit cleanly. Do NOT proceed to Step 2.
|
||||
|
||||
For local-stdio mode and unconfigured states:
|
||||
|
||||
If `gbrain_on_path=false` OR `gbrain_config_exists=false` OR CLAUDE.md does
|
||||
not contain `## GBrain Configuration (configured by /setup-gbrain)`, STOP and
|
||||
tell the user:
|
||||
|
||||
@@ -66,6 +66,22 @@ Before doing anything, check that /setup-gbrain has been run on this Mac.
|
||||
~/.claude/skills/gstack/bin/gstack-gbrain-detect 2>/dev/null
|
||||
```
|
||||
|
||||
**Remote-MCP mode (Path 4 of /setup-gbrain):** if `gbrain_mcp_mode=remote-http`,
|
||||
this skill is a graceful no-op. The brain server's own indexing cadence
|
||||
handles code import + search refresh; this Mac doesn't run a local gbrain
|
||||
CLI to drive `gbrain sources add` / `sync --strategy code`. Print:
|
||||
|
||||
> "Remote MCP detected (Path 4). /sync-gbrain is local-mode-only in V1.
|
||||
> Your brain server (`<host>` from claude.json) handles indexing on its own
|
||||
> cadence. If indexing seems stale, ping your brain admin or trigger a
|
||||
> manual sync there. To wire `/sync-gbrain` through MCP tools (when gbrain
|
||||
> ships `mcp__gbrain__sources_add` and friends), see the v1.27.0.0+
|
||||
> follow-on TODO."
|
||||
|
||||
Then exit cleanly. Do NOT proceed to Step 2.
|
||||
|
||||
For local-stdio mode and unconfigured states:
|
||||
|
||||
If `gbrain_on_path=false` OR `gbrain_config_exists=false` OR CLAUDE.md does
|
||||
not contain `## GBrain Configuration (configured by /setup-gbrain)`, STOP and
|
||||
tell the user:
|
||||
|
||||
@@ -133,7 +133,14 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
|
||||
'plan-eng-finding-count': ['plan-eng-review/**', 'scripts/resolvers/preamble.ts', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble/generate-completion-status.ts', 'test/helpers/claude-pty-runner.ts', 'test/skill-e2e-plan-eng-finding-count.test.ts'],
|
||||
'plan-design-finding-count': ['plan-design-review/**', 'scripts/resolvers/preamble.ts', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble/generate-completion-status.ts', 'test/helpers/claude-pty-runner.ts', 'test/skill-e2e-plan-design-finding-count.test.ts'],
|
||||
'plan-devex-finding-count': ['plan-devex-review/**', 'scripts/resolvers/preamble.ts', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble/generate-completion-status.ts', 'test/helpers/claude-pty-runner.ts', 'test/skill-e2e-plan-devex-finding-count.test.ts'],
|
||||
'brain-privacy-gate': ['scripts/resolvers/preamble/generate-brain-sync-block.ts', 'scripts/resolvers/preamble.ts', 'bin/gstack-brain-sync', 'bin/gstack-brain-init', 'bin/gstack-config', 'test/helpers/agent-sdk-runner.ts'],
|
||||
'brain-privacy-gate': ['scripts/resolvers/preamble/generate-brain-sync-block.ts', 'scripts/resolvers/preamble.ts', 'bin/gstack-brain-sync', 'bin/gstack-artifacts-init', 'bin/gstack-config', 'test/helpers/agent-sdk-runner.ts'],
|
||||
|
||||
// /setup-gbrain Path 4 (Remote MCP) — happy + bad-token end-to-end via
|
||||
// Agent SDK. Gate-tier (deterministic stub server, fixed inputs); fires
|
||||
// when the skill template, the verify helper, the artifacts-init helper,
|
||||
// or the detect script changes.
|
||||
'setup-gbrain-remote': ['setup-gbrain/SKILL.md.tmpl', 'bin/gstack-gbrain-mcp-verify', 'bin/gstack-artifacts-init', 'bin/gstack-gbrain-detect', 'test/helpers/agent-sdk-runner.ts'],
|
||||
'setup-gbrain-bad-token': ['setup-gbrain/SKILL.md.tmpl', 'bin/gstack-gbrain-mcp-verify', 'test/helpers/agent-sdk-runner.ts'],
|
||||
|
||||
// AskUserQuestion format regression (RECOMMENDATION + Completeness: N/10)
|
||||
// Fires when either template OR the two preamble resolvers change.
|
||||
@@ -427,6 +434,12 @@ export const E2E_TIERS: Record<string, 'gate' | 'periodic'> = {
|
||||
// costs ~$0.30-$0.50 per run, not needed on every commit)
|
||||
'brain-privacy-gate': 'periodic',
|
||||
|
||||
// /setup-gbrain Path 4 (Remote MCP) — gate-tier. Stub HTTP server is
|
||||
// deterministic; Path 4's STOP gates are the failure mode this catches
|
||||
// (token in CLAUDE.md, partial registration on bad bearer).
|
||||
'setup-gbrain-remote': 'gate',
|
||||
'setup-gbrain-bad-token': 'gate',
|
||||
|
||||
// AskUserQuestion format regression — periodic (Opus 4.7 non-deterministic benchmark)
|
||||
'plan-ceo-review-format-mode': 'periodic',
|
||||
'plan-ceo-review-format-approach': 'periodic',
|
||||
|
||||
@@ -0,0 +1,120 @@
|
||||
/**
|
||||
* Regression: no stale `gstack-brain-init`, `gbrain_sync_mode`, or
|
||||
* `~/.gstack-brain-remote.txt` references survive the v1.27.0.0 rename.
|
||||
*
|
||||
* Per codex Findings #1 + #8 + #9: the rename's blast radius is wider than
|
||||
* the obvious bin/ + scripts/ surface. This test grep-scans the broader
|
||||
* tree (bin, scripts, *.tmpl, generated *.md, test/, docs/) for the
|
||||
* deprecated identifiers and fails CI if any callers were missed.
|
||||
*
|
||||
* Allowlist: the migration script (`gstack-upgrade/migrations/v1.27.0.0.sh`)
|
||||
* legitimately references the old names — it's the rename actor itself.
|
||||
* Old migration scripts (v1.17.0.0.sh and similar) reference the old names
|
||||
* for their own historical context and are also allowlisted.
|
||||
*
|
||||
* The test is mechanical: if you find yourself adding a non-historical
|
||||
* file to the allowlist, you probably need to actually fix the rename
|
||||
* instead.
|
||||
*/
|
||||
|
||||
import { describe, test, expect } from 'bun:test';
|
||||
import * as fs from 'fs';
|
||||
import * as path from 'path';
|
||||
import { spawnSync } from 'child_process';
|
||||
|
||||
const ROOT = path.resolve(import.meta.dir, '..');
|
||||
|
||||
const ALLOWLIST = [
|
||||
// The migration script that performs the rename. Self-references are expected.
|
||||
'gstack-upgrade/migrations/v1.27.0.0.sh',
|
||||
// Older migration scripts — historical references; these document past state.
|
||||
'gstack-upgrade/migrations/v1.17.0.0.sh',
|
||||
// The migration test itself — it asserts on the migration's behavior.
|
||||
'test/migrations-v1.27.0.0.test.ts',
|
||||
// The test for the v1.17.0.0 historical migration.
|
||||
'test/gstack-upgrade-migration-v1_17_0_0.test.ts',
|
||||
// CHANGELOG entries describe historical state by their nature.
|
||||
'CHANGELOG.md',
|
||||
// TODOS may reference past or future states by name.
|
||||
'TODOS.md',
|
||||
// The plan file for v1.27.0.0 documents why we're renaming.
|
||||
'.context/plans/setup-gbrain-remote-mcp-rename-brain-artifacts.md',
|
||||
// The bin/gstack-config comment explicitly preserves the rename note.
|
||||
'bin/gstack-config',
|
||||
// Detect script's "renamed in v1.27.0.0" comment + brain-remote-fallback path.
|
||||
'bin/gstack-gbrain-detect',
|
||||
// brain-restore + source-wireup keep the old file as a migration-window fallback
|
||||
// (read both, prefer artifacts). brain-uninstall has the same fallback.
|
||||
'bin/gstack-brain-restore',
|
||||
'bin/gstack-gbrain-source-wireup',
|
||||
'bin/gstack-brain-uninstall',
|
||||
// The preamble resolver reads the legacy file as a fallback during the
|
||||
// migration window — same pattern.
|
||||
'scripts/resolvers/preamble/generate-brain-sync-block.ts',
|
||||
// gstack-upgrade.test.ts may exercise old migration behavior.
|
||||
'test/gstack-upgrade.test.ts',
|
||||
// This test itself references the patterns to grep for.
|
||||
'test/no-stale-gstack-brain-refs.test.ts',
|
||||
// memory.md documents the rename context.
|
||||
'setup-gbrain/memory.md',
|
||||
// The new init script's header comment intentionally cites the rename.
|
||||
'bin/gstack-artifacts-init',
|
||||
// The replacement test mirrors the pattern of the old test (lineage note).
|
||||
'test/gstack-artifacts-init.test.ts',
|
||||
// The post-rename-doc-regen test references the patterns it greps for.
|
||||
'test/post-rename-doc-regen.test.ts',
|
||||
// The Path 4 structural lint references some legacy names in comments.
|
||||
'test/setup-gbrain-path4-structure.test.ts',
|
||||
// Generated docs that include the preamble bash (which has the fallback).
|
||||
// We grep template sources, not generated output, by limiting scan paths.
|
||||
];
|
||||
|
||||
const FORBIDDEN_PATTERNS = [
|
||||
'gstack-brain-init',
|
||||
'gbrain_sync_mode',
|
||||
];
|
||||
|
||||
const SCAN_PATHS = [
|
||||
'bin/',
|
||||
'scripts/',
|
||||
'setup-gbrain/SKILL.md.tmpl',
|
||||
'sync-gbrain/SKILL.md.tmpl',
|
||||
'health/SKILL.md.tmpl',
|
||||
'plan-eng-review/SKILL.md.tmpl',
|
||||
'plan-ceo-review/SKILL.md.tmpl',
|
||||
'review/SKILL.md.tmpl',
|
||||
'ship/SKILL.md.tmpl',
|
||||
'test/',
|
||||
];
|
||||
|
||||
function grepRefs(pattern: string): string[] {
|
||||
const args = ['-rn', '--', pattern, ...SCAN_PATHS.map((p) => path.join(ROOT, p))];
|
||||
const r = spawnSync('grep', args, { encoding: 'utf-8' });
|
||||
// grep exits 1 when no matches — that's fine for our purposes.
|
||||
const lines = (r.stdout || '').split('\n').filter((l) => l.trim().length > 0);
|
||||
return lines
|
||||
.map((line) => {
|
||||
// Strip ROOT prefix to get repo-relative path.
|
||||
const colon = line.indexOf(':');
|
||||
const file = line.slice(0, colon);
|
||||
return path.relative(ROOT, file);
|
||||
})
|
||||
.filter((file) => !ALLOWLIST.includes(file))
|
||||
// Filter out any file that's inside a directory we don't actually scan.
|
||||
.filter((file) => !file.startsWith('node_modules/') && !file.startsWith('.git/'));
|
||||
}
|
||||
|
||||
describe('no stale gstack-brain refs (v1.27.0.0 rename)', () => {
|
||||
for (const pattern of FORBIDDEN_PATTERNS) {
|
||||
test(`no non-allowlisted references to "${pattern}"`, () => {
|
||||
const offenders = [...new Set(grepRefs(pattern))];
|
||||
if (offenders.length > 0) {
|
||||
console.error(`Found stale "${pattern}" references in:\n${offenders.map((f) => ` - ${f}`).join('\n')}`);
|
||||
console.error(
|
||||
`If a file is intentionally referencing the old name (migration, historical doc, fallback path), add it to ALLOWLIST in this test.`
|
||||
);
|
||||
}
|
||||
expect(offenders).toEqual([]);
|
||||
});
|
||||
}
|
||||
});
|
||||
@@ -0,0 +1,74 @@
|
||||
// Post-rename doc-regen regression: after `bun run gen:skill-docs`, no
|
||||
// `gstack-brain-init` or `gbrain_sync_mode` strings appear in any of the
|
||||
// generated SKILL.md files (the cross-product blind spot codex
|
||||
// Finding #12 flagged).
|
||||
//
|
||||
// The check runs against the canonical claude-host output already on
|
||||
// disk. We don't shell out to gen-skill-docs again; the existing
|
||||
// freshness check in gen-skill-docs.test.ts covers that. This test
|
||||
// just verifies the rename actually propagated to the generated
|
||||
// artifacts that users see.
|
||||
|
||||
import { describe, test, expect } from 'bun:test';
|
||||
import * as fs from 'fs';
|
||||
import * as path from 'path';
|
||||
|
||||
const ROOT = path.resolve(import.meta.dir, '..');
|
||||
|
||||
const FORBIDDEN_PATTERNS = [
|
||||
// Bare identifier — should NEVER appear in generated docs (if it does,
|
||||
// a template still has the old call site).
|
||||
/^.*\bgstack-brain-init\b.*$/m,
|
||||
/^.*\bgbrain_sync_mode\b.*$/m,
|
||||
];
|
||||
|
||||
// Per the preamble resolver: generated docs DO contain the
|
||||
// "~/.gstack-brain-remote.txt" string in the migration-window fallback. We
|
||||
// don't grep for that — it's intentional. We grep for the call-site
|
||||
// identifiers only.
|
||||
|
||||
function findSkillMdFiles(): string[] {
|
||||
const skillMd = path.join(ROOT, 'SKILL.md');
|
||||
const files: string[] = [skillMd];
|
||||
// Top-level skill directories with their own SKILL.md.
|
||||
const entries = fs.readdirSync(ROOT, { withFileTypes: true });
|
||||
for (const e of entries) {
|
||||
if (e.isDirectory() && !e.name.startsWith('.') && !['node_modules', 'test'].includes(e.name)) {
|
||||
const inner = path.join(ROOT, e.name, 'SKILL.md');
|
||||
if (fs.existsSync(inner)) files.push(inner);
|
||||
}
|
||||
}
|
||||
return files;
|
||||
}
|
||||
|
||||
describe('post-rename doc-regen regression (codex Finding #12)', () => {
|
||||
test('no generated SKILL.md contains "gstack-brain-init"', () => {
|
||||
const offenders: string[] = [];
|
||||
for (const file of findSkillMdFiles()) {
|
||||
const content = fs.readFileSync(file, 'utf-8');
|
||||
const m = content.match(/^.*\bgstack-brain-init\b.*$/m);
|
||||
if (m) offenders.push(`${path.relative(ROOT, file)}: ${m[0].slice(0, 100)}`);
|
||||
}
|
||||
if (offenders.length > 0) {
|
||||
console.error(`Stale "gstack-brain-init" in generated SKILL.md files:\n${offenders.map((o) => ' ' + o).join('\n')}`);
|
||||
}
|
||||
expect(offenders).toEqual([]);
|
||||
});
|
||||
|
||||
test('no generated SKILL.md contains "gbrain_sync_mode"', () => {
|
||||
const offenders: string[] = [];
|
||||
for (const file of findSkillMdFiles()) {
|
||||
const content = fs.readFileSync(file, 'utf-8');
|
||||
const m = content.match(/^.*\bgbrain_sync_mode\b.*$/m);
|
||||
if (m) offenders.push(`${path.relative(ROOT, file)}: ${m[0].slice(0, 100)}`);
|
||||
}
|
||||
if (offenders.length > 0) {
|
||||
console.error(`Stale "gbrain_sync_mode" in generated SKILL.md files:\n${offenders.map((o) => ' ' + o).join('\n')}`);
|
||||
}
|
||||
expect(offenders).toEqual([]);
|
||||
});
|
||||
|
||||
test('top-level SKILL.md exists and is regenerated', () => {
|
||||
expect(fs.existsSync(path.join(ROOT, 'SKILL.md'))).toBe(true);
|
||||
});
|
||||
});
|
||||
@@ -0,0 +1,133 @@
|
||||
// setup-gbrain Path 4 structural lint.
|
||||
//
|
||||
// Verifies the SKILL.md.tmpl has the prose contract that Path 4 (Remote MCP)
|
||||
// depends on: STOP gates after verify failures, never-write-token rules,
|
||||
// mode-aware CLAUDE.md block, idempotent re-run path.
|
||||
//
|
||||
// Why a structural test instead of a full Agent SDK E2E:
|
||||
// - Side effects (claude.json mutation, MCP registration) are covered
|
||||
// by unit tests for gstack-gbrain-mcp-verify and gstack-artifacts-init.
|
||||
// - The structural prose is the source of regressions for AUQ pacing
|
||||
// (the failure mode the gstack repo has tracked since v1.26.x:
|
||||
// "wrote_findings_before_asking"). A grep-based regression on the
|
||||
// template prose is fast (<200ms), free, and catches the same drift
|
||||
// as the paid E2E without spending tokens.
|
||||
// - The full Agent SDK E2E remains the right tool for end-to-end
|
||||
// pacing eval; this is the gate-tier check that catches the failure
|
||||
// class deterministically.
|
||||
|
||||
import { describe, test, expect } from 'bun:test';
|
||||
import * as fs from 'fs';
|
||||
import * as path from 'path';
|
||||
|
||||
const ROOT = path.resolve(import.meta.dir, '..');
|
||||
const TMPL = path.join(ROOT, 'setup-gbrain', 'SKILL.md.tmpl');
|
||||
|
||||
const tmpl = fs.readFileSync(TMPL, 'utf-8');
|
||||
|
||||
describe('setup-gbrain Path 4 (Remote MCP) — structural contract', () => {
|
||||
test('Step 2 lists Path 4 as one of the path options', () => {
|
||||
// "4 — Remote gbrain MCP" with em-dash (—, U+2014 — one codepoint).
|
||||
expect(tmpl).toMatch(/\*\*4 . Remote gbrain MCP/);
|
||||
});
|
||||
|
||||
test('Step 4 has a Path 4 sub-section', () => {
|
||||
expect(tmpl).toMatch(/### Path 4 \(Remote gbrain MCP/);
|
||||
});
|
||||
|
||||
test('Step 4 collects the bearer via read_secret_to_env, never argv', () => {
|
||||
// The secret-read helper is the canonical token-capture pattern.
|
||||
// Without it, tokens land in shell history.
|
||||
expect(tmpl).toContain('read_secret_to_env GBRAIN_MCP_TOKEN');
|
||||
});
|
||||
|
||||
test('Step 4c invokes gstack-gbrain-mcp-verify and STOPs on failure', () => {
|
||||
expect(tmpl).toContain('gstack-gbrain-mcp-verify');
|
||||
// The STOP rule is what prevents partial registration after auth fail.
|
||||
const path4Section = tmpl.split('### Path 4')[1] || '';
|
||||
expect(path4Section).toMatch(/STOP/);
|
||||
});
|
||||
|
||||
test('Step 4d explicitly skips Steps 3, 4 (other paths), 5, 7.5 in remote mode', () => {
|
||||
expect(tmpl).toMatch(/4d.*[Ss]kip Steps? 3, 4.*5.*7\.5/s);
|
||||
});
|
||||
|
||||
test('Step 5a has a Path 4 branch with claude mcp add --transport http', () => {
|
||||
expect(tmpl).toMatch(/Path 4 \(Remote MCP/);
|
||||
expect(tmpl).toMatch(/claude mcp add --scope user --transport http gbrain/);
|
||||
expect(tmpl).toContain('Authorization: Bearer $GBRAIN_MCP_TOKEN');
|
||||
// Token must be unset after registration so it doesn't linger in env.
|
||||
expect(tmpl).toMatch(/unset GBRAIN_MCP_TOKEN/);
|
||||
});
|
||||
|
||||
test('Step 5a removes any prior gbrain registration before adding the new one', () => {
|
||||
// Otherwise local-stdio + remote-http coexist, which breaks routing.
|
||||
expect(tmpl).toMatch(/claude mcp remove gbrain/);
|
||||
});
|
||||
|
||||
test('Step 7 calls gstack-artifacts-init with --url-form-supported flag', () => {
|
||||
expect(tmpl).toMatch(/gstack-artifacts-init.*--url-form-supported/);
|
||||
});
|
||||
|
||||
test('Step 8 CLAUDE.md block branches on mode', () => {
|
||||
// The remote-http block has Mode: remote-http; local-stdio block has Engine:.
|
||||
expect(tmpl).toMatch(/### Path 4 \(Remote MCP\)/);
|
||||
expect(tmpl).toMatch(/Mode: remote-http/);
|
||||
expect(tmpl).toMatch(/Mode: local-stdio/);
|
||||
});
|
||||
|
||||
test('Step 8 explicitly says the bearer is never written to CLAUDE.md', () => {
|
||||
// Token-leak regression guard. CLAUDE.md is committed in many projects.
|
||||
expect(tmpl).toMatch(/bearer token is \*\*never\*\* written to CLAUDE\.md/);
|
||||
});
|
||||
|
||||
test('Step 9 smoke test on Path 4 prints a placeholder, never the real token', () => {
|
||||
// Don't paste the token into the curl example the user might share.
|
||||
expect(tmpl).toMatch(/<YOUR_TOKEN>/);
|
||||
});
|
||||
|
||||
test('Step 10 verdict block has a remote-http variant separate from local-stdio', () => {
|
||||
expect(tmpl).toMatch(/### Path 4 \(Remote MCP\)/);
|
||||
expect(tmpl).toMatch(/mode: remote-http/);
|
||||
expect(tmpl).toMatch(/N\/A.*remote mode/);
|
||||
});
|
||||
|
||||
test('idempotency: re-running with gbrain_mcp_mode=remote-http skips Step 2', () => {
|
||||
// Re-run path stays graceful; no double-registration.
|
||||
expect(tmpl).toMatch(/gbrain_mcp_mode=remote-http/);
|
||||
});
|
||||
|
||||
test('Step 5 (local doctor) explicitly skips on Path 4', () => {
|
||||
expect(tmpl).toMatch(/SKIP entirely on Path 4 \(Remote MCP\)/);
|
||||
});
|
||||
|
||||
test('Step 7.5 (transcript ingest) explicitly skips on Path 4', () => {
|
||||
// Transcript ingest needs local gbrain CLI which Path 4 doesn't install.
|
||||
const matches = tmpl.match(/SKIP entirely on Path 4 \(Remote MCP\)/g);
|
||||
expect(matches?.length).toBeGreaterThanOrEqual(2);
|
||||
});
|
||||
});
|
||||
|
||||
describe('setup-gbrain Path 4 — token security regressions', () => {
|
||||
test('the template never inlines a real-shaped bearer string', () => {
|
||||
// We never want a literal "gbrain_<hex>" token to appear in the
|
||||
// template — placeholders only. This catches the failure mode where
|
||||
// someone copies a real token into the template by accident.
|
||||
const realTokenShape = /gbrain_[a-f0-9]{40,}/;
|
||||
expect(tmpl).not.toMatch(realTokenShape);
|
||||
});
|
||||
|
||||
test('Path 4 always uses env-var $GBRAIN_MCP_TOKEN, never inline strings', () => {
|
||||
// Find every reference to the bearer header in Path 4 and verify it's
|
||||
// either an env-var expansion or an explicit placeholder. Allow:
|
||||
// - $GBRAIN_MCP_TOKEN (env-var expansion)
|
||||
// - <bearer>, <YOUR_TOKEN>, <TOKEN> (placeholder)
|
||||
// - "..." (rest-of-doc-text continuation; a doc note showing how
|
||||
// `claude mcp add --header` shapes its argv).
|
||||
const path4Section = tmpl.match(/### Path 4 \(Remote MCP[\s\S]*?(?=###|## )/g)?.join('') || '';
|
||||
const bearerLines = path4Section.match(/Bearer\s+\S+/g) || [];
|
||||
for (const line of bearerLines) {
|
||||
expect(line).toMatch(/Bearer (\$GBRAIN_MCP_TOKEN|<bearer>|<YOUR_TOKEN>|<TOKEN>|\.\.\."?)/);
|
||||
}
|
||||
});
|
||||
});
|
||||
@@ -0,0 +1,148 @@
|
||||
// E2E: /setup-gbrain Path 4 with a bad bearer token via Agent SDK.
|
||||
//
|
||||
// Drives the skill against a stub HTTP MCP server that returns 401
|
||||
// (auth-shape body). Asserts that the AUTH classifier hint shows up
|
||||
// AND no MCP registration happens (no claude mcp add --transport http
|
||||
// in the call log; no half-written CLAUDE.md block). This is the
|
||||
// regression guard for the "verify failed → STOP" gate.
|
||||
//
|
||||
// Cost: ~$0.30-$0.50 per run. Gate-tier (EVALS=1 EVALS_TIER=gate).
|
||||
|
||||
import { describe, test, expect } from 'bun:test';
|
||||
import * as fs from 'fs';
|
||||
import * as os from 'os';
|
||||
import * as path from 'path';
|
||||
import * as http from 'http';
|
||||
import { runAgentSdkTest, passThroughNonAskUserQuestion, resolveClaudeBinary } from './helpers/agent-sdk-runner';
|
||||
|
||||
const shouldRun = !!process.env.EVALS && (process.env.EVALS_TIER === 'gate' || !process.env.EVALS_TIER);
|
||||
const describeE2E = shouldRun ? describe : describe.skip;
|
||||
|
||||
function startStub401(): Promise<{ url: string; close: () => Promise<void> }> {
|
||||
return new Promise((resolve) => {
|
||||
const server = http.createServer((req, res) => {
|
||||
let body = '';
|
||||
req.on('data', (c) => (body += c));
|
||||
req.on('end', () => {
|
||||
res.statusCode = 401;
|
||||
res.setHeader('Content-Type', 'application/json');
|
||||
res.end(
|
||||
JSON.stringify({ error: 'unauthorized', error_description: 'invalid or expired auth token' })
|
||||
);
|
||||
});
|
||||
});
|
||||
server.listen(0, '127.0.0.1', () => {
|
||||
const addr = server.address();
|
||||
if (!addr || typeof addr === 'string') throw new Error('no address');
|
||||
resolve({
|
||||
url: `http://127.0.0.1:${addr.port}/mcp`,
|
||||
close: () => new Promise((r) => server.close(() => r())),
|
||||
});
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
function makeFakeClaude(fakeBinDir: string): string {
|
||||
const callLog = path.join(fakeBinDir, 'claude-calls.log');
|
||||
const script = `#!/bin/bash
|
||||
echo "claude $@" >> "${callLog}"
|
||||
case "$1 $2" in
|
||||
"mcp add") exit 0 ;;
|
||||
"mcp list") echo "no gbrain" ; exit 0 ;;
|
||||
"mcp remove") exit 0 ;;
|
||||
"mcp get") exit 1 ;;
|
||||
esac
|
||||
exit 0
|
||||
`;
|
||||
fs.writeFileSync(path.join(fakeBinDir, 'claude'), script, { mode: 0o755 });
|
||||
return callLog;
|
||||
}
|
||||
|
||||
describeE2E('/setup-gbrain Path 4 — bad token STOPs cleanly', () => {
|
||||
test('AUTH classifier fires, no MCP registration, no CLAUDE.md mutation', async () => {
|
||||
const stubServer = await startStub401();
|
||||
const gstackHome = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-bad-'));
|
||||
const fakeBinDir = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-bad-bin-'));
|
||||
const callLog = makeFakeClaude(fakeBinDir);
|
||||
|
||||
const ORIGINAL_CLAUDE_MD = '# Test project\n\nSome existing content here.\n';
|
||||
fs.writeFileSync(path.join(gstackHome, 'CLAUDE.md'), ORIGINAL_CLAUDE_MD);
|
||||
|
||||
const BAD_TOKEN = 'gbrain_BAD_TOKEN_67890_DELIBERATELY_INVALID';
|
||||
const askUserQuestions: Array<{ input: Record<string, unknown> }> = [];
|
||||
const binary = resolveClaudeBinary();
|
||||
|
||||
const orig = {
|
||||
gstackHome: process.env.GSTACK_HOME,
|
||||
pathEnv: process.env.PATH,
|
||||
mcpToken: process.env.GBRAIN_MCP_TOKEN,
|
||||
};
|
||||
process.env.GSTACK_HOME = gstackHome;
|
||||
process.env.PATH = `${fakeBinDir}:${path.join(path.resolve(import.meta.dir, '..'), 'bin')}:${process.env.PATH ?? '/usr/bin:/bin:/opt/homebrew/bin'}`;
|
||||
process.env.GBRAIN_MCP_TOKEN = BAD_TOKEN;
|
||||
|
||||
let modelTextOutput = '';
|
||||
|
||||
try {
|
||||
const skillPath = path.resolve(import.meta.dir, '..', 'setup-gbrain', 'SKILL.md');
|
||||
const result = await runAgentSdkTest({
|
||||
systemPrompt: { type: 'preset', preset: 'claude_code' },
|
||||
userPrompt:
|
||||
`Read the skill file at ${skillPath} and follow Path 4 (Remote MCP) only. ` +
|
||||
`Use this MCP URL: ${stubServer.url}. ` +
|
||||
`The bearer token is already in the GBRAIN_MCP_TOKEN env var. ` +
|
||||
`If verify fails (Step 4c), follow the skill's STOP rule — surface the error and stop. ` +
|
||||
`Do NOT register the MCP if verify failed. ` +
|
||||
`Do NOT modify CLAUDE.md if verify failed.`,
|
||||
workingDirectory: gstackHome,
|
||||
maxTurns: 15,
|
||||
allowedTools: ['Read', 'Grep', 'Glob', 'Bash', 'Write', 'Edit'],
|
||||
...(binary ? { pathToClaudeCodeExecutable: binary } : {}),
|
||||
canUseTool: async (toolName, input) => {
|
||||
if (toolName === 'AskUserQuestion') {
|
||||
askUserQuestions.push({ input });
|
||||
const q = (input.questions as Array<{
|
||||
question: string;
|
||||
options: Array<{ label: string }>;
|
||||
}>)[0];
|
||||
const decline = q.options.find((o) => /skip|decline|no/i.test(o.label)) ?? q.options[0]!;
|
||||
return {
|
||||
behavior: 'allow',
|
||||
updatedInput: { questions: input.questions, answers: { [q.question]: decline.label } },
|
||||
};
|
||||
}
|
||||
return passThroughNonAskUserQuestion(toolName, input);
|
||||
},
|
||||
});
|
||||
|
||||
modelTextOutput = JSON.stringify(result);
|
||||
|
||||
// Assertion 1: the AUTH classifier hint surfaced somewhere in the run.
|
||||
// The verify helper outputs `"error_class": "AUTH"` and the hint
|
||||
// "rotate token on the brain host" — at least one should be visible.
|
||||
const hintShown =
|
||||
/error_class.*AUTH/i.test(modelTextOutput) ||
|
||||
/rotate token/i.test(modelTextOutput) ||
|
||||
/AUTH.*HTTP 401/i.test(modelTextOutput);
|
||||
expect(hintShown).toBe(true);
|
||||
|
||||
// Assertion 2: claude mcp add was NEVER called (verify failed → STOP).
|
||||
const calls = fs.existsSync(callLog) ? fs.readFileSync(callLog, 'utf-8') : '';
|
||||
expect(calls).not.toMatch(/mcp add.*--transport http/);
|
||||
|
||||
// Assertion 3: CLAUDE.md is unchanged (no half-written block).
|
||||
const finalClaudeMd = fs.readFileSync(path.join(gstackHome, 'CLAUDE.md'), 'utf-8');
|
||||
expect(finalClaudeMd).toBe(ORIGINAL_CLAUDE_MD);
|
||||
|
||||
// Assertion 4: the bad token never leaked to CLAUDE.md.
|
||||
expect(finalClaudeMd).not.toContain(BAD_TOKEN);
|
||||
} finally {
|
||||
if (orig.gstackHome === undefined) delete process.env.GSTACK_HOME; else process.env.GSTACK_HOME = orig.gstackHome;
|
||||
if (orig.pathEnv === undefined) delete process.env.PATH; else process.env.PATH = orig.pathEnv;
|
||||
if (orig.mcpToken === undefined) delete process.env.GBRAIN_MCP_TOKEN; else process.env.GBRAIN_MCP_TOKEN = orig.mcpToken;
|
||||
await stubServer.close();
|
||||
fs.rmSync(gstackHome, { recursive: true, force: true });
|
||||
fs.rmSync(fakeBinDir, { recursive: true, force: true });
|
||||
}
|
||||
}, 240_000);
|
||||
});
|
||||
@@ -0,0 +1,214 @@
|
||||
// E2E: /setup-gbrain Path 4 (Remote MCP) happy path via Agent SDK.
|
||||
//
|
||||
// Drives the skill against a stub HTTP MCP server and a stubbed `claude`
|
||||
// binary that records `claude mcp add` calls. Asserts:
|
||||
// - The verify helper succeeds (no AUTH/MALFORMED/NETWORK error in output)
|
||||
// - The skill calls `claude mcp add --transport http` with the bearer
|
||||
// - The token NEVER appears in the CLAUDE.md block the skill writes
|
||||
// - The wrote_findings_before_asking failure mode is NOT triggered
|
||||
//
|
||||
// Cost: ~$0.30-$0.50 per run. Gate-tier (EVALS=1 EVALS_TIER=gate).
|
||||
//
|
||||
// See setup-gbrain/SKILL.md.tmpl Step 4 (Path 4) for the contract under test.
|
||||
|
||||
import { describe, test, expect } from 'bun:test';
|
||||
import * as fs from 'fs';
|
||||
import * as os from 'os';
|
||||
import * as path from 'path';
|
||||
import * as http from 'http';
|
||||
import { runAgentSdkTest, passThroughNonAskUserQuestion, resolveClaudeBinary } from './helpers/agent-sdk-runner';
|
||||
|
||||
const shouldRun = !!process.env.EVALS && (process.env.EVALS_TIER === 'gate' || !process.env.EVALS_TIER);
|
||||
const describeE2E = shouldRun ? describe : describe.skip;
|
||||
|
||||
// Spin up a stub MCP server that responds to initialize + tools/list.
|
||||
function startStubMcpServer(opts: { failWithStatus?: number; failBody?: string } = {}): Promise<{ url: string; close: () => Promise<void> }> {
|
||||
return new Promise((resolve) => {
|
||||
const server = http.createServer((req, res) => {
|
||||
if (req.method !== 'POST' || !(req.url ?? '').endsWith('/mcp')) {
|
||||
res.statusCode = 404;
|
||||
res.end();
|
||||
return;
|
||||
}
|
||||
let body = '';
|
||||
req.on('data', (c) => (body += c));
|
||||
req.on('end', () => {
|
||||
if (opts.failWithStatus) {
|
||||
res.statusCode = opts.failWithStatus;
|
||||
res.setHeader('Content-Type', 'application/json');
|
||||
res.end(opts.failBody ?? JSON.stringify({ error: 'fail' }));
|
||||
return;
|
||||
}
|
||||
const reqJson = (() => {
|
||||
try { return JSON.parse(body); } catch { return {} as any; }
|
||||
})();
|
||||
let respBody: any;
|
||||
if (reqJson.method === 'initialize') {
|
||||
respBody = {
|
||||
result: {
|
||||
protocolVersion: '2024-11-05',
|
||||
capabilities: { tools: {} },
|
||||
serverInfo: { name: 'gbrain', version: '0.27.1' },
|
||||
},
|
||||
jsonrpc: '2.0',
|
||||
id: reqJson.id,
|
||||
};
|
||||
} else if (reqJson.method === 'tools/list') {
|
||||
respBody = { result: { tools: [{ name: 'search' }, { name: 'put_page' }] }, jsonrpc: '2.0', id: reqJson.id };
|
||||
} else {
|
||||
respBody = { error: { code: -32601, message: 'unknown method' }, jsonrpc: '2.0', id: reqJson.id };
|
||||
}
|
||||
// SSE-shape since the verify helper supports both, and many MCP
|
||||
// servers (including wintermute) wrap responses as SSE.
|
||||
res.statusCode = 200;
|
||||
res.setHeader('Content-Type', 'text/event-stream');
|
||||
res.end(`event: message\ndata: ${JSON.stringify(respBody)}\n\n`);
|
||||
});
|
||||
});
|
||||
server.listen(0, '127.0.0.1', () => {
|
||||
const addr = server.address();
|
||||
if (!addr || typeof addr === 'string') throw new Error('no address');
|
||||
resolve({
|
||||
url: `http://127.0.0.1:${addr.port}/mcp`,
|
||||
close: () => new Promise((r) => server.close(() => r())),
|
||||
});
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
// Stubbed `claude` binary: intercepts `mcp add` and `mcp list` commands so
|
||||
// the skill's Step 5a registration appears to succeed, while we record
|
||||
// every invocation for assertions.
|
||||
function makeFakeClaude(fakeBinDir: string): string {
|
||||
const claudeJsonPath = path.join(fakeBinDir, 'claude.json');
|
||||
const callLog = path.join(fakeBinDir, 'claude-calls.log');
|
||||
const script = `#!/bin/bash
|
||||
echo "claude $@" >> "${callLog}"
|
||||
case "$1 $2" in
|
||||
"mcp add")
|
||||
# Just record the call; pretend it succeeded.
|
||||
exit 0
|
||||
;;
|
||||
"mcp list")
|
||||
echo "gbrain: http://127.0.0.1:0/mcp (HTTP) - ✓ Connected"
|
||||
exit 0
|
||||
;;
|
||||
"mcp remove")
|
||||
exit 0
|
||||
;;
|
||||
"mcp get")
|
||||
# First few calls return "no entry"; after mcp add fires, return success.
|
||||
if [ -f "${claudeJsonPath}" ]; then
|
||||
cat "${claudeJsonPath}"
|
||||
exit 0
|
||||
fi
|
||||
exit 1
|
||||
;;
|
||||
esac
|
||||
exit 0
|
||||
`;
|
||||
fs.writeFileSync(path.join(fakeBinDir, 'claude'), script, { mode: 0o755 });
|
||||
return callLog;
|
||||
}
|
||||
|
||||
describeE2E('/setup-gbrain Path 4 (Remote MCP) — happy path', () => {
|
||||
test('verifies, registers HTTP MCP, never writes token to CLAUDE.md', async () => {
|
||||
const stubServer = await startStubMcpServer();
|
||||
const gstackHome = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-remote-'));
|
||||
const fakeBinDir = fs.mkdtempSync(path.join(os.tmpdir(), 'setup-gbrain-remote-bin-'));
|
||||
const callLog = makeFakeClaude(fakeBinDir);
|
||||
|
||||
// The skill writes CLAUDE.md in cwd. Use gstackHome as cwd so we
|
||||
// can inspect it after the run.
|
||||
fs.writeFileSync(path.join(gstackHome, 'CLAUDE.md'), '# Test project\n');
|
||||
|
||||
const SECRET_TOKEN = 'gbrain_TEST_TOKEN_THAT_MUST_NEVER_LEAK_84613';
|
||||
const askUserQuestions: Array<{ input: Record<string, unknown> }> = [];
|
||||
const binary = resolveClaudeBinary();
|
||||
|
||||
// Ambient env mutations. Restored in finally.
|
||||
const orig = {
|
||||
gstackHome: process.env.GSTACK_HOME,
|
||||
pathEnv: process.env.PATH,
|
||||
mcpToken: process.env.GBRAIN_MCP_TOKEN,
|
||||
};
|
||||
process.env.GSTACK_HOME = gstackHome;
|
||||
process.env.PATH = `${fakeBinDir}:${path.join(path.resolve(import.meta.dir, '..'), 'bin')}:${process.env.PATH ?? '/usr/bin:/bin:/opt/homebrew/bin'}`;
|
||||
process.env.GBRAIN_MCP_TOKEN = SECRET_TOKEN;
|
||||
|
||||
let modelTextOutput = '';
|
||||
|
||||
try {
|
||||
const skillPath = path.resolve(import.meta.dir, '..', 'setup-gbrain', 'SKILL.md');
|
||||
const result = await runAgentSdkTest({
|
||||
systemPrompt: { type: 'preset', preset: 'claude_code' },
|
||||
userPrompt:
|
||||
`Read the skill file at ${skillPath} and follow Path 4 (Remote MCP) only. ` +
|
||||
`Use this MCP URL: ${stubServer.url}. ` +
|
||||
`The bearer token is already in the GBRAIN_MCP_TOKEN env var (do not echo it). ` +
|
||||
`Skip the privacy gate — answer "Decline" if the preamble fires. ` +
|
||||
`Skip the artifacts-repo provisioning step (Step 7) — answer "No thanks". ` +
|
||||
`Skip per-remote policy (Step 6) — answer "skip-for-now". ` +
|
||||
`Walk through Steps 4a, 4b, 4c, 5a, 8, 10 ONLY.`,
|
||||
workingDirectory: gstackHome,
|
||||
maxTurns: 25,
|
||||
allowedTools: ['Read', 'Grep', 'Glob', 'Bash', 'Write', 'Edit'],
|
||||
...(binary ? { pathToClaudeCodeExecutable: binary } : {}),
|
||||
canUseTool: async (toolName, input) => {
|
||||
if (toolName === 'AskUserQuestion') {
|
||||
askUserQuestions.push({ input });
|
||||
const q = (input.questions as Array<{
|
||||
question: string;
|
||||
options: Array<{ label: string }>;
|
||||
}>)[0];
|
||||
// Auto-decline / skip everything except the path-pick (which the
|
||||
// user-prompt already directed to Path 4).
|
||||
const decline =
|
||||
q.options.find((o) => /skip|decline|no thanks|local/i.test(o.label)) ?? q.options[q.options.length - 1]!;
|
||||
return {
|
||||
behavior: 'allow',
|
||||
updatedInput: {
|
||||
questions: input.questions,
|
||||
answers: { [q.question]: decline.label },
|
||||
},
|
||||
};
|
||||
}
|
||||
return passThroughNonAskUserQuestion(toolName, input);
|
||||
},
|
||||
});
|
||||
|
||||
modelTextOutput = JSON.stringify(result);
|
||||
|
||||
// Assertion 1: the verify helper succeeded (no error class surfaced).
|
||||
expect(modelTextOutput).not.toMatch(/error_class.*NETWORK/i);
|
||||
expect(modelTextOutput).not.toMatch(/error_class.*AUTH/i);
|
||||
expect(modelTextOutput).not.toMatch(/error_class.*MALFORMED/i);
|
||||
|
||||
// Assertion 2: claude mcp add was called with --transport http.
|
||||
const calls = fs.existsSync(callLog) ? fs.readFileSync(callLog, 'utf-8') : '';
|
||||
expect(calls).toMatch(/mcp add.*--transport http/);
|
||||
|
||||
// Assertion 3: the secret token NEVER appears in the final CLAUDE.md.
|
||||
const claudeMd = fs.readFileSync(path.join(gstackHome, 'CLAUDE.md'), 'utf-8');
|
||||
expect(claudeMd).not.toContain(SECRET_TOKEN);
|
||||
|
||||
// Assertion 4: CLAUDE.md got the remote-http block.
|
||||
expect(claudeMd).toMatch(/Mode: remote-http/);
|
||||
|
||||
// Assertion 5: classifier — the model didn't write findings before
|
||||
// asking. The Path 4 prose has 5 STOP gates; if any of them got
|
||||
// skipped, that's the wrote_findings_before_asking pattern.
|
||||
const wroteBefore = /## GSTACK REVIEW REPORT|critical_gaps/i.test(modelTextOutput);
|
||||
// Setup-gbrain doesn't have a review report contract, so this is
|
||||
// a structural shape check, not a hard failure mode.
|
||||
expect(wroteBefore).toBe(false);
|
||||
} finally {
|
||||
if (orig.gstackHome === undefined) delete process.env.GSTACK_HOME; else process.env.GSTACK_HOME = orig.gstackHome;
|
||||
if (orig.pathEnv === undefined) delete process.env.PATH; else process.env.PATH = orig.pathEnv;
|
||||
if (orig.mcpToken === undefined) delete process.env.GBRAIN_MCP_TOKEN; else process.env.GBRAIN_MCP_TOKEN = orig.mcpToken;
|
||||
await stubServer.close();
|
||||
fs.rmSync(gstackHome, { recursive: true, force: true });
|
||||
fs.rmSync(fakeBinDir, { recursive: true, force: true });
|
||||
}
|
||||
}, 240_000);
|
||||
});
|
||||
Reference in New Issue
Block a user