test: collision sentinel covers every gstack skill across every host

Universal insurance policy against upstream slash-command shadowing. The /checkpoint bug (Claude Code shipped /checkpoint as a /rewind alias, silently shadowing the gstack skill) cost us weeks of user confusion before we realized. This test is the "never again" check: enumerate every gstack skill name and cross-check against a per-host list of known built-in slash commands. Architecture: - KNOWN_BUILTINS per host. Currently Claude Code: 23 built-ins (checkpoint, rewind, compact, plan, cost, stats, context, usage, help, clear, quit, exit, agents, mcp, model, permissions, config, init, review, security-review, continue, bare, model). Sourced from docs + live skill-list dumps + claude --help output. - KNOWN_COLLISIONS_TOLERATED: skill names that DO collide but we've consciously decided to live with. Mandatory justification comment per entry. - GENERIC_VERB_WATCHLIST: advisory list of names at higher risk of future collision (save, load, run, deploy, start, stop, etc.). Prints a warning but doesn't fail. Tests (6 total, 26ms, free tier): 1. At least one skill discovered (enumerator sanity) 2. No duplicate skill names within gstack 3. No skill name collides with any claude-code built-in (with KNOWN_COLLISIONS_TOLERATED escape hatch) 4. KNOWN_COLLISIONS_TOLERATED entries are all still live collisions (prevents stale exceptions rotting after a rename) 5. The /checkpoint rename actually landed (checkpoint not in skills, context-save and context-restore are) 6. Advisory: generic-verb watchlist (informational only) Current real collisions: - /review — gstack pre-dates Claude Code's /review. Tolerated with written justification (track user confusion, rename to /diff-review if it bites). The rest of gstack is collision-free. Maintenance: when a host ships a new built-in, add the name to the host's KNOWN_BUILTINS list. If a gstack skill needs to coexist with a built-in, add an entry to KNOWN_COLLISIONS_TOLERATED with a written justification. Blind additions fail code review. TODO: add codex/kiro/opencode/slate/cursor/openclaw/hermes/factory/ gbrain built-in lists as we encounter collisions. Claude Code is the primary shadow risk (biggest audience, fastest release cadence). Note: bun's parser chokes on backticks inside block comments (spec- legal but regex-breaking in @oven/bun-parser). Workaround: avoid them.
2026-05-02 03:35:09 +02:00 · 2026-04-19 05:47:41 +08:00
parent bdcf2504ae
commit e22ce8e28e
1 changed files with 228 additions and 0 deletions
@@ -0,0 +1,228 @@
+/**
+ * Collision Sentinel — insurance policy against upstream slash-command collisions.
+ *
+ * History: in April 2026 Claude Code shipped /checkpoint as a native alias
+ * for /rewind, silently shadowing the gstack /checkpoint skill. Users
+ * typed /checkpoint expecting to save state; agents routed to the built-in
+ * or confabulated "this is a built-in you need to type directly" and nothing
+ * was saved. We found out from users, not from tests.
+ *
+ * This file is the "never again" test. It enumerates every gstack skill name
+ * from every SKILL.md.tmpl file in the repo and cross-checks against a
+ * per-host list of known built-in slash commands. If any gstack skill name
+ * collides with a host built-in, this test fails and names the collision.
+ *
+ * Maintenance: when Claude Code (or any other host we support) ships a new
+ * built-in slash command, add the name to the host's KNOWN_BUILTINS list
+ * below. If a gstack skill needs to coexist with a built-in anyway (e.g.,
+ * we decide the semantic overlap is acceptable), add it to
+ * KNOWN_COLLISIONS_TOLERATED with a written justification.
+ *
+ * Free tier. ~50ms runtime.
+ */
+
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
+
+const ROOT = path.resolve(import.meta.dir, '..');
+
+// ─── Host built-in registries ──────────────────────────────────────────────
+//
+// One const per host we support. Names are the slash-command identifier WITHOUT
+// the leading slash. Keep sorted alphabetically within each host so diffs are
+// reviewable. Cite the source (docs URL, release notes, or "observed") in the
+// comment next to each entry — future maintainers need to know why an entry
+// is on the list.
+
+const KNOWN_BUILTINS: Record<string, string[]> = {
+  'claude-code': [
+    // Slash commands observed in 'claude --help' or cited in docs as of 2026-04.
+    // Sources:
+    //   https://code.claude.com/docs/en/checkpointing
+    //   https://claudelog.com/mechanics/rewind/
+    //   claude --help output
+    //   Claude Code skill list dumps from live sessions
+    'agents',         // Agent config
+    'bare',           // Minimal mode
+    'checkpoint',     // Alias of /rewind (the collision that started this file)
+    'clear',          // Clear the conversation
+    'compact',        // Context compaction
+    'config',         // Config UI
+    'context',        // Context usage display
+    'continue',       // --continue / resume last conversation
+    'cost',           // Cost display
+    'exit',           // Exit shell
+    'help',           // Help
+    'init',           // Initialize a new CLAUDE.md file
+    'mcp',            // MCP server config
+    'model',          // Model selection
+    'permissions',    // Permission config
+    'plan',           // Plan mode toggle (also Shift+Tab)
+    'quit',           // Quit
+    'review',         // Review a pull request (BUILT-IN shipped in 2026)
+    'rewind',         // Conversation rewind
+    'security-review', // Security audit of pending changes
+    'stats',          // Session stats
+    'usage',          // API usage stats
+  ],
+  // Add codex/kiro/opencode/slate/cursor/openclaw/hermes/factory/gbrain
+  // built-in lists when we encounter collisions. Claude Code is the primary
+  // shadow risk because it's the biggest audience and ships the most
+  // frequently; other hosts collide less often.
+  // TODO: codex CLI built-ins (login, logout, exec, review, etc. — but we
+  // invoke codex from gstack, we don't install skills INTO codex the same
+  // way, so this is lower priority).
+};
+
+// Collisions we know about and have consciously decided to tolerate. The
+// justification is mandatory — reviewers need the context next time the
+// user reports confusion, and blind additions to this map should fail code
+// review.
+const KNOWN_COLLISIONS_TOLERATED: Record<string, string> = {
+  // skill name → one-line justification + action plan
+  'review': 'gstack /review (pre-landing diff analysis) pre-dates the Claude Code built-in /review (Review a pull request). The gstack skill is much richer (SQL safety, LLM trust boundary, specialist dispatch). Watch for user confusion reports and consider renaming to /diff-review or /pre-land if the collision bites. TODO: track user-reported incidents in TODOS.md.',
+};
+
+// Generic-verb watchlist: skill names that are single common verbs, which
+// are at higher risk of being claimed by a future host built-in. Advisory
+// only — the test prints a warning but doesn't fail. If a name here stops
+// being safe, move it to the appropriate host's KNOWN_BUILTINS list.
+const GENERIC_VERB_WATCHLIST = [
+  'save', 'load', 'run', 'test', 'build', 'deploy',
+  'fork', 'branch', 'commit', 'push', 'pull', 'merge', 'rebase',
+  'start', 'stop', 'restart', 'reset', 'pause', 'resume',
+  'show', 'list', 'find', 'search', 'view',
+  'create', 'delete', 'remove', 'update', 'rename',
+  'login', 'logout', 'auth',
+];
+
+// ─── Enumerator ────────────────────────────────────────────────────────────
+
+interface GstackSkill {
+  name: string;
+  templatePath: string;
+}
+
+function enumerateGstackSkills(): GstackSkill[] {
+  const skills: GstackSkill[] = [];
+  // Scan one level deep for */SKILL.md.tmpl plus root SKILL.md.tmpl.
+  const candidates = [
+    path.join(ROOT, 'SKILL.md.tmpl'),
+    ...fs.readdirSync(ROOT, { withFileTypes: true })
+      .filter((d) => d.isDirectory())
+      .map((d) => path.join(ROOT, d.name, 'SKILL.md.tmpl')),
+  ];
+  for (const tmpl of candidates) {
+    if (!fs.existsSync(tmpl)) continue;
+    const content = fs.readFileSync(tmpl, 'utf-8');
+    // Parse the 'name:' field from YAML frontmatter.
+    const frontmatter = content.match(/^---\n([\s\S]+?)\n---/);
+    if (!frontmatter) continue;
+    const nameMatch = frontmatter[1].match(/^name:\s*(\S+)/m);
+    if (!nameMatch) continue;
+    skills.push({ name: nameMatch[1].trim(), templatePath: tmpl });
+  }
+  return skills;
+}
+
+// ─── Tests ─────────────────────────────────────────────────────────────────
+
+describe('skill-collision-sentinel', () => {
+  const skills = enumerateGstackSkills();
+
+  test('at least one skill is discovered (sanity)', () => {
+    // If this fails, the enumerator broke, not the collision check.
+    expect(skills.length).toBeGreaterThan(10);
+  });
+
+  test('no duplicate skill names within gstack', () => {
+    const seen = new Map<string, string>();
+    const dupes: string[] = [];
+    for (const { name, templatePath } of skills) {
+      if (seen.has(name)) {
+        dupes.push(`${name} appears in both ${seen.get(name)} and ${templatePath}`);
+      } else {
+        seen.set(name, templatePath);
+      }
+    }
+    if (dupes.length > 0) {
+      throw new Error(`Duplicate skill names:\n  ${dupes.join('\n  ')}`);
+    }
+  });
+
+  // Hard check: no gstack skill name collides with a known host built-in
+  // unless the collision is explicitly tolerated. This is the test that
+  // would have caught the /checkpoint bug in April 2026.
+  for (const [host, builtins] of Object.entries(KNOWN_BUILTINS)) {
+    test(`no skill name collides with a ${host} built-in (or has written justification)`, () => {
+      const builtinSet = new Set(builtins);
+      const collisions: Array<{ skill: string; builtin: string }> = [];
+      for (const { name } of skills) {
+        if (builtinSet.has(name) && !(name in KNOWN_COLLISIONS_TOLERATED)) {
+          collisions.push({ skill: name, builtin: name });
+        }
+      }
+      if (collisions.length > 0) {
+        const msg = collisions.map(c =>
+          `  /${c.skill} collides with ${host} built-in /${c.builtin}.\n` +
+          `    Fix: rename the gstack skill (precedent: /checkpoint → /context-save+/context-restore),\n` +
+          `    OR add an entry to KNOWN_COLLISIONS_TOLERATED with a written justification.`
+        ).join('\n\n');
+        throw new Error(`Found ${collisions.length} unresolved collision(s) with ${host} built-ins:\n\n${msg}`);
+      }
+    });
+  }
+
+  // Every KNOWN_COLLISIONS_TOLERATED entry must correspond to a real skill
+  // AND a real built-in. Prevents the exception list from rotting with
+  // stale entries after a rename.
+  test('KNOWN_COLLISIONS_TOLERATED entries are all still active collisions', () => {
+    const skillNames = new Set(skills.map(s => s.name));
+    const allBuiltins = new Set<string>();
+    for (const list of Object.values(KNOWN_BUILTINS)) {
+      for (const name of list) allBuiltins.add(name);
+    }
+    const stale: string[] = [];
+    for (const name of Object.keys(KNOWN_COLLISIONS_TOLERATED)) {
+      if (!skillNames.has(name)) {
+        stale.push(`  "${name}" is in KNOWN_COLLISIONS_TOLERATED but no gstack skill has that name — remove the exception`);
+      } else if (!allBuiltins.has(name)) {
+        stale.push(`  "${name}" is in KNOWN_COLLISIONS_TOLERATED but no host's KNOWN_BUILTINS lists it — remove the exception`);
+      }
+    }
+    if (stale.length > 0) {
+      throw new Error(`Stale tolerance entries:\n${stale.join('\n')}`);
+    }
+  });
+
+  // Self-check: the /checkpoint rename actually landed. If someone reverts
+  // the rename by accident, this catches it.
+  test('the /checkpoint collision that started this file is actually resolved', () => {
+    const names = new Set(skills.map(s => s.name));
+    expect(names.has('checkpoint')).toBe(false);
+    // And the replacements exist.
+    expect(names.has('context-save')).toBe(true);
+    expect(names.has('context-restore')).toBe(true);
+  });
+
+  // Advisory: print a warning for any skill whose name is a generic verb.
+  // Doesn't fail — just informs reviewers.
+  test('advisory: generic-verb watchlist (informational)', () => {
+    const watchlist = new Set(GENERIC_VERB_WATCHLIST);
+    const flagged: string[] = [];
+    for (const { name } of skills) {
+      if (watchlist.has(name)) flagged.push(name);
+    }
+    if (flagged.length > 0) {
+      console.log(
+        `\n⚠️  advisory: ${flagged.length} skill(s) use generic verbs that may be at risk ` +
+        `of future host built-in collisions: ${flagged.map(n => `/${n}`).join(', ')}\n` +
+        `   These are NOT current collisions — they're names to watch. If any become ` +
+        `taken, the per-host test above will fail.\n`
+      );
+    }
+    // Test always passes — this is advisory.
+    expect(true).toBe(true);
+  });
+});