feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005)

* feat: add Confusion Protocol to preamble resolver Injects a high-stakes ambiguity gate at preamble tier >= 2 so all workflow skills get it. Fires when Claude encounters architectural decisions, data model changes, destructive operations, or contradictory requirements. Does NOT fire on routine coding. Addresses Karpathy failure mode #1 (wrong assumptions) with an inline STOP gate instead of relying on workflow skill invocation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add Hermes and GBrain host configs Hermes: tool rewrites for terminal/read_file/patch/delegate_task, paths to ~/.hermes/skills/gstack, AGENTS.md config file. GBrain: coding skills become brain-aware when GBrain mod is installed. Same tool rewrites as OpenClaw (agents spawn Claude Code via ACP). GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS NOT suppressed on gbrain host, enabling brain-first lookup and save-to-brain behavior. Both registered in hosts/index.ts with setup script redirect messages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: GBrain resolver — brain-first lookup and save-to-brain New scripts/resolvers/gbrain.ts with two resolver functions: - GBRAIN_CONTEXT_LOAD: search brain for context before skill starts - GBRAIN_SAVE_RESULTS: save skill output to brain after completion Placeholders added to 4 thinking skill templates (office-hours, investigate, plan-ceo-review, retro). Resolves to empty string on all hosts except gbrain via suppressedResolvers. GBRAIN suppression added to all 9 non-gbrain host configs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: wire slop:diff into /review as advisory diagnostic Adds Step 3.5 to the review template: runs bun run slop:diff against the base branch to catch AI code quality issues (empty catches, redundant return await, overcomplicated abstractions). Advisory only, never blocking. Skips silently if slop-scan is not installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Karpathy compatibility note to README Positions gstack as the workflow enforcement layer for Karpathy-style CLAUDE.md rules (17K stars). Links to forrestchang/andrej-karpathy-skills. Maps each Karpathy failure mode to the gstack skill that addresses it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: improve native OpenClaw thinking skills office-hours: add design doc path visibility message after writing ceo-review: add HARD GATE reminder at review section transitions retro: add non-git context support (check memory for meeting notes) Mirrors template improvements to hand-crafted native skills. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: update tests and golden fixtures for new hosts - Host count: 8 → 10 (hermes, gbrain) - OpenClaw adapter test: expects undefined (dead code removed) - Golden ship fixtures: updated with Confusion Protocol + vendoring Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate all SKILL.md files Regenerated from templates after Confusion Protocol, GBrain resolver placeholders, slop:diff in review, HARD GATE reminders, investigation learnings, design doc visibility, and retro non-git context changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.18.0.0 - CHANGELOG: add v0.18.0.0 entry (Confusion Protocol, Hermes, GBrain, slop in review, Karpathy note, skill improvements) - CLAUDE.md: add hermes.ts and gbrain.ts to hosts listing - README.md: update agent count 8→10, add Hermes + GBrain to table - VERSION: bump to 0.18.0.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: sync package.json version to 0.18.0.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: extract Step 0 from review SKILL.md in E2E test The review-base-branch E2E test was copying the full 1493-line review/SKILL.md into the test fixture. The agent spent 8+ turns reading it in chunks, leaving only 7 turns for actual work, causing error_max_turns on every attempt. Now extracts only Step 0 (base branch detection, ~50 lines) which is all the test actually needs. Follows the CLAUDE.md rule: "NEVER copy a full SKILL.md file into an E2E test fixture." Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: update GBrain and Hermes host configs for v0.10.0 integration GBrain: add 'triggers' to keepFields so generated skills pass checkResolvable() validation. Add version compat comment. Hermes: un-suppress GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS. The resolvers handle GBrain-not-installed gracefully, so Hermes agents with GBrain as a mod get brain features automatically. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: GBrain resolver DX improvements and preamble health check Resolver changes: - gbrain query → gbrain search (fast keyword search, not expensive hybrid) - Add keyword extraction guidance for agents - Show explicit gbrain put_page syntax with --title, --tags, heredoc - Add entity enrichment with false-positive filter - Name throttle error patterns (exit code 1, stderr keywords) - Add data-research routing for investigate skill - Expand skillSaveMap from 4 to 8 entries - Add brain operation telemetry summary Preamble changes: - Add gbrain doctor --fast --json health check for gbrain/hermes hosts - Parse check failures/warnings count - Show failing check details when score < 50 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: preserve keepFields in allowlist frontmatter mode The allowlist mode hard-coded name + description reconstruction but never iterated keepFields for additional fields. Adding 'triggers' to keepFields was a no-op because the field was silently stripped. Now iterates keepFields and preserves any field beyond name/description from the source template frontmatter, including YAML arrays. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add triggers to all 38 skill templates Multi-word, skill-specific trigger keywords for GBrain's RESOLVER.md router. Each skill gets 3-6 triggers derived from its "Use when asked to..." description text. Avoids single generic words that would collide across skills (e.g., "debug this" not "debug"). These are distinct from voice-triggers (speech-to-text aliases) and serve GBrain's checkResolvable() validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate all SKILL.md files and update golden fixtures Regenerated from updated templates (triggers, brain placeholders, resolver DX improvements, preamble health check). Golden fixtures updated to match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: settings-hook remove exits 1 when nothing to remove gstack-settings-hook remove was exiting 0 when settings.json didn't exist, causing gstack-uninstall to report "SessionStart hook" as removed on clean systems where nothing was installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for GBrain v0.10.0 integration ARCHITECTURE.md: added GBRAIN_CONTEXT_LOAD and GBRAIN_SAVE_RESULTS to resolver table. CHANGELOG.md: expanded v0.18.0.0 entry with GBrain v0.10.0 integration details (triggers, expanded brain-awareness, DX improvements, Hermes brain support), updated date. CLAUDE.md: added gbrain to resolvers/ directory comment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: routing E2E stops writing to user's ~/.claude/skills/ installSkills() was copying SKILL.md files to both project-level (.claude/skills/ in tmpDir) and user-level (~/.claude/skills/). Writing to the user's real install fails when symlinks point to different worktrees or dangling targets (ENOENT on copyFileSync). Now installs to project-level only. The test already sets cwd to the tmpDir, so project-level discovery works. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: scale Gemini E2E back to smoke test Gemini CLI gets lost in worktrees on complex tasks (review times out at 600s, discover-skill hits exit 124). Nobody uses Gemini for gstack skill execution. Replace the two failing tests (gemini-discover-skill and gemini-review-findings) with a single smoke test that verifies Gemini can start and read the README. 90s timeout, no skill invocation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 19:25:10 +02:00 · 2026-04-16 10:41:38 -07:00
parent 2300067267
commit b805aa0113
111 changed files with 1504 additions and 112 deletions
@@ -289,6 +289,18 @@ function transformFrontmatter(content: string, host: Host): string {
    }
  }

+  // Preserve additional keepFields beyond name and description
+  if (fm.keepFields) {
+    for (const field of fm.keepFields) {
+      if (field === 'name' || field === 'description') continue;
+      // Match YAML field with possible multi-line/array value (indented lines after colon)
+      const fieldMatch = frontmatter.match(new RegExp(`^${field}:(.*(?:\\n(?:[ \\t]+.+))*)`, 'm'));
+      if (fieldMatch) {
+        newFm += `${field}:${fieldMatch[1]}\n`;
+      }
+    }
+  }
+
  // Rename fields (copy values from template frontmatter with new keys)
  if (fm.renameFields) {
    for (const [oldName, newName] of Object.entries(fm.renameFields)) {
@@ -0,0 +1,70 @@
+/**
+ * GBrain resolver — brain-first lookup and save-to-brain for thinking skills.
+ *
+ * GBrain is a "mod" for gstack. When installed, coding skills become brain-aware:
+ * they search the brain for context before starting and save results after finishing.
+ *
+ * These resolvers are suppressed on hosts that don't support brain features
+ * (via suppressedResolvers in each host config). For those hosts,
+ * {{GBRAIN_CONTEXT_LOAD}} and {{GBRAIN_SAVE_RESULTS}} resolve to empty string.
+ *
+ * Compatible with GBrain >= v0.10.0 (search CLI, doctor --fast --json, entity enrichment).
+ */
+import type { TemplateContext } from './types';
+
+export function generateGBrainContextLoad(ctx: TemplateContext): string {
+  let base = `## Brain Context Load
+
+Before starting this skill, search your brain for relevant context:
+
+1. Extract 2-4 keywords from the user's request (nouns, error names, file paths, technical terms).
+   Search GBrain: \`gbrain search "keyword1 keyword2"\`
+   Example: for "the login page is broken after deploy", search \`gbrain search "login broken deploy"\`
+   Search returns lines like: \`[slug] Title (score: 0.85) - first line of content...\`
+2. If few results, broaden to the single most specific keyword and search again.
+3. For each result page, read it: \`gbrain get_page "<page_slug>"\`
+   Read the top 3 pages for context.
+4. Use this brain context to inform your analysis.
+
+If GBrain is not available or returns no results, proceed without brain context.
+Any non-zero exit code from gbrain commands should be treated as a transient failure.`;
+
+  if (ctx.skillName === 'investigate') {
+    base += `\n\nIf the user's request is about tracking, extracting, or researching structured data (e.g., "track this data", "extract from emails", "build a tracker"), route to GBrain's data-research skill instead: \`gbrain call data-research\`. This skill has a 7-phase pipeline optimized for structured data extraction.`;
+  }
+
+  return base;
+}
+
+export function generateGBrainSaveResults(ctx: TemplateContext): string {
+  const skillSaveMap: Record<string, string> = {
+    'office-hours': 'Save the design document as a brain page:\n```bash\ngbrain put_page --title "Office Hours: <project name>" --tags "design-doc,<project-slug>" <<\'EOF\'\n<design doc content in markdown>\nEOF\n```',
+    'investigate': 'Save the root cause analysis as a brain page:\n```bash\ngbrain put_page --title "Investigation: <issue summary>" --tags "investigation,<affected-files>" <<\'EOF\'\n<investigation findings in markdown>\nEOF\n```',
+    'plan-ceo-review': 'Save the CEO plan as a brain page:\n```bash\ngbrain put_page --title "CEO Plan: <feature name>" --tags "ceo-plan,<feature-slug>" <<\'EOF\'\n<scope decisions and vision in markdown>\nEOF\n```',
+    'retro': 'Save the retrospective as a brain page:\n```bash\ngbrain put_page --title "Retro: <date range>" --tags "retro,<date>" <<\'EOF\'\n<retro output in markdown>\nEOF\n```',
+    'plan-eng-review': 'Save the architecture decisions as a brain page:\n```bash\ngbrain put_page --title "Eng Review: <feature name>" --tags "eng-review,<feature-slug>" <<\'EOF\'\n<review findings and decisions in markdown>\nEOF\n```',
+    'ship': 'Save the release notes as a brain page:\n```bash\ngbrain put_page --title "Release: <version>" --tags "release,<version>" <<\'EOF\'\n<changelog entry and deploy details in markdown>\nEOF\n```',
+    'cso': 'Save the security audit as a brain page:\n```bash\ngbrain put_page --title "Security Audit: <date>" --tags "security-audit,<date>" <<\'EOF\'\n<findings and remediation status in markdown>\nEOF\n```',
+    'design-consultation': 'Save the design system as a brain page:\n```bash\ngbrain put_page --title "Design System: <project name>" --tags "design-system,<project-slug>" <<\'EOF\'\n<design decisions in markdown>\nEOF\n```',
+  };
+
+  const saveInstruction = skillSaveMap[ctx.skillName] || 'Save the skill output as a brain page if the results are worth preserving:\n```bash\ngbrain put_page --title "<descriptive title>" --tags "<relevant,tags>" <<\'EOF\'\n<content in markdown>\nEOF\n```';
+
+  return `## Save Results to Brain
+
+After completing this skill, persist the results to your brain for future reference:
+
+${saveInstruction}
+
+After saving the page, extract and enrich mentioned entities: for each actual person name or company/organization name found in the output, \`gbrain search "<entity name>"\` to check if a page exists. If not, create a stub page:
+\`\`\`bash
+gbrain put_page --title "<Person or Company Name>" --tags "entity,person" --content "Stub page. Mentioned in <skill name> output."
+\`\`\`
+Only extract actual person names and company/organization names. Skip product names, section headings, technical terms, and file paths.
+
+Throttle errors appear as: exit code 1 with stderr containing "throttle", "rate limit", "capacity", or "busy". If GBrain returns a throttle or rate-limit error on any save operation, defer the save and move on. The brain is busy — the content is not lost, just not persisted this run. Any other non-zero exit code should also be treated as a transient failure.
+
+Add backlinks to related brain pages if they exist. If GBrain is not available, skip this step.
+
+After brain operations complete, note in your completion output: how many pages were found in the initial search, how many entities were enriched, and whether any operations were throttled. This helps the user see brain utilization over time.`;
+}
@@ -18,6 +18,7 @@ import { generateConfidenceCalibration } from './confidence';
 import { generateInvokeSkill } from './composition';
 import { generateReviewArmy } from './review-army';
 import { generateDxFramework } from './dx';
+import { generateGBrainContextLoad, generateGBrainSaveResults } from './gbrain';

 export const RESOLVERS: Record<string, ResolverFn> = {
  SLUG_EVAL: generateSlugEval,
@@ -63,4 +64,6 @@ export const RESOLVERS: Record<string, ResolverFn> = {
  REVIEW_ARMY: generateReviewArmy,
  CROSS_REVIEW_DEDUP: generateCrossReviewDedup,
  DX_FRAMEWORK: generateDxFramework,
+  GBRAIN_CONTEXT_LOAD: generateGBrainContextLoad,
+  GBRAIN_SAVE_RESULTS: generateGBrainSaveResults,
 };
@@ -98,7 +98,18 @@ if [ -d ".claude/skills/gstack" ] && [ ! -L ".claude/skills/gstack" ]; then
 fi
 echo "VENDORED_GSTACK: $_VENDORED"
 # Detect spawned session (OpenClaw or other orchestrator)
-[ -n "$OPENCLAW_SESSION" ] && echo "SPAWNED_SESSION: true" || true
+[ -n "$OPENCLAW_SESSION" ] && echo "SPAWNED_SESSION: true" || true${ctx.host === 'gbrain' || ctx.host === 'hermes' ? `
+# GBrain health check (gbrain/hermes host only)
+if command -v gbrain &>/dev/null; then
+  _BRAIN_JSON=$(gbrain doctor --fast --json 2>/dev/null || echo '{}')
+  _BRAIN_SCORE=$(echo "$_BRAIN_JSON" | grep -o '"health_score":[0-9]*' | cut -d: -f2)
+  _BRAIN_FAILS=$(echo "$_BRAIN_JSON" | grep -o '"status":"fail"' | wc -l | tr -d ' ')
+  _BRAIN_WARNS=$(echo "$_BRAIN_JSON" | grep -o '"status":"warn"' | wc -l | tr -d ' ')
+  echo "BRAIN_HEALTH: \${_BRAIN_SCORE:-unknown} (\${_BRAIN_FAILS:-0} failures, \${_BRAIN_WARNS:-0} warnings)"
+  if [ "\${_BRAIN_SCORE:-100}" -lt 50 ] 2>/dev/null; then
+    echo "$_BRAIN_JSON" | grep -o '"name":"[^"]*","status":"[^"]*","message":"[^"]*"' || true
+  fi
+fi` : ''}
 \`\`\``;
 }

@@ -270,6 +281,14 @@ touch ~/.gstack/.vendoring-warned-\${SLUG:-unknown}
 This only happens once per project. If the marker file exists, skip entirely.`;
 }

+function generateBrainHealthInstruction(ctx: TemplateContext): string {
+  if (ctx.host !== 'gbrain' && ctx.host !== 'hermes') return '';
+  return `If \`BRAIN_HEALTH\` is shown and the score is below 50, tell the user which checks
+failed (shown in the output) and suggest: "Run \\\`gbrain doctor\\\` for full diagnostics."
+If the output is not valid JSON or health_score is missing, treat GBrain as unavailable
+and proceed without brain features this session.`;
+}
+
 function generateSpawnedSessionCheck(): string {
  return `If \`SPAWNED_SESSION\` is \`"true"\`, you are running inside a session spawned by an
 AI orchestrator (e.g., OpenClaw). In spawned sessions:
@@ -426,6 +445,21 @@ Use AskUserQuestion:
 - Note in output: "Pre-existing test failure skipped: <test-name>"`;
 }

+function generateConfusionProtocol(): string {
+  return `## Confusion Protocol
+
+When you encounter high-stakes ambiguity during coding:
+- Two plausible architectures or data models for the same requirement
+- A request that contradicts existing patterns and you're unsure which to follow
+- A destructive operation where the scope is unclear
+- Missing context that would change your approach significantly
+
+STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
+Ask the user. Do not guess on architectural or data model decisions.
+
+This does NOT apply to routine coding, small features, or obvious changes.`;
+}
+
 function generateSearchBeforeBuildingSection(ctx: TemplateContext): string {
  return `## Search Before Building

@@ -730,8 +764,9 @@ export function generatePreamble(ctx: TemplateContext): string {
    generateRoutingInjection(ctx),
    generateVendoringDeprecation(ctx),
    generateSpawnedSessionCheck(),
+    generateBrainHealthInstruction(ctx),
    generateVoiceDirective(tier),
-    ...(tier >= 2 ? [generateContextRecovery(ctx), generateAskUserFormat(ctx), generateCompletenessSection()] : []),
+    ...(tier >= 2 ? [generateContextRecovery(ctx), generateAskUserFormat(ctx), generateCompletenessSection(), generateConfusionProtocol()] : []),
    ...(tier >= 3 ? [generateRepoModeSection(), generateSearchBeforeBuildingSection(ctx)] : []),
    generateCompletionStatus(ctx),
  ];