merge: integrate origin/main (v0.18.1.0) into open-agents-learnings

Main moved forward 6 commits while this branch was local. Integrated both sides preserving all functionality:

From main (v0.16.4.0 → v0.18.1.0):
- v0.17.0.0: UX behavioral foundations + ux-audit (generateUXPrinciples, {{UX_PRINCIPLES}} placeholder, triggers frontmatter on skills)
- v0.18.0.0: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (generateBrainHealthInstruction, generateConfusionProtocol, generateGBrainContextLoad, generateGBrainSaveResults, hosts/gbrain.ts, hosts/hermes.ts, scripts/resolvers/gbrain.ts, GBrain bash health check)
- v0.18.0.1: ngrok Windows build fix
- 0cc830b6: tilde-in-assignment permission fix
- cc42f14a: gstack compact design doc (tabled)
- 822e843a: headed browser auto-shutdown + disconnect cleanup (v0.18.1.0)

Integration approach: keep this branch's preamble.ts submodule refactor as the structure of record. Extracted main's two new generators into their own submodules:
- scripts/resolvers/preamble/generate-brain-health-instruction.ts
- scripts/resolvers/preamble/generate-confusion-protocol.ts

Updated scripts/resolvers/preamble/generate-preamble-bash.ts to absorb main's GBrain health check (host-conditional on gbrain/hermes).

scripts/resolvers/index.ts now imports BOTH:
- This branch's adds: MODEL_OVERLAY, TASTE_PROFILE, BIN_DIR resolvers
- Main's adds: UX_PRINCIPLES, GBRAIN_CONTEXT_LOAD, GBRAIN_SAVE_RESULTS resolvers

scripts/resolvers/design.ts keeps both generateTasteProfile (this branch) and generateUXPrinciples (main). Sibling exports, no overlap.

scripts/gen-skill-docs.ts keeps both this branch's --model flag wiring and main's edits.

Templates auto-merged where possible. The 35 generated SKILL.md / golden conflicts auto-resolved via `bun run gen:skill-docs --host all` followed by re-snapshotting the ship goldens for claude/codex/factory.

Verification:
- bun run gen:skill-docs --host all completes cleanly
- bun test: 1 pre-existing failure (gstack-community-dashboard Supabase network test, 235s timeout). NOT related to merge: unchanged Supabase test infra times out without live network. Flagged in PR body.

Token-ceiling warnings on plan-ceo-review (29K), office-hours (26K), and ship (34K). These existed on origin/main before the merge; the preamble grew substantially from main's GBrain + UX additions plus this branch's continuous-checkpoint, context-health, model-overlay, taste-profile, and feature-discovery additions. Worth a follow-up reduction pass but doesn't block this merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:05:08 +02:00 · 2026-04-17 13:58:15 +08:00
parent 0926e4b994 822e843a60
commit 7529dbb276
129 changed files with 3314 additions and 154 deletions
@@ -268,6 +268,18 @@ function transformFrontmatter(content: string, host: Host): string {
    }
  }

+  // Preserve additional keepFields beyond name and description
+  if (fm.keepFields) {
+    for (const field of fm.keepFields) {
+      if (field === 'name' || field === 'description') continue;
+      // Match YAML field with possible multi-line/array value (indented lines after colon)
+      const fieldMatch = frontmatter.match(new RegExp(`^${field}:(.*(?:\\n(?:[ \\t]+.+))*)`, 'm'));
+      if (fieldMatch) {
+        newFm += `${field}:${fieldMatch[1]}\n`;
+      }
+    }
+  }
+
  // Rename fields (copy values from template frontmatter with new keys)
  if (fm.renameFields) {
    for (const [oldName, newName] of Object.entries(fm.renameFields)) {
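The keepFields pattern in the hunk above can be exercised in isolation. This is a minimal sketch, not the project's code: `extractField`, `escapeRegExp`, and the sample frontmatter are hypothetical names introduced for illustration, and the escaping step is an added assumption (the diff interpolates `field` into the pattern unescaped).

```typescript
// Hypothetical helper: escape regex metacharacters in a field name.
// Not part of the diff, which interpolates the field name directly.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns "field:value" including any indented continuation lines,
// or null when the field is absent. Same pattern shape as the diff.
function extractField(frontmatter: string, field: string): string | null {
  const pattern = new RegExp(`^${escapeRegExp(field)}:(.*(?:\\n(?:[ \\t]+.+))*)`, 'm');
  const match = frontmatter.match(pattern);
  return match ? `${field}:${match[1]}` : null;
}

// Hypothetical frontmatter, for illustration only.
const sampleFrontmatter = [
  'name: ux-audit',
  'triggers:',
  '  - audit the ux',
  '  - review usability',
  'description: example skill',
].join('\n');

// The capture stops at the first line that does not start with a space or tab.
const triggersBlock = extractField(sampleFrontmatter, 'triggers');
```

Because continuation lines must start with whitespace, an unindented YAML array item would end the capture early; whether the templates ever use that layout is not visible in this diff.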
@@ -521,6 +533,12 @@ for (const currentHost of hostsToRun) {
      const lines = content.split('\n').length;
      const tokens = Math.round(content.length / 4); // ~4 chars per token
      tokenBudget.push({ skill: relOutput, lines, tokens });
+
+      // Token ceiling check: warn if any generated SKILL.md exceeds ~25K tokens (100KB)
+      const TOKEN_CEILING_BYTES = 100_000;
+      if (content.length > TOKEN_CEILING_BYTES) {
+        console.warn(`⚠️  TOKEN CEILING: ${relOutput} is ${content.length} bytes (~${tokens} tokens), exceeds ${TOKEN_CEILING_BYTES} byte ceiling (~25K tokens)`);
+      }
    }

    // Generate gstack-lite and gstack-full for OpenClaw host
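The estimate and ceiling check in the hunk above reduce to two comparisons on `content.length`. The sketch below restates them as a pure function; `estimateTokens` and `TokenReport` are hypothetical names, not identifiers from the repository. Note that `String.length` counts UTF-16 code units, so the "bytes" label is only approximate for non-ASCII content.

```typescript
// Ceiling from the diff: 100,000 characters, roughly 25K tokens at ~4 chars per token.
const TOKEN_CEILING = 100_000;

interface TokenReport {
  tokens: number;          // rough estimate: content.length / 4
  exceedsCeiling: boolean; // true only when strictly above the ceiling
}

// Hypothetical helper mirroring the check in the diff.
function estimateTokens(content: string, ceiling: number = TOKEN_CEILING): TokenReport {
  return {
    tokens: Math.round(content.length / 4),
    exceedsCeiling: content.length > ceiling,
  };
}

// A file exactly at the ceiling does not warn; one above it does.
const atCeiling = estimateTokens('x'.repeat(100_000));
const overCeiling = estimateTokens('x'.repeat(136_000));
```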
@@ -106,7 +106,7 @@ export function generateBrowseSetup(ctx: TemplateContext): string {
 _ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
 B=""
 [ -n "$_ROOT" ] && [ -x "$_ROOT/${ctx.paths.localSkillRoot}/browse/dist/browse" ] && B="$_ROOT/${ctx.paths.localSkillRoot}/browse/dist/browse"
-[ -z "$B" ] && B=${ctx.paths.browseDir}/browse
+[ -z "$B" ] && B="$HOME${ctx.paths.browseDir.replace(/^~/, '')}/browse"
 if [ -x "$B" ]; then
  echo "READY: $B"
 else
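The replacement line in the hunk above strips a leading `~` from the configured directory and prefixes `$HOME` inside double quotes. A minimal sketch of that string construction follows; `browseFallbackLine` and both directory values are hypothetical. It also shows a consequence of the approach: a value that does not start with `~` still gets `$HOME` prepended, so the technique appears to assume tilde-relative paths.

```typescript
// Builds the fallback assignment the way the diff does: remove a leading "~"
// and emit "$HOME" in its place so the shell expands it inside the quotes.
function browseFallbackLine(browseDir: string): string {
  return `[ -z "$B" ] && B="$HOME${browseDir.replace(/^~/, '')}/browse"`;
}

// Hypothetical directory values, for illustration only.
const tildeLine = browseFallbackLine('~/.example/skills/browse/dist');
const absoluteLine = browseFallbackLine('/opt/example/dist');
```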
@@ -99,9 +99,13 @@ The most uniquely designer-like output. Form a gut reaction before analyzing any
 3. Write the **First Impression** using this structured critique format:
   - "The site communicates **[what]**." (what it says at a glance — competence? playfulness? confusion?)
   - "I notice **[observation]**." (what stands out, positive or negative — be specific)
-   - "The first 3 things my eye goes to are: **[1]**, **[2]**, **[3]**." (hierarchy check — are these intentional?)
+   - "The first 3 things my eye goes to are: **[1]**, **[2]**, **[3]**." (hierarchy check — are these the 3 things the designer intended? If not, the visual hierarchy is lying.)
   - "If I had to describe this in one word: **[word]**." (gut verdict)

+**Narration mode:** Write this section in first person, as if you are a user scanning the page for the first time. "I'm looking at this page... my eye goes to the logo, then a wall of text I skip entirely, then... wait, is that a button?" Name the specific element, its position, its visual weight. If you can't name it specifically, you're not actually scanning, you're generating platitudes.
+
+**Page Area Test:** Point at each clearly defined area of the page. Can you instantly name its purpose? ("Things I can buy," "Today's deals," "How to search.") Areas you can't name in 2 seconds are poorly defined. List them.
+
 This is the section users read first. Be opinionated. A designer doesn't hedge — they react.

 ---
@@ -157,6 +161,19 @@ $B url
 \`\`\`
 If URL contains \`/login\`, \`/signin\`, \`/auth\`, or \`/sso\`: the site requires authentication. AskUserQuestion: "This site requires authentication. Want to import cookies from your browser? Run \`/setup-browser-cookies\` first if needed."

+### Trunk Test (run on every page)
+
+Imagine being dropped on this page with no context. Can you immediately answer:
+1. What site is this? (Site ID visible and identifiable)
+2. What page am I on? (Page name prominent, matches what I clicked)
+3. What are the major sections? (Primary nav visible and clear)
+4. What are my options at this level? (Local nav or content choices obvious)
+5. Where am I in the scheme of things? ("You are here" indicator, breadcrumbs)
+6. How can I search? (Search box findable without hunting)
+
+Score: PASS (all 6 clear) / PARTIAL (4-5 clear) / FAIL (3 or fewer clear).
+A FAIL on the trunk test is a HIGH-impact finding regardless of how polished the visual design is.
+
 ### Design Audit Checklist (10 categories, ~80 items)

 Apply these at each page. Each finding gets an impact rating (high/medium/polish) and category.
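The trunk test scoring rule added in this hunk (all 6 clear = PASS, 4-5 = PARTIAL, 3 or fewer = FAIL) can be written down as a small function. This is a sketch under that reading of the rule; `scoreTrunkTest` and `TrunkVerdict` are hypothetical names.

```typescript
type TrunkVerdict = 'PASS' | 'PARTIAL' | 'FAIL';

// The six trunk-test questions from the checklist, in order.
const TRUNK_QUESTIONS = [
  'What site is this?',
  'What page am I on?',
  'What are the major sections?',
  'What are my options at this level?',
  'Where am I in the scheme of things?',
  'How can I search?',
] as const;

// answers[i] is true when question i can be answered immediately.
function scoreTrunkTest(answers: boolean[]): TrunkVerdict {
  if (answers.length !== TRUNK_QUESTIONS.length) {
    throw new Error(`expected ${TRUNK_QUESTIONS.length} answers, got ${answers.length}`);
  }
  const clear = answers.filter(Boolean).length;
  if (clear === TRUNK_QUESTIONS.length) return 'PASS';
  return clear >= 4 ? 'PARTIAL' : 'FAIL';
}
```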
@@ -225,6 +242,7 @@ Apply these at each page. Each finding gets an impact rating (high/medium/polish
 - Success: confirmation animation or color, auto-dismiss
 - Touch targets >= 44px on all interactive elements
 - \`cursor: pointer\` on all clickable elements
+- Mindless choice audit: every decision point (button, link, dropdown, modal choice) is a mindless click (obvious what happens). If a click requires thought about whether it's the right choice, flag as HIGH.

 **6. Responsive Design** (8 items)
 - Mobile layout makes *design* sense (not just stacked desktop columns)
@@ -253,6 +271,9 @@ Apply these at each page. Each finding gets an impact rating (high/medium/polish
 - Active voice ("Install the CLI" not "The CLI will be installed")
 - Loading states end with \`…\` ("Saving…" not "Saving...")
 - Destructive actions have confirmation modal or undo window
+- Happy talk detection: scan for introductory paragraphs that start with "Welcome to..." or tell users how great the site is. If you can hear "blah blah blah", it's happy talk. Flag for removal.
+- Instructions detection: any visible instructions longer than one sentence. If users need to read instructions, the design has failed. Flag the instructions AND the interaction they're compensating for.
+- Happy talk word count: count total visible words on the page. Classify each text block as "useful content" vs "happy talk" (welcome paragraphs, self-congratulatory text, instructions nobody reads). Report: "This page has X words. Y (Z%) are happy talk."

 **9. AI Slop Detection** (10 anti-patterns — the blacklist)

@@ -286,6 +307,43 @@ Evaluate:
 - **Feedback clarity:** Did the action clearly succeed or fail? Is the feedback immediate?
 - **Form polish:** Focus states visible? Validation timing correct? Errors near the source?

+**Narration mode:** Narrate the flow in first person. "I click 'Sign Up'... spinner appears... 3 seconds pass... still spinning... I'm getting nervous. Finally the dashboard loads, but where am I? The nav doesn't highlight anything." Name the specific element, its position, its visual weight. If you can't name it specifically, you're not actually experiencing the flow, you're generating platitudes.
+
+### Goodwill Reservoir (track across the flow)
+
+As you walk the user flow, maintain a mental goodwill meter (starts at 70/100).
+These scores are heuristic, not measured. The value is in identifying specific
+drains and fills, not in the final number.
+
+Subtract points for:
+- Hidden information the user would want (pricing, contact, shipping): subtract 15
+- Format punishment (rejecting valid input like dashes in phone numbers): subtract 10
+- Unnecessary information requests: subtract 10
+- Interstitials, splash screens, forced tours blocking the task: subtract 15
+- Sloppy or unprofessional appearance: subtract 10
+- Ambiguous choices that require thinking: subtract 5 each
+
+Add points for:
+- Top user tasks are obvious and prominent: add 10
+- Upfront about costs and limitations: add 5
+- Saves steps (direct links, smart defaults, autofill): add 5 each
+- Graceful error recovery with specific fix instructions: add 10
+- Apologizes when things go wrong: add 5
+
+Report the final goodwill score with a visual dashboard:
+
+\`\`\`
+Goodwill: 70 ████████████████████░░░░░░░░░░
+  Step 1: Login page        70 → 75  (+5 obvious primary action)
+  Step 2: Dashboard          75 → 60  (-15 interstitial tour popup)
+  Step 3: Settings           60 → 50  (-10 format punishment on phone)
+  Step 4: Billing            50 → 35  (-15 hidden pricing info)
+  FINAL: 35/100 ⚠️ NEEDS WORK
+\`\`\`
+
+Below 30 = critical UX debt. 30-60 = needs work. Above 60 = healthy.
+Include the biggest drains and fills as specific findings.
+
 ---

 ## Phase 5: Cross-Page Consistency
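The goodwill tally added in this hunk is plain running arithmetic from a starting score of 70, classified by the stated bands (below 30 critical, 30-60 needs work, above 60 healthy). The sketch below reproduces the four steps of the sample dashboard; `tallyGoodwill`, `goodwillBand`, and the `FlowStep` shape are hypothetical, and treating exactly 60 as "needs work" is an assumption about the boundary.

```typescript
interface FlowStep {
  name: string;
  delta: number;  // positive = fill, negative = drain
  reason: string;
}

type GoodwillBand = 'critical UX debt' | 'needs work' | 'healthy';

// Bands as stated in the text: below 30, 30-60, above 60.
function goodwillBand(score: number): GoodwillBand {
  if (score < 30) return 'critical UX debt';
  return score <= 60 ? 'needs work' : 'healthy';
}

// Walks the steps from the starting score and records the running totals.
function tallyGoodwill(
  steps: FlowStep[],
  start: number = 70,
): { trail: number[]; final: number; band: GoodwillBand } {
  const trail: number[] = [];
  let score = start;
  for (const step of steps) {
    score += step.delta;
    trail.push(score);
  }
  return { trail, final: score, band: goodwillBand(score) };
}

// The four steps from the sample dashboard.
const goodwill = tallyGoodwill([
  { name: 'Login page', delta: 5, reason: 'obvious primary action' },
  { name: 'Dashboard', delta: -15, reason: 'interstitial tour popup' },
  { name: 'Settings', delta: -10, reason: 'format punishment on phone' },
  { name: 'Billing', delta: -15, reason: 'hidden pricing info' },
]);
```

With these inputs the running totals are 75, 60, 50, 35, and a final score of 35 falls in the 30-60 band.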
@@ -716,6 +774,10 @@ ${litmusItems}
 - One job per section
 - "If deleting 30% of the copy improves it, keep deleting"
 - Cards earn their existence — no decorative card grids
+- NEVER use small, low-contrast type (body text < 16px or contrast ratio < 4.5:1 on body text)
+- NEVER put labels inside form fields as the only label (placeholder-as-label pattern — labels must be visible when the field has content)
+- ALWAYS preserve visited vs unvisited link distinction (visited links must have a different color)
+- NEVER float headings between paragraphs (heading must be visually closer to the section it introduces than to the preceding section)

 **AI Slop blacklist** (the 10 patterns that scream "AI-generated"):
 ${slopItems}
@@ -990,3 +1052,91 @@ the legacy approved.json aggregate — \`${ctx.paths.binDir}/gstack-taste-update
 will migrate it to schema v1 on the next write.`;
 }

+// ─── UX Behavioral Foundations (Krug + HCI research) ───
+export function generateUXPrinciples(_ctx: TemplateContext): string {
+  return `## UX Principles: How Users Actually Behave
+
+These principles govern how real humans interact with interfaces. They are observed
+behavior, not preferences. Apply them before, during, and after every design decision.
+
+### The Three Laws of Usability
+
+1. **Don't make me think.** Every page should be self-evident. If a user stops
+   to think "What do I click?" or "What does this mean?", the design has failed.
+   Self-evident > self-explanatory > requires explanation.
+
+2. **Clicks don't matter, thinking does.** Three mindless, unambiguous clicks
+   beat one click that requires thought. Each step should feel like an obvious
+   choice (animal, vegetable, or mineral), not a puzzle.
+
+3. **Omit, then omit again.** Get rid of half the words on each page, then get
+   rid of half of what's left. Happy talk (self-congratulatory text) must die.
+   Instructions must die. If they need reading, the design has failed.
+
+### How Users Actually Behave
+
+- **Users scan, they don't read.** Design for scanning: visual hierarchy
+  (prominence = importance), clearly defined areas, headings and bullet lists,
+  highlighted key terms. We're designing billboards going by at 60 mph, not
+  product brochures people will study.
+- **Users satisfice.** They pick the first reasonable option, not the best.
+  Make the right choice the most visible choice.
+- **Users muddle through.** They don't figure out how things work. They wing
+  it. If they accomplish their goal by accident, they won't seek the "right" way.
+  Once they find something that works, no matter how badly, they stick to it.
+- **Users don't read instructions.** They dive in. Guidance must be brief,
+  timely, and unavoidable, or it won't be seen.
+
+### Billboard Design for Interfaces
+
+- **Use conventions.** Logo top-left, nav top/left, search = magnifying glass.
+  Don't innovate on navigation to be clever. Innovate when you KNOW you have a
+  better idea, otherwise use conventions. Even across languages and cultures,
+  web conventions let people identify the logo, nav, search, and main content.
+- **Visual hierarchy is everything.** Related things are visually grouped. Nested
+  things are visually contained. More important = more prominent. If everything
+  shouts, nothing is heard. Start with the assumption everything is visual noise,
+  guilty until proven innocent.
+- **Make clickable things obviously clickable.** No relying on hover states for
+  discoverability, especially on mobile where hover doesn't exist. Shape, location,
+  and formatting (color, underlining) must signal clickability without interaction.
+- **Eliminate noise.** Three sources: too many things shouting for attention
+  (shouting), things not organized logically (disorganization), and too much stuff
+  (clutter). Fix noise by removal, not addition.
+- **Clarity trumps consistency.** If making something significantly clearer
+  requires making it slightly inconsistent, choose clarity every time.
+
+### Navigation as Wayfinding
+
+Users on the web have no sense of scale, direction, or location. Navigation
+must always answer: What site is this? What page am I on? What are the major
+sections? What are my options at this level? Where am I? How can I search?
+
+Persistent navigation on every page. Breadcrumbs for deep hierarchies.
+Current section visually indicated. The "trunk test": cover everything except
+the navigation. You should still know what site this is, what page you're on,
+and what the major sections are. If not, the navigation has failed.
+
+### The Goodwill Reservoir
+
+Users start with a reservoir of goodwill. Every friction point depletes it.
+
+**Deplete faster:** Hiding info users want (pricing, contact, shipping). Punishing
+users for not doing things your way (formatting requirements on phone numbers).
+Asking for unnecessary information. Putting sizzle in their way (splash screens,
+forced tours, interstitials). Unprofessional or sloppy appearance.
+
+**Replenish:** Know what users want to do and make it obvious. Tell them what they
+want to know upfront. Save them steps wherever possible. Make it easy to recover
+from errors. When in doubt, apologize.
+
+### Mobile: Same Rules, Higher Stakes
+
+All the above applies on mobile, just more so. Real estate is scarce, but never
+sacrifice usability for space savings. Affordances must be VISIBLE: no cursor
+means no hover-to-discover. Touch targets must be big enough (44px minimum).
+Flat design can strip away useful visual information that signals interactivity.
+Prioritize ruthlessly: things needed in a hurry go close at hand, everything
+else a few taps away with an obvious path to get there.`;
+}
+
@@ -0,0 +1,70 @@
+/**
+ * GBrain resolver — brain-first lookup and save-to-brain for thinking skills.
+ *
+ * GBrain is a "mod" for gstack. When installed, coding skills become brain-aware:
+ * they search the brain for context before starting and save results after finishing.
+ *
+ * These resolvers are suppressed on hosts that don't support brain features
+ * (via suppressedResolvers in each host config). For those hosts,
+ * {{GBRAIN_CONTEXT_LOAD}} and {{GBRAIN_SAVE_RESULTS}} resolve to empty string.
+ *
+ * Compatible with GBrain >= v0.10.0 (search CLI, doctor --fast --json, entity enrichment).
+ */
+import type { TemplateContext } from './types';
+
+export function generateGBrainContextLoad(ctx: TemplateContext): string {
+  let base = `## Brain Context Load
+
+Before starting this skill, search your brain for relevant context:
+
+1. Extract 2-4 keywords from the user's request (nouns, error names, file paths, technical terms).
+   Search GBrain: \`gbrain search "keyword1 keyword2"\`
+   Example: for "the login page is broken after deploy", search \`gbrain search "login broken deploy"\`
+   Search returns lines like: \`[slug] Title (score: 0.85) - first line of content...\`
+2. If few results, broaden to the single most specific keyword and search again.
+3. For each result page, read it: \`gbrain get_page "<page_slug>"\`
+   Read the top 3 pages for context.
+4. Use this brain context to inform your analysis.
+
+If GBrain is not available or returns no results, proceed without brain context.
+Any non-zero exit code from gbrain commands should be treated as a transient failure.`;
+
+  if (ctx.skillName === 'investigate') {
+    base += `\n\nIf the user's request is about tracking, extracting, or researching structured data (e.g., "track this data", "extract from emails", "build a tracker"), route to GBrain's data-research skill instead: \`gbrain call data-research\`. This skill has a 7-phase pipeline optimized for structured data extraction.`;
+  }
+
+  return base;
+}
+
+export function generateGBrainSaveResults(ctx: TemplateContext): string {
+  const skillSaveMap: Record<string, string> = {
+    'office-hours': 'Save the design document as a brain page:\n```bash\ngbrain put_page --title "Office Hours: <project name>" --tags "design-doc,<project-slug>" <<\'EOF\'\n<design doc content in markdown>\nEOF\n```',
+    'investigate': 'Save the root cause analysis as a brain page:\n```bash\ngbrain put_page --title "Investigation: <issue summary>" --tags "investigation,<affected-files>" <<\'EOF\'\n<investigation findings in markdown>\nEOF\n```',
+    'plan-ceo-review': 'Save the CEO plan as a brain page:\n```bash\ngbrain put_page --title "CEO Plan: <feature name>" --tags "ceo-plan,<feature-slug>" <<\'EOF\'\n<scope decisions and vision in markdown>\nEOF\n```',
+    'retro': 'Save the retrospective as a brain page:\n```bash\ngbrain put_page --title "Retro: <date range>" --tags "retro,<date>" <<\'EOF\'\n<retro output in markdown>\nEOF\n```',
+    'plan-eng-review': 'Save the architecture decisions as a brain page:\n```bash\ngbrain put_page --title "Eng Review: <feature name>" --tags "eng-review,<feature-slug>" <<\'EOF\'\n<review findings and decisions in markdown>\nEOF\n```',
+    'ship': 'Save the release notes as a brain page:\n```bash\ngbrain put_page --title "Release: <version>" --tags "release,<version>" <<\'EOF\'\n<changelog entry and deploy details in markdown>\nEOF\n```',
+    'cso': 'Save the security audit as a brain page:\n```bash\ngbrain put_page --title "Security Audit: <date>" --tags "security-audit,<date>" <<\'EOF\'\n<findings and remediation status in markdown>\nEOF\n```',
+    'design-consultation': 'Save the design system as a brain page:\n```bash\ngbrain put_page --title "Design System: <project name>" --tags "design-system,<project-slug>" <<\'EOF\'\n<design decisions in markdown>\nEOF\n```',
+  };
+
+  const saveInstruction = skillSaveMap[ctx.skillName] || 'Save the skill output as a brain page if the results are worth preserving:\n```bash\ngbrain put_page --title "<descriptive title>" --tags "<relevant,tags>" <<\'EOF\'\n<content in markdown>\nEOF\n```';
+
+  return `## Save Results to Brain
+
+After completing this skill, persist the results to your brain for future reference:
+
+${saveInstruction}
+
+After saving the page, extract and enrich mentioned entities: for each actual person name or company/organization name found in the output, \`gbrain search "<entity name>"\` to check if a page exists. If not, create a stub page:
+\`\`\`bash
+gbrain put_page --title "<Person or Company Name>" --tags "entity,person" --content "Stub page. Mentioned in <skill name> output."
+\`\`\`
+Only extract actual person names and company/organization names. Skip product names, section headings, technical terms, and file paths.
+
+Throttle errors appear as: exit code 1 with stderr containing "throttle", "rate limit", "capacity", or "busy". If GBrain returns a throttle or rate-limit error on any save operation, defer the save and move on. The brain is busy — the content is not lost, just not persisted this run. Any other non-zero exit code should also be treated as a transient failure.
+
+Add backlinks to related brain pages if they exist. If GBrain is not available, skip this step.
+
+After brain operations complete, note in your completion output: how many pages were found in the initial search, how many entities were enriched, and whether any operations were throttled. This helps the user see brain utilization over time.`;
+}
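The throttle rule in the save instructions above (exit code 1 with stderr containing "throttle", "rate limit", "capacity", or "busy"; any other non-zero exit treated as transient) is stated in prose for the agent to follow. If it were implemented in code, it might look like the sketch below. `classifyGBrainExit` is a hypothetical helper, not a GBrain or gstack API, and the case-insensitive match is an assumption.

```typescript
type GBrainOutcome = 'ok' | 'throttled' | 'transient-failure';

// Marker substrings listed in the save instructions.
const THROTTLE_MARKERS = ['throttle', 'rate limit', 'capacity', 'busy'];

function classifyGBrainExit(exitCode: number, stderr: string): GBrainOutcome {
  if (exitCode === 0) return 'ok';
  const text = stderr.toLowerCase(); // case-insensitive matching is an assumption
  if (exitCode === 1 && THROTTLE_MARKERS.some((marker) => text.includes(marker))) {
    return 'throttled';
  }
  // Every other non-zero exit is treated as a transient failure.
  return 'transient-failure';
}
```

Both non-ok outcomes lead to the same action in the instructions (defer the save and move on); the distinction matters only for the completion note about throttled operations.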
@@ -9,7 +9,7 @@ import type { TemplateContext, ResolverFn } from './types';
 import { generatePreamble } from './preamble';
 import { generateTestFailureTriage } from './preamble';
 import { generateCommandReference, generateSnapshotFlags, generateBrowseSetup } from './browse';
-import { generateDesignMethodology, generateDesignHardRules, generateDesignOutsideVoices, generateDesignReviewLite, generateDesignSketch, generateDesignSetup, generateDesignMockup, generateDesignShotgunLoop, generateTasteProfile } from './design';
+import { generateDesignMethodology, generateDesignHardRules, generateDesignOutsideVoices, generateDesignReviewLite, generateDesignSketch, generateDesignSetup, generateDesignMockup, generateDesignShotgunLoop, generateTasteProfile, generateUXPrinciples } from './design';
 import { generateTestBootstrap, generateTestCoverageAuditPlan, generateTestCoverageAuditShip, generateTestCoverageAuditReview } from './testing';
 import { generateReviewDashboard, generatePlanFileReviewReport, generateSpecReviewLoop, generateBenefitsFrom, generateCodexSecondOpinion, generateAdversarialStep, generateCodexPlanReview, generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec, generateScopeDrift, generateCrossReviewDedup } from './review';
 import { generateSlugEval, generateSlugSetup, generateBaseBranchDetect, generateDeployBootstrap, generateQAMethodology, generateCoAuthorTrailer, generateChangelogWorkflow } from './utility';
@@ -19,6 +19,7 @@ import { generateInvokeSkill } from './composition';
 import { generateReviewArmy } from './review-army';
 import { generateDxFramework } from './dx';
 import { generateModelOverlay } from './model-overlay';
+import { generateGBrainContextLoad, generateGBrainSaveResults } from './gbrain';

 export const RESOLVERS: Record<string, ResolverFn> = {
  SLUG_EVAL: generateSlugEval,
@@ -31,6 +32,7 @@ export const RESOLVERS: Record<string, ResolverFn> = {
  QA_METHODOLOGY: generateQAMethodology,
  DESIGN_METHODOLOGY: generateDesignMethodology,
  DESIGN_HARD_RULES: generateDesignHardRules,
+  UX_PRINCIPLES: generateUXPrinciples,
  DESIGN_OUTSIDE_VOICES: generateDesignOutsideVoices,
  DESIGN_REVIEW_LITE: generateDesignReviewLite,
  REVIEW_DASHBOARD: generateReviewDashboard,
@@ -66,4 +68,6 @@ export const RESOLVERS: Record<string, ResolverFn> = {
  MODEL_OVERLAY: generateModelOverlay,
  TASTE_PROFILE: generateTasteProfile,
  BIN_DIR: (ctx) => ctx.paths.binDir,
+  GBRAIN_CONTEXT_LOAD: generateGBrainContextLoad,
+  GBRAIN_SAVE_RESULTS: generateGBrainSaveResults,
 };
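The registry above maps placeholder names such as `UX_PRINCIPLES` to generator functions, and the GBrain resolver's header comment says suppressed resolvers resolve to an empty string. The substitution step itself is not part of this diff, so the following is only a sketch of how such a registry could be applied: `resolvePlaceholders`, `SketchContext`, and `SketchResolver` are hypothetical stand-ins, and leaving unknown placeholders untouched is an assumption.

```typescript
// Hypothetical, simplified stand-ins for the project's TemplateContext and ResolverFn.
interface SketchContext {
  host: string;
  skillName: string;
}
type SketchResolver = (ctx: SketchContext) => string;

function resolvePlaceholders(
  template: string,
  resolvers: Record<string, SketchResolver>,
  ctx: SketchContext,
  suppressed: Set<string> = new Set<string>(),
): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (whole: string, name: string) => {
    if (suppressed.has(name)) return ''; // suppressed resolvers yield an empty string
    const resolver = resolvers[name];
    return resolver ? resolver(ctx) : whole; // unknown names left as-is (assumption)
  });
}

// Illustrative registry with two entries; the bodies are placeholders, not real output.
const sketchResolvers: Record<string, SketchResolver> = {
  UX_PRINCIPLES: () => '## UX Principles',
  GBRAIN_CONTEXT_LOAD: (ctx) => `## Brain Context Load (${ctx.skillName})`,
};

const sketchCtx: SketchContext = { host: 'claude', skillName: 'investigate' };
const rendered = resolvePlaceholders(
  '{{UX_PRINCIPLES}}\n{{GBRAIN_CONTEXT_LOAD}}',
  sketchResolvers,
  sketchCtx,
  new Set(['GBRAIN_CONTEXT_LOAD']),
);
```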
@@ -31,6 +31,9 @@ import { generateRoutingInjection } from './preamble/generate-routing-injection'
 import { generateVendoringDeprecation } from './preamble/generate-vendoring-deprecation';
 import { generateSpawnedSessionCheck } from './preamble/generate-spawned-session-check';

+// Host-specific instructions
+import { generateBrainHealthInstruction } from './preamble/generate-brain-health-instruction';
+
 // Behavioral / voice
 import { generateVoiceDirective } from './preamble/generate-voice-directive';

@@ -38,6 +41,7 @@ import { generateVoiceDirective } from './preamble/generate-voice-directive';
 import { generateContextRecovery } from './preamble/generate-context-recovery';
 import { generateAskUserFormat } from './preamble/generate-ask-user-format';
 import { generateCompletenessSection } from './preamble/generate-completeness-section';
+import { generateConfusionProtocol } from './preamble/generate-confusion-protocol';
 import { generateContinuousCheckpoint } from './preamble/generate-continuous-checkpoint';
 import { generateContextHealth } from './preamble/generate-context-health';

@@ -51,7 +55,7 @@ export { generateTestFailureTriage } from './preamble/generate-test-failure-tria
 // Preamble Composition (tier → sections)
 // ─────────────────────────────────────────────
 // T1: core + upgrade + lake + telemetry + voice(trimmed) + completion
-// T2: T1 + voice(full) + ask + completeness + context-recovery
+// T2: T1 + voice(full) + ask + completeness + context-recovery + confusion + checkpoint + context-health
 // T3: T2 + repo-mode + search
 // T4: (same as T3 — TEST_FAILURE_TRIAGE is a separate {{}} placeholder, not preamble)
 //
@@ -74,9 +78,17 @@ export function generatePreamble(ctx: TemplateContext): string {
    generateRoutingInjection(ctx),
    generateVendoringDeprecation(ctx),
    generateSpawnedSessionCheck(),
+    generateBrainHealthInstruction(ctx),
    generateModelOverlay(ctx),
    generateVoiceDirective(tier),
-    ...(tier >= 2 ? [generateContextRecovery(ctx), generateAskUserFormat(ctx), generateCompletenessSection(), generateContinuousCheckpoint(), generateContextHealth()] : []),
+    ...(tier >= 2 ? [
+      generateContextRecovery(ctx),
+      generateAskUserFormat(ctx),
+      generateCompletenessSection(),
+      generateConfusionProtocol(),
+      generateContinuousCheckpoint(),
+      generateContextHealth(),
+    ] : []),
    ...(tier >= 3 ? [generateRepoModeSection(), generateSearchBeforeBuildingSection(ctx)] : []),
    generateCompletionStatus(ctx),
  ];
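The tier gating in the hunk above relies on conditional spreads: each tier adds its sections only when `tier` reaches the threshold. A reduced sketch of that composition pattern follows. `composeSections` and `TierSections` are hypothetical, and the final filter-and-join step is an assumption, since the code after the `sections` array is not shown in this diff.

```typescript
interface TierSections {
  always: string[]; // sections emitted at every tier
  tier2: string[];  // added when tier >= 2
  tier3: string[];  // added when tier >= 3
  closing: string;  // emitted last at every tier
}

function composeSections(tier: number, parts: TierSections): string {
  const sections = [
    ...parts.always,
    ...(tier >= 2 ? parts.tier2 : []),
    ...(tier >= 3 ? parts.tier3 : []),
    parts.closing,
  ];
  // Dropping empty sections and joining with blank lines is an assumption.
  return sections.filter((section) => section.length > 0).join('\n\n');
}

// Section names are illustrative labels; the empty string models a
// host-conditional generator that returned nothing.
const sampleParts: TierSections = {
  always: ['core', '', 'voice'],
  tier2: ['confusion-protocol', 'checkpoint'],
  tier3: ['repo-mode'],
  closing: 'completion',
};
```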
@@ -0,0 +1,9 @@
+import type { TemplateContext } from '../types';
+
+export function generateBrainHealthInstruction(ctx: TemplateContext): string {
+  if (ctx.host !== 'gbrain' && ctx.host !== 'hermes') return '';
+  return `If \`BRAIN_HEALTH\` is shown and the score is below 50, tell the user which checks
+failed (shown in the output) and suggest: "Run \\\`gbrain doctor\\\` for full diagnostics."
+If the output is not valid JSON or health_score is missing, treat GBrain as unavailable
+and proceed without brain features this session.`;
+}
@@ -0,0 +1,14 @@
+export function generateConfusionProtocol(): string {
+  return `## Confusion Protocol
+
+When you encounter high-stakes ambiguity during coding:
+- Two plausible architectures or data models for the same requirement
+- A request that contradicts existing patterns and you're unsure which to follow
+- A destructive operation where the scope is unclear
+- Missing context that would change your approach significantly
+
+STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
+Ask the user. Do not guess on architectural or data model decisions.
+
+This does NOT apply to routine coding, small features, or obvious changes.`;
+}
@@ -92,7 +92,18 @@ _CHECKPOINT_PUSH=$(${ctx.paths.binDir}/gstack-config get checkpoint_push 2>/dev/
 echo "CHECKPOINT_MODE: $_CHECKPOINT_MODE"
 echo "CHECKPOINT_PUSH: $_CHECKPOINT_PUSH"
 # Detect spawned session (OpenClaw or other orchestrator)
-[ -n "$OPENCLAW_SESSION" ] && echo "SPAWNED_SESSION: true" || true
+[ -n "$OPENCLAW_SESSION" ] && echo "SPAWNED_SESSION: true" || true${ctx.host === 'gbrain' || ctx.host === 'hermes' ? `
+# GBrain health check (gbrain/hermes host only)
+if command -v gbrain &>/dev/null; then
+  _BRAIN_JSON=$(gbrain doctor --fast --json 2>/dev/null || echo '{}')
+  _BRAIN_SCORE=$(echo "$_BRAIN_JSON" | grep -o '"health_score":[0-9]*' | cut -d: -f2)
+  _BRAIN_FAILS=$(echo "$_BRAIN_JSON" | grep -o '"status":"fail"' | wc -l | tr -d ' ')
+  _BRAIN_WARNS=$(echo "$_BRAIN_JSON" | grep -o '"status":"warn"' | wc -l | tr -d ' ')
+  echo "BRAIN_HEALTH: \${_BRAIN_SCORE:-unknown} (\${_BRAIN_FAILS:-0} failures, \${_BRAIN_WARNS:-0} warnings)"
+  if [ "\${_BRAIN_SCORE:-100}" -lt 50 ] 2>/dev/null; then
+    echo "$_BRAIN_JSON" | grep -o '"name":"[^"]*","status":"[^"]*","message":"[^"]*"' || true
+  fi
+fi` : ''}
 \`\`\``;
 }
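The bash health check above extracts its numbers with `grep -o` over the raw `gbrain doctor --fast --json` output rather than parsing the JSON. The sketch below mirrors that text-matching approach so the behavior is easy to test. `summarizeBrainHealth` is a hypothetical helper, and the sample payload (including its `checks` key) is invented for illustration; only `health_score`, `name`, `status`, and `message` appear in the diff.

```typescript
interface BrainHealthSummary {
  score: number | null;
  failures: number;
  warnings: number;
  line: string;
}

// Counts non-overlapping matches of a global pattern.
function countMatches(text: string, pattern: RegExp): number {
  return (text.match(pattern) ?? []).length;
}

// Text matching, like the grep calls in the diff; this is not a JSON parser.
function summarizeBrainHealth(doctorJson: string): BrainHealthSummary {
  const scoreMatch = doctorJson.match(/"health_score":(\d+)/);
  const score = scoreMatch ? Number(scoreMatch[1]) : null;
  const failures = countMatches(doctorJson, /"status":"fail"/g);
  const warnings = countMatches(doctorJson, /"status":"warn"/g);
  const line = `BRAIN_HEALTH: ${score ?? 'unknown'} (${failures} failures, ${warnings} warnings)`;
  return { score, failures, warnings, line };
}

// Invented payload for illustration; the real doctor output may differ.
const sampleDoctorJson =
  '{"health_score":42,"checks":[' +
  '{"name":"db","status":"fail","message":"unreachable"},' +
  '{"name":"index","status":"warn","message":"stale"}]}';

const brainSummary = summarizeBrainHealth(sampleDoctorJson);
const brainFallback = summarizeBrainHealth('{}');
```

Like the grep patterns, this matching only works on compact JSON with no whitespace after the colons; pretty-printed output would report an unknown score and zero failures.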