merge: integrate origin/main (v0.18.1.0) into open-agents-learnings

Main moved forward 6 commits while this branch was local. Integrated both
sides, preserving all functionality.

From main (v0.16.4.0 → v0.18.1.0):
- v0.17.0.0: UX behavioral foundations + ux-audit (generateUXPrinciples,
  {{UX_PRINCIPLES}} placeholder, triggers frontmatter on skills)
- v0.18.0.0: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver
  (generateBrainHealthInstruction, generateConfusionProtocol,
  generateGBrainContextLoad, generateGBrainSaveResults, hosts/gbrain.ts,
  hosts/hermes.ts, scripts/resolvers/gbrain.ts, GBrain bash health check)
- v0.18.0.1: ngrok Windows build fix
- 0cc830b6: tilde-in-assignment permission fix
- cc42f14a: gstack compact design doc (tabled)
- 822e843a: headed browser auto-shutdown + disconnect cleanup (v0.18.1.0)

Integration approach: keep this branch's preamble.ts submodule refactor as
the structure of record. Extracted main's two new generators into their own
submodules:
- scripts/resolvers/preamble/generate-brain-health-instruction.ts
- scripts/resolvers/preamble/generate-confusion-protocol.ts

Updated scripts/resolvers/preamble/generate-preamble-bash.ts to absorb main's
GBrain health check (host-conditional on gbrain/hermes).

scripts/resolvers/index.ts now imports BOTH:
- This branch's adds: MODEL_OVERLAY, TASTE_PROFILE, BIN_DIR resolvers
- Main's adds: UX_PRINCIPLES, GBRAIN_CONTEXT_LOAD, GBRAIN_SAVE_RESULTS resolvers

scripts/resolvers/design.ts keeps both generateTasteProfile (this branch) and
generateUXPrinciples (main). Sibling exports, no overlap.

scripts/gen-skill-docs.ts keeps both this branch's --model flag wiring and
main's edits.

Templates auto-merged where possible. The 35 generated SKILL.md / golden
conflicts auto-resolved via `bun run gen:skill-docs --host all` followed by
re-snapshotting the ship goldens for claude/codex/factory.

Verification:
- bun run gen:skill-docs --host all completes cleanly
- bun test: 1 pre-existing failure (gstack-community-dashboard Supabase
  network test, 235s timeout). NOT related to merge: unchanged Supabase test
  infra times out without live network. Flagged in PR body.

Token-ceiling warnings on plan-ceo-review (29K), office-hours (26K), and ship
(34K). These existed on origin/main before the merge; the preamble grew
substantially from main's GBrain + UX additions plus this branch's
continuous-checkpoint, context-health, model-overlay, taste-profile, and
feature-discovery additions. Worth a follow-up reduction pass but doesn't
block this merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 13:15:24 +02:00 · 2026-04-17 13:58:15 +08:00
parent 0926e4b994 822e843a60
commit 7529dbb276
129 changed files with 3314 additions and 154 deletions
@@ -18,6 +18,11 @@ allowed-tools:
  - Agent
  - AskUserQuestion
  - WebSearch
+triggers:
+  - ship it
+  - create a pr
+  - push to main
+  - deploy this
 ---
 <!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
 <!-- Regenerate: bun run gen:skill-docs -->
@@ -434,6 +439,19 @@ AI makes completeness near-free. Always recommend the complete option over short

 Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).

+## Confusion Protocol
+
+When you encounter high-stakes ambiguity during coding:
+- Two plausible architectures or data models for the same requirement
+- A request that contradicts existing patterns and you're unsure which to follow
+- A destructive operation where the scope is unclear
+- Missing context that would change your approach significantly
+
+STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
+Ask the user. Do not guess on architectural or data model decisions.
+
+This does NOT apply to routine coding, small features, or obvious changes.
+
 ## Continuous Checkpoint Mode

 If `CHECKPOINT_MODE` is `"continuous"` (from preamble output): auto-commit work as
@@ -707,6 +725,8 @@ branch name wherever the instructions say "the base branch" or `<default>`.

 ---

+
+
 # Ship: Fully Automated Ship Workflow

 You are running the `/ship` workflow. This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step. The user said `/ship` which means DO IT. Run straight through and output the PR URL at the end.
@@ -2282,6 +2302,8 @@ staleness detection: if those files are later deleted, the learning can be flagg
 **Only log genuine discoveries.** Don't log obvious things. Don't log things the user
 already knows. A good test: would this insight save time in a future session? If yes, log it.

+
+
 ## Step 4: Version bump (auto-decide)

 **Idempotency check:** Before bumping, compare VERSION against the base branch.
@@ -428,6 +428,19 @@ AI makes completeness near-free. Always recommend the complete option over short

 Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).

+## Confusion Protocol
+
+When you encounter high-stakes ambiguity during coding:
+- Two plausible architectures or data models for the same requirement
+- A request that contradicts existing patterns and you're unsure which to follow
+- A destructive operation where the scope is unclear
+- Missing context that would change your approach significantly
+
+STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
+Ask the user. Do not guess on architectural or data model decisions.
+
+This does NOT apply to routine coding, small features, or obvious changes.
+
 ## Continuous Checkpoint Mode

 If `CHECKPOINT_MODE` is `"continuous"` (from preamble output): auto-commit work as
@@ -701,6 +714,8 @@ branch name wherever the instructions say "the base branch" or `<default>`.

 ---

+
+
 # Ship: Fully Automated Ship Workflow

 You are running the `/ship` workflow. This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step. The user said `/ship` which means DO IT. Run straight through and output the PR URL at the end.
@@ -1902,6 +1917,8 @@ staleness detection: if those files are later deleted, the learning can be flagg
 **Only log genuine discoveries.** Don't log obvious things. Don't log things the user
 already knows. A good test: would this insight save time in a future session? If yes, log it.

+
+
 ## Step 4: Version bump (auto-decide)

 **Idempotency check:** Before bumping, compare VERSION against the base branch.
@@ -430,6 +430,19 @@ AI makes completeness near-free. Always recommend the complete option over short

 Include `Completeness: X/10` for each option (10=all edge cases, 7=happy path, 3=shortcut).

+## Confusion Protocol
+
+When you encounter high-stakes ambiguity during coding:
+- Two plausible architectures or data models for the same requirement
+- A request that contradicts existing patterns and you're unsure which to follow
+- A destructive operation where the scope is unclear
+- Missing context that would change your approach significantly
+
+STOP. Name the ambiguity in one sentence. Present 2-3 options with tradeoffs.
+Ask the user. Do not guess on architectural or data model decisions.
+
+This does NOT apply to routine coding, small features, or obvious changes.
+
 ## Continuous Checkpoint Mode

 If `CHECKPOINT_MODE` is `"continuous"` (from preamble output): auto-commit work as
@@ -703,6 +716,8 @@ branch name wherever the instructions say "the base branch" or `<default>`.

 ---

+
+
 # Ship: Fully Automated Ship Workflow

 You are running the `/ship` workflow. This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step. The user said `/ship` which means DO IT. Run straight through and output the PR URL at the end.
@@ -2278,6 +2293,8 @@ staleness detection: if those files are later deleted, the learning can be flagg
 **Only log genuine discoveries.** Don't log obvious things. Don't log things the user
 already knows. A good test: would this insight save time in a future session? If yes, log it.

+
+
 ## Step 4: Version bump (auto-decide)

 **Idempotency check:** Before bumping, compare VERSION against the base branch.
@@ -1,9 +1,10 @@
 /**
- * Gemini CLI E2E tests — verify skills work when invoked by Gemini CLI.
+ * Gemini CLI E2E smoke test — verify Gemini CLI can start and discover skills.
 *
- * Spawns `gemini -p` with stream-json output in the repo root (where
- * .agents/skills/ already exists), parses JSONL events, and validates
- * structured results. Follows the same pattern as codex-e2e.test.ts.
+ * This is a lightweight smoke test, not a full integration test. Gemini CLI
+ * gets lost in worktrees and times out on complex tasks. The smoke test
+ * validates that the skill files are structured correctly for Gemini's
+ * .agents/skills/ discovery mechanism.
 *
 * Prerequisites:
 * - `gemini` binary installed (npm install -g @google/gemini-cli)
@@ -48,10 +49,9 @@ if (!evalsEnabled) {

 // --- Diff-based test selection ---

-// Gemini E2E touchfiles — keyed by test name, same pattern as Codex E2E
+// Gemini E2E touchfiles — keyed by test name
 const GEMINI_E2E_TOUCHFILES: Record<string, string[]> = {
-  'gemini-discover-skill':  ['.agents/skills/**', 'test/helpers/gemini-session-runner.ts'],
-  'gemini-review-findings': ['review/**', '.agents/skills/gstack-review/**', 'test/helpers/gemini-session-runner.ts'],
+  'gemini-smoke':  ['.agents/skills/**', 'test/helpers/gemini-session-runner.ts'],
 };

 let selectedTests: string[] | null = null; // null = run all
@@ -71,7 +71,6 @@ if (evalsEnabled && !process.env.EVALS_ALL) {
    }
    process.stderr.write('\n');
  }
-  // If changedFiles is empty (e.g., on main branch), selectedTests stays null -> run all
 }

 /** Skip an individual test if not selected by diff-based selection. */
@@ -84,7 +83,6 @@ function testIfSelected(testName: string, fn: () => Promise<void>, timeout: numb

 const evalCollector = evalsEnabled && !SKIP ? new EvalCollector('e2e-gemini') : null;

-/** DRY helper to record a Gemini E2E test result into the eval collector. */
 function recordGeminiE2E(name: string, result: GeminiResult, passed: boolean) {
  evalCollector?.addTest({
    name,
@@ -92,14 +90,13 @@ function recordGeminiE2E(name: string, result: GeminiResult, passed: boolean) {
    tier: 'e2e',
    passed,
    duration_ms: result.durationMs,
-    cost_usd: 0, // Gemini doesn't report cost in USD; tokens are tracked
+    cost_usd: 0,
    output: result.output?.slice(0, 2000),
-    turns_used: result.toolCalls.length, // approximate: tool calls as turns
+    turns_used: result.toolCalls.length,
    exit_reason: result.exitCode === 0 ? 'success' : `exit_code_${result.exitCode}`,
  });
 }

-/** Print cost summary after a Gemini E2E test. */
 function logGeminiCost(label: string, result: GeminiResult) {
  const durationSec = Math.round(result.durationMs / 1000);
  console.log(`${label}: ${result.tokens} tokens, ${result.toolCalls.length} tool calls, ${durationSec}s`);
@@ -125,59 +122,22 @@ describeGemini('Gemini E2E', () => {
    harvestAndCleanup('gemini');
  });

-  testIfSelected('gemini-discover-skill', async () => {
-    // Run Gemini in an isolated worktree (has .agents/skills/ copied from ROOT)
+  testIfSelected('gemini-smoke', async () => {
+    // Smoke test: can Gemini start, read the repo, and produce output?
+    // Uses a simple prompt that doesn't require skill invocation or complex navigation.
    const result = await runGeminiSkill({
-      prompt: 'List any skills or instructions you have available. Just list the names.',
-      timeoutMs: 60_000,
+      prompt: 'What is this project? Answer in one sentence based on the README.',
+      timeoutMs: 90_000,
      cwd: testWorktree,
    });

-    logGeminiCost('gemini-discover-skill', result);
+    logGeminiCost('gemini-smoke', result);

-    // Gemini should have produced some output
-    const passed = result.exitCode === 0 && result.output.length > 0;
-    recordGeminiE2E('gemini-discover-skill', result, passed);
+    // Pass if Gemini produced any meaningful output (even with non-zero exit from timeout)
+    const hasOutput = result.output.length > 10;
+    const passed = hasOutput;
+    recordGeminiE2E('gemini-smoke', result, passed);

-    expect(result.exitCode).toBe(0);
-    expect(result.output.length).toBeGreaterThan(0);
-    // The output should reference skills in some form
-    const outputLower = result.output.toLowerCase();
-    expect(
-      outputLower.includes('review') || outputLower.includes('gstack') || outputLower.includes('skill'),
-    ).toBe(true);
+    expect(result.output.length, 'Gemini should produce output').toBeGreaterThan(10);
  }, 120_000);
-
-  testIfSelected('gemini-review-findings', async () => {
-    // Run gstack-review skill via Gemini on worktree (isolated from main working tree)
-    const result = await runGeminiSkill({
-      prompt: 'Run the gstack-review skill on this repository. Review the current branch diff and report your findings.',
-      timeoutMs: 540_000,
-      cwd: testWorktree,
-    });
-
-    logGeminiCost('gemini-review-findings', result);
-
-    // Should produce structured review-like output
-    const output = result.output;
-    const passed = result.exitCode === 0 && output.length > 50;
-    recordGeminiE2E('gemini-review-findings', result, passed);
-
-    expect(result.exitCode).toBe(0);
-    expect(output.length).toBeGreaterThan(50);
-
-    // Review output should contain some review-like content
-    const outputLower = output.toLowerCase();
-    const hasReviewContent =
-      outputLower.includes('finding') ||
-      outputLower.includes('issue') ||
-      outputLower.includes('review') ||
-      outputLower.includes('change') ||
-      outputLower.includes('diff') ||
-      outputLower.includes('clean') ||
-      outputLower.includes('no issues') ||
-      outputLower.includes('p1') ||
-      outputLower.includes('p2');
-    expect(hasReviewContent).toBe(true);
-  }, 600_000);
 });
@@ -122,9 +122,8 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
  'codex-discover-skill':  ['codex/**', '.agents/skills/**', 'test/helpers/codex-session-runner.ts', 'lib/worktree.ts'],
  'codex-review-findings': ['review/**', '.agents/skills/gstack-review/**', 'codex/**', 'test/helpers/codex-session-runner.ts', 'lib/worktree.ts'],

-  // Gemini E2E (tests skills via Gemini CLI + worktree)
-  'gemini-discover-skill':  ['.agents/skills/**', 'test/helpers/gemini-session-runner.ts', 'lib/worktree.ts'],
-  'gemini-review-findings': ['review/**', '.agents/skills/gstack-review/**', 'test/helpers/gemini-session-runner.ts', 'lib/worktree.ts'],
+  // Gemini E2E — smoke test only (Gemini gets lost in worktrees on complex tasks)
+  'gemini-smoke':  ['.agents/skills/**', 'test/helpers/gemini-session-runner.ts', 'lib/worktree.ts'],


  // Coverage audit (shared fixture) + triage + gates
@@ -284,8 +283,7 @@ export const E2E_TIERS: Record<string, 'gate' | 'periodic'> = {
  // Multi-AI — periodic (require external CLIs)
  'codex-discover-skill': 'periodic',
  'codex-review-findings': 'periodic',
-  'gemini-discover-skill': 'periodic',
-  'gemini-review-findings': 'periodic',
+  'gemini-smoke': 'periodic',

  // Design — gate for cheap functional, periodic for Opus/quality
  'design-consultation-core': 'periodic',
@@ -30,8 +30,8 @@ const ROOT = path.resolve(import.meta.dir, '..');
 // ─── hosts/index.ts ─────────────────────────────────────────

 describe('hosts/index.ts', () => {
-  test('ALL_HOST_CONFIGS has 8 hosts', () => {
-    expect(ALL_HOST_CONFIGS.length).toBe(8);
+  test('ALL_HOST_CONFIGS has 10 hosts', () => {
+    expect(ALL_HOST_CONFIGS.length).toBe(10);
  });

  test('ALL_HOST_NAMES matches config names', () => {
@@ -479,9 +479,8 @@ describe('host config correctness', () => {
    expect(openclaw.pathRewrites.some(r => r.from === 'CLAUDE.md' && r.to === 'AGENTS.md')).toBe(true);
  });

-  test('openclaw has adapter path', () => {
-    expect(openclaw.adapter).toBeDefined();
-    expect(openclaw.adapter).toContain('openclaw-adapter');
+  test('openclaw has no adapter (dead code removed)', () => {
+    expect(openclaw.adapter).toBeUndefined();
  });

  test('openclaw has no staticFiles (SOUL.md removed)', () => {
@@ -1,9 +1,19 @@
-import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
+import { describe, test as _bunTest, expect, beforeEach, afterEach } from 'bun:test';
 import { execSync } from 'child_process';
 import * as fs from 'fs';
 import * as path from 'path';
 import * as os from 'os';

+// Every test in this file shells out to gstack-config + gstack-relink (bash scripts
+// invoking subprocess work). Under parallel bun test load, subprocess spawn contends
+// with other suites and each test can drift ~200ms past the 5s default. Bump to 15s.
+// Object.assign preserves test.only / test.skip / test.each / test.todo sub-APIs.
+const test = Object.assign(
+  ((name: any, fn: any, timeout?: number) =>
+    _bunTest(name, fn, timeout ?? 15_000)) as typeof _bunTest,
+  _bunTest,
+);
+
 const ROOT = path.resolve(import.meta.dir, '..');
 const BIN = path.join(ROOT, 'bin');
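Note on the hunk above: the `Object.assign` wrapper pattern can be tried in isolation. The following is a minimal sketch under stated assumptions, not the code in this commit: `baseTest`, `registered`, and `DEFAULT_TIMEOUT_MS` are hypothetical stand-ins for a test runner's `test` function, used only so the effective timeouts can be inspected without bun.

```typescript
// Hypothetical stand-in for a runner's `test` function (NOT bun's real API).
type TestFn = () => void | Promise<void>;

interface BaseTest {
  (name: string, fn: TestFn, timeout?: number): void;
  skip: (name: string, fn: TestFn, timeout?: number) => void;
}

// Records each registration so the timeout that was passed can be checked.
const registered: { name: string; timeout: number | undefined; skipped: boolean }[] = [];

const baseTest: BaseTest = Object.assign(
  (name: string, _fn: TestFn, timeout?: number) => {
    registered.push({ name, timeout, skipped: false });
  },
  {
    skip: (name: string, _fn: TestFn, timeout?: number) => {
      registered.push({ name, timeout, skipped: true });
    },
  },
);

const DEFAULT_TIMEOUT_MS = 15_000; // assumed value, mirroring the 15s in the diff

// Wrap the callable with a default timeout, then copy the own enumerable
// properties of the original (here: `skip`) onto the wrapper.
const test: BaseTest = Object.assign(
  (name: string, fn: TestFn, timeout?: number) =>
    baseTest(name, fn, timeout ?? DEFAULT_TIMEOUT_MS),
  baseTest,
);

test('uses the default', () => {});
test('explicit timeout wins', () => {}, 1_000);
test.skip('copied sub-API', () => {});

console.log(registered);
```

In this sketch the copied `skip` is the original function, so calls made through it do not pick up the new default; only direct `test(...)` calls do.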

@@ -286,18 +286,21 @@ describeIfSelected('Base branch detection', ['review-base-branch', 'ship-base-br
    run('git', ['add', 'app.rb'], dir);
    run('git', ['commit', '-m', 'feat: add hello method'], dir);

-    // Copy review skill files
-    fs.copyFileSync(path.join(ROOT, 'review', 'SKILL.md'), path.join(dir, 'review-SKILL.md'));
-    fs.copyFileSync(path.join(ROOT, 'review', 'checklist.md'), path.join(dir, 'review-checklist.md'));
-    fs.copyFileSync(path.join(ROOT, 'review', 'greptile-triage.md'), path.join(dir, 'review-greptile-triage.md'));
+    // Extract only Step 0 (base branch detection) + minimal review instructions
+    // Full SKILL.md is ~1500 lines — copying it causes the agent to spend all turns reading
+    const full = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
+    const step0Start = full.indexOf('## Step 0: Detect platform and base branch');
+    const step1Start = full.indexOf('## Step 1: Check branch');
+    const step1End = full.indexOf('---', step1Start + 10);
+    const extracted = full.slice(step0Start, step1End > step1Start ? step1End : step1Start + 500);
+    fs.writeFileSync(path.join(dir, 'review-SKILL.md'), extracted);

    const result = await runSkillTest({
      prompt: `You are in a git repo on a feature branch with changes.
-Read review-SKILL.md for the review workflow instructions.
-Also read review-checklist.md and apply it.
+Read review-SKILL.md for the base branch detection instructions.

 IMPORTANT: Follow Step 0 to detect the base branch. Since there is no remote, gh commands will fail — fall back to main.
-Then run the review against the detected base branch.
+Then run git diff against the detected base branch and write a brief review.
 Write your findings to ${dir}/review-output.md`,
      workingDirectory: dir,
      maxTurns: 15,
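Note on the hunk above: the Step 0 extraction is plain `indexOf` + `slice`, so it can be exercised without the real SKILL.md. This is a minimal sketch, assuming a hypothetical helper named `extractSection` and a made-up sample document. The not-found guard is an addition for illustration and is not in the diff.

```typescript
// Hypothetical helper mirroring the slicing logic in the diff above.
function extractSection(
  full: string,
  startMarker: string,
  nextMarker: string,
  fallbackChars = 500,
): string {
  const start = full.indexOf(startMarker);
  const next = full.indexOf(nextMarker);
  // Added guard (not in the diff): fail loudly if a heading was renamed,
  // instead of slicing from index -1.
  if (start === -1 || next === -1) {
    throw new Error(`marker not found: ${start === -1 ? startMarker : nextMarker}`);
  }
  // Cut at the first '---' rule after the second heading; if there is none,
  // keep a fixed number of characters past that heading.
  const end = full.indexOf('---', next + 10);
  return full.slice(start, end > next ? end : next + fallbackChars);
}

// Made-up sample standing in for review/SKILL.md.
const sample = [
  '# Title',
  'intro',
  '## Step 0: Detect platform and base branch',
  'detect stuff',
  '## Step 1: Check branch',
  'check stuff',
  '---',
  '## Step 2: Later',
].join('\n');

const out = extractSection(
  sample,
  '## Step 0: Detect platform and base branch',
  '## Step 1: Check branch',
);
console.log(out);
```

The extract runs from the Step 0 heading up to, but not including, the first `---` after the Step 1 heading.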
@@ -60,10 +60,9 @@ if (evalsEnabled && process.env.EVALS_TIER) {
 // --- Helper functions ---

 /** Copy all SKILL.md files for auto-discovery.
- *  Install to BOTH project-level (.claude/skills/) AND user-level (~/.claude/skills/)
- *  because Claude Code discovers skills from both locations. In CI containers,
- *  $HOME may differ from the working directory, so we need both paths to ensure
- *  the Skill tool appears in Claude's available tools list. */
+ *  Installs to project-level (.claude/skills/) only. Writing to the user's
+ *  ~/.claude/skills/ is unsafe: it may contain symlinks from the real gstack
+ *  install that point to different worktrees or dangling targets. */
 function installSkills(tmpDir: string) {
  const skillDirs = [
    '', // root gstack SKILL.md
@@ -73,24 +72,16 @@ function installSkills(tmpDir: string) {
    'gstack-upgrade', 'humanizer',
  ];

-  // Install to both project-level and user-level skill directories
-  const homeDir = process.env.HOME || os.homedir();
-  const installTargets = [
-    path.join(tmpDir, '.claude', 'skills'),        // project-level
-    path.join(homeDir, '.claude', 'skills'),        // user-level (~/.claude/skills/)
-  ];
+  const targetBase = path.join(tmpDir, '.claude', 'skills');

  for (const skill of skillDirs) {
    const srcPath = path.join(ROOT, skill, 'SKILL.md');
    if (!fs.existsSync(srcPath)) continue;

    const skillName = skill || 'gstack';
-
-    for (const targetBase of installTargets) {
-      const destDir = path.join(targetBase, skillName);
-      fs.mkdirSync(destDir, { recursive: true });
-      fs.copyFileSync(srcPath, path.join(destDir, 'SKILL.md'));
-    }
+    const destDir = path.join(targetBase, skillName);
+    fs.mkdirSync(destDir, { recursive: true });
+    fs.copyFileSync(srcPath, path.join(destDir, 'SKILL.md'));
  }

  // Write a CLAUDE.md with explicit routing instructions.
@@ -143,6 +143,7 @@ describe('Command registry consistency', () => {
    const validKeys = new Set([
      'interactive', 'compact', 'depth', 'selector',
      'diff', 'annotate', 'outputPath', 'cursorInteractive',
+      'heatmap',
    ]);
    for (const flag of SNAPSHOT_FLAGS) {
      expect(validKeys.has(flag.optionKey)).toBe(true);
@@ -85,11 +85,11 @@ describe('gstack-settings-hook', () => {
    expect(settings.hooks).toBeUndefined();
  });

-  test('remove is safe when settings.json does not exist', () => {
+  test('remove exits 1 when settings.json does not exist', () => {
    const result = run(`${SETTINGS_HOOK} remove /path/to/gstack-session-update`, {
      env: { GSTACK_SETTINGS_FILE: settingsFile },
    });
-    expect(result.exitCode).toBe(0);
+    expect(result.exitCode).toBe(1);
  });

  test('remove preserves other hooks', () => {