v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806)

* feat(test): transcript-section-logger + ship-action fingerprint (T10) Pure-analysis module over a SkillTestResult/NDJSON transcript: - extractSectionReads(): which sections/*.md a run opened (post-carve check) - extractShipActions(): observable action fingerprint (merge/test/bump/ changelog/commit/push/pr) that works on the MONOLITH too, so a baseline captured before the carve can detect a sectioned-ship regression - baseline read/write + compareShipActions() for baseline-first dogf(T10) Baseline-first answers the Codex outside-voice critique that a logger in the same PR as the carve is post-failure telemetry without a pre-carve reference. 11 unit tests, all green. Paid monolith baseline capture runs separately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(pipeline): section discovery + generation machinery (T9) - discover-skills.ts: discoverSectionTemplates() scans <skill>/sections/*.md.tmpl - gen-skill-docs.ts: extract resolvePlaceholders + applyHostRewrites + buildContext as shared helpers (processTemplate and the new processSectionTemplate both call them, so a sanitization/rewrite fix can't miss sections) [C1] - processSectionTemplate: body-fragment generation (no frontmatter/catalog/voice), parent-skill TemplateContext (skillName pinned to parent, not 'sections', so appliesTo gating + tier behave identically), per-host output routing - --host all now fails the build on ANY host failure, not just claude, so a stale external-host output can't slip the freshness gate [Codex outside-voice #9] Inert until a skill is carved (no sections/ dirs exist yet). Refactor is output-neutral: gen:skill-docs --dry-run --host all reports 0 STALE. 5 discovery unit tests + 389 gen-skill-docs tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(setup): install sections/ for cherry-pick targets (claude + kiro) (T9) Two install targets cherry-pick SKILL.md and would leave a carved skill's sections/ behind, 404ing a runtime 'Read sections/<name>.md': - link_claude_skill_dirs: link the sections/ subdir via _link_or_copy (windows gets a fresh copy on every ./setup) - kiro per-skill loop: sed-rewrite + copy each sections/* so paths resolve under ~/.kiro, not ~/.codex/~/.claude codex/factory/opencode link the whole generated dir, so sections ride free. Addresses Codex outside-voice #4/#6 (runtime pathing landmine). Inert until a skill is carved. Static-tripwire test + windows-fallback invariant green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(ship): gstack-version-bump CLI — tested idempotency classify + write (T9) Hybrid CLI extraction (CM1): the deterministic core of ship Step 12 becomes a tested CLI instead of bash prose the agent re-derives each run. - classify: FRESH/ALREADY_BUMPED/DRIFT_STALE_PKG/DRIFT_UNEXPECTED from VERSION vs origin/<base>:VERSION vs package.json.version (pure reader) - write: validated dual-write to VERSION + package.json (FRESH bump) - repair: DRIFT_STALE_PKG sync, no re-bump Bump-LEVEL choice + queue collision stay agent judgment; slot pick stays bin/gstack-next-version. This removes the re-bump-a-shipped-branch footgun from skippable prose into code that can't be skipped or misread. 15 tests (exhaustive state matrix + write/repair fs + real-git classify). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(parity): sectioned-skill parity capability — guards the carve (T9) Carved skills (skeleton + sections/*.md) need parity checks that see relocated content, or moving a phrase into a section reads as 'lost': - readSkillForParity(): union skeleton + all sections/*.md - checkSkillParity sectioned mode: content checks against the union; minBytes/ maxSizeRatio against union bytes (total behavior preserved); maxSkeletonBytes asserts the always-loaded skeleton actually shrank. Lowering minBytes to fit a small skeleton would otherwise make the size floor toothless [Codex #12]. Built + tested BEFORE the carve so ship's invariant can flip to sectioned in the same commit it lands. Monolith path byte-identical (verified: pre-existing investigate 1.053 ratio drift fails the same with this change stashed). 7 sectioned-parity tests + existing parity tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(ship): carve into skeleton + on-demand sections (Claude) (T9) ship/SKILL.md drops 167KB → 68.7KB (~59% of the always-loaded skill) by moving 8 prose-heavy steps into ship/sections/*.md, read on demand: tests, test-coverage, plan-completion, review-army, greptile, adversarial, changelog, pr-body. Step 12's version logic now calls the tested gstack-version-bump CLI instead of inline bash. Claude-first (S2): {{SECTION:id}} emits a STOP-Read pointer on Claude (skeleton + generated section files) and INLINES the content on every other host, so external hosts keep the full monolith — verified factory at 162KB with no sections dir. {{SECTION_INDEX:ship}} renders the situation→section table from the PASSIVE manifest (CM2 / v2_PLAN.md:663); required-reads live only in test fixtures. Multi-pass resolve expands inlined sections' own resolvers. Parity: ship invariant flipped to sectioned (union content checks + maxSkeletonBytes asserts the shrink). Carve-fallout fixed across gen-skill-docs/skill-validation/ golden/plan-completion/#1539/size-budget tests via skeleton+sections union reads. Free suite green except the pre-existing investigate parity drift. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(ship): manifest-consistency + context-parity + requiredReads helper (T9) Free deterministic guards for the carve: - required-reads.ts + unit test: assertRequiredReads(run, requiredFiles) — the mechanical layer-5 check that the agent Read the sections its situation needs (required set comes from the fixture, not the passive manifest) - section-manifest-consistency: 3-tier orphan classification (generated orphan + hand-edited generated file → FAIL; manifest orphan → WARN per v2_PLAN.md) and pins the PASSIVE-manifest contract (no applies_when/required_for) - template-context-parity: generated sections have zero unresolved placeholders and gated resolvers (ADVERSARIAL_STEP/CONFIDENCE_CALIBRATION/CHANGELOG_WORKFLOW) rendered — proving sections resolve with the parent skillName, not 'sections' 16 tests, all green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(ship): section-loading E2E + idempotency CLI detection (T9) - skill-e2e-ship-section-loading.test.ts (new, periodic): runs real /ship in plan mode against a fresh version-changing fixture and asserts the agent Read the required sections (review-army + changelog). Runs against the INSTALLED skill (~/.claude/skills/gstack/ship), not repo paths, so install-layout 404s surface [Codex outside-voice #5]. Layer-5 mechanical guard against silent section-skip. - skill-e2e-ship-idempotency.test.ts: detection updated for the carve — Step 12 now runs gstack-version-bump classify (JSON "state":"ALREADY_BUMPED") instead of the inline bash echo (STATE: ALREADY_BUMPED). Accept both; add a gstack-version-bump-write re-bump regression signal. - touchfiles: register ship-section-loading (periodic) + extend idempotency deps with bin/gstack-version-bump + scripts/resolvers/sections.ts. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(ship): union-read redaction wiring test for the carve (T9) main's PR-body redaction-at-sink lives in sections/pr-body.md.tmpl after the carve, not the skeleton template. Read skeleton + section templates union so the redaction-wiring assertions follow the relocated content. 9/9 green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-17 21:17:23 +02:00 · 2026-05-30 12:09:10 -07:00
parent 9562ad4e70
commit 46c1fae7f1
51 changed files with 4445 additions and 4891 deletions
@@ -11,7 +11,7 @@

 import { COMMAND_DESCRIPTIONS } from '../browse/src/commands';
 import { SNAPSHOT_FLAGS } from '../browse/src/snapshot';
-import { discoverTemplates } from './discover-skills';
+import { discoverTemplates, discoverSectionTemplates } from './discover-skills';
 import { writeLlmsTxt } from './gen-llms-txt';
 import * as fs from 'fs';
 import * as path from 'path';
@@ -574,6 +574,102 @@ function extractHookSafetyProse(tmplContent: string): string | null {

 const GENERATED_HEADER = `<!-- AUTO-GENERATED from {{SOURCE}} — do not edit directly -->\n<!-- Regenerate: bun run gen:skill-docs -->\n`;

+/**
+ * Apply a host's configured path + tool rewrites. Extracted so both SKILL.md
+ * (via processExternalHost) and section files (via processSectionTemplate) get
+ * identical per-host treatment — a section's cross-references must rewrite the
+ * same way the parent skill's do, or external hosts get wrong paths.
+ */
+function applyHostRewrites(content: string, hostConfig: HostConfig): string {
+  let result = content;
+  for (const rewrite of hostConfig.pathRewrites) {
+    result = result.replaceAll(rewrite.from, rewrite.to);
+  }
+  if (hostConfig.toolRewrites) {
+    for (const [from, to] of Object.entries(hostConfig.toolRewrites)) {
+      result = result.replaceAll(from, to);
+    }
+  }
+  return result;
+}
+
+/**
+ * Resolve {{PLACEHOLDER}} / {{NAME:arg}} tokens against the RESOLVERS registry,
+ * honoring host suppression and appliesTo gating, then assert nothing is left
+ * unresolved. Extracted so SKILL.md and section templates resolve through the
+ * exact same path — a security/sanitization fix to one can't miss the other.
+ */
+function resolvePlaceholders(
+  tmplContent: string,
+  ctx: TemplateContext,
+  hostConfig: HostConfig,
+  relTmplPath: string,
+): string {
+  // effectiveSuppressedResolvers() honors --respect-detection: when gbrain is
+  // detected locally, GBRAIN_* resolvers un-suppress. Shared by SKILL.md and
+  // section generation so both paths get the same gbrain-aware behavior.
+  const suppressed = effectiveSuppressedResolvers(hostConfig);
+  const onePass = (input: string): string =>
+    input.replace(/\{\{(\w+(?::[^}]+)?)\}\}/g, (_match, fullKey) => {
+      const parts = fullKey.split(':');
+      const resolverName = parts[0];
+      const args = parts.slice(1);
+      if (suppressed.has(resolverName)) return '';
+      const entry = RESOLVERS[resolverName];
+      if (!entry) throw new Error(`Unknown placeholder {{${resolverName}}} in ${relTmplPath}`);
+      const { resolve, appliesTo } = unwrapResolver(entry);
+      if (appliesTo && !appliesTo(ctx)) return '';
+      return args.length > 0 ? resolve(ctx, args) : resolve(ctx);
+    });
+
+  // Multi-pass: a resolver may emit content that itself contains {{TOKENS}} — the
+  // {{SECTION:id}} resolver inlines a section template (with its own resolvers)
+  // for non-Claude hosts. .replace() doesn't re-scan inserted text, so loop until
+  // the output stabilizes. Bounded to avoid an infinite loop if a resolver ever
+  // emits its own placeholder; 6 passes is far more nesting than any skill needs.
+  let content = tmplContent;
+  for (let pass = 0; pass < 6; pass++) {
+    const next = onePass(content);
+    if (next === content) break;
+    content = next;
+  }
+
+  const remaining = content.match(/\{\{(\w+(?::[^}]+)?)\}\}/g);
+  if (remaining) {
+    throw new Error(`Unresolved placeholders in ${relTmplPath}: ${remaining.join(', ')}`);
+  }
+  return content;
+}
+
+/**
+ * Build the TemplateContext from a template's frontmatter. Shared by SKILL.md
+ * and section generation so sections inherit the SAME context the parent skill
+ * resolves with (skillName, tier, benefitsFrom, interactive) — enforced by
+ * test/template-context-parity.test.ts. skillNameOverride lets section
+ * generation pin the parent skill's name instead of deriving "sections".
+ */
+function buildContext(
+  tmplContent: string,
+  tmplPath: string,
+  host: Host,
+  skillNameOverride?: string,
+): TemplateContext {
+  const { name: extractedName } = extractNameAndDescription(tmplContent);
+  const skillName = skillNameOverride || extractedName || path.basename(path.dirname(tmplPath));
+  const benefitsMatch = tmplContent.match(/^benefits-from:\s*\[([^\]]*)\]/m);
+  const benefitsFrom = benefitsMatch
+    ? benefitsMatch[1].split(',').map(s => s.trim()).filter(Boolean)
+    : undefined;
+  const tierMatch = tmplContent.match(/^preamble-tier:\s*(\d+)$/m);
+  const preambleTier = tierMatch ? parseInt(tierMatch[1], 10) : undefined;
+  const interactiveMatch = tmplContent.match(/^interactive:\s*(true|false)\s*$/m);
+  const interactive = interactiveMatch ? interactiveMatch[1] === 'true' : undefined;
+  return {
+    skillName, tmplPath, benefitsFrom, host, paths: HOST_PATHS[host],
+    preambleTier, model: MODEL_ARG_VAL, interactive, explainLevel: EXPLAIN_LEVEL,
+  };
+}
+
 /**
 * Process external host output: routing, frontmatter, path rewrites, metadata.
 * Shared between Codex and Factory (and future external hosts).
@@ -619,17 +715,9 @@ function processExternalHost(
    result = result.slice(0, bodyStart) + '\n' + safetyProse + '\n' + result.slice(bodyStart);
  }

-  // Config-driven path rewrites (order matters, replaceAll)
-  for (const rewrite of hostConfig.pathRewrites) {
-    result = result.replaceAll(rewrite.from, rewrite.to);
-  }
-
-  // Config-driven tool rewrites
-  if (hostConfig.toolRewrites) {
-    for (const [from, to] of Object.entries(hostConfig.toolRewrites)) {
-      result = result.replaceAll(from, to);
-    }
-  }
+  // Config-driven path + tool rewrites (shared with processSectionTemplate so
+  // section cross-references get the same per-host treatment as SKILL.md).
+  result = applyHostRewrites(result, hostConfig);

  // Config-driven: generate metadata (e.g., openai.yaml for Codex)
  if (hostConfig.generation.generateMetadata && !symlinkLoop) {
@@ -650,53 +738,18 @@ function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath:
  // Determine skill directory relative to ROOT
  const skillDir = path.relative(ROOT, path.dirname(tmplPath));

-  // Extract skill name from frontmatter early — needed for both TemplateContext and external host output paths.
-  // When frontmatter name: differs from directory name (e.g., run-tests/ with name: test),
-  // the frontmatter name is used for external skill naming and setup script symlinks.
+  // Extract name/description: name drives external skill naming + setup symlinks
+  // (and TemplateContext.skillName via buildContext); description feeds external
+  // host metadata. When frontmatter name: differs from directory name (e.g.
+  // run-tests/ with name: test), the frontmatter name wins.
  const { name: extractedName, description: extractedDescription } = extractNameAndDescription(tmplContent);
-  const skillName = extractedName || path.basename(path.dirname(tmplPath));

-
-  // Extract benefits-from list from frontmatter (inline YAML: benefits-from: [a, b])
-  const benefitsMatch = tmplContent.match(/^benefits-from:\s*\[([^\]]*)\]/m);
-  const benefitsFrom = benefitsMatch
-    ? benefitsMatch[1].split(',').map(s => s.trim()).filter(Boolean)
-    : undefined;
-
-  // Extract preamble-tier from frontmatter (1-4, controls which preamble sections are included)
-  const tierMatch = tmplContent.match(/^preamble-tier:\s*(\d+)$/m);
-  const preambleTier = tierMatch ? parseInt(tierMatch[1], 10) : undefined;
-
-  // Extract interactive flag from frontmatter (generator-only; controls plan-mode handshake inclusion)
-  const interactiveMatch = tmplContent.match(/^interactive:\s*(true|false)\s*$/m);
-  const interactive = interactiveMatch ? interactiveMatch[1] === 'true' : undefined;
-
-  const ctx: TemplateContext = { skillName, tmplPath, benefitsFrom, host, paths: HOST_PATHS[host], preambleTier, model: MODEL_ARG_VAL, interactive, explainLevel: EXPLAIN_LEVEL };
-
-  // Replace placeholders (supports parameterized: {{NAME:arg1:arg2}})
-  // Config-driven: suppressedResolvers return empty string for this host.
-  // effectiveSuppressedResolvers() honors --respect-detection: when gbrain
-  // is detected locally, GBRAIN_* resolvers un-suppress so brain-aware
-  // blocks render for users who have gbrain installed.
  const currentHostConfig = getHostConfig(host);
-  const suppressed = effectiveSuppressedResolvers(currentHostConfig);
-  let content = tmplContent.replace(/\{\{(\w+(?::[^}]+)?)\}\}/g, (match, fullKey) => {
-    const parts = fullKey.split(':');
-    const resolverName = parts[0];
-    const args = parts.slice(1);
-    if (suppressed.has(resolverName)) return '';
-    const entry = RESOLVERS[resolverName];
-    if (!entry) throw new Error(`Unknown placeholder {{${resolverName}}} in ${relTmplPath}`);
-    const { resolve, appliesTo } = unwrapResolver(entry);
-    if (appliesTo && !appliesTo(ctx)) return '';
-    return args.length > 0 ? resolve(ctx, args) : resolve(ctx);
-  });
+  const ctx = buildContext(tmplContent, tmplPath, host);
+  const skillName = ctx.skillName;

-  // Check for any remaining unresolved placeholders
-  const remaining = content.match(/\{\{(\w+(?::[^}]+)?)\}\}/g);
-  if (remaining) {
-    throw new Error(`Unresolved placeholders in ${relTmplPath}: ${remaining.join(', ')}`);
-  }
+  // Replace placeholders + assert none remain (shared path with section generation).
+  let content = resolvePlaceholders(tmplContent, ctx, currentHostConfig, relTmplPath);

  // Preprocess voice triggers: fold into description, strip field from frontmatter.
  // Must run BEFORE transformFrontmatter so all hosts see the updated description,
@@ -742,6 +795,58 @@ function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath:
  return { outputPath, content, symlinkLoop, catalogParts };
 }

+/**
+ * Generate one on-demand section file (`<skill>/sections/<name>.md.tmpl` →
+ * `<name>.md`). Sections are BODY FRAGMENTS — no frontmatter, no catalog trim,
+ * no voice triggers. They resolve placeholders through the SAME path as
+ * SKILL.md (resolvePlaceholders) using the PARENT skill's TemplateContext
+ * (so appliesTo gating + tier behave identically — a section's {{PREAMBLE}}-
+ * style resolver renders the same content it would in the parent, not empty).
+ *
+ * Output routing mirrors SKILL.md: Claude writes in-tree at
+ * `<skill>/sections/<name>.md`; external hosts write to
+ * `<hostSubdir>/skills/<externalName>/sections/<name>.md`. External hosts get
+ * applyHostRewrites so cross-references resolve per host.
+ */
+function processSectionTemplate(
+  sectionTmplPath: string,
+  skillDir: string,
+  host: Host = 'claude',
+): { outputPath: string; content: string } {
+  const tmplContent = fs.readFileSync(sectionTmplPath, 'utf-8');
+  const relTmplPath = path.relative(ROOT, sectionTmplPath);
+  const hostConfig = getHostConfig(host);
+
+  // Read the owning SKILL.md.tmpl so the section inherits the parent's name +
+  // tier + benefits-from (TemplateContext parity). Fall back to the dir name.
+  const parentTmplPath = path.join(ROOT, skillDir, 'SKILL.md.tmpl');
+  const parentContent = fs.existsSync(parentTmplPath) ? fs.readFileSync(parentTmplPath, 'utf-8') : '';
+  const parentName = (parentContent && extractNameAndDescription(parentContent).name) || skillDir;
+  const ctx = buildContext(parentContent || tmplContent, parentTmplPath, host, parentName);
+
+  // Resolve placeholders against the section body (shared guard catches stragglers).
+  let content = resolvePlaceholders(tmplContent, ctx, hostConfig, relTmplPath);
+
+  // External hosts: rewrite cross-reference paths/tools (no frontmatter to transform).
+  if (host !== 'claude') {
+    content = applyHostRewrites(content, hostConfig);
+  }
+
+  // Plain generated header (no frontmatter to insert after).
+  content = GENERATED_HEADER.replace('{{SOURCE}}', path.basename(sectionTmplPath)) + content;
+
+  const fileName = path.basename(sectionTmplPath).replace(/\.tmpl$/, '');
+  let outputPath: string;
+  if (host === 'claude') {
+    outputPath = path.join(ROOT, skillDir, 'sections', fileName);
+  } else {
+    const externalName = externalSkillName(skillDir, parentName);
+    outputPath = path.join(ROOT, hostConfig.hostSubdir, 'skills', externalName, 'sections', fileName);
+  }
+  if (!DRY_RUN) fs.mkdirSync(path.dirname(outputPath), { recursive: true });
+  return { outputPath, content };
+}
+
 // ─── Main ───────────────────────────────────────────────────

 function findTemplates(): string[] {
@@ -833,6 +938,42 @@ for (const currentHost of hostsToRun) {
      }
    }

+    // ─── Section generation (v2 plan T9, Claude-first carve) ───
+    // On-demand sections/*.md for carved skills. Generated for CLAUDE ONLY:
+    // every other host inlines section content via the {{SECTION:id}} resolver
+    // (keeping the full monolith skill), so they need no section files and we
+    // sidestep host-portable section paths until that plumbing lands. No-op for
+    // any skill without a sections/ dir. Mirrors the SKILL.md DRY_RUN handling so
+    // sections participate in the freshness gate.
+    for (const sec of currentHost === 'claude' ? discoverSectionTemplates(ROOT) : []) {
+      if (currentHostConfig.generation.includeSkills?.length &&
+          !currentHostConfig.generation.includeSkills.includes(sec.skillDir)) continue;
+      if (currentHostConfig.generation.skipSkills?.length &&
+          currentHostConfig.generation.skipSkills.includes(sec.skillDir)) continue;
+
+      const { outputPath, content } = processSectionTemplate(path.join(ROOT, sec.tmpl), sec.skillDir, currentHost);
+      const relOutput = path.relative(ROOT, outputPath);
+
+      if (DRY_RUN) {
+        const existing = fs.existsSync(outputPath) ? fs.readFileSync(outputPath, 'utf-8') : '';
+        if (existing !== content) {
+          console.log(`STALE: ${relOutput}`);
+          hasChanges = true;
+        } else {
+          console.log(`FRESH: ${relOutput}`);
+        }
+      } else {
+        fs.writeFileSync(outputPath, content);
+        console.log(`GENERATED: ${relOutput}`);
+      }
+
+      tokenBudget.push({
+        skill: relOutput,
+        lines: content.split('\n').length,
+        tokens: Math.round(content.length / 4),
+      });
+    }
+
    // Generate gstack-lite and gstack-full for OpenClaw host
    if (currentHost === 'openclaw' && !DRY_RUN) {
      const openclawDir = path.join(ROOT, 'openclaw');
@@ -959,10 +1100,14 @@ The orchestrator will persist the plan link to its own memory/knowledge store.
  }
 }

-// --host all: report failures. Only exit(1) if claude failed.
+// --host all: any host failure fails the build. Previously only claude failures
+// exited nonzero, which let a stale or broken external-host output (e.g. a
+// section that failed to generate for Factory) slip through the freshness gate
+// silently. With sections fanned out across every host, "all hosts regenerated
+// in the same commit" is only a real gate if every host failure is fatal here.
 if (failures.length > 0 && HOST_ARG_VAL === 'all') {
  console.error(`\n${failures.length} host(s) failed: ${failures.map(f => f.host).join(', ')}`);
-  if (failures.some(f => f.host === 'claude')) process.exit(1);
+  process.exit(1);
 }
 // Single host dry-run failure already handled above