mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-01 07:41:36 +02:00
22f8c7f4e1
* docs(designs): add v2_PLAN.md — gstack v2 the lightest opinionated skill pack The approved plan from /plan-ceo-review → /plan-eng-review → /codex×2 → /plan-devex-review. Captures the v1.45/v2.0 hybrid release shape, cathedral parity-eval suite, sequential v1.45 execution, sections/*.md.tmpl pipeline, EVALS_BUDGET_HARD_CAP override path, and v2 launch copy specs. This commit just lands the design doc. Implementation follows in the rest of the v1.45.0.0 branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(parity): T0a — capture v1.44.1 baseline + capture helper + diff utility Cathedral parity-eval suite primitive. captureBaseline() walks every top-level SKILL.md and records bytes, lines, estimated tokens, frontmatter description length, and eval coverage. diffBaselines() reports per-skill delta + total corpus delta + catalog tokens delta. Locks the v1.44.1 reference snapshot at test/fixtures/parity-baseline-v1.44.1.json. After Phase A+B+C land, scripts/capture-baseline.ts --tag v1.45.0.0 produces a comparable snapshot; diff supplies the real numbers the v2 CHANGELOG quotes. Never invent baseline numbers; ship them only if they came from a real run. v1.44.1 numbers captured this commit: - 51 skills - 2,847 KB total corpus - ~9,319 catalog tokens (sum of description bytes / 4) - top 3: ship 160 KB, plan-ceo-review 128 KB, office-hours 108 KB Test plan: - bun test test/helpers/capture-parity-baseline.test.ts passes 4/4 - The baseline JSON file is committed so reviewers can audit v1→v2 numbers Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(resolvers): T2 — ResolverEntry + appliesTo gate infrastructure Adds the conditional-resolver-injection plumbing from the v2_PLAN A.1 step. Resolvers can now be either a bare ResolverFn (always fires, current behavior) or a ResolverEntry { resolve, appliesTo? } (gated; appliesTo returning false skips the resolver, substitutes empty string). Why infrastructure-only: the audit during T0a confirmed most resolvers don't need gating. The {{NAME}} placeholder system is already conditional at the template level — a resolver only fires for skills that reference it. The gate is for future use when a placeholder's audience needs a structural guardrail beyond social convention, or when a sub-resolver inside a larger composed resolver (e.g. preamble) needs per-skill skip. scripts/gen-skill-docs.ts:444 now uses unwrapResolver() to handle both shapes. RESOLVERS map signature widens from Record<string, ResolverFn> to Record<string, ResolverValue>. All existing resolvers stay bare functions and work unchanged. Test plan: - bun test test/resolver-entry.test.ts: 6 pass (gate plumbing + registry) - bun test test/gen-skill-docs.test.ts: 389 pass (no regression) - bun run gen:skill-docs --dry-run: all SKILL.md files FRESH (no diff) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(preamble): T3 — jargon dedup + terse-build flag (Phase A.2 + A.3) A.2 jargon dedup: generate-writing-style.ts replaces the inlined 80-term jargon list with a one-line pointer to scripts/jargon-list.json. The list was duplicated into every tier-2+ skill (48 of 51 skills); inlining cost was ~1.5 KB × 48 = ~70 KB across the corpus. Pointer cost is ~30 bytes per skill. Agents Read the JSON once per session on first jargon term encountered; thereafter the terms array is the canonical reference. A.3 terse build flag: --explain-level=terse compresses preamble prose at gen time. When the flag is set, writing-style collapses to a one-line terse directive and completeness-section + confusion-protocol + context-health are dropped entirely. The default build keeps the runtime-conditional behavior intact (sections still render; the model skips them when EXPLAIN_LEVEL: terse appears in the preamble echo). Terse build is opt-in for users who want shipped skills to match their runtime preference and avoid the per-session terse-mode dead prose. TemplateContext gains an optional `explainLevel: 'default' | 'terse'` field. Default builds set it to 'default'; --explain-level=terse sets 'terse'. Resolvers gate their output via `ctx?.explainLevel === 'terse'`. Measured impact (default build, post-T3): - Total corpus: 2,847 KB → 2,812 KB (saved 35 KB) - ship.md: 160 → 159 KB - plan-ceo-review.md: 128 → 127 KB - Top 10 heaviest: all slightly smaller from jargon pointer Larger compression lands in T4 (catalog trim) and T7 (atomic regen across the full Phase A pipeline). The terse build path further compresses to ~711K tokens vs default ~725K (saved ~14K tokens corpus-wide). Test plan: - bun test test/gen-skill-docs.test.ts: 389 pass (no regression) - bun test test/resolver-entry.test.ts: 6 pass - bun test test/helpers/capture-parity-baseline.test.ts: 4 pass - bun run gen:skill-docs --explain-level=terse: ship.md drops completeness + confusion-protocol + context-health sections; writing-style collapses to one-line terse directive 48 SKILL.md files updated (every tier-2+ skill picks up the jargon pointer). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalog): T4 — catalog trim + proactive-suggestions.json (Phase A.4) Shortens frontmatter `description:` in every Claude SKILL.md to a single lead sentence + (gstack) tag. The routing prose ("Use when asked to...", "Proactively suggest...") and voice triggers move to a "## When to invoke" body section so they remain discoverable inside the skill. A per-run registry at scripts/proactive-suggestions.json aggregates the routing/ voice text for all 52 skills so agents can pull guidance on demand without paying for it in the always-loaded catalog. Build flag --catalog-mode=full restores v1.44 legacy behavior (full multi-line descriptions in frontmatter). Default is trim. splitCatalogDescription() extracts: lead sentence, routing paragraphs, voice-triggers line, (gstack) tag presence. Short descriptions (<120 chars, already trimmed) are skipped via a guard so re-runs are idempotent. Measured impact (vs v1.44.1 baseline): - Catalog tokens (sum of description bytes / 4): 9,319 → 4,045 (-56.6%) - Total SKILL.md corpus bytes: 2,915 KB → 2,880 KB (-1.2%) - Routing prose preserved as in-skill "## When to invoke" sections - 52 skill entries in scripts/proactive-suggestions.json (on-demand registry) The corpus drop is small because catalog trim MOVES text from frontmatter to body, it doesn't delete it. The headline win is the catalog: the always-loaded system prompt surface drops by more than half. Test plan: - bun test test/gen-skill-docs.test.ts: 389 pass, 0 fail - Manual: ship/SKILL.md frontmatter description is now ONE line ending with `(gstack)`; allowed-tools field on next line (YAML well-formed) - Manual: scripts/proactive-suggestions.json contains 52 entries - bun run gen:skill-docs --catalog-mode=full restores legacy behavior 53 files changed (52 SKILL.md across hosts + the new proactive-suggestions.json). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(budget): T5 — hard token budgets + override audit trail (Phase A.6) Two new gate-tier guardrails for the v1.45.0.0 compression baseline: 1. test/skill-size-budget.test.ts (NEW) — per-skill SKILL.md size budget. Compares current state to test/fixtures/parity-baseline-v1.44.1.json. Three checks: per-skill (×1.05 default ratio), total corpus, and catalog token estimate (≤7000 for v1.45). The per-skill ratio is 1.05 not 1.0 because the T4 catalog trim moves text from frontmatter to a body section; small skills see a tiny body growth that's fine when offset by the much larger catalog-token win. 2. test/skill-budget-regression.test.ts EXTENDED — hard dollar cap on per-run eval cost. Per-tier defaults: gate $25, periodic $70. Umbrella EVALS_BUDGET_HARD_CAP=$30. Catches runaway eval costs (infinite retry, model price changes) before they amortize across PRs. Both checks support an override path with audit trail: GSTACK_SIZE_BUDGET_OVERRIDE_REASON="why this is OK" — size EVALS_BUDGET_OVERRIDE_REASON="why this is OK" — cost Overrides log to ~/.gstack/analytics/spend-overrides.jsonl with timestamp + scope + reason + CI provenance (runner, branch, commit) via test/helpers/budget-override.ts. Why the override audit: a hard cap with no escape valve becomes operationally hostile (legit price changes, longer transcripts, new required evals can all blow the cap). An override with no audit becomes "everyone overrides everything and the gate is theater." This module ships the audit half so reviewers can see what was waived and why. Codex 2nd-pass critique #3 absorbed: per-suite caps + override path with auditability + budget baselines checked into repo (parity-baseline-v1.44.1.json already in test/fixtures/). Test plan: - bun test test/skill-size-budget.test.ts: 4 pass (per-skill, corpus, catalog, baseline-exists) - bun test test/skill-budget-regression.test.ts: 4 pass (2 existing ratio checks + 2 new hard-cap checks) - Existing eval runs ($14.11 e2e, $0.02 llm-judge) sit well under the new caps Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(cso): T6 — pin must-preserve security phrases (Phase A.5) cso/SKILL.md is a content-heavy security audit skill (75 KB after T3+T4). Codex 2nd-pass critique #9: "cso exemption too broad ... should still get resolver dedup, catalog trim, sectioning if safe, and targeted evals around must-not-miss checks." T3 (jargon dedup) and T4 (catalog trim) already applied to cso the same way they applied to every other skill — confirmed by inspection: - jargon list NOT inlined (0 inline term lines) - catalog description trimmed to one line (74 bytes vs 774 bytes baseline) - "## When to invoke" body section present T6 work: lock in the security-prose preservation via a gate-tier test that fails CI if future compression strips load-bearing phrases: - OWASP, STRIDE positioning - daily / comprehensive mode discipline - confidence scoring language - active verification ("verif" prefix catches verify/verified/verification) - ## Preamble heading (preamble resolver still fires) Also guards cso against accidental over-stripping: SKILL.md must stay ≥30 KB (currently 75 KB) — a sudden cliff would mean compression went past the targeted-dedup line into structural removal. No structural change to cso. Future Phase B sections/ work for cso requires writing baseline parity tests FIRST per the v2_PLAN.md sequencing. Test plan: - bun test test/cso-preserved.test.ts: 5 pass Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(parity): T0b — cathedral parity-suite harness + invariant registry Adds the harness that the v2_PLAN.md cathedral parity-eval suite is built on. Compares CURRENT SKILL.md output to v1.44.1 baseline along three axes: STRUCTURE frontmatter shape (catalog trim landed, "## When to invoke" present) CONTENT must-preserve phrases per skill family (cso: OWASP/STRIDE; plan-ceo: SCOPE EXPANSION/HOLD SCOPE/REDUCTION; ship: VERSION/CHANGELOG/PR; etc.) SIZE per-skill byte budget (maxSizeRatio + minBytes guards) PARITY_INVARIANTS registry pins 10 load-bearing skills (cso, ship, plan-*- review, review, qa, investigate, office-hours, autoplan). Each entry declares what must NOT regress; future compression that strips these phrases or shrinks a skill past its minBytes cliff fails CI. Periodic-tier LLM-judge parity (paid, ~$0.20/skill) lands in v2.0.0.0 sections/ phase. Same registry, same harness, judge added on top. Test plan: - bun test test/parity-suite.test.ts: 10/10 invariants pass vs v1.44.1 - Per-skill failures get actionable per-line breakdown so a reviewer can see which phrase / heading / size limit went sideways Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(coverage): T1 — skill coverage matrix + structural-compliance floor Phase 0 deliverable — eval-first foundation. Two new test files plus the registry: 1. test/skill-coverage-matrix.ts — single source of truth mapping each skill to its gate-tier + periodic-tier test files. SKILL_COVERAGE record with 51 entries; every gstack skill on disk has at least one gate-tier entry. 2. test/skill-coverage-matrix.test.ts — CI gate. Asserts every skill on disk has a registry entry AND that gate[] is non-empty. Catches "skill added but eval not registered" the moment a new SKILL.md lands. 3. test/skill-coverage-floor.test.ts — per-skill structural compliance (FREE, file-IO only). For each of 51 skills, verifies: - SKILL.md exists - Frontmatter well-formed (name + description fields) - Catalog-trim contract (inline description ≤ 250 chars, or block form) - Generated header present (edit .tmpl, not .md) - Body ≥ 200 bytes (non-trivial content) - No unresolved {{TEMPLATE}} placeholders leaked The "floor" is the minimum eval that every skill ships with. Skills that need deeper behavioral testing get additional entries in their coverage record (e.g., ship has skill-e2e-ship-idempotency + workflow + floor). Future skills only need to add the floor entry and the matrix gate unblocks them. Codex 2nd-pass critique #1 mitigation: eval-first floor is structural compliance (the testable part) — judgment-skill behavior gets layered periodic-tier evals on top. We don't pretend the floor proves correctness, only that the skill structurally compiles. Test plan: - bun test test/skill-coverage-matrix.test.ts: 4 pass (matrix shape + coverage) - bun test test/skill-coverage-floor.test.ts: 309 pass (6 checks × 51 skills + 3 registry-level) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * build(skills): T7 — atomic regenerate + capture v1.45.0.0 baseline Final regen pass across all hosts after T1-T6 work landed. Captures the v1.45.0.0 parity baseline at test/fixtures/parity-baseline-v1.45.0.0.json for diffing against the v1.44.1 reference. Measured deltas (real numbers from test/helpers/capture-parity-baseline.ts): Total SKILL.md corpus 2,847 KB → 2,813 KB (-1.2%) Catalog tokens (always-loaded) ~9,319 → ~4,045 tokens (-56.6%) Top 10 heaviest skills 0.5-1.0% drop each The catalog token cut is the headline. It's the always-loaded surface, i.e. tokens charged on every session start. Per-skill SKILL.md sizes barely moved because T4 catalog trim MOVES routing prose from frontmatter to a body "## When to invoke" section rather than deleting it — the catalog wins without amputating discoverability. The bigger per-skill compression lands in v2.0.0.0 (Phase B sections/ pattern on the 5 heavyweights). v1.45 is the foundation: eval-first infrastructure + cheap wins. scripts/proactive-suggestions.json regenerated with the latest 52 skills listed (one-time write per gen-skill-docs run; aggregated catalog parts). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v1.45.0.0 — gstack v2 foundation: catalog tokens drop 56%, eval-first floor Bumps VERSION + package.json to 1.45.0.0. CHANGELOG entry covers what shipped between v1.44.1 and this release: the cathedral parity-eval foundation, conditional resolver injection plumbing, jargon dedup, terse build flag, catalog trim with one-line frontmatter descriptions, hard token + dollar budget gates with override audit, cso preservation pins, and the v1.44.1 ↔ v1.45.0.0 parity baselines committed to test/fixtures/. Numbers (measured, not estimated): - Catalog tokens: ~9,319 → ~4,045 (-56.6%) - Total corpus: 2,847 KB → 2,813 KB (-1.2%) - Skills with gate-tier eval coverage: 32/51 → 51/51 (floor achieved) This is the foundation release. v2.0.0.0 will ship the architectural break (sections/*.md.tmpl pattern + mechanical Read enforcement + eval-coverage annotations) as a coordinated marketing-grade launch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(catalog): refresh proactive-suggestions.json timestamp after v1.45 bump The generated_at field updates on every gen-skill-docs run; this is the T7 atomic-regenerate output landed alongside the v1.45.0.0 bump. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalog): deterministic proactive-suggestions.json (no per-run timestamp) Original implementation wrote a generated_at timestamp on every gen-skill-docs run. That made CI dry-run freshness checks flap because the file changed on every regeneration even when the actual content (skill descriptions, routing prose, voice triggers) was unchanged. Two fixes: 1. Drop the generated_at field. The file is purely a content registry now. 2. Only write the file when serialized content actually differs from disk. Reproducible test: bun run gen:skill-docs twice in a row now leaves scripts/proactive-suggestions.json unchanged on the second run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalog): preserve routing prose when first sentence exceeds 200 chars splitCatalogDescription truncated the lead BEFORE computing routing extraction, which meant skills whose first sentence was over 200 chars (design-consultation: 207 chars) had their entire routing prose silently dropped — the "## When to invoke" body section came out empty. Root cause: routing was extracted via `collapsed.indexOf(lead)` after lead was suffixed with "...". The "..." never appeared in the original string, so indexOf returned -1 and routingProse fell back to empty. Fix: compute routing from sentenceLead (the untruncated first sentence) BEFORE truncating the displayed lead. The displayed lead still gets "..." when over 200 chars, but the routing extraction uses the real boundary. Also: refresh golden snapshots for claude/codex/factory ship and update two unit tests that asserted v1.44 behavior: - skill-validation.test.ts: trigger-phrase + proactive-routing tests now search whole content, not just frontmatter (T4 moved them to a body "## When to invoke" section) - writing-style-resolver.test.ts: jargon-list assertion now expects the T3 reference pointer, not the inline list Test plan: - bun test test/skill-validation.test.ts test/writing-style-resolver.test.ts test/host-config.test.ts test/skill-size-budget.test.ts test/parity-suite.test.ts test/skill-coverage-matrix.test.ts test/skill-coverage-floor.test.ts test/cso-preserved.test.ts test/resolver-entry.test.ts test/helpers/capture-parity-baseline.test.ts test/gen-skill-docs.test.ts: 1134 pass, 0 fail - Manual verify: design-consultation/SKILL.md "## When to invoke this skill" body section now contains "Use when asked to..." + "Proactively suggest..." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalog): deterministic proactive-suggestions.json across machines CI check-freshness failed because scripts/proactive-suggestions.json serialized differently on local vs CI: 1. Root-skill key leaked the directory name. processTemplate's outer loop computed `dir = path.basename(path.dirname(tmplPath))`. For the root SKILL.md.tmpl at ROOT/SKILL.md.tmpl, that returns the repo-checkout directory name — "seville-v3" in a Conductor worktree, "gstack" on GitHub Actions, anything-else for a fork. Fix: detect root via `path.dirname(tmplPath) === ROOT` and hardcode the key to "gstack" for that one case. 2. Aggregate key order was filesystem-iteration order. discoverTemplates doesn't guarantee stable ordering across platforms, so the JSON `skills` object came out shuffled between machines. Fix: sort Object.keys(proactiveAggregate) alphabetically before serializing. After the fix, the generated file is identical on every machine and matches what's committed. CI freshness check (bun run gen:skill-docs && git diff --exit-code) now passes. Test plan: - bun run gen:skill-docs && bun run gen:skill-docs --dry-run: all FRESH - node -e 'verify keys sorted': sorted match: true - grep -c '"seville-v3"' scripts/proactive-suggestions.json: 0 - Focused test suite: 704 pass, 0 fail Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(catalog): unit + regression coverage for catalog-trim helpers Four exported functions in scripts/gen-skill-docs.ts handle every skill's frontmatter rewrite at gen time but had zero unit tests. Both real bugs we shipped (and fixed) on this branch lived in these functions: v1.45.0.0 design-consultation: when the first sentence exceeded 200 chars, routing-prose extraction lost the entire tail (anchored on truncated lead with "..." that didn't substring-match the original). v1.45.0.0 CI freshness: root-skill key leaked the checkout directory name ("seville-v3" vs "gstack") and aggregate order was filesystem- iteration order. Both shapes are now regression-tested: - splitCatalogDescription: 7 tests covering simple multi-line, >200-char first sentence (design-consultation regression), voice-trigger extraction, no-(gstack) handling, embedded periods (documents known fallback), no-period fragments, and idempotency. - buildTrimmedDescription: 3 tests. - buildWhenToInvokeSection: 3 tests. - applyCatalogTrim: 4 tests covering the standard rewrite, no-op for already-short descriptions, the YAML-collision newline fix, and the malformed-frontmatter null return. - proactive-suggestions.json determinism: 3 tests asserting sorted keys, root keyed as "gstack" (not the worktree directory), and no timestamp/generated_at field that would flap CI freshness. Test plan: - bun test test/catalog-trim.test.ts: 20 pass, 0 fail Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(coverage): fill three remaining v1.46.0.0 test gaps Three untested surfaces from the v1.46.0.0 work. All three would have caught real bugs we shipped (and fixed) on this branch. 1. test/helpers/budget-override.test.ts — 7 tests pin the audit-trail contract for EVALS_BUDGET_OVERRIDE_REASON and GSTACK_SIZE_BUDGET_OVERRIDE_REASON. Without this, the audit logger could silently drop events and overrides become invisible. Tests cover: required fields per JSONL line, CI provenance capture (CI/GITHUB_ACTIONS/branch/commit), local-runner defaults, append-only behavior, missing-directory recovery, and unwritable- path resilience (logs warning instead of throwing). 2. test/terse-build.test.ts — 16 tests pin --explain-level=terse behavior across the 4 gated resolvers and the composed preamble. Default vs terse vs undefined-ctx all asserted. Without this, a refactor that breaks the explainLevel threading silently regresses the opt-in compression path; the runtime EXPLAIN_LEVEL: terse gate still works so users wouldn't notice. Tier-1 invariant pinned (terse-only-affects-tier-2+). 3. test/gen-skill-docs-idempotency.test.ts — 2 tests catch the class of bug behind the v1.45.0.0 timestamp flap. Two consecutive gen-skill-docs runs must produce byte-identical outputs across STABLE_OUTPUTS (proactive-suggestions.json, SKILL.md, ship/SKILL.md, plan-ceo-review/SKILL.md, office-hours/SKILL.md, gstack/llms.txt). --dry-run reports zero stale files after a fresh gen. CI freshness regressions surface as test failures BEFORE a PR is opened. Test plan: - bun test test/helpers/budget-override.test.ts: 7 pass - bun test test/terse-build.test.ts: 16 pass - bun test test/gen-skill-docs-idempotency.test.ts: 2 pass - Full focused suite (15 test files): 1179 pass, 0 fail (+45 new tests vs the pre-fill baseline of 1134) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(coverage): close 5 remaining v1.46.0.0 test gaps (A-E) Five behaviors that v1.46 ships but had no test coverage. All now pinned. A) --host all idempotency (test/gen-skill-docs-idempotency.test.ts) The default test ran Claude host only. Non-Claude hosts (Codex, Factory, Cursor, OpenClaw, GBrain, Slate, OpenCode, Hermes, Kiro) each have their own output paths and could carry their own non-deterministic fields. We hit a "--host all needed for freshness check" mid-/ship. Now: two consecutive `bun run gen:skill-docs --host all` runs must produce byte-identical outputs across a per-host sample (.agents/, .cursor/, .factory/, .gbrain/). Catches per-host adapter regressions before CI. B) --catalog-mode=full opt-out (test/catalog-mode-full.test.ts) The legacy escape hatch had zero tests. 6 new tests across two layers: static (CATALOG_MODE_ARG parsed; conditional gate present; default is "trim"; invalid value throws) + smoke (actual --catalog-mode=full run produces a multi-line `description: |` block + omits "## When to invoke" body section; mutates the working tree then restores in a finally block). C) parity-baseline-v1.44.1.json integrity (test/parity-baseline-integrity.test.ts) The baseline is the source of every v1→v2 number cited in the CHANGELOG v1.46.0.0 entry. Anyone could edit it without test failure until now. 8 new tests pin: existence, tag, capturedFromCommit allowlist, expected v1.44 numbers (51 skills, ~2,915 KB, ~9,319 catalog tokens), CHANGELOG references this file by path, per-skill shape, and a SHA256 byte-stability hash. Any edit fails with a clear "if intentional, update EXPECTED_HASH AND the CHANGELOG numbers" signal. D) Live appliesTo gate end-to-end (test/resolver-entry.test.ts extended) The unwrapResolver unit tests covered the function; the gen-skill-docs.ts substitution loop that USES the gate had no integration coverage. 6 new tests simulate the exact 4-line shape from gen-skill-docs.ts:457-467 against synthetic registries: plain-function fires unconditionally, gated fires when true / empty-string when false, mixed registries compose, parameterized resolvers respect gates, unknown resolvers throw. E) Per-skill min-size floor (test/skill-size-budget.test.ts extended) The existing 200-byte body coverage-floor is a noise floor — a skill that lost 99.75% of content still passes. 1 new test asserts every skill stays ≥80% of its v1.44.1 baseline size (the parity-suite content invariants only covered 10 of 51 skills; the remaining 41 were uncovered). SECTIONS_EXTRACTED hook in place for v2.0.0.0 when the sections/ pattern legitimately shrinks ship/plan-ceo/etc. past the floor. Test plan: - bun test focused 17-file suite: 1202 pass, 0 fail (+23 new tests vs the pre-fill 1179 baseline) - catalog-mode=full mutates working tree then restores cleanly - --host all idempotency runs two full gen passes in <1s on this machine Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
957 lines
42 KiB
TypeScript
957 lines
42 KiB
TypeScript
#!/usr/bin/env bun
|
|
/**
|
|
* Generate SKILL.md files from .tmpl templates.
|
|
*
|
|
* Pipeline:
|
|
* read .tmpl → find {{PLACEHOLDERS}} → resolve from source → format → write .md
|
|
*
|
|
* Supports --dry-run: generate to memory, exit 1 if different from committed file.
|
|
* Used by skill:check and CI freshness checks.
|
|
*/
|
|
|
|
import { COMMAND_DESCRIPTIONS } from '../browse/src/commands';
|
|
import { SNAPSHOT_FLAGS } from '../browse/src/snapshot';
|
|
import { discoverTemplates } from './discover-skills';
|
|
import { writeLlmsTxt } from './gen-llms-txt';
|
|
import * as fs from 'fs';
|
|
import * as path from 'path';
|
|
import type { Host, TemplateContext } from './resolvers/types';
|
|
import { HOST_PATHS, unwrapResolver } from './resolvers/types';
|
|
import { RESOLVERS } from './resolvers/index';
|
|
import { externalSkillName, extractHookSafetyProse as _extractHookSafetyProse, extractNameAndDescription as _extractNameAndDescription, condenseOpenAIShortDescription as _condenseOpenAIShortDescription, generateOpenAIYaml as _generateOpenAIYaml } from './resolvers/codex-helpers';
|
|
import { generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec } from './resolvers/review';
|
|
import { ALL_HOST_CONFIGS, ALL_HOST_NAMES, resolveHostArg, getHostConfig } from '../hosts/index';
|
|
import type { HostConfig } from './host-config';
|
|
|
|
const ROOT = path.resolve(import.meta.dir, '..');
|
|
const DRY_RUN = process.argv.includes('--dry-run');
|
|
|
|
// ─── Host Detection (config-driven) ─────────────────────────
|
|
|
|
const HOST_ARG = process.argv.find(a => a.startsWith('--host'));
|
|
type HostArg = Host | 'all';
|
|
const HOST_ARG_VAL: HostArg = (() => {
|
|
if (!HOST_ARG) return 'claude';
|
|
const val = HOST_ARG.includes('=') ? HOST_ARG.split('=')[1] : process.argv[process.argv.indexOf(HOST_ARG) + 1];
|
|
if (val === 'all') return 'all';
|
|
try {
|
|
return resolveHostArg(val) as Host;
|
|
} catch {
|
|
throw new Error(`Unknown host: ${val}. Use ${ALL_HOST_NAMES.join(', ')}, or all.`);
|
|
}
|
|
})();
|
|
|
|
// For single-host mode, HOST is the host. For --host all, it's set per iteration below.
|
|
let HOST: Host = HOST_ARG_VAL === 'all' ? 'claude' : HOST_ARG_VAL;
|
|
|
|
// ─── Model Overlay Selection ────────────────────────────────
|
|
// --model is explicit. We do NOT auto-detect from host (host ≠ model).
|
|
// Default is 'claude'. Missing overlay file → empty string (graceful).
|
|
import { ALL_MODEL_NAMES, resolveModel, type Model } from './models';
|
|
const MODEL_ARG = process.argv.find(a => a.startsWith('--model'));
|
|
const MODEL_ARG_VAL: Model = (() => {
|
|
if (!MODEL_ARG) return 'claude';
|
|
const val = MODEL_ARG.includes('=') ? MODEL_ARG.split('=')[1] : process.argv[process.argv.indexOf(MODEL_ARG) + 1];
|
|
const resolved = resolveModel(val);
|
|
if (!resolved) {
|
|
throw new Error(`Unknown model: ${val}. Use ${ALL_MODEL_NAMES.join(', ')}, or a family variant (e.g., claude-opus-4-7, gpt-5.4-mini, o3).`);
|
|
}
|
|
return resolved;
|
|
})();
|
|
|
|
// ─── Catalog Mode (v1.45.0.0 T4) ────────────────────────────
|
|
// 'trim' (default): shorten frontmatter description to lead sentence,
|
|
// move routing/voice prose into a "## When to invoke" body section, and
|
|
// emit scripts/proactive-suggestions.json (single file across all skills).
|
|
// 'full': legacy v1.44 behavior — full description stays in frontmatter.
|
|
const CATALOG_MODE_ARG = process.argv.find(a => a.startsWith('--catalog-mode'));
|
|
const CATALOG_MODE: 'trim' | 'full' = (() => {
|
|
if (!CATALOG_MODE_ARG) return 'trim';
|
|
const val = CATALOG_MODE_ARG.includes('=')
|
|
? CATALOG_MODE_ARG.split('=')[1]
|
|
: process.argv[process.argv.indexOf(CATALOG_MODE_ARG) + 1];
|
|
if (val !== 'trim' && val !== 'full') {
|
|
throw new Error(`Unknown catalog mode: ${val}. Use 'trim' (default) or 'full'.`);
|
|
}
|
|
return val;
|
|
})();
|
|
|
|
// ─── Explain-level Overlay ──────────────────────────────────
|
|
// --explain-level=terse compresses preamble prose (writing-style, completeness,
|
|
// confusion-protocol, context-health) to a single pointer line at gen time.
|
|
// Default keeps the runtime-conditional behavior (sections render unconditionally,
|
|
// the model skips them when EXPLAIN_LEVEL: terse appears in the preamble echo).
|
|
// Opt-in via the build flag so most users get the runtime-flexible default.
|
|
const EXPLAIN_LEVEL_ARG = process.argv.find(a => a.startsWith('--explain-level'));
|
|
const EXPLAIN_LEVEL: 'default' | 'terse' = (() => {
|
|
if (!EXPLAIN_LEVEL_ARG) return 'default';
|
|
const val = EXPLAIN_LEVEL_ARG.includes('=')
|
|
? EXPLAIN_LEVEL_ARG.split('=')[1]
|
|
: process.argv[process.argv.indexOf(EXPLAIN_LEVEL_ARG) + 1];
|
|
if (val !== 'default' && val !== 'terse') {
|
|
throw new Error(`Unknown explain level: ${val}. Use 'default' or 'terse'.`);
|
|
}
|
|
return val;
|
|
})();
|
|
|
|
// HostPaths, HOST_PATHS, and TemplateContext imported from ./resolvers/types (line 7-8)
|
|
// Design constants (AI_SLOP_BLACKLIST, OPENAI_HARD_REJECTIONS, OPENAI_LITMUS_CHECKS)
|
|
// live in ./resolvers/constants and are consumed by resolvers directly.
|
|
|
|
// ─── External Host Helpers ───────────────────────────────────
|
|
|
|
// Re-export local copy for use in this file (matches codex-helpers.ts)
|
|
// Accepts optional frontmatter name to support directory/invocation name divergence
|
|
function externalSkillName(skillDir: string, frontmatterName?: string): string {
|
|
// Root skill (skillDir === '' or '.') always maps to 'gstack' regardless of frontmatter
|
|
if (skillDir === '.' || skillDir === '') return 'gstack';
|
|
// Use frontmatter name when it differs from directory name (e.g., run-tests/ with name: test)
|
|
const baseName = frontmatterName && frontmatterName !== skillDir ? frontmatterName : skillDir;
|
|
// Don't double-prefix: gstack-upgrade → gstack-upgrade (not gstack-gstack-upgrade)
|
|
if (baseName.startsWith('gstack-')) return baseName;
|
|
return `gstack-${baseName}`;
|
|
}
|
|
|
|
function extractNameAndDescription(content: string): { name: string; description: string } {
|
|
const fmStart = content.indexOf('---\n');
|
|
if (fmStart !== 0) return { name: '', description: '' };
|
|
const fmEnd = content.indexOf('\n---', fmStart + 4);
|
|
if (fmEnd === -1) return { name: '', description: '' };
|
|
|
|
const frontmatter = content.slice(fmStart + 4, fmEnd);
|
|
const nameMatch = frontmatter.match(/^name:\s*(.+)$/m);
|
|
const name = nameMatch ? nameMatch[1].trim() : '';
|
|
|
|
let description = '';
|
|
const lines = frontmatter.split('\n');
|
|
let inDescription = false;
|
|
const descLines: string[] = [];
|
|
for (const line of lines) {
|
|
if (line.match(/^description:\s*\|?\s*$/)) {
|
|
inDescription = true;
|
|
continue;
|
|
}
|
|
if (line.match(/^description:\s*\S/)) {
|
|
description = line.replace(/^description:\s*/, '').trim();
|
|
break;
|
|
}
|
|
if (inDescription) {
|
|
if (line === '' || line.match(/^\s/)) {
|
|
descLines.push(line.replace(/^ /, ''));
|
|
} else {
|
|
break;
|
|
}
|
|
}
|
|
}
|
|
if (descLines.length > 0) {
|
|
description = descLines.join('\n').trim();
|
|
}
|
|
|
|
return { name, description };
|
|
}
|
|
|
|
// ─── Voice Trigger Processing ────────────────────────────────
|
|
|
|
/**
|
|
* Extract voice-triggers YAML list from frontmatter.
|
|
* Returns an array of trigger strings, or [] if no voice-triggers field.
|
|
*/
|
|
function extractVoiceTriggers(content: string): string[] {
|
|
const fmStart = content.indexOf('---\n');
|
|
if (fmStart !== 0) return [];
|
|
const fmEnd = content.indexOf('\n---', fmStart + 4);
|
|
if (fmEnd === -1) return [];
|
|
const frontmatter = content.slice(fmStart + 4, fmEnd);
|
|
|
|
const triggers: string[] = [];
|
|
let inVoice = false;
|
|
for (const line of frontmatter.split('\n')) {
|
|
if (/^voice-triggers:/.test(line)) { inVoice = true; continue; }
|
|
if (inVoice) {
|
|
const m = line.match(/^\s+-\s+"(.+)"$/);
|
|
if (m) triggers.push(m[1]);
|
|
else if (!/^\s/.test(line)) break;
|
|
}
|
|
}
|
|
return triggers;
|
|
}
|
|
|
|
/**
|
|
* Preprocess voice triggers: fold voice-triggers YAML field into description,
|
|
* then strip the field from frontmatter. Must run BEFORE transformFrontmatter
|
|
* and extractNameAndDescription so all hosts see the updated description.
|
|
*/
|
|
function processVoiceTriggers(content: string): string {
|
|
const triggers = extractVoiceTriggers(content);
|
|
if (triggers.length === 0) return content;
|
|
|
|
// Strip voice-triggers block from frontmatter
|
|
content = content.replace(/^voice-triggers:\n(?:\s+-\s+"[^"]*"\n?)*/m, '');
|
|
|
|
// Get current description (after stripping voice-triggers, so it's clean)
|
|
const { description } = extractNameAndDescription(content);
|
|
if (!description) return content;
|
|
|
|
// Build new description with voice triggers appended
|
|
const voiceLine = `Voice triggers (speech-to-text aliases): ${triggers.map(t => `"${t}"`).join(', ')}.`;
|
|
const newDescription = description + '\n' + voiceLine;
|
|
|
|
// Replace old indented description with new in frontmatter
|
|
const oldIndented = description.split('\n').map(l => ` ${l}`).join('\n');
|
|
const newIndented = newDescription.split('\n').map(l => ` ${l}`).join('\n');
|
|
content = content.replace(oldIndented, newIndented);
|
|
|
|
return content;
|
|
}
|
|
|
|
// Export for testing
|
|
export { extractVoiceTriggers, processVoiceTriggers };
|
|
|
|
// ─── Catalog Trim (v1.45.0.0 T4) ─────────────────────────────
|
|
//
|
|
// Frontmatter `description:` blocks today pack: a one-line outcome, "Use when
|
|
// asked to..." voice triggers, "Proactively..." routing guidance, and a
|
|
// "(gstack)" tag. This pile is the always-loaded catalog surface — every
|
|
// session pays for the full text. The catalog trim splits the description
|
|
// into a one-line catalog entry (lead sentence + "(gstack)") that stays in
|
|
// the frontmatter, and a "## When to invoke" body section that holds the
|
|
// routing/voice triggers prose for in-skill discovery. A registry written
|
|
// to scripts/proactive-suggestions.json (one entry per skill) makes routing
|
|
// available to agents that need it without paying the always-loaded cost.
|
|
//
|
|
// Opt-out: `--catalog-mode=full` keeps v1.44 behavior (no trim, full
|
|
// description in frontmatter). Use when debugging routing regressions or
|
|
// when shipping skills to hosts that depend on the legacy fat catalog.
|
|
|
|
export interface CatalogParts {
|
|
lead: string; // First sentence — kept in catalog
|
|
routingProse: string; // "Use when asked to...", "Proactively..." paragraphs
|
|
voiceLine: string | null; // "Voice triggers (speech-to-text aliases): ..." line if present
|
|
hasGstackTag: boolean;
|
|
}
|
|
|
|
export function splitCatalogDescription(description: string): CatalogParts {
|
|
// Voice triggers line (folded in by processVoiceTriggers earlier)
|
|
const voiceMatch = description.match(/Voice triggers \(speech-to-text aliases\):[^\n]+/);
|
|
const voiceLine = voiceMatch ? voiceMatch[0] : null;
|
|
let working = voiceLine ? description.replace(voiceLine, '').trim() : description.trim();
|
|
|
|
const hasGstackTag = /\(gstack\)/.test(working);
|
|
if (hasGstackTag) working = working.replace(/\(gstack\)/, '').trim();
|
|
|
|
// Lead = first sentence (up to first period followed by space or end of string).
|
|
// We tolerate sentences with embedded periods (URLs, "v1.45.0.0") by requiring
|
|
// the period to be followed by whitespace OR end-of-text.
|
|
// First normalize to single-line for sentence detection, then back out.
|
|
const collapsed = working.replace(/\s+/g, ' ').trim();
|
|
const sentenceMatch = collapsed.match(/^([^.!?]*[.!?])(?:\s|$)/);
|
|
// sentenceLead is the FULL first sentence (no truncation). We compute routing
|
|
// from this position, then optionally truncate the displayed lead afterwards.
|
|
// Truncating first then computing routing was the v1.45.0.0 bug — when the
|
|
// first sentence exceeded 200 chars, the routing extraction would lose the
|
|
// entire tail of the description (design-consultation's "Use when..."
|
|
// routing prose silently dropped).
|
|
const sentenceLead = sentenceMatch ? sentenceMatch[1].trim() : collapsed.split(/\s/).slice(0, 20).join(' ');
|
|
|
|
// Routing prose: everything AFTER the first sentence boundary in the collapsed view.
|
|
const leadInCollapsed = collapsed.indexOf(sentenceLead);
|
|
const routingCollapsed = leadInCollapsed >= 0
|
|
? collapsed.slice(leadInCollapsed + sentenceLead.length).trim()
|
|
: '';
|
|
|
|
// Now produce the displayed lead — truncated if too long. The original
|
|
// sentenceLead is preserved for routing extraction below.
|
|
let lead = sentenceLead;
|
|
if (lead.length > 200) {
|
|
const trunc = lead.slice(0, 197);
|
|
const lastSpace = trunc.lastIndexOf(' ');
|
|
lead = (lastSpace > 60 ? trunc.slice(0, lastSpace) : trunc) + '...';
|
|
}
|
|
// Restore line breaks for routing prose by mapping back to original layout.
|
|
// Use original whitespace structure where possible; fall back to collapsed.
|
|
// Anchor recovery on sentenceLead (the untruncated first sentence) — not
|
|
// `lead` (which may have a "..." suffix and won't substring-match `working`).
|
|
let routingProse = routingCollapsed;
|
|
const collapsedLeadIdx = working.replace(/\s+/g, ' ').indexOf(sentenceLead);
|
|
if (collapsedLeadIdx >= 0) {
|
|
let consumed = 0;
|
|
let cut = 0;
|
|
for (let i = 0; i < working.length && consumed < collapsedLeadIdx + sentenceLead.length; i++) {
|
|
if (/\s/.test(working[i])) {
|
|
if (i === 0 || /\s/.test(working[i - 1])) continue;
|
|
consumed += 1;
|
|
} else {
|
|
consumed += 1;
|
|
}
|
|
cut = i + 1;
|
|
}
|
|
const tail = working.slice(cut).trim();
|
|
if (tail.length > 0) routingProse = tail;
|
|
}
|
|
|
|
return { lead, routingProse, voiceLine, hasGstackTag };
|
|
}
|
|
|
|
/** Build the catalog-trimmed `description:` block. */
|
|
export function buildTrimmedDescription(parts: CatalogParts): string {
|
|
const lead = parts.lead.trim();
|
|
const suffix = parts.hasGstackTag ? ' (gstack)' : '';
|
|
return `${lead}${suffix}`;
|
|
}
|
|
|
|
/** Build the body section that holds the routing/voice prose. */
|
|
export function buildWhenToInvokeSection(parts: CatalogParts): string {
|
|
const lines: string[] = ['## When to invoke this skill', ''];
|
|
if (parts.routingProse) {
|
|
lines.push(parts.routingProse);
|
|
lines.push('');
|
|
}
|
|
if (parts.voiceLine) {
|
|
lines.push(parts.voiceLine);
|
|
lines.push('');
|
|
}
|
|
return lines.join('\n');
|
|
}
|
|
|
|
/**
|
|
* Apply catalog trim to a SKILL.md body:
|
|
* - shorten frontmatter `description:` to lead + (gstack)
|
|
* - insert "## When to invoke" body section AFTER the generated header
|
|
* (so it lands near the top of body content, where routing guidance
|
|
* belongs)
|
|
*
|
|
* Returns the rewritten content plus the parts (used for proactive-suggestions
|
|
* JSON aggregation at the end of the run).
|
|
*/
|
|
export function applyCatalogTrim(content: string, skillName: string): { content: string; parts: CatalogParts } | null {
|
|
// Locate description block in frontmatter
|
|
if (!content.startsWith('---\n')) return null;
|
|
const fmEnd = content.indexOf('\n---', 4);
|
|
if (fmEnd === -1) return null;
|
|
const frontmatter = content.slice(4, fmEnd);
|
|
|
|
// Match `description: |` block + indented body lines
|
|
const descMatch = frontmatter.match(/^description:\s*\|?\s*\n((?:\s{2,}.*(?:\n|$))+)/m)
|
|
|| frontmatter.match(/^description:\s+(.+)$/m);
|
|
if (!descMatch) return null;
|
|
|
|
// Extract full description text
|
|
let descText: string;
|
|
if (descMatch[0].startsWith('description: |') || /^description:\s*\|/.test(descMatch[0])) {
|
|
descText = descMatch[1].split('\n').map(l => l.replace(/^\s{2}/, '')).join('\n').trim();
|
|
} else {
|
|
descText = descMatch[1].trim();
|
|
}
|
|
|
|
// Skip skills with very short descriptions (already trimmed or no routing prose).
|
|
// Below ~120 chars, splitting adds no value.
|
|
if (descText.length < 120) return null;
|
|
|
|
const parts = splitCatalogDescription(descText);
|
|
// If lead + (gstack) is already most of the text, no trim needed.
|
|
const trimmedLen = buildTrimmedDescription(parts).length;
|
|
if (trimmedLen >= descText.length - 20) return null;
|
|
|
|
// Replace description in frontmatter — keep trailing newline so the next
|
|
// YAML field doesn't collide on the same line as the description value.
|
|
const newDesc = buildTrimmedDescription(parts);
|
|
const newFrontmatter = frontmatter.replace(descMatch[0], `description: ${newDesc}\n`);
|
|
let newContent = '---\n' + newFrontmatter + content.slice(fmEnd);
|
|
|
|
// Insert body section after frontmatter (after the closing ---\n and any
|
|
// existing GENERATED header). We insert before the first non-comment line.
|
|
const bodyStart = newContent.indexOf('\n---\n') + 5;
|
|
const whenToInvoke = '\n' + buildWhenToInvokeSection(parts).trim() + '\n';
|
|
// Skip past the generated header if present (it lives after frontmatter close)
|
|
const headerMatch = newContent.slice(bodyStart).match(/^(<!--[^>]*-->\s*\n)+/);
|
|
const insertAt = bodyStart + (headerMatch ? headerMatch[0].length : 0);
|
|
newContent = newContent.slice(0, insertAt) + whenToInvoke + '\n' + newContent.slice(insertAt);
|
|
|
|
return { content: newContent, parts };
|
|
}
|
|
|
|
const OPENAI_SHORT_DESCRIPTION_LIMIT = 120;
|
|
|
|
function condenseOpenAIShortDescription(description: string): string {
|
|
const firstParagraph = description.split(/\n\s*\n/)[0] || description;
|
|
const collapsed = firstParagraph.replace(/\s+/g, ' ').trim();
|
|
if (collapsed.length <= OPENAI_SHORT_DESCRIPTION_LIMIT) return collapsed;
|
|
|
|
const truncated = collapsed.slice(0, OPENAI_SHORT_DESCRIPTION_LIMIT - 3);
|
|
const lastSpace = truncated.lastIndexOf(' ');
|
|
const safe = lastSpace > 40 ? truncated.slice(0, lastSpace) : truncated;
|
|
return `${safe}...`;
|
|
}
|
|
|
|
function generateOpenAIYaml(displayName: string, shortDescription: string): string {
|
|
return `interface:
|
|
display_name: ${JSON.stringify(displayName)}
|
|
short_description: ${JSON.stringify(shortDescription)}
|
|
default_prompt: ${JSON.stringify(`Use ${displayName} for this task.`)}
|
|
policy:
|
|
allow_implicit_invocation: true
|
|
`;
|
|
}
|
|
|
|
/**
|
|
* Transform frontmatter for external hosts.
|
|
* Claude: strips `sensitive:` field (only Factory uses it).
|
|
* Codex: keeps name + description only, enforces 1024-char limit.
|
|
* Factory: keeps name + description + user-invocable, conditionally adds disable-model-invocation.
|
|
*/
|
|
function transformFrontmatter(content: string, host: Host): string {
|
|
const hostConfig = getHostConfig(host);
|
|
const fm = hostConfig.frontmatter;
|
|
|
|
if (fm.mode === 'denylist') {
|
|
// Denylist mode: strip listed fields, keep everything else
|
|
for (const field of fm.stripFields || []) {
|
|
if (field === 'voice-triggers') {
|
|
content = content.replace(/^voice-triggers:\n(?:\s+-\s+"[^"]*"\n?)*/m, '');
|
|
} else {
|
|
content = content.replace(new RegExp(`^${field}:\\s*.*\\n`, 'm'), '');
|
|
}
|
|
}
|
|
return content;
|
|
}
|
|
|
|
// Allowlist mode: reconstruct frontmatter with only allowed fields
|
|
const fmStart = content.indexOf('---\n');
|
|
if (fmStart !== 0) return content;
|
|
const fmEnd = content.indexOf('\n---', fmStart + 4);
|
|
if (fmEnd === -1) return content;
|
|
const frontmatter = content.slice(fmStart + 4, fmEnd);
|
|
const body = content.slice(fmEnd + 4);
|
|
const { name, description } = extractNameAndDescription(content);
|
|
|
|
// Description limit enforcement
|
|
if (fm.descriptionLimit) {
|
|
const behavior = fm.descriptionLimitBehavior || 'error';
|
|
if (description.length > fm.descriptionLimit) {
|
|
if (behavior === 'error') {
|
|
throw new Error(
|
|
`${hostConfig.displayName} description for "${name}" is ${description.length} chars (max ${fm.descriptionLimit}). ` +
|
|
`Compress the description in the .tmpl file.`
|
|
);
|
|
} else if (behavior === 'warn') {
|
|
console.warn(`WARNING: ${hostConfig.displayName} description for "${name}" exceeds ${fm.descriptionLimit} chars`);
|
|
}
|
|
// 'truncate' — silently proceed
|
|
}
|
|
}
|
|
|
|
// Build frontmatter with allowed fields
|
|
const indentedDesc = description.split('\n').map(l => ` ${l}`).join('\n');
|
|
let newFm = `---\nname: ${name}\ndescription: |\n${indentedDesc}\n`;
|
|
|
|
// Add extra fields (host-wide)
|
|
if (fm.extraFields) {
|
|
for (const [key, value] of Object.entries(fm.extraFields)) {
|
|
if (key !== 'name' && key !== 'description') {
|
|
newFm += `${key}: ${value}\n`;
|
|
}
|
|
}
|
|
}
|
|
|
|
// Add conditional fields
|
|
if (fm.conditionalFields) {
|
|
for (const rule of fm.conditionalFields) {
|
|
const match = Object.entries(rule.if).every(([k, v]) =>
|
|
new RegExp(`^${k}:\\s*${v}`, 'm').test(frontmatter)
|
|
);
|
|
if (match) {
|
|
for (const [key, value] of Object.entries(rule.add)) {
|
|
newFm += `${key}: ${value}\n`;
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
// Preserve additional keepFields beyond name and description
|
|
if (fm.keepFields) {
|
|
for (const field of fm.keepFields) {
|
|
if (field === 'name' || field === 'description') continue;
|
|
// Match YAML field with possible multi-line/array value (indented lines after colon)
|
|
const fieldMatch = frontmatter.match(new RegExp(`^${field}:(.*(?:\\n(?:[ \\t]+.+))*)`, 'm'));
|
|
if (fieldMatch) {
|
|
newFm += `${field}:${fieldMatch[1]}\n`;
|
|
}
|
|
}
|
|
}
|
|
|
|
// Rename fields (copy values from template frontmatter with new keys)
|
|
if (fm.renameFields) {
|
|
for (const [oldName, newName] of Object.entries(fm.renameFields)) {
|
|
const fieldMatch = frontmatter.match(new RegExp(`^${oldName}:(.+(?:\\n(?:\\s+.+)*)?)`, 'm'));
|
|
if (fieldMatch) {
|
|
newFm += `${newName}:${fieldMatch[1]}\n`;
|
|
}
|
|
}
|
|
}
|
|
|
|
newFm += '---';
|
|
return newFm + body;
|
|
}
|
|
|
|
/**
|
|
* Extract hook descriptions from frontmatter for inline safety prose.
|
|
* Returns a description of what the hooks do, or null if no hooks.
|
|
*/
|
|
function extractHookSafetyProse(tmplContent: string): string | null {
|
|
if (!tmplContent.match(/^hooks:/m)) return null;
|
|
|
|
// Parse the hook matchers to build a human-readable safety description
|
|
const matchers: string[] = [];
|
|
const matcherRegex = /matcher:\s*"(\w+)"/g;
|
|
let m;
|
|
while ((m = matcherRegex.exec(tmplContent)) !== null) {
|
|
if (!matchers.includes(m[1])) matchers.push(m[1]);
|
|
}
|
|
|
|
if (matchers.length === 0) return null;
|
|
|
|
// Build safety prose based on what tools are hooked
|
|
const toolDescriptions: Record<string, string> = {
|
|
Bash: 'check bash commands for destructive operations (rm -rf, DROP TABLE, force-push, git reset --hard, etc.) before execution',
|
|
Edit: 'verify file edits are within the allowed scope boundary before applying',
|
|
Write: 'verify file writes are within the allowed scope boundary before applying',
|
|
};
|
|
|
|
const safetyChecks = matchers
|
|
.map(t => toolDescriptions[t] || `check ${t} operations for safety`)
|
|
.join(', and ');
|
|
|
|
return `> **Safety Advisory:** This skill includes safety checks that ${safetyChecks}. When using this skill, always pause and verify before executing potentially destructive operations. If uncertain about a command's safety, ask the user for confirmation before proceeding.`;
|
|
}
|
|
|
|
// ─── External Host Config (now derived from hosts/*.ts) ──────
|
|
// EXTERNAL_HOST_CONFIG replaced by getHostConfig() from hosts/index.ts
|
|
|
|
// ─── Template Processing ────────────────────────────────────
|
|
|
|
const GENERATED_HEADER = `<!-- AUTO-GENERATED from {{SOURCE}} — do not edit directly -->\n<!-- Regenerate: bun run gen:skill-docs -->\n`;
|
|
|
|
/**
|
|
* Process external host output: routing, frontmatter, path rewrites, metadata.
|
|
* Shared between Codex and Factory (and future external hosts).
|
|
*/
|
|
function processExternalHost(
|
|
content: string,
|
|
tmplContent: string,
|
|
host: Host,
|
|
skillDir: string,
|
|
extractedDescription: string,
|
|
ctx: TemplateContext,
|
|
frontmatterName?: string,
|
|
): { content: string; outputPath: string; outputDir: string; symlinkLoop: boolean } {
|
|
const hostConfig = getHostConfig(host);
|
|
|
|
const name = externalSkillName(skillDir === '.' ? '' : skillDir, frontmatterName);
|
|
const outputDir = path.join(ROOT, hostConfig.hostSubdir, 'skills', name);
|
|
fs.mkdirSync(outputDir, { recursive: true });
|
|
const outputPath = path.join(outputDir, 'SKILL.md');
|
|
|
|
// Guard against symlink loops
|
|
let symlinkLoop = false;
|
|
const claudePath = ctx.tmplPath.replace(/\.tmpl$/, '');
|
|
try {
|
|
const resolvedClaude = fs.realpathSync(claudePath);
|
|
const resolvedExternal = fs.realpathSync(path.dirname(outputPath)) + '/' + path.basename(outputPath);
|
|
if (resolvedClaude === resolvedExternal) {
|
|
symlinkLoop = true;
|
|
}
|
|
} catch {
|
|
// realpathSync fails if file doesn't exist yet — no symlink loop
|
|
}
|
|
|
|
// Extract hook safety prose BEFORE transforming frontmatter (which strips hooks)
|
|
const safetyProse = extractHookSafetyProse(tmplContent);
|
|
|
|
// Transform frontmatter (host-aware)
|
|
let result = transformFrontmatter(content, host);
|
|
|
|
// Insert safety advisory at the top of the body (after frontmatter)
|
|
if (safetyProse) {
|
|
const bodyStart = result.indexOf('\n---') + 4;
|
|
result = result.slice(0, bodyStart) + '\n' + safetyProse + '\n' + result.slice(bodyStart);
|
|
}
|
|
|
|
// Config-driven path rewrites (order matters, replaceAll)
|
|
for (const rewrite of hostConfig.pathRewrites) {
|
|
result = result.replaceAll(rewrite.from, rewrite.to);
|
|
}
|
|
|
|
// Config-driven tool rewrites
|
|
if (hostConfig.toolRewrites) {
|
|
for (const [from, to] of Object.entries(hostConfig.toolRewrites)) {
|
|
result = result.replaceAll(from, to);
|
|
}
|
|
}
|
|
|
|
// Config-driven: generate metadata (e.g., openai.yaml for Codex)
|
|
if (hostConfig.generation.generateMetadata && !symlinkLoop) {
|
|
const agentsDir = path.join(outputDir, 'agents');
|
|
fs.mkdirSync(agentsDir, { recursive: true });
|
|
const shortDescription = condenseOpenAIShortDescription(extractedDescription);
|
|
fs.writeFileSync(path.join(agentsDir, 'openai.yaml'), generateOpenAIYaml(name, shortDescription));
|
|
}
|
|
|
|
return { content: result, outputPath, outputDir, symlinkLoop };
|
|
}
|
|
|
|
function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath: string; content: string; symlinkLoop?: boolean; catalogParts?: CatalogParts | null } {
|
|
const tmplContent = fs.readFileSync(tmplPath, 'utf-8');
|
|
const relTmplPath = path.relative(ROOT, tmplPath);
|
|
let outputPath = tmplPath.replace(/\.tmpl$/, '');
|
|
|
|
// Determine skill directory relative to ROOT
|
|
const skillDir = path.relative(ROOT, path.dirname(tmplPath));
|
|
|
|
// Extract skill name from frontmatter early — needed for both TemplateContext and external host output paths.
|
|
// When frontmatter name: differs from directory name (e.g., run-tests/ with name: test),
|
|
// the frontmatter name is used for external skill naming and setup script symlinks.
|
|
const { name: extractedName, description: extractedDescription } = extractNameAndDescription(tmplContent);
|
|
const skillName = extractedName || path.basename(path.dirname(tmplPath));
|
|
|
|
|
|
// Extract benefits-from list from frontmatter (inline YAML: benefits-from: [a, b])
|
|
const benefitsMatch = tmplContent.match(/^benefits-from:\s*\[([^\]]*)\]/m);
|
|
const benefitsFrom = benefitsMatch
|
|
? benefitsMatch[1].split(',').map(s => s.trim()).filter(Boolean)
|
|
: undefined;
|
|
|
|
// Extract preamble-tier from frontmatter (1-4, controls which preamble sections are included)
|
|
const tierMatch = tmplContent.match(/^preamble-tier:\s*(\d+)$/m);
|
|
const preambleTier = tierMatch ? parseInt(tierMatch[1], 10) : undefined;
|
|
|
|
// Extract interactive flag from frontmatter (generator-only; controls plan-mode handshake inclusion)
|
|
const interactiveMatch = tmplContent.match(/^interactive:\s*(true|false)\s*$/m);
|
|
const interactive = interactiveMatch ? interactiveMatch[1] === 'true' : undefined;
|
|
|
|
const ctx: TemplateContext = { skillName, tmplPath, benefitsFrom, host, paths: HOST_PATHS[host], preambleTier, model: MODEL_ARG_VAL, interactive, explainLevel: EXPLAIN_LEVEL };
|
|
|
|
// Replace placeholders (supports parameterized: {{NAME:arg1:arg2}})
|
|
// Config-driven: suppressedResolvers return empty string for this host
|
|
const currentHostConfig = getHostConfig(host);
|
|
const suppressed = new Set(currentHostConfig.suppressedResolvers || []);
|
|
let content = tmplContent.replace(/\{\{(\w+(?::[^}]+)?)\}\}/g, (match, fullKey) => {
|
|
const parts = fullKey.split(':');
|
|
const resolverName = parts[0];
|
|
const args = parts.slice(1);
|
|
if (suppressed.has(resolverName)) return '';
|
|
const entry = RESOLVERS[resolverName];
|
|
if (!entry) throw new Error(`Unknown placeholder {{${resolverName}}} in ${relTmplPath}`);
|
|
const { resolve, appliesTo } = unwrapResolver(entry);
|
|
if (appliesTo && !appliesTo(ctx)) return '';
|
|
return args.length > 0 ? resolve(ctx, args) : resolve(ctx);
|
|
});
|
|
|
|
// Check for any remaining unresolved placeholders
|
|
const remaining = content.match(/\{\{(\w+(?::[^}]+)?)\}\}/g);
|
|
if (remaining) {
|
|
throw new Error(`Unresolved placeholders in ${relTmplPath}: ${remaining.join(', ')}`);
|
|
}
|
|
|
|
// Preprocess voice triggers: fold into description, strip field from frontmatter.
|
|
// Must run BEFORE transformFrontmatter so all hosts see the updated description,
|
|
// and BEFORE extractedDescription is used by external host metadata.
|
|
content = processVoiceTriggers(content);
|
|
|
|
// Re-extract description AFTER voice trigger preprocessing so Codex openai.yaml
|
|
// metadata gets the updated description with voice triggers included.
|
|
const postProcessDescription = extractNameAndDescription(content).description;
|
|
|
|
// For Claude: strip sensitive: field (only Factory uses it)
|
|
// For external hosts: route output, transform frontmatter, rewrite paths
|
|
let symlinkLoop = false;
|
|
if (host === 'claude') {
|
|
content = transformFrontmatter(content, host);
|
|
} else {
|
|
const result = processExternalHost(content, tmplContent, host, skillDir, postProcessDescription, ctx, extractedName || undefined);
|
|
content = result.content;
|
|
outputPath = result.outputPath;
|
|
symlinkLoop = result.symlinkLoop;
|
|
}
|
|
|
|
// Prepend generated header (after frontmatter)
|
|
const header = GENERATED_HEADER.replace('{{SOURCE}}', path.basename(tmplPath));
|
|
const fmEnd = content.indexOf('---', content.indexOf('---') + 3);
|
|
if (fmEnd !== -1) {
|
|
const insertAt = content.indexOf('\n', fmEnd) + 1;
|
|
content = content.slice(0, insertAt) + header + content.slice(insertAt);
|
|
} else {
|
|
content = header + content;
|
|
}
|
|
|
|
// Catalog trim (Claude only — external hosts have their own frontmatter shapes)
|
|
let catalogParts: CatalogParts | null = null;
|
|
if (host === 'claude' && CATALOG_MODE === 'trim') {
|
|
const trimmed = applyCatalogTrim(content, skillName);
|
|
if (trimmed) {
|
|
content = trimmed.content;
|
|
catalogParts = trimmed.parts;
|
|
}
|
|
}
|
|
|
|
return { outputPath, content, symlinkLoop, catalogParts };
|
|
}
|
|
|
|
// ─── Main ───────────────────────────────────────────────────
|
|
|
|
function findTemplates(): string[] {
|
|
return discoverTemplates(ROOT).map(t => path.join(ROOT, t.tmpl));
|
|
}
|
|
|
|
const ALL_HOSTS: Host[] = ALL_HOST_NAMES as Host[];
|
|
const hostsToRun: Host[] = HOST_ARG_VAL === 'all' ? ALL_HOSTS : [HOST];
|
|
const failures: { host: string; error: Error }[] = [];
|
|
|
|
for (const currentHost of hostsToRun) {
|
|
HOST = currentHost;
|
|
|
|
try {
|
|
let hasChanges = false;
|
|
const tokenBudget: Array<{ skill: string; lines: number; tokens: number }> = [];
|
|
|
|
// T4 catalog trim: collect routing/voice parts across all Claude skills,
|
|
// then write scripts/proactive-suggestions.json once per gen-skill-docs run.
|
|
const proactiveAggregate: Record<string, {
|
|
lead: string;
|
|
routing: string;
|
|
voice_line: string | null;
|
|
}> = {};
|
|
|
|
const currentHostConfig = getHostConfig(currentHost);
|
|
for (const tmplPath of findTemplates()) {
|
|
const dir = path.basename(path.dirname(tmplPath));
|
|
|
|
// includeSkills allowlist (union logic: include minus skip)
|
|
if (currentHostConfig.generation.includeSkills?.length) {
|
|
if (!currentHostConfig.generation.includeSkills.includes(dir)) continue;
|
|
}
|
|
// skipSkills denylist (subtracts from includeSkills or full set)
|
|
if (currentHostConfig.generation.skipSkills?.length) {
|
|
if (currentHostConfig.generation.skipSkills.includes(dir)) continue;
|
|
}
|
|
|
|
const { outputPath, content, symlinkLoop, catalogParts } = processTemplate(tmplPath, currentHost);
|
|
if (catalogParts) {
|
|
// Root-skill detection: when the template lives at ROOT/SKILL.md.tmpl,
|
|
// path.basename(path.dirname(tmplPath)) returns the repo's directory
|
|
// name (e.g. "seville-v3" in a Conductor worktree, "gstack" on CI).
|
|
// That's non-deterministic across machines and breaks CI freshness
|
|
// checks. Use the frontmatter `name` field as the registry key — the
|
|
// root SKILL.md.tmpl declares `name: gstack` explicitly. For all other
|
|
// skills, `dir` matches the directory name which matches the
|
|
// frontmatter name by convention.
|
|
const isRoot = path.dirname(tmplPath) === ROOT;
|
|
const key = isRoot ? 'gstack' : dir;
|
|
proactiveAggregate[key] = {
|
|
lead: catalogParts.lead,
|
|
routing: catalogParts.routingProse,
|
|
voice_line: catalogParts.voiceLine,
|
|
};
|
|
}
|
|
const relOutput = path.relative(ROOT, outputPath);
|
|
|
|
if (symlinkLoop) {
|
|
console.log(`SKIPPED (symlink loop): ${relOutput}`);
|
|
} else if (DRY_RUN) {
|
|
const existing = fs.existsSync(outputPath) ? fs.readFileSync(outputPath, 'utf-8') : '';
|
|
if (existing !== content) {
|
|
console.log(`STALE: ${relOutput}`);
|
|
hasChanges = true;
|
|
} else {
|
|
console.log(`FRESH: ${relOutput}`);
|
|
}
|
|
} else {
|
|
fs.writeFileSync(outputPath, content);
|
|
console.log(`GENERATED: ${relOutput}`);
|
|
}
|
|
|
|
// Track token budget
|
|
const lines = content.split('\n').length;
|
|
const tokens = Math.round(content.length / 4); // ~4 chars per token
|
|
tokenBudget.push({ skill: relOutput, lines, tokens });
|
|
|
|
// Token ceiling check: warn if any generated SKILL.md exceeds ~40K tokens (160KB).
|
|
// The ceiling is a "watch for feature bloat" guardrail, not a hard gate. Modern
|
|
// flagship models have 200K-1M context windows, so 40K (4-20% of window) is fine.
|
|
// Prompt caching further reduces the marginal cost of larger skills. This ceiling
|
|
// exists to catch a runaway preamble or resolver that's grown by 10K+ tokens in
|
|
// a release, not to force compression on carefully-tuned big skills (ship,
|
|
// plan-ceo-review, office-hours all legitimately pack 25-35K tokens of behavior).
|
|
const TOKEN_CEILING_BYTES = 160_000;
|
|
if (content.length > TOKEN_CEILING_BYTES) {
|
|
console.warn(`⚠️ TOKEN CEILING: ${relOutput} is ${content.length} bytes (~${tokens} tokens), exceeds ${TOKEN_CEILING_BYTES} byte ceiling (~40K tokens)`);
|
|
}
|
|
}
|
|
|
|
// Generate gstack-lite and gstack-full for OpenClaw host
|
|
if (currentHost === 'openclaw' && !DRY_RUN) {
|
|
const openclawDir = path.join(ROOT, 'openclaw');
|
|
if (!fs.existsSync(openclawDir)) fs.mkdirSync(openclawDir, { recursive: true });
|
|
|
|
const gstackLite = `# gstack-lite Planning Discipline
|
|
|
|
Injected by the orchestrator into spawned Claude Code sessions. Append to existing CLAUDE.md.
|
|
|
|
## Planning Discipline
|
|
1. Read every file you will modify. Understand existing patterns first.
|
|
2. Before writing code, state your plan: what, why, which files, test case, risk.
|
|
3. When ambiguous, prefer: completeness over shortcuts, existing patterns over new ones,
|
|
reversible choices over irreversible ones, safe defaults over clever ones.
|
|
4. Self-review your changes before reporting done. Check for: missed files, broken
|
|
imports, untested paths, style inconsistencies.
|
|
5. Report when done: what shipped, what decisions you made, anything uncertain.
|
|
`;
|
|
fs.writeFileSync(path.join(openclawDir, 'gstack-lite-CLAUDE.md'), gstackLite);
|
|
console.log('GENERATED: openclaw/gstack-lite-CLAUDE.md');
|
|
|
|
const gstackFull = `# gstack-full Pipeline
|
|
|
|
Injected by the orchestrator for complete feature builds. Append to existing CLAUDE.md.
|
|
|
|
## Full Pipeline
|
|
1. Read CLAUDE.md and understand the project context.
|
|
2. Run /autoplan to review your approach (CEO + eng + design review pipeline).
|
|
3. Implement the approved plan. Follow the planning discipline above.
|
|
4. Run /ship to create a PR with tests, changelog, and version bump.
|
|
5. Report back: PR URL, what shipped, decisions made, anything uncertain.
|
|
|
|
Do not ask for human input until the PR is ready for review.
|
|
`;
|
|
fs.writeFileSync(path.join(openclawDir, 'gstack-full-CLAUDE.md'), gstackFull);
|
|
console.log('GENERATED: openclaw/gstack-full-CLAUDE.md');
|
|
|
|
const gstackPlan = `# gstack-plan: Full Review Gauntlet
|
|
|
|
Injected by the orchestrator when the user wants to plan a Claude Code project.
|
|
Append to existing CLAUDE.md.
|
|
|
|
## Planning Pipeline
|
|
1. Read CLAUDE.md and understand the project context.
|
|
2. Run /office-hours to produce a design doc (problem statement, premises, alternatives).
|
|
3. Run /autoplan to review the design (CEO + eng + design + DX reviews + codex adversarial).
|
|
4. Save the final reviewed plan to a file the orchestrator can reference later.
|
|
Write it to: plans/<project-slug>-plan-<date>.md in the current repo.
|
|
Include the design doc, all review decisions, and the implementation sequence.
|
|
5. Report back to the orchestrator:
|
|
- Plan file path
|
|
- One-paragraph summary of what was designed and the key decisions
|
|
- List of accepted scope expansions (if any)
|
|
- Recommended next step (usually: spawn a new session with gstack-full to implement)
|
|
|
|
Do not implement anything. This is planning only.
|
|
The orchestrator will persist the plan link to its own memory/knowledge store.
|
|
`;
|
|
fs.writeFileSync(path.join(openclawDir, 'gstack-plan-CLAUDE.md'), gstackPlan);
|
|
console.log('GENERATED: openclaw/gstack-plan-CLAUDE.md');
|
|
}
|
|
|
|
if (DRY_RUN && hasChanges) {
|
|
console.error(`\nGenerated SKILL.md files are stale (${currentHost} host). Run: bun run gen:skill-docs --host ${currentHost}`);
|
|
if (HOST_ARG_VAL !== 'all') process.exit(1);
|
|
failures.push({ host: currentHost, error: new Error('Stale files detected') });
|
|
}
|
|
|
|
// T4 catalog trim: write aggregated proactive-suggestions.json (Claude only).
|
|
// The JSON registry lets agents pull voice triggers / routing prose for any
|
|
// skill on demand instead of paying for it always-loaded in the catalog.
|
|
//
|
|
// No timestamp field — keeps the file content-deterministic across runs so
|
|
// CI dry-run freshness checks don't flap on regen. If a per-run timestamp
|
|
// is ever needed for debugging, write it to a separate `.gen-stamp` file.
|
|
if (currentHost === 'claude' && CATALOG_MODE === 'trim' && Object.keys(proactiveAggregate).length > 0 && !DRY_RUN) {
|
|
const proactivePath = path.join(ROOT, 'scripts', 'proactive-suggestions.json');
|
|
// Sort keys alphabetically so the serialized JSON is identical across
|
|
// machines regardless of filesystem-iteration order. Without this, CI
|
|
// freshness checks fail when the local dev machine and CI runner
|
|
// discover templates in different orders.
|
|
const sortedSkills: typeof proactiveAggregate = {};
|
|
for (const key of Object.keys(proactiveAggregate).sort()) {
|
|
sortedSkills[key] = proactiveAggregate[key];
|
|
}
|
|
const payload = {
|
|
$schema: 'https://gstack.dev/schemas/proactive-suggestions.json',
|
|
catalog_mode: 'trim',
|
|
note: 'Routing / voice-trigger prose extracted from SKILL.md frontmatter descriptions during catalog trim. Loaded on demand when routing guidance is needed.',
|
|
skills: sortedSkills,
|
|
};
|
|
const serialized = JSON.stringify(payload, null, 2) + '\n';
|
|
// Only write if content actually changed — prevents needless touches that
|
|
// would flap CI freshness checks. Read existing file, compare, skip write
|
|
// when identical.
|
|
let existing = '';
|
|
try { existing = fs.readFileSync(proactivePath, 'utf-8'); } catch { /* first run */ }
|
|
if (existing !== serialized) {
|
|
fs.writeFileSync(proactivePath, serialized);
|
|
}
|
|
}
|
|
|
|
// Print token budget summary
|
|
if (!DRY_RUN && tokenBudget.length > 0) {
|
|
tokenBudget.sort((a, b) => b.lines - a.lines);
|
|
const totalLines = tokenBudget.reduce((s, t) => s + t.lines, 0);
|
|
const totalTokens = tokenBudget.reduce((s, t) => s + t.tokens, 0);
|
|
|
|
console.log('');
|
|
console.log(`Token Budget (${currentHost} host)`);
|
|
console.log('═'.repeat(60));
|
|
for (const t of tokenBudget) {
|
|
const hostSubdirs = ALL_HOST_CONFIGS.map(c => c.hostSubdir.replace('.', '\\.')).join('|');
|
|
const name = t.skill.replace(/\/SKILL\.md$/, '').replace(new RegExp(`^\\.(${hostSubdirs})\\/skills\\/`), '');
|
|
console.log(` ${name.padEnd(30)} ${String(t.lines).padStart(5)} lines ~${String(t.tokens).padStart(6)} tokens`);
|
|
}
|
|
console.log('─'.repeat(60));
|
|
console.log(` ${'TOTAL'.padEnd(30)} ${String(totalLines).padStart(5)} lines ~${String(totalTokens).padStart(6)} tokens`);
|
|
console.log('');
|
|
}
|
|
} catch (e) {
|
|
failures.push({ host: currentHost, error: e as Error });
|
|
console.error(`WARNING: ${currentHost} generation failed: ${(e as Error).message}`);
|
|
}
|
|
}
|
|
|
|
// --host all: report failures. Only exit(1) if claude failed.
|
|
if (failures.length > 0 && HOST_ARG_VAL === 'all') {
|
|
console.error(`\n${failures.length} host(s) failed: ${failures.map(f => f.host).join(', ')}`);
|
|
if (failures.some(f => f.host === 'claude')) process.exit(1);
|
|
}
|
|
// Single host dry-run failure already handled above
|
|
|
|
// After all hosts processed, warn if prefix patches may need re-applying
|
|
if (!DRY_RUN) {
|
|
try {
|
|
const configPath = path.join(process.env.HOME || '', '.gstack', 'config.yaml');
|
|
if (fs.existsSync(configPath)) {
|
|
const config = fs.readFileSync(configPath, 'utf-8');
|
|
if (/^skill_prefix:\s*true/m.test(config)) {
|
|
console.log('\nNote: skill_prefix is true. Run gstack-relink to re-apply name: patches.');
|
|
}
|
|
}
|
|
} catch { /* non-fatal */ }
|
|
}
|
|
|
|
// Regenerate gstack/llms.txt — single-file capability index for AI agents.
|
|
// Runs after SKILL.md generation so it sees current skill descriptions and
|
|
// browse command list. Wrapped in an IIFE so the await-import doesn't make
|
|
// this module async (test/gen-skill-docs.test.ts uses require() to pull
|
|
// extractVoiceTriggers/processVoiceTriggers, which fails on async modules).
|
|
// Freshness is asserted in test/llms-txt-shape.test.ts.
|
|
if (!DRY_RUN) {
|
|
void (async () => {
|
|
try {
|
|
const result = await writeLlmsTxt();
|
|
if (result.warnings.length > 0) {
|
|
for (const w of result.warnings) console.error(`[gen-llms-txt] WARN: ${w}`);
|
|
} else {
|
|
console.log(`[gen-llms-txt] gstack/llms.txt: ${result.skills.length} skills, ${result.browseCommands.length} browse commands`);
|
|
}
|
|
} catch (err) {
|
|
const msg = err instanceof Error ? err.message : String(err);
|
|
console.error(`[gen-llms-txt] FAILED: ${msg}`);
|
|
}
|
|
})();
|
|
}
|