Files
gstack/scripts/gen-skill-docs.ts
T
Garry Tan 070722ace3 v1.52.1.0 feat: brain-aware planning — 5 skills read structured gbrain context before asking (#1742)
* feat(brain): brain-cache-spec.ts — single source of truth for cache layer

Foundation for the brain-aware planning skills work (v1.48 plan / D2).
One TS const file consolidates BRAIN_CACHE_ENTITIES (8 entities × TTL +
budget + invalidation rules), SKILL_DIGEST_SUBSETS (per-skill which
files to load), SALIENCE_DEFAULT_ALLOWLIST (D9 privacy gate),
SKILL_CALIBRATION_WEIGHTS (Phase 2 E5), and policy / identity / schema
constants.

Drift between docs and runtime becomes impossible by construction:
resolver, cache CLI, and test/skill-preflight-budget.test.ts all import
from the same module.

test/brain-cache-spec.test.ts: 19 invariant assertions (subset/entity
consistency, per-skill achievability, allowlist sanity, transport
defaults, user-slug fallback chain, lock timeout, retention policy).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): gstack-core@1.0.0 schema pack (T1 / Phase 0)

Defines 8 typed page kinds for the brain entity model:
  gstack/user-profile, gstack/product, gstack/goal,
  gstack/developer-persona, gstack/brand, gstack/competitive-intel,
  gstack/skill-run, gstack/take

Each declares frontmatter shape (typed fields with required/optional flags),
retention policy (immutable / archive-after-90d / never-archive), and
emits_links graph for mcp__gbrain__schema_graph rendering.

getSchemaPackMutationPayload() returns JSON in the shape accepted by
mcp__gbrain__schema_apply_mutations. Idempotent registration: gbrain
skips when pack+version already installed.

test/gstack-schema-pack.test.ts: 16 invariants on pack shape, retention
policies, link verb consistency, JSON serializability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): gstack-brain-cache CLI (T2a) — core subcommands

bin/gstack-brain-cache: TS CLI with five subcommands:
  get <entity-name> [--project <slug>]
  refresh [--full] [--entity X] [--project <slug>]
  invalidate <entity-name> [--project <slug>]
  digest <entity-slug>
  meta [--project <slug>]

Cache layout per Phase 0.5 design:
  ~/.gstack/brain-cache/                 ← cross-project (user-profile)
  ~/.gstack/projects/<slug>/brain-cache/ ← per-project (everything else)

Per-entity TTL drives staleness; per-entity byte budgets enforce
compression at write time. Atomic writes via tmp+rename. Stale-but-usable
fallback when brain unreachable (returns cached digest with diagnostic
prefix instead of failing). Schema-version mismatch + endpoint switch
both trigger full rebuild for the affected scope (D4 A4).

Fetch+compress paths wired for the 7 entities (user-profile, product,
goals, developer-persona, brand, competitive-intel, recent-decisions,
salience) via gbrain CLI shell-out — works for local PGLite and
local-stdio MCP, transparent over the existing spawnGbrain helper.

Concurrent-refresh dedup (D3 / T15) is a follow-up commit. Salience
allowlist gate (D9 / T17) is a follow-up commit. Bootstrap + lifecycle
subcommands (T2b / T18) are follow-up commits.

test/brain-cache-roundtrip.test.ts: 11 tests covering path resolution,
meta lifecycle, endpoint detection, schema mismatch behavior, and the
four cache states (warm / cold-refreshed / stale-fallback / missing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): concurrent-refresh lockfile dedup (T15 / D3)

When autoplan dispatches 4 planning skills back-to-back and they all hit
a cold-miss on the same digest, only ONE actually fetches from the brain.
The rest dedup via the project-scoped lockfile at
~/.gstack/projects/<slug>/brain-cache/.refresh.lock.

Reuses the 5-min stale-takeover convention from /sync-gbrain. Lock is
taken over when:
  - File is older than CACHE_REFRESH_LOCK_TIMEOUT_MS
  - PID is on the same host and dead (process.kill(pid, 0) fails)
  - Lock file is corrupt (defensive)

withRefreshLock(projectSlug, fn) returns either the callback's value or
the literal 'dedup'. The CLI emits exit code 3 + diagnostic stderr on
dedup, so callers can choose to wait + retry (resolver does this) or
fall through to stale-but-usable behavior.

test/cache-concurrent-refresh.test.ts: 7 tests covering acquire/release,
stale-takeover, dead-PID takeover, corrupt-lock recovery, error-path
release, and cross-project lock location.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): salience privacy allowlist gate (T17 / D9)

D9 cross-model finding from codex outside voice: salience-sourced digests
can include emotionally-weighted personal pages (family, therapy,
reflection). Pulling those into a coding-review prompt leaks sensitive
context into work-flow reasoning.

fetchSalience now strips entries whose slugs don't match an allowlist
prefix BEFORE writing to the cache file. Default allowlist is
SALIENCE_DEFAULT_ALLOWLIST = ['projects/', 'concepts/', 'gstack/'].
User can extend via:
  gstack-config set salience_allowlist 'projects/,gstack/,concepts/,custom/'
or override with GSTACK_SALIENCE_ALLOWLIST env var.

Digest still records the strip count for transparency. Empty result
emits 'all N entries stripped' note rather than silent absence.

test/salience-allowlist.test.ts: 9 tests covering default permits,
default blocks, empty allowlist, env override, whitespace trimming,
and the invariant that defaults contain nothing sensitive (personal,
family, therapy, reflection, private, medical, health).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): bootstrap + list + purge subcommands (T2b / T18)

T2b — bootstrap synthesizes draft entity content from CLAUDE.md + README
+ recent learnings.jsonl and emits as JSON for the caller. Skill template
is responsible for the AUQ-confirm-before-write flow (D10 T4 extraction-
review requirement). Cli stays pure (no AUQ logic); agent owns user
interaction.

T18 — list/purge subcommands close the lifecycle loop:
  list [--project <slug>] — enumerate gstack-owned pages in brain
                            (probe all 8 gstack/* page types)
  purge <slug>           — delete one gstack page, refuses non-gstack/
                            slugs (defensive)

list defaults to all-projects (cross-project user-profile included).
With --project, filters to per-project pages plus the cross-project
user-profile. --json flag emits machine-readable output for the agent.

Retention sweep + audit subcommand are deferred to a follow-up commit
(they need the lifecycle scheduling design, not just CLI plumbing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): brain-aware planning resolvers + 3 new placeholders (T4)

scripts/resolvers/gbrain.ts adds:
  - generateBrainPreflight(ctx)       — emits per-skill ## Brain Context
                                        block + bash that loads digests via
                                        gstack-brain-cache get (one call per
                                        digest). Per-skill subset comes from
                                        SKILL_DIGEST_SUBSETS (single source).
  - generateBrainCacheRefresh(ctx)    — at-skill-end background refresh hook;
                                        non-blocking; warms cache for next run.
  - generateBrainWriteBack(ctx)       — Phase 2 / E5 calibration write-back
                                        with per-skill weight. Gated on
                                        personal trust policy + the
                                        BRAIN_CALIBRATION_WRITEBACK flag.
                                        Includes invalidation bash that busts
                                        affected digests after the write.

scripts/resolvers/index.ts registers three new placeholders:
  {{BRAIN_PREFLIGHT}}, {{BRAIN_CACHE_REFRESH}}, {{BRAIN_WRITE_BACK}}

All three resolvers return empty string for skills not in
SKILL_DIGEST_SUBSETS (defensive — skill template authors can drop the
placeholders into non-preflight skills with zero effect).

D9 privacy is mentioned in the rendered preflight prose so the agent
knows to expect filtered salience.
D11 codex tension: write-back gates on brain_trust_policy@<hash> being
personal — shared brains skip write-back to avoid polluting team
calibration profile.

test/brain-preflight.test.ts: 19 tests covering subset rendering,
non-preflight skill gating, cross-project vs per-project --project flag
emission, weight injection per skill, BRAIN_CALIBRATION_WRITEBACK flag
mention, and registration in RESOLVERS map.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): gstack-config brain integration helpers (T5+T10+T16)

Extends bin/gstack-config to support the brain-aware planning layer:

KEY VALIDATION (T5):
  Plain alphanumeric/underscore now extended to allow @<hex-hash> suffix.
  Required for per-endpoint namespaced keys (brain_trust_policy@<sha8>,
  user_slug_at_<sha8>). Keys without the suffix still validate as before.

VALUE WHITELISTING (D4 / D11):
  brain_trust_policy@* values gated to personal | shared | unset.
  Unknown values warn + default to unset (defense against typos).

NEW DEFAULTS (lookup_default):
  brain_trust_policy@*  -> unset
  salience_allowlist    -> '' (resolver uses SALIENCE_DEFAULT_ALLOWLIST)
  user_slug_at_*        -> '' (resolve-user-slug fills + persists on demand)

NEW SUBCOMMANDS:
  endpoint-hash      — print sha8 of active gbrain MCP URL from
                       ~/.claude.json. Collision check escalates to sha16
                       when a prior endpoint stored at the same sha8
                       would conflict (T10 defensive default).
  resolve-user-slug  — walks D4 A3 identity chain:
                         1. mcp__gbrain__whoami.client_name
                         2. $USER env var
                         3. sha8(git config user.email)
                         4. anonymous-<sha8(hostname)>
                       Persists result on first call so subsequent
                       calls are stable across sessions.

test/user-slug-fallback.test.ts: 14 tests covering endpoint-hash output
shape, fallback chain ordering, persistence, brain_trust_policy
namespace value validation + per-endpoint isolation, and key validator
extension for @-suffixed keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): wire 5 planning skill templates with BRAIN_* placeholders (T6)

Adds three placeholders to each of the 5 planning SKILL.md.tmpl files:
  {{BRAIN_PREFLIGHT}}     — top of skill body, before first interactive
                            section. Loads the per-skill digest subset
                            (5 files for office-hours, 2 for plan-eng-
                            review, etc.) into the prompt context before
                            any AskUserQuestion fires.
  {{BRAIN_WRITE_BACK}}    — end of skill, before refresh hook. Phase 2
                            calibration write path; gated on personal
                            policy + BRAIN_CALIBRATION_WRITEBACK flag.
  {{BRAIN_CACHE_REFRESH}} — end of skill, after write-back. Non-blocking
                            background refresh so next invocation gets
                            warm cache.

Files touched (templates + regenerated SKILL.md):
  office-hours/SKILL.md.tmpl
  plan-ceo-review/SKILL.md.tmpl
  plan-eng-review/SKILL.md.tmpl
  plan-design-review/SKILL.md.tmpl
  plan-devex-review/SKILL.md.tmpl
  (matching .md files regenerated via bun run gen:skill-docs)

All 5 generated SKILL.md files now contain the rendered ## Brain Context
(preflight) section + write-back guidance + background-refresh hook. The
resolver renders only for skills in SKILL_DIGEST_SUBSETS — these 5 + an
empty string for any other skill that drops in the placeholders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): setup-gbrain trust-policy step + sync-gbrain flags (T5b / T13+T5c)

T5b — setup-gbrain Step 9.5:
  Inserts the brain trust policy AskUserQuestion before the verdict block.
  Detects active endpoint hash via gstack-config endpoint-hash. Branches
  per transport:
    * Local (sha == "local"): auto-set personal, one-line notice
    * Remote-MCP, unset: AskUserQuestion (personal vs shared)
    * Already-set: skip, just print current policy
  Personal default flips artifacts_sync_mode=full when still off.

T13+T5c — sync-gbrain:
  Adds two flag short-circuits:
    --refresh-cache : route to gstack-brain-cache refresh --project <slug>;
                       skip code + memory + brain-sync stages. Replaces
                       the planned /brain-refresh-context skill per D1
                       fold (one fewer always-loaded skill in catalog).
    --audit          : emit gstack-owned page summary + sensitive-content
                       leak check via gstack-brain-cache list. Read-only.
  Step 1 trust policy gate: fires the same AskUserQuestion as setup-gbrain
  Step 9.5 when policy is unset for a remote endpoint. Local engines
  auto-set personal silently. Idempotent for already-set policies.

Both templates re-rendered via bun run gen:skill-docs. Trust policy
question wording centralized in setup-gbrain Step 9.5; sync-gbrain
Step 1 references it to avoid prompt drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(brain): schema migration + fence-block fallback + preflight budget (T19+T21)

3 new gate-tier test files closing the most important coverage gaps in
the brain-aware planning layer:

test/schema-version-migration.test.ts (D4 A4):
  - Cache file with mismatched schema_version triggers wipe-and-rebuild
  - Matching version + fresh TTL stays warm-hit (no unnecessary rebuild)
  - Rebuild wipes ALL files in scope, not just the one being read

test/takes-fence-fallback.test.ts:
  - Every preflight skill mentions both takes_add (preferred) and
    put_page fence-block (fallback for pre-T8 gbrain versions)
  - All 5 skills gate on BRAIN_CALIBRATION_WRITEBACK flag + personal
    trust policy
  - Per-skill weight matches SKILL_CALIBRATION_WEIGHTS (E5)
  - Write-back emits the kind=bet frontmatter shape and invalidates
    affected cache digests

test/skill-preflight-budget.test.ts (T21 / D7):
  - Per-skill BRAIN_* instruction bytes stay under 3x the runtime
    digest budget (resolver bloat catch)
  - Autoplan total instruction bytes stay under 75 KB (3x of 25 KB
    runtime cap)
  - Non-preflight skills emit zero brain bytes
  - Per-skill subset references are present in the preflight bash

Note on the 3x multiplier: SKILL_PREFLIGHT_BUDGET_BYTES governs runtime
digest data (enforced by cache CLI truncateToBudget). Instruction text
emitted by the resolver gets a separate 3x headroom — anything beyond
that signals the instructions themselves are bloated and need a trim.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(todos): brain-aware planning follow-ups (T11)

Adds five deferred items from the v1.48.0.0 brain-aware planning plan:

  - P2: /gstack-reflect nightly synthesis skill (E2, deferred D4)
  - P3: cross-machine brain-cache sync (E3, deferred D5)
  - P3: /gstack-onboarding dedicated skill (E4, deferred D6)
  - P2: upstream gbrain takes_add + takes_resolve MCP ops (T8 wrap-up)
  - P3: background-refresh hook supervision (codex outside-voice T3)

Each entry follows the TODOS.md format: What / Why / Pros / Cons /
Context / Effort / Depends on. Each cross-references the v1.48.0.0
review decision (D-numbers from /plan-ceo-review and /plan-eng-review)
that deferred it.

The plan itself is at ~/.claude/plans/hm-interesting-well-why-dapper-eagle.md
and is NOT a TODO entry (it's a one-shot design doc, not ongoing work).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(brain): bump schema-migration test timeout to 60s

Rebuild path fans out to 7 per-project entity refreshes, each shelling
gbrain with 10s internal timeout. Worst case ~70s. Default bun test
5s was timing out on slow brain unreachable cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.50.0.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): tighten put_page regression pin to CLI subcommand

The test asserted no substring 'put_page' anywhere in the resolver,
but the BRAIN_WRITE_BACK resolver legitimately references the MCP op
`mcp__gbrain__put_page` as the fallback path for calibration takes
when gbrain v0.42+'s `takes_add` op isn't available. The check
conflated the deprecated `gbrain put_page` CLI subcommand (renamed in
v0.18+ to `gbrain put`) with the still-valid MCP op of the same name.

Narrow the assertion to `gbrain put_page` (with the space) so the
fallback prose stays legal while the CLI rename regression stays caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): gstack-config gbrain-refresh subcommand

Adds a new subcommand that re-detects gbrain installation state and
persists the result to ~/.gstack/gbrain-detection.json. The detection
file is consumed by gen-skill-docs --respect-detection (next commit)
to decide whether to render the GBRAIN_CONTEXT_LOAD and
GBRAIN_SAVE_RESULTS resolver blocks in user-local SKILL.md generation.

Reuses the existing bin/gstack-gbrain-detect helper for the actual
probe; this subcommand just persists + summarizes. Users run it after
installing or uninstalling gbrain so their locally generated SKILL.md
files match their installation state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): gen-skill-docs respects gbrain-detection override

Adds --respect-detection flag (and bun run gen:skill-docs:user script).
When the flag is set, gen-skill-docs reads ~/.gstack/gbrain-detection.json
and filters GBRAIN_CONTEXT_LOAD + GBRAIN_SAVE_RESULTS out of each host's
suppressedResolvers when gbrain_local_status is "ok". When absent or
gbrain isn't detected, suppression behaves as before.

The default `bun run gen:skill-docs` (CI canonical) ignores the
detection file so the committed SKILL.md stays reproducible regardless
of any developer's local gbrain installation state. Use
gen:skill-docs:user for user-local installs (./setup invokes it).

No host config files modified — the static suppressedResolvers stay
correct for the no-gbrain case; the override happens at gen-time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): setup runs gbrain detection + conditional SKILL.md regen

At the end of install, ./setup now:
  1. Runs bin/gstack-gbrain-detect, persists the result to
     ~/.gstack/gbrain-detection.json
  2. If gbrain_local_status == "ok", regenerates Claude-host SKILL.md
     via `bun run gen:skill-docs:user --host claude` so the user's
     local install picks up the compressed brain-aware blocks
  3. If gbrain isn't detected, leaves the canonical no-gbrain SKILL.md
     files in place (zero token overhead) and surfaces the
     gstack-config gbrain-refresh path for users who install gbrain
     later

Together with the prior two commits, this completes the setup-time
conditional un-suppression: brain-aware blocks render iff the user
has gbrain installed, regardless of which CLI host they're on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(brain): compress GBRAIN_* resolvers, move template prose to docs/

generateGBrainContextLoad: 80 -> 115 tokens with explicit skip-header.
generateGBrainSaveResults: 500-700 -> 161 tokens per skill with the
skill metadata extracted into a typed skillSaveMap (slugPrefix + title
+ tag). Verbose prose (heredoc body, entity-stub instructions, throttle
handling, backlink protocol) moved into a new doc:
docs/gbrain-write-surfaces.md (Sections: §Context Load, §Save Template).
The agent reads the doc on-demand only when actually saving — one Read
call, cached by Claude's context.

Net per-planning-skill overhead under un-suppression drops from ~1000
tokens (naive un-suppression) to ~275 tokens (compressed). Combined
with the setup-time detection from prior commits, users WITHOUT gbrain
pay zero overhead (block suppressed at gen-time) and users WITH gbrain
pay ~275 tokens.

The /investigate special-case (data-research routing in CONTEXT_LOAD)
stays inline since it's skill-specific.

docs/gbrain-write-surfaces.md also serves as the manual-probe reference
for humans verifying live persistence + a topology summary covering
trust-policy + .gbrain-source reads-only semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(brain): wire SAVE_RESULTS for plan-design-review + plan-devex-review

Adds {{GBRAIN_SAVE_RESULTS}} placeholder to the two planning skills
that were missing it, immediately before {{BRAIN_WRITE_BACK}} (mirrors
plan-eng-review:324 + office-hours:650). The corresponding skillSaveMap
entries (design-reviews/<feature-slug> + devex-reviews/<feature-slug>)
landed with the resolver compression in the prior commit.

Regenerated SKILL.md reflects the new placeholder position. The
default no-gbrain generation (CI canonical) still suppresses the
block — zero diff in the rendered output for non-gbrain users.

All five planning skills now write a retrievable review page to gbrain
when gbrain is detected at setup time, instead of three of five.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(brain): resolver compression + detection-override regression pins

test/resolvers-gbrain-save-results.test.ts (140 LOC, 10 tests):
  - Per-skill assertions for all 5 planning skills: emits gbrain put +
    correct slug prefix + tag + title.
  - Skip-header present so agent can short-circuit when gbrain isn't
    on PATH.
  - Compression pin: each per-skill block stays under 750 chars
    (~190 tokens) — guards against a future "let me add one more
    line" refactor silently re-inflating toward the ~1000-token naive
    un-suppression baseline.
  - Generic fallback for unmapped skill names still works.
  - /investigate gets the data-research routing suffix; non-investigate
    skills do not.
  - generateGBrainContextLoad stays under 500 chars (~125 tokens).

test/gbrain-detection-override.test.ts (120 LOC, 4 tests):
  - End-to-end through gen-skill-docs subprocess against an isolated
    temp GSTACK_HOME. Asserts:
    * detected:true un-suppresses GBRAIN_* → SKILL.md gains the block
    * detected:false (status != "ok") suppresses → no block
    * no detection file suppresses → no block (graceful default)
    * no --respect-detection flag IGNORES the detection file → no
      block (CI canonical path stays reproducible)

Each detection-override test restores the canonical SKILL.md in a
finally block so the working tree stays clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(brain): fake-CLI agent-obedience E2E for /office-hours writeback

test/skill-e2e-office-hours-brain-writeback.test.ts (~210 LOC,
periodic-tier, ~$0.50-1/run):

Drives /office-hours via runSkillTest against a deterministic fixture
brief (pixel.fund founder pitch). The workdir has:
  - A regenerated office-hours/SKILL.md with the compressed brain blocks
    (generated via gen-skill-docs --respect-detection against a temp
    GSTACK_HOME, then restored to canonical post-snapshot)
  - A fake gbrain shell script on PATH that uses printf %q quoting to
    preserve --content "$(cat <<'EOF' ... EOF)" heredoc payloads
    intact (naive `echo "$@"` would lose argv boundaries)
  - The docs/gbrain-write-surfaces.md the resolver points to

Asserts:
  - gbrain-calls.log contains `gbrain put office-hours/pixel-fund`
  - Payload file at gbrain-payloads/office-hours/pixel-fund.md exists
    with valid YAML frontmatter (title: + tags: + design-doc tag)
  - At least one gbrain put entities/<name> call (entity stub
    enrichment is best-effort, soft warning if absent)

Covers agent obedience to the SAVE_RESULTS instruction. Out of scope:
gbrain CLI persistence contract (T11 covers that with real PGLite).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(brain): real PGLite round-trip E2E (matched-pair persistence)

test/skill-e2e-gbrain-roundtrip-local.test.ts (~145 LOC, periodic-tier,
~$0.001/run on Voyage):

Real gbrain CLI round-trip against an isolated temp HOME:
  1. gbrain init --pglite --embedding-model voyage:voyage-code-3
  2. gbrain put office-hours/<unique-slug> --content <markdown>
  3. gbrain get <slug>
  4. Assert every body line survives + title + tags + non-empty

This is the matched-pair check for the v1.50.0.0 question "is the data
we hope to save actually being saved?" — proves the gbrain CLI
persistence contract gstack relies on, against a real engine.

Does NOT involve the agent — pure CLI integration test. The agent
obedience side is covered by the fake-CLI E2E in the prior commit.

Skips cleanly when VOYAGE_API_KEY is unset OR gbrain CLI is missing
from PATH, so CI without secrets degrades gracefully.

Remote/Supabase routing is gbrain's contract — the same CLI shape
works against every engine. gstack stops at local round-trip coverage
to avoid re-testing gbrain's MCP client implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(brain): touchfiles + TODOS + CHANGELOG for v1.50.0.0

test/helpers/touchfiles.ts: register the two new E2Es in
E2E_TOUCHFILES + E2E_TIERS (both periodic):
  - office-hours-brain-writeback: triggered by resolver / gen-pipeline /
    detection helper / refresh subcommand / office-hours template /
    docs / fixture / test file changes
  - gbrain-roundtrip-local: triggered by resolver / test file changes

TODOS.md: append two P2 follow-ups carried over from the v1.50 plan:
  - Re-verify calibration takes when gbrain v0.42+ ships takes_add and
    BRAIN_CALIBRATION_WRITEBACK flips TRUE
  - Extend brain-writeback E2E to the other 4 planning skills (extract
    makeFakeGbrain to test/helpers/fake-gbrain.ts when second consumer
    arrives)

CHANGELOG.md v1.50.0.0: add a "Save-results path: works under any CLI
when gbrain is on PATH" section that documents the headline:
  - Conditional inclusion at setup-time (zero overhead for non-gbrain
    users, ~250 tokens with gbrain)
  - Wiring symmetry fix (5 of 5 planning skills now write a page)
  - Token cost table comparing detection states
  - Test coverage map (resolver unit + override mechanism + fake-CLI
    agent obedience + real PGLite round-trip)
  - Why remote routing isn't tested here (gbrain's contract)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(brain): tighten prompt + relax slug assertion in writeback E2E

Two fixes:

1. Prompt: "Slug it 'pixel-fund'" was ambiguous — agent could read it
   as "use pixel-fund as the FULL slug" instead of "substitute
   pixel-fund for <feature-slug>". Replaced with explicit guidance:
   "The feature-slug value to substitute into the SAVE_RESULTS
   template's <feature-slug> placeholder is exactly 'pixel-fund' (no
   path prefix — the template already provides the prefix). Apply the
   SAVE_RESULTS template literally." Also added "Do NOT explore gbrain
   --help" to short-circuit the discovery loop the agent fell into.

2. Slug assertion: was a strict /gbrain put .*office-hours\/pixel-fund/
   regex. This conflated two concerns — agent obedience (does the
   agent actually invoke gbrain put?) vs resolver output shape (does
   the template emit the right prefix?). The latter is already pinned
   by test/resolvers-gbrain-save-results.test.ts at the resolver level
   (free, hermetic). The E2E now asserts /gbrain put .*pixel-fund/
   (slug contains pixel-fund somewhere) plus a recursive payload-file
   search that accepts either office-hours/pixel-fund.md (template-
   faithful) or pixel-fund.md (agent dropped prefix). The YAML
   frontmatter + tag assertions on the payload remain strict — those
   are the real agent-obedience contract.

3. Entity-stub regex: was looking for entities/<name>; agent
   variability uses entity/<name>, people/<name>, companies/<name>.
   Loosened to match entit(y|ies) only. The soft-warning path stays
   (no hard fail) because entity extraction is best-effort prose, not
   a CLI contract.

Verified passing locally: 7 expect() calls, 268s, ~$0.50.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version to 1.51.1.0

main advanced to 1.51.0.0 while this branch was in development. Bump
to 1.51.1.0 (PATCH above main) so the branch lands cleanly above the
current main version per the monotonic-ordered-release invariant.

Renames the branch-internal [1.50.0.0] CHANGELOG entry to [1.51.1.0] —
1.50.0.0 never landed on main (main skipped to 1.51.0.0), so this
consolidates the branch's brain-aware planning + save-results work
under a single shipping version with no orphaned entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 08:35:00 -07:00

1003 lines
44 KiB
TypeScript

#!/usr/bin/env bun
/**
* Generate SKILL.md files from .tmpl templates.
*
* Pipeline:
* read .tmpl → find {{PLACEHOLDERS}} → resolve from source → format → write .md
*
* Supports --dry-run: generate to memory, exit 1 if different from committed file.
* Used by skill:check and CI freshness checks.
*/
import { COMMAND_DESCRIPTIONS } from '../browse/src/commands';
import { SNAPSHOT_FLAGS } from '../browse/src/snapshot';
import { discoverTemplates } from './discover-skills';
import { writeLlmsTxt } from './gen-llms-txt';
import * as fs from 'fs';
import * as path from 'path';
import type { Host, TemplateContext } from './resolvers/types';
import { HOST_PATHS, unwrapResolver } from './resolvers/types';
import { RESOLVERS } from './resolvers/index';
import { externalSkillName, extractHookSafetyProse as _extractHookSafetyProse, extractNameAndDescription as _extractNameAndDescription, condenseOpenAIShortDescription as _condenseOpenAIShortDescription, generateOpenAIYaml as _generateOpenAIYaml } from './resolvers/codex-helpers';
import { generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec } from './resolvers/review';
import { ALL_HOST_CONFIGS, ALL_HOST_NAMES, resolveHostArg, getHostConfig } from '../hosts/index';
import type { HostConfig } from './host-config';
const ROOT = path.resolve(import.meta.dir, '..');
const DRY_RUN = process.argv.includes('--dry-run');
// ─── GBrain Detection Override ──────────────────────────────
// When --respect-detection is passed, read ~/.gstack/gbrain-detection.json
// and un-suppress GBRAIN_CONTEXT_LOAD + GBRAIN_SAVE_RESULTS for hosts that
// statically suppress them (claude, codex, slate, factory, opencode,
// openclaw, cursor, kiro). Detection state is produced by
// bin/gstack-gbrain-detect and persisted by `gstack-config gbrain-refresh`
// or by ./setup.
//
// Default (no flag): static suppressedResolvers honored as-is. Used by
// `bun run gen:skill-docs` (CI + canonical checked-in SKILL.md files) so
// the committed output is reproducible regardless of any developer's
// local gbrain installation state. Use `bun run gen:skill-docs:user`
// (which adds --respect-detection) for user-local installs.
const RESPECT_DETECTION = process.argv.includes('--respect-detection');
function loadGbrainOverride(): { detected: boolean } {
if (!RESPECT_DETECTION) return { detected: false };
const stateDir = process.env.GSTACK_HOME || path.join(process.env.HOME || '', '.gstack');
const detectionPath = path.join(stateDir, 'gbrain-detection.json');
try {
const json = JSON.parse(fs.readFileSync(detectionPath, 'utf-8')) as { gbrain_local_status?: string };
return { detected: json.gbrain_local_status === 'ok' };
} catch {
return { detected: false };
}
}
const GBRAIN_OVERRIDE = loadGbrainOverride();
/**
* Compute effective suppressedResolvers for a host, applying the gbrain
* detection override when enabled. When the override fires, GBRAIN_*
* resolvers are removed from the suppression set so they render in the
* generated SKILL.md.
*/
function effectiveSuppressedResolvers(hostConfig: HostConfig): Set<string> {
let list = hostConfig.suppressedResolvers || [];
if (GBRAIN_OVERRIDE.detected) {
list = list.filter(r => r !== 'GBRAIN_CONTEXT_LOAD' && r !== 'GBRAIN_SAVE_RESULTS');
}
return new Set(list);
}
// ─── Host Detection (config-driven) ─────────────────────────
const HOST_ARG = process.argv.find(a => a.startsWith('--host'));
type HostArg = Host | 'all';
const HOST_ARG_VAL: HostArg = (() => {
if (!HOST_ARG) return 'claude';
const val = HOST_ARG.includes('=') ? HOST_ARG.split('=')[1] : process.argv[process.argv.indexOf(HOST_ARG) + 1];
if (val === 'all') return 'all';
try {
return resolveHostArg(val) as Host;
} catch {
throw new Error(`Unknown host: ${val}. Use ${ALL_HOST_NAMES.join(', ')}, or all.`);
}
})();
// For single-host mode, HOST is the host. For --host all, it's set per iteration below.
let HOST: Host = HOST_ARG_VAL === 'all' ? 'claude' : HOST_ARG_VAL;
// ─── Model Overlay Selection ────────────────────────────────
// --model is explicit. We do NOT auto-detect from host (host ≠ model).
// Default is 'claude'. Missing overlay file → empty string (graceful).
import { ALL_MODEL_NAMES, resolveModel, type Model } from './models';
const MODEL_ARG = process.argv.find(a => a.startsWith('--model'));
const MODEL_ARG_VAL: Model = (() => {
if (!MODEL_ARG) return 'claude';
const val = MODEL_ARG.includes('=') ? MODEL_ARG.split('=')[1] : process.argv[process.argv.indexOf(MODEL_ARG) + 1];
const resolved = resolveModel(val);
if (!resolved) {
throw new Error(`Unknown model: ${val}. Use ${ALL_MODEL_NAMES.join(', ')}, or a family variant (e.g., claude-opus-4-7, gpt-5.4-mini, o3).`);
}
return resolved;
})();
// ─── Catalog Mode (v1.45.0.0 T4) ────────────────────────────
// 'trim' (default): shorten frontmatter description to lead sentence,
// move routing/voice prose into a "## When to invoke" body section, and
// emit scripts/proactive-suggestions.json (single file across all skills).
// 'full': legacy v1.44 behavior — full description stays in frontmatter.
const CATALOG_MODE_ARG = process.argv.find(a => a.startsWith('--catalog-mode'));
const CATALOG_MODE: 'trim' | 'full' = (() => {
if (!CATALOG_MODE_ARG) return 'trim';
const val = CATALOG_MODE_ARG.includes('=')
? CATALOG_MODE_ARG.split('=')[1]
: process.argv[process.argv.indexOf(CATALOG_MODE_ARG) + 1];
if (val !== 'trim' && val !== 'full') {
throw new Error(`Unknown catalog mode: ${val}. Use 'trim' (default) or 'full'.`);
}
return val;
})();
// ─── Explain-level Overlay ──────────────────────────────────
// --explain-level=terse compresses preamble prose (writing-style, completeness,
// confusion-protocol, context-health) to a single pointer line at gen time.
// Default keeps the runtime-conditional behavior (sections render unconditionally,
// the model skips them when EXPLAIN_LEVEL: terse appears in the preamble echo).
// Opt-in via the build flag so most users get the runtime-flexible default.
const EXPLAIN_LEVEL_ARG = process.argv.find(a => a.startsWith('--explain-level'));
const EXPLAIN_LEVEL: 'default' | 'terse' = (() => {
if (!EXPLAIN_LEVEL_ARG) return 'default';
const val = EXPLAIN_LEVEL_ARG.includes('=')
? EXPLAIN_LEVEL_ARG.split('=')[1]
: process.argv[process.argv.indexOf(EXPLAIN_LEVEL_ARG) + 1];
if (val !== 'default' && val !== 'terse') {
throw new Error(`Unknown explain level: ${val}. Use 'default' or 'terse'.`);
}
return val;
})();
// HostPaths, HOST_PATHS, and TemplateContext imported from ./resolvers/types (line 7-8)
// Design constants (AI_SLOP_BLACKLIST, OPENAI_HARD_REJECTIONS, OPENAI_LITMUS_CHECKS)
// live in ./resolvers/constants and are consumed by resolvers directly.
// ─── External Host Helpers ───────────────────────────────────
// Re-export local copy for use in this file (matches codex-helpers.ts)
// Accepts optional frontmatter name to support directory/invocation name divergence
function externalSkillName(skillDir: string, frontmatterName?: string): string {
// Root skill (skillDir === '' or '.') always maps to 'gstack' regardless of frontmatter
if (skillDir === '.' || skillDir === '') return 'gstack';
// Use frontmatter name when it differs from directory name (e.g., run-tests/ with name: test)
const baseName = frontmatterName && frontmatterName !== skillDir ? frontmatterName : skillDir;
// Don't double-prefix: gstack-upgrade → gstack-upgrade (not gstack-gstack-upgrade)
if (baseName.startsWith('gstack-')) return baseName;
return `gstack-${baseName}`;
}
function extractNameAndDescription(content: string): { name: string; description: string } {
const fmStart = content.indexOf('---\n');
if (fmStart !== 0) return { name: '', description: '' };
const fmEnd = content.indexOf('\n---', fmStart + 4);
if (fmEnd === -1) return { name: '', description: '' };
const frontmatter = content.slice(fmStart + 4, fmEnd);
const nameMatch = frontmatter.match(/^name:\s*(.+)$/m);
const name = nameMatch ? nameMatch[1].trim() : '';
let description = '';
const lines = frontmatter.split('\n');
let inDescription = false;
const descLines: string[] = [];
for (const line of lines) {
if (line.match(/^description:\s*\|?\s*$/)) {
inDescription = true;
continue;
}
if (line.match(/^description:\s*\S/)) {
description = line.replace(/^description:\s*/, '').trim();
break;
}
if (inDescription) {
if (line === '' || line.match(/^\s/)) {
descLines.push(line.replace(/^ /, ''));
} else {
break;
}
}
}
if (descLines.length > 0) {
description = descLines.join('\n').trim();
}
return { name, description };
}
// ─── Voice Trigger Processing ────────────────────────────────
/**
* Extract voice-triggers YAML list from frontmatter.
* Returns an array of trigger strings, or [] if no voice-triggers field.
*/
function extractVoiceTriggers(content: string): string[] {
const fmStart = content.indexOf('---\n');
if (fmStart !== 0) return [];
const fmEnd = content.indexOf('\n---', fmStart + 4);
if (fmEnd === -1) return [];
const frontmatter = content.slice(fmStart + 4, fmEnd);
const triggers: string[] = [];
let inVoice = false;
for (const line of frontmatter.split('\n')) {
if (/^voice-triggers:/.test(line)) { inVoice = true; continue; }
if (inVoice) {
const m = line.match(/^\s+-\s+"(.+)"$/);
if (m) triggers.push(m[1]);
else if (!/^\s/.test(line)) break;
}
}
return triggers;
}
/**
* Preprocess voice triggers: fold voice-triggers YAML field into description,
* then strip the field from frontmatter. Must run BEFORE transformFrontmatter
* and extractNameAndDescription so all hosts see the updated description.
*/
function processVoiceTriggers(content: string): string {
const triggers = extractVoiceTriggers(content);
if (triggers.length === 0) return content;
// Strip voice-triggers block from frontmatter
content = content.replace(/^voice-triggers:\n(?:\s+-\s+"[^"]*"\n?)*/m, '');
// Get current description (after stripping voice-triggers, so it's clean)
const { description } = extractNameAndDescription(content);
if (!description) return content;
// Build new description with voice triggers appended
const voiceLine = `Voice triggers (speech-to-text aliases): ${triggers.map(t => `"${t}"`).join(', ')}.`;
const newDescription = description + '\n' + voiceLine;
// Replace old indented description with new in frontmatter
const oldIndented = description.split('\n').map(l => ` ${l}`).join('\n');
const newIndented = newDescription.split('\n').map(l => ` ${l}`).join('\n');
content = content.replace(oldIndented, newIndented);
return content;
}
// Export for testing
export { extractVoiceTriggers, processVoiceTriggers };
// ─── Catalog Trim (v1.45.0.0 T4) ─────────────────────────────
//
// Frontmatter `description:` blocks today pack: a one-line outcome, "Use when
// asked to..." voice triggers, "Proactively..." routing guidance, and a
// "(gstack)" tag. This pile is the always-loaded catalog surface — every
// session pays for the full text. The catalog trim splits the description
// into a one-line catalog entry (lead sentence + "(gstack)") that stays in
// the frontmatter, and a "## When to invoke" body section that holds the
// routing/voice triggers prose for in-skill discovery. A registry written
// to scripts/proactive-suggestions.json (one entry per skill) makes routing
// available to agents that need it without paying the always-loaded cost.
//
// Opt-out: `--catalog-mode=full` keeps v1.44 behavior (no trim, full
// description in frontmatter). Use when debugging routing regressions or
// when shipping skills to hosts that depend on the legacy fat catalog.
export interface CatalogParts {
lead: string; // First sentence — kept in catalog
routingProse: string; // "Use when asked to...", "Proactively..." paragraphs
voiceLine: string | null; // "Voice triggers (speech-to-text aliases): ..." line if present
hasGstackTag: boolean;
}
export function splitCatalogDescription(description: string): CatalogParts {
// Voice triggers line (folded in by processVoiceTriggers earlier)
const voiceMatch = description.match(/Voice triggers \(speech-to-text aliases\):[^\n]+/);
const voiceLine = voiceMatch ? voiceMatch[0] : null;
let working = voiceLine ? description.replace(voiceLine, '').trim() : description.trim();
const hasGstackTag = /\(gstack\)/.test(working);
if (hasGstackTag) working = working.replace(/\(gstack\)/, '').trim();
// Lead = first sentence (up to first period followed by space or end of string).
// We tolerate sentences with embedded periods (URLs, "v1.45.0.0") by requiring
// the period to be followed by whitespace OR end-of-text.
// First normalize to single-line for sentence detection, then back out.
const collapsed = working.replace(/\s+/g, ' ').trim();
const sentenceMatch = collapsed.match(/^([^.!?]*[.!?])(?:\s|$)/);
// sentenceLead is the FULL first sentence (no truncation). We compute routing
// from this position, then optionally truncate the displayed lead afterwards.
// Truncating first then computing routing was the v1.45.0.0 bug — when the
// first sentence exceeded 200 chars, the routing extraction would lose the
// entire tail of the description (design-consultation's "Use when..."
// routing prose silently dropped).
const sentenceLead = sentenceMatch ? sentenceMatch[1].trim() : collapsed.split(/\s/).slice(0, 20).join(' ');
// Routing prose: everything AFTER the first sentence boundary in the collapsed view.
const leadInCollapsed = collapsed.indexOf(sentenceLead);
const routingCollapsed = leadInCollapsed >= 0
? collapsed.slice(leadInCollapsed + sentenceLead.length).trim()
: '';
// Now produce the displayed lead — truncated if too long. The original
// sentenceLead is preserved for routing extraction below.
let lead = sentenceLead;
if (lead.length > 200) {
const trunc = lead.slice(0, 197);
const lastSpace = trunc.lastIndexOf(' ');
lead = (lastSpace > 60 ? trunc.slice(0, lastSpace) : trunc) + '...';
}
// Restore line breaks for routing prose by mapping back to original layout.
// Use original whitespace structure where possible; fall back to collapsed.
// Anchor recovery on sentenceLead (the untruncated first sentence) — not
// `lead` (which may have a "..." suffix and won't substring-match `working`).
let routingProse = routingCollapsed;
const collapsedLeadIdx = working.replace(/\s+/g, ' ').indexOf(sentenceLead);
if (collapsedLeadIdx >= 0) {
let consumed = 0;
let cut = 0;
for (let i = 0; i < working.length && consumed < collapsedLeadIdx + sentenceLead.length; i++) {
if (/\s/.test(working[i])) {
if (i === 0 || /\s/.test(working[i - 1])) continue;
consumed += 1;
} else {
consumed += 1;
}
cut = i + 1;
}
const tail = working.slice(cut).trim();
if (tail.length > 0) routingProse = tail;
}
return { lead, routingProse, voiceLine, hasGstackTag };
}
/** Build the catalog-trimmed `description:` block. */
export function buildTrimmedDescription(parts: CatalogParts): string {
const lead = parts.lead.trim();
const suffix = parts.hasGstackTag ? ' (gstack)' : '';
return `${lead}${suffix}`;
}
/** Build the body section that holds the routing/voice prose. */
export function buildWhenToInvokeSection(parts: CatalogParts): string {
const lines: string[] = ['## When to invoke this skill', ''];
if (parts.routingProse) {
lines.push(parts.routingProse);
lines.push('');
}
if (parts.voiceLine) {
lines.push(parts.voiceLine);
lines.push('');
}
return lines.join('\n');
}
/**
* Apply catalog trim to a SKILL.md body:
* - shorten frontmatter `description:` to lead + (gstack)
* - insert "## When to invoke" body section AFTER the generated header
* (so it lands near the top of body content, where routing guidance
* belongs)
*
* Returns the rewritten content plus the parts (used for proactive-suggestions
* JSON aggregation at the end of the run).
*/
export function applyCatalogTrim(content: string, skillName: string): { content: string; parts: CatalogParts } | null {
// Locate description block in frontmatter
if (!content.startsWith('---\n')) return null;
const fmEnd = content.indexOf('\n---', 4);
if (fmEnd === -1) return null;
const frontmatter = content.slice(4, fmEnd);
// Match `description: |` block + indented body lines
const descMatch = frontmatter.match(/^description:\s*\|?\s*\n((?:\s{2,}.*(?:\n|$))+)/m)
|| frontmatter.match(/^description:\s+(.+)$/m);
if (!descMatch) return null;
// Extract full description text
let descText: string;
if (descMatch[0].startsWith('description: |') || /^description:\s*\|/.test(descMatch[0])) {
descText = descMatch[1].split('\n').map(l => l.replace(/^\s{2}/, '')).join('\n').trim();
} else {
descText = descMatch[1].trim();
}
// Skip skills with very short descriptions (already trimmed or no routing prose).
// Below ~120 chars, splitting adds no value.
if (descText.length < 120) return null;
const parts = splitCatalogDescription(descText);
// If lead + (gstack) is already most of the text, no trim needed.
const trimmedLen = buildTrimmedDescription(parts).length;
if (trimmedLen >= descText.length - 20) return null;
// Replace description in frontmatter — keep trailing newline so the next
// YAML field doesn't collide on the same line as the description value.
const newDesc = buildTrimmedDescription(parts);
const newFrontmatter = frontmatter.replace(descMatch[0], `description: ${newDesc}\n`);
let newContent = '---\n' + newFrontmatter + content.slice(fmEnd);
// Insert body section after frontmatter (after the closing ---\n and any
// existing GENERATED header). We insert before the first non-comment line.
const bodyStart = newContent.indexOf('\n---\n') + 5;
const whenToInvoke = '\n' + buildWhenToInvokeSection(parts).trim() + '\n';
// Skip past the generated header if present (it lives after frontmatter close)
const headerMatch = newContent.slice(bodyStart).match(/^(<!--[^>]*-->\s*\n)+/);
const insertAt = bodyStart + (headerMatch ? headerMatch[0].length : 0);
newContent = newContent.slice(0, insertAt) + whenToInvoke + '\n' + newContent.slice(insertAt);
return { content: newContent, parts };
}
const OPENAI_SHORT_DESCRIPTION_LIMIT = 120;
function condenseOpenAIShortDescription(description: string): string {
const firstParagraph = description.split(/\n\s*\n/)[0] || description;
const collapsed = firstParagraph.replace(/\s+/g, ' ').trim();
if (collapsed.length <= OPENAI_SHORT_DESCRIPTION_LIMIT) return collapsed;
const truncated = collapsed.slice(0, OPENAI_SHORT_DESCRIPTION_LIMIT - 3);
const lastSpace = truncated.lastIndexOf(' ');
const safe = lastSpace > 40 ? truncated.slice(0, lastSpace) : truncated;
return `${safe}...`;
}
function generateOpenAIYaml(displayName: string, shortDescription: string): string {
return `interface:
display_name: ${JSON.stringify(displayName)}
short_description: ${JSON.stringify(shortDescription)}
default_prompt: ${JSON.stringify(`Use ${displayName} for this task.`)}
policy:
allow_implicit_invocation: true
`;
}
/**
* Transform frontmatter for external hosts.
* Claude: strips `sensitive:` field (only Factory uses it).
* Codex: keeps name + description only, enforces 1024-char limit.
* Factory: keeps name + description + user-invocable, conditionally adds disable-model-invocation.
*/
function transformFrontmatter(content: string, host: Host): string {
const hostConfig = getHostConfig(host);
const fm = hostConfig.frontmatter;
if (fm.mode === 'denylist') {
// Denylist mode: strip listed fields, keep everything else
for (const field of fm.stripFields || []) {
if (field === 'voice-triggers') {
content = content.replace(/^voice-triggers:\n(?:\s+-\s+"[^"]*"\n?)*/m, '');
} else {
content = content.replace(new RegExp(`^${field}:\\s*.*\\n`, 'm'), '');
}
}
return content;
}
// Allowlist mode: reconstruct frontmatter with only allowed fields
const fmStart = content.indexOf('---\n');
if (fmStart !== 0) return content;
const fmEnd = content.indexOf('\n---', fmStart + 4);
if (fmEnd === -1) return content;
const frontmatter = content.slice(fmStart + 4, fmEnd);
const body = content.slice(fmEnd + 4);
const { name, description } = extractNameAndDescription(content);
// Description limit enforcement
if (fm.descriptionLimit) {
const behavior = fm.descriptionLimitBehavior || 'error';
if (description.length > fm.descriptionLimit) {
if (behavior === 'error') {
throw new Error(
`${hostConfig.displayName} description for "${name}" is ${description.length} chars (max ${fm.descriptionLimit}). ` +
`Compress the description in the .tmpl file.`
);
} else if (behavior === 'warn') {
console.warn(`WARNING: ${hostConfig.displayName} description for "${name}" exceeds ${fm.descriptionLimit} chars`);
}
// 'truncate' — silently proceed
}
}
// Build frontmatter with allowed fields
const indentedDesc = description.split('\n').map(l => ` ${l}`).join('\n');
let newFm = `---\nname: ${name}\ndescription: |\n${indentedDesc}\n`;
// Add extra fields (host-wide)
if (fm.extraFields) {
for (const [key, value] of Object.entries(fm.extraFields)) {
if (key !== 'name' && key !== 'description') {
newFm += `${key}: ${value}\n`;
}
}
}
// Add conditional fields
if (fm.conditionalFields) {
for (const rule of fm.conditionalFields) {
const match = Object.entries(rule.if).every(([k, v]) =>
new RegExp(`^${k}:\\s*${v}`, 'm').test(frontmatter)
);
if (match) {
for (const [key, value] of Object.entries(rule.add)) {
newFm += `${key}: ${value}\n`;
}
}
}
}
// Preserve additional keepFields beyond name and description
if (fm.keepFields) {
for (const field of fm.keepFields) {
if (field === 'name' || field === 'description') continue;
// Match YAML field with possible multi-line/array value (indented lines after colon)
const fieldMatch = frontmatter.match(new RegExp(`^${field}:(.*(?:\\n(?:[ \\t]+.+))*)`, 'm'));
if (fieldMatch) {
newFm += `${field}:${fieldMatch[1]}\n`;
}
}
}
// Rename fields (copy values from template frontmatter with new keys)
if (fm.renameFields) {
for (const [oldName, newName] of Object.entries(fm.renameFields)) {
const fieldMatch = frontmatter.match(new RegExp(`^${oldName}:(.+(?:\\n(?:\\s+.+)*)?)`, 'm'));
if (fieldMatch) {
newFm += `${newName}:${fieldMatch[1]}\n`;
}
}
}
newFm += '---';
return newFm + body;
}
/**
* Extract hook descriptions from frontmatter for inline safety prose.
* Returns a description of what the hooks do, or null if no hooks.
*/
function extractHookSafetyProse(tmplContent: string): string | null {
if (!tmplContent.match(/^hooks:/m)) return null;
// Parse the hook matchers to build a human-readable safety description
const matchers: string[] = [];
const matcherRegex = /matcher:\s*"(\w+)"/g;
let m;
while ((m = matcherRegex.exec(tmplContent)) !== null) {
if (!matchers.includes(m[1])) matchers.push(m[1]);
}
if (matchers.length === 0) return null;
// Build safety prose based on what tools are hooked
const toolDescriptions: Record<string, string> = {
Bash: 'check bash commands for destructive operations (rm -rf, DROP TABLE, force-push, git reset --hard, etc.) before execution',
Edit: 'verify file edits are within the allowed scope boundary before applying',
Write: 'verify file writes are within the allowed scope boundary before applying',
};
const safetyChecks = matchers
.map(t => toolDescriptions[t] || `check ${t} operations for safety`)
.join(', and ');
return `> **Safety Advisory:** This skill includes safety checks that ${safetyChecks}. When using this skill, always pause and verify before executing potentially destructive operations. If uncertain about a command's safety, ask the user for confirmation before proceeding.`;
}
// ─── External Host Config (now derived from hosts/*.ts) ──────
// EXTERNAL_HOST_CONFIG replaced by getHostConfig() from hosts/index.ts
// ─── Template Processing ────────────────────────────────────
const GENERATED_HEADER = `<!-- AUTO-GENERATED from {{SOURCE}} — do not edit directly -->\n<!-- Regenerate: bun run gen:skill-docs -->\n`;
/**
* Process external host output: routing, frontmatter, path rewrites, metadata.
* Shared between Codex and Factory (and future external hosts).
*/
function processExternalHost(
content: string,
tmplContent: string,
host: Host,
skillDir: string,
extractedDescription: string,
ctx: TemplateContext,
frontmatterName?: string,
): { content: string; outputPath: string; outputDir: string; symlinkLoop: boolean } {
const hostConfig = getHostConfig(host);
const name = externalSkillName(skillDir === '.' ? '' : skillDir, frontmatterName);
const outputDir = path.join(ROOT, hostConfig.hostSubdir, 'skills', name);
fs.mkdirSync(outputDir, { recursive: true });
const outputPath = path.join(outputDir, 'SKILL.md');
// Guard against symlink loops
let symlinkLoop = false;
const claudePath = ctx.tmplPath.replace(/\.tmpl$/, '');
try {
const resolvedClaude = fs.realpathSync(claudePath);
const resolvedExternal = fs.realpathSync(path.dirname(outputPath)) + '/' + path.basename(outputPath);
if (resolvedClaude === resolvedExternal) {
symlinkLoop = true;
}
} catch {
// realpathSync fails if file doesn't exist yet — no symlink loop
}
// Extract hook safety prose BEFORE transforming frontmatter (which strips hooks)
const safetyProse = extractHookSafetyProse(tmplContent);
// Transform frontmatter (host-aware)
let result = transformFrontmatter(content, host);
// Insert safety advisory at the top of the body (after frontmatter)
if (safetyProse) {
const bodyStart = result.indexOf('\n---') + 4;
result = result.slice(0, bodyStart) + '\n' + safetyProse + '\n' + result.slice(bodyStart);
}
// Config-driven path rewrites (order matters, replaceAll)
for (const rewrite of hostConfig.pathRewrites) {
result = result.replaceAll(rewrite.from, rewrite.to);
}
// Config-driven tool rewrites
if (hostConfig.toolRewrites) {
for (const [from, to] of Object.entries(hostConfig.toolRewrites)) {
result = result.replaceAll(from, to);
}
}
// Config-driven: generate metadata (e.g., openai.yaml for Codex)
if (hostConfig.generation.generateMetadata && !symlinkLoop) {
const agentsDir = path.join(outputDir, 'agents');
fs.mkdirSync(agentsDir, { recursive: true });
const shortDescription = condenseOpenAIShortDescription(extractedDescription);
fs.writeFileSync(path.join(agentsDir, 'openai.yaml'), generateOpenAIYaml(name, shortDescription));
}
return { content: result, outputPath, outputDir, symlinkLoop };
}
function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath: string; content: string; symlinkLoop?: boolean; catalogParts?: CatalogParts | null } {
const tmplContent = fs.readFileSync(tmplPath, 'utf-8');
const relTmplPath = path.relative(ROOT, tmplPath);
let outputPath = tmplPath.replace(/\.tmpl$/, '');
// Determine skill directory relative to ROOT
const skillDir = path.relative(ROOT, path.dirname(tmplPath));
// Extract skill name from frontmatter early — needed for both TemplateContext and external host output paths.
// When frontmatter name: differs from directory name (e.g., run-tests/ with name: test),
// the frontmatter name is used for external skill naming and setup script symlinks.
const { name: extractedName, description: extractedDescription } = extractNameAndDescription(tmplContent);
const skillName = extractedName || path.basename(path.dirname(tmplPath));
// Extract benefits-from list from frontmatter (inline YAML: benefits-from: [a, b])
const benefitsMatch = tmplContent.match(/^benefits-from:\s*\[([^\]]*)\]/m);
const benefitsFrom = benefitsMatch
? benefitsMatch[1].split(',').map(s => s.trim()).filter(Boolean)
: undefined;
// Extract preamble-tier from frontmatter (1-4, controls which preamble sections are included)
const tierMatch = tmplContent.match(/^preamble-tier:\s*(\d+)$/m);
const preambleTier = tierMatch ? parseInt(tierMatch[1], 10) : undefined;
// Extract interactive flag from frontmatter (generator-only; controls plan-mode handshake inclusion)
const interactiveMatch = tmplContent.match(/^interactive:\s*(true|false)\s*$/m);
const interactive = interactiveMatch ? interactiveMatch[1] === 'true' : undefined;
const ctx: TemplateContext = { skillName, tmplPath, benefitsFrom, host, paths: HOST_PATHS[host], preambleTier, model: MODEL_ARG_VAL, interactive, explainLevel: EXPLAIN_LEVEL };
// Replace placeholders (supports parameterized: {{NAME:arg1:arg2}})
// Config-driven: suppressedResolvers return empty string for this host.
// effectiveSuppressedResolvers() honors --respect-detection: when gbrain
// is detected locally, GBRAIN_* resolvers un-suppress so brain-aware
// blocks render for users who have gbrain installed.
const currentHostConfig = getHostConfig(host);
const suppressed = effectiveSuppressedResolvers(currentHostConfig);
let content = tmplContent.replace(/\{\{(\w+(?::[^}]+)?)\}\}/g, (match, fullKey) => {
const parts = fullKey.split(':');
const resolverName = parts[0];
const args = parts.slice(1);
if (suppressed.has(resolverName)) return '';
const entry = RESOLVERS[resolverName];
if (!entry) throw new Error(`Unknown placeholder {{${resolverName}}} in ${relTmplPath}`);
const { resolve, appliesTo } = unwrapResolver(entry);
if (appliesTo && !appliesTo(ctx)) return '';
return args.length > 0 ? resolve(ctx, args) : resolve(ctx);
});
// Check for any remaining unresolved placeholders
const remaining = content.match(/\{\{(\w+(?::[^}]+)?)\}\}/g);
if (remaining) {
throw new Error(`Unresolved placeholders in ${relTmplPath}: ${remaining.join(', ')}`);
}
// Preprocess voice triggers: fold into description, strip field from frontmatter.
// Must run BEFORE transformFrontmatter so all hosts see the updated description,
// and BEFORE extractedDescription is used by external host metadata.
content = processVoiceTriggers(content);
// Re-extract description AFTER voice trigger preprocessing so Codex openai.yaml
// metadata gets the updated description with voice triggers included.
const postProcessDescription = extractNameAndDescription(content).description;
// For Claude: strip sensitive: field (only Factory uses it)
// For external hosts: route output, transform frontmatter, rewrite paths
let symlinkLoop = false;
if (host === 'claude') {
content = transformFrontmatter(content, host);
} else {
const result = processExternalHost(content, tmplContent, host, skillDir, postProcessDescription, ctx, extractedName || undefined);
content = result.content;
outputPath = result.outputPath;
symlinkLoop = result.symlinkLoop;
}
// Prepend generated header (after frontmatter)
const header = GENERATED_HEADER.replace('{{SOURCE}}', path.basename(tmplPath));
const fmEnd = content.indexOf('---', content.indexOf('---') + 3);
if (fmEnd !== -1) {
const insertAt = content.indexOf('\n', fmEnd) + 1;
content = content.slice(0, insertAt) + header + content.slice(insertAt);
} else {
content = header + content;
}
// Catalog trim (Claude only — external hosts have their own frontmatter shapes)
let catalogParts: CatalogParts | null = null;
if (host === 'claude' && CATALOG_MODE === 'trim') {
const trimmed = applyCatalogTrim(content, skillName);
if (trimmed) {
content = trimmed.content;
catalogParts = trimmed.parts;
}
}
return { outputPath, content, symlinkLoop, catalogParts };
}
// ─── Main ───────────────────────────────────────────────────
function findTemplates(): string[] {
return discoverTemplates(ROOT).map(t => path.join(ROOT, t.tmpl));
}
const ALL_HOSTS: Host[] = ALL_HOST_NAMES as Host[];
const hostsToRun: Host[] = HOST_ARG_VAL === 'all' ? ALL_HOSTS : [HOST];
const failures: { host: string; error: Error }[] = [];
for (const currentHost of hostsToRun) {
HOST = currentHost;
try {
let hasChanges = false;
const tokenBudget: Array<{ skill: string; lines: number; tokens: number }> = [];
// T4 catalog trim: collect routing/voice parts across all Claude skills,
// then write scripts/proactive-suggestions.json once per gen-skill-docs run.
const proactiveAggregate: Record<string, {
lead: string;
routing: string;
voice_line: string | null;
}> = {};
const currentHostConfig = getHostConfig(currentHost);
for (const tmplPath of findTemplates()) {
const dir = path.basename(path.dirname(tmplPath));
// includeSkills allowlist (union logic: include minus skip)
if (currentHostConfig.generation.includeSkills?.length) {
if (!currentHostConfig.generation.includeSkills.includes(dir)) continue;
}
// skipSkills denylist (subtracts from includeSkills or full set)
if (currentHostConfig.generation.skipSkills?.length) {
if (currentHostConfig.generation.skipSkills.includes(dir)) continue;
}
const { outputPath, content, symlinkLoop, catalogParts } = processTemplate(tmplPath, currentHost);
if (catalogParts) {
// Root-skill detection: when the template lives at ROOT/SKILL.md.tmpl,
// path.basename(path.dirname(tmplPath)) returns the repo's directory
// name (e.g. "seville-v3" in a Conductor worktree, "gstack" on CI).
// That's non-deterministic across machines and breaks CI freshness
// checks. Use the frontmatter `name` field as the registry key — the
// root SKILL.md.tmpl declares `name: gstack` explicitly. For all other
// skills, `dir` matches the directory name which matches the
// frontmatter name by convention.
const isRoot = path.dirname(tmplPath) === ROOT;
const key = isRoot ? 'gstack' : dir;
proactiveAggregate[key] = {
lead: catalogParts.lead,
routing: catalogParts.routingProse,
voice_line: catalogParts.voiceLine,
};
}
const relOutput = path.relative(ROOT, outputPath);
if (symlinkLoop) {
console.log(`SKIPPED (symlink loop): ${relOutput}`);
} else if (DRY_RUN) {
const existing = fs.existsSync(outputPath) ? fs.readFileSync(outputPath, 'utf-8') : '';
if (existing !== content) {
console.log(`STALE: ${relOutput}`);
hasChanges = true;
} else {
console.log(`FRESH: ${relOutput}`);
}
} else {
fs.writeFileSync(outputPath, content);
console.log(`GENERATED: ${relOutput}`);
}
// Track token budget
const lines = content.split('\n').length;
const tokens = Math.round(content.length / 4); // ~4 chars per token
tokenBudget.push({ skill: relOutput, lines, tokens });
// Token ceiling check: warn if any generated SKILL.md exceeds ~40K tokens (160KB).
// The ceiling is a "watch for feature bloat" guardrail, not a hard gate. Modern
// flagship models have 200K-1M context windows, so 40K (4-20% of window) is fine.
// Prompt caching further reduces the marginal cost of larger skills. This ceiling
// exists to catch a runaway preamble or resolver that's grown by 10K+ tokens in
// a release, not to force compression on carefully-tuned big skills (ship,
// plan-ceo-review, office-hours all legitimately pack 25-35K tokens of behavior).
const TOKEN_CEILING_BYTES = 160_000;
if (content.length > TOKEN_CEILING_BYTES) {
console.warn(`⚠️ TOKEN CEILING: ${relOutput} is ${content.length} bytes (~${tokens} tokens), exceeds ${TOKEN_CEILING_BYTES} byte ceiling (~40K tokens)`);
}
}
// Generate gstack-lite and gstack-full for OpenClaw host
if (currentHost === 'openclaw' && !DRY_RUN) {
const openclawDir = path.join(ROOT, 'openclaw');
if (!fs.existsSync(openclawDir)) fs.mkdirSync(openclawDir, { recursive: true });
const gstackLite = `# gstack-lite Planning Discipline
Injected by the orchestrator into spawned Claude Code sessions. Append to existing CLAUDE.md.
## Planning Discipline
1. Read every file you will modify. Understand existing patterns first.
2. Before writing code, state your plan: what, why, which files, test case, risk.
3. When ambiguous, prefer: completeness over shortcuts, existing patterns over new ones,
reversible choices over irreversible ones, safe defaults over clever ones.
4. Self-review your changes before reporting done. Check for: missed files, broken
imports, untested paths, style inconsistencies.
5. Report when done: what shipped, what decisions you made, anything uncertain.
`;
fs.writeFileSync(path.join(openclawDir, 'gstack-lite-CLAUDE.md'), gstackLite);
console.log('GENERATED: openclaw/gstack-lite-CLAUDE.md');
const gstackFull = `# gstack-full Pipeline
Injected by the orchestrator for complete feature builds. Append to existing CLAUDE.md.
## Full Pipeline
1. Read CLAUDE.md and understand the project context.
2. Run /autoplan to review your approach (CEO + eng + design review pipeline).
3. Implement the approved plan. Follow the planning discipline above.
4. Run /ship to create a PR with tests, changelog, and version bump.
5. Report back: PR URL, what shipped, decisions made, anything uncertain.
Do not ask for human input until the PR is ready for review.
`;
fs.writeFileSync(path.join(openclawDir, 'gstack-full-CLAUDE.md'), gstackFull);
console.log('GENERATED: openclaw/gstack-full-CLAUDE.md');
const gstackPlan = `# gstack-plan: Full Review Gauntlet
Injected by the orchestrator when the user wants to plan a Claude Code project.
Append to existing CLAUDE.md.
## Planning Pipeline
1. Read CLAUDE.md and understand the project context.
2. Run /office-hours to produce a design doc (problem statement, premises, alternatives).
3. Run /autoplan to review the design (CEO + eng + design + DX reviews + codex adversarial).
4. Save the final reviewed plan to a file the orchestrator can reference later.
Write it to: plans/<project-slug>-plan-<date>.md in the current repo.
Include the design doc, all review decisions, and the implementation sequence.
5. Report back to the orchestrator:
- Plan file path
- One-paragraph summary of what was designed and the key decisions
- List of accepted scope expansions (if any)
- Recommended next step (usually: spawn a new session with gstack-full to implement)
Do not implement anything. This is planning only.
The orchestrator will persist the plan link to its own memory/knowledge store.
`;
fs.writeFileSync(path.join(openclawDir, 'gstack-plan-CLAUDE.md'), gstackPlan);
console.log('GENERATED: openclaw/gstack-plan-CLAUDE.md');
}
if (DRY_RUN && hasChanges) {
console.error(`\nGenerated SKILL.md files are stale (${currentHost} host). Run: bun run gen:skill-docs --host ${currentHost}`);
if (HOST_ARG_VAL !== 'all') process.exit(1);
failures.push({ host: currentHost, error: new Error('Stale files detected') });
}
// T4 catalog trim: write aggregated proactive-suggestions.json (Claude only).
// The JSON registry lets agents pull voice triggers / routing prose for any
// skill on demand instead of paying for it always-loaded in the catalog.
//
// No timestamp field — keeps the file content-deterministic across runs so
// CI dry-run freshness checks don't flap on regen. If a per-run timestamp
// is ever needed for debugging, write it to a separate `.gen-stamp` file.
if (currentHost === 'claude' && CATALOG_MODE === 'trim' && Object.keys(proactiveAggregate).length > 0 && !DRY_RUN) {
const proactivePath = path.join(ROOT, 'scripts', 'proactive-suggestions.json');
// Sort keys alphabetically so the serialized JSON is identical across
// machines regardless of filesystem-iteration order. Without this, CI
// freshness checks fail when the local dev machine and CI runner
// discover templates in different orders.
const sortedSkills: typeof proactiveAggregate = {};
for (const key of Object.keys(proactiveAggregate).sort()) {
sortedSkills[key] = proactiveAggregate[key];
}
const payload = {
$schema: 'https://gstack.dev/schemas/proactive-suggestions.json',
catalog_mode: 'trim',
note: 'Routing / voice-trigger prose extracted from SKILL.md frontmatter descriptions during catalog trim. Loaded on demand when routing guidance is needed.',
skills: sortedSkills,
};
const serialized = JSON.stringify(payload, null, 2) + '\n';
// Only write if content actually changed — prevents needless touches that
// would flap CI freshness checks. Read existing file, compare, skip write
// when identical.
let existing = '';
try { existing = fs.readFileSync(proactivePath, 'utf-8'); } catch { /* first run */ }
if (existing !== serialized) {
fs.writeFileSync(proactivePath, serialized);
}
}
// Print token budget summary
if (!DRY_RUN && tokenBudget.length > 0) {
tokenBudget.sort((a, b) => b.lines - a.lines);
const totalLines = tokenBudget.reduce((s, t) => s + t.lines, 0);
const totalTokens = tokenBudget.reduce((s, t) => s + t.tokens, 0);
console.log('');
console.log(`Token Budget (${currentHost} host)`);
console.log('═'.repeat(60));
for (const t of tokenBudget) {
const hostSubdirs = ALL_HOST_CONFIGS.map(c => c.hostSubdir.replace('.', '\\.')).join('|');
const name = t.skill.replace(/\/SKILL\.md$/, '').replace(new RegExp(`^\\.(${hostSubdirs})\\/skills\\/`), '');
console.log(` ${name.padEnd(30)} ${String(t.lines).padStart(5)} lines ~${String(t.tokens).padStart(6)} tokens`);
}
console.log('─'.repeat(60));
console.log(` ${'TOTAL'.padEnd(30)} ${String(totalLines).padStart(5)} lines ~${String(totalTokens).padStart(6)} tokens`);
console.log('');
}
} catch (e) {
failures.push({ host: currentHost, error: e as Error });
console.error(`WARNING: ${currentHost} generation failed: ${(e as Error).message}`);
}
}
// --host all: report failures. Only exit(1) if claude failed.
if (failures.length > 0 && HOST_ARG_VAL === 'all') {
console.error(`\n${failures.length} host(s) failed: ${failures.map(f => f.host).join(', ')}`);
if (failures.some(f => f.host === 'claude')) process.exit(1);
}
// Single host dry-run failure already handled above
// After all hosts processed, warn if prefix patches may need re-applying
if (!DRY_RUN) {
try {
const configPath = path.join(process.env.HOME || '', '.gstack', 'config.yaml');
if (fs.existsSync(configPath)) {
const config = fs.readFileSync(configPath, 'utf-8');
if (/^skill_prefix:\s*true/m.test(config)) {
console.log('\nNote: skill_prefix is true. Run gstack-relink to re-apply name: patches.');
}
}
} catch { /* non-fatal */ }
}
// Regenerate gstack/llms.txt — single-file capability index for AI agents.
// Runs after SKILL.md generation so it sees current skill descriptions and
// browse command list. Wrapped in an IIFE so the await-import doesn't make
// this module async (test/gen-skill-docs.test.ts uses require() to pull
// extractVoiceTriggers/processVoiceTriggers, which fails on async modules).
// Freshness is asserted in test/llms-txt-shape.test.ts.
if (!DRY_RUN) {
void (async () => {
try {
const result = await writeLlmsTxt();
if (result.warnings.length > 0) {
for (const w of result.warnings) console.error(`[gen-llms-txt] WARN: ${w}`);
} else {
console.log(`[gen-llms-txt] gstack/llms.txt: ${result.skills.length} skills, ${result.browseCommands.length} browse commands`);
}
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
console.error(`[gen-llms-txt] FAILED: ${msg}`);
}
})();
}