mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-30 14:59:41 +02:00
v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712)
* docs(designs): add v2_PLAN.md — gstack v2 the lightest opinionated skill pack The approved plan from /plan-ceo-review → /plan-eng-review → /codex×2 → /plan-devex-review. Captures the v1.45/v2.0 hybrid release shape, cathedral parity-eval suite, sequential v1.45 execution, sections/*.md.tmpl pipeline, EVALS_BUDGET_HARD_CAP override path, and v2 launch copy specs. This commit just lands the design doc. Implementation follows in the rest of the v1.45.0.0 branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(parity): T0a — capture v1.44.1 baseline + capture helper + diff utility Cathedral parity-eval suite primitive. captureBaseline() walks every top-level SKILL.md and records bytes, lines, estimated tokens, frontmatter description length, and eval coverage. diffBaselines() reports per-skill delta + total corpus delta + catalog tokens delta. Locks the v1.44.1 reference snapshot at test/fixtures/parity-baseline-v1.44.1.json. After Phase A+B+C land, scripts/capture-baseline.ts --tag v1.45.0.0 produces a comparable snapshot; diff supplies the real numbers the v2 CHANGELOG quotes. Never invent baseline numbers; ship them only if they came from a real run. v1.44.1 numbers captured this commit: - 51 skills - 2,847 KB total corpus - ~9,319 catalog tokens (sum of description bytes / 4) - top 3: ship 160 KB, plan-ceo-review 128 KB, office-hours 108 KB Test plan: - bun test test/helpers/capture-parity-baseline.test.ts passes 4/4 - The baseline JSON file is committed so reviewers can audit v1→v2 numbers Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(resolvers): T2 — ResolverEntry + appliesTo gate infrastructure Adds the conditional-resolver-injection plumbing from the v2_PLAN A.1 step. Resolvers can now be either a bare ResolverFn (always fires, current behavior) or a ResolverEntry { resolve, appliesTo? } (gated; appliesTo returning false skips the resolver, substitutes empty string). Why infrastructure-only: the audit during T0a confirmed most resolvers don't need gating. The {{NAME}} placeholder system is already conditional at the template level — a resolver only fires for skills that reference it. The gate is for future use when a placeholder's audience needs a structural guardrail beyond social convention, or when a sub-resolver inside a larger composed resolver (e.g. preamble) needs per-skill skip. scripts/gen-skill-docs.ts:444 now uses unwrapResolver() to handle both shapes. RESOLVERS map signature widens from Record<string, ResolverFn> to Record<string, ResolverValue>. All existing resolvers stay bare functions and work unchanged. Test plan: - bun test test/resolver-entry.test.ts: 6 pass (gate plumbing + registry) - bun test test/gen-skill-docs.test.ts: 389 pass (no regression) - bun run gen:skill-docs --dry-run: all SKILL.md files FRESH (no diff) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(preamble): T3 — jargon dedup + terse-build flag (Phase A.2 + A.3) A.2 jargon dedup: generate-writing-style.ts replaces the inlined 80-term jargon list with a one-line pointer to scripts/jargon-list.json. The list was duplicated into every tier-2+ skill (48 of 51 skills); inlining cost was ~1.5 KB × 48 = ~70 KB across the corpus. Pointer cost is ~30 bytes per skill. Agents Read the JSON once per session on first jargon term encountered; thereafter the terms array is the canonical reference. A.3 terse build flag: --explain-level=terse compresses preamble prose at gen time. When the flag is set, writing-style collapses to a one-line terse directive and completeness-section + confusion-protocol + context-health are dropped entirely. The default build keeps the runtime-conditional behavior intact (sections still render; the model skips them when EXPLAIN_LEVEL: terse appears in the preamble echo). Terse build is opt-in for users who want shipped skills to match their runtime preference and avoid the per-session terse-mode dead prose. TemplateContext gains an optional `explainLevel: 'default' | 'terse'` field. Default builds set it to 'default'; --explain-level=terse sets 'terse'. Resolvers gate their output via `ctx?.explainLevel === 'terse'`. Measured impact (default build, post-T3): - Total corpus: 2,847 KB → 2,812 KB (saved 35 KB) - ship.md: 160 → 159 KB - plan-ceo-review.md: 128 → 127 KB - Top 10 heaviest: all slightly smaller from jargon pointer Larger compression lands in T4 (catalog trim) and T7 (atomic regen across the full Phase A pipeline). The terse build path further compresses to ~711K tokens vs default ~725K (saved ~14K tokens corpus-wide). Test plan: - bun test test/gen-skill-docs.test.ts: 389 pass (no regression) - bun test test/resolver-entry.test.ts: 6 pass - bun test test/helpers/capture-parity-baseline.test.ts: 4 pass - bun run gen:skill-docs --explain-level=terse: ship.md drops completeness + confusion-protocol + context-health sections; writing-style collapses to one-line terse directive 48 SKILL.md files updated (every tier-2+ skill picks up the jargon pointer). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(catalog): T4 — catalog trim + proactive-suggestions.json (Phase A.4) Shortens frontmatter `description:` in every Claude SKILL.md to a single lead sentence + (gstack) tag. The routing prose ("Use when asked to...", "Proactively suggest...") and voice triggers move to a "## When to invoke" body section so they remain discoverable inside the skill. A per-run registry at scripts/proactive-suggestions.json aggregates the routing/ voice text for all 52 skills so agents can pull guidance on demand without paying for it in the always-loaded catalog. Build flag --catalog-mode=full restores v1.44 legacy behavior (full multi-line descriptions in frontmatter). Default is trim. splitCatalogDescription() extracts: lead sentence, routing paragraphs, voice-triggers line, (gstack) tag presence. Short descriptions (<120 chars, already trimmed) are skipped via a guard so re-runs are idempotent. Measured impact (vs v1.44.1 baseline): - Catalog tokens (sum of description bytes / 4): 9,319 → 4,045 (-56.6%) - Total SKILL.md corpus bytes: 2,915 KB → 2,880 KB (-1.2%) - Routing prose preserved as in-skill "## When to invoke" sections - 52 skill entries in scripts/proactive-suggestions.json (on-demand registry) The corpus drop is small because catalog trim MOVES text from frontmatter to body, it doesn't delete it. The headline win is the catalog: the always-loaded system prompt surface drops by more than half. Test plan: - bun test test/gen-skill-docs.test.ts: 389 pass, 0 fail - Manual: ship/SKILL.md frontmatter description is now ONE line ending with `(gstack)`; allowed-tools field on next line (YAML well-formed) - Manual: scripts/proactive-suggestions.json contains 52 entries - bun run gen:skill-docs --catalog-mode=full restores legacy behavior 53 files changed (52 SKILL.md across hosts + the new proactive-suggestions.json). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(budget): T5 — hard token budgets + override audit trail (Phase A.6) Two new gate-tier guardrails for the v1.45.0.0 compression baseline: 1. test/skill-size-budget.test.ts (NEW) — per-skill SKILL.md size budget. Compares current state to test/fixtures/parity-baseline-v1.44.1.json. Three checks: per-skill (×1.05 default ratio), total corpus, and catalog token estimate (≤7000 for v1.45). The per-skill ratio is 1.05 not 1.0 because the T4 catalog trim moves text from frontmatter to a body section; small skills see a tiny body growth that's fine when offset by the much larger catalog-token win. 2. test/skill-budget-regression.test.ts EXTENDED — hard dollar cap on per-run eval cost. Per-tier defaults: gate $25, periodic $70. Umbrella EVALS_BUDGET_HARD_CAP=$30. Catches runaway eval costs (infinite retry, model price changes) before they amortize across PRs. Both checks support an override path with audit trail: GSTACK_SIZE_BUDGET_OVERRIDE_REASON="why this is OK" — size EVALS_BUDGET_OVERRIDE_REASON="why this is OK" — cost Overrides log to ~/.gstack/analytics/spend-overrides.jsonl with timestamp + scope + reason + CI provenance (runner, branch, commit) via test/helpers/budget-override.ts. Why the override audit: a hard cap with no escape valve becomes operationally hostile (legit price changes, longer transcripts, new required evals can all blow the cap). An override with no audit becomes "everyone overrides everything and the gate is theater." This module ships the audit half so reviewers can see what was waived and why. Codex 2nd-pass critique #3 absorbed: per-suite caps + override path with auditability + budget baselines checked into repo (parity-baseline-v1.44.1.json already in test/fixtures/). Test plan: - bun test test/skill-size-budget.test.ts: 4 pass (per-skill, corpus, catalog, baseline-exists) - bun test test/skill-budget-regression.test.ts: 4 pass (2 existing ratio checks + 2 new hard-cap checks) - Existing eval runs ($14.11 e2e, $0.02 llm-judge) sit well under the new caps Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(cso): T6 — pin must-preserve security phrases (Phase A.5) cso/SKILL.md is a content-heavy security audit skill (75 KB after T3+T4). Codex 2nd-pass critique #9: "cso exemption too broad ... should still get resolver dedup, catalog trim, sectioning if safe, and targeted evals around must-not-miss checks." T3 (jargon dedup) and T4 (catalog trim) already applied to cso the same way they applied to every other skill — confirmed by inspection: - jargon list NOT inlined (0 inline term lines) - catalog description trimmed to one line (74 bytes vs 774 bytes baseline) - "## When to invoke" body section present T6 work: lock in the security-prose preservation via a gate-tier test that fails CI if future compression strips load-bearing phrases: - OWASP, STRIDE positioning - daily / comprehensive mode discipline - confidence scoring language - active verification ("verif" prefix catches verify/verified/verification) - ## Preamble heading (preamble resolver still fires) Also guards cso against accidental over-stripping: SKILL.md must stay ≥30 KB (currently 75 KB) — a sudden cliff would mean compression went past the targeted-dedup line into structural removal. No structural change to cso. Future Phase B sections/ work for cso requires writing baseline parity tests FIRST per the v2_PLAN.md sequencing. Test plan: - bun test test/cso-preserved.test.ts: 5 pass Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(parity): T0b — cathedral parity-suite harness + invariant registry Adds the harness that the v2_PLAN.md cathedral parity-eval suite is built on. Compares CURRENT SKILL.md output to v1.44.1 baseline along three axes: STRUCTURE frontmatter shape (catalog trim landed, "## When to invoke" present) CONTENT must-preserve phrases per skill family (cso: OWASP/STRIDE; plan-ceo: SCOPE EXPANSION/HOLD SCOPE/REDUCTION; ship: VERSION/CHANGELOG/PR; etc.) SIZE per-skill byte budget (maxSizeRatio + minBytes guards) PARITY_INVARIANTS registry pins 10 load-bearing skills (cso, ship, plan-*- review, review, qa, investigate, office-hours, autoplan). Each entry declares what must NOT regress; future compression that strips these phrases or shrinks a skill past its minBytes cliff fails CI. Periodic-tier LLM-judge parity (paid, ~$0.20/skill) lands in v2.0.0.0 sections/ phase. Same registry, same harness, judge added on top. Test plan: - bun test test/parity-suite.test.ts: 10/10 invariants pass vs v1.44.1 - Per-skill failures get actionable per-line breakdown so a reviewer can see which phrase / heading / size limit went sideways Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(coverage): T1 — skill coverage matrix + structural-compliance floor Phase 0 deliverable — eval-first foundation. Two new test files plus the registry: 1. test/skill-coverage-matrix.ts — single source of truth mapping each skill to its gate-tier + periodic-tier test files. SKILL_COVERAGE record with 51 entries; every gstack skill on disk has at least one gate-tier entry. 2. test/skill-coverage-matrix.test.ts — CI gate. Asserts every skill on disk has a registry entry AND that gate[] is non-empty. Catches "skill added but eval not registered" the moment a new SKILL.md lands. 3. test/skill-coverage-floor.test.ts — per-skill structural compliance (FREE, file-IO only). For each of 51 skills, verifies: - SKILL.md exists - Frontmatter well-formed (name + description fields) - Catalog-trim contract (inline description ≤ 250 chars, or block form) - Generated header present (edit .tmpl, not .md) - Body ≥ 200 bytes (non-trivial content) - No unresolved {{TEMPLATE}} placeholders leaked The "floor" is the minimum eval that every skill ships with. Skills that need deeper behavioral testing get additional entries in their coverage record (e.g., ship has skill-e2e-ship-idempotency + workflow + floor). Future skills only need to add the floor entry and the matrix gate unblocks them. Codex 2nd-pass critique #1 mitigation: eval-first floor is structural compliance (the testable part) — judgment-skill behavior gets layered periodic-tier evals on top. We don't pretend the floor proves correctness, only that the skill structurally compiles. Test plan: - bun test test/skill-coverage-matrix.test.ts: 4 pass (matrix shape + coverage) - bun test test/skill-coverage-floor.test.ts: 309 pass (6 checks × 51 skills + 3 registry-level) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * build(skills): T7 — atomic regenerate + capture v1.45.0.0 baseline Final regen pass across all hosts after T1-T6 work landed. Captures the v1.45.0.0 parity baseline at test/fixtures/parity-baseline-v1.45.0.0.json for diffing against the v1.44.1 reference. Measured deltas (real numbers from test/helpers/capture-parity-baseline.ts): Total SKILL.md corpus 2,847 KB → 2,813 KB (-1.2%) Catalog tokens (always-loaded) ~9,319 → ~4,045 tokens (-56.6%) Top 10 heaviest skills 0.5-1.0% drop each The catalog token cut is the headline. It's the always-loaded surface, i.e. tokens charged on every session start. Per-skill SKILL.md sizes barely moved because T4 catalog trim MOVES routing prose from frontmatter to a body "## When to invoke" section rather than deleting it — the catalog wins without amputating discoverability. The bigger per-skill compression lands in v2.0.0.0 (Phase B sections/ pattern on the 5 heavyweights). v1.45 is the foundation: eval-first infrastructure + cheap wins. scripts/proactive-suggestions.json regenerated with the latest 52 skills listed (one-time write per gen-skill-docs run; aggregated catalog parts). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v1.45.0.0 — gstack v2 foundation: catalog tokens drop 56%, eval-first floor Bumps VERSION + package.json to 1.45.0.0. CHANGELOG entry covers what shipped between v1.44.1 and this release: the cathedral parity-eval foundation, conditional resolver injection plumbing, jargon dedup, terse build flag, catalog trim with one-line frontmatter descriptions, hard token + dollar budget gates with override audit, cso preservation pins, and the v1.44.1 ↔ v1.45.0.0 parity baselines committed to test/fixtures/. Numbers (measured, not estimated): - Catalog tokens: ~9,319 → ~4,045 (-56.6%) - Total corpus: 2,847 KB → 2,813 KB (-1.2%) - Skills with gate-tier eval coverage: 32/51 → 51/51 (floor achieved) This is the foundation release. v2.0.0.0 will ship the architectural break (sections/*.md.tmpl pattern + mechanical Read enforcement + eval-coverage annotations) as a coordinated marketing-grade launch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(catalog): refresh proactive-suggestions.json timestamp after v1.45 bump The generated_at field updates on every gen-skill-docs run; this is the T7 atomic-regenerate output landed alongside the v1.45.0.0 bump. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalog): deterministic proactive-suggestions.json (no per-run timestamp) Original implementation wrote a generated_at timestamp on every gen-skill-docs run. That made CI dry-run freshness checks flap because the file changed on every regeneration even when the actual content (skill descriptions, routing prose, voice triggers) was unchanged. Two fixes: 1. Drop the generated_at field. The file is purely a content registry now. 2. Only write the file when serialized content actually differs from disk. Reproducible test: bun run gen:skill-docs twice in a row now leaves scripts/proactive-suggestions.json unchanged on the second run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalog): preserve routing prose when first sentence exceeds 200 chars splitCatalogDescription truncated the lead BEFORE computing routing extraction, which meant skills whose first sentence was over 200 chars (design-consultation: 207 chars) had their entire routing prose silently dropped — the "## When to invoke" body section came out empty. Root cause: routing was extracted via `collapsed.indexOf(lead)` after lead was suffixed with "...". The "..." never appeared in the original string, so indexOf returned -1 and routingProse fell back to empty. Fix: compute routing from sentenceLead (the untruncated first sentence) BEFORE truncating the displayed lead. The displayed lead still gets "..." when over 200 chars, but the routing extraction uses the real boundary. Also: refresh golden snapshots for claude/codex/factory ship and update two unit tests that asserted v1.44 behavior: - skill-validation.test.ts: trigger-phrase + proactive-routing tests now search whole content, not just frontmatter (T4 moved them to a body "## When to invoke" section) - writing-style-resolver.test.ts: jargon-list assertion now expects the T3 reference pointer, not the inline list Test plan: - bun test test/skill-validation.test.ts test/writing-style-resolver.test.ts test/host-config.test.ts test/skill-size-budget.test.ts test/parity-suite.test.ts test/skill-coverage-matrix.test.ts test/skill-coverage-floor.test.ts test/cso-preserved.test.ts test/resolver-entry.test.ts test/helpers/capture-parity-baseline.test.ts test/gen-skill-docs.test.ts: 1134 pass, 0 fail - Manual verify: design-consultation/SKILL.md "## When to invoke this skill" body section now contains "Use when asked to..." + "Proactively suggest..." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalog): deterministic proactive-suggestions.json across machines CI check-freshness failed because scripts/proactive-suggestions.json serialized differently on local vs CI: 1. Root-skill key leaked the directory name. processTemplate's outer loop computed `dir = path.basename(path.dirname(tmplPath))`. For the root SKILL.md.tmpl at ROOT/SKILL.md.tmpl, that returns the repo-checkout directory name — "seville-v3" in a Conductor worktree, "gstack" on GitHub Actions, anything-else for a fork. Fix: detect root via `path.dirname(tmplPath) === ROOT` and hardcode the key to "gstack" for that one case. 2. Aggregate key order was filesystem-iteration order. discoverTemplates doesn't guarantee stable ordering across platforms, so the JSON `skills` object came out shuffled between machines. Fix: sort Object.keys(proactiveAggregate) alphabetically before serializing. After the fix, the generated file is identical on every machine and matches what's committed. CI freshness check (bun run gen:skill-docs && git diff --exit-code) now passes. Test plan: - bun run gen:skill-docs && bun run gen:skill-docs --dry-run: all FRESH - node -e 'verify keys sorted': sorted match: true - grep -c '"seville-v3"' scripts/proactive-suggestions.json: 0 - Focused test suite: 704 pass, 0 fail Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(catalog): unit + regression coverage for catalog-trim helpers Four exported functions in scripts/gen-skill-docs.ts handle every skill's frontmatter rewrite at gen time but had zero unit tests. Both real bugs we shipped (and fixed) on this branch lived in these functions: v1.45.0.0 design-consultation: when the first sentence exceeded 200 chars, routing-prose extraction lost the entire tail (anchored on truncated lead with "..." that didn't substring-match the original). v1.45.0.0 CI freshness: root-skill key leaked the checkout directory name ("seville-v3" vs "gstack") and aggregate order was filesystem- iteration order. Both shapes are now regression-tested: - splitCatalogDescription: 7 tests covering simple multi-line, >200-char first sentence (design-consultation regression), voice-trigger extraction, no-(gstack) handling, embedded periods (documents known fallback), no-period fragments, and idempotency. - buildTrimmedDescription: 3 tests. - buildWhenToInvokeSection: 3 tests. - applyCatalogTrim: 4 tests covering the standard rewrite, no-op for already-short descriptions, the YAML-collision newline fix, and the malformed-frontmatter null return. - proactive-suggestions.json determinism: 3 tests asserting sorted keys, root keyed as "gstack" (not the worktree directory), and no timestamp/generated_at field that would flap CI freshness. Test plan: - bun test test/catalog-trim.test.ts: 20 pass, 0 fail Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(coverage): fill three remaining v1.46.0.0 test gaps Three untested surfaces from the v1.46.0.0 work. All three would have caught real bugs we shipped (and fixed) on this branch. 1. test/helpers/budget-override.test.ts — 7 tests pin the audit-trail contract for EVALS_BUDGET_OVERRIDE_REASON and GSTACK_SIZE_BUDGET_OVERRIDE_REASON. Without this, the audit logger could silently drop events and overrides become invisible. Tests cover: required fields per JSONL line, CI provenance capture (CI/GITHUB_ACTIONS/branch/commit), local-runner defaults, append-only behavior, missing-directory recovery, and unwritable- path resilience (logs warning instead of throwing). 2. test/terse-build.test.ts — 16 tests pin --explain-level=terse behavior across the 4 gated resolvers and the composed preamble. Default vs terse vs undefined-ctx all asserted. Without this, a refactor that breaks the explainLevel threading silently regresses the opt-in compression path; the runtime EXPLAIN_LEVEL: terse gate still works so users wouldn't notice. Tier-1 invariant pinned (terse-only-affects-tier-2+). 3. test/gen-skill-docs-idempotency.test.ts — 2 tests catch the class of bug behind the v1.45.0.0 timestamp flap. Two consecutive gen-skill-docs runs must produce byte-identical outputs across STABLE_OUTPUTS (proactive-suggestions.json, SKILL.md, ship/SKILL.md, plan-ceo-review/SKILL.md, office-hours/SKILL.md, gstack/llms.txt). --dry-run reports zero stale files after a fresh gen. CI freshness regressions surface as test failures BEFORE a PR is opened. Test plan: - bun test test/helpers/budget-override.test.ts: 7 pass - bun test test/terse-build.test.ts: 16 pass - bun test test/gen-skill-docs-idempotency.test.ts: 2 pass - Full focused suite (15 test files): 1179 pass, 0 fail (+45 new tests vs the pre-fill baseline of 1134) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(coverage): close 5 remaining v1.46.0.0 test gaps (A-E) Five behaviors that v1.46 ships but had no test coverage. All now pinned. A) --host all idempotency (test/gen-skill-docs-idempotency.test.ts) The default test ran Claude host only. Non-Claude hosts (Codex, Factory, Cursor, OpenClaw, GBrain, Slate, OpenCode, Hermes, Kiro) each have their own output paths and could carry their own non-deterministic fields. We hit a "--host all needed for freshness check" mid-/ship. Now: two consecutive `bun run gen:skill-docs --host all` runs must produce byte-identical outputs across a per-host sample (.agents/, .cursor/, .factory/, .gbrain/). Catches per-host adapter regressions before CI. B) --catalog-mode=full opt-out (test/catalog-mode-full.test.ts) The legacy escape hatch had zero tests. 6 new tests across two layers: static (CATALOG_MODE_ARG parsed; conditional gate present; default is "trim"; invalid value throws) + smoke (actual --catalog-mode=full run produces a multi-line `description: |` block + omits "## When to invoke" body section; mutates the working tree then restores in a finally block). C) parity-baseline-v1.44.1.json integrity (test/parity-baseline-integrity.test.ts) The baseline is the source of every v1→v2 number cited in the CHANGELOG v1.46.0.0 entry. Anyone could edit it without test failure until now. 8 new tests pin: existence, tag, capturedFromCommit allowlist, expected v1.44 numbers (51 skills, ~2,915 KB, ~9,319 catalog tokens), CHANGELOG references this file by path, per-skill shape, and a SHA256 byte-stability hash. Any edit fails with a clear "if intentional, update EXPECTED_HASH AND the CHANGELOG numbers" signal. D) Live appliesTo gate end-to-end (test/resolver-entry.test.ts extended) The unwrapResolver unit tests covered the function; the gen-skill-docs.ts substitution loop that USES the gate had no integration coverage. 6 new tests simulate the exact 4-line shape from gen-skill-docs.ts:457-467 against synthetic registries: plain-function fires unconditionally, gated fires when true / empty-string when false, mixed registries compose, parameterized resolvers respect gates, unknown resolvers throw. E) Per-skill min-size floor (test/skill-size-budget.test.ts extended) The existing 200-byte body coverage-floor is a noise floor — a skill that lost 99.75% of content still passes. 1 new test asserts every skill stays ≥80% of its v1.44.1 baseline size (the parity-suite content invariants only covered 10 of 51 skills; the remaining 41 were uncovered). SECTIONS_EXTRACTED hook in place for v2.0.0.0 when the sections/ pattern legitimately shrinks ship/plan-ceo/etc. past the floor. Test plan: - bun test focused 17-file suite: 1202 pass, 0 fail (+23 new tests vs the pre-fill 1179 baseline) - catalog-mode=full mutates working tree then restores cleanly - --host all idempotency runs two full gen passes in <1s on this machine Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+11
-85
@@ -2,13 +2,7 @@
|
||||
name: document-generate
|
||||
preamble-tier: 2
|
||||
version: 1.0.0
|
||||
description: |
|
||||
Generate missing documentation from scratch for a feature, module, or entire project.
|
||||
Uses the Diataxis framework (tutorial / how-to / reference / explanation) to produce
|
||||
complete, structured documentation. Can be invoked standalone or called by
|
||||
/document-release when it finds coverage gaps. Use when asked to "write docs",
|
||||
"generate documentation", "document this feature", "create a tutorial", or
|
||||
"explain this module". (gstack)
|
||||
description: Generate missing documentation from scratch for a feature, module, or entire project. (gstack)
|
||||
allowed-tools:
|
||||
- Bash
|
||||
- Read
|
||||
@@ -29,6 +23,15 @@ triggers:
|
||||
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
|
||||
<!-- Regenerate: bun run gen:skill-docs -->
|
||||
|
||||
|
||||
## When to invoke this skill
|
||||
|
||||
Uses the Diataxis framework (tutorial / how-to / reference / explanation) to produce
|
||||
complete, structured documentation. Can be invoked standalone or called by
|
||||
/document-release when it finds coverage gaps. Use when asked to "write docs",
|
||||
"generate documentation", "document this feature", "create a tutorial", or
|
||||
"explain this module".
|
||||
|
||||
## Preamble (run first)
|
||||
|
||||
```bash
|
||||
@@ -555,84 +558,7 @@ Applies to AskUserQuestion, user replies, and findings. AskUserQuestion Format i
|
||||
- User-turn override wins: if the current message asks for terse / no explanations / just the answer, skip this section.
|
||||
- Terse mode (EXPLAIN_LEVEL: terse): no glosses, no outcome-framing layer, shorter responses.
|
||||
|
||||
Jargon list, gloss on first use if the term appears:
|
||||
- idempotent
|
||||
- idempotency
|
||||
- race condition
|
||||
- deadlock
|
||||
- cyclomatic complexity
|
||||
- N+1
|
||||
- N+1 query
|
||||
- backpressure
|
||||
- memoization
|
||||
- eventual consistency
|
||||
- CAP theorem
|
||||
- CORS
|
||||
- CSRF
|
||||
- XSS
|
||||
- SQL injection
|
||||
- prompt injection
|
||||
- DDoS
|
||||
- rate limit
|
||||
- throttle
|
||||
- circuit breaker
|
||||
- load balancer
|
||||
- reverse proxy
|
||||
- SSR
|
||||
- CSR
|
||||
- hydration
|
||||
- tree-shaking
|
||||
- bundle splitting
|
||||
- code splitting
|
||||
- hot reload
|
||||
- tombstone
|
||||
- soft delete
|
||||
- cascade delete
|
||||
- foreign key
|
||||
- composite index
|
||||
- covering index
|
||||
- OLTP
|
||||
- OLAP
|
||||
- sharding
|
||||
- replication lag
|
||||
- quorum
|
||||
- two-phase commit
|
||||
- saga
|
||||
- outbox pattern
|
||||
- inbox pattern
|
||||
- optimistic locking
|
||||
- pessimistic locking
|
||||
- thundering herd
|
||||
- cache stampede
|
||||
- bloom filter
|
||||
- consistent hashing
|
||||
- virtual DOM
|
||||
- reconciliation
|
||||
- closure
|
||||
- hoisting
|
||||
- tail call
|
||||
- GIL
|
||||
- zero-copy
|
||||
- mmap
|
||||
- cold start
|
||||
- warm start
|
||||
- green-blue deploy
|
||||
- canary deploy
|
||||
- feature flag
|
||||
- kill switch
|
||||
- dead letter queue
|
||||
- fan-out
|
||||
- fan-in
|
||||
- debounce
|
||||
- throttle (UI)
|
||||
- hydration mismatch
|
||||
- memory leak
|
||||
- GC pause
|
||||
- heap fragmentation
|
||||
- stack overflow
|
||||
- null pointer
|
||||
- dangling pointer
|
||||
- buffer overflow
|
||||
Curated jargon list lives at `~/.claude/skills/gstack/scripts/jargon-list.json` (80+ terms). On the first jargon term you encounter this session, Read that file once; treat the `terms` array as the canonical list. The list is repo-owned and may grow between releases.
|
||||
|
||||
|
||||
## Completeness Principle — Boil the Lake
|
||||
|
||||
Reference in New Issue
Block a user