v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907)

* test: canonical CARVE_GUARDS registry; derive parity + size-budget from it

Single source of truth for the carved-skill set + per-skill invariants
(EQ1). parity-harness.ts sectioned entries and skill-size-budget.ts
SECTIONS_EXTRACTED now derive from it instead of hand-maintained lists.
Closes a pre-existing drift: plan-devex-review was in SECTIONS_EXTRACTED
but had no sectioned parity invariant; now generated. carve-guards.ts is
a pure leaf data module (import type only) to avoid an import cycle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: shared carve-guard check fns with injectable root

discoverCarvedSkills/checkOrdering/checkCompleteness take a root param so
the negative tests can point the real guards at a fixture dir.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: E2 data-driven carve static ordering guard (gate)

Per-PR backstop for every carved skill, one test() per skill, driven by
CARVE_GUARDS staticInvariants. Generalizes + retires the ceo-specific
ordering test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: E1 carve-guard completeness meta-guard (gate)

Asserts filesystem carved set == CARVE_GUARDS set both directions, so a
future carve without a registry entry fails CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: ET1 guard-of-guards negative tests (gate)

Temp fixture broken 3 ways proves E1/E2 actually throw, via the injectable
root. Kills the silent-pass-guard failure class.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: T2 data-driven behavioral section-loading guard (periodic)

One file iterating CARVE_GUARDS, one test() per skill with GSTACK_CARVE_SKILL
cost-scoping (D-CODEX A). external carves (ship, plan-ceo) keep bespoke
tests; testNames aligned to their touchfile keys. Registered in touchfiles.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: defer E3 real-session carve canary to TODOS

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: carve document-release into skeleton + on-demand section

Steps 2-9 (per-file audit, auto-updates, risky-change asks, CHANGELOG
voice polish, cross-doc consistency, TODOS cleanup, VERSION bump, commit +
PR body) move to sections/release-body.md, read on demand after the Step
1.5 coverage map. Skeleton 59,256 -> 45,797 B (-23%); union preserved.
Adds the CARVE_GUARDS entry (auto-extends parity + size-budget via EQ1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: carve design-consultation into skeleton + on-demand section

Phases 3-6 (complete proposal, drill-downs, design preview, writing
DESIGN.md) move to sections/proposal-and-preview.md, read on demand after
product context + research. Skeleton 80,719 -> 59,229 B (-27%); union
preserved. Adds the CARVE_GUARDS entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: carve cso into skeleton + on-demand section (security-safe)

Scope-dependent audit Phases 2-11 move to sections/audit-phases.md. Mode
dispatch (## Arguments, ## Mode Resolution), always-run Phases 0/1, and the
Phase 12 false-positive-filtering exceptions stay ALWAYS-LOADED in the
skeleton. Skeleton 79,383 -> 65,117 B (-18%); union preserved.

Adds a cso CARVE_GUARDS entry with an earliest-use invariant (mustPrecedeStop):
mode dispatch must appear before any STOP-Read, so a directive that decides
which sections to read can't be stranded behind the STOP that reads them
(codex outside-voice #6). carve-guard-checks gains the mustPrecedeStop check.
parity moves cso monolith -> generated carved entry. cso-preserved.test.ts
strengthened: phrases checked against the union, plus an always-loaded
contract on the skeleton (dispatch + FP-filtering, codex #5).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: make redaction/taxonomy tests union-aware for cso + document-release carves

The cso carve moved Secrets Archaeology (prefixes, lib/redact-patterns.ts
pointer, git-history scan) into sections/audit-phases.md, and the
document-release carve moved the Step 9 PR-body redaction scan into
sections/release-body.md. Three content-presence tests asserted that content
in the skeleton SKILL.md/.md.tmpl; they now read the skeleton+sections union
(same fix as cso-preserved + parity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.57.0.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: address pre-landing review (codex) on the carve

- cso section: add a scope-gate header so '--owasp' (and other scoped modes)
  run only their selected phases, not every phase bundled in the section
  ('execute in full' no longer overrides Mode Resolution).
- carve-guard-checks: gateAfterStop now compares against the LAST STOP, not the
  first, so a gate stranded between two STOPs in a multi-STOP skeleton fails.
- TODOS: behavioral section-loading hermeticity (verifier matches global-install
  path, not the fixture) — pre-existing in auq-sdk-capture.ts, deferred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-06-07 19:13:24 -07:00
committed by GitHub
parent 476b0ec597
commit e722c5bf89
34 changed files with 2981 additions and 2071 deletions
+53
View File
@@ -1,5 +1,58 @@
# Changelog
## [1.57.0.0] - 2026-06-07
## **Three more heavyweight skills load lighter, and every carved skill finally has a test that proves it loads.**
## **`/cso`, `/document-release`, and `/design-consultation` shed ~49KB of always-loaded prose; CI now blocks any carve that ships without its guards.**
gstack splits its biggest skills into a small always-loaded skeleton plus on-demand
sections that load only when a step needs them. This release carves three more,
`/document-release`, `/design-consultation`, and `/cso`, so the first time you invoke
them the agent reads far less. It also closes a gap from the earlier carves: only two
of six already-carved skills had a test proving an agent actually reads the section it
was told to read. Now all nine carved skills are guarded the same way, and CI blocks
any future carve that ships without its guards. `/cso` got extra care: its mode
dispatch and false-positive-filtering rules stay always-loaded, so a security audit
can never run with a rule stranded in an unread section.
### The numbers that matter
Measured with `wc -c <skill>/SKILL.md`; the skeleton+sections union is reproduced by
`bun test test/parity-suite.test.ts test/skill-size-budget.test.ts`.
| Skill | Always-loaded before | After | Δ |
|---|---|---|---|
| /design-consultation | 80,719 B | 59,229 B | **27%** |
| /document-release | 59,256 B | 45,797 B | **23%** |
| /cso | 79,383 B | 65,117 B | **18%** |
| Carved skills with a section-load guard | 2 of 6 | 9 of 9 | **full coverage** |
Total always-loaded prose across the three skills drops about 49KB (~12K tokens) on
first invoke, with nothing lost: every line moved into an on-demand section the
skeleton points at, and the parity suite checks the union still contains it.
### What this means for you
Run `/cso`, `/document-release`, or `/design-consultation` and the agent does less
reading before it starts working, so the session stays leaner. The carve pattern is
now safe to extend: a free static test runs on every PR and a behavioral test runs
weekly to prove the agent reads each section, so future slimming can't quietly drop
behavior. Nothing about how you invoke these skills changed.
### Itemized changes
#### Added
- Canonical carved-skill guard registry (`test/helpers/carve-guards.ts`): one source of truth for which skills are carved and what each must preserve. `parity-harness.ts` and `skill-size-budget.ts` derive their carved-skill lists from it.
- Carve guard suite: data-driven static ordering test, behavioral section-loading test (periodic), a completeness meta-guard that fails CI if a carved skill lacks its guards, and negative tests proving the guards actually fire.
- `/cso`, `/document-release`, and `/design-consultation` carved into skeleton + on-demand sections.
#### Changed
- `/cso` keeps its mode dispatch (`## Arguments`, `## Mode Resolution`), always-run phases, and false-positive-filtering exceptions always-loaded; an earliest-use invariant enforces that dispatch appears before any on-demand read.
#### For contributors
- Redaction, taxonomy, and parity content tests now read the skeleton+sections union so relocated prose still counts toward coverage.
- Real-session section-read canary deferred to TODOS (the deterministic guards ship first).
## [1.56.1.0] - 2026-06-03
## **`/sync-gbrain` can no longer delete your repo. Cleanup now refuses any directory it cannot prove it created.**