v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907)

* test: canonical CARVE_GUARDS registry; derive parity + size-budget from it

Single source of truth for the carved-skill set + per-skill invariants
(EQ1). parity-harness.ts sectioned entries and skill-size-budget.ts
SECTIONS_EXTRACTED now derive from it instead of hand-maintained lists.
Closes a pre-existing drift: plan-devex-review was in SECTIONS_EXTRACTED
but had no sectioned parity invariant; now generated. carve-guards.ts is
a pure leaf data module (import type only) to avoid an import cycle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: shared carve-guard check fns with injectable root

discoverCarvedSkills/checkOrdering/checkCompleteness take a root param so
the negative tests can point the real guards at a fixture dir.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: E2 data-driven carve static ordering guard (gate)

Per-PR backstop for every carved skill, one test() per skill, driven by
CARVE_GUARDS staticInvariants. Generalizes + retires the ceo-specific
ordering test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: E1 carve-guard completeness meta-guard (gate)

Asserts filesystem carved set == CARVE_GUARDS set both directions, so a
future carve without a registry entry fails CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: ET1 guard-of-guards negative tests (gate)

Temp fixture broken 3 ways proves E1/E2 actually throw, via the injectable
root. Kills the silent-pass-guard failure class.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: T2 data-driven behavioral section-loading guard (periodic)

One file iterating CARVE_GUARDS, one test() per skill with GSTACK_CARVE_SKILL
cost-scoping (D-CODEX A). external carves (ship, plan-ceo) keep bespoke
tests; testNames aligned to their touchfile keys. Registered in touchfiles.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: defer E3 real-session carve canary to TODOS

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: carve document-release into skeleton + on-demand section

Steps 2-9 (per-file audit, auto-updates, risky-change asks, CHANGELOG
voice polish, cross-doc consistency, TODOS cleanup, VERSION bump, commit +
PR body) move to sections/release-body.md, read on demand after the Step
1.5 coverage map. Skeleton 59,256 -> 45,797 B (-23%); union preserved.
Adds the CARVE_GUARDS entry (auto-extends parity + size-budget via EQ1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: carve design-consultation into skeleton + on-demand section

Phases 3-6 (complete proposal, drill-downs, design preview, writing
DESIGN.md) move to sections/proposal-and-preview.md, read on demand after
product context + research. Skeleton 80,719 -> 59,229 B (-27%); union
preserved. Adds the CARVE_GUARDS entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: carve cso into skeleton + on-demand section (security-safe)

Scope-dependent audit Phases 2-11 move to sections/audit-phases.md. Mode
dispatch (## Arguments, ## Mode Resolution), always-run Phases 0/1, and the
Phase 12 false-positive-filtering exceptions stay ALWAYS-LOADED in the
skeleton. Skeleton 79,383 -> 65,117 B (-18%); union preserved.

Adds a cso CARVE_GUARDS entry with an earliest-use invariant (mustPrecedeStop):
mode dispatch must appear before any STOP-Read, so a directive that decides
which sections to read can't be stranded behind the STOP that reads them
(codex outside-voice #6). carve-guard-checks gains the mustPrecedeStop check.
parity moves cso monolith -> generated carved entry. cso-preserved.test.ts
strengthened: phrases checked against the union, plus an always-loaded
contract on the skeleton (dispatch + FP-filtering, codex #5).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: make redaction/taxonomy tests union-aware for cso + document-release carves

The cso carve moved Secrets Archaeology (prefixes, lib/redact-patterns.ts
pointer, git-history scan) into sections/audit-phases.md, and the
document-release carve moved the Step 9 PR-body redaction scan into
sections/release-body.md. Three content-presence tests asserted that content
in the skeleton SKILL.md/.md.tmpl; they now read the skeleton+sections union
(same fix as cso-preserved + parity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.57.0.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: address pre-landing review (codex) on the carve

- cso section: add a scope-gate header so '--owasp' (and other scoped modes)
  run only their selected phases, not every phase bundled in the section
  ('execute in full' no longer overrides Mode Resolution).
- carve-guard-checks: gateAfterStop now compares against the LAST STOP, not the
  first, so a gate stranded between two STOPs in a multi-STOP skeleton fails.
- TODOS: behavioral section-loading hermeticity (verifier matches global-install
  path, not the fixture) — pre-existing in auq-sdk-capture.ts, deferred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-06-07 19:13:24 -07:00
committed by GitHub
parent 476b0ec597
commit e722c5bf89
34 changed files with 2981 additions and 2071 deletions
+51
View File
@@ -2283,3 +2283,54 @@ into `test/helpers/fake-gbrain.ts` when the second consumer arrives
runs).
**Depends on:** None.
### P2: Real-session carve canary (E3, deferred from carve-guard plan)
**What:** Wire a real-session section-Read-miss canary on top of the
carved skills. When a real user session drives a carved skill and the
agent does NOT Read a section the skeleton's STOP directive pointed it
at, log it (salted, content-free) to
`~/.gstack/analytics/section-reads.jsonl` and surface drift via
`bun run eval:summary`. Non-blocking alert, never a merge gate
(real-session data is non-deterministic).
**Why:** The static (E2) + behavioral (T2) guards prove carves are
structurally sound and that a real agent Reads sections in a controlled
eval. They do NOT see production drift — a prompt-context change that
makes live agents start skipping a section. The canary is the only
mechanism that catches that, from real usage.
**Context:** Deferred from the carve-guard-hardening plan (D5→T2, codex
outside-voice #7). `test/helpers/transcript-section-logger.ts` exists but
is built for deterministic test transcripts + ship action fingerprints,
NOT real-session drift — it needs rework before it can back this. Ship
the deterministic guards first; add this once they've proven useful. The
carved-skill set + each skill's `requiredReads` are already declared in
`test/helpers/carve-guards.ts`, so the canary reads its expectations
from there.
**Effort:** M (human ~2d, CC ~4h).
**Depends on:** `transcript-section-logger.ts` real-session-drift rework.
### P2: Harden behavioral section-loading test hermeticity
**What:** `captureSectionReads` in `test/helpers/auq-sdk-capture.ts` accepts ANY
Read whose path matches `sections/<file>.md`. The skeleton's STOP-Read directive
points at the gstack-root install path (`scripts/resolvers/sections.ts` builds it
from `ctx.paths.skillRoot`), not the planted fixture copy. So a run can satisfy
the section-read assertion by reading the GLOBAL install's section instead of the
hermetic fixture.
**Why:** A behavioral test that passes by reading the global install doesn't prove
THIS branch's carved section loads. If the fixture's section were broken but the
global install's weren't, the test would still pass.
**Context:** Codex outside-voice finding on the carve-guard ship (v1.57.0.0).
Pre-existing in `auq-sdk-capture.ts` — affects `skill-e2e-ship-section-loading`,
`skill-e2e-plan-ceo-review-section-loading`, and the new
`carve-section-loading.test.ts`. Fix: match the fixture's ABSOLUTE sections path
(the `planDir` copy), not a bare `sections/<file>.md` regex; or rewrite the STOP
path to the fixture during the run.
**Effort:** S (human ~3h, CC ~30min). **Depends on:** None.