v1.57.0.0 feat: carve-guard system + carve cso/document-release/design-consultation (#1907)

* test: canonical CARVE_GUARDS registry; derive parity + size-budget from it

Single source of truth for the carved-skill set + per-skill invariants
(EQ1). parity-harness.ts sectioned entries and skill-size-budget.ts
SECTIONS_EXTRACTED now derive from it instead of hand-maintained lists.
Closes a pre-existing drift: plan-devex-review was in SECTIONS_EXTRACTED
but had no sectioned parity invariant; now generated. carve-guards.ts is
a pure leaf data module (import type only) to avoid an import cycle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: shared carve-guard check fns with injectable root

discoverCarvedSkills/checkOrdering/checkCompleteness take a root param so
the negative tests can point the real guards at a fixture dir.
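
A minimal sketch of why the injectable root matters, assuming a throwaway fixture directory. `discoverCarved`, `makeFixtureRoot`, and the skill names below are hypothetical stand-ins for illustration, not the helpers in this commit:

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Hypothetical stand-in for a root-injectable guard: lists every directory
// under `root` that owns a sections/manifest.json.
function discoverCarved(root: string): string[] {
  return fs
    .readdirSync(root, { withFileTypes: true })
    .filter((d) => d.isDirectory())
    .map((d) => d.name)
    .filter((dir) => fs.existsSync(path.join(root, dir, 'sections', 'manifest.json')))
    .sort();
}

// Build a throwaway fixture root: one carved skill, one monolith skill.
function makeFixtureRoot(): string {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), 'carve-fixture-'));
  fs.mkdirSync(path.join(root, 'carved-skill', 'sections'), { recursive: true });
  fs.writeFileSync(path.join(root, 'carved-skill', 'sections', 'manifest.json'), '{}');
  fs.mkdirSync(path.join(root, 'monolith-skill'), { recursive: true });
  return root;
}

const fixtureRoot = makeFixtureRoot();
// The same function that scans the repo is pointed at the fixture instead.
const carvedInFixture = discoverCarved(fixtureRoot);
console.log(carvedInFixture);
fs.rmSync(fixtureRoot, { recursive: true, force: true });
```

Because the root is a parameter, the test exercises the real scan logic rather than a wrapper around it.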

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: E2 data-driven carve static ordering guard (gate)

Per-PR backstop for every carved skill, one test() per skill, driven by
CARVE_GUARDS staticInvariants. Generalizes + retires the ceo-specific
ordering test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: E1 carve-guard completeness meta-guard (gate)

Asserts filesystem carved set == CARVE_GUARDS set both directions, so a
future carve without a registry entry fails CI.
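
The "both directions" comparison can be sketched as follows. This is an illustration only: `compareCarvedSets` and the sample skill names are assumptions, and the failure wording is simplified:

```typescript
// Hypothetical helper: report names that appear on one side but not the other.
function compareCarvedSets(onDisk: string[], registered: string[]): string[] {
  const disk = new Set(onDisk);
  const registry = new Set(registered);
  const failures: string[] = [];
  // Direction 1: carved on disk but missing from the registry.
  for (const skill of onDisk) {
    if (!registry.has(skill)) failures.push(`unguarded carve: ${skill}`);
  }
  // Direction 2: registered but no longer carved on disk.
  for (const skill of registered) {
    if (!disk.has(skill)) failures.push(`stale registry entry: ${skill}`);
  }
  return failures;
}

const carveDrift = compareCarvedSets(['cso', 'ship', 'new-skill'], ['cso', 'ship', 'old-skill']);
console.log(carveDrift);
```

Checking only one direction would let either a new unregistered carve or a stale registry entry pass silently.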

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: ET1 guard-of-guards negative tests (gate)

Temp fixture broken 3 ways proves E1/E2 actually throw, via the injectable
root. Kills the silent-pass-guard failure class.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: T2 data-driven behavioral section-loading guard (periodic)

One file iterating CARVE_GUARDS, one test() per skill with GSTACK_CARVE_SKILL
cost-scoping (D-CODEX A). External carves (ship, plan-ceo) keep bespoke
tests; testNames aligned to their touchfile keys. Registered in touchfiles.
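
A rough sketch of the cost-scoping idea, assuming an unset selector means "run every non-external skill". The test file itself is not shown in this diff, so `selectSkills`, `SAMPLE_GUARDS`, and the trimmed entry shape are hypothetical:

```typescript
// Hypothetical, trimmed registry shape; the real CARVE_GUARDS entries carry more fields.
interface GuardEntry {
  skill: string;
  behavioral: 'plan' | 'prompt' | 'external';
}

const SAMPLE_GUARDS: Record<string, GuardEntry> = {
  ship: { skill: 'ship', behavioral: 'external' },
  'plan-eng-review': { skill: 'plan-eng-review', behavioral: 'plan' },
  cso: { skill: 'cso', behavioral: 'prompt' },
};

// Skip 'external' entries (they keep bespoke tests) and, when a selector is
// given, keep only the matching skill.
function selectSkills(guards: Record<string, GuardEntry>, only?: string): string[] {
  return Object.keys(guards)
    .map((key) => guards[key])
    .filter((g) => g.behavioral !== 'external')
    .filter((g) => !only || g.skill === only)
    .map((g) => g.skill)
    .sort();
}

const allSkills = selectSkills(SAMPLE_GUARDS);
const scopedSkills = selectSkills(SAMPLE_GUARDS, 'cso');
console.log(allSkills, scopedSkills);
```

In a test file the second argument would presumably come from `process.env.GSTACK_CARVE_SKILL`.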

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: defer E3 real-session carve canary to TODOS

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: carve document-release into skeleton + on-demand section

Steps 2-9 (per-file audit, auto-updates, risky-change asks, CHANGELOG
voice polish, cross-doc consistency, TODOS cleanup, VERSION bump, commit +
PR body) move to sections/release-body.md, read on demand after the Step
1.5 coverage map. Skeleton 59,256 -> 45,797 B (-23%); union preserved.
Adds the CARVE_GUARDS entry (auto-extends parity + size-budget via EQ1).
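
The shrink percentages quoted in this commit message can be reproduced with a short calculation. The byte counts are the ones stated in the message; `shrinkPercent` and `CARVE_SIZES` are illustrative names, not part of the commit:

```typescript
interface CarveSize {
  skill: string;
  before: number; // skeleton bytes before the carve
  after: number; // skeleton bytes after the carve
}

// Figures as quoted in the commit message.
const CARVE_SIZES: CarveSize[] = [
  { skill: 'document-release', before: 59_256, after: 45_797 },
  { skill: 'design-consultation', before: 80_719, after: 59_229 },
  { skill: 'cso', before: 79_383, after: 65_117 },
];

// Relative reduction, rounded to a whole percent.
function shrinkPercent(before: number, after: number): number {
  return Math.round(((before - after) / before) * 100);
}

const shrinkReport = CARVE_SIZES.map(
  (c) => `${c.skill}: -${shrinkPercent(c.before, c.after)}%`,
);
console.log(shrinkReport.join('\n'));
```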

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: carve design-consultation into skeleton + on-demand section

Phases 3-6 (complete proposal, drill-downs, design preview, writing
DESIGN.md) move to sections/proposal-and-preview.md, read on demand after
product context + research. Skeleton 80,719 -> 59,229 B (-27%); union
preserved. Adds the CARVE_GUARDS entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat: carve cso into skeleton + on-demand section (security-safe)

Scope-dependent audit Phases 2-11 move to sections/audit-phases.md. Mode
dispatch (## Arguments, ## Mode Resolution), always-run Phases 0/1, and the
Phase 12 false-positive-filtering exceptions stay ALWAYS-LOADED in the
skeleton. Skeleton 79,383 -> 65,117 B (-18%); union preserved.

Adds a cso CARVE_GUARDS entry with an earliest-use invariant (mustPrecedeStop):
mode dispatch must appear before any STOP-Read, so a directive that decides
which sections to read can't be stranded behind the STOP that reads them
(codex outside-voice #6). carve-guard-checks gains the mustPrecedeStop check.
Parity moves cso from a monolith invariant to a generated carved entry. cso-preserved.test.ts
strengthened: phrases checked against the union, plus an always-loaded
contract on the skeleton (dispatch + FP-filtering, codex #5).
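
The earliest-use invariant can be illustrated on in-memory strings. This is a simplified sketch: `checkPrecedesStop` and the sample skeletons are hypothetical, and the text after the STOP marker is an assumption about how a STOP-Read line might look:

```typescript
const STOP_TOKEN = '> **STOP.**';

// Every anchor must exist and must start before the first STOP marker.
function checkPrecedesStop(skeleton: string, anchors: string[]): string[] {
  const firstStop = skeleton.indexOf(STOP_TOKEN);
  const failures: string[] = [];
  for (const anchor of anchors) {
    const at = skeleton.indexOf(anchor);
    if (at < 0) {
      failures.push(`missing: ${anchor}`);
    } else if (firstStop >= 0 && at > firstStop) {
      failures.push(`stranded after STOP: ${anchor}`);
    }
  }
  return failures;
}

const dispatchAnchors = ['## Arguments', '## Mode Resolution'];
const goodSkeleton = ['## Arguments', '## Mode Resolution', `${STOP_TOKEN} Read sections/audit-phases.md`].join('\n');
const strandedSkeleton = ['## Arguments', `${STOP_TOKEN} Read sections/audit-phases.md`, '## Mode Resolution'].join('\n');

console.log(checkPrecedesStop(goodSkeleton, dispatchAnchors));
console.log(checkPrecedesStop(strandedSkeleton, dispatchAnchors));
```

The second skeleton fails because the mode-resolution heading would only be reached after the section had already been read.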

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: make redaction/taxonomy tests union-aware for cso + document-release carves

The cso carve moved Secrets Archaeology (prefixes, lib/redact-patterns.ts
pointer, git-history scan) into sections/audit-phases.md, and the
document-release carve moved the Step 9 PR-body redaction scan into
sections/release-body.md. Three content-presence tests asserted that content
in the skeleton SKILL.md/.md.tmpl; they now read the skeleton+sections union
(same fix as cso-preserved + parity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.57.0.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: address pre-landing review (codex) on the carve

- cso section: add a scope-gate header so '--owasp' (and other scoped modes)
  run only their selected phases, not every phase bundled in the section
  ('execute in full' no longer overrides Mode Resolution).
- carve-guard-checks: gateAfterStop now compares against the LAST STOP, not the
  first, so a gate stranded between two STOPs in a multi-STOP skeleton fails.
- TODOS: behavioral section-loading hermeticity (verifier matches global-install
  path, not the fixture) — pre-existing in auq-sdk-capture.ts, deferred.
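
The gateAfterStop fix in the list above can be illustrated with a multi-STOP skeleton. The earlier first-STOP comparison is not shown in this diff, so `gateAfterFirstStop` is a reconstruction under that assumption, and the sample skeleton text is hypothetical:

```typescript
const STOP_MARKER = '> **STOP.**';
const GATE_MARKER = 'EXIT PLAN MODE GATE';

// Assumed shape of the earlier check: gate only has to follow the FIRST STOP.
function gateAfterFirstStop(skeleton: string): boolean {
  return skeleton.lastIndexOf(GATE_MARKER) > skeleton.indexOf(STOP_MARKER);
}

// Shape of the fixed check: gate has to follow the LAST STOP.
function gateAfterLastStop(skeleton: string): boolean {
  return skeleton.lastIndexOf(GATE_MARKER) > skeleton.lastIndexOf(STOP_MARKER);
}

// A gate stranded between two STOPs.
const multiStopSkeleton = [
  `${STOP_MARKER} Read sections/first.md`,
  GATE_MARKER,
  `${STOP_MARKER} Read sections/second.md`,
].join('\n');

console.log(gateAfterFirstStop(multiStopSkeleton)); // the looser check accepts it
console.log(gateAfterLastStop(multiStopSkeleton)); // the stricter check rejects it
```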

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Garry Tan
2026-06-07 19:13:24 -07:00
committed by GitHub
parent 476b0ec597
commit e722c5bf89
34 changed files with 2981 additions and 2071 deletions
+177
@@ -0,0 +1,177 @@
/**
 * Pure carve-guard check functions, with an injectable `root` (codex
 * outside-voice #5, refined-plan pass) so the negative tests (ET1) can point the
 * REAL guards at a broken fixture dir instead of testing a wrapper.
 *
 * Used by:
 * - test/carve-section-ordering.test.ts (E2) → checkOrdering
 * - test/carve-guard-completeness.test.ts (E1) → discoverCarvedSkills + checkCompleteness
 * - test/carve-guards-negative.test.ts (ET1) → both, against a fixture root
 *
 * Imports only the leaf data module (carve-guards.ts) + node stdlib, so there is
 * no import cycle.
 */
import * as fs from 'fs';
import * as path from 'path';
import { CARVE_GUARDS, type CarveGuard } from './carve-guards';

/** Every dir under `root` that owns a sections/manifest.json. Injectable for tests. */
export function discoverCarvedSkills(root: string): string[] {
  return fs
    .readdirSync(root, { withFileTypes: true })
    .filter((d) => d.isDirectory())
    .map((d) => d.name)
    .filter((name) => fs.existsSync(path.join(root, name, 'sections', 'manifest.json')))
    .sort();
}

function readSkeleton(root: string, skill: string): string {
  return fs.readFileSync(path.join(root, skill, 'SKILL.md'), 'utf-8');
}

/** Skeleton + every sections/*.md unioned (relocated content still counts). */
function readUnion(root: string, skill: string): string {
  let text = readSkeleton(root, skill);
  const dir = path.join(root, skill, 'sections');
  if (fs.existsSync(dir)) {
    for (const f of fs.readdirSync(dir).sort()) {
      if (f.endsWith('.md') && !f.endsWith('.md.tmpl')) {
        text += '\n' + fs.readFileSync(path.join(dir, f), 'utf-8');
      }
    }
  }
  return text;
}

const STOP = '> **STOP.**';

/**
 * Static ordering invariants for one carved skill. Returns a list of failure
 * strings (empty = pass). Pure: takes `root` so it runs against the real repo or
 * a fixture identically.
 */
export function checkOrdering(root: string, guard: CarveGuard): string[] {
  const failures: string[] = [];
  let skeleton: string;
  try {
    skeleton = readSkeleton(root, guard.skill);
  } catch (err) {
    return [`cannot read ${guard.skill}/SKILL.md: ${(err as Error).message}`];
  }
  const union = readUnion(root, guard.skill);

  // 1. The skeleton routes to sections via a Section index + STOP-Read directives.
  if (!skeleton.includes('## Section index')) {
    failures.push('skeleton is missing the "## Section index" table');
  }
  if (!skeleton.includes(STOP)) {
    failures.push('skeleton has no STOP-Read directive');
  }

  // 2. Every expected section is referenced by path AND generated (AUTO-GENERATED).
  for (const file of guard.expectedSections) {
    if (!skeleton.includes(`sections/${file}`)) {
      failures.push(`skeleton does not reference sections/${file}`);
    }
    const secPath = path.join(root, guard.skill, 'sections', file);
    if (!fs.existsSync(secPath)) {
      failures.push(`section file missing: sections/${file}`);
    } else if (!fs.readFileSync(secPath, 'utf-8').slice(0, 200).includes('AUTO-GENERATED')) {
      failures.push(`sections/${file} is hand-edited (no AUTO-GENERATED header)`);
    }
  }

  // 3. Pre-STOP anchors stay in the skeleton.
  for (const anchor of guard.staticInvariants.mustStayInSkeleton) {
    if (!skeleton.includes(anchor)) {
      failures.push(`mustStayInSkeleton anchor missing from skeleton: "${anchor}"`);
    }
  }

  // 3b. Earliest-use: dispatch directives must appear BEFORE the first STOP
  // (codex #6: a directive that governs which sections to read can't sit after
  // the STOP that reads them).
  const firstStopIdx = skeleton.indexOf(STOP);
  for (const anchor of guard.staticInvariants.mustPrecedeStop ?? []) {
    const at = skeleton.indexOf(anchor);
    if (at < 0) {
      failures.push(`mustPrecedeStop anchor missing from skeleton: "${anchor}"`);
    } else if (firstStopIdx >= 0 && at > firstStopIdx) {
      failures.push(`mustPrecedeStop anchor "${anchor}" appears AFTER the STOP (stranded)`);
    }
  }

  // 4. Heavy body moved out of the skeleton but is preserved in the union.
  for (const moved of guard.staticInvariants.mustMoveToSection) {
    if (skeleton.includes(moved)) {
      failures.push(`mustMoveToSection marker is still in the skeleton: "${moved}"`);
    }
    if (!union.includes(moved)) {
      failures.push(`mustMoveToSection marker absent from the union (lost): "${moved}"`);
    }
  }

  // 5. The post-STOP gate fires after the last STOP (review skills).
  const gate = guard.staticInvariants.gateAfterStop;
  if (gate) {
    // Gate must fire after the LAST STOP (once all section work returns), not just
    // the first; for multi-STOP skeletons a gate between two STOPs is stranded.
    const lastStop = skeleton.lastIndexOf(STOP);
    const lastGate = skeleton.lastIndexOf(gate);
    if (lastGate < 0) {
      failures.push(`gateAfterStop marker missing from skeleton: "${gate}"`);
    } else if (lastStop >= 0 && lastGate < lastStop) {
      failures.push(`gateAfterStop "${gate}" appears before the last STOP (stranded above it)`);
    }
  }
  return failures;
}

/**
 * Completeness (E1): the filesystem carved set must equal the registry set, both
 * directions, and every registry entry must be internally consistent. Pure:
 * takes `root`.
 */
export function checkCompleteness(root: string): string[] {
  const failures: string[] = [];
  const discovered = new Set(discoverCarvedSkills(root));
  const registered = new Set(Object.keys(CARVE_GUARDS));

  for (const skill of discovered) {
    if (!registered.has(skill)) {
      failures.push(`carved on disk but NOT in CARVE_GUARDS (unguarded carve): ${skill}`);
    }
  }
  for (const skill of registered) {
    if (!discovered.has(skill)) {
      failures.push(`in CARVE_GUARDS but not carved on disk (stale registry entry): ${skill}`);
    }
  }

  for (const [skill, g] of Object.entries(CARVE_GUARDS)) {
    if (g.expectedSections.length === 0) {
      failures.push(`${skill}: expectedSections is empty`);
    }
    if (g.requiredReads.length === 0) {
      failures.push(`${skill}: requiredReads is empty (behavioral guard would be decorative)`);
    }
    for (const r of g.requiredReads) {
      if (!g.expectedSections.includes(r)) {
        failures.push(`${skill}: requiredRead "${r}" is not in expectedSections`);
      }
    }
    // Behavioral guard exists: 'plan'/'prompt' are covered structurally by the
    // data-driven loop (registry membership IS coverage); 'external' must name a
    // dedicated test file that actually exists on disk.
    if (g.behavioral === 'external') {
      if (!g.externalTest) {
        failures.push(`${skill}: behavioral 'external' but no externalTest path`);
      } else if (!fs.existsSync(path.join(root, g.externalTest))) {
        failures.push(`${skill}: externalTest missing on disk: ${g.externalTest}`);
      }
    }
  }
  return failures;
}
+273
@@ -0,0 +1,273 @@
/**
* Canonical carved-skill guard registry — the single source of truth for which
* skills are carved (skeleton SKILL.md + on-demand sections/*.md) and what each
* carve must guarantee.
*
* PURE LEAF DATA MODULE (codex outside-voice #1, refined-plan pass): this file
* has NO runtime imports — `import type` only. parity-harness.ts and
* skill-size-budget.test.ts derive their carved-skill lists FROM here (no
* parallel hand-maintained lists), so a runtime import back into either of them
* would create a cycle. Keep it data.
*
* Consumers:
* - test/carve-section-ordering.test.ts (E2, gate) → staticInvariants
* - test/carve-section-loading.test.ts (T2, periodic) → requiredReads + scenario
* - test/carve-guard-completeness.test.ts (E1, gate) → the set must equal the
* filesystem carved set
* - test/carve-guards-negative.test.ts (ET1, gate) → injects a broken fixture
* - test/helpers/parity-harness.ts → sectioned/maxSkeletonBytes/minBytes/mustContain
* - test/skill-size-budget.test.ts → SECTIONS_EXTRACTED = CARVED_SKILLS
*
* Adding a carve = add one entry here (atomically, in the same commit as the
* skeleton + manifest + sections — codex #4 — so E1's bidirectional parity never
* false-positives mid-commit).
*/
/** Static (skeleton-shape) invariants the per-PR ordering guard (E2) asserts. */
export interface CarveStaticInvariants {
/**
* Substrings that MUST remain in the always-loaded skeleton. Empty = skip
* (the skill has no distinctive pre-STOP anchor worth pinning beyond the
* universal STOP/section-index checks E2 already runs).
*/
mustStayInSkeleton: string[];
/**
* Substrings that MUST appear in the skeleton BEFORE the first STOP-Read
* (earliest-use, codex #6). For cso: mode-dispatch directives (## Arguments,
* ## Mode Resolution) must be resolved before any section is read — a dispatch
* directive stranded after the STOP can't govern which sections to read.
* Empty/undefined = skip (most skills).
*/
mustPrecedeStop?: string[];
/**
* Substrings that MUST be in the union (skeleton + sections) but MUST NOT be in
* the skeleton — i.e. the heavy body that the carve relocated. Empty = skip.
*/
mustMoveToSection: string[];
/**
* If set, this marker must appear in the skeleton AFTER the last STOP-Read
* directive (e.g. the EXIT PLAN MODE GATE that fires once section work returns).
* Undefined = the skill has no post-STOP gate (operational/conversational carve).
*/
gateAfterStop?: string;
}
export interface CarveGuard {
skill: string;
/** Section .md filenames the manifest lists and the skeleton must STOP-Read. */
expectedSections: string[];
/**
* Sections the behavioral test (T2) asserts the agent actually Read when driven
* by `scenario`. A non-empty subset of expectedSections — the ones the scenario
* is built to require. The registry owns this so "registered ⇒ asserted" is
* structural (codex #2), not policed.
*/
requiredReads: string[];
/**
* Fixture prompt that drives a real `claude -p` run down the STOP-Read path for
* this skill (codex #7). The behavioral test asserts the run reached the STOP
* (read requiredReads), not merely that nothing was read.
*/
scenario: string;
staticInvariants: CarveStaticInvariants;
/**
* How the behavioral guard (T2) exercises this skill:
* - 'plan' → write a PLAN.md fixture, run the review against it
* - 'prompt' → no fixture file; the scenario prompt alone drives the run
* - 'external' → covered by a dedicated bespoke test (complex fixtures, e.g.
* ship's git/VERSION/CHANGELOG state). The data-driven loop
* skips it; E1 asserts `externalTest` exists instead.
*/
behavioral: 'plan' | 'prompt' | 'external';
/** Required when behavioral === 'external': path (repo-relative) to the dedicated test. */
externalTest?: string;
/** Parity: max bytes for the always-loaded skeleton (asserts the carve shrank it). */
maxSkeletonBytes: number;
/** Parity: min bytes for the skeleton+sections union (total behavior preserved). */
minUnionBytes: number;
/** Parity: content phrases the union must preserve. */
mustContain: string[];
}
export const CARVE_GUARDS: Record<string, CarveGuard> = {
ship: {
skill: 'ship',
expectedSections: [
'tests.md',
'test-coverage.md',
'plan-completion.md',
'review-army.md',
'greptile.md',
'adversarial.md',
'changelog.md',
'pr-body.md',
],
requiredReads: ['review-army.md', 'changelog.md'],
scenario:
'This is a FRESH version-changing ship: the branch has a real code change, VERSION still equals the base version (needs a bump), and CHANGELOG.md needs a new entry. Follow the skill flow for a version-changing ship: run the pre-landing review and prepare the CHANGELOG entry. Produce the ship plan / review report. Do NOT actually commit, push, or open a PR.',
staticInvariants: {
mustStayInSkeleton: [],
mustMoveToSection: [],
// ship is operational (multi-STOP, not a plan review); no single post-STOP gate.
gateAfterStop: undefined,
},
behavioral: 'external',
externalTest: 'test/skill-e2e-ship-section-loading.test.ts',
maxSkeletonBytes: 90_000,
minUnionBytes: 120_000,
mustContain: ['VERSION', 'CHANGELOG', 'review', 'merge', 'PR'],
},
'plan-ceo-review': {
skill: 'plan-ceo-review',
expectedSections: ['review-sections.md'],
requiredReads: ['review-sections.md'],
scenario:
'Review the plan in PLAN.md. Hold the current scope (HOLD SCOPE mode) — do not challenge or expand scope. Run the full CEO review and produce the review report.',
staticInvariants: {
mustStayInSkeleton: ['## Step 0: Nuclear Scope Challenge'],
mustMoveToSection: ['### Section 1: Architecture Review', '## Mode Quick Reference'],
gateAfterStop: 'EXIT PLAN MODE GATE',
},
behavioral: 'external',
externalTest: 'test/skill-e2e-plan-ceo-review-section-loading.test.ts',
maxSkeletonBytes: 90_000,
minUnionBytes: 80_000,
mustContain: ['SCOPE EXPANSION', 'SELECTIVE EXPANSION', 'HOLD SCOPE', 'SCOPE REDUCTION'],
},
'plan-eng-review': {
skill: 'plan-eng-review',
expectedSections: ['review-sections.md'],
requiredReads: ['review-sections.md'],
scenario:
'Review the plan in PLAN.md. Accept the current scope. Run the full engineering review (architecture, code quality, tests, performance) and produce the review report.',
staticInvariants: {
mustStayInSkeleton: ['### Step 0: Scope Challenge'],
mustMoveToSection: ['### 1. Architecture review'],
gateAfterStop: 'EXIT PLAN MODE GATE',
},
behavioral: 'plan',
maxSkeletonBytes: 62_000,
minUnionBytes: 70_000,
mustContain: ['Architecture', 'Code Quality', 'Test', 'Performance'],
},
'plan-design-review': {
skill: 'plan-design-review',
expectedSections: ['review-sections.md'],
requiredReads: ['review-sections.md'],
scenario:
'Review the plan in PLAN.md for design and UX. Accept the current scope. Run the full design review passes and produce the review report.',
staticInvariants: {
mustStayInSkeleton: [],
mustMoveToSection: ['### Pass 1: Information Architecture'],
gateAfterStop: 'EXIT PLAN MODE GATE',
},
behavioral: 'plan',
maxSkeletonBytes: 82_000,
minUnionBytes: 70_000,
mustContain: ['design', 'visual'],
},
'plan-devex-review': {
skill: 'plan-devex-review',
expectedSections: ['review-sections.md'],
requiredReads: ['review-sections.md'],
scenario:
'Review the plan in PLAN.md for developer experience. Accept the current scope. Run the full DX review passes and produce the review report.',
staticInvariants: {
mustStayInSkeleton: [],
mustMoveToSection: ['### Pass 1: Getting Started Experience'],
gateAfterStop: 'EXIT PLAN MODE GATE',
},
behavioral: 'plan',
maxSkeletonBytes: 76_000,
minUnionBytes: 70_000,
mustContain: ['developer experience', 'Getting Started'],
},
'office-hours': {
skill: 'office-hours',
expectedSections: ['design-and-handoff.md'],
requiredReads: ['design-and-handoff.md'],
scenario:
'Run office hours for this product idea through to the end: have the diagnostic conversation, explore alternatives, then write the design doc and run the relationship handoff (Phases 5-6).',
staticInvariants: {
mustStayInSkeleton: [],
mustMoveToSection: [],
// office-hours is conversational; the design-doc/handoff section has no
// post-STOP review gate in the skeleton.
gateAfterStop: undefined,
},
behavioral: 'prompt',
maxSkeletonBytes: 96_000,
minUnionBytes: 70_000,
mustContain: ['design doc', 'problem statement'],
},
'document-release': {
skill: 'document-release',
expectedSections: ['release-body.md'],
requiredReads: ['release-body.md'],
scenario:
'A PR has shipped a new CLI flag and touched README.md and CHANGELOG.md. Skip the git pre-flight shell commands (assume the diff adds --new-flag and updates those two docs). Run the documentation workflow: build the coverage map, then audit the docs, apply updates, and polish the CHANGELOG voice. Produce the documentation health summary.',
staticInvariants: {
mustStayInSkeleton: ['## Step 1: Pre-flight', '## Step 1.5: Coverage Map'],
mustMoveToSection: ['## Step 2: Per-File Documentation Audit', '## Step 5: CHANGELOG Voice Polish'],
// Operational skill (no plan-mode review gate).
gateAfterStop: undefined,
},
behavioral: 'prompt',
maxSkeletonBytes: 50_000,
minUnionBytes: 55_000,
mustContain: ['CHANGELOG', 'Diataxis', 'coverage'],
},
'design-consultation': {
skill: 'design-consultation',
expectedSections: ['proposal-and-preview.md'],
requiredReads: ['proposal-and-preview.md'],
scenario:
'The user gave product context (a B2B analytics dashboard for ops teams) and declined the research phase. Skip browser/design tool setup. Proceed to build the complete design-system proposal, then write DESIGN.md. Produce the proposal and the DESIGN.md content.',
staticInvariants: {
mustStayInSkeleton: ['## Phase 0: Pre-checks', '## Phase 1: Product Context', '## Phase 2: Research'],
mustMoveToSection: ['## Phase 3: The Complete Proposal', '## Phase 6: Write DESIGN.md'],
gateAfterStop: undefined,
},
behavioral: 'prompt',
maxSkeletonBytes: 64_000,
minUnionBytes: 72_000,
mustContain: ['Typography', 'Color', 'Aesthetic Direction'],
},
cso: {
skill: 'cso',
expectedSections: ['audit-phases.md'],
requiredReads: ['audit-phases.md'],
scenario:
'Run a security audit on this repository in --owasp mode (OWASP Top 10 only). Resolve the mode, do the Phase 0 stack detection and Phase 1 attack-surface census, then run the scoped audit phases and produce the findings report. Skip any step that needs network access.',
staticInvariants: {
// Dispatch + always-run + FP-filtering phases are ALWAYS loaded (security).
mustStayInSkeleton: [
'## Arguments',
'## Mode Resolution',
'### Phase 0',
'### Phase 1',
'### Phase 12',
'### Phase 13',
'### Phase 14',
],
// Earliest-use: mode must be resolvable before any section is read (codex #6).
mustPrecedeStop: ['## Arguments', '## Mode Resolution'],
// Scope-dependent audit detail moved to the section.
mustMoveToSection: [
'### Phase 2: Secrets Archaeology',
'### Phase 9: OWASP Top 10 Assessment',
'### Phase 10: STRIDE Threat Model',
],
gateAfterStop: undefined,
},
behavioral: 'prompt',
maxSkeletonBytes: 70_000,
minUnionBytes: 72_000,
mustContain: ['OWASP', 'STRIDE', 'daily', 'comprehensive', 'verif'],
},
};
/** Sorted carved-skill names. Consumers derive their lists from this — no parallel lists. */
export const CARVED_SKILLS: readonly string[] = Object.freeze(
Object.keys(CARVE_GUARDS).sort(),
);
+32 -95
@@ -22,6 +22,7 @@ import * as fs from 'fs';
import * as path from 'path';
import type { ParityBaseline, SkillBaselineEntry } from './capture-parity-baseline';
import { captureBaseline } from './capture-parity-baseline';
import { CARVE_GUARDS } from './carve-guards';
export interface ParityInvariant {
skill: string;
@@ -198,86 +199,13 @@ export function runParityChecks(opts: {
* Each entry pins what must-not-break in a skill family. Extend as future
* skills land. Phase B (v2.0.0.0) adds LLM-judge invariants on top of these.
*/
export const PARITY_INVARIANTS: ParityInvariant[] = [
{
skill: 'cso',
mustContain: ['OWASP', 'STRIDE', 'daily', 'comprehensive', 'verif'],
mustHaveHeadings: ['## Preamble', '## When to invoke'],
maxSizeRatio: 1.05,
minBytes: 30_000,
},
{
// Carved (v2 plan T9): skeleton SKILL.md + sections/*.md. Content checks run
// against the union (relocated phrases still count); size floors run against
// the union (total behavior preserved); maxSkeletonBytes asserts the
// always-loaded skeleton actually shrank from the ~167KB monolith.
skill: 'ship',
sectioned: true,
maxSkeletonBytes: 90_000,
mustContain: [
'VERSION',
'CHANGELOG',
'review',
'merge',
'PR',
],
mustHaveHeadings: ['## Preamble', '## When to invoke'],
maxSizeRatio: 1.05,
minBytes: 120_000,
},
{
// Carved (v2 plan T9): skeleton SKILL.md + sections/review-sections.md.
// Content + size floors run against the union (relocated prose still counts);
// maxSkeletonBytes asserts the always-loaded skeleton shrank from the ~138KB
// monolith to ~81KB (measured 80,731 B, -42%). Headroom to 90KB so a small
// skeleton edit doesn't trip CI, but a 10KB regression does.
skill: 'plan-ceo-review',
sectioned: true,
maxSkeletonBytes: 90_000,
mustContain: [
'SCOPE EXPANSION',
'SELECTIVE EXPANSION',
'HOLD SCOPE',
'SCOPE REDUCTION',
],
mustHaveHeadings: ['## Preamble', '## When to invoke'],
maxSizeRatio: 1.05,
minBytes: 80_000,
},
{
// Carved (v2 plan T9): skeleton + sections/review-sections.md. The 4-section
// review, outside voice, and required outputs moved to the section; content
// checks run against the union. Skeleton shrank 106,984 -> 54,892 B (-48.7%);
// maxSkeletonBytes 62KB = measured + headroom.
skill: 'plan-eng-review',
sectioned: true,
maxSkeletonBytes: 62_000,
mustContain: [
'Architecture',
'Code Quality',
'Test',
'Performance',
],
mustHaveHeadings: ['## Preamble', '## When to invoke'],
maxSizeRatio: 1.05,
minBytes: 70_000,
},
{
// Carved (v2 plan T9): skeleton + sections/review-sections.md. The 7 design
// passes + required outputs moved to the section; content checks run against
// the union. Skeleton shrank 112,057 -> 76,024 B (-32.2%); maxSkeletonBytes
// 82KB = measured + headroom.
skill: 'plan-design-review',
sectioned: true,
maxSkeletonBytes: 82_000,
mustContain: [
'design',
'visual',
],
mustHaveHeadings: ['## Preamble', '## When to invoke'],
maxSizeRatio: 1.05,
minBytes: 70_000,
},
/**
* Monolith (non-carved) invariants — hand-written. Carved-skill invariants are
* generated from CARVE_GUARDS below (single source of truth), so they never drift
* from the size-budget / static / behavioral guards.
*/
const MONOLITH_INVARIANTS: ParityInvariant[] = [
// cso is now carved — its invariant is generated from CARVE_GUARDS below.
{
skill: 'review',
mustContain: ['confidence', 'P1', 'P2'],
@@ -299,21 +227,6 @@ export const PARITY_INVARIANTS: ParityInvariant[] = [
maxSizeRatio: 1.05,
minBytes: 30_000,
},
{
// Carved (v2 plan T9): skeleton SKILL.md + sections/design-and-handoff.md.
// Phase 5 (design doc) + Phase 6 (handoff) moved into the section, so
// 'design doc' / 'problem statement' now live there — content checks run
// against the union. maxSkeletonBytes asserts the always-loaded skeleton
// shrank from the ~118KB monolith to ~89KB (measured 88,975 B, -24.8%);
// headroom to 96KB so a small skeleton edit doesn't trip CI.
skill: 'office-hours',
sectioned: true,
maxSkeletonBytes: 96_000,
mustContain: ['design doc', 'problem statement'],
mustHaveHeadings: ['## Preamble', '## When to invoke'],
maxSizeRatio: 1.05,
minBytes: 70_000,
},
{
skill: 'autoplan',
mustContain: ['ceo', 'eng', 'design'],
@@ -322,3 +235,27 @@ export const PARITY_INVARIANTS: ParityInvariant[] = [
minBytes: 70_000,
},
];
/**
 * Carved-skill invariants, GENERATED from the canonical CARVE_GUARDS registry
 * (EQ1: single source of truth). Each carve's skeleton-size ceiling
 * (maxSkeletonBytes), union floor (minUnionBytes), and content invariants
 * (mustContain) live in carve-guards.ts; this just projects them into the parity
 * shape. Adding a carve there auto-adds its union guard here, which is how
 * plan-devex-review (previously in SECTIONS_EXTRACTED but missing a sectioned
 * parity invariant) is now guarded.
 */
const CARVED_INVARIANTS: ParityInvariant[] = Object.values(CARVE_GUARDS).map((g) => ({
  skill: g.skill,
  sectioned: true,
  maxSkeletonBytes: g.maxSkeletonBytes,
  minBytes: g.minUnionBytes,
  mustContain: g.mustContain,
  mustHaveHeadings: ['## Preamble', '## When to invoke'],
  maxSizeRatio: 1.05,
}));

export const PARITY_INVARIANTS: ParityInvariant[] = [
  ...MONOLITH_INVARIANTS,
  ...CARVED_INVARIANTS,
];
+6
@@ -123,6 +123,11 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
'ship-idempotency-pty': ['ship/**', 'bin/gstack-next-version', 'bin/gstack-version-bump', 'scripts/resolvers/sections.ts', 'lib/worktree.ts', 'test/helpers/claude-pty-runner.ts'],
'ship-section-loading': ['ship/**', 'scripts/resolvers/sections.ts', 'scripts/gen-skill-docs.ts', 'test/helpers/auq-sdk-capture.ts', 'test/helpers/session-runner.ts'],
'plan-ceo-section-loading': ['plan-ceo-review/**', 'scripts/resolvers/sections.ts', 'scripts/gen-skill-docs.ts', 'test/helpers/auq-sdk-capture.ts', 'test/helpers/session-runner.ts'],
// Data-driven behavioral guard for the 'plan'/'prompt' carves (eng, design,
// devex, office-hours, document-release, design-consultation, cso). One file
// iterating CARVE_GUARDS; the selector sets GSTACK_CARVE_SKILL=<name> to scope
// cost to the changed skill (D-CODEX A). Touching the registry/helper or
// sections.ts runs all.
'carve-section-loading': ['plan-eng-review/**', 'plan-design-review/**', 'plan-devex-review/**', 'office-hours/**', 'document-release/**', 'design-consultation/**', 'cso/**', 'test/helpers/carve-guards.ts', 'scripts/resolvers/sections.ts', 'scripts/gen-skill-docs.ts', 'test/helpers/auq-sdk-capture.ts', 'test/helpers/session-runner.ts'],
'autoplan-chain-pty': ['autoplan/**', 'plan-ceo-review/**', 'plan-design-review/**', 'plan-eng-review/**', 'plan-devex-review/**', 'test/fixtures/plans/ui-heavy-feature.md', 'test/helpers/claude-pty-runner.ts'],
'e2e-harness-audit': ['plan-ceo-review/**', 'plan-eng-review/**', 'plan-design-review/**', 'plan-devex-review/**', 'scripts/resolvers/preamble/generate-completion-status.ts', 'test/helpers/agent-sdk-runner.ts', 'test/helpers/claude-pty-runner.ts'],
@@ -512,6 +517,7 @@ export const E2E_TIERS: Record<string, 'gate' | 'periodic'> = {
'ship-idempotency-pty': 'periodic', // ~$3/run, real /ship in plan mode
'ship-section-loading': 'periodic', // ~$3/run, real /ship; asserts section reads
'plan-ceo-section-loading': 'periodic', // ~$3-5/run, real /plan-ceo-review; asserts section read
'carve-section-loading': 'periodic', // ~$1-2/skill, data-driven; GSTACK_CARVE_SKILL scopes to one
'autoplan-chain-pty': 'periodic', // ~$8/run, all 3 phases sequential
// Per-finding count + review-report-at-bottom — periodic because each