mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-10 12:03:59 +02:00
45cc95d5f4
* feat(gbrain-sync): add cycleCompleted() cycle-state probe

  Reads `gbrain doctor` cycle_freshness to classify whether a source has completed a full cycle (completed/never/unknown). A fail naming this source -> never; a fail naming only other sources -> completed; an absent or unparseable check -> unknown, so an unrelated doctor failure never masks a real state. Gates the automatic call-graph build on --full.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(gbrain-sync): --dream call-graph stage with lock-free gate + honest outcome guard

  Adds a source-scoped `gbrain dream --source <id>` stage that builds this worktree's call graph (code-callers/code-callees). Runs lock-free after the sync lock releases so it never blocks sibling worktrees; a .dream-in-progress marker dedupes concurrent dreams. --full auto-runs it only when the cycle was never built; explicit --dream always forces; --no-dream opts out.

  The stage parses the cycle's own output and reports the truth, not a flat "built": a WARN when the schema pack can't extract code symbols, when the embed phase failed for a missing key, or when 0 edges resolved; OK with the resolved-edge count otherwise. gbrain exits 0 even when it skips on a held cycle lock (e.g. autopilot), so that case reports SKIP, not success.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: ignore gbrain .sources/ local staging dir

  gbrain writes per-source staging and capability-check artifacts under .sources/ in the repo root. It's machine-local runtime state, not source.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(gbrain): honest call-graph guidance in /sync-gbrain + pin works on gbrain>=0.41.38

  sync-gbrain frames the --dream offer honestly: building a call graph requires a code-aware schema pack, and the dream stage reports a WARN when it can't. The verdict's Call graph row mirrors the dream stage's real outcome instead of assuming a completed cycle means edges exist.

  The ## GBrain Search Guidance block written into CLAUDE.md drops the old code-callers --source caveat: gbrain >=0.41.38.0 honors the .gbrain-source pin for code-callers/code-callees.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(jsonl-store): shared audited JSONL plumbing (injection-reject + atomic append + tolerant read)

  Single source of truth extracted for D2A: gstack-learnings-* and the upcoming gstack-decision-* bins share one injection-pattern list, one atomic single-line appender, and one tolerant reader. No more drift between stores.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(learnings-log): use shared hasInjection from lib/jsonl-store (D2A)

  Replace the inline injection-pattern copy with the shared list. One audited write-path rejection across learnings + the upcoming decision store. Behavior unchanged (35/35 learnings tests green); learnings-search keeps its inline copy because a structural test pins its bash/bun shape.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(decision): event-sourced decision-memory model (lib/gstack-decision)

  decide/supersede/redact events on lib/jsonl-store; active set is computed (no mutable status), dangling refs tolerated. Free-text is injection-checked and redact-scanned on write (HIGH secret -> reject). Scope filter (repo/branch/issue) for relevant resurfacing. File-only + reliable; gbrain not required.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(decision): bounded active snapshot + compaction (redact expunges, supersede archives)

  writeSnapshot/readSnapshot/rebuildSnapshot give an O(active) bounded read for the session-start hot path (D1A). compact() rewrites the log to active, archives superseded decisions for history, and EXPUNGES redacted ones (dropped, never archived) so an accidentally-captured secret leaves the store for good.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(decision): gstack-decision-log + gstack-decision-search bins (non-interactive)

  Two bins mirroring gstack-learnings-* (D3A). log writes decide/--supersede/--redact/--compact events + refreshes the bounded snapshot + enqueues for cross-machine sync; search reads the O(active) snapshot, scope-filtered to current branch, newest-first, --all to include superseded, --json for machines. Empty store returns silently (no snapshot write on an empty read).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(memory): surface active decisions at session start + capture nudge (Context Recovery)

  Context Recovery now shows recent scope-relevant active decisions (bounded read of decisions.active.json via gstack-decision-search) and instructs the agent to treat them as settled calls and to log durable decisions/reversals. Closes the Phase-1 capture->curate->resurface loop, reliable + file-only. Regen across all hosts folded in (squash-with-regen); parity 10/10, freshness green.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: refresh ship golden baselines for the memory-loop preamble change

  Context Recovery now emits the cross-session-decisions block, so ship's preamble (all hosts) changed. Golden baselines are hand-maintained copies (gen does not write them); refresh them from the fresh gen so golden-file regression passes.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(memory): document the cross-session decision-memory loop in CLAUDE.md

  Adds a '## Cross-session decision memory' section: how to resurface (gstack-decision-search) and capture (gstack-decision-log) durable decisions, the supersede/redact/compact verbs, and a crisp durable-vs-trivial definition so the store stays signal. Reliable file-only path; gbrain not required.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(memory): emit durable decisions from ship/ceo/eng/spec at structured points

  Wires the four skills that finalize real decisions to capture them in the cross-session decision store, from their STRUCTURED outputs (never free-text scraping):
  - ship: the version bump (level + why) at write time
  - plan-ceo-review: accepted scope + verdict (branch-scoped)
  - plan-eng-review: the architecture verdict + key call (branch-scoped)
  - spec: the filed issue's core approach (issue-scoped)

  All emits are non-interactive, schema-correct (content in decision/rationale, source=skill, confidence 1-10), and best-effort (|| true) so a decision-log failure never blocks the workflow. Includes regen across hosts + refreshed ship golden baselines.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(memory): optional gbrain --semantic recall for decision search

  Adds gstack-decision-search --semantic (with --query): appends a 'Related from memory' block from gbrain semantic search, scoped to the curated-memory source. Pure enhancement, reliability-first: a new lib/gstack-decision-semantic.ts is the ONLY decision module that touches gbrain and is imported lazily only on --semantic, so the reliable file path never loads gbrain code. Every path degrades to the reliable file results when gbrain is off, unconfigured, empty, or errors (never throws, 10s timeout).

  Built against the verified gbrain 0.42.x surface (text output [score] slug -- snippet, NOT JSON; curated-memory source resolved by worktree path, not a gstack-brain-<user> id). Deterministic-contract tests only: parser units, degrade-to-null when gbrain absent, and a fake-gbrain shim proving scope+search end-to-end. find-contradictions deferred (no verifiable CLI surface yet + curated memory not indexed).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(gbrain-sync): self-heal stale autopilot lock (dead-pid)

  detectAutopilot treated a lock FILE as proof of life, so a crashed gbrain daemon left a stale lock that wedged every sync forever (observed: a dead pid refused --full indefinitely). Now read the holder pid (bare or JSON body) and check liveness via signal-0: ESRCH=dead → ignore the stale signal and keep checking; EPERM=alive (other user) → active. A stale lock never masks a live autopilot process. Pure decision function — does not delete the file; the caller may clean it.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(review): drop stray trailing code fence in TODOS-format

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(test): align section-loading E2E testNames with their TOUCHFILES keys

  Pre-existing on main (v1.56.x): the two section-loading E2E tests used human-label testNames ('/ship section-loading') that don't match their slug keys ('ship-section-loading') in E2E_TOUCHFILES/E2E_TIERS. Every other E2E test uses the slug as its testName, and the TOUCHFILES completeness gate requires testName to be a registered key — so the gate was red. Align both testNames to their slug keys (also fixes tier lookup for these two periodic tests). Verified failing on a clean origin/main checkout before the fix.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: pre-landing review fixes (datamark, DRY, compact, coverage)

  Addresses the pre-landing review findings (all INFORMATIONAL, no criticals):
  - security: datamark resurfaced decision text at the render boundary (lib/gstack-decision.ts datamark() — neutralizes code fences, --- banners, <|role|>/</system> markers, control chars, newlines). Applied in gstack-decision-search human output so stored text can't masquerade as instructions in Context Recovery (codex hardening #3 / AC #7). --json stays raw.
  - DRY: extract resolveSlug/gitBranch/flagValue to lib/bin-context.ts; both decision bins use it instead of duplicating the helpers.
  - compact(): batch the archive append (one write, not N) and shrink the mid-compact crash window; simplify the opaque branch/issue ternary.
  - coverage: learnings-log injection rejection (D2A wiring), search --recent/--scope + NaN-safe --recent, datamark-applied, unparseable lock body, compact-empty, corrupt-snapshot degrade.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(security): close adversarial-review findings in decision memory

  Adversarial review (Claude subagent) found a CRITICAL the specialist pass missed:
  - F1 (CRITICAL): 'Human:'/'Assistant:' turn-prefixes bypassed BOTH the write-time denylist AND datamark(), landing verbatim in agent context inside the trusted ACTIVE DECISIONS fence. Add 'human:' (+ 'disregard previous', 'from now on') to the shared denylist, and have datamark() neutralize Human:/Assistant:/System:/User: turn-prefixes (ZWSP) at the render boundary.
  - F2: datamark() only stripped ASCII C0; extend to Unicode line terminators (U+0085/2028/2029) and U+007F so 'strip newlines' actually holds.
  - F3: validateDecide blocked only HIGH secrets; MEDIUM-tier PII (e.g. SSN) persisted silently and synced cross-machine. The store is non-interactive (no confirm path), so fail closed on MEDIUM too.
  - F4: compact() was a lock-free read-modify-rewrite that could clobber a concurrent append (lost decision). Add an O_EXCL compact lock + a pre-rename size recheck that aborts untouched (skipped=true) if an append landed; caller re-runs.
  - F7: filterByScope unknown/garbage scope fell through to 'return true' (leaked into every context); fail conservative (false).

  F5 (pid reuse) and F6 (pgrep over-match) are intentionally left as-is: both fail SAFE (over-refuse sync); making them precise would introduce a fail-DANGEROUS path (allowing sync during a real autopilot). True disambiguation needs gbrain to stamp the lock with a start-time, which gstack doesn't own. F8 (compact moves history to archive) is by design.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(security): close cross-model (Codex) adversarial findings

  Codex adversarial review found a HIGH the Claude pass missed plus 3 mediums:
  - C1 (HIGH): gstack-decision-search --all returned every decide and IGNORED redact events, so a redacted secret still resurfaced via --all until compact ran. --all now excludes redacted (redact = expunge from every read path), still showing superseded history.
  - C-med: semantic (external gbrain) slug/snippet were printed raw — datamark them too so a gbrain hit can't spoof role markers / fences into agent context.
  - C4: semanticRecall fell back to an UNSCOPED gbrain search when no curated-memory source resolved, pulling code/doc corpora mislabeled as 'related decisions'. Now returns null (degrade) when there's no worktree-backed memory source.
  - C5: validateDecide scanned only decision/rationale/alternatives; branch and issue are stored + surfaced (raw via --json), so include them in the injection+secret scan.

  C2 (snapshot staleness) / C3 (compact TOCTOU residual): accepted for a single-user store — atomic appends never lose the event, rebuilds self-heal, and the compact size-recheck leaves only a sub-ms window; full append-locking would break the lock-free append design.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.57.5.0)

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
303 lines
14 KiB
TypeScript
/**
 * Canonical carved-skill guard registry — the single source of truth for which
 * skills are carved (skeleton SKILL.md + on-demand sections/*.md) and what each
 * carve must guarantee.
 *
 * PURE LEAF DATA MODULE (codex outside-voice #1, refined-plan pass): this file
 * has NO runtime imports — `import type` only. parity-harness.ts and
 * skill-size-budget.test.ts derive their carved-skill lists FROM here (no
 * parallel hand-maintained lists), so a runtime import back into either of them
 * would create a cycle. Keep it data.
 *
 * Consumers:
 *   - test/carve-section-ordering.test.ts (E2, gate) → staticInvariants
 *   - test/carve-section-loading.test.ts (T2, periodic) → requiredReads + scenario
 *   - test/carve-guard-completeness.test.ts (E1, gate) → the set must equal the
 *     filesystem carved set
 *   - test/carve-guards-negative.test.ts (ET1, gate) → injects a broken fixture
 *   - test/helpers/parity-harness.ts → sectioned/maxSkeletonBytes/minBytes/mustContain
 *   - test/skill-size-budget.test.ts → SECTIONS_EXTRACTED = CARVED_SKILLS
 *
 * Adding a carve = add one entry here (atomically, in the same commit as the
 * skeleton + manifest + sections — codex #4 — so E1's bidirectional parity never
 * false-positives mid-commit).
 */

/** Static (skeleton-shape) invariants the per-PR ordering guard (E2) asserts. */
export interface CarveStaticInvariants {
  /**
   * Substrings that MUST remain in the always-loaded skeleton. Empty = skip
   * (the skill has no distinctive pre-STOP anchor worth pinning beyond the
   * universal STOP/section-index checks E2 already runs).
   */
  mustStayInSkeleton: string[];
  /**
   * Substrings that MUST appear in the skeleton BEFORE the first STOP-Read
   * (earliest-use, codex #6). For cso: mode-dispatch directives (## Arguments,
   * ## Mode Resolution) must be resolved before any section is read — a dispatch
   * directive stranded after the STOP can't govern which sections to read.
   * Empty/undefined = skip (most skills).
   */
  mustPrecedeStop?: string[];
  /**
   * Substrings that MUST be in the union (skeleton + sections) but MUST NOT be in
   * the skeleton — i.e. the heavy body that the carve relocated. Empty = skip.
   */
  mustMoveToSection: string[];
  /**
   * If set, this marker must appear in the skeleton AFTER the last STOP-Read
   * directive (e.g. the EXIT PLAN MODE GATE that fires once section work returns).
   * Undefined = the skill has no post-STOP gate (operational/conversational carve).
   */
  gateAfterStop?: string;
}

export interface CarveGuard {
  skill: string;
  /** Section .md filenames the manifest lists and the skeleton must STOP-Read. */
  expectedSections: string[];
  /**
   * Sections the behavioral test (T2) asserts the agent actually Read when driven
   * by `scenario`. A non-empty subset of expectedSections — the ones the scenario
   * is built to require. The registry owns this so "registered ⇒ asserted" is
   * structural (codex #2), not policed.
   */
  requiredReads: string[];
  /**
   * Fixture prompt that drives a real `claude -p` run down the STOP-Read path for
   * this skill (codex #7). The behavioral test asserts the run reached the STOP
   * (read requiredReads), not merely that nothing was read.
   */
  scenario: string;
  staticInvariants: CarveStaticInvariants;
  /**
   * How the behavioral guard (T2) exercises this skill:
   *  - 'plan'     → write a PLAN.md fixture, run the review against it
   *  - 'prompt'   → no fixture file; the scenario prompt alone drives the run
   *  - 'external' → covered by a dedicated bespoke test (complex fixtures, e.g.
   *                 ship's git/VERSION/CHANGELOG state). The data-driven loop
   *                 skips it; E1 asserts `externalTest` exists instead.
   */
  behavioral: 'plan' | 'prompt' | 'external';
  /** Required when behavioral === 'external': path (repo-relative) to the dedicated test. */
  externalTest?: string;
  /** Parity: max bytes for the always-loaded skeleton (asserts the carve shrank it). */
  maxSkeletonBytes: number;
  /** Parity: min bytes for the skeleton+sections union (total behavior preserved). */
  minUnionBytes: number;
  /** Parity: content phrases the union must preserve. */
  mustContain: string[];
  /**
   * Parity: optional per-skill override for the union size-growth ceiling vs the
   * v1.53.0.0 baseline (default 1.05). Bumped only when a deliberate cross-cutting
   * preamble feature legitimately grows a smaller carved skeleton past 5%.
   */
  maxSizeRatio?: number;
}

export const CARVE_GUARDS: Record<string, CarveGuard> = {
  ship: {
    skill: 'ship',
    expectedSections: [
      'tests.md',
      'test-coverage.md',
      'plan-completion.md',
      'review-army.md',
      'greptile.md',
      'adversarial.md',
      'changelog.md',
      'pr-body.md',
    ],
    requiredReads: ['review-army.md', 'changelog.md'],
    scenario:
      'This is a FRESH version-changing ship: the branch has a real code change, VERSION still equals the base version (needs a bump), and CHANGELOG.md needs a new entry. Follow the skill flow for a version-changing ship: run the pre-landing review and prepare the CHANGELOG entry. Produce the ship plan / review report. Do NOT actually commit, push, or open a PR.',
    staticInvariants: {
      // The PR-title-version invariant MUST stay always-loaded: the v1.54.0.0
      // carve stranded it in pr-body.md and PRs started landing with bare titles
      // (CI backstop: test/pr-title-sync-workflow-safety.test.ts).
      mustStayInSkeleton: ['v$NEW_VERSION', 'gstack-pr-title-rewrite'],
      // ...while the full create/update procedure stays carved into pr-body.md
      // (out of the skeleton, present in the union). Asserts BOTH PR paths
      // survive: the create path and the idempotent update path.
      mustMoveToSection: ['gh pr create --base', 'gh pr edit --title'],
      // ship is operational (multi-STOP, not a plan review); no single post-STOP gate.
      gateAfterStop: undefined,
    },
    behavioral: 'external',
    externalTest: 'test/skill-e2e-ship-section-loading.test.ts',
    maxSkeletonBytes: 90_000,
    minUnionBytes: 120_000,
    mustContain: ['VERSION', 'CHANGELOG', 'review', 'merge', 'PR'],
  },
  'plan-ceo-review': {
    skill: 'plan-ceo-review',
    expectedSections: ['review-sections.md'],
    requiredReads: ['review-sections.md'],
    scenario:
      'Review the plan in PLAN.md. Hold the current scope (HOLD SCOPE mode) — do not challenge or expand scope. Run the full CEO review and produce the review report.',
    staticInvariants: {
      mustStayInSkeleton: ['## Step 0: Nuclear Scope Challenge'],
      mustMoveToSection: ['### Section 1: Architecture Review', '## Mode Quick Reference'],
      gateAfterStop: 'EXIT PLAN MODE GATE',
    },
    behavioral: 'external',
    externalTest: 'test/skill-e2e-plan-ceo-review-section-loading.test.ts',
    maxSkeletonBytes: 90_000,
    minUnionBytes: 80_000,
    mustContain: ['SCOPE EXPANSION', 'SELECTIVE EXPANSION', 'HOLD SCOPE', 'SCOPE REDUCTION'],
  },
  'plan-eng-review': {
    skill: 'plan-eng-review',
    expectedSections: ['review-sections.md'],
    requiredReads: ['review-sections.md'],
    scenario:
      'Review the plan in PLAN.md. Accept the current scope. Run the full engineering review (architecture, code quality, tests, performance) and produce the review report.',
    staticInvariants: {
      mustStayInSkeleton: ['### Step 0: Scope Challenge'],
      mustMoveToSection: ['### 1. Architecture review'],
      gateAfterStop: 'EXIT PLAN MODE GATE',
    },
    behavioral: 'plan',
    maxSkeletonBytes: 62_000,
    minUnionBytes: 70_000,
    mustContain: ['Architecture', 'Code Quality', 'Test', 'Performance'],
    // Cross-cutting preamble growth (v1.57.2.0 AUQ-failure prose fallback + the
    // decision-memory nudge + the v1.57.4.0 Boil-the-Ocean rename) lands this just
    // over the strict 1.05; small headroom for the shared preamble additions.
    maxSizeRatio: 1.06,
  },
  'plan-design-review': {
    skill: 'plan-design-review',
    expectedSections: ['review-sections.md'],
    requiredReads: ['review-sections.md'],
    scenario:
      'Review the plan in PLAN.md for design and UX. Accept the current scope. Run the full design review passes and produce the review report.',
    staticInvariants: {
      mustStayInSkeleton: [],
      mustMoveToSection: ['### Pass 1: Information Architecture'],
      gateAfterStop: 'EXIT PLAN MODE GATE',
    },
    behavioral: 'plan',
    maxSkeletonBytes: 82_000,
    minUnionBytes: 70_000,
    mustContain: ['design', 'visual'],
  },
  'plan-devex-review': {
    skill: 'plan-devex-review',
    expectedSections: ['review-sections.md'],
    requiredReads: ['review-sections.md'],
    scenario:
      'Review the plan in PLAN.md for developer experience. Accept the current scope. Run the full DX review passes and produce the review report.',
    staticInvariants: {
      mustStayInSkeleton: [],
      mustMoveToSection: ['### Pass 1: Getting Started Experience'],
      gateAfterStop: 'EXIT PLAN MODE GATE',
    },
    behavioral: 'plan',
    maxSkeletonBytes: 76_000,
    minUnionBytes: 70_000,
    mustContain: ['developer experience', 'Getting Started'],
  },
  'office-hours': {
    skill: 'office-hours',
    expectedSections: ['design-and-handoff.md'],
    requiredReads: ['design-and-handoff.md'],
    scenario:
      'Run office hours for this product idea through to the end: have the diagnostic conversation, explore alternatives, then write the design doc and run the relationship handoff (Phases 5-6).',
    staticInvariants: {
      mustStayInSkeleton: [],
      mustMoveToSection: [],
      // office-hours is conversational; the design-doc/handoff section has no
      // post-STOP review gate in the skeleton.
      gateAfterStop: undefined,
    },
    behavioral: 'prompt',
    maxSkeletonBytes: 96_000,
    minUnionBytes: 70_000,
    mustContain: ['design doc', 'problem statement'],
  },
  'document-release': {
    skill: 'document-release',
    expectedSections: ['release-body.md'],
    requiredReads: ['release-body.md'],
    scenario:
      'A PR has shipped a new CLI flag and touched README.md and CHANGELOG.md. Skip the git pre-flight shell commands (assume the diff adds --new-flag and updates those two docs). Run the documentation workflow: build the coverage map, then audit the docs, apply updates, and polish the CHANGELOG voice. Produce the documentation health summary.',
    staticInvariants: {
      mustStayInSkeleton: ['## Step 1: Pre-flight', '## Step 1.5: Coverage Map'],
      mustMoveToSection: ['## Step 2: Per-File Documentation Audit', '## Step 5: CHANGELOG Voice Polish'],
      // Operational skill (no plan-mode review gate).
      gateAfterStop: undefined,
    },
    behavioral: 'prompt',
    maxSkeletonBytes: 50_000,
    minUnionBytes: 55_000,
    mustContain: ['CHANGELOG', 'Diataxis', 'coverage'],
    // The AUQ-failure prose fallback (v1.57.2.0) adds ~2KB to every skill's
    // always-loaded preamble; on this small carved skeleton that lands at ~5.9%
    // over the pre-carve/pre-AUQ v1.53.0.0 baseline. Headroom for the
    // cross-cutting addition; all other skills keep the strict 1.05 ceiling.
    maxSizeRatio: 1.08,
  },
  'design-consultation': {
    skill: 'design-consultation',
    expectedSections: ['proposal-and-preview.md'],
    requiredReads: ['proposal-and-preview.md'],
    scenario:
      'The user gave product context (a B2B analytics dashboard for ops teams) and declined the research phase. Skip browser/design tool setup. Proceed to build the complete design-system proposal, then write DESIGN.md. Produce the proposal and the DESIGN.md content.',
    staticInvariants: {
      mustStayInSkeleton: ['## Phase 0: Pre-checks', '## Phase 1: Product Context', '## Phase 2: Research'],
      mustMoveToSection: ['## Phase 3: The Complete Proposal', '## Phase 6: Write DESIGN.md'],
      gateAfterStop: undefined,
    },
    behavioral: 'prompt',
    maxSkeletonBytes: 64_000,
    minUnionBytes: 72_000,
    mustContain: ['Typography', 'Color', 'Aesthetic Direction'],
    // Cross-cutting preamble growth (v1.57.2.0 AUQ-failure prose fallback ~2KB +
    // the cross-session decision-memory nudge) lands this carved skeleton just over
    // the strict 1.05; headroom for the shared preamble additions.
    maxSizeRatio: 1.07,
  },
  cso: {
    skill: 'cso',
    expectedSections: ['audit-phases.md'],
    requiredReads: ['audit-phases.md'],
    scenario:
      'Run a security audit on this repository in --owasp mode (OWASP Top 10 only). Resolve the mode, do the Phase 0 stack detection and Phase 1 attack-surface census, then run the scoped audit phases and produce the findings report. Skip any step that needs network access.',
    staticInvariants: {
      // Dispatch + always-run + FP-filtering phases are ALWAYS loaded (security).
      mustStayInSkeleton: [
        '## Arguments',
        '## Mode Resolution',
        '### Phase 0',
        '### Phase 1',
        '### Phase 12',
        '### Phase 13',
        '### Phase 14',
      ],
      // Earliest-use: mode must be resolvable before any section is read (codex #6).
      mustPrecedeStop: ['## Arguments', '## Mode Resolution'],
      // Scope-dependent audit detail moved to the section.
      mustMoveToSection: [
        '### Phase 2: Secrets Archaeology',
        '### Phase 9: OWASP Top 10 Assessment',
        '### Phase 10: STRIDE Threat Model',
      ],
      gateAfterStop: undefined,
    },
    behavioral: 'prompt',
    maxSkeletonBytes: 70_000,
    minUnionBytes: 72_000,
    mustContain: ['OWASP', 'STRIDE', 'daily', 'comprehensive', 'verif'],
    // cso keeps its mode-dispatch + FP-filtering phases always-loaded, so the
    // cross-cutting preamble growth (v1.57.2.0 AUQ-failure prose fallback ~2KB + the
    // decision-memory nudge) lands it just over 1.05; headroom for the shared additions.
    maxSizeRatio: 1.07,
  },
};

/** Sorted carved-skill names. Consumers derive their lists from this — no parallel lists. */
export const CARVED_SKILLS: readonly string[] = Object.freeze(
  Object.keys(CARVE_GUARDS).sort(),
);
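
The static invariants documented in the registry (mustStayInSkeleton, mustPrecedeStop, mustMoveToSection, gateAfterStop) can be read as four substring checks over the skeleton text and the skeleton+sections union. The sketch below is one possible reading of those doc comments, not the repository's actual E2 test: `StaticInvariants`, `checkStaticInvariants`, and the `stopMarker` parameter are hypothetical names introduced for illustration, and the real literal that marks a STOP-Read directive is not given in this file, so the caller has to supply it.

```typescript
// Hypothetical local copy of the invariant shape described in the registry.
// In the repo a consumer would presumably pull the type in via `import type`.
interface StaticInvariants {
  mustStayInSkeleton: string[];
  mustPrecedeStop?: string[];
  mustMoveToSection: string[];
  gateAfterStop?: string;
}

/**
 * Returns human-readable violations; an empty array means every invariant held.
 * `stopMarker` is an ASSUMED literal identifying a STOP-Read directive.
 */
function checkStaticInvariants(
  skeleton: string,
  sections: string[],
  inv: StaticInvariants,
  stopMarker: string,
): string[] {
  const violations: string[] = [];
  const union = [skeleton, ...sections].join('\n');
  const firstStop = skeleton.indexOf(stopMarker);
  const lastStop = skeleton.lastIndexOf(stopMarker);

  // Anchors that must remain in the always-loaded skeleton.
  for (const s of inv.mustStayInSkeleton) {
    if (!skeleton.includes(s)) violations.push(`missing from skeleton: ${s}`);
  }

  // Earliest-use: each directive must appear before the first STOP-Read.
  for (const s of inv.mustPrecedeStop ?? []) {
    const at = skeleton.indexOf(s);
    if (at === -1 || (firstStop !== -1 && at > firstStop)) {
      violations.push(`must precede first STOP: ${s}`);
    }
  }

  // Relocated body: present in the union, absent from the skeleton.
  for (const s of inv.mustMoveToSection) {
    if (!union.includes(s)) violations.push(`missing from union: ${s}`);
    if (skeleton.includes(s)) violations.push(`still in skeleton: ${s}`);
  }

  // Optional gate: must appear after the last STOP-Read directive.
  if (inv.gateAfterStop !== undefined) {
    const gateAt = skeleton.lastIndexOf(inv.gateAfterStop);
    if (gateAt === -1 || gateAt < lastStop) {
      violations.push(`gate not after last STOP: ${inv.gateAfterStop}`);
    }
  }
  return violations;
}
```

Returning a list of violations, not throwing on the first one, lets a test report every broken invariant for a skill in a single run. Empty arrays and undefined fields are skipped naturally, which matches the "Empty = skip" wording in the interface comments.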