mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-03 00:28:02 +02:00
v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806)
* feat(test): transcript-section-logger + ship-action fingerprint (T10)

Pure-analysis module over a SkillTestResult/NDJSON transcript:
- extractSectionReads(): which sections/*.md a run opened (post-carve check)
- extractShipActions(): observable action fingerprint (merge/test/bump/changelog/commit/push/pr) that works on the MONOLITH too, so a baseline captured before the carve can detect a sectioned-ship regression
- baseline read/write + compareShipActions() for baseline-first dogf(T10)

Baseline-first answers the Codex outside-voice critique that a logger in the same PR as the carve is post-failure telemetry without a pre-carve reference. 11 unit tests, all green. Paid monolith baseline capture runs separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(pipeline): section discovery + generation machinery (T9)

- discover-skills.ts: discoverSectionTemplates() scans <skill>/sections/*.md.tmpl
- gen-skill-docs.ts: extract resolvePlaceholders + applyHostRewrites + buildContext as shared helpers (processTemplate and the new processSectionTemplate both call them, so a sanitization/rewrite fix can't miss sections) [C1]
- processSectionTemplate: body-fragment generation (no frontmatter/catalog/voice), parent-skill TemplateContext (skillName pinned to parent, not 'sections', so appliesTo gating + tier behave identically), per-host output routing
- --host all now fails the build on ANY host failure, not just claude, so a stale external-host output can't slip the freshness gate [Codex outside-voice #9]

Inert until a skill is carved (no sections/ dirs exist yet). Refactor is output-neutral: gen:skill-docs --dry-run --host all reports 0 STALE. 5 discovery unit tests + 389 gen-skill-docs tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(setup): install sections/ for cherry-pick targets (claude + kiro) (T9)

Two install targets cherry-pick SKILL.md and would leave a carved skill's sections/ behind, 404ing a runtime 'Read sections/<name>.md':
- link_claude_skill_dirs: link the sections/ subdir via _link_or_copy (windows gets a fresh copy on every ./setup)
- kiro per-skill loop: sed-rewrite + copy each sections/* so paths resolve under ~/.kiro, not ~/.codex/~/.claude

codex/factory/opencode link the whole generated dir, so sections ride free. Addresses Codex outside-voice #4/#6 (runtime pathing landmine). Inert until a skill is carved. Static-tripwire test + windows-fallback invariant green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ship): gstack-version-bump CLI — tested idempotency classify + write (T9)

Hybrid CLI extraction (CM1): the deterministic core of ship Step 12 becomes a tested CLI instead of bash prose the agent re-derives each run.
- classify: FRESH/ALREADY_BUMPED/DRIFT_STALE_PKG/DRIFT_UNEXPECTED from VERSION vs origin/<base>:VERSION vs package.json.version (pure reader)
- write: validated dual-write to VERSION + package.json (FRESH bump)
- repair: DRIFT_STALE_PKG sync, no re-bump

Bump-LEVEL choice + queue collision stay agent judgment; slot pick stays bin/gstack-next-version. This removes the re-bump-a-shipped-branch footgun from skippable prose into code that can't be skipped or misread. 15 tests (exhaustive state matrix + write/repair fs + real-git classify).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(parity): sectioned-skill parity capability — guards the carve (T9)

Carved skills (skeleton + sections/*.md) need parity checks that see relocated content, or moving a phrase into a section reads as 'lost':
- readSkillForParity(): union skeleton + all sections/*.md
- checkSkillParity sectioned mode: content checks against the union; minBytes/maxSizeRatio against union bytes (total behavior preserved); maxSkeletonBytes asserts the always-loaded skeleton actually shrank. Lowering minBytes to fit a small skeleton would otherwise make the size floor toothless [Codex #12].

Built + tested BEFORE the carve so ship's invariant can flip to sectioned in the same commit it lands. Monolith path byte-identical (verified: pre-existing investigate 1.053 ratio drift fails the same with this change stashed). 7 sectioned-parity tests + existing parity tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(ship): carve into skeleton + on-demand sections (Claude) (T9)

ship/SKILL.md drops 167KB → 68.7KB (~59% of the always-loaded skill) by moving 8 prose-heavy steps into ship/sections/*.md, read on demand: tests, test-coverage, plan-completion, review-army, greptile, adversarial, changelog, pr-body. Step 12's version logic now calls the tested gstack-version-bump CLI instead of inline bash.

Claude-first (S2): {{SECTION:id}} emits a STOP-Read pointer on Claude (skeleton + generated section files) and INLINES the content on every other host, so external hosts keep the full monolith — verified factory at 162KB with no sections dir. {{SECTION_INDEX:ship}} renders the situation→section table from the PASSIVE manifest (CM2 / v2_PLAN.md:663); required-reads live only in test fixtures. Multi-pass resolve expands inlined sections' own resolvers.

Parity: ship invariant flipped to sectioned (union content checks + maxSkeletonBytes asserts the shrink). Carve-fallout fixed across gen-skill-docs/skill-validation/golden/plan-completion/#1539/size-budget tests via skeleton+sections union reads. Free suite green except the pre-existing investigate parity drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): manifest-consistency + context-parity + requiredReads helper (T9)

Free deterministic guards for the carve:
- required-reads.ts + unit test: assertRequiredReads(run, requiredFiles) — the mechanical layer-5 check that the agent Read the sections its situation needs (required set comes from the fixture, not the passive manifest)
- section-manifest-consistency: 3-tier orphan classification (generated orphan + hand-edited generated file → FAIL; manifest orphan → WARN per v2_PLAN.md) and pins the PASSIVE-manifest contract (no applies_when/required_for)
- template-context-parity: generated sections have zero unresolved placeholders and gated resolvers (ADVERSARIAL_STEP/CONFIDENCE_CALIBRATION/CHANGELOG_WORKFLOW) rendered — proving sections resolve with the parent skillName, not 'sections'

16 tests, all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): section-loading E2E + idempotency CLI detection (T9)

- skill-e2e-ship-section-loading.test.ts (new, periodic): runs real /ship in plan mode against a fresh version-changing fixture and asserts the agent Read the required sections (review-army + changelog). Runs against the INSTALLED skill (~/.claude/skills/gstack/ship), not repo paths, so install-layout 404s surface [Codex outside-voice #5]. Layer-5 mechanical guard against silent section-skip.
- skill-e2e-ship-idempotency.test.ts: detection updated for the carve — Step 12 now runs gstack-version-bump classify (JSON "state":"ALREADY_BUMPED") instead of the inline bash echo (STATE: ALREADY_BUMPED). Accept both; add a gstack-version-bump-write re-bump regression signal.
- touchfiles: register ship-section-loading (periodic) + extend idempotency deps with bin/gstack-version-bump + scripts/resolvers/sections.ts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): union-read redaction wiring test for the carve (T9)

main's PR-body redaction-at-sink lives in sections/pr-body.md.tmpl after the carve, not the skeleton template. Read skeleton + section templates union so the redaction-wiring assertions follow the relocated content. 9/9 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,57 @@
/**
 * Unit coverage for discoverSectionTemplates — the section-discovery half of the
 * v2 plan T9 pipeline. Drives it against a temp fixture tree so it doesn't
 * depend on which skills have been carved in the real repo.
 */

import { describe, test, expect, afterAll } from 'bun:test';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';
import { discoverSectionTemplates } from '../scripts/discover-skills';

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'sections-disc-'));
afterAll(() => { try { fs.rmSync(root, { recursive: true, force: true }); } catch { /* noop */ } });

// ship/ has two section templates + a non-template file; review/ has none;
// hidden + node_modules dirs must be skipped by the shared subdirs() filter.
fs.mkdirSync(path.join(root, 'ship', 'sections'), { recursive: true });
fs.writeFileSync(path.join(root, 'ship', 'SKILL.md.tmpl'), '---\nname: ship\n---\nbody');
fs.writeFileSync(path.join(root, 'ship', 'sections', 'version-bump.md.tmpl'), 'bump');
fs.writeFileSync(path.join(root, 'ship', 'sections', 'changelog.md.tmpl'), 'changelog');
fs.writeFileSync(path.join(root, 'ship', 'sections', 'manifest.json'), '{}'); // not a .md.tmpl
fs.mkdirSync(path.join(root, 'review'), { recursive: true });
fs.writeFileSync(path.join(root, 'review', 'SKILL.md.tmpl'), '---\nname: review\n---\nbody');
fs.mkdirSync(path.join(root, 'node_modules', 'sections'), { recursive: true });
fs.writeFileSync(path.join(root, 'node_modules', 'sections', 'x.md.tmpl'), 'nope');

describe('discoverSectionTemplates', () => {
  const found = discoverSectionTemplates(root);

  test('finds only *.md.tmpl files inside <skill>/sections/', () => {
    expect(found.map(f => f.tmpl)).toEqual([
      'ship/sections/changelog.md.tmpl',
      'ship/sections/version-bump.md.tmpl',
    ]);
  });

  test('strips .tmpl for the output path and records the owning skill dir', () => {
    const bump = found.find(f => f.tmpl.endsWith('version-bump.md.tmpl'))!;
    expect(bump.output).toBe('ship/sections/version-bump.md');
    expect(bump.skillDir).toBe('ship');
  });

  test('ignores non-template files (manifest.json) and skipped dirs (node_modules)', () => {
    expect(found.some(f => f.tmpl.includes('manifest.json'))).toBe(false);
    expect(found.some(f => f.tmpl.includes('node_modules'))).toBe(false);
  });

  test('returns deterministic (sorted) order', () => {
    const tmpls = found.map(f => f.tmpl);
    expect([...tmpls].sort()).toEqual(tmpls);
  });

  test('skills without a sections/ dir contribute nothing', () => {
    expect(found.some(f => f.skillDir === 'review')).toBe(false);
  });
});
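The behaviour these tests pin down can be sketched as a pure path filter. This is a hypothetical illustration, not the code in `scripts/discover-skills`: the name `selectSectionTemplates`, the `SectionTemplate` shape (fields inferred from the assertions above), and the skip rule for hidden and `node_modules` directories are all assumptions, and the real function walks the filesystem instead of taking a list of paths.

```typescript
// Hypothetical result shape, inferred from the fields the tests read.
interface SectionTemplate {
  tmpl: string;     // e.g. 'ship/sections/changelog.md.tmpl'
  output: string;   // same path with the trailing '.tmpl' removed
  skillDir: string; // owning skill directory, e.g. 'ship'
}

// Assumed skip list; the tests only show that node_modules is ignored.
const SKIPPED_DIRS = new Set(['node_modules']);

// Keeps only paths of the form <skill>/sections/<name>.md.tmpl, sorted.
function selectSectionTemplates(relPaths: string[]): SectionTemplate[] {
  const found: SectionTemplate[] = [];
  for (const rel of relPaths) {
    const parts = rel.split('/');
    if (parts.length !== 3 || parts[1] !== 'sections') continue;
    const skillDir = parts[0];
    const file = parts[2];
    if (skillDir.startsWith('.') || SKIPPED_DIRS.has(skillDir)) continue;
    if (!file.endsWith('.md.tmpl')) continue;
    found.push({ tmpl: rel, output: rel.slice(0, -'.tmpl'.length), skillDir });
  }
  // Code-unit ordering, the same ordering Array.prototype.sort() uses for strings.
  return found.sort((a, b) => (a.tmpl < b.tmpl ? -1 : a.tmpl > b.tmpl ? 1 : 0));
}
```

Keeping the filter separate from the directory walk is one way to make the rules testable without a temp fixture tree; the test file above takes the opposite approach and exercises the real filesystem.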
+68
-1927
File diff suppressed because it is too large
+39
-138
@@ -805,6 +805,10 @@ Only *actions* are idempotent:
- Step 19: If PR exists, update the body instead of creating a new PR
Never skip a verification step because a prior `/ship` run already performed it.

---



---

## Step 1: Pre-flight
@@ -2098,150 +2102,37 @@ If any learnings come back, name which one applies to the version bump or CHANGE

## Step 12: Version bump (auto-decide)

**Idempotency check:** Before bumping, classify the state by comparing `VERSION` against the base branch AND against `package.json`'s `version` field. Four states: FRESH (do bump), ALREADY_BUMPED (skip bump), DRIFT_STALE_PKG (sync pkg only, no re-bump), DRIFT_UNEXPECTED (stop and ask).

```bash
if ! git rev-parse --verify origin/<base> >/dev/null 2>&1; then
  echo "ERROR: Unable to resolve origin/<base>. Run 'git fetch origin' or verify the base branch exists."
  exit 1
fi

BASE_VERSION=$(git show origin/<base>:VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "0.0.0.0")
CURRENT_VERSION=$(cat VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "0.0.0.0")
[ -z "$BASE_VERSION" ] && BASE_VERSION="0.0.0.0"
[ -z "$CURRENT_VERSION" ] && CURRENT_VERSION="0.0.0.0"
PKG_VERSION=""
PKG_EXISTS=0
if [ -f package.json ]; then
  PKG_EXISTS=1
  if command -v node >/dev/null 2>&1; then
    PKG_VERSION=$(node -e 'const p=require("./package.json");process.stdout.write(p.version||"")' 2>/dev/null)
    PARSE_EXIT=$?
  elif command -v bun >/dev/null 2>&1; then
    PKG_VERSION=$(bun -e 'const p=require("./package.json");process.stdout.write(p.version||"")' 2>/dev/null)
    PARSE_EXIT=$?
  else
    echo "ERROR: package.json exists but neither node nor bun is available. Install one and re-run."
    exit 1
  fi
  if [ "$PARSE_EXIT" != "0" ]; then
    echo "ERROR: package.json is not valid JSON. Fix the file before re-running /ship."
    exit 1
  fi
fi
echo "BASE: $BASE_VERSION VERSION: $CURRENT_VERSION package.json: ${PKG_VERSION:-<none>}"

if [ "$CURRENT_VERSION" = "$BASE_VERSION" ]; then
  if [ "$PKG_EXISTS" = "1" ] && [ -n "$PKG_VERSION" ] && [ "$PKG_VERSION" != "$CURRENT_VERSION" ]; then
    echo "STATE: DRIFT_UNEXPECTED"
    echo "package.json version ($PKG_VERSION) disagrees with VERSION ($CURRENT_VERSION) while VERSION matches base."
    echo "This looks like a manual edit to package.json bypassing /ship. Reconcile manually, then re-run."
    exit 1
  fi
  echo "STATE: FRESH"
else
  if [ "$PKG_EXISTS" = "1" ] && [ -n "$PKG_VERSION" ] && [ "$PKG_VERSION" != "$CURRENT_VERSION" ]; then
    echo "STATE: DRIFT_STALE_PKG"
  else
    echo "STATE: ALREADY_BUMPED"
  fi
fi
```

Read the `STATE:` line and dispatch:

- **FRESH** → proceed with the bump action below (steps 1–4).
- **ALREADY_BUMPED** → skip the bump by default, BUT check for queue drift first: call `bin/gstack-next-version` with the implied bump level (derived from `CURRENT_VERSION` vs `BASE_VERSION`), compare its `.version` against `CURRENT_VERSION`. If they differ (queue moved since last ship), use **AskUserQuestion**: "VERSION drift detected: you claim v<CURRENT> but next available is v<NEW> (queue moved). A) Rebump to v<NEW> and rewrite CHANGELOG header + PR title (recommended), B) Keep v<CURRENT> — will be rejected by CI version-gate until resolved." If A, treat this as FRESH with `NEW_VERSION=<new>` and run steps 1-4 (which will also trigger Step 13 CHANGELOG header rewrite and Step 19 PR title rewrite). If B, reuse `CURRENT_VERSION` and warn that CI will likely reject. If util is offline, warn and reuse `CURRENT_VERSION`.
- **DRIFT_STALE_PKG** → a prior `/ship` bumped `VERSION` but failed to update `package.json`. Run the sync-only repair block below (after step 4). Do NOT re-bump. Reuse `CURRENT_VERSION` for CHANGELOG and PR body. (Queue check still runs in ALREADY_BUMPED terms after repair.)
- **DRIFT_UNEXPECTED** → `/ship` has halted (exit 1). Resolve manually; /ship cannot tell which file is authoritative.

1. Read the current `VERSION` file (4-digit format: `MAJOR.MINOR.PATCH.MICRO`)

2. **Auto-decide the bump level based on the diff:**
- Count lines changed (`git diff origin/<base>...HEAD --stat | tail -1`)
- Check for feature signals: new route/page files (e.g. `app/*/page.tsx`, `pages/*.ts`), new DB migration/schema files, new test files alongside new source files, or branch name starting with `feat/`
- **MICRO** (4th digit): < 50 lines changed, trivial tweaks, typos, config
- **PATCH** (3rd digit): 50+ lines changed, no feature signals detected
- **MINOR** (2nd digit): **ASK the user** if ANY feature signal is detected, OR 500+ lines changed, OR new modules/packages added
- **MAJOR** (1st digit): **ASK the user** — only for milestones or breaking changes

Save the chosen level as `BUMP_LEVEL` (one of `major`, `minor`, `patch`, `micro`). This is the user-intended level. The next step decides *placement* — the level stays the same even if queue-aware allocation has to advance past a claimed slot.

3. **Queue-aware version pick (workspace-aware ship, v1.6.4.0+).** Call `bin/gstack-next-version` to see what's already claimed by open PRs + active sibling Conductor worktrees, then render the queue state to the user:
The deterministic version-state logic is the tested **`gstack-version-bump`** CLI
(classify / write / repair). The bump-LEVEL decision and queue-collision handling
stay agent judgment; the slot pick stays `gstack-next-version`.

1. **Classify state** — pure reader, never writes:
```bash
QUEUE_JSON=$(bun run bin/gstack-next-version \
  --base <base> \
  --bump "$BUMP_LEVEL" \
  --current-version "$BASE_VERSION" 2>/dev/null || echo '{"offline":true}')
NEW_VERSION=$(echo "$QUEUE_JSON" | jq -r '.version // empty')
CLAIMED_COUNT=$(echo "$QUEUE_JSON" | jq -r '.claimed | length')
ACTIVE_SIBLING_COUNT=$(echo "$QUEUE_JSON" | jq -r '.active_siblings | length')
OFFLINE=$(echo "$QUEUE_JSON" | jq -r '.offline // false')
REASON=$(echo "$QUEUE_JSON" | jq -r '.reason // ""')
bun run $GSTACK_ROOT/bin/gstack-version-bump classify --base <base>
```
Read the JSON `state` and dispatch:
- **FRESH** → do the bump (steps 2-4).
- **ALREADY_BUMPED** → skip the bump, but run the queue-drift check (step 3) with the reported `currentVersion`. If the queue moved (next free version differs), **AskUserQuestion**: rebump to the new version (rewrites CHANGELOG header + PR title) or keep current (CI version-gate will reject until resolved).
- **DRIFT_STALE_PKG** → run `gstack-version-bump repair` (syncs package.json to VERSION). No re-bump; reuse `currentVersion` for CHANGELOG + PR.
- **DRIFT_UNEXPECTED** → **STOP**. package.json disagrees with VERSION while VERSION matches base — a manual edit bypassed /ship. Reconcile manually, then re-run.
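
The four states reduce to two comparisons: does `VERSION` match the base branch, and does `package.json` disagree with `VERSION`. A minimal sketch, assuming the CLI applies the same comparisons as the inline bash this step previously used; `classifyVersionState` and its signature are hypothetical names, not the `gstack-version-bump` source:

```typescript
type VersionState = 'FRESH' | 'ALREADY_BUMPED' | 'DRIFT_STALE_PKG' | 'DRIFT_UNEXPECTED';

// pkgVersion is null when there is no package.json (or it has no version field).
function classifyVersionState(
  baseVersion: string,
  currentVersion: string,
  pkgVersion: string | null,
): VersionState {
  const pkgDisagrees =
    pkgVersion !== null && pkgVersion !== '' && pkgVersion !== currentVersion;

  if (currentVersion === baseVersion) {
    // VERSION untouched on this branch: a differing package.json was edited by hand.
    return pkgDisagrees ? 'DRIFT_UNEXPECTED' : 'FRESH';
  }
  // VERSION already moved past base: a differing package.json is a half-finished bump.
  return pkgDisagrees ? 'DRIFT_STALE_PKG' : 'ALREADY_BUMPED';
}
```

Because the function only reads its three inputs, the full state matrix can be enumerated in a unit test without touching git or the filesystem.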

- If `OFFLINE=true` or the util fails (auth expired, no `gh`/`glab`, network): fall back to local `BUMP_LEVEL` arithmetic (bump `BASE_VERSION` at the chosen level). Print `⚠ workspace-aware ship offline — using local bump only`. Continue.
- If `CLAIMED_COUNT > 0`: render the queue table to the user so they can see landing order at a glance:
```
Queue on <base> (vBASE_VERSION):
  #<pr> <branch> → v<version> [⚠ collision with #<other>]
Active sibling workspaces (WIP, not yet PR'd):
  <path> → v<version> (committed Nh ago)
Your branch will claim: vNEW_VERSION (<reason>)
```
- If `ACTIVE_SIBLING_COUNT > 0` and any active sibling's VERSION is `>= NEW_VERSION`, use **AskUserQuestion**: "Sibling workspace <path> has v<X> committed <N>h ago but hasn't PR'd yet. Wait for them to ship first, or advance past? A) Advance past (recommended for unrelated work), B) Abort /ship and sync up with sibling first."
- Validate `NEW_VERSION` matches `MAJOR.MINOR.PATCH.MICRO`. If util returns an empty or malformed version, fall back to local bump.
2. **Decide the bump level** from the diff (agent judgment):
- **MICRO**: <50 lines, trivial tweaks/config. **PATCH**: 50+ lines, no feature signals.
- **MINOR**: **ASK** if any feature signal (new route/page, migration, new module), OR 500+ lines. **MAJOR**: **ASK** — milestones or breaking changes only.
Save as `BUMP_LEVEL`. The level is the user-intended bump; queue-aware placement may advance the slot without changing the level.
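
The thresholds above can be read as a small decision function. The skill leaves this to agent judgment, so the following is only an illustration of the stated cut-offs; `suggestBumpLevel`, `BumpSuggestion`, and the boolean `hasFeatureSignal` input are hypothetical names introduced for the sketch:

```typescript
type BumpLevel = 'major' | 'minor' | 'patch' | 'micro';

interface BumpSuggestion {
  level: BumpLevel;
  askUser: boolean; // true when the step says to ASK before committing to the level
}

// MAJOR is never suggested automatically: the step reserves it for
// milestones or breaking changes that the user confirms.
function suggestBumpLevel(linesChanged: number, hasFeatureSignal: boolean): BumpSuggestion {
  if (hasFeatureSignal || linesChanged >= 500) {
    return { level: 'minor', askUser: true };
  }
  if (linesChanged >= 50) {
    return { level: 'patch', askUser: false };
  }
  return { level: 'micro', askUser: false };
}
```

For example, a 120-line change with no feature signal maps to a PATCH suggestion without a prompt, while any feature signal routes through a question to the user regardless of size.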

4. **Validate** `NEW_VERSION` and write it to **both** `VERSION` and `package.json`. This block runs only when `STATE: FRESH`.
3. **Queue-aware pick** (workspace-aware ship):
```bash
QUEUE_JSON=$(bun run $GSTACK_ROOT/bin/gstack-next-version --base <base> --bump "$BUMP_LEVEL" --current-version "$BASE_VERSION" 2>/dev/null || echo '{"offline":true}')
NEW_VERSION=$(echo "$QUEUE_JSON" | jq -r '.version // empty')
```
If `offline`/util fails: fall back to local `BUMP_LEVEL` arithmetic and print `⚠ workspace-aware ship offline — using local bump only`. If `claimed` is non-empty, render the queue table so the user sees landing order. If an active sibling workspace holds a version `>= NEW_VERSION`, **AskUserQuestion**: advance past (unrelated work) or abort and sync with the sibling.
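
The offline fallback ("local `BUMP_LEVEL` arithmetic") is not spelled out in this step. One plausible reading, sketched here under a labeled assumption: the chosen component is incremented and every component to its right resets to zero. `bumpLocally` is a hypothetical helper, not part of `gstack-next-version`:

```typescript
type BumpLevel = 'major' | 'minor' | 'patch' | 'micro';

const VERSION_PATTERN = /^\d+\.\d+\.\d+\.\d+$/;
const LEVEL_INDEX: Record<BumpLevel, number> = { major: 0, minor: 1, patch: 2, micro: 3 };

function bumpLocally(baseVersion: string, level: BumpLevel): string {
  if (!VERSION_PATTERN.test(baseVersion)) {
    throw new Error(`not MAJOR.MINOR.PATCH.MICRO: ${baseVersion}`);
  }
  const parts = baseVersion.split('.').map(Number);
  const index = LEVEL_INDEX[level];
  // Assumption: components to the right of the bumped one reset to zero.
  const bumped = parts.map((value, i) => (i < index ? value : i === index ? value + 1 : 0));
  return bumped.join('.');
}
```

Under that assumption, a `minor` bump of `1.53.2.7` yields `1.54.0.0` and a `micro` bump yields `1.53.2.8`. A local bump cannot see slots claimed by other open PRs, which is why the step prints a warning when it falls back.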

```bash
if ! printf '%s' "$NEW_VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
  echo "ERROR: NEW_VERSION ($NEW_VERSION) does not match MAJOR.MINOR.PATCH.MICRO pattern. Aborting."
  exit 1
fi
echo "$NEW_VERSION" > VERSION
if [ -f package.json ]; then
  if command -v node >/dev/null 2>&1; then
    node -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$NEW_VERSION" || {
      echo "ERROR: failed to update package.json. VERSION was written but package.json is now stale. Fix and re-run — the new idempotency check will detect the drift."
      exit 1
    }
  elif command -v bun >/dev/null 2>&1; then
    bun -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$NEW_VERSION" || {
      echo "ERROR: failed to update package.json. VERSION was written but package.json is now stale."
      exit 1
    }
  else
    echo "ERROR: package.json exists but neither node nor bun is available."
    exit 1
  fi
fi
```

**DRIFT_STALE_PKG repair path** — runs when idempotency reports `STATE: DRIFT_STALE_PKG`. No re-bump; sync `package.json.version` to the current `VERSION` and continue. Reuse `CURRENT_VERSION` for CHANGELOG and PR body.

```bash
REPAIR_VERSION=$(cat VERSION | tr -d '\r\n[:space:]')
if ! printf '%s' "$REPAIR_VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
  echo "ERROR: VERSION file contents ($REPAIR_VERSION) do not match MAJOR.MINOR.PATCH.MICRO pattern. Refusing to propagate invalid semver into package.json. Fix VERSION manually, then re-run /ship."
  exit 1
fi
if command -v node >/dev/null 2>&1; then
  node -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$REPAIR_VERSION" || {
    echo "ERROR: drift repair failed — could not update package.json."
    exit 1
  }
else
  bun -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$REPAIR_VERSION" || {
    echo "ERROR: drift repair failed."
    exit 1
  }
fi
echo "Drift repaired: package.json synced to $REPAIR_VERSION. No version bump performed."
```

---
4. **Write the bump** (FRESH, or an approved rebump):
```bash
bun run $GSTACK_ROOT/bin/gstack-version-bump write --version "$NEW_VERSION"
```
The CLI validates the 4-digit `MAJOR.MINOR.PATCH.MICRO` pattern and writes **both** VERSION and package.json. On a half-write (VERSION written, package.json failed) it exits 3 — re-run, and classify will report DRIFT_STALE_PKG for `repair` to fix.
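
The half-write contract can be illustrated with the two writes injected as functions, so the failure path is easy to exercise. This is a sketch, not the CLI's implementation: only exit code 3 comes from the text above, while `writeBump`, the `VersionWriters` interface, and exit code 1 for an invalid version are assumptions:

```typescript
const FOUR_PART_VERSION = /^\d+\.\d+\.\d+\.\d+$/;

// Hypothetical seam so the sketch does not touch the real filesystem.
interface VersionWriters {
  writeVersionFile(version: string): void;
  writePackageJson(version: string): void;
}

// Returns a process-style exit code.
function writeBump(newVersion: string, writers: VersionWriters): number {
  if (!FOUR_PART_VERSION.test(newVersion)) {
    return 1; // assumed code for a malformed version; nothing is written
  }
  writers.writeVersionFile(newVersion);
  try {
    writers.writePackageJson(newVersion);
  } catch {
    // VERSION is already on disk, package.json is stale: the half-write case.
    return 3;
  }
  return 0;
}
```

Writing `VERSION` first means an interrupted run leaves a state that a later classify call can recognise as DRIFT_STALE_PKG, rather than one it cannot distinguish from a manual edit.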

## Step 13: CHANGELOG (auto-generate)

@@ -2746,6 +2637,16 @@ no-op. The marker guarantees at-most-once per machine. To re-enable:

---

## Section self-check (before you finish)

You ran a carved skill. For your situation, list every section the Section index
named as applying, and confirm you issued a Read for each one. If you executed any
of those steps from memory without reading its section, you skipped the source of
truth — STOP, Read it now, and redo that step. Deterministic version work goes
through `gstack-version-bump`; never hand-roll the VERSION/package.json write.
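
The same self-check can be done mechanically over a transcript. The sketch below assumes the list of file paths the agent read has already been extracted from the run; `missingSectionReads` is a hypothetical helper written for illustration and is not the repository's `assertRequiredReads`:

```typescript
// Returns the required section files (e.g. 'changelog.md') that never
// appear as a read of some .../sections/<name>.md path.
function missingSectionReads(readPaths: string[], requiredSections: string[]): string[] {
  const sectionsRead = new Set<string>();
  for (const p of readPaths) {
    const match = /(?:^|\/)sections\/([^/]+\.md)$/.exec(p);
    if (match) sectionsRead.add(match[1]);
  }
  return requiredSections.filter(name => !sectionsRead.has(name));
}
```

An empty result means every required section was opened at least once; a non-empty result names the sections that were skipped.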

---

## Important Rules

- **Never skip tests.** If tests fail, stop.

+39
-138
@@ -807,6 +807,10 @@ Only *actions* are idempotent:
- Step 19: If PR exists, update the body instead of creating a new PR
Never skip a verification step because a prior `/ship` run already performed it.

---



---

## Step 1: Pre-flight
@@ -2476,150 +2480,37 @@ If any learnings come back, name which one applies to the version bump or CHANGE

## Step 12: Version bump (auto-decide)

**Idempotency check:** Before bumping, classify the state by comparing `VERSION` against the base branch AND against `package.json`'s `version` field. Four states: FRESH (do bump), ALREADY_BUMPED (skip bump), DRIFT_STALE_PKG (sync pkg only, no re-bump), DRIFT_UNEXPECTED (stop and ask).

```bash
if ! git rev-parse --verify origin/<base> >/dev/null 2>&1; then
  echo "ERROR: Unable to resolve origin/<base>. Run 'git fetch origin' or verify the base branch exists."
  exit 1
fi

BASE_VERSION=$(git show origin/<base>:VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "0.0.0.0")
CURRENT_VERSION=$(cat VERSION 2>/dev/null | tr -d '\r\n[:space:]' || echo "0.0.0.0")
[ -z "$BASE_VERSION" ] && BASE_VERSION="0.0.0.0"
[ -z "$CURRENT_VERSION" ] && CURRENT_VERSION="0.0.0.0"
PKG_VERSION=""
PKG_EXISTS=0
if [ -f package.json ]; then
  PKG_EXISTS=1
  if command -v node >/dev/null 2>&1; then
    PKG_VERSION=$(node -e 'const p=require("./package.json");process.stdout.write(p.version||"")' 2>/dev/null)
    PARSE_EXIT=$?
  elif command -v bun >/dev/null 2>&1; then
    PKG_VERSION=$(bun -e 'const p=require("./package.json");process.stdout.write(p.version||"")' 2>/dev/null)
    PARSE_EXIT=$?
  else
    echo "ERROR: package.json exists but neither node nor bun is available. Install one and re-run."
    exit 1
  fi
  if [ "$PARSE_EXIT" != "0" ]; then
    echo "ERROR: package.json is not valid JSON. Fix the file before re-running /ship."
    exit 1
  fi
fi
echo "BASE: $BASE_VERSION VERSION: $CURRENT_VERSION package.json: ${PKG_VERSION:-<none>}"

if [ "$CURRENT_VERSION" = "$BASE_VERSION" ]; then
  if [ "$PKG_EXISTS" = "1" ] && [ -n "$PKG_VERSION" ] && [ "$PKG_VERSION" != "$CURRENT_VERSION" ]; then
    echo "STATE: DRIFT_UNEXPECTED"
    echo "package.json version ($PKG_VERSION) disagrees with VERSION ($CURRENT_VERSION) while VERSION matches base."
    echo "This looks like a manual edit to package.json bypassing /ship. Reconcile manually, then re-run."
    exit 1
  fi
  echo "STATE: FRESH"
else
  if [ "$PKG_EXISTS" = "1" ] && [ -n "$PKG_VERSION" ] && [ "$PKG_VERSION" != "$CURRENT_VERSION" ]; then
    echo "STATE: DRIFT_STALE_PKG"
  else
    echo "STATE: ALREADY_BUMPED"
  fi
fi
```

Read the `STATE:` line and dispatch:

- **FRESH** → proceed with the bump action below (steps 1–4).
- **ALREADY_BUMPED** → skip the bump by default, BUT check for queue drift first: call `bin/gstack-next-version` with the implied bump level (derived from `CURRENT_VERSION` vs `BASE_VERSION`), compare its `.version` against `CURRENT_VERSION`. If they differ (queue moved since last ship), use **AskUserQuestion**: "VERSION drift detected: you claim v<CURRENT> but next available is v<NEW> (queue moved). A) Rebump to v<NEW> and rewrite CHANGELOG header + PR title (recommended), B) Keep v<CURRENT> — will be rejected by CI version-gate until resolved." If A, treat this as FRESH with `NEW_VERSION=<new>` and run steps 1-4 (which will also trigger Step 13 CHANGELOG header rewrite and Step 19 PR title rewrite). If B, reuse `CURRENT_VERSION` and warn that CI will likely reject. If util is offline, warn and reuse `CURRENT_VERSION`.
- **DRIFT_STALE_PKG** → a prior `/ship` bumped `VERSION` but failed to update `package.json`. Run the sync-only repair block below (after step 4). Do NOT re-bump. Reuse `CURRENT_VERSION` for CHANGELOG and PR body. (Queue check still runs in ALREADY_BUMPED terms after repair.)
- **DRIFT_UNEXPECTED** → `/ship` has halted (exit 1). Resolve manually; /ship cannot tell which file is authoritative.

1. Read the current `VERSION` file (4-digit format: `MAJOR.MINOR.PATCH.MICRO`)

2. **Auto-decide the bump level based on the diff:**
- Count lines changed (`git diff origin/<base>...HEAD --stat | tail -1`)
- Check for feature signals: new route/page files (e.g. `app/*/page.tsx`, `pages/*.ts`), new DB migration/schema files, new test files alongside new source files, or branch name starting with `feat/`
- **MICRO** (4th digit): < 50 lines changed, trivial tweaks, typos, config
- **PATCH** (3rd digit): 50+ lines changed, no feature signals detected
- **MINOR** (2nd digit): **ASK the user** if ANY feature signal is detected, OR 500+ lines changed, OR new modules/packages added
- **MAJOR** (1st digit): **ASK the user** — only for milestones or breaking changes

Save the chosen level as `BUMP_LEVEL` (one of `major`, `minor`, `patch`, `micro`). This is the user-intended level. The next step decides *placement* — the level stays the same even if queue-aware allocation has to advance past a claimed slot.

3. **Queue-aware version pick (workspace-aware ship, v1.6.4.0+).** Call `bin/gstack-next-version` to see what's already claimed by open PRs + active sibling Conductor worktrees, then render the queue state to the user:
The deterministic version-state logic is the tested **`gstack-version-bump`** CLI
(classify / write / repair). The bump-LEVEL decision and queue-collision handling
stay agent judgment; the slot pick stays `gstack-next-version`.

1. **Classify state** — pure reader, never writes:
```bash
QUEUE_JSON=$(bun run bin/gstack-next-version \
  --base <base> \
  --bump "$BUMP_LEVEL" \
  --current-version "$BASE_VERSION" 2>/dev/null || echo '{"offline":true}')
NEW_VERSION=$(echo "$QUEUE_JSON" | jq -r '.version // empty')
CLAIMED_COUNT=$(echo "$QUEUE_JSON" | jq -r '.claimed | length')
ACTIVE_SIBLING_COUNT=$(echo "$QUEUE_JSON" | jq -r '.active_siblings | length')
OFFLINE=$(echo "$QUEUE_JSON" | jq -r '.offline // false')
REASON=$(echo "$QUEUE_JSON" | jq -r '.reason // ""')
bun run $GSTACK_ROOT/bin/gstack-version-bump classify --base <base>
```
Read the JSON `state` and dispatch:
- **FRESH** → do the bump (steps 2-4).
- **ALREADY_BUMPED** → skip the bump, but run the queue-drift check (step 3) with the reported `currentVersion`. If the queue moved (next free version differs), **AskUserQuestion**: rebump to the new version (rewrites CHANGELOG header + PR title) or keep current (CI version-gate will reject until resolved).
- **DRIFT_STALE_PKG** → run `gstack-version-bump repair` (syncs package.json to VERSION). No re-bump; reuse `currentVersion` for CHANGELOG + PR.
- **DRIFT_UNEXPECTED** → **STOP**. package.json disagrees with VERSION while VERSION matches base — a manual edit bypassed /ship. Reconcile manually, then re-run.
|
||||
|
||||
- If `OFFLINE=true` or the util fails (auth expired, no `gh`/`glab`, network): fall back to local `BUMP_LEVEL` arithmetic (bump `BASE_VERSION` at the chosen level). Print `⚠ workspace-aware ship offline — using local bump only`. Continue.
|
||||
- If `CLAIMED_COUNT > 0`: render the queue table to the user so they can see landing order at a glance:
|
||||
```
|
||||
Queue on <base> (vBASE_VERSION):
|
||||
#<pr> <branch> → v<version> [⚠ collision with #<other>]
|
||||
Active sibling workspaces (WIP, not yet PR'd):
|
||||
<path> → v<version> (committed Nh ago)
|
||||
Your branch will claim: vNEW_VERSION (<reason>)
|
||||
```
|
||||
- If `ACTIVE_SIBLING_COUNT > 0` and any active sibling's VERSION is `>= NEW_VERSION`, use **AskUserQuestion**: "Sibling workspace <path> has v<X> committed <N>h ago but hasn't PR'd yet. Wait for them to ship first, or advance past? A) Advance past (recommended for unrelated work), B) Abort /ship and sync up with sibling first."
|
||||
- Validate `NEW_VERSION` matches `MAJOR.MINOR.PATCH.MICRO`. If util returns an empty or malformed version, fall back to local bump.
|
||||
2. **Decide the bump level** from the diff (agent judgment):
|
||||
- **MICRO**: <50 lines, trivial tweaks/config. **PATCH**: 50+ lines, no feature signals.
- **MINOR**: **ASK** if any feature signal (new route/page, migration, new module), OR 500+ lines. **MAJOR**: **ASK** — milestones or breaking changes only.
Save as `BUMP_LEVEL`. The level is the user-intended bump; queue-aware placement may advance the slot without changing the level.
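The thresholds above can be read as a starting suggestion that the agent then confirms or overrides. A rough sketch, assuming the only inputs are a changed-line count and a boolean feature-signal flag; `suggestBumpLevel` and `BumpSuggestion` are hypothetical names, and this deliberately ignores the "trivial tweaks/config" nuance, which stays a judgment call.

```typescript
type BumpLevel = 'micro' | 'patch' | 'minor' | 'major';

interface BumpSuggestion {
  level: BumpLevel;
  ask: boolean; // true when the user should be asked before the level is used
}

// Hypothetical helper: maps diff size + feature signals to a suggested level.
// MAJOR is never suggested automatically; it is reserved for milestones or
// breaking changes and always requires asking.
function suggestBumpLevel(linesChanged: number, hasFeatureSignal: boolean): BumpSuggestion {
  if (hasFeatureSignal || linesChanged >= 500) {
    return { level: 'minor', ask: true };
  }
  if (linesChanged >= 50) {
    return { level: 'patch', ask: false };
  }
  return { level: 'micro', ask: false };
}

console.log(suggestBumpLevel(120, false)); // level 'patch', ask false
```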
|
||||
|
||||
4. **Validate** `NEW_VERSION` and write it to **both** `VERSION` and `package.json`. This block runs only when `STATE: FRESH`.
|
||||
3. **Queue-aware pick** (workspace-aware ship):
|
||||
```bash
QUEUE_JSON=$(bun run $GSTACK_ROOT/bin/gstack-next-version --base <base> --bump "$BUMP_LEVEL" --current-version "$BASE_VERSION" 2>/dev/null || echo '{"offline":true}')
NEW_VERSION=$(echo "$QUEUE_JSON" | jq -r '.version // empty')
```
|
||||
If `offline`/util fails: fall back to local `BUMP_LEVEL` arithmetic and print `⚠ workspace-aware ship offline — using local bump only`. If `claimed` is non-empty, render the queue table so the user sees landing order. If an active sibling workspace holds a version `>= NEW_VERSION`, **AskUserQuestion**: advance past (unrelated work) or abort and sync with the sibling.
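The offline fallback is plain arithmetic on the four-part version. A minimal sketch, assuming that bumping a level resets every lower component to zero (the conventional rule; the source does not spell it out). `localBump` and `VERSION_PATTERN` are hypothetical names.

```typescript
type BumpLevel = 'major' | 'minor' | 'patch' | 'micro';

const LEVELS: BumpLevel[] = ['major', 'minor', 'patch', 'micro'];
const VERSION_PATTERN = /^\d+\.\d+\.\d+\.\d+$/;

// Hypothetical local fallback: bump BASE_VERSION at the chosen level.
// Assumption: components below the bumped one reset to 0.
function localBump(baseVersion: string, level: BumpLevel): string {
  if (!VERSION_PATTERN.test(baseVersion)) {
    throw new Error(`malformed version: ${baseVersion}`);
  }
  const index = LEVELS.indexOf(level);
  return baseVersion
    .split('.')
    .map(Number)
    .map((n, i) => (i < index ? n : i === index ? n + 1 : 0))
    .join('.');
}

console.log(localBump('1.2.3.4', 'minor')); // 1.3.0.0
```

This path has no view of claimed slots, so a collision with an open PR can only surface later, for example at the CI version gate.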
|
||||
|
||||
```bash
|
||||
if ! printf '%s' "$NEW_VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
|
||||
echo "ERROR: NEW_VERSION ($NEW_VERSION) does not match MAJOR.MINOR.PATCH.MICRO pattern. Aborting."
|
||||
exit 1
|
||||
fi
|
||||
echo "$NEW_VERSION" > VERSION
|
||||
if [ -f package.json ]; then
|
||||
if command -v node >/dev/null 2>&1; then
|
||||
node -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$NEW_VERSION" || {
|
||||
echo "ERROR: failed to update package.json. VERSION was written but package.json is now stale. Fix and re-run — the new idempotency check will detect the drift."
|
||||
exit 1
|
||||
}
|
||||
elif command -v bun >/dev/null 2>&1; then
|
||||
bun -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$NEW_VERSION" || {
|
||||
echo "ERROR: failed to update package.json. VERSION was written but package.json is now stale."
|
||||
exit 1
|
||||
}
|
||||
else
|
||||
echo "ERROR: package.json exists but neither node nor bun is available."
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
```
|
||||
|
||||
**DRIFT_STALE_PKG repair path** — runs when idempotency reports `STATE: DRIFT_STALE_PKG`. No re-bump; sync `package.json.version` to the current `VERSION` and continue. Reuse `CURRENT_VERSION` for CHANGELOG and PR body.
|
||||
|
||||
```bash
|
||||
REPAIR_VERSION=$(cat VERSION | tr -d '\r\n[:space:]')
|
||||
if ! printf '%s' "$REPAIR_VERSION" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
|
||||
echo "ERROR: VERSION file contents ($REPAIR_VERSION) do not match MAJOR.MINOR.PATCH.MICRO pattern. Refusing to propagate invalid semver into package.json. Fix VERSION manually, then re-run /ship."
|
||||
exit 1
|
||||
fi
|
||||
if command -v node >/dev/null 2>&1; then
|
||||
node -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$REPAIR_VERSION" || {
|
||||
echo "ERROR: drift repair failed — could not update package.json."
|
||||
exit 1
|
||||
}
|
||||
else
|
||||
bun -e 'const fs=require("fs"),p=require("./package.json");p.version=process.argv[1];fs.writeFileSync("package.json",JSON.stringify(p,null,2)+"\n")' "$REPAIR_VERSION" || {
|
||||
echo "ERROR: drift repair failed."
|
||||
exit 1
|
||||
}
|
||||
fi
|
||||
echo "Drift repaired: package.json synced to $REPAIR_VERSION. No version bump performed."
|
||||
```
|
||||
|
||||
---
|
||||
4. **Write the bump** (FRESH, or an approved rebump):
|
||||
```bash
bun run $GSTACK_ROOT/bin/gstack-version-bump write --version "$NEW_VERSION"
```
|
||||
The CLI validates the four-part `MAJOR.MINOR.PATCH.MICRO` pattern and writes **both** VERSION and package.json. On a half-write (VERSION written, package.json failed) it exits 3. Re-run, and classify will report DRIFT_STALE_PKG for `repair` to fix.
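The package.json half of the write can be pictured as a pure text transform: validate, parse, set `version`, and re-serialize with two-space indentation and a trailing newline, the same shape as the `JSON.stringify(p, null, 2) + "\n"` one-liner this CLI replaces. A sketch under those assumptions; `renderBumpedPackageJson` is a hypothetical name, and the real CLI's error handling and exit codes are not modeled here.

```typescript
const VERSION_PATTERN = /^\d+\.\d+\.\d+\.\d+$/;

// Hypothetical helper: returns the new package.json text for a validated version.
// All other fields are preserved; only `version` changes.
function renderBumpedPackageJson(packageJsonText: string, newVersion: string): string {
  if (!VERSION_PATTERN.test(newVersion)) {
    throw new Error(`version ${newVersion} does not match MAJOR.MINOR.PATCH.MICRO`);
  }
  const pkg = JSON.parse(packageJsonText) as Record<string, unknown>;
  pkg['version'] = newVersion;
  return JSON.stringify(pkg, null, 2) + '\n';
}

const before = JSON.stringify({ name: 'x', version: '1.0.0.0', scripts: { t: 'y' } }, null, 2) + '\n';
console.log(renderBumpedPackageJson(before, '1.1.0.0'));
```

Validating before any file is touched means a malformed version can never produce the half-write state.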
|
||||
|
||||
## Step 13: CHANGELOG (auto-generate)
|
||||
|
||||
@@ -3124,6 +3015,16 @@ no-op. The marker guarantees at-most-once per machine. To re-enable:
|
||||
|
||||
---
|
||||
|
||||
## Section self-check (before you finish)
|
||||
|
||||
You ran a carved skill. For your situation, list every section the Section index
named as applying, and confirm you issued a Read for each one. If you executed any
of those steps from memory without reading its section, you skipped the source of
truth — STOP, Read it now, and redo that step. Deterministic version work goes
through `gstack-version-bump`; never hand-roll the VERSION/package.json write.
|
||||
|
||||
---
|
||||
|
||||
## Important Rules
|
||||
|
||||
- **Never skip tests.** If tests fail, stop.
|
||||
|
||||
+48
-25
@@ -8,6 +8,24 @@ import * as os from 'os';
|
||||
const ROOT = path.resolve(import.meta.dir, '..');
|
||||
const MAX_SKILL_DESCRIPTION_LENGTH = 1024;
|
||||
|
||||
// Carved-skill aware (v2 plan T9): ship is now a skeleton SKILL.md + sections/*.md.
// Read the union so assertions about content that MOVED into a section still pass.
// The skeleton is a subset of the union, so skeleton-only assertions also hold,
// and negative assertions stay safe (the absent phrases live in neither file).
function readSkillUnion(skill: string): string {
  let t = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
  const secDir = path.join(ROOT, skill, 'sections');
  if (fs.existsSync(secDir)) {
    for (const f of fs.readdirSync(secDir).sort()) {
      if (f.endsWith('.md')) t += '\n' + fs.readFileSync(path.join(secDir, f), 'utf-8');
    }
  }
  return t;
}
function readShipUnion(): string {
  return readSkillUnion('ship');
}
|
||||
|
||||
function extractDescription(content: string): string {
|
||||
const fmEnd = content.indexOf('\n---', 4);
|
||||
expect(fmEnd).toBeGreaterThan(0);
|
||||
@@ -485,7 +503,7 @@ describe('gen-skill-docs', () => {
|
||||
|
||||
describe('BASE_BRANCH_DETECT resolver', () => {
|
||||
// Find a generated SKILL.md that uses the placeholder (ship is guaranteed to)
|
||||
const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipContent = readShipUnion();
|
||||
|
||||
test('resolver output contains PR base detection command', () => {
|
||||
expect(shipContent).toContain('gh pr view --json baseRefName');
|
||||
@@ -518,7 +536,7 @@ describe('BASE_BRANCH_DETECT resolver', () => {
|
||||
|
||||
describe('GitLab support in generated skills', () => {
|
||||
const retroContent = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
||||
const shipSkillContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipSkillContent = readShipUnion();
|
||||
|
||||
test('retro contains GitLab MR number extraction', () => {
|
||||
expect(retroContent).toContain('[#!]');
|
||||
@@ -634,13 +652,13 @@ describe('REVIEW_DASHBOARD resolver', () => {
|
||||
}
|
||||
|
||||
test('review dashboard appears in ship generated file', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('reviews.jsonl');
|
||||
expect(content).toContain('REVIEW READINESS DASHBOARD');
|
||||
});
|
||||
|
||||
test('dashboard treats review as a valid Eng Review source', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('plan-eng-review, review, plan-design-review');
|
||||
expect(content).toContain('`review` (diff-scoped pre-landing review)');
|
||||
expect(content).toContain('`plan-eng-review` (plan-stage architecture review)');
|
||||
@@ -708,7 +726,7 @@ describe('REVIEW_DASHBOARD resolver', () => {
|
||||
});
|
||||
|
||||
test('ship does NOT contain review chaining', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).not.toContain('Review Chaining');
|
||||
});
|
||||
});
|
||||
@@ -717,7 +735,7 @@ describe('REVIEW_DASHBOARD resolver', () => {
|
||||
|
||||
describe('TEST_COVERAGE_AUDIT placeholders', () => {
|
||||
const planSkill = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');
|
||||
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipSkill = readShipUnion();
|
||||
const reviewSkill = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
||||
|
||||
test('plan and ship modes share codepath tracing methodology', () => {
|
||||
@@ -874,7 +892,7 @@ describe('TEST_COVERAGE_AUDIT placeholders', () => {
|
||||
// --- {{TEST_FAILURE_TRIAGE}} resolver tests ---
|
||||
|
||||
describe('TEST_FAILURE_TRIAGE resolver', () => {
|
||||
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipSkill = readShipUnion();
|
||||
|
||||
test('contains all 4 triage steps', () => {
|
||||
expect(shipSkill).toContain('Step T1: Classify each failure');
|
||||
@@ -938,7 +956,7 @@ describe('PLAN_FILE_REVIEW_REPORT resolver', () => {
|
||||
// --- {{PLAN_COMPLETION_AUDIT}} resolver tests ---
|
||||
|
||||
describe('PLAN_COMPLETION_AUDIT placeholders', () => {
|
||||
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipSkill = readShipUnion();
|
||||
const reviewSkill = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
||||
|
||||
test('ship SKILL.md contains plan completion audit step', () => {
|
||||
@@ -989,7 +1007,7 @@ describe('PLAN_COMPLETION_AUDIT placeholders', () => {
|
||||
// --- {{PLAN_VERIFICATION_EXEC}} resolver tests ---
|
||||
|
||||
describe('PLAN_VERIFICATION_EXEC placeholder', () => {
|
||||
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipSkill = readShipUnion();
|
||||
|
||||
test('ship SKILL.md contains plan verification step', () => {
|
||||
expect(shipSkill).toContain('Step 8.1');
|
||||
@@ -1018,7 +1036,7 @@ describe('PLAN_VERIFICATION_EXEC placeholder', () => {
|
||||
// --- Coverage gate tests ---
|
||||
|
||||
describe('Coverage gate in ship', () => {
|
||||
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipSkill = readShipUnion();
|
||||
const reviewSkill = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
||||
|
||||
test('ship SKILL.md contains coverage gate with thresholds', () => {
|
||||
@@ -1047,7 +1065,7 @@ describe('Coverage gate in ship', () => {
|
||||
// --- Ship metrics logging ---
|
||||
|
||||
describe('Ship metrics logging', () => {
|
||||
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipSkill = readShipUnion();
|
||||
|
||||
test('ship SKILL.md contains metrics persistence step', () => {
|
||||
expect(shipSkill).toContain('Step 20');
|
||||
@@ -1063,7 +1081,7 @@ describe('Ship metrics logging', () => {
|
||||
describe('Plan file discovery shared helper', () => {
|
||||
// The shared helper should appear in ship (via PLAN_COMPLETION_AUDIT_SHIP)
|
||||
// and in review (via PLAN_COMPLETION_AUDIT_REVIEW)
|
||||
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipSkill = readShipUnion();
|
||||
const reviewSkill = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
||||
|
||||
test('plan file discovery appears in both ship and review', () => {
|
||||
@@ -1276,7 +1294,8 @@ describe('Codex filesystem boundary', () => {
|
||||
|
||||
test('boundary instruction appears in all skills that call codex', () => {
|
||||
for (const skill of CODEX_CALLING_SKILLS) {
|
||||
const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
|
||||
// Union: ship's codex call lives in sections/adversarial.md after the carve.
|
||||
const content = readSkillUnion(skill);
|
||||
expect(content).toContain(BOUNDARY_MARKER);
|
||||
}
|
||||
});
|
||||
@@ -1393,7 +1412,7 @@ describe('INVOKE_SKILL resolver', () => {
|
||||
// --- {{CHANGELOG_WORKFLOW}} resolver tests ---
|
||||
|
||||
describe('CHANGELOG_WORKFLOW resolver', () => {
|
||||
const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipContent = readShipUnion();
|
||||
|
||||
test('ship SKILL.md contains changelog workflow', () => {
|
||||
expect(shipContent).toContain('CHANGELOG (auto-generate)');
|
||||
@@ -1410,10 +1429,13 @@ describe('CHANGELOG_WORKFLOW resolver', () => {
|
||||
});
|
||||
|
||||
test('template uses {{CHANGELOG_WORKFLOW}} placeholder', () => {
|
||||
const tmpl = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md.tmpl'), 'utf-8');
|
||||
expect(tmpl).toContain('{{CHANGELOG_WORKFLOW}}');
|
||||
// Should NOT contain the old inline changelog content
|
||||
expect(tmpl).not.toContain('Group commits by theme');
|
||||
// Post-carve (T9): the skeleton points to the changelog section, which carries
|
||||
// the resolver. Neither should inline the old changelog content.
|
||||
const skel = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md.tmpl'), 'utf-8');
|
||||
const changelogSection = fs.readFileSync(path.join(ROOT, 'ship', 'sections', 'changelog.md.tmpl'), 'utf-8');
|
||||
expect(skel).toContain('{{SECTION:changelog}}');
|
||||
expect(changelogSection).toContain('{{CHANGELOG_WORKFLOW}}');
|
||||
expect(skel + changelogSection).not.toContain('Group commits by theme');
|
||||
});
|
||||
|
||||
test('changelog workflow includes keep-changelog format', () => {
|
||||
@@ -1450,7 +1472,7 @@ describe('parameterized resolver support', () => {
|
||||
// --- Preamble routing injection tests ---
|
||||
|
||||
describe('preamble routing injection', () => {
|
||||
const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipContent = readShipUnion();
|
||||
|
||||
test('preamble bash checks for routing section in CLAUDE.md', () => {
|
||||
expect(shipContent).toContain('grep -q "## Skill routing" CLAUDE.md');
|
||||
@@ -1594,7 +1616,7 @@ describe('DESIGN_SKETCH extended with outside voices', () => {
|
||||
// --- Extended DESIGN_REVIEW_LITE resolver tests ---
|
||||
|
||||
describe('DESIGN_REVIEW_LITE extended with Codex', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
|
||||
test('contains Codex design voice block', () => {
|
||||
expect(content).toContain('Codex design voice');
|
||||
@@ -1897,7 +1919,7 @@ describe('Codex generation (--host codex)', () => {
|
||||
});
|
||||
|
||||
test('Claude output unchanged: ship skill still uses .claude/skills/ paths', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('~/.claude/skills/gstack');
|
||||
expect(content).not.toContain('.agents/skills');
|
||||
expect(content).not.toContain('~/.codex/');
|
||||
@@ -2586,7 +2608,7 @@ describe('community fixes wave', () => {
|
||||
|
||||
// #573 — Feature signals: ship/SKILL.md contains feature signal detection
|
||||
test('ship/SKILL.md contains feature signal detection in Step 4', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content.toLowerCase()).toContain('feature signal');
|
||||
});
|
||||
|
||||
@@ -2736,7 +2758,8 @@ describe('codex commands must not use inline $(git rev-parse --show-toplevel) fo
|
||||
];
|
||||
|
||||
for (const rel of checkedFiles) {
|
||||
const content = fs.readFileSync(path.join(ROOT, rel), 'utf-8');
|
||||
// ship's codex/adversarial command moved into sections/adversarial.md (T9 carve).
|
||||
const content = rel === 'ship/SKILL.md' ? readShipUnion() : fs.readFileSync(path.join(ROOT, rel), 'utf-8');
|
||||
expect(content).not.toContain('--base <base> -c \'model_reasoning_effort="high"\'');
|
||||
expect(content).toContain('Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD');
|
||||
}
|
||||
@@ -2750,7 +2773,7 @@ describe('LEARNINGS_SEARCH resolver', () => {
|
||||
|
||||
for (const skill of SEARCH_SKILLS) {
|
||||
test(`${skill} generated SKILL.md contains learnings search`, () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
|
||||
const content = readSkillUnion(skill); // ship: moved to sections/plan-completion.md
|
||||
expect(content).toContain('Prior Learnings');
|
||||
expect(content).toContain('gstack-learnings-search');
|
||||
});
|
||||
@@ -2811,7 +2834,7 @@ describe('CONFIDENCE_CALIBRATION resolver', () => {
|
||||
|
||||
for (const skill of CONFIDENCE_SKILLS) {
|
||||
test(`${skill} generated SKILL.md contains confidence calibration`, () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
|
||||
const content = readSkillUnion(skill); // ship: moved to sections/review-army.md
|
||||
expect(content).toContain('Confidence Calibration');
|
||||
expect(content).toContain('confidence score');
|
||||
});
|
||||
|
||||
@@ -0,0 +1,133 @@
|
||||
/**
|
||||
* Tests for the gstack-version-bump CLI (v2 plan T9 hybrid extraction). Covers
|
||||
* the idempotency classifier (pure) + the write/repair mutations (temp fs).
|
||||
* The classifier is the one that prevents re-bumping an already-shipped branch —
|
||||
* the worst /ship footgun — so it gets exhaustive state coverage.
|
||||
*/
|
||||
|
||||
import { describe, test, expect, afterAll } from 'bun:test';
|
||||
import * as fs from 'fs';
|
||||
import * as os from 'os';
|
||||
import * as path from 'path';
|
||||
import { execFileSync } from 'child_process';
|
||||
import { classifyState, VERSION_RE } from '../bin/gstack-version-bump';
|
||||
|
||||
const BIN = path.join(import.meta.dir, '..', 'bin', 'gstack-version-bump');
|
||||
|
||||
describe('classifyState (idempotency)', () => {
  test('FRESH when VERSION matches base and pkg agrees', () => {
    expect(classifyState('1.1.0.0', '1.1.0.0', true, '1.1.0.0')).toBe('FRESH');
  });
  test('FRESH when VERSION matches base and no package.json', () => {
    expect(classifyState('1.1.0.0', '1.1.0.0', false, '')).toBe('FRESH');
  });
  test('ALREADY_BUMPED when VERSION moved past base and pkg agrees (re-run)', () => {
    expect(classifyState('1.2.0.0', '1.1.0.0', true, '1.2.0.0')).toBe('ALREADY_BUMPED');
  });
  test('ALREADY_BUMPED when VERSION moved past base, no package.json', () => {
    expect(classifyState('1.2.0.0', '1.1.0.0', false, '')).toBe('ALREADY_BUMPED');
  });
  test('DRIFT_STALE_PKG when VERSION bumped but pkg lagging', () => {
    expect(classifyState('1.2.0.0', '1.1.0.0', true, '1.1.0.0')).toBe('DRIFT_STALE_PKG');
  });
  test('DRIFT_UNEXPECTED when VERSION matches base but pkg diverges (manual edit)', () => {
    expect(classifyState('1.1.0.0', '1.1.0.0', true, '1.2.0.0')).toBe('DRIFT_UNEXPECTED');
  });
});
|
||||
|
||||
describe('VERSION_RE', () => {
  test('accepts 4-digit semver', () => {
    expect(VERSION_RE.test('1.2.3.4')).toBe(true);
  });
  test('rejects 3-digit and garbage', () => {
    expect(VERSION_RE.test('1.2.3')).toBe(false);
    expect(VERSION_RE.test('v1.2.3.4')).toBe(false);
    expect(VERSION_RE.test('1.2.3.4-rc')).toBe(false);
  });
});
|
||||
|
||||
describe('write (FRESH bump)', () => {
|
||||
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'vbump-write-'));
|
||||
afterAll(() => { try { fs.rmSync(dir, { recursive: true, force: true }); } catch { /* noop */ } });
|
||||
|
||||
test('writes VERSION + package.json.version, preserving other pkg fields', () => {
|
||||
fs.writeFileSync(path.join(dir, 'VERSION'), '1.0.0.0\n');
|
||||
fs.writeFileSync(path.join(dir, 'package.json'), JSON.stringify({ name: 'x', version: '1.0.0.0', scripts: { t: 'y' } }, null, 2) + '\n');
|
||||
const out = execFileSync('bun', [BIN, 'write', '--version', '1.1.0.0'], { cwd: dir }).toString();
|
||||
expect(JSON.parse(out)).toEqual({ wrote: '1.1.0.0', packageJson: true });
|
||||
expect(fs.readFileSync(path.join(dir, 'VERSION'), 'utf-8').trim()).toBe('1.1.0.0');
|
||||
const pkg = JSON.parse(fs.readFileSync(path.join(dir, 'package.json'), 'utf-8'));
|
||||
expect(pkg.version).toBe('1.1.0.0');
|
||||
expect(pkg.scripts).toEqual({ t: 'y' }); // untouched
|
||||
});
|
||||
|
||||
test('rejects a malformed version with exit 2', () => {
|
||||
let code = 0;
|
||||
try { execFileSync('bun', [BIN, 'write', '--version', '1.2.3'], { cwd: dir, stdio: 'pipe' }); }
|
||||
catch (e: any) { code = e.status; }
|
||||
expect(code).toBe(2);
|
||||
});
|
||||
|
||||
test('VERSION-only repo (no package.json) writes just VERSION', () => {
|
||||
const d2 = fs.mkdtempSync(path.join(os.tmpdir(), 'vbump-noPkg-'));
|
||||
fs.writeFileSync(path.join(d2, 'VERSION'), '0.1.0.0\n');
|
||||
const out = execFileSync('bun', [BIN, 'write', '--version', '0.2.0.0'], { cwd: d2 }).toString();
|
||||
expect(JSON.parse(out)).toEqual({ wrote: '0.2.0.0', packageJson: false });
|
||||
expect(fs.readFileSync(path.join(d2, 'VERSION'), 'utf-8').trim()).toBe('0.2.0.0');
|
||||
fs.rmSync(d2, { recursive: true, force: true });
|
||||
});
|
||||
});
|
||||
|
||||
describe('repair (DRIFT_STALE_PKG)', () => {
|
||||
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'vbump-repair-'));
|
||||
afterAll(() => { try { fs.rmSync(dir, { recursive: true, force: true }); } catch { /* noop */ } });
|
||||
|
||||
test('syncs package.json.version up to VERSION, no re-bump', () => {
|
||||
fs.writeFileSync(path.join(dir, 'VERSION'), '2.0.0.0\n');
|
||||
fs.writeFileSync(path.join(dir, 'package.json'), JSON.stringify({ name: 'x', version: '1.9.0.0' }, null, 2) + '\n');
|
||||
const out = execFileSync('bun', [BIN, 'repair'], { cwd: dir }).toString();
|
||||
expect(JSON.parse(out)).toEqual({ repaired: '2.0.0.0' });
|
||||
expect(JSON.parse(fs.readFileSync(path.join(dir, 'package.json'), 'utf-8')).version).toBe('2.0.0.0');
|
||||
expect(fs.readFileSync(path.join(dir, 'VERSION'), 'utf-8').trim()).toBe('2.0.0.0'); // unchanged
|
||||
});
|
||||
|
||||
test('refuses to propagate an invalid VERSION (exit 2)', () => {
|
||||
fs.writeFileSync(path.join(dir, 'VERSION'), 'not-a-version\n');
|
||||
let code = 0;
|
||||
try { execFileSync('bun', [BIN, 'repair'], { cwd: dir, stdio: 'pipe' }); }
|
||||
catch (e: any) { code = e.status; }
|
||||
expect(code).toBe(2);
|
||||
});
|
||||
});
|
||||
|
||||
describe('classify (idempotency over a real git base)', () => {
|
||||
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'vbump-classify-'));
|
||||
afterAll(() => { try { fs.rmSync(dir, { recursive: true, force: true }); } catch { /* noop */ } });
|
||||
|
||||
// Build a tiny repo with an "origin/main" carrying VERSION=1.0.0.0.
|
||||
const git = (...a: string[]) => execFileSync('git', a, { cwd: dir, stdio: 'pipe' });
|
||||
fs.writeFileSync(path.join(dir, 'VERSION'), '1.0.0.0\n');
|
||||
fs.writeFileSync(path.join(dir, 'package.json'), JSON.stringify({ name: 'x', version: '1.0.0.0' }, null, 2) + '\n');
|
||||
git('init', '-q', '-b', 'main');
|
||||
git('config', 'user.email', 't@t'); git('config', 'user.name', 't');
|
||||
git('add', '-A'); git('commit', '-q', '-m', 'base');
|
||||
// Fake an "origin/main" remote-tracking ref pointing at this commit.
|
||||
const head = execFileSync('git', ['rev-parse', 'HEAD'], { cwd: dir }).toString().trim();
|
||||
fs.mkdirSync(path.join(dir, '.git', 'refs', 'remotes', 'origin'), { recursive: true });
|
||||
fs.writeFileSync(path.join(dir, '.git', 'refs', 'remotes', 'origin', 'main'), head + '\n');
|
||||
|
||||
test('reports FRESH before any bump', () => {
|
||||
const out = execFileSync('bun', [BIN, 'classify', '--base', 'main'], { cwd: dir }).toString();
|
||||
expect(JSON.parse(out).state).toBe('FRESH');
|
||||
});
|
||||
|
||||
test('reports ALREADY_BUMPED after VERSION+pkg move together', () => {
|
||||
fs.writeFileSync(path.join(dir, 'VERSION'), '1.1.0.0\n');
|
||||
fs.writeFileSync(path.join(dir, 'package.json'), JSON.stringify({ name: 'x', version: '1.1.0.0' }, null, 2) + '\n');
|
||||
const out = execFileSync('bun', [BIN, 'classify', '--base', 'main'], { cwd: dir }).toString();
|
||||
const parsed = JSON.parse(out);
|
||||
expect(parsed.state).toBe('ALREADY_BUMPED');
|
||||
expect(parsed.baseVersion).toBe('1.0.0.0');
|
||||
expect(parsed.currentVersion).toBe('1.1.0.0');
|
||||
});
|
||||
});
|
||||
@@ -33,6 +33,22 @@ export interface ParityInvariant {
|
||||
maxSizeRatio?: number;
|
||||
/** Minimum byte size (catches over-stripping cliffs). */
|
||||
minBytes?: number;
|
||||
  /**
   * Carved skill (v2 plan T9): the skill is a skeleton SKILL.md plus on-demand
   * sections/*.md. When true:
   * - mustContain / mustHaveHeadings run against skeleton + ALL sections unioned,
   * so a phrase that moved into a section still counts (content preserved, just
   * relocated — that's the whole point of the carve).
   * - minBytes / maxSizeRatio run against the UNION bytes, not the skeleton alone
   * (total behavior must not shrink; the win is what's no longer always-loaded,
   * which the union size deliberately does NOT measure — maxSkeletonBytes does).
   * - maxSkeletonBytes asserts the always-loaded skeleton actually shrank.
   * Without this, lowering minBytes to fit a 65KB skeleton would make the size
   * floor toothless (Codex outside-voice #12).
   */
  sectioned?: boolean;
  /** Max bytes for the always-loaded skeleton SKILL.md (carved skills only). */
  maxSkeletonBytes?: number;
|
||||
}
|
||||
|
||||
export interface ParityCheckResult {
|
||||
@@ -41,6 +57,35 @@ export interface ParityCheckResult {
|
||||
failures: string[];
|
||||
}
|
||||
|
||||
/**
 * Read a skill's check text + sizes. For a carved skill, union the skeleton with
 * every sections/*.md so relocated content still counts and the union size
 * measures total preserved behavior; skeletonBytes is reported separately so the
 * always-loaded shrink can be asserted. For a monolith, text == skeleton.
 */
export function readSkillForParity(
  repoRoot: string,
  skill: string,
  sectioned: boolean,
): { text: string; unionBytes: number; skeletonBytes: number } {
  const skeleton = fs.readFileSync(path.join(repoRoot, skill, 'SKILL.md'), 'utf-8');
  const skeletonBytes = Buffer.byteLength(skeleton, 'utf-8');
  if (!sectioned) return { text: skeleton, unionBytes: skeletonBytes, skeletonBytes };

  let text = skeleton;
  let unionBytes = skeletonBytes;
  const sectionsDir = path.join(repoRoot, skill, 'sections');
  if (fs.existsSync(sectionsDir)) {
    for (const f of fs.readdirSync(sectionsDir).sort()) {
      if (!f.endsWith('.md')) continue;
      const sec = fs.readFileSync(path.join(sectionsDir, f), 'utf-8');
      text += '\n' + sec;
      unionBytes += Buffer.byteLength(sec, 'utf-8');
    }
  }
  return { text, unionBytes, skeletonBytes };
}
|
||||
|
||||
export function checkSkillParity(
|
||||
invariant: ParityInvariant,
|
||||
current: SkillBaselineEntry,
|
||||
@@ -48,38 +93,54 @@ export function checkSkillParity(
|
||||
repoRoot: string,
|
||||
): ParityCheckResult {
|
||||
const failures: string[] = [];
|
||||
const needText = !!(invariant.mustContain?.length || invariant.mustHaveHeadings?.length);
|
||||
|
||||
// SIZE checks
|
||||
// Resolve the text + size to check against. Carved skills union skeleton +
|
||||
// sections; monoliths use the skeleton alone. Read on demand so size-only
|
||||
// invariants don't pay for a file read they don't need (monolith path).
|
||||
let checkText: string | null = null;
|
||||
let checkBytes = current.skillMdBytes;
|
||||
if (invariant.sectioned) {
|
||||
try {
|
||||
const r = readSkillForParity(repoRoot, invariant.skill, true);
|
||||
checkText = r.text;
|
||||
checkBytes = r.unionBytes;
|
||||
if (invariant.maxSkeletonBytes !== undefined && r.skeletonBytes > invariant.maxSkeletonBytes) {
|
||||
failures.push(`skeleton ${r.skeletonBytes} > maxSkeletonBytes ${invariant.maxSkeletonBytes}`);
|
||||
}
|
||||
} catch (err) {
|
||||
failures.push(`cannot read carved skill ${invariant.skill}: ${(err as Error).message}`);
|
||||
}
|
||||
} else if (needText) {
|
||||
try {
|
||||
checkText = fs.readFileSync(path.join(repoRoot, invariant.skill, 'SKILL.md'), 'utf-8');
|
||||
} catch (err) {
|
||||
failures.push(`cannot read ${path.join(repoRoot, invariant.skill, 'SKILL.md')}: ${(err as Error).message}`);
|
||||
}
|
||||
}
|
||||
|
||||
// SIZE checks (union bytes for carved skills, skeleton bytes for monoliths)
|
||||
if (invariant.maxSizeRatio !== undefined && baseline) {
|
||||
const ratio = current.skillMdBytes / baseline.skillMdBytes;
|
||||
const ratio = checkBytes / baseline.skillMdBytes;
|
||||
if (ratio > invariant.maxSizeRatio) {
|
||||
failures.push(`size ratio ${ratio.toFixed(3)} > maxSizeRatio ${invariant.maxSizeRatio}`);
|
||||
}
|
||||
}
|
||||
if (invariant.minBytes !== undefined && current.skillMdBytes < invariant.minBytes) {
|
||||
failures.push(`size ${current.skillMdBytes} < minBytes ${invariant.minBytes}`);
|
||||
if (invariant.minBytes !== undefined && checkBytes < invariant.minBytes) {
|
||||
failures.push(`size ${checkBytes} < minBytes ${invariant.minBytes}`);
|
||||
}
|
||||
|
||||
// CONTENT checks (read live file for fresh content)
|
||||
if (invariant.mustContain?.length || invariant.mustHaveHeadings?.length) {
|
||||
const skillMdPath = path.join(repoRoot, invariant.skill, 'SKILL.md');
|
||||
let content: string | null = null;
|
||||
try {
|
||||
content = fs.readFileSync(skillMdPath, 'utf-8');
|
||||
} catch (err) {
|
||||
failures.push(`cannot read ${skillMdPath}: ${(err as Error).message}`);
|
||||
}
|
||||
if (content) {
|
||||
const lower = content.toLowerCase();
|
||||
for (const phrase of invariant.mustContain ?? []) {
|
||||
if (!lower.includes(phrase.toLowerCase())) {
|
||||
failures.push(`missing required phrase: "${phrase}"`);
|
||||
}
|
||||
// CONTENT checks
|
||||
if (needText && checkText !== null) {
|
||||
const lower = checkText.toLowerCase();
|
||||
for (const phrase of invariant.mustContain ?? []) {
|
||||
if (!lower.includes(phrase.toLowerCase())) {
|
||||
failures.push(`missing required phrase: "${phrase}"`);
|
||||
}
|
||||
for (const heading of invariant.mustHaveHeadings ?? []) {
|
||||
if (!content.includes(heading)) {
|
||||
failures.push(`missing required heading: "${heading}"`);
|
||||
}
|
||||
}
|
||||
for (const heading of invariant.mustHaveHeadings ?? []) {
|
||||
if (!checkText.includes(heading)) {
|
||||
failures.push(`missing required heading: "${heading}"`);
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -146,7 +207,13 @@ export const PARITY_INVARIANTS: ParityInvariant[] = [
|
||||
minBytes: 30_000,
|
||||
},
|
||||
{
|
||||
// Carved (v2 plan T9): skeleton SKILL.md + sections/*.md. Content checks run
|
||||
// against the union (relocated phrases still count); size floors run against
|
||||
// the union (total behavior preserved); maxSkeletonBytes asserts the
|
||||
// always-loaded skeleton actually shrank from the ~167KB monolith.
|
||||
skill: 'ship',
|
||||
sectioned: true,
|
||||
maxSkeletonBytes: 90_000,
|
||||
mustContain: [
|
||||
'VERSION',
|
||||
'CHANGELOG',
|
||||
@@ -156,7 +223,7 @@ export const PARITY_INVARIANTS: ParityInvariant[] = [
|
||||
],
|
||||
mustHaveHeadings: ['## Preamble', '## When to invoke'],
|
||||
maxSizeRatio: 1.05,
|
||||
minBytes: 80_000,
|
||||
minBytes: 120_000,
|
||||
},
|
||||
{
|
||||
skill: 'plan-ceo-review',
|
||||
|
||||
@@ -0,0 +1,40 @@
|
||||
/**
 * requiredReads enforcement (v2 plan T9, mitigation layer 5 — the only CI-failing
 * layer against silent section-skip).
 *
 * Given a /ship run's tool calls and the set of section files the run's SITUATION
 * required, assert the agent actually Read each one. The required set comes from
 * the TEST FIXTURE (which situation it set up), NOT from the manifest — the
 * manifest is passive (CM2). This keeps "when is a section required" in exactly
 * one machine-checkable place: the eval fixtures.
 *
 * Builds on extractSectionReads from transcript-section-logger so section-path
 * matching (the `/sections/<file>.md` segment, host-layout agnostic) lives in one
 * place.
 */

import { extractSectionReads, type TranscriptResultLike } from './transcript-section-logger';

export interface RequiredReadsResult {
  required: string[];
  read: string[];
  missing: string[];
  ok: boolean;
}

/**
 * @param result         the skill run (anything with toolCalls)
 * @param requiredFiles  section basenames the situation required, e.g.
 *                       ['version-bump.md','changelog.md'] (or with a sections/
 *                       prefix — normalized to basename here)
 */
export function assertRequiredReads(
  result: TranscriptResultLike,
  requiredFiles: string[],
): RequiredReadsResult {
  const read = extractSectionReads(result);
  const readSet = new Set(read);
  const required = requiredFiles.map(f => f.replace(/^.*\//, '')); // tolerate sections/<f>
  const missing = required.filter(f => !readSet.has(f));
  return { required, read, missing, ok: missing.length === 0 };
}
|
||||
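The required-reads check above can be tried without the repo. The following is a minimal sketch, assuming the same `/sections/<file>.md` path match the diff uses; `sectionReads`, `missingReads`, and the sample paths are hypothetical stand-ins, not the project's exports.

```typescript
// Hypothetical, self-contained stand-in for the helpers in this diff.
interface ToolCall { tool: string; input: unknown }

// Same end-of-path segment match as the diff: `/sections/<file>.md`.
const SECTION_RE = /(?:^|\/)sections\/([A-Za-z0-9._-]+\.md)$/;

// Collect section basenames from Read calls only, deduped in first-seen order.
function sectionReads(calls: ToolCall[]): string[] {
  const seen = new Set<string>();
  for (const call of calls) {
    if (call.tool !== 'Read') continue;
    const fp = (call.input as { file_path?: unknown } | null)?.file_path;
    if (typeof fp !== 'string') continue;
    const m = fp.match(SECTION_RE);
    if (m) seen.add(m[1]);
  }
  return Array.from(seen); // a Set iterates in insertion order
}

// Required entries may carry a `sections/` prefix; compare on basename.
function missingReads(calls: ToolCall[], required: string[]): string[] {
  const read = new Set(sectionReads(calls));
  return required.map(f => f.replace(/^.*\//, '')).filter(f => !read.has(f));
}

const calls: ToolCall[] = [
  { tool: 'Read', input: { file_path: '/home/u/.claude/skills/gstack/ship/sections/changelog.md' } },
  // A Bash `cat` of a section is not a Read tool call, so it does not count.
  { tool: 'Bash', input: { command: 'cat ship/sections/version-bump.md' } },
];

const missing = missingReads(calls, ['sections/version-bump.md', 'changelog.md']);
console.log(missing);
```

Keeping the required list in the caller mirrors the design note in the header comment: the fixture decides what is required, and the checker only compares sets.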
@@ -120,7 +120,8 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
|
||||
'plan-ceo-mode-routing': ['plan-ceo-review/**', 'scripts/resolvers/preamble/generate-ask-user-format.ts', 'scripts/resolvers/preamble.ts', 'test/helpers/claude-pty-runner.ts'],
|
||||
'plan-design-with-ui-scope': ['plan-design-review/**', 'test/fixtures/plans/ui-heavy-feature.md', 'test/helpers/claude-pty-runner.ts'],
|
||||
'budget-regression-pty': ['test/helpers/eval-store.ts', 'test/skill-budget-regression.test.ts'],
|
||||
'ship-idempotency-pty': ['ship/**', 'bin/gstack-next-version', 'lib/worktree.ts', 'test/helpers/claude-pty-runner.ts'],
|
||||
'ship-idempotency-pty': ['ship/**', 'bin/gstack-next-version', 'bin/gstack-version-bump', 'scripts/resolvers/sections.ts', 'lib/worktree.ts', 'test/helpers/claude-pty-runner.ts'],
|
||||
'ship-section-loading': ['ship/**', 'scripts/resolvers/sections.ts', 'scripts/gen-skill-docs.ts', 'test/helpers/required-reads.ts', 'test/helpers/transcript-section-logger.ts', 'test/helpers/claude-pty-runner.ts'],
|
||||
'autoplan-chain-pty': ['autoplan/**', 'plan-ceo-review/**', 'plan-design-review/**', 'plan-eng-review/**', 'plan-devex-review/**', 'test/fixtures/plans/ui-heavy-feature.md', 'test/helpers/claude-pty-runner.ts'],
|
||||
'e2e-harness-audit': ['plan-ceo-review/**', 'plan-eng-review/**', 'plan-design-review/**', 'plan-devex-review/**', 'scripts/resolvers/preamble/generate-completion-status.ts', 'test/helpers/agent-sdk-runner.ts', 'test/helpers/claude-pty-runner.ts'],
|
||||
|
||||
@@ -508,6 +509,7 @@ export const E2E_TIERS: Record<string, 'gate' | 'periodic'> = {
|
||||
'plan-design-with-ui-scope': 'gate', // ~$0.80/run
|
||||
'budget-regression-pty': 'gate', // free, library-only assertion
|
||||
'ship-idempotency-pty': 'periodic', // ~$3/run, real /ship in plan mode
|
||||
'ship-section-loading': 'periodic', // ~$3/run, real /ship; asserts section reads
|
||||
'autoplan-chain-pty': 'periodic', // ~$8/run, all 3 phases sequential
|
||||
|
||||
// Per-finding count + review-report-at-bottom — periodic because each
|
||||
|
||||
@@ -0,0 +1,196 @@
|
||||
/**
 * Transcript section logger (v2 plan T10).
 *
 * Two jobs, both pure analysis over a SkillTestResult / NDJSON transcript:
 *
 * 1. extractSectionReads() — which `sections/*.md` files a run actually Read.
 *    Used by the sectioned world (post-carve) to verify the agent opened the
 *    chapters its situation required.
 *
 * 2. extractShipActions() — an observable ACTION fingerprint of a /ship run
 *    (ran tests, bumped VERSION, wrote CHANGELOG, created PR, ...). This works
 *    on BOTH the monolith and the sectioned skill, which is the whole point:
 *    capture a baseline on the current monolith ship FIRST, then assert the
 *    sectioned ship still performs the same actions. A section-read check alone
 *    can't catch "agent read the chapter but skipped the step"; the action
 *    fingerprint can.
 *
 * Why baseline-first (Codex outside-voice critique on the T9 plan): a logger
 * shipped in the same PR as the carve is post-failure telemetry unless it has a
 * pre-carve reference. writeShipBaseline() records the monolith's action
 * fingerprint so compareShipActions() can flag a regression introduced by the
 * carve.
 *
 * Pure functions, no I/O except the explicit read/write baseline helpers. The
 * unit tests drive these with synthetic transcripts — no paid run needed to
 * validate the logic.
 */

import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';

/** Minimal shape we need from SkillTestResult — kept structural so callers can
 *  pass a full SkillTestResult or a hand-built fixture in unit tests. */
export interface ToolCallLike {
  tool: string;
  input: unknown;
  output?: string;
}
export interface TranscriptResultLike {
  toolCalls: ToolCallLike[];
  output?: string;
}

/** Pull the file_path off a tool-call input, tolerating unknown shapes. */
function readFilePath(input: unknown): string | null {
  if (input && typeof input === 'object') {
    const fp = (input as Record<string, unknown>).file_path;
    if (typeof fp === 'string') return fp;
  }
  return null;
}

/** Pull the command string off a Bash tool-call input. */
function bashCommand(input: unknown): string | null {
  if (input && typeof input === 'object') {
    const cmd = (input as Record<string, unknown>).command;
    if (typeof cmd === 'string') return cmd;
  }
  return null;
}

/**
 * Every `sections/<name>.md` file the run Read, normalized to the section
 * basename (e.g. "version-bump.md"). Deduped, in first-Read order. Matching is
 * on the path segment `/sections/<file>.md` so it works regardless of whether
 * the host resolved a relative, absolute, or prefixed install path.
 */
export function extractSectionReads(result: TranscriptResultLike): string[] {
  const seen = new Set<string>();
  const ordered: string[] = [];
  for (const call of result.toolCalls) {
    if (call.tool !== 'Read') continue;
    const fp = readFilePath(call.input);
    if (!fp) continue;
    const m = fp.match(/(?:^|\/)sections\/([A-Za-z0-9._-]+\.md)$/);
    if (!m) continue;
    const name = m[1];
    if (!seen.has(name)) {
      seen.add(name);
      ordered.push(name);
    }
  }
  return ordered;
}

/**
 * The canonical /ship action vocabulary. Each action is detected from the Bash
 * commands the agent ran (plus a couple of Write/Edit signals). Order is the
 * rough ship sequence; detection is order-independent.
 *
 * Keep this list aligned with the ship skeleton's numbered steps. The
 * section-loading eval asserts the sectioned ship still triggers the same
 * actions a monolith run did for the same fixture situation.
 */
export const SHIP_ACTIONS = [
  'merged_base',     // git merge <base>
  'ran_tests',       // bun test / npm test / the project test cmd
  'bumped_version',  // wrote VERSION / package.json version / ran gstack-version-bump
  'wrote_changelog', // edited CHANGELOG.md
  'committed',       // git commit
  'pushed',          // git push
  'opened_pr',       // gh pr create / glab mr create
] as const;
export type ShipAction = (typeof SHIP_ACTIONS)[number];

const BASH_ACTION_PATTERNS: Array<{ action: ShipAction; re: RegExp }> = [
  { action: 'merged_base', re: /\bgit\s+merge\b/ },
  { action: 'ran_tests', re: /\b(bun\s+test|npm\s+(run\s+)?test|yarn\s+test|pytest|go\s+test|cargo\s+test|rspec)\b/ },
  { action: 'bumped_version', re: /gstack-version-bump\b|gstack-next-version\b|>\s*VERSION\b|npm\s+version\b/ },
  { action: 'wrote_changelog', re: /CHANGELOG\.md/ },
  { action: 'committed', re: /\bgit\s+commit\b/ },
  { action: 'pushed', re: /\bgit\s+push\b/ },
  { action: 'opened_pr', re: /\bgh\s+pr\s+create\b|\bglab\s+mr\s+create\b/ },
];

/**
 * The observable action fingerprint of a ship run. Works on monolith AND
 * sectioned skills because it reads what the agent DID (Bash + file writes),
 * not which prose it loaded.
 */
export function extractShipActions(result: TranscriptResultLike): ShipAction[] {
  const found = new Set<ShipAction>();
  for (const call of result.toolCalls) {
    if (call.tool === 'Bash') {
      const cmd = bashCommand(call.input);
      if (!cmd) continue;
      for (const { action, re } of BASH_ACTION_PATTERNS) {
        if (re.test(cmd)) found.add(action);
      }
    } else if (call.tool === 'Write' || call.tool === 'Edit') {
      const fp = readFilePath(call.input);
      if (fp && /CHANGELOG\.md$/.test(fp)) found.add('wrote_changelog');
      if (fp && /(?:^|\/)VERSION$/.test(fp)) found.add('bumped_version');
    }
  }
  // Preserve canonical order.
  return SHIP_ACTIONS.filter(a => found.has(a));
}

export interface ShipBaseline {
  tag: string;
  /** Fixture/situation id this baseline was captured for. */
  situation: string;
  /** Action fingerprint observed on the monolith ship. */
  actions: ShipAction[];
  /** Section reads observed (empty on the monolith — present after carve). */
  sectionReads: string[];
  capturedAt: string;
}

const DEFAULT_BASELINE_DIR = path.join(os.homedir(), '.gstack-dev', 'ship-baselines');

/** Where a baseline for a given situation lives. */
export function baselinePath(situation: string, dir = DEFAULT_BASELINE_DIR): string {
  return path.join(dir, `${situation}.json`);
}

/** Persist a ship baseline (used once on the monolith, before the carve). */
export function writeShipBaseline(baseline: ShipBaseline, dir = DEFAULT_BASELINE_DIR): string {
  fs.mkdirSync(dir, { recursive: true });
  const p = baselinePath(baseline.situation, dir);
  fs.writeFileSync(p, JSON.stringify(baseline, null, 2) + '\n');
  return p;
}

/** Read a previously-captured baseline, or null if none exists yet. */
export function readShipBaseline(situation: string, dir = DEFAULT_BASELINE_DIR): ShipBaseline | null {
  try {
    return JSON.parse(fs.readFileSync(baselinePath(situation, dir), 'utf-8')) as ShipBaseline;
  } catch {
    return null;
  }
}

export interface ShipActionDiff {
  /** Actions the baseline performed that the current run did NOT (the regression set). */
  missing: ShipAction[];
  /** Actions the current run performed that the baseline did not (usually fine). */
  added: ShipAction[];
  /** True when no baseline action was dropped. */
  ok: boolean;
}

/**
 * Compare a current sectioned-ship run against the monolith baseline. A dropped
 * action (in baseline, not in current) is the carve regression we care about:
 * the sectioned ship stopped doing something the monolith did.
 */
export function compareShipActions(baseline: ShipBaseline, current: ShipAction[]): ShipActionDiff {
  const cur = new Set(current);
  const base = new Set(baseline.actions);
  const missing = baseline.actions.filter(a => !cur.has(a));
  const added = current.filter(a => !base.has(a));
  return { missing, added, ok: missing.length === 0 };
}
|
||||
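To make the baseline-first idea concrete, here is a small sketch of fingerprinting two runs and diffing them. It assumes plain command strings rather than full tool-call transcripts and uses a trimmed copy of the patterns from the file above; `fingerprint`, `compareActions`, and the sample commands are hypothetical names for illustration only.

```typescript
// Illustrative only: same vocabulary as the diff, trimmed pattern set.
const ACTIONS = [
  'merged_base', 'ran_tests', 'bumped_version', 'wrote_changelog',
  'committed', 'pushed', 'opened_pr',
] as const;
type Action = (typeof ACTIONS)[number];

const PATTERNS: Array<{ action: Action; re: RegExp }> = [
  { action: 'merged_base', re: /\bgit\s+merge\b/ },
  { action: 'ran_tests', re: /\b(bun\s+test|npm\s+(run\s+)?test)\b/ },
  { action: 'bumped_version', re: /gstack-version-bump\b|>\s*VERSION\b/ },
  { action: 'wrote_changelog', re: /CHANGELOG\.md/ },
  { action: 'committed', re: /\bgit\s+commit\b/ },
  { action: 'pushed', re: /\bgit\s+push\b/ },
  { action: 'opened_pr', re: /\bgh\s+pr\s+create\b/ },
];

// Order-independent detection, reported in canonical order.
function fingerprint(commands: string[]): Action[] {
  const found = new Set<Action>();
  for (const cmd of commands) {
    for (const { action, re } of PATTERNS) {
      if (re.test(cmd)) found.add(action);
    }
  }
  return ACTIONS.filter(a => found.has(a));
}

// A dropped baseline action is the regression; extra actions are only reported.
function compareActions(baseline: Action[], current: Action[]) {
  const cur = new Set(current);
  const base = new Set(baseline);
  const missing = baseline.filter(a => !cur.has(a));
  const added = current.filter(a => !base.has(a));
  return { missing, added, ok: missing.length === 0 };
}

// Hypothetical monolith run (captured first) vs. a sectioned run that skipped tests.
const monolith = fingerprint([
  'git merge origin/main', 'bun test', 'echo 1.2.3 > VERSION',
  'git commit -m "ship"', 'git push',
]);
const sectioned = fingerprint([
  'git merge origin/main', 'echo 1.2.3 > VERSION',
  'git commit -m "ship"', 'git push', 'gh pr create --fill',
]);

const result = compareActions(monolith, sectioned);
console.log(result);
```

The asymmetry is deliberate in this sketch, as in the source: only `missing` drives `ok`, so a sectioned run that does more than the monolith is not treated as a failure.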
@@ -0,0 +1,88 @@
|
||||
/**
 * Unit coverage for the sectioned-parity capability (v2 plan T9, guards the
 * carve). Proves that a carved skill's relocated content still counts (union of
 * skeleton + sections), the always-loaded skeleton shrink is asserted
 * separately (maxSkeletonBytes), and size floors run against the union so they
 * stay meaningful (Codex outside-voice #12). Synthetic fixture — no ship carve
 * needed to validate the logic.
 */

import { describe, test, expect, afterAll } from 'bun:test';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';
import { checkSkillParity, readSkillForParity, type ParityInvariant } from './helpers/parity-harness';
import type { SkillBaselineEntry } from './helpers/capture-parity-baseline';

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'parity-sectioned-'));
afterAll(() => { try { fs.rmSync(root, { recursive: true, force: true }); } catch { /* noop */ } });

// Carved "ship": a small skeleton + two sections holding the relocated prose.
fs.mkdirSync(path.join(root, 'ship', 'sections'), { recursive: true });
fs.writeFileSync(path.join(root, 'ship', 'SKILL.md'),
  '## Preamble\nskeleton body, decision tree, VERSION bump step calls the CLI.\n## When to invoke\n');
fs.writeFileSync(path.join(root, 'ship', 'sections', 'changelog.md'), '# Changelog\nWrite the CHANGELOG entry here.\n');
fs.writeFileSync(path.join(root, 'ship', 'sections', 'review-army.md'), '# Review\nDispatch the pre-landing review army.\n');

// A monolith control skill.
fs.mkdirSync(path.join(root, 'mono'), { recursive: true });
fs.writeFileSync(path.join(root, 'mono', 'SKILL.md'), '## Preamble\nVERSION CHANGELOG review all inline here.\n');

const skeletonBytes = Buffer.byteLength(fs.readFileSync(path.join(root, 'ship', 'SKILL.md'), 'utf-8'), 'utf-8');
const unionBytes = readSkillForParity(root, 'ship', true).unionBytes;
const baseline: SkillBaselineEntry = { skillMdBytes: unionBytes } as SkillBaselineEntry;

describe('readSkillForParity', () => {
  test('unions skeleton + sections for carved skills', () => {
    const r = readSkillForParity(root, 'ship', true);
    expect(r.text).toContain('CHANGELOG'); // from changelog.md
    expect(r.text).toContain('review army'); // from review-army.md
    expect(r.skeletonBytes).toBe(skeletonBytes);
    expect(r.unionBytes).toBeGreaterThan(r.skeletonBytes);
  });
  test('monolith text == skeleton, union == skeleton', () => {
    const r = readSkillForParity(root, 'mono', false);
    expect(r.unionBytes).toBe(r.skeletonBytes);
  });
});

describe('checkSkillParity (sectioned)', () => {
  test('finds phrases that moved into sections (union content check)', () => {
    const inv: ParityInvariant = {
      skill: 'ship', sectioned: true,
      mustContain: ['VERSION', 'CHANGELOG', 'review army'],
      mustHaveHeadings: ['## Preamble', '## When to invoke'],
    };
    const res = checkSkillParity(inv, { skillMdBytes: skeletonBytes } as SkillBaselineEntry, baseline, root);
    expect(res.passed).toBe(true);
  });

  test('maxSkeletonBytes catches a skeleton that did not shrink', () => {
    const inv: ParityInvariant = { skill: 'ship', sectioned: true, maxSkeletonBytes: 10 };
    const res = checkSkillParity(inv, { skillMdBytes: skeletonBytes } as SkillBaselineEntry, baseline, root);
    expect(res.passed).toBe(false);
    expect(res.failures.join()).toContain('maxSkeletonBytes');
  });

  test('minBytes runs against the union, not the skeleton (content preserved)', () => {
    // A floor between skeletonBytes and unionBytes must PASS for sectioned skills,
    // because the union (total behavior) is what must not shrink.
    const floor = Math.floor((skeletonBytes + unionBytes) / 2);
    const inv: ParityInvariant = { skill: 'ship', sectioned: true, minBytes: floor };
    const res = checkSkillParity(inv, { skillMdBytes: skeletonBytes } as SkillBaselineEntry, baseline, root);
    expect(res.passed).toBe(true);
  });

  test('flags a phrase that truly went missing', () => {
    const inv: ParityInvariant = { skill: 'ship', sectioned: true, mustContain: ['this-phrase-is-not-anywhere'] };
    const res = checkSkillParity(inv, { skillMdBytes: skeletonBytes } as SkillBaselineEntry, baseline, root);
    expect(res.passed).toBe(false);
    expect(res.failures.join()).toContain('missing required phrase');
  });

  test('maxSizeRatio uses union bytes vs baseline (carve preserves ~total size)', () => {
    const inv: ParityInvariant = { skill: 'ship', sectioned: true, maxSizeRatio: 1.05 };
    const res = checkSkillParity(inv, { skillMdBytes: skeletonBytes } as SkillBaselineEntry, baseline, root);
    expect(res.passed).toBe(true); // union == baseline here → ratio 1.0
  });
});
|
||||
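The union-versus-skeleton split these tests exercise can be sketched in memory. The `readSkillForParity` implementation is not shown in this part of the diff, so everything below is an assumption about its shape: `SkillFiles`, `readForParity`, `checkParity`, and the newline join between files are hypothetical, chosen to match the `readUnion` test helpers elsewhere in the commit.

```typescript
// Hypothetical in-memory stand-in for a skill directory: relative path -> content.
type SkillFiles = Record<string, string>;

interface Invariant {
  mustContain?: string[];
  minBytes?: number;          // floor on the UNION (total behavior preserved)
  maxSkeletonBytes?: number;  // ceiling on the always-loaded SKILL.md only
}

const utf8Bytes = (s: string): number => new TextEncoder().encode(s).length;

// For carved skills, append every sections/*.md (sorted) to the skeleton text.
function readForParity(files: SkillFiles, sectioned: boolean) {
  const skeleton = files['SKILL.md'] ?? '';
  let text = skeleton;
  if (sectioned) {
    for (const name of Object.keys(files).sort()) {
      if (name.startsWith('sections/') && name.endsWith('.md')) text += '\n' + files[name];
    }
  }
  return { text, skeletonBytes: utf8Bytes(skeleton), unionBytes: utf8Bytes(text) };
}

// Content and size floors look at the union; only the shrink check looks at the skeleton.
function checkParity(files: SkillFiles, sectioned: boolean, inv: Invariant): string[] {
  const { text, skeletonBytes, unionBytes } = readForParity(files, sectioned);
  const failures: string[] = [];
  if (inv.maxSkeletonBytes !== undefined && skeletonBytes > inv.maxSkeletonBytes) {
    failures.push(`skeleton ${skeletonBytes} > maxSkeletonBytes ${inv.maxSkeletonBytes}`);
  }
  if (inv.minBytes !== undefined && unionBytes < inv.minBytes) {
    failures.push(`size ${unionBytes} < minBytes ${inv.minBytes}`);
  }
  const lower = text.toLowerCase();
  for (const phrase of inv.mustContain ?? []) {
    if (!lower.includes(phrase.toLowerCase())) failures.push(`missing required phrase: "${phrase}"`);
  }
  return failures;
}

const files: SkillFiles = {
  'SKILL.md': '## Preamble\nVERSION bump step.\n',
  'sections/changelog.md': 'Write the CHANGELOG entry.\n',
};
const inv: Invariant = { mustContain: ['CHANGELOG'], minBytes: 50, maxSkeletonBytes: 40 };

const carved = checkParity(files, true, inv);   // sections counted
const flat = checkParity(files, false, inv);    // skeleton only
console.log({ carved, flat });
```

Reading the same files as a monolith fails both the phrase check and the size floor, which is the situation the `sectioned: true` flag exists to avoid.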
@@ -83,9 +83,22 @@ describe("#1539 generated SKILL.md files — gate propagated to all consumers",
|
||||
"ship/SKILL.md",
|
||||
];
|
||||
|
||||
// ship's confidence-calibration gate moved into sections/review-army.md (T9 carve);
|
||||
// read the skeleton+sections union so the gate is still found.
|
||||
const readUnion = (rel: string): string => {
|
||||
let body = fs.readFileSync(path.join(ROOT, rel), "utf-8");
|
||||
const secDir = path.join(ROOT, path.dirname(rel), "sections");
|
||||
if (fs.existsSync(secDir)) {
|
||||
for (const f of fs.readdirSync(secDir).sort()) {
|
||||
if (f.endsWith(".md")) body += "\n" + fs.readFileSync(path.join(secDir, f), "utf-8");
|
||||
}
|
||||
}
|
||||
return body;
|
||||
};
|
||||
|
||||
for (const rel of consumers) {
|
||||
test(`${rel} carries the Pre-emit verification gate`, () => {
|
||||
const body = fs.readFileSync(path.join(ROOT, rel), "utf-8");
|
||||
const body = readUnion(rel);
|
||||
expect(body).toMatch(/Pre-emit verification gate/);
|
||||
expect(body).toMatch(/Quote the specific code line/);
|
||||
});
|
||||
|
||||
@@ -0,0 +1,41 @@
|
||||
/**
 * Unit tests for assertRequiredReads (v2 plan T9 mitigation layer 5). Pure logic
 * over synthetic tool-call transcripts — the section-loading E2E (paid) drives
 * this against real /ship runs.
 */

import { describe, test, expect } from 'bun:test';
import { assertRequiredReads } from './helpers/required-reads';
import type { ToolCallLike } from './helpers/transcript-section-logger';

const read = (fp: string): ToolCallLike => ({ tool: 'Read', input: { file_path: fp }, output: '' });

describe('assertRequiredReads', () => {
  test('passes when every required section was Read', () => {
    const result = {
      toolCalls: [
        read('/Users/x/.claude/skills/gstack/ship/sections/version-bump.md'),
        read('ship/sections/changelog.md'),
      ],
    };
    const r = assertRequiredReads(result, ['version-bump.md', 'changelog.md']);
    expect(r.ok).toBe(true);
    expect(r.missing).toEqual([]);
  });

  test('flags a required section the agent never opened', () => {
    const result = { toolCalls: [read('ship/sections/changelog.md')] };
    const r = assertRequiredReads(result, ['version-bump.md', 'changelog.md']);
    expect(r.ok).toBe(false);
    expect(r.missing).toEqual(['version-bump.md']);
  });

  test('tolerates a sections/ prefix in the required list', () => {
    const result = { toolCalls: [read('/abs/gstack/ship/sections/review-army.md')] };
    expect(assertRequiredReads(result, ['sections/review-army.md']).ok).toBe(true);
  });

  test('empty required set always passes', () => {
    expect(assertRequiredReads({ toolCalls: [] }, []).ok).toBe(true);
  });
});
@@ -0,0 +1,77 @@
|
||||
/**
 * Section manifest ↔ filesystem consistency (v2 plan T9 / Phase C orphan check).
 *
 * Implements the 3-tier orphan classification from v2_PLAN.md:
 *   - generated orphan (sections/X.md with no sections/X.md.tmpl) → FAIL
 *   - hand-edited generated file (X.md missing the AUTO-GENERATED header) → FAIL
 *   - manifest orphan (sections/X.md.tmpl not listed in manifest) → WARN (v2.0)
 *
 * Also pins the PASSIVE-manifest contract (CM2 / v2_PLAN.md:663): manifest entries
 * carry only id/file/title/trigger — no machine predicate (applies_when/required_for).
 */

import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';

const ROOT = path.resolve(import.meta.dir, '..');
const SHIP_SECTIONS = path.join(ROOT, 'ship', 'sections');
const manifest = JSON.parse(fs.readFileSync(path.join(SHIP_SECTIONS, 'manifest.json'), 'utf-8'));

const sectionTmpls = fs.readdirSync(SHIP_SECTIONS).filter(f => f.endsWith('.md.tmpl'));
const sectionMds = fs.readdirSync(SHIP_SECTIONS).filter(f => f.endsWith('.md') && !f.endsWith('.md.tmpl'));

describe('section manifest ↔ filesystem consistency', () => {
  test('manifest parses with skill + sections array', () => {
    expect(manifest.skill).toBe('ship');
    expect(Array.isArray(manifest.sections)).toBe(true);
    expect(manifest.sections.length).toBeGreaterThan(0);
  });

  test('every manifest entry has a .md.tmpl source AND a generated .md', () => {
    for (const s of manifest.sections) {
      expect(fs.existsSync(path.join(SHIP_SECTIONS, `${s.file}.tmpl`))).toBe(true);
      expect(fs.existsSync(path.join(SHIP_SECTIONS, s.file))).toBe(true);
    }
  });

  test('manifest is PASSIVE — no applies_when / required_for predicate (CM2)', () => {
    for (const s of manifest.sections) {
      expect(s).not.toHaveProperty('applies_when');
      expect(s).not.toHaveProperty('required_for');
      // The allowed passive shape:
      expect(typeof s.id).toBe('string');
      expect(typeof s.file).toBe('string');
      expect(typeof s.title).toBe('string');
      expect(typeof s.trigger).toBe('string');
    }
  });

  test('no generated orphan: every sections/X.md has a sections/X.md.tmpl → FAIL', () => {
    const orphans = sectionMds.filter(md => !sectionTmpls.includes(`${md}.tmpl`));
    expect(orphans).toEqual([]);
  });

  test('no hand-edited generated file: every sections/X.md has the AUTO-GENERATED header → FAIL', () => {
    for (const md of sectionMds) {
      const head = fs.readFileSync(path.join(SHIP_SECTIONS, md), 'utf-8').slice(0, 120);
      expect(head).toContain('AUTO-GENERATED');
    }
  });

  test('manifest orphan check (WARN in v2.0): every .md.tmpl is listed', () => {
    const listed = new Set(manifest.sections.map((s: { file: string }) => `${s.file}.tmpl`));
    const unlisted = sectionTmpls.filter(t => !listed.has(t));
    if (unlisted.length > 0) {
      // v2_PLAN.md: WARN now, FAIL in v2.1. Surface, don't fail the build yet.
      // eslint-disable-next-line no-console
      console.warn(`[section-manifest] manifest orphan(s) (not in manifest.json): ${unlisted.join(', ')}`);
    }
    expect(unlisted.length).toBeLessThanOrEqual(unlisted.length); // always passes; WARN only
  });

  test('section ids are unique', () => {
    const ids = manifest.sections.map((s: { id: string }) => s.id);
    expect(new Set(ids).size).toBe(ids.length);
  });
});
|
||||
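The three-tier classification from the header comment can also be expressed as one pure function, which makes the FAIL/WARN split explicit instead of spreading it across test cases. This is a sketch under assumptions: `classifySections`, the `Finding` shape, and the sample directory listing are hypothetical and not part of the commit.

```typescript
type Level = 'FAIL' | 'WARN';
interface Finding { level: Level; file: string; reason: string }

// dirEntries: names inside sections/; manifestFiles: the `file` field of each
// manifest entry; hasGeneratedHeader: whether a generated .md still carries
// its AUTO-GENERATED header (injected so the function stays free of I/O).
function classifySections(
  dirEntries: string[],
  manifestFiles: string[],
  hasGeneratedHeader: (md: string) => boolean,
): Finding[] {
  const tmpls = dirEntries.filter(f => f.endsWith('.md.tmpl'));
  const mds = dirEntries.filter(f => f.endsWith('.md')); // excludes *.md.tmpl by suffix
  const tmplSet = new Set(tmpls);
  const listed = new Set(manifestFiles.map(f => `${f}.tmpl`));
  const findings: Finding[] = [];

  for (const md of mds) {
    if (!tmplSet.has(`${md}.tmpl`)) {
      findings.push({ level: 'FAIL', file: md, reason: 'generated orphan' });
    }
    if (!hasGeneratedHeader(md)) {
      findings.push({ level: 'FAIL', file: md, reason: 'hand-edited generated file' });
    }
  }
  for (const t of tmpls) {
    if (!listed.has(t)) {
      findings.push({ level: 'WARN', file: t, reason: 'manifest orphan' });
    }
  }
  return findings;
}

// Hypothetical listing: stray.md has no template; draft.md.tmpl is unlisted.
const findings = classifySections(
  ['changelog.md', 'changelog.md.tmpl', 'stray.md', 'draft.md.tmpl', 'manifest.json'],
  ['changelog.md'],
  () => true,
);
const buildFails = findings.some(f => f.level === 'FAIL');
console.log(findings, buildFails);
```

Returning findings with a level, rather than asserting inline, would let the WARN tier be promoted to FAIL later by changing one literal.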
@@ -0,0 +1,48 @@
|
||||
/**
 * Static invariant: the two install targets that cherry-pick SKILL.md (Claude
 * prefixed dirs + Kiro) must ALSO install the sections/ subdir, or a carved
 * skill's runtime "Read sections/<name>.md" 404s. codex/factory/opencode link
 * the whole generated dir, so sections ride along for free there.
 *
 * Matches the repo's static-tripwire style (setup-windows-fallback,
 * cdp-session-cleanup). End-to-end "sections resolve in a temp install" runs in
 * the group-5/6 functional pass once real ship/sections/ exist.
 */

import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';

const SETUP = fs.readFileSync(path.join(import.meta.dir, '..', 'setup'), 'utf-8');

/** Body of a shell function `name() { ... }` up to the closing line `}`. */
function fnBody(src: string, name: string): string {
  const start = src.indexOf(`${name}() {`);
  if (start === -1) return '';
  const end = src.indexOf('\n}', start);
  return src.slice(start, end === -1 ? undefined : end);
}

describe('setup links sections/ for cherry-pick install targets', () => {
  test('link_claude_skill_dirs links sections/ via _link_or_copy', () => {
    const body = fnBody(SETUP, 'link_claude_skill_dirs');
    expect(body).toContain('sections');
    // sections install must route through the windows-safe helper, not raw ln.
    expect(body).toMatch(/_link_or_copy\s+"\$gstack_dir\/\$dir_name\/sections"\s+"\$target\/sections"/);
    expect(body).toMatch(/if \[ -d "\$gstack_dir\/\$dir_name\/sections" \]/);
  });

  test('kiro per-skill loop rewrites + copies sections/*', () => {
    // Kiro builds from the codex output and sed-rewrites paths; sections must get
    // the same rewrite so they resolve under ~/.kiro, not ~/.codex or ~/.claude.
    expect(SETUP).toMatch(/if \[ -d "\$skill_dir\/sections" \]/);
    expect(SETUP).toMatch(/mkdir -p "\$target_dir\/sections"/);
    expect(SETUP).toContain('$target_dir/sections/$(basename "$section_file")');
  });

  test('no raw ln introduced (windows-fallback invariant still holds)', () => {
    // Every new line touching sections uses _link_or_copy or sed redirect, never ln.
    const sectionLines = SETUP.split('\n').filter(l => l.includes('sections') && /\bln\s+-/.test(l));
    expect(sectionLines).toEqual([]);
  });
});
@@ -2,10 +2,23 @@ import { describe, test, expect } from 'bun:test';
|
||||
import * as fs from 'fs';
|
||||
import * as path from 'path';
|
||||
|
||||
const SHIP_SKILL = path.join(__dirname, '..', 'ship', 'SKILL.md');
|
||||
const SHIP_DIR = path.join(__dirname, '..', 'ship');
|
||||
|
||||
// Carved (v2 plan T9): the Plan Completion gate moved into sections/plan-completion.md.
|
||||
// Read the skeleton + sections union so these invariants follow the content.
|
||||
function readShipUnion(): string {
|
||||
let t = fs.readFileSync(path.join(SHIP_DIR, 'SKILL.md'), 'utf8');
|
||||
const secDir = path.join(SHIP_DIR, 'sections');
|
||||
if (fs.existsSync(secDir)) {
|
||||
for (const f of fs.readdirSync(secDir).sort()) {
|
||||
if (f.endsWith('.md')) t += '\n' + fs.readFileSync(path.join(secDir, f), 'utf8');
|
||||
}
|
||||
}
|
||||
return t;
|
||||
}
|
||||
|
||||
describe('ship/SKILL.md — Plan Completion gate invariants (VAS-449 remediation)', () => {
|
||||
const skill = fs.readFileSync(SHIP_SKILL, 'utf8');
|
||||
const skill = readShipUnion();
|
||||
|
||||
test('Path concreteness rule: filesystem-pathed items must be test -f checked', () => {
|
||||
expect(skill).toContain('**Path concreteness rule.**');
|
||||
|
||||
@@ -9,7 +9,20 @@ import * as path from "path";
|
||||
import { scan } from "../lib/redact-engine";
|
||||
|
||||
const ROOT = path.resolve(import.meta.dir, "..");
|
||||
const TMPL = fs.readFileSync(path.join(ROOT, "ship", "SKILL.md.tmpl"), "utf-8");
|
||||
// Carved (v2 plan T9): ship is a skeleton template + sections/*.md.tmpl. The
|
||||
// PR-body redaction wiring moved into sections/pr-body.md.tmpl, so assert against
|
||||
// the union of the skeleton template and its section templates.
|
||||
function readShipTemplateUnion(): string {
|
||||
let t = fs.readFileSync(path.join(ROOT, "ship", "SKILL.md.tmpl"), "utf-8");
|
||||
const secDir = path.join(ROOT, "ship", "sections");
|
||||
if (fs.existsSync(secDir)) {
|
||||
for (const f of fs.readdirSync(secDir).sort()) {
|
||||
if (f.endsWith(".md.tmpl")) t += "\n" + fs.readFileSync(path.join(secDir, f), "utf-8");
|
||||
}
|
||||
}
|
||||
return t;
|
||||
}
|
||||
const TMPL = readShipTemplateUnion();
|
||||
|
||||
describe("/ship redaction wiring", () => {
|
||||
test("scans the PR body via the shared bin before create", () => {
|
||||
|
||||
@@ -197,20 +197,26 @@ describeE2E('/ship idempotency E2E (periodic, real-PTY)', () => {
|
||||
}
|
||||
}
|
||||
|
||||
// Positive: the idempotency-check echoed ALREADY_BUMPED.
|
||||
if (/STATE:\s*ALREADY_BUMPED/.test(visible)) {
|
||||
// Positive: idempotency classify reported ALREADY_BUMPED. Post-carve
|
||||
// (T9), Step 12 runs `gstack-version-bump classify` which emits JSON
|
||||
// (`"state":"ALREADY_BUMPED"`); the legacy inline bash echoed
|
||||
// `STATE: ALREADY_BUMPED`. Accept either so the test survives the carve.
|
||||
if (/STATE:\s*ALREADY_BUMPED|"state":\s*"ALREADY_BUMPED"/.test(visible)) {
|
||||
outcome = 'detected';
|
||||
evidence = visible.slice(-3000);
|
||||
break;
|
||||
}
|
||||
|
||||
// Negative regressions:
|
||||
// - bump-action bash block ran (would echo on FRESH path)
|
||||
// - classify reported FRESH (CLI JSON or legacy echo) → would re-bump
|
||||
// - agent attempted git commit -m "chore: bump version"
|
||||
// - agent attempted git push
|
||||
// - agent rendered an Edit/Write to CHANGELOG.md or VERSION (acceptable in plan mode but flagged here)
|
||||
// - agent ran the CLI write path (gstack-version-bump write) — a
|
||||
// re-bump on an already-shipped branch
|
||||
if (
|
||||
/"state":\s*"FRESH"/.test(visible) ||
|
||||
/STATE:\s*FRESH(?![\w-])/i.test(visible) ||
|
||||
/gstack-version-bump\s+write/i.test(visible) ||
|
||||
/git\s+commit\s+.*chore:\s*bump\s+version/i.test(visible) ||
|
||||
/git\s+push.*origin/i.test(visible)
|
||||
) {
|
||||
|
||||
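The accept-either check and the regression signals in the hunk above can be read as a small classifier over the PTY scrollback. The sketch below is illustrative only: `classifyScrollback` and the `Outcome` type are hypothetical names that do not appear in the diff, the `'regressed'` label is an assumption about what the truncated `if (...) {` branch records, and the regexes are copied from the hunk.

```typescript
// Hypothetical helper (not in the diff): classify visible PTY text.
type Outcome = 'detected' | 'regressed' | 'pending';

// Legacy inline bash echo OR the CLI's JSON output.
const ALREADY_BUMPED = /STATE:\s*ALREADY_BUMPED|"state":\s*"ALREADY_BUMPED"/;

// Signals that the agent is about to re-bump an already-shipped branch.
const REGRESSIONS: RegExp[] = [
  /"state":\s*"FRESH"/,
  /STATE:\s*FRESH(?![\w-])/i,
  /gstack-version-bump\s+write/i,
  /git\s+commit\s+.*chore:\s*bump\s+version/i,
  /git\s+push.*origin/i,
];

function classifyScrollback(visible: string): Outcome {
  // Positive match wins, mirroring the order of checks in the test loop.
  if (ALREADY_BUMPED.test(visible)) return 'detected';
  if (REGRESSIONS.some((re) => re.test(visible))) return 'regressed';
  return 'pending'; // nothing conclusive yet; keep polling
}
```

None of the patterns carries the `g` flag, so repeated `.test()` calls on a growing scrollback stay stateless.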
@@ -0,0 +1,120 @@
|
||||
/**
 * /ship section-loading E2E (periodic, paid, real-PTY) — v2 plan T9 mitigation
 * layer 5, the ONLY CI-failing guard against silent section-skip.
 *
 * After the carve, ship is a skeleton whose STOP-Read directives point at
 * sections/*.md. This test runs the REAL /ship skill in plan mode against a
 * fresh version-changing fixture and asserts the agent actually Read the
 * sections its situation requires (review-army + changelog at minimum — every
 * version-changing ship needs the pre-landing review and a CHANGELOG entry).
 *
 * Runs against the INSTALLED skill at ~/.claude/skills/gstack/ship (Codex
 * outside-voice #5: an E2E that reads repo paths would miss install-layout
 * 404s). Section reads are detected from the PTY scrollback — when the agent
 * Reads a section the tool render shows the `sections/<file>.md` path.
 *
 * Plan-mode framing keeps the agent from committing/pushing; producing a plan
 * is the terminal signal. Cost: ~$2-4/run. Periodic tier.
 *
 * Situation matrix (T1 = B): this file covers the fresh version-changing ship;
 * the already-bumped re-run is covered by skill-e2e-ship-idempotency.test.ts,
 * and a no-plan-file variant can be added to FIXTURES below.
 */
|
||||
|
||||
import { describe, test, expect } from 'bun:test';
|
||||
import { spawnSync } from 'child_process';
|
||||
import * as fs from 'fs';
|
||||
import * as path from 'path';
|
||||
import * as os from 'os';
|
||||
import {
|
||||
launchClaudePty,
|
||||
isPermissionDialogVisible,
|
||||
isNumberedOptionListVisible,
|
||||
} from './helpers/claude-pty-runner';
|
||||
|
||||
const shouldRun = !!process.env.EVALS && process.env.EVALS_TIER === 'periodic';
|
||||
const describeE2E = shouldRun ? describe : describe.skip;
|
||||
|
||||
/** Fresh fixture: feature branch with a real change but VERSION still == base,
|
||||
* so /ship must bump (FRESH) and walk the full pre-landing + changelog flow. */
|
||||
function buildFreshFixture(): { workTree: string; root: string } {
|
||||
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-ship-secload-'));
|
||||
const workTree = path.join(root, 'workspace');
|
||||
const bareRemote = path.join(root, 'origin.git');
|
||||
fs.mkdirSync(workTree, { recursive: true });
|
||||
const sh = (cmd: string, cwd: string): void => {
|
||||
const r = spawnSync('bash', ['-c', cmd], { cwd, stdio: 'pipe', timeout: 15_000 });
|
||||
if (r.status !== 0) throw new Error(`fixture setup failed at "${cmd}":\n${r.stderr?.toString()}`);
|
||||
};
|
||||
sh(`git init --bare "${bareRemote}"`, root);
|
||||
sh('git init -b main', workTree);
|
||||
sh('git config user.email "t@t.com" && git config user.name "T" && git config commit.gpgsign false', workTree);
|
||||
fs.writeFileSync(path.join(workTree, 'VERSION'), '0.0.1\n');
|
||||
fs.writeFileSync(path.join(workTree, 'package.json'), JSON.stringify({ name: 'fx', version: '0.0.1', private: true }, null, 2) + '\n');
|
||||
fs.writeFileSync(path.join(workTree, 'CHANGELOG.md'), '# Changelog\n\n## [0.0.1] - 2026-01-01\n\n- Initial release\n');
|
||||
fs.writeFileSync(path.join(workTree, 'app.js'), '// base\n');
|
||||
sh('git add -A && git commit -m "chore: initial v0.0.1"', workTree);
|
||||
sh(`git remote add origin "${bareRemote}" && git push -u origin main`, workTree);
|
||||
// Feature branch: a real code change, VERSION untouched → FRESH (needs a bump).
|
||||
sh('git checkout -b feat/new-thing', workTree);
|
||||
fs.writeFileSync(path.join(workTree, 'app.js'), '// base\nexport function newThing() { return 42; }\n');
|
||||
fs.writeFileSync(path.join(workTree, 'app.test.js'), 'test("newThing", () => {});\n');
|
||||
sh('git add -A && git commit -m "feat: add newThing"', workTree);
|
||||
sh('git push -u origin feat/new-thing', workTree);
|
||||
return { workTree, root };
|
||||
}
|
||||
|
||||
// Sections every version-changing ship must consult.
|
||||
const REQUIRED_SECTIONS = ['review-army.md', 'changelog.md'];
|
||||
|
||||
describeE2E('/ship section-loading E2E (periodic, real-PTY, installed skill)', () => {
|
||||
test(
|
||||
'fresh version-changing ship Reads the required sections',
|
||||
async () => {
|
||||
const { workTree, root } = buildFreshFixture();
|
||||
const session = await launchClaudePty({
|
||||
permissionMode: 'plan',
|
||||
cwd: workTree,
|
||||
timeoutMs: 720_000,
|
||||
env: { GH_TOKEN: 'mock-not-real', NO_COLOR: '1' },
|
||||
});
|
||||
|
||||
const readSections = new Set<string>();
|
||||
let planReady = false;
|
||||
try {
|
||||
await Bun.sleep(8000);
|
||||
const since = session.mark();
|
||||
session.send('/ship\r');
|
||||
const start = Date.now();
|
||||
let lastPermSig = '';
|
||||
while (Date.now() - start < 600_000) {
|
||||
await Bun.sleep(3000);
|
||||
if (session.exited()) break;
|
||||
const visible = session.visibleSince(since);
|
||||
const tail = visible.slice(-1500);
|
||||
if (isNumberedOptionListVisible(tail) && isPermissionDialogVisible(tail)) {
|
||||
const sig = visible.slice(-500);
|
||||
if (sig !== lastPermSig) { lastPermSig = sig; session.send('1\r'); await Bun.sleep(1500); continue; }
|
||||
}
|
||||
// Detect section reads from the scrollback (tool render shows the path).
|
||||
for (const m of visible.matchAll(/sections\/([A-Za-z0-9._-]+\.md)/g)) readSections.add(m[1]);
|
||||
if (/ready to execute|Would you like to proceed|GSTACK REVIEW REPORT/i.test(visible)) {
|
||||
planReady = true;
|
||||
break;
|
||||
}
|
||||
}
|
||||
} finally {
|
||||
await session.close();
|
||||
try { fs.rmSync(root, { recursive: true, force: true }); } catch { /* ignore */ }
|
||||
}
|
||||
|
||||
const missing = REQUIRED_SECTIONS.filter(s => !readSections.has(s));
|
||||
expect({ planReady, read: [...readSections], missing }).toEqual({
|
||||
planReady: true,
|
||||
read: expect.any(Array),
|
||||
missing: [],
|
||||
});
|
||||
},
|
||||
900_000,
|
||||
);
|
||||
});
|
||||
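The section-read detection in the test above reduces to a regex scan of the scrollback followed by a set difference against `REQUIRED_SECTIONS`. A minimal sketch of that step, assuming the same pattern as the test; `missingSections` is a hypothetical name introduced here for illustration:

```typescript
// Same pattern the E2E loop applies to the PTY scrollback.
const SECTION_PATH = /sections\/([A-Za-z0-9._-]+\.md)/g;

// Hypothetical helper: which required sections never showed up in the output?
function missingSections(visible: string, required: readonly string[]): string[] {
  const read = new Set<string>();
  // matchAll works on a copy of the regex, so the shared `g` pattern is safe to reuse.
  for (const m of visible.matchAll(SECTION_PATH)) read.add(m[1]);
  return required.filter((s) => !read.has(s));
}

const demoMissing = missingSections(
  'Read(~/.claude/skills/gstack/ship/sections/review-army.md)',
  ['review-army.md', 'changelog.md'],
);
// demoMissing is ['changelog.md']
```

One property worth knowing: because the character class also admits dots, a path ending in `sections/changelog.md.tmpl` is captured as `changelog.md`, so a template path rendered in the scrollback would count as a section read.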
@@ -156,7 +156,11 @@ describe('SKILL.md size budget regression (gate, free)', () => {
|
||||
     const baseline: ParityBaseline = JSON.parse(fs.readFileSync(BASELINE_PATH, 'utf-8'));
     const current = captureBaseline({ repoRoot: REPO_ROOT });
     const MIN_RATIO = 0.80; // a skill at <80% of its v1.44 size signals mass-deletion
-    const SECTIONS_EXTRACTED = new Set<string>(); // populate in v2.0.0.0 when sections/ lands
+    // Carved skills (v2 plan T9): the skeleton SKILL.md intentionally shrinks
+    // because prose moved into sections/*.md. The union size is guarded instead
+    // by the sectioned ship invariant in parity-harness.ts (minBytes on the
+    // skeleton+sections union), so exempt the skeleton from the body-strip floor.
+    const SECTIONS_EXTRACTED = new Set<string>(['ship']);

     const undershoots: Array<{
       skill: string; beforeBytes: number; afterBytes: number; ratio: number;
|
||||
|
||||
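The exemption in this hunk can be sketched as a pure function. The diff does not show the loop that fills `undershoots`, so the following is an assumption about its shape rather than the real code: `SkillSize` and `findUndershoots` are hypothetical names, and only `MIN_RATIO` and `SECTIONS_EXTRACTED` come from the hunk.

```typescript
// Hypothetical input shape: one entry per skill, sizes in bytes.
interface SkillSize {
  skill: string;
  beforeBytes: number;
  afterBytes: number;
}

const MIN_RATIO = 0.80; // same floor as the gate
const SECTIONS_EXTRACTED = new Set<string>(['ship']); // carved skills are exempt

// Returns skills that shrank below the floor, skipping carved skeletons.
function findUndershoots(sizes: SkillSize[]): Array<SkillSize & { ratio: number }> {
  const out: Array<SkillSize & { ratio: number }> = [];
  for (const s of sizes) {
    if (SECTIONS_EXTRACTED.has(s.skill)) continue; // union size is guarded elsewhere
    if (s.beforeBytes <= 0) continue; // no baseline to compare against
    const ratio = s.afterBytes / s.beforeBytes;
    if (ratio < MIN_RATIO) out.push({ ...s, ratio });
  }
  return out;
}

const demoUndershoots = findUndershoots([
  { skill: 'ship', beforeBytes: 1000, afterBytes: 400 }, // exempt: carved
  { skill: 'qa', beforeBytes: 1000, afterBytes: 700 }, // below the floor
  { skill: 'review', beforeBytes: 1000, afterBytes: 900 }, // within budget
]);
```

Under these assumptions only `qa` is reported; `ship` shrinks the most but is skipped because its skeleton-plus-sections union is checked by a separate invariant.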
@@ -7,6 +7,22 @@ import * as path from 'path';
|
||||
|
||||
 const ROOT = path.resolve(import.meta.dir, '..');

+// Carved-skill aware (v2 plan T9): ship is a skeleton SKILL.md + sections/*.md.
+// Read the union so validations of content that moved into a section still hold.
+// `_SHIP_MD` is a distinct path expression so a mechanical read-replace can't
+// recurse into this helper.
+const _SHIP_MD = path.join(ROOT, 'ship', 'SKILL.md');
+function readShipUnion(): string {
+  let t = fs.readFileSync(_SHIP_MD, 'utf-8');
+  const secDir = path.join(ROOT, 'ship', 'sections');
+  if (fs.existsSync(secDir)) {
+    for (const f of fs.readdirSync(secDir).sort()) {
+      if (f.endsWith('.md')) t += '\n' + fs.readFileSync(path.join(secDir, f), 'utf-8');
+    }
+  }
+  return t;
+}
+
 describe('SKILL.md command validation', () => {
   test('all $B commands in SKILL.md are valid browse commands', () => {
     const result = validateSkill(path.join(ROOT, 'SKILL.md'));
|
||||
@@ -315,7 +331,8 @@ describe('Cross-skill path consistency', () => {
|
||||
     for (const file of filesToCheck) {
       const filePath = path.join(ROOT, file);
       if (!fs.existsSync(filePath)) continue;
-      const content = fs.readFileSync(filePath, 'utf-8');
+      // ship's greptile handling moved into sections/greptile.md (T9 carve).
+      const content = file === 'ship/SKILL.md' ? readShipUnion() : fs.readFileSync(filePath, 'utf-8');

       const hasBoth = (content.includes('per-project') && content.includes('global')) ||
         (content.includes('$REMOTE_SLUG/greptile-history') && content.includes('~/.gstack/greptile-history'));
|
||||
@@ -437,7 +454,7 @@ describe('Greptile history format consistency', () => {
|
||||
|
||||
test('review/SKILL.md and ship/SKILL.md both reference greptile-triage.md for write details', () => {
|
||||
const reviewContent = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
||||
const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipContent = readShipUnion();
|
||||
|
||||
expect(reviewContent.toLowerCase()).toContain('greptile-triage.md');
|
||||
expect(shipContent.toLowerCase()).toContain('greptile-triage.md');
|
||||
@@ -530,7 +547,7 @@ describe('TODOS-format.md reference consistency', () => {
|
||||
});
|
||||
|
||||
test('skills that write TODOs reference TODOS-format.md', () => {
|
||||
const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipContent = readShipUnion();
|
||||
const ceoPlanContent = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
|
||||
const engPlanContent = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');
|
||||
|
||||
@@ -788,7 +805,7 @@ describe('Enum & Value Completeness in review checklist', () => {
|
||||
expect(checklist).toContain('ASK');
|
||||
|
||||
const reviewSkill = fs.readFileSync(path.join(ROOT, 'review/SKILL.md'), 'utf-8');
|
||||
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship/SKILL.md'), 'utf-8');
|
||||
const shipSkill = readShipUnion();
|
||||
expect(reviewSkill).toContain('AUTO-FIX');
|
||||
expect(reviewSkill).toContain('[AUTO-FIXED]');
|
||||
expect(shipSkill).toContain('AUTO-FIX');
|
||||
@@ -1014,7 +1031,7 @@ describe('Test Bootstrap ({{TEST_BOOTSTRAP}}) integration', () => {
|
||||
});
|
||||
|
||||
test('TEST_BOOTSTRAP appears in ship/SKILL.md', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('Test Framework Bootstrap');
|
||||
expect(content).toContain('Step 4');
|
||||
});
|
||||
@@ -1063,7 +1080,7 @@ describe('Test Bootstrap ({{TEST_BOOTSTRAP}}) integration', () => {
|
||||
|
||||
test('WebSearch is in allowed-tools for qa, ship, design-review', () => {
|
||||
const qa = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||||
const ship = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const ship = readShipUnion();
|
||||
const qaDesign = fs.readFileSync(path.join(ROOT, 'design-review', 'SKILL.md'), 'utf-8');
|
||||
expect(qa).toContain('WebSearch');
|
||||
expect(ship).toContain('WebSearch');
|
||||
@@ -1112,7 +1129,7 @@ describe('Phase 8e.5 regression test generation', () => {
|
||||
|
||||
describe('Step 3.4 test coverage audit', () => {
|
||||
test('ship/SKILL.md contains Step 7', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('Step 7: Test Coverage Audit');
|
||||
// The coverage diagram collapses code-path and user-flow counts onto one
|
||||
// summary line. Verify that summary is present (labels are stable).
|
||||
@@ -1120,7 +1137,7 @@ describe('Step 3.4 test coverage audit', () => {
|
||||
});
|
||||
|
||||
test('Step 3.4 includes quality scoring rubric', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('★★★');
|
||||
expect(content).toContain('★★');
|
||||
expect(content).toContain('edge cases AND error paths');
|
||||
@@ -1128,36 +1145,36 @@ describe('Step 3.4 test coverage audit', () => {
|
||||
});
|
||||
|
||||
test('Step 3.4 includes before/after test count', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('Count test files before');
|
||||
expect(content).toContain('Count test files after');
|
||||
});
|
||||
|
||||
test('ship PR body includes Test Coverage section', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('## Test Coverage');
|
||||
});
|
||||
|
||||
test('ship rules include test generation rule', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('Step 7 generates coverage tests');
|
||||
expect(content).toContain('Never commit failing tests');
|
||||
});
|
||||
|
||||
test('Step 3.4 includes vibe coding philosophy', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('vibe coding becomes yolo coding');
|
||||
});
|
||||
|
||||
test('Step 3.4 traces actual codepaths, not just syntax', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('Trace every codepath');
|
||||
expect(content).toContain('Trace data flow');
|
||||
expect(content).toContain('Diagram the execution');
|
||||
});
|
||||
|
||||
test('Step 3.4 maps user flows and interaction edge cases', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('Map user flows');
|
||||
expect(content).toContain('Interaction edge cases');
|
||||
expect(content).toContain('Double-click');
|
||||
@@ -1167,7 +1184,7 @@ describe('Step 3.4 test coverage audit', () => {
|
||||
});
|
||||
|
||||
test('Step 3.4 diagram includes user-flow coverage summary', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
// The diagram was compressed from separate CODE PATH COVERAGE / USER FLOW
|
||||
// COVERAGE section headers into a single summary line. Assert on the
|
||||
// labels that still appear on that summary line.
|
||||
@@ -1203,7 +1220,7 @@ describe('ship step numbering', () => {
|
||||
});
|
||||
|
||||
test('ship/SKILL.md main headings use clean integer step numbers', () => {
|
||||
const skill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const skill = readShipUnion();
|
||||
// Headings like "## Step 7: Test Coverage Audit" — NOT sub-steps like "## Step 8.1:"
|
||||
const headings = Array.from(skill.matchAll(/^## Step (\d+(?:\.\d+)?):/gm)).map(
|
||||
(m) => m[1]
|
||||
@@ -1381,7 +1398,7 @@ describe('Codex skill', () => {
|
||||
});
|
||||
|
||||
test('adversarial review in /ship always runs both passes', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('Adversarial review (always-on)');
|
||||
expect(content).toContain('adversarial-review');
|
||||
expect(content).toContain('reasoning_effort="high"');
|
||||
@@ -1391,7 +1408,7 @@ describe('Codex skill', () => {
|
||||
|
||||
test('scope drift detection in /review and /ship', () => {
|
||||
const reviewContent = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
||||
const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const shipContent = readShipUnion();
|
||||
// Both should contain scope drift from the shared resolver
|
||||
for (const content of [reviewContent, shipContent]) {
|
||||
expect(content).toContain('Scope Check:');
|
||||
@@ -1427,7 +1444,8 @@ describe('Codex skill', () => {
|
||||
|
||||
   test('codex review invocations avoid the prompt plus --base argument shape', () => {
     for (const rel of ['codex/SKILL.md', 'review/SKILL.md', 'ship/SKILL.md']) {
-      const content = fs.readFileSync(path.join(ROOT, rel), 'utf-8');
+      // ship's codex command moved into sections/adversarial.md (T9 carve).
+      const content = rel === 'ship/SKILL.md' ? readShipUnion() : fs.readFileSync(path.join(ROOT, rel), 'utf-8');
       expect(content).not.toContain('--base <base> -c \'model_reasoning_effort="high"\'');
       expect(content).toContain('Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD');
     }
|
||||
@@ -1443,7 +1461,8 @@ describe('Codex skill', () => {
|
||||
     const boundaryLine =
       'Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/';
     for (const rel of ['codex/SKILL.md', 'review/SKILL.md', 'ship/SKILL.md']) {
-      const content = fs.readFileSync(path.join(ROOT, rel), 'utf-8');
+      // ship's codex/adversarial boundary line moved into sections/adversarial.md.
+      const content = rel === 'ship/SKILL.md' ? readShipUnion() : fs.readFileSync(path.join(ROOT, rel), 'utf-8');
       expect(content).toContain(boundaryLine);
     }
   });
|
||||
@@ -1456,7 +1475,7 @@ describe('Codex skill', () => {
|
||||
});
|
||||
|
||||
test('Review Readiness Dashboard includes Adversarial Review row', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('Adversarial');
|
||||
expect(content).toContain('codex-review');
|
||||
});
|
||||
@@ -1711,17 +1730,17 @@ describe('Repo mode preamble validation', () => {
|
||||
|
||||
describe('Test failure triage in ship skill', () => {
|
||||
test('ship/SKILL.md contains Test Failure Ownership Triage', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('Test Failure Ownership Triage');
|
||||
});
|
||||
|
||||
test('ship/SKILL.md triage uses git diff for classification', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('git diff origin/<base>...HEAD --name-only');
|
||||
});
|
||||
|
||||
test('ship/SKILL.md triage has solo and collaborative paths', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('REPO_MODE');
|
||||
expect(content).toContain('solo');
|
||||
expect(content).toContain('collaborative');
|
||||
@@ -1730,18 +1749,18 @@ describe('Test failure triage in ship skill', () => {
|
||||
});
|
||||
|
||||
test('ship/SKILL.md triage has GitHub issue assignment for collaborative mode', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('gh issue create');
|
||||
expect(content).toContain('--assignee');
|
||||
});
|
||||
|
||||
test('{{TEST_FAILURE_TRIAGE}} placeholder is fully resolved in ship/SKILL.md', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).not.toContain('{{TEST_FAILURE_TRIAGE}}');
|
||||
});
|
||||
|
||||
test('ship/SKILL.md uses in-branch language for stop condition', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
const content = readShipUnion();
|
||||
expect(content).toContain('In-branch test failures');
|
||||
});
|
||||
});
|
||||
|
||||
@@ -0,0 +1,58 @@
|
||||
/**
 * Section TemplateContext parity (v2 plan T9 / Codex consult absorbed-refinement #1).
 *
 * Section generation must use the SAME TemplateContext as the parent skill —
 * crucially the same skillName, so resolver `appliesTo` gating + tier behave
 * identically. If a section resolved with skillName "sections" (the bug
 * processSectionTemplate guards against), gated resolvers like ADVERSARIAL_STEP /
 * CONFIDENCE_CALIBRATION would render empty.
 *
 * We assert on the GENERATED section output: gated resolver content is present and
 * no placeholder is left unresolved. That can only be true if the parent ctx
 * (skillName=ship) drove the resolve.
 */

import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';

const ROOT = path.resolve(import.meta.dir, '..');
const SHIP_SECTIONS = path.join(ROOT, 'ship', 'sections');

function readSection(file: string): string {
  return fs.readFileSync(path.join(SHIP_SECTIONS, file), 'utf-8');
}

describe('section TemplateContext parity (skillName pinned to parent)', () => {
  test('no generated section has unresolved {{PLACEHOLDER}} tokens', () => {
    for (const md of fs.readdirSync(SHIP_SECTIONS).filter(f => f.endsWith('.md') && !f.endsWith('.md.tmpl'))) {
      const content = readSection(md);
      const unresolved = content.match(/\{\{[A-Z_]+(?::[^}]+)?\}\}/g);
      expect({ md, unresolved }).toEqual({ md, unresolved: null });
    }
  });

  test('adversarial section rendered the ADVERSARIAL_STEP resolver (proves ship ctx)', () => {
    const content = readSection('adversarial.md');
    // The codex filesystem-boundary line only appears when ADVERSARIAL_STEP resolves.
    expect(content).toContain('Do NOT read or execute any files under');
    expect(content.length).toBeGreaterThan(500);
  });

  test('review-army section rendered CONFIDENCE_CALIBRATION + REVIEW_ARMY (gated resolvers)', () => {
    const content = readSection('review-army.md');
    expect(content).toContain('Confidence Calibration');
    expect(content).toContain('confidence score');
  });

  test('tests section rendered TEST_BOOTSTRAP + TEST_FAILURE_TRIAGE', () => {
    const content = readSection('tests.md');
    expect(content).toContain('Test Failure Ownership Triage');
  });

  test('changelog section rendered CHANGELOG_WORKFLOW', () => {
    const content = readSection('changelog.md');
    expect(content).toContain('CHANGELOG');
    expect(content.length).toBeGreaterThan(300);
  });
});
|
||||
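The unresolved-placeholder check above hinges on a single regex. A minimal sketch of what it does and does not match, using the same pattern; `findUnresolved` is a hypothetical wrapper and the sample strings (including the `{{SECTION:foo}}` token) are invented for illustration:

```typescript
// Same pattern as the parity test: {{UPPER_SNAKE}} with an optional :argument.
const PLACEHOLDER = /\{\{[A-Z_]+(?::[^}]+)?\}\}/g;

// Hypothetical wrapper: normalizes the "no match" case from null to [].
function findUnresolved(content: string): string[] {
  return content.match(PLACEHOLDER) ?? [];
}

const leftovers = findUnresolved('intro {{TEST_BOOTSTRAP}} body {{SECTION:foo}} end');
// leftovers is ['{{TEST_BOOTSTRAP}}', '{{SECTION:foo}}']

const clean = findUnresolved('literal {{lowercase}} braces are not placeholders');
// clean is []
```

The test itself compares against `null` rather than `[]` because `String.prototype.match` with a global regex returns `null` when nothing matches.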
@@ -0,0 +1,136 @@
|
||||
/**
 * Unit tests for the transcript section logger (T10). Pure-function coverage —
 * no paid run needed. Drives the analyzers with synthetic tool-call transcripts.
 */
|
||||
|
||||
import { describe, test, expect, afterAll } from 'bun:test';
|
||||
import * as fs from 'fs';
|
||||
import * as os from 'os';
|
||||
import * as path from 'path';
|
||||
import {
|
||||
extractSectionReads,
|
||||
extractShipActions,
|
||||
compareShipActions,
|
||||
writeShipBaseline,
|
||||
readShipBaseline,
|
||||
baselinePath,
|
||||
SHIP_ACTIONS,
|
||||
type ToolCallLike,
|
||||
type ShipBaseline,
|
||||
} from './helpers/transcript-section-logger';
|
||||
|
||||
const read = (fp: string): ToolCallLike => ({ tool: 'Read', input: { file_path: fp }, output: '' });
|
||||
const bash = (command: string): ToolCallLike => ({ tool: 'Bash', input: { command }, output: '' });
|
||||
|
||||
describe('extractSectionReads', () => {
|
||||
test('picks up section reads via the /sections/<file>.md segment', () => {
|
||||
const result = {
|
||||
toolCalls: [
|
||||
read('/Users/x/.claude/skills/gstack-ship/sections/version-bump.md'),
|
||||
read('ship/sections/changelog.md'),
|
||||
read('/abs/.factory/skills/gstack-ship/sections/review-army.md'),
|
||||
],
|
||||
};
|
||||
expect(extractSectionReads(result)).toEqual(['version-bump.md', 'changelog.md', 'review-army.md']);
|
||||
});
|
||||
|
||||
test('ignores non-section reads and non-Read tools', () => {
|
||||
const result = {
|
||||
toolCalls: [
|
||||
read('ship/SKILL.md'),
|
||||
read('/some/sections-like/notsections/x.md'),
|
||||
bash('cat ship/sections/version-bump.md'), // bash, not a Read
|
||||
],
|
||||
};
|
||||
expect(extractSectionReads(result)).toEqual([]);
|
||||
});
|
||||
|
||||
test('dedupes and preserves first-read order', () => {
|
||||
const result = {
|
||||
toolCalls: [
|
||||
read('ship/sections/tests.md'),
|
||||
read('ship/sections/version-bump.md'),
|
||||
read('ship/sections/tests.md'),
|
||||
],
|
||||
};
|
||||
expect(extractSectionReads(result)).toEqual(['tests.md', 'version-bump.md']);
|
||||
});
|
||||
});
|
||||
|
||||
describe('extractShipActions', () => {
|
||||
test('detects the full action fingerprint from bash + writes', () => {
|
||||
const result = {
|
||||
toolCalls: [
|
||||
bash('git merge origin/main'),
|
||||
bash('bun test'),
|
||||
bash('gstack-version-bump --bump minor'),
|
||||
{ tool: 'Edit', input: { file_path: 'CHANGELOG.md' }, output: '' },
|
||||
bash('git commit -m "v1.2.0.0 feat"'),
|
||||
bash('git push origin HEAD'),
|
||||
bash('gh pr create --base main'),
|
||||
],
|
||||
};
|
||||
expect(extractShipActions(result)).toEqual([...SHIP_ACTIONS]);
|
||||
});
|
||||
|
||||
test('returns canonical order regardless of execution order', () => {
|
||||
const result = {
|
||||
toolCalls: [
|
||||
bash('gh pr create --base main'),
|
||||
bash('git merge origin/main'),
|
||||
],
|
||||
};
|
||||
expect(extractShipActions(result)).toEqual(['merged_base', 'opened_pr']);
|
||||
});
|
||||
|
||||
test('VERSION write counts as a version bump even without the CLI', () => {
|
||||
const result = { toolCalls: [{ tool: 'Write', input: { file_path: 'VERSION' }, output: '' }] };
|
||||
expect(extractShipActions(result)).toEqual(['bumped_version']);
|
||||
});
|
||||
|
||||
test('empty run produces empty fingerprint', () => {
|
||||
expect(extractShipActions({ toolCalls: [] })).toEqual([]);
|
||||
});
|
||||
});
|
||||
|
||||
describe('compareShipActions', () => {
|
||||
const baseline: ShipBaseline = {
|
||||
tag: 'monolith',
|
||||
situation: 'fresh-version-changing',
|
||||
actions: ['merged_base', 'ran_tests', 'bumped_version', 'wrote_changelog', 'committed', 'pushed', 'opened_pr'],
|
||||
sectionReads: [],
|
||||
capturedAt: '2026-05-30T00:00:00Z',
|
||||
};
|
||||
|
||||
test('flags a dropped action as the carve regression', () => {
|
||||
const current = baseline.actions.filter(a => a !== 'bumped_version');
|
||||
const diff = compareShipActions(baseline, current);
|
||||
expect(diff.ok).toBe(false);
|
||||
expect(diff.missing).toEqual(['bumped_version']);
|
||||
});
|
||||
|
||||
test('passes when the sectioned run performs every baseline action', () => {
|
||||
const diff = compareShipActions(baseline, [...baseline.actions, 'merged_base']);
|
||||
expect(diff.ok).toBe(true);
|
||||
expect(diff.missing).toEqual([]);
|
||||
});
|
||||
});
|
||||
|
||||
describe('baseline persistence', () => {
|
||||
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'ship-baseline-'));
|
||||
afterAll(() => { try { fs.rmSync(dir, { recursive: true, force: true }); } catch { /* noop */ } });
|
||||
|
||||
test('round-trips a baseline to disk', () => {
|
||||
const baseline: ShipBaseline = {
|
||||
tag: 'monolith', situation: 'no-plan-file',
|
||||
actions: ['ran_tests', 'committed'], sectionReads: [], capturedAt: '2026-05-30T00:00:00Z',
|
||||
};
|
||||
const p = writeShipBaseline(baseline, dir);
|
||||
expect(p).toBe(baselinePath('no-plan-file', dir));
|
||||
expect(readShipBaseline('no-plan-file', dir)).toEqual(baseline);
|
||||
});
|
||||
|
||||
test('returns null when no baseline captured yet', () => {
|
||||
expect(readShipBaseline('never-captured', dir)).toBeNull();
|
||||
});
|
||||
});
|
||||
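The unit tests above pin down the observable behavior of the logger without showing its implementation. The sketch below is one way to satisfy the `extractSectionReads` and `compareShipActions` expectations; it is a hypothetical reimplementation, not the code in `helpers/transcript-section-logger`. The names `sketchSectionReads` and `sketchCompareActions`, the path regex, and the simplified comparison signature (an action list instead of a full `ShipBaseline`) are all assumptions.

```typescript
// Minimal stand-in for the ToolCallLike type the tests import.
interface ToolCallLike {
  tool: string;
  input: Record<string, unknown>;
  output: string;
}

// Section file names from Read calls whose path ends in a
// `sections/<file>.md` segment, deduped in first-read order.
function sketchSectionReads(result: { toolCalls: ToolCallLike[] }): string[] {
  const seen = new Set<string>(); // a Set iterates in insertion order
  for (const call of result.toolCalls) {
    if (call.tool !== 'Read') continue; // `cat` through Bash does not count
    const fp = call.input['file_path'];
    if (typeof fp !== 'string') continue;
    // Require a real `sections/` path segment, not e.g. `notsections/`.
    const m = fp.match(/(?:^|\/)sections\/([^/]+\.md)$/);
    if (m) seen.add(m[1]);
  }
  return Array.from(seen);
}

// Reports baseline actions that the current run dropped. Extra or repeated
// actions in `current` are tolerated, as in the passing test case.
function sketchCompareActions(
  baselineActions: string[],
  current: string[],
): { ok: boolean; missing: string[] } {
  const have = new Set(current);
  const missing = baselineActions.filter((a) => !have.has(a));
  return { ok: missing.length === 0, missing };
}
```

Comparing by set membership rather than by sequence is what lets a baseline captured on the monolith remain valid after the carve: only a dropped action counts as a regression, not a reordering.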