Files
gstack/test/parity-suite.test.ts
T
Garry Tan 1626d4857b v1.57.7.0 feat: GSTACK REVIEW REPORT always declares unresolved decisions (#1916)
* fix(plan-devex-review): add missing gstack-review-log step

plan-devex-review carried the EXIT PLAN MODE GATE but never wrote a
review-log entry, so the gate's 'review log was called' check was
structurally unsatisfiable and the Review Readiness Dashboard / GSTACK
REVIEW REPORT had no plan-devex-review data to read. Add a Review Log
section before the dashboard read, logging the devex fields the report
parser already expects (status, scores, product_type, tthw, persona,
competitive_tier, unresolved, commit).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(review): make unresolved-decisions status mandatory in GSTACK REVIEW REPORT

The report's UNRESOLVED line was optional ('omit if empty') and the EXIT
PLAN MODE GATE only checked it 'if applicable', so a plan could ship with
no statement about open decisions at all — a missed ambiguity read
identically to a clean plan. Now every report ends with a mandatory
unresolved-decisions status as its final line: either the exact unbolded
sentinel 'NO UNRESOLVED DECISIONS', or a '**UNRESOLVED DECISIONS:**' block
of bullets. The gate blocks ExitPlanMode unless that final line is present.

generatePlanFileReviewReport: current-review items are listed from context;
prior reviews contribute an aggregate count computed as latest-fresh-row-
per-skill minus the current run (no double-count, dashboard 7-day window).
generateExitPlanModeGate: check #3 is now blocking with no 'if applicable'
escape; bolded sentinel does not satisfy it.

Tests: static guard in gen-skill-docs.test.ts asserts the mandatory status
across all six report consumers and the gate across gate-bearing skills;
skill-e2e-plan.test.ts asserts the written report's final line is the
status (and fixes a stale 'four review rows' -> five-row prompt).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(review): compress unresolved-status prose to fit parity budget

After merging origin/main (v1.57.3.0), plan-devex-review exceeded the 1.05x
parity ratio vs the v1.53.0.0 baseline. Rather than rebase the baseline,
compressed the new prose to stay under the cap honestly: the report's
unresolved-status block (~32 -> ~9 lines) and the EXIT PLAN MODE GATE's
final-line check (~7 -> ~5 lines), plus the plan-devex-review review-log
step. All load-bearing rules and the exact gate-checkable tokens are
preserved; the static guards in gen-skill-docs.test.ts still pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: regenerate stale ship golden fixtures (#1909 follow-up)

#1909 (v1.57.3.0) added the always-loaded PR-title-version rule to ship's
template and committed the regenerated ship/SKILL.md, but did not refresh the
three ship golden fixtures, leaving the golden-file regression test red on
main. Regenerate them from current output. The diff is purely #1909 content:
the PR-title invariant line plus a previously-unresolved ${ctx.paths.binDir}
placeholder that current generation correctly resolves. No feature content
from this branch leaks into ship (ship does not consume the review report
resolvers).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(plan-devex-review): restore TIMESTAMP fill instruction in review-log

Adversarial review caught that compressing the devex review-log block dropped
the TIMESTAMP substitution guidance the three sibling plan-review skills carry.
A literal "timestamp":"TIMESTAMP" parses as JSON but is an unparseable date,
so the Review Readiness Dashboard's 7-day freshness window silently drops the
plan-devex-review row (and the report's prior-review aggregation loses it).
Restore the one-line instruction. Also: the plan-review-report E2E now derives
its last-line check from the report slice, not the whole file, so a mis-placed
report surfaces the real trailing content in the failure message.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(parity): rebase parity baseline v1.53.0.0 -> v1.57.7.0

The v1.53 anchor is four minor versions stale. v1.54-v1.57 (ship/plan carving,
carve-guards, AUQ prose fallback, the cross-session decision-log preamble) plus
this branch's mandatory unresolved-decisions status line pushed the three
plan-review skills past the 5% ratchet even after exhaustive compression. The
new baseline captures current UNION sizes (skeleton + sections/*.md, matching
what parity-harness measures) so the per-skill 1.05 ratio keeps catching future
bloat. The frozen v1.44.1 integrity anchor and the v1.47 size-budget baseline
are untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.57.7.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 21:17:18 -07:00

61 lines
2.5 KiB
TypeScript
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
/**
* Cathedral parity suite — gate-tier (free, structural + content checks).
*
* Runs every PARITY_INVARIANTS check against the current SKILL.md output
* vs the v1.57.7.0 baseline. Failures get an actionable, per-skill report
* showing missing phrases, missing headings, and size ratios.
*
* Baseline rebased v1.53.0.0 → v1.57.7.0: the v1.54v1.57 releases (ship/plan
* carving, carve-guards, AUQ prose fallback, the cross-session decision-log
* preamble) plus the mandatory unresolved-decisions status added to every
* GSTACK REVIEW REPORT pushed the three plan-review skills past the 5% ratchet
* on the v1.53 anchor even after exhaustive compression. The v1.57.7.0 baseline
* captures current UNION sizes (skeleton + sections/*.md, matching what the
* harness measures) so the per-skill 1.05 ratio still catches future bloat.
* Earlier rebase v1.44.1 → v1.53.0.0: brain-aware-planning (v1.49v1.52) + the
* v1.53 redaction guard. Historical v1.44.1 / v1.46.0.0 / v1.47.0.0 / v1.53.0.0
* baselines are retained in test/fixtures/ for the audit trail.
*
* Periodic-tier LLM-judge parity (paid) lands in Phase B (v2.0.0.0)
* alongside the sections/ extraction. Plumbing is in parity-harness.ts.
*/
import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import { runParityChecks, PARITY_INVARIANTS } from './helpers/parity-harness';
import type { ParityBaseline } from './helpers/capture-parity-baseline';
const REPO_ROOT = path.resolve(import.meta.dir, '..');
const BASELINE_PATH = path.join(REPO_ROOT, 'test', 'fixtures', 'parity-baseline-v1.57.7.0.json');
describe('parity suite vs v1.57.7.0 baseline (gate, free)', () => {
test('baseline exists', () => {
expect(fs.existsSync(BASELINE_PATH)).toBe(true);
});
test('all PARITY_INVARIANTS pass', () => {
const baseline: ParityBaseline = JSON.parse(fs.readFileSync(BASELINE_PATH, 'utf-8'));
const report = runParityChecks({
repoRoot: REPO_ROOT,
baseline,
invariants: PARITY_INVARIANTS,
});
// eslint-disable-next-line no-console
console.log(
`[parity] ${report.passed}/${report.totalChecks} skills passed parity vs ${baseline.tag}`,
);
if (report.failed === 0) return;
const failureMessages = report.details
.filter(d => !d.passed)
.map(d => ` ${d.skill}:\n - ${d.failures.join('\n - ')}`)
.join('\n');
throw new Error(
`${report.failed} skill(s) failed parity checks vs ${baseline.tag}:\n${failureMessages}`,
);
});
});