mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-10 12:03:59 +02:00
1626d4857b
* fix(plan-devex-review): add missing gstack-review-log step plan-devex-review carried the EXIT PLAN MODE GATE but never wrote a review-log entry, so the gate's 'review log was called' check was structurally unsatisfiable and the Review Readiness Dashboard / GSTACK REVIEW REPORT had no plan-devex-review data to read. Add a Review Log section before the dashboard read, logging the devex fields the report parser already expects (status, scores, product_type, tthw, persona, competitive_tier, unresolved, commit). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(review): make unresolved-decisions status mandatory in GSTACK REVIEW REPORT The report's UNRESOLVED line was optional ('omit if empty') and the EXIT PLAN MODE GATE only checked it 'if applicable', so a plan could ship with no statement about open decisions at all — a missed ambiguity read identically to a clean plan. Now every report ends with a mandatory unresolved-decisions status as its final line: either the exact unbolded sentinel 'NO UNRESOLVED DECISIONS', or a '**UNRESOLVED DECISIONS:**' block of bullets. The gate blocks ExitPlanMode unless that final line is present. generatePlanFileReviewReport: current-review items are listed from context; prior reviews contribute an aggregate count computed as latest-fresh-row- per-skill minus the current run (no double-count, dashboard 7-day window). generateExitPlanModeGate: check #3 is now blocking with no 'if applicable' escape; bolded sentinel does not satisfy it. Tests: static guard in gen-skill-docs.test.ts asserts the mandatory status across all six report consumers and the gate across gate-bearing skills; skill-e2e-plan.test.ts asserts the written report's final line is the status (and fixes a stale 'four review rows' -> five-row prompt). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(review): compress unresolved-status prose to fit parity budget After merging origin/main (v1.57.3.0), plan-devex-review exceeded the 1.05x parity ratio vs the v1.53.0.0 baseline. Rather than rebase the baseline, compressed the new prose to stay under the cap honestly: the report's unresolved-status block (~32 -> ~9 lines) and the EXIT PLAN MODE GATE's final-line check (~7 -> ~5 lines), plus the plan-devex-review review-log step. All load-bearing rules and the exact gate-checkable tokens are preserved; the static guards in gen-skill-docs.test.ts still pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: regenerate stale ship golden fixtures (#1909 follow-up) #1909 (v1.57.3.0) added the always-loaded PR-title-version rule to ship's template and committed the regenerated ship/SKILL.md, but did not refresh the three ship golden fixtures, leaving the golden-file regression test red on main. Regenerate them from current output. The diff is purely #1909 content: the PR-title invariant line plus a previously-unresolved ${ctx.paths.binDir} placeholder that current generation correctly resolves. No feature content from this branch leaks into ship (ship does not consume the review report resolvers). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(plan-devex-review): restore TIMESTAMP fill instruction in review-log Adversarial review caught that compressing the devex review-log block dropped the TIMESTAMP substitution guidance the three sibling plan-review skills carry. A literal "timestamp":"TIMESTAMP" parses as JSON but is an unparseable date, so the Review Readiness Dashboard's 7-day freshness window silently drops the plan-devex-review row (and the report's prior-review aggregation loses it). Restore the one-line instruction. Also: the plan-review-report E2E now derives its last-line check from the report slice, not the whole file, so a mis-placed report surfaces the real trailing content in the failure message. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(parity): rebase parity baseline v1.53.0.0 -> v1.57.7.0 The v1.53 anchor is four minor versions stale. v1.54-v1.57 (ship/plan carving, carve-guards, AUQ prose fallback, the cross-session decision-log preamble) plus this branch's mandatory unresolved-decisions status line pushed the three plan-review skills past the 5% ratchet even after exhaustive compression. The new baseline captures current UNION sizes (skeleton + sections/*.md, matching what parity-harness measures) so the per-skill 1.05 ratio keeps catching future bloat. The frozen v1.44.1 integrity anchor and the v1.47 size-budget baseline are untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.57.7.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
61 lines
2.5 KiB
TypeScript
61 lines
2.5 KiB
TypeScript
/**
|
||
* Cathedral parity suite — gate-tier (free, structural + content checks).
|
||
*
|
||
* Runs every PARITY_INVARIANTS check against the current SKILL.md output
|
||
* vs the v1.57.7.0 baseline. Failures get an actionable, per-skill report
|
||
* showing missing phrases, missing headings, and size ratios.
|
||
*
|
||
* Baseline rebased v1.53.0.0 → v1.57.7.0: the v1.54–v1.57 releases (ship/plan
|
||
* carving, carve-guards, AUQ prose fallback, the cross-session decision-log
|
||
* preamble) plus the mandatory unresolved-decisions status added to every
|
||
* GSTACK REVIEW REPORT pushed the three plan-review skills past the 5% ratchet
|
||
* on the v1.53 anchor even after exhaustive compression. The v1.57.7.0 baseline
|
||
* captures current UNION sizes (skeleton + sections/*.md, matching what the
|
||
* harness measures) so the per-skill 1.05 ratio still catches future bloat.
|
||
* Earlier rebase v1.44.1 → v1.53.0.0: brain-aware-planning (v1.49–v1.52) + the
|
||
* v1.53 redaction guard. Historical v1.44.1 / v1.46.0.0 / v1.47.0.0 / v1.53.0.0
|
||
* baselines are retained in test/fixtures/ for the audit trail.
|
||
*
|
||
* Periodic-tier LLM-judge parity (paid) lands in Phase B (v2.0.0.0)
|
||
* alongside the sections/ extraction. Plumbing is in parity-harness.ts.
|
||
*/
|
||
|
||
import { describe, test, expect } from 'bun:test';
|
||
import * as fs from 'fs';
|
||
import * as path from 'path';
|
||
import { runParityChecks, PARITY_INVARIANTS } from './helpers/parity-harness';
|
||
import type { ParityBaseline } from './helpers/capture-parity-baseline';
|
||
|
||
const REPO_ROOT = path.resolve(import.meta.dir, '..');
|
||
const BASELINE_PATH = path.join(REPO_ROOT, 'test', 'fixtures', 'parity-baseline-v1.57.7.0.json');
|
||
|
||
describe('parity suite vs v1.57.7.0 baseline (gate, free)', () => {
|
||
test('baseline exists', () => {
|
||
expect(fs.existsSync(BASELINE_PATH)).toBe(true);
|
||
});
|
||
|
||
test('all PARITY_INVARIANTS pass', () => {
|
||
const baseline: ParityBaseline = JSON.parse(fs.readFileSync(BASELINE_PATH, 'utf-8'));
|
||
const report = runParityChecks({
|
||
repoRoot: REPO_ROOT,
|
||
baseline,
|
||
invariants: PARITY_INVARIANTS,
|
||
});
|
||
|
||
// eslint-disable-next-line no-console
|
||
console.log(
|
||
`[parity] ${report.passed}/${report.totalChecks} skills passed parity vs ${baseline.tag}`,
|
||
);
|
||
|
||
if (report.failed === 0) return;
|
||
|
||
const failureMessages = report.details
|
||
.filter(d => !d.passed)
|
||
.map(d => ` ${d.skill}:\n - ${d.failures.join('\n - ')}`)
|
||
.join('\n');
|
||
throw new Error(
|
||
`${report.failed} skill(s) failed parity checks vs ${baseline.tag}:\n${failureMessages}`,
|
||
);
|
||
});
|
||
});
|