diff --git a/CHANGELOG.md b/CHANGELOG.md index 52d5d8dcd..967255d61 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,64 @@ # Changelog +## [1.57.7.0] - 2026-06-08 + +## **Every plan review now ends by telling you, in one line, whether anything is still unresolved.** +## **The GSTACK REVIEW REPORT closes with the open decisions, or "NO UNRESOLVED DECISIONS" in plain sight, before you approve.** + +When a plan-review skill (/plan-ceo-review, /plan-eng-review, /plan-design-review, +/plan-devex-review, and /codex) finishes and hands you the plan to approve, its report +now ends with a mandatory unresolved-decisions verdict. If decisions are still open, it +lists each one and what breaks if you ship it deferred. If nothing is open, it prints the +exact line NO UNRESOLVED DECISIONS. A token-reduction pass had made this line optional, so +a clean plan and a plan hiding an open question rendered the same. Now the line is never +omitted, it is always the last thing you read before the approval prompt, and the approval +gate refuses to let the plan through without it. + +### What changed, before and after + +| At plan-approval time | Before | After | +|---|---|---| +| Clean plan | usually no unresolved line | `NO UNRESOLVED DECISIONS` as the final line | +| Plan with open decisions | unresolved line optional, often dropped | `**UNRESOLVED DECISIONS:**` + one bullet per open item | +| Approval gate (ExitPlanMode) | checked the line "if applicable" | blocks unless the unresolved status is the final line | +| /plan-devex-review review log | never written, gate uncheckable | written, so the dashboard and report see its data | + +The unresolved count across reviews is computed without double-counting the review that +just ran, using the same 7-day freshness window as the Review Readiness Dashboard. 
+ +### What this means for you + +Every approve-plan moment now carries an explicit verdict on open questions, so a missed +ambiguity cannot slip through looking like a clean plan. If you run the plan-review skills +or /autoplan, you will see the unresolved status as the closing line of every report. +Nothing to configure. Upgrade and your next plan review shows it. + +### Itemized changes + +#### Added +- **Mandatory unresolved-decisions status in the GSTACK REVIEW REPORT.** Generated into + all six report consumers (/plan-ceo-review, /plan-eng-review, /plan-design-review, + /plan-devex-review, /codex, /devex-review) from `scripts/resolvers/review.ts`. The report + always ends with either the exact unbolded sentinel `NO UNRESOLVED DECISIONS` or a + `**UNRESOLVED DECISIONS:**` bullet block listing each open item; never omitted, always + the final line. +- **Blocking approval gate.** The EXIT PLAN MODE GATE now refuses ExitPlanMode unless the + report's final non-whitespace line is the unresolved status (no "if applicable" escape). +- Static and E2E tests pinning the mandatory status across every report consumer and + gate-bearing skill, so a future compression pass cannot silently drop it again. + +#### Fixed +- **/plan-devex-review never logged a review entry.** It carried the approval gate but + never called `gstack-review-log`, so the gate's "review log was called" check was + structurally unsatisfiable and its data was invisible to the Review Readiness Dashboard + and the report. It now logs with the correct timestamp and DX fields. + +#### For contributors +- Rebased the parity-suite size baseline v1.53.0.0 to v1.57.7.0 (captures current union + sizes; keeps the per-skill 1.05 ratio so future bloat is still caught). Regenerated the + three ship golden fixtures left stale by #1909. The frozen v1.44.1 integrity anchor and + the v1.47 size-budget baseline are untouched. 
+ ## [1.57.6.0] - 2026-06-07 ## **Eight community-filed bugs fixed in one wave, four of them security guards that were quietly failing open.** diff --git a/VERSION b/VERSION index ee55fffe9..bb68a65d9 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -1.57.6.0 +1.57.7.0 diff --git a/codex/SKILL.md b/codex/SKILL.md index 4d01f131e..e15c16ec2 100644 --- a/codex/SKILL.md +++ b/codex/SKILL.md @@ -1112,14 +1112,24 @@ Produce this markdown table: | DX Review | \`/plan-devex-review\` | Developer experience gaps | {runs} | {status} | {findings} | \`\`\` -Below the table, add these lines (omit any that are empty/not applicable): +Below the table, add these lines. **CODEX** and **CROSS-MODEL** are optional (omit when +empty); **VERDICT** is always present: - **CODEX:** (only if codex-review ran) — one-line summary of codex fixes - **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis -- **UNRESOLVED:** total unresolved decisions across all reviews - **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement"). If Eng Review is not CLEAR and not skipped globally, append "eng review required". +**Unresolved-decisions status (MANDATORY — never omitted; the report's final non-whitespace +line).** After VERDICT, end the report (content under the \`## GSTACK REVIEW REPORT\` +heading — a bold label, never a new \`## \` heading; exempt from the "omit when empty" +rule) with exactly one: the exact unbolded line \`NO UNRESOLVED DECISIONS\` (a bolded one +does NOT count), OR a \`**UNRESOLVED DECISIONS:**\` header + one bullet per open item +(last bullet = final line; add \`+ N unresolved from prior reviews\` only when N > 0). +This avoids double-counting: list THIS review's open items from context; for prior reviews +sum \`unresolved\` over the latest fresh row per skill (dashboard 7-day window) after you +DROP the current skill's row; emit the sentinel only when both are zero. 
+ ### Write to the plan file **PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one @@ -1160,12 +1170,17 @@ missing work — do NOT call ExitPlanMode: In-body prose that mentions "outside voice", "codex findings", or similar does NOT count — only the structured `## GSTACK REVIEW REPORT` section satisfies this check. -3. Confirm the report contains: a Runs / Status / Findings table, a VERDICT - line, and absorbs CODEX / CROSS-MODEL / UNRESOLVED lines if applicable. -4. If a plan file is in context for this skill invocation: confirm +3. Confirm the report has a Runs / Status / Findings table and a VERDICT line + (CODEX / CROSS-MODEL absorbed if applicable). +4. Confirm the report's FINAL non-whitespace line is the unresolved-decisions + status: the exact unbolded `NO UNRESOLVED DECISIONS`, or a bullet of a final + `**UNRESOLVED DECISIONS:**` block. BLOCKING, no "if applicable" escape — a + bolded sentinel, any trailing CODEX/CROSS-MODEL/VERDICT/prose, or a missing + status each FAILS the gate. +5. If a plan file is in context for this skill invocation: confirm `gstack-review-log` was called and `gstack-review-read` was run at least once. If no plan file is in context (e.g. `/codex consult` against a - diff with no plan), this check short-circuits — checks 1-3 already + diff with no plan), this check short-circuits — checks 1-4 already short-circuit when no plan file exists. Failing this gate and calling ExitPlanMode anyway is a contract violation — diff --git a/devex-review/SKILL.md b/devex-review/SKILL.md index b607c44a4..791db192f 100644 --- a/devex-review/SKILL.md +++ b/devex-review/SKILL.md @@ -1176,14 +1176,24 @@ Produce this markdown table: | DX Review | \`/plan-devex-review\` | Developer experience gaps | {runs} | {status} | {findings} | \`\`\` -Below the table, add these lines (omit any that are empty/not applicable): +Below the table, add these lines. 
**CODEX** and **CROSS-MODEL** are optional (omit when +empty); **VERDICT** is always present: - **CODEX:** (only if codex-review ran) — one-line summary of codex fixes - **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis -- **UNRESOLVED:** total unresolved decisions across all reviews - **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement"). If Eng Review is not CLEAR and not skipped globally, append "eng review required". +**Unresolved-decisions status (MANDATORY — never omitted; the report's final non-whitespace +line).** After VERDICT, end the report (content under the \`## GSTACK REVIEW REPORT\` +heading — a bold label, never a new \`## \` heading; exempt from the "omit when empty" +rule) with exactly one: the exact unbolded line \`NO UNRESOLVED DECISIONS\` (a bolded one +does NOT count), OR a \`**UNRESOLVED DECISIONS:**\` header + one bullet per open item +(last bullet = final line; add \`+ N unresolved from prior reviews\` only when N > 0). +This avoids double-counting: list THIS review's open items from context; for prior reviews +sum \`unresolved\` over the latest fresh row per skill (dashboard 7-day window) after you +DROP the current skill's row; emit the sentinel only when both are zero. + ### Write to the plan file **PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one diff --git a/package.json b/package.json index 3eb9f6f3d..229d7034c 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "gstack", - "version": "1.57.6.0", + "version": "1.57.7.0", "description": "Garry's Stack — Claude Code skills + fast headless browser. 
One repo, one install, entire AI engineering workflow.", "license": "MIT", "type": "module", diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md index be1f9aa08..a3c4107eb 100644 --- a/plan-ceo-review/SKILL.md +++ b/plan-ceo-review/SKILL.md @@ -1413,12 +1413,17 @@ missing work — do NOT call ExitPlanMode: In-body prose that mentions "outside voice", "codex findings", or similar does NOT count — only the structured `## GSTACK REVIEW REPORT` section satisfies this check. -3. Confirm the report contains: a Runs / Status / Findings table, a VERDICT - line, and absorbs CODEX / CROSS-MODEL / UNRESOLVED lines if applicable. -4. If a plan file is in context for this skill invocation: confirm +3. Confirm the report has a Runs / Status / Findings table and a VERDICT line + (CODEX / CROSS-MODEL absorbed if applicable). +4. Confirm the report's FINAL non-whitespace line is the unresolved-decisions + status: the exact unbolded `NO UNRESOLVED DECISIONS`, or a bullet of a final + `**UNRESOLVED DECISIONS:**` block. BLOCKING, no "if applicable" escape — a + bolded sentinel, any trailing CODEX/CROSS-MODEL/VERDICT/prose, or a missing + status each FAILS the gate. +5. If a plan file is in context for this skill invocation: confirm `gstack-review-log` was called and `gstack-review-read` was run at least once. If no plan file is in context (e.g. `/codex consult` against a - diff with no plan), this check short-circuits — checks 1-3 already + diff with no plan), this check short-circuits — checks 1-4 already short-circuit when no plan file exists. 
Failing this gate and calling ExitPlanMode anyway is a contract violation — diff --git a/plan-ceo-review/sections/review-sections.md b/plan-ceo-review/sections/review-sections.md index 80d903665..517125b39 100644 --- a/plan-ceo-review/sections/review-sections.md +++ b/plan-ceo-review/sections/review-sections.md @@ -712,14 +712,24 @@ Produce this markdown table: | DX Review | \`/plan-devex-review\` | Developer experience gaps | {runs} | {status} | {findings} | \`\`\` -Below the table, add these lines (omit any that are empty/not applicable): +Below the table, add these lines. **CODEX** and **CROSS-MODEL** are optional (omit when +empty); **VERDICT** is always present: - **CODEX:** (only if codex-review ran) — one-line summary of codex fixes - **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis -- **UNRESOLVED:** total unresolved decisions across all reviews - **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement"). If Eng Review is not CLEAR and not skipped globally, append "eng review required". +**Unresolved-decisions status (MANDATORY — never omitted; the report's final non-whitespace +line).** After VERDICT, end the report (content under the \`## GSTACK REVIEW REPORT\` +heading — a bold label, never a new \`## \` heading; exempt from the "omit when empty" +rule) with exactly one: the exact unbolded line \`NO UNRESOLVED DECISIONS\` (a bolded one +does NOT count), OR a \`**UNRESOLVED DECISIONS:**\` header + one bullet per open item +(last bullet = final line; add \`+ N unresolved from prior reviews\` only when N > 0). +This avoids double-counting: list THIS review's open items from context; for prior reviews +sum \`unresolved\` over the latest fresh row per skill (dashboard 7-day window) after you +DROP the current skill's row; emit the sentinel only when both are zero. 
+ ### Write to the plan file **PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md index cd4e3a6f7..539175b4a 100644 --- a/plan-design-review/SKILL.md +++ b/plan-design-review/SKILL.md @@ -1434,12 +1434,17 @@ missing work — do NOT call ExitPlanMode: In-body prose that mentions "outside voice", "codex findings", or similar does NOT count — only the structured `## GSTACK REVIEW REPORT` section satisfies this check. -3. Confirm the report contains: a Runs / Status / Findings table, a VERDICT - line, and absorbs CODEX / CROSS-MODEL / UNRESOLVED lines if applicable. -4. If a plan file is in context for this skill invocation: confirm +3. Confirm the report has a Runs / Status / Findings table and a VERDICT line + (CODEX / CROSS-MODEL absorbed if applicable). +4. Confirm the report's FINAL non-whitespace line is the unresolved-decisions + status: the exact unbolded `NO UNRESOLVED DECISIONS`, or a bullet of a final + `**UNRESOLVED DECISIONS:**` block. BLOCKING, no "if applicable" escape — a + bolded sentinel, any trailing CODEX/CROSS-MODEL/VERDICT/prose, or a missing + status each FAILS the gate. +5. If a plan file is in context for this skill invocation: confirm `gstack-review-log` was called and `gstack-review-read` was run at least once. If no plan file is in context (e.g. `/codex consult` against a - diff with no plan), this check short-circuits — checks 1-3 already + diff with no plan), this check short-circuits — checks 1-4 already short-circuit when no plan file exists. 
Failing this gate and calling ExitPlanMode anyway is a contract violation — diff --git a/plan-design-review/sections/review-sections.md b/plan-design-review/sections/review-sections.md index 0d641198d..fde4b79f9 100644 --- a/plan-design-review/sections/review-sections.md +++ b/plan-design-review/sections/review-sections.md @@ -458,14 +458,24 @@ Produce this markdown table: | DX Review | \`/plan-devex-review\` | Developer experience gaps | {runs} | {status} | {findings} | \`\`\` -Below the table, add these lines (omit any that are empty/not applicable): +Below the table, add these lines. **CODEX** and **CROSS-MODEL** are optional (omit when +empty); **VERDICT** is always present: - **CODEX:** (only if codex-review ran) — one-line summary of codex fixes - **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis -- **UNRESOLVED:** total unresolved decisions across all reviews - **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement"). If Eng Review is not CLEAR and not skipped globally, append "eng review required". +**Unresolved-decisions status (MANDATORY — never omitted; the report's final non-whitespace +line).** After VERDICT, end the report (content under the \`## GSTACK REVIEW REPORT\` +heading — a bold label, never a new \`## \` heading; exempt from the "omit when empty" +rule) with exactly one: the exact unbolded line \`NO UNRESOLVED DECISIONS\` (a bolded one +does NOT count), OR a \`**UNRESOLVED DECISIONS:**\` header + one bullet per open item +(last bullet = final line; add \`+ N unresolved from prior reviews\` only when N > 0). +This avoids double-counting: list THIS review's open items from context; for prior reviews +sum \`unresolved\` over the latest fresh row per skill (dashboard 7-day window) after you +DROP the current skill's row; emit the sentinel only when both are zero. 
+ ### Write to the plan file **PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one diff --git a/plan-devex-review/SKILL.md b/plan-devex-review/SKILL.md index 0fafac7f9..7f75f1023 100644 --- a/plan-devex-review/SKILL.md +++ b/plan-devex-review/SKILL.md @@ -1397,12 +1397,17 @@ missing work — do NOT call ExitPlanMode: In-body prose that mentions "outside voice", "codex findings", or similar does NOT count — only the structured `## GSTACK REVIEW REPORT` section satisfies this check. -3. Confirm the report contains: a Runs / Status / Findings table, a VERDICT - line, and absorbs CODEX / CROSS-MODEL / UNRESOLVED lines if applicable. -4. If a plan file is in context for this skill invocation: confirm +3. Confirm the report has a Runs / Status / Findings table and a VERDICT line + (CODEX / CROSS-MODEL absorbed if applicable). +4. Confirm the report's FINAL non-whitespace line is the unresolved-decisions + status: the exact unbolded `NO UNRESOLVED DECISIONS`, or a bullet of a final + `**UNRESOLVED DECISIONS:**` block. BLOCKING, no "if applicable" escape — a + bolded sentinel, any trailing CODEX/CROSS-MODEL/VERDICT/prose, or a missing + status each FAILS the gate. +5. If a plan file is in context for this skill invocation: confirm `gstack-review-log` was called and `gstack-review-read` was run at least once. If no plan file is in context (e.g. `/codex consult` against a - diff with no plan), this check short-circuits — checks 1-3 already + diff with no plan), this check short-circuits — checks 1-4 already short-circuit when no plan file exists. 
Failing this gate and calling ExitPlanMode anyway is a contract violation — diff --git a/plan-devex-review/sections/review-sections.md b/plan-devex-review/sections/review-sections.md index 0e94ceb62..db1be2a96 100644 --- a/plan-devex-review/sections/review-sections.md +++ b/plan-devex-review/sections/review-sections.md @@ -576,6 +576,17 @@ this run (an empty file means "ran, no findings" — distinct from "didn't run") ### Unresolved Decisions If any AskUserQuestion goes unanswered, note here. Never silently default. +## Review Log + +Persist after the DX Scorecard — the dashboard, the GSTACK REVIEW REPORT, and the EXIT +PLAN MODE GATE's "review log was called" check depend on it. **PLAN MODE EXCEPTION — ALWAYS RUN** (writes to `~/.gstack/`, not project files): + +```bash +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-devex-review","timestamp":"TIMESTAMP","status":"STATUS","initial_score":N,"overall_score":N,"product_type":"PRODUCT_TYPE","tthw_current":"TTHW_CURRENT","tthw_target":"TTHW_TARGET","mode":"MODE","persona":"PERSONA","competitive_tier":"COMPETITIVE_TIER","unresolved":N,"commit":"COMMIT"}' +``` + +TIMESTAMP = current ISO 8601 datetime; STATUS = "clean" if score 8+ AND 0 unresolved, else "issues_open"; other fields from the DX Scorecard + Step 0; COMMIT = `git rev-parse --short HEAD`. + ## Review Readiness Dashboard After completing the review, read the review log and config to display the dashboard. @@ -675,14 +686,24 @@ Produce this markdown table: | DX Review | \`/plan-devex-review\` | Developer experience gaps | {runs} | {status} | {findings} | \`\`\` -Below the table, add these lines (omit any that are empty/not applicable): +Below the table, add these lines. 
**CODEX** and **CROSS-MODEL** are optional (omit when +empty); **VERDICT** is always present: - **CODEX:** (only if codex-review ran) — one-line summary of codex fixes - **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis -- **UNRESOLVED:** total unresolved decisions across all reviews - **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement"). If Eng Review is not CLEAR and not skipped globally, append "eng review required". +**Unresolved-decisions status (MANDATORY — never omitted; the report's final non-whitespace +line).** After VERDICT, end the report (content under the \`## GSTACK REVIEW REPORT\` +heading — a bold label, never a new \`## \` heading; exempt from the "omit when empty" +rule) with exactly one: the exact unbolded line \`NO UNRESOLVED DECISIONS\` (a bolded one +does NOT count), OR a \`**UNRESOLVED DECISIONS:**\` header + one bullet per open item +(last bullet = final line; add \`+ N unresolved from prior reviews\` only when N > 0). +This avoids double-counting: list THIS review's open items from context; for prior reviews +sum \`unresolved\` over the latest fresh row per skill (dashboard 7-day window) after you +DROP the current skill's row; emit the sentinel only when both are zero. + ### Write to the plan file **PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one diff --git a/plan-devex-review/sections/review-sections.md.tmpl b/plan-devex-review/sections/review-sections.md.tmpl index e1505f6c1..eca5dbcca 100644 --- a/plan-devex-review/sections/review-sections.md.tmpl +++ b/plan-devex-review/sections/review-sections.md.tmpl @@ -334,6 +334,17 @@ DX IMPLEMENTATION CHECKLIST ### Unresolved Decisions If any AskUserQuestion goes unanswered, note here. Never silently default. +## Review Log + +Persist after the DX Scorecard — the dashboard, the GSTACK REVIEW REPORT, and the EXIT +PLAN MODE GATE's "review log was called" check depend on it. 
**PLAN MODE EXCEPTION — ALWAYS RUN** (writes to `~/.gstack/`, not project files): + +```bash +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-devex-review","timestamp":"TIMESTAMP","status":"STATUS","initial_score":N,"overall_score":N,"product_type":"PRODUCT_TYPE","tthw_current":"TTHW_CURRENT","tthw_target":"TTHW_TARGET","mode":"MODE","persona":"PERSONA","competitive_tier":"COMPETITIVE_TIER","unresolved":N,"commit":"COMMIT"}' +``` + +TIMESTAMP = current ISO 8601 datetime; STATUS = "clean" if score 8+ AND 0 unresolved, else "issues_open"; other fields from the DX Scorecard + Step 0; COMMIT = `git rev-parse --short HEAD`. + {{REVIEW_DASHBOARD}} {{PLAN_FILE_REVIEW_REPORT}} diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md index c31394e2b..58c5cc9c4 100644 --- a/plan-eng-review/SKILL.md +++ b/plan-eng-review/SKILL.md @@ -969,12 +969,17 @@ missing work — do NOT call ExitPlanMode: In-body prose that mentions "outside voice", "codex findings", or similar does NOT count — only the structured `## GSTACK REVIEW REPORT` section satisfies this check. -3. Confirm the report contains: a Runs / Status / Findings table, a VERDICT - line, and absorbs CODEX / CROSS-MODEL / UNRESOLVED lines if applicable. -4. If a plan file is in context for this skill invocation: confirm +3. Confirm the report has a Runs / Status / Findings table and a VERDICT line + (CODEX / CROSS-MODEL absorbed if applicable). +4. Confirm the report's FINAL non-whitespace line is the unresolved-decisions + status: the exact unbolded `NO UNRESOLVED DECISIONS`, or a bullet of a final + `**UNRESOLVED DECISIONS:**` block. BLOCKING, no "if applicable" escape — a + bolded sentinel, any trailing CODEX/CROSS-MODEL/VERDICT/prose, or a missing + status each FAILS the gate. +5. If a plan file is in context for this skill invocation: confirm `gstack-review-log` was called and `gstack-review-read` was run at least once. If no plan file is in context (e.g. 
`/codex consult` against a - diff with no plan), this check short-circuits — checks 1-3 already + diff with no plan), this check short-circuits — checks 1-4 already short-circuit when no plan file exists. Failing this gate and calling ExitPlanMode anyway is a contract violation — diff --git a/plan-eng-review/sections/review-sections.md b/plan-eng-review/sections/review-sections.md index 43125b0af..cd677ab3c 100644 --- a/plan-eng-review/sections/review-sections.md +++ b/plan-eng-review/sections/review-sections.md @@ -766,14 +766,24 @@ Produce this markdown table: | DX Review | \`/plan-devex-review\` | Developer experience gaps | {runs} | {status} | {findings} | \`\`\` -Below the table, add these lines (omit any that are empty/not applicable): +Below the table, add these lines. **CODEX** and **CROSS-MODEL** are optional (omit when +empty); **VERDICT** is always present: - **CODEX:** (only if codex-review ran) — one-line summary of codex fixes - **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis -- **UNRESOLVED:** total unresolved decisions across all reviews - **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement"). If Eng Review is not CLEAR and not skipped globally, append "eng review required". +**Unresolved-decisions status (MANDATORY — never omitted; the report's final non-whitespace +line).** After VERDICT, end the report (content under the \`## GSTACK REVIEW REPORT\` +heading — a bold label, never a new \`## \` heading; exempt from the "omit when empty" +rule) with exactly one: the exact unbolded line \`NO UNRESOLVED DECISIONS\` (a bolded one +does NOT count), OR a \`**UNRESOLVED DECISIONS:**\` header + one bullet per open item +(last bullet = final line; add \`+ N unresolved from prior reviews\` only when N > 0). 
+This avoids double-counting: list THIS review's open items from context; for prior reviews +sum \`unresolved\` over the latest fresh row per skill (dashboard 7-day window) after you +DROP the current skill's row; emit the sentinel only when both are zero. + ### Write to the plan file **PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one diff --git a/scripts/resolvers/review.ts b/scripts/resolvers/review.ts index 9b82b8d8b..6b8546275 100644 --- a/scripts/resolvers/review.ts +++ b/scripts/resolvers/review.ts @@ -119,14 +119,24 @@ Produce this markdown table: | DX Review | \\\`/plan-devex-review\\\` | Developer experience gaps | {runs} | {status} | {findings} | \\\`\\\`\\\` -Below the table, add these lines (omit any that are empty/not applicable): +Below the table, add these lines. **CODEX** and **CROSS-MODEL** are optional (omit when +empty); **VERDICT** is always present: - **CODEX:** (only if codex-review ran) — one-line summary of codex fixes - **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis -- **UNRESOLVED:** total unresolved decisions across all reviews - **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement"). If Eng Review is not CLEAR and not skipped globally, append "eng review required". +**Unresolved-decisions status (MANDATORY — never omitted; the report's final non-whitespace +line).** After VERDICT, end the report (content under the \\\`## GSTACK REVIEW REPORT\\\` +heading — a bold label, never a new \\\`## \\\` heading; exempt from the "omit when empty" +rule) with exactly one: the exact unbolded line \\\`NO UNRESOLVED DECISIONS\\\` (a bolded one +does NOT count), OR a \\\`**UNRESOLVED DECISIONS:**\\\` header + one bullet per open item +(last bullet = final line; add \\\`+ N unresolved from prior reviews\\\` only when N > 0). 
+This avoids double-counting: list THIS review's open items from context; for prior reviews +sum \\\`unresolved\\\` over the latest fresh row per skill (dashboard 7-day window) after you +DROP the current skill's row; emit the sentinel only when both are zero. + ### Write to the plan file **PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one @@ -169,12 +179,17 @@ missing work — do NOT call ExitPlanMode: In-body prose that mentions "outside voice", "codex findings", or similar does NOT count — only the structured \`## GSTACK REVIEW REPORT\` section satisfies this check. -3. Confirm the report contains: a Runs / Status / Findings table, a VERDICT - line, and absorbs CODEX / CROSS-MODEL / UNRESOLVED lines if applicable. -4. If a plan file is in context for this skill invocation: confirm +3. Confirm the report has a Runs / Status / Findings table and a VERDICT line + (CODEX / CROSS-MODEL absorbed if applicable). +4. Confirm the report's FINAL non-whitespace line is the unresolved-decisions + status: the exact unbolded \`NO UNRESOLVED DECISIONS\`, or a bullet of a final + \`**UNRESOLVED DECISIONS:**\` block. BLOCKING, no "if applicable" escape — a + bolded sentinel, any trailing CODEX/CROSS-MODEL/VERDICT/prose, or a missing + status each FAILS the gate. +5. If a plan file is in context for this skill invocation: confirm \`gstack-review-log\` was called and \`gstack-review-read\` was run at least once. If no plan file is in context (e.g. \`/codex consult\` against a - diff with no plan), this check short-circuits — checks 1-3 already + diff with no plan), this check short-circuits — checks 1-4 already short-circuit when no plan file exists. 
Failing this gate and calling ExitPlanMode anyway is a contract violation — diff --git a/test/fixtures/parity-baseline-v1.57.7.0.json b/test/fixtures/parity-baseline-v1.57.7.0.json new file mode 100644 index 000000000..dab983329 --- /dev/null +++ b/test/fixtures/parity-baseline-v1.57.7.0.json @@ -0,0 +1,633 @@ +{ + "tag": "v1.57.7.0", + "capturedAt": "2026-05-30T18:00:56.209Z", + "capturedFromCommit": "49035bdd", + "capturedFromBranch": "garrytan/plan-flag-unresolved-issues", + "totalSkills": 52, + "totalCorpusBytes": 3359373, + "estTotalCatalogTokens": 4116, + "topHeaviest": [ + { + "skill": "ship", + "skillMdBytes": 174407, + "skillMdLines": 3137, + "estTokens": 43602, + "tmplBytes": 53240, + "descriptionLen": 291, + "hasGateEval": true, + "hasPeriodicEval": true + }, + { + "skill": "plan-ceo-review", + "skillMdBytes": 144411, + "skillMdLines": 2349, + "estTokens": 36103, + "tmplBytes": 63461, + "descriptionLen": 794, + "hasGateEval": true, + "hasPeriodicEval": true + }, + { + "skill": "office-hours", + "skillMdBytes": 123037, + "skillMdLines": 2200, + "estTokens": 30759, + "tmplBytes": 55534, + "descriptionLen": 860, + "hasGateEval": true, + "hasPeriodicEval": false + }, + { + "skill": "plan-design-review", + "skillMdBytes": 118532, + "skillMdLines": 2073, + "estTokens": 29633, + "tmplBytes": 28717, + "descriptionLen": 218, + "hasGateEval": true, + "hasPeriodicEval": true + }, + { + "skill": "plan-devex-review", + "skillMdBytes": 117907, + "skillMdLines": 2277, + "estTokens": 29477, + "tmplBytes": 35773, + "descriptionLen": 250, + "hasGateEval": true, + "hasPeriodicEval": true + }, + { + "skill": "spec", + "skillMdBytes": 117382, + "skillMdLines": 2276, + "estTokens": 29346, + "tmplBytes": 30590, + "descriptionLen": 282, + "hasGateEval": true, + "hasPeriodicEval": false + }, + { + "skill": "plan-eng-review", + "skillMdBytes": 114209, + "skillMdLines": 1906, + "estTokens": 28552, + "tmplBytes": 26302, + "descriptionLen": 231, + "hasGateEval": true, + 
"hasPeriodicEval": true + }, + { + "skill": "design-review", + "skillMdBytes": 100149, + "skillMdLines": 1953, + "estTokens": 25037, + "tmplBytes": 11674, + "descriptionLen": 304, + "hasGateEval": true, + "hasPeriodicEval": false + }, + { + "skill": "review", + "skillMdBytes": 99573, + "skillMdLines": 1787, + "estTokens": 24893, + "tmplBytes": 14099, + "descriptionLen": 205, + "hasGateEval": true, + "hasPeriodicEval": false + }, + { + "skill": "land-and-deploy", + "skillMdBytes": 96379, + "skillMdLines": 1877, + "estTokens": 24095, + "tmplBytes": 48624, + "descriptionLen": 160, + "hasGateEval": true, + "hasPeriodicEval": false + } + ], + "skills": { + "autoplan": { + "skill": "autoplan", + "skillMdBytes": 95365, + "skillMdLines": 1805, + "estTokens": 23841, + "tmplBytes": 45271, + "descriptionLen": 366, + "hasGateEval": true, + "hasPeriodicEval": true + }, + "benchmark": { + "skill": "benchmark", + "skillMdBytes": 33646, + "skillMdLines": 750, + "estTokens": 8412, + "tmplBytes": 9378, + "descriptionLen": 213, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "benchmark-models": { + "skill": "benchmark-models", + "skillMdBytes": 29713, + "skillMdLines": 625, + "estTokens": 7428, + "tmplBytes": 6631, + "descriptionLen": 217, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "browse": { + "skill": "browse", + "skillMdBytes": 48531, + "skillMdLines": 933, + "estTokens": 12133, + "tmplBytes": 10805, + "descriptionLen": 181, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "canary": { + "skill": "canary", + "skillMdBytes": 51598, + "skillMdLines": 1011, + "estTokens": 12900, + "tmplBytes": 8033, + "descriptionLen": 180, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "careful": { + "skill": "careful", + "skillMdBytes": 2567, + "skillMdLines": 68, + "estTokens": 642, + "tmplBytes": 2435, + "descriptionLen": 315, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "codex": { + "skill": "codex", + "skillMdBytes": 85212, + 
"skillMdLines": 1555, + "estTokens": 21303, + "tmplBytes": 34143, + "descriptionLen": 187, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "context-restore": { + "skill": "context-restore", + "skillMdBytes": 45986, + "skillMdLines": 869, + "estTokens": 11497, + "tmplBytes": 5255, + "descriptionLen": 238, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "context-save": { + "skill": "context-save", + "skillMdBytes": 50183, + "skillMdLines": 987, + "estTokens": 12546, + "tmplBytes": 9293, + "descriptionLen": 168, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "cso": { + "skill": "cso", + "skillMdBytes": 83808, + "skillMdLines": 1498, + "estTokens": 20952, + "tmplBytes": 35646, + "descriptionLen": 196, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "design-consultation": { + "skill": "design-consultation", + "skillMdBytes": 84683, + "skillMdLines": 1598, + "estTokens": 21171, + "tmplBytes": 25899, + "descriptionLen": 888, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "design-html": { + "skill": "design-html", + "skillMdBytes": 71042, + "skillMdLines": 1470, + "estTokens": 17761, + "tmplBytes": 22567, + "descriptionLen": 233, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "design-review": { + "skill": "design-review", + "skillMdBytes": 100149, + "skillMdLines": 1953, + "estTokens": 25037, + "tmplBytes": 11674, + "descriptionLen": 304, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "design-shotgun": { + "skill": "design-shotgun", + "skillMdBytes": 67331, + "skillMdLines": 1332, + "estTokens": 16833, + "tmplBytes": 13331, + "descriptionLen": 786, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "devex-review": { + "skill": "devex-review", + "skillMdBytes": 69681, + "skillMdLines": 1264, + "estTokens": 17420, + "tmplBytes": 7984, + "descriptionLen": 201, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "document-generate": { + "skill": "document-generate", + "skillMdBytes": 58327, + 
"skillMdLines": 1211, + "estTokens": 14582, + "tmplBytes": 15939, + "descriptionLen": 334, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "document-release": { + "skill": "document-release", + "skillMdBytes": 64403, + "skillMdLines": 1281, + "estTokens": 16101, + "tmplBytes": 20974, + "descriptionLen": 192, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "freeze": { + "skill": "freeze", + "skillMdBytes": 3184, + "skillMdLines": 92, + "estTokens": 796, + "tmplBytes": 3038, + "descriptionLen": 503, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "gstack-upgrade": { + "skill": "gstack-upgrade", + "skillMdBytes": 10817, + "skillMdLines": 285, + "estTokens": 2704, + "tmplBytes": 10667, + "descriptionLen": 163, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "guard": { + "skill": "guard", + "skillMdBytes": 3314, + "skillMdLines": 91, + "estTokens": 829, + "tmplBytes": 3181, + "descriptionLen": 686, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "health": { + "skill": "health", + "skillMdBytes": 52409, + "skillMdLines": 1035, + "estTokens": 13102, + "tmplBytes": 11617, + "descriptionLen": 184, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "investigate": { + "skill": "investigate", + "skillMdBytes": 54902, + "skillMdLines": 1033, + "estTokens": 13726, + "tmplBytes": 11561, + "descriptionLen": 1379, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "ios-clean": { + "skill": "ios-clean", + "skillMdBytes": 45540, + "skillMdLines": 834, + "estTokens": 11385, + "tmplBytes": 3851, + "descriptionLen": 252, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "ios-design-review": { + "skill": "ios-design-review", + "skillMdBytes": 46124, + "skillMdLines": 836, + "estTokens": 11531, + "tmplBytes": 4417, + "descriptionLen": 209, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "ios-fix": { + "skill": "ios-fix", + "skillMdBytes": 45253, + "skillMdLines": 832, + "estTokens": 11313, + "tmplBytes": 3574, + 
"descriptionLen": 187, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "ios-qa": { + "skill": "ios-qa", + "skillMdBytes": 51764, + "skillMdLines": 952, + "estTokens": 12941, + "tmplBytes": 10090, + "descriptionLen": 223, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "ios-sync": { + "skill": "ios-sync", + "skillMdBytes": 45230, + "skillMdLines": 825, + "estTokens": 11308, + "tmplBytes": 3544, + "descriptionLen": 269, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "land-and-deploy": { + "skill": "land-and-deploy", + "skillMdBytes": 96379, + "skillMdLines": 1877, + "estTokens": 24095, + "tmplBytes": 48624, + "descriptionLen": 160, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "landing-report": { + "skill": "landing-report", + "skillMdBytes": 48478, + "skillMdLines": 895, + "estTokens": 12120, + "tmplBytes": 6806, + "descriptionLen": 195, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "learn": { + "skill": "learn", + "skillMdBytes": 46215, + "skillMdLines": 912, + "estTokens": 11554, + "tmplBytes": 5594, + "descriptionLen": 178, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "make-pdf": { + "skill": "make-pdf", + "skillMdBytes": 30270, + "skillMdLines": 673, + "estTokens": 7568, + "tmplBytes": 5546, + "descriptionLen": 177, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "office-hours": { + "skill": "office-hours", + "skillMdBytes": 123037, + "skillMdLines": 2200, + "estTokens": 30759, + "tmplBytes": 55534, + "descriptionLen": 860, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "open-gstack-browser": { + "skill": "open-gstack-browser", + "skillMdBytes": 50624, + "skillMdLines": 975, + "estTokens": 12656, + "tmplBytes": 7702, + "descriptionLen": 204, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "pair-agent": { + "skill": "pair-agent", + "skillMdBytes": 51432, + "skillMdLines": 1031, + "estTokens": 12858, + "tmplBytes": 8548, + "descriptionLen": 167, + "hasGateEval": false, + 
"hasPeriodicEval": false + }, + "plan-ceo-review": { + "skill": "plan-ceo-review", + "skillMdBytes": 144411, + "skillMdLines": 2349, + "estTokens": 36103, + "tmplBytes": 63461, + "descriptionLen": 794, + "hasGateEval": true, + "hasPeriodicEval": true + }, + "plan-design-review": { + "skill": "plan-design-review", + "skillMdBytes": 118532, + "skillMdLines": 2073, + "estTokens": 29633, + "tmplBytes": 28717, + "descriptionLen": 218, + "hasGateEval": true, + "hasPeriodicEval": true + }, + "plan-devex-review": { + "skill": "plan-devex-review", + "skillMdBytes": 117907, + "skillMdLines": 2277, + "estTokens": 29477, + "tmplBytes": 35773, + "descriptionLen": 250, + "hasGateEval": true, + "hasPeriodicEval": true + }, + "plan-eng-review": { + "skill": "plan-eng-review", + "skillMdBytes": 114209, + "skillMdLines": 1906, + "estTokens": 28552, + "tmplBytes": 26302, + "descriptionLen": 231, + "hasGateEval": true, + "hasPeriodicEval": true + }, + "plan-tune": { + "skill": "plan-tune", + "skillMdBytes": 67548, + "skillMdLines": 1372, + "estTokens": 16887, + "tmplBytes": 26922, + "descriptionLen": 325, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "qa": { + "skill": "qa", + "skillMdBytes": 78356, + "skillMdLines": 1643, + "estTokens": 19589, + "tmplBytes": 12701, + "descriptionLen": 218, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "qa-only": { + "skill": "qa-only", + "skillMdBytes": 60914, + "skillMdLines": 1215, + "estTokens": 15229, + "tmplBytes": 3851, + "descriptionLen": 165, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "retro": { + "skill": "retro", + "skillMdBytes": 87382, + "skillMdLines": 1771, + "estTokens": 21846, + "tmplBytes": 42427, + "descriptionLen": 648, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "review": { + "skill": "review", + "skillMdBytes": 99573, + "skillMdLines": 1787, + "estTokens": 24893, + "tmplBytes": 14099, + "descriptionLen": 205, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "scrape": { + 
"skill": "scrape", + "skillMdBytes": 48134, + "skillMdLines": 908, + "estTokens": 12034, + "tmplBytes": 5220, + "descriptionLen": 167, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "setup-browser-cookies": { + "skill": "setup-browser-cookies", + "skillMdBytes": 26998, + "skillMdLines": 597, + "estTokens": 6750, + "tmplBytes": 2724, + "descriptionLen": 222, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "setup-deploy": { + "skill": "setup-deploy", + "skillMdBytes": 48420, + "skillMdLines": 940, + "estTokens": 12105, + "tmplBytes": 7780, + "descriptionLen": 197, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "setup-gbrain": { + "skill": "setup-gbrain", + "skillMdBytes": 85495, + "skillMdLines": 1794, + "estTokens": 21374, + "tmplBytes": 44851, + "descriptionLen": 323, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "ship": { + "skill": "ship", + "skillMdBytes": 174407, + "skillMdLines": 3137, + "estTokens": 43602, + "tmplBytes": 53240, + "descriptionLen": 291, + "hasGateEval": true, + "hasPeriodicEval": true + }, + "skillify": { + "skill": "skillify", + "skillMdBytes": 58027, + "skillMdLines": 1189, + "estTokens": 14507, + "tmplBytes": 15107, + "descriptionLen": 233, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "spec": { + "skill": "spec", + "skillMdBytes": 117382, + "skillMdLines": 2276, + "estTokens": 29346, + "tmplBytes": 30590, + "descriptionLen": 282, + "hasGateEval": true, + "hasPeriodicEval": false + }, + "sync-gbrain": { + "skill": "sync-gbrain", + "skillMdBytes": 62977, + "skillMdLines": 1191, + "estTokens": 15744, + "tmplBytes": 16077, + "descriptionLen": 299, + "hasGateEval": false, + "hasPeriodicEval": false + }, + "unfreeze": { + "skill": "unfreeze", + "skillMdBytes": 1504, + "skillMdLines": 49, + "estTokens": 376, + "tmplBytes": 1386, + "descriptionLen": 199, + "hasGateEval": false, + "hasPeriodicEval": false + } + } +} diff --git a/test/gen-skill-docs.test.ts b/test/gen-skill-docs.test.ts index 
24f337f3d..431209a7f 100644 --- a/test/gen-skill-docs.test.ts +++ b/test/gen-skill-docs.test.ts @@ -3239,3 +3239,62 @@ describe('EXIT PLAN MODE GATE placement', () => { expect(codex).toContain('Failing this gate and calling ExitPlanMode anyway is a contract violation'); }); }); + +describe('GSTACK REVIEW REPORT mandatory unresolved-decisions status', () => { + // Report text rides in PLAN_FILE_REVIEW_REPORT → every report consumer gets it. + // devex-review is a report consumer but NOT a gate consumer, so the two target + // sets differ (CP5/CX5). Regression guard: a future token-cut that drops the + // unresolved-status line again fails here. See plan-flag-unresolved-issues. + const REPORT_CONSUMERS = [ + 'plan-ceo-review', + 'plan-eng-review', + 'plan-design-review', + 'plan-devex-review', + 'codex', + 'devex-review', + ]; + // Gate text rides in EXIT_PLAN_MODE_GATE (lives in SKILL.md, not sections). + const GATE_SKILLS = [ + 'plan-ceo-review', + 'plan-eng-review', + 'plan-design-review', + 'plan-devex-review', + 'codex', + ]; + + for (const skill of REPORT_CONSUMERS) { + test(`${skill}: report mandates the unresolved-decisions status as final content`, () => { + const content = readSkillUnion(skill); + expect(content).toContain('NO UNRESOLVED DECISIONS'); + // The "never omit / always final" contract must be present, not just the phrase. + expect(content).toContain('Unresolved-decisions status (MANDATORY'); + expect(content).toMatch(/never omitted/); + // \s+ tolerates prose line-wraps within "final non-whitespace line". + expect(content).toMatch(/final\s+non-whitespace\s+line/); + }); + } + + for (const skill of GATE_SKILLS) { + test(`${skill}: exit gate blocks unless the unresolved status is the final line`, () => { + const md = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8'); + // Gate check #4 — present, sentinel named, and explicitly blocking (no escape). 
+ expect(md).toContain('NO UNRESOLVED DECISIONS'); + expect(md).toContain('FINAL non-whitespace line is the unresolved-decisions'); + expect(md).toContain('FAILS the gate'); + }); + } + + test('scripts/resolvers/review.ts source carries the mandatory block + blocking gate', () => { + const src = fs.readFileSync(path.join(ROOT, 'scripts', 'resolvers', 'review.ts'), 'utf-8'); + // Report resolver: mandatory, never-omitted, exact sentinel, anti-double-count algorithm. + expect(src).toContain('Unresolved-decisions status (MANDATORY'); + expect(src).toContain('NO UNRESOLVED DECISIONS'); + expect(src).toContain('avoids double-counting'); + expect(src).toContain('DROP the current skill'); + // Gate resolver: the blocking final-line check with no "if applicable" escape. + expect(src).toContain('FINAL non-whitespace line is the unresolved-decisions'); + expect(src).toContain('FAILS the gate'); + // The old soft wording must be gone from the gate. + expect(src).not.toContain('absorbs CODEX / CROSS-MODEL / UNRESOLVED lines if applicable'); + }); +}); diff --git a/test/parity-suite.test.ts b/test/parity-suite.test.ts index 32ce49f12..bc85bf23f 100644 --- a/test/parity-suite.test.ts +++ b/test/parity-suite.test.ts @@ -2,15 +2,19 @@ * Cathedral parity suite — gate-tier (free, structural + content checks). * * Runs every PARITY_INVARIANTS check against the current SKILL.md output - * vs the v1.53.0.0 baseline. Failures get an actionable, per-skill report + * vs the v1.57.7.0 baseline. Failures get an actionable, per-skill report * showing missing phrases, missing headings, and size ratios. * - * Baseline rebased v1.44.1 → v1.53.0.0: the brain-aware-planning releases - * (v1.49–v1.52) plus the v1.53 redaction guard pushed five planning skills - * past the 5% ratchet on the frozen v1.44.1 anchor. Rebasing absorbs that - * legitimate growth at HEAD while keeping the per-skill 1.05 ratio so future - * bloat is still caught. 
Historical v1.44.1 / v1.46.0.0 / v1.47.0.0 baselines - * are retained in test/fixtures/ for the v1→v2 audit trail. + * Baseline rebased v1.53.0.0 → v1.57.7.0: the v1.54–v1.57 releases (ship/plan + * carving, carve-guards, AUQ prose fallback, the cross-session decision-log + * preamble) plus the mandatory unresolved-decisions status added to every + * GSTACK REVIEW REPORT pushed the three plan-review skills past the 5% ratchet + * on the v1.53 anchor even after exhaustive compression. The v1.57.7.0 baseline + * captures current UNION sizes (skeleton + sections/*.md, matching what the + * harness measures) so the per-skill 1.05 ratio still catches future bloat. + * Earlier rebase v1.44.1 → v1.53.0.0: brain-aware-planning (v1.49–v1.52) + the + * v1.53 redaction guard. Historical v1.44.1 / v1.46.0.0 / v1.47.0.0 / v1.53.0.0 + * baselines are retained in test/fixtures/ for the audit trail. * * Periodic-tier LLM-judge parity (paid) lands in Phase B (v2.0.0.0) * alongside the sections/ extraction. Plumbing is in parity-harness.ts. @@ -23,9 +27,9 @@ import { runParityChecks, PARITY_INVARIANTS } from './helpers/parity-harness'; import type { ParityBaseline } from './helpers/capture-parity-baseline'; const REPO_ROOT = path.resolve(import.meta.dir, '..'); -const BASELINE_PATH = path.join(REPO_ROOT, 'test', 'fixtures', 'parity-baseline-v1.53.0.0.json'); +const BASELINE_PATH = path.join(REPO_ROOT, 'test', 'fixtures', 'parity-baseline-v1.57.7.0.json'); -describe('parity suite vs v1.53.0.0 baseline (gate, free)', () => { +describe('parity suite vs v1.57.7.0 baseline (gate, free)', () => { test('baseline exists', () => { expect(fs.existsSync(BASELINE_PATH)).toBe(true); }); diff --git a/test/skill-e2e-plan.test.ts b/test/skill-e2e-plan.test.ts index 98fded4bb..27e4d74d8 100644 --- a/test/skill-e2e-plan.test.ts +++ b/test/skill-e2e-plan.test.ts @@ -692,7 +692,7 @@ Read plan.md — that's the plan to review. This is a standalone plan document, Proceed directly to the full review. 
Skip any AskUserQuestion calls — this is non-interactive. Skip the preamble bash block, lake intro, telemetry, and contributor mode sections. -CRITICAL REQUIREMENT: plan.md IS the plan file for this review session. After completing your review, you MUST write a "## GSTACK REVIEW REPORT" section to the END of plan.md, exactly as described in the "Plan File Review Report" section of SKILL.md. If gstack-review-read is not available or returns NO_REVIEWS, write the placeholder table with all four review rows (CEO, Codex, Eng, Design). Use the Edit tool to append to plan.md — do NOT overwrite the existing plan content. +CRITICAL REQUIREMENT: plan.md IS the plan file for this review session. After completing your review, you MUST write a "## GSTACK REVIEW REPORT" section to the END of plan.md, exactly as described in the "Plan File Review Report" section of SKILL.md. If gstack-review-read is not available or returns NO_REVIEWS, write the placeholder table with all five review rows (CEO, Codex, Eng, Design, DX). The report MUST end with the mandatory unresolved-decisions status as its final line — the exact unbolded line NO UNRESOLVED DECISIONS when nothing is open, or a "**UNRESOLVED DECISIONS:**" block of bullets when items remain. Nothing may follow it. Use the Edit tool to append to plan.md — do NOT overwrite the existing plan content. 
This review report at the bottom of the plan is the MOST IMPORTANT deliverable of this test.`, workingDirectory: planDir, @@ -741,7 +741,24 @@ This review report at the bottom of the plan is the MOST IMPORTANT deliverable o expect(afterReport).toContain('Eng Review'); expect(afterReport).toContain('Design Review'); - console.log('Plan review report found at bottom of plan.md'); + // Mandatory unresolved-decisions status (plan-flag-unresolved-issues): the report's + // final non-whitespace line must be the unresolved status — the exact sentinel or a + // bullet of an UNRESOLVED DECISIONS block, with nothing (CODEX/CROSS-MODEL/VERDICT/ + // prose) after it. + expect(afterReport).toContain('UNRESOLVED DECISIONS'); + // Compute from afterReport (the report section to EOF), not the whole file, so a + // mid-file report surfaces the real trailing content in the failure message. + const nonEmpty = afterReport.split('\n').map(l => l.trim()).filter(l => l !== ''); + const lastLine = nonEmpty[nonEmpty.length - 1]; + const isSentinel = lastLine === 'NO UNRESOLVED DECISIONS'; + const isUnresolvedBullet = + /^[-*]\s+/.test(lastLine) && !/VERDICT/i.test(lastLine) && afterReport.includes('UNRESOLVED DECISIONS:'); + expect( + isSentinel || isUnresolvedBullet, + `report must end with the unresolved-decisions status; last line was: ${lastLine}`, + ).toBe(true); + + console.log('Plan review report found at bottom of plan.md (ends with unresolved status)'); }, 420_000); });
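The final-line check added to `test/skill-e2e-plan.test.ts` can be read in isolation as a small predicate. The sketch below is a minimal restatement of that assertion logic, assuming the report section has already been sliced out of the plan file as a string. The function name `endsWithUnresolvedStatus` is hypothetical and does not appear in the diff, and the sketch is not the gate's own implementation, which lives as prose in the generated SKILL.md.

```typescript
// Hypothetical helper (not in the diff): restates the E2E assertion as a predicate.
// Returns true when the report's final non-whitespace line is the unresolved status.
function endsWithUnresolvedStatus(report: string): boolean {
  // Trim every line and drop blank ones, so trailing whitespace is ignored.
  const nonEmpty = report
    .split('\n')
    .map((l) => l.trim())
    .filter((l) => l !== '');
  if (nonEmpty.length === 0) return false;

  const lastLine = nonEmpty[nonEmpty.length - 1];

  // Case 1: the exact, unbolded sentinel.
  const isSentinel = lastLine === 'NO UNRESOLVED DECISIONS';

  // Case 2: a bullet belonging to an UNRESOLVED DECISIONS block. As in the test,
  // a bullet that mentions VERDICT is rejected, and the block header must exist.
  const isUnresolvedBullet =
    /^[-*]\s+/.test(lastLine) &&
    !/VERDICT/i.test(lastLine) &&
    report.includes('UNRESOLVED DECISIONS:');

  return isSentinel || isUnresolvedBullet;
}

// Usage: a clean report passes, while trailing content after the sentinel fails.
const clean = '## GSTACK REVIEW REPORT\n\nNO UNRESOLVED DECISIONS\n';
const trailing = '## GSTACK REVIEW REPORT\n\nNO UNRESOLVED DECISIONS\nVERDICT: approved\n';
console.log(endsWithUnresolvedStatus(clean), endsWithUnresolvedStatus(trailing));
```

Like the test, this accepts any trailing bullet once the `UNRESOLVED DECISIONS:` header appears somewhere in the report. It does not verify that the bullet sits directly under that header, so it is a loose structural check rather than a parser.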