v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs (#1966)

* feat(config): make codex_reviews the master switch for all Codex review

Broaden the codex_reviews doc to describe it governing /review, /ship,
/document-release, plan reviews, and /autoplan. Reject invalid values on
set (preserving the existing value) so a typo can never silently flip
paid Codex calls on or off.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(review): Codex review default-on across review/ship/plan/docs

Add a shared codexPreflight() helper (constants.ts) that, in one bash
block, reads codex_reviews, sources gstack-codex-probe, checks install +
auth, and echoes a single canonical mode (ready/not_installed/not_authed/
disabled). All Codex resolvers route through it.

- generateCodexPlanReview: opt-in question removed; the outside voice now
  runs automatically (default-on), falling back to a Claude subagent when
  Codex is missing/unauthed. Cross-model tension still gates on user
  approval (sovereignty preserved).
- generateAdversarialStep: probe-based availability (install AND auth),
  distinct not-installed vs not-authed guidance; 200-line structured-review
  threshold unchanged.
- generateCodexDocReview (new, wired via CODEX_DOC_REVIEW): reviews the
  release's docs against the shipped diff range, informational + an explicit
  apply-fixes decision point, never auto-edits.
- autoplan Phase 0.5 now honors codex_reviews=disabled so the switch is
  truly global.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(docs): regenerate SKILL docs + refresh ship golden

Output of gen:skill-docs for the Codex-default-on resolver/template
changes. Refreshes the factory-ship golden fixture (codex-host output
unchanged — resolvers strip for the codex host).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(infra): widen size-budget guards for default-on Codex outside-voice

The codexPreflight() block + CODEX_MODE branch prose (replacing the
smaller opt-in question) grows plan-ceo/eng/devex-review and review by
5-7% over baseline. Each bump carries a comment justifying it as
intentional capability, not slop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: guard Codex default-on + config reject-on-set

skill-validation: assert plan reviews no longer carry the opt-in question
and render the default-on outside-voice, document-release carries the doc
review, and the codex host strips all of it.

gstack-config: codex_reviews defaults to enabled, accepts enabled/disabled,
and rejects an invalid value while preserving the existing one.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(test): align gstack-config tests with defaults-fallback behavior

Three tests (last touched v0.13.7.0) asserted get/list print empty for
unset keys, but gstack-config falls back to the documented defaults table
(get returns the default, list shows the active-values block). Update the
assertions to the real behavior and split out an unknown-key case that does
still return empty. Pre-existing red, unrelated to codex review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs

Codex cross-model review now runs by default on /review, /ship, all four
plan reviews, /document-release, and /autoplan, governed by one master
switch (codex_reviews, default enabled). Plan-review outside voice is
default-on; /document-release gets a new Codex doc-vs-diff audit; every
call site detects install AND auth and falls back to a Claude subagent
with a clear reason. Disable everything with:
gstack-config set codex_reviews disabled

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-06-10 21:14:58 -07:00
committed by GitHub
parent 8241949357
commit a5833c413f
21 changed files with 766 additions and 196 deletions
+20 -8
View File
@@ -145,6 +145,9 @@ export const CARVE_GUARDS: Record<string, CarveGuard> = {
maxSkeletonBytes: 90_000,
minUnionBytes: 80_000,
mustContain: ['SCOPE EXPANSION', 'SELECTIVE EXPANSION', 'HOLD SCOPE', 'SCOPE REDUCTION'],
// Default-on Codex outside-voice (codexPreflight block + CODEX_MODE branch
// prose replacing the smaller opt-in question) lands this ~5.2% over baseline.
maxSizeRatio: 1.08,
},
'plan-eng-review': {
skill: 'plan-eng-review',
@@ -162,9 +165,11 @@ export const CARVE_GUARDS: Record<string, CarveGuard> = {
minUnionBytes: 70_000,
mustContain: ['Architecture', 'Code Quality', 'Test', 'Performance'],
// Cross-cutting preamble growth (v1.57.2.0 AUQ-failure prose fallback + the
// decision-memory nudge + the v1.57.4.0 Boil-the-Ocean rename) lands this just
// over the strict 1.05; small headroom for the shared preamble additions.
maxSizeRatio: 1.06,
// decision-memory nudge + the v1.57.4.0 Boil-the-Ocean rename) plus the
// default-on Codex outside-voice (codexPreflight block + CODEX_MODE branch
// prose, replacing the smaller opt-in question) land this at ~6.6% over the
// v1.53.0.0 baseline. Headroom for those intentional additions.
maxSizeRatio: 1.08,
},
'plan-design-review': {
skill: 'plan-design-review',
@@ -197,6 +202,9 @@ export const CARVE_GUARDS: Record<string, CarveGuard> = {
maxSkeletonBytes: 76_000,
minUnionBytes: 70_000,
mustContain: ['developer experience', 'Getting Started'],
// Default-on Codex outside-voice (codexPreflight block + CODEX_MODE branch
// prose replacing the smaller opt-in question) lands this ~5.7% over baseline.
maxSizeRatio: 1.08,
},
'office-hours': {
skill: 'office-hours',
@@ -232,11 +240,15 @@ export const CARVE_GUARDS: Record<string, CarveGuard> = {
maxSkeletonBytes: 50_000,
minUnionBytes: 55_000,
mustContain: ['CHANGELOG', 'Diataxis', 'coverage'],
// The AUQ-failure prose fallback (v1.57.2.0) adds ~2KB to every skill's
// always-loaded preamble; on this small carved skeleton that lands at ~5.9%
// over the pre-carve/pre-AUQ v1.53.0.0 baseline. Headroom for the
// cross-cutting addition; all other skills keep the strict 1.05 ceiling.
maxSizeRatio: 1.08,
// Two intentional additions stack on this small skill: the AUQ-failure prose
// fallback (v1.57.2.0, ~2KB to every preamble) AND the new default-on Codex
// documentation-review section (codexPreflight + prompt + apply-gate, carved
// into release-body so the SKELETON stays under maxSkeletonBytes). On a ~55KB
// baseline that whole new capability is ~18.6% of union bytes. The doc review
// is a deliberate new feature, not preamble creep; the union ceiling is raised
// to match while the skeleton budget (50_000) still holds the always-loaded
// cost flat.
maxSizeRatio: 1.20,
},
'design-consultation': {
skill: 'design-consultation',
+5 -1
View File
@@ -210,7 +210,11 @@ const MONOLITH_INVARIANTS: ParityInvariant[] = [
skill: 'review',
mustContain: ['confidence', 'P1', 'P2'],
mustHaveHeadings: ['## Preamble', '## When to invoke'],
maxSizeRatio: 1.05,
// The adversarial step swapped its bare `command -v codex` check for the shared
// codexPreflight() block (install + auth tri-state + CODEX_MODE branch prose),
// landing ~6.3% over the v1.53.0.0 baseline. Intentional: it adds proper
// not-installed vs not-authed handling, not slop.
maxSizeRatio: 1.08,
minBytes: 70_000,
},
{