diff --git a/CHANGELOG.md b/CHANGELOG.md
index cfa6a53d..356ae97e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,21 @@
# Changelog
+## [0.14.3.0] - 2026-03-31 — Always-On Adversarial Review + Scope Drift + Plan Mode Design Tools
+
+Every code review now runs adversarial analysis from both Claude and Codex, regardless of diff size. A 5-line auth change gets the same cross-model scrutiny as a 500-line feature. The old "skip adversarial for small diffs" heuristic is gone... diff size was never a good proxy for risk.
+
+### Added
+
+- **Always-on adversarial review.** Every `/review` and `/ship` run now dispatches both a Claude adversarial subagent and a Codex adversarial challenge. No more tier-based skipping. The Codex structured review (formal P1 pass/fail gate) still runs on large diffs (200+ lines) where the formal gate adds value.
+- **Scope drift detection in `/ship`.** Before shipping, `/ship` now checks whether you built what you said you'd build, nothing more, nothing less. Catches scope creep ("while I was in there..." changes) and missing requirements. Results appear in the PR body.
+- **Plan Mode Safe Operations.** Browse screenshots, design mockups, Codex outside voices, and writing to `~/.gstack/` are now explicitly allowed in plan mode. Design-related skills (`/design-consultation`, `/design-shotgun`, `/design-html`, `/plan-design-review`) can generate visual artifacts during planning without fighting plan mode restrictions.
+
+### Changed
+
+- **Adversarial opt-out split.** The legacy `codex_reviews=disabled` config now only gates Codex passes. Claude adversarial subagent always runs since it's free and fast. Previously the kill switch disabled everything.
+- **Cross-model tension format.** Outside voice disagreements now include `RECOMMENDATION` and `Completeness` scores, matching the standard AskUserQuestion format used everywhere else in gstack.
+- **Scope drift is now a shared resolver.** Extracted from `/review` into `generateScopeDrift()` so both `/review` and `/ship` use the same logic. DRY.
+
## [0.14.2.0] - 2026-03-30 — Sidebar CSS Inspector + Per-Tab Agents
The sidebar is now a visual design tool. Pick any element on the page and see the full CSS rule cascade, box model, and computed styles right in the Side Panel. Edit styles live and see changes instantly. Each browser tab gets its own independent agent, so you can work on multiple pages simultaneously without cross-talk. Cleanup is LLM-powered... the agent snapshots the page, understands it semantically, and removes the junk while keeping the site's identity.
diff --git a/SKILL.md b/SKILL.md
index 9e6377f6..29b3110c 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -286,6 +286,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/VERSION b/VERSION
index 87df6b05..574c7c4a 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-0.14.2.0
+0.14.3.0
diff --git a/autoplan/SKILL.md b/autoplan/SKILL.md
index 2754cef0..31ae9ab2 100644
--- a/autoplan/SKILL.md
+++ b/autoplan/SKILL.md
@@ -378,6 +378,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/benchmark/SKILL.md b/benchmark/SKILL.md
index 8fe4bc94..aa6567df 100644
--- a/benchmark/SKILL.md
+++ b/benchmark/SKILL.md
@@ -288,6 +288,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/browse/SKILL.md b/browse/SKILL.md
index 9ad795bc..464c03ae 100644
--- a/browse/SKILL.md
+++ b/browse/SKILL.md
@@ -288,6 +288,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/canary/SKILL.md b/canary/SKILL.md
index 6197b3c4..48c6e0b9 100644
--- a/canary/SKILL.md
+++ b/canary/SKILL.md
@@ -353,6 +353,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/codex/SKILL.md b/codex/SKILL.md
index a3eb12cf..34c1b121 100644
--- a/codex/SKILL.md
+++ b/codex/SKILL.md
@@ -372,6 +372,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/connect-chrome/SKILL.md b/connect-chrome/SKILL.md
index c863b171..d1736f29 100644
--- a/connect-chrome/SKILL.md
+++ b/connect-chrome/SKILL.md
@@ -369,6 +369,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/cso/SKILL.md b/cso/SKILL.md
index 3945884c..ca79f223 100644
--- a/cso/SKILL.md
+++ b/cso/SKILL.md
@@ -357,6 +357,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/design-consultation/SKILL.md b/design-consultation/SKILL.md
index d1b38a11..ff6b030c 100644
--- a/design-consultation/SKILL.md
+++ b/design-consultation/SKILL.md
@@ -376,6 +376,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/design-html/SKILL.md b/design-html/SKILL.md
index d21750a5..24183b90 100644
--- a/design-html/SKILL.md
+++ b/design-html/SKILL.md
@@ -358,6 +358,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/design-review/SKILL.md b/design-review/SKILL.md
index c152835c..3be5a7c4 100644
--- a/design-review/SKILL.md
+++ b/design-review/SKILL.md
@@ -376,6 +376,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/design-shotgun/SKILL.md b/design-shotgun/SKILL.md
index 3d3421f9..5e29bfcc 100644
--- a/design-shotgun/SKILL.md
+++ b/design-shotgun/SKILL.md
@@ -355,6 +355,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/document-release/SKILL.md b/document-release/SKILL.md
index a1c6ede9..7001fd6c 100644
--- a/document-release/SKILL.md
+++ b/document-release/SKILL.md
@@ -355,6 +355,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/investigate/SKILL.md b/investigate/SKILL.md
index ab940d17..a65025ec 100644
--- a/investigate/SKILL.md
+++ b/investigate/SKILL.md
@@ -370,6 +370,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/land-and-deploy/SKILL.md b/land-and-deploy/SKILL.md
index 6311ecfc..2cca312e 100644
--- a/land-and-deploy/SKILL.md
+++ b/land-and-deploy/SKILL.md
@@ -370,6 +370,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/learn/SKILL.md b/learn/SKILL.md
index 324b4a38..9d344eeb 100644
--- a/learn/SKILL.md
+++ b/learn/SKILL.md
@@ -355,6 +355,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/office-hours/SKILL.md b/office-hours/SKILL.md
index 900fb507..9c8de4ce 100644
--- a/office-hours/SKILL.md
+++ b/office-hours/SKILL.md
@@ -380,6 +380,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md
index c7631669..48a8ab40 100644
--- a/plan-ceo-review/SKILL.md
+++ b/plan-ceo-review/SKILL.md
@@ -376,6 +376,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
@@ -1295,6 +1310,9 @@ For each substantive tension point, use AskUserQuestion:
> "Cross-model disagreement on [topic]. The review found [X] but the outside voice
> argues [Y]. [One sentence on what context you might be missing.]"
+>
+> RECOMMENDATION: Choose [A or B] because [one-line reason explaining which argument
+> is more compelling and why]. Completeness: A=X/10, B=Y/10.
Options:
- A) Accept the outside voice's recommendation (I'll apply this change)
@@ -1504,7 +1522,7 @@ Display:
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
-- **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed.
+- **Adversarial Review (automatic):** Always-on for every review. Every diff gets both Claude adversarial subagent and Codex adversarial challenge. Large diffs (200+ lines) additionally get Codex structured review with P1 gate. No configuration needed.
- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping.
**Verdict logic:**
diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md
index a8c7f264..3c973b10 100644
--- a/plan-design-review/SKILL.md
+++ b/plan-design-review/SKILL.md
@@ -374,6 +374,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
@@ -1203,7 +1218,7 @@ Display:
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
-- **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed.
+- **Adversarial Review (automatic):** Always-on for every review. Every diff gets both Claude adversarial subagent and Codex adversarial challenge. Large diffs (200+ lines) additionally get Codex structured review with P1 gate. No configuration needed.
- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping.
**Verdict logic:**
diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md
index 1dad9fc0..d2715aac 100644
--- a/plan-eng-review/SKILL.md
+++ b/plan-eng-review/SKILL.md
@@ -375,6 +375,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
@@ -963,6 +978,9 @@ For each substantive tension point, use AskUserQuestion:
> "Cross-model disagreement on [topic]. The review found [X] but the outside voice
> argues [Y]. [One sentence on what context you might be missing.]"
+>
+> RECOMMENDATION: Choose [A or B] because [one-line reason explaining which argument
+> is more compelling and why]. Completeness: A=X/10, B=Y/10.
Options:
- A) Accept the outside voice's recommendation (I'll apply this change)
@@ -1149,7 +1167,7 @@ Display:
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
-- **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed.
+- **Adversarial Review (automatic):** Always-on for every review. Every diff gets both Claude adversarial subagent and Codex adversarial challenge. Large diffs (200+ lines) additionally get Codex structured review with P1 gate. No configuration needed.
- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping.
**Verdict logic:**
diff --git a/qa-only/SKILL.md b/qa-only/SKILL.md
index 2beb599a..63c970ad 100644
--- a/qa-only/SKILL.md
+++ b/qa-only/SKILL.md
@@ -371,6 +371,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/qa/SKILL.md b/qa/SKILL.md
index 94081f20..e2a03226 100644
--- a/qa/SKILL.md
+++ b/qa/SKILL.md
@@ -377,6 +377,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/retro/SKILL.md b/retro/SKILL.md
index 5a84039b..52af68da 100644
--- a/retro/SKILL.md
+++ b/retro/SKILL.md
@@ -353,6 +353,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/review/SKILL.md b/review/SKILL.md
index 3f492d21..3216c747 100644
--- a/review/SKILL.md
+++ b/review/SKILL.md
@@ -374,6 +374,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
@@ -473,6 +488,31 @@ Before reviewing code quality, check: **did they build what was requested — no
2. Identify the **stated intent** — what was this branch supposed to accomplish?
3. Run `git diff origin/...HEAD --stat` and compare the files changed against the stated intent.
+4. Evaluate with skepticism (incorporating plan completion results if available from an earlier step or adjacent section):
+
+ **SCOPE CREEP detection:**
+ - Files changed that are unrelated to the stated intent
+ - New features or refactors not mentioned in the plan
+ - "While I was in there..." changes that expand blast radius
+
+ **MISSING REQUIREMENTS detection:**
+ - Requirements from TODOS.md/PR description not addressed in the diff
+ - Test coverage gaps for stated requirements
+ - Partial implementations (started but not finished)
+
+5. Output (before the main review begins):
+ \`\`\`
+ Scope Check: [CLEAN / DRIFT DETECTED / REQUIREMENTS MISSING]
+ Intent: <1-line summary of what was requested>
+ Delivered: <1-line summary of what the diff actually does>
+ [If drift: list each out-of-scope change]
+ [If missing: list each unaddressed requirement]
+ \`\`\`
+
+6. This is **INFORMATIONAL** — does not block the review. Proceed to the next step.
+
+---
+
### Plan File Discovery
1. **Conversation context (primary):** Check if there is an active plan file in this conversation. The host agent's system messages include plan file paths when in plan mode. If found, use it directly — this is the most reliable signal.
@@ -591,31 +631,6 @@ Plan items: N DONE, M PARTIAL, K NOT DONE
**No plan file found:** Fall back to existing scope drift behavior (check TODOS.md and PR description only).
-4. Evaluate with skepticism (incorporating plan completion results if available):
-
- **SCOPE CREEP detection:**
- - Files changed that are unrelated to the stated intent
- - New features or refactors not mentioned in the plan
- - "While I was in there..." changes that expand blast radius
-
- **MISSING REQUIREMENTS detection:**
- - Requirements from TODOS.md/PR description not addressed in the diff
- - Test coverage gaps for stated requirements
- - Partial implementations (started but not finished)
-
-5. Output (before the main review begins):
- ```
- Scope Check: [CLEAN / DRIFT DETECTED / REQUIREMENTS MISSING]
- Intent: <1-line summary of what was requested>
- Delivered: <1-line summary of what the diff actually does>
- [If drift: list each out-of-scope change]
- [If missing: list each unaddressed requirement]
- ```
-
-6. This is **INFORMATIONAL** — does not block the review. Proceed to Step 2.
-
----
-
## Step 2: Read the checklist
Read `.claude/skills/review/checklist.md`.
@@ -1096,9 +1111,9 @@ If no documentation files exist, skip this step silently.
---
-## Step 5.7: Adversarial review (auto-scaled)
+## Step 5.7: Adversarial review (always-on)
-Adversarial review thoroughness scales automatically based on diff size. No configuration needed.
+Every diff gets adversarial review from both Claude and Codex. LOC is not a proxy for risk — a 5-line auth change can be critical.
**Detect diff size and tool availability:**
@@ -1107,30 +1122,34 @@ DIFF_INS=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ insertion'
DIFF_DEL=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
DIFF_TOTAL=$((DIFF_INS + DIFF_DEL))
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
-# Respect old opt-out
+# Legacy opt-out — only gates Codex passes, Claude always runs
OLD_CFG=$(~/.claude/skills/gstack/bin/gstack-config get codex_reviews 2>/dev/null || true)
echo "DIFF_SIZE: $DIFF_TOTAL"
echo "OLD_CFG: ${OLD_CFG:-not_set}"
```
-If `OLD_CFG` is `disabled`: skip this step silently. Continue to the next step.
+If `OLD_CFG` is `disabled`: skip Codex passes only. Claude adversarial subagent still runs (it's free and fast). Jump to the "Claude adversarial subagent" section.
-**User override:** If the user explicitly requested a specific tier (e.g., "run all passes", "paranoid review", "full adversarial", "do all 4 passes", "thorough review"), honor that request regardless of diff size. Jump to the matching tier section.
-
-**Auto-select tier based on diff size:**
-- **Small (< 50 lines changed):** Skip adversarial review entirely. Print: "Small diff ($DIFF_TOTAL lines) — adversarial review skipped." Continue to the next step.
-- **Medium (50–199 lines changed):** Run Codex adversarial challenge (or Claude adversarial subagent if Codex unavailable). Jump to the "Medium tier" section.
-- **Large (200+ lines changed):** Run all remaining passes — Codex structured review + Claude adversarial subagent + Codex adversarial. Jump to the "Large tier" section.
+**User override:** If the user explicitly requested "full review", "structured review", or "P1 gate", also run the Codex structured review regardless of diff size.
---
-### Medium tier (50–199 lines)
+### Claude adversarial subagent (always runs)
-Claude's structured review already ran. Now add a **cross-model adversarial challenge**.
+Dispatch via the Agent tool. The subagent has fresh context — no checklist bias from the structured review. This genuine independence catches things the primary reviewer is blind to.
-**If Codex is available:** run the Codex adversarial challenge. **If Codex is NOT available:** fall back to the Claude adversarial subagent instead.
+Subagent prompt:
+"Read the diff for this branch with `git diff origin/`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment)."
-**Codex adversarial:**
+Present findings under an `ADVERSARIAL REVIEW (Claude subagent):` header. **FIXABLE findings** flow into the same Fix-First pipeline as the structured review. **INVESTIGATE findings** are presented as informational.
+
+If the subagent fails or times out: "Claude adversarial subagent unavailable. Continuing."
+
+---
+
+### Codex adversarial challenge (always runs when available)
+
+If Codex is available AND `OLD_CFG` is NOT `disabled`:
```bash
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
@@ -1150,34 +1169,16 @@ Present the full output verbatim. This is informational — it never blocks ship
- **Timeout:** "Codex timed out after 5 minutes."
- **Empty response:** "Codex returned no response. Stderr: ."
-On any Codex error, fall back to the Claude adversarial subagent automatically.
+**Cleanup:** Run `rm -f "$TMPERR_ADV"` after processing.
-**Claude adversarial subagent** (fallback when Codex unavailable or errored):
-
-Dispatch via the Agent tool. The subagent has fresh context — no checklist bias from the structured review. This genuine independence catches things the primary reviewer is blind to.
-
-Subagent prompt:
-"Read the diff for this branch with `git diff origin/`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment)."
-
-Present findings under an `ADVERSARIAL REVIEW (Claude subagent):` header. **FIXABLE findings** flow into the same Fix-First pipeline as the structured review. **INVESTIGATE findings** are presented as informational.
-
-If the subagent fails or times out: "Claude adversarial subagent unavailable. Continuing without adversarial review."
-
-**Persist the review result:**
-```bash
-~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"adversarial-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","tier":"medium","commit":"'"$(git rev-parse --short HEAD)"'"}'
-```
-Substitute STATUS: "clean" if no findings, "issues_found" if findings exist. SOURCE: "codex" if Codex ran, "claude" if subagent ran. If both failed, do NOT persist.
-
-**Cleanup:** Run `rm -f "$TMPERR_ADV"` after processing (if Codex was used).
+If Codex is NOT available: "Codex CLI not found — running Claude adversarial only. Install Codex for cross-model coverage: `npm install -g @openai/codex`"
---
-### Large tier (200+ lines)
+### Codex structured review (large diffs only, 200+ lines)
-Claude's structured review already ran. Now run **all three remaining passes** for maximum coverage:
+If `DIFF_TOTAL >= 200` AND Codex is available AND `OLD_CFG` is NOT `disabled`:
-**1. Codex structured review (if available):**
```bash
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
@@ -1198,34 +1199,34 @@ B) Continue — review will still complete
If A: address the findings. Re-run `codex review` to verify.
-Read stderr for errors (same error handling as medium tier).
+Read stderr for errors (same error handling as Codex adversarial above).
After stderr: `rm -f "$TMPERR"`
-**2. Claude adversarial subagent:** Dispatch a subagent with the adversarial prompt (same prompt as medium tier). This always runs regardless of Codex availability.
-
-**3. Codex adversarial challenge (if available):** Run `codex exec` with the adversarial prompt (same as medium tier).
-
-If Codex is not available for steps 1 and 3, note to the user: "Codex CLI not found — large-diff review ran Claude structured + Claude adversarial (2 of 4 passes). Install Codex for full 4-pass coverage: `npm install -g @openai/codex`"
-
-**Persist the review result AFTER all passes complete** (not after each sub-step):
-```bash
-~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"adversarial-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","tier":"large","gate":"GATE","commit":"'"$(git rev-parse --short HEAD)"'"}'
-```
-Substitute: STATUS = "clean" if no findings across ALL passes, "issues_found" if any pass found issues. SOURCE = "both" if Codex ran, "claude" if only Claude subagent ran. GATE = the Codex structured review gate result ("pass"/"fail"), or "informational" if Codex was unavailable. If all passes failed, do NOT persist.
+If `DIFF_TOTAL < 200`: skip this section silently. The Claude + Codex adversarial passes provide sufficient coverage for smaller diffs.
---
-### Cross-model synthesis (medium and large tiers)
+### Persist the review result
+
+After all passes complete, persist:
+```bash
+~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"adversarial-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","tier":"always","gate":"GATE","commit":"'"$(git rev-parse --short HEAD)"'"}'
+```
+Substitute: STATUS = "clean" if no findings across ALL passes, "issues_found" if any pass found issues. SOURCE = "both" if Codex ran, "claude" if only Claude subagent ran. GATE = the Codex structured review gate result ("pass"/"fail"), "skipped" if diff < 200, or "informational" if Codex was unavailable. If all passes failed, do NOT persist.
+
+---
+
+### Cross-model synthesis
After all passes complete, synthesize findings across all sources:
```
-ADVERSARIAL REVIEW SYNTHESIS (auto: TIER, N lines):
+ADVERSARIAL REVIEW SYNTHESIS (always-on, N lines):
════════════════════════════════════════════════════════════
High confidence (found by multiple sources): [findings agreed on by >1 pass]
Unique to Claude structured review: [from earlier step]
- Unique to Claude adversarial: [from subagent, if ran]
+ Unique to Claude adversarial: [from subagent]
Unique to Codex: [from codex adversarial or code review, if ran]
Models used: Claude structured ✓ Claude adversarial ✓/✗ Codex ✓/✗
════════════════════════════════════════════════════════════
diff --git a/review/SKILL.md.tmpl b/review/SKILL.md.tmpl
index b748483a..7fb881d6 100644
--- a/review/SKILL.md.tmpl
+++ b/review/SKILL.md.tmpl
@@ -37,43 +37,10 @@ You are running the `/review` workflow. Analyze the current branch's diff agains
---
-## Step 1.5: Scope Drift Detection
-
-Before reviewing code quality, check: **did they build what was requested — nothing more, nothing less?**
-
-1. Read `TODOS.md` (if it exists). Read PR description (`gh pr view --json body --jq .body 2>/dev/null || true`).
- Read commit messages (`git log origin/..HEAD --oneline`).
- **If no PR exists:** rely on commit messages and TODOS.md for stated intent — this is the common case since /review runs before /ship creates the PR.
-2. Identify the **stated intent** — what was this branch supposed to accomplish?
-3. Run `git diff origin/...HEAD --stat` and compare the files changed against the stated intent.
+{{SCOPE_DRIFT}}
{{PLAN_COMPLETION_AUDIT_REVIEW}}
-4. Evaluate with skepticism (incorporating plan completion results if available):
-
- **SCOPE CREEP detection:**
- - Files changed that are unrelated to the stated intent
- - New features or refactors not mentioned in the plan
- - "While I was in there..." changes that expand blast radius
-
- **MISSING REQUIREMENTS detection:**
- - Requirements from TODOS.md/PR description not addressed in the diff
- - Test coverage gaps for stated requirements
- - Partial implementations (started but not finished)
-
-5. Output (before the main review begins):
- ```
- Scope Check: [CLEAN / DRIFT DETECTED / REQUIREMENTS MISSING]
- Intent: <1-line summary of what was requested>
- Delivered: <1-line summary of what the diff actually does>
- [If drift: list each out-of-scope change]
- [If missing: list each unaddressed requirement]
- ```
-
-6. This is **INFORMATIONAL** — does not block the review. Proceed to Step 2.
-
----
-
## Step 2: Read the checklist
Read `.claude/skills/review/checklist.md`.
diff --git a/scripts/resolvers/index.ts b/scripts/resolvers/index.ts
index 7ac7f1a2..297b89c9 100644
--- a/scripts/resolvers/index.ts
+++ b/scripts/resolvers/index.ts
@@ -11,7 +11,7 @@ import { generateTestFailureTriage } from './preamble';
import { generateCommandReference, generateSnapshotFlags, generateBrowseSetup } from './browse';
import { generateDesignMethodology, generateDesignHardRules, generateDesignOutsideVoices, generateDesignReviewLite, generateDesignSketch, generateDesignSetup, generateDesignMockup, generateDesignShotgunLoop } from './design';
import { generateTestBootstrap, generateTestCoverageAuditPlan, generateTestCoverageAuditShip, generateTestCoverageAuditReview } from './testing';
-import { generateReviewDashboard, generatePlanFileReviewReport, generateSpecReviewLoop, generateBenefitsFrom, generateCodexSecondOpinion, generateAdversarialStep, generateCodexPlanReview, generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec } from './review';
+import { generateReviewDashboard, generatePlanFileReviewReport, generateSpecReviewLoop, generateBenefitsFrom, generateCodexSecondOpinion, generateAdversarialStep, generateCodexPlanReview, generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec, generateScopeDrift } from './review';
import { generateSlugEval, generateSlugSetup, generateBaseBranchDetect, generateDeployBootstrap, generateQAMethodology, generateCoAuthorTrailer, generateChangelogWorkflow } from './utility';
import { generateLearningsSearch, generateLearningsLog } from './learnings';
import { generateConfidenceCalibration } from './confidence';
@@ -45,6 +45,7 @@ export const RESOLVERS: Record = {
BENEFITS_FROM: generateBenefitsFrom,
CODEX_SECOND_OPINION: generateCodexSecondOpinion,
ADVERSARIAL_STEP: generateAdversarialStep,
+ SCOPE_DRIFT: generateScopeDrift,
DEPLOY_BOOTSTRAP: generateDeployBootstrap,
CODEX_PLAN_REVIEW: generateCodexPlanReview,
PLAN_COMPLETION_AUDIT_SHIP: generatePlanCompletionAuditShip,
diff --git a/scripts/resolvers/preamble.ts b/scripts/resolvers/preamble.ts
index 8cd1b557..7b144b2f 100644
--- a/scripts/resolvers/preamble.ts
+++ b/scripts/resolvers/preamble.ts
@@ -460,6 +460,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- \`$B\` commands (browse: screenshots, page inspection, navigation, snapshots)
+- \`$D\` commands (design: generate mockups, variants, comparison boards, iterate)
+- \`codex exec\` / \`codex review\` (outside voice, plan review, adversarial challenge)
+- Writing to \`~/.gstack/\` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- \`open\` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/scripts/resolvers/review.ts b/scripts/resolvers/review.ts
index 5db22644..65f99fb2 100644
--- a/scripts/resolvers/review.ts
+++ b/scripts/resolvers/review.ts
@@ -54,7 +54,7 @@ Display:
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \\\`gstack-config set skip_eng_review true\\\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
-- **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed.
+- **Adversarial Review (automatic):** Always-on for every review. Every diff gets both Claude adversarial subagent and Codex adversarial challenge. Large diffs (200+ lines) additionally get Codex structured review with P1 gate. No configuration needed.
- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping.
**Verdict logic:**
@@ -359,6 +359,50 @@ SECOND OPINION (Claude subagent):
If A: revise the premise and note the revision. If B: proceed (and note that the user defended this premise with reasoning — this is a founder signal if they articulate WHY they disagree, not just dismiss).`;
}
+// ─── Scope Drift Detection (shared between /review and /ship) ────────
+
+export function generateScopeDrift(ctx: TemplateContext): string {
+ const isShip = ctx.skillName === 'ship';
+ const stepNum = isShip ? '3.48' : '1.5';
+
+ return `## Step ${stepNum}: Scope Drift Detection
+
+Before reviewing code quality, check: **did they build what was requested — nothing more, nothing less?**
+
+1. Read \`TODOS.md\` (if it exists). Read PR description (\`gh pr view --json body --jq .body 2>/dev/null || true\`).
+ Read commit messages (\`git log origin/..HEAD --oneline\`).
+ **If no PR exists:** rely on commit messages and TODOS.md for stated intent — this is the common case since /review runs before /ship creates the PR.
+2. Identify the **stated intent** — what was this branch supposed to accomplish?
+3. Run \`git diff origin/...HEAD --stat\` and compare the files changed against the stated intent.
+
+4. Evaluate with skepticism (incorporating plan completion results if available from an earlier step or adjacent section):
+
+ **SCOPE CREEP detection:**
+ - Files changed that are unrelated to the stated intent
+ - New features or refactors not mentioned in the plan
+ - "While I was in there..." changes that expand blast radius
+
+ **MISSING REQUIREMENTS detection:**
+ - Requirements from TODOS.md/PR description not addressed in the diff
+ - Test coverage gaps for stated requirements
+ - Partial implementations (started but not finished)
+
+5. Output (before the main review begins):
+ \\\`\\\`\\\`
+ Scope Check: [CLEAN / DRIFT DETECTED / REQUIREMENTS MISSING]
+ Intent: <1-line summary of what was requested>
+ Delivered: <1-line summary of what the diff actually does>
+ [If drift: list each out-of-scope change]
+ [If missing: list each unaddressed requirement]
+ \\\`\\\`\\\`
+
+6. This is **INFORMATIONAL** — does not block the review. Proceed to the next step.
+
+---`;
+}
+
+// ─── Adversarial Review (always-on) ──────────────────────────────────
+
export function generateAdversarialStep(ctx: TemplateContext): string {
// Codex host: strip entirely — Codex should never invoke itself
if (ctx.host === 'codex') return '';
@@ -366,9 +410,9 @@ export function generateAdversarialStep(ctx: TemplateContext): string {
const isShip = ctx.skillName === 'ship';
const stepNum = isShip ? '3.8' : '5.7';
- return `## Step ${stepNum}: Adversarial review (auto-scaled)
+ return `## Step ${stepNum}: Adversarial review (always-on)
-Adversarial review thoroughness scales automatically based on diff size. No configuration needed.
+Every diff gets adversarial review from both Claude and Codex. LOC is not a proxy for risk — a 5-line auth change can be critical.
**Detect diff size and tool availability:**
@@ -377,30 +421,34 @@ DIFF_INS=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ insertion'
DIFF_DEL=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
DIFF_TOTAL=$((DIFF_INS + DIFF_DEL))
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
-# Respect old opt-out
+# Legacy opt-out — only gates Codex passes, Claude always runs
OLD_CFG=$(~/.claude/skills/gstack/bin/gstack-config get codex_reviews 2>/dev/null || true)
echo "DIFF_SIZE: $DIFF_TOTAL"
echo "OLD_CFG: \${OLD_CFG:-not_set}"
\`\`\`
-If \`OLD_CFG\` is \`disabled\`: skip this step silently. Continue to the next step.
+If \`OLD_CFG\` is \`disabled\`: skip Codex passes only. Claude adversarial subagent still runs (it's free and fast). Jump to the "Claude adversarial subagent" section.
-**User override:** If the user explicitly requested a specific tier (e.g., "run all passes", "paranoid review", "full adversarial", "do all 4 passes", "thorough review"), honor that request regardless of diff size. Jump to the matching tier section.
-
-**Auto-select tier based on diff size:**
-- **Small (< 50 lines changed):** Skip adversarial review entirely. Print: "Small diff ($DIFF_TOTAL lines) — adversarial review skipped." Continue to the next step.
-- **Medium (50–199 lines changed):** Run Codex adversarial challenge (or Claude adversarial subagent if Codex unavailable). Jump to the "Medium tier" section.
-- **Large (200+ lines changed):** Run all remaining passes — Codex structured review + Claude adversarial subagent + Codex adversarial. Jump to the "Large tier" section.
+**User override:** If the user explicitly requested "full review", "structured review", or "P1 gate", also run the Codex structured review regardless of diff size.
---
-### Medium tier (50–199 lines)
+### Claude adversarial subagent (always runs)
-Claude's structured review already ran. Now add a **cross-model adversarial challenge**.
+Dispatch via the Agent tool. The subagent has fresh context — no checklist bias from the structured review. This genuine independence catches things the primary reviewer is blind to.
-**If Codex is available:** run the Codex adversarial challenge. **If Codex is NOT available:** fall back to the Claude adversarial subagent instead.
+Subagent prompt:
+"Read the diff for this branch with \`git diff origin/\`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment)."
-**Codex adversarial:**
+Present findings under an \`ADVERSARIAL REVIEW (Claude subagent):\` header. **FIXABLE findings** flow into the same Fix-First pipeline as the structured review. **INVESTIGATE findings** are presented as informational.
+
+If the subagent fails or times out: "Claude adversarial subagent unavailable. Continuing."
+
+---
+
+### Codex adversarial challenge (always runs when available)
+
+If Codex is available AND \`OLD_CFG\` is NOT \`disabled\`:
\`\`\`bash
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
@@ -420,34 +468,16 @@ Present the full output verbatim. This is informational — it never blocks ship
- **Timeout:** "Codex timed out after 5 minutes."
- **Empty response:** "Codex returned no response. Stderr: ."
-On any Codex error, fall back to the Claude adversarial subagent automatically.
+**Cleanup:** Run \`rm -f "$TMPERR_ADV"\` after processing.
-**Claude adversarial subagent** (fallback when Codex unavailable or errored):
-
-Dispatch via the Agent tool. The subagent has fresh context — no checklist bias from the structured review. This genuine independence catches things the primary reviewer is blind to.
-
-Subagent prompt:
-"Read the diff for this branch with \`git diff origin/\`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment)."
-
-Present findings under an \`ADVERSARIAL REVIEW (Claude subagent):\` header. **FIXABLE findings** flow into the same Fix-First pipeline as the structured review. **INVESTIGATE findings** are presented as informational.
-
-If the subagent fails or times out: "Claude adversarial subagent unavailable. Continuing without adversarial review."
-
-**Persist the review result:**
-\`\`\`bash
-~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"adversarial-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","tier":"medium","commit":"'"$(git rev-parse --short HEAD)"'"}'
-\`\`\`
-Substitute STATUS: "clean" if no findings, "issues_found" if findings exist. SOURCE: "codex" if Codex ran, "claude" if subagent ran. If both failed, do NOT persist.
-
-**Cleanup:** Run \`rm -f "$TMPERR_ADV"\` after processing (if Codex was used).
+If Codex is NOT available: "Codex CLI not found — running Claude adversarial only. Install Codex for cross-model coverage: \`npm install -g @openai/codex\`"
---
-### Large tier (200+ lines)
+### Codex structured review (large diffs only, 200+ lines)
-Claude's structured review already ran. Now run **all three remaining passes** for maximum coverage:
+If \`DIFF_TOTAL >= 200\` AND Codex is available AND \`OLD_CFG\` is NOT \`disabled\`:
-**1. Codex structured review (if available):**
\`\`\`bash
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
@@ -468,34 +498,34 @@ B) Continue — review will still complete
If A: address the findings${isShip ? '. After fixing, re-run tests (Step 3) since code has changed' : ''}. Re-run \`codex review\` to verify.
-Read stderr for errors (same error handling as medium tier).
+Read stderr for errors (same error handling as Codex adversarial above).
After stderr: \`rm -f "$TMPERR"\`
-**2. Claude adversarial subagent:** Dispatch a subagent with the adversarial prompt (same prompt as medium tier). This always runs regardless of Codex availability.
-
-**3. Codex adversarial challenge (if available):** Run \`codex exec\` with the adversarial prompt (same as medium tier).
-
-If Codex is not available for steps 1 and 3, note to the user: "Codex CLI not found — large-diff review ran Claude structured + Claude adversarial (2 of 4 passes). Install Codex for full 4-pass coverage: \`npm install -g @openai/codex\`"
-
-**Persist the review result AFTER all passes complete** (not after each sub-step):
-\`\`\`bash
-~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"adversarial-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","tier":"large","gate":"GATE","commit":"'"$(git rev-parse --short HEAD)"'"}'
-\`\`\`
-Substitute: STATUS = "clean" if no findings across ALL passes, "issues_found" if any pass found issues. SOURCE = "both" if Codex ran, "claude" if only Claude subagent ran. GATE = the Codex structured review gate result ("pass"/"fail"), or "informational" if Codex was unavailable. If all passes failed, do NOT persist.
+If \`DIFF_TOTAL < 200\`: skip this section silently. The Claude + Codex adversarial passes provide sufficient coverage for smaller diffs.
---
-### Cross-model synthesis (medium and large tiers)
+### Persist the review result
+
+After all passes complete, persist:
+\`\`\`bash
+~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"adversarial-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","tier":"always","gate":"GATE","commit":"'"$(git rev-parse --short HEAD)"'"}'
+\`\`\`
+Substitute: STATUS = "clean" if no findings across ALL passes, "issues_found" if any pass found issues. SOURCE = "both" if Codex ran, "claude" if only Claude subagent ran. GATE = the Codex structured review gate result ("pass"/"fail"), "skipped" if diff < 200, or "informational" if Codex was unavailable. If all passes failed, do NOT persist.
+
+---
+
+### Cross-model synthesis
After all passes complete, synthesize findings across all sources:
\`\`\`
-ADVERSARIAL REVIEW SYNTHESIS (auto: TIER, N lines):
+ADVERSARIAL REVIEW SYNTHESIS (always-on, N lines):
════════════════════════════════════════════════════════════
High confidence (found by multiple sources): [findings agreed on by >1 pass]
Unique to Claude structured review: [from earlier step]
- Unique to Claude adversarial: [from subagent, if ran]
+ Unique to Claude adversarial: [from subagent]
Unique to Codex: [from codex adversarial or code review, if ran]
Models used: Claude structured ✓ Claude adversarial ✓/✗ Codex ✓/✗
════════════════════════════════════════════════════════════
@@ -619,6 +649,9 @@ For each substantive tension point, use AskUserQuestion:
> "Cross-model disagreement on [topic]. The review found [X] but the outside voice
> argues [Y]. [One sentence on what context you might be missing.]"
+>
+> RECOMMENDATION: Choose [A or B] because [one-line reason explaining which argument
+> is more compelling and why]. Completeness: A=X/10, B=Y/10.
Options:
- A) Accept the outside voice's recommendation (I'll apply this change)
diff --git a/setup-browser-cookies/SKILL.md b/setup-browser-cookies/SKILL.md
index 67657a6b..9724e928 100644
--- a/setup-browser-cookies/SKILL.md
+++ b/setup-browser-cookies/SKILL.md
@@ -285,6 +285,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/setup-deploy/SKILL.md b/setup-deploy/SKILL.md
index 9abfb975..4d174c72 100644
--- a/setup-deploy/SKILL.md
+++ b/setup-deploy/SKILL.md
@@ -356,6 +356,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
diff --git a/ship/SKILL.md b/ship/SKILL.md
index 4519b6e2..94257f29 100644
--- a/ship/SKILL.md
+++ b/ship/SKILL.md
@@ -375,6 +375,21 @@ If you cannot determine the outcome, use "unknown". Both local JSONL and remote
telemetry only run if telemetry is not off. The remote binary additionally requires
the binary to exist.
+## Plan Mode Safe Operations
+
+When in plan mode, these operations are always allowed because they produce
+artifacts that inform the plan, not code changes:
+
+- `$B` commands (browse: screenshots, page inspection, navigation, snapshots)
+- `$D` commands (design: generate mockups, variants, comparison boards, iterate)
+- `codex exec` / `codex review` (outside voice, plan review, adversarial challenge)
+- Writing to `~/.gstack/` (config, analytics, review logs, design artifacts, learnings)
+- Writing to the plan file (already allowed by plan mode)
+- `open` commands for viewing generated artifacts (comparison boards, HTML previews)
+
+These are read-only in spirit — they inspect the live site, generate visual artifacts,
+or get independent opinions. They do NOT modify project source files.
+
## Plan Status Footer
When you are in plan mode and about to call ExitPlanMode:
@@ -525,7 +540,7 @@ Display:
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
-- **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed.
+- **Adversarial Review (automatic):** Always-on for every review. Every diff gets both Claude adversarial subagent and Codex adversarial challenge. Large diffs (200+ lines) additionally get Codex structured review with P1 gate. No configuration needed.
- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping.
**Verdict logic:**
@@ -1423,6 +1438,41 @@ matches a past learning, display:
This makes the compounding visible. The user should see that gstack is getting
smarter on their codebase over time.
+## Step 3.48: Scope Drift Detection
+
+Before reviewing code quality, check: **did they build what was requested — nothing more, nothing less?**
+
+1. Read `TODOS.md` (if it exists). Read PR description (`gh pr view --json body --jq .body 2>/dev/null || true`).
+ Read commit messages (`git log origin/..HEAD --oneline`).
+ **If no PR exists:** rely on commit messages and TODOS.md for stated intent — this is the common case since /review runs before /ship creates the PR.
+2. Identify the **stated intent** — what was this branch supposed to accomplish?
+3. Run `git diff origin/...HEAD --stat` and compare the files changed against the stated intent.
+
+4. Evaluate with skepticism (incorporating plan completion results if available from an earlier step or adjacent section):
+
+ **SCOPE CREEP detection:**
+ - Files changed that are unrelated to the stated intent
+ - New features or refactors not mentioned in the plan
+ - "While I was in there..." changes that expand blast radius
+
+ **MISSING REQUIREMENTS detection:**
+ - Requirements from TODOS.md/PR description not addressed in the diff
+ - Test coverage gaps for stated requirements
+ - Partial implementations (started but not finished)
+
+5. Output (before the main review begins):
+ \`\`\`
+ Scope Check: [CLEAN / DRIFT DETECTED / REQUIREMENTS MISSING]
+ Intent: <1-line summary of what was requested>
+ Delivered: <1-line summary of what the diff actually does>
+ [If drift: list each out-of-scope change]
+ [If missing: list each unaddressed requirement]
+ \`\`\`
+
+6. This is **INFORMATIONAL** — does not block the review. Proceed to the next step.
+
+---
+
---
## Step 3.5: Pre-Landing Review
@@ -1590,9 +1640,9 @@ For each classified comment:
---
-## Step 3.8: Adversarial review (auto-scaled)
+## Step 3.8: Adversarial review (always-on)
-Adversarial review thoroughness scales automatically based on diff size. No configuration needed.
+Every diff gets adversarial review from both Claude and Codex. LOC is not a proxy for risk — a 5-line auth change can be critical.
**Detect diff size and tool availability:**
@@ -1601,30 +1651,34 @@ DIFF_INS=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ insertion'
DIFF_DEL=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
DIFF_TOTAL=$((DIFF_INS + DIFF_DEL))
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
-# Respect old opt-out
+# Legacy opt-out — only gates Codex passes, Claude always runs
OLD_CFG=$(~/.claude/skills/gstack/bin/gstack-config get codex_reviews 2>/dev/null || true)
echo "DIFF_SIZE: $DIFF_TOTAL"
echo "OLD_CFG: ${OLD_CFG:-not_set}"
```
-If `OLD_CFG` is `disabled`: skip this step silently. Continue to the next step.
+If `OLD_CFG` is `disabled`: skip Codex passes only. Claude adversarial subagent still runs (it's free and fast). Jump to the "Claude adversarial subagent" section.
-**User override:** If the user explicitly requested a specific tier (e.g., "run all passes", "paranoid review", "full adversarial", "do all 4 passes", "thorough review"), honor that request regardless of diff size. Jump to the matching tier section.
-
-**Auto-select tier based on diff size:**
-- **Small (< 50 lines changed):** Skip adversarial review entirely. Print: "Small diff ($DIFF_TOTAL lines) — adversarial review skipped." Continue to the next step.
-- **Medium (50–199 lines changed):** Run Codex adversarial challenge (or Claude adversarial subagent if Codex unavailable). Jump to the "Medium tier" section.
-- **Large (200+ lines changed):** Run all remaining passes — Codex structured review + Claude adversarial subagent + Codex adversarial. Jump to the "Large tier" section.
+**User override:** If the user explicitly requested "full review", "structured review", or "P1 gate", also run the Codex structured review regardless of diff size.
---
-### Medium tier (50–199 lines)
+### Claude adversarial subagent (always runs)
-Claude's structured review already ran. Now add a **cross-model adversarial challenge**.
+Dispatch via the Agent tool. The subagent has fresh context — no checklist bias from the structured review. This genuine independence catches things the primary reviewer is blind to.
-**If Codex is available:** run the Codex adversarial challenge. **If Codex is NOT available:** fall back to the Claude adversarial subagent instead.
+Subagent prompt:
+"Read the diff for this branch with `git diff origin/`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment)."
-**Codex adversarial:**
+Present findings under an `ADVERSARIAL REVIEW (Claude subagent):` header. **FIXABLE findings** flow into the same Fix-First pipeline as the structured review. **INVESTIGATE findings** are presented as informational.
+
+If the subagent fails or times out: "Claude adversarial subagent unavailable. Continuing."
+
+---
+
+### Codex adversarial challenge (always runs when available)
+
+If Codex is available AND `OLD_CFG` is NOT `disabled`:
```bash
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
@@ -1644,34 +1698,16 @@ Present the full output verbatim. This is informational — it never blocks ship
- **Timeout:** "Codex timed out after 5 minutes."
- **Empty response:** "Codex returned no response. Stderr: ."
-On any Codex error, fall back to the Claude adversarial subagent automatically.
+**Cleanup:** Run `rm -f "$TMPERR_ADV"` after processing.
-**Claude adversarial subagent** (fallback when Codex unavailable or errored):
-
-Dispatch via the Agent tool. The subagent has fresh context — no checklist bias from the structured review. This genuine independence catches things the primary reviewer is blind to.
-
-Subagent prompt:
-"Read the diff for this branch with `git diff origin/`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment)."
-
-Present findings under an `ADVERSARIAL REVIEW (Claude subagent):` header. **FIXABLE findings** flow into the same Fix-First pipeline as the structured review. **INVESTIGATE findings** are presented as informational.
-
-If the subagent fails or times out: "Claude adversarial subagent unavailable. Continuing without adversarial review."
-
-**Persist the review result:**
-```bash
-~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"adversarial-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","tier":"medium","commit":"'"$(git rev-parse --short HEAD)"'"}'
-```
-Substitute STATUS: "clean" if no findings, "issues_found" if findings exist. SOURCE: "codex" if Codex ran, "claude" if subagent ran. If both failed, do NOT persist.
-
-**Cleanup:** Run `rm -f "$TMPERR_ADV"` after processing (if Codex was used).
+If Codex is NOT available: "Codex CLI not found — running Claude adversarial only. Install Codex for cross-model coverage: `npm install -g @openai/codex`"
---
-### Large tier (200+ lines)
+### Codex structured review (large diffs only, 200+ lines)
-Claude's structured review already ran. Now run **all three remaining passes** for maximum coverage:
+If `DIFF_TOTAL >= 200` AND Codex is available AND `OLD_CFG` is NOT `disabled`:
-**1. Codex structured review (if available):**
```bash
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
@@ -1692,34 +1728,34 @@ B) Continue — review will still complete
If A: address the findings. After fixing, re-run tests (Step 3) since code has changed. Re-run `codex review` to verify.
-Read stderr for errors (same error handling as medium tier).
+Read stderr for errors (same error handling as Codex adversarial above).
After stderr: `rm -f "$TMPERR"`
-**2. Claude adversarial subagent:** Dispatch a subagent with the adversarial prompt (same prompt as medium tier). This always runs regardless of Codex availability.
-
-**3. Codex adversarial challenge (if available):** Run `codex exec` with the adversarial prompt (same as medium tier).
-
-If Codex is not available for steps 1 and 3, note to the user: "Codex CLI not found — large-diff review ran Claude structured + Claude adversarial (2 of 4 passes). Install Codex for full 4-pass coverage: `npm install -g @openai/codex`"
-
-**Persist the review result AFTER all passes complete** (not after each sub-step):
-```bash
-~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"adversarial-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","tier":"large","gate":"GATE","commit":"'"$(git rev-parse --short HEAD)"'"}'
-```
-Substitute: STATUS = "clean" if no findings across ALL passes, "issues_found" if any pass found issues. SOURCE = "both" if Codex ran, "claude" if only Claude subagent ran. GATE = the Codex structured review gate result ("pass"/"fail"), or "informational" if Codex was unavailable. If all passes failed, do NOT persist.
+If `DIFF_TOTAL < 200`: skip this section silently. The Claude + Codex adversarial passes provide sufficient coverage for smaller diffs.
---
-### Cross-model synthesis (medium and large tiers)
+### Persist the review result
+
+After all passes complete, persist:
+```bash
+~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"adversarial-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","tier":"always","gate":"GATE","commit":"'"$(git rev-parse --short HEAD)"'"}'
+```
+Substitute: STATUS = "clean" if no findings across ALL passes, "issues_found" if any pass found issues. SOURCE = "both" if Codex ran, "claude" if only Claude subagent ran. GATE = the Codex structured review gate result ("pass"/"fail"), "skipped" if diff < 200, or "informational" if Codex was unavailable. If all passes failed, do NOT persist.
+
+---
+
+### Cross-model synthesis
After all passes complete, synthesize findings across all sources:
```
-ADVERSARIAL REVIEW SYNTHESIS (auto: TIER, N lines):
+ADVERSARIAL REVIEW SYNTHESIS (always-on, N lines):
════════════════════════════════════════════════════════════
High confidence (found by multiple sources): [findings agreed on by >1 pass]
Unique to Claude structured review: [from earlier step]
- Unique to Claude adversarial: [from subagent, if ran]
+ Unique to Claude adversarial: [from subagent]
Unique to Codex: [from codex adversarial or code review, if ran]
Models used: Claude structured ✓ Claude adversarial ✓/✗ Codex ✓/✗
════════════════════════════════════════════════════════════
@@ -1976,6 +2012,10 @@ you missed it.>
+## Scope Drift
+
+
+
## Plan Completion
diff --git a/ship/SKILL.md.tmpl b/ship/SKILL.md.tmpl
index 993a67a5..a20e614a 100644
--- a/ship/SKILL.md.tmpl
+++ b/ship/SKILL.md.tmpl
@@ -232,6 +232,8 @@ If multiple suites need to run, run them sequentially (each needs a test lane).
{{LEARNINGS_SEARCH}}
+{{SCOPE_DRIFT}}
+
---
## Step 3.5: Pre-Landing Review
@@ -509,6 +511,10 @@ you missed it.>
+## Scope Drift
+
+
+
## Plan Completion
diff --git a/test/skill-validation.test.ts b/test/skill-validation.test.ts
index e2dcb67e..91240e23 100644
--- a/test/skill-validation.test.ts
+++ b/test/skill-validation.test.ts
@@ -1305,38 +1305,49 @@ describe('Codex skill', () => {
expect(content).toContain('mktemp');
});
- test('adversarial review in /review auto-scales by diff size', () => {
+ test('adversarial review in /review always runs both passes', () => {
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
- expect(content).toContain('Adversarial review (auto-scaled)');
- // Diff size thresholds
- expect(content).toContain('< 50');
- expect(content).toContain('50–199');
- expect(content).toContain('200+');
- // All three tiers present
- expect(content).toContain('Small');
- expect(content).toContain('Medium tier');
- expect(content).toContain('Large tier');
+ expect(content).toContain('Adversarial review (always-on)');
+ // Always-on: both Claude and Codex adversarial
+ expect(content).toContain('Claude adversarial subagent (always runs)');
+ expect(content).toContain('Codex adversarial challenge (always runs when available)');
// Claude adversarial subagent dispatch
expect(content).toContain('Agent tool');
expect(content).toContain('FIXABLE');
expect(content).toContain('INVESTIGATE');
- // Codex fallback logic
+ // Codex availability check
expect(content).toContain('CODEX_NOT_AVAILABLE');
- expect(content).toContain('fall back to the Claude adversarial subagent');
- // Review log uses new skill name
+ // OLD_CFG only gates Codex, not Claude
+ expect(content).toContain('skip Codex passes only');
+ // Review log
expect(content).toContain('adversarial-review');
expect(content).toContain('reasoning_effort="high"');
expect(content).toContain('ADVERSARIAL REVIEW SYNTHESIS');
+ // Large diff structured review still gated
+ expect(content).toContain('Codex structured review (large diffs only');
+ expect(content).toContain('200');
});
- test('adversarial review in /ship auto-scales by diff size', () => {
+ test('adversarial review in /ship always runs both passes', () => {
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
- expect(content).toContain('Adversarial review (auto-scaled)');
- expect(content).toContain('< 50');
- expect(content).toContain('200+');
+ expect(content).toContain('Adversarial review (always-on)');
expect(content).toContain('adversarial-review');
expect(content).toContain('reasoning_effort="high"');
expect(content).toContain('Investigate and fix');
+ expect(content).toContain('Claude adversarial subagent (always runs)');
+ });
+
+ test('scope drift detection in /review and /ship', () => {
+ const reviewContent = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
+ const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
+ // Both should contain scope drift from the shared resolver
+ for (const content of [reviewContent, shipContent]) {
+ expect(content).toContain('Scope Check:');
+ expect(content).toContain('DRIFT DETECTED');
+ expect(content).toContain('SCOPE CREEP');
+ expect(content).toContain('MISSING REQUIREMENTS');
+ expect(content).toContain('stated intent');
+ }
});
test('codex-host ship/review do NOT contain adversarial review step', () => {