From 3b62fec470bc559f2d80b05a6bb5d1435c1b56fd Mon Sep 17 00:00:00 2001
From: Garry Tan <garrytan@gmail.com>
Date: Sat, 28 Mar 2026 08:11:39 -0700
Subject: [PATCH] =?UTF-8?q?fix:=20autoplan=20dual-voice=20=E2=80=94=20sequ?=
 =?UTF-8?q?ential=20foreground=20execution=20instead=20of=20broken=20paral?=
 =?UTF-8?q?lel?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Background subagents don't inherit tool permissions in Claude Code, so the
Claude subagent in dual-voice mode was silently failing on every invocation.
Every autoplan run was degrading to single-reviewer mode without warning.

Change all three phases (CEO, Design, Eng) from "simultaneously" to
sequential foreground execution: Claude subagent first (Agent tool,
foreground), then Codex (Bash). Both complete before the consensus table.

Fixes #497

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 autoplan/SKILL.md.tmpl | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/autoplan/SKILL.md.tmpl b/autoplan/SKILL.md.tmpl
index 16c35adc..19809947 100644
--- a/autoplan/SKILL.md.tmpl
+++ b/autoplan/SKILL.md.tmpl
@@ -207,7 +207,9 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   Duplicates → reject (P4). Borderline (3-5 files) → mark TASTE DECISION.
 - All 10 review sections: run fully, auto-decide each issue, log every decision.
 - Dual voices: always run BOTH Claude subagent AND Codex if available (P6).
-  Run them simultaneously (Agent tool for subagent, Bash for Codex).
+  Run them sequentially in foreground. First the Claude subagent (Agent tool,
+  foreground — do NOT use run_in_background), then Codex (Bash). Both must
+  complete before building the consensus table.
 
   **Codex CEO voice** (via Bash):
   ```bash
@@ -234,7 +236,7 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   5. What's the competitive risk — could someone else solve this first/better?
   For each finding: what's wrong, severity (critical/high/medium), and the fix."
 
-  **Error handling:** All non-blocking. Codex auth/timeout/empty → proceed with
+  **Error handling:** Both calls block in foreground. Codex auth/timeout/empty → proceed with
   Claude subagent only, tagged `[single-model]`. If Claude subagent also fails →
   "Outside voices unavailable — continuing with primary review."
 
@@ -255,10 +257,10 @@ Step 0 (0A-0F) — run each sub-step and produce:
 - 0E: Temporal interrogation (HOUR 1 → HOUR 6+)
 - 0F: Mode selection confirmation
 
-Step 0.5 (Dual Voices): Run Claude subagent AND Codex simultaneously. Present
-Codex output under CODEX SAYS (CEO — strategy challenge) header. Present subagent
-output under CLAUDE SUBAGENT (CEO — strategic independence) header. Produce CEO
-consensus table:
+Step 0.5 (Dual Voices): Run Claude subagent (foreground Agent tool) first, then
+Codex (Bash). Present Codex output under CODEX SAYS (CEO — strategy challenge)
+header. Present subagent output under CLAUDE SUBAGENT (CEO — strategic independence)
+header. Produce CEO consensus table:
 
 ```
 CEO DUAL VOICES — CONSENSUS TABLE:
@@ -351,7 +353,7 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   For each finding: what's wrong, severity (critical/high/medium), and the fix."
   NO prior-phase context — subagent must be truly independent.
 
-  Error handling: same as Phase 1 (non-blocking, degradation matrix applies).
+  Error handling: same as Phase 1 (both foreground/blocking, degradation matrix applies).
 
 - Design choices: if codex disagrees with a design decision with valid UX reasoning
   → TASTE DECISION.
@@ -360,7 +362,7 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
 
 1. Step 0 (Design Scope): Rate completeness 0-10. Check DESIGN.md. Map existing patterns.
 
-2. Step 0.5 (Dual Voices): Run Claude subagent AND Codex simultaneously. Present under
+2. Step 0.5 (Dual Voices): Run Claude subagent (foreground) first, then Codex. Present under
    CODEX SAYS (design — UX challenge) and CLAUDE SUBAGENT (design — independent review)
    headers. Produce design litmus scorecard (consensus table). Use the litmus scorecard
    format from plan-design-review. Include CEO phase findings in Codex prompt ONLY
@@ -421,7 +423,7 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   For each finding: what's wrong, severity, and the fix."
   NO prior-phase context — subagent must be truly independent.
 
-  Error handling: same as Phase 1 (non-blocking, degradation matrix applies).
+  Error handling: same as Phase 1 (both foreground/blocking, degradation matrix applies).
 
 - Architecture choices: explicit over clever (P5). If codex disagrees with valid reason → TASTE DECISION.
 - Evals: always include all relevant suites (P1)
@@ -433,7 +435,7 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
 1. Step 0 (Scope Challenge): Read actual code referenced by the plan. Map each
    sub-problem to existing code. Run the complexity check. Produce concrete findings.
 
-2. Step 0.5 (Dual Voices): Run Claude subagent AND Codex simultaneously. Present
+2. Step 0.5 (Dual Voices): Run Claude subagent (foreground) first, then Codex. Present
    Codex output under CODEX SAYS (eng — architecture challenge) header. Present subagent
    output under CLAUDE SUBAGENT (eng — independent review) header. Produce eng consensus
    table: