feat(plan-tune): explicit-consent surface + setup gate for question_tuning

Step 0 grows two implicit gates that run before user-intent routing:
- Consent gate: question_tuning=false + no marker → offer opt-in
  (contributor-specific copy variant)
- Setup gate: question_tuning=true + declared empty + no marker → run
  5-Q wizard

Markers (~/.gstack/.question-tuning-prompted,
~/.gstack/.declared-setup-prompted) ensure each user is asked at most
once. The Enable+setup section split into "Consent + opt-in" (with
contributor framing) and standalone "5-Q setup" reachable from both the
consent flow and the setup gate.

Also aligns the calibration gate across three docs (V0 said 90+ days,
TODOS said 2+ weeks, binary uses 7 days). The fix distinguishes:
- Display gate (sample_size>=20, skills>=3, question_ids>=8,
  days_span>=7): for rendering inferred values in /plan-tune output
- Promotion gate (90+ days stable across 3+ skills): for shipping E1
  behavior-adapting defaults

TODOS.md E1 card updated to reference 90+ days, plus Codex's substrate
risk note: generated skill prose is agent-compliance-based, so E1 ships
as advisory annotations on AskUserQuestion recommendations, not silent
AUTO_DECIDE. Tests can verify templates contain right reads but can't
prove agents obey them.

Per /plan-eng-review + Codex outside-voice 2026-05-26.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
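
The gate conditions in the commit message above can be read as two small predicates. The following is a minimal sketch only, not the code in this commit: the function names, the `state_dir` parameter, and the return values are hypothetical. Only the marker filenames and the numeric thresholds come from the message.

```python
from pathlib import Path
from typing import Optional

# Marker filenames are taken from the commit message; everything else
# in this sketch (names, signatures, return values) is an assumption.
GSTACK_DIR = Path.home() / ".gstack"
CONSENT_MARKER = ".question-tuning-prompted"
SETUP_MARKER = ".declared-setup-prompted"


def step0_gate(
    question_tuning: bool,
    declared_empty: bool,
    state_dir: Path = GSTACK_DIR,
) -> Optional[str]:
    """Return the implicit gate that fires, or None to fall through
    to normal user-intent routing."""
    if not question_tuning:
        # Consent gate: tuning is off and the user was never asked.
        if not (state_dir / CONSENT_MARKER).exists():
            return "consent"
        return None
    # Setup gate: tuning is on, nothing declared, wizard never offered.
    if declared_empty and not (state_dir / SETUP_MARKER).exists():
        return "setup"
    return None


def display_gate(
    sample_size: int, skills: int, question_ids: int, days_span: int
) -> bool:
    """Display gate only: decides whether inferred values are rendered.
    It says nothing about the 90+ day promotion gate for E1."""
    return (
        sample_size >= 20
        and skills >= 3
        and question_ids >= 8
        and days_span >= 7
    )
```

Under this reading, the consent check comes first, so a user who has already declined (marker present, question_tuning=false) falls straight through to intent routing and is never shown the setup wizard.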
2026-06-21 01:00:10 +02:00 · 2026-05-26 22:58:05 -07:00
parent 22f8c7f4e1
commit 9cc211f66a
3 changed files with 216 additions and 49 deletions
@@ -582,7 +582,24 @@ reads it yet.

 **Effort:** L (human: ~1 week / CC: ~4h)
 **Priority:** P0
-**Depends on:** 2+ weeks of v1 dogfood, profile diversity check passing.
+**Depends on:** **90+ days of v1 dogfood stable across 3+ skills** (per
+`docs/designs/PLAN_TUNING_V0.md` §"Deferred to v2" E1 acceptance criteria).
+Distinct from the lighter-weight diversity-display gate
+(`sample_size >= 20 AND skills_covered >= 3 AND question_ids_covered >= 8
+AND days_span >= 7`) used in /plan-tune to render the inferred column —
+display is a UI affordance, promotion to E1 needs a much higher bar
+because behavioral adaptation is consequential and hard to revert. Prior
+versions of this card cited "2+ weeks" which conflicted with V0 — V0 wins.
+
+**Substrate risk (Codex outside-voice, Phase A review 2026-05-26):** Generated
+skill prose is agent-compliance-based. Tests can verify templates contain the
+right reads of `~/.gstack/developer-profile.json` and the right decision
+points, but tests cannot prove agents obey them at runtime. E1 ships
+adaptations as **advisory annotations on AskUserQuestion recommendations**
+("Recommended via your profile: <choice>") until there's a hard runtime
+execution path. Do NOT gate any AUTO_DECIDE on inferred profile alone in v1
+of E1; explicit per-question preferences remain the only AUTO_DECIDE
+source.

 ### E3 — `/plan-tune narrative` + `/plan-tune vibe`