feat(plan-tune): explicit-consent surface + setup gate for question_tuning

Step 0 grows two implicit gates that run before user-intent routing:
- Consent gate: question_tuning=false + no marker → offer opt-in (contributor-specific copy variant)
- Setup gate: question_tuning=true + declared empty + no marker → run 5-Q wizard
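
A rough sketch of how the two gates compose, written as a pure TypeScript function. `Step0State`, `GateAction`, `step0Gate` and every field name are hypothetical. In gstack the gates live in skill prose. "Declared empty" is assumed here to mean that no declared preferences have been recorded yet.

```typescript
// Hypothetical inputs Step 0 would read; names are illustrative, not gstack's.
interface Step0State {
  questionTuning: boolean; // question_tuning config value
  declaredEmpty: boolean;  // no declared preferences recorded yet (assumption)
  consentMarker: boolean;  // ~/.gstack/.question-tuning-prompted exists
  setupMarker: boolean;    // ~/.gstack/.declared-setup-prompted exists
  isContributor: boolean;  // picks the contributor-specific copy variant
}

type GateAction =
  | { kind: "offer-opt-in"; copy: "contributor" | "default" }
  | { kind: "run-setup-wizard" }
  | { kind: "route-user-intent" };

// Both gates run before user-intent routing. At most one can fire,
// because they require opposite values of questionTuning.
function step0Gate(s: Step0State): GateAction {
  // Consent gate: tuning is off and the user has never been offered opt-in.
  if (!s.questionTuning && !s.consentMarker) {
    return { kind: "offer-opt-in", copy: s.isContributor ? "contributor" : "default" };
  }
  // Setup gate: tuning is on, nothing declared yet, wizard never offered.
  if (s.questionTuning && s.declaredEmpty && !s.setupMarker) {
    return { kind: "run-setup-wizard" };
  }
  // Neither gate fires: continue to normal user-intent routing.
  return { kind: "route-user-intent" };
}
```

The function returns an action instead of prompting directly. That keeps the gate free of I/O, so you can check the ordering and the at-most-one-fires property in isolation.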

Markers (~/.gstack/.question-tuning-prompted for consent,
~/.gstack/.declared-setup-prompted for setup) ensure each user sees each
prompt at most once. The Enable+setup section is split into "Consent +
opt-in" (with contributor framing) and a standalone "5-Q setup" section,
reachable from both the consent flow and the setup gate.
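
The at-most-once guarantee can be sketched as an exclusive file create. Only the two marker names come from this commit. `shouldPromptOnce` and its `dir` parameter are hypothetical, and writing an ISO timestamp into the marker is an arbitrary choice.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import { join } from "node:path";

// Marker file names come from the commit message; the helper is hypothetical.
const CONSENT_MARKER = ".question-tuning-prompted";
const SETUP_MARKER = ".declared-setup-prompted";

// Returns true the first time it is called for a given marker and false on
// every later call. The "wx" flag makes creation exclusive, so two concurrent
// sessions cannot both create the marker and both prompt.
function shouldPromptOnce(
  marker: string,
  dir: string = join(os.homedir(), ".gstack"),
): boolean {
  fs.mkdirSync(dir, { recursive: true });
  try {
    fs.writeFileSync(join(dir, marker), new Date().toISOString() + "\n", { flag: "wx" });
    return true;
  } catch (err) {
    // Marker already exists: this user has been asked before.
    if ((err as { code?: string }).code === "EEXIST") return false;
    throw err; // permissions, disk full, etc. are real errors
  }
}
```

A caller would check the gate's config condition first and then call `shouldPromptOnce(CONSENT_MARKER)`. The marker then records that the user was asked, whatever their answer.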

Also reconciles the calibration gate, which disagreed across three sources
(V0 said 90+ days, TODOS said 2+ weeks, the binary uses 7 days). The fix
distinguishes two gates:
- Display gate (sample_size>=20, skills>=3, question_ids>=8, days_span>=7):
  for rendering inferred values in /plan-tune output
- Promotion gate (90+ days stable across 3+ skills): for shipping E1
  behavior-adapting defaults
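
The two gates can be written as separate predicates. The display thresholds come from this commit. The `days_span` formula and the `stableDays`/`stableSkills` inputs to the promotion check are assumptions, because the commit does not define how stability is measured. All function names are hypothetical.

```typescript
// Field names mirror the display-gate expression in TODOS.md.
interface ProfileStats {
  sampleSize: number;         // sample_size
  skillsCovered: number;      // skills_covered
  questionIdsCovered: number; // question_ids_covered
  daysSpan: number;           // days_span
}

const MS_PER_DAY = 86_400_000;

// One plausible days_span: whole days between the oldest and newest event
// (timestamps in epoch milliseconds). The binary's exact formula may differ.
function daysSpan(timestamps: number[]): number {
  if (timestamps.length === 0) return 0;
  return Math.floor((Math.max(...timestamps) - Math.min(...timestamps)) / MS_PER_DAY);
}

// Display gate: enough diverse data to render inferred values in /plan-tune.
function passesDisplayGate(s: ProfileStats): boolean {
  return s.sampleSize >= 20 && s.skillsCovered >= 3 &&
    s.questionIdsCovered >= 8 && s.daysSpan >= 7;
}

// Promotion gate: required before E1 may adapt behavior. The caller supplies
// how many days the profile has been stable and across how many skills
// (an assumption; "stable" is not defined in the commit).
function passesPromotionGate(stableDays: number, stableSkills: number): boolean {
  return stableDays >= 90 && stableSkills >= 3;
}
```

A profile with 7 days of data can pass the display gate while sitting far below the promotion bar. Separating the gates is meant to preserve that gap.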

TODOS.md E1 card updated to reference 90+ days, plus Codex's substrate-risk
note: generated skill prose relies on agent compliance. Tests can verify
that templates contain the right reads but can't prove agents obey them,
so E1 ships as advisory annotations on AskUserQuestion recommendations,
not silent AUTO_DECIDE.
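
Below is a minimal model of the advisory-only policy, assuming hypothetical `Question`, `Resolution` and `resolveQuestion` names. The annotation string is the one quoted in the TODOS.md card. That card also states that explicit per-question preferences are the only AUTO_DECIDE source. As the substrate-risk note points out, real skills express this in prose that an agent may not follow, so the sketch shows intended behavior, not enforcement.

```typescript
// Hypothetical types; the real gstack data model may differ.
interface Question {
  id: string;
  options: string[];
}

type Resolution =
  | { kind: "auto-decide"; choice: string } // explicit preference only
  | { kind: "ask"; annotation?: string };   // the user still decides

function resolveQuestion(
  q: Question,
  explicitPrefs: Map<string, string>, // question id -> declared choice
  inferred: Map<string, string>,      // question id -> profile-inferred choice
): Resolution {
  // Only an explicit, valid per-question preference may skip the question.
  const explicit = explicitPrefs.get(q.id);
  if (explicit !== undefined && q.options.includes(explicit)) {
    return { kind: "auto-decide", choice: explicit };
  }
  // An inferred value only annotates the recommendation; the question is still asked.
  const guess = inferred.get(q.id);
  if (guess !== undefined && q.options.includes(guess)) {
    return { kind: "ask", annotation: `Recommended via your profile: ${guess}` };
  }
  return { kind: "ask" };
}
```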

Per /plan-eng-review + Codex outside-voice 2026-05-26.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Garry Tan
2026-05-26 22:58:05 -07:00
parent 22f8c7f4e1
commit 9cc211f66a
3 changed files with 216 additions and 49 deletions
+18 -1
@@ -582,7 +582,24 @@ reads it yet.
**Effort:** L (human: ~1 week / CC: ~4h)
**Priority:** P0
-**Depends on:** 2+ weeks of v1 dogfood, profile diversity check passing.
+**Depends on:** **90+ days of v1 dogfood stable across 3+ skills** (per
+`docs/designs/PLAN_TUNING_V0.md` §"Deferred to v2" E1 acceptance criteria).
+Distinct from the lighter-weight diversity-display gate
+(`sample_size >= 20 AND skills_covered >= 3 AND question_ids_covered >= 8
+AND days_span >= 7`) used in /plan-tune to render the inferred column:
+display is a UI affordance, while promotion to E1 needs a much higher bar
+because behavioral adaptation is consequential and hard to revert. Prior
+versions of this card cited "2+ weeks", which conflicted with V0; V0 wins.
+**Substrate risk (Codex outside-voice, Phase A review 2026-05-26):** Generated
+skill prose is agent-compliance-based. Tests can verify that templates contain the
+right reads of `~/.gstack/developer-profile.json` and the right decision
+points, but tests cannot prove agents obey them at runtime. E1 ships
+adaptations as **advisory annotations on AskUserQuestion recommendations**
+("Recommended via your profile: <choice>") until there's a hard runtime
+execution path. Do NOT gate any AUTO_DECIDE on inferred profile alone in v1
+of E1; explicit per-question preferences remain the only AUTO_DECIDE
+source.
### E3 — `/plan-tune narrative` + `/plan-tune vibe`