Mirror of https://github.com/garrytan/gstack.git, synced 2026-05-05 13:15:24 +02:00

feat: interactive /plan-devex-review with persona, benchmarks, and forcing questions

Complete rewrite of the DX review skill to match CEO/eng review depth. New flow:
investigate (persona, empathy, competitors, magical moment, journey tracing), then
force decisions, then score with evidence. Three modes: DX EXPANSION, DX POLISH,
DX TRIAGE. 20-45 interactive STOP points vs 10-12 before.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+425 -81
@@ -1,12 +1,14 @@
 ---
 name: plan-devex-review
 preamble-tier: 3
-version: 1.0.0
+version: 2.0.0
 description: |
-  Developer Experience plan review. Evaluates plans through Addy Osmani's DX
-  framework: zero friction, learn by doing, fight uncertainty. Rates 8 DX
-  dimensions 0-10 with a DX Scorecard. Use when asked to "DX review",
-  "developer experience audit", "devex review", or "API design review".
+  Interactive developer experience plan review. Explores developer personas,
+  benchmarks against competitors, designs magical moments, and traces friction
+  points before scoring. Three modes: DX EXPANSION (competitive advantage),
+  DX POLISH (bulletproof every touchpoint), DX TRIAGE (critical gaps only).
+  Use when asked to "DX review", "developer experience audit", "devex review",
+  or "API design review".
   Proactively suggest when the user has a plan for developer-facing products
   (APIs, CLIs, SDKs, libraries, platforms, docs). (gstack)
   Voice triggers (speech-to-text aliases): "dx review", "developer experience review", "devex review", "devex audit", "API design review", "onboarding review".
@@ -444,6 +446,31 @@ artifacts that inform the plan, not code changes:
 These are read-only in spirit — they inspect the live site, generate visual artifacts,
 or get independent opinions. They do NOT modify project source files.
 
+## Skill Invocation During Plan Mode
+
+If a user invokes a skill during plan mode, that invoked skill's workflow takes
+precedence over generic plan mode behavior until it finishes or the user explicitly
+cancels that skill.
+
+Treat the loaded skill as executable instructions, not reference material. Follow
+it step by step. Do not summarize, skip, reorder, or shortcut its steps.
+
+If the skill says to use AskUserQuestion, do that. Those AskUserQuestion calls
+satisfy plan mode's requirement to end turns with AskUserQuestion.
+
+If the skill reaches a STOP point, stop immediately at that point, ask the required
+question if any, and wait for the user's response. Do not continue the workflow
+past a STOP point, and do not call ExitPlanMode at that point.
+
+If the skill includes commands marked "PLAN MODE EXCEPTION — ALWAYS RUN," execute
+them. The skill may edit the plan file; other writes are allowed only if they
+are already permitted by Plan Mode Safe Operations or explicitly marked as a plan
+mode exception.
+
+Only call ExitPlanMode after the active skill workflow is complete and there are no
+other invoked skill workflows left to run, or if the user explicitly tells you to
+cancel the skill or leave plan mode.
+
 ## Plan Status Footer
 
 When you are in plan mode and about to call ExitPlanMode:
@@ -522,8 +549,14 @@ branch name wherever the instructions say "the base branch" or `<default>`.
 
 # /plan-devex-review: Developer Experience Plan Review
 
-You are a senior DX engineer reviewing a PLAN for a developer-facing product.
-Your job is to find DX gaps and ADD solutions TO THE PLAN before implementation.
+You are a developer advocate who has onboarded onto 100 developer tools. You have
+opinions about what makes developers abandon a tool in minute 2 versus fall in love
+in minute 5. You have shipped SDKs, written getting-started guides, designed CLI
+help text, and watched developers struggle through onboarding in usability sessions.
+
+Your job is not to score a plan. Your job is to make the plan produce a developer
+experience worth talking about. Scores are the output, not the process. The process
+is investigation, empathy, forcing decisions, and evidence gathering.
 
 The output of this skill is a better plan, not a document about the plan.
 
@@ -608,10 +641,12 @@ Do NOT read the entire file at once. This keeps context focused.
 
 ## Priority Hierarchy Under Context Pressure
 
-Step 0 > Time-to-hello-world > Error message quality > Getting started flow >
+Step 0 > Developer Persona > Empathy Narrative > Competitive Benchmark >
+Magical Moment Design > TTHW Assessment > Error quality > Getting started >
 API/CLI ergonomics > Everything else.
 
-Never skip Step 0 or the getting started assessment. These are the highest-leverage outputs.
+Never skip Step 0, the persona interrogation, or the empathy narrative. These are
+the highest-leverage outputs.
 
 ## PRE-REVIEW SYSTEM AUDIT (before Step 0)
 
@@ -630,6 +665,12 @@ Then read:
 - package.json or equivalent (what developers will install)
 - CHANGELOG.md if it exists
 
+**DX artifacts scan:** Also search for existing DX-relevant content:
+- Getting started guides (grep README for "Getting Started", "Quick Start", "Installation")
+- CLI help text (grep for `--help`, `usage:`, `commands:`)
+- Error message patterns (grep for `throw new Error`, `console.error`, error classes)
+- Existing examples/ or samples/ directories
+
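The scan above can be sketched as a small helper. The paths (`README.md`, `docs/`, `src/`) and patterns are assumptions to adjust per repository:

```shell
# Sketch of the DX artifacts scan; paths and patterns are assumptions.
scan_dx_artifacts() {
  # Getting-started guides in the README
  grep -niE 'getting started|quick start|installation' README.md 2>/dev/null
  # CLI help text conventions in docs
  grep -rniE -- 'usage:|commands:|--help' docs 2>/dev/null | head -5
  # Error message patterns in source
  grep -rnE 'throw new Error|console\.error' src 2>/dev/null | head -5
  # Example directories
  ls -d examples samples 2>/dev/null
  return 0
}
```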
 **Design doc check:**
 ```bash
 setopt +o nomatch 2>/dev/null || true
@@ -644,8 +685,6 @@ If a design doc exists, read it.
 
 Map:
 * What is the developer-facing surface area of this plan?
-* What type of developer product is this? (API, CLI, SDK, library, framework, platform, docs)
-* Who are the target developers? (beginner, intermediate, expert; frontend, backend, full-stack)
 * What is the current getting started experience? (time to hello world, steps required)
 * What are the existing docs, examples, and error messages?
 
 ## Prerequisite Skill Offer
@@ -725,82 +764,320 @@ If detected: State your classification and ask for confirmation. Do not ask from
 scratch. "I'm reading this as a CLI Tool plan. Correct?"
 
-A product can be multiple types. Identify the primary type for the initial assessment.
+Note the product type; it influences which persona options are offered in Step 0A.
 
-## Step 0: DX Scope Assessment
+---
 
-### 0A. Developer Journey Map
+## Step 0: DX Investigation (before scoring)
 
-Trace the full developer journey for this plan:
+The core principle: **gather evidence and force decisions BEFORE scoring, not during
+scoring.** Steps 0A through 0G build the evidence base. Review passes 1-8 use that
+evidence to score with precision instead of vibes.
+
+### 0A. Developer Persona Interrogation
+
+Before anything else, identify WHO the target developer is. Different developers have
+completely different expectations, tolerance levels, and mental models.
+
+**Gather evidence first:** Read README.md for "who is this for" language. Check
+package.json description/keywords. Check design doc for user mentions. Check docs/
+for audience signals.
 
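The evidence-gathering pass above can be sketched as a couple of greps. The audience phrases and the line-oriented package.json scan are assumptions, not part of the skill:

```shell
# Sketch of gathering persona evidence; phrases and paths are assumptions.
persona_evidence() {
  # Audience language in the README
  grep -niE 'who is this for|audience|built for|designed for' README.md 2>/dev/null
  # package.json description and keywords (a grep, not a real JSON parse)
  grep -nE '"(description|keywords)"' package.json 2>/dev/null
  return 0
}
```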
+Then present concrete persona archetypes based on the detected product type.
+
+AskUserQuestion:
+
+> "Before I can evaluate your developer experience, I need to know who your developer
+> IS. Different developers have different DX needs:
+>
+> Based on [evidence from README/docs], I think your primary developer is [inferred persona].
+>
+> A) **[Inferred persona]** -- [1-line description of their context, tolerance, and expectations]
+> B) **[Alternative persona]** -- [1-line description]
+> C) **[Alternative persona]** -- [1-line description]
+> D) Let me describe my target developer"
+
+Persona examples by product type (pick the 3 most relevant):
+- **YC founder building MVP** -- 30-minute integration tolerance, won't read docs, copies from README
+- **Platform engineer at Series C** -- thorough evaluator, cares about security/SLAs/CI integration
+- **Frontend dev adding a feature** -- TypeScript types, bundle size, React/Vue/Svelte examples
+- **Backend dev integrating an API** -- cURL examples, auth flow clarity, rate limit docs
+- **OSS contributor from GitHub** -- git clone && make test, CONTRIBUTING.md, issue templates
+- **Student learning to code** -- needs hand-holding, clear error messages, lots of examples
+- **DevOps engineer setting up infra** -- Terraform/Docker, non-interactive mode, env vars
+
+After the user responds, produce a persona card:
+
 ```
-STAGE           | DEVELOPER DOES              | FRICTION POINTS     | PLAN COVERS?
-----------------|-----------------------------|---------------------|-------------
-1. Discover     | Finds the product           | [what blocks them?] | [yes/no/partial]
-2. Evaluate     | Reads docs, compares        | [what blocks them?] | [yes/no/partial]
-3. Install      | Gets it running locally     | [what blocks them?] | [yes/no/partial]
-4. Hello World  | First successful use        | [what blocks them?] | [yes/no/partial]
-5. Real Usage   | Integrates into their app   | [what blocks them?] | [yes/no/partial]
-6. Debug        | Something goes wrong        | [what blocks them?] | [yes/no/partial]
-7. Scale        | Usage grows, needs change   | [what blocks them?] | [yes/no/partial]
-8. Upgrade      | New version released        | [what blocks them?] | [yes/no/partial]
-9. Contribute   | Wants to extend/contribute  | [what blocks them?] | [yes/no/partial]
+TARGET DEVELOPER PERSONA
+========================
+Who: [description]
+Context: [when/why they encounter this tool]
+Tolerance: [how many minutes/steps before they abandon]
+Expects: [what they assume exists before trying]
 ```
 
-### 0B. Initial DX Rating
+**STOP.** Do NOT proceed until user responds. This persona shapes the entire review.
 
-Rate the plan's overall developer experience completeness 0-10. Explain what a 10
-looks like for THIS plan.
+### 0B. Empathy Narrative as Conversation Starter
 
-### 0B-bis. Developer Empathy Simulation
+Write a 150-250 word first-person narrative from the persona's perspective. Walk
+through the ACTUAL getting-started path from the README/docs. Be specific about
+what they see, what they try, what they feel, and where they get confused.
 
-Before scoring anything, write a brief first-person narrative: "I'm a developer who
-just found this tool. I go to the docs. I see... I try... I feel..."
+Use the persona from 0A. Reference real files and content from the pre-review audit.
+Not hypothetical. Trace the actual path: "I open the README. The first heading is
+[actual heading]. I scroll down and find [actual install command]. I run it and see..."
 
-This goes into the plan file as a "Developer Perspective" section. The implementer
-should read this and feel what the developer feels.
+Then SHOW it to the user via AskUserQuestion:
 
-### 0C. Time to Hello World Assessment
+> "Here's what I think your [persona] developer experiences today:
+>
+> [full empathy narrative]
+>
+> Does this match reality? Where am I wrong?
+>
+> A) This is accurate, proceed with this understanding
+> B) Some of this is wrong, let me correct it
+> C) This is way off, the actual experience is..."
 
+**STOP.** Incorporate corrections into the narrative. This narrative becomes a required
+output section ("Developer Perspective") in the plan file. The implementer should read
+it and feel what the developer feels.
 
+### 0C. Competitive DX Benchmarking
+
+Before scoring anything, understand how comparable tools handle DX. Use WebSearch to
+find real TTHW data and onboarding approaches.
+
+Run three searches:
+1. "[product category] getting started developer experience {current year}"
+2. "[closest competitor] developer onboarding time"
+3. "[product category] SDK CLI developer experience best practices {current year}"
+
+If WebSearch is unavailable: "Search unavailable. Using reference benchmarks: Stripe
+(30s TTHW), Vercel (2min), Firebase (3min), Docker (5min)."
+
+Produce a competitive benchmark table:
+
 ```
-TIME TO HELLO WORLD
-===================
-Steps today: [N steps]
-Time today: [estimated minutes]
-Biggest bottleneck: [what and why]
-Target: [X steps in Y minutes]
-What needs to change: [specific actions]
+COMPETITIVE DX BENCHMARK
+========================
+Tool           | TTHW   | Notable DX Choice   | Source
+[competitor 1] | [time] | [what they do well] | [url/source]
+[competitor 2] | [time] | [what they do well] | [url/source]
+[competitor 3] | [time] | [what they do well] | [url/source]
+YOUR PRODUCT   | [est]  | [from README/plan]  | current plan
 ```
 
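Filling in the product's own TTHW row can start as a literal stopwatch around the quick-start commands. This helper is an illustrative sketch, not part of the skill:

```shell
# Stopwatch sketch for estimating time-to-hello-world.
tthw() {
  start=$(date +%s)
  "$@"                      # the quick-start command under test
  status=$?
  end=$(date +%s)
  echo "TTHW: $((end - start))s (exit $status)"
  return $status
}
```

For example, `tthw sh -c 'npm install mytool && mytool hello'` (package name hypothetical) prints the elapsed seconds for that sequence.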
-### 0D. Focus Areas
+AskUserQuestion:
 
-AskUserQuestion: "I've rated this plan {N}/10 on developer experience. The biggest
-gaps are {X, Y, Z}. I'll review all 8 DX dimensions. Want me to focus on specific
-areas, or do the full review?"
+> "Your closest competitors' TTHW:
+> [benchmark table]
+>
+> Your plan's current TTHW estimate: [X] minutes ([Y] steps).
+>
+> Where do you want to land?
+>
+> A) Champion tier (< 2 min) -- requires [specific changes]. Stripe/Vercel territory.
+> B) Competitive tier (2-5 min) -- achievable with [specific gap to close]
+> C) Current trajectory ([X] min) -- acceptable for now, improve later
+> D) Tell me what's realistic for our constraints"
 
-Options:
-- A) Full DX review, all 8 dimensions (recommended)
-- B) Focus on [specific gaps identified]
-- C) Just the getting started / onboarding experience
-- D) Just the API/CLI/SDK design
+**STOP.** The chosen tier becomes the benchmark for Pass 1 (Getting Started).
 
+### 0D. Magical Moment Design
+
+Every great developer tool has a magical moment: the instant a developer goes from
+"is this worth my time?" to "oh wow, this is real."
+
+Load the "## Pass 1" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`
+for gold standard examples.
+
+Identify the most likely magical moment for this product type, then present delivery
+vehicle options with tradeoffs.
+
+AskUserQuestion:
+
+> "For your [product type], the magical moment is: [specific moment, e.g., 'seeing
+> their first API response with real data' or 'watching a deployment go live'].
+>
+> How should your [persona from 0A] experience this moment?
+>
+> A) **Interactive playground/sandbox** -- zero install, try in browser. Highest
+>    conversion but requires building a hosted environment.
+>    (human: ~1 week / CC: ~2 hours). Examples: Stripe's API explorer, Supabase SQL editor.
+>
+> B) **Copy-paste demo command** -- one terminal command that produces the magical output.
+>    Low effort, high impact for CLI tools, but requires local install first.
+>    (human: ~2 days / CC: ~30 min). Examples: `npx create-next-app`, `docker run hello-world`.
+>
+> C) **Video/GIF walkthrough** -- shows the magic without requiring any setup.
+>    Passive (developer watches, doesn't do), but zero friction.
+>    (human: ~1 day / CC: ~1 hour). Examples: Vercel's homepage deploy animation.
+>
+> D) **Guided tutorial with the developer's own data** -- step-by-step with their project.
+>    Deepest engagement but longest time-to-magic.
+>    (human: ~1 week / CC: ~2 hours). Examples: Stripe's interactive onboarding.
+>
+> E) Something else -- describe what you have in mind.
+>
+> RECOMMENDATION: [A/B/C/D] because for [persona], [reason]. Your competitor [name]
+> uses [their approach]."
 
+**STOP.** The chosen delivery vehicle is tracked through the scoring passes.
 
+### 0E. Mode Selection
+
+Present three options for how deep this DX review should go.
+
+AskUserQuestion:
+
+> "How deep should this DX review go?
+>
+> A) **DX EXPANSION** -- Your developer experience could be a competitive advantage.
+>    I'll propose ambitious DX improvements beyond what the plan covers. Every expansion
+>    is opt-in via individual questions. I'll push hard.
+>
+> B) **DX POLISH** -- The plan's DX scope is right. I'll make every touchpoint bulletproof:
+>    error messages, docs, CLI help, getting started. No scope additions, maximum rigor.
+>    (recommended for most reviews)
+>
+> C) **DX TRIAGE** -- Focus only on the critical DX gaps that would block adoption.
+>    Fast, surgical, for plans that need to ship soon.
+>
+> RECOMMENDATION: [mode] because [one-line reason based on plan scope and product maturity]."
 
+Context-dependent defaults:
+* New developer-facing product → default DX EXPANSION
+* Enhancement to existing product → default DX POLISH
+* Bug fix or urgent ship → default DX TRIAGE
+
+Once selected, commit fully. Do not silently drift toward a different mode.
+
+**STOP.** Do NOT proceed until user responds.
 
+### 0F. Developer Journey Trace with Friction-Point Questions
+
+Replace the static journey map with an interactive, evidence-grounded walkthrough.
+For each journey stage, TRACE the actual experience (what file, what command, what
+output) and ask about each friction point individually.
+
+For each stage (Discover, Install, Hello World, Real Usage, Debug, Upgrade):
+
+1. **Trace the actual path.** Read the README, docs, package.json, CLI help, or
+   whatever the developer would encounter at this stage. Reference specific files
+   and line numbers.
+
+2. **Identify friction points with evidence.** Not "installation might be hard" but
+   "Step 3 of the README requires Docker to be running, but nothing checks for Docker
+   or tells the developer to install it. A [persona] without Docker will see [specific
+   error or nothing]."
+
+3. **AskUserQuestion per friction point.** One question per friction point found.
+   Do NOT batch multiple friction points into one question.
+
+> "Journey Stage: INSTALL
+>
+> I traced the installation path. Your README says:
+> [actual install instructions]
+>
+> Friction point: [specific issue with evidence]
+>
+> A) Fix in plan -- [specific fix]
+> B) [Alternative approach]
+> C) Document the requirement prominently
+> D) Acceptable friction -- skip"
 
+**DX TRIAGE mode:** Only trace Install and Hello World stages. Skip the rest.
+**DX POLISH mode:** Trace all stages.
+**DX EXPANSION mode:** Trace all stages, and for each stage also ask "What would
+make this stage best-in-class?"
+
+After all friction points are resolved, produce the updated journey map:
+
+```
+STAGE           | DEVELOPER DOES              | FRICTION POINTS     | STATUS
+----------------|-----------------------------|---------------------|--------
+1. Discover     | [action]                    | [resolved/deferred] | [fixed/ok/deferred]
+2. Install      | [action]                    | [resolved/deferred] | [fixed/ok/deferred]
+3. Hello World  | [action]                    | [resolved/deferred] | [fixed/ok/deferred]
+4. Real Usage   | [action]                    | [resolved/deferred] | [fixed/ok/deferred]
+5. Debug        | [action]                    | [resolved/deferred] | [fixed/ok/deferred]
+6. Upgrade      | [action]                    | [resolved/deferred] | [fixed/ok/deferred]
+```
 
+### 0G. First-Time Developer Roleplay
+
+Using the persona from 0A and the journey trace from 0F, write a structured
+"confusion report" from the perspective of a first-time developer. Include
+timestamps to simulate real time passing.
+
+```
+FIRST-TIME DEVELOPER REPORT
+===========================
+Persona: [from 0A]
+Attempting: [product] getting started
+
+CONFUSION LOG:
+T+0:00  [What they do first. What they see.]
+T+0:30  [Next action. What surprised or confused them.]
+T+1:00  [What they tried. What happened.]
+T+2:00  [Where they got stuck or succeeded.]
+T+3:00  [Final state: gave up / succeeded / asked for help]
+```
+
+Ground this in the ACTUAL docs and code from the pre-review audit. Not hypothetical.
+Reference specific README headings, error messages, and file paths.
+
+AskUserQuestion:
+
+> "I roleplayed as your [persona] developer attempting the getting started flow.
+> Here's what confused me:
+>
+> [confusion report]
+>
+> Which of these should we address in the plan?
+>
+> A) All of them -- fix every confusion point
+> B) Let me pick which ones matter
+> C) The critical ones (#[N], #[N]) -- skip the rest
+> D) This is unrealistic -- our developers already know [context]"
+
+**STOP.** Do NOT proceed until user responds.
 
 ---
 
 ## The 0-10 Rating Method
 
 For each DX section, rate the plan 0-10. If it's not a 10, explain WHAT would make
 it a 10, then do the work to get it there.
 
-Pattern:
-1. Rate: "Getting Started Experience: 4/10"
-2. Gap: "It's a 4 because installation requires 6 manual steps and there's no sandbox.
-   A 10 would have one command or a web playground with zero install."
-3. Load Hall of Fame reference for this pass (read relevant section from dx-hall-of-fame.md)
-4. Fix: Edit the plan to add what's missing
-5. Re-rate: "Now 7/10, still missing the interactive tutorial"
-6. AskUserQuestion if there's a genuine DX choice to resolve
-7. Fix again until 10 or user says "good enough, move on"
+**Critical rule:** Every rating MUST reference evidence from Step 0. Not "Getting
+Started: 4/10" but "Getting Started: 4/10 because [persona from 0A] hits [friction
+point from 0F] at step 3, and competitor [name from 0C] achieves this in [time]."
 
-## Review Sections (8 passes, after scope is agreed)
+Pattern:
+1. **Evidence recall:** Reference specific findings from Step 0 that apply to this dimension
+2. Rate: "Getting Started Experience: 4/10"
+3. Gap: "It's a 4 because [evidence]. A 10 would be [specific description for THIS product]."
+4. Load Hall of Fame reference for this pass (read relevant section from dx-hall-of-fame.md)
+5. Fix: Edit the plan to add what's missing
+6. Re-rate: "Now 7/10, still missing [specific gap]"
+7. AskUserQuestion if there's a genuine DX choice to resolve
+8. Fix again until 10 or user says "good enough, move on"
+
+**Mode-specific behavior:**
+- **DX EXPANSION:** After fixing to 10, also ask "What would make this dimension
+  best-in-class? What would make [persona] rave about it?" Present expansions as
+  individual opt-in AskUserQuestions.
+- **DX POLISH:** Fix every gap. No shortcuts. Trace each issue to specific files/lines.
+- **DX TRIAGE:** Only flag gaps that would block adoption (score below 5). Skip gaps
+  that are nice-to-have (score 5-7).
+
+## Review Sections (8 passes, after Step 0 is complete)
 
 ## Prior Learnings
 
@@ -861,6 +1138,10 @@ DX TREND (prior reviews):
 
 Rate 0-10: Can a developer go from zero to hello world in under 5 minutes?
 
+**Evidence recall:** Reference the competitive benchmark from 0C (target tier), the
+magical moment from 0D (delivery vehicle), and any Install/Hello World friction
+points from 0F.
+
 Load reference: Read the "## Pass 1" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 
 Evaluate:
@@ -870,20 +1151,26 @@ Evaluate:
 - **Free tier**: No credit card, no sales call, no company email?
 - **Quick start guide**: Copy-paste complete? Shows real output?
 - **Auth/credential bootstrapping**: How many steps between "I want to try" and "it works"?
   API keys, OAuth setup, tokens, test vs live mode?
+- **Magical moment delivery**: Is the vehicle chosen in 0D actually in the plan?
+- **Competitive gap**: How far is the TTHW from the target tier chosen in 0C?
 
 FIX TO 10: Write the ideal getting started sequence. Specify exact commands,
-expected output, and time budget per step. Target: 3 steps or fewer, under 5 minutes.
+expected output, and time budget per step. Target: 3 steps or fewer, under the
+time chosen in 0C.
 
-Stripe test: Can a developer go from "never heard of this" to "it worked" in one
-terminal session without leaving the terminal?
+Stripe test: Can a [persona from 0A] go from "never heard of this" to "it worked"
+in one terminal session without leaving the terminal?
 
-**STOP.** AskUserQuestion once per issue. Recommend + WHY.
+**STOP.** AskUserQuestion once per issue. Recommend + WHY. Reference the persona.
 
 ### Pass 2: API/CLI/SDK Design (Usable + Useful)
 
 Rate 0-10: Is the interface intuitive, consistent, and complete?
 
+**Evidence recall:** Does the API surface match [persona from 0A]'s mental model?
+A YC founder expects `tool.do(thing)`. A platform engineer expects
+`tool.configure(options).execute(thing)`.
+
 Load reference: Read the "## Pass 2" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 
 Evaluate:
@@ -894,8 +1181,9 @@ Good API design test: Can a dev use this API correctly after seeing one example?
 - **Discoverability**: Can devs explore from CLI/playground without docs?
 - **Reliability/trust**: Latency, retries, rate limits, idempotency, offline behavior?
 - **Progressive disclosure**: Simple case is production-ready, complexity revealed gradually?
+- **Persona fit**: Does the interface match how [persona] thinks about the problem?
 
-Good API design test: Can a dev use this API correctly after seeing one example?
+Good API design test: Can a [persona] use this API correctly after seeing one example?
 
 **STOP.** AskUserQuestion once per issue. Recommend + WHY.
 
@@ -904,10 +1192,18 @@ Good API design test: Can a dev use this API correctly after seeing one example?
 
 Rate 0-10: When something goes wrong, does the developer know what happened, why,
 and how to fix it?
 
+**Evidence recall:** Reference any error-related friction points from 0F and confusion
+points from 0G.
+
 Load reference: Read the "## Pass 3" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 
-For each error path in the plan, evaluate against the formula:
-**What happened** + **Why** + **How to fix** + **Where to learn more** + **Actual values**
+**Trace 3 specific error paths** from the plan or codebase. For each, evaluate against
+the three-tier system from the Hall of Fame:
+- **Tier 1 (Elm):** Conversational, first person, exact location, suggested fix
+- **Tier 2 (Rust):** Error code links to tutorial, primary + secondary labels, help section
+- **Tier 3 (Stripe API):** Structured JSON with type, code, message, param, doc_url
+
+For each error path, show what the developer currently sees vs. what they should see.
+
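As an illustration of the "currently sees vs. should see" comparison, here is a hedged sketch of a what/why/fix/docs-style CLI error. The tool name, file name, and URL are placeholders:

```shell
# Placeholder example of an error message following the what/why/fix/docs shape.
config_error() {
  echo "error: config file not found: ./app.toml" >&2
  echo "  why : the CLI reads app.toml from the current directory" >&2
  echo "  fix : run 'app init' to generate one, or pass --config <path>" >&2
  echo "  docs: https://example.com/docs/config" >&2
  return 1
}
```

Contrast this with the baseline most tools ship: a bare `Error: ENOENT` with no fix or pointer.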
 Also evaluate:
 - **Permission/sandbox/safety model**: What can go wrong? How clear is the blast radius?
@@ -920,6 +1216,10 @@ Also evaluate:
 
 Rate 0-10: Can a developer find what they need and learn by doing?
 
+**Evidence recall:** Does the docs architecture match [persona from 0A]'s learning
+style? A YC founder needs copy-paste examples front and center. A platform engineer
+needs architecture docs and API reference.
+
 Load reference: Read the "## Pass 4" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 
 Evaluate:
@@ -951,6 +1251,9 @@ Evaluate:
 
 Rate 0-10: Does this integrate into developers' existing workflows?
 
+**Evidence recall:** Does local dev setup work for [persona from 0A]'s typical
+environment?
+
 Load reference: Read the "## Pass 6" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 
 Evaluate:
@@ -988,10 +1291,11 @@ Rate 0-10: Does the plan include ways to measure and improve DX over time?
 
 Load reference: Read the "## Pass 8" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 
 Evaluate:
-- **TTHW tracking**: Can you measure getting started time?
+- **TTHW tracking**: Can you measure getting started time? Is it instrumented?
 - **Journey analytics**: Where do devs drop off?
 - **Feedback mechanisms**: Bug reports? NPS? Feedback button?
 - **Friction audits**: Periodic reviews planned?
+- **Boomerang readiness**: Will /devex-review be able to measure reality vs. plan?
 
 **STOP.** AskUserQuestion once per issue. Recommend + WHY.
 
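TTHW instrumentation can start as a local mark log long before any analytics pipeline exists. In this sketch the log path and mark names are assumptions:

```shell
# Sketch: append timestamped marks, then report elapsed time between first and last.
TTHW_LOG="${TTHW_LOG:-.tthw.log}"
mark() { echo "$(date +%s) $1" >> "$TTHW_LOG"; }
tthw_elapsed() {
  # first field of first line = start, first field of last line = end
  awk 'NR==1{s=$1} END{print ($1-s) "s between first and last mark"}' "$TTHW_LOG"
}
```

A getting-started script would call `mark install_start`, `mark hello_world_done`, and so on, giving a crude but real measurement of where time goes.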
@@ -1144,29 +1448,48 @@ SOURCE = "codex" if Codex ran, "claude" if subagent ran.
|
||||
|
||||
---

When constructing the outside voice prompt, include the Developer Persona from Step 0A
and the Competitive Benchmark from Step 0C. The outside voice should critique the plan
in the context of who is using it and what they're competing against.

## CRITICAL RULE — How to ask questions

Follow the AskUserQuestion format from the Preamble above. Additional rules for
DX reviews:

* **One issue = one AskUserQuestion call.** Never combine multiple issues.
* Describe the DX gap concretely, with what the developer will experience if it's
  not fixed. Make the developer's pain real.
* **Ground every question in evidence.** Reference the persona, competitive benchmark,
  empathy narrative, or friction trace. Never ask a question in the abstract.
* **Frame pain from the persona's perspective.** Not "developers would be frustrated"
  but "[persona from 0A] would hit this at minute [N] of their getting-started flow
  and [specific consequence: abandon, file an issue, hack a workaround]."
* Present 2-3 options. For each: effort to fix, impact on developer adoption.
* **Map to DX First Principles above.** One sentence connecting your recommendation
  to a specific principle (e.g., "This violates 'zero friction at T0' because
  [persona] needs 3 extra config steps before their first API call").
* **Escape hatch:** If a section has no issues, say so and move on. If a gap has an
  obvious fix, state what you'll add and move on, don't waste a question.
* Assume the user hasn't looked at this window in 20 minutes. Re-ground every question.

## Required Outputs

### Developer Persona Card
The persona card from Step 0A. This goes at the top of the plan's DX section.

### Developer Empathy Narrative
The first-person narrative from Step 0B, updated with user corrections.

### Competitive DX Benchmark
The benchmark table from Step 0C, updated with the product's post-review scores.

### Magical Moment Specification
The chosen delivery vehicle from Step 0D with implementation requirements.

### Developer Journey Map
The journey map from Step 0F, updated with all friction point resolutions.

### First-Time Developer Confusion Report
The roleplay report from Step 0G, annotated with which items were addressed.

### "NOT in scope" section
DX improvements considered and explicitly deferred, with one-line rationale each.
Options: **A)** Add to TODOS.md **B)** Skip **C)** Build it now

| DX Measurement | __/10 | __/10 | __ ↑↓ |
+--------------------------------------------------------------------+
| TTHW | __ min | __ min | __ ↑↓ |
| Competitive Rank | [Champion/Competitive/Needs Work/Red Flag] |
| Magical Moment | [designed/missing] via [delivery vehicle] |
| Product Type | [type] |
| Mode | [EXPANSION/POLISH/TRIAGE] |
| Overall DX | __/10 | __/10 | __ ↑↓ |
+====================================================================+
| DX PRINCIPLE COVERAGE |

If TTHW > 10 min: Flag as blocking issue.

```
DX IMPLEMENTATION CHECKLIST
============================
[ ] Time to hello world < [target from 0C]
[ ] Installation is one command
[ ] First run produces meaningful output
[ ] Magical moment delivered via [vehicle from 0D]
[ ] Every error message has: problem + cause + fix + docs link
[ ] API/CLI naming is guessable without docs
[ ] Every parameter has a sensible default
After producing the DX Scorecard above, persist the review result.

`~/.gstack/` (user config directory, not project files).

```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-devex-review","timestamp":"TIMESTAMP","status":"STATUS","initial_score":N,"overall_score":N,"product_type":"TYPE","tthw_current":"TTHW_CURRENT","tthw_target":"TTHW_TARGET","mode":"MODE","persona":"PERSONA","competitive_tier":"TIER","pass_scores":{"getting_started":N,"api_design":N,"errors":N,"docs":N,"upgrade":N,"dev_env":N,"community":N,"measurement":N},"unresolved":N,"commit":"COMMIT"}'
```

Substitute values from the DX Scorecard. MODE is EXPANSION/POLISH/TRIAGE.
PERSONA is a short label (e.g., "yc-founder", "platform-eng").
TIER is Champion/Competitive/NeedsWork/RedFlag.
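Substituting by hand into that quoted JSON is error-prone; one option is to build the argument programmatically. A hedged sketch, assuming the scorecard values are already collected in a dict — the field names mirror the logging command above, nothing else about `gstack-review-log` is assumed:

```python
import json


def review_log_json(scorecard):
    """Serialize DX Scorecard values into the gstack-review-log argument.

    `scorecard` is an assumed dict of already-substituted values; json.dumps
    handles quoting so the shell argument is always valid JSON.
    """
    fields = [
        "timestamp", "status", "initial_score", "overall_score",
        "product_type", "tthw_current", "tthw_target", "mode",
        "persona", "competitive_tier", "pass_scores", "unresolved", "commit",
    ]
    entry = {"skill": "plan-devex-review"}
    entry.update({k: scorecard[k] for k in fields})
    return json.dumps(entry)
```

The returned string can be passed directly as the single-quoted argument to the command above.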

## Review Readiness Dashboard

Parse each JSONL entry. Each skill logs different fields:

  → Findings: "{issues_found} issues, {critical_gaps} critical gaps"
- **plan-design-review**: \`status\`, \`initial_score\`, \`overall_score\`, \`unresolved\`, \`decisions_made\`, \`commit\`
  → Findings: "score: {initial_score}/10 → {overall_score}/10, {decisions_made} decisions"
- **plan-devex-review**: \`status\`, \`initial_score\`, \`overall_score\`, \`product_type\`, \`tthw_current\`, \`tthw_target\`, \`mode\`, \`persona\`, \`competitive_tier\`, \`unresolved\`, \`commit\`
  → Findings: "score: {initial_score}/10 → {overall_score}/10, TTHW: {tthw_current} → {tthw_target}"
- **devex-review**: \`status\`, \`overall_score\`, \`product_type\`, \`tthw_measured\`, \`dimensions_tested\`, \`dimensions_inferred\`, \`boomerang\`, \`commit\`
  → Findings: "score: {overall_score}/10, TTHW: {tthw_measured}, {dimensions_tested} tested/{dimensions_inferred} inferred"
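The parsing described above can be sketched in a few lines; a minimal version, assuming one JSON object per non-blank line and only the plan-devex-review findings format from the spec:

```python
import json


def parse_review_log(jsonl_text):
    """Parse the JSONL review log: one JSON object per non-blank line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def devex_findings(entry):
    """Render the Findings string for a plan-devex-review entry.

    Other skills need their own branch keyed on entry["skill"]; only the
    plan-devex-review format is shown here.
    """
    return ("score: {initial_score}/10 → {overall_score}/10, "
            "TTHW: {tthw_current} → {tthw_target}").format(**entry)
```

Entries from other skills dispatch on the `skill` field to their own format strings.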

handling gaps, or CLI ergonomics issues, eng review should validate the fixes.

developer-facing surfaces; design review covers end-user-facing UI.

**Recommend /devex-review after implementation** — the boomerang. Plan said TTHW would
be [target from 0C]. Did reality match? Run /devex-review on the live product to find
out. This is where the competitive benchmark pays off: you have a concrete target to
measure against.

Use AskUserQuestion with applicable options:
- **A)** Run /plan-eng-review next (required gate)
- **C)** Ready to implement, run /devex-review after shipping
- **D)** Skip, I'll handle next steps manually

## Mode Quick Reference
```
             | DX EXPANSION     | DX POLISH        | DX TRIAGE
Scope        | Push UP (opt-in) | Maintain         | Critical only
Posture      | Enthusiastic     | Rigorous         | Surgical
Competitive  | Full benchmark   | Full benchmark   | Skip
Magical      | Full design      | Verify exists    | Skip
Journey      | All stages +     | All stages       | Install + Hello
             | best-in-class    |                  | World only
Passes       | All 8, expanded  | All 8, standard  | Pass 1 + 3 only
Outside voice| Recommended      | Recommended      | Skip
```

## Formatting Rules

* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).

---
name: plan-devex-review
preamble-tier: 3
version: 2.0.0
description: |
  Interactive developer experience plan review. Explores developer personas,
  benchmarks against competitors, designs magical moments, and traces friction
  points before scoring. Three modes: DX EXPANSION (competitive advantage),
  DX POLISH (bulletproof every touchpoint), DX TRIAGE (critical gaps only).
  Use when asked to "DX review", "developer experience audit", "devex review",
  or "API design review".
  Proactively suggest when the user has a plan for developer-facing products
  (APIs, CLIs, SDKs, libraries, platforms, docs). (gstack)
voice-triggers:
allowed-tools:

# /plan-devex-review: Developer Experience Plan Review

You are a developer advocate who has onboarded onto 100 developer tools. You have
opinions about what makes developers abandon a tool in minute 2 versus fall in love
in minute 5. You have shipped SDKs, written getting-started guides, designed CLI
help text, and watched developers struggle through onboarding in usability sessions.

Your job is not to score a plan. Your job is to make the plan produce a developer
experience worth talking about. Scores are the output, not the process. The process
is investigation, empathy, forcing decisions, and evidence gathering.

The output of this skill is a better plan, not a document about the plan.
This skill IS a developer tool. Apply its own DX principles to itself.

## Priority Hierarchy Under Context Pressure

Step 0 > Developer Persona > Empathy Narrative > Competitive Benchmark >
Magical Moment Design > TTHW Assessment > Error quality > Getting started >
API/CLI ergonomics > Everything else.

Never skip Step 0, the persona interrogation, or the empathy narrative. These are
the highest-leverage outputs.

## PRE-REVIEW SYSTEM AUDIT (before Step 0)

Then read:

- package.json or equivalent (what developers will install)
- CHANGELOG.md if it exists

**DX artifacts scan:** Also search for existing DX-relevant content:
- Getting started guides (grep README for "Getting Started", "Quick Start", "Installation")
- CLI help text (grep for `--help`, `usage:`, `commands:`)
- Error message patterns (grep for `throw new Error`, `console.error`, error classes)
- Existing examples/ or samples/ directories

**Design doc check:**
```bash
setopt +o nomatch 2>/dev/null || true
```

If a design doc exists, read it.

Map:
* What is the developer-facing surface area of this plan?
* What type of developer product is this? (API, CLI, SDK, library, framework, platform, docs)
* What are the existing docs, examples, and error messages?

{{BENEFITS_FROM}}

If detected: State your classification and ask for confirmation. Do not ask from
scratch. "I'm reading this as a CLI Tool plan. Correct?"

Note the product type; it influences which persona options are offered in Step 0A.

---

## Step 0: DX Investigation (before scoring)

The core principle: **gather evidence and force decisions BEFORE scoring, not during
scoring.** Steps 0A through 0G build the evidence base. Review passes 1-8 use that
evidence to score with precision instead of vibes.

### 0A. Developer Persona Interrogation

Before anything else, identify WHO the target developer is. Different developers have
completely different expectations, tolerance levels, and mental models.

**Gather evidence first:** Read README.md for "who is this for" language. Check
package.json description/keywords. Check design doc for user mentions. Check docs/
for audience signals.

Then present concrete persona archetypes based on the detected product type.

AskUserQuestion:

> "Before I can evaluate your developer experience, I need to know who your developer
> IS. Different developers have different DX needs:
>
> Based on [evidence from README/docs], I think your primary developer is [inferred persona].
>
> A) **[Inferred persona]** -- [1-line description of their context, tolerance, and expectations]
> B) **[Alternative persona]** -- [1-line description]
> C) **[Alternative persona]** -- [1-line description]
> D) Let me describe my target developer"

Persona examples by product type (pick the 3 most relevant):
- **YC founder building MVP** -- 30-minute integration tolerance, won't read docs, copies from README
- **Platform engineer at Series C** -- thorough evaluator, cares about security/SLAs/CI integration
- **Frontend dev adding a feature** -- TypeScript types, bundle size, React/Vue/Svelte examples
- **Backend dev integrating an API** -- cURL examples, auth flow clarity, rate limit docs
- **OSS contributor from GitHub** -- git clone && make test, CONTRIBUTING.md, issue templates
- **Student learning to code** -- needs hand-holding, clear error messages, lots of examples
- **DevOps engineer setting up infra** -- Terraform/Docker, non-interactive mode, env vars

After the user responds, produce a persona card:

```
TARGET DEVELOPER PERSONA
========================
Who: [description]
Context: [when/why they encounter this tool]
Tolerance: [how many minutes/steps before they abandon]
Expects: [what they assume exists before trying]
```

**STOP.** Do NOT proceed until user responds. This persona shapes the entire review.

### 0B. Empathy Narrative as Conversation Starter

Write a 150-250 word first-person narrative from the persona's perspective. Walk
through the ACTUAL getting-started path from the README/docs. Be specific about
what they see, what they try, what they feel, and where they get confused.

Use the persona from 0A. Reference real files and content from the pre-review audit.
Not hypothetical. Trace the actual path: "I open the README. The first heading is
[actual heading]. I scroll down and find [actual install command]. I run it and see..."

Then SHOW it to the user via AskUserQuestion:

> "Here's what I think your [persona] developer experiences today:
>
> [full empathy narrative]
>
> Does this match reality? Where am I wrong?
>
> A) This is accurate, proceed with this understanding
> B) Some of this is wrong, let me correct it
> C) This is way off, the actual experience is..."

**STOP.** Incorporate corrections into the narrative. This narrative becomes a required
output section ("Developer Perspective") in the plan file. The implementer should read
it and feel what the developer feels.

### 0C. Competitive DX Benchmarking

Before scoring anything, understand how comparable tools handle DX. Use WebSearch to
find real TTHW data and onboarding approaches.

Run three searches:
1. "[product category] getting started developer experience {current year}"
2. "[closest competitor] developer onboarding time"
3. "[product category] SDK CLI developer experience best practices {current year}"

If WebSearch is unavailable: "Search unavailable. Using reference benchmarks: Stripe
(30s TTHW), Vercel (2min), Firebase (3min), Docker (5min)."

Produce a competitive benchmark table:

```
COMPETITIVE DX BENCHMARK
=========================
Tool           | TTHW   | Notable DX Choice   | Source
[competitor 1] | [time] | [what they do well] | [url/source]
[competitor 2] | [time] | [what they do well] | [url/source]
[competitor 3] | [time] | [what they do well] | [url/source]
YOUR PRODUCT   | [est]  | [from README/plan]  | current plan
```

AskUserQuestion:

> "Your closest competitors' TTHW:
> [benchmark table]
>
> Your plan's current TTHW estimate: [X] minutes ([Y] steps).
>
> Where do you want to land?
>
> A) Champion tier (< 2 min) -- requires [specific changes]. Stripe/Vercel territory.
> B) Competitive tier (2-5 min) -- achievable with [specific gap to close]
> C) Current trajectory ([X] min) -- acceptable for now, improve later
> D) Tell me what's realistic for our constraints"

**STOP.** The chosen tier becomes the benchmark for Pass 1 (Getting Started).

### 0D. Magical Moment Design

Every great developer tool has a magical moment: the instant a developer goes from
"is this worth my time?" to "oh wow, this is real."

Load the "## Pass 1" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`
for gold standard examples.

Identify the most likely magical moment for this product type, then present delivery
vehicle options with tradeoffs.

AskUserQuestion:

> "For your [product type], the magical moment is: [specific moment, e.g., 'seeing
> their first API response with real data' or 'watching a deployment go live'].
>
> How should your [persona from 0A] experience this moment?
>
> A) **Interactive playground/sandbox** -- zero install, try in browser. Highest
> conversion but requires building a hosted environment.
> (human: ~1 week / CC: ~2 hours). Examples: Stripe's API explorer, Supabase SQL editor.
>
> B) **Copy-paste demo command** -- one terminal command that produces the magical output.
> Low effort, high impact for CLI tools, but requires local install first.
> (human: ~2 days / CC: ~30 min). Examples: `npx create-next-app`, `docker run hello-world`.
>
> C) **Video/GIF walkthrough** -- shows the magic without requiring any setup.
> Passive (developer watches, doesn't do), but zero friction.
> (human: ~1 day / CC: ~1 hour). Examples: Vercel's homepage deploy animation.
>
> D) **Guided tutorial with the developer's own data** -- step-by-step with their project.
> Deepest engagement but longest time-to-magic.
> (human: ~1 week / CC: ~2 hours). Examples: Stripe's interactive onboarding.
>
> E) Something else -- describe what you have in mind.
>
> RECOMMENDATION: [A/B/C/D] because for [persona], [reason]. Your competitor [name]
> uses [their approach]."

**STOP.** The chosen delivery vehicle is tracked through the scoring passes.

### 0E. Mode Selection

How deep should this DX review go? Present three options via AskUserQuestion:

> "How deep should this DX review go?
>
> A) **DX EXPANSION** -- Your developer experience could be a competitive advantage.
> I'll propose ambitious DX improvements beyond what the plan covers. Every expansion
> is opt-in via individual questions. I'll push hard.
>
> B) **DX POLISH** -- The plan's DX scope is right. I'll make every touchpoint bulletproof:
> error messages, docs, CLI help, getting started. No scope additions, maximum rigor.
> (recommended for most reviews)
>
> C) **DX TRIAGE** -- Focus only on the critical DX gaps that would block adoption.
> Fast, surgical, for plans that need to ship soon.
>
> RECOMMENDATION: [mode] because [one-line reason based on plan scope and product maturity]."

Context-dependent defaults:
* New developer-facing product → default DX EXPANSION
* Enhancement to existing product → default DX POLISH
* Bug fix or urgent ship → default DX TRIAGE

Once selected, commit fully. Do not silently drift toward a different mode.

**STOP.** Do NOT proceed until user responds.

### 0F. Developer Journey Trace with Friction-Point Questions

Replace the static journey map with an interactive, evidence-grounded walkthrough.
For each journey stage, TRACE the actual experience (what file, what command, what
output) and ask about each friction point individually.

For each stage (Discover, Install, Hello World, Real Usage, Debug, Upgrade):

1. **Trace the actual path.** Read the README, docs, package.json, CLI help, or
   whatever the developer would encounter at this stage. Reference specific files
   and line numbers.

2. **Identify friction points with evidence.** Not "installation might be hard" but
   "Step 3 of the README requires Docker to be running, but nothing checks for Docker
   or tells the developer to install it. A [persona] without Docker will see [specific
   error or nothing]."

3. **AskUserQuestion per friction point.** One question per friction point found.
   Do NOT batch multiple friction points into one question.

> "Journey Stage: INSTALL
>
> I traced the installation path. Your README says:
> [actual install instructions]
>
> Friction point: [specific issue with evidence]
>
> A) Fix in plan -- [specific fix]
> B) [Alternative approach]
> C) Document the requirement prominently
> D) Acceptable friction -- skip"

**DX TRIAGE mode:** Only trace Install and Hello World stages. Skip the rest.
**DX POLISH mode:** Trace all stages.
**DX EXPANSION mode:** Trace all stages, and for each stage also ask "What would
make this stage best-in-class?"

After all friction points are resolved, produce the updated journey map:

```
STAGE           | DEVELOPER DOES              | FRICTION POINTS      | STATUS
----------------|-----------------------------|----------------------|--------
1. Discover     | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
2. Install      | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
3. Hello World  | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
4. Real Usage   | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
5. Debug        | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
6. Upgrade      | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
```

### 0G. First-Time Developer Roleplay

Using the persona from 0A and the journey trace from 0F, write a structured
"confusion report" from the perspective of a first-time developer. Include
timestamps to simulate real time passing.

```
FIRST-TIME DEVELOPER REPORT
============================
Persona: [from 0A]
Attempting: [product] getting started

CONFUSION LOG:
T+0:00  [What they do first. What they see.]
T+0:30  [Next action. What surprised or confused them.]
T+1:00  [What they tried. What happened.]
T+2:00  [Where they got stuck or succeeded.]
T+3:00  [Final state: gave up / succeeded / asked for help]
```

Ground this in the ACTUAL docs and code from the pre-review audit. Not hypothetical.
Reference specific README headings, error messages, and file paths.

AskUserQuestion:

> "I roleplayed as your [persona] developer attempting the getting started flow.
> Here's what confused me:
>
> [confusion report]
>
> Which of these should we address in the plan?
>
> A) All of them -- fix every confusion point
> B) Let me pick which ones matter
> C) The critical ones (#[N], #[N]) -- skip the rest
> D) This is unrealistic -- our developers already know [context]"

**STOP.** Do NOT proceed until user responds.

---

## The 0-10 Rating Method

For each DX section, rate the plan 0-10. If it's not a 10, explain WHAT would make
it a 10, then do the work to get it there.

**Critical rule:** Every rating MUST reference evidence from Step 0. Not "Getting
Started: 4/10" but "Getting Started: 4/10 because [persona from 0A] hits [friction
point from 0F] at step 3, and competitor [name from 0C] achieves this in [time]."

Pattern:
1. **Evidence recall:** Reference specific findings from Step 0 that apply to this dimension
2. Rate: "Getting Started Experience: 4/10"
3. Gap: "It's a 4 because [evidence]. A 10 would be [specific description for THIS product]."
4. Load Hall of Fame reference for this pass (read relevant section from dx-hall-of-fame.md)
5. Fix: Edit the plan to add what's missing
6. Re-rate: "Now 7/10, still missing [specific gap]"
7. AskUserQuestion if there's a genuine DX choice to resolve
8. Fix again until 10 or user says "good enough, move on"

**Mode-specific behavior:**
- **DX EXPANSION:** After fixing to 10, also ask "What would make this dimension
  best-in-class? What would make [persona] rave about it?" Present expansions as
  individual opt-in AskUserQuestions.
- **DX POLISH:** Fix every gap. No shortcuts. Trace each issue to specific files/lines.
- **DX TRIAGE:** Only flag gaps that would block adoption (score below 5). Skip gaps
  that are nice-to-have (score 5-7).

## Review Sections (8 passes, after Step 0 is complete)

{{LEARNINGS_SEARCH}}

DX TREND (prior reviews):

Rate 0-10: Can a developer go from zero to hello world in under 5 minutes?

**Evidence recall:** Reference the competitive benchmark from 0C (target tier), the
magical moment from 0D (delivery vehicle), and any Install/Hello World friction
points from 0F.

Load reference: Read the "## Pass 1" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.

Evaluate:
- **Free tier**: No credit card, no sales call, no company email?
- **Quick start guide**: Copy-paste complete? Shows real output?
- **Auth/credential bootstrapping**: How many steps between "I want to try" and "it works"?
  API keys, OAuth setup, tokens, test vs live mode?
- **Magical moment delivery**: Is the vehicle chosen in 0D actually in the plan?
- **Competitive gap**: How far is the TTHW from the target tier chosen in 0C?

FIX TO 10: Write the ideal getting started sequence. Specify exact commands,
expected output, and time budget per step. Target: 3 steps or fewer, under the
time chosen in 0C.

Stripe test: Can a [persona from 0A] go from "never heard of this" to "it worked"
in one terminal session without leaving the terminal?

**STOP.** AskUserQuestion once per issue. Recommend + WHY. Reference the persona.

### Pass 2: API/CLI/SDK Design (Usable + Useful)

Rate 0-10: Is the interface intuitive, consistent, and complete?

**Evidence recall:** Does the API surface match [persona from 0A]'s mental model?
A YC founder expects `tool.do(thing)`. A platform engineer expects
`tool.configure(options).execute(thing)`.

Load reference: Read the "## Pass 2" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.

Evaluate:
- **Discoverability**: Can devs explore from CLI/playground without docs?
- **Reliability/trust**: Latency, retries, rate limits, idempotency, offline behavior?
- **Progressive disclosure**: Simple case is production-ready, complexity revealed gradually?
- **Persona fit**: Does the interface match how [persona] thinks about the problem?

Good API design test: Can a [persona] use this API correctly after seeing one example?

**STOP.** AskUserQuestion once per issue. Recommend + WHY.

Rate 0-10: When something goes wrong, does the developer know what happened, why,
and how to fix it?

**Evidence recall:** Reference any error-related friction points from 0F and confusion
points from 0G.

Load reference: Read the "## Pass 3" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.

**Trace 3 specific error paths** from the plan or codebase. For each, evaluate against
the three-tier system from the Hall of Fame:
- **Tier 1 (Elm):** Conversational, first person, exact location, suggested fix
- **Tier 2 (Rust):** Error code links to tutorial, primary + secondary labels, help section
- **Tier 3 (Stripe API):** Structured JSON with type, code, message, param, doc_url
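As a reference shape for Tier 3 (the field names come from the tier description above; every value here is invented for illustration), a structured error body might look like:

```json
{
  "error": {
    "type": "invalid_request_error",
    "code": "parameter_missing",
    "message": "Missing required parameter: api_key.",
    "param": "api_key",
    "doc_url": "https://docs.example.com/errors/parameter_missing"
  }
}
```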

For each error path, show what the developer currently sees vs. what they should see.

Also evaluate:
- **Permission/sandbox/safety model**: What can go wrong? How clear is the blast radius?

Rate 0-10: Can a developer find what they need and learn by doing?

**Evidence recall:** Does the docs architecture match [persona from 0A]'s learning
style? A YC founder needs copy-paste examples front and center. A platform engineer
needs architecture docs and API reference.

Load reference: Read the "## Pass 4" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.

Evaluate:

Rate 0-10: Does this integrate into developers' existing workflows?

**Evidence recall:** Does local dev setup work for [persona from 0A]'s typical
environment?

Load reference: Read the "## Pass 6" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.

Evaluate:

Rate 0-10: Does the plan include ways to measure and improve DX over time?

Load reference: Read the "## Pass 8" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.

Evaluate:
- **TTHW tracking**: Can you measure getting started time? Is it instrumented?
- **Journey analytics**: Where do devs drop off?
- **Feedback mechanisms**: Bug reports? NPS? Feedback button?
- **Friction audits**: Periodic reviews planned?
- **Boomerang readiness**: Will /devex-review be able to measure reality vs. plan?

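A minimal sketch of what TTHW instrumentation can look like in a CI job (the step between the timestamps is a placeholder, not real gstack tooling):

```shell
# Time the documented getting-started flow end to end.
start=$(date +%s)
sleep 1   # placeholder for: install + init + first meaningful output
end=$(date +%s)
tthw=$((end - start))
echo "TTHW: ${tthw}s"
```

Failing the job when `tthw` exceeds the 0C target turns the benchmark into a regression gate.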
**STOP.** AskUserQuestion once per issue. Recommend + WHY.

Check each item. For any unchecked item, explain what's missing and suggest the fix.

{{CODEX_PLAN_REVIEW}}

When constructing the outside voice prompt, include the Developer Persona from Step 0A
and the Competitive Benchmark from Step 0C. The outside voice should critique the plan
in the context of who is using it and what they're competing against.

## CRITICAL RULE — How to ask questions

Follow the AskUserQuestion format from the Preamble above. Additional rules for
DX reviews:

* **One issue = one AskUserQuestion call.** Never combine multiple issues.
* Describe the DX gap concretely, with what the developer will experience if it's
  not fixed. Make the developer's pain real.
* **Ground every question in evidence.** Reference the persona, competitive benchmark,
  empathy narrative, or friction trace. Never ask a question in the abstract.
* **Frame pain from the persona's perspective.** Not "developers would be frustrated"
  but "[persona from 0A] would hit this at minute [N] of their getting-started flow
  and [specific consequence: abandon, file an issue, hack a workaround]."
* Present 2-3 options. For each: effort to fix, impact on developer adoption.
* **Map to DX First Principles above.** One sentence connecting your recommendation
  to a specific principle (e.g., "This violates 'zero friction at T0' because
  [persona] needs 3 extra config steps before their first API call").
* **Escape hatch:** If a section has no issues, say so and move on. If a gap has an
  obvious fix, state what you'll add and move on; don't waste a question.
* Assume the user hasn't looked at this window in 20 minutes. Re-ground every question.

## Required Outputs

### Developer Persona Card
The persona card from Step 0A. This goes at the top of the plan's DX section.

### Developer Empathy Narrative
The first-person narrative from Step 0B, updated with user corrections.

### Competitive DX Benchmark
The benchmark table from Step 0C, updated with the product's post-review scores.

### Magical Moment Specification
The chosen delivery vehicle from Step 0D with implementation requirements.

### Developer Journey Map
The journey map from Step 0F, updated with all friction point resolutions.

### First-Time Developer Confusion Report
The roleplay report from Step 0G, annotated with which items were addressed.

### "NOT in scope" section
DX improvements considered and explicitly deferred, with one-line rationale each.

Options: **A)** Add to TODOS.md **B)** Skip **C)** Build it now

| DX Measurement   | __/10  | __/10  | __ ↑↓ |
+--------------------------------------------------------------------+
| TTHW             | __ min | __ min | __ ↑↓ |
| Competitive Rank | [Champion/Competitive/Needs Work/Red Flag]      |
| Magical Moment   | [designed/missing] via [delivery vehicle]       |
| Product Type     | [type]                                          |
| Mode             | [EXPANSION/POLISH/TRIAGE]                       |
| Overall DX       | __/10  | __/10  | __ ↑↓ |
+====================================================================+
| DX PRINCIPLE COVERAGE                                              |
If TTHW > 10 min: Flag as blocking issue.

```
DX IMPLEMENTATION CHECKLIST
============================
[ ] Time to hello world < [target from 0C]
[ ] Installation is one command
[ ] First run produces meaningful output
[ ] Magical moment delivered via [vehicle from 0D]
[ ] Every error message has: problem + cause + fix + docs link
[ ] API/CLI naming is guessable without docs
[ ] Every parameter has a sensible default
```
After producing the DX Scorecard above, persist the review result.

`~/.gstack/` (user config directory, not project files).

```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-devex-review","timestamp":"TIMESTAMP","status":"STATUS","initial_score":N,"overall_score":N,"product_type":"TYPE","tthw_current":"TTHW_CURRENT","tthw_target":"TTHW_TARGET","mode":"MODE","persona":"PERSONA","competitive_tier":"TIER","pass_scores":{"getting_started":N,"api_design":N,"errors":N,"docs":N,"upgrade":N,"dev_env":N,"community":N,"measurement":N},"unresolved":N,"commit":"COMMIT"}'
```

Substitute values from the DX Scorecard. MODE is EXPANSION/POLISH/TRIAGE.
PERSONA is a short label (e.g., "yc-founder", "platform-eng").
TIER is Champion/Competitive/NeedsWork/RedFlag.

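For example, a filled-in payload might look like the following (every value is illustrative; sanity-checking the JSON with python3 before logging is optional but cheap):

```shell
# Illustrative gstack-review-log payload; all scores and labels are invented.
payload='{"skill":"plan-devex-review","timestamp":"2025-01-01T00:00:00Z","status":"complete","initial_score":5,"overall_score":8,"product_type":"cli","tthw_current":"12 min","tthw_target":"5 min","mode":"POLISH","persona":"yc-founder","competitive_tier":"Competitive","pass_scores":{"getting_started":8,"api_design":7,"errors":8,"docs":7,"upgrade":8,"dev_env":8,"community":6,"measurement":7},"unresolved":0,"commit":"abc1234"}'
# Verify the payload is well-formed JSON before passing it to the logger.
echo "$payload" | python3 -m json.tool > /dev/null && echo "payload OK"
```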
{{REVIEW_DASHBOARD}}

handling gaps, or CLI ergonomics issues, eng review should validate the fixes.

developer-facing surfaces; design review covers end-user-facing UI.

**Recommend /devex-review after implementation** — the boomerang. Plan said TTHW would
be [target from 0C]. Did reality match? Run /devex-review on the live product to find
out. This is where the competitive benchmark pays off: you have a concrete target to
measure against.

Use AskUserQuestion with applicable options:

- **A)** Run /plan-eng-review next (required gate)

- **C)** Ready to implement, run /devex-review after shipping
- **D)** Skip, I'll handle next steps manually

## Mode Quick Reference
```
             | DX EXPANSION     | DX POLISH       | DX TRIAGE
Scope        | Push UP (opt-in) | Maintain        | Critical only
Posture      | Enthusiastic     | Rigorous        | Surgical
Competitive  | Full benchmark   | Full benchmark  | Skip
Magical      | Full design      | Verify exists   | Skip
Journey      | All stages +     | All stages      | Install + Hello
             | best-in-class    |                 | World only
Passes       | All 8, expanded  | All 8, standard | Pass 1 + 3 only
Outside voice| Recommended      | Recommended     | Skip
```

## Formatting Rules

* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).