12. **Never refuse to use the browser.** When the user invokes /qa or /qa-only, they are requesting browser-based testing. Never suggest evals, unit tests, or other alternatives as a substitute. Even if the diff appears to have no UI changes, backend changes affect app behavior — always open the browser and test.`;
7. **Codex design voice** (optional, automatic if available):
\`\`\`bash
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
\`\`\`
If Codex is available, run a lightweight design check on the diff:
\`\`\`bash
TMPERR_DRL=$(mktemp /tmp/codex-drl-XXXXXXXX)
codex exec "Review the git diff on this branch. Run 7 litmus checks (YES/NO each): ${litmusList} Flag any hard rejections: ${rejectionList} 5 most important design findings only. Reference file:line." -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_DRL"
\`\`\`
Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
\`\`\`bash
cat "$TMPERR_DRL" && rm -f "$TMPERR_DRL"
\`\`\`
**Error handling:** All errors are non-blocking. On auth failure, timeout, or empty response — skip with a brief note and continue.
Present Codex output under a \`CODEX (design):\` header, merged with the checklist findings above.`;
**If \`SCOPE_FRONTEND=false\`:** Skip design review silently. No output.
**If \`SCOPE_FRONTEND=true\`:**
1. **Check for DESIGN.md.** If \`DESIGN.md\` or \`design-system.md\` exists in the repo root, read it. All design findings are calibrated against it — patterns blessed in DESIGN.md are not flagged. If not found, use universal design principles.
2. **Read \`.claude/skills/review/design-checklist.md\`.** If the file cannot be read, skip design review with a note: "Design checklist not found — skipping design review."
3. **Read each changed frontend file** (full file, not just diff hunks). Frontend files are identified by the patterns listed in the checklist.
4. **Apply the design checklist** against the changed files. For each item:
- **[HIGH/MEDIUM] design judgment needed**: classify as ASK
- **[LOW] intent-based detection**: present as "Possible — verify visually or run /design-review"
5. **Include findings** in the review output under a "Design Review" header, following the output format in the checklist. Design findings merge with code review findings into the same Fix-First flow.
6. **Log the result** for the Review Readiness Dashboard:
Substitute: TIMESTAMP = ISO 8601 datetime, STATUS = "clean" if 0 findings or "issues_found", N = total findings, M = auto-fixed count, COMMIT = output of \`git rev-parse --short HEAD\`.${codexBlock}`;
}
// NOTE: design-checklist.md is a subset of this methodology for code-level detection.
// When adding items here, also update review/design-checklist.md, and vice versa.
- **Edge case paranoia** — what if the name is 47 chars? Zero results? Network fails?
- **Subtraction default** — "as little design as possible" (Rams). Every element earns its pixels.
- **Design for trust** — every interface element builds or erodes user trust.
**Step 2: Generate wireframe HTML**
Generate a single-page HTML file with these constraints:
- **Intentionally rough aesthetic** — use system fonts, thin gray borders, no color,
hand-drawn-style elements. This is a sketch, not a polished mockup.
- Self-contained — no external dependencies, no CDN links, inline CSS only
- Show the core interaction flow (1-3 screens/states max)
- Include realistic placeholder content (not "Lorem ipsum" — use content that
matches the actual use case)
- Add HTML comments explaining design decisions
Write to a temp file:
\`\`\`bash
SKETCH_FILE="/tmp/gstack-sketch-$(date +%s).html"
\`\`\`
**Step 3: Render and capture**
\`\`\`bash
$B goto "file://$SKETCH_FILE"
$B screenshot /tmp/gstack-sketch.png
\`\`\`
If \`$B\` is not available (browse binary not set up), skip the render step. Tell the
user: "Visual sketch requires the browse binary. Run the setup script to enable it."
**Step 4: Present and iterate**
Show the screenshot to the user. Ask: "Does this feel right? Want to iterate on the layout?"
If they want changes, regenerate the HTML with their feedback and re-render.
If they approve or say "good enough," proceed.
**Step 5: Include in design doc**
Reference the wireframe screenshot in the design doc's "Recommended Approach" section.
The screenshot file at \`/tmp/gstack-sketch.png\` can be referenced by downstream skills
(\`/plan-design-review\`, \`/design-review\`) to see what was originally envisioned.
**Step 6: Outside design voices** (optional)
After the wireframe is approved, offer outside design perspectives:
\`\`\`bash
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
\`\`\`
If Codex is available, use AskUserQuestion:
> "Want outside design perspectives on the chosen approach? Codex proposes a visual thesis, content plan, and interaction ideas. A Claude subagent proposes an alternative aesthetic direction."
>
> A) Yes — get outside design voices
> B) No — proceed without
If user chooses A, launch both voices simultaneously:
codex exec "For this product approach, provide: a visual thesis (one sentence — mood, material, energy), a content plan (hero → support → detail → CTA), and 2 interaction ideas that change page feel. Apply beautiful defaults: composition-first, brand-first, cardless, poster not document. Be opinionated." -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached 2>"$TMPERR_SKETCH"
\`\`\`
Use a 5-minute timeout (\`timeout: 300000\`). After completion: \`cat "$TMPERR_SKETCH" && rm -f "$TMPERR_SKETCH"\`
2. **Claude subagent** (via Agent tool):
"For this product approach, what design direction would you recommend? What aesthetic, typography, and interaction patterns fit? What would make this approach feel inevitable to the user? Be specific — font names, hex colors, spacing values."
Present Codex output under \`CODEX SAYS (design sketch):\` and subagent output under \`CLAUDE SUBAGENT (design direction):\`.
Error handling: all non-blocking. On failure, skip and continue.`;
// Codex host: strip entirely — Codex should never invoke itself
if(ctx.host==='codex')return'';
return`## Phase 3.5: Cross-Model Second Opinion (optional)
**Binary check first — no question if unavailable:**
\`\`\`bash
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
\`\`\`
If \`CODEX_NOT_AVAILABLE\`: skip Phase 3.5 entirely — no message, no AskUserQuestion. Proceed directly to Phase 4.
If \`CODEX_AVAILABLE\`: use AskUserQuestion:
> Want a second opinion from a different AI model? Codex will independently review your problem statement, key answers, premises, and any landscape findings from this session. It hasn't seen this conversation — it gets a structured summary. Usually takes 2-5 minutes.
> A) Yes, get a second opinion
> B) No, proceed to alternatives
If B: skip Phase 3.5 entirely. Remember that Codex did NOT run (affects design doc, founder signals, and Phase 4 below).
**If A: Run the Codex cold read.**
1. Assemble a structured context block from Phases 1-3:
- Mode (Startup or Builder)
- Problem statement (from Phase 1)
- Key answers from Phase 2A/2B (summarize each Q&A in 1-2 sentences, include verbatim user quotes)
- Landscape findings (from Phase 2.75, if search was run)
Write the full prompt (context block + instructions) to this file. Use the mode-appropriate variant:
**Startup mode instructions:** "You are an independent technical advisor reading a transcript of a startup brainstorming session. [CONTEXT BLOCK HERE]. Your job: 1) What is the STRONGEST version of what this person is trying to build? Steelman it in 2-3 sentences. 2) What is the ONE thing from their answers that reveals the most about what they should actually build? Quote it and explain why. 3) Name ONE agreed premise you think is wrong, and what evidence would prove you right. 4) If you had 48 hours and one engineer to build a prototype, what would you build? Be specific — tech stack, features, what you'd skip. Be direct. Be terse. No preamble."
**Builder mode instructions:** "You are an independent technical advisor reading a transcript of a builder brainstorming session. [CONTEXT BLOCK HERE]. Your job: 1) What is the COOLEST version of this they haven't considered? 2) What's the ONE thing from their answers that reveals what excites them most? Quote it. 3) What existing open source project or tool gets them 50% of the way there — and what's the 50% they'd need to build? 4) If you had a weekend to build this, what would you build first? Be specific. Be direct. No preamble."
Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
\`\`\`bash
cat "$TMPERR_OH"
rm -f "$TMPERR_OH" "$CODEX_PROMPT_FILE"
\`\`\`
**Error handling:** All errors are non-blocking — Codex second opinion is a quality enhancement, not a prerequisite.
- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key": "Codex authentication failed. Run \\\`codex login\\\` to authenticate. Skipping second opinion."
- **Timeout:** "Codex timed out after 5 minutes. Skipping second opinion."
- **Empty response:** "Codex returned no response. Stderr: <paste relevant error>. Skipping second opinion."
On any error, proceed to Phase 4 — do NOT fall back to a Claude subagent (this is brainstorming, not adversarial review).
6. **Premise revision check:** If Codex challenged an agreed premise, use AskUserQuestion:
> Codex challenged premise #{N}: "{premise text}". Their argument: "{reasoning}".
> A) Revise this premise based on Codex's input
> B) Keep the original premise — proceed to alternatives
If A: revise the premise and note the revision. If B: proceed (and note that the user defended this premise with reasoning — this is a founder signal if they articulate WHY they disagree, not just dismiss).`;
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
# Respect old opt-out
OLD_CFG=$(~/.claude/skills/gstack/bin/gstack-config get codex_reviews 2>/dev/null || true)
echo "DIFF_SIZE: $DIFF_TOTAL"
echo "OLD_CFG: \${OLD_CFG:-not_set}"
\`\`\`
If \`OLD_CFG\` is \`disabled\`: skip this step silently. Continue to the next step.
**User override:** If the user explicitly requested a specific tier (e.g., "run all passes", "paranoid review", "full adversarial", "do all 4 passes", "thorough review"), honor that request regardless of diff size. Jump to the matching tier section.
**Auto-select tier based on diff size:**
- **Small (< 50 lines changed):** Skip adversarial review entirely. Print: "Small diff ($DIFF_TOTAL lines) — adversarial review skipped." Continue to the next step.
- **Medium (50–199 lines changed):** Run Codex adversarial challenge (or Claude adversarial subagent if Codex unavailable). Jump to the "Medium tier" section.
- **Large (200+ lines changed):** Run all remaining passes — Codex structured review + Claude adversarial subagent + Codex adversarial. Jump to the "Large tier" section.
---
### Medium tier (50–199 lines)
Claude's structured review already ran. Now add a **cross-model adversarial challenge**.
**If Codex is available:** run the Codex adversarial challenge. **If Codex is NOT available:** fall back to the Claude adversarial subagent instead.
**Codex adversarial:**
\`\`\`bash
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
\`\`\`
Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. After the command completes, read stderr:
\`\`\`bash
cat "$TMPERR_ADV"
\`\`\`
Present the full output verbatim. This is informational — it never blocks shipping.
**Error handling:** All errors are non-blocking — adversarial review is a quality enhancement, not a prerequisite.
- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key": "Codex authentication failed. Run \\\`codex login\\\` to authenticate."
- **Timeout:** "Codex timed out after 5 minutes."
- **Empty response:** "Codex returned no response. Stderr: <paste relevant error>."
On any Codex error, fall back to the Claude adversarial subagent automatically.
**Claude adversarial subagent** (fallback when Codex unavailable or errored):
Dispatch via the Agent tool. The subagent has fresh context — no checklist bias from the structured review. This genuine independence catches things the primary reviewer is blind to.
Subagent prompt:
"Read the diff for this branch with \`git diff origin/<base>\`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment)."
Present findings under an \`ADVERSARIAL REVIEW (Claude subagent):\` header. **FIXABLE findings** flow into the same Fix-First pipeline as the structured review. **INVESTIGATE findings** are presented as informational.
If the subagent fails or times out: "Claude adversarial subagent unavailable. Continuing without adversarial review."
Substitute STATUS: "clean" if no findings, "issues_found" if findings exist. SOURCE: "codex" if Codex ran, "claude" if subagent ran. If both failed, do NOT persist.
**Cleanup:** Run \`rm -f "$TMPERR_ADV"\` after processing (if Codex was used).
---
### Large tier (200+ lines)
Claude's structured review already ran. Now run **all three remaining passes** for maximum coverage:
Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. Present output under \`CODEX SAYS (code review):\` header.
Check for \`[P1]\` markers: found → \`GATE: FAIL\`, not found → \`GATE: PASS\`.
If GATE is FAIL, use AskUserQuestion:
\`\`\`
Codex found N critical issues in the diff.
A) Investigate and fix now (recommended)
B) Continue — review will still complete
\`\`\`
If A: address the findings${isShip?'. After fixing, re-run tests (Step 3) since code has changed':''}. Re-run \`codex review\` to verify.
Read stderr for errors (same error handling as medium tier).
After stderr: \`rm -f "$TMPERR"\`
**2. Claude adversarial subagent:** Dispatch a subagent with the adversarial prompt (same prompt as medium tier). This always runs regardless of Codex availability.
**3. Codex adversarial challenge (if available):** Run \`codex exec\` with the adversarial prompt (same as medium tier).
If Codex is not available for steps 1 and 3, note to the user: "Codex CLI not found — large-diff review ran Claude structured + Claude adversarial (2 of 4 passes). Install Codex for full 4-pass coverage: \`npm install -g @openai/codex\`"
**Persist the review result AFTER all passes complete** (not after each sub-step):
Substitute: STATUS = "clean" if no findings across ALL passes, "issues_found" if any pass found issues. SOURCE = "both" if Codex ran, "claude" if only Claude subagent ran. GATE = the Codex structured review gate result ("pass"/"fail"), or "informational" if Codex was unavailable. If all passes failed, do NOT persist.
---
### Cross-model synthesis (medium and large tiers)
After all passes complete, synthesize findings across all sources:
\`\`\`
ADVERSARIAL REVIEW SYNTHESIS (auto: TIER, N lines):
- UNIVERSAL: CSS variables for colors, no default font stacks, one job per section, cards earn existence
For each finding: what's wrong, what will happen if it ships unresolved, and the specific fix. Be opinionated. No hedging.`;
subagentPrompt=`Read the plan file at [plan-file-path]. You are an independent senior product designer reviewing this plan. You have NOT seen any prior review. Evaluate:
1. Information hierarchy: what does the user see first, second, third? Is it right?
2. Missing states: loading, empty, error, success, partial — which are unspecified?
3. User journey: what's the emotional arc? Where does it break?
4. Specificity: does the plan describe SPECIFIC UI ("48px Söhne Bold header, #1a1a1a on white") or generic patterns ("clean modern card-based layout")?
5. What design decisions will haunt the implementer if left ambiguous?
For each finding: what's wrong, severity (critical/high/medium), and the fix.`;
}elseif(isDesignReview){
codexPrompt=`Review the frontend source code in this repo. Evaluate against these design hard rules:
- Motion: 2-3 intentional animations, or zero / ornamental only?
- Cards: used only when card IS the interaction? No decorative card grids?
First classify as MARKETING/LANDING PAGE vs APP UI vs HYBRID, then apply matching rules.
LITMUS CHECKS — answer YES/NO:
${litmusList}
HARD REJECTION — flag if ANY apply:
${rejectionList}
Be specific. Reference file:line for every finding.`;
subagentPrompt=`Review the frontend source code in this repo. You are an independent senior product designer doing a source-code design audit. Focus on CONSISTENCY PATTERNS across files rather than individual violations:
- Are spacing values systematic across the codebase?
- Is there ONE color system or scattered approaches?
- Do responsive breakpoints follow a consistent set?
- Is the accessibility approach consistent or spotty?
For each finding: what's wrong, severity (critical/high/medium), and the file:line.`;
}elseif(isDesignConsultation){
codexPrompt=`Given this product context, propose a complete design direction:
- Visual thesis: one sentence describing mood, material, and energy
- Typography: specific font names (not defaults — no Inter/Roboto/Arial/system) + hex colors
- Color system: CSS variables for background, surface, primary text, muted text, accent
- Layout: composition-first, not component-first. First viewport as poster, not document
- Differentiation: 2 deliberate departures from category norms
- Anti-slop: no purple gradients, no 3-column icon grids, no centered everything, no decorative blobs
Be opinionated. Be specific. Do not hedge. This is YOUR design direction — own it.`;
subagentPrompt=`Given this product context, propose a design direction that would SURPRISE. What would the cool indie studio do that the enterprise UI team wouldn't?
- Propose an aesthetic direction, typography stack (specific font names), color palette (hex values)
- 2 deliberate departures from category norms
- What emotional reaction should the user have in the first 3 seconds?
Be bold. Be specific. No hedging.`;
}else{
// Unknown skill — return empty
return'';
}
// Build the opt-in section
constoptInSection=isAutomatic?`
**Automatic:** Outside voices run automatically when Codex is available. No opt-in needed.`:`
Use AskUserQuestion:
> "Want outside design voices${isPlanDesignReview?' before the detailed review':''}? Codex evaluates against OpenAI's design hard rules + litmus checks; Claude subagent does an independent ${isDesignConsultation?'design direction proposal':'completeness review'}."
Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
\`\`\`bash
cat "$TMPERR_DESIGN" && rm -f "$TMPERR_DESIGN"
\`\`\`
2. **Claude design subagent** (via Agent tool):
Dispatch a subagent with this prompt:
"${subagentPrompt}"
**Error handling (all non-blocking):**
- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key": "Codex authentication failed. Run \`codex login\` to authenticate."
- **Timeout:** "Codex timed out after 5 minutes."
- **Empty response:** "Codex returned no response."
- On any Codex error: proceed with Claude subagent output only, tagged \`[single-model]\`.
- If Claude subagent also fails: "Outside voices unavailable — continuing with primary review."
Present Codex output under a \`CODEX SAYS (design ${isPlanDesignReview?'critique':isDesignReview?'source audit':'direction'}):\` header.
Present subagent output under a \`CLAUDE SUBAGENT (design ${isPlanDesignReview?'completeness':isDesignReview?'consistency':'direction'}):\` header.
@@ -17,7 +17,8 @@ If Codex is available, run a lightweight design check on the diff:
\`\`\`bash
TMPERR_DRL=$(mktemp/tmp/codex-drl-XXXXXXXX)
codexexec"Review the git diff on this branch. Run 7 litmus checks (YES/NO each): ${litmusList} Flag any hard rejections: ${rejectionList} 5 most important design findings only. Reference file:line."-C"$(git rev-parse --show-toplevel)"-sread-only-c'model_reasoning_effort="high"'--enableweb_search_cached2>"$TMPERR_DRL"
_REPO_ROOT=$(gitrev-parse--show-toplevel)||{echo"ERROR: not in a git repo">&2;exit1;}
codexexec"Review the git diff on this branch. Run 7 litmus checks (YES/NO each): ${litmusList} Flag any hard rejections: ${rejectionList} 5 most important design findings only. Reference file:line."-C"$_REPO_ROOT"-sread-only-c'model_reasoning_effort="high"'--enableweb_search_cached2>"$TMPERR_DRL"
\`\`\`
Usea5-minutetimeout(\`timeout: 300000\`). After the command completes, read stderr:
@@ -467,7 +468,8 @@ If user chooses A, launch both voices simultaneously:
codexexec"For this product approach, provide: a visual thesis (one sentence — mood, material, energy), a content plan (hero → support → detail → CTA), and 2 interaction ideas that change page feel. Apply beautiful defaults: composition-first, brand-first, cardless, poster not document. Be opinionated."-C"$(git rev-parse --show-toplevel)"-sread-only-c'model_reasoning_effort="medium"'--enableweb_search_cached2>"$TMPERR_SKETCH"
_REPO_ROOT=$(gitrev-parse--show-toplevel)||{echo"ERROR: not in a git repo">&2;exit1;}
codexexec"For this product approach, provide: a visual thesis (one sentence — mood, material, energy), a content plan (hero → support → detail → CTA), and 2 interaction ideas that change page feel. Apply beautiful defaults: composition-first, brand-first, cardless, poster not document. Be opinionated."-C"$_REPO_ROOT"-sread-only-c'model_reasoning_effort="medium"'--enableweb_search_cached2>"$TMPERR_SKETCH"
\`\`\`
Usea5-minutetimeout(\`timeout: 300000\`). After completion: \`cat "$TMPERR_SKETCH" && rm -f "$TMPERR_SKETCH"\`
Usea5-minutetimeout(\`timeout: 300000\`). After the command completes, read stderr:
@@ -376,7 +377,8 @@ Claude's structured review already ran. Now add a **cross-model adversarial chal
\`\`\`bash
TMPERR_ADV=$(mktemp/tmp/codex-adv-XXXXXXXX)
codexexec"Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems."-C"$(git rev-parse --show-toplevel)"-sread-only-c'model_reasoning_effort="high"'--enableweb_search_cached2>"$TMPERR_ADV"
_REPO_ROOT=$(gitrev-parse--show-toplevel)||{echo"ERROR: not in a git repo">&2;exit1;}
codexexec"Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems."-C"$_REPO_ROOT"-sread-only-c'model_reasoning_effort="high"'--enableweb_search_cached2>"$TMPERR_ADV"
\`\`\`
SettheBashtool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn'texistonmacOS.Afterthecommandcompletes,readstderr:
@@ -421,6 +423,8 @@ Claude's structured review already ran. Now run **all three remaining passes** f
**1.Codexstructuredreview(ifavailable):**
\`\`\`bash
TMPERR=$(mktemp/tmp/codex-review-XXXXXXXX)
_REPO_ROOT=$(gitrev-parse--show-toplevel)||{echo"ERROR: not in a git repo">&2;exit1;}
11.**Showscreenshotstotheuser.**Afterevery\`$B screenshot\`, \`$B snapshot -a -o\`, or \`$B responsive\` command, use the Read tool on the output file(s) so the user can see them inline. For \`responsive\` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.
return'Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>';
}
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.