mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 11:45:20 +02:00
feat: default codex reviews in /ship and /review with xhigh reasoning
Codex code reviews are now opt-in-once-then-always-on via a one-time
adoption prompt. When enabled, both review + adversarial run automatically
on every /ship and /review — no more choosing between them.
Key changes:
- New {{CODEX_REVIEW_STEP}} resolver centralizes Codex review logic (DRY)
- Three-state config: enabled/not-set/disabled via gstack-config
- P1 findings default to "Investigate and fix" instead of "Ship anyway"
- All reasoning bumped to xhigh (review, adversarial, consult)
- Codex review step stripped from codex-host variants (no self-invocation)
- Ship "Never ask" rule updated to accurately list quality-gate stops
- Error handling for auth, timeout, empty response (all non-blocking)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -993,7 +993,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
|
||||
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
|
||||
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
|
||||
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
|
||||
- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed.
|
||||
- **Codex Review (enabled by default when Codex CLI is installed):** Independent review + adversarial challenge from OpenAI Codex CLI. Shows pass/fail gate. Runs automatically when enabled — configure with \`gstack-config set codex_reviews enabled|disabled\`.
|
||||
|
||||
**Verdict logic:**
|
||||
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
|
||||
|
||||
@@ -528,7 +528,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
|
||||
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
|
||||
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
|
||||
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
|
||||
- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed.
|
||||
- **Codex Review (enabled by default when Codex CLI is installed):** Independent review + adversarial challenge from OpenAI Codex CLI. Shows pass/fail gate. Runs automatically when enabled — configure with \`gstack-config set codex_reviews enabled|disabled\`.
|
||||
|
||||
**Verdict logic:**
|
||||
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
|
||||
|
||||
@@ -517,7 +517,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
|
||||
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
|
||||
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
|
||||
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
|
||||
- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed.
|
||||
- **Codex Review (enabled by default when Codex CLI is installed):** Independent review + adversarial challenge from OpenAI Codex CLI. Shows pass/fail gate. Runs automatically when enabled — configure with \`gstack-config set codex_reviews enabled|disabled\`.
|
||||
|
||||
**Verdict logic:**
|
||||
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
|
||||
|
||||
@@ -474,54 +474,7 @@ If no documentation files exist, skip this step silently.
|
||||
|
||||
---
|
||||
|
||||
## Step 5.7: Codex second opinion (optional)
|
||||
|
||||
After completing the review, check if the Codex CLI is available:
|
||||
|
||||
```bash
|
||||
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
|
||||
```
|
||||
|
||||
If Codex is available, use AskUserQuestion:
|
||||
|
||||
```
|
||||
Review complete. Want an independent second opinion from Codex (OpenAI)?
|
||||
|
||||
A) Run Codex code review — independent diff review with pass/fail gate
|
||||
B) Run Codex adversarial challenge — try to find ways this code will fail in production
|
||||
C) Both — review first, then adversarial challenge
|
||||
D) Skip — no Codex review needed
|
||||
```
|
||||
|
||||
If the user chooses A, B, or C:
|
||||
|
||||
**For code review (A or C):** Run `codex review --base <base>` with a 5-minute timeout.
|
||||
Present the full output verbatim under a `CODEX SAYS (code review):` header.
|
||||
Check the output for `[P1]` markers — if found, note `GATE: FAIL`, otherwise `GATE: PASS`.
|
||||
After presenting, compare Codex's findings with your own review findings from Steps 4-5
|
||||
and output a CROSS-MODEL ANALYSIS showing what both found, what only Codex found,
|
||||
and what only Claude found.
|
||||
|
||||
**For adversarial challenge (B or C):** Run:
|
||||
```bash
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, failure modes. Be adversarial." -s read-only
|
||||
```
|
||||
Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header.
|
||||
|
||||
**Only if a code review ran (user chose A or C):** Persist the Codex review result to the review log:
|
||||
```bash
|
||||
~/.codex/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}'
|
||||
```
|
||||
|
||||
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
|
||||
|
||||
**Do NOT persist a codex-review entry when only the adversarial challenge (B) ran** —
|
||||
there is no gate verdict to record, and a false entry would make the Review Readiness
|
||||
Dashboard believe a code review happened when it didn't.
|
||||
|
||||
If Codex is not available, skip this step silently.
|
||||
|
||||
---
|
||||
|
||||
## Important Rules
|
||||
|
||||
|
||||
@@ -295,7 +295,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
|
||||
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
|
||||
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
|
||||
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
|
||||
- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed.
|
||||
- **Codex Review (enabled by default when Codex CLI is installed):** Independent review + adversarial challenge from OpenAI Codex CLI. Shows pass/fail gate. Runs automatically when enabled — configure with \`gstack-config set codex_reviews enabled|disabled\`.
|
||||
|
||||
**Verdict logic:**
|
||||
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
|
||||
@@ -837,43 +837,7 @@ For each classified comment:
|
||||
|
||||
---
|
||||
|
||||
## Step 3.8: Codex second opinion (optional)
|
||||
|
||||
Check if the Codex CLI is available:
|
||||
|
||||
```bash
|
||||
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
|
||||
```
|
||||
|
||||
If Codex is available, use AskUserQuestion:
|
||||
|
||||
```
|
||||
Pre-landing review complete. Want an independent Codex (OpenAI) review before shipping?
|
||||
|
||||
A) Run Codex code review — independent diff review with pass/fail gate
|
||||
B) Run Codex adversarial challenge — try to break this code
|
||||
C) Skip — ship without Codex review
|
||||
```
|
||||
|
||||
If the user chooses A or B:
|
||||
|
||||
**For code review (A):** Run `codex review --base <base>` with a 5-minute timeout.
|
||||
Present the full output verbatim under a `CODEX SAYS:` header. Check for `[P1]` markers
|
||||
to determine pass/fail gate. Persist the result:
|
||||
|
||||
```bash
|
||||
~/.codex/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE"}'
|
||||
```
|
||||
|
||||
If GATE is FAIL, use AskUserQuestion: "Codex found critical issues. Ship anyway?"
|
||||
If the user says no, stop. If yes, continue to Step 4.
|
||||
|
||||
**For adversarial (B):** Run codex exec with the adversarial prompt (see /codex skill).
|
||||
Present findings. This is informational — does not block shipping.
|
||||
|
||||
If Codex is not available, skip silently. Continue to Step 4.
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Version bump (auto-decide)
|
||||
|
||||
@@ -1114,7 +1078,7 @@ doc updates — the user runs `/ship` and documentation stays current without a
|
||||
- **Never skip tests.** If tests fail, stop.
|
||||
- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
|
||||
- **Never force push.** Use regular `git push` only.
|
||||
- **Never ask for confirmation** except for MINOR/MAJOR version bumps and pre-landing review ASK items (batched into at most one AskUserQuestion).
|
||||
- **Never ask for trivial confirmations** (e.g., "ready to push?", "create PR?"). DO stop for: version bumps (MINOR/MAJOR), pre-landing review findings (ASK items), Codex critical findings ([P1]), and the one-time Codex adoption prompt.
|
||||
- **Always use the 4-digit version format** from the VERSION file.
|
||||
- **Date format in CHANGELOG:** `YYYY-MM-DD`
|
||||
- **Split commits for bisectability** — each commit = one logical change.
|
||||
|
||||
+5
-8
@@ -300,13 +300,13 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)
|
||||
|
||||
2. Run the review (5-minute timeout):
|
||||
```bash
|
||||
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
|
||||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
Use `timeout: 300000` on the Bash call. If the user provided custom instructions
|
||||
(e.g., `/codex review focus on security`), pass them as the prompt argument:
|
||||
```bash
|
||||
codex review "focus on security" --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
|
||||
codex review "focus on security" --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
3. Capture the output. Then parse cost from stderr:
|
||||
@@ -461,7 +461,7 @@ THE PLAN:
|
||||
|
||||
For a **new session:**
|
||||
```bash
|
||||
codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
|
||||
codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
|
||||
import sys, json
|
||||
for line in sys.stdin:
|
||||
line = line.strip()
|
||||
@@ -494,7 +494,7 @@ for line in sys.stdin:
|
||||
|
||||
For a **resumed session** (user chose "Continue"):
|
||||
```bash
|
||||
codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
|
||||
codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
|
||||
<same python streaming parser as above>
|
||||
"
|
||||
```
|
||||
@@ -530,10 +530,7 @@ Session saved — run /codex again to continue this conversation.
|
||||
agentic coding model). This means as OpenAI ships newer models, /codex automatically
|
||||
uses them. If the user wants a specific model, pass `-m` through to codex.
|
||||
|
||||
**Reasoning effort** varies by mode — use the right level for each task:
|
||||
- **Review mode:** `high` — thorough but not slow. Diff review benefits from depth but doesn't need maximum compute.
|
||||
- **Challenge (adversarial) mode:** `xhigh` — maximum reasoning power. When trying to break code, you want the model thinking as hard as possible.
|
||||
- **Consult mode:** `high` — good balance of depth and speed for conversations.
|
||||
**Reasoning effort:** All modes use `xhigh` — maximum reasoning power. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible.
|
||||
|
||||
**Web search:** All codex commands use `--enable web_search_cached` so Codex can look up
|
||||
docs and APIs during review. This is OpenAI's cached index — fast, no extra cost.
|
||||
|
||||
+5
-8
@@ -79,13 +79,13 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)
|
||||
|
||||
2. Run the review (5-minute timeout):
|
||||
```bash
|
||||
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
|
||||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
Use `timeout: 300000` on the Bash call. If the user provided custom instructions
|
||||
(e.g., `/codex review focus on security`), pass them as the prompt argument:
|
||||
```bash
|
||||
codex review "focus on security" --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
|
||||
codex review "focus on security" --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
3. Capture the output. Then parse cost from stderr:
|
||||
@@ -240,7 +240,7 @@ THE PLAN:
|
||||
|
||||
For a **new session:**
|
||||
```bash
|
||||
codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
|
||||
codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
|
||||
import sys, json
|
||||
for line in sys.stdin:
|
||||
line = line.strip()
|
||||
@@ -273,7 +273,7 @@ for line in sys.stdin:
|
||||
|
||||
For a **resumed session** (user chose "Continue"):
|
||||
```bash
|
||||
codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
|
||||
codex exec resume <session-id> "<prompt>" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached --json 2>"$TMPERR" | python3 -c "
|
||||
<same python streaming parser as above>
|
||||
"
|
||||
```
|
||||
@@ -309,10 +309,7 @@ Session saved — run /codex again to continue this conversation.
|
||||
agentic coding model). This means as OpenAI ships newer models, /codex automatically
|
||||
uses them. If the user wants a specific model, pass `-m` through to codex.
|
||||
|
||||
**Reasoning effort** varies by mode — use the right level for each task:
|
||||
- **Review mode:** `high` — thorough but not slow. Diff review benefits from depth but doesn't need maximum compute.
|
||||
- **Challenge (adversarial) mode:** `xhigh` — maximum reasoning power. When trying to break code, you want the model thinking as hard as possible.
|
||||
- **Consult mode:** `high` — good balance of depth and speed for conversations.
|
||||
**Reasoning effort:** All modes use `xhigh` — maximum reasoning power. When reviewing code, breaking code, or consulting on architecture, you want the model thinking as hard as possible.
|
||||
|
||||
**Web search:** All codex commands use `--enable web_search_cached` so Codex can look up
|
||||
docs and APIs during review. This is OpenAI's cached index — fast, no extra cost.
|
||||
|
||||
@@ -1001,7 +1001,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
|
||||
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
|
||||
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
|
||||
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
|
||||
- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed.
|
||||
- **Codex Review (enabled by default when Codex CLI is installed):** Independent review + adversarial challenge from OpenAI Codex CLI. Shows pass/fail gate. Runs automatically when enabled — configure with \`gstack-config set codex_reviews enabled|disabled\`.
|
||||
|
||||
**Verdict logic:**
|
||||
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
|
||||
|
||||
@@ -536,7 +536,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
|
||||
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
|
||||
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
|
||||
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
|
||||
- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed.
|
||||
- **Codex Review (enabled by default when Codex CLI is installed):** Independent review + adversarial challenge from OpenAI Codex CLI. Shows pass/fail gate. Runs automatically when enabled — configure with \`gstack-config set codex_reviews enabled|disabled\`.
|
||||
|
||||
**Verdict logic:**
|
||||
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
|
||||
|
||||
@@ -526,7 +526,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
|
||||
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
|
||||
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
|
||||
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
|
||||
- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed.
|
||||
- **Codex Review (enabled by default when Codex CLI is installed):** Independent review + adversarial challenge from OpenAI Codex CLI. Shows pass/fail gate. Runs automatically when enabled — configure with \`gstack-config set codex_reviews enabled|disabled\`.
|
||||
|
||||
**Verdict logic:**
|
||||
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
|
||||
|
||||
+83
-25
@@ -483,52 +483,110 @@ If no documentation files exist, skip this step silently.
|
||||
|
||||
---
|
||||
|
||||
## Step 5.7: Codex second opinion (optional)
|
||||
## Step 5.7: Codex review
|
||||
|
||||
After completing the review, check if the Codex CLI is available:
|
||||
Check if the Codex CLI is available and read the user's Codex review preference:
|
||||
|
||||
```bash
|
||||
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
|
||||
CODEX_REVIEWS_CFG=$(~/.claude/skills/gstack/bin/gstack-config get codex_reviews 2>/dev/null || true)
|
||||
echo "CODEX_REVIEWS: ${CODEX_REVIEWS_CFG:-not_set}"
|
||||
```
|
||||
|
||||
If Codex is available, use AskUserQuestion:
|
||||
If `CODEX_NOT_AVAILABLE`: skip this step silently. Continue to the next step.
|
||||
|
||||
If `CODEX_REVIEWS` is `disabled`: skip this step silently. Continue to the next step.
|
||||
|
||||
If `CODEX_REVIEWS` is `enabled`: run both code review and adversarial challenge automatically (no prompt). Jump to the "Run Codex" section below.
|
||||
|
||||
If `CODEX_REVIEWS` is `not_set`: use AskUserQuestion to offer the one-time adoption prompt:
|
||||
|
||||
```
|
||||
Review complete. Want an independent second opinion from Codex (OpenAI)?
|
||||
GStack recommends enabling Codex code reviews — Codex is the super smart quiet engineer friend who will save your butt.
|
||||
|
||||
A) Run Codex code review — independent diff review with pass/fail gate
|
||||
B) Run Codex adversarial challenge — try to find ways this code will fail in production
|
||||
C) Both — review first, then adversarial challenge
|
||||
D) Skip — no Codex review needed
|
||||
A) Enable for all future runs (recommended, default)
|
||||
B) Try it for now, ask me again later
|
||||
C) No thanks, don't ask me again
|
||||
```
|
||||
|
||||
If the user chooses A, B, or C:
|
||||
|
||||
**For code review (A or C):** Run `codex review --base <base>` with a 5-minute timeout.
|
||||
Present the full output verbatim under a `CODEX SAYS (code review):` header.
|
||||
Check the output for `[P1]` markers — if found, note `GATE: FAIL`, otherwise `GATE: PASS`.
|
||||
After presenting, compare Codex's findings with your own review findings from Steps 4-5
|
||||
and output a CROSS-MODEL ANALYSIS showing what both found, what only Codex found,
|
||||
and what only Claude found.
|
||||
|
||||
**For adversarial challenge (B or C):** Run:
|
||||
If the user chooses A: persist the setting and run both:
|
||||
```bash
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, failure modes. Be adversarial." -s read-only
|
||||
~/.claude/skills/gstack/bin/gstack-config set codex_reviews enabled
|
||||
```
|
||||
Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header.
|
||||
|
||||
**Only if a code review ran (user chose A or C):** Persist the Codex review result to the review log:
|
||||
If the user chooses B: run both this time but do not persist any setting.
|
||||
|
||||
If the user chooses C: persist the opt-out and skip:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-config set codex_reviews disabled
|
||||
```
|
||||
Then skip this step. Continue to the next step.
|
||||
|
||||
### Run Codex
|
||||
|
||||
Always run **both** code review and adversarial challenge. Use a 5-minute timeout (`timeout: 300000`) on each Bash call.
|
||||
|
||||
**Code review:** Run:
|
||||
```bash
|
||||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
Present the full output verbatim under a `CODEX SAYS (code review):` header:
|
||||
|
||||
```
|
||||
CODEX SAYS (code review):
|
||||
════════════════════════════════════════════════════════════
|
||||
<full codex output, verbatim — do not truncate or summarize>
|
||||
════════════════════════════════════════════════════════════
|
||||
GATE: PASS Tokens: N | Est. cost: ~$X.XX
|
||||
```
|
||||
|
||||
Check the output for `[P1]` markers. If found: `GATE: FAIL`. If no `[P1]`: `GATE: PASS`.
|
||||
|
||||
**If GATE is FAIL:** use AskUserQuestion:
|
||||
|
||||
```
|
||||
Codex found N critical issues in the diff.
|
||||
|
||||
A) Investigate and fix now (recommended)
|
||||
B) Ship anyway — these issues may cause production problems
|
||||
```
|
||||
|
||||
If the user chooses A: read the Codex findings carefully and work to address them. Then re-run `codex review` to verify the gate is now PASS.
|
||||
|
||||
If the user chooses B: continue to the next step.
|
||||
|
||||
**Adversarial challenge:** Run:
|
||||
```bash
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached
|
||||
```
|
||||
|
||||
Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header. This is informational — it never blocks shipping.
|
||||
|
||||
**Cross-model analysis:** After both Codex outputs are presented, compare Codex's findings with your own review findings from the earlier review steps and output:
|
||||
|
||||
```
|
||||
CROSS-MODEL ANALYSIS:
|
||||
Both found: [findings that overlap between Claude and Codex]
|
||||
Only Codex found: [findings unique to Codex]
|
||||
Only Claude found: [findings unique to Claude's review]
|
||||
Agreement rate: X% (N/M total unique findings overlap)
|
||||
```
|
||||
|
||||
**Persist the code review result:**
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}'
|
||||
```
|
||||
|
||||
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
|
||||
|
||||
**Do NOT persist a codex-review entry when only the adversarial challenge (B) ran** —
|
||||
there is no gate verdict to record, and a false entry would make the Review Readiness
|
||||
Dashboard believe a code review happened when it didn't.
|
||||
### Error handling
|
||||
|
||||
If Codex is not available, skip this step silently.
|
||||
All errors are non-blocking — Codex is a quality enhancement, not a prerequisite.
|
||||
|
||||
- **Auth failure:** If codex prints an auth error to stderr, tell the user: "Codex authentication failed. Run \`codex login\` in your terminal to authenticate via ChatGPT." Continue to the next step.
|
||||
- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Continue to the next step.
|
||||
- **Empty response:** If codex returns no output, tell the user: "Codex returned no response. Check stderr for errors." Continue to the next step.
|
||||
|
||||
---
|
||||
|
||||
|
||||
+1
-48
@@ -231,54 +231,7 @@ If no documentation files exist, skip this step silently.
|
||||
|
||||
---
|
||||
|
||||
## Step 5.7: Codex second opinion (optional)
|
||||
|
||||
After completing the review, check if the Codex CLI is available:
|
||||
|
||||
```bash
|
||||
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
|
||||
```
|
||||
|
||||
If Codex is available, use AskUserQuestion:
|
||||
|
||||
```
|
||||
Review complete. Want an independent second opinion from Codex (OpenAI)?
|
||||
|
||||
A) Run Codex code review — independent diff review with pass/fail gate
|
||||
B) Run Codex adversarial challenge — try to find ways this code will fail in production
|
||||
C) Both — review first, then adversarial challenge
|
||||
D) Skip — no Codex review needed
|
||||
```
|
||||
|
||||
If the user chooses A, B, or C:
|
||||
|
||||
**For code review (A or C):** Run `codex review --base <base>` with a 5-minute timeout.
|
||||
Present the full output verbatim under a `CODEX SAYS (code review):` header.
|
||||
Check the output for `[P1]` markers — if found, note `GATE: FAIL`, otherwise `GATE: PASS`.
|
||||
After presenting, compare Codex's findings with your own review findings from Steps 4-5
|
||||
and output a CROSS-MODEL ANALYSIS showing what both found, what only Codex found,
|
||||
and what only Claude found.
|
||||
|
||||
**For adversarial challenge (B or C):** Run:
|
||||
```bash
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, failure modes. Be adversarial." -s read-only
|
||||
```
|
||||
Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header.
|
||||
|
||||
**Only if a code review ran (user chose A or C):** Persist the Codex review result to the review log:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}'
|
||||
```
|
||||
|
||||
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
|
||||
|
||||
**Do NOT persist a codex-review entry when only the adversarial challenge (B) ran** —
|
||||
there is no gate verdict to record, and a false entry would make the Review Readiness
|
||||
Dashboard believe a code review happened when it didn't.
|
||||
|
||||
If Codex is not available, skip this step silently.
|
||||
|
||||
---
|
||||
{{CODEX_REVIEW_STEP}}
|
||||
|
||||
## Important Rules
|
||||
|
||||
|
||||
+117
-1
@@ -1092,7 +1092,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
|
||||
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \\\`gstack-config set skip_eng_review true\\\` (the "don't bother me" setting).
|
||||
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
|
||||
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
|
||||
- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed.
|
||||
- **Codex Review (enabled by default when Codex CLI is installed):** Independent review + adversarial challenge from OpenAI Codex CLI. Shows pass/fail gate. Runs automatically when enabled — configure with \\\`gstack-config set codex_reviews enabled|disabled\\\`.
|
||||
|
||||
**Verdict logic:**
|
||||
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \\\`skip_eng_review\\\` is \\\`true\\\`)
|
||||
@@ -1412,6 +1412,121 @@ The screenshot file at \`/tmp/gstack-sketch.png\` can be referenced by downstrea
|
||||
(\`/plan-design-review\`, \`/design-review\`) to see what was originally envisioned.`;
|
||||
}
|
||||
|
||||
function generateCodexReviewStep(ctx: TemplateContext): string {
|
||||
// Codex host: strip entirely — Codex should never invoke itself
|
||||
if (ctx.host === 'codex') return '';
|
||||
|
||||
const isShip = ctx.skillName === 'ship';
|
||||
const stepNum = isShip ? '3.8' : '5.7';
|
||||
|
||||
return `## Step ${stepNum}: Codex review
|
||||
|
||||
Check if the Codex CLI is available and read the user's Codex review preference:
|
||||
|
||||
\`\`\`bash
|
||||
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
|
||||
CODEX_REVIEWS_CFG=$(~/.claude/skills/gstack/bin/gstack-config get codex_reviews 2>/dev/null || true)
|
||||
echo "CODEX_REVIEWS: \${CODEX_REVIEWS_CFG:-not_set}"
|
||||
\`\`\`
|
||||
|
||||
If \`CODEX_NOT_AVAILABLE\`: skip this step silently. Continue to the next step.
|
||||
|
||||
If \`CODEX_REVIEWS\` is \`disabled\`: skip this step silently. Continue to the next step.
|
||||
|
||||
If \`CODEX_REVIEWS\` is \`enabled\`: run both code review and adversarial challenge automatically (no prompt). Jump to the "Run Codex" section below.
|
||||
|
||||
If \`CODEX_REVIEWS\` is \`not_set\`: use AskUserQuestion to offer the one-time adoption prompt:
|
||||
|
||||
\`\`\`
|
||||
GStack recommends enabling Codex code reviews — Codex is the super smart quiet engineer friend who will save your butt.
|
||||
|
||||
A) Enable for all future runs (recommended, default)
|
||||
B) Try it for now, ask me again later
|
||||
C) No thanks, don't ask me again
|
||||
\`\`\`
|
||||
|
||||
If the user chooses A: persist the setting and run both:
|
||||
\`\`\`bash
|
||||
~/.claude/skills/gstack/bin/gstack-config set codex_reviews enabled
|
||||
\`\`\`
|
||||
|
||||
If the user chooses B: run both this time but do not persist any setting.
|
||||
|
||||
If the user chooses C: persist the opt-out and skip:
|
||||
\`\`\`bash
|
||||
~/.claude/skills/gstack/bin/gstack-config set codex_reviews disabled
|
||||
\`\`\`
|
||||
Then skip this step. Continue to the next step.
|
||||
|
||||
### Run Codex
|
||||
|
||||
Always run **both** code review and adversarial challenge. Use a 5-minute timeout (\`timeout: 300000\`) on each Bash call.
|
||||
|
||||
**Code review:** Run:
|
||||
\`\`\`bash
|
||||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
\`\`\`
|
||||
|
||||
Present the full output verbatim under a \`CODEX SAYS (code review):\` header:
|
||||
|
||||
\`\`\`
|
||||
CODEX SAYS (code review):
|
||||
════════════════════════════════════════════════════════════
|
||||
<full codex output, verbatim — do not truncate or summarize>
|
||||
════════════════════════════════════════════════════════════
|
||||
GATE: PASS Tokens: N | Est. cost: ~$X.XX
|
||||
\`\`\`
|
||||
|
||||
Check the output for \`[P1]\` markers. If found: \`GATE: FAIL\`. If no \`[P1]\`: \`GATE: PASS\`.
|
||||
|
||||
**If GATE is FAIL:** use AskUserQuestion:
|
||||
|
||||
\`\`\`
|
||||
Codex found N critical issues in the diff.
|
||||
|
||||
A) Investigate and fix now (recommended)
|
||||
B) Ship anyway — these issues may cause production problems
|
||||
\`\`\`
|
||||
|
||||
If the user chooses A: read the Codex findings carefully and work to address them${isShip ? '. After fixing, re-run tests (Step 3) since code has changed' : ''}. Then re-run \`codex review\` to verify the gate is now PASS.
|
||||
|
||||
If the user chooses B: continue to the next step.
|
||||
|
||||
**Adversarial challenge:** Run:
|
||||
\`\`\`bash
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached
|
||||
\`\`\`
|
||||
|
||||
Present the full output verbatim under a \`CODEX SAYS (adversarial challenge):\` header. This is informational — it never blocks shipping.
|
||||
${!isShip ? `
|
||||
**Cross-model analysis:** After both Codex outputs are presented, compare Codex's findings with your own review findings from the earlier review steps and output:
|
||||
|
||||
\`\`\`
|
||||
CROSS-MODEL ANALYSIS:
|
||||
Both found: [findings that overlap between Claude and Codex]
|
||||
Only Codex found: [findings unique to Codex]
|
||||
Only Claude found: [findings unique to Claude's review]
|
||||
Agreement rate: X% (N/M total unique findings overlap)
|
||||
\`\`\`
|
||||
` : ''}
|
||||
**Persist the code review result:**
|
||||
\`\`\`bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}'
|
||||
\`\`\`
|
||||
|
||||
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
|
||||
|
||||
### Error handling
|
||||
|
||||
All errors are non-blocking — Codex is a quality enhancement, not a prerequisite.
|
||||
|
||||
- **Auth failure:** If codex prints an auth error to stderr, tell the user: "Codex authentication failed. Run \\\`codex login\\\` in your terminal to authenticate via ChatGPT." Continue to the next step.
|
||||
- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Continue to the next step.
|
||||
- **Empty response:** If codex returns no output, tell the user: "Codex returned no response. Check stderr for errors." Continue to the next step.
|
||||
|
||||
---`;
|
||||
}
|
||||
|
||||
const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
|
||||
COMMAND_REFERENCE: generateCommandReference,
|
||||
SNAPSHOT_FLAGS: generateSnapshotFlags,
|
||||
@@ -1426,6 +1541,7 @@ const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
|
||||
SPEC_REVIEW_LOOP: generateSpecReviewLoop,
|
||||
DESIGN_SKETCH: generateDesignSketch,
|
||||
BENEFITS_FROM: generateBenefitsFrom,
|
||||
CODEX_REVIEW_STEP: generateCodexReviewStep,
|
||||
};
|
||||
|
||||
// ─── Codex Helpers ───────────────────────────────────────────
|
||||
|
||||
+80
-21
@@ -305,7 +305,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
|
||||
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting).
|
||||
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
|
||||
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
|
||||
- **Codex Review (optional):** Independent second opinion from OpenAI Codex CLI. Shows pass/fail gate. Recommend for critical code changes where a second AI perspective adds value. Skip when Codex CLI is not installed.
|
||||
- **Codex Review (enabled by default when Codex CLI is installed):** Independent review + adversarial challenge from OpenAI Codex CLI. Shows pass/fail gate. Runs automatically when enabled — configure with \`gstack-config set codex_reviews enabled|disabled\`.
|
||||
|
||||
**Verdict logic:**
|
||||
- **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`)
|
||||
@@ -847,41 +847,100 @@ For each classified comment:
|
||||
|
||||
---
|
||||
|
||||
## Step 3.8: Codex second opinion (optional)
|
||||
## Step 3.8: Codex review
|
||||
|
||||
Check if the Codex CLI is available:
|
||||
Check if the Codex CLI is available and read the user's Codex review preference:
|
||||
|
||||
```bash
|
||||
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
|
||||
CODEX_REVIEWS_CFG=$(~/.claude/skills/gstack/bin/gstack-config get codex_reviews 2>/dev/null || true)
|
||||
echo "CODEX_REVIEWS: ${CODEX_REVIEWS_CFG:-not_set}"
|
||||
```
|
||||
|
||||
If Codex is available, use AskUserQuestion:
|
||||
If `CODEX_NOT_AVAILABLE`: skip this step silently. Continue to the next step.
|
||||
|
||||
If `CODEX_REVIEWS` is `disabled`: skip this step silently. Continue to the next step.
|
||||
|
||||
If `CODEX_REVIEWS` is `enabled`: run both code review and adversarial challenge automatically (no prompt). Jump to the "Run Codex" section below.
|
||||
|
||||
If `CODEX_REVIEWS` is `not_set`: use AskUserQuestion to offer the one-time adoption prompt:
|
||||
|
||||
```
|
||||
Pre-landing review complete. Want an independent Codex (OpenAI) review before shipping?
|
||||
GStack recommends enabling Codex code reviews — Codex is the super smart quiet engineer friend who will save your butt.
|
||||
|
||||
A) Run Codex code review — independent diff review with pass/fail gate
|
||||
B) Run Codex adversarial challenge — try to break this code
|
||||
C) Skip — ship without Codex review
|
||||
A) Enable for all future runs (recommended, default)
|
||||
B) Try it for now, ask me again later
|
||||
C) No thanks, don't ask me again
|
||||
```
|
||||
|
||||
If the user chooses A or B:
|
||||
|
||||
**For code review (A):** Run `codex review --base <base>` with a 5-minute timeout.
|
||||
Present the full output verbatim under a `CODEX SAYS:` header. Check for `[P1]` markers
|
||||
to determine pass/fail gate. Persist the result:
|
||||
|
||||
If the user chooses A: persist the setting and run both:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE"}'
|
||||
~/.claude/skills/gstack/bin/gstack-config set codex_reviews enabled
|
||||
```
|
||||
|
||||
If GATE is FAIL, use AskUserQuestion: "Codex found critical issues. Ship anyway?"
|
||||
If the user says no, stop. If yes, continue to Step 4.
|
||||
If the user chooses B: run both this time but do not persist any setting.
|
||||
|
||||
**For adversarial (B):** Run codex exec with the adversarial prompt (see /codex skill).
|
||||
Present findings. This is informational — does not block shipping.
|
||||
If the user chooses C: persist the opt-out and skip:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-config set codex_reviews disabled
|
||||
```
|
||||
Then skip this step. Continue to the next step.
|
||||
|
||||
If Codex is not available, skip silently. Continue to Step 4.
|
||||
### Run Codex
|
||||
|
||||
Always run **both** code review and adversarial challenge. Use a 5-minute timeout (`timeout: 300000`) on each Bash call.
|
||||
|
||||
**Code review:** Run:
|
||||
```bash
|
||||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||||
```
|
||||
|
||||
Present the full output verbatim under a `CODEX SAYS (code review):` header:
|
||||
|
||||
```
|
||||
CODEX SAYS (code review):
|
||||
════════════════════════════════════════════════════════════
|
||||
<full codex output, verbatim — do not truncate or summarize>
|
||||
════════════════════════════════════════════════════════════
|
||||
GATE: PASS Tokens: N | Est. cost: ~$X.XX
|
||||
```
|
||||
|
||||
Check the output for `[P1]` markers. If found: `GATE: FAIL`. If no `[P1]`: `GATE: PASS`.
|
||||
|
||||
**If GATE is FAIL:** use AskUserQuestion:
|
||||
|
||||
```
|
||||
Codex found N critical issues in the diff.
|
||||
|
||||
A) Investigate and fix now (recommended)
|
||||
B) Ship anyway — these issues may cause production problems
|
||||
```
|
||||
|
||||
If the user chooses A: read the Codex findings carefully and work to address them. After fixing, re-run tests (Step 3) since code has changed. Then re-run `codex review` to verify the gate is now PASS.
|
||||
|
||||
If the user chooses B: continue to the next step.
|
||||
|
||||
**Adversarial challenge:** Run:
|
||||
```bash
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached
|
||||
```
|
||||
|
||||
Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header. This is informational — it never blocks shipping.
|
||||
|
||||
**Persist the code review result:**
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}'
|
||||
```
|
||||
|
||||
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
|
||||
|
||||
### Error handling
|
||||
|
||||
All errors are non-blocking — Codex is a quality enhancement, not a prerequisite.
|
||||
|
||||
- **Auth failure:** If codex prints an auth error to stderr, tell the user: "Codex authentication failed. Run \`codex login\` in your terminal to authenticate via ChatGPT." Continue to the next step.
|
||||
- **Timeout:** If the Bash call times out (5 min), tell the user: "Codex timed out after 5 minutes. The diff may be too large or the API may be slow." Continue to the next step.
|
||||
- **Empty response:** If codex returns no output, tell the user: "Codex returned no response. Check stderr for errors." Continue to the next step.
|
||||
|
||||
---
|
||||
|
||||
@@ -1124,7 +1183,7 @@ doc updates — the user runs `/ship` and documentation stays current without a
|
||||
- **Never skip tests.** If tests fail, stop.
|
||||
- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
|
||||
- **Never force push.** Use regular `git push` only.
|
||||
- **Never ask for confirmation** except for MINOR/MAJOR version bumps and pre-landing review ASK items (batched into at most one AskUserQuestion).
|
||||
- **Never ask for trivial confirmations** (e.g., "ready to push?", "create PR?"). DO stop for: version bumps (MINOR/MAJOR), pre-landing review findings (ASK items), Codex critical findings ([P1]), and the one-time Codex adoption prompt.
|
||||
- **Always use the 4-digit version format** from the VERSION file.
|
||||
- **Date format in CHANGELOG:** `YYYY-MM-DD`
|
||||
- **Split commits for bisectability** — each commit = one logical change.
|
||||
|
||||
+2
-38
@@ -403,43 +403,7 @@ For each classified comment:
|
||||
|
||||
---
|
||||
|
||||
## Step 3.8: Codex second opinion (optional)
|
||||
|
||||
Check if the Codex CLI is available:
|
||||
|
||||
```bash
|
||||
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
|
||||
```
|
||||
|
||||
If Codex is available, use AskUserQuestion:
|
||||
|
||||
```
|
||||
Pre-landing review complete. Want an independent Codex (OpenAI) review before shipping?
|
||||
|
||||
A) Run Codex code review — independent diff review with pass/fail gate
|
||||
B) Run Codex adversarial challenge — try to break this code
|
||||
C) Skip — ship without Codex review
|
||||
```
|
||||
|
||||
If the user chooses A or B:
|
||||
|
||||
**For code review (A):** Run `codex review --base <base>` with a 5-minute timeout.
|
||||
Present the full output verbatim under a `CODEX SAYS:` header. Check for `[P1]` markers
|
||||
to determine pass/fail gate. Persist the result:
|
||||
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE"}'
|
||||
```
|
||||
|
||||
If GATE is FAIL, use AskUserQuestion: "Codex found critical issues. Ship anyway?"
|
||||
If the user says no, stop. If yes, continue to Step 4.
|
||||
|
||||
**For adversarial (B):** Run codex exec with the adversarial prompt (see /codex skill).
|
||||
Present findings. This is informational — does not block shipping.
|
||||
|
||||
If Codex is not available, skip silently. Continue to Step 4.
|
||||
|
||||
---
|
||||
{{CODEX_REVIEW_STEP}}
|
||||
|
||||
## Step 4: Version bump (auto-decide)
|
||||
|
||||
@@ -680,7 +644,7 @@ doc updates — the user runs `/ship` and documentation stays current without a
|
||||
- **Never skip tests.** If tests fail, stop.
|
||||
- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
|
||||
- **Never force push.** Use regular `git push` only.
|
||||
- **Never ask for confirmation** except for MINOR/MAJOR version bumps and pre-landing review ASK items (batched into at most one AskUserQuestion).
|
||||
- **Never ask for trivial confirmations** (e.g., "ready to push?", "create PR?"). DO stop for: version bumps (MINOR/MAJOR), pre-landing review findings (ASK items), Codex critical findings ([P1]), and the one-time Codex adoption prompt.
|
||||
- **Always use the 4-digit version format** from the VERSION file.
|
||||
- **Date format in CHANGELOG:** `YYYY-MM-DD`
|
||||
- **Split commits for bisectability** — each commit = one logical change.
|
||||
|
||||
@@ -584,6 +584,16 @@ describe('Codex generation (--host codex)', () => {
|
||||
expect(fs.existsSync(path.join(AGENTS_DIR, 'gstack-codex'))).toBe(false);
|
||||
});
|
||||
|
||||
test('Codex review step stripped from Codex-host ship and review', () => {
|
||||
const shipContent = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-ship', 'SKILL.md'), 'utf-8');
|
||||
expect(shipContent).not.toContain('codex review --base');
|
||||
expect(shipContent).not.toContain('Investigate and fix');
|
||||
|
||||
const reviewContent = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-review', 'SKILL.md'), 'utf-8');
|
||||
expect(reviewContent).not.toContain('codex review --base');
|
||||
expect(reviewContent).not.toContain('Investigate and fix');
|
||||
});
|
||||
|
||||
test('--host codex --dry-run freshness', () => {
|
||||
const result = Bun.spawnSync(['bun', 'run', 'scripts/gen-skill-docs.ts', '--host', 'codex', '--dry-run'], {
|
||||
cwd: ROOT,
|
||||
|
||||
@@ -1256,18 +1256,36 @@ describe('Codex skill', () => {
|
||||
expect(content).toContain('mktemp');
|
||||
});
|
||||
|
||||
test('codex integration in /review offers second opinion', () => {
|
||||
test('codex integration in /review has config-driven review step', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('Codex second opinion');
|
||||
expect(content).toContain('Codex review');
|
||||
expect(content).toContain('codex_reviews');
|
||||
expect(content).toContain('codex review');
|
||||
expect(content).toContain('adversarial');
|
||||
expect(content).toContain('xhigh');
|
||||
expect(content).toContain('Investigate and fix');
|
||||
expect(content).toContain('CROSS-MODEL');
|
||||
});
|
||||
|
||||
test('codex integration in /ship offers review gate', () => {
|
||||
test('codex integration in /ship has config-driven review step', () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||||
expect(content).toContain('Codex');
|
||||
expect(content).toContain('Codex review');
|
||||
expect(content).toContain('codex_reviews');
|
||||
expect(content).toContain('codex review');
|
||||
expect(content).toContain('codex-review');
|
||||
expect(content).toContain('xhigh');
|
||||
expect(content).toContain('Investigate and fix');
|
||||
});
|
||||
|
||||
test('codex-host ship/review do NOT contain codex review step', () => {
|
||||
const shipContent = fs.readFileSync(path.join(ROOT, '.agents', 'skills', 'gstack-ship', 'SKILL.md'), 'utf-8');
|
||||
expect(shipContent).not.toContain('codex review --base');
|
||||
expect(shipContent).not.toContain('Investigate and fix');
|
||||
|
||||
const reviewContent = fs.readFileSync(path.join(ROOT, '.agents', 'skills', 'gstack-review', 'SKILL.md'), 'utf-8');
|
||||
expect(reviewContent).not.toContain('codex review --base');
|
||||
expect(reviewContent).not.toContain('codex_reviews');
|
||||
expect(reviewContent).not.toContain('Investigate and fix');
|
||||
});
|
||||
|
||||
test('codex integration in /plan-eng-review offers plan critique', () => {
|
||||
|
||||
Reference in New Issue
Block a user