gstack/bin/gstack-codex-probe
Garry Tan 9ec4ab7eb9 codex + Apple Silicon hardening wave (v0.18.4.0) (#1056)
* fix: ad-hoc codesign compiled binaries on Apple Silicon after build

On some Apple Silicon machines, Bun's --compile produces a corrupt or
linker-only code signature. macOS kills these binaries with SIGKILL
(exit 137, zsh: killed) before they execute a single instruction.

Add a post-build codesign step to setup that runs only on Darwin arm64:
1. Remove the corrupt/linker-only signature (required — a direct re-sign
   fails with 'invalid or unsupported format for signature')
2. Apply a fresh ad-hoc signature

The step is idempotent, costs <1s, and is what Bun's own docs recommend
for distributed standalone executables. All four compiled binaries are
covered: browse, find-browse, design, and gstack-global-discover.
Failure is a non-fatal warning so Intel/CI builds are unaffected.
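
The two-step re-sign described above might look roughly like this in a setup script. This is a minimal sketch, not the actual ./setup code: the function name `resign_adhoc` is hypothetical, and the function is deliberately a no-op anywhere other than Darwin arm64.

```bash
# Sketch only: resign_adhoc is a hypothetical name, not taken from ./setup.
resign_adhoc() {
  # Only Apple Silicon macOS needs the re-sign; everything else is a no-op.
  [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ] || return 0
  command -v codesign >/dev/null 2>&1 || return 0

  local bin
  for bin in "$@"; do
    [ -f "$bin" ] || continue
    # Step 1: drop the existing (possibly corrupt) signature. Failure is
    # ignored because an unsigned binary has nothing to remove.
    codesign --remove-signature "$bin" 2>/dev/null || true
    # Step 2: apply a fresh ad-hoc signature ("-" means no signing identity).
    if ! codesign --sign - --force "$bin" 2>/dev/null; then
      # Non-fatal: warn and continue so the rest of setup still runs.
      echo "warning: ad-hoc codesign failed for $bin" >&2
    fi
  done
  return 0
}
```

A caller would pass the compiled binaries as arguments, for example `resign_adhoc path/to/browse path/to/design` (placeholder paths). Running it twice is harmless, which matches the idempotency claim above.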

Fixes #997

* fix: prevent codex exec stdin deadlock with </dev/null redirect

codex CLI 0.120.0+ blocks indefinitely when stdin is a non-TTY pipe
(Claude Code Bash tool, background bash, CI). The CLI sees a non-TTY
stdin and waits for EOF to append it as a <stdin> block, even when the
prompt is passed as a positional argument.

Fix: add < /dev/null to every codex exec and codex review invocation
in the source-of-truth files (scripts/resolvers/*.ts and *.md.tmpl).
Generated SKILL.md files will be produced by bun run gen:skill-docs
in a subsequent commit (Tension D: template+resolver only, generator
is authoritative, not cherry-picked artifacts).
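
The effect of the redirect can be reproduced without codex installed. In this sketch, `drain_stdin_then_run` is a hypothetical stand-in for a CLI that reads stdin to EOF before acting on its positional prompt; it is an illustration of the mechanism, not codex's actual behavior in detail.

```bash
# Hypothetical stand-in for a CLI that reads all of stdin before acting on
# its positional prompt. Without EOF on stdin, the `cat` below would block.
drain_stdin_then_run() {
  local prompt="$1" extra
  extra=$(cat)
  printf 'prompt=%s stdin_bytes=%s\n' "$prompt" "${#extra}"
}

# Redirecting from /dev/null gives the command an immediate EOF, so it
# returns right away even when the parent's stdin is an open pipe.
result=$(drain_stdin_then_run "review this diff" < /dev/null)
echo "$result"   # prints: prompt=review this diff stdin_bytes=0
```

The real invocations take the same shape, for example `codex exec "$PROMPT" < /dev/null`, where `$PROMPT` is whatever the template already passes.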

Affected source files (16 total invocations):
- scripts/resolvers/review.ts (4)
- scripts/resolvers/design.ts (3)
- codex/SKILL.md.tmpl (5)
- autoplan/SKILL.md.tmpl (4)

Fixes #971

Co-Authored-By: loning <loning@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: codex/autoplan hardening + Apple Silicon coreutils auto-install

Hardens /codex and /autoplan against silent failures surfaced by the #972
stdin fix and #1003 Apple Silicon codesign. Six-layer defense:

1. **Multi-signal auth probe** (new Step 0.5 / Phase 0.5): env-based auth
   ($CODEX_API_KEY, $OPENAI_API_KEY) OR file-based auth
   (${CODEX_HOME:-~/.codex}/auth.json). Rejects false negatives that the
   old file-only check produced for CI / platform-engineer users.

2. **Timeout wrapper** around every codex exec / codex review invocation:
   gtimeout → timeout → unwrapped fallback chain. On exit 124, surfaces
   common causes + actionable next step. Guards against model-API stalls
   not covered by the #972 stdin fix.

3. **Stderr capture in Challenge mode** (codex/SKILL.md.tmpl:208):
   2>/dev/null → 2>$TMPERR. Post-invocation grep for auth/login/unauthorized
   surfaces errors that were previously dropped silently.

4. **Completeness check** in the Python JSON parser: tracks turn.completed
   events and warns on zero (possible mid-stream disconnect).

5. **Version warning** for known-bad Codex CLI (0.120.0-0.120.2, the range
   that introduced the stdin deadlock #972 fixes). Anchored regex
   `(^|[^0-9.])0\.120\.(0|1|2)([^0-9.]|$)` prevents 0.120.10 / 0.120.20
   false positives.

6. **Failure telemetry + operational learnings**: codex_timeout,
   codex_auth_failed, codex_cli_missing, codex_version_warning events
   land in ~/.gstack/analytics/skill-usage.jsonl behind the existing
   telemetry opt-in. On timeout (exit 124), auto-logs an operational
   learning via gstack-learnings-log so future /investigate sessions
   surface prior hang patterns automatically.
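
Layers 2 and 3 can be pictured together from the caller's side. The following is a minimal sketch under stated assumptions: `run_guarded` and `report_failure` are hypothetical names, the `sh -c` command stands in for a codex invocation, and the classification order is an illustrative choice rather than what the templates do.

```bash
# Sketch only: run_guarded and report_failure are hypothetical names.

# Run a command under a timeout if one is available, sending stderr to a file.
# Usage: run_guarded <seconds> <stderr-file> <command...>
run_guarded() {
  local secs="$1" errfile="$2" to_bin
  shift 2
  to_bin=$(command -v gtimeout 2>/dev/null || command -v timeout 2>/dev/null || true)
  if [ -n "$to_bin" ]; then
    "$to_bin" "$secs" "$@" 2>"$errfile" </dev/null
  else
    "$@" 2>"$errfile" </dev/null   # no timeout binary: run unwrapped
  fi
}

# Turn an exit status plus captured stderr into a one-word diagnosis.
report_failure() {
  local rc="$1" errfile="$2"
  if [ "$rc" -eq 0 ]; then
    echo "ok"
  elif [ "$rc" -eq 124 ]; then
    echo "timeout"               # GNU timeout exits 124 when the limit is hit
  elif grep -Eiq 'auth|login|unauthorized' "$errfile" 2>/dev/null; then
    echo "auth"
  else
    echo "error"
  fi
}

TMPERR=$(mktemp)
rc=0
run_guarded 5 "$TMPERR" sh -c 'echo "401 Unauthorized" >&2; exit 1' || rc=$?
report_failure "$rc" "$TMPERR"   # prints: auth
rm -f "$TMPERR"
```

Because stderr goes to a file instead of /dev/null, the auth message survives long enough to be matched after the command exits.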

**Shared helper** (bin/gstack-codex-probe): consolidates all four pieces
(auth probe, version check, timeout wrapper, telemetry logger) into one
bash file that /codex and /autoplan source. Namespace-prefixed
(_gstack_codex_*) with a unit test that verifies sourcing does not leak
shell options into the caller. pathRewrites in host configs rewrite
~/.claude/skills/gstack → $GSTACK_ROOT for Codex, $GSTACK_BIN for
Factory/Cursor/etc.
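
One way to check that sourcing a helper does not leak shell options is to snapshot the option state before and after. This is a sketch of the idea only, not the test in test/codex-hardening.test.ts; `sourcing_leaks_options` and the two temporary files are hypothetical.

```bash
# Sketch only: sourcing_leaks_options is a hypothetical name.

# Print "clean" if sourcing file $1 leaves shell options untouched, "leaked"
# otherwise. The work happens in a subshell so the caller is never affected.
sourcing_leaks_options() {
  (
    # $- holds the single-letter flags; SHELLOPTS and BASHOPTS are bash-only
    # and also cover long options such as pipefail.
    before="$-:${SHELLOPTS:-}:${BASHOPTS:-}"
    . "$1" >/dev/null 2>&1
    after="$-:${SHELLOPTS:-}:${BASHOPTS:-}"
    if [ "$before" = "$after" ]; then echo "clean"; else echo "leaked"; fi
  )
}

good=$(mktemp)
bad=$(mktemp)
printf '%s\n' '_demo_fn() { echo hi; }' > "$good"               # defs only
printf '%s\n' 'set -euf' '_demo_fn() { echo hi; }' > "$bad"     # changes options

sourcing_leaks_options "$good"   # prints: clean
sourcing_leaks_options "$bad"    # prints: leaked
rm -f "$good" "$bad"
```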

**Apple Silicon coreutils auto-install** (setup:264): macOS lacks GNU
timeout by default; Homebrew's coreutils installs it as gtimeout to
avoid shadowing BSD utilities. ./setup now auto-installs coreutils on
Darwin (arch-agnostic — applies to Intel + Apple Silicon) when neither
gtimeout nor timeout is present. Opt-out via GSTACK_SKIP_COREUTILS=1
for CI, managed machines, or offline envs.
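
The decision described above can be sketched as a small function that reports what it would do rather than doing it. `coreutils_action` is a hypothetical name, and the "Homebrew not found" branch is an assumption about how a missing `brew` might be handled; the actual logic lives in ./setup.

```bash
# Sketch only: coreutils_action is a hypothetical name. It prints the action
# a setup step would take, so it is safe to run on any machine.
# $1 is the OS name as printed by `uname -s`.
coreutils_action() {
  local os="$1"
  if [ "$os" != "Darwin" ]; then
    echo "skip: not macOS"
  elif [ "${GSTACK_SKIP_COREUTILS:-0}" = "1" ]; then
    echo "skip: opted out"
  elif command -v gtimeout >/dev/null 2>&1 || command -v timeout >/dev/null 2>&1; then
    echo "skip: timeout already available"
  elif ! command -v brew >/dev/null 2>&1; then
    echo "warn: Homebrew not found"          # assumed behavior
  else
    echo "install: brew install coreutils"
  fi
}

coreutils_action "$(uname -s)"
```

Passing the OS name as an argument keeps the branches testable on any host; the opt-out check comes before any probing so CI and offline machines exit early.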

**25 deterministic unit tests** (test/codex-hardening.test.ts):
- 8 auth probe combinations (env precedence, whitespace, alternate
  $CODEX_HOME, corrupt file paths)
- 10 version regex cases including 0.120.10 false-positive guards
  and v-prefixed / multiline output
- 4 timeout wrapper + namespace hygiene (bash -n, gtimeout
  preference, set-option leak check)
- 3 telemetry payload schema checks (confirms env values + auth
  tokens never leak into emitted events)
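
The version regex cases can be exercised by hand. In this sketch the regex is the one quoted earlier in this message; `is_known_bad` is a hypothetical name and the sample version strings are illustrative, not captured CLI output.

```bash
# Sketch only: is_known_bad is a hypothetical name.
is_known_bad() {
  printf '%s\n' "$1" | grep -Eq '(^|[^0-9.])0\.120\.(0|1|2)([^0-9.]|$)'
}

for v in "codex-cli 0.120.0" "v0.120.1" "codex-cli 0.120.2-beta" \
         "codex-cli 0.120.10" "codex-cli 0.120.20" "codex-cli 0.121.0"; do
  if is_known_bad "$v"; then
    echo "WARN  $v"
  else
    echo "ok    $v"
  fi
done
# The first three lines print WARN; the last three print ok.
```

The trailing `([^0-9.]|$)` is what rejects 0.120.10 and 0.120.20: the character after the patch digit must not be another digit or a dot.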

**1 periodic-tier E2E** (test/skill-e2e-autoplan-dual-voice.test.ts):
gates the /autoplan dual-voice path — asserts both Claude subagent
and Codex voices produce output in Phase 1, OR that [codex-unavailable]
is logged when Codex is absent. ~$1/run, not a CI gate.

Golden baseline + gen-skill-docs exclusion list updated for the new
codex path references and the 16 < /dev/null redirects from #972.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: plan-review right-sized diff counterbalance (not minimal-diff default)

/plan-ceo-review and /plan-eng-review listed "minimal diff" as an
engineering preference without counterbalancing language. Reviewers
picked up on that and rejected rewrites that should have been approved.

The preference is now framed as "right-sized diff" with explicit
permission to recommend a rewrite when the existing foundation is
broken. Implementation alternatives section in CEO review gets an
equal-weight clarification: don't default to minimal viable just
because it is smaller. Recommend whichever best serves the user's
goal; if the right answer is a rewrite, say so.

Three-line tone edit per template, no voice / ETHOS / YC / promotional
content change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* release: v0.18.4.0 — codex + Apple Silicon hardening wave

- Apple Silicon codesign fix (#1003 @voidborne-d)
- Codex stdin deadlock fix (#972 @loning)
- Codex timeout wrapper (gtimeout → timeout → unwrapped fallback)
- Multi-signal auth gate for /codex + /autoplan
- Codex version warning for known-bad CLI (0.120.0-0.120.2)
- Challenge mode stderr capture + completeness check
- Plan-review right-sized diff counterbalance
- Failure telemetry + auto-log timeout as operational learning
- 25 deterministic unit tests + dual-voice periodic E2E

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
Co-authored-by: loning <loning@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:30:54 +08:00

#!/usr/bin/env bash
# gstack-codex-probe: shared helper for /codex and /autoplan skills.
# Sourced from template bash blocks; never execute directly.
#
# Functions (all prefixed with _gstack_codex_ for namespace hygiene):
#   _gstack_codex_auth_probe        multi-signal auth check (env + file)
#   _gstack_codex_version_check     warn on known-bad Codex CLI versions
#   _gstack_codex_timeout_wrapper   gtimeout -> timeout -> unwrapped fallback
#   _gstack_codex_log_event         telemetry emission to ~/.gstack/analytics/
#   _gstack_codex_log_hang          record an operational learning on timeout
#
# Hygiene rules (enforced by test/codex-hardening.test.ts):
#   - Never set -e / set -u / trap / IFS= / PATH= in this file.
#   - All internal vars prefix with _GSTACK_CODEX_.
#   - All functions prefix with _gstack_codex_.
#   - No command execution at source time (only function defs).

# --- Auth probe -------------------------------------------------------------
_gstack_codex_auth_probe() {
  # Multi-signal: env vars OR auth file. Avoids false negatives for env-auth
  # users (CI, platform engineers) that a file-only check would reject.
  local _codex_home="${CODEX_HOME:-$HOME/.codex}"
  # [ -n ] is true for any non-empty string, including one that is only
  # whitespace, so strip whitespace first: a blank-padded key counts as unset.
  local _k1 _k2
  _k1=$(printf '%s' "${CODEX_API_KEY:-}" | tr -d '[:space:]')
  _k2=$(printf '%s' "${OPENAI_API_KEY:-}" | tr -d '[:space:]')
  if [ -n "$_k1" ] || [ -n "$_k2" ] || [ -f "$_codex_home/auth.json" ]; then
    echo "AUTH_OK"
    return 0
  fi
  echo "AUTH_FAILED"
  return 1
}

# --- Version check ----------------------------------------------------------
_gstack_codex_version_check() {
  # Warn on known-bad Codex CLI versions. Anchored regex prevents false
  # positives like 0.120.10 or 0.120.20 from matching. 0.120.2-beta still
  # matches the bad release and gets warned (it IS buggy).
  # Update this list when a new Codex CLI version regresses.
  local _ver
  _ver=$(codex --version 2>/dev/null | head -1)
  [ -z "$_ver" ] && return 0
  if echo "$_ver" | grep -Eq '(^|[^0-9.])0\.120\.(0|1|2)([^0-9.]|$)'; then
    echo "WARN: Codex CLI $_ver has known stdin deadlock bugs. Run: npm install -g @openai/codex@latest"
    _gstack_codex_log_event "codex_version_warning"
  fi
}

# --- Timeout wrapper --------------------------------------------------------
_gstack_codex_timeout_wrapper() {
  # Resolve wrapper binary: prefer gtimeout (Homebrew coreutils on macOS),
  # fall back to timeout (Linux), else run unwrapped. Arguments: $1 is the
  # duration in seconds; rest is the command to run.
  local _duration="$1"
  shift
  local _to
  _to=$(command -v gtimeout 2>/dev/null || command -v timeout 2>/dev/null || echo "")
  if [ -n "$_to" ]; then
    "$_to" "$_duration" "$@"
  else
    "$@"
  fi
}

# --- Telemetry event --------------------------------------------------------
_gstack_codex_log_event() {
  # Emit a telemetry event to ~/.gstack/analytics/skill-usage.jsonl.
  # Gated on $_TEL != "off" (caller sets this from gstack-config).
  # Event types: codex_timeout, codex_auth_failed, codex_cli_missing,
  # codex_version_warning.
  # Payload schema: {skill, event, duration_s, ts}. NEVER includes prompt
  # content, env var values, or auth tokens.
  local _event="$1"
  local _duration="${2:-0}"
  [ "${_TEL:-off}" = "off" ] && return 0
  mkdir -p "$HOME/.gstack/analytics" 2>/dev/null || return 0
  local _ts
  _ts=$(date -u +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || echo unknown)
  printf '{"skill":"codex","event":"%s","duration_s":"%s","ts":"%s"}\n' \
    "$_event" "$_duration" "$_ts" \
    >> "$HOME/.gstack/analytics/skill-usage.jsonl" 2>/dev/null || true
}

# --- Learnings log on hang --------------------------------------------------
_gstack_codex_log_hang() {
  # Invoked when a codex invocation times out (exit 124). Records an
  # operational learning so future /investigate sessions surface the pattern.
  # Best-effort: errors swallowed.
  local _mode="${1:-unknown}"
  local _prompt_size="${2:-0}"
  local _log_bin="$HOME/.claude/skills/gstack/bin/gstack-learnings-log"
  [ -x "$_log_bin" ] || return 0
  local _key="codex-hang-$(date +%s 2>/dev/null || echo unknown)"
  "$_log_bin" "$(printf '{"skill":"codex","type":"operational","key":"%s","insight":"Codex timed out after 600s during [%s] invocation. Prompt size: %s. Consider splitting prompt or checking network.","confidence":8,"source":"observed","files":["codex/SKILL.md.tmpl","autoplan/SKILL.md.tmpl"]}' "$_key" "$_mode" "$_prompt_size")" \
    >/dev/null 2>&1 || true
}
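
The whitespace handling in `_gstack_codex_auth_probe` is easy to check in isolation. The sketch below uses a hypothetical `has_value` helper that mirrors the strip-then-test step; the key value `sk-example` is a placeholder, not a real credential format claim.

```bash
# Sketch only: has_value is a hypothetical name mirroring the strip-then-test
# step in the auth probe.
has_value() {
  local stripped
  stripped=$(printf '%s' "$1" | tr -d '[:space:]')
  [ -n "$stripped" ]
}

key="   "   # whitespace-only value, e.g. from a malformed export
[ -n "$key" ] && echo "plain -n: set"              # prints: plain -n: set
has_value "$key" || echo "stripped: empty"         # prints: stripped: empty
has_value "sk-example" && echo "stripped: set"     # prints: stripped: set
```

A plain `[ -n ]` would accept the blank value and report AUTH_OK even though no usable key is present; stripping first closes that gap.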