Files
gstack/bin/gstack-codex-probe
T
Garry Tan 4bf5ae12c0 feat: codex/autoplan hardening + Apple Silicon coreutils auto-install
Hardens /codex and /autoplan against silent failures surfaced by the #972
stdin fix and #1003 Apple Silicon codesign. Six-layer defense:

1. **Multi-signal auth probe** (new Step 0.5 / Phase 0.5): env-based auth
   ($CODEX_API_KEY, $OPENAI_API_KEY) OR file-based auth
   (${CODEX_HOME:-~/.codex}/auth.json). Rejects false negatives that the
   old file-only check produced for CI / platform-engineer users.

2. **Timeout wrapper** around every codex exec / codex review invocation:
   gtimeout → timeout → unwrapped fallback chain. On exit 124, surfaces
   common causes + actionable next step. Guards against model-API stalls
   not covered by the #972 stdin fix.

3. **Stderr capture in Challenge mode** (codex/SKILL.md.tmpl:208):
   2>/dev/null → 2>$TMPERR. Post-invocation grep for auth/login/unauthorized
   surfaces errors that were previously dropped silently.

4. **Completeness check** in the Python JSON parser: tracks turn.completed
   events and warns on zero (possible mid-stream disconnect).

5. **Version warning** for known-bad Codex CLI (0.120.0-0.120.2, the range
   that introduced the stdin deadlock #972 fixes). Anchored regex
   `(^|[^0-9.])0\.120\.(0|1|2)([^0-9.]|$)` prevents 0.120.10 / 0.120.20
   false positives.

6. **Failure telemetry + operational learnings**: codex_timeout,
   codex_auth_failed, codex_cli_missing, codex_version_warning events
   land in ~/.gstack/analytics/skill-usage.jsonl behind the existing
   telemetry opt-in. On timeout (exit 124), auto-logs an operational
   learning via gstack-learnings-log so future /investigate sessions
   surface prior hang patterns automatically.

**Shared helper** (bin/gstack-codex-probe): consolidates all four pieces
(auth probe, version check, timeout wrapper, telemetry logger) into one
bash file that /codex and /autoplan source. Namespace-prefixed
(_gstack_codex_*) with a unit test that verifies sourcing does not leak
shell options into the caller. pathRewrites in host configs rewrite
~/.claude/skills/gstack → $GSTACK_ROOT for Codex, $GSTACK_BIN for
Factory/Cursor/etc.

**Apple Silicon coreutils auto-install** (setup:264): macOS lacks GNU
timeout by default; Homebrew's coreutils installs it as gtimeout to
avoid shadowing BSD utilities. ./setup now auto-installs coreutils on
Darwin (arch-agnostic — applies to Intel + Apple Silicon) when neither
gtimeout nor timeout is present. Opt-out via GSTACK_SKIP_COREUTILS=1
for CI, managed machines, or offline envs.

**25 deterministic unit tests** (test/codex-hardening.test.ts):
- 8 auth probe combinations (env precedence, whitespace, alternate
  $CODEX_HOME, corrupt file paths)
- 10 version regex cases including 0.120.10 false-positive guards
  and v-prefixed / multiline output
- 4 timeout wrapper + namespace hygiene (bash -n, gtimeout
  preference, set-option leak check)
- 3 telemetry payload schema checks (confirms env values + auth
  tokens never leak into emitted events)

**1 periodic-tier E2E** (test/skill-e2e-autoplan-dual-voice.test.ts):
gates the /autoplan dual-voice path — asserts both Claude subagent
and Codex voices produce output in Phase 1, OR that [codex-unavailable]
is logged when Codex is absent. ~\$1/run, not a CI gate.

Golden baseline + gen-skill-docs exclusion list updated for the new
codex path references and the 16 < /dev/null redirects from #972.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:10:21 +08:00

103 lines
4.5 KiB
Bash
Executable File

#!/usr/bin/env bash
# gstack-codex-probe: shared helper for /codex and /autoplan skills.
# Sourced from template bash blocks; never execute directly.
#
# Functions (all prefixed with _gstack_codex_ for namespace hygiene):
# _gstack_codex_auth_probe — multi-signal auth check (env + file)
# _gstack_codex_version_check — warn on known-bad Codex CLI versions
# _gstack_codex_timeout_wrapper — gtimeout -> timeout -> unwrapped fallback
# _gstack_codex_log_event — telemetry emission to ~/.gstack/analytics/
#
# Hygiene rules (enforced by test/codex-hardening.test.ts):
# - Never set -e / set -u / trap / IFS= / PATH= in this file.
# - All internal vars prefix with _GSTACK_CODEX_.
# - All functions prefix with _gstack_codex_.
# - No command execution at source time (only function defs).
# --- Auth probe -------------------------------------------------------------
_gstack_codex_auth_probe() {
# Multi-signal: env vars OR auth file. Avoids false negatives for env-auth
# users (CI, platform engineers) that a file-only check would reject.
local _codex_home="${CODEX_HOME:-$HOME/.codex}"
# Use `-n` which returns true only for non-empty non-whitespace. Bash's [ -n ]
# alone allows whitespace; pair with a whitespace strip for robustness.
local _k1 _k2
_k1=$(printf '%s' "${CODEX_API_KEY:-}" | tr -d '[:space:]')
_k2=$(printf '%s' "${OPENAI_API_KEY:-}" | tr -d '[:space:]')
if [ -n "$_k1" ] || [ -n "$_k2" ] || [ -f "$_codex_home/auth.json" ]; then
echo "AUTH_OK"
return 0
fi
echo "AUTH_FAILED"
return 1
}
# --- Version check ----------------------------------------------------------
_gstack_codex_version_check() {
# Warn on known-bad Codex CLI versions. Anchored regex prevents false
# positives like 0.120.10 or 0.120.20 from matching. 0.120.2-beta still
# matches the bad release and gets warned (it IS buggy).
# Update this list when a new Codex CLI version regresses.
local _ver
_ver=$(codex --version 2>/dev/null | head -1)
[ -z "$_ver" ] && return 0
if echo "$_ver" | grep -Eq '(^|[^0-9.])0\.120\.(0|1|2)([^0-9.]|$)'; then
echo "WARN: Codex CLI $_ver has known stdin deadlock bugs. Run: npm install -g @openai/codex@latest"
_gstack_codex_log_event "codex_version_warning"
fi
}
# --- Timeout wrapper --------------------------------------------------------
_gstack_codex_timeout_wrapper() {
# Resolve wrapper binary: prefer gtimeout (Homebrew coreutils on macOS),
# fall back to timeout (Linux), else run unwrapped. Arguments: $1 is the
# duration in seconds; rest is the command to run.
local _duration="$1"
shift
local _to
_to=$(command -v gtimeout 2>/dev/null || command -v timeout 2>/dev/null || echo "")
if [ -n "$_to" ]; then
"$_to" "$_duration" "$@"
else
"$@"
fi
}
# --- Telemetry event --------------------------------------------------------
_gstack_codex_log_event() {
# Emit a telemetry event to ~/.gstack/analytics/skill-usage.jsonl.
# Gated on $_TEL != "off" (caller sets this from gstack-config).
# Event types: codex_timeout, codex_auth_failed, codex_cli_missing,
# codex_version_warning.
# Payload schema: {skill, event, duration_s, ts}. NEVER includes prompt
# content, env var values, or auth tokens.
local _event="$1"
local _duration="${2:-0}"
[ "${_TEL:-off}" = "off" ] && return 0
mkdir -p "$HOME/.gstack/analytics" 2>/dev/null || return 0
local _ts
_ts=$(date -u +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || echo unknown)
printf '{"skill":"codex","event":"%s","duration_s":"%s","ts":"%s"}\n' \
"$_event" "$_duration" "$_ts" \
>> "$HOME/.gstack/analytics/skill-usage.jsonl" 2>/dev/null || true
}
# --- Learnings log on hang --------------------------------------------------
_gstack_codex_log_hang() {
# Invoked when a codex invocation times out (exit 124). Records an
# operational learning so future /investigate sessions surface the pattern.
# Best-effort: errors swallowed.
local _mode="${1:-unknown}"
local _prompt_size="${2:-0}"
local _log_bin="$HOME/.claude/skills/gstack/bin/gstack-learnings-log"
[ -x "$_log_bin" ] || return 0
local _key="codex-hang-$(date +%s 2>/dev/null || echo unknown)"
"$_log_bin" "$(printf '{"skill":"codex","type":"operational","key":"%s","insight":"Codex timed out after 600s during [%s] invocation. Prompt size: %s. Consider splitting prompt or checking network.","confidence":8,"source":"observed","files":["codex/SKILL.md.tmpl","autoplan/SKILL.md.tmpl"]}' "$_key" "$_mode" "$_prompt_size")" \
>/dev/null 2>&1 || true
}