fix(auq): harden error-fallback hook + harness per adversarial review

Codex pre-landing review found three real issues:
- The PostToolUse fallback hook shared source 'plan-tune-cathedral' with the
  question-log hook (same event+matcher); gstack-settings-hook replaces the entry,
  so it would have clobbered plan-tune capture. Give it its own 'auq-error-fallback'
  source (separate entry, both run); ALREADY_INSTALLED now requires both sources.
- isErrorResponse triggered on any string containing 'internal error'/'is_error',
  so a real answer or a {"is_error": false} payload could fire the fallback after a
  successful question. Narrow it to the missing-result sentinel + boolean is_error.
- The SDK runner mutated process.env.GSTACK_HEADLESS process-wide (leaked headless
  into later tests). Removed; GSTACK_HEADLESS=1 now lives in the eval package.json
  scripts, scoped to the invocation and inherited by the SDK child.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-06-07 19:53:01 -07:00
parent 0ddaf27aa2
commit 6a6900f15b
5 changed files with 53 additions and 32 deletions
+14 -6
View File
@@ -1325,9 +1325,13 @@ if [ "$NO_TEAM_MODE" -ne 1 ] \
&& [ -x "$PLAN_TUNE_LOG_HOOK" ] \
&& [ -x "$PLAN_TUNE_PREF_HOOK" ]; then
# Already installed? Check the settings.json for our source tag.
# Already installed? Require BOTH the plan-tune source AND the AUQ-error-fallback
# source — so an existing install that predates the fallback hook re-runs the
# install (which is idempotent for the plan-tune hooks) and picks up the new one.
ALREADY_INSTALLED=0
if "$SETTINGS_HOOK" list-sources 2>/dev/null | grep -q "plan-tune-cathedral"; then
_HOOK_SOURCES=$("$SETTINGS_HOOK" list-sources 2>/dev/null || true)
if printf '%s' "$_HOOK_SOURCES" | grep -q "plan-tune-cathedral" \
&& printf '%s' "$_HOOK_SOURCES" | grep -q "auq-error-fallback"; then
ALREADY_INSTALLED=1
fi
@@ -1365,15 +1369,19 @@ if [ "$NO_TEAM_MODE" -ne 1 ] \
--command "$PLAN_TUNE_PREF_HOOK" \
--source plan-tune-cathedral \
--timeout 5
# AUQ-failure prose-fallback reliability hook (OV3:B). Fires only when an
# AskUserQuestion call returns an error/missing result; inert on success and
# inert if the platform doesn't invoke PostToolUse on tool errors.
# AskUserQuestion-failure prose-fallback reliability hook (OV3:B). Fires only when
# an AskUserQuestion call returns an error/missing result; inert on success and
# inert if the platform doesn't invoke PostToolUse on tool errors. MUST use its
# OWN source tag: gstack-settings-hook dedupes by (event, matcher, source) and
# REPLACES the entry's hooks, so sharing 'plan-tune-cathedral' would overwrite the
# question-log capture hook (same event+matcher). A distinct source = a second
# PostToolUse entry; both run in parallel.
if [ -x "$AUQ_ERROR_FALLBACK_HOOK" ]; then
"$SETTINGS_HOOK" add-event \
--event PostToolUse \
--matcher '(AskUserQuestion|mcp__.*__AskUserQuestion)' \
--command "$AUQ_ERROR_FALLBACK_HOOK" \
--source plan-tune-cathedral \
--source auq-error-fallback \
--timeout 5
fi
}