mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-10 12:03:59 +02:00
4dfdb7cdc2
* feat(auq): add gstack-session-kind + echo SESSION_KIND in preamble Classifies the session as spawned | headless | interactive from env markers (OPENCLAW_SESSION / GSTACK_HEADLESS / CONDUCTOR_* / CLAUDE_CODE_ENTRYPOINT / CI), defaulting to interactive. Echoed once at skill start alongside BRANCH/REPO_MODE so the AskUserQuestion-failure fallback can branch without a shell-out at failure time. Degrade-safe: empty/error => interactive. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(auq): prose fallback when AskUserQuestion fails (interactive sessions) On a genuine AUQ failure (tool absent, or present-but-erroring like Conductor's flaky MCP returning '[Tool result missing due to internal error]'): retry once, then branch on SESSION_KIND — spawned auto-chooses, headless BLOCKs, interactive renders a prose decision brief the user answers by typing a letter. The prose fallback MUST surface the triad: a clear ELI10 of the issue, a per-choice Completeness score, and a recommendation+why (one paragraph per choice). Carves out the [plan-tune auto-decide] denial as NOT a failure, and qualifies the former 'tool_use, not prose' assertions so the rule isn't self-contradicting. Tests pin the triad, the SESSION_KIND branch, the OV2 collision guard, the always-loaded guarantee, and a cross-file invariant on the auto-decide prefix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(auq): default GSTACK_HEADLESS=1 in eval/E2E runners Headless harness runs classify as headless (BLOCK on AUQ failure rather than emit a prose question no one reads). SDK runner uses ambient mutation, not the Options.env object, to avoid breaking the SDK auth pipeline. Interactive-path suites opt out by overriding the env per-run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(auq): defensive PostToolUse error-fallback hook (OV3:B) When an AskUserQuestion call returns an error/missing result, this hook injects additionalContext reminding the model to run the prose fallback for the current SESSION_KIND. It does not render prose itself — it guarantees the reminder fires at the moment of failure instead of relying on the model recalling SESSION_KIND. Inert on success and inert if the platform never invokes PostToolUse on tool errors (unverified — could not force the Conductor MCP error in a harness; see the spike doc). The prompt-level fallback covers the case regardless. Decision logic is unit-tested deterministically; registered in setup beside the existing AUQ hooks. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore(auq): regenerate SKILL.md for all hosts + refresh ship goldens Regenerated from the resolver changes (gen:skill-docs --host all). Refreshes the byte-exact ship golden fixtures (claude/codex/factory). Spec prose tightened so the cross-cutting preamble addition stays under the 5% per-skill parity ceiling (investigate 4.8%) — guard unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(test): kebab testNames for section-loading E2Es to match TOUCHFILES keys The two section-loading E2E tests used display-form testNames ('/ship section-loading', '/plan-ceo-review section-loading') while every other E2E testName and their E2E_TOUCHFILES keys are kebab. The completeness gate does an exact `name in E2E_TOUCHFILES` check, so it failed (pre-existing on main); diff- based selection also couldn't match them. Align to ship-section-loading / plan-ceo-section-loading. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(test): make external-host freshness checks deterministic The parameterized host smoke + --host all freshness tests assumed an external `gen:skill-docs --host all` had run first (it never does in `bun test`), so which host reported STALE varied by sibling-test timing — flaky. Regenerate the gitignored external host dirs in a beforeAll so the --dry-run check is deterministic. It still catches non-deterministic generation (the real bug class for regenerated outputs); the tracked-claude freshness test runs earlier and is unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(parity): headroom for AUQ cross-cutting addition on carved document-release Merging main brought the carve of document-release (smaller skeleton); the AUQ prose-fallback adds ~2KB to every skill's always-loaded preamble, landing document-release at ~5.9% over the pre-carve v1.53.0.0 baseline. Add a per-carve maxSizeRatio override (CARVE_GUARDS single source of truth) and bump only this skill to 1.08. All other skills keep the strict 1.05 ceiling. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(auq): harden error-fallback hook + harness per adversarial review Codex pre-landing review found three real issues: - The PostToolUse fallback hook shared source 'plan-tune-cathedral' with the question-log hook (same event+matcher); gstack-settings-hook replaces the entry, so it would have clobbered plan-tune capture. Give it its own 'auq-error-fallback' source (separate entry, both run); ALREADY_INSTALLED now requires both sources. - isErrorResponse triggered on any string containing 'internal error'/'is_error', so a real answer or a {"is_error": false} payload could fire the fallback after a successful question. Narrow it to the missing-result sentinel + boolean is_error. - The SDK runner mutated process.env.GSTACK_HEADLESS process-wide (leaked headless into later tests). Removed; GSTACK_HEADLESS=1 now lives in the eval package.json scripts, scoped to the invocation and inherited by the SDK child. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.57.2.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
122 lines
6.2 KiB
TypeScript
122 lines
6.2 KiB
TypeScript
import type { TemplateContext } from '../types';
|
|
import { getHostConfig } from '../../../hosts/index';
|
|
|
|
export function generatePreambleBash(ctx: TemplateContext): string {
|
|
const hostConfig = getHostConfig(ctx.host);
|
|
const runtimeRoot = hostConfig.usesEnvVars
|
|
? `_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
|
|
GSTACK_ROOT="$HOME/${hostConfig.globalRoot}"
|
|
[ -n "$_ROOT" ] && [ -d "$_ROOT/${ctx.paths.localSkillRoot}" ] && GSTACK_ROOT="$_ROOT/${ctx.paths.localSkillRoot}"
|
|
GSTACK_BIN="$GSTACK_ROOT/bin"
|
|
GSTACK_BROWSE="$GSTACK_ROOT/browse/dist"
|
|
GSTACK_DESIGN="$GSTACK_ROOT/design/dist"
|
|
`
|
|
: '';
|
|
|
|
return `## Preamble (run first)
|
|
|
|
\`\`\`bash
|
|
${runtimeRoot}_UPD=$(${ctx.paths.binDir}/gstack-update-check 2>/dev/null || ${ctx.paths.localSkillRoot}/bin/gstack-update-check 2>/dev/null || true)
|
|
[ -n "$_UPD" ] && echo "$_UPD" || true
|
|
mkdir -p ~/.gstack/sessions
|
|
touch ~/.gstack/sessions/"$PPID"
|
|
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
|
|
find ~/.gstack/sessions -mmin +120 -type f -exec rm {} + 2>/dev/null || true
|
|
_PROACTIVE=$(${ctx.paths.binDir}/gstack-config get proactive 2>/dev/null || echo "true")
|
|
_PROACTIVE_PROMPTED=$([ -f ~/.gstack/.proactive-prompted ] && echo "yes" || echo "no")
|
|
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
|
|
echo "BRANCH: $_BRANCH"
|
|
_SKILL_PREFIX=$(${ctx.paths.binDir}/gstack-config get skill_prefix 2>/dev/null || echo "false")
|
|
echo "PROACTIVE: $_PROACTIVE"
|
|
echo "PROACTIVE_PROMPTED: $_PROACTIVE_PROMPTED"
|
|
echo "SKILL_PREFIX: $_SKILL_PREFIX"
|
|
source <(${ctx.paths.binDir}/gstack-repo-mode 2>/dev/null) || true
|
|
REPO_MODE=\${REPO_MODE:-unknown}
|
|
echo "REPO_MODE: $REPO_MODE"
|
|
_SESSION_KIND=$(${ctx.paths.binDir}/gstack-session-kind 2>/dev/null || echo "interactive")
|
|
case "$_SESSION_KIND" in spawned|headless|interactive) ;; *) _SESSION_KIND="interactive" ;; esac
|
|
echo "SESSION_KIND: $_SESSION_KIND"
|
|
_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no")
|
|
echo "LAKE_INTRO: $_LAKE_SEEN"
|
|
_TEL=$(${ctx.paths.binDir}/gstack-config get telemetry 2>/dev/null || true)
|
|
_TEL_PROMPTED=$([ -f ~/.gstack/.telemetry-prompted ] && echo "yes" || echo "no")
|
|
_TEL_START=$(date +%s)
|
|
_SESSION_ID="$$-$(date +%s)"
|
|
echo "TELEMETRY: \${_TEL:-off}"
|
|
echo "TEL_PROMPTED: $_TEL_PROMPTED"
|
|
_EXPLAIN_LEVEL=$(${ctx.paths.binDir}/gstack-config get explain_level 2>/dev/null || echo "default")
|
|
if [ "$_EXPLAIN_LEVEL" != "default" ] && [ "$_EXPLAIN_LEVEL" != "terse" ]; then _EXPLAIN_LEVEL="default"; fi
|
|
echo "EXPLAIN_LEVEL: $_EXPLAIN_LEVEL"
|
|
_QUESTION_TUNING=$(${ctx.paths.binDir}/gstack-config get question_tuning 2>/dev/null || echo "false")
|
|
echo "QUESTION_TUNING: $_QUESTION_TUNING"
|
|
mkdir -p ~/.gstack/analytics
|
|
if [ "$_TEL" != "off" ]; then
|
|
echo '{"skill":"${ctx.skillName}","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(_repo=$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null | tr -cd 'a-zA-Z0-9._-'); echo "\${_repo:-unknown}")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
|
|
fi
|
|
for _PF in $(find ~/.gstack/analytics -maxdepth 1 -name '.pending-*' 2>/dev/null); do
|
|
if [ -f "$_PF" ]; then
|
|
if [ "$_TEL" != "off" ] && [ -x "${ctx.paths.binDir}/gstack-telemetry-log" ]; then
|
|
${ctx.paths.binDir}/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true
|
|
fi
|
|
rm -f "$_PF" 2>/dev/null || true
|
|
fi
|
|
break
|
|
done
|
|
eval "$(${ctx.paths.binDir}/gstack-slug 2>/dev/null)" 2>/dev/null || true
|
|
_LEARN_FILE="\${GSTACK_HOME:-$HOME/.gstack}/projects/\${SLUG:-unknown}/learnings.jsonl"
|
|
if [ -f "$_LEARN_FILE" ]; then
|
|
_LEARN_COUNT=$(wc -l < "$_LEARN_FILE" 2>/dev/null | tr -d ' ')
|
|
echo "LEARNINGS: $_LEARN_COUNT entries loaded"
|
|
if [ "$_LEARN_COUNT" -gt 5 ] 2>/dev/null; then
|
|
${ctx.paths.binDir}/gstack-learnings-search --limit 3 2>/dev/null || true
|
|
fi
|
|
else
|
|
echo "LEARNINGS: 0"
|
|
fi
|
|
${ctx.paths.binDir}/gstack-timeline-log '{"skill":"${ctx.skillName}","event":"started","branch":"'"$_BRANCH"'","session":"'"$_SESSION_ID"'"}' 2>/dev/null &
|
|
_HAS_ROUTING="no"
|
|
if [ -f CLAUDE.md ] && grep -q "## Skill routing" CLAUDE.md 2>/dev/null; then
|
|
_HAS_ROUTING="yes"
|
|
fi
|
|
_ROUTING_DECLINED=$(${ctx.paths.binDir}/gstack-config get routing_declined 2>/dev/null || echo "false")
|
|
echo "HAS_ROUTING: $_HAS_ROUTING"
|
|
echo "ROUTING_DECLINED: $_ROUTING_DECLINED"
|
|
_VENDORED="no"
|
|
if [ -d ".claude/skills/gstack" ] && [ ! -L ".claude/skills/gstack" ]; then
|
|
if [ -f ".claude/skills/gstack/VERSION" ] || [ -d ".claude/skills/gstack/.git" ]; then
|
|
_VENDORED="yes"
|
|
fi
|
|
fi
|
|
echo "VENDORED_GSTACK: $_VENDORED"
|
|
echo "MODEL_OVERLAY: ${ctx.model ?? 'none'}"
|
|
_CHECKPOINT_MODE=$(${ctx.paths.binDir}/gstack-config get checkpoint_mode 2>/dev/null || echo "explicit")
|
|
_CHECKPOINT_PUSH=$(${ctx.paths.binDir}/gstack-config get checkpoint_push 2>/dev/null || echo "false")
|
|
echo "CHECKPOINT_MODE: $_CHECKPOINT_MODE"
|
|
echo "CHECKPOINT_PUSH: $_CHECKPOINT_PUSH"
|
|
# Plan-mode hint for skills like /spec that branch behavior on plan-mode state.
|
|
# Claude Code exposes plan mode via system reminders; we detect best-effort
|
|
# from CLAUDE_PLAN_FILE (set by the harness when plan mode is active) and
|
|
# fall back to "inactive". Codex hosts and Claude execution mode both end up
|
|
# inactive, which is the safe default (defaults to file+execute pipeline).
|
|
if [ -n "\${CLAUDE_PLAN_FILE:-}\${GSTACK_PLAN_MODE_FORCE:-}" ]; then
|
|
export GSTACK_PLAN_MODE="active"
|
|
elif [ "\${GSTACK_PLAN_MODE:-}" = "active" ]; then
|
|
export GSTACK_PLAN_MODE="active"
|
|
else
|
|
export GSTACK_PLAN_MODE="inactive"
|
|
fi
|
|
echo "GSTACK_PLAN_MODE: $GSTACK_PLAN_MODE"
|
|
[ -n "$OPENCLAW_SESSION" ] && echo "SPAWNED_SESSION: true" || true${ctx.host === 'gbrain' || ctx.host === 'hermes' ? `
|
|
if command -v gbrain &>/dev/null; then
|
|
_BRAIN_JSON=$(gbrain doctor --fast --json 2>/dev/null || echo '{}')
|
|
_BRAIN_SCORE=$(echo "$_BRAIN_JSON" | grep -o '"health_score":[0-9]*' | cut -d: -f2)
|
|
_BRAIN_FAILS=$(echo "$_BRAIN_JSON" | grep -o '"status":"fail"' | wc -l | tr -d ' ')
|
|
_BRAIN_WARNS=$(echo "$_BRAIN_JSON" | grep -o '"status":"warn"' | wc -l | tr -d ' ')
|
|
echo "BRAIN_HEALTH: \${_BRAIN_SCORE:-unknown} (\${_BRAIN_FAILS:-0} failures, \${_BRAIN_WARNS:-0} warnings)"
|
|
if [ "\${_BRAIN_SCORE:-100}" -lt 50 ] 2>/dev/null; then
|
|
echo "$_BRAIN_JSON" | grep -o '"name":"[^"]*","status":"[^"]*","message":"[^"]*"' || true
|
|
fi
|
|
fi` : ''}
|
|
\`\`\``;
|
|
}
|