chore: bump version and changelog (v1.57.2.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-06-07 19:53:01 -07:00
parent 6a6900f15b
commit 03053d4e9a
2 changed files with 69 additions and 1 deletions
+68
View File
@@ -1,5 +1,73 @@
# Changelog
## [1.57.2.0] - 2026-06-08
## **When the question picker breaks mid-skill, gstack asks in plain text instead of stalling.**
## **Every skill detects a dead AskUserQuestion and falls back to a full decision brief you answer by typing a letter.**
AskUserQuestion is how every gstack skill asks you to decide. When the host's question
tool fails at runtime, which Conductor's MCP integration currently does intermittently,
skills used to stall or hard-block. Now each skill detects the failure, works out
whether a human is actually present, and if so re-renders the exact same decision as a
text message: a plain-English explanation of the issue, a completeness score on each
choice, and a recommendation with its reason, one paragraph per choice. You answer by
typing a single letter. Headless eval runs still block cleanly (no human to answer);
orchestrator sessions keep auto-choosing. This whole release was built and reviewed
through that fallback, because the Conductor tool was down the entire session.
### The numbers that matter
No production benchmark for a reliability path like this. These are the behavior and
coverage facts, verifiable with `bun test test/gstack-session-kind.test.ts
test/resolver-ask-user-format.test.ts test/auq-error-fallback-hook.test.ts`.
| When AskUserQuestion fails | Before | After |
|---|---|---|
| Interactive session (human present) | stall / hard BLOCK | full prose decision brief, answer by letter |
| Headless eval / CI | BLOCK | BLOCK (unchanged, correct) |
| Orchestrator (OpenClaw) session | undefined | auto-choose recommended (contract kept) |
| Session kinds detected | 0 | 3 (interactive / headless / spawned) |
| New tests guarding the path | 0 | 34 |
The text brief is not a degraded stub. It carries the same three things the picker
shows: a clear explanation of what is being decided, a `Completeness: X/10` on every
choice, and a recommendation with the reason it wins.
### What this means for you
If your host's question tool flakes out, a skill no longer dies on you. You get the
same decision to make, in text, and you reply with a letter. Nothing changes when the
tool works normally. If you run gstack headless, those sessions still block on a needed
question exactly as before, so eval determinism is intact.
### Itemized changes
#### Added
- `gstack-session-kind` classifies each session as interactive, headless, or spawned,
echoed as `SESSION_KIND` at skill start so any skill can branch on it.
- Plain-text fallback for AskUserQuestion: on a tool failure in an interactive session,
the skill renders the full decision brief (issue ELI10 + per-choice completeness +
recommendation) as markdown you answer by typing a letter, then stops and waits.
- A defensive hook that, when an AskUserQuestion call errors, reminds the agent to run
the fallback for the current session kind.
#### Changed
- AskUserQuestion is still sent as a normal tool call; the prose path applies only when
the tool is unavailable or erroring, and never on a `[plan-tune auto-decide]` result.
#### Fixed
- Section-loading tests use the canonical kebab test names, so the test-coverage gate
matches them.
- External-host doc-freshness checks are deterministic, no longer dependent on a prior
full regeneration.
#### For contributors
- The eval/E2E runners set `GSTACK_HEADLESS=1` so headless runs classify correctly;
interactive-path suites opt out per-run.
- Per-skill `maxSizeRatio` override in the carve-guards registry; `document-release`
gets 1.08 headroom for the cross-cutting preamble addition while every other skill
keeps the 1.05 ceiling.
## [1.57.0.0] - 2026-06-07
## **Three more heavyweight skills load lighter, and every carved skill finally has a test that proves it loads.**
+1 -1
View File
@@ -1 +1 @@
1.57.0.0
1.57.2.0