mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-01 15:51:41 +02:00
v1.53.1.0 fix: non-interactive-safe plan-tune hook install (flags + smart defaults) (#1805)
* feat(config): add plan_tune_hooks setting (prompt|yes|no) Registers a new gstack-config key controlling whether ./setup installs the plan-tune Claude Code hooks. Default "prompt". Documented in the config header and surfaced in `gstack-config defaults` / `list`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(setup): make plan-tune hook install non-interactive-safe The plan-tune consent prompt used a blocking `read -r` with no timeout. Under a forwarded/automated TTY (conductor workspace setup, CI with a pty) it hung setup forever. Move the decision into flags + env + saved config with a smart default: --plan-tune-hooks / --no-plan-tune-hooks / --plan-tune-hooks=yes|no|prompt > GSTACK_PLAN_TUNE_HOOKS env > plan_tune_hooks config > prompt-on-real-TTY. Explicit yes/no act non-interactively. The remaining interactive branch is gated on a real (non-quiet) TTY and uses a time-bounded `read -t 10 </dev/tty` that defaults to skip, so it can never hang. A timeout no longer persists a decline marker, so a later hands-on run can still offer the install. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(dev-setup): run setup non-interactively in dev/workspace mode Conductor runs bin/dev-setup under a forwarded pty, so any setup prompt (skill-prefix, plan-tune consent) would hang the workspace. Detach stdin (`setup </dev/null`) so every prompt takes its smart non-interactive default: flat skill names, skip the global plan-tune hook install without writing a decline marker. Saved prefix/config preferences are still honored, and a dev workspace no longer silently mutates ~/.claude/settings.json. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(setup): guard plan-tune hooks stay non-interactive Static + binary-level regression test (free, <1s): asserts the flags are wired, the plan-tune read is time-bounded (no bare blocking read), explicit yes/no decisions short-circuit before the prompt, and gstack-config knows the plan_tune_hooks key. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(setup,config): harden plan-tune decision against bad input Review follow-ups to the non-interactive plan-tune work: - setup now lowercases + whitespace-strips the resolved decision before the case match, so an explicit opt-in via flag/env ("YES", "Yes", " yes") is honored instead of silently falling through to "prompt"/skip. Also accepts on/off and 1/0. - gstack-config rejects out-of-domain plan_tune_hooks values (anything but prompt|yes|no) with a warning + fallback to prompt, matching the existing value-whitelist pattern for explain_level / artifacts_sync_mode. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(dev-setup): never mutate global hooks during workspace setup Closing stdin alone only suppresses the prompt branch; a saved `plan_tune_hooks: yes` or exported GSTACK_PLAN_TUNE_HOOKS=yes would still resolve to "install" and rewrite the user's global ~/.claude/settings.json to point at THIS ephemeral worktree — which breaks once the workspace is deleted. Pass --plan-tune-hooks=prompt (highest precedence) so dev-setup pins resolution to prompt-mode; with stdin closed that is a guaranteed no-op skip (no install, no decline marker). To install the hooks, run ./setup --plan-tune-hooks directly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(setup): isolate config tests from host + cover new guards - Point gstack-config tests at a temp GSTACK_HOME so `get plan_tune_hooks` reads the built-in default, not whatever the host machine has in ~/.gstack/config.yaml (the prior test was non-deterministic). - Add behavioral coverage: yes/no/prompt round-trip, out-of-domain rejection. - Add a normalization guard (decision input is lowercased/trimmed) and a dev-setup guard (runs setup with --plan-tune-hooks=prompt + stdin detached). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: rebaseline parity-suite v1.44.1 -> v1.53.0.0 The frozen v1.44.1 anchor went stale: five planning skills (plan-ceo-review, plan-eng-review, plan-design-review, investigate, office-hours) crept past the 1.05x ceiling via legitimate v1.49-v1.53 growth (brain-aware planning + the v1.53 redaction guard), so `bun test` was red on a clean checkout of main. Capture a fresh baseline at HEAD (bun run scripts/capture-baseline.ts --tag v1.53.0.0) and re-point the test at it. The per-skill 1.05 ratio is kept, so future bloat is still caught; only the anchor moved. Mirrors the earlier skill-size-budget rebase (v1.44.1 -> v1.47.0.0). Historical v1.44.1 / v1.46.0.0 / v1.47.0.0 baselines are retained for the v1->v2 audit trail. The captured skill bytes equal origin/main exactly (this branch left every SKILL.md untouched). Clears the pre-existing failures noted in the v1.53.0.0 CHANGELOG. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(plan-tune): de-flake "derive pushes scope_appetite up" The test was ~25-50% flaky (worse on main). gstack-question-log fires a fire-and-forget background `--derive` after every write; the 5 rapid log writes spawned 5 racing background derives that collided with the test's explicit --derive — a late one that only saw 3 entries could clobber developer-profile.json after the explicit one wrote sample_size=5. Set GSTACK_QUESTION_LOG_NO_DERIVE=1 (the flag the binary documents for exactly this case) so the writes don't spawn background derives. The explicit --derive still runs, so real derive behavior is still asserted. 20/20 green after. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.53.1.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: document non-interactive dev-setup + plan-tune hook flags (v1.53.1.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -2,27 +2,22 @@
|
||||
|
||||
## Test infrastructure
|
||||
|
||||
### P0: Rebaseline parity-suite (v1.44.1) — stale, 5 pre-existing failures
|
||||
### ✅ DONE (v1.53.1.0): Rebaseline parity-suite (v1.44.1 → v1.53.0.0)
|
||||
|
||||
**What:** `test/parity-suite.test.ts` checks every skill's SKILL.md size against
|
||||
the frozen `test/fixtures/parity-baseline-v1.44.1.json`. Five planning skills now
|
||||
exceed the 1.05x ceiling: `plan-ceo-review` (1.052), `plan-eng-review` (1.062),
|
||||
`plan-design-review` (1.068), `investigate` (1.053), `office-hours` (1.065).
|
||||
**What:** `test/parity-suite.test.ts` checked every skill's SKILL.md size against
|
||||
the frozen `test/fixtures/parity-baseline-v1.44.1.json`. Five planning skills had
|
||||
crept past the 1.05x ceiling: `plan-ceo-review` (1.052), `plan-eng-review` (1.062),
|
||||
`plan-design-review` (1.068), `investigate` (1.053), `office-hours` (1.065) — growth
|
||||
from the brain-aware-planning releases (v1.49–v1.52) plus the v1.53 redaction guard.
|
||||
|
||||
**Why:** These grew during the brain-aware-planning releases (v1.49–v1.52) which
|
||||
added the `BRAIN_PREFLIGHT`/`BRAIN_CACHE_REFRESH`/`BRAIN_WRITE_BACK` resolvers to
|
||||
those skills. The v1.44.1 baseline was never regenerated, so it's four releases
|
||||
stale. The failures are pre-existing on `origin/main` (proven: they fail with the
|
||||
redaction branch absent). The active size gate (`skill-size-budget`, v1.47 baseline)
|
||||
passes, and parity-suite is not in CI's `test:gate`, so nothing is blocked — but the
|
||||
local `bun test` shows red until rebaselined.
|
||||
|
||||
**How to start:** Either regenerate the fixture to a current baseline
|
||||
(`bun run scripts/capture-baseline.ts <tag>` and point the test at it), or bump the
|
||||
per-skill ratio for the planning skills. Decide whether v1.44.1 should be retired in
|
||||
favor of the v1.47 baseline the size-budget test already uses.
|
||||
|
||||
**Depends on:** nothing. Standalone.
|
||||
**Resolved:** Captured a fresh baseline at HEAD via
|
||||
`bun run scripts/capture-baseline.ts --tag v1.53.0.0` and re-pointed the test at
|
||||
`test/fixtures/parity-baseline-v1.53.0.0.json`. The per-skill 1.05 ratio is kept, so
|
||||
future bloat is still caught — only the stale anchor moved. Mirrors the earlier
|
||||
`skill-size-budget` rebase (v1.44.1 → v1.47.0.0). Historical v1.44.1 / v1.46.0.0 /
|
||||
v1.47.0.0 baselines retained in `test/fixtures/` for the v1→v2 audit trail. The
|
||||
captured skill bytes match `origin/main` exactly (the rebasing branch left every
|
||||
SKILL.md untouched). `bun test` is green again.
|
||||
|
||||
## gbrowser memory follow-ups (filed via /plan-eng-review + /codex on the v1.49 leak-fix PR)
|
||||
|
||||
|
||||
Reference in New Issue
Block a user