v1.53.1.0 fix: non-interactive-safe plan-tune hook install (flags + smart defaults) (#1805)

* feat(config): add plan_tune_hooks setting (prompt|yes|no) Registers a new gstack-config key controlling whether ./setup installs the plan-tune Claude Code hooks. Default "prompt". Documented in the config header and surfaced in `gstack-config defaults` / `list`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(setup): make plan-tune hook install non-interactive-safe The plan-tune consent prompt used a blocking `read -r` with no timeout. Under a forwarded/automated TTY (conductor workspace setup, CI with a pty) it hung setup forever. Move the decision into flags + env + saved config with a smart default: --plan-tune-hooks / --no-plan-tune-hooks / --plan-tune-hooks=yes|no|prompt > GSTACK_PLAN_TUNE_HOOKS env > plan_tune_hooks config > prompt-on-real-TTY. Explicit yes/no act non-interactively. The remaining interactive branch is gated on a real (non-quiet) TTY and uses a time-bounded `read -t 10 </dev/tty` that defaults to skip, so it can never hang. A timeout no longer persists a decline marker, so a later hands-on run can still offer the install. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(dev-setup): run setup non-interactively in dev/workspace mode Conductor runs bin/dev-setup under a forwarded pty, so any setup prompt (skill-prefix, plan-tune consent) would hang the workspace. Detach stdin (`setup </dev/null`) so every prompt takes its smart non-interactive default: flat skill names, skip the global plan-tune hook install without writing a decline marker. Saved prefix/config preferences are still honored, and a dev workspace no longer silently mutates ~/.claude/settings.json. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(setup): guard plan-tune hooks stay non-interactive Static + binary-level regression test (free, <1s): asserts the flags are wired, the plan-tune read is time-bounded (no bare blocking read), explicit yes/no decisions short-circuit before the prompt, and gstack-config knows the plan_tune_hooks key. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(setup,config): harden plan-tune decision against bad input Review follow-ups to the non-interactive plan-tune work: - setup now lowercases + whitespace-strips the resolved decision before the case match, so an explicit opt-in via flag/env ("YES", "Yes", " yes") is honored instead of silently falling through to "prompt"/skip. Also accepts on/off and 1/0. - gstack-config rejects out-of-domain plan_tune_hooks values (anything but prompt|yes|no) with a warning + fallback to prompt, matching the existing value-whitelist pattern for explain_level / artifacts_sync_mode. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(dev-setup): never mutate global hooks during workspace setup Closing stdin alone only suppresses the prompt branch; a saved `plan_tune_hooks: yes` or exported GSTACK_PLAN_TUNE_HOOKS=yes would still resolve to "install" and rewrite the user's global ~/.claude/settings.json to point at THIS ephemeral worktree — which breaks once the workspace is deleted. Pass --plan-tune-hooks=prompt (highest precedence) so dev-setup pins resolution to prompt-mode; with stdin closed that is a guaranteed no-op skip (no install, no decline marker). To install the hooks, run ./setup --plan-tune-hooks directly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(setup): isolate config tests from host + cover new guards - Point gstack-config tests at a temp GSTACK_HOME so `get plan_tune_hooks` reads the built-in default, not whatever the host machine has in ~/.gstack/config.yaml (the prior test was non-deterministic). - Add behavioral coverage: yes/no/prompt round-trip, out-of-domain rejection. - Add a normalization guard (decision input is lowercased/trimmed) and a dev-setup guard (runs setup with --plan-tune-hooks=prompt + stdin detached). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: rebaseline parity-suite v1.44.1 -> v1.53.0.0 The frozen v1.44.1 anchor went stale: five planning skills (plan-ceo-review, plan-eng-review, plan-design-review, investigate, office-hours) crept past the 1.05x ceiling via legitimate v1.49-v1.53 growth (brain-aware planning + the v1.53 redaction guard), so `bun test` was red on a clean checkout of main. Capture a fresh baseline at HEAD (bun run scripts/capture-baseline.ts --tag v1.53.0.0) and re-point the test at it. The per-skill 1.05 ratio is kept, so future bloat is still caught; only the anchor moved. Mirrors the earlier skill-size-budget rebase (v1.44.1 -> v1.47.0.0). Historical v1.44.1 / v1.46.0.0 / v1.47.0.0 baselines are retained for the v1->v2 audit trail. The captured skill bytes equal origin/main exactly (this branch left every SKILL.md untouched). Clears the pre-existing failures noted in the v1.53.0.0 CHANGELOG. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(plan-tune): de-flake "derive pushes scope_appetite up" The test was ~25-50% flaky (worse on main). gstack-question-log fires a fire-and-forget background `--derive` after every write; the 5 rapid log writes spawned 5 racing background derives that collided with the test's explicit --derive — a late one that only saw 3 entries could clobber developer-profile.json after the explicit one wrote sample_size=5. Set GSTACK_QUESTION_LOG_NO_DERIVE=1 (the flag the binary documents for exactly this case) so the writes don't spawn background derives. The explicit --derive still runs, so real derive behavior is still asserted. 20/20 green after. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.53.1.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: document non-interactive dev-setup + plan-tune hook flags (v1.53.1.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-17 04:57:20 +02:00 · 2026-05-30 11:42:13 -07:00
parent dedfe42ef0
commit 9562ad4e70
12 changed files with 937 additions and 57 deletions
@@ -2,27 +2,22 @@

 ## Test infrastructure

-### P0: Rebaseline parity-suite (v1.44.1) — stale, 5 pre-existing failures
+### ✅ DONE (v1.53.1.0): Rebaseline parity-suite (v1.44.1 → v1.53.0.0)

-**What:** `test/parity-suite.test.ts` checks every skill's SKILL.md size against
-the frozen `test/fixtures/parity-baseline-v1.44.1.json`. Five planning skills now
-exceed the 1.05x ceiling: `plan-ceo-review` (1.052), `plan-eng-review` (1.062),
-`plan-design-review` (1.068), `investigate` (1.053), `office-hours` (1.065).
+**What:** `test/parity-suite.test.ts` checked every skill's SKILL.md size against
+the frozen `test/fixtures/parity-baseline-v1.44.1.json`. Five planning skills had
+crept past the 1.05x ceiling: `plan-ceo-review` (1.052), `plan-eng-review` (1.062),
+`plan-design-review` (1.068), `investigate` (1.053), `office-hours` (1.065) — growth
+from the brain-aware-planning releases (v1.49–v1.52) plus the v1.53 redaction guard.

-**Why:** These grew during the brain-aware-planning releases (v1.49–v1.52) which
-added the `BRAIN_PREFLIGHT`/`BRAIN_CACHE_REFRESH`/`BRAIN_WRITE_BACK` resolvers to
-those skills. The v1.44.1 baseline was never regenerated, so it's four releases
-stale. The failures are pre-existing on `origin/main` (proven: they fail with the
-redaction branch absent). The active size gate (`skill-size-budget`, v1.47 baseline)
-passes, and parity-suite is not in CI's `test:gate`, so nothing is blocked — but the
-local `bun test` shows red until rebaselined.
-
-**How to start:** Either regenerate the fixture to a current baseline
-(`bun run scripts/capture-baseline.ts <tag>` and point the test at it), or bump the
-per-skill ratio for the planning skills. Decide whether v1.44.1 should be retired in
-favor of the v1.47 baseline the size-budget test already uses.
-
-**Depends on:** nothing. Standalone.
+**Resolved:** Captured a fresh baseline at HEAD via
+`bun run scripts/capture-baseline.ts --tag v1.53.0.0` and re-pointed the test at
+`test/fixtures/parity-baseline-v1.53.0.0.json`. The per-skill 1.05 ratio is kept, so
+future bloat is still caught — only the stale anchor moved. Mirrors the earlier
+`skill-size-budget` rebase (v1.44.1 → v1.47.0.0). Historical v1.44.1 / v1.46.0.0 /
+v1.47.0.0 baselines retained in `test/fixtures/` for the v1→v2 audit trail. The
+captured skill bytes match `origin/main` exactly (the rebasing branch left every
+SKILL.md untouched). `bun test` is green again.

 ## gbrowser memory follow-ups (filed via /plan-eng-review + /codex on the v1.49 leak-fix PR)