mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-02 00:01:37 +02:00
9562ad4e70
* feat(config): add plan_tune_hooks setting (prompt|yes|no) Registers a new gstack-config key controlling whether ./setup installs the plan-tune Claude Code hooks. Default "prompt". Documented in the config header and surfaced in `gstack-config defaults` / `list`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(setup): make plan-tune hook install non-interactive-safe The plan-tune consent prompt used a blocking `read -r` with no timeout. Under a forwarded/automated TTY (conductor workspace setup, CI with a pty) it hung setup forever. Move the decision into flags + env + saved config with a smart default: --plan-tune-hooks / --no-plan-tune-hooks / --plan-tune-hooks=yes|no|prompt > GSTACK_PLAN_TUNE_HOOKS env > plan_tune_hooks config > prompt-on-real-TTY. Explicit yes/no act non-interactively. The remaining interactive branch is gated on a real (non-quiet) TTY and uses a time-bounded `read -t 10 </dev/tty` that defaults to skip, so it can never hang. A timeout no longer persists a decline marker, so a later hands-on run can still offer the install. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(dev-setup): run setup non-interactively in dev/workspace mode Conductor runs bin/dev-setup under a forwarded pty, so any setup prompt (skill-prefix, plan-tune consent) would hang the workspace. Detach stdin (`setup </dev/null`) so every prompt takes its smart non-interactive default: flat skill names, skip the global plan-tune hook install without writing a decline marker. Saved prefix/config preferences are still honored, and a dev workspace no longer silently mutates ~/.claude/settings.json. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(setup): guard plan-tune hooks stay non-interactive Static + binary-level regression test (free, <1s): asserts the flags are wired, the plan-tune read is time-bounded (no bare blocking read), explicit yes/no decisions short-circuit before the prompt, and gstack-config knows the plan_tune_hooks key. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(setup,config): harden plan-tune decision against bad input Review follow-ups to the non-interactive plan-tune work: - setup now lowercases + whitespace-strips the resolved decision before the case match, so an explicit opt-in via flag/env ("YES", "Yes", " yes") is honored instead of silently falling through to "prompt"/skip. Also accepts on/off and 1/0. - gstack-config rejects out-of-domain plan_tune_hooks values (anything but prompt|yes|no) with a warning + fallback to prompt, matching the existing value-whitelist pattern for explain_level / artifacts_sync_mode. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(dev-setup): never mutate global hooks during workspace setup Closing stdin alone only suppresses the prompt branch; a saved `plan_tune_hooks: yes` or exported GSTACK_PLAN_TUNE_HOOKS=yes would still resolve to "install" and rewrite the user's global ~/.claude/settings.json to point at THIS ephemeral worktree — which breaks once the workspace is deleted. Pass --plan-tune-hooks=prompt (highest precedence) so dev-setup pins resolution to prompt-mode; with stdin closed that is a guaranteed no-op skip (no install, no decline marker). To install the hooks, run ./setup --plan-tune-hooks directly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(setup): isolate config tests from host + cover new guards - Point gstack-config tests at a temp GSTACK_HOME so `get plan_tune_hooks` reads the built-in default, not whatever the host machine has in ~/.gstack/config.yaml (the prior test was non-deterministic). - Add behavioral coverage: yes/no/prompt round-trip, out-of-domain rejection. - Add a normalization guard (decision input is lowercased/trimmed) and a dev-setup guard (runs setup with --plan-tune-hooks=prompt + stdin detached). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: rebaseline parity-suite v1.44.1 -> v1.53.0.0 The frozen v1.44.1 anchor went stale: five planning skills (plan-ceo-review, plan-eng-review, plan-design-review, investigate, office-hours) crept past the 1.05x ceiling via legitimate v1.49-v1.53 growth (brain-aware planning + the v1.53 redaction guard), so `bun test` was red on a clean checkout of main. Capture a fresh baseline at HEAD (bun run scripts/capture-baseline.ts --tag v1.53.0.0) and re-point the test at it. The per-skill 1.05 ratio is kept, so future bloat is still caught; only the anchor moved. Mirrors the earlier skill-size-budget rebase (v1.44.1 -> v1.47.0.0). Historical v1.44.1 / v1.46.0.0 / v1.47.0.0 baselines are retained for the v1->v2 audit trail. The captured skill bytes equal origin/main exactly (this branch left every SKILL.md untouched). Clears the pre-existing failures noted in the v1.53.0.0 CHANGELOG. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(plan-tune): de-flake "derive pushes scope_appetite up" The test was ~25-50% flaky (worse on main). gstack-question-log fires a fire-and-forget background `--derive` after every write; the 5 rapid log writes spawned 5 racing background derives that collided with the test's explicit --derive — a late one that only saw 3 entries could clobber developer-profile.json after the explicit one wrote sample_size=5. Set GSTACK_QUESTION_LOG_NO_DERIVE=1 (the flag the binary documents for exactly this case) so the writes don't spawn background derives. The explicit --derive still runs, so real derive behavior is still asserted. 20/20 green after. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.53.1.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: document non-interactive dev-setup + plan-tune hook flags (v1.53.1.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
57 lines
2.2 KiB
TypeScript
57 lines
2.2 KiB
TypeScript
/**
|
||
* Cathedral parity suite — gate-tier (free, structural + content checks).
|
||
*
|
||
* Runs every PARITY_INVARIANTS check against the current SKILL.md output
|
||
* vs the v1.53.0.0 baseline. Failures get an actionable, per-skill report
|
||
* showing missing phrases, missing headings, and size ratios.
|
||
*
|
||
* Baseline rebased v1.44.1 → v1.53.0.0: the brain-aware-planning releases
|
||
* (v1.49–v1.52) plus the v1.53 redaction guard pushed five planning skills
|
||
* past the 5% ratchet on the frozen v1.44.1 anchor. Rebasing absorbs that
|
||
* legitimate growth at HEAD while keeping the per-skill 1.05 ratio so future
|
||
* bloat is still caught. Historical v1.44.1 / v1.46.0.0 / v1.47.0.0 baselines
|
||
* are retained in test/fixtures/ for the v1→v2 audit trail.
|
||
*
|
||
* Periodic-tier LLM-judge parity (paid) lands in Phase B (v2.0.0.0)
|
||
* alongside the sections/ extraction. Plumbing is in parity-harness.ts.
|
||
*/
|
||
|
||
import { describe, test, expect } from 'bun:test';
|
||
import * as fs from 'fs';
|
||
import * as path from 'path';
|
||
import { runParityChecks, PARITY_INVARIANTS } from './helpers/parity-harness';
|
||
import type { ParityBaseline } from './helpers/capture-parity-baseline';
|
||
|
||
const REPO_ROOT = path.resolve(import.meta.dir, '..');
|
||
const BASELINE_PATH = path.join(REPO_ROOT, 'test', 'fixtures', 'parity-baseline-v1.53.0.0.json');
|
||
|
||
describe('parity suite vs v1.53.0.0 baseline (gate, free)', () => {
|
||
test('baseline exists', () => {
|
||
expect(fs.existsSync(BASELINE_PATH)).toBe(true);
|
||
});
|
||
|
||
test('all PARITY_INVARIANTS pass', () => {
|
||
const baseline: ParityBaseline = JSON.parse(fs.readFileSync(BASELINE_PATH, 'utf-8'));
|
||
const report = runParityChecks({
|
||
repoRoot: REPO_ROOT,
|
||
baseline,
|
||
invariants: PARITY_INVARIANTS,
|
||
});
|
||
|
||
// eslint-disable-next-line no-console
|
||
console.log(
|
||
`[parity] ${report.passed}/${report.totalChecks} skills passed parity vs ${baseline.tag}`,
|
||
);
|
||
|
||
if (report.failed === 0) return;
|
||
|
||
const failureMessages = report.details
|
||
.filter(d => !d.passed)
|
||
.map(d => ` ${d.skill}:\n - ${d.failures.join('\n - ')}`)
|
||
.join('\n');
|
||
throw new Error(
|
||
`${report.failed} skill(s) failed parity checks vs ${baseline.tag}:\n${failureMessages}`,
|
||
);
|
||
});
|
||
});
|