gstack

CalvinBackup/gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-08-03 12:58:40 +02:00

Garry Tan and Claude Opus 4.7 2b3f9676f2 test: E2E test for /plan-tune plain-English inspection flow (gate tier)

test/skill-e2e-plan-tune.test.ts — verifies /plan-tune correctly routes
plain-English intent ("review the questions I've been asked") to the
Review question log section without requiring CLI subcommand syntax.

Seeds a synthetic question-log.jsonl with 3 entries exercising:
- override behavior (user chose expand over recommended selective)
- one-way door respect (user followed ship-test-failure-triage recommendation)
- two-way override (user skipped recommended changelog polish)
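
The three seeded entries above could be built along these lines. This is a minimal sketch, not the test's actual fixture: the commit does not show the question-log.jsonl schema, so the field names (`question_id`, `recommended`, `chosen`, `one_way_door`) and every ID or option value other than `ship-test-failure-triage` are hypothetical.

```typescript
// Hypothetical entry shape; the real log schema is not shown in the commit.
interface QuestionLogEntry {
  question_id: string;
  recommended: string;
  chosen: string;
  one_way_door: boolean;
}

// One entry per behavior named in the commit message.
const seedEntries: QuestionLogEntry[] = [
  // Override: user chose expand over the recommended selective.
  { question_id: "scope-mode", recommended: "selective", chosen: "expand", one_way_door: false },
  // One-way door respected: user followed the recommendation ("fix-now" is a placeholder value).
  { question_id: "ship-test-failure-triage", recommended: "fix-now", chosen: "fix-now", one_way_door: true },
  // Two-way override: user skipped the recommended changelog polish.
  { question_id: "changelog-polish", recommended: "polish", chosen: "skip", one_way_door: false },
];

// JSONL is one JSON object per line, terminated by a newline.
function toJsonl(entries: QuestionLogEntry[]): string {
  return entries.map((e) => JSON.stringify(e)).join("\n") + "\n";
}

// IDs where the user's choice differed from the recommendation.
function overriddenIds(entries: QuestionLogEntry[]): string[] {
  return entries.filter((e) => e.chosen !== e.recommended).map((e) => e.question_id);
}
```

Keeping the fixture as data plus a pure serializer means the same entries can be written to a temp directory for the run and reused when checking the agent's output.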

Invokes the skill via `claude -p` and asserts:
- Agent surfaces >= 2 of 3 logged question_ids in output
- Agent notices override/skip behavior from the log
- Exit reason is success or error_max_turns (not agent-crash)
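
The three assertions above can be sketched as pure checks over the agent's result. This is an illustration under assumptions: the `RunResult` shape, the helper names, and the keyword regex for detecting override/skip language are all hypothetical; only the ">= 2 of 3" threshold and the two accepted exit reasons come from the commit message.

```typescript
// Hypothetical result shape for a `claude -p` run; the real helper's type is not shown.
interface RunResult {
  output: string;
  exitReason: string;
}

// Count how many of the seeded question IDs appear verbatim in the output.
function countSurfacedIds(output: string, ids: string[]): number {
  return ids.filter((id) => output.includes(id)).length;
}

// Crude keyword check (an assumption): did the agent mention an override or a skip?
function mentionsOverride(output: string): boolean {
  return /\b(overrid|overrode|skip)/i.test(output);
}

// Running out of turns is tolerated; any other non-success exit is treated as a failure.
function isAcceptableExit(reason: string): boolean {
  return reason === "success" || reason === "error_max_turns";
}

function passesGate(result: RunResult, ids: string[]): boolean {
  return (
    countSurfacedIds(result.output, ids) >= 2 &&
    mentionsOverride(result.output) &&
    isAcceptableExit(result.exitReason)
  );
}
```

Requiring two of three IDs rather than all three leaves room for the agent to summarize, while still failing a run that ignores the log.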

Gate-tier because the core v1 DX promise is plain-English intent routing.
If it requires memorized subcommands or breaks on natural language, that's
a regression of the defining feature.

Registered in test/helpers/touchfiles.ts with dependencies:
- plan-tune/** (skill template + generated md)
- scripts/question-registry.ts (required for log lookup)
- scripts/psychographic-signals.ts, scripts/one-way-doors.ts (derive path)
- bin/gstack-question-log, gstack-question-preference, gstack-developer-profile
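
One way to picture this registration is a map from test name to the paths that should trigger it. The sketch below is an assumption about how such a map might be consulted, not the contents of test/helpers/touchfiles.ts: the test key `plan-tune-inspect`, the `selectTests` helper, and the simplified matcher (exact paths plus a trailing `/**`) are hypothetical, and the dependency list is partial.

```typescript
// Hypothetical touchfile map; the key is an illustrative name and the list is partial.
const touchfiles: Record<string, string[]> = {
  "plan-tune-inspect": [
    "plan-tune/**",
    "scripts/question-registry.ts",
    "scripts/psychographic-signals.ts",
    "scripts/one-way-doors.ts",
    "bin/gstack-question-log",
  ],
};

// Simplified matcher: "dir/**" matches anything under dir/, otherwise exact match only.
function matchesPattern(pattern: string, file: string): boolean {
  if (pattern.endsWith("/**")) {
    return file.startsWith(pattern.slice(0, -2)); // keep the trailing "/"
  }
  return pattern === file;
}

// Return the tests whose dependencies intersect the changed files.
function selectTests(changed: string[], map: Record<string, string[]>): string[] {
  return Object.keys(map).filter((name) =>
    map[name].some((pattern) => changed.some((file) => matchesPattern(pattern, file)))
  );
}
```

For example, `selectTests(["plan-tune/SKILL.md"], touchfiles)` would select the test, while a change to an unrelated file would not.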

Skipped when EVALS_ENABLED is not set; runs on `bun run test:evals`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-17 06:48:10 +08:00

codex-session-runner.ts

fix: enforce Codex 1024-char description limit + auto-heal stale installs (v0.11.9.0) (#391)

2026-03-23 08:44:08 -07:00

e2e-helpers.ts

feat: remove trigger guard + proactive opt-out prompt (#457)

2026-03-24 18:07:36 -07:00

eval-store.test.ts

feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) (#83)

2026-03-15 23:55:39 -05:00

eval-store.ts

feat: worktree isolation for E2E tests + infrastructure elegance (v0.11.12.0) (#425)