gstack/test/helpers at 6c610bf55d413237e2a33c975b84684511bb7a00 - gstack - MS-GitHub-Backup (Gitea)

CalvinBackup/gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-06-17 15:20:11 +02:00

Files

T

History

Garry Tan 0690066c3f test(auq): grade format-compliance gate from SDK capture, not the TUI

The real-PTY version grepped the stripAnsi'd interactive AUQ picker. Verified
directly that this cannot work: plan-mode AUQs render as a cursor picker whose
cursor-positioning escapes stripAnsi can't flatten — the picker renders fine for
a human (cursorSeen=45) but the flattened text drops ELI10:/(recommended) and
parseNumberedOptions returns 0. The test was grading a lossy projection and
failed by construction.

Rewritten to drive /plan-ceo-review via the SDK $OUT_FILE capture (the agent
writes the verbatim question it would have shown — clean text, no rendering
loss) and grade 7/7 format + kind-note + recommendation substance >=4. Same
property, reliable, environment-independent; shares the engine with the periodic
A/B and matrix evals. Result: 7/7 format, substance 5. Touchfiles key renamed
ask-user-question-format-pty -> auq-format-gate (no longer a PTY test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-02 21:42:32 -07:00

..

v1.30.0.0 fix wave: 21 community PRs + Windows CI extension + codex flag-semantics smoke (#1391 )

2026-05-09 08:06:47 -07:00

agent-sdk-runner.ts

v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252 )

2026-05-01 07:21:28 -07:00

auq-sdk-capture.ts

test(auq): SDK capture engine + verbose-vs-carved no-degradation A/B

2026-06-01 22:17:37 -07:00

benchmark-judge.ts

feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040 )

2026-04-19 17:50:31 +08:00

benchmark-runner.ts

feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040 )

2026-04-19 17:50:31 +08:00

budget-override.test.ts

v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712 )

2026-05-26 16:50:03 -07:00

budget-override.ts

v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712 )

2026-05-26 16:50:03 -07:00

capture-parity-baseline.test.ts

v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712 )

2026-05-26 16:50:03 -07:00

capture-parity-baseline.ts

v1.46.0.0 feat: gstack v2 foundation — catalog tokens drop 56%, eval-first floor covers all 51 skills (#1712 )

2026-05-26 16:50:03 -07:00

claude-pty-runner.ts

v1.31.0.0 fix: delete AskUserQuestion fallback (root cause of forever war) + harness primitives (#1390 )

2026-05-09 17:01:13 -07:00

claude-pty-runner.unit.test.ts

v1.31.0.0 fix: delete AskUserQuestion fallback (root cause of forever war) + harness primitives (#1390 )

2026-05-09 17:01:13 -07:00

codex-session-runner.ts

fix: enforce Codex 1024-char description limit + auto-heal stale installs (v0.11.9.0) (#391 )

2026-03-23 08:44:08 -07:00

e2e-helpers.ts

v1.39.2.0 feat: GSTACK_* env-shim for Conductor + gbrain/gstack setup docs (#1534 )

2026-05-16 12:32:33 -07:00

eval-store.test.ts

feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) (#83 )

2026-03-15 23:55:39 -05:00

eval-store.ts

v1.32.0.0 fix wave: 7 community PRs + 5 gate-eval hardenings (#1431 )

2026-05-11 12:16:26 -07:00

gemini-session-runner.test.ts

feat: Gemini CLI E2E tests (v0.9.2.0) (#252 )

2026-03-20 08:30:09 -07:00

gemini-session-runner.ts

feat: Gemini CLI E2E tests (v0.9.2.0) (#252 )

2026-03-20 08:30:09 -07:00

llm-judge.ts

v1.25.1.0 fix: office-hours Phase 4 STOP gate + AskUserQuestion recommendation judge (#1296 )

2026-05-01 19:51:51 -07:00

observability.test.ts

fix: never clean up observability artifacts — partial file persists after finalize

2026-03-14 12:37:38 -05:00

parity-harness.ts

refactor(plan-design-review): carve review body into on-demand section

2026-06-01 23:07:10 -07:00

pricing.ts

feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040 )

2026-04-19 17:50:31 +08:00

required-reads.ts

v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806 )

2026-05-30 12:09:10 -07:00

secret-sink-harness.ts

v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183 )

2026-04-24 01:38:21 -07:00

session-runner.test.ts

feat: stream-json NDJSON parser for real-time E2E progress

2026-03-14 03:49:36 -05:00

session-runner.ts

fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064 )

2026-04-19 08:38:19 +08:00

skill-parser.ts

feat: content security — 4-layer prompt injection defense for pair-agent (#815 )

2026-04-06 14:41:06 -07:00

tool-map.ts

feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040 )

2026-04-19 17:50:31 +08:00

touchfiles.ts

test(auq): grade format-compliance gate from SDK capture, not the TUI

2026-06-02 21:42:32 -07:00

transcript-section-logger.ts

v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806 )

2026-05-30 12:09:10 -07:00