CalvinBackup/gstack
Mirror of https://github.com/garrytan/gstack.git, synced 2026-05-02 11:45:20 +02:00
Branch: garrytan/qa-2.1
Path: gstack/test/helpers
Garry Tan 03a6270b9c feat: eval efficiency metrics — turns, duration, commentary across all surfaces
Add generateCommentary() for natural-language delta interpretation,
per-test turns/duration in comparison and summary output, judgePassed
unit tests, 3 new E2E tests (qa-only, qa fix loop, plan artifact).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 21:17:12 -05:00
File | Last commit | Date
eval-store.test.ts | feat: eval efficiency metrics — turns, duration, commentary across all surfaces | 2026-03-15 21:17:12 -05:00
eval-store.ts | feat: eval efficiency metrics — turns, duration, commentary across all surfaces | 2026-03-15 21:17:12 -05:00
llm-judge.ts | feat: 3-tier eval suite with planted-bug outcome testing (EVALS=1) | 2026-03-14 01:17:36 -05:00
observability.test.ts | fix: never clean up observability artifacts — partial file persists after finalize | 2026-03-14 12:37:38 -05:00
session-runner.test.ts | feat: stream-json NDJSON parser for real-time E2E progress | 2026-03-14 03:49:36 -05:00
session-runner.ts | fix: auto-clear stale heartbeat when process is dead | 2026-03-14 12:55:40 -05:00
skill-parser.ts | feat: 3-tier eval suite with planted-bug outcome testing (EVALS=1) | 2026-03-14 01:17:36 -05:00