gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-06-22 17:49:57 +02:00

Garry Tan 752ff50e11 test(helpers): add judgeRecommendation with deterministic regex + Haiku rubric

Existing AskUserQuestion format-regression tests only regex-match
"Recommendation:[*\s]*Choose" — they confirm the line exists but say nothing
about whether the "because Y" clause is present, specific, or substantive.
Agents frequently produce the line with boilerplate reasoning ("because it's
better"), and the regex passes anyway.

Add judgeRecommendation:
- Deterministic regex parses present / commits / has_because — no LLM call
  needed for booleans, and skipping the LLM when has_because is false avoids
  burning tokens on cases that already failed the format spec.
- Haiku 4.5 grades reason_substance 1-5 on a tight rubric scoped to the
  because-clause itself (not the surrounding pros/cons menu — that menu is
  context only). 5 = specific tradeoff vs an alternative; 3 = generic
  ("because it's faster"); 1 = boilerplate ("because it's better").
- callJudge generalized with a model arg, default Sonnet for back-compat
  with judge / outcomeJudge / judgePosture callers.
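
The deterministic half of the check can be sketched as below. This is a minimal illustration, not the code in this commit: the function names parseRecommendation and shouldGradeSubstance, the RecommendationFlags shape, and the exact has_because pattern are assumptions. Only the three flag names and the "Recommendation:[*\s]*Choose" pattern come from the message above.

```typescript
// Hypothetical sketch of the regex-only pass; names and the has_because
// pattern are assumptions, not the actual gstack helper.
interface RecommendationFlags {
  present: boolean;     // a "Recommendation:" line exists
  commits: boolean;     // the line commits to an option ("Choose ...")
  has_because: boolean; // a non-empty "because ..." clause is on that line
}

function parseRecommendation(text: string): RecommendationFlags {
  // Take the first line that mentions "Recommendation:", if any.
  const line = text.split("\n").find((l: string) => /Recommendation:/.test(l)) ?? "";
  return {
    present: line !== "",
    // Same pattern the existing format-regression tests use.
    commits: /Recommendation:[*\s]*Choose/.test(line),
    // Assumed rule: "because" followed by at least one non-space character.
    has_because: /\bbecause\s+\S/i.test(line),
  };
}

// Gate for the LLM rubric: with no because-clause there is nothing to
// grade, so the model call (and its tokens) can be skipped.
function shouldGradeSubstance(flags: RecommendationFlags): boolean {
  return flags.has_because;
}
```

Under this sketch, a line such as "Recommendation: Choose A" sets present and commits but leaves has_because false, so it fails the format check without any model call.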

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 14:17:58 -07:00

providers

v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252)

2026-05-01 07:21:28 -07:00

agent-sdk-runner.ts

v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252)

2026-05-01 07:21:28 -07:00

benchmark-judge.ts

feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040)

2026-04-19 17:50:31 +08:00

benchmark-runner.ts

feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040)