Garry Tan
dfb68fe88d
test: add fixture-based sanity test for judgeRecommendation rubric
Replaces "manually inject bad text into a captured file and revert the SKILL
template" sabotage testing with deterministic negative coverage: hand-graded
good/bad recommendation strings asserted against the same threshold (>= 4)
the production E2E tests use.
Seven fixtures cover the rubric corners: substance 5 (option-specific +
cross-alternative), substance 4 (option-specific without comparison), substance
~1 (boilerplate "because it's better"), substance ~3 (generic "because it's
faster"), no-because (deterministic skip), no-recommendation (deterministic
skip), and hedging ("either B or C" — fails commits).
Periodic-tier so it doesn't run on every PR but does fire on llm-judge.ts
rubric tweaks. ~$0.04 per run via Haiku 4.5.
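
A minimal sketch of the fixture pattern described above, under stated assumptions: the real `judgeRecommendation` signature and fixture texts are not shown in this commit, so `Verdict`, `preCheck`, `evaluate`, `stubJudge`, and the sample strings are all hypothetical. A canned stub stands in for the Haiku call so the example runs offline. Only the pass threshold (>= 4) and the two deterministic skips come from the message.

```typescript
// Hypothetical shapes; the real types in llm-judge.ts may differ.
type Verdict =
  | { kind: "skipped"; reason: "no-recommendation" | "no-because" }
  | { kind: "scored"; substance: number };

type Judge = (text: string) => number;

const PASS_THRESHOLD = 4; // threshold stated in the commit message

// Deterministic checks that run before any model call (assumed regexes).
function preCheck(text: string): Verdict | null {
  if (!/recommend/i.test(text)) return { kind: "skipped", reason: "no-recommendation" };
  if (!/\bbecause\b/i.test(text)) return { kind: "skipped", reason: "no-because" };
  return null;
}

function evaluate(text: string, judge: Judge): Verdict {
  return preCheck(text) ?? { kind: "scored", substance: judge(text) };
}

function passes(v: Verdict): boolean {
  return v.kind === "scored" && v.substance >= PASS_THRESHOLD;
}

// Hand-graded fixtures (illustrative strings, not the ones in the repo).
const GOOD = "Recommendation: choose B because it avoids the migration that A requires.";
const BOILERPLATE = "Recommendation: choose B because it's better.";
const NO_BECAUSE = "Recommendation: choose B.";
const NO_RECOMMENDATION = "The options are A, B, and C.";

// Stub judge returning canned scores in place of a real LLM call.
const canned = new Map<string, number>([
  [GOOD, 5],
  [BOILERPLATE, 1],
]);
const stubJudge: Judge = (text) => canned.get(text) ?? 0;

const results = [GOOD, BOILERPLATE, NO_BECAUSE, NO_RECOMMENDATION].map((text) => {
  const verdict = evaluate(text, stubJudge);
  return { text, verdict, pass: passes(verdict) };
});

for (const r of results) {
  console.log(r.verdict.kind, r.pass, JSON.stringify(r.text));
}
```

In a real test the stub would be replaced by the production judge, and each fixture would assert its hand-graded expectation (pass, fail, or skip) against the same threshold.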
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:18:16 -07:00