Mirror of https://github.com/garrytan/gstack.git (synced 2026-05-05 21:25:27 +02:00)
docs: mark E2E model pinning TODO as shipped
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -336,14 +336,11 @@
 **Priority:** P2
 **Depends on:** Eval persistence (shipped in v0.3.6)

-### E2E model pinning
+### E2E model pinning — SHIPPED

-**What:** Pin E2E tests to claude-sonnet-4-6 for cost efficiency, add retry:2 for flaky LLM responses.
+~~**What:** Pin E2E tests to claude-sonnet-4-6 for cost efficiency, add retry:2 for flaky LLM responses.~~

-**Why:** Reduce E2E test cost and flakiness.
-
-**Effort:** XS
-**Priority:** P2
+Shipped: Default model changed to Sonnet for structure tests (~30), Opus retained for quality tests (~10). `--retry 2` added. `EVALS_MODEL` env var for override. `test:e2e:fast` tier added. Rate-limit telemetry (first_response_ms, max_inter_turn_ms) and wall_clock_ms tracking added to eval-store.

 ### Eval web dashboard

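The "Shipped" note in the diff above describes a per-test-kind default model with an `EVALS_MODEL` override, plus three timing fields. A minimal sketch of how that could look follows. It is not the gstack implementation: `TestKind`, `DEFAULT_MODELS`, `resolveModel`, `RunTiming`, and `summarizeTiming` are hypothetical names, the Opus model ID is a placeholder because the diff does not state it, and the meaning given to each timing field is an assumption inferred from its name.

```typescript
// Sketch only: every name except EVALS_MODEL, claude-sonnet-4-6, and the three
// timing field names is hypothetical and not taken from the gstack source.

type TestKind = "structure" | "quality";

// Default model per test kind. The diff does not give the Opus model ID,
// so a placeholder string stands in for it.
const DEFAULT_MODELS: Record<TestKind, string> = {
  structure: "claude-sonnet-4-6",
  quality: "<opus-model-id>",
};

// Assumption: a non-empty EVALS_MODEL overrides the default for every test kind.
function resolveModel(
  kind: TestKind,
  env: Record<string, string | undefined>,
): string {
  const override = env["EVALS_MODEL"]?.trim();
  return override ? override : DEFAULT_MODELS[kind];
}

interface RunTiming {
  first_response_ms: number; // assumed: run start to first model turn
  max_inter_turn_ms: number; // assumed: largest gap between consecutive turns
  wall_clock_ms: number; // assumed: run start to run end
}

// Derive the three timing fields from millisecond timestamps.
function summarizeTiming(
  startMs: number,
  turnTimesMs: number[],
  endMs: number,
): RunTiming {
  const first = turnTimesMs.length > 0 ? turnTimesMs[0] - startMs : 0;
  let maxGap = 0;
  for (let i = 1; i < turnTimesMs.length; i++) {
    maxGap = Math.max(maxGap, turnTimesMs[i] - turnTimesMs[i - 1]);
  }
  return {
    first_response_ms: first,
    max_inter_turn_ms: maxGap,
    wall_clock_ms: endMs - startMs,
  };
}

// Usage: no override falls back to the per-kind default.
console.log(resolveModel("structure", {})); // claude-sonnet-4-6
// Usage: an override wins regardless of test kind.
console.log(resolveModel("quality", { EVALS_MODEL: "claude-sonnet-4-6" })); // claude-sonnet-4-6
// Usage: turns at 100, 400, and 450 ms in a run lasting 1000 ms.
console.log(summarizeTiming(0, [100, 400, 450], 1000));
```

Passing the environment in as a parameter, rather than reading `process.env` inside the function, keeps the sketch runnable without Node type definitions and makes the override easy to exercise in a test.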