feat: wire costs[] from modelUsage into eval results

Extract per-model token usage from resultLine.modelUsage (including
cache tokens and exact API cost), flow CostEntry[] through EvalCollector,
aggregate in finalize(). Extend CostEntry with cache_read_input_tokens,
cache_creation_input_tokens, cost_usd. computeCosts() prefers exact
cost_usd over MODEL_PRICING when available (~4x more accurate with
prompt caching).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-15 16:47:27 -05:00
parent 4ad73f7362
commit 02925cfc7a
7 changed files with 170 additions and 7 deletions
+4
View File
@@ -15,6 +15,10 @@ export interface CostEntry {
calls: number;
input_tokens: number;
output_tokens: number;
cache_read_input_tokens?: number;
cache_creation_input_tokens?: number;
/** Exact cost from API when available (accounts for cache discounts). */
cost_usd?: number;
}
export interface FailureEntry {