mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
00bc482fe1
* feat: add /canary, /benchmark, /land-and-deploy skills (v0.7.0)

  Three new skills that close the deploy loop:
  - /canary: standalone post-deploy monitoring with browse daemon
  - /benchmark: performance regression detection with Web Vitals
  - /land-and-deploy: merge PR, wait for deploy, canary-verify production

  Incorporates patterns from community PR #151.

  Co-Authored-By: HMAKT99 <HMAKT99@users.noreply.github.com>
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Performance & Bundle Impact category to review checklist

  New Pass 2 (INFORMATIONAL) category catching heavy dependencies (moment.js,
  full lodash), missing lazy loading, synchronous scripts, CSS @import blocking,
  fetch waterfalls, and tree-shaking breaks. Both /review and /ship pick this up
  automatically via checklist.md.

* feat: add {{DEPLOY_BOOTSTRAP}} resolver + deployed row in dashboard

  - New generateDeployBootstrap() resolver auto-detects the deploy platform
    (Vercel, Netlify, Fly.io, GH Actions, etc.), production URL, and merge
    method. Persists to CLAUDE.md like the test bootstrap.
  - The Review Readiness Dashboard now shows a "Deployed" row sourced from
    /land-and-deploy JSONL entries (informational, never gates shipping).

* chore: mark 3 TODOs completed, bump v0.7.0, update CHANGELOG

  Superseded by /land-and-deploy:
  - /merge skill — review-gated PR merge
  - Deploy-verify skill
  - Post-deploy verification (ship + browse)

* feat: /setup-deploy skill + platform-specific deploy verification

  - New /setup-deploy skill: interactive guided setup for deploy configuration.
    Detects Fly.io, Render, Vercel, Netlify, Heroku, Railway, GitHub Actions,
    and custom deploy scripts. Writes config to CLAUDE.md with a custom hooks
    section for non-standard setups.
  - Enhanced deploy bootstrap: platform-specific URL resolution (fly.toml app →
    {app}.fly.dev, render.yaml → {service}.onrender.com, etc.), deploy status
    commands (fly status, heroku releases), and a custom deploy hooks section
    in CLAUDE.md for manual/scripted deploys.
  - Platform-specific deploy verification in /land-and-deploy Step 6: Strategy A
    (GitHub Actions polling), Strategy B (platform CLI: fly/render/heroku),
    Strategy C (auto-deploy: vercel/netlify), Strategy D (custom hooks from
    CLAUDE.md).

* test: E2E + LLM-judge evals for deploy skills

  - 4 E2E tests: land-and-deploy (Fly.io detection + deploy report), canary
    (monitoring report structure), benchmark (perf report schema), setup-deploy
    (platform detection → CLAUDE.md config)
  - 4 LLM-judge evals: workflow quality for all 4 new skills
  - Touchfile entries for diff-based test selection (E2E + LLM-judge)
  - 460 free tests pass, 0 fail

* fix: harden E2E tests — server lifecycle, timeouts, preamble budget, skip flaky

  Cross-cutting fixes:
  - Pre-seed ~/.gstack/.completeness-intro-seen and ~/.gstack/.telemetry-prompted
    so the preamble doesn't burn 3-7 turns on lake intro + telemetry in every test
  - Each describe block creates its own test server instance instead of sharing
    a global that dies between suites

  Test fixes (5 tests):
  - /qa quick: own server instance + preamble skip
  - /review SQL injection: timeout 90→180s, maxTurns 15→20, added assertion
    that review output actually mentions SQL injection
  - /review design-lite: maxTurns 25→35 + preamble skip (now detects 7/7)
  - ship-base-branch: both timeouts 90→150/180s + preamble skip
  - plan-eng artifact: clean stale state in beforeAll, maxTurns 20→25

  Skipped (4 flaky/redundant tests):
  - contributor-mode: tests prompt compliance, not skill functionality
  - design-consultation-research: WebSearch-dependent, redundant with core
  - design-consultation-preview: redundant with core test
  - /qa bootstrap: too ambitious (65 turns, installs vitest)

  Also: preamble skip added to qa-only, qa-fix-loop, design-consultation-core,
  and design-consultation-existing prompts. Updated touchfiles entries and
  touchfiles.test.ts. Added honest comment to codex-review-findings.

* test: redesign 6 skipped/todo E2E tests + add test.concurrent support

  Redesigned tests (previously skipped/todo):
  - contributor-mode: pre-fail approach, 5 turns/30s (was 10 turns/90s)
  - design-consultation-research: WebSearch-only, 8 turns/90s (was 45/480s)
  - design-consultation-preview: preview HTML only, 8 turns/90s (was 30/480s)
  - qa-bootstrap: bootstrap-only, 12 turns/90s (was 65/420s)
  - /ship workflow: local bare remote, 15 turns/120s (was test.todo)
  - /setup-browser-cookies: browser detection smoke, 5 turns/45s (was test.todo)

  Added testConcurrentIfSelected() helper for future parallelization. Updated
  touchfiles entries for all 6 re-enabled tests. Target: 0 skip, 0 todo, 0 fail
  across all E2E tests.

* fix: relax contributor-mode assertions — test structure not exact phrasing

* perf: enable test.concurrent for 31 independent E2E tests

  Convert 18 skill-e2e, 11 routing, and 2 codex tests from sequential to
  test.concurrent. Only the design-consultation tests (4) remain sequential
  due to shared designDir state. Expected ~6x speedup on Teams high-burst.

* fix: add --concurrent flag to bun test + convert remaining 4 sequential tests

  bun's test.concurrent only works within a describe block, not across describe
  blocks. Adding --concurrent to the CLI command makes ALL tests concurrent
  regardless of describe boundaries. Also converted the 4 design-consultation
  tests to concurrent (each already independent).

* perf: split monolithic E2E test into 8 parallel files

  Split test/skill-e2e.test.ts (3442 lines) into 8 category files:
  - skill-e2e-browse.test.ts (7 tests)
  - skill-e2e-review.test.ts (7 tests)
  - skill-e2e-qa-bugs.test.ts (3 tests)
  - skill-e2e-qa-workflow.test.ts (4 tests)
  - skill-e2e-plan.test.ts (6 tests)
  - skill-e2e-design.test.ts (7 tests)
  - skill-e2e-workflow.test.ts (6 tests)
  - skill-e2e-deploy.test.ts (4 tests)

  Bun runs each file in its own worker = 10 parallel workers (8 split +
  routing + codex). Expected: 78 min → ~12 min. Extracted shared helpers to
  test/helpers/e2e-helpers.ts.

* perf: bump default E2E concurrency to 15

* perf: add model pinning infrastructure + rate-limit telemetry to E2E runner

  Default E2E model changed from Opus to Sonnet (5x faster, 5x cheaper). The
  session runner now accepts a `model` option with an EVALS_MODEL env var
  override. Added timing telemetry (first_response_ms, max_inter_turn_ms) and
  wall_clock_ms to eval-store for diagnosing rate-limit impact. Added
  EVALS_FAST test filtering.

* fix: resolve 3 E2E test failures — tmpdir race, wasted turns, brittle assertions

  - plan-design-review-plan-mode: give each test its own tmpdir to eliminate a
    race condition where concurrent tests pollute each other's working directory.
  - ship-local-workflow: inline ship workflow steps in the prompt instead of
    having the agent read a 700+ line SKILL.md (was wasting 6 of 15 turns on
    file I/O).
  - design-consultation-core: replace exact section name matching with fuzzy
    synonym-based matching (e.g. "Colors" matches "Color", "Type System"
    matches "Typography"). All 7 sections still required, LLM judge still
    hard fail.

* perf: pin quality tests to Opus, add --retry 2 and test:e2e:fast tier

  ~10 quality-sensitive tests (planted-bug detection, design quality judge,
  strategic review, retro analysis) explicitly pinned to Opus. ~30 structure
  tests default to Sonnet for a 5x speed improvement. Added --retry 2 to all
  E2E scripts for flaky-test resilience. Added a test:e2e:fast script that
  excludes the 8 slowest tests for quick feedback.

* docs: mark E2E model pinning TODO as shipped

* docs: add SKILL.md merge conflict directive to CLAUDE.md

  When resolving merge conflicts on generated SKILL.md files, always merge the
  .tmpl templates first, then regenerate — never accept either side's generated
  output directly.

* fix: add DEPLOY_BOOTSTRAP resolver to gen-skill-docs

  The land-and-deploy template referenced {{DEPLOY_BOOTSTRAP}} but no resolver
  existed, causing gen-skill-docs to fail. Added generateDeployBootstrap(),
  which generates the deploy config detection bash block (check CLAUDE.md for
  persisted config, auto-detect platform from config files, detect deploy
  workflows).

* chore: regenerate SKILL.md files after DEPLOY_BOOTSTRAP fix

* fix: move prompt temp file outside workingDirectory to prevent race condition

  The .prompt-tmp file was written inside workingDirectory, which gets deleted
  by afterAll cleanup. With --concurrent --retry, afterAll can interleave with
  retries, causing "No such file or directory" crashes at 0s (seen in
  review-design-lite and office-hours-spec-review). Fix: write the prompt file
  to os.tmpdir() with a unique suffix so it survives directory cleanup.

  Also convert review-design-lite from describeE2E to describeIfSelected for
  proper diff-based test selection.

* fix: add --retry 2 --concurrent flags to test:evals scripts for consistency

  test:evals and test:evals:all were missing the retry and concurrency flags
  that test:e2e already had, causing inconsistent behavior between the two
  script families.

---------

Co-authored-by: HMAKT99 <HMAKT99@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
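As an illustration of the flag consistency the last fix describes, the three script families would all carry the same retry/concurrency flags. This is a hypothetical package.json fragment, not the repo's actual scripts (script names and test paths are assumptions):

```json
{
  "scripts": {
    "test:e2e": "bun test --concurrent --retry 2 test/skill-e2e-*.test.ts",
    "test:evals": "bun test --concurrent --retry 2 test/evals.test.ts",
    "test:evals:all": "bun test --concurrent --retry 2 test/evals*.test.ts"
  }
}
```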
358 lines
12 KiB
TypeScript
/**
 * Claude CLI subprocess runner for skill E2E testing.
 *
 * Spawns `claude -p` as a completely independent process (not via Agent SDK),
 * so it works inside Claude Code sessions. Pipes prompt via stdin, streams
 * NDJSON output for real-time progress, scans for browse errors.
 */

import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';

const GSTACK_DEV_DIR = path.join(os.homedir(), '.gstack-dev');
const HEARTBEAT_PATH = path.join(GSTACK_DEV_DIR, 'e2e-live.json');
/** Sanitize test name for use as filename: strip leading slashes, replace / with - */
export function sanitizeTestName(name: string): string {
  return name.replace(/^\/+/, '').replace(/\//g, '-');
}

/** Atomic write: write to .tmp then rename. Callers treat failures as non-fatal. */
function atomicWriteSync(filePath: string, data: string): void {
  const tmp = filePath + '.tmp';
  fs.writeFileSync(tmp, data);
  fs.renameSync(tmp, filePath);
}

export interface CostEstimate {
  inputChars: number;
  outputChars: number;
  estimatedTokens: number;
  estimatedCost: number; // USD
  turnsUsed: number;
}
export interface SkillTestResult {
  toolCalls: Array<{ tool: string; input: any; output: string }>;
  browseErrors: string[];
  exitReason: string;
  duration: number;
  output: string;
  costEstimate: CostEstimate;
  transcript: any[];
  /** Which model was used for this test (added for Sonnet/Opus split diagnostics) */
  model: string;
  /** Time from spawn to first tool call, in ms (added for rate-limit diagnostics) */
  firstResponseMs: number;
  /** Peak latency between consecutive tool calls, in ms */
  maxInterTurnMs: number;
}

const BROWSE_ERROR_PATTERNS = [
  /Unknown command: \w+/,
  /Unknown snapshot flag: .+/,
  /ERROR: browse binary not found/,
  /Server failed to start/,
  /no such file or directory.*browse/i,
];

// --- Testable NDJSON parser ---
export interface ParsedNDJSON {
  transcript: any[];
  resultLine: any | null;
  turnCount: number;
  toolCallCount: number;
  toolCalls: Array<{ tool: string; input: any; output: string }>;
}

/**
 * Parse an array of NDJSON lines into structured transcript data.
 * Pure function — no I/O, no side effects. Used by both the streaming
 * reader and unit tests.
 */
export function parseNDJSON(lines: string[]): ParsedNDJSON {
  const transcript: any[] = [];
  let resultLine: any = null;
  let turnCount = 0;
  let toolCallCount = 0;
  const toolCalls: ParsedNDJSON['toolCalls'] = [];

  for (const line of lines) {
    if (!line.trim()) continue;
    try {
      const event = JSON.parse(line);
      transcript.push(event);

      // Track turns and tool calls from assistant events
      if (event.type === 'assistant') {
        turnCount++;
        const content = event.message?.content || [];
        for (const item of content) {
          if (item.type === 'tool_use') {
            toolCallCount++;
            toolCalls.push({
              tool: item.name || 'unknown',
              input: item.input || {},
              output: '',
            });
          }
        }
      }

      if (event.type === 'result') resultLine = event;
    } catch { /* skip malformed lines */ }
  }

  return { transcript, resultLine, turnCount, toolCallCount, toolCalls };
}

function truncate(s: string, max: number): string {
  return s.length > max ? s.slice(0, max) + '…' : s;
}

// --- Main runner ---
export async function runSkillTest(options: {
  prompt: string;
  workingDirectory: string;
  maxTurns?: number;
  allowedTools?: string[];
  timeout?: number;
  testName?: string;
  runId?: string;
  /** Model to use. Defaults to claude-sonnet-4-6 (overridable via EVALS_MODEL env). */
  model?: string;
}): Promise<SkillTestResult> {
  const {
    prompt,
    workingDirectory,
    maxTurns = 15,
    allowedTools = ['Bash', 'Read', 'Write'],
    timeout = 120_000,
    testName,
    runId,
  } = options;
  const model = options.model ?? process.env.EVALS_MODEL ?? 'claude-sonnet-4-6';

  const startTime = Date.now();
  const startedAt = new Date().toISOString();

  // Set up per-run log directory if runId is provided
  let runDir: string | null = null;
  const safeName = testName ? sanitizeTestName(testName) : null;
  if (runId) {
    try {
      runDir = path.join(GSTACK_DEV_DIR, 'e2e-runs', runId);
      fs.mkdirSync(runDir, { recursive: true });
    } catch { /* non-fatal */ }
  }

  // Spawn claude -p with streaming NDJSON output. Prompt piped via stdin to
  // avoid shell escaping issues. --verbose is required for stream-json mode.
  const args = [
    '-p',
    '--model', model,
    '--output-format', 'stream-json',
    '--verbose',
    '--dangerously-skip-permissions',
    '--max-turns', String(maxTurns),
    '--allowed-tools', ...allowedTools,
  ];

  // Write prompt to a temp file OUTSIDE workingDirectory to avoid race conditions
  // where afterAll cleanup deletes the dir before cat reads the file (especially
  // with --concurrent --retry). Using os.tmpdir() + unique suffix keeps it stable.
  const promptFile = path.join(os.tmpdir(), `.prompt-${process.pid}-${Date.now()}-${Math.random().toString(36).slice(2)}`);
  fs.writeFileSync(promptFile, prompt);

  const proc = Bun.spawn(['sh', '-c', `cat "${promptFile}" | claude ${args.map(a => `"${a}"`).join(' ')}`], {
    cwd: workingDirectory,
    stdout: 'pipe',
    stderr: 'pipe',
  });
  // Race against timeout
  let stderr = '';
  let exitReason = 'unknown';
  let timedOut = false;

  const timeoutId = setTimeout(() => {
    timedOut = true;
    proc.kill();
  }, timeout);

  // Stream NDJSON from stdout for real-time progress
  const collectedLines: string[] = [];
  let liveTurnCount = 0;
  let liveToolCount = 0;
  let firstResponseMs = 0;
  let lastToolTime = 0;
  let maxInterTurnMs = 0;
  const stderrPromise = new Response(proc.stderr).text();

  const reader = proc.stdout.getReader();
  const decoder = new TextDecoder();
  let buf = '';

  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buf += decoder.decode(value, { stream: true });
      const lines = buf.split('\n');
      buf = lines.pop() || '';
      for (const line of lines) {
        if (!line.trim()) continue;
        collectedLines.push(line);

        // Real-time progress to stderr + persistent logs
        try {
          const event = JSON.parse(line);
          if (event.type === 'assistant') {
            liveTurnCount++;
            const content = event.message?.content || [];
            for (const item of content) {
              if (item.type === 'tool_use') {
                liveToolCount++;
                const now = Date.now();
                const elapsed = Math.round((now - startTime) / 1000);
                // Track timing telemetry
                if (firstResponseMs === 0) firstResponseMs = now - startTime;
                if (lastToolTime > 0) {
                  const interTurn = now - lastToolTime;
                  if (interTurn > maxInterTurnMs) maxInterTurnMs = interTurn;
                }
                lastToolTime = now;
                const progressLine = ` [${elapsed}s] turn ${liveTurnCount} tool #${liveToolCount}: ${item.name}(${truncate(JSON.stringify(item.input || {}), 80)})\n`;
                process.stderr.write(progressLine);

                // Persist progress.log
                if (runDir) {
                  try { fs.appendFileSync(path.join(runDir, 'progress.log'), progressLine); } catch { /* non-fatal */ }
                }

                // Write heartbeat (atomic)
                if (runId && testName) {
                  try {
                    const toolDesc = `${item.name}(${truncate(JSON.stringify(item.input || {}), 60)})`;
                    atomicWriteSync(HEARTBEAT_PATH, JSON.stringify({
                      runId,
                      pid: proc.pid,
                      startedAt,
                      currentTest: testName,
                      status: 'running',
                      turn: liveTurnCount,
                      toolCount: liveToolCount,
                      lastTool: toolDesc,
                      lastToolAt: new Date().toISOString(),
                      elapsedSec: elapsed,
                    }, null, 2) + '\n');
                  } catch { /* non-fatal */ }
                }
              }
            }
          }
        } catch { /* skip — parseNDJSON will handle it later */ }

        // Append raw NDJSON line to per-test transcript file
        if (runDir && safeName) {
          try { fs.appendFileSync(path.join(runDir, `${safeName}.ndjson`), line + '\n'); } catch { /* non-fatal */ }
        }
      }
    }
  } catch { /* stream read error — fall through to exit code handling */ }
  // Flush remaining buffer
  if (buf.trim()) {
    collectedLines.push(buf);
  }

  stderr = await stderrPromise;
  const exitCode = await proc.exited;
  clearTimeout(timeoutId);

  try { fs.unlinkSync(promptFile); } catch { /* non-fatal */ }

  if (timedOut) {
    exitReason = 'timeout';
  } else if (exitCode === 0) {
    exitReason = 'success';
  } else {
    exitReason = `exit_code_${exitCode}`;
  }

  const duration = Date.now() - startTime;

  // Parse all collected NDJSON lines
  const parsed = parseNDJSON(collectedLines);
  const { transcript, resultLine, toolCalls } = parsed;
  const browseErrors: string[] = [];

  // Scan transcript + stderr for browse errors
  const allText = transcript.map(e => JSON.stringify(e)).join('\n') + '\n' + stderr;
  for (const pattern of BROWSE_ERROR_PATTERNS) {
    const match = allText.match(pattern);
    if (match) {
      browseErrors.push(match[0].slice(0, 200));
    }
  }

  // Use resultLine for structured result data
  if (resultLine) {
    if (resultLine.is_error) {
      // claude -p can return subtype=success with is_error=true (e.g. API connection failure)
      exitReason = 'error_api';
    } else if (resultLine.subtype === 'success') {
      exitReason = 'success';
    } else if (resultLine.subtype) {
      exitReason = resultLine.subtype;
    }
  }
  // Save failure transcript to persistent run directory (or fallback to workingDirectory)
  if (browseErrors.length > 0 || exitReason !== 'success') {
    try {
      const failureDir = runDir || path.join(workingDirectory, '.gstack', 'test-transcripts');
      fs.mkdirSync(failureDir, { recursive: true });
      const failureName = safeName
        ? `${safeName}-failure.json`
        : `e2e-${new Date().toISOString().replace(/[:.]/g, '-')}.json`;
      fs.writeFileSync(
        path.join(failureDir, failureName),
        JSON.stringify({
          prompt: prompt.slice(0, 500),
          testName: testName || 'unknown',
          exitReason,
          browseErrors,
          duration,
          turnAtTimeout: timedOut ? liveTurnCount : undefined,
          lastToolCall: liveToolCount > 0 ? `tool #${liveToolCount}` : undefined,
          stderr: stderr.slice(0, 2000),
          result: resultLine ? { type: resultLine.type, subtype: resultLine.subtype, result: resultLine.result?.slice?.(0, 500) } : null,
        }, null, 2),
      );
    } catch { /* non-fatal */ }
  }

  // Cost from result line (exact) or estimate from chars
  const turnsUsed = resultLine?.num_turns || 0;
  const estimatedCost = resultLine?.total_cost_usd || 0;
  const inputChars = prompt.length;
  const outputChars = (resultLine?.result || '').length;
  const estimatedTokens = (resultLine?.usage?.input_tokens || 0)
    + (resultLine?.usage?.output_tokens || 0)
    + (resultLine?.usage?.cache_read_input_tokens || 0);

  const costEstimate: CostEstimate = {
    inputChars,
    outputChars,
    estimatedTokens,
    estimatedCost: Math.round(estimatedCost * 100) / 100,
    turnsUsed,
  };

  return { toolCalls, browseErrors, exitReason, duration, output: resultLine?.result || '', costEstimate, transcript, model, firstResponseMs, maxInterTurnMs };
}
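The event shapes the runner's parser consumes can be sketched standalone. Below is a minimal, self-contained sketch of parseNDJSON's turn/tool counting contract, with hand-built sample events; the field names (`type`, `message.content`, `tool_use`, `subtype`) are taken from the parsing code above, while the real `claude -p --output-format stream-json` output carries additional fields the parser ignores:

```typescript
// Hand-built NDJSON lines mimicking the stream-json events parseNDJSON expects.
const sampleLines: string[] = [
  JSON.stringify({
    type: 'assistant',
    message: { content: [{ type: 'tool_use', name: 'Bash', input: { command: 'ls' } }] },
  }),
  JSON.stringify({ type: 'assistant', message: { content: [{ type: 'text', text: 'done' }] } }),
  JSON.stringify({ type: 'result', subtype: 'success', num_turns: 2, total_cost_usd: 0.01 }),
  'not json — skipped silently, like the real parser does',
];

// Same counting logic as parseNDJSON, inlined so this snippet runs on its own.
let turnCount = 0;
let toolCallCount = 0;
let resultLine: any = null;

for (const line of sampleLines) {
  if (!line.trim()) continue;
  try {
    const event = JSON.parse(line);
    if (event.type === 'assistant') {
      turnCount++;                                   // every assistant event is a turn
      for (const item of event.message?.content || []) {
        if (item.type === 'tool_use') toolCallCount++; // tool calls live inside content
      }
    }
    if (event.type === 'result') resultLine = event;   // final summary line
  } catch { /* malformed lines are skipped */ }
}

console.log(turnCount, toolCallCount, resultLine.subtype); // 2 1 success
```

Feeding these three lines through the module's own `parseNDJSON` would yield the same counts, since the function is pure and order-independent per line.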