Mirror of https://github.com/garrytan/gstack.git (synced 2026-06-28 20:50:05 +02:00)
81fdf9cc61
Two new gate-tier guardrails for the v1.45.0.0 compression baseline:

1. test/skill-size-budget.test.ts (NEW): per-skill SKILL.md size budget. Compares the current state to test/fixtures/parity-baseline-v1.44.1.json. Three checks: per-skill (×1.05 default ratio), total corpus, and catalog token estimate (≤7000 for v1.45). The per-skill ratio is 1.05, not 1.0, because the T4 catalog trim moves text from frontmatter to a body section; small skills see a tiny body growth, which is acceptable because the much larger catalog-token win offsets it.

2. test/skill-budget-regression.test.ts (EXTENDED): hard dollar cap on per-run eval cost. Per-tier defaults: gate $25, periodic $70. Umbrella EVALS_BUDGET_HARD_CAP=$30. Catches runaway eval costs (infinite retries, model price changes) before they amortize across PRs.

Both checks support an override path with an audit trail:

  GSTACK_SIZE_BUDGET_OVERRIDE_REASON="why this is OK"   (size)
  EVALS_BUDGET_OVERRIDE_REASON="why this is OK"         (cost)

Overrides are logged to ~/.gstack/analytics/spend-overrides.jsonl with timestamp + scope + reason + CI provenance (runner, branch, commit) via test/helpers/budget-override.ts.

Why the override audit: a hard cap with no escape valve becomes operationally hostile (legitimate price changes, longer transcripts, and new required evals can all blow the cap). An override with no audit becomes "everyone overrides everything and the gate is theater." This module ships the audit half so reviewers can see what was waived and why.

Codex 2nd-pass critique #3 absorbed: per-suite caps + override path with auditability + budget baselines checked into the repo (parity-baseline-v1.44.1.json is already in test/fixtures/).
Test plan:
- bun test test/skill-size-budget.test.ts: 4 pass (per-skill, corpus, catalog, baseline-exists)
- bun test test/skill-budget-regression.test.ts: 4 pass (2 existing ratio checks + 2 new hard-cap checks)
- Existing eval runs ($14.11 e2e, $0.02 llm-judge) sit well under the new caps

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
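As a rough illustration of the per-skill check (current size against baseline × 1.05), here is a minimal sketch. It assumes the baseline fixture can be reduced to a map of skill name to SKILL.md size in bytes; the real layout of parity-baseline-v1.44.1.json is not shown here, and `findSizeViolations` and `SizeViolation` are hypothetical names, not taken from test/skill-size-budget.test.ts. The corpus and catalog-token checks are omitted.

```typescript
// Hypothetical data shape: skill name -> SKILL.md size in bytes.
// The real parity-baseline-v1.44.1.json layout may differ.
type SizeMap = Record<string, number>;

interface SizeViolation {
  skill: string;
  baselineBytes: number;
  currentBytes: number;
  limitBytes: number;
}

// Flag each skill whose current size exceeds baseline * ratio.
// Skills absent from the baseline (new skills) are skipped in this sketch.
function findSizeViolations(baseline: SizeMap, current: SizeMap, ratio = 1.05): SizeViolation[] {
  const violations: SizeViolation[] = [];
  for (const [skill, currentBytes] of Object.entries(current)) {
    if (!(skill in baseline)) continue;
    const baselineBytes = baseline[skill];
    const limitBytes = baselineBytes * ratio;
    if (currentBytes > limitBytes) {
      violations.push({ skill, baselineBytes, currentBytes, limitBytes });
    }
  }
  return violations;
}
```

With the default ratio, a skill whose baseline SKILL.md is 2,000 bytes may grow to 2,100 bytes before it is flagged; that 5% headroom is what absorbs the small body growth from moving text out of frontmatter in the T4 catalog trim.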
51 lines
2.0 KiB
TypeScript
/**
 * Budget override audit trail (v1.45.0.0 T5).
 *
 * Records uses of GSTACK_SIZE_BUDGET_OVERRIDE_REASON or
 * EVALS_BUDGET_OVERRIDE_REASON so a reviewer can see what was waived,
 * where (runner, branch, commit), and why. Append-only JSONL at
 * ~/.gstack/analytics/spend-overrides.jsonl (or $GSTACK_HOME/analytics/
 * when GSTACK_HOME is set).
 *
 * Why audit: a hard cap with no escape valve becomes operationally hostile
 * (legit price changes, longer transcripts, new required evals can all
 * blow the cap). An escape valve with no audit becomes "everyone overrides
 * everything and we lose the gate." This module is the audit half.
 */

import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';

export interface BudgetOverrideEntry {
  scope: string; // e.g. 'skill-size-budget', 'evals-cost-cap'
  reason: string; // value of the user-supplied *_OVERRIDE_REASON env var
  details?: Record<string, unknown>; // numbers / regressions
}

// Honors GSTACK_HOME when set; otherwise defaults to ~/.gstack.
function getAuditPath(): string {
  const base = process.env.GSTACK_HOME || path.join(os.homedir(), '.gstack');
  return path.join(base, 'analytics', 'spend-overrides.jsonl');
}

export function logBudgetOverride(entry: BudgetOverrideEntry): void {
  try {
    const auditPath = getAuditPath();
    fs.mkdirSync(path.dirname(auditPath), { recursive: true });
    const line = JSON.stringify({
      timestamp: new Date().toISOString(),
      scope: entry.scope,
      reason: entry.reason,
      details: entry.details ?? {},
      // Capture provenance: where the override ran (CI flag, runner, branch, commit)
      ci: process.env.CI === 'true',
      runner: process.env.GITHUB_ACTIONS ? 'github-actions' : process.env.CI_RUNNER || 'local',
      branch: process.env.GITHUB_REF_NAME || process.env.CI_COMMIT_REF_NAME || 'unknown',
      commit: process.env.GITHUB_SHA?.slice(0, 8) || process.env.CI_COMMIT_SHORT_SHA || 'unknown',
    }) + '\n';
    fs.appendFileSync(auditPath, line);
  } catch (err) {
    // Best-effort logging; don't fail the test on audit-write errors.
    // eslint-disable-next-line no-console
    console.warn(`[budget-override] could not write audit log: ${(err as Error).message}`);
  }
}
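To show where `logBudgetOverride` might fit, here is a minimal sketch of the cost-cap side described in the commit message (gate $25, periodic $70, override via EVALS_BUDGET_OVERRIDE_REASON). `checkCostCap`, `resolveCapUsd`, and the `Tier` type are hypothetical names, not gstack code. In particular, treating EVALS_BUDGET_HARD_CAP as a value that replaces the per-tier default is an assumption about what "umbrella" means. The environment and the logger are passed in so the sketch stays self-contained.

```typescript
// Hypothetical sketch: tier names and dollar defaults come from the commit message;
// function names and env handling are assumptions.
type Tier = 'gate' | 'periodic';

// Same shape as BudgetOverrideEntry in budget-override.ts.
interface BudgetOverrideEntry {
  scope: string;
  reason: string;
  details?: Record<string, unknown>;
}

type Env = Record<string, string | undefined>;

const DEFAULT_CAPS_USD: Record<Tier, number> = { gate: 25, periodic: 70 };

// Assumption: a positive numeric EVALS_BUDGET_HARD_CAP replaces the per-tier default.
function resolveCapUsd(tier: Tier, env: Env): number {
  const parsed = Number(env.EVALS_BUDGET_HARD_CAP); // Number(undefined) is NaN
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_CAPS_USD[tier];
}

// Returns true when the run is allowed: either under the cap, or over it with a
// non-blank EVALS_BUDGET_OVERRIDE_REASON, in which case the override is logged.
function checkCostCap(
  tier: Tier,
  spentUsd: number,
  env: Env,
  log: (entry: BudgetOverrideEntry) => void,
): boolean {
  const capUsd = resolveCapUsd(tier, env);
  if (spentUsd <= capUsd) return true;
  const reason = env.EVALS_BUDGET_OVERRIDE_REASON?.trim();
  if (!reason) return false;
  log({ scope: 'evals-cost-cap', reason, details: { tier, spentUsd, capUsd } });
  return true;
}
```

In a test you might call `checkCostCap('gate', totalCostUsd, process.env, logBudgetOverride)` and fail when it returns false. Injecting `env` and `log` keeps the decision logic testable without touching the real environment or writing under ~/.gstack.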
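The module comment says the log exists so a reviewer can see what was waived and why. As a hedged illustration of that reviewer side, the sketch below reads the JSONL file and groups records by scope. `summarizeOverrides` and `OverrideRecord` are hypothetical names, not part of gstack; the record fields simply mirror the object `logBudgetOverride` serializes.

```typescript
import * as fs from 'fs';

// Fields mirror the object logBudgetOverride writes; OverrideRecord is a hypothetical name.
interface OverrideRecord {
  timestamp: string;
  scope: string;
  reason: string;
  details: Record<string, unknown>;
  ci: boolean;
  runner: string;
  branch: string;
  commit: string;
}

// Read the append-only JSONL audit file and group records by scope.
// A missing file yields an empty map; blank, malformed, or scope-less lines are skipped.
function summarizeOverrides(auditPath: string): Map<string, OverrideRecord[]> {
  const byScope = new Map<string, OverrideRecord[]>();
  if (!fs.existsSync(auditPath)) return byScope;
  for (const line of fs.readFileSync(auditPath, 'utf8').split('\n')) {
    if (!line.trim()) continue;
    let rec: OverrideRecord | null = null;
    try {
      rec = JSON.parse(line) as OverrideRecord | null;
    } catch {
      continue; // e.g. a partially written line
    }
    if (!rec || typeof rec.scope !== 'string') continue;
    const list = byScope.get(rec.scope) ?? [];
    list.push(rec);
    byScope.set(rec.scope, list);
  }
  return byScope;
}
```

Grouping by scope makes it easy to spot a gate that is being waived on nearly every run, which is the "everyone overrides everything" failure mode the audit trail is meant to expose.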