- 005_sync_heartbeats.sql migration for connectivity testing
- eval:trend --team flag pulls team eval data (graceful fallback)
- docs/TEAM_SYNC_SETUP.md step-by-step setup guide
- Design doc status updated to Phase 2 complete
- 10 new tests for sync show formatting functions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- lib/cli-eval.ts: routes to list/compare/summary/push/cost/cache/watch
  subcommands. Ports logic from 4 separate scripts into a unified entry
  point. Adds ANSI color for TTY output (respects NO_COLOR) and a --limit
  flag for list.
- bin/gstack-eval: bash wrapper matching bin/gstack-sync pattern
- package.json: eval:* scripts now point to lib/cli-eval.ts
- supabase/migrations/004_eval_costs.sql: per-model cost tracking + RLS
- docs/eval-result-format.md: public format spec for any language
- test/lib-eval-cli.test.ts: integration tests (spawn CLI subprocess),
  including 3 push failure modes (file-not-found, invalid schema,
  sync unavailable)
215 tests passing across 13 files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
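
The TTY and NO_COLOR behavior noted for lib/cli-eval.ts above could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names `shouldUseColor` and `colorize` are hypothetical, and the only assumption about behavior is the NO_COLOR convention (color is disabled when the variable is set to a non-empty value).

```typescript
// Hypothetical helpers; names and signatures are illustrative assumptions,
// not taken from lib/cli-eval.ts.

// Decide whether to emit ANSI color. Color is used only when output goes to
// a terminal and NO_COLOR is unset or empty.
function shouldUseColor(
  env: Record<string, string | undefined>,
  isTTY: boolean,
): boolean {
  const noColor = env["NO_COLOR"];
  if (noColor !== undefined && noColor !== "") {
    return false;
  }
  return isTTY;
}

// Wrap text in an ANSI SGR sequence (e.g. 32 = green) when color is enabled;
// otherwise return the text unchanged.
function colorize(text: string, ansiCode: number, enabled: boolean): string {
  return enabled ? `\u001b[${ansiCode}m${text}\u001b[0m` : text;
}
```

In a Node.js CLI, a caller would typically pass `process.env` and `Boolean(process.stdout.isTTY)` to `shouldUseColor`; taking them as parameters keeps the helper easy to test without a real terminal.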
- Replace project-specific references with generic language
- Add missing fields to eval result format: prompt_sha, by_category,
  timestamp, response_preview
- Enrich failure format with details array, scores dict, expectation_type
- Add EVAL_JUDGE_CACHE, EVAL_VERBOSE, multiprocess worker support,
  dedup on push, run scopes, model aliases, judge profiles
- Restructure credential storage to 4 layers with gstack-config (v0.3.9)
  for user preferences (sync_enabled, sync_transcripts)
- Update integration points, observability, and reuse map
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Design doc for Supabase-backed team data store and universal eval
infrastructure. Covers architecture, credential storage, eval formats,
YAML test case spec, Supabase schema, phased rollout, and security model.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>