feat: add CI eval workflow on Ubicloud runners

Single-job GitHub Actions workflow that runs E2E evals on every PR using
Ubicloud runners ($0.006/run — 10x cheaper than GitHub standard). Uses
EVALS_CONCURRENCY=40 with the new within-file concurrency for ~6min
wall clock. Downloads previous eval artifact from main for comparison,
uploads results, and posts a PR comment with pass/fail + cost.

Ubicloud setup required: connect GitHub repo via ubicloud.com dashboard,
add ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY as repo secrets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-22 23:06:13 -07:00
parent 7ab00588c3
commit e589a95116
3 changed files with 121 additions and 32 deletions
+8 -11
View File
@@ -338,17 +338,6 @@
**Depends on:** Video recording
### GitHub Actions eval upload
**What:** Run eval suite in CI, upload result JSON as artifact, post summary comment on PR.
**Why:** CI integration catches quality regressions before merge and provides persistent eval records per PR.
**Context:** Requires `ANTHROPIC_API_KEY` in CI secrets. Cost is ~$4/run. Eval persistence system (v0.3.6) writes JSON to `~/.gstack-dev/evals/` — CI would upload as GitHub Actions artifacts and use `eval:compare` to post delta comment.
**Effort:** M
**Priority:** P2
**Depends on:** Eval persistence (shipped in v0.3.6)
### E2E model pinning — SHIPPED
@@ -539,6 +528,14 @@ Shipped in v0.6.5. TemplateContext in gen-skill-docs.ts bakes skill name into pr
## Completed
### CI eval pipeline (v0.9.9.0)
- GitHub Actions eval upload on Ubicloud runners ($0.006/run)
- Within-file test concurrency (test() → testConcurrentIfSelected())
- Eval artifact upload + PR comment with pass/fail + cost
- Baseline comparison via artifact download from main
- EVALS_CONCURRENCY=40 for ~6min wall clock (was ~18min)
**Completed:** v0.9.9.0
### Deploy pipeline (v0.9.8.0)
- /land-and-deploy — merge PR, wait for CI/deploy, canary verification
- /canary — post-deploy monitoring loop with anomaly detection