Mirror of https://github.com/garrytan/gstack.git (synced 2026-06-02 16:21:38 +02:00)
feat: add CI eval workflow on Ubicloud runners
Single-job GitHub Actions workflow that runs E2E evals on every PR using Ubicloud runners ($0.006/run, 10x cheaper than GitHub standard runners). Uses EVALS_CONCURRENCY=40 with the new within-file concurrency for ~6min wall clock. Downloads the previous eval artifact from main for comparison, uploads results, and posts a PR comment with pass/fail + cost.

Ubicloud setup required: connect the GitHub repo via the ubicloud.com dashboard, then add ANTHROPIC_API_KEY, OPENAI_API_KEY, and GEMINI_API_KEY as repo secrets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
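The bounded concurrency described in the commit message (an `EVALS_CONCURRENCY` limit applied to tests within a file) can be illustrated with a small worker-pool sketch. This is not the repository's implementation: `parseConcurrency` and `runWithLimit` are hypothetical names introduced here, and the fallback of 1 is an assumption.

```typescript
// Hypothetical sketch of a bounded-concurrency runner; the helper names and
// the fallback value are assumptions, not taken from the gstack codebase.

/** Parse a concurrency limit from an env-style string, with a fallback. */
function parseConcurrency(raw: string | undefined, fallback = 1): number {
  const n = Number.parseInt(raw ?? "", 10);
  // Reject NaN, zero, and negative values.
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

/** Run async tasks with at most `limit` in flight; results keep input order. */
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0; // index of the next task that has not been claimed yet

  // Each worker claims the next unclaimed task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      const task = tasks[index];
      if (task) {
        results[index] = await task();
      }
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A caller could pass `parseConcurrency(process.env.EVALS_CONCURRENCY)` as the limit. Because each worker holds at most one task at a time, the number of in-flight evals never exceeds the limit, which is the property that lets a higher setting shorten wall-clock time without unbounded fan-out. A task that rejects makes the whole call reject, so a real runner would likely catch per-task failures and record them instead.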
@@ -338,17 +338,6 @@
**Depends on:** Video recording
### GitHub Actions eval upload
**What:** Run eval suite in CI, upload result JSON as artifact, post summary comment on PR.
**Why:** CI integration catches quality regressions before merge and provides persistent eval records per PR.
**Context:** Requires `ANTHROPIC_API_KEY` in CI secrets. Cost is ~$4/run. Eval persistence system (v0.3.6) writes JSON to `~/.gstack-dev/evals/` — CI would upload as GitHub Actions artifacts and use `eval:compare` to post delta comment.
**Effort:** M
**Priority:** P2
**Depends on:** Eval persistence (shipped in v0.3.6)
### E2E model pinning — SHIPPED
@@ -539,6 +528,14 @@ Shipped in v0.6.5. TemplateContext in gen-skill-docs.ts bakes skill name into pr
## Completed
### CI eval pipeline (v0.9.9.0)
- GitHub Actions eval upload on Ubicloud runners ($0.006/run)
- Within-file test concurrency (test() → testConcurrentIfSelected())
- Eval artifact upload + PR comment with pass/fail + cost
- Baseline comparison via artifact download from main
- EVALS_CONCURRENCY=40 for ~6min wall clock (was ~18min)
**Completed:** v0.9.9.0
### Deploy pipeline (v0.9.8.0)
- /land-and-deploy — merge PR, wait for CI/deploy, canary verification
- /canary — post-deploy monitoring loop with anomaly detection