feat: eval:bg* scripts — detached eval runs for agents

Add agent-facing convenience scripts that launch the eval suites through
bin/gstack-detach, so a SIGTERM from the agent harness can't kill a long run.
eval:bg (diff-based), eval:bg:all, eval:bg:gate and eval:bg:periodic each
return immediately and stream their output to /tmp/gstack-evals.log, which the
agent can poll. The plain test:evals / test:e2e scripts stay in the foreground
for humans.
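A rough TypeScript sketch of what "detached" means here. This is not bin/gstack-detach, whose source isn't part of this commit. `detachToLog` is a hypothetical name, and opening the log in append mode is an assumption. The sketch uses Node's `child_process.spawn` with `detached: true`. On POSIX systems that option makes the child the leader of a new session and process group, so a SIGTERM aimed at the harness's process group doesn't reach it. `unref()` lets the launcher return immediately.

```typescript
import { spawn } from "node:child_process";
import { closeSync, openSync } from "node:fs";

// Hypothetical stand-in for bin/gstack-detach (its real source is not in this
// commit). Starts `cmd args...` in the background with stdout and stderr
// redirected to `logPath`, and returns the child's pid without waiting.
export function detachToLog(logPath: string, cmd: string, args: string[]): number {
  // Assumption: append mode, so successive runs accumulate in one log.
  const fd = openSync(logPath, "a");
  const child = spawn(cmd, args, {
    // On POSIX, the child becomes the leader of a new session and process
    // group, so a signal sent to the launcher's process group skips it.
    detached: true,
    stdio: ["ignore", fd, fd],
  });
  // A failed spawn is reported via the missing pid below; without a listener
  // the asynchronous 'error' event would crash the launcher instead.
  child.once("error", () => {});
  // Let the launcher exit without waiting for the child.
  child.unref();
  // The child holds its own copy of the descriptor, so the parent can close it.
  closeSync(fd);
  if (child.pid === undefined) {
    throw new Error(`failed to start ${cmd}`);
  }
  return child.pid;
}
```

Under these assumptions, `detachToLog("/tmp/gstack-evals.log", "bun", ["run", "test:evals"])` plays roughly the role that `bun run eval:bg` plays in the diff.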

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Garry Tan
2026-06-11 23:06:33 -07:00
parent d1fc21cbca
commit f8a0dc0888
+4
@@ -33,6 +33,10 @@
"skill:check": "bun run scripts/skill-check.ts",
"dev:skill": "bun run scripts/dev-skill.ts",
"start": "bun run browse/src/server.ts",
"eval:bg": "bin/gstack-detach /tmp/gstack-evals.log -- bun run test:evals",
"eval:bg:all": "bin/gstack-detach /tmp/gstack-evals.log -- bun run test:evals:all",
"eval:bg:gate": "bin/gstack-detach /tmp/gstack-evals.log -- bun run test:gate",
"eval:bg:periodic": "bin/gstack-detach /tmp/gstack-evals.log -- bun run test:periodic",
"eval:list": "bun run scripts/eval-list.ts",
"eval:compare": "bun run scripts/eval-compare.ts",
"eval:summary": "bun run scripts/eval-summary.ts",