mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-17 07:10:12 +02:00
test: refresh codex/factory ship goldens with detached-eval block
a38089aa added the gstack-detach guidance to the ship template and
updated the claude golden; the codex and factory goldens missed the same
16-line block. Regenerated via bun run gen:skill-docs.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
+16
@@ -1290,6 +1290,22 @@ EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/<suite>_eval

If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.

**Long eval suites (30+ min): launch detached so a turn boundary can't kill them.**
A plain backgrounded eval lives in the harness's process group and dies to a
SIGTERM ("polite quit") on a turn boundary, a stopped monitor, or an interruption
(observed mid-`/ship`: `script terminated by signal SIGTERM`). Run it through
`$GSTACK_ROOT/bin/gstack-detach` instead — it survives in its own
session, serializes against other worktrees via a machine lock (no API
saturation), and writes a guaranteed `### gstack-detach EXIT=<code> ###` sentinel:

```bash
$GSTACK_ROOT/bin/gstack-detach --label ship-evals --lock gstack-evals --timeout 5400 -- <project eval command>
```

Then poll the printed log path; break on the `EXIT=` sentinel (covers both pass
and crash — silence is never success). The detached run survives even if your
poller is reaped.

**4. Check results:**

- **If any eval fails:** Show the failures, the cost dashboard, and **STOP**. Do not proceed.
+16
@@ -1292,6 +1292,22 @@ EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/<suite>_eval

If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.

**Long eval suites (30+ min): launch detached so a turn boundary can't kill them.**
A plain backgrounded eval lives in the harness's process group and dies to a
SIGTERM ("polite quit") on a turn boundary, a stopped monitor, or an interruption
(observed mid-`/ship`: `script terminated by signal SIGTERM`). Run it through
`$GSTACK_ROOT/bin/gstack-detach` instead — it survives in its own
session, serializes against other worktrees via a machine lock (no API
saturation), and writes a guaranteed `### gstack-detach EXIT=<code> ###` sentinel:

```bash
$GSTACK_ROOT/bin/gstack-detach --label ship-evals --lock gstack-evals --timeout 5400 -- <project eval command>
```

Then poll the printed log path; break on the `EXIT=` sentinel (covers both pass
and crash — silence is never success). The detached run survives even if your
poller is reaped.

**4. Check results:**

- **If any eval fails:** Show the failures, the cost dashboard, and **STOP**. Do not proceed.
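The polling step added by these hunks ("poll the printed log path; break on the `EXIT=` sentinel") could look roughly like the sketch below. It is illustrative only and not part of the commit. The function names, the poll interval, the poll limit, and the `124` "no sentinel seen" return value are assumptions, as is the idea that `<code>` in the sentinel is a plain decimal exit status. The log path is whatever `gstack-detach` prints and is passed in by the caller.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for polling a gstack-detach log (names are illustrative).

# Print the exit code from the last sentinel line in the log, or nothing
# if the log does not exist yet or contains no sentinel.
sentinel_code() {
  [[ -f "$1" ]] || return 0
  sed -n 's/^.*### gstack-detach EXIT=\([0-9][0-9]*\) ###.*$/\1/p' "$1" | tail -n 1
}

# wait_for_sentinel LOG [INTERVAL_SECONDS] [MAX_POLLS]
# Returns the exit code recorded in the sentinel (assumed numeric).
# Returns 124 if no sentinel appears within MAX_POLLS polls, so a silent
# log is never treated as success.
wait_for_sentinel() {
  local log="$1" interval="${2:-30}" max_polls="${3:-200}"
  local code i
  for ((i = 0; i < max_polls; i++)); do
    code="$(sentinel_code "$log")"
    if [[ -n "$code" ]]; then
      return "$code"
    fi
    sleep "$interval"
  done
  return 124
}
```

A caller would run something like `wait_for_sentinel "$LOG" 30 200`, where `$LOG` holds the log path printed at launch, and then branch on the return status. Because the detached run lives in its own session, the loop can be restarted against the same log if the poller itself is killed.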