mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-29 21:15:37 +02:00
test(auq): grade format-compliance gate from SDK capture, not the TUI
The real-PTY version grepped the stripAnsi'd interactive AUQ picker. Verified directly that this cannot work: plan-mode AUQs render as a cursor picker whose cursor-positioning escapes stripAnsi can't flatten — the picker renders fine for a human (cursorSeen=45) but the flattened text drops ELI10:/(recommended) and parseNumberedOptions returns 0. The test was grading a lossy projection and failed by construction. Rewritten to drive /plan-ceo-review via the SDK $OUT_FILE capture (the agent writes the verbatim question it would have shown — clean text, no rendering loss) and grade 7/7 format + kind-note + recommendation substance >=4. Same property, reliable, environment-independent; shares the engine with the periodic A/B and matrix evals. Result: 7/7 format, substance 5. Touchfiles key renamed ask-user-question-format-pty -> auq-format-gate (no longer a PTY test). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -94,7 +94,7 @@ describe('selectTests', () => {
|
||||
expect(result.selected).toContain('plan-review-prosons-hardstop-neg');
|
||||
expect(result.selected).toContain('plan-review-prosons-neutral-neg');
|
||||
// v1.13.x real-PTY E2E batch entries that also depend on plan-ceo-review/**
|
||||
expect(result.selected).toContain('ask-user-question-format-pty');
|
||||
expect(result.selected).toContain('auq-format-gate');
|
||||
expect(result.selected).toContain('plan-ceo-mode-routing');
|
||||
expect(result.selected).toContain('autoplan-chain-pty');
|
||||
// Per-finding count + review-report-at-bottom (v1.21.x)
|
||||
|
||||
Reference in New Issue
Block a user