Mirror of https://github.com/garrytan/gstack.git (synced 2026-06-19 08:10:08 +02:00)
test: add multi-finding batching regression test (periodic tier)

Adds a periodic-tier E2E that catches the May 2026 transcript bug shape the
existing single-finding gate-tier floor test cannot detect: a model that
fires one AskUserQuestion and then batches the remaining findings into a
single "## Decisions to confirm" plan write + ExitPlanMode.

Why a separate test from skill-e2e-plan-eng-finding-floor: the gate-tier
floor (runPlanSkillFloorCheck) exits on the first AUQ render and returns
success, so a once-then-batch model would pass it trivially. This test uses
runPlanSkillCounting at periodic tier with N-AUQ tracking and asserts >= 3
distinct review-phase AUQs on a 4-finding seeded plan.

- test/fixtures/forcing-finding-seeds.ts: FORCING_BATCHING_ENG fixture
  (4 distinct non-trivial findings spread across Architecture, Code
  Quality, Tests, Performance — mirrors the D1-D4 transcript shape)
- test/skill-e2e-plan-eng-multi-finding-batching.test.ts: new test
- test/helpers/touchfiles.ts: registered in BOTH E2E_TOUCHFILES and
  E2E_TIERS (touchfiles.test.ts asserts exact equality)

Test will fail on baseline today because today's model uses the preamble
fallback to batch findings; passes after the architectural fix lands in a
follow-up commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
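
The difference between the two checks described in the message can be sketched as follows. This is a minimal illustration, not the harness code: `ToolEvent`, `countDistinctAuqs`, `auqFloor`, `passesFirstAuqGate`, and `passesCountingCheck` are hypothetical names, and the real `runPlanSkillFloorCheck` / `runPlanSkillCounting` helpers are not shown in this commit and may work differently.

```typescript
// Hypothetical event shape (an assumption, not the harness's real type).
type ToolEvent = { tool: string; question?: string };

// Count distinct AskUserQuestion prompts seen before the first ExitPlanMode.
function countDistinctAuqs(events: ToolEvent[]): number {
  const seen = new Set<string>();
  for (const ev of events) {
    if (ev.tool === 'ExitPlanMode') break;
    if (ev.tool === 'AskUserQuestion' && ev.question !== undefined) {
      seen.add(ev.question.trim());
    }
  }
  return seen.size;
}

// N-1 tolerance: a plan seeded with N findings must produce at least N-1 AUQs.
function auqFloor(findingCount: number): number {
  return Math.max(1, findingCount - 1);
}

// Simplified stand-in for a gate that succeeds on the first AUQ it sees.
function passesFirstAuqGate(events: ToolEvent[]): boolean {
  return events.some((ev) => ev.tool === 'AskUserQuestion');
}

// Simplified stand-in for a counting check over the whole run.
function passesCountingCheck(events: ToolEvent[], findingCount: number): boolean {
  return countDistinctAuqs(events) >= auqFloor(findingCount);
}

// A once-then-batch run: one question, then a plan write and ExitPlanMode.
const onceThenBatch: ToolEvent[] = [
  { tool: 'AskUserQuestion', question: 'D1: custom scheduler or library hooks?' },
  { tool: 'Write' },
  { tool: 'ExitPlanMode' },
];

const gateResult = passesFirstAuqGate(onceThenBatch);
const countingResult = passesCountingCheck(onceThenBatch, 4);
```

Under these assumptions `gateResult` is true while `countingResult` is false, which is the gap the new periodic-tier test is meant to close.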
+39
@@ -81,3 +81,42 @@ export const FORCING_FLOOR_DEVEX = [
   '',
   'No quickstart command, no hosted sandbox, no copy-pasteable curl example.',
 ].join('\n');
+
+/**
+ * Multi-finding batching regression seed (periodic tier).
+ *
+ * Mirrors the May 2026 transcript bug shape: 4 distinct non-trivial findings
+ * spread across plan-eng-review's standard sections (Architecture, Code
+ * Quality, Tests, Performance). Each finding is independent — there is no
+ * legitimate reason to batch them into a single AskUserQuestion.
+ *
+ * Used by test/skill-e2e-plan-eng-multi-finding-batching.test.ts to assert
+ * the agent fires >= 3 review-phase AUQs (i.e., does NOT batch them into a
+ * "## Decisions to confirm" section + ExitPlanMode). Floor of 3 (not 4) is
+ * the [N-1] tolerance from the existing finding-count band convention.
+ */
+export const FORCING_BATCHING_ENG = [
+  'Please review this plan thoroughly. As you go, write your plan-mode plan to /tmp/gstack-test-plan-eng-batching.md (use Edit/Write to that exact path).',
+  '',
+  '# Plan: Add background job retry framework',
+  '',
+  '## Architecture',
+  "We'll roll a custom exponential-backoff scheduler inline in each worker",
+  "rather than use the existing job library's built-in retry hooks. Same",
+  'shape as the library version, but we want full control over the curve.',
+  '',
+  '## Code quality',
+  'The retry envelope (compute delay, log attempt, dispatch) is duplicated',
+  'across 5 worker files with copy-pasted bodies. We will leave the',
+  'duplication for now and refactor "later."',
+  '',
+  '## Tests',
+  'The existing `processWebhookJob()` flow gets rewritten as part of this',
+  'change. No regression test for the prior at-most-once delivery guarantee',
+  'is planned.',
+  '',
+  '## Performance',
+  'On every retry we re-fetch the full job payload from the database, then',
+  'iterate the payload to recompute the dependency graph. Could cache the',
+  'graph on the first attempt; not planned.',
+].join('\n');
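
As a sanity check on the fixture shape, the section count and the N-1 floor can be derived from the seed text instead of being hard-coded. The sketch below assumes one finding per `## ` heading and uses an abbreviated stand-in string, not an import of `FORCING_BATCHING_ENG`. `reviewSections` is a hypothetical helper, not part of the repository as far as this commit shows.

```typescript
// Abbreviated stand-in for the seeded plan (headings only, bodies elided).
const seedPlan = [
  '# Plan: Add background job retry framework',
  '',
  '## Architecture',
  '(finding text)',
  '',
  '## Code quality',
  '(finding text)',
  '',
  '## Tests',
  '(finding text)',
  '',
  '## Performance',
  '(finding text)',
].join('\n');

// Hypothetical helper: list the level-2 headings of a seeded plan.
// The level-1 "# Plan: ..." title is skipped because it does not start with "## ".
function reviewSections(plan: string): string[] {
  return plan
    .split('\n')
    .filter((line) => line.startsWith('## '))
    .map((line) => line.slice(3).trim());
}

const sections = reviewSections(seedPlan);

// Assumption: one seeded finding per section, with the N-1 tolerance applied.
const expectedFloor = sections.length - 1;
```

Deriving the floor this way would keep the assertion in step with the seed if a section were added or removed, at the cost of tying the test to the heading convention.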