Files
gstack/test
Garry Tan 6c610bf55d test: fix carve-broken CI evals (union reads + section fixtures)
Two CI eval jobs failed on the carved plan-* skills because they read content
that moved into sections/:

- llm-judge (skill-llm-eval): runWorkflowJudge sliced SKILL.md between markers
  like "## Review Sections" / "## CRITICAL RULE" that now live in
  sections/review-sections.md. The markers vanished from the skeleton, so the
  judge scored empty/wrong content. Fix: read the skeleton+sections union.
  Verified: plan-ceo modes / plan-eng sections / plan-design passes all PASS
  (25/25).

- e2e-plan (skill-e2e-plan): setupPlanDir copied only <skill>/SKILL.md into the
  fixture, not sections/. The carved skill's STOP pointed at a section file that
  was absent, so the model improvised a compressed report table instead of the
  canonical "| Review | Trigger | Why | Runs | Status | Findings |". Fix: copy
  sections/ alongside SKILL.md in all 6 setup sites. Verified: report test PASS,
  canonical table emitted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 22:40:40 -07:00
..