mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-05 13:15:24 +02:00
f458f18f42
The test checked for exact keywords like "RECOMMENDATION", "option a", and "which approach", but the model sometimes phrases options as "A)" or references "Checkout" vs. "Elements" directly without using the word "recommend".

Added: "option b", a regex for "a)"/"b)", and the actual decision terms (checkout, elements, hosted, embedded).

The test failed 3/3 retries in CI because the assertion was too narrow for non-deterministic LLM output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
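
A minimal sketch of the broadened assertion described above, assuming the test matches the model output against a list of patterns. The names `DECISION_PATTERNS` and `mentionsDecision` are hypothetical and the exact patterns are illustrative; this is not the repository's actual test code.

```typescript
// Hypothetical helper: returns true if the output looks like it presents
// or recommends one of the options. Patterns are illustrative assumptions.
const DECISION_PATTERNS: RegExp[] = [
  /recommend/i, // covers "RECOMMENDATION" as well as "recommend"
  /option [ab]\b/i, // "option a" and the newly added "option b"
  /which approach/i,
  /\b[ab]\)/i, // lettered options such as "A)" or "b)"
  /\b(checkout|elements|hosted|embedded)\b/i, // the actual decision terms
];

function mentionsDecision(output: string): boolean {
  // No global flag on the patterns, so test() carries no state between calls.
  return DECISION_PATTERNS.some((pattern) => pattern.test(output));
}
```

Matching any one of several loose, case-insensitive patterns trades some precision for stability: the assertion still fails on unrelated output, but it no longer depends on the model using one specific keyword.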