test(opus-4.7): tighten ambiguous /qa routing prompt

The prompt "does this feature work on mobile? can you check the deploy?"
was too vague: a reasonable agent asks "which feature?" via
AskUserQuestion instead of routing to /qa. That is not a routing miss;
the prompt is underspecified.

Replaced it with "I just pushed the login flow changes. Test the deployed
site and find any bugs.", which has a concrete subject (the login flow)
and a clear QA verb (test the site, find bugs).

Result: pos-does-it-work went from MISS to OK, raising the routing
true-positive (TP) rate from 2/3 to 3/3.
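The TP rate counts positive cases (`shouldRoute: true`) whose routed skill matched `expectedSkill`, over all positive cases. A minimal sketch of that arithmetic, assuming a hypothetical `Outcome` record per case (not the real harness's types) where `routedSkill` is `null` when the agent invoked no skill, for example because it asked a clarifying question instead:

```typescript
// Hypothetical per-case outcome record; not the actual test harness's types.
interface Outcome {
  name: string;
  shouldRoute: boolean;
  expectedSkill?: string;
  routedSkill: string | null; // null = agent did not route to any skill
}

// Routing TP rate: positive cases routed to their expected skill,
// out of all positive cases.
function routingTpRate(outcomes: Outcome[]): { hits: number; total: number } {
  const positives = outcomes.filter((o) => o.shouldRoute);
  const hits = positives.filter((o) => o.routedSkill === o.expectedSkill).length;
  return { hits, total: positives.length };
}

// Illustrative outcomes mirroring the commit message: only pos-does-it-work
// changes, from no route (MISS) to 'qa' (OK).
const before: Outcome[] = [
  { name: "pos-wtf-bug", shouldRoute: true, expectedSkill: "investigate", routedSkill: "investigate" },
  { name: "pos-send-it", shouldRoute: true, expectedSkill: "ship", routedSkill: "ship" },
  { name: "pos-does-it-work", shouldRoute: true, expectedSkill: "qa", routedSkill: null },
];
const after: Outcome[] = before.map((o) =>
  o.name === "pos-does-it-work" ? { ...o, routedSkill: "qa" } : o,
);

const b = routingTpRate(before);
const a = routingTpRate(after);
console.log(`routing TP rate ${b.hits}/${b.total} -> ${a.hits}/${a.total}`);
// prints "routing TP rate 2/3 -> 3/3"
```

Negative cases are excluded from this rate by construction; they would need a separate false-positive count.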

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author: Garry Tan
Date: 2026-04-22 00:27:28 -07:00
Parent: 36ef9d9db0
Commit: 723f9957f2
+1 -1
@@ -99,7 +99,7 @@ const ROUTING_CASES: RoutingCase[] = [
 // Positive — should route
 { name: 'pos-wtf-bug', prompt: "wtf is this error coming from auth.ts:47 when the cookie expires?", shouldRoute: true, expectedSkill: 'investigate' },
 { name: 'pos-send-it', prompt: "ok this is good enough, let's send it.", shouldRoute: true, expectedSkill: 'ship' },
-{ name: 'pos-does-it-work', prompt: "does this feature work on mobile? can you check the deploy?", shouldRoute: true, expectedSkill: 'qa' },
+{ name: 'pos-does-it-work', prompt: "I just pushed the login flow changes. Test the deployed site and find any bugs.", shouldRoute: true, expectedSkill: 'qa' },
 // Negative — should NOT route
 { name: 'neg-syntax-q', prompt: "wtf does this Python list comprehension syntax even mean, [x for x in y if z]?", shouldRoute: false },
 { name: 'neg-algo-q', prompt: "does this bubble sort algorithm actually work in O(n log n)?", shouldRoute: false },
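Each case pairs a prompt with whether it should trigger a skill and, for positive cases, which skill. A minimal sketch of how one case could be scored, assuming a `RoutingCase` shape inferred only from the fields visible in this hunk and a hypothetical `verdict` helper; neither is the test file's actual code, and the `FALSE_ROUTE` label is invented for illustration:

```typescript
// Shape inferred from the fields used in this hunk; the real RoutingCase
// type in the test file may differ or carry more fields.
interface RoutingCase {
  name: string;
  prompt: string;
  shouldRoute: boolean;
  expectedSkill?: string; // only set on positive cases
}

// "OK" and "MISS" appear in the commit message; "FALSE_ROUTE" is a
// hypothetical label for a negative case that was routed anyway.
type Verdict = "OK" | "MISS" | "FALSE_ROUTE";

// routedSkill is null when the agent invoked no skill, e.g. it asked a
// clarifying question via AskUserQuestion instead.
function verdict(c: RoutingCase, routedSkill: string | null): Verdict {
  if (c.shouldRoute) {
    return routedSkill === c.expectedSkill ? "OK" : "MISS";
  }
  return routedSkill === null ? "OK" : "FALSE_ROUTE";
}

const tightened: RoutingCase = {
  name: "pos-does-it-work",
  prompt: "I just pushed the login flow changes. Test the deployed site and find any bugs.",
  shouldRoute: true,
  expectedSkill: "qa",
};
const negative: RoutingCase = {
  name: "neg-algo-q",
  prompt: "does this bubble sort algorithm actually work in O(n log n)?",
  shouldRoute: false,
};

console.log(verdict(tightened, null)); // MISS (no skill invoked)
console.log(verdict(tightened, "qa")); // OK
console.log(verdict(negative, null)); // OK
```

Keeping the case `name` unchanged while swapping the prompt, as this commit does, lets the MISS to OK transition be compared across runs under the same identifier.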