fix: journey routing tests — CLAUDE.md routing rules + stronger descriptions

Three journey E2E tests (ideation, ship, debug) were failing because
Claude answered directly instead of invoking the Skill tool. Root cause:
skill descriptions in system-reminder are too weak to override Claude's
default behavior for tasks it can handle natively.

Fix has two parts:
1. CLAUDE.md routing rules in test workdir — Claude weighs project-level
   instructions higher than skill description metadata
2. "Proactively invoke" (not "suggest") in office-hours, investigate,
   ship descriptions — reinforces the routing signal

10/10 journey tests now pass (was 7/10).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-29 20:44:11 -07:00
parent 0784264aa0
commit a2ee09519c
10 changed files with 110 additions and 61 deletions
+24 -5
View File
@@ -93,11 +93,30 @@ function installSkills(tmpDir: string) {
}
}
// Copy CLAUDE.md so Claude has project context for skill routing.
const claudeMdSrc = path.join(ROOT, 'CLAUDE.md');
if (fs.existsSync(claudeMdSrc)) {
fs.copyFileSync(claudeMdSrc, path.join(tmpDir, 'CLAUDE.md'));
}
// Write a CLAUDE.md with explicit routing instructions.
// The skill descriptions in system-reminder aren't strong enough to override
// Claude's default behavior of answering directly. A CLAUDE.md instruction
// puts routing rules in project context which Claude weighs more heavily.
fs.writeFileSync(path.join(tmpDir, 'CLAUDE.md'), `# Project Instructions
## Skill routing
When the user's request matches an available skill, ALWAYS invoke it using the Skill
tool as your FIRST action. Do NOT answer directly, do NOT use other tools first.
The skill has specialized workflows that produce better results than ad-hoc answers.
Key routing rules:
- Product ideas, "is this worth building", brainstorming → invoke office-hours
- Bugs, errors, "why is this broken", 500 errors → invoke investigate
- Ship, deploy, push, create PR → invoke ship
- QA, test the site, find bugs → invoke qa
- Code review, check my diff → invoke review
- Update docs after shipping → invoke document-release
- Weekly retro → invoke retro
- Design system, brand → invoke design-consultation
- Visual audit, design polish → invoke design-review
- Architecture review → invoke plan-eng-review
`);
}
/** Init a git repo with config */
+2 -2
View File
@@ -1409,13 +1409,13 @@ describe('Skill trigger phrases', () => {
];
for (const skill of SKILLS_REQUIRING_PROACTIVE) {
test(`${skill}/SKILL.md has "Proactively suggest" phrase`, () => {
test(`${skill}/SKILL.md has proactive routing phrase`, () => {
const skillPath = path.join(ROOT, skill, 'SKILL.md');
if (!fs.existsSync(skillPath)) return;
const content = fs.readFileSync(skillPath, 'utf-8');
const frontmatterEnd = content.indexOf('---', 4);
const frontmatter = content.slice(0, frontmatterEnd);
expect(frontmatter).toMatch(/Proactively suggest/i);
expect(frontmatter).toMatch(/Proactively (suggest|invoke)/i);
});
}
});