mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-12 07:41:35 +02:00
feat: /debug sub-agent escalation from /qa + recommendations in /review and /ship (v0.6.5.0) (#192)
* feat: add browse access to /debug for visual verification

  Debug skill can now use the browse binary to visually reproduce bugs, take screenshots as evidence, and verify fixes. This makes /debug effective for web app bugs when spawned as a sub-agent from /qa.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add /debug sub-agent escalation to /qa (Phase 8g)

  When QA fix attempts fail twice on the same bug (reverted due to regressions), /qa now spawns a /debug sub-agent with a structured bug brief including symptoms, repro steps, failed fix details, and file paths. Results are reported in Phase 10's debug escalation summary.

  Sequential execution: one debug investigation at a time, working tree cleaned between investigations. Graceful degradation on all failure modes (BLOCKED, agent failure → deferred in report).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add /debug recommendation to /review (Step 5.7)

  When /review finds what appears to be a pre-existing bug in the base branch (not introduced by the PR's diff), it now classifies it as INFORMATIONAL and recommends running /debug for systematic root-cause investigation. No Agent spawning: /review's scope stays on the diff.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add reverted QA commit detection to /ship

  During pre-landing review, /ship now checks for reverted fix(qa): commits in the branch history and recommends /debug for systematic investigation. Informational only; does not block shipping.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add debug escalation tests (validation + LLM judge + E2E)

  Skill validation: 11 new assertions covering Phase 8g trigger, structured handoff fields, agent result handlers, debug escalation summary, Step 5.7 recommendation, ship reverted QA detection, and debug browse setup.

  LLM judge: evaluates Phase 8g template quality: structured brief format, result handling, working tree cleanup, sequential processing.

  E2E: prompt-level deterministic test (verifies escalation prompt has all required fields) + full flow stub (fixture TODO for planted regression).

  Touchfile entries for diff-based test selection.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add worktree parallel debug agents to TODOS.md (P2)

  When /qa hits multiple stubborn bugs, parallel debug agents in isolated git worktrees could investigate simultaneously. Deferred from the sequential debug escalation PR as a follow-up.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.6.5.0)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add E2E evals for /review pre-existing bug + /ship reverted QA detection

  Two new E2E tests:
  - review-pre-existing-bug: plants SQL injection in base branch, verifies Step 5.7 classifies as INFORMATIONAL and recommends /debug
  - ship-reverted-qa-commits: creates branch with reverted fix(qa): commits, verifies /ship detects them and recommends /debug

  Also fixes qa-debug-prompt-logic to use correct workingDirectory, and ensures test repo init uses -b main for portability.

  All 4 debug-related evals pass: $0.34 total, 94s.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
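The Phase 8g escalation rule described in the commit message can be pictured as a small predicate plus a brief renderer. This is only a sketch: in the repository the hand-off is prose inside qa/SKILL.md, not code, so `QaIssue`, `FixAttempt`, `shouldEscalate`, and `renderBugBrief` are hypothetical names. The field labels are the ones the skill validation tests look for.

```typescript
// Hypothetical model of a QA issue; the real skill tracks this in prose, not in types.
interface FixAttempt {
  summary: string;
  reverted: boolean; // true when the fix was rolled back because it caused a regression
}

interface QaIssue {
  id: string;
  symptom: string;
  reproSteps: string[];
  attempts: FixAttempt[];
  files: string[];
}

// Escalate once the same issue has had at least two reverted fix attempts.
function shouldEscalate(issue: QaIssue): boolean {
  return issue.attempts.filter((a) => a.reverted).length >= 2;
}

// Render the structured brief handed to the /debug sub-agent.
function renderBugBrief(issue: QaIssue): string {
  const failed = issue.attempts.filter((a) => a.reverted);
  return [
    "Bug Brief",
    `Issue ID: ${issue.id}`,
    `Symptom: ${issue.symptom}`,
    "Reproduction:",
    ...issue.reproSteps.map((step, i) => `  ${i + 1}. ${step}`),
    "Failed fix attempts:",
    ...failed.map((a, i) => `  ${i + 1}. ${a.summary}`),
    `Files investigated: ${issue.files.join(", ")}`,
  ].join("\n");
}
```

Keeping the trigger separate from the rendering mirrors the sequential design in the commit: each issue is checked and briefed on its own, one investigation at a time.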
@@ -620,6 +620,74 @@ describe('debug skill structure', () => {
    'DEBUG REPORT', '3-strike', 'BLOCKED']) {
    test(`contains ${section}`, () => expect(content).toContain(section));
  }

  test('has browse setup for visual reproduction', () => {
    expect(content).toContain('Visual reproduction');
    expect(content).toContain('$B goto');
    expect(content).toContain('$B screenshot');
  });

  test('has visual verification in Phase 5', () => {
    expect(content).toContain('Visual verification');
    expect(content).toContain('debug-ISSUE-NNN-fixed');
  });
});

// --- Debug sub-agent escalation validation ---

describe('Debug sub-agent escalation', () => {
  test('qa/SKILL.md has Agent in allowed-tools', () => {
    const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
    // Check frontmatter allowed-tools section contains Agent
    const frontmatter = content.split('---')[1];
    expect(frontmatter).toContain('Agent');
  });

  test('qa/SKILL.md has Phase 8g Debug Escalation', () => {
    const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
    expect(content).toContain('8g. Debug Escalation');
    expect(content).toContain('reverted at least twice');
    expect(content).toContain('Bug Brief');
    expect(content).toContain('debug/SKILL.md');
  });

  test('qa/SKILL.md has structured handoff in debug prompt', () => {
    const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
    expect(content).toContain('Issue ID');
    expect(content).toContain('Symptom');
    expect(content).toContain('Reproduction');
    expect(content).toContain('Failed fix attempts');
    expect(content).toContain('Files investigated');
  });

  test('qa/SKILL.md has all four agent result handlers', () => {
    const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
    expect(content).toContain('DONE status');
    expect(content).toContain('DONE_WITH_CONCERNS');
    expect(content).toContain('git checkout .');
    expect(content).toContain('deferred (debug unavailable)');
  });

  test('qa/SKILL.md has debug escalation summary in Phase 10', () => {
    const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
    expect(content).toContain('DEBUG ESCALATION');
    expect(content).toContain('Issues escalated');
    expect(content).toContain('Per-issue details');
  });

  test('review/SKILL.md has Step 5.7 pre-existing bug detection', () => {
    const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
    expect(content).toContain('5.7: Pre-existing bug detection');
    expect(content).toContain('/debug');
    expect(content).toContain('pre-existing issue');
  });

  test('ship/SKILL.md has reverted QA commit detection', () => {
    const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
    expect(content).toContain('Reverted QA fix detection');
    expect(content).toContain('revert.*fix(qa)');
    expect(content).toContain('/debug');
  });
});

// --- Contributor mode preamble structure validation ---
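A note on the `allowed-tools` check in the hunk above: `content.split('---')[1]` takes whatever sits between the first two `---` occurrences, and `toContain('Agent')` then passes if the word appears anywhere in that text, for example in a description field. A stricter variant might look like the sketch below. It assumes the frontmatter is a leading `---` fenced block and that `allowed-tools` is written as a YAML block list, neither of which is confirmed by the diff; `extractFrontmatter` and `listAllowedTools` are hypothetical helpers, not part of the repository.

```typescript
// Return the text between a leading `---` line and the next `---` line, or null.
function extractFrontmatter(content: string): string | null {
  const match = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(content);
  return match ? match[1] : null;
}

// Collect the `- item` lines that directly follow an `allowed-tools:` key.
// Assumes a YAML block list; an inline list like `[Bash, Agent]` is not handled.
function listAllowedTools(frontmatter: string): string[] {
  const lines = frontmatter.split(/\r?\n/);
  const start = lines.findIndex((line) => /^allowed-tools:\s*$/.test(line));
  if (start === -1) return [];
  const tools: string[] = [];
  for (const line of lines.slice(start + 1)) {
    const item = /^\s*-\s+(\S.*?)\s*$/.exec(line);
    if (!item) break; // the list ends at the first non-item line
    tools.push(item[1]);
  }
  return tools;
}
```

A test built on these helpers could assert `listAllowedTools(frontmatter).includes('Agent')`, which fails when `Agent` only shows up in unrelated frontmatter text.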
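On the /ship check: `toContain('revert.*fix(qa)')` is a literal substring match, so the test only confirms that ship/SKILL.md mentions that pattern text. If the pattern is meant for plain `grep`, the parentheses are literal there; in a JavaScript `RegExp` they would form a capture group and need escaping. A minimal sketch of the same detection over commit subjects follows, where `findRevertedQaFixes` is a hypothetical helper and the subjects are assumed to come from something like `git log --format=%s` over the branch range.

```typescript
// Git's default revert subject is: Revert "<original subject>".
// Parentheses are escaped so that `fix(qa)` is matched literally.
const REVERTED_QA_FIX = /revert.*fix\(qa\)/i;

// Return the commit subjects that look like reverts of fix(qa): commits.
function findRevertedQaFixes(subjects: string[]): string[] {
  return subjects.filter((subject) => REVERTED_QA_FIX.test(subject));
}

// Example input: one reverted QA fix, the original fix, and an unrelated commit.
const exampleSubjects = [
  'Revert "fix(qa): ISSUE-003 guard null avatar"',
  "fix(qa): ISSUE-003 guard null avatar",
  "feat: add profile page",
];
const reverted = findRevertedQaFixes(exampleSubjects);
```

Since the commit describes this check as informational, a caller would presumably surface `reverted` as a /debug recommendation rather than fail the ship step.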