mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-07 22:16:52 +02:00
feat: /debug sub-agent escalation from /qa + recommendations in /review and /ship (v0.6.5.0) (#192)
* feat: add browse access to /debug for visual verification

  Debug skill can now use the browse binary to visually reproduce bugs, take screenshots as evidence, and verify fixes. This makes /debug effective for web app bugs when spawned as a sub-agent from /qa.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add /debug sub-agent escalation to /qa (Phase 8g)

  When QA fix attempts fail twice on the same bug (reverted due to regressions), /qa now spawns a /debug sub-agent with a structured bug brief including symptoms, repro steps, failed fix details, and file paths. Results are reported in Phase 10's debug escalation summary.

  Sequential execution: one debug investigation at a time, working tree cleaned between investigations. Graceful degradation on all failure modes (BLOCKED, agent failure → deferred in report).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add /debug recommendation to /review (Step 5.7)

  When /review finds what appears to be a pre-existing bug in the base branch (not introduced by the PR's diff), it now classifies it as INFORMATIONAL and recommends running /debug for systematic root-cause investigation. No Agent spawning; /review's scope stays on the diff.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add reverted QA commit detection to /ship

  During pre-landing review, /ship now checks for reverted fix(qa): commits in the branch history and recommends /debug for systematic investigation. Informational only; does not block shipping.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add debug escalation tests (validation + LLM judge + E2E)

  Skill validation: 11 new assertions covering Phase 8g trigger, structured handoff fields, agent result handlers, debug escalation summary, Step 5.7 recommendation, ship reverted QA detection, and debug browse setup.

  LLM judge: evaluates Phase 8g template quality: structured brief format, result handling, working tree cleanup, sequential processing.

  E2E: prompt-level deterministic test (verifies escalation prompt has all required fields) + full flow stub (fixture TODO for planted regression).

  Touchfile entries for diff-based test selection.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add worktree parallel debug agents to TODOS.md (P2)

  When /qa hits multiple stubborn bugs, parallel debug agents in isolated git worktrees could investigate simultaneously. Deferred from the sequential debug escalation PR as a follow-up.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.6.5.0)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add E2E evals for /review pre-existing bug + /ship reverted QA detection

  Two new E2E tests:
  - review-pre-existing-bug: plants SQL injection in base branch, verifies Step 5.7 classifies as INFORMATIONAL and recommends /debug
  - ship-reverted-qa-commits: creates branch with reverted fix(qa): commits, verifies /ship detects them and recommends /debug

  Also fixes qa-debug-prompt-logic to use correct workingDirectory, and ensures test repo init uses -b main for portability.

  All 4 debug-related evals pass: $0.34 total, 94s.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
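The /ship check described in the commit message (looking for reverted `fix(qa):` commits in the branch history) could be approximated as below. This is only a sketch under assumptions: the skill itself is prompt-driven and its actual detection steps are not shown on this page; the function name `find_reverted_qa_commits` is hypothetical, and the pattern assumes reverts were made with `git revert`, whose default subject line is `Revert "<original subject>"`.

```python
import re

# Assumption: reverts keep git's default subject, Revert "<original subject>".
# Matches fix(qa): and scoped variants such as fix(qa/debug):.
REVERT_RE = re.compile(r'^Revert "(fix\(qa[^)]*\): .*)"$')


def find_reverted_qa_commits(subjects):
    """Return the original subjects of reverted fix(qa) commits.

    `subjects` is a list of commit subject lines, for example the output
    of `git log --format=%s` on the branch, split into lines.
    """
    reverted = []
    for subject in subjects:
        match = REVERT_RE.match(subject.strip())
        if match:
            reverted.append(match.group(1))
    return reverted


# Hypothetical branch history, newest first.
subjects = [
    'Revert "fix(qa): ISSUE-003 - guard null user in navbar"',
    "fix(qa): ISSUE-003 - guard null user in navbar",
    "feat: add settings page",
]
print(find_reverted_qa_commits(subjects))
```

A non-empty result would correspond to the informational /debug recommendation; per the commit message it should not block shipping.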
@@ -17,6 +17,7 @@ allowed-tools:
- Grep
- AskUserQuestion
- WebSearch
- Agent
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
@@ -832,6 +833,41 @@ WTF-LIKELIHOOD:

**Hard cap: 50 fixes.** After 50 fixes, stop regardless of remaining issues.

### 8g. Debug Escalation

After the fix loop completes (all issues processed through 8a-8f), check for issues that were **reverted at least twice** (two or more separate fix attempts each caused regressions and were reverted).

If no issues match, skip this phase entirely.

For each matching issue, sequentially spawn a debug sub-agent using the Agent tool:

**Agent prompt — structured handoff:**

> Read `~/.claude/skills/gstack/debug/SKILL.md` (fallback: `.claude/skills/debug/SKILL.md`) and follow the systematic debugging methodology.
>
> **Bug Brief:**
> - Issue ID: ISSUE-NNN
> - Symptom: [what was observed during QA testing]
> - Severity: [critical/high/medium/low]
> - Affected URL: [page URL where bug was found]
> - Reproduction: [step-by-step repro from QA testing]
> - Screenshot evidence: [paths to before/after screenshots from QA]
> - Failed fix attempts:
>   1. [what was changed] → [what regression it caused] (reverted)
>   2. [what was changed] → [what regression it caused] (reverted)
> - Files investigated: [list of source files already examined during fix attempts]
> - Console errors: [relevant JS errors if any]
>
> Investigate this bug. You have full access to AskUserQuestion if you need user input. Output the DEBUG REPORT when done.

**When the debug agent returns:**

- **DONE status:** The fix is in the working tree. Re-test the affected page via browse. If verified, commit with `fix(qa/debug): ISSUE-NNN — [description]`. If regression detected, revert and mark as "deferred (debug-investigated, fix regressed)."
- **DONE_WITH_CONCERNS:** Apply the fix, note concerns in report. Re-test.
- **BLOCKED:** Run `git checkout .` to discard any uncommitted debug artifacts (temp logs, assertions). Mark as "deferred (debug-investigated, root cause unclear)." Include the debug report in Phase 10 output.
- **Agent call fails:** Mark as "deferred (debug unavailable)." Continue to Phase 9.

**Sequencing:** Process one debug issue at a time. Wait for each agent to complete before spawning the next. The working tree must be clean between debug investigations.

---

## Phase 9: Final QA
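The Phase 8g rules in the hunk above (escalate only issues reverted at least twice, hand off a structured brief, then map each agent status to an outcome) might be modelled roughly as follows. This is an illustrative sketch, not the skill's implementation: the skill is a prompt, and `Issue`, `needs_escalation`, `build_bug_brief`, and `outcome_for` are hypothetical names. Treating a failed re-test after DONE_WITH_CONCERNS the same as after DONE is an assumption, since the text only says to apply the fix, note concerns, and re-test.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    issue_id: str
    symptom: str
    # One (change, regression) pair per fix attempt that was reverted.
    reverted_attempts: list = field(default_factory=list)


def needs_escalation(issue):
    """Phase 8g trigger: the issue was reverted at least twice."""
    return len(issue.reverted_attempts) >= 2


def build_bug_brief(issue):
    """Render a subset of the Bug Brief fields as plain text."""
    lines = [
        "Bug Brief:",
        f"- Issue ID: {issue.issue_id}",
        f"- Symptom: {issue.symptom}",
        "- Failed fix attempts:",
    ]
    for n, (change, regression) in enumerate(issue.reverted_attempts, start=1):
        lines.append(f"  {n}. {change} → {regression} (reverted)")
    return "\n".join(lines)


def outcome_for(status, retest_passed=False):
    """Map a debug agent status to a report label; None means the call failed."""
    if status == "DONE":
        return "fixed" if retest_passed else "deferred (debug-investigated, fix regressed)"
    if status == "DONE_WITH_CONCERNS":
        # Assumption: a failed re-test is handled like the DONE case.
        return "fixed (with concerns)" if retest_passed else "deferred (debug-investigated, fix regressed)"
    if status == "BLOCKED":
        return "deferred (debug-investigated, root cause unclear)"
    return "deferred (debug unavailable)"


# Hypothetical QA results.
issues = [
    Issue("ISSUE-001", "Save button does nothing",
          [("debounce click handler", "double submit"),
           ("disable button on click", "form never re-enables")]),
    Issue("ISSUE-002", "Typo in footer"),
]
escalated = [i.issue_id for i in issues if needs_escalation(i)]
print(escalated)
print(build_bug_brief(issues[0]))
print(outcome_for("BLOCKED"))
```

Iterating over the escalated issues in a plain loop, one at a time, matches the sequencing rule; the working-tree cleanup between investigations is left out of the sketch.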
@@ -872,6 +908,20 @@ Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md`
**PR Summary:** Include a one-line summary suitable for PR descriptions:
> "QA found N issues, fixed M, health score X → Y."

**Debug Escalation Summary** (include only if any issues were escalated in Phase 8g):
```
DEBUG ESCALATION
Issues escalated: N
Resolved (DONE): X (commit SHAs)
Concerns: Y
Blocked: Z (root cause unclear)
Unavailable: W (agent call failed)

Per-issue details:
ISSUE-NNN: [symptom] → [root cause found / BLOCKED]
Debug report: [inline summary or full report]
```

---

## Phase 11: TODOS.md Update
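To make the Debug Escalation Summary template in the hunk above concrete, here is one way its counts could be filled in from per-issue results. It is a sketch under assumptions: `render_debug_summary` and the result dictionary keys are hypothetical, a status of `None` stands for a failed agent call, and the commit SHAs and per-issue debug report lines from the template are omitted for brevity.

```python
from collections import Counter


def render_debug_summary(results):
    """Build the summary text; return "" when nothing was escalated.

    Each result is a dict with issue_id, symptom, status, and finding.
    status is "DONE", "DONE_WITH_CONCERNS", "BLOCKED", or None.
    """
    if not results:
        return ""  # The summary is included only if issues were escalated.
    counts = Counter(r["status"] for r in results)
    lines = [
        "DEBUG ESCALATION",
        f"Issues escalated: {len(results)}",
        f"Resolved (DONE): {counts['DONE']}",
        f"Concerns: {counts['DONE_WITH_CONCERNS']}",
        f"Blocked: {counts['BLOCKED']} (root cause unclear)",
        f"Unavailable: {counts[None]} (agent call failed)",
        "",
        "Per-issue details:",
    ]
    for r in results:
        lines.append(f"{r['issue_id']}: {r['symptom']} → {r['finding']}")
    return "\n".join(lines)


# Hypothetical escalation results.
results = [
    {"issue_id": "ISSUE-001", "symptom": "Save button does nothing",
     "status": "DONE", "finding": "stale closure in submit handler"},
    {"issue_id": "ISSUE-004", "symptom": "Chart flickers on resize",
     "status": "BLOCKED", "finding": "BLOCKED"},
]
print(render_debug_summary(results))
```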
@@ -17,6 +17,7 @@ allowed-tools:
- Grep
- AskUserQuestion
- WebSearch
- Agent
---

{{PREAMBLE}}
@@ -245,6 +246,41 @@ WTF-LIKELIHOOD:

**Hard cap: 50 fixes.** After 50 fixes, stop regardless of remaining issues.

### 8g. Debug Escalation

After the fix loop completes (all issues processed through 8a-8f), check for issues that were **reverted at least twice** (two or more separate fix attempts each caused regressions and were reverted).

If no issues match, skip this phase entirely.

For each matching issue, sequentially spawn a debug sub-agent using the Agent tool:

**Agent prompt — structured handoff:**

> Read `~/.claude/skills/gstack/debug/SKILL.md` (fallback: `.claude/skills/debug/SKILL.md`) and follow the systematic debugging methodology.
>
> **Bug Brief:**
> - Issue ID: ISSUE-NNN
> - Symptom: [what was observed during QA testing]
> - Severity: [critical/high/medium/low]
> - Affected URL: [page URL where bug was found]
> - Reproduction: [step-by-step repro from QA testing]
> - Screenshot evidence: [paths to before/after screenshots from QA]
> - Failed fix attempts:
>   1. [what was changed] → [what regression it caused] (reverted)
>   2. [what was changed] → [what regression it caused] (reverted)
> - Files investigated: [list of source files already examined during fix attempts]
> - Console errors: [relevant JS errors if any]
>
> Investigate this bug. You have full access to AskUserQuestion if you need user input. Output the DEBUG REPORT when done.

**When the debug agent returns:**

- **DONE status:** The fix is in the working tree. Re-test the affected page via browse. If verified, commit with `fix(qa/debug): ISSUE-NNN — [description]`. If regression detected, revert and mark as "deferred (debug-investigated, fix regressed)."
- **DONE_WITH_CONCERNS:** Apply the fix, note concerns in report. Re-test.
- **BLOCKED:** Run `git checkout .` to discard any uncommitted debug artifacts (temp logs, assertions). Mark as "deferred (debug-investigated, root cause unclear)." Include the debug report in Phase 10 output.
- **Agent call fails:** Mark as "deferred (debug unavailable)." Continue to Phase 9.

**Sequencing:** Process one debug issue at a time. Wait for each agent to complete before spawning the next. The working tree must be clean between debug investigations.

---

## Phase 9: Final QA
@@ -285,6 +321,20 @@ Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md`
**PR Summary:** Include a one-line summary suitable for PR descriptions:
> "QA found N issues, fixed M, health score X → Y."

**Debug Escalation Summary** (include only if any issues were escalated in Phase 8g):
```
DEBUG ESCALATION
Issues escalated: N
Resolved (DONE): X (commit SHAs)
Concerns: Y
Blocked: Z (root cause unclear)
Unavailable: W (agent call failed)

Per-issue details:
ISSUE-NNN: [symptom] → [root cause found / BLOCKED]
Debug report: [inline summary or full report]
```

---

## Phase 11: TODOS.md Update