mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-06 13:45:35 +02:00
feat: /debug sub-agent escalation from /qa + recommendations in /review and /ship (v0.6.5.0) (#192)
* feat: add browse access to /debug for visual verification Debug skill can now use the browse binary to visually reproduce bugs, take screenshots as evidence, and verify fixes. This makes /debug effective for web app bugs when spawned as a sub-agent from /qa. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add /debug sub-agent escalation to /qa (Phase 8g) When QA fix attempts fail twice on the same bug (reverted due to regressions), /qa now spawns a /debug sub-agent with a structured bug brief including symptoms, repro steps, failed fix details, and file paths. Results are reported in Phase 10's debug escalation summary. Sequential execution: one debug investigation at a time, working tree cleaned between investigations. Graceful degradation on all failure modes (BLOCKED, agent failure → deferred in report). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add /debug recommendation to /review (Step 5.7) When /review finds what appears to be a pre-existing bug in the base branch (not introduced by the PR's diff), it now classifies it as INFORMATIONAL and recommends running /debug for systematic root-cause investigation. No Agent spawning — /review's scope stays on the diff. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add reverted QA commit detection to /ship During pre-landing review, /ship now checks for reverted fix(qa): commits in the branch history and recommends /debug for systematic investigation. Informational only — does not block shipping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add debug escalation tests (validation + LLM judge + E2E) Skill validation: 11 new assertions covering Phase 8g trigger, structured handoff fields, agent result handlers, debug escalation summary, Step 5.7 recommendation, ship reverted QA detection, and debug browse setup. LLM judge: evaluates Phase 8g template quality — structured brief format, result handling, working tree cleanup, sequential processing. E2E: prompt-level deterministic test (verifies escalation prompt has all required fields) + full flow stub (fixture TODO for planted regression). Touchfile entries for diff-based test selection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: add worktree parallel debug agents to TODOS.md (P2) When /qa hits multiple stubborn bugs, parallel debug agents in isolated git worktrees could investigate simultaneously. Deferred from the sequential debug escalation PR as a follow-up. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.6.5.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add E2E evals for /review pre-existing bug + /ship reverted QA detection Two new E2E tests: - review-pre-existing-bug: plants SQL injection in base branch, verifies Step 5.7 classifies as INFORMATIONAL and recommends /debug - ship-reverted-qa-commits: creates branch with reverted fix(qa): commits, verifies /ship detects them and recommends /debug Also fixes qa-debug-prompt-logic to use correct workingDirectory, and ensures test repo init uses -b main for portability. All 4 debug-related evals pass: $0.34 total, 94s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -145,6 +145,25 @@ ATTEMPTED: [what you tried]
|
||||
RECOMMENDATION: [what the user should do next]
|
||||
```
|
||||
|
||||
## SETUP (run this check BEFORE any browse command)
|
||||
|
||||
```bash
|
||||
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
|
||||
B=""
|
||||
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
|
||||
[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
|
||||
if [ -x "$B" ]; then
|
||||
echo "READY: $B"
|
||||
else
|
||||
echo "NEEDS_SETUP"
|
||||
fi
|
||||
```
|
||||
|
||||
If `NEEDS_SETUP`:
|
||||
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
|
||||
2. Run: `cd <SKILL_DIR> && ./setup`
|
||||
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`
|
||||
|
||||
# Systematic Debugging
|
||||
|
||||
## Iron Law
|
||||
@@ -171,6 +190,14 @@ Gather context before forming any hypothesis.
|
||||
|
||||
4. **Reproduce:** Can you trigger the bug deterministically? If not, gather more evidence before proceeding.
|
||||
|
||||
5. **Visual reproduction (if browser-testable bug):** Use the browse binary to reproduce the bug:
|
||||
```bash
|
||||
$B goto <affected-url>
|
||||
$B screenshot screenshots/debug-ISSUE-NNN-repro.png
|
||||
$B console --errors
|
||||
```
|
||||
Compare what you see against the reported symptom. If the bug is not reproducible visually, that's useful evidence too — the root cause may be in data flow, not rendering.
|
||||
|
||||
Output: **"Root cause hypothesis: ..."** — a specific, testable claim about what is wrong and why.
|
||||
|
||||
---
|
||||
@@ -249,6 +276,14 @@ Once root cause is confirmed:
|
||||
|
||||
Run the test suite and paste the output.
|
||||
|
||||
**Visual verification (if browser-testable bug):**
|
||||
```bash
|
||||
$B goto <affected-url>
|
||||
$B screenshot screenshots/debug-ISSUE-NNN-fixed.png
|
||||
$B console --errors
|
||||
```
|
||||
Compare against the original reproduction screenshot. The bug should no longer be present.
|
||||
|
||||
Output a structured debug report:
|
||||
```
|
||||
DEBUG REPORT
|
||||
|
||||
@@ -16,6 +16,8 @@ allowed-tools:
|
||||
|
||||
{{PREAMBLE}}
|
||||
|
||||
{{BROWSE_SETUP}}
|
||||
|
||||
# Systematic Debugging
|
||||
|
||||
## Iron Law
|
||||
@@ -42,6 +44,14 @@ Gather context before forming any hypothesis.
|
||||
|
||||
4. **Reproduce:** Can you trigger the bug deterministically? If not, gather more evidence before proceeding.
|
||||
|
||||
5. **Visual reproduction (if browser-testable bug):** Use the browse binary to reproduce the bug:
|
||||
```bash
|
||||
$B goto <affected-url>
|
||||
$B screenshot screenshots/debug-ISSUE-NNN-repro.png
|
||||
$B console --errors
|
||||
```
|
||||
Compare what you see against the reported symptom. If the bug is not reproducible visually, that's useful evidence too — the root cause may be in data flow, not rendering.
|
||||
|
||||
Output: **"Root cause hypothesis: ..."** — a specific, testable claim about what is wrong and why.
|
||||
|
||||
---
|
||||
@@ -120,6 +130,14 @@ Once root cause is confirmed:
|
||||
|
||||
Run the test suite and paste the output.
|
||||
|
||||
**Visual verification (if browser-testable bug):**
|
||||
```bash
|
||||
$B goto <affected-url>
|
||||
$B screenshot screenshots/debug-ISSUE-NNN-fixed.png
|
||||
$B console --errors
|
||||
```
|
||||
Compare against the original reproduction screenshot. The bug should no longer be present.
|
||||
|
||||
Output a structured debug report:
|
||||
```
|
||||
DEBUG REPORT
|
||||
|
||||
Reference in New Issue
Block a user