mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-05 05:05:08 +02:00
94c1530efc
* feat: add browse access to /debug for visual verification Debug skill can now use the browse binary to visually reproduce bugs, take screenshots as evidence, and verify fixes. This makes /debug effective for web app bugs when spawned as a sub-agent from /qa. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add /debug sub-agent escalation to /qa (Phase 8g) When QA fix attempts fail twice on the same bug (reverted due to regressions), /qa now spawns a /debug sub-agent with a structured bug brief including symptoms, repro steps, failed fix details, and file paths. Results are reported in Phase 10's debug escalation summary. Sequential execution: one debug investigation at a time, working tree cleaned between investigations. Graceful degradation on all failure modes (BLOCKED, agent failure → deferred in report). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add /debug recommendation to /review (Step 5.7) When /review finds what appears to be a pre-existing bug in the base branch (not introduced by the PR's diff), it now classifies it as INFORMATIONAL and recommends running /debug for systematic root-cause investigation. No Agent spawning — /review's scope stays on the diff. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add reverted QA commit detection to /ship During pre-landing review, /ship now checks for reverted fix(qa): commits in the branch history and recommends /debug for systematic investigation. Informational only — does not block shipping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add debug escalation tests (validation + LLM judge + E2E) Skill validation: 11 new assertions covering Phase 8g trigger, structured handoff fields, agent result handlers, debug escalation summary, Step 5.7 recommendation, ship reverted QA detection, and debug browse setup. LLM judge: evaluates Phase 8g template quality — structured brief format, result handling, working tree cleanup, sequential processing. E2E: prompt-level deterministic test (verifies escalation prompt has all required fields) + full flow stub (fixture TODO for planted regression). Touchfile entries for diff-based test selection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: add worktree parallel debug agents to TODOS.md (P2) When /qa hits multiple stubborn bugs, parallel debug agents in isolated git worktrees could investigate simultaneously. Deferred from the sequential debug escalation PR as a follow-up. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.6.5.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add E2E evals for /review pre-existing bug + /ship reverted QA detection Two new E2E tests: - review-pre-existing-bug: plants SQL injection in base branch, verifies Step 5.7 classifies as INFORMATIONAL and recommends /debug - ship-reverted-qa-commits: creates branch with reverted fix(qa): commits, verifies /ship detects them and recommends /debug Also fixes qa-debug-prompt-logic to use correct workingDirectory, and ensures test repo init uses -b main for portability. All 4 debug-related evals pass: $0.34 total, 94s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
167 lines
6.2 KiB
Cheetah
167 lines
6.2 KiB
Cheetah
---
|
|
name: debug
|
|
version: 1.0.0
|
|
description: |
|
|
Systematic debugging with root cause investigation. Four phases: investigate,
|
|
analyze, hypothesize, implement. Iron Law: no fixes without root cause.
|
|
allowed-tools:
|
|
- Bash
|
|
- Read
|
|
- Write
|
|
- Edit
|
|
- Grep
|
|
- Glob
|
|
- AskUserQuestion
|
|
---
|
|
|
|
{{PREAMBLE}}
|
|
|
|
{{BROWSE_SETUP}}
|
|
|
|
# Systematic Debugging
|
|
|
|
## Iron Law
|
|
|
|
**NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.**
|
|
|
|
Fixing symptoms creates whack-a-mole debugging. Every fix that doesn't address root cause makes the next bug harder to find. Find the root cause, then fix it.
|
|
|
|
---
|
|
|
|
## Phase 1: Root Cause Investigation
|
|
|
|
Gather context before forming any hypothesis.
|
|
|
|
1. **Collect symptoms:** Read the error messages, stack traces, and reproduction steps. If the user hasn't provided enough context, ask ONE question at a time via AskUserQuestion.
|
|
|
|
2. **Read the code:** Trace the code path from the symptom back to potential causes. Use Grep to find all references, Read to understand the logic.
|
|
|
|
3. **Check recent changes:**
|
|
```bash
|
|
git log --oneline -20 -- <affected-files>
|
|
```
|
|
Was this working before? What changed? A regression means the root cause is in the diff.
|
|
|
|
4. **Reproduce:** Can you trigger the bug deterministically? If not, gather more evidence before proceeding.
|
|
|
|
5. **Visual reproduction (if browser-testable bug):** Use the browse binary to reproduce the bug:
|
|
```bash
|
|
$B goto <affected-url>
|
|
$B screenshot screenshots/debug-ISSUE-NNN-repro.png
|
|
$B console --errors
|
|
```
|
|
Compare what you see against the reported symptom. If the bug is not reproducible visually, that's useful evidence too — the root cause may be in data flow, not rendering.
|
|
|
|
Output: **"Root cause hypothesis: ..."** — a specific, testable claim about what is wrong and why.
|
|
|
|
---
|
|
|
|
## Phase 2: Pattern Analysis
|
|
|
|
Check if this bug matches a known pattern:
|
|
|
|
| Pattern | Signature | Where to look |
|
|
|---------|-----------|---------------|
|
|
| Race condition | Intermittent, timing-dependent | Concurrent access to shared state |
|
|
| Nil/null propagation | NoMethodError, TypeError | Missing guards on optional values |
|
|
| State corruption | Inconsistent data, partial updates | Transactions, callbacks, hooks |
|
|
| Integration failure | Timeout, unexpected response | External API calls, service boundaries |
|
|
| Configuration drift | Works locally, fails in staging/prod | Env vars, feature flags, DB state |
|
|
| Stale cache | Shows old data, fixes on cache clear | Redis, CDN, browser cache, Turbo |
|
|
|
|
Also check:
|
|
- `TODOS.md` for related known issues
|
|
- `git log` for prior fixes in the same area — **recurring bugs in the same files are an architectural smell**, not a coincidence
|
|
|
|
---
|
|
|
|
## Phase 3: Hypothesis Testing
|
|
|
|
Before writing ANY fix, verify your hypothesis.
|
|
|
|
1. **Confirm the hypothesis:** Add a temporary log statement, assertion, or debug output at the suspected root cause. Run the reproduction. Does the evidence match?
|
|
|
|
2. **If the hypothesis is wrong:** Return to Phase 1. Gather more evidence. Do not guess.
|
|
|
|
3. **3-strike rule:** If 3 hypotheses fail, **STOP**. Use AskUserQuestion:
|
|
```
|
|
3 hypotheses tested, none match. This may be an architectural issue
|
|
rather than a simple bug.
|
|
|
|
A) Continue investigating — I have a new hypothesis: [describe]
|
|
B) Escalate for human review — this needs someone who knows the system
|
|
C) Add logging and wait — instrument the area and catch it next time
|
|
```
|
|
|
|
**Red flags** — if you see any of these, slow down:
|
|
- "Quick fix for now" — there is no "for now." Fix it right or escalate.
|
|
- Proposing a fix before tracing data flow — you're guessing.
|
|
- Each fix reveals a new problem elsewhere — wrong layer, not wrong code.
|
|
|
|
---
|
|
|
|
## Phase 4: Implementation
|
|
|
|
Once root cause is confirmed:
|
|
|
|
1. **Fix the root cause, not the symptom.** The smallest change that eliminates the actual problem.
|
|
|
|
2. **Minimal diff:** Fewest files touched, fewest lines changed. Resist the urge to refactor adjacent code.
|
|
|
|
3. **Write a regression test** that:
|
|
- **Fails** without the fix (proves the test is meaningful)
|
|
- **Passes** with the fix (proves the fix works)
|
|
|
|
4. **Run the full test suite.** Paste the output. No regressions allowed.
|
|
|
|
5. **If the fix touches >5 files:** Use AskUserQuestion to flag the blast radius:
|
|
```
|
|
This fix touches N files. That's a large blast radius for a bug fix.
|
|
A) Proceed — the root cause genuinely spans these files
|
|
B) Split — fix the critical path now, defer the rest
|
|
C) Rethink — maybe there's a more targeted approach
|
|
```
|
|
|
|
---
|
|
|
|
## Phase 5: Verification & Report
|
|
|
|
**Fresh verification:** Reproduce the original bug scenario and confirm it's fixed. This is not optional.
|
|
|
|
Run the test suite and paste the output.
|
|
|
|
**Visual verification (if browser-testable bug):**
|
|
```bash
|
|
$B goto <affected-url>
|
|
$B screenshot screenshots/debug-ISSUE-NNN-fixed.png
|
|
$B console --errors
|
|
```
|
|
Compare against the original reproduction screenshot. The bug should no longer be present.
|
|
|
|
Output a structured debug report:
|
|
```
|
|
DEBUG REPORT
|
|
════════════════════════════════════════
|
|
Symptom: [what the user observed]
|
|
Root cause: [what was actually wrong]
|
|
Fix: [what was changed, with file:line references]
|
|
Evidence: [test output, reproduction attempt showing fix works]
|
|
Regression test: [file:line of the new test]
|
|
Related: [TODOS.md items, prior bugs in same area, architectural notes]
|
|
Status: DONE | DONE_WITH_CONCERNS | BLOCKED
|
|
════════════════════════════════════════
|
|
```
|
|
|
|
---
|
|
|
|
## Important Rules
|
|
|
|
- **3+ failed fix attempts → STOP and question the architecture.** Wrong architecture, not failed hypothesis.
|
|
- **Never apply a fix you cannot verify.** If you can't reproduce and confirm, don't ship it.
|
|
- **Never say "this should fix it."** Verify and prove it. Run the tests.
|
|
- **If fix touches >5 files → AskUserQuestion** about blast radius before proceeding.
|
|
- **Completion status:**
|
|
- DONE — root cause found, fix applied, regression test written, all tests pass
|
|
- DONE_WITH_CONCERNS — fixed but cannot fully verify (e.g., intermittent bug, requires staging)
|
|
- BLOCKED — root cause unclear after investigation, escalated
|