Files
gstack/VERSION
T
Garry Tan 94c1530efc feat: /debug sub-agent escalation from /qa + recommendations in /review and /ship (v0.6.5.0) (#192)
* feat: add browse access to /debug for visual verification

Debug skill can now use the browse binary to visually reproduce bugs,
take screenshots as evidence, and verify fixes. This makes /debug
effective for web app bugs when spawned as a sub-agent from /qa.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add /debug sub-agent escalation to /qa (Phase 8g)

When QA fix attempts fail twice on the same bug (reverted due to
regressions), /qa now spawns a /debug sub-agent with a structured
bug brief including symptoms, repro steps, failed fix details, and
file paths. Results are reported in Phase 10's debug escalation summary.

Sequential execution: one debug investigation at a time, working tree
cleaned between investigations. Graceful degradation on all failure
modes (BLOCKED, agent failure → deferred in report).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add /debug recommendation to /review (Step 5.7)

When /review finds what appears to be a pre-existing bug in the base
branch (not introduced by the PR's diff), it now classifies it as
INFORMATIONAL and recommends running /debug for systematic root-cause
investigation. No Agent spawning — /review's scope stays on the diff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add reverted QA commit detection to /ship

During pre-landing review, /ship now checks for reverted fix(qa):
commits in the branch history and recommends /debug for systematic
investigation. Informational only — does not block shipping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add debug escalation tests (validation + LLM judge + E2E)

Skill validation: 11 new assertions covering Phase 8g trigger, structured
handoff fields, agent result handlers, debug escalation summary, Step 5.7
recommendation, ship reverted QA detection, and debug browse setup.

LLM judge: evaluates Phase 8g template quality — structured brief format,
result handling, working tree cleanup, sequential processing.

E2E: prompt-level deterministic test (verifies escalation prompt has all
required fields) + full flow stub (fixture TODO for planted regression).

Touchfile entries for diff-based test selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add worktree parallel debug agents to TODOS.md (P2)

When /qa hits multiple stubborn bugs, parallel debug agents in
isolated git worktrees could investigate simultaneously. Deferred
from the sequential debug escalation PR as a follow-up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.6.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add E2E evals for /review pre-existing bug + /ship reverted QA detection

Two new E2E tests:
- review-pre-existing-bug: plants SQL injection in base branch, verifies
  Step 5.7 classifies as INFORMATIONAL and recommends /debug
- ship-reverted-qa-commits: creates branch with reverted fix(qa): commits,
  verifies /ship detects them and recommends /debug

Also fixes qa-debug-prompt-logic to use correct workingDirectory, and
ensures test repo init uses -b main for portability.

All 4 debug-related evals pass: $0.34 total, 94s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 17:59:32 -05:00

2 lines
8 B
Plaintext