gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-07-12 02:36:34 +02:00

Author	SHA1	Message	Date
Garry Tan	94c1530efc	feat: /debug sub-agent escalation from /qa + recommendations in /review and /ship (v0.6.5.0) (#192 ) * feat: add browse access to /debug for visual verification Debug skill can now use the browse binary to visually reproduce bugs, take screenshots as evidence, and verify fixes. This makes /debug effective for web app bugs when spawned as a sub-agent from /qa. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add /debug sub-agent escalation to /qa (Phase 8g) When QA fix attempts fail twice on the same bug (reverted due to regressions), /qa now spawns a /debug sub-agent with a structured bug brief including symptoms, repro steps, failed fix details, and file paths. Results are reported in Phase 10's debug escalation summary. Sequential execution: one debug investigation at a time, working tree cleaned between investigations. Graceful degradation on all failure modes (BLOCKED, agent failure → deferred in report). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add /debug recommendation to /review (Step 5.7) When /review finds what appears to be a pre-existing bug in the base branch (not introduced by the PR's diff), it now classifies it as INFORMATIONAL and recommends running /debug for systematic root-cause investigation. No Agent spawning — /review's scope stays on the diff. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add reverted QA commit detection to /ship During pre-landing review, /ship now checks for reverted fix(qa): commits in the branch history and recommends /debug for systematic investigation. Informational only — does not block shipping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add debug escalation tests (validation + LLM judge + E2E) Skill validation: 11 new assertions covering Phase 8g trigger, structured handoff fields, agent result handlers, debug escalation summary, Step 5.7 recommendation, ship reverted QA detection, and debug browse setup. LLM judge: evaluates Phase 8g template quality — structured brief format, result handling, working tree cleanup, sequential processing. E2E: prompt-level deterministic test (verifies escalation prompt has all required fields) + full flow stub (fixture TODO for planted regression). Touchfile entries for diff-based test selection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: add worktree parallel debug agents to TODOS.md (P2) When /qa hits multiple stubborn bugs, parallel debug agents in isolated git worktrees could investigate simultaneously. Deferred from the sequential debug escalation PR as a follow-up. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.6.5.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add E2E evals for /review pre-existing bug + /ship reverted QA detection Two new E2E tests: - review-pre-existing-bug: plants SQL injection in base branch, verifies Step 5.7 classifies as INFORMATIONAL and recommends /debug - ship-reverted-qa-commits: creates branch with reverted fix(qa): commits, verifies /ship detects them and recommends /debug Also fixes qa-debug-prompt-logic to use correct workingDirectory, and ensures test repo init uses -b main for portability. All 4 debug-related evals pass: $0.34 total, 94s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 17:59:32 -05:00
Garry Tan	b071df3579	fix: resolve merge conflicts with origin/main (v0.6.0 + v0.6.0.1 + v0.6.1) Merge main's test bootstrap, boil-the-lake completeness principle, selective expansion, ship gate overrides, and gstack-upgrade vendor sync. Conflicts resolved: - CHANGELOG: keep main's 0.6.1/0.6.0.1/0.6.0/0.5.4/0.5.3 entries - VERSION: take main's 0.6.1 - design-consultation: office-hours naming + main's "what's out there" phrasing - ship: keep both verification rules (fresh evidence + coverage tests) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 14:37:22 -07:00
Garry Tan	9a55efe7c7	fix: resolve merge conflicts with origin/main (v0.4.3 + v0.4.4) Merge origin/main which added: - v0.4.3: /document-release skill, simplified AskUserQuestion (always ELI16), runtime branch detection via _BRANCH preamble variable - v0.4.4: 60-min update check TTL, --force flag for gstack-update-check Resolved conflicts: - VERSION: keep 0.6.0 (our version, above 0.4.4) - CHANGELOG: all entries preserved (0.6.0 > 0.4.4 > 0.4.3 > 0.4.2) - TODOS.md: keep both sections (Brainstorm/Design + Document-Release) - test/skill-validation.test.ts: include all three new skills in arrays (brainstorm, debug, document-release) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-16 17:44:33 -07:00
Garry Tan	f5b981fcc0	fix: resolve merge conflicts with origin/main (v0.4.2 base branch detection) Merge origin/main which added: - BASE_BRANCH_DETECT placeholder + dynamic branch detection in all skills - Updated contributor mode (reflection-based, 0-10 rating) - Async await wrapping in browse js/eval commands - Hardcoded-main regression test Resolved conflicts: - VERSION: keep 0.6.0 (our version, above 0.4.2) - CHANGELOG: both entries preserved (0.6.0 above 0.4.2) - gen-skill-docs.ts: keep main's updated contributor mode, add our escalation protocol - review/SKILL.md.tmpl: fix hardcoded 'origin/main' in Step 1.5 to use origin/<base> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-16 11:35:47 -05:00
Garry Tan	4c0a3fe13b	feat: new /brainstorm and /debug skills /brainstorm: Socratic design exploration before planning. Context gathering, clarifying questions (smart-skip), related design discovery (keyword grep), premise challenge, forced alternatives, design doc artifact with lineage tracking (Supersedes: field). Writes to ~/.gstack/projects/$SLUG/. /debug: Systematic root-cause debugging. Iron Law: no fixes without root cause investigation. Pattern analysis, hypothesis testing with 3-strike escalation, structured DEBUG REPORT output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-16 10:10:41 -05:00

5 Commits