feat: add diff-aware mode to /qa — auto-tests affected pages from branch diff

When on a feature branch, /qa now reads git diff main, identifies affected
pages/routes from changed files, and tests them automatically. No URL required.
The most natural flow: write code, /ship, /qa.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-13 20:03:04 -05:00
parent 675aaa6651
commit 22c24b9092
2 changed files with 81 additions and 16 deletions
+30 -10
View File
@@ -22,7 +22,7 @@ Eight opinionated workflow skills for [Claude Code](https://docs.anthropic.com/e
| `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Triages Greptile review comments. |
| `/ship` | Release engineer | Sync main, run tests, resolve Greptile reviews, push, open PR. For a ready branch, not for deciding what to build. |
| `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. |
| `/qa` | QA lead | Systematic QA testing with structured reports, health scores, screenshots, and regression tracking. Three modes: full, quick, regression. |
| `/qa` | QA lead | Systematic QA testing. On a feature branch, auto-analyzes your diff, identifies affected pages, and tests them. Also: full exploration, quick smoke test, regression mode. |
| `/setup-browser-cookies` | Session manager | Import cookies from your real browser (Comet, Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages without logging in manually. |
| `/retro` | Engineering manager | Team-aware retro: your deep-dive + per-person praise and growth opportunities for every contributor. |
@@ -63,6 +63,12 @@ You: /ship
Claude: [Syncs main, runs tests, pushes branch, opens PR — 6 tool calls, done]
You: /qa
Claude: Analyzing branch diff... 8 files changed, 3 routes affected.
[Tests /listings/new, /listings/:id, /api/listings against localhost:3000]
All 3 routes working. Upload + enrichment flow passes end to end.
You: /setup-browser-cookies staging.myapp.com
Claude: Imported 8 cookies for staging.myapp.com from Chrome.
@@ -71,12 +77,6 @@ You: /qa https://staging.myapp.com --quick
Claude: [Smoke test: homepage + 5 pages, 30 seconds]
Health Score: 91/100. No critical issues. 1 medium: mobile nav overlap.
You: /browse staging.myapp.com/listings/new — test the upload flow specifically
Claude: [22 tool calls — navigates routes, fills the upload form, verifies
enrichment renders, checks console for errors, screenshots each step]
Listing flow works end to end on staging.
```
## Who this is for
@@ -471,11 +471,31 @@ This is my **QA lead mode**.
`/browse` gives the agent eyes. `/qa` gives it a testing methodology.
Where `/browse` is a single command — go here, click this, screenshot that — `/qa` is a full systematic test pass. It explores every reachable page, fills forms, clicks buttons, checks console errors, tests responsive layouts, and produces a structured report with a health score, screenshots as evidence, and ranked issues with repro steps.
The most common use case: you're on a feature branch, you just finished coding, and you want to verify everything works. Just say `/qa` — it reads your git diff, identifies which pages and routes your changes affect, spins up the browser, and tests each one. No URL required. No manual test plan. It figures out what to test from the code you changed.
Three modes:
```
You: /qa
- **Full** (default) — systematic exploration of the entire app. 5-15 minutes depending on app size. Documents 5-10 well-evidenced issues.
Claude: Analyzing branch diff against main...
12 files changed: 3 controllers, 2 views, 4 services, 3 tests
Affected routes: /listings/new, /listings/:id, /api/listings
Detected app running on localhost:3000.
[Tests each affected page — navigates, fills forms, clicks buttons,
screenshots, checks console errors]
QA Report: 3 routes tested, all working.
- /listings/new: upload + enrichment flow works end to end
- /listings/:id: detail page renders correctly
- /api/listings: returns 200 with expected shape
No console errors. No regressions on adjacent pages.
```
Four modes:
- **Diff-aware** (automatic on feature branches) — reads `git diff main`, identifies affected pages, tests them specifically. The fastest path from "I just wrote code" to "it works."
- **Full** — systematic exploration of the entire app. 5-15 minutes depending on app size. Documents 5-10 well-evidenced issues.
- **Quick** (`--quick`) — 30-second smoke test. Homepage + top 5 nav targets. Loads? Console errors? Broken links?
- **Regression** (`--regression baseline.json`) — run full mode, then diff against a previous baseline. Which issues are fixed? Which are new? What's the score delta?
+51 -6
View File
@@ -3,9 +3,10 @@ name: qa
version: 1.0.0
description: |
Systematically QA test a web application. Use when asked to "qa", "QA", "test this site",
"find bugs", "dogfood", or review quality. Three modes: full (systematic exploration),
quick (30-second smoke test), regression (compare against baseline). Produces structured
report with health score, screenshots, and repro steps.
"find bugs", "dogfood", or review quality. Four modes: diff-aware (automatic on feature
branches — analyzes git diff, identifies affected pages, tests them), full (systematic
exploration), quick (30-second smoke test), regression (compare against baseline). Produces
structured report with health score, screenshots, and repro steps.
allowed-tools:
- Bash
- Read
@@ -22,12 +23,14 @@ You are a QA engineer. Test web applications like a real user — click everythi
| Parameter | Default | Override example |
|-----------|---------|-----------------|
| Target URL | (required) | `https://myapp.com`, `http://localhost:3000` |
| Target URL | (auto-detect or required) | `https://myapp.com`, `http://localhost:3000` |
| Mode | full | `--quick`, `--regression .gstack/qa-reports/baseline.json` |
| Output dir | `.gstack/qa-reports/` | `Output to /tmp/qa` |
| Scope | Full app | `Focus on the billing page` |
| Scope | Full app (or diff-scoped) | `Focus on the billing page` |
| Auth | None | `Sign in to user@example.com`, `Import cookies from cookies.json` |
**If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below). This is the most common case — the user just shipped code on a branch and wants to verify it works.
**Find the browse binary:**
```bash
@@ -55,7 +58,49 @@ mkdir -p "$REPORT_DIR/screenshots"
## Modes
### Full (default)
### Diff-aware (automatic when on a feature branch with no URL)
This is the **primary mode** for developers verifying their work. When the user says `/qa` without a URL and the repo is on a feature branch, automatically:
1. **Analyze the branch diff** to understand what changed:
```bash
git diff main...HEAD --name-only
git log main..HEAD --oneline
```
2. **Identify affected pages/routes** from the changed files:
- Controller/route files → which URL paths they serve
- View/template/component files → which pages render them
- Model/service files → which pages use those models (check controllers that reference them)
- CSS/style files → which pages include those stylesheets
- API endpoints → test them directly with `$B js "await fetch('/api/...')"`
- Static pages (markdown, HTML) → navigate to them directly
3. **Detect the running app** — check common local dev ports:
```bash
$B goto http://localhost:3000 2>/dev/null && echo "Found app on :3000" || \
$B goto http://localhost:4000 2>/dev/null && echo "Found app on :4000" || \
$B goto http://localhost:8080 2>/dev/null && echo "Found app on :8080"
```
If no local app is found, check for a staging/preview URL in the PR or environment. If nothing works, ask the user for the URL.
4. **Test each affected page/route:**
- Navigate to the page
- Take a screenshot
- Check console for errors
- If the change was interactive (forms, buttons, flows), test the interaction end-to-end
- Use `snapshot -D` before and after actions to verify the change had the expected effect
5. **Cross-reference with commit messages and PR description** to understand *intent* — what should the change do? Verify it actually does that.
6. **Report findings** scoped to the branch changes:
- "Changes tested: N pages/routes affected by this branch"
- For each: does it work? Screenshot evidence.
- Any regressions on adjacent pages?
**If the user provides a URL with diff-aware mode:** Use that URL as the base but still scope testing to the changed files.
### Full (default when URL is provided)
Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues. Produce health score. Takes 5-15 minutes depending on app size.
### Quick (`--quick`)