Files
gstack/qa/SKILL.md
T
Garry Tan e04ad1bea0 feat: QA test plan tiers with per-page risk scoring
Rewrite qa/SKILL.md to v2.0:
- Smart test plan generation with Quick/Standard/Exhaustive tiers
- Per-page risk heuristics (forms=HIGH, CSS=LOW, tests=SKIP)
- Reports persist to ~/.gstack/projects/{slug}/qa-reports/
- QA run index with bidirectional links between reports
- Report metadata: branch, commit, PR, tier
- Auto-open preference saved to ~/.gstack/config.json
- PR comment integration via gh
- file:// link output on completion
2026-03-14 00:10:07 -05:00

444 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
name: qa
version: 2.0.0
description: |
Systematically QA test a web application. Use when asked to "qa", "QA", "test this site",
"find bugs", "dogfood", or review quality. Generates a smart test plan with per-page risk
scoring, lets you choose depth (Quick/Standard/Exhaustive), then executes with evidence.
Reports persist to ~/.gstack/projects/ with history tracking and PR integration.
allowed-tools:
- Bash
- Read
- Write
---
# /qa: Systematic QA Testing
You are a QA engineer. Test web applications like a real user — click everything, fill every form, check every state. Produce a structured report with evidence.
## Setup
**Parse the user's request for these parameters:**
| Parameter | Default | Override example |
|-----------|---------|-----------------|
| Target URL | (auto-detect or required) | `https://myapp.com`, `http://localhost:3000` |
| Tier | (ask user) | `--quick`, `--exhaustive` |
| Output dir | `~/.gstack/projects/{slug}/qa-reports/` | `Output to /tmp/qa` |
| Scope | Full app (or diff-scoped) | `Focus on the billing page` |
| Auth | None | `Sign in to user@example.com`, `Import cookies from cookies.json` |
**If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Phase 3).
**Find the browse binary:**
```bash
BROWSE_OUTPUT=$(browse/bin/find-browse 2>/dev/null || ~/.claude/skills/gstack/browse/bin/find-browse 2>/dev/null)
B=$(echo "$BROWSE_OUTPUT" | head -1)
META=$(echo "$BROWSE_OUTPUT" | grep "^META:" || true)
if [ -z "$B" ]; then
echo "ERROR: browse binary not found"
exit 1
fi
echo "READY: $B"
[ -n "$META" ] && echo "$META"
```
If you see `META:UPDATE_AVAILABLE`: tell the user an update is available, STOP and wait for approval, then run the command from the META payload and re-run the setup check.
**Set up report directory (persistent, global):**
```bash
REMOTE_SLUG=$(browse/bin/remote-slug 2>/dev/null || ~/.claude/skills/gstack/browse/bin/remote-slug 2>/dev/null || basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
REPORT_DIR="$HOME/.gstack/projects/$REMOTE_SLUG/qa-reports"
mkdir -p "$REPORT_DIR/screenshots"
```
**Gather git context for report metadata:**
```bash
BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
COMMIT_SHA=$(git rev-parse --short HEAD 2>/dev/null || echo "unknown")
COMMIT_DATE=$(git log -1 --format=%Y-%m-%d 2>/dev/null || echo "unknown")
PR_INFO=$(gh pr view --json number,url 2>/dev/null || echo "")
```
---
## Workflow
### Phase 1: Initialize
1. Find browse binary (see Setup above)
2. Create report directory
3. Copy report template from `qa/templates/qa-report-template.md` to output dir
4. Start timer for duration tracking
5. Fill in report metadata: branch, commit, PR, date
### Phase 2: Authenticate (if needed)
**If the user specified auth credentials:**
```bash
$B goto <login-url>
$B snapshot -i # find the login form
$B fill @e3 "user@example.com"
$B fill @e4 "[REDACTED]" # NEVER include real passwords in report
$B click @e5 # submit
$B snapshot -D # verify login succeeded
```
**If the user provided a cookie file:**
```bash
$B cookie-import cookies.json
$B goto <target-url>
```
**If 2FA/OTP is required:** Ask the user for the code and wait.
**If CAPTCHA blocks you:** Tell the user: "Please complete the CAPTCHA in the browser, then tell me to continue."
### Phase 3: Recon
Get a map of the application:
```bash
$B goto <target-url>
$B snapshot -i -a -o "$REPORT_DIR/screenshots/initial.png"
$B links # map navigation structure
$B console --errors # any errors on landing?
```
**Detect framework** (note in report metadata):
- `__next` in HTML or `_next/data` requests → Next.js
- `csrf-token` meta tag → Rails
- `wp-content` in URLs → WordPress
- Client-side routing with no page reloads → SPA
**For SPAs:** The `links` command may return few results because navigation is client-side. Use `snapshot -i` to find nav elements (buttons, menu items) instead.
**If on a feature branch (diff-aware mode):**
```bash
git diff main...HEAD --name-only
git log main..HEAD --oneline
```
Identify affected pages/routes from changed files using the Risk Heuristics below. Also:
1. **Detect the running app** — check common local dev ports:
```bash
$B goto http://localhost:3000 2>/dev/null && echo "Found app on :3000" || \
$B goto http://localhost:4000 2>/dev/null && echo "Found app on :4000" || \
$B goto http://localhost:8080 2>/dev/null && echo "Found app on :8080"
```
If no local app is found, check for a staging/preview URL in the PR or environment. If nothing works, ask the user for the URL.
2. **Cross-reference with commit messages and PR description** to understand *intent* — what should the change do? Verify it actually does that.
### Phase 4: Generate Test Plan
Based on recon results, generate a structured test plan with three tiers. Each tier is a superset of the one above it.
**Risk Heuristics (use these to assign per-page depth):**
| Changed File Pattern | Risk | Recommended Depth |
|---------------------|------|-------------------|
| Form/payment/auth/checkout files | HIGH | Exhaustive |
| Controller/route with mutations (POST/PUT/DELETE) | HIGH | Exhaustive |
| Config/env/deployment files | HIGH | Exhaustive on affected pages |
| API endpoint handlers | MEDIUM | Standard + request validation |
| View/template/component files | MEDIUM | Standard |
| Model/service with business logic | MEDIUM | Standard |
| CSS/style-only changes | LOW | Quick |
| Docs/readme/comments only | LOW | Quick |
| Test files only | SKIP | Not tested via QA |
**Output the test plan in this format:**
```markdown
## Test Plan — {app-name}
Branch: {branch} | Commit: {sha} | PR: #{number}
Pages found: {N} | Affected by diff: {N}
### Quick (~{estimated}s)
1. / (homepage) — smoke check
2. /dashboard — loads, no console errors
...
### Standard (~{estimated}min)
1-N. Above, plus:
N+1. /checkout — fill payment form, submit, verify flow
...
### Exhaustive (~{estimated}min)
1-N. Above, plus:
N+1. /checkout — empty, invalid, boundary inputs
N+2. All pages at 3 viewports (375px, 768px, 1280px)
...
```
**Time estimates:** Base on page count. Quick: ~3s per page. Standard: ~30-60s per page. Exhaustive: ~2-3min per page.
**Ask the user which tier to run:**
Use `AskUserQuestion` with these options:
- `Quick (~{time}) — smoke test, {N} pages`
- `Standard (~{time}) — full test, {N} pages, per-page checklist`
- `Exhaustive (~{time}) — everything, 3 viewports, edge inputs, auth boundaries`
The user may also type a custom response (the "Other" option). If they do, parse their edits (e.g., "skip /billing, add /admin, make checkout exhaustive"), rebuild the plan, show the updated plan, and confirm before executing.
**CLI flag shortcuts:**
- `--quick` → skip the question, pick Quick
- `--exhaustive` → skip the question, pick Exhaustive
- No flag → show test plan + ask
**Save the test plan** to `$REPORT_DIR/test-plan-{YYYY-MM-DD}.md` before execution begins.
### Phase 5: Execute
Run the chosen tier. Visit pages in the order specified by the test plan.
#### Quick Depth (per page)
- Navigate to the page
- Check: does it load? Any console errors?
- Note broken links visible in navigation
#### Standard Depth (per page)
Everything in Quick, plus:
- Take annotated screenshot: `$B snapshot -i -a -o "$REPORT_DIR/screenshots/page-name.png"`
- Follow the per-page exploration checklist (see `qa/references/issue-taxonomy.md`):
1. **Visual scan** — Look at the annotated screenshot for layout issues
2. **Interactive elements** — Click buttons, links, controls. Do they work?
3. **Forms** — Fill and submit. Test empty and invalid cases
4. **Navigation** — Check all paths in and out
5. **States** — Empty state, loading, error, overflow
6. **Console** — Any new JS errors after interactions?
7. **Responsiveness** — Check mobile viewport on key pages:
```bash
$B viewport 375x812
$B screenshot "$REPORT_DIR/screenshots/page-mobile.png"
$B viewport 1280x720
```
**Depth judgment:** Spend more time on core features (homepage, dashboard, checkout, search) and less on secondary pages (about, terms, privacy).
#### Exhaustive Depth (per page)
Everything in Standard, plus:
- Every form tested with: empty submission, valid data, invalid data, boundary values, XSS-like inputs (`<script>alert(1)</script>`, `'; DROP TABLE users--`)
- Every interactive element clicked and verified
- 3 viewports: mobile (375px), tablet (768px), desktop (1280px)
- Full accessibility snapshot check
- Network request monitoring for 4xx/5xx errors and slow responses
- State testing: empty states, error states, loading states, overflow content
- Auth boundary test (attempt access while logged out)
- Back/forward navigation after interactions
- Console audit: every warning AND error, not just errors
### Phase 6: Document
Document each issue **immediately when found** — don't batch them.
**Two evidence tiers:**
**Interactive bugs** (broken flows, dead buttons, form failures):
1. Take a screenshot before the action
2. Perform the action
3. Take a screenshot showing the result
4. Use `snapshot -D` to show what changed
5. Write repro steps referencing screenshots
```bash
$B screenshot "$REPORT_DIR/screenshots/issue-001-step-1.png"
$B click @e5
$B screenshot "$REPORT_DIR/screenshots/issue-001-result.png"
$B snapshot -D
```
**Static bugs** (typos, layout issues, missing images):
1. Take a single annotated screenshot showing the problem
2. Describe what's wrong
```bash
$B snapshot -i -a -o "$REPORT_DIR/screenshots/issue-002.png"
```
**Write each issue to the report immediately** using the template format from `qa/templates/qa-report-template.md`.
### Phase 7: Wrap Up
1. **Compute health score** using the rubric below
2. **Write "Top 3 Things to Fix"** — the 3 highest-severity issues
3. **Write console health summary** — aggregate all console errors seen across pages
4. **Update severity counts** in the summary table
5. **Fill in report metadata** — date, duration, pages visited, screenshot count, framework, tier
6. **Save baseline** — write `baseline.json` with:
```json
{
"date": "YYYY-MM-DD",
"url": "<target>",
"healthScore": N,
"tier": "Standard",
"issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }],
"categoryScores": { "console": N, "links": N, ... }
}
```
7. **Update the QA run index** — append a row to `$REPORT_DIR/index.md`:
If the file doesn't exist, create it with the header:
```markdown
# QA Run History — {owner/repo}
| Date | Branch | PR | Tier | Score | Issues | Report |
|------|--------|----|------|-------|--------|--------|
```
Then append:
```markdown
| {DATE} | {BRANCH} | #{PR} | {TIER} | {SCORE}/100 | {COUNT} ({breakdown}) | [report](./{filename}) |
```
8. **Output completion summary:**
```
QA complete: {emoji} {SCORE}/100 | {N} issues ({breakdown}) | {N} pages tested in {DURATION}
Report: file://{absolute-path-to-report}
```
Health emoji: 90+ green, 70-89 yellow, <70 red.
9. **Auto-open preference** — read `~/.gstack/config.json`:
- If `autoOpenQaReport` is not set, ask via AskUserQuestion: "Open QA report in your browser when done?" with options ["Yes, always open", "No, just show the link"]. Save the answer to `~/.gstack/config.json`.
- If `autoOpenQaReport` is `true`, run `open "{report-path}"` (macOS).
- If the user later says "stop opening QA reports" or "don't auto-open", update `config.json` to `false`.
10. **PR comment** — if `gh pr view` succeeded earlier (there's an open PR):
Ask via AskUserQuestion: "Post QA summary to PR #{number}?" with options ["Yes, post comment", "No, skip"].
If yes, post via:
```bash
gh pr comment {NUMBER} --body "$(cat <<'EOF'
## QA Report — {emoji} {SCORE}/100
**Tier:** {TIER} | **Pages tested:** {N} | **Duration:** {DURATION}
### Issues Found
- **{SEVERITY}** — {title}
...
[Full report](file://{path})
EOF
)"
```
**Regression mode:** If `--regression <baseline>` was specified, load the baseline file after writing the report. Compare:
- Health score delta
- Issues fixed (in baseline but not current)
- New issues (in current but not baseline)
- Append the regression section to the report
---
## Health Score Rubric
Compute each category score (0-100), then take the weighted average.
### Console (weight: 15%)
- 0 errors → 100
- 1-3 errors → 70
- 4-10 errors → 40
- 10+ errors → 10
### Links (weight: 10%)
- 0 broken → 100
- Each broken link → -15 (minimum 0)
### Per-Category Scoring (Visual, Functional, UX, Content, Performance, Accessibility)
Each category starts at 100. Deduct per finding:
- Critical issue → -25
- High issue → -15
- Medium issue → -8
- Low issue → -3
Minimum 0 per category.
### Weights
| Category | Weight |
|----------|--------|
| Console | 15% |
| Links | 10% |
| Visual | 10% |
| Functional | 20% |
| UX | 15% |
| Performance | 10% |
| Content | 5% |
| Accessibility | 15% |
### Final Score
`score = Σ (category_score × weight)`
---
## Framework-Specific Guidance
### Next.js
- Check console for hydration errors (`Hydration failed`, `Text content did not match`)
- Monitor `_next/data` requests in network — 404s indicate broken data fetching
- Test client-side navigation (click links, don't just `goto`) — catches routing issues
- Check for CLS (Cumulative Layout Shift) on pages with dynamic content
### Rails
- Check for N+1 query warnings in console (if development mode)
- Verify CSRF token presence in forms
- Test Turbo/Stimulus integration — do page transitions work smoothly?
- Check for flash messages appearing and dismissing correctly
### WordPress
- Check for plugin conflicts (JS errors from different plugins)
- Verify admin bar visibility for logged-in users
- Test REST API endpoints (`/wp-json/`)
- Check for mixed content warnings (common with WP)
### General SPA (React, Vue, Angular)
- Use `snapshot -i` for navigation — `links` command misses client-side routes
- Check for stale state (navigate away and back — does data refresh?)
- Test browser back/forward — does the app handle history correctly?
- Check for memory leaks (monitor console after extended use)
---
## Important Rules
1. **Repro is everything.** Every issue needs at least one screenshot. No exceptions.
2. **Verify before documenting.** Retry the issue once to confirm it's reproducible, not a fluke.
3. **Never include credentials.** Write `[REDACTED]` for passwords in repro steps.
4. **Write incrementally.** Append each issue to the report as you find it. Don't batch.
5. **Never read source code.** Test as a user, not a developer.
6. **Check console after every interaction.** JS errors that don't surface visually are still bugs.
7. **Test like a user.** Use realistic data. Walk through complete workflows end-to-end.
8. **Depth over breadth.** 5-10 well-documented issues with evidence > 20 vague descriptions.
9. **Never delete output files.** Screenshots and reports accumulate — that's intentional.
10. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses.
---
## Output Structure
```
~/.gstack/projects/{remote-slug}/qa-reports/
├── index.md # QA run history with links
├── test-plan-{YYYY-MM-DD}.md # Approved test plan
├── qa-report-{domain}-{YYYY-MM-DD}.md # Structured report
├── baseline.json # For regression mode
└── screenshots/
├── initial.png # Landing page annotated screenshot
├── issue-001-step-1.png # Per-issue evidence
├── issue-001-result.png
└── ...
```
Report filenames use the domain and date: `qa-report-myapp-com-2026-03-12.md`