Files
gstack/qa/SKILL.md
Garry Tan f7b95329c1 feat: Phase 3.5 — cookie import, QA testing, team retro (v0.3.1) (#29)
* Phase 2: Enhanced browser — dialog handling, upload, state checks, snapshots

- CircularBuffer O(1) ring buffer for console/network/dialog (was O(n) array+shift)
- Async buffer flush with Bun.write() (was appendFileSync)
- Dialog auto-accept/dismiss with buffer + prompt text support
- File upload command (upload <sel> <file...>)
- Element state checks (is visible/hidden/enabled/disabled/checked/editable/focused)
- Annotated screenshots with ref labels overlaid (-a flag)
- Snapshot diffing against previous snapshot (-D flag)
- Cursor-interactive element scan for non-ARIA clickables (-C flag)
- Snapshot scoping depth limit (-d N flag)
- Health check with page.evaluate + 2s timeout
- Playwright error wrapping — actionable messages for AI agents
- Fix useragent — context recreation preserves cookies/storage/URLs
- wait --networkidle / --load / --domcontentloaded flags
- console --errors filter (error + warning only)
- cookie-import <json-file> with auto-fill domain from page URL
- 166 integration tests (was ~63)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Phase 2: Rewrite SKILL.md as QA playbook + command reference

Reorient SKILL.md files from raw command reference to QA-first playbook
with 10 workflow patterns (test user flows, verify deployments, dogfood
features, responsive layouts, file upload, forms, dialogs, compare pages).
Compact command reference tables at the bottom.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Phase 3: /qa skill — systematic QA testing with health scores

New /qa skill for systematic web app QA testing. Three modes:
- full: 5-10 documented issues with screenshots and repro steps
- quick: 30-second smoke test with health score
- regression: compare against saved baseline

Includes issue taxonomy (7 categories, 4 severity levels), structured
report template, health score rubric (weighted across 7 categories),
framework detection guidance (Next.js, Rails, WordPress, SPA).

Also adds browse/bin/find-browse (DRY binary discovery using git
rev-parse), .gstack/ to .gitignore, and updated TODO roadmap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Bump to v0.3.0 — Phase 2 + Phase 3 changelog

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: cookie-import-browser — Chromium cookie decryption module + tests

Pure logic module for reading and decrypting cookies from macOS Chromium
browsers (Comet, Chrome, Arc, Brave, Edge). Supports v10 AES-128-CBC
encryption with macOS Keychain access, PBKDF2 key derivation, and
per-browser key caching. 18 unit tests with encrypted cookie fixtures.

* feat: cookie picker web UI + route handler

Two-panel dark-theme picker served from the browse server. Left panel
shows source browser domains with search and import buttons. Right panel
shows imported domains with trash buttons. No cookie values exposed.
6 API endpoints, importedDomains Set tracking, inline clearCookies.

* feat: wire cookie-import-browser into browse server

Add cookie-picker route dispatch (no auth, localhost-only), add
cookie-import-browser to WRITE_COMMANDS and CHAIN_WRITE, add serverPort
property to BrowserManager, add write command with two modes (picker UI
vs --domain direct import), update CLI help text.

* chore: /setup-browser-cookies skill + docs (Phase 3.5)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: bump version and changelog (v0.3.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* security: redact sensitive values from command output (PR #21)

type no longer echoes text (reports character count), cookie redacts
value with ****, header redacts Authorization/Cookie/X-API-Key/X-Auth-Token,
storage set drops value, forms redacts password fields. Prevents secrets
from persisting in LLM transcripts. 7 new tests.

Credit: fredluz (PR #21)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* security: path traversal prevention for screenshot/pdf/eval (PR #26)

Add validateOutputPath() for screenshot/pdf/responsive (restricts to
/tmp and cwd) and validateReadPath() for eval (blocks .. sequences and
absolute paths outside safe dirs). 7 new tests.

Credit: Jah-yee (PR #26)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: auto-install Playwright Chromium in setup (PR #22)

Setup now verifies Playwright can launch Chromium, and auto-installs
it via `bunx playwright install chromium` if missing. Exits non-zero
if build or Chromium launch fails.

Credit: AkbarDevop (PR #22)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* security: fix path validation bypass, CORS restriction, cookie-import path check

- startsWith('/tmp') matched '/tmpevil' — now requires trailing slash
- CORS Access-Control-Allow-Origin changed from * to http://127.0.0.1:<port>
- cookie-import now validates file paths (was missing validateReadPath)
- 3 new tests for prefix collision and cookie-import path traversal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review informational issues + add regression tests

- Add cookie-import to CHAIN_WRITE set for chain command routing
- Add path validation to snapshot -a -o output path
- Fix package.json version to match 0.3.1
- Use crypto.randomUUID() for temp DB paths (unpredictable filenames)
- Add regression tests for chain cookie-import and snapshot path validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add /qa, /setup-browser-cookies to README + update BROWSER.md

- Add /qa and /setup-browser-cookies to skills table, install/update/uninstall blurbs
- Add dedicated README sections for both new skills with usage examples
- Update demo workflow to show cookie import → QA → browse flow
- Update BROWSER.md: cookie import commands, new source files, test count (203)
- Update skill count from 6 to 8

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: team-aware /retro v2.0 — per-person praise and growth opportunities

- Identify current user via git config, orient narrative as "you" vs teammates
- Add per-author metrics: commits, LOC, focus areas, commit type mix, sessions
- New "Your Week" section with personal deep-dive for whoever runs the command
- New "Team Breakdown" with per-person praise and growth opportunities
- Track AI-assisted commits via Co-Authored-By trailers
- Personal + team shipping streaks
- Tone: praise like a 1:1, growth like investment advice, never compare negatively

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add Conductor parallel sessions section to README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 00:31:41 -07:00

296 lines
9.7 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
name: qa
version: 1.0.0
description: |
Systematically QA test a web application. Use when asked to "qa", "QA", "test this site",
"find bugs", "dogfood", or review quality. Three modes: full (systematic exploration),
quick (30-second smoke test), regression (compare against baseline). Produces structured
report with health score, screenshots, and repro steps.
allowed-tools:
- Bash
- Read
- Write
---
# /qa: Systematic QA Testing
You are a QA engineer. Test web applications like a real user — click everything, fill every form, check every state. Produce a structured report with evidence.
## Setup
**Parse the user's request for these parameters:**
| Parameter | Default | Override example |
|-----------|---------|-----------------|
| Target URL | (required) | `https://myapp.com`, `http://localhost:3000` |
| Mode | full | `--quick`, `--regression .gstack/qa-reports/baseline.json` |
| Output dir | `.gstack/qa-reports/` | `Output to /tmp/qa` |
| Scope | Full app | `Focus on the billing page` |
| Auth | None | `Sign in to user@example.com`, `Import cookies from cookies.json` |
**Find the browse binary:**
```bash
B=$(browse/bin/find-browse 2>/dev/null || ~/.claude/skills/gstack/browse/bin/find-browse 2>/dev/null)
if [ -z "$B" ]; then
echo "ERROR: browse binary not found"
exit 1
fi
```
**Create output directories:**
```bash
REPORT_DIR=".gstack/qa-reports"
mkdir -p "$REPORT_DIR/screenshots"
```
---
## Modes
### Full (default)
Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues. Produce health score. Takes 5-15 minutes depending on app size.
### Quick (`--quick`)
30-second smoke test. Visit homepage + top 5 navigation targets. Check: page loads? Console errors? Broken links? Produce health score. No detailed issue documentation.
### Regression (`--regression <baseline>`)
Run full mode, then load `baseline.json` from a previous run. Diff: which issues are fixed? Which are new? What's the score delta? Append regression section to report.
---
## Workflow
### Phase 1: Initialize
1. Find browse binary (see Setup above)
2. Create output directories
3. Copy report template from `qa/templates/qa-report-template.md` to output dir
4. Start timer for duration tracking
### Phase 2: Authenticate (if needed)
**If the user specified auth credentials:**
```bash
$B goto <login-url>
$B snapshot -i # find the login form
$B fill @e3 "user@example.com"
$B fill @e4 "[REDACTED]" # NEVER include real passwords in report
$B click @e5 # submit
$B snapshot -D # verify login succeeded
```
**If the user provided a cookie file:**
```bash
$B cookie-import cookies.json
$B goto <target-url>
```
**If 2FA/OTP is required:** Ask the user for the code and wait.
**If CAPTCHA blocks you:** Tell the user: "Please complete the CAPTCHA in the browser, then tell me to continue."
### Phase 3: Orient
Get a map of the application:
```bash
$B goto <target-url>
$B snapshot -i -a -o "$REPORT_DIR/screenshots/initial.png"
$B links # map navigation structure
$B console --errors # any errors on landing?
```
**Detect framework** (note in report metadata):
- `__next` in HTML or `_next/data` requests → Next.js
- `csrf-token` meta tag → Rails
- `wp-content` in URLs → WordPress
- Client-side routing with no page reloads → SPA
**For SPAs:** The `links` command may return few results because navigation is client-side. Use `snapshot -i` to find nav elements (buttons, menu items) instead.
### Phase 4: Explore
Visit pages systematically. At each page:
```bash
$B goto <page-url>
$B snapshot -i -a -o "$REPORT_DIR/screenshots/page-name.png"
$B console --errors
```
Then follow the **per-page exploration checklist** (see `qa/references/issue-taxonomy.md`):
1. **Visual scan** — Look at the annotated screenshot for layout issues
2. **Interactive elements** — Click buttons, links, controls. Do they work?
3. **Forms** — Fill and submit. Test empty, invalid, edge cases
4. **Navigation** — Check all paths in and out
5. **States** — Empty state, loading, error, overflow
6. **Console** — Any new JS errors after interactions?
7. **Responsiveness** — Check mobile viewport if relevant:
```bash
$B viewport 375x812
$B screenshot "$REPORT_DIR/screenshots/page-mobile.png"
$B viewport 1280x720
```
**Depth judgment:** Spend more time on core features (homepage, dashboard, checkout, search) and less on secondary pages (about, terms, privacy).
**Quick mode:** Only visit homepage + top 5 navigation targets from the Orient phase. Skip the per-page checklist — just check: loads? Console errors? Broken links visible?
### Phase 5: Document
Document each issue **immediately when found** — don't batch them.
**Two evidence tiers:**
**Interactive bugs** (broken flows, dead buttons, form failures):
1. Take a screenshot before the action
2. Perform the action
3. Take a screenshot showing the result
4. Use `snapshot -D` to show what changed
5. Write repro steps referencing screenshots
```bash
$B screenshot "$REPORT_DIR/screenshots/issue-001-step-1.png"
$B click @e5
$B screenshot "$REPORT_DIR/screenshots/issue-001-result.png"
$B snapshot -D
```
**Static bugs** (typos, layout issues, missing images):
1. Take a single annotated screenshot showing the problem
2. Describe what's wrong
```bash
$B snapshot -i -a -o "$REPORT_DIR/screenshots/issue-002.png"
```
**Write each issue to the report immediately** using the template format from `qa/templates/qa-report-template.md`.
### Phase 6: Wrap Up
1. **Compute health score** using the rubric below
2. **Write "Top 3 Things to Fix"** — the 3 highest-severity issues
3. **Write console health summary** — aggregate all console errors seen across pages
4. **Update severity counts** in the summary table
5. **Fill in report metadata** — date, duration, pages visited, screenshot count, framework
6. **Save baseline** — write `baseline.json` with:
```json
{
"date": "YYYY-MM-DD",
"url": "<target>",
"healthScore": N,
"issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }],
"categoryScores": { "console": N, "links": N, ... }
}
```
**Regression mode:** After writing the report, load the baseline file. Compare:
- Health score delta
- Issues fixed (in baseline but not current)
- New issues (in current but not baseline)
- Append the regression section to the report
---
## Health Score Rubric
Compute each category score (0-100), then take the weighted average.
### Console (weight: 15%)
- 0 errors → 100
- 1-3 errors → 70
- 4-10 errors → 40
- 10+ errors → 10
### Links (weight: 10%)
- 0 broken → 100
- Each broken link → -15 (minimum 0)
### Per-Category Scoring (Visual, Functional, UX, Content, Performance, Accessibility)
Each category starts at 100. Deduct per finding:
- Critical issue → -25
- High issue → -15
- Medium issue → -8
- Low issue → -3
Minimum 0 per category.
### Weights
| Category | Weight |
|----------|--------|
| Console | 15% |
| Links | 10% |
| Visual | 10% |
| Functional | 20% |
| UX | 15% |
| Performance | 10% |
| Content | 5% |
| Accessibility | 15% |
### Final Score
`score = Σ (category_score × weight)`
---
## Framework-Specific Guidance
### Next.js
- Check console for hydration errors (`Hydration failed`, `Text content did not match`)
- Monitor `_next/data` requests in network — 404s indicate broken data fetching
- Test client-side navigation (click links, don't just `goto`) — catches routing issues
- Check for CLS (Cumulative Layout Shift) on pages with dynamic content
### Rails
- Check for N+1 query warnings in console (if development mode)
- Verify CSRF token presence in forms
- Test Turbo/Stimulus integration — do page transitions work smoothly?
- Check for flash messages appearing and dismissing correctly
### WordPress
- Check for plugin conflicts (JS errors from different plugins)
- Verify admin bar visibility for logged-in users
- Test REST API endpoints (`/wp-json/`)
- Check for mixed content warnings (common with WP)
### General SPA (React, Vue, Angular)
- Use `snapshot -i` for navigation — `links` command misses client-side routes
- Check for stale state (navigate away and back — does data refresh?)
- Test browser back/forward — does the app handle history correctly?
- Check for memory leaks (monitor console after extended use)
---
## Important Rules
1. **Repro is everything.** Every issue needs at least one screenshot. No exceptions.
2. **Verify before documenting.** Retry the issue once to confirm it's reproducible, not a fluke.
3. **Never include credentials.** Write `[REDACTED]` for passwords in repro steps.
4. **Write incrementally.** Append each issue to the report as you find it. Don't batch.
5. **Never read source code.** Test as a user, not a developer.
6. **Check console after every interaction.** JS errors that don't surface visually are still bugs.
7. **Test like a user.** Use realistic data. Walk through complete workflows end-to-end.
8. **Depth over breadth.** 5-10 well-documented issues with evidence > 20 vague descriptions.
9. **Never delete output files.** Screenshots and reports accumulate — that's intentional.
10. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses.
---
## Output Structure
```
.gstack/qa-reports/
├── qa-report-{domain}-{YYYY-MM-DD}.md # Structured report
├── screenshots/
│ ├── initial.png # Landing page annotated screenshot
│ ├── issue-001-step-1.png # Per-issue evidence
│ ├── issue-001-result.png
│ └── ...
└── baseline.json # For regression mode
```
Report filenames use the domain and date: `qa-report-myapp-com-2026-03-12.md`