feat: Phase 3.5 — cookie import, QA testing, team retro (v0.3.1) (#29)

* Phase 2: Enhanced browser - dialog handling, upload, state checks, snapshots

- CircularBuffer O(1) ring buffer for console/network/dialog (was O(n) array+shift)
- Async buffer flush with Bun.write() (was appendFileSync)
- Dialog auto-accept/dismiss with buffer + prompt text support
- File upload command (upload <sel> <file...>)
- Element state checks (is visible/hidden/enabled/disabled/checked/editable/focused)
- Annotated screenshots with ref labels overlaid (-a flag)
- Snapshot diffing against previous snapshot (-D flag)
- Cursor-interactive element scan for non-ARIA clickables (-C flag)
- Snapshot scoping depth limit (-d N flag)
- Health check with page.evaluate + 2s timeout
- Playwright error wrapping: actionable messages for AI agents
- Fix useragent: context recreation preserves cookies/storage/URLs
- wait --networkidle / --load / --domcontentloaded flags
- console --errors filter (error + warning only)
- cookie-import <json-file> with auto-fill domain from page URL
- 166 integration tests (was ~63)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Phase 2: Rewrite SKILL.md as QA playbook + command reference

Reorient SKILL.md files from raw command reference to QA-first playbook with 10 workflow patterns (test user flows, verify deployments, dogfood features, responsive layouts, file upload, forms, dialogs, compare pages). Compact command reference tables at the bottom.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Phase 3: /qa skill - systematic QA testing with health scores

New /qa skill for systematic web app QA testing. Three modes:
- full: 5-10 documented issues with screenshots and repro steps
- quick: 30-second smoke test with health score
- regression: compare against saved baseline

Includes issue taxonomy (7 categories, 4 severity levels), structured report template, health score rubric (weighted across 7 categories), framework detection guidance (Next.js, Rails, WordPress, SPA). Also adds browse/bin/find-browse (DRY binary discovery using git rev-parse), .gstack/ to .gitignore, and updated TODO roadmap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Bump to v0.3.0 - Phase 2 + Phase 3 changelog

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: cookie-import-browser - Chromium cookie decryption module + tests

Pure logic module for reading and decrypting cookies from macOS Chromium browsers (Comet, Chrome, Arc, Brave, Edge). Supports v10 AES-128-CBC encryption with macOS Keychain access, PBKDF2 key derivation, and per-browser key caching. 18 unit tests with encrypted cookie fixtures.

* feat: cookie picker web UI + route handler

Two-panel dark-theme picker served from the browse server. Left panel shows source browser domains with search and import buttons. Right panel shows imported domains with trash buttons. No cookie values exposed. 6 API endpoints, importedDomains Set tracking, inline clearCookies.

* feat: wire cookie-import-browser into browse server

Add cookie-picker route dispatch (no auth, localhost-only), add cookie-import-browser to WRITE_COMMANDS and CHAIN_WRITE, add serverPort property to BrowserManager, add write command with two modes (picker UI vs --domain direct import), update CLI help text.

* chore: /setup-browser-cookies skill + docs (Phase 3.5)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: bump version and changelog (v0.3.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* security: redact sensitive values from command output (PR #21)

type no longer echoes text (reports character count), cookie redacts value with ****, header redacts Authorization/Cookie/X-API-Key/X-Auth-Token, storage set drops value, forms redacts password fields. Prevents secrets from persisting in LLM transcripts. 7 new tests.

Credit: fredluz (PR #21)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* security: path traversal prevention for screenshot/pdf/eval (PR #26)

Add validateOutputPath() for screenshot/pdf/responsive (restricts to /tmp and cwd) and validateReadPath() for eval (blocks .. sequences and absolute paths outside safe dirs). 7 new tests.

Credit: Jah-yee (PR #26)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: auto-install Playwright Chromium in setup (PR #22)

Setup now verifies Playwright can launch Chromium, and auto-installs it via `bunx playwright install chromium` if missing. Exits non-zero if build or Chromium launch fails.

Credit: AkbarDevop (PR #22)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* security: fix path validation bypass, CORS restriction, cookie-import path check

- startsWith('/tmp') matched '/tmpevil'; now requires trailing slash
- CORS Access-Control-Allow-Origin changed from * to http://127.0.0.1:<port>
- cookie-import now validates file paths (was missing validateReadPath)
- 3 new tests for prefix collision and cookie-import path traversal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review informational issues + add regression tests

- Add cookie-import to CHAIN_WRITE set for chain command routing
- Add path validation to snapshot -a -o output path
- Fix package.json version to match 0.3.1
- Use crypto.randomUUID() for temp DB paths (unpredictable filenames)
- Add regression tests for chain cookie-import and snapshot path validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add /qa, /setup-browser-cookies to README + update BROWSER.md

- Add /qa and /setup-browser-cookies to skills table, install/update/uninstall blurbs
- Add dedicated README sections for both new skills with usage examples
- Update demo workflow to show cookie import → QA → browse flow
- Update BROWSER.md: cookie import commands, new source files, test count (203)
- Update skill count from 6 to 8

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: team-aware /retro v2.0 - per-person praise and growth opportunities

- Identify current user via git config, orient narrative as "you" vs teammates
- Add per-author metrics: commits, LOC, focus areas, commit type mix, sessions
- New "Your Week" section with personal deep-dive for whoever runs the command
- New "Team Breakdown" with per-person praise and growth opportunities
- Track AI-assisted commits via Co-Authored-By trailers
- Personal + team shipping streaks
- Tone: praise like a 1:1, growth like investment advice, never compare negatively

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add Conductor parallel sessions section to README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-27 20:20:03 +02:00 · 2026-03-13 00:31:41 -07:00
parent 1b317aae9a
commit f7b95329c1
36 changed files with 5188 additions and 657 deletions
@@ -0,0 +1,295 @@
+---
+name: qa
+version: 1.0.0
+description: |
+  Systematically QA test a web application. Use when asked to "qa", "QA", "test this site",
+  "find bugs", "dogfood", or review quality. Three modes: full (systematic exploration),
+  quick (30-second smoke test), regression (compare against baseline). Produces structured
+  report with health score, screenshots, and repro steps.
+allowed-tools:
+  - Bash
+  - Read
+  - Write
+---
+
+# /qa: Systematic QA Testing
+
+You are a QA engineer. Test web applications like a real user — click everything, fill every form, check every state. Produce a structured report with evidence.
+
+## Setup
+
+**Parse the user's request for these parameters:**
+
+| Parameter | Default | Override example |
+|-----------|---------|-----------------|
+| Target URL | (required) | `https://myapp.com`, `http://localhost:3000` |
+| Mode | full | `--quick`, `--regression .gstack/qa-reports/baseline.json` |
+| Output dir | `.gstack/qa-reports/` | `Output to /tmp/qa` |
+| Scope | Full app | `Focus on the billing page` |
+| Auth | None | `Sign in to user@example.com`, `Import cookies from cookies.json` |
+
+**Find the browse binary:**
+
+```bash
+B=$(browse/bin/find-browse 2>/dev/null || ~/.claude/skills/gstack/browse/bin/find-browse 2>/dev/null)
+if [ -z "$B" ]; then
+  echo "ERROR: browse binary not found"
+  exit 1
+fi
+```
+
+**Create output directories:**
+
+```bash
+REPORT_DIR="${REPORT_DIR:-.gstack/qa-reports}"   # pre-set REPORT_DIR to honor a user-specified output dir
+mkdir -p "$REPORT_DIR/screenshots"
+```
+
+---
+
+## Modes
+
+### Full (default)
+Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues. Produce a health score. Takes 5-15 minutes depending on app size.
+
+### Quick (`--quick`)
+30-second smoke test. Visit the homepage + top 5 navigation targets. Check: page loads? Console errors? Broken links? Produce a health score. No detailed issue documentation.
+
+### Regression (`--regression <baseline>`)
+Run full mode, then load the `<baseline>` file (a `baseline.json` saved by a previous run). Diff: which issues are fixed? Which are new? What's the score delta? Append a regression section to the report.
+
+---
+
+## Workflow
+
+### Phase 1: Initialize
+
+1. Find browse binary (see Setup above)
+2. Create output directories
+3. Copy report template from `qa/templates/qa-report-template.md` to output dir
+4. Start timer for duration tracking
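Step 4 only needs a start timestamp plus a way to print elapsed time for the report's Duration field. A minimal sketch, assuming `date +%s` is available (GNU and BSD `date` both support it); `QA_START` and `format_duration` are hypothetical names, not part of the skill:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for Phase 1, step 4 (duration tracking).

# Record the start time in seconds since the epoch.
QA_START=$(date +%s)

# Format a number of elapsed seconds as "Xm Ys" for the report's Duration field.
format_duration() {
  printf '%dm %ds\n' $(( $1 / 60 )) $(( $1 % 60 ))
}
```

At wrap-up, `format_duration $(( $(date +%s) - QA_START ))` prints a value such as `12m 34s`.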
+
+### Phase 2: Authenticate (if needed)
+
+**If the user specified auth credentials:**
+
+```bash
+$B goto <login-url>
+$B snapshot -i                    # find the login form; refs like @e3 come from this output
+$B fill @e3 "user@example.com"
+$B fill @e4 "<password>"          # type the real password here; write [REDACTED] in the report
+$B click @e5                      # submit
+$B snapshot -D                    # verify login succeeded
+```
+
+**If the user provided a cookie file:**
+
+```bash
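+$B goto <target-url>              # cookie-import fills missing cookie domains from the current page URL
+$B cookie-import cookies.json
+$B goto <target-url>              # reload so requests carry the imported cookies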
+```
+
+**If 2FA/OTP is required:** Ask the user for the code and wait.
+
+**If CAPTCHA blocks you:** Tell the user: "Please complete the CAPTCHA in the browser, then tell me to continue."
+
+### Phase 3: Orient
+
+Get a map of the application:
+
+```bash
+$B goto <target-url>
+$B snapshot -i -a -o "$REPORT_DIR/screenshots/initial.png"
+$B links                          # map navigation structure
+$B console --errors               # any errors on landing?
+```
+
+**Detect framework** (note in report metadata):
+- `__next` in HTML or `_next/data` requests → Next.js
+- `csrf-token` meta tag → Rails
+- `wp-content` in URLs → WordPress
+- Client-side routing with no page reloads → SPA
+
+**For SPAs:** The `links` command may return few results because navigation is client-side. Use `snapshot -i` to find nav elements (buttons, menu items) instead.
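The HTML-based heuristics above can be checked mechanically once you have the page source as text. The sketch below assumes the HTML arrives on stdin and leaves open how you get it from browse. `detect_framework` is a hypothetical helper. It cannot see network requests such as `_next/data` or detect client-side routing, so those still need manual observation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: guess the framework from page HTML read on stdin.
# Checks follow the order of the list above; the first match wins.
detect_framework() {
  local html
  html=$(cat)
  case "$html" in
    *__next*)              echo "Next.js" ;;
    *'name="csrf-token"'*) echo "Rails" ;;      # Rails' csrf_meta_tags output
    *wp-content*)          echo "WordPress" ;;
    *)                     echo "Unknown" ;;
  esac
}
```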
+
+### Phase 4: Explore
+
+Visit pages systematically. At each page:
+
+```bash
+$B goto <page-url>
+$B snapshot -i -a -o "$REPORT_DIR/screenshots/page-name.png"
+$B console --errors
+```
+
+Then follow the **per-page exploration checklist** (see `qa/references/issue-taxonomy.md`):
+
+1. **Visual scan** — Look at the annotated screenshot for layout issues
+2. **Interactive elements** — Click buttons, links, controls. Do they work?
+3. **Forms** — Fill and submit. Test empty, invalid, edge cases
+4. **Navigation** — Check all paths in and out
+5. **States** — Empty state, loading, error, overflow
+6. **Console** — Any new JS errors after interactions?
+7. **Responsiveness** — Check mobile viewport if relevant:
+   ```bash
+   $B viewport 375x812
+   $B screenshot "$REPORT_DIR/screenshots/page-mobile.png"
+   $B viewport 1280x720
+   ```
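To repeat the responsiveness check at more than one size, you can loop the viewport/screenshot pair. This sketch uses only the `viewport` and `screenshot` commands shown above. The tablet size `768x1024` and the helper name `capture_viewports` are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: screenshot one page at mobile, tablet, and desktop sizes.
# $B is the browse binary from Setup. The list ends on the desktop size,
# so the viewport is left at 1280x720 afterwards.
capture_viewports() {
  local name="$1" dir="$2" vp
  for vp in 375x812 768x1024 1280x720; do
    $B viewport "$vp"
    $B screenshot "$dir/$name-$vp.png"
  done
}
```

For example, `capture_viewports checkout "$REPORT_DIR/screenshots"` would write `checkout-375x812.png`, `checkout-768x1024.png`, and `checkout-1280x720.png`.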
+
+**Depth judgment:** Spend more time on core features (homepage, dashboard, checkout, search) and less on secondary pages (about, terms, privacy).
+
+**Quick mode:** Only visit homepage + top 5 navigation targets from the Orient phase. Skip the per-page checklist — just check: loads? Console errors? Broken links visible?
+
+### Phase 5: Document
+
+Document each issue **immediately when found** — don't batch them.
+
+**Two evidence tiers:**
+
+**Interactive bugs** (broken flows, dead buttons, form failures):
+1. Take a screenshot before the action
+2. Perform the action
+3. Take a screenshot showing the result
+4. Use `snapshot -D` to show what changed
+5. Write repro steps referencing screenshots
+
+```bash
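+$B snapshot -i                    # fresh baseline so the -D diff below reflects only this action
+$B screenshot "$REPORT_DIR/screenshots/issue-001-step-1.png"
+$B click @e5
+$B screenshot "$REPORT_DIR/screenshots/issue-001-result.png"
+$B snapshot -D                    # what changed after the click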
+```
+
+**Static bugs** (typos, layout issues, missing images):
+1. Take a single annotated screenshot showing the problem
+2. Describe what's wrong
+
+```bash
+$B snapshot -i -a -o "$REPORT_DIR/screenshots/issue-002.png"
+```
+
+**Write each issue to the report immediately** using the template format from `qa/templates/qa-report-template.md`.
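Appending one issue at a time keeps the report current even if the session is cut short. The minimal sketch below writes the header and metadata table from the template's ISSUE section. Repro steps and screenshot links are still written by hand. `append_issue` and its argument order are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append one issue header + metadata table to the report.
# Usage: append_issue REPORT NUMBER TITLE SEVERITY CATEGORY URL DESCRIPTION
append_issue() {
  local report="$1" n="$2" title="$3" severity="$4" category="$5" url="$6" desc="$7"
  local id
  id=$(printf 'ISSUE-%03d' "$n")   # 7 -> ISSUE-007, matching the template's numbering
  cat >> "$report" <<EOF

### $id: $title

| Field | Value |
|-------|-------|
| **Severity** | $severity |
| **Category** | $category |
| **URL** | $url |

**Description:** $desc
EOF
}
```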
+
+### Phase 6: Wrap Up
+
+1. **Compute health score** using the rubric below
+2. **Write "Top 3 Things to Fix"** — the 3 highest-severity issues
+3. **Write console health summary** — aggregate all console errors seen across pages
+4. **Update severity counts** in the summary table
+5. **Fill in report metadata** — date, duration, pages visited, screenshot count, framework
+6. **Save baseline** — write `baseline.json` with:
+   ```json
+   {
+     "date": "YYYY-MM-DD",
+     "url": "<target>",
+     "healthScore": N,
+     "issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }],
+     "categoryScores": { "console": N, "links": N, ... }
+   }
+   ```
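Step 2's ranking can be done with standard tools if you keep issues one per line as `severity|ID|title`. That is a hypothetical scratch format, not part of the template. This sketch sorts by severity and preserves discovery order within each severity level. It relies on `sort -s` (stable sort), which both GNU and BSD `sort` provide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the three highest-severity issues from stdin.
# Input lines look like "severity|ID|title"; unknown severities sort last.
top_three() {
  awk -F'|' '
    BEGIN { rank["critical"] = 0; rank["high"] = 1; rank["medium"] = 2; rank["low"] = 3 }
    { r = ($1 in rank) ? rank[$1] : 9; print r "|" $0 }
  ' | sort -s -t'|' -k1,1n | cut -d'|' -f2- | head -n 3
}
```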
+
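+**Regression mode:** Load the previous baseline *before* step 6 saves the new one. With the default paths both are `baseline.json` in the report directory, so saving first would overwrite it. After writing the report, compare:
+- Health score delta
+- Issues fixed (in baseline but not current)
+- New issues (in current but not baseline)
+
+Match issues by title and category rather than by `ISSUE-NNN`, since IDs are numbered fresh on each run. Then append the regression section to the report.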
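The comparison can be sketched with `comm`. This assumes you have reduced each run's issues to one key per line in a plain text file, for example `category|title` (a hypothetical format). `compare_issues` and `score_delta` are illustrative names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for regression mode.

# Print issues only in the baseline (fixed) and only in the current run (new).
# Each argument is a text file with one issue key per line.
compare_issues() {
  local base cur
  base=$(mktemp) || return 1
  cur=$(mktemp) || return 1
  sort -u "$1" > "$base"   # comm needs sorted input
  sort -u "$2" > "$cur"
  echo "Fixed since baseline:"
  comm -23 "$base" "$cur"  # lines only in the baseline
  echo "New since baseline:"
  comm -13 "$base" "$cur"  # lines only in the current run
  rm -f "$base" "$cur"
}

# Signed health-score change from baseline to current, e.g. "+7" or "-3".
score_delta() {
  printf '%+d\n' $(( $2 - $1 ))
}
```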
+
+---
+
+## Health Score Rubric
+
+Compute each category score (0-100), then take the weighted average.
+
+### Console (weight: 15%)
+- 0 errors → 100
+- 1-3 errors → 70
+- 4-10 errors → 40
+- 11+ errors → 10
+
+### Links (weight: 10%)
+- 0 broken → 100
+- Each broken link → -15 (minimum 0)
+
+### Per-Category Scoring (Visual, Functional, UX, Content, Performance, Accessibility)
+Each category starts at 100. Deduct per finding:
+- Critical issue → -25
+- High issue → -15
+- Medium issue → -8
+- Low issue → -3
+Minimum 0 per category.
+
+### Weights
+| Category | Weight |
+|----------|--------|
+| Console | 15% |
+| Links | 10% |
+| Visual | 10% |
+| Functional | 20% |
+| UX | 15% |
+| Performance | 10% |
+| Content | 5% |
+| Accessibility | 15% |
+
+### Final Score
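+`score = Σ (category_score × weight)`, where each weight is the table percentage written as a fraction (15% = 0.15). The weights sum to 1, so the score stays between 0 and 100.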
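The rubric uses only integer deductions and percentage weights, so shell arithmetic can compute it. In this sketch the function names are hypothetical. Rounding the weighted sum to the nearest whole number is an assumption, since the rubric does not say how to round.

```bash
#!/usr/bin/env bash
# Hypothetical helpers implementing the rubric above with integer arithmetic.

# Console: map the total error count to a score.
console_score() {
  if [ "$1" -eq 0 ]; then echo 100
  elif [ "$1" -le 3 ]; then echo 70
  elif [ "$1" -le 10 ]; then echo 40
  else echo 10
  fi
}

# Links: -15 per broken link, never below 0.
links_score() {
  local s=$(( 100 - 15 * $1 ))
  if [ "$s" -lt 0 ]; then s=0; fi
  echo "$s"
}

# Other categories: args are critical, high, medium, low counts; floored at 0.
category_score() {
  local s=$(( 100 - 25 * $1 - 15 * $2 - 8 * $3 - 3 * $4 ))
  if [ "$s" -lt 0 ]; then s=0; fi
  echo "$s"
}

# Weighted average of 8 category scores, in the order of the Weights table:
# console links visual functional ux performance content accessibility.
health_score() {
  if [ "$#" -ne 8 ]; then
    echo "health_score: expected 8 category scores" >&2
    return 1
  fi
  local total=0 w
  for w in 15 10 10 20 15 10 5 15; do
    total=$(( total + $1 * $w ))
    shift
  done
  echo $(( (total + 50) / 100 ))   # weights are percentages; round to nearest
}
```

Consider a run with 2 console errors (70), 1 broken link (85), one medium visual issue (92), one high functional issue (85), clean UX and performance (100 each), one low content issue (97), and one medium accessibility issue (92). It gives `health_score 70 85 92 85 100 100 97 92`, which rounds 88.85 to 89.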
+
+---
+
+## Framework-Specific Guidance
+
+### Next.js
+- Check console for hydration errors (`Hydration failed`, `Text content did not match`)
+- Monitor `_next/data` requests in network — 404s indicate broken data fetching
+- Test client-side navigation (click links, don't just `goto`) — catches routing issues
+- Check for CLS (Cumulative Layout Shift) on pages with dynamic content
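You can script the hydration check by piping console output through `grep`. This sketch assumes `$B console --errors` prints one message per line. `count_hydration_errors` is a hypothetical helper that matches the two messages listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count console lines that look like React/Next.js
# hydration errors. Reads console output on stdin.
count_hydration_errors() {
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps the 0.
  grep -c -e 'Hydration failed' -e 'Text content did not match' || true
}
```

Usage: `$B console --errors | count_hydration_errors`.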
+
+### Rails
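+- In development mode, check the console for N+1 query warnings (these only appear if the app runs a detector such as the Bullet gem with browser console output enabled)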
+- Verify CSRF token presence in forms
+- Test Turbo/Stimulus integration — do page transitions work smoothly?
+- Check for flash messages appearing and dismissing correctly
+
+### WordPress
+- Check for plugin conflicts (JS errors from different plugins)
+- Verify admin bar visibility for logged-in users
+- Test REST API endpoints (`/wp-json/`)
+- Check for mixed content warnings (common with WP)
+
+### General SPA (React, Vue, Angular)
+- Use `snapshot -i` for navigation — `links` command misses client-side routes
+- Check for stale state (navigate away and back — does data refresh?)
+- Test browser back/forward — does the app handle history correctly?
+- Watch for signs of memory leaks after extended use (growing sluggishness, repeated console warnings)
+
+---
+
+## Important Rules
+
+1. **Repro is everything.** Every issue needs at least one screenshot. No exceptions.
+2. **Verify before documenting.** Retry the issue once to confirm it's reproducible, not a fluke.
+3. **Never include credentials.** Write `[REDACTED]` for passwords in repro steps.
+4. **Write incrementally.** Append each issue to the report as you find it. Don't batch.
+5. **Never read source code.** Test as a user, not a developer.
+6. **Check console after every interaction.** JS errors that don't surface visually are still bugs.
+7. **Test like a user.** Use realistic data. Walk through complete workflows end-to-end.
+8. **Depth over breadth.** 5-10 well-documented issues with evidence > 20 vague descriptions.
+9. **Never delete output files.** Screenshots and reports accumulate — that's intentional.
+10. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses.
+
+---
+
+## Output Structure
+
+```
+.gstack/qa-reports/
+├── qa-report-{domain}-{YYYY-MM-DD}.md    # Structured report
+├── screenshots/
+│   ├── initial.png                        # Landing page annotated screenshot
+│   ├── issue-001-step-1.png               # Per-issue evidence
+│   ├── issue-001-result.png
+│   └── ...
+└── baseline.json                          # For regression mode
+```
+
+Report filenames combine the domain (dots replaced by hyphens) and the date, e.g. `qa-report-myapp-com-2026-03-12.md` for `https://myapp.com` on 2026-03-12.
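Here is a sketch of deriving that filename from the target URL. It assumes the date is the run date in `YYYY-MM-DD` form. `report_filename` is a hypothetical helper. It strips the scheme, path, query, and port, then replaces dots with hyphens.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the report filename from a URL and an optional
# YYYY-MM-DD date (defaults to today).
report_filename() {
  local url="$1" day="${2:-$(date +%F)}" host
  host=${url#*://}          # drop the scheme
  host=${host%%[/?#]*}      # drop any path, query, or fragment
  host=${host%%:*}          # drop a port
  host=$(printf '%s' "$host" | tr '.' '-')   # myapp.com -> myapp-com
  printf 'qa-report-%s-%s.md\n' "$host" "$day"
}
```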
@@ -0,0 +1,85 @@
+# QA Issue Taxonomy
+
+## Severity Levels
+
+| Severity | Definition | Examples |
+|----------|------------|----------|
+| **critical** | Blocks a core workflow, causes data loss, or crashes the app | Form submit causes error page, checkout flow broken, data deleted without confirmation |
+| **high** | Major feature broken or unusable, no workaround | Search returns wrong results, file upload silently fails, auth redirect loop |
+| **medium** | Feature works but with noticeable problems, workaround exists | Slow page load (>5s), form validation missing but submit still works, layout broken on mobile only |
+| **low** | Minor cosmetic or polish issue | Typo in footer, 1px alignment issue, hover state inconsistent |
+
+## Categories
+
+### 1. Visual/UI
+- Layout breaks (overlapping elements, clipped text, horizontal scrollbar)
+- Broken or missing images
+- Incorrect z-index (elements appearing behind others)
+- Font/color inconsistencies
+- Animation glitches (jank, incomplete transitions)
+- Alignment issues (off-grid, uneven spacing)
+- Dark mode / theme issues
+
+### 2. Functional
+- Broken links (404, wrong destination)
+- Dead buttons (click does nothing)
+- Form validation (missing, wrong, bypassed)
+- Incorrect redirects
+- State not persisting (data lost on refresh, back button)
+- Race conditions (double-submit, stale data)
+- Search returning wrong or no results
+
+### 3. UX
+- Confusing navigation (e.g. no breadcrumbs)
+- Missing loading indicators (user doesn't know something is happening)
+- Slow interactions (>500ms with no feedback)
+- Unclear error messages ("Something went wrong" with no detail)
+- No confirmation before destructive actions
+- Inconsistent interaction patterns across pages
+- Dead ends (no way back, no next action)
+
+### 4. Content
+- Typos and grammar errors
+- Outdated or incorrect text
+- Placeholder / lorem ipsum text left in
+- Truncated text (cut off without ellipsis or "more")
+- Wrong labels on buttons or form fields
+- Missing or unhelpful empty states
+
+### 5. Performance
+- Slow page loads (>3 seconds)
+- Janky scrolling (dropped frames)
+- Layout shifts (content jumping after load)
+- Excessive network requests (>50 on a single page)
+- Large unoptimized images
+- Blocking JavaScript (page unresponsive during load)
+
+### 6. Console/Errors
+- JavaScript exceptions (uncaught errors)
+- Failed network requests (4xx, 5xx)
+- Deprecation warnings (upcoming breakage)
+- CORS errors
+- Mixed content warnings (HTTP resources on HTTPS)
+- CSP violations
+
+### 7. Accessibility
+- Missing alt text on images
+- Unlabeled form inputs
+- Keyboard navigation broken (can't tab to elements)
+- Focus traps (can't escape a modal or dropdown)
+- Missing or incorrect ARIA attributes
+- Insufficient color contrast
+- Content not reachable by screen reader
+
+## Per-Page Exploration Checklist
+
+For each page visited during a QA session:
+
+1. **Visual scan** — Take annotated screenshot (`snapshot -i -a -o`). Look for layout issues, broken images, alignment.
+2. **Interactive elements** — Click every button, link, and control. Does each do what it says?
+3. **Forms** — Fill and submit. Test empty submission, invalid data, edge cases (long text, special characters).
+4. **Navigation** — Check all paths in/out. Breadcrumbs, back button, deep links, mobile menu.
+5. **States** — Check empty state, loading state, error state, full/overflow state.
+6. **Console** — Run `console --errors` after interactions. Any new JS errors or failed requests?
+7. **Responsiveness** — If relevant, check mobile and tablet viewports.
+8. **Auth boundaries** — What happens when logged out? Different user roles?
@@ -0,0 +1,79 @@
+# QA Report: {APP_NAME}
+
+| Field | Value |
+|-------|-------|
+| **Date** | {DATE} |
+| **URL** | {URL} |
+| **Scope** | {SCOPE or "Full app"} |
+| **Mode** | {full / quick / regression} |
+| **Duration** | {DURATION} |
+| **Pages visited** | {COUNT} |
+| **Screenshots** | {COUNT} |
+| **Framework** | {DETECTED or "Unknown"} |
+
+## Health Score: {SCORE}/100
+
+| Category | Score |
+|----------|-------|
+| Console | {0-100} |
+| Links | {0-100} |
+| Visual | {0-100} |
+| Functional | {0-100} |
+| UX | {0-100} |
+| Performance | {0-100} |
+| Content | {0-100} |
+| Accessibility | {0-100} |
+
+## Top 3 Things to Fix
+
+1. **{ISSUE-NNN}: {title}** — {one-line description}
+2. **{ISSUE-NNN}: {title}** — {one-line description}
+3. **{ISSUE-NNN}: {title}** — {one-line description}
+
+## Console Health
+
+| Error | Count | First seen |
+|-------|-------|------------|
+| {error message} | {N} | {URL} |
+
+## Summary
+
+| Severity | Count |
+|----------|-------|
+| Critical | 0 |
+| High | 0 |
+| Medium | 0 |
+| Low | 0 |
+| **Total** | **0** |
+
+## Issues
+
+### ISSUE-001: {Short title}
+
+| Field | Value |
+|-------|-------|
+| **Severity** | critical / high / medium / low |
+| **Category** | visual / functional / ux / content / performance / console / accessibility |
+| **URL** | {page URL} |
+
+**Description:** {What is wrong, expected vs actual.}
+
+**Repro Steps:**
+
+1. Navigate to {URL}
+   ![Step 1](screenshots/issue-001-step-1.png)
+2. {Action}
+   ![Step 2](screenshots/issue-001-step-2.png)
+3. **Observe:** {what goes wrong}
+   ![Result](screenshots/issue-001-result.png)
+
+---
+
+## Regression (if applicable)
+
+| Metric | Baseline | Current | Delta |
+|--------|----------|---------|-------|
+| Health score | {N} | {N} | {+/-N} |
+| Issues | {N} | {N} | {+/-N} |
+
+**Fixed since baseline:** {list}
+**New since baseline:** {list}