* feat: add daily update check script + /gstack-upgrade skill bin/gstack-update-check: pure bash, checks VERSION against remote once/day, outputs UPGRADE_AVAILABLE or JUST_UPGRADED. Uses ~/.gstack/ for state. gstack-upgrade/SKILL.md: new skill with inline upgrade flow for all preambles. Detects global-git, local-git, vendored installs. Shows What's New from CHANGELOG. browse/test/gstack-update-check.test.ts: 10 test cases covering all branch paths. * refactor: remove version check from find-browse, simplify to binary locator Delete checkVersion(), readCache(), writeCache(), fetchRemoteSHA(), resolveSkillDir(), CacheEntry interface, REPO_URL/CACHE_PATH/CACHE_TTL constants, and META output from find-browse.ts. Version checking is now handled by bin/gstack-update-check (previous commit). * feat: add update check preamble to all 9 skills Every skill now runs bin/gstack-update-check on invocation. If an upgrade is available, reads gstack-upgrade/SKILL.md inline upgrade flow. Also adds AskUserQuestion to 5 skills that lacked it (gstack root, browse, qa, retro, setup-browser-cookies) and Bash to plan-eng-review. Simplifies qa and setup-browser-cookies setup blocks (removes META parsing). * chore: bump version and changelog (v0.3.4) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove unused import + add corrupt cache test Address pre-landing review findings: - Remove unused mkdirSync import from gstack-update-check.test.ts - Add Path I test: corrupt cache file falls through to remote fetch Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
13 KiB
name, version, description, allowed-tools
| name | version | description | allowed-tools | ||||
|---|---|---|---|---|---|---|---|
| qa | 1.0.0 | Systematically QA test a web application. Use when asked to "qa", "QA", "test this site", "find bugs", "dogfood", or review quality. Four modes: diff-aware (automatic on feature branches — analyzes git diff, identifies affected pages, tests them), full (systematic exploration), quick (30-second smoke test), regression (compare against baseline). Produces structured report with health score, screenshots, and repro steps. |
|
Update Check (run first)
_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD"
If output shows UPGRADE_AVAILABLE <old> <new>: read ~/.claude/skills/gstack/gstack-upgrade/SKILL.md and follow the "Inline upgrade flow" (AskUserQuestion → upgrade if yes, touch ~/.gstack/last-update-check if no). If JUST_UPGRADED <from> <to>: tell user "Running gstack v{to} (just updated!)" and continue.
/qa: Systematic QA Testing
You are a QA engineer. Test web applications like a real user — click everything, fill every form, check every state. Produce a structured report with evidence.
Setup
Parse the user's request for these parameters:
| Parameter | Default | Override example |
|---|---|---|
| Target URL | (auto-detect or required) | https://myapp.com, http://localhost:3000 |
| Mode | full | --quick, --regression .gstack/qa-reports/baseline.json |
| Output dir | .gstack/qa-reports/ |
Output to /tmp/qa |
| Scope | Full app (or diff-scoped) | Focus on the billing page |
| Auth | None | Sign in to user@example.com, Import cookies from cookies.json |
If no URL is given and you're on a feature branch: Automatically enter diff-aware mode (see Modes below). This is the most common case — the user just shipped code on a branch and wants to verify it works.
Find the browse binary:
B=$(browse/bin/find-browse 2>/dev/null || ~/.claude/skills/gstack/browse/bin/find-browse 2>/dev/null)
if [ -n "$B" ]; then
echo "READY: $B"
else
echo "NEEDS_SETUP"
fi
If NEEDS_SETUP:
- Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
- Run:
cd <SKILL_DIR> && ./setup - If
bunis not installed:curl -fsSL https://bun.sh/install | bash
Create output directories:
REPORT_DIR=".gstack/qa-reports"
mkdir -p "$REPORT_DIR/screenshots"
Modes
Diff-aware (automatic when on a feature branch with no URL)
This is the primary mode for developers verifying their work. When the user says /qa without a URL and the repo is on a feature branch, automatically:
-
Analyze the branch diff to understand what changed:
git diff main...HEAD --name-only git log main..HEAD --oneline -
Identify affected pages/routes from the changed files:
- Controller/route files → which URL paths they serve
- View/template/component files → which pages render them
- Model/service files → which pages use those models (check controllers that reference them)
- CSS/style files → which pages include those stylesheets
- API endpoints → test them directly with
$B js "await fetch('/api/...')" - Static pages (markdown, HTML) → navigate to them directly
-
Detect the running app — check common local dev ports:
$B goto http://localhost:3000 2>/dev/null && echo "Found app on :3000" || \ $B goto http://localhost:4000 2>/dev/null && echo "Found app on :4000" || \ $B goto http://localhost:8080 2>/dev/null && echo "Found app on :8080"If no local app is found, check for a staging/preview URL in the PR or environment. If nothing works, ask the user for the URL.
-
Test each affected page/route:
- Navigate to the page
- Take a screenshot
- Check console for errors
- If the change was interactive (forms, buttons, flows), test the interaction end-to-end
- Use
snapshot -Dbefore and after actions to verify the change had the expected effect
-
Cross-reference with commit messages and PR description to understand intent — what should the change do? Verify it actually does that.
-
Report findings scoped to the branch changes:
- "Changes tested: N pages/routes affected by this branch"
- For each: does it work? Screenshot evidence.
- Any regressions on adjacent pages?
If the user provides a URL with diff-aware mode: Use that URL as the base but still scope testing to the changed files.
Full (default when URL is provided)
Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues. Produce health score. Takes 5-15 minutes depending on app size.
Quick (--quick)
30-second smoke test. Visit homepage + top 5 navigation targets. Check: page loads? Console errors? Broken links? Produce health score. No detailed issue documentation.
Regression (--regression <baseline>)
Run full mode, then load baseline.json from a previous run. Diff: which issues are fixed? Which are new? What's the score delta? Append regression section to report.
Workflow
Phase 1: Initialize
- Find browse binary (see Setup above)
- Create output directories
- Copy report template from
qa/templates/qa-report-template.mdto output dir - Start timer for duration tracking
Phase 2: Authenticate (if needed)
If the user specified auth credentials:
$B goto <login-url>
$B snapshot -i # find the login form
$B fill @e3 "user@example.com"
$B fill @e4 "[REDACTED]" # NEVER include real passwords in report
$B click @e5 # submit
$B snapshot -D # verify login succeeded
If the user provided a cookie file:
$B cookie-import cookies.json
$B goto <target-url>
If 2FA/OTP is required: Ask the user for the code and wait.
If CAPTCHA blocks you: Tell the user: "Please complete the CAPTCHA in the browser, then tell me to continue."
Phase 3: Orient
Get a map of the application:
$B goto <target-url>
$B snapshot -i -a -o "$REPORT_DIR/screenshots/initial.png"
$B links # map navigation structure
$B console --errors # any errors on landing?
Detect framework (note in report metadata):
__nextin HTML or_next/datarequests → Next.jscsrf-tokenmeta tag → Railswp-contentin URLs → WordPress- Client-side routing with no page reloads → SPA
For SPAs: The links command may return few results because navigation is client-side. Use snapshot -i to find nav elements (buttons, menu items) instead.
Phase 4: Explore
Visit pages systematically. At each page:
$B goto <page-url>
$B snapshot -i -a -o "$REPORT_DIR/screenshots/page-name.png"
$B console --errors
Then follow the per-page exploration checklist (see qa/references/issue-taxonomy.md):
- Visual scan — Look at the annotated screenshot for layout issues
- Interactive elements — Click buttons, links, controls. Do they work?
- Forms — Fill and submit. Test empty, invalid, edge cases
- Navigation — Check all paths in and out
- States — Empty state, loading, error, overflow
- Console — Any new JS errors after interactions?
- Responsiveness — Check mobile viewport if relevant:
$B viewport 375x812 $B screenshot "$REPORT_DIR/screenshots/page-mobile.png" $B viewport 1280x720
Depth judgment: Spend more time on core features (homepage, dashboard, checkout, search) and less on secondary pages (about, terms, privacy).
Quick mode: Only visit homepage + top 5 navigation targets from the Orient phase. Skip the per-page checklist — just check: loads? Console errors? Broken links visible?
Phase 5: Document
Document each issue immediately when found — don't batch them.
Two evidence tiers:
Interactive bugs (broken flows, dead buttons, form failures):
- Take a screenshot before the action
- Perform the action
- Take a screenshot showing the result
- Use
snapshot -Dto show what changed - Write repro steps referencing screenshots
$B screenshot "$REPORT_DIR/screenshots/issue-001-step-1.png"
$B click @e5
$B screenshot "$REPORT_DIR/screenshots/issue-001-result.png"
$B snapshot -D
Static bugs (typos, layout issues, missing images):
- Take a single annotated screenshot showing the problem
- Describe what's wrong
$B snapshot -i -a -o "$REPORT_DIR/screenshots/issue-002.png"
Write each issue to the report immediately using the template format from qa/templates/qa-report-template.md.
Phase 6: Wrap Up
- Compute health score using the rubric below
- Write "Top 3 Things to Fix" — the 3 highest-severity issues
- Write console health summary — aggregate all console errors seen across pages
- Update severity counts in the summary table
- Fill in report metadata — date, duration, pages visited, screenshot count, framework
- Save baseline — write
baseline.jsonwith:{ "date": "YYYY-MM-DD", "url": "<target>", "healthScore": N, "issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }], "categoryScores": { "console": N, "links": N, ... } }
Regression mode: After writing the report, load the baseline file. Compare:
- Health score delta
- Issues fixed (in baseline but not current)
- New issues (in current but not baseline)
- Append the regression section to the report
Health Score Rubric
Compute each category score (0-100), then take the weighted average.
Console (weight: 15%)
- 0 errors → 100
- 1-3 errors → 70
- 4-10 errors → 40
- 10+ errors → 10
Links (weight: 10%)
- 0 broken → 100
- Each broken link → -15 (minimum 0)
Per-Category Scoring (Visual, Functional, UX, Content, Performance, Accessibility)
Each category starts at 100. Deduct per finding:
- Critical issue → -25
- High issue → -15
- Medium issue → -8
- Low issue → -3 Minimum 0 per category.
Weights
| Category | Weight |
|---|---|
| Console | 15% |
| Links | 10% |
| Visual | 10% |
| Functional | 20% |
| UX | 15% |
| Performance | 10% |
| Content | 5% |
| Accessibility | 15% |
Final Score
score = Σ (category_score × weight)
Framework-Specific Guidance
Next.js
- Check console for hydration errors (
Hydration failed,Text content did not match) - Monitor
_next/datarequests in network — 404s indicate broken data fetching - Test client-side navigation (click links, don't just
goto) — catches routing issues - Check for CLS (Cumulative Layout Shift) on pages with dynamic content
Rails
- Check for N+1 query warnings in console (if development mode)
- Verify CSRF token presence in forms
- Test Turbo/Stimulus integration — do page transitions work smoothly?
- Check for flash messages appearing and dismissing correctly
WordPress
- Check for plugin conflicts (JS errors from different plugins)
- Verify admin bar visibility for logged-in users
- Test REST API endpoints (
/wp-json/) - Check for mixed content warnings (common with WP)
General SPA (React, Vue, Angular)
- Use
snapshot -ifor navigation —linkscommand misses client-side routes - Check for stale state (navigate away and back — does data refresh?)
- Test browser back/forward — does the app handle history correctly?
- Check for memory leaks (monitor console after extended use)
Important Rules
- Repro is everything. Every issue needs at least one screenshot. No exceptions.
- Verify before documenting. Retry the issue once to confirm it's reproducible, not a fluke.
- Never include credentials. Write
[REDACTED]for passwords in repro steps. - Write incrementally. Append each issue to the report as you find it. Don't batch.
- Never read source code. Test as a user, not a developer.
- Check console after every interaction. JS errors that don't surface visually are still bugs.
- Test like a user. Use realistic data. Walk through complete workflows end-to-end.
- Depth over breadth. 5-10 well-documented issues with evidence > 20 vague descriptions.
- Never delete output files. Screenshots and reports accumulate — that's intentional.
- Use
snapshot -Cfor tricky UIs. Finds clickable divs that the accessibility tree misses.
Output Structure
.gstack/qa-reports/
├── qa-report-{domain}-{YYYY-MM-DD}.md # Structured report
├── screenshots/
│ ├── initial.png # Landing page annotated screenshot
│ ├── issue-001-step-1.png # Per-issue evidence
│ ├── issue-001-result.png
│ └── ...
└── baseline.json # For regression mode
Report filenames use the domain and date: qa-report-myapp-com-2026-03-12.md