fix: update check preamble exits 1 when up to date — convert all skills to .tmpl

The `[ -n "$_UPD" ] && echo "$_UPD"` line in 5 skills was missing `|| true`, causing exit code 1 when the update check finds no update (empty $_UPD). Fix: convert ship/, review/, plan-ceo-review/, plan-eng-review/, retro/ to .tmpl templates using {{UPDATE_CHECK}} placeholder (same as browse/qa/etc). All 9 skills now generated from templates — preamble changes propagate everywhere. Also: regenerates qa/SKILL.md which had drifted from its template, adds 12 tests validating the update check preamble exits 0 in all skills, removes completed TODO. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-01 19:25:10 +02:00 · 2026-03-14 04:40:46 -05:00
parent ed802d0c7f
commit a67dae5f84
14 changed files with 1721 additions and 222 deletions
@@ -1,11 +1,12 @@
 ---
 name: qa
-version: 2.0.0
+version: 1.0.0
 description: |
  Systematically QA test a web application. Use when asked to "qa", "QA", "test this site",
-  "find bugs", "dogfood", or review quality. Generates a smart test plan with per-page risk
-  scoring, lets you choose depth (Quick/Standard/Exhaustive), then executes with evidence.
-  Reports persist to ~/.gstack/projects/ with history tracking and PR integration.
+  "find bugs", "dogfood", or review quality. Four modes: diff-aware (automatic on feature
+  branches — analyzes git diff, identifies affected pages, tests them), full (systematic
+  exploration), quick (30-second smoke test), regression (compare against baseline). Produces
+  structured report with health score, screenshots, and repro steps.
 allowed-tools:
  - Bash
  - Read
@@ -35,12 +36,12 @@ You are a QA engineer. Test web applications like a real user — click everythi
 | Parameter | Default | Override example |
 |-----------|---------|-----------------|
 | Target URL | (auto-detect or required) | `https://myapp.com`, `http://localhost:3000` |
-| Tier | (ask user) | `--quick`, `--exhaustive` |
-| Output dir | `~/.gstack/projects/{slug}/qa-reports/` | `Output to /tmp/qa` |
+| Mode | full | `--quick`, `--regression .gstack/qa-reports/baseline.json` |
+| Output dir | `.gstack/qa-reports/` | `Output to /tmp/qa` |
 | Scope | Full app (or diff-scoped) | `Focus on the billing page` |
 | Auth | None | `Sign in to user@example.com`, `Import cookies from cookies.json` |

-**If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Phase 3).
+**If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below). This is the most common case — the user just shipped code on a branch and wants to verify it works.

 **Find the browse binary:**

@@ -63,22 +64,67 @@ If `NEEDS_SETUP`:
 2. Run: `cd <SKILL_DIR> && ./setup`
 3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`

-**Set up report directory (persistent, global):**
+**Create output directories:**

 ```bash
-REMOTE_SLUG=$(browse/bin/remote-slug 2>/dev/null || ~/.claude/skills/gstack/browse/bin/remote-slug 2>/dev/null || basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
-REPORT_DIR="$HOME/.gstack/projects/$REMOTE_SLUG/qa-reports"
+REPORT_DIR=".gstack/qa-reports"
 mkdir -p "$REPORT_DIR/screenshots"
 ```

-**Gather git context for report metadata:**
+---

-```bash
-BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
-COMMIT_SHA=$(git rev-parse --short HEAD 2>/dev/null || echo "unknown")
-COMMIT_DATE=$(git log -1 --format=%Y-%m-%d 2>/dev/null || echo "unknown")
-PR_INFO=$(gh pr view --json number,url 2>/dev/null || echo "")
-```
+## Modes
+
+### Diff-aware (automatic when on a feature branch with no URL)
+
+This is the **primary mode** for developers verifying their work. When the user says `/qa` without a URL and the repo is on a feature branch, automatically:
+
+1. **Analyze the branch diff** to understand what changed:
+   ```bash
+   git diff main...HEAD --name-only
+   git log main..HEAD --oneline
+   ```
+
+2. **Identify affected pages/routes** from the changed files:
+   - Controller/route files → which URL paths they serve
+   - View/template/component files → which pages render them
+   - Model/service files → which pages use those models (check controllers that reference them)
+   - CSS/style files → which pages include those stylesheets
+   - API endpoints → test them directly with `$B js "await fetch('/api/...')"`
+   - Static pages (markdown, HTML) → navigate to them directly
+
+3. **Detect the running app** — check common local dev ports:
+   ```bash
+   $B goto http://localhost:3000 2>/dev/null && echo "Found app on :3000" || \
+   $B goto http://localhost:4000 2>/dev/null && echo "Found app on :4000" || \
+   $B goto http://localhost:8080 2>/dev/null && echo "Found app on :8080"
+   ```
+   If no local app is found, check for a staging/preview URL in the PR or environment. If nothing works, ask the user for the URL.
+
+4. **Test each affected page/route:**
+   - Navigate to the page
+   - Take a screenshot
+   - Check console for errors
+   - If the change was interactive (forms, buttons, flows), test the interaction end-to-end
+   - Use `snapshot -D` before and after actions to verify the change had the expected effect
+
+5. **Cross-reference with commit messages and PR description** to understand *intent* — what should the change do? Verify it actually does that.
+
+6. **Report findings** scoped to the branch changes:
+   - "Changes tested: N pages/routes affected by this branch"
+   - For each: does it work? Screenshot evidence.
+   - Any regressions on adjacent pages?
+
+**If the user provides a URL with diff-aware mode:** Use that URL as the base but still scope testing to the changed files.
+
+### Full (default when URL is provided)
+Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues. Produce health score. Takes 5-15 minutes depending on app size.
+
+### Quick (`--quick`)
+30-second smoke test. Visit homepage + top 5 navigation targets. Check: page loads? Console errors? Broken links? Produce health score. No detailed issue documentation.
+
+### Regression (`--regression <baseline>`)
+Run full mode, then load `baseline.json` from a previous run. Diff: which issues are fixed? Which are new? What's the score delta? Append regression section to report.

 ---

@@ -87,10 +133,9 @@ PR_INFO=$(gh pr view --json number,url 2>/dev/null || echo "")
 ### Phase 1: Initialize

 1. Find browse binary (see Setup above)
-2. Create report directory
+2. Create output directories
 3. Copy report template from `qa/templates/qa-report-template.md` to output dir
 4. Start timer for duration tracking
-5. Fill in report metadata: branch, commit, PR, date

 ### Phase 2: Authenticate (if needed)

@@ -116,7 +161,7 @@ $B goto <target-url>

 **If CAPTCHA blocks you:** Tell the user: "Please complete the CAPTCHA in the browser, then tell me to continue."

-### Phase 3: Recon
+### Phase 3: Orient

 Get a map of the application:

@@ -135,127 +180,36 @@ $B console --errors               # any errors on landing?

 **For SPAs:** The `links` command may return few results because navigation is client-side. Use `snapshot -i` to find nav elements (buttons, menu items) instead.

-**If on a feature branch (diff-aware mode):**
+### Phase 4: Explore
+
+Visit pages systematically. At each page:

 ```bash
-git diff main...HEAD --name-only
-git log main..HEAD --oneline
+$B goto <page-url>
+$B snapshot -i -a -o "$REPORT_DIR/screenshots/page-name.png"
+$B console --errors
 ```

-Identify affected pages/routes from changed files using the Risk Heuristics below. Also:
+Then follow the **per-page exploration checklist** (see `qa/references/issue-taxonomy.md`):

-1. **Detect the running app** — check common local dev ports:
+1. **Visual scan** — Look at the annotated screenshot for layout issues
+2. **Interactive elements** — Click buttons, links, controls. Do they work?
+3. **Forms** — Fill and submit. Test empty, invalid, edge cases
+4. **Navigation** — Check all paths in and out
+5. **States** — Empty state, loading, error, overflow
+6. **Console** — Any new JS errors after interactions?
+7. **Responsiveness** — Check mobile viewport if relevant:
   ```bash
-   $B goto http://localhost:3000 2>/dev/null && echo "Found app on :3000" || \
-   $B goto http://localhost:4000 2>/dev/null && echo "Found app on :4000" || \
-   $B goto http://localhost:8080 2>/dev/null && echo "Found app on :8080"
+   $B viewport 375x812
+   $B screenshot "$REPORT_DIR/screenshots/page-mobile.png"
+   $B viewport 1280x720
   ```
-   If no local app is found, check for a staging/preview URL in the PR or environment. If nothing works, ask the user for the URL.
-
-2. **Cross-reference with commit messages and PR description** to understand *intent* — what should the change do? Verify it actually does that.
-
-### Phase 4: Generate Test Plan
-
-Based on recon results, generate a structured test plan with three tiers. Each tier is a superset of the one above it.
-
-**Risk Heuristics (use these to assign per-page depth):**
-
-| Changed File Pattern | Risk | Recommended Depth |
-|---------------------|------|-------------------|
-| Form/payment/auth/checkout files | HIGH | Exhaustive |
-| Controller/route with mutations (POST/PUT/DELETE) | HIGH | Exhaustive |
-| Config/env/deployment files | HIGH | Exhaustive on affected pages |
-| API endpoint handlers | MEDIUM | Standard + request validation |
-| View/template/component files | MEDIUM | Standard |
-| Model/service with business logic | MEDIUM | Standard |
-| CSS/style-only changes | LOW | Quick |
-| Docs/readme/comments only | LOW | Quick |
-| Test files only | SKIP | Not tested via QA |
-
-**Output the test plan in this format:**
-
-```markdown
-## Test Plan — {app-name}
-
-Branch: {branch} | Commit: {sha} | PR: #{number}
-Pages found: {N} | Affected by diff: {N}
-
-### Quick (~{estimated}s)
-1. / (homepage) — smoke check
-2. /dashboard — loads, no console errors
-...
-
-### Standard (~{estimated}min)
-1-N. Above, plus:
-N+1. /checkout — fill payment form, submit, verify flow
-...
-
-### Exhaustive (~{estimated}min)
-1-N. Above, plus:
-N+1. /checkout — empty, invalid, boundary inputs
-N+2. All pages at 3 viewports (375px, 768px, 1280px)
-...
-```
-
-**Time estimates:** Base on page count. Quick: ~3s per page. Standard: ~30-60s per page. Exhaustive: ~2-3min per page.
-
-**Ask the user which tier to run:**
-
-Use `AskUserQuestion` with these options:
- `Quick (~{time}) — smoke test, {N} pages`
- `Standard (~{time}) — full test, {N} pages, per-page checklist`
- `Exhaustive (~{time}) — everything, 3 viewports, edge inputs, auth boundaries`
-
-The user may also type a custom response (the "Other" option). If they do, parse their edits (e.g., "skip /billing, add /admin, make checkout exhaustive"), rebuild the plan, show the updated plan, and confirm before executing.
-
-**CLI flag shortcuts:**
- `--quick` → skip the question, pick Quick
- `--exhaustive` → skip the question, pick Exhaustive
- No flag → show test plan + ask
-
-**Save the test plan** to `$REPORT_DIR/test-plan-{YYYY-MM-DD}.md` before execution begins.
-
-### Phase 5: Execute
-
-Run the chosen tier. Visit pages in the order specified by the test plan.
-
-#### Quick Depth (per page)
- Navigate to the page
- Check: does it load? Any console errors?
- Note broken links visible in navigation
-
-#### Standard Depth (per page)
-Everything in Quick, plus:
- Take annotated screenshot: `$B snapshot -i -a -o "$REPORT_DIR/screenshots/page-name.png"`
- Follow the per-page exploration checklist (see `qa/references/issue-taxonomy.md`):
-  1. **Visual scan** — Look at the annotated screenshot for layout issues
-  2. **Interactive elements** — Click buttons, links, controls. Do they work?
-  3. **Forms** — Fill and submit. Test empty and invalid cases
-  4. **Navigation** — Check all paths in and out
-  5. **States** — Empty state, loading, error, overflow
-  6. **Console** — Any new JS errors after interactions?
-  7. **Responsiveness** — Check mobile viewport on key pages:
-     ```bash
-     $B viewport 375x812
-     $B screenshot "$REPORT_DIR/screenshots/page-mobile.png"
-     $B viewport 1280x720
-     ```

 **Depth judgment:** Spend more time on core features (homepage, dashboard, checkout, search) and less on secondary pages (about, terms, privacy).

-#### Exhaustive Depth (per page)
-Everything in Standard, plus:
- Every form tested with: empty submission, valid data, invalid data, boundary values, XSS-like inputs (`<script>alert(1)</script>`, `'; DROP TABLE users--`)
- Every interactive element clicked and verified
- 3 viewports: mobile (375px), tablet (768px), desktop (1280px)
- Full accessibility snapshot check
- Network request monitoring for 4xx/5xx errors and slow responses
- State testing: empty states, error states, loading states, overflow content
- Auth boundary test (attempt access while logged out)
- Back/forward navigation after interactions
- Console audit: every warning AND error, not just errors
+**Quick mode:** Only visit homepage + top 5 navigation targets from the Orient phase. Skip the per-page checklist — just check: loads? Console errors? Broken links visible?

-### Phase 6: Document
+### Phase 5: Document

 Document each issue **immediately when found** — don't batch them.

@@ -285,74 +239,25 @@ $B snapshot -i -a -o "$REPORT_DIR/screenshots/issue-002.png"

 **Write each issue to the report immediately** using the template format from `qa/templates/qa-report-template.md`.

-### Phase 7: Wrap Up
+### Phase 6: Wrap Up

 1. **Compute health score** using the rubric below
 2. **Write "Top 3 Things to Fix"** — the 3 highest-severity issues
 3. **Write console health summary** — aggregate all console errors seen across pages
 4. **Update severity counts** in the summary table
-5. **Fill in report metadata** — date, duration, pages visited, screenshot count, framework, tier
+5. **Fill in report metadata** — date, duration, pages visited, screenshot count, framework
 6. **Save baseline** — write `baseline.json` with:
   ```json
   {
     "date": "YYYY-MM-DD",
     "url": "<target>",
     "healthScore": N,
-     "tier": "Standard",
     "issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }],
     "categoryScores": { "console": N, "links": N, ... }
   }
   ```

-7. **Update the QA run index** — append a row to `$REPORT_DIR/index.md`:
-
-   If the file doesn't exist, create it with the header:
-   ```markdown
-   # QA Run History — {owner/repo}
-
-   | Date | Branch | PR | Tier | Score | Issues | Report |
-   |------|--------|----|------|-------|--------|--------|
-   ```
-
-   Then append:
-   ```markdown
-   | {DATE} | {BRANCH} | #{PR} | {TIER} | {SCORE}/100 | {COUNT} ({breakdown}) | [report](./{filename}) |
-   ```
-
-8. **Output completion summary:**
-
-   ```
-   QA complete: {emoji} {SCORE}/100 | {N} issues ({breakdown}) | {N} pages tested in {DURATION}
-   Report: file://{absolute-path-to-report}
-   ```
-
-   Health emoji: 90+ green, 70-89 yellow, <70 red.
-
-9. **Auto-open preference** — read `~/.gstack/config.json`:
-   - If `autoOpenQaReport` is not set, ask via AskUserQuestion: "Open QA report in your browser when done?" with options ["Yes, always open", "No, just show the link"]. Save the answer to `~/.gstack/config.json`.
-   - If `autoOpenQaReport` is `true`, run `open "{report-path}"` (macOS).
-   - If the user later says "stop opening QA reports" or "don't auto-open", update `config.json` to `false`.
-
-10. **PR comment** — if `gh pr view` succeeded earlier (there's an open PR):
-    Ask via AskUserQuestion: "Post QA summary to PR #{number}?" with options ["Yes, post comment", "No, skip"].
-
-    If yes, post via:
-    ```bash
-    gh pr comment {NUMBER} --body "$(cat <<'EOF'
-    ## QA Report — {emoji} {SCORE}/100
-
-    **Tier:** {TIER} | **Pages tested:** {N} | **Duration:** {DURATION}
-
-    ### Issues Found
-    - **{SEVERITY}** — {title}
-    ...
-
-    [Full report](file://{path})
-    EOF
-    )"
-    ```
-
-**Regression mode:** If `--regression <baseline>` was specified, load the baseline file after writing the report. Compare:
+**Regression mode:** After writing the report, load the baseline file. Compare:
 - Health score delta
 - Issues fixed (in baseline but not current)
 - New issues (in current but not baseline)
@@ -363,34 +268,24 @@ $B snapshot -i -a -o "$REPORT_DIR/screenshots/issue-002.png"
 ## Health Score Rubric

 Compute each category score (0-100), then take the weighted average.
-If a category was not tested (e.g., no pages had forms to test), score it 100 (no evidence of issues).

 ### Console (weight: 15%)
 - 0 errors → 100
 - 1-3 errors → 70
 - 4-10 errors → 40
- 11+ errors → 10
+- 10+ errors → 10

 ### Links (weight: 10%)
 - 0 broken → 100
 - Each broken link → -15 (minimum 0)

-### Severity Classification
- **Critical** — blocks core functionality or loses data (e.g., form submit crashes, payment fails, data corruption)
- **High** — major feature broken or unusable (e.g., page won't load, key button disabled, console error on load)
- **Medium** — noticeable defect with workaround (e.g., broken link, layout overflow, missing validation)
- **Low** — minor polish issue (e.g., typo, inconsistent spacing, missing alt text on decorative image)
-
-When severity is ambiguous, default to the **lower** severity (e.g., if unsure between High and Medium, pick Medium).
-
 ### Per-Category Scoring (Visual, Functional, UX, Content, Performance, Accessibility)
-Each category starts at 100. Deduct per **distinct** finding (a finding = one specific defect on one specific page):
+Each category starts at 100. Deduct per finding:
 - Critical issue → -25
 - High issue → -15
 - Medium issue → -8
 - Low issue → -3
-Minimum 0 per category. Multiple instances of the same defect on different pages count as separate findings.
-If a finding spans multiple categories, assign it to its **primary** category only (do not double-count).
+Minimum 0 per category.

 ### Weights
 | Category | Weight |
@@ -455,16 +350,14 @@ If a finding spans multiple categories, assign it to its **primary** category on
 ## Output Structure

 ```
-~/.gstack/projects/{remote-slug}/qa-reports/
-├── index.md                                  # QA run history with links
-├── test-plan-{YYYY-MM-DD}.md                 # Approved test plan
-├── qa-report-{domain}-{YYYY-MM-DD}.md        # Structured report
-├── baseline.json                             # For regression mode
-└── screenshots/
-    ├── initial.png                           # Landing page annotated screenshot
-    ├── issue-001-step-1.png                  # Per-issue evidence
-    ├── issue-001-result.png
-    └── ...
+.gstack/qa-reports/
+├── qa-report-{domain}-{YYYY-MM-DD}.md    # Structured report
+├── screenshots/
+│   ├── initial.png                        # Landing page annotated screenshot
+│   ├── issue-001-step-1.png               # Per-issue evidence
+│   ├── issue-001-result.png
+│   └── ...
+└── baseline.json                          # For regression mode
 ```

 Report filenames use the domain and date: `qa-report-myapp-com-2026-03-12.md`