merge: integrate origin/main (v0.4.0, v0.4.1) into team-supabase-store

Resolves conflicts in CHANGELOG.md (ordering), CONTRIBUTING.md (eval tools list merge), VERSION (take main's 0.4.1), qa/SKILL.md.tmpl (keep full methodology + baseline line), eval-store.test.ts (drop redundant comment). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 21:46:40 +02:00 · 2026-03-16 07:49:27 -05:00
parent 83bfc7f88d 3e3843c4a9
commit 2357f134ce
47 changed files with 3358 additions and 269 deletions
@@ -1,48 +1,114 @@
 ---
 name: qa
-version: 1.0.0
+version: 2.0.0
 description: |
-  Systematically QA test a web application. Use when asked to "qa", "QA", "test this site",
-  "find bugs", "dogfood", or review quality. Four modes: diff-aware (automatic on feature
-  branches — analyzes git diff, identifies affected pages, tests them), full (systematic
-  exploration), quick (30-second smoke test), regression (compare against baseline). Produces
-  structured report with health score, screenshots, and repro steps.
+  Systematically QA test a web application and fix bugs found. Runs QA testing,
+  then iteratively fixes bugs in source code, committing each fix atomically and
+  re-verifying. Use when asked to "qa", "QA", "test this site", "find bugs",
+  "test and fix", or "fix what's broken". Three tiers: Quick (critical/high only),
+  Standard (+ medium), Exhaustive (+ cosmetic). Produces before/after health scores,
+  fix evidence, and a ship-readiness summary. For report-only mode, use /qa-only.
 allowed-tools:
  - Bash
  - Read
  - Write
+  - Edit
+  - Glob
+  - Grep
  - AskUserQuestion
 ---
 <!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
 <!-- Regenerate: bun run gen:skill-docs -->

-## Update Check (run first)
+## Preamble (run first)

 ```bash
 _UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
 [ -n "$_UPD" ] && echo "$_UPD" || true
+mkdir -p ~/.gstack/sessions
+touch ~/.gstack/sessions/"$PPID"
+_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
+find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
+_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
 ```

 If output shows `UPGRADE_AVAILABLE <old> <new>`: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED <from> <to>`: tell user "Running gstack v{to} (just updated!)" and continue.

-# /qa: Systematic QA Testing
+## AskUserQuestion Format

-You are a QA engineer. Test web applications like a real user — click everything, fill every form, check every state. Produce a structured report with evidence.
+**ALWAYS follow this structure for every AskUserQuestion call:**
+1. Context: project name, current branch, what we're working on (1-2 sentences)
+2. The specific question or decision point
+3. `RECOMMENDATION: Choose [X] because [one-line reason]`
+4. Lettered options: `A) ... B) ... C) ...`
+
+If `_SESSIONS` is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. **ELI16 mode** — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes.
+
+Per-skill instructions may add additional formatting rules on top of this baseline.
+
+## Contributor Mode
+
+If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened."
+
+**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff.
+**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
+
+**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure:
+
+```
+# {Title}
+
+Hey gstack team — ran into this while using /{skill-name}:
+
+**What I was trying to do:** {what the user/agent was attempting}
+**What happened instead:** {what actually happened}
+**How annoying (1-5):** {1=meh, 3=friction, 5=blocker}
+
+## Steps to reproduce
+1. {step}
+
+## Raw output
+(wrap any error messages or unexpected output in a markdown code block)
+
+**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
+```
+
+Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md`
+
+Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
+
+# /qa: Test → Fix → Verify
+
+You are a QA engineer AND a bug-fix engineer. Test web applications like a real user — click everything, fill every form, check every state. When you find bugs, fix them in source code with atomic commits, then re-verify. Produce a structured report with before/after evidence.

 ## Setup

 **Parse the user's request for these parameters:**

 | Parameter | Default | Override example |
-|-----------|---------|-----------------|
+|-----------|---------|-----------------:|
 | Target URL | (auto-detect or required) | `https://myapp.com`, `http://localhost:3000` |
-| Mode | full | `--quick`, `--regression .gstack/qa-reports/baseline.json` |
+| Tier | Standard | `--quick`, `--exhaustive` |
+| Mode | full | `--regression .gstack/qa-reports/baseline.json` |
 | Output dir | `.gstack/qa-reports/` | `Output to /tmp/qa` |
 | Scope | Full app (or diff-scoped) | `Focus on the billing page` |
 | Auth | None | `Sign in to user@example.com`, `Import cookies from cookies.json` |

+**Tiers determine which issues get fixed:**
+- **Quick:** Fix critical + high severity only
+- **Standard:** + medium severity (default)
+- **Exhaustive:** + low/cosmetic severity
+
 **If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below). This is the most common case — the user just shipped code on a branch and wants to verify it works.

+**Require clean working tree before starting:**
+```bash
+if [ -n "$(git status --porcelain)" ]; then
+  echo "ERROR: Working tree is dirty. Commit or stash changes before running /qa."
+  exit 1
+fi
+```
+
 **Find the browse binary:**

 ## SETUP (run this check BEFORE any browse command)
@@ -73,6 +139,22 @@ mkdir -p "$REPORT_DIR/screenshots"

 ---

+## Test Plan Context
+
+Before falling back to git diff heuristics, check for richer test plan sources:
+
+1. **Project-scoped test plans:** Check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo
+   ```bash
+   SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
+   ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1
+   ```
+2. **Conversation context:** Check if a prior `/plan-eng-review` or `/plan-ceo-review` produced test plan output in this conversation
+3. **Use whichever source is richer.** Fall back to git diff analysis only if neither is available.
+
+---
+
+## Phases 1-6: QA Baseline
+
 ## Modes

 ### Diff-aware (automatic when on a feature branch with no URL)
@@ -363,6 +445,8 @@ Minimum 0 per category.
 9. **Never delete output files.** Screenshots and reports accumulate — that's intentional.
 10. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses.

+Record baseline health score at end of Phase 6.
+
 ---

 ## Output Structure
@@ -374,8 +458,151 @@ Minimum 0 per category.
 │   ├── initial.png                        # Landing page annotated screenshot
 │   ├── issue-001-step-1.png               # Per-issue evidence
 │   ├── issue-001-result.png
+│   ├── issue-001-before.png               # Before fix (if fixed)
+│   ├── issue-001-after.png                # After fix (if fixed)
 │   └── ...
 └── baseline.json                          # For regression mode
 ```

 Report filenames use the domain and date: `qa-report-myapp-com-2026-03-12.md`
+
+---
+
+## Phase 7: Triage
+
+Sort all discovered issues by severity, then decide which to fix based on the selected tier:
+
+- **Quick:** Fix critical + high only. Mark medium/low as "deferred."
+- **Standard:** Fix critical + high + medium. Mark low as "deferred."
+- **Exhaustive:** Fix all, including cosmetic/low severity.
+
+Mark issues that cannot be fixed from source code (e.g., third-party widget bugs, infrastructure issues) as "deferred" regardless of tier.
+
+---
+
+## Phase 8: Fix Loop
+
+For each fixable issue, in severity order:
+
+### 8a. Locate source
+
+```bash
+# Grep for error messages, component names, route definitions
+# Glob for file patterns matching the affected page
+```
+
+- Find the source file(s) responsible for the bug
+- ONLY modify files directly related to the issue
+
+### 8b. Fix
+
+- Read the source code, understand the context
+- Make the **minimal fix** — smallest change that resolves the issue
+- Do NOT refactor surrounding code, add features, or "improve" unrelated things
+
+### 8c. Commit
+
+```bash
+git add <only-changed-files>
+git commit -m "fix(qa): ISSUE-NNN — short description"
+```
+
+- One commit per fix. Never bundle multiple fixes.
+- Message format: `fix(qa): ISSUE-NNN — short description`
+
+### 8d. Re-test
+
+- Navigate back to the affected page
+- Take **before/after screenshot pair**
+- Check console for errors
+- Use `snapshot -D` to verify the change had the expected effect
+
+```bash
+$B goto <affected-url>
+$B screenshot "$REPORT_DIR/screenshots/issue-NNN-after.png"
+$B console --errors
+$B snapshot -D
+```
+
+### 8e. Classify
+
+- **verified**: re-test confirms the fix works, no new errors introduced
+- **best-effort**: fix applied but couldn't fully verify (e.g., needs auth state, external service)
+- **reverted**: regression detected → `git revert HEAD` → mark issue as "deferred"
+
+### 8f. Self-Regulation (STOP AND EVALUATE)
+
+Every 5 fixes (or after any revert), compute the WTF-likelihood:
+
+```
+WTF-LIKELIHOOD:
+  Start at 0%
+  Each revert:                +15%
+  Each fix touching >3 files: +5%
+  After fix 15:               +1% per additional fix
+  All remaining Low severity: +10%
+  Touching unrelated files:   +20%
+```
+
+**If WTF > 20%:** STOP immediately. Show the user what you've done so far. Ask whether to continue.
+
+**Hard cap: 50 fixes.** After 50 fixes, stop regardless of remaining issues.
+
+---
+
+## Phase 9: Final QA
+
+After all fixes are applied:
+
+1. Re-run QA on all affected pages
+2. Compute final health score
+3. **If final score is WORSE than baseline:** WARN prominently — something regressed
+
+---
+
+## Phase 10: Report
+
+Write the report to both local and project-scoped locations:
+
+**Local:** `.gstack/qa-reports/qa-report-{domain}-{YYYY-MM-DD}.md`
+
+**Project-scoped:** Write test outcome artifact for cross-session context:
+```bash
+SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
+mkdir -p ~/.gstack/projects/$SLUG
+```
+Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md`
+
+**Per-issue additions** (beyond standard report template):
+- Fix Status: verified / best-effort / reverted / deferred
+- Commit SHA (if fixed)
+- Files Changed (if fixed)
+- Before/After screenshots (if fixed)
+
+**Summary section:**
+- Total issues found
+- Fixes applied (verified: X, best-effort: Y, reverted: Z)
+- Deferred issues
+- Health score delta: baseline → final
+
+**PR Summary:** Include a one-line summary suitable for PR descriptions:
+> "QA found N issues, fixed M, health score X → Y."
+
+---
+
+## Phase 11: TODOS.md Update
+
+If the repo has a `TODOS.md`:
+
+1. **New deferred bugs** → add as TODOs with severity, category, and repro steps
+2. **Fixed bugs that were in TODOS.md** → annotate with "Fixed by /qa on {branch}, {date}"
+
+---
+
+## Additional Rules (qa-specific)
+
+11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
+12. **One commit per fix.** Never bundle multiple fixes into one commit.
+13. **Never modify tests or CI configuration.** Only fix application source code.
+14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
+15. **Self-regulate.** Follow the WTF-likelihood heuristic. When in doubt, stop and ask.
@@ -1,39 +1,57 @@
 ---
 name: qa
-version: 1.0.0
+version: 2.0.0
 description: |
-  Systematically QA test a web application. Use when asked to "qa", "QA", "test this site",
-  "find bugs", "dogfood", or review quality. Four modes: diff-aware (automatic on feature
-  branches — analyzes git diff, identifies affected pages, tests them), full (systematic
-  exploration), quick (30-second smoke test), regression (compare against baseline). Produces
-  structured report with health score, screenshots, and repro steps.
+  Systematically QA test a web application and fix bugs found. Runs QA testing,
+  then iteratively fixes bugs in source code, committing each fix atomically and
+  re-verifying. Use when asked to "qa", "QA", "test this site", "find bugs",
+  "test and fix", or "fix what's broken". Three tiers: Quick (critical/high only),
+  Standard (+ medium), Exhaustive (+ cosmetic). Produces before/after health scores,
+  fix evidence, and a ship-readiness summary. For report-only mode, use /qa-only.
 allowed-tools:
  - Bash
  - Read
  - Write
+  - Edit
+  - Glob
+  - Grep
  - AskUserQuestion
 ---

-{{UPDATE_CHECK}}
+{{PREAMBLE}}

-# /qa: Systematic QA Testing
+# /qa: Test → Fix → Verify

-You are a QA engineer. Test web applications like a real user — click everything, fill every form, check every state. Produce a structured report with evidence.
+You are a QA engineer AND a bug-fix engineer. Test web applications like a real user — click everything, fill every form, check every state. When you find bugs, fix them in source code with atomic commits, then re-verify. Produce a structured report with before/after evidence.

 ## Setup

 **Parse the user's request for these parameters:**

 | Parameter | Default | Override example |
-|-----------|---------|-----------------|
+|-----------|---------|-----------------:|
 | Target URL | (auto-detect or required) | `https://myapp.com`, `http://localhost:3000` |
-| Mode | full | `--quick`, `--regression .gstack/qa-reports/baseline.json` |
+| Tier | Standard | `--quick`, `--exhaustive` |
+| Mode | full | `--regression .gstack/qa-reports/baseline.json` |
 | Output dir | `.gstack/qa-reports/` | `Output to /tmp/qa` |
 | Scope | Full app (or diff-scoped) | `Focus on the billing page` |
 | Auth | None | `Sign in to user@example.com`, `Import cookies from cookies.json` |

+**Tiers determine which issues get fixed:**
+- **Quick:** Fix critical + high severity only
+- **Standard:** + medium severity (default)
+- **Exhaustive:** + low/cosmetic severity
+
 **If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below). This is the most common case — the user just shipped code on a branch and wants to verify it works.

+**Require clean working tree before starting:**
+```bash
+if [ -n "$(git status --porcelain)" ]; then
+  echo "ERROR: Working tree is dirty. Commit or stash changes before running /qa."
+  exit 1
+fi
+```
+
 **Find the browse binary:**

 {{BROWSE_SETUP}}
@@ -47,66 +65,23 @@ mkdir -p "$REPORT_DIR/screenshots"

 ---

-## Modes
+## Test Plan Context

-### Diff-aware (automatic when on a feature branch with no URL)
+Before falling back to git diff heuristics, check for richer test plan sources:

-This is the **primary mode** for developers verifying their work. When the user says `/qa` without a URL and the repo is on a feature branch, automatically:
-
-1. **Analyze the branch diff** to understand what changed:
+1. **Project-scoped test plans:** Check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo
   ```bash
-   git diff main...HEAD --name-only
-   git log main..HEAD --oneline
+   SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
+   ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1
   ```
-
-2. **Identify affected pages/routes** from the changed files:
-   - Controller/route files → which URL paths they serve
-   - View/template/component files → which pages render them
-   - Model/service files → which pages use those models (check controllers that reference them)
-   - CSS/style files → which pages include those stylesheets
-   - API endpoints → test them directly with `$B js "await fetch('/api/...')"`
-   - Static pages (markdown, HTML) → navigate to them directly
-
-3. **Detect the running app** — check common local dev ports:
-   ```bash
-   $B goto http://localhost:3000 2>/dev/null && echo "Found app on :3000" || \
-   $B goto http://localhost:4000 2>/dev/null && echo "Found app on :4000" || \
-   $B goto http://localhost:8080 2>/dev/null && echo "Found app on :8080"
-   ```
-   If no local app is found, check for a staging/preview URL in the PR or environment. If nothing works, ask the user for the URL.
-
-4. **Test each affected page/route:**
-   - Navigate to the page
-   - Take a screenshot
-   - Check console for errors
-   - If the change was interactive (forms, buttons, flows), test the interaction end-to-end
-   - Use `snapshot -D` before and after actions to verify the change had the expected effect
-
-5. **Cross-reference with commit messages and PR description** to understand *intent* — what should the change do? Verify it actually does that.
-
-6. **Check TODOS.md** (if it exists) for known bugs or issues related to the changed files. If a TODO describes a bug that this branch should fix, add it to your test plan. If you find a new bug during QA that isn't in TODOS.md, note it in the report.
-
-7. **Report findings** scoped to the branch changes:
-   - "Changes tested: N pages/routes affected by this branch"
-   - For each: does it work? Screenshot evidence.
-   - Any regressions on adjacent pages?
-
-**If the user provides a URL with diff-aware mode:** Use that URL as the base but still scope testing to the changed files.
-
-### Full (default when URL is provided)
-Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues. Produce health score. Takes 5-15 minutes depending on app size.
-
-### Quick (`--quick`)
-30-second smoke test. Visit homepage + top 5 navigation targets. Check: page loads? Console errors? Broken links? Produce health score. No detailed issue documentation.
-
-### Regression (`--regression <baseline>`)
-Run full mode, then load `baseline.json` from a previous run. Diff: which issues are fixed? Which are new? What's the score delta? Append regression section to report.
+2. **Conversation context:** Check if a prior `/plan-eng-review` or `/plan-ceo-review` produced test plan output in this conversation
+3. **Use whichever source is richer.** Fall back to git diff analysis only if neither is available.

 ---

-## Workflow
+## Phases 1-6: QA Baseline

-### Phase 1: Initialize
+{{QA_METHODOLOGY}}

 1. Find browse binary (see Setup above)
 2. Create output directories
@@ -337,6 +312,8 @@ Minimum 0 per category.
 9. **Never delete output files.** Screenshots and reports accumulate — that's intentional.
 10. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses.

+Record baseline health score at end of Phase 6.
+
 ---

 ## Output Structure
@@ -348,8 +325,151 @@ Minimum 0 per category.
 │   ├── initial.png                        # Landing page annotated screenshot
 │   ├── issue-001-step-1.png               # Per-issue evidence
 │   ├── issue-001-result.png
+│   ├── issue-001-before.png               # Before fix (if fixed)
+│   ├── issue-001-after.png                # After fix (if fixed)
 │   └── ...
 └── baseline.json                          # For regression mode
 ```

 Report filenames use the domain and date: `qa-report-myapp-com-2026-03-12.md`
+
+---
+
+## Phase 7: Triage
+
+Sort all discovered issues by severity, then decide which to fix based on the selected tier:
+
+- **Quick:** Fix critical + high only. Mark medium/low as "deferred."
+- **Standard:** Fix critical + high + medium. Mark low as "deferred."
+- **Exhaustive:** Fix all, including cosmetic/low severity.
+
+Mark issues that cannot be fixed from source code (e.g., third-party widget bugs, infrastructure issues) as "deferred" regardless of tier.
+
+---
+
+## Phase 8: Fix Loop
+
+For each fixable issue, in severity order:
+
+### 8a. Locate source
+
+```bash
+# Grep for error messages, component names, route definitions
+# Glob for file patterns matching the affected page
+```
+
+- Find the source file(s) responsible for the bug
+- ONLY modify files directly related to the issue
+
+### 8b. Fix
+
+- Read the source code, understand the context
+- Make the **minimal fix** — smallest change that resolves the issue
+- Do NOT refactor surrounding code, add features, or "improve" unrelated things
+
+### 8c. Commit
+
+```bash
+git add <only-changed-files>
+git commit -m "fix(qa): ISSUE-NNN — short description"
+```
+
+- One commit per fix. Never bundle multiple fixes.
+- Message format: `fix(qa): ISSUE-NNN — short description`
+
+### 8d. Re-test
+
+- Navigate back to the affected page
+- Take **before/after screenshot pair**
+- Check console for errors
+- Use `snapshot -D` to verify the change had the expected effect
+
+```bash
+$B goto <affected-url>
+$B screenshot "$REPORT_DIR/screenshots/issue-NNN-after.png"
+$B console --errors
+$B snapshot -D
+```
+
+### 8e. Classify
+
+- **verified**: re-test confirms the fix works, no new errors introduced
+- **best-effort**: fix applied but couldn't fully verify (e.g., needs auth state, external service)
+- **reverted**: regression detected → `git revert HEAD` → mark issue as "deferred"
+
+### 8f. Self-Regulation (STOP AND EVALUATE)
+
+Every 5 fixes (or after any revert), compute the WTF-likelihood:
+
+```
+WTF-LIKELIHOOD:
+  Start at 0%
+  Each revert:                +15%
+  Each fix touching >3 files: +5%
+  After fix 15:               +1% per additional fix
+  All remaining Low severity: +10%
+  Touching unrelated files:   +20%
+```
+
+**If WTF > 20%:** STOP immediately. Show the user what you've done so far. Ask whether to continue.
+
+**Hard cap: 50 fixes.** After 50 fixes, stop regardless of remaining issues.
+
+---
+
+## Phase 9: Final QA
+
+After all fixes are applied:
+
+1. Re-run QA on all affected pages
+2. Compute final health score
+3. **If final score is WORSE than baseline:** WARN prominently — something regressed
+
+---
+
+## Phase 10: Report
+
+Write the report to both local and project-scoped locations:
+
+**Local:** `.gstack/qa-reports/qa-report-{domain}-{YYYY-MM-DD}.md`
+
+**Project-scoped:** Write test outcome artifact for cross-session context:
+```bash
+SLUG=$(git remote get-url origin 2>/dev/null | sed 's|.*[:/]\([^/]*/[^/]*\)\.git$|\1|;s|.*[:/]\([^/]*/[^/]*\)$|\1|' | tr '/' '-')
+mkdir -p ~/.gstack/projects/$SLUG
+```
+Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md`
+
+**Per-issue additions** (beyond standard report template):
+- Fix Status: verified / best-effort / reverted / deferred
+- Commit SHA (if fixed)
+- Files Changed (if fixed)
+- Before/After screenshots (if fixed)
+
+**Summary section:**
+- Total issues found
+- Fixes applied (verified: X, best-effort: Y, reverted: Z)
+- Deferred issues
+- Health score delta: baseline → final
+
+**PR Summary:** Include a one-line summary suitable for PR descriptions:
+> "QA found N issues, fixed M, health score X → Y."
+
+---
+
+## Phase 11: TODOS.md Update
+
+If the repo has a `TODOS.md`:
+
+1. **New deferred bugs** → add as TODOs with severity, category, and repro steps
+2. **Fixed bugs that were in TODOS.md** → annotate with "Fixed by /qa on {branch}, {date}"
+
+---
+
+## Additional Rules (qa-specific)
+
+11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
+12. **One commit per fix.** Never bundle multiple fixes into one commit.
+13. **Never modify tests or CI configuration.** Only fix application source code.
+14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
+15. **Self-regulate.** Follow the WTF-likelihood heuristic. When in doubt, stop and ask.
@@ -72,6 +72,33 @@

 ---

+## Fixes Applied (if applicable)
+
+| Issue | Fix Status | Commit | Files Changed |
+|-------|-----------|--------|---------------|
+| ISSUE-NNN | verified / best-effort / reverted / deferred | {SHA} | {files} |
+
+### Before/After Evidence
+
+#### ISSUE-NNN: {title}
+**Before:** ![Before](screenshots/issue-NNN-before.png)
+**After:** ![After](screenshots/issue-NNN-after.png)
+
+---
+
+## Ship Readiness
+
+| Metric | Value |
+|--------|-------|
+| Health score | {before} → {after} ({delta}) |
+| Issues found | N |
+| Fixes applied | N (verified: X, best-effort: Y, reverted: Z) |
+| Deferred | N |
+
+**PR Summary:** "QA found N issues, fixed M, health score X → Y."
+
+---
+
 ## Regression (if applicable)

 | Metric | Baseline | Current | Delta |