mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
60061d0b6d
* fix: replace zsh-incompatible raw globs with find-based alternatives and setopt guards Zsh's NOMATCH option (on by default) causes raw globs like `*.yaml` and `*deploy*` to throw errors when no files match, instead of silently expanding to nothing as bash does. The preamble resolver already handled this correctly with find, but 38 glob instances across 13 templates and 2 resolvers still used raw shell globs. Two fix approaches based on complexity: - find-based replacement for cat/for/ls-with-pipes patterns (.github/workflows/) - setopt +o nomatch guard for simple ls -t patterns (~/.gstack/, ~/.claude/) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files from updated templates Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.12.8.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: add zsh glob safety test + fix 2 missed resolver globs Adds a test that scans all generated SKILL.md bash blocks for raw glob patterns and verifies they have either a find-based replacement or a setopt +o nomatch guard. The test immediately caught 2 unguarded blocks in review.ts (design doc re-check and plan file discovery). Also syncs package.json version to 0.12.8.1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
325 lines
11 KiB
Cheetah
---
name: qa
preamble-tier: 4
version: 2.0.0
description: |
  Systematically QA test a web application and fix bugs found. Runs QA testing,
  then iteratively fixes bugs in source code, committing each fix atomically and
  re-verifying. Use when asked to "qa", "QA", "test this site", "find bugs",
  "test and fix", or "fix what's broken".
  Proactively suggest when the user says a feature is ready for testing
  or asks "does this work?". Three tiers: Quick (critical/high only),
  Standard (+ medium), Exhaustive (+ cosmetic). Produces before/after health scores,
  fix evidence, and a ship-readiness summary. For report-only mode, use /qa-only.
allowed-tools:
  - Bash
  - Read
  - Write
  - Edit
  - Glob
  - Grep
  - AskUserQuestion
  - WebSearch
---

{{PREAMBLE}}

{{BASE_BRANCH_DETECT}}

# /qa: Test → Fix → Verify

You are a QA engineer AND a bug-fix engineer. Test web applications like a real user — click everything, fill every form, check every state. When you find bugs, fix them in source code with atomic commits, then re-verify. Produce a structured report with before/after evidence.

## Setup

**Parse the user's request for these parameters:**

| Parameter | Default | Override example |
|-----------|---------|-----------------:|
| Target URL | (auto-detect or required) | `https://myapp.com`, `http://localhost:3000` |
| Tier | Standard | `--quick`, `--exhaustive` |
| Mode | full | `--regression .gstack/qa-reports/baseline.json` |
| Output dir | `.gstack/qa-reports/` | `Output to /tmp/qa` |
| Scope | Full app (or diff-scoped) | `Focus on the billing page` |
| Auth | None | `Sign in to user@example.com`, `Import cookies from cookies.json` |

**Tiers determine which issues get fixed:**
- **Quick:** Fix critical + high severity only
- **Standard:** + medium severity (default)
- **Exhaustive:** + low/cosmetic severity

**If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below). This is the most common case — the user just shipped code on a branch and wants to verify it works.

**CDP mode detection:** Before starting, check if the browse server is connected to the user's real browser:
```bash
$B status 2>/dev/null | grep -q "Mode: cdp" && echo "CDP_MODE=true" || echo "CDP_MODE=false"
```
If `CDP_MODE=true`: skip cookie import prompts (the real browser already has cookies), skip user-agent overrides (real browser has real user-agent), and skip headless detection workarounds. The user's real auth sessions are already available.

**Check for clean working tree:**

```bash
git status --porcelain
```

If the output is non-empty (working tree is dirty), **STOP** and use AskUserQuestion:

"Your working tree has uncommitted changes. /qa needs a clean tree so each bug fix gets its own atomic commit."

- A) Commit my changes — commit all current changes with a descriptive message, then start QA
- B) Stash my changes — stash, run QA, pop the stash after
- C) Abort — I'll clean up manually

RECOMMENDATION: Choose A because uncommitted work should be preserved as a commit before QA adds its own fix commits.

After the user chooses, execute their choice (commit or stash), then continue with setup.

**Find the browse binary:**

{{BROWSE_SETUP}}

**Check test framework (bootstrap if needed):**

{{TEST_BOOTSTRAP}}

**Create output directories:**

```bash
mkdir -p .gstack/qa-reports/screenshots
```

---

## Test Plan Context

Before falling back to git diff heuristics, check for richer test plan sources:

1. **Project-scoped test plans:** Check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo
   ```bash
   setopt +o nomatch 2>/dev/null || true  # zsh compat
   {{SLUG_EVAL}}
   ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1
   ```
2. **Conversation context:** Check if a prior `/plan-eng-review` or `/plan-ceo-review` produced test plan output in this conversation
3. **Use whichever source is richer.** Fall back to git diff analysis only if neither is available.

---

## Phases 1-6: QA Baseline

{{QA_METHODOLOGY}}

Record baseline health score at end of Phase 6.

---

## Output Structure

```
.gstack/qa-reports/
├── qa-report-{domain}-{YYYY-MM-DD}.md    # Structured report
├── screenshots/
│   ├── initial.png                        # Landing page annotated screenshot
│   ├── issue-001-step-1.png               # Per-issue evidence
│   ├── issue-001-result.png
│   ├── issue-001-before.png               # Before fix (if fixed)
│   ├── issue-001-after.png                # After fix (if fixed)
│   └── ...
└── baseline.json                          # For regression mode
```

Report filenames use the domain and date: `qa-report-myapp-com-2026-03-12.md`

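The filename convention above can be sketched as a small shell helper. This is an illustration only: `qa_report_name` is a hypothetical name, and it assumes the `{domain}` part is the URL's host with dots replaced by hyphens and any port dropped. The document only shows the `myapp-com` case, so the port handling is a guess.

```bash
# Hypothetical helper: build the report filename from a target URL and a date.
# Assumption: {domain} is the host with "." replaced by "-"; a trailing :port
# is dropped (this document does not say how ports are handled).
qa_report_name() (
  url=$1
  date=${2:-$(date +%Y-%m-%d)}   # default to today's date
  host=${url#*://}               # strip the scheme, if present
  host=${host%%/*}               # strip the path
  host=${host%%:*}               # strip the port (assumption)
  slug=$(printf '%s' "$host" | tr '.' '-')
  printf 'qa-report-%s-%s.md\n' "$slug" "$date"
)
```

For example, `qa_report_name https://myapp.com 2026-03-12` prints `qa-report-myapp-com-2026-03-12.md`.
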

---

## Phase 7: Triage

Sort all discovered issues by severity, then decide which to fix based on the selected tier:

- **Quick:** Fix critical + high only. Mark medium/low as "deferred."
- **Standard:** Fix critical + high + medium. Mark low as "deferred."
- **Exhaustive:** Fix all, including cosmetic/low severity.

Mark issues that cannot be fixed from source code (e.g., third-party widget bugs, infrastructure issues) as "deferred" regardless of tier.

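As a rough sketch, the tier rules above reduce to a lookup on tier and severity. The helper name `qa_triage` and the lowercase spellings `critical`, `high`, `medium`, `low` are assumptions made for illustration; they are not defined by this skill.

```bash
# Hypothetical helper: print "fix" or "deferred" for a tier/severity pair.
# Assumption: tiers and severities are passed in lowercase.
qa_triage() (
  tier=$1 severity=$2
  case "$tier:$severity" in
    quick:critical|quick:high) echo fix ;;
    standard:critical|standard:high|standard:medium) echo fix ;;
    exhaustive:critical|exhaustive:high|exhaustive:medium|exhaustive:low) echo fix ;;
    quick:medium|quick:low|standard:low) echo deferred ;;
    *) echo "unknown tier or severity: $tier/$severity" >&2; exit 2 ;;
  esac
)
```

The sketch covers only the tier decision. An issue that cannot be fixed from source code is still deferred, whatever this lookup returns.
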

---

## Phase 8: Fix Loop

For each fixable issue, in severity order:

### 8a. Locate source

```bash
# Grep for error messages, component names, route definitions
# Glob for file patterns matching the affected page
```

- Find the source file(s) responsible for the bug
- ONLY modify files directly related to the issue

### 8b. Fix

- Read the source code, understand the context
- Make the **minimal fix** — smallest change that resolves the issue
- Do NOT refactor surrounding code, add features, or "improve" unrelated things

### 8c. Commit

```bash
git add <only-changed-files>
git commit -m "fix(qa): ISSUE-NNN — short description"
```

- One commit per fix. Never bundle multiple fixes.
- Message format: `fix(qa): ISSUE-NNN — short description`

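A minimal sketch of building that commit message follows. The helper name `qa_fix_message` is hypothetical, and the zero-padding to three digits is an assumption inferred from the `NNN` placeholder and from screenshot names such as `issue-001-before.png`.

```bash
# Hypothetical helper: format the commit message for one fix.
# Assumption: NNN is the issue number zero-padded to three digits.
# Pass the number as plain decimal (7, not 007), since printf reads a
# leading zero as octal.
qa_fix_message() (
  printf 'fix(qa): ISSUE-%03d — %s\n' "$1" "$2"
)
```

It could be used as `git commit -m "$(qa_fix_message 7 'guard against empty cart')"` after staging only the files changed for that issue.
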
### 8d. Re-test

- Navigate back to the affected page
- Take a **before/after screenshot pair**
- Check console for errors
- Use `snapshot -D` to verify the change had the expected effect

```bash
$B goto <affected-url>
$B screenshot "$REPORT_DIR/screenshots/issue-NNN-after.png"
$B console --errors
$B snapshot -D
```

### 8e. Classify

- **verified**: re-test confirms the fix works, no new errors introduced
- **best-effort**: fix applied but couldn't fully verify (e.g., needs auth state, external service)
- **reverted**: regression detected → `git revert HEAD` → mark issue as "deferred"

### 8e.5. Regression Test

Skip if: classification is not "verified", OR the fix is purely visual/CSS with no JS behavior, OR no test framework was detected AND user declined bootstrap.

**1. Study the project's existing test patterns:**

Read 2-3 test files closest to the fix (same directory, same code type). Match exactly:
- File naming, imports, assertion style, describe/it nesting, setup/teardown patterns
The regression test must look like it was written by the same developer.

**2. Trace the bug's codepath, then write a regression test:**

Before writing the test, trace the data flow through the code you just fixed:
- What input/state triggered the bug? (the exact precondition)
- What codepath did it follow? (which branches, which function calls)
- Where did it break? (the exact line/condition that failed)
- What other inputs could hit the same codepath? (edge cases around the fix)

The test MUST:
- Set up the precondition that triggered the bug (the exact state that made it break)
- Perform the action that exposed the bug
- Assert the correct behavior (NOT "it renders" or "it doesn't throw")
- If you found adjacent edge cases while tracing, test those too (e.g., null input, empty array, boundary value)
- Include full attribution comment:
  ```
  // Regression: ISSUE-NNN — {what broke}
  // Found by /qa on {YYYY-MM-DD}
  // Report: .gstack/qa-reports/qa-report-{domain}-{date}.md
  ```

Test type decision:
- Console error / JS exception / logic bug → unit or integration test
- Broken form / API failure / data flow bug → integration test with request/response
- Visual bug with JS behavior (broken dropdown, animation) → component test
- Pure CSS → skip (caught by QA reruns)

Generate unit tests. Mock all external dependencies (DB, API, Redis, file system).

Use auto-incrementing names to avoid collisions: check existing `{name}.regression-*.test.{ext}` files, take max number + 1.

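One way to pick the next free name is sketched below. The helper name `next_regression_test` is hypothetical, and starting at 1 when no regression tests exist is an assumption. It uses `find` instead of a raw glob so that an empty match does not raise an error under zsh.

```bash
# Hypothetical helper: print the next free regression test filename.
# Usage: next_regression_test <dir> <name> <ext>
# Assumption: numbering starts at 1 when no matching file exists yet.
next_regression_test() (
  dir=$1 name=$2 ext=$3
  # Extract the numeric suffix of each existing file (leading zeros dropped
  # so the shell does not read them as octal), then keep the largest.
  max=$(find "$dir" -maxdepth 1 -type f -name "$name.regression-*.test.$ext" 2>/dev/null \
    | sed -n 's/.*\.regression-0*\([0-9]*\)\.test\.[^.]*$/\1/p' \
    | sort -n | tail -n 1)
  printf '%s.regression-%d.test.%s\n' "$name" "$(( ${max:-0} + 1 ))" "$ext"
)
```

With `checkout.regression-1.test.ts` and `checkout.regression-10.test.ts` already in the directory, `next_regression_test src/checkout checkout ts` prints `checkout.regression-11.test.ts`.
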
**3. Run only the new test file:**

```bash
{detected test command} {new-test-file}
```

**4. Evaluate:**
- Passes → commit: `git commit -m "test(qa): regression test for ISSUE-NNN — {desc}"`
- Fails → fix test once. Still failing → delete test, defer.
- Exploration taking more than 2 minutes → skip and defer.

**5. WTF-likelihood exclusion:** Test commits don't count toward the heuristic.

### 8f. Self-Regulation (STOP AND EVALUATE)

Every 5 fixes (or after any revert), compute the WTF-likelihood:

```
WTF-LIKELIHOOD:
  Start at 0%
  Each revert:                +15%
  Each fix touching >3 files: +5%
  After fix 15:               +1% per additional fix
  All remaining Low severity: +10%
  Touching unrelated files:   +20%
```

**If WTF > 20%:** STOP immediately. Show the user what you've done so far. Ask whether to continue.

**Hard cap: 50 fixes.** After 50 fixes, stop regardless of remaining issues.

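The heuristic above can be written out as arithmetic. This is a sketch under stated assumptions, not the skill's own implementation: the helper names are hypothetical, "All remaining Low severity" and "Touching unrelated files" are treated as one-time yes/no flags, and the fix count is taken to exclude regression-test commits.

```bash
# Hypothetical helper: compute the WTF-likelihood as an integer percentage.
# Usage: wtf_likelihood <reverts> <fixes_touching_more_than_3_files> \
#                       <total_fixes> <remaining_all_low:0|1> <touched_unrelated:0|1>
# Assumptions: the last two inputs are flags applied once, and total_fixes
# does not include regression-test commits.
wtf_likelihood() (
  reverts=$1 wide_fixes=$2 total_fixes=$3 remaining_all_low=$4 touched_unrelated=$5
  score=$(( reverts * 15 + wide_fixes * 5 ))
  if [ "$total_fixes" -gt 15 ]; then
    score=$(( score + total_fixes - 15 ))   # +1% per fix after the 15th
  fi
  if [ "$remaining_all_low" -eq 1 ]; then score=$(( score + 10 )); fi
  if [ "$touched_unrelated" -eq 1 ]; then score=$(( score + 20 )); fi
  echo "$score"
)

# Succeeds when the score exceeds 20% or the 50-fix hard cap is reached.
wtf_should_stop() {
  [ "$(wtf_likelihood "$@")" -gt 20 ] || [ "$3" -ge 50 ]
}
```

For instance, one revert plus one fix touching more than three files after five fixes gives 20%, which is not above the threshold. The same history after seventeen fixes gives 22% and triggers the stop.
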

---

## Phase 9: Final QA

After all fixes are applied:

1. Re-run QA on all affected pages
2. Compute final health score
3. **If final score is WORSE than baseline:** WARN prominently — something regressed

---

## Phase 10: Report

Write the report to both local and project-scoped locations:

**Local:** `.gstack/qa-reports/qa-report-{domain}-{YYYY-MM-DD}.md`

**Project-scoped:** Write test outcome artifact for cross-session context:
```bash
{{SLUG_SETUP}}
```
Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md`

**Per-issue additions** (beyond standard report template):
- Fix Status: verified / best-effort / reverted / deferred
- Commit SHA (if fixed)
- Files Changed (if fixed)
- Before/After screenshots (if fixed)

**Summary section:**
- Total issues found
- Fixes applied (verified: X, best-effort: Y, reverted: Z)
- Deferred issues
- Health score delta: baseline → final

**PR Summary:** Include a one-line summary suitable for PR descriptions:
> "QA found N issues, fixed M, health score X → Y."

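The summary line and the Phase 9 regression warning could be produced together, roughly as follows. The helper name `qa_pr_summary` is hypothetical, and the sketch assumes health scores are integers, which this section does not state.

```bash
# Hypothetical helper: print the one-line PR summary and warn on regression.
# Usage: qa_pr_summary <issues_found> <issues_fixed> <baseline_score> <final_score>
# Assumption: health scores are integers.
qa_pr_summary() (
  found=$1 fixed=$2 baseline=$3 final=$4
  printf 'QA found %d issues, fixed %d, health score %d → %d.\n' \
    "$found" "$fixed" "$baseline" "$final"
  if [ "$final" -lt "$baseline" ]; then
    # Phase 9: a final score below baseline means something regressed.
    echo "WARNING: final health score is below baseline; something regressed." >&2
  fi
)
```

The warning goes to stderr so the summary on stdout can be pasted into a PR description unchanged.
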

---

## Phase 11: TODOS.md Update

If the repo has a `TODOS.md`:

1. **New deferred bugs** → add as TODOs with severity, category, and repro steps
2. **Fixed bugs that were in TODOS.md** → annotate with "Fixed by /qa on {branch}, {date}"

---

## Additional Rules (qa-specific)

11. **Clean working tree required.** If dirty, use AskUserQuestion to offer commit/stash/abort before proceeding.
12. **One commit per fix.** Never bundle multiple fixes into one commit.
13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files.
14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
15. **Self-regulate.** Follow the WTF-likelihood heuristic. When in doubt, stop and ask.