Mirror of https://github.com/garrytan/gstack.git, synced 2026-05-13 16:04:58 +02:00
1a4f0c9c15
* fix(learnings): use token-OR matching in gstack-learnings-search --query
Split the query on whitespace into tokens; a learning matches if ANY
token appears as a substring in ANY of key/insight/files. Previously
the whole query was a single substring, so multi-word queries like
"debug investigation" only matched learnings whose insight contained
that exact contiguous phrase, which is usually nothing.
Whitespace-only query falls through to no-query (matches today's no-flag
behavior). Single-word queries behave exactly as before.
Adds test/gstack-learnings-search.test.ts: 3 assertions covering
multi-token, single-token, and no-query backwards compat.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(resolver): parameterized LEARNINGS_SEARCH with shell-injection guard
The {{LEARNINGS_SEARCH}} macro now accepts a query=KEYWORD argument that
gets interpolated as --query "<keyword>" into the generated bash. Empty
value falls through to no-query (principle of least surprise: a stray
{{LEARNINGS_SEARCH:query=}} placeholder gets today's behavior, not a
build failure). Pattern reuses the parameterized-macro parsing from
composition.ts. The 13 templates that don't pass a query stay
byte-identical in their generated SKILL.md output.
Shell-injection guard: the query value is whitelisted to
^[A-Za-z0-9 _-]+$ at gen-skill-docs time. Any $(), backticks,
semicolons, or quotes throw a loud build error instead of emitting
executable bash. Static template queries are safe by inspection;
this defends against future contributors writing dangerous values.
Adds 5 assertions to test/gen-skill-docs.test.ts covering no-args,
claude+query=foo bar on both cross-project and project-scoped branches,
codex host variant, empty value semantics, and shell-injection payloads
($(whoami), backticks, ;, &, ", \, $x) throwing build errors.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(skills): task-shaped queries + mid-flow refresh in /investigate /qa /ship
The three long skills now pull learnings keyed to their theme at the
top, then re-pull at phase boundaries as work shifts to new sub-tasks.
Top-of-skill queries (5-6 token unions, token-OR matched):
- investigate: "debug investigation root cause hypothesis bug fix"
- qa: "qa testing bug regression flake fixture"
- ship: "release ship version changelog merge pr"
Mid-flow refresh blocks (concrete keyword recipe + worked examples):
- investigate: between Phase 1 (hypothesis) and Phase 2 (analysis),
keyed to the hypothesis noun. Examples: auth-cookie, session-expiry.
- qa: between Phase 7 (triage) and Phase 8 (fix loop), keyed to the
buggy component name. Examples: checkout-button, signup-form.
- ship: just before Step 12 (VERSION bump), keyed to the headline
feature. Examples: learnings-search, pacing, worktree-ship.
Keyword recipe enforces alphanumeric+hyphen only (no quotes, slashes,
dots, colons) so dynamic queries cannot inject shell metacharacters.
The other 13 short-lived skills keep the bare {{LEARNINGS_SEARCH}} form.
Backwards-compat verified via diff: their generated SKILL.md output is
byte-identical to before this change.
Golden ship fixtures regenerated to match the new ship/SKILL.md output.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore: bump version and changelog (v1.33.1.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* test: refresh codex+factory ship golden fixtures
Follow-up to 513c9660 — the codex and factory host outputs needed
regeneration too, missed in the initial commit because gen:skill-docs
was only run for the claude host. Now matches gen:skill-docs --host all.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
355 lines
12 KiB
Cheetah
---
name: qa
preamble-tier: 4
version: 2.0.0
description: |
  Systematically QA test a web application and fix bugs found. Runs QA testing,
  then iteratively fixes bugs in source code, committing each fix atomically and
  re-verifying. Use when asked to "qa", "QA", "test this site", "find bugs",
  "test and fix", or "fix what's broken".
  Proactively suggest when the user says a feature is ready for testing
  or asks "does this work?". Three tiers: Quick (critical/high only),
  Standard (+ medium), Exhaustive (+ cosmetic). Produces before/after health scores,
  fix evidence, and a ship-readiness summary. For report-only mode, use /qa-only. (gstack)
voice-triggers:
  - "quality check"
  - "test the app"
  - "run QA"
allowed-tools:
  - Bash
  - Read
  - Write
  - Edit
  - Glob
  - Grep
  - AskUserQuestion
  - WebSearch
triggers:
  - qa test this
  - find bugs on site
  - test the site
---

{{PREAMBLE}}

{{BASE_BRANCH_DETECT}}

{{GBRAIN_CONTEXT_LOAD}}

# /qa: Test → Fix → Verify

You are a QA engineer AND a bug-fix engineer. Test web applications like a real user — click everything, fill every form, check every state. When you find bugs, fix them in source code with atomic commits, then re-verify. Produce a structured report with before/after evidence.

## Setup

**Parse the user's request for these parameters:**

| Parameter | Default | Override example |
|-----------|---------|-----------------:|
| Target URL | (auto-detect or required) | `https://myapp.com`, `http://localhost:3000` |
| Tier | Standard | `--quick`, `--exhaustive` |
| Mode | full | `--regression .gstack/qa-reports/baseline.json` |
| Output dir | `.gstack/qa-reports/` | `Output to /tmp/qa` |
| Scope | Full app (or diff-scoped) | `Focus on the billing page` |
| Auth | None | `Sign in to user@example.com`, `Import cookies from cookies.json` |

**Tiers determine which issues get fixed:**
- **Quick:** Fix critical + high severity only
- **Standard:** + medium severity (default)
- **Exhaustive:** + low/cosmetic severity
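
As a rough illustration of the tier rules above, the sketch below maps a tier name to the severities that get fixed. The function name `severities_for_tier` and the lowercase tier and severity labels are assumptions made for this example; they are not part of gstack.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the severities that get fixed for a given tier.
severities_for_tier() {
  case "$1" in
    quick)      echo "critical high" ;;
    standard)   echo "critical high medium" ;;
    exhaustive) echo "critical high medium low" ;;
    *)          echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

severities_for_tier standard   # prints: critical high medium
```

Anything outside the printed set would be marked "deferred" for that run.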

**If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below). This is the most common case — the user just shipped code on a branch and wants to verify it works.

**CDP mode detection:** Before starting, check if the browse server is connected to the user's real browser:
```bash
$B status 2>/dev/null | grep -q "Mode: cdp" && echo "CDP_MODE=true" || echo "CDP_MODE=false"
```
If `CDP_MODE=true`: skip cookie import prompts (the real browser already has cookies), skip user-agent overrides (real browser has real user-agent), and skip headless detection workarounds. The user's real auth sessions are already available.

**Check for clean working tree:**

```bash
git status --porcelain
```

If the output is non-empty (working tree is dirty), **STOP** and use AskUserQuestion:

"Your working tree has uncommitted changes. /qa needs a clean tree so each bug fix gets its own atomic commit."

- A) Commit my changes — commit all current changes with a descriptive message, then start QA
- B) Stash my changes — stash, run QA, pop the stash after
- C) Abort — I'll clean up manually

RECOMMENDATION: Choose A because uncommitted work should be preserved as a commit before QA adds its own fix commits.

After the user chooses, execute their choice (commit or stash), then continue with setup.

**Find the browse binary:**

{{BROWSE_SETUP}}

**Check test framework (bootstrap if needed):**

{{TEST_BOOTSTRAP}}

**Create output directories:**

```bash
mkdir -p .gstack/qa-reports/screenshots
```

---

{{LEARNINGS_SEARCH:query=qa testing bug regression flake fixture}}

## Test Plan Context

Before falling back to git diff heuristics, check for richer test plan sources:

1. **Project-scoped test plans:** Check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo
   ```bash
   setopt +o nomatch 2>/dev/null || true  # zsh compat
   {{SLUG_EVAL}}
   ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1
   ```
2. **Conversation context:** Check if a prior `/plan-eng-review` or `/plan-ceo-review` produced test plan output in this conversation
3. **Use whichever source is richer.** Fall back to git diff analysis only if neither is available.

---

## Phases 1-6: QA Baseline

{{QA_METHODOLOGY}}

Record baseline health score at end of Phase 6.

---

## Output Structure

```
.gstack/qa-reports/
├── qa-report-{domain}-{YYYY-MM-DD}.md    # Structured report
├── screenshots/
│   ├── initial.png                        # Landing page annotated screenshot
│   ├── issue-001-step-1.png               # Per-issue evidence
│   ├── issue-001-result.png
│   ├── issue-001-before.png               # Before fix (if fixed)
│   ├── issue-001-after.png                # After fix (if fixed)
│   └── ...
└── baseline.json                          # For regression mode
```

Report filenames use the domain and date: `qa-report-myapp-com-2026-03-12.md`
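
One way the filename could be derived from a target URL is sketched below. The helper name `report_filename` is hypothetical, and the slug rule (host only, dots replaced by hyphens, port dropped) is an assumption inferred from the single example above rather than a documented gstack rule.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a report filename from a target URL and a date.
# Assumption: the domain slug is the host with dots replaced by hyphens.
report_filename() {
  local url="$1" date="$2" host
  host="${url#*://}"   # drop the scheme, if any
  host="${host%%/*}"   # drop the path
  host="${host%%:*}"   # drop the port (assumed not to be part of the slug)
  printf 'qa-report-%s-%s.md\n' "${host//./-}" "$date"
}

report_filename "https://myapp.com/billing" "2026-03-12"
# prints: qa-report-myapp-com-2026-03-12.md
```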

---

## Phase 7: Triage

Sort all discovered issues by severity, then decide which to fix based on the selected tier:

- **Quick:** Fix critical + high only. Mark medium/low as "deferred."
- **Standard:** Fix critical + high + medium. Mark low as "deferred."
- **Exhaustive:** Fix all, including cosmetic/low severity.

Mark issues that cannot be fixed from source code (e.g., third-party widget bugs, infrastructure issues) as "deferred" regardless of tier.

### Refresh learnings for the component/page where the bug lives

The top-of-skill learnings pull was keyed to "qa testing" broadly. Before the fix loop, re-pull learnings keyed to the component or page where the bug you're about to fix lives so prior fixes for the same component-shape surface.

Pick ONE keyword that names the buggy component or page. The keyword should be a noun: the failing component name, the page route base, or the feature noun. The keyword MUST be alphanumeric or hyphen only — no quotes, slashes, dots, colons, or whitespace. If your candidate has any of those, simplify to just the alphanumeric stem.

Worked examples (qa-specific): good keywords are `checkout-button`, `signup-form`, `payment`. Bad: `tests are failing`, `<failing-test>`, `app/views/_checkout.html.erb`.

```bash
~/.claude/skills/gstack/bin/gstack-learnings-search --query "<your-keyword>" --limit 5 2>/dev/null || true
```

If any learnings come back, name which one applies to the fix you're about to make in one sentence. If none come back, continue without reference — the absence is itself useful information.
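
The keyword rule above can be checked mechanically before the search runs. The following is a minimal sketch, assuming a hypothetical helper named `is_valid_keyword`; it only encodes the "alphanumeric or hyphen only" rule stated in this section and is not taken from the gstack source.

```bash
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the keyword is alphanumeric or hyphen.
is_valid_keyword() {
  [[ "$1" =~ ^[A-Za-z0-9-]+$ ]]
}

for kw in "checkout-button" "tests are failing" "app/views/_checkout.html.erb"; do
  if is_valid_keyword "$kw"; then
    echo "ok:  $kw"
  else
    echo "bad: $kw"
  fi
done
```

Only `checkout-button` passes; the other two candidates contain whitespace, slashes, dots, or underscores and would need to be simplified to a stem such as `checkout`.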

---

## Phase 8: Fix Loop

For each fixable issue, in severity order:

### 8a. Locate source

```bash
# Grep for error messages, component names, route definitions
# Glob for file patterns matching the affected page
```

- Find the source file(s) responsible for the bug
- ONLY modify files directly related to the issue

### 8b. Fix

- Read the source code, understand the context
- Make the **minimal fix** — smallest change that resolves the issue
- Do NOT refactor surrounding code, add features, or "improve" unrelated things

### 8c. Commit

```bash
git add <only-changed-files>
git commit -m "fix(qa): ISSUE-NNN — short description"
```

- One commit per fix. Never bundle multiple fixes.
- Message format: `fix(qa): ISSUE-NNN — short description`

### 8d. Re-test

- Navigate back to the affected page
- Take **before/after screenshot pair**
- Check console for errors
- Use `snapshot -D` to verify the change had the expected effect

```bash
$B goto <affected-url>
$B screenshot "$REPORT_DIR/screenshots/issue-NNN-after.png"
$B console --errors
$B snapshot -D
```

### 8e. Classify

- **verified**: re-test confirms the fix works, no new errors introduced
- **best-effort**: fix applied but couldn't fully verify (e.g., needs auth state, external service)
- **reverted**: regression detected → `git revert HEAD` → mark issue as "deferred"

### 8e.5. Regression Test

Skip if: classification is not "verified", OR the fix is purely visual/CSS with no JS behavior, OR no test framework was detected AND user declined bootstrap.

**1. Study the project's existing test patterns:**

Read 2-3 test files closest to the fix (same directory, same code type). Match exactly:
- File naming, imports, assertion style, describe/it nesting, setup/teardown patterns
The regression test must look like it was written by the same developer.

**2. Trace the bug's codepath, then write a regression test:**

Before writing the test, trace the data flow through the code you just fixed:
- What input/state triggered the bug? (the exact precondition)
- What codepath did it follow? (which branches, which function calls)
- Where did it break? (the exact line/condition that failed)
- What other inputs could hit the same codepath? (edge cases around the fix)

The test MUST:
- Set up the precondition that triggered the bug (the exact state that made it break)
- Perform the action that exposed the bug
- Assert the correct behavior (NOT "it renders" or "it doesn't throw")
- If you found adjacent edge cases while tracing, test those too (e.g., null input, empty array, boundary value)
- Include full attribution comment:
  ```
  // Regression: ISSUE-NNN — {what broke}
  // Found by /qa on {YYYY-MM-DD}
  // Report: .gstack/qa-reports/qa-report-{domain}-{date}.md
  ```

Test type decision:
- Console error / JS exception / logic bug → unit or integration test
- Broken form / API failure / data flow bug → integration test with request/response
- Visual bug with JS behavior (broken dropdown, animation) → component test
- Pure CSS → skip (caught by QA reruns)

Generate unit tests. Mock all external dependencies (DB, API, Redis, file system).

Use auto-incrementing names to avoid collisions: check existing `{name}.regression-*.test.{ext}` files, take max number + 1.
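
The "max number + 1" rule can be sketched in shell as follows. The helper name `next_regression_number` and its two arguments (a directory and a module name) are hypothetical, and the sketch assumes the files follow the `{name}.regression-{N}.test.{ext}` pattern exactly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the next free regression-test number for a module.
# Assumes files are named {name}.regression-{N}.test.{ext} inside directory $1.
next_regression_number() {
  local dir="$1" name="$2" max=0 f n
  for f in "$dir/$name".regression-*.test.*; do
    [ -e "$f" ] || continue               # the glob matched nothing
    n="${f##*.regression-}"               # strip everything up to the number
    n="${n%%.*}"                          # strip ".test.{ext}"
    [[ "$n" =~ ^[0-9]+$ ]] || continue    # ignore non-numeric suffixes
    if (( 10#$n > max )); then            # 10# avoids octal parsing of "08"
      max=$((10#$n))
    fi
  done
  echo $((max + 1))
}
```

For example, with `cart.regression-1.test.ts` and `cart.regression-3.test.ts` already present in a directory, `next_regression_number <dir> cart` prints `4`; with no existing files it prints `1`.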

**3. Run only the new test file:**

```bash
{detected test command} {new-test-file}
```

**4. Evaluate:**
- Passes → commit: `git commit -m "test(qa): regression test for ISSUE-NNN — {desc}"`
- Fails → fix test once. Still failing → delete test, defer.
- Taking >2 min exploration → skip and defer.

**5. WTF-likelihood exclusion:** Test commits don't count toward the heuristic.

### 8f. Self-Regulation (STOP AND EVALUATE)

Every 5 fixes (or after any revert), compute the WTF-likelihood:

```
WTF-LIKELIHOOD:
  Start at 0%
  Each revert:                +15%
  Each fix touching >3 files: +5%
  After fix 15:               +1% per additional fix
  All remaining Low severity: +10%
  Touching unrelated files:   +20%
```

**If WTF > 20%:** STOP immediately. Show the user what you've done so far. Ask whether to continue.

**Hard cap: 50 fixes.** After 50 fixes, stop regardless of remaining issues.
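
The arithmetic of the heuristic can be sketched as a small shell function. The function name `wtf_likelihood` and its argument order are hypothetical, and two readings are assumptions on my part: "Touching unrelated files" is treated as a one-time +20%, and "After fix 15" is read as +1% for each fix beyond the fifteenth.

```bash
#!/usr/bin/env bash
# Hypothetical calculator for the WTF-likelihood heuristic, in whole percent.
# Arguments (all names are illustrative):
#   $1 reverts            number of reverts so far
#   $2 wide_fixes         number of fixes that touched more than 3 files
#   $3 total_fixes        number of fixes applied so far (test commits excluded)
#   $4 only_low_left      1 if every remaining issue is Low severity, else 0
#   $5 touched_unrelated  1 if any fix touched unrelated files, else 0
wtf_likelihood() {
  local reverts="$1" wide_fixes="$2" total_fixes="$3"
  local only_low_left="$4" touched_unrelated="$5"
  local score=0
  score=$((score + reverts * 15))
  score=$((score + wide_fixes * 5))
  if (( total_fixes > 15 )); then
    score=$((score + total_fixes - 15))   # +1% per fix beyond the 15th
  fi
  if (( only_low_left == 1 )); then score=$((score + 10)); fi
  if (( touched_unrelated == 1 )); then score=$((score + 20)); fi
  echo "$score"
}

score="$(wtf_likelihood 1 1 6 0 0)"   # one revert, one wide fix, six fixes
if (( score > 20 )); then
  echo "WTF-likelihood ${score}%: stop and ask the user"
else
  echo "WTF-likelihood ${score}%: continue"
fi
# prints: WTF-likelihood 20%: continue
```

Because the threshold is strictly greater than 20%, one revert plus one wide fix sits exactly at the limit, and any further penalty triggers the stop.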

---

## Phase 9: Final QA

After all fixes are applied:

1. Re-run QA on all affected pages
2. Compute final health score
3. **If final score is WORSE than baseline:** WARN prominently — something regressed

---

## Phase 10: Report

Write the report to both local and project-scoped locations:

**Local:** `.gstack/qa-reports/qa-report-{domain}-{YYYY-MM-DD}.md`

**Project-scoped:** Write test outcome artifact for cross-session context:
```bash
{{SLUG_SETUP}}
```
Write to `~/.gstack/projects/{slug}/{user}-{branch}-test-outcome-{datetime}.md`

**Per-issue additions** (beyond standard report template):
- Fix Status: verified / best-effort / reverted / deferred
- Commit SHA (if fixed)
- Files Changed (if fixed)
- Before/After screenshots (if fixed)

**Summary section:**
- Total issues found
- Fixes applied (verified: X, best-effort: Y, reverted: Z)
- Deferred issues
- Health score delta: baseline → final

**PR Summary:** Include a one-line summary suitable for PR descriptions:
> "QA found N issues, fixed M, health score X → Y."

---

## Phase 11: TODOS.md Update

If the repo has a `TODOS.md`:

1. **New deferred bugs** → add as TODOs with severity, category, and repro steps
2. **Fixed bugs that were in TODOS.md** → annotate with "Fixed by /qa on {branch}, {date}"

---

{{LEARNINGS_LOG}}

{{GBRAIN_SAVE_RESULTS}}

## Additional Rules (qa-specific)

11. **Clean working tree required.** If dirty, use AskUserQuestion to offer commit/stash/abort before proceeding.
12. **One commit per fix.** Never bundle multiple fixes into one commit.
13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files.
14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.
15. **Self-regulate.** Follow the WTF-likelihood heuristic. When in doubt, stop and ask.