mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-01 19:25:10 +02:00
a4a181ca92
* feat: extend gstack-diff-scope with SCOPE_MIGRATIONS, SCOPE_API, SCOPE_AUTH
Three new scope signals for Review Army specialist activation:
- SCOPE_MIGRATIONS: db/migrate/, prisma/migrations/, alembic/, *.sql
- SCOPE_API: *controller*, *route*, *endpoint*, *.graphql, openapi.*
- SCOPE_AUTH: *auth*, *session*, *jwt*, *oauth*, *permission*, *role*
* feat: add 7 specialist checklist files for Review Army
- testing.md (always-on): coverage gaps, flaky patterns, security enforcement
- maintainability.md (always-on): dead code, DRY, stale comments
- security.md (conditional): OWASP deep analysis, auth bypass, injection
- performance.md (conditional): N+1 queries, bundle impact, complexity
- data-migration.md (conditional): reversibility, lock duration, backfill
- api-contract.md (conditional): breaking changes, versioning, error format
- red-team.md (conditional): adversarial analysis, cross-cutting concerns
All use standard header with JSON output schema and NO FINDINGS fallback.
* feat: Review Army resolver — parallel specialist dispatch + merge
New resolver in review-army.ts generates template prose for:
- Stack detection and specialist selection
- Parallel Agent tool dispatch with learning-informed prompts
- JSON finding collection, fingerprint dedup, consensus highlighting
- PR quality score computation
- Red Team conditional dispatch
Registered as REVIEW_ARMY in resolvers/index.ts.
* refactor: restructure /review template for Review Army
- Replace Steps 4-4.75 with CRITICAL pass + {{REVIEW_ARMY}}
- Remove {{DESIGN_REVIEW_LITE}} and {{TEST_COVERAGE_AUDIT_REVIEW}}
(subsumed into Design and Testing specialists respectively)
- Extract specialist-covered categories from checklist.md
- Keep CRITICAL + uncovered INFORMATIONAL in main agent pass
* test: Review Army — 14 diff-scope tests + 7 E2E tests
- test/diff-scope.test.ts: 14 tests for all 9 scope signals
- test/skill-e2e-review-army.test.ts: 7 E2E tests
Gate: migration safety, N+1 detection, delivery audit,
quality score, JSON findings
Periodic: red team, consensus
- Updated gen-skill-docs tests for new review structure
- Added touchfile entries and tier classifications
* docs: update SELF_LEARNING_V0.md with Release 2 status + Release 2.5
Mark Release 2 (Review Army) as in-progress. Add Release 2.5 for
deferred expansions (E1 adaptive gating, E3 test stubs, E5 cross-review
dedup, E7 specialist tracking).
* chore: bump version and changelog (v0.14.3.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2.2 KiB
Red Team Review
Scope: when the diff exceeds 200 lines OR the security specialist reported CRITICAL findings. Runs AFTER the other specialists.
Output: JSON objects, one finding per line. Schema:
{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"red-team","summary":"...","fix":"...","fingerprint":"path:line:red-team","specialist":"red-team"}
If no findings: output NO FINDINGS and nothing else.
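The schema above implies a line-oriented (JSONL) contract: one finding per line, or the literal string NO FINDINGS. A minimal sketch of how a consumer might parse that output — the `Finding` type and `parseFindings` name are illustrative, not the resolver's actual API:

```typescript
// Illustrative type mirroring the finding schema above.
interface Finding {
  severity: "CRITICAL" | "INFORMATIONAL";
  confidence: number;
  path: string;
  line: number;
  category: string;
  summary: string;
  fix: string;
  fingerprint: string;
  specialist: string;
}

// Parse specialist output: one JSON object per line, or the
// "NO FINDINGS" fallback, which yields an empty list.
function parseFindings(raw: string): Finding[] {
  const trimmed = raw.trim();
  if (trimmed === "NO FINDINGS") return [];
  return trimmed
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as Finding);
}
```

A strict consumer would also validate each parsed object against the schema before merging; this sketch only shows the line-splitting contract.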
This is NOT a checklist review. This is adversarial analysis.
You have access to the other specialists' findings (provided in your prompt). Your job is to find what they MISSED. Think like an attacker, a chaos engineer, and a hostile QA tester simultaneously.
Approach
1. Attack the Happy Path
- What happens when the system is under 10x normal load?
- What happens when two requests hit the same resource simultaneously?
- What happens when the database is slow (>5s query time)?
- What happens when an external service returns garbage?
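The simultaneous-requests question above is worth making concrete. A minimal lost-update sketch (all names hypothetical — a map stands in for a database) shows how two concurrent writers to the same resource can silently discard one update:

```typescript
// Hypothetical balance store. Two concurrent debits both read the
// same balance before either writes, so one debit is lost.
const store = new Map<string, number>([["acct", 100]]);

async function debit(id: string, amount: number): Promise<void> {
  const balance = store.get(id)!;             // read
  await new Promise((r) => setTimeout(r, 0)); // simulated I/O gap
  store.set(id, balance - amount);            // write, not atomic with read
}

async function demo(): Promise<number> {
  await Promise.all([debit("acct", 30), debit("acct", 30)]);
  return store.get("acct")!; // 70, not 40: one debit was lost
}
```

The fix is outside this sketch's scope (a transaction, a compare-and-set, or a lock), but any read-then-write on shared state without one of those is the pattern to flag.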
2. Find the Silent Failures
- Error handling that swallows exceptions (catch-all with just a log)
- Operations that can partially complete (3 of 5 items processed, then crash)
- State transitions that leave records in inconsistent states on failure
- Background jobs that fail without alerting anyone
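The first two items above often appear together. A sketch of the anti-pattern and a safer shape — `sendToBackend` and both processing functions are hypothetical names for illustration:

```typescript
// Hypothetical downstream call that can fail per item.
function sendToBackend(item: string): void {
  if (item === "bad") throw new Error(`rejected: ${item}`);
}

// Anti-pattern: catch-all with just a log. The caller sees a count,
// never learns that items were dropped, and the batch partially
// completes with no signal.
function processItemsSilently(items: string[]): number {
  let processed = 0;
  for (const item of items) {
    try {
      sendToBackend(item);
      processed++;
    } catch (err) {
      console.error(err); // logged, then forgotten
    }
  }
  return processed;
}

// Safer shape: surface partial completion explicitly so the caller
// can retry, alert, or roll back.
function processItems(items: string[]): { ok: string[]; failed: string[] } {
  const ok: string[] = [];
  const failed: string[] = [];
  for (const item of items) {
    try {
      sendToBackend(item);
      ok.push(item);
    } catch {
      failed.push(item);
    }
  }
  return { ok, failed };
}
```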
3. Exploit Trust Assumptions
- Data validated on the frontend but not the backend
- Internal APIs called without authentication (assuming "only our code calls this")
- Configuration values assumed to be present but not validated
- File paths or URLs constructed from user input without sanitization
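For the last item, the canonical check is to resolve the constructed path and verify it still lies under the intended root. A sketch, assuming a Node environment — `BASE_DIR` and the function name are illustrative:

```typescript
import * as path from "path";

// Hypothetical upload root. Any user-supplied path segment must
// resolve to a location inside it.
const BASE_DIR = "/srv/uploads";

function resolveUserPath(userInput: string): string {
  const resolved = path.resolve(BASE_DIR, userInput);
  // Reject traversal attempts such as "../../etc/passwd".
  if (!resolved.startsWith(BASE_DIR + path.sep)) {
    throw new Error("path escapes upload directory");
  }
  return resolved;
}
```

Note the `BASE_DIR + path.sep` comparison: checking `startsWith(BASE_DIR)` alone would wrongly accept a sibling directory like `/srv/uploads-evil`.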
4. Break the Edge Cases
- What happens with the maximum possible input size?
- What happens with zero items, empty strings, null values?
- What happens on the first run ever (no existing data)?
- What happens when the user clicks the button twice in 100ms?
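The double-click case is usually a missing idempotency guard. One common client-side shape is to track in-flight operations per key so the second call reuses the first promise — `submitOrder` and `submitOnce` here are hypothetical names, not part of the codebase:

```typescript
// In-flight registry keyed by operation id.
const inFlight = new Map<string, Promise<string>>();

// Hypothetical handler that performs the real work.
async function submitOrder(orderId: string): Promise<string> {
  return `submitted:${orderId}`;
}

// A second call within the in-flight window gets the SAME promise,
// so the underlying operation runs once.
function submitOnce(orderId: string): Promise<string> {
  const existing = inFlight.get(orderId);
  if (existing) return existing;
  const p = submitOrder(orderId).finally(() => inFlight.delete(orderId));
  inFlight.set(orderId, p);
  return p;
}
```

Server-side, the equivalent defense is an idempotency key checked in the same transaction as the write; code that has neither is the thing to flag.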
5. Find What the Other Specialists Missed
- Review each specialist's findings. What's the gap between their categories?
- Look for cross-category issues (e.g., a performance issue that's also a security issue)
- Look for issues at integration boundaries (where two systems meet)
- Look for issues that only manifest in specific deployment configurations