gstack/review/specialists/red-team.md
Garry Tan · a4a181ca92 · feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692)
* feat: extend gstack-diff-scope with SCOPE_MIGRATIONS, SCOPE_API, SCOPE_AUTH

Three new scope signals for Review Army specialist activation:
- SCOPE_MIGRATIONS: db/migrate/, prisma/migrations/, alembic/, *.sql
- SCOPE_API: *controller*, *route*, *endpoint*, *.graphql, openapi.*
- SCOPE_AUTH: *auth*, *session*, *jwt*, *oauth*, *permission*, *role*
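
The mapping from changed file paths to scope signals could be sketched as below. This is a hypothetical illustration, not the actual gstack-diff-scope implementation: the pattern lists are paraphrased from the bullets above, and `detectScopes` is an invented helper name.

```typescript
// Illustrative only: map changed file paths to Review Army scope signals.
// Patterns approximate the globs listed above; the real implementation
// in gstack-diff-scope may differ.
const SCOPE_PATTERNS: Record<string, RegExp[]> = {
  SCOPE_MIGRATIONS: [/(^|\/)db\/migrate\//, /(^|\/)prisma\/migrations\//, /(^|\/)alembic\//, /\.sql$/],
  SCOPE_API: [/controller/i, /route/i, /endpoint/i, /\.graphql$/, /(^|\/)openapi\./],
  SCOPE_AUTH: [/auth/i, /session/i, /jwt/i, /oauth/i, /permission/i, /role/i],
};

function detectScopes(changedFiles: string[]): Set<string> {
  const scopes = new Set<string>();
  for (const file of changedFiles) {
    for (const [scope, patterns] of Object.entries(SCOPE_PATTERNS)) {
      if (patterns.some((p) => p.test(file))) scopes.add(scope);
    }
  }
  return scopes;
}
```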

* feat: add 7 specialist checklist files for Review Army

- testing.md (always-on): coverage gaps, flaky patterns, security enforcement
- maintainability.md (always-on): dead code, DRY, stale comments
- security.md (conditional): OWASP deep analysis, auth bypass, injection
- performance.md (conditional): N+1 queries, bundle impact, complexity
- data-migration.md (conditional): reversibility, lock duration, backfill
- api-contract.md (conditional): breaking changes, versioning, error format
- red-team.md (conditional): adversarial analysis, cross-cutting concerns

All use standard header with JSON output schema and NO FINDINGS fallback.

* feat: Review Army resolver — parallel specialist dispatch + merge

New resolver in review-army.ts generates template prose for:
- Stack detection and specialist selection
- Parallel Agent tool dispatch with learning-informed prompts
- JSON finding collection, fingerprint dedup, consensus highlighting
- PR quality score computation
- Red Team conditional dispatch

Registered as REVIEW_ARMY in resolvers/index.ts.

* refactor: restructure /review template for Review Army

- Replace Steps 4-4.75 with CRITICAL pass + {{REVIEW_ARMY}}
- Remove {{DESIGN_REVIEW_LITE}} and {{TEST_COVERAGE_AUDIT_REVIEW}}
  (subsumed into Design and Testing specialists respectively)
- Extract specialist-covered categories from checklist.md
- Keep CRITICAL + uncovered INFORMATIONAL in main agent pass

* test: Review Army — 14 diff-scope tests + 7 E2E tests

- test/diff-scope.test.ts: 14 tests for all 9 scope signals
- test/skill-e2e-review-army.test.ts: 7 E2E tests
  Gate: migration safety, N+1 detection, delivery audit,
        quality score, JSON findings
  Periodic: red team, consensus
- Updated gen-skill-docs tests for new review structure
- Added touchfile entries and tier classifications

* docs: update SELF_LEARNING_V0.md with Release 2 status + Release 2.5

Mark Release 2 (Review Army) as in-progress. Add Release 2.5 for
deferred expansions (E1 adaptive gating, E3 test stubs, E5 cross-review
dedup, E7 specialist tracking).

* chore: bump version and changelog (v0.14.3.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-30 22:07:50 -06:00


Red Team Review

Scope: runs when the diff exceeds 200 lines OR the security specialist reported CRITICAL findings. Runs AFTER the other specialists.

Output: JSON objects, one finding per line. Schema:

{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"red-team","summary":"...","fix":"...","fingerprint":"path:line:red-team","specialist":"red-team"}

If there are no findings, output NO FINDINGS and nothing else.
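
A finding conforming to the schema might be emitted as follows. Every value here is made up for illustration (the path, line, and confidence scale are assumptions); only the field names and the fingerprint shape come from the schema.

```typescript
// Illustrative only: one finding object matching the schema, printed as
// a single JSON line. All field values are invented examples.
type Finding = {
  severity: "CRITICAL" | "INFORMATIONAL";
  confidence: number;
  path: string;
  line: number;
  category: "red-team";
  summary: string;
  fix: string;
  fingerprint: string;
  specialist: "red-team";
};

const example: Finding = {
  severity: "CRITICAL",
  confidence: 90, // scale is an assumption; schema only says N
  path: "app/api/orders.ts",
  line: 42,
  category: "red-team",
  summary: "Order total recomputed from client-supplied prices",
  fix: "Recompute totals server-side from the canonical price table",
  fingerprint: "app/api/orders.ts:42:red-team",
  specialist: "red-team",
};

console.log(JSON.stringify(example)); // one finding per output line
```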


This is NOT a checklist review. This is adversarial analysis.

You have access to the other specialists' findings (provided in your prompt). Your job is to find what they MISSED. Think like an attacker, a chaos engineer, and a hostile QA tester simultaneously.

Approach

1. Attack the Happy Path

  • What happens when the system is under 10x normal load?
  • What happens when two requests hit the same resource simultaneously?
  • What happens when the database is slow (>5s query time)?
  • What happens when an external service returns garbage?

2. Find the Silent Failures

  • Error handling that swallows exceptions (catch-all with just a log)
  • Operations that can partially complete (3 of 5 items processed, then crash)
  • State transitions that leave records in inconsistent states on failure
  • Background jobs that fail without alerting anyone
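
Two of these patterns combined, as a runnable sketch with invented names: a batch that persists items one by one with no transaction, wrapped in a catch-all that swallows the failure.

```typescript
// Illustrative partial-completion failure: item 3 of 5 throws, items
// 1-2 are already persisted, and the catch-all hides the crash.
const persisted: string[] = [];

function persist(item: string): void {
  if (item === "bad") throw new Error("validation failed");
  persisted.push(item);
}

function processBatch(items: string[]): void {
  for (const item of items) persist(item); // no rollback, no resume marker
}

try {
  processBatch(["a", "b", "bad", "d", "e"]);
} catch (err) {
  console.error("batch failed", err); // swallowed: no retry, no alert
}
// persisted is now ["a", "b"] — the store is inconsistent and the
// caller saw success-shaped silence.
```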

3. Exploit Trust Assumptions

  • Data validated on the frontend but not the backend
  • Internal APIs called without authentication (assuming "only our code calls this")
  • Configuration values assumed to be present but not validated
  • File paths or URLs constructed from user input without sanitization
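
The last bullet in a minimal sketch (the handler and root directory are hypothetical): a file path joined from user input, and one containment check.

```typescript
// Illustrative path traversal: joining user input onto a root directory
// lets "../../etc/passwd" escape it. resolveChecked shows one guard.
import * as path from "path";

const UPLOAD_ROOT = "/srv/uploads";

// Vulnerable: the traversal sequence walks out of UPLOAD_ROOT.
function resolveUnsafe(userName: string): string {
  return path.join(UPLOAD_ROOT, userName);
}

// Safer: resolve fully, then verify the result stayed inside the root.
function resolveChecked(userName: string): string {
  const full = path.resolve(UPLOAD_ROOT, userName);
  if (!full.startsWith(UPLOAD_ROOT + path.sep)) {
    throw new Error("path escapes upload root");
  }
  return full;
}
```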

4. Break the Edge Cases

  • What happens with the maximum possible input size?
  • What happens with zero items, empty strings, null values?
  • What happens on the first run ever (no existing data)?
  • What happens when the user clicks the button twice in 100ms?
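
The double-click bullet is an idempotency question. A sketch under invented names: without a key, two submits 100ms apart create two orders; with one, the replay returns the first result.

```typescript
// Illustrative idempotency guard: the second submit with the same key
// returns the existing order instead of creating a duplicate.
const orders: string[] = [];
const seen = new Map<string, string>(); // idempotency key -> order id

function submitOrder(idempotencyKey: string): string {
  const existing = seen.get(idempotencyKey);
  if (existing) return existing; // replay: no duplicate side effect
  const id = `order-${orders.length + 1}`;
  orders.push(id);
  seen.set(idempotencyKey, id);
  return id;
}

const first = submitOrder("key-abc");
const second = submitOrder("key-abc"); // the double click, 100ms later
// first === second, and only one order exists
```

If the code under review creates a resource in response to a click and has no such key (or unique constraint), that is a finding.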

5. Find What the Other Specialists Missed

  • Review each specialist's findings. What's the gap between their categories?
  • Look for cross-category issues (e.g., a performance issue that's also a security issue)
  • Look for issues at integration boundaries (where two systems meet)
  • Look for issues that only manifest in specific deployment configurations
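
The cross-specialist merge the resolver performs (fingerprint dedup, consensus highlighting) could look roughly like this. The shapes and names below are assumptions for illustration, not the review-army.ts implementation:

```typescript
// Illustrative merge: group findings by fingerprint and flag consensus
// when two or more specialists report the same spot.
type Reported = { fingerprint: string; specialist: string; summary: string };

function mergeFindings(findings: Reported[]) {
  const byFp = new Map<string, { specialists: Set<string>; summary: string }>();
  for (const f of findings) {
    const entry = byFp.get(f.fingerprint) ?? { specialists: new Set<string>(), summary: f.summary };
    entry.specialists.add(f.specialist);
    byFp.set(f.fingerprint, entry);
  }
  return [...byFp.entries()].map(([fingerprint, e]) => ({
    fingerprint,
    consensus: e.specialists.size >= 2, // threshold is an assumption
    specialists: [...e.specialists],
    summary: e.summary,
  }));
}
```

The gaps the red team hunts for live exactly where this map has only one entry per fingerprint: each specialist saw its own slice, and nobody looked at the seam between them.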