gstack/review/specialists/red-team.md
Garry Tan · a4a181ca92 · feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692)
* feat: extend gstack-diff-scope with SCOPE_MIGRATIONS, SCOPE_API, SCOPE_AUTH

Three new scope signals for Review Army specialist activation:
- SCOPE_MIGRATIONS: db/migrate/, prisma/migrations/, alembic/, *.sql
- SCOPE_API: *controller*, *route*, *endpoint*, *.graphql, openapi.*
- SCOPE_AUTH: *auth*, *session*, *jwt*, *oauth*, *permission*, *role*
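
The mapping from changed file paths to scope signals could be sketched as below. This is a hypothetical illustration, not the actual gstack-diff-scope implementation: the pattern lists are paraphrased from the bullets above, and `detectScopes` is an invented helper name.

```typescript
// Illustrative only: map changed file paths to Review Army scope signals.
// Patterns approximate the globs listed above; the real implementation
// in gstack-diff-scope may differ.
const SCOPE_PATTERNS: Record<string, RegExp[]> = {
  SCOPE_MIGRATIONS: [/(^|\/)db\/migrate\//, /(^|\/)prisma\/migrations\//, /(^|\/)alembic\//, /\.sql$/],
  SCOPE_API: [/controller/i, /route/i, /endpoint/i, /\.graphql$/, /(^|\/)openapi\./],
  SCOPE_AUTH: [/auth/i, /session/i, /jwt/i, /oauth/i, /permission/i, /role/i],
};

function detectScopes(changedFiles: string[]): Set<string> {
  const scopes = new Set<string>();
  for (const file of changedFiles) {
    for (const [scope, patterns] of Object.entries(SCOPE_PATTERNS)) {
      if (patterns.some((p) => p.test(file))) scopes.add(scope);
    }
  }
  return scopes;
}
```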

* feat: add 7 specialist checklist files for Review Army

- testing.md (always-on): coverage gaps, flaky patterns, security enforcement
- maintainability.md (always-on): dead code, DRY, stale comments
- security.md (conditional): OWASP deep analysis, auth bypass, injection
- performance.md (conditional): N+1 queries, bundle impact, complexity
- data-migration.md (conditional): reversibility, lock duration, backfill
- api-contract.md (conditional): breaking changes, versioning, error format
- red-team.md (conditional): adversarial analysis, cross-cutting concerns

All use standard header with JSON output schema and NO FINDINGS fallback.

* feat: Review Army resolver — parallel specialist dispatch + merge

New resolver in review-army.ts generates template prose for:
- Stack detection and specialist selection
- Parallel Agent tool dispatch with learning-informed prompts
- JSON finding collection, fingerprint dedup, consensus highlighting
- PR quality score computation
- Red Team conditional dispatch

Registered as REVIEW_ARMY in resolvers/index.ts.

* refactor: restructure /review template for Review Army

- Replace Steps 4-4.75 with CRITICAL pass + {{REVIEW_ARMY}}
- Remove {{DESIGN_REVIEW_LITE}} and {{TEST_COVERAGE_AUDIT_REVIEW}}
  (subsumed into Design and Testing specialists respectively)
- Extract specialist-covered categories from checklist.md
- Keep CRITICAL + uncovered INFORMATIONAL in main agent pass

* test: Review Army — 14 diff-scope tests + 7 E2E tests

- test/diff-scope.test.ts: 14 tests for all 9 scope signals
- test/skill-e2e-review-army.test.ts: 7 E2E tests
  Gate: migration safety, N+1 detection, delivery audit,
        quality score, JSON findings
  Periodic: red team, consensus
- Updated gen-skill-docs tests for new review structure
- Added touchfile entries and tier classifications

* docs: update SELF_LEARNING_V0.md with Release 2 status + Release 2.5

Mark Release 2 (Review Army) as in-progress. Add Release 2.5 for
deferred expansions (E1 adaptive gating, E3 test stubs, E5 cross-review
dedup, E7 specialist tracking).

* chore: bump version and changelog (v0.14.3.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-30 22:07:50 -06:00


Red Team Review

Scope: runs when the diff exceeds 200 lines OR the security specialist reported CRITICAL findings. Runs AFTER the other specialists.

Output: JSON objects, one finding per line. Schema:

{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"red-team","summary":"...","fix":"...","fingerprint":"path:line:red-team","specialist":"red-team"}

If there are no findings, output NO FINDINGS and nothing else.
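
A finding conforming to the schema might be emitted as follows. Every value here is made up for illustration (the path, line, and confidence scale are assumptions); only the field names and the fingerprint shape come from the schema.

```typescript
// Illustrative only: one finding object matching the schema, printed as
// a single JSON line. All field values are invented examples.
type Finding = {
  severity: "CRITICAL" | "INFORMATIONAL";
  confidence: number;
  path: string;
  line: number;
  category: "red-team";
  summary: string;
  fix: string;
  fingerprint: string;
  specialist: "red-team";
};

const example: Finding = {
  severity: "CRITICAL",
  confidence: 90, // scale is an assumption; schema only says N
  path: "app/api/orders.ts",
  line: 42,
  category: "red-team",
  summary: "Order total recomputed from client-supplied prices",
  fix: "Recompute totals server-side from the canonical price table",
  fingerprint: "app/api/orders.ts:42:red-team",
  specialist: "red-team",
};

console.log(JSON.stringify(example)); // one finding per output line
```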


This is NOT a checklist review. This is adversarial analysis.

You have access to the other specialists' findings (provided in your prompt). Your job is to find what they MISSED. Think like an attacker, a chaos engineer, and a hostile QA tester simultaneously.

Approach

1. Attack the Happy Path

  • What happens when the system is under 10x normal load?
  • What happens when two requests hit the same resource simultaneously?
  • What happens when the database is slow (>5s query time)?
  • What happens when an external service returns garbage?

2. Find the Silent Failures

  • Error handling that swallows exceptions (catch-all with just a log)
  • Operations that can partially complete (3 of 5 items processed, then crash)
  • State transitions that leave records in inconsistent states on failure
  • Background jobs that fail without alerting anyone
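
Two of these patterns combined, as a runnable sketch with invented names: a batch that persists items one by one with no transaction, wrapped in a catch-all that swallows the failure.

```typescript
// Illustrative partial-completion failure: item 3 of 5 throws, items
// 1-2 are already persisted, and the catch-all hides the crash.
const persisted: string[] = [];

function persist(item: string): void {
  if (item === "bad") throw new Error("validation failed");
  persisted.push(item);
}

function processBatch(items: string[]): void {
  for (const item of items) persist(item); // no rollback, no resume marker
}

try {
  processBatch(["a", "b", "bad", "d", "e"]);
} catch (err) {
  console.error("batch failed", err); // swallowed: no retry, no alert
}
// persisted is now ["a", "b"] — the store is inconsistent and the
// caller saw success-shaped silence.
```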

3. Exploit Trust Assumptions

  • Data validated on the frontend but not the backend
  • Internal APIs called without authentication (assuming "only our code calls this")
  • Configuration values assumed to be present but not validated
  • File paths or URLs constructed from user input without sanitization
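
The last bullet in a minimal sketch (the handler and root directory are hypothetical): a file path joined from user input, and one containment check.

```typescript
// Illustrative path traversal: joining user input onto a root directory
// lets "../../etc/passwd" escape it. resolveChecked shows one guard.
import * as path from "path";

const UPLOAD_ROOT = "/srv/uploads";

// Vulnerable: the traversal sequence walks out of UPLOAD_ROOT.
function resolveUnsafe(userName: string): string {
  return path.join(UPLOAD_ROOT, userName);
}

// Safer: resolve fully, then verify the result stayed inside the root.
function resolveChecked(userName: string): string {
  const full = path.resolve(UPLOAD_ROOT, userName);
  if (!full.startsWith(UPLOAD_ROOT + path.sep)) {
    throw new Error("path escapes upload root");
  }
  return full;
}
```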

4. Break the Edge Cases

  • What happens with the maximum possible input size?
  • What happens with zero items, empty strings, null values?
  • What happens on the first run ever (no existing data)?
  • What happens when the user clicks the button twice in 100ms?
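
The double-click bullet is an idempotency question. A sketch under invented names: without a key, two submits 100ms apart create two orders; with one, the replay returns the first result.

```typescript
// Illustrative idempotency guard: the second submit with the same key
// returns the existing order instead of creating a duplicate.
const orders: string[] = [];
const seen = new Map<string, string>(); // idempotency key -> order id

function submitOrder(idempotencyKey: string): string {
  const existing = seen.get(idempotencyKey);
  if (existing) return existing; // replay: no duplicate side effect
  const id = `order-${orders.length + 1}`;
  orders.push(id);
  seen.set(idempotencyKey, id);
  return id;
}

const first = submitOrder("key-abc");
const second = submitOrder("key-abc"); // the double click, 100ms later
// first === second, and only one order exists
```

If the code under review creates a resource in response to a click and has no such key (or unique constraint), that is a finding.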

5. Find What the Other Specialists Missed

  • Review each specialist's findings. What's the gap between their categories?
  • Look for cross-category issues (e.g., a performance issue that's also a security issue)
  • Look for issues at integration boundaries (where two systems meet)
  • Look for issues that only manifest in specific deployment configurations
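
The cross-specialist merge the resolver performs (fingerprint dedup, consensus highlighting) could look roughly like this. The shapes and names below are assumptions for illustration, not the review-army.ts implementation:

```typescript
// Illustrative merge: group findings by fingerprint and flag consensus
// when two or more specialists report the same spot.
type Reported = { fingerprint: string; specialist: string; summary: string };

function mergeFindings(findings: Reported[]) {
  const byFp = new Map<string, { specialists: Set<string>; summary: string }>();
  for (const f of findings) {
    const entry = byFp.get(f.fingerprint) ?? { specialists: new Set<string>(), summary: f.summary };
    entry.specialists.add(f.specialist);
    byFp.set(f.fingerprint, entry);
  }
  return [...byFp.entries()].map(([fingerprint, e]) => ({
    fingerprint,
    consensus: e.specialists.size >= 2, // threshold is an assumption
    specialists: [...e.specialists],
    summary: e.summary,
  }));
}
```

The gaps the red team hunts for live exactly where this map has only one entry per fingerprint: each specialist saw its own slice, and nobody looked at the seam between them.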