feat: add 7 specialist checklist files for Review Army

- testing.md (always-on): coverage gaps, flaky patterns, security enforcement
- maintainability.md (always-on): dead code, DRY, stale comments
- security.md (conditional): OWASP deep analysis, auth bypass, injection
- performance.md (conditional): N+1 queries, bundle impact, complexity
- data-migration.md (conditional): reversibility, lock duration, backfill
- api-contract.md (conditional): breaking changes, versioning, error format
- red-team.md (conditional): adversarial analysis, cross-cutting concerns

All use standard header with JSON output schema and NO FINDINGS fallback.
This commit is contained in:
Garry Tan
2026-03-30 19:52:36 -07:00
parent 55d80c4ccc
commit f06c38d975
7 changed files with 340 additions and 0 deletions
+48
View File
@@ -0,0 +1,48 @@
# API Contract Specialist Review Checklist
Scope: When SCOPE_API=true
Output: JSON objects, one finding per line. Schema:
{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"api-contract","summary":"...","fix":"...","fingerprint":"path:line:api-contract","specialist":"api-contract"}
If no findings: output `NO FINDINGS` and nothing else.
---
## Categories
### Breaking Changes
- Removed fields from response bodies (clients may depend on them)
- Changed field types (string → number, object → array)
- New required parameters added to existing endpoints
- Changed HTTP methods (GET → POST) or status codes (200 → 201)
- Renamed endpoints without maintaining the old path as a redirect/alias
- Changed authentication requirements (public → authenticated)
### Versioning Strategy
- Breaking changes made without a version bump (v1 → v2)
- Multiple versioning strategies mixed in the same API (URL vs header vs query param)
- Deprecated endpoints without a sunset timeline or migration guide
- Version-specific logic scattered across controllers instead of centralized
### Error Response Consistency
- New endpoints returning different error formats than existing ones
- Error responses missing standard fields (error code, message, details)
- HTTP status codes that don't match the error type (200 for errors, 500 for validation)
- Error messages that leak internal implementation details (stack traces, SQL)
### Rate Limiting & Pagination
- New endpoints missing rate limiting when similar endpoints have it
- Pagination changes (offset → cursor) without backwards compatibility
- Changed page sizes or default limits without documentation
- Missing total count or next-page indicators in paginated responses
### Documentation Drift
- OpenAPI/Swagger spec not updated to match new endpoints or changed params
- README or API docs describing old behavior after changes
- Example requests/responses that no longer work
- Missing documentation for new endpoints or changed parameters
### Backwards Compatibility
- Clients on older versions: will they break?
- Mobile apps that can't force-update: does the API still work for them?
- Webhook payloads changed without notifying subscribers
- SDK or client library changes needed to use new features
+47
View File
@@ -0,0 +1,47 @@
# Data Migration Specialist Review Checklist
Scope: When SCOPE_MIGRATIONS=true
Output: JSON objects, one finding per line. Schema:
{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"data-migration","summary":"...","fix":"...","fingerprint":"path:line:data-migration","specialist":"data-migration"}
If no findings: output `NO FINDINGS` and nothing else.
---
## Categories
### Reversibility
- Can this migration be rolled back without data loss?
- Is there a corresponding down/rollback migration?
- Does the rollback actually undo the change or just no-op?
- Would rolling back break the current application code?
### Data Loss Risk
- Dropping columns that still contain data (add deprecation period first)
- Changing column types that truncate data (varchar(255) → varchar(50))
- Removing tables without verifying no code references them
- Renaming columns without updating all references (ORM, raw SQL, views)
- NOT NULL constraints added to columns with existing NULL values (needs backfill first)
### Lock Duration
- ALTER TABLE on large tables without CONCURRENTLY (PostgreSQL)
- Adding indexes without CONCURRENTLY on tables with >100K rows
- Multiple ALTER TABLE statements that could be combined into one lock acquisition
- Schema changes that acquire exclusive locks during peak traffic hours
### Backfill Strategy
- New NOT NULL columns without DEFAULT value (requires backfill before constraint)
- New columns with computed defaults that need batch population
- Missing backfill script or rake task for existing records
- Backfill that updates all rows at once instead of batching (locks table)
### Index Creation
- CREATE INDEX without CONCURRENTLY on production tables
- Duplicate indexes (new index covers same columns as existing one)
- Missing indexes on new foreign key columns
- Partial indexes where a full index would be more useful (or vice versa)
### Multi-Phase Safety
- Migrations that must be deployed in a specific order with application code
- Schema changes that break the current running code (deploy code first, then migrate)
- Migrations that assume a deploy boundary (old code + new schema = crash)
- Missing feature flag to handle mixed old/new code during rolling deploy
+45
View File
@@ -0,0 +1,45 @@
# Maintainability Specialist Review Checklist
Scope: Always-on (every review)
Output: JSON objects, one finding per line. Schema:
{"severity":"INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"maintainability","summary":"...","fix":"...","fingerprint":"path:line:maintainability","specialist":"maintainability"}
If no findings: output `NO FINDINGS` and nothing else.
---
## Categories
### Dead Code & Unused Imports
- Variables assigned but never read in the changed files
- Functions/methods defined but never called (check with Grep across the repo)
- Imports/requires that are no longer referenced after the change
- Commented-out code blocks (either remove or explain why they exist)
### Magic Numbers & String Coupling
- Bare numeric literals used in logic (thresholds, limits, retry counts) — should be named constants
- Error message strings used as query filters or conditionals elsewhere
- Hardcoded URLs, ports, or hostnames that should be config
- Duplicated literal values across multiple files
### Stale Comments & Docstrings
- Comments that describe old behavior after the code was changed in this diff
- TODO/FIXME comments that reference completed work
- Docstrings with parameter lists that don't match the current function signature
- ASCII diagrams in comments that no longer match the code flow
### DRY Violations
- Similar code blocks (3+ lines) appearing multiple times within the diff
- Copy-paste patterns where a shared helper would be cleaner
- Configuration or setup logic duplicated across test files
- Repeated conditional chains that could be a lookup table or map
### Conditional Side Effects
- Code paths that branch on a condition but forget a side effect on one branch
- Log messages that claim an action happened but the action was conditionally skipped
- State transitions where one branch updates related records but the other doesn't
- Event emissions that only fire on the happy path, missing error/edge paths
### Module Boundary Violations
- Reaching into another module's internal implementation (accessing private-by-convention methods)
- Direct database queries in controllers/views that should go through a service/model
- Tight coupling between components that should communicate through interfaces
+51
View File
@@ -0,0 +1,51 @@
# Performance Specialist Review Checklist
Scope: When SCOPE_BACKEND=true OR SCOPE_FRONTEND=true
Output: JSON objects, one finding per line. Schema:
{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"performance","summary":"...","fix":"...","fingerprint":"path:line:performance","specialist":"performance"}
If no findings: output `NO FINDINGS` and nothing else.
---
## Categories
### N+1 Queries
- ActiveRecord/ORM associations traversed in loops without eager loading (.includes, joinedload, include)
- Database queries inside iteration blocks (each, map, forEach) that could be batched
- Nested serializers that trigger lazy-loaded associations
- GraphQL resolvers that query per-field instead of batching (check for DataLoader usage)
### Missing Database Indexes
- New WHERE clauses on columns without indexes (check migration files or schema)
- New ORDER BY on non-indexed columns
- Composite queries (WHERE a AND b) without composite indexes
- Foreign key columns added without indexes
### Algorithmic Complexity
- O(n^2) or worse patterns: nested loops over collections, Array.find inside Array.map
- Repeated linear searches that could use a hash/map/set lookup
- String concatenation in loops (use join or StringBuilder)
- Sorting or filtering large collections multiple times when once would suffice
### Bundle Size Impact (Frontend)
- New production dependencies that are known-heavy (moment.js, lodash full, jquery)
- Barrel imports (import from 'library') instead of deep imports (import from 'library/specific')
- Large static assets (images, fonts) committed without optimization
- Missing code splitting for route-level chunks
### Rendering Performance (Frontend)
- Fetch waterfalls: sequential API calls that could be parallel (Promise.all)
- Unnecessary re-renders from unstable references (new objects/arrays in render)
- Missing React.memo, useMemo, or useCallback on expensive computations
- Layout thrashing from reading then writing DOM properties in loops
- Missing loading="lazy" on below-fold images
### Missing Pagination
- List endpoints that return unbounded results (no LIMIT, no pagination params)
- Database queries without LIMIT that grow with data volume
- API responses that embed full nested objects instead of IDs with expansion
### Blocking in Async Contexts
- Synchronous I/O (file reads, subprocess, HTTP requests) inside async functions
- time.sleep() / Thread.sleep() inside event-loop-based handlers
- CPU-intensive computation blocking the main thread without worker offload
+44
View File
@@ -0,0 +1,44 @@
# Red Team Review
Scope: When diff > 200 lines OR security specialist found CRITICAL findings. Runs AFTER other specialists.
Output: JSON objects, one finding per line. Schema:
{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"red-team","summary":"...","fix":"...","fingerprint":"path:line:red-team","specialist":"red-team"}
If no findings: output `NO FINDINGS` and nothing else.
---
This is NOT a checklist review. This is adversarial analysis.
You have access to the other specialists' findings (provided in your prompt). Your job is to find what they MISSED. Think like an attacker, a chaos engineer, and a hostile QA tester simultaneously.
## Approach
### 1. Attack the Happy Path
- What happens when the system is under 10x normal load?
- What happens when two requests hit the same resource simultaneously?
- What happens when the database is slow (>5s query time)?
- What happens when an external service returns garbage?
### 2. Find the Silent Failures
- Error handling that swallows exceptions (catch-all with just a log)
- Operations that can partially complete (3 of 5 items processed, then crash)
- State transitions that leave records in inconsistent states on failure
- Background jobs that fail without alerting anyone
### 3. Exploit Trust Assumptions
- Data validated on the frontend but not the backend
- Internal APIs called without authentication (assuming "only our code calls this")
- Configuration values assumed to be present but not validated
- File paths or URLs constructed from user input without sanitization
### 4. Break the Edge Cases
- What happens with the maximum possible input size?
- What happens with zero items, empty strings, null values?
- What happens on the first run ever (no existing data)?
- What happens when the user clicks the button twice in 100ms?
### 5. Find What the Other Specialists Missed
- Review each specialist's findings. What's the gap between their categories?
- Look for cross-category issues (e.g., a performance issue that's also a security issue)
- Look for issues at integration boundaries (where two systems meet)
- Look for issues that only manifest in specific deployment configurations
+60
View File
@@ -0,0 +1,60 @@
# Security Specialist Review Checklist
Scope: When SCOPE_AUTH=true OR (SCOPE_BACKEND=true AND diff > 100 lines)
Output: JSON objects, one finding per line. Schema:
{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"security","summary":"...","fix":"...","fingerprint":"path:line:security","specialist":"security"}
If no findings: output `NO FINDINGS` and nothing else.
---
This checklist goes deeper than the main CRITICAL pass. The main agent already checks SQL injection, race conditions, LLM trust, and enum completeness. This specialist focuses on auth/authz patterns, cryptographic misuse, and attack surface expansion.
## Categories
### Input Validation at Trust Boundaries
- User input accepted without validation at controller/handler level
- Query parameters used directly in database queries or file paths
- Request body fields accepted without type checking or schema validation
- File uploads without type/size/content validation
- Webhook payloads processed without signature verification
### Auth & Authorization Bypass
- Endpoints missing authentication middleware (check route definitions)
- Authorization checks that default to "allow" instead of "deny"
- Role escalation paths (user can modify their own role/permissions)
- Direct object reference vulnerabilities (user A accesses user B's data by changing an ID)
- Session fixation or session hijacking opportunities
- Token/API key validation that doesn't check expiration
### Injection Vectors (beyond SQL)
- Command injection via subprocess calls with user-controlled arguments
- Template injection (Jinja2, ERB, Handlebars) with user input
- LDAP injection in directory queries
- SSRF via user-controlled URLs (fetch, redirect, webhook targets)
- Path traversal via user-controlled file paths (../../etc/passwd)
- Header injection via user-controlled values in HTTP headers
### Cryptographic Misuse
- Weak hashing algorithms (MD5, SHA1) for security-sensitive operations
- Predictable randomness (Math.random, rand()) for tokens or secrets
- Non-constant-time comparisons (==) on secrets, tokens, or digests
- Hardcoded encryption keys or IVs
- Missing salt in password hashing
### Secrets Exposure
- API keys, tokens, or passwords in source code (even in comments)
- Secrets logged in application logs or error messages
- Credentials in URLs (query parameters or basic auth in URL)
- Sensitive data in error responses returned to users
- PII stored in plaintext when encryption is expected
### XSS via Escape Hatches
- Rails: .html_safe, raw() on user-controlled data
- React: dangerouslySetInnerHTML with user content
- Vue: v-html with user content
- Django: |safe, mark_safe() on user input
- General: innerHTML assignment with unsanitized data
### Deserialization
- Deserializing untrusted data (pickle, Marshal, YAML.load, JSON.parse of executable types)
- Accepting serialized objects from user input or external APIs without schema validation
+45
View File
@@ -0,0 +1,45 @@
# Testing Specialist Review Checklist
Scope: Always-on (every review)
Output: JSON objects, one finding per line. Schema:
{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"testing","summary":"...","fix":"...","fingerprint":"path:line:testing","specialist":"testing"}
If no findings: output `NO FINDINGS` and nothing else.
---
## Categories
### Missing Negative-Path Tests
- New code paths that handle errors, rejections, or invalid input with NO corresponding test
- Guard clauses and early returns that are untested
- Error branches in try/catch, rescue, or error boundaries with no failure-path test
- Permission/auth checks that are asserted in code but never tested for the "denied" case
### Missing Edge-Case Coverage
- Boundary values: zero, negative, max-int, empty string, empty array, nil/null/undefined
- Single-element collections (off-by-one on loops)
- Unicode and special characters in user-facing inputs
- Concurrent access patterns with no race-condition test
### Test Isolation Violations
- Tests sharing mutable state (class variables, global singletons, DB records not cleaned up)
- Order-dependent tests (pass in sequence, fail when randomized)
- Tests that depend on system clock, timezone, or locale
- Tests that make real network calls instead of using stubs/mocks
### Flaky Test Patterns
- Timing-dependent assertions (sleep, setTimeout, waitFor with tight timeouts)
- Assertions on ordering of unordered results (hash keys, Set iteration, async resolution order)
- Tests that depend on external services (APIs, databases) without fallback
- Randomized test data without seed control
### Security Enforcement Tests Missing
- Auth/authz checks in controllers with no test for the "unauthorized" case
- Rate limiting logic with no test proving it actually blocks
- Input sanitization with no test for malicious input
- CSRF/CORS configuration with no integration test
### Coverage Gaps
- New public methods/functions with zero test coverage
- Changed methods where existing tests only cover the old behavior, not the new branch
- Utility functions called from multiple places but tested only indirectly