feat: add 7 specialist checklist files for Review Army

- testing.md (always-on): coverage gaps, flaky patterns, security enforcement - maintainability.md (always-on): dead code, DRY, stale comments - security.md (conditional): OWASP deep analysis, auth bypass, injection - performance.md (conditional): N+1 queries, bundle impact, complexity - data-migration.md (conditional): reversibility, lock duration, backfill - api-contract.md (conditional): breaking changes, versioning, error format - red-team.md (conditional): adversarial analysis, cross-cutting concerns All use standard header with JSON output schema and NO FINDINGS fallback.
2026-06-19 16:20:09 +02:00 · 2026-03-30 19:52:36 -07:00
parent 55d80c4ccc
commit f06c38d975
7 changed files with 340 additions and 0 deletions
@@ -0,0 +1,48 @@
+# API Contract Specialist Review Checklist
+
+Scope: When SCOPE_API=true
+Output: JSON objects, one finding per line. Schema:
+{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"api-contract","summary":"...","fix":"...","fingerprint":"path:line:api-contract","specialist":"api-contract"}
+If no findings: output `NO FINDINGS` and nothing else.
+
+---
+
+## Categories
+
+### Breaking Changes
+- Removed fields from response bodies (clients may depend on them)
+- Changed field types (string → number, object → array)
+- New required parameters added to existing endpoints
+- Changed HTTP methods (GET → POST) or status codes (200 → 201)
+- Renamed endpoints without maintaining the old path as a redirect/alias
+- Changed authentication requirements (public → authenticated)
+
+### Versioning Strategy
+- Breaking changes made without a version bump (v1 → v2)
+- Multiple versioning strategies mixed in the same API (URL vs header vs query param)
+- Deprecated endpoints without a sunset timeline or migration guide
+- Version-specific logic scattered across controllers instead of centralized
+
+### Error Response Consistency
+- New endpoints returning different error formats than existing ones
+- Error responses missing standard fields (error code, message, details)
+- HTTP status codes that don't match the error type (200 for errors, 500 for validation)
+- Error messages that leak internal implementation details (stack traces, SQL)
+
+### Rate Limiting & Pagination
+- New endpoints missing rate limiting when similar endpoints have it
+- Pagination changes (offset → cursor) without backwards compatibility
+- Changed page sizes or default limits without documentation
+- Missing total count or next-page indicators in paginated responses
+
+### Documentation Drift
+- OpenAPI/Swagger spec not updated to match new endpoints or changed params
+- README or API docs describing old behavior after changes
+- Example requests/responses that no longer work
+- Missing documentation for new endpoints or changed parameters
+
+### Backwards Compatibility
+- Clients on older versions: will they break?
+- Mobile apps that can't force-update: does the API still work for them?
+- Webhook payloads changed without notifying subscribers
+- SDK or client library changes needed to use new features
@@ -0,0 +1,47 @@
+# Data Migration Specialist Review Checklist
+
+Scope: When SCOPE_MIGRATIONS=true
+Output: JSON objects, one finding per line. Schema:
+{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"data-migration","summary":"...","fix":"...","fingerprint":"path:line:data-migration","specialist":"data-migration"}
+If no findings: output `NO FINDINGS` and nothing else.
+
+---
+
+## Categories
+
+### Reversibility
+- Can this migration be rolled back without data loss?
+- Is there a corresponding down/rollback migration?
+- Does the rollback actually undo the change or just no-op?
+- Would rolling back break the current application code?
+
+### Data Loss Risk
+- Dropping columns that still contain data (add deprecation period first)
+- Changing column types that truncate data (varchar(255) → varchar(50))
+- Removing tables without verifying no code references them
+- Renaming columns without updating all references (ORM, raw SQL, views)
+- NOT NULL constraints added to columns with existing NULL values (needs backfill first)
+
+### Lock Duration
+- ALTER TABLE on large tables without CONCURRENTLY (PostgreSQL)
+- Adding indexes without CONCURRENTLY on tables with >100K rows
+- Multiple ALTER TABLE statements that could be combined into one lock acquisition
+- Schema changes that acquire exclusive locks during peak traffic hours
+
+### Backfill Strategy
+- New NOT NULL columns without DEFAULT value (requires backfill before constraint)
+- New columns with computed defaults that need batch population
+- Missing backfill script or rake task for existing records
+- Backfill that updates all rows at once instead of batching (locks table)
+
+### Index Creation
+- CREATE INDEX without CONCURRENTLY on production tables
+- Duplicate indexes (new index covers same columns as existing one)
+- Missing indexes on new foreign key columns
+- Partial indexes where a full index would be more useful (or vice versa)
+
+### Multi-Phase Safety
+- Migrations that must be deployed in a specific order with application code
+- Schema changes that break the current running code (deploy code first, then migrate)
+- Migrations that assume a deploy boundary (old code + new schema = crash)
+- Missing feature flag to handle mixed old/new code during rolling deploy
@@ -0,0 +1,45 @@
+# Maintainability Specialist Review Checklist
+
+Scope: Always-on (every review)
+Output: JSON objects, one finding per line. Schema:
+{"severity":"INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"maintainability","summary":"...","fix":"...","fingerprint":"path:line:maintainability","specialist":"maintainability"}
+If no findings: output `NO FINDINGS` and nothing else.
+
+---
+
+## Categories
+
+### Dead Code & Unused Imports
+- Variables assigned but never read in the changed files
+- Functions/methods defined but never called (check with Grep across the repo)
+- Imports/requires that are no longer referenced after the change
+- Commented-out code blocks (either remove or explain why they exist)
+
+### Magic Numbers & String Coupling
+- Bare numeric literals used in logic (thresholds, limits, retry counts) — should be named constants
+- Error message strings used as query filters or conditionals elsewhere
+- Hardcoded URLs, ports, or hostnames that should be config
+- Duplicated literal values across multiple files
+
+### Stale Comments & Docstrings
+- Comments that describe old behavior after the code was changed in this diff
+- TODO/FIXME comments that reference completed work
+- Docstrings with parameter lists that don't match the current function signature
+- ASCII diagrams in comments that no longer match the code flow
+
+### DRY Violations
+- Similar code blocks (3+ lines) appearing multiple times within the diff
+- Copy-paste patterns where a shared helper would be cleaner
+- Configuration or setup logic duplicated across test files
+- Repeated conditional chains that could be a lookup table or map
+
+### Conditional Side Effects
+- Code paths that branch on a condition but forget a side effect on one branch
+- Log messages that claim an action happened but the action was conditionally skipped
+- State transitions where one branch updates related records but the other doesn't
+- Event emissions that only fire on the happy path, missing error/edge paths
+
+### Module Boundary Violations
+- Reaching into another module's internal implementation (accessing private-by-convention methods)
+- Direct database queries in controllers/views that should go through a service/model
+- Tight coupling between components that should communicate through interfaces
@@ -0,0 +1,51 @@
+# Performance Specialist Review Checklist
+
+Scope: When SCOPE_BACKEND=true OR SCOPE_FRONTEND=true
+Output: JSON objects, one finding per line. Schema:
+{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"performance","summary":"...","fix":"...","fingerprint":"path:line:performance","specialist":"performance"}
+If no findings: output `NO FINDINGS` and nothing else.
+
+---
+
+## Categories
+
+### N+1 Queries
+- ActiveRecord/ORM associations traversed in loops without eager loading (.includes, joinedload, include)
+- Database queries inside iteration blocks (each, map, forEach) that could be batched
+- Nested serializers that trigger lazy-loaded associations
+- GraphQL resolvers that query per-field instead of batching (check for DataLoader usage)
+
+### Missing Database Indexes
+- New WHERE clauses on columns without indexes (check migration files or schema)
+- New ORDER BY on non-indexed columns
+- Composite queries (WHERE a AND b) without composite indexes
+- Foreign key columns added without indexes
+
+### Algorithmic Complexity
+- O(n^2) or worse patterns: nested loops over collections, Array.find inside Array.map
+- Repeated linear searches that could use a hash/map/set lookup
+- String concatenation in loops (use join or StringBuilder)
+- Sorting or filtering large collections multiple times when once would suffice
+
+### Bundle Size Impact (Frontend)
+- New production dependencies that are known-heavy (moment.js, lodash full, jquery)
+- Barrel imports (import from 'library') instead of deep imports (import from 'library/specific')
+- Large static assets (images, fonts) committed without optimization
+- Missing code splitting for route-level chunks
+
+### Rendering Performance (Frontend)
+- Fetch waterfalls: sequential API calls that could be parallel (Promise.all)
+- Unnecessary re-renders from unstable references (new objects/arrays in render)
+- Missing React.memo, useMemo, or useCallback on expensive computations
+- Layout thrashing from reading then writing DOM properties in loops
+- Missing loading="lazy" on below-fold images
+
+### Missing Pagination
+- List endpoints that return unbounded results (no LIMIT, no pagination params)
+- Database queries without LIMIT that grow with data volume
+- API responses that embed full nested objects instead of IDs with expansion
+
+### Blocking in Async Contexts
+- Synchronous I/O (file reads, subprocess, HTTP requests) inside async functions
+- time.sleep() / Thread.sleep() inside event-loop-based handlers
+- CPU-intensive computation blocking the main thread without worker offload
@@ -0,0 +1,44 @@
+# Red Team Review
+
+Scope: When diff > 200 lines OR security specialist found CRITICAL findings. Runs AFTER other specialists.
+Output: JSON objects, one finding per line. Schema:
+{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"red-team","summary":"...","fix":"...","fingerprint":"path:line:red-team","specialist":"red-team"}
+If no findings: output `NO FINDINGS` and nothing else.
+
+---
+
+This is NOT a checklist review. This is adversarial analysis.
+
+You have access to the other specialists' findings (provided in your prompt). Your job is to find what they MISSED. Think like an attacker, a chaos engineer, and a hostile QA tester simultaneously.
+
+## Approach
+
+### 1. Attack the Happy Path
+- What happens when the system is under 10x normal load?
+- What happens when two requests hit the same resource simultaneously?
+- What happens when the database is slow (>5s query time)?
+- What happens when an external service returns garbage?
+
+### 2. Find the Silent Failures
+- Error handling that swallows exceptions (catch-all with just a log)
+- Operations that can partially complete (3 of 5 items processed, then crash)
+- State transitions that leave records in inconsistent states on failure
+- Background jobs that fail without alerting anyone
+
+### 3. Exploit Trust Assumptions
+- Data validated on the frontend but not the backend
+- Internal APIs called without authentication (assuming "only our code calls this")
+- Configuration values assumed to be present but not validated
+- File paths or URLs constructed from user input without sanitization
+
+### 4. Break the Edge Cases
+- What happens with the maximum possible input size?
+- What happens with zero items, empty strings, null values?
+- What happens on the first run ever (no existing data)?
+- What happens when the user clicks the button twice in 100ms?
+
+### 5. Find What the Other Specialists Missed
+- Review each specialist's findings. What's the gap between their categories?
+- Look for cross-category issues (e.g., a performance issue that's also a security issue)
+- Look for issues at integration boundaries (where two systems meet)
+- Look for issues that only manifest in specific deployment configurations
@@ -0,0 +1,60 @@
+# Security Specialist Review Checklist
+
+Scope: When SCOPE_AUTH=true OR (SCOPE_BACKEND=true AND diff > 100 lines)
+Output: JSON objects, one finding per line. Schema:
+{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"security","summary":"...","fix":"...","fingerprint":"path:line:security","specialist":"security"}
+If no findings: output `NO FINDINGS` and nothing else.
+
+---
+
+This checklist goes deeper than the main CRITICAL pass. The main agent already checks SQL injection, race conditions, LLM trust, and enum completeness. This specialist focuses on auth/authz patterns, cryptographic misuse, and attack surface expansion.
+
+## Categories
+
+### Input Validation at Trust Boundaries
+- User input accepted without validation at controller/handler level
+- Query parameters used directly in database queries or file paths
+- Request body fields accepted without type checking or schema validation
+- File uploads without type/size/content validation
+- Webhook payloads processed without signature verification
+
+### Auth & Authorization Bypass
+- Endpoints missing authentication middleware (check route definitions)
+- Authorization checks that default to "allow" instead of "deny"
+- Role escalation paths (user can modify their own role/permissions)
+- Direct object reference vulnerabilities (user A accesses user B's data by changing an ID)
+- Session fixation or session hijacking opportunities
+- Token/API key validation that doesn't check expiration
+
+### Injection Vectors (beyond SQL)
+- Command injection via subprocess calls with user-controlled arguments
+- Template injection (Jinja2, ERB, Handlebars) with user input
+- LDAP injection in directory queries
+- SSRF via user-controlled URLs (fetch, redirect, webhook targets)
+- Path traversal via user-controlled file paths (../../etc/passwd)
+- Header injection via user-controlled values in HTTP headers
+
+### Cryptographic Misuse
+- Weak hashing algorithms (MD5, SHA1) for security-sensitive operations
+- Predictable randomness (Math.random, rand()) for tokens or secrets
+- Non-constant-time comparisons (==) on secrets, tokens, or digests
+- Hardcoded encryption keys or IVs
+- Missing salt in password hashing
+
+### Secrets Exposure
+- API keys, tokens, or passwords in source code (even in comments)
+- Secrets logged in application logs or error messages
+- Credentials in URLs (query parameters or basic auth in URL)
+- Sensitive data in error responses returned to users
+- PII stored in plaintext when encryption is expected
+
+### XSS via Escape Hatches
+- Rails: .html_safe, raw() on user-controlled data
+- React: dangerouslySetInnerHTML with user content
+- Vue: v-html with user content
+- Django: |safe, mark_safe() on user input
+- General: innerHTML assignment with unsanitized data
+
+### Deserialization
+- Deserializing untrusted data (pickle, Marshal, YAML.load, JSON.parse of executable types)
+- Accepting serialized objects from user input or external APIs without schema validation
@@ -0,0 +1,45 @@
+# Testing Specialist Review Checklist
+
+Scope: Always-on (every review)
+Output: JSON objects, one finding per line. Schema:
+{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"testing","summary":"...","fix":"...","fingerprint":"path:line:testing","specialist":"testing"}
+If no findings: output `NO FINDINGS` and nothing else.
+
+---
+
+## Categories
+
+### Missing Negative-Path Tests
+- New code paths that handle errors, rejections, or invalid input with NO corresponding test
+- Guard clauses and early returns that are untested
+- Error branches in try/catch, rescue, or error boundaries with no failure-path test
+- Permission/auth checks that are asserted in code but never tested for the "denied" case
+
+### Missing Edge-Case Coverage
+- Boundary values: zero, negative, max-int, empty string, empty array, nil/null/undefined
+- Single-element collections (off-by-one on loops)
+- Unicode and special characters in user-facing inputs
+- Concurrent access patterns with no race-condition test
+
+### Test Isolation Violations
+- Tests sharing mutable state (class variables, global singletons, DB records not cleaned up)
+- Order-dependent tests (pass in sequence, fail when randomized)
+- Tests that depend on system clock, timezone, or locale
+- Tests that make real network calls instead of using stubs/mocks
+
+### Flaky Test Patterns
+- Timing-dependent assertions (sleep, setTimeout, waitFor with tight timeouts)
+- Assertions on ordering of unordered results (hash keys, Set iteration, async resolution order)
+- Tests that depend on external services (APIs, databases) without fallback
+- Randomized test data without seed control
+
+### Security Enforcement Tests Missing
+- Auth/authz checks in controllers with no test for the "unauthorized" case
+- Rate limiting logic with no test proving it actually blocks
+- Input sanitization with no test for malicious input
+- CSRF/CORS configuration with no integration test
+
+### Coverage Gaps
+- New public methods/functions with zero test coverage
+- Changed methods where existing tests only cover the old behavior, not the new branch
+- Utility functions called from multiple places but tested only indirectly