From f06c38d975e1a4d2618df79f1fc99bac5e0ac57a Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Mon, 30 Mar 2026 19:52:36 -0700 Subject: [PATCH] feat: add 7 specialist checklist files for Review Army - testing.md (always-on): coverage gaps, flaky patterns, security enforcement - maintainability.md (always-on): dead code, DRY, stale comments - security.md (conditional): OWASP deep analysis, auth bypass, injection - performance.md (conditional): N+1 queries, bundle impact, complexity - data-migration.md (conditional): reversibility, lock duration, backfill - api-contract.md (conditional): breaking changes, versioning, error format - red-team.md (conditional): adversarial analysis, cross-cutting concerns All use standard header with JSON output schema and NO FINDINGS fallback. --- review/specialists/api-contract.md | 48 +++++++++++++++++++++ review/specialists/data-migration.md | 47 +++++++++++++++++++++ review/specialists/maintainability.md | 45 ++++++++++++++++++++ review/specialists/performance.md | 51 +++++++++++++++++++++++ review/specialists/red-team.md | 44 ++++++++++++++++++++ review/specialists/security.md | 60 +++++++++++++++++++++++++++ review/specialists/testing.md | 45 ++++++++++++++++++++ 7 files changed, 340 insertions(+) create mode 100644 review/specialists/api-contract.md create mode 100644 review/specialists/data-migration.md create mode 100644 review/specialists/maintainability.md create mode 100644 review/specialists/performance.md create mode 100644 review/specialists/red-team.md create mode 100644 review/specialists/security.md create mode 100644 review/specialists/testing.md diff --git a/review/specialists/api-contract.md b/review/specialists/api-contract.md new file mode 100644 index 00000000..1fc8ab83 --- /dev/null +++ b/review/specialists/api-contract.md @@ -0,0 +1,48 @@ +# API Contract Specialist Review Checklist + +Scope: When SCOPE_API=true +Output: JSON objects, one finding per line. Schema: +{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"api-contract","summary":"...","fix":"...","fingerprint":"path:line:api-contract","specialist":"api-contract"} +If no findings: output `NO FINDINGS` and nothing else. + +--- + +## Categories + +### Breaking Changes +- Removed fields from response bodies (clients may depend on them) +- Changed field types (string → number, object → array) +- New required parameters added to existing endpoints +- Changed HTTP methods (GET → POST) or status codes (200 → 201) +- Renamed endpoints without maintaining the old path as a redirect/alias +- Changed authentication requirements (public → authenticated) + +### Versioning Strategy +- Breaking changes made without a version bump (v1 → v2) +- Multiple versioning strategies mixed in the same API (URL vs header vs query param) +- Deprecated endpoints without a sunset timeline or migration guide +- Version-specific logic scattered across controllers instead of centralized + +### Error Response Consistency +- New endpoints returning different error formats than existing ones +- Error responses missing standard fields (error code, message, details) +- HTTP status codes that don't match the error type (200 for errors, 500 for validation) +- Error messages that leak internal implementation details (stack traces, SQL) + +### Rate Limiting & Pagination +- New endpoints missing rate limiting when similar endpoints have it +- Pagination changes (offset → cursor) without backwards compatibility +- Changed page sizes or default limits without documentation +- Missing total count or next-page indicators in paginated responses + +### Documentation Drift +- OpenAPI/Swagger spec not updated to match new endpoints or changed params +- README or API docs describing old behavior after changes +- Example requests/responses that no longer work +- Missing documentation for new endpoints or changed parameters + +### Backwards Compatibility +- Clients on older versions: will they break? +- Mobile apps that can't force-update: does the API still work for them? +- Webhook payloads changed without notifying subscribers +- SDK or client library changes needed to use new features diff --git a/review/specialists/data-migration.md b/review/specialists/data-migration.md new file mode 100644 index 00000000..437194f6 --- /dev/null +++ b/review/specialists/data-migration.md @@ -0,0 +1,47 @@ +# Data Migration Specialist Review Checklist + +Scope: When SCOPE_MIGRATIONS=true +Output: JSON objects, one finding per line. Schema: +{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"data-migration","summary":"...","fix":"...","fingerprint":"path:line:data-migration","specialist":"data-migration"} +If no findings: output `NO FINDINGS` and nothing else. + +--- + +## Categories + +### Reversibility +- Can this migration be rolled back without data loss? +- Is there a corresponding down/rollback migration? +- Does the rollback actually undo the change or just no-op? +- Would rolling back break the current application code? + +### Data Loss Risk +- Dropping columns that still contain data (add deprecation period first) +- Changing column types that truncate data (varchar(255) → varchar(50)) +- Removing tables without verifying no code references them +- Renaming columns without updating all references (ORM, raw SQL, views) +- NOT NULL constraints added to columns with existing NULL values (needs backfill first) + +### Lock Duration +- ALTER TABLE on large tables without CONCURRENTLY (PostgreSQL) +- Adding indexes without CONCURRENTLY on tables with >100K rows +- Multiple ALTER TABLE statements that could be combined into one lock acquisition +- Schema changes that acquire exclusive locks during peak traffic hours + +### Backfill Strategy +- New NOT NULL columns without DEFAULT value (requires backfill before constraint) +- New columns with computed defaults that need batch population +- Missing backfill script or rake task for existing records +- Backfill that updates all rows at once instead of batching (locks table) + +### Index Creation +- CREATE INDEX without CONCURRENTLY on production tables +- Duplicate indexes (new index covers same columns as existing one) +- Missing indexes on new foreign key columns +- Partial indexes where a full index would be more useful (or vice versa) + +### Multi-Phase Safety +- Migrations that must be deployed in a specific order with application code +- Schema changes that break the current running code (deploy code first, then migrate) +- Migrations that assume a deploy boundary (old code + new schema = crash) +- Missing feature flag to handle mixed old/new code during rolling deploy diff --git a/review/specialists/maintainability.md b/review/specialists/maintainability.md new file mode 100644 index 00000000..258d0f2f --- /dev/null +++ b/review/specialists/maintainability.md @@ -0,0 +1,45 @@ +# Maintainability Specialist Review Checklist + +Scope: Always-on (every review) +Output: JSON objects, one finding per line. Schema: +{"severity":"INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"maintainability","summary":"...","fix":"...","fingerprint":"path:line:maintainability","specialist":"maintainability"} +If no findings: output `NO FINDINGS` and nothing else. + +--- + +## Categories + +### Dead Code & Unused Imports +- Variables assigned but never read in the changed files +- Functions/methods defined but never called (check with Grep across the repo) +- Imports/requires that are no longer referenced after the change +- Commented-out code blocks (either remove or explain why they exist) + +### Magic Numbers & String Coupling +- Bare numeric literals used in logic (thresholds, limits, retry counts) — should be named constants +- Error message strings used as query filters or conditionals elsewhere +- Hardcoded URLs, ports, or hostnames that should be config +- Duplicated literal values across multiple files + +### Stale Comments & Docstrings +- Comments that describe old behavior after the code was changed in this diff +- TODO/FIXME comments that reference completed work +- Docstrings with parameter lists that don't match the current function signature +- ASCII diagrams in comments that no longer match the code flow + +### DRY Violations +- Similar code blocks (3+ lines) appearing multiple times within the diff +- Copy-paste patterns where a shared helper would be cleaner +- Configuration or setup logic duplicated across test files +- Repeated conditional chains that could be a lookup table or map + +### Conditional Side Effects +- Code paths that branch on a condition but forget a side effect on one branch +- Log messages that claim an action happened but the action was conditionally skipped +- State transitions where one branch updates related records but the other doesn't +- Event emissions that only fire on the happy path, missing error/edge paths + +### Module Boundary Violations +- Reaching into another module's internal implementation (accessing private-by-convention methods) +- Direct database queries in controllers/views that should go through a service/model +- Tight coupling between components that should communicate through interfaces diff --git a/review/specialists/performance.md b/review/specialists/performance.md new file mode 100644 index 00000000..78a1e793 --- /dev/null +++ b/review/specialists/performance.md @@ -0,0 +1,51 @@ +# Performance Specialist Review Checklist + +Scope: When SCOPE_BACKEND=true OR SCOPE_FRONTEND=true +Output: JSON objects, one finding per line. Schema: +{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"performance","summary":"...","fix":"...","fingerprint":"path:line:performance","specialist":"performance"} +If no findings: output `NO FINDINGS` and nothing else. + +--- + +## Categories + +### N+1 Queries +- ActiveRecord/ORM associations traversed in loops without eager loading (.includes, joinedload, include) +- Database queries inside iteration blocks (each, map, forEach) that could be batched +- Nested serializers that trigger lazy-loaded associations +- GraphQL resolvers that query per-field instead of batching (check for DataLoader usage) + +### Missing Database Indexes +- New WHERE clauses on columns without indexes (check migration files or schema) +- New ORDER BY on non-indexed columns +- Composite queries (WHERE a AND b) without composite indexes +- Foreign key columns added without indexes + +### Algorithmic Complexity +- O(n^2) or worse patterns: nested loops over collections, Array.find inside Array.map +- Repeated linear searches that could use a hash/map/set lookup +- String concatenation in loops (use join or StringBuilder) +- Sorting or filtering large collections multiple times when once would suffice + +### Bundle Size Impact (Frontend) +- New production dependencies that are known-heavy (moment.js, lodash full, jquery) +- Barrel imports (import from 'library') instead of deep imports (import from 'library/specific') +- Large static assets (images, fonts) committed without optimization +- Missing code splitting for route-level chunks + +### Rendering Performance (Frontend) +- Fetch waterfalls: sequential API calls that could be parallel (Promise.all) +- Unnecessary re-renders from unstable references (new objects/arrays in render) +- Missing React.memo, useMemo, or useCallback on expensive computations +- Layout thrashing from reading then writing DOM properties in loops +- Missing loading="lazy" on below-fold images + +### Missing Pagination +- List endpoints that return unbounded results (no LIMIT, no pagination params) +- Database queries without LIMIT that grow with data volume +- API responses that embed full nested objects instead of IDs with expansion + +### Blocking in Async Contexts +- Synchronous I/O (file reads, subprocess, HTTP requests) inside async functions +- time.sleep() / Thread.sleep() inside event-loop-based handlers +- CPU-intensive computation blocking the main thread without worker offload diff --git a/review/specialists/red-team.md b/review/specialists/red-team.md new file mode 100644 index 00000000..38a72182 --- /dev/null +++ b/review/specialists/red-team.md @@ -0,0 +1,44 @@ +# Red Team Review + +Scope: When diff > 200 lines OR security specialist found CRITICAL findings. Runs AFTER other specialists. +Output: JSON objects, one finding per line. Schema: +{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"red-team","summary":"...","fix":"...","fingerprint":"path:line:red-team","specialist":"red-team"} +If no findings: output `NO FINDINGS` and nothing else. + +--- + +This is NOT a checklist review. This is adversarial analysis. + +You have access to the other specialists' findings (provided in your prompt). Your job is to find what they MISSED. Think like an attacker, a chaos engineer, and a hostile QA tester simultaneously. + +## Approach + +### 1. Attack the Happy Path +- What happens when the system is under 10x normal load? +- What happens when two requests hit the same resource simultaneously? +- What happens when the database is slow (>5s query time)? +- What happens when an external service returns garbage? + +### 2. Find the Silent Failures +- Error handling that swallows exceptions (catch-all with just a log) +- Operations that can partially complete (3 of 5 items processed, then crash) +- State transitions that leave records in inconsistent states on failure +- Background jobs that fail without alerting anyone + +### 3. Exploit Trust Assumptions +- Data validated on the frontend but not the backend +- Internal APIs called without authentication (assuming "only our code calls this") +- Configuration values assumed to be present but not validated +- File paths or URLs constructed from user input without sanitization + +### 4. Break the Edge Cases +- What happens with the maximum possible input size? +- What happens with zero items, empty strings, null values? +- What happens on the first run ever (no existing data)? +- What happens when the user clicks the button twice in 100ms? + +### 5. Find What the Other Specialists Missed +- Review each specialist's findings. What's the gap between their categories? +- Look for cross-category issues (e.g., a performance issue that's also a security issue) +- Look for issues at integration boundaries (where two systems meet) +- Look for issues that only manifest in specific deployment configurations diff --git a/review/specialists/security.md b/review/specialists/security.md new file mode 100644 index 00000000..81136dd8 --- /dev/null +++ b/review/specialists/security.md @@ -0,0 +1,60 @@ +# Security Specialist Review Checklist + +Scope: When SCOPE_AUTH=true OR (SCOPE_BACKEND=true AND diff > 100 lines) +Output: JSON objects, one finding per line. Schema: +{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"security","summary":"...","fix":"...","fingerprint":"path:line:security","specialist":"security"} +If no findings: output `NO FINDINGS` and nothing else. + +--- + +This checklist goes deeper than the main CRITICAL pass. The main agent already checks SQL injection, race conditions, LLM trust, and enum completeness. This specialist focuses on auth/authz patterns, cryptographic misuse, and attack surface expansion. + +## Categories + +### Input Validation at Trust Boundaries +- User input accepted without validation at controller/handler level +- Query parameters used directly in database queries or file paths +- Request body fields accepted without type checking or schema validation +- File uploads without type/size/content validation +- Webhook payloads processed without signature verification + +### Auth & Authorization Bypass +- Endpoints missing authentication middleware (check route definitions) +- Authorization checks that default to "allow" instead of "deny" +- Role escalation paths (user can modify their own role/permissions) +- Direct object reference vulnerabilities (user A accesses user B's data by changing an ID) +- Session fixation or session hijacking opportunities +- Token/API key validation that doesn't check expiration + +### Injection Vectors (beyond SQL) +- Command injection via subprocess calls with user-controlled arguments +- Template injection (Jinja2, ERB, Handlebars) with user input +- LDAP injection in directory queries +- SSRF via user-controlled URLs (fetch, redirect, webhook targets) +- Path traversal via user-controlled file paths (../../etc/passwd) +- Header injection via user-controlled values in HTTP headers + +### Cryptographic Misuse +- Weak hashing algorithms (MD5, SHA1) for security-sensitive operations +- Predictable randomness (Math.random, rand()) for tokens or secrets +- Non-constant-time comparisons (==) on secrets, tokens, or digests +- Hardcoded encryption keys or IVs +- Missing salt in password hashing + +### Secrets Exposure +- API keys, tokens, or passwords in source code (even in comments) +- Secrets logged in application logs or error messages +- Credentials in URLs (query parameters or basic auth in URL) +- Sensitive data in error responses returned to users +- PII stored in plaintext when encryption is expected + +### XSS via Escape Hatches +- Rails: .html_safe, raw() on user-controlled data +- React: dangerouslySetInnerHTML with user content +- Vue: v-html with user content +- Django: |safe, mark_safe() on user input +- General: innerHTML assignment with unsanitized data + +### Deserialization +- Deserializing untrusted data (pickle, Marshal, YAML.load, JSON.parse of executable types) +- Accepting serialized objects from user input or external APIs without schema validation diff --git a/review/specialists/testing.md b/review/specialists/testing.md new file mode 100644 index 00000000..a6076cf6 --- /dev/null +++ b/review/specialists/testing.md @@ -0,0 +1,45 @@ +# Testing Specialist Review Checklist + +Scope: Always-on (every review) +Output: JSON objects, one finding per line. Schema: +{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"testing","summary":"...","fix":"...","fingerprint":"path:line:testing","specialist":"testing"} +If no findings: output `NO FINDINGS` and nothing else. + +--- + +## Categories + +### Missing Negative-Path Tests +- New code paths that handle errors, rejections, or invalid input with NO corresponding test +- Guard clauses and early returns that are untested +- Error branches in try/catch, rescue, or error boundaries with no failure-path test +- Permission/auth checks that are asserted in code but never tested for the "denied" case + +### Missing Edge-Case Coverage +- Boundary values: zero, negative, max-int, empty string, empty array, nil/null/undefined +- Single-element collections (off-by-one on loops) +- Unicode and special characters in user-facing inputs +- Concurrent access patterns with no race-condition test + +### Test Isolation Violations +- Tests sharing mutable state (class variables, global singletons, DB records not cleaned up) +- Order-dependent tests (pass in sequence, fail when randomized) +- Tests that depend on system clock, timezone, or locale +- Tests that make real network calls instead of using stubs/mocks + +### Flaky Test Patterns +- Timing-dependent assertions (sleep, setTimeout, waitFor with tight timeouts) +- Assertions on ordering of unordered results (hash keys, Set iteration, async resolution order) +- Tests that depend on external services (APIs, databases) without fallback +- Randomized test data without seed control + +### Security Enforcement Tests Missing +- Auth/authz checks in controllers with no test for the "unauthorized" case +- Rate limiting logic with no test proving it actually blocks +- Input sanitization with no test for malicious input +- CSRF/CORS configuration with no integration test + +### Coverage Gaps +- New public methods/functions with zero test coverage +- Changed methods where existing tests only cover the old behavior, not the new branch +- Utility functions called from multiple places but tested only indirectly