mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-05 05:05:08 +02:00
feat: add 7 specialist checklist files for Review Army
- testing.md (always-on): coverage gaps, flaky patterns, security enforcement - maintainability.md (always-on): dead code, DRY, stale comments - security.md (conditional): OWASP deep analysis, auth bypass, injection - performance.md (conditional): N+1 queries, bundle impact, complexity - data-migration.md (conditional): reversibility, lock duration, backfill - api-contract.md (conditional): breaking changes, versioning, error format - red-team.md (conditional): adversarial analysis, cross-cutting concerns All use standard header with JSON output schema and NO FINDINGS fallback.
This commit is contained in:
@@ -0,0 +1,48 @@
|
||||
# API Contract Specialist Review Checklist
|
||||
|
||||
Scope: When SCOPE_API=true
|
||||
Output: JSON objects, one finding per line. Schema:
|
||||
{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"api-contract","summary":"...","fix":"...","fingerprint":"path:line:api-contract","specialist":"api-contract"}
|
||||
If no findings: output `NO FINDINGS` and nothing else.
|
||||
|
||||
---
|
||||
|
||||
## Categories
|
||||
|
||||
### Breaking Changes
|
||||
- Removed fields from response bodies (clients may depend on them)
|
||||
- Changed field types (string → number, object → array)
|
||||
- New required parameters added to existing endpoints
|
||||
- Changed HTTP methods (GET → POST) or status codes (200 → 201)
|
||||
- Renamed endpoints without maintaining the old path as a redirect/alias
|
||||
- Changed authentication requirements (public → authenticated)
|
||||
|
||||
### Versioning Strategy
|
||||
- Breaking changes made without a version bump (v1 → v2)
|
||||
- Multiple versioning strategies mixed in the same API (URL vs header vs query param)
|
||||
- Deprecated endpoints without a sunset timeline or migration guide
|
||||
- Version-specific logic scattered across controllers instead of centralized
|
||||
|
||||
### Error Response Consistency
|
||||
- New endpoints returning different error formats than existing ones
|
||||
- Error responses missing standard fields (error code, message, details)
|
||||
- HTTP status codes that don't match the error type (200 for errors, 500 for validation)
|
||||
- Error messages that leak internal implementation details (stack traces, SQL)
|
||||
|
||||
### Rate Limiting & Pagination
|
||||
- New endpoints missing rate limiting when similar endpoints have it
|
||||
- Pagination changes (offset → cursor) without backwards compatibility
|
||||
- Changed page sizes or default limits without documentation
|
||||
- Missing total count or next-page indicators in paginated responses
|
||||
|
||||
### Documentation Drift
|
||||
- OpenAPI/Swagger spec not updated to match new endpoints or changed params
|
||||
- README or API docs describing old behavior after changes
|
||||
- Example requests/responses that no longer work
|
||||
- Missing documentation for new endpoints or changed parameters
|
||||
|
||||
### Backwards Compatibility
|
||||
- Clients on older versions: will they break?
|
||||
- Mobile apps that can't force-update: does the API still work for them?
|
||||
- Webhook payloads changed without notifying subscribers
|
||||
- SDK or client library changes needed to use new features
|
||||
@@ -0,0 +1,47 @@
|
||||
# Data Migration Specialist Review Checklist
|
||||
|
||||
Scope: When SCOPE_MIGRATIONS=true
|
||||
Output: JSON objects, one finding per line. Schema:
|
||||
{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"data-migration","summary":"...","fix":"...","fingerprint":"path:line:data-migration","specialist":"data-migration"}
|
||||
If no findings: output `NO FINDINGS` and nothing else.
|
||||
|
||||
---
|
||||
|
||||
## Categories
|
||||
|
||||
### Reversibility
|
||||
- Can this migration be rolled back without data loss?
|
||||
- Is there a corresponding down/rollback migration?
|
||||
- Does the rollback actually undo the change or just no-op?
|
||||
- Would rolling back break the current application code?
|
||||
|
||||
### Data Loss Risk
|
||||
- Dropping columns that still contain data (add deprecation period first)
|
||||
- Changing column types that truncate data (varchar(255) → varchar(50))
|
||||
- Removing tables without verifying no code references them
|
||||
- Renaming columns without updating all references (ORM, raw SQL, views)
|
||||
- NOT NULL constraints added to columns with existing NULL values (needs backfill first)
|
||||
|
||||
### Lock Duration
|
||||
- ALTER TABLE on large tables without CONCURRENTLY (PostgreSQL)
|
||||
- Adding indexes without CONCURRENTLY on tables with >100K rows
|
||||
- Multiple ALTER TABLE statements that could be combined into one lock acquisition
|
||||
- Schema changes that acquire exclusive locks during peak traffic hours
|
||||
|
||||
### Backfill Strategy
|
||||
- New NOT NULL columns without DEFAULT value (requires backfill before constraint)
|
||||
- New columns with computed defaults that need batch population
|
||||
- Missing backfill script or rake task for existing records
|
||||
- Backfill that updates all rows at once instead of batching (locks table)
|
||||
|
||||
### Index Creation
|
||||
- CREATE INDEX without CONCURRENTLY on production tables
|
||||
- Duplicate indexes (new index covers same columns as existing one)
|
||||
- Missing indexes on new foreign key columns
|
||||
- Partial indexes where a full index would be more useful (or vice versa)
|
||||
|
||||
### Multi-Phase Safety
|
||||
- Migrations that must be deployed in a specific order with application code
|
||||
- Schema changes that break the current running code (deploy code first, then migrate)
|
||||
- Migrations that assume a deploy boundary (old code + new schema = crash)
|
||||
- Missing feature flag to handle mixed old/new code during rolling deploy
|
||||
@@ -0,0 +1,45 @@
|
||||
# Maintainability Specialist Review Checklist
|
||||
|
||||
Scope: Always-on (every review)
|
||||
Output: JSON objects, one finding per line. Schema:
|
||||
{"severity":"INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"maintainability","summary":"...","fix":"...","fingerprint":"path:line:maintainability","specialist":"maintainability"}
|
||||
If no findings: output `NO FINDINGS` and nothing else.
|
||||
|
||||
---
|
||||
|
||||
## Categories
|
||||
|
||||
### Dead Code & Unused Imports
|
||||
- Variables assigned but never read in the changed files
|
||||
- Functions/methods defined but never called (check with Grep across the repo)
|
||||
- Imports/requires that are no longer referenced after the change
|
||||
- Commented-out code blocks (either remove or explain why they exist)
|
||||
|
||||
### Magic Numbers & String Coupling
|
||||
- Bare numeric literals used in logic (thresholds, limits, retry counts) — should be named constants
|
||||
- Error message strings used as query filters or conditionals elsewhere
|
||||
- Hardcoded URLs, ports, or hostnames that should be config
|
||||
- Duplicated literal values across multiple files
|
||||
|
||||
### Stale Comments & Docstrings
|
||||
- Comments that describe old behavior after the code was changed in this diff
|
||||
- TODO/FIXME comments that reference completed work
|
||||
- Docstrings with parameter lists that don't match the current function signature
|
||||
- ASCII diagrams in comments that no longer match the code flow
|
||||
|
||||
### DRY Violations
|
||||
- Similar code blocks (3+ lines) appearing multiple times within the diff
|
||||
- Copy-paste patterns where a shared helper would be cleaner
|
||||
- Configuration or setup logic duplicated across test files
|
||||
- Repeated conditional chains that could be a lookup table or map
|
||||
|
||||
### Conditional Side Effects
|
||||
- Code paths that branch on a condition but forget a side effect on one branch
|
||||
- Log messages that claim an action happened but the action was conditionally skipped
|
||||
- State transitions where one branch updates related records but the other doesn't
|
||||
- Event emissions that only fire on the happy path, missing error/edge paths
|
||||
|
||||
### Module Boundary Violations
|
||||
- Reaching into another module's internal implementation (accessing private-by-convention methods)
|
||||
- Direct database queries in controllers/views that should go through a service/model
|
||||
- Tight coupling between components that should communicate through interfaces
|
||||
@@ -0,0 +1,51 @@
|
||||
# Performance Specialist Review Checklist
|
||||
|
||||
Scope: When SCOPE_BACKEND=true OR SCOPE_FRONTEND=true
|
||||
Output: JSON objects, one finding per line. Schema:
|
||||
{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"performance","summary":"...","fix":"...","fingerprint":"path:line:performance","specialist":"performance"}
|
||||
If no findings: output `NO FINDINGS` and nothing else.
|
||||
|
||||
---
|
||||
|
||||
## Categories
|
||||
|
||||
### N+1 Queries
|
||||
- ActiveRecord/ORM associations traversed in loops without eager loading (.includes, joinedload, include)
|
||||
- Database queries inside iteration blocks (each, map, forEach) that could be batched
|
||||
- Nested serializers that trigger lazy-loaded associations
|
||||
- GraphQL resolvers that query per-field instead of batching (check for DataLoader usage)
|
||||
|
||||
### Missing Database Indexes
|
||||
- New WHERE clauses on columns without indexes (check migration files or schema)
|
||||
- New ORDER BY on non-indexed columns
|
||||
- Composite queries (WHERE a AND b) without composite indexes
|
||||
- Foreign key columns added without indexes
|
||||
|
||||
### Algorithmic Complexity
|
||||
- O(n^2) or worse patterns: nested loops over collections, Array.find inside Array.map
|
||||
- Repeated linear searches that could use a hash/map/set lookup
|
||||
- String concatenation in loops (use join or StringBuilder)
|
||||
- Sorting or filtering large collections multiple times when once would suffice
|
||||
|
||||
### Bundle Size Impact (Frontend)
|
||||
- New production dependencies that are known-heavy (moment.js, lodash full, jquery)
|
||||
- Barrel imports (import from 'library') instead of deep imports (import from 'library/specific')
|
||||
- Large static assets (images, fonts) committed without optimization
|
||||
- Missing code splitting for route-level chunks
|
||||
|
||||
### Rendering Performance (Frontend)
|
||||
- Fetch waterfalls: sequential API calls that could be parallel (Promise.all)
|
||||
- Unnecessary re-renders from unstable references (new objects/arrays in render)
|
||||
- Missing React.memo, useMemo, or useCallback on expensive computations
|
||||
- Layout thrashing from reading then writing DOM properties in loops
|
||||
- Missing loading="lazy" on below-fold images
|
||||
|
||||
### Missing Pagination
|
||||
- List endpoints that return unbounded results (no LIMIT, no pagination params)
|
||||
- Database queries without LIMIT that grow with data volume
|
||||
- API responses that embed full nested objects instead of IDs with expansion
|
||||
|
||||
### Blocking in Async Contexts
|
||||
- Synchronous I/O (file reads, subprocess, HTTP requests) inside async functions
|
||||
- time.sleep() / Thread.sleep() inside event-loop-based handlers
|
||||
- CPU-intensive computation blocking the main thread without worker offload
|
||||
@@ -0,0 +1,44 @@
|
||||
# Red Team Review
|
||||
|
||||
Scope: When diff > 200 lines OR security specialist found CRITICAL findings. Runs AFTER other specialists.
|
||||
Output: JSON objects, one finding per line. Schema:
|
||||
{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"red-team","summary":"...","fix":"...","fingerprint":"path:line:red-team","specialist":"red-team"}
|
||||
If no findings: output `NO FINDINGS` and nothing else.
|
||||
|
||||
---
|
||||
|
||||
This is NOT a checklist review. This is adversarial analysis.
|
||||
|
||||
You have access to the other specialists' findings (provided in your prompt). Your job is to find what they MISSED. Think like an attacker, a chaos engineer, and a hostile QA tester simultaneously.
|
||||
|
||||
## Approach
|
||||
|
||||
### 1. Attack the Happy Path
|
||||
- What happens when the system is under 10x normal load?
|
||||
- What happens when two requests hit the same resource simultaneously?
|
||||
- What happens when the database is slow (>5s query time)?
|
||||
- What happens when an external service returns garbage?
|
||||
|
||||
### 2. Find the Silent Failures
|
||||
- Error handling that swallows exceptions (catch-all with just a log)
|
||||
- Operations that can partially complete (3 of 5 items processed, then crash)
|
||||
- State transitions that leave records in inconsistent states on failure
|
||||
- Background jobs that fail without alerting anyone
|
||||
|
||||
### 3. Exploit Trust Assumptions
|
||||
- Data validated on the frontend but not the backend
|
||||
- Internal APIs called without authentication (assuming "only our code calls this")
|
||||
- Configuration values assumed to be present but not validated
|
||||
- File paths or URLs constructed from user input without sanitization
|
||||
|
||||
### 4. Break the Edge Cases
|
||||
- What happens with the maximum possible input size?
|
||||
- What happens with zero items, empty strings, null values?
|
||||
- What happens on the first run ever (no existing data)?
|
||||
- What happens when the user clicks the button twice in 100ms?
|
||||
|
||||
### 5. Find What the Other Specialists Missed
|
||||
- Review each specialist's findings. What's the gap between their categories?
|
||||
- Look for cross-category issues (e.g., a performance issue that's also a security issue)
|
||||
- Look for issues at integration boundaries (where two systems meet)
|
||||
- Look for issues that only manifest in specific deployment configurations
|
||||
@@ -0,0 +1,60 @@
|
||||
# Security Specialist Review Checklist
|
||||
|
||||
Scope: When SCOPE_AUTH=true OR (SCOPE_BACKEND=true AND diff > 100 lines)
|
||||
Output: JSON objects, one finding per line. Schema:
|
||||
{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"security","summary":"...","fix":"...","fingerprint":"path:line:security","specialist":"security"}
|
||||
If no findings: output `NO FINDINGS` and nothing else.
|
||||
|
||||
---
|
||||
|
||||
This checklist goes deeper than the main CRITICAL pass. The main agent already checks SQL injection, race conditions, LLM trust, and enum completeness. This specialist focuses on auth/authz patterns, cryptographic misuse, and attack surface expansion.
|
||||
|
||||
## Categories
|
||||
|
||||
### Input Validation at Trust Boundaries
|
||||
- User input accepted without validation at controller/handler level
|
||||
- Query parameters used directly in database queries or file paths
|
||||
- Request body fields accepted without type checking or schema validation
|
||||
- File uploads without type/size/content validation
|
||||
- Webhook payloads processed without signature verification
|
||||
|
||||
### Auth & Authorization Bypass
|
||||
- Endpoints missing authentication middleware (check route definitions)
|
||||
- Authorization checks that default to "allow" instead of "deny"
|
||||
- Role escalation paths (user can modify their own role/permissions)
|
||||
- Direct object reference vulnerabilities (user A accesses user B's data by changing an ID)
|
||||
- Session fixation or session hijacking opportunities
|
||||
- Token/API key validation that doesn't check expiration
|
||||
|
||||
### Injection Vectors (beyond SQL)
|
||||
- Command injection via subprocess calls with user-controlled arguments
|
||||
- Template injection (Jinja2, ERB, Handlebars) with user input
|
||||
- LDAP injection in directory queries
|
||||
- SSRF via user-controlled URLs (fetch, redirect, webhook targets)
|
||||
- Path traversal via user-controlled file paths (../../etc/passwd)
|
||||
- Header injection via user-controlled values in HTTP headers
|
||||
|
||||
### Cryptographic Misuse
|
||||
- Weak hashing algorithms (MD5, SHA1) for security-sensitive operations
|
||||
- Predictable randomness (Math.random, rand()) for tokens or secrets
|
||||
- Non-constant-time comparisons (==) on secrets, tokens, or digests
|
||||
- Hardcoded encryption keys or IVs
|
||||
- Missing salt in password hashing
|
||||
|
||||
### Secrets Exposure
|
||||
- API keys, tokens, or passwords in source code (even in comments)
|
||||
- Secrets logged in application logs or error messages
|
||||
- Credentials in URLs (query parameters or basic auth in URL)
|
||||
- Sensitive data in error responses returned to users
|
||||
- PII stored in plaintext when encryption is expected
|
||||
|
||||
### XSS via Escape Hatches
|
||||
- Rails: .html_safe, raw() on user-controlled data
|
||||
- React: dangerouslySetInnerHTML with user content
|
||||
- Vue: v-html with user content
|
||||
- Django: |safe, mark_safe() on user input
|
||||
- General: innerHTML assignment with unsanitized data
|
||||
|
||||
### Deserialization
|
||||
- Deserializing untrusted data (pickle, Marshal, YAML.load, JSON.parse of executable types)
|
||||
- Accepting serialized objects from user input or external APIs without schema validation
|
||||
@@ -0,0 +1,45 @@
|
||||
# Testing Specialist Review Checklist
|
||||
|
||||
Scope: Always-on (every review)
|
||||
Output: JSON objects, one finding per line. Schema:
|
||||
{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"testing","summary":"...","fix":"...","fingerprint":"path:line:testing","specialist":"testing"}
|
||||
If no findings: output `NO FINDINGS` and nothing else.
|
||||
|
||||
---
|
||||
|
||||
## Categories
|
||||
|
||||
### Missing Negative-Path Tests
|
||||
- New code paths that handle errors, rejections, or invalid input with NO corresponding test
|
||||
- Guard clauses and early returns that are untested
|
||||
- Error branches in try/catch, rescue, or error boundaries with no failure-path test
|
||||
- Permission/auth checks that are asserted in code but never tested for the "denied" case
|
||||
|
||||
### Missing Edge-Case Coverage
|
||||
- Boundary values: zero, negative, max-int, empty string, empty array, nil/null/undefined
|
||||
- Single-element collections (off-by-one on loops)
|
||||
- Unicode and special characters in user-facing inputs
|
||||
- Concurrent access patterns with no race-condition test
|
||||
|
||||
### Test Isolation Violations
|
||||
- Tests sharing mutable state (class variables, global singletons, DB records not cleaned up)
|
||||
- Order-dependent tests (pass in sequence, fail when randomized)
|
||||
- Tests that depend on system clock, timezone, or locale
|
||||
- Tests that make real network calls instead of using stubs/mocks
|
||||
|
||||
### Flaky Test Patterns
|
||||
- Timing-dependent assertions (sleep, setTimeout, waitFor with tight timeouts)
|
||||
- Assertions on ordering of unordered results (hash keys, Set iteration, async resolution order)
|
||||
- Tests that depend on external services (APIs, databases) without fallback
|
||||
- Randomized test data without seed control
|
||||
|
||||
### Security Enforcement Tests Missing
|
||||
- Auth/authz checks in controllers with no test for the "unauthorized" case
|
||||
- Rate limiting logic with no test proving it actually blocks
|
||||
- Input sanitization with no test for malicious input
|
||||
- CSRF/CORS configuration with no integration test
|
||||
|
||||
### Coverage Gaps
|
||||
- New public methods/functions with zero test coverage
|
||||
- Changed methods where existing tests only cover the old behavior, not the new branch
|
||||
- Utility functions called from multiple places but tested only indirectly
|
||||
Reference in New Issue
Block a user