Files
shannon/.claude/commands/review.md
Arjun Malleswaran 78a0a61208 Feat/temporal (#46)
* refactor: modularize claude-executor and extract shared utilities

- Extract message handling into src/ai/message-handlers.ts with pure functions
- Extract output formatting into src/ai/output-formatters.ts
- Extract progress management into src/ai/progress-manager.ts
- Add audit-logger.ts with Null Object pattern for optional logging
- Add shared utilities: formatting.ts, file-io.ts, functional.ts
- Consolidate getPromptNameForAgent into src/types/agents.ts

* feat: add Claude Code custom commands for debug and review

* feat: add Temporal integration foundation (phase 1-2)

- Add Temporal SDK dependencies (@temporalio/client, worker, workflow, activity)
- Add shared types for pipeline state, metrics, and progress queries
- Add classifyErrorForTemporal() for retry behavior classification
- Add docker-compose for Temporal server with SQLite persistence

* feat: add Temporal activities for agent execution (phase 3)

- Add activities.ts with heartbeat loop, git checkpoint/rollback, and error classification
- Export runClaudePrompt, validateAgentOutput, ClaudePromptResult for Temporal use
- Track attempt number via Temporal Context for accurate audit logging
- Rollback git workspace before retry to ensure clean state

* feat: add Temporal workflow for 5-phase pipeline orchestration (phase 4)

* feat: add Temporal worker, client, and query tools (phase 5)

- Add worker.ts with workflow bundling and graceful shutdown
- Add client.ts CLI to start pipelines with progress polling
- Add query.ts CLI to inspect running workflow state
- Fix buffer overflow by truncating error messages and stack traces
- Skip git operations gracefully on non-git repositories
- Add kill.sh/start.sh dev scripts and Dockerfile.worker

* feat: fix Docker worker container setup

- Install uv instead of deprecated uvx package
- Add mcp-server and configs directories to container
- Mount target repo dynamically via TARGET_REPO env variable

* fix: add report assembly step to Temporal workflow

- Add assembleReportActivity to concatenate exploitation evidence files before report agent runs
- Call assembleFinalReport in workflow Phase 5 before runReportAgent
- Ensure deliverables directory exists before writing final report
- Simplify pipeline-testing report prompt to just prepend header

* refactor: consolidate Docker setup to root docker-compose.yml

* feat: improve Temporal client UX and env handling

- Change default to fire-and-forget (--wait flag to opt-in)
- Add splash screen and improve console output formatting
- Add .env to gitignore, remove from dockerignore for container access
- Add Taskfile for common development commands

* refactor: simplify session ID handling and improve Taskfile options

- Include hostname in workflow ID for better audit log organization
- Extract sanitizeHostname utility to audit/utils.ts for reuse
- Remove unused generateSessionLogPath and buildLogFilePath functions
- Simplify Taskfile with CONFIG/OUTPUT/CLEAN named parameters

* chore: add .env.example and simplify .gitignore

* docs: update README and CLAUDE.md for Temporal workflow usage

- Replace Docker CLI instructions with Task-based commands
- Add monitoring/stopping sections and workflow examples
- Document Temporal orchestration layer and troubleshooting
- Simplify file structure to key files overview

* refactor: replace Taskfile with bash CLI script

- Add shannon bash script with start/logs/query/stop/help commands
- Remove Taskfile.yml dependency (no longer requires Task installation)
- Update README.md and CLAUDE.md to use ./shannon commands
- Update client.ts output to show ./shannon commands

* docs: fix deliverable filename in README

* refactor: remove direct CLI and .shannon-store.json in favor of Temporal

- Delete src/shannon.ts direct CLI entry point (Temporal is now the only mode)
- Remove .shannon-store.json session lock (Temporal handles workflow deduplication)
- Remove broken scripts/export-metrics.js (imported non-existent function)
- Update package.json to remove main, start script, and bin entry
- Clean up CLAUDE.md and debug.md to remove obsolete references

* chore: remove licensing comments from prompt files to prevent leaking into actual prompts

* fix: resolve parallel workflow race conditions and retry logic bugs

- Fix save_deliverable race condition using closure pattern instead of global variable
- Fix error classification order so OutputValidationError matches before generic validation
- Fix ApplicationFailure re-classification bug by checking instanceof before re-throwing
- Add per-error-type retry limits (3 for output validation, 50 for billing)
- Add fast retry intervals for pipeline testing mode (10s vs 5min)
- Increase worker concurrent activities to 25 for parallel workflows

* refactor: pipeline vuln→exploit workflow for parallel execution

- Replace sync barrier between vuln/exploit phases with independent pipelines
- Each vuln type runs: vuln agent → queue check → conditional exploit
- Add checkExploitationQueue activity to skip exploits when no vulns found
- Use Promise.allSettled for graceful failure handling across pipelines
- Add PipelineSummary type for aggregated cost/duration/turns metrics

* fix: re-throw retryable errors in checkExploitationQueue

* fix: detect and retry on Claude Code spending cap errors

- Add spending cap pattern detection in detectApiError() with retryable error
- Add matching patterns to classifyErrorForTemporal() for proper Temporal retry
- Add defense-in-depth safeguard in runClaudePrompt() for $0 cost / low turn detection
- Add final sanity check in activities before declaring success

* fix: increase heartbeat timeout to prevent false worker-dead detection

Original 30s timeout was from POC spec assuming <5min activities. With
hour-long activities and multiple concurrent workflows sharing one worker,
resource contention causes event loop stalls exceeding 30s, triggering
false heartbeat timeouts. Increased to 10min (prod) and 5min (testing).

* fix: temporal db init

* fix: persist home dir

* feat: add per-workflow unified logging with ./shannon logs ID=<workflow-id>

- Add WorkflowLogger class for human-readable, per-workflow log files
- Create workflow.log in audit-logs/{workflowId}/ with phase, agent, tool, and LLM events
- Update ./shannon logs to require ID param and tail specific workflow log
- Add phase transition logging at workflow boundaries
- Include workflow completion summary with agent breakdown (duration, cost)
- Mount audit-logs volume in docker-compose for host access

---------

Co-authored-by: ezl-keygraph <ezhil@keygraph.io>
2026-01-15 10:36:11 -08:00

6.4 KiB

description
description
Review code changes for Shannon-specific patterns, security, and common mistakes

Review the current changes (staged or working directory) with focus on Shannon-specific patterns and common mistakes.

Step 1: Gather Changes

Run these commands to understand the scope:

git diff --stat HEAD
git diff HEAD

Step 2: Check Shannon-Specific Patterns

Error Handling (CRITICAL)

  • All errors use PentestError - Never use raw Error. Use new PentestError(message, type, retryable, context)
  • Error type is appropriate - Use correct type: 'config', 'network', 'tool', 'prompt', 'filesystem', 'validation', 'billing', 'unknown'
  • Retryable flag matches behavior - If error will be retried, set retryable: true
  • Context includes debugging info - Add relevant paths, tool names, error codes to context object
  • Never swallow errors silently - Always log or propagate errors

Audit System & Concurrency (CRITICAL)

  • Mutex protection for parallel operations - Use sessionMutex.lock() when updating session.json during parallel agent execution
  • Reload before modify - Always call this.metricsTracker.reload() before updating metrics in mutex block
  • Atomic writes for session.json - Use atomicWrite() for session metadata, never fs.writeFile() directly
  • Stream drain handling - Log writes must wait for buffer drain before resolving
  • Semaphore release in finally - Git semaphore must be released in finally block

Claude SDK Integration (CRITICAL)

  • MCP server configuration - Verify Playwright MCP uses --isolated and unique --user-data-dir
  • Prompt variable interpolation - Check all {{VARIABLE}} placeholders are replaced
  • Turn counting - Increment turnCount on assistant messages, not tool calls
  • Cost tracking - Extract cost from final result message, track even on failure
  • API error detection - Check for "session limit reached" (fatal) vs other errors

Configuration & Validation (CRITICAL)

  • FAILSAFE_SCHEMA for YAML - Never use default schema (prevents code execution)
  • Security pattern detection - Check for path traversal (../), HTML injection (<>), JavaScript URLs
  • Rule conflict detection - Rules cannot appear in both avoid AND focus
  • Duplicate rule detection - Same type:url_path cannot appear twice
  • JSON Schema validation before use - Config must pass AJV validation

Session & Agent Management (CRITICAL)

  • Deliverable dependencies respected - Exploitation agents only run if vulnerability queue exists AND has items
  • Queue validation before exploitation - Use safeValidateQueueAndDeliverable() to check eligibility
  • Git checkpoint before agent run - Create checkpoint for rollback on failure
  • Git rollback on retry - Call rollbackGitWorkspace() before each retry attempt
  • Agent prerequisites checked - Verify prerequisite agents completed before running dependent agent

Parallel Execution

  • Promise.allSettled for parallel agents - Never use Promise.all (partial failures should not crash batch)
  • Staggered startup - 2-second delay between parallel agent starts to prevent API throttle
  • Individual retry loops - Each agent retries independently (3 attempts max)
  • Results aggregated correctly - Handle both 'fulfilled' and 'rejected' results from Promise.allSettled

Step 3: TypeScript Safety

Type Assertions (WARNING)

  • No double casting - Never use as unknown as SomeType (bypasses type safety)
  • Validate before casting - JSON parsed data should be validated (JSON Schema) before as Type
  • Prefer type guards - Use instanceof or property checks instead of assertions where possible

Null/Undefined Handling

  • Explicit null checks - Use if (x === null || x === undefined) not truthy checks for critical paths
  • Nullish coalescing - Use ?? for null/undefined, not || which also catches empty string/0
  • Optional chaining - Use ?. for nested property access on potentially undefined objects

Imports & Types

  • Type imports - Use import type { ... } for type-only imports
  • No implicit any - All function parameters and returns must have explicit types
  • Readonly for constants - Use Object.freeze() and Readonly<> for immutable data

Step 4: Security Review

Defensive Tool Security

  • No credentials in logs - Check that passwords, tokens, TOTP secrets are not logged to audit files
  • Config file size limit - Ensure 1MB max for config files (DoS prevention)
  • Safe shell execution - Command arguments must be escaped/sanitized

Code Injection Prevention

  • YAML safe parsing - FAILSAFE_SCHEMA only
  • No eval/Function - Never use dynamic code evaluation
  • Input validation at boundaries - URLs, paths validated before use

Step 5: Common Mistakes to Avoid

Anti-Patterns Found in Codebase

  • Catch + re-throw without context - Don't just throw error, wrap with additional context
  • Silent failures in session loading - Corrupted session files should warn user, not silently reset
  • Duplicate retry logic - Don't implement retry at both caller and callee level
  • Hardcoded error message matching - Prefer error codes over regex on error.message
  • Missing timeout on long operations - Git operations and API calls should have timeouts

Code Quality

  • No dead code added - Remove unused imports, functions, variables
  • No over-engineering - Don't add abstractions for single-use operations
  • Comments only where needed - Self-documenting code preferred over excessive comments
  • Consistent file naming - kebab-case for files (e.g., queue-validation.ts)

Step 6: Provide Feedback

For each issue found:

  1. Location: File and line number
  2. Issue: What's wrong and why it matters
  3. Fix: How to correct it (with code example if helpful)
  4. Severity: Critical / Warning / Suggestion

Severity Definitions

  • Critical: Will cause bugs, crashes, data loss, or security issues
  • Warning: Code smell, inconsistent pattern, or potential future issue
  • Suggestion: Style improvement or minor enhancement

Summarize with:

  • Total issues by severity
  • Overall assessment (Ready to commit / Needs fixes / Needs discussion)

Now review the current changes.