mirror of
https://github.com/KeygraphHQ/shannon.git
synced 2026-02-12 09:12:50 +00:00
* refactor: modularize claude-executor and extract shared utilities
- Extract message handling into src/ai/message-handlers.ts with pure functions
- Extract output formatting into src/ai/output-formatters.ts
- Extract progress management into src/ai/progress-manager.ts
- Add audit-logger.ts with Null Object pattern for optional logging
- Add shared utilities: formatting.ts, file-io.ts, functional.ts
- Consolidate getPromptNameForAgent into src/types/agents.ts
* feat: add Claude Code custom commands for debug and review
* feat: add Temporal integration foundation (phase 1-2)
- Add Temporal SDK dependencies (@temporalio/client, worker, workflow, activity)
- Add shared types for pipeline state, metrics, and progress queries
- Add classifyErrorForTemporal() for retry behavior classification
- Add docker-compose for Temporal server with SQLite persistence
* feat: add Temporal activities for agent execution (phase 3)
- Add activities.ts with heartbeat loop, git checkpoint/rollback, and error classification
- Export runClaudePrompt, validateAgentOutput, ClaudePromptResult for Temporal use
- Track attempt number via Temporal Context for accurate audit logging
- Rollback git workspace before retry to ensure clean state
* feat: add Temporal workflow for 5-phase pipeline orchestration (phase 4)
* feat: add Temporal worker, client, and query tools (phase 5)
- Add worker.ts with workflow bundling and graceful shutdown
- Add client.ts CLI to start pipelines with progress polling
- Add query.ts CLI to inspect running workflow state
- Fix buffer overflow by truncating error messages and stack traces
- Skip git operations gracefully on non-git repositories
- Add kill.sh/start.sh dev scripts and Dockerfile.worker
* feat: fix Docker worker container setup
- Install uv instead of deprecated uvx package
- Add mcp-server and configs directories to container
- Mount target repo dynamically via TARGET_REPO env variable
* fix: add report assembly step to Temporal workflow
- Add assembleReportActivity to concatenate exploitation evidence files before report agent runs
- Call assembleFinalReport in workflow Phase 5 before runReportAgent
- Ensure deliverables directory exists before writing final report
- Simplify pipeline-testing report prompt to just prepend header
* refactor: consolidate Docker setup to root docker-compose.yml
* feat: improve Temporal client UX and env handling
- Change default to fire-and-forget (--wait flag to opt-in)
- Add splash screen and improve console output formatting
- Add .env to gitignore, remove from dockerignore for container access
- Add Taskfile for common development commands
* refactor: simplify session ID handling and improve Taskfile options
- Include hostname in workflow ID for better audit log organization
- Extract sanitizeHostname utility to audit/utils.ts for reuse
- Remove unused generateSessionLogPath and buildLogFilePath functions
- Simplify Taskfile with CONFIG/OUTPUT/CLEAN named parameters
* chore: add .env.example and simplify .gitignore
* docs: update README and CLAUDE.md for Temporal workflow usage
- Replace Docker CLI instructions with Task-based commands
- Add monitoring/stopping sections and workflow examples
- Document Temporal orchestration layer and troubleshooting
- Simplify file structure to key files overview
* refactor: replace Taskfile with bash CLI script
- Add shannon bash script with start/logs/query/stop/help commands
- Remove Taskfile.yml dependency (no longer requires Task installation)
- Update README.md and CLAUDE.md to use ./shannon commands
- Update client.ts output to show ./shannon commands
* docs: fix deliverable filename in README
* refactor: remove direct CLI and .shannon-store.json in favor of Temporal
- Delete src/shannon.ts direct CLI entry point (Temporal is now the only mode)
- Remove .shannon-store.json session lock (Temporal handles workflow deduplication)
- Remove broken scripts/export-metrics.js (imported non-existent function)
- Update package.json to remove main, start script, and bin entry
- Clean up CLAUDE.md and debug.md to remove obsolete references
* chore: remove licensing comments from prompt files to prevent leaking into actual prompts
* fix: resolve parallel workflow race conditions and retry logic bugs
- Fix save_deliverable race condition using closure pattern instead of global variable
- Fix error classification order so OutputValidationError matches before generic validation
- Fix ApplicationFailure re-classification bug by checking instanceof before re-throwing
- Add per-error-type retry limits (3 for output validation, 50 for billing)
- Add fast retry intervals for pipeline testing mode (10s vs 5min)
- Increase worker concurrent activities to 25 for parallel workflows
* refactor: pipeline vuln→exploit workflow for parallel execution
- Replace sync barrier between vuln/exploit phases with independent pipelines
- Each vuln type runs: vuln agent → queue check → conditional exploit
- Add checkExploitationQueue activity to skip exploits when no vulns found
- Use Promise.allSettled for graceful failure handling across pipelines
- Add PipelineSummary type for aggregated cost/duration/turns metrics
* fix: re-throw retryable errors in checkExploitationQueue
* fix: detect and retry on Claude Code spending cap errors
- Add spending cap pattern detection in detectApiError() with retryable error
- Add matching patterns to classifyErrorForTemporal() for proper Temporal retry
- Add defense-in-depth safeguard in runClaudePrompt() for $0 cost / low turn detection
- Add final sanity check in activities before declaring success
* fix: increase heartbeat timeout to prevent false worker-dead detection
Original 30s timeout was from POC spec assuming <5min activities. With
hour-long activities and multiple concurrent workflows sharing one worker,
resource contention causes event loop stalls exceeding 30s, triggering
false heartbeat timeouts. Increased to 10min (prod) and 5min (testing).
* fix: temporal db init
* fix: persist home dir
* feat: add per-workflow unified logging with ./shannon logs ID=<workflow-id>
- Add WorkflowLogger class for human-readable, per-workflow log files
- Create workflow.log in audit-logs/{workflowId}/ with phase, agent, tool, and LLM events
- Update ./shannon logs to require ID param and tail specific workflow log
- Add phase transition logging at workflow boundaries
- Include workflow completion summary with agent breakdown (duration, cost)
- Mount audit-logs volume in docker-compose for host access
---------
Co-authored-by: ezl-keygraph <ezhil@keygraph.io>
6.4 KiB
6.4 KiB
description
| description |
|---|
| Review code changes for Shannon-specific patterns, security, and common mistakes |
Review the current changes (staged or working directory) with focus on Shannon-specific patterns and common mistakes.
Step 1: Gather Changes
Run these commands to understand the scope:
git diff --stat HEAD
git diff HEAD
Step 2: Check Shannon-Specific Patterns
Error Handling (CRITICAL)
- All errors use PentestError - Never use raw
Error. Usenew PentestError(message, type, retryable, context) - Error type is appropriate - Use correct type: 'config', 'network', 'tool', 'prompt', 'filesystem', 'validation', 'billing', 'unknown'
- Retryable flag matches behavior - If error will be retried, set
retryable: true - Context includes debugging info - Add relevant paths, tool names, error codes to context object
- Never swallow errors silently - Always log or propagate errors
Audit System & Concurrency (CRITICAL)
- Mutex protection for parallel operations - Use
sessionMutex.lock()when updatingsession.jsonduring parallel agent execution - Reload before modify - Always call
this.metricsTracker.reload()before updating metrics in mutex block - Atomic writes for session.json - Use
atomicWrite()for session metadata, neverfs.writeFile()directly - Stream drain handling - Log writes must wait for buffer drain before resolving
- Semaphore release in finally - Git semaphore must be released in
finallyblock
Claude SDK Integration (CRITICAL)
- MCP server configuration - Verify Playwright MCP uses
--isolatedand unique--user-data-dir - Prompt variable interpolation - Check all
{{VARIABLE}}placeholders are replaced - Turn counting - Increment
turnCounton assistant messages, not tool calls - Cost tracking - Extract cost from final
resultmessage, track even on failure - API error detection - Check for "session limit reached" (fatal) vs other errors
Configuration & Validation (CRITICAL)
- FAILSAFE_SCHEMA for YAML - Never use default schema (prevents code execution)
- Security pattern detection - Check for path traversal (
../), HTML injection (<>), JavaScript URLs - Rule conflict detection - Rules cannot appear in both
avoidANDfocus - Duplicate rule detection - Same
type:url_pathcannot appear twice - JSON Schema validation before use - Config must pass AJV validation
Session & Agent Management (CRITICAL)
- Deliverable dependencies respected - Exploitation agents only run if vulnerability queue exists AND has items
- Queue validation before exploitation - Use
safeValidateQueueAndDeliverable()to check eligibility - Git checkpoint before agent run - Create checkpoint for rollback on failure
- Git rollback on retry - Call
rollbackGitWorkspace()before each retry attempt - Agent prerequisites checked - Verify prerequisite agents completed before running dependent agent
Parallel Execution
- Promise.allSettled for parallel agents - Never use
Promise.all(partial failures should not crash batch) - Staggered startup - 2-second delay between parallel agent starts to prevent API throttle
- Individual retry loops - Each agent retries independently (3 attempts max)
- Results aggregated correctly - Handle both 'fulfilled' and 'rejected' results from
Promise.allSettled
Step 3: TypeScript Safety
Type Assertions (WARNING)
- No double casting - Never use
as unknown as SomeType(bypasses type safety) - Validate before casting - JSON parsed data should be validated (JSON Schema) before
as Type - Prefer type guards - Use
instanceofor property checks instead of assertions where possible
Null/Undefined Handling
- Explicit null checks - Use
if (x === null || x === undefined)not truthy checks for critical paths - Nullish coalescing - Use
??for null/undefined, not||which also catches empty string/0 - Optional chaining - Use
?.for nested property access on potentially undefined objects
Imports & Types
- Type imports - Use
import type { ... }for type-only imports - No implicit any - All function parameters and returns must have explicit types
- Readonly for constants - Use
Object.freeze()andReadonly<>for immutable data
Step 4: Security Review
Defensive Tool Security
- No credentials in logs - Check that passwords, tokens, TOTP secrets are not logged to audit files
- Config file size limit - Ensure 1MB max for config files (DoS prevention)
- Safe shell execution - Command arguments must be escaped/sanitized
Code Injection Prevention
- YAML safe parsing - FAILSAFE_SCHEMA only
- No eval/Function - Never use dynamic code evaluation
- Input validation at boundaries - URLs, paths validated before use
Step 5: Common Mistakes to Avoid
Anti-Patterns Found in Codebase
- Catch + re-throw without context - Don't just
throw error, wrap with additional context - Silent failures in session loading - Corrupted session files should warn user, not silently reset
- Duplicate retry logic - Don't implement retry at both caller and callee level
- Hardcoded error message matching - Prefer error codes over regex on error.message
- Missing timeout on long operations - Git operations and API calls should have timeouts
Code Quality
- No dead code added - Remove unused imports, functions, variables
- No over-engineering - Don't add abstractions for single-use operations
- Comments only where needed - Self-documenting code preferred over excessive comments
- Consistent file naming - kebab-case for files (e.g.,
queue-validation.ts)
Step 6: Provide Feedback
For each issue found:
- Location: File and line number
- Issue: What's wrong and why it matters
- Fix: How to correct it (with code example if helpful)
- Severity: Critical / Warning / Suggestion
Severity Definitions
- Critical: Will cause bugs, crashes, data loss, or security issues
- Warning: Code smell, inconsistent pattern, or potential future issue
- Suggestion: Style improvement or minor enhancement
Summarize with:
- Total issues by severity
- Overall assessment (Ready to commit / Needs fixes / Needs discussion)
Now review the current changes.