shannon/CLAUDE.md
Arjun Malleswaran 78a0a61208 Feat/temporal (#46)
* refactor: modularize claude-executor and extract shared utilities

- Extract message handling into src/ai/message-handlers.ts with pure functions
- Extract output formatting into src/ai/output-formatters.ts
- Extract progress management into src/ai/progress-manager.ts
- Add audit-logger.ts with Null Object pattern for optional logging
- Add shared utilities: formatting.ts, file-io.ts, functional.ts
- Consolidate getPromptNameForAgent into src/types/agents.ts

* feat: add Claude Code custom commands for debug and review

* feat: add Temporal integration foundation (phase 1-2)

- Add Temporal SDK dependencies (@temporalio/client, worker, workflow, activity)
- Add shared types for pipeline state, metrics, and progress queries
- Add classifyErrorForTemporal() for retry behavior classification
- Add docker-compose for Temporal server with SQLite persistence

* feat: add Temporal activities for agent execution (phase 3)

- Add activities.ts with heartbeat loop, git checkpoint/rollback, and error classification
- Export runClaudePrompt, validateAgentOutput, ClaudePromptResult for Temporal use
- Track attempt number via Temporal Context for accurate audit logging
- Rollback git workspace before retry to ensure clean state

* feat: add Temporal workflow for 5-phase pipeline orchestration (phase 4)

* feat: add Temporal worker, client, and query tools (phase 5)

- Add worker.ts with workflow bundling and graceful shutdown
- Add client.ts CLI to start pipelines with progress polling
- Add query.ts CLI to inspect running workflow state
- Fix buffer overflow by truncating error messages and stack traces
- Skip git operations gracefully on non-git repositories
- Add kill.sh/start.sh dev scripts and Dockerfile.worker

* feat: fix Docker worker container setup

- Install uv instead of deprecated uvx package
- Add mcp-server and configs directories to container
- Mount target repo dynamically via TARGET_REPO env variable

* fix: add report assembly step to Temporal workflow

- Add assembleReportActivity to concatenate exploitation evidence files before report agent runs
- Call assembleFinalReport in workflow Phase 5 before runReportAgent
- Ensure deliverables directory exists before writing final report
- Simplify pipeline-testing report prompt to just prepend header

* refactor: consolidate Docker setup to root docker-compose.yml

* feat: improve Temporal client UX and env handling

- Change default to fire-and-forget (--wait flag to opt-in)
- Add splash screen and improve console output formatting
- Add .env to gitignore, remove from dockerignore for container access
- Add Taskfile for common development commands

* refactor: simplify session ID handling and improve Taskfile options

- Include hostname in workflow ID for better audit log organization
- Extract sanitizeHostname utility to audit/utils.ts for reuse
- Remove unused generateSessionLogPath and buildLogFilePath functions
- Simplify Taskfile with CONFIG/OUTPUT/CLEAN named parameters

* chore: add .env.example and simplify .gitignore

* docs: update README and CLAUDE.md for Temporal workflow usage

- Replace Docker CLI instructions with Task-based commands
- Add monitoring/stopping sections and workflow examples
- Document Temporal orchestration layer and troubleshooting
- Simplify file structure to key files overview

* refactor: replace Taskfile with bash CLI script

- Add shannon bash script with start/logs/query/stop/help commands
- Remove Taskfile.yml dependency (no longer requires Task installation)
- Update README.md and CLAUDE.md to use ./shannon commands
- Update client.ts output to show ./shannon commands

* docs: fix deliverable filename in README

* refactor: remove direct CLI and .shannon-store.json in favor of Temporal

- Delete src/shannon.ts direct CLI entry point (Temporal is now the only mode)
- Remove .shannon-store.json session lock (Temporal handles workflow deduplication)
- Remove broken scripts/export-metrics.js (imported non-existent function)
- Update package.json to remove main, start script, and bin entry
- Clean up CLAUDE.md and debug.md to remove obsolete references

* chore: remove licensing comments from prompt files to prevent leaking into actual prompts

* fix: resolve parallel workflow race conditions and retry logic bugs

- Fix save_deliverable race condition using closure pattern instead of global variable
- Fix error classification order so OutputValidationError matches before generic validation
- Fix ApplicationFailure re-classification bug by checking instanceof before re-throwing
- Add per-error-type retry limits (3 for output validation, 50 for billing)
- Add fast retry intervals for pipeline testing mode (10s vs 5min)
- Increase worker concurrent activities to 25 for parallel workflows

* refactor: pipeline vuln→exploit workflow for parallel execution

- Replace sync barrier between vuln/exploit phases with independent pipelines
- Each vuln type runs: vuln agent → queue check → conditional exploit
- Add checkExploitationQueue activity to skip exploits when no vulns found
- Use Promise.allSettled for graceful failure handling across pipelines
- Add PipelineSummary type for aggregated cost/duration/turns metrics

* fix: re-throw retryable errors in checkExploitationQueue

* fix: detect and retry on Claude Code spending cap errors

- Add spending cap pattern detection in detectApiError() with retryable error
- Add matching patterns to classifyErrorForTemporal() for proper Temporal retry
- Add defense-in-depth safeguard in runClaudePrompt() for $0 cost / low turn detection
- Add final sanity check in activities before declaring success

* fix: increase heartbeat timeout to prevent false worker-dead detection

Original 30s timeout was from POC spec assuming <5min activities. With
hour-long activities and multiple concurrent workflows sharing one worker,
resource contention causes event loop stalls exceeding 30s, triggering
false heartbeat timeouts. Increased to 10min (prod) and 5min (testing).

* fix: temporal db init

* fix: persist home dir

* feat: add per-workflow unified logging with ./shannon logs ID=<workflow-id>

- Add WorkflowLogger class for human-readable, per-workflow log files
- Create workflow.log in audit-logs/{workflowId}/ with phase, agent, tool, and LLM events
- Update ./shannon logs to require ID param and tail specific workflow log
- Add phase transition logging at workflow boundaries
- Include workflow completion summary with agent breakdown (duration, cost)
- Mount audit-logs volume in docker-compose for host access

---------

Co-authored-by: ezl-keygraph <ezhil@keygraph.io>
2026-01-15 10:36:11 -08:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

This is an AI-powered penetration testing agent designed for defensive security analysis. The tool automates vulnerability assessment by combining external reconnaissance tools with AI-powered code analysis to identify security weaknesses in web applications and their source code.

Commands

Prerequisites

  • Docker - Container runtime
  • Anthropic API key - Set in .env file

Running the Penetration Testing Agent (Docker + Temporal)

# Configure credentials
cp .env.example .env
# Edit .env:
#   ANTHROPIC_API_KEY=your-key
#   CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000  # Avoids hitting the output token limit during long reports

# Start a pentest workflow
./shannon start URL=<url> REPO=<path>

Examples:

./shannon start URL=https://example.com REPO=/path/to/repo
./shannon start URL=https://example.com REPO=/path/to/repo CONFIG=./configs/my-config.yaml
./shannon start URL=https://example.com REPO=/path/to/repo OUTPUT=./my-reports

Monitoring Progress

./shannon logs ID=<workflow-id>     # Tail the log of a specific workflow
./shannon query ID=<workflow-id>    # Query specific workflow progress
# Temporal Web UI available at http://localhost:8233

Stopping Shannon

./shannon stop                      # Stop containers (preserves workflow data)
./shannon stop CLEAN=true           # Full cleanup including volumes

Options

CONFIG=<file>          YAML configuration file for authentication and testing parameters
OUTPUT=<path>          Custom output directory for session folder (default: ./audit-logs/)
PIPELINE_TESTING=true  Use minimal prompts and fast retry intervals (10s instead of 5min)
REBUILD=true           Force Docker rebuild with --no-cache (use when code changes aren't picked up)
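
As an illustration of the KEY=VALUE convention used by these options, here is a minimal TypeScript sketch of how such arguments could be parsed and validated. The real ./shannon entry point is a bash script, so `parseKeyValueArgs`, `ShannonOptions`, and the error messages are hypothetical; only the option names come from this document.

```typescript
// Hypothetical sketch: parse KEY=VALUE arguments like those passed to ./shannon.
// Option names are taken from this document; the parsing rules are assumptions.
const KNOWN_KEYS = ["URL", "REPO", "CONFIG", "OUTPUT", "PIPELINE_TESTING", "REBUILD"] as const;
type KnownKey = (typeof KNOWN_KEYS)[number];
type ShannonOptions = Partial<Record<KnownKey, string>>;

function isKnownKey(key: string): key is KnownKey {
  return (KNOWN_KEYS as readonly string[]).indexOf(key) !== -1;
}

function parseKeyValueArgs(args: string[]): ShannonOptions {
  const options: ShannonOptions = {};
  for (const arg of args) {
    // Split on the first "=" only, so values such as URLs with query strings survive.
    const eq = arg.indexOf("=");
    if (eq <= 0) {
      throw new Error(`Expected KEY=VALUE, got "${arg}"`);
    }
    const key = arg.slice(0, eq);
    if (!isKnownKey(key)) {
      throw new Error(`Unknown option "${key}"`);
    }
    options[key] = arg.slice(eq + 1);
  }
  return options;
}
```

For example, `parseKeyValueArgs(["URL=https://example.com", "REPO=/path/to/repo"])` yields an object with `URL` and `REPO` set and every other option left undefined.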

Generate TOTP for Authentication

TOTP generation is handled automatically via the generate_totp MCP tool during authentication flows.

Development Commands

# Build TypeScript
npm run build

# Run with pipeline testing mode (fast, minimal deliverables)
./shannon start URL=<url> REPO=<path> PIPELINE_TESTING=true

Architecture & Components

Core Modules

  • src/config-parser.ts - Handles YAML configuration parsing, validation, and distribution to agents
  • src/error-handling.ts - Comprehensive error handling with retry logic and categorized error types
  • src/tool-checker.ts - Validates availability of external security tools before execution
  • src/session-manager.ts - Agent definitions, execution order, and parallel groups
  • src/queue-validation.ts - Validates deliverables and agent prerequisites

Temporal Orchestration Layer

Shannon uses Temporal for durable workflow orchestration:

  • src/temporal/shared.ts - Types, interfaces, query definitions
  • src/temporal/workflows.ts - Main workflow (pentestPipelineWorkflow)
  • src/temporal/activities.ts - Activity implementations with heartbeats
  • src/temporal/worker.ts - Worker process entry point
  • src/temporal/client.ts - CLI client for starting workflows
  • src/temporal/query.ts - Query tool for progress inspection

Key features:

  • Crash recovery - Workflows resume automatically after worker restart
  • Queryable progress - Real-time status via ./shannon query or Temporal Web UI
  • Intelligent retry - Distinguishes transient vs permanent errors
  • Parallel execution - 5 concurrent agents in vulnerability/exploitation phases

Five-Phase Testing Workflow

  1. Pre-Reconnaissance (pre-recon) - External tool scans (nmap, subfinder, whatweb) + source code analysis
  2. Reconnaissance (recon) - Analysis of initial findings and attack surface mapping
  3. Vulnerability Analysis (5 agents run in parallel)
    • injection-vuln - SQL injection, command injection
    • xss-vuln - Cross-site scripting
    • auth-vuln - Authentication bypasses
    • authz-vuln - Authorization flaws
    • ssrf-vuln - Server-side request forgery
  4. Exploitation (up to 5 agents run in parallel; each runs only if its vulnerability agent found something to exploit)
    • injection-exploit - Exploit injection vulnerabilities
    • xss-exploit - Exploit XSS vulnerabilities
    • auth-exploit - Exploit authentication issues
    • authz-exploit - Exploit authorization flaws
    • ssrf-exploit - Exploit SSRF vulnerabilities
  5. Reporting (report) - Executive-level security report generation
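
The ordering above can be sketched as plain data. The following TypeScript models only which agents run in which group; it is not the Temporal workflow itself. `planAgents` and the `Findings` type are hypothetical names, and the boolean "has findings" input stands in for whatever check decides whether an exploit agent should run.

```typescript
// Sketch only: models the order of agents, not the real Temporal workflow.
const VULN_TYPES = ["injection", "xss", "auth", "authz", "ssrf"] as const;
type VulnType = (typeof VULN_TYPES)[number];

// Hypothetical input: whether each vulnerability agent reported anything to exploit.
type Findings = Record<VulnType, boolean>;

// Returns one array per phase; agents inside the same array may run in parallel.
function planAgents(findings: Findings): string[][] {
  const vulnAgents = VULN_TYPES.map((t) => `${t}-vuln`);
  // An exploit agent is scheduled only when its vulnerability agent found something.
  const exploitAgents = VULN_TYPES.filter((t) => findings[t]).map((t) => `${t}-exploit`);
  return [["pre-recon"], ["recon"], vulnAgents, exploitAgents, ["report"]];
}
```

With findings only for `xss` and `ssrf`, the fourth group contains just `xss-exploit` and `ssrf-exploit`; the other exploit agents are skipped.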

Configuration System

The agent supports YAML configuration files with JSON Schema validation:

  • configs/config-schema.json - JSON Schema for configuration validation
  • configs/example-config.yaml - Template configuration file
  • configs/juice-shop-config.yaml - Example configuration for OWASP Juice Shop
  • configs/keygraph-config.yaml - Configuration for Keygraph applications
  • configs/chatwoot-config.yaml - Configuration for Chatwoot applications
  • configs/metabase-config.yaml - Configuration for Metabase applications
  • configs/cal-com-config.yaml - Configuration for Cal.com applications

Configuration includes:

  • Authentication settings (form, SSO, API, basic auth)
  • Multi-factor authentication with TOTP support
  • Custom login flow instructions
  • Application-specific testing parameters

Prompt Templates

The prompts/ directory contains specialized prompt templates for each testing phase:

  • pre-recon-code.txt - Initial code analysis prompts
  • recon.txt - Reconnaissance analysis prompts
  • vuln-*.txt - Vulnerability assessment prompts (injection, XSS, auth, authz, SSRF)
  • exploit-*.txt - Exploitation attempt prompts
  • report-executive.txt - Executive report generation prompts

Claude Agent SDK Integration

The agent uses the @anthropic-ai/claude-agent-sdk with maximum autonomy configuration:

  • maxTurns: 10_000 - Allows extensive autonomous analysis
  • permissionMode: 'bypassPermissions' - Full system access for thorough testing
  • Playwright MCP integration for web browser automation
  • Working directory set to target local repository
  • Configuration context injection for authenticated testing
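
A minimal sketch of how these settings might be collected into one options object follows. The values for `maxTurns` and `permissionMode` come from the list above; `buildAgentOptions`, `AgentRunOptions`, and the use of a `cwd` field for the working directory are assumptions for illustration, and the real executor presumably sets further fields (for example the Playwright MCP server).

```typescript
// Sketch: option values are taken from this document; the helper, the interface,
// and the `cwd` field name are assumptions, not the project's actual code.
interface AgentRunOptions {
  maxTurns: number;
  permissionMode: "bypassPermissions";
  cwd: string; // working directory, set to the target local repository
}

function buildAgentOptions(targetRepoPath: string): AgentRunOptions {
  if (targetRepoPath.trim() === "") {
    throw new Error("targetRepoPath must not be empty");
  }
  return {
    maxTurns: 10_000,
    permissionMode: "bypassPermissions",
    cwd: targetRepoPath,
  };
}
```

Keeping the options in one builder makes it easy to check that every agent run uses the same autonomy settings.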

Authentication & Login Resources

  • prompts/shared/login-instructions.txt - Login flow template for all agents
  • TOTP token generation via MCP generate_totp tool
  • Support for multi-factor authentication workflows
  • Configurable authentication mechanisms (form, SSO, API, basic)

Output & Deliverables

All analysis results are saved to the deliverables/ directory within the target local repository, including:

  • Pre-reconnaissance reports with external scan results
  • Vulnerability assessment findings
  • Exploitation attempt results
  • Executive-level security reports with business impact analysis

External Tool Dependencies

The agent integrates with external security tools:

  • nmap - Network port scanning
  • subfinder - Subdomain discovery
  • whatweb - Web technology fingerprinting

Tools are validated for availability before execution using the tool-checker module.

Audit & Metrics System

The agent implements a crash-safe audit system with the following features:

Architecture:

  • audit-logs/ (or custom OUTPUT path): Centralized metrics and forensic logs
    • {hostname}_{sessionId}/session.json - Comprehensive metrics with attempt-level detail
    • {hostname}_{sessionId}/prompts/ - Exact prompts used for reproducibility
    • {hostname}_{sessionId}/agents/ - Turn-by-turn execution logs
    • {hostname}_{sessionId}/deliverables/ - Security reports and findings

Crash Safety:

  • Append-only logging with immediate flush (survives kill -9)
  • Atomic writes for session.json (no partial writes)
  • Event-based logging (tool_start, tool_end, llm_response)
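
One common way to make an append-only event log tolerate abrupt termination is to write each event as a single self-contained line, so a crash can damage at most the last line. The sketch below assumes a JSON Lines layout; the on-disk format, `formatEventLine`, and `parseEventLog` are hypothetical, and only the event names come from the list above.

```typescript
// Assumption: events are stored as one JSON object per line (JSON Lines).
type AuditEventType = "tool_start" | "tool_end" | "llm_response";

interface AuditEvent {
  type: AuditEventType;
  timestamp: string; // ISO 8601
  data: Record<string, unknown>;
}

// Serializes one event as a single newline-terminated line, ready to append.
function formatEventLine(type: AuditEventType, data: Record<string, unknown>, now: Date): string {
  const event: AuditEvent = { type, timestamp: now.toISOString(), data };
  return JSON.stringify(event) + "\n";
}

// Reads a log back, skipping any line that is not valid JSON,
// such as a final line cut off by a crash.
function parseEventLog(contents: string): AuditEvent[] {
  const events: AuditEvent[] = [];
  for (const line of contents.split("\n")) {
    if (line.trim() === "") continue;
    try {
      events.push(JSON.parse(line) as AuditEvent);
    } catch {
      // Not valid JSON: ignore this line and keep the rest.
    }
  }
  return events;
}
```

A log holding two complete events followed by a truncated third line parses back to the two complete events.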

Concurrency Safety:

  • SessionMutex prevents race conditions during parallel agent execution
  • The vulnerability and exploitation phases each run up to 5 agents in parallel, so they can finish up to 5x faster than sequential execution

Metrics & Reporting:

  • Phase-level and agent-level timing/cost aggregations
  • Validation results integrated with metrics

Development Notes

Learning from Reference Implementations

A working POC exists at /Users/arjunmalleswaran/Code/shannon-pocs that demonstrates the ideal Temporal + Claude Agent SDK integration. When implementing Temporal features, agents can ask questions in the chat, and the user will relay them to another Claude Code session working in that POC directory.

How to use this approach:

  1. When stuck or unsure about Temporal patterns, write a specific question in the chat
  2. The user will ask an agent working on the POC to answer
  3. The user relays the answer (code snippets, patterns, explanations) back
  4. Apply the learned patterns to Shannon's codebase

Example questions to ask:

  • "How does the POC structure its workflow to handle parallel activities?"
  • "Show me how heartbeats are implemented in the POC's activities"
  • "What retry configuration does the POC use for long-running agent activities?"
  • "How does the POC integrate Claude Agent SDK calls within Temporal activities?"

Reference implementation:

  • Temporal + Claude Agent SDK: /Users/arjunmalleswaran/Code/shannon-pocs - working implementation demonstrating workflows, activities, worker setup, and SDK integration

Adding a New Agent

  1. Define the agent in src/session-manager.ts (add to AGENT_QUEUE and appropriate parallel group)
  2. Create prompt template in prompts/ (e.g., vuln-newtype.txt or exploit-newtype.txt)
  3. Add activity function in src/temporal/activities.ts
  4. Register activity in src/temporal/workflows.ts within the appropriate phase

Modifying Prompts

  • Prompt templates use variable substitution: {{TARGET_URL}}, {{CONFIG_CONTEXT}}, {{LOGIN_INSTRUCTIONS}}
  • Shared partials in prompts/shared/ are included via prompt-manager.ts
  • Test changes with PIPELINE_TESTING=true for faster iteration
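
The `{{NAME}}` substitution described above can be sketched in a few lines of TypeScript. `renderPrompt` is a hypothetical helper, not the actual code in prompt-manager.ts, and leaving unknown placeholders untouched is an assumed behavior chosen so that a missing variable is easy to spot.

```typescript
// Sketch: replaces {{NAME}} placeholders with values from `variables`.
// Unknown names are left in place. The real prompt-manager.ts may differ.
function renderPrompt(template: string, variables: Record<string, string>): string {
  // A replacer function is used so "$" in a value is inserted literally.
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (match: string, name: string) => {
    const value = variables[name];
    return value !== undefined ? value : match;
  });
}
```

For example, rendering `"Target: {{TARGET_URL}}"` with `{ TARGET_URL: "https://example.com" }` produces `"Target: https://example.com"`.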

Key Design Patterns

  • Configuration-Driven Architecture: YAML configs with JSON Schema validation
  • Modular Error Handling: Categorized error types with retry logic
  • SDK-First Approach: Heavy reliance on Claude Agent SDK for autonomous AI operations
  • Progressive Analysis: Each phase builds on previous phase results

Error Handling Strategy

The application uses a comprehensive error handling system with:

  • Categorized error types (PentestError, ConfigError, NetworkError, etc.)
  • Automatic retry logic for transient failures, with per-error-type limits (for example, 3 attempts for output validation errors and 50 for billing errors)
  • Graceful degradation when external tools are unavailable
  • Detailed error logging and user-friendly error messages
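
To illustrate categorized errors with retry limits, here is a small TypeScript sketch. The category names, the message patterns, and the functions `classifyMessage` and `retryDecision` are hypothetical. The limits of 3 (output validation) and 50 (billing) follow the project's commit notes; the network limit and the choice to treat unknown errors as permanent are assumptions.

```typescript
// Hypothetical categories and patterns for illustration only.
type ErrorCategory = "billing" | "output_validation" | "network" | "config" | "unknown";

interface RetryDecision {
  retryable: boolean;
  maxAttempts: number;
}

// Specific patterns are checked before generic ones so that a narrow
// category is not swallowed by a broader match.
function classifyMessage(message: string): ErrorCategory {
  const text = message.toLowerCase();
  if (text.indexOf("spending cap") !== -1 || text.indexOf("billing") !== -1) return "billing";
  if (text.indexOf("output validation") !== -1) return "output_validation";
  if (text.indexOf("econnreset") !== -1 || text.indexOf("timeout") !== -1) return "network";
  if (text.indexOf("config") !== -1) return "config";
  return "unknown";
}

function retryDecision(category: ErrorCategory): RetryDecision {
  switch (category) {
    case "billing":
      return { retryable: true, maxAttempts: 50 }; // limit from the commit notes
    case "output_validation":
      return { retryable: true, maxAttempts: 3 }; // limit from the commit notes
    case "network":
      return { retryable: true, maxAttempts: 3 }; // assumed limit
    default:
      return { retryable: false, maxAttempts: 1 }; // config and unknown: treated as permanent
  }
}
```

Separating classification from the retry decision keeps the pattern matching testable on its own and lets the limits change without touching the matching rules.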

Testing Mode

The agent includes a testing mode that skips external tool execution for faster development cycles:

./shannon start URL=<url> REPO=<path> PIPELINE_TESTING=true

Security Focus

This is explicitly designed as a defensive security tool for:

  • Vulnerability assessment
  • Security analysis
  • Penetration testing
  • Security report generation

The tool should only be used on systems you own or have explicit permission to test.

Key Files & Directories

Entry Points:

  • src/temporal/workflows.ts - Temporal workflow definition
  • src/temporal/activities.ts - Activity implementations with heartbeats
  • src/temporal/worker.ts - Worker process entry point
  • src/temporal/client.ts - CLI client for starting workflows

Core Logic:

  • src/session-manager.ts - Agent definitions, execution order, parallel groups
  • src/ai/claude-executor.ts - Claude Agent SDK integration
  • src/config-parser.ts - YAML config parsing with JSON Schema validation
  • src/audit/ - Crash-safe logging and metrics system

Configuration:

  • shannon - CLI script for running pentests
  • docker-compose.yml - Temporal server + worker containers
  • configs/ - YAML configs with config-schema.json for validation
  • prompts/ - AI prompt templates (vuln-*.txt, exploit-*.txt, etc.)

Output:

  • audit-logs/{hostname}_{sessionId}/ - Session metrics, agent logs, deliverables

Troubleshooting

Common Issues

  • "Repository not found": Ensure target local directory exists and is accessible

Temporal & Docker Issues

  • "Temporal not ready": Wait for health check or run docker compose logs temporal
  • Worker not processing: Ensure worker container is running with docker compose ps
  • Reset workflow state: ./shannon stop CLEAN=true removes all Temporal data and volumes
  • Local apps unreachable: Use host.docker.internal instead of localhost for URLs
  • Container permissions: On Linux, you may need sudo for docker commands

External Tool Dependencies

Missing tools can be skipped using PIPELINE_TESTING=true mode during development:

  • nmap - Network scanning
  • subfinder - Subdomain discovery
  • whatweb - Web technology detection

Diagnostic & Utility Scripts

# View Temporal workflow history
open http://localhost:8233   # macOS; on Linux, use xdg-open instead