Mirror of https://github.com/KeygraphHQ/shannon.git, synced 2026-05-28 19:31:34 +02:00
Feat/temporal (#46)
* refactor: modularize claude-executor and extract shared utilities
- Extract message handling into src/ai/message-handlers.ts with pure functions
- Extract output formatting into src/ai/output-formatters.ts
- Extract progress management into src/ai/progress-manager.ts
- Add audit-logger.ts with Null Object pattern for optional logging
- Add shared utilities: formatting.ts, file-io.ts, functional.ts
- Consolidate getPromptNameForAgent into src/types/agents.ts
* feat: add Claude Code custom commands for debug and review
* feat: add Temporal integration foundation (phase 1-2)
- Add Temporal SDK dependencies (@temporalio/client, worker, workflow, activity)
- Add shared types for pipeline state, metrics, and progress queries
- Add classifyErrorForTemporal() for retry behavior classification
- Add docker-compose for Temporal server with SQLite persistence
* feat: add Temporal activities for agent execution (phase 3)
- Add activities.ts with heartbeat loop, git checkpoint/rollback, and error classification
- Export runClaudePrompt, validateAgentOutput, ClaudePromptResult for Temporal use
- Track attempt number via Temporal Context for accurate audit logging
- Rollback git workspace before retry to ensure clean state
* feat: add Temporal workflow for 5-phase pipeline orchestration (phase 4)
* feat: add Temporal worker, client, and query tools (phase 5)
- Add worker.ts with workflow bundling and graceful shutdown
- Add client.ts CLI to start pipelines with progress polling
- Add query.ts CLI to inspect running workflow state
- Fix buffer overflow by truncating error messages and stack traces
- Skip git operations gracefully on non-git repositories
- Add kill.sh/start.sh dev scripts and Dockerfile.worker
* feat: fix Docker worker container setup
- Install uv instead of deprecated uvx package
- Add mcp-server and configs directories to container
- Mount target repo dynamically via TARGET_REPO env variable
* fix: add report assembly step to Temporal workflow
- Add assembleReportActivity to concatenate exploitation evidence files before report agent runs
- Call assembleFinalReport in workflow Phase 5 before runReportAgent
- Ensure deliverables directory exists before writing final report
- Simplify pipeline-testing report prompt to just prepend header
* refactor: consolidate Docker setup to root docker-compose.yml
* feat: improve Temporal client UX and env handling
- Change default to fire-and-forget (--wait flag to opt-in)
- Add splash screen and improve console output formatting
- Add .env to gitignore, remove from dockerignore for container access
- Add Taskfile for common development commands
* refactor: simplify session ID handling and improve Taskfile options
- Include hostname in workflow ID for better audit log organization
- Extract sanitizeHostname utility to audit/utils.ts for reuse
- Remove unused generateSessionLogPath and buildLogFilePath functions
- Simplify Taskfile with CONFIG/OUTPUT/CLEAN named parameters
* chore: add .env.example and simplify .gitignore
* docs: update README and CLAUDE.md for Temporal workflow usage
- Replace Docker CLI instructions with Task-based commands
- Add monitoring/stopping sections and workflow examples
- Document Temporal orchestration layer and troubleshooting
- Simplify file structure to key files overview
* refactor: replace Taskfile with bash CLI script
- Add shannon bash script with start/logs/query/stop/help commands
- Remove Taskfile.yml dependency (no longer requires Task installation)
- Update README.md and CLAUDE.md to use ./shannon commands
- Update client.ts output to show ./shannon commands
* docs: fix deliverable filename in README
* refactor: remove direct CLI and .shannon-store.json in favor of Temporal
- Delete src/shannon.ts direct CLI entry point (Temporal is now the only mode)
- Remove .shannon-store.json session lock (Temporal handles workflow deduplication)
- Remove broken scripts/export-metrics.js (imported non-existent function)
- Update package.json to remove main, start script, and bin entry
- Clean up CLAUDE.md and debug.md to remove obsolete references
* chore: remove licensing comments from prompt files to prevent leaking into actual prompts
* fix: resolve parallel workflow race conditions and retry logic bugs
- Fix save_deliverable race condition using closure pattern instead of global variable
- Fix error classification order so OutputValidationError matches before generic validation
- Fix ApplicationFailure re-classification bug by checking instanceof before re-throwing
- Add per-error-type retry limits (3 for output validation, 50 for billing)
- Add fast retry intervals for pipeline testing mode (10s vs 5min)
- Increase worker concurrent activities to 25 for parallel workflows
* refactor: pipeline vuln→exploit workflow for parallel execution
- Replace sync barrier between vuln/exploit phases with independent pipelines
- Each vuln type runs: vuln agent → queue check → conditional exploit
- Add checkExploitationQueue activity to skip exploits when no vulns found
- Use Promise.allSettled for graceful failure handling across pipelines
- Add PipelineSummary type for aggregated cost/duration/turns metrics
* fix: re-throw retryable errors in checkExploitationQueue
* fix: detect and retry on Claude Code spending cap errors
- Add spending cap pattern detection in detectApiError() with retryable error
- Add matching patterns to classifyErrorForTemporal() for proper Temporal retry
- Add defense-in-depth safeguard in runClaudePrompt() for $0 cost / low turn detection
- Add final sanity check in activities before declaring success
* fix: increase heartbeat timeout to prevent false worker-dead detection
Original 30s timeout was from POC spec assuming <5min activities. With
hour-long activities and multiple concurrent workflows sharing one worker,
resource contention causes event loop stalls exceeding 30s, triggering
false heartbeat timeouts. Increased to 10min (prod) and 5min (testing).
* fix: temporal db init
* fix: persist home dir
* feat: add per-workflow unified logging with ./shannon logs ID=<workflow-id>
- Add WorkflowLogger class for human-readable, per-workflow log files
- Create workflow.log in audit-logs/{workflowId}/ with phase, agent, tool, and LLM events
- Update ./shannon logs to require ID param and tail specific workflow log
- Add phase transition logging at workflow boundaries
- Include workflow completion summary with agent breakdown (duration, cost)
- Mount audit-logs volume in docker-compose for host access
---------
Co-authored-by: ezl-keygraph <ezhil@keygraph.io>
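The "pipeline vuln→exploit workflow" change in the commit message (independent pipelines, a queue check before each exploit, and `Promise.allSettled` for failure isolation) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the workflow code from this commit: `runVulnAgent`, `runExploitAgent`, `runPipeline`, `runAllPipelines`, and the `PipelineOutcome` shape are hypothetical stand-ins, and only the name `checkExploitationQueue` comes from the commit message (its real signature is not shown there).

```typescript
// Hypothetical result shape for one vuln→exploit pipeline.
type PipelineOutcome = { vulnType: string; exploited: boolean };

// Stand-in for a vuln agent; 'ssrf' fails here purely to demonstrate isolation.
async function runVulnAgent(vulnType: string): Promise<void> {
  if (vulnType === 'ssrf') {
    throw new Error(`vuln agent failed for ${vulnType}`);
  }
}

// Stand-in for the queue check: pretend only 'injection' produced findings.
async function checkExploitationQueue(vulnType: string): Promise<boolean> {
  return vulnType === 'injection';
}

// Stand-in for an exploit agent (no-op in this sketch).
async function runExploitAgent(_vulnType: string): Promise<void> {}

// One independent pipeline: vuln agent → queue check → conditional exploit.
async function runPipeline(vulnType: string): Promise<PipelineOutcome> {
  await runVulnAgent(vulnType);
  const shouldExploit = await checkExploitationQueue(vulnType);
  if (shouldExploit) {
    await runExploitAgent(vulnType);
  }
  return { vulnType, exploited: shouldExploit };
}

// Run every pipeline concurrently; a rejection in one does not cancel the others.
async function runAllPipelines(
  vulnTypes: string[]
): Promise<{ completed: PipelineOutcome[]; failed: string[] }> {
  const settled = await Promise.allSettled(vulnTypes.map((t) => runPipeline(t)));
  const completed: PipelineOutcome[] = [];
  const failed: string[] = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      completed.push(outcome.value);
    } else {
      failed.push(vulnTypes[i] ?? 'unknown');
    }
  });
  return { completed, failed };
}
```

Compared with a sync barrier between phases, each vuln type here proceeds to its exploit step as soon as its own vuln agent finishes, and a failed pipeline is reported alongside the successful ones instead of aborting them.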
Committed by GitHub
Parent: 45acb16711
Commit: 51e621d0d5
@@ -0,0 +1,79 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

// Null Object pattern for audit logging - callers never check for null

import type { AuditSession } from '../audit/index.js';
import { formatTimestamp } from '../utils/formatting.js';

export interface AuditLogger {
  logLlmResponse(turn: number, content: string): Promise<void>;
  logToolStart(toolName: string, parameters: unknown): Promise<void>;
  logToolEnd(result: unknown): Promise<void>;
  logError(error: Error, duration: number, turns: number): Promise<void>;
}

class RealAuditLogger implements AuditLogger {
  private auditSession: AuditSession;

  constructor(auditSession: AuditSession) {
    this.auditSession = auditSession;
  }

  async logLlmResponse(turn: number, content: string): Promise<void> {
    await this.auditSession.logEvent('llm_response', {
      turn,
      content,
      timestamp: formatTimestamp(),
    });
  }

  async logToolStart(toolName: string, parameters: unknown): Promise<void> {
    await this.auditSession.logEvent('tool_start', {
      toolName,
      parameters,
      timestamp: formatTimestamp(),
    });
  }

  async logToolEnd(result: unknown): Promise<void> {
    await this.auditSession.logEvent('tool_end', {
      result,
      timestamp: formatTimestamp(),
    });
  }

  async logError(error: Error, duration: number, turns: number): Promise<void> {
    await this.auditSession.logEvent('error', {
      message: error.message,
      errorType: error.constructor.name,
      stack: error.stack,
      duration,
      turns,
      timestamp: formatTimestamp(),
    });
  }
}

/** Null Object implementation - all methods are safe no-ops */
class NullAuditLogger implements AuditLogger {
  async logLlmResponse(_turn: number, _content: string): Promise<void> {}

  async logToolStart(_toolName: string, _parameters: unknown): Promise<void> {}

  async logToolEnd(_result: unknown): Promise<void> {}

  async logError(_error: Error, _duration: number, _turns: number): Promise<void> {}
}

// Returns no-op when auditSession is null
export function createAuditLogger(auditSession: AuditSession | null): AuditLogger {
  if (auditSession) {
    return new RealAuditLogger(auditSession);
  }

  return new NullAuditLogger();
}
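To illustrate why the Null Object pattern in the file above removes `if (auditSession)` checks from callers, here is a reduced, self-contained sketch. It is an assumption-laden illustration rather than the project's code: `SessionLike`, `ToolLogger`, `createToolLogger`, `runTool`, and `InMemorySession` are hypothetical names, and the only behaviour borrowed from the diff is that a session exposes `logEvent(type, data)`.

```typescript
// Hypothetical stand-in for AuditSession; only logEvent() is visible in the diff.
interface SessionLike {
  logEvent(type: string, data: Record<string, unknown>): Promise<void>;
}

// Reduced logger interface with a single method, for brevity.
interface ToolLogger {
  logToolStart(toolName: string, parameters: unknown): Promise<void>;
}

// Real implementation forwards events to the session.
class RealToolLogger implements ToolLogger {
  private session: SessionLike;

  constructor(session: SessionLike) {
    this.session = session;
  }

  async logToolStart(toolName: string, parameters: unknown): Promise<void> {
    await this.session.logEvent('tool_start', { toolName, parameters });
  }
}

// Null Object: same interface, every method is a safe no-op.
class NullToolLogger implements ToolLogger {
  async logToolStart(_toolName: string, _parameters: unknown): Promise<void> {}
}

// Factory decides once; callers never branch on null afterwards.
function createToolLogger(session: SessionLike | null): ToolLogger {
  return session ? new RealToolLogger(session) : new NullToolLogger();
}

// Caller code is identical whether or not auditing is enabled.
async function runTool(logger: ToolLogger): Promise<string> {
  await logger.logToolStart('Read', { path: 'README.md' });
  return 'done';
}

// In-memory session used only to observe what the real logger records.
class InMemorySession implements SessionLike {
  readonly events: Array<{ type: string; data: Record<string, unknown> }> = [];

  async logEvent(type: string, data: Record<string, unknown>): Promise<void> {
    this.events.push({ type, data });
  }
}
```

Calling `runTool(createToolLogger(null))` and `runTool(createToolLogger(new InMemorySession()))` both succeed; only the second records an event. The trade-off is that a forgotten session is silent rather than an error, which suits optional logging.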
+262
-484
@@ -4,35 +4,33 @@
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

import { $, fs, path } from 'zx';
// Production Claude agent execution with retry, git checkpoints, and audit logging

import { fs, path } from 'zx';
import chalk, { type ChalkInstance } from 'chalk';
import { query } from '@anthropic-ai/claude-agent-sdk';
import { fileURLToPath } from 'url';
import { dirname } from 'path';

import { isRetryableError, getRetryDelay, PentestError } from '../error-handling.js';
import { ProgressIndicator } from '../progress-indicator.js';
import { timingResults, costResults, Timer } from '../utils/metrics.js';
import { formatDuration } from '../audit/utils.js';
import { createGitCheckpoint, commitGitSuccess, rollbackGitWorkspace } from '../utils/git-manager.js';
import { timingResults, Timer } from '../utils/metrics.js';
import { formatTimestamp } from '../utils/formatting.js';
import { createGitCheckpoint, commitGitSuccess, rollbackGitWorkspace, getGitCommitHash } from '../utils/git-manager.js';
import { AGENT_VALIDATORS, MCP_AGENT_MAPPING } from '../constants.js';
import { filterJsonToolCalls, getAgentPrefix } from '../utils/output-formatter.js';
import { generateSessionLogPath } from '../session-manager.js';
import { AuditSession } from '../audit/index.js';
import { createShannonHelperServer } from '../../mcp-server/dist/index.js';
import type { SessionMetadata } from '../audit/utils.js';
import type { PromptName } from '../types/index.js';
import { getPromptNameForAgent } from '../types/agents.js';
import type { AgentName } from '../types/index.js';

import { dispatchMessage } from './message-handlers.js';
import { detectExecutionContext, formatErrorOutput, formatCompletionMessage } from './output-formatters.js';
import { createProgressManager } from './progress-manager.js';
import { createAuditLogger } from './audit-logger.js';

// Extend global for loader flag
declare global {
  var SHANNON_DISABLE_LOADER: boolean | undefined;
}

const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);

// Result types
interface ClaudePromptResult {
export interface ClaudePromptResult {
  result?: string | null;
  success: boolean;
  duration: number;
@@ -40,14 +38,12 @@ interface ClaudePromptResult {
  cost: number;
  partialCost?: number;
  apiErrorDetected?: boolean;
  logFile?: string;
  error?: string;
  errorType?: string;
  prompt?: string;
  retryable?: boolean;
}

// MCP Server types
interface StdioMcpServer {
  type: 'stdio';
  command: string;
@@ -57,157 +53,29 @@ interface StdioMcpServer {
|
||||
|
||||
type McpServer = ReturnType<typeof createShannonHelperServer> | StdioMcpServer;
|
||||
|
||||
/**
|
||||
* Convert agent name to prompt name for MCP_AGENT_MAPPING lookup
|
||||
*/
|
||||
function agentNameToPromptName(agentName: string): PromptName {
|
||||
// Special cases
|
||||
if (agentName === 'pre-recon') return 'pre-recon-code';
|
||||
if (agentName === 'report') return 'report-executive';
|
||||
if (agentName === 'recon') return 'recon';
|
||||
|
||||
// Pattern: {type}-vuln → vuln-{type}
|
||||
const vulnMatch = agentName.match(/^(.+)-vuln$/);
|
||||
if (vulnMatch) {
|
||||
return `vuln-${vulnMatch[1]}` as PromptName;
|
||||
}
|
||||
|
||||
// Pattern: {type}-exploit → exploit-{type}
|
||||
const exploitMatch = agentName.match(/^(.+)-exploit$/);
|
||||
if (exploitMatch) {
|
||||
return `exploit-${exploitMatch[1]}` as PromptName;
|
||||
}
|
||||
|
||||
// Default: return as-is
|
||||
return agentName as PromptName;
|
||||
}
|
||||
|
||||
// Simplified validation using direct agent name mapping
|
||||
async function validateAgentOutput(
|
||||
result: ClaudePromptResult,
|
||||
agentName: string | null,
|
||||
sourceDir: string
|
||||
): Promise<boolean> {
|
||||
console.log(chalk.blue(` 🔍 Validating ${agentName} agent output`));
|
||||
|
||||
try {
|
||||
// Check if agent completed successfully
|
||||
if (!result.success || !result.result) {
|
||||
console.log(chalk.red(` ❌ Validation failed: Agent execution was unsuccessful`));
|
||||
return false;
|
||||
}
|
||||
|
||||
// Get validator function for this agent
|
||||
const validator = agentName ? AGENT_VALIDATORS[agentName as keyof typeof AGENT_VALIDATORS] : undefined;
|
||||
|
||||
if (!validator) {
|
||||
console.log(chalk.yellow(` ⚠️ No validator found for agent "${agentName}" - assuming success`));
|
||||
console.log(chalk.green(` ✅ Validation passed: Unknown agent with successful result`));
|
||||
return true;
|
||||
}
|
||||
|
||||
console.log(chalk.blue(` 📋 Using validator for agent: ${agentName}`));
|
||||
console.log(chalk.blue(` 📂 Source directory: ${sourceDir}`));
|
||||
|
||||
// Apply validation function
|
||||
const validationResult = await validator(sourceDir);
|
||||
|
||||
if (validationResult) {
|
||||
console.log(chalk.green(` ✅ Validation passed: Required files/structure present`));
|
||||
} else {
|
||||
console.log(chalk.red(` ❌ Validation failed: Missing required deliverable files`));
|
||||
}
|
||||
|
||||
return validationResult;
|
||||
|
||||
} catch (error) {
|
||||
const errMsg = error instanceof Error ? error.message : String(error);
|
||||
console.log(chalk.red(` ❌ Validation failed with error: ${errMsg}`));
|
||||
return false; // Assume invalid on validation error
|
||||
}
|
||||
}
|
||||
|
||||
// Pure function: Run Claude Code with SDK - Maximum Autonomy
|
||||
// WARNING: This is a low-level function. Use runClaudePromptWithRetry() for agent execution
|
||||
async function runClaudePrompt(
|
||||
prompt: string,
|
||||
// Configures MCP servers for agent execution, with Docker-specific Chromium handling
|
||||
function buildMcpServers(
|
||||
sourceDir: string,
|
||||
_allowedTools: string = 'Read',
|
||||
context: string = '',
|
||||
description: string = 'Claude analysis',
|
||||
agentName: string | null = null,
|
||||
colorFn: ChalkInstance = chalk.cyan,
|
||||
sessionMetadata: SessionMetadata | null = null,
|
||||
auditSession: AuditSession | null = null,
|
||||
attemptNumber: number = 1
|
||||
): Promise<ClaudePromptResult> {
|
||||
const timer = new Timer(`agent-${description.toLowerCase().replace(/\s+/g, '-')}`);
|
||||
const fullPrompt = context ? `${context}\n\n${prompt}` : prompt;
|
||||
let totalCost = 0;
|
||||
let partialCost = 0; // Track partial cost for crash safety
|
||||
agentName: string | null
|
||||
): Record<string, McpServer> {
|
||||
const shannonHelperServer = createShannonHelperServer(sourceDir);
|
||||
|
||||
// Auto-detect execution mode to adjust logging behavior
|
||||
const isParallelExecution = description.includes('vuln agent') || description.includes('exploit agent');
|
||||
const useCleanOutput = description.includes('Pre-recon agent') ||
|
||||
description.includes('Recon agent') ||
|
||||
description.includes('Executive Summary and Report Cleanup') ||
|
||||
description.includes('vuln agent') ||
|
||||
description.includes('exploit agent');
|
||||
const mcpServers: Record<string, McpServer> = {
|
||||
'shannon-helper': shannonHelperServer,
|
||||
};
|
||||
|
||||
// Disable status manager - using simple JSON filtering for all agents now
|
||||
const statusManager = null;
|
||||
if (agentName) {
|
||||
const promptName = getPromptNameForAgent(agentName as AgentName);
|
||||
const playwrightMcpName = MCP_AGENT_MAPPING[promptName as keyof typeof MCP_AGENT_MAPPING] || null;
|
||||
|
||||
// Setup progress indicator for clean output agents (unless disabled via flag)
|
||||
let progressIndicator: ProgressIndicator | null = null;
|
||||
if (useCleanOutput && !global.SHANNON_DISABLE_LOADER) {
|
||||
const agentType = description.includes('Pre-recon') ? 'pre-reconnaissance' :
|
||||
description.includes('Recon') ? 'reconnaissance' :
|
||||
description.includes('Report') ? 'report generation' : 'analysis';
|
||||
progressIndicator = new ProgressIndicator(`Running ${agentType}...`);
|
||||
}
|
||||
|
||||
// NOTE: Logging now handled by AuditSession (append-only, crash-safe)
|
||||
let logFilePath: string | null = null;
|
||||
if (sessionMetadata && sessionMetadata.webUrl && sessionMetadata.id) {
|
||||
const timestamp = new Date().toISOString().replace(/T/, '_').replace(/[:.]/g, '-').slice(0, 19);
|
||||
const agentKey = description.toLowerCase().replace(/\s+/g, '-');
|
||||
const logDir = generateSessionLogPath(sessionMetadata.webUrl, sessionMetadata.id);
|
||||
logFilePath = path.join(logDir, `${timestamp}_${agentKey}_attempt-${attemptNumber}.log`);
|
||||
} else {
|
||||
console.log(chalk.blue(` 🤖 Running Claude Code: ${description}...`));
|
||||
}
|
||||
|
||||
// Declare variables that need to be accessible in both try and catch blocks
|
||||
let turnCount = 0;
|
||||
|
||||
try {
|
||||
// Create MCP server with target directory context
|
||||
const shannonHelperServer = createShannonHelperServer(sourceDir);
|
||||
|
||||
// Look up agent's assigned Playwright MCP server
|
||||
let playwrightMcpName: string | null = null;
|
||||
if (agentName) {
|
||||
const promptName = agentNameToPromptName(agentName);
|
||||
playwrightMcpName = MCP_AGENT_MAPPING[promptName as keyof typeof MCP_AGENT_MAPPING] || null;
|
||||
|
||||
if (playwrightMcpName) {
|
||||
console.log(chalk.gray(` 🎭 Assigned ${agentName} → ${playwrightMcpName}`));
|
||||
}
|
||||
}
|
||||
|
||||
// Configure MCP servers: shannon-helper (SDK) + playwright-agentN (stdio)
|
||||
const mcpServers: Record<string, McpServer> = {
|
||||
'shannon-helper': shannonHelperServer,
|
||||
};
|
||||
|
||||
// Add Playwright MCP server if this agent needs browser automation
|
||||
if (playwrightMcpName) {
|
||||
console.log(chalk.gray(` Assigned ${agentName} -> ${playwrightMcpName}`));
|
||||
|
||||
const userDataDir = `/tmp/${playwrightMcpName}`;
|
||||
|
||||
// Detect if running in Docker via explicit environment variable
|
||||
// Docker uses system Chromium; local dev uses Playwright's bundled browsers
|
||||
const isDocker = process.env.SHANNON_DOCKER === 'true';
|
||||
|
||||
// Build args array - conditionally add --executable-path for Docker
|
||||
const mcpArgs: string[] = [
|
||||
'@playwright/mcp@latest',
|
||||
'--isolated',
|
||||
@@ -220,7 +88,6 @@ async function runClaudePrompt(
        mcpArgs.push('--browser', 'chromium');
      }

      // Filter out undefined env values for type safety
      const envVars: Record<string, string> = Object.fromEntries(
        Object.entries({
          ...process.env,
@@ -236,335 +103,200 @@ async function runClaudePrompt(
|
||||
env: envVars,
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
const options = {
|
||||
model: 'claude-sonnet-4-5-20250929', // Use latest Claude 4.5 Sonnet
|
||||
maxTurns: 10_000, // Maximum turns for autonomous work
|
||||
cwd: sourceDir, // Set working directory using SDK option
|
||||
permissionMode: 'bypassPermissions' as const, // Bypass all permission checks for pentesting
|
||||
mcpServers,
|
||||
return mcpServers;
|
||||
}
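The `buildMcpServers` hunk above mentions filtering undefined values out of the environment "for type safety", but the filter itself falls outside the displayed hunk. One common way to do this kind of filtering is sketched below. This is an assumption, not the code from this file: the helper name `definedEnv` is hypothetical and the actual implementation may differ.

```typescript
// Hypothetical helper (not from the diff): drop undefined values so the result
// can be typed as Record<string, string>, as a stdio server config expects.
function definedEnv(
  env: Record<string, string | undefined>
): Record<string, string> {
  return Object.fromEntries(
    Object.entries(env).filter(
      // Type predicate narrows each entry to a [key, string] pair.
      (entry): entry is [string, string] => entry[1] !== undefined
    )
  );
}
```

For example, `definedEnv({ ...process.env, EXTRA: 'x' })` yields an object in which every value is a string, which avoids casting `process.env` (typed with possibly-undefined values) when handing it to a child process configuration.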
|
||||
|
||||
function outputLines(lines: string[]): void {
|
||||
for (const line of lines) {
|
||||
console.log(line);
|
||||
}
|
||||
}
|
||||
|
||||
async function writeErrorLog(
|
||||
err: Error & { code?: string; status?: number },
|
||||
sourceDir: string,
|
||||
fullPrompt: string,
|
||||
duration: number
|
||||
): Promise<void> {
|
||||
try {
|
||||
const errorLog = {
|
||||
timestamp: formatTimestamp(),
|
||||
agent: 'claude-executor',
|
||||
error: {
|
||||
name: err.constructor.name,
|
||||
message: err.message,
|
||||
code: err.code,
|
||||
status: err.status,
|
||||
stack: err.stack
|
||||
},
|
||||
context: {
|
||||
sourceDir,
|
||||
prompt: fullPrompt.slice(0, 200) + '...',
|
||||
retryable: isRetryableError(err)
|
||||
},
|
||||
duration
|
||||
};
|
||||
const logPath = path.join(sourceDir, 'error.log');
|
||||
await fs.appendFile(logPath, JSON.stringify(errorLog) + '\n');
|
||||
} catch (logError) {
|
||||
const logErrMsg = logError instanceof Error ? logError.message : String(logError);
|
||||
console.log(chalk.gray(` (Failed to write error log: ${logErrMsg})`));
|
||||
}
|
||||
}
|
||||
|
||||
// SDK Options only shown for verbose agents (not clean output)
|
||||
if (!useCleanOutput) {
|
||||
console.log(chalk.gray(` SDK Options: maxTurns=${options.maxTurns}, cwd=${sourceDir}, permissions=BYPASS`));
|
||||
export async function validateAgentOutput(
|
||||
result: ClaudePromptResult,
|
||||
agentName: string | null,
|
||||
sourceDir: string
|
||||
): Promise<boolean> {
|
||||
console.log(chalk.blue(` Validating ${agentName} agent output`));
|
||||
|
||||
try {
|
||||
// Check if agent completed successfully
|
||||
if (!result.success || !result.result) {
|
||||
console.log(chalk.red(` Validation failed: Agent execution was unsuccessful`));
|
||||
return false;
|
||||
}
|
||||
|
||||
let result: string | null = null;
|
||||
const messages: string[] = [];
|
||||
let apiErrorDetected = false;
|
||||
// Get validator function for this agent
|
||||
const validator = agentName ? AGENT_VALIDATORS[agentName as keyof typeof AGENT_VALIDATORS] : undefined;
|
||||
|
||||
// Start progress indicator for clean output agents
|
||||
if (progressIndicator) {
|
||||
progressIndicator.start();
|
||||
if (!validator) {
|
||||
console.log(chalk.yellow(` No validator found for agent "${agentName}" - assuming success`));
|
||||
console.log(chalk.green(` Validation passed: Unknown agent with successful result`));
|
||||
return true;
|
||||
}
|
||||
|
||||
let lastHeartbeat = Date.now();
|
||||
const HEARTBEAT_INTERVAL = 30000; // 30 seconds
|
||||
console.log(chalk.blue(` Using validator for agent: ${agentName}`));
|
||||
console.log(chalk.blue(` Source directory: ${sourceDir}`));
|
||||
|
||||
try {
|
||||
for await (const message of query({ prompt: fullPrompt, options })) {
|
||||
// Periodic heartbeat for long-running agents (only when loader is disabled)
|
||||
const now = Date.now();
|
||||
if (global.SHANNON_DISABLE_LOADER && now - lastHeartbeat > HEARTBEAT_INTERVAL) {
|
||||
console.log(chalk.blue(` ⏱️ [${Math.floor((now - timer.startTime) / 1000)}s] ${description} running... (Turn ${turnCount})`));
|
||||
lastHeartbeat = now;
|
||||
}
|
||||
// Apply validation function
|
||||
const validationResult = await validator(sourceDir);
|
||||
|
||||
if (message.type === "assistant") {
|
||||
turnCount++;
|
||||
if (validationResult) {
|
||||
console.log(chalk.green(` Validation passed: Required files/structure present`));
|
||||
} else {
|
||||
console.log(chalk.red(` Validation failed: Missing required deliverable files`));
|
||||
}
|
||||
|
||||
const messageContent = message.message as { content: unknown };
|
||||
const content = Array.isArray(messageContent.content)
|
||||
? messageContent.content.map((c: { text?: string }) => c.text || JSON.stringify(c)).join('\n')
|
||||
: String(messageContent.content);
|
||||
return validationResult;
|
||||
|
||||
if (statusManager) {
|
||||
// Smart status updates for parallel execution - disabled
|
||||
} else if (useCleanOutput) {
|
||||
// Clean output for all agents: filter JSON tool calls but show meaningful text
|
||||
const cleanedContent = filterJsonToolCalls(content);
|
||||
if (cleanedContent.trim()) {
|
||||
// Temporarily stop progress indicator to show output
|
||||
if (progressIndicator) {
|
||||
progressIndicator.stop();
|
||||
}
|
||||
} catch (error) {
|
||||
const errMsg = error instanceof Error ? error.message : String(error);
|
||||
console.log(chalk.red(` Validation failed with error: ${errMsg}`));
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
if (isParallelExecution) {
|
||||
// Compact output for parallel agents with prefixes
|
||||
const prefix = getAgentPrefix(description);
|
||||
console.log(colorFn(`${prefix} ${cleanedContent}`));
|
||||
} else {
|
||||
// Full turn output for single agents
|
||||
console.log(colorFn(`\n 🤖 Turn ${turnCount} (${description}):`));
|
||||
console.log(colorFn(` ${cleanedContent}`));
|
||||
}
|
||||
// Low-level SDK execution. Handles message streaming, progress, and audit logging.
|
||||
// Exported for Temporal activities to call single-attempt execution.
|
||||
export async function runClaudePrompt(
|
||||
prompt: string,
|
||||
sourceDir: string,
|
||||
context: string = '',
|
||||
description: string = 'Claude analysis',
|
||||
agentName: string | null = null,
|
||||
colorFn: ChalkInstance = chalk.cyan,
|
||||
sessionMetadata: SessionMetadata | null = null,
|
||||
auditSession: AuditSession | null = null,
|
||||
attemptNumber: number = 1
|
||||
): Promise<ClaudePromptResult> {
|
||||
const timer = new Timer(`agent-${description.toLowerCase().replace(/\s+/g, '-')}`);
|
||||
const fullPrompt = context ? `${context}\n\n${prompt}` : prompt;
|
||||
|
||||
// Restart progress indicator after output
|
||||
if (progressIndicator) {
|
||||
progressIndicator.start();
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// Full streaming output - show complete messages with specialist color
|
||||
console.log(colorFn(`\n 🤖 Turn ${turnCount} (${description}):`));
|
||||
console.log(colorFn(` ${content}`));
|
||||
}
|
||||
const execContext = detectExecutionContext(description);
|
||||
const progress = createProgressManager(
|
||||
{ description, useCleanOutput: execContext.useCleanOutput },
|
||||
global.SHANNON_DISABLE_LOADER ?? false
|
||||
);
|
||||
const auditLogger = createAuditLogger(auditSession);
|
||||
|
||||
// Log to audit system (crash-safe, append-only)
|
||||
if (auditSession) {
|
||||
await auditSession.logEvent('llm_response', {
|
||||
turn: turnCount,
|
||||
content,
|
||||
timestamp: new Date().toISOString()
|
||||
});
|
||||
}
|
||||
console.log(chalk.blue(` Running Claude Code: ${description}...`));
|
||||
|
||||
messages.push(content);
|
||||
const mcpServers = buildMcpServers(sourceDir, agentName);
|
||||
const options = {
|
||||
model: 'claude-sonnet-4-5-20250929',
|
||||
maxTurns: 10_000,
|
||||
cwd: sourceDir,
|
||||
permissionMode: 'bypassPermissions' as const,
|
||||
mcpServers,
|
||||
};
|
||||
|
||||
// Check for API error patterns in assistant message content
|
||||
if (content && typeof content === 'string') {
|
||||
const lowerContent = content.toLowerCase();
|
||||
if (lowerContent.includes('session limit reached')) {
|
||||
throw new PentestError('Session limit reached', 'billing', false);
|
||||
}
|
||||
if (lowerContent.includes('api error') || lowerContent.includes('terminated')) {
|
||||
apiErrorDetected = true;
|
||||
console.log(chalk.red(` ⚠️ API Error detected in assistant response: ${content.trim()}`));
|
||||
}
|
||||
}
|
||||
if (!execContext.useCleanOutput) {
|
||||
console.log(chalk.gray(` SDK Options: maxTurns=${options.maxTurns}, cwd=${sourceDir}, permissions=BYPASS`));
|
||||
}
|
||||
|
||||
} else if (message.type === "system" && (message as { subtype?: string }).subtype === "init") {
|
||||
// Show useful system info only for verbose agents
|
||||
if (!useCleanOutput) {
|
||||
const initMsg = message as { model?: string; permissionMode?: string; mcp_servers?: Array<{ name: string; status: string }> };
|
||||
console.log(chalk.blue(` ℹ️ Model: ${initMsg.model}, Permission: ${initMsg.permissionMode}`));
|
||||
if (initMsg.mcp_servers && initMsg.mcp_servers.length > 0) {
|
||||
const mcpStatus = initMsg.mcp_servers.map(s => `${s.name}(${s.status})`).join(', ');
|
||||
console.log(chalk.blue(` 📦 MCP: ${mcpStatus}`));
|
||||
}
|
||||
}
|
||||
let turnCount = 0;
|
||||
let result: string | null = null;
|
||||
let apiErrorDetected = false;
|
||||
let totalCost = 0;
|
||||
|
||||
} else if (message.type === "user") {
|
||||
// Skip user messages (these are our own inputs echoed back)
|
||||
continue;
|
||||
progress.start();
|
||||
|
||||
} else if ((message.type as string) === "tool_use") {
|
||||
const toolMsg = message as unknown as { name: string; input?: Record<string, unknown> };
|
||||
console.log(chalk.yellow(`\n 🔧 Using Tool: ${toolMsg.name}`));
|
||||
if (toolMsg.input && Object.keys(toolMsg.input).length > 0) {
|
||||
console.log(chalk.gray(` Input: ${JSON.stringify(toolMsg.input, null, 2)}`));
|
||||
}
|
||||
try {
|
||||
const messageLoopResult = await processMessageStream(
|
||||
fullPrompt,
|
||||
options,
|
||||
{ execContext, description, colorFn, progress, auditLogger },
|
||||
timer
|
||||
);
|
||||
|
||||
// Log tool start event
|
||||
if (auditSession) {
|
||||
await auditSession.logEvent('tool_start', {
|
||||
toolName: toolMsg.name,
|
||||
parameters: toolMsg.input,
|
||||
timestamp: new Date().toISOString()
|
||||
});
|
||||
}
|
||||
} else if ((message.type as string) === "tool_result") {
|
||||
const resultMsg = message as unknown as { content?: unknown };
|
||||
console.log(chalk.green(` ✅ Tool Result:`));
|
||||
if (resultMsg.content) {
|
||||
// Show tool results but truncate if too long
|
||||
const resultStr = typeof resultMsg.content === 'string' ? resultMsg.content : JSON.stringify(resultMsg.content, null, 2);
|
||||
if (resultStr.length > 500) {
|
||||
console.log(chalk.gray(` ${resultStr.slice(0, 500)}...\n [Result truncated - ${resultStr.length} total chars]`));
|
||||
} else {
|
||||
console.log(chalk.gray(` ${resultStr}`));
|
||||
}
|
||||
}
|
||||
turnCount = messageLoopResult.turnCount;
|
||||
result = messageLoopResult.result;
|
||||
apiErrorDetected = messageLoopResult.apiErrorDetected;
|
||||
totalCost = messageLoopResult.cost;
|
||||
|
||||
// Log tool end event
|
||||
if (auditSession) {
|
||||
await auditSession.logEvent('tool_end', {
|
||||
result: resultMsg.content,
|
||||
timestamp: new Date().toISOString()
|
||||
});
|
||||
}
|
||||
} else if (message.type === "result") {
|
||||
const resultMessage = message as {
|
||||
result?: string;
|
||||
total_cost_usd?: number;
|
||||
duration_ms?: number;
|
||||
subtype?: string;
|
||||
permission_denials?: unknown[];
|
||||
};
|
||||
result = resultMessage.result || null;
|
||||
// === SPENDING CAP SAFEGUARD ===
|
||||
// Defense-in-depth: Detect spending cap that slipped through detectApiError().
|
||||
// When spending cap is hit, Claude returns a short message with $0 cost.
|
||||
// Legitimate agent work NEVER costs $0 with only 1-2 turns.
|
||||
if (turnCount <= 2 && totalCost === 0) {
|
||||
const resultLower = (result || '').toLowerCase();
|
||||
const BILLING_KEYWORDS = ['spending', 'cap', 'limit', 'budget', 'resets'];
|
||||
const looksLikeBillingError = BILLING_KEYWORDS.some((kw) =>
|
||||
resultLower.includes(kw)
|
||||
);
|
||||
|
||||
if (!statusManager) {
|
||||
if (useCleanOutput) {
|
||||
// Clean completion output - just duration and cost
|
||||
console.log(chalk.magenta(`\n 🏁 COMPLETED:`));
|
||||
const cost = resultMessage.total_cost_usd || 0;
|
||||
console.log(chalk.gray(` ⏱️ Duration: ${((resultMessage.duration_ms || 0)/1000).toFixed(1)}s, Cost: $${cost.toFixed(4)}`));
|
||||
|
||||
if (resultMessage.subtype === "error_max_turns") {
|
||||
console.log(chalk.red(` ⚠️ Stopped: Hit maximum turns limit`));
|
||||
} else if (resultMessage.subtype === "error_during_execution") {
|
||||
console.log(chalk.red(` ❌ Stopped: Execution error`));
|
||||
}
|
||||
|
||||
if (resultMessage.permission_denials && resultMessage.permission_denials.length > 0) {
|
||||
console.log(chalk.yellow(` 🚫 ${resultMessage.permission_denials.length} permission denials`));
|
||||
}
|
||||
} else {
|
||||
// Full completion output for agents without clean output
|
||||
console.log(chalk.magenta(`\n 🏁 COMPLETED:`));
|
||||
const cost = resultMessage.total_cost_usd || 0;
|
||||
console.log(chalk.gray(` ⏱️ Duration: ${((resultMessage.duration_ms || 0)/1000).toFixed(1)}s, Cost: $${cost.toFixed(4)}`));
|
||||
|
||||
if (resultMessage.subtype === "error_max_turns") {
|
||||
console.log(chalk.red(` ⚠️ Stopped: Hit maximum turns limit`));
|
||||
} else if (resultMessage.subtype === "error_during_execution") {
|
||||
console.log(chalk.red(` ❌ Stopped: Execution error`));
|
||||
}
|
||||
|
||||
if (resultMessage.permission_denials && resultMessage.permission_denials.length > 0) {
|
||||
console.log(chalk.yellow(` 🚫 ${resultMessage.permission_denials.length} permission denials`));
|
||||
}
|
||||
|
||||
// Show result content (if it's reasonable length)
|
||||
if (result && typeof result === 'string') {
|
||||
if (result.length > 1000) {
|
||||
console.log(chalk.magenta(` 📄 ${result.slice(0, 1000)}... [${result.length} total chars]`));
|
||||
} else {
|
||||
console.log(chalk.magenta(` 📄 ${result}`));
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Track cost for all agents
|
||||
const cost = resultMessage.total_cost_usd || 0;
|
||||
const agentKey = description.toLowerCase().replace(/\s+/g, '-');
|
||||
costResults.agents[agentKey] = cost;
|
||||
costResults.total += cost;
|
||||
|
||||
// Store cost for return value and partial tracking
|
||||
totalCost = cost;
|
||||
partialCost = cost;
|
||||
break;
|
||||
} else {
|
||||
// Log any other message types we might not be handling
|
||||
console.log(chalk.gray(` 💬 ${message.type}: ${JSON.stringify(message, null, 2)}`));
|
||||
}
|
||||
if (looksLikeBillingError) {
|
||||
throw new PentestError(
|
||||
`Spending cap likely reached (turns=${turnCount}, cost=$0): ${result?.slice(0, 100)}`,
|
||||
'billing',
|
||||
true // Retryable - Temporal will use 5-30 min backoff
|
||||
);
|
||||
}
|
||||
} catch (queryError) {
|
||||
throw queryError; // Re-throw to outer catch
|
||||
}
|
||||
|
||||
const duration = timer.stop();
|
||||
const agentKey = description.toLowerCase().replace(/\s+/g, '-');
|
||||
timingResults.agents[agentKey] = duration;
|
||||
timingResults.agents[execContext.agentKey] = duration;
|
||||
|
||||
// API error detection is logged but not immediately failed
|
||||
if (apiErrorDetected) {
|
||||
console.log(chalk.yellow(` ⚠️ API Error detected in ${description} - will validate deliverables before failing`));
|
||||
console.log(chalk.yellow(` API Error detected in ${description} - will validate deliverables before failing`));
|
||||
}
|
||||
|
||||
// Show completion messages based on agent type
|
||||
if (progressIndicator) {
|
||||
const agentType = description.includes('Pre-recon') ? 'Pre-recon analysis' :
|
||||
description.includes('Recon') ? 'Reconnaissance' :
|
||||
description.includes('Report') ? 'Report generation' : 'Analysis';
|
||||
progressIndicator.finish(`${agentType} complete! (${turnCount} turns, ${formatDuration(duration)})`);
|
||||
} else if (isParallelExecution) {
|
||||
const prefix = getAgentPrefix(description);
|
||||
console.log(chalk.green(`${prefix} ✅ Complete (${turnCount} turns, ${formatDuration(duration)})`));
|
||||
} else if (!useCleanOutput) {
|
||||
console.log(chalk.green(` ✅ Claude Code completed: ${description} (${turnCount} turns) in ${formatDuration(duration)}`));
|
||||
}
|
||||
progress.finish(formatCompletionMessage(execContext, description, turnCount, duration));
|
||||
|
||||
// Return result with log file path for all agents
|
||||
const returnData: ClaudePromptResult = {
|
||||
return {
|
||||
result,
|
||||
success: true,
|
||||
duration,
|
||||
turns: turnCount,
|
||||
cost: totalCost,
|
||||
partialCost,
|
||||
partialCost: totalCost,
|
||||
apiErrorDetected
|
||||
};
|
||||
if (logFilePath) {
|
||||
returnData.logFile = logFilePath;
|
||||
}
|
||||
return returnData;
|
||||
|
||||
} catch (error) {
|
||||
const duration = timer.stop();
|
||||
const agentKey = description.toLowerCase().replace(/\s+/g, '-');
|
||||
timingResults.agents[agentKey] = duration;
|
||||
timingResults.agents[execContext.agentKey] = duration;
|
||||
|
||||
const err = error as Error & { code?: string; status?: number; duration?: number; cost?: number };
|
||||
const err = error as Error & { code?: string; status?: number };
|
||||
|
||||
// Log error to audit system
|
||||
if (auditSession) {
|
||||
await auditSession.logEvent('error', {
|
||||
message: err.message,
|
||||
errorType: err.constructor.name,
|
||||
stack: err.stack,
|
||||
duration,
|
||||
turns: turnCount,
|
||||
timestamp: new Date().toISOString()
|
||||
});
|
||||
}
|
||||
|
||||
// Show error messages based on agent type
|
||||
if (progressIndicator) {
|
||||
progressIndicator.stop();
|
||||
const agentType = description.includes('Pre-recon') ? 'Pre-recon analysis' :
|
||||
description.includes('Recon') ? 'Reconnaissance' :
|
||||
description.includes('Report') ? 'Report generation' : 'Analysis';
|
||||
console.log(chalk.red(`❌ ${agentType} failed (${formatDuration(duration)})`));
|
||||
} else if (isParallelExecution) {
|
||||
const prefix = getAgentPrefix(description);
|
||||
console.log(chalk.red(`${prefix} ❌ Failed (${formatDuration(duration)})`));
|
||||
} else if (!useCleanOutput) {
|
||||
console.log(chalk.red(` ❌ Claude Code failed: ${description} (${formatDuration(duration)})`));
|
||||
}
|
||||
console.log(chalk.red(` Error Type: ${err.constructor.name}`));
|
||||
console.log(chalk.red(` Message: ${err.message}`));
|
||||
console.log(chalk.gray(` Agent: ${description}`));
|
||||
console.log(chalk.gray(` Working Directory: ${sourceDir}`));
|
||||
console.log(chalk.gray(` Retryable: ${isRetryableError(err) ? 'Yes' : 'No'}`));
|
||||
|
||||
// Log additional context if available
|
||||
if (err.code) {
|
||||
console.log(chalk.gray(` Error Code: ${err.code}`));
|
||||
}
|
||||
if (err.status) {
|
||||
console.log(chalk.gray(` HTTP Status: ${err.status}`));
|
||||
}
|
||||
|
||||
// Save detailed error to log file for debugging
|
||||
try {
|
||||
const errorLog = {
|
||||
timestamp: new Date().toISOString(),
|
||||
agent: description,
|
||||
error: {
|
||||
name: err.constructor.name,
|
||||
message: err.message,
|
||||
code: err.code,
|
||||
status: err.status,
|
||||
stack: err.stack
|
||||
},
|
||||
context: {
|
||||
sourceDir,
|
||||
prompt: fullPrompt.slice(0, 200) + '...',
|
||||
retryable: isRetryableError(err)
|
||||
},
|
||||
duration
|
||||
};
|
||||
|
||||
const logPath = path.join(sourceDir, 'error.log');
|
||||
await fs.appendFile(logPath, JSON.stringify(errorLog) + '\n');
|
||||
} catch (logError) {
|
||||
const logErrMsg = logError instanceof Error ? logError.message : String(logError);
|
||||
console.log(chalk.gray(` (Failed to write error log: ${logErrMsg})`));
|
||||
}
|
||||
await auditLogger.logError(err, duration, turnCount);
|
||||
progress.stop();
|
||||
outputLines(formatErrorOutput(err, execContext, description, duration, sourceDir, isRetryableError(err)));
|
||||
await writeErrorLog(err, sourceDir, fullPrompt, duration);
|
||||
|
||||
return {
|
||||
error: err.message,
|
||||
@@ -572,17 +304,85 @@ async function runClaudePrompt(
|
||||
prompt: fullPrompt.slice(0, 100) + '...',
|
||||
success: false,
|
||||
duration,
|
||||
cost: partialCost,
|
||||
cost: totalCost,
|
||||
retryable: isRetryableError(err)
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
// PREFERRED: Production-ready Claude agent execution with full orchestration
|
||||
|
||||
interface MessageLoopResult {
|
||||
turnCount: number;
|
||||
result: string | null;
|
||||
apiErrorDetected: boolean;
|
||||
cost: number;
|
||||
}
|
||||
|
||||
interface MessageLoopDeps {
|
||||
execContext: ReturnType<typeof detectExecutionContext>;
|
||||
description: string;
|
||||
colorFn: ChalkInstance;
|
||||
progress: ReturnType<typeof createProgressManager>;
|
||||
auditLogger: ReturnType<typeof createAuditLogger>;
|
||||
}
|
||||
|
||||
async function processMessageStream(
|
||||
fullPrompt: string,
|
||||
options: NonNullable<Parameters<typeof query>[0]['options']>,
|
||||
deps: MessageLoopDeps,
|
||||
timer: Timer
|
||||
): Promise<MessageLoopResult> {
|
||||
const { execContext, description, colorFn, progress, auditLogger } = deps;
|
||||
const HEARTBEAT_INTERVAL = 30000;
|
||||
|
||||
let turnCount = 0;
|
||||
let result: string | null = null;
|
||||
let apiErrorDetected = false;
|
||||
let cost = 0;
|
||||
let lastHeartbeat = Date.now();
|
||||
|
||||
for await (const message of query({ prompt: fullPrompt, options })) {
|
||||
// Heartbeat logging when loader is disabled
|
||||
const now = Date.now();
|
||||
if (global.SHANNON_DISABLE_LOADER && now - lastHeartbeat > HEARTBEAT_INTERVAL) {
|
||||
console.log(chalk.blue(` [${Math.floor((now - timer.startTime) / 1000)}s] ${description} running... (Turn ${turnCount})`));
|
||||
lastHeartbeat = now;
|
||||
}
|
||||
|
||||
// Increment turn count for assistant messages
|
||||
if (message.type === 'assistant') {
|
||||
turnCount++;
|
||||
}
|
||||
|
||||
const dispatchResult = await dispatchMessage(
|
||||
message as { type: string; subtype?: string },
|
||||
turnCount,
|
||||
{ execContext, description, colorFn, progress, auditLogger }
|
||||
);
|
||||
|
||||
if (dispatchResult.type === 'throw') {
|
||||
throw dispatchResult.error;
|
||||
}
|
||||
|
||||
if (dispatchResult.type === 'complete') {
|
||||
result = dispatchResult.result;
|
||||
cost = dispatchResult.cost;
|
||||
break;
|
||||
}
|
||||
|
||||
if (dispatchResult.type === 'continue' && dispatchResult.apiErrorDetected) {
|
||||
apiErrorDetected = true;
|
||||
}
|
||||
}
|
||||
|
||||
return { turnCount, result, apiErrorDetected, cost };
|
||||
}
|
||||
|
||||
// Main entry point for agent execution. Handles retries, git checkpoints, and validation.
|
||||
export async function runClaudePromptWithRetry(
|
||||
prompt: string,
|
||||
sourceDir: string,
|
||||
allowedTools: string = 'Read',
|
||||
_allowedTools: string = 'Read',
|
||||
context: string = '',
|
||||
description: string = 'Claude analysis',
|
||||
agentName: string | null = null,
|
||||
@@ -593,9 +393,8 @@ export async function runClaudePromptWithRetry(
|
||||
let lastError: Error | undefined;
|
||||
let retryContext = context;
|
||||
|
||||
console.log(chalk.cyan(`🚀 Starting ${description} with ${maxRetries} max attempts`));
|
||||
console.log(chalk.cyan(`Starting ${description} with ${maxRetries} max attempts`));
|
||||
|
||||
// Initialize audit session (crash-safe logging)
|
||||
let auditSession: AuditSession | null = null;
|
||||
if (sessionMetadata && agentName) {
|
||||
auditSession = new AuditSession(sessionMetadata);
|
||||
@@ -603,29 +402,27 @@ export async function runClaudePromptWithRetry(
|
||||
}
|
||||
|
||||
for (let attempt = 1; attempt <= maxRetries; attempt++) {
|
||||
// Create checkpoint before each attempt
|
||||
await createGitCheckpoint(sourceDir, description, attempt);
|
||||
|
||||
// Start agent tracking in audit system (saves prompt snapshot automatically)
|
||||
if (auditSession && agentName) {
|
||||
const fullPrompt = retryContext ? `${retryContext}\n\n${prompt}` : prompt;
|
||||
await auditSession.startAgent(agentName, fullPrompt, attempt);
|
||||
}
|
||||
|
||||
try {
|
||||
const result = await runClaudePrompt(prompt, sourceDir, allowedTools, retryContext, description, agentName, colorFn, sessionMetadata, auditSession, attempt);
|
||||
const result = await runClaudePrompt(
|
||||
prompt, sourceDir, retryContext,
|
||||
description, agentName, colorFn, sessionMetadata, auditSession, attempt
|
||||
);
|
||||
|
||||
// Validate output after successful run
|
||||
if (result.success) {
|
||||
const validationPassed = await validateAgentOutput(result, agentName, sourceDir);
|
||||
|
||||
if (validationPassed) {
|
||||
// Check if API error was detected but validation passed
|
||||
if (result.apiErrorDetected) {
|
||||
console.log(chalk.yellow(`📋 Validation: Ready for exploitation despite API error warnings`));
|
||||
console.log(chalk.yellow(`Validation: Ready for exploitation despite API error warnings`));
|
||||
}
|
||||
|
||||
// Record successful attempt in audit system
|
||||
if (auditSession && agentName) {
|
||||
const commitHash = await getGitCommitHash(sourceDir);
|
||||
const endResult: {
|
||||
@@ -646,15 +443,13 @@ export async function runClaudePromptWithRetry(
|
||||
await auditSession.endAgent(agentName, endResult);
|
||||
}
|
||||
|
||||
// Commit successful changes (will include the snapshot)
|
||||
await commitGitSuccess(sourceDir, description);
|
||||
console.log(chalk.green.bold(`🎉 ${description} completed successfully on attempt ${attempt}/${maxRetries}`));
|
||||
console.log(chalk.green.bold(`${description} completed successfully on attempt ${attempt}/${maxRetries}`));
|
||||
return result;
|
||||
// Validation failure is retryable - agent might succeed on retry with cleaner workspace
|
||||
} else {
|
||||
// Agent completed but output validation failed
|
||||
console.log(chalk.yellow(`⚠️ ${description} completed but output validation failed`));
|
||||
console.log(chalk.yellow(`${description} completed but output validation failed`));
|
||||
|
||||
// Record failed validation attempt in audit system
|
||||
if (auditSession && agentName) {
|
||||
await auditSession.endAgent(agentName, {
|
||||
attemptNumber: attempt,
|
||||
@@ -666,20 +461,17 @@ export async function runClaudePromptWithRetry(
|
||||
});
|
||||
}
|
||||
|
||||
// If API error detected AND validation failed, this is a retryable error
|
||||
if (result.apiErrorDetected) {
|
||||
console.log(chalk.yellow(`⚠️ API Error detected with validation failure - treating as retryable`));
|
||||
console.log(chalk.yellow(`API Error detected with validation failure - treating as retryable`));
|
||||
lastError = new Error('API Error: terminated with validation failure');
|
||||
} else {
|
||||
lastError = new Error('Output validation failed');
|
||||
}
|
||||
|
||||
if (attempt < maxRetries) {
|
||||
// Rollback contaminated workspace
|
||||
await rollbackGitWorkspace(sourceDir, 'validation failure');
|
||||
continue;
|
||||
} else {
|
||||
// FAIL FAST - Don't continue with broken pipeline
|
||||
throw new PentestError(
|
||||
`Agent ${description} failed output validation after ${maxRetries} attempts. Required deliverable files were not created.`,
|
||||
'validation',
|
||||
@@ -694,7 +486,6 @@ export async function runClaudePromptWithRetry(
|
||||
const err = error as Error & { duration?: number; cost?: number; partialResults?: unknown };
|
||||
lastError = err;
|
||||
|
||||
// Record failed attempt in audit system
|
||||
if (auditSession && agentName) {
|
||||
await auditSession.endAgent(agentName, {
|
||||
attemptNumber: attempt,
|
||||
@@ -706,24 +497,21 @@ export async function runClaudePromptWithRetry(
|
||||
});
|
||||
}
|
||||
|
||||
// Check if error is retryable
|
||||
if (!isRetryableError(err)) {
|
||||
console.log(chalk.red(`❌ ${description} failed with non-retryable error: ${err.message}`));
|
||||
console.log(chalk.red(`${description} failed with non-retryable error: ${err.message}`));
|
||||
await rollbackGitWorkspace(sourceDir, 'non-retryable error cleanup');
|
||||
throw err;
|
||||
}
|
||||
|
||||
if (attempt < maxRetries) {
|
||||
// Rollback for clean retry
|
||||
await rollbackGitWorkspace(sourceDir, 'retryable error cleanup');
|
||||
|
||||
const delay = getRetryDelay(err, attempt);
|
||||
const delaySeconds = (delay / 1000).toFixed(1);
|
||||
console.log(chalk.yellow(`⚠️ ${description} failed (attempt ${attempt}/${maxRetries})`));
|
||||
console.log(chalk.yellow(`${description} failed (attempt ${attempt}/${maxRetries})`));
|
||||
console.log(chalk.gray(` Error: ${err.message}`));
|
||||
console.log(chalk.gray(` Workspace rolled back, retrying in ${delaySeconds}s...`));
|
||||
|
||||
// Preserve any partial results for next retry
|
||||
if (err.partialResults) {
|
||||
retryContext = `${context}\n\nPrevious partial results: ${JSON.stringify(err.partialResults)}`;
|
||||
}
|
||||
@@ -731,7 +519,7 @@ export async function runClaudePromptWithRetry(
|
||||
await new Promise(resolve => setTimeout(resolve, delay));
|
||||
} else {
|
||||
await rollbackGitWorkspace(sourceDir, 'final failure cleanup');
|
||||
console.log(chalk.red(`❌ ${description} failed after ${maxRetries} attempts`));
|
||||
console.log(chalk.red(`${description} failed after ${maxRetries} attempts`));
|
||||
console.log(chalk.red(` Final error: ${err.message}`));
|
||||
}
|
||||
}
|
||||
@@ -739,13 +527,3 @@ export async function runClaudePromptWithRetry(
|
||||
|
||||
throw lastError;
|
||||
}
|
||||
|
||||
// Helper function to get git commit hash
|
||||
async function getGitCommitHash(sourceDir: string): Promise<string | null> {
|
||||
try {
|
||||
const result = await $`cd ${sourceDir} && git rev-parse HEAD`;
|
||||
return result.stdout.trim();
|
||||
} catch {
|
||||
return null;
|
||||
}
|
||||
}
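Reviewer note: the spending-cap safeguard added to `runClaudePrompt` in this file can be read as a small pure predicate. The following is a minimal sketch, assuming the turn count, total cost and result text are already known. `looksLikeSpendingCap` is a hypothetical name (the diff inlines the check); the keyword list is copied from the hunk above.

```typescript
// Hypothetical helper: the diff inlines this check instead of naming it.
const BILLING_KEYWORDS = ['spending', 'cap', 'limit', 'budget', 'resets'];

function looksLikeSpendingCap(
  turnCount: number,
  totalCost: number,
  result: string | null
): boolean {
  // Only very short, zero-cost runs are treated as suspicious.
  if (turnCount > 2 || totalCost !== 0) {
    return false;
  }
  // Case-insensitive substring match against the billing keywords.
  const resultLower = (result ?? '').toLowerCase();
  return BILLING_KEYWORDS.some((kw) => resultLower.includes(kw));
}

console.log(looksLikeSpendingCap(1, 0, 'Spending cap reached resets 8am'));
console.log(looksLikeSpendingCap(12, 0.42, 'Report written; limit checks passed'));
console.log(looksLikeSpendingCap(1, 0, null));
```

Because the match is on substrings, a short zero-cost result containing a word such as "capture" would also trigger it. The turn and cost guard is what keeps that risk small, so the two conditions are probably best kept together.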
|
||||
|
||||
@@ -0,0 +1,272 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
// Pure functions for processing SDK message types
|
||||
|
||||
import { PentestError } from '../error-handling.js';
|
||||
import { filterJsonToolCalls } from '../utils/output-formatter.js';
|
||||
import { formatTimestamp } from '../utils/formatting.js';
|
||||
import chalk from 'chalk';
|
||||
import {
|
||||
formatAssistantOutput,
|
||||
formatResultOutput,
|
||||
formatToolUseOutput,
|
||||
formatToolResultOutput,
|
||||
} from './output-formatters.js';
|
||||
import { costResults } from '../utils/metrics.js';
|
||||
import type { AuditLogger } from './audit-logger.js';
|
||||
import type { ProgressManager } from './progress-manager.js';
|
||||
import type {
|
||||
AssistantMessage,
|
||||
ResultMessage,
|
||||
ToolUseMessage,
|
||||
ToolResultMessage,
|
||||
AssistantResult,
|
||||
ResultData,
|
||||
ToolUseData,
|
||||
ToolResultData,
|
||||
ApiErrorDetection,
|
||||
ContentBlock,
|
||||
SystemInitMessage,
|
||||
ExecutionContext,
|
||||
} from './types.js';
|
||||
import type { ChalkInstance } from 'chalk';
|
||||
|
||||
// Handles both array and string content formats from SDK
|
||||
export function extractMessageContent(message: AssistantMessage): string {
|
||||
const messageContent = message.message;
|
||||
|
||||
if (Array.isArray(messageContent.content)) {
|
||||
return messageContent.content
|
||||
.map((c: ContentBlock) => c.text || JSON.stringify(c))
|
||||
.join('\n');
|
||||
}
|
||||
|
||||
return String(messageContent.content);
|
||||
}
|
||||
|
||||
export function detectApiError(content: string): ApiErrorDetection {
|
||||
if (!content || typeof content !== 'string') {
|
||||
return { detected: false };
|
||||
}
|
||||
|
||||
const lowerContent = content.toLowerCase();
|
||||
|
||||
// === BILLING/SPENDING CAP ERRORS (Retryable with long backoff) ===
|
||||
// When Claude Code hits its spending cap, it returns a short message like
|
||||
// "Spending cap reached resets 8am" instead of throwing an error.
|
||||
// These should retry with 5-30 min backoff so workflows can recover when cap resets.
|
||||
const BILLING_PATTERNS = [
|
||||
'spending cap',
|
||||
'spending limit',
|
||||
'cap reached',
|
||||
'budget exceeded',
|
||||
'usage limit',
|
||||
];
|
||||
|
||||
const isBillingError = BILLING_PATTERNS.some((pattern) =>
|
||||
lowerContent.includes(pattern)
|
||||
);
|
||||
|
||||
if (isBillingError) {
|
||||
return {
|
||||
detected: true,
|
||||
shouldThrow: new PentestError(
|
||||
`Billing limit reached: ${content.slice(0, 100)}`,
|
||||
'billing',
|
||||
true // RETRYABLE - Temporal will use 5-30 min backoff
|
||||
),
|
||||
};
|
||||
}
|
||||
|
||||
// === SESSION LIMIT (Non-retryable) ===
|
||||
// Different from spending cap - usually means something is fundamentally wrong
|
||||
if (lowerContent.includes('session limit reached')) {
|
||||
return {
|
||||
detected: true,
|
||||
shouldThrow: new PentestError('Session limit reached', 'billing', false),
|
||||
};
|
||||
}
|
||||
|
||||
// Non-fatal API errors - detected but continue
|
||||
if (lowerContent.includes('api error') || lowerContent.includes('terminated')) {
|
||||
return { detected: true };
|
||||
}
|
||||
|
||||
return { detected: false };
|
||||
}
|
||||
|
||||
export function handleAssistantMessage(
|
||||
message: AssistantMessage,
|
||||
turnCount: number
|
||||
): AssistantResult {
|
||||
const content = extractMessageContent(message);
|
||||
const cleanedContent = filterJsonToolCalls(content);
|
||||
const errorDetection = detectApiError(content);
|
||||
|
||||
const result: AssistantResult = {
|
||||
content,
|
||||
cleanedContent,
|
||||
apiErrorDetected: errorDetection.detected,
|
||||
logData: {
|
||||
turn: turnCount,
|
||||
content,
|
||||
timestamp: formatTimestamp(),
|
||||
},
|
||||
};
|
||||
|
||||
// Only add shouldThrow if it exists (exactOptionalPropertyTypes compliance)
|
||||
if (errorDetection.shouldThrow) {
|
||||
result.shouldThrow = errorDetection.shouldThrow;
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
// Final message of a query with cost/duration info
|
||||
export function handleResultMessage(message: ResultMessage): ResultData {
|
||||
const result: ResultData = {
|
||||
result: message.result || null,
|
||||
cost: message.total_cost_usd || 0,
|
||||
duration_ms: message.duration_ms || 0,
|
||||
permissionDenials: message.permission_denials?.length || 0,
|
||||
};
|
||||
|
||||
// Only add subtype if it exists (exactOptionalPropertyTypes compliance)
|
||||
if (message.subtype) {
|
||||
result.subtype = message.subtype;
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
export function handleToolUseMessage(message: ToolUseMessage): ToolUseData {
|
||||
return {
|
||||
toolName: message.name,
|
||||
parameters: message.input || {},
|
||||
timestamp: formatTimestamp(),
|
||||
};
|
||||
}
|
||||
|
||||
// Truncates long results for display (500 char limit), preserves full content for logging
|
||||
export function handleToolResultMessage(message: ToolResultMessage): ToolResultData {
|
||||
const content = message.content;
|
||||
const contentStr =
|
||||
typeof content === 'string' ? content : JSON.stringify(content, null, 2);
|
||||
|
||||
const displayContent =
|
||||
contentStr.length > 500
|
||||
? `${contentStr.slice(0, 500)}...\n[Result truncated - ${contentStr.length} total chars]`
|
||||
: contentStr;
|
||||
|
||||
return {
|
||||
content,
|
||||
displayContent,
|
||||
timestamp: formatTimestamp(),
|
||||
};
|
||||
}
|
||||
|
||||
// Output helper for console logging
|
||||
function outputLines(lines: string[]): void {
|
||||
for (const line of lines) {
|
||||
console.log(line);
|
||||
}
|
||||
}
|
||||
|
||||
// Message dispatch result types
|
||||
export type MessageDispatchAction =
|
||||
| { type: 'continue'; apiErrorDetected?: boolean }
|
||||
| { type: 'complete'; result: string | null; cost: number }
|
||||
| { type: 'throw'; error: Error };
|
||||
|
||||
export interface MessageDispatchDeps {
|
||||
execContext: ExecutionContext;
|
||||
description: string;
|
||||
colorFn: ChalkInstance;
|
||||
progress: ProgressManager;
|
||||
auditLogger: AuditLogger;
|
||||
}
|
||||
|
||||
// Dispatches SDK messages to appropriate handlers and formatters
|
||||
export async function dispatchMessage(
|
||||
message: { type: string; subtype?: string },
|
||||
turnCount: number,
|
||||
deps: MessageDispatchDeps
|
||||
): Promise<MessageDispatchAction> {
|
||||
const { execContext, description, colorFn, progress, auditLogger } = deps;
|
||||
|
||||
switch (message.type) {
|
||||
case 'assistant': {
|
||||
const assistantResult = handleAssistantMessage(message as AssistantMessage, turnCount);
|
||||
|
||||
if (assistantResult.shouldThrow) {
|
||||
return { type: 'throw', error: assistantResult.shouldThrow };
|
||||
}
|
||||
|
||||
if (assistantResult.cleanedContent.trim()) {
|
||||
progress.stop();
|
||||
outputLines(formatAssistantOutput(
|
||||
assistantResult.cleanedContent,
|
||||
execContext,
|
||||
turnCount,
|
||||
description,
|
||||
colorFn
|
||||
));
|
||||
progress.start();
|
||||
}
|
||||
|
||||
await auditLogger.logLlmResponse(turnCount, assistantResult.content);
|
||||
|
||||
if (assistantResult.apiErrorDetected) {
|
||||
console.log(chalk.red(` API Error detected in assistant response`));
|
||||
return { type: 'continue', apiErrorDetected: true };
|
||||
}
|
||||
|
||||
return { type: 'continue' };
|
||||
}
|
||||
|
||||
case 'system': {
|
||||
if (message.subtype === 'init' && !execContext.useCleanOutput) {
|
||||
const initMsg = message as SystemInitMessage;
|
||||
console.log(chalk.blue(` Model: ${initMsg.model}, Permission: ${initMsg.permissionMode}`));
|
||||
if (initMsg.mcp_servers && initMsg.mcp_servers.length > 0) {
|
||||
const mcpStatus = initMsg.mcp_servers.map(s => `${s.name}(${s.status})`).join(', ');
|
||||
console.log(chalk.blue(` MCP: ${mcpStatus}`));
|
||||
}
|
||||
}
|
||||
return { type: 'continue' };
|
||||
}
|
||||
|
||||
case 'user':
|
||||
return { type: 'continue' };
|
||||
|
||||
case 'tool_use': {
|
||||
const toolData = handleToolUseMessage(message as unknown as ToolUseMessage);
|
||||
outputLines(formatToolUseOutput(toolData.toolName, toolData.parameters));
|
||||
await auditLogger.logToolStart(toolData.toolName, toolData.parameters);
|
||||
return { type: 'continue' };
|
||||
}
|
||||
|
||||
case 'tool_result': {
|
||||
const toolResultData = handleToolResultMessage(message as unknown as ToolResultMessage);
|
||||
outputLines(formatToolResultOutput(toolResultData.displayContent));
|
||||
await auditLogger.logToolEnd(toolResultData.content);
|
||||
return { type: 'continue' };
|
||||
}
|
||||
|
||||
case 'result': {
|
||||
const resultData = handleResultMessage(message as ResultMessage);
|
||||
outputLines(formatResultOutput(resultData, !execContext.useCleanOutput));
|
||||
costResults.agents[execContext.agentKey] = resultData.cost;
|
||||
costResults.total += resultData.cost;
|
||||
return { type: 'complete', result: resultData.result, cost: resultData.cost };
|
||||
}
|
||||
|
||||
default:
|
||||
console.log(chalk.gray(` ${message.type}: ${JSON.stringify(message, null, 2)}`));
|
||||
return { type: 'continue' };
|
||||
}
|
||||
}
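Reviewer note: `dispatchMessage` returns a `MessageDispatchAction`, and `processMessageStream` folds those actions into its result. The sketch below shows that control flow in a simplified, synchronous form. It assumes flattened message objects and omits progress, formatting and audit logging; every name ending in `Sketch` is hypothetical and is not part of this commit.

```typescript
// Simplified stand-ins for the types in this file (assumptions, not the real SDK shapes).
type DispatchAction =
  | { type: 'continue'; apiErrorDetected?: boolean }
  | { type: 'complete'; result: string | null; cost: number }
  | { type: 'throw'; error: Error };

interface MessageSketch {
  type: string;
  text?: string;
  cost?: number;
}

interface LoopResultSketch {
  turnCount: number;
  result: string | null;
  apiErrorDetected: boolean;
  cost: number;
}

// Maps one message to an action without touching any loop state.
function dispatchSketch(message: MessageSketch): DispatchAction {
  switch (message.type) {
    case 'assistant': {
      const text = (message.text ?? '').toLowerCase();
      if (text.includes('session limit reached')) {
        return { type: 'throw', error: new Error('Session limit reached') };
      }
      return text.includes('api error')
        ? { type: 'continue', apiErrorDetected: true }
        : { type: 'continue' };
    }
    case 'result':
      return { type: 'complete', result: message.text ?? null, cost: message.cost ?? 0 };
    default:
      return { type: 'continue' };
  }
}

// Owns all mutable state; the union's `type` tag decides what happens next.
function runLoopSketch(messages: MessageSketch[]): LoopResultSketch {
  const state: LoopResultSketch = { turnCount: 0, result: null, apiErrorDetected: false, cost: 0 };
  for (const message of messages) {
    if (message.type === 'assistant') {
      state.turnCount++;
    }
    const action = dispatchSketch(message);
    if (action.type === 'throw') {
      throw action.error;
    }
    if (action.type === 'complete') {
      state.result = action.result;
      state.cost = action.cost;
      break;
    }
    // Narrowed to the 'continue' variant here.
    if (action.apiErrorDetected) {
      state.apiErrorDetected = true;
    }
  }
  return state;
}

const summary = runLoopSketch([
  { type: 'assistant', text: 'Scanning routes' },
  { type: 'assistant', text: 'API Error: terminated' },
  { type: 'result', text: 'done', cost: 0.25 },
]);
console.log(summary);
```

Returning an action rather than throwing inside the handler keeps the per-message logic easy to test in isolation, which appears to be the intent of the split in this commit.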
|
||||
@@ -0,0 +1,169 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
// Pure functions for formatting console output
|
||||
|
||||
import chalk from 'chalk';
|
||||
import { extractAgentType, formatDuration } from '../utils/formatting.js';
|
||||
import { getAgentPrefix } from '../utils/output-formatter.js';
|
||||
import type { ExecutionContext, ResultData } from './types.js';
|
||||
|
||||
export function detectExecutionContext(description: string): ExecutionContext {
|
||||
const isParallelExecution =
|
||||
description.includes('vuln agent') || description.includes('exploit agent');
|
||||
|
||||
const useCleanOutput =
|
||||
description.includes('Pre-recon agent') ||
|
||||
description.includes('Recon agent') ||
|
||||
description.includes('Executive Summary and Report Cleanup') ||
|
||||
description.includes('vuln agent') ||
|
||||
description.includes('exploit agent');
|
||||
|
||||
const agentType = extractAgentType(description);
|
||||
|
||||
const agentKey = description.toLowerCase().replace(/\s+/g, '-');
|
||||
|
||||
return { isParallelExecution, useCleanOutput, agentType, agentKey };
|
||||
}
|
||||
|
||||
export function formatAssistantOutput(
|
||||
cleanedContent: string,
|
||||
context: ExecutionContext,
|
||||
turnCount: number,
|
||||
description: string,
|
||||
colorFn: typeof chalk.cyan = chalk.cyan
|
||||
): string[] {
|
||||
if (!cleanedContent.trim()) {
|
||||
return [];
|
||||
}
|
||||
|
||||
const lines: string[] = [];
|
||||
|
||||
if (context.isParallelExecution) {
|
||||
// Compact output for parallel agents with prefixes
|
||||
const prefix = getAgentPrefix(description);
|
||||
lines.push(colorFn(`${prefix} ${cleanedContent}`));
|
||||
} else {
|
||||
// Full turn output for sequential agents
|
||||
lines.push(colorFn(`\n Turn ${turnCount} (${description}):`));
|
||||
lines.push(colorFn(` ${cleanedContent}`));
|
||||
}
|
||||
|
||||
return lines;
|
||||
}
|
||||
|
||||
export function formatResultOutput(data: ResultData, showFullResult: boolean): string[] {
|
||||
const lines: string[] = [];
|
||||
|
||||
lines.push(chalk.magenta(`\n COMPLETED:`));
|
||||
lines.push(
|
||||
chalk.gray(
|
||||
` Duration: ${(data.duration_ms / 1000).toFixed(1)}s, Cost: $${data.cost.toFixed(4)}`
|
||||
)
|
||||
);
|
||||
|
||||
if (data.subtype === 'error_max_turns') {
|
||||
lines.push(chalk.red(` Stopped: Hit maximum turns limit`));
|
||||
} else if (data.subtype === 'error_during_execution') {
|
||||
lines.push(chalk.red(` Stopped: Execution error`));
|
||||
}
|
||||
|
||||
if (data.permissionDenials > 0) {
|
||||
lines.push(chalk.yellow(` ${data.permissionDenials} permission denials`));
|
||||
}
|
||||
|
||||
if (showFullResult && data.result && typeof data.result === 'string') {
|
||||
if (data.result.length > 1000) {
|
||||
lines.push(chalk.magenta(` ${data.result.slice(0, 1000)}... [${data.result.length} total chars]`));
|
||||
} else {
|
||||
lines.push(chalk.magenta(` ${data.result}`));
|
||||
}
|
||||
}
|
||||
|
||||
return lines;
|
||||
}
|
||||
|
||||
export function formatErrorOutput(
  error: Error & { code?: string; status?: number },
  context: ExecutionContext,
  description: string,
  duration: number,
  sourceDir: string,
  isRetryable: boolean
): string[] {
  const lines: string[] = [];

  if (context.isParallelExecution) {
    const prefix = getAgentPrefix(description);
    lines.push(chalk.red(`${prefix} Failed (${formatDuration(duration)})`));
  } else if (context.useCleanOutput) {
    lines.push(chalk.red(`${context.agentType} failed (${formatDuration(duration)})`));
  } else {
    lines.push(chalk.red(` Claude Code failed: ${description} (${formatDuration(duration)})`));
  }

  lines.push(chalk.red(` Error Type: ${error.constructor.name}`));
  lines.push(chalk.red(` Message: ${error.message}`));
  lines.push(chalk.gray(` Agent: ${description}`));
  lines.push(chalk.gray(` Working Directory: ${sourceDir}`));
  lines.push(chalk.gray(` Retryable: ${isRetryable ? 'Yes' : 'No'}`));

  if (error.code) {
    lines.push(chalk.gray(` Error Code: ${error.code}`));
  }
  if (error.status) {
    lines.push(chalk.gray(` HTTP Status: ${error.status}`));
  }

  return lines;
}

export function formatCompletionMessage(
  context: ExecutionContext,
  description: string,
  turnCount: number,
  duration: number
): string {
  if (context.isParallelExecution) {
    const prefix = getAgentPrefix(description);
    return chalk.green(`${prefix} Complete (${turnCount} turns, ${formatDuration(duration)})`);
  }

  if (context.useCleanOutput) {
    return chalk.green(
      `${context.agentType.charAt(0).toUpperCase() + context.agentType.slice(1)} complete! (${turnCount} turns, ${formatDuration(duration)})`
    );
  }

  return chalk.green(
    ` Claude Code completed: ${description} (${turnCount} turns) in ${formatDuration(duration)}`
  );
}

export function formatToolUseOutput(
  toolName: string,
  input: Record<string, unknown> | undefined
): string[] {
  const lines: string[] = [];

  lines.push(chalk.yellow(`\n Using Tool: ${toolName}`));
  if (input && Object.keys(input).length > 0) {
    lines.push(chalk.gray(` Input: ${JSON.stringify(input, null, 2)}`));
  }

  return lines;
}

export function formatToolResultOutput(displayContent: string): string[] {
  const lines: string[] = [];

  lines.push(chalk.green(` Tool Result:`));
  if (displayContent) {
    lines.push(chalk.gray(` ${displayContent}`));
  }

  return lines;
}
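Reviewer note: the formatters in this file return `string[]` and leave printing to the caller, which makes the truncation branch of `formatResultOutput` easy to check in isolation. The sketch below is a minimal stand-alone version of that branch. `previewResult` and its `limit` parameter are hypothetical names that do not appear in the diff, and `chalk` is left out so the snippet has no dependencies.

```typescript
// Hypothetical helper mirroring the truncation branch of formatResultOutput.
// Results at or under `limit` characters pass through unchanged; longer ones
// are cut at `limit` and suffixed with the original length.
function previewResult(result: string, limit: number = 1000): string {
  if (result.length <= limit) {
    return result;
  }
  return `${result.slice(0, limit)}... [${result.length} total chars]`;
}

const shortPreview = previewResult('ok');
const longPreview = previewResult('a'.repeat(1200));

console.log(shortPreview);
console.log(longPreview.slice(-30));
```

Keeping the function pure means a test can assert on the returned string without capturing stdout.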
@@ -0,0 +1,76 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

// Null Object pattern for progress indicator - callers never check for null

import { ProgressIndicator } from '../progress-indicator.js';
import { extractAgentType } from '../utils/formatting.js';

export interface ProgressContext {
  description: string;
  useCleanOutput: boolean;
}

export interface ProgressManager {
  start(): void;
  stop(): void;
  finish(message: string): void;
  isActive(): boolean;
}

class RealProgressManager implements ProgressManager {
  private indicator: ProgressIndicator;
  private active: boolean = false;

  constructor(message: string) {
    this.indicator = new ProgressIndicator(message);
  }

  start(): void {
    this.indicator.start();
    this.active = true;
  }

  stop(): void {
    this.indicator.stop();
    this.active = false;
  }

  finish(message: string): void {
    this.indicator.finish(message);
    this.active = false;
  }

  isActive(): boolean {
    return this.active;
  }
}

/** Null Object implementation - all methods are safe no-ops */
class NullProgressManager implements ProgressManager {
  start(): void {}

  stop(): void {}

  finish(_message: string): void {}

  isActive(): boolean {
    return false;
  }
}

// Returns no-op when disabled
export function createProgressManager(
  context: ProgressContext,
  disableLoader: boolean
): ProgressManager {
  if (!context.useCleanOutput || disableLoader) {
    return new NullProgressManager();
  }

  const agentType = extractAgentType(context.description);
  return new RealProgressManager(`Running ${agentType}...`);
}
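Reviewer note: a minimal, dependency-free sketch of the Null Object pattern this file uses. `Spinner`, `RealSpinner`, `NullSpinner`, `createSpinner`, and `runTask` are hypothetical stand-ins for `ProgressManager` and its implementations. The point is only that the caller never branches on whether progress output is enabled.

```typescript
// Hypothetical reduced interface standing in for ProgressManager.
interface Spinner {
  start(): void;
  finish(message: string): void;
  isActive(): boolean;
}

// Real implementation: tracks state and records what it would display.
class RealSpinner implements Spinner {
  private active = false;
  readonly log: string[] = [];

  start(): void {
    this.active = true;
    this.log.push('started');
  }

  finish(message: string): void {
    this.active = false;
    this.log.push(`finished: ${message}`);
  }

  isActive(): boolean {
    return this.active;
  }
}

// Null Object: same interface, every method is a safe no-op.
class NullSpinner implements Spinner {
  start(): void {}
  finish(_message: string): void {}
  isActive(): boolean {
    return false;
  }
}

// The factory is the only place that decides which implementation to use.
function createSpinner(enabled: boolean): Spinner {
  return enabled ? new RealSpinner() : new NullSpinner();
}

// The caller runs the same code path whether or not output is enabled.
function runTask(spinner: Spinner): boolean {
  spinner.start();
  const wasActive = spinner.isActive();
  spinner.finish('done');
  return wasActive;
}

const activeWhenEnabled = runTask(createSpinner(true));
const activeWhenDisabled = runTask(createSpinner(false));
```

The trade-off is one extra class in exchange for removing `if (progress)` checks from every call site.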
+134
@@ -0,0 +1,134 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

// Type definitions for Claude executor message processing pipeline

export interface ExecutionContext {
  isParallelExecution: boolean;
  useCleanOutput: boolean;
  agentType: string;
  agentKey: string;
}

export interface ProcessingState {
  turnCount: number;
  result: string | null;
  apiErrorDetected: boolean;
  totalCost: number;
  partialCost: number;
  lastHeartbeat: number;
}

export interface ProcessingResult {
  result: string | null;
  turnCount: number;
  apiErrorDetected: boolean;
  totalCost: number;
}

export interface AssistantResult {
  content: string;
  cleanedContent: string;
  apiErrorDetected: boolean;
  shouldThrow?: Error;
  logData: {
    turn: number;
    content: string;
    timestamp: string;
  };
}

export interface ResultData {
  result: string | null;
  cost: number;
  duration_ms: number;
  subtype?: string;
  permissionDenials: number;
}

export interface ToolUseData {
  toolName: string;
  parameters: Record<string, unknown>;
  timestamp: string;
}

export interface ToolResultData {
  content: unknown;
  displayContent: string;
  timestamp: string;
}

export interface ContentBlock {
  type?: string;
  text?: string;
}

export interface AssistantMessage {
  type: 'assistant';
  message: {
    content: ContentBlock[] | string;
  };
}

export interface ResultMessage {
  type: 'result';
  result?: string;
  total_cost_usd?: number;
  duration_ms?: number;
  subtype?: string;
  permission_denials?: unknown[];
}

export interface ToolUseMessage {
  type: 'tool_use';
  name: string;
  input?: Record<string, unknown>;
}

export interface ToolResultMessage {
  type: 'tool_result';
  content?: unknown;
}

export interface ApiErrorDetection {
  detected: boolean;
  shouldThrow?: Error;
}

// Message types from SDK stream
export type SdkMessage =
  | AssistantMessage
  | ResultMessage
  | ToolUseMessage
  | ToolResultMessage
  | SystemInitMessage
  | UserMessage;

export interface SystemInitMessage {
  type: 'system';
  subtype: 'init';
  model?: string;
  permissionMode?: string;
  mcp_servers?: Array<{ name: string; status: string }>;
}

export interface UserMessage {
  type: 'user';
}

// Dispatch result types for message processing
export type MessageDispatchResult =
  | { action: 'continue' }
  | { action: 'break'; result: string | null; cost: number }
  | { action: 'throw'; error: Error };

export interface MessageDispatchContext {
  turnCount: number;
  execContext: ExecutionContext;
  description: string;
  colorFn: (text: string) => string;
  useCleanOutput: boolean;
}
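Reviewer note: `MessageDispatchResult` is a discriminated union keyed on `action`, so a consumer can switch on it and let the compiler narrow each branch. The sketch below shows one way such a consumer might look. `DispatchResult` copies the union's shape from the diff, while `drain` and its return type are hypothetical and not taken from the executor itself.

```typescript
// Same shape as MessageDispatchResult in the diff, redeclared so the
// snippet is self-contained.
type DispatchResult =
  | { action: 'continue' }
  | { action: 'break'; result: string | null; cost: number }
  | { action: 'throw'; error: Error };

// Hypothetical consumer: stops at the first 'break', rethrows on 'throw',
// and otherwise keeps reading.
function drain(results: DispatchResult[]): { result: string | null; cost: number } {
  for (const r of results) {
    switch (r.action) {
      case 'continue':
        break; // leaves the switch only; the loop moves to the next result
      case 'break':
        return { result: r.result, cost: r.cost };
      case 'throw':
        throw r.error;
      default: {
        // Exhaustiveness check: fails to compile if a new action is added
        // to the union without a matching case.
        const unreachable: never = r;
        throw new Error(`Unhandled action: ${JSON.stringify(unreachable)}`);
      }
    }
  }
  return { result: null, cost: 0 };
}

const outcome = drain([
  { action: 'continue' },
  { action: 'break', result: 'done', cost: 0.25 },
]);
```

Returning a value from the dispatcher instead of throwing or mutating shared state keeps the message handlers pure, which matches the intent stated in the commit message.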
@@ -12,8 +12,10 @@
  */

 import { AgentLogger } from './logger.js';
+import { WorkflowLogger, type AgentLogDetails, type WorkflowSummary } from './workflow-logger.js';
 import { MetricsTracker } from './metrics-tracker.js';
-import { initializeAuditStructure, formatTimestamp, type SessionMetadata } from './utils.js';
+import { initializeAuditStructure, type SessionMetadata } from './utils.js';
+import { formatTimestamp } from '../utils/formatting.js';
 import { SessionMutex } from '../utils/concurrency.js';

 // Global mutex instance
@@ -36,7 +38,9 @@ export class AuditSession {
   private sessionMetadata: SessionMetadata;
   private sessionId: string;
   private metricsTracker: MetricsTracker;
+  private workflowLogger: WorkflowLogger;
   private currentLogger: AgentLogger | null = null;
+  private currentAgentName: string | null = null;
   private initialized: boolean = false;

   constructor(sessionMetadata: SessionMetadata) {
@@ -53,6 +57,7 @@ export class AuditSession {

     // Components
     this.metricsTracker = new MetricsTracker(sessionMetadata);
+    this.workflowLogger = new WorkflowLogger(sessionMetadata);
   }

   /**
@@ -70,6 +75,9 @@ export class AuditSession {
     // Initialize metrics tracker (loads or creates session.json)
     await this.metricsTracker.initialize();

+    // Initialize workflow logger
+    await this.workflowLogger.initialize();
+
     this.initialized = true;
   }

@@ -97,6 +105,9 @@ export class AuditSession {
       await AgentLogger.savePrompt(this.sessionMetadata, agentName, promptContent);
     }

+    // Track current agent name for workflow logging
+    this.currentAgentName = agentName;
+
     // Create and initialize logger for this attempt
     this.currentLogger = new AgentLogger(this.sessionMetadata, agentName, attemptNumber);
     await this.currentLogger.initialize();
@@ -110,6 +121,9 @@ export class AuditSession {
       attemptNumber,
       timestamp: formatTimestamp(),
     });
+
+    // Log to unified workflow log
+    await this.workflowLogger.logAgent(agentName, 'start', { attemptNumber });
   }

   /**
@@ -120,7 +134,30 @@ export class AuditSession {
       throw new Error('No active logger. Call startAgent() first.');
     }

+    // Log to agent-specific log file (JSON format)
     await this.currentLogger.logEvent(eventType, eventData);
+
+    // Also log to unified workflow log (human-readable format)
+    const data = eventData as Record<string, unknown>;
+    const agentName = this.currentAgentName || 'unknown';
+    switch (eventType) {
+      case 'tool_start':
+        await this.workflowLogger.logToolStart(
+          agentName,
+          String(data.toolName || ''),
+          data.parameters
+        );
+        break;
+      case 'llm_response':
+        await this.workflowLogger.logLlmResponse(
+          agentName,
+          Number(data.turn || 0),
+          String(data.content || '')
+        );
+        break;
+      // tool_end and error events are intentionally not logged to workflow log
+      // to reduce noise - the agent completion message captures the outcome
+    }
   }

   /**
@@ -142,10 +179,23 @@ export class AuditSession {
       this.currentLogger = null;
     }

+    // Reset current agent name
+    this.currentAgentName = null;
+
+    // Log to unified workflow log
+    const agentLogDetails: AgentLogDetails = {
+      attemptNumber: result.attemptNumber,
+      duration_ms: result.duration_ms,
+      cost_usd: result.cost_usd,
+      success: result.success,
+      ...(result.error !== undefined && { error: result.error }),
+    };
+    await this.workflowLogger.logAgent(agentName, 'end', agentLogDetails);
+
     // Mutex-protected update to session.json
     const unlock = await sessionMutex.lock(this.sessionId);
     try {
-      // Reload metrics (in case of parallel updates)
+      // Reload inside mutex to prevent lost updates during parallel exploitation phase
       await this.metricsTracker.reload();

       // Update metrics
@@ -177,4 +227,28 @@ export class AuditSession {
     await this.ensureInitialized();
     return this.metricsTracker.getMetrics();
   }
+
+  /**
+   * Log phase start to unified workflow log
+   */
+  async logPhaseStart(phase: string): Promise<void> {
+    await this.ensureInitialized();
+    await this.workflowLogger.logPhase(phase, 'start');
+  }
+
+  /**
+   * Log phase completion to unified workflow log
+   */
+  async logPhaseComplete(phase: string): Promise<void> {
+    await this.ensureInitialized();
+    await this.workflowLogger.logPhase(phase, 'complete');
+  }
+
+  /**
+   * Log workflow completion to unified workflow log
+   */
+  async logWorkflowComplete(summary: WorkflowSummary): Promise<void> {
+    await this.ensureInitialized();
+    await this.workflowLogger.logWorkflowComplete(summary);
+  }
 }
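Reviewer note: the `agentLogDetails` object in the hunks for this file uses a conditional spread, `...(result.error !== undefined && { error: result.error })`, so that the `error` key is left out entirely rather than set to `undefined`. A likely motivation is a strict compiler setting such as `exactOptionalPropertyTypes`, but that is an assumption and is not stated in the diff. The sketch below isolates the idiom. `Details` and `buildDetails` are hypothetical names.

```typescript
// Hypothetical reduced version of AgentLogDetails.
interface Details {
  success: boolean;
  error?: string;
}

function buildDetails(success: boolean, error: string | undefined): Details {
  return {
    success,
    // When the condition is false the spread operand is `false`, which adds
    // no properties, so the `error` key is omitted rather than set to undefined.
    ...(error !== undefined && { error }),
  };
}

const okDetails = buildDetails(true, undefined);
const failedDetails = buildDetails(false, 'timeout');

console.log(Object.keys(okDetails));
console.log(Object.keys(failedDetails));
```

The difference matters for anything that inspects keys, for example `'error' in details` checks or JSON serialization of the details object.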
@@ -18,5 +18,6 @@

 export { AuditSession } from './audit-session.js';
 export { AgentLogger } from './logger.js';
+export { WorkflowLogger } from './workflow-logger.js';
 export { MetricsTracker } from './metrics-tracker.js';
 export * as AuditUtils from './utils.js';
+4
-13
@@ -15,10 +15,10 @@ import fs from 'fs';
 import {
   generateLogPath,
   generatePromptPath,
-  atomicWrite,
-  formatTimestamp,
   type SessionMetadata,
 } from './utils.js';
+import { atomicWrite } from '../utils/file-io.js';
+import { formatTimestamp } from '../utils/formatting.js';

 interface LogEvent {
   type: string;
@@ -96,22 +96,13 @@ export class AgentLogger {
         return;
       }

-      // Write and flush immediately (crash-safe)
       const needsDrain = !this.stream.write(text, 'utf8', (error) => {
-        if (error) {
-          reject(error);
-        }
+        if (error) reject(error);
       });

       if (needsDrain) {
-        // Buffer is full, wait for drain
-        const drainHandler = (): void => {
-          this.stream!.removeListener('drain', drainHandler);
-          resolve();
-        };
-        this.stream.once('drain', drainHandler);
+        this.stream.once('drain', resolve);
       } else {
-        // Buffer has space, resolve immediately
         resolve();
       }
     });
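Reviewer note: the simplification in this hunk is safe because a listener registered with `once` is removed automatically after it fires, so the manual `removeListener` call was redundant. The sketch below shows the same backpressure-aware write against an in-memory stream. `writeWithBackpressure`, `sink`, and `demo` are hypothetical names, and the tiny `highWaterMark` is chosen only to force the `'drain'` path. It assumes a Node.js runtime.

```typescript
import { Writable } from 'node:stream';

// Resolves once the stream can accept more data: immediately if write()
// returned true, otherwise after the next 'drain' event.
function writeWithBackpressure(stream: Writable, text: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const ok = stream.write(text, 'utf8', (error) => {
      if (error) reject(error);
    });
    if (ok) {
      resolve();
    } else {
      // `once` detaches the listener after the first 'drain'.
      stream.once('drain', () => resolve());
    }
  });
}

// Hypothetical in-memory sink with a 4-byte buffer so every write below
// exceeds the high-water mark and has to wait for 'drain'.
const chunks: string[] = [];
const sink = new Writable({
  highWaterMark: 4,
  decodeStrings: false,
  write(chunk, _encoding, callback) {
    chunks.push(String(chunk));
    setImmediate(callback);
  },
});

async function demo(): Promise<string> {
  await writeWithBackpressure(sink, 'first line\n');
  await writeWithBackpressure(sink, 'second line\n');
  return chunks.join('');
}
```

As in the diff, a write error that arrives after the promise has already resolved is silently dropped, because rejecting a settled promise has no effect. Callers that need to surface such errors would also have to listen for the stream's `'error'` event.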
@@ -13,13 +13,12 @@

 import {
   generateSessionJsonPath,
-  atomicWrite,
-  readJson,
-  fileExists,
-  formatTimestamp,
-  calculatePercentage,
   type SessionMetadata,
 } from './utils.js';
+import { atomicWrite, readJson, fileExists } from '../utils/file-io.js';
+import { formatTimestamp, calculatePercentage } from '../utils/formatting.js';
+import { AGENT_PHASE_MAP, type PhaseName } from '../session-manager.js';
+import type { AgentName } from '../types/index.js';

 interface AttemptData {
   attempt_number: number;
@@ -152,16 +151,14 @@ export class MetricsTracker {
     }

     // Initialize agent metrics if not exists
-    if (!this.data.metrics.agents[agentName]) {
-      this.data.metrics.agents[agentName] = {
-        status: 'in-progress',
-        attempts: [],
-        final_duration_ms: 0,
-        total_cost_usd: 0,
-      };
-    }
-
-    const agent = this.data.metrics.agents[agentName]!;
+    const existingAgent = this.data.metrics.agents[agentName];
+    const agent = existingAgent ?? {
+      status: 'in-progress' as const,
+      attempts: [],
+      final_duration_ms: 0,
+      total_cost_usd: 0,
+    };
+    this.data.metrics.agents[agentName] = agent;

     // Add attempt to array
     const attempt: AttemptData = {
@@ -255,36 +252,19 @@ export class MetricsTracker {
   private calculatePhaseMetrics(
     successfulAgents: Array<[string, AgentMetrics]>
   ): Record<string, PhaseMetrics> {
-    const phases: Record<string, AgentMetrics[]> = {
+    const phases: Record<PhaseName, AgentMetrics[]> = {
       'pre-recon': [],
-      recon: [],
+      'recon': [],
       'vulnerability-analysis': [],
-      exploitation: [],
-      reporting: [],
+      'exploitation': [],
+      'reporting': [],
     };

-    // Map agents to phases
-    const agentPhaseMap: Record<string, string> = {
-      'pre-recon': 'pre-recon',
-      recon: 'recon',
-      'injection-vuln': 'vulnerability-analysis',
-      'xss-vuln': 'vulnerability-analysis',
-      'auth-vuln': 'vulnerability-analysis',
-      'authz-vuln': 'vulnerability-analysis',
-      'ssrf-vuln': 'vulnerability-analysis',
-      'injection-exploit': 'exploitation',
-      'xss-exploit': 'exploitation',
-      'auth-exploit': 'exploitation',
-      'authz-exploit': 'exploitation',
-      'ssrf-exploit': 'exploitation',
-      report: 'reporting',
-    };
-
-    // Group agents by phase
+    // Group agents by phase using imported AGENT_PHASE_MAP
     for (const [agentName, agentData] of successfulAgents) {
-      const phase = agentPhaseMap[agentName];
-      if (phase && phases[phase]) {
-        phases[phase]!.push(agentData);
+      const phase = AGENT_PHASE_MAP[agentName as AgentName];
+      if (phase) {
+        phases[phase].push(agentData);
       }
     }

@@ -296,7 +276,6 @@ export class MetricsTracker {
       if (agentList.length === 0) continue;

       const phaseDuration = agentList.reduce((sum, agent) => sum + agent.final_duration_ms, 0);
-
       const phaseCost = agentList.reduce((sum, agent) => sum + agent.total_cost_usd, 0);

       phaseMetrics[phaseName] = {
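Reviewer note: the `calculatePhaseMetrics` hunk for this file replaces a local string-to-string map with the shared `AGENT_PHASE_MAP` and keys the buckets by `PhaseName`, which is what lets the non-null assertion on `phases[phase]` go away. The sketch below reproduces that grouping with reduced, hypothetical unions. The real `PhaseName`, `AgentName`, and `AGENT_PHASE_MAP` live in modules not shown here and have more members.

```typescript
// Hypothetical, reduced versions of the names imported in the diff.
type PhaseName = 'recon' | 'exploitation' | 'reporting';
type AgentName = 'recon' | 'xss-exploit' | 'report';

const AGENT_PHASE_MAP: Record<AgentName, PhaseName> = {
  recon: 'recon',
  'xss-exploit': 'exploitation',
  report: 'reporting',
};

// Groups per-agent durations into one bucket per phase.
function groupByPhase(agents: Array<[string, number]>): Record<PhaseName, number[]> {
  const phases: Record<PhaseName, number[]> = {
    recon: [],
    exploitation: [],
    reporting: [],
  };
  for (const [name, durationMs] of agents) {
    // The cast mirrors the diff. An unknown name looks up as undefined at
    // runtime, so the truthiness check below skips it.
    const phase = AGENT_PHASE_MAP[name as AgentName];
    if (phase) {
      // Record<PhaseName, ...> guarantees the bucket exists; no `!` needed.
      phases[phase].push(durationMs);
    }
  }
  return phases;
}

const grouped = groupByPhase([
  ['recon', 1200],
  ['xss-exploit', 3400],
  ['not-an-agent', 99],
]);
```

Because the cast hides unknown names from the type checker, the `if (phase)` guard is still doing real work at runtime and should stay.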
+18
-4
@@ -31,12 +31,18 @@ export interface SessionMetadata {
 }

 /**
- * Generate standardized session identifier: {hostname}_{sessionId}
+ * Extract and sanitize hostname from URL for use in identifiers
  */
+export function sanitizeHostname(url: string): string {
+  return new URL(url).hostname.replace(/[^a-zA-Z0-9-]/g, '-');
+}
+
+/**
+ * Generate standardized session identifier from workflow ID
+ * Workflow IDs already contain hostname, so we use them directly
+ */
 export function generateSessionIdentifier(sessionMetadata: SessionMetadata): string {
-  const { id, webUrl } = sessionMetadata;
-  const hostname = new URL(webUrl).hostname.replace(/[^a-zA-Z0-9-]/g, '-');
-  return `${hostname}_${id}`;
+  return sessionMetadata.id;
 }

 /**
@@ -79,6 +85,14 @@ export function generateSessionJsonPath(sessionMetadata: SessionMetadata): strin
   return path.join(auditPath, 'session.json');
 }

+/**
+ * Generate path to workflow.log file
+ */
+export function generateWorkflowLogPath(sessionMetadata: SessionMetadata): string {
+  const auditPath = generateAuditPath(sessionMetadata);
+  return path.join(auditPath, 'workflow.log');
+}
+
 /**
  * Ensure directory exists (idempotent, race-safe)
  */
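Reviewer note: a quick illustration of what `sanitizeHostname` does to typical inputs. The function body is copied from the diff, and the example URLs are made up. `URL.hostname` excludes the port and path, and every character outside `[a-zA-Z0-9-]` (in practice, the dots) becomes a hyphen.

```typescript
// Body copied from the diff; redeclared so the snippet is self-contained.
function sanitizeHostname(url: string): string {
  return new URL(url).hostname.replace(/[^a-zA-Z0-9-]/g, '-');
}

// Example inputs are hypothetical.
const withPort = sanitizeHostname('https://app.example.com:8443/login');
const local = sanitizeHostname('http://localhost:3000');

console.log(withPort); // app-example-com
console.log(local); // localhost
```

Note that `new URL(...)` throws a `TypeError` on input that is not an absolute URL, so callers passing user-supplied targets may want to validate first.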
@@ -0,0 +1,382 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Workflow Logger
 *
 * Provides a unified, human-readable log file per workflow.
 * Optimized for `tail -f` viewing during concurrent workflow execution.
 */

import fs from 'fs';
import path from 'path';
import { generateWorkflowLogPath, ensureDirectory, type SessionMetadata } from './utils.js';
import { formatDuration, formatTimestamp } from '../utils/formatting.js';

export interface AgentLogDetails {
  attemptNumber?: number;
  duration_ms?: number;
  cost_usd?: number;
  success?: boolean;
  error?: string;
}

export interface AgentMetricsSummary {
  durationMs: number;
  costUsd: number | null;
}

export interface WorkflowSummary {
  status: 'completed' | 'failed';
  totalDurationMs: number;
  totalCostUsd: number;
  completedAgents: string[];
  agentMetrics: Record<string, AgentMetricsSummary>;
  error?: string;
}

/**
 * WorkflowLogger - Manages the unified workflow log file
 */
export class WorkflowLogger {
  private sessionMetadata: SessionMetadata;
  private logPath: string;
  private stream: fs.WriteStream | null = null;
  private initialized: boolean = false;

  constructor(sessionMetadata: SessionMetadata) {
    this.sessionMetadata = sessionMetadata;
    this.logPath = generateWorkflowLogPath(sessionMetadata);
  }

  /**
   * Initialize the log stream (creates file and writes header)
   */
  async initialize(): Promise<void> {
    if (this.initialized) {
      return;
    }

    // Ensure directory exists
    await ensureDirectory(path.dirname(this.logPath));

    // Create write stream with append mode
    this.stream = fs.createWriteStream(this.logPath, {
      flags: 'a',
      encoding: 'utf8',
      autoClose: true,
    });

    this.initialized = true;

    // Write header only if file is new (empty)
    const stats = await fs.promises.stat(this.logPath).catch(() => null);
    if (!stats || stats.size === 0) {
      await this.writeHeader();
    }
  }

  /**
   * Write header to log file
   */
  private async writeHeader(): Promise<void> {
    const header = [
      `================================================================================`,
      `Shannon Pentest - Workflow Log`,
      `================================================================================`,
      `Workflow ID: ${this.sessionMetadata.id}`,
      `Target URL: ${this.sessionMetadata.webUrl}`,
      `Started: ${formatTimestamp()}`,
      `================================================================================`,
      ``,
    ].join('\n');

    return this.writeRaw(header);
  }

  /**
   * Write raw text to log file with immediate flush
   */
  private writeRaw(text: string): Promise<void> {
    return new Promise((resolve, reject) => {
      if (!this.initialized || !this.stream) {
        reject(new Error('WorkflowLogger not initialized'));
        return;
      }

      const needsDrain = !this.stream.write(text, 'utf8', (error) => {
        if (error) reject(error);
      });

      if (needsDrain) {
        this.stream.once('drain', resolve);
      } else {
        resolve();
      }
    });
  }

  /**
   * Format timestamp for log line (UTC, human readable)
   */
  private formatLogTime(): string {
    const now = new Date();
    return now.toISOString().replace('T', ' ').slice(0, 19);
  }

  /**
   * Log a phase transition event
   */
  async logPhase(phase: string, event: 'start' | 'complete'): Promise<void> {
    await this.ensureInitialized();

    const action = event === 'start' ? 'Starting' : 'Completed';
    const line = `[${this.formatLogTime()}] [PHASE] ${action}: ${phase}\n`;

    // Add blank line before phase start for readability
    if (event === 'start') {
      await this.writeRaw('\n');
    }

    await this.writeRaw(line);
  }

  /**
   * Log an agent event
   */
  async logAgent(
    agentName: string,
    event: 'start' | 'end',
    details?: AgentLogDetails
  ): Promise<void> {
    await this.ensureInitialized();

    let message: string;

    if (event === 'start') {
      const attempt = details?.attemptNumber ?? 1;
      message = `${agentName}: Starting (attempt ${attempt})`;
    } else {
      const parts: string[] = [agentName + ':'];

      if (details?.success === false) {
        parts.push('Failed');
        if (details?.error) {
          parts.push(`- ${details.error}`);
        }
      } else {
        parts.push('Completed');
      }

      if (details?.duration_ms !== undefined) {
        // Build the parenthesized suffix as one part so that joining with ' '
        // does not leave a stray space before ')' when no cost is available.
        const duration = formatDuration(details.duration_ms);
        const cost = details.cost_usd !== undefined ? ` $${details.cost_usd.toFixed(2)}` : '';
        parts.push(`(${duration}${cost})`);
      }

      message = parts.join(' ');
    }

    const line = `[${this.formatLogTime()}] [AGENT] ${message}\n`;
    await this.writeRaw(line);
  }

  /**
   * Log a general event
   */
  async logEvent(eventType: string, message: string): Promise<void> {
    await this.ensureInitialized();

    const line = `[${this.formatLogTime()}] [${eventType.toUpperCase()}] ${message}\n`;
    await this.writeRaw(line);
  }

  /**
   * Log an error
   */
  async logError(error: Error, context?: string): Promise<void> {
    await this.ensureInitialized();

    const contextStr = context ? ` (${context})` : '';
    const line = `[${this.formatLogTime()}] [ERROR] ${error.message}${contextStr}\n`;
    await this.writeRaw(line);
  }

  /**
   * Truncate string to max length with ellipsis
   */
  private truncate(str: string, maxLen: number): string {
    if (str.length <= maxLen) return str;
    return str.slice(0, maxLen - 3) + '...';
  }

  /**
   * Format tool parameters for human-readable display
   */
  private formatToolParams(toolName: string, params: unknown): string {
    if (!params || typeof params !== 'object') {
      return '';
    }

    const p = params as Record<string, unknown>;

    // Tool-specific formatting for common tools
    switch (toolName) {
      case 'Bash':
        if (p.command) {
          return this.truncate(String(p.command).replace(/\n/g, ' '), 100);
        }
        break;
      case 'Read':
        if (p.file_path) {
          return String(p.file_path);
        }
        break;
      case 'Write':
        if (p.file_path) {
          return String(p.file_path);
        }
        break;
      case 'Edit':
        if (p.file_path) {
          return String(p.file_path);
        }
        break;
      case 'Glob':
        if (p.pattern) {
          return String(p.pattern);
        }
        break;
      case 'Grep':
        if (p.pattern) {
          const path = p.path ? ` in ${p.path}` : '';
          return `"${this.truncate(String(p.pattern), 50)}"${path}`;
        }
        break;
      case 'WebFetch':
        if (p.url) {
          return String(p.url);
        }
        break;
      case 'mcp__playwright__browser_navigate':
        if (p.url) {
          return String(p.url);
        }
        break;
      case 'mcp__playwright__browser_click':
        if (p.selector) {
          return this.truncate(String(p.selector), 60);
        }
        break;
      case 'mcp__playwright__browser_type':
        if (p.selector) {
          const text = p.text ? `: "${this.truncate(String(p.text), 30)}"` : '';
          return `${this.truncate(String(p.selector), 40)}${text}`;
        }
        break;
    }

    // Default: show first string-valued param truncated
    for (const [key, val] of Object.entries(p)) {
      if (typeof val === 'string' && val.length > 0) {
        return `${key}=${this.truncate(val, 60)}`;
      }
    }

    return '';
  }

  /**
   * Log tool start event
   */
  async logToolStart(agentName: string, toolName: string, parameters: unknown): Promise<void> {
    await this.ensureInitialized();

    const params = this.formatToolParams(toolName, parameters);
    const paramStr = params ? `: ${params}` : '';
    const line = `[${this.formatLogTime()}] [${agentName}] [TOOL] ${toolName}${paramStr}\n`;
    await this.writeRaw(line);
  }

  /**
   * Log LLM response
   */
  async logLlmResponse(agentName: string, turn: number, content: string): Promise<void> {
    await this.ensureInitialized();

    // Show full content, replacing newlines with escaped version for single-line output
    const escaped = content.replace(/\n/g, '\\n');
    const line = `[${this.formatLogTime()}] [${agentName}] [LLM] Turn ${turn}: ${escaped}\n`;
    await this.writeRaw(line);
  }

  /**
   * Log workflow completion with full summary
   */
  async logWorkflowComplete(summary: WorkflowSummary): Promise<void> {
    await this.ensureInitialized();

    const status = summary.status === 'completed' ? 'COMPLETED' : 'FAILED';

    await this.writeRaw('\n');
    await this.writeRaw(`================================================================================\n`);
    await this.writeRaw(`Workflow ${status}\n`);
    await this.writeRaw(`────────────────────────────────────────\n`);
    await this.writeRaw(`Workflow ID: ${this.sessionMetadata.id}\n`);
    await this.writeRaw(`Status: ${summary.status}\n`);
    await this.writeRaw(`Duration: ${formatDuration(summary.totalDurationMs)}\n`);
    await this.writeRaw(`Total Cost: $${summary.totalCostUsd.toFixed(4)}\n`);
    await this.writeRaw(`Agents: ${summary.completedAgents.length} completed\n`);

    if (summary.error) {
      await this.writeRaw(`Error: ${summary.error}\n`);
    }

    await this.writeRaw(`\n`);
    await this.writeRaw(`Agent Breakdown:\n`);

    for (const agentName of summary.completedAgents) {
      const metrics = summary.agentMetrics[agentName];
      if (metrics) {
        const duration = formatDuration(metrics.durationMs);
        const cost = metrics.costUsd !== null ? `$${metrics.costUsd.toFixed(4)}` : 'N/A';
        await this.writeRaw(` - ${agentName} (${duration}, ${cost})\n`);
      } else {
        await this.writeRaw(` - ${agentName}\n`);
      }
    }

    await this.writeRaw(`================================================================================\n`);
  }

  /**
   * Ensure initialized (helper for lazy initialization)
   */
  private async ensureInitialized(): Promise<void> {
    if (!this.initialized) {
      await this.initialize();
    }
  }

  /**
   * Close the log stream
   */
  async close(): Promise<void> {
    if (!this.initialized || !this.stream) {
      return;
    }

    return new Promise((resolve) => {
      this.stream!.end(() => {
        this.initialized = false;
        resolve();
      });
    });
  }
}
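Reviewer note: `formatLogTime` in this file is built on `Date.prototype.toISOString()`, which always reports UTC, so the timestamps written to `workflow.log` are UTC regardless of the worker's time zone. The sketch below takes the date as a parameter (an assumption made for testability; the method in the diff reads the current time) and uses a fixed instant so the result does not depend on where it runs.

```typescript
// Same transformation as WorkflowLogger.formatLogTime, with the date passed in.
function formatLogTime(date: Date): string {
  // "2025-01-02T03:04:05.000Z" -> "2025-01-02 03:04:05"
  return date.toISOString().replace('T', ' ').slice(0, 19);
}

// A fixed UTC instant: 2025-01-02 03:04:05.
const fixed = new Date(Date.UTC(2025, 0, 2, 3, 4, 5));
const stamp = formatLogTime(fixed);

console.log(stamp); // 2025-01-02 03:04:05
```

If local time were wanted instead, the string would have to be assembled from `getFullYear()`, `getHours()`, and the other local getters, since `toISOString()` has no time-zone option.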
+186
-74
@@ -14,6 +14,12 @@ import type {
   PromptErrorResult,
 } from './types/errors.js';

+// Temporal error classification for ApplicationFailure wrapping
+export interface TemporalErrorClassification {
+  type: string;
+  retryable: boolean;
+}
+
 // Custom error class for pentest operations
 export class PentestError extends Error {
   name = 'PentestError' as const;
@@ -37,11 +43,11 @@ export class PentestError extends Error {
 }

 // Centralized error logging function
-export const logError = async (
+export async function logError(
   error: Error & { type?: PentestErrorType; retryable?: boolean; context?: PentestErrorContext },
   contextMsg: string,
   sourceDir: string | null = null
-): Promise<LogEntry> => {
+): Promise<LogEntry> {
   const timestamp = new Date().toISOString();
   const logEntry: LogEntry = {
     timestamp,
@@ -80,13 +86,13 @@ export const logError = async (
|
||||
}
|
||||
|
||||
return logEntry;
|
||||
};
|
||||
}
|
||||
|
||||
// Handle tool execution errors
|
||||
export const handleToolError = (
|
||||
export function handleToolError(
|
||||
toolName: string,
|
||||
error: Error & { code?: string }
|
||||
): ToolErrorResult => {
|
||||
): ToolErrorResult {
|
||||
const isRetryable =
|
||||
error.code === 'ECONNRESET' ||
|
||||
error.code === 'ETIMEDOUT' ||
|
||||
@@ -105,13 +111,13 @@ export const handleToolError = (
|
||||
{ toolName, originalError: error.message, errorCode: error.code }
|
||||
),
|
||||
};
|
||||
};
|
||||
}
|
||||
|
||||
// Handle prompt loading errors
|
||||
export const handlePromptError = (
|
||||
export function handlePromptError(
|
||||
promptName: string,
|
||||
error: Error
|
||||
): PromptErrorResult => {
|
||||
): PromptErrorResult {
|
||||
return {
|
||||
success: false,
|
||||
error: new PentestError(
|
@@ -121,78 +127,63 @@ export const handlePromptError = (
       { promptName, originalError: error.message }
     ),
   };
-};
+}
 
-// Check if an error should trigger a retry for Claude agents
-export const isRetryableError = (error: Error): boolean => {
+// Patterns that indicate retryable errors
+const RETRYABLE_PATTERNS = [
+  // Network and connection errors
+  'network',
+  'connection',
+  'timeout',
+  'econnreset',
+  'enotfound',
+  'econnrefused',
+  // Rate limiting
+  'rate limit',
+  '429',
+  'too many requests',
+  // Server errors
+  'server error',
+  '5xx',
+  'internal server error',
+  'service unavailable',
+  'bad gateway',
+  // Claude API errors
+  'mcp server',
+  'model unavailable',
+  'service temporarily unavailable',
+  'api error',
+  'terminated',
+  // Max turns
+  'max turns',
+  'maximum turns',
+];
+
+// Patterns that indicate non-retryable errors (checked before default)
+const NON_RETRYABLE_PATTERNS = [
+  'authentication',
+  'invalid prompt',
+  'out of memory',
+  'permission denied',
+  'session limit reached',
+  'invalid api key',
+];
+
+// Conservative retry classification - unknown errors don't retry (fail-safe default)
+export function isRetryableError(error: Error): boolean {
   const message = error.message.toLowerCase();
 
-  // Network and connection errors - always retryable
-  if (
-    message.includes('network') ||
-    message.includes('connection') ||
-    message.includes('timeout') ||
-    message.includes('econnreset') ||
-    message.includes('enotfound') ||
-    message.includes('econnrefused')
-  ) {
-    return true;
-  }
-
-  // Rate limiting - retryable with longer backoff
-  if (
-    message.includes('rate limit') ||
-    message.includes('429') ||
-    message.includes('too many requests')
-  ) {
-    return true;
-  }
-
-  // Server errors - retryable
-  if (
-    message.includes('server error') ||
-    message.includes('5xx') ||
-    message.includes('internal server error') ||
-    message.includes('service unavailable') ||
-    message.includes('bad gateway')
-  ) {
-    return true;
-  }
-
-  // Claude API specific errors - retryable
-  if (
-    message.includes('mcp server') ||
-    message.includes('model unavailable') ||
-    message.includes('service temporarily unavailable') ||
-    message.includes('api error') ||
-    message.includes('terminated')
-  ) {
-    return true;
-  }
-
-  // Max turns without completion - retryable once
-  if (message.includes('max turns') || message.includes('maximum turns')) {
-    return true;
-  }
-
-  // Non-retryable errors
-  if (
-    message.includes('authentication') ||
-    message.includes('invalid prompt') ||
-    message.includes('out of memory') ||
-    message.includes('permission denied') ||
-    message.includes('session limit reached') ||
-    message.includes('invalid api key')
-  ) {
+  // Check for explicit non-retryable patterns first
+  if (NON_RETRYABLE_PATTERNS.some((pattern) => message.includes(pattern))) {
     return false;
   }
 
-  // Default to non-retryable for unknown errors
-  return false;
-};
+  // Check for retryable patterns
+  return RETRYABLE_PATTERNS.some((pattern) => message.includes(pattern));
+}
 
-// Get retry delay based on error type and attempt number
-export const getRetryDelay = (error: Error, attempt: number): number => {
+// Rate limit errors get longer base delay (30s) vs standard exponential backoff (2s)
+export function getRetryDelay(error: Error, attempt: number): number {
   const message = error.message.toLowerCase();
 
   // Rate limiting gets longer delays
|
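The hunk above replaces a chain of `message.includes(...)` branches with two pattern tables and checks the non-retryable table first. A minimal sketch of that ordering follows, using a shortened subset of the patterns. The names `ALLOW`, `DENY`, and `shouldRetry` are hypothetical and do not appear in the commit.

```typescript
// Illustrative subset of the pattern tables; the lists in the commit are longer.
const ALLOW: readonly string[] = ['network', 'connection', 'timeout', 'rate limit', '429'];
const DENY: readonly string[] = ['authentication', 'permission denied', 'invalid api key'];

// Deny-list first, then allow-list; anything unmatched is not retried.
function shouldRetry(error: Error): boolean {
  const message = error.message.toLowerCase();
  if (DENY.some((pattern) => message.includes(pattern))) {
    return false;
  }
  return ALLOW.some((pattern) => message.includes(pattern));
}

// Matches both tables: the deny-list wins because it is checked first.
const mixed = shouldRetry(new Error('Connection closed: invalid API key'));
// Matches only the allow-list.
const transient = shouldRetry(new Error('ETIMEDOUT: network timeout'));
// Matches neither table: falls through to the fail-safe default.
const unknown = shouldRetry(new Error('something unexpected happened'));
```

Under the removed branch order, a message like the first one would have hit the network check (`'connection'`) and been retried before the non-retryable check was ever reached.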
@@ -204,4 +195,125 @@ export const getRetryDelay = (error: Error, attempt: number): number => {
   const baseDelay = Math.pow(2, attempt) * 1000; // 2s, 4s, 8s
   const jitter = Math.random() * 1000; // 0-1s random
   return Math.min(baseDelay + jitter, 30000); // Max 30s
-};
+}
+
+/**
+ * Classifies errors for Temporal workflow retry behavior.
+ * Returns error type and whether Temporal should retry.
+ *
+ * Used by activities to wrap errors in ApplicationFailure:
+ * - Retryable errors: Temporal retries with configured backoff
+ * - Non-retryable errors: Temporal fails immediately
+ */
+export function classifyErrorForTemporal(error: unknown): TemporalErrorClassification {
+  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
+
+  // === BILLING ERRORS (Retryable with long backoff) ===
+  // Anthropic returns billing as 400 invalid_request_error
+  // Human can add credits OR wait for spending cap to reset (5-30 min backoff)
+  if (
+    message.includes('billing_error') ||
+    message.includes('credit balance is too low') ||
+    message.includes('insufficient credits') ||
+    message.includes('usage is blocked due to insufficient credits') ||
+    message.includes('please visit plans & billing') ||
+    message.includes('please visit plans and billing') ||
+    message.includes('usage limit reached') ||
+    message.includes('quota exceeded') ||
+    message.includes('daily rate limit') ||
+    message.includes('limit will reset') ||
+    // Claude Code spending cap patterns (returns short message instead of error)
+    message.includes('spending cap') ||
+    message.includes('spending limit') ||
+    message.includes('cap reached') ||
+    message.includes('budget exceeded') ||
+    message.includes('billing limit reached')
+  ) {
+    return { type: 'BillingError', retryable: true };
+  }
+
+  // === PERMANENT ERRORS (Non-retryable) ===
+
+  // Authentication (401) - bad API key won't fix itself
+  if (
+    message.includes('authentication') ||
+    message.includes('api key') ||
+    message.includes('401') ||
+    message.includes('authentication_error')
+  ) {
+    return { type: 'AuthenticationError', retryable: false };
+  }
+
+  // Permission (403) - access won't be granted
+  if (
+    message.includes('permission') ||
+    message.includes('forbidden') ||
+    message.includes('403')
+  ) {
+    return { type: 'PermissionError', retryable: false };
+  }
+
+  // === OUTPUT VALIDATION ERRORS (Retryable) ===
+  // Agent didn't produce expected deliverables - retry may succeed
+  // IMPORTANT: Must come BEFORE generic 'validation' check below
+  if (
+    message.includes('failed output validation') ||
+    message.includes('output validation failed')
+  ) {
+    return { type: 'OutputValidationError', retryable: true };
+  }
+
+  // Invalid Request (400) - malformed request is permanent
+  // Note: Checked AFTER billing and AFTER output validation
+  if (
+    message.includes('invalid_request_error') ||
+    message.includes('malformed') ||
+    message.includes('validation')
+  ) {
+    return { type: 'InvalidRequestError', retryable: false };
+  }
+
+  // Request Too Large (413) - won't fit no matter how many retries
+  if (
+    message.includes('request_too_large') ||
+    message.includes('too large') ||
+    message.includes('413')
+  ) {
+    return { type: 'RequestTooLargeError', retryable: false };
+  }
+
+  // Configuration errors - missing files need manual fix
+  if (
+    message.includes('enoent') ||
+    message.includes('no such file') ||
+    message.includes('cli not installed')
+  ) {
+    return { type: 'ConfigurationError', retryable: false };
+  }
+
+  // Execution limits - max turns/budget reached
+  if (
+    message.includes('max turns') ||
+    message.includes('budget') ||
+    message.includes('execution limit') ||
+    message.includes('error_max_turns') ||
+    message.includes('error_max_budget')
+  ) {
+    return { type: 'ExecutionLimitError', retryable: false };
+  }
+
+  // Invalid target URL - bad URL format won't fix itself
+  if (
+    message.includes('invalid url') ||
+    message.includes('invalid target') ||
+    message.includes('malformed url') ||
+    message.includes('invalid uri')
+  ) {
+    return { type: 'InvalidTargetError', retryable: false };
+  }
+
+  // === TRANSIENT ERRORS (Retryable) ===
+  // Rate limits (429), server errors (5xx), network issues
+  // Let Temporal retry with configured backoff
+  return { type: 'TransientError', retryable: true };
+}
||||
|
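`classifyErrorForTemporal` depends on check order: the first matching group wins, and anything unmatched falls through to `TransientError`. The sketch below reproduces that first-match-wins shape with a reduced rule table. The `RULES` table and the `classify` and `toFailureOptions` helpers are hypothetical. The `nonRetryable` field mirrors the option of that name on Temporal's `ApplicationFailure`, but how the commit's activities build the failure is not shown in this diff, so that mapping is an assumption.

```typescript
interface Classification {
  type: string;
  retryable: boolean;
}

interface ClassificationRule extends Classification {
  patterns: readonly string[];
}

// Order matters: the specific output-validation rule must precede the generic 'validation' rule.
const RULES: readonly ClassificationRule[] = [
  { type: 'BillingError', retryable: true, patterns: ['billing_error', 'spending cap'] },
  { type: 'AuthenticationError', retryable: false, patterns: ['authentication', 'api key', '401'] },
  {
    type: 'OutputValidationError',
    retryable: true,
    patterns: ['failed output validation', 'output validation failed'],
  },
  {
    type: 'InvalidRequestError',
    retryable: false,
    patterns: ['invalid_request_error', 'malformed', 'validation'],
  },
];

// First matching rule wins; unmatched messages are treated as transient.
function classify(error: unknown): Classification {
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
  const rule = RULES.find((r) => r.patterns.some((p) => message.includes(p)));
  return rule
    ? { type: rule.type, retryable: rule.retryable }
    : { type: 'TransientError', retryable: true };
}

// Assumed mapping: a non-retryable classification becomes nonRetryable: true.
function toFailureOptions(error: unknown): { message: string; type: string; nonRetryable: boolean } {
  const { type, retryable } = classify(error);
  const message = error instanceof Error ? error.message : String(error);
  return { message, type, nonRetryable: !retryable };
}

const outputFailure = classify(new Error('Agent failed output validation'));
const genericValidation = classify(new Error('Schema validation rejected the request'));
const fallback = toFailureOptions('socket hang up');
```

Because matching is by substring, a broad pattern placed early shadows every later rule, which is why the commit's comments call out the position of the output-validation check.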
||||
+55
-68
@@ -7,7 +7,7 @@
|
||||
import { $, fs, path } from 'zx';
|
||||
import chalk from 'chalk';
|
||||
import { Timer } from '../utils/metrics.js';
|
||||
import { formatDuration } from '../audit/utils.js';
|
||||
import { formatDuration } from '../utils/formatting.js';
|
||||
import { handleToolError, PentestError } from '../error-handling.js';
|
||||
import { AGENTS } from '../session-manager.js';
|
||||
import { runClaudePromptWithRetry } from '../ai/claude-executor.js';
|
@@ -40,11 +40,17 @@ interface PromptVariables {
   repoPath: string;
 }
 
+// Discriminated union for Wave1 tool results - clearer than loose union types
+type Wave1ToolResult =
+  | { kind: 'scan'; result: TerminalScanResult }
+  | { kind: 'skipped'; message: string }
+  | { kind: 'agent'; result: AgentResult };
+
 interface Wave1Results {
-  nmap: TerminalScanResult | string | AgentResult;
-  subfinder: TerminalScanResult | string | AgentResult;
-  whatweb: TerminalScanResult | string | AgentResult;
-  naabu?: TerminalScanResult | string | AgentResult;
+  nmap: Wave1ToolResult;
+  subfinder: Wave1ToolResult;
+  whatweb: Wave1ToolResult;
+  naabu?: Wave1ToolResult;
   codeAnalysis: AgentResult;
 }
 
||||
@@ -57,7 +63,7 @@ interface PreReconResult {
|
||||
report: string;
|
||||
}
|
||||
|
||||
// Pure function: Run terminal scanning tools
|
||||
// Runs external security tools (nmap, whatweb, etc). Schemathesis requires schemas from code analysis.
|
||||
async function runTerminalScan(tool: ToolName, target: string, sourceDir: string | null = null): Promise<TerminalScanResult> {
|
||||
const timer = new Timer(`command-${tool}`);
|
||||
try {
|
||||
@@ -89,7 +95,7 @@ async function runTerminalScan(tool: ToolName, target: string, sourceDir: string
|
||||
return { tool: 'whatweb', output: result.stdout, status: 'success', duration: whatwebDuration };
|
||||
}
|
||||
case 'schemathesis': {
|
||||
// Only run if API schemas found
|
||||
// Schemathesis depends on code analysis output - skip if no schemas found
|
||||
const schemasDir = path.join(sourceDir || '.', 'outputs', 'schemas');
|
||||
if (await fs.pathExists(schemasDir)) {
|
||||
const schemaFiles = await fs.readdir(schemasDir) as string[];
|
||||
@@ -146,6 +152,8 @@ async function runPreReconWave1(
|
||||
|
||||
const operations: Promise<TerminalScanResult | AgentResult>[] = [];
|
||||
|
||||
const skippedResult = (message: string): Wave1ToolResult => ({ kind: 'skipped', message });
|
||||
|
||||
// Skip external commands in pipeline testing mode
|
||||
if (pipelineTestingMode) {
|
||||
console.log(chalk.gray(' ⏭️ Skipping external tools (pipeline testing mode)'));
|
||||
@@ -163,9 +171,9 @@ async function runPreReconWave1(
|
||||
);
|
||||
const [codeAnalysis] = await Promise.all(operations);
|
||||
return {
|
||||
nmap: 'Skipped (pipeline testing mode)',
|
||||
subfinder: 'Skipped (pipeline testing mode)',
|
||||
whatweb: 'Skipped (pipeline testing mode)',
|
||||
nmap: skippedResult('Skipped (pipeline testing mode)'),
|
||||
subfinder: skippedResult('Skipped (pipeline testing mode)'),
|
||||
whatweb: skippedResult('Skipped (pipeline testing mode)'),
|
||||
codeAnalysis: codeAnalysis as AgentResult
|
||||
};
|
||||
} else {
|
||||
@@ -192,9 +200,9 @@ async function runPreReconWave1(
|
||||
const [nmap, subfinder, whatweb, codeAnalysis] = await Promise.all(operations);
|
||||
|
||||
return {
|
||||
nmap: nmap as TerminalScanResult,
|
||||
subfinder: subfinder as TerminalScanResult,
|
||||
whatweb: whatweb as TerminalScanResult,
|
||||
nmap: { kind: 'scan', result: nmap as TerminalScanResult },
|
||||
subfinder: { kind: 'scan', result: subfinder as TerminalScanResult },
|
||||
whatweb: { kind: 'scan', result: whatweb as TerminalScanResult },
|
||||
codeAnalysis: codeAnalysis as AgentResult
|
||||
};
|
||||
}
|
@@ -250,17 +258,21 @@ async function runPreReconWave2(
   return response;
 }
 
-// Helper type for stitching results
-interface StitchableResult {
-  status?: string;
-  output?: string;
-  tool?: string;
+// Extracts status and output from a Wave1 tool result
+function extractResult(r: Wave1ToolResult | undefined): { status: string; output: string } {
+  if (!r) return { status: 'Skipped', output: 'No output' };
+  switch (r.kind) {
+    case 'scan':
+      return { status: r.result.status || 'Skipped', output: r.result.output || 'No output' };
+    case 'skipped':
+      return { status: 'Skipped', output: r.message };
+    case 'agent':
+      return { status: r.result.success ? 'success' : 'error', output: 'See agent output' };
+  }
 }
 
-// Pure function: Stitch together pre-recon outputs and save to file
-async function stitchPreReconOutputs(outputs: (StitchableResult | string | undefined)[], sourceDir: string): Promise<string> {
-  const [nmap, subfinder, whatweb, naabu, codeAnalysis, ...additionalScans] = outputs;
-
+// Combines tool outputs into single deliverable. Falls back to reference if file missing.
+async function stitchPreReconOutputs(wave1: Wave1Results, additionalScans: TerminalScanResult[], sourceDir: string): Promise<string> {
   // Try to read the code analysis deliverable file
   let codeAnalysisContent = 'No analysis available';
   try {
||||
@@ -269,62 +281,45 @@ async function stitchPreReconOutputs(outputs: (StitchableResult | string | undef
|
||||
} catch (error) {
|
||||
const err = error as Error;
|
||||
console.log(chalk.yellow(`⚠️ Could not read code analysis deliverable: ${err.message}`));
|
||||
// Fallback message if file doesn't exist
|
||||
codeAnalysisContent = 'Analysis located in deliverables/code_analysis_deliverable.md';
|
||||
}
|
||||
|
||||
|
||||
// Build additional scans section
|
||||
let additionalSection = '';
|
||||
if (additionalScans && additionalScans.length > 0) {
|
||||
if (additionalScans.length > 0) {
|
||||
additionalSection = '\n## Authenticated Scans\n';
|
||||
additionalScans.forEach(scan => {
|
||||
const s = scan as StitchableResult;
|
||||
if (s && s.tool) {
|
||||
additionalSection += `
|
||||
### ${s.tool.toUpperCase()}
|
||||
Status: ${s.status}
|
||||
${s.output}
|
||||
for (const scan of additionalScans) {
|
||||
additionalSection += `
|
||||
### ${scan.tool.toUpperCase()}
|
||||
Status: ${scan.status}
|
||||
${scan.output}
|
||||
`;
|
||||
}
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
const nmapResult = nmap as StitchableResult | string | undefined;
|
||||
const subfinderResult = subfinder as StitchableResult | string | undefined;
|
||||
const whatwebResult = whatweb as StitchableResult | string | undefined;
|
||||
const naabuResult = naabu as StitchableResult | string | undefined;
|
||||
|
||||
const getStatus = (r: StitchableResult | string | undefined): string => {
|
||||
if (!r) return 'Skipped';
|
||||
if (typeof r === 'string') return 'Skipped';
|
||||
return r.status || 'Skipped';
|
||||
};
|
||||
|
||||
const getOutput = (r: StitchableResult | string | undefined): string => {
|
||||
if (!r) return 'No output';
|
||||
if (typeof r === 'string') return r;
|
||||
return r.output || 'No output';
|
||||
};
|
||||
const nmap = extractResult(wave1.nmap);
|
||||
const subfinder = extractResult(wave1.subfinder);
|
||||
const whatweb = extractResult(wave1.whatweb);
|
||||
const naabu = extractResult(wave1.naabu);
|
||||
|
||||
const report = `
|
||||
# Pre-Reconnaissance Report
|
||||
|
||||
## Port Discovery (naabu)
|
||||
Status: ${getStatus(naabuResult)}
|
||||
${getOutput(naabuResult)}
|
||||
Status: ${naabu.status}
|
||||
${naabu.output}
|
||||
|
||||
## Network Scanning (nmap)
|
||||
Status: ${getStatus(nmapResult)}
|
||||
${getOutput(nmapResult)}
|
||||
Status: ${nmap.status}
|
||||
${nmap.output}
|
||||
|
||||
## Subdomain Discovery (subfinder)
|
||||
Status: ${getStatus(subfinderResult)}
|
||||
${getOutput(subfinderResult)}
|
||||
Status: ${subfinder.status}
|
||||
${subfinder.output}
|
||||
|
||||
## Technology Detection (whatweb)
|
||||
Status: ${getStatus(whatwebResult)}
|
||||
${getOutput(whatwebResult)}
|
||||
Status: ${whatweb.status}
|
||||
${whatweb.output}
|
||||
## Code Analysis
|
||||
${codeAnalysisContent}
|
||||
${additionalSection}
|
||||
@@ -375,16 +370,8 @@ export async function executePreReconPhase(
|
||||
console.log(chalk.green(' ✅ Wave 2 operations completed'));
|
||||
|
||||
console.log(chalk.blue('📝 Stitching pre-recon outputs...'));
|
||||
// Combine wave 1 and wave 2 results for stitching
|
||||
const allResults: (StitchableResult | string | undefined)[] = [
|
||||
wave1Results.nmap as StitchableResult | string,
|
||||
wave1Results.subfinder as StitchableResult | string,
|
||||
wave1Results.whatweb as StitchableResult | string,
|
||||
wave1Results.naabu as StitchableResult | string | undefined,
|
||||
wave1Results.codeAnalysis as unknown as StitchableResult,
|
||||
...(wave2Results.schemathesis ? [wave2Results.schemathesis as StitchableResult] : [])
|
||||
];
|
||||
const preReconReport = await stitchPreReconOutputs(allResults, sourceDir);
|
||||
const additionalScans = wave2Results.schemathesis ? [wave2Results.schemathesis] : [];
|
||||
const preReconReport = await stitchPreReconOutputs(wave1Results, additionalScans, sourceDir);
|
||||
const duration = timer.stop();
|
||||
|
||||
console.log(chalk.green(`✅ Pre-reconnaissance complete in ${formatDuration(duration)}`));
|
||||
|
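The pre-recon changes above swap a loose `TerminalScanResult | string | AgentResult` union for a discriminated union keyed on `kind`, so `extractResult` can switch on the tag instead of probing with `typeof`. A self-contained sketch of the same pattern follows. `ScanResult` and `AgentOutcome` are simplified stand-ins with assumed shapes, and the `never` check in the `default` branch is an addition for illustration, not part of the commit.

```typescript
// Simplified stand-ins for TerminalScanResult and AgentResult (assumed shapes).
interface ScanResult {
  status: string;
  output: string;
}
interface AgentOutcome {
  success: boolean;
}

type ToolResult =
  | { kind: 'scan'; result: ScanResult }
  | { kind: 'skipped'; message: string }
  | { kind: 'agent'; result: AgentOutcome };

// Maps any variant (or a missing result) to a uniform status/output pair.
function summarize(r: ToolResult | undefined): { status: string; output: string } {
  if (!r) return { status: 'Skipped', output: 'No output' };
  switch (r.kind) {
    case 'scan':
      return { status: r.result.status || 'Skipped', output: r.result.output || 'No output' };
    case 'skipped':
      return { status: 'Skipped', output: r.message };
    case 'agent':
      return { status: r.result.success ? 'success' : 'error', output: 'See agent output' };
    default: {
      // Not in the commit: fails to compile if a new variant is added but not handled.
      const unreachable: never = r;
      return unreachable;
    }
  }
}

const skipped = summarize({ kind: 'skipped', message: 'Skipped (pipeline testing mode)' });
const emptyScan = summarize({ kind: 'scan', result: { status: 'success', output: '' } });
const failedAgent = summarize({ kind: 'agent', result: { success: false } });
```

With the tag in place, the compiler narrows `r` inside each `case`, which is what lets the commit drop the `as StitchableResult` casts from the stitching code.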
||||
@@ -48,9 +48,12 @@ export async function assembleFinalReport(sourceDir: string): Promise<string> {
|
||||
}
|
||||
|
||||
const finalContent = sections.join('\n\n');
|
||||
const finalReportPath = path.join(sourceDir, 'deliverables', 'comprehensive_security_assessment_report.md');
|
||||
const deliverablesDir = path.join(sourceDir, 'deliverables');
|
||||
const finalReportPath = path.join(deliverablesDir, 'comprehensive_security_assessment_report.md');
|
||||
|
||||
try {
|
||||
// Ensure deliverables directory exists
|
||||
await fs.ensureDir(deliverablesDir);
|
||||
await fs.writeFile(finalReportPath, finalContent);
|
||||
console.log(chalk.green(`✅ Final report assembled at ${finalReportPath}`));
|
||||
} catch (error) {
|
||||
|
||||
+40
-35
@@ -6,6 +6,7 @@
|
||||
|
||||
import { fs, path } from 'zx';
|
||||
import { PentestError } from './error-handling.js';
|
||||
import { asyncPipe } from './utils/functional.js';
|
||||
|
||||
export type VulnType = 'injection' | 'xss' | 'auth' | 'ssrf' | 'authz';
|
||||
|
||||
@@ -16,9 +17,11 @@ interface VulnTypeConfigItem {
|
||||
|
||||
type VulnTypeConfig = Record<VulnType, VulnTypeConfigItem>;
|
||||
|
||||
type ErrorMessageResolver = string | ((existence: FileExistence) => string);
|
||||
|
||||
interface ValidationRule {
|
||||
predicate: (existence: FileExistence) => boolean;
|
||||
errorMessage: string;
|
||||
errorMessage: ErrorMessageResolver;
|
||||
retryable: boolean;
|
||||
}
|
||||
|
@@ -94,40 +97,36 @@ const VULN_TYPE_CONFIG: VulnTypeConfig = Object.freeze({
   }),
 }) as VulnTypeConfig;
 
-// Functional composition utilities - async pipe for promise chain
-type PipeFunction = (x: any) => any | Promise<any>;
-
-const pipe =
-  (...fns: PipeFunction[]) =>
-  (x: any): Promise<any> =>
-    fns.reduce(async (v, f) => f(await v), Promise.resolve(x));
-
 // Pure function to create validation rule
-const createValidationRule = (
+function createValidationRule(
   predicate: (existence: FileExistence) => boolean,
-  errorMessage: string,
+  errorMessage: ErrorMessageResolver,
   retryable: boolean = true
-): ValidationRule => Object.freeze({ predicate, errorMessage, retryable });
+): ValidationRule {
+  return Object.freeze({ predicate, errorMessage, retryable });
+}
 
-// Validation rules for file existence (following QUEUE_VALIDATION_FLOW.md)
+// Symmetric deliverable rules: queue and deliverable must exist together (prevents partial analysis from triggering exploitation)
 const fileExistenceRules: readonly ValidationRule[] = Object.freeze([
-  // Rule 1: Neither deliverable nor queue exists
   createValidationRule(
-    ({ deliverableExists, queueExists }) => deliverableExists || queueExists,
-    'Analysis failed: Neither deliverable nor queue file exists. Analysis agent must create both files.'
-  ),
-  // Rule 2: Queue doesn't exist but deliverable exists
-  createValidationRule(
-    ({ deliverableExists, queueExists }) => !(!queueExists && deliverableExists),
-    'Analysis incomplete: Deliverable exists but queue file missing. Analysis agent must create both files.'
-  ),
-  // Rule 3: Queue exists but deliverable doesn't exist
-  createValidationRule(
-    ({ deliverableExists, queueExists }) => !(queueExists && !deliverableExists),
-    'Analysis incomplete: Queue exists but deliverable file missing. Analysis agent must create both files.'
+    ({ deliverableExists, queueExists }) => deliverableExists && queueExists,
+    getExistenceErrorMessage
   ),
 ]);
 
+// Generate appropriate error message based on which files are missing
+function getExistenceErrorMessage(existence: FileExistence): string {
+  const { deliverableExists, queueExists } = existence;
+
+  if (!deliverableExists && !queueExists) {
+    return 'Analysis failed: Neither deliverable nor queue file exists. Analysis agent must create both files.';
+  }
+  if (!queueExists) {
+    return 'Analysis incomplete: Deliverable exists but queue file missing. Analysis agent must create both files.';
+  }
+  return 'Analysis incomplete: Queue exists but deliverable file missing. Analysis agent must create both files.';
+}
+
 // Pure function to create file paths
 const createPaths = (
   vulnType: VulnType,
|
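The hunk above collapses three existence rules into one predicate and moves the per-case wording into a message resolver, so a rule's `errorMessage` may be either a string or a function of the observed state. A reduced sketch of that shape follows. The `ExistenceRule` type, the `describeMissing` and `firstFailure` helpers, and the shortened messages are hypothetical; only the two boolean fields of `FileExistence` are taken from the diff.

```typescript
interface FileExistence {
  deliverableExists: boolean;
  queueExists: boolean;
}

// A message is either fixed text or computed from the observed state.
type MessageResolver = string | ((existence: FileExistence) => string);

interface ExistenceRule {
  predicate: (existence: FileExistence) => boolean;
  errorMessage: MessageResolver;
}

// Picks the wording for whichever file is missing (shortened, illustrative text).
function describeMissing({ deliverableExists, queueExists }: FileExistence): string {
  if (!deliverableExists && !queueExists) return 'neither file exists';
  if (!queueExists) return 'queue file missing';
  return 'deliverable file missing';
}

const existenceRules: readonly ExistenceRule[] = [
  { predicate: (e) => e.deliverableExists && e.queueExists, errorMessage: describeMissing },
];

// Returns the first failure message, or null when every rule passes.
function firstFailure(existence: FileExistence): string | null {
  const failed = existenceRules.find((rule) => !rule.predicate(existence));
  if (!failed) return null;
  return typeof failed.errorMessage === 'function'
    ? failed.errorMessage(existence)
    : failed.errorMessage;
}

const bothPresent = firstFailure({ deliverableExists: true, queueExists: true });
const queueMissing = firstFailure({ deliverableExists: true, queueExists: false });
```

Keeping the string form in the union means existing fixed-message rules would still type-check alongside resolver-based ones.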
||||
@@ -170,7 +169,7 @@ const checkFileExistence = async (
|
||||
});
|
||||
};
|
||||
|
||||
// Pure function to validate existence rules
|
||||
// Validates deliverable/queue pairing - both files must exist
|
||||
const validateExistenceRules = (
|
||||
pathsWithExistence: PathsWithExistence | PathsWithError
|
||||
): PathsWithExistence | PathsWithError => {
|
||||
@@ -182,9 +181,14 @@ const validateExistenceRules = (
|
||||
const failedRule = fileExistenceRules.find((rule) => !rule.predicate(existence));
|
||||
|
||||
if (failedRule) {
|
||||
const message =
|
||||
typeof failedRule.errorMessage === 'function'
|
||||
? failedRule.errorMessage(existence)
|
||||
: failedRule.errorMessage;
|
||||
|
||||
return {
|
||||
error: new PentestError(
|
||||
`${failedRule.errorMessage} (${vulnType})`,
|
||||
`${message} (${vulnType})`,
|
||||
'validation',
|
||||
failedRule.retryable,
|
||||
{
|
||||
@@ -224,7 +228,7 @@ const validateQueueStructure = (content: string): QueueValidationResult => {
|
||||
}
|
||||
};
|
||||
|
||||
// Pure function to read and validate queue content
|
||||
// Queue parse failures are retryable - agent can fix malformed JSON on retry
|
||||
const validateQueueContent = async (
|
||||
pathsWithExistence: PathsWithExistence | PathsWithError
|
||||
): Promise<PathsWithQueue | PathsWithError> => {
|
||||
@@ -273,7 +277,7 @@ const validateQueueContent = async (
|
||||
}
|
||||
};
|
||||
|
||||
// Pure function to determine exploitation decision
|
||||
// Final decision: skip if queue says no vulns, proceed if vulns found, error otherwise
|
||||
const determineExploitationDecision = (
|
||||
validatedData: PathsWithQueue | PathsWithError
|
||||
): ExploitationDecision => {
|
@@ -294,17 +298,18 @@ const determineExploitationDecision = (
 };
 
 // Main functional validation pipeline
-export const validateQueueAndDeliverable = async (
+export async function validateQueueAndDeliverable(
   vulnType: VulnType,
   sourceDir: string
-): Promise<ExploitationDecision> =>
-  (await pipe(
-    () => createPaths(vulnType, sourceDir),
+): Promise<ExploitationDecision> {
+  return asyncPipe<ExploitationDecision>(
+    createPaths(vulnType, sourceDir),
     checkFileExistence,
     validateExistenceRules,
     validateQueueContent,
     determineExploitationDecision
-  )(() => createPaths(vulnType, sourceDir))) as ExploitationDecision;
+  );
+}
 
 // Pure function to safely validate (returns result instead of throwing)
 export const safeValidateQueueAndDeliverable = async (
||||
|
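The rewritten pipeline passes an initial value followed by a list of steps to `asyncPipe`, imported from `./utils/functional.js`. That utility's source is not part of this excerpt, so the following is only a guess at a compatible implementation, named `asyncPipeSketch` to make clear it is hypothetical. It threads a value through each step and awaits any step that returns a promise.

```typescript
// A step may be synchronous or asynchronous; input types are left loose, as in the removed pipe helper.
type Step = (value: any) => unknown | Promise<unknown>;

// Hypothetical stand-in for asyncPipe: initial value first, then the steps in order.
async function asyncPipeSketch<T>(initial: unknown, ...steps: Step[]): Promise<T> {
  let current: unknown = initial;
  for (const step of steps) {
    current = await step(current);
  }
  return current as T;
}

// Mixed sync and async steps, applied left to right.
const parse = (s: string): number => Number.parseInt(s, 10);
const double = async (n: number): Promise<number> => n * 2;
const label = (n: number): string => `value=${n}`;

const pipeline: Promise<string> = asyncPipeSketch<string>('21', parse, double, label);
```

Taking the initial value as the first argument removes the need for the thunk that the old `pipe(...)(() => createPaths(...))` call passed through the chain.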
||||
+20
-6
@@ -106,10 +106,24 @@ export const getParallelGroups = (): Readonly<{ vuln: AgentName[]; exploit: Agen
|
||||
exploit: ['injection-exploit', 'xss-exploit', 'auth-exploit', 'ssrf-exploit', 'authz-exploit']
|
||||
});
|
||||
|
||||
// Generate a session-based log folder path (used by claude-executor.ts)
|
||||
export const generateSessionLogPath = (webUrl: string, sessionId: string): string => {
|
||||
const hostname = new URL(webUrl).hostname.replace(/[^a-zA-Z0-9-]/g, '-');
|
||||
const sessionFolderName = `${hostname}_${sessionId}`;
|
||||
return path.join(process.cwd(), 'agent-logs', sessionFolderName);
|
||||
};
|
||||
// Phase names for metrics aggregation
|
||||
export type PhaseName = 'pre-recon' | 'recon' | 'vulnerability-analysis' | 'exploitation' | 'reporting';
|
||||
|
||||
// Map agents to their corresponding phases (single source of truth)
|
||||
export const AGENT_PHASE_MAP: Readonly<Record<AgentName, PhaseName>> = Object.freeze({
|
||||
'pre-recon': 'pre-recon',
|
||||
'recon': 'recon',
|
||||
'injection-vuln': 'vulnerability-analysis',
|
||||
'xss-vuln': 'vulnerability-analysis',
|
||||
'auth-vuln': 'vulnerability-analysis',
|
||||
'authz-vuln': 'vulnerability-analysis',
|
||||
'ssrf-vuln': 'vulnerability-analysis',
|
||||
'injection-exploit': 'exploitation',
|
||||
'xss-exploit': 'exploitation',
|
||||
'auth-exploit': 'exploitation',
|
||||
'authz-exploit': 'exploitation',
|
||||
'ssrf-exploit': 'exploitation',
|
||||
'report': 'reporting',
|
||||
});
|
||||
|
||||
|
||||
|
||||
-897
@@ -1,897 +0,0 @@
|
||||
#!/usr/bin/env node
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
import { path, fs, $ } from 'zx';
|
||||
import chalk, { type ChalkInstance } from 'chalk';
|
||||
import dotenv from 'dotenv';
|
||||
|
||||
dotenv.config();
|
||||
|
||||
// Config and Tools
|
||||
import { parseConfig, distributeConfig } from './config-parser.js';
|
||||
import { checkToolAvailability, handleMissingTools } from './tool-checker.js';
|
||||
|
||||
// Session
|
||||
import { AGENTS, getParallelGroups } from './session-manager.js';
|
||||
import type { AgentName, PromptName } from './types/index.js';
|
||||
|
||||
// Setup and Deliverables
|
||||
import { setupLocalRepo } from './setup/environment.js';
|
||||
|
||||
// AI and Prompts
|
||||
import { runClaudePromptWithRetry } from './ai/claude-executor.js';
|
||||
import { loadPrompt } from './prompts/prompt-manager.js';
|
||||
|
||||
// Phases
|
||||
import { executePreReconPhase } from './phases/pre-recon.js';
|
||||
import { assembleFinalReport } from './phases/reporting.js';
|
||||
|
||||
// Utils
|
||||
import { timingResults, displayTimingSummary, Timer } from './utils/metrics.js';
|
||||
import { formatDuration, generateAuditPath } from './audit/utils.js';
|
||||
import type { SessionMetadata } from './audit/utils.js';
|
||||
import { AuditSession } from './audit/audit-session.js';
|
||||
|
||||
// CLI
|
||||
import { showHelp, displaySplashScreen } from './cli/ui.js';
|
||||
import { validateWebUrl, validateRepoPath } from './cli/input-validator.js';
|
||||
|
||||
// Error Handling
|
||||
import { PentestError, logError } from './error-handling.js';
|
||||
|
||||
import type { DistributedConfig } from './types/config.js';
|
||||
import type { ToolAvailability } from './tool-checker.js';
|
||||
import { safeValidateQueueAndDeliverable } from './queue-validation.js';
|
||||
|
||||
// Extend global namespace for SHANNON_DISABLE_LOADER
|
||||
declare global {
|
||||
var SHANNON_DISABLE_LOADER: boolean | undefined;
|
||||
}
|
||||
|
||||
// Session Lock File Management
|
||||
const STORE_PATH = path.join(process.cwd(), '.shannon-store.json');
|
||||
|
||||
interface Session {
|
||||
id: string;
|
||||
webUrl: string;
|
||||
repoPath: string;
|
||||
status: 'in-progress' | 'completed' | 'failed';
|
||||
startedAt: string;
|
||||
}
|
||||
|
||||
interface SessionStore {
|
||||
sessions: Session[];
|
||||
}
|
||||
|
||||
function generateSessionId(): string {
|
||||
return crypto.randomUUID();
|
||||
}
|
||||
|
||||
async function loadSessions(): Promise<SessionStore> {
|
||||
try {
|
||||
if (await fs.pathExists(STORE_PATH)) {
|
||||
return await fs.readJson(STORE_PATH) as SessionStore;
|
||||
}
|
||||
} catch {
|
||||
// Corrupted file, start fresh
|
||||
}
|
||||
return { sessions: [] };
|
||||
}
|
||||
|
||||
async function saveSessions(store: SessionStore): Promise<void> {
|
||||
await fs.writeJson(STORE_PATH, store, { spaces: 2 });
|
||||
}
|
||||
|
||||
async function createSession(webUrl: string, repoPath: string): Promise<Session> {
|
||||
const store = await loadSessions();
|
||||
|
||||
// Check for existing in-progress session
|
||||
const existing = store.sessions.find(
|
||||
s => s.repoPath === repoPath && s.status === 'in-progress'
|
||||
);
|
||||
if (existing) {
|
||||
throw new PentestError(
|
||||
`Session already in progress for ${repoPath}`,
|
||||
'validation',
|
||||
false,
|
||||
{ sessionId: existing.id }
|
||||
);
|
||||
}
|
||||
|
||||
const session: Session = {
|
||||
id: generateSessionId(),
|
||||
webUrl,
|
||||
repoPath,
|
||||
status: 'in-progress',
|
||||
startedAt: new Date().toISOString()
|
||||
};
|
||||
|
||||
store.sessions.push(session);
|
||||
await saveSessions(store);
|
||||
return session;
|
||||
}
|
||||
|
||||
async function updateSessionStatus(
|
||||
sessionId: string,
|
||||
status: 'in-progress' | 'completed' | 'failed'
|
||||
): Promise<void> {
|
||||
const store = await loadSessions();
|
||||
const session = store.sessions.find(s => s.id === sessionId);
|
||||
if (session) {
|
||||
session.status = status;
|
||||
await saveSessions(store);
|
||||
}
|
||||
}
|
||||
|
||||
interface PromptVariables {
|
||||
webUrl: string;
|
||||
repoPath: string;
|
||||
sourceDir: string;
|
||||
}
|
||||
|
||||
interface MainResult {
|
||||
reportPath: string;
|
||||
auditLogsPath: string;
|
||||
}
|
||||
|
||||
interface AgentResult {
|
||||
success: boolean;
|
||||
duration: number;
|
||||
cost?: number;
|
||||
error?: string;
|
||||
retryable?: boolean;
|
||||
}
|
||||
|
||||
interface ParallelAgentResult {
|
||||
agentName: AgentName;
|
||||
success: boolean;
|
||||
timing?: number | undefined;
|
||||
cost?: number | undefined;
|
||||
attempts: number;
|
||||
error?: string | undefined;
|
||||
}
|
||||
|
||||
// Configure zx to disable timeouts (let tools run as long as needed)
|
||||
$.timeout = 0;
|
||||
|
||||
// Helper function to get prompt name from agent name
|
||||
const getPromptName = (agentName: AgentName): PromptName => {
|
||||
const mappings: Record<AgentName, PromptName> = {
|
||||
'pre-recon': 'pre-recon-code',
|
||||
'recon': 'recon',
|
||||
'injection-vuln': 'vuln-injection',
|
||||
'xss-vuln': 'vuln-xss',
|
||||
'auth-vuln': 'vuln-auth',
|
||||
'ssrf-vuln': 'vuln-ssrf',
|
||||
'authz-vuln': 'vuln-authz',
|
||||
'injection-exploit': 'exploit-injection',
|
||||
'xss-exploit': 'exploit-xss',
|
||||
'auth-exploit': 'exploit-auth',
|
||||
'ssrf-exploit': 'exploit-ssrf',
|
||||
'authz-exploit': 'exploit-authz',
|
||||
'report': 'report-executive'
|
||||
};
|
||||
|
||||
return mappings[agentName] || agentName as PromptName;
|
||||
};
|
||||
|
||||
// Get color function for agent
|
||||
const getAgentColor = (agentName: AgentName): ChalkInstance => {
|
||||
const colorMap: Partial<Record<AgentName, ChalkInstance>> = {
|
||||
'injection-vuln': chalk.red,
|
||||
'injection-exploit': chalk.red,
|
||||
'xss-vuln': chalk.yellow,
|
||||
'xss-exploit': chalk.yellow,
|
||||
'auth-vuln': chalk.blue,
|
||||
'auth-exploit': chalk.blue,
|
||||
'ssrf-vuln': chalk.magenta,
|
||||
'ssrf-exploit': chalk.magenta,
|
||||
'authz-vuln': chalk.green,
|
||||
'authz-exploit': chalk.green
|
||||
};
|
||||
return colorMap[agentName] || chalk.cyan;
|
||||
};
|
||||
|
||||
/**
|
||||
* Consolidate deliverables from target repo into the session folder
|
||||
*/
|
||||
async function consolidateOutputs(sourceDir: string, sessionPath: string): Promise<void> {
|
||||
const srcDeliverables = path.join(sourceDir, 'deliverables');
|
||||
const destDeliverables = path.join(sessionPath, 'deliverables');
|
||||
|
||||
try {
|
||||
if (await fs.pathExists(srcDeliverables)) {
|
||||
await fs.copy(srcDeliverables, destDeliverables, { overwrite: true });
|
||||
console.log(chalk.gray(`📄 Deliverables copied to session folder`));
|
||||
} else {
|
||||
console.log(chalk.yellow(`⚠️ No deliverables directory found at ${srcDeliverables}`));
|
||||
}
|
||||
} catch (error) {
|
||||
const err = error as Error;
|
||||
console.log(chalk.yellow(`⚠️ Failed to consolidate deliverables: ${err.message}`));
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Run a single agent
|
||||
*/
|
||||
async function runAgent(
|
||||
agentName: AgentName,
|
||||
sourceDir: string,
|
||||
variables: PromptVariables,
|
||||
distributedConfig: DistributedConfig | null,
|
||||
pipelineTestingMode: boolean,
|
||||
sessionMetadata: SessionMetadata
|
||||
): Promise<AgentResult> {
|
||||
const agent = AGENTS[agentName];
|
||||
const promptName = getPromptName(agentName);
|
||||
const prompt = await loadPrompt(promptName, variables, distributedConfig, pipelineTestingMode);
|
||||
|
||||
return await runClaudePromptWithRetry(
|
||||
prompt,
|
||||
sourceDir,
|
||||
'*',
|
||||
'',
|
||||
agent.displayName,
|
||||
agentName,
|
||||
getAgentColor(agentName),
|
||||
sessionMetadata
|
||||
);
|
||||
}
|
||||
|
||||
/**
|
||||
* Run vulnerability agents in parallel
|
||||
*/
|
||||
async function runParallelVuln(
|
||||
sourceDir: string,
|
||||
variables: PromptVariables,
|
||||
distributedConfig: DistributedConfig | null,
|
||||
pipelineTestingMode: boolean,
|
||||
sessionMetadata: SessionMetadata
|
||||
): Promise<ParallelAgentResult[]> {
|
||||
const { vuln: vulnAgents } = getParallelGroups();
|
||||
|
||||
console.log(chalk.cyan(`\nStarting ${vulnAgents.length} vulnerability analysis specialists in parallel...`));
|
||||
console.log(chalk.gray(' Specialists: ' + vulnAgents.join(', ')));
|
||||
console.log();
|
||||
|
||||
const startTime = Date.now();
|
||||
|
||||
const results = await Promise.allSettled(
|
||||
vulnAgents.map(async (agentName, index) => {
|
||||
// Add 2-second stagger to prevent API overwhelm
|
||||
await new Promise(resolve => setTimeout(resolve, index * 2000));
|
||||
|
||||
let lastError: Error | undefined;
|
||||
let attempts = 0;
|
||||
const maxAttempts = 3;
|
||||
|
||||
while (attempts < maxAttempts) {
|
||||
attempts++;
|
||||
try {
|
||||
const result = await runAgent(
|
||||
agentName,
|
||||
sourceDir,
|
||||
variables,
|
||||
distributedConfig,
|
||||
pipelineTestingMode,
|
||||
sessionMetadata
|
||||
);
|
||||
|
||||
// Validate vulnerability analysis results
|
||||
const vulnType = agentName.replace('-vuln', '');
|
||||
try {
|
||||
const validation = await safeValidateQueueAndDeliverable(vulnType as 'injection' | 'xss' | 'auth' | 'ssrf' | 'authz', sourceDir);
|
||||
|
||||
if (validation.success && validation.data) {
|
||||
console.log(chalk.blue(`${agentName}: ${validation.data.shouldExploit ? `Ready for exploitation (${validation.data.vulnerabilityCount} vulnerabilities)` : 'No vulnerabilities found'}`));
|
||||
}
|
||||
} catch {
|
||||
// Validation failure is non-critical
|
||||
}
|
||||
|
||||
return {
|
||||
agentName,
|
||||
success: result.success,
|
||||
timing: result.duration,
|
||||
cost: result.cost,
|
||||
attempts
|
||||
};
|
||||
} catch (error) {
|
||||
lastError = error as Error;
|
||||
if (attempts < maxAttempts) {
|
||||
console.log(chalk.yellow(`Warning: ${agentName} failed attempt ${attempts}/${maxAttempts}, retrying...`));
|
||||
await new Promise(resolve => setTimeout(resolve, 5000));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return {
|
||||
agentName,
|
||||
success: false,
|
||||
attempts,
|
||||
error: lastError?.message || 'Unknown error'
|
||||
};
|
||||
})
|
||||
);
|
||||
|
||||
const totalDuration = Date.now() - startTime;
|
||||
|
||||
// Process and display results
|
||||
console.log(chalk.cyan('\nVulnerability Analysis Results'));
|
||||
console.log(chalk.gray('-'.repeat(80)));
|
||||
console.log(chalk.bold('Agent Status Attempt Duration Cost'));
|
||||
console.log(chalk.gray('-'.repeat(80)));
|
||||
|
||||
const processedResults: ParallelAgentResult[] = [];
|
||||
|
||||
results.forEach((result, index) => {
|
||||
const agentName = vulnAgents[index]!;
|
||||
const agentDisplay = agentName.padEnd(22);
|
||||
|
||||
if (result.status === 'fulfilled') {
|
||||
const data = result.value;
|
||||
processedResults.push(data);
|
||||
|
||||
if (data.success) {
|
||||
const duration = formatDuration(data.timing || 0);
|
||||
const cost = `$${(data.cost || 0).toFixed(4)}`;
|
||||
|
||||
console.log(
|
||||
`${chalk.green(agentDisplay)} ${chalk.green('Success')} ` +
|
||||
`${data.attempts}/3 ${duration.padEnd(11)} ${cost}`
|
||||
);
|
||||
} else {
|
||||
console.log(
|
||||
`${chalk.red(agentDisplay)} ${chalk.red('Failed ')} ` +
|
||||
`${data.attempts}/3 - -`
|
||||
);
|
||||
if (data.error) {
|
||||
console.log(chalk.gray(` Error: ${data.error.substring(0, 60)}...`));
|
||||
}
|
||||
}
|
||||
} else {
|
||||
processedResults.push({
|
||||
agentName,
|
||||
success: false,
|
||||
attempts: 3,
|
||||
error: String(result.reason)
|
||||
});
|
||||
|
||||
console.log(
|
||||
`${chalk.red(agentDisplay)} ${chalk.red('Failed ')} ` +
|
||||
`3/3 - -`
|
||||
);
|
||||
}
|
||||
});
|
||||
|
||||
console.log(chalk.gray('-'.repeat(80)));
|
||||
const successCount = processedResults.filter(r => r.success).length;
|
||||
console.log(chalk.cyan(`Summary: ${successCount}/${vulnAgents.length} succeeded in ${formatDuration(totalDuration)}`));
|
||||
|
||||
return processedResults;
|
||||
}
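The retry loop inside `runParallelVuln` (up to three attempts, with a fixed pause between failures) could be factored into a small reusable helper. The sketch below is only an illustration: `runWithRetry`, `Attempted`, and the injectable `wait` parameter are hypothetical names that do not exist in the Shannon codebase.

```typescript
// Hypothetical helper, not part of Shannon: bounded retry with a fixed delay.
type Attempted<T> =
  | { ok: true; value: T; attempts: number }
  | { ok: false; error: string; attempts: number };

const sleep = (ms: number): Promise<void> =>
  new Promise(resolve => setTimeout(resolve, ms));

async function runWithRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  retryDelayMs = 5000,
  wait: (ms: number) => Promise<void> = sleep // injectable so tests need not sleep
): Promise<Attempted<T>> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { ok: true, value: await task(), attempts: attempt };
    } catch (error) {
      lastError = error;
      // Pause only when another attempt will follow
      if (attempt < maxAttempts) {
        await wait(retryDelayMs);
      }
    }
  }
  const message = lastError instanceof Error ? lastError.message : String(lastError);
  return { ok: false, error: message, attempts: maxAttempts };
}
```

A caller would wrap each agent invocation, for example `runWithRetry(() => runAgent(...))`, and map the returned object onto a `ParallelAgentResult`.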
|
||||
|
||||
/**
|
||||
* Run exploitation agents in parallel
|
||||
*/
|
||||
async function runParallelExploit(
|
||||
sourceDir: string,
|
||||
variables: PromptVariables,
|
||||
distributedConfig: DistributedConfig | null,
|
||||
pipelineTestingMode: boolean,
|
||||
sessionMetadata: SessionMetadata
|
||||
): Promise<ParallelAgentResult[]> {
|
||||
const { exploit: exploitAgents, vuln: vulnAgents } = getParallelGroups();
|
||||
|
||||
// safeValidateQueueAndDeliverable is already imported statically at the top of this module
|
||||
|
||||
// Check eligibility
|
||||
const eligibilityChecks = await Promise.all(
|
||||
exploitAgents.map(async (agentName) => {
|
||||
const vulnAgentName = agentName.replace('-exploit', '-vuln') as AgentName;
|
||||
const vulnType = vulnAgentName.replace('-vuln', '') as 'injection' | 'xss' | 'auth' | 'ssrf' | 'authz';
|
||||
|
||||
const validation = await safeValidateQueueAndDeliverable(vulnType, sourceDir);
|
||||
|
||||
if (!validation.success || !validation.data?.shouldExploit) {
|
||||
console.log(chalk.gray(`Skipping ${agentName} (no vulnerabilities found in ${vulnAgentName})`));
|
||||
return { agentName, eligible: false };
|
||||
}
|
||||
|
||||
console.log(chalk.blue(`${agentName} eligible (${validation.data.vulnerabilityCount} vulnerabilities from ${vulnAgentName})`));
|
||||
return { agentName, eligible: true };
|
||||
})
|
||||
);
|
||||
|
||||
const eligibleAgents = eligibilityChecks
|
||||
.filter(check => check.eligible)
|
||||
.map(check => check.agentName);
|
||||
|
||||
if (eligibleAgents.length === 0) {
|
||||
console.log(chalk.gray('No exploitation agents eligible (no vulnerabilities found)'));
|
||||
return [];
|
||||
}
|
||||
|
||||
console.log(chalk.cyan(`\nStarting ${eligibleAgents.length} exploitation specialists in parallel...`));
|
||||
console.log(chalk.gray(' Specialists: ' + eligibleAgents.join(', ')));
|
||||
console.log();
|
||||
|
||||
const startTime = Date.now();
|
||||
|
||||
const results = await Promise.allSettled(
|
||||
eligibleAgents.map(async (agentName, index) => {
|
||||
await new Promise(resolve => setTimeout(resolve, index * 2000));
|
||||
|
||||
let lastError: Error | undefined;
|
||||
let attempts = 0;
|
||||
const maxAttempts = 3;
|
||||
|
||||
while (attempts < maxAttempts) {
|
||||
attempts++;
|
||||
try {
|
||||
const result = await runAgent(
|
||||
agentName,
|
||||
sourceDir,
|
||||
variables,
|
||||
distributedConfig,
|
||||
pipelineTestingMode,
|
||||
sessionMetadata
|
||||
);
|
||||
|
||||
return {
|
||||
agentName,
|
||||
success: result.success,
|
||||
timing: result.duration,
|
||||
cost: result.cost,
|
||||
attempts
|
||||
};
|
||||
} catch (error) {
|
||||
lastError = error as Error;
|
||||
if (attempts < maxAttempts) {
|
||||
console.log(chalk.yellow(`Warning: ${agentName} failed attempt ${attempts}/${maxAttempts}, retrying...`));
|
||||
await new Promise(resolve => setTimeout(resolve, 5000));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return {
|
||||
agentName,
|
||||
success: false,
|
||||
attempts,
|
||||
error: lastError?.message || 'Unknown error'
|
||||
};
|
||||
})
|
||||
);
|
||||
|
||||
const totalDuration = Date.now() - startTime;
|
||||
|
||||
// Process and display results
|
||||
console.log(chalk.cyan('\nExploitation Results'));
|
||||
console.log(chalk.gray('-'.repeat(80)));
|
||||
console.log(chalk.bold('Agent Status Attempt Duration Cost'));
|
||||
console.log(chalk.gray('-'.repeat(80)));
|
||||
|
||||
const processedResults: ParallelAgentResult[] = [];
|
||||
|
||||
results.forEach((result, index) => {
|
||||
const agentName = eligibleAgents[index]!;
|
||||
const agentDisplay = agentName.padEnd(22);
|
||||
|
||||
if (result.status === 'fulfilled') {
|
||||
const data = result.value;
|
||||
processedResults.push(data);
|
||||
|
||||
if (data.success) {
|
||||
const duration = formatDuration(data.timing || 0);
|
||||
const cost = `$${(data.cost || 0).toFixed(4)}`;
|
||||
|
||||
console.log(
|
||||
`${chalk.green(agentDisplay)} ${chalk.green('Success')} ` +
|
||||
`${data.attempts}/3 ${duration.padEnd(11)} ${cost}`
|
||||
);
|
||||
} else {
|
||||
console.log(
|
||||
`${chalk.red(agentDisplay)} ${chalk.red('Failed ')} ` +
|
||||
`${data.attempts}/3 - -`
|
||||
);
|
||||
if (data.error) {
|
||||
console.log(chalk.gray(` Error: ${data.error.substring(0, 60)}...`));
|
||||
}
|
||||
}
|
||||
} else {
|
||||
processedResults.push({
|
||||
agentName,
|
||||
success: false,
|
||||
attempts: 3,
|
||||
error: String(result.reason)
|
||||
});
|
||||
|
||||
console.log(
|
||||
`${chalk.red(agentDisplay)} ${chalk.red('Failed ')} ` +
|
||||
`3/3 - -`
|
||||
);
|
||||
}
|
||||
});
|
||||
|
||||
console.log(chalk.gray('-'.repeat(80)));
|
||||
const successCount = processedResults.filter(r => r.success).length;
|
||||
console.log(chalk.cyan(`Summary: ${successCount}/${eligibleAgents.length} succeeded in ${formatDuration(totalDuration)}`));
|
||||
|
||||
return processedResults;
|
||||
}
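The `Promise.allSettled` post-processing above is repeated in both parallel runners. A minimal sketch of a shared normalizer follows, assuming a simplified row type: `AgentRow` and `normalizeSettled` are hypothetical names, and the table printing is deliberately left out.

```typescript
// Hypothetical sketch: turn settled promises into uniform result rows.
interface AgentRow {
  agentName: string;
  success: boolean;
  attempts: number;
  error?: string;
}

function normalizeSettled(
  names: readonly string[],
  results: readonly PromiseSettledResult<AgentRow>[],
  maxAttempts = 3
): AgentRow[] {
  return results.map((result, index) => {
    if (result.status === 'fulfilled') {
      return result.value;
    }
    // A rejection carries no attempt count, so report the maximum,
    // mirroring the hard-coded `attempts: 3` in the code above.
    return {
      agentName: names[index] ?? 'unknown',
      success: false,
      attempts: maxAttempts,
      error: String(result.reason),
    };
  });
}
```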
|
||||
|
||||
// Setup graceful cleanup on process signals
|
||||
process.on('SIGINT', async () => {
|
||||
console.log(chalk.yellow('\n⚠️ Received SIGINT, cleaning up...'));
|
||||
|
||||
process.exit(0);
|
||||
});
|
||||
|
||||
process.on('SIGTERM', async () => {
|
||||
console.log(chalk.yellow('\n⚠️ Received SIGTERM, cleaning up...'));
|
||||
|
||||
process.exit(0);
|
||||
});
|
||||
|
||||
// Main orchestration function
|
||||
async function main(
|
||||
webUrl: string,
|
||||
repoPath: string,
|
||||
configPath: string | null = null,
|
||||
pipelineTestingMode: boolean = false,
|
||||
disableLoader: boolean = false,
|
||||
outputPath: string | null = null
|
||||
): Promise<MainResult> {
|
||||
// Set global flag for loader control
|
||||
global.SHANNON_DISABLE_LOADER = disableLoader;
|
||||
|
||||
const totalTimer = new Timer('total-execution');
|
||||
timingResults.total = totalTimer;
|
||||
|
||||
// Display splash screen
|
||||
await displaySplashScreen();
|
||||
|
||||
console.log(chalk.cyan.bold('🚀 AI PENETRATION TESTING AGENT'));
|
||||
console.log(chalk.cyan(`🎯 Target: ${webUrl}`));
|
||||
console.log(chalk.cyan(`📁 Source: ${repoPath}`));
|
||||
if (configPath) {
|
||||
console.log(chalk.cyan(`⚙️ Config: ${configPath}`));
|
||||
}
|
||||
if (outputPath) {
|
||||
console.log(chalk.cyan(`📂 Output: ${outputPath}`));
|
||||
}
|
||||
console.log(chalk.gray('─'.repeat(60)));
|
||||
|
||||
// Parse configuration if provided
|
||||
let distributedConfig: DistributedConfig | null = null;
|
||||
if (configPath) {
|
||||
try {
|
||||
// Resolve config path - check configs folder if relative path
|
||||
let resolvedConfigPath = configPath;
|
||||
if (!path.isAbsolute(configPath)) {
|
||||
const configsDir = path.join(process.cwd(), 'configs');
|
||||
const configInConfigsDir = path.join(configsDir, configPath);
|
||||
// Check if file exists in configs directory, otherwise use original path
|
||||
if (await fs.pathExists(configInConfigsDir)) {
|
||||
resolvedConfigPath = configInConfigsDir;
|
||||
}
|
||||
}
|
||||
|
||||
const config = await parseConfig(resolvedConfigPath);
|
||||
distributedConfig = distributeConfig(config);
|
||||
console.log(chalk.green(`✅ Configuration loaded successfully`));
|
||||
} catch (error) {
|
||||
await logError(error as Error, `Configuration loading from ${configPath}`);
|
||||
throw error; // Let the main error boundary handle it
|
||||
}
|
||||
}
|
||||
|
||||
// Check tool availability
|
||||
const toolAvailability: ToolAvailability = await checkToolAvailability();
|
||||
handleMissingTools(toolAvailability);
|
||||
|
||||
// Setup local repository
|
||||
console.log(chalk.blue('📁 Setting up local repository...'));
|
||||
let sourceDir: string;
|
||||
try {
|
||||
sourceDir = await setupLocalRepo(repoPath);
|
||||
console.log(chalk.green('✅ Local repository setup successfully'));
|
||||
} catch (error) {
|
||||
const err = error as Error;
|
||||
console.log(chalk.red(`❌ Failed to setup local repository: ${err.message}`));
|
||||
console.log(chalk.gray('This could be due to:'));
|
||||
console.log(chalk.gray(' - Insufficient permissions'));
|
||||
console.log(chalk.gray(' - Repository path not accessible'));
|
||||
console.log(chalk.gray(' - Git initialization issues'));
|
||||
console.log(chalk.gray(' - Insufficient disk space'));
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const variables: PromptVariables = { webUrl, repoPath, sourceDir };
|
||||
|
||||
// Create session (acts as lock file)
|
||||
const session: Session = await createSession(webUrl, repoPath);
|
||||
console.log(chalk.blue(`Session created: ${session.id.substring(0, 8)}...`));
|
||||
|
||||
// Session metadata for audit logging
|
||||
const sessionMetadata: SessionMetadata = {
|
||||
id: session.id,
|
||||
webUrl,
|
||||
repoPath: sourceDir,
|
||||
...(outputPath && { outputPath })
|
||||
};
|
||||
|
||||
// Create outputs directory in source directory
|
||||
try {
|
||||
const outputsDir = path.join(sourceDir, 'outputs');
|
||||
await fs.ensureDir(outputsDir);
|
||||
await fs.ensureDir(path.join(outputsDir, 'schemas'));
|
||||
await fs.ensureDir(path.join(outputsDir, 'scans'));
|
||||
} catch (error) {
|
||||
const err = error as Error;
|
||||
throw new PentestError(
|
||||
`Failed to create output directories: ${err.message}`,
|
||||
'filesystem',
|
||||
false,
|
||||
{ sourceDir, originalError: err.message }
|
||||
);
|
||||
}
|
||||
|
||||
try {
|
||||
// PHASE 1: PRE-RECONNAISSANCE
|
||||
const { duration: preReconDuration } = await executePreReconPhase(
|
||||
webUrl,
|
||||
sourceDir,
|
||||
variables,
|
||||
distributedConfig,
|
||||
toolAvailability,
|
||||
pipelineTestingMode,
|
||||
session.id,
|
||||
outputPath
|
||||
);
|
||||
console.log(chalk.green(`Pre-reconnaissance complete in ${formatDuration(preReconDuration)}`));
|
||||
|
||||
// PHASE 2: RECONNAISSANCE
|
||||
console.log(chalk.magenta.bold('\n🔎 PHASE 2: RECONNAISSANCE'));
|
||||
console.log(chalk.magenta('Analyzing initial findings...'));
|
||||
const reconTimer = new Timer('phase-2-recon');
|
||||
|
||||
await runAgent(
|
||||
'recon',
|
||||
sourceDir,
|
||||
variables,
|
||||
distributedConfig,
|
||||
pipelineTestingMode,
|
||||
sessionMetadata
|
||||
);
|
||||
const reconDuration = reconTimer.stop();
|
||||
console.log(chalk.green(`✅ Reconnaissance complete in ${formatDuration(reconDuration)}`));
|
||||
|
||||
// PHASE 3: VULNERABILITY ANALYSIS
|
||||
const vulnTimer = new Timer('phase-3-vulnerability-analysis');
|
||||
console.log(chalk.red.bold('\n🚨 PHASE 3: VULNERABILITY ANALYSIS'));
|
||||
|
||||
const vulnResults = await runParallelVuln(
|
||||
sourceDir,
|
||||
variables,
|
||||
distributedConfig,
|
||||
pipelineTestingMode,
|
||||
sessionMetadata
|
||||
);
|
||||
|
||||
const vulnDuration = vulnTimer.stop();
|
||||
console.log(chalk.green(`✅ Vulnerability analysis phase complete in ${formatDuration(vulnDuration)}`));
|
||||
|
||||
// PHASE 4: EXPLOITATION
|
||||
const exploitTimer = new Timer('phase-4-exploitation');
|
||||
console.log(chalk.red.bold('\n💥 PHASE 4: EXPLOITATION'));
|
||||
|
||||
const exploitResults = await runParallelExploit(
|
||||
sourceDir,
|
||||
variables,
|
||||
distributedConfig,
|
||||
pipelineTestingMode,
|
||||
sessionMetadata
|
||||
);
|
||||
|
||||
const exploitDuration = exploitTimer.stop();
|
||||
console.log(chalk.green(`✅ Exploitation phase complete in ${formatDuration(exploitDuration)}`));
|
||||
|
||||
// PHASE 5: REPORTING
|
||||
console.log(chalk.greenBright.bold('\n📊 PHASE 5: REPORTING'));
|
||||
console.log(chalk.greenBright('Generating executive summary and assembling final report...'));
|
||||
const reportTimer = new Timer('phase-5-reporting');
|
||||
|
||||
// Assemble all deliverables into a single concatenated report
|
||||
console.log(chalk.blue('📝 Assembling deliverables from specialist agents...'));
|
||||
try {
|
||||
await assembleFinalReport(sourceDir);
|
||||
} catch (error) {
|
||||
const err = error as Error;
|
||||
console.log(chalk.red(`❌ Error assembling final report: ${err.message}`));
|
||||
}
|
||||
|
||||
// Run reporter agent to create executive summary
|
||||
console.log(chalk.blue('Generating executive summary and cleaning up report...'));
|
||||
await runAgent(
|
||||
'report',
|
||||
sourceDir,
|
||||
variables,
|
||||
distributedConfig,
|
||||
pipelineTestingMode,
|
||||
sessionMetadata
|
||||
);
|
||||
|
||||
const reportDuration = reportTimer.stop();
|
||||
console.log(chalk.green(`✅ Final report generated in ${formatDuration(reportDuration)}`));
|
||||
|
||||
// Calculate final timing
|
||||
timingResults.total.stop();
|
||||
|
||||
// Mark session as completed in both stores
|
||||
await updateSessionStatus(session.id, 'completed');
|
||||
|
||||
// Update audit system's session.json status
|
||||
const auditSession = new AuditSession(sessionMetadata);
|
||||
await auditSession.updateSessionStatus('completed');
|
||||
|
||||
// Display comprehensive timing summary
|
||||
displayTimingSummary();
|
||||
|
||||
console.log(chalk.cyan.bold('\n🎉 PENETRATION TESTING COMPLETE!'));
|
||||
console.log(chalk.gray('─'.repeat(60)));
|
||||
|
||||
// Calculate audit logs path
|
||||
const auditLogsPath = generateAuditPath(sessionMetadata);
|
||||
|
||||
// Consolidate deliverables into the session folder
|
||||
await consolidateOutputs(sourceDir, auditLogsPath);
|
||||
console.log(chalk.green(`\n📂 All outputs consolidated: ${auditLogsPath}`));
|
||||
|
||||
return {
|
||||
reportPath: path.join(sourceDir, 'deliverables', 'comprehensive_security_assessment_report.md'),
|
||||
auditLogsPath
|
||||
};
|
||||
|
||||
} catch (error) {
|
||||
// Mark session as failed in both stores
|
||||
await updateSessionStatus(session.id, 'failed');
|
||||
|
||||
// Update audit system's session.json status
|
||||
const auditSession = new AuditSession(sessionMetadata);
|
||||
await auditSession.updateSessionStatus('failed');
|
||||
|
||||
throw error;
|
||||
}
|
||||
}
|
||||
|
||||
// Entry point - handle both direct node execution and shebang execution
|
||||
let args = process.argv.slice(2);
|
||||
// If first arg is the script name (from shebang), remove it
|
||||
if (args[0] && args[0].includes('shannon')) {
|
||||
args = args.slice(1);
|
||||
}
|
||||
|
||||
// Parse flags and arguments
|
||||
let configPath: string | null = null;
|
||||
let outputPath: string | null = null;
|
||||
let pipelineTestingMode = false;
|
||||
let disableLoader = false;
|
||||
const nonFlagArgs: string[] = [];
|
||||
|
||||
for (let i = 0; i < args.length; i++) {
|
||||
if (args[i] === '--config') {
|
||||
if (i + 1 < args.length) {
|
||||
configPath = args[i + 1]!;
|
||||
i++; // Skip the next argument
|
||||
} else {
|
||||
console.log(chalk.red('❌ --config flag requires a file path'));
|
||||
process.exit(1);
|
||||
}
|
||||
} else if (args[i] === '--output') {
|
||||
if (i + 1 < args.length) {
|
||||
outputPath = path.resolve(args[i + 1]!);
|
||||
i++; // Skip the next argument
|
||||
} else {
|
||||
console.log(chalk.red('❌ --output flag requires a directory path'));
|
||||
process.exit(1);
|
||||
}
|
||||
} else if (args[i] === '--pipeline-testing') {
|
||||
pipelineTestingMode = true;
|
||||
} else if (args[i] === '--disable-loader') {
|
||||
disableLoader = true;
|
||||
} else if (!args[i]!.startsWith('-')) {
|
||||
nonFlagArgs.push(args[i]!);
|
||||
}
|
||||
}
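The flag loop above can also be expressed as a pure function, which makes it testable without touching `process.argv`. This is a sketch under stated assumptions: `CliOptions` and `parseCliArgs` are hypothetical names, the function throws instead of calling `process.exit`, and it does not resolve the output path with `path.resolve` as the original does.

```typescript
// Hypothetical sketch of the same flag handling as a pure function.
interface CliOptions {
  configPath: string | null;
  outputPath: string | null;
  pipelineTestingMode: boolean;
  disableLoader: boolean;
  positional: string[];
}

function parseCliArgs(args: readonly string[]): CliOptions {
  const options: CliOptions = {
    configPath: null,
    outputPath: null,
    pipelineTestingMode: false,
    disableLoader: false,
    positional: [],
  };
  for (let i = 0; i < args.length; i++) {
    const arg = args[i]!;
    if (arg === '--config' || arg === '--output') {
      const value = args[i + 1];
      if (value === undefined) {
        throw new Error(`${arg} flag requires a path`);
      }
      if (arg === '--config') {
        options.configPath = value;
      } else {
        options.outputPath = value;
      }
      i++; // Skip the value that was just consumed
    } else if (arg === '--pipeline-testing') {
      options.pipelineTestingMode = true;
    } else if (arg === '--disable-loader') {
      options.disableLoader = true;
    } else if (!arg.startsWith('-')) {
      options.positional.push(arg);
    }
  }
  return options;
}
```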
|
||||
|
||||
// Handle help flag
|
||||
if (args.includes('--help') || args.includes('-h') || args.includes('help')) {
|
||||
showHelp();
|
||||
process.exit(0);
|
||||
}
|
||||
|
||||
// Handle no arguments - show help
|
||||
if (nonFlagArgs.length === 0) {
|
||||
console.log(chalk.red.bold('❌ Error: No arguments provided\n'));
|
||||
showHelp();
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
// Handle insufficient arguments
|
||||
if (nonFlagArgs.length < 2) {
|
||||
console.log(chalk.red('❌ Both WEB_URL and REPO_PATH are required'));
|
||||
console.log(chalk.gray('Usage: shannon <WEB_URL> <REPO_PATH> [--config config.yaml]'));
|
||||
console.log(chalk.gray('Help: shannon --help'));
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const [webUrl, repoPath] = nonFlagArgs;
|
||||
|
||||
// Validate web URL
|
||||
const webUrlValidation = validateWebUrl(webUrl!);
|
||||
if (!webUrlValidation.valid) {
|
||||
console.log(chalk.red(`❌ Invalid web URL: ${webUrlValidation.error}`));
|
||||
console.log(chalk.gray(`Expected format: https://example.com`));
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
// Validate repository path
|
||||
const repoPathValidation = await validateRepoPath(repoPath!);
|
||||
if (!repoPathValidation.valid) {
|
||||
console.log(chalk.red(`❌ Invalid repository path: ${repoPathValidation.error}`));
|
||||
console.log(chalk.gray(`Expected: Accessible local directory path`));
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
// Success - show validated inputs
|
||||
console.log(chalk.green('✅ Input validation passed:'));
|
||||
console.log(chalk.gray(` Target Web URL: ${webUrl}`));
|
||||
console.log(chalk.gray(` Target Repository: ${repoPathValidation.path}\n`));
|
||||
if (configPath) {
  console.log(chalk.gray(` Config Path: ${configPath}\n`));
}
|
||||
if (outputPath) {
|
||||
console.log(chalk.gray(` Output Path: ${outputPath}\n`));
|
||||
}
|
||||
if (pipelineTestingMode) {
|
||||
console.log(chalk.yellow('⚡ PIPELINE TESTING MODE ENABLED - Using minimal test prompts for fast pipeline validation\n'));
|
||||
}
|
||||
if (disableLoader) {
|
||||
console.log(chalk.yellow('⚙️ LOADER DISABLED - Progress indicator will not be shown\n'));
|
||||
}
|
||||
|
||||
try {
|
||||
const result = await main(webUrl!, repoPathValidation.path!, configPath, pipelineTestingMode, disableLoader, outputPath);
|
||||
console.log(chalk.green.bold('\n📄 FINAL REPORT AVAILABLE:'));
|
||||
console.log(chalk.cyan(result.reportPath));
|
||||
console.log(chalk.green.bold('\n📂 AUDIT LOGS AVAILABLE:'));
|
||||
console.log(chalk.cyan(result.auditLogsPath));
|
||||
|
||||
} catch (error) {
|
||||
// Enhanced error boundary with proper logging
|
||||
if (error instanceof PentestError) {
|
||||
await logError(error, 'Main execution failed');
|
||||
console.log(chalk.red.bold('\n🚨 PENTEST EXECUTION FAILED'));
|
||||
console.log(chalk.red(` Type: ${error.type}`));
|
||||
console.log(chalk.red(` Retryable: ${error.retryable ? 'Yes' : 'No'}`));
|
||||
|
||||
if (error.retryable) {
|
||||
console.log(chalk.yellow(' Consider running the command again or checking network connectivity.'));
|
||||
}
|
||||
} else {
|
||||
const err = error as Error;
|
||||
console.log(chalk.red.bold('\n🚨 UNEXPECTED ERROR OCCURRED'));
|
||||
console.log(chalk.red(` Error: ${err?.message || err?.toString() || 'Unknown error'}`));
|
||||
|
||||
if (process.env.DEBUG) {
|
||||
console.log(chalk.gray(` Stack: ${err?.stack || 'No stack trace available'}`));
|
||||
}
|
||||
}
|
||||
|
||||
process.exit(1);
|
||||
}
|
||||
@@ -0,0 +1,469 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Temporal activities for Shannon agent execution.
 *
 * Each activity wraps a single agent execution with:
 * - Heartbeat loop (2s interval) to signal worker liveness
 * - Git checkpoint/rollback/commit per attempt
 * - Error classification for Temporal retry behavior
 * - Audit session logging
 *
 * Temporal handles retries based on error classification:
 * - Retryable: BillingError, TransientError (429, 5xx, network)
 * - Non-retryable: AuthenticationError, PermissionError, ConfigurationError, etc.
 */
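The retryable versus non-retryable split described in the header comment could be modeled roughly as below. Everything here is an assumption made for illustration: `classifyForRetry`, `RetryDecision`, the status-code mapping, and the non-retryable default are hypothetical. The real `classifyErrorForTemporal` in `error-handling.ts` is not shown on this page and may behave differently.

```typescript
// Hypothetical sketch: names and rules are illustrative assumptions only.
interface RetryDecision {
  type: string;
  retryable: boolean;
}

function classifyForRetry(status: number | undefined, message: string): RetryDecision {
  if (status === 401) {
    return { type: 'AuthenticationError', retryable: false };
  }
  if (status === 403) {
    return { type: 'PermissionError', retryable: false };
  }
  // Rate limits and server-side failures are worth retrying
  if (status === 429 || (status !== undefined && status >= 500 && status <= 599)) {
    return { type: 'TransientError', retryable: true };
  }
  // Common Node.js network error codes embedded in the message
  if (/ECONNRESET|ETIMEDOUT|ENOTFOUND/.test(message)) {
    return { type: 'TransientError', retryable: true };
  }
  // Assumption: anything unrecognized is treated as non-retryable
  return { type: 'UnknownError', retryable: false };
}
```

In a Temporal activity, a decision like this would typically be surfaced by throwing an `ApplicationFailure` whose `nonRetryable` flag is set accordingly.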
|
||||
|
||||
import { heartbeat, ApplicationFailure, Context } from '@temporalio/activity';
|
||||
import chalk from 'chalk';
|
||||
|
||||
// Max lengths to prevent Temporal protobuf buffer overflow
const MAX_ERROR_MESSAGE_LENGTH = 2000;
const MAX_STACK_TRACE_LENGTH = 1000;

// Max retries for output validation errors (agent didn't save deliverables)
// Lower than default 50 since this is unlikely to self-heal
const MAX_OUTPUT_VALIDATION_RETRIES = 3;

/**
 * Truncate error message to prevent buffer overflow in Temporal serialization.
 */
function truncateErrorMessage(message: string): string {
  if (message.length <= MAX_ERROR_MESSAGE_LENGTH) {
    return message;
  }
  return message.slice(0, MAX_ERROR_MESSAGE_LENGTH - 20) + '\n[truncated]';
}

/**
 * Truncate stack trace on an ApplicationFailure to prevent buffer overflow.
 */
function truncateStackTrace(failure: ApplicationFailure): void {
  if (failure.stack && failure.stack.length > MAX_STACK_TRACE_LENGTH) {
    // Reserve room for the marker so the result stays within the limit
    failure.stack = failure.stack.slice(0, MAX_STACK_TRACE_LENGTH - 20) + '\n[stack truncated]';
  }
}
|
||||
|
||||
import {
  runClaudePrompt,
  validateAgentOutput,
  type ClaudePromptResult,
} from '../ai/claude-executor.js';
import { loadPrompt } from '../prompts/prompt-manager.js';
import { parseConfig, distributeConfig } from '../config-parser.js';
import { classifyErrorForTemporal } from '../error-handling.js';
import {
  safeValidateQueueAndDeliverable,
  type VulnType,
  type ExploitationDecision,
} from '../queue-validation.js';
import {
  createGitCheckpoint,
  commitGitSuccess,
  rollbackGitWorkspace,
  getGitCommitHash,
} from '../utils/git-manager.js';
import { assembleFinalReport } from '../phases/reporting.js';
import { getPromptNameForAgent } from '../types/agents.js';
import { AuditSession } from '../audit/index.js';
import type { WorkflowSummary } from '../audit/workflow-logger.js';
import type { AgentName } from '../types/agents.js';
import type { AgentMetrics } from './shared.js';
import type { DistributedConfig } from '../types/config.js';
import type { SessionMetadata } from '../audit/utils.js';
|
||||
|
||||
const HEARTBEAT_INTERVAL_MS = 2000; // Must be < heartbeatTimeout (10min production, 5min testing)
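A heartbeat loop of this kind is usually paired with a `finally` block so the timer cannot outlive the work it reports on. The sketch below shows that pattern in isolation. `withHeartbeat` and its `beat` callback are hypothetical stand-ins (the activity itself calls Temporal's `heartbeat`), and it assumes the interval should be cleared whether the work succeeds or throws.

```typescript
// Hypothetical sketch: run `work` while emitting periodic heartbeats.
async function withHeartbeat<T>(
  beat: (details: { elapsedSeconds: number }) => void,
  intervalMs: number,
  work: () => Promise<T>
): Promise<T> {
  const startTime = Date.now();
  const timer = setInterval(() => {
    beat({ elapsedSeconds: Math.floor((Date.now() - startTime) / 1000) });
  }, intervalMs);
  try {
    return await work();
  } finally {
    // Always stop the timer, on success and on failure alike
    clearInterval(timer);
  }
}
```

Keeping the interval far below the server-side heartbeat timeout, as the constant above does, leaves plenty of margin for a delayed tick.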
|
||||
|
||||
/**
 * Input for all agent activities.
 * Matches PipelineInput but with required workflowId for audit correlation.
 */
export interface ActivityInput {
  webUrl: string;
  repoPath: string;
  configPath?: string;
  outputPath?: string;
  pipelineTestingMode?: boolean;
  workflowId: string;
}

/**
 * Core activity implementation.
 *
 * Executes a single agent with:
 * 1. Heartbeat loop for worker liveness
 * 2. Config loading (if configPath provided)
 * 3. Audit session initialization
 * 4. Prompt loading
 * 5. Git checkpoint before execution
 * 6. Agent execution (single attempt)
 * 7. Output validation
 * 8. Git commit on success, rollback on failure
 * 9. Error classification for Temporal retry
 */
|
||||
async function runAgentActivity(
|
||||
agentName: AgentName,
|
||||
input: ActivityInput
|
||||
): Promise<AgentMetrics> {
|
||||
const {
|
||||
webUrl,
|
||||
repoPath,
|
||||
configPath,
|
||||
outputPath,
|
||||
pipelineTestingMode = false,
|
||||
workflowId,
|
||||
} = input;
|
||||
|
||||
const startTime = Date.now();
|
||||
|
||||
// Get attempt number from Temporal context (tracks retries automatically)
|
||||
const attemptNumber = Context.current().info.attempt;
|
||||
|
||||
// Heartbeat loop - signals worker is alive to Temporal server
|
||||
const heartbeatInterval = setInterval(() => {
|
||||
const elapsed = Math.floor((Date.now() - startTime) / 1000);
|
||||
heartbeat({ agent: agentName, elapsedSeconds: elapsed, attempt: attemptNumber });
|
||||
}, HEARTBEAT_INTERVAL_MS);
|
||||
|
||||
  try {
    // 1. Load config (if provided)
    let distributedConfig: DistributedConfig | null = null;
    if (configPath) {
      try {
        const config = await parseConfig(configPath);
        distributedConfig = distributeConfig(config);
      } catch (err) {
        throw new Error(`Failed to load config ${configPath}: ${err instanceof Error ? err.message : String(err)}`);
      }
    }

    // 2. Build session metadata for audit
    const sessionMetadata: SessionMetadata = {
      id: workflowId,
      webUrl,
      repoPath,
      ...(outputPath && { outputPath }),
    };

    // 3. Initialize audit session (idempotent, safe across retries)
    const auditSession = new AuditSession(sessionMetadata);
    await auditSession.initialize();

    // 4. Load prompt
    const promptName = getPromptNameForAgent(agentName);
    const prompt = await loadPrompt(
      promptName,
      { webUrl, repoPath },
      distributedConfig,
      pipelineTestingMode
    );

    // 5. Create git checkpoint before execution
    await createGitCheckpoint(repoPath, agentName, attemptNumber);
    await auditSession.startAgent(agentName, prompt, attemptNumber);

    // 6. Execute agent (single attempt - Temporal handles retries)
    const result: ClaudePromptResult = await runClaudePrompt(
      prompt,
      repoPath,
      '', // context
      agentName, // description
      agentName,
      chalk.cyan,
      sessionMetadata,
      auditSession,
      attemptNumber
    );

    // 6.5. Sanity check: Detect spending cap that slipped through all detection layers
    // Defense-in-depth: A successful agent execution should never have ≤2 turns with $0 cost
    if (result.success && (result.turns ?? 0) <= 2 && (result.cost || 0) === 0) {
      const resultText = result.result || '';
      const looksLikeBillingError = /spending|cap|limit|budget|resets/i.test(resultText);

      if (looksLikeBillingError) {
        await rollbackGitWorkspace(repoPath, 'spending cap detected');
        await auditSession.endAgent(agentName, {
          attemptNumber,
          duration_ms: result.duration,
          cost_usd: 0,
          success: false,
          error: `Spending cap likely reached: ${resultText.slice(0, 100)}`,
        });
        // Throw as billing error so Temporal retries with long backoff
        throw new Error(`Spending cap likely reached: ${resultText.slice(0, 100)}`);
      }
    }

    // 7. Handle execution failure
    if (!result.success) {
      await rollbackGitWorkspace(repoPath, 'execution failure');
      await auditSession.endAgent(agentName, {
        attemptNumber,
        duration_ms: result.duration,
        cost_usd: result.cost || 0,
        success: false,
        error: result.error || 'Execution failed',
      });
      throw new Error(result.error || 'Agent execution failed');
    }

    // 8. Validate output
    const validationPassed = await validateAgentOutput(result, agentName, repoPath);
    if (!validationPassed) {
      await rollbackGitWorkspace(repoPath, 'validation failure');
      await auditSession.endAgent(agentName, {
        attemptNumber,
        duration_ms: result.duration,
        cost_usd: result.cost || 0,
        success: false,
        error: 'Output validation failed',
      });

      // Limit output validation retries (unlikely to self-heal)
      if (attemptNumber >= MAX_OUTPUT_VALIDATION_RETRIES) {
        throw ApplicationFailure.nonRetryable(
          `Agent ${agentName} failed output validation after ${attemptNumber} attempts`,
          'OutputValidationError',
          [{ agentName, attemptNumber, elapsed: Date.now() - startTime }]
        );
      }
      // Let Temporal retry (will be classified as OutputValidationError)
      throw new Error(`Agent ${agentName} failed output validation`);
    }

    // 9. Success - commit and log
    const commitHash = await getGitCommitHash(repoPath);
    await auditSession.endAgent(agentName, {
      attemptNumber,
      duration_ms: result.duration,
      cost_usd: result.cost || 0,
      success: true,
      ...(commitHash && { checkpoint: commitHash }),
    });
    await commitGitSuccess(repoPath, agentName);

    // 10. Return metrics
    return {
      durationMs: Date.now() - startTime,
      inputTokens: null, // Not currently exposed by SDK wrapper
      outputTokens: null,
      costUsd: result.cost ?? null,
      numTurns: result.turns ?? null,
    };
  } catch (error) {
    // Rollback git workspace before Temporal retry to ensure clean state
    try {
      await rollbackGitWorkspace(repoPath, 'error recovery');
    } catch (rollbackErr) {
      // Log but don't fail - rollback is best-effort
      console.error(`Failed to rollback git workspace for ${agentName}:`, rollbackErr);
    }

    // If error is already an ApplicationFailure (e.g., from our retry limit logic),
    // re-throw it directly without re-classifying
    if (error instanceof ApplicationFailure) {
      throw error;
    }

    // Classify error for Temporal retry behavior
    const classified = classifyErrorForTemporal(error);
    // Truncate message to prevent protobuf buffer overflow
    const rawMessage = error instanceof Error ? error.message : String(error);
    const message = truncateErrorMessage(rawMessage);

    if (classified.retryable) {
      // Temporal will retry with configured backoff
      const failure = ApplicationFailure.create({
        message,
        type: classified.type,
        details: [{ agentName, attemptNumber, elapsed: Date.now() - startTime }],
      });
      truncateStackTrace(failure);
      throw failure;
    } else {
      // Fail immediately - no retry
      const failure = ApplicationFailure.nonRetryable(message, classified.type, [
        { agentName, attemptNumber, elapsed: Date.now() - startTime },
      ]);
      truncateStackTrace(failure);
      throw failure;
    }
  } finally {
    clearInterval(heartbeatInterval);
  }
}

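Review note: the step 6.5 sanity check in `runAgentActivity` can be examined in isolation. The sketch below lifts the same condition and the same regex into a pure function so its behavior is easy to probe. `ResultLike` and `looksLikeSpendingCap` are hypothetical names introduced only for illustration; the real `ClaudePromptResult` type is defined elsewhere in this PR and presumably carries more fields.

```typescript
// Hypothetical, trimmed-down result shape (assumption: only these fields matter here).
interface ResultLike {
  success: boolean;
  turns?: number;
  cost?: number;
  result?: string;
}

// Same pattern as in the diff: plain substring alternatives, case-insensitive.
const BILLING_PATTERN = /spending|cap|limit|budget|resets/i;

// True when a "successful" run is suspiciously cheap AND its text mentions billing terms.
function looksLikeSpendingCap(r: ResultLike): boolean {
  const suspiciouslyCheap = r.success && (r.turns ?? 0) <= 2 && (r.cost || 0) === 0;
  return suspiciouslyCheap && BILLING_PATTERN.test(r.result || '');
}

const samples: ResultLike[] = [
  { success: true, turns: 1, cost: 0, result: 'Spending cap reached, resets at 5pm' },
  { success: true, turns: 40, cost: 1.2, result: 'rate limit discussed in findings' },
  { success: true, turns: 1, cost: 0, result: 'All done' },
  { success: true, turns: 2, cost: 0, result: 'No capture needed' },
];
for (const s of samples) {
  console.log(looksLikeSpendingCap(s), '<-', s.result);
}
```

Because the alternatives are matched as substrings, `cap` also matches words such as "capture" (the last sample). The turn and cost gate limits such false positives to near-empty runs, which is probably acceptable for a defense-in-depth check, but word boundaries would be one way to tighten it if that ever becomes a problem.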
// === Individual Agent Activity Exports ===
// Each function is a thin wrapper around runAgentActivity with the agent name.

export async function runPreReconAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('pre-recon', input);
}

export async function runReconAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('recon', input);
}

export async function runInjectionVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('injection-vuln', input);
}

export async function runXssVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('xss-vuln', input);
}

export async function runAuthVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('auth-vuln', input);
}

export async function runSsrfVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('ssrf-vuln', input);
}

export async function runAuthzVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('authz-vuln', input);
}

export async function runInjectionExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('injection-exploit', input);
}

export async function runXssExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('xss-exploit', input);
}

export async function runAuthExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('auth-exploit', input);
}

export async function runSsrfExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('ssrf-exploit', input);
}

export async function runAuthzExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('authz-exploit', input);
}

export async function runReportAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('report', input);
}

/**
 * Assemble the final report by concatenating exploitation evidence files.
 * This must be called BEFORE runReportAgent to create the file that the report agent will modify.
 */
export async function assembleReportActivity(input: ActivityInput): Promise<void> {
  const { repoPath } = input;
  console.log(chalk.blue('📝 Assembling deliverables from specialist agents...'));
  try {
    await assembleFinalReport(repoPath);
  } catch (error) {
    const err = error as Error;
    console.log(chalk.yellow(`⚠️ Error assembling final report: ${err.message}`));
    // Don't throw - the report agent can still create content even if no exploitation files exist
  }
}

/**
 * Check if exploitation should run for a given vulnerability type.
 * Reads the vulnerability queue file and returns the decision.
 *
 * This activity allows the workflow to skip exploit agents entirely
 * when no vulnerabilities were found, saving API calls and time.
 *
 * Error handling:
 * - Retryable errors (missing files, invalid JSON): re-throw for Temporal retry
 * - Non-retryable errors: skip exploitation gracefully
 */
export async function checkExploitationQueue(
  input: ActivityInput,
  vulnType: VulnType
): Promise<ExploitationDecision> {
  const { repoPath } = input;

  const result = await safeValidateQueueAndDeliverable(vulnType, repoPath);

  if (result.success && result.data) {
    const { shouldExploit, vulnerabilityCount } = result.data;
    console.log(
      chalk.blue(
        `🔍 ${vulnType}: ${shouldExploit ? `${vulnerabilityCount} vulnerabilities found` : 'no vulnerabilities, skipping exploitation'}`
      )
    );
    return result.data;
  }

  // Validation failed - check if we should retry or skip
  const error = result.error;
  if (error?.retryable) {
    // Re-throw retryable errors so Temporal can retry the vuln agent
    console.log(chalk.yellow(`⚠️ ${vulnType}: ${error.message} (retrying)`));
    throw error;
  }

  // Non-retryable error - skip exploitation gracefully
  console.log(
    chalk.yellow(`⚠️ ${vulnType}: ${error?.message ?? 'Unknown error'}, skipping exploitation`)
  );
  return {
    shouldExploit: false,
    shouldRetry: false,
    vulnerabilityCount: 0,
    vulnType,
  };
}

/**
 * Log phase transition to the unified workflow log.
 * Called at phase boundaries for per-workflow logging.
 */
export async function logPhaseTransition(
  input: ActivityInput,
  phase: string,
  event: 'start' | 'complete'
): Promise<void> {
  const { webUrl, repoPath, outputPath, workflowId } = input;

  const sessionMetadata: SessionMetadata = {
    id: workflowId,
    webUrl,
    repoPath,
    ...(outputPath && { outputPath }),
  };

  const auditSession = new AuditSession(sessionMetadata);
  await auditSession.initialize();

  if (event === 'start') {
    await auditSession.logPhaseStart(phase);
  } else {
    await auditSession.logPhaseComplete(phase);
  }
}

/**
 * Log workflow completion with full summary to the unified workflow log.
 * Called at the end of the workflow to write a summary breakdown.
 */
export async function logWorkflowComplete(
  input: ActivityInput,
  summary: WorkflowSummary
): Promise<void> {
  const { webUrl, repoPath, outputPath, workflowId } = input;

  const sessionMetadata: SessionMetadata = {
    id: workflowId,
    webUrl,
    repoPath,
    ...(outputPath && { outputPath }),
  };

  const auditSession = new AuditSession(sessionMetadata);
  await auditSession.initialize();
  await auditSession.logWorkflowComplete(summary);
}
@@ -0,0 +1,212 @@
#!/usr/bin/env node
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Temporal client for starting Shannon pentest pipeline workflows.
 *
 * Starts a workflow and optionally waits for completion with progress polling.
 *
 * Usage:
 *   npm run temporal:start -- <webUrl> <repoPath> [options]
 *   # or
 *   node dist/temporal/client.js <webUrl> <repoPath> [options]
 *
 * Options:
 *   --config <path>       Configuration file path
 *   --output <path>       Output directory for audit logs
 *   --pipeline-testing    Use minimal prompts for fast testing
 *   --workflow-id <id>    Custom workflow ID (default: <hostname>_shannon-<timestamp>)
 *   --wait                Wait for workflow completion with progress polling
 *
 * Environment:
 *   TEMPORAL_ADDRESS - Temporal server address (default: localhost:7233)
 */

import { Connection, Client } from '@temporalio/client';
import dotenv from 'dotenv';
import chalk from 'chalk';
import { displaySplashScreen } from '../splash-screen.js';
import { sanitizeHostname } from '../audit/utils.js';
// Import types only - these don't pull in workflow runtime code
import type { PipelineInput, PipelineState, PipelineProgress } from './shared.js';

dotenv.config();

// Query name must match the one defined in workflows.ts
const PROGRESS_QUERY = 'getProgress';

function showUsage(): void {
  console.log(chalk.cyan.bold('\nShannon Temporal Client'));
  console.log(chalk.gray('Start a pentest pipeline workflow\n'));
  console.log(chalk.yellow('Usage:'));
  console.log(
    ' node dist/temporal/client.js <webUrl> <repoPath> [options]\n'
  );
  console.log(chalk.yellow('Options:'));
  console.log(' --config <path> Configuration file path');
  console.log(' --output <path> Output directory for audit logs');
  console.log(' --pipeline-testing Use minimal prompts for fast testing');
  console.log(
    ' --workflow-id <id> Custom workflow ID (default: <hostname>_shannon-<timestamp>)'
  );
  console.log(' --wait Wait for workflow completion with progress polling\n');
  console.log(chalk.yellow('Examples:'));
  console.log(' node dist/temporal/client.js https://example.com /path/to/repo');
  console.log(
    ' node dist/temporal/client.js https://example.com /path/to/repo --config config.yaml\n'
  );
}

async function startPipeline(): Promise<void> {
  const args = process.argv.slice(2);

  if (args.includes('--help') || args.includes('-h') || args.length === 0) {
    showUsage();
    process.exit(0);
  }

  // Parse arguments
  let webUrl: string | undefined;
  let repoPath: string | undefined;
  let configPath: string | undefined;
  let outputPath: string | undefined;
  let pipelineTestingMode = false;
  let customWorkflowId: string | undefined;
  let waitForCompletion = false;

  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === '--config') {
      const nextArg = args[i + 1];
      if (nextArg && !nextArg.startsWith('-')) {
        configPath = nextArg;
        i++;
      }
    } else if (arg === '--output') {
      const nextArg = args[i + 1];
      if (nextArg && !nextArg.startsWith('-')) {
        outputPath = nextArg;
        i++;
      }
    } else if (arg === '--workflow-id') {
      const nextArg = args[i + 1];
      if (nextArg && !nextArg.startsWith('-')) {
        customWorkflowId = nextArg;
        i++;
      }
    } else if (arg === '--pipeline-testing') {
      pipelineTestingMode = true;
    } else if (arg === '--wait') {
      waitForCompletion = true;
    } else if (arg && !arg.startsWith('-')) {
      if (!webUrl) {
        webUrl = arg;
      } else if (!repoPath) {
        repoPath = arg;
      }
    }
  }

  if (!webUrl || !repoPath) {
    console.log(chalk.red('Error: webUrl and repoPath are required'));
    showUsage();
    process.exit(1);
  }

  // Display splash screen
  await displaySplashScreen();

  const address = process.env.TEMPORAL_ADDRESS || 'localhost:7233';
  console.log(chalk.gray(`Connecting to Temporal at ${address}...`));

  const connection = await Connection.connect({ address });
  const client = new Client({ connection });

  try {
    const hostname = sanitizeHostname(webUrl);
    const workflowId = customWorkflowId || `${hostname}_shannon-${Date.now()}`;

    const input: PipelineInput = {
      webUrl,
      repoPath,
      ...(configPath && { configPath }),
      ...(outputPath && { outputPath }),
      ...(pipelineTestingMode && { pipelineTestingMode }),
    };

    // Start workflow by name (not by importing the function)
    const handle = await client.workflow.start<(input: PipelineInput) => Promise<PipelineState>>(
      'pentestPipelineWorkflow',
      {
        taskQueue: 'shannon-pipeline',
        workflowId,
        args: [input],
      }
    );

    console.log(chalk.green.bold(`✓ Workflow started: ${workflowId}`));
    console.log();
    console.log(chalk.white(' Target: ') + chalk.cyan(webUrl));
    console.log(chalk.white(' Repository: ') + chalk.cyan(repoPath));
    if (configPath) {
      console.log(chalk.white(' Config: ') + chalk.cyan(configPath));
    }
    if (pipelineTestingMode) {
      console.log(chalk.white(' Mode: ') + chalk.yellow('Pipeline Testing'));
    }
    console.log();

    if (!waitForCompletion) {
      console.log(chalk.bold('Monitor progress:'));
      console.log(chalk.white(' Web UI: ') + chalk.blue(`http://localhost:8233/namespaces/default/workflows/${workflowId}`));
      console.log(chalk.white(' Logs: ') + chalk.gray(`./shannon logs ID=${workflowId}`));
      console.log(chalk.white(' Query: ') + chalk.gray(`./shannon query ID=${workflowId}`));
      console.log();
      return;
    }

    // Poll for progress every 30 seconds
    const progressInterval = setInterval(async () => {
      try {
        const progress = await handle.query<PipelineProgress>(PROGRESS_QUERY);
        const elapsed = Math.floor(progress.elapsedMs / 1000);
        console.log(
          chalk.gray(`[${elapsed}s]`),
          chalk.cyan(`Phase: ${progress.currentPhase || 'unknown'}`),
          chalk.gray(`| Agent: ${progress.currentAgent || 'none'}`),
          chalk.gray(`| Completed: ${progress.completedAgents.length}/13`)
        );
      } catch {
        // Workflow may have completed
      }
    }, 30000);

    try {
      const result = await handle.result();
      clearInterval(progressInterval);

      console.log(chalk.green.bold('\nPipeline completed successfully!'));
      if (result.summary) {
        console.log(chalk.gray(`Duration: ${Math.floor(result.summary.totalDurationMs / 1000)}s`));
        console.log(chalk.gray(`Agents completed: ${result.summary.agentCount}`));
        console.log(chalk.gray(`Total turns: ${result.summary.totalTurns}`));
        console.log(chalk.gray(`Total cost: $${result.summary.totalCostUsd.toFixed(4)}`));
      }
    } catch (error) {
      clearInterval(progressInterval);
      console.error(chalk.red.bold('\nPipeline failed:'), error);
      process.exitCode = 1; // Set the exit code without exiting, so the finally block can close the connection
    }
  } finally {
    await connection.close();
  }
}

startPipeline().catch((err) => {
  console.error(chalk.red('Client error:'), err);
  process.exit(1);
});
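Review note: the argument loop in `startPipeline()` is easier to unit-test as a pure function. The sketch below is one possible extraction, not the PR's code: `ClientOptions`, `VALUE_FLAGS`, and `parseClientArgs` are hypothetical names, while the flags and the "a value may not start with `-`" rule are taken from the diff above.

```typescript
// Hypothetical option bag; field names follow the local variables in startPipeline().
interface ClientOptions {
  webUrl?: string;
  repoPath?: string;
  configPath?: string;
  outputPath?: string;
  customWorkflowId?: string;
  pipelineTestingMode: boolean;
  waitForCompletion: boolean;
}

type ValueKey = 'configPath' | 'outputPath' | 'customWorkflowId';

// Flags that take a value, mapped to the option they fill.
const VALUE_FLAGS = new Map<string, ValueKey>([
  ['--config', 'configPath'],
  ['--output', 'outputPath'],
  ['--workflow-id', 'customWorkflowId'],
]);

function parseClientArgs(args: string[]): ClientOptions {
  const opts: ClientOptions = { pipelineTestingMode: false, waitForCompletion: false };
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (!arg) continue; // skip undefined or empty entries
    const key = VALUE_FLAGS.get(arg);
    if (key) {
      const next = args[i + 1];
      // Same rule as the diff: a following token that starts with '-' is not consumed.
      if (next && !next.startsWith('-')) {
        opts[key] = next;
        i++;
      }
    } else if (arg === '--pipeline-testing') {
      opts.pipelineTestingMode = true;
    } else if (arg === '--wait') {
      opts.waitForCompletion = true;
    } else if (!arg.startsWith('-')) {
      // The first two positionals are webUrl and repoPath; extras are ignored.
      if (!opts.webUrl) opts.webUrl = arg;
      else if (!opts.repoPath) opts.repoPath = arg;
    }
  }
  return opts;
}

console.log(parseClientArgs(['https://example.com', '/repo', '--config', 'c.yaml', '--wait']));
console.log(parseClientArgs(['--config', '--wait', 'https://example.com', '/repo']));
```

The second call shows a consequence of the rule: `--config` followed by another flag is silently dropped rather than reported, and unknown flags are ignored as well. Whether that should be an error is a design choice left to the authors.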
@@ -0,0 +1,155 @@
#!/usr/bin/env node
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Temporal query tool for inspecting Shannon workflow progress.
 *
 * Queries a running or completed workflow and displays its state.
 *
 * Usage:
 *   npm run temporal:query -- <workflowId>
 *   # or
 *   node dist/temporal/query.js <workflowId>
 *
 * Environment:
 *   TEMPORAL_ADDRESS - Temporal server address (default: localhost:7233)
 */

import { Connection, Client } from '@temporalio/client';
import dotenv from 'dotenv';
import chalk from 'chalk';

dotenv.config();

// Query name must match the one defined in workflows.ts
const PROGRESS_QUERY = 'getProgress';

// Types duplicated from shared.ts to avoid importing workflow APIs
interface AgentMetrics {
  durationMs: number;
  inputTokens: number | null;
  outputTokens: number | null;
  costUsd: number | null;
  numTurns: number | null;
}

interface PipelineProgress {
  status: 'running' | 'completed' | 'failed';
  currentPhase: string | null;
  currentAgent: string | null;
  completedAgents: string[];
  failedAgent: string | null;
  error: string | null;
  startTime: number;
  agentMetrics: Record<string, AgentMetrics>;
  workflowId: string;
  elapsedMs: number;
}

function showUsage(): void {
  console.log(chalk.cyan.bold('\nShannon Temporal Query Tool'));
  console.log(chalk.gray('Query progress of a running workflow\n'));
  console.log(chalk.yellow('Usage:'));
  console.log(' node dist/temporal/query.js <workflowId>\n');
  console.log(chalk.yellow('Examples:'));
  console.log(' node dist/temporal/query.js shannon-1704672000000\n');
}

function getStatusColor(status: string): string {
  switch (status) {
    case 'running':
      return chalk.yellow(status);
    case 'completed':
      return chalk.green(status);
    case 'failed':
      return chalk.red(status);
    default:
      return status;
  }
}

function formatDuration(ms: number): string {
  const seconds = Math.floor(ms / 1000);
  const minutes = Math.floor(seconds / 60);
  const hours = Math.floor(minutes / 60);

  if (hours > 0) {
    return `${hours}h ${minutes % 60}m`;
  } else if (minutes > 0) {
    return `${minutes}m ${seconds % 60}s`;
  }
  return `${seconds}s`;
}

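Review note: a few worked inputs make the rounding behavior of `formatDuration` concrete. The function body below is copied from the diff; only the sample values are made up for illustration.

```typescript
// Copied from query.ts in this diff.
function formatDuration(ms: number): string {
  const seconds = Math.floor(ms / 1000);
  const minutes = Math.floor(seconds / 60);
  const hours = Math.floor(minutes / 60);

  if (hours > 0) {
    return `${hours}h ${minutes % 60}m`;
  } else if (minutes > 0) {
    return `${minutes}m ${seconds % 60}s`;
  }
  return `${seconds}s`;
}

console.log(formatDuration(999));      // 0s     (sub-second values floor to zero)
console.log(formatDuration(42000));    // 42s
console.log(formatDuration(65000));    // 1m 5s
console.log(formatDuration(3725000));  // 1h 2m  (3725 s; the trailing 5 s are dropped)
```

Each branch keeps only the two largest units, so output stays short for multi-hour pipelines at the cost of second-level precision.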
async function queryWorkflow(): Promise<void> {
  const workflowId = process.argv[2];

  if (!workflowId || workflowId === '--help' || workflowId === '-h') {
    showUsage();
    process.exit(workflowId ? 0 : 1);
  }

  const address = process.env.TEMPORAL_ADDRESS || 'localhost:7233';

  const connection = await Connection.connect({ address });
  const client = new Client({ connection });

  try {
    const handle = client.workflow.getHandle(workflowId);
    const progress = await handle.query<PipelineProgress>(PROGRESS_QUERY);

    console.log(chalk.cyan.bold('\nWorkflow Progress'));
    console.log(chalk.gray('\u2500'.repeat(40)));
    console.log(`${chalk.white('Workflow ID:')} ${progress.workflowId}`);
    console.log(`${chalk.white('Status:')} ${getStatusColor(progress.status)}`);
    console.log(
      `${chalk.white('Current Phase:')} ${progress.currentPhase || 'none'}`
    );
    console.log(
      `${chalk.white('Current Agent:')} ${progress.currentAgent || 'none'}`
    );
    console.log(`${chalk.white('Elapsed:')} ${formatDuration(progress.elapsedMs)}`);
    console.log(
      `${chalk.white('Completed:')} ${progress.completedAgents.length}/13 agents`
    );

    if (progress.completedAgents.length > 0) {
      console.log(chalk.gray('\nCompleted agents:'));
      for (const agent of progress.completedAgents) {
        const metrics = progress.agentMetrics[agent];
        const duration = metrics ? formatDuration(metrics.durationMs) : 'unknown';
        const cost = metrics?.costUsd ? `$${metrics.costUsd.toFixed(4)}` : '';
        console.log(
          chalk.green(` - ${agent}`) +
            chalk.gray(` (${duration}${cost ? ', ' + cost : ''})`)
        );
      }
    }

    if (progress.error) {
      console.log(chalk.red(`\nError: ${progress.error}`));
      console.log(chalk.red(`Failed agent: ${progress.failedAgent}`));
    }

    console.log();
  } catch (error) {
    const err = error as Error;
    if (err.message?.includes('not found')) {
      console.log(chalk.red(`Workflow not found: ${workflowId}`));
    } else {
      console.error(chalk.red('Query failed:'), err.message);
    }
    process.exitCode = 1; // Set the exit code without exiting, so the finally block can close the connection
  } finally {
    await connection.close();
  }
}

queryWorkflow().catch((err) => {
  console.error(chalk.red('Query error:'), err);
  process.exit(1);
});
@@ -0,0 +1,61 @@
import { defineQuery } from '@temporalio/workflow';

// === Types ===

export interface PipelineInput {
  webUrl: string;
  repoPath: string;
  configPath?: string;
  outputPath?: string;
  pipelineTestingMode?: boolean;
  workflowId?: string; // Added by client, used for audit correlation
}

export interface AgentMetrics {
  durationMs: number;
  inputTokens: number | null;
  outputTokens: number | null;
  costUsd: number | null;
  numTurns: number | null;
}

export interface PipelineSummary {
  totalCostUsd: number;
  totalDurationMs: number; // Wall-clock time (end - start)
  totalTurns: number;
  agentCount: number;
}

export interface PipelineState {
  status: 'running' | 'completed' | 'failed';
  currentPhase: string | null;
  currentAgent: string | null;
  completedAgents: string[];
  failedAgent: string | null;
  error: string | null;
  startTime: number;
  agentMetrics: Record<string, AgentMetrics>;
  summary: PipelineSummary | null;
}

// Extended state returned by getProgress query (includes computed fields)
export interface PipelineProgress extends PipelineState {
  workflowId: string;
  elapsedMs: number;
}

// Result from a single vuln→exploit pipeline
export interface VulnExploitPipelineResult {
  vulnType: string;
  vulnMetrics: AgentMetrics | null;
  exploitMetrics: AgentMetrics | null;
  exploitDecision: {
    shouldExploit: boolean;
    vulnerabilityCount: number;
  } | null;
  error: string | null;
}

// === Queries ===

export const getProgress = defineQuery<PipelineProgress>('getProgress');
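Review note: `PipelineSummary` is only declared here; the code that fills it lives in the workflow and is not part of this excerpt. The sketch below shows one plausible way to derive it from `agentMetrics`, assuming `null` cost or turn counts are treated as zero. `summarize` is a hypothetical helper name, and the two interfaces are repeated locally so the block stands alone.

```typescript
// Local copies of the shapes declared in shared.ts.
interface AgentMetrics {
  durationMs: number;
  inputTokens: number | null;
  outputTokens: number | null;
  costUsd: number | null;
  numTurns: number | null;
}

interface PipelineSummary {
  totalCostUsd: number;
  totalDurationMs: number; // Wall-clock time (end - start)
  totalTurns: number;
  agentCount: number;
}

// Hypothetical aggregation: null metrics count as 0 (an assumption, not confirmed by the diff).
function summarize(
  agentMetrics: Record<string, AgentMetrics>,
  startTime: number,
  endTime: number
): PipelineSummary {
  const all = Object.values(agentMetrics);
  return {
    totalCostUsd: all.reduce((sum, m) => sum + (m.costUsd ?? 0), 0),
    totalDurationMs: endTime - startTime,
    totalTurns: all.reduce((sum, m) => sum + (m.numTurns ?? 0), 0),
    agentCount: all.length,
  };
}

const example = summarize(
  {
    recon: { durationMs: 3000, inputTokens: null, outputTokens: null, costUsd: 0.5, numTurns: 10 },
    report: { durationMs: 1500, inputTokens: null, outputTokens: null, costUsd: null, numTurns: 3 },
  },
  1000,
  6000
);
console.log(example);
```

Note that `totalDurationMs` is wall-clock time, as the comment in the interface says, so with parallel vuln and exploit pipelines it will generally be smaller than the sum of the per-agent `durationMs` values.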
@@ -0,0 +1,79 @@
#!/usr/bin/env node
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Temporal worker for Shannon pentest pipeline.
 *
 * Polls the 'shannon-pipeline' task queue and executes activities.
 * Handles up to 25 concurrent activities to support multiple parallel workflows.
 *
 * Usage:
 *   npm run temporal:worker
 *   # or
 *   node dist/temporal/worker.js
 *
 * Environment:
 *   TEMPORAL_ADDRESS - Temporal server address (default: localhost:7233)
 */

import { NativeConnection, Worker, bundleWorkflowCode } from '@temporalio/worker';
import { fileURLToPath } from 'node:url';
import path from 'node:path';
import dotenv from 'dotenv';
import chalk from 'chalk';
import * as activities from './activities.js';

dotenv.config();

const __dirname = path.dirname(fileURLToPath(import.meta.url));

async function runWorker(): Promise<void> {
  const address = process.env.TEMPORAL_ADDRESS || 'localhost:7233';
  console.log(chalk.cyan(`Connecting to Temporal at ${address}...`));

  const connection = await NativeConnection.connect({ address });

  // Bundle workflows for Temporal's V8 isolate
  console.log(chalk.gray('Bundling workflows...'));
  const workflowBundle = await bundleWorkflowCode({
    workflowsPath: path.join(__dirname, 'workflows.js'),
  });

  const worker = await Worker.create({
    connection,
    namespace: 'default',
    workflowBundle,
    activities,
    taskQueue: 'shannon-pipeline',
    maxConcurrentActivityTaskExecutions: 25, // Support multiple parallel workflows (5 agents × ~5 workflows)
  });

  // Graceful shutdown handling
  const shutdown = async (): Promise<void> => {
    console.log(chalk.yellow('\nShutting down worker...'));
    worker.shutdown();
  };

  process.on('SIGINT', shutdown);
  process.on('SIGTERM', shutdown);

  console.log(chalk.green('Shannon worker started'));
  console.log(chalk.gray('Task queue: shannon-pipeline'));
  console.log(chalk.gray('Press Ctrl+C to stop\n'));

  try {
    await worker.run();
  } finally {
    await connection.close();
    console.log(chalk.gray('Worker stopped'));
  }
}

runWorker().catch((err) => {
  console.error(chalk.red('Worker failed:'), err);
  process.exit(1);
});
@@ -0,0 +1,325 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Temporal workflow for Shannon pentest pipeline.
 *
 * Orchestrates the penetration testing workflow:
 * 1. Pre-Reconnaissance (sequential)
 * 2. Reconnaissance (sequential)
 * 3-4. Vulnerability + Exploitation (5 pipelined pairs in parallel)
 *      Each pair: vuln agent → queue check → conditional exploit
 *      No synchronization barrier - exploits start when their vuln finishes
 * 5. Reporting (sequential)
 *
 * Features:
 * - Queryable state via getProgress
 * - Automatic retry with backoff for transient/billing errors
 * - Non-retryable classification for permanent errors
 * - Audit correlation via workflowId
 * - Graceful failure handling: pipelines continue if one fails
 */

import {
|
||||
proxyActivities,
|
||||
setHandler,
|
||||
workflowInfo,
|
||||
} from '@temporalio/workflow';
|
||||
import type * as activities from './activities.js';
|
||||
import type { ActivityInput } from './activities.js';
|
||||
import {
|
||||
getProgress,
|
||||
type PipelineInput,
|
||||
type PipelineState,
|
||||
type PipelineProgress,
|
||||
type PipelineSummary,
|
||||
type VulnExploitPipelineResult,
|
||||
type AgentMetrics,
|
||||
} from './shared.js';
|
||||
import type { VulnType } from '../queue-validation.js';
|
||||
|
||||
// Retry configuration for production (long intervals for billing recovery)
const PRODUCTION_RETRY = {
  initialInterval: '5 minutes',
  maximumInterval: '30 minutes',
  backoffCoefficient: 2,
  maximumAttempts: 50,
  nonRetryableErrorTypes: [
    'AuthenticationError',
    'PermissionError',
    'InvalidRequestError',
    'RequestTooLargeError',
    'ConfigurationError',
    'InvalidTargetError',
    'ExecutionLimitError',
  ],
};

// Retry configuration for pipeline testing (fast iteration)
const TESTING_RETRY = {
  initialInterval: '10 seconds',
  maximumInterval: '30 seconds',
  backoffCoefficient: 2,
  maximumAttempts: 5,
  nonRetryableErrorTypes: PRODUCTION_RETRY.nonRetryableErrorTypes,
};

// Activity proxy with production retry configuration (default)
const acts = proxyActivities<typeof activities>({
  startToCloseTimeout: '2 hours',
  heartbeatTimeout: '10 minutes', // Long timeout for resource-constrained workers with many concurrent activities
  retry: PRODUCTION_RETRY,
});

// Activity proxy with testing retry configuration (fast)
const testActs = proxyActivities<typeof activities>({
  startToCloseTimeout: '10 minutes',
  heartbeatTimeout: '5 minutes', // Shorter for testing but still tolerant of resource contention
  retry: TESTING_RETRY,
});

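To make the two retry configurations concrete, the sketch below computes the wait before each retry using Temporal's backoff rule (initial interval multiplied by the coefficient per retry, capped at the maximum interval). The helper name `retryDelays` and the use of bare numbers instead of duration strings are illustrative assumptions, not part of this commit.

```typescript
// Hypothetical helper (not in the commit): the wait before each retry,
// in whatever unit the caller passes in (minutes or seconds below).
function retryDelays(
  initialInterval: number,
  backoffCoefficient: number,
  maximumInterval: number,
  maximumAttempts: number
): number[] {
  const delays: number[] = [];
  // maximumAttempts counts the first try, so at most maximumAttempts - 1 waits occur.
  for (let retry = 0; retry < maximumAttempts - 1; retry++) {
    const uncapped = initialInterval * Math.pow(backoffCoefficient, retry);
    delays.push(Math.min(uncapped, maximumInterval));
  }
  return delays;
}

// Values mirror PRODUCTION_RETRY (minutes) and TESTING_RETRY (seconds).
const productionDelays = retryDelays(5, 2, 30, 50);
const testingDelays = retryDelays(10, 2, 30, 5);
const productionTotal = productionDelays.reduce((sum, d) => sum + d, 0);

console.log(productionDelays.slice(0, 5)); // [ 5, 10, 20, 30, 30 ]
console.log(testingDelays); // [ 10, 20, 30, 30 ]
console.log(`production worst case: ${productionTotal} minutes of waiting`);
```

Under these assumptions the production policy can spend close to a day waiting between attempts before giving up, which matches its stated purpose of riding out billing interruptions.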
/**
 * Compute aggregated metrics from the current pipeline state.
 * Called on both success and failure to provide partial metrics.
 */
function computeSummary(state: PipelineState): PipelineSummary {
  const metrics = Object.values(state.agentMetrics);
  return {
    totalCostUsd: metrics.reduce((sum, m) => sum + (m.costUsd ?? 0), 0),
    totalDurationMs: Date.now() - state.startTime,
    totalTurns: metrics.reduce((sum, m) => sum + (m.numTurns ?? 0), 0),
    agentCount: state.completedAgents.length,
  };
}

export async function pentestPipelineWorkflow(
  input: PipelineInput
): Promise<PipelineState> {
  const { workflowId } = workflowInfo();

  // Select activity proxy based on testing mode
  // Pipeline testing uses fast retry intervals (10s) for quick iteration
  const a = input.pipelineTestingMode ? testActs : acts;

  // Workflow state (queryable)
  const state: PipelineState = {
    status: 'running',
    currentPhase: null,
    currentAgent: null,
    completedAgents: [],
    failedAgent: null,
    error: null,
    startTime: Date.now(),
    agentMetrics: {},
    summary: null,
  };

  // Register query handler for real-time progress inspection
  setHandler(getProgress, (): PipelineProgress => ({
    ...state,
    workflowId,
    elapsedMs: Date.now() - state.startTime,
  }));

  // Build ActivityInput with required workflowId for audit correlation
  // Activities require workflowId (non-optional), PipelineInput has it optional
  // Use spread to conditionally include optional properties (exactOptionalPropertyTypes)
  const activityInput: ActivityInput = {
    webUrl: input.webUrl,
    repoPath: input.repoPath,
    workflowId,
    ...(input.configPath !== undefined && { configPath: input.configPath }),
    ...(input.outputPath !== undefined && { outputPath: input.outputPath }),
    ...(input.pipelineTestingMode !== undefined && {
      pipelineTestingMode: input.pipelineTestingMode,
    }),
  };

  try {
    // === Phase 1: Pre-Reconnaissance ===
    state.currentPhase = 'pre-recon';
    state.currentAgent = 'pre-recon';
    await a.logPhaseTransition(activityInput, 'pre-recon', 'start');
    state.agentMetrics['pre-recon'] =
      await a.runPreReconAgent(activityInput);
    state.completedAgents.push('pre-recon');
    await a.logPhaseTransition(activityInput, 'pre-recon', 'complete');

    // === Phase 2: Reconnaissance ===
    state.currentPhase = 'recon';
    state.currentAgent = 'recon';
    await a.logPhaseTransition(activityInput, 'recon', 'start');
    state.agentMetrics['recon'] = await a.runReconAgent(activityInput);
    state.completedAgents.push('recon');
    await a.logPhaseTransition(activityInput, 'recon', 'complete');

    // === Phases 3-4: Vulnerability Analysis + Exploitation (Pipelined) ===
    // Each vuln type runs as an independent pipeline:
    // vuln agent → queue check → conditional exploit agent
    // This eliminates the synchronization barrier between phases - each exploit
    // starts immediately when its vuln agent finishes, not waiting for all.
    state.currentPhase = 'vulnerability-exploitation';
    state.currentAgent = 'pipelines';
    await a.logPhaseTransition(activityInput, 'vulnerability-exploitation', 'start');

    // Helper: Run a single vuln→exploit pipeline
    async function runVulnExploitPipeline(
      vulnType: VulnType,
      runVulnAgent: () => Promise<AgentMetrics>,
      runExploitAgent: () => Promise<AgentMetrics>
    ): Promise<VulnExploitPipelineResult> {
      // Step 1: Run vulnerability agent
      const vulnMetrics = await runVulnAgent();

      // Step 2: Check exploitation queue (starts immediately after vuln)
      const decision = await a.checkExploitationQueue(activityInput, vulnType);

      // Step 3: Conditionally run exploit agent
      let exploitMetrics: AgentMetrics | null = null;
      if (decision.shouldExploit) {
        exploitMetrics = await runExploitAgent();
      }

      return {
        vulnType,
        vulnMetrics,
        exploitMetrics,
        exploitDecision: {
          shouldExploit: decision.shouldExploit,
          vulnerabilityCount: decision.vulnerabilityCount,
        },
        error: null,
      };
    }

    // Run all 5 pipelines in parallel with graceful failure handling
    // Promise.allSettled ensures other pipelines continue if one fails
    const pipelineResults = await Promise.allSettled([
      runVulnExploitPipeline(
        'injection',
        () => a.runInjectionVulnAgent(activityInput),
        () => a.runInjectionExploitAgent(activityInput)
      ),
      runVulnExploitPipeline(
        'xss',
        () => a.runXssVulnAgent(activityInput),
        () => a.runXssExploitAgent(activityInput)
      ),
      runVulnExploitPipeline(
        'auth',
        () => a.runAuthVulnAgent(activityInput),
        () => a.runAuthExploitAgent(activityInput)
      ),
      runVulnExploitPipeline(
        'ssrf',
        () => a.runSsrfVulnAgent(activityInput),
        () => a.runSsrfExploitAgent(activityInput)
      ),
      runVulnExploitPipeline(
        'authz',
        () => a.runAuthzVulnAgent(activityInput),
        () => a.runAuthzExploitAgent(activityInput)
      ),
    ]);

    // Aggregate results from all pipelines
    const failedPipelines: string[] = [];
    for (const result of pipelineResults) {
      if (result.status === 'fulfilled') {
        const { vulnType, vulnMetrics, exploitMetrics } = result.value;

        // Record vuln agent metrics
        if (vulnMetrics) {
          state.agentMetrics[`${vulnType}-vuln`] = vulnMetrics;
          state.completedAgents.push(`${vulnType}-vuln`);
        }

        // Record exploit agent metrics (if it ran)
        if (exploitMetrics) {
          state.agentMetrics[`${vulnType}-exploit`] = exploitMetrics;
          state.completedAgents.push(`${vulnType}-exploit`);
        }
      } else {
        // Pipeline failed - log error but continue with others
        const errorMsg =
          result.reason instanceof Error
            ? result.reason.message
            : String(result.reason);
        failedPipelines.push(errorMsg);
      }
    }

    // Log any pipeline failures (workflow continues despite failures)
    if (failedPipelines.length > 0) {
      console.log(
        `⚠️ ${failedPipelines.length} pipeline(s) failed:`,
        failedPipelines
      );
    }

    // Update phase markers
    state.currentPhase = 'exploitation';
    state.currentAgent = null;
    await a.logPhaseTransition(activityInput, 'vulnerability-exploitation', 'complete');

    // === Phase 5: Reporting ===
    state.currentPhase = 'reporting';
    state.currentAgent = 'report';
    await a.logPhaseTransition(activityInput, 'reporting', 'start');

    // First, assemble the concatenated report from exploitation evidence files
    await a.assembleReportActivity(activityInput);

    // Then run the report agent to add executive summary and clean up
    state.agentMetrics['report'] = await a.runReportAgent(activityInput);
    state.completedAgents.push('report');
    await a.logPhaseTransition(activityInput, 'reporting', 'complete');

    // === Complete ===
    state.status = 'completed';
    state.currentPhase = null;
    state.currentAgent = null;
    state.summary = computeSummary(state);

    // Log workflow completion summary
    await a.logWorkflowComplete(activityInput, {
      status: 'completed',
      totalDurationMs: state.summary.totalDurationMs,
      totalCostUsd: state.summary.totalCostUsd,
      completedAgents: state.completedAgents,
      agentMetrics: Object.fromEntries(
        Object.entries(state.agentMetrics).map(([name, m]) => [
          name,
          { durationMs: m.durationMs, costUsd: m.costUsd },
        ])
      ),
    });

    return state;
  } catch (error) {
    state.status = 'failed';
    state.failedAgent = state.currentAgent;
    state.error = error instanceof Error ? error.message : String(error);
    state.summary = computeSummary(state);

    // Log workflow failure summary
    await a.logWorkflowComplete(activityInput, {
      status: 'failed',
      totalDurationMs: state.summary.totalDurationMs,
      totalCostUsd: state.summary.totalCostUsd,
      completedAgents: state.completedAgents,
      agentMetrics: Object.fromEntries(
        Object.entries(state.agentMetrics).map(([name, m]) => [
          name,
          { durationMs: m.durationMs, costUsd: m.costUsd },
        ])
      ),
      error: state.error ?? undefined,
    });

    throw error;
  }
}
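The pipelined phases 3-4 rely on two properties: each pipeline decides on its own whether to run its exploit step, and `Promise.allSettled` lets the remaining pipelines finish when one rejects. The sketch below isolates that pattern without Temporal. The names `runPipeline` and `runAll`, and the rule "exploit when the count is greater than zero", are stand-in assumptions; the real decision comes from the `checkExploitationQueue` activity.

```typescript
type PipelineResult = { vulnType: string; exploited: boolean };

// Hypothetical stand-in for one vuln -> queue check -> conditional exploit pipeline.
async function runPipeline(
  vulnType: string,
  findVulns: () => Promise<number>,
  exploit: () => Promise<void>
): Promise<PipelineResult> {
  const vulnerabilityCount = await findVulns();
  const shouldExploit = vulnerabilityCount > 0; // assumed decision rule
  if (shouldExploit) {
    await exploit();
  }
  return { vulnType, exploited: shouldExploit };
}

async function runAll(): Promise<{ completed: PipelineResult[]; failures: string[] }> {
  // allSettled never rejects, so one failing pipeline cannot abort the others.
  const settled = await Promise.allSettled([
    runPipeline('injection', async () => 2, async () => {}),
    runPipeline('xss', async () => 0, async () => {}),
    runPipeline('auth', async () => { throw new Error('agent crashed'); }, async () => {}),
  ]);

  const completed: PipelineResult[] = [];
  const failures: string[] = [];
  for (const result of settled) {
    if (result.status === 'fulfilled') {
      completed.push(result.value);
    } else {
      failures.push(result.reason instanceof Error ? result.reason.message : String(result.reason));
    }
  }
  return { completed, failures };
}

runAll().then(({ completed, failures }) => {
  console.log(`${completed.length} completed, ${failures.length} failed`, failures);
});
```

Note that, as in the workflow itself, a rejected pipeline contributes only an error message; neither the sketch nor the workflow records which vulnerability type failed unless the message happens to say so.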
+23
-4
@@ -47,10 +47,6 @@ export type PlaywrightAgent =

export type AgentValidator = (sourceDir: string) => Promise<boolean>;

export type AgentValidatorMap = Record<AgentName, AgentValidator>;

export type McpAgentMapping = Record<PromptName, PlaywrightAgent>;

export type AgentStatus =
  | 'pending'
  | 'in_progress'
@@ -63,3 +59,26 @@ export interface AgentDefinition {
  displayName: string;
  prerequisites: AgentName[];
}

/**
 * Maps an agent name to its corresponding prompt file name.
 */
export function getPromptNameForAgent(agentName: AgentName): PromptName {
  const mappings: Record<AgentName, PromptName> = {
    'pre-recon': 'pre-recon-code',
    'recon': 'recon',
    'injection-vuln': 'vuln-injection',
    'xss-vuln': 'vuln-xss',
    'auth-vuln': 'vuln-auth',
    'ssrf-vuln': 'vuln-ssrf',
    'authz-vuln': 'vuln-authz',
    'injection-exploit': 'exploit-injection',
    'xss-exploit': 'exploit-xss',
    'auth-exploit': 'exploit-auth',
    'ssrf-exploit': 'exploit-ssrf',
    'authz-exploit': 'exploit-authz',
    'report': 'report-executive',
  };

  return mappings[agentName];
}

@@ -31,13 +31,12 @@ type UnlockFunction = () => void;
|
||||
* }
|
||||
* ```
|
||||
*/
|
||||
// Promise-based mutex with queue semantics - safe for parallel agents on same session
|
||||
export class SessionMutex {
|
||||
// Map of sessionId -> Promise (represents active lock)
|
||||
private locks: Map<string, Promise<void>> = new Map();
|
||||
|
||||
/**
|
||||
* Acquire lock for a session
|
||||
*/
|
||||
// Wait for existing lock, then acquire. Queue ensures FIFO ordering.
|
||||
async lock(sessionId: string): Promise<UnlockFunction> {
|
||||
if (this.locks.has(sessionId)) {
|
||||
// Wait for existing lock to be released
|
||||
|
||||
@@ -0,0 +1,73 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * File I/O Utilities
 *
 * Core utility functions for file operations including atomic writes,
 * directory creation, and JSON file handling.
 */

import fs from 'fs/promises';

/**
 * Ensure directory exists (idempotent, race-safe)
 */
export async function ensureDirectory(dirPath: string): Promise<void> {
  try {
    await fs.mkdir(dirPath, { recursive: true });
  } catch (error) {
    // Ignore EEXIST errors (race condition safe)
    if ((error as NodeJS.ErrnoException).code !== 'EEXIST') {
      throw error;
    }
  }
}

/**
 * Atomic write using temp file + rename pattern.
 * Readers see either the old file or the complete new one, never a partial write.
 */
export async function atomicWrite(filePath: string, data: object | string): Promise<void> {
  const tempPath = `${filePath}.tmp`;
  const content = typeof data === 'string' ? data : JSON.stringify(data, null, 2);

  try {
    // Write to temp file
    await fs.writeFile(tempPath, content, 'utf8');

    // Atomic rename (POSIX guarantee: atomic on same filesystem)
    await fs.rename(tempPath, filePath);
  } catch (error) {
    // Clean up temp file on failure
    try {
      await fs.unlink(tempPath);
    } catch {
      // Ignore cleanup errors
    }
    throw error;
  }
}

/**
 * Read and parse JSON file
 */
export async function readJson<T = unknown>(filePath: string): Promise<T> {
  const content = await fs.readFile(filePath, 'utf8');
  return JSON.parse(content) as T;
}

/**
 * Check if file exists
 */
export async function fileExists(filePath: string): Promise<boolean> {
  try {
    await fs.access(filePath);
    return true;
  } catch {
    return false;
  }
}
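`atomicWrite` always stages through the fixed path `${filePath}.tmp`, so two concurrent writers to the same target would share one temp file. The sketch below shows one possible variant that gives each call its own temp path. The name `atomicWriteUnique`, the UUID suffix, and the `demo` harness are assumptions for illustration, not code from this commit.

```typescript
import * as fs from 'node:fs/promises';
import * as os from 'node:os';
import * as path from 'node:path';
import { randomUUID } from 'node:crypto';

// Hypothetical variant of atomicWrite: same temp-file + rename pattern,
// but every call stages through its own uniquely named temp file.
async function atomicWriteUnique(filePath: string, data: object | string): Promise<void> {
  const tempPath = `${filePath}.${randomUUID()}.tmp`;
  const content = typeof data === 'string' ? data : JSON.stringify(data, null, 2);
  try {
    await fs.writeFile(tempPath, content, 'utf8');
    await fs.rename(tempPath, filePath);
  } catch (error) {
    // force: true makes cleanup a no-op if the temp file was never created
    await fs.rm(tempPath, { force: true });
    throw error;
  }
}

// Two writers race on the same target inside a throwaway directory.
async function demo(): Promise<{ status: string; leftovers: string[] }> {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'atomic-write-'));
  const target = path.join(dir, 'state.json');
  await Promise.all([
    atomicWriteUnique(target, { status: 'running' }),
    atomicWriteUnique(target, { status: 'completed' }),
  ]);
  const parsed = JSON.parse(await fs.readFile(target, 'utf8')) as { status: string };
  const leftovers = (await fs.readdir(dir)).filter((name) => name.endsWith('.tmp'));
  return { status: parsed.status, leftovers };
}

demo().then((result) => console.log(result));
```

Whichever writer renames last wins, but the target always holds one complete JSON document and no temp files are left behind. Neither version calls `fsync`, so durability across a power loss is outside what this pattern covers.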
@@ -0,0 +1,60 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Formatting Utilities
 *
 * Generic formatting functions for durations, timestamps, and percentages.
 */

/**
 * Format duration in milliseconds to human-readable string
 */
export function formatDuration(ms: number): string {
  if (ms < 1000) {
    return `${ms}ms`;
  }

  const seconds = ms / 1000;
  if (seconds < 60) {
    return `${seconds.toFixed(1)}s`;
  }

  const minutes = Math.floor(seconds / 60);
  const remainingSeconds = Math.floor(seconds % 60);
  return `${minutes}m ${remainingSeconds}s`;
}

/**
 * Format timestamp to ISO 8601 string
 */
export function formatTimestamp(timestamp: number = Date.now()): string {
  return new Date(timestamp).toISOString();
}

/**
 * Calculate percentage
 */
export function calculatePercentage(part: number, total: number): number {
  if (total === 0) return 0;
  return (part / total) * 100;
}

/**
 * Extract agent type from description string for display purposes
 */
export function extractAgentType(description: string): string {
  if (description.includes('Pre-recon')) {
    return 'pre-reconnaissance';
  }
  if (description.includes('Recon')) {
    return 'reconnaissance';
  }
  if (description.includes('Report')) {
    return 'report generation';
  }
  return 'analysis';
}
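`formatDuration` stops at minutes, so a run that approaches the two-hour activity timeout would print as something like `122m 5s`. If an hours bucket were wanted, a variant could look like the sketch below. `formatDurationWithHours` is a hypothetical name and is not part of this commit; the sub-minute branches are copied from the original.

```typescript
// Hypothetical extension of formatDuration: identical below one hour,
// adds an hours component above it.
function formatDurationWithHours(ms: number): string {
  if (ms < 1000) {
    return `${ms}ms`;
  }

  const totalSeconds = ms / 1000;
  if (totalSeconds < 60) {
    return `${totalSeconds.toFixed(1)}s`;
  }

  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = Math.floor(totalSeconds % 60);
  return hours > 0 ? `${hours}h ${minutes}m ${seconds}s` : `${minutes}m ${seconds}s`;
}

console.log(formatDurationWithHours(500)); // 500ms
console.log(formatDurationWithHours(1500)); // 1.5s
console.log(formatDurationWithHours(125000)); // 2m 5s
console.log(formatDurationWithHours(7325000)); // 2h 2m 5s
```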
@@ -0,0 +1,29 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Functional Programming Utilities
 *
 * Generic functional composition patterns for async operations.
 */

// eslint-disable-next-line @typescript-eslint/no-explicit-any
type PipelineFunction = (x: any) => any | Promise<any>;

/**
 * Async pipeline that passes result through a series of functions.
 * Clearer than reduce-based pipe and easier to debug.
 */
export async function asyncPipe<TResult>(
  initial: unknown,
  ...fns: PipelineFunction[]
): Promise<TResult> {
  let result = initial;
  for (const fn of fns) {
    result = await fn(result);
  }
  return result as TResult;
}
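A short usage sketch for `asyncPipe` follows. The function is copied in so the example runs on its own; the stage functions `trim`, `parseCount`, and `double` are made up for illustration. Because every stage is awaited, synchronous and asynchronous functions can be mixed freely.

```typescript
// Copy of asyncPipe from the file above so the example is self-contained.
// eslint-disable-next-line @typescript-eslint/no-explicit-any
type PipelineFunction = (x: any) => any | Promise<any>;

async function asyncPipe<TResult>(
  initial: unknown,
  ...fns: PipelineFunction[]
): Promise<TResult> {
  let result = initial;
  for (const fn of fns) {
    result = await fn(result);
  }
  return result as TResult;
}

// Hypothetical stages: two synchronous, one asynchronous.
const trim = (s: string): string => s.trim();
const parseCount = async (s: string): Promise<number> => Number.parseInt(s, 10);
const double = (n: number): number => n * 2;

asyncPipe<number>(' 21 ', trim, parseCount, double).then((value) => {
  console.log(value); // 42
});
```

The `TResult` type parameter is a caller-side assertion: nothing checks that the last stage really returns that type, so the cast at the end is only as reliable as the stages passed in.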
+171
-148
@@ -7,13 +7,76 @@
import { $ } from 'zx';
import chalk from 'chalk';

/**
 * Check if a directory is a git repository.
 * Returns true if the directory contains a .git folder or is inside a git repo.
 */
export async function isGitRepository(dir: string): Promise<boolean> {
  try {
    await $`cd ${dir} && git rev-parse --git-dir`.quiet();
    return true;
  } catch {
    return false;
  }
}

interface GitOperationResult {
  success: boolean;
  hadChanges?: boolean;
  error?: Error;
}

// Global git operations semaphore to prevent index.lock conflicts during parallel execution
/**
 * Get list of changed files from git status --porcelain output
 */
async function getChangedFiles(
  sourceDir: string,
  operationDescription: string
): Promise<string[]> {
  const status = await executeGitCommandWithRetry(
    ['git', 'status', '--porcelain'],
    sourceDir,
    operationDescription
  );
  return status.stdout
    .trim()
    .split('\n')
    .filter((line) => line.length > 0);
}

/**
 * Log a summary of changed files with truncation for long lists
 */
function logChangeSummary(
  changes: string[],
  messageWithChanges: string,
  messageWithoutChanges: string,
  color: typeof chalk.green,
  maxToShow: number = 5
): void {
  if (changes.length > 0) {
    console.log(color(messageWithChanges.replace('{count}', String(changes.length))));
    changes.slice(0, maxToShow).forEach((change) => console.log(chalk.gray(` ${change}`)));
    if (changes.length > maxToShow) {
      console.log(chalk.gray(` ... and ${changes.length - maxToShow} more files`));
    }
  } else {
    console.log(color(messageWithoutChanges));
  }
}

/**
 * Convert unknown error to GitOperationResult
 */
function toErrorResult(error: unknown): GitOperationResult {
  const errMsg = error instanceof Error ? error.message : String(error);
  return {
    success: false,
    error: error instanceof Error ? error : new Error(errMsg),
  };
}

// Serializes git operations to prevent index.lock conflicts during parallel agent execution
class GitSemaphore {
  private queue: Array<() => void> = [];
  private running: boolean = false;
@@ -41,33 +104,38 @@ class GitSemaphore {
|
||||
|
||||
const gitSemaphore = new GitSemaphore();
|
||||
|
||||
// Execute git commands with retry logic for index.lock conflicts
|
||||
export const executeGitCommandWithRetry = async (
|
||||
const GIT_LOCK_ERROR_PATTERNS = [
|
||||
'index.lock',
|
||||
'unable to lock',
|
||||
'Another git process',
|
||||
'fatal: Unable to create',
|
||||
'fatal: index file',
|
||||
];
|
||||
|
||||
function isGitLockError(errorMessage: string): boolean {
|
||||
return GIT_LOCK_ERROR_PATTERNS.some((pattern) => errorMessage.includes(pattern));
|
||||
}
|
||||
|
||||
// Retries git commands on lock conflicts with exponential backoff
|
||||
export async function executeGitCommandWithRetry(
|
||||
commandArgs: string[],
|
||||
sourceDir: string,
|
||||
description: string,
|
||||
maxRetries: number = 5
|
||||
): Promise<{ stdout: string; stderr: string }> => {
|
||||
): Promise<{ stdout: string; stderr: string }> {
|
||||
await gitSemaphore.acquire();
|
||||
|
||||
try {
|
||||
for (let attempt = 1; attempt <= maxRetries; attempt++) {
|
||||
try {
|
||||
// For arrays like ['git', 'status', '--porcelain'], execute parts separately
|
||||
const [cmd, ...args] = commandArgs;
|
||||
const result = await $`cd ${sourceDir} && ${cmd} ${args}`;
|
||||
return result;
|
||||
} catch (error) {
|
||||
const errMsg = error instanceof Error ? error.message : String(error);
|
||||
const isLockError =
|
||||
errMsg.includes('index.lock') ||
|
||||
errMsg.includes('unable to lock') ||
|
||||
errMsg.includes('Another git process') ||
|
||||
errMsg.includes('fatal: Unable to create') ||
|
||||
errMsg.includes('fatal: index file');
|
||||
|
||||
if (isLockError && attempt < maxRetries) {
|
||||
const delay = Math.pow(2, attempt - 1) * 1000; // Exponential backoff: 1s, 2s, 4s, 8s, 16s
|
||||
if (isGitLockError(errMsg) && attempt < maxRetries) {
|
||||
const delay = Math.pow(2, attempt - 1) * 1000;
|
||||
console.log(
|
||||
chalk.yellow(
|
||||
` ⚠️ Git lock conflict during ${description} (attempt ${attempt}/${maxRetries}). Retrying in ${delay}ms...`
|
||||
@@ -80,84 +148,81 @@ export const executeGitCommandWithRetry = async (
|
||||
throw error;
|
||||
}
|
||||
}
|
||||
// Should never reach here but TypeScript needs a return
|
||||
throw new Error(`Git command failed after ${maxRetries} retries`);
|
||||
} finally {
|
||||
gitSemaphore.release();
|
||||
}
|
||||
};
|
||||
}
|
||||
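The retry loop in `executeGitCommandWithRetry` computes `Math.pow(2, attempt - 1) * 1000` only while `attempt < maxRetries`. Assuming the lines elided from this hunk sleep for that delay before the next attempt, the default `maxRetries = 5` yields four waits rather than five, because the final attempt rethrows instead of backing off. The helper `gitRetryDelaysMs` below is a hypothetical illustration of that schedule, not code from the commit.

```typescript
// Hypothetical helper: the delays the retry loop would wait for, in milliseconds.
function gitRetryDelaysMs(maxRetries: number): number[] {
  const delays: number[] = [];
  // Mirrors the loop condition: a delay is only computed when attempt < maxRetries.
  for (let attempt = 1; attempt < maxRetries; attempt++) {
    delays.push(Math.pow(2, attempt - 1) * 1000);
  }
  return delays;
}

const delays = gitRetryDelaysMs(5);
const totalMs = delays.reduce((sum, d) => sum + d, 0);
console.log(delays); // [ 1000, 2000, 4000, 8000 ]
console.log(`total backoff: ${totalMs}ms`); // total backoff: 15000ms
```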
|
||||
// Pure functions for Git workspace management
|
||||
const cleanWorkspace = async (
|
||||
// Two-phase reset: hard reset (tracked files) + clean (untracked files)
|
||||
export async function rollbackGitWorkspace(
|
||||
sourceDir: string,
|
||||
reason: string = 'clean start'
|
||||
): Promise<GitOperationResult> => {
|
||||
console.log(chalk.blue(` 🧹 Cleaning workspace for ${reason}`));
|
||||
try {
|
||||
// Check for uncommitted changes
|
||||
const status = await $`cd ${sourceDir} && git status --porcelain`;
|
||||
const hasChanges = status.stdout.trim().length > 0;
|
||||
|
||||
if (hasChanges) {
|
||||
// Show what we're about to remove
|
||||
const changes = status.stdout
|
||||
.trim()
|
||||
.split('\n')
|
||||
.filter((line) => line.length > 0);
|
||||
console.log(chalk.yellow(` 🔄 Rolling back workspace for ${reason}`));
|
||||
|
||||
await $`cd ${sourceDir} && git reset --hard HEAD`;
|
||||
await $`cd ${sourceDir} && git clean -fd`;
|
||||
|
||||
console.log(
|
||||
chalk.yellow(` ✅ Rollback completed - removed ${changes.length} contaminated changes:`)
|
||||
);
|
||||
changes.slice(0, 3).forEach((change) => console.log(chalk.gray(` ${change}`)));
|
||||
if (changes.length > 3) {
|
||||
console.log(chalk.gray(` ... and ${changes.length - 3} more files`));
|
||||
}
|
||||
} else {
|
||||
console.log(chalk.blue(` ✅ Workspace already clean (no changes to remove)`));
|
||||
}
|
||||
return { success: true, hadChanges: hasChanges };
|
||||
} catch (error) {
|
||||
const errMsg = error instanceof Error ? error.message : String(error);
|
||||
console.log(chalk.yellow(` ⚠️ Workspace cleanup failed: ${errMsg}`));
|
||||
return { success: false, error: error instanceof Error ? error : new Error(errMsg) };
|
||||
reason: string = 'retry preparation'
|
||||
): Promise<GitOperationResult> {
|
||||
// Skip git operations if not a git repository
|
||||
if (!(await isGitRepository(sourceDir))) {
|
||||
console.log(chalk.gray(` ⏭️ Skipping git rollback (not a git repository)`));
|
||||
return { success: true };
|
||||
}
|
||||
};
|
||||
|
||||
export const createGitCheckpoint = async (
|
||||
console.log(chalk.yellow(` 🔄 Rolling back workspace for ${reason}`));
|
||||
try {
|
||||
const changes = await getChangedFiles(sourceDir, 'status check for rollback');
|
||||
|
||||
await executeGitCommandWithRetry(
|
||||
['git', 'reset', '--hard', 'HEAD'],
|
||||
sourceDir,
|
||||
'hard reset for rollback'
|
||||
);
|
||||
await executeGitCommandWithRetry(
|
||||
['git', 'clean', '-fd'],
|
||||
sourceDir,
|
||||
'cleaning untracked files for rollback'
|
||||
);
|
||||
|
||||
logChangeSummary(
|
||||
changes,
|
||||
' ✅ Rollback completed - removed {count} contaminated changes:',
|
||||
' ✅ Rollback completed - no changes to remove',
|
||||
chalk.yellow,
|
||||
3
|
||||
);
|
||||
return { success: true };
|
||||
} catch (error) {
|
||||
const result = toErrorResult(error);
|
||||
console.log(chalk.red(` ❌ Rollback failed after retries: ${result.error?.message}`));
|
||||
return result;
|
||||
}
|
||||
}
|
||||
|
||||
// Creates checkpoint before each attempt. First attempt preserves workspace; retries clean it.
|
||||
export async function createGitCheckpoint(
|
||||
sourceDir: string,
|
||||
description: string,
|
||||
attempt: number
|
||||
): Promise<GitOperationResult> => {
|
||||
): Promise<GitOperationResult> {
|
||||
// Skip git operations if not a git repository
|
||||
if (!(await isGitRepository(sourceDir))) {
|
||||
console.log(chalk.gray(` ⏭️ Skipping git checkpoint (not a git repository)`));
|
||||
return { success: true };
|
||||
}
|
||||
|
||||
console.log(chalk.blue(` 📍 Creating checkpoint for ${description} (attempt ${attempt})`));
|
||||
try {
|
||||
// Only clean workspace on retry attempts (attempt > 1), not on first attempts
|
||||
// This preserves deliverables between agents while still cleaning on actual retries
|
||||
// First attempt: preserve existing deliverables. Retries: clean workspace to prevent pollution
|
||||
if (attempt > 1) {
|
||||
const cleanResult = await cleanWorkspace(sourceDir, `${description} (retry cleanup)`);
|
||||
const cleanResult = await rollbackGitWorkspace(sourceDir, `${description} (retry cleanup)`);
|
||||
if (!cleanResult.success) {
|
||||
const errMsg = cleanResult.error?.message || 'Unknown error';
|
||||
console.log(
|
||||
chalk.yellow(` ⚠️ Workspace cleanup failed, continuing anyway: ${errMsg}`)
|
||||
chalk.yellow(` ⚠️ Workspace cleanup failed, continuing anyway: ${cleanResult.error?.message}`)
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
// Check for uncommitted changes with retry logic
|
||||
const status = await executeGitCommandWithRetry(
|
||||
['git', 'status', '--porcelain'],
|
||||
sourceDir,
|
||||
'status check'
|
||||
);
|
||||
const hasChanges = status.stdout.trim().length > 0;
|
||||
const changes = await getChangedFiles(sourceDir, 'status check');
|
||||
const hasChanges = changes.length > 0;
|
||||
|
||||
// Stage changes with retry logic
|
||||
await executeGitCommandWithRetry(['git', 'add', '-A'], sourceDir, 'staging changes');
|
||||
|
||||
// Create commit with retry logic
|
||||
await executeGitCommandWithRetry(
|
||||
['git', 'commit', '-m', `📍 Checkpoint: ${description} (attempt ${attempt})`, '--allow-empty'],
|
||||
sourceDir,
|
||||
@@ -171,106 +236,64 @@ export const createGitCheckpoint = async (
|
||||
}
|
||||
return { success: true };
|
||||
} catch (error) {
|
||||
const errMsg = error instanceof Error ? error.message : String(error);
|
||||
console.log(chalk.yellow(` ⚠️ Checkpoint creation failed after retries: ${errMsg}`));
|
||||
return { success: false, error: error instanceof Error ? error : new Error(errMsg) };
|
||||
const result = toErrorResult(error);
|
||||
console.log(chalk.yellow(` ⚠️ Checkpoint creation failed after retries: ${result.error?.message}`));
|
||||
return result;
|
||||
}
|
||||
};
|
||||
}
|
||||
|
||||
export const commitGitSuccess = async (
|
||||
export async function commitGitSuccess(
|
||||
sourceDir: string,
|
||||
description: string
|
||||
): Promise<GitOperationResult> => {
|
||||
): Promise<GitOperationResult> {
|
||||
// Skip git operations if not a git repository
|
||||
if (!(await isGitRepository(sourceDir))) {
|
||||
console.log(chalk.gray(` ⏭️ Skipping git commit (not a git repository)`));
|
||||
return { success: true };
|
||||
}
|
||||
|
||||
console.log(chalk.green(` 💾 Committing successful results for ${description}`));
|
||||
try {
|
||||
// Check what we're about to commit with retry logic
|
||||
const status = await executeGitCommandWithRetry(
|
||||
['git', 'status', '--porcelain'],
|
||||
sourceDir,
|
||||
'status check for success commit'
|
||||
);
|
||||
const changes = status.stdout
|
||||
.trim()
|
||||
.split('\n')
|
||||
.filter((line) => line.length > 0);
|
||||
const changes = await getChangedFiles(sourceDir, 'status check for success commit');
|
||||
|
||||
// Stage changes with retry logic
|
||||
await executeGitCommandWithRetry(
|
||||
['git', 'add', '-A'],
|
||||
sourceDir,
|
||||
'staging changes for success commit'
|
||||
);
|
||||
|
||||
// Create success commit with retry logic
|
||||
await executeGitCommandWithRetry(
|
||||
['git', 'commit', '-m', `✅ ${description}: completed successfully`, '--allow-empty'],
|
||||
sourceDir,
|
||||
'creating success commit'
|
||||
);
|
||||
|
||||
if (changes.length > 0) {
|
||||
console.log(chalk.green(` ✅ Success commit created with ${changes.length} file changes:`));
|
||||
changes.slice(0, 5).forEach((change) => console.log(chalk.gray(` ${change}`)));
|
||||
if (changes.length > 5) {
|
||||
console.log(chalk.gray(` ... and ${changes.length - 5} more files`));
|
||||
}
|
||||
} else {
|
||||
console.log(chalk.green(` ✅ Empty success commit created (agent made no file changes)`));
|
||||
}
|
||||
logChangeSummary(
|
||||
changes,
|
||||
' ✅ Success commit created with {count} file changes:',
|
||||
' ✅ Empty success commit created (agent made no file changes)',
|
||||
chalk.green,
|
||||
5
|
||||
);
|
||||
return { success: true };
|
||||
   } catch (error) {
-    const errMsg = error instanceof Error ? error.message : String(error);
-    console.log(chalk.yellow(` ⚠️ Success commit failed after retries: ${errMsg}`));
-    return { success: false, error: error instanceof Error ? error : new Error(errMsg) };
+    const result = toErrorResult(error);
+    console.log(chalk.yellow(` ⚠️ Success commit failed after retries: ${result.error?.message}`));
+    return result;
   }
-};
+}

-export const rollbackGitWorkspace = async (
-  sourceDir: string,
-  reason: string = 'retry preparation'
-): Promise<GitOperationResult> => {
-  console.log(chalk.yellow(` 🔄 Rolling back workspace for ${reason}`));
+/**
+ * Get current git commit hash.
+ * Returns null if not a git repository.
+ */
+export async function getGitCommitHash(sourceDir: string): Promise<string | null> {
+  if (!(await isGitRepository(sourceDir))) {
+    return null;
+  }
   try {
-    // Show what we're about to remove with retry logic
-    const status = await executeGitCommandWithRetry(
-      ['git', 'status', '--porcelain'],
-      sourceDir,
-      'status check for rollback'
-    );
-    const changes = status.stdout
-      .trim()
-      .split('\n')
-      .filter((line) => line.length > 0);
-
-    // Reset to HEAD with retry logic
-    await executeGitCommandWithRetry(
-      ['git', 'reset', '--hard', 'HEAD'],
-      sourceDir,
-      'hard reset for rollback'
-    );
-
-    // Clean untracked files with retry logic
-    await executeGitCommandWithRetry(
-      ['git', 'clean', '-fd'],
-      sourceDir,
-      'cleaning untracked files for rollback'
-    );
-
-    if (changes.length > 0) {
-      console.log(
-        chalk.yellow(` ✅ Rollback completed - removed ${changes.length} contaminated changes:`)
-      );
-      changes.slice(0, 3).forEach((change) => console.log(chalk.gray(` ${change}`)));
-      if (changes.length > 3) {
-        console.log(chalk.gray(` ... and ${changes.length - 3} more files`));
-      }
-    } else {
-      console.log(chalk.yellow(` ✅ Rollback completed - no changes to remove`));
-    }
-    return { success: true };
-  } catch (error) {
-    const errMsg = error instanceof Error ? error.message : String(error);
-    console.log(chalk.red(` ❌ Rollback failed after retries: ${errMsg}`));
-    return { success: false, error: error instanceof Error ? error : new Error(errMsg) };
+    const result = await $`cd ${sourceDir} && git rev-parse HEAD`;
+    return result.stdout.trim();
+  } catch {
+    return null;
   }
-};
+}

@@ -5,7 +5,7 @@
 // as published by the Free Software Foundation.

 import chalk from 'chalk';
-import { formatDuration } from '../audit/utils.js';
+import { formatDuration } from './formatting.js';

 // Timing utilities

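Note on the git-manager hunks above: they replace inline status parsing, summary logging, and error wrapping with calls to `getChangedFiles`, `logChangeSummary`, and `toErrorResult`, but the bodies of those helpers are not part of this excerpt. The following is a minimal sketch of what such helpers might look like, written only from the call sites and the removed inline code. The names `parsePorcelainOutput` and `formatChangeSummary`, the local `GitOperationResult` shape, and the two-space indent on listed files are assumptions for illustration, and chalk coloring and the actual git invocation are left out so the logic stays pure.

```typescript
// Hypothetical sketch: none of these bodies come from the diff itself.
// Only the call sites (names and argument order) are taken from it.

// Assumed shape, inferred from `{ success: true }` and `{ success: false, error }`.
interface GitOperationResult {
  success: boolean;
  error?: Error;
}

// Split `git status --porcelain` output into non-empty lines,
// mirroring the inline trim/split/filter chain that the hunk removes.
function parsePorcelainOutput(stdout: string): string[] {
  return stdout
    .trim()
    .split('\n')
    .filter((line) => line.length > 0);
}

// Build the summary lines instead of printing them, so the logic is testable.
// `{count}` in the first template is replaced with the number of changes.
function formatChangeSummary(
  changes: string[],
  messageWithChanges: string,
  messageWithoutChanges: string,
  limit: number
): string[] {
  if (changes.length === 0) {
    return [messageWithoutChanges];
  }
  const lines = [messageWithChanges.replace('{count}', String(changes.length))];
  for (const change of changes.slice(0, limit)) {
    lines.push(`  ${change}`);
  }
  if (changes.length > limit) {
    lines.push(`  ... and ${changes.length - limit} more files`);
  }
  return lines;
}

// Normalize any thrown value into a failed GitOperationResult.
function toErrorResult(error: unknown): GitOperationResult {
  const err = error instanceof Error ? error : new Error(String(error));
  return { success: false, error: err };
}
```

A caller could print each returned line through a color function such as `chalk.green`, which would roughly correspond to the fourth argument passed to `logChangeSummary` in the diff.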