mirror of
https://github.com/FuzzingLabs/fuzzforge_ai.git
synced 2026-07-05 13:37:48 +02:00
CI/CD Integration with Ephemeral Deployment Model (#14)
* feat: Complete migration from Prefect to Temporal BREAKING CHANGE: Replaces Prefect workflow orchestration with Temporal ## Major Changes - Replace Prefect with Temporal for workflow orchestration - Implement vertical worker architecture (rust, android) - Replace Docker registry with MinIO for unified storage - Refactor activities to be co-located with workflows - Update all API endpoints for Temporal compatibility ## Infrastructure - New: docker-compose.temporal.yaml (Temporal + MinIO + workers) - New: workers/ directory with rust and android vertical workers - New: backend/src/temporal/ (manager, discovery) - New: backend/src/storage/ (S3-cached storage with MinIO) - New: backend/toolbox/common/ (shared storage activities) - Deleted: docker-compose.yaml (old Prefect setup) - Deleted: backend/src/core/prefect_manager.py - Deleted: backend/src/services/prefect_stats_monitor.py - Deleted: Docker registry and insecure-registries requirement ## Workflows - Migrated: security_assessment workflow to Temporal - New: rust_test workflow (example/test workflow) - Deleted: secret_detection_scan (Prefect-based, to be reimplemented) - Activities now co-located with workflows for independent testing ## API Changes - Updated: backend/src/api/workflows.py (Temporal submission) - Updated: backend/src/api/runs.py (Temporal status/results) - Updated: backend/src/main.py (727 lines, TemporalManager integration) - Updated: All 16 MCP tools to use TemporalManager ## Testing - ✅ All services healthy (Temporal, PostgreSQL, MinIO, workers, backend) - ✅ All API endpoints functional - ✅ End-to-end workflow test passed (72 findings from vulnerable_app) - ✅ MinIO storage integration working (target upload/download, results) - ✅ Worker activity discovery working (6 activities registered) - ✅ Tarball extraction working - ✅ SARIF report generation working ## Documentation - ARCHITECTURE.md: Complete Temporal architecture documentation - QUICKSTART_TEMPORAL.md: Getting started guide - 
MIGRATION_DECISION.md: Why we chose Temporal over Prefect - IMPLEMENTATION_STATUS.md: Migration progress tracking - workers/README.md: Worker development guide ## Dependencies - Added: temporalio>=1.6.0 - Added: boto3>=1.34.0 (MinIO S3 client) - Removed: prefect>=3.4.18 * feat: Add Python fuzzing vertical with Atheris integration This commit implements a complete Python fuzzing workflow using Atheris: ## Python Worker (workers/python/) - Dockerfile with Python 3.11, Atheris, and build tools - Generic worker.py for dynamic workflow discovery - requirements.txt with temporalio, boto3, atheris dependencies - Added to docker-compose.temporal.yaml with dedicated cache volume ## AtherisFuzzer Module (backend/toolbox/modules/fuzzer/) - Reusable module extending BaseModule - Auto-discovers fuzz targets (fuzz_*.py, *_fuzz.py, fuzz_target.py) - Recursive search to find targets in nested directories - Dynamically loads TestOneInput() function - Configurable max_iterations and timeout - Real-time stats callback support for live monitoring - Returns findings as ModuleFinding objects ## Atheris Fuzzing Workflow (backend/toolbox/workflows/atheris_fuzzing/) - Temporal workflow for orchestrating fuzzing - Downloads user code from MinIO - Executes AtherisFuzzer module - Uploads results to MinIO - Cleans up cache after execution - metadata.yaml with vertical: python for routing ## Test Project (test_projects/python_fuzz_waterfall/) - Demonstrates stateful waterfall vulnerability - main.py with check_secret() that leaks progress - fuzz_target.py with Atheris TestOneInput() harness - Complete README with usage instructions ## Backend Fixes - Fixed parameter merging in REST API endpoints (workflows.py) - Changed workflow parameter passing from positional args to kwargs (manager.py) - Default parameters now properly merged with user parameters ## Testing ✅ Worker discovered AtherisFuzzingWorkflow ✅ Workflow executed end-to-end successfully ✅ Fuzz target auto-discovered in nested 
directories ✅ Atheris ran 100,000 iterations ✅ Results uploaded and cache cleaned * chore: Complete Temporal migration with updated CLI/SDK/docs This commit includes all remaining Temporal migration changes: ## CLI Updates (cli/) - Updated workflow execution commands for Temporal - Enhanced error handling and exceptions - Updated dependencies in uv.lock ## SDK Updates (sdk/) - Client methods updated for Temporal workflows - Updated models for new workflow execution - Updated dependencies in uv.lock ## Documentation Updates (docs/) - Architecture documentation for Temporal - Workflow concept documentation - Resource management documentation (new) - Debugging guide (new) - Updated tutorials and how-to guides - Troubleshooting updates ## README Updates - Main README with Temporal instructions - Backend README - CLI README - SDK README ## Other - Updated IMPLEMENTATION_STATUS.md - Removed old vulnerable_app.tar.gz These changes complete the Temporal migration and ensure the CLI/SDK work correctly with the new backend. * fix: Use positional args instead of kwargs for Temporal workflows The Temporal Python SDK's start_workflow() method doesn't accept a 'kwargs' parameter. Workflows must receive parameters as positional arguments via the 'args' parameter. Changed from: args=workflow_args # Positional arguments This fixes the error: TypeError: Client.start_workflow() got an unexpected keyword argument 'kwargs' Workflows now correctly receive parameters in order: - security_assessment: [target_id, scanner_config, analyzer_config, reporter_config] - atheris_fuzzing: [target_id, target_file, max_iterations, timeout_seconds] - rust_test: [target_id, test_message] * fix: Filter metadata-only parameters from workflow arguments SecurityAssessmentWorkflow was receiving 7 arguments instead of 2-5. The issue was that target_path and volume_mode from default_parameters were being passed to the workflow, when they should only be used by the system for configuration. 
Now filters out metadata-only parameters (target_path, volume_mode) before passing arguments to workflow execution. * refactor: Remove Prefect leftovers and volume mounting legacy Complete cleanup of Prefect migration artifacts: Backend: - Delete registry.py and workflow_discovery.py (Prefect-specific files) - Remove Docker validation from setup.py (no longer needed) - Remove ResourceLimits and VolumeMount models - Remove target_path and volume_mode from WorkflowSubmission - Remove supported_volume_modes from API and discovery - Clean up metadata.yaml files (remove volume/path fields) - Simplify parameter filtering in manager.py SDK: - Remove volume_mode parameter from client methods - Remove ResourceLimits and VolumeMount models - Remove Prefect error patterns from docker_logs.py - Clean up WorkflowSubmission and WorkflowMetadata models CLI: - Remove Volume Modes display from workflow info All removed features are Prefect-specific or Docker volume mounting artifacts. Temporal workflows use MinIO storage exclusively. * feat: Add comprehensive test suite and benchmark infrastructure - Add 68 unit tests for fuzzer, scanner, and analyzer modules - Implement pytest-based test infrastructure with fixtures - Add 6 performance benchmarks with category-specific thresholds - Configure GitHub Actions for automated testing and benchmarking - Add test and benchmark documentation Test coverage: - AtherisFuzzer: 8 tests - CargoFuzzer: 14 tests - FileScanner: 22 tests - SecurityAnalyzer: 24 tests All tests passing (68/68) All benchmarks passing (6/6) * fix: Resolve all ruff linting violations across codebase Fixed 27 ruff violations in 12 files: - Removed unused imports (Depends, Dict, Any, Optional, etc.) - Fixed undefined workflow_info variable in workflows.py - Removed dead code with undefined variables in atheris_fuzzer.py - Changed f-string to regular string where no placeholders used All files now pass ruff checks for CI/CD compliance. 
* fix: Configure CI for unit tests only - Renamed docker-compose.temporal.yaml → docker-compose.yml for CI compatibility - Commented out integration-tests job (no integration tests yet) - Updated test-summary to only depend on lint and unit-tests CI will now run successfully with 68 unit tests. Integration tests can be added later. * feat: Add CI/CD integration with ephemeral deployment model Implements comprehensive CI/CD support for FuzzForge with on-demand worker management: **Worker Management (v0.7.0)** - Add WorkerManager for automatic worker lifecycle control - Auto-start workers from stopped state when workflows execute - Auto-stop workers after workflow completion - Health checks and startup timeout handling (90s default) **CI/CD Features** - `--fail-on` flag: Fail builds based on SARIF severity levels (error/warning/note/info) - `--export-sarif` flag: Export findings in SARIF 2.1.0 format - `--auto-start`/`--auto-stop` flags: Control worker lifecycle - Exit code propagation: Returns 1 on blocking findings, 0 on success **Exit Code Fix** - Add `except typer.Exit: raise` handlers at 3 critical locations - Move worker cleanup to finally block for guaranteed execution - Exit codes now propagate correctly even when build fails **CI Scripts & Examples** - ci-start.sh: Start FuzzForge services with health checks - ci-stop.sh: Clean shutdown with volume preservation option - GitHub Actions workflow example (security-scan.yml) - GitLab CI pipeline example (.gitlab-ci.example.yml) - docker-compose.ci.yml: CI-optimized compose file with profiles **OSS-Fuzz Integration** - New ossfuzz_campaign workflow for running OSS-Fuzz projects - OSS-Fuzz worker with Docker-in-Docker support - Configurable campaign duration and project selection **Documentation** - Comprehensive CI/CD integration guide (docs/how-to/cicd-integration.md) - Updated architecture docs with worker lifecycle details - Updated workspace isolation documentation - CLI README with worker management examples 
**SDK Enhancements** - Add get_workflow_worker_info() endpoint - Worker vertical metadata in workflow responses **Testing** - All workflows tested: security_assessment, atheris_fuzzing, secret_detection, cargo_fuzzing - All monitoring commands tested: stats, crashes, status, finding - Full CI pipeline simulation verified - Exit codes verified for success/failure scenarios Ephemeral CI/CD model: ~3-4GB RAM, ~60-90s startup, runs entirely in CI containers. * fix: Resolve ruff linting violations in CI/CD code - Remove unused variables (run_id, defaults, result) - Remove unused imports - Fix f-string without placeholders All CI/CD integration files now pass ruff checks.
@@ -0,0 +1,369 @@
"""
FuzzForge Common Storage Activities

Activities for interacting with MinIO storage:
- get_target_activity: Download target from MinIO to local cache
- cleanup_cache_activity: Remove target from local cache
- upload_results_activity: Upload workflow results to MinIO
"""

import logging
import os
import shutil
from pathlib import Path

import boto3
from botocore.exceptions import ClientError
from temporalio import activity

# Configure logging
logger = logging.getLogger(__name__)

# Initialize S3 client (MinIO)
s3_client = boto3.client(
    's3',
    endpoint_url=os.getenv('S3_ENDPOINT', 'http://minio:9000'),
    aws_access_key_id=os.getenv('S3_ACCESS_KEY', 'fuzzforge'),
    aws_secret_access_key=os.getenv('S3_SECRET_KEY', 'fuzzforge123'),
    region_name=os.getenv('S3_REGION', 'us-east-1'),
    use_ssl=os.getenv('S3_USE_SSL', 'false').lower() == 'true'
)

# Configuration
S3_BUCKET = os.getenv('S3_BUCKET', 'targets')
CACHE_DIR = Path(os.getenv('CACHE_DIR', '/cache'))
CACHE_MAX_SIZE_GB = int(os.getenv('CACHE_MAX_SIZE', '10').rstrip('GB'))


@activity.defn(name="get_target")
async def get_target_activity(
    target_id: str,
    run_id: str = None,
    workspace_isolation: str = "isolated"
) -> str:
    """
    Download target from MinIO to local cache.

    Args:
        target_id: UUID of the uploaded target
        run_id: Workflow run ID for isolation (required for isolated and copy-on-write modes)
        workspace_isolation: Isolation mode - "isolated" (default), "shared", or "copy-on-write"

    Returns:
        Local path to the cached target workspace

    Raises:
        FileNotFoundError: If target doesn't exist in MinIO
        ValueError: If the isolation mode is invalid, or run_id is missing
            for isolated or copy-on-write mode
        Exception: For other download errors
    """
    logger.info(
        f"Activity: get_target (target_id={target_id}, run_id={run_id}, "
        f"isolation={workspace_isolation})"
    )

    # Validate isolation mode
    valid_modes = ["isolated", "shared", "copy-on-write"]
    if workspace_isolation not in valid_modes:
        raise ValueError(
            f"Invalid workspace_isolation mode: {workspace_isolation}. "
            f"Must be one of: {valid_modes}"
        )

    # Require run_id for isolated and copy-on-write modes
    if workspace_isolation in ["isolated", "copy-on-write"] and not run_id:
        raise ValueError(
            f"run_id is required for workspace_isolation='{workspace_isolation}'"
        )

    # Define cache paths based on isolation mode
    if workspace_isolation == "isolated":
        # Each run gets its own isolated workspace
        cache_path = CACHE_DIR / target_id / run_id
        cached_file = cache_path / "target"
    elif workspace_isolation == "shared":
        # All runs share the same workspace (legacy behavior)
        cache_path = CACHE_DIR / target_id
        cached_file = cache_path / "target"
    else:  # copy-on-write
        # Shared download, run-specific copy
        shared_cache_path = CACHE_DIR / target_id / "shared"
        cache_path = CACHE_DIR / target_id / run_id
        cached_file = shared_cache_path / "target"

    # Handle copy-on-write mode
    if workspace_isolation == "copy-on-write":
        # Check if shared cache exists
        if cached_file.exists():
            logger.info(f"Copy-on-write: Shared cache HIT for {target_id}")

            # Copy shared workspace to run-specific path
            shared_workspace = shared_cache_path / "workspace"
            run_workspace = cache_path / "workspace"

            if shared_workspace.exists():
                logger.info(f"Copying workspace to isolated run path: {run_workspace}")
                cache_path.mkdir(parents=True, exist_ok=True)
                shutil.copytree(shared_workspace, run_workspace)
                return str(run_workspace)
            else:
                # Shared file exists but not extracted (non-tarball)
                run_file = cache_path / "target"
                cache_path.mkdir(parents=True, exist_ok=True)
                shutil.copy2(cached_file, run_file)
                return str(run_file)
        # If shared cache doesn't exist, fall through to download

    # Check if target is already cached (isolated or shared mode)
    elif cached_file.exists():
        # Update access time for LRU
        cached_file.touch()
        logger.info(f"Cache HIT: {target_id} (mode: {workspace_isolation})")

        # Check if workspace directory exists (extracted tarball)
        workspace_dir = cache_path / "workspace"
        if workspace_dir.exists() and workspace_dir.is_dir():
            logger.info(f"Returning cached workspace: {workspace_dir}")
            return str(workspace_dir)
        else:
            # Return cached file (not a tarball)
            return str(cached_file)

    # Cache miss - download from MinIO
    logger.info(
        f"Cache MISS: {target_id} (mode: {workspace_isolation}), "
        f"downloading from MinIO..."
    )

    try:
        # Create the cache directory and the download directory. They are the
        # same path except in copy-on-write mode, where the download goes to
        # the shared directory and must exist before download_file() runs.
        cache_path.mkdir(parents=True, exist_ok=True)
        download_dir = cached_file.parent
        download_dir.mkdir(parents=True, exist_ok=True)

        # Download from S3/MinIO
        s3_key = f'{target_id}/target'
        logger.info(f"Downloading s3://{S3_BUCKET}/{s3_key} -> {cached_file}")

        s3_client.download_file(
            Bucket=S3_BUCKET,
            Key=s3_key,
            Filename=str(cached_file)
        )

        # Verify file was downloaded
        if not cached_file.exists():
            raise FileNotFoundError(f"Downloaded file not found: {cached_file}")

        file_size = cached_file.stat().st_size
        logger.info(
            f"✓ Downloaded target {target_id} "
            f"({file_size / 1024 / 1024:.2f} MB)"
        )

        # Extract tarball if it's an archive. The archive is extracted next to
        # the downloaded file, so copy-on-write mode extracts into the shared
        # directory and the run-specific copy below has a distinct destination.
        import tarfile
        workspace_dir = download_dir / "workspace"

        if tarfile.is_tarfile(str(cached_file)):
            logger.info(f"Extracting tarball to {workspace_dir}...")
            workspace_dir.mkdir(parents=True, exist_ok=True)

            # NOTE: extractall() trusts the member paths stored in the archive
            with tarfile.open(str(cached_file), 'r:*') as tar:
                tar.extractall(path=workspace_dir)

            logger.info(f"✓ Extracted tarball to {workspace_dir}")

            # For copy-on-write mode, copy to run-specific path
            if workspace_isolation == "copy-on-write":
                run_cache_path = CACHE_DIR / target_id / run_id
                run_workspace = run_cache_path / "workspace"
                logger.info(f"Copy-on-write: Copying to {run_workspace}")
                run_cache_path.mkdir(parents=True, exist_ok=True)
                shutil.copytree(workspace_dir, run_workspace)
                return str(run_workspace)

            return str(workspace_dir)
        else:
            # Not a tarball
            if workspace_isolation == "copy-on-write":
                # Copy file to run-specific path
                run_cache_path = CACHE_DIR / target_id / run_id
                run_file = run_cache_path / "target"
                logger.info(f"Copy-on-write: Copying file to {run_file}")
                run_cache_path.mkdir(parents=True, exist_ok=True)
                shutil.copy2(cached_file, run_file)
                return str(run_file)

            return str(cached_file)

    except ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code in ('404', 'NoSuchKey'):
            logger.error(f"Target not found in MinIO: {target_id}")
            raise FileNotFoundError(f"Target {target_id} not found in storage") from e
        else:
            logger.error(f"S3/MinIO error downloading target: {e}", exc_info=True)
            raise

    except Exception as e:
        logger.error(f"Failed to download target {target_id}: {e}", exc_info=True)
        # Cleanup partial download
        if cache_path.exists():
            shutil.rmtree(cache_path, ignore_errors=True)
        raise


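The three isolation modes in `get_target_activity` differ only in where the download lands and which directory belongs to the run. The sketch below restates that layout as a pure function so it can be checked without MinIO. `resolve_cache_paths` is a hypothetical helper introduced here for illustration; it is not part of the commit.

```python
from pathlib import Path
from typing import Optional, Tuple

VALID_MODES = ("isolated", "shared", "copy-on-write")


def resolve_cache_paths(
    cache_dir: Path, target_id: str, run_id: Optional[str], mode: str
) -> Tuple[Path, Path]:
    """Return (download_file, run_dir) for one isolation mode.

    Hypothetical helper that mirrors the path logic of get_target_activity.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"Invalid workspace_isolation mode: {mode}")
    if mode != "shared" and not run_id:
        raise ValueError(f"run_id is required for workspace_isolation='{mode}'")

    target_root = cache_dir / target_id
    if mode == "shared":
        # Every run reads and writes the same directory.
        return target_root / "target", target_root

    run_dir = target_root / run_id
    if mode == "isolated":
        # Download and working directory are both private to the run.
        return run_dir / "target", run_dir

    # copy-on-write: download once into "shared", then copy into run_dir.
    return target_root / "shared" / "target", run_dir


if __name__ == "__main__":
    for mode in VALID_MODES:
        print(mode, resolve_cache_paths(Path("/cache"), "t1", "r1", mode))
```

In copy-on-write mode the download directory and the run directory are different paths, which is why the download directory has to be created separately before the file is fetched.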
@activity.defn(name="cleanup_cache")
async def cleanup_cache_activity(
    target_path: str,
    workspace_isolation: str = "isolated"
) -> None:
    """
    Remove target from local cache after workflow completes.

    Args:
        target_path: Path to the cached target workspace (from get_target_activity)
        workspace_isolation: Isolation mode used - determines cleanup scope

    Notes:
        - "isolated" mode: Removes the entire run-specific directory
        - "copy-on-write" mode: Removes run-specific directory, keeps shared cache
        - "shared" mode: Does NOT remove cache (shared across runs)
    """
    logger.info(
        f"Activity: cleanup_cache (path={target_path}, "
        f"isolation={workspace_isolation})"
    )

    try:
        target = Path(target_path)

        # For shared mode, don't clean up (cache is shared across runs)
        if workspace_isolation == "shared":
            logger.info(
                f"Skipping cleanup for shared workspace (mode={workspace_isolation})"
            )
            return

        # For isolated and copy-on-write modes, clean up run-specific directory
        # Navigate up to the run-specific directory: /cache/{target_id}/{run_id}/
        # The path is either .../workspace or the downloaded file; in both
        # cases the run directory is one level up.
        run_dir = target.parent

        # Validate it's in cache and looks like a run-specific path
        if run_dir.exists() and run_dir.is_relative_to(CACHE_DIR):
            # Check if parent is target_id directory (validate structure)
            target_id_dir = run_dir.parent
            if target_id_dir.is_relative_to(CACHE_DIR):
                shutil.rmtree(run_dir)
                logger.info(
                    f"✓ Cleaned up run-specific directory: {run_dir} "
                    f"(mode={workspace_isolation})"
                )
            else:
                logger.warning(
                    f"Unexpected cache structure, skipping cleanup: {run_dir}"
                )
        else:
            logger.warning(
                f"Cache path not in CACHE_DIR or doesn't exist: {run_dir}"
            )

    except Exception as e:
        # Don't fail workflow if cleanup fails
        logger.error(
            f"Failed to cleanup cache {target_path}: {e}",
            exc_info=True
        )


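The structure check in `cleanup_cache_activity` accepts any parent directory inside `CACHE_DIR`. A path such as `/cache/<target_id>/target`, passed with a non-shared mode, would therefore resolve to the whole target directory. The sketch below shows one stricter variant that only accepts the `<cache>/<target_id>/<run_id>/<name>` shape. `run_dir_for_cleanup` is a hypothetical helper, not code from the commit, and the two-component rule is an assumption based on the layout described in the comments above.

```python
from pathlib import Path
from typing import Optional


def run_dir_for_cleanup(target_path: str, cache_dir: Path) -> Optional[Path]:
    """Return the run directory that may be deleted, or None to skip cleanup.

    Hypothetical helper: accepts only <cache_dir>/<target_id>/<run_id>/<name>.
    """
    run_dir = Path(target_path).parent
    try:
        relative = run_dir.relative_to(cache_dir)
    except ValueError:
        # The path is outside the cache directory.
        return None

    # Reject traversal segments and anything that is not exactly
    # two levels deep (target_id, then run_id).
    if ".." in relative.parts or len(relative.parts) != 2:
        return None
    return run_dir


if __name__ == "__main__":
    print(run_dir_for_cleanup("/cache/t1/r1/workspace", Path("/cache")))
    print(run_dir_for_cleanup("/cache/t1/target", Path("/cache")))
```

Returning `None` instead of raising keeps the behaviour of the activity, which logs a warning and never fails the workflow because of cleanup.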
@activity.defn(name="upload_results")
async def upload_results_activity(
    workflow_id: str,
    results: dict,
    results_format: str = "json"
) -> str:
    """
    Upload workflow results to MinIO.

    Args:
        workflow_id: Workflow execution ID
        results: Results dictionary to upload
        results_format: Format for results (json, sarif, etc.)

    Returns:
        S3 URL to the uploaded results
    """
    logger.info(
        f"Activity: upload_results "
        f"(workflow_id={workflow_id}, format={results_format})"
    )

    try:
        import json

        # Prepare results content (every format is serialized as JSON)
        content = json.dumps(results, indent=2).encode('utf-8')
        if results_format == "sarif":
            content_type = 'application/sarif+json'
            file_ext = 'sarif'
        else:
            # "json" and any unrecognized format default to JSON
            content_type = 'application/json'
            file_ext = 'json'

        # Upload to MinIO
        s3_key = f'{workflow_id}/results.{file_ext}'
        logger.info(f"Uploading results to s3://results/{s3_key}")

        s3_client.put_object(
            Bucket='results',
            Key=s3_key,
            Body=content,
            ContentType=content_type,
            Metadata={
                'workflow_id': workflow_id,
                'format': results_format
            }
        )

        # Construct S3 URL
        s3_endpoint = os.getenv('S3_ENDPOINT', 'http://minio:9000')
        s3_url = f"{s3_endpoint}/results/{s3_key}"

        logger.info(f"✓ Uploaded results: {s3_url}")
        return s3_url

    except Exception as e:
        logger.error(
            f"Failed to upload results for workflow {workflow_id}: {e}",
            exc_info=True
        )
        raise


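`upload_results_activity` picks a content type and file extension from the format name and falls back to JSON for anything it does not recognize. A lookup table is one way to express the same rule, sketched below without any S3 call. `RESULT_FORMATS` and `encode_results` are hypothetical names used only for this illustration.

```python
import json
from typing import Dict, Tuple

# format name -> (content type, file extension)
RESULT_FORMATS: Dict[str, Tuple[str, str]] = {
    "json": ("application/json", "json"),
    "sarif": ("application/sarif+json", "sarif"),
}


def encode_results(
    workflow_id: str, results: dict, results_format: str = "json"
) -> Tuple[str, bytes, str]:
    """Return (s3_key, body, content_type) for a results upload.

    Hypothetical helper: unknown formats fall back to plain JSON,
    as in upload_results_activity.
    """
    content_type, file_ext = RESULT_FORMATS.get(
        results_format, RESULT_FORMATS["json"]
    )
    body = json.dumps(results, indent=2).encode("utf-8")
    return f"{workflow_id}/results.{file_ext}", body, content_type


if __name__ == "__main__":
    key, body, content_type = encode_results("wf-1", {"findings": []}, "sarif")
    print(key, content_type, len(body))
```

Adding a format then means adding one table entry rather than another branch.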
def _check_cache_size():
    """Check total cache size and log a warning if it exceeds the limit."""
    try:
        total_size = 0
        for item in CACHE_DIR.rglob('*'):
            if item.is_file():
                total_size += item.stat().st_size

        total_size_gb = total_size / (1024 ** 3)
        if total_size_gb > CACHE_MAX_SIZE_GB:
            logger.warning(
                f"Cache size ({total_size_gb:.2f} GB) exceeds "
                f"limit ({CACHE_MAX_SIZE_GB} GB). Consider cleanup."
            )

    except Exception as e:
        logger.error(f"Failed to check cache size: {e}")
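The size check walks the cache directory, sums the file sizes, and compares the total in GiB against the configured limit. The sketch below isolates that arithmetic so it can be run against a temporary directory. `cache_size_bytes` and `exceeds_limit` are hypothetical helpers, not functions from the commit.

```python
import tempfile
from pathlib import Path


def cache_size_bytes(cache_dir: Path) -> int:
    """Sum the sizes of all regular files below cache_dir."""
    return sum(p.stat().st_size for p in cache_dir.rglob("*") if p.is_file())


def exceeds_limit(cache_dir: Path, limit_gb: float) -> bool:
    """Compare the cache size in GiB (1024 ** 3 bytes) against limit_gb."""
    return cache_size_bytes(cache_dir) / (1024 ** 3) > limit_gb


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        run_dir = root / "t1" / "r1"
        run_dir.mkdir(parents=True)
        (run_dir / "target").write_bytes(b"x" * 2048)
        # prints: 2048 False
        print(cache_size_bytes(root), exceeds_limit(root, limit_gb=10))
```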
@@ -16,7 +16,7 @@ Security Analyzer Module - Analyzes code for security vulnerabilities
 import logging
 import re
 from pathlib import Path
-from typing import Dict, Any, List, Optional
+from typing import Dict, Any, List
 
 try:
     from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
@@ -17,7 +17,6 @@ from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Any, List, Optional
from pydantic import BaseModel, Field
from datetime import datetime
import logging

logger = logging.getLogger(__name__)
@@ -0,0 +1,10 @@
"""
Fuzzing modules for FuzzForge

This package contains fuzzing modules for different fuzzing engines.
"""

from .atheris_fuzzer import AtherisFuzzer
from .cargo_fuzzer import CargoFuzzer

__all__ = ["AtherisFuzzer", "CargoFuzzer"]
@@ -0,0 +1,608 @@
"""
Atheris Fuzzer Module

Reusable module for fuzzing Python code using Atheris.
Discovers and fuzzes user-provided Python targets with TestOneInput() function.
"""

import asyncio
import base64
import importlib.util
import logging
import multiprocessing
import os
import sys
import time
from datetime import datetime
from pathlib import Path
from typing import Dict, Any, List, Optional, Callable
import uuid

import httpx
from modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding

logger = logging.getLogger(__name__)


def _run_atheris_in_subprocess(
    target_path_str: str,
    corpus_dir_str: str,
    max_iterations: int,
    timeout_seconds: int,
    shared_crashes: Any,
    exec_counter: multiprocessing.Value,
    crash_counter: multiprocessing.Value,
    coverage_counter: multiprocessing.Value
):
    """
    Run atheris.Fuzz() in a separate process to isolate os._exit() calls.

    This function runs in a subprocess and loads the target module,
    sets up atheris, and runs fuzzing. Stats are communicated via shared memory.

    Args:
        target_path_str: String path to target file
        corpus_dir_str: String path to corpus directory
        max_iterations: Maximum fuzzing iterations
        timeout_seconds: Timeout in seconds
        shared_crashes: Manager().list() for storing crash details
        exec_counter: Shared counter for executions
        crash_counter: Shared counter for crashes
        coverage_counter: Shared counter for coverage edges
    """
    import atheris
    import importlib.util
    import traceback
    from pathlib import Path

    target_path = Path(target_path_str)
    total_executions = 0

    # NOTE: Crash details are written directly to shared_crashes (Manager().list())
    # so they can be accessed by parent process after subprocess exits.
    # We don't use a local crashes list because os._exit() prevents cleanup code.

    try:
        # Load target module in subprocess
        module_name = f"fuzz_target_{uuid.uuid4().hex[:8]}"
        spec = importlib.util.spec_from_file_location(module_name, target_path)
        if spec is None or spec.loader is None:
            raise ImportError(f"Could not load module from {target_path}")

        module = importlib.util.module_from_spec(spec)
        sys.modules[module_name] = module
        spec.loader.exec_module(module)

        if not hasattr(module, "TestOneInput"):
            raise AttributeError("Module does not have TestOneInput() function")

        test_one_input = module.TestOneInput

        # Wrapper to track executions and crashes
        def fuzz_wrapper(data):
            nonlocal total_executions
            total_executions += 1

            # Update shared counter for live stats
            with exec_counter.get_lock():
                exec_counter.value += 1

            try:
                test_one_input(data)
            except Exception as e:
                # Capture crash details to shared memory
                crash_info = {
                    "input": bytes(data),  # Convert to bytes for serialization
                    "exception_type": type(e).__name__,
                    "exception_message": str(e),
                    "stack_trace": traceback.format_exc(),
                    "execution": total_executions
                }
                # Write to shared memory so parent process can access crash details
                shared_crashes.append(crash_info)

                # Update shared crash counter
                with crash_counter.get_lock():
                    crash_counter.value += 1

                # Re-raise so Atheris detects it
                raise

        # Check for dictionary file in target directory
        dict_args = []
        target_dir = target_path.parent
        for dict_name in ["fuzz.dict", "fuzzing.dict", "dict.txt"]:
            dict_path = target_dir / dict_name
            if dict_path.exists():
                dict_args.append(f"-dict={dict_path}")
                break

        # Configure Atheris
        atheris_args = [
            "atheris_fuzzer",
            f"-runs={max_iterations}",
            f"-max_total_time={timeout_seconds}",
            "-print_final_stats=1"
        ] + dict_args + [corpus_dir_str]  # Corpus directory as positional arg

        atheris.Setup(atheris_args, fuzz_wrapper)

        # Run fuzzing (this will call os._exit() when done)
        atheris.Fuzz()

    except SystemExit:
        # Atheris exits when done - this is normal
        # Crash details already written to shared_crashes
        pass
    except Exception:
        # Crashes raised by the target were already recorded in fuzz_wrapper.
        # Errors raised during setup (for example a failed import) are
        # swallowed here and do not reach the parent process.
        pass


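The argument list passed to `atheris.Setup()` follows libFuzzer conventions: a program name, `-flag=value` options, an optional `-dict=` file, and the corpus directory as the final positional argument. The sketch below builds the same list as a pure function so the ordering can be checked without Atheris installed. `build_atheris_args` is a hypothetical helper introduced here, not part of the module.

```python
import tempfile
from pathlib import Path
from typing import List

# Dictionary file names probed next to the fuzz target, in priority order.
DICT_NAMES = ("fuzz.dict", "fuzzing.dict", "dict.txt")


def build_atheris_args(
    target_path: Path, corpus_dir: Path, max_iterations: int, timeout_seconds: int
) -> List[str]:
    """Build the libFuzzer-style argument list (hypothetical helper)."""
    args = [
        "atheris_fuzzer",  # argv[0], the program name
        f"-runs={max_iterations}",
        f"-max_total_time={timeout_seconds}",
        "-print_final_stats=1",
    ]
    for name in DICT_NAMES:
        candidate = target_path.parent / name
        if candidate.exists():
            args.append(f"-dict={candidate}")
            break  # the first dictionary found wins
    # The corpus directory goes last, as a positional argument.
    args.append(str(corpus_dir))
    return args


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "fuzz_target.py"
        target.write_text("def TestOneInput(data):\n    pass\n")
        (Path(tmp) / "fuzz.dict").write_text('"secret"\n')
        print(build_atheris_args(target, Path(tmp) / "corpus", 1000, 60))
```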
class AtherisFuzzer(BaseModule):
    """
    Atheris fuzzing module - discovers and fuzzes Python code.

    This module can be used by any workflow to fuzz Python targets.
    """

    def __init__(self):
        super().__init__()
        self.crashes = []
        self.total_executions = 0
        self.start_time = None
        self.last_stats_time = 0
        self.run_id = None

    def get_metadata(self) -> ModuleMetadata:
        """Return module metadata"""
        return ModuleMetadata(
            name="atheris_fuzzer",
            version="1.0.0",
            description="Python fuzzing using Atheris - discovers and fuzzes TestOneInput() functions",
            author="FuzzForge Team",
            category="fuzzer",
            tags=["fuzzing", "atheris", "python", "coverage"],
            input_schema={
                "type": "object",
                "properties": {
                    "target_file": {
                        "type": "string",
                        "description": "Python file with TestOneInput() function (auto-discovered if not specified)"
                    },
                    "max_iterations": {
                        "type": "integer",
                        "description": "Maximum fuzzing iterations",
                        "default": 100000
                    },
                    "timeout_seconds": {
                        "type": "integer",
                        "description": "Fuzzing timeout in seconds",
                        "default": 300
                    },
                    "stats_callback": {
                        "description": "Optional callback for real-time statistics"
                    }
                }
            },
            requires_workspace=True
        )

    def validate_config(self, config: Dict[str, Any]) -> bool:
        """Validate fuzzing configuration"""
        max_iterations = config.get("max_iterations", 100000)
        if not isinstance(max_iterations, int) or max_iterations <= 0:
            raise ValueError(f"max_iterations must be positive integer, got: {max_iterations}")

        timeout = config.get("timeout_seconds", 300)
        if not isinstance(timeout, int) or timeout <= 0:
            raise ValueError(f"timeout_seconds must be positive integer, got: {timeout}")

        return True

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        """
        Execute Atheris fuzzing on user code.

        Args:
            config: Fuzzing configuration
            workspace: Path to user's uploaded code

        Returns:
            ModuleResult with crash findings
        """
        self.start_timer()
        self.start_time = time.time()

        # Validate configuration
        self.validate_config(config)
        self.validate_workspace(workspace)

        # Extract config
        target_file = config.get("target_file")
        max_iterations = config.get("max_iterations", 100000)
        timeout_seconds = config.get("timeout_seconds", 300)
        stats_callback = config.get("stats_callback")
        self.run_id = config.get("run_id")

        logger.info(
            f"Starting Atheris fuzzing (max_iterations={max_iterations}, "
            f"timeout={timeout_seconds}s, target={target_file or 'auto-discover'})"
        )

        try:
            # Step 1: Discover or load target
            target_path = self._discover_target(workspace, target_file)
            logger.info(f"Using fuzz target: {target_path}")

            # Step 2: Load target module
            test_one_input = self._load_target_module(target_path)
            logger.info(f"Loaded TestOneInput function from {target_path}")

            # Step 3: Run fuzzing
            await self._run_fuzzing(
                test_one_input=test_one_input,
                target_path=target_path,
                workspace=workspace,
                max_iterations=max_iterations,
                timeout_seconds=timeout_seconds,
                stats_callback=stats_callback
            )

            # Step 4: Generate findings from crashes
            findings = await self._generate_findings(target_path)

            logger.info(
                f"Fuzzing completed: {self.total_executions} executions, "
                f"{len(self.crashes)} crashes found"
            )

            # Generate SARIF report (always, even with no findings)
            from modules.reporter import SARIFReporter
            reporter = SARIFReporter()
            reporter_config = {
                "findings": findings,
                "tool_name": "Atheris Fuzzer",
                "tool_version": self._metadata.version
            }
            reporter_result = await reporter.execute(reporter_config, workspace)
            sarif_report = reporter_result.sarif

            return ModuleResult(
                module=self._metadata.name,
                version=self._metadata.version,
|
||||
status="success",
|
||||
execution_time=self.get_execution_time(),
|
||||
findings=findings,
|
||||
summary={
|
||||
"total_executions": self.total_executions,
|
||||
"crashes_found": len(self.crashes),
|
||||
"execution_time": self.get_execution_time(),
|
||||
"target_file": str(target_path.relative_to(workspace))
|
||||
},
|
||||
metadata={
|
||||
"max_iterations": max_iterations,
|
||||
"timeout_seconds": timeout_seconds
|
||||
},
|
||||
sarif=sarif_report
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Fuzzing failed: {e}", exc_info=True)
|
||||
return self.create_result(
|
||||
findings=[],
|
||||
status="failed",
|
||||
error=str(e)
|
||||
)
|
||||
|
||||
def _discover_target(self, workspace: Path, target_file: Optional[str]) -> Path:
|
||||
"""
|
||||
Discover fuzz target in workspace.
|
||||
|
||||
Args:
|
||||
workspace: Path to workspace
|
||||
target_file: Explicit target file or None for auto-discovery
|
||||
|
||||
Returns:
|
||||
Path to target file
|
||||
"""
|
||||
if target_file:
|
||||
# Use specified target
|
||||
target_path = workspace / target_file
|
||||
if not target_path.exists():
|
||||
raise FileNotFoundError(f"Target file not found: {target_file}")
|
||||
return target_path
|
||||
|
||||
# Auto-discover: look for fuzz_*.py, *_fuzz.py, or fuzz_target.py
|
||||
logger.info("Auto-discovering fuzz targets...")
|
||||
|
||||
candidates = []
|
||||
# Use rglob for recursive search (searches all subdirectories)
|
||||
for pattern in ["fuzz_*.py", "*_fuzz.py", "fuzz_target.py"]:
|
||||
matches = list(workspace.rglob(pattern))
|
||||
candidates.extend(m for m in matches if m not in candidates)  # skip files already matched by an earlier pattern
|
||||
|
||||
if not candidates:
|
||||
raise FileNotFoundError(
|
||||
"No fuzz targets found. Expected files matching: fuzz_*.py, *_fuzz.py, or fuzz_target.py"
|
||||
)
|
||||
|
||||
# Use first candidate
|
||||
target = candidates[0]
|
||||
if len(candidates) > 1:
|
||||
logger.warning(
|
||||
f"Multiple fuzz targets found: {[str(c) for c in candidates]}. "
|
||||
f"Using: {target.name}"
|
||||
)
|
||||
|
||||
return target
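The auto-discovery step above can be tried in isolation. The following is a minimal sketch, not the module's own code: `discover_candidates` and the temporary workspace layout are hypothetical names chosen for illustration. Because overlapping glob patterns such as `fuzz_*.py` and `fuzz_target.py` can match the same file, the sketch collects matches into a set and sorts them so the "first candidate" is deterministic.

```python
import tempfile
from pathlib import Path
from typing import List

# Same glob patterns as the discovery step; the helper name is hypothetical.
PATTERNS = ["fuzz_*.py", "*_fuzz.py", "fuzz_target.py"]


def discover_candidates(workspace: Path) -> List[Path]:
    """Return unique fuzz-target candidates in a stable, sorted order."""
    seen = set()
    for pattern in PATTERNS:
        # rglob searches the workspace and all of its subdirectories
        seen.update(workspace.rglob(pattern))
    return sorted(seen)


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    (ws / "pkg").mkdir()
    stub = "def TestOneInput(data: bytes):\n    pass\n"
    (ws / "fuzz_target.py").write_text(stub)            # matches two patterns
    (ws / "pkg" / "parser_fuzz.py").write_text(stub)    # matches one pattern
    found = [p.relative_to(ws).as_posix() for p in discover_candidates(ws)]
    print(found)
```

Sorting is a design choice of this sketch; it makes the selected target independent of filesystem enumeration order when several candidates exist.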
|
||||
|
||||
def _load_target_module(self, target_path: Path) -> Callable:
|
||||
"""
|
||||
Load target module and get TestOneInput function.
|
||||
|
||||
Args:
|
||||
target_path: Path to Python file with TestOneInput
|
||||
|
||||
Returns:
|
||||
TestOneInput function
|
||||
"""
|
||||
# Add target directory to sys.path
|
||||
target_dir = target_path.parent
|
||||
if str(target_dir) not in sys.path:
|
||||
sys.path.insert(0, str(target_dir))
|
||||
|
||||
# Load module dynamically
|
||||
module_name = target_path.stem
|
||||
spec = importlib.util.spec_from_file_location(module_name, target_path)
|
||||
if spec is None or spec.loader is None:
|
||||
raise ImportError(f"Cannot load module from {target_path}")
|
||||
|
||||
module = importlib.util.module_from_spec(spec)
|
||||
spec.loader.exec_module(module)
|
||||
|
||||
# Get TestOneInput function
|
||||
if not hasattr(module, "TestOneInput"):
|
||||
raise AttributeError(
|
||||
f"Module {module_name} does not have TestOneInput() function. "
|
||||
"Atheris requires a TestOneInput(data: bytes) function."
|
||||
)
|
||||
|
||||
return module.TestOneInput
|
||||
|
||||
async def _run_fuzzing(
|
||||
self,
|
||||
test_one_input: Callable,
|
||||
target_path: Path,
|
||||
workspace: Path,
|
||||
max_iterations: int,
|
||||
timeout_seconds: int,
|
||||
stats_callback: Optional[Callable] = None
|
||||
):
|
||||
"""
|
||||
Run Atheris fuzzing with real-time monitoring.
|
||||
|
||||
Args:
|
||||
test_one_input: TestOneInput function to fuzz (not used, loaded in subprocess)
|
||||
target_path: Path to target file
|
||||
workspace: Path to workspace directory
|
||||
max_iterations: Max iterations
|
||||
timeout_seconds: Timeout in seconds
|
||||
stats_callback: Optional callback for stats
|
||||
"""
|
||||
self.crashes = []
|
||||
self.total_executions = 0
|
||||
|
||||
# Create corpus directory in workspace
|
||||
corpus_dir = workspace / ".fuzzforge_corpus"
|
||||
corpus_dir.mkdir(exist_ok=True)
|
||||
logger.info(f"Using corpus directory: {corpus_dir}")
|
||||
|
||||
logger.info(f"Starting Atheris fuzzer in subprocess (max_runs={max_iterations}, timeout={timeout_seconds}s)...")
|
||||
|
||||
# Create shared memory for subprocess communication
|
||||
ctx = multiprocessing.get_context('spawn')
|
||||
manager = ctx.Manager()
|
||||
shared_crashes = manager.list() # Shared list for crash details
|
||||
exec_counter = ctx.Value('i', 0) # Shared execution counter
|
||||
crash_counter = ctx.Value('i', 0) # Shared crash counter
|
||||
coverage_counter = ctx.Value('i', 0) # Shared coverage counter
|
||||
|
||||
# Start fuzzing in subprocess
|
||||
process = ctx.Process(
|
||||
target=_run_atheris_in_subprocess,
|
||||
args=(str(target_path), str(corpus_dir), max_iterations, timeout_seconds, shared_crashes, exec_counter, crash_counter, coverage_counter)
|
||||
)
|
||||
|
||||
# Run fuzzing in a separate task with monitoring
|
||||
async def monitor_stats():
|
||||
"""Monitor and report stats every 0.5 seconds"""
|
||||
while True:
|
||||
await asyncio.sleep(0.5)
|
||||
|
||||
if stats_callback:
|
||||
elapsed = time.time() - self.start_time
|
||||
# Read from shared counters
|
||||
current_execs = exec_counter.value
|
||||
current_crashes = crash_counter.value
|
||||
current_coverage = coverage_counter.value
|
||||
execs_per_sec = current_execs / elapsed if elapsed > 0 else 0
|
||||
|
||||
# Count corpus files
|
||||
try:
|
||||
corpus_size = len(list(corpus_dir.iterdir())) if corpus_dir.exists() else 0
|
||||
except Exception:
|
||||
corpus_size = 0
|
||||
|
||||
# TODO: Get real coverage from Atheris
|
||||
# For now use corpus_size as proxy
|
||||
coverage_value = current_coverage if current_coverage > 0 else corpus_size
|
||||
|
||||
await stats_callback({
|
||||
"total_execs": current_execs,
|
||||
"execs_per_sec": execs_per_sec,
|
||||
"crashes": current_crashes,
|
||||
"corpus_size": corpus_size,
|
||||
"coverage": coverage_value, # Using corpus as coverage proxy
|
||||
"elapsed_time": int(elapsed)
|
||||
})
|
||||
|
||||
# Start monitoring task
|
||||
monitor_task = None
|
||||
if stats_callback:
|
||||
monitor_task = asyncio.create_task(monitor_stats())
|
||||
|
||||
try:
|
||||
# Start subprocess
|
||||
process.start()
|
||||
logger.info(f"Fuzzing subprocess started (PID: {process.pid})")
|
||||
|
||||
# Wait for subprocess to complete
|
||||
while process.is_alive():
|
||||
await asyncio.sleep(0.1)
|
||||
|
||||
# NOTE: We cannot use result_queue because Atheris calls os._exit()
|
||||
# which terminates immediately without putting results in the queue.
|
||||
# Instead, we rely on shared memory (Manager().list() and Value counters).
|
||||
|
||||
# Read final values from shared memory
|
||||
self.total_executions = exec_counter.value
|
||||
total_crashes = crash_counter.value
|
||||
|
||||
# Read crash details from shared memory and convert to our format
|
||||
self.crashes = []
|
||||
for crash_data in shared_crashes:
|
||||
# Reconstruct crash info with exception object
|
||||
crash_info = {
|
||||
"input": crash_data["input"],
|
||||
"exception": Exception(crash_data["exception_message"]),
|
||||
"exception_type": crash_data["exception_type"],
|
||||
"stack_trace": crash_data["stack_trace"],
|
||||
"execution": crash_data["execution"]
|
||||
}
|
||||
self.crashes.append(crash_info)
|
||||
|
||||
logger.warning(
|
||||
f"Crash found (execution {crash_data['execution']}): "
|
||||
f"{crash_data['exception_type']}: {crash_data['exception_message']}"
|
||||
)
|
||||
|
||||
logger.info(f"Fuzzing completed: {self.total_executions} executions, {total_crashes} crashes found")
|
||||
|
||||
# Send final stats update
|
||||
if stats_callback:
|
||||
elapsed = time.time() - self.start_time
|
||||
execs_per_sec = self.total_executions / elapsed if elapsed > 0 else 0
|
||||
|
||||
# Count final corpus size
|
||||
try:
|
||||
final_corpus_size = len(list(corpus_dir.iterdir())) if corpus_dir.exists() else 0
|
||||
except Exception:
|
||||
final_corpus_size = 0
|
||||
|
||||
# TODO: Parse coverage from Atheris output
|
||||
# For now, use corpus size as proxy (corpus grows with coverage)
|
||||
# libFuzzer writes coverage to stdout but sys.stdout redirection
|
||||
# doesn't work because it writes to FD 1 directly from C++
|
||||
final_coverage = coverage_counter.value if coverage_counter.value > 0 else final_corpus_size
|
||||
|
||||
await stats_callback({
|
||||
"total_execs": self.total_executions,
|
||||
"execs_per_sec": execs_per_sec,
|
||||
"crashes": total_crashes,
|
||||
"corpus_size": final_corpus_size,
|
||||
"coverage": final_coverage,
|
||||
"elapsed_time": int(elapsed)
|
||||
})
|
||||
|
||||
# Wait for process to fully terminate
|
||||
process.join(timeout=5)
|
||||
|
||||
if process.exitcode is not None and process.exitcode != 0:
|
||||
logger.warning(f"Subprocess exited with code: {process.exitcode}")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Fuzzing execution error: {e}")
|
||||
if process.is_alive():
|
||||
logger.warning("Terminating fuzzing subprocess...")
|
||||
process.terminate()
|
||||
process.join(timeout=5)
|
||||
if process.is_alive():
|
||||
process.kill()
|
||||
raise
|
||||
finally:
|
||||
# Stop monitoring
|
||||
if monitor_task:
|
||||
monitor_task.cancel()
|
||||
try:
|
||||
await monitor_task
|
||||
except asyncio.CancelledError:
|
||||
pass
|
||||
|
||||
async def _generate_findings(self, target_path: Path) -> List[ModuleFinding]:
|
||||
"""
|
||||
Generate ModuleFinding objects from crashes.
|
||||
|
||||
Args:
|
||||
target_path: Path to target file
|
||||
|
||||
Returns:
|
||||
List of findings
|
||||
"""
|
||||
findings = []
|
||||
|
||||
for idx, crash in enumerate(self.crashes):
|
||||
# Encode crash input for storage
|
||||
crash_input_b64 = base64.b64encode(crash["input"]).decode()
|
||||
|
||||
finding = self.create_finding(
|
||||
title=f"Crash: {crash['exception_type']}",
|
||||
description=(
|
||||
f"Atheris found crash during fuzzing:\n"
|
||||
f"Exception: {crash['exception_type']}\n"
|
||||
f"Message: {str(crash['exception'])}\n"
|
||||
f"Execution: {crash['execution']}"
|
||||
),
|
||||
severity="critical",
|
||||
category="crash",
|
||||
file_path=str(target_path),
|
||||
metadata={
|
||||
"crash_input_base64": crash_input_b64,
|
||||
"crash_input_hex": crash["input"].hex(),
|
||||
"exception_type": crash["exception_type"],
|
||||
"stack_trace": crash["stack_trace"],
|
||||
"execution_number": crash["execution"]
|
||||
},
|
||||
recommendation=(
|
||||
"Review the crash stack trace and input to identify the vulnerability. "
|
||||
"The crash input is provided in base64 and hex formats for reproduction."
|
||||
)
|
||||
)
|
||||
findings.append(finding)
|
||||
|
||||
# Report crash to backend for real-time monitoring
|
||||
if self.run_id:
|
||||
try:
|
||||
crash_report = {
|
||||
"run_id": self.run_id,
|
||||
"crash_id": f"crash_{idx + 1}",
|
||||
"timestamp": datetime.utcnow().isoformat(),
|
||||
"crash_type": crash["exception_type"],
|
||||
"stack_trace": crash["stack_trace"],
|
||||
"input_file": crash_input_b64,
|
||||
"severity": "critical",
|
||||
"exploitability": "unknown"
|
||||
}
|
||||
|
||||
backend_url = os.getenv("BACKEND_URL", "http://backend:8000")
|
||||
async with httpx.AsyncClient(timeout=5.0) as client:
|
||||
await client.post(
|
||||
f"{backend_url}/fuzzing/{self.run_id}/crash",
|
||||
json=crash_report
|
||||
)
|
||||
logger.debug(f"Crash report sent to backend: {crash_report['crash_id']}")
|
||||
except Exception as e:
|
||||
logger.debug(f"Failed to post crash report to backend: {e}")
|
||||
|
||||
return findings
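Each finding stores the crashing input under the `crash_input_base64` and `crash_input_hex` metadata keys. A minimal sketch of how a consumer might recover the raw bytes for reproduction follows; `decode_crash_input` is a hypothetical helper, not part of the module, and the sample bytes are made up.

```python
import base64


def decode_crash_input(metadata: dict) -> bytes:
    """Recover raw crash bytes from finding metadata, cross-checking both encodings."""
    from_b64 = base64.b64decode(metadata["crash_input_base64"])
    from_hex = bytes.fromhex(metadata["crash_input_hex"])
    if from_b64 != from_hex:
        raise ValueError("base64 and hex encodings disagree")
    return from_b64


# Build metadata the same way the findings step encodes a crash input.
raw = b"\x00FUZZ\xff"
metadata = {
    "crash_input_base64": base64.b64encode(raw).decode(),
    "crash_input_hex": raw.hex(),
}
recovered = decode_crash_input(metadata)
```

The recovered bytes can then be passed directly to the target's `TestOneInput` function to replay the crash outside the fuzzer.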
|
||||
@@ -0,0 +1,455 @@
|
||||
"""
|
||||
Cargo Fuzzer Module
|
||||
|
||||
Reusable module for fuzzing Rust code using cargo-fuzz (libFuzzer).
|
||||
Discovers and fuzzes user-provided Rust targets with fuzz_target!() macros.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import os
|
||||
import re
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any, List, Optional, Callable
|
||||
|
||||
from modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class CargoFuzzer(BaseModule):
|
||||
"""
|
||||
Cargo-fuzz (libFuzzer) fuzzer module for Rust code.
|
||||
|
||||
Discovers fuzz targets in user's Rust project and runs cargo-fuzz
|
||||
to find crashes, undefined behavior, and memory safety issues.
|
||||
"""
|
||||
|
||||
def get_metadata(self) -> ModuleMetadata:
|
||||
"""Get module metadata"""
|
||||
return ModuleMetadata(
|
||||
name="cargo_fuzz",
|
||||
version="0.11.2",
|
||||
description="Fuzz Rust code using cargo-fuzz with libFuzzer backend",
|
||||
author="FuzzForge Team",
|
||||
category="fuzzer",
|
||||
tags=["fuzzing", "rust", "cargo-fuzz", "libfuzzer", "memory-safety"],
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"target_name": {
|
||||
"type": "string",
|
||||
"description": "Fuzz target name (auto-discovered if not specified)"
|
||||
},
|
||||
"max_iterations": {
|
||||
"type": "integer",
|
||||
"default": 1000000,
|
||||
"description": "Maximum fuzzing iterations"
|
||||
},
|
||||
"timeout_seconds": {
|
||||
"type": "integer",
|
||||
"default": 1800,
|
||||
"description": "Fuzzing timeout in seconds"
|
||||
},
|
||||
"sanitizer": {
|
||||
"type": "string",
|
||||
"enum": ["address", "memory", "undefined"],
|
||||
"default": "address",
|
||||
"description": "Sanitizer to use (address, memory, undefined)"
|
||||
}
|
||||
}
|
||||
},
|
||||
output_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"findings": {
|
||||
"type": "array",
|
||||
"description": "Crashes and memory safety issues found"
|
||||
},
|
||||
"summary": {
|
||||
"type": "object",
|
||||
"description": "Fuzzing execution summary"
|
||||
}
|
||||
}
|
||||
}
|
||||
)
|
||||
|
||||
def validate_config(self, config: Dict[str, Any]) -> bool:
|
||||
"""Validate configuration"""
|
||||
max_iterations = config.get("max_iterations", 1000000)
|
||||
if not isinstance(max_iterations, int) or max_iterations < 1:
|
||||
raise ValueError("max_iterations must be a positive integer")
|
||||
|
||||
timeout = config.get("timeout_seconds", 1800)
|
||||
if not isinstance(timeout, int) or timeout < 1:
|
||||
raise ValueError("timeout_seconds must be a positive integer")
|
||||
|
||||
sanitizer = config.get("sanitizer", "address")
|
||||
if sanitizer not in ["address", "memory", "undefined"]:
|
||||
raise ValueError("sanitizer must be one of: address, memory, undefined")
|
||||
|
||||
return True
|
||||
|
||||
async def execute(
|
||||
self,
|
||||
config: Dict[str, Any],
|
||||
workspace: Path,
|
||||
stats_callback: Optional[Callable] = None
|
||||
) -> ModuleResult:
|
||||
"""
|
||||
Execute cargo-fuzz on user's Rust code.
|
||||
|
||||
Args:
|
||||
config: Fuzzer configuration
|
||||
workspace: Path to workspace directory containing Rust project
|
||||
stats_callback: Optional callback for real-time stats updates
|
||||
|
||||
Returns:
|
||||
ModuleResult containing findings and summary
|
||||
"""
|
||||
self.start_timer()
|
||||
|
||||
try:
|
||||
# Validate inputs
|
||||
self.validate_config(config)
|
||||
self.validate_workspace(workspace)
|
||||
|
||||
logger.info(f"Running cargo-fuzz on {workspace}")
|
||||
|
||||
# Step 1: Discover fuzz targets
|
||||
targets = await self._discover_fuzz_targets(workspace)
|
||||
if not targets:
|
||||
return self.create_result(
|
||||
findings=[],
|
||||
status="failed",
|
||||
error="No fuzz targets found. Expected fuzz targets in fuzz/fuzz_targets/"
|
||||
)
|
||||
|
||||
# Get target name from config or use first discovered target
|
||||
target_name = config.get("target_name")
|
||||
if not target_name:
|
||||
target_name = targets[0]
|
||||
logger.info(f"No target specified, using first discovered target: {target_name}")
|
||||
elif target_name not in targets:
|
||||
return self.create_result(
|
||||
findings=[],
|
||||
status="failed",
|
||||
error=f"Target '{target_name}' not found. Available targets: {', '.join(targets)}"
|
||||
)
|
||||
|
||||
# Step 2: Build fuzz target
|
||||
logger.info(f"Building fuzz target: {target_name}")
|
||||
build_success = await self._build_fuzz_target(workspace, target_name, config)
|
||||
if not build_success:
|
||||
return self.create_result(
|
||||
findings=[],
|
||||
status="failed",
|
||||
error=f"Failed to build fuzz target: {target_name}"
|
||||
)
|
||||
|
||||
# Step 3: Run fuzzing
|
||||
logger.info(f"Starting fuzzing: {target_name}")
|
||||
findings, stats = await self._run_fuzzing(
|
||||
workspace,
|
||||
target_name,
|
||||
config,
|
||||
stats_callback
|
||||
)
|
||||
|
||||
# Step 4: Parse crash artifacts
|
||||
crash_findings = await self._parse_crash_artifacts(workspace, target_name)
|
||||
findings.extend(crash_findings)
|
||||
|
||||
logger.info(f"Fuzzing completed: {len(findings)} crashes found")
|
||||
|
||||
return self.create_result(
|
||||
findings=findings,
|
||||
status="success",
|
||||
summary=stats
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Cargo fuzzer failed: {e}")
|
||||
return self.create_result(
|
||||
findings=[],
|
||||
status="failed",
|
||||
error=str(e)
|
||||
)
|
||||
|
||||
async def _discover_fuzz_targets(self, workspace: Path) -> List[str]:
|
||||
"""
|
||||
Discover fuzz targets in the project.
|
||||
|
||||
Looks for fuzz targets in fuzz/fuzz_targets/ directory.
|
||||
"""
|
||||
fuzz_targets_dir = workspace / "fuzz" / "fuzz_targets"
|
||||
if not fuzz_targets_dir.exists():
|
||||
logger.warning(f"No fuzz targets directory found: {fuzz_targets_dir}")
|
||||
return []
|
||||
|
||||
targets = []
|
||||
for file in fuzz_targets_dir.glob("*.rs"):
|
||||
target_name = file.stem
|
||||
targets.append(target_name)
|
||||
logger.info(f"Discovered fuzz target: {target_name}")
|
||||
|
||||
return targets
|
||||
|
||||
async def _build_fuzz_target(
|
||||
self,
|
||||
workspace: Path,
|
||||
target_name: str,
|
||||
config: Dict[str, Any]
|
||||
) -> bool:
|
||||
"""Build the fuzz target with instrumentation"""
|
||||
try:
|
||||
sanitizer = config.get("sanitizer", "address")
|
||||
|
||||
# Build command
|
||||
cmd = [
|
||||
"cargo", "fuzz", "build",
|
||||
target_name,
|
||||
f"--sanitizer={sanitizer}"
|
||||
]
|
||||
|
||||
logger.debug(f"Build command: {' '.join(cmd)}")
|
||||
|
||||
proc = await asyncio.create_subprocess_exec(
|
||||
*cmd,
|
||||
cwd=workspace,
|
||||
stdout=asyncio.subprocess.PIPE,
|
||||
stderr=asyncio.subprocess.PIPE
|
||||
)
|
||||
|
||||
stdout, stderr = await proc.communicate()
|
||||
|
||||
if proc.returncode != 0:
|
||||
logger.error(f"Build failed: {stderr.decode()}")
|
||||
return False
|
||||
|
||||
logger.info("Build successful")
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Build error: {e}")
|
||||
return False
|
||||
|
||||
async def _run_fuzzing(
|
||||
self,
|
||||
workspace: Path,
|
||||
target_name: str,
|
||||
config: Dict[str, Any],
|
||||
stats_callback: Optional[Callable]
|
||||
) -> tuple[List[ModuleFinding], Dict[str, Any]]:
|
||||
"""
|
||||
Run cargo-fuzz and collect statistics.
|
||||
|
||||
Returns:
|
||||
Tuple of (findings, stats_dict)
|
||||
"""
|
||||
max_iterations = config.get("max_iterations", 1000000)
|
||||
timeout_seconds = config.get("timeout_seconds", 1800)
|
||||
sanitizer = config.get("sanitizer", "address")
|
||||
|
||||
findings = []
|
||||
stats = {
|
||||
"total_executions": 0,
|
||||
"crashes_found": 0,
|
||||
"corpus_size": 0,
|
||||
"coverage": 0.0,
|
||||
"execution_time": 0.0
|
||||
}
|
||||
|
||||
try:
|
||||
# Cargo fuzz run command
|
||||
cmd = [
|
||||
"cargo", "fuzz", "run",
|
||||
target_name,
|
||||
f"--sanitizer={sanitizer}",
|
||||
"--",
|
||||
f"-runs={max_iterations}",
|
||||
f"-max_total_time={timeout_seconds}"
|
||||
]
|
||||
|
||||
logger.debug(f"Fuzz command: {' '.join(cmd)}")
|
||||
|
||||
start_time = time.time()
|
||||
proc = await asyncio.create_subprocess_exec(
|
||||
*cmd,
|
||||
cwd=workspace,
|
||||
stdout=asyncio.subprocess.PIPE,
|
||||
stderr=asyncio.subprocess.STDOUT
|
||||
)
|
||||
|
||||
# Monitor output and extract stats
|
||||
last_stats_time = time.time()
|
||||
async for line in proc.stdout:
|
||||
line_str = line.decode('utf-8', errors='ignore').strip()
|
||||
|
||||
# Parse libFuzzer stats
|
||||
# Example: "#12345 NEW cov: 123 ft: 456 corp: 10/234b"
|
||||
stats_match = re.match(r'#(\d+)\s+.*cov:\s*(\d+).*corp:\s*(\d+)', line_str)
|
||||
if stats_match:
|
||||
execs = int(stats_match.group(1))
|
||||
cov = int(stats_match.group(2))
|
||||
corp = int(stats_match.group(3))
|
||||
|
||||
stats["total_executions"] = execs
|
||||
stats["coverage"] = float(cov)
|
||||
stats["corpus_size"] = corp
|
||||
stats["execution_time"] = time.time() - start_time
|
||||
|
||||
# Invoke stats callback for real-time monitoring
|
||||
if stats_callback and time.time() - last_stats_time >= 0.5:
|
||||
await stats_callback({
|
||||
"total_execs": execs,
|
||||
"execs_per_sec": execs / stats["execution_time"] if stats["execution_time"] > 0 else 0,
|
||||
"crashes": stats["crashes_found"],
|
||||
"coverage": cov,
|
||||
"corpus_size": corp,
|
||||
"elapsed_time": int(stats["execution_time"])
|
||||
})
|
||||
last_stats_time = time.time()
|
||||
|
||||
# Detect crash line
|
||||
if "SUMMARY:" in line_str:  # a sanitizer report prints both ERROR: and SUMMARY: lines; count each crash once
|
||||
logger.info(f"Detected crash: {line_str}")
|
||||
stats["crashes_found"] += 1
|
||||
|
||||
await proc.wait()
|
||||
stats["execution_time"] = time.time() - start_time
|
||||
|
||||
# Send final stats update
|
||||
if stats_callback:
|
||||
await stats_callback({
|
||||
"total_execs": stats["total_executions"],
|
||||
"execs_per_sec": stats["total_executions"] / stats["execution_time"] if stats["execution_time"] > 0 else 0,
|
||||
"crashes": stats["crashes_found"],
|
||||
"coverage": stats["coverage"],
|
||||
"corpus_size": stats["corpus_size"],
|
||||
"elapsed_time": int(stats["execution_time"])
|
||||
})
|
||||
|
||||
logger.info(
|
||||
f"Fuzzing completed: {stats['total_executions']} execs, "
|
||||
f"{stats['crashes_found']} crashes"
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Fuzzing error: {e}")
|
||||
|
||||
return findings, stats
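The status-line parsing in the loop above can be exercised on its own. This sketch reuses the same regular expression and the sample line from the code comment; the helper name `parse_stats_line` and the returned dictionary keys are assumptions for illustration.

```python
import re
from typing import Dict, Optional

# Same pattern as the monitoring loop: execution count, coverage, corpus entries.
STATS_RE = re.compile(r"#(\d+)\s+.*cov:\s*(\d+).*corp:\s*(\d+)")


def parse_stats_line(line: str) -> Optional[Dict[str, int]]:
    """Parse one libFuzzer status line; return None for unrelated output."""
    match = STATS_RE.match(line.strip())
    if match is None:
        return None
    execs, cov, corp = (int(group) for group in match.groups())
    return {"total_execs": execs, "coverage": cov, "corpus_size": corp}


parsed = parse_stats_line("#12345 NEW cov: 123 ft: 456 corp: 10/234b")
ignored = parse_stats_line("INFO: Seed: 42")
print(parsed, ignored)
```

Only the entry count before the slash in `corp: 10/234b` is captured; the byte size after the slash is ignored, as in the loop above.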
|
||||
|
||||
async def _parse_crash_artifacts(
|
||||
self,
|
||||
workspace: Path,
|
||||
target_name: str
|
||||
) -> List[ModuleFinding]:
|
||||
"""
|
||||
Parse crash artifacts from fuzz/artifacts directory.
|
||||
|
||||
Cargo-fuzz stores crashes in: fuzz/artifacts/<target_name>/
|
||||
"""
|
||||
findings = []
|
||||
artifacts_dir = workspace / "fuzz" / "artifacts" / target_name
|
||||
|
||||
if not artifacts_dir.exists():
|
||||
logger.info("No crash artifacts found")
|
||||
return findings
|
||||
|
||||
# Find all crash files
|
||||
for crash_file in artifacts_dir.glob("crash-*"):
|
||||
try:
|
||||
finding = await self._analyze_crash(workspace, target_name, crash_file)
|
||||
if finding:
|
||||
findings.append(finding)
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to analyze crash {crash_file}: {e}")
|
||||
|
||||
logger.info(f"Parsed {len(findings)} crash artifacts")
|
||||
return findings
|
||||
|
||||
async def _analyze_crash(
|
||||
self,
|
||||
workspace: Path,
|
||||
target_name: str,
|
||||
crash_file: Path
|
||||
) -> Optional[ModuleFinding]:
|
||||
"""
|
||||
Analyze a single crash file.
|
||||
|
||||
Runs cargo-fuzz with the crash input to reproduce and get stack trace.
|
||||
"""
|
||||
try:
|
||||
# Read crash input
|
||||
crash_input = crash_file.read_bytes()
|
||||
|
||||
# Reproduce crash to get stack trace
|
||||
cmd = [
|
||||
"cargo", "fuzz", "run",
|
||||
target_name,
|
||||
str(crash_file)
|
||||
]
|
||||
|
||||
proc = await asyncio.create_subprocess_exec(
|
||||
*cmd,
|
||||
cwd=workspace,
|
||||
stdout=asyncio.subprocess.PIPE,
|
||||
stderr=asyncio.subprocess.STDOUT,
|
||||
env={**os.environ, "RUST_BACKTRACE": "1"}
|
||||
)
|
||||
|
||||
stdout, _ = await proc.communicate()
|
||||
output = stdout.decode('utf-8', errors='ignore')
|
||||
|
||||
# Parse stack trace and error type
|
||||
error_type = "Unknown Crash"
|
||||
stack_trace = output
|
||||
|
||||
# Extract error type
|
||||
if "SEGV" in output:
|
||||
error_type = "Segmentation Fault"
|
||||
severity = "critical"
|
||||
elif "heap-use-after-free" in output:
|
||||
error_type = "Use After Free"
|
||||
severity = "critical"
|
||||
elif "heap-buffer-overflow" in output:
|
||||
error_type = "Heap Buffer Overflow"
|
||||
severity = "critical"
|
||||
elif "stack-buffer-overflow" in output:
|
||||
error_type = "Stack Buffer Overflow"
|
||||
severity = "high"
|
||||
elif "panic" in output.lower():
|
||||
error_type = "Panic"
|
||||
severity = "medium"
|
||||
else:
|
||||
severity = "high"
|
||||
|
||||
# Create finding
|
||||
finding = self.create_finding(
|
||||
title=f"Crash: {error_type} in {target_name}",
|
||||
description=f"Cargo-fuzz discovered a crash in target '{target_name}'. "
|
||||
f"Error type: {error_type}. "
|
||||
f"Input size: {len(crash_input)} bytes.",
|
||||
severity=severity,
|
||||
category="crash",
|
||||
file_path=f"fuzz/fuzz_targets/{target_name}.rs",
|
||||
code_snippet=stack_trace[:500],
|
||||
recommendation="Review the crash details and fix the underlying bug. "
|
||||
"Use AddressSanitizer to identify memory safety issues. "
|
||||
"Consider adding bounds checks or using safer APIs.",
|
||||
metadata={
|
||||
"error_type": error_type,
|
||||
"crash_file": crash_file.name,
|
||||
"input_size": len(crash_input),
|
||||
"reproducer": crash_file.name,
|
||||
"stack_trace": stack_trace
|
||||
}
|
||||
)
|
||||
|
||||
return finding
|
||||
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to analyze crash {crash_file}: {e}")
|
||||
return None
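The error-type and severity mapping in `_analyze_crash` can also be written as an ordered rule table. The sketch below is an alternative formulation under the same markers and severities, not the module's implementation; `RULES` and `classify` are hypothetical names.

```python
from typing import Tuple

# Ordered (marker, error_type, severity) rules; the first matching marker wins.
RULES = [
    ("SEGV", "Segmentation Fault", "critical"),
    ("heap-use-after-free", "Use After Free", "critical"),
    ("heap-buffer-overflow", "Heap Buffer Overflow", "critical"),
    ("stack-buffer-overflow", "Stack Buffer Overflow", "high"),
]


def classify(output: str) -> Tuple[str, str]:
    """Map reproduced crash output to an (error_type, severity) pair."""
    for marker, error_type, severity in RULES:
        if marker in output:
            return error_type, severity
    # Panics are matched case-insensitively, after the sanitizer markers.
    if "panic" in output.lower():
        return "Panic", "medium"
    return "Unknown Crash", "high"


print(classify("==1==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x0"))
print(classify("thread '<unnamed>' panicked at src/lib.rs:10:5"))
```

Keeping the rules in a list makes the precedence explicit and lets new sanitizer markers be added without touching the control flow.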
|
||||
@@ -17,7 +17,6 @@ import logging
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any, List
|
||||
from datetime import datetime
|
||||
import json
|
||||
|
||||
try:
|
||||
from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
|
||||
|
||||
@@ -16,16 +16,16 @@ File Scanner Module - Scans and enumerates files in the workspace
|
||||
import logging
|
||||
import mimetypes
|
||||
from pathlib import Path
|
||||
-from typing import Dict, Any, List
|
||||
+from typing import Dict, Any
|
||||
import hashlib
|
||||
|
||||
try:
|
||||
-from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
|
||||
+from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult
|
||||
except ImportError:
|
||||
try:
|
||||
-from modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
|
||||
+from modules.base import BaseModule, ModuleMetadata, ModuleResult
|
||||
except ImportError:
|
||||
-from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
|
||||
+from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@@ -0,0 +1,9 @@
|
||||
"""
|
||||
Atheris Fuzzing Workflow
|
||||
|
||||
Fuzzes user-provided Python code using Atheris.
|
||||
"""
|
||||
|
||||
from .workflow import AtherisFuzzingWorkflow
|
||||
|
||||
__all__ = ["AtherisFuzzingWorkflow"]
|
||||
@@ -0,0 +1,122 @@
|
||||
"""
|
||||
Atheris Fuzzing Workflow Activities
|
||||
|
||||
Activities specific to the Atheris fuzzing workflow.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import sys
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any
|
||||
import os
|
||||
|
||||
import httpx
|
||||
from temporalio import activity
|
||||
|
||||
# Configure logging
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Add toolbox to path for module imports
|
||||
sys.path.insert(0, '/app/toolbox')
|
||||
|
||||
|
||||
@activity.defn(name="fuzz_with_atheris")
|
||||
async def fuzz_activity(workspace_path: str, config: dict) -> dict:
|
||||
"""
|
||||
Fuzzing activity using the AtherisFuzzer module on user code.
|
||||
|
||||
This activity:
|
||||
1. Imports the reusable AtherisFuzzer module
|
||||
2. Sets up real-time stats callback
|
||||
3. Executes fuzzing on user's TestOneInput() function
|
||||
4. Returns findings as ModuleResult
|
||||
|
||||
Args:
|
||||
workspace_path: Path to the workspace directory (user's uploaded code)
|
||||
config: Fuzzer configuration (target_file, max_iterations, timeout_seconds)
|
||||
|
||||
Returns:
|
||||
Fuzzer results dictionary (findings, summary, metadata)
|
||||
"""
|
||||
logger.info(f"Activity: fuzz_with_atheris (workspace={workspace_path})")
|
||||
|
||||
try:
|
||||
# Import reusable AtherisFuzzer module
|
||||
from modules.fuzzer import AtherisFuzzer
|
||||
|
||||
workspace = Path(workspace_path)
|
||||
if not workspace.exists():
|
||||
raise FileNotFoundError(f"Workspace not found: {workspace_path}")
|
||||
|
||||
# Get activity info for real-time stats
|
||||
info = activity.info()
|
||||
run_id = info.workflow_id
|
||||
|
||||
# Define stats callback for real-time monitoring
|
||||
async def stats_callback(stats_data: Dict[str, Any]):
|
||||
"""Callback for live fuzzing statistics"""
|
||||
try:
|
||||
# Prepare stats payload for backend
|
||||
coverage_value = stats_data.get("coverage", 0)
|
||||
logger.debug(f"Coverage from stats_data: {coverage_value}")
|
||||
|
||||
stats_payload = {
|
||||
"run_id": run_id,
|
||||
"workflow": "atheris_fuzzing",
|
||||
"executions": stats_data.get("total_execs", 0),
|
||||
"executions_per_sec": stats_data.get("execs_per_sec", 0.0),
|
||||
"crashes": stats_data.get("crashes", 0),
|
||||
"unique_crashes": stats_data.get("crashes", 0),
|
||||
"coverage": coverage_value,
|
||||
"corpus_size": stats_data.get("corpus_size", 0),
|
||||
"elapsed_time": stats_data.get("elapsed_time", 0),
|
||||
"last_crash_time": None
|
||||
}
|
||||
|
||||
# POST stats to backend API for real-time monitoring
|
||||
backend_url = os.getenv("BACKEND_URL", "http://backend:8000")
|
||||
async with httpx.AsyncClient(timeout=5.0) as client:
|
||||
try:
|
||||
await client.post(
|
||||
f"{backend_url}/fuzzing/{run_id}/stats",
|
||||
json=stats_payload
|
||||
)
|
||||
except Exception as http_err:
|
||||
logger.debug(f"Failed to post stats to backend: {http_err}")
|
||||
|
||||
# Also log for debugging
|
||||
logger.info("LIVE_STATS", extra={
|
||||
"stats_type": "fuzzing_live_update",
|
||||
"workflow_type": "atheris_fuzzing",
|
||||
"run_id": run_id,
|
||||
"executions": stats_data.get("total_execs", 0),
|
||||
"executions_per_sec": stats_data.get("execs_per_sec", 0.0),
|
||||
"crashes": stats_data.get("crashes", 0),
|
||||
"corpus_size": stats_data.get("corpus_size", 0),
|
||||
"coverage": stats_data.get("coverage", 0.0),
|
||||
"elapsed_time": stats_data.get("elapsed_time", 0),
|
||||
"timestamp": datetime.utcnow().isoformat()
|
||||
})
|
||||
except Exception as e:
|
||||
logger.warning(f"Error in stats callback: {e}")
|
||||
|
||||
# Add stats callback and run_id to config
|
||||
config["stats_callback"] = stats_callback
|
||||
config["run_id"] = run_id
|
||||
|
||||
# Execute the fuzzer module
|
||||
fuzzer = AtherisFuzzer()
|
||||
result = await fuzzer.execute(config, workspace)
|
||||
|
||||
logger.info(
|
||||
f"✓ Fuzzing completed: "
|
||||
f"{result.summary.get('total_executions', 0)} executions, "
|
||||
f"{result.summary.get('crashes_found', 0)} crashes"
|
||||
)
|
||||
|
||||
return result.dict()
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Fuzzing failed: {e}", exc_info=True)
|
||||
raise
|
||||
@@ -0,0 +1,65 @@
name: atheris_fuzzing
version: "1.0.0"
vertical: python
description: "Fuzz Python code using Atheris with real-time monitoring. Automatically discovers and fuzzes TestOneInput() functions in user code."
author: "FuzzForge Team"
tags:
  - "fuzzing"
  - "atheris"
  - "python"
  - "coverage"
  - "security"

# Workspace isolation mode (system-level configuration)
# - "isolated" (default): Each workflow run gets its own isolated workspace (safe for concurrent fuzzing)
# - "shared": All runs share the same workspace (for read-only analysis workflows)
# - "copy-on-write": Download once, copy for each run (balances performance and isolation)
workspace_isolation: "isolated"

default_parameters:
  target_file: null
  max_iterations: 1000000
  timeout_seconds: 1800

parameters:
  type: object
  properties:
    target_file:
      type: string
      description: "Python file with TestOneInput() function (auto-discovered if not specified)"
    max_iterations:
      type: integer
      default: 1000000
      description: "Maximum fuzzing iterations"
    timeout_seconds:
      type: integer
      default: 1800
      description: "Fuzzing timeout in seconds (30 minutes)"

output_schema:
  type: object
  properties:
    findings:
      type: array
      description: "Crashes and vulnerabilities found during fuzzing"
      items:
        type: object
        properties:
          title:
            type: string
          severity:
            type: string
          category:
            type: string
          metadata:
            type: object
    summary:
      type: object
      description: "Fuzzing execution summary"
      properties:
        total_executions:
          type: integer
        crashes_found:
          type: integer
        execution_time:
          type: number
@@ -0,0 +1,175 @@
"""
Atheris Fuzzing Workflow - Temporal Version

Fuzzes user-provided Python code using Atheris with real-time monitoring.
"""

from datetime import timedelta
from typing import Dict, Any, Optional

from temporalio import workflow
from temporalio.common import RetryPolicy

# Import for type hints (will be executed by worker)
with workflow.unsafe.imports_passed_through():
    import logging

logger = logging.getLogger(__name__)


@workflow.defn
class AtherisFuzzingWorkflow:
    """
    Fuzz Python code using Atheris.

    User workflow:
    1. User runs: ff workflow run atheris_fuzzing .
    2. CLI uploads project to MinIO
    3. Worker downloads project
    4. Worker fuzzes TestOneInput() function
    5. Crashes reported as findings
    """

    @workflow.run
    async def run(
        self,
        target_id: str,  # MinIO UUID of uploaded user code
        target_file: Optional[str] = None,  # Optional: specific file to fuzz
        max_iterations: int = 1000000,
        timeout_seconds: int = 1800  # 30 minutes default for fuzzing
    ) -> Dict[str, Any]:
        """
        Main workflow execution.

        Args:
            target_id: UUID of the uploaded target in MinIO
            target_file: Optional specific Python file with TestOneInput() (auto-discovered if None)
            max_iterations: Maximum fuzzing iterations
            timeout_seconds: Fuzzing timeout in seconds

        Returns:
            Dictionary containing findings and summary
        """
        workflow_id = workflow.info().workflow_id

        workflow.logger.info(
            f"Starting AtherisFuzzingWorkflow "
            f"(workflow_id={workflow_id}, target_id={target_id}, "
            f"target_file={target_file or 'auto-discover'}, max_iterations={max_iterations}, "
            f"timeout_seconds={timeout_seconds})"
        )

        results = {
            "workflow_id": workflow_id,
            "target_id": target_id,
            "status": "running",
            "steps": []
        }

        try:
            # Get run ID for workspace isolation
            run_id = workflow.info().run_id

            # Step 1: Download user's project from MinIO
            workflow.logger.info("Step 1: Downloading user code from MinIO")
            target_path = await workflow.execute_activity(
                "get_target",
                args=[target_id, run_id, "isolated"],  # target_id, run_id, workspace_isolation
                start_to_close_timeout=timedelta(minutes=5),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=1),
                    maximum_interval=timedelta(seconds=30),
                    maximum_attempts=3
                )
            )
            results["steps"].append({
                "step": "download_target",
                "status": "success",
                "target_path": target_path
            })
            workflow.logger.info(f"✓ User code downloaded to: {target_path}")

            # Step 2: Run Atheris fuzzing
            workflow.logger.info("Step 2: Running Atheris fuzzing")

            # Use defaults if parameters are None
            actual_max_iterations = max_iterations if max_iterations is not None else 1000000
            actual_timeout_seconds = timeout_seconds if timeout_seconds is not None else 1800

            fuzz_config = {
                "target_file": target_file,
                "max_iterations": actual_max_iterations,
                "timeout_seconds": actual_timeout_seconds
            }

            fuzz_results = await workflow.execute_activity(
                "fuzz_with_atheris",
                args=[target_path, fuzz_config],
                start_to_close_timeout=timedelta(seconds=actual_timeout_seconds + 60),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=2),
                    maximum_interval=timedelta(seconds=60),
                    maximum_attempts=1  # Fuzzing shouldn't retry
                )
            )

            results["steps"].append({
                "step": "fuzzing",
                "status": "success",
                "executions": fuzz_results.get("summary", {}).get("total_executions", 0),
                "crashes": fuzz_results.get("summary", {}).get("crashes_found", 0)
            })
            workflow.logger.info(
                f"✓ Fuzzing completed: "
                f"{fuzz_results.get('summary', {}).get('total_executions', 0)} executions, "
                f"{fuzz_results.get('summary', {}).get('crashes_found', 0)} crashes"
            )

            # Step 3: Upload results to MinIO
            workflow.logger.info("Step 3: Uploading results")
            try:
                results_url = await workflow.execute_activity(
                    "upload_results",
                    args=[workflow_id, fuzz_results, "json"],
                    start_to_close_timeout=timedelta(minutes=2)
                )
                results["results_url"] = results_url
                workflow.logger.info(f"✓ Results uploaded to: {results_url}")
            except Exception as e:
                workflow.logger.warning(f"Failed to upload results: {e}")
                results["results_url"] = None

            # Step 4: Cleanup cache
            workflow.logger.info("Step 4: Cleaning up cache")
            try:
                await workflow.execute_activity(
                    "cleanup_cache",
                    args=[target_path, "isolated"],  # target_path, workspace_isolation
                    start_to_close_timeout=timedelta(minutes=1)
                )
                workflow.logger.info("✓ Cache cleaned up")
            except Exception as e:
                workflow.logger.warning(f"Cache cleanup failed: {e}")

            # Mark workflow as successful
            results["status"] = "success"
            results["findings"] = fuzz_results.get("findings", [])
            results["summary"] = fuzz_results.get("summary", {})
            results["sarif"] = fuzz_results.get("sarif") or {}
            workflow.logger.info(
                f"✓ Workflow completed successfully: {workflow_id} "
                f"({results['summary'].get('crashes_found', 0)} crashes found)"
            )

            return results

        except Exception as e:
            workflow.logger.error(f"Workflow failed: {e}")
            results["status"] = "error"
            results["error"] = str(e)
            results["steps"].append({
                "step": "error",
                "status": "failed",
                "error": str(e)
            })
            raise
@@ -0,0 +1,5 @@
"""Cargo Fuzzing Workflow"""

from .workflow import CargoFuzzingWorkflow

__all__ = ["CargoFuzzingWorkflow"]
@@ -0,0 +1,203 @@
"""
Cargo Fuzzing Workflow Activities

Activities specific to the cargo-fuzz fuzzing workflow.
"""

import logging
import sys
from datetime import datetime
from pathlib import Path
from typing import Dict, Any
import os

import httpx
from temporalio import activity

# Configure logging
logger = logging.getLogger(__name__)

# Add toolbox to path for module imports
sys.path.insert(0, '/app/toolbox')


@activity.defn(name="fuzz_with_cargo")
async def fuzz_activity(workspace_path: str, config: dict) -> dict:
    """
    Fuzzing activity using the CargoFuzzer module on user code.

    This activity:
    1. Imports the reusable CargoFuzzer module
    2. Sets up real-time stats callback
    3. Executes fuzzing on user's fuzz_target!() functions
    4. Returns findings as ModuleResult

    Args:
        workspace_path: Path to the workspace directory (user's uploaded Rust project)
        config: Fuzzer configuration (target_name, max_iterations, timeout_seconds, sanitizer)

    Returns:
        Fuzzer results dictionary (findings, summary, metadata)
    """
    logger.info(f"Activity: fuzz_with_cargo (workspace={workspace_path})")

    try:
        # Import reusable CargoFuzzer module
        from modules.fuzzer import CargoFuzzer

        workspace = Path(workspace_path)
        if not workspace.exists():
            raise FileNotFoundError(f"Workspace not found: {workspace_path}")

        # Get activity info for real-time stats
        info = activity.info()
        run_id = info.workflow_id

        # Define stats callback for real-time monitoring
        async def stats_callback(stats_data: Dict[str, Any]):
            """Callback for live fuzzing statistics"""
            try:
                # Prepare stats payload for backend
                coverage_value = stats_data.get("coverage", 0)

                stats_payload = {
                    "run_id": run_id,
                    "workflow": "cargo_fuzzing",
                    "executions": stats_data.get("total_execs", 0),
                    "executions_per_sec": stats_data.get("execs_per_sec", 0.0),
                    "crashes": stats_data.get("crashes", 0),
                    "unique_crashes": stats_data.get("crashes", 0),
                    "coverage": coverage_value,
                    "corpus_size": stats_data.get("corpus_size", 0),
                    "elapsed_time": stats_data.get("elapsed_time", 0),
                    "last_crash_time": None
                }

                # POST stats to backend API for real-time monitoring
                backend_url = os.getenv("BACKEND_URL", "http://backend:8000")
                async with httpx.AsyncClient(timeout=5.0) as client:
                    try:
                        await client.post(
                            f"{backend_url}/fuzzing/{run_id}/stats",
                            json=stats_payload
                        )
                    except Exception as http_err:
                        logger.debug(f"Failed to post stats to backend: {http_err}")

                # Also log for debugging
                logger.info("LIVE_STATS", extra={
                    "stats_type": "fuzzing_live_update",
                    "workflow_type": "cargo_fuzzing",
                    "run_id": run_id,
                    "executions": stats_data.get("total_execs", 0),
                    "executions_per_sec": stats_data.get("execs_per_sec", 0.0),
                    "crashes": stats_data.get("crashes", 0),
                    "corpus_size": stats_data.get("corpus_size", 0),
                    "coverage": stats_data.get("coverage", 0.0),
                    "elapsed_time": stats_data.get("elapsed_time", 0),
                    "timestamp": datetime.utcnow().isoformat()
                })

            except Exception as e:
                logger.error(f"Stats callback error: {e}")

        # Initialize CargoFuzzer module
        fuzzer = CargoFuzzer()

        # Execute fuzzing with stats callback
        module_result = await fuzzer.execute(
            config=config,
            workspace=workspace,
            stats_callback=stats_callback
        )

        # Convert ModuleResult to dictionary
        result_dict = {
            "findings": [],
            "summary": module_result.summary,
            "metadata": module_result.metadata,
            "status": module_result.status,
            "error": module_result.error
        }

        # Convert findings to dict format
        for finding in module_result.findings:
            finding_dict = {
                "id": finding.id,
                "title": finding.title,
                "description": finding.description,
                "severity": finding.severity,
                "category": finding.category,
                "file_path": finding.file_path,
                "line_start": finding.line_start,
                "line_end": finding.line_end,
                "code_snippet": finding.code_snippet,
                "recommendation": finding.recommendation,
                "metadata": finding.metadata
            }
            result_dict["findings"].append(finding_dict)

        # Generate SARIF report from findings
        if module_result.findings:
            # Convert findings to SARIF format
            severity_map = {
                "critical": "error",
                "high": "error",
                "medium": "warning",
                "low": "note",
                "info": "note"
            }

            results = []
            for finding in module_result.findings:
                result = {
                    "ruleId": finding.metadata.get("rule_id", finding.category),
                    "level": severity_map.get(finding.severity, "warning"),
                    "message": {"text": finding.description},
                    "locations": []
                }

                if finding.file_path:
                    location = {
                        "physicalLocation": {
                            "artifactLocation": {"uri": finding.file_path},
                            "region": {
                                "startLine": finding.line_start or 1,
                                "endLine": finding.line_end or finding.line_start or 1
                            }
                        }
                    }
                    result["locations"].append(location)

                results.append(result)

            result_dict["sarif"] = {
                "version": "2.1.0",
                "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
                "runs": [{
                    "tool": {
                        "driver": {
                            "name": "cargo-fuzz",
                            "version": "0.11.2"
                        }
                    },
                    "results": results
                }]
            }
        else:
            result_dict["sarif"] = {
                "version": "2.1.0",
                "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
                "runs": []
            }

        logger.info(
            f"Fuzzing activity completed: {len(module_result.findings)} crashes found, "
            f"{module_result.summary.get('total_executions', 0)} executions"
        )

        return result_dict

    except Exception as e:
        logger.error(f"Fuzzing activity failed: {e}", exc_info=True)
        raise
@@ -0,0 +1,71 @@
name: cargo_fuzzing
version: "1.0.0"
vertical: rust
description: "Fuzz Rust code using cargo-fuzz with real-time monitoring. Automatically discovers and fuzzes fuzz_target!() functions in user code."
author: "FuzzForge Team"
tags:
  - "fuzzing"
  - "cargo-fuzz"
  - "rust"
  - "libfuzzer"
  - "memory-safety"

# Workspace isolation mode (system-level configuration)
# - "isolated" (default): Each workflow run gets its own isolated workspace (safe for concurrent fuzzing)
# - "shared": All runs share the same workspace (for read-only analysis workflows)
# - "copy-on-write": Download once, copy for each run (balances performance and isolation)
workspace_isolation: "isolated"

default_parameters:
  target_name: null
  max_iterations: 1000000
  timeout_seconds: 1800
  sanitizer: "address"

parameters:
  type: object
  properties:
    target_name:
      type: string
      description: "Fuzz target name from fuzz/fuzz_targets/ (auto-discovered if not specified)"
    max_iterations:
      type: integer
      default: 1000000
      description: "Maximum fuzzing iterations"
    timeout_seconds:
      type: integer
      default: 1800
      description: "Fuzzing timeout in seconds (30 minutes)"
    sanitizer:
      type: string
      enum: ["address", "memory", "undefined"]
      default: "address"
      description: "Sanitizer to use (address, memory, undefined)"

output_schema:
  type: object
  properties:
    findings:
      type: array
      description: "Crashes and memory safety issues found during fuzzing"
      items:
        type: object
        properties:
          title:
            type: string
          severity:
            type: string
          category:
            type: string
          metadata:
            type: object
    summary:
      type: object
      description: "Fuzzing execution summary"
      properties:
        total_executions:
          type: integer
        crashes_found:
          type: integer
        execution_time:
          type: number
@@ -0,0 +1,180 @@
"""
Cargo Fuzzing Workflow - Temporal Version

Fuzzes user-provided Rust code using cargo-fuzz with real-time monitoring.
"""

from datetime import timedelta
from typing import Dict, Any, Optional

from temporalio import workflow
from temporalio.common import RetryPolicy

# Import for type hints (will be executed by worker)
with workflow.unsafe.imports_passed_through():
    import logging

logger = logging.getLogger(__name__)


@workflow.defn
class CargoFuzzingWorkflow:
    """
    Fuzz Rust code using cargo-fuzz (libFuzzer).

    User workflow:
    1. User runs: ff workflow run cargo_fuzzing .
    2. CLI uploads Rust project to MinIO
    3. Worker downloads project
    4. Worker discovers fuzz targets in fuzz/fuzz_targets/
    5. Worker fuzzes the target with cargo-fuzz
    6. Crashes reported as findings
    """

    @workflow.run
    async def run(
        self,
        target_id: str,  # MinIO UUID of uploaded user code
        target_name: Optional[str] = None,  # Optional: specific fuzz target name
        max_iterations: int = 1000000,
        timeout_seconds: int = 1800,  # 30 minutes default for fuzzing
        sanitizer: str = "address"
    ) -> Dict[str, Any]:
        """
        Main workflow execution.

        Args:
            target_id: UUID of the uploaded target in MinIO
            target_name: Optional specific fuzz target name (auto-discovered if None)
            max_iterations: Maximum fuzzing iterations
            timeout_seconds: Fuzzing timeout in seconds
            sanitizer: Sanitizer to use (address, memory, undefined)

        Returns:
            Dictionary containing findings and summary
        """
        workflow_id = workflow.info().workflow_id

        workflow.logger.info(
            f"Starting CargoFuzzingWorkflow "
            f"(workflow_id={workflow_id}, target_id={target_id}, "
            f"target_name={target_name or 'auto-discover'}, max_iterations={max_iterations}, "
            f"timeout_seconds={timeout_seconds}, sanitizer={sanitizer})"
        )

        results = {
            "workflow_id": workflow_id,
            "target_id": target_id,
            "status": "running",
            "steps": []
        }

        try:
            # Get run ID for workspace isolation
            run_id = workflow.info().run_id

            # Step 1: Download user's Rust project from MinIO
            workflow.logger.info("Step 1: Downloading user code from MinIO")
            target_path = await workflow.execute_activity(
                "get_target",
                args=[target_id, run_id, "isolated"],  # target_id, run_id, workspace_isolation
                start_to_close_timeout=timedelta(minutes=5),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=1),
                    maximum_interval=timedelta(seconds=30),
                    maximum_attempts=3
                )
            )
            results["steps"].append({
                "step": "download_target",
                "status": "success",
                "target_path": target_path
            })
            workflow.logger.info(f"✓ User code downloaded to: {target_path}")

            # Step 2: Run cargo-fuzz
            workflow.logger.info("Step 2: Running cargo-fuzz")

            # Use defaults if parameters are None
            actual_max_iterations = max_iterations if max_iterations is not None else 1000000
            actual_timeout_seconds = timeout_seconds if timeout_seconds is not None else 1800
            actual_sanitizer = sanitizer if sanitizer is not None else "address"

            fuzz_config = {
                "target_name": target_name,
                "max_iterations": actual_max_iterations,
                "timeout_seconds": actual_timeout_seconds,
                "sanitizer": actual_sanitizer
            }

            fuzz_results = await workflow.execute_activity(
                "fuzz_with_cargo",
                args=[target_path, fuzz_config],
                start_to_close_timeout=timedelta(seconds=actual_timeout_seconds + 120),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=2),
                    maximum_interval=timedelta(seconds=60),
                    maximum_attempts=1  # Fuzzing shouldn't retry
                )
            )

            results["steps"].append({
                "step": "fuzzing",
                "status": "success",
                "executions": fuzz_results.get("summary", {}).get("total_executions", 0),
                "crashes": fuzz_results.get("summary", {}).get("crashes_found", 0)
            })
            workflow.logger.info(
                f"✓ Fuzzing completed: "
                f"{fuzz_results.get('summary', {}).get('total_executions', 0)} executions, "
                f"{fuzz_results.get('summary', {}).get('crashes_found', 0)} crashes"
            )

            # Step 3: Upload results to MinIO
            workflow.logger.info("Step 3: Uploading results")
            try:
                results_url = await workflow.execute_activity(
                    "upload_results",
                    args=[workflow_id, fuzz_results, "json"],
                    start_to_close_timeout=timedelta(minutes=2)
                )
                results["results_url"] = results_url
                workflow.logger.info(f"✓ Results uploaded to: {results_url}")
            except Exception as e:
                workflow.logger.warning(f"Failed to upload results: {e}")
                results["results_url"] = None

            # Step 4: Cleanup cache
            workflow.logger.info("Step 4: Cleaning up cache")
            try:
                await workflow.execute_activity(
                    "cleanup_cache",
                    args=[target_path, "isolated"],  # target_path, workspace_isolation
                    start_to_close_timeout=timedelta(minutes=1)
                )
                workflow.logger.info("✓ Cache cleaned up")
            except Exception as e:
                workflow.logger.warning(f"Cache cleanup failed: {e}")

            # Mark workflow as successful
            results["status"] = "success"
            results["findings"] = fuzz_results.get("findings", [])
            results["summary"] = fuzz_results.get("summary", {})
            results["sarif"] = fuzz_results.get("sarif") or {}
            workflow.logger.info(
                f"✓ Workflow completed successfully: {workflow_id} "
                f"({results['summary'].get('crashes_found', 0)} crashes found)"
            )

            return results

        except Exception as e:
            workflow.logger.error(f"Workflow failed: {e}")
            results["status"] = "error"
            results["error"] = str(e)
            results["steps"].append({
                "step": "error",
                "status": "failed",
                "error": str(e)
            })
            raise
@@ -1,12 +0,0 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

@@ -1,47 +0,0 @@
# Secret Detection Workflow Dockerfile
FROM prefecthq/prefect:3-python3.11

# Install system dependencies
RUN apt-get update && apt-get install -y \
    curl \
    wget \
    git \
    ca-certificates \
    gnupg \
    && rm -rf /var/lib/apt/lists/*

# Install TruffleHog (use direct binary download to avoid install script issues)
RUN curl -sSfL "https://github.com/trufflesecurity/trufflehog/releases/download/v3.63.2/trufflehog_3.63.2_linux_amd64.tar.gz" -o trufflehog.tar.gz \
    && tar -xzf trufflehog.tar.gz \
    && mv trufflehog /usr/local/bin/ \
    && rm trufflehog.tar.gz

# Install Gitleaks (use specific version to avoid API rate limiting)
RUN wget https://github.com/gitleaks/gitleaks/releases/download/v8.18.2/gitleaks_8.18.2_linux_x64.tar.gz \
    && tar -xzf gitleaks_8.18.2_linux_x64.tar.gz \
    && mv gitleaks /usr/local/bin/ \
    && rm gitleaks_8.18.2_linux_x64.tar.gz

# Verify installations
RUN trufflehog --version && gitleaks version

# Set working directory
WORKDIR /opt/prefect

# Create toolbox directory structure
RUN mkdir -p /opt/prefect/toolbox

# Set environment variables
ENV PYTHONPATH=/opt/prefect/toolbox:/opt/prefect/toolbox/workflows
ENV WORKFLOW_NAME=secret_detection_scan

# The toolbox code will be mounted at runtime from the backend container
# This includes:
# - /opt/prefect/toolbox/modules/base.py
# - /opt/prefect/toolbox/modules/secret_detection/ (TruffleHog, Gitleaks modules)
# - /opt/prefect/toolbox/modules/reporter/ (SARIF reporter)
# - /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan/
VOLUME /opt/prefect/toolbox

# Set working directory for execution
WORKDIR /opt/prefect
@@ -1,58 +0,0 @@
# Secret Detection Workflow Dockerfile - Self-Contained Version
# This version copies all required modules into the image for complete isolation
FROM prefecthq/prefect:3-python3.11

# Install system dependencies
RUN apt-get update && apt-get install -y \
    curl \
    wget \
    git \
    ca-certificates \
    gnupg \
    && rm -rf /var/lib/apt/lists/*

# Install TruffleHog
RUN curl -sSfL https://raw.githubusercontent.com/trufflesecurity/trufflehog/main/scripts/install.sh | sh -s -- -b /usr/local/bin

# Install Gitleaks
RUN wget https://github.com/gitleaks/gitleaks/releases/latest/download/gitleaks_linux_x64.tar.gz \
    && tar -xzf gitleaks_linux_x64.tar.gz \
    && mv gitleaks /usr/local/bin/ \
    && rm gitleaks_linux_x64.tar.gz

# Verify installations
RUN trufflehog --version && gitleaks version

# Set working directory
WORKDIR /opt/prefect

# Create directory structure
RUN mkdir -p /opt/prefect/toolbox/modules/secret_detection \
    /opt/prefect/toolbox/modules/reporter \
    /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan

# Copy the base module and required modules
COPY toolbox/modules/base.py /opt/prefect/toolbox/modules/base.py
COPY toolbox/modules/__init__.py /opt/prefect/toolbox/modules/__init__.py
COPY toolbox/modules/secret_detection/ /opt/prefect/toolbox/modules/secret_detection/
COPY toolbox/modules/reporter/ /opt/prefect/toolbox/modules/reporter/

# Copy the workflow code
COPY toolbox/workflows/comprehensive/secret_detection_scan/ /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan/

# Copy toolbox init files
COPY toolbox/__init__.py /opt/prefect/toolbox/__init__.py
COPY toolbox/workflows/__init__.py /opt/prefect/toolbox/workflows/__init__.py
COPY toolbox/workflows/comprehensive/__init__.py /opt/prefect/toolbox/workflows/comprehensive/__init__.py

# Install Python dependencies for the modules
RUN pip install --no-cache-dir \
    pydantic \
    asyncio

# Set environment variables
ENV PYTHONPATH=/opt/prefect/toolbox:/opt/prefect/toolbox/workflows
ENV WORKFLOW_NAME=secret_detection_scan

# Set default command (can be overridden)
CMD ["python", "-m", "toolbox.workflows.comprehensive.secret_detection_scan.workflow"]
@@ -1,130 +0,0 @@
# Secret Detection Scan Workflow

This workflow performs comprehensive secret detection using multiple industry-standard tools:

- **TruffleHog**: Comprehensive secret detection with verification capabilities
- **Gitleaks**: Git-specific secret scanning and leak detection

## Features

- **Parallel Execution**: Runs TruffleHog and Gitleaks concurrently for faster results
- **Deduplication**: Automatically removes duplicate findings across tools
- **SARIF Output**: Generates standardized SARIF reports for integration with security tools
- **Configurable**: Supports extensive configuration for both tools
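
The deduplication feature listed above could be sketched roughly as follows. This is a minimal illustration, not the workflow's actual logic: the function name `dedupe_findings` and the finding keys `file_path`, `line_start`, and `secret_hash` are assumptions chosen for the example.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple


def dedupe_findings(findings: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first finding seen for each (file, line, secret) key.

    The key fields are hypothetical; a real implementation would use
    whatever identifying fields both tools report.
    """
    seen: Set[Tuple[str, int, str]] = set()
    unique: List[Dict[str, Any]] = []
    for finding in findings:
        key = (
            finding.get("file_path", ""),
            finding.get("line_start", 0),
            finding.get("secret_hash", ""),
        )
        if key in seen:
            continue  # same secret already reported by another tool
        seen.add(key)
        unique.append(finding)
    return unique
```

Keeping the first occurrence preserves input order, so whichever tool's results are passed in first wins when both report the same secret.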
|
||||
|
||||
## Dependencies

### Required Modules
- `toolbox.modules.secret_detection.trufflehog`
- `toolbox.modules.secret_detection.gitleaks`
- `toolbox.modules.reporter` (SARIF reporter)
- `toolbox.modules.base` (Base module interface)

### External Tools
- TruffleHog v3.63.2+
- Gitleaks v8.18.0+

## Docker Deployment

This workflow provides two Docker deployment approaches:

### 1. Volume-Based Approach (Default: `Dockerfile`)

**Advantages:**
- Live code updates without rebuilding images
- Smaller image sizes
- Consistent module versions across workflows
- Faster development iteration

**How it works:**
- Docker image contains only external tools (TruffleHog, Gitleaks)
- Python modules are mounted at runtime from the backend container
- Backend manages code synchronization via shared volumes

### 2. Self-Contained Approach (`Dockerfile.self-contained`)

**Advantages:**
- Complete isolation and reproducibility
- No runtime dependencies on backend code
- Can run independently of FuzzForge platform
- Better for CI/CD integration

**How it works:**
- All required Python modules are copied into the Docker image
- Image is completely self-contained
- Larger image size but fully portable

## Configuration

### TruffleHog Configuration

```json
{
  "trufflehog_config": {
    "verify": true,              // Verify discovered secrets
    "concurrency": 10,           // Number of concurrent workers
    "max_depth": 10,             // Maximum directory depth
    "include_detectors": [],     // Specific detectors to include
    "exclude_detectors": []      // Specific detectors to exclude
  }
}
```

### Gitleaks Configuration

```json
{
  "gitleaks_config": {
    "scan_mode": "detect",           // "detect" or "protect"
    "redact": true,                  // Redact secrets in output
    "max_target_megabytes": 100,     // Maximum file size (MB)
    "no_git": false,                 // Scan without Git context
    "config_file": "",               // Custom Gitleaks config
    "baseline_file": ""              // Baseline file for known findings
  }
}
```

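User-supplied options are merged over the workflow's defaults, so a request only needs to list the keys it changes. The sketch below shows that merge for Gitleaks, using the default values set in the workflow source; the helper name `merge_config` is hypothetical.

```python
from typing import Any, Dict, Optional

# Gitleaks defaults as set in the workflow source.
DEFAULT_GITLEAKS_CONFIG: Dict[str, Any] = {
    "scan_mode": "detect",
    "redact": True,
    "max_target_megabytes": 100,
    "no_git": True,
}


def merge_config(defaults: Dict[str, Any], overrides: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a new dict in which keys from overrides replace the defaults."""
    return {**defaults, **(overrides or {})}


# Only the overridden key changes; every other default is kept.
merged = merge_config(DEFAULT_GITLEAKS_CONFIG, {"max_target_megabytes": 200})
untouched = merge_config(DEFAULT_GITLEAKS_CONFIG, None)
```

The merge is shallow: a nested dictionary passed as an override would replace the default value wholesale rather than being merged key by key.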
## Usage Example

```bash
curl -X POST "http://localhost:8000/workflows/secret_detection_scan/submit" \
  -H "Content-Type: application/json" \
  -d '{
    "target_path": "/path/to/scan",
    "volume_mode": "ro",
    "parameters": {
      "trufflehog_config": {
        "verify": true,
        "concurrency": 15
      },
      "gitleaks_config": {
        "scan_mode": "detect",
        "max_target_megabytes": 200
      }
    }
  }'
```

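The same submission can be prepared from Python. This is a sketch using only the standard library; it builds the request for the endpoint shown above but leaves the actual send commented out, since it assumes a backend listening on `localhost:8000`.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/workflows/secret_detection_scan/submit"

payload = {
    "target_path": "/path/to/scan",
    "volume_mode": "ro",
    "parameters": {
        "trufflehog_config": {"verify": True, "concurrency": 15},
        "gitleaks_config": {"scan_mode": "detect", "max_target_megabytes": 200},
    },
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending is left commented out so the snippet runs without a live backend:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```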
## Output Format

The workflow generates a SARIF report containing:
- All unique findings from both tools
- Severity levels mapped to standard scale
- File locations and line numbers
- Detailed descriptions and recommendations
- Tool-specific metadata

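To make the report shape concrete, here is a minimal sketch of wrapping findings in a SARIF 2.1.0 document. The finding keys (`rule_id`, `severity`, `title`, `file_path`, `line_start`) and the severity-to-level mapping are assumptions for illustration; the actual SARIF reporter module may map fields differently.

```python
from typing import Any, Dict, List

# Assumed mapping from finding severity to SARIF result levels.
SEVERITY_TO_LEVEL = {"critical": "error", "high": "error", "medium": "warning", "low": "note"}


def to_sarif(findings: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap a list of findings in a single-run SARIF 2.1.0 document."""
    results = []
    for finding in findings:
        results.append({
            "ruleId": finding.get("rule_id", "unknown"),
            "level": SEVERITY_TO_LEVEL.get(finding.get("severity", ""), "warning"),
            "message": {"text": finding.get("title", "")},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": finding.get("file_path", "")},
                    "region": {"startLine": finding.get("line_start", 1)},
                }
            }],
        })
    return {
        "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "FuzzForge Secret Detection", "version": "1.0.0"}},
            "results": results,
        }],
    }


# Hypothetical finding used only to show the resulting structure.
report = to_sarif([
    {"rule_id": "aws-key", "severity": "high", "title": "AWS Access Key",
     "file_path": "config.py", "line_start": 12},
])
```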
## Performance Considerations

- **TruffleHog**: CPU-intensive with verification enabled
- **Gitleaks**: Memory-intensive for large repositories
- **Recommended Resources**: 512Mi memory, 500m CPU
- **Typical Runtime**: 1-5 minutes for small repos, 10-30 minutes for large ones

## Security Notes

- Secrets are redacted in output by default
- Verified secrets are marked with higher severity
- Both tools support custom rules and exclusions
- Consider using baseline files for known false positives
@@ -1,17 +0,0 @@
"""
Secret Detection Scan Workflow

This package contains the comprehensive secret detection workflow that combines
multiple secret detection tools for thorough analysis.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

@@ -1,113 +0,0 @@
name: secret_detection_scan
version: "2.0.0"
description: "Comprehensive secret detection using TruffleHog and Gitleaks"
author: "FuzzForge Team"
category: "comprehensive"
tags:
  - "secrets"
  - "credentials"
  - "detection"
  - "trufflehog"
  - "gitleaks"
  - "comprehensive"

supported_volume_modes:
  - "ro"
  - "rw"

default_volume_mode: "ro"
default_target_path: "/workspace"

requirements:
  tools:
    - "trufflehog"
    - "gitleaks"
  resources:
    memory: "512Mi"
    cpu: "500m"
    timeout: 1800

has_docker: true

default_parameters:
  target_path: "/workspace"
  volume_mode: "ro"
  trufflehog_config: {}
  gitleaks_config: {}
  reporter_config: {}

parameters:
  type: object
  properties:
    target_path:
      type: string
      default: "/workspace"
      description: "Path to analyze"
    volume_mode:
      type: string
      enum: ["ro", "rw"]
      default: "ro"
      description: "Volume mount mode"
    trufflehog_config:
      type: object
      description: "TruffleHog configuration"
      properties:
        verify:
          type: boolean
          description: "Verify discovered secrets"
        concurrency:
          type: integer
          description: "Number of concurrent workers"
        max_depth:
          type: integer
          description: "Maximum directory depth to scan"
        include_detectors:
          type: array
          items:
            type: string
          description: "Specific detectors to include"
        exclude_detectors:
          type: array
          items:
            type: string
          description: "Specific detectors to exclude"
    gitleaks_config:
      type: object
      description: "Gitleaks configuration"
      properties:
        scan_mode:
          type: string
          enum: ["detect", "protect"]
          description: "Scan mode"
        redact:
          type: boolean
          description: "Redact secrets in output"
        max_target_megabytes:
          type: integer
          description: "Maximum file size to scan (MB)"
        no_git:
          type: boolean
          description: "Scan files without Git context"
        config_file:
          type: string
          description: "Path to custom configuration file"
        baseline_file:
          type: string
          description: "Path to baseline file"
    reporter_config:
      type: object
      description: "SARIF reporter configuration"
      properties:
        output_file:
          type: string
          description: "Output SARIF file name"
        include_code_flows:
          type: boolean
          description: "Include code flow information"

output_schema:
  type: object
  properties:
    sarif:
      type: object
      description: "SARIF-formatted security findings"
@@ -1,290 +0,0 @@
"""
Secret Detection Scan Workflow

This workflow performs comprehensive secret detection using multiple tools:
- TruffleHog: Comprehensive secret detection with verification
- Gitleaks: Git-specific secret scanning
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import sys
import logging
from pathlib import Path
from typing import Dict, Any, List, Optional
from prefect import flow, task
from prefect.artifacts import create_markdown_artifact, create_table_artifact
import asyncio
import json

# Add modules to path
sys.path.insert(0, '/app')

# Import modules
from toolbox.modules.secret_detection.trufflehog import TruffleHogModule
from toolbox.modules.secret_detection.gitleaks import GitleaksModule
from toolbox.modules.reporter import SARIFReporter

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@task(name="trufflehog_scan")
async def run_trufflehog_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
    """
    Task to run TruffleHog secret detection.

    Args:
        workspace: Path to the workspace
        config: TruffleHog configuration

    Returns:
        TruffleHog results
    """
    logger.info("Running TruffleHog secret detection")
    module = TruffleHogModule()
    result = await module.execute(config, workspace)
    logger.info(f"TruffleHog completed: {result.summary.get('total_secrets', 0)} secrets found")
    return result.dict()


@task(name="gitleaks_scan")
async def run_gitleaks_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
    """
    Task to run Gitleaks secret detection.

    Args:
        workspace: Path to the workspace
        config: Gitleaks configuration

    Returns:
        Gitleaks results
    """
    logger.info("Running Gitleaks secret detection")
    module = GitleaksModule()
    result = await module.execute(config, workspace)
    logger.info(f"Gitleaks completed: {result.summary.get('total_leaks', 0)} leaks found")
    return result.dict()


@task(name="aggregate_findings")
async def aggregate_findings_task(
    trufflehog_results: Dict[str, Any],
    gitleaks_results: Dict[str, Any],
    config: Dict[str, Any],
    workspace: Path
) -> Dict[str, Any]:
    """
    Task to aggregate findings from all secret detection tools.

    Args:
        trufflehog_results: Results from TruffleHog
        gitleaks_results: Results from Gitleaks
        config: Reporter configuration
        workspace: Path to workspace

    Returns:
        Aggregated SARIF report
    """
    logger.info("Aggregating secret detection findings")

    # Combine all findings
    all_findings = []

    # Add TruffleHog findings
    trufflehog_findings = trufflehog_results.get("findings", [])
    all_findings.extend(trufflehog_findings)

    # Add Gitleaks findings
    gitleaks_findings = gitleaks_results.get("findings", [])
    all_findings.extend(gitleaks_findings)

    # Deduplicate findings based on file path and line number
    unique_findings = []
    seen_signatures = set()

    for finding in all_findings:
        # Create signature for deduplication
        signature = (
            finding.get("file_path", ""),
            finding.get("line_start", 0),
            finding.get("title", "").lower()[:50]  # First 50 chars of title
        )

        if signature not in seen_signatures:
            seen_signatures.add(signature)
            unique_findings.append(finding)
        else:
            logger.debug(f"Deduplicated finding: {signature}")

    logger.info(f"Aggregated {len(unique_findings)} unique findings from {len(all_findings)} total")

    # Generate SARIF report
    reporter = SARIFReporter()
    reporter_config = {
        **config,
        "findings": unique_findings,
        "tool_name": "FuzzForge Secret Detection",
        "tool_version": "1.0.0",
        "tool_description": "Comprehensive secret detection using TruffleHog and Gitleaks"
    }

    result = await reporter.execute(reporter_config, workspace)
    return result.dict().get("sarif", {})


@flow(name="secret_detection_scan", log_prints=True)
async def main_flow(
    target_path: str = "/workspace",
    volume_mode: str = "ro",
    trufflehog_config: Optional[Dict[str, Any]] = None,
    gitleaks_config: Optional[Dict[str, Any]] = None,
    reporter_config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """
    Main secret detection workflow.

    This workflow:
    1. Runs TruffleHog for comprehensive secret detection
    2. Runs Gitleaks for Git-specific secret detection
    3. Aggregates and deduplicates findings
    4. Generates a unified SARIF report

    Args:
        target_path: Path to the mounted workspace (default: /workspace)
        volume_mode: Volume mount mode (ro/rw)
        trufflehog_config: Configuration for TruffleHog
        gitleaks_config: Configuration for Gitleaks
        reporter_config: Configuration for SARIF reporter

    Returns:
        SARIF-formatted findings report
    """
    logger.info("Starting comprehensive secret detection workflow")
    logger.info(f"Workspace: {target_path}, Mode: {volume_mode}")

    # Set workspace path
    workspace = Path(target_path)

    if not workspace.exists():
        logger.error(f"Workspace does not exist: {workspace}")
        return {
            "error": f"Workspace not found: {workspace}",
            "sarif": None
        }

    # Default configurations - merge with provided configs to ensure defaults are always applied
    default_trufflehog_config = {
        "verify": False,
        "concurrency": 10,
        "max_depth": 10,
        "no_git": True  # Add no_git for filesystem scanning
    }
    trufflehog_config = {**default_trufflehog_config, **(trufflehog_config or {})}

    default_gitleaks_config = {
        "scan_mode": "detect",
        "redact": True,
        "max_target_megabytes": 100,
        "no_git": True  # Critical for non-git directories
    }
    gitleaks_config = {**default_gitleaks_config, **(gitleaks_config or {})}

    default_reporter_config = {
        "include_code_flows": False
    }
    reporter_config = {**default_reporter_config, **(reporter_config or {})}

    try:
        # Run secret detection tools in parallel
        logger.info("Phase 1: Running secret detection tools")

        # Create tasks for parallel execution
        trufflehog_task_result = run_trufflehog_task(workspace, trufflehog_config)
        gitleaks_task_result = run_gitleaks_task(workspace, gitleaks_config)

        # Wait for both to complete
        trufflehog_results, gitleaks_results = await asyncio.gather(
            trufflehog_task_result,
            gitleaks_task_result,
            return_exceptions=True
        )

        # Handle any exceptions
        if isinstance(trufflehog_results, Exception):
            logger.error(f"TruffleHog failed: {trufflehog_results}")
            trufflehog_results = {"findings": [], "status": "failed"}

        if isinstance(gitleaks_results, Exception):
            logger.error(f"Gitleaks failed: {gitleaks_results}")
            gitleaks_results = {"findings": [], "status": "failed"}

        # Aggregate findings
        logger.info("Phase 2: Aggregating findings")
        sarif_report = await aggregate_findings_task(
            trufflehog_results,
            gitleaks_results,
            reporter_config,
            workspace
        )

        # Log summary
        if sarif_report and "runs" in sarif_report:
            results_count = len(sarif_report["runs"][0].get("results", []))
            logger.info(f"Workflow completed successfully with {results_count} unique secret findings")

            # Log tool-specific stats
            trufflehog_count = len(trufflehog_results.get("findings", []))
            gitleaks_count = len(gitleaks_results.get("findings", []))
            logger.info(f"Tool results - TruffleHog: {trufflehog_count}, Gitleaks: {gitleaks_count}")
        else:
            logger.info("Workflow completed successfully with no findings")

        return sarif_report

    except Exception as e:
        logger.error(f"Secret detection workflow failed: {e}")
        # Return error in SARIF format
        return {
            "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
            "version": "2.1.0",
            "runs": [
                {
                    "tool": {
                        "driver": {
                            "name": "FuzzForge Secret Detection",
                            "version": "1.0.0"
                        }
                    },
                    "results": [],
                    "invocations": [
                        {
                            "executionSuccessful": False,
                            "exitCode": 1,
                            "exitCodeDescription": str(e)
                        }
                    ]
                }
            ]
        }


if __name__ == "__main__":
    # For local testing
    import asyncio

    asyncio.run(main_flow(
        target_path="/tmp/test",
        trufflehog_config={"verify": True, "max_depth": 5},
        gitleaks_config={"scan_mode": "detect"}
    ))
@@ -0,0 +1,113 @@
name: ossfuzz_campaign
version: "1.0.0"
vertical: ossfuzz
description: "Generic OSS-Fuzz fuzzing campaign. Automatically reads project configuration from OSS-Fuzz repo and runs fuzzing using Google's infrastructure."
author: "FuzzForge Team"
tags:
  - "fuzzing"
  - "oss-fuzz"
  - "libfuzzer"
  - "afl"
  - "honggfuzz"
  - "memory-safety"
  - "security"

# Workspace isolation mode
# OSS-Fuzz campaigns use isolated mode for safe concurrent campaigns
workspace_isolation: "isolated"

default_parameters:
  project_name: null
  campaign_duration_hours: 1
  override_engine: null
  override_sanitizer: null
  max_iterations: null

parameters:
  type: object
  required:
    - project_name
  properties:
    project_name:
      type: string
      description: "OSS-Fuzz project name (e.g., 'curl', 'sqlite3', 'libxml2')"
      examples:
        - "curl"
        - "sqlite3"
        - "libxml2"
        - "openssl"
        - "zlib"

    campaign_duration_hours:
      type: integer
      default: 1
      minimum: 1
      maximum: 168  # 1 week max
      description: "How many hours to run the fuzzing campaign"

    override_engine:
      type: string
      enum: ["libfuzzer", "afl", "honggfuzz"]
      description: "Override fuzzing engine from project.yaml (optional)"

    override_sanitizer:
      type: string
      enum: ["address", "memory", "undefined", "dataflow"]
      description: "Override sanitizer from project.yaml (optional)"

    max_iterations:
      type: integer
      minimum: 1000
      description: "Optional limit on fuzzing iterations (optional)"

output_schema:
  type: object
  properties:
    project_name:
      type: string
      description: "OSS-Fuzz project that was fuzzed"

    summary:
      type: object
      description: "Campaign execution summary"
      properties:
        total_executions:
          type: integer
        crashes_found:
          type: integer
        unique_crashes:
          type: integer
        duration_hours:
          type: number
        engine_used:
          type: string
        sanitizer_used:
          type: string

    crashes:
      type: array
      description: "List of crash file paths"
      items:
        type: string

    sarif:
      type: object
      description: "SARIF-formatted crash reports (future)"

examples:
  - name: "Fuzz curl for 1 hour"
    parameters:
      project_name: "curl"
      campaign_duration_hours: 1

  - name: "Fuzz sqlite3 with AFL"
    parameters:
      project_name: "sqlite3"
      campaign_duration_hours: 2
      override_engine: "afl"

  - name: "Fuzz libxml2 with memory sanitizer"
    parameters:
      project_name: "libxml2"
      campaign_duration_hours: 6
      override_sanitizer: "memory"
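The constraints in the `parameters` schema above (required `project_name`, duration between 1 and 168 hours, fixed engine and sanitizer choices, at least 1000 iterations) can be checked before a campaign is submitted. The following is a hand-rolled sketch of such a check using only the standard library; the function name `validate_campaign_parameters` is hypothetical, and it is not how FuzzForge itself validates input.

```python
from typing import Any, Dict, List

ENGINES = {"libfuzzer", "afl", "honggfuzz"}
SANITIZERS = {"address", "memory", "undefined", "dataflow"}


def _is_int(value: Any) -> bool:
    """True for real integers; bool is excluded even though it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)


def validate_campaign_parameters(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the parameters look valid."""
    errors: List[str] = []

    if not isinstance(params.get("project_name"), str):
        errors.append("project_name is required and must be a string")

    hours = params.get("campaign_duration_hours", 1)
    if not _is_int(hours) or not 1 <= hours <= 168:
        errors.append("campaign_duration_hours must be an integer between 1 and 168")

    engine = params.get("override_engine")
    if engine is not None and engine not in ENGINES:
        errors.append(f"override_engine must be one of {sorted(ENGINES)}")

    sanitizer = params.get("override_sanitizer")
    if sanitizer is not None and sanitizer not in SANITIZERS:
        errors.append(f"override_sanitizer must be one of {sorted(SANITIZERS)}")

    iterations = params.get("max_iterations")
    if iterations is not None and (not _is_int(iterations) or iterations < 1000):
        errors.append("max_iterations must be an integer of at least 1000")

    return errors


valid_errors = validate_campaign_parameters({"project_name": "curl", "campaign_duration_hours": 1})
invalid_errors = validate_campaign_parameters({"campaign_duration_hours": 200, "override_engine": "foo"})
```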
@@ -0,0 +1,219 @@
"""
OSS-Fuzz Campaign Workflow - Temporal Version

Generic workflow for running OSS-Fuzz campaigns using Google's infrastructure.
Automatically reads project configuration from OSS-Fuzz project.yaml files.
"""

import asyncio
from datetime import timedelta
from typing import Dict, Any, Optional

from temporalio import workflow
from temporalio.common import RetryPolicy

# Import for type hints (will be executed by worker)
with workflow.unsafe.imports_passed_through():
    import logging

logger = logging.getLogger(__name__)


@workflow.defn
class OssfuzzCampaignWorkflow:
    """
    Generic OSS-Fuzz fuzzing campaign workflow.

    User workflow:
    1. User runs: ff workflow run ossfuzz_campaign . project_name=curl
    2. Worker loads project config from OSS-Fuzz repo
    3. Worker builds project using OSS-Fuzz's build system
    4. Worker runs fuzzing with engines from project.yaml
    5. Crashes and corpus reported as findings
    """

    @workflow.run
    async def run(
        self,
        target_id: str,  # Required by FuzzForge (not used, OSS-Fuzz downloads from Google)
        project_name: str,  # Required: OSS-Fuzz project name (e.g., "curl", "sqlite3")
        campaign_duration_hours: int = 1,
        override_engine: Optional[str] = None,  # Override engine from project.yaml
        override_sanitizer: Optional[str] = None,  # Override sanitizer from project.yaml
        max_iterations: Optional[int] = None  # Optional: limit fuzzing iterations
    ) -> Dict[str, Any]:
        """
        Main workflow execution.

        Args:
            target_id: UUID of uploaded target (not used, required by FuzzForge)
            project_name: Name of OSS-Fuzz project (e.g., "curl", "sqlite3", "libxml2")
            campaign_duration_hours: How many hours to fuzz (default: 1)
            override_engine: Override fuzzing engine from project.yaml
            override_sanitizer: Override sanitizer from project.yaml
            max_iterations: Optional limit on fuzzing iterations

        Returns:
            Dictionary containing crashes, stats, and SARIF report
        """
        workflow_id = workflow.info().workflow_id

        workflow.logger.info(
            f"Starting OSS-Fuzz Campaign for project '{project_name}' "
            f"(workflow_id={workflow_id}, duration={campaign_duration_hours}h)"
        )

        results = {
            "workflow_id": workflow_id,
            "project_name": project_name,
            "status": "running",
            "steps": []
        }

        try:
            # Step 1: Load OSS-Fuzz project configuration
            workflow.logger.info(f"Step 1: Loading project config for '{project_name}'")
            project_config = await workflow.execute_activity(
                "load_ossfuzz_project",
                args=[project_name],
                start_to_close_timeout=timedelta(minutes=5),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=1),
                    maximum_interval=timedelta(seconds=30),
                    maximum_attempts=3
                )
            )

            results["steps"].append({
                "step": "load_config",
                "status": "success",
                "language": project_config.get("language"),
                "engines": project_config.get("fuzzing_engines", []),
                "sanitizers": project_config.get("sanitizers", [])
            })

            workflow.logger.info(
                f"✓ Loaded config: language={project_config.get('language')}, "
                f"engines={project_config.get('fuzzing_engines')}"
            )

            # Step 2: Build project using OSS-Fuzz infrastructure
            workflow.logger.info(f"Step 2: Building project '{project_name}'")

            build_result = await workflow.execute_activity(
                "build_ossfuzz_project",
                args=[
                    project_name,
                    project_config,
                    override_sanitizer,
                    override_engine
                ],
                start_to_close_timeout=timedelta(minutes=30),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=2),
                    maximum_interval=timedelta(seconds=60),
                    maximum_attempts=2
                )
            )

            results["steps"].append({
                "step": "build_project",
                "status": "success",
                "fuzz_targets": len(build_result.get("fuzz_targets", [])),
                "sanitizer": build_result.get("sanitizer_used"),
                "engine": build_result.get("engine_used")
            })

            workflow.logger.info(
                f"✓ Build completed: {len(build_result.get('fuzz_targets', []))} fuzz targets found"
            )

            if not build_result.get("fuzz_targets"):
                raise Exception(f"No fuzz targets found for project {project_name}")

            # Step 3: Run fuzzing on discovered targets
            workflow.logger.info(f"Step 3: Fuzzing {len(build_result['fuzz_targets'])} targets")

            # Determine which engine to use
            engine_to_use = override_engine if override_engine else build_result["engine_used"]
            duration_seconds = campaign_duration_hours * 3600

            # Fuzz each target (in parallel if multiple targets)
            fuzz_futures = []
            for target_path in build_result["fuzz_targets"]:
                future = workflow.execute_activity(
                    "fuzz_target",
                    args=[target_path, engine_to_use, duration_seconds, None, None],
                    start_to_close_timeout=timedelta(seconds=duration_seconds + 300),
                    retry_policy=RetryPolicy(
                        initial_interval=timedelta(seconds=2),
                        maximum_interval=timedelta(seconds=60),
                        maximum_attempts=1  # Fuzzing shouldn't retry
                    )
                )
                fuzz_futures.append(future)

            # Wait for all fuzzing to complete
            fuzz_results = await asyncio.gather(*fuzz_futures, return_exceptions=True)

            # Aggregate results
            total_execs = 0
            total_crashes = 0
            all_crashes = []

            for i, result in enumerate(fuzz_results):
                if isinstance(result, Exception):
                    workflow.logger.error(f"Fuzzing failed for target {i}: {result}")
                    continue

                total_execs += result.get("total_executions", 0)
                total_crashes += result.get("crashes", 0)
                all_crashes.extend(result.get("crash_files", []))

            results["steps"].append({
                "step": "fuzzing",
                "status": "success",
                "total_executions": total_execs,
                "crashes_found": total_crashes,
                "targets_fuzzed": len(build_result["fuzz_targets"])
            })

            workflow.logger.info(
                f"✓ Fuzzing completed: {total_execs} executions, {total_crashes} crashes"
            )

            # Step 4: Generate SARIF report
            workflow.logger.info("Step 4: Generating SARIF report")

            # TODO: Implement crash minimization and SARIF generation
            # For now, return raw results

            results["status"] = "success"
            results["summary"] = {
                "project": project_name,
                "total_executions": total_execs,
                "crashes_found": total_crashes,
                "unique_crashes": len(set(all_crashes)),
                "duration_hours": campaign_duration_hours,
                "engine_used": engine_to_use,
                "sanitizer_used": build_result.get("sanitizer_used")
            }
            results["crashes"] = all_crashes[:100]  # Limit to first 100 crashes

            workflow.logger.info(
                f"✓ Campaign completed: {project_name} - "
                f"{total_execs} execs, {total_crashes} crashes"
            )

            return results

        except Exception as e:
            workflow.logger.error(f"Workflow failed: {e}")
            results["status"] = "error"
            results["error"] = str(e)
            results["steps"].append({
                "step": "error",
                "status": "failed",
                "error": str(e)
            })
            raise
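The fan-out and aggregation in Step 3 relies on `asyncio.gather(..., return_exceptions=True)`, which returns raised exceptions as values so that one failed target does not discard the results of the others. The sketch below reproduces that pattern with plain `asyncio` and no Temporal dependency; `fake_fuzz_target` is a hypothetical stand-in for the `fuzz_target` activity, and its return values are invented for illustration.

```python
import asyncio
from typing import Any, Dict, List


async def fake_fuzz_target(name: str, fail: bool = False) -> Dict[str, Any]:
    """Hypothetical stand-in for the fuzz_target activity."""
    await asyncio.sleep(0)
    if fail:
        raise RuntimeError(f"fuzzer failed to start for {name}")
    return {"total_executions": 1000, "crashes": 1, "crash_files": [f"/crashes/{name}-0"]}


async def run_campaign(targets: List[str]) -> Dict[str, Any]:
    """Fuzz all targets concurrently and sum the results of those that succeeded."""
    futures = [fake_fuzz_target(t, fail=(t == "broken")) for t in targets]
    outcomes = await asyncio.gather(*futures, return_exceptions=True)

    summary: Dict[str, Any] = {
        "total_executions": 0,
        "crashes_found": 0,
        "crash_files": [],
        "failed_targets": 0,
    }
    for outcome in outcomes:
        if isinstance(outcome, Exception):
            # A failed target is counted and skipped, not re-raised.
            summary["failed_targets"] += 1
            continue
        summary["total_executions"] += outcome.get("total_executions", 0)
        summary["crashes_found"] += outcome.get("crashes", 0)
        summary["crash_files"].extend(outcome.get("crash_files", []))
    return summary


campaign_summary = asyncio.run(run_campaign(["parser", "broken", "decoder"]))
```

Unlike the workflow above, this sketch also counts failed targets, which makes it visible when a "success" summary covers only part of the target list.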
@@ -1,187 +0,0 @@
"""
Manual Workflow Registry for Prefect Deployment

This file contains the manual registry of all workflows that can be deployed.
Developers MUST add their workflows here after creating them.

This approach is required because:
1. Prefect cannot deploy dynamically imported flows
2. Docker deployment needs static flow references
3. Explicit registration provides better control and visibility
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

from typing import Dict, Any, Callable
import logging

logger = logging.getLogger(__name__)

# Import only essential workflows
# Import each workflow individually to handle failures gracefully
security_assessment_flow = None
secret_detection_flow = None

# Try to import each workflow individually
try:
    from .security_assessment.workflow import main_flow as security_assessment_flow
except ImportError as e:
    logger.warning(f"Failed to import security_assessment workflow: {e}")

try:
    from .comprehensive.secret_detection_scan.workflow import main_flow as secret_detection_flow
except ImportError as e:
    logger.warning(f"Failed to import secret_detection_scan workflow: {e}")


# Manual registry - developers add workflows here after creation
# Only include workflows that were successfully imported
WORKFLOW_REGISTRY: Dict[str, Dict[str, Any]] = {}

# Add workflows that were successfully imported
if security_assessment_flow is not None:
    WORKFLOW_REGISTRY["security_assessment"] = {
        "flow": security_assessment_flow,
        "module_path": "toolbox.workflows.security_assessment.workflow",
        "function_name": "main_flow",
        "description": "Comprehensive security assessment workflow that scans files, analyzes code for vulnerabilities, and generates SARIF reports",
        "version": "1.0.0",
        "author": "FuzzForge Team",
        "tags": ["security", "scanner", "analyzer", "static-analysis", "sarif"]
    }

if secret_detection_flow is not None:
    WORKFLOW_REGISTRY["secret_detection_scan"] = {
        "flow": secret_detection_flow,
        "module_path": "toolbox.workflows.comprehensive.secret_detection_scan.workflow",
        "function_name": "main_flow",
        "description": "Comprehensive secret detection using TruffleHog and Gitleaks for thorough credential scanning",
        "version": "1.0.0",
        "author": "FuzzForge Team",
        "tags": ["secrets", "credentials", "detection", "trufflehog", "gitleaks", "comprehensive"]
    }

#
# To add a new workflow, follow this pattern:
#
# "my_new_workflow": {
#     "flow": my_new_flow_function,  # Import the flow function above
#     "module_path": "toolbox.workflows.my_new_workflow.workflow",
#     "function_name": "my_new_flow_function",
#     "description": "Description of what this workflow does",
#     "version": "1.0.0",
#     "author": "Developer Name",
#     "tags": ["tag1", "tag2"]
# }


def get_workflow_flow(workflow_name: str) -> Callable:
|
||||
"""
|
||||
Get the flow function for a workflow.
|
||||
|
||||
Args:
|
||||
workflow_name: Name of the workflow
|
||||
|
||||
Returns:
|
||||
Flow function
|
||||
|
||||
Raises:
|
||||
KeyError: If workflow not found in registry
|
||||
"""
|
||||
if workflow_name not in WORKFLOW_REGISTRY:
|
||||
available = list(WORKFLOW_REGISTRY.keys())
|
||||
raise KeyError(
|
||||
f"Workflow '{workflow_name}' not found in registry. "
|
||||
f"Available workflows: {available}. "
|
||||
f"Please add the workflow to toolbox/workflows/registry.py"
|
||||
)
|
||||
|
||||
return WORKFLOW_REGISTRY[workflow_name]["flow"]
|
||||
|
||||
|
||||
def get_workflow_info(workflow_name: str) -> Dict[str, Any]:
    """
    Get registry information for a workflow.

    Args:
        workflow_name: Name of the workflow

    Returns:
        Registry information dictionary

    Raises:
        KeyError: If workflow not found in registry
    """
    if workflow_name not in WORKFLOW_REGISTRY:
        available = list(WORKFLOW_REGISTRY.keys())
        raise KeyError(
            f"Workflow '{workflow_name}' not found in registry. "
            f"Available workflows: {available}"
        )

    return WORKFLOW_REGISTRY[workflow_name]
|
||||
|
||||
|
||||
def list_registered_workflows() -> Dict[str, Dict[str, Any]]:
    """
    Get all registered workflows.

    Returns:
        Dictionary of all workflow registry entries
    """
    return WORKFLOW_REGISTRY.copy()
|
||||
|
||||
|
||||
def validate_registry() -> bool:
    """
    Validate the workflow registry for consistency.

    Returns:
        True if valid, raises exceptions if not

    Raises:
        ValueError: If registry is invalid
    """
    if not WORKFLOW_REGISTRY:
        raise ValueError("Workflow registry is empty")

    required_fields = ["flow", "module_path", "function_name", "description"]

    for name, entry in WORKFLOW_REGISTRY.items():
        # Check required fields
        missing_fields = [field for field in required_fields if field not in entry]
        if missing_fields:
            raise ValueError(
                f"Workflow '{name}' missing required fields: {missing_fields}"
            )

        # Check if flow is callable
        if not callable(entry["flow"]):
            raise ValueError(f"Workflow '{name}' flow is not callable")

        # Check if flow has the required Prefect attributes
        if not hasattr(entry["flow"], "deploy"):
            raise ValueError(
                f"Workflow '{name}' flow is not a Prefect flow (missing deploy method)"
            )

    logger.info(f"Registry validation passed. {len(WORKFLOW_REGISTRY)} workflows registered.")
    return True
|
||||
|
||||
|
||||
# Validate registry on import
try:
    validate_registry()
    logger.info(f"Workflow registry loaded successfully with {len(WORKFLOW_REGISTRY)} workflows")
except Exception as e:
    logger.error(f"Workflow registry validation failed: {e}")
    raise
|
||||
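The deleted registry helpers above follow a lookup-and-validate pattern. The following is a minimal, self-contained sketch of that pattern, not the repository's code: `demo_flow`, `DEMO_REGISTRY`, `get_flow`, and `validate` are hypothetical names introduced for illustration, and the Prefect-specific `deploy` check is left out so the sketch runs without Prefect installed.

```python
from typing import Any, Callable, Dict, List


def demo_flow() -> str:
    """Hypothetical stand-in for a real flow function."""
    return "ok"


# Hypothetical registry using the same required fields as the deleted module.
DEMO_REGISTRY: Dict[str, Dict[str, Any]] = {
    "demo": {
        "flow": demo_flow,
        "module_path": "toolbox.workflows.demo.workflow",
        "function_name": "demo_flow",
        "description": "Demo workflow",
    }
}

REQUIRED_FIELDS: List[str] = ["flow", "module_path", "function_name", "description"]


def get_flow(registry: Dict[str, Dict[str, Any]], name: str) -> Callable:
    """Return the flow callable, or raise KeyError listing what is available."""
    if name not in registry:
        raise KeyError(
            f"Workflow '{name}' not found in registry. "
            f"Available workflows: {sorted(registry)}"
        )
    return registry[name]["flow"]


def validate(registry: Dict[str, Dict[str, Any]]) -> bool:
    """Check that the registry is non-empty and each entry is well formed."""
    if not registry:
        raise ValueError("Workflow registry is empty")
    for name, entry in registry.items():
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            raise ValueError(f"Workflow '{name}' missing required fields: {missing}")
        if not callable(entry["flow"]):
            raise ValueError(f"Workflow '{name}' flow is not callable")
    return True
```

Passing the registry as an argument, rather than reading a module-level global, keeps the helpers easy to test in isolation.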
@@ -1,30 +0,0 @@
|
||||
FROM prefecthq/prefect:3-python3.11

WORKDIR /app

# Create toolbox directory structure to match expected import paths
RUN mkdir -p /app/toolbox/workflows /app/toolbox/modules

# Copy base module infrastructure
COPY modules/__init__.py /app/toolbox/modules/
COPY modules/base.py /app/toolbox/modules/

# Copy only required modules (manual selection)
COPY modules/scanner /app/toolbox/modules/scanner
COPY modules/analyzer /app/toolbox/modules/analyzer
COPY modules/reporter /app/toolbox/modules/reporter

# Copy this workflow
COPY workflows/security_assessment /app/toolbox/workflows/security_assessment

# Install workflow-specific requirements if they exist
RUN if [ -f /app/toolbox/workflows/security_assessment/requirements.txt ]; then pip install --no-cache-dir -r /app/toolbox/workflows/security_assessment/requirements.txt; fi

# Install common requirements
RUN pip install --no-cache-dir pyyaml

# Set Python path
ENV PYTHONPATH=/app:$PYTHONPATH

# Create workspace directory
RUN mkdir -p /workspace
|
||||
@@ -0,0 +1,150 @@
|
||||
"""
|
||||
Security Assessment Workflow Activities

Activities specific to the security assessment workflow:
- scan_files_activity: Scan files in the workspace
- analyze_security_activity: Analyze security vulnerabilities
- generate_sarif_report_activity: Generate SARIF report from findings
"""

import logging
import sys
from pathlib import Path

from temporalio import activity

# Configure logging
logger = logging.getLogger(__name__)

# Add toolbox to path for module imports
sys.path.insert(0, '/app/toolbox')
|
||||
|
||||
|
||||
@activity.defn(name="scan_files")
async def scan_files_activity(workspace_path: str, config: dict) -> dict:
    """
    Scan files in the workspace.

    Args:
        workspace_path: Path to the workspace directory
        config: Scanner configuration

    Returns:
        Scanner results dictionary
    """
    logger.info(f"Activity: scan_files (workspace={workspace_path})")

    try:
        from modules.scanner import FileScanner

        workspace = Path(workspace_path)
        if not workspace.exists():
            raise FileNotFoundError(f"Workspace not found: {workspace_path}")

        scanner = FileScanner()
        result = await scanner.execute(config, workspace)

        logger.info(
            f"✓ File scanning completed: "
            f"{result.summary.get('total_files', 0)} files scanned"
        )
        return result.dict()

    except Exception as e:
        logger.error(f"File scanning failed: {e}", exc_info=True)
        raise
|
||||
|
||||
|
||||
@activity.defn(name="analyze_security")
async def analyze_security_activity(workspace_path: str, config: dict) -> dict:
    """
    Analyze security vulnerabilities in the workspace.

    Args:
        workspace_path: Path to the workspace directory
        config: Analyzer configuration

    Returns:
        Analysis results dictionary
    """
    logger.info(f"Activity: analyze_security (workspace={workspace_path})")

    try:
        from modules.analyzer import SecurityAnalyzer

        workspace = Path(workspace_path)
        if not workspace.exists():
            raise FileNotFoundError(f"Workspace not found: {workspace_path}")

        analyzer = SecurityAnalyzer()
        result = await analyzer.execute(config, workspace)

        logger.info(
            f"✓ Security analysis completed: "
            f"{result.summary.get('total_findings', 0)} findings"
        )
        return result.dict()

    except Exception as e:
        logger.error(f"Security analysis failed: {e}", exc_info=True)
        raise
|
||||
|
||||
|
||||
@activity.defn(name="generate_sarif_report")
async def generate_sarif_report_activity(
    scan_results: dict,
    analysis_results: dict,
    config: dict,
    workspace_path: str
) -> dict:
    """
    Generate SARIF report from scan and analysis results.

    Args:
        scan_results: Results from file scanner
        analysis_results: Results from security analyzer
        config: Reporter configuration
        workspace_path: Path to the workspace

    Returns:
        SARIF report dictionary
    """
    logger.info("Activity: generate_sarif_report")

    try:
        from modules.reporter import SARIFReporter

        workspace = Path(workspace_path)

        # Combine findings from all modules
        all_findings = []

        # Add scanner findings (only sensitive files, not all files)
        scanner_findings = scan_results.get("findings", [])
        sensitive_findings = [f for f in scanner_findings if f.get("severity") != "info"]
        all_findings.extend(sensitive_findings)

        # Add analyzer findings
        analyzer_findings = analysis_results.get("findings", [])
        all_findings.extend(analyzer_findings)

        # Prepare reporter config
        reporter_config = {
            **config,
            "findings": all_findings,
            "tool_name": "FuzzForge Security Assessment",
            "tool_version": "1.0.0"
        }

        reporter = SARIFReporter()
        result = await reporter.execute(reporter_config, workspace)

        # Extract SARIF from result
        sarif = result.dict().get("sarif", {})

        logger.info(f"✓ SARIF report generated with {len(all_findings)} findings")
        return sarif

    except Exception as e:
        logger.error(f"SARIF report generation failed: {e}", exc_info=True)
        raise
|
||||
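The finding-merge step inside `generate_sarif_report_activity` can be exercised without Temporal or the toolbox modules. The sketch below isolates that logic; `combine_findings` is a hypothetical helper name and the sample dictionaries are made-up data, not output from the real scanner or analyzer.

```python
from typing import Any, Dict, List


def combine_findings(
    scan_results: Dict[str, Any],
    analysis_results: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Merge findings, dropping scanner findings whose severity is "info"."""
    scanner_findings = scan_results.get("findings", [])
    sensitive = [f for f in scanner_findings if f.get("severity") != "info"]
    analyzer_findings = analysis_results.get("findings", [])
    return sensitive + list(analyzer_findings)


# Made-up sample data for illustration only.
scan = {
    "findings": [
        {"id": "a", "severity": "info"},
        {"id": "b", "severity": "high"},
    ]
}
analysis = {"findings": [{"id": "c", "severity": "medium"}]}

merged = combine_findings(scan, analysis)
print([f["id"] for f in merged])  # prints ['b', 'c']
```

Note that a scanner finding with no `severity` key passes the filter, because `None != "info"` is true; whether that is desirable depends on what the scanner module guarantees.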
@@ -1,8 +1,8 @@
|
||||
name: security_assessment
|
||||
version: "2.0.0"
|
||||
vertical: rust
|
||||
description: "Comprehensive security assessment workflow that scans files, analyzes code for vulnerabilities, and generates SARIF reports"
|
||||
author: "FuzzForge Team"
|
||||
category: "comprehensive"
|
||||
tags:
|
||||
- "security"
|
||||
- "scanner"
|
||||
@@ -11,28 +11,14 @@ tags:
|
||||
- "sarif"
|
||||
- "comprehensive"
|
||||
|
||||
supported_volume_modes:
|
||||
- "ro"
|
||||
- "rw"
|
||||
|
||||
default_volume_mode: "ro"
|
||||
default_target_path: "/workspace"
|
||||
|
||||
requirements:
|
||||
tools:
|
||||
- "file_scanner"
|
||||
- "security_analyzer"
|
||||
- "sarif_reporter"
|
||||
resources:
|
||||
memory: "512Mi"
|
||||
cpu: "500m"
|
||||
timeout: 1800
|
||||
|
||||
has_docker: true
|
||||
# Workspace isolation mode (system-level configuration)
|
||||
# - "isolated" (default): Each workflow run gets its own isolated workspace (safe for concurrent fuzzing)
|
||||
# - "shared": All runs share the same workspace (for read-only analysis workflows)
|
||||
# - "copy-on-write": Download once, copy for each run (balances performance and isolation)
|
||||
# Using "shared" mode for read-only security analysis (no file modifications)
|
||||
workspace_isolation: "shared"
|
||||
|
||||
default_parameters:
|
||||
target_path: "/workspace"
|
||||
volume_mode: "ro"
|
||||
scanner_config: {}
|
||||
analyzer_config: {}
|
||||
reporter_config: {}
|
||||
@@ -40,15 +26,6 @@ default_parameters:
|
||||
parameters:
|
||||
type: object
|
||||
properties:
|
||||
target_path:
|
||||
type: string
|
||||
default: "/workspace"
|
||||
description: "Path to analyze"
|
||||
volume_mode:
|
||||
type: string
|
||||
enum: ["ro", "rw"]
|
||||
default: "ro"
|
||||
description: "Volume mount mode"
|
||||
scanner_config:
|
||||
type: object
|
||||
description: "File scanner configuration"
|
||||
|
||||
@@ -1,4 +0,0 @@
|
||||
# Requirements for security assessment workflow
pydantic>=2.0.0
pyyaml>=6.0
aiofiles>=23.0.0
|
||||
@@ -1,5 +1,7 @@
|
||||
"""
|
||||
Security Assessment Workflow - Comprehensive security analysis using multiple modules
|
||||
Security Assessment Workflow - Temporal Version
|
||||
|
||||
Comprehensive security analysis using multiple modules.
|
||||
"""
|
||||
|
||||
# Copyright (c) 2025 FuzzingLabs
|
||||
@@ -13,240 +15,219 @@ Security Assessment Workflow - Comprehensive security analysis using multiple mo
|
||||
#
|
||||
# Additional attribution and requirements are provided in the NOTICE file.
|
||||
|
||||
import sys
|
||||
import logging
|
||||
from pathlib import Path
|
||||
from datetime import timedelta
|
||||
from typing import Dict, Any, Optional
|
||||
from prefect import flow, task
|
||||
import json
|
||||
|
||||
# Add modules to path
|
||||
sys.path.insert(0, '/app')
|
||||
from temporalio import workflow
|
||||
from temporalio.common import RetryPolicy
|
||||
|
||||
# Import modules
|
||||
from toolbox.modules.scanner import FileScanner
|
||||
from toolbox.modules.analyzer import SecurityAnalyzer
|
||||
from toolbox.modules.reporter import SARIFReporter
|
||||
# Import activity interfaces (will be executed by worker)
|
||||
with workflow.unsafe.imports_passed_through():
|
||||
import logging
|
||||
|
||||
# Configure logging
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@task(name="file_scanning")
|
||||
async def scan_files_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
|
||||
@workflow.defn
|
||||
class SecurityAssessmentWorkflow:
|
||||
"""
|
||||
Task to scan files in the workspace.
|
||||
|
||||
Args:
|
||||
workspace: Path to the workspace
|
||||
config: Scanner configuration
|
||||
|
||||
Returns:
|
||||
Scanner results
|
||||
"""
|
||||
logger.info(f"Starting file scanning in {workspace}")
|
||||
scanner = FileScanner()
|
||||
|
||||
result = await scanner.execute(config, workspace)
|
||||
|
||||
logger.info(f"File scanning completed: {result.summary.get('total_files', 0)} files found")
|
||||
return result.dict()
|
||||
|
||||
|
||||
@task(name="security_analysis")
|
||||
async def analyze_security_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""
|
||||
Task to analyze security vulnerabilities.
|
||||
|
||||
Args:
|
||||
workspace: Path to the workspace
|
||||
config: Analyzer configuration
|
||||
|
||||
Returns:
|
||||
Analysis results
|
||||
"""
|
||||
logger.info("Starting security analysis")
|
||||
analyzer = SecurityAnalyzer()
|
||||
|
||||
result = await analyzer.execute(config, workspace)
|
||||
|
||||
logger.info(
|
||||
f"Security analysis completed: {result.summary.get('total_findings', 0)} findings"
|
||||
)
|
||||
return result.dict()
|
||||
|
||||
|
||||
@task(name="report_generation")
|
||||
async def generate_report_task(
|
||||
scan_results: Dict[str, Any],
|
||||
analysis_results: Dict[str, Any],
|
||||
config: Dict[str, Any],
|
||||
workspace: Path
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Task to generate SARIF report from all findings.
|
||||
|
||||
Args:
|
||||
scan_results: Results from scanner
|
||||
analysis_results: Results from analyzer
|
||||
config: Reporter configuration
|
||||
workspace: Path to the workspace
|
||||
|
||||
Returns:
|
||||
SARIF report
|
||||
"""
|
||||
logger.info("Generating SARIF report")
|
||||
reporter = SARIFReporter()
|
||||
|
||||
# Combine findings from all modules
|
||||
all_findings = []
|
||||
|
||||
# Add scanner findings (only sensitive files, not all files)
|
||||
scanner_findings = scan_results.get("findings", [])
|
||||
sensitive_findings = [f for f in scanner_findings if f.get("severity") != "info"]
|
||||
all_findings.extend(sensitive_findings)
|
||||
|
||||
# Add analyzer findings
|
||||
analyzer_findings = analysis_results.get("findings", [])
|
||||
all_findings.extend(analyzer_findings)
|
||||
|
||||
# Prepare reporter config
|
||||
reporter_config = {
|
||||
**config,
|
||||
"findings": all_findings,
|
||||
"tool_name": "FuzzForge Security Assessment",
|
||||
"tool_version": "1.0.0"
|
||||
}
|
||||
|
||||
result = await reporter.execute(reporter_config, workspace)
|
||||
|
||||
# Extract SARIF from result
|
||||
sarif = result.dict().get("sarif", {})
|
||||
|
||||
logger.info(f"Report generated with {len(all_findings)} total findings")
|
||||
return sarif
|
||||
|
||||
|
||||
@flow(name="security_assessment", log_prints=True)
|
||||
async def main_flow(
|
||||
target_path: str = "/workspace",
|
||||
volume_mode: str = "ro",
|
||||
scanner_config: Optional[Dict[str, Any]] = None,
|
||||
analyzer_config: Optional[Dict[str, Any]] = None,
|
||||
reporter_config: Optional[Dict[str, Any]] = None
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Main security assessment workflow.
|
||||
Comprehensive security assessment workflow.
|
||||
|
||||
This workflow:
|
||||
1. Scans files in the workspace
|
||||
2. Analyzes code for security vulnerabilities
|
||||
3. Generates a SARIF report with all findings
|
||||
|
||||
Args:
|
||||
target_path: Path to the mounted workspace (default: /workspace)
|
||||
volume_mode: Volume mount mode (ro/rw)
|
||||
scanner_config: Configuration for file scanner
|
||||
analyzer_config: Configuration for security analyzer
|
||||
reporter_config: Configuration for SARIF reporter
|
||||
|
||||
Returns:
|
||||
SARIF-formatted findings report
|
||||
1. Downloads target from MinIO
|
||||
2. Scans files in the workspace
|
||||
3. Analyzes code for security vulnerabilities
|
||||
4. Generates a SARIF report with all findings
|
||||
5. Uploads results to MinIO
|
||||
6. Cleans up cache
|
||||
"""
|
||||
logger.info(f"Starting security assessment workflow")
|
||||
logger.info(f"Workspace: {target_path}, Mode: {volume_mode}")
|
||||
|
||||
# Set workspace path
|
||||
workspace = Path(target_path)
|
||||
@workflow.run
|
||||
async def run(
|
||||
self,
|
||||
target_id: str,
|
||||
scanner_config: Optional[Dict[str, Any]] = None,
|
||||
analyzer_config: Optional[Dict[str, Any]] = None,
|
||||
reporter_config: Optional[Dict[str, Any]] = None
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Main workflow execution.
|
||||
|
||||
if not workspace.exists():
|
||||
logger.error(f"Workspace does not exist: {workspace}")
|
||||
return {
|
||||
"error": f"Workspace not found: {workspace}",
|
||||
"sarif": None
|
||||
}
|
||||
Args:
|
||||
target_id: UUID of the uploaded target in MinIO
|
||||
scanner_config: Configuration for file scanner
|
||||
analyzer_config: Configuration for security analyzer
|
||||
reporter_config: Configuration for SARIF reporter
|
||||
|
||||
# Default configurations
|
||||
if not scanner_config:
|
||||
scanner_config = {
|
||||
"patterns": ["*"],
|
||||
"check_sensitive": True,
|
||||
"calculate_hashes": False,
|
||||
"max_file_size": 10485760 # 10MB
|
||||
}
|
||||
Returns:
|
||||
Dictionary containing SARIF report and summary
|
||||
"""
|
||||
workflow_id = workflow.info().workflow_id
|
||||
|
||||
if not analyzer_config:
|
||||
analyzer_config = {
|
||||
"file_extensions": [".py", ".js", ".java", ".php", ".rb", ".go"],
|
||||
"check_secrets": True,
|
||||
"check_sql": True,
|
||||
"check_dangerous_functions": True
|
||||
}
|
||||
|
||||
if not reporter_config:
|
||||
reporter_config = {
|
||||
"include_code_flows": False
|
||||
}
|
||||
|
||||
try:
|
||||
# Execute workflow tasks
|
||||
logger.info("Phase 1: File scanning")
|
||||
scan_results = await scan_files_task(workspace, scanner_config)
|
||||
|
||||
logger.info("Phase 2: Security analysis")
|
||||
analysis_results = await analyze_security_task(workspace, analyzer_config)
|
||||
|
||||
logger.info("Phase 3: Report generation")
|
||||
sarif_report = await generate_report_task(
|
||||
scan_results,
|
||||
analysis_results,
|
||||
reporter_config,
|
||||
workspace
|
||||
workflow.logger.info(
|
||||
f"Starting SecurityAssessmentWorkflow "
|
||||
f"(workflow_id={workflow_id}, target_id={target_id})"
|
||||
)
|
||||
|
||||
# Log summary
|
||||
if sarif_report and "runs" in sarif_report:
|
||||
results_count = len(sarif_report["runs"][0].get("results", []))
|
||||
logger.info(f"Workflow completed successfully with {results_count} findings")
|
||||
else:
|
||||
logger.info("Workflow completed successfully")
|
||||
# Default configurations
|
||||
if not scanner_config:
|
||||
scanner_config = {
|
||||
"patterns": ["*"],
|
||||
"check_sensitive": True,
|
||||
"calculate_hashes": False,
|
||||
"max_file_size": 10485760 # 10MB
|
||||
}
|
||||
|
||||
return sarif_report
|
||||
if not analyzer_config:
|
||||
analyzer_config = {
|
||||
"file_extensions": [".py", ".js", ".java", ".php", ".rb", ".go"],
|
||||
"check_secrets": True,
|
||||
"check_sql": True,
|
||||
"check_dangerous_functions": True
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Workflow failed: {e}")
|
||||
# Return error in SARIF format
|
||||
return {
|
||||
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
|
||||
"version": "2.1.0",
|
||||
"runs": [
|
||||
{
|
||||
"tool": {
|
||||
"driver": {
|
||||
"name": "FuzzForge Security Assessment",
|
||||
"version": "1.0.0"
|
||||
}
|
||||
},
|
||||
"results": [],
|
||||
"invocations": [
|
||||
{
|
||||
"executionSuccessful": False,
|
||||
"exitCode": 1,
|
||||
"exitCodeDescription": str(e)
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
if not reporter_config:
|
||||
reporter_config = {
|
||||
"include_code_flows": False
|
||||
}
|
||||
|
||||
results = {
|
||||
"workflow_id": workflow_id,
|
||||
"target_id": target_id,
|
||||
"status": "running",
|
||||
"steps": []
|
||||
}
|
||||
|
||||
try:
|
||||
# Get run ID for workspace isolation (using shared mode for read-only analysis)
|
||||
run_id = workflow.info().run_id
|
||||
|
||||
if __name__ == "__main__":
|
||||
# For local testing
|
||||
import asyncio
|
||||
# Step 1: Download target from MinIO
|
||||
workflow.logger.info("Step 1: Downloading target from MinIO")
|
||||
target_path = await workflow.execute_activity(
|
||||
"get_target",
|
||||
args=[target_id, run_id, "shared"], # target_id, run_id, workspace_isolation
|
||||
start_to_close_timeout=timedelta(minutes=5),
|
||||
retry_policy=RetryPolicy(
|
||||
initial_interval=timedelta(seconds=1),
|
||||
maximum_interval=timedelta(seconds=30),
|
||||
maximum_attempts=3
|
||||
)
|
||||
)
|
||||
results["steps"].append({
|
||||
"step": "download_target",
|
||||
"status": "success",
|
||||
"target_path": target_path
|
||||
})
|
||||
workflow.logger.info(f"✓ Target downloaded to: {target_path}")
|
||||
|
||||
asyncio.run(main_flow(
|
||||
target_path="/tmp/test",
|
||||
scanner_config={"patterns": ["*.py"]},
|
||||
analyzer_config={"check_secrets": True}
|
||||
))
|
||||
            # Step 2: File scanning
            workflow.logger.info("Step 2: Scanning files")
            scan_results = await workflow.execute_activity(
                "scan_files",
                args=[target_path, scanner_config],
                start_to_close_timeout=timedelta(minutes=10),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=2),
                    maximum_interval=timedelta(seconds=60),
                    maximum_attempts=2
                )
            )
            results["steps"].append({
                "step": "file_scanning",
                "status": "success",
                "files_scanned": scan_results.get("summary", {}).get("total_files", 0)
            })
            workflow.logger.info(
                f"✓ File scanning completed: "
                f"{scan_results.get('summary', {}).get('total_files', 0)} files"
            )

            # Step 3: Security analysis
            workflow.logger.info("Step 3: Analyzing security vulnerabilities")
            analysis_results = await workflow.execute_activity(
                "analyze_security",
                args=[target_path, analyzer_config],
                start_to_close_timeout=timedelta(minutes=15),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=2),
                    maximum_interval=timedelta(seconds=60),
                    maximum_attempts=2
                )
            )
            results["steps"].append({
                "step": "security_analysis",
                "status": "success",
                "findings": analysis_results.get("summary", {}).get("total_findings", 0)
            })
            workflow.logger.info(
                f"✓ Security analysis completed: "
                f"{analysis_results.get('summary', {}).get('total_findings', 0)} findings"
            )

            # Step 4: Generate SARIF report
            workflow.logger.info("Step 4: Generating SARIF report")
            sarif_report = await workflow.execute_activity(
                "generate_sarif_report",
                args=[scan_results, analysis_results, reporter_config, target_path],
                start_to_close_timeout=timedelta(minutes=5)
            )
            results["steps"].append({
                "step": "report_generation",
                "status": "success"
            })

            # Count total findings in SARIF
            total_findings = 0
            if sarif_report and "runs" in sarif_report:
                total_findings = len(sarif_report["runs"][0].get("results", []))

            workflow.logger.info(f"✓ SARIF report generated with {total_findings} findings")

            # Step 5: Upload results to MinIO
            workflow.logger.info("Step 5: Uploading results")
            try:
                results_url = await workflow.execute_activity(
                    "upload_results",
                    args=[workflow_id, sarif_report, "sarif"],
                    start_to_close_timeout=timedelta(minutes=2)
                )
                results["results_url"] = results_url
                workflow.logger.info(f"✓ Results uploaded to: {results_url}")
            except Exception as e:
                workflow.logger.warning(f"Failed to upload results: {e}")
                results["results_url"] = None

            # Step 6: Cleanup cache
            workflow.logger.info("Step 6: Cleaning up cache")
            try:
                await workflow.execute_activity(
                    "cleanup_cache",
                    args=[target_path, "shared"],  # target_path, workspace_isolation
                    start_to_close_timeout=timedelta(minutes=1)
                )
                workflow.logger.info("✓ Cache cleaned up (skipped for shared mode)")
            except Exception as e:
                workflow.logger.warning(f"Cache cleanup failed: {e}")

            # Mark workflow as successful
            results["status"] = "success"
            results["sarif"] = sarif_report
            results["summary"] = {
                "total_findings": total_findings,
                "files_scanned": scan_results.get("summary", {}).get("total_files", 0)
            }
            workflow.logger.info(f"✓ Workflow completed successfully: {workflow_id}")

            return results

        except Exception as e:
            workflow.logger.error(f"Workflow failed: {e}")
            results["status"] = "error"
            results["error"] = str(e)
            results["steps"].append({
                "step": "error",
                "status": "failed",
                "error": str(e)
            })
            raise
|
||||
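The `RetryPolicy` values in the workflow above set only the initial interval, the maximum interval, and the attempt count. Assuming Temporal's default backoff coefficient of 2.0 applies (the diff does not override it), the waits between attempts can be sketched in plain Python. `retry_delays` is a hypothetical helper for illustration, not part of the Temporal SDK.

```python
from datetime import timedelta
from typing import List


def retry_delays(
    initial_interval: timedelta,
    maximum_interval: timedelta,
    maximum_attempts: int,
    backoff_coefficient: float = 2.0,  # assumed default, not set in the diff
) -> List[timedelta]:
    """Return the wait before each retry; the first attempt has no wait.

    maximum_attempts counts the first attempt, so there are at most
    maximum_attempts - 1 retries.
    """
    delays = []
    for retry_index in range(maximum_attempts - 1):
        delay = initial_interval * (backoff_coefficient ** retry_index)
        delays.append(min(delay, maximum_interval))
    return delays


# Values used for the "get_target" activity in the workflow.
download_delays = retry_delays(
    initial_interval=timedelta(seconds=1),
    maximum_interval=timedelta(seconds=30),
    maximum_attempts=3,
)
print([d.total_seconds() for d in download_delays])  # prints [1.0, 2.0]
```

Under the same assumption, the scan and analysis activities (2-second initial interval, `maximum_attempts=2`) get a single retry after 2 seconds, so the 60-second cap never comes into play for them.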
|
||||
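The finding count in Step 4 indexes `sarif_report["runs"][0]`, which would raise `IndexError` if the reporter ever returned an empty `runs` list. A defensive variant might look like the sketch below. `count_sarif_results` is a hypothetical helper, and unlike the workflow it sums across all runs rather than reading only the first.

```python
from typing import Any, Dict, Optional


def count_sarif_results(sarif: Optional[Dict[str, Any]]) -> int:
    """Count results across all SARIF runs, tolerating missing or empty fields."""
    if not sarif:
        return 0
    runs = sarif.get("runs") or []
    return sum(len(run.get("results") or []) for run in runs)


# Made-up minimal SARIF-like documents for illustration only.
print(count_sarif_results({"runs": [{"results": [{"ruleId": "r1"}, {"ruleId": "r2"}]}]}))  # prints 2
print(count_sarif_results({"runs": []}))  # prints 0
```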