CI/CD Integration with Ephemeral Deployment Model (#14)

* feat: Complete migration from Prefect to Temporal

BREAKING CHANGE: Replaces Prefect workflow orchestration with Temporal

## Major Changes
- Replace Prefect with Temporal for workflow orchestration
- Implement vertical worker architecture (rust, android)
- Replace Docker registry with MinIO for unified storage
- Refactor activities to be co-located with workflows
- Update all API endpoints for Temporal compatibility

## Infrastructure
- New: docker-compose.temporal.yaml (Temporal + MinIO + workers)
- New: workers/ directory with rust and android vertical workers
- New: backend/src/temporal/ (manager, discovery)
- New: backend/src/storage/ (S3-cached storage with MinIO)
- New: backend/toolbox/common/ (shared storage activities)
- Deleted: docker-compose.yaml (old Prefect setup)
- Deleted: backend/src/core/prefect_manager.py
- Deleted: backend/src/services/prefect_stats_monitor.py
- Deleted: Docker registry and insecure-registries requirement

## Workflows
- Migrated: security_assessment workflow to Temporal
- New: rust_test workflow (example/test workflow)
- Deleted: secret_detection_scan (Prefect-based, to be reimplemented)
- Activities now co-located with workflows for independent testing

## API Changes
- Updated: backend/src/api/workflows.py (Temporal submission)
- Updated: backend/src/api/runs.py (Temporal status/results)
- Updated: backend/src/main.py (727 lines, TemporalManager integration)
- Updated: All 16 MCP tools to use TemporalManager

## Testing
- All services healthy (Temporal, PostgreSQL, MinIO, workers, backend)
- All API endpoints functional
- End-to-end workflow test passed (72 findings from vulnerable_app)
- MinIO storage integration working (target upload/download, results)
- Worker activity discovery working (6 activities registered)
- Tarball extraction working
- SARIF report generation working

## Documentation
- ARCHITECTURE.md: Complete Temporal architecture documentation
- QUICKSTART_TEMPORAL.md: Getting started guide
- MIGRATION_DECISION.md: Why we chose Temporal over Prefect
- IMPLEMENTATION_STATUS.md: Migration progress tracking
- workers/README.md: Worker development guide

## Dependencies
- Added: temporalio>=1.6.0
- Added: boto3>=1.34.0 (MinIO S3 client)
- Removed: prefect>=3.4.18

* feat: Add Python fuzzing vertical with Atheris integration

This commit implements a complete Python fuzzing workflow using Atheris:

## Python Worker (workers/python/)
- Dockerfile with Python 3.11, Atheris, and build tools
- Generic worker.py for dynamic workflow discovery
- requirements.txt with temporalio, boto3, atheris dependencies
- Added to docker-compose.temporal.yaml with dedicated cache volume

## AtherisFuzzer Module (backend/toolbox/modules/fuzzer/)
- Reusable module extending BaseModule
- Auto-discovers fuzz targets (fuzz_*.py, *_fuzz.py, fuzz_target.py)
- Recursive search to find targets in nested directories
- Dynamically loads TestOneInput() function
- Configurable max_iterations and timeout
- Real-time stats callback support for live monitoring
- Returns findings as ModuleFinding objects
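
The discovery rules listed above can be sketched as follows. This is a minimal illustration rather than the module's own code: the helper name `discover_fuzz_targets` is hypothetical, and the de-duplication step is an assumption added because `fuzz_target.py` also matches `fuzz_*.py`.

```python
from pathlib import Path
from typing import List

# Patterns taken from the list above.
PATTERNS = ["fuzz_*.py", "*_fuzz.py", "fuzz_target.py"]


def discover_fuzz_targets(workspace: Path) -> List[Path]:
    """Recursively collect files that match any fuzz-target pattern."""
    found: List[Path] = []
    for pattern in PATTERNS:
        # rglob searches the workspace and all nested directories
        found.extend(sorted(workspace.rglob(pattern)))
    # dict.fromkeys keeps first-seen order while dropping duplicates
    return list(dict.fromkeys(found))
```

Returning the full candidate list lets the caller decide how to handle several matches, for example by picking the first one and logging a warning.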

## Atheris Fuzzing Workflow (backend/toolbox/workflows/atheris_fuzzing/)
- Temporal workflow for orchestrating fuzzing
- Downloads user code from MinIO
- Executes AtherisFuzzer module
- Uploads results to MinIO
- Cleans up cache after execution
- metadata.yaml with vertical: python for routing

## Test Project (test_projects/python_fuzz_waterfall/)
- Demonstrates stateful waterfall vulnerability
- main.py with check_secret() that leaks progress
- fuzz_target.py with Atheris TestOneInput() harness
- Complete README with usage instructions

## Backend Fixes
- Fixed parameter merging in REST API endpoints (workflows.py)
- Changed workflow parameter passing from positional args to kwargs (manager.py)
- Default parameters now properly merged with user parameters
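
The merge rule described above (user-supplied values override workflow defaults) might look like this. It is a sketch under assumptions: `merge_parameters` is a hypothetical name, and the real endpoint code may differ.

```python
from typing import Any, Dict, Optional


def merge_parameters(
    defaults: Dict[str, Any],
    user: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Return the defaults overlaid with user parameters, without mutating either."""
    merged = dict(defaults)      # copy so the workflow's defaults stay untouched
    merged.update(user or {})    # user values win; None means "no overrides"
    return merged
```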

## Testing
- Worker discovered AtherisFuzzingWorkflow
- Workflow executed end-to-end successfully
- Fuzz target auto-discovered in nested directories
- Atheris ran 100,000 iterations
- Results uploaded and cache cleaned

* chore: Complete Temporal migration with updated CLI/SDK/docs

This commit includes all remaining Temporal migration changes:

## CLI Updates (cli/)
- Updated workflow execution commands for Temporal
- Enhanced error handling and exceptions
- Updated dependencies in uv.lock

## SDK Updates (sdk/)
- Client methods updated for Temporal workflows
- Updated models for new workflow execution
- Updated dependencies in uv.lock

## Documentation Updates (docs/)
- Architecture documentation for Temporal
- Workflow concept documentation
- Resource management documentation (new)
- Debugging guide (new)
- Updated tutorials and how-to guides
- Troubleshooting updates

## README Updates
- Main README with Temporal instructions
- Backend README
- CLI README
- SDK README

## Other
- Updated IMPLEMENTATION_STATUS.md
- Removed old vulnerable_app.tar.gz

These changes complete the Temporal migration and ensure the
CLI/SDK work correctly with the new backend.

* fix: Use positional args instead of kwargs for Temporal workflows

The Temporal Python SDK's start_workflow() method doesn't accept
a 'kwargs' parameter. Workflows must receive parameters as positional
arguments via the 'args' parameter.

The call now passes parameters as:
  args=workflow_args  # Positional arguments

This fixes the error:
  TypeError: Client.start_workflow() got an unexpected keyword argument 'kwargs'

Workflows now correctly receive parameters in order:
- security_assessment: [target_id, scanner_config, analyzer_config, reporter_config]
- atheris_fuzzing: [target_id, target_file, max_iterations, timeout_seconds]
- rust_test: [target_id, test_message]
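
As a rough illustration of the ordering requirement, a parameter dictionary can be turned into a positional list like this. The helper `to_positional_args` and the sample values are hypothetical; only the parameter order for atheris_fuzzing comes from the list above.

```python
from typing import Any, Dict, List, Sequence

# Order taken from the atheris_fuzzing entry above.
ATHERIS_ORDER = ["target_id", "target_file", "max_iterations", "timeout_seconds"]


def to_positional_args(params: Dict[str, Any], order: Sequence[str]) -> List[Any]:
    """Return values in the order the workflow expects.

    Missing parameters become None so later positions do not shift.
    """
    return [params.get(name) for name in order]


workflow_args = to_positional_args(
    {"target_id": "abc", "max_iterations": 1000, "timeout_seconds": 60},
    ATHERIS_ORDER,
)
```

The resulting list is what would be handed to the `args` parameter in place of keyword arguments.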

* fix: Filter metadata-only parameters from workflow arguments

SecurityAssessmentWorkflow was receiving 7 arguments instead of 2-5.
The issue was that target_path and volume_mode from default_parameters
were being passed to the workflow, when they should only be used by
the system for configuration.

Now filters out metadata-only parameters (target_path, volume_mode)
before passing arguments to workflow execution.
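
A minimal sketch of that filtering step, assuming the parameters arrive as a plain dictionary. The function name `strip_metadata_params` is hypothetical.

```python
from typing import Any, Dict

# Parameters used by the system for configuration only (named in the text above).
METADATA_ONLY = {"target_path", "volume_mode"}


def strip_metadata_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Drop metadata-only keys so they never reach the workflow."""
    return {key: value for key, value in params.items() if key not in METADATA_ONLY}
```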

* refactor: Remove Prefect leftovers and volume mounting legacy

Complete cleanup of Prefect migration artifacts:

Backend:
- Delete registry.py and workflow_discovery.py (Prefect-specific files)
- Remove Docker validation from setup.py (no longer needed)
- Remove ResourceLimits and VolumeMount models
- Remove target_path and volume_mode from WorkflowSubmission
- Remove supported_volume_modes from API and discovery
- Clean up metadata.yaml files (remove volume/path fields)
- Simplify parameter filtering in manager.py

SDK:
- Remove volume_mode parameter from client methods
- Remove ResourceLimits and VolumeMount models
- Remove Prefect error patterns from docker_logs.py
- Clean up WorkflowSubmission and WorkflowMetadata models

CLI:
- Remove Volume Modes display from workflow info

All removed features are Prefect-specific or Docker volume mounting
artifacts. Temporal workflows use MinIO storage exclusively.

* feat: Add comprehensive test suite and benchmark infrastructure

- Add 68 unit tests for fuzzer, scanner, and analyzer modules
- Implement pytest-based test infrastructure with fixtures
- Add 6 performance benchmarks with category-specific thresholds
- Configure GitHub Actions for automated testing and benchmarking
- Add test and benchmark documentation

Test coverage:
- AtherisFuzzer: 8 tests
- CargoFuzzer: 14 tests
- FileScanner: 22 tests
- SecurityAnalyzer: 24 tests

All tests passing (68/68)
All benchmarks passing (6/6)

* fix: Resolve all ruff linting violations across codebase

Fixed 27 ruff violations in 12 files:
- Removed unused imports (Depends, Dict, Any, Optional, etc.)
- Fixed undefined workflow_info variable in workflows.py
- Removed dead code with undefined variables in atheris_fuzzer.py
- Changed f-string to regular string where no placeholders used

All files now pass ruff checks for CI/CD compliance.

* fix: Configure CI for unit tests only

- Renamed docker-compose.temporal.yaml → docker-compose.yml for CI compatibility
- Commented out integration-tests job (no integration tests yet)
- Updated test-summary to only depend on lint and unit-tests

CI will now run successfully with 68 unit tests. Integration tests can be added later.

* feat: Add CI/CD integration with ephemeral deployment model

Implements comprehensive CI/CD support for FuzzForge with on-demand worker management:

**Worker Management (v0.7.0)**
- Add WorkerManager for automatic worker lifecycle control
- Auto-start workers from stopped state when workflows execute
- Auto-stop workers after workflow completion
- Health checks and startup timeout handling (90s default)
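
The startup timeout can be pictured as a polling loop around a health probe. This is a sketch, not WorkerManager's code: `wait_until_healthy` and its parameters are hypothetical, and only the 90 s default comes from the list above.

```python
import time
from typing import Callable


def wait_until_healthy(
    is_healthy: Callable[[], bool],
    timeout: float = 90.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_healthy() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if is_healthy():
            return True
        if time.monotonic() >= deadline:
            return False  # caller decides whether to abort or retry
        sleep(interval)
```

Using a monotonic clock keeps the deadline stable if the system clock is adjusted while a worker container is starting.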

**CI/CD Features**
- `--fail-on` flag: Fail builds based on SARIF severity levels (error/warning/note/info)
- `--export-sarif` flag: Export findings in SARIF 2.1.0 format
- `--auto-start`/`--auto-stop` flags: Control worker lifecycle
- Exit code propagation: Returns 1 on blocking findings, 0 on success
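
The `--fail-on` decision can be sketched as a scan over the SARIF results. The function name `exit_code_for` is hypothetical, and treating a missing `level` as `warning` is an assumption based on the usual SARIF default.

```python
from typing import Any, Dict, Iterable


def exit_code_for(sarif: Dict[str, Any], fail_on: Iterable[str]) -> int:
    """Return 1 if any result has a blocking severity level, else 0."""
    blocking = set(fail_on)
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Assumption: results without an explicit level count as "warning"
            if result.get("level", "warning") in blocking:
                return 1
    return 0
```

A CI job would pass the returned value to the process exit so that the build fails on blocking findings.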

**Exit Code Fix**
- Add `except typer.Exit: raise` handlers at 3 critical locations
- Move worker cleanup to finally block for guaranteed execution
- Exit codes now propagate correctly even when build fails
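
The pattern behind this fix can be shown without Typer. In the sketch below, `Exit` is a stand-in class and `run_command` is a hypothetical wrapper; it assumes the real exit exception is an ordinary `Exception` subclass, so a broad handler would otherwise swallow it.

```python
from typing import Callable


class Exit(RuntimeError):
    """Stand-in for the CLI framework's exit exception."""

    def __init__(self, code: int = 0):
        super().__init__(code)
        self.exit_code = code


def run_command(action: Callable[[], None], cleanup: Callable[[], None]) -> int:
    """Run action, always run cleanup, and return the resulting exit code."""
    try:
        try:
            action()
        except Exit:
            raise  # let the requested exit code through untouched
        except Exception as exc:
            print(f"error: {exc}")
            raise Exit(code=2)
        finally:
            cleanup()  # e.g. stopping workers; runs on every path
    except Exit as exit_request:
        return exit_request.exit_code
    return 0
```

Without the dedicated `except Exit: raise` clause, the broad handler would turn an intentional exit with code 1 into a generic error.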

**CI Scripts & Examples**
- ci-start.sh: Start FuzzForge services with health checks
- ci-stop.sh: Clean shutdown with volume preservation option
- GitHub Actions workflow example (security-scan.yml)
- GitLab CI pipeline example (.gitlab-ci.example.yml)
- docker-compose.ci.yml: CI-optimized compose file with profiles

**OSS-Fuzz Integration**
- New ossfuzz_campaign workflow for running OSS-Fuzz projects
- OSS-Fuzz worker with Docker-in-Docker support
- Configurable campaign duration and project selection

**Documentation**
- Comprehensive CI/CD integration guide (docs/how-to/cicd-integration.md)
- Updated architecture docs with worker lifecycle details
- Updated workspace isolation documentation
- CLI README with worker management examples

**SDK Enhancements**
- Add get_workflow_worker_info() endpoint
- Worker vertical metadata in workflow responses

**Testing**
- All workflows tested: security_assessment, atheris_fuzzing, secret_detection, cargo_fuzzing
- All monitoring commands tested: stats, crashes, status, finding
- Full CI pipeline simulation verified
- Exit codes verified for success/failure scenarios

Ephemeral CI/CD model: ~3-4GB RAM, ~60-90s startup, runs entirely in CI containers.

* fix: Resolve ruff linting violations in CI/CD code

- Remove unused variables (run_id, defaults, result)
- Remove unused imports
- Fix f-string without placeholders

All CI/CD integration files now pass ruff checks.
tduhamel42
2025-10-14 10:13:45 +02:00
committed by GitHub
parent 987c49569c
commit 60ca088ecf
167 changed files with 26101 additions and 5703 deletions
@@ -0,0 +1,369 @@
"""
FuzzForge Common Storage Activities

Activities for interacting with MinIO storage:
- get_target_activity: Download target from MinIO to local cache
- cleanup_cache_activity: Remove target from local cache
- upload_results_activity: Upload workflow results to MinIO
"""

import logging
import os
import shutil
from pathlib import Path

import boto3
from botocore.exceptions import ClientError
from temporalio import activity

# Configure logging
logger = logging.getLogger(__name__)

# Initialize S3 client (MinIO)
s3_client = boto3.client(
    's3',
    endpoint_url=os.getenv('S3_ENDPOINT', 'http://minio:9000'),
    aws_access_key_id=os.getenv('S3_ACCESS_KEY', 'fuzzforge'),
    aws_secret_access_key=os.getenv('S3_SECRET_KEY', 'fuzzforge123'),
    region_name=os.getenv('S3_REGION', 'us-east-1'),
    use_ssl=os.getenv('S3_USE_SSL', 'false').lower() == 'true'
)

# Configuration
S3_BUCKET = os.getenv('S3_BUCKET', 'targets')
CACHE_DIR = Path(os.getenv('CACHE_DIR', '/cache'))
CACHE_MAX_SIZE_GB = int(os.getenv('CACHE_MAX_SIZE', '10').rstrip('GB'))


@activity.defn(name="get_target")
async def get_target_activity(
    target_id: str,
    run_id: str = None,
    workspace_isolation: str = "isolated"
) -> str:
    """
    Download target from MinIO to local cache.

    Args:
        target_id: UUID of the uploaded target
        run_id: Workflow run ID (required for isolated and copy-on-write modes)
        workspace_isolation: Isolation mode - "isolated" (default), "shared", or "copy-on-write"

    Returns:
        Local path to the cached target workspace

    Raises:
        FileNotFoundError: If target doesn't exist in MinIO
        ValueError: If the mode is invalid or run_id is missing for a mode that needs it
        Exception: For other download errors
    """
    logger.info(
        f"Activity: get_target (target_id={target_id}, run_id={run_id}, "
        f"isolation={workspace_isolation})"
    )

    # Validate isolation mode
    valid_modes = ["isolated", "shared", "copy-on-write"]
    if workspace_isolation not in valid_modes:
        raise ValueError(
            f"Invalid workspace_isolation mode: {workspace_isolation}. "
            f"Must be one of: {valid_modes}"
        )

    # Require run_id for isolated and copy-on-write modes
    if workspace_isolation in ["isolated", "copy-on-write"] and not run_id:
        raise ValueError(
            f"run_id is required for workspace_isolation='{workspace_isolation}'"
        )

    # Define cache paths based on isolation mode
    if workspace_isolation == "isolated":
        # Each run gets its own isolated workspace
        cache_path = CACHE_DIR / target_id / run_id
        cached_file = cache_path / "target"
    elif workspace_isolation == "shared":
        # All runs share the same workspace (legacy behavior)
        cache_path = CACHE_DIR / target_id
        cached_file = cache_path / "target"
    else:  # copy-on-write
        # Shared download, run-specific copy
        shared_cache_path = CACHE_DIR / target_id / "shared"
        cache_path = CACHE_DIR / target_id / run_id
        cached_file = shared_cache_path / "target"

    # Handle copy-on-write mode
    if workspace_isolation == "copy-on-write":
        # Check if shared cache exists
        if cached_file.exists():
            logger.info(f"Copy-on-write: Shared cache HIT for {target_id}")

            # Copy shared workspace to run-specific path
            shared_workspace = shared_cache_path / "workspace"
            run_workspace = cache_path / "workspace"

            if shared_workspace.exists():
                logger.info(f"Copying workspace to isolated run path: {run_workspace}")
                cache_path.mkdir(parents=True, exist_ok=True)
                shutil.copytree(shared_workspace, run_workspace)
                return str(run_workspace)
            else:
                # Shared file exists but not extracted (non-tarball)
                run_file = cache_path / "target"
                cache_path.mkdir(parents=True, exist_ok=True)
                shutil.copy2(cached_file, run_file)
                return str(run_file)
        # If shared cache doesn't exist, fall through to download

    # Check if target is already cached (isolated or shared mode)
    elif cached_file.exists():
        # Update access time for LRU
        cached_file.touch()
        logger.info(f"Cache HIT: {target_id} (mode: {workspace_isolation})")

        # Check if workspace directory exists (extracted tarball)
        workspace_dir = cache_path / "workspace"
        if workspace_dir.exists() and workspace_dir.is_dir():
            logger.info(f"Returning cached workspace: {workspace_dir}")
            return str(workspace_dir)
        else:
            # Return cached file (not a tarball)
            return str(cached_file)

    # Cache miss - download from MinIO
    logger.info(
        f"Cache MISS: {target_id} (mode: {workspace_isolation}), "
        f"downloading from MinIO..."
    )

    try:
        # Create cache directories. In copy-on-write mode the download lands in
        # the shared directory, so that directory must exist as well.
        cache_path.mkdir(parents=True, exist_ok=True)
        cached_file.parent.mkdir(parents=True, exist_ok=True)

        # Download from S3/MinIO
        s3_key = f'{target_id}/target'
        logger.info(f"Downloading s3://{S3_BUCKET}/{s3_key} -> {cached_file}")
        s3_client.download_file(
            Bucket=S3_BUCKET,
            Key=s3_key,
            Filename=str(cached_file)
        )

        # Verify file was downloaded
        if not cached_file.exists():
            raise FileNotFoundError(f"Downloaded file not found: {cached_file}")

        file_size = cached_file.stat().st_size
        logger.info(
            f"✓ Downloaded target {target_id} "
            f"({file_size / 1024 / 1024:.2f} MB)"
        )

        # Extract tarball if it's an archive. The workspace sits next to the
        # downloaded file: the run directory for isolated/shared modes and the
        # shared directory for copy-on-write (matching the cache-hit lookup).
        import tarfile
        workspace_dir = cached_file.parent / "workspace"

        if tarfile.is_tarfile(str(cached_file)):
            logger.info(f"Extracting tarball to {workspace_dir}...")
            workspace_dir.mkdir(parents=True, exist_ok=True)
            with tarfile.open(str(cached_file), 'r:*') as tar:
                tar.extractall(path=workspace_dir)
            logger.info(f"✓ Extracted tarball to {workspace_dir}")

            # For copy-on-write mode, copy to run-specific path
            if workspace_isolation == "copy-on-write":
                run_cache_path = CACHE_DIR / target_id / run_id
                run_workspace = run_cache_path / "workspace"
                logger.info(f"Copy-on-write: Copying to {run_workspace}")
                run_cache_path.mkdir(parents=True, exist_ok=True)
                shutil.copytree(workspace_dir, run_workspace)
                return str(run_workspace)

            return str(workspace_dir)
        else:
            # Not a tarball
            if workspace_isolation == "copy-on-write":
                # Copy file to run-specific path
                run_cache_path = CACHE_DIR / target_id / run_id
                run_file = run_cache_path / "target"
                logger.info(f"Copy-on-write: Copying file to {run_file}")
                run_cache_path.mkdir(parents=True, exist_ok=True)
                shutil.copy2(cached_file, run_file)
                return str(run_file)

            return str(cached_file)

    except ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code in ('404', 'NoSuchKey'):
            logger.error(f"Target not found in MinIO: {target_id}")
            raise FileNotFoundError(f"Target {target_id} not found in storage")
        else:
            logger.error(f"S3/MinIO error downloading target: {e}", exc_info=True)
            raise
    except Exception as e:
        logger.error(f"Failed to download target {target_id}: {e}", exc_info=True)
        # Cleanup partial download
        if cache_path.exists():
            shutil.rmtree(cache_path, ignore_errors=True)
        raise


@activity.defn(name="cleanup_cache")
async def cleanup_cache_activity(
    target_path: str,
    workspace_isolation: str = "isolated"
) -> None:
    """
    Remove target from local cache after workflow completes.

    Args:
        target_path: Path to the cached target workspace (from get_target_activity)
        workspace_isolation: Isolation mode used - determines cleanup scope

    Notes:
        - "isolated" mode: Removes the entire run-specific directory
        - "copy-on-write" mode: Removes run-specific directory, keeps shared cache
        - "shared" mode: Does NOT remove cache (shared across runs)
    """
    logger.info(
        f"Activity: cleanup_cache (path={target_path}, "
        f"isolation={workspace_isolation})"
    )

    try:
        target = Path(target_path)

        # For shared mode, don't clean up (cache is shared across runs)
        if workspace_isolation == "shared":
            logger.info(
                f"Skipping cleanup for shared workspace (mode={workspace_isolation})"
            )
            return

        # For isolated and copy-on-write modes, clean up run-specific directory
        # Navigate up to the run-specific directory: /cache/{target_id}/{run_id}/
        # Whether the path is .../workspace or a plain file, the run directory
        # is one level up.
        run_dir = target.parent

        # Validate it's in cache and looks like a run-specific path
        if run_dir.exists() and run_dir.is_relative_to(CACHE_DIR):
            # Check if parent is target_id directory (validate structure)
            target_id_dir = run_dir.parent
            if target_id_dir.is_relative_to(CACHE_DIR):
                shutil.rmtree(run_dir)
                logger.info(
                    f"✓ Cleaned up run-specific directory: {run_dir} "
                    f"(mode={workspace_isolation})"
                )
            else:
                logger.warning(
                    f"Unexpected cache structure, skipping cleanup: {run_dir}"
                )
        else:
            logger.warning(
                f"Cache path not in CACHE_DIR or doesn't exist: {run_dir}"
            )

    except Exception as e:
        # Don't fail workflow if cleanup fails
        logger.error(
            f"Failed to cleanup cache {target_path}: {e}",
            exc_info=True
        )


@activity.defn(name="upload_results")
async def upload_results_activity(
    workflow_id: str,
    results: dict,
    results_format: str = "json"
) -> str:
    """
    Upload workflow results to MinIO.

    Args:
        workflow_id: Workflow execution ID
        results: Results dictionary to upload
        results_format: Format for results (json, sarif, etc.)

    Returns:
        S3 URL to the uploaded results
    """
    logger.info(
        f"Activity: upload_results "
        f"(workflow_id={workflow_id}, format={results_format})"
    )

    try:
        import json

        # Prepare results content
        if results_format == "json":
            content = json.dumps(results, indent=2).encode('utf-8')
            content_type = 'application/json'
            file_ext = 'json'
        elif results_format == "sarif":
            content = json.dumps(results, indent=2).encode('utf-8')
            content_type = 'application/sarif+json'
            file_ext = 'sarif'
        else:
            # Default to JSON
            content = json.dumps(results, indent=2).encode('utf-8')
            content_type = 'application/json'
            file_ext = 'json'

        # Upload to MinIO
        s3_key = f'{workflow_id}/results.{file_ext}'
        logger.info(f"Uploading results to s3://results/{s3_key}")
        s3_client.put_object(
            Bucket='results',
            Key=s3_key,
            Body=content,
            ContentType=content_type,
            Metadata={
                'workflow_id': workflow_id,
                'format': results_format
            }
        )

        # Construct S3 URL
        s3_endpoint = os.getenv('S3_ENDPOINT', 'http://minio:9000')
        s3_url = f"{s3_endpoint}/results/{s3_key}"
        logger.info(f"✓ Uploaded results: {s3_url}")
        return s3_url

    except Exception as e:
        logger.error(
            f"Failed to upload results for workflow {workflow_id}: {e}",
            exc_info=True
        )
        raise


def _check_cache_size():
    """Check total cache size and log warning if exceeding limit"""
    try:
        total_size = 0
        for item in CACHE_DIR.rglob('*'):
            if item.is_file():
                total_size += item.stat().st_size

        total_size_gb = total_size / (1024 ** 3)
        if total_size_gb > CACHE_MAX_SIZE_GB:
            logger.warning(
                f"Cache size ({total_size_gb:.2f} GB) exceeds "
                f"limit ({CACHE_MAX_SIZE_GB} GB). Consider cleanup."
            )
    except Exception as e:
        logger.error(f"Failed to check cache size: {e}")
@@ -16,7 +16,7 @@ Security Analyzer Module - Analyzes code for security vulnerabilities
 import logging
 import re
 from pathlib import Path
-from typing import Dict, Any, List, Optional
+from typing import Dict, Any, List

 try:
     from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
@@ -17,7 +17,6 @@ from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Any, List, Optional
from pydantic import BaseModel, Field
from datetime import datetime
import logging
logger = logging.getLogger(__name__)
@@ -0,0 +1,10 @@
"""
Fuzzing modules for FuzzForge
This package contains fuzzing modules for different fuzzing engines.
"""
from .atheris_fuzzer import AtherisFuzzer
from .cargo_fuzzer import CargoFuzzer
__all__ = ["AtherisFuzzer", "CargoFuzzer"]
@@ -0,0 +1,608 @@
"""
Atheris Fuzzer Module
Reusable module for fuzzing Python code using Atheris.
Discovers and fuzzes user-provided Python targets with TestOneInput() function.
"""
import asyncio
import base64
import importlib.util
import logging
import multiprocessing
import os
import sys
import time
from datetime import datetime
from pathlib import Path
from typing import Dict, Any, List, Optional, Callable
import uuid
import httpx
from modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
logger = logging.getLogger(__name__)
def _run_atheris_in_subprocess(
    target_path_str: str,
    corpus_dir_str: str,
    max_iterations: int,
    timeout_seconds: int,
    shared_crashes: Any,
    exec_counter: multiprocessing.Value,
    crash_counter: multiprocessing.Value,
    coverage_counter: multiprocessing.Value
):
    """
    Run atheris.Fuzz() in a separate process to isolate os._exit() calls.

    This function runs in a subprocess and loads the target module,
    sets up atheris, and runs fuzzing. Stats are communicated via shared memory.

    Args:
        target_path_str: String path to target file
        corpus_dir_str: String path to corpus directory
        max_iterations: Maximum fuzzing iterations
        timeout_seconds: Timeout in seconds
        shared_crashes: Manager().list() for storing crash details
        exec_counter: Shared counter for executions
        crash_counter: Shared counter for crashes
        coverage_counter: Shared counter for coverage edges
    """
    import atheris
    import importlib.util
    import traceback
    from pathlib import Path

    target_path = Path(target_path_str)
    total_executions = 0

    # NOTE: Crash details are written directly to shared_crashes (Manager().list())
    # so they can be accessed by parent process after subprocess exits.
    # We don't use a local crashes list because os._exit() prevents cleanup code.

    try:
        # Load target module in subprocess
        module_name = f"fuzz_target_{uuid.uuid4().hex[:8]}"
        spec = importlib.util.spec_from_file_location(module_name, target_path)
        if spec is None or spec.loader is None:
            raise ImportError(f"Could not load module from {target_path}")

        module = importlib.util.module_from_spec(spec)
        sys.modules[module_name] = module
        spec.loader.exec_module(module)

        if not hasattr(module, "TestOneInput"):
            raise AttributeError("Module does not have TestOneInput() function")

        test_one_input = module.TestOneInput

        # Wrapper to track executions and crashes
        def fuzz_wrapper(data):
            nonlocal total_executions
            total_executions += 1

            # Update shared counter for live stats
            with exec_counter.get_lock():
                exec_counter.value += 1

            try:
                test_one_input(data)
            except Exception as e:
                # Capture crash details to shared memory
                crash_info = {
                    "input": bytes(data),  # Convert to bytes for serialization
                    "exception_type": type(e).__name__,
                    "exception_message": str(e),
                    "stack_trace": traceback.format_exc(),
                    "execution": total_executions
                }
                # Write to shared memory so parent process can access crash details
                shared_crashes.append(crash_info)

                # Update shared crash counter
                with crash_counter.get_lock():
                    crash_counter.value += 1

                # Re-raise so Atheris detects it
                raise

        # Check for dictionary file in target directory
        dict_args = []
        target_dir = target_path.parent
        for dict_name in ["fuzz.dict", "fuzzing.dict", "dict.txt"]:
            dict_path = target_dir / dict_name
            if dict_path.exists():
                dict_args.append(f"-dict={dict_path}")
                break

        # Configure Atheris
        atheris_args = [
            "atheris_fuzzer",
            f"-runs={max_iterations}",
            f"-max_total_time={timeout_seconds}",
            "-print_final_stats=1"
        ] + dict_args + [corpus_dir_str]  # Corpus directory as positional arg

        atheris.Setup(atheris_args, fuzz_wrapper)

        # Run fuzzing (this will call os._exit() when done)
        atheris.Fuzz()

    except SystemExit:
        # Atheris exits when done - this is normal
        # Crash details already written to shared_crashes
        pass
    except Exception:
        # Fatal error - traceback already written to shared memory
        # via crash handler in fuzz_wrapper
        pass


class AtherisFuzzer(BaseModule):
    """
    Atheris fuzzing module - discovers and fuzzes Python code.

    This module can be used by any workflow to fuzz Python targets.
    """

    def __init__(self):
        super().__init__()
        self.crashes = []
        self.total_executions = 0
        self.start_time = None
        self.last_stats_time = 0
        self.run_id = None

    def get_metadata(self) -> ModuleMetadata:
        """Return module metadata"""
        return ModuleMetadata(
            name="atheris_fuzzer",
            version="1.0.0",
            description="Python fuzzing using Atheris - discovers and fuzzes TestOneInput() functions",
            author="FuzzForge Team",
            category="fuzzer",
            tags=["fuzzing", "atheris", "python", "coverage"],
            input_schema={
                "type": "object",
                "properties": {
                    "target_file": {
                        "type": "string",
                        "description": "Python file with TestOneInput() function (auto-discovered if not specified)"
                    },
                    "max_iterations": {
                        "type": "integer",
                        "description": "Maximum fuzzing iterations",
                        "default": 100000
                    },
                    "timeout_seconds": {
                        "type": "integer",
                        "description": "Fuzzing timeout in seconds",
                        "default": 300
                    },
                    "stats_callback": {
                        "description": "Optional callback for real-time statistics"
                    }
                }
            },
            requires_workspace=True
        )

    def validate_config(self, config: Dict[str, Any]) -> bool:
        """Validate fuzzing configuration"""
        max_iterations = config.get("max_iterations", 100000)
        if not isinstance(max_iterations, int) or max_iterations <= 0:
            raise ValueError(f"max_iterations must be positive integer, got: {max_iterations}")

        timeout = config.get("timeout_seconds", 300)
        if not isinstance(timeout, int) or timeout <= 0:
            raise ValueError(f"timeout_seconds must be positive integer, got: {timeout}")

        return True

    async def execute(self, config: Dict[str, Any], workspace: Path) -> ModuleResult:
        """
        Execute Atheris fuzzing on user code.

        Args:
            config: Fuzzing configuration
            workspace: Path to user's uploaded code

        Returns:
            ModuleResult with crash findings
        """
        self.start_timer()
        self.start_time = time.time()

        # Validate configuration
        self.validate_config(config)
        self.validate_workspace(workspace)

        # Extract config
        target_file = config.get("target_file")
        max_iterations = config.get("max_iterations", 100000)
        timeout_seconds = config.get("timeout_seconds", 300)
        stats_callback = config.get("stats_callback")
        self.run_id = config.get("run_id")

        logger.info(
            f"Starting Atheris fuzzing (max_iterations={max_iterations}, "
            f"timeout={timeout_seconds}s, target={target_file or 'auto-discover'})"
        )

        try:
            # Step 1: Discover or load target
            target_path = self._discover_target(workspace, target_file)
            logger.info(f"Using fuzz target: {target_path}")

            # Step 2: Load target module
            test_one_input = self._load_target_module(target_path)
            logger.info(f"Loaded TestOneInput function from {target_path}")

            # Step 3: Run fuzzing
            await self._run_fuzzing(
                test_one_input=test_one_input,
                target_path=target_path,
                workspace=workspace,
                max_iterations=max_iterations,
                timeout_seconds=timeout_seconds,
                stats_callback=stats_callback
            )

            # Step 4: Generate findings from crashes
            findings = await self._generate_findings(target_path)

            logger.info(
                f"Fuzzing completed: {self.total_executions} executions, "
                f"{len(self.crashes)} crashes found"
            )

            # Generate SARIF report (always, even with no findings)
            from modules.reporter import SARIFReporter
            reporter = SARIFReporter()
            reporter_config = {
                "findings": findings,
                "tool_name": "Atheris Fuzzer",
                "tool_version": self._metadata.version
            }
            reporter_result = await reporter.execute(reporter_config, workspace)
            sarif_report = reporter_result.sarif

            return ModuleResult(
                module=self._metadata.name,
                version=self._metadata.version,
                status="success",
                execution_time=self.get_execution_time(),
                findings=findings,
                summary={
                    "total_executions": self.total_executions,
                    "crashes_found": len(self.crashes),
                    "execution_time": self.get_execution_time(),
                    "target_file": str(target_path.relative_to(workspace))
                },
                metadata={
                    "max_iterations": max_iterations,
                    "timeout_seconds": timeout_seconds
                },
                sarif=sarif_report
            )

        except Exception as e:
            logger.error(f"Fuzzing failed: {e}", exc_info=True)
            return self.create_result(
                findings=[],
                status="failed",
                error=str(e)
            )

    def _discover_target(self, workspace: Path, target_file: Optional[str]) -> Path:
        """
        Discover fuzz target in workspace.

        Args:
            workspace: Path to workspace
            target_file: Explicit target file or None for auto-discovery

        Returns:
            Path to target file
        """
        if target_file:
            # Use specified target
            target_path = workspace / target_file
            if not target_path.exists():
                raise FileNotFoundError(f"Target file not found: {target_file}")
            return target_path

        # Auto-discover: look for fuzz_*.py, *_fuzz.py, or fuzz_target.py
        logger.info("Auto-discovering fuzz targets...")
        candidates = []

        # Use rglob for recursive search (searches all subdirectories)
        for pattern in ["fuzz_*.py", "*_fuzz.py", "fuzz_target.py"]:
            matches = list(workspace.rglob(pattern))
            candidates.extend(matches)

        # The patterns overlap (fuzz_target.py also matches fuzz_*.py), so drop
        # duplicates while keeping discovery order; otherwise a single file
        # triggers the "multiple targets" warning below.
        candidates = list(dict.fromkeys(candidates))

        if not candidates:
            raise FileNotFoundError(
                "No fuzz targets found. Expected files matching: fuzz_*.py, *_fuzz.py, or fuzz_target.py"
            )

        # Use first candidate
        target = candidates[0]
        if len(candidates) > 1:
            logger.warning(
                f"Multiple fuzz targets found: {[str(c) for c in candidates]}. "
                f"Using: {target.name}"
            )

        return target

    def _load_target_module(self, target_path: Path) -> Callable:
        """
        Load target module and get TestOneInput function.

        Args:
            target_path: Path to Python file with TestOneInput

        Returns:
            TestOneInput function
        """
        # Add target directory to sys.path
        target_dir = target_path.parent
        if str(target_dir) not in sys.path:
            sys.path.insert(0, str(target_dir))

        # Load module dynamically
        module_name = target_path.stem
        spec = importlib.util.spec_from_file_location(module_name, target_path)
        if spec is None or spec.loader is None:
            raise ImportError(f"Cannot load module from {target_path}")

        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

        # Get TestOneInput function
        if not hasattr(module, "TestOneInput"):
            raise AttributeError(
                f"Module {module_name} does not have TestOneInput() function. "
                "Atheris requires a TestOneInput(data: bytes) function."
            )

        return module.TestOneInput

async def _run_fuzzing(
self,
test_one_input: Callable,
target_path: Path,
workspace: Path,
max_iterations: int,
timeout_seconds: int,
stats_callback: Optional[Callable] = None
):
"""
Run Atheris fuzzing with real-time monitoring.
Args:
test_one_input: TestOneInput function to fuzz (not used, loaded in subprocess)
target_path: Path to target file
workspace: Path to workspace directory
max_iterations: Max iterations
timeout_seconds: Timeout in seconds
stats_callback: Optional callback for stats
"""
self.crashes = []
self.total_executions = 0
# Create corpus directory in workspace
corpus_dir = workspace / ".fuzzforge_corpus"
corpus_dir.mkdir(exist_ok=True)
logger.info(f"Using corpus directory: {corpus_dir}")
logger.info(f"Starting Atheris fuzzer in subprocess (max_runs={max_iterations}, timeout={timeout_seconds}s)...")
# Create shared memory for subprocess communication
ctx = multiprocessing.get_context('spawn')
manager = ctx.Manager()
shared_crashes = manager.list() # Shared list for crash details
exec_counter = ctx.Value('i', 0) # Shared execution counter
crash_counter = ctx.Value('i', 0) # Shared crash counter
coverage_counter = ctx.Value('i', 0) # Shared coverage counter
# Start fuzzing in subprocess
process = ctx.Process(
target=_run_atheris_in_subprocess,
args=(str(target_path), str(corpus_dir), max_iterations, timeout_seconds, shared_crashes, exec_counter, crash_counter, coverage_counter)
)
# Run fuzzing in a separate task with monitoring
async def monitor_stats():
"""Monitor and report stats every 0.5 seconds"""
while True:
await asyncio.sleep(0.5)
if stats_callback:
elapsed = time.time() - self.start_time
# Read from shared counters
current_execs = exec_counter.value
current_crashes = crash_counter.value
current_coverage = coverage_counter.value
execs_per_sec = current_execs / elapsed if elapsed > 0 else 0
# Count corpus files
try:
corpus_size = len(list(corpus_dir.iterdir())) if corpus_dir.exists() else 0
except Exception:
corpus_size = 0
# TODO: Get real coverage from Atheris
# Until then, prefer the shared coverage counter and fall back to
# corpus_size as a proxy when the counter has not been updated
coverage_value = current_coverage if current_coverage > 0 else corpus_size
await stats_callback({
"total_execs": current_execs,
"execs_per_sec": execs_per_sec,
"crashes": current_crashes,
"corpus_size": corpus_size,
"coverage": coverage_value, # Shared counter if set, else corpus size as proxy
"elapsed_time": int(elapsed)
})
# Start monitoring task
monitor_task = None
if stats_callback:
monitor_task = asyncio.create_task(monitor_stats())
try:
# Start subprocess
process.start()
logger.info(f"Fuzzing subprocess started (PID: {process.pid})")
# Wait for subprocess to complete
while process.is_alive():
await asyncio.sleep(0.1)
# NOTE: We cannot use result_queue because Atheris calls os._exit()
# which terminates immediately without putting results in the queue.
# Instead, we rely on shared memory (Manager().list() and Value counters).
# Read final values from shared memory
self.total_executions = exec_counter.value
total_crashes = crash_counter.value
# Read crash details from shared memory and convert to our format
self.crashes = []
for crash_data in shared_crashes:
# Reconstruct crash info with exception object
crash_info = {
"input": crash_data["input"],
"exception": Exception(crash_data["exception_message"]),
"exception_type": crash_data["exception_type"],
"stack_trace": crash_data["stack_trace"],
"execution": crash_data["execution"]
}
self.crashes.append(crash_info)
logger.warning(
f"Crash found (execution {crash_data['execution']}): "
f"{crash_data['exception_type']}: {crash_data['exception_message']}"
)
logger.info(f"Fuzzing completed: {self.total_executions} executions, {total_crashes} crashes found")
# Send final stats update
if stats_callback:
elapsed = time.time() - self.start_time
execs_per_sec = self.total_executions / elapsed if elapsed > 0 else 0
# Count final corpus size
try:
final_corpus_size = len(list(corpus_dir.iterdir())) if corpus_dir.exists() else 0
except Exception:
final_corpus_size = 0
# TODO: Parse coverage from Atheris output
# For now, use corpus size as proxy (corpus grows with coverage)
# libFuzzer writes coverage to stdout but sys.stdout redirection
# doesn't work because it writes to FD 1 directly from C++
final_coverage = coverage_counter.value if coverage_counter.value > 0 else final_corpus_size
await stats_callback({
"total_execs": self.total_executions,
"execs_per_sec": execs_per_sec,
"crashes": total_crashes,
"corpus_size": final_corpus_size,
"coverage": final_coverage,
"elapsed_time": int(elapsed)
})
# Wait for process to fully terminate
process.join(timeout=5)
if process.exitcode is not None and process.exitcode != 0:
logger.warning(f"Subprocess exited with code: {process.exitcode}")
except Exception as e:
logger.error(f"Fuzzing execution error: {e}")
if process.is_alive():
logger.warning("Terminating fuzzing subprocess...")
process.terminate()
process.join(timeout=5)
if process.is_alive():
process.kill()
raise
finally:
# Stop monitoring
if monitor_task:
monitor_task.cancel()
try:
await monitor_task
except asyncio.CancelledError:
pass
async def _generate_findings(self, target_path: Path) -> List[ModuleFinding]:
"""
Generate ModuleFinding objects from crashes.
Args:
target_path: Path to target file
Returns:
List of findings
"""
findings = []
for idx, crash in enumerate(self.crashes):
# Encode crash input for storage
crash_input_b64 = base64.b64encode(crash["input"]).decode()
finding = self.create_finding(
title=f"Crash: {crash['exception_type']}",
description=(
"Atheris found a crash during fuzzing:\n"
f"Exception: {crash['exception_type']}\n"
f"Message: {str(crash['exception'])}\n"
f"Execution: {crash['execution']}"
),
severity="critical",
category="crash",
file_path=str(target_path),
metadata={
"crash_input_base64": crash_input_b64,
"crash_input_hex": crash["input"].hex(),
"exception_type": crash["exception_type"],
"stack_trace": crash["stack_trace"],
"execution_number": crash["execution"]
},
recommendation=(
"Review the crash stack trace and input to identify the vulnerability. "
"The crash input is provided in base64 and hex formats for reproduction."
)
)
findings.append(finding)
# Report crash to backend for real-time monitoring
if self.run_id:
try:
crash_report = {
"run_id": self.run_id,
"crash_id": f"crash_{idx + 1}",
"timestamp": datetime.utcnow().isoformat(),
"crash_type": crash["exception_type"],
"stack_trace": crash["stack_trace"],
"input_file": crash_input_b64,
"severity": "critical",
"exploitability": "unknown"
}
backend_url = os.getenv("BACKEND_URL", "http://backend:8000")
async with httpx.AsyncClient(timeout=5.0) as client:
await client.post(
f"{backend_url}/fuzzing/{self.run_id}/crash",
json=crash_report
)
logger.debug(f"Crash report sent to backend: {crash_report['crash_id']}")
except Exception as e:
logger.debug(f"Failed to post crash report to backend: {e}")
return findings
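Each finding above stores the crashing input twice, as `crash_input_base64` and `crash_input_hex`, so that a crash can be replayed outside the fuzzer. A minimal sketch of that replay step follows. The helper `reproduce_crash` and the sample `TestOneInput` target are hypothetical names introduced for illustration and are not part of this change.

```python
import base64
from typing import Callable, Optional


def reproduce_crash(
    test_one_input: Callable[[bytes], None],
    crash_input_base64: str,
) -> Optional[Exception]:
    """Replay one recorded input; return the exception it raises, or None."""
    data = base64.b64decode(crash_input_base64)
    try:
        test_one_input(data)
    except Exception as exc:  # a fuzz target may raise any exception type
        return exc
    return None


# Hypothetical target standing in for a user's TestOneInput function.
def TestOneInput(data: bytes) -> None:
    if data.startswith(b"FUZZ"):
        raise ValueError("magic prefix reached")


# Metadata shaped like the dictionary built in _generate_findings.
metadata = {
    "crash_input_base64": base64.b64encode(b"FUZZ\x00").decode(),
    "crash_input_hex": b"FUZZ\x00".hex(),
}

error = reproduce_crash(TestOneInput, metadata["crash_input_base64"])
print(type(error).__name__, error)
```

Both encodings decode to the same bytes, so either field can be used for reproduction; base64 is the more compact of the two for large inputs.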
@@ -0,0 +1,455 @@
"""
Cargo Fuzzer Module
Reusable module for fuzzing Rust code using cargo-fuzz (libFuzzer).
Discovers and fuzzes user-provided Rust targets with fuzz_target!() macros.
"""
import asyncio
import logging
import os
import re
import time
from pathlib import Path
from typing import Dict, Any, List, Optional, Callable
from modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
logger = logging.getLogger(__name__)
class CargoFuzzer(BaseModule):
"""
Cargo-fuzz (libFuzzer) fuzzer module for Rust code.
Discovers fuzz targets in user's Rust project and runs cargo-fuzz
to find crashes, undefined behavior, and memory safety issues.
"""
def get_metadata(self) -> ModuleMetadata:
"""Get module metadata"""
return ModuleMetadata(
name="cargo_fuzz",
version="0.11.2",
description="Fuzz Rust code using cargo-fuzz with libFuzzer backend",
author="FuzzForge Team",
category="fuzzer",
tags=["fuzzing", "rust", "cargo-fuzz", "libfuzzer", "memory-safety"],
input_schema={
"type": "object",
"properties": {
"target_name": {
"type": "string",
"description": "Fuzz target name (auto-discovered if not specified)"
},
"max_iterations": {
"type": "integer",
"default": 1000000,
"description": "Maximum fuzzing iterations"
},
"timeout_seconds": {
"type": "integer",
"default": 1800,
"description": "Fuzzing timeout in seconds"
},
"sanitizer": {
"type": "string",
"enum": ["address", "memory", "undefined"],
"default": "address",
"description": "Sanitizer to use (address, memory, undefined)"
}
}
},
output_schema={
"type": "object",
"properties": {
"findings": {
"type": "array",
"description": "Crashes and memory safety issues found"
},
"summary": {
"type": "object",
"description": "Fuzzing execution summary"
}
}
}
)
def validate_config(self, config: Dict[str, Any]) -> bool:
"""Validate configuration"""
max_iterations = config.get("max_iterations", 1000000)
if not isinstance(max_iterations, int) or max_iterations < 1:
raise ValueError("max_iterations must be a positive integer")
timeout = config.get("timeout_seconds", 1800)
if not isinstance(timeout, int) or timeout < 1:
raise ValueError("timeout_seconds must be a positive integer")
sanitizer = config.get("sanitizer", "address")
if sanitizer not in ["address", "memory", "undefined"]:
raise ValueError("sanitizer must be one of: address, memory, undefined")
return True
async def execute(
self,
config: Dict[str, Any],
workspace: Path,
stats_callback: Optional[Callable] = None
) -> ModuleResult:
"""
Execute cargo-fuzz on user's Rust code.
Args:
config: Fuzzer configuration
workspace: Path to workspace directory containing Rust project
stats_callback: Optional callback for real-time stats updates
Returns:
ModuleResult containing findings and summary
"""
self.start_timer()
try:
# Validate inputs
self.validate_config(config)
self.validate_workspace(workspace)
logger.info(f"Running cargo-fuzz on {workspace}")
# Step 1: Discover fuzz targets
targets = await self._discover_fuzz_targets(workspace)
if not targets:
return self.create_result(
findings=[],
status="failed",
error="No fuzz targets found. Expected fuzz targets in fuzz/fuzz_targets/"
)
# Get target name from config or use first discovered target
target_name = config.get("target_name")
if not target_name:
target_name = targets[0]
logger.info(f"No target specified, using first discovered target: {target_name}")
elif target_name not in targets:
return self.create_result(
findings=[],
status="failed",
error=f"Target '{target_name}' not found. Available targets: {', '.join(targets)}"
)
# Step 2: Build fuzz target
logger.info(f"Building fuzz target: {target_name}")
build_success = await self._build_fuzz_target(workspace, target_name, config)
if not build_success:
return self.create_result(
findings=[],
status="failed",
error=f"Failed to build fuzz target: {target_name}"
)
# Step 3: Run fuzzing
logger.info(f"Starting fuzzing: {target_name}")
findings, stats = await self._run_fuzzing(
workspace,
target_name,
config,
stats_callback
)
# Step 4: Parse crash artifacts
crash_findings = await self._parse_crash_artifacts(workspace, target_name)
findings.extend(crash_findings)
logger.info(f"Fuzzing completed: {len(findings)} crashes found")
return self.create_result(
findings=findings,
status="success",
summary=stats
)
except Exception as e:
logger.error(f"Cargo fuzzer failed: {e}")
return self.create_result(
findings=[],
status="failed",
error=str(e)
)
async def _discover_fuzz_targets(self, workspace: Path) -> List[str]:
"""
Discover fuzz targets in the project.
Looks for fuzz targets in fuzz/fuzz_targets/ directory.
"""
fuzz_targets_dir = workspace / "fuzz" / "fuzz_targets"
if not fuzz_targets_dir.exists():
logger.warning(f"No fuzz targets directory found: {fuzz_targets_dir}")
return []
targets = []
for file in fuzz_targets_dir.glob("*.rs"):
target_name = file.stem
targets.append(target_name)
logger.info(f"Discovered fuzz target: {target_name}")
return targets
async def _build_fuzz_target(
self,
workspace: Path,
target_name: str,
config: Dict[str, Any]
) -> bool:
"""Build the fuzz target with instrumentation"""
try:
sanitizer = config.get("sanitizer", "address")
# Build command
cmd = [
"cargo", "fuzz", "build",
target_name,
f"--sanitizer={sanitizer}"
]
logger.debug(f"Build command: {' '.join(cmd)}")
proc = await asyncio.create_subprocess_exec(
*cmd,
cwd=workspace,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE
)
stdout, stderr = await proc.communicate()
if proc.returncode != 0:
logger.error(f"Build failed: {stderr.decode()}")
return False
logger.info("Build successful")
return True
except Exception as e:
logger.error(f"Build error: {e}")
return False
async def _run_fuzzing(
self,
workspace: Path,
target_name: str,
config: Dict[str, Any],
stats_callback: Optional[Callable]
) -> tuple[List[ModuleFinding], Dict[str, Any]]:
"""
Run cargo-fuzz and collect statistics.
Returns:
Tuple of (findings, stats_dict)
"""
max_iterations = config.get("max_iterations", 1000000)
timeout_seconds = config.get("timeout_seconds", 1800)
sanitizer = config.get("sanitizer", "address")
findings = []
stats = {
"total_executions": 0,
"crashes_found": 0,
"corpus_size": 0,
"coverage": 0.0,
"execution_time": 0.0
}
try:
# Cargo fuzz run command
cmd = [
"cargo", "fuzz", "run",
target_name,
f"--sanitizer={sanitizer}",
"--",
f"-runs={max_iterations}",
f"-max_total_time={timeout_seconds}"
]
logger.debug(f"Fuzz command: {' '.join(cmd)}")
start_time = time.time()
proc = await asyncio.create_subprocess_exec(
*cmd,
cwd=workspace,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.STDOUT
)
# Monitor output and extract stats
last_stats_time = time.time()
async for line in proc.stdout:
line_str = line.decode('utf-8', errors='ignore').strip()
# Parse libFuzzer stats
# Example: "#12345 NEW cov: 123 ft: 456 corp: 10/234b"
stats_match = re.match(r'#(\d+)\s+.*cov:\s*(\d+).*corp:\s*(\d+)', line_str)
if stats_match:
execs = int(stats_match.group(1))
cov = int(stats_match.group(2))
corp = int(stats_match.group(3))
stats["total_executions"] = execs
stats["coverage"] = float(cov)
stats["corpus_size"] = corp
stats["execution_time"] = time.time() - start_time
# Invoke stats callback for real-time monitoring
if stats_callback and time.time() - last_stats_time >= 0.5:
await stats_callback({
"total_execs": execs,
"execs_per_sec": execs / stats["execution_time"] if stats["execution_time"] > 0 else 0,
"crashes": stats["crashes_found"],
"coverage": cov,
"corpus_size": corp,
"elapsed_time": int(stats["execution_time"])
})
last_stats_time = time.time()
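# Detect crash line. A sanitizer or libFuzzer report typically prints
# both an "ERROR:" line and a "SUMMARY:" line for the same crash, so
# count only the summary line to avoid counting each crash twice.
if "SUMMARY:" in line_str:
logger.info(f"Detected crash: {line_str}")
stats["crashes_found"] += 1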
await proc.wait()
stats["execution_time"] = time.time() - start_time
# Send final stats update
if stats_callback:
await stats_callback({
"total_execs": stats["total_executions"],
"execs_per_sec": stats["total_executions"] / stats["execution_time"] if stats["execution_time"] > 0 else 0,
"crashes": stats["crashes_found"],
"coverage": stats["coverage"],
"corpus_size": stats["corpus_size"],
"elapsed_time": int(stats["execution_time"])
})
logger.info(
f"Fuzzing completed: {stats['total_executions']} execs, "
f"{stats['crashes_found']} crashes"
)
except Exception as e:
logger.error(f"Fuzzing error: {e}")
return findings, stats
async def _parse_crash_artifacts(
self,
workspace: Path,
target_name: str
) -> List[ModuleFinding]:
"""
Parse crash artifacts from fuzz/artifacts directory.
Cargo-fuzz stores crashes in: fuzz/artifacts/<target_name>/
"""
findings = []
artifacts_dir = workspace / "fuzz" / "artifacts" / target_name
if not artifacts_dir.exists():
logger.info("No crash artifacts found")
return findings
# Find all crash files
for crash_file in artifacts_dir.glob("crash-*"):
try:
finding = await self._analyze_crash(workspace, target_name, crash_file)
if finding:
findings.append(finding)
except Exception as e:
logger.warning(f"Failed to analyze crash {crash_file}: {e}")
logger.info(f"Parsed {len(findings)} crash artifacts")
return findings
async def _analyze_crash(
self,
workspace: Path,
target_name: str,
crash_file: Path
) -> Optional[ModuleFinding]:
"""
Analyze a single crash file.
Runs cargo-fuzz with the crash input to reproduce and get stack trace.
"""
try:
# Read crash input
crash_input = crash_file.read_bytes()
# Reproduce crash to get stack trace
cmd = [
"cargo", "fuzz", "run",
target_name,
str(crash_file)
]
proc = await asyncio.create_subprocess_exec(
*cmd,
cwd=workspace,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.STDOUT,
env={**os.environ, "RUST_BACKTRACE": "1"}
)
stdout, _ = await proc.communicate()
output = stdout.decode('utf-8', errors='ignore')
# Parse stack trace and error type
error_type = "Unknown Crash"
stack_trace = output
# Extract error type
if "SEGV" in output:
error_type = "Segmentation Fault"
severity = "critical"
elif "heap-use-after-free" in output:
error_type = "Use After Free"
severity = "critical"
elif "heap-buffer-overflow" in output:
error_type = "Heap Buffer Overflow"
severity = "critical"
elif "stack-buffer-overflow" in output:
error_type = "Stack Buffer Overflow"
severity = "high"
elif "panic" in output.lower():
error_type = "Panic"
severity = "medium"
else:
severity = "high"
# Create finding
finding = self.create_finding(
title=f"Crash: {error_type} in {target_name}",
description=f"Cargo-fuzz discovered a crash in target '{target_name}'. "
f"Error type: {error_type}. "
f"Input size: {len(crash_input)} bytes.",
severity=severity,
category="crash",
file_path=f"fuzz/fuzz_targets/{target_name}.rs",
code_snippet=stack_trace[:500],
recommendation="Review the crash details and fix the underlying bug. "
"Use AddressSanitizer to identify memory safety issues. "
"Consider adding bounds checks or using safer APIs.",
metadata={
"error_type": error_type,
"crash_file": crash_file.name,
"input_size": len(crash_input),
"reproducer": crash_file.name,
"stack_trace": stack_trace
}
)
return finding
except Exception as e:
logger.warning(f"Failed to analyze crash {crash_file}: {e}")
return None
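The live statistics in `_run_fuzzing` above come from matching each libFuzzer status line against a single regular expression. The sketch below isolates that step so it can be tried on sample lines without a Rust toolchain. The function name `parse_libfuzzer_stats` is an assumption made for this example; the pattern itself is the one used in the module.

```python
import re
from typing import Dict, Optional

# Same pattern as in _run_fuzzing: execution count, coverage, corpus entries.
STATS_RE = re.compile(r'#(\d+)\s+.*cov:\s*(\d+).*corp:\s*(\d+)')


def parse_libfuzzer_stats(line: str) -> Optional[Dict[str, int]]:
    """Return the parsed counters, or None if the line is not a status line."""
    match = STATS_RE.match(line.strip())
    if match is None:
        return None
    execs, cov, corp = (int(group) for group in match.groups())
    return {"total_executions": execs, "coverage": cov, "corpus_size": corp}


sample = "#12345 NEW cov: 123 ft: 456 corp: 10/234b"
print(parse_libfuzzer_stats(sample))
print(parse_libfuzzer_stats("INFO: Seed: 42"))
```

Note that `corp: 10/234b` reports both the number of corpus entries and their total size; the pattern keeps only the entry count, which is what the module stores as `corpus_size`.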
@@ -17,7 +17,6 @@ import logging
from pathlib import Path
from typing import Dict, Any, List
from datetime import datetime
import json
try:
from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
@@ -16,16 +16,16 @@ File Scanner Module - Scans and enumerates files in the workspace
import logging
import mimetypes
from pathlib import Path
-from typing import Dict, Any, List
+from typing import Dict, Any
import hashlib
try:
-from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
+from toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult
except ImportError:
try:
-from modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
+from modules.base import BaseModule, ModuleMetadata, ModuleResult
except ImportError:
-from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult, ModuleFinding
+from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult
logger = logging.getLogger(__name__)
@@ -0,0 +1,9 @@
"""
Atheris Fuzzing Workflow
Fuzzes user-provided Python code using Atheris.
"""
from .workflow import AtherisFuzzingWorkflow
__all__ = ["AtherisFuzzingWorkflow"]
@@ -0,0 +1,122 @@
"""
Atheris Fuzzing Workflow Activities
Activities specific to the Atheris fuzzing workflow.
"""
import logging
import sys
from datetime import datetime
from pathlib import Path
from typing import Dict, Any
import os
import httpx
from temporalio import activity
# Configure logging
logger = logging.getLogger(__name__)
# Add toolbox to path for module imports
sys.path.insert(0, '/app/toolbox')
@activity.defn(name="fuzz_with_atheris")
async def fuzz_activity(workspace_path: str, config: dict) -> dict:
"""
Fuzzing activity using the AtherisFuzzer module on user code.
This activity:
1. Imports the reusable AtherisFuzzer module
2. Sets up real-time stats callback
3. Executes fuzzing on user's TestOneInput() function
4. Returns the ModuleResult serialized as a dictionary
Args:
workspace_path: Path to the workspace directory (user's uploaded code)
config: Fuzzer configuration (target_file, max_iterations, timeout_seconds)
Returns:
Fuzzer results dictionary (findings, summary, metadata)
"""
logger.info(f"Activity: fuzz_with_atheris (workspace={workspace_path})")
try:
# Import reusable AtherisFuzzer module
from modules.fuzzer import AtherisFuzzer
workspace = Path(workspace_path)
if not workspace.exists():
raise FileNotFoundError(f"Workspace not found: {workspace_path}")
# Get activity info for real-time stats
info = activity.info()
run_id = info.workflow_id
# Define stats callback for real-time monitoring
async def stats_callback(stats_data: Dict[str, Any]):
"""Callback for live fuzzing statistics"""
try:
# Prepare stats payload for backend
coverage_value = stats_data.get("coverage", 0)
logger.debug(f"COVERAGE_DEBUG: coverage from stats_data = {coverage_value}")
stats_payload = {
"run_id": run_id,
"workflow": "atheris_fuzzing",
"executions": stats_data.get("total_execs", 0),
"executions_per_sec": stats_data.get("execs_per_sec", 0.0),
"crashes": stats_data.get("crashes", 0),
"unique_crashes": stats_data.get("crashes", 0),
"coverage": coverage_value,
"corpus_size": stats_data.get("corpus_size", 0),
"elapsed_time": stats_data.get("elapsed_time", 0),
"last_crash_time": None
}
# POST stats to backend API for real-time monitoring
backend_url = os.getenv("BACKEND_URL", "http://backend:8000")
async with httpx.AsyncClient(timeout=5.0) as client:
try:
await client.post(
f"{backend_url}/fuzzing/{run_id}/stats",
json=stats_payload
)
except Exception as http_err:
logger.debug(f"Failed to post stats to backend: {http_err}")
# Also log for debugging
logger.info("LIVE_STATS", extra={
"stats_type": "fuzzing_live_update",
"workflow_type": "atheris_fuzzing",
"run_id": run_id,
"executions": stats_data.get("total_execs", 0),
"executions_per_sec": stats_data.get("execs_per_sec", 0.0),
"crashes": stats_data.get("crashes", 0),
"corpus_size": stats_data.get("corpus_size", 0),
"coverage": stats_data.get("coverage", 0.0),
"elapsed_time": stats_data.get("elapsed_time", 0),
"timestamp": datetime.utcnow().isoformat()
})
except Exception as e:
logger.warning(f"Error in stats callback: {e}")
# Add stats callback and run_id to config
config["stats_callback"] = stats_callback
config["run_id"] = run_id
# Execute the fuzzer module
fuzzer = AtherisFuzzer()
result = await fuzzer.execute(config, workspace)
logger.info(
f"✓ Fuzzing completed: "
f"{result.summary.get('total_executions', 0)} executions, "
f"{result.summary.get('crashes_found', 0)} crashes"
)
return result.dict()
except Exception as e:
logger.error(f"Fuzzing failed: {e}", exc_info=True)
raise
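The `stats_callback` contract used by the activity above is small: an async callable that receives one dictionary per update. When a fuzzer module is exercised outside Temporal, for example in a unit test, a recording callback can stand in for the HTTP-posting one. The sketch below assumes only that contract; `RecordingStatsCallback` and `fake_fuzzer` are hypothetical names and do not exist in the repository.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

StatsCallback = Callable[[Dict[str, Any]], Awaitable[None]]


class RecordingStatsCallback:
    """Async callable that stores every stats update it receives."""

    def __init__(self) -> None:
        self.updates: List[Dict[str, Any]] = []

    async def __call__(self, stats_data: Dict[str, Any]) -> None:
        # Copy the dictionary so later mutation by the caller is not recorded.
        self.updates.append(dict(stats_data))


async def fake_fuzzer(stats_callback: StatsCallback) -> None:
    # Hypothetical stand-in for a fuzzer module emitting periodic updates.
    for execs in (100, 200, 300):
        await stats_callback(
            {"total_execs": execs, "crashes": 0, "corpus_size": 1}
        )


recorder = RecordingStatsCallback()
asyncio.run(fake_fuzzer(recorder))
print(len(recorder.updates), recorder.updates[-1]["total_execs"])
```

Because the callback is injected, the same fuzzer code path can be checked with or without a reachable backend.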
@@ -0,0 +1,65 @@
name: atheris_fuzzing
version: "1.0.0"
vertical: python
description: "Fuzz Python code using Atheris with real-time monitoring. Automatically discovers and fuzzes TestOneInput() functions in user code."
author: "FuzzForge Team"
tags:
- "fuzzing"
- "atheris"
- "python"
- "coverage"
- "security"
# Workspace isolation mode (system-level configuration)
# - "isolated" (default): Each workflow run gets its own isolated workspace (safe for concurrent fuzzing)
# - "shared": All runs share the same workspace (for read-only analysis workflows)
# - "copy-on-write": Download once, copy for each run (balances performance and isolation)
workspace_isolation: "isolated"
default_parameters:
target_file: null
max_iterations: 1000000
timeout_seconds: 1800
parameters:
type: object
properties:
target_file:
type: string
description: "Python file with TestOneInput() function (auto-discovered if not specified)"
max_iterations:
type: integer
default: 1000000
description: "Maximum fuzzing iterations"
timeout_seconds:
type: integer
default: 1800
description: "Fuzzing timeout in seconds (30 minutes)"
output_schema:
type: object
properties:
findings:
type: array
description: "Crashes and vulnerabilities found during fuzzing"
items:
type: object
properties:
title:
type: string
severity:
type: string
category:
type: string
metadata:
type: object
summary:
type: object
description: "Fuzzing execution summary"
properties:
total_executions:
type: integer
crashes_found:
type: integer
execution_time:
type: number
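The metadata above declares `default_parameters` alongside a typed `parameters` schema. How the backend merges the two is not shown in this diff, so the following is only a sketch of one plausible approach: overlay user-supplied values on the defaults and check the declared types. The helper `resolve_parameters` is hypothetical, and the two dictionaries are copied by hand from the YAML rather than loaded from it.

```python
from typing import Any, Dict

# Copied from default_parameters and parameters.properties in the metadata.
DEFAULT_PARAMETERS: Dict[str, Any] = {
    "target_file": None,
    "max_iterations": 1000000,
    "timeout_seconds": 1800,
}
PARAMETER_TYPES = {
    "target_file": str,
    "max_iterations": int,
    "timeout_seconds": int,
}


def resolve_parameters(user_params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: overlay user values on the declared defaults."""
    unknown = sorted(set(user_params) - set(DEFAULT_PARAMETERS))
    if unknown:
        raise ValueError(f"Unknown parameters: {unknown}")
    resolved = {**DEFAULT_PARAMETERS, **user_params}
    for name, value in resolved.items():
        if value is None:
            continue  # null keeps its "not specified" meaning (auto-discovery)
        expected = PARAMETER_TYPES[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"{name} must be of type {expected.__name__}")
    return resolved


print(resolve_parameters({"timeout_seconds": 60}))
```

Leaving `target_file` as `None` preserves the auto-discovery behaviour described in the parameter's description.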
@@ -0,0 +1,175 @@
"""
Atheris Fuzzing Workflow - Temporal Version
Fuzzes user-provided Python code using Atheris with real-time monitoring.
"""
from datetime import timedelta
from typing import Dict, Any, Optional
from temporalio import workflow
from temporalio.common import RetryPolicy
# Pass logging through the workflow sandbox instead of re-importing it per run
with workflow.unsafe.imports_passed_through():
import logging
logger = logging.getLogger(__name__)
@workflow.defn
class AtherisFuzzingWorkflow:
"""
Fuzz Python code using Atheris.
User workflow:
1. User runs: ff workflow run atheris_fuzzing .
2. CLI uploads project to MinIO
3. Worker downloads project
4. Worker fuzzes TestOneInput() function
5. Crashes reported as findings
"""
@workflow.run
async def run(
self,
target_id: str, # MinIO UUID of uploaded user code
target_file: Optional[str] = None, # Optional: specific file to fuzz
max_iterations: int = 1000000,
timeout_seconds: int = 1800 # 30 minutes default for fuzzing
) -> Dict[str, Any]:
"""
Main workflow execution.
Args:
target_id: UUID of the uploaded target in MinIO
target_file: Optional specific Python file with TestOneInput() (auto-discovered if None)
max_iterations: Maximum fuzzing iterations
timeout_seconds: Fuzzing timeout in seconds
Returns:
Dictionary containing findings and summary
"""
workflow_id = workflow.info().workflow_id
workflow.logger.info(
f"Starting AtherisFuzzingWorkflow "
f"(workflow_id={workflow_id}, target_id={target_id}, "
f"target_file={target_file or 'auto-discover'}, max_iterations={max_iterations}, "
f"timeout_seconds={timeout_seconds})"
)
results = {
"workflow_id": workflow_id,
"target_id": target_id,
"status": "running",
"steps": []
}
try:
# Get run ID for workspace isolation
run_id = workflow.info().run_id
# Step 1: Download user's project from MinIO
workflow.logger.info("Step 1: Downloading user code from MinIO")
target_path = await workflow.execute_activity(
"get_target",
args=[target_id, run_id, "isolated"], # target_id, run_id, workspace_isolation
start_to_close_timeout=timedelta(minutes=5),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=1),
maximum_interval=timedelta(seconds=30),
maximum_attempts=3
)
)
results["steps"].append({
"step": "download_target",
"status": "success",
"target_path": target_path
})
workflow.logger.info(f"✓ User code downloaded to: {target_path}")
# Step 2: Run Atheris fuzzing
workflow.logger.info("Step 2: Running Atheris fuzzing")
# Use defaults if parameters are None
actual_max_iterations = max_iterations if max_iterations is not None else 1000000
actual_timeout_seconds = timeout_seconds if timeout_seconds is not None else 1800
fuzz_config = {
"target_file": target_file,
"max_iterations": actual_max_iterations,
"timeout_seconds": actual_timeout_seconds
}
fuzz_results = await workflow.execute_activity(
"fuzz_with_atheris",
args=[target_path, fuzz_config],
start_to_close_timeout=timedelta(seconds=actual_timeout_seconds + 60),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=1 # Fuzzing shouldn't retry
)
)
results["steps"].append({
"step": "fuzzing",
"status": "success",
"executions": fuzz_results.get("summary", {}).get("total_executions", 0),
"crashes": fuzz_results.get("summary", {}).get("crashes_found", 0)
})
workflow.logger.info(
f"✓ Fuzzing completed: "
f"{fuzz_results.get('summary', {}).get('total_executions', 0)} executions, "
f"{fuzz_results.get('summary', {}).get('crashes_found', 0)} crashes"
)
# Step 3: Upload results to MinIO
workflow.logger.info("Step 3: Uploading results")
try:
results_url = await workflow.execute_activity(
"upload_results",
args=[workflow_id, fuzz_results, "json"],
start_to_close_timeout=timedelta(minutes=2)
)
results["results_url"] = results_url
workflow.logger.info(f"✓ Results uploaded to: {results_url}")
except Exception as e:
workflow.logger.warning(f"Failed to upload results: {e}")
results["results_url"] = None
# Step 4: Cleanup cache
workflow.logger.info("Step 4: Cleaning up cache")
try:
await workflow.execute_activity(
"cleanup_cache",
args=[target_path, "isolated"], # target_path, workspace_isolation
start_to_close_timeout=timedelta(minutes=1)
)
workflow.logger.info("✓ Cache cleaned up")
except Exception as e:
workflow.logger.warning(f"Cache cleanup failed: {e}")
# Mark workflow as successful
results["status"] = "success"
results["findings"] = fuzz_results.get("findings", [])
results["summary"] = fuzz_results.get("summary", {})
results["sarif"] = fuzz_results.get("sarif") or {}
workflow.logger.info(
f"✓ Workflow completed successfully: {workflow_id} "
f"({results['summary'].get('crashes_found', 0)} crashes found)"
)
return results
except Exception as e:
workflow.logger.error(f"Workflow failed: {e}")
results["status"] = "error"
results["error"] = str(e)
results["steps"].append({
"step": "error",
"status": "failed",
"error": str(e)
})
raise
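The workflow above gives `get_target` a retry policy of three attempts with a 1 second initial interval capped at 30 seconds, while `fuzz_with_atheris` is limited to a single attempt. The sketch below works out the resulting wait times between attempts. It assumes Temporal's default backoff coefficient of 2.0, since the policies in the source do not set one; `retry_delays` is a hypothetical helper and not part of the Temporal SDK.

```python
from typing import List


def retry_delays(
    initial_interval: float,
    maximum_interval: float,
    maximum_attempts: int,
    backoff_coefficient: float = 2.0,  # assumed Temporal default
) -> List[float]:
    """Seconds waited before each retry (one fewer than the attempt count)."""
    delays: List[float] = []
    delay = initial_interval
    for _ in range(maximum_attempts - 1):
        delays.append(min(delay, maximum_interval))
        delay *= backoff_coefficient
    return delays


print(retry_delays(1.0, 30.0, 3))  # policy used for get_target
print(retry_delays(2.0, 60.0, 1))  # policy used for fuzz_with_atheris
```

With `maximum_attempts=1` there are no retries at all, so the `initial_interval` and `maximum_interval` values on the fuzzing activity have no effect; only its `start_to_close_timeout` of `timeout_seconds + 60` bounds the run.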
@@ -0,0 +1,5 @@
"""Cargo Fuzzing Workflow"""
from .workflow import CargoFuzzingWorkflow
__all__ = ["CargoFuzzingWorkflow"]
@@ -0,0 +1,203 @@
"""
Cargo Fuzzing Workflow Activities
Activities specific to the cargo-fuzz fuzzing workflow.
"""
import logging
import sys
from datetime import datetime
from pathlib import Path
from typing import Dict, Any
import os
import httpx
from temporalio import activity
# Configure logging
logger = logging.getLogger(__name__)
# Add toolbox to path for module imports
sys.path.insert(0, '/app/toolbox')
@activity.defn(name="fuzz_with_cargo")
async def fuzz_activity(workspace_path: str, config: dict) -> dict:
"""
Fuzzing activity using the CargoFuzzer module on user code.
This activity:
1. Imports the reusable CargoFuzzer module
2. Sets up real-time stats callback
3. Executes fuzzing on user's fuzz_target!() functions
4. Returns the ModuleResult converted to a dictionary (with a SARIF report)
Args:
workspace_path: Path to the workspace directory (user's uploaded Rust project)
config: Fuzzer configuration (target_name, max_iterations, timeout_seconds, sanitizer)
Returns:
Fuzzer results dictionary (findings, summary, metadata)
"""
logger.info(f"Activity: fuzz_with_cargo (workspace={workspace_path})")
try:
# Import reusable CargoFuzzer module
from modules.fuzzer import CargoFuzzer
workspace = Path(workspace_path)
if not workspace.exists():
raise FileNotFoundError(f"Workspace not found: {workspace_path}")
# Get activity info for real-time stats
info = activity.info()
run_id = info.workflow_id
# Define stats callback for real-time monitoring
async def stats_callback(stats_data: Dict[str, Any]):
"""Callback for live fuzzing statistics"""
try:
# Prepare stats payload for backend
coverage_value = stats_data.get("coverage", 0)
stats_payload = {
"run_id": run_id,
"workflow": "cargo_fuzzing",
"executions": stats_data.get("total_execs", 0),
"executions_per_sec": stats_data.get("execs_per_sec", 0.0),
"crashes": stats_data.get("crashes", 0),
"unique_crashes": stats_data.get("crashes", 0),
"coverage": coverage_value,
"corpus_size": stats_data.get("corpus_size", 0),
"elapsed_time": stats_data.get("elapsed_time", 0),
"last_crash_time": None
}
# POST stats to backend API for real-time monitoring
backend_url = os.getenv("BACKEND_URL", "http://backend:8000")
async with httpx.AsyncClient(timeout=5.0) as client:
try:
await client.post(
f"{backend_url}/fuzzing/{run_id}/stats",
json=stats_payload
)
except Exception as http_err:
logger.debug(f"Failed to post stats to backend: {http_err}")
# Also log for debugging
logger.info("LIVE_STATS", extra={
"stats_type": "fuzzing_live_update",
"workflow_type": "cargo_fuzzing",
"run_id": run_id,
"executions": stats_data.get("total_execs", 0),
"executions_per_sec": stats_data.get("execs_per_sec", 0.0),
"crashes": stats_data.get("crashes", 0),
"corpus_size": stats_data.get("corpus_size", 0),
"coverage": stats_data.get("coverage", 0.0),
"elapsed_time": stats_data.get("elapsed_time", 0),
"timestamp": datetime.utcnow().isoformat()
})
except Exception as e:
logger.error(f"Stats callback error: {e}")
# Initialize CargoFuzzer module
fuzzer = CargoFuzzer()
# Execute fuzzing with stats callback
module_result = await fuzzer.execute(
config=config,
workspace=workspace,
stats_callback=stats_callback
)
# Convert ModuleResult to dictionary
result_dict = {
"findings": [],
"summary": module_result.summary,
"metadata": module_result.metadata,
"status": module_result.status,
"error": module_result.error
}
# Convert findings to dict format
for finding in module_result.findings:
finding_dict = {
"id": finding.id,
"title": finding.title,
"description": finding.description,
"severity": finding.severity,
"category": finding.category,
"file_path": finding.file_path,
"line_start": finding.line_start,
"line_end": finding.line_end,
"code_snippet": finding.code_snippet,
"recommendation": finding.recommendation,
"metadata": finding.metadata
}
result_dict["findings"].append(finding_dict)
# Generate SARIF report from findings
if module_result.findings:
# Convert findings to SARIF format
severity_map = {
"critical": "error",
"high": "error",
"medium": "warning",
"low": "note",
"info": "note"
}
results = []
for finding in module_result.findings:
result = {
"ruleId": finding.metadata.get("rule_id", finding.category),
"level": severity_map.get(finding.severity, "warning"),
"message": {"text": finding.description},
"locations": []
}
if finding.file_path:
location = {
"physicalLocation": {
"artifactLocation": {"uri": finding.file_path},
"region": {
"startLine": finding.line_start or 1,
"endLine": finding.line_end or finding.line_start or 1
}
}
}
result["locations"].append(location)
results.append(result)
result_dict["sarif"] = {
"version": "2.1.0",
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
"runs": [{
"tool": {
"driver": {
"name": "cargo-fuzz",
"version": "0.11.2"
}
},
"results": results
}]
}
else:
result_dict["sarif"] = {
"version": "2.1.0",
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
"runs": []
}
logger.info(
f"Fuzzing activity completed: {len(module_result.findings)} crashes found, "
f"{module_result.summary.get('total_executions', 0)} executions"
)
return result_dict
except Exception as e:
logger.error(f"Fuzzing activity failed: {e}", exc_info=True)
raise
@@ -0,0 +1,71 @@
name: cargo_fuzzing
version: "1.0.0"
vertical: rust
description: "Fuzz Rust code using cargo-fuzz with real-time monitoring. Automatically discovers and fuzzes fuzz_target!() functions in user code."
author: "FuzzForge Team"
tags:
- "fuzzing"
- "cargo-fuzz"
- "rust"
- "libfuzzer"
- "memory-safety"
# Workspace isolation mode (system-level configuration)
# - "isolated" (default): Each workflow run gets its own isolated workspace (safe for concurrent fuzzing)
# - "shared": All runs share the same workspace (for read-only analysis workflows)
# - "copy-on-write": Download once, copy for each run (balances performance and isolation)
workspace_isolation: "isolated"
default_parameters:
target_name: null
max_iterations: 1000000
timeout_seconds: 1800
sanitizer: "address"
parameters:
type: object
properties:
target_name:
type: string
description: "Fuzz target name from fuzz/fuzz_targets/ (auto-discovered if not specified)"
max_iterations:
type: integer
default: 1000000
description: "Maximum fuzzing iterations"
timeout_seconds:
type: integer
default: 1800
description: "Fuzzing timeout in seconds (default: 1800, i.e. 30 minutes)"
sanitizer:
type: string
enum: ["address", "memory", "undefined"]
default: "address"
description: "Sanitizer to use (address, memory, undefined)"
output_schema:
type: object
properties:
findings:
type: array
description: "Crashes and memory safety issues found during fuzzing"
items:
type: object
properties:
title:
type: string
severity:
type: string
category:
type: string
metadata:
type: object
summary:
type: object
description: "Fuzzing execution summary"
properties:
total_executions:
type: integer
crashes_found:
type: integer
execution_time:
type: number
@@ -0,0 +1,180 @@
"""
Cargo Fuzzing Workflow - Temporal Version
Fuzzes user-provided Rust code using cargo-fuzz with real-time monitoring.
"""
from datetime import timedelta
from typing import Dict, Any, Optional
from temporalio import workflow
from temporalio.common import RetryPolicy
# Pass stdlib logging through the workflow sandbox instead of re-importing it for each workflow run
with workflow.unsafe.imports_passed_through():
import logging
logger = logging.getLogger(__name__)
@workflow.defn
class CargoFuzzingWorkflow:
"""
Fuzz Rust code using cargo-fuzz (libFuzzer).
User workflow:
1. User runs: ff workflow run cargo_fuzzing .
2. CLI uploads Rust project to MinIO
3. Worker downloads project
4. Worker discovers fuzz targets in fuzz/fuzz_targets/
5. Worker fuzzes the target with cargo-fuzz
6. Crashes reported as findings
"""
@workflow.run
async def run(
self,
target_id: str, # MinIO UUID of uploaded user code
target_name: Optional[str] = None, # Optional: specific fuzz target name
max_iterations: int = 1000000,
timeout_seconds: int = 1800, # 30 minutes default for fuzzing
sanitizer: str = "address"
) -> Dict[str, Any]:
"""
Main workflow execution.
Args:
target_id: UUID of the uploaded target in MinIO
target_name: Optional specific fuzz target name (auto-discovered if None)
max_iterations: Maximum fuzzing iterations
timeout_seconds: Fuzzing timeout in seconds
sanitizer: Sanitizer to use (address, memory, undefined)
Returns:
Dictionary containing findings and summary
"""
workflow_id = workflow.info().workflow_id
workflow.logger.info(
f"Starting CargoFuzzingWorkflow "
f"(workflow_id={workflow_id}, target_id={target_id}, "
f"target_name={target_name or 'auto-discover'}, max_iterations={max_iterations}, "
f"timeout_seconds={timeout_seconds}, sanitizer={sanitizer})"
)
results = {
"workflow_id": workflow_id,
"target_id": target_id,
"status": "running",
"steps": []
}
try:
# Get run ID for workspace isolation
run_id = workflow.info().run_id
# Step 1: Download user's Rust project from MinIO
workflow.logger.info("Step 1: Downloading user code from MinIO")
target_path = await workflow.execute_activity(
"get_target",
args=[target_id, run_id, "isolated"], # target_id, run_id, workspace_isolation
start_to_close_timeout=timedelta(minutes=5),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=1),
maximum_interval=timedelta(seconds=30),
maximum_attempts=3
)
)
results["steps"].append({
"step": "download_target",
"status": "success",
"target_path": target_path
})
workflow.logger.info(f"✓ User code downloaded to: {target_path}")
# Step 2: Run cargo-fuzz
workflow.logger.info("Step 2: Running cargo-fuzz")
# Use defaults if parameters are None
actual_max_iterations = max_iterations if max_iterations is not None else 1000000
actual_timeout_seconds = timeout_seconds if timeout_seconds is not None else 1800
actual_sanitizer = sanitizer if sanitizer is not None else "address"
fuzz_config = {
"target_name": target_name,
"max_iterations": actual_max_iterations,
"timeout_seconds": actual_timeout_seconds,
"sanitizer": actual_sanitizer
}
fuzz_results = await workflow.execute_activity(
"fuzz_with_cargo",
args=[target_path, fuzz_config],
start_to_close_timeout=timedelta(seconds=actual_timeout_seconds + 120),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=1 # Fuzzing shouldn't retry
)
)
results["steps"].append({
"step": "fuzzing",
"status": "success",
"executions": fuzz_results.get("summary", {}).get("total_executions", 0),
"crashes": fuzz_results.get("summary", {}).get("crashes_found", 0)
})
workflow.logger.info(
f"✓ Fuzzing completed: "
f"{fuzz_results.get('summary', {}).get('total_executions', 0)} executions, "
f"{fuzz_results.get('summary', {}).get('crashes_found', 0)} crashes"
)
# Step 3: Upload results to MinIO
workflow.logger.info("Step 3: Uploading results")
try:
results_url = await workflow.execute_activity(
"upload_results",
args=[workflow_id, fuzz_results, "json"],
start_to_close_timeout=timedelta(minutes=2)
)
results["results_url"] = results_url
workflow.logger.info(f"✓ Results uploaded to: {results_url}")
except Exception as e:
workflow.logger.warning(f"Failed to upload results: {e}")
results["results_url"] = None
# Step 4: Cleanup cache
workflow.logger.info("Step 4: Cleaning up cache")
try:
await workflow.execute_activity(
"cleanup_cache",
args=[target_path, "isolated"], # target_path, workspace_isolation
start_to_close_timeout=timedelta(minutes=1)
)
workflow.logger.info("✓ Cache cleaned up")
except Exception as e:
workflow.logger.warning(f"Cache cleanup failed: {e}")
# Mark workflow as successful
results["status"] = "success"
results["findings"] = fuzz_results.get("findings", [])
results["summary"] = fuzz_results.get("summary", {})
results["sarif"] = fuzz_results.get("sarif") or {}
workflow.logger.info(
f"✓ Workflow completed successfully: {workflow_id} "
f"({results['summary'].get('crashes_found', 0)} crashes found)"
)
return results
except Exception as e:
workflow.logger.error(f"Workflow failed: {e}")
results["status"] = "error"
results["error"] = str(e)
results["steps"].append({
"step": "error",
"status": "failed",
"error": str(e)
})
raise
@@ -1,12 +0,0 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
@@ -1,47 +0,0 @@
# Secret Detection Workflow Dockerfile
FROM prefecthq/prefect:3-python3.11
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
wget \
git \
ca-certificates \
gnupg \
&& rm -rf /var/lib/apt/lists/*
# Install TruffleHog (use direct binary download to avoid install script issues)
RUN curl -sSfL "https://github.com/trufflesecurity/trufflehog/releases/download/v3.63.2/trufflehog_3.63.2_linux_amd64.tar.gz" -o trufflehog.tar.gz \
&& tar -xzf trufflehog.tar.gz \
&& mv trufflehog /usr/local/bin/ \
&& rm trufflehog.tar.gz
# Install Gitleaks (use specific version to avoid API rate limiting)
RUN wget https://github.com/gitleaks/gitleaks/releases/download/v8.18.2/gitleaks_8.18.2_linux_x64.tar.gz \
&& tar -xzf gitleaks_8.18.2_linux_x64.tar.gz \
&& mv gitleaks /usr/local/bin/ \
&& rm gitleaks_8.18.2_linux_x64.tar.gz
# Verify installations
RUN trufflehog --version && gitleaks version
# Set working directory
WORKDIR /opt/prefect
# Create toolbox directory structure
RUN mkdir -p /opt/prefect/toolbox
# Set environment variables
ENV PYTHONPATH=/opt/prefect/toolbox:/opt/prefect/toolbox/workflows
ENV WORKFLOW_NAME=secret_detection_scan
# The toolbox code will be mounted at runtime from the backend container
# This includes:
# - /opt/prefect/toolbox/modules/base.py
# - /opt/prefect/toolbox/modules/secret_detection/ (TruffleHog, Gitleaks modules)
# - /opt/prefect/toolbox/modules/reporter/ (SARIF reporter)
# - /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan/
VOLUME /opt/prefect/toolbox
# Set working directory for execution
WORKDIR /opt/prefect
@@ -1,58 +0,0 @@
# Secret Detection Workflow Dockerfile - Self-Contained Version
# This version copies all required modules into the image for complete isolation
FROM prefecthq/prefect:3-python3.11
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
wget \
git \
ca-certificates \
gnupg \
&& rm -rf /var/lib/apt/lists/*
# Install TruffleHog
RUN curl -sSfL https://raw.githubusercontent.com/trufflesecurity/trufflehog/main/scripts/install.sh | sh -s -- -b /usr/local/bin
# Install Gitleaks
RUN wget https://github.com/gitleaks/gitleaks/releases/latest/download/gitleaks_linux_x64.tar.gz \
&& tar -xzf gitleaks_linux_x64.tar.gz \
&& mv gitleaks /usr/local/bin/ \
&& rm gitleaks_linux_x64.tar.gz
# Verify installations
RUN trufflehog --version && gitleaks version
# Set working directory
WORKDIR /opt/prefect
# Create directory structure
RUN mkdir -p /opt/prefect/toolbox/modules/secret_detection \
/opt/prefect/toolbox/modules/reporter \
/opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan
# Copy the base module and required modules
COPY toolbox/modules/base.py /opt/prefect/toolbox/modules/base.py
COPY toolbox/modules/__init__.py /opt/prefect/toolbox/modules/__init__.py
COPY toolbox/modules/secret_detection/ /opt/prefect/toolbox/modules/secret_detection/
COPY toolbox/modules/reporter/ /opt/prefect/toolbox/modules/reporter/
# Copy the workflow code
COPY toolbox/workflows/comprehensive/secret_detection_scan/ /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan/
# Copy toolbox init files
COPY toolbox/__init__.py /opt/prefect/toolbox/__init__.py
COPY toolbox/workflows/__init__.py /opt/prefect/toolbox/workflows/__init__.py
COPY toolbox/workflows/comprehensive/__init__.py /opt/prefect/toolbox/workflows/comprehensive/__init__.py
# Install Python dependencies for the modules
RUN pip install --no-cache-dir pydantic
# Set environment variables
ENV PYTHONPATH=/opt/prefect/toolbox:/opt/prefect/toolbox/workflows
ENV WORKFLOW_NAME=secret_detection_scan
# Set default command (can be overridden)
CMD ["python", "-m", "toolbox.workflows.comprehensive.secret_detection_scan.workflow"]
@@ -1,130 +0,0 @@
# Secret Detection Scan Workflow
This workflow detects secrets by running two tools and merging their findings:
- **TruffleHog**: Comprehensive secret detection with verification capabilities
- **Gitleaks**: Git-specific secret scanning and leak detection
## Features
- **Parallel Execution**: Runs TruffleHog and Gitleaks concurrently for faster results
- **Deduplication**: Automatically removes duplicate findings across tools
- **SARIF Output**: Generates standardized SARIF reports for integration with security tools
- **Configurable**: Supports extensive configuration for both tools
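
The deduplication feature can be illustrated with a minimal sketch. It assumes findings are plain dictionaries with `file_path`, `line_start`, and `title` keys, and it treats two findings as duplicates when they share the file, the starting line, and the first 50 characters of the lower-cased title. The function name `deduplicate_findings` is hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple


def deduplicate_findings(findings: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first finding seen for each (file, line, title-prefix) signature."""
    seen: Set[Tuple[Any, Any, str]] = set()
    unique: List[Dict[str, Any]] = []
    for finding in findings:
        signature = (
            finding.get("file_path", ""),
            finding.get("line_start", 0),
            finding.get("title", "").lower()[:50],  # case-insensitive title prefix
        )
        if signature not in seen:
            seen.add(signature)
            unique.append(finding)
    return unique


# Hypothetical findings: the first two differ only in title casing.
findings = [
    {"file_path": "app/config.py", "line_start": 12, "title": "AWS Access Key"},
    {"file_path": "app/config.py", "line_start": 12, "title": "aws access key"},
    {"file_path": "app/db.py", "line_start": 3, "title": "Database password"},
]
unique_findings = deduplicate_findings(findings)
print(f"{len(unique_findings)} unique findings from {len(findings)} total")
```

Because the first occurrence wins, the order in which tool results are concatenated decides which tool's version of a duplicate is kept.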
## Dependencies
### Required Modules
- `toolbox.modules.secret_detection.trufflehog`
- `toolbox.modules.secret_detection.gitleaks`
- `toolbox.modules.reporter` (SARIF reporter)
- `toolbox.modules.base` (Base module interface)
### External Tools
- TruffleHog v3.63.2+
- Gitleaks v8.18.0+
## Docker Deployment
This workflow provides two Docker deployment approaches:
### 1. Volume-Based Approach (Default: `Dockerfile`)
**Advantages:**
- Live code updates without rebuilding images
- Smaller image sizes
- Consistent module versions across workflows
- Faster development iteration
**How it works:**
- Docker image contains only external tools (TruffleHog, Gitleaks)
- Python modules are mounted at runtime from the backend container
- Backend manages code synchronization via shared volumes
### 2. Self-Contained Approach (`Dockerfile.self-contained`)
**Advantages:**
- Complete isolation and reproducibility
- No runtime dependencies on backend code
- Can run independently of FuzzForge platform
- Better for CI/CD integration
**How it works:**
- All required Python modules are copied into the Docker image
- Image is completely self-contained
- Larger image size but fully portable
## Configuration
### TruffleHog Configuration
```jsonc
{
"trufflehog_config": {
"verify": true, // Verify discovered secrets
"concurrency": 10, // Number of concurrent workers
"max_depth": 10, // Maximum directory depth
"include_detectors": [], // Specific detectors to include
"exclude_detectors": [] // Specific detectors to exclude
}
}
```
### Gitleaks Configuration
```jsonc
{
"gitleaks_config": {
"scan_mode": "detect", // "detect" or "protect"
"redact": true, // Redact secrets in output
"max_target_megabytes": 100, // Maximum file size (MB)
"no_git": false, // Scan without Git context
"config_file": "", // Custom Gitleaks config
"baseline_file": "" // Baseline file for known findings
}
}
```
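
A partial configuration only needs to name the keys it wants to change. The sketch below shows one way such a merge could work, assuming a shallow dictionary merge in which user-supplied keys win. The helper name `merge_config` is hypothetical, and the default values shown are the ones the workflow source applies, which is an assumption about what is in effect at runtime.

```python
from typing import Any, Dict, Optional

# Defaults as applied by the workflow source (assumed to be current).
DEFAULT_GITLEAKS_CONFIG: Dict[str, Any] = {
    "scan_mode": "detect",
    "redact": True,
    "max_target_megabytes": 100,
    "no_git": True,
}


def merge_config(
    defaults: Dict[str, Any], overrides: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a new dict in which user-supplied keys override the defaults."""
    return {**defaults, **(overrides or {})}


effective = merge_config(DEFAULT_GITLEAKS_CONFIG, {"max_target_megabytes": 200})
print(effective)
```

The merge is shallow, so a nested object supplied by the user would replace the default object wholesale rather than being merged key by key.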
## Usage Example
```bash
curl -X POST "http://localhost:8000/workflows/secret_detection_scan/submit" \
-H "Content-Type: application/json" \
-d '{
"target_path": "/path/to/scan",
"volume_mode": "ro",
"parameters": {
"trufflehog_config": {
"verify": true,
"concurrency": 15
},
"gitleaks_config": {
"scan_mode": "detect",
"max_target_megabytes": 200
}
}
}'
```
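
The same submission can be prepared from Python. This sketch uses only the standard library and builds the request without sending it. The endpoint and payload mirror the curl example, while the helper name `build_submit_request` is hypothetical and the response format is not specified here.

```python
import json
import urllib.request
from typing import Any, Dict


def build_submit_request(base_url: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request for the secret_detection_scan submit endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/workflows/secret_detection_scan/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "target_path": "/path/to/scan",
    "volume_mode": "ro",
    "parameters": {
        "trufflehog_config": {"verify": True, "concurrency": 15},
        "gitleaks_config": {"scan_mode": "detect", "max_target_megabytes": 200},
    },
}
request = build_submit_request("http://localhost:8000", payload)
print(request.get_method(), request.full_url)

# To submit against a running backend (assumes the backend replies with JSON):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```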
## Output Format
The workflow generates a SARIF report containing:
- All unique findings from both tools
- Severity levels mapped to SARIF result levels
- File locations and line numbers
- Detailed descriptions and recommendations
- Tool-specific metadata
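
For orientation, here is a minimal sketch of how a finding could map onto a SARIF 2.1.0 report. The severity-to-level mapping, the finding fields, and the function names `finding_to_sarif_result` and `build_sarif_report` are assumptions for illustration; the actual reporter module may emit more fields.

```python
from typing import Any, Dict, List

SARIF_SCHEMA = (
    "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/"
    "master/Schemata/sarif-schema-2.1.0.json"
)

# Assumed mapping from finding severity to SARIF result level.
SEVERITY_TO_LEVEL = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
    "info": "note",
}


def finding_to_sarif_result(finding: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one finding dict into a SARIF result object."""
    result: Dict[str, Any] = {
        "ruleId": finding.get("category", "unknown"),
        "level": SEVERITY_TO_LEVEL.get(finding.get("severity", ""), "warning"),
        "message": {"text": finding.get("description", "")},
        "locations": [],
    }
    if finding.get("file_path"):
        start = finding.get("line_start") or 1
        result["locations"].append({
            "physicalLocation": {
                "artifactLocation": {"uri": finding["file_path"]},
                "region": {
                    "startLine": start,
                    "endLine": finding.get("line_end") or start,
                },
            }
        })
    return result


def build_sarif_report(
    findings: List[Dict[str, Any]], tool_name: str, tool_version: str
) -> Dict[str, Any]:
    """Wrap converted findings in a single-run SARIF document."""
    return {
        "$schema": SARIF_SCHEMA,
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name, "version": tool_version}},
            "results": [finding_to_sarif_result(f) for f in findings],
        }],
    }


# Hypothetical finding used only to exercise the sketch.
report = build_sarif_report(
    [{
        "category": "secret",
        "severity": "high",
        "description": "Hard-coded API token",
        "file_path": "app/config.py",
        "line_start": 12,
    }],
    tool_name="FuzzForge Secret Detection",
    tool_version="1.0.0",
)
print(report["runs"][0]["results"][0]["level"])
```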
## Performance Considerations
- **TruffleHog**: CPU-intensive with verification enabled
- **Gitleaks**: Memory-intensive for large repositories
- **Recommended Resources**: 512Mi memory, 500m CPU
- **Typical Runtime**: 1-5 minutes for small repos, 10-30 minutes for large ones
## Security Notes
- Secrets are redacted in output by default
- Verified secrets are marked with higher severity
- Both tools support custom rules and exclusions
- Consider using baseline files for known false positives
@@ -1,17 +0,0 @@
"""
Secret Detection Scan Workflow
This package contains the comprehensive secret detection workflow that combines
multiple secret detection tools for thorough analysis.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
@@ -1,113 +0,0 @@
name: secret_detection_scan
version: "2.0.0"
description: "Comprehensive secret detection using TruffleHog and Gitleaks"
author: "FuzzForge Team"
category: "comprehensive"
tags:
- "secrets"
- "credentials"
- "detection"
- "trufflehog"
- "gitleaks"
- "comprehensive"
supported_volume_modes:
- "ro"
- "rw"
default_volume_mode: "ro"
default_target_path: "/workspace"
requirements:
tools:
- "trufflehog"
- "gitleaks"
resources:
memory: "512Mi"
cpu: "500m"
timeout: 1800
has_docker: true
default_parameters:
target_path: "/workspace"
volume_mode: "ro"
trufflehog_config: {}
gitleaks_config: {}
reporter_config: {}
parameters:
type: object
properties:
target_path:
type: string
default: "/workspace"
description: "Path to analyze"
volume_mode:
type: string
enum: ["ro", "rw"]
default: "ro"
description: "Volume mount mode"
trufflehog_config:
type: object
description: "TruffleHog configuration"
properties:
verify:
type: boolean
description: "Verify discovered secrets"
concurrency:
type: integer
description: "Number of concurrent workers"
max_depth:
type: integer
description: "Maximum directory depth to scan"
include_detectors:
type: array
items:
type: string
description: "Specific detectors to include"
exclude_detectors:
type: array
items:
type: string
description: "Specific detectors to exclude"
gitleaks_config:
type: object
description: "Gitleaks configuration"
properties:
scan_mode:
type: string
enum: ["detect", "protect"]
description: "Scan mode"
redact:
type: boolean
description: "Redact secrets in output"
max_target_megabytes:
type: integer
description: "Maximum file size to scan (MB)"
no_git:
type: boolean
description: "Scan files without Git context"
config_file:
type: string
description: "Path to custom configuration file"
baseline_file:
type: string
description: "Path to baseline file"
reporter_config:
type: object
description: "SARIF reporter configuration"
properties:
output_file:
type: string
description: "Output SARIF file name"
include_code_flows:
type: boolean
description: "Include code flow information"
output_schema:
type: object
properties:
sarif:
type: object
description: "SARIF-formatted security findings"
@@ -1,290 +0,0 @@
"""
Secret Detection Scan Workflow
This workflow performs comprehensive secret detection using multiple tools:
- TruffleHog: Comprehensive secret detection with verification
- Gitleaks: Git-specific secret scanning
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import sys
import logging
from pathlib import Path
from typing import Dict, Any, List, Optional
from prefect import flow, task
from prefect.artifacts import create_markdown_artifact, create_table_artifact
import asyncio
import json
# Add modules to path
sys.path.insert(0, '/app')
# Import modules
from toolbox.modules.secret_detection.trufflehog import TruffleHogModule
from toolbox.modules.secret_detection.gitleaks import GitleaksModule
from toolbox.modules.reporter import SARIFReporter
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@task(name="trufflehog_scan")
async def run_trufflehog_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
"""
Task to run TruffleHog secret detection.
Args:
workspace: Path to the workspace
config: TruffleHog configuration
Returns:
TruffleHog results
"""
logger.info("Running TruffleHog secret detection")
module = TruffleHogModule()
result = await module.execute(config, workspace)
logger.info(f"TruffleHog completed: {result.summary.get('total_secrets', 0)} secrets found")
return result.dict()
@task(name="gitleaks_scan")
async def run_gitleaks_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
"""
Task to run Gitleaks secret detection.
Args:
workspace: Path to the workspace
config: Gitleaks configuration
Returns:
Gitleaks results
"""
logger.info("Running Gitleaks secret detection")
module = GitleaksModule()
result = await module.execute(config, workspace)
logger.info(f"Gitleaks completed: {result.summary.get('total_leaks', 0)} leaks found")
return result.dict()
@task(name="aggregate_findings")
async def aggregate_findings_task(
trufflehog_results: Dict[str, Any],
gitleaks_results: Dict[str, Any],
config: Dict[str, Any],
workspace: Path
) -> Dict[str, Any]:
"""
Task to aggregate findings from all secret detection tools.
Args:
trufflehog_results: Results from TruffleHog
gitleaks_results: Results from Gitleaks
config: Reporter configuration
workspace: Path to workspace
Returns:
Aggregated SARIF report
"""
logger.info("Aggregating secret detection findings")
# Combine all findings
all_findings = []
# Add TruffleHog findings
trufflehog_findings = trufflehog_results.get("findings", [])
all_findings.extend(trufflehog_findings)
# Add Gitleaks findings
gitleaks_findings = gitleaks_results.get("findings", [])
all_findings.extend(gitleaks_findings)
# Deduplicate findings by file path, start line, and title prefix
unique_findings = []
seen_signatures = set()
for finding in all_findings:
# Create signature for deduplication
signature = (
finding.get("file_path", ""),
finding.get("line_start", 0),
finding.get("title", "").lower()[:50] # First 50 chars of title
)
if signature not in seen_signatures:
seen_signatures.add(signature)
unique_findings.append(finding)
else:
logger.debug(f"Deduplicated finding: {signature}")
logger.info(f"Aggregated {len(unique_findings)} unique findings from {len(all_findings)} total")
# Generate SARIF report
reporter = SARIFReporter()
reporter_config = {
**config,
"findings": unique_findings,
"tool_name": "FuzzForge Secret Detection",
"tool_version": "1.0.0",
"tool_description": "Comprehensive secret detection using TruffleHog and Gitleaks"
}
result = await reporter.execute(reporter_config, workspace)
return result.dict().get("sarif", {})
@flow(name="secret_detection_scan", log_prints=True)
async def main_flow(
target_path: str = "/workspace",
volume_mode: str = "ro",
trufflehog_config: Optional[Dict[str, Any]] = None,
gitleaks_config: Optional[Dict[str, Any]] = None,
reporter_config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Main secret detection workflow.
This workflow:
1. Runs TruffleHog for comprehensive secret detection
2. Runs Gitleaks for Git-specific secret detection
3. Aggregates and deduplicates findings
4. Generates a unified SARIF report
Args:
target_path: Path to the mounted workspace (default: /workspace)
volume_mode: Volume mount mode (ro/rw)
trufflehog_config: Configuration for TruffleHog
gitleaks_config: Configuration for Gitleaks
reporter_config: Configuration for SARIF reporter
Returns:
SARIF-formatted findings report
"""
logger.info("Starting comprehensive secret detection workflow")
logger.info(f"Workspace: {target_path}, Mode: {volume_mode}")
# Set workspace path
workspace = Path(target_path)
if not workspace.exists():
logger.error(f"Workspace does not exist: {workspace}")
return {
"error": f"Workspace not found: {workspace}",
"sarif": None
}
# Default configurations - merge with provided configs to ensure defaults are always applied
default_trufflehog_config = {
"verify": False,
"concurrency": 10,
"max_depth": 10,
"no_git": True # Add no_git for filesystem scanning
}
trufflehog_config = {**default_trufflehog_config, **(trufflehog_config or {})}
default_gitleaks_config = {
"scan_mode": "detect",
"redact": True,
"max_target_megabytes": 100,
"no_git": True # Critical for non-git directories
}
gitleaks_config = {**default_gitleaks_config, **(gitleaks_config or {})}
default_reporter_config = {
"include_code_flows": False
}
reporter_config = {**default_reporter_config, **(reporter_config or {})}
try:
# Run secret detection tools in parallel
logger.info("Phase 1: Running secret detection tools")
# Create tasks for parallel execution
trufflehog_task_result = run_trufflehog_task(workspace, trufflehog_config)
gitleaks_task_result = run_gitleaks_task(workspace, gitleaks_config)
# Wait for both to complete
trufflehog_results, gitleaks_results = await asyncio.gather(
trufflehog_task_result,
gitleaks_task_result,
return_exceptions=True
)
# Handle any exceptions
if isinstance(trufflehog_results, Exception):
logger.error(f"TruffleHog failed: {trufflehog_results}")
trufflehog_results = {"findings": [], "status": "failed"}
if isinstance(gitleaks_results, Exception):
logger.error(f"Gitleaks failed: {gitleaks_results}")
gitleaks_results = {"findings": [], "status": "failed"}
# Aggregate findings
logger.info("Phase 2: Aggregating findings")
sarif_report = await aggregate_findings_task(
trufflehog_results,
gitleaks_results,
reporter_config,
workspace
)
# Log summary
if sarif_report and sarif_report.get("runs"):
results_count = len(sarif_report["runs"][0].get("results", []))
logger.info(f"Workflow completed successfully with {results_count} unique secret findings")
# Log tool-specific stats
trufflehog_count = len(trufflehog_results.get("findings", []))
gitleaks_count = len(gitleaks_results.get("findings", []))
logger.info(f"Tool results - TruffleHog: {trufflehog_count}, Gitleaks: {gitleaks_count}")
else:
logger.info("Workflow completed successfully with no findings")
return sarif_report
except Exception as e:
logger.error(f"Secret detection workflow failed: {e}")
# Return error in SARIF format
return {
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
"version": "2.1.0",
"runs": [
{
"tool": {
"driver": {
"name": "FuzzForge Secret Detection",
"version": "1.0.0"
}
},
"results": [],
"invocations": [
{
"executionSuccessful": False,
"exitCode": 1,
"exitCodeDescription": str(e)
}
]
}
]
}
if __name__ == "__main__":
# For local testing (asyncio is already imported at module level)
asyncio.run(main_flow(
target_path="/tmp/test",
trufflehog_config={"verify": True, "max_depth": 5},
gitleaks_config={"scan_mode": "detect"}
))
@@ -0,0 +1,113 @@
name: ossfuzz_campaign
version: "1.0.0"
vertical: ossfuzz
description: "Generic OSS-Fuzz fuzzing campaign. Automatically reads project configuration from OSS-Fuzz repo and runs fuzzing using Google's infrastructure."
author: "FuzzForge Team"
tags:
- "fuzzing"
- "oss-fuzz"
- "libfuzzer"
- "afl"
- "honggfuzz"
- "memory-safety"
- "security"
# Workspace isolation mode
# OSS-Fuzz campaigns use isolated mode for safe concurrent campaigns
workspace_isolation: "isolated"
default_parameters:
project_name: null
campaign_duration_hours: 1
override_engine: null
override_sanitizer: null
max_iterations: null
parameters:
type: object
required:
- project_name
properties:
project_name:
type: string
description: "OSS-Fuzz project name (e.g., 'curl', 'sqlite3', 'libxml2')"
examples:
- "curl"
- "sqlite3"
- "libxml2"
- "openssl"
- "zlib"
campaign_duration_hours:
type: integer
default: 1
minimum: 1
maximum: 168 # 1 week max
description: "How many hours to run the fuzzing campaign"
override_engine:
type: string
enum: ["libfuzzer", "afl", "honggfuzz"]
description: "Override fuzzing engine from project.yaml (optional)"
override_sanitizer:
type: string
enum: ["address", "memory", "undefined", "dataflow"]
description: "Override sanitizer from project.yaml (optional)"
max_iterations:
type: integer
minimum: 1000
description: "Limit on fuzzing iterations (optional)"
output_schema:
type: object
properties:
project_name:
type: string
description: "OSS-Fuzz project that was fuzzed"
summary:
type: object
description: "Campaign execution summary"
properties:
total_executions:
type: integer
crashes_found:
type: integer
unique_crashes:
type: integer
duration_hours:
type: number
engine_used:
type: string
sanitizer_used:
type: string
crashes:
type: array
description: "List of crash file paths"
items:
type: string
sarif:
type: object
description: "SARIF-formatted crash reports (future)"
examples:
- name: "Fuzz curl for 1 hour"
parameters:
project_name: "curl"
campaign_duration_hours: 1
- name: "Fuzz sqlite3 with AFL"
parameters:
project_name: "sqlite3"
campaign_duration_hours: 2
override_engine: "afl"
- name: "Fuzz libxml2 with memory sanitizer"
parameters:
project_name: "libxml2"
campaign_duration_hours: 6
override_sanitizer: "memory"
@@ -0,0 +1,219 @@
"""
OSS-Fuzz Campaign Workflow - Temporal Version
Generic workflow for running OSS-Fuzz campaigns using Google's infrastructure.
Automatically reads project configuration from OSS-Fuzz project.yaml files.
"""
import asyncio
from datetime import timedelta
from typing import Dict, Any, Optional
from temporalio import workflow
from temporalio.common import RetryPolicy
# Pass stdlib logging through the workflow sandbox instead of re-importing it for each workflow run
with workflow.unsafe.imports_passed_through():
import logging
logger = logging.getLogger(__name__)
@workflow.defn
class OssfuzzCampaignWorkflow:
"""
Generic OSS-Fuzz fuzzing campaign workflow.
User workflow:
1. User runs: ff workflow run ossfuzz_campaign . project_name=curl
2. Worker loads project config from OSS-Fuzz repo
3. Worker builds project using OSS-Fuzz's build system
4. Worker runs fuzzing with engines from project.yaml
5. Crashes and corpus reported as findings
"""
@workflow.run
async def run(
self,
target_id: str, # Required by FuzzForge (not used, OSS-Fuzz downloads from Google)
project_name: str, # Required: OSS-Fuzz project name (e.g., "curl", "sqlite3")
campaign_duration_hours: int = 1,
override_engine: Optional[str] = None, # Override engine from project.yaml
override_sanitizer: Optional[str] = None, # Override sanitizer from project.yaml
max_iterations: Optional[int] = None # Optional: limit fuzzing iterations
) -> Dict[str, Any]:
"""
Main workflow execution.
Args:
target_id: UUID of uploaded target (not used, required by FuzzForge)
project_name: Name of OSS-Fuzz project (e.g., "curl", "sqlite3", "libxml2")
campaign_duration_hours: How many hours to fuzz (default: 1)
override_engine: Override fuzzing engine from project.yaml
override_sanitizer: Override sanitizer from project.yaml
max_iterations: Optional limit on fuzzing iterations
Returns:
Dictionary containing crashes, stats, and SARIF report
"""
workflow_id = workflow.info().workflow_id
workflow.logger.info(
f"Starting OSS-Fuzz Campaign for project '{project_name}' "
f"(workflow_id={workflow_id}, duration={campaign_duration_hours}h)"
)
results = {
"workflow_id": workflow_id,
"project_name": project_name,
"status": "running",
"steps": []
}
try:
# Step 1: Load OSS-Fuzz project configuration
workflow.logger.info(f"Step 1: Loading project config for '{project_name}'")
project_config = await workflow.execute_activity(
"load_ossfuzz_project",
args=[project_name],
start_to_close_timeout=timedelta(minutes=5),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=1),
maximum_interval=timedelta(seconds=30),
maximum_attempts=3
)
)
results["steps"].append({
"step": "load_config",
"status": "success",
"language": project_config.get("language"),
"engines": project_config.get("fuzzing_engines", []),
"sanitizers": project_config.get("sanitizers", [])
})
workflow.logger.info(
f"✓ Loaded config: language={project_config.get('language')}, "
f"engines={project_config.get('fuzzing_engines')}"
)
# Step 2: Build project using OSS-Fuzz infrastructure
workflow.logger.info(f"Step 2: Building project '{project_name}'")
build_result = await workflow.execute_activity(
"build_ossfuzz_project",
args=[
project_name,
project_config,
override_sanitizer,
override_engine
],
start_to_close_timeout=timedelta(minutes=30),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=2
)
)
results["steps"].append({
"step": "build_project",
"status": "success",
"fuzz_targets": len(build_result.get("fuzz_targets", [])),
"sanitizer": build_result.get("sanitizer_used"),
"engine": build_result.get("engine_used")
})
workflow.logger.info(
f"✓ Build completed: {len(build_result.get('fuzz_targets', []))} fuzz targets found"
)
if not build_result.get("fuzz_targets"):
raise Exception(f"No fuzz targets found for project {project_name}")
# Step 3: Run fuzzing on discovered targets
workflow.logger.info(f"Step 3: Fuzzing {len(build_result['fuzz_targets'])} targets")
# Determine which engine to use
engine_to_use = override_engine if override_engine else build_result["engine_used"]
duration_seconds = campaign_duration_hours * 3600
# Fuzz each target (in parallel if multiple targets)
fuzz_futures = []
for target_path in build_result["fuzz_targets"]:
future = workflow.execute_activity(
"fuzz_target",
args=[target_path, engine_to_use, duration_seconds, None, None],
start_to_close_timeout=timedelta(seconds=duration_seconds + 300),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=1 # Fuzzing shouldn't retry
)
)
fuzz_futures.append(future)
# Wait for all fuzzing to complete
fuzz_results = await asyncio.gather(*fuzz_futures, return_exceptions=True)
# Aggregate results
total_execs = 0
total_crashes = 0
all_crashes = []
for i, result in enumerate(fuzz_results):
if isinstance(result, Exception):
workflow.logger.error(f"Fuzzing failed for target {i}: {result}")
continue
total_execs += result.get("total_executions", 0)
total_crashes += result.get("crashes", 0)
all_crashes.extend(result.get("crash_files", []))
results["steps"].append({
"step": "fuzzing",
"status": "success",
"total_executions": total_execs,
"crashes_found": total_crashes,
"targets_fuzzed": len(build_result["fuzz_targets"])
})
workflow.logger.info(
f"✓ Fuzzing completed: {total_execs} executions, {total_crashes} crashes"
)
# Step 4: Generate SARIF report
workflow.logger.info("Step 4: Generating SARIF report")
# TODO: Implement crash minimization and SARIF generation
# For now, return raw results
results["status"] = "success"
results["summary"] = {
"project": project_name,
"total_executions": total_execs,
"crashes_found": total_crashes,
"unique_crashes": len(set(all_crashes)),
"duration_hours": campaign_duration_hours,
"engine_used": engine_to_use,
"sanitizer_used": build_result.get("sanitizer_used")
}
results["crashes"] = all_crashes[:100] # Limit to first 100 crashes
workflow.logger.info(
f"✓ Campaign completed: {project_name} - "
f"{total_execs} execs, {total_crashes} crashes"
)
return results
except Exception as e:
workflow.logger.error(f"Workflow failed: {e}")
results["status"] = "error"
results["error"] = str(e)
results["steps"].append({
"step": "error",
"status": "failed",
"error": str(e)
})
raise
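Step 3 of the campaign workflow above starts one `fuzz_target` activity per target and collects the outcomes with `asyncio.gather(..., return_exceptions=True)`, so a single failing target does not abort the whole campaign. A minimal sketch of that pattern in plain `asyncio`, without Temporal, might look like the following. `fake_fuzz_target`, `aggregate`, and `run_campaign` are hypothetical names used only for illustration, and the result keys simply mirror the ones the workflow reads; the real activity's return shape is an assumption.

```python
import asyncio
from typing import Any, Dict, List


async def fake_fuzz_target(target: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the "fuzz_target" activity."""
    if target == "broken_fuzzer":
        raise RuntimeError(f"fuzzer failed to start: {target}")
    return {
        "total_executions": 1000,
        "crashes": 1,
        "crash_files": [f"{target}/crash-0"],
    }


def aggregate(targets: List[str], outcomes: List[Any]) -> Dict[str, Any]:
    """Sum per-target stats and record which targets raised instead of returning."""
    summary: Dict[str, Any] = {
        "total_executions": 0,
        "crashes_found": 0,
        "crash_files": [],
        "failed_targets": [],
    }
    for target, outcome in zip(targets, outcomes):
        if isinstance(outcome, Exception):
            # gather(return_exceptions=True) hands back the exception object
            summary["failed_targets"].append(target)
            continue
        summary["total_executions"] += outcome.get("total_executions", 0)
        summary["crashes_found"] += outcome.get("crashes", 0)
        summary["crash_files"].extend(outcome.get("crash_files", []))
    return summary


async def run_campaign(targets: List[str]) -> Dict[str, Any]:
    """Fuzz all targets concurrently and aggregate whatever succeeded."""
    outcomes = await asyncio.gather(
        *(fake_fuzz_target(t) for t in targets),
        return_exceptions=True,
    )
    return aggregate(targets, outcomes)


if __name__ == "__main__":
    print(asyncio.run(run_campaign(["decoder", "broken_fuzzer", "parser"])))
```

Recording failed targets by name, rather than only logging them, is one way to let a caller distinguish a fully successful campaign from a partially failed one.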
@@ -1,187 +0,0 @@
"""
Manual Workflow Registry for Prefect Deployment
This file contains the manual registry of all workflows that can be deployed.
Developers MUST add their workflows here after creating them.
This approach is required because:
1. Prefect cannot deploy dynamically imported flows
2. Docker deployment needs static flow references
3. Explicit registration provides better control and visibility
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
from typing import Dict, Any, Callable
import logging
logger = logging.getLogger(__name__)
# Import only essential workflows
# Import each workflow individually to handle failures gracefully
security_assessment_flow = None
secret_detection_flow = None
# Try to import each workflow individually
try:
from .security_assessment.workflow import main_flow as security_assessment_flow
except ImportError as e:
logger.warning(f"Failed to import security_assessment workflow: {e}")
try:
from .comprehensive.secret_detection_scan.workflow import main_flow as secret_detection_flow
except ImportError as e:
logger.warning(f"Failed to import secret_detection_scan workflow: {e}")
# Manual registry - developers add workflows here after creation
# Only include workflows that were successfully imported
WORKFLOW_REGISTRY: Dict[str, Dict[str, Any]] = {}
# Add workflows that were successfully imported
if security_assessment_flow is not None:
WORKFLOW_REGISTRY["security_assessment"] = {
"flow": security_assessment_flow,
"module_path": "toolbox.workflows.security_assessment.workflow",
"function_name": "main_flow",
"description": "Comprehensive security assessment workflow that scans files, analyzes code for vulnerabilities, and generates SARIF reports",
"version": "1.0.0",
"author": "FuzzForge Team",
"tags": ["security", "scanner", "analyzer", "static-analysis", "sarif"]
}
if secret_detection_flow is not None:
WORKFLOW_REGISTRY["secret_detection_scan"] = {
"flow": secret_detection_flow,
"module_path": "toolbox.workflows.comprehensive.secret_detection_scan.workflow",
"function_name": "main_flow",
"description": "Comprehensive secret detection using TruffleHog and Gitleaks for thorough credential scanning",
"version": "1.0.0",
"author": "FuzzForge Team",
"tags": ["secrets", "credentials", "detection", "trufflehog", "gitleaks", "comprehensive"]
}
#
# To add a new workflow, follow this pattern:
#
# "my_new_workflow": {
# "flow": my_new_flow_function, # Import the flow function above
# "module_path": "toolbox.workflows.my_new_workflow.workflow",
# "function_name": "my_new_flow_function",
# "description": "Description of what this workflow does",
# "version": "1.0.0",
# "author": "Developer Name",
# "tags": ["tag1", "tag2"]
# }
def get_workflow_flow(workflow_name: str) -> Callable:
"""
Get the flow function for a workflow.
Args:
workflow_name: Name of the workflow
Returns:
Flow function
Raises:
KeyError: If workflow not found in registry
"""
if workflow_name not in WORKFLOW_REGISTRY:
available = list(WORKFLOW_REGISTRY.keys())
raise KeyError(
f"Workflow '{workflow_name}' not found in registry. "
f"Available workflows: {available}. "
f"Please add the workflow to toolbox/workflows/registry.py"
)
return WORKFLOW_REGISTRY[workflow_name]["flow"]
def get_workflow_info(workflow_name: str) -> Dict[str, Any]:
"""
Get registry information for a workflow.
Args:
workflow_name: Name of the workflow
Returns:
Registry information dictionary
Raises:
KeyError: If workflow not found in registry
"""
if workflow_name not in WORKFLOW_REGISTRY:
available = list(WORKFLOW_REGISTRY.keys())
raise KeyError(
f"Workflow '{workflow_name}' not found in registry. "
f"Available workflows: {available}"
)
return WORKFLOW_REGISTRY[workflow_name]
def list_registered_workflows() -> Dict[str, Dict[str, Any]]:
"""
Get all registered workflows.
Returns:
Dictionary of all workflow registry entries
"""
return WORKFLOW_REGISTRY.copy()
def validate_registry() -> bool:
"""
Validate the workflow registry for consistency.
Returns:
True if valid, raises exceptions if not
Raises:
ValueError: If registry is invalid
"""
if not WORKFLOW_REGISTRY:
raise ValueError("Workflow registry is empty")
required_fields = ["flow", "module_path", "function_name", "description"]
for name, entry in WORKFLOW_REGISTRY.items():
# Check required fields
missing_fields = [field for field in required_fields if field not in entry]
if missing_fields:
raise ValueError(
f"Workflow '{name}' missing required fields: {missing_fields}"
)
# Check if flow is callable
if not callable(entry["flow"]):
raise ValueError(f"Workflow '{name}' flow is not callable")
# Check if flow has the required Prefect attributes
if not hasattr(entry["flow"], "deploy"):
raise ValueError(
f"Workflow '{name}' flow is not a Prefect flow (missing deploy method)"
)
logger.info(f"Registry validation passed. {len(WORKFLOW_REGISTRY)} workflows registered.")
return True
# Validate registry on import
try:
validate_registry()
logger.info(f"Workflow registry loaded successfully with {len(WORKFLOW_REGISTRY)} workflows")
except Exception as e:
logger.error(f"Workflow registry validation failed: {e}")
raise
@@ -1,30 +0,0 @@
FROM prefecthq/prefect:3-python3.11
WORKDIR /app
# Create toolbox directory structure to match expected import paths
RUN mkdir -p /app/toolbox/workflows /app/toolbox/modules
# Copy base module infrastructure
COPY modules/__init__.py /app/toolbox/modules/
COPY modules/base.py /app/toolbox/modules/
# Copy only required modules (manual selection)
COPY modules/scanner /app/toolbox/modules/scanner
COPY modules/analyzer /app/toolbox/modules/analyzer
COPY modules/reporter /app/toolbox/modules/reporter
# Copy this workflow
COPY workflows/security_assessment /app/toolbox/workflows/security_assessment
# Install workflow-specific requirements if they exist
RUN if [ -f /app/toolbox/workflows/security_assessment/requirements.txt ]; then pip install --no-cache-dir -r /app/toolbox/workflows/security_assessment/requirements.txt; fi
# Install common requirements
RUN pip install --no-cache-dir pyyaml
# Set Python path
ENV PYTHONPATH=/app:$PYTHONPATH
# Create workspace directory
RUN mkdir -p /workspace
@@ -0,0 +1,150 @@
"""
Security Assessment Workflow Activities

Activities specific to the security assessment workflow:
- scan_files_activity: Scan files in the workspace
- analyze_security_activity: Analyze security vulnerabilities
- generate_sarif_report_activity: Generate SARIF report from findings
"""

import logging
import sys
from pathlib import Path

from temporalio import activity

# Configure logging
logger = logging.getLogger(__name__)

# Add toolbox to path for module imports
sys.path.insert(0, '/app/toolbox')


@activity.defn(name="scan_files")
async def scan_files_activity(workspace_path: str, config: dict) -> dict:
    """
    Scan files in the workspace.

    Args:
        workspace_path: Path to the workspace directory
        config: Scanner configuration

    Returns:
        Scanner results dictionary
    """
    logger.info(f"Activity: scan_files (workspace={workspace_path})")

    try:
        from modules.scanner import FileScanner

        workspace = Path(workspace_path)
        if not workspace.exists():
            raise FileNotFoundError(f"Workspace not found: {workspace_path}")

        scanner = FileScanner()
        result = await scanner.execute(config, workspace)

        logger.info(
            f"✓ File scanning completed: "
            f"{result.summary.get('total_files', 0)} files scanned"
        )
        return result.dict()

    except Exception as e:
        logger.error(f"File scanning failed: {e}", exc_info=True)
        raise


@activity.defn(name="analyze_security")
async def analyze_security_activity(workspace_path: str, config: dict) -> dict:
    """
    Analyze security vulnerabilities in the workspace.

    Args:
        workspace_path: Path to the workspace directory
        config: Analyzer configuration

    Returns:
        Analysis results dictionary
    """
    logger.info(f"Activity: analyze_security (workspace={workspace_path})")

    try:
        from modules.analyzer import SecurityAnalyzer

        workspace = Path(workspace_path)
        if not workspace.exists():
            raise FileNotFoundError(f"Workspace not found: {workspace_path}")

        analyzer = SecurityAnalyzer()
        result = await analyzer.execute(config, workspace)

        logger.info(
            f"✓ Security analysis completed: "
            f"{result.summary.get('total_findings', 0)} findings"
        )
        return result.dict()

    except Exception as e:
        logger.error(f"Security analysis failed: {e}", exc_info=True)
        raise


@activity.defn(name="generate_sarif_report")
async def generate_sarif_report_activity(
    scan_results: dict,
    analysis_results: dict,
    config: dict,
    workspace_path: str
) -> dict:
    """
    Generate SARIF report from scan and analysis results.

    Args:
        scan_results: Results from file scanner
        analysis_results: Results from security analyzer
        config: Reporter configuration
        workspace_path: Path to the workspace

    Returns:
        SARIF report dictionary
    """
    logger.info("Activity: generate_sarif_report")

    try:
        from modules.reporter import SARIFReporter

        workspace = Path(workspace_path)

        # Combine findings from all modules
        all_findings = []

        # Add scanner findings (only sensitive files, not all files)
        scanner_findings = scan_results.get("findings", [])
        sensitive_findings = [f for f in scanner_findings if f.get("severity") != "info"]
        all_findings.extend(sensitive_findings)

        # Add analyzer findings
        analyzer_findings = analysis_results.get("findings", [])
        all_findings.extend(analyzer_findings)

        # Prepare reporter config
        reporter_config = {
            **config,
            "findings": all_findings,
            "tool_name": "FuzzForge Security Assessment",
            "tool_version": "1.0.0"
        }

        reporter = SARIFReporter()
        result = await reporter.execute(reporter_config, workspace)

        # Extract SARIF from result
        sarif = result.dict().get("sarif", {})

        logger.info(f"✓ SARIF report generated with {len(all_findings)} findings")
        return sarif

    except Exception as e:
        logger.error(f"SARIF report generation failed: {e}", exc_info=True)
        raise
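The report activity above merges two result sets before handing them to the reporter: scanner findings are kept only when their severity is not `"info"`, while analyzer findings are kept as-is. A small, dependency-free sketch of that merge step could look like this. `merge_findings` is a hypothetical helper name, and the sample findings are made up for illustration; only the `findings` and `severity` keys come from the activity code.

```python
from typing import Any, Dict, List


def merge_findings(
    scan_results: Dict[str, Any],
    analysis_results: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Combine scanner and analyzer findings the way the report activity does."""
    scanner_findings = scan_results.get("findings", [])
    # Drop purely informational scanner entries (e.g. one entry per file seen)
    sensitive = [f for f in scanner_findings if f.get("severity") != "info"]
    # Analyzer findings are passed through unfiltered
    return sensitive + list(analysis_results.get("findings", []))


if __name__ == "__main__":
    # Illustrative data only; real module output may carry more fields.
    scan = {"findings": [
        {"title": "regular file", "severity": "info"},
        {"title": "private key on disk", "severity": "high"},
    ]}
    analysis = {"findings": [{"title": "hardcoded secret", "severity": "medium"}]}
    for finding in merge_findings(scan, analysis):
        print(finding["severity"], finding["title"])
```

Keeping the merge in a pure function like this would make the filtering rule testable without a worker or a workspace on disk.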
@@ -1,8 +1,8 @@
name: security_assessment
version: "2.0.0"
vertical: rust
description: "Comprehensive security assessment workflow that scans files, analyzes code for vulnerabilities, and generates SARIF reports"
author: "FuzzForge Team"
category: "comprehensive"
tags:
- "security"
- "scanner"
@@ -11,28 +11,14 @@ tags:
- "sarif"
- "comprehensive"
supported_volume_modes:
- "ro"
- "rw"
default_volume_mode: "ro"
default_target_path: "/workspace"
requirements:
tools:
- "file_scanner"
- "security_analyzer"
- "sarif_reporter"
resources:
memory: "512Mi"
cpu: "500m"
timeout: 1800
has_docker: true
# Workspace isolation mode (system-level configuration)
# - "isolated" (default): Each workflow run gets its own isolated workspace (safe for concurrent fuzzing)
# - "shared": All runs share the same workspace (for read-only analysis workflows)
# - "copy-on-write": Download once, copy for each run (balances performance and isolation)
# Using "shared" mode for read-only security analysis (no file modifications)
workspace_isolation: "shared"
default_parameters:
target_path: "/workspace"
volume_mode: "ro"
scanner_config: {}
analyzer_config: {}
reporter_config: {}
@@ -40,15 +26,6 @@ default_parameters:
parameters:
type: object
properties:
target_path:
type: string
default: "/workspace"
description: "Path to analyze"
volume_mode:
type: string
enum: ["ro", "rw"]
default: "ro"
description: "Volume mount mode"
scanner_config:
type: object
description: "File scanner configuration"
@@ -1,4 +0,0 @@
# Requirements for security assessment workflow
pydantic>=2.0.0
pyyaml>=6.0
aiofiles>=23.0.0
@@ -1,5 +1,7 @@
"""
Security Assessment Workflow - Comprehensive security analysis using multiple modules
Security Assessment Workflow - Temporal Version
Comprehensive security analysis using multiple modules.
"""
# Copyright (c) 2025 FuzzingLabs
@@ -13,240 +15,219 @@ Security Assessment Workflow - Comprehensive security analysis using multiple mo
#
# Additional attribution and requirements are provided in the NOTICE file.
import sys
import logging
from pathlib import Path
from datetime import timedelta
from typing import Dict, Any, Optional
from prefect import flow, task
import json
# Add modules to path
sys.path.insert(0, '/app')
from temporalio import workflow
from temporalio.common import RetryPolicy
# Import modules
from toolbox.modules.scanner import FileScanner
from toolbox.modules.analyzer import SecurityAnalyzer
from toolbox.modules.reporter import SARIFReporter
# Import activity interfaces (will be executed by worker)
with workflow.unsafe.imports_passed_through():
import logging
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@task(name="file_scanning")
async def scan_files_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
@workflow.defn
class SecurityAssessmentWorkflow:
"""
Task to scan files in the workspace.
Args:
workspace: Path to the workspace
config: Scanner configuration
Returns:
Scanner results
"""
logger.info(f"Starting file scanning in {workspace}")
scanner = FileScanner()
result = await scanner.execute(config, workspace)
logger.info(f"File scanning completed: {result.summary.get('total_files', 0)} files found")
return result.dict()
@task(name="security_analysis")
async def analyze_security_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
"""
Task to analyze security vulnerabilities.
Args:
workspace: Path to the workspace
config: Analyzer configuration
Returns:
Analysis results
"""
logger.info("Starting security analysis")
analyzer = SecurityAnalyzer()
result = await analyzer.execute(config, workspace)
logger.info(
f"Security analysis completed: {result.summary.get('total_findings', 0)} findings"
)
return result.dict()
@task(name="report_generation")
async def generate_report_task(
scan_results: Dict[str, Any],
analysis_results: Dict[str, Any],
config: Dict[str, Any],
workspace: Path
) -> Dict[str, Any]:
"""
Task to generate SARIF report from all findings.
Args:
scan_results: Results from scanner
analysis_results: Results from analyzer
config: Reporter configuration
workspace: Path to the workspace
Returns:
SARIF report
"""
logger.info("Generating SARIF report")
reporter = SARIFReporter()
# Combine findings from all modules
all_findings = []
# Add scanner findings (only sensitive files, not all files)
scanner_findings = scan_results.get("findings", [])
sensitive_findings = [f for f in scanner_findings if f.get("severity") != "info"]
all_findings.extend(sensitive_findings)
# Add analyzer findings
analyzer_findings = analysis_results.get("findings", [])
all_findings.extend(analyzer_findings)
# Prepare reporter config
reporter_config = {
**config,
"findings": all_findings,
"tool_name": "FuzzForge Security Assessment",
"tool_version": "1.0.0"
}
result = await reporter.execute(reporter_config, workspace)
# Extract SARIF from result
sarif = result.dict().get("sarif", {})
logger.info(f"Report generated with {len(all_findings)} total findings")
return sarif
@flow(name="security_assessment", log_prints=True)
async def main_flow(
target_path: str = "/workspace",
volume_mode: str = "ro",
scanner_config: Optional[Dict[str, Any]] = None,
analyzer_config: Optional[Dict[str, Any]] = None,
reporter_config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Main security assessment workflow.
Comprehensive security assessment workflow.
This workflow:
1. Scans files in the workspace
2. Analyzes code for security vulnerabilities
3. Generates a SARIF report with all findings
Args:
target_path: Path to the mounted workspace (default: /workspace)
volume_mode: Volume mount mode (ro/rw)
scanner_config: Configuration for file scanner
analyzer_config: Configuration for security analyzer
reporter_config: Configuration for SARIF reporter
Returns:
SARIF-formatted findings report
1. Downloads target from MinIO
2. Scans files in the workspace
3. Analyzes code for security vulnerabilities
4. Generates a SARIF report with all findings
5. Uploads results to MinIO
6. Cleans up cache
"""
logger.info(f"Starting security assessment workflow")
logger.info(f"Workspace: {target_path}, Mode: {volume_mode}")
# Set workspace path
workspace = Path(target_path)
@workflow.run
async def run(
self,
target_id: str,
scanner_config: Optional[Dict[str, Any]] = None,
analyzer_config: Optional[Dict[str, Any]] = None,
reporter_config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Main workflow execution.
if not workspace.exists():
logger.error(f"Workspace does not exist: {workspace}")
return {
"error": f"Workspace not found: {workspace}",
"sarif": None
}
Args:
target_id: UUID of the uploaded target in MinIO
scanner_config: Configuration for file scanner
analyzer_config: Configuration for security analyzer
reporter_config: Configuration for SARIF reporter
# Default configurations
if not scanner_config:
scanner_config = {
"patterns": ["*"],
"check_sensitive": True,
"calculate_hashes": False,
"max_file_size": 10485760 # 10MB
}
Returns:
Dictionary containing SARIF report and summary
"""
workflow_id = workflow.info().workflow_id
if not analyzer_config:
analyzer_config = {
"file_extensions": [".py", ".js", ".java", ".php", ".rb", ".go"],
"check_secrets": True,
"check_sql": True,
"check_dangerous_functions": True
}
if not reporter_config:
reporter_config = {
"include_code_flows": False
}
try:
# Execute workflow tasks
logger.info("Phase 1: File scanning")
scan_results = await scan_files_task(workspace, scanner_config)
logger.info("Phase 2: Security analysis")
analysis_results = await analyze_security_task(workspace, analyzer_config)
logger.info("Phase 3: Report generation")
sarif_report = await generate_report_task(
scan_results,
analysis_results,
reporter_config,
workspace
workflow.logger.info(
f"Starting SecurityAssessmentWorkflow "
f"(workflow_id={workflow_id}, target_id={target_id})"
)
# Log summary
if sarif_report and "runs" in sarif_report:
results_count = len(sarif_report["runs"][0].get("results", []))
logger.info(f"Workflow completed successfully with {results_count} findings")
else:
logger.info("Workflow completed successfully")
# Default configurations
if not scanner_config:
scanner_config = {
"patterns": ["*"],
"check_sensitive": True,
"calculate_hashes": False,
"max_file_size": 10485760 # 10MB
}
return sarif_report
if not analyzer_config:
analyzer_config = {
"file_extensions": [".py", ".js", ".java", ".php", ".rb", ".go"],
"check_secrets": True,
"check_sql": True,
"check_dangerous_functions": True
}
except Exception as e:
logger.error(f"Workflow failed: {e}")
# Return error in SARIF format
return {
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
"version": "2.1.0",
"runs": [
{
"tool": {
"driver": {
"name": "FuzzForge Security Assessment",
"version": "1.0.0"
}
},
"results": [],
"invocations": [
{
"executionSuccessful": False,
"exitCode": 1,
"exitCodeDescription": str(e)
}
]
}
]
if not reporter_config:
reporter_config = {
"include_code_flows": False
}
results = {
"workflow_id": workflow_id,
"target_id": target_id,
"status": "running",
"steps": []
}
try:
# Get run ID for workspace isolation (using shared mode for read-only analysis)
run_id = workflow.info().run_id
if __name__ == "__main__":
# For local testing
import asyncio
# Step 1: Download target from MinIO
workflow.logger.info("Step 1: Downloading target from MinIO")
target_path = await workflow.execute_activity(
"get_target",
args=[target_id, run_id, "shared"], # target_id, run_id, workspace_isolation
start_to_close_timeout=timedelta(minutes=5),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=1),
maximum_interval=timedelta(seconds=30),
maximum_attempts=3
)
)
results["steps"].append({
"step": "download_target",
"status": "success",
"target_path": target_path
})
workflow.logger.info(f"✓ Target downloaded to: {target_path}")
asyncio.run(main_flow(
target_path="/tmp/test",
scanner_config={"patterns": ["*.py"]},
analyzer_config={"check_secrets": True}
))
# Step 2: File scanning
workflow.logger.info("Step 2: Scanning files")
scan_results = await workflow.execute_activity(
"scan_files",
args=[target_path, scanner_config],
start_to_close_timeout=timedelta(minutes=10),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=2
)
)
results["steps"].append({
"step": "file_scanning",
"status": "success",
"files_scanned": scan_results.get("summary", {}).get("total_files", 0)
})
workflow.logger.info(
f"✓ File scanning completed: "
f"{scan_results.get('summary', {}).get('total_files', 0)} files"
)
# Step 3: Security analysis
workflow.logger.info("Step 3: Analyzing security vulnerabilities")
analysis_results = await workflow.execute_activity(
"analyze_security",
args=[target_path, analyzer_config],
start_to_close_timeout=timedelta(minutes=15),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=2
)
)
results["steps"].append({
"step": "security_analysis",
"status": "success",
"findings": analysis_results.get("summary", {}).get("total_findings", 0)
})
workflow.logger.info(
f"✓ Security analysis completed: "
f"{analysis_results.get('summary', {}).get('total_findings', 0)} findings"
)
# Step 4: Generate SARIF report
workflow.logger.info("Step 4: Generating SARIF report")
sarif_report = await workflow.execute_activity(
"generate_sarif_report",
args=[scan_results, analysis_results, reporter_config, target_path],
start_to_close_timeout=timedelta(minutes=5)
)
results["steps"].append({
"step": "report_generation",
"status": "success"
})
# Count total findings in SARIF
total_findings = 0
if sarif_report and sarif_report.get("runs"):  # also guards against an empty "runs" list
total_findings = len(sarif_report["runs"][0].get("results", []))
workflow.logger.info(f"✓ SARIF report generated with {total_findings} findings")
# Step 5: Upload results to MinIO
workflow.logger.info("Step 5: Uploading results")
try:
results_url = await workflow.execute_activity(
"upload_results",
args=[workflow_id, sarif_report, "sarif"],
start_to_close_timeout=timedelta(minutes=2)
)
results["results_url"] = results_url
workflow.logger.info(f"✓ Results uploaded to: {results_url}")
except Exception as e:
workflow.logger.warning(f"Failed to upload results: {e}")
results["results_url"] = None
# Step 6: Cleanup cache
workflow.logger.info("Step 6: Cleaning up cache")
try:
await workflow.execute_activity(
"cleanup_cache",
args=[target_path, "shared"], # target_path, workspace_isolation
start_to_close_timeout=timedelta(minutes=1)
)
workflow.logger.info("✓ Cache cleanup step finished (skipped for shared mode)")
except Exception as e:
workflow.logger.warning(f"Cache cleanup failed: {e}")
# Mark workflow as successful
results["status"] = "success"
results["sarif"] = sarif_report
results["summary"] = {
"total_findings": total_findings,
"files_scanned": scan_results.get("summary", {}).get("total_files", 0)
}
workflow.logger.info(f"✓ Workflow completed successfully: {workflow_id}")
return results
except Exception as e:
workflow.logger.error(f"Workflow failed: {e}")
results["status"] = "error"
results["error"] = str(e)
results["steps"].append({
"step": "error",
"status": "failed",
"error": str(e)
})
raise
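The workflow above derives `total_findings` from the `results` array of the first SARIF run. As a hedged sketch, the same count can be computed defensively so that a missing report, a missing `runs` key, or an empty `runs` list all yield zero. `count_sarif_results` is a hypothetical helper, and unlike the workflow it sums over every run rather than only the first one.

```python
from typing import Any, Dict, Optional


def count_sarif_results(sarif_report: Optional[Dict[str, Any]]) -> int:
    """Count result entries across all runs of a SARIF-shaped dictionary."""
    if not sarif_report:
        return 0
    # "runs" and "results" are the standard SARIF 2.1.0 property names
    return sum(len(run.get("results", [])) for run in sarif_report.get("runs", []))


if __name__ == "__main__":
    report = {
        "version": "2.1.0",
        "runs": [
            {"results": [{"ruleId": "A"}, {"ruleId": "B"}]},
            {"results": [{"ruleId": "C"}]},
        ],
    }
    print(count_sarif_results(report))
```

For a report with a single run, which is what the reporter in this workflow appears to produce, both approaches give the same number.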