Mirror of https://github.com/FuzzingLabs/fuzzforge_ai.git (synced 2026-05-22 15:39:44 +02:00)
CI/CD Integration with Ephemeral Deployment Model (#14)
* feat: Complete migration from Prefect to Temporal

BREAKING CHANGE: Replaces Prefect workflow orchestration with Temporal

## Major Changes
- Replace Prefect with Temporal for workflow orchestration
- Implement vertical worker architecture (rust, android)
- Replace Docker registry with MinIO for unified storage
- Refactor activities to be co-located with workflows
- Update all API endpoints for Temporal compatibility

## Infrastructure
- New: docker-compose.temporal.yaml (Temporal + MinIO + workers)
- New: workers/ directory with rust and android vertical workers
- New: backend/src/temporal/ (manager, discovery)
- New: backend/src/storage/ (S3-cached storage with MinIO)
- New: backend/toolbox/common/ (shared storage activities)
- Deleted: docker-compose.yaml (old Prefect setup)
- Deleted: backend/src/core/prefect_manager.py
- Deleted: backend/src/services/prefect_stats_monitor.py
- Deleted: Docker registry and insecure-registries requirement

## Workflows
- Migrated: security_assessment workflow to Temporal
- New: rust_test workflow (example/test workflow)
- Deleted: secret_detection_scan (Prefect-based, to be reimplemented)
- Activities now co-located with workflows for independent testing

## API Changes
- Updated: backend/src/api/workflows.py (Temporal submission)
- Updated: backend/src/api/runs.py (Temporal status/results)
- Updated: backend/src/main.py (727 lines, TemporalManager integration)
- Updated: All 16 MCP tools to use TemporalManager

## Testing
- ✅ All services healthy (Temporal, PostgreSQL, MinIO, workers, backend)
- ✅ All API endpoints functional
- ✅ End-to-end workflow test passed (72 findings from vulnerable_app)
- ✅ MinIO storage integration working (target upload/download, results)
- ✅ Worker activity discovery working (6 activities registered)
- ✅ Tarball extraction working
- ✅ SARIF report generation working

## Documentation
- ARCHITECTURE.md: Complete Temporal architecture documentation
- QUICKSTART_TEMPORAL.md: Getting started guide
- MIGRATION_DECISION.md: Why we chose Temporal over Prefect
- IMPLEMENTATION_STATUS.md: Migration progress tracking
- workers/README.md: Worker development guide

## Dependencies
- Added: temporalio>=1.6.0
- Added: boto3>=1.34.0 (MinIO S3 client)
- Removed: prefect>=3.4.18

* feat: Add Python fuzzing vertical with Atheris integration

This commit implements a complete Python fuzzing workflow using Atheris:

## Python Worker (workers/python/)
- Dockerfile with Python 3.11, Atheris, and build tools
- Generic worker.py for dynamic workflow discovery
- requirements.txt with temporalio, boto3, atheris dependencies
- Added to docker-compose.temporal.yaml with dedicated cache volume

## AtherisFuzzer Module (backend/toolbox/modules/fuzzer/)
- Reusable module extending BaseModule
- Auto-discovers fuzz targets (fuzz_*.py, *_fuzz.py, fuzz_target.py)
- Recursive search to find targets in nested directories
- Dynamically loads TestOneInput() function
- Configurable max_iterations and timeout
- Real-time stats callback support for live monitoring
- Returns findings as ModuleFinding objects

## Atheris Fuzzing Workflow (backend/toolbox/workflows/atheris_fuzzing/)
- Temporal workflow for orchestrating fuzzing
- Downloads user code from MinIO
- Executes AtherisFuzzer module
- Uploads results to MinIO
- Cleans up cache after execution
- metadata.yaml with vertical: python for routing

## Test Project (test_projects/python_fuzz_waterfall/)
- Demonstrates stateful waterfall vulnerability
- main.py with check_secret() that leaks progress
- fuzz_target.py with Atheris TestOneInput() harness
- Complete README with usage instructions

## Backend Fixes
- Fixed parameter merging in REST API endpoints (workflows.py)
- Changed workflow parameter passing from positional args to kwargs (manager.py)
- Default parameters now properly merged with user parameters

## Testing
✅ Worker discovered AtherisFuzzingWorkflow
✅ Workflow executed end-to-end successfully
✅ Fuzz target auto-discovered in nested directories
✅ Atheris ran 100,000 iterations
✅ Results uploaded and cache cleaned

* chore: Complete Temporal migration with updated CLI/SDK/docs

This commit includes all remaining Temporal migration changes:

## CLI Updates (cli/)
- Updated workflow execution commands for Temporal
- Enhanced error handling and exceptions
- Updated dependencies in uv.lock

## SDK Updates (sdk/)
- Client methods updated for Temporal workflows
- Updated models for new workflow execution
- Updated dependencies in uv.lock

## Documentation Updates (docs/)
- Architecture documentation for Temporal
- Workflow concept documentation
- Resource management documentation (new)
- Debugging guide (new)
- Updated tutorials and how-to guides
- Troubleshooting updates

## README Updates
- Main README with Temporal instructions
- Backend README
- CLI README
- SDK README

## Other
- Updated IMPLEMENTATION_STATUS.md
- Removed old vulnerable_app.tar.gz

These changes complete the Temporal migration and ensure the CLI/SDK work correctly with the new backend.

* fix: Use positional args instead of kwargs for Temporal workflows

The Temporal Python SDK's start_workflow() method doesn't accept a 'kwargs' parameter. Workflows must receive parameters as positional arguments via the 'args' parameter.

Changed from passing parameters via `kwargs` to:
args=workflow_args  # Positional arguments

This fixes the error:
TypeError: Client.start_workflow() got an unexpected keyword argument 'kwargs'

Workflows now correctly receive parameters in order:
- security_assessment: [target_id, scanner_config, analyzer_config, reporter_config]
- atheris_fuzzing: [target_id, target_file, max_iterations, timeout_seconds]
- rust_test: [target_id, test_message]

* fix: Filter metadata-only parameters from workflow arguments

SecurityAssessmentWorkflow was receiving 7 arguments instead of 2-5. The issue was that target_path and volume_mode from default_parameters were being passed to the workflow, when they should only be used by the system for configuration.
Now filters out metadata-only parameters (target_path, volume_mode) before passing arguments to workflow execution.

* refactor: Remove Prefect leftovers and volume mounting legacy

Complete cleanup of Prefect migration artifacts:

Backend:
- Delete registry.py and workflow_discovery.py (Prefect-specific files)
- Remove Docker validation from setup.py (no longer needed)
- Remove ResourceLimits and VolumeMount models
- Remove target_path and volume_mode from WorkflowSubmission
- Remove supported_volume_modes from API and discovery
- Clean up metadata.yaml files (remove volume/path fields)
- Simplify parameter filtering in manager.py

SDK:
- Remove volume_mode parameter from client methods
- Remove ResourceLimits and VolumeMount models
- Remove Prefect error patterns from docker_logs.py
- Clean up WorkflowSubmission and WorkflowMetadata models

CLI:
- Remove Volume Modes display from workflow info

All removed features are Prefect-specific or Docker volume mounting artifacts. Temporal workflows use MinIO storage exclusively.

* feat: Add comprehensive test suite and benchmark infrastructure

- Add 68 unit tests for fuzzer, scanner, and analyzer modules
- Implement pytest-based test infrastructure with fixtures
- Add 6 performance benchmarks with category-specific thresholds
- Configure GitHub Actions for automated testing and benchmarking
- Add test and benchmark documentation

Test coverage:
- AtherisFuzzer: 8 tests
- CargoFuzzer: 14 tests
- FileScanner: 22 tests
- SecurityAnalyzer: 24 tests

All tests passing (68/68)
All benchmarks passing (6/6)

* fix: Resolve all ruff linting violations across codebase

Fixed 27 ruff violations in 12 files:
- Removed unused imports (Depends, Dict, Any, Optional, etc.)
- Fixed undefined workflow_info variable in workflows.py
- Removed dead code with undefined variables in atheris_fuzzer.py
- Changed f-string to regular string where no placeholders used

All files now pass ruff checks for CI/CD compliance.
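The two Temporal fixes above can be illustrated with a small helper. The first fix passes positional `args` instead of `kwargs`. The second drops the metadata-only parameters `target_path` and `volume_mode`.

This is a sketch under assumptions, not the actual manager.py code. `build_workflow_args` and `METADATA_ONLY_PARAMS` are hypothetical names, and the parameter values below are illustrative. The helper merges default and user parameters and removes the metadata-only keys. It then emits the values in the workflow's positional order, with `target_id` first, as in the atheris_fuzzing ordering listed above.

```python
from typing import Any, Dict, List, Mapping, Sequence

# Parameters used only by the system for configuration, never passed to the
# workflow (names from the commit message; this constant is hypothetical)
METADATA_ONLY_PARAMS = frozenset({"target_path", "volume_mode"})


def build_workflow_args(
    target_id: str,
    param_order: Sequence[str],
    default_parameters: Mapping[str, Any],
    user_parameters: Mapping[str, Any],
) -> List[Any]:
    """Merge defaults with user overrides, drop metadata-only keys,
    and return the values in the workflow's positional order."""
    # User parameters win over defaults
    merged: Dict[str, Any] = {**default_parameters, **user_parameters}
    for key in METADATA_ONLY_PARAMS:
        merged.pop(key, None)

    args: List[Any] = [target_id]
    for name in param_order:
        if name not in merged:
            # Positional arguments cannot skip a slot: stop here so the
            # workflow's own defaults apply to this and later parameters
            break
        args.append(merged[name])
    return args


if __name__ == "__main__":
    args = build_workflow_args(
        "t-123",
        ["target_file", "max_iterations", "timeout_seconds"],
        {"max_iterations": 100000, "timeout_seconds": 600,
         "target_path": "/workspace", "volume_mode": "ro"},
        {"target_file": "fuzz_target.py"},
    )
    print(args)  # ['t-123', 'fuzz_target.py', 100000, 600]
```

The resulting list is what would be passed as `args=` to the Temporal Python SDK's `Client.start_workflow(...)`, alongside `id=` and `task_queue=`. The sketch stops at the first missing parameter rather than padding with `None`, so a workflow's default values are never overridden by an explicit `None`.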
* fix: Configure CI for unit tests only

- Renamed docker-compose.temporal.yaml → docker-compose.yml for CI compatibility
- Commented out integration-tests job (no integration tests yet)
- Updated test-summary to only depend on lint and unit-tests

CI will now run successfully with 68 unit tests. Integration tests can be added later.

* feat: Add CI/CD integration with ephemeral deployment model

Implements comprehensive CI/CD support for FuzzForge with on-demand worker management:

**Worker Management (v0.7.0)**
- Add WorkerManager for automatic worker lifecycle control
- Auto-start workers from stopped state when workflows execute
- Auto-stop workers after workflow completion
- Health checks and startup timeout handling (90s default)

**CI/CD Features**
- `--fail-on` flag: Fail builds based on SARIF severity levels (error/warning/note/info)
- `--export-sarif` flag: Export findings in SARIF 2.1.0 format
- `--auto-start`/`--auto-stop` flags: Control worker lifecycle
- Exit code propagation: Returns 1 on blocking findings, 0 on success

**Exit Code Fix**
- Add `except typer.Exit: raise` handlers at 3 critical locations
- Move worker cleanup to finally block for guaranteed execution
- Exit codes now propagate correctly even when build fails

**CI Scripts & Examples**
- ci-start.sh: Start FuzzForge services with health checks
- ci-stop.sh: Clean shutdown with volume preservation option
- GitHub Actions workflow example (security-scan.yml)
- GitLab CI pipeline example (.gitlab-ci.example.yml)
- docker-compose.ci.yml: CI-optimized compose file with profiles

**OSS-Fuzz Integration**
- New ossfuzz_campaign workflow for running OSS-Fuzz projects
- OSS-Fuzz worker with Docker-in-Docker support
- Configurable campaign duration and project selection

**Documentation**
- Comprehensive CI/CD integration guide (docs/how-to/cicd-integration.md)
- Updated architecture docs with worker lifecycle details
- Updated workspace isolation documentation
- CLI README with worker management examples
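The `--fail-on` gating listed under **CI/CD Features** can be sketched as a pure function over a SARIF 2.1.0 report. The function name `exit_code_for` and the severity ranking are assumptions for illustration. SARIF itself defines the result levels `none`, `note`, `warning`, and `error`. Treating the CLI's `info` threshold as equivalent to `none` is therefore a guess, not documented FuzzForge behavior.

```python
from typing import Any, Dict

# Severity order, lowest to highest. Mapping SARIF "none" onto the CLI's
# "info" threshold is an assumption made for this sketch.
SEVERITY_RANK = {"info": 0, "none": 0, "note": 1, "warning": 2, "error": 3}


def exit_code_for(sarif: Dict[str, Any], fail_on: str) -> int:
    """Return 1 if any SARIF result is at or above `fail_on`, else 0."""
    threshold = SEVERITY_RANK[fail_on]
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # A result without "level" is treated as "warning", the SARIF
            # default when no rule configuration overrides it (rule lookups
            # are not modelled here)
            level = result.get("level", "warning")
            if SEVERITY_RANK.get(level, 0) >= threshold:
                return 1
    return 0


if __name__ == "__main__":
    report = {
        "version": "2.1.0",
        "runs": [{"results": [{"level": "note"}, {"level": "warning"}]}],
    }
    print(exit_code_for(report, "error"))    # 0
    print(exit_code_for(report, "warning"))  # 1
```

In a Typer-based CLI, the returned value would typically be raised as `typer.Exit(code=...)`. That exception has to reach Typer unchanged, which is the purpose of the `except typer.Exit: raise` handlers listed under **Exit Code Fix**.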
**SDK Enhancements**
- Add get_workflow_worker_info() endpoint
- Worker vertical metadata in workflow responses

**Testing**
- All workflows tested: security_assessment, atheris_fuzzing, secret_detection, cargo_fuzzing
- All monitoring commands tested: stats, crashes, status, finding
- Full CI pipeline simulation verified
- Exit codes verified for success/failure scenarios

Ephemeral CI/CD model: ~3-4GB RAM, ~60-90s startup, runs entirely in CI containers.

* fix: Resolve ruff linting violations in CI/CD code

- Remove unused variables (run_id, defaults, result)
- Remove unused imports
- Fix f-string without placeholders

All CI/CD integration files now pass ruff checks.
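The startup timeout handling under **Worker Management** amounts to polling a health check until it passes or a deadline expires. The following is a minimal sketch that assumes a synchronous `is_healthy` callable. `wait_until_healthy` and its polling interval are hypothetical; only the 90-second default comes from the commit message.

```python
import time
from typing import Callable


def wait_until_healthy(
    is_healthy: Callable[[], bool],
    timeout: float = 90.0,
    interval: float = 2.0,
) -> bool:
    """Poll `is_healthy` until it returns True or `timeout` seconds elapse.

    Returns True if the check passed in time, False on timeout.
    """
    # time.monotonic() is unaffected by wall-clock adjustments
    deadline = time.monotonic() + timeout
    while True:
        if is_healthy():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline
        time.sleep(min(interval, remaining))
```

In practice, `is_healthy` might wrap a container health probe. A caller that gets `False` back would report a startup timeout instead of submitting the workflow.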
@@ -0,0 +1,45 @@
# OSS-Fuzz Worker - Generic fuzzing using OSS-Fuzz infrastructure
FROM gcr.io/oss-fuzz-base/base-builder:latest

# Install pip, Python headers, git, and the Docker CLI (uses Python 3.8 from the base image)
RUN apt-get update && apt-get install -y \
    python3-pip \
    python3-dev \
    git \
    docker.io \
    && rm -rf /var/lib/apt/lists/*

# Upgrade pip
RUN python3 -m pip install --upgrade pip

# Install Temporal Python SDK and dependencies
RUN pip3 install --no-cache-dir \
    temporalio==1.5.0 \
    boto3==1.34.50 \
    pyyaml==6.0.1 \
    psutil==5.9.8

# Create necessary directories
RUN mkdir -p /app /cache /corpus /output

# Set environment variables
ENV PYTHONPATH=/app
ENV WORKER_VERTICAL=ossfuzz
ENV MAX_CONCURRENT_ACTIVITIES=2
ENV CACHE_DIR=/cache
ENV CACHE_MAX_SIZE=50GB
ENV CACHE_TTL=30d

# Clone the OSS-Fuzz repo into /opt/oss-fuzz (project configs and build scripts);
# load_ossfuzz_project_activity refreshes this checkout with git pull at runtime
RUN git clone --depth=1 https://github.com/google/oss-fuzz.git /opt/oss-fuzz

# Copy worker code
COPY worker.py /app/
COPY activities.py /app/
COPY requirements.txt /app/

WORKDIR /app

# Run worker
CMD ["python3", "worker.py"]
@@ -0,0 +1,413 @@
"""
OSS-Fuzz Campaign Activities

Activities for running OSS-Fuzz campaigns using Google's infrastructure.
"""

import logging
import os
import subprocess
import shutil
from pathlib import Path
from typing import Dict, Any, List, Optional
from datetime import datetime

import yaml
from temporalio import activity

logger = logging.getLogger(__name__)

# Paths
OSS_FUZZ_REPO = Path("/opt/oss-fuzz")
CACHE_DIR = Path(os.getenv("CACHE_DIR", "/cache"))


@activity.defn(name="load_ossfuzz_project")
async def load_ossfuzz_project_activity(project_name: str) -> Dict[str, Any]:
    """
    Load OSS-Fuzz project configuration from project.yaml.

    Args:
        project_name: Name of the OSS-Fuzz project (e.g., "curl", "sqlite3")

    Returns:
        Dictionary with project config, paths, and metadata
    """
    logger.info(f"Loading OSS-Fuzz project: {project_name}")

    # Update OSS-Fuzz repo if it exists, clone if not
    if OSS_FUZZ_REPO.exists():
        logger.info("Updating OSS-Fuzz repository...")
        subprocess.run(
            ["git", "-C", str(OSS_FUZZ_REPO), "pull", "--depth=1"],
            check=False  # Best effort: keep the existing checkout if the pull fails
        )
    else:
        logger.info("Cloning OSS-Fuzz repository...")
        subprocess.run(
            [
                "git", "clone", "--depth=1",
                "https://github.com/google/oss-fuzz.git",
                str(OSS_FUZZ_REPO)
            ],
            check=True
        )

    # Find project directory
    project_path = OSS_FUZZ_REPO / "projects" / project_name
    if not project_path.exists():
        raise ValueError(
            f"Project '{project_name}' not found in OSS-Fuzz. "
            "Available projects: https://github.com/google/oss-fuzz/tree/master/projects"
        )

    # Read project.yaml
    config_file = project_path / "project.yaml"
    if not config_file.exists():
        raise ValueError(f"No project.yaml found for project '{project_name}'")

    with open(config_file) as f:
        # safe_load returns None for an empty file
        config = yaml.safe_load(f) or {}

    # Add paths
    config["project_name"] = project_name
    config["project_path"] = str(project_path)
    config["dockerfile_path"] = str(project_path / "Dockerfile")
    config["build_script_path"] = str(project_path / "build.sh")

    # Validate required fields
    if not config.get("language"):
        logger.warning(f"No language specified in project.yaml for {project_name}")

    logger.info(
        f"✓ Loaded project {project_name}: "
        f"language={config.get('language', 'unknown')}, "
        f"engines={config.get('fuzzing_engines', [])}, "
        f"sanitizers={config.get('sanitizers', [])}"
    )

    return config


@activity.defn(name="build_ossfuzz_project")
async def build_ossfuzz_project_activity(
    project_name: str,
    project_config: Dict[str, Any],
    sanitizer: Optional[str] = None,
    engine: Optional[str] = None
) -> Dict[str, Any]:
    """
    Build OSS-Fuzz project directly using build.sh (no Docker-in-Docker).

    Args:
        project_name: Name of the project
        project_config: Configuration from project.yaml
        sanitizer: Override sanitizer (default: first from project.yaml)
        engine: Override engine (default: first from project.yaml)

    Returns:
        Dictionary with build results and discovered fuzz targets
    """
    logger.info(f"Building OSS-Fuzz project: {project_name}")

    # Determine sanitizer and engine (fall back to defaults if missing or empty)
    sanitizers = project_config.get("sanitizers") or ["address"]
    engines = project_config.get("fuzzing_engines") or ["libfuzzer"]

    use_sanitizer = sanitizer if sanitizer else sanitizers[0]
    use_engine = engine if engine else engines[0]

    logger.info(f"Building with sanitizer={use_sanitizer}, engine={use_engine}")

    # Setup directories
    src_dir = Path("/src")
    out_dir = Path("/out")
    src_dir.mkdir(exist_ok=True)
    out_dir.mkdir(exist_ok=True)

    # Clean previous build artifacts (symlinks are unlinked, never followed)
    for item in out_dir.glob("*"):
        if item.is_dir() and not item.is_symlink():
            shutil.rmtree(item)
        else:
            item.unlink()

    # Copy project files from OSS-Fuzz repo to /src
    project_path = Path(project_config["project_path"])
    build_script = project_path / "build.sh"

    if not build_script.exists():
        raise Exception(f"build.sh not found for project {project_name}")

    logger.info(f"Copying project files from {project_path} to {src_dir}")

    # Copy build.sh
    shutil.copy2(build_script, src_dir / "build.sh")
    os.chmod(src_dir / "build.sh", 0o755)

    # Copy fuzzer sources and headers shipped in the OSS-Fuzz project directory
    for pattern in ["*.cc", "*.c", "*.cpp", "*.h", "*.hh", "*.hpp"]:
        for src_file in project_path.glob(pattern):
            dest_file = src_dir / src_file.name
            shutil.copy2(src_file, dest_file)
            logger.info(f"Copied: {src_file.name}")

    # Clone project source code to subdirectory
    main_repo = project_config.get("main_repo")
    work_dir = src_dir

    if main_repo:
        logger.info(f"Cloning project source from {main_repo}")
        project_src_dir = src_dir / project_name

        # Remove existing directory if present
        if project_src_dir.exists():
            shutil.rmtree(project_src_dir)

        clone_cmd = ["git", "clone", "--depth=1", main_repo, str(project_src_dir)]
        result = subprocess.run(clone_cmd, capture_output=True, text=True, timeout=600)

        if result.returncode != 0:
            logger.warning(f"Failed to clone {main_repo}: {result.stderr}")
            logger.info("Continuing without cloning (build.sh may download source)")
        else:
            # Copy build.sh into the project source directory
            shutil.copy2(src_dir / "build.sh", project_src_dir / "build.sh")
            os.chmod(project_src_dir / "build.sh", 0o755)
            # build.sh should run from within the project directory
            work_dir = project_src_dir
            logger.info(f"Build will run from: {work_dir}")
    else:
        logger.info("No main_repo in project.yaml, build.sh will download source")

    # Set OSS-Fuzz environment variables
    build_env = os.environ.copy()
    build_env.update({
        "SRC": str(src_dir),
        "OUT": str(out_dir),
        "FUZZING_ENGINE": use_engine,
        "SANITIZER": use_sanitizer,
        "ARCHITECTURE": "x86_64",
        # Use clang's built-in libfuzzer instead of separate library
        "LIB_FUZZING_ENGINE": "-fsanitize=fuzzer",
    })

    # Set sanitizer flags
    if use_sanitizer == "address":
        build_env["CFLAGS"] = build_env.get("CFLAGS", "") + " -fsanitize=address"
        build_env["CXXFLAGS"] = build_env.get("CXXFLAGS", "") + " -fsanitize=address"
    elif use_sanitizer == "memory":
        build_env["CFLAGS"] = build_env.get("CFLAGS", "") + " -fsanitize=memory"
        build_env["CXXFLAGS"] = build_env.get("CXXFLAGS", "") + " -fsanitize=memory"
    elif use_sanitizer == "undefined":
        build_env["CFLAGS"] = build_env.get("CFLAGS", "") + " -fsanitize=undefined"
        build_env["CXXFLAGS"] = build_env.get("CXXFLAGS", "") + " -fsanitize=undefined"

    # libFuzzer needs coverage instrumentation in library code too;
    # -fsanitize=fuzzer-no-link adds it without linking libFuzzer's main()
    if use_engine == "libfuzzer":
        build_env["CFLAGS"] = build_env.get("CFLAGS", "") + " -fsanitize=fuzzer-no-link"
        build_env["CXXFLAGS"] = build_env.get("CXXFLAGS", "") + " -fsanitize=fuzzer-no-link"

    # Execute build.sh from the work directory
    logger.info(f"Executing build.sh in {work_dir}")
    build_cmd = ["bash", "./build.sh"]

    result = subprocess.run(
        build_cmd,
        cwd=str(work_dir),
        env=build_env,
        capture_output=True,
        text=True,
        errors="replace",  # Compiler output may contain non-UTF-8 bytes
        timeout=1800  # 30 minutes max build time
    )

    if result.returncode != 0:
        logger.error(f"Build failed:\nSTDOUT:\n{result.stdout}\nSTDERR:\n{result.stderr}")
        # Keep the failure message bounded; the full log is written above
        raise Exception(f"Build failed for {project_name}: {result.stderr[-2000:]}")

    logger.info("✓ Build completed successfully")
    logger.info(f"Build output:\n{result.stdout[-2000:]}")  # Last 2000 chars

    # Discover fuzz targets in /out
    fuzz_targets = []
    for file in out_dir.glob("*"):
        if file.is_file() and os.access(file, os.X_OK):
            # Executables only; skip libraries, objects and archives
            if file.suffix not in ['.so', '.a', '.o', '.zip']:
                fuzz_targets.append(str(file))
                logger.info(f"Found fuzz target: {file.name}")

    if not fuzz_targets:
        logger.warning(f"No fuzz targets found in {out_dir}")
        logger.info(f"Directory contents: {list(out_dir.glob('*'))}")

    return {
        "fuzz_targets": fuzz_targets,
        "build_log": result.stdout[-5000:],  # Last 5000 chars
        "sanitizer_used": use_sanitizer,
        "engine_used": use_engine,
        "out_dir": str(out_dir)
    }


@activity.defn(name="fuzz_target")
async def fuzz_target_activity(
    target_path: str,
    engine: str,
    duration_seconds: int,
    corpus_dir: Optional[str] = None,
    dict_file: Optional[str] = None
) -> Dict[str, Any]:
    """
    Run fuzzing on a target with specified engine.

    Args:
        target_path: Path to fuzz target executable
        engine: Fuzzing engine (libfuzzer, afl, honggfuzz)
        duration_seconds: How long to fuzz
        corpus_dir: Optional corpus directory
        dict_file: Optional dictionary file (used by the libFuzzer branch only)

    Returns:
        Dictionary with fuzzing stats and results
    """
    logger.info(f"Fuzzing {Path(target_path).name} with {engine} for {duration_seconds}s")

    # Prepare corpus directory
    if not corpus_dir:
        corpus_dir = str(CACHE_DIR / "corpus" / Path(target_path).stem)
    Path(corpus_dir).mkdir(parents=True, exist_ok=True)

    output_dir = CACHE_DIR / "output" / Path(target_path).stem
    output_dir.mkdir(parents=True, exist_ok=True)

    start_time = datetime.now()

    try:
        if engine == "libfuzzer":
            cmd = [
                target_path,
                corpus_dir,
                f"-max_total_time={duration_seconds}",
                "-print_final_stats=1",
                f"-artifact_prefix={output_dir}/"
            ]
            if dict_file:
                cmd.append(f"-dict={dict_file}")

        elif engine == "afl":
            # AFL needs at least one seed input; create a trivial one if the corpus is empty
            if not any(Path(corpus_dir).iterdir()):
                (Path(corpus_dir) / "seed").write_bytes(b"\n")
            cmd = [
                "afl-fuzz",
                "-i", corpus_dir,
                "-o", str(output_dir),
                "-V", str(duration_seconds),  # AFL++: stop after this many seconds of fuzzing
                "-t", "1000",  # Timeout per execution (ms)
                "-m", "none",  # No memory limit
                "--", target_path, "@@"
            ]

        elif engine == "honggfuzz":
            cmd = [
                "honggfuzz",
                f"--run_time={duration_seconds}",
                "-i", corpus_dir,
                "-o", str(output_dir),
                "--", target_path
            ]

        else:
            raise ValueError(f"Unsupported fuzzing engine: {engine}")

        logger.info(f"Starting fuzzer: {' '.join(cmd[:5])}...")

        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            errors="replace",  # Fuzz targets may print arbitrary bytes
            timeout=duration_seconds + 120  # Add 2 minute buffer
        )

        end_time = datetime.now()
        elapsed = (end_time - start_time).total_seconds()

        # Parse stats from output
        stats = parse_fuzzing_stats(result.stdout, result.stderr, engine)
        stats["elapsed_time"] = elapsed
        stats["target_name"] = Path(target_path).name
        stats["engine"] = engine

        # Find crashes
        crashes = find_crashes(output_dir)
        stats["crashes"] = len(crashes)
        stats["crash_files"] = crashes

        # Collect all corpus files (seeds plus inputs added during this run)
        new_corpus = collect_corpus(corpus_dir)
        stats["corpus_size"] = len(new_corpus)
        stats["corpus_files"] = new_corpus

        logger.info(
            f"✓ Fuzzing completed: {stats.get('total_executions', 0)} execs, "
            f"{len(crashes)} crashes"
        )

        return stats

    except subprocess.TimeoutExpired:
        # subprocess.run() kills the fuzzer before re-raising TimeoutExpired
        elapsed = (datetime.now() - start_time).total_seconds()
        logger.warning(
            f"Fuzzer did not exit within {duration_seconds}s plus the 2 minute buffer; killed it"
        )
        # Artifacts written before the fuzzer was killed are still reported
        crashes = find_crashes(output_dir)
        return {
            "target_name": Path(target_path).name,
            "engine": engine,
            "status": "timeout",
            "elapsed_time": elapsed,
            "crashes": len(crashes),
            "crash_files": crashes
        }


def parse_fuzzing_stats(stdout: str, stderr: str, engine: str) -> Dict[str, Any]:
    """Parse fuzzing statistics from fuzzer output"""
    stats: Dict[str, Any] = {}

    if engine == "libfuzzer":
        # libFuzzer status lines (NEW, REDUCE, pulse, DONE, ...) look like:
        #   #8192  NEW    cov: 1234 ft: 5678 corp: 89/10Kb exec/s: 4096 rss: 40Mb
        # Later lines overwrite earlier values, so the last status line wins
        for line in f"{stdout}\n{stderr}".splitlines():
            parts = line.split()
            if not parts or not parts[0].startswith("#"):
                continue
            try:
                stats["total_executions"] = int(parts[0][1:])
                for i, part in enumerate(parts[:-1]):
                    if part == "cov:":
                        stats["coverage"] = int(parts[i + 1])
                    elif part == "corp:":
                        stats["corpus_entries"] = int(parts[i + 1].split('/')[0])
                    elif part == "exec/s:":
                        stats["executions_per_sec"] = float(parts[i + 1])
            except ValueError:
                # Line starts with '#' but is not a libFuzzer status line
                continue

    elif engine == "afl":
        # Parse AFL stats (would need to read fuzzer_stats file)
        pass

    elif engine == "honggfuzz":
        # Parse Honggfuzz stats
        pass

    return stats


def find_crashes(output_dir: Path) -> List[str]:
    """Find crash files in output directory"""
    crashes = []

    # libFuzzer artifacts are named crash-*, leak-*, timeout-* or oom-*
    for pattern in ["crash-*", "leak-*", "timeout-*", "oom-*"]:
        crashes.extend([str(f) for f in output_dir.glob(pattern)])

    # AFL stores crashes in crashes/; AFL++ nests them under default/crashes/
    for crashes_dir in (output_dir / "crashes", output_dir / "default" / "crashes"):
        if crashes_dir.exists():
            crashes.extend(
                str(f) for f in crashes_dir.glob("*")
                if f.is_file() and f.name != "README.txt"  # AFL's own notes file
            )

    return crashes


def collect_corpus(corpus_dir: str) -> List[str]:
    """Collect corpus files"""
    corpus_path = Path(corpus_dir)
    if not corpus_path.exists():
        return []

    return [str(f) for f in corpus_path.glob("*") if f.is_file()]
@@ -0,0 +1,4 @@
temporalio==1.5.0
boto3==1.34.50
pyyaml==6.0.1
psutil==5.9.8
@@ -0,0 +1,319 @@
"""
FuzzForge Vertical Worker: OSS-Fuzz Campaigns

This worker:
1. Discovers workflows for the 'ossfuzz' vertical from mounted toolbox
2. Dynamically imports and registers workflow classes
3. Connects to Temporal and processes tasks
4. Handles activities for OSS-Fuzz project building and fuzzing
"""

import asyncio
import importlib
import inspect
import logging
import os
import sys
from pathlib import Path
from typing import List, Any

import yaml
from temporalio.client import Client
from temporalio.worker import Worker

# Add toolbox to path for workflow and activity imports
sys.path.insert(0, '/app/toolbox')

# Import common storage activities
from toolbox.common.storage_activities import (
    get_target_activity,
    cleanup_cache_activity,
    upload_results_activity
)

# Import OSS-Fuzz specific activities
from activities import (
    load_ossfuzz_project_activity,
    build_ossfuzz_project_activity,
    fuzz_target_activity
)

# Configure logging
logging.basicConfig(
    level=os.getenv('LOG_LEVEL', 'INFO'),
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)


async def discover_workflows(vertical: str) -> List[Any]:
    """
    Discover workflows for this vertical from mounted toolbox.

    Args:
        vertical: The vertical name (e.g., 'ossfuzz')

    Returns:
        List of workflow classes decorated with @workflow.defn
    """
    workflows = []
    toolbox_path = Path("/app/toolbox/workflows")

    if not toolbox_path.exists():
        logger.warning(f"Toolbox path does not exist: {toolbox_path}")
        return workflows

    logger.info(f"Scanning for workflows in: {toolbox_path}")

    for workflow_dir in toolbox_path.iterdir():
        if not workflow_dir.is_dir():
            continue

        # Skip special directories
        if workflow_dir.name.startswith('.') or workflow_dir.name == '__pycache__':
            continue

        metadata_file = workflow_dir / "metadata.yaml"
        if not metadata_file.exists():
            logger.debug(f"No metadata.yaml in {workflow_dir.name}, skipping")
            continue

        try:
            # Parse metadata
            with open(metadata_file) as f:
                metadata = yaml.safe_load(f)

            # Check if workflow is for this vertical
            workflow_vertical = metadata.get("vertical")
            if workflow_vertical != vertical:
                logger.debug(
                    f"Workflow {workflow_dir.name} is for vertical '{workflow_vertical}', "
                    f"not '{vertical}', skipping"
                )
                continue

            # Check if workflow.py exists
            workflow_file = workflow_dir / "workflow.py"
            if not workflow_file.exists():
                logger.warning(
                    f"Workflow {workflow_dir.name} has metadata but no workflow.py, skipping"
                )
                continue

            # Dynamically import workflow module
            module_name = f"toolbox.workflows.{workflow_dir.name}.workflow"
            logger.info(f"Importing workflow module: {module_name}")

            try:
                module = importlib.import_module(module_name)
            except Exception as e:
                logger.error(
                    f"Failed to import workflow module {module_name}: {e}",
                    exc_info=True
                )
                continue

            # Find @workflow.defn decorated classes
            found_workflows = False
            for name, obj in inspect.getmembers(module, inspect.isclass):
                # Check if class has Temporal workflow definition
                if hasattr(obj, '__temporal_workflow_definition'):
                    workflows.append(obj)
                    found_workflows = True
                    logger.info(
                        f"✓ Discovered workflow: {name} from {workflow_dir.name} "
                        f"(vertical: {vertical})"
                    )

            if not found_workflows:
                logger.warning(
                    f"Workflow {workflow_dir.name} has no @workflow.defn decorated classes"
                )

        except Exception as e:
            logger.error(
                f"Error processing workflow {workflow_dir.name}: {e}",
                exc_info=True
            )
            continue

    logger.info(f"Discovered {len(workflows)} workflows for vertical '{vertical}'")
    return workflows


async def discover_activities(workflows_dir: Path) -> List[Any]:
    """
    Discover activities from workflow directories.

    Looks for activities.py files alongside workflow.py in each workflow directory.

    Args:
        workflows_dir: Path to workflows directory

    Returns:
        List of activity functions decorated with @activity.defn
    """
    activities = []

    if not workflows_dir.exists():
        logger.warning(f"Workflows directory does not exist: {workflows_dir}")
        return activities

    logger.info(f"Scanning for workflow activities in: {workflows_dir}")

    for workflow_dir in workflows_dir.iterdir():
        if not workflow_dir.is_dir():
            continue

        # Skip special directories
        if workflow_dir.name.startswith('.') or workflow_dir.name == '__pycache__':
            continue

        # Check if activities.py exists
        activities_file = workflow_dir / "activities.py"
        if not activities_file.exists():
            logger.debug(f"No activities.py in {workflow_dir.name}, skipping")
            continue

        try:
            # Dynamically import activities module
            module_name = f"toolbox.workflows.{workflow_dir.name}.activities"
            logger.info(f"Importing activities module: {module_name}")

            try:
                module = importlib.import_module(module_name)
            except Exception as e:
                logger.error(
                    f"Failed to import activities module {module_name}: {e}",
                    exc_info=True
                )
                continue

            # Find @activity.defn decorated functions
            found_activities = False
            for name, obj in inspect.getmembers(module, inspect.isfunction):
                # Check if function has Temporal activity definition
                if hasattr(obj, '__temporal_activity_definition'):
                    activities.append(obj)
                    found_activities = True
                    logger.info(
                        f"✓ Discovered activity: {name} from {workflow_dir.name}"
                    )

            if not found_activities:
                logger.warning(
                    f"Workflow {workflow_dir.name} has activities.py but no @activity.defn decorated functions"
                )

        except Exception as e:
            logger.error(
                f"Error processing activities from {workflow_dir.name}: {e}",
                exc_info=True
            )
            continue

    logger.info(f"Discovered {len(activities)} workflow-specific activities")
    return activities


async def main():
    """Main worker entry point"""
    # Get configuration from environment
    vertical = os.getenv("WORKER_VERTICAL", "ossfuzz")
    temporal_address = os.getenv("TEMPORAL_ADDRESS", "localhost:7233")
    temporal_namespace = os.getenv("TEMPORAL_NAMESPACE", "default")
    task_queue = os.getenv("WORKER_TASK_QUEUE", f"{vertical}-queue")
    max_concurrent_activities = int(os.getenv("MAX_CONCURRENT_ACTIVITIES", "2"))

    logger.info("=" * 60)
    logger.info(f"FuzzForge Vertical Worker: {vertical}")
    logger.info("=" * 60)
    logger.info(f"Temporal Address: {temporal_address}")
    logger.info(f"Temporal Namespace: {temporal_namespace}")
    logger.info(f"Task Queue: {task_queue}")
    logger.info(f"Max Concurrent Activities: {max_concurrent_activities}")
    logger.info("=" * 60)

    # Discover workflows for this vertical
    logger.info(f"Discovering workflows for vertical: {vertical}")
    workflows = await discover_workflows(vertical)

    if not workflows:
        logger.error(f"No workflows found for vertical: {vertical}")
        logger.error("Worker cannot start without workflows. Exiting...")
        sys.exit(1)

    # Discover activities from workflow directories
    logger.info("Discovering workflow-specific activities...")
    workflows_dir = Path("/app/toolbox/workflows")
    workflow_activities = await discover_activities(workflows_dir)

    # Combine common storage activities, OSS-Fuzz activities, and workflow-specific activities
    activities = [
        get_target_activity,
        cleanup_cache_activity,
        upload_results_activity,
        load_ossfuzz_project_activity,
        build_ossfuzz_project_activity,
        fuzz_target_activity
    ] + workflow_activities

    logger.info(
        f"Total activities registered: {len(activities)} "
        f"(3 common + 3 ossfuzz + {len(workflow_activities)} workflow-specific)"
    )

    # Connect to Temporal
    logger.info(f"Connecting to Temporal at {temporal_address}...")
    try:
        client = await Client.connect(
            temporal_address,
            namespace=temporal_namespace
        )
        logger.info("✓ Connected to Temporal successfully")
    except Exception as e:
        logger.error(f"Failed to connect to Temporal: {e}", exc_info=True)
        sys.exit(1)

    # Create worker with discovered workflows and activities
    logger.info(f"Creating worker on task queue: {task_queue}")

    try:
        worker = Worker(
            client,
            task_queue=task_queue,
            workflows=workflows,
            activities=activities,
            max_concurrent_activities=max_concurrent_activities
        )
        logger.info("✓ Worker created successfully")
    except Exception as e:
        logger.error(f"Failed to create worker: {e}", exc_info=True)
        sys.exit(1)

    # Start worker
    logger.info("=" * 60)
    logger.info(f"🚀 Worker started for vertical '{vertical}'")
    logger.info(f"📦 Registered {len(workflows)} workflows")
    logger.info(f"⚙️ Registered {len(activities)} activities")
    logger.info(f"📨 Listening on task queue: {task_queue}")
    logger.info("=" * 60)
    logger.info("Worker is ready to process tasks...")

    try:
        await worker.run()
    except KeyboardInterrupt:
        logger.info("Shutting down worker (keyboard interrupt)...")
    except Exception as e:
        logger.error(f"Worker error: {e}", exc_info=True)
        raise


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        logger.info("Worker stopped")
    except Exception as e:
        logger.error(f"Fatal error: {e}", exc_info=True)
        sys.exit(1)
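The directory walk in `discover_activities` can be tried in isolation. Below is a minimal sketch that reproduces only its filtering rules (skip non-directories, dot-prefixed names and `__pycache__`, and require an `activities.py` file) and returns the module names the worker would try to import, assuming the same `toolbox.workflows.<name>.activities` naming. The helper `find_activity_modules` and its `package` parameter are hypothetical and not part of FuzzForge. The directory names in the demo are examples only. The sketch performs no imports, so it runs without Temporal or the toolbox package installed.

```python
import tempfile
from pathlib import Path
from typing import List


def find_activity_modules(workflows_dir: Path, package: str = "toolbox.workflows") -> List[str]:
    """Hypothetical helper: list importable activity module names.

    Mirrors the directory filtering in discover_activities without importing
    anything, so it runs without Temporal or the toolbox package.
    """
    if not workflows_dir.exists():
        return []

    module_names = []
    # sorted() makes the result deterministic; iterdir() order is arbitrary
    for workflow_dir in sorted(workflows_dir.iterdir()):
        if not workflow_dir.is_dir():
            continue
        # Same skip rules as the worker: hidden directories and __pycache__
        if workflow_dir.name.startswith(".") or workflow_dir.name == "__pycache__":
            continue
        # Only directories that ship an activities.py are candidates
        if not (workflow_dir / "activities.py").exists():
            continue
        module_names.append(f"{package}.{workflow_dir.name}.activities")
    return module_names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "security_assessment").mkdir()
        (root / "security_assessment" / "activities.py").write_text("")
        (root / "rust_test").mkdir()  # no activities.py: skipped
        (root / ".hidden").mkdir()
        (root / ".hidden" / "activities.py").write_text("")  # hidden: skipped
        (root / "README.md").write_text("")  # not a directory: skipped
        print(find_activity_modules(root))
        # ['toolbox.workflows.security_assessment.activities']
```

Sorting the entries is a deliberate difference from the worker, which iterates `iterdir()` directly. Because `iterdir()` yields entries in arbitrary order, sorting keeps discovery logs and registration order reproducible across hosts.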
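Both discovery functions decide what to register by checking a marker attribute: `__temporal_workflow_definition` on classes and `__temporal_activity_definition` on functions. These are the names under which the Temporal Python SDK's `@workflow.defn` and `@activity.defn` decorators attach their definition objects, and the worker's own `hasattr` checks rely on that. The sketch below reproduces the activity-side filter without a Temporal dependency. `fake_defn` is a hypothetical stand-in decorator that only sets the marker, and `collect_marked` and `build_demo_module` are invented helpers. The sketch also adds a `local_only` option that the worker code above does not have: it keeps only functions actually defined in the scanned module.

```python
import inspect
import types
from typing import Any, Callable, List

# Attribute name the worker checks for; in the Temporal Python SDK,
# @activity.defn attaches its definition object under this name.
MARKER = "__temporal_activity_definition"


def fake_defn(fn: Callable) -> Callable:
    """Hypothetical stand-in for @activity.defn: it only sets the marker."""
    setattr(fn, MARKER, {"name": fn.__name__})
    return fn


def collect_marked(module: types.ModuleType, local_only: bool = True) -> List[Any]:
    """Return module-level functions that carry the marker attribute.

    With local_only=True, functions whose __module__ differs from the
    scanned module (i.e. imported into it) are skipped.
    """
    found = []
    for _name, obj in inspect.getmembers(module, inspect.isfunction):
        if not hasattr(obj, MARKER):
            continue
        if local_only and obj.__module__ != module.__name__:
            continue
        found.append(obj)
    return found


@fake_defn
def shared_activity():
    """Marked function defined outside the demo module (simulates a common activity)."""
    return "shared"


def build_demo_module() -> types.ModuleType:
    """Build an in-memory module that mimics a workflow's activities.py."""
    demo = types.ModuleType("demo_activities")
    demo.__dict__["fake_defn"] = fake_defn
    # exec() with the module's namespace as globals makes the functions
    # defined here report __module__ == "demo_activities".
    exec(
        "@fake_defn\n"
        "def scan_activity():\n"
        "    return 'scanned'\n"
        "\n"
        "def helper():\n"
        "    return 'not an activity'\n",
        demo.__dict__,
    )
    # Simulate `from <common module> import shared_activity` inside activities.py
    demo.shared_activity = shared_activity
    return demo


if __name__ == "__main__":
    demo = build_demo_module()
    print([f.__name__ for f in collect_marked(demo)])
    # ['scan_activity']
    print([f.__name__ for f in collect_marked(demo, local_only=False)])
    # ['scan_activity', 'shared_activity']
```

The `local_only` filter is a design choice, not what the worker does. It may matter when an `activities.py` re-exports a shared activity that `main()` already lists explicitly. In that case `inspect.getmembers` also returns the imported function, so the same activity name would be registered twice on one worker, which the Temporal worker is expected to reject when it is constructed.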