mirror of
https://github.com/FuzzingLabs/fuzzforge_ai.git
synced 2026-05-23 16:19:45 +02:00
CI/CD Integration with Ephemeral Deployment Model (#14)
* feat: Complete migration from Prefect to Temporal

  BREAKING CHANGE: Replaces Prefect workflow orchestration with Temporal

  ## Major Changes
  - Replace Prefect with Temporal for workflow orchestration
  - Implement vertical worker architecture (rust, android)
  - Replace Docker registry with MinIO for unified storage
  - Refactor activities to be co-located with workflows
  - Update all API endpoints for Temporal compatibility

  ## Infrastructure
  - New: docker-compose.temporal.yaml (Temporal + MinIO + workers)
  - New: workers/ directory with rust and android vertical workers
  - New: backend/src/temporal/ (manager, discovery)
  - New: backend/src/storage/ (S3-cached storage with MinIO)
  - New: backend/toolbox/common/ (shared storage activities)
  - Deleted: docker-compose.yaml (old Prefect setup)
  - Deleted: backend/src/core/prefect_manager.py
  - Deleted: backend/src/services/prefect_stats_monitor.py
  - Deleted: Docker registry and insecure-registries requirement

  ## Workflows
  - Migrated: security_assessment workflow to Temporal
  - New: rust_test workflow (example/test workflow)
  - Deleted: secret_detection_scan (Prefect-based, to be reimplemented)
  - Activities now co-located with workflows for independent testing

  ## API Changes
  - Updated: backend/src/api/workflows.py (Temporal submission)
  - Updated: backend/src/api/runs.py (Temporal status/results)
  - Updated: backend/src/main.py (727 lines, TemporalManager integration)
  - Updated: All 16 MCP tools to use TemporalManager

  ## Testing
  - ✅ All services healthy (Temporal, PostgreSQL, MinIO, workers, backend)
  - ✅ All API endpoints functional
  - ✅ End-to-end workflow test passed (72 findings from vulnerable_app)
  - ✅ MinIO storage integration working (target upload/download, results)
  - ✅ Worker activity discovery working (6 activities registered)
  - ✅ Tarball extraction working
  - ✅ SARIF report generation working

  ## Documentation
  - ARCHITECTURE.md: Complete Temporal architecture documentation
  - QUICKSTART_TEMPORAL.md: Getting started guide
  - MIGRATION_DECISION.md: Why we chose Temporal over Prefect
  - IMPLEMENTATION_STATUS.md: Migration progress tracking
  - workers/README.md: Worker development guide

  ## Dependencies
  - Added: temporalio>=1.6.0
  - Added: boto3>=1.34.0 (MinIO S3 client)
  - Removed: prefect>=3.4.18

* feat: Add Python fuzzing vertical with Atheris integration

  This commit implements a complete Python fuzzing workflow using Atheris:

  ## Python Worker (workers/python/)
  - Dockerfile with Python 3.11, Atheris, and build tools
  - Generic worker.py for dynamic workflow discovery
  - requirements.txt with temporalio, boto3, atheris dependencies
  - Added to docker-compose.temporal.yaml with dedicated cache volume

  ## AtherisFuzzer Module (backend/toolbox/modules/fuzzer/)
  - Reusable module extending BaseModule
  - Auto-discovers fuzz targets (fuzz_*.py, *_fuzz.py, fuzz_target.py)
  - Recursive search to find targets in nested directories
  - Dynamically loads TestOneInput() function
  - Configurable max_iterations and timeout
  - Real-time stats callback support for live monitoring
  - Returns findings as ModuleFinding objects

  ## Atheris Fuzzing Workflow (backend/toolbox/workflows/atheris_fuzzing/)
  - Temporal workflow for orchestrating fuzzing
  - Downloads user code from MinIO
  - Executes AtherisFuzzer module
  - Uploads results to MinIO
  - Cleans up cache after execution
  - metadata.yaml with vertical: python for routing

  ## Test Project (test_projects/python_fuzz_waterfall/)
  - Demonstrates stateful waterfall vulnerability
  - main.py with check_secret() that leaks progress
  - fuzz_target.py with Atheris TestOneInput() harness
  - Complete README with usage instructions

  ## Backend Fixes
  - Fixed parameter merging in REST API endpoints (workflows.py)
  - Changed workflow parameter passing from positional args to kwargs (manager.py)
  - Default parameters now properly merged with user parameters

  ## Testing
  ✅ Worker discovered AtherisFuzzingWorkflow
  ✅ Workflow executed end-to-end successfully
  ✅ Fuzz target auto-discovered in nested directories
  ✅ Atheris ran 100,000 iterations
  ✅ Results uploaded and cache cleaned

* chore: Complete Temporal migration with updated CLI/SDK/docs

  This commit includes all remaining Temporal migration changes:

  ## CLI Updates (cli/)
  - Updated workflow execution commands for Temporal
  - Enhanced error handling and exceptions
  - Updated dependencies in uv.lock

  ## SDK Updates (sdk/)
  - Client methods updated for Temporal workflows
  - Updated models for new workflow execution
  - Updated dependencies in uv.lock

  ## Documentation Updates (docs/)
  - Architecture documentation for Temporal
  - Workflow concept documentation
  - Resource management documentation (new)
  - Debugging guide (new)
  - Updated tutorials and how-to guides
  - Troubleshooting updates

  ## README Updates
  - Main README with Temporal instructions
  - Backend README
  - CLI README
  - SDK README

  ## Other
  - Updated IMPLEMENTATION_STATUS.md
  - Removed old vulnerable_app.tar.gz

  These changes complete the Temporal migration and ensure the CLI/SDK work correctly with the new backend.

* fix: Use positional args instead of kwargs for Temporal workflows

  The Temporal Python SDK's start_workflow() method doesn't accept a 'kwargs' parameter. Workflows must receive parameters as positional arguments via the 'args' parameter.

  Changed from: args=workflow_args # Positional arguments

  This fixes the error: TypeError: Client.start_workflow() got an unexpected keyword argument 'kwargs'

  Workflows now correctly receive parameters in order:
  - security_assessment: [target_id, scanner_config, analyzer_config, reporter_config]
  - atheris_fuzzing: [target_id, target_file, max_iterations, timeout_seconds]
  - rust_test: [target_id, test_message]

* fix: Filter metadata-only parameters from workflow arguments

  SecurityAssessmentWorkflow was receiving 7 arguments instead of 2-5. The issue was that target_path and volume_mode from default_parameters were being passed to the workflow, when they should only be used by the system for configuration.

  Now filters out metadata-only parameters (target_path, volume_mode) before passing arguments to workflow execution.

* refactor: Remove Prefect leftovers and volume mounting legacy

  Complete cleanup of Prefect migration artifacts:

  Backend:
  - Delete registry.py and workflow_discovery.py (Prefect-specific files)
  - Remove Docker validation from setup.py (no longer needed)
  - Remove ResourceLimits and VolumeMount models
  - Remove target_path and volume_mode from WorkflowSubmission
  - Remove supported_volume_modes from API and discovery
  - Clean up metadata.yaml files (remove volume/path fields)
  - Simplify parameter filtering in manager.py

  SDK:
  - Remove volume_mode parameter from client methods
  - Remove ResourceLimits and VolumeMount models
  - Remove Prefect error patterns from docker_logs.py
  - Clean up WorkflowSubmission and WorkflowMetadata models

  CLI:
  - Remove Volume Modes display from workflow info

  All removed features are Prefect-specific or Docker volume mounting artifacts. Temporal workflows use MinIO storage exclusively.

* feat: Add comprehensive test suite and benchmark infrastructure

  - Add 68 unit tests for fuzzer, scanner, and analyzer modules
  - Implement pytest-based test infrastructure with fixtures
  - Add 6 performance benchmarks with category-specific thresholds
  - Configure GitHub Actions for automated testing and benchmarking
  - Add test and benchmark documentation

  Test coverage:
  - AtherisFuzzer: 8 tests
  - CargoFuzzer: 14 tests
  - FileScanner: 22 tests
  - SecurityAnalyzer: 24 tests

  All tests passing (68/68)
  All benchmarks passing (6/6)

* fix: Resolve all ruff linting violations across codebase

  Fixed 27 ruff violations in 12 files:
  - Removed unused imports (Depends, Dict, Any, Optional, etc.)
  - Fixed undefined workflow_info variable in workflows.py
  - Removed dead code with undefined variables in atheris_fuzzer.py
  - Changed f-string to regular string where no placeholders used

  All files now pass ruff checks for CI/CD compliance.

* fix: Configure CI for unit tests only

  - Renamed docker-compose.temporal.yaml → docker-compose.yml for CI compatibility
  - Commented out integration-tests job (no integration tests yet)
  - Updated test-summary to only depend on lint and unit-tests

  CI will now run successfully with 68 unit tests. Integration tests can be added later.

* feat: Add CI/CD integration with ephemeral deployment model

  Implements comprehensive CI/CD support for FuzzForge with on-demand worker management:

  **Worker Management (v0.7.0)**
  - Add WorkerManager for automatic worker lifecycle control
  - Auto-start workers from stopped state when workflows execute
  - Auto-stop workers after workflow completion
  - Health checks and startup timeout handling (90s default)

  **CI/CD Features**
  - `--fail-on` flag: Fail builds based on SARIF severity levels (error/warning/note/info)
  - `--export-sarif` flag: Export findings in SARIF 2.1.0 format
  - `--auto-start`/`--auto-stop` flags: Control worker lifecycle
  - Exit code propagation: Returns 1 on blocking findings, 0 on success

  **Exit Code Fix**
  - Add `except typer.Exit: raise` handlers at 3 critical locations
  - Move worker cleanup to finally block for guaranteed execution
  - Exit codes now propagate correctly even when build fails

  **CI Scripts & Examples**
  - ci-start.sh: Start FuzzForge services with health checks
  - ci-stop.sh: Clean shutdown with volume preservation option
  - GitHub Actions workflow example (security-scan.yml)
  - GitLab CI pipeline example (.gitlab-ci.example.yml)
  - docker-compose.ci.yml: CI-optimized compose file with profiles

  **OSS-Fuzz Integration**
  - New ossfuzz_campaign workflow for running OSS-Fuzz projects
  - OSS-Fuzz worker with Docker-in-Docker support
  - Configurable campaign duration and project selection

  **Documentation**
  - Comprehensive CI/CD integration guide (docs/how-to/cicd-integration.md)
  - Updated architecture docs with worker lifecycle details
  - Updated workspace isolation documentation
  - CLI README with worker management examples

  **SDK Enhancements**
  - Add get_workflow_worker_info() endpoint
  - Worker vertical metadata in workflow responses

  **Testing**
  - All workflows tested: security_assessment, atheris_fuzzing, secret_detection, cargo_fuzzing
  - All monitoring commands tested: stats, crashes, status, finding
  - Full CI pipeline simulation verified
  - Exit codes verified for success/failure scenarios

  Ephemeral CI/CD model: ~3-4GB RAM, ~60-90s startup, runs entirely in CI containers.

* fix: Resolve ruff linting violations in CI/CD code

  - Remove unused variables (run_id, defaults, result)
  - Remove unused imports
  - Fix f-string without placeholders

  All CI/CD integration files now pass ruff checks.
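The `--fail-on` behaviour described above (return 1 on blocking findings, 0 on success) can be sketched as follows. This is a minimal illustration, not the CLI's actual code: the function name `exit_code_for`, the severity ordering, and the "at or above the threshold" interpretation are all assumptions.

```python
from typing import Iterable

# Assumed ordering of the levels named in the commit message, least to most severe.
LEVEL_ORDER = ["info", "note", "warning", "error"]


def exit_code_for(levels: Iterable[str], fail_on: str) -> int:
    """Return 1 if any finding level is at or above `fail_on`, else 0.

    Hypothetical helper: the real CLI may match levels differently.
    """
    threshold = LEVEL_ORDER.index(fail_on)
    for level in levels:
        # Unknown levels are ignored rather than treated as blocking.
        if level in LEVEL_ORDER and LEVEL_ORDER.index(level) >= threshold:
            return 1
    return 0
```

In a CI job, the returned value would be passed to the process exit so that the pipeline step fails only when a finding reaches the chosen level.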
@@ -0,0 +1,9 @@
"""
Atheris Fuzzing Workflow

Fuzzes user-provided Python code using Atheris.
"""

from .workflow import AtherisFuzzingWorkflow

__all__ = ["AtherisFuzzingWorkflow"]
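The commit message says the AtherisFuzzer module auto-discovers fuzz targets named `fuzz_*.py`, `*_fuzz.py`, or `fuzz_target.py`, searching nested directories. The module itself is not part of this diff, so the following is only a sketch of how such a recursive search could look; `discover_fuzz_targets` and `TARGET_PATTERNS` are hypothetical names.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import List

# File-name patterns taken from the commit message.
TARGET_PATTERNS = ("fuzz_*.py", "*_fuzz.py", "fuzz_target.py")


def discover_fuzz_targets(workspace: Path) -> List[Path]:
    """Recursively collect Python files whose names match a target pattern."""
    matches = [
        path
        for path in workspace.rglob("*.py")
        if any(fnmatch(path.name, pattern) for pattern in TARGET_PATTERNS)
    ]
    # Sort for a deterministic order across runs and platforms.
    return sorted(matches)
```

A caller would typically take the first match when no `target_file` is given, though which match the real module prefers is not stated in the source.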
@@ -0,0 +1,122 @@
"""
Atheris Fuzzing Workflow Activities

Activities specific to the Atheris fuzzing workflow.
"""

import logging
import sys
from datetime import datetime
from pathlib import Path
from typing import Dict, Any
import os

import httpx
from temporalio import activity

# Configure logging
logger = logging.getLogger(__name__)

# Add toolbox to path for module imports
sys.path.insert(0, '/app/toolbox')


@activity.defn(name="fuzz_with_atheris")
async def fuzz_activity(workspace_path: str, config: dict) -> dict:
    """
    Fuzzing activity using the AtherisFuzzer module on user code.

    This activity:
    1. Imports the reusable AtherisFuzzer module
    2. Sets up real-time stats callback
    3. Executes fuzzing on user's TestOneInput() function
    4. Returns findings as ModuleResult

    Args:
        workspace_path: Path to the workspace directory (user's uploaded code)
        config: Fuzzer configuration (target_file, max_iterations, timeout_seconds)

    Returns:
        Fuzzer results dictionary (findings, summary, metadata)
    """
    logger.info(f"Activity: fuzz_with_atheris (workspace={workspace_path})")

    try:
        # Import reusable AtherisFuzzer module
        from modules.fuzzer import AtherisFuzzer

        workspace = Path(workspace_path)
        if not workspace.exists():
            raise FileNotFoundError(f"Workspace not found: {workspace_path}")

        # Get activity info for real-time stats
        info = activity.info()
        run_id = info.workflow_id

        # Define stats callback for real-time monitoring
        async def stats_callback(stats_data: Dict[str, Any]):
            """Callback for live fuzzing statistics"""
            try:
                # Prepare stats payload for backend
                coverage_value = stats_data.get("coverage", 0)
                logger.info(f"COVERAGE_DEBUG: coverage from stats_data = {coverage_value}")

                stats_payload = {
                    "run_id": run_id,
                    "workflow": "atheris_fuzzing",
                    "executions": stats_data.get("total_execs", 0),
                    "executions_per_sec": stats_data.get("execs_per_sec", 0.0),
                    "crashes": stats_data.get("crashes", 0),
                    "unique_crashes": stats_data.get("crashes", 0),
                    "coverage": coverage_value,
                    "corpus_size": stats_data.get("corpus_size", 0),
                    "elapsed_time": stats_data.get("elapsed_time", 0),
                    "last_crash_time": None
                }

                # POST stats to backend API for real-time monitoring
                backend_url = os.getenv("BACKEND_URL", "http://backend:8000")
                async with httpx.AsyncClient(timeout=5.0) as client:
                    try:
                        await client.post(
                            f"{backend_url}/fuzzing/{run_id}/stats",
                            json=stats_payload
                        )
                    except Exception as http_err:
                        logger.debug(f"Failed to post stats to backend: {http_err}")

                # Also log for debugging
                logger.info("LIVE_STATS", extra={
                    "stats_type": "fuzzing_live_update",
                    "workflow_type": "atheris_fuzzing",
                    "run_id": run_id,
                    "executions": stats_data.get("total_execs", 0),
                    "executions_per_sec": stats_data.get("execs_per_sec", 0.0),
                    "crashes": stats_data.get("crashes", 0),
                    "corpus_size": stats_data.get("corpus_size", 0),
                    "coverage": stats_data.get("coverage", 0.0),
                    "elapsed_time": stats_data.get("elapsed_time", 0),
                    "timestamp": datetime.utcnow().isoformat()
                })
            except Exception as e:
                logger.warning(f"Error in stats callback: {e}")

        # Add stats callback and run_id to config
        config["stats_callback"] = stats_callback
        config["run_id"] = run_id

        # Execute the fuzzer module
        fuzzer = AtherisFuzzer()
        result = await fuzzer.execute(config, workspace)

        logger.info(
            f"✓ Fuzzing completed: "
            f"{result.summary.get('total_executions', 0)} executions, "
            f"{result.summary.get('crashes_found', 0)} crashes"
        )

        return result.dict()

    except Exception as e:
        logger.error(f"Fuzzing failed: {e}", exc_info=True)
        raise
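The stats callback above maps the fuzzer's raw counters onto the payload that is posted to the backend. As a hedged sketch, that mapping can be isolated into a pure function so it is testable without a running worker or HTTP client. The name `build_stats_payload` is hypothetical and does not appear in the commit; the field names are copied from the activity.

```python
from typing import Any, Dict


def build_stats_payload(run_id: str, workflow: str, stats_data: Dict[str, Any]) -> Dict[str, Any]:
    """Translate raw fuzzer stats into the payload shape used by the activity.

    Missing counters fall back to zero, mirroring the .get() defaults above.
    """
    crashes = stats_data.get("crashes", 0)
    return {
        "run_id": run_id,
        "workflow": workflow,
        "executions": stats_data.get("total_execs", 0),
        "executions_per_sec": stats_data.get("execs_per_sec", 0.0),
        "crashes": crashes,
        # The activity reports the same counter for both fields.
        "unique_crashes": crashes,
        "coverage": stats_data.get("coverage", 0),
        "corpus_size": stats_data.get("corpus_size", 0),
        "elapsed_time": stats_data.get("elapsed_time", 0),
        "last_crash_time": None,
    }
```

Separating the mapping this way would also let the Atheris and cargo-fuzz activities share one implementation, since their payloads differ only in the `workflow` value.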
@@ -0,0 +1,65 @@
name: atheris_fuzzing
version: "1.0.0"
vertical: python
description: "Fuzz Python code using Atheris with real-time monitoring. Automatically discovers and fuzzes TestOneInput() functions in user code."
author: "FuzzForge Team"
tags:
  - "fuzzing"
  - "atheris"
  - "python"
  - "coverage"
  - "security"

# Workspace isolation mode (system-level configuration)
# - "isolated" (default): Each workflow run gets its own isolated workspace (safe for concurrent fuzzing)
# - "shared": All runs share the same workspace (for read-only analysis workflows)
# - "copy-on-write": Download once, copy for each run (balances performance and isolation)
workspace_isolation: "isolated"

default_parameters:
  target_file: null
  max_iterations: 1000000
  timeout_seconds: 1800

parameters:
  type: object
  properties:
    target_file:
      type: string
      description: "Python file with TestOneInput() function (auto-discovered if not specified)"
    max_iterations:
      type: integer
      default: 1000000
      description: "Maximum fuzzing iterations"
    timeout_seconds:
      type: integer
      default: 1800
      description: "Fuzzing timeout in seconds (30 minutes)"

output_schema:
  type: object
  properties:
    findings:
      type: array
      description: "Crashes and vulnerabilities found during fuzzing"
      items:
        type: object
        properties:
          title:
            type: string
          severity:
            type: string
          category:
            type: string
          metadata:
            type: object
    summary:
      type: object
      description: "Fuzzing execution summary"
      properties:
        total_executions:
          type: integer
        crashes_found:
          type: integer
        execution_time:
          type: number
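The commit message states that defaults from `default_parameters` are merged with user parameters and passed to the workflow positionally, in the order `[target_id, target_file, max_iterations, timeout_seconds]`. A minimal sketch of that merge is shown here; `build_workflow_args`, `DEFAULT_PARAMETERS`, and `PARAMETER_ORDER` are hypothetical names, and dropping unknown keys is an assumption modelled on the "metadata-only parameters" filter described in the message.

```python
from typing import Any, Dict, List, Optional

# Values copied from default_parameters in metadata.yaml.
DEFAULT_PARAMETERS: Dict[str, Any] = {
    "target_file": None,
    "max_iterations": 1000000,
    "timeout_seconds": 1800,
}

# Positional order documented in the commit message for atheris_fuzzing.
PARAMETER_ORDER = ["target_file", "max_iterations", "timeout_seconds"]


def build_workflow_args(target_id: str, user_params: Optional[Dict[str, Any]] = None) -> List[Any]:
    """Merge user parameters over defaults and emit positional workflow args."""
    merged = {**DEFAULT_PARAMETERS, **(user_params or {})}
    # Keys outside PARAMETER_ORDER (e.g. legacy metadata-only fields) are dropped.
    return [target_id] + [merged[name] for name in PARAMETER_ORDER]
```

For example, `build_workflow_args("abc", {"max_iterations": 5})` yields `["abc", None, 5, 1800]`, a list suitable for the `args=` parameter of a workflow start call.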
@@ -0,0 +1,175 @@
"""
Atheris Fuzzing Workflow - Temporal Version

Fuzzes user-provided Python code using Atheris with real-time monitoring.
"""

from datetime import timedelta
from typing import Dict, Any, Optional

from temporalio import workflow
from temporalio.common import RetryPolicy

# Import for type hints (will be executed by worker)
with workflow.unsafe.imports_passed_through():
    import logging

logger = logging.getLogger(__name__)


@workflow.defn
class AtherisFuzzingWorkflow:
    """
    Fuzz Python code using Atheris.

    User workflow:
    1. User runs: ff workflow run atheris_fuzzing .
    2. CLI uploads project to MinIO
    3. Worker downloads project
    4. Worker fuzzes TestOneInput() function
    5. Crashes reported as findings
    """

    @workflow.run
    async def run(
        self,
        target_id: str,  # MinIO UUID of uploaded user code
        target_file: Optional[str] = None,  # Optional: specific file to fuzz
        max_iterations: int = 1000000,
        timeout_seconds: int = 1800  # 30 minutes default for fuzzing
    ) -> Dict[str, Any]:
        """
        Main workflow execution.

        Args:
            target_id: UUID of the uploaded target in MinIO
            target_file: Optional specific Python file with TestOneInput() (auto-discovered if None)
            max_iterations: Maximum fuzzing iterations
            timeout_seconds: Fuzzing timeout in seconds

        Returns:
            Dictionary containing findings and summary
        """
        workflow_id = workflow.info().workflow_id

        workflow.logger.info(
            f"Starting AtherisFuzzingWorkflow "
            f"(workflow_id={workflow_id}, target_id={target_id}, "
            f"target_file={target_file or 'auto-discover'}, max_iterations={max_iterations}, "
            f"timeout_seconds={timeout_seconds})"
        )

        results = {
            "workflow_id": workflow_id,
            "target_id": target_id,
            "status": "running",
            "steps": []
        }

        try:
            # Get run ID for workspace isolation
            run_id = workflow.info().run_id

            # Step 1: Download user's project from MinIO
            workflow.logger.info("Step 1: Downloading user code from MinIO")
            target_path = await workflow.execute_activity(
                "get_target",
                args=[target_id, run_id, "isolated"],  # target_id, run_id, workspace_isolation
                start_to_close_timeout=timedelta(minutes=5),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=1),
                    maximum_interval=timedelta(seconds=30),
                    maximum_attempts=3
                )
            )
            results["steps"].append({
                "step": "download_target",
                "status": "success",
                "target_path": target_path
            })
            workflow.logger.info(f"✓ User code downloaded to: {target_path}")

            # Step 2: Run Atheris fuzzing
            workflow.logger.info("Step 2: Running Atheris fuzzing")

            # Use defaults if parameters are None
            actual_max_iterations = max_iterations if max_iterations is not None else 1000000
            actual_timeout_seconds = timeout_seconds if timeout_seconds is not None else 1800

            fuzz_config = {
                "target_file": target_file,
                "max_iterations": actual_max_iterations,
                "timeout_seconds": actual_timeout_seconds
            }

            fuzz_results = await workflow.execute_activity(
                "fuzz_with_atheris",
                args=[target_path, fuzz_config],
                start_to_close_timeout=timedelta(seconds=actual_timeout_seconds + 60),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=2),
                    maximum_interval=timedelta(seconds=60),
                    maximum_attempts=1  # Fuzzing shouldn't retry
                )
            )

            results["steps"].append({
                "step": "fuzzing",
                "status": "success",
                "executions": fuzz_results.get("summary", {}).get("total_executions", 0),
                "crashes": fuzz_results.get("summary", {}).get("crashes_found", 0)
            })
            workflow.logger.info(
                f"✓ Fuzzing completed: "
                f"{fuzz_results.get('summary', {}).get('total_executions', 0)} executions, "
                f"{fuzz_results.get('summary', {}).get('crashes_found', 0)} crashes"
            )

            # Step 3: Upload results to MinIO
            workflow.logger.info("Step 3: Uploading results")
            try:
                results_url = await workflow.execute_activity(
                    "upload_results",
                    args=[workflow_id, fuzz_results, "json"],
                    start_to_close_timeout=timedelta(minutes=2)
                )
                results["results_url"] = results_url
                workflow.logger.info(f"✓ Results uploaded to: {results_url}")
            except Exception as e:
                workflow.logger.warning(f"Failed to upload results: {e}")
                results["results_url"] = None

            # Step 4: Cleanup cache
            workflow.logger.info("Step 4: Cleaning up cache")
            try:
                await workflow.execute_activity(
                    "cleanup_cache",
                    args=[target_path, "isolated"],  # target_path, workspace_isolation
                    start_to_close_timeout=timedelta(minutes=1)
                )
                workflow.logger.info("✓ Cache cleaned up")
            except Exception as e:
                workflow.logger.warning(f"Cache cleanup failed: {e}")

            # Mark workflow as successful
            results["status"] = "success"
            results["findings"] = fuzz_results.get("findings", [])
            results["summary"] = fuzz_results.get("summary", {})
            results["sarif"] = fuzz_results.get("sarif") or {}
            workflow.logger.info(
                f"✓ Workflow completed successfully: {workflow_id} "
                f"({results['summary'].get('crashes_found', 0)} crashes found)"
            )

            return results

        except Exception as e:
            workflow.logger.error(f"Workflow failed: {e}")
            results["status"] = "error"
            results["error"] = str(e)
            results["steps"].append({
                "step": "error",
                "status": "failed",
                "error": str(e)
            })
            raise
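Step 2 of the workflow above substitutes defaults for `None` parameters and gives the fuzzing activity a start-to-close timeout of the fuzzing timeout plus 60 seconds. The sketch below isolates that arithmetic. `plan_fuzz_activity` and `grace_seconds` are hypothetical names introduced for illustration; the default values and the 60-second margin are taken from the workflow code.

```python
from datetime import timedelta
from typing import Any, Dict, Optional, Tuple


def plan_fuzz_activity(
    target_file: Optional[str],
    max_iterations: Optional[int],
    timeout_seconds: Optional[int],
    grace_seconds: int = 60,
) -> Tuple[Dict[str, Any], timedelta]:
    """Build the fuzzer config and the activity timeout that wraps it."""
    # Fall back to the workflow defaults when a parameter arrives as None.
    actual_max_iterations = max_iterations if max_iterations is not None else 1000000
    actual_timeout_seconds = timeout_seconds if timeout_seconds is not None else 1800
    config = {
        "target_file": target_file,
        "max_iterations": actual_max_iterations,
        "timeout_seconds": actual_timeout_seconds,
    }
    # The activity gets extra time beyond the fuzzer's own timeout so the
    # fuzzer can stop and report results before the activity is cancelled.
    return config, timedelta(seconds=actual_timeout_seconds + grace_seconds)
```

With all parameters left as `None`, the activity timeout works out to 1860 seconds, that is 31 minutes for a 30-minute fuzzing run.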
@@ -0,0 +1,5 @@
"""Cargo Fuzzing Workflow"""

from .workflow import CargoFuzzingWorkflow

__all__ = ["CargoFuzzingWorkflow"]
@@ -0,0 +1,203 @@
"""
Cargo Fuzzing Workflow Activities

Activities specific to the cargo-fuzz fuzzing workflow.
"""

import logging
import sys
from datetime import datetime
from pathlib import Path
from typing import Dict, Any
import os

import httpx
from temporalio import activity

# Configure logging
logger = logging.getLogger(__name__)

# Add toolbox to path for module imports
sys.path.insert(0, '/app/toolbox')


@activity.defn(name="fuzz_with_cargo")
async def fuzz_activity(workspace_path: str, config: dict) -> dict:
    """
    Fuzzing activity using the CargoFuzzer module on user code.

    This activity:
    1. Imports the reusable CargoFuzzer module
    2. Sets up real-time stats callback
    3. Executes fuzzing on user's fuzz_target!() functions
    4. Returns findings as ModuleResult

    Args:
        workspace_path: Path to the workspace directory (user's uploaded Rust project)
        config: Fuzzer configuration (target_name, max_iterations, timeout_seconds, sanitizer)

    Returns:
        Fuzzer results dictionary (findings, summary, metadata)
    """
    logger.info(f"Activity: fuzz_with_cargo (workspace={workspace_path})")

    try:
        # Import reusable CargoFuzzer module
        from modules.fuzzer import CargoFuzzer

        workspace = Path(workspace_path)
        if not workspace.exists():
            raise FileNotFoundError(f"Workspace not found: {workspace_path}")

        # Get activity info for real-time stats
        info = activity.info()
        run_id = info.workflow_id

        # Define stats callback for real-time monitoring
        async def stats_callback(stats_data: Dict[str, Any]):
            """Callback for live fuzzing statistics"""
            try:
                # Prepare stats payload for backend
                coverage_value = stats_data.get("coverage", 0)

                stats_payload = {
                    "run_id": run_id,
                    "workflow": "cargo_fuzzing",
                    "executions": stats_data.get("total_execs", 0),
                    "executions_per_sec": stats_data.get("execs_per_sec", 0.0),
                    "crashes": stats_data.get("crashes", 0),
                    "unique_crashes": stats_data.get("crashes", 0),
                    "coverage": coverage_value,
                    "corpus_size": stats_data.get("corpus_size", 0),
                    "elapsed_time": stats_data.get("elapsed_time", 0),
                    "last_crash_time": None
                }

                # POST stats to backend API for real-time monitoring
                backend_url = os.getenv("BACKEND_URL", "http://backend:8000")
                async with httpx.AsyncClient(timeout=5.0) as client:
                    try:
                        await client.post(
                            f"{backend_url}/fuzzing/{run_id}/stats",
                            json=stats_payload
                        )
                    except Exception as http_err:
                        logger.debug(f"Failed to post stats to backend: {http_err}")

                # Also log for debugging
                logger.info("LIVE_STATS", extra={
                    "stats_type": "fuzzing_live_update",
                    "workflow_type": "cargo_fuzzing",
                    "run_id": run_id,
                    "executions": stats_data.get("total_execs", 0),
                    "executions_per_sec": stats_data.get("execs_per_sec", 0.0),
                    "crashes": stats_data.get("crashes", 0),
                    "corpus_size": stats_data.get("corpus_size", 0),
                    "coverage": stats_data.get("coverage", 0.0),
                    "elapsed_time": stats_data.get("elapsed_time", 0),
                    "timestamp": datetime.utcnow().isoformat()
                })

            except Exception as e:
                logger.error(f"Stats callback error: {e}")

        # Initialize CargoFuzzer module
        fuzzer = CargoFuzzer()

        # Execute fuzzing with stats callback
        module_result = await fuzzer.execute(
            config=config,
            workspace=workspace,
            stats_callback=stats_callback
        )

        # Convert ModuleResult to dictionary
        result_dict = {
            "findings": [],
            "summary": module_result.summary,
            "metadata": module_result.metadata,
            "status": module_result.status,
            "error": module_result.error
        }

        # Convert findings to dict format
        for finding in module_result.findings:
            finding_dict = {
                "id": finding.id,
                "title": finding.title,
                "description": finding.description,
                "severity": finding.severity,
                "category": finding.category,
                "file_path": finding.file_path,
                "line_start": finding.line_start,
                "line_end": finding.line_end,
                "code_snippet": finding.code_snippet,
                "recommendation": finding.recommendation,
                "metadata": finding.metadata
            }
            result_dict["findings"].append(finding_dict)

        # Generate SARIF report from findings
        if module_result.findings:
            # Convert findings to SARIF format
            severity_map = {
                "critical": "error",
                "high": "error",
                "medium": "warning",
                "low": "note",
                "info": "note"
            }

            results = []
            for finding in module_result.findings:
                result = {
                    "ruleId": finding.metadata.get("rule_id", finding.category),
                    "level": severity_map.get(finding.severity, "warning"),
                    "message": {"text": finding.description},
                    "locations": []
                }

                if finding.file_path:
                    location = {
                        "physicalLocation": {
                            "artifactLocation": {"uri": finding.file_path},
                            "region": {
                                "startLine": finding.line_start or 1,
                                "endLine": finding.line_end or finding.line_start or 1
                            }
                        }
                    }
                    result["locations"].append(location)

                results.append(result)

            result_dict["sarif"] = {
                "version": "2.1.0",
                "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
                "runs": [{
                    "tool": {
                        "driver": {
                            "name": "cargo-fuzz",
                            "version": "0.11.2"
                        }
                    },
                    "results": results
                }]
            }
        else:
            result_dict["sarif"] = {
                "version": "2.1.0",
                "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
                "runs": []
            }

        logger.info(
            f"Fuzzing activity completed: {len(module_result.findings)} crashes found, "
            f"{module_result.summary.get('total_executions', 0)} executions"
        )

        return result_dict

    except Exception as e:
        logger.error(f"Fuzzing activity failed: {e}", exc_info=True)
        raise
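The SARIF conversion in the activity above maps each finding's severity to a SARIF level and attaches a location only when a file path is known. The following sketch reproduces that mapping as a standalone function. The `Finding` dataclass is a hypothetical stand-in for the project's finding model, which is not shown in this diff; the severity map and fallbacks are copied from the activity.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Severity-to-level mapping copied from the activity.
SEVERITY_MAP = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
    "info": "note",
}


@dataclass
class Finding:
    """Hypothetical stand-in carrying only the fields the conversion reads."""
    severity: str
    category: str
    description: str
    file_path: Optional[str] = None
    line_start: Optional[int] = None
    line_end: Optional[int] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def finding_to_sarif_result(finding: Finding) -> Dict[str, Any]:
    """Convert one finding into a SARIF 2.1.0 result object."""
    result: Dict[str, Any] = {
        "ruleId": finding.metadata.get("rule_id", finding.category),
        # Unknown severities fall back to "warning", as in the activity.
        "level": SEVERITY_MAP.get(finding.severity, "warning"),
        "message": {"text": finding.description},
        "locations": [],
    }
    if finding.file_path:
        result["locations"].append({
            "physicalLocation": {
                "artifactLocation": {"uri": finding.file_path},
                "region": {
                    "startLine": finding.line_start or 1,
                    "endLine": finding.line_end or finding.line_start or 1,
                },
            }
        })
    return result
```

Note that when `line_end` is missing the region collapses to a single line, and when both line numbers are missing it defaults to line 1, so every emitted region stays valid.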
@@ -0,0 +1,71 @@
name: cargo_fuzzing
version: "1.0.0"
vertical: rust
description: "Fuzz Rust code using cargo-fuzz with real-time monitoring. Automatically discovers and fuzzes fuzz_target!() functions in user code."
author: "FuzzForge Team"
tags:
  - "fuzzing"
  - "cargo-fuzz"
  - "rust"
  - "libfuzzer"
  - "memory-safety"

# Workspace isolation mode (system-level configuration)
# - "isolated" (default): Each workflow run gets its own isolated workspace (safe for concurrent fuzzing)
# - "shared": All runs share the same workspace (for read-only analysis workflows)
# - "copy-on-write": Download once, copy for each run (balances performance and isolation)
workspace_isolation: "isolated"

default_parameters:
  target_name: null
  max_iterations: 1000000
  timeout_seconds: 1800
  sanitizer: "address"

parameters:
  type: object
  properties:
    target_name:
      type: string
      description: "Fuzz target name from fuzz/fuzz_targets/ (auto-discovered if not specified)"
    max_iterations:
      type: integer
      default: 1000000
      description: "Maximum fuzzing iterations"
    timeout_seconds:
      type: integer
      default: 1800
      description: "Fuzzing timeout in seconds (30 minutes)"
    sanitizer:
      type: string
      enum: ["address", "memory", "undefined"]
      default: "address"
      description: "Sanitizer to use (address, memory, undefined)"

output_schema:
  type: object
  properties:
    findings:
      type: array
      description: "Crashes and memory safety issues found during fuzzing"
      items:
        type: object
        properties:
          title:
            type: string
          severity:
            type: string
          category:
            type: string
          metadata:
            type: object
    summary:
      type: object
      description: "Fuzzing execution summary"
      properties:
        total_executions:
          type: integer
        crashes_found:
          type: integer
        execution_time:
          type: number
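The `default_parameters` and `parameters` sections above imply how a caller's input could be resolved before a run. The following is a minimal sketch of that idea, not FuzzForge's actual loader: `resolve_parameters` is a hypothetical helper, and the constants are copied by hand from the metadata rather than parsed from the YAML file.

```python
from typing import Any, Dict, Optional

# Copied by hand from default_parameters in the metadata above (assumption:
# the real platform reads these from the YAML file instead).
DEFAULT_PARAMETERS: Dict[str, Any] = {
    "target_name": None,
    "max_iterations": 1_000_000,
    "timeout_seconds": 1800,
    "sanitizer": "address",
}

# Mirrors the enum declared for the sanitizer property.
ALLOWED_SANITIZERS = ("address", "memory", "undefined")


def resolve_parameters(user_params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: overlay user values on the metadata defaults."""
    params = dict(DEFAULT_PARAMETERS)
    for key, value in (user_params or {}).items():
        if key not in DEFAULT_PARAMETERS:
            raise ValueError(f"unknown parameter: {key!r}")
        if value is not None:  # None means "keep the default"
            params[key] = value

    if params["sanitizer"] not in ALLOWED_SANITIZERS:
        raise ValueError(f"unsupported sanitizer: {params['sanitizer']!r}")
    for key in ("max_iterations", "timeout_seconds"):
        value = params[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{key} must be an integer")
    return params
```

Treating `None` as "keep the default" matches the way the workflow below falls back to its own defaults when a parameter is `None`.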
@@ -0,0 +1,180 @@
"""
Cargo Fuzzing Workflow - Temporal Version

Fuzzes user-provided Rust code using cargo-fuzz with real-time monitoring.
"""

from datetime import timedelta
from typing import Dict, Any, Optional

from temporalio import workflow
from temporalio.common import RetryPolicy

# Import for type hints (will be executed by worker)
with workflow.unsafe.imports_passed_through():
    import logging

logger = logging.getLogger(__name__)


@workflow.defn
class CargoFuzzingWorkflow:
    """
    Fuzz Rust code using cargo-fuzz (libFuzzer).

    User workflow:
    1. User runs: ff workflow run cargo_fuzzing .
    2. CLI uploads Rust project to MinIO
    3. Worker downloads project
    4. Worker discovers fuzz targets in fuzz/fuzz_targets/
    5. Worker fuzzes the target with cargo-fuzz
    6. Crashes reported as findings
    """

    @workflow.run
    async def run(
        self,
        target_id: str,  # MinIO UUID of uploaded user code
        target_name: Optional[str] = None,  # Optional: specific fuzz target name
        max_iterations: int = 1000000,
        timeout_seconds: int = 1800,  # 30 minutes default for fuzzing
        sanitizer: str = "address"
    ) -> Dict[str, Any]:
        """
        Main workflow execution.

        Args:
            target_id: UUID of the uploaded target in MinIO
            target_name: Optional specific fuzz target name (auto-discovered if None)
            max_iterations: Maximum fuzzing iterations
            timeout_seconds: Fuzzing timeout in seconds
            sanitizer: Sanitizer to use (address, memory, undefined)

        Returns:
            Dictionary containing findings and summary
        """
        workflow_id = workflow.info().workflow_id

        workflow.logger.info(
            f"Starting CargoFuzzingWorkflow "
            f"(workflow_id={workflow_id}, target_id={target_id}, "
            f"target_name={target_name or 'auto-discover'}, max_iterations={max_iterations}, "
            f"timeout_seconds={timeout_seconds}, sanitizer={sanitizer})"
        )

        results = {
            "workflow_id": workflow_id,
            "target_id": target_id,
            "status": "running",
            "steps": []
        }

        try:
            # Get run ID for workspace isolation
            run_id = workflow.info().run_id

            # Step 1: Download user's Rust project from MinIO
            workflow.logger.info("Step 1: Downloading user code from MinIO")
            target_path = await workflow.execute_activity(
                "get_target",
                args=[target_id, run_id, "isolated"],  # target_id, run_id, workspace_isolation
                start_to_close_timeout=timedelta(minutes=5),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=1),
                    maximum_interval=timedelta(seconds=30),
                    maximum_attempts=3
                )
            )
            results["steps"].append({
                "step": "download_target",
                "status": "success",
                "target_path": target_path
            })
            workflow.logger.info(f"✓ User code downloaded to: {target_path}")

            # Step 2: Run cargo-fuzz
            workflow.logger.info("Step 2: Running cargo-fuzz")

            # Use defaults if parameters are None
            actual_max_iterations = max_iterations if max_iterations is not None else 1000000
            actual_timeout_seconds = timeout_seconds if timeout_seconds is not None else 1800
            actual_sanitizer = sanitizer if sanitizer is not None else "address"

            fuzz_config = {
                "target_name": target_name,
                "max_iterations": actual_max_iterations,
                "timeout_seconds": actual_timeout_seconds,
                "sanitizer": actual_sanitizer
            }

            fuzz_results = await workflow.execute_activity(
                "fuzz_with_cargo",
                args=[target_path, fuzz_config],
                start_to_close_timeout=timedelta(seconds=actual_timeout_seconds + 120),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=2),
                    maximum_interval=timedelta(seconds=60),
                    maximum_attempts=1  # Fuzzing shouldn't retry
                )
            )

            results["steps"].append({
                "step": "fuzzing",
                "status": "success",
                "executions": fuzz_results.get("summary", {}).get("total_executions", 0),
                "crashes": fuzz_results.get("summary", {}).get("crashes_found", 0)
            })
            workflow.logger.info(
                f"✓ Fuzzing completed: "
                f"{fuzz_results.get('summary', {}).get('total_executions', 0)} executions, "
                f"{fuzz_results.get('summary', {}).get('crashes_found', 0)} crashes"
            )

            # Step 3: Upload results to MinIO
            workflow.logger.info("Step 3: Uploading results")
            try:
                results_url = await workflow.execute_activity(
                    "upload_results",
                    args=[workflow_id, fuzz_results, "json"],
                    start_to_close_timeout=timedelta(minutes=2)
                )
                results["results_url"] = results_url
                workflow.logger.info(f"✓ Results uploaded to: {results_url}")
            except Exception as e:
                workflow.logger.warning(f"Failed to upload results: {e}")
                results["results_url"] = None

            # Step 4: Cleanup cache
            workflow.logger.info("Step 4: Cleaning up cache")
            try:
                await workflow.execute_activity(
                    "cleanup_cache",
                    args=[target_path, "isolated"],  # target_path, workspace_isolation
                    start_to_close_timeout=timedelta(minutes=1)
                )
                workflow.logger.info("✓ Cache cleaned up")
            except Exception as e:
                workflow.logger.warning(f"Cache cleanup failed: {e}")

            # Mark workflow as successful
            results["status"] = "success"
            results["findings"] = fuzz_results.get("findings", [])
            results["summary"] = fuzz_results.get("summary", {})
            results["sarif"] = fuzz_results.get("sarif") or {}
            workflow.logger.info(
                f"✓ Workflow completed successfully: {workflow_id} "
                f"({results['summary'].get('crashes_found', 0)} crashes found)"
            )

            return results

        except Exception as e:
            workflow.logger.error(f"Workflow failed: {e}")
            results["status"] = "error"
            results["error"] = str(e)
            results["steps"].append({
                "step": "error",
                "status": "failed",
                "error": str(e)
            })
            raise
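Step 2 of the workflow above fills in defaults for `None` parameters and gives the `fuzz_with_cargo` activity a deadline of the fuzzing budget plus 120 seconds. The sketch below isolates that arithmetic so it can be checked without a Temporal server; `plan_fuzz_activity` and `GRACE_SECONDS` are hypothetical names introduced for illustration and do not appear in the workflow.

```python
from datetime import timedelta
from typing import Any, Dict, Optional, Tuple

# Headroom the workflow adds on top of the fuzzing budget (the "+ 120" in Step 2).
GRACE_SECONDS = 120


def plan_fuzz_activity(
    target_name: Optional[str] = None,
    max_iterations: Optional[int] = None,
    timeout_seconds: Optional[int] = None,
    sanitizer: Optional[str] = None,
) -> Tuple[Dict[str, Any], timedelta]:
    """Hypothetical helper mirroring Step 2: build fuzz_config and the deadline."""
    config = {
        "target_name": target_name,
        "max_iterations": max_iterations if max_iterations is not None else 1_000_000,
        "timeout_seconds": timeout_seconds if timeout_seconds is not None else 1800,
        "sanitizer": sanitizer if sanitizer is not None else "address",
    }
    # The activity deadline must outlast the fuzzer's own timeout.
    deadline = timedelta(seconds=config["timeout_seconds"] + GRACE_SECONDS)
    return config, deadline


config, deadline = plan_fuzz_activity(timeout_seconds=60)
print(config["sanitizer"], deadline.total_seconds())
```

Because the activity runs with `maximum_attempts=1`, a deadline equal to the fuzzing budget would presumably leave no time to collect results before the single attempt times out, which is a plausible reason for the extra headroom.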
@@ -1,12 +0,0 @@
|
||||
# Copyright (c) 2025 FuzzingLabs
|
||||
#
|
||||
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
|
||||
# at the root of this repository for details.
|
||||
#
|
||||
# After the Change Date (four years from publication), this version of the
|
||||
# Licensed Work will be made available under the Apache License, Version 2.0.
|
||||
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Additional attribution and requirements are provided in the NOTICE file.
|
||||
|
||||
|
||||
@@ -1,47 +0,0 @@
|
||||
# Secret Detection Workflow Dockerfile
|
||||
FROM prefecthq/prefect:3-python3.11
|
||||
|
||||
# Install system dependencies
|
||||
RUN apt-get update && apt-get install -y \
|
||||
curl \
|
||||
wget \
|
||||
git \
|
||||
ca-certificates \
|
||||
gnupg \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
# Install TruffleHog (use direct binary download to avoid install script issues)
|
||||
RUN curl -sSfL "https://github.com/trufflesecurity/trufflehog/releases/download/v3.63.2/trufflehog_3.63.2_linux_amd64.tar.gz" -o trufflehog.tar.gz \
|
||||
&& tar -xzf trufflehog.tar.gz \
|
||||
&& mv trufflehog /usr/local/bin/ \
|
||||
&& rm trufflehog.tar.gz
|
||||
|
||||
# Install Gitleaks (use specific version to avoid API rate limiting)
|
||||
RUN wget https://github.com/gitleaks/gitleaks/releases/download/v8.18.2/gitleaks_8.18.2_linux_x64.tar.gz \
|
||||
&& tar -xzf gitleaks_8.18.2_linux_x64.tar.gz \
|
||||
&& mv gitleaks /usr/local/bin/ \
|
||||
&& rm gitleaks_8.18.2_linux_x64.tar.gz
|
||||
|
||||
# Verify installations
|
||||
RUN trufflehog --version && gitleaks version
|
||||
|
||||
# Set working directory
|
||||
WORKDIR /opt/prefect
|
||||
|
||||
# Create toolbox directory structure
|
||||
RUN mkdir -p /opt/prefect/toolbox
|
||||
|
||||
# Set environment variables
|
||||
ENV PYTHONPATH=/opt/prefect/toolbox:/opt/prefect/toolbox/workflows
|
||||
ENV WORKFLOW_NAME=secret_detection_scan
|
||||
|
||||
# The toolbox code will be mounted at runtime from the backend container
|
||||
# This includes:
|
||||
# - /opt/prefect/toolbox/modules/base.py
|
||||
# - /opt/prefect/toolbox/modules/secret_detection/ (TruffleHog, Gitleaks modules)
|
||||
# - /opt/prefect/toolbox/modules/reporter/ (SARIF reporter)
|
||||
# - /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan/
|
||||
VOLUME /opt/prefect/toolbox
|
||||
|
||||
# Set working directory for execution
|
||||
WORKDIR /opt/prefect
|
||||
@@ -1,58 +0,0 @@
|
||||
# Secret Detection Workflow Dockerfile - Self-Contained Version
|
||||
# This version copies all required modules into the image for complete isolation
|
||||
FROM prefecthq/prefect:3-python3.11
|
||||
|
||||
# Install system dependencies
|
||||
RUN apt-get update && apt-get install -y \
|
||||
curl \
|
||||
wget \
|
||||
git \
|
||||
ca-certificates \
|
||||
gnupg \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
# Install TruffleHog
|
||||
RUN curl -sSfL https://raw.githubusercontent.com/trufflesecurity/trufflehog/main/scripts/install.sh | sh -s -- -b /usr/local/bin
|
||||
|
||||
# Install Gitleaks
|
||||
RUN wget https://github.com/gitleaks/gitleaks/releases/latest/download/gitleaks_linux_x64.tar.gz \
|
||||
&& tar -xzf gitleaks_linux_x64.tar.gz \
|
||||
&& mv gitleaks /usr/local/bin/ \
|
||||
&& rm gitleaks_linux_x64.tar.gz
|
||||
|
||||
# Verify installations
|
||||
RUN trufflehog --version && gitleaks version
|
||||
|
||||
# Set working directory
|
||||
WORKDIR /opt/prefect
|
||||
|
||||
# Create directory structure
|
||||
RUN mkdir -p /opt/prefect/toolbox/modules/secret_detection \
|
||||
/opt/prefect/toolbox/modules/reporter \
|
||||
/opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan
|
||||
|
||||
# Copy the base module and required modules
|
||||
COPY toolbox/modules/base.py /opt/prefect/toolbox/modules/base.py
|
||||
COPY toolbox/modules/__init__.py /opt/prefect/toolbox/modules/__init__.py
|
||||
COPY toolbox/modules/secret_detection/ /opt/prefect/toolbox/modules/secret_detection/
|
||||
COPY toolbox/modules/reporter/ /opt/prefect/toolbox/modules/reporter/
|
||||
|
||||
# Copy the workflow code
|
||||
COPY toolbox/workflows/comprehensive/secret_detection_scan/ /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan/
|
||||
|
||||
# Copy toolbox init files
|
||||
COPY toolbox/__init__.py /opt/prefect/toolbox/__init__.py
|
||||
COPY toolbox/workflows/__init__.py /opt/prefect/toolbox/workflows/__init__.py
|
||||
COPY toolbox/workflows/comprehensive/__init__.py /opt/prefect/toolbox/workflows/comprehensive/__init__.py
|
||||
|
||||
# Install Python dependencies for the modules
|
||||
RUN pip install --no-cache-dir \
|
||||
pydantic \
|
||||
asyncio
|
||||
|
||||
# Set environment variables
|
||||
ENV PYTHONPATH=/opt/prefect/toolbox:/opt/prefect/toolbox/workflows
|
||||
ENV WORKFLOW_NAME=secret_detection_scan
|
||||
|
||||
# Set default command (can be overridden)
|
||||
CMD ["python", "-m", "toolbox.workflows.comprehensive.secret_detection_scan.workflow"]
|
||||
@@ -1,130 +0,0 @@
|
||||
# Secret Detection Scan Workflow
|
||||
|
||||
This workflow performs comprehensive secret detection using multiple industry-standard tools:
|
||||
|
||||
- **TruffleHog**: Comprehensive secret detection with verification capabilities
|
||||
- **Gitleaks**: Git-specific secret scanning and leak detection
|
||||
|
||||
## Features
|
||||
|
||||
- **Parallel Execution**: Runs TruffleHog and Gitleaks concurrently for faster results
|
||||
- **Deduplication**: Automatically removes duplicate findings across tools
|
||||
- **SARIF Output**: Generates standardized SARIF reports for integration with security tools
|
||||
- **Configurable**: Supports extensive configuration for both tools
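
The parallel-execution feature can be illustrated with `asyncio.gather`. This is a minimal sketch under stated assumptions: `fake_scanner` is a hypothetical stand-in for the real TruffleHog and Gitleaks modules, and a failed scanner is replaced by an empty result so that one failure does not discard the other tool's findings.

```python
import asyncio
from typing import Any, Dict, List


async def fake_scanner(name: str, findings: List[Dict[str, Any]], fail: bool = False) -> Dict[str, Any]:
    """Hypothetical stand-in for a scanner module; not part of the toolbox."""
    await asyncio.sleep(0)  # yield control so both scanners interleave
    if fail:
        raise RuntimeError(f"{name} failed")
    return {"findings": findings, "status": "success"}


async def run_scanners() -> List[Dict[str, Any]]:
    # return_exceptions=True hands failures back as values instead of raising.
    results = await asyncio.gather(
        fake_scanner("trufflehog", [{"title": "AWS key", "file_path": "a.env"}]),
        fake_scanner("gitleaks", [], fail=True),
        return_exceptions=True,
    )
    # Replace each failure with an empty, clearly marked result.
    return [
        {"findings": [], "status": "failed"} if isinstance(r, Exception) else r
        for r in results
    ]


if __name__ == "__main__":
    for result in asyncio.run(run_scanners()):
        print(result["status"], len(result["findings"]))
```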
|
||||
|
||||
## Dependencies
|
||||
|
||||
### Required Modules
|
||||
- `toolbox.modules.secret_detection.trufflehog`
|
||||
- `toolbox.modules.secret_detection.gitleaks`
|
||||
- `toolbox.modules.reporter` (SARIF reporter)
|
||||
- `toolbox.modules.base` (Base module interface)
|
||||
|
||||
### External Tools
|
||||
- TruffleHog v3.63.2+
|
||||
- Gitleaks v8.18.0+
|
||||
|
||||
## Docker Deployment
|
||||
|
||||
This workflow provides two Docker deployment approaches:
|
||||
|
||||
### 1. Volume-Based Approach (Default: `Dockerfile`)
|
||||
|
||||
**Advantages:**
|
||||
- Live code updates without rebuilding images
|
||||
- Smaller image sizes
|
||||
- Consistent module versions across workflows
|
||||
- Faster development iteration
|
||||
|
||||
**How it works:**
|
||||
- Docker image contains only external tools (TruffleHog, Gitleaks)
|
||||
- Python modules are mounted at runtime from the backend container
|
||||
- Backend manages code synchronization via shared volumes
|
||||
|
||||
### 2. Self-Contained Approach (`Dockerfile.self-contained`)
|
||||
|
||||
**Advantages:**
|
||||
- Complete isolation and reproducibility
|
||||
- No runtime dependencies on backend code
|
||||
- Can run independently of FuzzForge platform
|
||||
- Better for CI/CD integration
|
||||
|
||||
**How it works:**
|
||||
- All required Python modules are copied into the Docker image
|
||||
- Image is completely self-contained
|
||||
- Larger image size but fully portable
|
||||
|
||||
## Configuration
|
||||
|
||||
### TruffleHog Configuration
|
||||
|
||||
```json
|
||||
{
|
||||
"trufflehog_config": {
|
||||
"verify": true, // Verify discovered secrets
|
||||
"concurrency": 10, // Number of concurrent workers
|
||||
"max_depth": 10, // Maximum directory depth
|
||||
"include_detectors": [], // Specific detectors to include
|
||||
"exclude_detectors": [] // Specific detectors to exclude
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Gitleaks Configuration
|
||||
|
||||
```json
|
||||
{
|
||||
"gitleaks_config": {
|
||||
"scan_mode": "detect", // "detect" or "protect"
|
||||
"redact": true, // Redact secrets in output
|
||||
"max_target_megabytes": 100, // Maximum file size (MB)
|
||||
"no_git": false, // Scan without Git context
|
||||
"config_file": "", // Custom Gitleaks config
|
||||
"baseline_file": "" // Baseline file for known findings
|
||||
}
|
||||
}
|
||||
```
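
User-supplied configuration is overlaid on a set of defaults, so a request only needs to name the keys it changes. A minimal sketch of that merge follows; `merge_config` is a hypothetical helper, and the default values are the ones set in the workflow source in this diff (which uses `no_git: True`), not the values shown in the JSON block above.

```python
from typing import Any, Dict, Optional

# Defaults as set in the workflow source in this diff.
DEFAULT_GITLEAKS_CONFIG: Dict[str, Any] = {
    "scan_mode": "detect",
    "redact": True,
    "max_target_megabytes": 100,
    "no_git": True,
}


def merge_config(defaults: Dict[str, Any], overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: shallow merge where override keys win."""
    return {**defaults, **(overrides or {})}


merged = merge_config(DEFAULT_GITLEAKS_CONFIG, {"max_target_megabytes": 200})
print(merged)
```

The merge is shallow, so a nested object in the overrides replaces the default object for that key as a whole rather than being merged key by key.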
|
||||
|
||||
## Usage Example
|
||||
|
||||
```bash
|
||||
curl -X POST "http://localhost:8000/workflows/secret_detection_scan/submit" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"target_path": "/path/to/scan",
|
||||
"volume_mode": "ro",
|
||||
"parameters": {
|
||||
"trufflehog_config": {
|
||||
"verify": true,
|
||||
"concurrency": 15
|
||||
},
|
||||
"gitleaks_config": {
|
||||
"scan_mode": "detect",
|
||||
"max_target_megabytes": 200
|
||||
}
|
||||
}
|
||||
}'
|
||||
```
|
||||
|
||||
## Output Format
|
||||
|
||||
The workflow generates a SARIF report containing:
|
||||
- All unique findings from both tools
|
||||
- Severity levels mapped to standard scale
|
||||
- File locations and line numbers
|
||||
- Detailed descriptions and recommendations
|
||||
- Tool-specific metadata
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
- **TruffleHog**: CPU-intensive with verification enabled
|
||||
- **Gitleaks**: Memory-intensive for large repositories
|
||||
- **Recommended Resources**: 512Mi memory, 500m CPU
|
||||
- **Typical Runtime**: 1-5 minutes for small repos, 10-30 minutes for large ones
|
||||
|
||||
## Security Notes
|
||||
|
||||
- Secrets are redacted in output by default
|
||||
- Verified secrets are marked with higher severity
|
||||
- Both tools support custom rules and exclusions
|
||||
- Consider using baseline files for known false positives
|
||||
@@ -1,17 +0,0 @@
|
||||
"""
|
||||
Secret Detection Scan Workflow
|
||||
|
||||
This package contains the comprehensive secret detection workflow that combines
|
||||
multiple secret detection tools for thorough analysis.
|
||||
"""
|
||||
# Copyright (c) 2025 FuzzingLabs
|
||||
#
|
||||
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
|
||||
# at the root of this repository for details.
|
||||
#
|
||||
# After the Change Date (four years from publication), this version of the
|
||||
# Licensed Work will be made available under the Apache License, Version 2.0.
|
||||
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Additional attribution and requirements are provided in the NOTICE file.
|
||||
|
||||
@@ -1,113 +0,0 @@
|
||||
name: secret_detection_scan
|
||||
version: "2.0.0"
|
||||
description: "Comprehensive secret detection using TruffleHog and Gitleaks"
|
||||
author: "FuzzForge Team"
|
||||
category: "comprehensive"
|
||||
tags:
|
||||
- "secrets"
|
||||
- "credentials"
|
||||
- "detection"
|
||||
- "trufflehog"
|
||||
- "gitleaks"
|
||||
- "comprehensive"
|
||||
|
||||
supported_volume_modes:
|
||||
- "ro"
|
||||
- "rw"
|
||||
|
||||
default_volume_mode: "ro"
|
||||
default_target_path: "/workspace"
|
||||
|
||||
requirements:
|
||||
tools:
|
||||
- "trufflehog"
|
||||
- "gitleaks"
|
||||
resources:
|
||||
memory: "512Mi"
|
||||
cpu: "500m"
|
||||
timeout: 1800
|
||||
|
||||
has_docker: true
|
||||
|
||||
default_parameters:
|
||||
target_path: "/workspace"
|
||||
volume_mode: "ro"
|
||||
trufflehog_config: {}
|
||||
gitleaks_config: {}
|
||||
reporter_config: {}
|
||||
|
||||
parameters:
|
||||
type: object
|
||||
properties:
|
||||
target_path:
|
||||
type: string
|
||||
default: "/workspace"
|
||||
description: "Path to analyze"
|
||||
volume_mode:
|
||||
type: string
|
||||
enum: ["ro", "rw"]
|
||||
default: "ro"
|
||||
description: "Volume mount mode"
|
||||
trufflehog_config:
|
||||
type: object
|
||||
description: "TruffleHog configuration"
|
||||
properties:
|
||||
verify:
|
||||
type: boolean
|
||||
description: "Verify discovered secrets"
|
||||
concurrency:
|
||||
type: integer
|
||||
description: "Number of concurrent workers"
|
||||
max_depth:
|
||||
type: integer
|
||||
description: "Maximum directory depth to scan"
|
||||
include_detectors:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
description: "Specific detectors to include"
|
||||
exclude_detectors:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
description: "Specific detectors to exclude"
|
||||
gitleaks_config:
|
||||
type: object
|
||||
description: "Gitleaks configuration"
|
||||
properties:
|
||||
scan_mode:
|
||||
type: string
|
||||
enum: ["detect", "protect"]
|
||||
description: "Scan mode"
|
||||
redact:
|
||||
type: boolean
|
||||
description: "Redact secrets in output"
|
||||
max_target_megabytes:
|
||||
type: integer
|
||||
description: "Maximum file size to scan (MB)"
|
||||
no_git:
|
||||
type: boolean
|
||||
description: "Scan files without Git context"
|
||||
config_file:
|
||||
type: string
|
||||
description: "Path to custom configuration file"
|
||||
baseline_file:
|
||||
type: string
|
||||
description: "Path to baseline file"
|
||||
reporter_config:
|
||||
type: object
|
||||
description: "SARIF reporter configuration"
|
||||
properties:
|
||||
output_file:
|
||||
type: string
|
||||
description: "Output SARIF file name"
|
||||
include_code_flows:
|
||||
type: boolean
|
||||
description: "Include code flow information"
|
||||
|
||||
output_schema:
|
||||
type: object
|
||||
properties:
|
||||
sarif:
|
||||
type: object
|
||||
description: "SARIF-formatted security findings"
|
||||
@@ -1,290 +0,0 @@
|
||||
"""
|
||||
Secret Detection Scan Workflow
|
||||
|
||||
This workflow performs comprehensive secret detection using multiple tools:
|
||||
- TruffleHog: Comprehensive secret detection with verification
|
||||
- Gitleaks: Git-specific secret scanning
|
||||
"""
|
||||
# Copyright (c) 2025 FuzzingLabs
|
||||
#
|
||||
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
|
||||
# at the root of this repository for details.
|
||||
#
|
||||
# After the Change Date (four years from publication), this version of the
|
||||
# Licensed Work will be made available under the Apache License, Version 2.0.
|
||||
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Additional attribution and requirements are provided in the NOTICE file.
|
||||
|
||||
|
||||
import sys
|
||||
import logging
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any, List, Optional
|
||||
from prefect import flow, task
|
||||
from prefect.artifacts import create_markdown_artifact, create_table_artifact
|
||||
import asyncio
|
||||
import json
|
||||
|
||||
# Add modules to path
|
||||
sys.path.insert(0, '/app')
|
||||
|
||||
# Import modules
|
||||
from toolbox.modules.secret_detection.trufflehog import TruffleHogModule
|
||||
from toolbox.modules.secret_detection.gitleaks import GitleaksModule
|
||||
from toolbox.modules.reporter import SARIFReporter
|
||||
|
||||
# Configure logging
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@task(name="trufflehog_scan")
|
||||
async def run_trufflehog_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""
|
||||
Task to run TruffleHog secret detection.
|
||||
|
||||
Args:
|
||||
workspace: Path to the workspace
|
||||
config: TruffleHog configuration
|
||||
|
||||
Returns:
|
||||
TruffleHog results
|
||||
"""
|
||||
logger.info("Running TruffleHog secret detection")
|
||||
module = TruffleHogModule()
|
||||
result = await module.execute(config, workspace)
|
||||
logger.info(f"TruffleHog completed: {result.summary.get('total_secrets', 0)} secrets found")
|
||||
return result.dict()
|
||||
|
||||
|
||||
@task(name="gitleaks_scan")
|
||||
async def run_gitleaks_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""
|
||||
Task to run Gitleaks secret detection.
|
||||
|
||||
Args:
|
||||
workspace: Path to the workspace
|
||||
config: Gitleaks configuration
|
||||
|
||||
Returns:
|
||||
Gitleaks results
|
||||
"""
|
||||
logger.info("Running Gitleaks secret detection")
|
||||
module = GitleaksModule()
|
||||
result = await module.execute(config, workspace)
|
||||
logger.info(f"Gitleaks completed: {result.summary.get('total_leaks', 0)} leaks found")
|
||||
return result.dict()
|
||||
|
||||
|
||||
@task(name="aggregate_findings")
|
||||
async def aggregate_findings_task(
|
||||
trufflehog_results: Dict[str, Any],
|
||||
gitleaks_results: Dict[str, Any],
|
||||
config: Dict[str, Any],
|
||||
workspace: Path
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Task to aggregate findings from all secret detection tools.
|
||||
|
||||
Args:
|
||||
trufflehog_results: Results from TruffleHog
|
||||
gitleaks_results: Results from Gitleaks
|
||||
config: Reporter configuration
|
||||
workspace: Path to workspace
|
||||
|
||||
Returns:
|
||||
Aggregated SARIF report
|
||||
"""
|
||||
logger.info("Aggregating secret detection findings")
|
||||
|
||||
# Combine all findings
|
||||
all_findings = []
|
||||
|
||||
# Add TruffleHog findings
|
||||
trufflehog_findings = trufflehog_results.get("findings", [])
|
||||
all_findings.extend(trufflehog_findings)
|
||||
|
||||
# Add Gitleaks findings
|
||||
gitleaks_findings = gitleaks_results.get("findings", [])
|
||||
all_findings.extend(gitleaks_findings)
|
||||
|
||||
# Deduplicate findings based on file path and line number
|
||||
unique_findings = []
|
||||
seen_signatures = set()
|
||||
|
||||
for finding in all_findings:
|
||||
# Create signature for deduplication
|
||||
signature = (
|
||||
finding.get("file_path", ""),
|
||||
finding.get("line_start", 0),
|
||||
finding.get("title", "").lower()[:50] # First 50 chars of title
|
||||
)
|
||||
|
||||
if signature not in seen_signatures:
|
||||
seen_signatures.add(signature)
|
||||
unique_findings.append(finding)
|
||||
else:
|
||||
logger.debug(f"Deduplicated finding: {signature}")
|
||||
|
||||
logger.info(f"Aggregated {len(unique_findings)} unique findings from {len(all_findings)} total")
|
||||
|
||||
# Generate SARIF report
|
||||
reporter = SARIFReporter()
|
||||
reporter_config = {
|
||||
**config,
|
||||
"findings": unique_findings,
|
||||
"tool_name": "FuzzForge Secret Detection",
|
||||
"tool_version": "1.0.0",
|
||||
"tool_description": "Comprehensive secret detection using TruffleHog and Gitleaks"
|
||||
}
|
||||
|
||||
result = await reporter.execute(reporter_config, workspace)
|
||||
return result.dict().get("sarif", {})
|
||||
|
||||
|
||||
@flow(name="secret_detection_scan", log_prints=True)
|
||||
async def main_flow(
|
||||
target_path: str = "/workspace",
|
||||
volume_mode: str = "ro",
|
||||
trufflehog_config: Optional[Dict[str, Any]] = None,
|
||||
gitleaks_config: Optional[Dict[str, Any]] = None,
|
||||
reporter_config: Optional[Dict[str, Any]] = None
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Main secret detection workflow.
|
||||
|
||||
This workflow:
|
||||
1. Runs TruffleHog for comprehensive secret detection
|
||||
2. Runs Gitleaks for Git-specific secret detection
|
||||
3. Aggregates and deduplicates findings
|
||||
4. Generates a unified SARIF report
|
||||
|
||||
Args:
|
||||
target_path: Path to the mounted workspace (default: /workspace)
|
||||
volume_mode: Volume mount mode (ro/rw)
|
||||
trufflehog_config: Configuration for TruffleHog
|
||||
gitleaks_config: Configuration for Gitleaks
|
||||
reporter_config: Configuration for SARIF reporter
|
||||
|
||||
Returns:
|
||||
SARIF-formatted findings report
|
||||
"""
|
||||
logger.info("Starting comprehensive secret detection workflow")
|
||||
logger.info(f"Workspace: {target_path}, Mode: {volume_mode}")
|
||||
|
||||
# Set workspace path
|
||||
workspace = Path(target_path)
|
||||
|
||||
if not workspace.exists():
|
||||
logger.error(f"Workspace does not exist: {workspace}")
|
||||
return {
|
||||
"error": f"Workspace not found: {workspace}",
|
||||
"sarif": None
|
||||
}
|
||||
|
||||
# Default configurations - merge with provided configs to ensure defaults are always applied
|
||||
default_trufflehog_config = {
|
||||
"verify": False,
|
||||
"concurrency": 10,
|
||||
"max_depth": 10,
|
||||
"no_git": True # Add no_git for filesystem scanning
|
||||
}
|
||||
trufflehog_config = {**default_trufflehog_config, **(trufflehog_config or {})}
|
||||
|
||||
default_gitleaks_config = {
|
||||
"scan_mode": "detect",
|
||||
"redact": True,
|
||||
"max_target_megabytes": 100,
|
||||
"no_git": True # Critical for non-git directories
|
||||
}
|
||||
gitleaks_config = {**default_gitleaks_config, **(gitleaks_config or {})}
|
||||
|
||||
default_reporter_config = {
|
||||
"include_code_flows": False
|
||||
}
|
||||
reporter_config = {**default_reporter_config, **(reporter_config or {})}
|
||||
|
||||
try:
|
||||
# Run secret detection tools in parallel
|
||||
logger.info("Phase 1: Running secret detection tools")
|
||||
|
||||
# Create tasks for parallel execution
|
||||
trufflehog_task_result = run_trufflehog_task(workspace, trufflehog_config)
|
||||
gitleaks_task_result = run_gitleaks_task(workspace, gitleaks_config)
|
||||
|
||||
# Wait for both to complete
|
||||
trufflehog_results, gitleaks_results = await asyncio.gather(
|
||||
trufflehog_task_result,
|
||||
gitleaks_task_result,
|
||||
return_exceptions=True
|
||||
)
|
||||
|
||||
# Handle any exceptions
|
||||
if isinstance(trufflehog_results, Exception):
|
||||
logger.error(f"TruffleHog failed: {trufflehog_results}")
|
||||
trufflehog_results = {"findings": [], "status": "failed"}
|
||||
|
||||
if isinstance(gitleaks_results, Exception):
|
||||
logger.error(f"Gitleaks failed: {gitleaks_results}")
|
||||
gitleaks_results = {"findings": [], "status": "failed"}
|
||||
|
||||
# Aggregate findings
|
||||
logger.info("Phase 2: Aggregating findings")
|
||||
sarif_report = await aggregate_findings_task(
|
||||
trufflehog_results,
|
||||
gitleaks_results,
|
||||
reporter_config,
|
||||
workspace
|
||||
)
|
||||
|
||||
# Log summary
|
||||
if sarif_report and "runs" in sarif_report:
|
||||
results_count = len(sarif_report["runs"][0].get("results", []))
|
||||
logger.info(f"Workflow completed successfully with {results_count} unique secret findings")
|
||||
|
||||
# Log tool-specific stats
|
||||
trufflehog_count = len(trufflehog_results.get("findings", []))
|
||||
gitleaks_count = len(gitleaks_results.get("findings", []))
|
||||
logger.info(f"Tool results - TruffleHog: {trufflehog_count}, Gitleaks: {gitleaks_count}")
|
||||
else:
|
||||
logger.info("Workflow completed successfully with no findings")
|
||||
|
||||
return sarif_report
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Secret detection workflow failed: {e}")
|
||||
# Return error in SARIF format
|
||||
return {
|
||||
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
|
||||
"version": "2.1.0",
|
||||
"runs": [
|
||||
{
|
||||
"tool": {
|
||||
"driver": {
|
||||
"name": "FuzzForge Secret Detection",
|
||||
"version": "1.0.0"
|
||||
}
|
||||
},
|
||||
"results": [],
|
||||
"invocations": [
|
||||
{
|
||||
"executionSuccessful": False,
|
||||
"exitCode": 1,
|
||||
"exitCodeDescription": str(e)
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# For local testing
|
||||
import asyncio
|
||||
|
||||
asyncio.run(main_flow(
|
||||
target_path="/tmp/test",
|
||||
trufflehog_config={"verify": True, "max_depth": 5},
|
||||
gitleaks_config={"scan_mode": "detect"}
|
||||
))
|
||||
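The aggregation task in the removed workflow deduplicates findings by a signature of file path, start line, and the first 50 characters of the lower-cased title. The sketch below restates that logic as a standalone function; `finding_signature`, `deduplicate`, and the sample findings are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, Iterable, List, Tuple

Signature = Tuple[str, int, str]


def finding_signature(finding: Dict[str, Any]) -> Signature:
    """Same three fields the aggregation task uses as its signature."""
    return (
        finding.get("file_path", ""),
        finding.get("line_start", 0),
        finding.get("title", "").lower()[:50],
    )


def deduplicate(findings: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first finding seen for each signature, preserving order."""
    seen = set()
    unique: List[Dict[str, Any]] = []
    for finding in findings:
        signature = finding_signature(finding)
        if signature not in seen:
            seen.add(signature)
            unique.append(finding)
    return unique


# Made-up sample data: the first two entries differ only in title case.
sample = [
    {"file_path": "config.py", "line_start": 12, "title": "AWS Access Key"},
    {"file_path": "config.py", "line_start": 12, "title": "aws access key"},
    {"file_path": "config.py", "line_start": 40, "title": "AWS Access Key"},
]
unique = deduplicate(sample)
print(len(unique))
```

Note that two tools only collapse into one finding when they report the same title prefix for the same location; differently worded titles for the same secret would both be kept.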
@@ -0,0 +1,113 @@
|
||||
name: ossfuzz_campaign
version: "1.0.0"
vertical: ossfuzz
description: "Generic OSS-Fuzz fuzzing campaign. Automatically reads project configuration from the OSS-Fuzz repo and runs fuzzing using Google's infrastructure."
author: "FuzzForge Team"
tags:
  - "fuzzing"
  - "oss-fuzz"
  - "libfuzzer"
  - "afl"
  - "honggfuzz"
  - "memory-safety"
  - "security"

# Workspace isolation mode
# OSS-Fuzz campaigns use isolated mode for safe concurrent campaigns
workspace_isolation: "isolated"

default_parameters:
  project_name: null
  campaign_duration_hours: 1
  override_engine: null
  override_sanitizer: null
  max_iterations: null

parameters:
  type: object
  required:
    - project_name
  properties:
    project_name:
      type: string
      description: "OSS-Fuzz project name (e.g., 'curl', 'sqlite3', 'libxml2')"
      examples:
        - "curl"
        - "sqlite3"
        - "libxml2"
        - "openssl"
        - "zlib"

    campaign_duration_hours:
      type: integer
      default: 1
      minimum: 1
      maximum: 168  # 1 week max
      description: "How many hours to run the fuzzing campaign"

    override_engine:
      type: string
      enum: ["libfuzzer", "afl", "honggfuzz"]
      description: "Override fuzzing engine from project.yaml (optional)"

    override_sanitizer:
      type: string
      enum: ["address", "memory", "undefined", "dataflow"]
      description: "Override sanitizer from project.yaml (optional)"

    max_iterations:
      type: integer
      minimum: 1000
      description: "Limit on fuzzing iterations (optional)"

output_schema:
  type: object
  properties:
    project_name:
      type: string
      description: "OSS-Fuzz project that was fuzzed"

    summary:
      type: object
      description: "Campaign execution summary"
      properties:
        total_executions:
          type: integer
        crashes_found:
          type: integer
        unique_crashes:
          type: integer
        duration_hours:
          type: number
        engine_used:
          type: string
        sanitizer_used:
          type: string

    crashes:
      type: array
      description: "List of crash file paths"
      items:
        type: string

    sarif:
      type: object
      description: "SARIF-formatted crash reports (future)"

examples:
  - name: "Fuzz curl for 1 hour"
    parameters:
      project_name: "curl"
      campaign_duration_hours: 1

  - name: "Fuzz sqlite3 with AFL"
    parameters:
      project_name: "sqlite3"
      campaign_duration_hours: 2
      override_engine: "afl"

  - name: "Fuzz libxml2 with memory sanitizer"
    parameters:
      project_name: "libxml2"
      campaign_duration_hours: 6
      override_sanitizer: "memory"
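The constraints in the `parameters` schema above can be checked before a campaign is submitted. The following is a minimal sketch, not FuzzForge code: the helper name `validate_campaign_params` is hypothetical, and only the limits (required `project_name`, 1 to 168 hours, the two enums, at least 1000 iterations) are taken from the metadata file.

```python
from typing import Any, Dict, List

# Allowed values copied from the `parameters` schema in the metadata file.
ENGINES = {"libfuzzer", "afl", "honggfuzz"}
SANITIZERS = {"address", "memory", "undefined", "dataflow"}


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_campaign_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the parameters are valid.

    Hypothetical helper for illustration only.
    """
    errors: List[str] = []

    # `project_name` is the only required property.
    if not isinstance(params.get("project_name"), str):
        errors.append("project_name is required and must be a string")

    # Default of 1 hour mirrors `default_parameters`.
    hours = params.get("campaign_duration_hours", 1)
    if not _is_int(hours) or not 1 <= hours <= 168:
        errors.append("campaign_duration_hours must be an integer in [1, 168]")

    engine = params.get("override_engine")
    if engine is not None and engine not in ENGINES:
        errors.append(f"override_engine must be one of {sorted(ENGINES)}")

    sanitizer = params.get("override_sanitizer")
    if sanitizer is not None and sanitizer not in SANITIZERS:
        errors.append(f"override_sanitizer must be one of {sorted(SANITIZERS)}")

    iterations = params.get("max_iterations")
    if iterations is not None and (not _is_int(iterations) or iterations < 1000):
        errors.append("max_iterations must be an integer >= 1000")

    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every invalid field at once.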
|
||||
@@ -0,0 +1,219 @@
|
||||
"""
OSS-Fuzz Campaign Workflow - Temporal Version

Generic workflow for running OSS-Fuzz campaigns using Google's infrastructure.
Automatically reads project configuration from OSS-Fuzz project.yaml files.
"""

import asyncio
from datetime import timedelta
from typing import Dict, Any, Optional

from temporalio import workflow
from temporalio.common import RetryPolicy

# Pass the logging import through the workflow sandbox unchanged
with workflow.unsafe.imports_passed_through():
    import logging

logger = logging.getLogger(__name__)


@workflow.defn
class OssfuzzCampaignWorkflow:
    """
    Generic OSS-Fuzz fuzzing campaign workflow.

    User workflow:
    1. User runs: ff workflow run ossfuzz_campaign . project_name=curl
    2. Worker loads project config from OSS-Fuzz repo
    3. Worker builds project using OSS-Fuzz's build system
    4. Worker runs fuzzing with engines from project.yaml
    5. Crashes and corpus reported as findings
    """

    @workflow.run
    async def run(
        self,
        target_id: str,  # Required by FuzzForge (not used, OSS-Fuzz downloads from Google)
        project_name: str,  # Required: OSS-Fuzz project name (e.g., "curl", "sqlite3")
        campaign_duration_hours: int = 1,
        override_engine: Optional[str] = None,  # Override engine from project.yaml
        override_sanitizer: Optional[str] = None,  # Override sanitizer from project.yaml
        max_iterations: Optional[int] = None  # Optional: limit fuzzing iterations
    ) -> Dict[str, Any]:
        """
        Main workflow execution.

        Args:
            target_id: UUID of uploaded target (not used, required by FuzzForge)
            project_name: Name of OSS-Fuzz project (e.g., "curl", "sqlite3", "libxml2")
            campaign_duration_hours: How many hours to fuzz (default: 1)
            override_engine: Override fuzzing engine from project.yaml
            override_sanitizer: Override sanitizer from project.yaml
            max_iterations: Optional limit on fuzzing iterations
                (accepted, but not yet forwarded to the fuzz_target activity)

        Returns:
            Dictionary containing the step log, summary stats, and crash file
            paths (SARIF generation is still a TODO)
        """
        workflow_id = workflow.info().workflow_id

        workflow.logger.info(
            f"Starting OSS-Fuzz Campaign for project '{project_name}' "
            f"(workflow_id={workflow_id}, duration={campaign_duration_hours}h)"
        )

        results = {
            "workflow_id": workflow_id,
            "project_name": project_name,
            "status": "running",
            "steps": []
        }

        try:
            # Step 1: Load OSS-Fuzz project configuration
            workflow.logger.info(f"Step 1: Loading project config for '{project_name}'")
            project_config = await workflow.execute_activity(
                "load_ossfuzz_project",
                args=[project_name],
                start_to_close_timeout=timedelta(minutes=5),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=1),
                    maximum_interval=timedelta(seconds=30),
                    maximum_attempts=3
                )
            )

            results["steps"].append({
                "step": "load_config",
                "status": "success",
                "language": project_config.get("language"),
                "engines": project_config.get("fuzzing_engines", []),
                "sanitizers": project_config.get("sanitizers", [])
            })

            workflow.logger.info(
                f"✓ Loaded config: language={project_config.get('language')}, "
                f"engines={project_config.get('fuzzing_engines')}"
            )

            # Step 2: Build project using OSS-Fuzz infrastructure
            workflow.logger.info(f"Step 2: Building project '{project_name}'")

            build_result = await workflow.execute_activity(
                "build_ossfuzz_project",
                args=[
                    project_name,
                    project_config,
                    override_sanitizer,
                    override_engine
                ],
                start_to_close_timeout=timedelta(minutes=30),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=2),
                    maximum_interval=timedelta(seconds=60),
                    maximum_attempts=2
                )
            )

            results["steps"].append({
                "step": "build_project",
                "status": "success",
                "fuzz_targets": len(build_result.get("fuzz_targets", [])),
                "sanitizer": build_result.get("sanitizer_used"),
                "engine": build_result.get("engine_used")
            })

            workflow.logger.info(
                f"✓ Build completed: {len(build_result.get('fuzz_targets', []))} fuzz targets found"
            )

            if not build_result.get("fuzz_targets"):
                raise Exception(f"No fuzz targets found for project {project_name}")

            # Step 3: Run fuzzing on discovered targets
            workflow.logger.info(f"Step 3: Fuzzing {len(build_result['fuzz_targets'])} targets")

            # Determine which engine to use
            engine_to_use = override_engine if override_engine else build_result["engine_used"]
            duration_seconds = campaign_duration_hours * 3600

            # Fuzz each target (in parallel if multiple targets)
            fuzz_futures = []
            for target_path in build_result["fuzz_targets"]:
                future = workflow.execute_activity(
                    "fuzz_target",
                    args=[target_path, engine_to_use, duration_seconds, None, None],
                    start_to_close_timeout=timedelta(seconds=duration_seconds + 300),
                    retry_policy=RetryPolicy(
                        initial_interval=timedelta(seconds=2),
                        maximum_interval=timedelta(seconds=60),
                        maximum_attempts=1  # Fuzzing shouldn't retry
                    )
                )
                fuzz_futures.append(future)

            # Wait for all fuzzing to complete
            fuzz_results = await asyncio.gather(*fuzz_futures, return_exceptions=True)

            # Aggregate results
            total_execs = 0
            total_crashes = 0
            all_crashes = []

            for i, result in enumerate(fuzz_results):
                if isinstance(result, Exception):
                    workflow.logger.error(f"Fuzzing failed for target {i}: {result}")
                    continue

                total_execs += result.get("total_executions", 0)
                total_crashes += result.get("crashes", 0)
                all_crashes.extend(result.get("crash_files", []))

            results["steps"].append({
                "step": "fuzzing",
                "status": "success",
                "total_executions": total_execs,
                "crashes_found": total_crashes,
                "targets_fuzzed": len(build_result["fuzz_targets"])
            })

            workflow.logger.info(
                f"✓ Fuzzing completed: {total_execs} executions, {total_crashes} crashes"
            )

            # Step 4: Generate SARIF report
            workflow.logger.info("Step 4: Generating SARIF report")

            # TODO: Implement crash minimization and SARIF generation
            # For now, return raw results

            results["status"] = "success"
            results["summary"] = {
                "project": project_name,
                "total_executions": total_execs,
                "crashes_found": total_crashes,
                "unique_crashes": len(set(all_crashes)),
                "duration_hours": campaign_duration_hours,
                "engine_used": engine_to_use,
                "sanitizer_used": build_result.get("sanitizer_used")
            }
            results["crashes"] = all_crashes[:100]  # Limit to first 100 crashes

            workflow.logger.info(
                f"✓ Campaign completed: {project_name} - "
                f"{total_execs} execs, {total_crashes} crashes"
            )

            return results

        except Exception as e:
            workflow.logger.error(f"Workflow failed: {e}")
            results["status"] = "error"
            results["error"] = str(e)
            results["steps"].append({
                "step": "error",
                "status": "failed",
                "error": str(e)
            })
            raise
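Step 3 of the workflow above fans out one `fuzz_target` activity per target and collects the results with `asyncio.gather(..., return_exceptions=True)`, so one failing target does not abort the rest. The sketch below reproduces that pattern outside Temporal with plain coroutines. `fake_fuzz_target`, `aggregate`, and the sample numbers are hypothetical stand-ins; only the result keys (`total_executions`, `crashes`, `crash_files`) come from the workflow code.

```python
import asyncio
from typing import Any, Dict, List


async def fake_fuzz_target(target: str) -> Dict[str, Any]:
    """Stand-in for the `fuzz_target` activity; returns made-up numbers."""
    if target == "broken_fuzzer":
        raise RuntimeError("fuzzer binary failed to start")
    return {
        "total_executions": 1000,
        "crashes": 1,
        "crash_files": [f"/out/{target}/crash-1"],
    }


def aggregate(results: List[Any]) -> Dict[str, Any]:
    """Sum per-target stats, skipping entries that are exceptions."""
    total_execs = 0
    total_crashes = 0
    crash_files: List[str] = []
    failed = 0

    for result in results:
        # With return_exceptions=True, failures arrive as exception objects.
        if isinstance(result, Exception):
            failed += 1
            continue
        total_execs += result.get("total_executions", 0)
        total_crashes += result.get("crashes", 0)
        crash_files.extend(result.get("crash_files", []))

    return {
        "total_executions": total_execs,
        "crashes_found": total_crashes,
        "unique_crashes": len(set(crash_files)),
        "targets_failed": failed,
    }


async def run_campaign(targets: List[str]) -> Dict[str, Any]:
    # Start every target concurrently and wait for all of them to finish.
    results = await asyncio.gather(
        *(fake_fuzz_target(t) for t in targets),
        return_exceptions=True,
    )
    return aggregate(results)
```

Unlike the workflow, this sketch also counts failed targets (`targets_failed`); that field is an addition for illustration and is not part of the workflow's summary.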
|
||||
@@ -1,187 +0,0 @@
|
||||
"""
|
||||
Manual Workflow Registry for Prefect Deployment
|
||||
|
||||
This file contains the manual registry of all workflows that can be deployed.
|
||||
Developers MUST add their workflows here after creating them.
|
||||
|
||||
This approach is required because:
|
||||
1. Prefect cannot deploy dynamically imported flows
|
||||
2. Docker deployment needs static flow references
|
||||
3. Explicit registration provides better control and visibility
|
||||
"""
|
||||
|
||||
# Copyright (c) 2025 FuzzingLabs
|
||||
#
|
||||
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
|
||||
# at the root of this repository for details.
|
||||
#
|
||||
# After the Change Date (four years from publication), this version of the
|
||||
# Licensed Work will be made available under the Apache License, Version 2.0.
|
||||
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Additional attribution and requirements are provided in the NOTICE file.
|
||||
|
||||
from typing import Dict, Any, Callable
|
||||
import logging
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Import only essential workflows
|
||||
# Import each workflow individually to handle failures gracefully
|
||||
security_assessment_flow = None
|
||||
secret_detection_flow = None
|
||||
|
||||
# Try to import each workflow individually
|
||||
try:
|
||||
from .security_assessment.workflow import main_flow as security_assessment_flow
|
||||
except ImportError as e:
|
||||
logger.warning(f"Failed to import security_assessment workflow: {e}")
|
||||
|
||||
try:
|
||||
from .comprehensive.secret_detection_scan.workflow import main_flow as secret_detection_flow
|
||||
except ImportError as e:
|
||||
logger.warning(f"Failed to import secret_detection_scan workflow: {e}")
|
||||
|
||||
|
||||
# Manual registry - developers add workflows here after creation
|
||||
# Only include workflows that were successfully imported
|
||||
WORKFLOW_REGISTRY: Dict[str, Dict[str, Any]] = {}
|
||||
|
||||
# Add workflows that were successfully imported
|
||||
if security_assessment_flow is not None:
|
||||
WORKFLOW_REGISTRY["security_assessment"] = {
|
||||
"flow": security_assessment_flow,
|
||||
"module_path": "toolbox.workflows.security_assessment.workflow",
|
||||
"function_name": "main_flow",
|
||||
"description": "Comprehensive security assessment workflow that scans files, analyzes code for vulnerabilities, and generates SARIF reports",
|
||||
"version": "1.0.0",
|
||||
"author": "FuzzForge Team",
|
||||
"tags": ["security", "scanner", "analyzer", "static-analysis", "sarif"]
|
||||
}
|
||||
|
||||
if secret_detection_flow is not None:
|
||||
WORKFLOW_REGISTRY["secret_detection_scan"] = {
|
||||
"flow": secret_detection_flow,
|
||||
"module_path": "toolbox.workflows.comprehensive.secret_detection_scan.workflow",
|
||||
"function_name": "main_flow",
|
||||
"description": "Comprehensive secret detection using TruffleHog and Gitleaks for thorough credential scanning",
|
||||
"version": "1.0.0",
|
||||
"author": "FuzzForge Team",
|
||||
"tags": ["secrets", "credentials", "detection", "trufflehog", "gitleaks", "comprehensive"]
|
||||
}
|
||||
|
||||
#
|
||||
# To add a new workflow, follow this pattern:
|
||||
#
|
||||
# "my_new_workflow": {
|
||||
# "flow": my_new_flow_function, # Import the flow function above
|
||||
# "module_path": "toolbox.workflows.my_new_workflow.workflow",
|
||||
# "function_name": "my_new_flow_function",
|
||||
# "description": "Description of what this workflow does",
|
||||
# "version": "1.0.0",
|
||||
# "author": "Developer Name",
|
||||
# "tags": ["tag1", "tag2"]
|
||||
# }
|
||||
|
||||
|
||||
def get_workflow_flow(workflow_name: str) -> Callable:
|
||||
"""
|
||||
Get the flow function for a workflow.
|
||||
|
||||
Args:
|
||||
workflow_name: Name of the workflow
|
||||
|
||||
Returns:
|
||||
Flow function
|
||||
|
||||
Raises:
|
||||
KeyError: If workflow not found in registry
|
||||
"""
|
||||
if workflow_name not in WORKFLOW_REGISTRY:
|
||||
available = list(WORKFLOW_REGISTRY.keys())
|
||||
raise KeyError(
|
||||
f"Workflow '{workflow_name}' not found in registry. "
|
||||
f"Available workflows: {available}. "
|
||||
f"Please add the workflow to toolbox/workflows/registry.py"
|
||||
)
|
||||
|
||||
return WORKFLOW_REGISTRY[workflow_name]["flow"]
|
||||
|
||||
|
||||
def get_workflow_info(workflow_name: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Get registry information for a workflow.
|
||||
|
||||
Args:
|
||||
workflow_name: Name of the workflow
|
||||
|
||||
Returns:
|
||||
Registry information dictionary
|
||||
|
||||
Raises:
|
||||
KeyError: If workflow not found in registry
|
||||
"""
|
||||
if workflow_name not in WORKFLOW_REGISTRY:
|
||||
available = list(WORKFLOW_REGISTRY.keys())
|
||||
raise KeyError(
|
||||
f"Workflow '{workflow_name}' not found in registry. "
|
||||
f"Available workflows: {available}"
|
||||
)
|
||||
|
||||
return WORKFLOW_REGISTRY[workflow_name]
|
||||
|
||||
|
||||
def list_registered_workflows() -> Dict[str, Dict[str, Any]]:
|
||||
"""
|
||||
Get all registered workflows.
|
||||
|
||||
Returns:
|
||||
Dictionary of all workflow registry entries
|
||||
"""
|
||||
return WORKFLOW_REGISTRY.copy()
|
||||
|
||||
|
||||
def validate_registry() -> bool:
|
||||
"""
|
||||
Validate the workflow registry for consistency.
|
||||
|
||||
Returns:
|
||||
True if valid, raises exceptions if not
|
||||
|
||||
Raises:
|
||||
ValueError: If registry is invalid
|
||||
"""
|
||||
if not WORKFLOW_REGISTRY:
|
||||
raise ValueError("Workflow registry is empty")
|
||||
|
||||
required_fields = ["flow", "module_path", "function_name", "description"]
|
||||
|
||||
for name, entry in WORKFLOW_REGISTRY.items():
|
||||
# Check required fields
|
||||
missing_fields = [field for field in required_fields if field not in entry]
|
||||
if missing_fields:
|
||||
raise ValueError(
|
||||
f"Workflow '{name}' missing required fields: {missing_fields}"
|
||||
)
|
||||
|
||||
# Check if flow is callable
|
||||
if not callable(entry["flow"]):
|
||||
raise ValueError(f"Workflow '{name}' flow is not callable")
|
||||
|
||||
# Check if flow has the required Prefect attributes
|
||||
if not hasattr(entry["flow"], "deploy"):
|
||||
raise ValueError(
|
||||
f"Workflow '{name}' flow is not a Prefect flow (missing deploy method)"
|
||||
)
|
||||
|
||||
logger.info(f"Registry validation passed. {len(WORKFLOW_REGISTRY)} workflows registered.")
|
||||
return True
|
||||
|
||||
|
||||
# Validate registry on import
|
||||
try:
|
||||
validate_registry()
|
||||
logger.info(f"Workflow registry loaded successfully with {len(WORKFLOW_REGISTRY)} workflows")
|
||||
except Exception as e:
|
||||
logger.error(f"Workflow registry validation failed: {e}")
|
||||
raise
|
||||
@@ -1,30 +0,0 @@
|
||||
FROM prefecthq/prefect:3-python3.11
|
||||
|
||||
WORKDIR /app
|
||||
|
||||
# Create toolbox directory structure to match expected import paths
|
||||
RUN mkdir -p /app/toolbox/workflows /app/toolbox/modules
|
||||
|
||||
# Copy base module infrastructure
|
||||
COPY modules/__init__.py /app/toolbox/modules/
|
||||
COPY modules/base.py /app/toolbox/modules/
|
||||
|
||||
# Copy only required modules (manual selection)
|
||||
COPY modules/scanner /app/toolbox/modules/scanner
|
||||
COPY modules/analyzer /app/toolbox/modules/analyzer
|
||||
COPY modules/reporter /app/toolbox/modules/reporter
|
||||
|
||||
# Copy this workflow
|
||||
COPY workflows/security_assessment /app/toolbox/workflows/security_assessment
|
||||
|
||||
# Install workflow-specific requirements if they exist
|
||||
RUN if [ -f /app/toolbox/workflows/security_assessment/requirements.txt ]; then pip install --no-cache-dir -r /app/toolbox/workflows/security_assessment/requirements.txt; fi
|
||||
|
||||
# Install common requirements
|
||||
RUN pip install --no-cache-dir pyyaml
|
||||
|
||||
# Set Python path
|
||||
ENV PYTHONPATH=/app:$PYTHONPATH
|
||||
|
||||
# Create workspace directory
|
||||
RUN mkdir -p /workspace
|
||||
@@ -0,0 +1,150 @@
|
||||
"""
Security Assessment Workflow Activities

Activities specific to the security assessment workflow:
- scan_files_activity: Scan files in the workspace
- analyze_security_activity: Analyze security vulnerabilities
- generate_sarif_report_activity: Generate SARIF report from findings
"""

import logging
import sys
from pathlib import Path

from temporalio import activity

# Configure logging
logger = logging.getLogger(__name__)

# Add toolbox to path for module imports
sys.path.insert(0, '/app/toolbox')


@activity.defn(name="scan_files")
async def scan_files_activity(workspace_path: str, config: dict) -> dict:
    """
    Scan files in the workspace.

    Args:
        workspace_path: Path to the workspace directory
        config: Scanner configuration

    Returns:
        Scanner results dictionary
    """
    logger.info(f"Activity: scan_files (workspace={workspace_path})")

    try:
        from modules.scanner import FileScanner

        workspace = Path(workspace_path)
        if not workspace.exists():
            raise FileNotFoundError(f"Workspace not found: {workspace_path}")

        scanner = FileScanner()
        result = await scanner.execute(config, workspace)

        logger.info(
            f"✓ File scanning completed: "
            f"{result.summary.get('total_files', 0)} files scanned"
        )
        return result.dict()

    except Exception as e:
        logger.error(f"File scanning failed: {e}", exc_info=True)
        raise


@activity.defn(name="analyze_security")
async def analyze_security_activity(workspace_path: str, config: dict) -> dict:
    """
    Analyze security vulnerabilities in the workspace.

    Args:
        workspace_path: Path to the workspace directory
        config: Analyzer configuration

    Returns:
        Analysis results dictionary
    """
    logger.info(f"Activity: analyze_security (workspace={workspace_path})")

    try:
        from modules.analyzer import SecurityAnalyzer

        workspace = Path(workspace_path)
        if not workspace.exists():
            raise FileNotFoundError(f"Workspace not found: {workspace_path}")

        analyzer = SecurityAnalyzer()
        result = await analyzer.execute(config, workspace)

        logger.info(
            f"✓ Security analysis completed: "
            f"{result.summary.get('total_findings', 0)} findings"
        )
        return result.dict()

    except Exception as e:
        logger.error(f"Security analysis failed: {e}", exc_info=True)
        raise


@activity.defn(name="generate_sarif_report")
async def generate_sarif_report_activity(
    scan_results: dict,
    analysis_results: dict,
    config: dict,
    workspace_path: str
) -> dict:
    """
    Generate SARIF report from scan and analysis results.

    Args:
        scan_results: Results from file scanner
        analysis_results: Results from security analyzer
        config: Reporter configuration
        workspace_path: Path to the workspace

    Returns:
        SARIF report dictionary
    """
    logger.info("Activity: generate_sarif_report")

    try:
        from modules.reporter import SARIFReporter

        workspace = Path(workspace_path)

        # Combine findings from all modules
        all_findings = []

        # Add scanner findings (only sensitive files, not all files)
        scanner_findings = scan_results.get("findings", [])
        sensitive_findings = [f for f in scanner_findings if f.get("severity") != "info"]
        all_findings.extend(sensitive_findings)

        # Add analyzer findings
        analyzer_findings = analysis_results.get("findings", [])
        all_findings.extend(analyzer_findings)

        # Prepare reporter config
        reporter_config = {
            **config,
            "findings": all_findings,
            "tool_name": "FuzzForge Security Assessment",
            "tool_version": "1.0.0"
        }

        reporter = SARIFReporter()
        result = await reporter.execute(reporter_config, workspace)

        # Extract SARIF from result
        sarif = result.dict().get("sarif", {})

        logger.info(f"✓ SARIF report generated with {len(all_findings)} findings")
        return sarif

    except Exception as e:
        logger.error(f"SARIF report generation failed: {e}", exc_info=True)
        raise
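The merge step inside `generate_sarif_report_activity` drops scanner findings whose severity is `"info"` and keeps every analyzer finding. A minimal standalone sketch of that filter follows, assuming findings are plain dictionaries with a `severity` key as in the activity code. The function name `merge_findings` and the sample records are hypothetical.

```python
from typing import Any, Dict, List

Finding = Dict[str, Any]


def merge_findings(scan_results: Dict[str, Any], analysis_results: Dict[str, Any]) -> List[Finding]:
    """Combine scanner and analyzer findings the way the activity does."""
    scanner_findings = scan_results.get("findings", [])
    # Informational scanner entries (plain file listings) are left out.
    sensitive = [f for f in scanner_findings if f.get("severity") != "info"]
    # Analyzer findings are kept regardless of severity.
    return sensitive + analysis_results.get("findings", [])


# Made-up sample data for illustration.
scan = {"findings": [
    {"title": "README.md", "severity": "info"},
    {"title": ".env file", "severity": "medium"},
]}
analysis = {"findings": [
    {"title": "Hardcoded secret", "severity": "high"},
]}
merged = merge_findings(scan, analysis)
```

Note that a scanner finding with no `severity` key is kept, because `f.get("severity")` returns `None`, which is not equal to `"info"`.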
|
||||
@@ -1,8 +1,8 @@
|
||||
name: security_assessment
|
||||
version: "2.0.0"
|
||||
vertical: rust
|
||||
description: "Comprehensive security assessment workflow that scans files, analyzes code for vulnerabilities, and generates SARIF reports"
|
||||
author: "FuzzForge Team"
|
||||
category: "comprehensive"
|
||||
tags:
|
||||
- "security"
|
||||
- "scanner"
|
||||
@@ -11,28 +11,14 @@ tags:
|
||||
- "sarif"
|
||||
- "comprehensive"
|
||||
|
||||
supported_volume_modes:
|
||||
- "ro"
|
||||
- "rw"
|
||||
|
||||
default_volume_mode: "ro"
|
||||
default_target_path: "/workspace"
|
||||
|
||||
requirements:
|
||||
tools:
|
||||
- "file_scanner"
|
||||
- "security_analyzer"
|
||||
- "sarif_reporter"
|
||||
resources:
|
||||
memory: "512Mi"
|
||||
cpu: "500m"
|
||||
timeout: 1800
|
||||
|
||||
has_docker: true
|
||||
# Workspace isolation mode (system-level configuration)
|
||||
# - "isolated" (default): Each workflow run gets its own isolated workspace (safe for concurrent fuzzing)
|
||||
# - "shared": All runs share the same workspace (for read-only analysis workflows)
|
||||
# - "copy-on-write": Download once, copy for each run (balances performance and isolation)
|
||||
# Using "shared" mode for read-only security analysis (no file modifications)
|
||||
workspace_isolation: "shared"
|
||||
|
||||
default_parameters:
|
||||
target_path: "/workspace"
|
||||
volume_mode: "ro"
|
||||
scanner_config: {}
|
||||
analyzer_config: {}
|
||||
reporter_config: {}
|
||||
@@ -40,15 +26,6 @@ default_parameters:
|
||||
parameters:
|
||||
type: object
|
||||
properties:
|
||||
target_path:
|
||||
type: string
|
||||
default: "/workspace"
|
||||
description: "Path to analyze"
|
||||
volume_mode:
|
||||
type: string
|
||||
enum: ["ro", "rw"]
|
||||
default: "ro"
|
||||
description: "Volume mount mode"
|
||||
scanner_config:
|
||||
type: object
|
||||
description: "File scanner configuration"
|
||||
|
||||
@@ -1,4 +0,0 @@
|
||||
# Requirements for security assessment workflow
|
||||
pydantic>=2.0.0
|
||||
pyyaml>=6.0
|
||||
aiofiles>=23.0.0
|
||||
@@ -1,5 +1,7 @@
|
||||
"""
|
||||
Security Assessment Workflow - Comprehensive security analysis using multiple modules
|
||||
Security Assessment Workflow - Temporal Version
|
||||
|
||||
Comprehensive security analysis using multiple modules.
|
||||
"""
|
||||
|
||||
# Copyright (c) 2025 FuzzingLabs
|
||||
@@ -13,240 +15,219 @@ Security Assessment Workflow - Comprehensive security analysis using multiple mo
|
||||
#
|
||||
# Additional attribution and requirements are provided in the NOTICE file.
|
||||
|
||||
import sys
|
||||
import logging
|
||||
from pathlib import Path
|
||||
from datetime import timedelta
|
||||
from typing import Dict, Any, Optional
|
||||
from prefect import flow, task
|
||||
import json
|
||||
|
||||
# Add modules to path
|
||||
sys.path.insert(0, '/app')
|
||||
from temporalio import workflow
|
||||
from temporalio.common import RetryPolicy
|
||||
|
||||
# Import modules
|
||||
from toolbox.modules.scanner import FileScanner
|
||||
from toolbox.modules.analyzer import SecurityAnalyzer
|
||||
from toolbox.modules.reporter import SARIFReporter
|
||||
# Import activity interfaces (will be executed by worker)
|
||||
with workflow.unsafe.imports_passed_through():
|
||||
import logging
|
||||
|
||||
# Configure logging
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@task(name="file_scanning")
|
||||
async def scan_files_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
|
||||
@workflow.defn
|
||||
class SecurityAssessmentWorkflow:
|
||||
"""
|
||||
Task to scan files in the workspace.
|
||||
|
||||
Args:
|
||||
workspace: Path to the workspace
|
||||
config: Scanner configuration
|
||||
|
||||
Returns:
|
||||
Scanner results
|
||||
"""
|
||||
logger.info(f"Starting file scanning in {workspace}")
|
||||
scanner = FileScanner()
|
||||
|
||||
result = await scanner.execute(config, workspace)
|
||||
|
||||
logger.info(f"File scanning completed: {result.summary.get('total_files', 0)} files found")
|
||||
return result.dict()
|
||||
|
||||
|
||||
@task(name="security_analysis")
|
||||
async def analyze_security_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""
|
||||
Task to analyze security vulnerabilities.
|
||||
|
||||
Args:
|
||||
workspace: Path to the workspace
|
||||
config: Analyzer configuration
|
||||
|
||||
Returns:
|
||||
Analysis results
|
||||
"""
|
||||
logger.info("Starting security analysis")
|
||||
analyzer = SecurityAnalyzer()
|
||||
|
||||
result = await analyzer.execute(config, workspace)
|
||||
|
||||
logger.info(
|
||||
f"Security analysis completed: {result.summary.get('total_findings', 0)} findings"
|
||||
)
|
||||
return result.dict()
|
||||
|
||||
|
||||
@task(name="report_generation")
|
||||
async def generate_report_task(
|
||||
scan_results: Dict[str, Any],
|
||||
analysis_results: Dict[str, Any],
|
||||
config: Dict[str, Any],
|
||||
workspace: Path
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Task to generate SARIF report from all findings.
|
||||
|
||||
Args:
|
||||
scan_results: Results from scanner
|
||||
analysis_results: Results from analyzer
|
||||
config: Reporter configuration
|
||||
workspace: Path to the workspace
|
||||
|
||||
Returns:
|
||||
SARIF report
|
||||
"""
|
||||
logger.info("Generating SARIF report")
|
||||
reporter = SARIFReporter()
|
||||
|
||||
# Combine findings from all modules
|
||||
all_findings = []
|
||||
|
||||
# Add scanner findings (only sensitive files, not all files)
|
||||
scanner_findings = scan_results.get("findings", [])
|
||||
sensitive_findings = [f for f in scanner_findings if f.get("severity") != "info"]
|
||||
all_findings.extend(sensitive_findings)
|
||||
|
||||
# Add analyzer findings
|
||||
analyzer_findings = analysis_results.get("findings", [])
|
||||
all_findings.extend(analyzer_findings)
|
||||
|
||||
# Prepare reporter config
|
||||
reporter_config = {
|
||||
**config,
|
||||
"findings": all_findings,
|
||||
"tool_name": "FuzzForge Security Assessment",
|
||||
"tool_version": "1.0.0"
|
||||
}
|
||||
|
||||
result = await reporter.execute(reporter_config, workspace)
|
||||
|
||||
# Extract SARIF from result
|
||||
sarif = result.dict().get("sarif", {})
|
||||
|
||||
logger.info(f"Report generated with {len(all_findings)} total findings")
|
||||
return sarif
|
||||
|
||||
|
||||
@flow(name="security_assessment", log_prints=True)
|
||||
async def main_flow(
|
||||
target_path: str = "/workspace",
|
||||
volume_mode: str = "ro",
|
||||
scanner_config: Optional[Dict[str, Any]] = None,
|
||||
analyzer_config: Optional[Dict[str, Any]] = None,
|
||||
reporter_config: Optional[Dict[str, Any]] = None
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Main security assessment workflow.
|
||||
Comprehensive security assessment workflow.
|
||||
|
||||
This workflow:
|
||||
1. Scans files in the workspace
|
||||
2. Analyzes code for security vulnerabilities
|
||||
3. Generates a SARIF report with all findings
|
||||
|
||||
Args:
|
||||
target_path: Path to the mounted workspace (default: /workspace)
|
||||
volume_mode: Volume mount mode (ro/rw)
|
||||
scanner_config: Configuration for file scanner
|
||||
analyzer_config: Configuration for security analyzer
|
||||
reporter_config: Configuration for SARIF reporter
|
||||
|
||||
Returns:
|
||||
SARIF-formatted findings report
|
||||
1. Downloads target from MinIO
|
||||
2. Scans files in the workspace
|
||||
3. Analyzes code for security vulnerabilities
|
||||
4. Generates a SARIF report with all findings
|
||||
5. Uploads results to MinIO
|
||||
6. Cleans up cache
|
||||
"""
|
||||
logger.info(f"Starting security assessment workflow")
|
||||
logger.info(f"Workspace: {target_path}, Mode: {volume_mode}")
|
||||
|
||||
# Set workspace path
|
||||
workspace = Path(target_path)
|
||||
@workflow.run
|
||||
async def run(
|
||||
self,
|
||||
target_id: str,
|
||||
scanner_config: Optional[Dict[str, Any]] = None,
|
||||
analyzer_config: Optional[Dict[str, Any]] = None,
|
||||
reporter_config: Optional[Dict[str, Any]] = None
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Main workflow execution.
|
||||
|
||||
if not workspace.exists():
|
||||
logger.error(f"Workspace does not exist: {workspace}")
|
||||
return {
|
||||
"error": f"Workspace not found: {workspace}",
|
||||
"sarif": None
|
||||
}
|
||||
Args:
|
||||
target_id: UUID of the uploaded target in MinIO
|
||||
scanner_config: Configuration for file scanner
|
||||
analyzer_config: Configuration for security analyzer
|
||||
reporter_config: Configuration for SARIF reporter
|
||||
|
||||
# Default configurations
|
||||
if not scanner_config:
|
||||
scanner_config = {
|
||||
"patterns": ["*"],
|
||||
"check_sensitive": True,
|
||||
"calculate_hashes": False,
|
||||
"max_file_size": 10485760 # 10MB
|
||||
}
|
||||
Returns:
|
||||
Dictionary containing SARIF report and summary
|
||||
"""
|
||||
workflow_id = workflow.info().workflow_id
|
||||
|
||||
if not analyzer_config:
|
||||
analyzer_config = {
|
||||
"file_extensions": [".py", ".js", ".java", ".php", ".rb", ".go"],
|
||||
"check_secrets": True,
|
||||
"check_sql": True,
|
||||
"check_dangerous_functions": True
|
||||
}
|
||||
|
||||
if not reporter_config:
|
||||
reporter_config = {
|
||||
"include_code_flows": False
|
||||
}
|
||||
|
||||
    try:
        # Execute workflow tasks
        logger.info("Phase 1: File scanning")
        scan_results = await scan_files_task(workspace, scanner_config)

        logger.info("Phase 2: Security analysis")
        analysis_results = await analyze_security_task(workspace, analyzer_config)

        logger.info("Phase 3: Report generation")
        sarif_report = await generate_report_task(
            scan_results,
            analysis_results,
            reporter_config,
            workspace
    workflow.logger.info(
        f"Starting SecurityAssessmentWorkflow "
        f"(workflow_id={workflow_id}, target_id={target_id})"
    )

        # Log summary
        if sarif_report and "runs" in sarif_report:
            results_count = len(sarif_report["runs"][0].get("results", []))
            logger.info(f"Workflow completed successfully with {results_count} findings")
        else:
            logger.info("Workflow completed successfully")
    # Default configurations
    if not scanner_config:
        scanner_config = {
            "patterns": ["*"],
            "check_sensitive": True,
            "calculate_hashes": False,
            "max_file_size": 10485760 # 10MB
        }

        return sarif_report
    if not analyzer_config:
        analyzer_config = {
            "file_extensions": [".py", ".js", ".java", ".php", ".rb", ".go"],
            "check_secrets": True,
            "check_sql": True,
            "check_dangerous_functions": True
        }

    except Exception as e:
        logger.error(f"Workflow failed: {e}")
        # Return error in SARIF format
        return {
            "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
            "version": "2.1.0",
            "runs": [
                {
                    "tool": {
                        "driver": {
                            "name": "FuzzForge Security Assessment",
                            "version": "1.0.0"
                        }
                    },
                    "results": [],
                    "invocations": [
                        {
                            "executionSuccessful": False,
                            "exitCode": 1,
                            "exitCodeDescription": str(e)
                        }
                    ]
                }
            ]
    if not reporter_config:
        reporter_config = {
            "include_code_flows": False
        }

    results = {
        "workflow_id": workflow_id,
        "target_id": target_id,
        "status": "running",
        "steps": []
    }

    try:
        # Get run ID for workspace isolation (using shared mode for read-only analysis)
        run_id = workflow.info().run_id

if __name__ == "__main__":
    # For local testing
    import asyncio
        # Step 1: Download target from MinIO
        workflow.logger.info("Step 1: Downloading target from MinIO")
        target_path = await workflow.execute_activity(
            "get_target",
            args=[target_id, run_id, "shared"], # target_id, run_id, workspace_isolation
            start_to_close_timeout=timedelta(minutes=5),
            retry_policy=RetryPolicy(
                initial_interval=timedelta(seconds=1),
                maximum_interval=timedelta(seconds=30),
                maximum_attempts=3
            )
        )
        results["steps"].append({
            "step": "download_target",
            "status": "success",
            "target_path": target_path
        })
        workflow.logger.info(f"✓ Target downloaded to: {target_path}")

    asyncio.run(main_flow(
        target_path="/tmp/test",
        scanner_config={"patterns": ["*.py"]},
        analyzer_config={"check_secrets": True}
    ))
        # Step 2: File scanning
        workflow.logger.info("Step 2: Scanning files")
        scan_results = await workflow.execute_activity(
            "scan_files",
            args=[target_path, scanner_config],
            start_to_close_timeout=timedelta(minutes=10),
            retry_policy=RetryPolicy(
                initial_interval=timedelta(seconds=2),
                maximum_interval=timedelta(seconds=60),
                maximum_attempts=2
            )
        )
        results["steps"].append({
            "step": "file_scanning",
            "status": "success",
            "files_scanned": scan_results.get("summary", {}).get("total_files", 0)
        })
        workflow.logger.info(
            f"✓ File scanning completed: "
            f"{scan_results.get('summary', {}).get('total_files', 0)} files"
        )

        # Step 3: Security analysis
        workflow.logger.info("Step 3: Analyzing security vulnerabilities")
        analysis_results = await workflow.execute_activity(
            "analyze_security",
            args=[target_path, analyzer_config],
            start_to_close_timeout=timedelta(minutes=15),
            retry_policy=RetryPolicy(
                initial_interval=timedelta(seconds=2),
                maximum_interval=timedelta(seconds=60),
                maximum_attempts=2
            )
        )
        results["steps"].append({
            "step": "security_analysis",
            "status": "success",
            "findings": analysis_results.get("summary", {}).get("total_findings", 0)
        })
        workflow.logger.info(
            f"✓ Security analysis completed: "
            f"{analysis_results.get('summary', {}).get('total_findings', 0)} findings"
        )

        # Step 4: Generate SARIF report
        workflow.logger.info("Step 4: Generating SARIF report")
        sarif_report = await workflow.execute_activity(
            "generate_sarif_report",
            args=[scan_results, analysis_results, reporter_config, target_path],
            start_to_close_timeout=timedelta(minutes=5)
        )
        results["steps"].append({
            "step": "report_generation",
            "status": "success"
        })

        # Count total findings in SARIF
        total_findings = 0
        if sarif_report and "runs" in sarif_report:
            total_findings = len(sarif_report["runs"][0].get("results", []))

        workflow.logger.info(f"✓ SARIF report generated with {total_findings} findings")

        # Step 5: Upload results to MinIO
        workflow.logger.info("Step 5: Uploading results")
        try:
            results_url = await workflow.execute_activity(
                "upload_results",
                args=[workflow_id, sarif_report, "sarif"],
                start_to_close_timeout=timedelta(minutes=2)
            )
            results["results_url"] = results_url
            workflow.logger.info(f"✓ Results uploaded to: {results_url}")
        except Exception as e:
            workflow.logger.warning(f"Failed to upload results: {e}")
            results["results_url"] = None

        # Step 6: Cleanup cache
        workflow.logger.info("Step 6: Cleaning up cache")
        try:
            await workflow.execute_activity(
                "cleanup_cache",
                args=[target_path, "shared"], # target_path, workspace_isolation
                start_to_close_timeout=timedelta(minutes=1)
            )
            workflow.logger.info("✓ Cache cleaned up (skipped for shared mode)")
        except Exception as e:
            workflow.logger.warning(f"Cache cleanup failed: {e}")

        # Mark workflow as successful
        results["status"] = "success"
        results["sarif"] = sarif_report
        results["summary"] = {
            "total_findings": total_findings,
            "files_scanned": scan_results.get("summary", {}).get("total_files", 0)
        }
        workflow.logger.info(f"✓ Workflow completed successfully: {workflow_id}")

        return results

    except Exception as e:
        workflow.logger.error(f"Workflow failed: {e}")
        results["status"] = "error"
        results["error"] = str(e)
        results["steps"].append({
            "step": "error",
            "status": "failed",
            "error": str(e)
        })
        raise