CI/CD Integration with Ephemeral Deployment Model (#14)

* feat: Complete migration from Prefect to Temporal

BREAKING CHANGE: Replaces Prefect workflow orchestration with Temporal

## Major Changes
- Replace Prefect with Temporal for workflow orchestration
- Implement vertical worker architecture (rust, android)
- Replace Docker registry with MinIO for unified storage
- Refactor activities to be co-located with workflows
- Update all API endpoints for Temporal compatibility

## Infrastructure
- New: docker-compose.temporal.yaml (Temporal + MinIO + workers)
- New: workers/ directory with rust and android vertical workers
- New: backend/src/temporal/ (manager, discovery)
- New: backend/src/storage/ (S3-cached storage with MinIO)
- New: backend/toolbox/common/ (shared storage activities)
- Deleted: docker-compose.yaml (old Prefect setup)
- Deleted: backend/src/core/prefect_manager.py
- Deleted: backend/src/services/prefect_stats_monitor.py
- Deleted: Docker registry and insecure-registries requirement

## Workflows
- Migrated: security_assessment workflow to Temporal
- New: rust_test workflow (example/test workflow)
- Deleted: secret_detection_scan (Prefect-based, to be reimplemented)
- Activities now co-located with workflows for independent testing

## API Changes
- Updated: backend/src/api/workflows.py (Temporal submission)
- Updated: backend/src/api/runs.py (Temporal status/results)
- Updated: backend/src/main.py (727 lines, TemporalManager integration)
- Updated: All 16 MCP tools to use TemporalManager

## Testing
- All services healthy (Temporal, PostgreSQL, MinIO, workers, backend)
- All API endpoints functional
- End-to-end workflow test passed (72 findings from vulnerable_app)
- MinIO storage integration working (target upload/download, results)
- Worker activity discovery working (6 activities registered)
- Tarball extraction working
- SARIF report generation working

## Documentation
- ARCHITECTURE.md: Complete Temporal architecture documentation
- QUICKSTART_TEMPORAL.md: Getting started guide
- MIGRATION_DECISION.md: Why we chose Temporal over Prefect
- IMPLEMENTATION_STATUS.md: Migration progress tracking
- workers/README.md: Worker development guide

## Dependencies
- Added: temporalio>=1.6.0
- Added: boto3>=1.34.0 (MinIO S3 client)
- Removed: prefect>=3.4.18

* feat: Add Python fuzzing vertical with Atheris integration

This commit implements a complete Python fuzzing workflow using Atheris:

## Python Worker (workers/python/)
- Dockerfile with Python 3.11, Atheris, and build tools
- Generic worker.py for dynamic workflow discovery
- requirements.txt with temporalio, boto3, atheris dependencies
- Added to docker-compose.temporal.yaml with dedicated cache volume

## AtherisFuzzer Module (backend/toolbox/modules/fuzzer/)
- Reusable module extending BaseModule
- Auto-discovers fuzz targets (fuzz_*.py, *_fuzz.py, fuzz_target.py)
- Recursive search to find targets in nested directories
- Dynamically loads TestOneInput() function
- Configurable max_iterations and timeout
- Real-time stats callback support for live monitoring
- Returns findings as ModuleFinding objects
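
The discovery rule described above can be sketched as follows. This is a minimal illustration, not the AtherisFuzzer code itself: the three filename patterns come from this commit message, while the function name `discover_fuzz_targets` and the sorted return order are assumptions made for the example.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import List

# Filename patterns listed in the commit message.
FUZZ_TARGET_PATTERNS = ("fuzz_*.py", "*_fuzz.py", "fuzz_target.py")


def discover_fuzz_targets(workspace: Path) -> List[Path]:
    """Recursively collect Python files whose names match a fuzz-target pattern.

    Hypothetical helper: the real module may rank or filter candidates differently.
    """
    matches = [
        path
        for path in workspace.rglob("*.py")  # rglob walks nested directories
        if path.is_file()
        and any(fnmatch(path.name, pattern) for pattern in FUZZ_TARGET_PATTERNS)
    ]
    return sorted(matches)
```

A file such as `pkg/deep/fuzz_target.py` is found even though it is nested, and it is reported once even though it matches two patterns.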

## Atheris Fuzzing Workflow (backend/toolbox/workflows/atheris_fuzzing/)
- Temporal workflow for orchestrating fuzzing
- Downloads user code from MinIO
- Executes AtherisFuzzer module
- Uploads results to MinIO
- Cleans up cache after execution
- metadata.yaml with vertical: python for routing

## Test Project (test_projects/python_fuzz_waterfall/)
- Demonstrates stateful waterfall vulnerability
- main.py with check_secret() that leaks progress
- fuzz_target.py with Atheris TestOneInput() harness
- Complete README with usage instructions

## Backend Fixes
- Fixed parameter merging in REST API endpoints (workflows.py)
- Changed workflow parameter passing from positional args to kwargs (manager.py)
- Default parameters now properly merged with user parameters
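
As a rough sketch of the merging behaviour described above, assuming user-supplied values simply override the defaults from metadata.yaml (the helper name `merge_parameters` is hypothetical and not taken from workflows.py):

```python
from typing import Any, Dict, Optional


def merge_parameters(
    defaults: Dict[str, Any], user: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return defaults overlaid with user parameters, without mutating either."""
    merged = dict(defaults)    # copy so the workflow's defaults stay untouched
    merged.update(user or {})  # user values win; a missing dict means "no overrides"
    return merged
```

For example, merging `{"max_iterations": 1000000, "timeout_seconds": 1800}` with `{"timeout_seconds": 60}` keeps the default iteration count and shortens only the timeout.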

## Testing
- Worker discovered AtherisFuzzingWorkflow
- Workflow executed end-to-end successfully
- Fuzz target auto-discovered in nested directories
- Atheris ran 100,000 iterations
- Results uploaded and cache cleaned

* chore: Complete Temporal migration with updated CLI/SDK/docs

This commit includes all remaining Temporal migration changes:

## CLI Updates (cli/)
- Updated workflow execution commands for Temporal
- Enhanced error handling and exceptions
- Updated dependencies in uv.lock

## SDK Updates (sdk/)
- Client methods updated for Temporal workflows
- Updated models for new workflow execution
- Updated dependencies in uv.lock

## Documentation Updates (docs/)
- Architecture documentation for Temporal
- Workflow concept documentation
- Resource management documentation (new)
- Debugging guide (new)
- Updated tutorials and how-to guides
- Troubleshooting updates

## README Updates
- Main README with Temporal instructions
- Backend README
- CLI README
- SDK README

## Other
- Updated IMPLEMENTATION_STATUS.md
- Removed old vulnerable_app.tar.gz

These changes complete the Temporal migration and ensure the
CLI/SDK work correctly with the new backend.

* fix: Use positional args instead of kwargs for Temporal workflows

The Temporal Python SDK's start_workflow() method doesn't accept
a 'kwargs' parameter. Workflows must receive parameters as positional
arguments via the 'args' parameter.

Changed to:
  args=workflow_args  # Positional arguments

This fixes the error:
  TypeError: Client.start_workflow() got an unexpected keyword argument 'kwargs'

Workflows now correctly receive parameters in order:
- security_assessment: [target_id, scanner_config, analyzer_config, reporter_config]
- atheris_fuzzing: [target_id, target_file, max_iterations, timeout_seconds]
- rust_test: [target_id, test_message]
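
One way to build such an ordered argument list from a parameter dictionary is sketched below. The parameter order for atheris_fuzzing is the one listed above; the helper name `build_workflow_args` and the choice to fill missing values with `None` are assumptions for illustration, not the code in manager.py.

```python
from typing import Any, Dict, List, Sequence

# Order of atheris_fuzzing parameters after target_id, as listed in this commit.
ATHERIS_PARAM_ORDER = ("target_file", "max_iterations", "timeout_seconds")


def build_workflow_args(
    target_id: str, params: Dict[str, Any], order: Sequence[str]
) -> List[Any]:
    """Turn named parameters into the positional list a workflow expects.

    Hypothetical helper: absent parameters become None so later positions
    do not shift.
    """
    return [target_id] + [params.get(name) for name in order]
```

The resulting list is what would be handed to the `args` parameter of `start_workflow()`; the order of keys in the input dictionary does not matter.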

* fix: Filter metadata-only parameters from workflow arguments

SecurityAssessmentWorkflow was receiving 7 arguments instead of 2-5.
The issue was that target_path and volume_mode from default_parameters
were being passed to the workflow, when they should only be used by
the system for configuration.

Now filters out metadata-only parameters (target_path, volume_mode)
before passing arguments to workflow execution.
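
A minimal sketch of that filtering step, assuming parameters arrive as a dictionary. The two excluded names are the ones mentioned above; the constant and function names are hypothetical.

```python
from typing import Any, Dict

# Parameters used only by the system, never forwarded to the workflow.
METADATA_ONLY_PARAMS = frozenset({"target_path", "volume_mode"})


def filter_workflow_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Drop metadata-only keys so they are not counted as workflow arguments."""
    return {
        name: value
        for name, value in params.items()
        if name not in METADATA_ONLY_PARAMS
    }
```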

* refactor: Remove Prefect leftovers and volume mounting legacy

Complete cleanup of Prefect migration artifacts:

Backend:
- Delete registry.py and workflow_discovery.py (Prefect-specific files)
- Remove Docker validation from setup.py (no longer needed)
- Remove ResourceLimits and VolumeMount models
- Remove target_path and volume_mode from WorkflowSubmission
- Remove supported_volume_modes from API and discovery
- Clean up metadata.yaml files (remove volume/path fields)
- Simplify parameter filtering in manager.py

SDK:
- Remove volume_mode parameter from client methods
- Remove ResourceLimits and VolumeMount models
- Remove Prefect error patterns from docker_logs.py
- Clean up WorkflowSubmission and WorkflowMetadata models

CLI:
- Remove Volume Modes display from workflow info

All removed features are Prefect-specific or Docker volume mounting
artifacts. Temporal workflows use MinIO storage exclusively.

* feat: Add comprehensive test suite and benchmark infrastructure

- Add 68 unit tests for fuzzer, scanner, and analyzer modules
- Implement pytest-based test infrastructure with fixtures
- Add 6 performance benchmarks with category-specific thresholds
- Configure GitHub Actions for automated testing and benchmarking
- Add test and benchmark documentation

Test coverage:
- AtherisFuzzer: 8 tests
- CargoFuzzer: 14 tests
- FileScanner: 22 tests
- SecurityAnalyzer: 24 tests

All tests passing (68/68)
All benchmarks passing (6/6)

* fix: Resolve all ruff linting violations across codebase

Fixed 27 ruff violations in 12 files:
- Removed unused imports (Depends, Dict, Any, Optional, etc.)
- Fixed undefined workflow_info variable in workflows.py
- Removed dead code with undefined variables in atheris_fuzzer.py
- Changed f-string to regular string where no placeholders used

All files now pass ruff checks for CI/CD compliance.

* fix: Configure CI for unit tests only

- Renamed docker-compose.temporal.yaml → docker-compose.yml for CI compatibility
- Commented out integration-tests job (no integration tests yet)
- Updated test-summary to only depend on lint and unit-tests

CI will now run successfully with 68 unit tests. Integration tests can be added later.

* feat: Add CI/CD integration with ephemeral deployment model

Implements comprehensive CI/CD support for FuzzForge with on-demand worker management:

**Worker Management (v0.7.0)**
- Add WorkerManager for automatic worker lifecycle control
- Auto-start workers from stopped state when workflows execute
- Auto-stop workers after workflow completion
- Health checks and startup timeout handling (90s default)
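
The startup-timeout behaviour can be pictured as a polling loop like the one below. This is a sketch under assumptions: only the 90s default comes from this commit, while the function name, the polling interval, and the injectable `clock`/`sleep` parameters are illustrative and say nothing about how WorkerManager is actually written.

```python
import time
from typing import Callable


def wait_until_healthy(
    is_healthy: Callable[[], bool],
    timeout_s: float = 90.0,        # default taken from the commit message
    poll_interval_s: float = 1.0,   # assumed value
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a health check until it passes or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if is_healthy():
            return True
        if clock() >= deadline:
            return False  # caller decides how to report the startup failure
        sleep(poll_interval_s)
```

Passing `clock` and `sleep` as parameters keeps the loop testable without real waiting.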

**CI/CD Features**
- `--fail-on` flag: Fail builds based on SARIF severity levels (error/warning/note/info)
- `--export-sarif` flag: Export findings in SARIF 2.1.0 format
- `--auto-start`/`--auto-stop` flags: Control worker lifecycle
- Exit code propagation: Returns 1 on blocking findings, 0 on success
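
To illustrate how `--fail-on` and the exit code could relate, here is a sketch that assumes threshold semantics (a finding blocks the build when its level is at or above the chosen one). The ranking of levels, the function name `exit_code_for`, and the threshold interpretation are assumptions; the CLI may instead match listed levels exactly.

```python
from typing import Any, Dict

# Assumed ordering of the levels named in this commit, lowest to highest.
LEVEL_RANK = {"info": 0, "note": 1, "warning": 2, "error": 3}


def exit_code_for(sarif: Dict[str, Any], fail_on: str) -> int:
    """Return 1 if any SARIF result is at or above the fail_on level, else 0."""
    threshold = LEVEL_RANK[fail_on]
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # A result without a level is treated as "warning" in this sketch.
            level = result.get("level", "warning")
            if LEVEL_RANK.get(level, 0) >= threshold:
                return 1
    return 0
```

With a single `warning` result, `--fail-on error` would pass and `--fail-on warning` would fail under these assumptions.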

**Exit Code Fix**
- Add `except typer.Exit: raise` handlers at 3 critical locations
- Move worker cleanup to finally block for guaranteed execution
- Exit codes now propagate correctly even when build fails
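
The pattern behind this fix can be sketched without Typer installed. The `Exit` class below is a stand-in for `typer.Exit` (assumed here to derive from `Exception`, which is why a generic handler would otherwise swallow it); `run_command`, `stop_workers`, and the exit code 2 for unexpected errors are hypothetical.

```python
from typing import Callable


class Exit(Exception):
    """Stand-in for typer.Exit so the sketch runs without Typer installed."""

    def __init__(self, code: int = 0) -> None:
        super().__init__(code)
        self.exit_code = code


def run_command(body: Callable[[], None], stop_workers: Callable[[], None]) -> None:
    """Run a CLI command body, preserving its exit code and always cleaning up."""
    try:
        body()
    except Exit:
        # Re-raise untouched; otherwise the generic handler below would
        # catch it and replace the intended exit code.
        raise
    except Exception as exc:
        # Hypothetical policy: unexpected errors map to exit code 2.
        raise Exit(code=2) from exc
    finally:
        # Runs on success, on Exit, and on unexpected errors alike.
        stop_workers()
```

If `body` raises `Exit(code=1)` for blocking findings, the caller still sees code 1 and the workers are stopped before the exception leaves `run_command`.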

**CI Scripts & Examples**
- ci-start.sh: Start FuzzForge services with health checks
- ci-stop.sh: Clean shutdown with volume preservation option
- GitHub Actions workflow example (security-scan.yml)
- GitLab CI pipeline example (.gitlab-ci.example.yml)
- docker-compose.ci.yml: CI-optimized compose file with profiles

**OSS-Fuzz Integration**
- New ossfuzz_campaign workflow for running OSS-Fuzz projects
- OSS-Fuzz worker with Docker-in-Docker support
- Configurable campaign duration and project selection

**Documentation**
- Comprehensive CI/CD integration guide (docs/how-to/cicd-integration.md)
- Updated architecture docs with worker lifecycle details
- Updated workspace isolation documentation
- CLI README with worker management examples

**SDK Enhancements**
- Add get_workflow_worker_info() endpoint
- Worker vertical metadata in workflow responses

**Testing**
- All workflows tested: security_assessment, atheris_fuzzing, secret_detection, cargo_fuzzing
- All monitoring commands tested: stats, crashes, status, finding
- Full CI pipeline simulation verified
- Exit codes verified for success/failure scenarios

Ephemeral CI/CD model: ~3-4GB RAM, ~60-90s startup, runs entirely in CI containers.

* fix: Resolve ruff linting violations in CI/CD code

- Remove unused variables (run_id, defaults, result)
- Remove unused imports
- Fix f-string without placeholders

All CI/CD integration files now pass ruff checks.
tduhamel42
2025-10-14 10:13:45 +02:00
committed by GitHub
parent 987c49569c
commit 60ca088ecf
167 changed files with 26101 additions and 5703 deletions
@@ -0,0 +1,9 @@
"""
Atheris Fuzzing Workflow
Fuzzes user-provided Python code using Atheris.
"""
from .workflow import AtherisFuzzingWorkflow
__all__ = ["AtherisFuzzingWorkflow"]
@@ -0,0 +1,122 @@
"""
Atheris Fuzzing Workflow Activities
Activities specific to the Atheris fuzzing workflow.
"""
import logging
import sys
from datetime import datetime
from pathlib import Path
from typing import Dict, Any
import os
import httpx
from temporalio import activity

# Configure logging
logger = logging.getLogger(__name__)

# Add toolbox to path for module imports
sys.path.insert(0, '/app/toolbox')


@activity.defn(name="fuzz_with_atheris")
async def fuzz_activity(workspace_path: str, config: dict) -> dict:
    """
    Fuzzing activity using the AtherisFuzzer module on user code.

    This activity:
    1. Imports the reusable AtherisFuzzer module
    2. Sets up real-time stats callback
    3. Executes fuzzing on user's TestOneInput() function
    4. Returns findings as ModuleResult

    Args:
        workspace_path: Path to the workspace directory (user's uploaded code)
        config: Fuzzer configuration (target_file, max_iterations, timeout_seconds)

    Returns:
        Fuzzer results dictionary (findings, summary, metadata)
    """
    logger.info(f"Activity: fuzz_with_atheris (workspace={workspace_path})")
    try:
        # Import reusable AtherisFuzzer module
        from modules.fuzzer import AtherisFuzzer

        workspace = Path(workspace_path)
        if not workspace.exists():
            raise FileNotFoundError(f"Workspace not found: {workspace_path}")

        # Get activity info for real-time stats
        info = activity.info()
        run_id = info.workflow_id

        # Define stats callback for real-time monitoring
        async def stats_callback(stats_data: Dict[str, Any]):
            """Callback for live fuzzing statistics"""
            try:
                # Prepare stats payload for backend
                coverage_value = stats_data.get("coverage", 0)
                logger.info(f"COVERAGE_DEBUG: coverage from stats_data = {coverage_value}")
                stats_payload = {
                    "run_id": run_id,
                    "workflow": "atheris_fuzzing",
                    "executions": stats_data.get("total_execs", 0),
                    "executions_per_sec": stats_data.get("execs_per_sec", 0.0),
                    "crashes": stats_data.get("crashes", 0),
                    "unique_crashes": stats_data.get("crashes", 0),
                    "coverage": coverage_value,
                    "corpus_size": stats_data.get("corpus_size", 0),
                    "elapsed_time": stats_data.get("elapsed_time", 0),
                    "last_crash_time": None
                }

                # POST stats to backend API for real-time monitoring
                backend_url = os.getenv("BACKEND_URL", "http://backend:8000")
                async with httpx.AsyncClient(timeout=5.0) as client:
                    try:
                        await client.post(
                            f"{backend_url}/fuzzing/{run_id}/stats",
                            json=stats_payload
                        )
                    except Exception as http_err:
                        logger.debug(f"Failed to post stats to backend: {http_err}")

                # Also log for debugging
                logger.info("LIVE_STATS", extra={
                    "stats_type": "fuzzing_live_update",
                    "workflow_type": "atheris_fuzzing",
                    "run_id": run_id,
                    "executions": stats_data.get("total_execs", 0),
                    "executions_per_sec": stats_data.get("execs_per_sec", 0.0),
                    "crashes": stats_data.get("crashes", 0),
                    "corpus_size": stats_data.get("corpus_size", 0),
                    "coverage": stats_data.get("coverage", 0.0),
                    "elapsed_time": stats_data.get("elapsed_time", 0),
                    "timestamp": datetime.utcnow().isoformat()
                })
            except Exception as e:
                logger.warning(f"Error in stats callback: {e}")

        # Add stats callback and run_id to config
        config["stats_callback"] = stats_callback
        config["run_id"] = run_id

        # Execute the fuzzer module
        fuzzer = AtherisFuzzer()
        result = await fuzzer.execute(config, workspace)
        logger.info(
            f"✓ Fuzzing completed: "
            f"{result.summary.get('total_executions', 0)} executions, "
            f"{result.summary.get('crashes_found', 0)} crashes"
        )
        return result.dict()
    except Exception as e:
        logger.error(f"Fuzzing failed: {e}", exc_info=True)
        raise
@@ -0,0 +1,65 @@
name: atheris_fuzzing
version: "1.0.0"
vertical: python
description: "Fuzz Python code using Atheris with real-time monitoring. Automatically discovers and fuzzes TestOneInput() functions in user code."
author: "FuzzForge Team"
tags:
  - "fuzzing"
  - "atheris"
  - "python"
  - "coverage"
  - "security"

# Workspace isolation mode (system-level configuration)
# - "isolated" (default): Each workflow run gets its own isolated workspace (safe for concurrent fuzzing)
# - "shared": All runs share the same workspace (for read-only analysis workflows)
# - "copy-on-write": Download once, copy for each run (balances performance and isolation)
workspace_isolation: "isolated"

default_parameters:
  target_file: null
  max_iterations: 1000000
  timeout_seconds: 1800

parameters:
  type: object
  properties:
    target_file:
      type: string
      description: "Python file with TestOneInput() function (auto-discovered if not specified)"
    max_iterations:
      type: integer
      default: 1000000
      description: "Maximum fuzzing iterations"
    timeout_seconds:
      type: integer
      default: 1800
      description: "Fuzzing timeout in seconds (30 minutes)"

output_schema:
  type: object
  properties:
    findings:
      type: array
      description: "Crashes and vulnerabilities found during fuzzing"
      items:
        type: object
        properties:
          title:
            type: string
          severity:
            type: string
          category:
            type: string
          metadata:
            type: object
    summary:
      type: object
      description: "Fuzzing execution summary"
      properties:
        total_executions:
          type: integer
        crashes_found:
          type: integer
        execution_time:
          type: number
@@ -0,0 +1,175 @@
"""
Atheris Fuzzing Workflow - Temporal Version
Fuzzes user-provided Python code using Atheris with real-time monitoring.
"""
from datetime import timedelta
from typing import Dict, Any, Optional

from temporalio import workflow
from temporalio.common import RetryPolicy

# Import for type hints (will be executed by worker)
with workflow.unsafe.imports_passed_through():
    import logging

logger = logging.getLogger(__name__)


@workflow.defn
class AtherisFuzzingWorkflow:
    """
    Fuzz Python code using Atheris.

    User workflow:
    1. User runs: ff workflow run atheris_fuzzing .
    2. CLI uploads project to MinIO
    3. Worker downloads project
    4. Worker fuzzes TestOneInput() function
    5. Crashes reported as findings
    """

    @workflow.run
    async def run(
        self,
        target_id: str,  # MinIO UUID of uploaded user code
        target_file: Optional[str] = None,  # Optional: specific file to fuzz
        max_iterations: int = 1000000,
        timeout_seconds: int = 1800  # 30 minutes default for fuzzing
    ) -> Dict[str, Any]:
        """
        Main workflow execution.

        Args:
            target_id: UUID of the uploaded target in MinIO
            target_file: Optional specific Python file with TestOneInput() (auto-discovered if None)
            max_iterations: Maximum fuzzing iterations
            timeout_seconds: Fuzzing timeout in seconds

        Returns:
            Dictionary containing findings and summary
        """
        workflow_id = workflow.info().workflow_id
        workflow.logger.info(
            f"Starting AtherisFuzzingWorkflow "
            f"(workflow_id={workflow_id}, target_id={target_id}, "
            f"target_file={target_file or 'auto-discover'}, max_iterations={max_iterations}, "
            f"timeout_seconds={timeout_seconds})"
        )
        results = {
            "workflow_id": workflow_id,
            "target_id": target_id,
            "status": "running",
            "steps": []
        }
        try:
            # Get run ID for workspace isolation
            run_id = workflow.info().run_id

            # Step 1: Download user's project from MinIO
            workflow.logger.info("Step 1: Downloading user code from MinIO")
            target_path = await workflow.execute_activity(
                "get_target",
                args=[target_id, run_id, "isolated"],  # target_id, run_id, workspace_isolation
                start_to_close_timeout=timedelta(minutes=5),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=1),
                    maximum_interval=timedelta(seconds=30),
                    maximum_attempts=3
                )
            )
            results["steps"].append({
                "step": "download_target",
                "status": "success",
                "target_path": target_path
            })
            workflow.logger.info(f"✓ User code downloaded to: {target_path}")

            # Step 2: Run Atheris fuzzing
            workflow.logger.info("Step 2: Running Atheris fuzzing")
            # Use defaults if parameters are None
            actual_max_iterations = max_iterations if max_iterations is not None else 1000000
            actual_timeout_seconds = timeout_seconds if timeout_seconds is not None else 1800
            fuzz_config = {
                "target_file": target_file,
                "max_iterations": actual_max_iterations,
                "timeout_seconds": actual_timeout_seconds
            }
            fuzz_results = await workflow.execute_activity(
                "fuzz_with_atheris",
                args=[target_path, fuzz_config],
                start_to_close_timeout=timedelta(seconds=actual_timeout_seconds + 60),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=2),
                    maximum_interval=timedelta(seconds=60),
                    maximum_attempts=1  # Fuzzing shouldn't retry
                )
            )
            results["steps"].append({
                "step": "fuzzing",
                "status": "success",
                "executions": fuzz_results.get("summary", {}).get("total_executions", 0),
                "crashes": fuzz_results.get("summary", {}).get("crashes_found", 0)
            })
            workflow.logger.info(
                f"✓ Fuzzing completed: "
                f"{fuzz_results.get('summary', {}).get('total_executions', 0)} executions, "
                f"{fuzz_results.get('summary', {}).get('crashes_found', 0)} crashes"
            )

            # Step 3: Upload results to MinIO
            workflow.logger.info("Step 3: Uploading results")
            try:
                results_url = await workflow.execute_activity(
                    "upload_results",
                    args=[workflow_id, fuzz_results, "json"],
                    start_to_close_timeout=timedelta(minutes=2)
                )
                results["results_url"] = results_url
                workflow.logger.info(f"✓ Results uploaded to: {results_url}")
            except Exception as e:
                workflow.logger.warning(f"Failed to upload results: {e}")
                results["results_url"] = None

            # Step 4: Cleanup cache
            workflow.logger.info("Step 4: Cleaning up cache")
            try:
                await workflow.execute_activity(
                    "cleanup_cache",
                    args=[target_path, "isolated"],  # target_path, workspace_isolation
                    start_to_close_timeout=timedelta(minutes=1)
                )
                workflow.logger.info("✓ Cache cleaned up")
            except Exception as e:
                workflow.logger.warning(f"Cache cleanup failed: {e}")

            # Mark workflow as successful
            results["status"] = "success"
            results["findings"] = fuzz_results.get("findings", [])
            results["summary"] = fuzz_results.get("summary", {})
            results["sarif"] = fuzz_results.get("sarif") or {}
            workflow.logger.info(
                f"✓ Workflow completed successfully: {workflow_id} "
                f"({results['summary'].get('crashes_found', 0)} crashes found)"
            )
            return results
        except Exception as e:
            workflow.logger.error(f"Workflow failed: {e}")
            results["status"] = "error"
            results["error"] = str(e)
            results["steps"].append({
                "step": "error",
                "status": "failed",
                "error": str(e)
            })
            raise
@@ -0,0 +1,5 @@
"""Cargo Fuzzing Workflow"""
from .workflow import CargoFuzzingWorkflow
__all__ = ["CargoFuzzingWorkflow"]
@@ -0,0 +1,203 @@
"""
Cargo Fuzzing Workflow Activities
Activities specific to the cargo-fuzz fuzzing workflow.
"""
import logging
import sys
from datetime import datetime
from pathlib import Path
from typing import Dict, Any
import os
import httpx
from temporalio import activity

# Configure logging
logger = logging.getLogger(__name__)

# Add toolbox to path for module imports
sys.path.insert(0, '/app/toolbox')


@activity.defn(name="fuzz_with_cargo")
async def fuzz_activity(workspace_path: str, config: dict) -> dict:
    """
    Fuzzing activity using the CargoFuzzer module on user code.

    This activity:
    1. Imports the reusable CargoFuzzer module
    2. Sets up real-time stats callback
    3. Executes fuzzing on user's fuzz_target!() functions
    4. Returns findings as ModuleResult

    Args:
        workspace_path: Path to the workspace directory (user's uploaded Rust project)
        config: Fuzzer configuration (target_name, max_iterations, timeout_seconds, sanitizer)

    Returns:
        Fuzzer results dictionary (findings, summary, metadata)
    """
    logger.info(f"Activity: fuzz_with_cargo (workspace={workspace_path})")
    try:
        # Import reusable CargoFuzzer module
        from modules.fuzzer import CargoFuzzer

        workspace = Path(workspace_path)
        if not workspace.exists():
            raise FileNotFoundError(f"Workspace not found: {workspace_path}")

        # Get activity info for real-time stats
        info = activity.info()
        run_id = info.workflow_id

        # Define stats callback for real-time monitoring
        async def stats_callback(stats_data: Dict[str, Any]):
            """Callback for live fuzzing statistics"""
            try:
                # Prepare stats payload for backend
                coverage_value = stats_data.get("coverage", 0)
                stats_payload = {
                    "run_id": run_id,
                    "workflow": "cargo_fuzzing",
                    "executions": stats_data.get("total_execs", 0),
                    "executions_per_sec": stats_data.get("execs_per_sec", 0.0),
                    "crashes": stats_data.get("crashes", 0),
                    "unique_crashes": stats_data.get("crashes", 0),
                    "coverage": coverage_value,
                    "corpus_size": stats_data.get("corpus_size", 0),
                    "elapsed_time": stats_data.get("elapsed_time", 0),
                    "last_crash_time": None
                }

                # POST stats to backend API for real-time monitoring
                backend_url = os.getenv("BACKEND_URL", "http://backend:8000")
                async with httpx.AsyncClient(timeout=5.0) as client:
                    try:
                        await client.post(
                            f"{backend_url}/fuzzing/{run_id}/stats",
                            json=stats_payload
                        )
                    except Exception as http_err:
                        logger.debug(f"Failed to post stats to backend: {http_err}")

                # Also log for debugging
                logger.info("LIVE_STATS", extra={
                    "stats_type": "fuzzing_live_update",
                    "workflow_type": "cargo_fuzzing",
                    "run_id": run_id,
                    "executions": stats_data.get("total_execs", 0),
                    "executions_per_sec": stats_data.get("execs_per_sec", 0.0),
                    "crashes": stats_data.get("crashes", 0),
                    "corpus_size": stats_data.get("corpus_size", 0),
                    "coverage": stats_data.get("coverage", 0.0),
                    "elapsed_time": stats_data.get("elapsed_time", 0),
                    "timestamp": datetime.utcnow().isoformat()
                })
            except Exception as e:
                logger.error(f"Stats callback error: {e}")

        # Initialize CargoFuzzer module
        fuzzer = CargoFuzzer()

        # Execute fuzzing with stats callback
        module_result = await fuzzer.execute(
            config=config,
            workspace=workspace,
            stats_callback=stats_callback
        )

        # Convert ModuleResult to dictionary
        result_dict = {
            "findings": [],
            "summary": module_result.summary,
            "metadata": module_result.metadata,
            "status": module_result.status,
            "error": module_result.error
        }

        # Convert findings to dict format
        for finding in module_result.findings:
            finding_dict = {
                "id": finding.id,
                "title": finding.title,
                "description": finding.description,
                "severity": finding.severity,
                "category": finding.category,
                "file_path": finding.file_path,
                "line_start": finding.line_start,
                "line_end": finding.line_end,
                "code_snippet": finding.code_snippet,
                "recommendation": finding.recommendation,
                "metadata": finding.metadata
            }
            result_dict["findings"].append(finding_dict)

        # Generate SARIF report from findings
        if module_result.findings:
            # Convert findings to SARIF format
            severity_map = {
                "critical": "error",
                "high": "error",
                "medium": "warning",
                "low": "note",
                "info": "note"
            }
            results = []
            for finding in module_result.findings:
                result = {
                    "ruleId": finding.metadata.get("rule_id", finding.category),
                    "level": severity_map.get(finding.severity, "warning"),
                    "message": {"text": finding.description},
                    "locations": []
                }
                if finding.file_path:
                    location = {
                        "physicalLocation": {
                            "artifactLocation": {"uri": finding.file_path},
                            "region": {
                                "startLine": finding.line_start or 1,
                                "endLine": finding.line_end or finding.line_start or 1
                            }
                        }
                    }
                    result["locations"].append(location)
                results.append(result)

            result_dict["sarif"] = {
                "version": "2.1.0",
                "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
                "runs": [{
                    "tool": {
                        "driver": {
                            "name": "cargo-fuzz",
                            "version": "0.11.2"
                        }
                    },
                    "results": results
                }]
            }
        else:
            result_dict["sarif"] = {
                "version": "2.1.0",
                "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
                "runs": []
            }

        logger.info(
            f"Fuzzing activity completed: {len(module_result.findings)} crashes found, "
            f"{module_result.summary.get('total_executions', 0)} executions"
        )
        return result_dict
    except Exception as e:
        logger.error(f"Fuzzing activity failed: {e}", exc_info=True)
        raise
@@ -0,0 +1,71 @@
name: cargo_fuzzing
version: "1.0.0"
vertical: rust
description: "Fuzz Rust code using cargo-fuzz with real-time monitoring. Automatically discovers and fuzzes fuzz_target!() functions in user code."
author: "FuzzForge Team"
tags:
  - "fuzzing"
  - "cargo-fuzz"
  - "rust"
  - "libfuzzer"
  - "memory-safety"

# Workspace isolation mode (system-level configuration)
# - "isolated" (default): Each workflow run gets its own isolated workspace (safe for concurrent fuzzing)
# - "shared": All runs share the same workspace (for read-only analysis workflows)
# - "copy-on-write": Download once, copy for each run (balances performance and isolation)
workspace_isolation: "isolated"

default_parameters:
  target_name: null
  max_iterations: 1000000
  timeout_seconds: 1800
  sanitizer: "address"

parameters:
  type: object
  properties:
    target_name:
      type: string
      description: "Fuzz target name from fuzz/fuzz_targets/ (auto-discovered if not specified)"
    max_iterations:
      type: integer
      default: 1000000
      description: "Maximum fuzzing iterations"
    timeout_seconds:
      type: integer
      default: 1800
      description: "Fuzzing timeout in seconds (30 minutes)"
    sanitizer:
      type: string
      enum: ["address", "memory", "undefined"]
      default: "address"
      description: "Sanitizer to use (address, memory, undefined)"

output_schema:
  type: object
  properties:
    findings:
      type: array
      description: "Crashes and memory safety issues found during fuzzing"
      items:
        type: object
        properties:
          title:
            type: string
          severity:
            type: string
          category:
            type: string
          metadata:
            type: object
    summary:
      type: object
      description: "Fuzzing execution summary"
      properties:
        total_executions:
          type: integer
        crashes_found:
          type: integer
        execution_time:
          type: number
@@ -0,0 +1,180 @@
"""
Cargo Fuzzing Workflow - Temporal Version
Fuzzes user-provided Rust code using cargo-fuzz with real-time monitoring.
"""
from datetime import timedelta
from typing import Dict, Any, Optional

from temporalio import workflow
from temporalio.common import RetryPolicy

# Import for type hints (will be executed by worker)
with workflow.unsafe.imports_passed_through():
    import logging

logger = logging.getLogger(__name__)


@workflow.defn
class CargoFuzzingWorkflow:
    """
    Fuzz Rust code using cargo-fuzz (libFuzzer).

    User workflow:
    1. User runs: ff workflow run cargo_fuzzing .
    2. CLI uploads Rust project to MinIO
    3. Worker downloads project
    4. Worker discovers fuzz targets in fuzz/fuzz_targets/
    5. Worker fuzzes the target with cargo-fuzz
    6. Crashes reported as findings
    """

    @workflow.run
    async def run(
        self,
        target_id: str,  # MinIO UUID of uploaded user code
        target_name: Optional[str] = None,  # Optional: specific fuzz target name
        max_iterations: int = 1000000,
        timeout_seconds: int = 1800,  # 30 minutes default for fuzzing
        sanitizer: str = "address"
    ) -> Dict[str, Any]:
        """
        Main workflow execution.

        Args:
            target_id: UUID of the uploaded target in MinIO
            target_name: Optional specific fuzz target name (auto-discovered if None)
            max_iterations: Maximum fuzzing iterations
            timeout_seconds: Fuzzing timeout in seconds
            sanitizer: Sanitizer to use (address, memory, undefined)

        Returns:
            Dictionary containing findings and summary
        """
        workflow_id = workflow.info().workflow_id
        workflow.logger.info(
            f"Starting CargoFuzzingWorkflow "
            f"(workflow_id={workflow_id}, target_id={target_id}, "
            f"target_name={target_name or 'auto-discover'}, max_iterations={max_iterations}, "
            f"timeout_seconds={timeout_seconds}, sanitizer={sanitizer})"
        )
        results = {
            "workflow_id": workflow_id,
            "target_id": target_id,
            "status": "running",
            "steps": []
        }
        try:
            # Get run ID for workspace isolation
            run_id = workflow.info().run_id

            # Step 1: Download user's Rust project from MinIO
            workflow.logger.info("Step 1: Downloading user code from MinIO")
            target_path = await workflow.execute_activity(
                "get_target",
                args=[target_id, run_id, "isolated"],  # target_id, run_id, workspace_isolation
                start_to_close_timeout=timedelta(minutes=5),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=1),
                    maximum_interval=timedelta(seconds=30),
                    maximum_attempts=3
                )
            )
            results["steps"].append({
                "step": "download_target",
                "status": "success",
                "target_path": target_path
            })
            workflow.logger.info(f"✓ User code downloaded to: {target_path}")

            # Step 2: Run cargo-fuzz
            workflow.logger.info("Step 2: Running cargo-fuzz")
            # Use defaults if parameters are None
            actual_max_iterations = max_iterations if max_iterations is not None else 1000000
            actual_timeout_seconds = timeout_seconds if timeout_seconds is not None else 1800
            actual_sanitizer = sanitizer if sanitizer is not None else "address"
            fuzz_config = {
                "target_name": target_name,
                "max_iterations": actual_max_iterations,
                "timeout_seconds": actual_timeout_seconds,
                "sanitizer": actual_sanitizer
            }
            fuzz_results = await workflow.execute_activity(
                "fuzz_with_cargo",
                args=[target_path, fuzz_config],
                start_to_close_timeout=timedelta(seconds=actual_timeout_seconds + 120),
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=2),
                    maximum_interval=timedelta(seconds=60),
                    maximum_attempts=1  # Fuzzing shouldn't retry
                )
            )
            results["steps"].append({
                "step": "fuzzing",
                "status": "success",
                "executions": fuzz_results.get("summary", {}).get("total_executions", 0),
                "crashes": fuzz_results.get("summary", {}).get("crashes_found", 0)
            })
            workflow.logger.info(
                f"✓ Fuzzing completed: "
                f"{fuzz_results.get('summary', {}).get('total_executions', 0)} executions, "
                f"{fuzz_results.get('summary', {}).get('crashes_found', 0)} crashes"
            )

            # Step 3: Upload results to MinIO
            workflow.logger.info("Step 3: Uploading results")
            try:
                results_url = await workflow.execute_activity(
                    "upload_results",
                    args=[workflow_id, fuzz_results, "json"],
                    start_to_close_timeout=timedelta(minutes=2)
                )
                results["results_url"] = results_url
                workflow.logger.info(f"✓ Results uploaded to: {results_url}")
            except Exception as e:
                workflow.logger.warning(f"Failed to upload results: {e}")
                results["results_url"] = None

            # Step 4: Cleanup cache
            workflow.logger.info("Step 4: Cleaning up cache")
            try:
                await workflow.execute_activity(
                    "cleanup_cache",
                    args=[target_path, "isolated"],  # target_path, workspace_isolation
                    start_to_close_timeout=timedelta(minutes=1)
                )
                workflow.logger.info("✓ Cache cleaned up")
            except Exception as e:
                workflow.logger.warning(f"Cache cleanup failed: {e}")

            # Mark workflow as successful
            results["status"] = "success"
            results["findings"] = fuzz_results.get("findings", [])
            results["summary"] = fuzz_results.get("summary", {})
            results["sarif"] = fuzz_results.get("sarif") or {}
            workflow.logger.info(
                f"✓ Workflow completed successfully: {workflow_id} "
                f"({results['summary'].get('crashes_found', 0)} crashes found)"
            )
            return results
        except Exception as e:
            workflow.logger.error(f"Workflow failed: {e}")
            results["status"] = "error"
            results["error"] = str(e)
            results["steps"].append({
                "step": "error",
                "status": "failed",
                "error": str(e)
            })
            raise
@@ -1,12 +0,0 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
@@ -1,47 +0,0 @@
# Secret Detection Workflow Dockerfile
FROM prefecthq/prefect:3-python3.11
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
wget \
git \
ca-certificates \
gnupg \
&& rm -rf /var/lib/apt/lists/*
# Install TruffleHog (use direct binary download to avoid install script issues)
RUN curl -sSfL "https://github.com/trufflesecurity/trufflehog/releases/download/v3.63.2/trufflehog_3.63.2_linux_amd64.tar.gz" -o trufflehog.tar.gz \
&& tar -xzf trufflehog.tar.gz \
&& mv trufflehog /usr/local/bin/ \
&& rm trufflehog.tar.gz
# Install Gitleaks (use specific version to avoid API rate limiting)
RUN wget https://github.com/gitleaks/gitleaks/releases/download/v8.18.2/gitleaks_8.18.2_linux_x64.tar.gz \
&& tar -xzf gitleaks_8.18.2_linux_x64.tar.gz \
&& mv gitleaks /usr/local/bin/ \
&& rm gitleaks_8.18.2_linux_x64.tar.gz
# Verify installations
RUN trufflehog --version && gitleaks version
# Set working directory
WORKDIR /opt/prefect
# Create toolbox directory structure
RUN mkdir -p /opt/prefect/toolbox
# Set environment variables
ENV PYTHONPATH=/opt/prefect/toolbox:/opt/prefect/toolbox/workflows
ENV WORKFLOW_NAME=secret_detection_scan
# The toolbox code will be mounted at runtime from the backend container
# This includes:
# - /opt/prefect/toolbox/modules/base.py
# - /opt/prefect/toolbox/modules/secret_detection/ (TruffleHog, Gitleaks modules)
# - /opt/prefect/toolbox/modules/reporter/ (SARIF reporter)
# - /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan/
VOLUME /opt/prefect/toolbox
# Set working directory for execution
WORKDIR /opt/prefect
@@ -1,58 +0,0 @@
# Secret Detection Workflow Dockerfile - Self-Contained Version
# This version copies all required modules into the image for complete isolation
FROM prefecthq/prefect:3-python3.11
# Install system dependencies
RUN apt-get update && apt-get install -y \
curl \
wget \
git \
ca-certificates \
gnupg \
&& rm -rf /var/lib/apt/lists/*
# Install TruffleHog
RUN curl -sSfL https://raw.githubusercontent.com/trufflesecurity/trufflehog/main/scripts/install.sh | sh -s -- -b /usr/local/bin
# Install Gitleaks
RUN wget https://github.com/gitleaks/gitleaks/releases/latest/download/gitleaks_linux_x64.tar.gz \
&& tar -xzf gitleaks_linux_x64.tar.gz \
&& mv gitleaks /usr/local/bin/ \
&& rm gitleaks_linux_x64.tar.gz
# Verify installations
RUN trufflehog --version && gitleaks version
# Set working directory
WORKDIR /opt/prefect
# Create directory structure
RUN mkdir -p /opt/prefect/toolbox/modules/secret_detection \
/opt/prefect/toolbox/modules/reporter \
/opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan
# Copy the base module and required modules
COPY toolbox/modules/base.py /opt/prefect/toolbox/modules/base.py
COPY toolbox/modules/__init__.py /opt/prefect/toolbox/modules/__init__.py
COPY toolbox/modules/secret_detection/ /opt/prefect/toolbox/modules/secret_detection/
COPY toolbox/modules/reporter/ /opt/prefect/toolbox/modules/reporter/
# Copy the workflow code
COPY toolbox/workflows/comprehensive/secret_detection_scan/ /opt/prefect/toolbox/workflows/comprehensive/secret_detection_scan/
# Copy toolbox init files
COPY toolbox/__init__.py /opt/prefect/toolbox/__init__.py
COPY toolbox/workflows/__init__.py /opt/prefect/toolbox/workflows/__init__.py
COPY toolbox/workflows/comprehensive/__init__.py /opt/prefect/toolbox/workflows/comprehensive/__init__.py
# Install Python dependencies for the modules
RUN pip install --no-cache-dir \
pydantic
# Set environment variables
ENV PYTHONPATH=/opt/prefect/toolbox:/opt/prefect/toolbox/workflows
ENV WORKFLOW_NAME=secret_detection_scan
# Set default command (can be overridden)
CMD ["python", "-m", "toolbox.workflows.comprehensive.secret_detection_scan.workflow"]
@@ -1,130 +0,0 @@
# Secret Detection Scan Workflow
This workflow performs comprehensive secret detection using multiple industry-standard tools:
- **TruffleHog**: Comprehensive secret detection with verification capabilities
- **Gitleaks**: Git-specific secret scanning and leak detection
## Features
- **Parallel Execution**: Runs TruffleHog and Gitleaks concurrently for faster results
- **Deduplication**: Automatically removes duplicate findings across tools
- **SARIF Output**: Generates standardized SARIF reports for integration with security tools
- **Configurable**: Supports extensive configuration for both tools
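The deduplication feature can be illustrated with a minimal sketch. It assumes findings are plain dictionaries carrying `file_path`, `line_start`, and `title` keys (the same keys the workflow code in this change reads); the sample findings and the helper name `deduplicate_findings` are invented for illustration.

```python
from typing import Any, Dict, List, Set, Tuple


def deduplicate_findings(findings: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first finding seen for each (file, line, title-prefix) signature."""
    seen: Set[Tuple[str, int, str]] = set()
    unique: List[Dict[str, Any]] = []
    for finding in findings:
        signature = (
            finding.get("file_path", ""),
            finding.get("line_start", 0),
            finding.get("title", "").lower()[:50],  # case-insensitive, first 50 chars
        )
        if signature not in seen:
            seen.add(signature)
            unique.append(finding)
    return unique


# Hypothetical sample data: both tools report the same secret on the same line.
trufflehog = [{"file_path": "app/.env", "line_start": 3, "title": "AWS Access Key"}]
gitleaks = [
    {"file_path": "app/.env", "line_start": 3, "title": "aws access key"},
    {"file_path": "app/config.py", "line_start": 12, "title": "Generic API Key"},
]
unique = deduplicate_findings(trufflehog + gitleaks)
print(len(unique))
```

Because the signature includes a title prefix, two tools that describe the same secret with different titles would still be reported twice; that is a property of this signature choice, not something a sketch can resolve.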
## Dependencies
### Required Modules
- `toolbox.modules.secret_detection.trufflehog`
- `toolbox.modules.secret_detection.gitleaks`
- `toolbox.modules.reporter` (SARIF reporter)
- `toolbox.modules.base` (Base module interface)
### External Tools
- TruffleHog v3.63.2+
- Gitleaks v8.18.0+
## Docker Deployment
This workflow provides two Docker deployment approaches:
### 1. Volume-Based Approach (Default: `Dockerfile`)
**Advantages:**
- Live code updates without rebuilding images
- Smaller image sizes
- Consistent module versions across workflows
- Faster development iteration
**How it works:**
- Docker image contains only external tools (TruffleHog, Gitleaks)
- Python modules are mounted at runtime from the backend container
- Backend manages code synchronization via shared volumes
### 2. Self-Contained Approach (`Dockerfile.self-contained`)
**Advantages:**
- Complete isolation and reproducibility
- No runtime dependencies on backend code
- Can run independently of FuzzForge platform
- Better for CI/CD integration
**How it works:**
- All required Python modules are copied into the Docker image
- Image is completely self-contained
- Larger image size but fully portable
## Configuration
### TruffleHog Configuration
```jsonc
{
"trufflehog_config": {
"verify": true, // Verify discovered secrets
"concurrency": 10, // Number of concurrent workers
"max_depth": 10, // Maximum directory depth
"include_detectors": [], // Specific detectors to include
"exclude_detectors": [] // Specific detectors to exclude
}
}
```
### Gitleaks Configuration
```jsonc
{
"gitleaks_config": {
"scan_mode": "detect", // "detect" or "protect"
"redact": true, // Redact secrets in output
"max_target_megabytes": 100, // Maximum file size (MB)
"no_git": false, // Scan without Git context
"config_file": "", // Custom Gitleaks config
"baseline_file": "" // Baseline file for known findings
}
}
```
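User-supplied options are layered over the workflow's defaults rather than replacing them. A minimal sketch of that merge, assuming the Gitleaks defaults used by the workflow code in this change; `merge_config` is a hypothetical helper name:

```python
from typing import Any, Dict, Optional

# Defaults as they appear in the workflow code of this change.
DEFAULT_GITLEAKS_CONFIG: Dict[str, Any] = {
    "scan_mode": "detect",
    "redact": True,
    "max_target_megabytes": 100,
    "no_git": True,
}


def merge_config(
    defaults: Dict[str, Any], overrides: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a new dict where caller-supplied keys win over defaults."""
    return {**defaults, **(overrides or {})}


effective = merge_config(DEFAULT_GITLEAKS_CONFIG, {"max_target_megabytes": 200})
print(effective)
```

The merge is shallow, so a nested object passed as an override replaces the default value wholesale instead of being merged key by key.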
## Usage Example
```bash
curl -X POST "http://localhost:8000/workflows/secret_detection_scan/submit" \
-H "Content-Type: application/json" \
-d '{
"target_path": "/path/to/scan",
"volume_mode": "ro",
"parameters": {
"trufflehog_config": {
"verify": true,
"concurrency": 15
},
"gitleaks_config": {
"scan_mode": "detect",
"max_target_megabytes": 200
}
}
}'
```
## Output Format
The workflow generates a SARIF report containing:
- All unique findings from both tools
- Severity levels mapped to standard scale
- File locations and line numbers
- Detailed descriptions and recommendations
- Tool-specific metadata
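The overall shape of such a report can be sketched as follows. This is not the `SARIFReporter` implementation, which is not shown here; it is an illustration of a single-run SARIF 2.1.0 log, assuming findings with `file_path`, `line_start`, and `title` keys. The `rule_id` key, the fixed `warning` level, and the function name are assumptions made for the example.

```python
from typing import Any, Dict, List

SARIF_SCHEMA = (
    "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/"
    "master/Schemata/sarif-schema-2.1.0.json"
)


def findings_to_sarif(findings: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap findings in a single-run SARIF 2.1.0 log."""
    results = []
    for finding in findings:
        results.append({
            "ruleId": finding.get("rule_id", "unknown"),  # hypothetical key
            "level": "warning",  # fixed here; a real reporter would map severity
            "message": {"text": finding.get("title", "")},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": finding.get("file_path", "")},
                    "region": {"startLine": finding.get("line_start", 1)},
                }
            }],
        })
    return {
        "$schema": SARIF_SCHEMA,
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "FuzzForge Secret Detection", "version": "1.0.0"}},
            "results": results,
        }],
    }


# Hypothetical sample finding.
report = findings_to_sarif([
    {"rule_id": "aws-access-key", "file_path": "app/.env", "line_start": 3,
     "title": "AWS Access Key"},
])
print(len(report["runs"][0]["results"]))
```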
## Performance Considerations
- **TruffleHog**: CPU-intensive with verification enabled
- **Gitleaks**: Memory-intensive for large repositories
- **Recommended Resources**: 512Mi memory, 500m CPU
- **Typical Runtime**: 1-5 minutes for small repos, 10-30 minutes for large ones
## Security Notes
- Secrets are redacted in output by default
- Verified secrets are marked with higher severity
- Both tools support custom rules and exclusions
- Consider using baseline files for known false positives
@@ -1,17 +0,0 @@
"""
Secret Detection Scan Workflow
This package contains the comprehensive secret detection workflow that combines
multiple secret detection tools for thorough analysis.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
@@ -1,113 +0,0 @@
name: secret_detection_scan
version: "2.0.0"
description: "Comprehensive secret detection using TruffleHog and Gitleaks"
author: "FuzzForge Team"
category: "comprehensive"
tags:
- "secrets"
- "credentials"
- "detection"
- "trufflehog"
- "gitleaks"
- "comprehensive"
supported_volume_modes:
- "ro"
- "rw"
default_volume_mode: "ro"
default_target_path: "/workspace"
requirements:
tools:
- "trufflehog"
- "gitleaks"
resources:
memory: "512Mi"
cpu: "500m"
timeout: 1800
has_docker: true
default_parameters:
target_path: "/workspace"
volume_mode: "ro"
trufflehog_config: {}
gitleaks_config: {}
reporter_config: {}
parameters:
type: object
properties:
target_path:
type: string
default: "/workspace"
description: "Path to analyze"
volume_mode:
type: string
enum: ["ro", "rw"]
default: "ro"
description: "Volume mount mode"
trufflehog_config:
type: object
description: "TruffleHog configuration"
properties:
verify:
type: boolean
description: "Verify discovered secrets"
concurrency:
type: integer
description: "Number of concurrent workers"
max_depth:
type: integer
description: "Maximum directory depth to scan"
include_detectors:
type: array
items:
type: string
description: "Specific detectors to include"
exclude_detectors:
type: array
items:
type: string
description: "Specific detectors to exclude"
gitleaks_config:
type: object
description: "Gitleaks configuration"
properties:
scan_mode:
type: string
enum: ["detect", "protect"]
description: "Scan mode"
redact:
type: boolean
description: "Redact secrets in output"
max_target_megabytes:
type: integer
description: "Maximum file size to scan (MB)"
no_git:
type: boolean
description: "Scan files without Git context"
config_file:
type: string
description: "Path to custom configuration file"
baseline_file:
type: string
description: "Path to baseline file"
reporter_config:
type: object
description: "SARIF reporter configuration"
properties:
output_file:
type: string
description: "Output SARIF file name"
include_code_flows:
type: boolean
description: "Include code flow information"
output_schema:
type: object
properties:
sarif:
type: object
description: "SARIF-formatted security findings"
@@ -1,290 +0,0 @@
"""
Secret Detection Scan Workflow
This workflow performs comprehensive secret detection using multiple tools:
- TruffleHog: Comprehensive secret detection with verification
- Gitleaks: Git-specific secret scanning
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
import sys
import logging
from pathlib import Path
from typing import Dict, Any, List, Optional
from prefect import flow, task
from prefect.artifacts import create_markdown_artifact, create_table_artifact
import asyncio
import json
# Add modules to path
sys.path.insert(0, '/app')
# Import modules
from toolbox.modules.secret_detection.trufflehog import TruffleHogModule
from toolbox.modules.secret_detection.gitleaks import GitleaksModule
from toolbox.modules.reporter import SARIFReporter
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@task(name="trufflehog_scan")
async def run_trufflehog_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
"""
Task to run TruffleHog secret detection.
Args:
workspace: Path to the workspace
config: TruffleHog configuration
Returns:
TruffleHog results
"""
logger.info("Running TruffleHog secret detection")
module = TruffleHogModule()
result = await module.execute(config, workspace)
logger.info(f"TruffleHog completed: {result.summary.get('total_secrets', 0)} secrets found")
return result.dict()
@task(name="gitleaks_scan")
async def run_gitleaks_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
"""
Task to run Gitleaks secret detection.
Args:
workspace: Path to the workspace
config: Gitleaks configuration
Returns:
Gitleaks results
"""
logger.info("Running Gitleaks secret detection")
module = GitleaksModule()
result = await module.execute(config, workspace)
logger.info(f"Gitleaks completed: {result.summary.get('total_leaks', 0)} leaks found")
return result.dict()
@task(name="aggregate_findings")
async def aggregate_findings_task(
trufflehog_results: Dict[str, Any],
gitleaks_results: Dict[str, Any],
config: Dict[str, Any],
workspace: Path
) -> Dict[str, Any]:
"""
Task to aggregate findings from all secret detection tools.
Args:
trufflehog_results: Results from TruffleHog
gitleaks_results: Results from Gitleaks
config: Reporter configuration
workspace: Path to workspace
Returns:
Aggregated SARIF report
"""
logger.info("Aggregating secret detection findings")
# Combine all findings
all_findings = []
# Add TruffleHog findings
trufflehog_findings = trufflehog_results.get("findings", [])
all_findings.extend(trufflehog_findings)
# Add Gitleaks findings
gitleaks_findings = gitleaks_results.get("findings", [])
all_findings.extend(gitleaks_findings)
# Deduplicate findings based on file path, start line, and title prefix
unique_findings = []
seen_signatures = set()
for finding in all_findings:
# Create signature for deduplication
signature = (
finding.get("file_path", ""),
finding.get("line_start", 0),
finding.get("title", "").lower()[:50] # First 50 chars of title
)
if signature not in seen_signatures:
seen_signatures.add(signature)
unique_findings.append(finding)
else:
logger.debug(f"Deduplicated finding: {signature}")
logger.info(f"Aggregated {len(unique_findings)} unique findings from {len(all_findings)} total")
# Generate SARIF report
reporter = SARIFReporter()
reporter_config = {
**config,
"findings": unique_findings,
"tool_name": "FuzzForge Secret Detection",
"tool_version": "1.0.0",
"tool_description": "Comprehensive secret detection using TruffleHog and Gitleaks"
}
result = await reporter.execute(reporter_config, workspace)
return result.dict().get("sarif", {})
@flow(name="secret_detection_scan", log_prints=True)
async def main_flow(
target_path: str = "/workspace",
volume_mode: str = "ro",
trufflehog_config: Optional[Dict[str, Any]] = None,
gitleaks_config: Optional[Dict[str, Any]] = None,
reporter_config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Main secret detection workflow.
This workflow:
1. Runs TruffleHog for comprehensive secret detection
2. Runs Gitleaks for Git-specific secret detection
3. Aggregates and deduplicates findings
4. Generates a unified SARIF report
Args:
target_path: Path to the mounted workspace (default: /workspace)
volume_mode: Volume mount mode (ro/rw)
trufflehog_config: Configuration for TruffleHog
gitleaks_config: Configuration for Gitleaks
reporter_config: Configuration for SARIF reporter
Returns:
SARIF-formatted findings report
"""
logger.info("Starting comprehensive secret detection workflow")
logger.info(f"Workspace: {target_path}, Mode: {volume_mode}")
# Set workspace path
workspace = Path(target_path)
if not workspace.exists():
logger.error(f"Workspace does not exist: {workspace}")
return {
"error": f"Workspace not found: {workspace}",
"sarif": None
}
# Default configurations - merge with provided configs to ensure defaults are always applied
default_trufflehog_config = {
"verify": False,
"concurrency": 10,
"max_depth": 10,
"no_git": True # Add no_git for filesystem scanning
}
trufflehog_config = {**default_trufflehog_config, **(trufflehog_config or {})}
default_gitleaks_config = {
"scan_mode": "detect",
"redact": True,
"max_target_megabytes": 100,
"no_git": True # Critical for non-git directories
}
gitleaks_config = {**default_gitleaks_config, **(gitleaks_config or {})}
default_reporter_config = {
"include_code_flows": False
}
reporter_config = {**default_reporter_config, **(reporter_config or {})}
try:
# Run secret detection tools in parallel
logger.info("Phase 1: Running secret detection tools")
# Create tasks for parallel execution
trufflehog_task_result = run_trufflehog_task(workspace, trufflehog_config)
gitleaks_task_result = run_gitleaks_task(workspace, gitleaks_config)
# Wait for both to complete
trufflehog_results, gitleaks_results = await asyncio.gather(
trufflehog_task_result,
gitleaks_task_result,
return_exceptions=True
)
# Handle any exceptions
if isinstance(trufflehog_results, Exception):
logger.error(f"TruffleHog failed: {trufflehog_results}")
trufflehog_results = {"findings": [], "status": "failed"}
if isinstance(gitleaks_results, Exception):
logger.error(f"Gitleaks failed: {gitleaks_results}")
gitleaks_results = {"findings": [], "status": "failed"}
# Aggregate findings
logger.info("Phase 2: Aggregating findings")
sarif_report = await aggregate_findings_task(
trufflehog_results,
gitleaks_results,
reporter_config,
workspace
)
# Log summary
if sarif_report and "runs" in sarif_report:
results_count = len(sarif_report["runs"][0].get("results", []))
logger.info(f"Workflow completed successfully with {results_count} unique secret findings")
# Log tool-specific stats
trufflehog_count = len(trufflehog_results.get("findings", []))
gitleaks_count = len(gitleaks_results.get("findings", []))
logger.info(f"Tool results - TruffleHog: {trufflehog_count}, Gitleaks: {gitleaks_count}")
else:
logger.info("Workflow completed successfully with no findings")
return sarif_report
except Exception as e:
logger.error(f"Secret detection workflow failed: {e}")
# Return error in SARIF format
return {
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
"version": "2.1.0",
"runs": [
{
"tool": {
"driver": {
"name": "FuzzForge Secret Detection",
"version": "1.0.0"
}
},
"results": [],
"invocations": [
{
"executionSuccessful": False,
"exitCode": 1,
"exitCodeDescription": str(e)
}
]
}
]
}
if __name__ == "__main__":
# For local testing
import asyncio
asyncio.run(main_flow(
target_path="/tmp/test",
trufflehog_config={"verify": True, "max_depth": 5},
gitleaks_config={"scan_mode": "detect"}
))
@@ -0,0 +1,113 @@
name: ossfuzz_campaign
version: "1.0.0"
vertical: ossfuzz
description: "Generic OSS-Fuzz fuzzing campaign. Automatically reads project configuration from OSS-Fuzz repo and runs fuzzing using Google's infrastructure."
author: "FuzzForge Team"
tags:
- "fuzzing"
- "oss-fuzz"
- "libfuzzer"
- "afl"
- "honggfuzz"
- "memory-safety"
- "security"
# Workspace isolation mode
# OSS-Fuzz campaigns use isolated mode so concurrent campaigns do not share a workspace
workspace_isolation: "isolated"
default_parameters:
project_name: null
campaign_duration_hours: 1
override_engine: null
override_sanitizer: null
max_iterations: null
parameters:
type: object
required:
- project_name
properties:
project_name:
type: string
description: "OSS-Fuzz project name (e.g., 'curl', 'sqlite3', 'libxml2')"
examples:
- "curl"
- "sqlite3"
- "libxml2"
- "openssl"
- "zlib"
campaign_duration_hours:
type: integer
default: 1
minimum: 1
maximum: 168 # 1 week max
description: "How many hours to run the fuzzing campaign"
override_engine:
type: string
enum: ["libfuzzer", "afl", "honggfuzz"]
description: "Override fuzzing engine from project.yaml (optional)"
override_sanitizer:
type: string
enum: ["address", "memory", "undefined", "dataflow"]
description: "Override sanitizer from project.yaml (optional)"
max_iterations:
type: integer
minimum: 1000
description: "Limit on fuzzing iterations (optional)"
output_schema:
type: object
properties:
project_name:
type: string
description: "OSS-Fuzz project that was fuzzed"
summary:
type: object
description: "Campaign execution summary"
properties:
total_executions:
type: integer
crashes_found:
type: integer
unique_crashes:
type: integer
duration_hours:
type: number
engine_used:
type: string
sanitizer_used:
type: string
crashes:
type: array
description: "List of crash file paths"
items:
type: string
sarif:
type: object
description: "SARIF-formatted crash reports (future)"
examples:
- name: "Fuzz curl for 1 hour"
parameters:
project_name: "curl"
campaign_duration_hours: 1
- name: "Fuzz sqlite3 with AFL"
parameters:
project_name: "sqlite3"
campaign_duration_hours: 2
override_engine: "afl"
- name: "Fuzz libxml2 with memory sanitizer"
parameters:
project_name: "libxml2"
campaign_duration_hours: 6
override_sanitizer: "memory"
@@ -0,0 +1,219 @@
"""
OSS-Fuzz Campaign Workflow - Temporal Version
Generic workflow for running OSS-Fuzz campaigns using Google's infrastructure.
Automatically reads project configuration from OSS-Fuzz project.yaml files.
"""
import asyncio
from datetime import timedelta
from typing import Dict, Any, Optional
from temporalio import workflow
from temporalio.common import RetryPolicy
# Pass these imports through the workflow sandbox instead of re-importing them on each run
with workflow.unsafe.imports_passed_through():
import logging
logger = logging.getLogger(__name__)
@workflow.defn
class OssfuzzCampaignWorkflow:
"""
Generic OSS-Fuzz fuzzing campaign workflow.
User workflow:
1. User runs: ff workflow run ossfuzz_campaign . project_name=curl
2. Worker loads project config from OSS-Fuzz repo
3. Worker builds project using OSS-Fuzz's build system
4. Worker runs fuzzing with engines from project.yaml
5. Crashes and corpus are reported as findings
"""
@workflow.run
async def run(
self,
target_id: str, # Required by FuzzForge (not used, OSS-Fuzz downloads from Google)
project_name: str, # Required: OSS-Fuzz project name (e.g., "curl", "sqlite3")
campaign_duration_hours: int = 1,
override_engine: Optional[str] = None, # Override engine from project.yaml
override_sanitizer: Optional[str] = None, # Override sanitizer from project.yaml
max_iterations: Optional[int] = None # Optional: limit fuzzing iterations
) -> Dict[str, Any]:
"""
Main workflow execution.
Args:
target_id: UUID of uploaded target (not used, required by FuzzForge)
project_name: Name of OSS-Fuzz project (e.g., "curl", "sqlite3", "libxml2")
campaign_duration_hours: How many hours to fuzz (default: 1)
override_engine: Override fuzzing engine from project.yaml
override_sanitizer: Override sanitizer from project.yaml
max_iterations: Optional limit on fuzzing iterations
Returns:
Dictionary containing crashes, stats, and SARIF report
"""
workflow_id = workflow.info().workflow_id
workflow.logger.info(
f"Starting OSS-Fuzz Campaign for project '{project_name}' "
f"(workflow_id={workflow_id}, duration={campaign_duration_hours}h)"
)
results = {
"workflow_id": workflow_id,
"project_name": project_name,
"status": "running",
"steps": []
}
try:
# Step 1: Load OSS-Fuzz project configuration
workflow.logger.info(f"Step 1: Loading project config for '{project_name}'")
project_config = await workflow.execute_activity(
"load_ossfuzz_project",
args=[project_name],
start_to_close_timeout=timedelta(minutes=5),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=1),
maximum_interval=timedelta(seconds=30),
maximum_attempts=3
)
)
results["steps"].append({
"step": "load_config",
"status": "success",
"language": project_config.get("language"),
"engines": project_config.get("fuzzing_engines", []),
"sanitizers": project_config.get("sanitizers", [])
})
workflow.logger.info(
f"✓ Loaded config: language={project_config.get('language')}, "
f"engines={project_config.get('fuzzing_engines')}"
)
# Step 2: Build project using OSS-Fuzz infrastructure
workflow.logger.info(f"Step 2: Building project '{project_name}'")
build_result = await workflow.execute_activity(
"build_ossfuzz_project",
args=[
project_name,
project_config,
override_sanitizer,
override_engine
],
start_to_close_timeout=timedelta(minutes=30),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=2
)
)
results["steps"].append({
"step": "build_project",
"status": "success",
"fuzz_targets": len(build_result.get("fuzz_targets", [])),
"sanitizer": build_result.get("sanitizer_used"),
"engine": build_result.get("engine_used")
})
workflow.logger.info(
f"✓ Build completed: {len(build_result.get('fuzz_targets', []))} fuzz targets found"
)
if not build_result.get("fuzz_targets"):
raise Exception(f"No fuzz targets found for project {project_name}")
# Step 3: Run fuzzing on discovered targets
workflow.logger.info(f"Step 3: Fuzzing {len(build_result['fuzz_targets'])} targets")
# Determine which engine to use
engine_to_use = override_engine if override_engine else build_result["engine_used"]
duration_seconds = campaign_duration_hours * 3600
# Fuzz each target (in parallel if multiple targets)
fuzz_futures = []
for target_path in build_result["fuzz_targets"]:
future = workflow.execute_activity(
"fuzz_target",
args=[target_path, engine_to_use, duration_seconds, None, None],
start_to_close_timeout=timedelta(seconds=duration_seconds + 300),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=1 # Fuzzing shouldn't retry
)
)
fuzz_futures.append(future)
# Wait for all fuzzing to complete
fuzz_results = await asyncio.gather(*fuzz_futures, return_exceptions=True)
# Aggregate results
total_execs = 0
total_crashes = 0
all_crashes = []
for i, result in enumerate(fuzz_results):
if isinstance(result, Exception):
workflow.logger.error(f"Fuzzing failed for target {i}: {result}")
continue
total_execs += result.get("total_executions", 0)
total_crashes += result.get("crashes", 0)
all_crashes.extend(result.get("crash_files", []))
results["steps"].append({
"step": "fuzzing",
"status": "success",
"total_executions": total_execs,
"crashes_found": total_crashes,
"targets_fuzzed": len(build_result["fuzz_targets"])
})
workflow.logger.info(
f"✓ Fuzzing completed: {total_execs} executions, {total_crashes} crashes"
)
# Step 4: Generate SARIF report
workflow.logger.info("Step 4: Skipping SARIF generation (not yet implemented)")
# TODO: Implement crash minimization and SARIF generation
# For now, return raw results
results["status"] = "success"
results["summary"] = {
"project": project_name,
"total_executions": total_execs,
"crashes_found": total_crashes,
"unique_crashes": len(set(all_crashes)),
"duration_hours": campaign_duration_hours,
"engine_used": engine_to_use,
"sanitizer_used": build_result.get("sanitizer_used")
}
results["crashes"] = all_crashes[:100] # Limit to first 100 crashes
workflow.logger.info(
f"✓ Campaign completed: {project_name} - "
f"{total_execs} execs, {total_crashes} crashes"
)
return results
except Exception as e:
workflow.logger.error(f"Workflow failed: {e}")
results["status"] = "error"
results["error"] = str(e)
results["steps"].append({
"step": "error",
"status": "failed",
"error": str(e)
})
raise
@@ -1,187 +0,0 @@
"""
Manual Workflow Registry for Prefect Deployment
This file contains the manual registry of all workflows that can be deployed.
Developers MUST add their workflows here after creating them.
This approach is required because:
1. Prefect cannot deploy dynamically imported flows
2. Docker deployment needs static flow references
3. Explicit registration provides better control and visibility
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
from typing import Dict, Any, Callable
import logging
logger = logging.getLogger(__name__)
# Import only essential workflows
# Import each workflow individually to handle failures gracefully
security_assessment_flow = None
secret_detection_flow = None
# Try to import each workflow individually
try:
from .security_assessment.workflow import main_flow as security_assessment_flow
except ImportError as e:
logger.warning(f"Failed to import security_assessment workflow: {e}")
try:
from .comprehensive.secret_detection_scan.workflow import main_flow as secret_detection_flow
except ImportError as e:
logger.warning(f"Failed to import secret_detection_scan workflow: {e}")
# Manual registry - developers add workflows here after creation
# Only include workflows that were successfully imported
WORKFLOW_REGISTRY: Dict[str, Dict[str, Any]] = {}
# Add workflows that were successfully imported
if security_assessment_flow is not None:
WORKFLOW_REGISTRY["security_assessment"] = {
"flow": security_assessment_flow,
"module_path": "toolbox.workflows.security_assessment.workflow",
"function_name": "main_flow",
"description": "Comprehensive security assessment workflow that scans files, analyzes code for vulnerabilities, and generates SARIF reports",
"version": "1.0.0",
"author": "FuzzForge Team",
"tags": ["security", "scanner", "analyzer", "static-analysis", "sarif"]
}
if secret_detection_flow is not None:
WORKFLOW_REGISTRY["secret_detection_scan"] = {
"flow": secret_detection_flow,
"module_path": "toolbox.workflows.comprehensive.secret_detection_scan.workflow",
"function_name": "main_flow",
"description": "Comprehensive secret detection using TruffleHog and Gitleaks for thorough credential scanning",
"version": "1.0.0",
"author": "FuzzForge Team",
"tags": ["secrets", "credentials", "detection", "trufflehog", "gitleaks", "comprehensive"]
}
#
# To add a new workflow, follow this pattern:
#
# "my_new_workflow": {
# "flow": my_new_flow_function, # Import the flow function above
# "module_path": "toolbox.workflows.my_new_workflow.workflow",
# "function_name": "my_new_flow_function",
# "description": "Description of what this workflow does",
# "version": "1.0.0",
# "author": "Developer Name",
# "tags": ["tag1", "tag2"]
# }
def get_workflow_flow(workflow_name: str) -> Callable:
"""
Get the flow function for a workflow.
Args:
workflow_name: Name of the workflow
Returns:
Flow function
Raises:
KeyError: If workflow not found in registry
"""
if workflow_name not in WORKFLOW_REGISTRY:
available = list(WORKFLOW_REGISTRY.keys())
raise KeyError(
f"Workflow '{workflow_name}' not found in registry. "
f"Available workflows: {available}. "
f"Please add the workflow to toolbox/workflows/registry.py"
)
return WORKFLOW_REGISTRY[workflow_name]["flow"]
def get_workflow_info(workflow_name: str) -> Dict[str, Any]:
"""
Get registry information for a workflow.
Args:
workflow_name: Name of the workflow
Returns:
Registry information dictionary
Raises:
KeyError: If workflow not found in registry
"""
if workflow_name not in WORKFLOW_REGISTRY:
available = list(WORKFLOW_REGISTRY.keys())
raise KeyError(
f"Workflow '{workflow_name}' not found in registry. "
f"Available workflows: {available}"
)
return WORKFLOW_REGISTRY[workflow_name]
def list_registered_workflows() -> Dict[str, Dict[str, Any]]:
"""
Get all registered workflows.
Returns:
Dictionary of all workflow registry entries
"""
return WORKFLOW_REGISTRY.copy()
def validate_registry() -> bool:
"""
Validate the workflow registry for consistency.
Returns:
True if valid, raises exceptions if not
Raises:
ValueError: If registry is invalid
"""
if not WORKFLOW_REGISTRY:
raise ValueError("Workflow registry is empty")
required_fields = ["flow", "module_path", "function_name", "description"]
for name, entry in WORKFLOW_REGISTRY.items():
# Check required fields
missing_fields = [field for field in required_fields if field not in entry]
if missing_fields:
raise ValueError(
f"Workflow '{name}' missing required fields: {missing_fields}"
)
# Check if flow is callable
if not callable(entry["flow"]):
raise ValueError(f"Workflow '{name}' flow is not callable")
# Check if flow has the required Prefect attributes
if not hasattr(entry["flow"], "deploy"):
raise ValueError(
f"Workflow '{name}' flow is not a Prefect flow (missing deploy method)"
)
logger.info(f"Registry validation passed. {len(WORKFLOW_REGISTRY)} workflows registered.")
return True
# Validate registry on import
try:
validate_registry()
logger.info(f"Workflow registry loaded successfully with {len(WORKFLOW_REGISTRY)} workflows")
except Exception as e:
logger.error(f"Workflow registry validation failed: {e}")
raise
@@ -1,30 +0,0 @@
FROM prefecthq/prefect:3-python3.11
WORKDIR /app
# Create toolbox directory structure to match expected import paths
RUN mkdir -p /app/toolbox/workflows /app/toolbox/modules
# Copy base module infrastructure
COPY modules/__init__.py /app/toolbox/modules/
COPY modules/base.py /app/toolbox/modules/
# Copy only required modules (manual selection)
COPY modules/scanner /app/toolbox/modules/scanner
COPY modules/analyzer /app/toolbox/modules/analyzer
COPY modules/reporter /app/toolbox/modules/reporter
# Copy this workflow
COPY workflows/security_assessment /app/toolbox/workflows/security_assessment
# Install workflow-specific requirements if they exist
RUN if [ -f /app/toolbox/workflows/security_assessment/requirements.txt ]; then pip install --no-cache-dir -r /app/toolbox/workflows/security_assessment/requirements.txt; fi
# Install common requirements
RUN pip install --no-cache-dir pyyaml
# Set Python path
ENV PYTHONPATH=/app:$PYTHONPATH
# Create workspace directory
RUN mkdir -p /workspace
@@ -0,0 +1,150 @@
"""
Security Assessment Workflow Activities
Activities specific to the security assessment workflow:
- scan_files_activity: Scan files in the workspace
- analyze_security_activity: Analyze security vulnerabilities
- generate_sarif_report_activity: Generate SARIF report from findings
"""
import logging
import sys
from pathlib import Path
from temporalio import activity
# Configure logging
logger = logging.getLogger(__name__)
# Add toolbox to path for module imports
sys.path.insert(0, '/app/toolbox')
@activity.defn(name="scan_files")
async def scan_files_activity(workspace_path: str, config: dict) -> dict:
"""
Scan files in the workspace.
Args:
workspace_path: Path to the workspace directory
config: Scanner configuration
Returns:
Scanner results dictionary
"""
logger.info(f"Activity: scan_files (workspace={workspace_path})")
try:
from modules.scanner import FileScanner
workspace = Path(workspace_path)
if not workspace.exists():
raise FileNotFoundError(f"Workspace not found: {workspace_path}")
scanner = FileScanner()
result = await scanner.execute(config, workspace)
logger.info(
f"✓ File scanning completed: "
f"{result.summary.get('total_files', 0)} files scanned"
)
return result.dict()
except Exception as e:
logger.error(f"File scanning failed: {e}", exc_info=True)
raise
@activity.defn(name="analyze_security")
async def analyze_security_activity(workspace_path: str, config: dict) -> dict:
"""
Analyze security vulnerabilities in the workspace.
Args:
workspace_path: Path to the workspace directory
config: Analyzer configuration
Returns:
Analysis results dictionary
"""
logger.info(f"Activity: analyze_security (workspace={workspace_path})")
try:
from modules.analyzer import SecurityAnalyzer
workspace = Path(workspace_path)
if not workspace.exists():
raise FileNotFoundError(f"Workspace not found: {workspace_path}")
analyzer = SecurityAnalyzer()
result = await analyzer.execute(config, workspace)
logger.info(
f"✓ Security analysis completed: "
f"{result.summary.get('total_findings', 0)} findings"
)
return result.dict()
except Exception as e:
logger.error(f"Security analysis failed: {e}", exc_info=True)
raise
@activity.defn(name="generate_sarif_report")
async def generate_sarif_report_activity(
scan_results: dict,
analysis_results: dict,
config: dict,
workspace_path: str
) -> dict:
"""
Generate SARIF report from scan and analysis results.
Args:
scan_results: Results from file scanner
analysis_results: Results from security analyzer
config: Reporter configuration
workspace_path: Path to the workspace
Returns:
SARIF report dictionary
"""
logger.info("Activity: generate_sarif_report")
try:
from modules.reporter import SARIFReporter
workspace = Path(workspace_path)
# Combine findings from all modules
all_findings = []
# Add scanner findings, skipping "info" entries so only sensitive files are reported
scanner_findings = scan_results.get("findings", [])
sensitive_findings = [f for f in scanner_findings if f.get("severity") != "info"]
all_findings.extend(sensitive_findings)
# Add analyzer findings
analyzer_findings = analysis_results.get("findings", [])
all_findings.extend(analyzer_findings)
# Prepare reporter config
reporter_config = {
**config,
"findings": all_findings,
"tool_name": "FuzzForge Security Assessment",
"tool_version": "1.0.0"
}
reporter = SARIFReporter()
result = await reporter.execute(reporter_config, workspace)
# Extract SARIF from result
sarif = result.dict().get("sarif", {})
logger.info(f"✓ SARIF report generated with {len(all_findings)} findings")
return sarif
except Exception as e:
logger.error(f"SARIF report generation failed: {e}", exc_info=True)
raise
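The findings merge inside `generate_sarif_report_activity` can be exercised on its own. A minimal sketch, assuming findings are plain dictionaries with an optional `severity` key; the helper name `combine_findings` is hypothetical and does not exist in the repository:

```python
from typing import Any, Dict, List


def combine_findings(
    scan_results: Dict[str, Any],
    analysis_results: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Merge scanner and analyzer findings the way the activity above does."""
    # Scanner entries with severity "info" are dropped; the source comment
    # describes what remains as the sensitive-file findings.
    scanner_findings = scan_results.get("findings", [])
    sensitive = [f for f in scanner_findings if f.get("severity") != "info"]
    # Analyzer findings are kept unfiltered, after the scanner findings.
    return sensitive + list(analysis_results.get("findings", []))
```

Note that a finding with no `severity` key is kept, because `f.get("severity")` returns `None`, which is not equal to `"info"`.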
@@ -1,8 +1,8 @@
name: security_assessment
version: "2.0.0"
vertical: rust
description: "Comprehensive security assessment workflow that scans files, analyzes code for vulnerabilities, and generates SARIF reports"
author: "FuzzForge Team"
category: "comprehensive"
tags:
- "security"
- "scanner"
@@ -11,28 +11,14 @@ tags:
- "sarif"
- "comprehensive"
supported_volume_modes:
- "ro"
- "rw"
default_volume_mode: "ro"
default_target_path: "/workspace"
requirements:
tools:
- "file_scanner"
- "security_analyzer"
- "sarif_reporter"
resources:
memory: "512Mi"
cpu: "500m"
timeout: 1800
has_docker: true
# Workspace isolation mode (system-level configuration)
# - "isolated" (default): Each workflow run gets its own isolated workspace (safe for concurrent fuzzing)
# - "shared": All runs share the same workspace (for read-only analysis workflows)
# - "copy-on-write": Download once, copy for each run (balances performance and isolation)
# Using "shared" mode for read-only security analysis (no file modifications)
workspace_isolation: "shared"
default_parameters:
target_path: "/workspace"
volume_mode: "ro"
scanner_config: {}
analyzer_config: {}
reporter_config: {}
@@ -40,15 +26,6 @@ default_parameters:
parameters:
type: object
properties:
target_path:
type: string
default: "/workspace"
description: "Path to analyze"
volume_mode:
type: string
enum: ["ro", "rw"]
default: "ro"
description: "Volume mount mode"
scanner_config:
type: object
description: "File scanner configuration"
@@ -1,4 +0,0 @@
# Requirements for security assessment workflow
pydantic>=2.0.0
pyyaml>=6.0
aiofiles>=23.0.0
@@ -1,5 +1,7 @@
"""
Security Assessment Workflow - Comprehensive security analysis using multiple modules
Security Assessment Workflow - Temporal Version
Comprehensive security analysis using multiple modules.
"""
# Copyright (c) 2025 FuzzingLabs
@@ -13,240 +15,219 @@ Security Assessment Workflow - Comprehensive security analysis using multiple mo
#
# Additional attribution and requirements are provided in the NOTICE file.
import sys
import logging
from pathlib import Path
from datetime import timedelta
from typing import Dict, Any, Optional
from prefect import flow, task
import json
# Add modules to path
sys.path.insert(0, '/app')
from temporalio import workflow
from temporalio.common import RetryPolicy
# Import modules
from toolbox.modules.scanner import FileScanner
from toolbox.modules.analyzer import SecurityAnalyzer
from toolbox.modules.reporter import SARIFReporter
# Pass stdlib imports through the workflow sandbox; activities are referenced by name and run on the worker
with workflow.unsafe.imports_passed_through():
import logging
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@task(name="file_scanning")
async def scan_files_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
@workflow.defn
class SecurityAssessmentWorkflow:
"""
Task to scan files in the workspace.
Args:
workspace: Path to the workspace
config: Scanner configuration
Returns:
Scanner results
"""
logger.info(f"Starting file scanning in {workspace}")
scanner = FileScanner()
result = await scanner.execute(config, workspace)
logger.info(f"File scanning completed: {result.summary.get('total_files', 0)} files found")
return result.dict()
@task(name="security_analysis")
async def analyze_security_task(workspace: Path, config: Dict[str, Any]) -> Dict[str, Any]:
"""
Task to analyze security vulnerabilities.
Args:
workspace: Path to the workspace
config: Analyzer configuration
Returns:
Analysis results
"""
logger.info("Starting security analysis")
analyzer = SecurityAnalyzer()
result = await analyzer.execute(config, workspace)
logger.info(
f"Security analysis completed: {result.summary.get('total_findings', 0)} findings"
)
return result.dict()
@task(name="report_generation")
async def generate_report_task(
scan_results: Dict[str, Any],
analysis_results: Dict[str, Any],
config: Dict[str, Any],
workspace: Path
) -> Dict[str, Any]:
"""
Task to generate SARIF report from all findings.
Args:
scan_results: Results from scanner
analysis_results: Results from analyzer
config: Reporter configuration
workspace: Path to the workspace
Returns:
SARIF report
"""
logger.info("Generating SARIF report")
reporter = SARIFReporter()
# Combine findings from all modules
all_findings = []
# Add scanner findings (only sensitive files, not all files)
scanner_findings = scan_results.get("findings", [])
sensitive_findings = [f for f in scanner_findings if f.get("severity") != "info"]
all_findings.extend(sensitive_findings)
# Add analyzer findings
analyzer_findings = analysis_results.get("findings", [])
all_findings.extend(analyzer_findings)
# Prepare reporter config
reporter_config = {
**config,
"findings": all_findings,
"tool_name": "FuzzForge Security Assessment",
"tool_version": "1.0.0"
}
result = await reporter.execute(reporter_config, workspace)
# Extract SARIF from result
sarif = result.dict().get("sarif", {})
logger.info(f"Report generated with {len(all_findings)} total findings")
return sarif
@flow(name="security_assessment", log_prints=True)
async def main_flow(
target_path: str = "/workspace",
volume_mode: str = "ro",
scanner_config: Optional[Dict[str, Any]] = None,
analyzer_config: Optional[Dict[str, Any]] = None,
reporter_config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Main security assessment workflow.
Comprehensive security assessment workflow.
This workflow:
1. Scans files in the workspace
2. Analyzes code for security vulnerabilities
3. Generates a SARIF report with all findings
Args:
target_path: Path to the mounted workspace (default: /workspace)
volume_mode: Volume mount mode (ro/rw)
scanner_config: Configuration for file scanner
analyzer_config: Configuration for security analyzer
reporter_config: Configuration for SARIF reporter
Returns:
SARIF-formatted findings report
1. Downloads target from MinIO
2. Scans files in the workspace
3. Analyzes code for security vulnerabilities
4. Generates a SARIF report with all findings
5. Uploads results to MinIO
6. Cleans up cache
"""
logger.info("Starting security assessment workflow")
logger.info(f"Workspace: {target_path}, Mode: {volume_mode}")
# Set workspace path
workspace = Path(target_path)
@workflow.run
async def run(
self,
target_id: str,
scanner_config: Optional[Dict[str, Any]] = None,
analyzer_config: Optional[Dict[str, Any]] = None,
reporter_config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Main workflow execution.
if not workspace.exists():
logger.error(f"Workspace does not exist: {workspace}")
return {
"error": f"Workspace not found: {workspace}",
"sarif": None
}
Args:
target_id: UUID of the uploaded target in MinIO
scanner_config: Configuration for file scanner
analyzer_config: Configuration for security analyzer
reporter_config: Configuration for SARIF reporter
# Default configurations
if not scanner_config:
scanner_config = {
"patterns": ["*"],
"check_sensitive": True,
"calculate_hashes": False,
"max_file_size": 10485760 # 10MB
}
Returns:
Dictionary containing SARIF report and summary
"""
workflow_id = workflow.info().workflow_id
if not analyzer_config:
analyzer_config = {
"file_extensions": [".py", ".js", ".java", ".php", ".rb", ".go"],
"check_secrets": True,
"check_sql": True,
"check_dangerous_functions": True
}
if not reporter_config:
reporter_config = {
"include_code_flows": False
}
try:
# Execute workflow tasks
logger.info("Phase 1: File scanning")
scan_results = await scan_files_task(workspace, scanner_config)
logger.info("Phase 2: Security analysis")
analysis_results = await analyze_security_task(workspace, analyzer_config)
logger.info("Phase 3: Report generation")
sarif_report = await generate_report_task(
scan_results,
analysis_results,
reporter_config,
workspace
workflow.logger.info(
f"Starting SecurityAssessmentWorkflow "
f"(workflow_id={workflow_id}, target_id={target_id})"
)
# Log summary
if sarif_report and "runs" in sarif_report:
results_count = len(sarif_report["runs"][0].get("results", []))
logger.info(f"Workflow completed successfully with {results_count} findings")
else:
logger.info("Workflow completed successfully")
# Default configurations
if not scanner_config:
scanner_config = {
"patterns": ["*"],
"check_sensitive": True,
"calculate_hashes": False,
"max_file_size": 10485760 # 10MB
}
return sarif_report
if not analyzer_config:
analyzer_config = {
"file_extensions": [".py", ".js", ".java", ".php", ".rb", ".go"],
"check_secrets": True,
"check_sql": True,
"check_dangerous_functions": True
}
except Exception as e:
logger.error(f"Workflow failed: {e}")
# Return error in SARIF format
return {
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
"version": "2.1.0",
"runs": [
{
"tool": {
"driver": {
"name": "FuzzForge Security Assessment",
"version": "1.0.0"
}
},
"results": [],
"invocations": [
{
"executionSuccessful": False,
"exitCode": 1,
"exitCodeDescription": str(e)
}
]
}
]
if not reporter_config:
reporter_config = {
"include_code_flows": False
}
results = {
"workflow_id": workflow_id,
"target_id": target_id,
"status": "running",
"steps": []
}
try:
# Get run ID for workspace isolation (using shared mode for read-only analysis)
run_id = workflow.info().run_id
if __name__ == "__main__":
# For local testing
import asyncio
# Step 1: Download target from MinIO
workflow.logger.info("Step 1: Downloading target from MinIO")
target_path = await workflow.execute_activity(
"get_target",
args=[target_id, run_id, "shared"], # target_id, run_id, workspace_isolation
start_to_close_timeout=timedelta(minutes=5),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=1),
maximum_interval=timedelta(seconds=30),
maximum_attempts=3
)
)
results["steps"].append({
"step": "download_target",
"status": "success",
"target_path": target_path
})
workflow.logger.info(f"✓ Target downloaded to: {target_path}")
asyncio.run(main_flow(
target_path="/tmp/test",
scanner_config={"patterns": ["*.py"]},
analyzer_config={"check_secrets": True}
))
# Step 2: File scanning
workflow.logger.info("Step 2: Scanning files")
scan_results = await workflow.execute_activity(
"scan_files",
args=[target_path, scanner_config],
start_to_close_timeout=timedelta(minutes=10),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=2
)
)
results["steps"].append({
"step": "file_scanning",
"status": "success",
"files_scanned": scan_results.get("summary", {}).get("total_files", 0)
})
workflow.logger.info(
f"✓ File scanning completed: "
f"{scan_results.get('summary', {}).get('total_files', 0)} files"
)
# Step 3: Security analysis
workflow.logger.info("Step 3: Analyzing security vulnerabilities")
analysis_results = await workflow.execute_activity(
"analyze_security",
args=[target_path, analyzer_config],
start_to_close_timeout=timedelta(minutes=15),
retry_policy=RetryPolicy(
initial_interval=timedelta(seconds=2),
maximum_interval=timedelta(seconds=60),
maximum_attempts=2
)
)
results["steps"].append({
"step": "security_analysis",
"status": "success",
"findings": analysis_results.get("summary", {}).get("total_findings", 0)
})
workflow.logger.info(
f"✓ Security analysis completed: "
f"{analysis_results.get('summary', {}).get('total_findings', 0)} findings"
)
# Step 4: Generate SARIF report
workflow.logger.info("Step 4: Generating SARIF report")
sarif_report = await workflow.execute_activity(
"generate_sarif_report",
args=[scan_results, analysis_results, reporter_config, target_path],
start_to_close_timeout=timedelta(minutes=5)
)
results["steps"].append({
"step": "report_generation",
"status": "success"
})
# Count total findings in SARIF
total_findings = 0
if sarif_report and sarif_report.get("runs"):
total_findings = len(sarif_report["runs"][0].get("results", []))
workflow.logger.info(f"✓ SARIF report generated with {total_findings} findings")
# Step 5: Upload results to MinIO
workflow.logger.info("Step 5: Uploading results")
try:
results_url = await workflow.execute_activity(
"upload_results",
args=[workflow_id, sarif_report, "sarif"],
start_to_close_timeout=timedelta(minutes=2)
)
results["results_url"] = results_url
workflow.logger.info(f"✓ Results uploaded to: {results_url}")
except Exception as e:
workflow.logger.warning(f"Failed to upload results: {e}")
results["results_url"] = None
# Step 6: Cleanup cache
workflow.logger.info("Step 6: Cleaning up cache")
try:
await workflow.execute_activity(
"cleanup_cache",
args=[target_path, "shared"], # target_path, workspace_isolation
start_to_close_timeout=timedelta(minutes=1)
)
workflow.logger.info("✓ Cache cleanup step completed (no-op in shared mode)")
except Exception as e:
workflow.logger.warning(f"Cache cleanup failed: {e}")
# Mark workflow as successful
results["status"] = "success"
results["sarif"] = sarif_report
results["summary"] = {
"total_findings": total_findings,
"files_scanned": scan_results.get("summary", {}).get("total_files", 0)
}
workflow.logger.info(f"✓ Workflow completed successfully: {workflow_id}")
return results
except Exception as e:
workflow.logger.error(f"Workflow failed: {e}")
results["status"] = "error"
results["error"] = str(e)
results["steps"].append({
"step": "error",
"status": "failed",
"error": str(e)
})
raise
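The `RetryPolicy` values used in the workflow bound how long a failing activity keeps retrying. A rough sketch of the resulting wait schedule, assuming Temporal's default backoff coefficient of 2.0 (the workflow does not set one) and ignoring the run time of each attempt; `retry_delays` is a hypothetical helper, not part of `temporalio`:

```python
from datetime import timedelta
from typing import List


def retry_delays(
    initial_interval: timedelta,
    maximum_interval: timedelta,
    maximum_attempts: int,
    backoff_coefficient: float = 2.0,  # assumed default, not set in the workflow
) -> List[timedelta]:
    """Return the waits between attempts; N attempts means N - 1 waits."""
    delays = []
    # Unlimited retries (maximum_attempts=0 in Temporal) are not modeled here.
    for retry in range(max(maximum_attempts - 1, 0)):
        # Each wait grows geometrically and is capped at maximum_interval.
        wait = initial_interval * (backoff_coefficient ** retry)
        delays.append(min(wait, maximum_interval))
    return delays
```

Under these assumptions, the `get_target` settings (1 s initial interval, 30 s cap, 3 attempts) give waits of 1 s and 2 s, while the scanning and analysis activities (2 s initial interval, 2 attempts) retry once after 2 s.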