fuzzforge_ai/backend/src/storage/s3_cached.py
tduhamel42 60ca088ecf CI/CD Integration with Ephemeral Deployment Model (#14)
* feat: Complete migration from Prefect to Temporal

BREAKING CHANGE: Replaces Prefect workflow orchestration with Temporal

## Major Changes
- Replace Prefect with Temporal for workflow orchestration
- Implement vertical worker architecture (rust, android)
- Replace Docker registry with MinIO for unified storage
- Refactor activities to be co-located with workflows
- Update all API endpoints for Temporal compatibility

## Infrastructure
- New: docker-compose.temporal.yaml (Temporal + MinIO + workers)
- New: workers/ directory with rust and android vertical workers
- New: backend/src/temporal/ (manager, discovery)
- New: backend/src/storage/ (S3-cached storage with MinIO)
- New: backend/toolbox/common/ (shared storage activities)
- Deleted: docker-compose.yaml (old Prefect setup)
- Deleted: backend/src/core/prefect_manager.py
- Deleted: backend/src/services/prefect_stats_monitor.py
- Deleted: Docker registry and insecure-registries requirement

## Workflows
- Migrated: security_assessment workflow to Temporal
- New: rust_test workflow (example/test workflow)
- Deleted: secret_detection_scan (Prefect-based, to be reimplemented)
- Activities now co-located with workflows for independent testing

## API Changes
- Updated: backend/src/api/workflows.py (Temporal submission)
- Updated: backend/src/api/runs.py (Temporal status/results)
- Updated: backend/src/main.py (727 lines, TemporalManager integration)
- Updated: All 16 MCP tools to use TemporalManager

## Testing
- All services healthy (Temporal, PostgreSQL, MinIO, workers, backend)
- All API endpoints functional
- End-to-end workflow test passed (72 findings from vulnerable_app)
- MinIO storage integration working (target upload/download, results)
- Worker activity discovery working (6 activities registered)
- Tarball extraction working
- SARIF report generation working

## Documentation
- ARCHITECTURE.md: Complete Temporal architecture documentation
- QUICKSTART_TEMPORAL.md: Getting started guide
- MIGRATION_DECISION.md: Why we chose Temporal over Prefect
- IMPLEMENTATION_STATUS.md: Migration progress tracking
- workers/README.md: Worker development guide

## Dependencies
- Added: temporalio>=1.6.0
- Added: boto3>=1.34.0 (MinIO S3 client)
- Removed: prefect>=3.4.18

* feat: Add Python fuzzing vertical with Atheris integration

This commit implements a complete Python fuzzing workflow using Atheris:

## Python Worker (workers/python/)
- Dockerfile with Python 3.11, Atheris, and build tools
- Generic worker.py for dynamic workflow discovery
- requirements.txt with temporalio, boto3, atheris dependencies
- Added to docker-compose.temporal.yaml with dedicated cache volume

## AtherisFuzzer Module (backend/toolbox/modules/fuzzer/)
- Reusable module extending BaseModule
- Auto-discovers fuzz targets (fuzz_*.py, *_fuzz.py, fuzz_target.py); a discovery sketch follows this list
- Recursive search to find targets in nested directories
- Dynamically loads TestOneInput() function
- Configurable max_iterations and timeout
- Real-time stats callback support for live monitoring
- Returns findings as ModuleFinding objects
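
The discovery behaviour described above can be pictured with a short sketch; the function name and return shape here are illustrative assumptions, not the module's actual API.

```python
from pathlib import Path

# Illustrative sketch only: names and structure are assumptions, not the real AtherisFuzzer code.
FUZZ_TARGET_PATTERNS = ("fuzz_*.py", "*_fuzz.py", "fuzz_target.py")


def discover_fuzz_targets(root: Path) -> list[Path]:
    """Recursively collect candidate fuzz harnesses under `root`."""
    candidates: list[Path] = []
    for pattern in FUZZ_TARGET_PATTERNS:
        candidates.extend(root.rglob(pattern))
    # A file can match more than one pattern; de-duplicate while preserving order.
    return list(dict.fromkeys(candidates))
```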

## Atheris Fuzzing Workflow (backend/toolbox/workflows/atheris_fuzzing/)
- Temporal workflow for orchestrating fuzzing
- Downloads user code from MinIO
- Executes AtherisFuzzer module
- Uploads results to MinIO
- Cleans up cache after execution
- metadata.yaml with vertical: python for routing

## Test Project (test_projects/python_fuzz_waterfall/)
- Demonstrates stateful waterfall vulnerability
- main.py with check_secret() that leaks progress
- fuzz_target.py with Atheris TestOneInput() harness (a minimal harness sketch follows this list)
- Complete README with usage instructions
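
A minimal harness of the shape described for fuzz_target.py, assuming main.py exposes check_secret() taking a string; the decode guard and call details are illustrative.

```python
import sys

import atheris

with atheris.instrument_imports():
    import main  # the test project's main.py (assumed to expose check_secret)


def TestOneInput(data: bytes) -> None:
    """Called by Atheris with each mutated input."""
    try:
        candidate = data.decode("utf-8")
    except UnicodeDecodeError:
        return
    main.check_secret(candidate)  # uncaught exceptions are reported as crashes


if __name__ == "__main__":
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```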

## Backend Fixes
- Fixed parameter merging in REST API endpoints (workflows.py)
- Changed workflow parameter passing from positional args to kwargs (manager.py)
- Default parameters now properly merged with user parameters

## Testing
- Worker discovered AtherisFuzzingWorkflow
- Workflow executed end-to-end successfully
- Fuzz target auto-discovered in nested directories
- Atheris ran 100,000 iterations
- Results uploaded and cache cleaned

* chore: Complete Temporal migration with updated CLI/SDK/docs

This commit includes all remaining Temporal migration changes:

## CLI Updates (cli/)
- Updated workflow execution commands for Temporal
- Enhanced error handling and exceptions
- Updated dependencies in uv.lock

## SDK Updates (sdk/)
- Client methods updated for Temporal workflows
- Updated models for new workflow execution
- Updated dependencies in uv.lock

## Documentation Updates (docs/)
- Architecture documentation for Temporal
- Workflow concept documentation
- Resource management documentation (new)
- Debugging guide (new)
- Updated tutorials and how-to guides
- Troubleshooting updates

## README Updates
- Main README with Temporal instructions
- Backend README
- CLI README
- SDK README

## Other
- Updated IMPLEMENTATION_STATUS.md
- Removed old vulnerable_app.tar.gz

These changes complete the Temporal migration and ensure the
CLI/SDK work correctly with the new backend.

* fix: Use positional args instead of kwargs for Temporal workflows

The Temporal Python SDK's start_workflow() method doesn't accept
a 'kwargs' parameter. Workflows must receive parameters as positional
arguments via the 'args' parameter.

Changed to:
  args=workflow_args  # Positional arguments

This fixes the error:
  TypeError: Client.start_workflow() got an unexpected keyword argument 'kwargs'

Workflows now correctly receive parameters in order:
- security_assessment: [target_id, scanner_config, analyzer_config, reporter_config]
- atheris_fuzzing: [target_id, target_file, max_iterations, timeout_seconds]
- rust_test: [target_id, test_message]
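
A hedged sketch of the resulting call shape, using the public temporalio client API; the server address, workflow ID scheme, and task queue below are placeholders, not FuzzForge's actual values.

```python
from temporalio.client import Client


async def submit_security_assessment(
    target_id: str,
    scanner_config: dict,
    analyzer_config: dict,
    reporter_config: dict,
) -> str:
    # Placeholders: adjust the server address, workflow ID scheme, and task queue.
    client = await Client.connect("localhost:7233")
    handle = await client.start_workflow(
        "SecurityAssessmentWorkflow",
        args=[target_id, scanner_config, analyzer_config, reporter_config],
        id=f"security-assessment-{target_id}",
        task_queue="security-queue",
    )
    return handle.id
```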

* fix: Filter metadata-only parameters from workflow arguments

SecurityAssessmentWorkflow was receiving 7 arguments instead of 2-5.
The issue was that target_path and volume_mode from default_parameters
were being passed to the workflow, when they should only be used by
the system for configuration.

Now filters out metadata-only parameters (target_path, volume_mode)
before passing arguments to workflow execution.
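
A minimal sketch of that filtering step, assuming a plain dict of merged parameters; the helper name is hypothetical.

```python
METADATA_ONLY_KEYS = {"target_path", "volume_mode"}


def filter_workflow_params(params: dict) -> dict:
    """Drop system-only configuration keys before building workflow arguments."""
    return {key: value for key, value in params.items() if key not in METADATA_ONLY_KEYS}
```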

* refactor: Remove Prefect leftovers and volume mounting legacy

Complete cleanup of Prefect migration artifacts:

Backend:
- Delete registry.py and workflow_discovery.py (Prefect-specific files)
- Remove Docker validation from setup.py (no longer needed)
- Remove ResourceLimits and VolumeMount models
- Remove target_path and volume_mode from WorkflowSubmission
- Remove supported_volume_modes from API and discovery
- Clean up metadata.yaml files (remove volume/path fields)
- Simplify parameter filtering in manager.py

SDK:
- Remove volume_mode parameter from client methods
- Remove ResourceLimits and VolumeMount models
- Remove Prefect error patterns from docker_logs.py
- Clean up WorkflowSubmission and WorkflowMetadata models

CLI:
- Remove Volume Modes display from workflow info

All removed features are Prefect-specific or Docker volume mounting
artifacts. Temporal workflows use MinIO storage exclusively.

* feat: Add comprehensive test suite and benchmark infrastructure

- Add 68 unit tests for fuzzer, scanner, and analyzer modules
- Implement pytest-based test infrastructure with fixtures
- Add 6 performance benchmarks with category-specific thresholds
- Configure GitHub Actions for automated testing and benchmarking
- Add test and benchmark documentation

Test coverage:
- AtherisFuzzer: 8 tests
- CargoFuzzer: 14 tests
- FileScanner: 22 tests
- SecurityAnalyzer: 24 tests

All tests passing (68/68)
All benchmarks passing (6/6)

* fix: Resolve all ruff linting violations across codebase

Fixed 27 ruff violations in 12 files:
- Removed unused imports (Depends, Dict, Any, Optional, etc.)
- Fixed undefined workflow_info variable in workflows.py
- Removed dead code with undefined variables in atheris_fuzzer.py
- Changed f-string to regular string where no placeholders used

All files now pass ruff checks for CI/CD compliance.

* fix: Configure CI for unit tests only

- Renamed docker-compose.temporal.yaml → docker-compose.yml for CI compatibility
- Commented out integration-tests job (no integration tests yet)
- Updated test-summary to only depend on lint and unit-tests

CI will now run successfully with 68 unit tests. Integration tests can be added later.

* feat: Add CI/CD integration with ephemeral deployment model

Implements comprehensive CI/CD support for FuzzForge with on-demand worker management:

**Worker Management (v0.7.0)**
- Add WorkerManager for automatic worker lifecycle control
- Auto-start workers from stopped state when workflows execute
- Auto-stop workers after workflow completion
- Health checks and startup timeout handling (90s default)

**CI/CD Features**
- `--fail-on` flag: Fail builds based on SARIF severity levels (error/warning/note/info); a gating sketch follows this list
- `--export-sarif` flag: Export findings in SARIF 2.1.0 format
- `--auto-start`/`--auto-stop` flags: Control worker lifecycle
- Exit code propagation: Returns 1 on blocking findings, 0 on success
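
A hedged sketch of how such a severity gate can be implemented against a SARIF 2.1.0 report; the ordering mirrors the flag values above and is not FuzzForge's actual code.

```python
import json
import sys

# Least to most severe, mirroring the --fail-on choices above.
SEVERITY_ORDER = ["info", "note", "warning", "error"]


def should_fail(sarif_path: str, fail_on: str) -> bool:
    """Return True if any SARIF result is at or above the chosen threshold."""
    threshold = SEVERITY_ORDER.index(fail_on)
    with open(sarif_path, encoding="utf-8") as fh:
        sarif = json.load(fh)
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            level = result.get("level", "warning")
            if level in SEVERITY_ORDER and SEVERITY_ORDER.index(level) >= threshold:
                return True
    return False


if __name__ == "__main__":
    sys.exit(1 if should_fail(sys.argv[1], sys.argv[2]) else 0)
```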

**Exit Code Fix**
- Add `except typer.Exit: raise` handlers at 3 critical locations
- Move worker cleanup to finally block for guaranteed execution
- Exit codes now propagate correctly even when build fails

**CI Scripts & Examples**
- ci-start.sh: Start FuzzForge services with health checks
- ci-stop.sh: Clean shutdown with volume preservation option
- GitHub Actions workflow example (security-scan.yml)
- GitLab CI pipeline example (.gitlab-ci.example.yml)
- docker-compose.ci.yml: CI-optimized compose file with profiles

**OSS-Fuzz Integration**
- New ossfuzz_campaign workflow for running OSS-Fuzz projects
- OSS-Fuzz worker with Docker-in-Docker support
- Configurable campaign duration and project selection

**Documentation**
- Comprehensive CI/CD integration guide (docs/how-to/cicd-integration.md)
- Updated architecture docs with worker lifecycle details
- Updated workspace isolation documentation
- CLI README with worker management examples

**SDK Enhancements**
- Add get_workflow_worker_info() endpoint
- Worker vertical metadata in workflow responses

**Testing**
- All workflows tested: security_assessment, atheris_fuzzing, secret_detection, cargo_fuzzing
- All monitoring commands tested: stats, crashes, status, finding
- Full CI pipeline simulation verified
- Exit codes verified for success/failure scenarios

Ephemeral CI/CD model: ~3-4GB RAM, ~60-90s startup, runs entirely in CI containers.

* fix: Resolve ruff linting violations in CI/CD code

- Remove unused variables (run_id, defaults, result)
- Remove unused imports
- Fix f-string without placeholders

All CI/CD integration files now pass ruff checks.
2025-10-14 10:13:45 +02:00


"""
S3-compatible storage backend with local caching.
Works with MinIO (dev/prod) or AWS S3 (cloud).
"""
import json
import logging
import os
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional, Dict, Any
from uuid import uuid4
import boto3
from botocore.exceptions import ClientError
from .base import StorageBackend, StorageError
logger = logging.getLogger(__name__)
class S3CachedStorage(StorageBackend):
"""
S3-compatible storage with local caching.
Features:
- Upload targets to S3/MinIO
- Download with local caching (LRU eviction)
- Lifecycle management (auto-cleanup old files)
- Metadata tracking
"""
def __init__(
self,
endpoint_url: Optional[str] = None,
access_key: Optional[str] = None,
secret_key: Optional[str] = None,
bucket: str = "targets",
region: str = "us-east-1",
use_ssl: bool = False,
cache_dir: Optional[Path] = None,
cache_max_size_gb: int = 10
):
"""
Initialize S3 storage backend.
Args:
endpoint_url: S3 endpoint (None = AWS S3, or MinIO URL)
access_key: S3 access key (None = from env)
secret_key: S3 secret key (None = from env)
bucket: S3 bucket name
region: AWS region
use_ssl: Use HTTPS
cache_dir: Local cache directory
cache_max_size_gb: Maximum cache size in GB
"""
# Use environment variables as defaults
self.endpoint_url = endpoint_url or os.getenv('S3_ENDPOINT', 'http://minio:9000')
self.access_key = access_key or os.getenv('S3_ACCESS_KEY', 'fuzzforge')
self.secret_key = secret_key or os.getenv('S3_SECRET_KEY', 'fuzzforge123')
self.bucket = bucket or os.getenv('S3_BUCKET', 'targets')
self.region = region or os.getenv('S3_REGION', 'us-east-1')
self.use_ssl = use_ssl or os.getenv('S3_USE_SSL', 'false').lower() == 'true'
# Cache configuration
self.cache_dir = cache_dir or Path(os.getenv('CACHE_DIR', '/tmp/fuzzforge-cache'))
self.cache_max_size = cache_max_size_gb * (1024 ** 3) # Convert to bytes
# Ensure cache directory exists
self.cache_dir.mkdir(parents=True, exist_ok=True)
# Initialize S3 client
try:
self.s3_client = boto3.client(
's3',
endpoint_url=self.endpoint_url,
aws_access_key_id=self.access_key,
aws_secret_access_key=self.secret_key,
region_name=self.region,
use_ssl=self.use_ssl
)
logger.info(f"Initialized S3 storage: {self.endpoint_url}/{self.bucket}")
except Exception as e:
logger.error(f"Failed to initialize S3 client: {e}")
raise StorageError(f"S3 initialization failed: {e}")
async def upload_target(
self,
file_path: Path,
user_id: str,
metadata: Optional[Dict[str, Any]] = None
) -> str:
"""Upload target file to S3/MinIO."""
if not file_path.exists():
raise FileNotFoundError(f"File not found: {file_path}")
# Generate unique target ID
target_id = str(uuid4())
# Prepare metadata
upload_metadata = {
'user_id': user_id,
'uploaded_at': datetime.now().isoformat(),
'filename': file_path.name,
'size': str(file_path.stat().st_size)
}
if metadata:
upload_metadata.update(metadata)
# Upload to S3
s3_key = f'{target_id}/target'
try:
logger.info(f"Uploading target to s3://{self.bucket}/{s3_key}")
self.s3_client.upload_file(
str(file_path),
self.bucket,
s3_key,
ExtraArgs={
'Metadata': upload_metadata
}
)
file_size_mb = file_path.stat().st_size / (1024 * 1024)
logger.info(
f"✓ Uploaded target {target_id} "
f"({file_path.name}, {file_size_mb:.2f} MB)"
)
return target_id
except ClientError as e:
logger.error(f"S3 upload failed: {e}", exc_info=True)
raise StorageError(f"Failed to upload target: {e}")
except Exception as e:
logger.error(f"Upload failed: {e}", exc_info=True)
raise StorageError(f"Upload error: {e}")
async def get_target(self, target_id: str) -> Path:
"""Get target from cache or download from S3/MinIO."""
# Check cache first
cache_path = self.cache_dir / target_id
cached_file = cache_path / "target"
if cached_file.exists():
# Update access time for LRU
cached_file.touch()
logger.info(f"Cache HIT: {target_id}")
return cached_file
# Cache miss - download from S3
logger.info(f"Cache MISS: {target_id}, downloading from S3...")
try:
# Create cache directory
cache_path.mkdir(parents=True, exist_ok=True)
# Download from S3
s3_key = f'{target_id}/target'
logger.info(f"Downloading s3://{self.bucket}/{s3_key}")
self.s3_client.download_file(
self.bucket,
s3_key,
str(cached_file)
)
# Verify download
if not cached_file.exists():
raise StorageError(f"Downloaded file not found: {cached_file}")
file_size_mb = cached_file.stat().st_size / (1024 * 1024)
logger.info(f"✓ Downloaded target {target_id} ({file_size_mb:.2f} MB)")
return cached_file
except ClientError as e:
error_code = e.response.get('Error', {}).get('Code')
if error_code in ['404', 'NoSuchKey']:
logger.error(f"Target not found: {target_id}")
raise FileNotFoundError(f"Target {target_id} not found in storage")
else:
logger.error(f"S3 download failed: {e}", exc_info=True)
raise StorageError(f"Download failed: {e}")
except Exception as e:
logger.error(f"Download error: {e}", exc_info=True)
# Cleanup partial download
if cache_path.exists():
shutil.rmtree(cache_path, ignore_errors=True)
raise StorageError(f"Download error: {e}")
async def delete_target(self, target_id: str) -> None:
"""Delete target from S3/MinIO."""
try:
s3_key = f'{target_id}/target'
logger.info(f"Deleting s3://{self.bucket}/{s3_key}")
self.s3_client.delete_object(
Bucket=self.bucket,
Key=s3_key
)
# Also delete from cache if present
cache_path = self.cache_dir / target_id
if cache_path.exists():
shutil.rmtree(cache_path, ignore_errors=True)
logger.info(f"✓ Deleted target {target_id} from S3 and cache")
else:
logger.info(f"✓ Deleted target {target_id} from S3")
except ClientError as e:
logger.error(f"S3 delete failed: {e}", exc_info=True)
# Don't raise error if object doesn't exist
if e.response.get('Error', {}).get('Code') not in ['404', 'NoSuchKey']:
raise StorageError(f"Delete failed: {e}")
except Exception as e:
logger.error(f"Delete error: {e}", exc_info=True)
raise StorageError(f"Delete error: {e}")
async def upload_results(
self,
workflow_id: str,
results: Dict[str, Any],
results_format: str = "json"
) -> str:
"""Upload workflow results to S3/MinIO."""
try:
# Prepare results content
if results_format == "json":
content = json.dumps(results, indent=2).encode('utf-8')
content_type = 'application/json'
file_ext = 'json'
elif results_format == "sarif":
content = json.dumps(results, indent=2).encode('utf-8')
content_type = 'application/sarif+json'
file_ext = 'sarif'
else:
content = json.dumps(results, indent=2).encode('utf-8')
content_type = 'application/json'
file_ext = 'json'
# Upload to results bucket
results_bucket = 'results'
s3_key = f'{workflow_id}/results.{file_ext}'
logger.info(f"Uploading results to s3://{results_bucket}/{s3_key}")
self.s3_client.put_object(
Bucket=results_bucket,
Key=s3_key,
Body=content,
ContentType=content_type,
Metadata={
'workflow_id': workflow_id,
'format': results_format,
'uploaded_at': datetime.now().isoformat()
}
)
# Construct URL
results_url = f"{self.endpoint_url}/{results_bucket}/{s3_key}"
logger.info(f"✓ Uploaded results: {results_url}")
return results_url
except Exception as e:
logger.error(f"Results upload failed: {e}", exc_info=True)
raise StorageError(f"Results upload failed: {e}")
async def get_results(self, workflow_id: str) -> Dict[str, Any]:
"""Get workflow results from S3/MinIO."""
try:
results_bucket = 'results'
s3_key = f'{workflow_id}/results.json'
logger.info(f"Downloading results from s3://{results_bucket}/{s3_key}")
response = self.s3_client.get_object(
Bucket=results_bucket,
Key=s3_key
)
content = response['Body'].read().decode('utf-8')
results = json.loads(content)
logger.info(f"✓ Downloaded results for workflow {workflow_id}")
return results
except ClientError as e:
error_code = e.response.get('Error', {}).get('Code')
if error_code in ['404', 'NoSuchKey']:
logger.error(f"Results not found: {workflow_id}")
raise FileNotFoundError(f"Results for workflow {workflow_id} not found")
else:
logger.error(f"Results download failed: {e}", exc_info=True)
raise StorageError(f"Results download failed: {e}")
except Exception as e:
logger.error(f"Results download error: {e}", exc_info=True)
raise StorageError(f"Results download error: {e}")
async def list_targets(
self,
user_id: Optional[str] = None,
limit: int = 100
) -> list[Dict[str, Any]]:
"""List uploaded targets."""
try:
targets = []
paginator = self.s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=self.bucket, PaginationConfig={'MaxItems': limit}):
for obj in page.get('Contents', []):
# Get object metadata
try:
metadata_response = self.s3_client.head_object(
Bucket=self.bucket,
Key=obj['Key']
)
metadata = metadata_response.get('Metadata', {})
# Filter by user_id if specified
if user_id and metadata.get('user_id') != user_id:
continue
targets.append({
'target_id': obj['Key'].split('/')[0],
'key': obj['Key'],
'size': obj['Size'],
'last_modified': obj['LastModified'].isoformat(),
'metadata': metadata
})
except Exception as e:
logger.warning(f"Failed to get metadata for {obj['Key']}: {e}")
continue
logger.info(f"Listed {len(targets)} targets (user_id={user_id})")
return targets
except Exception as e:
logger.error(f"List targets failed: {e}", exc_info=True)
raise StorageError(f"List targets failed: {e}")
async def cleanup_cache(self) -> int:
"""Clean up local cache using LRU eviction."""
try:
cache_files = []
total_size = 0
# Gather all cached files with metadata
for cache_file in self.cache_dir.rglob('*'):
if cache_file.is_file():
try:
stat = cache_file.stat()
cache_files.append({
'path': cache_file,
'size': stat.st_size,
'atime': stat.st_atime # Last access time
})
total_size += stat.st_size
except Exception as e:
logger.warning(f"Failed to stat {cache_file}: {e}")
continue
# Check if cleanup is needed
if total_size <= self.cache_max_size:
logger.info(
f"Cache size OK: {total_size / (1024**3):.2f} GB / "
f"{self.cache_max_size / (1024**3):.2f} GB"
)
return 0
# Sort by access time (oldest first)
cache_files.sort(key=lambda x: x['atime'])
# Remove files until under limit
removed_count = 0
for file_info in cache_files:
if total_size <= self.cache_max_size:
break
try:
file_info['path'].unlink()
total_size -= file_info['size']
removed_count += 1
logger.debug(f"Evicted from cache: {file_info['path']}")
except Exception as e:
logger.warning(f"Failed to delete {file_info['path']}: {e}")
continue
logger.info(
f"✓ Cache cleanup: removed {removed_count} files, "
f"new size: {total_size / (1024**3):.2f} GB"
)
return removed_count
except Exception as e:
logger.error(f"Cache cleanup failed: {e}", exc_info=True)
raise StorageError(f"Cache cleanup failed: {e}")
def get_cache_stats(self) -> Dict[str, Any]:
"""Get cache statistics."""
try:
total_size = 0
file_count = 0
for cache_file in self.cache_dir.rglob('*'):
if cache_file.is_file():
total_size += cache_file.stat().st_size
file_count += 1
return {
'total_size_bytes': total_size,
'total_size_gb': total_size / (1024 ** 3),
'file_count': file_count,
'max_size_gb': self.cache_max_size / (1024 ** 3),
'usage_percent': (total_size / self.cache_max_size) * 100
}
except Exception as e:
logger.error(f"Failed to get cache stats: {e}")
return {'error': str(e)}
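
A minimal usage sketch of the backend above, assuming a local MinIO reachable with the defaults from __init__; the file name and import path are placeholders.

```python
import asyncio
from pathlib import Path

from storage.s3_cached import S3CachedStorage  # adjust to the backend's package layout


async def main() -> None:
    storage = S3CachedStorage(cache_max_size_gb=5)

    # Upload a target, then fetch it back (served from the local cache on repeat calls).
    target_id = await storage.upload_target(Path("vulnerable_app.tar.gz"), user_id="ci")
    local_copy = await storage.get_target(target_id)
    print(f"Target cached at {local_copy}")

    # Inspect and, if needed, trim the cache.
    print(storage.get_cache_stats())
    await storage.cleanup_cache()


asyncio.run(main())
```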