Mirror of https://github.com/FuzzingLabs/fuzzforge_ai.git (synced 2026-02-12 22:32:45 +00:00)
CI/CD Integration with Ephemeral Deployment Model (#14)
* feat: Complete migration from Prefect to Temporal

  BREAKING CHANGE: Replaces Prefect workflow orchestration with Temporal

  ## Major Changes
  - Replace Prefect with Temporal for workflow orchestration
  - Implement vertical worker architecture (rust, android)
  - Replace Docker registry with MinIO for unified storage
  - Refactor activities to be co-located with workflows
  - Update all API endpoints for Temporal compatibility

  ## Infrastructure
  - New: docker-compose.temporal.yaml (Temporal + MinIO + workers)
  - New: workers/ directory with rust and android vertical workers
  - New: backend/src/temporal/ (manager, discovery)
  - New: backend/src/storage/ (S3-cached storage with MinIO)
  - New: backend/toolbox/common/ (shared storage activities)
  - Deleted: docker-compose.yaml (old Prefect setup)
  - Deleted: backend/src/core/prefect_manager.py
  - Deleted: backend/src/services/prefect_stats_monitor.py
  - Deleted: Docker registry and insecure-registries requirement

  ## Workflows
  - Migrated: security_assessment workflow to Temporal
  - New: rust_test workflow (example/test workflow)
  - Deleted: secret_detection_scan (Prefect-based, to be reimplemented)
  - Activities now co-located with workflows for independent testing

  ## API Changes
  - Updated: backend/src/api/workflows.py (Temporal submission)
  - Updated: backend/src/api/runs.py (Temporal status/results)
  - Updated: backend/src/main.py (727 lines, TemporalManager integration)
  - Updated: All 16 MCP tools to use TemporalManager

  ## Testing
  - ✅ All services healthy (Temporal, PostgreSQL, MinIO, workers, backend)
  - ✅ All API endpoints functional
  - ✅ End-to-end workflow test passed (72 findings from vulnerable_app)
  - ✅ MinIO storage integration working (target upload/download, results)
  - ✅ Worker activity discovery working (6 activities registered)
  - ✅ Tarball extraction working
  - ✅ SARIF report generation working

  ## Documentation
  - ARCHITECTURE.md: Complete Temporal architecture documentation
  - QUICKSTART_TEMPORAL.md: Getting started guide
  - MIGRATION_DECISION.md: Why we chose Temporal over Prefect
  - IMPLEMENTATION_STATUS.md: Migration progress tracking
  - workers/README.md: Worker development guide

  ## Dependencies
  - Added: temporalio>=1.6.0
  - Added: boto3>=1.34.0 (MinIO S3 client)
  - Removed: prefect>=3.4.18

* feat: Add Python fuzzing vertical with Atheris integration

  This commit implements a complete Python fuzzing workflow using Atheris:

  ## Python Worker (workers/python/)
  - Dockerfile with Python 3.11, Atheris, and build tools
  - Generic worker.py for dynamic workflow discovery
  - requirements.txt with temporalio, boto3, atheris dependencies
  - Added to docker-compose.temporal.yaml with a dedicated cache volume

  ## AtherisFuzzer Module (backend/toolbox/modules/fuzzer/)
  - Reusable module extending BaseModule
  - Auto-discovers fuzz targets (fuzz_*.py, *_fuzz.py, fuzz_target.py)
  - Recursive search to find targets in nested directories
  - Dynamically loads the TestOneInput() function
  - Configurable max_iterations and timeout
  - Real-time stats callback support for live monitoring
  - Returns findings as ModuleFinding objects

  ## Atheris Fuzzing Workflow (backend/toolbox/workflows/atheris_fuzzing/)
  - Temporal workflow for orchestrating fuzzing
  - Downloads user code from MinIO
  - Executes the AtherisFuzzer module
  - Uploads results to MinIO
  - Cleans up the cache after execution
  - metadata.yaml with vertical: python for routing

  ## Test Project (test_projects/python_fuzz_waterfall/)
  - Demonstrates a stateful waterfall vulnerability
  - main.py with check_secret() that leaks progress
  - fuzz_target.py with an Atheris TestOneInput() harness
  - Complete README with usage instructions

  ## Backend Fixes
  - Fixed parameter merging in REST API endpoints (workflows.py)
  - Changed workflow parameter passing from positional args to kwargs (manager.py)
  - Default parameters are now properly merged with user parameters

  ## Testing
  - ✅ Worker discovered AtherisFuzzingWorkflow
  - ✅ Workflow executed end-to-end successfully
  - ✅ Fuzz target auto-discovered in nested directories
  - ✅ Atheris ran 100,000 iterations
  - ✅ Results uploaded and cache cleaned

* chore: Complete Temporal migration with updated CLI/SDK/docs

  This commit includes all remaining Temporal migration changes:

  ## CLI Updates (cli/)
  - Updated workflow execution commands for Temporal
  - Enhanced error handling and exceptions
  - Updated dependencies in uv.lock

  ## SDK Updates (sdk/)
  - Client methods updated for Temporal workflows
  - Updated models for the new workflow execution
  - Updated dependencies in uv.lock

  ## Documentation Updates (docs/)
  - Architecture documentation for Temporal
  - Workflow concept documentation
  - Resource management documentation (new)
  - Debugging guide (new)
  - Updated tutorials and how-to guides
  - Troubleshooting updates

  ## README Updates
  - Main README with Temporal instructions
  - Backend README
  - CLI README
  - SDK README

  ## Other
  - Updated IMPLEMENTATION_STATUS.md
  - Removed the old vulnerable_app.tar.gz

  These changes complete the Temporal migration and ensure the CLI/SDK work correctly with the new backend.

* fix: Use positional args instead of kwargs for Temporal workflows

  The Temporal Python SDK's start_workflow() method doesn't accept a 'kwargs' parameter. Workflows must receive parameters as positional arguments via the 'args' parameter.

  Changed to:
    args=workflow_args  # Positional arguments

  This fixes the error:
    TypeError: Client.start_workflow() got an unexpected keyword argument 'kwargs'

  Workflows now correctly receive parameters in order:
  - security_assessment: [target_id, scanner_config, analyzer_config, reporter_config]
  - atheris_fuzzing: [target_id, target_file, max_iterations, timeout_seconds]
  - rust_test: [target_id, test_message]

* fix: Filter metadata-only parameters from workflow arguments

  SecurityAssessmentWorkflow was receiving 7 arguments instead of 2-5. The issue was that target_path and volume_mode from default_parameters were being passed to the workflow, when they should only be used by the system for configuration.

  Now filters out metadata-only parameters (target_path, volume_mode) before passing arguments to workflow execution.

* refactor: Remove Prefect leftovers and volume-mounting legacy

  Complete cleanup of Prefect migration artifacts:

  Backend:
  - Delete registry.py and workflow_discovery.py (Prefect-specific files)
  - Remove Docker validation from setup.py (no longer needed)
  - Remove ResourceLimits and VolumeMount models
  - Remove target_path and volume_mode from WorkflowSubmission
  - Remove supported_volume_modes from API and discovery
  - Clean up metadata.yaml files (remove volume/path fields)
  - Simplify parameter filtering in manager.py

  SDK:
  - Remove the volume_mode parameter from client methods
  - Remove ResourceLimits and VolumeMount models
  - Remove Prefect error patterns from docker_logs.py
  - Clean up WorkflowSubmission and WorkflowMetadata models

  CLI:
  - Remove the Volume Modes display from workflow info

  All removed features are Prefect-specific or Docker volume-mounting artifacts. Temporal workflows use MinIO storage exclusively.

* feat: Add comprehensive test suite and benchmark infrastructure

  - Add 68 unit tests for fuzzer, scanner, and analyzer modules
  - Implement pytest-based test infrastructure with fixtures
  - Add 6 performance benchmarks with category-specific thresholds
  - Configure GitHub Actions for automated testing and benchmarking
  - Add test and benchmark documentation

  Test coverage:
  - AtherisFuzzer: 8 tests
  - CargoFuzzer: 14 tests
  - FileScanner: 22 tests
  - SecurityAnalyzer: 24 tests

  All tests passing (68/68)
  All benchmarks passing (6/6)

* fix: Resolve all ruff linting violations across codebase

  Fixed 27 ruff violations in 12 files:
  - Removed unused imports (Depends, Dict, Any, Optional, etc.)
  - Fixed the undefined workflow_info variable in workflows.py
  - Removed dead code with undefined variables in atheris_fuzzer.py
  - Changed f-strings to regular strings where no placeholders are used

  All files now pass ruff checks for CI/CD compliance.

* fix: Configure CI for unit tests only

  - Renamed docker-compose.temporal.yaml → docker-compose.yml for CI compatibility
  - Commented out the integration-tests job (no integration tests yet)
  - Updated test-summary to depend only on lint and unit-tests

  CI will now run successfully with 68 unit tests. Integration tests can be added later.

* feat: Add CI/CD integration with ephemeral deployment model

  Implements comprehensive CI/CD support for FuzzForge with on-demand worker management:

  **Worker Management (v0.7.0)**
  - Add WorkerManager for automatic worker lifecycle control
  - Auto-start workers from the stopped state when workflows execute
  - Auto-stop workers after workflow completion
  - Health checks and startup timeout handling (90s default)

  **CI/CD Features**
  - `--fail-on` flag: Fail builds based on SARIF severity levels (error/warning/note/info)
  - `--export-sarif` flag: Export findings in SARIF 2.1.0 format
  - `--auto-start`/`--auto-stop` flags: Control worker lifecycle
  - Exit code propagation: Returns 1 on blocking findings, 0 on success

  **Exit Code Fix**
  - Add `except typer.Exit: raise` handlers at 3 critical locations
  - Move worker cleanup to a finally block for guaranteed execution
  - Exit codes now propagate correctly even when the build fails

  **CI Scripts & Examples**
  - ci-start.sh: Start FuzzForge services with health checks
  - ci-stop.sh: Clean shutdown with a volume-preservation option
  - GitHub Actions workflow example (security-scan.yml)
  - GitLab CI pipeline example (.gitlab-ci.example.yml)
  - docker-compose.ci.yml: CI-optimized compose file with profiles

  **OSS-Fuzz Integration**
  - New ossfuzz_campaign workflow for running OSS-Fuzz projects
  - OSS-Fuzz worker with Docker-in-Docker support
  - Configurable campaign duration and project selection

  **Documentation**
  - Comprehensive CI/CD integration guide (docs/how-to/cicd-integration.md)
  - Updated architecture docs with worker lifecycle details
  - Updated workspace isolation documentation
  - CLI README with worker management examples

  **SDK Enhancements**
  - Add get_workflow_worker_info() endpoint
  - Worker vertical metadata in workflow responses

  **Testing**
  - All workflows tested: security_assessment, atheris_fuzzing, secret_detection, cargo_fuzzing
  - All monitoring commands tested: stats, crashes, status, finding
  - Full CI pipeline simulation verified
  - Exit codes verified for success/failure scenarios

  Ephemeral CI/CD model: ~3-4GB RAM, ~60-90s startup, runs entirely in CI containers.

* fix: Resolve ruff linting violations in CI/CD code

  - Remove unused variables (run_id, defaults, result)
  - Remove unused imports
  - Fix f-strings without placeholders

  All CI/CD integration files now pass ruff checks.
@@ -93,51 +93,61 @@ graph TB
### Orchestration Layer

- **Prefect Server:** Schedules and tracks workflows, backed by PostgreSQL.
- **Prefect Workers:** Execute workflows in Docker containers. Can be scaled horizontally.
- **Workflow Scheduler:** Balances load, manages priorities, and enforces resource limits.
- **Temporal Server:** Schedules and tracks workflows, backed by PostgreSQL.
- **Vertical Workers:** Long-lived workers pre-built with domain-specific toolchains (Android, Rust, Web, etc.). Can be scaled horizontally.
- **Task Queues:** Route workflows to appropriate vertical workers based on workflow metadata.

### Execution Layer

- **Docker Engine:** Runs workflow containers, enforcing isolation and resource limits.
- **Workflow Containers:** Custom images with security tools, mounting code and results volumes.
- **Docker Registry:** Stores and distributes workflow images.
- **Vertical Workers:** Long-lived processes with pre-installed security tools for specific domains.
- **MinIO Storage:** S3-compatible storage for uploaded targets and results.
- **Worker Cache:** Local cache for downloaded targets, with LRU eviction.

### Storage Layer

- **PostgreSQL Database:** Stores workflow metadata, state, and results.
- **Docker Volumes:** Persist workflow results and artifacts.
- **Result Cache:** Speeds up access to recent results, with in-memory and disk persistence.
- **PostgreSQL Database:** Stores Temporal workflow state and metadata.
- **MinIO (S3):** Persistent storage for uploaded targets and workflow results.
- **Worker Cache:** Local filesystem cache for downloaded targets with workspace isolation:
  - **Isolated mode**: Each run gets `/cache/{target_id}/{run_id}/workspace/`
  - **Shared mode**: All runs share `/cache/{target_id}/workspace/`
  - **Copy-on-write mode**: Download once, copy per run
  - **LRU eviction** when the cache exceeds the configured size

## How Does Data Flow Through the System?

### Submitting a Workflow

1. **User submits a workflow** via CLI or API client.
2. **API validates** the request and creates a deployment in Prefect.
3. **Prefect schedules** the workflow and assigns it to a worker.
4. **Worker launches a container** to run the workflow.
5. **Results are stored** in Docker volumes and the database.
6. **Status updates** flow back through Prefect and the API to the user.
1. **User submits a workflow** via CLI or API client (with optional file upload).
2. **If a file is provided, the API uploads it** to MinIO and gets a `target_id`.
3. **API validates** the request and submits to Temporal.
4. **Temporal routes** the workflow to the appropriate vertical worker queue.
5. **Worker downloads the target** from MinIO to its local cache (if needed; see the sketch after the diagram).
6. **Worker executes the workflow** with pre-installed tools.
7. **Results are stored** in MinIO and metadata in PostgreSQL.
8. **Status updates** flow back through Temporal and the API to the user.

```mermaid
sequenceDiagram
    participant User
    participant API
    participant Prefect
    participant MinIO
    participant Temporal
    participant Worker
    participant Container
    participant Storage
    participant Cache

    User->>API: Submit workflow
    User->>API: Submit workflow + file
    API->>API: Validate parameters
    API->>Prefect: Create deployment
    Prefect->>Worker: Schedule execution
    Worker->>Container: Create and start
    Container->>Container: Execute security tools
    Container->>Storage: Store SARIF results
    Worker->>Prefect: Update status
    Prefect->>API: Workflow complete
    API->>MinIO: Upload target file
    MinIO-->>API: Return target_id
    API->>Temporal: Submit workflow(target_id)
    Temporal->>Worker: Route to vertical queue
    Worker->>MinIO: Download target
    MinIO-->>Worker: Stream file
    Worker->>Cache: Store in local cache
    Worker->>Worker: Execute security tools
    Worker->>MinIO: Upload SARIF results
    Worker->>Temporal: Update status
    Temporal->>API: Workflow complete
    API->>User: Return results
```
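
As a sketch of steps 5-7 on the worker side, a shared storage activity could look roughly like this; the `targets` bucket name is an assumption, and the isolation parameters covered in the workspace-isolation guide are omitted for brevity.

```python
# Sketch of a shared storage activity: download a target from MinIO,
# extract the tarball, and return the workspace path (steps 5-7 above).
import os
import tarfile

import boto3
from temporalio import activity


@activity.defn
async def get_target(target_id: str) -> str:
    s3 = boto3.client(
        "s3",
        endpoint_url=os.environ["S3_ENDPOINT"],
        aws_access_key_id=os.environ["S3_ACCESS_KEY"],
        aws_secret_access_key=os.environ["S3_SECRET_KEY"],
    )
    workdir = f"/cache/{target_id}"
    os.makedirs(workdir, exist_ok=True)
    tarball = os.path.join(workdir, "target")
    s3.download_file("targets", target_id, tarball)  # bucket name assumed
    workspace = os.path.join(workdir, "workspace")
    with tarfile.open(tarball) as tf:
        tf.extractall(workspace)  # tarball extraction into the local cache
    return workspace
```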

@@ -149,25 +159,27 @@ sequenceDiagram

## How Do Services Communicate?

- **Internally:** FastAPI talks to Prefect via REST; Prefect coordinates with workers over HTTP; workers manage containers via the Docker Engine API. All core services use pooled connections to PostgreSQL.
- **Externally:** Users interact via CLI or API clients (HTTP REST). The MCP server can automate workflows via its own protocol.
- **Internally:** FastAPI talks to Temporal via gRPC; Temporal coordinates with workers over gRPC; workers access MinIO via the S3 API. All core services use pooled connections to PostgreSQL (see the sketch below).
- **Externally:** Users interact via CLI or API clients (HTTP REST).
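
To make the internal gRPC path concrete, here is a minimal sketch of a Temporal submission using the `temporal:7233` address from the configuration example below; the workflow ID, target ID, and task queue name are illustrative, not the backend's actual values.

```python
# Minimal sketch of the API -> Temporal gRPC path (illustrative values).
import asyncio

from temporalio.client import Client


async def submit() -> None:
    client = await Client.connect("temporal:7233")  # gRPC, not REST
    handle = await client.start_workflow(
        "SecurityAssessmentWorkflow",
        args=["target-1234"],           # hypothetical target_id from MinIO upload
        id="security-assessment-demo",  # hypothetical workflow ID
        task_queue="rust-queue",        # hypothetical vertical task queue
    )
    print(await handle.result())        # status flows back the same way


asyncio.run(submit())
```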

## How Is Security Enforced?

- **Container Isolation:** Each workflow runs in its own Docker network, as a non-root user, with strict resource limits and only necessary volumes mounted.
- **Volume Security:** Source code is mounted read-only; results are written to dedicated, temporary volumes.
- **API Security:** All endpoints require API keys, validate inputs, enforce rate limits, and log requests for auditing.
- **Worker Isolation:** Each workflow runs in isolated vertical workers with pre-defined toolchains.
- **Storage Security:** Uploaded files are stored in MinIO with lifecycle policies; read-only access by default.
- **API Security:** All endpoints validate inputs, enforce rate limits, and log requests for auditing.
- **No Host Access:** Workers access targets via MinIO, not the host filesystem.

## How Does FuzzForge Scale?

- **Horizontally:** Add more Prefect workers to handle more workflows in parallel. Scale the database with read replicas and connection pooling.
- **Vertically:** Adjust CPU and memory limits for containers and services as needed.
- **Horizontally:** Add more vertical workers to handle more workflows in parallel. Scale specific worker types based on demand.
- **Vertically:** Adjust CPU and memory limits for workers and tune concurrent activity limits.

Example Docker Compose scaling:

```yaml
services:
  prefect-worker:
  worker-rust:
    deploy:
      replicas: 3  # Scale rust workers
      resources:
        limits:
          memory: 4G
```

@@ -179,21 +191,22 @@ services:

## How Is It Deployed?

- **Development:** All services run via Docker Compose—backend, Prefect, workers, database, and registry.
- **Production:** Add load balancers, database clustering, and multiple worker instances for high availability. Health checks, metrics, and centralized logging support monitoring and troubleshooting.
- **Development:** All services run via Docker Compose—backend, Temporal, vertical workers, database, and MinIO.
- **Production:** Add load balancers, Temporal clustering, database replication, and multiple worker instances for high availability. Health checks, metrics, and centralized logging support monitoring and troubleshooting.

## How Is Configuration Managed?

- **Environment Variables:** Control core settings like database URLs, registry location, and Prefect API endpoints.
- **Service Discovery:** Docker Compose’s internal DNS lets services find each other by name, with consistent port mapping and health check endpoints.
- **Environment Variables:** Control core settings like database URLs, MinIO endpoints, and Temporal addresses.
- **Service Discovery:** Docker Compose's internal DNS lets services find each other by name, with consistent port mapping and health check endpoints.

Example configuration:

```bash
COMPOSE_PROJECT_NAME=fuzzforge
DATABASE_URL=postgresql://postgres:postgres@postgres:5432/fuzzforge
PREFECT_API_URL=http://prefect-server:4200/api
DOCKER_REGISTRY=localhost:5001
DOCKER_INSECURE_REGISTRY=true
TEMPORAL_ADDRESS=temporal:7233
S3_ENDPOINT=http://minio:9000
S3_ACCESS_KEY=fuzzforge
S3_SECRET_KEY=fuzzforge123
```
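
As a quick illustration of how a service consumes these settings, here is a sketch that builds a MinIO client from the variables above with boto3; the `targets` bucket name and object key are assumptions.

```python
# Sketch: build the S3 client from the environment above (bucket/key assumed).
import os

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["S3_ENDPOINT"],            # http://minio:9000
    aws_access_key_id=os.environ["S3_ACCESS_KEY"],
    aws_secret_access_key=os.environ["S3_SECRET_KEY"],
)
# e.g., fetch an uploaded target into the worker's local cache
s3.download_file("targets", "target-1234", "/cache/target-1234/target")
```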

## How Are Failures Handled?

@@ -203,9 +216,9 @@ DOCKER_INSECURE_REGISTRY=true

## Implementation Details

- **Tech Stack:** FastAPI (Python async), Prefect 3.x, Docker, Docker Compose, PostgreSQL (asyncpg), and Docker networking.
- **Performance:** Workflows start in 2–5 seconds; results are retrieved quickly thanks to caching and database indexing.
- **Extensibility:** Add new workflows by deploying new Docker images; extend the API with new endpoints; configure storage backends as needed.
- **Tech Stack:** FastAPI (Python async), Temporal, MinIO, Docker, Docker Compose, PostgreSQL (asyncpg), and boto3 (S3 client).
- **Performance:** Workflows start immediately (workers are long-lived); results are retrieved quickly thanks to MinIO caching and database indexing.
- **Extensibility:** Add new workflows by mounting code; add new vertical workers with specialized toolchains; extend the API with new endpoints.

---

@@ -22,58 +22,62 @@ FuzzForge relies on Docker containers for several key reasons:

Every workflow in FuzzForge is executed inside a Docker container. Here’s what that means in practice:

- **Workflow containers** are built from language-specific base images (like Python or Node.js), with security tools and workflow code pre-installed.
- **Infrastructure containers** (API server, Prefect, database) use official images and are configured for the platform’s needs.
- **Vertical worker containers** are built from language-specific base images with domain-specific security toolchains pre-installed (Android, Rust, Web, etc.).
- **Infrastructure containers** (API server, Temporal, MinIO, database) use official images and are configured for the platform's needs.

### Container Lifecycle: From Build to Cleanup
### Worker Lifecycle: From Build to Long-Running

The lifecycle of a workflow container looks like this:
The lifecycle of a vertical worker looks like this:

1. **Image Build:** A Docker image is built with all required tools and code.
2. **Image Push/Pull:** The image is pushed to (and later pulled from) a local or remote registry.
3. **Container Creation:** The container is created with the right volumes and environment.
4. **Execution:** The workflow runs inside the container.
5. **Result Storage:** Results are written to mounted volumes.
6. **Cleanup:** The container and any temporary data are removed.
1. **Image Build:** A Docker image is built with all required toolchains for the vertical.
2. **Worker Start:** The worker container starts as a long-lived process.
3. **Workflow Discovery:** The worker scans the mounted `/app/toolbox` for workflows matching its vertical.
4. **Registration:** Workflows are registered with Temporal on the worker's task queue.
5. **Execution:** When a workflow is submitted, the worker downloads the target from MinIO and executes it.
6. **Continuous Running:** The worker remains running, ready for the next workflow.

```mermaid
graph TB
    Build[Build Image] --> Push[Push to Registry]
    Push --> Pull[Pull Image]
    Pull --> Create[Create Container]
    Create --> Mount[Mount Volumes]
    Mount --> Start[Start Container]
    Start --> Execute[Run Workflow]
    Execute --> Results[Store Results]
    Execute --> Stop[Stop Container]
    Stop --> Cleanup[Cleanup Data]
    Cleanup --> Remove[Remove Container]
    Build[Build Worker Image] --> Start[Start Worker Container]
    Start --> Mount[Mount Toolbox Volume]
    Mount --> Discover[Discover Workflows]
    Discover --> Register[Register with Temporal]
    Register --> Ready[Worker Ready]
    Ready --> Workflow[Workflow Submitted]
    Workflow --> Download[Download Target from MinIO]
    Download --> Execute[Execute Workflow]
    Execute --> Upload[Upload Results to MinIO]
    Upload --> Ready
```
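
A minimal sketch of the loop a generic `worker.py` could run for steps 2-6, using the Temporal Python SDK; the `rust_test` workflow stands in for whatever the toolbox scan would discover, and the queue and environment variable names are assumptions.

```python
# Sketch of a long-lived vertical worker (discovery stubbed with rust_test).
import asyncio
import os

from temporalio import workflow
from temporalio.client import Client
from temporalio.worker import Worker


@workflow.defn
class RustTestWorkflow:
    """Stand-in for workflows discovered under /app/toolbox."""

    @workflow.run
    async def run(self, target_id: str, test_message: str) -> str:
        return f"{test_message}: processed {target_id}"


async def main() -> None:
    client = await Client.connect(os.environ.get("TEMPORAL_ADDRESS", "temporal:7233"))
    worker = Worker(
        client,
        task_queue=os.environ.get("TASK_QUEUE", "rust-queue"),  # vertical queue
        workflows=[RustTestWorkflow],  # registration (step 4)
    )
    await worker.run()  # stays ready for the next workflow (step 6)


asyncio.run(main())
```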

---

## What’s Inside a Workflow Container?
## What's Inside a Vertical Worker Container?

A typical workflow container is structured like this:
A typical vertical worker container is structured like this:

- **Base Image:** Usually a slim language image (e.g., `python:3.11-slim`).
- **Base Image:** Language-specific image (e.g., `python:3.11-slim`).
- **System Dependencies:** Installed as needed (e.g., `git`, `curl`).
- **Security Tools:** Pre-installed (e.g., `semgrep`, `bandit`, `safety`).
- **Workflow Code:** Copied into the container.
- **Domain-Specific Toolchains:** Pre-installed (e.g., Rust: `AFL++`, `cargo-fuzz`; Android: `apktool`, `Frida`).
- **Temporal Python SDK:** For workflow execution.
- **Boto3:** For MinIO/S3 access.
- **Worker Script:** Discovers and registers workflows.
- **Non-root User:** Created for execution.
- **Entrypoint:** Runs the workflow code.
- **Entrypoint:** Runs the worker discovery and registration loop.

Example Dockerfile snippet:
Example Dockerfile snippet for the Rust worker:

```dockerfile
FROM python:3.11-slim
RUN apt-get update && apt-get install -y git curl && rm -rf /var/lib/apt/lists/*
RUN pip install semgrep bandit safety
COPY ./toolbox /app/toolbox
RUN apt-get update && apt-get install -y git curl build-essential && rm -rf /var/lib/apt/lists/*
# Install AFL++, cargo, etc.
RUN pip install temporalio boto3 pydantic
COPY worker.py /app/
WORKDIR /app
RUN useradd -m -u 1000 fuzzforge
USER fuzzforge
CMD ["python", "-m", "toolbox.main"]
# Toolbox will be mounted as a volume at /app/toolbox
CMD ["python", "worker.py"]
```

---

@@ -102,37 +106,42 @@ networks:

### Volume Types

- **Target Code Volume:** Mounts the code to be analyzed, read-only, into the container.
- **Result Volume:** Stores workflow results and artifacts; persists after container exit.
- **Temporary Volumes:** Used for scratch space, destroyed with the container.
- **Toolbox Volume:** Mounts the workflow code directory, read-only, for dynamic discovery.
- **Worker Cache:** Local cache for downloaded MinIO targets, with LRU eviction.
- **MinIO Data:** Persistent storage for uploaded targets and results (S3-compatible).

Example volume mount:

```yaml
volumes:
  - "/host/path/to/code:/app/target:ro"
  - "fuzzforge_prefect_storage:/app/prefect"
  - "./toolbox:/app/toolbox:ro"  # Workflow code
  - "worker_cache:/cache"        # Local cache
  - "minio_data:/data"           # MinIO storage
```

### Volume Security

- **Read-only Mounts:** Prevent workflows from modifying source code.
- **Isolated Results:** Each workflow writes to its own result directory.
- **No Arbitrary Host Access:** Only explicitly mounted paths are accessible.
- **Read-only Toolbox:** Workflows cannot modify the mounted toolbox code.
- **Isolated Storage:** Each workflow's target is stored with a unique `target_id` in MinIO.
- **No Host Filesystem Access:** Workers access targets via MinIO, not host paths.
- **Automatic Cleanup:** MinIO lifecycle policies delete old targets after 7 days (see the sketch below).
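
For illustration, the 7-day expiry can be expressed as a standard S3 lifecycle rule; the `targets` bucket name and rule ID below are assumptions, not the shipped configuration.

```python
# Sketch: apply a 7-day expiration rule to the (assumed) "targets" bucket.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://minio:9000",
    aws_access_key_id="fuzzforge",
    aws_secret_access_key="fuzzforge123",
)
s3.put_bucket_lifecycle_configuration(
    Bucket="targets",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-old-targets",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},   # every uploaded target
            "Expiration": {"Days": 7},  # matches the 7-day policy above
        }]
    },
)
```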

---

## How Are Images Built and Managed?
## How Are Worker Images Built and Managed?

- **Automated Builds:** Images are built and pushed to a local registry for development, or a secure registry for production.
- **Automated Builds:** Vertical worker images are built with specialized toolchains.
- **Build Optimization:** Use layer caching, multi-stage builds, and minimal base images.
- **Versioning:** Use tags (`latest`, semantic versions, or SHA digests) to track images.
- **Versioning:** Use tags (`latest`, semantic versions) to track worker images.
- **Long-Lived:** Workers run continuously; they are not ephemeral per workflow.

Example build and push:
Example build:

```bash
docker build -t localhost:5001/fuzzforge-static-analysis:latest .
docker push localhost:5001/fuzzforge-static-analysis:latest
cd workers/rust
docker build -t fuzzforge-worker-rust:latest .
# Or via docker-compose
docker-compose -f docker-compose.temporal.yaml build worker-rust
```

---

@@ -147,7 +156,7 @@ Example resource config:

```yaml
services:
  prefect-worker:
  worker-rust:
    deploy:
      resources:
        limits:
@@ -156,6 +165,8 @@ services:
        reservations:
          memory: 1G
          cpus: '0.5'
    environment:
      MAX_CONCURRENT_ACTIVITIES: 5
```

---

@@ -172,7 +183,7 @@ Example security options:

```yaml
services:
  prefect-worker:
  worker-rust:
    security_opt:
      - no-new-privileges:true
    cap_drop:
```

@@ -188,8 +199,9 @@ services:

## How Is Performance Optimized?

- **Image Layering:** Structure Dockerfiles for efficient caching.
- **Dependency Preinstallation:** Reduce startup time by pre-installing dependencies.
- **Warm Containers:** Optionally pre-create containers for faster workflow startup.
- **Pre-installed Toolchains:** All tools are installed in the worker image, so workflows have zero setup time.
- **Long-Lived Workers:** Eliminate container startup overhead entirely.
- **Local Caching:** MinIO targets are cached locally for repeated workflows.
- **Horizontal Scaling:** Scale worker containers to handle more workflows in parallel.

---

@@ -205,10 +217,10 @@ services:

## How Does This All Fit Into FuzzForge?

- **Prefect Workers:** Manage the full lifecycle of workflow containers.
- **API Integration:** Exposes container status, logs, and resource metrics.
- **Volume Management:** Ensures results and artifacts are collected and persisted.
- **Security and Resource Controls:** Enforced automatically for every workflow.
- **Temporal Workers:** Long-lived vertical workers execute workflows with pre-installed toolchains.
- **API Integration:** Exposes workflow status, logs, and resource metrics via Temporal.
- **MinIO Storage:** Ensures targets and results are stored, cached, and cleaned up automatically.
- **Security and Resource Controls:** Enforced automatically for every worker and workflow.

---

594 docs/docs/concept/resource-management.md Normal file
@@ -0,0 +1,594 @@

# Resource Management in FuzzForge

FuzzForge uses a multi-layered approach to manage CPU, memory, and concurrency for workflow execution. This ensures stable operation, prevents resource exhaustion, and allows predictable performance.

---

## Overview

Resource limiting in FuzzForge operates at three levels:

1. **Docker Container Limits** (Primary Enforcement) - Hard limits enforced by Docker
2. **Worker Concurrency Limits** - Controls parallel workflow execution
3. **Workflow Metadata** (Advisory) - Documents resource requirements

---

## Worker Lifecycle Management (On-Demand Startup)

**New in v0.7.0**: Workers now support on-demand startup/shutdown for optimal resource usage.

### Architecture

Workers are **pre-built** but **not auto-started**:

```
┌─────────────┐
│ docker-     │  Pre-built worker images
│ compose     │  with profiles: ["workers", "ossfuzz"]
│ build       │  restart: "no"
└─────────────┘
      ↓
┌─────────────┐
│ Workers     │  Status: Exited (not running)
│ Pre-built   │  RAM Usage: 0 MB
└─────────────┘
      ↓
┌─────────────┐
│ ff workflow │  CLI detects required worker
│ run         │  via /workflows/{name}/worker-info API
└─────────────┘
      ↓
┌─────────────┐
│ docker      │  docker start fuzzforge-worker-ossfuzz
│ start       │  Wait for healthy status
└─────────────┘
      ↓
┌─────────────┐
│ Worker      │  Status: Up
│ Running     │  RAM Usage: ~1-2 GB
└─────────────┘
```
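
The `docker start` / wait-for-healthy step in the diagram can be approximated like this; driving the Docker CLI through `subprocess` is an assumption about mechanism, and the real WorkerManager may work differently.

```python
# Sketch: auto-start a pre-built worker and wait for it to become healthy.
import json
import subprocess
import time


def ensure_worker_running(container: str, timeout: int = 90) -> None:
    subprocess.run(["docker", "start", container], check=True)
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.run(
            ["docker", "inspect", container],
            check=True, capture_output=True, text=True,
        ).stdout
        state = json.loads(out)[0]["State"]
        health = state.get("Health", {}).get("Status")
        # Containers without a HEALTHCHECK only report "Running"
        if health == "healthy" or (health is None and state.get("Running")):
            return
        time.sleep(2)
    raise TimeoutError(f"{container} did not become healthy within {timeout}s")


ensure_worker_running("fuzzforge-worker-ossfuzz")
```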

### Resource Savings

| State | Services Running | RAM Usage |
|-------|-----------------|-----------|
| **Idle** (no workflows) | Temporal, PostgreSQL, MinIO, Backend | ~1.2 GB |
| **Active** (1 workflow) | Core + 1 worker | ~3-5 GB |
| **Legacy** (all workers) | Core + all 5 workers | ~8 GB |

**Savings: ~6-7GB RAM when idle** ✨

### Configuration

Control via `.fuzzforge/config.yaml`:

```yaml
workers:
  auto_start_workers: true      # Auto-start when needed
  auto_stop_workers: false      # Auto-stop after completion
  worker_startup_timeout: 60    # Startup timeout (seconds)
  docker_compose_file: null     # Custom compose file path
```

Or via CLI flags:

```bash
# Auto-start disabled
ff workflow run ossfuzz_campaign . --no-auto-start

# Auto-stop enabled
ff workflow run ossfuzz_campaign . --wait --auto-stop
```

### Backend API

New endpoint: `GET /workflows/{workflow_name}/worker-info`

**Response**:
```json
{
  "workflow": "ossfuzz_campaign",
  "vertical": "ossfuzz",
  "worker_container": "fuzzforge-worker-ossfuzz",
  "task_queue": "ossfuzz-queue",
  "required": true
}
```

### SDK Integration

```python
from fuzzforge_sdk import FuzzForgeClient

client = FuzzForgeClient()
worker_info = client.get_workflow_worker_info("ossfuzz_campaign")
# Returns: {"vertical": "ossfuzz", "worker_container": "fuzzforge-worker-ossfuzz", ...}
```

### Manual Control

```bash
# Start a worker manually
docker start fuzzforge-worker-ossfuzz

# Stop a worker manually
docker stop fuzzforge-worker-ossfuzz

# Check all worker statuses
docker ps -a --filter "name=fuzzforge-worker"
```

---

## Level 1: Docker Container Limits (Primary)

Docker container limits are the **primary enforcement mechanism** for CPU and memory resources. These are configured in `docker-compose.temporal.yaml` and enforced by the Docker runtime.

### Configuration

```yaml
services:
  worker-rust:
    deploy:
      resources:
        limits:
          cpus: '2.0'    # Maximum 2 CPU cores
          memory: 2G     # Maximum 2GB RAM
        reservations:
          cpus: '0.5'    # Minimum 0.5 CPU cores reserved
          memory: 512M   # Minimum 512MB RAM reserved
```

### How It Works

- **CPU Limit**: Docker throttles CPU usage when the container exceeds the limit
- **Memory Limit**: Docker kills the container (OOM) if it exceeds the memory limit
- **Reservations**: Guarantee that minimum resources are available to the worker

### Example Configuration by Vertical

Different verticals have different resource needs:

**Rust Worker** (CPU-intensive fuzzing):
```yaml
worker-rust:
  deploy:
    resources:
      limits:
        cpus: '4.0'
        memory: 4G
```

**Android Worker** (memory-intensive emulation):
```yaml
worker-android:
  deploy:
    resources:
      limits:
        cpus: '2.0'
        memory: 8G
```

**Web Worker** (lightweight analysis):
```yaml
worker-web:
  deploy:
    resources:
      limits:
        cpus: '1.0'
        memory: 1G
```

### Monitoring Container Resources

Check real-time resource usage:

```bash
# Monitor all workers
docker stats

# Monitor a specific worker
docker stats fuzzforge-worker-rust

# Output:
# CONTAINER               CPU %   MEM USAGE / LIMIT   MEM %
# fuzzforge-worker-rust   85%     1.5GiB / 2GiB       75%
```

---

## Level 2: Worker Concurrency Limits

The `MAX_CONCURRENT_ACTIVITIES` environment variable controls how many workflows can execute **simultaneously** on a single worker.

### Configuration

```yaml
services:
  worker-rust:
    environment:
      MAX_CONCURRENT_ACTIVITIES: 5
    deploy:
      resources:
        limits:
          memory: 2G
```

### How It Works

- **Total Container Memory**: 2GB
- **Concurrent Workflows**: 5
- **Memory per Workflow**: ~400MB (2GB ÷ 5)

If a 6th workflow is submitted, it **waits in the Temporal queue** until one of the 5 running workflows completes.

### Calculating Concurrency

Use this formula to determine `MAX_CONCURRENT_ACTIVITIES`:

```
MAX_CONCURRENT_ACTIVITIES = Container Memory Limit / Estimated Workflow Memory
```

**Example** (see the helper below):
- Container limit: 4GB
- Workflow memory: ~800MB
- Concurrency: 4GB ÷ 800MB = **5 concurrent workflows**
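
The same arithmetic as a tiny helper, with illustrative names:

```python
# Sketch: derive MAX_CONCURRENT_ACTIVITIES from the container memory limit.
def max_concurrent_activities(container_limit_gb: float, workflow_mb: float) -> int:
    return max(1, int((container_limit_gb * 1024) // workflow_mb))


print(max_concurrent_activities(4, 800))  # 4GB / 800MB -> 5
```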

### Configuration Examples

**High Concurrency (lightweight workflows)**:
```yaml
worker-web:
  environment:
    MAX_CONCURRENT_ACTIVITIES: 10  # Many small workflows
  deploy:
    resources:
      limits:
        memory: 2G  # ~200MB per workflow
```

**Low Concurrency (heavy workflows)**:
```yaml
worker-rust:
  environment:
    MAX_CONCURRENT_ACTIVITIES: 2  # Few large workflows
  deploy:
    resources:
      limits:
        memory: 4G  # ~2GB per workflow
```

### Monitoring Concurrency

Check how many workflows are running:

```bash
# View worker logs
docker-compose -f docker-compose.temporal.yaml logs worker-rust | grep "Starting"

# Check the Temporal UI
# Open http://localhost:8233
# Navigate to "Task Queues" → "rust" → see pending/running counts
```

---

## Level 3: Workflow Metadata (Advisory)

Workflow metadata in `metadata.yaml` documents resource requirements, but these are **advisory only** (except for the timeout).

### Configuration

```yaml
# backend/toolbox/workflows/security_assessment/metadata.yaml
requirements:
  resources:
    memory: "512Mi"   # Estimated memory usage (advisory)
    cpu: "500m"       # Estimated CPU usage (advisory)
    timeout: 1800     # Execution timeout in seconds (ENFORCED)
```

### What's Enforced vs. Advisory

| Field | Enforcement | Description |
|-------|-------------|-------------|
| `timeout` | ✅ **Enforced by Temporal** | Workflow is killed if it exceeds the timeout |
| `memory` | ⚠️ Advisory only | Documents expected memory usage |
| `cpu` | ⚠️ Advisory only | Documents expected CPU usage |

### Why Metadata Is Useful

Even though `memory` and `cpu` are advisory, they're valuable for:

1. **Capacity Planning**: Determine appropriate container limits
2. **Concurrency Tuning**: Calculate `MAX_CONCURRENT_ACTIVITIES`
3. **Documentation**: Communicate resource needs to users
4. **Scheduling Hints**: Future horizontal scaling logic

### Timeout Enforcement

The `timeout` field is **enforced by Temporal**:

```python
# Temporal automatically cancels the workflow after the timeout
@workflow.defn
class SecurityAssessmentWorkflow:
    @workflow.run
    async def run(self, target_id: str):
        # If this takes longer than metadata.timeout (1800s),
        # Temporal will cancel the workflow
        ...
```

**Check the timeout in the Temporal UI:**
1. Open http://localhost:8233
2. Navigate to the workflow execution
3. See "Timeout" in the workflow details
4. If exceeded, the status shows "TIMED_OUT"

---

## Resource Management Best Practices

### 1. Set Conservative Container Limits

Start with lower limits and increase them based on actual usage:

```yaml
# Start conservative
worker-rust:
  deploy:
    resources:
      limits:
        cpus: '2.0'
        memory: 2G

# Monitor with: docker stats
# Increase if consistently hitting limits
```

### 2. Calculate Concurrency from Profiling

Profile a single workflow first:

```bash
# Run a single workflow and monitor
docker stats fuzzforge-worker-rust

# Note peak memory usage (e.g., 800MB)
# Calculate concurrency: 4GB ÷ 800MB = 5
```

### 3. Set Realistic Timeouts

Base timeouts on actual workflow duration:

```yaml
# Static analysis: 5-10 minutes
timeout: 600

# Fuzzing: 1-24 hours
timeout: 86400

# Quick scans: 1-2 minutes
timeout: 120
```

### 4. Monitor Resource Exhaustion

Watch for these warning signs:

```bash
# Check for OOM kills
docker-compose -f docker-compose.temporal.yaml logs worker-rust | grep -i "oom\|killed"

# Check for CPU throttling
docker stats fuzzforge-worker-rust
# If CPU% is consistently at the limit → increase cpus

# Check for memory pressure
docker stats fuzzforge-worker-rust
# If MEM% is consistently >90% → increase memory
```

### 5. Use Vertical-Specific Configuration

Different verticals have different needs:

| Vertical | CPU Priority | Memory Priority | Typical Config |
|----------|--------------|-----------------|----------------|
| Rust Fuzzing | High | Medium | 4 CPUs, 4GB RAM |
| Android Analysis | Medium | High | 2 CPUs, 8GB RAM |
| Web Scanning | Low | Low | 1 CPU, 1GB RAM |
| Static Analysis | Medium | Medium | 2 CPUs, 2GB RAM |

---

## Horizontal Scaling

To handle more workflows, scale worker containers horizontally:

```bash
# Scale the rust worker to 3 instances
docker-compose -f docker-compose.temporal.yaml up -d --scale worker-rust=3

# Now you can run:
# - 3 workers × 5 concurrent activities = 15 workflows simultaneously
```

**How it works:**
- Temporal load-balances across all workers on the same task queue
- Each worker has independent resource limits
- There is no shared state between workers

---

## Troubleshooting Resource Issues

### Issue: Workflows Stuck in "Running" State

**Symptom:** A workflow shows RUNNING but makes no progress

**Diagnosis:**
```bash
# Check that the worker is alive
docker-compose -f docker-compose.temporal.yaml ps worker-rust

# Check worker resource usage
docker stats fuzzforge-worker-rust

# Check for OOM kills
docker-compose -f docker-compose.temporal.yaml logs worker-rust | grep -i oom
```

**Solution:**
- Increase the memory limit if the worker was killed
- Reduce `MAX_CONCURRENT_ACTIVITIES` if the worker is overloaded
- Check worker logs for errors

### Issue: "Too Many Pending Tasks"

**Symptom:** Temporal shows many queued workflows

**Diagnosis:**
```bash
# Check the concurrent activities setting
docker exec fuzzforge-worker-rust env | grep MAX_CONCURRENT_ACTIVITIES

# Check the current workload
docker-compose -f docker-compose.temporal.yaml logs worker-rust | grep "Starting"
```

**Solution:**
- Increase `MAX_CONCURRENT_ACTIVITIES` if resources allow
- Add more worker instances (horizontal scaling)
- Increase container resource limits

### Issue: Workflow Timeout

**Symptom:** The workflow shows "TIMED_OUT" in the Temporal UI

**Diagnosis:**
1. Check the `metadata.yaml` timeout setting
2. Check the Temporal UI for the execution duration
3. Determine whether the timeout is appropriate

**Solution:**
```yaml
# Increase the timeout in metadata.yaml
requirements:
  resources:
    timeout: 3600  # Increased from 1800
```

---

## Workspace Isolation and Cache Management

FuzzForge uses workspace isolation to prevent concurrent workflows from interfering with each other. Each workflow run can have its own isolated workspace or share a common workspace, depending on the isolation mode.

### Cache Directory Structure

Workers cache downloaded targets locally to avoid repeated downloads:

```
/cache/
├── {target_id_1}/
│   ├── {run_id_1}/          # Isolated mode
│   │   ├── target           # Downloaded tarball
│   │   └── workspace/       # Extracted files
│   ├── {run_id_2}/
│   │   ├── target
│   │   └── workspace/
│   └── workspace/           # Shared mode (no run_id)
│       └── ...
├── {target_id_2}/
│   └── shared/              # Copy-on-write shared download
│       ├── target
│       └── workspace/
```

### Isolation Modes

**Isolated Mode** (default for fuzzing):
- Each run gets `/cache/{target_id}/{run_id}/workspace/`
- Safe for concurrent execution
- Cleanup removes the entire run directory

**Shared Mode** (for read-only workflows):
- All runs share `/cache/{target_id}/workspace/`
- Efficient (downloads once)
- No cleanup (cache persists)

**Copy-on-Write Mode**:
- Downloads to `/cache/{target_id}/shared/`
- Copies to `/cache/{target_id}/{run_id}/` per run
- Balances performance and isolation

### Cache Limits

Configure cache limits via environment variables:

```yaml
worker-rust:
  environment:
    CACHE_DIR: /cache
    CACHE_MAX_SIZE: 10GB   # Maximum cache size before LRU eviction
    CACHE_TTL: 7d          # Time-to-live for cached files
```

### LRU Eviction

When the cache exceeds `CACHE_MAX_SIZE`, the least-recently-used files are automatically evicted (a sketch follows the steps below):

1. The worker tracks the last access time for each cached target
2. When the cache is full, the oldest-accessed files are removed first
3. Eviction runs periodically (every 30 minutes)
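
Here is a minimal sketch of that eviction pass, assuming access times come straight from the filesystem (`st_atime`); the worker's actual bookkeeping may differ.

```python
# Sketch: evict least-recently-used cached targets until under the size cap.
import shutil
from pathlib import Path


def evict_lru(cache_dir: str = "/cache", max_bytes: int = 10 * 1024**3) -> None:
    targets = [p for p in Path(cache_dir).iterdir() if p.is_dir()]
    sizes = {
        p: sum(f.stat().st_size for f in p.rglob("*") if f.is_file())
        for p in targets
    }
    total = sum(sizes.values())
    # Oldest-accessed target directories go first
    for path in sorted(targets, key=lambda p: p.stat().st_atime):
        if total <= max_bytes:
            break
        total -= sizes[path]
        shutil.rmtree(path)
```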

### Monitoring Cache Usage

Check cache size and cleanup logs:

```bash
# Check cache size
docker exec fuzzforge-worker-rust du -sh /cache

# Monitor cache evictions
docker-compose -f docker-compose.temporal.yaml logs worker-rust | grep "Evicted from cache"

# Check the download vs. cache hit rate
docker-compose -f docker-compose.temporal.yaml logs worker-rust | grep -E "Cache (HIT|MISS)"
```

See the [Workspace Isolation](/concept/workspace-isolation) guide for complete details on isolation modes and when to use each.

---

## Summary

FuzzForge's resource management strategy:

1. **Docker Container Limits**: Primary enforcement (CPU/memory hard limits)
2. **Concurrency Limits**: Controls parallel workflows per worker
3. **Workflow Metadata**: Advisory resource hints plus an enforced timeout
4. **Workspace Isolation**: Controls cache sharing and cleanup behavior

**Key Takeaways:**
- Set conservative Docker limits and adjust based on monitoring
- Calculate `MAX_CONCURRENT_ACTIVITIES` as container memory ÷ workflow memory
- Use `docker stats` and the Temporal UI to monitor resource usage
- Scale horizontally by adding more worker instances
- Set realistic timeouts based on actual workflow duration
- Choose the appropriate isolation mode (isolated for fuzzing, shared for analysis)
- Monitor cache usage and adjust `CACHE_MAX_SIZE` as needed

---

**Next Steps:**
- Review the `docker-compose.temporal.yaml` resource configuration
- Profile your workflows to determine actual resource usage
- Adjust limits based on monitoring data
- Set up alerts for resource exhaustion

@@ -25,30 +25,31 @@ Here’s how a workflow moves through the FuzzForge system:

```mermaid
graph TB
    User[User/CLI/API] --> API[FuzzForge API]
    API --> Prefect[Prefect Orchestrator]
    Prefect --> Worker[Prefect Worker]
    Worker --> Container[Docker Container]
    Container --> Tools[Security Tools]
    API --> MinIO[MinIO Storage]
    API --> Temporal[Temporal Orchestrator]
    Temporal --> Worker[Vertical Worker]
    Worker --> MinIO
    Worker --> Tools[Security Tools]
    Tools --> Results[SARIF Results]
    Results --> Storage[Persistent Storage]
    Results --> MinIO
```

**Key roles:**
- **User/CLI/API:** Submits and manages workflows.
- **FuzzForge API:** Validates, orchestrates, and tracks workflows.
- **Prefect Orchestrator:** Schedules and manages workflow execution.
- **Prefect Worker:** Runs the workflow in a Docker container.
- **User/CLI/API:** Submits workflows and uploads files.
- **FuzzForge API:** Validates requests, uploads targets, and tracks workflows.
- **Temporal Orchestrator:** Schedules and manages workflow execution.
- **Vertical Worker:** Long-lived worker with pre-installed security tools.
- **MinIO Storage:** Stores uploaded targets and results.
- **Security Tools:** Perform the actual analysis.
- **Persistent Storage:** Stores results and artifacts.

---

## Workflow Lifecycle: From Idea to Results

1. **Design:** Choose tools, define integration logic, set up parameters, and build the Docker image.
2. **Deployment:** Build and push the image, register the workflow, and configure defaults.
3. **Execution:** User submits a workflow; parameters and target are validated; the workflow is scheduled and executed in a container; tools run as designed.
4. **Completion:** Results are collected, normalized, and stored; status is updated; temporary resources are cleaned up; results are made available via API/CLI.
1. **Design:** Choose tools, define integration logic, set up parameters, and specify the vertical worker.
2. **Deployment:** Create the workflow code, add metadata with a `vertical` field, and mount it as a volume in the worker.
3. **Execution:** The user submits a workflow with a file upload; the file is stored in MinIO; the workflow is routed to a vertical worker; the worker downloads the target and executes; tools run as designed.
4. **Completion:** Results are collected, normalized, and stored in MinIO; status is updated; MinIO lifecycle policies clean up old files; results are made available via API/CLI.

---

@@ -85,25 +86,25 @@ FuzzForge supports several workflow types, each optimized for a specific securit

## Data Flow and Storage

- **Input:** Target code and parameters are validated and mounted as read-only volumes.
- **Processing:** Tools are initialized and run (often in parallel); outputs are collected and normalized.
- **Output:** Results are stored in persistent volumes and indexed for fast retrieval; metadata is saved in the database; intermediate results may be cached for performance.
- **Input:** Target files are uploaded via HTTP to MinIO; parameters are validated and passed to Temporal.
- **Processing:** The worker downloads the target from MinIO to its local cache; tools are initialized and run (often in parallel); outputs are collected and normalized.
- **Output:** Results are stored in MinIO and indexed for fast retrieval; metadata is saved in PostgreSQL; targets are cached locally for repeated workflows; lifecycle policies clean up after 7 days.

---

## Error Handling and Recovery

- **Tool-Level:** Timeouts, resource exhaustion, and crashes are handled gracefully; failed tools don’t stop the workflow.
- **Workflow-Level:** Container failures, volume issues, and network problems are detected and reported.
- **Recovery:** Automatic retries for transient errors; partial results are returned when possible; workflows degrade gracefully if some tools are unavailable.
- **Tool-Level:** Timeouts, resource exhaustion, and crashes are handled gracefully; failed tools don't stop the workflow.
- **Workflow-Level:** Worker failures, storage issues, and network problems are detected and reported by Temporal.
- **Recovery:** Automatic retries for transient errors via Temporal (see the sketch below); partial results are returned when possible; workflows degrade gracefully if some tools are unavailable; MinIO ensures targets remain accessible.
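
A sketch of what leaning on Temporal's retries looks like from workflow code; the workflow name and retry numbers are illustrative.

```python
# Sketch: automatic retries for a transient activity failure via Temporal.
from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy


@workflow.defn
class ScanWorkflow:  # hypothetical workflow
    @workflow.run
    async def run(self, target_id: str) -> str:
        return await workflow.execute_activity(
            "get_target",
            args=[target_id],
            start_to_close_timeout=timedelta(minutes=5),
            retry_policy=RetryPolicy(
                initial_interval=timedelta(seconds=1),
                maximum_attempts=3,  # transient errors are retried twice
            ),
        )
```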

---

## Performance and Optimization

- **Container Efficiency:** Docker images are layered and cached for fast startup; containers may be reused when safe.
- **Worker Efficiency:** Long-lived workers eliminate container startup overhead; pre-installed toolchains reduce setup time.
- **Parallel Processing:** Independent tools run concurrently to maximize CPU usage and minimize wait times.
- **Caching:** Images, dependencies, and intermediate results are cached to avoid unnecessary recomputation.
- **Caching:** MinIO targets are cached locally; repeated workflows reuse cached targets; worker cache uses LRU eviction.

---
378 docs/docs/concept/workspace-isolation.md Normal file
@@ -0,0 +1,378 @@

# Workspace Isolation

FuzzForge's workspace isolation system ensures that concurrent workflow runs don't interfere with each other. This is critical for fuzzing and security analysis workloads, where multiple workflows might process the same target simultaneously.

---

## Why Workspace Isolation?

### The Problem

Without isolation, concurrent workflows accessing the same target would share the same cache directory:

```
/cache/{target_id}/workspace/
```

This causes problems when:
- **Fuzzing workflows** modify corpus files and crash artifacts
- **Multiple runs** operate on the same target simultaneously
- **File conflicts** occur during read/write operations

### The Solution

FuzzForge implements configurable workspace isolation with three modes:

1. **isolated** (default): Each run gets its own workspace
2. **shared**: All runs share the same workspace
3. **copy-on-write**: Download once, copy per run

---

## Isolation Modes

### Isolated Mode (Default)

**Use for**: Fuzzing workflows, or any workflow that modifies files

**Cache path**: `/cache/{target_id}/{run_id}/workspace/`

Each workflow run gets a completely isolated workspace directory. The target is downloaded to a run-specific path using the unique `run_id`.

**Advantages:**
- ✅ Safe for concurrent execution
- ✅ No file conflicts
- ✅ Clean per-run state

**Disadvantages:**
- ⚠️ Downloads the target for each run (higher bandwidth/storage)
- ⚠️ No sharing of downloaded artifacts

**Example workflows:**
- `atheris_fuzzing` - Modifies the corpus, creates crash files
- `cargo_fuzzing` - Modifies the corpus, generates artifacts

**metadata.yaml:**
```yaml
name: atheris_fuzzing
workspace_isolation: "isolated"
```

**Cleanup behavior:**
The entire run directory `/cache/{target_id}/{run_id}/` is removed after the workflow completes.

---
|
||||
|
||||
### Shared Mode
|
||||
|
||||
**Use for**: Read-only analysis workflows, security scanners
|
||||
|
||||
**Cache path**: `/cache/{target_id}/workspace/`
|
||||
|
||||
All workflow runs for the same target share a single workspace directory. The target is downloaded once and reused across runs.
|
||||
|
||||
**Advantages:**
|
||||
- ✅ Efficient (download once, use many times)
|
||||
- ✅ Lower bandwidth and storage usage
|
||||
- ✅ Faster startup (cache hit after first download)
|
||||
|
||||
**Disadvantages:**
|
||||
- ⚠️ Not safe for workflows that modify files
|
||||
- ⚠️ Potential race conditions if workflows write
|
||||
|
||||
**Example workflows:**
|
||||
- `security_assessment` - Read-only file scanning and analysis
|
||||
- `secret_detection` - Read-only secret scanning
|
||||
|
||||
**metadata.yaml:**
|
||||
```yaml
|
||||
name: security_assessment
|
||||
workspace_isolation: "shared"
|
||||
```
|
||||
|
||||
**Cleanup behavior:**
|
||||
No cleanup (workspace shared across runs). Cache persists until LRU eviction.
|
||||
|
||||
---
|
||||
|
||||
### Copy-on-Write Mode
|
||||
|
||||
**Use for**: Workflows that need isolation but benefit from shared initial download
|
||||
|
||||
**Cache paths**:
|
||||
- Shared download: `/cache/{target_id}/shared/target`
|
||||
- Per-run copy: `/cache/{target_id}/{run_id}/workspace/`
|
||||
|
||||
Target is downloaded once to a shared location, then copied for each run.
|
||||
|
||||
**Advantages:**
|
||||
- ✅ Download once (shared bandwidth)
|
||||
- ✅ Isolated per-run workspace (safe for modifications)
|
||||
- ✅ Balances performance and safety
|
||||
|
||||
**Disadvantages:**
|
||||
- ⚠️ Copy overhead (disk I/O per run)
|
||||
- ⚠️ Higher storage usage than shared mode
|
||||
|
||||
**metadata.yaml:**
|
||||
```yaml
|
||||
name: my_workflow
|
||||
workspace_isolation: "copy-on-write"
|
||||
```
|
||||
|
||||
**Cleanup behavior:**
|
||||
Run-specific copies removed, shared download persists until LRU eviction.
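As a minimal sketch, the copy-on-write materialization step could look like this. The helper name `materialize_copy_on_write` and the injected `download` callable are hypothetical illustrations, not the shipped `get_target` implementation:

```python
import shutil
from pathlib import Path

# Hypothetical helper: download once to the shared path, copy per run.
def materialize_copy_on_write(target_id: str, run_id: str, download) -> Path:
    shared = Path(f"/cache/{target_id}/shared/workspace")
    run_ws = Path(f"/cache/{target_id}/{run_id}/workspace")
    if not shared.exists():
        download(target_id, shared)  # only the first run pays the download cost
    if not run_ws.exists():
        shutil.copytree(shared, run_ws)  # each run gets a private copy
    return run_ws
```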
---

## How It Works

### Activity Signature

The `get_target` activity accepts isolation parameters:

```python
from datetime import timedelta

from temporalio import workflow

# In your workflow
target_path = await workflow.execute_activity(
    "get_target",
    args=[target_id, run_id, "isolated"],  # target_id, run_id, workspace_isolation
    start_to_close_timeout=timedelta(minutes=5)
)
```

### Path Resolution

Based on the isolation mode:

```python
# Isolated mode
if workspace_isolation == "isolated":
    cache_path = f"/cache/{target_id}/{run_id}/"

# Shared mode
elif workspace_isolation == "shared":
    cache_path = f"/cache/{target_id}/"

# Copy-on-write mode
else:  # copy-on-write
    shared_path = f"/cache/{target_id}/shared/"
    cache_path = f"/cache/{target_id}/{run_id}/"
    # Download to shared_path, copy to cache_path
```

### Cleanup

The `cleanup_cache` activity respects the isolation mode:

```python
await workflow.execute_activity(
    "cleanup_cache",
    args=[target_path, "isolated"],  # target_path, workspace_isolation
    start_to_close_timeout=timedelta(minutes=1)
)
```

**Cleanup behavior by mode:**
- `isolated`: Removes `/cache/{target_id}/{run_id}/` entirely
- `shared`: Skips cleanup (shared across runs)
- `copy-on-write`: Removes the run directory, keeps the shared cache
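A cleanup activity honoring these rules might look like the following sketch (the shipped implementation may differ; the activity name here is illustrative):

```python
import shutil
from pathlib import Path

from temporalio import activity

@activity.defn
async def cleanup_cache_sketch(target_path: str, workspace_isolation: str) -> None:
    if workspace_isolation == "shared":
        return  # shared workspaces persist until LRU eviction
    # isolated / copy-on-write: target_path is /cache/{target_id}/{run_id}/workspace,
    # so removing its parent drops the whole run directory
    shutil.rmtree(Path(target_path).parent, ignore_errors=True)
```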
---

## Cache Management

### Cache Directory Structure

```
/cache/
├── {target_id_1}/
│   ├── {run_id_1}/
│   │   ├── target          # Downloaded tarball
│   │   └── workspace/      # Extracted files
│   ├── {run_id_2}/
│   │   ├── target
│   │   └── workspace/
│   └── workspace/          # Shared mode (no run_id subdirectory)
│       └── ...
├── {target_id_2}/
│   └── shared/
│       ├── target          # Copy-on-write shared download
│       └── workspace/
```

### LRU Eviction

When the cache exceeds the configured limit (default: 10 GB), the least-recently-used files are evicted automatically.

**Configuration:**
```yaml
# In the worker environment
CACHE_DIR: /cache
CACHE_MAX_SIZE: 10GB
CACHE_TTL: 7d
```

**Eviction policy:**
- Tracks the last access time for each cached target
- When the cache is full, removes the oldest-accessed files first
- Cleanup runs periodically (every 30 minutes)
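A minimal sketch of such an LRU sweep, assuming last access is approximated by directory mtime (the built-in policy may track access times differently):

```python
import shutil
from pathlib import Path

def evict_lru(cache_dir: str = "/cache", max_bytes: int = 10 * 1024**3) -> None:
    def tree_size(root: Path) -> int:
        return sum(f.stat().st_size for f in root.rglob("*") if f.is_file())

    # Oldest-touched target directories first
    targets = sorted(
        (p for p in Path(cache_dir).iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
    )
    total = sum(tree_size(t) for t in targets)
    for target in targets:
        if total <= max_bytes:
            break
        total -= tree_size(target)
        shutil.rmtree(target, ignore_errors=True)
```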
---

## Choosing the Right Mode

### Decision Matrix

| Workflow Type | Modifies Files? | Concurrent Runs? | Recommended Mode |
|---------------|----------------|------------------|------------------|
| Fuzzing (AFL, libFuzzer, Atheris) | ✅ Yes | ✅ Yes | **isolated** |
| Static Analysis | ❌ No | ✅ Yes | **shared** |
| Secret Scanning | ❌ No | ✅ Yes | **shared** |
| File Modification | ✅ Yes | ❌ No | **isolated** |
| Large Downloads | ❌ No | ✅ Yes | **copy-on-write** |

### Guidelines

**Use `isolated` when:**
- The workflow modifies files (corpus, crashes, logs)
- Fuzzing or dynamic analysis
- Concurrent runs must not interfere

**Use `shared` when:**
- The workflow only reads files
- Static analysis or scanning
- You want to minimize bandwidth/storage

**Use `copy-on-write` when:**
- The workflow modifies files but the target is large (>100MB)
- You want isolation with minimal download overhead
- You need a balance between shared and isolated

---

## Configuration

### In Workflow Metadata

Document the isolation mode in `metadata.yaml`:

```yaml
name: atheris_fuzzing
version: "1.0.0"
vertical: python

# Workspace isolation mode
# - "isolated" (default): Each run gets own workspace
# - "shared": All runs share workspace (read-only workflows)
# - "copy-on-write": Download once, copy per run
workspace_isolation: "isolated"
```

### In Workflow Code

Pass the isolation mode to the storage activities:

```python
from datetime import timedelta
from typing import Any, Dict

from temporalio import workflow

@workflow.defn
class MyWorkflow:
    @workflow.run
    async def run(self, target_id: str) -> Dict[str, Any]:
        # Get the run ID for isolation
        run_id = workflow.info().run_id

        # Download the target with isolation
        target_path = await workflow.execute_activity(
            "get_target",
            args=[target_id, run_id, "isolated"],
            start_to_close_timeout=timedelta(minutes=5)
        )

        # ... workflow logic ...

        # Cleanup with the same isolation mode
        await workflow.execute_activity(
            "cleanup_cache",
            args=[target_path, "isolated"],
            start_to_close_timeout=timedelta(minutes=1)
        )
```
---

## Troubleshooting

### Issue: Workflows interfere with each other

**Symptom:** Fuzzing crashes from one run appear in another

**Diagnosis:**
```bash
# Check workspace paths in logs
docker logs fuzzforge-worker-python | grep "User code downloaded"

# You should see run-specific paths:
# ✅ /cache/abc-123/run-xyz-456/workspace (isolated)
# ❌ /cache/abc-123/workspace (shared - a problem for fuzzing)
```

**Solution:** Change `workspace_isolation` to `"isolated"` in metadata.yaml

### Issue: High bandwidth usage

**Symptom:** The target is downloaded repeatedly for the same target_id

**Diagnosis:**
```bash
# Check MinIO downloads in logs
docker logs fuzzforge-worker-python | grep "downloading from MinIO"

# Many downloads for the same target_id on a read-only workflow
# usually means "isolated" mode is being used where "shared" would do
```

**Solution:** Change to `"shared"` mode for read-only workflows

### Issue: Cache fills up quickly

**Symptom:** Disk space is consumed by the /cache directory

**Diagnosis:**
```bash
# Check cache size
docker exec fuzzforge-worker-python du -sh /cache

# Check LRU settings
docker exec fuzzforge-worker-python env | grep CACHE
```

**Solution:**
- Increase the `CACHE_MAX_SIZE` environment variable
- Use `shared` mode for read-only workflows
- Decrease `CACHE_TTL` for faster eviction

---

## Summary

FuzzForge's workspace isolation system provides:

1. **Safe concurrent execution** for fuzzing and analysis workflows
2. **Three isolation modes** to balance safety vs efficiency
3. **Automatic cache management** with LRU eviction
4. **Per-workflow configuration** via metadata.yaml

**Key Takeaways:**
- Use `isolated` (default) for workflows that modify files
- Use `shared` for read-only analysis workflows
- Use `copy-on-write` to balance isolation and bandwidth
- Configure via the `workspace_isolation` field in metadata.yaml
- Workers automatically handle download, extraction, and cleanup

---

**Next Steps:**
- Review your workflows and set appropriate isolation modes
- Monitor cache usage with `docker exec fuzzforge-worker-python du -sh /cache`
- Adjust `CACHE_MAX_SIZE` if needed for your workload
docs/docs/how-to/cicd-integration.md (new file, 550 lines)
# CI/CD Integration Guide

This guide shows you how to integrate FuzzForge into your CI/CD pipeline for automated security testing on every commit, pull request, or scheduled run.

---

## Overview

FuzzForge can run entirely inside CI containers (GitHub Actions, GitLab CI, etc.) with no external infrastructure required. The complete FuzzForge stack—Temporal, PostgreSQL, MinIO, Backend, and workers—starts automatically when needed and cleans up after execution.

### Key Benefits

- ✅ **Zero Infrastructure**: No servers to maintain
- ✅ **Ephemeral**: Fresh environment per run
- ✅ **Resource Efficient**: On-demand workers (v0.7.0) save ~6-7GB RAM
- ✅ **Fast Feedback**: Fail builds on critical/high findings
- ✅ **Standards Compliant**: SARIF export for GitHub Security / GitLab SAST

---

## Prerequisites

### Required
- **CI Runner**: Ubuntu with Docker support
- **RAM**: At least 4GB available (7GB on GitHub Actions)
- **Startup Time**: ~60-90 seconds

### Optional
- **jq**: For merging the Docker daemon config (auto-installed in the examples)
- **Python 3.11+**: For the FuzzForge CLI

---

## Quick Start

### 1. Add Startup Scripts

FuzzForge provides helper scripts to configure Docker and start services:

```bash
# Start FuzzForge (configure Docker, start services, wait for health)
bash scripts/ci-start.sh

# Stop and clean up after execution
bash scripts/ci-stop.sh
```

### 2. Install the CLI

```bash
pip install ./cli
```

### 3. Initialize the Project

```bash
ff init --api-url http://localhost:8000 --name "CI Security Scan"
```

### 4. Run a Workflow

```bash
# Run and fail on error-level findings
ff workflow run security_assessment . \
  --wait \
  --fail-on error \
  --export-sarif results.sarif
```

---

## Deployment Models

FuzzForge supports two CI/CD deployment models:

### Option A: Ephemeral (Recommended)

**Everything runs inside the CI container for each job.**

```
┌────────────────────────────────────┐
│ GitHub Actions Runner              │
│                                    │
│  ┌──────────────────────────────┐  │
│  │ FuzzForge Stack              │  │
│  │ • Temporal                   │  │
│  │ • PostgreSQL                 │  │
│  │ • MinIO                      │  │
│  │ • Backend                    │  │
│  │ • Workers (on-demand)        │  │
│  └──────────────────────────────┘  │
│                                    │
│  ff workflow run ...               │
└────────────────────────────────────┘
```

**Pros:**
- No infrastructure to maintain
- Complete isolation per run
- Works on the GitHub/GitLab free tier

**Cons:**
- 60-90s startup time per run
- Limited to runner resources

**Best For:** Open source projects, infrequent scans, PR checks

### Option B: Persistent Backend

**The backend runs on a separate server; the CLI connects remotely.**

```
┌──────────────┐         ┌──────────────────┐
│  CI Runner   │────────▶│ FuzzForge Server │
│  (ff CLI)    │  HTTPS  │  (self-hosted)   │
└──────────────┘         └──────────────────┘
```

**Pros:**
- No startup time
- More resources
- Faster execution

**Cons:**
- Requires infrastructure
- Needs API tokens

**Best For:** Large teams, frequent scans, long fuzzing campaigns

---
## GitHub Actions Integration

### Complete Example

See `.github/workflows/examples/security-scan.yml` for a full working example.

**Basic workflow:**

```yaml
name: Security Scan

on: [pull_request, push]

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Start FuzzForge
        run: bash scripts/ci-start.sh

      - name: Install CLI
        run: pip install ./cli

      - name: Security Scan
        run: |
          ff init --api-url http://localhost:8000
          ff workflow run security_assessment . \
            --wait \
            --fail-on error \
            --export-sarif results.sarif

      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif

      - name: Cleanup
        if: always()
        run: bash scripts/ci-stop.sh
```

### GitHub Security Tab Integration

Upload SARIF results to see findings directly in GitHub:

```yaml
- name: Upload SARIF to GitHub Security
  if: always()
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
```

Findings appear in:
- the **Security** tab → **Code scanning alerts**
- pull request annotations
- commit status checks

---

## GitLab CI Integration

### Complete Example

See `.gitlab-ci.example.yml` for a full working example.

**Basic pipeline:**

```yaml
stages:
  - security

variables:
  FUZZFORGE_API_URL: "http://localhost:8000"

security:scan:
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    - apk add bash python3 py3-pip
    - bash scripts/ci-start.sh
    - pip3 install ./cli --break-system-packages
    - ff init --api-url $FUZZFORGE_API_URL
  script:
    - ff workflow run security_assessment . --wait --fail-on error --export-sarif results.sarif
  artifacts:
    reports:
      sast: results.sarif
  after_script:
    - bash scripts/ci-stop.sh
```

### GitLab SAST Dashboard Integration

The `reports: sast:` section automatically integrates with GitLab's Security Dashboard.

---

## CLI Flags for CI/CD

### `--fail-on`

Fail the build if findings match the specified SARIF severity levels.

**Syntax:**
```bash
--fail-on error,warning,note,info,all,none
```

**SARIF Levels:**
- `error` - Critical security issues (fail the build)
- `warning` - Potential security issues (may fail the build)
- `note` - Informational findings (typically don't fail)
- `info` - Additional context (rarely blocks)
- `all` - Any finding (strictest)
- `none` - Never fail (report only)

**Examples:**
```bash
# Fail on errors only (recommended for CI)
--fail-on error

# Fail on errors or warnings
--fail-on error,warning

# Fail on any finding (strictest)
--fail-on all

# Never fail, just report (useful for monitoring)
--fail-on none
```

**Common Patterns:**
- **PR checks**: `--fail-on error` (block critical issues)
- **Release gates**: `--fail-on error,warning` (stricter)
- **Nightly scans**: `--fail-on none` (monitoring only)
- **Security audit**: `--fail-on all` (maximum strictness)

**Exit Codes:**
- `0` - No blocking findings
- `1` - Blocking findings found, or an error occurred
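To make the gating rule concrete, here is a sketch of it in Python. This is not the CLI's actual implementation; it assumes standard SARIF 2.1.0 results, where a missing `level` defaults to `warning` per the spec:

```python
import json

def exit_code(sarif_path: str, fail_on: set[str]) -> int:
    if "none" in fail_on:
        return 0  # report-only mode never fails
    with open(sarif_path) as fh:
        sarif = json.load(fh)
    levels = [
        result.get("level", "warning")  # SARIF default level
        for run in sarif.get("runs", [])
        for result in run.get("results", [])
    ]
    if "all" in fail_on:
        return 1 if levels else 0
    return 1 if any(level in fail_on for level in levels) else 0
```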
### `--export-sarif`

Export SARIF results to a file after the workflow completes.

**Syntax:**
```bash
--export-sarif <path>
```

**Example:**
```bash
ff workflow run security_assessment . \
  --wait \
  --export-sarif results.sarif
```

### `--wait`

Wait for the workflow execution to complete (required for CI/CD).

**Example:**
```bash
ff workflow run security_assessment . --wait
```

Without `--wait`, the command returns immediately and the workflow runs in the background.

---
## Common Workflows

### PR Security Gate

Block PRs with critical/high findings:

```yaml
on: pull_request

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: bash scripts/ci-start.sh
      - run: pip install ./cli
      - run: |
          ff init --api-url http://localhost:8000
          ff workflow run security_assessment . --wait --fail-on error
      - if: always()
        run: bash scripts/ci-stop.sh
```

### Secret Detection (Zero Tolerance)

Fail on ANY exposed secrets:

```bash
ff workflow run secret_detection . --wait --fail-on all
```

### Nightly Fuzzing (Report Only)

Run long fuzzing campaigns without failing the build:

```yaml
on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM daily

jobs:
  fuzzing:
    runs-on: ubuntu-latest
    timeout-minutes: 120
    steps:
      - uses: actions/checkout@v4
      - run: bash scripts/ci-start.sh
      - run: pip install ./cli
      - run: |
          ff init --api-url http://localhost:8000
          ff workflow run atheris_fuzzing . \
            max_iterations=100000000 \
            timeout_seconds=7200 \
            --wait \
            --export-sarif fuzzing-results.sarif
        continue-on-error: true
      - if: always()
        run: bash scripts/ci-stop.sh
```

### Release Gate

Block releases with ANY security findings:

```yaml
on:
  push:
    tags:
      - 'v*'

jobs:
  release-security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: bash scripts/ci-start.sh
      - run: pip install ./cli
      - run: |
          ff init --api-url http://localhost:8000
          ff workflow run security_assessment . --wait --fail-on all
```

---
## Performance Optimization

### Startup Time

**Current:** ~60-90 seconds

**Breakdown:**
- Docker daemon restart: 10-15s
- docker-compose up: 30-40s
- Health check wait: 20-30s

**Tips to reduce it:**
1. Use `docker-compose.ci.yml` (optional, see below)
2. Cache Docker layers (GitHub Actions)
3. Use self-hosted runners (persistent Docker)

### Optional: CI-Optimized Compose File

Create `docker-compose.ci.yml`:

```yaml
version: '3.8'

services:
  postgresql:
    # Use in-memory storage (faster, ephemeral)
    tmpfs:
      - /var/lib/postgresql/data
    command: postgres -c fsync=off -c full_page_writes=off

  minio:
    # Use in-memory storage
    tmpfs:
      - /data

  temporal:
    healthcheck:
      # More frequent health checks
      interval: 5s
      retries: 10
```

**Usage:**
```bash
docker-compose -f docker-compose.yml -f docker-compose.ci.yml up -d
```

---
## Troubleshooting

### "Permission denied" connecting to the Docker socket

**Solution:** Add the user to the docker group or use `sudo`.

```bash
# GitHub Actions runners already have Docker permissions
# GitLab CI: use the docker:dind service
```

### "Connection refused to localhost:8000"

**Problem:** Services are not healthy yet.

**Solution:** Increase the health check timeout in `ci-start.sh`:

```bash
timeout 180 bash -c 'until curl -sf http://localhost:8000/health; do sleep 3; done'
```

### "Out of disk space"

**Problem:** Docker volumes are filling up.

**Solution:** Clean up in `after_script`:

```yaml
after_script:
  - bash scripts/ci-stop.sh
  - docker system prune -af --volumes
```

### Worker not starting

**Problem:** The worker container exists but is not running.

**Solution:** Workers are pre-built but start on-demand (v0.7.0). If a workflow fails immediately, check:

```bash
docker logs fuzzforge-worker-<vertical>
```

---
## Best Practices

1. **Always use `--wait`** in CI/CD pipelines
2. **Set appropriate `--fail-on` levels** for your use case:
   - PR checks: `error` (block critical issues)
   - Release gates: `error,warning` (stricter)
   - Nightly scans: `none` (report only)
3. **Export SARIF** to integrate with security dashboards
4. **Set timeouts** on CI jobs to prevent hanging
5. **Use artifacts** to preserve findings for review
6. **Always clean up** with `if: always()` or `after_script`

---

## Advanced: Persistent Backend Setup

For high-frequency usage, deploy FuzzForge on a dedicated server:

### 1. Deploy the FuzzForge Server

```bash
# On your CI server
git clone https://github.com/FuzzingLabs/fuzzforge_ai.git
cd fuzzforge_ai
docker-compose up -d
```

### 2. Generate an API Token (Future Feature)

```bash
# This will be available in a future release
docker exec fuzzforge-backend python -c "
from src.auth import generate_token
print(generate_token(name='github-actions'))
"
```

### 3. Configure CI to Use the Remote Backend

```yaml
env:
  FUZZFORGE_API_URL: https://fuzzforge.company.com
  FUZZFORGE_API_TOKEN: ${{ secrets.FUZZFORGE_TOKEN }}

steps:
  - run: pip install fuzzforge-cli
  - run: ff workflow run security_assessment . --wait --fail-on error
```

**Note:** Authentication is not yet implemented (v0.7.0). Use network isolation or a VPN for now.

---

## Examples

- **GitHub Actions**: `.github/workflows/examples/security-scan.yml`
- **GitLab CI**: `.gitlab-ci.example.yml`
- **Startup Script**: `scripts/ci-start.sh`
- **Cleanup Script**: `scripts/ci-stop.sh`

---

## Support

- **Documentation**: [https://docs.fuzzforge.io](https://docs.fuzzforge.io)
- **Issues**: [GitHub Issues](https://github.com/FuzzingLabs/fuzzforge_ai/issues)
- **Discussions**: [GitHub Discussions](https://github.com/FuzzingLabs/fuzzforge_ai/discussions)
@@ -9,18 +9,18 @@ This guide will walk you through the process of creating a custom security analy

Before you start, make sure you have:

- A working FuzzForge development environment (see [Contributing](/reference/contributing.md))
- Familiarity with Python (async/await), Docker, and Temporal
- At least one custom or built-in module to use in your workflow

---

## Step 1: Understand Workflow Architecture

A FuzzForge workflow is a Temporal workflow that:

- Runs inside a long-lived vertical worker container (pre-built with toolchains)
- Orchestrates one or more analysis modules (scanner, analyzer, reporter, etc.)
- Downloads targets from MinIO (S3-compatible storage) automatically
- Produces standardized SARIF output
- Supports configurable parameters and resource limits

@@ -28,9 +28,9 @@ A FuzzForge workflow is a Prefect 3 flow that:

```
backend/toolbox/workflows/{workflow_name}/
├── workflow.py        # Main workflow definition (Temporal workflow)
├── activities.py      # Workflow activities (optional)
├── metadata.yaml      # Workflow metadata and configuration (must include vertical field)
└── requirements.txt   # Additional Python dependencies (optional)
```

@@ -48,6 +48,7 @@ version: "1.0.0"
```yaml
description: "Analyzes project dependencies for security vulnerabilities"
author: "FuzzingLabs Security Team"
category: "comprehensive"
vertical: "web"  # REQUIRED: Which vertical worker to use (rust, android, web, etc.)
tags:
  - "dependency-scanning"
  - "vulnerability-analysis"
```

@@ -63,10 +64,6 @@ requirements:
```yaml
parameters:
  type: object
  properties:
    scan_dev_dependencies:
      type: boolean
      description: "Include development dependencies"
```

@@ -85,36 +82,63 @@ output_schema:
```yaml
      description: "Scan execution summary"
```

**Important:** The `vertical` field determines which worker runs your workflow. Ensure the worker has the required tools installed.

### Workspace Isolation

Add the `workspace_isolation` field to control how workflow runs share or isolate workspaces:

```yaml
# Workspace isolation mode (system-level configuration)
# - "isolated" (default): Each workflow run gets its own isolated workspace
# - "shared": All runs share the same workspace (for read-only workflows)
# - "copy-on-write": Download once, copy for each run
workspace_isolation: "isolated"
```

**Choosing the right mode:**

- **`isolated`** (default) - For fuzzing workflows that modify files (corpus, crashes)
  - Example: `atheris_fuzzing`, `cargo_fuzzing`
  - Safe for concurrent execution

- **`shared`** - For read-only analysis workflows
  - Example: `security_assessment`, `secret_detection`
  - Efficient (downloads once, reuses the cache)

- **`copy-on-write`** - For large targets that need isolation
  - Downloads once, copies per run
  - Balances performance and isolation

See the [Workspace Isolation](/concept/workspace-isolation) guide for details.

---

## Step 3: Add Live Statistics to Your Workflow 🚦

Want real-time progress and stats for your workflow? FuzzForge supports live statistics reporting using Temporal workflow logging. This lets users (and the platform) monitor workflow progress, see live updates, and stream stats via API or WebSocket.
### 1. Import Required Dependencies

```python
import logging

from temporalio import workflow, activity

logger = logging.getLogger(__name__)
```

### 2. Create a Statistics Callback in Your Activity

Add a callback that logs structured stats updates in your activity:

```python
@activity.defn
async def my_workflow_activity(target_path: str, config: Dict[str, Any]) -> Dict[str, Any]:
    # Get activity info for run tracking
    info = activity.info()
    run_id = info.workflow_id

    logger.info(f"Running activity for workflow: {run_id}")

    # Define a callback function for live statistics
    async def stats_callback(stats_data: Dict[str, Any]):
        # ... (unchanged lines elided in the diff)
        logger.info("LIVE_STATS", extra={
            "stats_type": "live_stats",      # Type of statistics
            "workflow_type": "my_workflow",  # Your workflow name
            "run_id": run_id,

            # Add your custom statistics fields here:
            "progress": stats_data.get("progress", 0),
            # ... (additional fields elided in the diff)
        })

    # Pass the callback to your module/processor
    processor = MyWorkflowModule()
    result = await processor.execute(config, target_path, stats_callback=stats_callback)
    return result.dict()
```
@@ -224,15 +248,16 @@ Live statistics automatically appear in:

#### Example: Adding Stats to a Security Scanner

```python
@activity.defn
async def security_scan_activity(target_path: str, config: Dict[str, Any]):
    info = activity.info()
    run_id = info.workflow_id

    async def stats_callback(stats_data):
        logger.info("LIVE_STATS", extra={
            "stats_type": "scan_progress",
            "workflow_type": "security_scan",
            "run_id": run_id,
            "files_scanned": stats_data.get("files_scanned", 0),
            "vulnerabilities_found": stats_data.get("vulnerabilities_found", 0),
            "scan_percentage": stats_data.get("scan_percentage", 0.0),
            # ... (additional fields elided in the diff)
        })

    scanner = SecurityScannerModule()
    return await scanner.execute(config, target_path, stats_callback=stats_callback)
```

With these steps, your workflow will provide rich, real-time feedback to users and the FuzzForge platform—making automation more transparent and interactive!
@@ -250,95 +275,182 @@ With these steps, your workflow will provide rich, real-time feedback to users a

## Step 4: Implement the Workflow Logic

Create a `workflow.py` file. This is where you define your Temporal workflow and activities.

Example (simplified):

```python
from typing import Dict, Any
from datetime import timedelta

from temporalio import workflow, activity
from temporalio.common import RetryPolicy

from src.toolbox.modules.dependency_scanner import DependencyScanner
from src.toolbox.modules.vulnerability_analyzer import VulnerabilityAnalyzer
from src.toolbox.modules.reporter import SARIFReporter

@activity.defn
async def scan_dependencies(target_path: str, config: Dict[str, Any]) -> Dict[str, Any]:
    scanner = DependencyScanner()
    return (await scanner.execute(config, target_path)).dict()

@activity.defn
async def analyze_vulnerabilities(dependencies: Dict[str, Any], target_path: str, config: Dict[str, Any]) -> Dict[str, Any]:
    analyzer = VulnerabilityAnalyzer()
    analyzer_config = {**config, 'dependencies': dependencies.get('findings', [])}
    return (await analyzer.execute(analyzer_config, target_path)).dict()

@activity.defn
async def generate_report(dep_results: Dict[str, Any], vuln_results: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, Any]:
    reporter = SARIFReporter()
    all_findings = dep_results.get("findings", []) + vuln_results.get("findings", [])
    reporter_config = {**config, "findings": all_findings}
    return (await reporter.execute(reporter_config, None)).dict().get("sarif", {})

@workflow.defn
class DependencyAnalysisWorkflow:
    @workflow.run
    async def run(
        self,
        target_id: str,  # Target file ID from MinIO (downloaded by the worker automatically)
        scan_dev_dependencies: bool = True,
        vulnerability_threshold: str = "medium"
    ) -> Dict[str, Any]:
        workflow.logger.info(f"Starting dependency analysis for target: {target_id}")

        # Get the run ID for workspace isolation
        run_id = workflow.info().run_id

        # The worker downloads the target from MinIO with isolation
        target_path = await workflow.execute_activity(
            "get_target",
            args=[target_id, run_id, "shared"],  # target_id, run_id, workspace_isolation
            start_to_close_timeout=timedelta(minutes=5)
        )

        scanner_config = {"scan_dev_dependencies": scan_dev_dependencies}
        analyzer_config = {"vulnerability_threshold": vulnerability_threshold}

        # Execute activities with retries and timeouts
        dep_results = await workflow.execute_activity(
            scan_dependencies,
            args=[target_path, scanner_config],
            start_to_close_timeout=timedelta(minutes=10),
            retry_policy=RetryPolicy(maximum_attempts=3)
        )

        vuln_results = await workflow.execute_activity(
            analyze_vulnerabilities,
            args=[dep_results, target_path, analyzer_config],
            start_to_close_timeout=timedelta(minutes=10),
            retry_policy=RetryPolicy(maximum_attempts=3)
        )

        sarif_report = await workflow.execute_activity(
            generate_report,
            args=[dep_results, vuln_results, {}],
            start_to_close_timeout=timedelta(minutes=5),
            retry_policy=RetryPolicy(maximum_attempts=3)
        )

        # Clean up the cache (respects the isolation mode)
        await workflow.execute_activity(
            "cleanup_cache",
            args=[target_path, "shared"],  # target_path, workspace_isolation
            start_to_close_timeout=timedelta(minutes=1)
        )

        workflow.logger.info("Dependency analysis completed")
        return sarif_report
```
**Key differences from Prefect:**
- Use a `@workflow.defn` class instead of a `@flow` function
- Use `@activity.defn` instead of `@task`
- Call the `get_target` activity to download from MinIO with an isolation mode
- Use `workflow.execute_activity()` with explicit timeouts and retry policies
- Use `workflow.logger` for logging (appears in the Temporal UI)
- Call the `cleanup_cache` activity at the end to clean up the workspace

---

## Step 5: No Dockerfile Needed! 🎉

**Good news:** You don't need to create a Dockerfile for your workflow. Workflows run inside pre-built **vertical worker containers** that already have toolchains installed.

**How it works:**
1. Your workflow code lives in `backend/toolbox/workflows/{workflow_name}/`
2. This directory is **mounted as a volume** in the worker container at `/app/toolbox/workflows/`
3. The worker discovers and registers your workflow automatically on startup
4. When submitted, the workflow runs inside the long-lived worker container
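The discovery step can be pictured with a short sketch. This is a simplified illustration, not the shipped worker code; it assumes each `metadata.yaml` carries `name` and `vertical` fields and that workflow modules import as `toolbox.workflows.<dir>.workflow`, matching the worker log lines shown elsewhere in these docs:

```python
import importlib
from pathlib import Path

import yaml

def discover_workflows(vertical: str, root: str = "/app/toolbox/workflows"):
    discovered = []
    for meta_file in Path(root).glob("*/metadata.yaml"):
        meta = yaml.safe_load(meta_file.read_text())
        if meta.get("vertical") != vertical:
            continue  # another vertical's worker owns this workflow
        module = importlib.import_module(
            f"toolbox.workflows.{meta_file.parent.name}.workflow"
        )
        discovered.append((meta["name"], module))
    return discovered
```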
**Benefits:**
- Zero container build time per workflow
- Instant code changes (just restart the worker)
- All toolchains pre-installed (AFL++, cargo-fuzz, apktool, etc.)
- Consistent environment across all workflows of the same vertical

---

## Step 6: Test Your Workflow

### Using the CLI

```bash
# Start FuzzForge with Temporal
docker-compose -f docker-compose.temporal.yaml up -d

# Wait for services to initialize
sleep 10

# Submit a workflow with file upload
cd test_projects/vulnerable_app/
fuzzforge workflow run dependency_analysis .

# The CLI automatically:
# - Creates a tarball of the current directory
# - Uploads it to MinIO via the backend
# - Submits the workflow with the target_id
# - The worker downloads it from MinIO and executes
```

### Using the Python SDK

```python
from pathlib import Path

from fuzzforge_sdk import FuzzForgeClient

client = FuzzForgeClient(base_url="http://localhost:8000")

# Submit with automatic upload
response = client.submit_workflow_with_upload(
    workflow_name="dependency_analysis",
    target_path=Path("/path/to/project"),
    parameters={
        "scan_dev_dependencies": True,
        "vulnerability_threshold": "medium"
    }
)

print(f"Workflow started: {response.run_id}")

# Wait for completion
final_status = client.wait_for_completion(response.run_id)

# Get findings
findings = client.get_run_findings(response.run_id)
print(findings.sarif)

client.close()
```

### Check the Temporal UI

Open http://localhost:8233 to see:
- Workflow execution timeline
- Activity results
- Logs and errors
- Retry history

---

## Best Practices
docs/docs/how-to/debugging.md (new file, 453 lines)
# Debugging Workflows and Modules

This guide shows you how to debug FuzzForge workflows and modules using Temporal's powerful debugging features.

---

## Quick Debugging Checklist

When something goes wrong:

1. **Check worker logs** - `docker-compose -f docker-compose.temporal.yaml logs worker-rust -f`
2. **Check the Temporal UI** - http://localhost:8233 (visual execution history)
3. **Check the MinIO console** - http://localhost:9001 (inspect uploaded files)
4. **Check backend logs** - `docker-compose -f docker-compose.temporal.yaml logs fuzzforge-backend -f`

---

## Debugging Workflow Discovery

### Problem: Workflow Not Found

**Symptom:** Worker logs show "No workflows found for vertical: rust"

**Debug Steps:**

1. **Check whether the worker can see the workflow:**
   ```bash
   docker exec fuzzforge-worker-rust ls /app/toolbox/workflows/
   ```

2. **Check that metadata.yaml exists:**
   ```bash
   docker exec fuzzforge-worker-rust cat /app/toolbox/workflows/my_workflow/metadata.yaml
   ```

3. **Verify that the vertical field matches:**
   ```bash
   docker exec fuzzforge-worker-rust grep "vertical:" /app/toolbox/workflows/my_workflow/metadata.yaml
   ```
   This should output: `vertical: rust`

4. **Check worker logs for discovery errors:**
   ```bash
   docker-compose -f docker-compose.temporal.yaml logs worker-rust | grep "my_workflow"
   ```

**Solution:**
- Ensure `metadata.yaml` has the correct `vertical` field
- Restart the worker to reload: `docker-compose -f docker-compose.temporal.yaml restart worker-rust`
- Check worker logs for discovery confirmation

---
## Debugging Workflow Execution

### Using the Temporal Web UI

The Temporal UI at http://localhost:8233 is your primary debugging tool.

**Navigate to a workflow:**
1. Open http://localhost:8233
2. Click "Workflows" in the left sidebar
3. Find your workflow by `run_id` or workflow name
4. Click it to see the detailed execution

**What you can see:**
- **Execution timeline** - When each activity started/completed
- **Input/output** - The exact parameters passed to the workflow
- **Activity results** - Return values from each activity
- **Error stack traces** - Full Python tracebacks
- **Retry history** - All retry attempts with reasons
- **Worker information** - Which worker executed each activity

**Example: Finding why an activity failed:**
1. Open the workflow in the Temporal UI
2. Scroll to the failed activity (marked in red)
3. Click on the activity
4. See the full error message and stack trace
5. Check the "Input" tab to see what parameters were passed

---

## Viewing Worker Logs

### Real-time Monitoring

```bash
# Follow logs from the rust worker
docker-compose -f docker-compose.temporal.yaml logs worker-rust -f

# Follow logs from all workers
docker-compose -f docker-compose.temporal.yaml logs worker-rust worker-android -f

# Show the last 100 lines
docker-compose -f docker-compose.temporal.yaml logs worker-rust --tail 100
```

### What Worker Logs Show

**On startup:**
```
INFO: Scanning for workflows in: /app/toolbox/workflows
INFO: Importing workflow module: toolbox.workflows.security_assessment.workflow
INFO: ✓ Discovered workflow: SecurityAssessmentWorkflow from security_assessment (vertical: rust)
INFO: 🚀 Worker started for vertical 'rust'
```

**During execution:**
```
INFO: Starting SecurityAssessmentWorkflow (workflow_id=security_assessment-abc123, target_id=548193a1...)
INFO: Downloading target from MinIO: 548193a1-f73f-4ec1-8068-19ec2660b8e4
INFO: Executing activity: scan_files
INFO: Completed activity: scan_files (duration: 3.2s)
```

**On errors:**
```
ERROR: Failed to import workflow module toolbox.workflows.broken.workflow:
  File "/app/toolbox/workflows/broken/workflow.py", line 42
    def run(
IndentationError: expected an indented block
```

### Filtering Logs

```bash
# Show only errors
docker-compose -f docker-compose.temporal.yaml logs worker-rust | grep ERROR

# Show workflow discovery
docker-compose -f docker-compose.temporal.yaml logs worker-rust | grep "Discovered workflow"

# Show a specific workflow execution
docker-compose -f docker-compose.temporal.yaml logs worker-rust | grep "security_assessment-abc123"

# Show activity execution
docker-compose -f docker-compose.temporal.yaml logs worker-rust | grep "activity"
```

---
## Debugging File Upload

### Check Whether the File Was Uploaded

**Using the MinIO Console:**
1. Open http://localhost:9001
2. Log in: `fuzzforge` / `fuzzforge123`
3. Click "Buckets" → "targets"
4. Look for your `target_id` (UUID format)
5. Click it to download and inspect locally

**Using the CLI:**
```bash
# Check MinIO status
curl http://localhost:9000

# Search backend logs for the upload
docker-compose -f docker-compose.temporal.yaml logs fuzzforge-backend | grep "upload"
```

### Check the Worker Cache

```bash
# List cached targets
docker exec fuzzforge-worker-rust ls -lh /cache/

# Check a specific target
docker exec fuzzforge-worker-rust ls -lh /cache/548193a1-f73f-4ec1-8068-19ec2660b8e4
```
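If you prefer scripting this check, a small boto3 snippet can list the uploaded targets. This sketch assumes the default development endpoint and credentials shown above (`fuzzforge` / `fuzzforge123` on port 9000):

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="fuzzforge",
    aws_secret_access_key="fuzzforge123",
)

# Print every object key (target_id) and its size in the targets bucket
for obj in s3.list_objects_v2(Bucket="targets").get("Contents", []):
    print(obj["Key"], obj["Size"])
```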
---
## Interactive Debugging

### Access a Running Worker

```bash
# Open a shell in the worker container
docker exec -it fuzzforge-worker-rust bash

# Now you can:
# - Check the filesystem
ls -la /app/toolbox/workflows/

# - Test imports
python3 -c "from toolbox.workflows.my_workflow.workflow import MyWorkflow; print(MyWorkflow)"

# - Check environment variables
env | grep TEMPORAL

# - Test activities
cd /app/toolbox/workflows/my_workflow
python3 -c "from activities import my_activity; print(my_activity)"

# - Check the cache
ls -lh /cache/
```

### Test a Module in Isolation

```bash
# Enter the worker container
docker exec -it fuzzforge-worker-rust bash

# Navigate to the module
cd /app/toolbox/modules/scanner

# Run the module directly
python3 -c "
from file_scanner import FileScannerModule
scanner = FileScannerModule()
print(scanner.get_metadata())
"
```

---
## Debugging Module Code

### Edit and Reload

Since the toolbox is mounted as a volume, you can edit code on your host and reload:

1. **Edit the module on the host:**
   ```bash
   # On your host machine
   vim backend/toolbox/modules/scanner/file_scanner.py
   ```

2. **Restart the worker to reload:**
   ```bash
   docker-compose -f docker-compose.temporal.yaml restart worker-rust
   ```

3. **Check the discovery logs:**
   ```bash
   docker-compose -f docker-compose.temporal.yaml logs worker-rust | tail -50
   ```

### Add Debug Logging

Add logging to your workflow or module:

```python
import logging

logger = logging.getLogger(__name__)

@workflow.defn
class MyWorkflow:
    @workflow.run
    async def run(self, target_id: str):
        workflow.logger.info(f"Starting with target_id: {target_id}")  # Shows in the Temporal UI

        logger.info("Processing step 1")              # Shows in worker logs
        logger.debug(f"Debug info: {some_variable}")  # Shows if LOG_LEVEL=DEBUG

        try:
            result = await some_activity()
            logger.info(f"Activity result: {result}")
        except Exception as e:
            logger.error(f"Activity failed: {e}", exc_info=True)  # Full stack trace
            raise
```

Enable debug logging by editing `docker-compose.temporal.yaml`:

```yaml
services:
  worker-rust:
    environment:
      LOG_LEVEL: DEBUG  # Change from INFO to DEBUG
```

Then restart the worker:

```bash
docker-compose -f docker-compose.temporal.yaml restart worker-rust
```

---
## Common Issues and Solutions

### Issue: Workflow stuck in the "Running" state

**Debug:**
1. Check the Temporal UI for the last completed activity
2. Check worker logs for errors
3. Check that the worker is still running: `docker-compose -f docker-compose.temporal.yaml ps worker-rust`

**Solution:**
- The worker may have crashed - restart it
- An activity may be hanging - check for infinite loops or stuck network calls
- Check worker resource limits: `docker stats fuzzforge-worker-rust`

### Issue: Import errors in a workflow

**Debug:**
1. Check worker logs for the full error trace
2. Check that the module file exists:
   ```bash
   docker exec fuzzforge-worker-rust ls /app/toolbox/modules/my_module/
   ```

**Solution:**
- Ensure the module is in the correct directory
- Check for syntax errors: `docker exec fuzzforge-worker-rust python3 -m py_compile /app/toolbox/modules/my_module/my_module.py`
- Verify that the imports are correct

### Issue: Target file not found in the worker

**Debug:**
1. Check whether the target exists in the MinIO console
2. Check worker logs for download errors
3. Verify that the target_id is correct

**Solution:**
- Re-upload the file via the CLI
- Check that MinIO is running: `docker-compose -f docker-compose.temporal.yaml ps minio`
- Check the MinIO credentials in the worker environment

---
## Performance Debugging

### Check Activity Duration

**In the Temporal UI:**
1. Open the workflow execution
2. Scroll through the activities
3. Each shows its duration (e.g., "3.2s")
4. Identify the slow activities

### Monitor Resource Usage

```bash
# Monitor worker resource usage
docker stats fuzzforge-worker-rust

# Check worker logs for memory warnings
docker-compose -f docker-compose.temporal.yaml logs worker-rust | grep -i "memory\|oom"
```

### Profile Workflow Execution

Add timing to your workflow. Workflow code must stay deterministic, so use `workflow.now()` instead of `time.time()`:

```python
from temporalio import workflow

@workflow.defn
class MyWorkflow:
    @workflow.run
    async def run(self, target_id: str):
        start = workflow.now()  # deterministic replacement for time.time()
        result1 = await activity1()
        workflow.logger.info(f"Activity1 took: {(workflow.now() - start).total_seconds():.2f}s")

        start = workflow.now()
        result2 = await activity2()
        workflow.logger.info(f"Activity2 took: {(workflow.now() - start).total_seconds():.2f}s")
```

---
## Advanced Debugging

### Enable Temporal Worker Debug Logs

Edit `docker-compose.temporal.yaml`:

```yaml
services:
  worker-rust:
    environment:
      TEMPORAL_LOG_LEVEL: DEBUG
      LOG_LEVEL: DEBUG
```

Then restart:

```bash
docker-compose -f docker-compose.temporal.yaml restart worker-rust
```

### Inspect Temporal Workflows via the CLI

The Temporal CLI (`tctl`) ships inside the temporal container:

```bash
# List workflows
docker exec fuzzforge-temporal tctl workflow list

# Describe a workflow
docker exec fuzzforge-temporal tctl workflow describe -w security_assessment-abc123

# Show a workflow's history
docker exec fuzzforge-temporal tctl workflow show -w security_assessment-abc123
```

### Check Network Connectivity

```bash
# From the worker to Temporal
docker exec fuzzforge-worker-rust ping temporal

# From the worker to MinIO
docker exec fuzzforge-worker-rust curl http://minio:9000

# From the host to the services
curl http://localhost:8233         # Temporal UI
curl http://localhost:9000         # MinIO
curl http://localhost:8000/health  # Backend
```

---
## Debugging Best Practices

1. **Always check the Temporal UI first** - It shows the most complete execution history
2. **Use structured logging** - Include workflow_id and target_id in log messages (see the sketch after this list)
3. **Log at decision points** - Before/after each major operation
4. **Keep worker logs** - They persist across workflow runs
5. **Test modules in isolation** - Use `docker exec` to test before integrating
6. **Use debug builds** - Enable DEBUG logging during development
7. **Monitor resources** - Use `docker stats` to catch resource issues
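One way to carry identifiers on every log record is the standard-library `extra` mechanism, as sketched below (the identifier values are examples, not real runs):

```python
import logging

logger = logging.getLogger(__name__)

# Attach workflow_id / target_id to the record so log processors can filter on them
logger.info(
    "scan step finished",
    extra={
        "workflow_id": "security_assessment-abc123",
        "target_id": "548193a1-f73f-4ec1-8068-19ec2660b8e4",
        "files_scanned": 1204,
    },
)
```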
---

## Getting Help

If you're still stuck:

1. **Collect diagnostic info:**
   ```bash
   # Save all logs
   docker-compose -f docker-compose.temporal.yaml logs > fuzzforge-logs.txt

   # Check service status
   docker-compose -f docker-compose.temporal.yaml ps > service-status.txt
   ```

2. **Check the Temporal UI** and take screenshots of:
   - The workflow execution timeline
   - Failed activity details
   - Error messages

3. **Report the issue** with:
   - The workflow name and run_id
   - Error messages from the logs
   - Screenshots from the Temporal UI
   - Steps to reproduce

---

**Happy debugging!** 🐛🔍

@@ -97,6 +97,43 @@ If you prefer, you can use a systemd override to add the registry flag. See the

---

## Worker Profiles (Resource Optimization - v0.7.0)

FuzzForge workers use Docker Compose profiles to prevent auto-startup:

```yaml
# docker-compose.yml
worker-ossfuzz:
  profiles:
    - workers   # For starting all workers
    - ossfuzz   # For starting just this worker
  restart: "no" # Don't auto-restart
```

### Behavior

- **`docker-compose up -d`**: Workers DON'T start (saves ~6-7 GB RAM)
- **CLI workflows**: Workers start automatically on demand
- **Manual start**: `docker start fuzzforge-worker-ossfuzz`

### Resource Savings

| Command | Workers Started | RAM Usage |
|---------|----------------|-----------|
| `docker-compose up -d` | None (core only) | ~1.2 GB |
| `ff workflow run ossfuzz_campaign .` | ossfuzz worker only | ~3-5 GB |
| `docker-compose --profile workers up -d` | All workers | ~8 GB |

### Starting All Workers (Legacy Behavior)

If you prefer the old behavior where all workers start together:

```bash
docker-compose --profile workers up -d
```

---

## Common Issues & How to Fix Them

### "x509: certificate signed by unknown authority"

@@ -10,15 +10,16 @@ Before diving into specific errors, let’s check the basics:

```bash
# Check all FuzzForge services
docker-compose -f docker-compose.temporal.yaml ps

# Verify Docker registry config (if using workflow registry)
docker info | grep -i "insecure registries"

# Test service health endpoints
curl http://localhost:8000/health
curl http://localhost:5001/v2/   # Registry (if using workflow registry)
curl http://localhost:8233       # Temporal Web UI
curl http://localhost:9000       # MinIO API
curl http://localhost:9001       # MinIO Console
```

If any of these commands fail, note the error message and continue below.

@@ -51,15 +52,17 @@ Docker is trying to use HTTPS for the local registry, but it’s set up for HTTP

The registry isn’t running or the port is blocked.

**How to fix:**
- Make sure the registry container is up (if using the registry for workflow images):
  ```bash
  docker-compose -f docker-compose.temporal.yaml ps registry
  ```
- Check logs for errors:
  ```bash
  docker-compose -f docker-compose.temporal.yaml logs registry
  ```
- If port 5001 is in use, change it in `docker-compose.temporal.yaml` and your Docker config.

**Note:** With the Temporal architecture, target files go through MinIO (port 9000), not the registry.
### "no such host" error
|
||||
|
||||
@@ -74,31 +77,42 @@ Docker can’t resolve `localhost`.
|
||||
|
||||
## Workflow Execution Issues
|
||||
|
||||
### "mounts denied" or volume errors
|
||||
### Upload fails or file access errors
|
||||
|
||||
**What’s happening?**
|
||||
Docker can’t access the path you provided.
|
||||
**What's happening?**
|
||||
File upload to MinIO failed or worker can't download target.
|
||||
|
||||
**How to fix:**
|
||||
- Always use absolute paths.
|
||||
- On Docker Desktop, add your project directory to File Sharing.
|
||||
- Confirm the path exists and is readable.
|
||||
|
||||
### Workflow status is "Crashed" or "Late"
|
||||
|
||||
**What’s happening?**
|
||||
- "Crashed": Usually a registry, path, or tool error.
|
||||
- "Late": Worker is overloaded or system is slow.
|
||||
|
||||
**How to fix:**
|
||||
- Check logs for details:
|
||||
- Check MinIO is running:
|
||||
```bash
|
||||
docker compose logs prefect-worker | tail -50
|
||||
docker-compose -f docker-compose.temporal.yaml ps minio
|
||||
```
|
||||
- Check MinIO logs:
|
||||
```bash
|
||||
docker-compose -f docker-compose.temporal.yaml logs minio
|
||||
```
|
||||
- Verify MinIO is accessible:
|
||||
```bash
|
||||
curl http://localhost:9000
|
||||
```
|
||||
- Check file size (max 10GB by default).
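
To confirm an upload actually landed in MinIO, you can list the `targets` bucket with `boto3`. A sketch assuming the dev-default console credentials shown in the quickstart (`fuzzforge` / `fuzzforge123`) double as S3 access keys:

```python
import boto3

# Dev defaults; override these for a hardened deployment.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="fuzzforge",
    aws_secret_access_key="fuzzforge123",
)

# Print every uploaded target tarball and its size in bytes.
for obj in s3.list_objects_v2(Bucket="targets").get("Contents", []):
    print(obj["Key"], obj["Size"])
```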

### Workflow status is "Failed" or "Running" (stuck)

**What's happening?**
- "Failed": Usually a target download, storage, or tool error.
- "Running" (stuck): Worker is overloaded, target download failed, or worker crashed.

**How to fix:**
- Check worker logs for details:
  ```bash
  docker-compose -f docker-compose.temporal.yaml logs worker-rust | tail -50
  ```
- Check the Temporal Web UI at http://localhost:8233 for the detailed execution history.
- Restart services:
  ```bash
  docker-compose -f docker-compose.temporal.yaml down
  docker-compose -f docker-compose.temporal.yaml up -d
  ```
- Reduce the number of concurrent workflows if your system is resource-constrained.

@@ -106,22 +120,23 @@ Docker can’t access the path you provided.

## Service Connectivity Issues

### Backend (port 8000) or Temporal UI (port 8233) not responding

**How to fix:**
- Check if the service is running:
  ```bash
  docker-compose -f docker-compose.temporal.yaml ps fuzzforge-backend
  docker-compose -f docker-compose.temporal.yaml ps temporal
  ```
- View logs for errors:
  ```bash
  docker-compose -f docker-compose.temporal.yaml logs fuzzforge-backend --tail 50
  docker-compose -f docker-compose.temporal.yaml logs temporal --tail 20
  ```
- Restart the affected service:
  ```bash
  docker-compose -f docker-compose.temporal.yaml restart fuzzforge-backend
  docker-compose -f docker-compose.temporal.yaml restart temporal
  ```

---

@@ -197,13 +212,13 @@ Docker can’t access the path you provided.

- Check Docker network configuration:
  ```bash
  docker network ls
  docker network inspect fuzzforge-temporal_default
  ```
- Recreate the network:
  ```bash
  docker-compose -f docker-compose.temporal.yaml down
  docker network prune -f
  docker-compose -f docker-compose.temporal.yaml up -d
  ```

---

@@ -229,10 +244,10 @@ Docker can’t access the path you provided.

### Enable debug logging

```bash
export TEMPORAL_LOGGING_LEVEL=DEBUG
docker-compose -f docker-compose.temporal.yaml down
docker-compose -f docker-compose.temporal.yaml up -d
docker-compose -f docker-compose.temporal.yaml logs fuzzforge-backend -f
```

### Collect diagnostic info

@@ -243,12 +258,12 @@ Save and run this script to gather info for support:

```bash
#!/bin/bash
echo "=== FuzzForge Diagnostics ==="
date
docker-compose -f docker-compose.temporal.yaml ps
docker info | grep -A 5 -i "insecure registries"
curl -s http://localhost:8000/health || echo "Backend unhealthy"
curl -s http://localhost:8233 >/dev/null && echo "Temporal UI healthy" || echo "Temporal UI unhealthy"
curl -s http://localhost:9000 >/dev/null && echo "MinIO healthy" || echo "MinIO unhealthy"
docker-compose -f docker-compose.temporal.yaml logs --tail 10
```

### Still stuck?

@@ -85,24 +85,23 @@ docker pull localhost:5001/hello-world 2>/dev/null || echo "Registry not accessi

Start all FuzzForge services:

```bash
docker-compose -f docker-compose.temporal.yaml up -d
```

This will start 6+ services:
- **temporal**: Workflow orchestration server (includes embedded PostgreSQL for dev)
- **minio**: S3-compatible storage for uploaded targets and results
- **minio-setup**: One-time setup for MinIO buckets (exits after setup)
- **fuzzforge-backend**: FastAPI backend and workflow management
- **worker-rust**: Long-lived worker for Rust/native security analysis
- **worker-android**: Long-lived worker for Android security analysis (if configured)
- **worker-web**: Long-lived worker for web security analysis (if configured)

Wait for all services to be healthy (this may take 2-3 minutes on first startup):

```bash
# Check service health
docker-compose -f docker-compose.temporal.yaml ps

# Verify FuzzForge is ready
curl http://localhost:8000/health
```
@@ -154,33 +153,41 @@ You should see 6 production workflows:

## Step 6: Run Your First Workflow

Let's run a security assessment workflow on one of the included vulnerable test projects.

### Using the CLI (Recommended):

```bash
# Navigate to a test project
cd /path/to/fuzzforge/test_projects/vulnerable_app

# Submit the workflow - the CLI automatically uploads the local directory
fuzzforge workflow run security_assessment .

# The CLI will:
# 1. Detect that '.' is a local directory
# 2. Create a compressed tarball of the directory
# 3. Upload it to the backend via HTTP
# 4. The backend stores it in MinIO
# 5. The worker downloads it when ready to analyze

# Monitor the workflow
fuzzforge workflow status <run-id>

# View results when complete
fuzzforge finding <run-id>
```

### Using the API:

For local files, use the upload endpoint:

```bash
# Create tarball and upload
tar -czf project.tar.gz /path/to/your/project
curl -X POST "http://localhost:8000/workflows/security_assessment/upload-and-submit" \
  -F "file=@project.tar.gz" \
  -F "volume_mode=ro"

# Check status
curl "http://localhost:8000/runs/{run-id}/status"

# Get findings
curl "http://localhost:8000/runs/{run-id}/findings"
```

**Note**: The CLI handles file upload automatically. For remote workflows where the target path exists on the backend server, you can still use path-based submission for backward compatibility.
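
For scripting, a rough Python equivalent of the curl flow above (the `run_id` response field and the status values are assumptions about the API's JSON shape):

```python
import tarfile
import time

import requests

BASE = "http://localhost:8000"

# Package the project directory, mirroring what the CLI does automatically.
with tarfile.open("project.tar.gz", "w:gz") as tar:
    tar.add("/path/to/your/project", arcname=".")

with open("project.tar.gz", "rb") as fh:
    resp = requests.post(
        f"{BASE}/workflows/security_assessment/upload-and-submit",
        files={"file": ("project.tar.gz", fh)},
        data={"volume_mode": "ro"},
        timeout=300,
    )
resp.raise_for_status()
run_id = resp.json()["run_id"]  # assumed field name

# Poll until the run leaves its in-flight states (status values assumed).
while True:
    status = requests.get(f"{BASE}/runs/{run_id}/status", timeout=30).json()
    if status.get("status") not in ("PENDING", "RUNNING"):
        break
    time.sleep(5)

print(requests.get(f"{BASE}/runs/{run_id}/findings", timeout=30).json())
```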

## Step 7: Understanding the Results

The workflow will complete in 30-60 seconds and return results in SARIF format. For the test project, you should see:

@@ -216,13 +225,19 @@ Example output:

```json
}
```
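
Since the findings are standard SARIF 2.1.0, a short reader covers any of the workflows (field names follow the SARIF spec, not a FuzzForge-specific schema):

```python
import json

with open("findings.sarif") as fh:
    sarif = json.load(fh)

# Each SARIF run carries one tool and a flat list of results.
for run in sarif.get("runs", []):
    tool = run["tool"]["driver"]["name"]
    for result in run.get("results", []):
        loc = result.get("locations", [{}])[0].get("physicalLocation", {})
        uri = loc.get("artifactLocation", {}).get("uri", "<unknown>")
        print(f"[{tool}] {result.get('level', 'note')}: {result['message']['text']} ({uri})")
```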

## Step 8: Access the Temporal Web UI

You can monitor workflow execution in real time using the Temporal Web UI:

1. Open http://localhost:8233 in your browser
2. Navigate to "Workflows" to see workflow executions
3. Click on a workflow to see its detailed execution history and activity results

You can also open the MinIO console to view uploaded targets:

1. Open http://localhost:9001 in your browser
2. Log in with: `fuzzforge` / `fuzzforge123`
3. Browse the `targets` bucket to see uploaded files

## Next Steps

@@ -242,9 +257,10 @@ Congratulations! You've successfully:

If you encounter problems:

1. **Workflow crashes with registry errors**: Check the Docker insecure-registry configuration
2. **Services won't start**: Ensure ports 8000, 8233, 9000, 9001 are available
3. **No findings returned**: Verify the target path contains analyzable code files
4. **CLI not found**: Ensure the Python/pip installation path is in your PATH
5. **Upload fails**: Check that MinIO is running and accessible at http://localhost:9000

See the [Troubleshooting Guide](../how-to/troubleshooting.md) for detailed solutions.