Files
fuzzforge_ai/backend/README.md
2025-09-30 15:18:53 +02:00

257 lines
7.0 KiB
Markdown

# FuzzForge Backend
A stateless API server for security testing workflow orchestration using Prefect. This system dynamically discovers workflows, executes them in isolated Docker containers with volume mounting, and returns findings in SARIF format.
## Architecture Overview
### Core Components
1. **Workflow Discovery System**: Automatically discovers workflows at startup
2. **Module System**: Reusable components (scanner, analyzer, reporter) with a common interface
3. **Prefect Integration**: Handles container orchestration, workflow execution, and monitoring
4. **Volume Mounting**: Secure file access with configurable permissions (ro/rw)
5. **SARIF Output**: Standardized security findings format
### Key Features
- **Stateless**: No persistent data, fully scalable
- **Generic**: No hardcoded workflows, automatic discovery
- **Isolated**: Each workflow runs in its own Docker container
- **Extensible**: Easy to add new workflows and modules
- **Secure**: Read-only volume mounts by default, path validation
- **Observable**: Comprehensive logging and status tracking
## Quick Start
### Prerequisites
- Docker and Docker Compose
### Installation
From the project root, start all services:
```bash
docker-compose up -d
```
This will start:
- Prefect server (API at http://localhost:4200/api)
- PostgreSQL database
- Redis cache
- Docker registry (port 5001)
- Prefect worker (for running workflows)
- FuzzForge backend API (port 8000)
- FuzzForge MCP server (port 8010)
**Note**: The Prefect UI at http://localhost:4200 is not currently accessible from the host due to the API being configured for inter-container communication. Use the REST API or MCP interface instead.
## API Endpoints
### Workflows
- `GET /workflows` - List all discovered workflows
- `GET /workflows/{name}/metadata` - Get workflow metadata and parameters
- `GET /workflows/{name}/parameters` - Get workflow parameter schema
- `GET /workflows/metadata/schema` - Get metadata.yaml schema
- `POST /workflows/{name}/submit` - Submit a workflow for execution
### Runs
- `GET /runs/{run_id}/status` - Get run status
- `GET /runs/{run_id}/findings` - Get SARIF findings from completed run
- `GET /runs/{workflow_name}/findings/{run_id}` - Alternative findings endpoint with workflow name
## Workflow Structure
Each workflow must have:
```
toolbox/workflows/{workflow_name}/
workflow.py # Prefect flow definition
metadata.yaml # Mandatory metadata (parameters, version, etc.)
Dockerfile # Optional custom container definition
requirements.txt # Optional Python dependencies
```
### Example metadata.yaml
```yaml
name: security_assessment
version: "1.0.0"
description: "Comprehensive security analysis workflow"
author: "FuzzForge Team"
category: "comprehensive"
tags:
- "security"
- "analysis"
- "comprehensive"
supported_volume_modes:
- "ro"
- "rw"
requirements:
tools:
- "file_scanner"
- "security_analyzer"
- "sarif_reporter"
resources:
memory: "512Mi"
cpu: "500m"
timeout: 1800
has_docker: true
parameters:
type: object
properties:
target_path:
type: string
default: "/workspace"
description: "Path to analyze"
volume_mode:
type: string
enum: ["ro", "rw"]
default: "ro"
description: "Volume mount mode"
scanner_config:
type: object
description: "Scanner configuration"
properties:
max_file_size:
type: integer
description: "Maximum file size to scan (bytes)"
output_schema:
type: object
properties:
sarif:
type: object
description: "SARIF-formatted security findings"
summary:
type: object
description: "Scan execution summary"
```
### Metadata Field Descriptions
- **name**: Workflow identifier (must match directory name)
- **version**: Semantic version (x.y.z format)
- **description**: Human-readable description of the workflow
- **author**: Workflow author/maintainer
- **category**: Workflow category (comprehensive, specialized, fuzzing, focused)
- **tags**: Array of descriptive tags for categorization
- **requirements.tools**: Required security tools that the workflow uses
- **requirements.resources**: Resource requirements enforced at runtime:
- `memory`: Memory limit (e.g., "512Mi", "1Gi")
- `cpu`: CPU limit (e.g., "500m" for 0.5 cores, "1" for 1 core)
- `timeout`: Maximum execution time in seconds
- **parameters**: JSON Schema object defining workflow parameters
- **output_schema**: Expected output format (typically SARIF)
### Resource Requirements
Resource requirements defined in workflow metadata are automatically enforced. Users can override defaults when submitting workflows:
```bash
curl -X POST "http://localhost:8000/workflows/security_assessment/submit" \
-H "Content-Type: application/json" \
-d '{
"target_path": "/tmp/project",
"volume_mode": "ro",
"resource_limits": {
"memory_limit": "1Gi",
"cpu_limit": "1"
}
}'
```
Resource precedence: User limits > Workflow requirements > System defaults
## Module Development
Modules implement the `BaseModule` interface:
```python
from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult
class MyModule(BaseModule):
def get_metadata(self) -> ModuleMetadata:
return ModuleMetadata(
name="my_module",
version="1.0.0",
description="Module description",
category="scanner",
...
)
async def execute(self, config: Dict, workspace: Path) -> ModuleResult:
# Module logic here
findings = [...]
return self.create_result(findings=findings)
def validate_config(self, config: Dict) -> bool:
# Validate configuration
return True
```
## Submitting a Workflow
```bash
curl -X POST "http://localhost:8000/workflows/security_assessment/submit" \
-H "Content-Type: application/json" \
-d '{
"target_path": "/home/user/project",
"volume_mode": "ro",
"parameters": {
"scanner_config": {"patterns": ["*.py"]},
"analyzer_config": {"check_secrets": true}
}
}'
```
## Getting Findings
```bash
curl "http://localhost:8000/runs/{run_id}/findings"
```
Returns SARIF-formatted findings:
```json
{
"workflow": "security_assessment",
"run_id": "abc-123",
"sarif": {
"version": "2.1.0",
"runs": [{
"tool": {...},
"results": [...]
}]
}
}
```
## Security Considerations
1. **Volume Mounting**: Only allowed directories can be mounted
2. **Read-Only Default**: Volumes mounted as read-only unless explicitly set
3. **Container Isolation**: Each workflow runs in an isolated container
4. **Resource Limits**: Can set CPU/memory limits via Prefect
5. **Network Isolation**: Containers use bridge networking
## Development
### Adding a New Workflow
1. Create directory: `toolbox/workflows/my_workflow/`
2. Add `workflow.py` with a Prefect flow
3. Add mandatory `metadata.yaml`
4. Restart backend: `docker-compose restart fuzzforge-backend`
### Adding a New Module
1. Create module in `toolbox/modules/{category}/`
2. Implement `BaseModule` interface
3. Use in workflows via import