# FuzzForge Backend

A stateless API server for security testing workflow orchestration using Prefect. This system dynamically discovers workflows, executes them in isolated Docker containers with volume mounting, and returns findings in SARIF format.

## Architecture Overview

### Core Components

1. **Workflow Discovery System**: Automatically discovers workflows at startup
2. **Module System**: Reusable components (scanner, analyzer, reporter) with a common interface
3. **Prefect Integration**: Handles container orchestration, workflow execution, and monitoring
4. **Volume Mounting**: Secure file access with configurable permissions (ro/rw)
5. **SARIF Output**: Standardized security findings format

### Key Features

- **Stateless**: No persistent data, fully scalable
- **Generic**: No hardcoded workflows, automatic discovery
- **Isolated**: Each workflow runs in its own Docker container
- **Extensible**: Easy to add new workflows and modules
- **Secure**: Read-only volume mounts by default, path validation
- **Observable**: Comprehensive logging and status tracking

## Quick Start

### Prerequisites

- Docker and Docker Compose

### Installation

From the project root, start all services:

```bash
docker-compose up -d
```

This will start:
- Prefect server (API at http://localhost:4200/api)
- PostgreSQL database
- Redis cache
- Docker registry (port 5001)
- Prefect worker (for running workflows)
- FuzzForge backend API (port 8000)
- FuzzForge MCP server (port 8010)

**Note**: The Prefect UI at http://localhost:4200 is currently not reachable from the host, because the Prefect API is configured for inter-container communication. Use the REST API or MCP interface instead.

## API Endpoints

### Workflows

- `GET /workflows` - List all discovered workflows
- `GET /workflows/{name}/metadata` - Get workflow metadata and parameters
- `GET /workflows/{name}/parameters` - Get workflow parameter schema
- `GET /workflows/metadata/schema` - Get metadata.yaml schema
- `POST /workflows/{name}/submit` - Submit a workflow for execution

### Runs

- `GET /runs/{run_id}/status` - Get run status
- `GET /runs/{run_id}/findings` - Get SARIF findings from completed run
- `GET /runs/{workflow_name}/findings/{run_id}` - Alternative findings endpoint with workflow name
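A client usually polls the status endpoint until the run finishes, then requests the findings. The sketch below shows one way to do that. It makes two assumptions this README does not confirm: that the status payload has a top-level `status` string, and that terminal states use Prefect's state names. `wait_for_run` is a hypothetical helper. The status fetcher is passed in as a callable, so the loop works with any HTTP client.

```python
import time
from typing import Callable, Dict

# Assumption: terminal state names follow Prefect's; the backend may report others.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CRASHED", "CANCELLED"}


def wait_for_run(
    fetch_status: Callable[[], Dict],
    interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the run reaches a terminal state or max_polls is exhausted."""
    for _ in range(max_polls):
        status = fetch_status()
        # Assumption: the payload carries a top-level "status" string.
        if str(status.get("status", "")).upper() in TERMINAL_STATES:
            return status
        sleep(interval)
    raise TimeoutError(f"run did not finish after {max_polls} polls")
```

The `fetch_status` callable would typically wrap a GET request to `/runs/{run_id}/status`.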

## Workflow Structure

Each workflow must have:

```
toolbox/workflows/{workflow_name}/
   workflow.py       # Prefect flow definition
   metadata.yaml     # Mandatory metadata (parameters, version, etc.)
   Dockerfile        # Optional custom container definition
   requirements.txt  # Optional Python dependencies
```
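Before restarting the backend, it can help to check that a workflow directory contains the two mandatory files from the layout above. This is a small illustrative check, not the backend's own discovery logic. `missing_files` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# workflow.py and metadata.yaml are mandatory; Dockerfile and
# requirements.txt are optional and therefore not checked.
REQUIRED_FILES = ("workflow.py", "metadata.yaml")


def missing_files(workflow_dir: Path) -> List[str]:
    """Return the mandatory file names that are absent from workflow_dir."""
    return [name for name in REQUIRED_FILES if not (workflow_dir / name).is_file()]
```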

### Example metadata.yaml

```yaml
name: security_assessment
version: "1.0.0"
description: "Comprehensive security analysis workflow"
author: "FuzzForge Team"
category: "comprehensive"
tags:
  - "security"
  - "analysis"
  - "comprehensive"

supported_volume_modes:
  - "ro"
  - "rw"

requirements:
  tools:
    - "file_scanner"
    - "security_analyzer"
    - "sarif_reporter"
  resources:
    memory: "512Mi"
    cpu: "500m"
    timeout: 1800

has_docker: true

parameters:
  type: object
  properties:
    target_path:
      type: string
      default: "/workspace"
      description: "Path to analyze"
    volume_mode:
      type: string
      enum: ["ro", "rw"]
      default: "ro"
      description: "Volume mount mode"
    scanner_config:
      type: object
      description: "Scanner configuration"
      properties:
        max_file_size:
          type: integer
          description: "Maximum file size to scan (bytes)"

output_schema:
  type: object
  properties:
    sarif:
      type: object
      description: "SARIF-formatted security findings"
    summary:
      type: object
      description: "Scan execution summary"
```

### Metadata Field Descriptions

- **name**: Workflow identifier (must match directory name)
- **version**: Semantic version (x.y.z format)
- **description**: Human-readable description of the workflow
- **author**: Workflow author/maintainer
- **category**: Workflow category (comprehensive, specialized, fuzzing, focused)
- **tags**: Array of descriptive tags for categorization
- **supported_volume_modes**: Volume mount modes the workflow accepts (`ro`, `rw`)
- **requirements.tools**: Required security tools that the workflow uses
- **requirements.resources**: Resource requirements enforced at runtime:
  - `memory`: Memory limit (e.g., "512Mi", "1Gi")
  - `cpu`: CPU limit (e.g., "500m" for 0.5 cores, "1" for 1 core)
  - `timeout`: Maximum execution time in seconds
- **parameters**: JSON Schema object defining workflow parameters
- **output_schema**: Expected output format (typically SARIF)
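Some of the constraints above can be checked mechanically. The sketch below validates only the three rules stated explicitly in this list: name matches directory, `x.y.z` version, known category. It takes metadata that has already been parsed into a dictionary. `check_metadata` is a hypothetical helper, and the backend's actual validation may be stricter.

```python
import re
from typing import Dict, List

VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+")
CATEGORIES = {"comprehensive", "specialized", "fuzzing", "focused"}


def check_metadata(metadata: Dict, directory_name: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    errors = []
    if metadata.get("name") != directory_name:
        errors.append("name must match the workflow directory name")
    if not VERSION_PATTERN.fullmatch(str(metadata.get("version", ""))):
        errors.append("version must use the x.y.z format")
    if metadata.get("category") not in CATEGORIES:
        errors.append("category must be one of: " + ", ".join(sorted(CATEGORIES)))
    return errors
```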

### Resource Requirements

Resource requirements defined in workflow metadata are automatically enforced. Users can override defaults when submitting workflows:

```bash
curl -X POST "http://localhost:8000/workflows/security_assessment/submit" \
  -H "Content-Type: application/json" \
  -d '{
    "target_path": "/tmp/project",
    "volume_mode": "ro",
    "resource_limits": {
      "memory_limit": "1Gi",
      "cpu_limit": "1"
    }
  }'
```

Resource precedence: User limits > Workflow requirements > System defaults
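The precedence rule amounts to a layered merge in which later layers win. Here is a minimal sketch, assuming all three layers have been normalized to the same keys. The request above uses `memory_limit`/`cpu_limit` while the metadata uses `memory`/`cpu`, and that mapping is not shown here. `resolve_limits` is a hypothetical name and the default values in the usage are placeholders.

```python
from typing import Dict, Optional


def resolve_limits(
    user: Dict[str, Optional[object]],
    workflow: Dict[str, Optional[object]],
    defaults: Dict[str, object],
) -> Dict[str, object]:
    """Merge limits so that user > workflow > defaults; None means 'not set'."""
    resolved = dict(defaults)
    for layer in (workflow, user):  # applied in increasing priority
        resolved.update({key: value for key, value in layer.items() if value is not None})
    return resolved


limits = resolve_limits(
    user={"memory": "1Gi"},
    workflow={"memory": "512Mi", "cpu": "500m"},
    defaults={"memory": "256Mi", "cpu": "250m", "timeout": 600},  # placeholder defaults
)
```

Here `limits` takes `memory` from the user, `cpu` from the workflow, and `timeout` from the defaults.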

## Module Development

Modules implement the `BaseModule` interface:

```python
from pathlib import Path
from typing import Dict

from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult


class MyModule(BaseModule):
    def get_metadata(self) -> ModuleMetadata:
        # Describe the module so workflows can identify it
        return ModuleMetadata(
            name="my_module",
            version="1.0.0",
            description="Module description",
            category="scanner",
            # ... remaining metadata fields
        )

    async def execute(self, config: Dict, workspace: Path) -> ModuleResult:
        # Module logic here: inspect the workspace and collect findings
        findings = []  # populate with the module's findings
        return self.create_result(findings=findings)

    def validate_config(self, config: Dict) -> bool:
        # Validate configuration; return False to reject it
        return True
```

## Submitting a Workflow

```bash
curl -X POST "http://localhost:8000/workflows/security_assessment/submit" \
  -H "Content-Type: application/json" \
  -d '{
    "target_path": "/home/user/project",
    "volume_mode": "ro",
    "parameters": {
      "scanner_config": {"patterns": ["*.py"]},
      "analyzer_config": {"check_secrets": true}
    }
  }'
```
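The same submission can be prepared from Python. This sketch only builds the request object and mirrors the JSON body of the `curl` call above. `build_submit_request` is a hypothetical helper, and the base URL assumes the Quick Start setup.

```python
import json
from typing import Dict, Optional
from urllib.request import Request


def build_submit_request(
    workflow: str,
    target_path: str,
    volume_mode: str = "ro",
    parameters: Optional[Dict] = None,
    base_url: str = "http://localhost:8000",  # assumption: Quick Start port
) -> Request:
    """Build a POST request for /workflows/{name}/submit without sending it."""
    payload = {
        "target_path": target_path,
        "volume_mode": volume_mode,
        "parameters": parameters or {},
    }
    return Request(
        f"{base_url}/workflows/{workflow}/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_submit_request(
    "security_assessment",
    "/home/user/project",
    parameters={"scanner_config": {"patterns": ["*.py"]}},
)
# Send with urllib.request.urlopen(request) once the backend is running.
```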

## Getting Findings

```bash
curl "http://localhost:8000/runs/{run_id}/findings"
```

Returns SARIF-formatted findings:

```json
{
  "workflow": "security_assessment",
  "run_id": "abc-123",
  "sarif": {
    "version": "2.1.0",
    "runs": [{
      "tool": {...},
      "results": [...]
    }]
  }
}
```
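A common first step with this response is to count findings by severity. The sketch below assumes the envelope shown above, with a top-level `sarif` key, and uses the standard SARIF 2.1.0 `level` property on each result. `count_by_level` is an illustrative name.

```python
from collections import Counter
from typing import Dict


def count_by_level(response: Dict) -> Counter:
    """Count SARIF results per level across all runs in the response."""
    counts: Counter = Counter()
    for run in response.get("sarif", {}).get("runs", []):
        for result in run.get("results", []):
            # Results without an explicit level are treated as "warning" here.
            counts[result.get("level", "warning")] += 1
    return counts


sample = {
    "workflow": "security_assessment",
    "run_id": "abc-123",
    "sarif": {
        "version": "2.1.0",
        "runs": [{"results": [{"level": "error"}, {"level": "error"}, {}]}],
    },
}
summary = count_by_level(sample)
```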

## Security Considerations

1. **Volume Mounting**: Only allowed directories can be mounted
2. **Read-Only Default**: Volumes are mounted read-only unless `volume_mode` is explicitly set to `rw`
3. **Container Isolation**: Each workflow runs in an isolated container
4. **Resource Limits**: CPU and memory limits can be set via Prefect
5. **Network Isolation**: Containers use bridge networking
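The allow-list rule for volume mounts can be illustrated with a path check that resolves `..` segments and symlinks before comparing. This is a sketch of the general technique under assumed names, not the backend's actual validation code. `is_mount_allowed` and the example roots are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def is_mount_allowed(target: str, allowed_roots: Iterable[str]) -> bool:
    """Return True if target resolves to an allowed root or a path beneath one."""
    resolved = Path(target).resolve()
    for root in allowed_roots:
        root_path = Path(root).resolve()
        # Compare whole path components, so /home/username does not match /home/user.
        if resolved == root_path or root_path in resolved.parents:
            return True
    return False
```

Resolving first means a request such as `/home/user/../../etc` is compared by its real location rather than by its textual prefix.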

## Development

### Adding a New Workflow

1. Create directory: `toolbox/workflows/my_workflow/`
2. Add `workflow.py` with a Prefect flow
3. Add mandatory `metadata.yaml`
4. Restart backend: `docker-compose restart fuzzforge-backend`

### Adding a New Module

1. Create module in `toolbox/modules/{category}/`
2. Implement `BaseModule` interface
3. Import and use the module in your workflows