# FuzzForge Backend

A stateless API server for security testing workflow orchestration using Temporal. This system dynamically discovers workflows, executes them in isolated worker environments, and returns findings in SARIF format.

## Architecture Overview

### Core Components

1. **Workflow Discovery System**: Automatically discovers workflows at startup
2. **Module System**: Reusable components (scanner, analyzer, reporter) with a common interface
3. **Temporal Integration**: Handles workflow orchestration, execution, and monitoring through vertical workers (pre-built workers that each carry the toolchain for one domain, such as Rust or Android)
4. **File Upload & Storage**: HTTP multipart upload to MinIO for target files
5. **SARIF Output**: Standardized security findings format

### Key Features

- **Stateless**: The API server keeps no persistent data of its own (run state lives in Temporal, targets in MinIO), so it can be scaled horizontally
- **Generic**: No hardcoded workflows, automatic discovery
- **Isolated**: Each workflow runs in a specialized vertical worker
- **Extensible**: Easy to add new workflows and modules
- **Secure**: File upload with MinIO storage, automatic cleanup via lifecycle policies
- **Observable**: Comprehensive logging and status tracking

## Quick Start

### Prerequisites

- Docker and Docker Compose

### Installation

From the project root, start all services:

```bash
docker-compose -f docker-compose.temporal.yaml up -d
```

This will start:
- Temporal server (Web UI at http://localhost:8233, gRPC at :7233)
- MinIO (S3 storage at http://localhost:9000, Console at http://localhost:9001)
- PostgreSQL database (for Temporal state)
- Vertical workers (worker-rust, worker-android, worker-web, etc.)
- FuzzForge backend API (port 8000)

**Note**: MinIO console login: `fuzzforge` / `fuzzforge123`

## API Endpoints

### Workflows

- `GET /workflows` - List all discovered workflows
- `GET /workflows/{name}/metadata` - Get workflow metadata and parameters
- `GET /workflows/{name}/parameters` - Get workflow parameter schema
- `GET /workflows/metadata/schema` - Get metadata.yaml schema
- `POST /workflows/{name}/submit` - Submit a workflow for execution (path-based, legacy)
- `POST /workflows/{name}/upload-and-submit` - **Upload local files and submit workflow** (recommended)

### Runs

- `GET /runs/{run_id}/status` - Get run status
- `GET /runs/{run_id}/findings` - Get SARIF findings from completed run
- `GET /runs/{workflow_name}/findings/{run_id}` - Alternative findings endpoint with workflow name

## Workflow Structure

Each workflow must have:

```
toolbox/workflows/{workflow_name}/
   workflow.py       # Temporal workflow definition
   metadata.yaml     # Mandatory metadata (parameters, version, vertical, etc.)
   requirements.txt  # Optional Python dependencies (installed in vertical worker)
```

**Note**: With the Temporal architecture, workflows run in pre-built vertical workers (e.g., `worker-rust`, `worker-android`), not in individual Docker containers. The workflow code is mounted as a volume and discovered at runtime.
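
As a rough illustration of the layout above, the following sketch reports which required files are missing from each workflow directory. The helper name `find_workflows` and its return shape are hypothetical; this is not the backend's actual discovery code.

```python
from pathlib import Path

# requirements.txt is optional, so it is not checked here.
REQUIRED_FILES = ("workflow.py", "metadata.yaml")


def find_workflows(root: Path) -> dict[str, list[str]]:
    """Map each workflow directory name to the required files it lacks.

    Hypothetical helper: an empty list means the directory matches the
    layout described above.
    """
    report = {}
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue  # stray files next to the workflow directories are ignored
        missing = [name for name in REQUIRED_FILES if not (entry / name).is_file()]
        report[entry.name] = missing
    return report
```

Calling `find_workflows(Path("toolbox/workflows"))` before restarting a worker gives a quick check that a new workflow directory is complete.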

### Example metadata.yaml

```yaml
name: security_assessment
version: "1.0.0"
description: "Comprehensive security analysis workflow"
author: "FuzzForge Team"
category: "comprehensive"
vertical: "rust"  # Routes to worker-rust
tags:
  - "security"
  - "analysis"
  - "comprehensive"

supported_volume_modes:
  - "ro"
  - "rw"

requirements:
  tools:
    - "file_scanner"
    - "security_analyzer"
    - "sarif_reporter"
  resources:
    memory: "512Mi"
    cpu: "500m"
    timeout: 1800

has_docker: true

parameters:
  type: object
  properties:
    target_path:
      type: string
      default: "/workspace"
      description: "Path to analyze"
    volume_mode:
      type: string
      enum: ["ro", "rw"]
      default: "ro"
      description: "Volume mount mode"
    scanner_config:
      type: object
      description: "Scanner configuration"
      properties:
        max_file_size:
          type: integer
          description: "Maximum file size to scan (bytes)"

output_schema:
  type: object
  properties:
    sarif:
      type: object
      description: "SARIF-formatted security findings"
    summary:
      type: object
      description: "Scan execution summary"
```

### Metadata Field Descriptions

- **name**: Workflow identifier (must match directory name)
- **version**: Semantic version (x.y.z format)
- **description**: Human-readable description of the workflow
- **author**: Workflow author/maintainer
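- **category**: Workflow category (comprehensive, specialized, fuzzing, focused)
- **vertical**: Vertical worker that runs the workflow (e.g., `rust` routes to `worker-rust`)
- **tags**: Array of descriptive tags for categorization
- **supported_volume_modes**: Volume mount modes the workflow accepts (`ro`, `rw`)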
- **requirements.tools**: Required security tools that the workflow uses
- **requirements.resources**: Resource requirements enforced at runtime:
  - `memory`: Memory limit (e.g., "512Mi", "1Gi")
  - `cpu`: CPU limit (e.g., "500m" for 0.5 cores, "1" for 1 core)
  - `timeout`: Maximum execution time in seconds
- **parameters**: JSON Schema object defining workflow parameters
- **output_schema**: Expected output format (typically SARIF)
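
The field rules above can be checked mechanically. The sketch below validates an already-parsed `metadata.yaml` (a plain dictionary) against the constraints this section states. The function name `check_metadata` is hypothetical, and whether the backend enforces exactly these checks is an assumption.

```python
import re

ALLOWED_CATEGORIES = {"comprehensive", "specialized", "fuzzing", "focused"}
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")  # plain x.y.z, no pre-release suffix


def check_metadata(metadata: dict, directory_name: str) -> list[str]:
    """Return a list of problems found in parsed workflow metadata."""
    problems = []
    if metadata.get("name") != directory_name:
        problems.append("name must match the workflow directory name")
    if not SEMVER.match(str(metadata.get("version", ""))):
        problems.append("version must use x.y.z format")
    if metadata.get("category") not in ALLOWED_CATEGORIES:
        problems.append("category is not one of the documented values")
    if not metadata.get("vertical"):
        problems.append("vertical is required to route the workflow to a worker")
    resources = (metadata.get("requirements") or {}).get("resources") or {}
    timeout = resources.get("timeout")
    if timeout is not None and (not isinstance(timeout, int) or timeout <= 0):
        problems.append("timeout must be a positive number of seconds")
    return problems
```

An empty list means the metadata passed every check in this sketch; it does not validate the `parameters` JSON Schema itself.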

### Resource Requirements

Resource requirements defined in workflow metadata are automatically enforced. Users can override defaults when submitting workflows:

```bash
curl -X POST "http://localhost:8000/workflows/security_assessment/submit" \
  -H "Content-Type: application/json" \
  -d '{
    "target_path": "/tmp/project",
    "volume_mode": "ro",
    "resource_limits": {
      "memory_limit": "1Gi",
      "cpu_limit": "1"
    }
  }'
```

Resource precedence: User limits > Workflow requirements > System defaults
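
The precedence rule amounts to a layered merge in which later layers win. The sketch below assumes all three layers already use the same key names; the metadata uses `memory` and `cpu` while the request uses `memory_limit` and `cpu_limit`, so a real implementation would have to map between them. The function name `resolve_limits` is hypothetical.

```python
def resolve_limits(
    system_defaults: dict,
    workflow_requirements: dict,
    user_limits: dict,
) -> dict:
    """Merge three layers of limits; later layers override earlier ones.

    A value of None means "not specified" and never overrides a lower layer.
    """
    resolved = {}
    for layer in (system_defaults, workflow_requirements, user_limits):
        for key, value in layer.items():
            if value is not None:
                resolved[key] = value
    return resolved
```

With this approach a user who overrides only the memory limit still inherits the workflow's CPU limit and timeout.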

## File Upload and Target Access

### Upload Endpoint

The backend provides an upload endpoint for submitting workflows with local files:

```
POST /workflows/{workflow_name}/upload-and-submit
Content-Type: multipart/form-data

Parameters:
  file: File upload (supports .tar.gz for directories)
  parameters: JSON string of workflow parameters (optional)
  volume_mode: "ro" or "rw" (default: "ro")
  timeout: Execution timeout in seconds (optional)
```

Example using curl:

```bash
# Upload a directory (create tarball first; -C keeps archive paths relative to the project directory)
tar -czf project.tar.gz -C /path/to project
curl -X POST "http://localhost:8000/workflows/security_assessment/upload-and-submit" \
  -F "file=@project.tar.gz" \
  -F "parameters={\"check_secrets\":true}" \
  -F "volume_mode=ro"

# Upload a single file
curl -X POST "http://localhost:8000/workflows/security_assessment/upload-and-submit" \
  -F "file=@binary.elf" \
  -F "volume_mode=ro"
```
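
When the tarball is built from a script rather than the shell, Python's standard `tarfile` module can produce the same archive. This is a minimal sketch: the helper name `make_tarball` is hypothetical, and the directory layout the backend expects inside the archive is an assumption.

```python
import tarfile
from pathlib import Path


def make_tarball(source_dir: Path, output: Path) -> Path:
    """Pack source_dir into a gzip tarball rooted at the directory's own name."""
    with tarfile.open(str(output), "w:gz") as archive:
        # arcname drops the absolute prefix, so entries look like
        # "project/src/main.py" rather than "path/to/project/src/main.py".
        archive.add(str(source_dir), arcname=source_dir.name)
    return output
```

Write the output file outside `source_dir` so the archive does not try to include itself.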

### Storage Flow

1. **CLI/API uploads file** via HTTP multipart
2. **Backend receives file** and streams it to a temporary location (max 10 GB)
3. **Backend uploads to MinIO** with generated `target_id`
4. **Workflow is submitted** to Temporal with `target_id`
5. **Worker downloads target** from MinIO to local cache
6. **Workflow processes target** from cache
7. **MinIO lifecycle policy** deletes files after 7 days

### Advantages

- **No host filesystem access required** - workers can run anywhere
- **Automatic cleanup** - lifecycle policies prevent disk exhaustion
- **Caching** - repeated workflows reuse cached targets
- **Multi-host ready** - targets accessible from any worker
- **Secure** - isolated storage, no arbitrary host path access
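
The caching behaviour can be pictured as a lookup keyed by `target_id` that downloads only on a miss. The sketch below is illustrative, not the worker's actual code: `get_target` and the `download` callable are hypothetical, and `target_id` is assumed to be a flat identifier without path separators.

```python
from pathlib import Path
from typing import Callable


def get_target(
    target_id: str,
    cache_dir: Path,
    download: Callable[[str, Path], None],
) -> Path:
    """Return the cached path for target_id, downloading only on a cache miss.

    `download` stands in for whatever fetches the object from MinIO and
    writes it to the given path.
    """
    cached = cache_dir / target_id
    if not cached.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        partial = cached.with_name(cached.name + ".part")
        download(target_id, partial)
        # Rename only after the download finished, so an interrupted
        # transfer never looks like a valid cache entry.
        partial.rename(cached)
    return cached
```

Downloading to a `.part` file first is one design choice for keeping partially written targets out of the cache; other locking strategies would work too.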

## Module Development

Modules implement the `BaseModule` interface:

```python
from pathlib import Path
from typing import Dict

from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult

class MyModule(BaseModule):
    def get_metadata(self) -> ModuleMetadata:
        return ModuleMetadata(
            name="my_module",
            version="1.0.0",
            description="Module description",
            category="scanner",
            # ... remaining metadata fields
        )

    async def execute(self, config: Dict, workspace: Path) -> ModuleResult:
        # Module logic here
        findings = [...]
        return self.create_result(findings=findings)

    def validate_config(self, config: Dict) -> bool:
        # Validate configuration
        return True
```

## Submitting a Workflow

### With File Upload (Recommended)

```bash
# Create a tarball of the project, then upload it
tar -czf project.tar.gz -C /home/user project
curl -X POST "http://localhost:8000/workflows/security_assessment/upload-and-submit" \
  -F "file=@project.tar.gz" \
  -F "parameters={\"scanner_config\":{\"patterns\":[\"*.py\"]},\"analyzer_config\":{\"check_secrets\":true}}" \
  -F "volume_mode=ro"
```

### Legacy Path-Based Submission

```bash
# Only works if the backend and the target are on the same machine
curl -X POST "http://localhost:8000/workflows/security_assessment/submit" \
  -H "Content-Type: application/json" \
  -d '{
    "target_path": "/home/user/project",
    "volume_mode": "ro",
    "parameters": {
      "scanner_config": {"patterns": ["*.py"]},
      "analyzer_config": {"check_secrets": true}
    }
  }'
```

## Getting Findings

```bash
curl "http://localhost:8000/runs/{run_id}/findings"
```

Returns SARIF-formatted findings:

```json
{
  "workflow": "security_assessment",
  "run_id": "abc-123",
  "sarif": {
    "version": "2.1.0",
    "runs": [{
      "tool": {...},
      "results": [...]
    }]
  }
}
```
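
A response of this shape can be summarized with a few lines of Python. The sketch below counts results per SARIF `level` across all runs; it relies only on the fields shown above plus the standard SARIF `level` property, and it treats a missing `level` as `"warning"`, which simplifies the SARIF defaulting rules. The function name `count_by_level` is hypothetical.

```python
from collections import Counter


def count_by_level(response: dict) -> Counter:
    """Count SARIF results per level across all runs in a findings response."""
    counts = Counter()
    for run in response.get("sarif", {}).get("runs", []):
        for result in run.get("results", []):
            # Simplification: a result without an explicit level counts as a warning.
            counts[result.get("level", "warning")] += 1
    return counts
```

Pass it the parsed JSON body of the findings endpoint, for example the output of `json.loads` on the response text.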

## Security Considerations

1. **File Upload Security**: Uploaded files are stored in MinIO, isolated from the host filesystem
2. **Read-Only Default**: Target files are accessed read-only unless `volume_mode` is explicitly set to `rw`
3. **Worker Isolation**: Each workflow runs in an isolated vertical worker
4. **Resource Limits**: CPU and memory limits can be set per worker
5. **Automatic Cleanup**: MinIO lifecycle policies delete uploaded files after 7 days

## Development

### Adding a New Workflow

1. Create directory: `toolbox/workflows/my_workflow/`
2. Add `workflow.py` with a Temporal workflow (using `@workflow.defn`)
3. Add mandatory `metadata.yaml` with `vertical` field
4. Restart the appropriate worker: `docker-compose -f docker-compose.temporal.yaml restart worker-rust`
5. Worker will automatically discover and register the new workflow

### Adding a New Module

1. Create module in `toolbox/modules/{category}/`
2. Implement `BaseModule` interface
3. Import and use the module in workflows

### Adding a New Vertical Worker

1. Create worker directory: `workers/{vertical}/`
2. Create `Dockerfile` with required tools
3. Add worker to `docker-compose.temporal.yaml`
4. Worker will automatically discover workflows with matching `vertical` in metadata
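
The matching between workflows and workers can be sketched as a grouping on the `vertical` field. The example below takes already-parsed metadata dictionaries and assumes the `worker-{vertical}` naming seen in the compose services (`worker-rust`, `worker-android`); the function name `group_by_worker` is hypothetical and this is not the worker's actual discovery code.

```python
from collections import defaultdict


def group_by_worker(workflows: list[dict]) -> dict[str, list[str]]:
    """Group workflow names by the worker their `vertical` field points to."""
    routing = defaultdict(list)
    for metadata in workflows:
        vertical = metadata.get("vertical")
        if vertical:  # a workflow without a vertical cannot be routed
            routing[f"worker-{vertical}"].append(metadata["name"])
    return dict(routing)
```

A workflow that never shows up in the result is a hint that its `metadata.yaml` lacks a `vertical` value.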