Initial commit

This commit is contained in:
Tanguy Duhamel
2025-09-29 21:26:41 +02:00
parent f0fd367ed8
commit 323a434c73
208 changed files with 72069 additions and 53 deletions
+8
@@ -0,0 +1,8 @@
{
"label": "Concept",
"position": 2,
"link": {
"type": "generated-index",
"description": "Concept pages that are understanding-oriented."
}
}
+214
@@ -0,0 +1,214 @@
# Architecture
FuzzForge is a distributed, containerized platform for security analysis workflows. Its architecture is designed for scalability, isolation, and reliability, drawing on modern patterns like microservices and orchestration. This page explains the core architectural concepts behind FuzzForge: what the main components are, how they interact, and why the system is structured this way.
:::warning
FuzzForge's architecture is evolving. While the long-term goal is a hexagonal architecture, the current implementation is still in transition. Expect changes as the platform matures.
:::
---
## Why This Architecture?
FuzzForge's architecture is shaped by several key goals:
- **Scalability:** Handle many workflows in parallel, scaling up or down as needed.
- **Isolation:** Run each workflow in its own secure environment, minimizing risk.
- **Reliability:** Ensure that failures in one part of the system don't bring down the whole platform.
- **Extensibility:** Make it easy to add new workflows, tools, or integrations.
## High-Level System Overview
At a glance, FuzzForge is organized into several layers, each with a clear responsibility:
- **Client Layer:** Where users and external systems interact (CLI, API clients, MCP server).
- **API Layer:** The FastAPI backend, which exposes REST endpoints and manages requests.
- **Orchestration Layer:** Prefect server and workers, which schedule and execute workflows.
- **Execution Layer:** Docker Engine and containers, where workflows actually run.
- **Storage Layer:** PostgreSQL database, Docker volumes, and a result cache for persistence.
Here's a simplified view of how these layers fit together:
```mermaid
graph TB
subgraph "Client Layer"
CLI[CLI Client]
API_Client[API Client]
MCP[MCP Server]
end
subgraph "API Layer"
FastAPI[FastAPI Backend]
Router[Route Handlers]
Middleware[Middleware Stack]
end
subgraph "Orchestration Layer"
Prefect[Prefect Server]
Workers[Prefect Workers]
Scheduler[Workflow Scheduler]
end
subgraph "Execution Layer"
Docker[Docker Engine]
Containers[Workflow Containers]
Registry[Docker Registry]
end
subgraph "Storage Layer"
PostgreSQL[PostgreSQL Database]
Volumes[Docker Volumes]
Cache[Result Cache]
end
CLI --> FastAPI
API_Client --> FastAPI
MCP --> FastAPI
FastAPI --> Router
Router --> Middleware
Middleware --> Prefect
Prefect --> Workers
Workers --> Scheduler
Scheduler --> Docker
Docker --> Containers
Docker --> Registry
Containers --> Volumes
FastAPI --> PostgreSQL
Workers --> PostgreSQL
Containers --> Cache
```
## What Are the Main Components?
### API Layer
- **FastAPI Backend:** The main entry point for users and clients. Handles authentication, request validation, and exposes endpoints for workflow management, results, and health checks.
- **Middleware Stack:** Manages API keys, user authentication, CORS, logging, and error handling.
### Orchestration Layer
- **Prefect Server:** Schedules and tracks workflows, backed by PostgreSQL.
- **Prefect Workers:** Execute workflows in Docker containers. Can be scaled horizontally.
- **Workflow Scheduler:** Balances load, manages priorities, and enforces resource limits.
### Execution Layer
- **Docker Engine:** Runs workflow containers, enforcing isolation and resource limits.
- **Workflow Containers:** Custom images with security tools, mounting code and results volumes.
- **Docker Registry:** Stores and distributes workflow images.
### Storage Layer
- **PostgreSQL Database:** Stores workflow metadata, state, and results.
- **Docker Volumes:** Persist workflow results and artifacts.
- **Result Cache:** Speeds up access to recent results, with in-memory and disk persistence.
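The two-tier result cache described above can be sketched as a small class; a hypothetical illustration, not FuzzForge's actual implementation:

```python
import json
import os


class ResultCache:
    """Two-tier result cache: hot entries in memory, every entry persisted to disk."""

    def __init__(self, cache_dir: str):
        self._mem: dict = {}
        self._dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def put(self, run_id: str, result: dict) -> None:
        self._mem[run_id] = result
        with open(os.path.join(self._dir, f"{run_id}.json"), "w") as f:
            json.dump(result, f)

    def get(self, run_id: str):
        if run_id in self._mem:
            return self._mem[run_id]      # in-memory hit
        path = os.path.join(self._dir, f"{run_id}.json")
        if os.path.exists(path):
            with open(path) as f:
                result = json.load(f)
            self._mem[run_id] = result    # promote back into memory
            return result
        return None                       # cache miss
```

Because every entry is written through to disk, the memory tier can be lost (for example, on service restart) without losing cached results.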
## How Does Data Flow Through the System?
### Submitting a Workflow
1. **User submits a workflow** via CLI or API client.
2. **API validates** the request and creates a deployment in Prefect.
3. **Prefect schedules** the workflow and assigns it to a worker.
4. **Worker launches a container** to run the workflow.
5. **Results are stored** in Docker volumes and the database.
6. **Status updates** flow back through Prefect and the API to the user.
```mermaid
sequenceDiagram
participant User
participant API
participant Prefect
participant Worker
participant Container
participant Storage
User->>API: Submit workflow
API->>API: Validate parameters
API->>Prefect: Create deployment
Prefect->>Worker: Schedule execution
Worker->>Container: Create and start
Container->>Container: Execute security tools
Container->>Storage: Store SARIF results
Worker->>Prefect: Update status
Prefect->>API: Workflow complete
API->>User: Return results
```
### Retrieving Results
1. **User requests status or results** via the API.
2. **API queries the database** for workflow metadata.
3. **If complete,** results are fetched from storage and returned to the user.
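The retrieval flow above amounts to a poll-then-fetch loop. A minimal sketch, with the status and result callables standing in for real API requests (state names are illustrative):

```python
import time
from typing import Callable


def wait_for_results(
    get_status: Callable[[], dict],
    get_results: Callable[[], dict],
    poll_interval: float = 2.0,
    max_polls: int = 150,
) -> dict:
    """Poll workflow status until completion, then fetch the stored results."""
    for _ in range(max_polls):
        if get_status().get("state") == "COMPLETED":
            return get_results()  # results are fetched from storage only once complete
        time.sleep(poll_interval)
    raise TimeoutError("workflow did not complete within the polling budget")
```

In practice the two callables would wrap HTTP calls to the API layer, which in turn queries the database for workflow metadata.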
## How Do Services Communicate?
- **Internally:** FastAPI talks to Prefect via REST; Prefect coordinates with workers over HTTP; workers manage containers via the Docker Engine API. All core services use pooled connections to PostgreSQL.
- **Externally:** Users interact via CLI or API clients (HTTP REST). The MCP server can automate workflows via its own protocol.
## How Is Security Enforced?
- **Container Isolation:** Each workflow runs in its own Docker network, as a non-root user, with strict resource limits and only necessary volumes mounted.
- **Volume Security:** Source code is mounted read-only; results are written to dedicated, temporary volumes.
- **API Security:** All endpoints require API keys, validate inputs, enforce rate limits, and log requests for auditing.
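Rate limiting, one of the API protections listed above, is commonly implemented as a token bucket. A generic sketch (not FuzzForge's actual middleware):

```python
import time


class TokenBucket:
    """Per-client rate limiter: tokens refill over time, each request spends one."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The injectable `clock` makes the limiter deterministic to test; middleware would keep one bucket per API key.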
## How Does FuzzForge Scale?
- **Horizontally:** Add more Prefect workers to handle more workflows in parallel. Scale the database with read replicas and connection pooling.
- **Vertically:** Adjust CPU and memory limits for containers and services as needed.
Example Docker Compose scaling:
```yaml
services:
prefect-worker:
deploy:
resources:
limits:
memory: 4G
cpus: '2.0'
reservations:
memory: 1G
cpus: '0.5'
```
## How Is It Deployed?
- **Development:** All services run via Docker Compose—backend, Prefect, workers, database, and registry.
- **Production:** Add load balancers, database clustering, and multiple worker instances for high availability. Health checks, metrics, and centralized logging support monitoring and troubleshooting.
## How Is Configuration Managed?
- **Environment Variables:** Control core settings like database URLs, registry location, and Prefect API endpoints.
- **Service Discovery:** Docker Compose's internal DNS lets services find each other by name, with consistent port mapping and health check endpoints.
Example configuration:
```bash
COMPOSE_PROJECT_NAME=fuzzforge_alpha
DATABASE_URL=postgresql://postgres:postgres@postgres:5432/fuzzforge
PREFECT_API_URL=http://prefect-server:4200/api
DOCKER_REGISTRY=localhost:5001
DOCKER_INSECURE_REGISTRY=true
```
## How Are Failures Handled?
- **Failure Isolation:** Each service is independent; failures don't cascade. Circuit breakers and graceful degradation keep the system stable.
- **Recovery:** Automatic retries with backoff for transient errors, dead letter queues for persistent failures, and workflow state recovery after restarts.
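The retry-with-backoff behavior can be sketched as a small helper; this is a generic pattern, not FuzzForge's actual recovery code:

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry a transient operation with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # persistent failure: caller can route it to a dead letter queue
            # Exponential backoff: 0.5s, 1s, 2s, ... plus random jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Jitter prevents many clients from retrying in lockstep after a shared outage; the injectable `sleep` keeps the helper testable.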
## Implementation Details
- **Tech Stack:** FastAPI (Python async), Prefect 3.x, Docker, Docker Compose, PostgreSQL (asyncpg), and Docker networking.
- **Performance:** Workflows start in 2–5 seconds; results are retrieved quickly thanks to caching and database indexing.
- **Extensibility:** Add new workflows by deploying new Docker images; extend the API with new endpoints; configure storage backends as needed.
---
## In Summary
FuzzForge's architecture is designed to be robust, scalable, and secure—ready to handle demanding security analysis workflows in a modern, distributed environment. As the platform evolves, expect even more modularity and flexibility, making it easier to adapt to new requirements and technologies.
+20
@@ -0,0 +1,20 @@
# {Concept Title}
{Brief introduction of the concept, including its origin and general purpose.}
## Purpose
- {The primary purpose and its relevance in its field.}
## Common Usage
- {Usage 1}: {Brief description.}
- {Usage 2}: {Brief description.}
## Benefits
- {Key benefit and why it's preferred in certain scenarios.}
## Conclusion
{Summary of its importance and role in its respective field.}
+217
@@ -0,0 +1,217 @@
# Docker Containers in FuzzForge: Concept and Design
Docker containers are at the heart of FuzzForge's execution model. They provide the isolation, consistency, and flexibility needed to run security workflows reliably—no matter where FuzzForge is deployed. This page explains the core concepts behind container usage in FuzzForge, why containers are used, and how they shape the platform's behavior.
---
## Why Use Docker Containers?
FuzzForge relies on Docker containers for several key reasons:
- **Isolation:** Each workflow runs in its own container, so tools and processes can't interfere with each other or the host.
- **Consistency:** The environment inside a container is always the same, regardless of the underlying system.
- **Security:** Containers restrict access to host resources and run as non-root users.
- **Reproducibility:** Results are deterministic, since the environment is controlled and versioned.
- **Scalability:** Containers can be started, stopped, and scaled up or down as needed.
---
## How Does FuzzForge Use Containers?
### The Container Model
Every workflow in FuzzForge is executed inside a Docker container. Here's what that means in practice:
- **Workflow containers** are built from language-specific base images (like Python or Node.js), with security tools and workflow code pre-installed.
- **Infrastructure containers** (API server, Prefect, database) use official images and are configured for the platform's needs.
### Container Lifecycle: From Build to Cleanup
The lifecycle of a workflow container looks like this:
1. **Image Build:** A Docker image is built with all required tools and code.
2. **Image Push/Pull:** The image is pushed to (and later pulled from) a local or remote registry.
3. **Container Creation:** The container is created with the right volumes and environment.
4. **Execution:** The workflow runs inside the container.
5. **Result Storage:** Results are written to mounted volumes.
6. **Cleanup:** The container and any temporary data are removed.
```mermaid
graph TB
Build[Build Image] --> Push[Push to Registry]
Push --> Pull[Pull Image]
Pull --> Create[Create Container]
Create --> Mount[Mount Volumes]
Mount --> Start[Start Container]
Start --> Execute[Run Workflow]
Execute --> Results[Store Results]
Execute --> Stop[Stop Container]
Stop --> Cleanup[Cleanup Data]
Cleanup --> Remove[Remove Container]
```
---
## Whats Inside a Workflow Container?
A typical workflow container is structured like this:
- **Base Image:** Usually a slim language image (e.g., `python:3.11-slim`).
- **System Dependencies:** Installed as needed (e.g., `git`, `curl`).
- **Security Tools:** Pre-installed (e.g., `semgrep`, `bandit`, `safety`).
- **Workflow Code:** Copied into the container.
- **Non-root User:** Created for execution.
- **Entrypoint:** Runs the workflow code.
Example Dockerfile snippet:
```dockerfile
FROM python:3.11-slim
RUN apt-get update && apt-get install -y git curl && rm -rf /var/lib/apt/lists/*
RUN pip install semgrep bandit safety
COPY ./toolbox /app/toolbox
WORKDIR /app
RUN useradd -m -u 1000 fuzzforge
USER fuzzforge
CMD ["python", "-m", "toolbox.main"]
```
---
## How Are Containers Networked and Connected?
- **Docker Compose Network:** All containers are attached to a custom bridge network for internal communication.
- **Internal DNS:** Services communicate using Docker Compose service names.
- **Port Exposure:** Only necessary ports are exposed to the host.
- **Network Isolation:** Workflow containers are isolated from infrastructure containers when possible.
Example network config:
```yaml
networks:
fuzzforge:
driver: bridge
ipam:
config:
- subnet: 172.20.0.0/16
```
---
## How Is Data Managed with Volumes?
### Volume Types
- **Target Code Volume:** Mounts the code to be analyzed, read-only, into the container.
- **Result Volume:** Stores workflow results and artifacts, persists after container exit.
- **Temporary Volumes:** Used for scratch space, destroyed with the container.
Example volume mount:
```yaml
volumes:
- "/host/path/to/code:/app/target:ro"
- "fuzzforge_alpha_prefect_storage:/app/prefect"
```
### Volume Security
- **Read-only Mounts:** Prevent workflows from modifying source code.
- **Isolated Results:** Each workflow writes to its own result directory.
- **No Arbitrary Host Access:** Only explicitly mounted paths are accessible.
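These three rules can be expressed directly in the volume-mapping format used by the docker-py SDK; the paths and volume naming below are illustrative, not FuzzForge's actual conventions:

```python
def build_volume_config(code_path: str, run_id: str) -> dict:
    """Build a docker-py style volume mapping that enforces the rules above."""
    return {
        # Source code: mounted read-only so workflows cannot modify what they analyze
        code_path: {"bind": "/app/target", "mode": "ro"},
        # Results: a dedicated named volume per run, writable, isolated from other runs
        f"fuzzforge_results_{run_id}": {"bind": "/app/results", "mode": "rw"},
    }
```

With docker-py, the returned dict can be passed as `client.containers.run(image, volumes=build_volume_config(...))`; since only these mappings are declared, no other host paths are visible inside the container.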
---
## How Are Images Built and Managed?
- **Automated Builds:** Images are built and pushed to a local registry for development, or a secure registry for production.
- **Build Optimization:** Use layer caching, multi-stage builds, and minimal base images.
- **Versioning:** Use tags (`latest`, semantic versions, or SHA digests) to track images.
Example build and push:
```bash
docker build -t localhost:5001/fuzzforge-static-analysis:latest .
docker push localhost:5001/fuzzforge-static-analysis:latest
```
---
## How Are Resources Controlled?
- **Memory and CPU Limits:** Set per container to prevent resource exhaustion.
- **Resource Monitoring:** Use `docker stats` and platform APIs to track usage.
- **Alerts:** Detect and handle out-of-memory or CPU throttling events.
Example resource config:
```yaml
services:
prefect-worker:
deploy:
resources:
limits:
memory: 4G
cpus: '2.0'
reservations:
memory: 1G
cpus: '0.5'
```
---
## How Is Security Enforced?
- **Non-root Execution:** Containers run as a dedicated, non-root user.
- **Capability Restrictions:** Drop unnecessary Linux capabilities.
- **Filesystem Protection:** Use read-only filesystems and tmpfs for temporary data.
- **Network Isolation:** Restrict network access to only what's needed.
- **No Privileged Mode:** Containers never run with elevated privileges.
Example security options:
```yaml
services:
prefect-worker:
security_opt:
- no-new-privileges:true
cap_drop:
- ALL
cap_add:
- CHOWN
- SETGID
- SETUID
```
---
## How Is Performance Optimized?
- **Image Layering:** Structure Dockerfiles for efficient caching.
- **Dependency Preinstallation:** Reduce startup time by pre-installing dependencies.
- **Warm Containers:** Optionally pre-create containers for faster workflow startup.
- **Horizontal Scaling:** Scale worker containers to handle more workflows in parallel.
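The "warm containers" idea above can be sketched as a simple pool; the `create_container` callable stands in for real Docker API calls, and this is an illustration rather than FuzzForge's actual implementation:

```python
import queue


class WarmContainerPool:
    """Keep pre-created containers ready so workflows skip creation latency."""

    def __init__(self, create_container, size: int = 4):
        self._create = create_container
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(self._create())  # pre-warm at startup

    def acquire(self):
        try:
            return self._pool.get_nowait()  # reuse a warm container if available
        except queue.Empty:
            return self._create()           # fall back to a cold start

    def replenish(self):
        self._pool.put(self._create())      # refill after a workflow finishes
```

The trade-off is idle resource usage (warm containers consume memory) against faster workflow startup.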
---
## How Are Containers Monitored and Debugged?
- **Health Checks:** Each service and workflow container has a health endpoint or check.
- **Logging:** All container logs are collected and can be accessed via `docker logs` or the FuzzForge API.
- **Debug Access:** Use `docker exec` to access running containers for troubleshooting.
- **Resource Stats:** Monitor with `docker stats` or platform dashboards.
---
## How Does This All Fit Into FuzzForge?
- **Prefect Workers:** Manage the full lifecycle of workflow containers.
- **API Integration:** Exposes container status, logs, and resource metrics.
- **Volume Management:** Ensures results and artifacts are collected and persisted.
- **Security and Resource Controls:** Enforced automatically for every workflow.
---
## In Summary
Docker containers are the foundation of FuzzForge's execution model. They provide the isolation, security, and reproducibility needed for robust security analysis workflows—while making it easy to scale, monitor, and extend the platform.
+83
@@ -0,0 +1,83 @@
# FuzzForge AI: Conceptual Overview
Welcome to FuzzForge AI—a multi-agent orchestration platform designed to supercharge your intelligent automation, security workflows, and project knowledge management. This document provides a high-level conceptual introduction to what FuzzForge AI is, what problems it solves, and how its architecture enables powerful, context-aware agent collaboration.
---
## What is FuzzForge AI?
FuzzForge AI is a multi-agent orchestration system that implements the A2A (Agent-to-Agent) protocol for intelligent agent routing, persistent memory management, and project-scoped knowledge graphs. Think of it as an intelligent hub that coordinates a team of specialized agents, each with their own skills, while maintaining context and knowledge across sessions and projects.
**Key Goals:**
- Seamlessly route requests to the right agent for the job
- Preserve and leverage project-specific knowledge
- Enable secure, auditable, and extensible automation workflows
- Make multi-agent collaboration as easy as talking to a single assistant
---
## Core Concepts
### 1. **Agent Orchestration**
FuzzForge AI acts as a conductor, automatically routing your requests to the most capable registered agent. Agents can be local or remote, and each advertises its skills and capabilities via the A2A protocol.
### 2. **Memory & Knowledge Management**
The system features a three-layer memory architecture:
- **Session Persistence:** Keeps track of ongoing sessions and conversations.
- **Semantic Memory:** Archives conversations and enables semantic search.
- **Knowledge Graphs:** Maintains structured, project-scoped knowledge for deep context.
### 3. **Artifact System**
Artifacts are files or structured content generated, processed, or shared by agents. The artifact system supports creation, storage, and secure sharing of code, configs, reports, and more—enabling reproducible, auditable workflows.
### 4. **A2A Protocol Compliance**
FuzzForge AI fully implements the A2A (Agent-to-Agent) protocol (spec 0.3.0), ensuring standardized, interoperable communication between agents—whether they're running locally or across the network.
---
## High-Level Architecture
Here's how the main components fit together:
```
FuzzForge AI System
├── CLI Interface (cli.py)
│ ├── Commands & Session Management
│ └── Agent Registry Persistence
├── Agent Core (agent.py)
│ ├── Main Coordinator
│ └── Memory Manager Integration
├── Agent Executor (agent_executor.py)
│ ├── Tool Management & Orchestration
│ ├── ROUTE_TO Pattern Implementation
│ └── Artifact Creation & Management
├── Memory Architecture (Three Layers)
│ ├── Session Persistence
│ ├── Semantic Memory
│ └── Knowledge Graphs
├── A2A Communication Layer
│ ├── Remote Agent Connection
│ ├── Agent Card Management
│ └── Protocol Compliance
└── A2A Server (a2a_server.py)
├── HTTP/SSE Server
├── Artifact HTTP Serving
└── Task Store & Queue Management
```
**How it works:**
1. **User Input:** You interact via CLI or API, using natural language or commands.
2. **Agent Routing:** The system decides whether to handle the request itself or route it to a specialist agent.
3. **Tool Execution:** Built-in and agent-provided tools perform operations.
4. **Memory Integration:** Results and context are stored for future use.
5. **Response Generation:** The system returns results, often with artifacts or actionable insights.
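The routing decision in step 2 can be sketched as skill matching against each agent's advertised A2A card; the data model and scoring here are deliberately simplified and hypothetical, not the actual ROUTE_TO implementation:

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    skills: set  # skills advertised via the agent's A2A card


def route_request(request_skills: set, agents: list, fallback: str = "coordinator") -> str:
    """Pick the registered agent whose advertised skills best cover the request."""
    best, best_overlap = fallback, 0
    for agent in agents:
        overlap = len(agent.skills & request_skills)
        if overlap > best_overlap:
            best, best_overlap = agent.name, overlap
    return best  # the executor would then emit e.g. "ROUTE_TO: <name>"
```

If no specialist matches, the coordinator handles the request itself.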
---
## Why FuzzForge AI?
- **Extensible:** Easily add new agents, tools, and workflows.
- **Context-Aware:** Remembers project history, conversations, and knowledge.
- **Secure:** Project isolation, input validation, and artifact management.
- **Collaborative:** Enables multi-agent workflows and knowledge sharing.
- **Fun & Productive:** Designed to make automation and security tasks less tedious and more interactive.
+618
@@ -0,0 +1,618 @@
# SARIF Format
FuzzForge uses the Static Analysis Results Interchange Format (SARIF) as the standardized output format for all security analysis results. SARIF provides a consistent, machine-readable format that enables tool interoperability and comprehensive result analysis.
## What is SARIF?
### Overview
SARIF (Static Analysis Results Interchange Format) is an OASIS-approved standard (SARIF 2.1.0) designed to standardize the output of static analysis tools. FuzzForge extends this standard to cover dynamic analysis, secret detection, infrastructure analysis, and fuzzing results.
### Key Benefits
- **Standardization**: Consistent format across all security tools and workflows
- **Interoperability**: Integration with existing security tools and platforms
- **Rich Metadata**: Comprehensive information about findings, tools, and analysis runs
- **Tool Agnostic**: Works with any security tool that produces structured results
- **IDE Integration**: Native support in modern development environments
### SARIF Structure
```json
{
"version": "2.1.0",
"$schema": "https://json.schemastore.org/sarif-2.1.0.json",
"runs": [
{
"tool": { /* Tool information */ },
"invocations": [ /* How the tool was run */ ],
"artifacts": [ /* Files analyzed */ ],
"results": [ /* Security findings */ ]
}
]
}
```
## FuzzForge SARIF Implementation
### Run Structure
Each FuzzForge workflow produces a SARIF "run" containing:
```json
{
"tool": {
"driver": {
"name": "FuzzForge",
"version": "1.0.0",
"informationUri": "https://github.com/FuzzingLabs/fuzzforge",
"organization": "FuzzingLabs",
"rules": [ /* Security rules applied */ ]
},
"extensions": [
{
"name": "semgrep",
"version": "1.45.0",
"rules": [ /* Semgrep-specific rules */ ]
}
]
},
"invocations": [
{
"executionSuccessful": true,
"startTimeUtc": "2025-09-25T12:00:00.000Z",
"endTimeUtc": "2025-09-25T12:05:30.000Z",
"workingDirectory": {
"uri": "file:///app/target/"
},
"commandLine": "python -m toolbox.workflows.static_analysis",
"environmentVariables": {
"WORKFLOW_TYPE": "static_analysis_scan"
}
}
]
}
```
### Result Structure
Each security finding is represented as a SARIF result:
```json
{
"ruleId": "semgrep.security.audit.sqli.pg-sqli",
"ruleIndex": 42,
"level": "error",
"message": {
"text": "Potential SQL injection vulnerability detected"
},
"locations": [
{
"physicalLocation": {
"artifactLocation": {
"uri": "src/database/queries.py",
"uriBaseId": "SRCROOT"
},
"region": {
"startLine": 156,
"startColumn": 20,
"endLine": 156,
"endColumn": 45,
"snippet": {
"text": "cursor.execute(query)"
}
}
}
}
],
"properties": {
"tool": "semgrep",
"confidence": "high",
"severity": "high",
"cwe": ["CWE-89"],
"owasp": ["A03:2021"],
"references": [
"https://owasp.org/Top10/A03_2021-Injection/"
]
}
}
```
## Finding Categories and Severity
### Severity Levels
FuzzForge maps tool-specific severity levels to SARIF standard levels:
#### SARIF Level Mapping
- **error**: Critical and High severity findings
- **warning**: Medium severity findings
- **note**: Low severity findings
- **info**: Informational findings
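A direct encoding of this mapping, using the severity and level names as this page defines them (the exact names FuzzForge uses internally may differ):

```python
# FuzzForge severity -> SARIF level, per the mapping above
SEVERITY_TO_SARIF_LEVEL = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
    "informational": "info",
}


def to_sarif_level(severity: str) -> str:
    """Translate a tool-specific severity to its SARIF level, defaulting to 'note'."""
    return SEVERITY_TO_SARIF_LEVEL.get(severity.lower(), "note")
```

Defaulting unmapped severities to `note` keeps aggregation from crashing on tools with unusual severity vocabularies.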
#### Extended Severity Properties
```json
{
"properties": {
"severity": "high", // FuzzForge severity
"confidence": "medium", // Tool confidence
"exploitability": "high", // Likelihood of exploitation
"impact": "data_breach" // Potential impact
}
}
```
### Vulnerability Classification
#### CWE (Common Weakness Enumeration)
```json
{
"properties": {
"cwe": ["CWE-89", "CWE-79"],
"cwe_category": "Injection"
}
}
```
#### OWASP Top 10 Mapping
```json
{
"properties": {
"owasp": ["A03:2021", "A06:2021"],
"owasp_category": "Injection"
}
}
```
#### Tool-Specific Classifications
```json
{
"properties": {
"tool_category": "security",
"rule_type": "semantic_grep",
"finding_type": "sql_injection"
}
}
```
## Multi-Tool Result Aggregation
### Tool Extension Model
FuzzForge aggregates results from multiple tools using SARIF's extension model:
```json
{
"tool": {
"driver": {
"name": "FuzzForge",
"version": "1.0.0"
},
"extensions": [
{
"name": "semgrep",
"version": "1.45.0",
"guid": "semgrep-extension-guid"
},
{
"name": "bandit",
"version": "1.7.5",
"guid": "bandit-extension-guid"
}
]
}
}
```
### Result Correlation
#### Cross-Tool Finding Correlation
```json
{
"ruleId": "fuzzforge.correlation.sql-injection",
"level": "error",
"message": {
"text": "SQL injection vulnerability confirmed by multiple tools"
},
"locations": [ /* Primary location */ ],
"relatedLocations": [ /* Additional contexts */ ],
"properties": {
"correlation_id": "corr-001",
"confirming_tools": ["semgrep", "bandit"],
"confidence_score": 0.95,
"aggregated_severity": "critical"
}
}
```
#### Finding Relationships
```json
{
"ruleId": "semgrep.security.audit.xss.direct-use-of-jinja2",
"properties": {
"related_findings": [
{
"correlation_type": "same_vulnerability_class",
"related_rule": "bandit.B703",
"relationship": "confirms"
},
{
"correlation_type": "attack_chain",
"related_rule": "nuclei.xss.reflected",
"relationship": "exploits"
}
]
}
}
```
## Workflow-Specific Extensions
### Static Analysis Results
```json
{
"properties": {
"analysis_type": "static",
"language": "python",
"complexity_score": 3.2,
"coverage": {
"lines_analyzed": 15420,
"functions_analyzed": 892,
"classes_analyzed": 156
}
}
}
```
### Dynamic Analysis Results
```json
{
"properties": {
"analysis_type": "dynamic",
"test_method": "web_application_scan",
"target_url": "https://example.com",
"http_method": "POST",
"request_payload": "user_input=<script>alert(1)</script>",
"response_code": 200,
"exploitation_proof": "alert_box_displayed"
}
}
```
### Secret Detection Results
```json
{
"properties": {
"analysis_type": "secret_detection",
"secret_type": "api_key",
"entropy_score": 4.2,
"commit_hash": "abc123def456",
"commit_date": "2025-09-20T10:30:00Z",
"author": "developer@example.com",
"exposure_duration": "30_days"
}
}
```
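The `entropy_score` in these properties is typically a Shannon entropy over the candidate string: high values (random-looking character distributions) suggest machine-generated secrets rather than ordinary identifiers. A minimal computation, assuming that is the metric used:

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character; high values suggest random secrets."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For example, a uniformly random base64 string scores near 6 bits per character, while English prose usually sits well below 4.5.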
### Infrastructure Analysis Results
```json
{
"properties": {
"analysis_type": "infrastructure",
"resource_type": "docker_container",
"policy_violation": "privileged_container",
"compliance_framework": ["CIS", "NIST"],
"remediation_effort": "low",
"deployment_risk": "high"
}
}
```
### Fuzzing Results
```json
{
"properties": {
"analysis_type": "fuzzing",
"fuzzer": "afl++",
"crash_type": "segmentation_fault",
"crash_address": "0x7fff8b2a1000",
"exploitability": "likely_exploitable",
"test_case": "base64:SGVsbG8gV29ybGQ=",
"coverage_achieved": "85%"
}
}
```
## SARIF Processing and Analysis
### Result Filtering
#### Severity-Based Filtering
```python
def filter_by_severity(sarif_results, min_severity="medium"):
"""Filter SARIF results by minimum severity level"""
severity_order = {"info": 0, "note": 1, "warning": 2, "error": 3}
min_level = severity_order.get(min_severity, 1)
filtered_results = []
for result in sarif_results["runs"][0]["results"]:
result_level = severity_order.get(result.get("level", "note"), 1)
if result_level >= min_level:
filtered_results.append(result)
return filtered_results
```
#### Rule-Based Filtering
```python
def filter_by_rules(sarif_results, rule_patterns):
"""Filter results by rule ID patterns"""
import re
filtered_results = []
for result in sarif_results["runs"][0]["results"]:
rule_id = result.get("ruleId", "")
for pattern in rule_patterns:
if re.match(pattern, rule_id):
filtered_results.append(result)
break
return filtered_results
```
### Statistical Analysis
#### Severity Distribution
```python
def analyze_severity_distribution(sarif_results):
"""Analyze distribution of findings by severity"""
distribution = {"error": 0, "warning": 0, "note": 0, "info": 0}
for result in sarif_results["runs"][0]["results"]:
level = result.get("level", "note")
        distribution[level] = distribution.get(level, 0) + 1  # tolerate unexpected levels
return distribution
```
#### Tool Coverage Analysis
```python
def analyze_tool_coverage(sarif_results):
"""Analyze which tools contributed findings"""
tool_stats = {}
for result in sarif_results["runs"][0]["results"]:
tool = result.get("properties", {}).get("tool", "unknown")
if tool not in tool_stats:
tool_stats[tool] = {"count": 0, "severities": {"error": 0, "warning": 0, "note": 0, "info": 0}}
tool_stats[tool]["count"] += 1
level = result.get("level", "note")
        tool_stats[tool]["severities"][level] = tool_stats[tool]["severities"].get(level, 0) + 1
return tool_stats
```
## SARIF Export and Integration
### Export Formats
#### JSON Export
```python
def export_sarif_json(sarif_results, output_path):
"""Export SARIF results as JSON"""
import json
with open(output_path, 'w') as f:
json.dump(sarif_results, f, indent=2, ensure_ascii=False)
```
#### CSV Export for Spreadsheets
```python
def export_sarif_csv(sarif_results, output_path):
"""Export SARIF results as CSV for spreadsheet analysis"""
import csv
with open(output_path, 'w', newline='') as f:
writer = csv.writer(f)
writer.writerow(['Rule ID', 'Severity', 'Message', 'File', 'Line', 'Tool'])
for result in sarif_results["runs"][0]["results"]:
rule_id = result.get("ruleId", "unknown")
level = result.get("level", "note")
message = result.get("message", {}).get("text", "")
tool = result.get("properties", {}).get("tool", "unknown")
for location in result.get("locations", []):
physical_location = location.get("physicalLocation", {})
file_path = physical_location.get("artifactLocation", {}).get("uri", "")
line = physical_location.get("region", {}).get("startLine", "")
writer.writerow([rule_id, level, message, file_path, line, tool])
```
### IDE Integration
#### Visual Studio Code
SARIF files can be opened directly in VS Code with the SARIF extension:
```json
{
"recommendations": ["MS-SarifVSCode.sarif-viewer"],
"sarif.viewer.connectToGitHub": true,
"sarif.viewer.showResultsInExplorer": true
}
```
#### GitHub Integration
GitHub automatically processes SARIF files uploaded through Actions:
```yaml
- name: Upload SARIF results
uses: github/codeql-action/upload-sarif@v2
with:
sarif_file: fuzzforge-results.sarif
category: security-analysis
```
### API Integration
#### SARIF Result Access
```python
# Example: Accessing SARIF results via FuzzForge API
async with FuzzForgeClient() as client:
result = await client.get_workflow_result(run_id)
# Access SARIF data
sarif_data = result["sarif"]
findings = sarif_data["runs"][0]["results"]
# Filter critical findings
critical_findings = [
f for f in findings
if f.get("level") == "error" and
f.get("properties", {}).get("severity") == "critical"
]
```
## SARIF Validation and Quality
### Schema Validation
```python
import jsonschema
import requests
def validate_sarif(sarif_data):
"""Validate SARIF data against official schema"""
schema_url = "https://json.schemastore.org/sarif-2.1.0.json"
schema = requests.get(schema_url).json()
try:
jsonschema.validate(sarif_data, schema)
return True, "Valid SARIF 2.1.0 format"
except jsonschema.ValidationError as e:
return False, f"SARIF validation error: {e.message}"
```
### Quality Metrics
```python
def calculate_sarif_quality_metrics(sarif_data):
"""Calculate quality metrics for SARIF results"""
results = sarif_data["runs"][0]["results"]
metrics = {
"total_findings": len(results),
"findings_with_location": len([r for r in results if r.get("locations")]),
"findings_with_message": len([r for r in results if r.get("message", {}).get("text")]),
"findings_with_remediation": len([r for r in results if r.get("fixes")]),
"unique_rules": len(set(r.get("ruleId") for r in results)),
"coverage_percentage": calculate_coverage(sarif_data)  # assumes a coverage helper defined elsewhere
}
metrics["quality_score"] = (
metrics["findings_with_location"] / max(metrics["total_findings"], 1) * 0.3 +
metrics["findings_with_message"] / max(metrics["total_findings"], 1) * 0.3 +
metrics["findings_with_remediation"] / max(metrics["total_findings"], 1) * 0.2 +
min(metrics["coverage_percentage"] / 100, 1.0) * 0.2
)
return metrics
```
## Advanced SARIF Features
### Fixes and Remediation
```json
{
"ruleId": "semgrep.security.audit.sqli.pg-sqli",
"fixes": [
{
"description": {
"text": "Use parameterized queries to prevent SQL injection"
},
"artifactChanges": [
{
"artifactLocation": {
"uri": "src/database/queries.py"
},
"replacements": [
{
"deletedRegion": {
"startLine": 156,
"startColumn": 20,
"endLine": 156,
"endColumn": 45
},
"insertedContent": {
"text": "cursor.execute(query, params)"
}
}
]
}
]
}
]
}
```
### Code Flows for Complex Vulnerabilities
```json
{
"ruleId": "dataflow.taint.sql-injection",
"codeFlows": [
{
"message": {
"text": "Tainted data flows from user input to SQL query"
},
"threadFlows": [
{
"locations": [
{
"location": {
"physicalLocation": {
"artifactLocation": {"uri": "src/api/handlers.py"},
"region": {"startLine": 45}
}
},
"state": {"source": {"text": "user_input"}},
"nestingLevel": 0
},
{
"location": {
"physicalLocation": {
"artifactLocation": {"uri": "src/database/queries.py"},
"region": {"startLine": 156}
}
},
"state": {"sink": {"text": "sql_query"}},
"nestingLevel": 0
}
]
}
]
}
]
}
```
---
## SARIF Best Practices
### Result Quality
- **Precise Locations**: Always include accurate file paths and line numbers
- **Clear Messages**: Write descriptive, actionable finding messages
- **Remediation Guidance**: Include fix suggestions when possible
- **Severity Consistency**: Use consistent severity mappings across tools
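The severity-consistency point can be sketched as a small normalization table. The tool-specific labels and the mapping below are illustrative assumptions, not an official FuzzForge mapping:

```python
# Sketch: normalize tool-specific severity labels to SARIF levels plus a
# normalized severity property, so findings from different tools compare
# cleanly. The label set is illustrative, not exhaustive.
SARIF_LEVELS = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
    "info": "note",
}

def normalize_severity(tool_severity):
    """Map a tool's severity label to (SARIF level, normalized severity)."""
    severity = tool_severity.strip().lower()
    level = SARIF_LEVELS.get(severity, "warning")  # conservative default
    return level, severity

print(normalize_severity("CRITICAL"))  # ('error', 'critical')
```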
### Performance
- **Efficient Processing**: Process SARIF results efficiently for large result sets
- **Streaming**: Use streaming for very large SARIF files
- **Caching**: Cache processed results for faster repeated access
- **Compression**: Compress SARIF files for storage and transmission
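As a minimal sketch of the compression point, a SARIF document can be gzipped for storage and read back transparently (the helper names and file path are illustrative):

```python
# Sketch: gzip-compress a SARIF document for storage and transmission,
# then read it back; works with any JSON-serializable SARIF dict.
import gzip
import json
import os
import tempfile

def write_sarif_gz(sarif_data, path):
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(sarif_data, f)

def read_sarif_gz(path):
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

sarif = {"version": "2.1.0", "runs": []}
path = os.path.join(tempfile.mkdtemp(), "results.sarif.gz")
write_sarif_gz(sarif, path)
print(read_sarif_gz(path) == sarif)  # True
```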
### Integration
- **Tool Interoperability**: Ensure SARIF compatibility with existing tools
- **Standard Compliance**: Follow SARIF 2.1.0 specification precisely
- **Extension Documentation**: Document any custom extensions clearly
- **Version Management**: Handle SARIF schema version differences
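Version management can be as simple as refusing documents whose schema version the pipeline does not understand. A minimal sketch, with a hypothetical helper name:

```python
# Sketch: guard against unsupported SARIF schema versions before
# processing a report. The supported set here is an assumption.
SUPPORTED_VERSIONS = {"2.1.0"}

def check_sarif_version(sarif_data):
    """Return the document's version, or raise if it is unsupported."""
    version = sarif_data.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"Unsupported SARIF version: {version!r}")
    return version

print(check_sarif_version({"version": "2.1.0", "runs": []}))  # 2.1.0
```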
# Security Analysis in FuzzForge: Concepts and Approach
Security analysis is at the core of FuzzForge's mission. This page explains the philosophy, methodologies, and integration patterns that shape how FuzzForge discovers vulnerabilities and helps teams secure their software. If you're curious about what “security analysis” really means in this platform, and why it's designed this way, read on.
---
## Why Does FuzzForge Approach Security Analysis This Way?
FuzzForge's security analysis is built on a few guiding principles:
- **Defense in Depth:** No single tool or method catches everything. FuzzForge layers multiple analysis types—static, dynamic, secret detection, infrastructure checks, and fuzzing—to maximize coverage.
- **Tool Diversity:** Different tools find different issues. Running several tools for each analysis type reduces blind spots and increases confidence in results.
- **Standardized Results:** All findings are normalized into SARIF, a widely adopted format. This makes results easy to aggregate, review, and integrate with other tools.
- **Automation and Integration:** Security analysis is only useful if it fits into real-world workflows. FuzzForge is designed for CI/CD, developer feedback, and automated reporting.
---
## What Types of Security Analysis Does FuzzForge Perform?
### Static Analysis
- **What it is:** Examines source code without running it, looking for vulnerabilities, anti-patterns, and risky constructs.
- **How it works:** Parses code, analyzes control and data flow, and matches patterns against known vulnerabilities.
- **Tools:** Semgrep, Bandit, CodeQL, ESLint, and more.
- **Strengths:** Fast, broad coverage, no runtime needed.
- **Limitations:** Can't see runtime issues, may produce false positives.
### Dynamic Analysis
- **What it is:** Tests running applications to find vulnerabilities that only appear at runtime.
- **How it works:** Deploys the app in a test environment, probes entry points, and observes behavior under attack.
- **Tools:** Nuclei, OWASP ZAP, Nmap, SQLMap.
- **Strengths:** Finds real, exploitable issues; validates actual behavior.
- **Limitations:** Needs a working environment; slower; may not cover all code.
### Secret Detection
- **What it is:** Scans code and configuration for exposed credentials, API keys, and sensitive data.
- **How it works:** Uses pattern matching, entropy analysis, and context checks—sometimes even scanning git history.
- **Tools:** TruffleHog, Gitleaks, detect-secrets, GitGuardian.
- **Strengths:** Fast, critical for preventing leaks.
- **Limitations:** Can't find encrypted/encoded secrets; needs regular pattern updates.
### Infrastructure Analysis
- **What it is:** Analyzes infrastructure-as-code, container configs, and deployment manifests for security misconfigurations.
- **How it works:** Parses config files, applies security policies, checks compliance, and assesses risk.
- **Tools:** Checkov, Hadolint, Kubesec, Terrascan.
- **Strengths:** Prevents misconfigurations before deployment; automates compliance.
- **Limitations:** Can't see runtime changes; depends on up-to-date policies.
### Fuzzing
- **What it is:** Automatically generates and sends unexpected or random inputs to code, looking for crashes or unexpected behavior.
- **How it works:** Identifies targets, generates inputs, monitors execution, and analyzes crashes.
- **Tools:** AFL++, libFuzzer, Cargo Fuzz, Jazzer.
- **Strengths:** Finds deep, complex bugs; great for memory safety.
- **Limitations:** Resource-intensive; may need manual setup.
### Comprehensive Assessment
- **What it is:** Combines all the above for a holistic view, correlating findings and prioritizing risks.
- **How it works:** Runs multiple analyses, aggregates and correlates results, and generates unified reports.
- **Benefits:** Complete coverage, better context, prioritized remediation, and compliance support.
---
## How Does FuzzForge Integrate and Orchestrate Analysis?
### Workflow Composition
FuzzForge composes analysis workflows by combining different analysis types, each running in its own containerized environment. Inputs (code, configs, parameters) are fed into the appropriate tools, and results are normalized and aggregated.
```mermaid
graph TB
subgraph "Input"
Target[Target Codebase]
Config[Analysis Configuration]
end
subgraph "Analysis Workflows"
Static[Static Analysis]
Dynamic[Dynamic Analysis]
Secrets[Secret Detection]
Infra[Infrastructure Analysis]
Fuzz[Fuzzing Analysis]
end
subgraph "Processing"
Normalize[Result Normalization]
Merge[Finding Aggregation]
Correlate[Cross-Tool Correlation]
end
subgraph "Output"
SARIF[SARIF Results]
Report[Security Report]
Metrics[Analysis Metrics]
end
Target --> Static
Target --> Dynamic
Target --> Secrets
Target --> Infra
Target --> Fuzz
Config --> Static
Config --> Dynamic
Config --> Secrets
Config --> Infra
Config --> Fuzz
Static --> Normalize
Dynamic --> Normalize
Secrets --> Normalize
Infra --> Normalize
Fuzz --> Normalize
Normalize --> Merge
Merge --> Correlate
Correlate --> SARIF
Correlate --> Report
Correlate --> Metrics
```
### Orchestration Patterns
- **Parallel Execution:** Tools of the same type (e.g., multiple static analyzers) run in parallel for speed and redundancy.
- **Sequential Execution:** Some analyses depend on previous results (e.g., dynamic analysis using endpoints found by static analysis).
- **Result Normalization:** All findings are converted to SARIF for consistency.
- **Correlation:** Related findings from different tools are grouped and prioritized.
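The normalization step can be sketched as a per-tool adapter that rewrites a raw finding into a SARIF result object. The raw finding's field names below are illustrative assumptions; each tool adapter would map its own output shape:

```python
# Sketch: normalize one raw tool finding into a SARIF result object.
# The raw finding's keys ("rule", "severity", ...) are illustrative.

LEVEL_MAP = {"critical": "error", "high": "error",
             "medium": "warning", "low": "note"}

def to_sarif_result(raw):
    """Convert a raw finding dict into a SARIF 2.1.0 result object."""
    return {
        "ruleId": raw["rule"],
        "level": LEVEL_MAP.get(raw["severity"], "warning"),
        "message": {"text": raw["description"]},
        "locations": [{
            "physicalLocation": {
                "artifactLocation": {"uri": raw["file"]},
                "region": {"startLine": raw["line"]},
            }
        }],
    }

raw = {"rule": "sqli", "severity": "high",
       "description": "SQL injection", "file": "src/db.py", "line": 42}
result = to_sarif_result(raw)
print(result["level"])  # error
```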
---
## How Is Quality Ensured?
### Metrics and Measurement
- **Coverage:** How much code, how many rules, and how many vulnerability types are analyzed.
- **Accuracy:** False positive/negative rates, confidence scores, and validation rates.
- **Performance:** Analysis duration, resource usage, and scalability.
### Quality Assurance
- **Cross-Tool Validation:** Findings are confirmed by multiple tools when possible.
- **Manual Review:** High-severity findings can be flagged for expert review.
- **Continuous Improvement:** Tools and rules are updated regularly, and user feedback is incorporated.
---
## How Does Security Analysis Fit Into Development Workflows?
### CI/CD Integration
- **Pre-commit Hooks:** Run security checks before code is committed.
- **Pipeline Integration:** Block deployments if high/critical issues are found.
- **Quality Gates:** Enforce severity thresholds and track trends over time.
### Developer Experience
- **IDE Integration:** Import SARIF findings into supported IDEs for inline feedback.
- **Real-Time Analysis:** Optionally run background checks during development.
- **Reporting:** Executive dashboards, technical reports, and compliance summaries.
---
## Whats Next for Security Analysis in FuzzForge?
FuzzForge is designed to evolve. Advanced techniques like machine learning for pattern recognition, contextual analysis, and business logic checks are on the roadmap. The goal: keep raising the bar for automated, actionable, and developer-friendly security analysis.
---
## In Summary
FuzzForge's security analysis is comprehensive, layered, and designed for real-world integration. By combining multiple analysis types, normalizing results, and focusing on automation and developer experience, FuzzForge helps teams find and fix vulnerabilities before attackers do.
# Understanding Workflows in FuzzForge
Workflows are the backbone of FuzzForge's security analysis platform. If you want to get the most out of FuzzForge, it's essential to understand what workflows are, how they're designed, and how they operate from start to finish. This page explains the core concepts, design principles, and execution models behind FuzzForge workflows, so you can use them confidently and effectively.
---
## What Is a Workflow?
A **workflow** in FuzzForge is a containerized process that orchestrates one or more security tools to analyze a target codebase or application. Each workflow is tailored for a specific type of security analysis (like static analysis, secret detection, or fuzzing) and is designed to be:
- **Isolated:** Runs in its own Docker container for security and reproducibility.
- **Integrated:** Can combine multiple tools for comprehensive results.
- **Standardized:** Always produces SARIF-compliant output.
- **Configurable:** Accepts parameters to customize analysis.
- **Scalable:** Can run in parallel and scale horizontally.
---
## How Does a Workflow Operate?
### High-Level Architecture
Here's how a workflow moves through the FuzzForge system:
```mermaid
graph TB
User[User/CLI/API] --> API[FuzzForge API]
API --> Prefect[Prefect Orchestrator]
Prefect --> Worker[Prefect Worker]
Worker --> Container[Docker Container]
Container --> Tools[Security Tools]
Tools --> Results[SARIF Results]
Results --> Storage[Persistent Storage]
```
**Key roles:**
- **User/CLI/API:** Submits and manages workflows.
- **FuzzForge API:** Validates, orchestrates, and tracks workflows.
- **Prefect Orchestrator:** Schedules and manages workflow execution.
- **Prefect Worker:** Runs the workflow in a Docker container.
- **Security Tools:** Perform the actual analysis.
- **Persistent Storage:** Stores results and artifacts.
---
## Workflow Lifecycle: From Idea to Results
1. **Design:** Choose tools, define integration logic, set up parameters, and build the Docker image.
2. **Deployment:** Build and push the image, register the workflow, and configure defaults.
3. **Execution:** User submits a workflow; parameters and target are validated; the workflow is scheduled and executed in a container; tools run as designed.
4. **Completion:** Results are collected, normalized, and stored; status is updated; temporary resources are cleaned up; results are made available via API/CLI.
---
## Types of Workflows
FuzzForge supports several workflow types, each optimized for a specific security need:
- **Static Analysis:** Examines source code without running it (e.g., Semgrep, Bandit).
- **Dynamic Analysis:** Tests running applications for runtime vulnerabilities (e.g., OWASP ZAP, Nuclei).
- **Secret Detection:** Finds exposed credentials and sensitive data (e.g., TruffleHog, Gitleaks).
- **Infrastructure Analysis:** Checks infrastructure-as-code and configs for misconfigurations (e.g., Checkov, Hadolint).
- **Fuzzing:** Generates unexpected inputs to find crashes and edge cases (e.g., AFL++, libFuzzer).
- **Comprehensive Assessment:** Combines multiple analysis types for full coverage.
---
## Workflow Design Principles
- **Tool Agnostic:** Workflows abstract away the specifics of underlying tools, providing a consistent interface.
- **Fail-Safe Execution:** If one tool fails, others continue—partial results are still valuable.
- **Configurable:** Users can adjust parameters to control tool behavior, output, and execution.
- **Resource-Aware:** Workflows specify and respect resource limits (CPU, memory).
- **Standardized Output:** All results are normalized to SARIF for easy integration and reporting.
---
## Execution Models
- **Synchronous:** Wait for the workflow to finish and get results immediately—great for interactive use.
- **Asynchronous:** Submit a workflow and check back later for results—ideal for long-running or batch jobs.
- **Parallel:** Run multiple workflows at once for comprehensive or time-sensitive analysis.
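The asynchronous model boils down to submit-then-poll. Here is a minimal, client-agnostic sketch in which the status source is injected, so the same loop works against any API client; the state names and polling budget are illustrative assumptions:

```python
# Sketch: poll a workflow run until it reaches a terminal state.
# get_status is injected so the loop is independent of any API client.
import time

TERMINAL_STATES = {"completed", "failed", "cancelled"}

def wait_for_run(get_status, poll_interval=0.0, max_polls=100):
    """Call get_status() until a terminal state or the polling budget runs out."""
    for _ in range(max_polls):
        state = get_status()
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_interval)
    raise TimeoutError("run did not finish within the polling budget")

# Simulated status source: pending, running, then completed
states = iter(["pending", "running", "completed"])
print(wait_for_run(lambda: next(states)))  # completed
```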
---
## Data Flow and Storage
- **Input:** Target code and parameters are validated and mounted as read-only volumes.
- **Processing:** Tools are initialized and run (often in parallel); outputs are collected and normalized.
- **Output:** Results are stored in persistent volumes and indexed for fast retrieval; metadata is saved in the database; intermediate results may be cached for performance.
---
## Error Handling and Recovery
- **Tool-Level:** Timeouts, resource exhaustion, and crashes are handled gracefully; failed tools don't stop the workflow.
- **Workflow-Level:** Container failures, volume issues, and network problems are detected and reported.
- **Recovery:** Automatic retries for transient errors; partial results are returned when possible; workflows degrade gracefully if some tools are unavailable.
---
## Performance and Optimization
- **Container Efficiency:** Docker images are layered and cached for fast startup; containers may be reused when safe.
- **Parallel Processing:** Independent tools run concurrently to maximize CPU usage and minimize wait times.
- **Caching:** Images, dependencies, and intermediate results are cached to avoid unnecessary recomputation.
---
## Monitoring and Observability
- **Metrics:** Track execution time, resource usage, and success/failure rates.
- **Logging:** Structured logs and tool outputs are captured for debugging and analysis.
- **Real-Time Monitoring:** Live status updates and progress indicators are available via API/WebSocket.
---
## Integration Patterns
- **CI/CD:** Integrate workflows into pipelines to block deployments on critical findings.
- **API:** Programmatically submit and track workflows from your own tools or scripts.
- **Event-Driven:** Use webhooks or event listeners to trigger actions on workflow completion.
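A CI/CD quality gate over SARIF output can be sketched in a few lines: load the report a workflow produced and fail the build when any blocking-level findings are present (the helper name and sample report are illustrative):

```python
# Sketch: CI/CD quality gate over a SARIF report. A non-empty return
# value should fail the pipeline. The sample report is illustrative.

def gate(sarif_data, blocking_levels=("error",)):
    """Collect findings whose level should block the build."""
    blocking = []
    for run in sarif_data.get("runs", []):
        for result in run.get("results", []):
            if result.get("level") in blocking_levels:
                blocking.append(result)
    return blocking

sarif = {"runs": [{"results": [
    {"ruleId": "a", "level": "error"},
    {"ruleId": "b", "level": "note"},
]}]}
print(len(gate(sarif)))  # 1
```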
---
## In Summary
Workflows in FuzzForge are designed to be robust, flexible, and easy to integrate into your security and development processes. By combining containerization, orchestration, and a standardized interface, FuzzForge workflows help you automate and scale security analysis, so you can focus on fixing issues, not just finding them.
# Working with documentation
To update the documentation for any of the sections, just add a new Markdown file to the designated subfolder below:
```
├─concepts
├─tutorials
├─how-to
│ └─troubleshooting
└─reference
├─architecture
├─decisions
└─faq
```
:::note Templates
Each folder contains templates that can be used as quickstarts. Those are named `<template name>.tpml`.
:::
See the [Diataxis documentation](../reference/diataxis-documentation.md) for more information on the Diátaxis framework.
## Manage Docs Versions
Docusaurus can manage multiple versions of the docs.
### Create a docs version
Release a version 1.0 of your project:
```bash
npm run docusaurus docs:version 1.0
```
The `docs` folder is copied into `versioned_docs/version-1.0` and `versions.json` is created.
Your docs now have two versions:
- `1.0` at `http://localhost:3000/docs/` for the version 1.0 docs
- `current` at `http://localhost:3000/docs/next/` for the **upcoming, unreleased docs**
### Add a Version Dropdown
To navigate seamlessly across versions, add a version dropdown.
Modify the `docusaurus.config.js` file:
```js title="docusaurus.config.js"
export default {
themeConfig: {
navbar: {
items: [
// highlight-start
{
type: 'docsVersionDropdown',
},
// highlight-end
],
},
},
};
```
The docs version dropdown appears in the navbar.
## Update an existing version
It is possible to edit versioned docs in their respective folders:
- `versioned_docs/version-1.0/hello.md` updates `http://localhost:3000/docs/hello`
- `docs/hello.md` updates `http://localhost:3000/docs/next/hello`